The Proceedings of the Eighth International Conference on Creationism (2018)

sets described as follows. The full Gencode22 data set used in this study contains all comprehensive protein-coding gene annotations, all comprehensive lncRNA gene annotations, all polyA features (polyA_signal, polyA_site, pseudo_polyA), 2-way consensus (retrotransposed) pseudogenes, and tRNA genes (Derrien et al. 2012). In total, the BED file used for intersecting ITS sites contains 195,178 genic features and their corresponding coordinates (Table 1). Approximately 2.7% of these genic features contained at least one ITS site of two repeats or more. Over 5,000 ITS sites of two repeats or larger intersected with these various Gencode22 annotations.

Long intergenic noncoding RNAs (lincRNAs) are long noncoding RNA genes located in the intergenic space between protein-coding genes. Like lncRNA genes, lincRNA genes have complex promoters, are alternatively spliced and transcribed, and tend to be highly cell-type specific in their expression patterns (Ulitsky and Bartel 2013). They are a hotly pursued area of biomedical research because of their association with human health and cellular development (Guttman et al. 2011; Ulitsky et al. 2011; Batista and Chang 2013). Two different lincRNA datasets were queried: the UCSC lincRNAs and a much larger lincRNA dataset published by Hangauer et al. (2013), which produced 730 and 300 ITS site intersections, respectively (Table 1).

Enhancer regions in the genome regulate the proper temporal and cell-type-specific activation of gene expression in higher eukaryotes (Dickel et al. 2013; Andersson et al. 2014). Both transcription factor binding and transcription start sites are hallmarks of enhancers. Two data sets of enhancers were used in this study. Robust enhancers are transcribed at a significant expression level in at least one primary cell or tissue sample, while all known transcribed enhancers comprise the permissive set. These two sets produced 63 and 64 ITS intersections, respectively (Table 1).
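The intersections reported here amount to an interval-overlap test between ITS coordinates and the features of a BED file. This section does not state which tool performed them, so the following is only a minimal sketch of the idea. It assumes standard BED coordinates (0-based, half-open); the function names and all coordinates are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED text lines into (chrom, start, end) tuples (0-based, half-open)."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        records.append((chrom, int(start), int(end)))
    return records


def intersect(features, its_sites):
    """Return (feature, site) pairs that overlap by at least one base."""
    by_chrom = defaultdict(list)
    for chrom, start, end in features:
        by_chrom[chrom].append((start, end))
    hits = []
    for chrom, s_start, s_end in its_sites:
        for f_start, f_end in by_chrom.get(chrom, ()):
            # Half-open intervals overlap when each starts before the other ends.
            if s_start < f_end and f_start < s_end:
                hits.append(((chrom, f_start, f_end), (chrom, s_start, s_end)))
    return hits


# Hypothetical annotation features (illustrative coordinates only).
features = parse_bed([
    "chr1\t100\t200\tgeneA",
    "chr1\t500\t900\tgeneB",
    "chr2\t100\t200\tgeneC",
])

# Hypothetical 12-base ITS sites (two 6-base repeats each).
its_sites = parse_bed([
    "chr1\t150\t162",  # lies inside geneA
    "chr1\t200\t212",  # starts where geneA ends, so no overlap
    "chr1\t895\t907",  # overlaps the end of geneB
    "chr2\t300\t312",  # overlaps nothing
])

hits = intersect(features, its_sites)
features_hit = {feature for feature, _ in hits}
print(len(hits), "intersections in", len(features_hit), "features")
```

The nested loop is quadratic per chromosome, which is acceptable for illustration; for data sets the size of those in Table 1, a sorted sweep or an interval tree would be the more practical choice.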
Transcription start site (TSS) associations are defined as TSSs that correlate with transcriptional, epigenetic, and transcription factor binding within 500 kb of the TSS (Andersson et al. 2014). The goal of such research is to link enhancers to their target genes. Therefore, a dataset of 64,621 enhancer TSS associations was queried with the ITS sites, and 5,002 intersections were found (Table 1). A surprisingly large proportion, approximately 8%, of these TSS-associated regions intersected with ITS sites.

Transcription factor binding sites are determined via the biochemical association (binding) of transcription factors to genomic DNA sequence (Furey 2012; Mundade et al. 2014). A comprehensive data set of transcription factor binding sites comprising 8.8 million genomic locations across the human genome (Griffon et al. 2015) was queried with the ITS sites, resulting in 4,489 intersections.

Because much of the evolutionary speculation surrounding ITS sites treats them as chromosomal aberrations that play a role in chromosome breakage and human disease, I also decided to determine whether they could be associated with known heritable disease. Therefore, a dataset of 8,801 inherited disease loci developed by the Illumina Corporation and used in the screening of human disease (Kingsmore 2012; Saunders et al. 2012) was queried with the ITS sites. The database contains 550 genes, including coding exons, intron-exon boundaries, and regions harboring pathogenic mutations. Only 5 ITS sites intersected with disease-related loci, and all were degenerate ITS of 12 bases in length. This does not implicate them in the pathology of the locus; it shows only that they were found in these genomic segments. Interestingly, all five were located in exons of protein-coding genes. One ITS site was located in the last exon of the peroxisomal biogenesis factor 10 (PEX10) gene on chromosome 1.
A second was located in the last exon of the alkylglycerone phosphate synthase (AGPS) gene on chromosome 2. A third was located in the last exon of the desmoplakin (DSP) gene on chromosome 6. A fourth was located in the last exon of the tripeptidyl peptidase 1 (TPP1) gene on chromosome 11. The fifth

Tomkins ◀ Interstitial telomeres and chromosome 2 fusion ▶ 2018 ICC 226

| Data set | Entries in data set | Perfect TTAGGG intersections | Degenerate TTNGGG intersections | Perfect CCCTAA intersections | Degenerate CCCNAA intersections | Total ITS intersections |
|---|---|---|---|---|---|---|
| Gencode22 | 195,178 | 258 | 2,347 | 249 | 2,343 | 5,197 |
| UCSC lincRNAs | 21,131 | 59 | 299 | 85 | 287 | 730 |
| lincRNAs from Hangauer et al. (2013) | 59,177 | 6 | 138 | 26 | 130 | 300 |
| Permissive enhancers | 42,888 | 3 | 24 | 3 | 34 | 64 |
| Robust enhancers | 38,443 | 3 | 23 | 3 | 34 | 63 |
| Enhancer transcription start site associated regions | 64,621 | 204 | 2,242 | 258 | 2,298 | 5,002 |
| Remap transcription factor binding | 8,822,477 | 498 | 1,740 | 445 | 1,806 | 4,489 |
| Tru-sight inherited disease | 8,801 | 0 | 4 | 0 | 1 | 5 |

Table 1. Results from the intersection of genome-wide ITS sites two repeats or larger with various ENCODE-related data sets.
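The row totals in Table 1 and the percentages quoted in the text can be checked with simple arithmetic. The following is a minimal sketch using values transcribed from Table 1; the variable and function names are illustrative. Dividing total intersections by data set entries matches the "percentage of features containing an ITS site" only under the assumption that each intersection falls in a distinct entry, which the table itself does not confirm.

```python
# Rows transcribed from Table 1:
# (entries, perfect TTAGGG, degenerate TTNGGG,
#  perfect CCCTAA, degenerate CCCNAA, reported total)
table1 = {
    "Gencode22": (195178, 258, 2347, 249, 2343, 5197),
    "UCSC lincRNAs": (21131, 59, 299, 85, 287, 730),
    "lincRNAs from Hangauer et al. (2013)": (59177, 6, 138, 26, 130, 300),
    "Permissive enhancers": (42888, 3, 24, 3, 34, 64),
    "Robust enhancers": (38443, 3, 23, 3, 34, 63),
    "Enhancer TSS associated regions": (64621, 204, 2242, 258, 2298, 5002),
    "Remap transcription factor binding": (8822477, 498, 1740, 445, 1806, 4489),
    "Tru-sight inherited disease": (8801, 0, 4, 0, 1, 5),
}


def total_is_consistent(row):
    """Check that the four motif counts add up to the reported total."""
    _entries, *counts, total = row
    return sum(counts) == total


def intersections_per_100_entries(row):
    """Total intersections per 100 entries.

    Assumption: approximates the percentage of entries with an ITS site
    only if every intersection falls in a different entry.
    """
    entries, total = row[0], row[-1]
    return 100 * total / entries


for name, row in table1.items():
    status = "ok" if total_is_consistent(row) else "MISMATCH"
    rate = intersections_per_100_entries(row)
    print(f"{name}: total {status}, {rate:.2f} per 100 entries")
```

Under that assumption, the Gencode22 row gives about 2.7 per 100 entries and the enhancer TSS row about 7.7, in line with the approximately 2.7% and 8% figures quoted in the text.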
