Stratified Random Picks
To make the stratified picks, the genome is divided
into the top 20%, middle 30%, and bottom 50% along two axes: gene
density and nontranscribed conservation. Three random picks
are then taken from each stratum, plus a fourth pick in the strata that
are underrepresented in the manual picks. One additional backup
pick is made in each stratum in case there is an unforeseen technical
problem with a region. The backup pick is parenthesized below.
The left coordinate is the June genomic position for the feature,
while the right coordinate is the November genomic position
for the feature.
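The picking procedure described above can be sketched roughly as follows. This is only an illustration, not the code that produced the picks listed here: the record keys ('name', 'gene', 'consNonTx'), the function names, the seed, and the rule that a stratum counts as underrepresented when it has fewer than two manual picks are all assumptions made for the example.

```python
import random

def percentile_band(rank_fraction):
    """Map a rank fraction in [0, 1) to the band labels used in this document."""
    if rank_fraction < 0.5:
        return "0% - 50%"      # bottom 50%
    if rank_fraction < 0.8:
        return "50% - 80%"     # middle 30%
    return "80% - 100%"        # top 20%

def stratify(regions):
    """Group regions into the 3 x 3 strata by rank along each axis.

    regions: list of dicts with hypothetical keys 'name', 'gene', 'consNonTx'.
    Returns {(consNonTx band, gene band): [region names]}.
    """
    n = len(regions)
    bands = {}
    for axis in ("gene", "consNonTx"):
        ordered = sorted(regions, key=lambda r: r[axis])
        for rank, region in enumerate(ordered):
            bands[(region["name"], axis)] = percentile_band(rank / n)
    strata = {}
    for region in regions:
        name = region["name"]
        key = (bands[(name, "consNonTx")], bands[(name, "gene")])
        strata.setdefault(key, []).append(name)
    return strata

def pick(strata, manual_counts, seed=0, underrepresented_below=2):
    """Take 3 random picks per stratum, a 4th if the stratum has few manual
    picks (the threshold is an assumption), and 1 backup."""
    rng = random.Random(seed)
    picks = {}
    for key, names in sorted(strata.items()):
        extra = 1 if manual_counts.get(key, 0) < underrepresented_below else 0
        n_main = 3 + extra
        chosen = rng.sample(names, min(n_main + 1, len(names)))
        picks[key] = {"picks": chosen[:n_main], "backup": chosen[n_main:]}
    return picks
```

Sampling the main picks and the backup in one call keeps the backup distinct from the main picks within a stratum.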
consNonTx 0% - 50%, gene 0% - 50% (1 manual)
June: chr13:28500001-29000000 Nov: chr13:24500016-25000015 consNonTx 2.8%, gene 0.5%
June: chr2:51700001-52200000 Nov: chr2:51837455-52337454 consNonTx 3.8%, gene 0.0%
June: chr4:119000001-119500000 Nov: chr4:118527386-119027385 consNonTx 3.9%, gene 0.0%
June: chr10:54300001-54800000 Nov: chr10:54489120-54989119 consNonTx 2.8%, gene 1.2%
(June: chr5:15900001-16400000 Nov: chr5:16187472-16687471 consNonTx 5.1%, gene 1.7%)
consNonTx 0% - 50%, gene 50% - 80% (4 manual)
June: chr2:115500001-116000000 Nov: chr2:116215329-116715328 consNonTx 6.2%, gene 2.3%
June: chr18:61100001-61600000 Nov: chr18:61234622-61734621 consNonTx 3.4%, gene 3.4%
June: chr12:40500001-41000000 Nov: chr12:40239443-40739442 consNonTx 1.7%, gene 3.1%
(June: chr2:196700001-197200000 Nov: chr2:197214044-197714043 consNonTx 5.4%, gene 3.3%)
consNonTx 0% - 50%, gene 80% - 100% (11 manual)
June: chr2:232500001-233000000 Nov: chr2:233173598-233673597 consNonTx 1.3%, gene 4.6%
June: chr13:111900001-112400000 Nov: chr13:107927238-108427237 consNonTx 1.1%, gene 5.5%
June: chr21:36900001-37400000 Nov: chr21:36983033-37483032 consNonTx 2.3%, gene 5.2%
(June: chr4:47800001-48300000 Nov: chr4:48032776-48532775 consNonTx 1.9%, gene 4.4%)
consNonTx 50% - 80%, gene 0% - 50% (2 manual)
June: chr16:25300001-25800000 Nov: chr16:25969826-26469825 consNonTx 9.7%, gene 0.5%
June: chr5:141800001-142300000 Nov: chr5:142482586-142982585 consNonTx 6.7%, gene 1.7%
June: chr18:25400001-25900000 Nov: chr18:25196197-25696196 consNonTx 7.4%, gene 0.9%
(June: chr4:124800001-125300000 Nov: chr4:124166677-124666676 consNonTx 6.3%, gene 0.9%)
consNonTx 50% - 80%, gene 50% - 80% (4 manual)
June: chr5:56000001-56500000 Nov: chr5:57392856-57892855 consNonTx 7.9%, gene 2.2%
June: chr6:131800001-132300000 Nov: chr6:132023965-132523964 consNonTx 6.9%, gene 2.1%
June: chr6:73700001-74200000 Nov: chr6:73699933-74199932 consNonTx 6.4%, gene 3.6%
(June: chr4:53700001-54200000 Nov: chr4:53859184-54359183 consNonTx 9.0%, gene 2.1%)
consNonTx 50% - 80%, gene 80% - 100% (3 manual)
June: chr1:149000001-149500000 Nov: chr1:146905332-147405331 consNonTx 10.2%, gene 8.4%
June: chr9:122800001-123300000 Nov: chr9:123331831-123831830 consNonTx 8.3%, gene 5.9%
June: chr15:39100001-39600000 Nov: chr15:36628619-37128618 consNonTx 9.7%, gene 10.6%
(June: chr17:33400001-33900000 Nov: chr17:35665792-36165791 consNonTx 7.7%, gene 6.1%)
consNonTx 80% - 100%, gene 0% - 50% (3 manual)
June: chr14:51200001-51700000 Nov: chr14:47673341-48173340 consNonTx 14.9%, gene 0.1%
June: chr11:133100001-133600000 Nov: chr11:132612235-133112234 consNonTx 13.5%, gene 0.3%
June: chr16:52600001-53100000 Nov: chr16:62362206-62862205 consNonTx 15.4%, gene 0.0%
(June: chrX:41900001-42400000 Nov: chrX:42149253-42649252 consNonTx 13.4%, gene 0.7%)
consNonTx 80% - 100%, gene 50% - 80% (1 manual)
June: chr8:117800001-118300000 Nov: chr8:118874200-119374199 consNonTx 11.4%, gene 3.2%
June: chr14:96900001-97400000 Nov: chr14:93204045-93704044 consNonTx 15.9%, gene 2.9%
June: chrX:117500001-118000000 Nov: chrX:119675382-120175381 consNonTx 10.7%, gene 2.0%
June: chr6:108100001-108600000 Nov: chr6:108287568-108787567 consNonTx 18.6%, gene 2.3%
consNonTx 80% - 100%, gene 80% - 100% (1 manual)
June: chr2:218300001-218800000 Nov: chr2:218998720-219498719 consNonTx 13.3%, gene 9.1%
June: chr11:66700001-67200000 Nov: chr11:65865884-66365883 consNonTx 13.4%, gene 9.0%
June: chr20:33600001-34100000 Nov: chr20:33559944-34059943 consNonTx 11.5%, gene 9.2%
June: chr6:41300001-41800000 Nov: chr6:41294331-41794330 consNonTx 15.2%, gene 4.8%
(June: chr9:124300001-124800000 Nov: chr9:124831831-125331830 consNonTx 11.4%, gene 5.4%)
Stratification of Manual Picks
Here is the noncoding conservation and gene density of non-overlapping
500 kb regions in the manual picks. The boundaries between strata
are:
            low 50%     middle 30%   high 20%
            ------------------------------------
gene        0.0-1.9%    1.9-4.2%     4.2-100%
consNonTx   0.0-6.3%    6.3-10.6%    10.6-100%
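As a small illustration of how these boundaries map a region to a stratum, the lookup might be written as below. The function and constant names are hypothetical, and the treatment of a value that falls exactly on a boundary (assigned to the higher band here) is an assumption, since the published percentages are rounded.

```python
# Upper limits of the low and middle bands, in percent, from the table above.
GENE_BOUNDS = (1.9, 4.2)
CONS_BOUNDS = (6.3, 10.6)

def band(value, bounds):
    """Return the percentile band label for a score given (low_max, mid_max)."""
    low_max, mid_max = bounds
    if value < low_max:
        return "0% - 50%"
    if value < mid_max:
        return "50% - 80%"
    return "80% - 100%"    # boundary values go to the higher band (assumption)

def stratum(cons_non_tx, gene):
    """Return (consNonTx band, gene band) for a region's two scores."""
    return band(cons_non_tx, CONS_BOUNDS), band(gene, GENE_BOUNDS)

# A region with consNonTx 2.8% and gene 0.5% falls in the low band on both axes.
print(stratum(2.8, 0.5))
```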
CFTR
June: chr7:114288355-116165780 Nov: chr7:114288155-116165580
Interleukin_Cluster
June: chr5:130778557-131778556 Nov: chr5:131703638-132703637
Apo_Cluster
June: chr11:118810001-119310000 Nov: chr11:117969240-118469239
Chr22
June: chr22:28500001-30200000 Nov: chr22:28500001-30200000
Chr21
June: chr21:30323762-32019746 Nov: chr21:30406794-32102778
ChrX
June: chrX:147250001-148500000 Nov: chrX:149572309-150846234
Chr19
June: chr19:55200001-56200000 Nov: chr19:54724484-55728861
Alpha_Globin
June: chr16:79138-579137 Nov: chr16:10001-510000
Beta_Globin
June: chr11:5550000-6549999 Nov: chr11:5076527-6078118
HOXA_cluster
June: chr7:26600001-27100000 Nov: chr7:26599801-27099800
IGF2/H19
June: chr11:300001-900000 Nov: chr11:1941933-2547980 PROBLEMATIC REGION - SUBSTANTIALLY REARRANGED BETWEEN JUNE AND NOVEMBER BUILDS
FOXP2
June: chr7:112410791-113410790 Nov: chr7:112410791-113410790
Semi-Manual Picks
Here's the stratification of the other zoo-seq regions.
I recommend picking 7q21.13 and 7q31.33 to round things out.
7q21.13
June: chr7:88319137-89433560 Nov: chr7:88318937-89433360
7q21.3
June: chr7:91589227-92559635 Nov: chr7:91589027-92559435
7q21.3
June: chr7:93650712-94868826 Nov: chr7:93650512-94868626
7q31.33
June: chr7:124556444-125719632 Nov: chr7:124556244-125719432
7q32.1
June: chr7:126427707-127330661 Nov: chr7:126427507-127330461
Methods
Gene density is defined as the percentage of bases covered
by either Ensembl genes or human mRNA best BLAT alignments in
the UCSC browser database.
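A minimal sketch of this kind of coverage calculation follows. It assumes the gene and mRNA features have already been reduced to (start, end) intervals on one chromosome, and it uses half-open coordinates for simplicity, whereas the positions listed on this page are written as 1-based inclusive ranges. The function name and the interval representation are hypothetical.

```python
def gene_density(window_start, window_end, intervals):
    """Percent of bases in [window_start, window_end) covered by any interval.

    intervals: iterable of half-open (start, end) pairs; overlapping
    intervals are merged so that each base is counted only once.
    """
    # Clip each interval to the window and drop those that miss it entirely.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if s < window_end and e > window_start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Starts a new run: bank the previous one first.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps or touches the current run: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / (window_end - window_start)
```

Merging before counting matters because a base covered by both an Ensembl gene and an mRNA alignment should contribute once, not twice.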
Nontranscribed conservation was measured by a fairly
elaborate process. Non-overlapping 125-base subwindows were taken
inside each 500,000-base window. Subwindows with fewer than 75%
of their bases in a mouse alignment were thrown out. For the remaining
subwindows, the percentage with at least 80% base identity is used as
the conservation score. To get the nontranscribed conservation score,
the mouse alignments in regions corresponding to Ensembl genes, all GenBank
mRNA blastz alignments, Fgenesh++ gene predictions, twinScan gene predictions,
spliced EST alignments, and repeats were thrown out.
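The subwindow procedure can be sketched as below, under several assumptions that the text does not settle. The per-base boolean lists (aligned, identical, masked) are a hypothetical input format, base identity is computed over the aligned bases of a subwindow, and "thrown out" is modeled by treating masked bases as unaligned before the 75% filter is applied.

```python
def conservation_score(aligned, identical, masked=None, sub=125,
                       min_aligned=0.75, min_identity=0.80):
    """Percent of retained subwindows with at least min_identity base identity.

    aligned[i]   -- True if base i lies in a mouse alignment
    identical[i] -- True if base i matches the aligned mouse base
    masked[i]    -- True if base i is in a transcribed or repeat region
                    (pass None for the plain conservation score)
    """
    if masked is not None:
        # Discard alignments over masked bases (assumed interpretation).
        aligned = [a and not m for a, m in zip(aligned, masked)]
    kept = conserved = 0
    for start in range(0, len(aligned) - sub + 1, sub):
        idx = range(start, start + sub)
        n_aligned = sum(1 for j in idx if aligned[j])
        if n_aligned < min_aligned * sub:
            continue  # fewer than 75% of bases aligned: drop the subwindow
        kept += 1
        n_ident = sum(1 for j in idx if aligned[j] and identical[j])
        if n_ident >= min_identity * n_aligned:
            conserved += 1
    return 100.0 * conserved / kept if kept else 0.0
```

Returning 0.0 when no subwindow survives the filter is an arbitrary choice for the sketch; a real pipeline might instead report the window as having no data.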