NCBI Tetranychus urticae Annotation Release 101

The RefSeq genome records for Tetranychus urticae were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Tetranychus urticae Annotation Release 101.

Annotation release ID: 101
Date of Entrez queries for transcripts and proteins: May 7 2018
Date of submission of annotation to the public databases: May 22 2018
Software version: 8.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM23943v1	GCF_000239435.1	DOE Joint Genome Institute	12-20-2011	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	ASM23943v1
Genes and pseudogenes	14,185
protein-coding	11,686
non-coding	2,272
transcribed pseudogenes	36
non-transcribed pseudogenes	191
genes with variants	3,025
immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	15,671
fully-supported	14,723
with > 5% ab initio	615
partial	189
with filled gap(s)	53
known RefSeq (NM_)	89
model RefSeq (XM_)	15,582
non-coding RNAs	4,480
fully-supported	4,235
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	4,322
pseudo transcripts	41
fully-supported	41
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	41
CDSs	15,684
fully-supported	14,723
with > 5% ab initio	655
partial	177
with major correction(s)	137
known RefSeq (NP_)	89
model RefSeq (XP_)	15,595
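Several rows in the feature-count table are breakdowns of the row above them: the gene biotypes of "Genes and pseudogenes", and the known plus model accessions of "mRNAs" and "CDSs". The Python sketch below uses only the counts shown in the table to check that those breakdowns add up. The "fully-supported", "with > 5% ab initio", "partial" and similar rows do not sum to their totals, so this sketch does not treat them as a partition and leaves them out.

```python
# Counts copied from the "Feature counts" table for ASM23943v1.
gene_classes = {
    "protein-coding": 11_686,
    "non-coding": 2_272,
    "transcribed pseudogenes": 36,
    "non-transcribed pseudogenes": 191,
}
GENES_AND_PSEUDOGENES = 14_185

mrna_by_accession = {"NM_": 89, "XM_": 15_582}
MRNAS = 15_671

cds_by_accession = {"NP_": 89, "XP_": 15_595}
CDSS = 15_684


def check_total(label, parts, total):
    """Return True if the sub-category counts add up to the reported total."""
    observed = sum(parts.values())
    ok = observed == total
    print(f"{label}: parts sum to {observed:,}, reported {total:,} -> {'OK' if ok else 'MISMATCH'}")
    return ok


results = [
    check_total("Genes and pseudogenes", gene_classes, GENES_AND_PSEUDOGENES),
    check_total("mRNAs", mrna_by_accession, MRNAS),
    check_total("CDSs", cds_by_accession, CDSS),
]

# Genes without pseudogenes should match the "Genes" row of the feature-length table.
genes_excluding_pseudo = gene_classes["protein-coding"] + gene_classes["non-coding"]
print(f"Genes excluding pseudogenes: {genes_excluding_pseudo:,}")
```

All three checks pass, and the 13,958 genes left after removing pseudogenes match the "Genes" row of the feature-length table.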

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	13,958	4,601	2,278	70	254,337
All transcripts	20,151	2,449	1,993	43	37,653
mRNA	15,671	2,465	1,978	201	37,653
misc_RNA	381	2,751	2,538	137	9,831
tRNA	156	71	73	43	84
lncRNA	3,854	2,476	2,118	155	12,469
snoRNA	11	103	74	70	215
snRNA	13	147	142	106	192
rRNA	65	1,797	2,166	119	4,184
Single-exon transcripts	1,334	1,636	1,389	300	15,814
coding transcripts (NM_/XM_)	1,334	1,636	1,389	300	15,814
CDSs	15,684	1,705	1,290	132	36,915
Exons	64,809	529	296	2	27,812
in coding transcripts (NM_/XM_)	54,610	527	289	2	27,812
in non-coding transcripts (NR_/XR_)	11,012	531	329	3	8,979
Introns	48,051	826	116	30	251,021
in coding transcripts (NM_/XM_)	41,122	900	111	30	251,021
in non-coding transcripts (NR_/XR_)	7,730	432	141	30	57,459
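The columns of the feature-length table are ordinary summary statistics over one length per feature. A minimal Python sketch with made-up lengths (not values from this annotation) is shown below. It also illustrates why the mean can sit far above the median when a few features are very long, as with genes (mean 4,601 bp, median 2,278 bp) and introns (mean 826 bp, median 116 bp). How the pipeline rounds the mean is not stated, so rounding to the nearest base here is an assumption.

```python
from statistics import mean, median


def length_summary(lengths):
    """Count, mean, median, min and max of feature lengths in bp."""
    return {
        "count": len(lengths),
        "mean": round(mean(lengths)),  # assumption: mean rounded to the nearest bp
        "median": median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Made-up lengths, not taken from the annotation.
trna_like = [43, 71, 72, 73, 73, 74, 84]
intron_like = [100, 110, 120, 130, 10_000]  # one very long feature skews the mean

print(length_summary(trna_like))    # {'count': 7, 'mean': 70, 'median': 73, 'min': 43, 'max': 84}
print(length_summary(intron_like))  # mean 2092 vs median 120
```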

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.45	1	1	23
Number of exons per transcript	4.83	4	1	44
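Both rows summarize simple per-gene and per-transcript counts. The Python sketch below derives them from a handful of hypothetical (gene, transcript, exon count) records; the IDs and counts are invented for illustration and are not drawn from this annotation.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical (gene ID, transcript ID, number of exons) records.
records = [
    ("geneA", "tx1", 4),
    ("geneA", "tx2", 5),
    ("geneB", "tx3", 1),
    ("geneC", "tx4", 7),
    ("geneC", "tx5", 6),
    ("geneC", "tx6", 3),
]

# Number of transcripts attached to each gene.
transcripts_per_gene = list(Counter(gene for gene, _tx, _n in records).values())
# Number of exons in each transcript.
exons_per_transcript = [n_exons for _gene, _tx, n_exons in records]


def describe(values):
    """Mean (2 decimals), median, min and max, matching the table's columns."""
    return {
        "mean": round(mean(values), 2),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


print("transcripts per gene:", describe(transcripts_per_gene))
print("exons per transcript:", describe(exons_per_transcript))
```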

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 11,673 coding genes, 8,120 (about 70%) had a protein with an alignment covering 50% or more of the query, and 1,705 (about 15%) had an alignment covering 95% or more of the query.

Definitions: query coverage is the percentage of the annotated protein length that is included in the alignment; target coverage is the percentage of the target (Swiss-Prot protein) length that is included in the alignment.
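For a single contiguous alignment, each coverage reduces to the length of the aligned span on that sequence divided by the sequence's full length. A minimal Python sketch is shown below, assuming 1-based inclusive coordinates and one alignment segment per hit (BLASTP can report several segments, and how the pipeline combines them is not described here). The 400-aa annotated protein and 500-aa Swiss-Prot protein are hypothetical.

```python
def coverage(start, end, seq_length):
    """Percent of a sequence covered by an alignment span (1-based, inclusive)."""
    return 100.0 * (end - start + 1) / seq_length


# Hypothetical hit: annotated protein (query, 400 aa) vs Swiss-Prot protein (target, 500 aa).
query_cov = coverage(21, 400, 400)   # query residues 21..400 are in the alignment
target_cov = coverage(1, 380, 500)   # target residues 1..380 are in the alignment
print(f"query coverage {query_cov:.1f}%, target coverage {target_cov:.1f}%")
```

With these made-up numbers the protein would count toward both the 50% and the 95% query-coverage tallies, even though only 76% of the target is covered.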

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

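[Figure: cumulative number of genes with alignments at or above a given query or target coverage threshold. Query: annotated proteins; Target: UniProtKB/Swiss-Prot curated proteins.]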
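The cumulative counts behind such a graph can be sketched in Python as follows, assuming (as the wording "genes had a protein with an alignment" suggests) that a gene counts at a threshold when any of its proteins reaches it. The gene and protein IDs and the coverage values are hypothetical.

```python
# Hypothetical best query coverage (%) per annotated protein.
protein_coverage = {
    ("geneA", "protA1"): 97.0,
    ("geneA", "protA2"): 60.0,
    ("geneB", "protB1"): 45.0,
    ("geneC", "protC1"): 88.0,
    ("geneD", "protD1"): 0.0,  # no BLASTP hit
}

# A gene counts at a threshold if ANY of its proteins reaches it, so keep the best value per gene.
best_per_gene = {}
for (gene, _protein), cov in protein_coverage.items():
    best_per_gene[gene] = max(cov, best_per_gene.get(gene, 0.0))


def genes_at_or_above(threshold):
    """Number of genes whose best protein coverage is >= threshold (%)."""
    return sum(1 for cov in best_per_gene.values() if cov >= threshold)


# One point per 5% step, as a cumulative curve would be plotted.
cumulative = {t: genes_at_or_above(t) for t in range(0, 101, 5)}
print(cumulative[50], cumulative[95])  # 2 1
```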

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.
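Masking percentages of this kind can be reproduced from a soft-masked FASTA file, a common convention in which masked bases are written in lowercase. The Python sketch below assumes that convention and, as a further assumption, excludes N gap characters from the denominator; how the pipeline treats gaps is not described here.

```python
def percent_soft_masked(fasta_text):
    """Percent of non-gap bases written in lowercase (soft-masked) in FASTA text."""
    masked = total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip header lines
        for base in line.strip():
            if base in "Nn":
                continue  # assumption: gap characters are not counted
            total += 1
            if base.islower():
                masked += 1
    return 100.0 * masked / total if total else 0.0


# Toy soft-masked sequence: 8 of its 20 non-gap bases are lowercase.
toy_fasta = ">scaffold_1\nACGTacgtNNNN\nacgtACGTACGT\n"
print(f"{percent_soft_masked(toy_fasta):.2f}% masked")  # 40.00% masked
```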

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
ASM23943v1	GCF_000239435.1	2.97%	25.74%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	89	89 (100.00%)	86 (96.63%)	99.74%	99.87%
Same-species GenBank	188	187 (99.47%)	182 (96.81%)	99.45%	99.80%
Same-species EST	80,855	79,256 (98.02%)	77,450 (95.79%)	99.29%	99.62%
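Both "Number (%)" columns appear to use the number of sequences retrieved as the denominator: for example, 182 of 188 GenBank transcripts is 96.81%, whereas 182 of the 187 aligned would be 97.33%. A short Python check using the table's own counts is below.

```python
# (source, retrieved from Entrez, aligned by Splign, passed to Gnomon) from the table above.
rows = [
    ("Same-species known RefSeq (NM_/NR_)", 89, 89, 86),
    ("Same-species GenBank", 188, 187, 182),
    ("Same-species EST", 80_855, 79_256, 77_450),
]


def pct(part, whole):
    """Percentage rounded to two decimals, as displayed in the table."""
    return round(100.0 * part / whole, 2)


for source, retrieved, aligned, passed in rows:
    # Both percentages are taken relative to the number retrieved.
    print(f"{source}: aligned {pct(aligned, retrieved):.2f}%, passed {pct(passed, retrieved):.2f}%")
```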

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	2,660,316,794	86%	13%	60,948
SAMD00015880	NA	Tetranychus urticae, 35 degrees Celsius (Tetranychus urticae, SAMD00015880)	302,472	80%	31%	21,422
SAMD00015882	NA	Tetranychus urticae, 25 degrees Celsius (Tetranychus urticae, SAMD00015882)	279,160	77%	31%	19,630
SAMEA2472061	NA	uninfected (Tetranychus urticae, 1 day, male, SAMEA2472061)	68,217,826	87%	14%	46,035
SAMEA2472062	NA	uninfected (Tetranychus urticae, 1 to 3 day, female, SAMEA2472062)	58,412,258	84%	11%	46,816
SAMEA2472063	NA	Wolbachia infection (Tetranychus urticae, 1 to 3 day, female, SAMEA2472063)	57,962,982	89%	11%	47,354
SAMEA2472064	NA	Wolbachia infection (Tetranychus urticae, 1 day, male, SAMEA2472064)	56,829,688	87%	14%	46,143
SAMN00710352	22113690,22393009	adult_techrep1 (Tetranychus urticae, males and females, SAMN00710352)	8,797,042	67%	6%	25,671
SAMN00710353	22113690,22393009	adult_techrep2 (Tetranychus urticae, males and females, SAMN00710353)	14,234,992	63%	6%	28,736
SAMN00710354	22113690,22393009	embryo_techrep1 (Tetranychus urticae, males and females, SAMN00710354)	13,298,799	65%	8%	35,840
SAMN00710355	22113690,22393009	embryo_techrep2 (Tetranychus urticae, males and females, SAMN00710355)	13,517,520	67%	9%	36,169
SAMN00710356	22113690,22393009	larvae_techrep1 (Tetranychus urticae, males and females, SAMN00710356)	5,233,189	76%	10%	31,703
SAMN00710357	22113690,22393009	larvae_techrep2 (Tetranychus urticae, males and females, SAMN00710357)	12,983,204	66%	9%	37,125
SAMN00710358	22113690,22393009	nymph_techrep1 (Tetranychus urticae, males and females, SAMN00710358)	13,247,797	63%	8%	35,770
SAMN00710359	22113690,22393009	nymph_techrep2 (Tetranychus urticae, males and females, SAMN00710359)	13,303,478	66%	8%	36,218
SAMN00710436	22113690	turt_bean_feeding_rep_1 (Tetranychus urticae, SAMN00710436)	9,415,937	78%	7%	31,589
SAMN00710437	22113690	turt_bean_feeding_rep_2 (Tetranychus urticae, SAMN00710437)	8,590,600	77%	7%	30,998
SAMN00710438	22113690	turt_bla2_feeding_rep_1 (Tetranychus urticae, SAMN00710438)	8,299,790	76%	8%	33,940
SAMN00710439	22113690	turt_bla2_feeding_rep_2 (Tetranychus urticae, SAMN00710439)	9,129,984	73%	8%	34,849
SAMN00710440	22113690	turt_kon_feeding_rep_1 (Tetranychus urticae, SAMN00710440)	7,679,354	79%	8%	33,883
SAMN00710441	22113690	turt_kon_feeding_rep_2 (Tetranychus urticae, SAMN00710441)	7,440,728	78%	8%	32,998
SAMN00769558	NA	Tetranychus urticae Strain DeLier1 454 Transcriptome (Tetranychus urticae, SAMN00769558)	206,110	43%	33%	5,203
SAMN00769561	NA	Tetranychus urticae Strain Santpoort1 454/Transcriptome (Tetranychus urticae, SAMN00769561)	232,646	44%	31%	5,698
SAMN00769562	NA	Tetranychus urticae Strain Houten1 454 Transcriptome (Tetranychus urticae, SAMN00769562)	908,678	52%	29%	22,564
SAMN03068946	NA	whole_body (Tetranychus urticae, 3day adult, female, SAMN03068946)	45,645,526	86%	12%	44,793
SAMN03068947	NA	whole_body (Tetranychus urticae, 4day adult, female, SAMN03068947)	43,846,164	88%	12%	43,345
SAMN03068948	NA	whole_body (Tetranychus urticae, 5day adult, female, SAMN03068948)	52,571,924	88%	12%	48,688
SAMN03068949	NA	whole_body (Tetranychus urticae, 6day adult, female, SAMN03068949)	52,617,464	88%	12%	47,344
SAMN04228845	28001328	whole body (Tetranychus urticae, SAMN04228845)	73,938,686	89%	9%	42,457
SAMN04228846	28001328	whole body (Tetranychus urticae, SAMN04228846)	73,444,192	89%	9%	43,081
SAMN04228847	28001328	whole body (Tetranychus urticae, SAMN04228847)	84,500,424	90%	13%	45,489
SAMN04228848	28001328	whole body (Tetranychus urticae, SAMN04228848)	77,539,280	90%	13%	45,460
SAMN04228849	28001328	whole body (Tetranychus urticae, SAMN04228849)	69,133,788	88%	11%	46,104
SAMN04228850	28001328	whole body (Tetranychus urticae, SAMN04228850)	78,480,616	87%	11%	46,779
SAMN04954980	27703040	Whole body (Tetranychus urticae, SAMN04954980)	59,348,082	91%	11%	47,137
SAMN04954981	27703040	Whole body (Tetranychus urticae, SAMN04954981)	30,790,894	91%	10%	43,259
SAMN04954982	27703040	Whole body (Tetranychus urticae, SAMN04954982)	28,742,752	92%	11%	43,274
SAMN04954983	27703040	Proterosoma (Tetranychus urticae, SAMN04954983)	51,318,270	92%	15%	40,287
SAMN05591486	27797949	Intact adult males, replicate 1 (Tetranychus urticae, male, SAMN05591486)	41,920,556	91%	15%	44,423
SAMN05591487	27797949	Intact adult females, replicate 4 (Tetranychus urticae, female, SAMN05591487)	38,736,584	93%	12%	44,901
SAMN05591488	27797949	Intact adult females, replicate 3 (Tetranychus urticae, female, SAMN05591488)	36,221,908	93%	12%	44,026
SAMN05591489	27797949	Intact adult females, replicate 2 (Tetranychus urticae, female, SAMN05591489)	36,008,868	93%	12%	44,319
SAMN05591490	27797949	Intact adult females, replicate 1 (Tetranychus urticae, female, SAMN05591490)	48,511,846	93%	12%	46,053
SAMN05591491	27797949	Intact adult males, replicate 4 (Tetranychus urticae, male, SAMN05591491)	46,471,646	92%	15%	45,865
SAMN05591492	27797949	Intact adult males, replicate 3 (Tetranychus urticae, male, SAMN05591492)	37,050,590	92%	15%	43,633
SAMN05591493	27797949	Intact adult males, replicate 2 (Tetranychus urticae, male, SAMN05591493)	64,823,174	92%	15%	46,627
SAMN06843505	NA	CsA4 (Tetranychus urticae, SAMN06843505)	43,413,820	86%	16%	44,559
SAMN06843506	NA	CsA3 (Tetranychus urticae, SAMN06843506)	40,018,144	85%	16%	45,703
SAMN06843507	NA	CsA2 (Tetranychus urticae, SAMN06843507)	41,735,118	86%	15%	45,834
SAMN06843508	NA	DEM2 (Tetranychus urticae, SAMN06843508)	41,595,810	84%	15%	45,925
SAMN06843509	NA	DEM1 (Tetranychus urticae, SAMN06843509)	45,335,200	87%	15%	46,433
SAMN06843510	NA	DEF4 (Tetranychus urticae, SAMN06843510)	37,859,232	85%	16%	45,325
SAMN06843511	NA	DEF3 (Tetranychus urticae, SAMN06843511)	38,626,840	85%	16%	45,034
SAMN06843512	NA	CsA1 (Tetranychus urticae, SAMN06843512)	40,926,328	86%	15%	46,612
SAMN06843513	NA	DEM4 (Tetranychus urticae, SAMN06843513)	47,576,666	85%	16%	47,054
SAMN06843514	NA	DEM3 (Tetranychus urticae, SAMN06843514)	49,129,412	86%	16%	47,430
SAMN06843515	NA	DEF2 (Tetranychus urticae, SAMN06843515)	40,922,292	83%	16%	44,308
SAMN06843516	NA	DEF1 (Tetranychus urticae, SAMN06843516)	46,845,940	83%	16%	45,602
SAMN06843517	NA	PBO4 (Tetranychus urticae, SAMN06843517)	42,132,014	84%	16%	44,812
SAMN06843518	NA	PBO3 (Tetranychus urticae, SAMN06843518)	44,693,500	85%	16%	44,558
SAMN06843519	NA	PBO2 (Tetranychus urticae, SAMN06843519)	46,210,988	85%	16%	44,904
SAMN06843520	NA	PBO1 (Tetranychus urticae, SAMN06843520)	45,511,142	84%	16%	44,747
SAMN06843521	NA	FORM4 (Tetranychus urticae, SAMN06843521)	39,721,220	87%	16%	45,269
SAMN06843522	NA	FORM3 (Tetranychus urticae, SAMN06843522)	39,508,160	86%	16%	44,748
SAMN06843523	NA	FORM2 (Tetranychus urticae, SAMN06843523)	46,401,330	86%	16%	45,968
SAMN06843524	NA	FORM1 (Tetranychus urticae, SAMN06843524)	48,144,646	86%	16%	45,674
SAMN06843525	NA	CON4 (Tetranychus urticae, SAMN06843525)	41,751,024	87%	16%	45,404
SAMN06843526	NA	CON3 (Tetranychus urticae, SAMN06843526)	40,418,214	85%	16%	45,863
SAMN06843527	NA	CON2 (Tetranychus urticae, SAMN06843527)	47,988,702	84%	15%	46,238
SAMN06843528	NA	CON1 (Tetranychus urticae, SAMN06843528)	48,929,946	85%	15%	46,344
SAMN08803136	NA	whole body (Tetranychus urticae, SAMN08803136)	54,733,638	73%	16%	48,443

Alignment statistics by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR001715	DRX001184	DRP000514	SAMD00015880	302,472	80%	31%
DRR001714	DRX001183	DRP000514	SAMD00015882	279,160	77%	31%
ERR486874	ERX452705	ERP005624	SAMEA2472061	68,217,826	87%	14%
ERR486873	ERX452706	ERP005624	SAMEA2472062	58,412,258	84%	11%
ERR486876	ERX452707	ERP005624	SAMEA2472063	57,962,982	89%	11%
ERR486875	ERX452708	ERP005624	SAMEA2472064	56,829,688	87%	14%
SRR332195	SRX092406	SRP007866	SAMN00710352	8,797,042	67%	6%
SRR332196	SRX092407	SRP007866	SAMN00710353	14,234,992	63%	6%
SRR332197	SRX092408	SRP007866	SAMN00710354	13,298,799	65%	8%
SRR332198	SRX092409	SRP007866	SAMN00710355	13,517,520	67%	9%
SRR332199	SRX092410	SRP007866	SAMN00710356	5,233,189	76%	10%
SRR332200	SRX092411	SRP007866	SAMN00710357	12,983,204	66%	9%
SRR332201	SRX092412	SRP007866	SAMN00710358	13,247,797	63%	8%
SRR332202	SRX092413	SRP007866	SAMN00710359	13,303,478	66%	8%
SRR332275	SRX092491	SRP007884	SAMN00710436	9,415,937	78%	7%
SRR332276	SRX092492	SRP007884	SAMN00710437	8,590,600	77%	7%
SRR332277	SRX092493	SRP007884	SAMN00710438	8,299,790	76%	8%
SRR332278	SRX092494	SRP007884	SAMN00710439	9,129,984	73%	8%
SRR332279	SRX092495	SRP007884	SAMN00710440	7,679,354	79%	8%
SRR332280	SRX092496	SRP007884	SAMN00710441	7,440,728	78%	8%
SRR393983	SRX113819	SRP010070	SAMN00769558	206,110	43%	33%
SRR393985	SRX113832	SRP010070	SAMN00769562	230,341	42%	22%
SRR393986	SRX113833	SRP010070	SAMN00769562	678,337	55%	31%
SRR393984	SRX113831	SRP010074	SAMN00769561	232,646	44%	31%
SRR1582618	SRX702381	SRP047203	SAMN03068946	45,645,526	86%	12%
SRR1582619	SRX707632	SRP047203	SAMN03068947	43,846,164	88%	12%
SRR1582616	SRX707633	SRP047203	SAMN03068948	52,571,924	88%	12%
SRR1582617	SRX707634	SRP047203	SAMN03068949	52,617,464	88%	12%
SRR2884609	SRX1405968	SRP065565	SAMN04228845	73,938,686	89%	9%
SRR5275420	SRX2579318	SRP065565	SAMN04228846	73,444,192	89%	9%
SRR5275419	SRX2579319	SRP065565	SAMN04228847	84,500,424	90%	13%
SRR5275418	SRX2579320	SRP065565	SAMN04228848	77,539,280	90%	13%
SRR5275416	SRX2579321	SRP065565	SAMN04228849	69,133,788	88%	11%
SRR5275417	SRX2579322	SRP065565	SAMN04228850	78,480,616	87%	11%
SRR3476863	SRX1743285	SRP074404	SAMN04954980	59,348,082	91%	11%
SRR3476864	SRX1743286	SRP074404	SAMN04954981	30,790,894	91%	10%
SRR3476865	SRX1743287	SRP074404	SAMN04954982	28,742,752	92%	11%
SRR3476866	SRX1743288	SRP074404	SAMN04954983	51,318,270	92%	15%
SRR4043738	SRX2034643	SRP082384	SAMN05591486	41,920,556	91%	15%
SRR4043745	SRX2034650	SRP082384	SAMN05591487	38,736,584	93%	12%
SRR4043744	SRX2034649	SRP082384	SAMN05591488	36,221,908	93%	12%
SRR4043743	SRX2034648	SRP082384	SAMN05591489	36,008,868	93%	12%
SRR4043742	SRX2034647	SRP082384	SAMN05591490	48,511,846	93%	12%
SRR4043741	SRX2034646	SRP082384	SAMN05591491	46,471,646	92%	15%
SRR4043740	SRX2034645	SRP082384	SAMN05591492	37,050,590	92%	15%
SRR4043739	SRX2034644	SRP082384	SAMN05591493	64,823,174	92%	15%
SRR5484474	SRX2767754	SRP105368	SAMN06843505	43,413,820	86%	16%
SRR5484473	SRX2767753	SRP105368	SAMN06843506	40,018,144	85%	16%
SRR5484472	SRX2767752	SRP105368	SAMN06843507	41,735,118	86%	15%
SRR5484468	SRX2767748	SRP105368	SAMN06843508	41,595,810	84%	15%
SRR5484467	SRX2767747	SRP105368	SAMN06843509	45,335,200	87%	15%
SRR5484466	SRX2767746	SRP105368	SAMN06843510	37,859,232	85%	16%
SRR5484465	SRX2767745	SRP105368	SAMN06843511	38,626,840	85%	16%
SRR5484471	SRX2767751	SRP105368	SAMN06843512	40,926,328	86%	15%
SRR5484470	SRX2767750	SRP105368	SAMN06843513	47,576,666	85%	16%
SRR5484469	SRX2767749	SRP105368	SAMN06843514	49,129,412	86%	16%
SRR5484464	SRX2767744	SRP105368	SAMN06843515	40,922,292	83%	16%
SRR5484463	SRX2767743	SRP105368	SAMN06843516	46,845,940	83%	16%
SRR5484462	SRX2767742	SRP105368	SAMN06843517	42,132,014	84%	16%
SRR5484461	SRX2767741	SRP105368	SAMN06843518	44,693,500	85%	16%
SRR5484460	SRX2767740	SRP105368	SAMN06843519	46,210,988	85%	16%
SRR5484459	SRX2767739	SRP105368	SAMN06843520	45,511,142	84%	16%
SRR5484458	SRX2767738	SRP105368	SAMN06843521	39,721,220	87%	16%
SRR5484457	SRX2767737	SRP105368	SAMN06843522	39,508,160	86%	16%
SRR5484456	SRX2767736	SRP105368	SAMN06843523	46,401,330	86%	16%
SRR5484455	SRX2767735	SRP105368	SAMN06843524	48,144,646	86%	16%
SRR5484454	SRX2767734	SRP105368	SAMN06843525	41,751,024	87%	16%
SRR5484453	SRX2767733	SRP105368	SAMN06843526	40,418,214	85%	16%
SRR5484452	SRX2767732	SRP105368	SAMN06843527	47,988,702	84%	15%
SRR5484451	SRX2767731	SRP105368	SAMN06843528	48,929,946	85%	15%
SRR6981546	SRX3921926	SRP139438	SAMN08803136	54,733,638	73%	16%
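Each sample-level row aggregates the runs for that sample. SAMN00769562 is the only sample listed with two runs (SRR393985 and SRR393986), and its sample row can be approximately reconstructed by summing reads and weighting the run percentages by read count. The Python sketch below assumes that weighting; because the run percentages are already rounded, the result is only approximate.

```python
# Runs of sample SAMN00769562 from the by-run table:
# (run, reads, % aligned, % of aligned reads with introns).
runs = [
    ("SRR393985", 230_341, 42, 22),
    ("SRR393986", 678_337, 55, 31),
]


def aggregate(runs):
    """Sum reads and combine run-level percentages, weighted by read count."""
    reads = sum(r for _, r, _, _ in runs)
    aligned = sum(r * pa / 100 for _, r, pa, _ in runs)
    with_introns = sum(r * pa / 100 * pi / 100 for _, r, pa, pi in runs)
    return reads, 100 * aligned / reads, 100 * with_introns / aligned


reads, pct_aligned, pct_introns = aggregate(runs)
print(f"{reads:,} reads, {pct_aligned:.0f}% aligned, {pct_introns:.0f}% of aligned with introns")
```

The output (908,678 reads, 52% aligned, 29% with introns) matches the SAMN00769562 row in the by-sample table.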

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Varroa destructor high-quality model RefSeq (XP_)	9,670	5,967 (61.71%)	5,967 (61.71%)	57.72%	44.34%
Pediculus humanus corporis model RefSeq (XP_)	10,775	6,475 (60.09%)	6,475 (60.09%)	57.79%	47.10%
Same-species GenBank	182	180 (98.90%)	180 (98.90%)	74.89%	90.20%
Same-species known RefSeq (NP_)	89	89 (100.00%)	89 (100.00%)	72.22%	91.51%
Arthropoda GenBank	123,888	79,266 (63.98%)	79,266 (63.98%)	62.47%	58.54%
Arthropoda known RefSeq (NP_)	39,312	22,898 (58.25%)	22,898 (58.25%)	58.52%	44.52%
Limulus polyphemus high-quality model RefSeq (XP_)	14,599	10,093 (69.13%)	10,093 (69.13%)	59.52%	48.69%
Tribolium castaneum high-quality model RefSeq (XP_)	11,489	6,854 (59.66%)	6,854 (59.66%)	55.79%	39.71%
Apis mellifera high-quality model RefSeq (XP_)	8,342	5,052 (60.56%)	5,052 (60.56%)	57.24%	45.68%

Comparison of the current and previous annotations

The annotation produced for this release (101) was compared to the annotation in the previous release (100) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
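The overlap-and-boundary scoring described above can be illustrated with a minimal Python sketch. This is not the pipeline's actual scoring: the Jaccard overlap of exon bases, the fraction of shared exon boundaries and the classification thresholds are all hypothetical choices, and the sketch ignores reciprocal best matches and attribute changes.

```python
def exon_bases(exons):
    """Genomic positions covered by (start, end) exons, 1-based inclusive."""
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def exon_boundaries(exons):
    """All exon start and end coordinates."""
    return {coord for exon in exons for coord in exon}


def jaccard(a, b):
    """Shared elements divided by all elements of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(current, previous):
    """Hypothetical classification of a current/previous feature pair."""
    overlap = jaccard(exon_bases(current), exon_bases(previous))
    boundary_match = jaccard(exon_boundaries(current), exon_boundaries(previous))
    if overlap == 1.0 and boundary_match == 1.0:
        return "Identical"
    if overlap >= 0.9:  # made-up threshold
        return "Minor changes"
    if overlap > 0.0:
        return "Major changes"
    return "No overlap"


previous = [(100, 200), (300, 400)]
print(classify(previous, previous))                  # Identical
print(classify([(100, 200), (300, 420)], previous))  # Minor changes
print(classify([(100, 200)], previous))              # Major changes
```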

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	ASM23943v1 (Current) to ASM23943v1 (Previous)
Identical	6%
Minor changes	73%
Major changes	9%
New	10%
Deprecated	5%
Other	1%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
