NCBI Solanum tuberosum Annotation Release 101

The RefSeq genome records for Solanum tuberosum were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Solanum tuberosum Annotation Release 101.

Annotation release ID: 101
Date of Entrez queries for transcripts and proteins: Dec 11 2015
Date of submission of annotation to the public databases: Jan 5 2016
Software version: 6.5

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
SolTub_3.0	GCF_000226075.1	Potato Genome Sequencing Consortium	09-19-2011	Reference	12 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	SolTub_3.0
Genes and pseudogenes	32,733
protein-coding	28,327
non-coding	2,380
pseudogenes	2,026
genes with variants	6,148
mRNAs	37,876
fully-supported	32,254
with > 5% ab initio	4,648
partial	411
with filled gap(s)	19
known RefSeq (NM_)	696
model RefSeq (XM_)	37,180
Other RNAs	6,036
fully-supported	5,148
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	215
model RefSeq (XR_)	4,933
CDSs	37,876
fully-supported	32,254
with > 5% ab initio	4,733
partial	351
with major correction(s)	253
known RefSeq (NP_)	696
model RefSeq (XP_)	37,180
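
The sub-rows of the table above can be cross-checked against their parent rows. The following is a minimal sketch, assuming that the three gene biotypes partition the "Genes and pseudogenes" total and that the known and model accessions partition the mRNA total; the dictionary keys and the function name are illustrative and not part of the report.

```python
# Counts copied from the Feature counts table for SolTub_3.0.
counts = {
    "genes_and_pseudogenes": 32_733,
    "protein_coding": 28_327,
    "non_coding": 2_380,
    "pseudogenes": 2_026,
    "mRNAs": 37_876,
    "known_NM": 696,
    "model_XM": 37_180,
}


def check_feature_counts(c):
    """Return named consistency checks; True means the sub-rows sum to the parent row."""
    biotype_sum = c["protein_coding"] + c["non_coding"] + c["pseudogenes"]
    accession_sum = c["known_NM"] + c["model_XM"]
    return {
        "gene_biotypes_sum_to_total": biotype_sum == c["genes_and_pseudogenes"],
        "mrna_accessions_sum_to_total": accession_sum == c["mRNAs"],
    }


print(check_feature_counts(counts))
```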

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	30,707	4,855	3,186	43	123,914
All transcripts	43,912	1,775	1,522	18	16,875
mRNA	37,876	1,826	1,568	114	16,875
misc_RNA	1,589	2,470	2,147	60	9,196
miRNA	333	22	21	18	26
tRNA	770	74	73	71	87
lncRNA	3,344	1,423	873	33	15,097
Single-exon transcripts	4,747	1,193	987	114	8,241
coding transcripts (NM_/XM_ )	4,747	1,193	987	114	8,241
CDSs	37,876	1,367	1,134	105	16,401
Exons	175,708	316	165	1	11,675
in coding transcripts (NM_/XM_ )	165,168	312	164	1	10,068
in non-coding transcripts (NR_/XR_ )	14,861	330	161	2	11,675
Introns	141,091	804	265	28	93,059
in coding transcripts (NM_/XM_ )	133,789	776	259	28	81,224
in non-coding transcripts (NR_/XR_ )	11,366	1,183	380	30	93,059

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.4	1	1	50
Number of exons per transcript	6.14	4	1	80
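
The mean number of transcripts per gene appears to follow from the counts in the Feature lengths table. This is a small sketch under the assumption that the mean is the count of all transcripts divided by the count of genes excluding pseudogenes; the report does not state the formula, and the function name is illustrative.

```python
def mean_per_gene(n_transcripts: int, n_genes: int) -> float:
    """Average number of transcripts per gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return n_transcripts / n_genes


# Genes in the Feature lengths table: protein-coding plus non-coding.
genes = 28_327 + 2_380
# "All transcripts" row of the Feature lengths table.
transcripts = 43_912

print(genes)
print(round(mean_per_gene(transcripts, genes), 1))
```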

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Arabidopsis thaliana known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 28,327 coding genes, 25,632 genes had a protein with an alignment covering 50% or more of the query and 11,837 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
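
The two coverage definitions can be illustrated with a short sketch. It assumes a single contiguous alignment span with 1-based inclusive coordinates; the function name and the example coordinates are hypothetical and do not come from the annotation run.

```python
def coverage_percent(aln_start: int, aln_end: int, seq_length: int) -> float:
    """Percent of a sequence included in one alignment span (1-based, inclusive)."""
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    if not (1 <= aln_start <= aln_end <= seq_length):
        raise ValueError("alignment span must lie within the sequence")
    return 100.0 * (aln_end - aln_start + 1) / seq_length


# Hypothetical hit: residues 11-400 of a 400-aa annotated protein (query)
# aligned to residues 1-390 of a 500-aa high-quality protein (target).
query_cov = coverage_percent(11, 400, 400)
target_cov = coverage_percent(1, 390, 500)

print(query_cov, target_cov)
```

In this made-up case the same alignment gives a high query coverage and a lower target coverage, which is why the two values are reported separately.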

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: Arabidopsis thaliana known RefSeq proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
SolTub_3.0	GCF_000226075.1	40.89%	34.00%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with short reads and reported in the Short read transcript alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	965	963 (99.79%)	899 (93.16%)	98.83%	98.94%
Same-species Genbank	2,568	2,515 (97.94%)	2,161 (84.15%)	98.38%	96.98%
Same-species EST	250,105	223,478 (89.35%)	184,489 (73.76%)	98.33%	98.97%
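
The percentages in the table above appear to be taken relative to the number of sequences retrieved from Entrez. The sketch below recomputes them under that assumption; the `pct` helper and the `rows` structure are illustrative names, and the counts are copied from the table.

```python
def pct(part: int, total: int) -> float:
    """Percentage of total, rounded to two decimals as in the table."""
    return round(100.0 * part / total, 2)


# source: (retrieved from Entrez, aligned by Splign, passed to Gnomon)
rows = {
    "Same-species known RefSeq (NM_/NR_)": (965, 963, 899),
    "Same-species Genbank": (2_568, 2_515, 2_161),
    "Same-species EST": (250_105, 223_478, 184_489),
}

for source, (retrieved, aligned, passed) in rows.items():
    print(source, pct(aligned, retrieved), pct(passed, retrieved))
```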

RefSeq transcript alignment quality report

The known RefSeq transcripts (NM_ and NR_ accessions) are a set of high-quality transcripts maintained by the RefSeq group at NCBI. Alignment statistics for this group of transcripts, such as percent and number of sequences not aligning at all, percent best alignments split between multiple scaffolds, and percent alignments not covering the full CDS are indicative of the genome quality and are provided below.

	SolTub_3.0 Primary Assembly
Number of sequences retrieved from Entrez	965
Number (%) of sequences not aligning	2 (0.21%)
Number (%) of sequences with multiple best alignments (split genes)	9 (0.93%)
Number (%) of sequences with CDS coverage < 95%	21 (2.81%)

Short read transcript alignments

The following short reads (RNA-Seq) from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent spliced reads	Number of introns
All	Aggregate of all aligned samples	3,811,841,326	73%	11%	166,009
SAMEA2149986	leaf, cultivar SW1015 (Solanum tuberosum, SAMEA2149986)	52,014,282	69%	13%	122,631
SAMEA2151923	leaf, cultivar Sarpo Mira (Solanum tuberosum, SAMEA2151923)	52,294,420	89%	17%	123,934
SAMEA2160126	leaf, cultivar Desiree (Solanum tuberosum, SAMEA2160126)	52,873,932	90%	17%	125,600
SAMEA957893	stamen (Solanum tuberosum, SAMEA957893)	12,083,841	86%	4%	63,517
SAMEA957894	flower (Solanum tuberosum, SAMEA957894)	6,133,946	83%	4%	59,114
SAMEA957895	tuber (Solanum tuberosum, SAMEA957895)	6,291,767	84%	3%	49,727
SAMEA957896	tuber sprout (Solanum tuberosum, SAMEA957896)	11,393,776	85%	4%	77,441
SAMEA957897	tuber (Solanum tuberosum, SAMEA957897)	6,345,392	82%	4%	60,007
SAMEA957898	tuber pith (Solanum tuberosum, SAMEA957898)	12,187,256	86%	4%	64,296
SAMEA957899	leaf (Solanum tuberosum, SAMEA957899)	6,694,818	84%	4%	54,284
SAMEA957900	leaf (Solanum tuberosum, SAMEA957900)	12,281,440	81%	4%	71,176
SAMEA957901	tuber peel (Solanum tuberosum, SAMEA957901)	9,626,503	84%	4%	62,307
SAMEA957902	shoot apex (Solanum tuberosum, SAMEA957902)	6,066,010	83%	4%	56,754
SAMEA957903	stem (Solanum tuberosum, SAMEA957903)	6,276,156	83%	4%	59,978
SAMEA957904	stolon (Solanum tuberosum, SAMEA957904)	6,634,548	83%	4%	66,801
SAMEA957905	petiole (Solanum tuberosum, SAMEA957905)	6,548,759	83%	4%	61,727
SAMEA957906	tuber cortex (Solanum tuberosum, SAMEA957906)	11,286,464	85%	4%	65,356
SAMEA957907	root (Solanum tuberosum, SAMEA957907)	12,209,179	84%	4%	75,259
SAMEA957908	whole plant (Solanum tuberosum, SAMEA957908)	8,315,792	84%	4%	70,536
SAMN00259611	mixed tissue - Atlantic (Solanum tuberosum, SAMN00259611)	30,185,186	84%	12%	121,950
SAMN00259612	mixed tissue - Premier Russet (Solanum tuberosum, SAMN00259612)	31,949,096	85%	12%	127,007
SAMN00259613	mixed tissue - Snowden (Solanum tuberosum, SAMN00259613)	33,288,120	84%	12%	125,113
SAMN02147137	whole tuber 1, +RB P. infestans treated, 0hpi (Solanum tuberosum, SAMN02147137)	15,507,046	81%	7%	88,993
SAMN02147138	whole tuber 2, +RB P. infestans treated, 0hpi (Solanum tuberosum, SAMN02147138)	22,891,756	82%	7%	97,767
SAMN02147139	whole tuber 3, +RB P. infestans treated, 0hpi (Solanum tuberosum, SAMN02147139)	15,550,022	67%	6%	85,201
SAMN02147140	whole tuber 1, +RB P. infestans treated, 24hpi (Solanum tuberosum, SAMN02147140)	15,498,896	83%	8%	96,244
SAMN02147141	whole tuber 2, +RB P. infestans treated, 24hpi (Solanum tuberosum, SAMN02147141)	26,963,106	83%	8%	104,824
SAMN02147142	whole tuber 3, +RB P. infestans treated, 24hpi (Solanum tuberosum, SAMN02147142)	22,649,294	73%	7%	101,379
SAMN02147143	whole tuber 1, +RB P. infestans treated, 48hpi (Solanum tuberosum, SAMN02147143)	16,023,232	83%	8%	93,812
SAMN02147144	whole tuber 2, +RB P. infestans treated, 48hpi (Solanum tuberosum, SAMN02147144)	18,696,330	83%	8%	99,433
SAMN02147145	whole tuber 3, +RB P. infestans treated, 48hpi (Solanum tuberosum, SAMN02147145)	33,649,574	77%	8%	108,705
SAMN02147146	whole tuber 1, +RB mock treated, 0hpi (Solanum tuberosum, SAMN02147146)	18,462,038	80%	7%	95,258
SAMN02147147	whole tuber 2, +RB mock treated, 0hpi (Solanum tuberosum, SAMN02147147)	16,086,216	81%	7%	92,070
SAMN02147148	whole tuber 3, +RB mock treated, 0hpi (Solanum tuberosum, SAMN02147148)	16,340,330	80%	7%	91,436
SAMN02147149	whole tuber 1, +RB mock treated, 24hpi (Solanum tuberosum, SAMN02147149)	15,995,594	82%	8%	94,687
SAMN02147150	whole tuber 2, +RB mock treated, 24hpi (Solanum tuberosum, SAMN02147150)	27,206,626	82%	8%	105,257
SAMN02147151	whole tuber 3, +RB mock treated, 24hpi (Solanum tuberosum, SAMN02147151)	43,291,318	82%	8%	112,318
SAMN02147152	whole tuber 1, +RB mock treated, 48hpi (Solanum tuberosum, SAMN02147152)	16,727,598	82%	7%	97,740
SAMN02147153	whole tuber 2, +RB mock treated, 48hpi (Solanum tuberosum, SAMN02147153)	18,254,178	81%	7%	93,690
SAMN02147154	whole tuber 3, +RB mock treated, 48hpi (Solanum tuberosum, SAMN02147154)	26,142,516	82%	7%	103,695
SAMN02147155	whole tuber 1, WT mock treated, 0hpi (Solanum tuberosum, SAMN02147155)	28,580,340	81%	7%	98,387
SAMN02147156	whole tuber 2, WT mock treated, 0hpi (Solanum tuberosum, SAMN02147156)	22,907,202	81%	6%	94,463
SAMN02147157	whole tuber 3, WT mock treated, 0hpi (Solanum tuberosum, SAMN02147157)	53,796,744	82%	7%	107,726
SAMN02147158	whole tuber 1, WT mock treated, 24hpi (Solanum tuberosum, SAMN02147158)	46,506,772	83%	8%	113,060
SAMN02147159	whole tuber 2, WT mock treated, 24hpi (Solanum tuberosum, SAMN02147159)	30,860,620	83%	8%	107,607
SAMN02147160	whole tuber 3, WT mock treated, 24hpi (Solanum tuberosum, SAMN02147160)	31,480,390	81%	8%	104,581
SAMN02147161	whole tuber 1, WT mock treated, 48hpi (Solanum tuberosum, SAMN02147161)	40,654,080	81%	7%	111,137
SAMN02147162	whole tuber 2, WT mock treated, 48hpi (Solanum tuberosum, SAMN02147162)	40,536,236	82%	8%	109,392
SAMN02147163	whole tuber 3, WT mock treated, 48hpi (Solanum tuberosum, SAMN02147163)	15,374,986	81%	7%	93,749
SAMN02147164	whole tuber 1, WT P. infestans treated, 0hpi (Solanum tuberosum, SAMN02147164)	25,610,202	80%	7%	98,794
SAMN02147165	whole tuber 2, WT P. infestans treated, 0hpi (Solanum tuberosum, SAMN02147165)	24,056,646	79%	7%	91,377
SAMN02147166	whole tuber 3, WT P. infestans treated, 0hpi (Solanum tuberosum, SAMN02147166)	26,792,310	81%	6%	77,276
SAMN02147167	whole tuber 1, WT P. infestans treated, 24hpi (Solanum tuberosum, SAMN02147167)	31,824,334	79%	8%	106,432
SAMN02147168	whole tuber 2, WT P. infestans treated, 24hpi (Solanum tuberosum, SAMN02147168)	31,292,110	83%	8%	106,114
SAMN02147169	whole tuber 3, WT P. infestans treated, 24hpi (Solanum tuberosum, SAMN02147169)	40,517,258	83%	8%	108,943
SAMN02147170	whole tuber 1, WT P. infestans treated, 48hpi (Solanum tuberosum, SAMN02147170)	35,757,768	81%	7%	108,752
SAMN02147171	whole tuber 2, WT P. infestans treated, 48hpi (Solanum tuberosum, SAMN02147171)	26,127,282	83%	8%	104,068
SAMN02147172	whole tuber 3, WT P. infestans treated, 48hpi (Solanum tuberosum, SAMN02147172)	27,436,022	81%	8%	104,880
SAMN02711349	leaf (Solanum tuberosum, SAMN02711349)	47,988,790	87%	19%	126,708
SAMN02711350	leaf (Solanum tuberosum, SAMN02711350)	48,679,626	87%	18%	126,691
SAMN02711351	leaf (Solanum tuberosum, SAMN02711351)	47,845,834	80%	17%	125,363
SAMN02711352	leaf (Solanum tuberosum, SAMN02711352)	49,575,886	87%	18%	126,889
SAMN02711353	leaf (Solanum tuberosum, SAMN02711353)	49,953,802	84%	17%	126,764
SAMN02711354	leaf (Solanum tuberosum, SAMN02711354)	49,578,648	82%	17%	124,624
SAMN02711355	leaf (Solanum tuberosum, SAMN02711355)	49,224,502	85%	18%	124,835
SAMN02711356	leaf (Solanum tuberosum, SAMN02711356)	49,158,746	86%	18%	126,291
SAMN02717032	Skin (Solanum tuberosum, harvest, SAMN02717032)	53,426,530	70%	15%	119,728
SAMN02717033	Flesh (Solanum tuberosum, harvest, SAMN02717033)	51,032,576	77%	17%	113,811
SAMN02717034	Skin (Solanum tuberosum, harvest, SAMN02717034)	51,184,258	65%	14%	112,243
SAMN02717035	Flesh (Solanum tuberosum, harvest, SAMN02717035)	53,618,074	55%	12%	105,110
SAMN02725400	Roots (Solanum tuberosum, missing, SAMN02725400)	92,427,100	92%	24%	129,736
SAMN03076265	dormant meristem, fall 2005 (Solanum tuberosum, SAMN03076265)	22,321,582	82%	3%	89,461
SAMN03076266	dormant meristem, fall 2007 (Solanum tuberosum, SAMN03076266)	13,564,362	83%	3%	73,435
SAMN03076267	dormant meristem, fall 2009 (Solanum tuberosum, SAMN03076267)	33,494,287	69%	3%	90,381
SAMN03076268	dormant meristem, fall 2010 (Solanum tuberosum, SAMN03076268)	33,964,952	72%	3%	94,475
SAMN03076269	Non-dormant meristem, spring 2006 (Solanum tuberosum, SAMN03076269)	16,891,071	83%	4%	86,149
SAMN03076270	Non-dormant meristem, spring 2008 (Solanum tuberosum, SAMN03076270)	13,640,088	82%	4%	74,013
SAMN03076271	Non-dormant meristem, spring 2010 (Solanum tuberosum, SAMN03076271)	36,556,253	74%	3%	97,913
SAMN03076272	Non-dormant meristem, spring 2011 (Solanum tuberosum, SAMN03076272)	34,126,663	79%	3%	97,846
SAMN03081353	No_NG_zero_time_rep1 (Solanum tuberosum, SAMN03081353)	48,045,994	85%	8%	116,075
SAMN03081354	No_NG_zero_time_rep2 (Solanum tuberosum, SAMN03081354)	57,998,031	85%	9%	126,046
SAMN03081355	No_NG_zero_time_rep3 (Solanum tuberosum, SAMN03081355)	38,326,017	84%	8%	113,207
SAMN03081356	NG_One_day_after_exposure_rep1 (Solanum tuberosum, SAMN03081356)	35,655,491	83%	7%	110,003
SAMN03081357	NG_One_day_after_exposure_rep2 (Solanum tuberosum, SAMN03081357)	59,906,580	82%	8%	120,143
SAMN03081358	NG_One_day_after_exposure_rep3 (Solanum tuberosum, SAMN03081358)	43,290,578	84%	8%	116,782
SAMN03081359	NG_Four_days_after_exposure_rep1 (Solanum tuberosum, SAMN03081359)	50,655,558	85%	9%	122,755
SAMN03081360	NG_Four_days_after_exposure_rep3 (Solanum tuberosum, SAMN03081360)	51,767,740	85%	9%	122,894
SAMN03081361	NG_Seven_days_after_exposure_rep1 (Solanum tuberosum, SAMN03081361)	44,839,850	85%	8%	120,479
SAMN03081362	NG_Seven_days_after_exposure_rep2 (Solanum tuberosum, SAMN03081362)	57,657,436	85%	9%	125,069
SAMN03081363	NG_Seven_days_after_exposure_rep3 (Solanum tuberosum, SAMN03081363)	56,348,592	83%	9%	124,490
SAMN03394894	stolon tips, control (Solanum tuberosum, not applicable, SAMN03394894)	74,826,108	86%	19%	134,943
SAMN03394895	stolon tips, drought stress (Solanum tuberosum, not applicable, SAMN03394895)	68,670,240	87%	20%	131,309
SAMN03394896	stolon tips, rewatered after drought stress (Solanum tuberosum, not applicable, SAMN03394896)	73,917,562	88%	21%	130,039
SAMN03897511	petiole (Solanum tuberosum, 6 weeks, SAMN03897511)	135,262,334	84%	22%	139,910
SAMN03897512	petiole (Solanum tuberosum, 6 weeks, SAMN03897512)	234,431,322	85%	21%	142,918

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent spliced reads
ERR029918	ERX010642	ERP000527	SAMEA957893	12,083,841	86%	4%
ERR029909	ERX010643	ERP000527	SAMEA957894	6,133,946	83%	4%
ERR029916	ERX010637	ERP000527	SAMEA957895	6,291,767	84%	3%
ERR029923	ERX010638	ERP000527	SAMEA957896	11,393,776	85%	4%
ERR029915	ERX010639	ERP000527	SAMEA957897	6,345,392	82%	4%
ERR029920	ERX010640	ERP000527	SAMEA957898	12,187,256	86%	4%
ERR029910	ERX010641	ERP000527	SAMEA957899	6,694,818	84%	4%
ERR029919	ERX010648	ERP000527	SAMEA957900	12,281,440	81%	4%
ERR029921	ERX010649	ERP000527	SAMEA957901	9,626,503	84%	4%
ERR029912	ERX010650	ERP000527	SAMEA957902	6,066,010	83%	4%
ERR029913	ERX010651	ERP000527	SAMEA957903	6,276,156	83%	4%
ERR029914	ERX010644	ERP000527	SAMEA957904	6,634,548	83%	4%
ERR029911	ERX010645	ERP000527	SAMEA957905	6,548,759	83%	4%
ERR029924	ERX010646	ERP000527	SAMEA957906	11,286,464	85%	4%
ERR029917	ERX010647	ERP000527	SAMEA957907	12,209,179	84%	4%
ERR029922	ERX010652	ERP000527	SAMEA957908	8,315,792	84%	4%
ERR305630	ERX278913	ERP003480	SAMEA2149986	52,014,282	69%	13%
ERR305631	ERX278912	ERP003480	SAMEA2151923	52,294,420	89%	17%
ERR305632	ERX278911	ERP003480	SAMEA2160126	52,873,932	90%	17%
SRR184099	SRX057233	SRP005965	SAMN00259611	12,765,576	81%	11%
SRR184100	SRX057233	SRP005965	SAMN00259611	17,419,610	87%	12%
SRR184101	SRX057234	SRP005965	SAMN00259612	13,304,166	82%	11%
SRR184102	SRX057234	SRP005965	SAMN00259612	18,644,930	87%	13%
SRR184103	SRX057235	SRP005965	SAMN00259613	13,675,120	81%	11%
SRR184104	SRX057235	SRP005965	SAMN00259613	19,613,000	85%	12%
SRR863371	SRX283665	SRP022916	SAMN02147137	15,507,046	81%	7%
SRR863697	SRX283930	SRP022916	SAMN02147138	22,891,756	82%	7%
SRR863698	SRX283932	SRP022916	SAMN02147139	15,550,022	67%	6%
SRR864315	SRX284236	SRP022916	SAMN02147140	15,498,896	83%	8%
SRR864485	SRX284303	SRP022916	SAMN02147141	26,963,106	83%	8%
SRR864713	SRX284481	SRP022916	SAMN02147142	22,649,294	73%	7%
SRR865071	SRX284780	SRP022916	SAMN02147143	16,023,232	83%	8%
SRR865299	SRX284942	SRP022916	SAMN02147144	18,696,330	83%	8%
SRR865383	SRX284997	SRP022916	SAMN02147145	33,649,574	77%	8%
SRR865536	SRX285033	SRP022916	SAMN02147146	18,462,038	80%	7%
SRR865575	SRX285164	SRP022916	SAMN02147147	16,086,216	81%	7%
SRR865691	SRX285229	SRP022916	SAMN02147148	16,340,330	80%	7%
SRR865788	SRX285279	SRP022916	SAMN02147149	15,995,594	82%	8%
SRR865843	SRX285315	SRP022916	SAMN02147150	27,206,626	82%	8%
SRR865902	SRX285356	SRP022916	SAMN02147151	43,291,318	82%	8%
SRR865992	SRX285406	SRP022916	SAMN02147152	16,727,598	82%	7%
SRR866043	SRX285434	SRP022916	SAMN02147153	18,254,178	81%	7%
SRR866220	SRX285597	SRP022916	SAMN02147154	26,142,516	82%	7%
SRR866226	SRX285600	SRP022916	SAMN02147155	28,580,340	81%	7%
SRR866232	SRX285601	SRP022916	SAMN02147156	22,907,202	81%	6%
SRR866237	SRX285602	SRP022916	SAMN02147157	53,796,744	82%	7%
SRR866242	SRX285603	SRP022916	SAMN02147158	46,506,772	83%	8%
SRR866243	SRX285604	SRP022916	SAMN02147159	30,860,620	83%	8%
SRR866244	SRX285605	SRP022916	SAMN02147160	31,480,390	81%	8%
SRR866245	SRX285607	SRP022916	SAMN02147161	40,654,080	81%	7%
SRR866250	SRX285608	SRP022916	SAMN02147162	40,536,236	82%	8%
SRR866252	SRX285613	SRP022916	SAMN02147163	15,374,986	81%	7%
SRR866253	SRX285614	SRP022916	SAMN02147164	25,610,202	80%	7%
SRR866254	SRX285616	SRP022916	SAMN02147165	24,056,646	79%	7%
SRR866256	SRX285617	SRP022916	SAMN02147166	26,792,310	81%	6%
SRR866257	SRX285618	SRP022916	SAMN02147167	31,824,334	79%	8%
SRR866258	SRX285619	SRP022916	SAMN02147168	31,292,110	83%	8%
SRR866259	SRX285620	SRP022916	SAMN02147169	40,517,258	83%	8%
SRR866266	SRX285621	SRP022916	SAMN02147170	35,757,768	81%	7%
SRR866268	SRX285629	SRP022916	SAMN02147171	26,127,282	83%	8%
SRR866275	SRX285633	SRP022916	SAMN02147172	27,436,022	81%	8%
SRR1170971	SRX510886	SRP036626	SAMN02717032	53,426,530	70%	15%
SRR1200975	SRX510899	SRP036626	SAMN02717033	51,032,576	77%	17%
SRR1200976	SRX510900	SRP036626	SAMN02717034	51,184,258	65%	14%
SRR1200977	SRX510902	SRP036626	SAMN02717035	53,618,074	55%	12%
SRR1207287	SRX502839	SRP040682	SAMN02711349	47,988,790	87%	19%
SRR1207290	SRX502842	SRP040682	SAMN02711350	48,679,626	87%	18%
SRR1207284	SRX502836	SRP040682	SAMN02711351	47,845,834	80%	17%
SRR1207286	SRX502838	SRP040682	SAMN02711352	49,575,886	87%	18%
SRR1207285	SRX502837	SRP040682	SAMN02711353	49,953,802	84%	17%
SRR1207283	SRX502835	SRP040682	SAMN02711354	49,578,648	82%	17%
SRR1207289	SRX502841	SRP040682	SAMN02711355	49,224,502	85%	18%
SRR1207288	SRX502840	SRP040682	SAMN02711356	49,158,746	86%	18%
SRR1232054	SRX514973	SRP041108	SAMN02725400	92,427,100	92%	24%
SRR1584263	SRX708855	SRP047434	SAMN03076265	22,321,582	82%	3%
SRR1584264	SRX708856	SRP047434	SAMN03076266	13,564,362	83%	3%
SRR1584265	SRX708857	SRP047434	SAMN03076267	33,494,287	69%	3%
SRR1584266	SRX708858	SRP047434	SAMN03076268	33,964,952	72%	3%
SRR1584267	SRX708859	SRP047434	SAMN03076269	16,891,071	83%	4%
SRR1584268	SRX708860	SRP047434	SAMN03076270	13,640,088	82%	4%
SRR1584269	SRX708861	SRP047434	SAMN03076271	36,556,253	74%	3%
SRR1584270	SRX708862	SRP047434	SAMN03076272	34,126,663	79%	3%
SRR1586392	SRX710692	SRP047517	SAMN03081353	48,045,994	85%	8%
SRR1586393	SRX710693	SRP047517	SAMN03081354	57,998,031	85%	9%
SRR1586394	SRX710694	SRP047517	SAMN03081355	38,326,017	84%	8%
SRR1586395	SRX710695	SRP047517	SAMN03081356	35,655,491	83%	7%
SRR1586396	SRX710696	SRP047517	SAMN03081357	59,906,580	82%	8%
SRR1586397	SRX710697	SRP047517	SAMN03081358	43,290,578	84%	8%
SRR1586398	SRX710698	SRP047517	SAMN03081359	50,655,558	85%	9%
SRR1586399	SRX710699	SRP047517	SAMN03081360	51,767,740	85%	9%
SRR1586400	SRX710700	SRP047517	SAMN03081361	44,839,850	85%	8%
SRR1586401	SRX710701	SRP047517	SAMN03081362	57,657,436	85%	9%
SRR1586402	SRX710702	SRP047517	SAMN03081363	56,348,592	83%	9%
SRR1867889	SRX912186	SRP056128	SAMN03394894	74,826,108	86%	19%
SRR1867891	SRX912191	SRP056128	SAMN03394895	68,670,240	87%	20%
SRR1867903	SRX912192	SRP056128	SAMN03394896	73,917,562	88%	21%
SRR2126855	SRX1117848	SRP061570	SAMN03897511	25,563,138	85%	22%
SRR2126856	SRX1127488	SRP061570	SAMN03897511	20,006,494	85%	22%
SRR2126915	SRX1127489	SRP061570	SAMN03897511	46,815,634	84%	21%
SRR2126975	SRX1127490	SRP061570	SAMN03897511	42,877,068	84%	21%
SRR2127278	SRX1118126	SRP061570	SAMN03897512	70,702,588	85%	22%
SRR2127279	SRX1127491	SRP061570	SAMN03897512	72,781,660	85%	22%
SRR2127280	SRX1127492	SRP061570	SAMN03897512	33,437,656	84%	21%
SRR2127289	SRX1127493	SRP061570	SAMN03897512	57,509,418	85%	21%
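
Where a sample has several runs, the per-sample read count in the by-sample table matches the sum of its runs. A minimal sketch of that aggregation follows, using a few rows copied from the by-run table; the tuple layout and function name are illustrative.

```python
from collections import defaultdict

# (run, sample, number of reads), copied from the by-run table.
runs = [
    ("SRR184099", "SAMN00259611", 12_765_576),
    ("SRR184100", "SAMN00259611", 17_419_610),
    ("SRR2126855", "SAMN03897511", 25_563_138),
    ("SRR2126856", "SAMN03897511", 20_006_494),
    ("SRR2126915", "SAMN03897511", 46_815_634),
    ("SRR2126975", "SAMN03897511", 42_877_068),
]


def reads_per_sample(run_rows):
    """Sum the read counts of all runs that belong to the same sample."""
    totals = defaultdict(int)
    for _run, sample, n_reads in run_rows:
        totals[sample] += n_reads
    return dict(totals)


totals = reads_per_sample(runs)
print(totals)
```

The totals for SAMN00259611 and SAMN03897511 can then be compared with the corresponding rows of the by-sample table.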

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Arabidopsis thaliana known RefSeq (NP_)	35,173	30,303 (86.15%)	30,303 (86.15%)	66.80%	70.17%
Solanaceae GenBank	10,170	9,870 (97.05%)	9,870 (97.05%)	74.20%	85.34%
Solanaceae known RefSeq (NP_)	61	61 (100.00%)	61 (100.00%)	71.51%	84.76%
Solanum lycopersicum known RefSeq (NP_)	1,576	1,567 (99.43%)	1,567 (99.43%)	75.61%	84.92%
Same-species GenBank	1,978	1,919 (97.02%)	1,919 (97.02%)	77.38%	88.07%
Same-species known RefSeq (NP_)	750	746 (99.47%)	746 (99.47%)	79.65%	87.66%

Comparison of the current and previous annotations

The annotation produced for this release (101) was compared to the annotation in the previous release (100) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
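
The report does not give the scoring formula. Purely as an illustration of the two ingredients it names (overlap in exon sequence and matches in exon boundaries), the sketch below counts shared exon bases and shared exon boundaries for two transcripts. The function names, the interval convention (1-based, inclusive, exons within one transcript not overlapping each other) and the coordinates are assumptions and are not taken from the pipeline.

```python
def exon_overlap_bp(exons_a, exons_b):
    """Number of bases shared between two lists of (start, end) exon intervals."""
    total = 0
    for a_start, a_end in exons_a:
        for b_start, b_end in exons_b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo <= hi:
                total += hi - lo + 1
    return total


def shared_boundaries(exons_a, exons_b):
    """Number of exon start or end coordinates present in both transcripts."""
    def bounds(exons):
        starts = {("start", s) for s, _ in exons}
        ends = {("end", e) for _, e in exons}
        return starts | ends

    return len(bounds(exons_a) & bounds(exons_b))


# Hypothetical current and previous versions of one transcript.
current = [(100, 200), (300, 400), (500, 650)]
previous = [(100, 200), (300, 420), (500, 600)]

print(exon_overlap_bp(current, previous))
print(shared_boundaries(current, previous))
```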

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	SolTub_3.0 (Current) to SolTub_3.0 (Previous)
Identical	6%
Minor changes	60%
Major changes	16%
New	18%
Deprecated	7%
Other	<1%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
