NCBI Setaria italica Annotation Release 103

The RefSeq genome records for Setaria italica were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Setaria italica Annotation Release 103.

Annotation release ID: 103
Date of Entrez queries for transcripts and proteins: Oct 11 2017
Date of submission of annotation to the public databases: Oct 16 2017
Software version: 7.4

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
Setaria_italica_v2.0	GCF_000263155.2	JGI-PGF	10-30-2015	Reference	10 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	Setaria_italica_v2.0
Genes and pseudogenes	31,995
protein-coding	27,422
non-coding	2,572
pseudogenes	2,001
genes with variants	5,738
mRNAs	35,761
fully-supported	31,826
with > 5% ab initio	3,060
partial	238
with filled gap(s)	73
known RefSeq (NM_)	25
model RefSeq (XM_)	35,736
Other RNAs	5,645
fully-supported	5,094
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	5,105
CDSs	35,761
fully-supported	31,826
with > 5% ab initio	3,166
partial	223
with major correction(s)	577
known RefSeq (NP_)	25
model RefSeq (XP_)	35,736
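
The indented rows in the table above appear to be breakdowns of the totals printed above them. A minimal sketch of that consistency check follows, assuming (the report does not state this explicitly) that the gene biotypes partition the gene total and that known plus model RefSeq records partition the mRNA total. The dictionary keys and the `check_totals` helper are illustrative names, not part of any NCBI tool.

```python
# Counts copied from the feature-count table above.
counts = {
    "genes_and_pseudogenes": 31_995,
    "protein_coding": 27_422,
    "non_coding": 2_572,
    "pseudogenes": 2_001,
    "mrnas": 35_761,
    "known_nm": 25,
    "model_xm": 35_736,
}


def check_totals(c):
    """Return named consistency checks; True means the rows agree."""
    return {
        # Assumption: the three biotypes partition "Genes and pseudogenes".
        "gene_biotypes_sum_to_total": (
            c["protein_coding"] + c["non_coding"] + c["pseudogenes"]
            == c["genes_and_pseudogenes"]
        ),
        # Assumption: every mRNA is either a known (NM_) or a model (XM_) record.
        "known_plus_model_mrnas": c["known_nm"] + c["model_xm"] == c["mrnas"],
    }


if __name__ == "__main__":
    for name, ok in check_totals(counts).items():
        print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

Both checks pass for the counts in this release.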

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	29,994	3,567	2,846	71	111,270
All transcripts	41,406	1,973	1,716	64	16,551
mRNA	35,761	1,988	1,731	195	16,551
misc_RNA	1,869	2,628	2,293	64	12,113
tRNA	540	74	73	71	88
lncRNA	3,236	1,751	1,387	79	10,234
Single-exon transcripts	4,981	1,339	1,179	252	6,300
coding transcripts (NM_/XM_ )	4,975	1,339	1,179	252	6,300
non-coding transcripts (NR_/XR_ )	6	1,650	1,740	1,276	2,062
CDSs	35,761	1,418	1,194	180	16,146
Exons	170,757	353	177	1	8,807
in coding transcripts (NM_/XM_ )	159,715	344	173	1	7,589
in non-coding transcripts (NR_/XR_ )	16,295	402	198	3	8,807
Introns	135,730	444	148	30	110,148
in coding transcripts (NM_/XM_ )	128,543	433	146	30	110,148
in non-coding transcripts (NR_/XR_ )	12,287	562	182	31	28,789

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.39	1	1	33
Number of exons per transcript	6.01	4	1	79

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Arabidopsis thaliana known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 27,422 coding genes, 23,281 (84.9%) had a protein with an alignment covering 50% or more of the query, and 6,948 (25.3%) had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
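
The two coverage definitions can be illustrated with a short sketch. It assumes a single contiguous alignment span described by 1-based inclusive coordinates; how the pipeline combines multiple alignment segments is not described in this report. The function name `coverage_percent` and the example hit are hypothetical.

```python
def coverage_percent(aligned_start, aligned_end, sequence_length):
    """Percent of a sequence spanned by one alignment (1-based, inclusive)."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical BLASTP hit: a 400-residue annotated protein (query) aligned
# over residues 21-380 to residues 1-360 of a 450-residue target protein.
query_cov = coverage_percent(21, 380, 400)   # 360 aligned / 400 total
target_cov = coverage_percent(1, 360, 450)   # 360 aligned / 450 total

if __name__ == "__main__":
    print(f"query coverage:  {query_cov:.1f}%")
    print(f"target coverage: {target_cov:.1f}%")
```

In this made-up case the query coverage (90%) would count toward the "50% or more" tally but not toward the "95% or more" tally.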

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: Arabidopsis thaliana known RefSeq proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
Setaria_italica_v2.0	GCF_000263155.2	1.43%	28.42%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	25	25 (100.00%)	25 (100.00%)	99.40%	99.68%
Same-species Genbank	111	109 (98.20%)	83 (74.77%)	99.06%	99.78%
Same-species EST	66,027	47,422 (71.82%)	30,704 (46.50%)	99.42%	99.61%
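
The percentages in the table above appear to be taken relative to the number of sequences retrieved from Entrez, for both the "aligned" and the "passed to Gnomon" columns. The sketch below recomputes them under that assumption; `percent_of_retrieved` is an illustrative helper, not an NCBI function.

```python
def percent_of_retrieved(count, retrieved):
    """Percentage of retrieved sequences, rounded to two decimals."""
    return round(100.0 * count / retrieved, 2)


# (retrieved, aligned by Splign, passed to Gnomon), copied from the table above.
rows = {
    "Same-species Genbank": (111, 109, 83),
    "Same-species EST": (66_027, 47_422, 30_704),
}

if __name__ == "__main__":
    for source, (retrieved, aligned, passed) in rows.items():
        print(
            f"{source}: aligned {percent_of_retrieved(aligned, retrieved):.2f}%, "
            f"passed {percent_of_retrieved(passed, retrieved):.2f}%"
        )
```

The recomputed values agree with the percentages printed in the table.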

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	3,049,922,323	82%	22%	152,153
SAMEA104118845	NA	Yu1-CK; Drought Setaria RNAseq (Setaria italica, SAMEA104118845)	119,411,802	89%	30%	138,343
SAMEA104118846	NA	Yu1-DT; Drought Setaria RNAseq (Setaria italica, SAMEA104118846)	121,644,668	87%	29%	137,923
SAMEA104118847	NA	An04-CK; Drought Setaria RNAseq (Setaria italica, SAMEA104118847)	127,731,974	88%	29%	138,087
SAMEA104118848	NA	An04-DT; Drought Setaria RNAseq (Setaria italica, SAMEA104118848)	119,958,346	89%	28%	138,425
SAMEA3906792	NA	siago1b1; siago1b RNA-seq (Setaria italica, SAMEA3906792)	9,916,542	74%	21%	102,569
SAMEA3906793	NA	siago1b2; siago1b RNA-seq (Setaria italica, SAMEA3906793)	26,643,822	74%	21%	115,981
SAMEA3906794	NA	siago1b3; siago1b RNA-seq (Setaria italica, SAMEA3906794)	6,880,196	74%	21%	96,096
SAMEA3906795	NA	Yugu1-1; Yugu1 RNA-seq (Setaria italica, SAMEA3906795)	9,768,376	69%	19%	97,653
SAMEA3906796	NA	Yugu1-2; Yugu1 RNA-seq (Setaria italica, SAMEA3906796)	9,859,528	69%	19%	97,629
SAMEA3906797	NA	Yugu1-3; Yugu1 RNA-seq (Setaria italica, SAMEA3906797)	34,678,398	69%	19%	115,722
SAMN00728655	19126705,22580951,9023104	Generic sample from Setaria italica (Setaria italica, SAMN00728655)	430,173	89%	51%	64,614
SAMN00728656	19126705,22580951,9023104	pool of seedlings from various conditions, untreated roots (Setaria italica, SAMN00728656)	815,263	90%	48%	77,288
SAMN00749847	NA	Generic sample from Setaria italica (Setaria italica, SAMN00749847)	130,418,650	56%	19%	119,989
SAMN00773765	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773765)	33,561,580	53%	19%	104,281
SAMN00773766	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773766)	38,637,670	45%	20%	83,684
SAMN00773767	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773767)	98,051,204	76%	18%	119,878
SAMN00773768	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773768)	80,185,542	84%	19%	113,959
SAMN00773769	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773769)	43,694,770	76%	21%	95,515
SAMN00773770	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773770)	97,840,646	83%	20%	108,216
SAMN00773771	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773771)	74,374,726	82%	19%	118,372
SAMN00773772	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773772)	47,391,350	74%	20%	97,652
SAMN00773773	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773773)	88,900,964	87%	18%	114,877
SAMN00773774	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773774)	50,003,834	71%	20%	97,540
SAMN00773775	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773775)	38,836,588	46%	19%	97,226
SAMN00773776	NA	Generic sample from Setaria italica (Setaria italica, SAMN00773776)	91,732,086	84%	19%	103,601
SAMN00810008	22580950	root (Setaria italica, SAMN00810008)	38,487,562	74%	17%	111,086
SAMN00810009	22580950	leaf (Setaria italica, SAMN00810009)	38,399,466	88%	17%	89,218
SAMN00810010	22580950	stem (Setaria italica, SAMN00810010)	38,519,470	88%	19%	115,215
SAMN00810011	22580950	tassel (Setaria italica, SAMN00810011)	39,251,372	88%	16%	117,582
SAMN01813499	NA	mCK-1 (Setaria italica, SAMN01813499)	24,344,444	44%	21%	100,557
SAMN01820207	NA	first biological replicate, control (Setaria italica, SAMN01820207)	24,344,444	86%	20%	109,778
SAMN01820208	NA	second biological replicate, control (Setaria italica, SAMN01820208)	46,846,370	83%	20%	118,267
SAMN01820209	NA	first biological replicate, drought (Setaria italica, SAMN01820209)	22,976,688	85%	20%	113,688
SAMN01820210	NA	second biological replicate, drought (Setaria italica, SAMN01820210)	20,186,756	86%	19%	110,796
SAMN05004927	NA	Phytomer: leaf, stem node and internode (Setaria italica, 35 days, SAMN05004927)	350,856,704	83%	24%	141,409
SAMN05830712	28733421	whole plant (Setaria italica, SAMN05830712)	20,774,417	93%	17%	109,995
SAMN05830713	28733421	whole plant (Setaria italica, SAMN05830713)	10,741,682	82%	9%	83,467
SAMN05830714	28733421	whole plant (Setaria italica, SAMN05830714)	9,415,855	87%	9%	85,633
SAMN05830715	28733421	whole plant (Setaria italica, SAMN05830715)	9,866,235	86%	8%	76,468
SAMN05830716	28733421	whole plant (Setaria italica, SAMN05830716)	10,618,021	89%	9%	87,859
SAMN05830717	28733421	whole plant (Setaria italica, SAMN05830717)	8,910,755	87%	9%	81,154
SAMN05830718	28733421	whole plant (Setaria italica, SAMN05830718)	8,388,764	90%	9%	85,041
SAMN05830719	28733421	whole plant (Setaria italica, SAMN05830719)	9,448,316	90%	10%	82,615
SAMN05830720	28733421	whole plant (Setaria italica, SAMN05830720)	8,485,719	90%	9%	84,585
SAMN05830721	28733421	whole plant (Setaria italica, SAMN05830721)	8,942,304	87%	9%	85,134
SAMN05830722	28733421	whole plant (Setaria italica, SAMN05830722)	10,092,581	91%	10%	88,499
SAMN05830723	28733421	whole plant (Setaria italica, SAMN05830723)	9,047,535	85%	10%	82,436
SAMN05830748	28733421	whole plant (Setaria italica, SAMN05830748)	9,632,373	91%	9%	86,897
SAMN05830749	28733421	whole plant (Setaria italica, SAMN05830749)	25,244,823	90%	16%	106,472
SAMN05830750	28733421	whole plant (Setaria italica, SAMN05830750)	23,775,948	91%	17%	112,222
SAMN05830751	28733421	whole plant (Setaria italica, SAMN05830751)	20,408,629	91%	15%	104,538
SAMN05830752	28733421	whole plant (Setaria italica, SAMN05830752)	14,600,091	93%	17%	103,827
SAMN05830753	28733421	whole plant (Setaria italica, SAMN05830753)	20,379,509	92%	16%	107,435
SAMN05830754	28733421	whole plant (Setaria italica, SAMN05830754)	20,849,319	93%	17%	109,061
SAMN05830755	28733421	whole plant (Setaria italica, SAMN05830755)	21,573,967	93%	17%	103,776
SAMN05830756	28733421	whole plant (Setaria italica, SAMN05830756)	23,929,256	93%	17%	111,390
SAMN05830757	28733421	whole plant (Setaria italica, SAMN05830757)	19,080,619	93%	17%	108,380
SAMN05830758	28733421	whole plant (Setaria italica, SAMN05830758)	21,980,942	93%	17%	111,638
SAMN05830759	28733421	whole plant (Setaria italica, SAMN05830759)	20,103,179	93%	18%	110,535
SAMN05830784	28733421	whole plant (Setaria italica, SAMN05830784)	19,115,486	91%	9%	96,434
SAMN05830785	28733421	whole plant (Setaria italica, SAMN05830785)	20,578,261	89%	9%	91,962
SAMN05830786	28733421	whole plant (Setaria italica, SAMN05830786)	26,136,070	89%	9%	100,126
SAMN05830787	28733421	whole plant (Setaria italica, SAMN05830787)	17,106,588	91%	9%	88,684
SAMN05830788	28733421	whole plant (Setaria italica, SAMN05830788)	24,377,971	91%	9%	99,538
SAMN05830789	28733421	whole plant (Setaria italica, SAMN05830789)	19,282,741	91%	9%	93,918
SAMN05830790	28733421	whole plant (Setaria italica, SAMN05830790)	19,502,992	91%	9%	94,159
SAMN05830791	28733421	whole plant (Setaria italica, SAMN05830791)	26,236,981	92%	10%	98,993
SAMN05830792	28733421	whole plant (Setaria italica, SAMN05830792)	20,076,564	89%	9%	97,497
SAMN05830793	28733421	whole plant (Setaria italica, SAMN05830793)	21,056,314	55%	9%	83,300
SAMN05830794	28733421	whole plant (Setaria italica, SAMN05830794)	20,039,874	89%	10%	99,909
SAMN05830795	28733421	whole plant (Setaria italica, SAMN05830795)	19,877,538	91%	10%	98,085
SAMN06313210	NA	Gene expression analysis of Setaria italica Yugu1 - Tiller - 060-D08 (Setaria italica, SAMN06313210)	19,275,468	81%	43%	116,620
SAMN06313354	NA	Gene expression analysis of Setaria italica B100 - Germ shoot 6d - 034-I01 (Setaria italica, SAMN06313354)	58,583,898	83%	40%	130,879
SAMN06313739	NA	Gene expression analysis of Setaria italica B100 - Total aerial - 034-G03 (Setaria italica, SAMN06313739)	38,349,584	84%	42%	122,624
SAMN06313743	NA	Gene expression analysis of Setaria italica B100 - Leaf 6 2wk - 034-H03 (Setaria italica, SAMN06313743)	41,150,014	88%	45%	123,665
SAMN06313941	NA	Gene expression analysis of Setaria italica B100 - Total aerial - 035-A09 (Setaria italica, SAMN06313941)	45,997,500	87%	43%	128,016
SAMN06313989	NA	Gene expression analysis of Setaria italica Yugu1 - Tiller - 060-F03 (Setaria italica, SAMN06313989)	36,683,324	79%	42%	126,472
SAMN06314609	NA	Gene expression analysis of Setaria italica Yugu1 - Roots - 072-H01 (Setaria italica, SAMN06314609)	44,622,342	86%	40%	126,557

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR1337799	ERX1409391	ERP014695	SAMEA3906792	9,916,542	74%	21%
ERR1337800	ERX1409392	ERP014695	SAMEA3906793	26,643,822	74%	21%
ERR1337801	ERX1409393	ERP014695	SAMEA3906794	6,880,196	74%	21%
ERR1337802	ERX1409394	ERP014695	SAMEA3906795	9,768,376	69%	19%
ERR1337803	ERX1409395	ERP014695	SAMEA3906796	9,859,528	69%	19%
ERR1337804	ERX1409396	ERP014695	SAMEA3906797	34,678,398	69%	19%
ERR2003555	ERX2063424	ERP023463	SAMEA104118845	48,624,780	89%	30%
ERR2003556	ERX2063425	ERP023463	SAMEA104118845	43,971,556	89%	31%
ERR2003557	ERX2063426	ERP023463	SAMEA104118845	26,815,466	89%	27%
ERR2003558	ERX2063427	ERP023463	SAMEA104118846	54,881,616	88%	30%
ERR2003559	ERX2063428	ERP023463	SAMEA104118846	45,558,532	85%	29%
ERR2003560	ERX2063429	ERP023463	SAMEA104118846	21,204,520	87%	27%
ERR2003561	ERX2063430	ERP023463	SAMEA104118847	52,586,800	89%	28%
ERR2003562	ERX2063431	ERP023463	SAMEA104118847	50,777,804	86%	30%
ERR2003563	ERX2063432	ERP023463	SAMEA104118847	24,367,370	89%	27%
ERR2003564	ERX2063433	ERP023463	SAMEA104118848	43,041,398	88%	28%
ERR2003565	ERX2063434	ERP023463	SAMEA104118848	55,425,402	89%	30%
ERR2003566	ERX2063435	ERP023463	SAMEA104118848	21,491,546	95%	25%
SRR350548	SRX099595	SRP008639	SAMN00728655	430,173	89%	51%
SRR350549	SRX099596	SRP008639	SAMN00728656	815,263	90%	48%
SRR360531	SRX104107	SRP009166	SAMN00749847	130,418,650	56%	19%
SRR400270	SRX116346	SRP009166	SAMN00773765	33,561,580	53%	19%
SRR400273	SRX116350	SRP009166	SAMN00773766	38,637,670	45%	20%
SRR400275	SRX116351	SRP009166	SAMN00773767	24,851,372	37%	20%
SRR400282	SRX116351	SRP009166	SAMN00773767	73,199,832	90%	18%
SRR400277	SRX116356	SRP009166	SAMN00773768	80,185,542	84%	19%
SRR400281	SRX116353	SRP009166	SAMN00773769	43,694,770	76%	21%
SRR400271	SRX116347	SRP009166	SAMN00773770	15,131,508	58%	20%
SRR400276	SRX116347	SRP009166	SAMN00773770	82,709,138	88%	20%
SRR400274	SRX116352	SRP009166	SAMN00773771	74,374,726	82%	19%
SRR400280	SRX116349	SRP009166	SAMN00773772	47,391,350	74%	20%
SRR400269	SRX116355	SRP009166	SAMN00773773	88,900,964	87%	18%
SRR400278	SRX116354	SRP009166	SAMN00773774	50,003,834	71%	20%
SRR400272	SRX116357	SRP009166	SAMN00773775	38,836,588	46%	19%
SRR400279	SRX116348	SRP009166	SAMN00773776	3,014,858	9%	17%
SRR400286	SRX116348	SRP009166	SAMN00773776	88,717,228	86%	19%
SRR442161	SRX128223	SRP011401	SAMN00810008	38,487,562	74%	17%
SRR442162	SRX128224	SRP011401	SAMN00810009	38,399,466	88%	17%
SRR442163	SRX128225	SRP011401	SAMN00810010	38,519,470	88%	19%
SRR442164	SRX128226	SRP011401	SAMN00810011	39,251,372	88%	16%
SRR616277	SRX204235	SRP017158	SAMN01813499	24,344,444	44%	21%
SRR630959	SRX209171	SRP017158	SAMN01820207	24,344,444	86%	20%
SRR629695	SRX209176	SRP017475	SAMN01820208	46,846,370	83%	20%
SRR629694	SRX209177	SRP017476	SAMN01820209	22,976,688	85%	20%
SRR630961	SRX209187	SRP017478	SAMN01820210	20,186,756	86%	19%
SRR3536617	SRX1770119	SRP075284	SAMN05004927	56,302,576	82%	23%
SRR3536624	SRX1770119	SRP075284	SAMN05004927	61,676,054	82%	23%
SRR3536640	SRX1770119	SRP075284	SAMN05004927	56,115,548	83%	23%
SRR3536722	SRX1770119	SRP075284	SAMN05004927	61,579,210	83%	26%
SRR3536745	SRX1770119	SRP075284	SAMN05004927	51,876,232	84%	26%
SRR3536772	SRX1770119	SRP075284	SAMN05004927	63,307,084	84%	27%
SRR4301592	SRX2195933	SRP090583	SAMN05830712	20,774,417	93%	17%
SRR4301593	SRX2195934	SRP090583	SAMN05830713	10,741,682	82%	9%
SRR4301613	SRX2195953	SRP090583	SAMN05830714	9,415,855	87%	9%
SRR4301624	SRX2195964	SRP090583	SAMN05830715	9,866,235	86%	8%
SRR4301635	SRX2195975	SRP090583	SAMN05830716	10,618,021	89%	9%
SRR4301646	SRX2195986	SRP090583	SAMN05830717	8,910,755	87%	9%
SRR4301657	SRX2195997	SRP090583	SAMN05830718	8,388,764	90%	9%
SRR4301668	SRX2196008	SRP090583	SAMN05830719	9,448,316	90%	10%
SRR4301679	SRX2196019	SRP090583	SAMN05830720	8,485,719	90%	9%
SRR4301690	SRX2196030	SRP090583	SAMN05830721	8,942,304	87%	9%
SRR4301594	SRX2195935	SRP090583	SAMN05830722	10,092,581	91%	10%
SRR4301604	SRX2195944	SRP090583	SAMN05830723	9,047,535	85%	10%
SRR4301631	SRX2195971	SRP090583	SAMN05830748	9,632,373	91%	9%
SRR4301632	SRX2195972	SRP090583	SAMN05830749	25,244,823	90%	16%
SRR4301633	SRX2195973	SRP090583	SAMN05830750	23,775,948	91%	17%
SRR4301634	SRX2195974	SRP090583	SAMN05830751	20,408,629	91%	15%
SRR4301636	SRX2195976	SRP090583	SAMN05830752	14,600,091	93%	17%
SRR4301637	SRX2195977	SRP090583	SAMN05830753	20,379,509	92%	16%
SRR4301638	SRX2195978	SRP090583	SAMN05830754	20,849,319	93%	17%
SRR4301639	SRX2195979	SRP090583	SAMN05830755	21,573,967	93%	17%
SRR4301640	SRX2195980	SRP090583	SAMN05830756	23,929,256	93%	17%
SRR4301641	SRX2195981	SRP090583	SAMN05830757	19,080,619	93%	17%
SRR4301642	SRX2195982	SRP090583	SAMN05830758	21,980,942	93%	17%
SRR4301643	SRX2195983	SRP090583	SAMN05830759	20,103,179	93%	18%
SRR4301671	SRX2196011	SRP090583	SAMN05830784	19,115,486	91%	9%
SRR4301672	SRX2196012	SRP090583	SAMN05830785	20,578,261	89%	9%
SRR4301673	SRX2196013	SRP090583	SAMN05830786	26,136,070	89%	9%
SRR4301674	SRX2196014	SRP090583	SAMN05830787	17,106,588	91%	9%
SRR4301675	SRX2196015	SRP090583	SAMN05830788	24,377,971	91%	9%
SRR4301676	SRX2196016	SRP090583	SAMN05830789	19,282,741	91%	9%
SRR4301677	SRX2196017	SRP090583	SAMN05830790	19,502,992	91%	9%
SRR4301678	SRX2196018	SRP090583	SAMN05830791	26,236,981	92%	10%
SRR4301680	SRX2196020	SRP090583	SAMN05830792	20,076,564	89%	9%
SRR4301681	SRX2196021	SRP090583	SAMN05830793	21,056,314	55%	9%
SRR4301682	SRX2196022	SRP090583	SAMN05830794	20,039,874	89%	10%
SRR4301683	SRX2196023	SRP090583	SAMN05830795	19,877,538	91%	10%
SRR5499092	SRX2779411	SRP106156	SAMN06313743	41,150,014	88%	45%
SRR5499103	SRX2779422	SRP106167	SAMN06313739	38,349,584	84%	42%
SRR5499114	SRX2779433	SRP106177	SAMN06313941	45,997,500	87%	43%
SRR5499125	SRX2779444	SRP106190	SAMN06313354	58,583,898	83%	40%
SRR5574465	SRX2832826	SRP107271	SAMN06313210	19,275,468	81%	43%
SRR5574454	SRX2832837	SRP107282	SAMN06313989	36,683,324	79%	42%
SRR5683282	SRX2918436	SRP109195	SAMN06314609	44,622,342	86%	40%
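
Several samples in the by-sample table combine more than one run. The sketch below shows one way the per-run rows could be rolled up into per-sample rows, assuming (the report does not say so) that a sample's read count is the sum over its runs and that its percent aligned is the read-weighted mean. Because the per-run percentages are already rounded, the recomputed sample percentages are approximate. The `aggregate_by_sample` helper is illustrative.

```python
from collections import defaultdict

# (run, sample, number of reads, percent aligned), copied from the by-run table.
runs = [
    ("SRR400275", "SAMN00773767", 24_851_372, 37),
    ("SRR400282", "SAMN00773767", 73_199_832, 90),
    ("SRR400271", "SAMN00773770", 15_131_508, 58),
    ("SRR400276", "SAMN00773770", 82_709_138, 88),
]


def aggregate_by_sample(run_rows):
    """Return {sample: (total reads, read-weighted percent aligned)}."""
    reads = defaultdict(int)
    aligned = defaultdict(float)
    for _run, sample, n_reads, pct_aligned in run_rows:
        reads[sample] += n_reads
        # Estimated aligned reads for this run, from its rounded percentage.
        aligned[sample] += n_reads * pct_aligned / 100.0
    return {s: (reads[s], 100.0 * aligned[s] / reads[s]) for s in reads}


if __name__ == "__main__":
    for sample, (total, pct) in aggregate_by_sample(runs).items():
        print(f"{sample}: {total:,} reads, about {pct:.1f}% aligned")
```

For SAMN00773767 and SAMN00773770 the summed read counts match the by-sample table exactly, and the weighted percentages land within one point of the reported 76% and 83%.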

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Oryza sativa GenBank	20,414	19,178 (93.95%)	19,178 (93.95%)	71.37%	80.46%
Same-species GenBank	107	107 (100.00%)	107 (100.00%)	78.68%	89.68%
Same-species known RefSeq (NP_)	25	25 (100.00%)	25 (100.00%)	78.90%	91.50%
Sorghum bicolor GenBank	472	458 (97.03%)	458 (97.03%)	73.27%	83.81%
Zea mays GenBank	50,449	45,395 (89.98%)	45,395 (89.98%)	74.70%	83.06%
Zea mays known RefSeq (NP_)	20,992	20,023 (95.38%)	20,023 (95.38%)	73.62%	82.61%

Comparison of the current and previous annotations

The annotation produced for this release (103) was compared to the annotation in the previous release (102) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
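
To make the matching idea concrete, here is a simplified sketch of pairing current and previous features by exon overlap and keeping only reciprocal best matches. It is not the pipeline's actual scoring: the Jaccard-style score, the function names and the toy coordinates are all assumptions for illustration, and matches in exon boundaries and attribute changes are ignored.

```python
def exon_overlap_bp(exons_a, exons_b):
    """Shared bases between two exon lists of (start, end), 1-based inclusive.

    Assumes exons within one feature do not overlap each other.
    """
    total = 0
    for a_start, a_end in exons_a:
        for b_start, b_end in exons_b:
            total += max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    return total


def overlap_score(exons_a, exons_b):
    """Jaccard-style score: shared exon bases / bases in either feature."""
    len_a = sum(end - start + 1 for start, end in exons_a)
    len_b = sum(end - start + 1 for start, end in exons_b)
    shared = exon_overlap_bp(exons_a, exons_b)
    union = len_a + len_b - shared
    return shared / union if union else 0.0


def best_match(query_exons, candidates):
    """Name of the highest-scoring candidate, or None if nothing overlaps."""
    scored = [(overlap_score(query_exons, ex), name) for name, ex in candidates.items()]
    score, name = max(scored, default=(0.0, None))
    return name if score > 0 else None


def reciprocal_best_matches(current, previous):
    """Pairs (current, previous) that are each other's best match."""
    pairs = []
    for cur_name, cur_exons in current.items():
        prev_name = best_match(cur_exons, previous)
        if prev_name is not None and best_match(previous[prev_name], current) == cur_name:
            pairs.append((cur_name, prev_name))
    return pairs


# Toy features with hypothetical names and coordinates.
current = {"geneA": [(100, 200), (300, 400)], "geneB": [(1000, 1100)]}
previous = {"old1": [(100, 200), (300, 390)], "old2": [(5000, 5100)]}

if __name__ == "__main__":
    print(reciprocal_best_matches(current, previous))
```

In this toy data, geneA pairs with old1, while geneB has no counterpart in the previous set and old2 has none in the current set, loosely analogous to the New and Deprecated categories in the table below.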

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	Setaria_italica_v2.0 (Current) to Setaria_italica_v2.0 (Previous)
Identical	9%
Minor changes	72%
Major changes	9%
New	7%
Deprecated	4%
Other	2%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
