NCBI Cannabis sativa Annotation Release 100

The RefSeq genome records for Cannabis sativa were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Cannabis sativa Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Aug 20 2019
Date of submission of annotation to the public databases: Aug 28 2019
Software version: 8.2

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
cs10	GCF_900626175.1	HARVARD OEB	02-14-2019	Reference	11 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	cs10
Genes and pseudogenes	31,172
protein-coding	25,297
non-coding	4,512
transcribed pseudogenes	5
non-transcribed pseudogenes	1,358
genes with variants	5,419
immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	33,642
fully-supported	27,205
with > 5% ab initio	5,638
partial	119
with filled gap(s)	20
known RefSeq (NM_)	0
model RefSeq (XM_)	33,642
non-coding RNAs	6,516
fully-supported	4,314
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	6,022
pseudo transcripts	5
fully-supported	4
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	5
CDSs	33,677
fully-supported	27,205
with > 5% ab initio	5,743
partial	119
with major correction(s)	657
known RefSeq (NP_)	0
model RefSeq (XP_)	33,677
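
As a quick consistency check on the table above, the four gene categories can be summed and compared with the reported "Genes and pseudogenes" total. The sketch below hard-codes the counts from the cs10 column; the variable names are illustrative and are not part of any NCBI tool.

```python
# Counts copied from the cs10 column of the feature counts table.
gene_counts = {
    "protein-coding": 25_297,
    "non-coding": 4_512,
    "transcribed pseudogenes": 5,
    "non-transcribed pseudogenes": 1_358,
}
reported_total = 31_172  # "Genes and pseudogenes" row

computed_total = sum(gene_counts.values())
# Genes excluding pseudogenes (protein-coding plus non-coding).
genes_without_pseudogenes = gene_counts["protein-coding"] + gene_counts["non-coding"]

print(computed_total == reported_total)  # True
print(genes_without_pseudogenes)         # 29809
```

The second value matches the "Genes" count in the feature lengths table, which excludes pseudogenes.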

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	29,809	3,477	2,343	63	976,063
All transcripts	40,158	1,651	1,431	63	17,828
mRNA	33,642	1,791	1,566	114	17,396
misc_RNA	1,229	2,034	1,779	146	14,917
tRNA	491	74	73	71	91
lncRNA	3,085	1,079	605	73	17,828
snoRNA	1,595	106	107	63	243
snRNA	99	142	129	64	196
rRNA	17	903	155	115	3,394
Single-exon transcripts	4,528	1,206	993	114	5,625
coding transcripts (NM_/XM_)	4,528	1,206	993	114	5,625
CDSs	33,677	1,373	1,140	114	16,644
Exons	155,404	315	167	1	9,202
in coding transcripts (NM_/XM_)	145,964	317	168	1	7,968
in non-coding transcripts (NR_/XR_)	13,151	262	138	2	9,202
Introns	123,569	535	156	30	970,509
in coding transcripts (NM_/XM_)	117,315	518	153	30	970,509
in non-coding transcripts (NR_/XR_)	9,792	694	202	32	81,832

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.35	1	1	50
Number of exons per transcript	5.71	4	1	79
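
The transcript counts by type in the feature lengths table add up to the "All transcripts" row, and dividing that total by the gene count gives a value consistent with the reported mean number of transcripts per gene. The following is a minimal sketch with the counts hard-coded; it assumes the mean is simply total transcripts divided by genes, which the report does not state explicitly.

```python
# Transcript counts by type, copied from the feature lengths table.
transcripts_by_type = {
    "mRNA": 33_642,
    "misc_RNA": 1_229,
    "tRNA": 491,
    "lncRNA": 3_085,
    "snoRNA": 1_595,
    "snRNA": 99,
    "rRNA": 17,
}
gene_count = 29_809       # "Genes" row (pseudogenes excluded)
all_transcripts = 40_158  # "All transcripts" row

total = sum(transcripts_by_type.values())
non_coding = total - transcripts_by_type["mRNA"]
# Assumption: mean = total transcripts / total genes.
mean_transcripts_per_gene = round(all_transcripts / gene_count, 2)

print(total == all_transcripts)   # True
print(non_coding)                 # 6516
print(mean_transcripts_per_gene)  # 1.35
```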

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Arabidopsis thaliana known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 25,262 coding genes, 21,379 genes had a protein with an alignment covering 50% or more of the query and 9,135 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
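
The coverage definitions above can be written as a small helper. This is a sketch only: the function name and the example alignment lengths are hypothetical, while the gene counts are the ones quoted in this section.

```python
def coverage(aligned_length: int, sequence_length: int) -> float:
    """Percentage of a sequence that is included in an alignment."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return 100.0 * aligned_length / sequence_length

# Hypothetical alignment: 300 of 400 query residues and
# 310 of 500 target residues fall inside the alignment.
query_coverage = coverage(300, 400)   # 75.0
target_coverage = coverage(310, 500)  # 62.0

# Share of coding genes reaching each query-coverage threshold,
# using the counts reported in this section.
coding_genes = 25_262
share_cov50 = round(100 * 21_379 / coding_genes, 1)  # 84.6
share_cov95 = round(100 * 9_135 / coding_genes, 1)   # 36.2

print(query_coverage, target_coverage, share_cov50, share_cov95)
```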

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: Arabidopsis thaliana known RefSeq proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
cs10	GCF_900626175.1	3.06%	46.84%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	118	113 (95.76%)	110 (93.22%)	98.84%	98.01%
Same-species EST	12,903	11,861 (91.92%)	11,008 (85.31%)	98.97%	99.38%
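
The percentages in parentheses appear to be computed relative to the number of sequences retrieved from Entrez. The sketch below reproduces them from the raw counts in the table; the dictionary layout and names are illustrative only.

```python
# (retrieved from Entrez, aligned by Splign, passed to Gnomon)
transcript_sources = {
    "Same-species GenBank": (118, 113, 110),
    "Same-species EST": (12_903, 11_861, 11_008),
}

percentages = {}
for source, (retrieved, aligned, passed) in transcript_sources.items():
    # Both percentages use the retrieved count as the denominator.
    percentages[source] = (f"{aligned / retrieved:.2%}", f"{passed / retrieved:.2%}")

for source, (pct_aligned, pct_passed) in percentages.items():
    print(f"{source}: aligned {pct_aligned}, passed to Gnomon {pct_passed}")
```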

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	3,034,531,459	73%	24%	155,604
SAMEA104170451	NA	xylem var. C (Cannabis sativa, SAMEA104170451)	25,515,612	83%	21%	94,285
SAMEA104170452	NA	stem var. A (Cannabis sativa, SAMEA104170452)	35,635,658	87%	22%	104,678
SAMEA104170453	NA	core var. A (Cannabis sativa, SAMEA104170453)	28,687,112	87%	22%	99,749
SAMEA104170454	NA	stem peel var. A (Cannabis sativa, SAMEA104170454)	23,076,862	84%	21%	99,675
SAMN00262803	NA	Cannabis sativa Flower Buds (Cannabis sativa, SAMN00262803)	69,231,888	40%	15%	103,835
SAMN00262804	NA	Cannabis sativa Mature Flower (Cannabis sativa, SAMN00262804)	68,477,878	56%	16%	104,026
SAMN00262805	NA	Cannabis sativa Mature and Immature Leaf (Cannabis sativa, SAMN00262805)	70,387,130	70%	17%	104,748
SAMN00262806	NA	Cannabis sativa Entire Root (Cannabis sativa, SAMN00262806)	73,615,604	83%	15%	109,848
SAMN00262807	NA	Cannabis sativa Primary Stem (Cannabis sativa, SAMN00262807)	61,931,734	43%	17%	101,175
SAMN00630395	NA	CSATR3JP(R1) (Cannabis sativa, SAMN00630395)	750,867	75%	60%	68,003
SAMN00738619	22014239,30409771	mid-stage flowers (Cannabis sativa, SAMN00738619)	37,835,287	83%	24%	107,483
SAMN00738620	22014239,30409771	early-stage flowers (Cannabis sativa, SAMN00738620)	37,472,665	86%	25%	101,396
SAMN00738621	22014239,30409771	pre-flowers (Cannabis sativa, SAMN00738621)	54,026,640	85%	24%	109,463
SAMN00738622	22014239,30409771	shoots (Cannabis sativa, SAMN00738622)	55,653,984	86%	24%	107,819
SAMN00738626	22014239,30409771	roots (Cannabis sativa, SAMN00738626)	37,374,640	75%	23%	96,388
SAMN00738627	22014239,30409771	roots, shoots, stems, pre-flowers, early-stage flowers and mid-stage flowers (Cannabis sativa, SAMN00738627)	694,022,020	59%	27%	133,815
SAMN04296299	NA	leaves (Cannabis sativa, have 3-4 paired leaves, SAMN04296299)	268,458,054	83%	24%	126,631
SAMN05509013	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509013)	25,400,752	83%	18%	103,551
SAMN05509014	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509014)	22,093,230	81%	17%	99,748
SAMN05509015	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509015)	29,417,892	83%	18%	104,233
SAMN05509016	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509016)	30,383,510	83%	18%	105,842
SAMN05509017	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509017)	35,403,050	80%	17%	103,945
SAMN05509018	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509018)	22,510,608	81%	17%	100,241
SAMN05509019	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509019)	24,863,526	83%	18%	103,506
SAMN05509020	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509020)	26,342,036	82%	18%	103,447
SAMN05509021	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509021)	26,702,804	82%	18%	104,242
SAMN05509022	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509022)	23,307,090	81%	16%	98,600
SAMN05509023	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509023)	21,996,390	79%	17%	98,747
SAMN05509024	27917184,28694530	hypocotyl (Cannabis sativa, SAMN05509024)	21,083,720	74%	16%	96,255
SAMN06277070	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277070)	17,909,712	78%	18%	93,551
SAMN06277071	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277071)	18,227,938	78%	17%	94,293
SAMN06277072	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277072)	17,972,944	79%	18%	93,227
SAMN06277073	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277073)	15,051,654	80%	18%	91,735
SAMN06277074	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277074)	20,913,650	85%	17%	95,244
SAMN06277075	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277075)	18,246,234	84%	17%	94,649
SAMN06277076	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277076)	19,028,002	85%	18%	96,473
SAMN06277077	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277077)	20,668,662	85%	18%	96,369
SAMN06277078	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277078)	22,105,716	84%	18%	94,659
SAMN06277079	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277079)	18,459,742	84%	19%	95,650
SAMN06277080	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277080)	30,761,544	81%	18%	99,878
SAMN06277081	27917184,28694530	Bast fibres (Cannabis sativa, SAMN06277081)	35,026,036	84%	18%	103,066
SAMN09747683	NA	Trichome (bulbous) (Cannabis sativa, SAMN09747683)	45,704,938	81%	28%	97,664
SAMN09747684	NA	Trichome (bulbous) (Cannabis sativa, SAMN09747684)	14,799,074	78%	22%	77,589
SAMN09747685	NA	Trichome (bulbous) (Cannabis sativa, SAMN09747685)	21,691,796	71%	27%	87,375
SAMN09747686	NA	Trichome (sessile) (Cannabis sativa, SAMN09747686)	21,420,334	81%	28%	82,246
SAMN09747687	NA	Trichome (sessile) (Cannabis sativa, SAMN09747687)	23,448,164	62%	26%	84,331
SAMN09747688	NA	Trichome (sessile) (Cannabis sativa, SAMN09747688)	34,249,954	69%	23%	70,435
SAMN09747689	NA	Trichome (stalked) (Cannabis sativa, SAMN09747689)	6,787,926	79%	29%	63,744
SAMN09747690	NA	Trichome (stalked) (Cannabis sativa, SAMN09747690)	15,231,236	73%	11%	56,527
SAMN09747691	NA	Trichome (stalked) (Cannabis sativa, SAMN09747691)	13,026,100	64%	25%	73,868
SAMN10330896	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330896)	23,666,554	85%	33%	92,782
SAMN10330897	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330897)	37,078,848	85%	34%	101,843
SAMN10330898	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330898)	43,562,392	84%	34%	109,134
SAMN10330899	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330899)	20,860,490	68%	32%	85,168
SAMN10330900	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330900)	21,758,678	73%	32%	87,459
SAMN10330901	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330901)	26,200,206	73%	31%	89,866
SAMN10330902	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330902)	15,103,474	83%	30%	85,948
SAMN10330903	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330903)	24,075,384	80%	29%	94,652
SAMN10330904	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330904)	20,388,864	81%	31%	93,882
SAMN10330905	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330905)	33,722,898	72%	33%	99,778
SAMN10330906	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330906)	16,848,856	72%	31%	89,720
SAMN10330907	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330907)	14,787,904	61%	22%	71,980
SAMN10330908	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330908)	17,011,612	73%	34%	86,000
SAMN10330909	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330909)	32,288,440	77%	34%	92,091
SAMN10330910	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330910)	22,645,316	69%	34%	87,158
SAMN10330911	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330911)	16,850,686	83%	32%	92,898
SAMN10330912	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330912)	18,396,400	82%	31%	93,300
SAMN10330913	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330913)	17,933,534	83%	31%	93,808
SAMN10330914	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330914)	19,605,246	78%	31%	87,320
SAMN10330915	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330915)	26,363,702	78%	30%	91,593
SAMN10330916	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330916)	28,131,330	77%	32%	94,600
SAMN10330917	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330917)	19,960,984	85%	33%	90,995
SAMN10330918	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330918)	30,268,868	85%	34%	96,516
SAMN10330919	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330919)	31,663,678	85%	32%	97,297
SAMN10330920	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330920)	17,228,268	77%	27%	85,167
SAMN10330921	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330921)	14,526,082	66%	18%	62,725
SAMN10330922	31138625	Glandular Trichome, Capitate stalked glandular trichome, (Cannabis sativa, SAMN10330922)	21,213,166	70%	23%	85,789

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR2040407	ERX2099464	ERP023948	SAMEA104170451	25,515,612	83%	21%
ERR2040408	ERX2099465	ERP023948	SAMEA104170452	35,635,658	87%	22%
ERR2040409	ERX2099466	ERP023948	SAMEA104170453	28,687,112	87%	22%
ERR2040410	ERX2099467	ERP023948	SAMEA104170454	23,076,862	84%	21%
SRR192369	SRX060178	SRP006678	SAMN00262803	69,231,888	40%	15%
SRR192370	SRX060179	SRP006678	SAMN00262804	68,477,878	56%	16%
SRR192371	SRX060180	SRP006678	SAMN00262805	70,387,130	70%	17%
SRR192372	SRX060181	SRP006678	SAMN00262806	73,615,604	83%	15%
SRR192373	SRX060182	SRP006678	SAMN00262807	61,931,734	43%	17%
SRR292255	SRX079590	SRP007300	SAMN00630395	750,867	75%	60%
SRR352195	SRX100737	SRP008673	SAMN00738619	37,835,287	83%	24%
SRR352196	SRX100738	SRP008673	SAMN00738620	37,472,665	86%	25%
SRR352198	SRX100739	SRP008673	SAMN00738621	54,026,640	85%	24%
SRR352200	SRX100740	SRP008673	SAMN00738622	55,653,984	86%	24%
SRR352202	SRX100744	SRP008673	SAMN00738626	37,374,640	75%	23%
SRR352203	SRX100748	SRP008673	SAMN00738627	220,967,788	60%	27%
SRR352205	SRX100750	SRP008673	SAMN00738627	164,380,088	58%	27%
SRR352208	SRX100751	SRP008673	SAMN00738627	211,474,238	61%	27%
SRR352210	SRX100752	SRP008673	SAMN00738627	97,199,906	59%	27%
SRR2961016	SRX1452312	SRP066670	SAMN04296299	268,458,054	83%	24%
SRR3996358	SRX1997250	SRP133605	SAMN05509013	3,891,366	83%	18%
SRR3996359	SRX1997250	SRP133605	SAMN05509013	3,679,016	83%	18%
SRR3996360	SRX1997250	SRP133605	SAMN05509013	4,438,762	83%	18%
SRR3996361	SRX1997250	SRP133605	SAMN05509013	3,849,506	84%	18%
SRR3996362	SRX1997250	SRP133605	SAMN05509013	4,464,492	83%	18%
SRR3996363	SRX1997250	SRP133605	SAMN05509013	5,077,610	83%	18%
SRR3996352	SRX1997249	SRP133605	SAMN05509014	5,673,488	81%	18%
SRR3996353	SRX1997249	SRP133605	SAMN05509014	3,060,214	81%	18%
SRR3996354	SRX1997249	SRP133605	SAMN05509014	3,262,504	81%	17%
SRR3996355	SRX1997249	SRP133605	SAMN05509014	2,979,984	82%	18%
SRR3996356	SRX1997249	SRP133605	SAMN05509014	3,369,398	81%	17%
SRR3996357	SRX1997249	SRP133605	SAMN05509014	3,747,642	82%	17%
SRR3996346	SRX1997248	SRP133605	SAMN05509015	3,547,136	82%	18%
SRR3996347	SRX1997248	SRP133605	SAMN05509015	4,703,250	83%	18%
SRR3996348	SRX1997248	SRP133605	SAMN05509015	5,288,924	83%	18%
SRR3996349	SRX1997248	SRP133605	SAMN05509015	4,524,976	83%	18%
SRR3996350	SRX1997248	SRP133605	SAMN05509015	5,308,372	83%	18%
SRR3996351	SRX1997248	SRP133605	SAMN05509015	6,045,234	83%	18%
SRR3996340	SRX1997247	SRP133605	SAMN05509016	7,327,836	82%	18%
SRR3996341	SRX1997247	SRP133605	SAMN05509016	3,706,678	83%	18%
SRR3996342	SRX1997247	SRP133605	SAMN05509016	4,606,356	83%	18%
SRR3996343	SRX1997247	SRP133605	SAMN05509016	4,362,568	83%	18%
SRR3996344	SRX1997247	SRP133605	SAMN05509016	4,985,626	83%	18%
SRR3996345	SRX1997247	SRP133605	SAMN05509016	5,394,446	83%	18%
SRR3996334	SRX1997246	SRP133605	SAMN05509017	9,371,940	80%	17%
SRR3996335	SRX1997246	SRP133605	SAMN05509017	4,791,638	80%	17%
SRR3996336	SRX1997246	SRP133605	SAMN05509017	5,295,560	81%	17%
SRR3996337	SRX1997246	SRP133605	SAMN05509017	4,598,556	80%	17%
SRR3996338	SRX1997246	SRP133605	SAMN05509017	5,291,100	80%	17%
SRR3996339	SRX1997246	SRP133605	SAMN05509017	6,054,256	81%	17%
SRR3996328	SRX1997245	SRP133605	SAMN05509018	5,945,504	81%	17%
SRR3996329	SRX1997245	SRP133605	SAMN05509018	3,017,054	81%	17%
SRR3996330	SRX1997245	SRP133605	SAMN05509018	3,374,736	81%	17%
SRR3996331	SRX1997245	SRP133605	SAMN05509018	2,922,600	81%	17%
SRR3996332	SRX1997245	SRP133605	SAMN05509018	3,383,280	81%	17%
SRR3996333	SRX1997245	SRP133605	SAMN05509018	3,867,434	81%	17%
SRR3996322	SRX1997244	SRP133605	SAMN05509019	2,543,040	82%	18%
SRR3996323	SRX1997244	SRP133605	SAMN05509019	4,178,954	83%	18%
SRR3996324	SRX1997244	SRP133605	SAMN05509019	4,558,744	83%	18%
SRR3996325	SRX1997244	SRP133605	SAMN05509019	3,827,718	83%	18%
SRR3996326	SRX1997244	SRP133605	SAMN05509019	4,540,644	83%	18%
SRR3996327	SRX1997244	SRP133605	SAMN05509019	5,214,426	83%	18%
SRR3996316	SRX1997243	SRP133605	SAMN05509020	6,611,772	81%	18%
SRR3996317	SRX1997243	SRP133605	SAMN05509020	3,264,724	82%	18%
SRR3996318	SRX1997243	SRP133605	SAMN05509020	4,005,488	82%	18%
SRR3996319	SRX1997243	SRP133605	SAMN05509020	3,671,422	82%	18%
SRR3996320	SRX1997243	SRP133605	SAMN05509020	4,157,940	82%	18%
SRR3996321	SRX1997243	SRP133605	SAMN05509020	4,630,690	82%	18%
SRR3996310	SRX1997242	SRP133605	SAMN05509021	7,108,732	82%	18%
SRR3996311	SRX1997242	SRP133605	SAMN05509021	3,574,574	83%	18%
SRR3996312	SRX1997242	SRP133605	SAMN05509021	4,000,478	82%	18%
SRR3996313	SRX1997242	SRP133605	SAMN05509021	3,423,120	83%	18%
SRR3996314	SRX1997242	SRP133605	SAMN05509021	4,011,916	82%	18%
SRR3996315	SRX1997242	SRP133605	SAMN05509021	4,583,984	83%	18%
SRR3996304	SRX1997241	SRP133605	SAMN05509022	5,968,392	81%	16%
SRR3996305	SRX1997241	SRP133605	SAMN05509022	3,307,600	81%	16%
SRR3996306	SRX1997241	SRP133605	SAMN05509022	3,474,698	81%	16%
SRR3996307	SRX1997241	SRP133605	SAMN05509022	3,135,398	81%	16%
SRR3996308	SRX1997241	SRP133605	SAMN05509022	3,484,730	81%	16%
SRR3996309	SRX1997241	SRP133605	SAMN05509022	3,936,272	81%	16%
SRR3996298	SRX1997240	SRP133605	SAMN05509023	5,689,198	79%	17%
SRR3996299	SRX1997240	SRP133605	SAMN05509023	2,920,016	79%	17%
SRR3996300	SRX1997240	SRP133605	SAMN05509023	3,313,210	79%	17%
SRR3996301	SRX1997240	SRP133605	SAMN05509023	2,942,706	79%	16%
SRR3996302	SRX1997240	SRP133605	SAMN05509023	3,329,672	79%	16%
SRR3996303	SRX1997240	SRP133605	SAMN05509023	3,801,588	79%	17%
SRR3996292	SRX1997239	SRP133605	SAMN05509024	4,948,536	75%	16%
SRR3996293	SRX1997239	SRP133605	SAMN05509024	2,618,582	73%	16%
SRR3996294	SRX1997239	SRP133605	SAMN05509024	3,204,084	74%	16%
SRR3996295	SRX1997239	SRP133605	SAMN05509024	3,173,214	72%	16%
SRR3996296	SRX1997239	SRP133605	SAMN05509024	3,425,648	74%	16%
SRR3996297	SRX1997239	SRP133605	SAMN05509024	3,713,656	74%	16%
SRR5210004	SRX2523463	SRP133605	SAMN06277070	2,345,804	78%	18%
SRR5210005	SRX2523463	SRP133605	SAMN06277070	3,369,612	78%	18%
SRR5210006	SRX2523463	SRP133605	SAMN06277070	3,893,232	78%	18%
SRR5210007	SRX2523463	SRP133605	SAMN06277070	3,991,928	78%	18%
SRR5210008	SRX2523463	SRP133605	SAMN06277070	4,309,136	78%	18%
SRR5209999	SRX2523462	SRP133605	SAMN06277071	2,479,116	78%	17%
SRR5210000	SRX2523462	SRP133605	SAMN06277071	3,420,676	78%	17%
SRR5210001	SRX2523462	SRP133605	SAMN06277071	3,906,638	78%	17%
SRR5210002	SRX2523462	SRP133605	SAMN06277071	4,051,316	78%	17%
SRR5210003	SRX2523462	SRP133605	SAMN06277071	4,370,192	78%	17%
SRR5209994	SRX2523461	SRP133605	SAMN06277072	2,413,516	79%	18%
SRR5209995	SRX2523461	SRP133605	SAMN06277072	3,338,648	79%	18%
SRR5209996	SRX2523461	SRP133605	SAMN06277072	3,847,646	79%	18%
SRR5209997	SRX2523461	SRP133605	SAMN06277072	4,054,822	79%	18%
SRR5209998	SRX2523461	SRP133605	SAMN06277072	4,318,312	79%	18%
SRR5209989	SRX2523460	SRP133605	SAMN06277073	2,004,870	81%	18%
SRR5209990	SRX2523460	SRP133605	SAMN06277073	2,828,188	81%	18%
SRR5209991	SRX2523460	SRP133605	SAMN06277073	3,256,970	81%	18%
SRR5209992	SRX2523460	SRP133605	SAMN06277073	3,366,278	80%	18%
SRR5209993	SRX2523460	SRP133605	SAMN06277073	3,595,348	80%	18%
SRR5209984	SRX2523459	SRP133605	SAMN06277074	2,776,364	85%	17%
SRR5209985	SRX2523459	SRP133605	SAMN06277074	3,948,450	85%	17%
SRR5209986	SRX2523459	SRP133605	SAMN06277074	4,547,140	85%	17%
SRR5209987	SRX2523459	SRP133605	SAMN06277074	4,644,746	85%	17%
SRR5209988	SRX2523459	SRP133605	SAMN06277074	4,996,950	85%	17%
SRR5209979	SRX2523458	SRP133605	SAMN06277075	2,448,528	84%	17%
SRR5209980	SRX2523458	SRP133605	SAMN06277075	3,395,376	84%	17%
SRR5209981	SRX2523458	SRP133605	SAMN06277075	3,912,826	84%	17%
SRR5209982	SRX2523458	SRP133605	SAMN06277075	4,119,208	84%	17%
SRR5209983	SRX2523458	SRP133605	SAMN06277075	4,370,296	84%	17%
SRR5209974	SRX2523457	SRP133605	SAMN06277076	1,298,996	85%	18%
SRR5209975	SRX2523457	SRP133605	SAMN06277076	3,814,714	85%	18%
SRR5209976	SRX2523457	SRP133605	SAMN06277076	4,418,790	85%	18%
SRR5209977	SRX2523457	SRP133605	SAMN06277076	4,576,702	85%	18%
SRR5209978	SRX2523457	SRP133605	SAMN06277076	4,918,800	85%	18%
SRR5209969	SRX2523456	SRP133605	SAMN06277077	2,699,878	85%	18%
SRR5209970	SRX2523456	SRP133605	SAMN06277077	3,873,384	85%	18%
SRR5209971	SRX2523456	SRP133605	SAMN06277077	4,457,218	85%	18%
SRR5209972	SRX2523456	SRP133605	SAMN06277077	4,646,380	85%	18%
SRR5209973	SRX2523456	SRP133605	SAMN06277077	4,991,802	85%	18%
SRR5209964	SRX2523455	SRP133605	SAMN06277078	2,953,526	84%	18%
SRR5209965	SRX2523455	SRP133605	SAMN06277078	4,196,900	84%	18%
SRR5209966	SRX2523455	SRP133605	SAMN06277078	4,790,670	84%	18%
SRR5209967	SRX2523455	SRP133605	SAMN06277078	4,898,600	84%	18%
SRR5209968	SRX2523455	SRP133605	SAMN06277078	5,266,020	84%	18%
SRR5209959	SRX2523454	SRP133605	SAMN06277079	1,994,696	84%	19%
SRR5209960	SRX2523454	SRP133605	SAMN06277079	3,603,950	84%	19%
SRR5209961	SRX2523454	SRP133605	SAMN06277079	4,075,412	84%	19%
SRR5209962	SRX2523454	SRP133605	SAMN06277079	4,289,318	84%	19%
SRR5209963	SRX2523454	SRP133605	SAMN06277079	4,496,366	84%	19%
SRR5209954	SRX2523453	SRP133605	SAMN06277080	1,830,290	81%	18%
SRR5209955	SRX2523453	SRP133605	SAMN06277080	6,262,264	81%	18%
SRR5209956	SRX2523453	SRP133605	SAMN06277080	7,134,546	81%	18%
SRR5209957	SRX2523453	SRP133605	SAMN06277080	7,555,390	81%	18%
SRR5209958	SRX2523453	SRP133605	SAMN06277080	7,979,054	81%	18%
SRR5209949	SRX2523452	SRP133605	SAMN06277081	1,681,916	84%	18%
SRR5209950	SRX2523452	SRP133605	SAMN06277081	7,171,084	84%	18%
SRR5209951	SRX2523452	SRP133605	SAMN06277081	8,286,690	84%	18%
SRR5209952	SRX2523452	SRP133605	SAMN06277081	8,699,344	84%	18%
SRR5209953	SRX2523452	SRP133605	SAMN06277081	9,187,002	84%	18%
SRR7630403	SRX4494172	SRP155904	SAMN09747683	45,704,938	81%	28%
SRR7630404	SRX4494171	SRP155904	SAMN09747684	14,799,074	78%	22%
SRR7630401	SRX4494174	SRP155904	SAMN09747685	21,691,796	71%	27%
SRR7630402	SRX4494173	SRP155904	SAMN09747686	21,420,334	81%	28%
SRR7630407	SRX4494168	SRP155904	SAMN09747687	23,448,164	62%	26%
SRR7630408	SRX4494167	SRP155904	SAMN09747688	34,249,954	69%	23%
SRR7630405	SRX4494170	SRP155904	SAMN09747689	6,787,926	79%	29%
SRR7630406	SRX4494169	SRP155904	SAMN09747690	15,231,236	73%	11%
SRR7630400	SRX4494175	SRP155904	SAMN09747691	13,026,100	64%	25%
SRR8181716	SRX5001709	SRP168446	SAMN10330896	23,666,554	85%	33%
SRR8181717	SRX5001708	SRP168446	SAMN10330897	37,078,848	85%	34%
SRR8181718	SRX5001707	SRP168446	SAMN10330898	43,562,392	84%	34%
SRR8181719	SRX5001706	SRP168446	SAMN10330899	20,860,490	68%	32%
SRR8181720	SRX5001705	SRP168446	SAMN10330900	21,758,678	73%	32%
SRR8181721	SRX5001704	SRP168446	SAMN10330901	26,200,206	73%	31%
SRR8181722	SRX5001703	SRP168446	SAMN10330902	15,103,474	83%	30%
SRR8181723	SRX5001702	SRP168446	SAMN10330903	24,075,384	80%	29%
SRR8181724	SRX5001701	SRP168446	SAMN10330904	20,388,864	81%	31%
SRR8181725	SRX5001700	SRP168446	SAMN10330905	33,722,898	72%	33%
SRR8181737	SRX5001688	SRP168446	SAMN10330906	16,848,856	72%	31%
SRR8181738	SRX5001687	SRP168446	SAMN10330907	14,787,904	61%	22%
SRR8181739	SRX5001686	SRP168446	SAMN10330908	17,011,612	73%	34%
SRR8181740	SRX5001685	SRP168446	SAMN10330909	32,288,440	77%	34%
SRR8181733	SRX5001692	SRP168446	SAMN10330910	22,645,316	69%	34%
SRR8181734	SRX5001691	SRP168446	SAMN10330911	16,850,686	83%	32%
SRR8181735	SRX5001690	SRP168446	SAMN10330912	18,396,400	82%	31%
SRR8181736	SRX5001689	SRP168446	SAMN10330913	17,933,534	83%	31%
SRR8181741	SRX5001684	SRP168446	SAMN10330914	19,605,246	78%	31%
SRR8181742	SRX5001683	SRP168446	SAMN10330915	26,363,702	78%	30%
SRR8181727	SRX5001698	SRP168446	SAMN10330916	28,131,330	77%	32%
SRR8181726	SRX5001699	SRP168446	SAMN10330917	19,960,984	85%	33%
SRR8181729	SRX5001696	SRP168446	SAMN10330918	30,268,868	85%	34%
SRR8181728	SRX5001697	SRP168446	SAMN10330919	31,663,678	85%	32%
SRR8181731	SRX5001694	SRP168446	SAMN10330920	17,228,268	77%	27%
SRR8181730	SRX5001695	SRP168446	SAMN10330921	14,526,082	66%	18%
SRR8181732	SRX5001693	SRP168446	SAMN10330922	21,213,166	70%	23%
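
The per-sample read counts correspond to the sum of the reads of all runs belonging to that sample. A minimal sketch of this aggregation is shown below, using the four runs of SAMN00738627 and one run of SAMN00738619 from the table above; the tuple layout is an assumption made for illustration.

```python
from collections import defaultdict

# (run, sample, number of reads), copied from the by-run table.
runs = [
    ("SRR352195", "SAMN00738619", 37_835_287),
    ("SRR352203", "SAMN00738627", 220_967_788),
    ("SRR352205", "SAMN00738627", 164_380_088),
    ("SRR352208", "SAMN00738627", 211_474_238),
    ("SRR352210", "SAMN00738627", 97_199_906),
]

# Sum the reads of every run that belongs to the same sample.
reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, reads in sorted(reads_per_sample.items()):
    print(f"{sample}\t{reads:,}")
```

For SAMN00738627 the sum is 694,022,020 reads, which matches the value in the by-sample table. The percent columns are rounded, so they cannot be recombined exactly in the same way.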

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	111	108 (97.30%)	108 (97.30%)	77.33%	93.26%
Arabidopsis thaliana known RefSeq (NP_)	48,147	41,404 (85.99%)	41,404 (85.99%)	66.32%	70.21%
Rosales GenBank	7,582	7,180 (94.70%)	7,180 (94.70%)	70.63%	81.86%
Rosales known RefSeq (NP_)	895	875 (97.77%)	875 (97.77%)	69.51%	78.27%
Malus domestica high-quality model RefSeq (XP_)	28,198	25,799 (91.49%)	25,799 (91.49%)	68.87%	77.09%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
