NCBI Populus euphratica Annotation Release 100

The RefSeq genome records for Populus euphratica were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Populus euphratica Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Jan 5 2015
Date of submission of annotation to the public databases: Jan 6 2015
Software version: 6.2

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
PopEup_1.0	GCF_000495115.1	Lanzhou University	08-12-2014	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	PopEup_1.0
Genes and pseudogenes	35,733
  protein-coding	30,684
  non-coding	3,231
  pseudogenes	1,818
  genes with variants	10,727
mRNAs	49,676
  fully-supported	47,276
  with > 5% ab initio	1,683
  partial	459
  with filled gap(s)	0
  known RefSeq (NM_)	0
  model RefSeq (XM_)	49,676
Other RNAs	7,413
  fully-supported	6,786
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	6,786
CDSs	49,676
  fully-supported	47,276
  with > 5% ab initio	1,758
  partial	459
  with major correction(s)	532
  known RefSeq (NP_)	0
  model RefSeq (XP_)	49,676
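
The totals in this table can be cross-checked against their components. The following is a minimal Python sketch with the counts copied by hand from this report; the variable names are illustrative and are not part of any NCBI tool.

```python
# Counts copied by hand from the "Feature counts" table for PopEup_1.0.
gene_counts = {"protein-coding": 30_684, "non-coding": 3_231, "pseudogenes": 1_818}
reported_genes_and_pseudogenes = 35_733

transcript_counts = {"mRNAs": 49_676, "Other RNAs": 7_413}
reported_all_transcripts = 57_089  # "All transcripts" in the Feature lengths table

# Each reported total should equal the sum of its component rows.
genes_match = sum(gene_counts.values()) == reported_genes_and_pseudogenes
transcripts_match = sum(transcript_counts.values()) == reported_all_transcripts
print(genes_match, transcripts_match)  # prints "True True"
```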

Detailed reports

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	33,915	4,210	3,114	71	107,126
All transcripts	57,089	1,926	1,693	71	39,649
  mRNA	49,676	2,012	1,759	144	39,649
  misc_RNA	2,768	2,221	1,955	129	10,932
  tRNA	627	74	73	71	88
  lncRNA	4,018	948	681	72	20,167
Single-exon transcripts	4,023	1,417	1,258	144	39,649
  coding transcripts (NM_/XM_)	4,023	1,417	1,258	144	39,649
CDSs	49,676	1,484	1,239	108	16,380
Exons	224,974	318	167	1	39,649
  in coding transcripts (NM_/XM_)	211,016	317	165	1	39,649
  in non-coding transcripts (NR_/XR_)	23,509	292	157	2	16,868
Introns	180,474	546	194	30	97,288
  in coding transcripts (NM_/XM_)	171,731	508	193	30	97,288
  in non-coding transcripts (NR_/XR_)	17,961	858	213	30	97,288

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.67	1	1	40
Number of exons per transcript	6.69	5	1	80

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Arabidopsis thaliana known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 30,684 coding genes, 29,121 genes had a protein with an alignment covering 50% or more of the query and 14,371 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
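
The two coverage definitions amount to the same calculation applied to different sequences. The sketch below assumes a single contiguous alignment described by 1-based, inclusive coordinates; the function name and the example lengths are hypothetical, and a real BLASTP hit with several aligned segments would need those segments merged first.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence included in one alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if aligned_end < aligned_start:
        raise ValueError("aligned_end must not be smaller than aligned_start")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical example: a 400-residue annotated protein (query) aligned over
# residues 11-390 to residues 1-380 of a 500-residue target protein.
query_coverage = coverage_percent(11, 390, 400)   # 380 of 400 residues
target_coverage = coverage_percent(1, 380, 500)   # 380 of 500 residues
print(query_coverage, target_coverage)  # prints "95.0 76.0"
```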

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: Arabidopsis thaliana known RefSeq proteins
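
A cumulative curve of this kind counts, for each threshold, the genes whose best alignment reaches at least that coverage. The sketch below shows that counting step on a small invented list of coverage values; the data and the function name are hypothetical and do not come from this annotation run.

```python
from bisect import bisect_left


def genes_at_or_above(coverages, thresholds):
    """Return {threshold: number of coverage values >= threshold}."""
    ordered = sorted(coverages)
    total = len(ordered)
    # bisect_left gives the index of the first value >= threshold,
    # so everything from that index onward meets the threshold.
    return {t: total - bisect_left(ordered, t) for t in thresholds}


# Invented per-gene query coverage values, in percent.
toy_coverages = [12.0, 48.5, 50.0, 77.3, 95.0, 99.2, 100.0]
counts = genes_at_or_above(toy_coverages, [50, 95])
print(counts)  # prints "{50: 5, 95: 3}"
```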

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
PopEup_1.0	GCF_000495115.1	4.98%	42.39%
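
A masked percentage of this kind can be estimated from a masked FASTA file. The sketch below assumes soft-masking, where masked bases are written in lowercase; that convention, the function name and the example sequence are assumptions for illustration, not a description of how the pipeline computed the figures above.

```python
def masked_percent(fasta_lines):
    """Percentage of bases written in lowercase (soft-masked) in FASTA text."""
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return 100.0 * masked / total if total else 0.0


# Invented 20-base example with 6 lowercase (masked) bases.
example = [">scaffold_1", "ACGTacgtAC", "ggNNACGTAC"]
print(masked_percent(example))  # prints "30.0"
```

In this sketch, uppercase N characters count toward the total length but not toward the masked bases; whether gaps should be excluded from the denominator is a separate choice.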

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with short reads and reported in the Short read transcript alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	72	71 (98.61%)	70 (97.22%)	99.32%	99.64%
Same-species EST	13,979	12,961 (92.72%)	12,088 (86.47%)	99.03%	98.87%
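
Cells in these tables combine a count and a percentage, such as "12,961 (92.72%)". The sketch below parses that format and recomputes the percentage from the number of sequences retrieved; the regular expression and the function name are illustrative assumptions.

```python
import re

# Matches cells such as "12,961 (92.72%)".
CELL = re.compile(r"^\s*([\d,]+)\s*\(([\d.]+)%\)\s*$")


def parse_count_percent(cell):
    """Split a 'count (percent%)' cell into an int and a float."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    count = int(match.group(1).replace(",", ""))
    percent = float(match.group(2))
    return count, percent


# Same-species EST row: 13,979 sequences retrieved from Entrez.
retrieved = 13_979
aligned, aligned_percent = parse_count_percent("12,961 (92.72%)")
recomputed = round(100 * aligned / retrieved, 2)
print(aligned, aligned_percent, recomputed == aligned_percent)
```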

Short read transcript alignments

The following short reads (RNA-Seq) from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Number (%) of aligned reads	Number (%) spliced reads	Number of introns
All	Aggregate of all aligned samples	5,883,746,291	4,499,267,730 (76.47%)	953,935,730 (16.21%)	311,060
SAMN00102783	24 h NaCl-treated P. euphratica callus (Populus euphratica, SAMN00102783)	57,974,944	55,089,087 (95.02%)	8,896,188 (15.34%)	146,060
SAMN00102784	P. euphratica field-grown trees, including leaves, roots, flower buds, xylem, phloem (Populus euphratica, SAMN00102784)	28,272,830	24,846,179 (87.88%)	3,187,529 (11.27%)	106,336
SAMN00102785	P. euphratica control callus (Populus euphratica, SAMN00102785)	55,112,298	52,685,205 (95.60%)	8,510,488 (15.44%)	154,141
SAMN00217548	well-watered plant cDNA library (Populus euphratica, SAMN00217548)	218,601	177,235 (81.08%)	99,874 (45.69%)	61,934
SAMN00217549	drought-responsive plant cDNA library (Populus euphratica, SAMN00217549)	287,120	228,841 (79.70%)	129,262 (45.02%)	72,114
SAMN02203723	Generic sample from untreated Populus euphratica (Populus euphratica, SAMN02203723)	132,575,472	127,777,780 (96.38%)	28,051,984 (21.16%)	172,591
SAMN02203724	Generic sample from chilling treated Populus euphratica (Populus euphratica, SAMN02203724)	135,458,532	130,251,235 (96.16%)	25,280,953 (18.66%)	171,090
SAMN02203725	Generic sample from freezing treated Populus euphratica (Populus euphratica, SAMN02203725)	134,977,754	129,055,353 (95.61%)	24,626,739 (18.25%)	172,781
SAMN02316633	General Sample for Populus euphratica (Populus euphratica, SAMN02316633)	55,112,298	50,878,578 (92.32%)	7,102,935 (12.89%)	151,360
SAMN02316634	General Sample for Populus euphratica (Populus euphratica, SAMN02316634)	48,054,722	44,520,136 (92.64%)	7,104,466 (14.78%)	134,633
SAMN02316635	General Sample for Populus euphratica (Populus euphratica, SAMN02316635)	48,199,694	45,492,379 (94.38%)	7,971,423 (16.54%)	136,112
SAMN02316636	General Sample for Populus euphratica (Populus euphratica, SAMN02316636)	72,270,670	66,728,648 (92.33%)	16,494,914 (22.82%)	165,428
SAMN02316637	General Sample for Populus euphratica (Populus euphratica, SAMN02316637)	48,029,168	45,880,747 (95.53%)	8,443,864 (17.58%)	137,387
SAMN02318741	General Sample for Populus euphratica Oliva, a desert tree species (Populus euphratica, SAMN02318741)	137,660,376	130,105,662 (94.51%)	20,829,881 (15.13%)	166,897
SAMN02419880	leaf (Populus balsamifera, SAMN02419880)	111,334,650	84,879,198 (76.24%)	19,865,444 (17.84%)	193,436
SAMN02419881	leaf (Populus balsamifera, SAMN02419881)	100,228,364	70,612,708 (70.45%)	15,224,761 (15.19%)	176,353
SAMN02419882	leaf (Populus balsamifera, SAMN02419882)	51,987,730	39,603,076 (76.18%)	9,054,024 (17.42%)	166,096
SAMN02419883	leaf (Populus balsamifera, SAMN02419883)	172,632,422	122,311,121 (70.85%)	26,319,453 (15.25%)	206,746
SAMN02419884	leaf (Populus balsamifera, SAMN02419884)	62,505,964	46,596,592 (74.55%)	10,649,763 (17.04%)	174,405
SAMN02419885	leaf (Populus balsamifera, SAMN02419885)	115,344,546	88,495,682 (76.72%)	20,730,142 (17.97%)	188,343
SAMN02419886	leaf (Populus balsamifera, SAMN02419886)	186,963,470	131,710,422 (70.45%)	26,231,097 (14.03%)	203,351
SAMN02419887	leaf (Populus balsamifera, SAMN02419887)	100,171,840	73,365,010 (73.24%)	15,514,690 (15.49%)	187,370
SAMN02419888	leaf (Populus balsamifera, SAMN02419888)	158,678,648	102,457,331 (64.57%)	16,139,081 (10.17%)	187,033
SAMN02419889	leaf (Populus balsamifera, SAMN02419889)	120,443,552	91,783,062 (76.20%)	21,570,589 (17.91%)	187,773
SAMN02419890	leaf (Populus balsamifera, SAMN02419890)	185,235,118	135,234,464 (73.01%)	29,190,158 (15.76%)	198,212
SAMN02419891	leaf (Populus balsamifera, SAMN02419891)	117,442,020	85,972,472 (73.20%)	17,069,050 (14.53%)	191,717
SAMN02419892	leaf (Populus balsamifera, SAMN02419892)	206,812,810	138,539,224 (66.99%)	10,458,963 (5.06%)	164,516
SAMN02419893	leaf (Populus balsamifera, SAMN02419893)	93,838,958	72,017,736 (76.75%)	14,628,622 (15.59%)	177,992
SAMN02419894	leaf (Populus balsamifera, SAMN02419894)	145,107,100	108,475,536 (74.76%)	23,459,354 (16.17%)	197,420
SAMN02419895	leaf (Populus balsamifera, SAMN02419895)	120,271,448	77,986,511 (64.84%)	11,206,729 (9.32%)	182,379
SAMN02419896	leaf (Populus balsamifera, SAMN02419896)	90,204,084	67,232,408 (74.53%)	14,491,743 (16.07%)	189,504
SAMN02419897	leaf (Populus balsamifera, SAMN02419897)	128,443,596	99,527,949 (77.49%)	22,864,371 (17.80%)	194,285
SAMN02419898	leaf (Populus balsamifera, SAMN02419898)	159,617,680	118,956,012 (74.53%)	27,579,159 (17.28%)	192,317
SAMN02419899	leaf (Populus balsamifera, SAMN02419899)	132,530,114	102,595,758 (77.41%)	23,532,804 (17.76%)	197,914
SAMN02419900	leaf (Populus balsamifera, SAMN02419900)	123,762,988	91,086,431 (73.60%)	19,854,446 (16.04%)	191,050
SAMN02419901	leaf (Populus balsamifera, SAMN02419901)	49,552,248	39,462,680 (79.64%)	9,765,797 (19.71%)	162,933
SAMN02419902	leaf (Populus balsamifera, SAMN02419902)	161,222,250	122,393,570 (75.92%)	27,336,168 (16.96%)	201,701
SAMN02419903	leaf (Populus balsamifera, SAMN02419903)	91,380,498	61,395,276 (67.19%)	9,144,458 (10.01%)	170,844
SAMN02419904	leaf (Populus balsamifera, SAMN02419904)	66,378,786	49,895,241 (75.17%)	10,976,709 (16.54%)	172,482
SAMN02419905	leaf (Populus balsamifera, SAMN02419905)	127,744,056	103,111,378 (80.72%)	24,291,657 (19.02%)	202,057
SAMN02419906	leaf (Populus balsamifera, SAMN02419906)	139,086,428	112,909,668 (81.18%)	26,782,387 (19.26%)	202,901
SAMN02419907	leaf (Populus balsamifera, SAMN02419907)	43,019,072	27,813,959 (64.65%)	4,145,538 (9.64%)	140,166
SAMN02419908	leaf (Populus balsamifera, SAMN02419908)	32,755,064	21,170,808 (64.63%)	3,602,425 (11.00%)	140,994
SAMN02419909	leaf (Populus balsamifera, SAMN02419909)	210,276,010	161,365,472 (76.74%)	38,194,546 (18.16%)	211,545
SAMN02639421	catkin (Populus tomentosa, missing, female, SAMN02639421)	600,234,912	294,077,292 (48.99%)	76,939,394 (12.82%)	215,120
SAMN02666645	xylem (Populus tomentosa, missing, SAMN02666645)	334,996,250	305,988,380 (91.34%)	77,762,176 (23.21%)	196,376
SAMN02729205	root (Populus tremula x Populus alba, SAMN02729205)	96,600,262	88,599,092 (91.72%)	22,483,392 (23.27%)	186,462
SAMN02729206	root (Populus tremula x Populus alba, SAMN02729206)	94,230,516	86,762,775 (92.08%)	21,860,549 (23.20%)	185,604
SAMN02729207	root (Populus tremula x Populus alba, SAMN02729207)	101,095,758	90,930,061 (89.94%)	21,906,979 (21.67%)	188,571
SAMN02729208	root (Populus tremula x Populus alba, SAMN02729208)	97,412,600	80,236,310 (82.37%)	16,378,612 (16.81%)	140,684

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Number (%) of aligned reads	Number (%) spliced reads
SRR064170	SRX025568	SRP003271	SAMN00102783	57,974,944	55,089,087 (95.02%)	8,896,188 (15.34%)
SRR064168	SRX025570	SRP003271	SAMN00102784	28,272,830	24,846,179 (87.88%)	3,187,529 (11.27%)
SRR064169	SRX025571	SRP003271	SAMN00102785	55,112,298	52,685,205 (95.60%)	8,510,488 (15.44%)
SRR124475	SRX047542	SRP005997	SAMN00217548	218,601	177,235 (81.08%)	99,874 (45.69%)
SRR124476	SRX047543	SRP005997	SAMN00217549	287,120	228,841 (79.70%)	129,262 (45.02%)
SRR901769	SRX306524	SRP026075	SAMN02203723	132,575,472	127,777,780 (96.38%)	28,051,984 (21.16%)
SRR921507	SRX314777	SRP026075	SAMN02203724	135,458,532	130,251,235 (96.16%)	25,280,953 (18.66%)
SRR922436	SRX316169	SRP026075	SAMN02203725	134,977,754	129,055,353 (95.61%)	24,626,739 (18.25%)
SRR952753	SRX335458	SRP028829	SAMN02316633	55,112,298	50,878,578 (92.32%)	7,102,935 (12.89%)
SRR952754	SRX335459	SRP028829	SAMN02316634	48,054,722	44,520,136 (92.64%)	7,104,466 (14.78%)
SRR952725	SRX335452	SRP028829	SAMN02316635	48,199,694	45,492,379 (94.38%)	7,971,423 (16.54%)
SRR952700	SRX335446	SRP028829	SAMN02316636	8,000,000	7,508,567 (93.86%)	1,866,167 (23.33%)
SRR952701	SRX335446	SRP028829	SAMN02316636	8,000,000	7,318,571 (91.48%)	1,796,139 (22.45%)
SRR952702	SRX335446	SRP028829	SAMN02316636	8,000,000	7,411,532 (92.64%)	1,830,463 (22.88%)
SRR952703	SRX335446	SRP028829	SAMN02316636	8,000,000	7,545,753 (94.32%)	1,881,239 (23.52%)
SRR952704	SRX335446	SRP028829	SAMN02316636	8,000,000	7,382,293 (92.28%)	1,823,527 (22.79%)
SRR952705	SRX335446	SRP028829	SAMN02316636	8,000,000	7,233,212 (90.42%)	1,771,482 (22.14%)
SRR952706	SRX335446	SRP028829	SAMN02316636	8,000,000	7,481,281 (93.52%)	1,867,959 (23.35%)
SRR952707	SRX335446	SRP028829	SAMN02316636	8,000,000	7,342,961 (91.79%)	1,815,017 (22.69%)
SRR952708	SRX335446	SRP028829	SAMN02316636	8,000,000	7,282,592 (91.03%)	1,791,354 (22.39%)
SRR952709	SRX335446	SRP028829	SAMN02316636	270,670	221,886 (81.98%)	51,567 (19.05%)
SRR952726	SRX335454	SRP028829	SAMN02316637	48,029,168	45,880,747 (95.53%)	8,443,864 (17.58%)
SRR955312	SRX337854	SRP029139	SAMN02318741	69,055,560	65,043,606 (94.19%)	7,960,357 (11.53%)
SRR956808	SRX337854	SRP029139	SAMN02318741	68,604,816	65,062,056 (94.84%)	12,869,524 (18.76%)
SRR1036599	SRX382291	SRP033278	SAMN02419880	111,334,650	84,879,198 (76.24%)	19,865,444 (17.84%)
SRR1041799	SRX386137	SRP033278	SAMN02419881	100,228,364	70,612,708 (70.45%)	15,224,761 (15.19%)
SRR1041800	SRX386138	SRP033278	SAMN02419882	51,987,730	39,603,076 (76.18%)	9,054,024 (17.42%)
SRR1041801	SRX386139	SRP033278	SAMN02419883	172,632,422	122,311,121 (70.85%)	26,319,453 (15.25%)
SRR1041802	SRX386140	SRP033278	SAMN02419884	62,505,964	46,596,592 (74.55%)	10,649,763 (17.04%)
SRR1041803	SRX386141	SRP033278	SAMN02419885	115,344,546	88,495,682 (76.72%)	20,730,142 (17.97%)
SRR1041804	SRX386142	SRP033278	SAMN02419886	186,963,470	131,710,422 (70.45%)	26,231,097 (14.03%)
SRR1041805	SRX386143	SRP033278	SAMN02419887	100,171,840	73,365,010 (73.24%)	15,514,690 (15.49%)
SRR1041806	SRX386144	SRP033278	SAMN02419888	158,678,648	102,457,331 (64.57%)	16,139,081 (10.17%)
SRR1041807	SRX386145	SRP033278	SAMN02419889	120,443,552	91,783,062 (76.20%)	21,570,589 (17.91%)
SRR1041808	SRX386146	SRP033278	SAMN02419890	185,235,118	135,234,464 (73.01%)	29,190,158 (15.76%)
SRR1041809	SRX386147	SRP033278	SAMN02419891	117,442,020	85,972,472 (73.20%)	17,069,050 (14.53%)
SRR1041810	SRX386148	SRP033278	SAMN02419892	206,812,810	138,539,224 (66.99%)	10,458,963 (5.06%)
SRR1041811	SRX386149	SRP033278	SAMN02419893	93,838,958	72,017,736 (76.75%)	14,628,622 (15.59%)
SRR1041812	SRX386150	SRP033278	SAMN02419894	145,107,100	108,475,536 (74.76%)	23,459,354 (16.17%)
SRR1041813	SRX386151	SRP033278	SAMN02419895	120,271,448	77,986,511 (64.84%)	11,206,729 (9.32%)
SRR1041814	SRX386152	SRP033278	SAMN02419896	90,204,084	67,232,408 (74.53%)	14,491,743 (16.07%)
SRR1041815	SRX386153	SRP033278	SAMN02419897	128,443,596	99,527,949 (77.49%)	22,864,371 (17.80%)
SRR1041816	SRX386154	SRP033278	SAMN02419898	159,617,680	118,956,012 (74.53%)	27,579,159 (17.28%)
SRR1041817	SRX386155	SRP033278	SAMN02419899	132,530,114	102,595,758 (77.41%)	23,532,804 (17.76%)
SRR1041818	SRX386156	SRP033278	SAMN02419900	123,762,988	91,086,431 (73.60%)	19,854,446 (16.04%)
SRR1041819	SRX386157	SRP033278	SAMN02419901	49,552,248	39,462,680 (79.64%)	9,765,797 (19.71%)
SRR1041820	SRX386158	SRP033278	SAMN02419902	161,222,250	122,393,570 (75.92%)	27,336,168 (16.96%)
SRR1041821	SRX386159	SRP033278	SAMN02419903	91,380,498	61,395,276 (67.19%)	9,144,458 (10.01%)
SRR1041822	SRX386160	SRP033278	SAMN02419904	66,378,786	49,895,241 (75.17%)	10,976,709 (16.54%)
SRR1041823	SRX386161	SRP033278	SAMN02419905	127,744,056	103,111,378 (80.72%)	24,291,657 (19.02%)
SRR1041824	SRX386162	SRP033278	SAMN02419906	139,086,428	112,909,668 (81.18%)	26,782,387 (19.26%)
SRR1041825	SRX386163	SRP033278	SAMN02419907	43,019,072	27,813,959 (64.65%)	4,145,538 (9.64%)
SRR1041826	SRX386164	SRP033278	SAMN02419908	32,755,064	21,170,808 (64.63%)	3,602,425 (11.00%)
SRR1041827	SRX386165	SRP033278	SAMN02419909	210,276,010	161,365,472 (76.74%)	38,194,546 (18.16%)
SRR1165180	SRX467610	SRP036878	SAMN02639421	36,910,256	17,123,294 (46.39%)	4,629,443 (12.54%)
SRR1165192	SRX467610	SRP036878	SAMN02639421	36,910,256	17,104,839 (46.34%)	4,615,401 (12.50%)
SRR1165183	SRX467612	SRP036878	SAMN02639421	36,498,338	17,033,941 (46.67%)	4,581,359 (12.55%)
SRR1165195	SRX467612	SRP036878	SAMN02639421	36,498,338	17,013,020 (46.61%)	4,570,775 (12.52%)
SRR1165187	SRX467616	SRP036878	SAMN02639421	30,322,558	28,229,377 (93.10%)	7,505,351 (24.75%)
SRR1165197	SRX467616	SRP036878	SAMN02639421	30,322,558	14,105,690 (46.52%)	3,746,342 (12.36%)
SRR1165871	SRX468207	SRP036878	SAMN02639421	51,328,922	23,965,280 (46.69%)	6,366,009 (12.40%)
SRR1165877	SRX468207	SRP036878	SAMN02639421	51,328,922	23,937,581 (46.64%)	6,351,507 (12.37%)
SRR1165900	SRX468238	SRP036878	SAMN02639421	39,708,824	18,592,030 (46.82%)	4,940,682 (12.44%)
SRR1165901	SRX468238	SRP036878	SAMN02639421	39,708,824	18,578,523 (46.79%)	4,931,538 (12.42%)
SRR1165903	SRX468240	SRP036878	SAMN02639421	39,813,734	18,606,117 (46.73%)	4,363,312 (10.96%)
SRR1165904	SRX468240	SRP036878	SAMN02639421	39,813,734	18,574,733 (46.65%)	4,342,632 (10.91%)
SRR1165905	SRX468242	SRP036878	SAMN02639421	28,166,650	13,176,078 (46.78%)	3,500,820 (12.43%)
SRR1165906	SRX468242	SRP036878	SAMN02639421	28,166,650	13,139,820 (46.65%)	3,476,706 (12.34%)
SRR1165907	SRX468243	SRP036878	SAMN02639421	37,368,174	17,475,561 (46.77%)	4,524,038 (12.11%)
SRR1165908	SRX468243	SRP036878	SAMN02639421	37,368,174	17,421,408 (46.62%)	4,493,479 (12.02%)
SRR1462339	SRX500342	SRP040531	SAMN02666645	107,091,898	98,476,207 (91.95%)	24,626,772 (23.00%)
SRR1508228	SRX647504	SRP040531	SAMN02666645	108,187,750	98,570,066 (91.11%)	25,285,685 (23.37%)
SRR1508229	SRX647505	SRP040531	SAMN02666645	119,716,602	108,942,107 (91.00%)	27,849,719 (23.26%)
SRR1246865	SRX519864	SRP041243	SAMN02729205	96,600,262	88,599,092 (91.72%)	22,483,392 (23.27%)
SRR1246868	SRX519867	SRP041243	SAMN02729206	94,230,516	86,762,775 (92.08%)	21,860,549 (23.20%)
SRR1246866	SRX519865	SRP041243	SAMN02729207	101,095,758	90,930,061 (89.94%)	21,906,979 (21.67%)
SRR1246867	SRX519866	SRP041243	SAMN02729208	97,412,600	80,236,310 (82.37%)	16,378,612 (16.81%)
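
The per-sample figures correspond to sums over the runs that belong to each sample. The sketch below regroups three rows copied from the by-run table and recomputes the totals for SAMN02318741; the data structure and variable names are illustrative only.

```python
from collections import defaultdict

# (run, sample, reads, aligned reads, spliced reads), copied from the by-run table.
runs = [
    ("SRR955312", "SAMN02318741", 69_055_560, 65_043_606, 7_960_357),
    ("SRR956808", "SAMN02318741", 68_604_816, 65_062_056, 12_869_524),
    ("SRR1246865", "SAMN02729205", 96_600_262, 88_599_092, 22_483_392),
]

# Sum reads, aligned reads and spliced reads per sample.
totals = defaultdict(lambda: [0, 0, 0])
for _run, sample, reads, aligned, spliced in runs:
    totals[sample][0] += reads
    totals[sample][1] += aligned
    totals[sample][2] += spliced

reads, aligned, spliced = totals["SAMN02318741"]
percent_aligned = round(100 * aligned / reads, 2)
percent_spliced = round(100 * spliced / reads, 2)
print(reads, aligned, spliced, percent_aligned, percent_spliced)
```

For SAMN02318741 the sums are 137,660,376 reads, 130,105,662 aligned reads and 20,829,881 spliced reads, which agree with the by-sample table.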

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Populus trichocarpa GenBank	4,390	4,244 (96.67%)	4,244 (96.67%)	80.94%	88.14%
Arabidopsis thaliana GenBank	53,352	48,396 (90.71%)	48,396 (90.71%)	69.93%	76.06%
Arabidopsis thaliana known RefSeq (NP_)	35,173	30,150 (85.72%)	30,150 (85.72%)	67.96%	72.16%
Same-species GenBank	70	68 (97.14%)	68 (97.14%)	72.35%	82.70%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
