NCBI Lolium perenne Annotation Release 100

The RefSeq genome records for Lolium perenne were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Lolium perenne Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Oct 26 2022
Date of submission of annotation to the public databases: Oct 28 2022
Software version: 10.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
MPB_Lper_Kyuss_1697	GCF_019359855.1	ETH Zurich	07-28-2021	Reference	7 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	MPB_Lper_Kyuss_1697
Genes and pseudogenes	57,733
protein-coding	40,142
non-coding	14,581
Transcribed pseudogenes	5
Non-transcribed pseudogenes	3,005
genes with variants	11,021
Immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	54,169
fully-supported	45,400
with > 5% ab initio	7,409
partial	191
with filled gap(s)	4
known RefSeq (NM_)	0
model RefSeq (XM_)	54,169
non-coding RNAs	36,337
fully-supported	31,368
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	35,615
pseudo transcripts	5
fully-supported	5
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	5
CDSs	54,169
fully-supported	45,400
with > 5% ab initio	7,633
partial	190
with major correction(s)	141
known RefSeq (NP_)	0
model RefSeq (XP_)	54,169
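
As a quick consistency check on the table above, the four gene categories add up to the reported total of genes and pseudogenes. The short Python sketch below reproduces that arithmetic; the dictionary keys are illustrative labels chosen here, not identifiers used by the pipeline.

```python
# Gene category counts copied from the feature counts table.
gene_counts = {
    "protein-coding": 40_142,
    "non-coding": 14_581,
    "transcribed pseudogenes": 5,
    "non-transcribed pseudogenes": 3_005,
}

# Total of all four categories (genes and pseudogenes).
total = sum(gene_counts.values())

# Genes excluding pseudogenes (protein-coding plus non-coding).
genes_only = gene_counts["protein-coding"] + gene_counts["non-coding"]

print(f"genes and pseudogenes: {total:,}")
print(f"genes without pseudogenes: {genes_only:,}")
```

The second figure equals the gene count in the feature lengths table, which excludes pseudogenes.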

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	54,723	4,415	2,370	64	213,170
All transcripts	90,506	1,954	1,641	64	16,532
mRNA	54,169	1,842	1,559	138	16,532
misc_RNA	6,642	2,581	2,329	111	11,896
tRNA	722	74	73	71	88
lncRNA	24,731	2,387	2,125	95	14,285
snoRNA	603	108	98	64	222
snRNA	144	152	160	99	201
rRNA	3,495	203	119	114	3,414
Single-exon transcripts	8,393	1,183	1,037	229	8,411
coding transcripts (NM_/XM_ )	8,393	1,183	1,037	229	8,411
CDSs	54,169	1,316	1,101	93	16,254
Exons	274,323	379	197	1	10,887
in coding transcripts (NM_/XM_ )	208,492	355	186	1	9,784
in non-coding transcripts (NR_/XR_ )	73,704	435	225	10	10,887
Introns	209,988	1,107	166	30	138,294
in coding transcripts (NM_/XM_ )	162,871	1,057	146	30	138,294
in non-coding transcripts (NR_/XR_ )	54,500	1,211	255	30	99,963

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.66	1	1	50
Number of exons per transcript	5.69	4	1	78

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein of each gene and the poales_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and presented in BUSCO notation.
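
The selection of one longest protein per gene can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the function name, the input layout (gene ID, protein ID, sequence) and the tie-breaking rule (keep the first protein seen) are assumptions, since the report does not describe them.

```python
from typing import Dict, Iterable, Tuple


def longest_protein_per_gene(
    proteins: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Return {gene_id: (protein_id, sequence)}, keeping the longest protein.

    Ties keep the first protein seen for a gene (an assumption).
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, protein_id, sequence in proteins:
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (protein_id, sequence)
    return best


# Hypothetical identifiers and sequences, for illustration only.
example = [
    ("gene1", "prot1a", "MKT"),
    ("gene1", "prot1b", "MKTAYIAK"),
    ("gene2", "prot2a", "MSL"),
]
selected = longest_protein_per_gene(example)
for gene_id, (protein_id, sequence) in selected.items():
    print(gene_id, protein_id, len(sequence))
```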

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Arabidopsis thaliana known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 40,142 coding genes, 28,954 genes had a protein with an alignment covering 50% or more of the query and 7,822 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
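
The coverage definitions above can be sketched in Python. This is a simplified illustration that assumes a single alignment interval per protein with 1-based inclusive coordinates; how the pipeline combines multiple alignment segments is not stated in this report. The function names and the example coordinates are hypothetical.

```python
def coverage_percent(aligned_start: int, aligned_end: int, length: int) -> float:
    """Percent of a sequence included in one alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    aligned = aligned_end - aligned_start + 1
    return 100.0 * aligned / length


def count_at_or_above(coverages, threshold):
    """Number of coverage values at or above a threshold."""
    return sum(1 for c in coverages if c >= threshold)


# Hypothetical alignment: residues 11-110 of a 200-residue query
# aligned to residues 1-95 of a 100-residue target.
query_cov = coverage_percent(11, 110, 200)
target_cov = coverage_percent(1, 95, 100)
print(query_cov, target_cov)
```

Applying `count_at_or_above` to per-gene query coverages with thresholds of 50 and 95 would yield counts of the kind quoted in the text.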

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: Arabidopsis thaliana known RefSeq proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
MPB_Lper_Kyuss_1697	GCF_019359855.1	68.61%
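
A masked percentage of this kind is the number of masked bases divided by the total number of bases. The sketch below assumes that masked bases are written in lowercase (soft-masking); this is an assumption about the input, not something the report states, and the toy sequences are invented for illustration.

```python
def percent_masked(sequences):
    """Percent of bases that are lowercase across an iterable of sequences."""
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    if total == 0:
        raise ValueError("no bases provided")
    return 100.0 * masked / total


# Toy soft-masked sequences (hypothetical): 10 of 20 bases are lowercase.
toy = ["ACGTacgtac", "ggggCCCCTT"]
pct = percent_masked(toy)
print(f"{pct:.2f}% masked")
```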

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	490	485 (98.98%)	439 (89.59%)	98.63%	99.36%
Same-species TSA	286,220	216,714 (75.72%)	139,278 (48.66%)	98.57%	98.23%
Same-species EST	19,674	17,579 (89.35%)	14,628 (74.35%)	98.43%	99.28%
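
The percentages in parentheses are consistent with being computed relative to the number of sequences retrieved from Entrez. The Python sketch below recomputes them from the counts in the table; the helper name is hypothetical.

```python
# (retrieved, aligned by Splign, passed to Gnomon), copied from the table.
rows = {
    "Same-species Genbank": (490, 485, 439),
    "Same-species TSA": (286_220, 216_714, 139_278),
    "Same-species EST": (19_674, 17_579, 14_628),
}


def percent_of_retrieved(count: int, retrieved: int) -> float:
    """Share of retrieved sequences, as a percentage rounded to 2 decimals."""
    return round(100.0 * count / retrieved, 2)


for source, (retrieved, aligned, passed) in rows.items():
    print(
        source,
        percent_of_retrieved(aligned, retrieved),
        percent_of_retrieved(passed, retrieved),
    )
```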

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	8,583,634,422	77%	28%	281,662
SAMN02566648	25126744	Leaf sheath (Lolium perenne, SAMN02566648)	12,512,506	57%	23%	98,110
SAMN02566649	25126744	Inflorescence (Lolium perenne, SAMN02566649)	20,119,180	59%	24%	115,166
SAMN02566650	25126744	Mature_leaf (Lolium perenne, SAMN02566650)	20,922,002	58%	26%	104,071
SAMN02566651	25126744	Meristem (Lolium perenne, SAMN02566651)	13,585,338	59%	25%	102,987
SAMN02566652	25126744	Root (Lolium perenne, SAMN02566652)	19,939,894	38%	24%	100,309
SAMN02566653	25126744	Stem (Lolium perenne, SAMN02566653)	20,271,402	55%	25%	106,920
SAMN03960126	NA	Blade 1 (Lolium perenne, SAMN03960126)	192,709,742	85%	27%	157,346
SAMN03960127	NA	Blade 2 (Lolium perenne, SAMN03960127)	325,604,136	83%	28%	171,142
SAMN03960128	NA	Lower emerging leaf plus stem (Lolium perenne, SAMN03960128)	376,309,698	85%	28%	172,078
SAMN03960129	NA	Upper emerging leaf (Lolium perenne, SAMN03960129)	387,214,398	85%	24%	164,905
SAMN03960130	NA	Root (Lolium perenne, SAMN03960130)	273,985,084	84%	24%	171,833
SAMN03960131	NA	Sheath 1 (Lolium perenne, SAMN03960131)	307,859,746	82%	26%	171,791
SAMN03960132	NA	Sheath 2 (Lolium perenne, SAMN03960132)	271,238,628	75%	23%	169,739
SAMN06603379	NA	tip of the youngest leaf (Lolium perenne, SAMN06603379)	93,175,174	66%	27%	156,352
SAMN06603380	NA	tip of the second youngest leaf (Lolium perenne, SAMN06603380)	94,741,932	54%	27%	152,605
SAMN06603381	NA	tip of the third youngest leaf (Lolium perenne, SAMN06603381)	77,644,836	57%	26%	147,581
SAMN06603382	NA	mid-section of the youngest leaf (Lolium perenne, SAMN06603382)	82,490,514	73%	20%	146,551
SAMN06603383	NA	mid-section of the second youngest leaf (Lolium perenne, SAMN06603383)	91,529,140	57%	27%	150,985
SAMN06603384	NA	mid-section of the third youngest leaf (Lolium perenne, SAMN06603384)	86,696,136	46%	27%	146,869
SAMN06603385	NA	complete pseudostem (Lolium perenne, SAMN06603385)	43,608,210	69%	26%	151,516
SAMN06603386	NA	lower portion of the pseudostem (Lolium perenne, SAMN06603386)	47,025,732	80%	26%	153,955
SAMN06603387	NA	upper portion of the pseudostem (Lolium perenne, SAMN06603387)	47,294,682	79%	26%	148,757
SAMN06603388	NA	mid-section of root mass (Lolium perenne, SAMN06603388)	45,794,780	81%	24%	146,942
SAMN06603389	NA	tip section of root mass (Lolium perenne, SAMN06603389)	46,451,938	79%	22%	147,896
SAMN06603390	NA	complete flower (Lolium perenne, SAMN06603390)	32,077,802	82%	26%	150,277
SAMN06603391	NA	pollinated pistil 1 (Lolium perenne, SAMN06603391)	34,287,568	83%	26%	144,536
SAMN06603392	NA	pollinated pistil 2 (Lolium perenne, SAMN06603392)	33,723,698	80%	22%	141,514
SAMN06603393	NA	pollinated pistil 3 (Lolium perenne, SAMN06603393)	41,060,084	82%	21%	136,520
SAMN06603394	NA	pollinated stigma 1 (Lolium perenne, SAMN06603394)	35,845,834	74%	21%	132,417
SAMN06603395	NA	pollinated stigma 2 (Lolium perenne, SAMN06603395)	43,142,820	81%	22%	132,105
SAMN12636579	NA	leaves (Lolium perenne, SAMN12636579)	177,774,020	72%	32%	176,089
SAMN12636580	NA	leaves (Lolium perenne, SAMN12636580)	188,670,988	72%	33%	185,652
SAMN12784416	NA	whole plant (Lolium perenne, SAMN12784416)	54,343,712	77%	33%	164,309
SAMN12784418	NA	whole plant (Lolium perenne, SAMN12784418)	51,699,962	76%	33%	160,916
SAMN12784419	NA	whole plant (Lolium perenne, SAMN12784419)	46,605,266	76%	33%	159,800
SAMN12784422	NA	whole plant (Lolium perenne, SAMN12784422)	48,041,264	74%	34%	161,825
SAMN12784423	NA	whole plant (Lolium perenne, SAMN12784423)	43,269,648	74%	34%	155,411
SAMN12784424	NA	whole plant (Lolium perenne, SAMN12784424)	46,531,514	74%	33%	158,517
SAMN12784430	NA	whole plant (Lolium perenne, SAMN12784430)	43,462,350	74%	33%	157,401
SAMN12784431	NA	whole plant (Lolium perenne, SAMN12784431)	47,521,084	75%	33%	159,041
SAMN12784433	NA	whole plant (Lolium perenne, SAMN12784433)	50,429,230	75%	32%	158,166
SAMN13511824	NA	leaf (Lolium perenne, SAMN13511824)	33,417,788	78%	27%	111,777
SAMN13511825	NA	leaf (Lolium perenne, SAMN13511825)	35,537,182	77%	27%	115,365
SAMN13511826	NA	leaf (Lolium perenne, SAMN13511826)	35,698,564	75%	24%	137,999
SAMN13511827	NA	leaf (Lolium perenne, SAMN13511827)	32,836,878	77%	26%	131,881
SAMN13511828	NA	leaf (Lolium perenne, SAMN13511828)	41,796,314	72%	27%	130,858
SAMN13511829	NA	leaf (Lolium perenne, SAMN13511829)	34,989,900	72%	28%	130,097
SAMN13511830	NA	leaf (Lolium perenne, SAMN13511830)	34,859,380	81%	28%	122,602
SAMN13511831	NA	leaf (Lolium perenne, SAMN13511831)	34,491,140	82%	29%	123,744
SAMN13511832	NA	leaf (Lolium perenne, SAMN13511832)	35,560,376	79%	25%	126,690
SAMN13511833	NA	leaf (Lolium perenne, SAMN13511833)	33,243,130	80%	26%	120,938
SAMN13618356	NA	stem (Lolium perenne, 27 days, pooled male and female, SAMN13618356)	63,095,072	61%	33%	152,090
SAMN13618357	NA	stem (Lolium perenne, 27 days, pooled male and female, SAMN13618357)	66,218,778	65%	34%	155,445
SAMN13618358	NA	stem (Lolium perenne, 27 days, pooled male and female, SAMN13618358)	55,923,878	67%	34%	150,838
SAMN13618359	NA	stem (Lolium perenne, 27 days, pooled male and female, SAMN13618359)	69,192,304	67%	33%	156,012
SAMN13622397	NA	root (Lolium perenne, 27 days, pooled male and female, SAMN13622397)	61,867,822	68%	35%	148,901
SAMN13622398	NA	root (Lolium perenne, 27 days, pooled male and female, SAMN13622398)	69,828,544	68%	35%	150,455
SAMN13622399	NA	root (Lolium perenne, 27 days, pooled male and female, SAMN13622399)	56,608,366	70%	36%	157,643
SAMN13622400	NA	root (Lolium perenne, 27 days, pooled male and female, SAMN13622400)	76,168,276	69%	36%	162,499
SAMN13622401	NA	root (Lolium perenne, 27 days, pooled male and female, SAMN13622401)	83,471,992	69%	35%	151,506
SAMN13622402	NA	root (Lolium perenne, 27 days, pooled male and female, SAMN13622402)	80,576,098	70%	35%	159,323
SAMN15935733	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935733)	47,998,874	75%	30%	158,988
SAMN15935734	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935734)	47,176,568	74%	30%	164,624
SAMN15935735	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935735)	49,547,944	76%	29%	163,089
SAMN15935736	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935736)	54,794,370	74%	25%	149,015
SAMN15935737	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935737)	43,075,714	76%	30%	156,915
SAMN15935738	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935738)	49,243,666	79%	30%	159,253
SAMN15935739	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935739)	44,771,448	73%	30%	156,693
SAMN15935740	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935740)	43,030,474	73%	25%	155,851
SAMN15935741	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935741)	44,284,084	76%	28%	150,829
SAMN15935742	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935742)	44,732,806	70%	27%	155,126
SAMN15935743	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935743)	50,344,184	76%	30%	155,895
SAMN15935744	NA	whole plant (Lolium perenne, 6 weeks, SAMN15935744)	48,567,430	76%	29%	146,981
SAMN19341584	NA	Ovary (Lolium perenne, SAMN19341584)	156,701,532	86%	25%	165,709
SAMN19341585	NA	Ovary (Lolium perenne, SAMN19341585)	172,178,480	86%	26%	166,253
SAMN19341586	NA	Ovary (Lolium perenne, SAMN19341586)	167,348,454	85%	24%	167,820
SAMN19341587	NA	Ovary (Lolium perenne, SAMN19341587)	192,121,556	87%	26%	161,638
SAMN19341588	NA	Ovary (Lolium perenne, SAMN19341588)	190,983,524	85%	23%	165,275
SAMN19341589	NA	Ovary (Lolium perenne, SAMN19341589)	208,830,842	85%	24%	162,578
SAMN19341590	NA	Inflorescence primordium (Lolium perenne, SAMN19341590)	173,576,568	81%	31%	159,523
SAMN19341591	NA	Inflorescence primordium (Lolium perenne, SAMN19341591)	149,433,676	79%	33%	157,566
SAMN19341592	NA	Inflorescence primordium (Lolium perenne, SAMN19341592)	159,530,148	81%	32%	158,211
SAMN19341593	NA	Inflorescence primordium (Lolium perenne, SAMN19341593)	168,141,864	80%	32%	157,201
SAMN19341594	NA	Inflorescence primordium (Lolium perenne, SAMN19341594)	177,707,780	81%	32%	155,695
SAMN19341595	NA	Inflorescence primordium (Lolium perenne, SAMN19341595)	185,411,990	82%	33%	156,117
SAMN21840399	NA	Leaf (Lolium perenne, SAMN21840399)	58,466,992	75%	29%	157,352
SAMN21840400	NA	Leaf (Lolium perenne, SAMN21840400)	55,949,436	77%	36%	175,454
SAMN21840401	NA	Leaf (Lolium perenne, SAMN21840401)	54,220,956	77%	36%	174,594
SAMN21840402	NA	Leaf (Lolium perenne, SAMN21840402)	46,918,470	75%	34%	182,935
SAMN21840403	NA	Leaf (Lolium perenne, SAMN21840403)	56,705,452	76%	29%	181,703
SAMN21840404	NA	Leaf (Lolium perenne, SAMN21840404)	40,945,160	76%	28%	167,131
SAMN21840405	NA	Leaf (Lolium perenne, SAMN21840405)	43,006,190	79%	31%	151,015
SAMN21840406	NA	Leaf (Lolium perenne, SAMN21840406)	47,537,370	79%	37%	171,788
SAMN21840407	NA	Leaf (Lolium perenne, SAMN21840407)	40,792,580	79%	38%	166,236
SAMN21840408	NA	Leaf (Lolium perenne, SAMN21840408)	42,722,886	77%	30%	175,600
SAMN21840409	NA	Leaf (Lolium perenne, SAMN21840409)	47,481,986	77%	35%	184,842
SAMN21840410	NA	Leaf (Lolium perenne, SAMN21840410)	48,436,788	76%	35%	184,111
SAMN22074138	NA	leaf and stem (Lolium perenne, SAMN22074138)	33,945,822	48%	27%	106,854
SAMN30088781	NA	Anther (Lolium perenne, SAMN30088781)	58,415,174	77%	26%	158,977
SAMN30088782	NA	Stigma (Lolium perenne, SAMN30088782)	64,979,628	76%	27%	150,009
SAMN30088783	NA	Anther (Lolium perenne, SAMN30088783)	59,736,302	83%	23%	147,611
SAMN30088784	NA	Stigma (Lolium perenne, SAMN30088784)	61,245,800	78%	30%	148,949

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR1161517	SRX670823	SRP044151	SAMN02566648	12,512,506	57%	23%
SRR1161519	SRX670824	SRP044151	SAMN02566649	20,119,180	59%	24%
SRR1161520	SRX670825	SRP044151	SAMN02566650	20,922,002	58%	26%
SRR1161521	SRX670826	SRP044151	SAMN02566651	13,585,338	59%	25%
SRR1161522	SRX670827	SRP044151	SAMN02566652	19,939,894	38%	24%
SRR1161523	SRX670828	SRP044151	SAMN02566653	20,271,402	55%	25%
SRR2148823	SRX1167577	SRP062084	SAMN03960126	33,838,540	83%	27%
SRR2185506	SRX1167577	SRP062084	SAMN03960126	26,275,532	85%	27%
SRR2209092	SRX1167577	SRP062084	SAMN03960126	18,837,158	87%	26%
SRR2148826	SRX1167578	SRP062084	SAMN03960126	71,277,330	84%	26%
SRR2185505	SRX1167578	SRP062084	SAMN03960126	19,588,726	85%	27%
SRR2209093	SRX1167578	SRP062084	SAMN03960126	22,892,456	86%	27%
SRR2148819	SRX1167579	SRP062084	SAMN03960127	116,770,458	80%	27%
SRR2185508	SRX1167579	SRP062084	SAMN03960127	20,769,594	84%	28%
SRR2209132	SRX1167579	SRP062084	SAMN03960127	28,602,796	86%	27%
SRR2157403	SRX1167580	SRP062084	SAMN03960127	108,923,020	84%	28%
SRR2185507	SRX1167580	SRP062084	SAMN03960127	32,564,614	84%	28%
SRR2209176	SRX1167580	SRP062084	SAMN03960127	17,973,654	85%	27%
SRR2148827	SRX1167581	SRP062084	SAMN03960128	62,878,370	84%	29%
SRR2185504	SRX1167581	SRP062084	SAMN03960128	13,586,562	85%	28%
SRR2209177	SRX1167581	SRP062084	SAMN03960128	19,300,272	85%	26%
SRR2148824	SRX1167582	SRP062084	SAMN03960128	65,186,436	82%	28%
SRR2185503	SRX1167582	SRP062084	SAMN03960128	161,161,782	85%	27%
SRR2209079	SRX1167582	SRP062084	SAMN03960128	25,769,494	86%	26%
SRR2209816	SRX1167582	SRP062084	SAMN03960128	28,426,782	86%	27%
SRR2148828	SRX1167589	SRP062084	SAMN03960129	44,131,208	83%	26%
SRR2185498	SRX1167589	SRP062084	SAMN03960129	32,962,924	84%	21%
SRR2209214	SRX1167589	SRP062084	SAMN03960129	25,170,162	87%	26%
SRR2148829	SRX1167590	SRP062084	SAMN03960129	52,939,664	83%	27%
SRR2185497	SRX1167590	SRP062084	SAMN03960129	178,386,298	86%	22%
SRR2209227	SRX1167590	SRP062084	SAMN03960129	53,624,142	87%	26%
SRR2148825	SRX1167583	SRP062084	SAMN03960130	74,942,548	84%	23%
SRR2185500	SRX1167583	SRP062084	SAMN03960130	18,503,124	81%	26%
SRR2209642	SRX1167583	SRP062084	SAMN03960130	44,307,558	84%	25%
SRR2148817	SRX1167584	SRP062084	SAMN03960130	69,286,728	83%	26%
SRR2185499	SRX1167584	SRP062084	SAMN03960130	19,512,004	83%	18%
SRR2209662	SRX1167584	SRP062084	SAMN03960130	47,433,122	85%	27%
SRR2148818	SRX1167585	SRP062084	SAMN03960131	51,019,304	83%	26%
SRR2185502	SRX1167585	SRP062084	SAMN03960131	25,948,172	84%	27%
SRR2209697	SRX1167585	SRP062084	SAMN03960131	23,555,794	85%	22%
SRR2148821	SRX1167586	SRP062084	SAMN03960131	104,804,090	81%	27%
SRR2185501	SRX1167586	SRP062084	SAMN03960131	81,924,922	81%	26%
SRR2209715	SRX1167586	SRP062084	SAMN03960131	20,607,464	82%	25%
SRR2148822	SRX1167587	SRP062084	SAMN03960132	41,329,582	83%	28%
SRR2209738	SRX1167587	SRP062084	SAMN03960132	31,125,146	86%	25%
SRR2148820	SRX1167588	SRP062084	SAMN03960132	32,030,688	80%	27%
SRR2185440	SRX1167588	SRP062084	SAMN03960132	128,005,768	84%	20%
SRR2209739	SRX1167588	SRP062084	SAMN03960132	14,054,826	79%	27%
SRR5387766	SRX2682781	SRP102678	SAMN06603379	93,175,174	66%	27%
SRR5387765	SRX2682780	SRP102678	SAMN06603380	94,741,932	54%	27%
SRR5387764	SRX2682779	SRP102678	SAMN06603381	77,644,836	57%	26%
SRR5387763	SRX2682778	SRP102678	SAMN06603382	82,490,514	73%	20%
SRR5387762	SRX2682777	SRP102678	SAMN06603383	91,529,140	57%	27%
SRR5387761	SRX2682776	SRP102678	SAMN06603384	86,696,136	46%	27%
SRR5387760	SRX2682775	SRP102678	SAMN06603385	43,608,210	69%	26%
SRR5387759	SRX2682774	SRP102678	SAMN06603386	47,025,732	80%	26%
SRR5387758	SRX2682773	SRP102678	SAMN06603387	47,294,682	79%	26%
SRR5387757	SRX2682772	SRP102678	SAMN06603388	45,794,780	81%	24%
SRR5387756	SRX2682771	SRP102678	SAMN06603389	46,451,938	79%	22%
SRR5387755	SRX2682770	SRP102678	SAMN06603390	32,077,802	82%	26%
SRR5387754	SRX2682769	SRP102678	SAMN06603391	34,287,568	83%	26%
SRR5387753	SRX2682768	SRP102678	SAMN06603392	33,723,698	80%	22%
SRR5387752	SRX2682767	SRP102678	SAMN06603393	41,060,084	82%	21%
SRR5387751	SRX2682766	SRP102678	SAMN06603394	35,845,834	74%	21%
SRR5387750	SRX2682765	SRP102678	SAMN06603395	43,142,820	81%	22%
SRR10050131	SRX6784525	SRP219951	SAMN12636579	37,023,414	65%	32%
SRR10050130	SRX6784526	SRP219951	SAMN12636579	26,771,564	75%	32%
SRR10050125	SRX6784531	SRP219951	SAMN12636579	26,419,118	73%	31%
SRR10050124	SRX6784532	SRP219951	SAMN12636579	29,040,588	76%	32%
SRR10050123	SRX6784533	SRP219951	SAMN12636579	32,138,538	74%	34%
SRR10050122	SRX6784534	SRP219951	SAMN12636579	26,380,798	71%	30%
SRR10050133	SRX6784523	SRP219951	SAMN12636580	30,467,570	70%	33%
SRR10050132	SRX6784524	SRP219951	SAMN12636580	26,868,596	70%	33%
SRR10050129	SRX6784527	SRP219951	SAMN12636580	29,587,980	75%	32%
SRR10050128	SRX6784528	SRP219951	SAMN12636580	36,801,222	69%	34%
SRR10050127	SRX6784529	SRP219951	SAMN12636580	27,494,336	77%	33%
SRR10050126	SRX6784530	SRP219951	SAMN12636580	37,451,284	71%	35%
SRR10150454	SRX6875876	SRP222657	SAMN12784416	54,343,712	77%	33%
SRR10150453	SRX6875877	SRP222657	SAMN12784418	51,699,962	76%	33%
SRR10150452	SRX6875878	SRP222657	SAMN12784419	46,605,266	76%	33%
SRR10150451	SRX6875879	SRP222657	SAMN12784422	48,041,264	74%	34%
SRR10150450	SRX6875880	SRP222657	SAMN12784423	43,269,648	74%	34%
SRR10150449	SRX6875881	SRP222657	SAMN12784424	46,531,514	74%	33%
SRR10150448	SRX6875882	SRP222657	SAMN12784430	43,462,350	74%	33%
SRR10150447	SRX6875883	SRP222657	SAMN12784431	47,521,084	75%	33%
SRR10150446	SRX6875884	SRP222657	SAMN12784433	50,429,230	75%	32%
SRR10611415	SRX7290687	SRP235237	SAMN13511824	33,417,788	78%	27%
SRR10611414	SRX7290686	SRP235237	SAMN13511825	35,537,182	77%	27%
SRR10611413	SRX7290685	SRP235237	SAMN13511826	35,698,564	75%	24%
SRR10611412	SRX7290684	SRP235237	SAMN13511827	32,836,878	77%	26%
SRR10611411	SRX7290683	SRP235237	SAMN13511828	41,796,314	72%	27%
SRR10611410	SRX7290682	SRP235237	SAMN13511829	34,989,900	72%	28%
SRR10611409	SRX7290681	SRP235237	SAMN13511830	34,859,380	81%	28%
SRR10611408	SRX7290680	SRP235237	SAMN13511831	34,491,140	82%	29%
SRR10611407	SRX7290679	SRP235237	SAMN13511832	35,560,376	79%	25%
SRR10611406	SRX7290678	SRP235237	SAMN13511833	33,243,130	80%	26%
SRR10723352	SRX7399486	SRP237973	SAMN13618356	63,095,072	61%	33%
SRR10723351	SRX7399487	SRP237973	SAMN13618357	66,218,778	65%	34%
SRR10723350	SRX7399488	SRP237973	SAMN13618358	55,923,878	67%	34%
SRR10723349	SRX7399489	SRP237973	SAMN13618359	69,192,304	67%	33%
SRR10727791	SRX7403906	SRP238043	SAMN13622397	61,867,822	68%	35%
SRR10727790	SRX7403907	SRP238043	SAMN13622398	69,828,544	68%	35%
SRR10727789	SRX7403908	SRP238043	SAMN13622399	56,608,366	70%	36%
SRR10727788	SRX7403909	SRP238043	SAMN13622400	76,168,276	69%	36%
SRR10727787	SRX7403910	SRP238043	SAMN13622401	83,471,992	69%	35%
SRR10727786	SRX7403911	SRP238043	SAMN13622402	80,576,098	70%	35%
SRR12557256	SRX9046245	SRP279480	SAMN15935733	47,998,874	75%	30%
SRR12557255	SRX9046246	SRP279480	SAMN15935734	47,176,568	74%	30%
SRR12557252	SRX9046249	SRP279480	SAMN15935735	49,547,944	76%	29%
SRR12557251	SRX9046250	SRP279480	SAMN15935736	54,794,370	74%	25%
SRR12557250	SRX9046251	SRP279480	SAMN15935737	43,075,714	76%	30%
SRR12557249	SRX9046252	SRP279480	SAMN15935738	49,243,666	79%	30%
SRR12557248	SRX9046253	SRP279480	SAMN15935739	44,771,448	73%	30%
SRR12557247	SRX9046254	SRP279480	SAMN15935740	43,030,474	73%	25%
SRR12557246	SRX9046255	SRP279480	SAMN15935741	44,284,084	76%	28%
SRR12557245	SRX9046256	SRP279480	SAMN15935742	44,732,806	70%	27%
SRR12557254	SRX9046247	SRP279480	SAMN15935743	50,344,184	76%	30%
SRR12557253	SRX9046248	SRP279480	SAMN15935744	48,567,430	76%	29%
SRR14655089	SRX10993591	SRP321377	SAMN19341584	156,701,532	86%	25%
SRR14655088	SRX10993592	SRP321377	SAMN19341585	172,178,480	86%	26%
SRR14655085	SRX10993595	SRP321377	SAMN19341586	167,348,454	85%	24%
SRR14655084	SRX10993596	SRP321377	SAMN19341587	192,121,556	87%	26%
SRR14655083	SRX10993597	SRP321377	SAMN19341588	190,983,524	85%	23%
SRR14655082	SRX10993598	SRP321377	SAMN19341589	208,830,842	85%	24%
SRR14655081	SRX10993599	SRP321377	SAMN19341590	173,576,568	81%	31%
SRR14655080	SRX10993600	SRP321377	SAMN19341591	149,433,676	79%	33%
SRR14655079	SRX10993601	SRP321377	SAMN19341592	159,530,148	81%	32%
SRR14655078	SRX10993602	SRP321377	SAMN19341593	168,141,864	80%	32%
SRR14655087	SRX10993593	SRP321377	SAMN19341594	177,707,780	81%	32%
SRR14655086	SRX10993594	SRP321377	SAMN19341595	185,411,990	82%	33%
SRR16234976	SRX12514828	SRP333355	SAMN22074138	33,945,822	48%	27%
SRR16093287	SRX12379593	SRP338991	SAMN21840399	58,466,992	75%	29%
SRR16093286	SRX12379594	SRP338991	SAMN21840400	55,949,436	77%	36%
SRR16093283	SRX12379597	SRP338991	SAMN21840401	54,220,956	77%	36%
SRR16093294	SRX12379586	SRP338991	SAMN21840402	46,918,470	75%	34%
SRR16093293	SRX12379587	SRP338991	SAMN21840403	56,705,452	76%	29%
SRR16093292	SRX12379588	SRP338991	SAMN21840404	40,945,160	76%	28%
SRR16093291	SRX12379589	SRP338991	SAMN21840405	43,006,190	79%	31%
SRR16093290	SRX12379590	SRP338991	SAMN21840406	47,537,370	79%	37%
SRR16093289	SRX12379591	SRP338991	SAMN21840407	40,792,580	79%	38%
SRR16093288	SRX12379592	SRP338991	SAMN21840408	42,722,886	77%	30%
SRR16093285	SRX12379595	SRP338991	SAMN21840409	47,481,986	77%	35%
SRR16093284	SRX12379596	SRP338991	SAMN21840410	48,436,788	76%	35%
SRR20744806	SRX16765071	SRP389258	SAMN30088781	58,415,174	77%	26%
SRR20744805	SRX16765072	SRP389258	SAMN30088782	64,979,628	76%	27%
SRR20744804	SRX16765073	SRP389258	SAMN30088783	59,736,302	83%	23%
SRR20744803	SRX16765074	SRP389258	SAMN30088784	61,245,800	78%	30%
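
The by-sample and by-run tables are related: where a sample has several runs, its read count is the sum of the read counts of those runs. The sketch below illustrates this for sample SAMN03960126, using the six run rows copied from the table above; the variable names are illustrative.

```python
from collections import defaultdict

# (run, sample, number of reads) copied from the by-run table.
runs = [
    ("SRR2148823", "SAMN03960126", 33_838_540),
    ("SRR2185506", "SAMN03960126", 26_275_532),
    ("SRR2209092", "SAMN03960126", 18_837_158),
    ("SRR2148826", "SAMN03960126", 71_277_330),
    ("SRR2185505", "SAMN03960126", 19_588_726),
    ("SRR2209093", "SAMN03960126", 22_892_456),
]

# Sum the reads of all runs belonging to each sample.
reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, reads in reads_per_sample.items():
    print(f"{sample}: {reads:,} reads")
```

The total matches the 192,709,742 reads listed for this sample in the by-sample table.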

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Pooideae GenBank	36,819	33,853 (91.94%)	33,853 (91.94%)	71.92%	83.34%
Pooideae known RefSeq (NP_)	234	228 (97.44%)	228 (97.44%)	73.54%	82.54%
Arabidopsis thaliana known RefSeq (NP_)	48,147	39,478 (81.99%)	39,478 (81.99%)	65.32%	67.34%
Same-species GenBank	256	254 (99.22%)	254 (99.22%)	77.28%	90.39%
Oryza sativa GenBank	22,107	20,568 (93.04%)	20,568 (93.04%)	70.81%	80.36%
Oryza sativa known RefSeq (NP_)	1,720	1,688 (98.14%)	1,688 (98.14%)	72.15%	82.96%
Zea mays GenBank	50,057	44,425 (88.75%)	44,425 (88.75%)	72.36%	80.70%
Zea mays known RefSeq (NP_)	20,484	19,427 (94.84%)	19,427 (94.84%)	70.75%	79.92%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013 Jan 1;29(1):15-21.
Minimap2: Li H. Bioinformatics 2018 Sep 15;34(18):3094-3100
