NCBI Pisum sativum Annotation Release 100

The RefSeq genome records for Pisum sativum were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Pisum sativum Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Sep 23 2022
Date of submission of annotation to the public databases: Oct 6 2022
Software version: 10.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
CAAS_Psat_ZW6_1.0	GCF_024323335.1	Institute of Microbiology, Chinese Academy of Sciences	07-19-2022	Reference	7 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	CAAS_Psat_ZW6_1.0
Genes and pseudogenes	65,672
protein-coding	40,025
non-coding	20,261
Transcribed pseudogenes	14
Non-transcribed pseudogenes	5,372
genes with variants	7,453
Immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	51,089
fully-supported	36,711
with > 5% ab initio	13,220
partial	439
with filled gap(s)	27
known RefSeq (NM_)	0
model RefSeq (XM_)	51,089
non-coding RNAs	25,977
fully-supported	11,702
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	23,161
pseudo transcripts	14
fully-supported	13
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	14
CDSs	51,089
fully-supported	36,711
with > 5% ab initio	13,363
partial	434
with major correction(s)	100
known RefSeq (NP_)	0
model RefSeq (XP_)	51,089
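
As a quick consistency check on the table above, the four gene categories should add up to the "Genes and pseudogenes" total. The following is a minimal sketch, with the counts copied by hand from the table; the dictionary keys are illustrative labels chosen for this example, not field names used by the pipeline.

```python
# Counts copied from the feature counts table for CAAS_Psat_ZW6_1.0.
# The dictionary keys are illustrative labels, not pipeline field names.
gene_counts = {
    "protein-coding": 40_025,
    "non-coding": 20_261,
    "transcribed pseudogenes": 14,
    "non-transcribed pseudogenes": 5_372,
}
reported_total = 65_672  # "Genes and pseudogenes" row

computed_total = sum(gene_counts.values())

# Genes excluding pseudogenes, which is the basis of the "Genes" row
# in the feature lengths table.
genes_without_pseudogenes = gene_counts["protein-coding"] + gene_counts["non-coding"]

print("categories sum to reported total:", computed_total == reported_total)
print("genes excluding pseudogenes:", genes_without_pseudogenes)
```

The same pattern can be applied to any parent row and its sub-rows, provided the sub-rows are known to be mutually exclusive; rows such as "fully-supported" and "partial" are not necessarily exclusive, so they are left out of this check.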

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	60,286	3,115	1,441	60	209,518
All transcripts	77,066	1,384	1,158	60	22,911
mRNA	51,089	1,683	1,407	105	22,911
misc_RNA	1,855	2,199	1,805	199	20,067
tRNA	2,816	74	73	69	88
lncRNA	9,847	1,462	1,174	132	14,730
snoRNA	7,014	107	107	60	224
snRNA	235	158	161	88	204
rRNA	4,210	296	119	116	3,468
Single-exon transcripts	10,082	1,045	838	138	7,047
coding transcripts (NM_/XM_ )	10,082	1,045	838	138	7,047
CDSs	51,089	1,305	1,062	105	16,200
Exons	221,376	345	192	1	18,975
in coding transcripts (NM_/XM_ )	196,310	337	181	1	18,975
in non-coding transcripts (NR_/XR_ )	30,170	379	249	10	13,307
Introns	168,481	892	148	30	207,231
in coding transcripts (NM_/XM_ )	152,073	814	146	30	207,231
in non-coding transcripts (NR_/XR_ )	21,313	1,387	173	30	99,658

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.29	1	1	50
Number of exons per transcript	4.52	3	1	78

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein of each gene and the fabales_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation.

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Arabidopsis thaliana known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 40,025 coding genes, 28,838 had a protein with an alignment covering 50% or more of the query, and 12,501 had a protein with an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
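
The two coverage definitions can be illustrated with a short calculation. This is only a sketch: the function name, the 1-based inclusive coordinate convention, and the example alignment are assumptions made for illustration and are not taken from the pipeline.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence that falls inside one alignment.

    Coordinates are assumed to be 1-based and inclusive; this convention is
    an assumption of the sketch, not something the report specifies.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: residues 11-110 of a 200-residue annotated protein
# (the query) aligned to residues 1-100 of a 125-residue target protein.
query_cov = coverage_percent(11, 110, 200)
target_cov = coverage_percent(1, 100, 125)

print(f"query coverage:  {query_cov:.1f}%")
print(f"target coverage: {target_cov:.1f}%")
```

In this hypothetical case the gene would count toward the "50% or more of the query" tally but not toward the "95% or more" tally.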

[Figure: cumulative graph of the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline are included. Query: annotated proteins. Target: Arabidopsis thaliana known RefSeq proteins.]

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
CAAS_Psat_ZW6_1.0	GCF_024323335.1	65.79%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	1,553	1,523 (98.07%)	1,466 (94.40%)	99.38%	98.88%
Same-species TSA	865,074	814,236 (94.12%)	622,794 (71.99%)	99.19%	97.25%
Same-species EST	18,460	17,730 (96.05%)	17,129 (92.79%)	99.61%	97.57%
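
The percentages in the "aligned" and "passed to Gnomon" columns appear to be relative to the number of sequences retrieved from Entrez. The sketch below recomputes them for one row copied from the table; the helper name `parse_count_cell` is hypothetical and the parsing assumes the "count (percent%)" cell layout shown above.

```python
import re
from typing import Tuple


def parse_count_cell(cell: str) -> Tuple[int, float]:
    """Split a cell such as '1,523 (98.07%)' into (count, percent)."""
    match = re.fullmatch(r"\s*([\d,]+)\s*\(([\d.]+)%\)\s*", cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    count = int(match.group(1).replace(",", ""))
    return count, float(match.group(2))


# Row copied from the transcript alignments table (tab-separated).
row = "Same-species Genbank\t1,553\t1,523 (98.07%)\t1,466 (94.40%)\t99.38%\t98.88%"
fields = row.split("\t")

retrieved = int(fields[1].replace(",", ""))
aligned, aligned_pct = parse_count_cell(fields[2])
passed, passed_pct = parse_count_cell(fields[3])

# Recompute both percentages against the number of retrieved sequences.
recomputed_aligned_pct = round(100 * aligned / retrieved, 2)
recomputed_passed_pct = round(100 * passed / retrieved, 2)

print(f"aligned: reported {aligned_pct:.2f}%, recomputed {recomputed_aligned_pct:.2f}%")
print(f"passed:  reported {passed_pct:.2f}%, recomputed {recomputed_passed_pct:.2f}%")
```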

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	5,470,881,753	95%	36%	203,129
SAMD00246357	JI128 (Pisum sativum, SAMD00246357)	72,747,880	70%	30%	135,424
SAMD00246358	JI4 (Pisum sativum, SAMD00246358)	55,892,584	69%	29%	103,016
SAMN04412739	Node 2 axillary bud (Pisum sativum, 7 days, SAMN04412739)	18,357,450	27%	33%	106,662
SAMN04412742	Node 2 axillary bud (Pisum sativum, 7 days, SAMN04412742)	29,640,232	30%	20%	107,050
SAMN04412745	Node 2 axillary bud (Pisum sativum, 7 days, SAMN04412745)	21,235,234	49%	26%	113,621
SAMN04412748	Node 2 axillary bud (Pisum sativum, 7 days, SAMN04412748)	16,100,896	40%	31%	108,214
SAMN04412751	Node 2 axillary bud (Pisum sativum, 7 days, SAMN04412751)	26,194,356	30%	22%	107,387
SAMN04412754	Node 2 axillary bud (Pisum sativum, 7 days, SAMN04412754)	42,641,438	42%	22%	119,151
SAMN04412757	Node 2 axillary bud (Pisum sativum, 7 days, SAMN04412757)	30,315,240	42%	24%	112,879
SAMN04412760	Node 2 axillary bud (Pisum sativum, 7 days, SAMN04412760)	29,117,312	46%	23%	113,751
SAMN10838899	immature seeds (Pisum sativum, SAMN10838899)	60,880,042	93%	40%	142,571
SAMN10838900	immature seeds (Pisum sativum, SAMN10838900)	48,876,526	92%	40%	140,141
SAMN10838901	immature seeds (Pisum sativum, SAMN10838901)	41,103,814	91%	40%	137,754
SAMN10838902	immature seeds (Pisum sativum, SAMN10838902)	57,921,512	90%	40%	142,415
SAMN10838903	immature seeds (Pisum sativum, SAMN10838903)	50,149,378	91%	40%	140,086
SAMN10838904	immature seeds (Pisum sativum, SAMN10838904)	51,433,250	92%	40%	142,253
SAMN10838905	immature seeds (Pisum sativum, SAMN10838905)	56,355,920	92%	40%	142,666
SAMN10838906	immature seeds (Pisum sativum, SAMN10838906)	55,633,184	92%	39%	142,695
SAMN10838907	immature seeds (Pisum sativum, SAMN10838907)	69,134,526	91%	40%	143,851
SAMN10838908	immature seeds (Pisum sativum, SAMN10838908)	37,600,312	90%	40%	136,720
SAMN10838909	immature seeds (Pisum sativum, SAMN10838909)	31,229,916	90%	40%	133,844
SAMN10838910	immature seeds (Pisum sativum, SAMN10838910)	36,615,110	91%	40%	136,400
SAMN10838911	immature seeds (Pisum sativum, SAMN10838911)	51,980,148	91%	40%	141,344
SAMN10838912	immature seeds (Pisum sativum, SAMN10838912)	49,376,078	92%	40%	140,813
SAMN12027149	pod (Pisum sativum, SAMN12027149)	53,177,298	96%	36%	122,601
SAMN12027150	pod (Pisum sativum, SAMN12027150)	46,603,202	96%	36%	122,451
SAMN12027151	pod (Pisum sativum, SAMN12027151)	48,464,020	92%	36%	129,931
SAMN12027152	pod (Pisum sativum, SAMN12027152)	48,697,994	95%	36%	129,251
SAMN12027153	pod (Pisum sativum, SAMN12027153)	88,118,240	96%	35%	137,270
SAMN12027154	pod (Pisum sativum, SAMN12027154)	47,763,822	96%	36%	131,689
SAMN12027155	pod (Pisum sativum, SAMN12027155)	46,031,500	94%	35%	125,362
SAMN12027156	pod (Pisum sativum, SAMN12027156)	77,183,790	94%	36%	132,314
SAMN12027157	pod (Pisum sativum, SAMN12027157)	73,290,302	94%	36%	131,539
SAMN12027158	pod (Pisum sativum, SAMN12027158)	46,463,170	90%	34%	126,818
SAMN12027159	pod (Pisum sativum, SAMN12027159)	52,649,984	95%	35%	129,715
SAMN12027160	pod (Pisum sativum, SAMN12027160)	44,744,222	95%	36%	129,207
SAMN12027161	pod (Pisum sativum, SAMN12027161)	58,309,564	86%	31%	119,708
SAMN12027162	pod (Pisum sativum, SAMN12027162)	45,361,508	86%	31%	115,156
SAMN12027163	pod (Pisum sativum, SAMN12027163)	65,635,850	86%	31%	121,222
SAMN12027164	pod (Pisum sativum, SAMN12027164)	50,178,144	93%	34%	130,027
SAMN12027165	pod (Pisum sativum, SAMN12027165)	49,062,324	96%	35%	131,132
SAMN12027166	pod (Pisum sativum, SAMN12027166)	50,407,392	92%	34%	133,598
SAMN12027167	pod (Pisum sativum, SAMN12027167)	52,540,204	96%	35%	134,983
SAMN12027168	pod (Pisum sativum, SAMN12027168)	43,241,374	95%	35%	131,636
SAMN12027169	pod (Pisum sativum, SAMN12027169)	64,373,550	96%	35%	137,096
SAMN12027170	pod (Pisum sativum, SAMN12027170)	70,121,000	96%	36%	138,063
SAMN12027171	pod (Pisum sativum, SAMN12027171)	42,866,954	96%	36%	132,162
SAMN12027172	pod (Pisum sativum, SAMN12027172)	50,629,222	96%	36%	133,603
SAMN12027173	pod (Pisum sativum, SAMN12027173)	46,291,538	95%	36%	131,044
SAMN12027174	pod (Pisum sativum, SAMN12027174)	55,535,564	96%	36%	134,895
SAMN12027175	pod (Pisum sativum, SAMN12027175)	56,985,700	95%	35%	133,477
SAMN12027176	pod (Pisum sativum, SAMN12027176)	51,783,256	93%	33%	128,195
SAMN12027177	pod (Pisum sativum, SAMN12027177)	64,124,068	93%	33%	130,460
SAMN12027178	pod (Pisum sativum, SAMN12027178)	51,513,084	93%	33%	126,800
SAMN12568801	seed (Pisum sativum, SAMN12568801)	59,789,258	96%	38%	137,676
SAMN12568802	seed (Pisum sativum, SAMN12568802)	64,380,286	96%	38%	136,430
SAMN12568803	seed (Pisum sativum, SAMN12568803)	42,832,440	86%	38%	129,144
SAMN12568804	seed (Pisum sativum, SAMN12568804)	47,660,108	95%	38%	131,837
SAMN12568805	seed (Pisum sativum, SAMN12568805)	46,695,054	96%	39%	124,790
SAMN12568806	seed (Pisum sativum, SAMN12568806)	48,183,842	96%	38%	133,981
SAMN12568807	seed (Pisum sativum, SAMN12568807)	40,273,992	97%	36%	112,940
SAMN12568808	seed (Pisum sativum, SAMN12568808)	39,276,408	97%	35%	110,801
SAMN12568809	seed (Pisum sativum, SAMN12568809)	55,537,474	98%	37%	115,195
SAMN12568810	seed (Pisum sativum, SAMN12568810)	43,704,610	97%	30%	105,631
SAMN12568811	seed (Pisum sativum, SAMN12568811)	58,785,472	97%	30%	108,700
SAMN12568812	seed (Pisum sativum, SAMN12568812)	51,671,028	97%	30%	108,960
SAMN12568813	seed (Pisum sativum, SAMN12568813)	82,106,892	96%	32%	116,113
SAMN12568814	seed (Pisum sativum, SAMN12568814)	62,247,762	95%	32%	112,513
SAMN12568815	seed (Pisum sativum, SAMN12568815)	55,000,560	96%	32%	111,379
SAMN12568816	seed (Pisum sativum, SAMN12568816)	59,207,402	97%	39%	137,768
SAMN12568817	seed (Pisum sativum, SAMN12568817)	74,721,590	95%	37%	144,356
SAMN12568818	seed (Pisum sativum, SAMN12568818)	80,980,998	96%	38%	145,946
SAMN12568819	seed (Pisum sativum, SAMN12568819)	72,018,954	96%	38%	140,254
SAMN12568820	seed (Pisum sativum, SAMN12568820)	53,421,032	96%	38%	135,974
SAMN12568821	seed (Pisum sativum, SAMN12568821)	72,862,098	97%	38%	137,833
SAMN12568822	seed (Pisum sativum, SAMN12568822)	55,732,762	98%	36%	117,868
SAMN12568823	seed (Pisum sativum, SAMN12568823)	47,733,924	98%	37%	115,388
SAMN12568824	seed (Pisum sativum, SAMN12568824)	40,321,180	98%	36%	111,962
SAMN12568825	seed (Pisum sativum, SAMN12568825)	48,105,006	97%	30%	106,744
SAMN12568826	seed (Pisum sativum, SAMN12568826)	50,271,474	96%	30%	101,023
SAMN12568827	seed (Pisum sativum, SAMN12568827)	52,261,548	96%	30%	106,562
SAMN12568828	seed (Pisum sativum, SAMN12568828)	48,185,930	96%	28%	112,786
SAMN12568829	seed (Pisum sativum, SAMN12568829)	43,871,140	97%	28%	112,314
SAMN12568830	seed (Pisum sativum, SAMN12568830)	56,528,730	96%	28%	109,173
SAMN14838828	Seed (Pisum sativum, SAMN14838828)	8,880,702	86%	37%	115,785
SAMN20570107	leaf (Pisum sativum, 1-week-old, SAMN20570107)	134,042,620	90%	38%	141,593
SAMN23644683	Shoot apex (Pisum sativum, SAMN23644683)	37,799,752	91%	39%	135,045
SAMN23644684	Shoot apex (Pisum sativum, SAMN23644684)	40,761,054	90%	39%	137,712
SAMN23644685	Shoot apex (Pisum sativum, SAMN23644685)	46,507,582	90%	39%	136,809
SAMN23644686	Shoot apex (Pisum sativum, SAMN23644686)	35,596,180	92%	40%	136,280
SAMN23644687	Shoot apex (Pisum sativum, SAMN23644687)	32,664,786	91%	40%	133,627
SAMN23644688	Shoot apex (Pisum sativum, SAMN23644688)	37,373,424	91%	39%	134,449
SAMN23644689	Shoot apex (Pisum sativum, SAMN23644689)	40,484,872	92%	40%	140,078
SAMN23644690	Shoot apex (Pisum sativum, SAMN23644690)	40,906,340	92%	39%	142,849
SAMN23644691	Shoot apex (Pisum sativum, SAMN23644691)	37,434,964	93%	40%	139,520
SAMN23644692	Shoot apex (Pisum sativum, SAMN23644692)	44,301,408	90%	38%	136,638
SAMN23644693	Shoot apex (Pisum sativum, SAMN23644693)	31,389,770	91%	39%	133,496
SAMN23644694	Shoot apex (Pisum sativum, SAMN23644694)	35,478,774	91%	39%	135,235
SAMN23644695	Shoot apex (Pisum sativum, SAMN23644695)	41,042,640	91%	39%	134,808
SAMN23644696	Shoot apex (Pisum sativum, SAMN23644696)	40,721,448	90%	39%	135,371
SAMN23644697	Shoot apex (Pisum sativum, SAMN23644697)	37,726,598	91%	39%	135,320
SAMN23644698	Shoot apex (Pisum sativum, SAMN23644698)	43,511,682	91%	38%	143,180
SAMN23644699	Shoot apex (Pisum sativum, SAMN23644699)	33,092,898	92%	39%	136,580
SAMN23644700	Shoot apex (Pisum sativum, SAMN23644700)	35,843,272	92%	39%	139,800
SAMN24390291	Stems and leaves (Pisum sativum, 10day, SAMN24390291)	68,243,348	176%	36%	156,869
SAMN24390292	Stems and leaves (Pisum sativum, 10day, SAMN24390292)	69,382,683	176%	36%	158,890
SAMN24425025	Cauline tissue (Pisum sativum, 10 day, SAMN24425025)	70,244,910	178%	36%	159,730
SAMN24425026	Cauline tissue (Pisum sativum, 10 day, SAMN24425026)	68,607,786	175%	35%	159,180

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR244486	DRX234283	DRP006562	SAMD00246357	26,368,142	69%	29%
DRR244487	DRX234283	DRP006562	SAMD00246357	22,368,666	73%	31%
DRR244488	DRX234283	DRP006562	SAMD00246357	24,011,072	69%	30%
DRR244489	DRX234284	DRP006562	SAMD00246358	20,463,504	75%	32%
DRR244490	DRX234284	DRP006562	SAMD00246358	14,257,332	56%	23%
DRR244491	DRX234284	DRP006562	SAMD00246358	21,171,748	72%	31%
SRR3159235	SRX1569470	SRP068822	SAMN04412739	5,597,198	51%	34%
SRR3159236	SRX1569471	SRP068822	SAMN04412739	3,862,094	53%	32%
SRR3159238	SRX1569473	SRP068822	SAMN04412742	10,588,436	25%	15%
SRR3159239	SRX1569474	SRP068822	SAMN04412742	9,294,782	28%	12%
SRR3159240	SRX1569475	SRP068822	SAMN04412742	9,757,014	36%	28%
SRR3159241	SRX1569476	SRP068822	SAMN04412745	8,444,326	62%	22%
SRR3159242	SRX1569477	SRP068822	SAMN04412745	2,550,234	21%	33%
SRR3159243	SRX1569478	SRP068822	SAMN04412745	10,240,674	45%	30%
SRR3159244	SRX1569479	SRP068822	SAMN04412748	5,449,100	54%	32%
SRR3159245	SRX1569480	SRP068822	SAMN04412748	6,328,946	39%	32%
SRR3159246	SRX1569481	SRP068822	SAMN04412748	4,322,850	25%	30%
SRR3159247	SRX1569482	SRP068822	SAMN04412751	7,566,764	25%	20%
SRR3159248	SRX1569483	SRP068822	SAMN04412751	6,477,452	72%	25%
SRR3159249	SRX1569484	SRP068822	SAMN04412751	12,150,140	10%	13%
SRR3159250	SRX1569485	SRP068822	SAMN04412754	11,769,460	42%	26%
SRR3159251	SRX1569486	SRP068822	SAMN04412754	10,669,616	39%	29%
SRR3159252	SRX1569487	SRP068822	SAMN04412754	20,202,362	45%	16%
SRR3162503	SRX1542976	SRP068822	SAMN04412757	19,745,008	45%	22%
SRR3159253	SRX1569488	SRP068822	SAMN04412757	4,688,040	37%	22%
SRR3159254	SRX1569489	SRP068822	SAMN04412757	5,882,192	37%	31%
SRR3159255	SRX1569490	SRP068822	SAMN04412760	3,768,760	14%	25%
SRR3159256	SRX1569491	SRP068822	SAMN04412760	5,603,544	71%	24%
SRR3159257	SRX1569492	SRP068822	SAMN04412760	19,745,008	45%	22%
SRR8709719	SRX5504167	SRP188118	SAMN10838899	60,880,042	93%	40%
SRR8709720	SRX5504166	SRP188118	SAMN10838900	48,876,526	92%	40%
SRR8709718	SRX5504169	SRP188118	SAMN10838901	41,103,814	91%	40%
SRR8709728	SRX5504168	SRP188118	SAMN10838902	57,921,512	90%	40%
SRR8709723	SRX5504163	SRP188118	SAMN10838903	50,149,378	91%	40%
SRR8709724	SRX5504162	SRP188118	SAMN10838904	51,433,250	92%	40%
SRR8709721	SRX5504165	SRP188118	SAMN10838905	56,355,920	92%	40%
SRR8709722	SRX5504164	SRP188118	SAMN10838906	55,633,184	92%	39%
SRR8709725	SRX5504161	SRP188118	SAMN10838907	69,134,526	91%	40%
SRR8709726	SRX5504160	SRP188118	SAMN10838908	37,600,312	90%	40%
SRR8709730	SRX5504157	SRP188118	SAMN10838909	31,229,916	90%	40%
SRR8709731	SRX5504156	SRP188118	SAMN10838910	36,615,110	91%	40%
SRR8709727	SRX5504159	SRP188118	SAMN10838911	51,980,148	91%	40%
SRR8709729	SRX5504158	SRP188118	SAMN10838912	49,376,078	92%	40%
SRR9276690	SRX6046501	SRP201163	SAMN12027149	53,177,298	96%	36%
SRR9276691	SRX6046500	SRP201163	SAMN12027150	46,603,202	96%	36%
SRR9276692	SRX6046499	SRP201163	SAMN12027151	48,464,020	92%	36%
SRR9276693	SRX6046498	SRP201163	SAMN12027152	48,697,994	95%	36%
SRR9276694	SRX6046497	SRP201163	SAMN12027153	88,118,240	96%	35%
SRR9276695	SRX6046496	SRP201163	SAMN12027154	47,763,822	96%	36%
SRR9276696	SRX6046495	SRP201163	SAMN12027155	46,031,500	94%	35%
SRR9276697	SRX6046494	SRP201163	SAMN12027156	77,183,790	94%	36%
SRR9276698	SRX6046493	SRP201163	SAMN12027157	73,290,302	94%	36%
SRR9276699	SRX6046492	SRP201163	SAMN12027158	46,463,170	90%	34%
SRR9276714	SRX6046477	SRP201163	SAMN12027159	52,649,984	95%	35%
SRR9276715	SRX6046476	SRP201163	SAMN12027160	44,744,222	95%	36%
SRR9276716	SRX6046475	SRP201163	SAMN12027161	58,309,564	86%	31%
SRR9276717	SRX6046474	SRP201163	SAMN12027162	45,361,508	86%	31%
SRR9276710	SRX6046481	SRP201163	SAMN12027163	65,635,850	86%	31%
SRR9276711	SRX6046480	SRP201163	SAMN12027164	50,178,144	93%	34%
SRR9276712	SRX6046479	SRP201163	SAMN12027165	49,062,324	96%	35%
SRR9276713	SRX6046478	SRP201163	SAMN12027166	50,407,392	92%	34%
SRR9276718	SRX6046473	SRP201163	SAMN12027167	52,540,204	96%	35%
SRR9276719	SRX6046472	SRP201163	SAMN12027168	43,241,374	95%	35%
SRR9276701	SRX6046490	SRP201163	SAMN12027169	64,373,550	96%	35%
SRR9276700	SRX6046491	SRP201163	SAMN12027170	70,121,000	96%	36%
SRR9276703	SRX6046488	SRP201163	SAMN12027171	42,866,954	96%	36%
SRR9276702	SRX6046489	SRP201163	SAMN12027172	50,629,222	96%	36%
SRR9276705	SRX6046486	SRP201163	SAMN12027173	46,291,538	95%	36%
SRR9276704	SRX6046487	SRP201163	SAMN12027174	55,535,564	96%	36%
SRR9276707	SRX6046484	SRP201163	SAMN12027175	56,985,700	95%	35%
SRR9276706	SRX6046485	SRP201163	SAMN12027176	51,783,256	93%	33%
SRR9276709	SRX6046482	SRP201163	SAMN12027177	64,124,068	93%	33%
SRR9276708	SRX6046483	SRP201163	SAMN12027178	51,513,084	93%	33%
SRR9963107	SRX6710622	SRP218269	SAMN12568801	59,789,258	96%	38%
SRR9963108	SRX6710621	SRP218269	SAMN12568802	64,380,286	96%	38%
SRR9963109	SRX6710620	SRP218269	SAMN12568803	42,832,440	86%	38%
SRR9963110	SRX6710619	SRP218269	SAMN12568804	47,660,108	95%	38%
SRR9963103	SRX6710626	SRP218269	SAMN12568805	46,695,054	96%	39%
SRR9963104	SRX6710625	SRP218269	SAMN12568806	48,183,842	96%	38%
SRR9963105	SRX6710624	SRP218269	SAMN12568807	40,273,992	97%	36%
SRR9963106	SRX6710623	SRP218269	SAMN12568808	39,276,408	97%	35%
SRR9963111	SRX6710618	SRP218269	SAMN12568809	55,537,474	98%	37%
SRR9963112	SRX6710617	SRP218269	SAMN12568810	43,704,610	97%	30%
SRR9963113	SRX6710616	SRP218269	SAMN12568811	58,785,472	97%	30%
SRR9963114	SRX6710615	SRP218269	SAMN12568812	51,671,028	97%	30%
SRR9963115	SRX6710614	SRP218269	SAMN12568813	82,106,892	96%	32%
SRR9963116	SRX6710613	SRP218269	SAMN12568814	62,247,762	95%	32%
SRR9963117	SRX6710612	SRP218269	SAMN12568815	55,000,560	96%	32%
SRR9963118	SRX6710611	SRP218269	SAMN12568816	59,207,402	97%	39%
SRR9963119	SRX6710610	SRP218269	SAMN12568817	74,721,590	95%	37%
SRR9963120	SRX6710609	SRP218269	SAMN12568818	80,980,998	96%	38%
SRR9963121	SRX6710608	SRP218269	SAMN12568819	72,018,954	96%	38%
SRR9963122	SRX6710607	SRP218269	SAMN12568820	53,421,032	96%	38%
SRR9963100	SRX6710629	SRP218269	SAMN12568821	72,862,098	97%	38%
SRR9963099	SRX6710630	SRP218269	SAMN12568822	55,732,762	98%	36%
SRR9963102	SRX6710627	SRP218269	SAMN12568823	47,733,924	98%	37%
SRR9963101	SRX6710628	SRP218269	SAMN12568824	40,321,180	98%	36%
SRR9963096	SRX6710633	SRP218269	SAMN12568825	48,105,006	97%	30%
SRR9963095	SRX6710634	SRP218269	SAMN12568826	50,271,474	96%	30%
SRR9963098	SRX6710631	SRP218269	SAMN12568827	52,261,548	96%	30%
SRR9963097	SRX6710632	SRP218269	SAMN12568828	48,185,930	96%	28%
SRR9963094	SRX6710635	SRP218269	SAMN12568829	43,871,140	97%	28%
SRR9963093	SRX6710636	SRP218269	SAMN12568830	56,528,730	96%	28%
SRR11729065	SRX8288321	SRP260465	SAMN14838828	8,880,702	86%	37%
SRR15345837	SRX11650059	SRP331130	SAMN20570107	134,042,620	90%	38%
SRR17134294	SRX13318733	SRP349288	SAMN23644683	37,799,752	91%	39%
SRR17134293	SRX13318734	SRP349288	SAMN23644684	40,761,054	90%	39%
SRR17134284	SRX13318743	SRP349288	SAMN23644685	46,507,582	90%	39%
SRR17134283	SRX13318744	SRP349288	SAMN23644686	35,596,180	92%	40%
SRR17134282	SRX13318745	SRP349288	SAMN23644687	32,664,786	91%	40%
SRR17134281	SRX13318746	SRP349288	SAMN23644688	37,373,424	91%	39%
SRR17134280	SRX13318747	SRP349288	SAMN23644689	40,484,872	92%	40%
SRR17134279	SRX13318748	SRP349288	SAMN23644690	40,906,340	92%	39%
SRR17134278	SRX13318749	SRP349288	SAMN23644691	37,434,964	93%	40%
SRR17134277	SRX13318750	SRP349288	SAMN23644692	44,301,408	90%	38%
SRR17134292	SRX13318735	SRP349288	SAMN23644693	31,389,770	91%	39%
SRR17134291	SRX13318736	SRP349288	SAMN23644694	35,478,774	91%	39%
SRR17134290	SRX13318737	SRP349288	SAMN23644695	41,042,640	91%	39%
SRR17134289	SRX13318738	SRP349288	SAMN23644696	40,721,448	90%	39%
SRR17134288	SRX13318739	SRP349288	SAMN23644697	37,726,598	91%	39%
SRR17134287	SRX13318740	SRP349288	SAMN23644698	43,511,682	91%	38%
SRR17134286	SRX13318741	SRP349288	SAMN23644699	33,092,898	92%	39%
SRR17134285	SRX13318742	SRP349288	SAMN23644700	35,843,272	92%	39%
SRR17331925	SRX13507482	SRP352467	SAMN24390291	68,243,348	176%	36%
SRR17331924	SRX13507483	SRP352467	SAMN24390292	69,382,683	176%	36%
SRR17332954	SRX13508508	SRP352511	SAMN24425025	70,244,910	178%	36%
SRR17332953	SRX13508509	SRP352511	SAMN24425026	68,607,786	175%	35%
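
The by-sample read counts can be cross-checked against the by-run table by summing the runs that belong to each sample. The following sketch does this for two samples, with values copied by hand from the two tables; the variable names are illustrative only.

```python
from collections import defaultdict

# (run, sample, number of reads) copied from the by-run table.
runs = [
    ("DRR244486", "SAMD00246357", 26_368_142),
    ("DRR244487", "SAMD00246357", 22_368_666),
    ("DRR244488", "SAMD00246357", 24_011_072),
    ("SRR3162503", "SAMN04412757", 19_745_008),
    ("SRR3159253", "SAMN04412757", 4_688_040),
    ("SRR3159254", "SAMN04412757", 5_882_192),
]

# Read totals copied from the by-sample table.
reported = {"SAMD00246357": 72_747_880, "SAMN04412757": 30_315_240}

# Sum the reads of all runs that share a sample identifier.
reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, total in reads_per_sample.items():
    print(sample, total, "matches" if total == reported[sample] else "differs")
```

A mismatch for a given sample would suggest that one or more of its runs is absent from the by-run table.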

SRA Long Read Alignment Statistics

The alignments of the following long RNA-Seq reads (PacBio, Oxford Nanopore, 454, or other long-read sequencing technologies) from the Sequence Read Archive with minimap2 were used for gene prediction:

Run	Sample	Number of reads	Number (%) of sequences aligned by Minimap2	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
All	NA	11524060	10736062 (93.16%)	8834655 (76.66%)	99.16	97.21
SRR1302612	SAMN02803824	520797	475912 (91.38%)	336941 (64.69%)	99.35	97.73
SRR1302614	SAMN02803825	559999	523966 (93.56%)	411039 (73.39%)	99.39	98.04
SRR1302616	SAMN02803826	563572	516140 (91.58%)	391774 (69.51%)	99.36	97.95
SRR1302617	SAMN02803827	554995	512687 (92.37%)	406526 (73.24%)	99.34	97.92
SRR1302618	SAMN02803828	507847	471670 (92.87%)	375319 (73.90%)	99.33	98.1
SRR1302619	SAMN02803829	563976	512625 (90.89%)	382900 (67.89%)	99.33	97.84
SRR2089734	SAMN03846793	312266	293855 (94.10%)	250706 (80.28%)	99.5	99.18
SRR2089735	SAMN03846794	349886	327765 (93.67%)	280719 (80.23%)	99.51	99.1
SRR2089736	SAMN03846795	224692	211218 (94.00%)	178952 (79.64%)	99.5	99.04
SRR2089737	SAMN03846796	225896	211919 (93.81%)	180270 (79.80%)	99.47	98.86
SRR2089738	SAMN03846798	228860	214998 (93.94%)	185824 (81.19%)	99.48	98.88
SRR3204831	SAMN04309034	1595851	1480241 (92.75%)	1273513 (79.80%)	98.98	96.18
SRR3204832	SAMN04309034	688526	646049 (93.83%)	572972 (83.21%)	99.02	96.28
SRR3204833	SAMN04309034	800100	750338 (93.78%)	655017 (81.86%)	99.01	96.27
SRR934439	SAMN02230965	496034	464820 (93.70%)	385171 (77.65%)	98.42	94.4
SRR934440	SAMN02230966	574074	540068 (94.07%)	446938 (77.85%)	98.53	95.11
SRR934441	SAMN02230967	526038	497582 (94.59%)	410830 (78.09%)	98.68	94.69
SRR934442	SAMN02230968	413098	383046 (92.72%)	303024 (73.35%)	98.45	94.07
SRR934443	SAMN02230969	474380	440911 (92.94%)	355246 (74.88%)	98.44	94.6
SRR934444	SAMN02230970	591513	555416 (93.89%)	452241 (76.45%)	98.53	94.88
SRR934445	SAMN02230971	365255	341471 (93.48%)	288884 (79.09%)	98.68	94.14
SRR934446	SAMN02230972	386405	363365 (94.03%)	309849 (80.18%)	98.73	94.76

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Theobroma cacao high-quality model RefSeq (XP_)	13,536	13,025 (96.22%)	13,025 (96.22%)	69.27%	77.98%
Cucurbita maxima high-quality model RefSeq (XP_)	18,981	18,531 (97.63%)	18,531 (97.63%)	69.53%	77.17%
Arabidopsis thaliana GenBank	53,281	48,820 (91.63%)	48,820 (91.63%)	69.32%	75.97%
Arabidopsis thaliana known RefSeq (NP_)	48,147	41,652 (86.51%)	41,652 (86.51%)	67.14%	71.93%
Fabaceae GenBank	42,139	39,503 (93.74%)	39,503 (93.74%)	73.98%	84.87%
Fabaceae known RefSeq (NP_)	8,466	8,256 (97.52%)	8,256 (97.52%)	72.82%	83.03%
Same-species GenBank	1,482	1,435 (96.83%)	1,435 (96.83%)	78.73%	87.41%
Nelumbo nucifera high-quality model RefSeq (XP_)	14,296	13,824 (96.70%)	13,824 (96.70%)	69.44%	77.59%
Trifolium pratense high-quality model RefSeq (XP_)	19,172	18,395 (95.95%)	18,395 (95.95%)	73.69%	84.90%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013 Jan 1;29(1):15-21.
Minimap2: Li H. Bioinformatics 2018 Sep 15;34(18):3094-3100
