NCBI Solanum verrucosum Annotation Release 100

The RefSeq genome records for Solanum verrucosum were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Solanum verrucosum Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Jun 24 2022
Date of submission of annotation to the public databases: Jul 23 2022
Software version: 10.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
falcon-dt-bn	GCF_900185275.1	Earlham Institute	03-13-2019	Reference	1 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	falcon-dt-bn
Genes and pseudogenes	34,471
protein-coding	27,956
non-coding	2,372
Transcribed pseudogenes	141
Non-transcribed pseudogenes	4,002
genes with variants	3,694
Immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	33,605
fully-supported	27,696
with > 5% ab initio	4,392
partial	185
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	33,605
non-coding RNAs	2,878
fully-supported	1,010
with > 5% ab initio	0
partial	1
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	2,016
pseudo transcripts	141
fully-supported	126
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	141
CDSs	33,690
fully-supported	27,696
with > 5% ab initio	4,506
partial	185
with major correction(s)	654
known RefSeq (NP_)	0
model RefSeq (XP_)	33,690
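
As a quick consistency check on the table above, the four gene categories can be summed and compared with the reported "Genes and pseudogenes" total. The sketch below hard-codes the counts from the falcon-dt-bn column; the variable names are illustrative and are not part of any NCBI tool.

```python
# Counts copied from the "Feature counts" table (falcon-dt-bn column).
gene_categories = {
    "protein-coding": 27_956,
    "non-coding": 2_372,
    "transcribed pseudogenes": 141,
    "non-transcribed pseudogenes": 4_002,
}
reported_total = 34_471  # "Genes and pseudogenes" row

# The four categories should add up to the reported total.
computed_total = sum(gene_categories.values())

# Genes excluding pseudogenes: protein-coding plus non-coding.
genes_without_pseudogenes = (
    gene_categories["protein-coding"] + gene_categories["non-coding"]
)

print(computed_total == reported_total)
print(genes_without_pseudogenes)
```

The second value, 30,328, is the gene count reported in the "Feature lengths" table, which excludes pseudogenes.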

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	30,328	8,085	2,842	61	1,808,044
All transcripts	36,483	1,542	1,333	38	39,431
mRNA	33,605	1,631	1,397	102	39,431
misc_RNA	465	1,870	1,673	116	7,821
tRNA	854	74	73	60	92
lncRNA	546	439	338	38	2,835
snoRNA	402	107	104	61	221
snRNA	310	136	118	98	198
rRNA	301	566	119	103	3,394
Single-exon transcripts	5,180	1,184	985	114	14,153
coding transcripts (NM_/XM_)	5,180	1,184	985	114	14,153
CDSs	33,690	1,376	1,137	90	39,431
Exons	160,327	287	156	1	14,174
in coding transcripts (NM_/XM_)	158,108	289	157	1	14,174
in non-coding transcripts (NR_/XR_)	4,762	187	116	5	4,436
Introns	128,856	2,065	251	30	1,199,353
in coding transcripts (NM_/XM_)	127,471	2,003	249	30	1,199,353
in non-coding transcripts (NR_/XR_)	3,889	3,420	324	30	513,475

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.21	1	1	44
Number of exons per transcript	5.82	4	1	80

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode with the solanales_odb10 lineage dataset on the annotated gene set, using the longest protein of each gene. Results are reported for the gene set from the primary assembly unit, and presented in BUSCO notation.
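
BUSCO notation packs the complete (C), single-copy (S), duplicated (D), fragmented (F) and missing (M) percentages and the number of BUSCOs searched (n) into one string. The following is a minimal sketch of how such a string could be parsed. The function name `parse_busco` and the example string are assumptions made for illustration; the example values are placeholders and are not the results of this annotation release.

```python
import re

# Pattern for a BUSCO one-line summary such as
# "C:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:1000".
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)

def parse_busco(summary: str) -> dict:
    """Return the percentages (floats) and n (int) from a BUSCO summary string."""
    match = BUSCO_PATTERN.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    fields = match.groupdict()
    result = {key: float(fields[key]) for key in "CSDFM"}
    result["n"] = int(fields["n"])
    return result

# Placeholder values for illustration only, NOT the results of this release.
example = "C:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:1000"
scores = parse_busco(example)
print(scores)
```

In this notation the single-copy and duplicated percentages add up to the complete percentage, which gives a simple sanity check on a parsed summary.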

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Arabidopsis thaliana known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 27,871 coding genes, 24,843 genes had a protein with an alignment covering 50% or more of the query and 11,738 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
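
The coverage definitions above can be sketched in a few lines of Python. The helper `coverage_percent` and the alignment coordinates are hypothetical and only illustrate the arithmetic; they assume a single alignment with 1-based inclusive coordinates and do not reproduce how the pipeline itself computes coverage. The last two lines express the gene counts from the preceding paragraph as fractions of the coding genes.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence included in one alignment (1-based, inclusive)."""
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length

# Hypothetical alignment: residues 11-400 of a 400-residue annotated protein
# (query) aligned to residues 1-390 of a 520-residue Arabidopsis protein (target).
query_cov = coverage_percent(11, 400, 400)
target_cov = coverage_percent(1, 390, 520)
print(query_cov, target_cov)

# Gene counts quoted in the text, as a share of the coding genes.
coding_genes = 27_871
share_cov50 = round(100 * 24_843 / coding_genes, 2)
share_cov95 = round(100 * 11_738 / coding_genes, 2)
print(share_cov50, share_cov95)
```

With these numbers, roughly 89% of coding genes have a protein aligned over at least half of its length, and roughly 42% have one aligned over at least 95% of its length.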

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

[Graph not shown. Query: annotated proteins; target: Arabidopsis thaliana known RefSeq proteins]

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
falcon-dt-bn	GCF_900185275.1	37.02%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	6	6 (100.00%)	6 (100.00%)	99.67%	99.77%
Solanum lycopersicum known RefSeq (NM_/NR_)	2,958	2,899 (98.01%)	2,105 (71.16%)	94.03%	97.58%
Solanum lycopersicum GenBank	17,967	17,359 (96.62%)	12,103 (67.36%)	93.90%	96.80%
Solanum lycopersicum EST	300,738	235,086 (78.17%)	212,500 (70.66%)	94.77%	98.45%
Solanum known RefSeq (NM_/NR_)	56	54 (96.43%)	46 (82.14%)	94.89%	98.32%
Solanum GenBank	1,547	1,296 (83.78%)	844 (54.56%)	94.43%	97.92%
Solanum EST	177,787	105,816 (59.52%)	83,221 (46.81%)	93.86%	97.77%
Solanum tuberosum known RefSeq (NM_/NR_)	1,122	1,084 (96.61%)	986 (87.88%)	97.76%	99.03%
Solanum tuberosum GenBank	2,982	2,918 (97.85%)	2,129 (71.40%)	97.64%	98.81%
Solanum tuberosum EST	250,110	206,564 (82.59%)	193,358 (77.31%)	97.20%	98.55%
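
The "Number (%)" cells in the table above combine a count and a percentage. The sketch below, using a hypothetical helper `parse_count_cell`, splits such a cell and recomputes the percentages for the Solanum lycopersicum known RefSeq row. It assumes that both percentages are relative to the number of sequences retrieved from Entrez, which is what the figures in that row suggest.

```python
import re

def parse_count_cell(cell: str) -> tuple:
    """Split a cell such as '2,899 (98.01%)' into (count, percent)."""
    match = re.fullmatch(r"([\d,]+) \(([\d.]+)%\)", cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    count = int(match.group(1).replace(",", ""))
    percent = float(match.group(2))
    return count, percent

# Values copied from the "Solanum lycopersicum known RefSeq (NM_/NR_)" row.
retrieved = int("2,958".replace(",", ""))
aligned, aligned_pct = parse_count_cell("2,899 (98.01%)")
passed, passed_pct = parse_count_cell("2,105 (71.16%)")

# Recompute both percentages against the number of retrieved sequences.
recomputed_aligned_pct = round(100 * aligned / retrieved, 2)
recomputed_passed_pct = round(100 * passed / retrieved, 2)
print(recomputed_aligned_pct, recomputed_passed_pct)
```

Both recomputed values agree with the percentages printed in the table, so the share passed to Gnomon is expressed against the retrieved sequences, not against the aligned ones.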

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	2,493,867,697	58%	38%	155,261
SAMEA1695711	NA	S. berthaultii BR (Solanum berthaultii, SAMEA1695711)	14,145,325	0%	24%	205
SAMEA1695712	NA	S. berthaultii BR (Solanum berthaultii, SAMEA1695712)	11,879,941	0%	15%	112
SAMEA2276961	NA	LIB_5846 (Solanum lycopersicum, SAMEA2276961)	2,882,986	31%	11%	8,467
SAMEA2276963	NA	LIB_5847 (Solanum pimpinellifolium, SAMEA2276963)	4,423,896	32%	13%	12,591
SAMEA3486922	NA	S. americanum 944750095 (Solanum americanum, SAMEA3486922)	3,350,770	14%	30%	11,654
SAMEA4416263	NA	Rapid identification disease resistance genes from Solanum americanum by resistance gene enrichment sequencing (RenSeq) (Solanum americanum, SAMEA4416263)	14,971,800	10%	23%	18,441
SAMN00002433	20431087	Generic sample from Solanum lycopersicum (Solanum lycopersicum, SAMN00002433)	336,745	6%	51%	7,637
SAMN02338739	NA	roots (Solanum lycopersicum, 40 days, SAMN02338739)	11,717,236	5%	37%	60,216
SAMN02338740	NA	roots (Solanum lycopersicum, 40 days, SAMN02338740)	11,787,256	5%	38%	60,502
SAMN03252626	25637453	207_R2 (Solanum pimpinellifolium, two-months old, SAMN03252626)	1,867,236	51%	54%	74,486
SAMN03252627	25637453	207N_R2 (Solanum pimpinellifolium, two-months old, SAMN03252627)	2,954,332	53%	53%	82,988
SAMN03252628	25637453	207_R1 (Solanum pimpinellifolium, two-months old, SAMN03252628)	1,430,760	51%	53%	70,677
SAMN03252629	25637453	207N_R1 (Solanum pimpinellifolium, two-months old, SAMN03252629)	3,711,446	52%	53%	87,030
SAMN03252630	25637453	207N_R3 (Solanum pimpinellifolium, two-months old, SAMN03252630)	3,076,312	54%	54%	85,091
SAMN03252631	25637453	207_R3 (Solanum pimpinellifolium, two-months old, SAMN03252631)	2,874,204	51%	53%	83,214
SAMN03769434	NA	Whole fruits (Solanum pimpinellifolium, SAMN03769434)	6,811,272	55%	53%	87,031
SAMN03769435	NA	Whole fruit (Solanum pimpinellifolium, SAMN03769435)	5,466,275	55%	53%	82,992
SAMN03769436	NA	Whole fruit (Solanum pimpinellifolium, SAMN03769436)	5,741,414	56%	55%	85,093
SAMN03769437	NA	Whole fruit (Solanum pimpinellifolium, SAMN03769437)	15,017,081	54%	52%	94,023
SAMN03769438	NA	Whole fruit (Solanum pimpinellifolium, SAMN03769438)	11,823,653	51%	52%	92,259
SAMN03769440	NA	Whole fruit (Solanum pimpinellifolium, SAMN03769440)	14,857,280	56%	54%	95,646
SAMN03769441	NA	Whole fruit (Solanum pimpinellifolium, SAMN03769441)	9,653,322	52%	53%	88,590
SAMN03769442	NA	Whole fruit (Solanum pimpinellifolium, SAMN03769442)	10,327,618	52%	54%	88,491
SAMN03769443	NA	Whole fruit (Solanum pimpinellifolium, SAMN03769443)	14,544,672	51%	52%	93,637
SAMN03769444	NA	Seedling (Solanum pimpinellifolium, SAMN03769444)	32,863,088	56%	49%	103,518
SAMN03769445	NA	seedling (Solanum lycopersicum, SAMN03769445)	34,443,906	54%	49%	103,260
SAMN03769448	NA	flower (Solanum lycopersicum, SAMN03769448)	25,910,889	49%	51%	100,572
SAMN08728274	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728274)	56,518,266	65%	36%	110,965
SAMN08728275	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728275)	52,220,290	68%	36%	113,029
SAMN08728276	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728276)	47,208,490	68%	36%	111,576
SAMN08728277	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728277)	53,391,364	67%	36%	114,439
SAMN08728278	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728278)	55,360,452	69%	37%	111,426
SAMN08728279	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728279)	54,858,794	68%	36%	112,507
SAMN08728280	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728280)	49,473,822	67%	36%	112,875
SAMN08728281	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728281)	50,546,968	68%	37%	109,426
SAMN08728282	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728282)	49,889,066	68%	37%	113,176
SAMN08728283	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728283)	24,764,552	57%	34%	102,057
SAMN08728285	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728285)	52,528,202	69%	37%	109,215
SAMN08728286	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728286)	52,621,936	67%	37%	111,038
SAMN08728287	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728287)	64,134,908	67%	36%	114,516
SAMN08728288	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728288)	48,816,256	64%	36%	108,423
SAMN08728289	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728289)	60,000,038	68%	37%	110,992
SAMN08728290	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728290)	61,500,398	67%	36%	112,688
SAMN08728291	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728291)	67,815,674	67%	36%	112,870
SAMN08728292	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728292)	58,154,310	67%	36%	112,512
SAMN08728293	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728293)	52,293,508	65%	36%	110,962
SAMN08728294	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728294)	55,560,608	69%	35%	115,205
SAMN08728295	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728295)	106,752,820	67%	37%	119,231
SAMN08728296	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728296)	83,343,294	68%	36%	117,848
SAMN08728297	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728297)	44,797,046	61%	35%	109,433
SAMN08728298	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728298)	54,448,478	67%	37%	111,448
SAMN08728299	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728299)	53,662,856	68%	36%	111,814
SAMN08728300	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728300)	52,691,218	66%	38%	109,430
SAMN08728301	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728301)	56,722,800	67%	37%	112,626
SAMN08728302	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728302)	98,543,608	68%	36%	117,420
SAMN08728304	NA	Leaf (Solanum pennellii, 10 weeks, SAMN08728304)	110,823,514	68%	37%	117,842
SAMN09982066	NA	Leaf (Solanum tuberosum, SAMN09982066)	14,688,740	22%	36%	81,946
SAMN12623202	33769517	young leaves (Solanum appendiculatum, male, SAMN12623202)	57,553,314	38%	44%	113,413
SAMN12623203	33769517	young leaves (Solanum appendiculatum, female, SAMN12623203)	73,835,216	38%	44%	115,983
SAMN12623205	33769517	young leaves (Solanum appendiculatum, female, SAMN12623205)	72,543,508	37%	42%	115,584
SAMN12623206	33769517	flower buds (Solanum appendiculatum, male, SAMN12623206)	85,899,298	40%	44%	116,858
SAMN12623212	33769517	flowers (anthesis +1) (Solanum appendiculatum, male, SAMN12623212)	75,305,728	38%	43%	113,959
SAMN12623213	33769517	flowers (anthesis +1) (Solanum appendiculatum, female, SAMN12623213)	77,159,778	38%	42%	114,456
SAMN15198650	NA	style (Solanum habrochaites, SAMN15198650)	3,672,646	34%	56%	76,033
SAMN15198651	NA	style (Solanum habrochaites, SAMN15198651)	3,127,814	29%	54%	67,825
SAMN15198652	NA	style (Solanum habrochaites, SAMN15198652)	3,621,714	28%	45%	54,839
SAMN15198653	NA	style (Solanum habrochaites, SAMN15198653)	3,892,046	22%	49%	65,547
SAMN15198654	NA	style (Solanum habrochaites, SAMN15198654)	3,335,932	27%	56%	67,452
SAMN15198655	NA	style (Solanum habrochaites, SAMN15198655)	3,770,998	22%	50%	60,998
SAMN15198656	NA	style (Solanum habrochaites, SAMN15198656)	3,497,516	23%	48%	60,000
SAMN15198657	NA	style (Solanum habrochaites, SAMN15198657)	3,649,198	24%	52%	66,277

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR185927	ERX161479	ERP001918	SAMEA1695711	14,145,325	0%	24%
ERR185928	ERX161480	ERP001918	SAMEA1695712	11,879,941	0%	15%
ERR386162	ERX358392	ERP004467	SAMEA2276961	2,882,986	31%	11%
ERR386163	ERX358393	ERP004467	SAMEA2276963	4,423,896	32%	13%
ERR966153	ERX1043120	ERP011067	SAMEA3486922	3,350,770	14%	30%
ERR1607934	ERX1678501	ERP017007	SAMEA4416263	8,702,164	11%	23%
ERR1607935	ERX1678502	ERP017007	SAMEA4416263	6,269,636	9%	22%
SRR015436	SRX004022	SRP000732	SAMN00002433	336,745	6%	51%
SRR960387	SRX342183	SRP029380	SAMN02338739	11,717,236	5%	37%
SRR960405	SRX342200	SRP029380	SAMN02338740	11,787,256	5%	38%
SRR1693238	SRX796065	SRP050532	SAMN03252626	1,867,236	51%	54%
SRR1693235	SRX796062	SRP050532	SAMN03252627	2,954,332	53%	53%
SRR1693237	SRX796064	SRP050532	SAMN03252628	1,430,760	51%	53%
SRR1693234	SRX796061	SRP050532	SAMN03252629	3,711,446	52%	53%
SRR1693236	SRX796063	SRP050532	SAMN03252630	3,076,312	54%	54%
SRR1693239	SRX796066	SRP050532	SAMN03252631	2,874,204	51%	53%
SRR2064660	SRX1152125	SRP059571	SAMN03769434	3,099,826	59%	53%
SRR2064661	SRX1152007	SRP059571	SAMN03769435	2,511,943	57%	53%
SRR2064664	SRX1152127	SRP059571	SAMN03769436	2,665,102	58%	55%
SRR2064665	SRX1152010	SRP059571	SAMN03769437	6,895,063	56%	52%
SRR2064666	SRX1152128	SRP059571	SAMN03769438	5,200,527	54%	52%
SRR2064667	SRX1152011	SRP059571	SAMN03769440	6,860,320	58%	54%
SRR2064668	SRX1152012	SRP059571	SAMN03769441	4,882,566	56%	53%
SRR2064669	SRX1152130	SRP059571	SAMN03769442	5,359,290	54%	54%
SRR2064670	SRX1152131	SRP059571	SAMN03769443	7,419,194	54%	52%
SRR2064671	SRX1152013	SRP059571	SAMN03769444	13,933,456	58%	49%
SRR2064662	SRX1152008	SRP059571	SAMN03769445	14,977,178	56%	50%
SRR2064663	SRX1152009	SRP059571	SAMN03769448	11,665,983	50%	50%
SRR2356898	SRX1227042	SRP063634	SAMN03769434	3,711,446	52%	53%
SRR2356900	SRX1227043	SRP063634	SAMN03769435	2,954,332	53%	53%
SRR2356914	SRX1227049	SRP063634	SAMN03769436	3,076,312	54%	54%
SRR2356916	SRX1227050	SRP063634	SAMN03769437	8,122,018	52%	52%
SRR2356927	SRX1227054	SRP063634	SAMN03769438	6,623,126	48%	52%
SRR2356939	SRX1227058	SRP063634	SAMN03769440	7,996,960	55%	54%
SRR2356941	SRX1227063	SRP063634	SAMN03769441	4,770,756	49%	53%
SRR2356942	SRX1227064	SRP063634	SAMN03769442	4,968,328	50%	54%
SRR2356968	SRX1227066	SRP063634	SAMN03769443	7,125,478	48%	52%
SRR2356969	SRX1227083	SRP063634	SAMN03769444	18,929,632	54%	49%
SRR2356905	SRX1227045	SRP063634	SAMN03769445	19,466,728	52%	49%
SRR2356907	SRX1227046	SRP063634	SAMN03769448	14,244,906	49%	51%
SRR11977528	SRX8520898	SRP069274	SAMN15198650	3,672,646	34%	56%
SRR11977527	SRX8520899	SRP069274	SAMN15198651	3,127,814	29%	54%
SRR11977516	SRX8520910	SRP069274	SAMN15198652	3,621,714	28%	45%
SRR11977505	SRX8520921	SRP069274	SAMN15198653	3,892,046	22%	49%
SRR11977502	SRX8520924	SRP069274	SAMN15198654	3,335,932	27%	56%
SRR11977501	SRX8520925	SRP069274	SAMN15198655	3,770,998	22%	50%
SRR11977500	SRX8520926	SRP069274	SAMN15198656	3,497,516	23%	48%
SRR11977499	SRX8520927	SRP069274	SAMN15198657	3,649,198	24%	52%
SRR6862263	SRX3817233	SRP136022	SAMN08728274	56,518,266	65%	36%
SRR6862264	SRX3817232	SRP136022	SAMN08728275	52,220,290	68%	36%
SRR6862265	SRX3817231	SRP136022	SAMN08728276	47,208,490	68%	36%
SRR6862266	SRX3817230	SRP136022	SAMN08728277	53,391,364	67%	36%
SRR6862267	SRX3817229	SRP136022	SAMN08728278	55,360,452	69%	37%
SRR6862268	SRX3817228	SRP136022	SAMN08728279	54,858,794	68%	36%
SRR6862269	SRX3817227	SRP136022	SAMN08728280	49,473,822	67%	36%
SRR6862270	SRX3817226	SRP136022	SAMN08728281	50,546,968	68%	37%
SRR6862261	SRX3817235	SRP136022	SAMN08728282	49,889,066	68%	37%
SRR6862262	SRX3817234	SRP136022	SAMN08728283	24,764,552	57%	34%
SRR6862278	SRX3817218	SRP136022	SAMN08728285	52,528,202	69%	37%
SRR6862277	SRX3817219	SRP136022	SAMN08728286	52,621,936	67%	37%
SRR6862280	SRX3817216	SRP136022	SAMN08728287	64,134,908	67%	36%
SRR6862279	SRX3817217	SRP136022	SAMN08728288	48,816,256	64%	36%
SRR6862273	SRX3817223	SRP136022	SAMN08728289	60,000,038	68%	37%
SRR6862274	SRX3817222	SRP136022	SAMN08728290	61,500,398	67%	36%
SRR6862275	SRX3817221	SRP136022	SAMN08728291	67,815,674	67%	36%
SRR6862276	SRX3817220	SRP136022	SAMN08728292	58,154,310	67%	36%
SRR6862271	SRX3817225	SRP136022	SAMN08728293	52,293,508	65%	36%
SRR6862272	SRX3817224	SRP136022	SAMN08728294	55,560,608	69%	35%
SRR6862282	SRX3817214	SRP136022	SAMN08728295	106,752,820	67%	37%
SRR6862283	SRX3817213	SRP136022	SAMN08728296	83,343,294	68%	36%
SRR6862285	SRX3817211	SRP136022	SAMN08728297	44,797,046	61%	35%
SRR6862284	SRX3817212	SRP136022	SAMN08728298	54,448,478	67%	37%
SRR6862287	SRX3817209	SRP136022	SAMN08728299	53,662,856	68%	36%
SRR6862289	SRX3817207	SRP136022	SAMN08728300	52,691,218	66%	38%
SRR6862286	SRX3817210	SRP136022	SAMN08728301	56,722,800	67%	37%
SRR6862288	SRX3817208	SRP136022	SAMN08728302	98,543,608	68%	36%
SRR6862281	SRX3817215	SRP136022	SAMN08728304	110,823,514	68%	37%
SRR8053018	SRX4882673	SRP165682	SAMN09982066	14,688,740	22%	36%
SRR10027293	SRX6763547	SRP219422	SAMN12623202	29,638,448	37%	42%
SRR10027292	SRX6763548	SRP219422	SAMN12623202	27,914,866	40%	45%
SRR10027297	SRX6763543	SRP219422	SAMN12623203	38,128,650	36%	42%
SRR10027296	SRX6763544	SRP219422	SAMN12623203	35,706,566	39%	45%
SRR10027300	SRX6763540	SRP219422	SAMN12623205	40,601,250	36%	41%
SRR10027299	SRX6763541	SRP219422	SAMN12623205	31,942,258	39%	45%
SRR10027295	SRX6763545	SRP219422	SAMN12623206	46,129,242	38%	42%
SRR10027294	SRX6763546	SRP219422	SAMN12623206	39,770,056	41%	45%
SRR10027304	SRX6763536	SRP219422	SAMN12623212	39,771,460	37%	41%
SRR10027303	SRX6763537	SRP219422	SAMN12623212	35,534,268	40%	45%
SRR10027302	SRX6763538	SRP219422	SAMN12623213	40,649,882	37%	41%
SRR10027301	SRX6763539	SRP219422	SAMN12623213	36,509,896	39%	44%
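
Several samples were sequenced in more than one run, and the by-sample read counts are the sums of the corresponding by-run counts. The sketch below illustrates this for two samples, with the rows copied from the by-run table; the variable names are illustrative. Percentages are left out because they are rounded in both tables and cannot be reconstructed exactly.

```python
from collections import defaultdict

# (run, sample, number of reads) copied from the by-run table for two samples.
runs = [
    ("ERR1607934", "SAMEA4416263", 8_702_164),
    ("ERR1607935", "SAMEA4416263", 6_269_636),
    ("SRR10027293", "SAMN12623202", 29_638_448),
    ("SRR10027292", "SAMN12623202", 27_914_866),
]

# Sum the read counts of all runs that belong to the same sample.
reads_per_sample = defaultdict(int)
for run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, total in reads_per_sample.items():
    print(sample, f"{total:,}")
```

The totals, 14,971,800 and 57,553,314 reads, match the entries for SAMEA4416263 and SAMN12623202 in the by-sample table.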

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	5	5 (100.00%)	5 (100.00%)	67.43%	93.98%
Arabidopsis thaliana known RefSeq (NP_)	48,147	41,913 (87.05%)	41,913 (87.05%)	66.34%	71.02%
Solanaceae GenBank	14,784	14,214 (96.14%)	14,214 (96.14%)	75.03%	87.70%
Solanaceae known RefSeq (NP_)	5,566	5,472 (98.31%)	5,472 (98.31%)	76.68%	88.19%
Solanum lycopersicum high-quality model RefSeq (XP_)	19,677	19,090 (97.02%)	19,090 (97.02%)	76.30%	87.39%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
