NCBI Cherax quadricarinatus Annotation Release GCF_026875155.1-RS_2023_01

The RefSeq genome records for Cherax quadricarinatus were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_026875155.1-RS_2023_01".

Date of Entrez queries for transcripts and proteins: Jan 17 2023
Date of submission of annotation to the public databases: Feb 23 2023
Software version: 10.1

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM2687515v2	GCF_026875155.1	Zhejiang Academy of Agricultural Sciences	12-14-2022	Reference	101 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	ASM2687515v2
Genes and pseudogenes	22,681
protein-coding	18,152
non-coding	3,065
Transcribed pseudogenes	13
Non-transcribed pseudogenes	1,448
genes with variants	5,206
Immunoglobulin/T-cell receptor gene segments	0
other	3
mRNAs	31,437
fully-supported	26,812
with > 5% ab initio	2,108
partial	2,235
with filled gap(s)	1,644
known RefSeq (NM_)	0
model RefSeq (XM_)	31,437
non-coding RNAs	4,523
fully-supported	3,216
with > 5% ab initio	0
partial	15
with filled gap(s)	15
known RefSeq (NR_)	0
model RefSeq (XR_)	3,535
pseudo transcripts	13
fully-supported	11
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	13
CDSs	31,450
fully-supported	26,812
with > 5% ab initio	2,664
partial	2,157
with major correction(s)	1,910
known RefSeq (NP_)	0
model RefSeq (XP_)	31,450
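
As a quick consistency check on the table above, the gene categories can be summed and compared with the "Genes and pseudogenes" total. The sketch below hard-codes the counts from the table; the variable names are illustrative and are not part of the annotation pipeline.

```python
# Counts copied from the feature counts table for ASM2687515v2.
gene_categories = {
    "protein-coding": 18_152,
    "non-coding": 3_065,
    "transcribed pseudogenes": 13,
    "non-transcribed pseudogenes": 1_448,
    "other": 3,
}
reported_total = 22_681  # "Genes and pseudogenes" row

# The five categories should add up to the reported total.
category_sum = sum(gene_categories.values())

# Removing both pseudogene categories gives the number of genes
# that remain once pseudogenes are excluded.
pseudogenes = (
    gene_categories["transcribed pseudogenes"]
    + gene_categories["non-transcribed pseudogenes"]
)
genes_without_pseudogenes = reported_total - pseudogenes

print("categories sum to total:", category_sum == reported_total)
print("genes excluding pseudogenes:", genes_without_pseudogenes)
```

The second figure agrees with the 21,220 genes listed in the feature lengths table, which excludes pseudogenes.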

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	21,220	67,597	25,080	60	1,629,195
All transcripts	35,960	3,319	2,356	60	56,350
mRNA	31,437	3,570	2,580	99	56,350
misc_RNA	965	3,462	2,387	172	22,032
tRNA	986	74	73	61	87
lncRNA	2,252	1,609	920	88	17,466
snoRNA	35	158	161	67	296
snRNA	205	135	112	60	202
rRNA	77	698	119	118	4,908
Single-exon transcripts	839	2,545	1,872	243	17,933
coding transcripts (NM_/XM_ )	839	2,545	1,872	243	17,933
CDSs	31,450	1,738	1,254	99	55,035
Exons	152,534	443	163	1	33,742
in coding transcripts (NM_/XM_ )	145,310	435	163	1	33,742
in non-coding transcripts (NR_/XR_ )	10,549	499	164	11	18,976
Introns	131,083	12,856	3,046	30	595,852
in coding transcripts (NM_/XM_ )	126,161	12,933	3,062	30	595,852
in non-coding transcripts (NR_/XR_ )	8,043	10,654	2,680	30	481,040

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.73	1	1	48
Number of exons per transcript	7.96	6	1	109

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein per gene and the arthropoda_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and presented in BUSCO notation.

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 18,139 coding genes, 11,961 genes had a protein with an alignment covering 50% or more of the query and 2,483 had an alignment covering 95% or more of the query.
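
To put these counts in proportion, the short sketch below converts them to percentages of the coding genes. The numbers are taken from the paragraph above; the helper name `percent` is illustrative.

```python
# Counts quoted in the paragraph above.
coding_genes = 18_139
genes_cov50 = 11_961  # query coverage of 50% or more
genes_cov95 = 2_483   # query coverage of 95% or more


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


pct_cov50 = percent(genes_cov50, coding_genes)
pct_cov95 = percent(genes_cov95, coding_genes)

print(f"genes with >=50% query coverage: {pct_cov50}%")
print(f"genes with >=95% query coverage: {pct_cov95}%")
```

By this calculation, roughly two thirds of the coding genes reach the 50% threshold and about 14% reach the 95% threshold.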

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
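
The two definitions can be illustrated with a minimal sketch. It assumes 1-based inclusive alignment coordinates (as in BLAST tabular output) and measures coverage from the aligned span alone, ignoring gaps; the function name and the example hit are hypothetical and are not taken from this annotation run.

```python
def coverage_percent(aligned_start, aligned_end, sequence_length):
    """Percent of a sequence included in an alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical hit: residues 11-210 of a 400-residue annotated protein (query)
# aligned to residues 1-190 of a 200-residue Swiss-Prot protein (target).
query_cov = coverage_percent(11, 210, 400)   # 200 aligned residues out of 400
target_cov = coverage_percent(1, 190, 200)   # 190 aligned residues out of 200

print(f"query coverage:  {query_cov:.1f}%")
print(f"target coverage: {target_cov:.1f}%")
```

In this made-up case the hit would count toward the 50% query-coverage threshold but not the 95% one, even though it covers nearly all of the target.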

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
ASM2687515v2	GCF_026875155.1	58.67%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	1,118	969 (86.67%)	774 (69.23%)	99.13%	93.60%
Same-species TSA	198,153	176,694 (89.17%)	128,606 (64.90%)	99.15%	96.69%
Same-species EST	120	54 (45.00%)	47 (39.17%)	98.82%	98.51%
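
The percentages in this table are all relative to the number of sequences retrieved from Entrez, which can be confirmed by recomputing them from the counts. The sketch below does so; the tuple layout and names are illustrative.

```python
# (source, retrieved from Entrez, aligned by Splign, passed to Gnomon)
transcript_sets = [
    ("Same-species GenBank", 1_118, 969, 774),
    ("Same-species TSA", 198_153, 176_694, 128_606),
    ("Same-species EST", 120, 54, 47),
]


def pct(count, total):
    """Format count/total as a percentage with two decimals."""
    return f"{100.0 * count / total:.2f}%"


# Map each source to (% aligned, % passed to Gnomon), both out of retrieved.
summary = {
    source: (pct(aligned, retrieved), pct(passed, retrieved))
    for source, retrieved, aligned, passed in transcript_sets
}

for source, (aligned_pct, passed_pct) in summary.items():
    print(f"{source}: aligned {aligned_pct}, passed to Gnomon {passed_pct}")
```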

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	7,237,804,270	57%	23%	171,727
SAMEA2277320	kidney (Cherax quadricarinatus, SAMEA2277320)	50,774,132	80%	5%	76,933
SAMEA2277321	liver (Cherax quadricarinatus, SAMEA2277321)	34,336,234	80%	7%	58,326
SAMEA2277322	nerve (Cherax quadricarinatus, SAMEA2277322)	38,529,204	80%	6%	81,552
SAMEA2277323	testes (Cherax quadricarinatus, SAMEA2277323)	50,358,042	82%	13%	75,915
SAMN10141291	hepatopancreas (Cherax quadricarinatus, SAMN10141291)	53,148,254	65%	32%	82,319
SAMN10141292	hepatopancreas (Cherax quadricarinatus, SAMN10141292)	66,076,340	66%	31%	86,877
SAMN10141293	hepatopancreas (Cherax quadricarinatus, SAMN10141293)	118,881,940	67%	42%	92,050
SAMN10141294	hepatopancreas (Cherax quadricarinatus, SAMN10141294)	53,231,864	76%	30%	93,280
SAMN10141295	hepatopancreas (Cherax quadricarinatus, SAMN10141295)	88,945,782	69%	37%	99,343
SAMN10141296	hepatopancreas (Cherax quadricarinatus, SAMN10141296)	131,740,490	77%	38%	94,754
SAMN12500462	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN12500462)	55,517,534	68%	30%	90,160
SAMN12500463	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN12500463)	60,803,190	70%	34%	93,891
SAMN12500464	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN12500464)	54,128,412	70%	29%	83,757
SAMN12500465	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN12500465)	60,053,456	67%	35%	90,417
SAMN12500466	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN12500466)	58,155,472	66%	36%	82,542
SAMN12500467	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN12500467)	58,234,598	71%	32%	92,016
SAMN12640691	Brown Eggs, Brown Eggs (Cherax quadricarinatus, SAMN12640691)	176,933,368	53%	13%	117,366
SAMN12640695	Juvenile Eye Stalk (Cherax quadricarinatus, Juvenile, SAMN12640695)	117,482,430	60%	9%	101,027
SAMN12640697	Juvenile Hepatopancreas (Cherax quadricarinatus, Juvenile, SAMN12640697)	100,906,200	64%	14%	83,760
SAMN12640698	Juvenile Muscle (Cherax quadricarinatus, Juvenile, SAMN12640698)	144,025,464	67%	13%	73,371
SAMN12640699	Larvae, Larvae (Cherax quadricarinatus, SAMN12640699)	131,545,402	52%	11%	119,035
SAMN12640700	Muscle (Cherax quadricarinatus, Adult, SAMN12640700)	102,282,720	70%	10%	60,146
SAMN12640701	Larvae, Larvae (Cherax quadricarinatus, SAMN12640701)	137,834,652	73%	24%	137,510
SAMN12640702	Brown Eggs, Brown Eggs (Cherax quadricarinatus, SAMN12640702)	117,111,332	82%	25%	132,698
SAMN12640703	Eye Stalk (Cherax quadricarinatus, Adult, SAMN12640703)	111,581,360	73%	21%	96,735
SAMN12640704	Gill (Cherax quadricarinatus, Adult, SAMN12640704)	124,577,594	72%	25%	113,666
SAMN12640705	Hepatopancreas (Cherax quadricarinatus, Adult, SAMN12640705)	110,338,732	80%	31%	98,351
SAMN12640706	Juvenile Eye Stalk (Cherax quadricarinatus, Juvenile, SAMN12640706)	127,200,182	84%	26%	123,341
SAMN12640707	Juvenile Gill (Cherax quadricarinatus, Juvenile, SAMN12640707)	114,896,974	84%	20%	113,376
SAMN12640708	Juvenile Muscle (Cherax quadricarinatus, Juvenile, SAMN12640708)	104,072,454	86%	29%	100,780
SAMN12640709	Muscle (Cherax quadricarinatus, Adult, SAMN12640709)	110,411,178	84%	33%	75,557
SAMN12640710	Orange Eggs, Orange Eggs (Cherax quadricarinatus, SAMN12640710)	129,459,428	75%	26%	131,557
SAMN12640711	Ovary (Cherax quadricarinatus, Adult, SAMN12640711)	109,484,020	76%	26%	117,665
SAMN12640712	Juvenile Hepatopancreas (Cherax quadricarinatus, Juvenile, SAMN12640712)	121,738,036	82%	25%	82,823
SAMN12640713	Ovary (Cherax quadricarinatus, Adult, SAMN12640713)	117,795,878	51%	15%	103,726
SAMN12640714	Orange Eggs (Cherax quadricarinatus, Adult, SAMN12640714)	151,646,640	43%	9%	96,411
SAMN26867267	testis (Cherax quadricarinatus, SAMN26867267)	48,352,466	74%	42%	61,606
SAMN26867268	testis (Cherax quadricarinatus, SAMN26867268)	49,566,842	73%	33%	71,712
SAMN26867269	testis (Cherax quadricarinatus, SAMN26867269)	48,554,928	67%	26%	75,661
SAMN26867270	ovary (Cherax quadricarinatus, SAMN26867270)	42,501,968	72%	24%	86,795
SAMN26867271	ovary (Cherax quadricarinatus, SAMN26867271)	49,783,780	68%	38%	103,890
SAMN26867272	ovary (Cherax quadricarinatus, SAMN26867272)	45,196,514	69%	40%	101,408
SAMN26867273	testis (Cherax quadricarinatus, SAMN26867273)	48,217,602	76%	46%	68,556
SAMN26867274	testis (Cherax quadricarinatus, SAMN26867274)	53,534,862	72%	52%	66,572
SAMN26867275	testis (Cherax quadricarinatus, SAMN26867275)	45,370,264	45%	22%	41,185
SAMN26867276	ovary (Cherax quadricarinatus, SAMN26867276)	43,944,144	73%	18%	91,652
SAMN26867277	ovary (Cherax quadricarinatus, SAMN26867277)	44,008,998	52%	30%	92,270
SAMN26867278	ovary (Cherax quadricarinatus, SAMN26867278)	48,714,314	81%	23%	71,336
SAMN29983837	Androgenic gland (Cherax quadricarinatus, Adult, intersex, SAMN29983837)	47,448,970	65%	21%	88,545
SAMN29983838	Androgenic gland (Cherax quadricarinatus, Adult, intersex, SAMN29983838)	44,675,996	76%	23%	76,891
SAMN29983839	Androgenic gland (Cherax quadricarinatus, Adult, intersex, SAMN29983839)	44,914,152	68%	20%	77,732
SAMN29983840	Androgenic gland (Cherax quadricarinatus, Adult, intersex, SAMN29983840)	44,616,884	59%	18%	72,372
SAMN29983841	Androgenic gland (Cherax quadricarinatus, Adult, intersex, SAMN29983841)	42,487,280	70%	20%	74,075
SAMN29983842	Androgenic gland (Cherax quadricarinatus, Adult, intersex, SAMN29983842)	44,704,430	74%	23%	82,882
SAMN29983843	Androgenic gland (Cherax quadricarinatus, Adult, intersex, SAMN29983843)	44,011,588	72%	22%	70,199
SAMN29983844	Androgenic gland (Cherax quadricarinatus, Adult, intersex, SAMN29983844)	43,530,018	66%	20%	78,881
SAMN29983845	Testis (Cherax quadricarinatus, Adult, intersex, SAMN29983845)	44,074,598	55%	10%	43,258
SAMN29983846	Testis (Cherax quadricarinatus, Adult, intersex, SAMN29983846)	39,828,822	64%	12%	75,350
SAMN29983847	Testis (Cherax quadricarinatus, Adult, intersex, SAMN29983847)	44,759,428	62%	15%	89,656
SAMN29983848	Testis (Cherax quadricarinatus, Adult, intersex, SAMN29983848)	39,591,846	66%	19%	82,525
SAMN29983849	Testis (Cherax quadricarinatus, Adult, intersex, SAMN29983849)	38,665,006	65%	17%	86,740
SAMN29983850	Testis (Cherax quadricarinatus, Adult, intersex, SAMN29983850)	43,863,820	65%	18%	81,748
SAMN29983851	Testis (Cherax quadricarinatus, Adult, intersex, SAMN29983851)	42,783,386	67%	18%	104,493
SAMN29983852	Testis (Cherax quadricarinatus, Adult, intersex, SAMN29983852)	38,846,662	63%	16%	90,038
SAMN31006035	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN31006035)	46,379,574	80%	48%	55,622
SAMN31006036	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN31006036)	42,423,984	80%	46%	57,935
SAMN31006037	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN31006037)	44,891,590	78%	49%	50,805
SAMN31006038	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN31006038)	47,334,822	73%	40%	59,438
SAMN31006039	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN31006039)	41,551,364	73%	40%	57,989
SAMN31006040	hepatopancreas (Cherax quadricarinatus, pooled male and female, SAMN31006040)	44,255,976	72%	41%	60,711
SAMN31874954	eyestalk (Cherax quadricarinatus, male, SAMN31874954)	259,769,644	82%	9%	116,773
SAMN31874957	hepatopancreas (Cherax quadricarinatus, male, SAMN31874957)	185,029,686	81%	9%	98,989
SAMN31874958	muscle (Cherax quadricarinatus, male, SAMN31874958)	201,731,854	87%	12%	78,691
SAMN31874960	stomach (Cherax quadricarinatus, male, SAMN31874960)	148,095,294	80%	6%	90,722
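
Tables like the one above are tab-separated and easy to post-process. The sketch below parses four of the rows and computes a read-weighted mean of the aligned percentage. It is only an illustration: the per-sample percentages are rounded in the report, so the result is approximate, and this is not necessarily how the pipeline derives the aggregate row.

```python
import csv
import io

# Header and four rows copied from the by-sample table (tab-separated).
table_text = "\n".join([
    "Sample Id\tTrack name\tNumber of reads\tPercent aligned reads\t"
    "Percent of aligned reads with introns\tNumber of introns",
    "SAMEA2277320\tkidney (Cherax quadricarinatus, SAMEA2277320)\t"
    "50,774,132\t80%\t5%\t76,933",
    "SAMEA2277321\tliver (Cherax quadricarinatus, SAMEA2277321)\t"
    "34,336,234\t80%\t7%\t58,326",
    "SAMEA2277322\tnerve (Cherax quadricarinatus, SAMEA2277322)\t"
    "38,529,204\t80%\t6%\t81,552",
    "SAMEA2277323\ttestes (Cherax quadricarinatus, SAMEA2277323)\t"
    "50,358,042\t82%\t13%\t75,915",
])


def to_int(text):
    """Convert '50,774,132' to 50774132."""
    return int(text.replace(",", ""))


def to_percent(text):
    """Convert '80%' to 80.0."""
    return float(text.rstrip("%"))


reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
samples = [
    (row["Sample Id"],
     to_int(row["Number of reads"]),
     to_percent(row["Percent aligned reads"]))
    for row in reader
]

total_reads = sum(reads for _, reads, _ in samples)
# Weight each sample's aligned percentage by its read count.
weighted_aligned = sum(reads * pct for _, reads, pct in samples) / total_reads

print(f"samples parsed: {len(samples)}")
print(f"total reads: {total_reads:,}")
print(f"read-weighted aligned percentage: {weighted_aligned:.2f}%")
```

The same parsing applies to the full table; only the rows in `table_text` need to be extended.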

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR391749	ERX363979	ERP004477	SAMEA2277320	50,774,132	80%	5%
ERR391750	ERX363980	ERP004477	SAMEA2277321	34,336,234	80%	7%
ERR391751	ERX363981	ERP004477	SAMEA2277322	38,529,204	80%	6%
ERR391752	ERX363982	ERP004477	SAMEA2277323	50,358,042	82%	13%
SRR7912021	SRX4747187	SRP162784	SAMN10141291	53,148,254	65%	32%
SRR7912022	SRX4747186	SRP162784	SAMN10141292	66,076,340	66%	31%
SRR7912019	SRX4747189	SRP162784	SAMN10141293	118,881,940	67%	42%
SRR7912020	SRX4747188	SRP162784	SAMN10141294	53,231,864	76%	30%
SRR7912023	SRX4747185	SRP162784	SAMN10141295	88,945,782	69%	37%
SRR7912024	SRX4747184	SRP162784	SAMN10141296	131,740,490	77%	38%
SRR9903457	SRX6655272	SRP217417	SAMN12500462	55,517,534	68%	30%
SRR9903456	SRX6655273	SRP217417	SAMN12500463	60,803,190	70%	34%
SRR9903455	SRX6655274	SRP217417	SAMN12500464	54,128,412	70%	29%
SRR9903454	SRX6655275	SRP217417	SAMN12500465	60,053,456	67%	35%
SRR9903459	SRX6655270	SRP217417	SAMN12500466	58,155,472	66%	36%
SRR9903458	SRX6655271	SRP217417	SAMN12500467	58,234,598	71%	32%
SRR10023644	SRX6760654	SRP219340	SAMN12640691	176,933,368	53%	13%
SRR10023639	SRX6760659	SRP219340	SAMN12640695	117,482,430	60%	9%
SRR10023637	SRX6760661	SRP219340	SAMN12640697	100,906,200	64%	14%
SRR10023636	SRX6760662	SRP219340	SAMN12640698	144,025,464	67%	13%
SRR10023635	SRX6760663	SRP219340	SAMN12640699	131,545,402	52%	11%
SRR10023634	SRX6760664	SRP219340	SAMN12640700	102,282,720	70%	10%
SRR10023633	SRX6760665	SRP219340	SAMN12640701	137,834,652	73%	24%
SRR10023632	SRX6760666	SRP219340	SAMN12640702	117,111,332	82%	25%
SRR10023630	SRX6760668	SRP219340	SAMN12640703	111,581,360	73%	21%
SRR10023629	SRX6760669	SRP219340	SAMN12640704	124,577,594	72%	25%
SRR10023628	SRX6760670	SRP219340	SAMN12640705	110,338,732	80%	31%
SRR10023627	SRX6760671	SRP219340	SAMN12640706	127,200,182	84%	26%
SRR10023626	SRX6760672	SRP219340	SAMN12640707	114,896,974	84%	20%
SRR10023625	SRX6760673	SRP219340	SAMN12640708	104,072,454	86%	29%
SRR10023624	SRX6760674	SRP219340	SAMN12640709	110,411,178	84%	33%
SRR10023623	SRX6760675	SRP219340	SAMN12640710	129,459,428	75%	26%
SRR10023622	SRX6760676	SRP219340	SAMN12640711	109,484,020	76%	26%
SRR10023621	SRX6760677	SRP219340	SAMN12640712	121,738,036	82%	25%
SRR10023619	SRX6760679	SRP219340	SAMN12640713	117,795,878	51%	15%
SRR10023618	SRX6760680	SRP219340	SAMN12640714	151,646,640	43%	9%
SRR18462562	SRX14595078	SRP365663	SAMN26867267	48,352,466	74%	42%
SRR18462561	SRX14595079	SRP365663	SAMN26867268	49,566,842	73%	33%
SRR18462558	SRX14595082	SRP365663	SAMN26867269	48,554,928	67%	26%
SRR18462557	SRX14595083	SRP365663	SAMN26867270	42,501,968	72%	24%
SRR18462556	SRX14595084	SRP365663	SAMN26867271	49,783,780	68%	38%
SRR18462555	SRX14595085	SRP365663	SAMN26867272	45,196,514	69%	40%
SRR18462554	SRX14595086	SRP365663	SAMN26867273	48,217,602	76%	46%
SRR18462553	SRX14595087	SRP365663	SAMN26867274	53,534,862	72%	52%
SRR18462552	SRX14595088	SRP365663	SAMN26867275	45,370,264	45%	22%
SRR18462551	SRX14595089	SRP365663	SAMN26867276	43,944,144	73%	18%
SRR18462560	SRX14595080	SRP365663	SAMN26867277	44,008,998	52%	30%
SRR18462559	SRX14595081	SRP365663	SAMN26867278	48,714,314	81%	23%
SRR20657111	SRX16680119	SRP388277	SAMN29983837	47,448,970	65%	21%
SRR20657110	SRX16680120	SRP388277	SAMN29983838	44,675,996	76%	23%
SRR20657131	SRX16680099	SRP388277	SAMN29983839	44,914,152	68%	20%
SRR20657130	SRX16680100	SRP388277	SAMN29983840	44,616,884	59%	18%
SRR20657129	SRX16680101	SRP388277	SAMN29983841	42,487,280	70%	20%
SRR20657128	SRX16680102	SRP388277	SAMN29983842	44,704,430	74%	23%
SRR20657127	SRX16680103	SRP388277	SAMN29983843	44,011,588	72%	22%
SRR20657126	SRX16680104	SRP388277	SAMN29983844	43,530,018	66%	20%
SRR20657125	SRX16680105	SRP388277	SAMN29983845	44,074,598	55%	10%
SRR20657124	SRX16680106	SRP388277	SAMN29983846	39,828,822	64%	12%
SRR20657123	SRX16680107	SRP388277	SAMN29983847	44,759,428	62%	15%
SRR20657122	SRX16680108	SRP388277	SAMN29983848	39,591,846	66%	19%
SRR20657120	SRX16680110	SRP388277	SAMN29983849	38,665,006	65%	17%
SRR20657119	SRX16680111	SRP388277	SAMN29983850	43,863,820	65%	18%
SRR20657118	SRX16680112	SRP388277	SAMN29983851	42,783,386	67%	18%
SRR20657117	SRX16680113	SRP388277	SAMN29983852	38,846,662	63%	16%
SRR21699875	SRX17697392	SRP399467	SAMN31006035	46,379,574	80%	48%
SRR21699874	SRX17697393	SRP399467	SAMN31006036	42,423,984	80%	46%
SRR21699873	SRX17697394	SRP399467	SAMN31006037	44,891,590	78%	49%
SRR21699872	SRX17697395	SRP399467	SAMN31006038	47,334,822	73%	40%
SRR21699871	SRX17697396	SRP399467	SAMN31006039	41,551,364	73%	40%
SRR21699870	SRX17697397	SRP399467	SAMN31006040	44,255,976	72%	41%
SRR22412652	SRX18382195	SRP409896	SAMN31874954	259,769,644	82%	9%
SRR22412653	SRX18382194	SRP409896	SAMN31874957	185,029,686	81%	9%
SRR22412637	SRX18382210	SRP409896	SAMN31874958	201,731,854	87%	12%
SRR22412639	SRX18382208	SRP409896	SAMN31874960	148,095,294	80%	6%

SRA Long Read Alignment Statistics

The alignments of the following long RNA-Seq reads (PacBio, Oxford Nanopore, 454, or other long-read sequencing technologies) from the Sequence Read Archive with minimap2 were used for gene prediction:

Run	Sample	Number of reads	Number (%) of sequences aligned by Minimap2	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
All	NA	46,909	42,653 (90.92%)	36,495 (77.79%)	99.18%	97.6%
SRR20077354	SAMN29626320	46,909	42,653 (90.92%)	36,495 (77.79%)	99.18%	97.6%

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Penaeus japonicus high-quality model RefSeq (XP_)	15,395	13,423 (87.19%)	13,423 (87.19%)	70.46%	69.37%
Same-species GenBank	219	216 (98.63%)	216 (98.63%)	81.42%	86.40%
Hyalella azteca high-quality model RefSeq (XP_)	10,207	7,082 (69.38%)	7,082 (69.38%)	64.18%	52.81%
Crustacea GenBank	46,530	39,854 (85.65%)	39,854 (85.65%)	70.24%	74.25%
Daphnia pulex high-quality model RefSeq (XP_)	14,091	9,026 (64.06%)	9,026 (64.06%)	64.77%	52.95%
Homarus americanus high-quality model RefSeq (XP_)	14,107	12,885 (91.34%)	12,885 (91.34%)	72.12%	73.98%
Tribolium castaneum GenBank	680	561 (82.50%)	561 (82.50%)	66.97%	57.33%
Tribolium castaneum high-quality model RefSeq (XP_)	11,487	7,426 (64.65%)	7,426 (64.65%)	61.77%	49.62%
Tribolium castaneum known RefSeq (NP_)	630	499 (79.21%)	499 (79.21%)	66.25%	53.70%
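
One way to read this table is to rank the protein sets by average alignment coverage. The sketch below does that with values copied from the table; the data layout is illustrative, and the ranking is a reading aid, not a statistic produced by the pipeline.

```python
# (source, sequences retrieved, sequences aligned, average % coverage)
protein_sets = [
    ("Penaeus japonicus high-quality model RefSeq (XP_)", 15_395, 13_423, 69.37),
    ("Same-species GenBank", 219, 216, 86.40),
    ("Hyalella azteca high-quality model RefSeq (XP_)", 10_207, 7_082, 52.81),
    ("Crustacea GenBank", 46_530, 39_854, 74.25),
    ("Daphnia pulex high-quality model RefSeq (XP_)", 14_091, 9_026, 52.95),
    ("Homarus americanus high-quality model RefSeq (XP_)", 14_107, 12_885, 73.98),
    ("Tribolium castaneum GenBank", 680, 561, 57.33),
    ("Tribolium castaneum high-quality model RefSeq (XP_)", 11_487, 7_426, 49.62),
    ("Tribolium castaneum known RefSeq (NP_)", 630, 499, 53.70),
]

# Sort from highest to lowest average coverage.
ranked = sorted(protein_sets, key=lambda row: row[3], reverse=True)

for source, retrieved, aligned, coverage in ranked:
    aligned_pct = 100.0 * aligned / retrieved
    print(f"{coverage:6.2f}% coverage, {aligned_pct:6.2f}% aligned  {source}")
```

In this ordering the same-species set comes first and the Tribolium castaneum model RefSeq set comes last.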

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
