NCBI Procambarus clarkii Annotation Release 100

The RefSeq genome records for Procambarus clarkii were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Procambarus clarkii Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Dec 30 2021
Date of submission of annotation to the public databases: Jan 11 2022
Software version: 9.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM2042438v2	GCF_020424385.1	Freshwater Fisheries Research Institute of Jiangsu Province, Nanjing 210017, China	10-12-2021	Reference	95 assembled chromosomes; unplaced scaffolds
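The assembly accession in the table encodes both its source database and its version. A minimal sketch of splitting it apart follows, assuming the usual convention that a `GCF_` prefix marks a RefSeq assembly, a `GCA_` prefix marks a GenBank assembly, and the number after the dot is the version. The helper name `parse_assembly_accession` is illustrative and not part of any NCBI tool.

```python
import re

# Assumed accession layout: GCF_ (RefSeq) or GCA_ (GenBank) prefix,
# nine digits, then a version number after the dot.
ACCESSION_RE = re.compile(r"^(GCF|GCA)_(\d{9})\.(\d+)$")


def parse_assembly_accession(accession):
    """Split an assembly accession into (source, number, version)."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    prefix, number, version = match.groups()
    source = "RefSeq" if prefix == "GCF" else "GenBank"
    return source, number, int(version)


# Accession taken from the Assemblies table above.
print(parse_assembly_accession("GCF_020424385.1"))
```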

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	ASM2042438v2
Genes and pseudogenes	30,545
protein-coding	26,417
non-coding	3,559
Transcribed pseudogenes	3
Non-transcribed pseudogenes	564
genes with variants	7,173
Immunoglobulin/T-cell receptor gene segments	0
other	2
mRNAs	45,525
fully-supported	34,362
with > 5% ab initio	9,931
partial	2,404
with filled gap(s)	794
known RefSeq (NM_)	0
model RefSeq (XM_)	45,525
non-coding RNAs	5,126
fully-supported	3,386
with > 5% ab initio	0
partial	15
with filled gap(s)	15
known RefSeq (NR_)	0
model RefSeq (XR_)	3,802
pseudo transcripts	3
fully-supported	3
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	3
CDSs	45,538
fully-supported	34,362
with > 5% ab initio	10,165
partial	2,361
with major correction(s)	241
known RefSeq (NP_)	0
model RefSeq (XP_)	45,538
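The gene rows of the table can be cross-checked against each other. The sketch below assumes that the protein-coding, non-coding, pseudogene, and "other" rows partition the "Genes and pseudogenes" total, and that "genes with variants" is an overlapping subset rather than a separate category. That reading is an interpretation of the table, not something the report states. All counts are copied from the table; the variable and function names are illustrative.

```python
# Gene counts copied from the "Feature counts" table for ASM2042438v2.
total_genes_and_pseudogenes = 30_545
categories = {
    "protein-coding": 26_417,
    "non-coding": 3_559,
    "transcribed pseudogenes": 3,
    "non-transcribed pseudogenes": 564,
    "other": 2,
}


def categories_match_total(parts, total):
    """Return True when the category counts add up to the reported total."""
    return sum(parts.values()) == total


# Removing both pseudogene classes gives the number of genes proper.
pseudogenes = (
    categories["transcribed pseudogenes"]
    + categories["non-transcribed pseudogenes"]
)
genes_without_pseudogenes = total_genes_and_pseudogenes - pseudogenes

print(categories_match_total(categories, total_genes_and_pseudogenes))  # True
print(genes_without_pseudogenes)  # 29978
```

Under these assumptions the categories sum to 30,545, and the count without pseudogenes (29,978) equals the "Genes" count in the feature lengths table.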

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	29,978	29,362	8,499	46	1,197,285
All transcripts	50,651	3,153	2,227	46	75,121
mRNA	45,525	3,353	2,421	93	75,121
misc_RNA	1,060	3,712	2,849	166	39,577
tRNA	1,322	74	73	61	87
lncRNA	2,326	1,265	847	57	17,687
snoRNA	46	133	126	61	226
snRNA	136	130	117	46	196
rRNA	234	161	119	118	6,283
Single-exon transcripts	4,989	1,072	831	252	9,820
coding transcripts (NM_/XM_)	4,989	1,072	831	252	9,820
CDSs	45,538	1,977	1,368	93	73,932
Exons	187,612	405	166	2	33,708
in coding transcripts (NM_/XM_)	179,706	404	166	2	33,708
in non-coding transcripts (NR_/XR_)	13,362	371	154	2	20,444
Introns	158,538	6,649	1,723	30	599,628
in coding transcripts (NM_/XM_)	153,009	6,678	1,736	30	599,628
in non-coding transcripts (NR_/XR_)	10,752	5,670	1,457	30	489,180

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.72	1	1	50
Number of exons per transcript	8.56	6	1	172

BUSCO analysis of gene annotation

BUSCO v4.1.4 (Simão et al. 2015, PMID: 26059717) was run in "protein" mode with the arthropoda_odb10 lineage dataset on the annotated gene set, using the single longest protein per gene. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation (C:complete [S:single-copy, D:duplicated], F:fragmented, M:missing, n:number of genes used).

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 26,404 coding genes, 18,478 genes had a protein with an alignment covering 50% or more of the query and 3,340 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
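The two coverage definitions reduce to the same ratio applied to different sequences. The sketch below is a hedged illustration only: the function name `coverage_percent` and the alignment lengths are hypothetical, and the report does not describe how the pipeline itself derives aligned lengths from BLASTP output.

```python
def coverage_percent(aligned_length, sequence_length):
    """Percentage of a sequence's length that is included in the alignment."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment, not taken from this report:
# a 400-residue annotated protein (query) aligned to a 500-residue
# Swiss-Prot protein (target), covering 380 query residues and
# 390 target residues.
query_coverage = coverage_percent(380, 400)
target_coverage = coverage_percent(390, 500)

print(query_coverage)   # 95.0
print(target_coverage)  # 78.0
```

In this made-up case the gene would count toward both the "50% or more" and the "95% or more" query coverage tallies, while its target coverage would be 78%.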

[Figure: cumulative graph of the number of genes with alignments above a given query or target coverage threshold, with corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline shown for comparison. Query: annotated proteins. Target: UniProtKB/Swiss-Prot curated proteins.]

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
ASM2042438v2	GCF_020424385.1	62.76%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign, minimap2, or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	804	750 (93.28%)	709 (88.18%)	99.51%	97.03%
Same-species EST	629	606 (96.34%)	574 (91.26%)	99.13%	99.49%
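The percentages in parentheses can be reproduced from the counts in the table. The sketch below assumes each percentage is the count divided by the number of sequences retrieved from Entrez, rounded to two decimals. The `percent` helper is illustrative and not part of the pipeline.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts copied from the transcript alignments table:
# (source, retrieved from Entrez, aligned by Splign, passed to Gnomon)
rows = [
    ("Same-species GenBank", 804, 750, 709),
    ("Same-species EST", 629, 606, 574),
]

for source, retrieved, aligned, passed in rows:
    print(source, percent(aligned, retrieved), percent(passed, retrieved))
```

With these inputs the computed values agree with the table: 93.28% and 88.18% for GenBank, 96.34% and 91.26% for EST.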

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	7,500,902,068	75%	31%	192,384
SAMD00057260	NA	toden (Procambarus clarkii, SAMD00057260)	231,550,998	87%	19%	127,364
SAMD00057262	NA	sendai (Procambarus clarkii, SAMD00057262)	154,513,644	83%	18%	122,234
SAMN02179261	25278476,25479010	General Sample for Procambarus clarkii (Procambarus clarkii, adult, SAMN02179261)	87,759,786	81%	19%	132,083
SAMN02641226	28069422	muscle, heart, gut, gill and hepatopancreas (Procambarus clarkii, one year, SAMN02641226)	20,702,068	81%	28%	76,384
SAMN02904635	25338101	hepato (Procambarus clarkii, female, SAMN02904635)	120,459,362	56%	23%	98,038
SAMN02904637	25338101	muscle (Procambarus clarkii, female, SAMN02904637)	83,513,510	90%	8%	90,259
SAMN02904638	25338101	ovary (Procambarus clarkii, female, SAMN02904638)	84,936,238	86%	14%	127,468
SAMN02905149	25338101	pctestis (Procambarus clarkii, male, SAMN02905149)	36,060,538	68%	9%	94,673
SAMN06169691	NA	brain (Procambarus clarkii, SAMN06169691)	133,312,028	41%	4%	83,437
SAMN06169692	NA	neurogenic niche of the deutocerebrum (brain) (Procambarus clarkii, SAMN06169692)	81,985,696	57%	13%	22,659
SAMN06169693	NA	several neurogenic niches of the deutocerebrum (brain) (Procambarus clarkii, SAMN06169693)	57,535,474	63%	17%	52,673
SAMN06169694	NA	anterior proliferation center (Procambarus clarkii, SAMN06169694)	58,183,976	54%	18%	90,609
SAMN06169695	NA	hematopoetic tissue (Procambarus clarkii, SAMN06169695)	48,900,368	41%	11%	70,656
SAMN06169696	NA	semigranular cells (type of hemocyte) (Procambarus clarkii, SAMN06169696)	90,883,718	55%	24%	68,844
SAMN06169697	NA	granular cells (type of hemocyte) (Procambarus clarkii, SAMN06169697)	58,921,832	49%	25%	55,481
SAMN06169698	NA	whole animal (Procambarus clarkii, SAMN06169698)	56,072,224	42%	8%	75,452
SAMN06169699	NA	whole animal (Procambarus clarkii, SAMN06169699)	54,444,914	53%	8%	66,407
SAMN06169700	NA	semigranular cells (type of hemocyte) (Procambarus clarkii, SAMN06169700)	68,976,932	38%	24%	86,726
SAMN06169701	NA	granular cells (type of hemocyte) (Procambarus clarkii, SAMN06169701)	90,151,924	38%	17%	87,438
SAMN06169702	NA	hematopoietic tissue (Procambarus clarkii, SAMN06169702)	72,616,064	36%	18%	83,180
SAMN06169703	NA	brain (Procambarus clarkii, SAMN06169703)	48,393,346	35%	12%	83,517
SAMN06169704	NA	neurogenic niche of the deutocerebrum (brain) (Procambarus clarkii, SAMN06169704)	79,856,734	44%	20%	32,303
SAMN06169705	NA	anterior proliferation center (Procambarus clarkii, SAMN06169705)	53,695,750	53%	24%	88,547
SAMN06169706	NA	several neurogenic niches of the deutocerebrum (brain) (Procambarus clarkii, SAMN06169706)	70,353,688	69%	12%	11,319
SAMN06169707	NA	whole animal (Procambarus clarkii, SAMN06169707)	57,294,162	55%	6%	66,813
SAMN06169708	NA	whole animal (Procambarus clarkii, SAMN06169708)	66,881,460	38%	3%	56,959
SAMN06169709	NA	neurogenic niche of the deutocerebrum (Procambarus clarkii, SAMN06169709)	72,625,056	58%	11%	84,869
SAMN09699930	NA	ovary (Procambarus clarkii, secondary vitellogenesis stage, SAMN09699930)	62,160,266	83%	35%	113,848
SAMN10345055	NA	hepatopancreas (Procambarus clarkii, SAMN10345055)	44,135,162	88%	50%	63,423
SAMN10345056	NA	hepatopancreas (Procambarus clarkii, SAMN10345056)	60,541,654	76%	40%	84,282
SAMN10373339	NA	hepatopancreas (Procambarus clarkii, SAMN10373339)	51,132,944	23%	23%	31,408
SAMN13634784	30240423,32163507,32962631	Antennular lateral flagella (Procambarus clarkii, SAMN13634784)	128,695,104	77%	23%	121,216
SAMN13634785	30240423,32163507,32962631	Walking leg dactyls (Procambarus clarkii, SAMN13634785)	121,465,408	53%	10%	97,211
SAMN13634786	30240423,32163507,32962631	Brain (Procambarus clarkii, SAMN13634786)	124,066,876	75%	13%	130,021
SAMN14939465	NA	gill (Procambarus clarkii, SAMN14939465)	141,036,364	71%	35%	118,538
SAMN14939466	NA	gill (Procambarus clarkii, SAMN14939466)	198,595,164	59%	32%	135,676
SAMN14939467	NA	hepatopancreas (Procambarus clarkii, SAMN14939467)	204,280,888	85%	40%	117,462
SAMN14939468	NA	hepatopancreas (Procambarus clarkii, SAMN14939468)	222,053,848	85%	38%	128,873
SAMN14939469	NA	muscle (Procambarus clarkii, SAMN14939469)	207,578,550	89%	32%	101,023
SAMN14939470	NA	muscle (Procambarus clarkii, SAMN14939470)	149,212,804	84%	31%	100,175
SAMN19018848	34303807	Pc for Trans-41 (Procambarus clarkii, SAMN19018848)	71,166,644	82%	33%	129,877
SAMN19759215	NA	muscle (Procambarus clarkii, SAMN19759215)	45,071,814	78%	34%	107,549
SAMN19759216	NA	muscle (Procambarus clarkii, SAMN19759216)	46,711,218	86%	36%	75,825
SAMN19759217	NA	muscle (Procambarus clarkii, SAMN19759217)	44,947,106	87%	41%	73,104
SAMN19759218	NA	muscle (Procambarus clarkii, SAMN19759218)	41,734,940	84%	32%	85,409
SAMN19759219	NA	muscle (Procambarus clarkii, SAMN19759219)	43,676,014	84%	37%	84,746
SAMN19759220	NA	muscle (Procambarus clarkii, SAMN19759220)	43,994,260	87%	35%	81,493
SAMN20003116	NA	blood (Procambarus clarkii, SAMN20003116)	53,997,450	63%	24%	102,412
SAMN20003117	NA	blood (Procambarus clarkii, SAMN20003117)	60,538,268	62%	24%	106,885
SAMN20003118	NA	blood (Procambarus clarkii, SAMN20003118)	52,920,184	65%	25%	103,028
SAMN20003119	NA	blood (Procambarus clarkii, SAMN20003119)	44,494,552	63%	23%	97,610
SAMN20003120	NA	blood (Procambarus clarkii, SAMN20003120)	48,340,702	61%	24%	95,706
SAMN20003121	NA	blood (Procambarus clarkii, SAMN20003121)	50,208,958	72%	23%	96,718
SAMN20003122	NA	blood (Procambarus clarkii, SAMN20003122)	50,182,474	73%	23%	100,129
SAMN20003123	NA	blood (Procambarus clarkii, SAMN20003123)	45,306,142	76%	25%	101,046
SAMN20003124	NA	blood (Procambarus clarkii, SAMN20003124)	41,032,310	69%	23%	95,493
SAMN20003125	NA	blood (Procambarus clarkii, SAMN20003125)	45,506,042	67%	23%	94,232
SAMN20239179	NA	hepatopancreas (Procambarus clarkii, SAMN20239179)	46,585,272	89%	47%	97,025
SAMN20239180	NA	hepatopancreas (Procambarus clarkii, SAMN20239180)	43,598,786	91%	44%	90,634
SAMN20239181	NA	hepatopancreas (Procambarus clarkii, SAMN20239181)	47,813,836	91%	41%	90,375
SAMN20239182	NA	hepatopancreas (Procambarus clarkii, SAMN20239182)	41,925,894	90%	44%	97,032
SAMN20239183	NA	hepatopancreas (Procambarus clarkii, SAMN20239183)	51,255,420	90%	44%	99,972
SAMN20239184	NA	hepatopancreas (Procambarus clarkii, SAMN20239184)	41,630,686	89%	45%	100,754
SAMN20239185	NA	hepatopancreas (Procambarus clarkii, SAMN20239185)	43,547,936	91%	41%	95,636
SAMN20239186	NA	hepatopancreas (Procambarus clarkii, SAMN20239186)	44,568,732	91%	44%	95,122
SAMN20239187	NA	hepatopancreas (Procambarus clarkii, SAMN20239187)	39,102,880	92%	44%	94,552
SAMN20239188	NA	hepatopancreas (Procambarus clarkii, SAMN20239188)	50,115,436	87%	38%	103,389
SAMN20239189	NA	hepatopancreas (Procambarus clarkii, SAMN20239189)	50,201,410	90%	39%	106,142
SAMN20239190	NA	hepatopancreas (Procambarus clarkii, SAMN20239190)	43,502,240	90%	48%	92,621
SAMN20239191	NA	hepatopancreas (Procambarus clarkii, SAMN20239191)	56,020,550	90%	45%	106,492
SAMN20239192	NA	hepatopancreas (Procambarus clarkii, SAMN20239192)	46,234,638	89%	43%	99,842
SAMN20239193	NA	hepatopancreas (Procambarus clarkii, SAMN20239193)	46,575,538	91%	50%	101,689
SAMN20239194	NA	hepatopancreas (Procambarus clarkii, SAMN20239194)	48,270,098	91%	42%	97,292
SAMN20239195	NA	hepatopancreas (Procambarus clarkii, SAMN20239195)	45,422,778	89%	45%	100,963
SAMN20239196	NA	hepatopancreas (Procambarus clarkii, SAMN20239196)	52,030,390	90%	52%	97,082
SAMN20239197	NA	hepatopancreas (Procambarus clarkii, SAMN20239197)	50,819,934	89%	49%	100,458
SAMN20239198	NA	hepatopancreas (Procambarus clarkii, SAMN20239198)	50,481,252	91%	45%	95,747
SAMN20777491	NA	Liver (Procambarus clarkii, eight month, SAMN20777491)	77,617,668	89%	35%	104,180
SAMN20799522	NA	Liver (Procambarus clarkii, eight month, SAMN20799522)	72,676,042	87%	31%	104,018
SAMN20822787	NA	Liver (Procambarus clarkii, eight month, SAMN20822787)	68,185,800	89%	32%	93,713
SAMN20825196	NA	Hepatopancreas (Procambarus clarkii, SAMN20825196)	44,738,690	82%	34%	103,169
SAMN20825197	NA	Hepatopancreas (Procambarus clarkii, SAMN20825197)	48,217,116	80%	42%	98,980
SAMN20825198	NA	Hepatopancreas (Procambarus clarkii, SAMN20825198)	43,994,692	81%	35%	97,180
SAMN20825199	NA	Hepatopancreas (Procambarus clarkii, SAMN20825199)	47,106,282	82%	37%	96,870
SAMN20825200	NA	Hepatopancreas (Procambarus clarkii, SAMN20825200)	42,830,006	83%	35%	99,331
SAMN20825201	NA	Hepatopancreas (Procambarus clarkii, SAMN20825201)	44,345,410	84%	40%	82,893
SAMN20835921	NA	Liver (Procambarus clarkii, eight month, SAMN20835921)	74,171,122	87%	38%	101,638
SAMN20845780	NA	Liver (Procambarus clarkii, eight month, SAMN20845780)	72,359,746	88%	36%	107,799
SAMN20845831	NA	Liver (Procambarus clarkii, eight month, SAMN20845831)	78,676,172	88%	37%	111,857
SAMN21210902	NA	Hepatopancreas (Procambarus clarkii, 50 days after hatching, SAMN21210902)	52,037,658	87%	33%	98,689
SAMN21210903	NA	Hepatopancreas (Procambarus clarkii, 51 days after hatching, SAMN21210903)	52,785,386	88%	37%	101,537
SAMN21210904	NA	Hepatopancreas (Procambarus clarkii, 52 days after hatching, SAMN21210904)	56,999,874	87%	36%	103,147
SAMN21210905	NA	Hepatopancreas (Procambarus clarkii, 53 days after hatching, SAMN21210905)	49,519,914	78%	29%	78,593
SAMN21210906	NA	Hepatopancreas (Procambarus clarkii, 54 days after hatching, SAMN21210906)	53,486,822	87%	37%	104,249
SAMN21210907	NA	Hepatopancreas (Procambarus clarkii, 55 days after hatching, SAMN21210907)	40,823,606	87%	34%	90,468
SAMN22170947	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN22170947)	41,753,496	71%	36%	102,226
SAMN22170948	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN22170948)	41,308,306	71%	38%	101,551
SAMN22170949	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN22170949)	42,654,016	72%	36%	99,591
SAMN22170950	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN22170950)	40,508,310	75%	38%	99,401
SAMN22170951	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN22170951)	41,403,078	74%	38%	95,227
SAMN22170952	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN22170952)	43,248,636	74%	38%	100,913
SAMN23929577	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929577)	42,795,604	80%	33%	107,532
SAMN23929578	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929578)	43,850,794	80%	32%	109,801
SAMN23929579	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929579)	50,588,168	78%	27%	99,041
SAMN23929580	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929580)	51,623,838	82%	34%	112,228
SAMN23929581	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929581)	50,125,034	82%	34%	113,432
SAMN23929582	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929582)	49,707,042	81%	34%	109,624
SAMN23929583	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929583)	46,462,574	82%	31%	107,631
SAMN23929584	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929584)	50,270,496	83%	32%	106,145
SAMN23929585	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929585)	50,293,278	82%	32%	108,709
SAMN23929586	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929586)	50,141,290	81%	33%	111,344
SAMN23929587	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929587)	61,860,116	81%	34%	115,289
SAMN23929588	NA	mature, hepatopancreas (Procambarus clarkii, male, SAMN23929588)	56,666,680	80%	33%	113,256

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR066858	DRX060818	DRP006259	SAMD00057260	231,550,998	87%	19%
DRR066857	DRX060817	DRP006259	SAMD00057262	154,513,644	83%	18%
SRR870673	SRX288443	SRP023487	SAMN02179261	87,759,786	81%	19%
SRR1509455	SRX648255	SRP044128	SAMN02904635	120,459,362	56%	23%
SRR1509456	SRX648256	SRP044128	SAMN02904637	83,513,510	90%	8%
SRR1509457	SRX648257	SRP044128	SAMN02904638	84,936,238	86%	14%
SRR1509458	SRX648258	SRP044128	SAMN02905149	36,060,538	68%	9%
SRR3458513	SRX1734261	SRP074085	SAMN02641226	20,702,068	81%	28%
SRR5136649	SRX2452759	SRP095858	SAMN06169691	133,312,028	41%	4%
SRR5136662	SRX2452772	SRP095858	SAMN06169692	81,985,696	57%	13%
SRR5136648	SRX2452758	SRP095858	SAMN06169693	57,535,474	63%	17%
SRR5136660	SRX2452770	SRP095858	SAMN06169694	58,183,976	54%	18%
SRR5136661	SRX2452771	SRP095858	SAMN06169695	48,900,368	41%	11%
SRR5136645	SRX2452755	SRP095858	SAMN06169696	90,883,718	55%	24%
SRR5136652	SRX2452762	SRP095858	SAMN06169697	58,921,832	49%	25%
SRR5136658	SRX2452768	SRP095858	SAMN06169698	56,072,224	42%	8%
SRR5136659	SRX2452769	SRP095858	SAMN06169699	54,444,914	53%	8%
SRR5136647	SRX2452757	SRP095858	SAMN06169700	68,976,932	38%	24%
SRR5136653	SRX2452763	SRP095858	SAMN06169701	90,151,924	38%	17%
SRR5136656	SRX2452766	SRP095858	SAMN06169702	72,616,064	36%	18%
SRR5136663	SRX2452773	SRP095858	SAMN06169703	48,393,346	35%	12%
SRR5136651	SRX2452761	SRP095858	SAMN06169704	79,856,734	44%	20%
SRR5136654	SRX2452764	SRP095858	SAMN06169705	53,695,750	53%	24%
SRR5136650	SRX2452760	SRP095858	SAMN06169706	70,353,688	69%	12%
SRR5136657	SRX2452767	SRP095858	SAMN06169707	57,294,162	55%	6%
SRR5136646	SRX2452756	SRP095858	SAMN06169708	66,881,460	38%	3%
SRR5136655	SRX2452765	SRP095858	SAMN06169709	72,625,056	58%	11%
SRR7601327	SRX4466323	SRP155329	SAMN09699930	62,160,266	83%	35%
SRR8151935	SRX4972560	SRP167779	SAMN10345055	44,135,162	88%	50%
SRR8151934	SRX4972561	SRP167779	SAMN10345056	60,541,654	76%	40%
SRR8156034	SRX4976937	SRP167799	SAMN10373339	51,132,944	23%	23%
SRR10874082	SRX7543843	SRP241627	SAMN13634784	128,695,104	77%	23%
SRR10874081	SRX7543844	SRP241627	SAMN13634785	121,465,408	53%	10%
SRR10874080	SRX7543845	SRP241627	SAMN13634786	124,066,876	75%	13%
SRR11802347	SRX8353772	SRP261970	SAMN14939465	51,787,046	75%	37%
SRR11802346	SRX8353773	SRP261970	SAMN14939465	45,991,646	70%	34%
SRR11802335	SRX8353784	SRP261970	SAMN14939465	43,257,672	67%	32%
SRR11802332	SRX8353787	SRP261970	SAMN14939466	50,873,190	82%	34%
SRR11802331	SRX8353788	SRP261970	SAMN14939466	54,329,294	46%	34%
SRR11802330	SRX8353789	SRP261970	SAMN14939466	51,521,868	46%	33%
SRR11802329	SRX8353790	SRP261970	SAMN14939466	41,870,812	61%	26%
SRR11802345	SRX8353774	SRP261970	SAMN14939467	62,437,362	87%	45%
SRR11802328	SRX8353791	SRP261970	SAMN14939467	48,866,146	85%	41%
SRR11802327	SRX8353792	SRP261970	SAMN14939467	49,097,452	83%	37%
SRR11802326	SRX8353793	SRP261970	SAMN14939467	43,879,928	84%	37%
SRR11802344	SRX8353775	SRP261970	SAMN14939468	58,094,370	86%	31%
SRR11802343	SRX8353776	SRP261970	SAMN14939468	62,941,504	84%	39%
SRR11802342	SRX8353777	SRP261970	SAMN14939468	50,018,196	83%	39%
SRR11802341	SRX8353778	SRP261970	SAMN14939468	50,999,778	86%	44%
SRR11802340	SRX8353779	SRP261970	SAMN14939469	59,861,218	91%	33%
SRR11802339	SRX8353780	SRP261970	SAMN14939469	52,338,308	92%	34%
SRR11802338	SRX8353781	SRP261970	SAMN14939469	49,790,550	86%	31%
SRR11802337	SRX8353782	SRP261970	SAMN14939469	45,588,474	86%	31%
SRR11802336	SRX8353783	SRP261970	SAMN14939470	44,893,438	84%	31%
SRR11802334	SRX8353785	SRP261970	SAMN14939470	47,488,996	85%	30%
SRR11802333	SRX8353786	SRP261970	SAMN14939470	56,830,370	83%	31%
SRR14457198	SRX10808043	SRP318730	SAMN19018848	71,166,644	82%	33%
SRR14846522	SRX11167561	SRP324405	SAMN19759215	45,071,814	78%	34%
SRR14846521	SRX11167562	SRP324405	SAMN19759216	46,711,218	86%	36%
SRR14846520	SRX11167563	SRP324405	SAMN19759217	44,947,106	87%	41%
SRR14846519	SRX11167564	SRP324405	SAMN19759218	41,734,940	84%	32%
SRR14846518	SRX11167565	SRP324405	SAMN19759219	43,676,014	84%	37%
SRR14846517	SRX11167566	SRP324405	SAMN19759220	43,994,260	87%	35%
SRR15021497	SRX11333359	SRP326634	SAMN20003116	53,997,450	63%	24%
SRR15021496	SRX11333360	SRP326634	SAMN20003117	60,538,268	62%	24%
SRR15021495	SRX11333361	SRP326634	SAMN20003118	52,920,184	65%	25%
SRR15021494	SRX11333362	SRP326634	SAMN20003119	44,494,552	63%	23%
SRR15021493	SRX11333363	SRP326634	SAMN20003120	48,340,702	61%	24%
SRR15021492	SRX11333364	SRP326634	SAMN20003121	50,208,958	72%	23%
SRR15021491	SRX11333365	SRP326634	SAMN20003122	50,182,474	73%	23%
SRR15021490	SRX11333366	SRP326634	SAMN20003123	45,306,142	76%	25%
SRR15021489	SRX11333367	SRP326634	SAMN20003124	41,032,310	69%	23%
SRR15021488	SRX11333368	SRP326634	SAMN20003125	45,506,042	67%	23%
SRR15146761	SRX11454133	SRP328472	SAMN20239179	46,585,272	89%	47%
SRR15146755	SRX11454139	SRP328472	SAMN20239180	43,598,786	91%	44%
SRR15146744	SRX11454150	SRP328472	SAMN20239181	47,813,836	91%	41%
SRR15146743	SRX11454151	SRP328472	SAMN20239182	41,925,894	90%	44%
SRR15146742	SRX11454152	SRP328472	SAMN20239183	51,255,420	90%	44%
SRR15146760	SRX11454134	SRP328472	SAMN20239184	41,630,686	89%	45%
SRR15146747	SRX11454147	SRP328472	SAMN20239185	43,547,936	91%	41%
SRR15146746	SRX11454148	SRP328472	SAMN20239186	44,568,732	91%	44%
SRR15146745	SRX11454149	SRP328472	SAMN20239187	39,102,880	92%	44%
SRR15146759	SRX11454135	SRP328472	SAMN20239188	50,115,436	87%	38%
SRR15146758	SRX11454136	SRP328472	SAMN20239189	50,201,410	90%	39%
SRR15146757	SRX11454137	SRP328472	SAMN20239190	43,502,240	90%	48%
SRR15146756	SRX11454138	SRP328472	SAMN20239191	56,020,550	90%	45%
SRR15146754	SRX11454140	SRP328472	SAMN20239192	46,234,638	89%	43%
SRR15146753	SRX11454141	SRP328472	SAMN20239193	46,575,538	91%	50%
SRR15146752	SRX11454142	SRP328472	SAMN20239194	48,270,098	91%	42%
SRR15146750	SRX11454144	SRP328472	SAMN20239195	45,422,778	89%	45%
SRR15146751	SRX11454143	SRP328472	SAMN20239196	52,030,390	90%	52%
SRR15146749	SRX11454145	SRP328472	SAMN20239197	50,819,934	89%	49%
SRR15146748	SRX11454146	SRP328472	SAMN20239198	50,481,252	91%	45%
SRR15461231	SRX11760542	SRP332571	SAMN20777491	77,617,668	89%	35%
SRR15496475	SRX11796172	SRP332571	SAMN20799522	72,676,042	87%	31%
SRR15498210	SRX11797901	SRP332844	SAMN20822787	68,185,800	89%	32%
SRR15508059	SRX11807266	SRP333019	SAMN20835921	74,171,122	87%	38%
SRR15509780	SRX11808672	SRP333061	SAMN20845780	72,359,746	88%	36%
SRR15510204	SRX11809091	SRP333075	SAMN20825196	44,738,690	82%	34%
SRR15510203	SRX11809092	SRP333075	SAMN20825197	48,217,116	80%	42%
SRR15510202	SRX11809093	SRP333075	SAMN20825198	43,994,692	81%	35%
SRR15510201	SRX11809094	SRP333075	SAMN20825199	47,106,282	82%	37%
SRR15510200	SRX11809095	SRP333075	SAMN20825200	42,830,006	83%	35%
SRR15510199	SRX11809096	SRP333075	SAMN20825201	44,345,410	84%	40%
SRR15522880	SRX11821639	SRP333213	SAMN20845831	78,676,172	88%	37%
SRR15711076	SRX12006790	SRP335487	SAMN21210902	52,037,658	87%	33%
SRR15711075	SRX12006791	SRP335487	SAMN21210903	52,785,386	88%	37%
SRR15711074	SRX12006792	SRP335487	SAMN21210904	56,999,874	87%	36%
SRR15711073	SRX12006793	SRP335487	SAMN21210905	49,519,914	78%	29%
SRR15711072	SRX12006794	SRP335487	SAMN21210906	53,486,822	87%	37%
SRR15711071	SRX12006795	SRP335487	SAMN21210907	40,823,606	87%	34%
SRR16268611	SRX12547915	SRP340638	SAMN22170947	41,753,496	71%	36%
SRR16268610	SRX12547916	SRP340638	SAMN22170948	41,308,306	71%	38%
SRR16268609	SRX12547917	SRP340638	SAMN22170949	42,654,016	72%	36%
SRR16268608	SRX12547918	SRP340638	SAMN22170950	40,508,310	75%	38%
SRR16268607	SRX12547919	SRP340638	SAMN22170951	41,403,078	74%	38%
SRR16268606	SRX12547920	SRP340638	SAMN22170952	43,248,636	74%	38%
SRR17207781	SRX13387687	SRP350380	SAMN23929577	42,795,604	80%	33%
SRR17207780	SRX13387688	SRP350380	SAMN23929578	43,850,794	80%	32%
SRR17207777	SRX13387691	SRP350380	SAMN23929579	50,588,168	78%	27%
SRR17207776	SRX13387692	SRP350380	SAMN23929580	51,623,838	82%	34%
SRR17207775	SRX13387693	SRP350380	SAMN23929581	50,125,034	82%	34%
SRR17207774	SRX13387694	SRP350380	SAMN23929582	49,707,042	81%	34%
SRR17207773	SRX13387695	SRP350380	SAMN23929583	46,462,574	82%	31%
SRR17207772	SRX13387696	SRP350380	SAMN23929584	50,270,496	83%	32%
SRR17207771	SRX13387697	SRP350380	SAMN23929585	50,293,278	82%	32%
SRR17207770	SRX13387698	SRP350380	SAMN23929586	50,141,290	81%	33%
SRR17207779	SRX13387689	SRP350380	SAMN23929587	61,860,116	81%	34%
SRR17207778	SRX13387690	SRP350380	SAMN23929588	56,666,680	80%	33%
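Some samples were sequenced in several runs, so their read counts in the by-sample table are totals over the matching rows of the by-run table. A minimal sketch of that aggregation follows, using the seven runs of the two gill samples copied from the table above. The function name `reads_per_sample` is illustrative.

```python
from collections import defaultdict

# (run, sample, number of reads) copied from the by-run table
# for the two gill samples.
runs = [
    ("SRR11802347", "SAMN14939465", 51_787_046),
    ("SRR11802346", "SAMN14939465", 45_991_646),
    ("SRR11802335", "SAMN14939465", 43_257_672),
    ("SRR11802332", "SAMN14939466", 50_873_190),
    ("SRR11802331", "SAMN14939466", 54_329_294),
    ("SRR11802330", "SAMN14939466", 51_521_868),
    ("SRR11802329", "SAMN14939466", 41_870_812),
]


def reads_per_sample(rows):
    """Sum the read counts of all runs that belong to the same sample."""
    totals = defaultdict(int)
    for _run, sample, reads in rows:
        totals[sample] += reads
    return dict(totals)


totals = reads_per_sample(runs)
for sample, reads in totals.items():
    print(sample, f"{reads:,}")
```

The totals, 141,036,364 and 198,595,164 reads, match the values listed for SAMN14939465 and SAMN14939466 in the by-sample table.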

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Penaeus japonicus high-quality model RefSeq (XP_)	15,395	13,315 (86.49%)	13,315 (86.49%)	70.17%	71.63%
Hyalella azteca high-quality model RefSeq (XP_)	9,395	6,893 (73.37%)	6,893 (73.37%)	63.95%	55.94%
Crustacea GenBank	45,460	30,673 (67.47%)	30,673 (67.47%)	69.23%	74.49%
Homarus americanus high-quality model RefSeq (XP_)	14,107	12,914 (91.54%)	12,914 (91.54%)	72.16%	76.67%
Same-species GenBank	569	547 (96.13%)	547 (96.13%)	84.08%	88.23%
Tribolium castaneum GenBank	673	290 (43.09%)	290 (43.09%)	68.74%	66.63%
Tribolium castaneum high-quality model RefSeq (XP_)	11,487	7,800 (67.90%)	7,800 (67.90%)	61.32%	51.42%
Tribolium castaneum known RefSeq (NP_)	627	507 (80.86%)	507 (80.86%)	64.59%	54.81%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
Minimap2: Li H. Bioinformatics 2018 Sep 15;34(18):3094-3100
