NCBI Penaeus monodon Annotation Release 100

The RefSeq genome records for Penaeus monodon were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Penaeus monodon Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Nov 19 2020
Date of submission of annotation to the public databases: Nov 24 2020
Software version: 8.5

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
NSTDA_Pmon_1	GCF_015228065.1	SAFE-Aqua	11-05-2020	Reference	45 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	NSTDA_Pmon_1
Genes and pseudogenes	31,518
  protein-coding	24,092
  non-coding	4,257
  transcribed pseudogenes	71
  non-transcribed pseudogenes	3,098
  genes with variants	3,937
  immunoglobulin/T-cell receptor gene segments	0
  other	0
mRNAs	32,887
  fully-supported	25,190
  with > 5% ab initio	5,501
  partial	6,145
  with filled gap(s)	5,558
  known RefSeq (NM_)	0
  model RefSeq (XM_)	32,887
non-coding RNAs	4,894
  fully-supported	2,397
  with > 5% ab initio	0
  partial	27
  with filled gap(s)	27
  known RefSeq (NR_)	0
  model RefSeq (XR_)	2,952
pseudo transcripts	71
  fully-supported	51
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	1
  known RefSeq (NR_)	0
  model RefSeq (XR_)	71
CDSs	32,900
  fully-supported	25,190
  with > 5% ab initio	5,876
  partial	5,333
  with major correction(s)	3,453
  known RefSeq (NP_)	13
  model RefSeq (XP_)	32,887
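
The gene and CDS totals in the table above can be cross-checked against their sub-rows. The following is a minimal sketch with the counts copied by hand from the table; the variable names are illustrative only, and "genes with variants" is left out on the assumption that it describes a property of genes rather than a separate category.

```python
# Counts copied from the "Feature counts" table for NSTDA_Pmon_1.
gene_categories = {
    "protein-coding": 24_092,
    "non-coding": 4_257,
    "transcribed pseudogenes": 71,
    "non-transcribed pseudogenes": 3_098,
    "immunoglobulin/T-cell receptor gene segments": 0,
    "other": 0,
}
genes_and_pseudogenes = 31_518  # reported total

# Sum of the gene categories (assumed to be mutually exclusive).
category_total = sum(gene_categories.values())

# Genes once pseudogenes are excluded.
genes_without_pseudogenes = (
    gene_categories["protein-coding"] + gene_categories["non-coding"]
)

# CDSs: known RefSeq (NP_) plus model RefSeq (XP_).
cds_total = 13 + 32_887

print(category_total == genes_and_pseudogenes)
print(genes_without_pseudogenes)
print(cds_total)
```

Under these assumptions the categories add up to the reported 31,518, the 28,349 genes left after excluding pseudogenes match the Genes row of the Feature lengths table, and the two CDS sub-rows add up to the reported 32,900.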

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	28,349	15,308	5,411	57	420,180
All transcripts	37,781	2,143	1,510	57	55,019
  mRNA	32,887	2,381	1,721	102	55,019
  misc_RNA	403	2,609	2,357	107	8,775
  tRNA	1,940	74	73	65	88
  lncRNA	1,994	674	453	70	12,179
  snoRNA	141	182	199	63	211
  snRNA	258	149	160	57	202
  guide_RNA	4	157	134	130	234
  rRNA	154	360	119	118	5,695
Single-exon transcripts	1,593	1,177	834	276	8,721
  coding transcripts (NM_/XM_)	1,593	1,177	834	276	8,721
CDSs	32,900	1,610	1,122	102	53,859
Exons	175,083	288	157	1	17,253
  in coding transcripts (NM_/XM_)	169,238	289	157	1	17,253
  in non-coding transcripts (NR_/XR_)	7,389	245	133	2	11,655
Introns	144,065	2,941	569	30	118,768
  in coding transcripts (NM_/XM_)	140,252	2,885	566	30	118,768
  in non-coding transcripts (NR_/XR_)	5,298	4,404	653	30	99,524
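
The "All transcripts" row can be related to the per-type rows with a short calculation. The sketch below copies the counts and mean lengths from the table and recomputes the total count and a count-weighted mean length. Because the per-type means are already rounded to whole base pairs, the weighted mean is only an approximation; the names used are illustrative.

```python
# (count, mean length in bp) per transcript type, copied from the table above.
transcript_types = {
    "mRNA": (32_887, 2_381),
    "misc_RNA": (403, 2_609),
    "tRNA": (1_940, 74),
    "lncRNA": (1_994, 674),
    "snoRNA": (141, 182),
    "snRNA": (258, 149),
    "guide_RNA": (4, 157),
    "rRNA": (154, 360),
}

# Total number of transcripts across all types.
total_count = sum(count for count, _ in transcript_types.values())

# Approximate total length, then the count-weighted mean length.
approx_total_bp = sum(count * mean for count, mean in transcript_types.values())
approx_mean_bp = approx_total_bp / total_count

print(total_count)
print(round(approx_mean_bp))
```

The per-type counts add up to the 37,781 reported for all transcripts, and the weighted mean rounds to the reported 2,143 bp.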

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.36	1	1	50
Number of exons per transcript	7.52	5	1	116

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 24,079 coding genes, 15,368 genes had a protein with an alignment covering 50% or more of the query and 2,631 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
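
The two coverage definitions can be illustrated with a small calculation. This is a minimal sketch, not the pipeline's own code: it assumes a single contiguous alignment span with 1-based inclusive coordinates and ignores gaps, and the function name and the example coordinates are hypothetical. The last lines express the gene counts quoted above as shares of the 24,079 coding genes.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence included in one alignment span.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if not 1 <= aligned_start <= aligned_end <= sequence_length:
        raise ValueError("alignment span must lie within the sequence")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: residues 51-350 of a 400-residue annotated protein
# (query) aligned to residues 1-300 of a 600-residue curated protein (target).
query_coverage = coverage_percent(51, 350, 400)
target_coverage = coverage_percent(1, 300, 600)

# Shares of coding genes reaching each query-coverage threshold in this report.
coding_genes = 24_079
share_at_50 = 100.0 * 15_368 / coding_genes
share_at_95 = 100.0 * 2_631 / coding_genes

print(f"query coverage: {query_coverage:.1f}%, target coverage: {target_coverage:.1f}%")
print(f"genes at >=50% query coverage: {share_at_50:.2f}%")
print(f"genes at >=95% query coverage: {share_at_95:.2f}%")
```

In the hypothetical example the same 300-residue alignment gives 75% query coverage but only 50% target coverage, which is why the two measures are reported separately.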

[Figure: cumulative graph of the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline are included in the graph. Query: annotated proteins. Target: UniProtKB/Swiss-Prot curated proteins.]

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
NSTDA_Pmon_1	GCF_015228065.1	25.05%	48.14%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	785	711 (90.57%)	629 (80.13%)	98.95%	95.34%
Same-species EST	39,604	27,777 (70.14%)	24,750 (62.49%)	98.66%	98.07%
Same-species long SRA	22,418	22,054 (98.38%)	16,247 (72.47%)	97.81%	93.94%
Crustacea GenBank	49,251	4,957 (10.06%)	997 (2.02%)	92.03%	92.56%
Crustacea EST	880,354	100,860 (11.46%)	73,009 (8.29%)	92.82%	97.54%

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	7,129,784,112	72%	28%	166,578
SAMN00764091	25164406	India WSSV resistant animals (Penaeus monodon, SAMN00764091)	59,390,588	77%	13%	115,834
SAMN00764092	25164406	India Andaman Island animals (Penaeus monodon, SAMN00764092)	77,731,518	80%	13%	102,169
SAMN00764094	25164406	East coast animals (Penaeus monodon, SAMN00764094)	59,227,360	76%	13%	116,286
SAMN01164061	NA	Resistant Shrimp (Penaeus monodon, SAMN01164061)	240,897	45%	5%	1,157
SAMN03196615	NA	hepatopancreas (Penaeus monodon, pooled male and female, SAMN03196615)	135,905,608	79%	15%	91,385
SAMN03329577	NA	ovary (Penaeus monodon, female, SAMN03329577)	237,886	83%	26%	15,090
SAMN03329578	NA	ovary (Penaeus monodon, 14 month from grow-out ponds, female, SAMN03329578)	240,550	85%	49%	25,031
SAMN03840823	NA	Antennal glands (Penaeus monodon, male, SAMN03840823)	262,521,420	78%	17%	129,133
SAMN04161436	NA	Heart (Penaeus monodon, SAMN04161436)	45,063,432	86%	27%	48,061
SAMN04161437	NA	Muscle (Penaeus monodon, SAMN04161437)	24,625,638	43%	13%	25,228
SAMN04161438	NA	Hepato-pancreas (Penaeus monodon, SAMN04161438)	41,025,958	93%	13%	25,300
SAMN04161439	NA	Eyestalk (Penaeus monodon, SAMN04161439)	46,435,664	74%	22%	69,699
SAMN08131262	NA	eye stalks, brains, thoracic ganglia, abdominal ganglia (Penaeus monodon, pooled male and female, SAMN08131262)	53,965,308	74%	16%	104,640
SAMN08516578	NA	Muscle (Penaeus monodon, 4 month, pooled male and female, SAMN08516578)	1,849,137	90%	11%	14,170
SAMN08516579	NA	Muscle (Penaeus monodon, 4 month, pooled male and female, SAMN08516579)	1,773,676	88%	8%	15,151
SAMN08516580	NA	Muscle (Penaeus monodon, 4 month, pooled male and female, SAMN08516580)	1,791,636	71%	8%	15,205
SAMN08516581	NA	Muscle (Penaeus monodon, 4 month, pooled male and female, SAMN08516581)	1,792,814	88%	19%	18,231
SAMN08516583	NA	Muscle (Penaeus monodon, 4 month, pooled male and female, SAMN08516583)	1,791,794	87%	13%	15,311
SAMN08516584	NA	Muscle (Penaeus monodon, 4 month, pooled male and female, SAMN08516584)	1,775,350	69%	6%	10,222
SAMN08741487	NA	Gills (Penaeus monodon, female, SAMN08741487)	39,942,230	73%	26%	96,159
SAMN08741488	NA	Hepatopancreas (Penaeus monodon, female, SAMN08741488)	37,663,364	76%	34%	74,118
SAMN08741489	NA	Eyestalk (Penaeus monodon, female, SAMN08741489)	37,968,304	74%	24%	99,082
SAMN08741490	NA	Male Gonad (Penaeus monodon, male, SAMN08741490)	39,600,548	70%	26%	82,038
SAMN08741491	NA	Muscle (Penaeus monodon, female, SAMN08741491)	40,722,598	80%	24%	60,647
SAMN08741492	NA	Stomach (Penaeus monodon, female, SAMN08741492)	26,940,212	78%	29%	80,215
SAMN08741493	NA	Muscle (Penaeus monodon, male, SAMN08741493)	40,259,716	80%	24%	58,507
SAMN08741494	NA	Hepatopancreas (Penaeus monodon, male, SAMN08741494)	38,059,946	72%	31%	66,596
SAMN08741495	NA	Haemolymph (Penaeus monodon, female, SAMN08741495)	40,210,798	70%	25%	75,537
SAMN08741496	NA	Female Gonad (Penaeus monodon, female, SAMN08741496)	42,677,866	72%	43%	65,232
SAMN08741497	NA	Female Gonad (Penaeus monodon, female, SAMN08741497)	41,519,780	74%	44%	65,998
SAMN08741498	NA	Eyestalk (Penaeus monodon, male, SAMN08741498)	42,152,222	71%	22%	106,182
SAMN08741499	NA	Male Gonad (Penaeus monodon, male, SAMN08741499)	41,338,838	67%	42%	78,547
SAMN08741500	NA	Female Gonad (Penaeus monodon, female, SAMN08741500)	40,510,896	73%	42%	63,523
SAMN08741501	NA	Gills (Penaeus monodon, female, SAMN08741501)	42,724,152	67%	24%	85,343
SAMN08741502	NA	Haemolymph (Penaeus monodon, female, SAMN08741502)	40,494,412	69%	29%	64,813
SAMN08741503	NA	Stomach (Penaeus monodon, female, SAMN08741503)	42,923,178	72%	18%	77,580
SAMN08741504	NA	Eyestalk (Penaeus monodon, male, SAMN08741504)	44,500,590	70%	23%	79,034
SAMN08741505	NA	Gills (Penaeus monodon, male, SAMN08741505)	40,793,912	68%	23%	86,528
SAMN08741506	NA	Haemolymph (Penaeus monodon, male, SAMN08741506)	43,275,534	73%	28%	67,705
SAMN08741507	NA	Hepatopancreas (Penaeus monodon, male, SAMN08741507)	41,708,984	77%	24%	66,774
SAMN08741508	NA	Male Gonad (Penaeus monodon, male, SAMN08741508)	41,200,512	65%	46%	70,662
SAMN08741509	NA	Muscle (Penaeus monodon, male, SAMN08741509)	44,928,862	79%	23%	57,412
SAMN08741510	NA	Stomach (Penaeus monodon, male, SAMN08741510)	32,888,754	65%	26%	83,433
SAMN08741511	NA	Pooled animals (Penaeus monodon, pooled male and female, SAMN08741511)	39,490,626	73%	18%	79,964
SAMN08741512	NA	Pooled animals (Penaeus monodon, pooled male and female, SAMN08741512)	36,620,178	72%	22%	80,867
SAMN08741513	NA	Pooled animals (Penaeus monodon, pooled male and female, SAMN08741513)	39,057,378	75%	27%	95,464
SAMN08741514	NA	Pooled animals (Penaeus monodon, pooled male and female, SAMN08741514)	39,489,126	74%	23%	94,962
SAMN08741515	NA	Pooled animals (Penaeus monodon, pooled male and female, SAMN08741515)	39,630,206	75%	23%	93,646
SAMN08741516	NA	Pooled animals (Penaeus monodon, pooled male and female, SAMN08741516)	37,361,110	76%	25%	93,479
SAMN08741517	NA	Pooled animals (Penaeus monodon, pooled male and female, SAMN08741517)	37,547,334	77%	24%	92,155
SAMN08741518	NA	Pooled animals (Penaeus monodon, pooled male and female, SAMN08741518)	39,323,652	79%	24%	100,400
SAMN08741519	NA	Lymphoid Organ (Penaeus monodon, male, SAMN08741519)	39,747,506	73%	24%	76,054
SAMN08741520	NA	Lymphoid Organ (Penaeus monodon, male, SAMN08741520)	40,960,356	71%	27%	77,261
SAMN08741521	NA	Lymphoid Organ (Penaeus monodon, female, SAMN08741521)	40,745,724	70%	24%	74,431
SAMN09273661	NA	hepatopancreas (Penaeus monodon, SAMN09273661)	270,424	61%	40%	55,500
SAMN09652184	NA	muscle, hepatopancreas, haemocytes (Penaeus monodon, 60 days, SAMN09652184)	49,488,606	83%	24%	92,027
SAMN09652185	NA	muscle, hepatopancreas, haemocytes (Penaeus monodon, 60 days, SAMN09652185)	49,589,106	84%	20%	88,559
SAMN11606762	NA	Ovary (Penaeus monodon, 9 month, SAMN11606762)	108,800,908	66%	49%	61,892
SAMN11606763	NA	Ovary (Penaeus monodon, 9 month, SAMN11606763)	122,136,956	62%	49%	63,562
SAMN11606764	NA	Ovary (Penaeus monodon, 9 month, SAMN11606764)	105,386,300	44%	49%	51,546
SAMN11606765	NA	Ovary (Penaeus monodon, 9 month, SAMN11606765)	96,087,174	51%	50%	56,970
SAMN11606766	NA	Ovary (Penaeus monodon, 9 month, SAMN11606766)	119,697,940	65%	50%	67,415
SAMN11606767	NA	Ovary (Penaeus monodon, 9 month, SAMN11606767)	119,853,820	63%	48%	65,146
SAMN11606768	NA	Ovary (Penaeus monodon, 9 month, SAMN11606768)	97,661,268	54%	48%	56,931
SAMN11606769	NA	Ovary (Penaeus monodon, 9 month, SAMN11606769)	115,894,440	74%	48%	72,004
SAMN11606770	NA	Ovary (Penaeus monodon, 9 month, SAMN11606770)	235,945,850	76%	49%	88,833
SAMN11606771	NA	Ovary (Penaeus monodon, 9 month, SAMN11606771)	123,055,006	74%	50%	82,774
SAMN12253028	NA	Intestine (Penaeus monodon, 5-months, SAMN12253028)	129,625,348	66%	43%	92,701
SAMN12253029	NA	Intestine (Penaeus monodon, 5-months, SAMN12253029)	63,303,634	59%	39%	88,669
SAMN12253030	NA	Intestine (Penaeus monodon, 5-months, SAMN12253030)	76,870,614	62%	35%	88,224
SAMN12253031	NA	Intestine (Penaeus monodon, 5-months, SAMN12253031)	76,342,934	60%	38%	66,933
SAMN12253032	NA	Intestine (Penaeus monodon, 5-months, SAMN12253032)	60,428,548	58%	34%	77,658
SAMN12253033	NA	Intestine (Penaeus monodon, 5-months, SAMN12253033)	73,884,040	59%	33%	88,734
SAMN12253034	NA	Intestine (Penaeus monodon, 5-months, SAMN12253034)	62,639,360	62%	41%	93,483
SAMN12253035	NA	Intestine (Penaeus monodon, 5-months, SAMN12253035)	60,295,182	59%	35%	89,729
SAMN12253036	NA	Intestine (Penaeus monodon, 5-months, SAMN12253036)	65,183,512	58%	34%	88,699
SAMN12253037	NA	Intestine (Penaeus monodon, 5-months, SAMN12253037)	64,510,600	57%	33%	84,481
SAMN12739669	NA	intestine (Penaeus monodon, SAMN12739669)	57,870,922	69%	8%	59,375
SAMN12739670	NA	intestine (Penaeus monodon, SAMN12739670)	53,416,866	69%	12%	70,629
SAMN12739671	NA	intestine (Penaeus monodon, SAMN12739671)	53,714,234	63%	15%	62,176
SAMN12739672	NA	intestine (Penaeus monodon, SAMN12739672)	54,301,048	68%	4%	72,135
SAMN12739673	NA	intestine (Penaeus monodon, SAMN12739673)	50,985,090	69%	12%	66,977
SAMN12739675	NA	intestine (Penaeus monodon, SAMN12739675)	52,231,572	58%	5%	48,476
SAMN12739677	NA	intestine (Penaeus monodon, SAMN12739677)	53,170,672	74%	20%	79,398
SAMN12739678	NA	intestine (Penaeus monodon, SAMN12739678)	58,243,174	58%	7%	61,335
SAMN12739724	NA	intestine (Penaeus monodon, SAMN12739724)	55,173,744	69%	25%	83,512
SAMN12739726	NA	intestine (Penaeus monodon, SAMN12739726)	50,581,364	63%	17%	83,927
SAMN12739727	NA	intestine (Penaeus monodon, SAMN12739727)	45,106,434	23%	3%	29,952
SAMN12739728	NA	intestine (Penaeus monodon, SAMN12739728)	46,153,986	61%	8%	61,275
SAMN12739729	NA	intestine (Penaeus monodon, SAMN12739729)	47,331,658	54%	4%	62,078
SAMN12739730	NA	intestine (Penaeus monodon, SAMN12739730)	54,331,388	66%	13%	76,343
SAMN12739731	NA	intestine (Penaeus monodon, SAMN12739731)	51,296,128	70%	19%	79,822
SAMN14892359	NA	juvenile prawn, gill (Penaeus monodon, SAMN14892359)	146,458,648	78%	31%	120,386
SAMN14892360	NA	juvenile prawn, gill (Penaeus monodon, SAMN14892360)	144,479,694	79%	35%	125,017
SAMN14892361	NA	juvenile prawn, gill (Penaeus monodon, SAMN14892361)	143,856,502	79%	36%	121,159
SAMN15586315	NA	Eystalk S0_1 (Penaeus monodon, female, SAMN15586315)	69,429,284	79%	17%	112,964
SAMN15586316	NA	Eystalk S0_2 (Penaeus monodon, female, SAMN15586316)	64,483,194	82%	17%	104,138
SAMN15586317	NA	Eystalk S0_3 (Penaeus monodon, female, SAMN15586317)	56,773,836	87%	20%	104,316
SAMN15586318	NA	Eystalk S2_1 (Penaeus monodon, female, SAMN15586318)	72,995,682	83%	16%	104,724
SAMN15586319	NA	Eystalk S2_2 (Penaeus monodon, female, SAMN15586319)	66,544,648	81%	18%	109,306
SAMN15586320	NA	Eystalk S2_3 (Penaeus monodon, female, SAMN15586320)	59,150,248	79%	17%	106,662
SAMN15586321	NA	Brain and thoracic ganglia S0_1 (Penaeus monodon, female, SAMN15586321)	66,574,836	79%	19%	105,101
SAMN15586322	NA	Brain and thoracic ganglia S0_2 (Penaeus monodon, female, SAMN15586322)	62,328,674	81%	20%	104,372
SAMN15586323	NA	Brain and thoracic ganglia S0_3 (Penaeus monodon, female, SAMN15586323)	60,796,490	81%	19%	105,564
SAMN15586324	NA	Brain and thoracic ganglia S2_1 (Penaeus monodon, female, SAMN15586324)	66,001,640	80%	19%	106,878
SAMN15586325	NA	Brain and thoracic ganglia S2_2 (Penaeus monodon, female, SAMN15586325)	64,394,216	80%	20%	108,836
SAMN15586326	NA	Brain and thoracic ganglia S2_3 (Penaeus monodon, female, SAMN15586326)	63,064,496	79%	19%	106,978
SAMN15586327	NA	Antennal Gland S0_1 (Penaeus monodon, female, SAMN15586327)	62,937,906	76%	19%	94,385
SAMN15586328	NA	Antennal Gland S0_2 (Penaeus monodon, female, SAMN15586328)	65,671,968	78%	18%	97,072
SAMN15586329	NA	Antennal Gland S0_3 (Penaeus monodon, female, SAMN15586329)	132,973,786	77%	18%	103,413
SAMN15586330	NA	Antennal Gland S2_1 (Penaeus monodon, female, SAMN15586330)	67,612,714	77%	18%	94,619
SAMN15586331	NA	Antennal Gland S2_2 (Penaeus monodon, female, SAMN15586331)	65,463,032	77%	19%	96,596
SAMN15586332	NA	Antennal Gland S2_3 (Penaeus monodon, female, SAMN15586332)	66,088,040	78%	19%	96,694
SAMN15586333	NA	Ovary S0_1 (Penaeus monodon, female, SAMN15586333)	60,392,868	83%	30%	73,672
SAMN15586334	NA	Ovary S0_2 (Penaeus monodon, female, SAMN15586334)	65,728,234	88%	32%	79,332
SAMN15586335	NA	Ovary S0_3 (Penaeus monodon, female, SAMN15586335)	58,881,512	89%	32%	74,122
SAMN15586336	NA	Ovary S2_1 (Penaeus monodon, female, SAMN15586336)	70,843,144	86%	32%	79,678
SAMN15586337	NA	Ovary S2_2 (Penaeus monodon, female, SAMN15586337)	65,300,150	88%	32%	79,864
SAMN15586338	NA	Ovary S2_3 (Penaeus monodon, female, SAMN15586338)	67,333,182	88%	32%	82,828

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR388207	SRX110649	SRP009653	SAMN00764091	59,390,588	77%	13%
SRR388221	SRX110651	SRP009653	SAMN00764092	77,731,518	80%	13%
SRR388222	SRX110652	SRP009653	SAMN00764094	59,227,360	76%	13%
SRR577080	SRX186016	SRP015700	SAMN01164061	240,897	45%	5%
SRR1648423	SRX757561	SRP049934	SAMN03196615	67,170,322	80%	11%
SRR1648424	SRX757562	SRP049934	SAMN03196615	68,735,286	79%	20%
SRR2191764	SRX1163483	SRP060408	SAMN03840823	262,521,420	78%	17%
SRR2643301	SRX1333495	SRP064775	SAMN04161436	45,063,432	86%	27%
SRR2643302	SRX1333568	SRP064775	SAMN04161437	24,625,638	43%	13%
SRR2643304	SRX1333569	SRP064775	SAMN04161438	41,025,958	93%	13%
SRR2643305	SRX1333570	SRP064775	SAMN04161439	46,435,664	74%	22%
SRR1805121	SRX2396566	SRP075304	SAMN03329577	237,886	83%	26%
SRR1805122	SRX878238	SRP075304	SAMN03329578	240,550	85%	49%
SRR6363366	SRX3459393	SRP126411	SAMN08131262	53,965,308	74%	16%
SRR6868130	SRX3822690	SRP127068	SAMN08741487	39,942,230	73%	26%
SRR6868129	SRX3822691	SRP127068	SAMN08741488	37,663,364	76%	34%
SRR6868132	SRX3822688	SRP127068	SAMN08741489	37,968,304	74%	24%
SRR6868131	SRX3822689	SRP127068	SAMN08741490	39,600,548	70%	26%
SRR6868134	SRX3822686	SRP127068	SAMN08741491	40,722,598	80%	24%
SRR6868133	SRX3822687	SRP127068	SAMN08741492	26,940,212	78%	29%
SRR6868136	SRX3822684	SRP127068	SAMN08741493	40,259,716	80%	24%
SRR6868135	SRX3822685	SRP127068	SAMN08741494	38,059,946	72%	31%
SRR6868127	SRX3822693	SRP127068	SAMN08741495	40,210,798	70%	25%
SRR6868126	SRX3822694	SRP127068	SAMN08741496	42,677,866	72%	43%
SRR6868149	SRX3822670	SRP127068	SAMN08741497	41,519,780	74%	44%
SRR6868152	SRX3822671	SRP127068	SAMN08741498	42,152,222	71%	22%
SRR6868148	SRX3822672	SRP127068	SAMN08741499	41,338,838	67%	42%
SRR6868147	SRX3822673	SRP127068	SAMN08741500	20,338,174	73%	42%
SRR6868143	SRX3822677	SRP127068	SAMN08741500	20,172,722	73%	42%
SRR6868154	SRX3822666	SRP127068	SAMN08741501	21,445,634	67%	24%
SRR6868146	SRX3822674	SRP127068	SAMN08741501	21,278,518	67%	24%
SRR6868153	SRX3822667	SRP127068	SAMN08741502	20,319,470	69%	29%
SRR6868145	SRX3822675	SRP127068	SAMN08741502	20,174,942	69%	29%
SRR6868151	SRX3822668	SRP127068	SAMN08741503	21,522,732	72%	18%
SRR6868138	SRX3822682	SRP127068	SAMN08741503	21,400,446	72%	18%
SRR6868150	SRX3822669	SRP127068	SAMN08741504	22,316,126	70%	23%
SRR6868137	SRX3822683	SRP127068	SAMN08741504	22,184,464	70%	23%
SRR6868160	SRX3822663	SRP127068	SAMN08741505	20,461,084	68%	23%
SRR6868118	SRX3822702	SRP127068	SAMN08741505	20,332,828	68%	23%
SRR6868156	SRX3822664	SRP127068	SAMN08741506	21,703,132	73%	28%
SRR6868119	SRX3822701	SRP127068	SAMN08741506	21,572,402	73%	28%
SRR6868166	SRX3822653	SRP127068	SAMN08741507	20,909,922	77%	24%
SRR6868123	SRX3822700	SRP127068	SAMN08741507	20,799,062	77%	24%
SRR6868167	SRX3822652	SRP127068	SAMN08741508	20,714,776	65%	46%
SRR6868120	SRX3822699	SRP127068	SAMN08741508	20,485,736	65%	46%
SRR6868168	SRX3822655	SRP127068	SAMN08741509	22,541,996	79%	23%
SRR6868121	SRX3822698	SRP127068	SAMN08741509	22,386,866	79%	23%
SRR6868165	SRX3822654	SRP127068	SAMN08741510	16,503,986	65%	26%
SRR6868122	SRX3822697	SRP127068	SAMN08741510	16,384,768	65%	26%
SRR6868171	SRX3822649	SRP127068	SAMN08741511	19,822,322	74%	18%
SRR6868124	SRX3822696	SRP127068	SAMN08741511	19,668,304	73%	18%
SRR6868172	SRX3822648	SRP127068	SAMN08741512	18,357,896	72%	22%
SRR6868125	SRX3822695	SRP127068	SAMN08741512	18,262,282	72%	22%
SRR6868169	SRX3822651	SRP127068	SAMN08741513	19,588,024	75%	27%
SRR6868116	SRX3822704	SRP127068	SAMN08741513	19,469,354	75%	27%
SRR6868170	SRX3822650	SRP127068	SAMN08741514	19,818,090	74%	23%
SRR6868117	SRX3822703	SRP127068	SAMN08741514	19,671,036	74%	23%
SRR6868163	SRX3822657	SRP127068	SAMN08741515	19,886,890	75%	23%
SRR6868161	SRX3822659	SRP127068	SAMN08741515	19,743,316	75%	23%
SRR6868164	SRX3822656	SRP127068	SAMN08741516	18,742,508	76%	25%
SRR6868159	SRX3822660	SRP127068	SAMN08741516	18,618,602	76%	25%
SRR6868158	SRX3822661	SRP127068	SAMN08741517	18,710,608	77%	24%
SRR6868140	SRX3822680	SRP127068	SAMN08741517	18,836,726	77%	24%
SRR6868157	SRX3822662	SRP127068	SAMN08741518	19,599,512	79%	24%
SRR6868139	SRX3822681	SRP127068	SAMN08741518	19,724,140	79%	24%
SRR6868162	SRX3822658	SRP127068	SAMN08741519	19,781,110	73%	24%
SRR6868142	SRX3822678	SRP127068	SAMN08741519	19,966,396	73%	24%
SRR6868141	SRX3822679	SRP127068	SAMN08741520	20,558,790	71%	27%
SRR6868128	SRX3822692	SRP127068	SAMN08741520	20,401,566	71%	27%
SRR6868155	SRX3822665	SRP127068	SAMN08741521	20,300,098	70%	24%
SRR6868144	SRX3822676	SRP127068	SAMN08741521	20,445,626	70%	24%
SRR6713996	SRX3687430	SRP132755	SAMN08516578	1,849,137	90%	11%
SRR6713995	SRX3687431	SRP132755	SAMN08516579	1,773,676	88%	8%
SRR6713990	SRX3687436	SRP132755	SAMN08516580	1,791,636	71%	8%
SRR6713989	SRX3687437	SRP132755	SAMN08516581	1,792,814	88%	19%
SRR6713991	SRX3687435	SRP132755	SAMN08516583	1,791,794	87%	13%
SRR6713998	SRX3687428	SRP132755	SAMN08516584	1,775,350	69%	6%
SRR7516835	SRX4386842	SRP153251	SAMN09652184	49,488,606	83%	24%
SRR7516836	SRX4386841	SRP153251	SAMN09652185	49,589,106	84%	20%
SRR8078925	SRX4906103	SRP166136	SAMN09273661	44,494	61%	41%
SRR8078924	SRX4906104	SRP166136	SAMN09273661	42,936	63%	41%
SRR8078923	SRX4906105	SRP166136	SAMN09273661	48,691	59%	38%
SRR8078922	SRX4906106	SRP166136	SAMN09273661	43,025	62%	41%
SRR8078921	SRX4906107	SRP166136	SAMN09273661	45,760	62%	41%
SRR8078920	SRX4906108	SRP166136	SAMN09273661	45,518	61%	40%
SRR9031901	SRX5809171	SRP197216	SAMN11606762	108,800,908	66%	49%
SRR9031902	SRX5809170	SRP197216	SAMN11606763	122,136,956	62%	49%
SRR9031899	SRX5809173	SRP197216	SAMN11606764	105,386,300	44%	49%
SRR9031900	SRX5809172	SRP197216	SAMN11606765	96,087,174	51%	50%
SRR9031905	SRX5809167	SRP197216	SAMN11606766	119,697,940	65%	50%
SRR9031906	SRX5809166	SRP197216	SAMN11606767	119,853,820	63%	48%
SRR9031903	SRX5809169	SRP197216	SAMN11606768	97,661,268	54%	48%
SRR9031904	SRX5809168	SRP197216	SAMN11606769	115,894,440	74%	48%
SRR9031907	SRX5809165	SRP197216	SAMN11606770	235,945,850	76%	49%
SRR9031908	SRX5809164	SRP197216	SAMN11606771	123,055,006	74%	50%
SRR9669664	SRX6430186	SRP214180	SAMN12253028	129,625,348	66%	43%
SRR9669678	SRX6430172	SRP214180	SAMN12253029	63,303,634	59%	39%
SRR9669674	SRX6430176	SRP214180	SAMN12253030	76,870,614	62%	35%
SRR9669675	SRX6430175	SRP214180	SAMN12253031	76,342,934	60%	38%
SRR9669688	SRX6430162	SRP214180	SAMN12253032	60,428,548	58%	34%
SRR9669689	SRX6430161	SRP214180	SAMN12253033	73,884,040	59%	33%
SRR9669673	SRX6430177	SRP214180	SAMN12253034	62,639,360	62%	41%
SRR9669687	SRX6430163	SRP214180	SAMN12253035	60,295,182	59%	35%
SRR9669660	SRX6430190	SRP214180	SAMN12253036	65,183,512	58%	34%
SRR9669661	SRX6430189	SRP214180	SAMN12253037	64,510,600	57%	33%
SRR10279842	SRX6992891	SRP225718	SAMN12739669	57,870,922	69%	8%
SRR10279841	SRX6992892	SRP225718	SAMN12739670	53,416,866	69%	12%
SRR10279835	SRX6992898	SRP225718	SAMN12739671	53,714,234	63%	15%
SRR10279834	SRX6992899	SRP225718	SAMN12739672	54,301,048	68%	4%
SRR10279833	SRX6992900	SRP225718	SAMN12739673	50,985,090	69%	12%
SRR10279832	SRX6992901	SRP225718	SAMN12739675	52,231,572	58%	5%
SRR10279831	SRX6992902	SRP225718	SAMN12739677	53,170,672	74%	20%
SRR10279830	SRX6992903	SRP225718	SAMN12739678	58,243,174	58%	7%
SRR10279829	SRX6992904	SRP225718	SAMN12739724	55,173,744	69%	25%
SRR10279828	SRX6992905	SRP225718	SAMN12739726	50,581,364	63%	17%
SRR10279840	SRX6992893	SRP225718	SAMN12739727	45,106,434	23%	3%
SRR10279839	SRX6992894	SRP225718	SAMN12739728	46,153,986	61%	8%
SRR10279838	SRX6992895	SRP225718	SAMN12739729	47,331,658	54%	4%
SRR10279837	SRX6992896	SRP225718	SAMN12739730	54,331,388	66%	13%
SRR10279836	SRX6992897	SRP225718	SAMN12739731	51,296,128	70%	19%
SRR11805665	SRX8357080	SRP262105	SAMN14892359	49,236,624	77%	34%
SRR11805664	SRX8357081	SRP262105	SAMN14892359	48,478,994	79%	32%
SRR11805655	SRX8357090	SRP262105	SAMN14892359	48,743,030	78%	29%
SRR11805654	SRX8357091	SRP262105	SAMN14892360	47,468,446	77%	35%
SRR11805653	SRX8357092	SRP262105	SAMN14892360	48,525,278	80%	34%
SRR11805652	SRX8357093	SRP262105	SAMN14892360	48,485,970	79%	34%
SRR11805651	SRX8357094	SRP262105	SAMN14892361	48,220,504	78%	36%
SRR11805650	SRX8357095	SRP262105	SAMN14892361	47,663,910	81%	35%
SRR11805649	SRX8357096	SRP262105	SAMN14892361	47,972,088	79%	36%
SRR12272664	SRX8777666	SRP272661	SAMN15586315	69,429,284	79%	17%
SRR12272663	SRX8777667	SRP272661	SAMN15586316	64,483,194	82%	17%
SRR12272652	SRX8777678	SRP272661	SAMN15586317	56,773,836	87%	20%
SRR12272647	SRX8777683	SRP272661	SAMN15586318	72,995,682	83%	16%
SRR12272646	SRX8777684	SRP272661	SAMN15586319	66,544,648	81%	18%
SRR12272645	SRX8777685	SRP272661	SAMN15586320	59,150,248	79%	17%
SRR12272644	SRX8777686	SRP272661	SAMN15586321	66,574,836	79%	19%
SRR12272643	SRX8777687	SRP272661	SAMN15586322	62,328,674	81%	20%
SRR12272642	SRX8777688	SRP272661	SAMN15586323	60,796,490	81%	19%
SRR12272641	SRX8777689	SRP272661	SAMN15586324	66,001,640	80%	19%
SRR12272662	SRX8777668	SRP272661	SAMN15586325	64,394,216	80%	20%
SRR12272661	SRX8777669	SRP272661	SAMN15586326	63,064,496	79%	19%
SRR12272660	SRX8777670	SRP272661	SAMN15586327	62,937,906	76%	19%
SRR12272659	SRX8777671	SRP272661	SAMN15586328	65,671,968	78%	18%
SRR12272658	SRX8777672	SRP272661	SAMN15586329	132,973,786	77%	18%
SRR12272657	SRX8777673	SRP272661	SAMN15586330	67,612,714	77%	18%
SRR12272656	SRX8777674	SRP272661	SAMN15586331	65,463,032	77%	19%
SRR12272655	SRX8777675	SRP272661	SAMN15586332	66,088,040	78%	19%
SRR12272654	SRX8777676	SRP272661	SAMN15586333	60,392,868	83%	30%
SRR12272653	SRX8777677	SRP272661	SAMN15586334	65,728,234	88%	32%
SRR12272651	SRX8777679	SRP272661	SAMN15586335	58,881,512	89%	32%
SRR12272650	SRX8777680	SRP272661	SAMN15586336	70,843,144	86%	32%
SRR12272649	SRX8777681	SRP272661	SAMN15586337	65,300,150	88%	32%
SRR12272648	SRX8777682	SRP272661	SAMN15586338	67,333,182	88%	32%
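
The per-sample read counts reported earlier appear to be the sums of the per-run counts for each sample. The sketch below checks this on a few rows copied from the by-run table; it assumes the rows are available as tab-separated text, and the variable names are illustrative. Only read counts are summed, since the percentage columns cannot be combined by simple addition.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the by-run table (tab-separated, trimmed to the
# columns needed here).
RUN_ROWS = """\
Run\tSample\tNumber of reads
SRR388207\tSAMN00764091\t59,390,588
SRR1648423\tSAMN03196615\t67,170,322
SRR1648424\tSAMN03196615\t68,735,286
SRR6868147\tSAMN08741500\t20,338,174
SRR6868143\tSAMN08741500\t20,172,722
"""

# Sum the read counts of all runs belonging to the same sample.
reads_per_sample = defaultdict(int)
for row in csv.DictReader(io.StringIO(RUN_ROWS), delimiter="\t"):
    reads = int(row["Number of reads"].replace(",", ""))
    reads_per_sample[row["Sample"]] += reads

for sample, reads in reads_per_sample.items():
    print(f"{sample}\t{reads:,}")
```

For these rows the sums match the by-sample table: 59,390,588 reads for SAMN00764091, 135,905,608 for SAMN03196615 and 40,510,896 for SAMN08741500.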

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Hyalella azteca high-quality model RefSeq (XP_)	9,395	6,443 (68.58%)	6,443 (68.58%)	64.57%	57.66%
Caenorhabditis elegans known RefSeq (NP_)	28,338	9,100 (32.11%)	9,100 (32.11%)	58.92%	41.42%
Crustacea GenBank	44,303	28,097 (63.42%)	28,097 (63.42%)	70.53%	76.66%
Daphnia pulex Other	31,004	11,797 (38.05%)	11,797 (38.05%)	62.90%	56.92%
Same-species GenBank	576	524 (90.97%)	524 (90.97%)	83.33%	89.62%
Penaeus vannamei high-quality model RefSeq (XP_)	11,838	10,846 (91.62%)	10,846 (91.62%)	80.29%	86.30%
Tribolium castaneum GenBank	661	280 (42.36%)	280 (42.36%)	68.67%	68.55%
Tribolium castaneum high-quality model RefSeq (XP_)	11,487	7,324 (63.76%)	7,324 (63.76%)	61.81%	51.36%
Tribolium castaneum known RefSeq (NP_)	627	499 (79.59%)	499 (79.59%)	64.69%	53.56%
Drosophila melanogaster known RefSeq (NP_)	30,704	12,886 (41.97%)	12,886 (41.97%)	62.02%	47.54%
Eurytemora affinis high-quality model RefSeq (XP_)	14,540	7,457 (51.29%)	7,457 (51.29%)	60.49%	45.63%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
