NCBI Pecten maximus Annotation Release 100

The RefSeq genome records for Pecten maximus were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Pecten maximus Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Apr 16 2020
Date of submission of annotation to the public databases: Apr 28 2020
Software version: 8.4

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
xPecMax1.1	GCF_902652985.1	SC	11-26-2019	Reference	19 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	xPecMax1.1
Genes and pseudogenes	30,903
protein-coding	26,152
non-coding	4,449
transcribed pseudogenes	0
non-transcribed pseudogenes	302
genes with variants	6,736
immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	39,918
fully-supported	34,090
with > 5% ab initio	4,389
partial	1,015
with filled gap(s)	515
known RefSeq (NM_)	0
model RefSeq (XM_)	39,918
non-coding RNAs	7,731
fully-supported	6,407
with > 5% ab initio	0
partial	6
with filled gap(s)	5
known RefSeq (NR_)	0
model RefSeq (XR_)	6,893
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	39,918
fully-supported	34,090
with > 5% ab initio	4,607
partial	931
with major correction(s)	1,123
known RefSeq (NP_)	0
model RefSeq (XP_)	39,918
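The totals in the feature-count table can be cross-checked with simple arithmetic. The sketch below copies the figures from the table; the dictionary keys are illustrative labels, not NCBI field names. Non-coding RNAs minus XR_ records gives 838, the same as the tRNA count in the feature-length table, which suggests (the report does not state it) that the tRNAs carry no XR_ accession.

```python
# Counts copied from the "Feature counts" table for xPecMax1.1.
# Keys are illustrative labels, not official NCBI field names.
counts = {
    "genes_and_pseudogenes": 30_903,
    "protein_coding": 26_152,
    "non_coding": 4_449,
    "transcribed_pseudogenes": 0,
    "non_transcribed_pseudogenes": 302,
    "ig_tcr_segments": 0,
    "other": 0,
    "ncrna_total": 7_731,
    "ncrna_xr": 6_893,
    "mrna_total": 39_918,
    "cds_total": 39_918,
}

def gene_subtotal(c):
    """Sum the gene categories that make up 'Genes and pseudogenes'."""
    keys = ["protein_coding", "non_coding", "transcribed_pseudogenes",
            "non_transcribed_pseudogenes", "ig_tcr_segments", "other"]
    return sum(c[k] for k in keys)

subtotal = gene_subtotal(counts)
print(subtotal, subtotal == counts["genes_and_pseudogenes"])  # 30903 True
# mRNA and CDS counts match one-to-one in this release
print(counts["mrna_total"] == counts["cds_total"])            # True
# non-coding RNAs without an XR_ accession
print(counts["ncrna_total"] - counts["ncrna_xr"])             # 838
```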

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	30,601	16,624	7,956	68	896,274
All transcripts	47,649	3,353	2,548	55	61,359
mRNA	39,918	3,558	2,727	126	61,359
misc_RNA	2,153	4,773	4,254	155	16,567
tRNA	838	74	73	70	84
lncRNA	4,254	1,679	1,092	55	12,485
snoRNA	73	115	79	68	281
snRNA	148	151	158	105	199
guide_RNA	2	131	131	131	131
rRNA	263	880	121	119	3,739
Single-exon transcripts	2,681	1,649	1,230	279	14,672
coding transcripts (NM_/XM_)	2,681	1,649	1,230	279	14,672
CDSs	39,918	1,820	1,335	126	61,008
Exons	238,842	399	143	1	16,265
in coding transcripts (NM_/XM_)	225,170	378	141	1	16,265
in non-coding transcripts (NR_/XR_)	22,744	523	157	2	10,383
Introns	207,927	2,417	763	30	469,469
in coding transcripts (NM_/XM_)	198,977	2,446	779	30	469,469
in non-coding transcripts (NR_/XR_)	17,921	2,180	608	30	223,297
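The length table can be checked against the feature counts in the same way: the transcript types sum to the "All transcripts" row, and the gene row equals genes and pseudogenes minus the 302 non-transcribed pseudogenes. A minimal sketch using the figures from the two tables follows. The exon and intron sub-rows are not additive (225,170 + 22,744 exceeds 238,842), and the report does not say why, so they are not summed.

```python
# Transcript counts copied from the "Feature lengths" table.
transcript_counts = {
    "mRNA": 39_918, "misc_RNA": 2_153, "tRNA": 838, "lncRNA": 4_254,
    "snoRNA": 73, "snRNA": 148, "guide_RNA": 2, "rRNA": 263,
}
ALL_TRANSCRIPTS = 47_649
GENES_AND_PSEUDOGENES = 30_903   # from the "Feature counts" table
PSEUDOGENES = 302                # non-transcribed pseudogenes
GENES_IN_LENGTH_TABLE = 30_601   # counts exclude pseudogenes

total = sum(transcript_counts.values())
print(total, total == ALL_TRANSCRIPTS)                              # 47649 True
print(GENES_AND_PSEUDOGENES - PSEUDOGENES == GENES_IN_LENGTH_TABLE)  # True

# Everything except mRNA equals the "non-coding RNAs" feature count.
non_mrna = total - transcript_counts["mRNA"]
print(non_mrna)                                                      # 7731

# Exon sub-rows overlap, so they should not be added to get the total.
print(225_170 + 22_744 > 238_842)                                    # True
```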

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.57	1	1	50
Number of exons per transcript	9.61	6	1	179
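The per-gene and per-transcript summaries are ordinary descriptive statistics. The sketch below shows how the mean, median, minimum and maximum could be computed from a gene-to-transcript mapping; the mapping is hypothetical toy data, and how the pipeline handles edge cases is not described in the report.

```python
from statistics import mean, median

def summarize(values):
    """Return (mean rounded to 2 decimals, median, min, max)."""
    return round(mean(values), 2), median(values), min(values), max(values)

# Hypothetical toy input: gene ID -> transcript IDs (not real annotation data).
gene_to_transcripts = {
    "geneA": ["t1"],
    "geneB": ["t2", "t3", "t4"],
    "geneC": ["t5", "t6"],
    "geneD": ["t7"],
}
per_gene = [len(t) for t in gene_to_transcripts.values()]
print(summarize(per_gene))  # (1.75, 1.5, 1, 3)
```

The same `summarize` call applies to exons per transcript, given a list of exon counts.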

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 26,152 coding genes, 16,520 genes had a protein with an alignment covering 50% or more of the query and 3,495 had an alignment covering 95% or more of the query.
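Expressed as fractions of the 26,152 coding genes, the two coverage counts work out as follows (plain arithmetic on the reported numbers):

```python
coding_genes = 26_152
genes_50 = 16_520  # best alignment covers >= 50% of the query
genes_95 = 3_495   # best alignment covers >= 95% of the query

for label, n in (("50%", genes_50), ("95%", genes_95)):
    share = 100 * n / coding_genes
    print(f">= {label} query coverage: {n:,} genes ({share:.1f}%)")
# >= 50% query coverage: 16,520 genes (63.2%)
# >= 95% query coverage: 3,495 genes (13.4%)
```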

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
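A minimal sketch of the two coverage definitions, assuming the alignment is given as one or more 1-based, inclusive (start, end) ranges on each sequence, as in BLAST tabular output. Overlapping ranges are merged so that no residue is counted twice; whether the pipeline combines multiple HSPs this way is an assumption, and the function names are hypothetical.

```python
def covered_length(intervals):
    """Length covered by the union of 1-based inclusive (start, end) ranges."""
    total = 0
    cur_start = cur_end = None
    # Normalize reversed ranges, then sweep left to right merging overlaps.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total

def coverage(intervals, seq_len):
    """Percentage of a sequence of length seq_len included in the alignment."""
    return 100.0 * covered_length(intervals) / seq_len

# Hypothetical alignment: 400-aa annotated protein (query), 500-aa target.
query_cov = coverage([(1, 150), (120, 300)], 400)    # union 1..300
target_cov = coverage([(10, 160), (200, 380)], 500)  # 151 + 181 residues
print(f"query {query_cov:.1f}%, target {target_cov:.1f}%")  # query 75.0%, target 66.4%
```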

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Figure: cumulative number of genes with alignments above a given coverage threshold (query: annotated proteins; target: UniProtKB/Swiss-Prot curated proteins)
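A cumulative curve of this kind can be derived from per-protein coverages by keeping the best value for each gene and counting genes at or above each threshold, which matches the per-gene wording used above ("genes had a protein with an alignment covering..."). The function names and coverage values below are hypothetical; the sketch only illustrates the counting.

```python
def best_per_gene(hits):
    """Keep the highest coverage seen for each gene from (gene, coverage) pairs."""
    best = {}
    for gene, cov in hits:
        best[gene] = max(cov, best.get(gene, 0.0))
    return best

def cumulative_counts(best_coverage_by_gene, thresholds):
    """For each threshold, count genes whose best alignment reaches it."""
    return {t: sum(1 for c in best_coverage_by_gene.values() if c >= t)
            for t in thresholds}

# Hypothetical per-protein query coverages (%); g1 has two protein isoforms.
hits = [("g1", 98.0), ("g1", 60.0), ("g2", 72.5),
        ("g3", 50.0), ("g4", 12.0), ("g5", 96.1)]
print(cumulative_counts(best_per_gene(hits), [0, 25, 50, 75, 95]))
# {0: 5, 25: 4, 50: 4, 75: 2, 95: 2}
```

Genes with no BLASTP hit never enter the dictionary, so they drop out of every threshold, as they would in the reported counts.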

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
xPecMax1.1	GCF_902652985.1	1.71%	31.17%
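Masking percentages of this kind can be computed from a soft-masked FASTA, in which masked bases are written in lowercase. The sketch assumes that convention and counts every sequence character in the denominator; whether assembly gaps (runs of N) are excluded from the denominator is not stated in this report, and the function name is hypothetical.

```python
def percent_soft_masked(fasta_lines):
    """Percent of sequence characters that are lowercase (soft-masked)."""
    masked = total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        total += len(line)
        masked += sum(1 for ch in line if ch.islower())
    return 100.0 * masked / total if total else 0.0

# Toy soft-masked record: 8 of 20 bases are lowercase.
example = [">chr_toy", "ACGTacgtAC", "GTacACGTac"]
print(f"{percent_soft_masked(example):.2f}%")  # 40.00%
```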

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	16	15 (93.75%)	15 (93.75%)	98.86%	98.76%
Same-species EST	1,122	1,063 (94.74%)	1,018 (90.73%)	99.27%	98.55%
Same-species long SRA	1,105,257	1,000,954 (90.56%)	712,439 (64.46%)	94.15%	57.42%
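In the transcript table, both percentages are relative to the number of sequences retrieved from Entrez, not to the number aligned; for example, 712,439 / 1,105,257 gives the 64.46% reported for long SRA reads passed to Gnomon. The sketch below recomputes the columns from the raw counts.

```python
# (retrieved, aligned by Splign, passed to Gnomon), copied from the table.
rows = {
    "Same-species GenBank": (16, 15, 15),
    "Same-species EST": (1_122, 1_063, 1_018),
    "Same-species long SRA": (1_105_257, 1_000_954, 712_439),
}

def pct(part, whole):
    """Percentage formatted to two decimals, as in the report."""
    return f"{100 * part / whole:.2f}"

for source, (retrieved, aligned, passed) in rows.items():
    # Both columns use 'retrieved' as the denominator.
    print(f"{source}: aligned {pct(aligned, retrieved)}%, "
          f"passed {pct(passed, retrieved)}%")
```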

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	4,863,653,370	76%	13%	265,972
SAMEA4788157	NA	adductor muscle, hepatopancreas, male & female gonad (Pecten maximus, SAMEA4788157)	41,776,607	86%	25%	140,365
SAMN02370931	24486903	control (Pecten maximus, SAMN02370931)	67,643,592	88%	22%	122,408
SAMN02370932	24486903	stim_Vibrio (Pecten maximus, SAMN02370932)	70,855,070	88%	23%	118,420
SAMN02370933	24486903	stim_PAMPs (Pecten maximus, SAMN02370933)	77,946,012	83%	21%	90,975
SAMN02693243	NA	Mantle (Pecten maximus, 1 year, SAMN02693243)	1,335,060,734	80%	4%	215,248
SAMN04329996	NA	mantle edge (Pecten maximus, SAMN04329996)	36,298,044	69%	10%	71,274
SAMN04329997	NA	mantle edge (Pecten maximus, SAMN04329997)	22,772,048	69%	18%	82,272
SAMN04329998	NA	mantle edge (Pecten maximus, SAMN04329998)	15,934,782	71%	15%	57,416
SAMN04329999	NA	mantle edge (Pecten maximus, SAMN04329999)	30,291,796	70%	11%	78,502
SAMN04330000	NA	mantle edge (Pecten maximus, SAMN04330000)	19,730,946	70%	18%	64,379
SAMN04330001	NA	mantle edge (Pecten maximus, SAMN04330001)	30,912,312	70%	15%	77,713
SAMN04330002	NA	mantle edge (Pecten maximus, SAMN04330002)	26,189,046	69%	13%	54,958
SAMN04330003	NA	mantle edge (Pecten maximus, SAMN04330003)	34,735,438	70%	15%	83,398
SAMN04330004	NA	mantle edge (Pecten maximus, SAMN04330004)	33,407,452	72%	12%	79,150
SAMN04330005	NA	mantle edge (Pecten maximus, SAMN04330005)	30,864,958	73%	17%	82,145
SAMN04330006	NA	mantle edge (Pecten maximus, SAMN04330006)	17,562,498	71%	14%	70,407
SAMN04330007	NA	mantle edge (Pecten maximus, SAMN04330007)	20,191,258	68%	10%	44,207
SAMN04330008	NA	mantle edge (Pecten maximus, SAMN04330008)	30,191,656	73%	11%	71,561
SAMN04330009	NA	mantle edge (Pecten maximus, SAMN04330009)	32,923,552	71%	11%	73,462
SAMN06067809	NA	stripped oocytes (Pecten maximus, SAMN06067809)	66,454,364	86%	16%	136,938
SAMN06067812	NA	released oocytes (Pecten maximus, SAMN06067812)	84,121,748	84%	12%	95,403
SAMN08049259	NA	pool of larvae (Pecten maximus, 13 dpf, SAMN08049259)	16,968,577	81%	8%	107,785
SAMN08049260	NA	pool of larvae (Pecten maximus, 13 dpf, SAMN08049260)	14,197,389	79%	9%	104,344
SAMN08049261	NA	pool of larvae (Pecten maximus, 13 dpf, SAMN08049261)	18,064,107	80%	9%	104,502
SAMN08049262	NA	pool of larvae (Pecten maximus, 13 dpf, SAMN08049262)	17,521,477	81%	9%	107,311
SAMN08049263	NA	pool of larvae (Pecten maximus, 13 dpf, SAMN08049263)	21,251,475	80%	8%	114,426
SAMN08049264	NA	pool of larvae (Pecten maximus, 13 dpf, SAMN08049264)	18,783,069	81%	9%	107,129
SAMN08049265	NA	pool of larvae (Pecten maximus, 14 dpf, SAMN08049265)	19,906,862	81%	8%	111,126
SAMN08049266	NA	pool of larvae (Pecten maximus, 14 dpf, SAMN08049266)	16,100,003	80%	10%	114,788
SAMN08049267	NA	pool of larvae (Pecten maximus, 14 dpf, SAMN08049267)	25,011,257	81%	8%	118,988
SAMN08049268	NA	pool of larvae (Pecten maximus, 14 dpf, SAMN08049268)	13,497,173	80%	10%	101,228
SAMN08049269	NA	pool of larvae (Pecten maximus, 14 dpf, SAMN08049269)	15,944,384	81%	9%	101,063
SAMN08049270	NA	pool of larvae (Pecten maximus, 14 dpf, SAMN08049270)	20,317,082	80%	7%	115,417
SAMN08049271	NA	pool of larvae (Pecten maximus, 16 dpf, SAMN08049271)	20,567,026	81%	8%	110,418
SAMN08049272	NA	pool of larvae (Pecten maximus, 16 dpf, SAMN08049272)	27,464,245	81%	9%	119,729
SAMN08049273	NA	pool of larvae (Pecten maximus, 16 dpf, SAMN08049273)	22,918,181	80%	8%	110,197
SAMN08049274	NA	pool of larvae (Pecten maximus, 16 dpf, SAMN08049274)	14,957,523	78%	7%	89,831
SAMN08049275	NA	pool of larvae (Pecten maximus, 16 dpf, SAMN08049275)	15,451,265	81%	9%	101,464
SAMN08049276	NA	pool of larvae (Pecten maximus, 16 dpf, SAMN08049276)	14,675,578	80%	8%	90,154
SAMN10261175	NA	Whole organism (Pecten maximus, 24 hours, SAMN10261175)	124,509,502	76%	17%	200,994
SAMN10261176	NA	Whole organism (Pecten maximus, 24 hours, SAMN10261176)	113,134,920	75%	16%	199,612
SAMN10261177	NA	Whole organism (Pecten maximus, 24 hours, SAMN10261177)	121,455,074	74%	18%	205,420
SAMN10261178	NA	Whole organism (Pecten maximus, 24 hours, SAMN10261178)	119,000,678	74%	18%	205,842
SAMN10261179	NA	Whole organism (Pecten maximus, 40 hours, SAMN10261179)	111,034,960	74%	15%	209,890
SAMN10261180	NA	Whole organism (Pecten maximus, 40 hours, SAMN10261180)	132,612,408	74%	15%	213,002
SAMN10261181	NA	Whole organism (Pecten maximus, 40 hours, SAMN10261181)	135,735,826	74%	15%	214,432
SAMN10261182	NA	Whole organism (Pecten maximus, 40 hours, SAMN10261182)	142,489,662	75%	14%	212,921
SAMN10261183	NA	Whole organism (Pecten maximus, 48 hours, SAMN10261183)	148,642,414	73%	12%	213,364
SAMN10261184	NA	Whole organism (Pecten maximus, 48 hours, SAMN10261184)	145,710,116	74%	12%	215,406
SAMN10261185	NA	Whole organism (Pecten maximus, 48 hours, SAMN10261185)	121,729,544	74%	12%	208,528
SAMN10261186	NA	Whole organism (Pecten maximus, 48 hours, SAMN10261186)	128,303,404	74%	12%	209,256
SAMN10583512	NA	Mantle edge-Left valve (Pecten maximus, SAMN10583512)	27,945,504	72%	25%	126,106
SAMN10583513	NA	Mantle edge-Right valve (Pecten maximus, SAMN10583513)	26,813,822	70%	23%	59,369
SAMN10583514	NA	Central mantle-Left valve (Pecten maximus, SAMN10583514)	28,317,154	71%	22%	77,133
SAMN10583515	NA	Central mantle-Right valve (Pecten maximus, SAMN10583515)	23,272,292	64%	21%	75,108
SAMN10583516	NA	Mantle edge-Left valve (Pecten maximus, SAMN10583516)	27,696,306	69%	21%	115,740
SAMN10583517	NA	Mantle edge-Right valve (Pecten maximus, SAMN10583517)	34,668,896	69%	21%	107,806
SAMN10583518	NA	Central mantle-Left valve (Pecten maximus, SAMN10583518)	27,853,136	71%	25%	90,540
SAMN10583519	NA	Central mantle-Right valve (Pecten maximus, SAMN10583519)	24,885,066	67%	23%	65,752
SAMN10583520	NA	Mantle edge-Left valve (Pecten maximus, SAMN10583520)	35,170,266	70%	21%	101,825
SAMN10583521	NA	Mantle edge-Right valve (Pecten maximus, SAMN10583521)	28,417,286	68%	22%	105,025
SAMN10583522	NA	Central mantle-Left valve (Pecten maximus, SAMN10583522)	35,396,198	71%	18%	105,195
SAMN10583523	NA	Central mantle-Right valve (Pecten maximus, SAMN10583523)	23,967,656	69%	27%	89,989
SAMN10583524	NA	Mantle edge-Left valve (Pecten maximus, SAMN10583524)	28,683,072	67%	17%	85,055
SAMN10583525	NA	Mantle edge-Right valve (Pecten maximus, SAMN10583525)	28,296,926	67%	18%	93,309
SAMN10583526	NA	Central mantle-Left valve (Pecten maximus, SAMN10583526)	30,227,518	68%	20%	97,545
SAMN10583527	NA	Central mantle-Right valve (Pecten maximus, SAMN10583527)	26,109,214	66%	19%	49,683
SAMN10583528	NA	Mantle edge-Left valve (Pecten maximus, SAMN10583528)	17,476,388	63%	24%	60,291
SAMN10583529	NA	Mantle edge-Right valve (Pecten maximus, SAMN10583529)	30,189,288	72%	26%	143,428
SAMN10583530	NA	Central mantle-Left valve (Pecten maximus, SAMN10583530)	24,200,620	67%	22%	63,653
SAMN10583531	NA	Central mantle-Right valve (Pecten maximus, SAMN10583531)	21,887,694	70%	28%	81,540
SAMN10583532	NA	Mantle edge-Left valve (Pecten maximus, SAMN10583532)	22,941,362	69%	22%	125,999
SAMN10583533	NA	Mantle edge-Right valve (Pecten maximus, SAMN10583533)	24,632,832	70%	23%	129,044
SAMN10583534	NA	Central mantle-Left valve (Pecten maximus, SAMN10583534)	27,579,016	70%	27%	86,246
SAMN10583535	NA	Central mantle-Right valve (Pecten maximus, SAMN10583535)	21,109,410	70%	28%	65,049
SAMN10583536	NA	Mantle edge-Left valve (Pecten maximus, SAMN10583536)	24,804,132	70%	23%	119,083
SAMN10583537	NA	Mantle edge-Right valve (Pecten maximus, SAMN10583537)	22,884,290	70%	24%	97,137
SAMN10583538	NA	Central mantle-Left valve (Pecten maximus, SAMN10583538)	20,801,468	70%	30%	66,051
SAMN10583539	NA	Central mantle-Right valve (Pecten maximus, SAMN10583539)	30,117,892	70%	30%	93,184
SAMN10583540	NA	Mantle edge-Left valve (Pecten maximus, SAMN10583540)	16,399,046	68%	23%	92,439
SAMN10583541	NA	Mantle edge-Right valve (Pecten maximus, SAMN10583541)	32,988,674	69%	24%	115,540
SAMN10583542	NA	Central mantle-Left valve (Pecten maximus, SAMN10583542)	32,763,494	71%	31%	103,869
SAMN10583543	NA	Central mantle-Right valve (Pecten maximus, SAMN10583543)	31,338,358	71%	28%	100,887

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR2697551	ERX2711944	ERP019499	SAMEA4788157	38,601,666	87%	25%
ERR2697552	ERX2711945	ERP019499	SAMEA4788157	2,718,279	77%	22%
ERR2697553	ERX2711946	ERP019499	SAMEA4788157	456,662	84%	20%
SRR1009240	SRX363342	SRP030767	SAMN02370931	67,643,592	88%	22%
SRR1009241	SRX363344	SRP030767	SAMN02370932	70,855,070	88%	23%
SRR1009242	SRX363345	SRP030767	SAMN02370933	77,946,012	83%	21%
SRR1200737	SRX497464	SRP040427	SAMN02693243	56,718,224	80%	6%
SRR1200746	SRX497464	SRP040427	SAMN02693243	58,336,152	79%	6%
SRR1200768	SRX497464	SRP040427	SAMN02693243	47,487,192	78%	5%
SRR1200769	SRX497464	SRP040427	SAMN02693243	62,913,936	77%	5%
SRR1200770	SRX497464	SRP040427	SAMN02693243	30,546,464	76%	11%
SRR1200773	SRX497464	SRP040427	SAMN02693243	67,330,408	81%	3%
SRR1201991	SRX497464	SRP040427	SAMN02693243	62,674,414	81%	3%
SRR1201994	SRX497464	SRP040427	SAMN02693243	46,470,136	92%	0%
SRR1202007	SRX497464	SRP040427	SAMN02693243	83,825,106	88%	1%
SRR1202010	SRX497464	SRP040427	SAMN02693243	168,996,064	79%	1%
SRR1202011	SRX497464	SRP040427	SAMN02693243	72,496,360	80%	1%
SRR1202013	SRX497464	SRP040427	SAMN02693243	60,066,228	72%	8%
SRR1202014	SRX497464	SRP040427	SAMN02693243	40,629,646	78%	5%
SRR1202015	SRX497464	SRP040427	SAMN02693243	29,446,388	72%	6%
SRR1202023	SRX497464	SRP040427	SAMN02693243	45,061,124	75%	5%
SRR1202026	SRX497464	SRP040427	SAMN02693243	58,427,352	80%	2%
SRR1202027	SRX497464	SRP040427	SAMN02693243	56,733,230	82%	5%
SRR1202028	SRX497464	SRP040427	SAMN02693243	58,447,438	83%	5%
SRR1202030	SRX497464	SRP040427	SAMN02693243	53,385,178	79%	7%
SRR1202031	SRX497464	SRP040427	SAMN02693243	64,154,740	82%	7%
SRR1202033	SRX497464	SRP040427	SAMN02693243	57,210,118	82%	8%
SRR1202034	SRX497464	SRP040427	SAMN02693243	53,704,836	83%	8%
SRR2601059	SRX1331755	SRP064659	SAMN10261175	62,533,260	77%	18%
SRR2601062	SRX1331757	SRP064659	SAMN10261175	61,976,242	76%	17%
SRR2601050	SRX1331747	SRP064659	SAMN10261176	64,057,588	76%	19%
SRR2601066	SRX1331760	SRP064659	SAMN10261176	49,077,332	72%	11%
SRR2601056	SRX1331752	SRP064659	SAMN10261177	57,135,396	72%	14%
SRR2601071	SRX1331762	SRP064659	SAMN10261177	64,319,678	77%	21%
SRR2601047	SRX1331744	SRP064659	SAMN10261178	57,627,762	74%	19%
SRR2601076	SRX1331765	SRP064659	SAMN10261178	61,372,916	73%	17%
SRR2601057	SRX1331753	SRP064659	SAMN10261179	63,338,448	74%	15%
SRR2601067	SRX1331761	SRP064659	SAMN10261179	47,696,512	75%	16%
SRR2601054	SRX1331750	SRP064659	SAMN10261180	70,752,432	74%	14%
SRR2601063	SRX1331758	SRP064659	SAMN10261180	61,859,976	74%	15%
SRR2601048	SRX1331745	SRP064659	SAMN10261181	68,416,250	74%	17%
SRR2601078	SRX1331767	SRP064659	SAMN10261181	67,319,576	74%	13%
SRR2601052	SRX1331748	SRP064659	SAMN10261182	70,486,582	75%	15%
SRR2601075	SRX1331764	SRP064659	SAMN10261182	72,003,080	75%	13%
SRR2601049	SRX1331746	SRP064659	SAMN10261183	69,873,170	73%	12%
SRR2601077	SRX1331766	SRP064659	SAMN10261183	78,769,244	72%	12%
SRR2601058	SRX1331754	SRP064659	SAMN10261184	75,001,466	74%	14%
SRR2601074	SRX1331763	SRP064659	SAMN10261184	70,708,650	73%	11%
SRR2601053	SRX1331749	SRP064659	SAMN10261185	68,173,182	74%	11%
SRR2601060	SRX1331756	SRP064659	SAMN10261185	53,556,362	74%	14%
SRR2601055	SRX1331751	SRP064659	SAMN10261186	63,221,404	75%	12%
SRR2601064	SRX1331759	SRP064659	SAMN10261186	65,082,000	72%	12%
SRR3101474	SRX1530561	SRP067223	SAMN04329996	36,298,044	69%	10%
SRR3101475	SRX1530562	SRP067223	SAMN04329997	22,772,048	69%	18%
SRR3101476	SRX1530563	SRP067223	SAMN04329998	15,934,782	71%	15%
SRR3101477	SRX1530564	SRP067223	SAMN04329999	30,291,796	70%	11%
SRR3101478	SRX1530565	SRP067223	SAMN04330000	19,730,946	70%	18%
SRR3101479	SRX1530566	SRP067223	SAMN04330001	30,912,312	70%	15%
SRR3101480	SRX1530567	SRP067223	SAMN04330002	26,189,046	69%	13%
SRR3101481	SRX1530568	SRP067223	SAMN04330003	34,735,438	70%	15%
SRR3101482	SRX1530569	SRP067223	SAMN04330004	33,407,452	72%	12%
SRR3101483	SRX1530570	SRP067223	SAMN04330005	30,864,958	73%	17%
SRR3101484	SRX1530571	SRP067223	SAMN04330006	17,562,498	71%	14%
SRR3101485	SRX1530572	SRP067223	SAMN04330007	20,191,258	68%	10%
SRR3101486	SRX1530573	SRP067223	SAMN04330008	30,191,656	73%	11%
SRR3101487	SRX1530574	SRP067223	SAMN04330009	32,923,552	71%	11%
SRR5062040	SRX2382517	SRP094094	SAMN06067809	66,454,364	86%	16%
SRR5062041	SRX2382518	SRP094094	SAMN06067812	84,121,748	84%	12%
SRR6312410	SRX3412666	SRP125391	SAMN08049259	16,968,577	81%	8%
SRR6312411	SRX3412665	SRP125391	SAMN08049260	14,197,389	79%	9%
SRR6312412	SRX3412664	SRP125391	SAMN08049261	18,064,107	80%	9%
SRR6312413	SRX3412663	SRP125391	SAMN08049262	17,521,477	81%	9%
SRR6312406	SRX3412670	SRP125391	SAMN08049263	21,251,475	80%	8%
SRR6312407	SRX3412669	SRP125391	SAMN08049264	18,783,069	81%	9%
SRR6312408	SRX3412668	SRP125391	SAMN08049265	19,906,862	81%	8%
SRR6312409	SRX3412667	SRP125391	SAMN08049266	16,100,003	80%	10%
SRR6312404	SRX3412672	SRP125391	SAMN08049267	25,011,257	81%	8%
SRR6312405	SRX3412671	SRP125391	SAMN08049268	13,497,173	80%	10%
SRR6312414	SRX3412662	SRP125391	SAMN08049269	15,944,384	81%	9%
SRR6312415	SRX3412661	SRP125391	SAMN08049270	20,317,082	80%	7%
SRR6312416	SRX3412660	SRP125391	SAMN08049271	20,567,026	81%	8%
SRR6312417	SRX3412659	SRP125391	SAMN08049272	27,464,245	81%	9%
SRR6312418	SRX3412658	SRP125391	SAMN08049273	22,918,181	80%	8%
SRR6312419	SRX3412657	SRP125391	SAMN08049274	14,957,523	78%	7%
SRR6312420	SRX3412656	SRP125391	SAMN08049275	15,451,265	81%	9%
SRR6312421	SRX3412655	SRP125391	SAMN08049276	14,675,578	80%	8%
SRR8300902	SRX5115401	SRP173064	SAMN10583512	27,945,504	72%	25%
SRR8300901	SRX5115402	SRP173064	SAMN10583513	26,813,822	70%	23%
SRR8300904	SRX5115399	SRP173064	SAMN10583514	28,317,154	71%	22%
SRR8300903	SRX5115400	SRP173064	SAMN10583515	23,272,292	64%	21%
SRR8300898	SRX5115405	SRP173064	SAMN10583516	27,696,306	69%	21%
SRR8300897	SRX5115406	SRP173064	SAMN10583517	34,668,896	69%	21%
SRR8300900	SRX5115403	SRP173064	SAMN10583518	27,853,136	71%	25%
SRR8300899	SRX5115404	SRP173064	SAMN10583519	24,885,066	67%	23%
SRR8300896	SRX5115407	SRP173064	SAMN10583520	35,170,266	70%	21%
SRR8300895	SRX5115408	SRP173064	SAMN10583521	28,417,286	68%	22%
SRR8300916	SRX5115387	SRP173064	SAMN10583522	35,396,198	71%	18%
SRR8300915	SRX5115388	SRP173064	SAMN10583523	23,967,656	69%	27%
SRR8300914	SRX5115389	SRP173064	SAMN10583524	28,683,072	67%	17%
SRR8300913	SRX5115390	SRP173064	SAMN10583525	28,296,926	67%	18%
SRR8300912	SRX5115391	SRP173064	SAMN10583526	30,227,518	68%	20%
SRR8300911	SRX5115392	SRP173064	SAMN10583527	26,109,214	66%	19%
SRR8300910	SRX5115393	SRP173064	SAMN10583528	17,476,388	63%	24%
SRR8300909	SRX5115394	SRP173064	SAMN10583529	30,189,288	72%	26%
SRR8300908	SRX5115395	SRP173064	SAMN10583530	24,200,620	67%	22%
SRR8300907	SRX5115396	SRP173064	SAMN10583531	21,887,694	70%	28%
SRR8300891	SRX5115412	SRP173064	SAMN10583532	22,941,362	69%	22%
SRR8300892	SRX5115411	SRP173064	SAMN10583533	24,632,832	70%	23%
SRR8300889	SRX5115414	SRP173064	SAMN10583534	27,579,016	70%	27%
SRR8300890	SRX5115413	SRP173064	SAMN10583535	21,109,410	70%	28%
SRR8300887	SRX5115416	SRP173064	SAMN10583536	24,804,132	70%	23%
SRR8300888	SRX5115415	SRP173064	SAMN10583537	22,884,290	70%	24%
SRR8300885	SRX5115418	SRP173064	SAMN10583538	20,801,468	70%	30%
SRR8300886	SRX5115417	SRP173064	SAMN10583539	30,117,892	70%	30%
SRR8300893	SRX5115410	SRP173064	SAMN10583540	16,399,046	68%	23%
SRR8300894	SRX5115409	SRP173064	SAMN10583541	32,988,674	69%	24%
SRR8300906	SRX5115397	SRP173064	SAMN10583542	32,763,494	71%	31%
SRR8300905	SRX5115398	SRP173064	SAMN10583543	31,338,358	71%	28%
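The per-sample rows are aggregates of the per-run rows: for SAMEA4788157, the three ERR runs sum to the 41,776,607 reads reported for the sample. A minimal sketch of that aggregation follows. The read-weighted percent aligned is an assumption, and because it starts from rounded run-level percentages it can only approximate the sample-level figure (here it gives about 86%, matching the table).

```python
from collections import defaultdict

# (run, sample, reads, percent aligned) for the three SAMEA4788157 runs.
runs = [
    ("ERR2697551", "SAMEA4788157", 38_601_666, 87),
    ("ERR2697552", "SAMEA4788157", 2_718_279, 77),
    ("ERR2697553", "SAMEA4788157", 456_662, 84),
]

def aggregate_by_sample(runs):
    """Sum reads per sample and estimate a read-weighted percent aligned."""
    reads = defaultdict(int)
    aligned = defaultdict(float)
    for _run, sample, n, pct in runs:
        reads[sample] += n
        aligned[sample] += n * pct / 100  # approximate aligned-read count
    return {s: (reads[s], 100 * aligned[s] / reads[s]) for s in reads}

for sample, (n, pct) in aggregate_by_sample(runs).items():
    print(f"{sample}: {n:,} reads, ~{pct:.0f}% aligned")
# SAMEA4788157: 41,776,607 reads, ~86% aligned
```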

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Crassostrea gigas high-quality model RefSeq (XP_)	22,081	16,674 (75.51%)	16,674 (75.51%)	61.33%	53.43%
Mollusca GenBank	14,819	7,365 (49.70%)	7,365 (49.70%)	71.41%	75.95%
Mollusca known RefSeq (NP_)	481	16 (3.33%)	16 (3.33%)	69.36%	70.13%
Aplysia californica high-quality model RefSeq (XP_)	9,874	7,130 (72.21%)	7,130 (72.21%)	61.68%	55.38%
Same-species GenBank	15	15 (100.00%)	15 (100.00%)	79.97%	94.47%
Octopus vulgaris high-quality model RefSeq (XP_)	10,726	8,158 (76.06%)	8,158 (76.06%)	63.08%	59.25%
Drosophila melanogaster known RefSeq (NP_)	30,546	11,330 (37.09%)	11,330 (37.09%)	59.84%	44.69%
Strongylocentrotus purpuratus high-quality model RefSeq (XP_)	19,173	11,454 (59.74%)	11,454 (59.74%)	61.02%	48.61%
Strongylocentrotus purpuratus known RefSeq (NP_)	425	305 (71.76%)	305 (71.76%)	68.77%	65.42%
Ciona intestinalis known RefSeq (NP_)	942	618 (65.61%)	618 (65.61%)	63.23%	46.87%
Homo sapiens known RefSeq (NP_)	56,979	29,074 (51.03%)	29,074 (51.03%)	61.09%	49.40%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22:134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20