NCBI Pimephales promelas Annotation Release 100

The RefSeq genome records for Pimephales promelas were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Pimephales promelas Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Feb 19 2021
Date of submission of annotation to the public databases: Feb 22 2021
Software version: 8.5

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
EPA_FHM_2.0	GCF_016745375.1	US EPA	01-24-2021	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	EPA_FHM_2.0
Genes and pseudogenes	36,695
  protein-coding	26,629
  non-coding	9,399
  transcribed pseudogenes	25
  non-transcribed pseudogenes	508
  genes with variants	10,455
  immunoglobulin/T-cell receptor gene segments	134
  other	0
mRNAs	48,442
  fully-supported	46,510
  with > 5% ab initio	873
  partial	705
  with filled gap(s)	455
  known RefSeq (NM_)	0
  model RefSeq (XM_)	48,442
non-coding RNAs	11,056
  fully-supported	5,654
  with > 5% ab initio	0
  partial	3
  with filled gap(s)	3
  known RefSeq (NR_)	0
  model RefSeq (XR_)	8,832
pseudo transcripts	25
  fully-supported	19
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	25
CDSs	48,589
  fully-supported	46,510
  with > 5% ab initio	1,014
  partial	677
  with major correction(s)	602
  known RefSeq (NP_)	0
  model RefSeq (XP_)	48,455
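
As a quick consistency check on the gene-level rows above, the categories appear to partition the total: protein-coding, non-coding, both pseudogene classes and the immunoglobulin/T-cell receptor gene segments add up to the "Genes and pseudogenes" count. The sketch below assumes that "genes with variants" is a subset of the other categories rather than a separate class. That reading is inferred from the arithmetic and is not stated in the report.

```python
# Gene-level counts copied from the Feature counts table for EPA_FHM_2.0.
gene_counts = {
    "protein-coding": 26_629,
    "non-coding": 9_399,
    "transcribed pseudogenes": 25,
    "non-transcribed pseudogenes": 508,
    "immunoglobulin/T-cell receptor gene segments": 134,
    "other": 0,
}
reported_total = 36_695  # the "Genes and pseudogenes" row

# Assumption: the categories above are mutually exclusive, and
# "genes with variants" (10,455) overlaps them instead of adding to them.
category_sum = sum(gene_counts.values())
print(category_sum, category_sum == reported_total)
```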

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	36,028	16,457	6,048	54	1,111,994
All transcripts	59,498	2,662	2,154	54	92,671
  mRNA	48,442	3,136	2,545	141	92,671
  misc_RNA	660	2,560	2,173	110	8,900
  tRNA	2,222	75	73	68	87
  lncRNA	4,994	850	668	79	7,502
  snoRNA	274	136	130	61	317
  snRNA	175	134	116	54	217
  guide_RNA	13	213	176	129	383
  rRNA	2,718	124	119	116	4,063
Single-exon transcripts	868	1,755	1,458	312	11,061
  coding transcripts (NM_/XM_)	868	1,755	1,458	312	11,061
CDSs	48,455	2,071	1,509	99	91,623
Exons	314,289	264	137	1	18,808
  in coding transcripts (NM_/XM_)	298,797	265	137	1	18,808
  in non-coding transcripts (NR_/XR_)	19,375	246	129	2	8,692
Introns	280,306	2,168	519	30	1,108,333
  in coding transcripts (NM_/XM_)	269,663	2,147	521	30	1,108,333
  in non-coding transcripts (NR_/XR_)	14,458	2,544	487	30	343,312
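
The tables in this report group records by RefSeq accession prefix: NM_, NR_ and NP_ for known RefSeq records, XM_, XR_ and XP_ for model records. A minimal sketch of that grouping follows. The function name `classify_refseq` and the example accessions are hypothetical placeholders, not records from this release.

```python
# Prefix meanings as they are used in this report's tables.
REFSEQ_PREFIXES = {
    "NM_": ("mRNA", "known"),
    "XM_": ("mRNA", "model"),
    "NR_": ("non-coding RNA", "known"),
    "XR_": ("non-coding RNA", "model"),
    "NP_": ("protein", "known"),
    "XP_": ("protein", "model"),
}


def classify_refseq(accession: str) -> tuple[str, str]:
    """Return (molecule type, status) for an accession, based on its first three characters."""
    prefix = accession[:3].upper()
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"unrecognized RefSeq prefix in {accession!r}")
    return REFSEQ_PREFIXES[prefix]


# Made-up accessions, used only to exercise the lookup.
for acc in ("XM_000000001.1", "XR_000000002.1", "XP_000000003.1"):
    print(acc, classify_refseq(acc))
```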

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.69	1	1	50
Number of exons per transcript	11.48	8	1	229

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 26,616 coding genes, 23,888 genes had a protein with an alignment covering 50% or more of the query and 12,581 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
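
The two coverage definitions can be illustrated with a short sketch. It assumes a single contiguous alignment span with 1-based, inclusive coordinates. How the pipeline combines several BLASTP segments for one protein pair is not described in this report, so that case is left out. The function name and the example hit are hypothetical.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence included in one alignment span (1-based, inclusive)."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if not 1 <= aligned_start <= aligned_end <= sequence_length:
        raise ValueError("alignment coordinates must lie within the sequence")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical hit: residues 11-400 of a 400-residue annotated protein (query)
# aligned to residues 1-390 of a 520-residue Swiss-Prot protein (target).
query_cov = coverage_percent(11, 400, 400)   # 390 of 400 residues
target_cov = coverage_percent(1, 390, 520)   # 390 of 520 residues
print(f"query coverage {query_cov:.1f}%, target coverage {target_cov:.1f}%")
```

In this example the hit would count toward the 95% query-coverage threshold but would fall short of it on the target side.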

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
EPA_FHM_2.0	GCF_016745375.1	3.55%	34.97%
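
For readers who want to reproduce a masked percentage on their own copy of a sequence, the sketch below counts lowercase bases. It assumes a soft-masked sequence in which masked bases are written in lowercase. The report does not say how its percentages were computed (for example, whether gap bases are part of the denominator), so the result may not match the table exactly. The function name and the toy sequence are hypothetical.

```python
def percent_soft_masked(sequence: str) -> float:
    """Percentage of characters written in lowercase, taken here to mean soft-masked."""
    if not sequence:
        raise ValueError("sequence is empty")
    masked = sum(1 for base in sequence if base.islower())
    return 100.0 * masked / len(sequence)


# Toy sequence, not taken from the assembly: 8 of its 20 bases are lowercase.
toy = "ACGTacgtacgtACGTACGT"
print(f"{percent_soft_masked(toy):.2f}% masked")
```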

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	160	158 (98.75%)	152 (95.00%)	99.08%	99.18%
Same-species EST	258,504	242,052 (93.64%)	226,978 (87.80%)	99.11%	99.55%
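
The percentages in the table above can be recomputed from the counts. Both the "aligned by Splign" and the "passed to Gnomon" percentages appear to use the number of sequences retrieved from Entrez as the denominator. This is inferred from the arithmetic rather than stated in the report. A minimal sketch, with a hypothetical helper name:

```python
# Counts copied from the Transcript alignments table.
rows = {
    "Same-species Genbank": {"retrieved": 160, "aligned": 158, "passed": 152},
    "Same-species EST": {"retrieved": 258_504, "aligned": 242_052, "passed": 226_978},
}


def percent_of_retrieved(count: int, retrieved: int) -> float:
    """Share of the retrieved sequences, as a percentage rounded to two decimals."""
    return round(100.0 * count / retrieved, 2)


for source, c in rows.items():
    aligned_pct = percent_of_retrieved(c["aligned"], c["retrieved"])
    passed_pct = percent_of_retrieved(c["passed"], c["retrieved"])
    print(f"{source}: aligned {aligned_pct:.2f}%, passed to Gnomon {passed_pct:.2f}%")
```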

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	3,611,640,760	83%	30%	365,970
SAMN03024265	NA	Adult, Gill Tissue (Pimephales promelas, 6 months, SAMN03024265, SAMN03024265)	470,813,940	82%	36%	265,877
SAMN05258326	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258326)	6,897,244	82%	21%	137,094
SAMN05258327	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258327)	10,381,616	82%	24%	158,966
SAMN05258328	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258328)	6,268,096	81%	21%	127,176
SAMN05258329	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258329)	10,063,520	82%	22%	146,397
SAMN05258330	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258330)	11,560,914	83%	22%	148,238
SAMN05258331	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258331)	10,120,502	83%	23%	141,202
SAMN05258332	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258332)	7,203,848	81%	21%	133,630
SAMN05258333	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258333)	13,190,036	82%	20%	148,342
SAMN05258334	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258334)	14,568,992	84%	22%	160,358
SAMN05258335	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258335)	16,300,548	82%	21%	144,824
SAMN05258336	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258336)	10,893,242	83%	20%	144,225
SAMN05258337	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258337)	8,785,198	80%	20%	137,156
SAMN05258338	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258338)	7,633,268	83%	22%	143,494
SAMN05258339	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258339)	9,305,934	80%	20%	144,423
SAMN05258340	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258340)	11,445,054	82%	22%	156,196
SAMN05258341	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258341)	8,302,964	81%	20%	137,622
SAMN05258342	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258342)	7,539,238	84%	22%	140,581
SAMN05258343	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258343)	11,015,484	83%	22%	152,036
SAMN05258344	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258344)	8,395,500	55%	25%	80,173
SAMN05258345	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258345)	7,606,890	81%	20%	121,333
SAMN05258346	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258346)	11,424,946	84%	20%	150,494
SAMN05258347	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258347)	6,188,140	80%	20%	120,523
SAMN05258348	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258348)	11,030,704	82%	21%	132,272
SAMN05258349	NA	Anterior kidney (Pimephales promelas, 1 year, male, SAMN05258349)	7,702,164	84%	21%	131,918
SAMN10038688	30459712	gut (Pimephales promelas, Adult male, SAMN10038688)	71,150,986	86%	34%	207,304
SAMN10038693	30459712	telencephalon (Pimephales promelas, Adult male, SAMN10038693)	56,951,612	83%	24%	235,795
SAMN10038694	30459712	telencephalon (Pimephales promelas, Adult male, SAMN10038694)	69,472,920	83%	23%	242,368
SAMN10038695	30459712	telencephalon (Pimephales promelas, Adult male, SAMN10038695)	64,457,618	83%	24%	243,432
SAMN10038696	30459712	liver (Pimephales promelas, Adult male, SAMN10038696)	63,717,568	86%	37%	175,917
SAMN10038697	30459712	liver (Pimephales promelas, Adult male, SAMN10038697)	71,968,658	83%	37%	179,635
SAMN10038698	30459712	liver (Pimephales promelas, Adult male, SAMN10038698)	69,650,108	86%	38%	175,181
SAMN10038699	30459712	hypothalamus (Pimephales promelas, Adult male, SAMN10038699)	61,704,660	83%	24%	240,630
SAMN10038700	30459712	hypothalamus (Pimephales promelas, Adult male, SAMN10038700)	71,497,814	83%	22%	211,059
SAMN10038701	30459712	hypothalamus (Pimephales promelas, Adult male, SAMN10038701)	69,741,032	82%	24%	243,597
SAMN10038702	30459712	gut (Pimephales promelas, Adult male, SAMN10038702)	66,563,164	84%	31%	206,929
SAMN10038703	30459712	gut (Pimephales promelas, Adult male, SAMN10038703)	70,648,592	85%	33%	211,382
SAMN11518933	NA	liver (Pimephales promelas, not collected, SAMN11518933)	32,439,142	84%	24%	158,329
SAMN11518934	NA	liver (Pimephales promelas, not collected, SAMN11518934)	31,705,407	82%	24%	162,659
SAMN11518935	NA	liver (Pimephales promelas, not collected, SAMN11518935)	27,784,270	83%	25%	143,889
SAMN11518936	NA	liver (Pimephales promelas, not collected, SAMN11518936)	31,218,769	82%	23%	182,485
SAMN11518937	NA	liver (Pimephales promelas, not collected, SAMN11518937)	24,847,430	85%	25%	140,182
SAMN11518938	NA	liver (Pimephales promelas, not collected, SAMN11518938)	40,312,083	83%	25%	163,383
SAMN11518939	NA	liver (Pimephales promelas, not collected, SAMN11518939)	33,946,554	84%	25%	167,868
SAMN11518940	NA	liver (Pimephales promelas, not collected, SAMN11518940)	31,994,503	84%	23%	150,479
SAMN11518941	NA	liver (Pimephales promelas, not collected, SAMN11518941)	69,587,948	84%	25%	184,784
SAMN11518942	NA	liver (Pimephales promelas, not collected, SAMN11518942)	30,729,520	84%	25%	150,900
SAMN11518944	NA	liver (Pimephales promelas, not collected, SAMN11518944)	39,493,527	85%	25%	155,186
SAMN11518945	NA	liver (Pimephales promelas, not collected, SAMN11518945)	32,373,994	83%	24%	149,985
SAMN11518946	NA	liver (Pimephales promelas, not collected, SAMN11518946)	39,254,912	83%	24%	157,647
SAMN11518947	NA	liver (Pimephales promelas, not collected, SAMN11518947)	36,127,298	84%	25%	168,691
SAMN11518948	NA	liver (Pimephales promelas, not collected, SAMN11518948)	35,335,107	84%	24%	145,449
SAMN14246271	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246271)	44,795,140	80%	32%	196,118
SAMN14246272	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246272)	39,319,742	83%	33%	206,446
SAMN14246273	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246273)	44,065,724	81%	33%	199,835
SAMN14246274	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246274)	38,769,508	81%	31%	197,831
SAMN14246275	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246275)	41,726,668	82%	32%	216,134
SAMN14246276	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246276)	43,004,890	83%	32%	198,258
SAMN14246277	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246277)	44,122,248	82%	32%	200,807
SAMN14246278	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246278)	40,649,084	81%	32%	192,078
SAMN14246279	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246279)	40,728,232	82%	33%	196,082
SAMN14246280	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246280)	41,660,456	86%	34%	222,938
SAMN14246281	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246281)	45,279,830	81%	32%	203,501
SAMN14246282	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246282)	48,514,750	82%	32%	204,503
SAMN14246283	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246283)	42,718,908	83%	34%	216,889
SAMN14246284	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246284)	44,966,814	82%	33%	204,558
SAMN14246285	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246285)	41,476,064	81%	33%	204,832
SAMN14246286	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246286)	38,318,494	82%	33%	208,774
SAMN15912945	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912945)	89,468,940	84%	30%	254,171
SAMN15912946	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912946)	91,210,728	85%	30%	264,088
SAMN15912947	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912947)	61,867,084	85%	31%	246,757
SAMN15912948	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912948)	51,715,998	85%	30%	233,980
SAMN15912949	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912949)	45,955,244	85%	30%	235,887
SAMN15912950	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912950)	67,668,054	84%	30%	240,014
SAMN15912951	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912951)	65,549,134	85%	31%	252,441
SAMN15912952	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912952)	67,199,696	86%	32%	251,904
SAMN15912953	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912953)	49,856,710	85%	31%	237,081
SAMN15912954	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912954)	47,874,974	82%	25%	225,470
SAMN15912961	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912961)	43,765,934	83%	28%	217,562
SAMN15912962	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912962)	45,716,538	83%	27%	225,016
SAMN15912963	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912963)	61,300,204	84%	29%	237,311
SAMN15912964	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912964)	50,967,058	84%	30%	230,951
SAMN15912965	NA	whole body (Pimephales promelas, 4-days post-hatch, SAMN15912965)	42,094,734	83%	27%	222,739

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR1582202	SRX700303	SRP047077	SAMN03024265	470,813,940	82%	36%
SRR3680166	SRX1853000	SRP076715	SAMN05258326	6,897,244	82%	21%
SRR3680167	SRX1853001	SRP076715	SAMN05258327	10,381,616	82%	24%
SRR3680179	SRX1853013	SRP076715	SAMN05258328	6,268,096	81%	21%
SRR3680185	SRX1853019	SRP076715	SAMN05258329	10,063,520	82%	22%
SRR3680186	SRX1853020	SRP076715	SAMN05258330	11,560,914	83%	22%
SRR3680187	SRX1853021	SRP076715	SAMN05258331	10,120,502	83%	23%
SRR3680188	SRX1853022	SRP076715	SAMN05258332	7,203,848	81%	21%
SRR3680189	SRX1853023	SRP076715	SAMN05258333	13,190,036	82%	20%
SRR3680190	SRX1853024	SRP076715	SAMN05258334	14,568,992	84%	22%
SRR3680191	SRX1853025	SRP076715	SAMN05258335	16,300,548	82%	21%
SRR3680168	SRX1853002	SRP076715	SAMN05258336	10,893,242	83%	20%
SRR3680169	SRX1853003	SRP076715	SAMN05258337	8,785,198	80%	20%
SRR3680170	SRX1853005	SRP076715	SAMN05258338	7,633,268	83%	22%
SRR3680171	SRX1853006	SRP076715	SAMN05258339	9,305,934	80%	20%
SRR3680172	SRX1853007	SRP076715	SAMN05258340	11,445,054	82%	22%
SRR3680174	SRX1853008	SRP076715	SAMN05258341	8,302,964	81%	20%
SRR3680175	SRX1853009	SRP076715	SAMN05258342	7,539,238	84%	22%
SRR3680176	SRX1853010	SRP076715	SAMN05258343	11,015,484	83%	22%
SRR3680177	SRX1853011	SRP076715	SAMN05258344	8,395,500	55%	25%
SRR3680178	SRX1853012	SRP076715	SAMN05258345	7,606,890	81%	20%
SRR3680180	SRX1853014	SRP076715	SAMN05258346	11,424,946	84%	20%
SRR3680181	SRX1853016	SRP076715	SAMN05258347	6,188,140	80%	20%
SRR3680182	SRX1853017	SRP076715	SAMN05258348	11,030,704	82%	21%
SRR3680183	SRX1853018	SRP076715	SAMN05258349	7,702,164	84%	21%
SRR7822513	SRX4673680	SRP161579	SAMN10038688	71,150,986	86%	34%
SRR7822524	SRX4673691	SRP161579	SAMN10038693	56,951,612	83%	24%
SRR7822523	SRX4673690	SRP161579	SAMN10038694	69,472,920	83%	23%
SRR7822522	SRX4673689	SRP161579	SAMN10038695	64,457,618	83%	24%
SRR7822521	SRX4673688	SRP161579	SAMN10038696	63,717,568	86%	37%
SRR7822520	SRX4673687	SRP161579	SAMN10038697	71,968,658	83%	37%
SRR7822519	SRX4673686	SRP161579	SAMN10038698	69,650,108	86%	38%
SRR7822518	SRX4673685	SRP161579	SAMN10038699	61,704,660	83%	24%
SRR7822517	SRX4673684	SRP161579	SAMN10038700	71,497,814	83%	22%
SRR7822516	SRX4673683	SRP161579	SAMN10038701	69,741,032	82%	24%
SRR7822515	SRX4673682	SRP161579	SAMN10038702	66,563,164	84%	31%
SRR7822514	SRX4673681	SRP161579	SAMN10038703	70,648,592	85%	33%
SRR8979734	SRX5759099	SRP193989	SAMN11518933	32,439,142	84%	24%
SRR8979735	SRX5759098	SRP193989	SAMN11518934	31,705,407	82%	24%
SRR8979736	SRX5759097	SRP193989	SAMN11518935	27,784,270	83%	25%
SRR8979737	SRX5759096	SRP193989	SAMN11518936	31,218,769	82%	23%
SRR8979746	SRX5759087	SRP193989	SAMN11518937	24,847,430	85%	25%
SRR8979742	SRX5759091	SRP193989	SAMN11518938	40,312,083	83%	25%
SRR8979744	SRX5759089	SRP193989	SAMN11518939	33,946,554	84%	25%
SRR8979747	SRX5759086	SRP193989	SAMN11518940	31,994,503	84%	23%
SRR8979733	SRX5759100	SRP193989	SAMN11518941	29,785,633	84%	25%
SRR8979732	SRX5759101	SRP193989	SAMN11518941	39,802,315	83%	25%
SRR8979738	SRX5759095	SRP193989	SAMN11518942	30,729,520	84%	25%
SRR8979739	SRX5759094	SRP193989	SAMN11518944	39,493,527	85%	25%
SRR8979740	SRX5759093	SRP193989	SAMN11518945	32,373,994	83%	24%
SRR8979741	SRX5759092	SRP193989	SAMN11518946	39,254,912	83%	24%
SRR8979743	SRX5759090	SRP193989	SAMN11518947	36,127,298	84%	25%
SRR8979745	SRX5759088	SRP193989	SAMN11518948	35,335,107	84%	24%
SRR11197916	SRX7817852	SRP251038	SAMN14246271	44,795,140	80%	32%
SRR11197915	SRX7817851	SRP251038	SAMN14246272	39,319,742	83%	33%
SRR11197914	SRX7817850	SRP251038	SAMN14246273	44,065,724	81%	33%
SRR11197913	SRX7817849	SRP251038	SAMN14246274	38,769,508	81%	31%
SRR11197912	SRX7817848	SRP251038	SAMN14246275	41,726,668	82%	32%
SRR11197911	SRX7817847	SRP251038	SAMN14246276	43,004,890	83%	32%
SRR11197910	SRX7817846	SRP251038	SAMN14246277	44,122,248	82%	32%
SRR11197909	SRX7817845	SRP251038	SAMN14246278	40,649,084	81%	32%
SRR11197908	SRX7817844	SRP251038	SAMN14246279	40,728,232	82%	33%
SRR11197907	SRX7817843	SRP251038	SAMN14246280	41,660,456	86%	34%
SRR11197906	SRX7817842	SRP251038	SAMN14246281	45,279,830	81%	32%
SRR11197905	SRX7817841	SRP251038	SAMN14246282	48,514,750	82%	32%
SRR11197904	SRX7817840	SRP251038	SAMN14246283	42,718,908	83%	34%
SRR11197903	SRX7817839	SRP251038	SAMN14246284	44,966,814	82%	33%
SRR11197902	SRX7817838	SRP251038	SAMN14246285	41,476,064	81%	33%
SRR11197901	SRX7817837	SRP251038	SAMN14246286	38,318,494	82%	33%
SRR12521616	SRX9011869	SRP278831	SAMN15912945	89,468,940	84%	30%
SRR12521615	SRX9011868	SRP278831	SAMN15912946	91,210,728	85%	30%
SRR12521614	SRX9011867	SRP278831	SAMN15912947	61,867,084	85%	31%
SRR12521613	SRX9011866	SRP278831	SAMN15912948	51,715,998	85%	30%
SRR12521612	SRX9011865	SRP278831	SAMN15912949	45,955,244	85%	30%
SRR12521611	SRX9011864	SRP278831	SAMN15912950	67,668,054	84%	30%
SRR12521610	SRX9011863	SRP278831	SAMN15912951	65,549,134	85%	31%
SRR12521609	SRX9011862	SRP278831	SAMN15912952	67,199,696	86%	32%
SRR12521608	SRX9011861	SRP278831	SAMN15912953	49,856,710	85%	31%
SRR12521607	SRX9011860	SRP278831	SAMN15912954	47,874,974	82%	25%
SRR12521621	SRX9011874	SRP278831	SAMN15912961	43,765,934	83%	28%
SRR12521620	SRX9011873	SRP278831	SAMN15912962	45,716,538	83%	27%
SRR12521619	SRX9011872	SRP278831	SAMN15912963	61,300,204	84%	29%
SRR12521618	SRX9011871	SRP278831	SAMN15912964	50,967,058	84%	30%
SRR12521617	SRX9011870	SRP278831	SAMN15912965	42,094,734	83%	27%
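
Most samples in this release have a single run, but SAMN11518941 has two (SRR8979733 and SRR8979732), and its read count in the by-sample table is the sum of both. The sketch below shows that grouping for a few rows copied from the by-run table. The variable names are illustrative only.

```python
from collections import defaultdict

# (run, sample, number of reads) for three rows copied from the by-run table.
runs = [
    ("SRR8979733", "SAMN11518941", 29_785_633),
    ("SRR8979732", "SAMN11518941", 39_802_315),
    ("SRR8979738", "SAMN11518942", 30_729_520),
]

# Sum the read counts of all runs that belong to the same sample.
reads_per_sample: dict[str, int] = defaultdict(int)
for _run, sample, n_reads in runs:
    reads_per_sample[sample] += n_reads

for sample, total in sorted(reads_per_sample.items()):
    print(f"{sample}\t{total:,}")
```

Only read counts are summed here. The per-sample percentages cannot be recovered exactly from the rounded per-run percentages.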

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Cynoglossus semilaevis high-quality model RefSeq (XP_)	14,331	13,833 (96.53%)	13,833 (96.53%)	67.59%	74.99%
Actinopterygii GenBank	87,253	71,061 (81.44%)	71,061 (81.44%)	69.50%	80.79%
Actinopterygii known RefSeq (NP_)	25,473	6,861 (26.93%)	6,861 (26.93%)	71.18%	82.28%
Danio rerio high-quality model RefSeq (XP_)	7,718	7,386 (95.70%)	7,386 (95.70%)	70.34%	79.66%
Xiphophorus maculatus high-quality model RefSeq (XP_)	18,457	17,588 (95.29%)	17,588 (95.29%)	65.97%	73.86%
Oryzias latipes high-quality model RefSeq (XP_)	17,157	16,442 (95.83%)	16,442 (95.83%)	66.52%	73.89%
Oreochromis niloticus high-quality model RefSeq (XP_)	19,546	18,540 (94.85%)	18,540 (94.85%)	65.32%	74.06%
Homo sapiens known RefSeq (NP_)	61,383	39,483 (64.32%)	39,483 (64.32%)	66.33%	68.08%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
