NCBI Salvelinus alpinus Annotation Release 101

The RefSeq genome records for Salvelinus alpinus were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: How well the current and previous assemblies align to each other
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Salvelinus alpinus Annotation Release 101.

Annotation release ID: 101
Date of Entrez queries for transcripts and proteins: Feb 18 2018
Date of submission of annotation to the public databases: Feb 25 2018
Software version: 8.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM291031v2	GCF_002910315.2	University of Victoria	02-12-2018	Reference	40 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	ASM291031v2
Genes and pseudogenes	46,775
protein-coding	36,435
non-coding	5,908
transcribed pseudogenes	7
non-transcribed pseudogenes	4,329
genes with variants	11,672
immunoglobulin/T-cell receptor gene segments	96
other	0
mRNAs	59,926
fully-supported	53,632
with > 5% ab initio	2,774
partial	3,554
with filled gap(s)	2,123
known RefSeq (NM_)	0
model RefSeq (XM_)	59,926
non-coding RNAs	7,878
fully-supported	6,419
with > 5% ab initio	0
partial	6
with filled gap(s)	5
known RefSeq (NR_)	0
model RefSeq (XR_)	7,260
pseudo transcripts	10
fully-supported	9
with > 5% ab initio	0
partial	0
with filled gap(s)	2
known RefSeq (NR_)	0
model RefSeq (XR_)	10
CDSs	60,035
fully-supported	53,632
with > 5% ab initio	3,329
partial	3,429
with major correction(s)	3,801
known RefSeq (NP_)	13
model RefSeq (XP_)	59,926

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	42,343	20,856	8,534	52	1,265,589
All transcripts	67,804	2,682	2,223	52	86,429
mRNA	59,926	2,911	2,418	120	86,429
misc_RNA	933	2,284	1,874	125	14,400
tRNA	616	74	73	67	85
lncRNA	5,486	940	680	85	6,796
snoRNA	549	117	126	54	315
snRNA	269	104	112	52	200
guide_RNA	18	198	154	83	376
rRNA	7	485	154	152	1,680
Single-exon transcripts	1,341	1,779	1,555	220	8,960
coding transcripts (NM_/XM_)	1,341	1,779	1,555	220	8,960
CDSs	59,939	1,796	1,335	96	85,197
Exons	386,826	278	139	1	17,535
in coding transcripts (NM_/XM_)	369,460	278	139	1	17,535
in non-coding transcripts (NR_/XR_)	22,260	259	127	2	6,912
Introns	341,103	2,529	397	30	1,107,161
in coding transcripts (NM_/XM_)	329,111	2,531	398	30	1,107,161
in non-coding transcripts (NR_/XR_)	16,737	2,338	365	30	150,477

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.61	1	1	47
Number of exons per transcript	10.39	8	1	211

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 36,422 coding genes, 33,866 genes had a protein with an alignment covering 50% or more of the query and 15,143 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
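The two coverage definitions above can be illustrated with a minimal Python sketch. It assumes a single alignment span given in 1-based inclusive coordinates; the function name and the example hit are hypothetical, and the report does not describe how the pipeline itself handles gaps or multiple alignment segments.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence included in one alignment span.

    Coordinates are assumed to be 1-based and inclusive (an assumption of
    this sketch, not something stated in the report).
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if not 1 <= aligned_start <= aligned_end <= sequence_length:
        raise ValueError("alignment span must lie within the sequence")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical hit: residues 11-400 of a 420-residue annotated protein (query)
# align to residues 1-390 of a 500-residue Swiss-Prot protein (target).
query_cov = coverage_percent(11, 400, 420)
target_cov = coverage_percent(1, 390, 500)

print(f"query coverage:  {query_cov:.2f}%")
print(f"target coverage: {target_cov:.2f}%")
```

In this made-up case the hit would count toward the "50% or more of the query" threshold but not toward the "95% or more" threshold.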

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
ASM291031v2	GCF_002910315.2	3.78%	43.33%
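As a rough illustration of what a "% masked" figure measures, the Python sketch below computes the masked fraction of a set of sequences. It assumes soft-masking, where masked bases are written in lowercase, and it counts every character (including N) in the denominator. Both choices are assumptions of this sketch; the report does not state how the pipeline computes its percentages.

```python
def percent_masked(sequences) -> float:
    """Percentage of bases that are lowercase (soft-masked) across all sequences."""
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        # Assumption: lowercase letters mark masked positions.
        masked += sum(1 for base in seq if base.islower())
    if total == 0:
        raise ValueError("no sequence data supplied")
    return 100.0 * masked / total


# Toy input: two short, hypothetical soft-masked sequences.
toy_sequences = ["ACGTacgtNN", "ttttGGGGCC"]
pct = percent_masked(toy_sequences)
print(f"{pct:.2f}% masked")
```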

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	194	191 (98.45%)	93 (47.94%)	98.28%	90.81%
Same-species EST	63	55 (87.30%)	51 (80.95%)	98.60%	98.56%
Salmo salar known RefSeq (NM_/NR_)	3,559	3,494 (98.17%)	2,159 (60.66%)	95.71%	96.24%
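The percentages in the table above can be reproduced from the counts. The short Python sketch below (names are illustrative) suggests that both the "aligned by Splign" and the "passed to Gnomon" percentages are taken relative to the number of sequences retrieved from Entrez, not relative to the number aligned.

```python
# Counts copied from the transcript alignments table:
# (retrieved from Entrez, aligned by Splign, passed to Gnomon)
TRANSCRIPT_ROWS = {
    "Same-species GenBank": (194, 191, 93),
    "Same-species EST": (63, 55, 51),
    "Salmo salar known RefSeq (NM_/NR_)": (3559, 3494, 2159),
}


def percent_of_retrieved(count: int, retrieved: int) -> float:
    """Share of the retrieved sequences, as a percentage rounded to two decimals."""
    return round(100.0 * count / retrieved, 2)


percentages = {
    source: (
        percent_of_retrieved(aligned, retrieved),
        percent_of_retrieved(passed, retrieved),
    )
    for source, (retrieved, aligned, passed) in TRANSCRIPT_ROWS.items()
}

for source, (pct_aligned, pct_passed) in percentages.items():
    print(f"{source}: aligned {pct_aligned:.2f}%, passed to Gnomon {pct_passed:.2f}%")
```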

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	5,807,429,185	77%	29%	425,111
SAMEA3367620	NA	E-MTAB-3522:charr454 (Salvelinus alpinus, SAMEA3367620)	2,092,607	0%	35%	809
SAMN02212483	NA	gill (Salvelinus alpinus, SAMN02212483)	417,028,962	74%	27%	347,922
SAMN04480811	NA	juvenile, liver, 8C degrees (Salvelinus alpinus, 1 year, male, SAMN04480811)	232,625,572	72%	26%	275,367
SAMN04480812	NA	juvenile, liver, 8C degrees (Salvelinus alpinus, 1 year, male, SAMN04480812)	218,821,908	72%	26%	276,906
SAMN04480813	NA	juvenile, liver, 8C degrees (Salvelinus alpinus, 1 year, male, SAMN04480813)	373,618,192	71%	26%	288,329
SAMN04480816	NA	juvenile, liver, 15C degrees (Salvelinus alpinus, 1 year, male, SAMN04480816)	218,789,806	72%	28%	273,466
SAMN04480818	NA	juvenile, liver, 15C degrees (Salvelinus alpinus, 1 year, male, SAMN04480818)	250,157,442	71%	27%	273,966
SAMN04480821	NA	juvenile, liver, 15C degrees (Salvelinus alpinus, 1 year, male, SAMN04480821)	272,789,654	72%	27%	278,886
SAMN06561753	12411599	Juvenile, Eye (Salvelinus alpinus, 1 year, SAMN06561753)	86,499,084	87%	28%	287,237
SAMN06561754	12411599	Juvenile, Heart (Salvelinus alpinus, 1 year, SAMN06561754)	97,207,250	78%	28%	261,793
SAMN06561755	12411599	Juvenile, Spleen (Salvelinus alpinus, 1 year, SAMN06561755)	74,929,414	88%	29%	246,659
SAMN06561756	12411599	Juvenile, Gut (Salvelinus alpinus, 1 year, SAMN06561756)	99,255,342	83%	27%	274,616
SAMN06561757	12411599	Juvenile, Muscle (Salvelinus alpinus, 1 year, SAMN06561757)	86,422,308	89%	39%	211,908
SAMN06561758	12411599	Juvenile, Brain (Salvelinus alpinus, 1 year, SAMN06561758)	92,546,686	81%	20%	307,936
SAMN06561759	12411599	Juvenile, Hind Kidney (Salvelinus alpinus, 1 year, SAMN06561759)	81,049,832	82%	32%	272,535
SAMN06561760	12411599	Juvenile, Head Kidney (Salvelinus alpinus, 1 year, SAMN06561760)	106,646,706	88%	30%	269,424
SAMN06561761	12411599	Juvenile, Stomach (Salvelinus alpinus, 1 year, SAMN06561761)	81,380,992	83%	36%	235,131
SAMN06561762	12411599	Juvenile, Gill (Salvelinus alpinus, 1 year, SAMN06561762)	79,283,896	83%	25%	294,964
SAMN06561763	12411599	Juvenile, Gonad (Salvelinus alpinus, 1 year, SAMN06561763)	97,777,990	86%	30%	278,981
SAMN06561764	12411599	Juvenile, Liver (Salvelinus alpinus, 1 year, SAMN06561764)	101,871,030	78%	28%	230,848
SAMN07271198	NA	140ts, whole embryo, LB (Salvelinus alpinus, SAMN07271198)	55,403,922	79%	32%	319,582
SAMN07271199	NA	140ts, whole embryo, LB (Salvelinus alpinus, SAMN07271199)	29,147,914	78%	29%	272,320
SAMN07271200	NA	140ts, whole embryo, LB (Salvelinus alpinus, SAMN07271200)	47,543,062	79%	31%	296,833
SAMN07271201	NA	150ts, whole embryo, LB (Salvelinus alpinus, SAMN07271201)	62,946,070	78%	31%	319,963
SAMN07271202	NA	150ts, whole embryo, LB (Salvelinus alpinus, SAMN07271202)	40,033,976	76%	28%	286,981
SAMN07271203	NA	150ts, whole embryo, LB (Salvelinus alpinus, SAMN07271203)	60,506,648	78%	29%	292,529
SAMN07271204	NA	160ts, whole embryo, LB (Salvelinus alpinus, SAMN07271204)	29,963,128	77%	24%	208,960
SAMN07271205	NA	160ts, whole embryo, LB (Salvelinus alpinus, SAMN07271205)	13,227,254	71%	23%	157,156
SAMN07271206	NA	160ts, whole embryo, LB (Salvelinus alpinus, SAMN07271206)	8,996,806	78%	22%	132,738
SAMN07271207	NA	170ts, whole embryo, LB (Salvelinus alpinus, SAMN07271207)	57,268,230	80%	33%	250,581
SAMN07271208	NA	170ts, whole embryo, LB (Salvelinus alpinus, SAMN07271208)	109,336,712	79%	34%	328,026
SAMN07271209	NA	170ts, whole embryo, LB (Salvelinus alpinus, SAMN07271209)	44,713,506	80%	32%	251,497
SAMN07271210	NA	200ts, whole embryo, LB (Salvelinus alpinus, SAMN07271210)	33,402,656	78%	31%	309,520
SAMN07271211	NA	200ts, whole embryo, LB (Salvelinus alpinus, SAMN07271211)	17,957,204	79%	32%	265,414
SAMN07271212	NA	200ts, whole embryo, PL (Salvelinus alpinus, SAMN07271212)	40,995,250	75%	33%	321,914
SAMN07271213	NA	100ts, whole embryo, PL (Salvelinus alpinus, SAMN07271213)	33,349,770	77%	31%	269,801
SAMN07271214	NA	100ts, whole embryo, PL (Salvelinus alpinus, SAMN07271214)	18,577,358	78%	31%	246,945
SAMN07271215	NA	100ts, whole embryo, PL (Salvelinus alpinus, SAMN07271215)	81,941,306	79%	33%	321,875
SAMN07271216	NA	140ts, whole embryo, PL (Salvelinus alpinus, SAMN07271216)	74,332,302	78%	29%	322,333
SAMN07271217	NA	140ts, whole embryo, PL (Salvelinus alpinus, SAMN07271217)	13,039,762	77%	30%	229,599
SAMN07271218	NA	140ts, whole embryo, PL (Salvelinus alpinus, SAMN07271218)	53,673,074	77%	30%	319,114
SAMN07271219	NA	150ts, whole embryo, PL (Salvelinus alpinus, SAMN07271219)	55,030,990	79%	32%	327,672
SAMN07271220	NA	150ts, whole embryo, PL (Salvelinus alpinus, SAMN07271220)	43,145,458	76%	30%	295,352
SAMN07271221	NA	150ts, whole embryo, PL (Salvelinus alpinus, SAMN07271221)	62,553,978	78%	31%	321,868
SAMN07271222	NA	160ts, whole embryo, PL (Salvelinus alpinus, SAMN07271222)	39,439,310	77%	28%	262,226
SAMN07271223	NA	160ts, whole embryo, PL (Salvelinus alpinus, SAMN07271223)	93,737,912	74%	25%	310,536
SAMN07271224	NA	160ts, whole embryo, PL (Salvelinus alpinus, SAMN07271224)	39,166,714	76%	26%	279,658
SAMN07271225	NA	170ts, whole embryo, PL (Salvelinus alpinus, SAMN07271225)	87,658,766	77%	31%	336,919
SAMN07271226	NA	170ts, whole embryo, PL (Salvelinus alpinus, SAMN07271226)	61,224,550	76%	30%	316,802
SAMN07271227	NA	170ts, whole embryo, PL (Salvelinus alpinus, SAMN07271227)	27,075,136	78%	31%	275,481
SAMN07271228	NA	100ts, whole embryo, SB (Salvelinus alpinus, SAMN07271228)	41,706,144	78%	32%	286,768
SAMN07271229	NA	100ts, whole embryo, SB (Salvelinus alpinus, SAMN07271229)	22,146,328	78%	30%	241,773
SAMN07271230	NA	100ts, whole embryo, PL (Salvelinus alpinus, SAMN07271230)	38,871,972	75%	30%	275,623
SAMN07271231	NA	140ts, whole embryo, PL (Salvelinus alpinus, SAMN07271231)	22,068,492	78%	29%	244,965
SAMN07271232	NA	140ts, whole embryo, SB (Salvelinus alpinus, SAMN07271232)	71,445,658	77%	30%	327,056
SAMN07271233	NA	140ts, whole embryo, SB (Salvelinus alpinus, SAMN07271233)	42,106,718	78%	30%	303,973
SAMN07271234	NA	150ts, whole embryo, SB (Salvelinus alpinus, SAMN07271234)	39,010,130	81%	35%	302,606
SAMN07271235	NA	150ts, whole embryo, SB (Salvelinus alpinus, SAMN07271235)	88,079,312	76%	32%	334,601
SAMN07271236	NA	150ts, whole embryo, SB (Salvelinus alpinus, SAMN07271236)	34,824,224	77%	31%	299,779
SAMN07271237	NA	160ts, whole embryo, SB (Salvelinus alpinus, SAMN07271237)	60,862,602	80%	33%	323,770
SAMN07271238	NA	160ts, whole embryo, SB (Salvelinus alpinus, SAMN07271238)	57,541,386	74%	32%	318,736
SAMN07271239	NA	160ts, whole embryo, SB (Salvelinus alpinus, SAMN07271239)	56,170,318	79%	32%	324,670
SAMN07271240	NA	170ts, whole embryo, SB (Salvelinus alpinus, SAMN07271240)	63,171,246	79%	34%	331,226
SAMN07271241	NA	170ts, whole embryo, SB (Salvelinus alpinus, SAMN07271241)	34,971,494	78%	35%	309,616
SAMN07271242	NA	170ts, whole embryo, PL (Salvelinus alpinus, SAMN07271242)	41,762,482	80%	35%	302,653
SAMN07271243	NA	200ts, whole embryo, SB (Salvelinus alpinus, SAMN07271243)	29,538,958	82%	33%	200,200
SAMN07271244	NA	200ts, whole embryo, SB (Salvelinus alpinus, SAMN07271244)	22,714,302	81%	33%	247,051
SAMN07271245	NA	200ts, whole embryo, SB (Salvelinus alpinus, SAMN07271245)	173,788,606	73%	30%	366,811
SAMN08095629	NA	juvenile, whole organism (Salvelinus alpinus, SAMN08095629)	59,701,318	76%	22%	289,071
SAMN08095630	NA	juvenile, whole organism (Salvelinus alpinus, SAMN08095630)	25,890,200	78%	22%	238,471
SAMN08095631	NA	juvenile, whole organism (Salvelinus alpinus, SAMN08095631)	42,145,742	77%	22%	271,646
SAMN08095632	NA	juvenile, whole organism (Salvelinus alpinus, SAMN08095632)	18,842,048	77%	22%	218,982
SAMN08095633	NA	juvenile, whole organism (Salvelinus alpinus, SAMN08095633)	50,900,354	76%	21%	280,939
SAMN08095634	NA	juvenile, whole organism (Salvelinus alpinus, SAMN08095634)	60,079,300	77%	22%	294,587
SAMN08095635	NA	juvenile, whole organism (Salvelinus alpinus, SAMN08095635)	45,703,420	75%	22%	273,487
SAMN08095636	NA	juvenile, whole organism (Salvelinus alpinus, SAMN08095636)	57,203,034	77%	22%	282,118

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR868014	ERX947615	ERP010358	SAMEA3367620	2,092,607	0%	35%
SRR921553	SRX314606	SRP026259	SAMN02212483	21,361,470	71%	27%
SRR921554	SRX314607	SRP026259	SAMN02212483	22,556,208	73%	26%
SRR921555	SRX314608	SRP026259	SAMN02212483	27,548,248	75%	29%
SRR921556	SRX314634	SRP026259	SAMN02212483	21,352,958	73%	27%
SRR921557	SRX314635	SRP026259	SAMN02212483	21,775,936	74%	26%
SRR921558	SRX314636	SRP026259	SAMN02212483	22,027,854	74%	27%
SRR921559	SRX314637	SRP026259	SAMN02212483	19,088,936	74%	27%
SRR921662	SRX314639	SRP026259	SAMN02212483	19,860,852	75%	27%
SRR921663	SRX314641	SRP026259	SAMN02212483	22,838,874	74%	27%
SRR921664	SRX314642	SRP026259	SAMN02212483	24,107,078	72%	23%
SRR921665	SRX314643	SRP026259	SAMN02212483	20,347,770	74%	28%
SRR921666	SRX314644	SRP026259	SAMN02212483	24,196,058	76%	27%
SRR921667	SRX314645	SRP026259	SAMN02212483	22,396,954	74%	27%
SRR921668	SRX314646	SRP026259	SAMN02212483	27,090,472	74%	27%
SRR921669	SRX314647	SRP026259	SAMN02212483	25,161,108	74%	27%
SRR921670	SRX314648	SRP026259	SAMN02212483	26,439,102	71%	28%
SRR921671	SRX314649	SRP026259	SAMN02212483	22,846,844	72%	27%
SRR921672	SRX314650	SRP026259	SAMN02212483	26,032,240	74%	26%
SRR3214131	SRX1621553	SRP068854	SAMN04480811	84,823,238	72%	26%
SRR3214132	SRX1621554	SRP068854	SAMN04480811	79,367,158	73%	26%
SRR3214133	SRX1621555	SRP068854	SAMN04480811	68,435,176	72%	26%
SRR3214134	SRX1621556	SRP068854	SAMN04480812	75,489,414	72%	26%
SRR3214135	SRX1621557	SRP068854	SAMN04480812	73,712,566	72%	26%
SRR3214136	SRX1621558	SRP068854	SAMN04480812	69,619,928	73%	27%
SRR3214137	SRX1621559	SRP068854	SAMN04480813	120,175,232	72%	26%
SRR3214138	SRX1621560	SRP068854	SAMN04480813	81,982,310	70%	24%
SRR3214139	SRX1621561	SRP068854	SAMN04480813	88,349,858	71%	26%
SRR3214140	SRX1621562	SRP068854	SAMN04480813	83,110,792	71%	25%
SRR3214141	SRX1621563	SRP068854	SAMN04480816	72,319,924	72%	28%
SRR3214142	SRX1621564	SRP068854	SAMN04480816	69,718,912	72%	27%
SRR3214143	SRX1621565	SRP068854	SAMN04480816	76,750,970	72%	27%
SRR3214144	SRX1621566	SRP068854	SAMN04480818	78,547,558	72%	27%
SRR3214145	SRX1621567	SRP068854	SAMN04480818	84,207,778	71%	27%
SRR3214146	SRX1621568	SRP068854	SAMN04480818	87,402,106	70%	27%
SRR3214147	SRX1621569	SRP068854	SAMN04480821	54,023,084	72%	26%
SRR3214148	SRX1621570	SRP068854	SAMN04480821	80,490,278	72%	27%
SRR3214149	SRX1621571	SRP068854	SAMN04480821	83,349,614	72%	27%
SRR3214150	SRX1621572	SRP068854	SAMN04480821	54,926,678	70%	26%
SRR5337673	SRX2635059	SRP101753	SAMN06561753	86,499,084	87%	28%
SRR5337672	SRX2635058	SRP101753	SAMN06561754	97,207,250	78%	28%
SRR5337671	SRX2635057	SRP101753	SAMN06561755	74,929,414	88%	29%
SRR5337670	SRX2635056	SRP101753	SAMN06561756	99,255,342	83%	27%
SRR5337669	SRX2635055	SRP101753	SAMN06561757	86,422,308	89%	39%
SRR5337668	SRX2635054	SRP101753	SAMN06561758	92,546,686	81%	20%
SRR5337667	SRX2635053	SRP101753	SAMN06561759	81,049,832	82%	32%
SRR5337666	SRX2635052	SRP101753	SAMN06561760	106,646,706	88%	30%
SRR5337665	SRX2635051	SRP101753	SAMN06561761	81,380,992	83%	36%
SRR5337664	SRX2635050	SRP101753	SAMN06561762	79,283,896	83%	25%
SRR5337663	SRX2635049	SRP101753	SAMN06561763	97,777,990	86%	30%
SRR5337662	SRX2635048	SRP101753	SAMN06561764	101,871,030	78%	28%
SRR5759543	SRX2959415	SRP110568	SAMN07271198	55,403,922	79%	32%
SRR5759542	SRX2959416	SRP110568	SAMN07271199	29,147,914	78%	29%
SRR5759545	SRX2959413	SRP110568	SAMN07271200	47,543,062	79%	31%
SRR5759544	SRX2959414	SRP110568	SAMN07271201	62,946,070	78%	31%
SRR5759547	SRX2959411	SRP110568	SAMN07271202	40,033,976	76%	28%
SRR5759546	SRX2959412	SRP110568	SAMN07271203	60,506,648	78%	29%
SRR5759549	SRX2959409	SRP110568	SAMN07271204	29,963,128	77%	24%
SRR5759548	SRX2959410	SRP110568	SAMN07271205	13,227,254	71%	23%
SRR5759551	SRX2959407	SRP110568	SAMN07271206	8,996,806	78%	22%
SRR5759550	SRX2959408	SRP110568	SAMN07271207	57,268,230	80%	33%
SRR5759563	SRX2959395	SRP110568	SAMN07271208	109,336,712	79%	34%
SRR5759562	SRX2959396	SRP110568	SAMN07271209	44,713,506	80%	32%
SRR5759561	SRX2959397	SRP110568	SAMN07271210	33,402,656	78%	31%
SRR5759560	SRX2959398	SRP110568	SAMN07271211	17,957,204	79%	32%
SRR5759567	SRX2959391	SRP110568	SAMN07271212	40,995,250	75%	33%
SRR5759566	SRX2959392	SRP110568	SAMN07271213	33,349,770	77%	31%
SRR5759565	SRX2959393	SRP110568	SAMN07271214	18,577,358	78%	31%
SRR5759564	SRX2959394	SRP110568	SAMN07271215	81,941,306	79%	33%
SRR5759555	SRX2959403	SRP110568	SAMN07271216	74,332,302	78%	29%
SRR5759554	SRX2959404	SRP110568	SAMN07271217	13,039,762	77%	30%
SRR5759573	SRX2959385	SRP110568	SAMN07271218	53,673,074	77%	30%
SRR5759574	SRX2959384	SRP110568	SAMN07271219	55,030,990	79%	32%
SRR5759571	SRX2959387	SRP110568	SAMN07271220	43,145,458	76%	30%
SRR5759572	SRX2959386	SRP110568	SAMN07271221	62,553,978	78%	31%
SRR5759577	SRX2959381	SRP110568	SAMN07271222	39,439,310	77%	28%
SRR5759578	SRX2959380	SRP110568	SAMN07271223	93,737,912	74%	25%
SRR5759575	SRX2959383	SRP110568	SAMN07271224	39,166,714	76%	26%
SRR5759576	SRX2959382	SRP110568	SAMN07271225	87,658,766	77%	31%
SRR5759579	SRX2959379	SRP110568	SAMN07271226	61,224,550	76%	30%
SRR5759580	SRX2959378	SRP110568	SAMN07271227	27,075,136	78%	31%
SRR5759553	SRX2959405	SRP110568	SAMN07271228	41,706,144	78%	32%
SRR5759552	SRX2959406	SRP110568	SAMN07271229	22,146,328	78%	30%
SRR5759570	SRX2959388	SRP110568	SAMN07271230	38,871,972	75%	30%
SRR5759569	SRX2959389	SRP110568	SAMN07271231	22,068,492	78%	29%
SRR5759557	SRX2959401	SRP110568	SAMN07271232	71,445,658	77%	30%
SRR5759556	SRX2959402	SRP110568	SAMN07271233	42,106,718	78%	30%
SRR5759559	SRX2959399	SRP110568	SAMN07271234	39,010,130	81%	35%
SRR5759558	SRX2959400	SRP110568	SAMN07271235	88,079,312	76%	32%
SRR5759589	SRX2959369	SRP110568	SAMN07271236	34,824,224	77%	31%
SRR5759585	SRX2959373	SRP110568	SAMN07271237	60,862,602	80%	33%
SRR5759581	SRX2959377	SRP110568	SAMN07271238	57,541,386	74%	32%
SRR5759582	SRX2959376	SRP110568	SAMN07271239	56,170,318	79%	32%
SRR5759583	SRX2959375	SRP110568	SAMN07271240	63,171,246	79%	34%
SRR5759584	SRX2959374	SRP110568	SAMN07271241	34,971,494	78%	35%
SRR5759568	SRX2959390	SRP110568	SAMN07271242	41,762,482	80%	35%
SRR5759586	SRX2959372	SRP110568	SAMN07271243	29,538,958	82%	33%
SRR5759587	SRX2959371	SRP110568	SAMN07271244	22,714,302	81%	33%
SRR5759588	SRX2959370	SRP110568	SAMN07271245	173,788,606	73%	30%
SRR6321825	SRX3421619	SRP125593	SAMN08095629	59,701,318	76%	22%
SRR6321826	SRX3421618	SRP125593	SAMN08095630	25,890,200	78%	22%
SRR6321813	SRX3421631	SRP125593	SAMN08095631	42,145,742	77%	22%
SRR6321814	SRX3421630	SRP125593	SAMN08095632	18,842,048	77%	22%
SRR6321811	SRX3421633	SRP125593	SAMN08095633	50,900,354	76%	21%
SRR6321812	SRX3421632	SRP125593	SAMN08095634	60,079,300	77%	22%
SRR6321809	SRX3421635	SRP125593	SAMN08095635	45,703,420	75%	22%
SRR6321810	SRX3421634	SRP125593	SAMN08095636	57,203,034	77%	22%

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Actinopterygii GenBank	79,653	76,223 (95.69%)	76,223 (95.69%)	71.50%	80.92%
Actinopterygii known RefSeq (NP_)	24,797	23,760 (95.82%)	23,760 (95.82%)	70.71%	79.28%
Same-species GenBank	80	80 (100.00%)	80 (100.00%)	76.87%	84.38%
Homo sapiens known RefSeq (NP_)	50,095	42,568 (84.97%)	42,568 (84.97%)	65.93%	68.47%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

	First Pass	Total
PPUY01 (Current) Coverage	72.53%	73.95%
PPUY01 (Previous) Coverage	100.00%	100.00%
Percent Identity	100.00%	99.80%
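The idea of a reciprocal best hit, used for the 'First pass' alignments, can be sketched in Python as follows. This is a simplified illustration only: the score dictionary and the region names are hypothetical, and the actual assembly-assembly alignment process works on genomic alignments and is not described in detail in this report.

```python
def best_hits(scores):
    """Map each left-hand region to its highest-scoring right-hand partner."""
    best = {}
    for (left, right), score in scores.items():
        if left not in best or score > best[left][1]:
            best[left] = (right, score)
    return {left: right for left, (right, _) in best.items()}


def reciprocal_best_hits(scores):
    """Return pairs where each region is the other's best-scoring partner."""
    forward = best_hits(scores)
    reverse = best_hits({(right, left): s for (left, right), s in scores.items()})
    return sorted((l, r) for l, r in forward.items() if reverse.get(r) == l)


# Hypothetical alignment scores between current and previous assembly regions.
toy_scores = {
    ("cur1", "prev1"): 950,
    ("cur1", "prev2"): 400,
    ("cur2", "prev2"): 900,
    ("cur3", "prev2"): 700,
}

first_pass = reciprocal_best_hits(toy_scores)
print(first_pass)
```

In this toy input, cur3 has prev2 as its best partner, but prev2 scores higher with cur2. That pairing is therefore a non-reciprocal best alignment, the kind that would only be counted in the 'Total' column.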

Comparison of the current and previous annotations

The annotation produced for this release (101) was compared to the annotation in the previous release (100) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	ASM291031v2 (Current) to ASM291031v1 (Previous)
Identical	59%
Minor changes	9%
Major changes	2%
New	29%
Deprecated	1%
Other	<1%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
