NCBI Periophthalmus magnuspinnatus Annotation Release 100

The RefSeq genome records for Periophthalmus magnuspinnatus were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, and the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Periophthalmus magnuspinnatus Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Apr 28 2020
Date of submission of annotation to the public databases: May 1 2020
Software version: 8.4

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
fPerMag1.pri	GCF_009829125.1	Vertebrate Genomes Project	01-03-2020	Reference	26 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	fPerMag1.pri
Genes and pseudogenes	24,742
protein-coding	21,306
non-coding	3,046
transcribed pseudogenes	0
non-transcribed pseudogenes	300
genes with variants	3,711
immunoglobulin/T-cell receptor gene segments	90
other	0
mRNAs	27,498
fully-supported	22,800
with > 5% ab initio	2,398
partial	163
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	27,498
non-coding RNAs	3,245
fully-supported	1,418
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	2,006
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	27,601
fully-supported	22,800
with > 5% ab initio	2,595
partial	174
with major correction(s)	529
known RefSeq (NP_)	0
model RefSeq (XP_)	27,511

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	24,352	16,994	6,835	53	1,150,816
All transcripts	30,743	2,245	1,726	53	97,023
mRNA	27,498	2,476	1,896	123	97,023
misc_RNA	159	2,542	1,860	185	13,674
tRNA	1,237	74	73	65	84
lncRNA	1,259	273	165	70	4,134
snoRNA	177	121	118	65	313
snRNA	211	142	141	53	194
guide_RNA	6	227	275	131	387
rRNA	196	189	119	119	3,927
Single-exon transcripts	948	1,321	1,122	241	4,897
coding transcripts (NM_/XM_ )	948	1,321	1,122	241	4,897
CDSs	27,511	1,944	1,440	102	95,781
Exons	244,856	203	131	1	19,957
in coding transcripts (NM_/XM_ )	240,693	205	132	1	19,957
in non-coding transcripts (NR_/XR_ )	5,441	124	72	2	4,827
Introns	221,177	1,741	456	30	922,868
in coding transcripts (NM_/XM_ )	218,375	1,702	451	30	922,868
in non-coding transcripts (NR_/XR_ )	4,073	3,879	1,052	30	400,587

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.28	1	1	28
Number of exons per transcript	11.64	9	1	258

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 21,293 coding genes, 20,271 genes had a protein with an alignment covering 50% or more of the query and 10,124 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
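The two coverage definitions above can be illustrated with a minimal Python sketch. It assumes a single alignment described by 1-based, inclusive start and end coordinates on each sequence; the function name and the example lengths are hypothetical, and the report does not describe how the pipeline treats alignments made of several segments.

```python
def coverage_percent(start: int, end: int, total_length: int) -> float:
    """Percent of a sequence included in one alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    if not 1 <= start <= end <= total_length:
        raise ValueError("alignment coordinates must lie within the sequence")
    return 100.0 * (end - start + 1) / total_length


# Hypothetical alignment: residues 11-400 of a 400-residue annotated protein
# (the query) aligned to residues 1-390 of a 520-residue curated protein
# (the target).
query_cov = coverage_percent(11, 400, 400)   # 390 of 400 residues
target_cov = coverage_percent(1, 390, 520)   # 390 of 520 residues

print(f"query coverage:  {query_cov:.1f}%")
print(f"target coverage: {target_cov:.1f}%")
```

In this made-up case the annotated protein would count toward the "95% or more of the query" group even though only three quarters of the target is covered, which is why the two measures are reported separately.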

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
fPerMag1.pri	GCF_009829125.1	4.43%	37.68%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	23	22 (95.65%)	22 (95.65%)	99.68%	99.75%

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	2,732,706,250	45%	15%	227,118
SAMD00093050	30516281	brain (Rhinogobius flumineus, SAMD00093050)	12,200,866	43%	46%	147,433
SAMN03339821	26719065	liver (Acanthogobius hasta, six month, pooled male and female, SAMN03339821)	54,490,824	41%	25%	96,048
SAMN04866581	NA	liver (Acanthogobius hasta, pooled male and female, SAMN04866581)	180,698,322	44%	27%	113,903
SAMN04999870	NA	adult, liver, control (Neogobius melanostomus, SAMN04999870)	48,078,612	25%	28%	73,072
SAMN04999871	NA	adult, liver, control (Neogobius melanostomus, SAMN04999871)	50,774,612	25%	29%	79,284
SAMN04999872	NA	adult, liver, control (Neogobius melanostomus, SAMN04999872)	49,234,850	24%	28%	74,212
SAMN04999873	NA	adult, liver, low temperature challenge (Neogobius melanostomus, SAMN04999873)	43,106,456	26%	28%	68,306
SAMN04999874	NA	adult, liver, low temperature challenge (Neogobius melanostomus, SAMN04999874)	49,292,192	25%	28%	72,124
SAMN04999875	NA	adult, liver, low temperature challenge (Neogobius melanostomus, SAMN04999875)	50,503,102	25%	28%	67,482
SAMN04999876	NA	adult, liver, high temperature challenge (Neogobius melanostomus, SAMN04999876)	45,437,896	25%	28%	73,625
SAMN04999877	NA	adult, liver, high temperature challenge (Neogobius melanostomus, SAMN04999877)	45,534,298	26%	30%	77,825
SAMN04999878	NA	adult, liver, high temperature challenge (Neogobius melanostomus, SAMN04999878)	46,438,142	27%	30%	76,593
SAMN04999879	NA	adult, liver, control (Proterorhinus semilunaris, SAMN04999879)	50,089,426	24%	28%	76,852
SAMN04999880	NA	adult, liver, control (Proterorhinus semilunaris, SAMN04999880)	46,489,434	26%	30%	73,658
SAMN04999881	NA	adult, liver, control (Proterorhinus semilunaris, SAMN04999881)	48,734,730	27%	30%	71,814
SAMN04999882	NA	adult, liver, low temperature challenge (Proterorhinus semilunaris, SAMN04999882)	45,701,148	25%	29%	69,735
SAMN04999883	NA	adult, liver, low temperature challenge (Proterorhinus semilunaris, SAMN04999883)	48,287,744	24%	29%	67,712
SAMN04999884	NA	adult, liver, low temperature challenge (Proterorhinus semilunaris, SAMN04999884)	51,647,836	24%	28%	68,587
SAMN04999885	NA	adult, liver, high temperature challenge (Proterorhinus semilunaris, SAMN04999885)	44,317,498	26%	29%	75,114
SAMN04999886	NA	adult, liver, high temperature challenge (Proterorhinus semilunaris, SAMN04999886)	47,038,512	25%	29%	73,415
SAMN04999887	NA	adult, liver, high temperature challenge (Proterorhinus semilunaris, SAMN04999887)	47,596,452	25%	29%	76,534
SAMN06011197	NA	liver, control (Boleophthalmus pectinirostris, SAMN06011197)	49,895,990	43%	23%	111,408
SAMN06011198	NA	liver, ammonia (Boleophthalmus pectinirostris, SAMN06011198)	43,185,250	40%	24%	102,813
SAMN06011199	NA	gill, control (Boleophthalmus pectinirostris, SAMN06011199)	44,530,794	42%	22%	137,228
SAMN06011200	NA	gill, ammonia (Boleophthalmus pectinirostris, SAMN06011200)	46,005,642	47%	25%	138,093
SAMN06011201	NA	adult, liver (Periophthalmodon schlosseri, SAMN06011201)	38,148,996	75%	27%	156,386
SAMN06011202	NA	adult, liver (Periophthalmodon schlosseri, SAMN06011202)	30,319,528	46%	22%	99,993
SAMN06011203	NA	adult, gill (Periophthalmodon schlosseri, SAMN06011203)	27,918,190	49%	24%	100,411
SAMN06011204	NA	adult, gill (Periophthalmodon schlosseri, SAMN06011204)	52,146,586	50%	23%	140,417
SAMN07359291	NA	liver (Mugilogobius chulae, 1 year, pooled male and female, SAMN07359291)	52,364,032	27%	24%	89,382
SAMN07359297	NA	liver (Mugilogobius chulae, 1 year, pooled male and female, SAMN07359297)	53,771,748	28%	24%	90,008
SAMN07519030	NA	gill (Coryphopterus lipernes, SAMN07519030)	59,844,392	27%	22%	105,230
SAMN07519046	NA	gill (Exyrias puntang, SAMN07519046)	51,563,180	26%	23%	68,882
SAMN07519052	NA	gill (Glossogobius aureus, SAMN07519052)	54,790,314	28%	20%	70,728
SAMN07519064	NA	unknown (Istigobius decoratus, SAMN07519064)	76,523,418	30%	30%	66,018
SAMN08131159	NA	adult, retina (Amblygobius phalaena, SAMN08131159)	7,880,692	19%	32%	47,245
SAMN08131160	NA	adult, retina (Amblygobius phalaena, SAMN08131160)	11,693,254	21%	36%	59,554
SAMN08131161	NA	adult, retina (Amblygobius phalaena, SAMN08131161)	13,420,930	23%	37%	64,882
SAMN09476104	NA	Adult, Liver (Boleophthalmus boddarti, male, SAMN09476104)	23,585,591	100%	56%	135,390
SAMN11975155	NA	embryo, embryo (Neogobius melanostomus, 8 cell, SAMN11975155)	47,359,940	69%	2%	61,263
SAMN11975156	NA	embryo, embryo (Neogobius melanostomus, 4 cell, SAMN11975156)	39,232,812	70%	1%	52,602
SAMN11975157	NA	embryo, embryo (Neogobius melanostomus, 32 cell, SAMN11975157)	52,585,528	60%	3%	69,922
SAMN11975158	NA	embryo, embryo (Neogobius melanostomus, 1 cell, SAMN11975158)	95,001,540	80%	0%	56,547
SAMN11975159	NA	embryo, embryo (Neogobius melanostomus, 16-32 cell, SAMN11975159)	79,641,364	73%	1%	65,474
SAMN11975160	NA	embryo, embryo (Neogobius melanostomus, 16-32 cell, SAMN11975160)	101,118,710	80%	0%	56,952
SAMN11975161	NA	embryo, embryo (Neogobius melanostomus, 1-2 cell, SAMN11975161)	99,280,714	80%	0%	53,535
SAMN11975162	NA	embryo, embryo (Neogobius melanostomus, 16 cell, SAMN11975162)	42,728,378	58%	4%	67,730
SAMN11975163	NA	embryo, embryo (Neogobius melanostomus, 2 cell, SAMN11975163)	41,256,782	60%	4%	67,869
SAMN11975164	NA	embryo, embryo (Neogobius melanostomus, 4-8 cell, SAMN11975164)	31,807,612	45%	7%	68,538
SAMN11975165	NA	embryo, embryo (Neogobius melanostomus, 1 cell, SAMN11975165)	61,368,459	76%	1%	61,794
SAMN11975166	NA	embryo, embryo (Neogobius melanostomus, 1 cell, SAMN11975166)	28,692,212	33%	8%	52,126
SAMN11975167	NA	embryo, embryo (Neogobius melanostomus, 16 cell, SAMN11975167)	40,298,212	51%	6%	71,077
SAMN11975168	NA	embryo, embryo (Neogobius melanostomus, 4 cell, SAMN11975168)	79,681,420	73%	1%	66,462
SAMN11975169	NA	embryo, embryo (Neogobius melanostomus, prim stage, SAMN11975169)	49,371,088	53%	5%	71,432
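For readers who want to work with the per-sample table programmatically, the following Python sketch parses one tab-separated row and estimates the number of aligned reads. The class and function names are hypothetical, and because the report rounds percentages to whole numbers the estimate is only approximate.

```python
from dataclasses import dataclass


@dataclass
class SampleStats:
    sample_id: str
    track_name: str
    reads: int
    percent_aligned: int
    percent_with_introns: int
    introns: int


def parse_sample_row(row: str) -> SampleStats:
    """Parse one tab-separated row of the per-sample RNA-Seq table."""
    sample_id, _publication, track, reads, aligned, with_introns, introns = row.split("\t")
    return SampleStats(
        sample_id=sample_id,
        track_name=track,
        reads=int(reads.replace(",", "")),          # "12,200,866" -> 12200866
        percent_aligned=int(aligned.rstrip("%")),   # "43%" -> 43
        percent_with_introns=int(with_introns.rstrip("%")),
        introns=int(introns.replace(",", "")),
    )


def estimated_aligned_reads(stats: SampleStats) -> int:
    """Approximate aligned-read count from the rounded percentage."""
    return stats.reads * stats.percent_aligned // 100


# First sample row of the table above.
row = ("SAMD00093050\t30516281\tbrain (Rhinogobius flumineus, SAMD00093050)"
       "\t12,200,866\t43%\t46%\t147,433")
stats = parse_sample_row(row)
print(stats.sample_id, estimated_aligned_reads(stats))
```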

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR100551	DRX093984	DRP004748	SAMD00093050	12,200,866	43%	46%
SRR1797677	SRX872263	SRP053620	SAMN03339821	54,490,824	41%	25%
SRR3399027	SRX1710563	SRP073412	SAMN04866581	180,698,322	44%	27%
SRR3521257	SRX1761157	SRP075124	SAMN04999870	48,078,612	25%	28%
SRR3521269	SRX1761168	SRP075124	SAMN04999871	50,774,612	25%	29%
SRR3521274	SRX1761172	SRP075124	SAMN04999872	49,234,850	24%	28%
SRR3526318	SRX1761179	SRP075124	SAMN04999873	43,106,456	26%	28%
SRR3521279	SRX1761183	SRP075124	SAMN04999874	49,292,192	25%	28%
SRR3521284	SRX1761196	SRP075124	SAMN04999875	50,503,102	25%	28%
SRR3521288	SRX1761217	SRP075124	SAMN04999876	45,437,896	25%	28%
SRR3521292	SRX1761230	SRP075124	SAMN04999877	45,534,298	26%	30%
SRR3521259	SRX1761242	SRP075124	SAMN04999878	46,438,142	27%	30%
SRR3510077	SRX1761266	SRP075141	SAMN04999879	50,089,426	24%	28%
SRR3510601	SRX1761273	SRP075141	SAMN04999880	46,489,434	26%	30%
SRR3510603	SRX1761275	SRP075141	SAMN04999881	48,734,730	27%	30%
SRR3510604	SRX1761280	SRP075141	SAMN04999882	45,701,148	25%	29%
SRR3510605	SRX1761281	SRP075141	SAMN04999883	48,287,744	24%	29%
SRR3510606	SRX1761282	SRP075141	SAMN04999884	51,647,836	24%	28%
SRR3510607	SRX1761283	SRP075141	SAMN04999885	44,317,498	26%	29%
SRR3510611	SRX1761284	SRP075141	SAMN04999886	47,038,512	25%	29%
SRR3510615	SRX1761306	SRP075141	SAMN04999887	47,596,452	25%	29%
SRR5012115	SRX2342697	SRP093198	SAMN06011197	49,895,990	43%	23%
SRR5012116	SRX2342698	SRP093198	SAMN06011198	43,185,250	40%	24%
SRR5012117	SRX2342699	SRP093198	SAMN06011199	44,530,794	42%	22%
SRR5012118	SRX2342700	SRP093198	SAMN06011200	46,005,642	47%	25%
SRR5012119	SRX2342701	SRP093198	SAMN06011201	38,148,996	75%	27%
SRR5012120	SRX2342702	SRP093198	SAMN06011202	30,319,528	46%	22%
SRR5012121	SRX2342703	SRP093198	SAMN06011203	27,918,190	49%	24%
SRR5012122	SRX2342704	SRP093198	SAMN06011204	52,146,586	50%	23%
SRR5933696	SRX3093835	SRP115439	SAMN07359291	52,364,032	27%	24%
SRR5933695	SRX3093836	SRP115439	SAMN07359297	53,771,748	28%	24%
SRR5997700	SRX3153214	SRP116672	SAMN07519030	59,844,392	27%	22%
SRR5997758	SRX3153268	SRP116672	SAMN07519046	51,563,180	26%	23%
SRR5997768	SRX3153278	SRP116672	SAMN07519052	54,790,314	28%	20%
SRR5997780	SRX3153289	SRP116672	SAMN07519064	76,523,418	30%	30%
SRR6346742	SRX3443998	SRP126129	SAMN08131159	7,880,692	19%	32%
SRR6346741	SRX3443999	SRP126129	SAMN08131160	11,693,254	21%	36%
SRR6346743	SRX3443997	SRP126129	SAMN08131161	13,420,930	23%	37%
SRR7770428	SRX4626001	SRP159175	SAMN09476104	23,585,591	100%	56%
SRR9317353	SRX6084909	SRP201702	SAMN11975155	47,359,940	69%	2%
SRR9317352	SRX6084910	SRP201702	SAMN11975156	39,232,812	70%	1%
SRR9317358	SRX6084904	SRP201702	SAMN11975157	52,585,528	60%	3%
SRR9317357	SRX6084905	SRP201702	SAMN11975158	95,001,540	80%	0%
SRR9317366	SRX6084896	SRP201702	SAMN11975159	79,641,364	73%	1%
SRR9317365	SRX6084897	SRP201702	SAMN11975160	101,118,710	80%	0%
SRR9317355	SRX6084907	SRP201702	SAMN11975161	99,280,714	80%	0%
SRR9317363	SRX6084899	SRP201702	SAMN11975162	42,728,378	58%	4%
SRR9317356	SRX6084906	SRP201702	SAMN11975163	41,256,782	60%	4%
SRR9317354	SRX6084908	SRP201702	SAMN11975164	31,807,612	45%	7%
SRR9317360	SRX6084902	SRP201702	SAMN11975165	61,368,459	76%	1%
SRR9317359	SRX6084903	SRP201702	SAMN11975166	28,692,212	33%	8%
SRR9317362	SRX6084900	SRP201702	SAMN11975167	40,298,212	51%	6%
SRR9317361	SRX6084901	SRP201702	SAMN11975168	79,681,420	73%	1%
SRR9317364	SRX6084898	SRP201702	SAMN11975169	49,371,088	53%	5%

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Larimichthys crocea high-quality model RefSeq (XP_)	18,161	17,641 (97.14%)	17,641 (97.14%)	69.56%	79.17%
Same-species GenBank	23	22 (95.65%)	22 (95.65%)	80.66%	84.90%
Actinopterygii GenBank	85,721	51,413 (59.98%)	51,413 (59.98%)	68.16%	79.46%
Actinopterygii known RefSeq (NP_)	24,999	23,168 (92.68%)	23,168 (92.68%)	68.30%	77.95%
Danio rerio high-quality model RefSeq (XP_)	7,935	7,287 (91.83%)	7,287 (91.83%)	65.15%	71.51%
Oryzias latipes high-quality model RefSeq (XP_)	17,157	16,558 (96.51%)	16,558 (96.51%)	68.67%	77.99%
Perca flavescens high-quality model RefSeq (XP_)	16,027	15,455 (96.43%)	15,455 (96.43%)	69.34%	80.16%
Homo sapiens known RefSeq (NP_)	57,106	37,014 (64.82%)	37,014 (64.82%)	66.99%	70.19%
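The percentages in the table above appear to be the aligned count divided by the number of sequences retrieved from Entrez. The Python sketch below recomputes that ratio for three rows copied from the table; the helper name is hypothetical, and only the first three columns are used.

```python
def parse_count_and_percent(cell: str) -> tuple[int, float]:
    """Split a cell such as '17,641 (97.14%)' into (17641, 97.14)."""
    count_text, percent_text = cell.split("(")
    count = int(count_text.strip().replace(",", ""))
    percent = float(percent_text.strip().rstrip("%)"))
    return count, percent


# Source, sequences retrieved, sequences aligned by ProSplign (tab-separated).
rows = [
    "Larimichthys crocea high-quality model RefSeq (XP_)\t18,161\t17,641 (97.14%)",
    "Actinopterygii GenBank\t85,721\t51,413 (59.98%)",
    "Homo sapiens known RefSeq (NP_)\t57,106\t37,014 (64.82%)",
]

checks = []
for row in rows:
    source, retrieved_text, aligned_cell = row.split("\t")
    retrieved = int(retrieved_text.replace(",", ""))
    aligned, reported = parse_count_and_percent(aligned_cell)
    recomputed = 100 * aligned / retrieved
    # The report rounds to two decimals, so allow half a unit in the last place.
    consistent = abs(recomputed - reported) <= 0.005 + 1e-9
    checks.append((source, round(recomputed, 2), consistent))

for source, recomputed, consistent in checks:
    print(f"{source}: {recomputed:.2f}% (matches report: {consistent})")
```

Splitting each row on tabs first matters here, because the source names themselves contain parentheses such as "(XP_)".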

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
