NCBI Acanthopagrus latus Annotation Release 100

The RefSeq genome records for Acanthopagrus latus were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Acanthopagrus latus Annotation Release 100

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Oct 30 2020
Date of submission of annotation to the public databases: Nov 1 2020
Software version: 8.5

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
fAcaLat1.1	GCF_904848185.1	SC	10-12-2020	Reference	25 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	fAcaLat1.1
Genes and pseudogenes	30,246
  protein-coding	23,786
  non-coding	6,111
  transcribed pseudogenes	0
  non-transcribed pseudogenes	189
  genes with variants	12,587
  immunoglobulin/T-cell receptor gene segments	160
  other	0
mRNAs	54,545
  fully-supported	53,670
  with > 5% ab initio	255
  partial	44
  with filled gap(s)	0
  known RefSeq (NM_)	0
  model RefSeq (XM_)	54,545
non-coding RNAs	10,352
  fully-supported	8,652
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	9,416
pseudo transcripts	0
  fully-supported	0
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	0
CDSs	54,718
  fully-supported	53,670
  with > 5% ab initio	314
  partial	61
  with major correction(s)	103
  known RefSeq (NP_)	0
  model RefSeq (XP_)	54,558
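
As a quick consistency check, the gene-level rows of the table above can be added up. The sketch below is illustrative only: treating these rows as non-overlapping categories is an inference from the arithmetic, not something this report states, and `check_total` is a hypothetical helper name. The "genes with variants" row is left out on the assumption that it overlaps the other categories.

```python
def check_total(total, parts):
    """Return True if the component counts add up to the reported total."""
    return sum(parts.values()) == total


# Values copied from the Feature counts table for fAcaLat1.1.
GENES_AND_PSEUDOGENES = 30_246
GENE_CATEGORIES = {
    "protein-coding": 23_786,
    "non-coding": 6_111,
    "transcribed pseudogenes": 0,
    "non-transcribed pseudogenes": 189,
    "immunoglobulin/T-cell receptor gene segments": 160,
    "other": 0,
}

# mRNAs split into known (NM_) and model (XM_) RefSeq records.
MRNAS = 54_545
MRNA_SOURCES = {"known RefSeq (NM_)": 0, "model RefSeq (XM_)": 54_545}

if __name__ == "__main__":
    print(check_total(GENES_AND_PSEUDOGENES, GENE_CATEGORIES))
    print(check_total(MRNAS, MRNA_SOURCES))
```

Both checks hold for the numbers in this release.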

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	29,897	16,986	7,059	55	1,052,400
All transcripts	64,897	3,595	2,916	55	92,865
  mRNA	54,545	3,971	3,270	271	92,865
  misc_RNA	2,169	3,141	2,434	221	15,258
  tRNA	934	74	73	67	87
  lncRNA	6,483	1,494	1,018	149	14,238
  snoRNA	223	119	107	55	364
  snRNA	199	147	141	57	191
  guide_RNA	8	227	271	130	293
  rRNA	336	126	119	118	1,693
Single-exon transcripts	820	2,032	1,734	330	9,325
  coding transcripts (NM_/XM_)	820	2,032	1,734	330	9,325
CDSs	54,558	2,270	1,614	96	91,920
Exons	319,351	321	141	1	20,844
  in coding transcripts (NM_/XM_)	299,061	313	140	1	20,844
  in non-coding transcripts (NR_/XR_)	32,318	345	145	2	11,459
Introns	284,477	2,000	413	30	1,018,313
  in coding transcripts (NM_/XM_)	270,255	1,911	403	30	1,018,313
  in non-coding transcripts (NR_/XR_)	25,996	2,975	557	30	940,697

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.21	1	1	50
Number of exons per transcript	13	10	1	250

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 23,773 coding genes, 21,736 genes had a protein with an alignment covering 50% or more of the query and 10,449 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
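
The two definitions above can be written out as a short calculation. This is a minimal sketch, assuming a single alignment described by 1-based inclusive start and end coordinates on each sequence; the function names are hypothetical, and the report does not describe how the pipeline handles multiple alignment segments per protein.

```python
def coverage_percent(aligned_start, aligned_end, sequence_length):
    """Percentage of a sequence included in one alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if aligned_end < aligned_start:
        raise ValueError("aligned_end must not precede aligned_start")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


def query_and_target_coverage(q_start, q_end, q_len, t_start, t_end, t_len):
    """Return (query coverage, target coverage) as percentages."""
    return (
        coverage_percent(q_start, q_end, q_len),
        coverage_percent(t_start, t_end, t_len),
    )


if __name__ == "__main__":
    # Hypothetical alignment: residues 1-400 of a 500-residue annotated protein
    # aligned to residues 11-410 of an 800-residue Swiss-Prot protein.
    print(query_and_target_coverage(1, 400, 500, 11, 410, 800))
```

In this made-up case the same 400-residue alignment covers 80% of the query but only 50% of the target, which is why the two measures are reported separately.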

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins
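
The cumulative counts behind such a graph can be sketched as follows. The sketch assumes one best coverage value per gene; the coverage list in the example is invented for illustration, and `genes_at_or_above` is a hypothetical name rather than part of the pipeline.

```python
def genes_at_or_above(coverages, thresholds):
    """Map each threshold to the number of genes with coverage >= threshold."""
    return {t: sum(1 for c in coverages if c >= t) for t in thresholds}


def percent_of_genes(count, total_genes):
    """Express a gene count as a percentage of all coding genes."""
    return 100.0 * count / total_genes


if __name__ == "__main__":
    # Invented per-gene query coverages (percent), for illustration only.
    example_coverages = [100.0, 97.5, 80.0, 51.0, 49.9, 10.0]
    print(genes_at_or_above(example_coverages, (50, 95)))

    # Counts reported in this release: 21,736 and 10,449 of 23,773 coding genes.
    print(round(percent_of_genes(21_736, 23_773), 1))
    print(round(percent_of_genes(10_449, 23_773), 1))
```

Applied to the counts reported for this release, about 91.4% of coding genes reach 50% query coverage and about 44.0% reach 95%.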

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
fAcaLat1.1	GCF_904848185.1	3.43%	22.20%
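
A masked percentage of this kind can be recomputed from a masked copy of the genome. The sketch below assumes soft-masking, where masked bases are written in lowercase; that convention is an assumption here, since the report does not say how the percentages were derived, and `masked_percent` is a hypothetical helper.

```python
def masked_percent(sequences):
    """Percent of bases that are lowercase (soft-masked) across all sequences."""
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    if total == 0:
        raise ValueError("no sequence data")
    return 100.0 * masked / total


if __name__ == "__main__":
    # Two short invented sequences; lowercase letters stand for masked bases.
    print(masked_percent(["ACGTacgtAC", "nnNNACGTAC"]))
```

For a real assembly the sequences would be read from a FASTA file one record at a time rather than held in a list.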

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	22	22 (100.00%)	21 (95.45%)	99.24%	98.77%

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	2,107,566,694	87%	44%	400,162
SAMN07519001	gill (Acanthopagrus latus, not determined, SAMN07519001)	46,508,356	66%	23%	172,706
SAMN12777015	gonad (Acanthopagrus latus, male, SAMN12777015)	73,155,694	91%	48%	283,507
SAMN12777016	gonad (Acanthopagrus latus, male, SAMN12777016)	52,150,700	91%	47%	269,991
SAMN12777017	gonad (Acanthopagrus latus, male, SAMN12777017)	73,206,068	91%	47%	283,844
SAMN12777018	gonad (Acanthopagrus latus, intersex, SAMN12777018)	93,350,244	90%	50%	227,444
SAMN12777019	gonad (Acanthopagrus latus, intersex, SAMN12777019)	53,115,634	88%	44%	261,372
SAMN12777020	gonad (Acanthopagrus latus, intersex, SAMN12777020)	97,998,074	88%	48%	270,284
SAMN12777021	gonad (Acanthopagrus latus, female, SAMN12777021)	54,610,516	91%	46%	245,065
SAMN12777022	gonad (Acanthopagrus latus, female, SAMN12777022)	78,987,258	89%	44%	290,918
SAMN12777023	gonad (Acanthopagrus latus, female, SAMN12777023)	51,090,232	89%	46%	198,133
SAMN14501691	gonad-1 (Acanthopagrus latus, one-year-old, male, SAMN14501691)	66,574,928	91%	45%	265,648
SAMN14501692	gonad-2 (Acanthopagrus latus, one-year-old, male, SAMN14501692)	68,804,116	91%	50%	253,217
SAMN14501693	gonad-3 (Acanthopagrus latus, one-year-old, male, SAMN14501693)	65,509,486	91%	49%	256,464
SAMN14501694	gonad-4 (Acanthopagrus latus, two-year-old, intersex, SAMN14501694)	61,581,128	88%	50%	209,294
SAMN14501695	gonad-5 (Acanthopagrus latus, two-year-old, intersex, SAMN14501695)	52,166,556	89%	42%	154,783
SAMN14501696	gonad-6 (Acanthopagrus latus, two-year-old, intersex, SAMN14501696)	45,537,844	92%	48%	169,808
SAMN14501697	gonad-7 (Acanthopagrus latus, three-year-old, female, SAMN14501697)	43,445,888	89%	38%	154,488
SAMN14501698	gonad-8 (Acanthopagrus latus, three-year-old, female, SAMN14501698)	59,439,728	91%	51%	191,934
SAMN14501699	gonad-9 (Acanthopagrus latus, three-year-old, female, SAMN14501699)	66,190,392	91%	50%	191,189
SAMN15489180	gut (Acanthopagrus latus, 1-year, male, SAMN15489180)	53,999,878	77%	36%	162,543
SAMN15489181	gut (Acanthopagrus latus, 1-year, male, SAMN15489181)	51,719,318	77%	37%	170,387
SAMN15489182	gut (Acanthopagrus latus, 1-year, male, SAMN15489182)	53,512,788	85%	41%	183,128
SAMN15489183	gut (Acanthopagrus latus, 1-year, male, SAMN15489183)	43,051,766	83%	42%	175,588
SAMN15489184	gut (Acanthopagrus latus, 1-year, male, SAMN15489184)	46,206,410	82%	41%	174,834
SAMN15489185	gut (Acanthopagrus latus, 1-year, male, SAMN15489185)	47,058,596	84%	41%	182,955
SAMN15489186	gut (Acanthopagrus latus, 1-year, male, SAMN15489186)	51,155,290	78%	34%	167,443
SAMN15489187	gut (Acanthopagrus latus, 1-year, male, SAMN15489187)	51,262,094	80%	35%	169,055
SAMN15489188	gut (Acanthopagrus latus, 1-year, male, SAMN15489188)	43,803,526	79%	35%	164,968
SAMN16236879	gonad&liver&muscle&spleen (Acanthopagrus latus, female, SAMN16236879)	213,655,866	94%	50%	251,476
SAMN16236880	brain&heart&kideny (Acanthopagrus latus, female, SAMN16236880)	248,718,320	87%	38%	263,075
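
The per-sample rows can be combined into totals with a few lines of code. This is a sketch under stated assumptions: rows are tab-separated with the column order shown in the table header above, the read-weighted mean is one plausible way to combine per-sample percentages (the report does not say how the aggregate 87% was computed), and the function names are hypothetical.

```python
def parse_count(text):
    """Convert a count such as '46,508,356' to an integer."""
    return int(text.replace(",", ""))


def parse_percent(text):
    """Convert a percentage such as '66%' to a float."""
    return float(text.rstrip("%"))


def aggregate_alignment(rows):
    """Return (total reads, read-weighted mean percent aligned).

    Each row is a tab-separated line: sample id, track name, number of
    reads, percent aligned reads, percent with introns, number of introns.
    """
    total_reads = 0
    weighted_sum = 0.0
    for line in rows:
        fields = line.split("\t")
        reads = parse_count(fields[2])
        total_reads += reads
        weighted_sum += reads * parse_percent(fields[3])
    if total_reads == 0:
        raise ValueError("no reads in input rows")
    return total_reads, weighted_sum / total_reads


# First two sample rows of the table, copied as tab-separated text.
ROWS = [
    "SAMN07519001\tgill (Acanthopagrus latus, not determined, SAMN07519001)"
    "\t46,508,356\t66%\t23%\t172,706",
    "SAMN12777015\tgonad (Acanthopagrus latus, male, SAMN12777015)"
    "\t73,155,694\t91%\t48%\t283,507",
]

if __name__ == "__main__":
    total, percent_aligned = aggregate_alignment(ROWS)
    print(total, round(percent_aligned, 1))
```

Summing the read counts of all 30 per-sample rows gives 2,107,566,694, the value shown in the "All" row.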

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR5997672	SRX3153188	SRP116672	SAMN07519001	46,508,356	66%	23%
SRR10249786	SRX6967849	SRP224916	SAMN12777015	73,155,694	91%	48%
SRR10249785	SRX6967850	SRP224916	SAMN12777016	52,150,700	91%	47%
SRR10249794	SRX6967841	SRP224916	SAMN12777017	73,206,068	91%	47%
SRR10249793	SRX6967842	SRP224916	SAMN12777018	93,350,244	90%	50%
SRR10249792	SRX6967843	SRP224916	SAMN12777019	53,115,634	88%	44%
SRR10249791	SRX6967844	SRP224916	SAMN12777020	97,998,074	88%	48%
SRR10249790	SRX6967845	SRP224916	SAMN12777021	54,610,516	91%	46%
SRR10249789	SRX6967846	SRP224916	SAMN12777022	78,987,258	89%	44%
SRR10249788	SRX6967847	SRP224916	SAMN12777023	51,090,232	89%	46%
SRR11458634	SRX8036222	SRP254754	SAMN14501691	66,574,928	91%	45%
SRR11458633	SRX8036223	SRP254754	SAMN14501692	68,804,116	91%	50%
SRR11458632	SRX8036224	SRP254754	SAMN14501693	65,509,486	91%	49%
SRR11458631	SRX8036225	SRP254754	SAMN14501694	61,581,128	88%	50%
SRR11458630	SRX8036226	SRP254754	SAMN14501695	52,166,556	89%	42%
SRR11458629	SRX8036227	SRP254754	SAMN14501696	45,537,844	92%	48%
SRR11458628	SRX8036228	SRP254754	SAMN14501697	43,445,888	89%	38%
SRR11458627	SRX8036229	SRP254754	SAMN14501698	59,439,728	91%	51%
SRR11458626	SRX8036230	SRP254754	SAMN14501699	66,190,392	91%	50%
SRR12182088	SRX8696630	SRP271047	SAMN15489180	53,999,878	77%	36%
SRR12182087	SRX8696631	SRP271047	SAMN15489181	51,719,318	77%	37%
SRR12182086	SRX8696632	SRP271047	SAMN15489182	53,512,788	85%	41%
SRR12182085	SRX8696633	SRP271047	SAMN15489183	43,051,766	83%	42%
SRR12182084	SRX8696634	SRP271047	SAMN15489184	46,206,410	82%	41%
SRR12182083	SRX8696635	SRP271047	SAMN15489185	47,058,596	84%	41%
SRR12182082	SRX8696636	SRP271047	SAMN15489186	51,155,290	78%	34%
SRR12182081	SRX8696637	SRP271047	SAMN15489187	51,262,094	80%	35%
SRR12182080	SRX8696638	SRP271047	SAMN15489188	43,803,526	79%	35%
SRR12888285	SRX9353474	SRP288465	SAMN16236879	213,655,866	94%	50%
SRR12897758	SRX9362829	SRP288465	SAMN16236880	248,718,320	87%	38%

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Betta splendens high-quality model RefSeq (XP_)	18,279	17,846 (97.63%)	17,846 (97.63%)	70.95%	81.43%
Actinopterygii GenBank	86,738	53,200 (61.33%)	53,200 (61.33%)	69.25%	81.68%
Actinopterygii known RefSeq (NP_)	25,473	23,881 (93.75%)	23,881 (93.75%)	68.58%	79.80%
Danio rerio high-quality model RefSeq (XP_)	7,718	7,217 (93.51%)	7,217 (93.51%)	65.45%	75.05%
Esox lucius high-quality model RefSeq (XP_)	18,508	17,742 (95.86%)	17,742 (95.86%)	68.38%	78.26%
Xiphophorus maculatus high-quality model RefSeq (XP_)	18,457	17,962 (97.32%)	17,962 (97.32%)	70.11%	81.19%
Same-species GenBank	22	22 (100.00%)	22 (100.00%)	80.28%	87.84%
Homo sapiens known RefSeq (NP_)	60,894	38,954 (63.97%)	38,954 (63.97%)	67.37%	71.13%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22:134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
