NCBI Dicentrarchus labrax Annotation Release 100

The RefSeq genome records for Dicentrarchus labrax were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Dicentrarchus labrax Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Oct 18 2022
Date of submission of annotation to the public databases: Oct 28 2022
Software version: 10.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
dlabrax2021	GCF_905237075.1	Hellenic Centre for Marine Research, Institute for Marine Biology, Biotechnology and Aquaculture, Heraklion, Crete, Greece	04-24-2021	Reference	1 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	dlabrax2021
Genes and pseudogenes	30,274
  protein-coding	23,890
  non-coding	5,708
  Transcribed pseudogenes	3
  Non-transcribed pseudogenes	427
  genes with variants	11,143
  Immunoglobulin/T-cell receptor gene segments	238
  other	8
mRNAs	54,404
  fully-supported	53,700
  with > 5% ab initio	241
  partial	104
  with filled gap(s)	11
  known RefSeq (NM_)	0
  model RefSeq (XM_)	54,404
non-coding RNAs	9,995
  fully-supported	7,265
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	8,057
pseudo transcripts	3
  fully-supported	3
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	3
CDSs	54,655
  fully-supported	53,700
  with > 5% ab initio	292
  partial	111
  with major correction(s)	417
  known RefSeq (NP_)	0
  model RefSeq (XP_)	54,417
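
As a quick consistency check on the gene-level rows, the short Python sketch below sums the gene categories and compares the result with the reported "Genes and pseudogenes" total. It assumes (this is not stated in the report) that "genes with variants" overlaps the other categories and is therefore excluded from the sum; the variable names are illustrative only.

```python
# Gene-level rows copied from the "Feature counts" table.
gene_counts = {
    "protein-coding": 23_890,
    "non-coding": 5_708,
    "Transcribed pseudogenes": 3,
    "Non-transcribed pseudogenes": 427,
    "Immunoglobulin/T-cell receptor gene segments": 238,
    "other": 8,
}
reported_total = 30_274  # "Genes and pseudogenes"

# Assumption: "genes with variants" is a property of genes already counted
# above, so it is left out of the sum.
category_sum = sum(gene_counts.values())
print(category_sum, category_sum == reported_total)
```

Under that assumption the six categories add up to the reported total.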

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	29,606	17,311	7,279	55	1,177,706
All transcripts	64,399	3,733	3,010	55	92,811
  mRNA	54,404	4,159	3,411	162	92,811
  misc_RNA	2,012	3,225	2,615	113	23,062
  tRNA	1,936	74	73	68	89
  lncRNA	5,259	1,412	877	72	19,218
  snoRNA	237	131	131	57	323
  snRNA	346	150	164	55	195
  rRNA	197	131	119	118	1,700
Single-exon transcripts	708	2,159	1,923	303	11,614
  coding transcripts (NM_/XM_)	708	2,159	1,923	303	11,614
CDSs	54,417	2,369	1,653	96	91,578
Exons	309,552	327	140	1	17,340
  in coding transcripts (NM_/XM_)	295,311	320	140	1	17,340
  in non-coding transcripts (NR_/XR_)	22,302	373	140	9	15,629
Introns	280,116	2,062	436	30	993,697
  in coding transcripts (NM_/XM_)	270,070	1,959	437	30	993,697
  in non-coding transcripts (NR_/XR_)	17,888	3,482	422	30	517,830

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.26	1	1	50
Number of exons per transcript	13.36	10	1	238

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein per gene and the actinopterygii_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation.

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 23,877 coding genes, 21,843 genes had a protein with an alignment covering 50% or more of the query and 10,457 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
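
The two coverage definitions can be illustrated with a minimal Python sketch. It assumes coverage is measured as the span between the first and last aligned residue (1-based, inclusive); the pipeline may count aligned residues differently, and the function name and the example hit are hypothetical.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence spanned by an alignment (1-based, inclusive)."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length

# Hypothetical hit: residues 11-460 of a 500-residue annotated protein (query)
# aligned to residues 1-450 of a 450-residue Swiss-Prot protein (target).
query_cov = coverage_percent(11, 460, 500)
target_cov = coverage_percent(1, 450, 450)
print(f"query coverage: {query_cov:.1f}%, target coverage: {target_cov:.1f}%")
```

In this made-up case the hit would count toward the "50% or more of the query" tally but not toward the "95% or more" tally, even though the target is fully covered.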

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
dlabrax2021	GCF_905237075.1	25.40%
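
For readers who want to reproduce a masked percentage on their own data, the sketch below computes the fraction of soft-masked bases in a sequence. It assumes masked positions are written in lowercase (soft-masking) and that every character counts toward the denominator; the report does not state how gaps are treated, and the function name and toy sequence are illustrative only.

```python
def percent_soft_masked(sequence: str) -> float:
    """Return the percentage of bases written in lowercase (soft-masked)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    masked = sum(1 for base in sequence if base.islower())
    return 100.0 * masked / len(sequence)

# Toy 20-base sequence with 8 lowercase (masked) bases.
toy_sequence = "ACGTacgtACGTACGTacgt"
toy_percent = percent_soft_masked(toy_sequence)
print(f"{toy_percent:.2f}%")
```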

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	818	815 (99.63%)	705 (86.19%)	99.28%	95.82%
Same-species TSA	52,265	51,936 (99.37%)	51,515 (98.57%)	99.84%	99.85%
Same-species EST	55,814	48,050 (86.09%)	44,971 (80.57%)	99.14%	99.42%
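
The percentages in the table above appear to use the number of sequences retrieved from Entrez as the denominator for both the "aligned" and the "passed to Gnomon" columns. The Python sketch below recomputes them under that assumption; the helper name `pct` is illustrative only.

```python
# (retrieved from Entrez, aligned by Splign, passed to Gnomon)
transcript_rows = {
    "Same-species GenBank": (818, 815, 705),
    "Same-species TSA": (52_265, 51_936, 51_515),
    "Same-species EST": (55_814, 48_050, 44_971),
}

def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)

for source, (retrieved, aligned, passed) in transcript_rows.items():
    # Assumption: both percentages are relative to the retrieved count.
    print(f"{source}: aligned {pct(aligned, retrieved)}%, "
          f"passed to Gnomon {pct(passed, retrieved)}%")
```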

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	6,906,647,436	82%	32%	490,804
SAMEA2189177	NA	brain (Dicentrarchus labrax, SAMEA2189177)	47,423,044	85%	20%	230,659
SAMEA2189178	NA	liver (Dicentrarchus labrax, SAMEA2189178)	48,898,638	74%	29%	151,672
SAMEA2189179	NA	brain (Dicentrarchus labrax, SAMEA2189179)	45,845,666	83%	16%	219,284
SAMEA2189180	NA	liver (Dicentrarchus labrax, SAMEA2189180)	51,206,820	77%	28%	165,084
SAMN03252737	NA	whole body (Dicentrarchus labrax, SAMN03252737)	299,249,586	84%	24%	280,148
SAMN04535175	NA	skin (Dicentrarchus labrax, 1 year, SAMN04535175)	81,634,542	75%	29%	241,860
SAMN06241987	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241987)	46,397,618	63%	19%	234,053
SAMN06241988	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241988)	42,956,322	76%	23%	239,704
SAMN06241989	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241989)	43,458,066	80%	24%	243,437
SAMN06241990	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241990)	48,972,336	82%	24%	250,399
SAMN06241991	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241991)	44,865,928	82%	24%	246,059
SAMN06241992	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241992)	52,061,366	80%	23%	250,498
SAMN06241993	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241993)	49,808,038	82%	24%	249,330
SAMN06241994	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241994)	39,864,392	83%	24%	243,345
SAMN06241995	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241995)	38,263,332	83%	23%	241,566
SAMN06241996	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241996)	38,230,618	80%	23%	240,107
SAMN06241997	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241997)	29,734,264	85%	24%	234,373
SAMN06241998	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241998)	29,976,546	81%	23%	232,300
SAMN06241999	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06241999)	54,530,278	84%	24%	255,187
SAMN06242000	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242000)	27,560,454	84%	23%	230,419
SAMN06242001	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242001)	45,085,488	80%	23%	246,265
SAMN06242002	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242002)	30,625,814	86%	24%	233,895
SAMN06242003	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242003)	31,696,628	82%	20%	220,847
SAMN06242004	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242004)	55,088,858	81%	24%	249,488
SAMN06242005	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242005)	40,277,938	78%	22%	236,499
SAMN06242006	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242006)	43,294,468	82%	23%	242,593
SAMN06242007	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242007)	47,433,890	82%	24%	245,394
SAMN06242008	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242008)	46,712,030	81%	22%	244,197
SAMN06242009	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242009)	41,282,674	83%	23%	242,870
SAMN06242010	NA	juvenile, olfactory epithelium (Dicentrarchus labrax, SAMN06242010)	38,135,652	80%	23%	239,194
SAMN06242011	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242011)	25,906,660	62%	22%	204,902
SAMN06242012	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242012)	19,655,826	84%	23%	208,335
SAMN06242013	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242013)	19,409,608	85%	22%	203,788
SAMN06242014	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242014)	22,151,068	84%	22%	211,899
SAMN06242015	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242015)	24,365,584	83%	22%	210,084
SAMN06242016	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242016)	36,441,358	83%	21%	225,030
SAMN06242017	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242017)	29,202,694	85%	21%	216,611
SAMN06242018	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242018)	26,375,710	85%	22%	215,093
SAMN06242019	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242019)	19,076,416	83%	21%	201,161
SAMN06242020	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242020)	19,570,596	82%	22%	199,931
SAMN06242021	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242021)	13,486,472	84%	22%	189,657
SAMN06242022	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242022)	18,216,558	82%	21%	204,022
SAMN06242023	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242023)	16,844,152	82%	22%	198,618
SAMN06242024	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242024)	23,140,022	86%	22%	207,946
SAMN06242025	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242025)	21,097,264	85%	23%	205,080
SAMN06242026	NA	juvenile, olfactory lobe (Dicentrarchus labrax, SAMN06242026)	14,101,626	83%	22%	199,801
SAMN06286544	NA	Whole larvae (Dicentrarchus labrax, SAMN06286544)	111,171,781	80%	23%	274,091
SAMN06286545	NA	Whole larvae (Dicentrarchus labrax, SAMN06286545)	123,014,147	74%	28%	284,144
SAMN06286546	NA	Whole larvae (Dicentrarchus labrax, SAMN06286546)	148,104,732	90%	33%	297,129
SAMN06286547	NA	Whole larvae (Dicentrarchus labrax, SAMN06286547)	108,276,081	74%	34%	287,668
SAMN06286548	NA	Whole larvae (Dicentrarchus labrax, SAMN06286548)	128,911,065	87%	31%	285,347
SAMN06286549	NA	Whole larvae (Dicentrarchus labrax, SAMN06286549)	124,335,554	87%	34%	289,483
SAMN06286550	NA	Whole larvae (Dicentrarchus labrax, SAMN06286550)	420,688,502	83%	34%	324,383
SAMN06644892	NA	immature sea bass, scale (Dicentrarchus labrax, SAMN06644892)	1,131,650,928	80%	31%	308,209
SAMN07370991	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370991)	104,384,450	91%	36%	198,753
SAMN07370992	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370992)	101,987,472	91%	34%	184,533
SAMN07370993	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370993)	110,971,690	91%	34%	189,251
SAMN07370994	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370994)	117,910,874	91%	37%	194,482
SAMN07370995	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370995)	101,066,042	91%	33%	190,861
SAMN07370996	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370996)	102,081,590	91%	33%	182,133
SAMN07370997	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370997)	97,408,516	90%	33%	197,749
SAMN07370998	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370998)	108,258,846	91%	34%	193,842
SAMN07370999	29133947	Head-kidney leucocytes (Dicentrarchus labrax, SAMN07370999)	110,744,126	91%	34%	196,455
SAMN13230695	NA	gills1 (Dicentrarchus labrax, SAMN13230695)	18,903,480	76%	38%	194,840
SAMN13230697	NA	gills3 (Dicentrarchus labrax, SAMN13230697)	4,526,816	77%	38%	148,153
SAMN13230699	NA	gills5 (Dicentrarchus labrax, SAMN13230699)	12,278,962	77%	26%	157,817
SAMN13230701	NA	gills7 (Dicentrarchus labrax, SAMN13230701)	10,724,430	77%	26%	152,375
SAMN13230703	NA	gills9 (Dicentrarchus labrax, SAMN13230703)	19,498,688	72%	38%	192,808
SAMN13230705	NA	head kidney1 (Dicentrarchus labrax, SAMN13230705)	21,471,944	67%	40%	180,491
SAMN13230707	NA	head kidney3 (Dicentrarchus labrax, SAMN13230707)	22,046,852	71%	39%	187,007
SAMN13230709	NA	head kidney5 (Dicentrarchus labrax, SAMN13230709)	17,193,268	72%	39%	169,868
SAMN13230711	NA	head kidney7 (Dicentrarchus labrax, SAMN13230711)	6,164,576	73%	39%	138,755
SAMN13230713	NA	gills11 (Dicentrarchus labrax, SAMN13230713)	19,162,048	71%	42%	186,486
SAMN13230715	NA	gills13 (Dicentrarchus labrax, SAMN13230715)	28,516,300	72%	44%	200,971
SAMN13230717	NA	gills15 (Dicentrarchus labrax, SAMN13230717)	26,356,358	81%	43%	203,457
SAMN13230719	NA	gills17 (Dicentrarchus labrax, SAMN13230719)	25,048,550	82%	42%	199,415
SAMN13230721	NA	gills19 (Dicentrarchus labrax, SAMN13230721)	27,219,702	69%	44%	194,911
SAMN13230723	NA	head kidney9 (Dicentrarchus labrax, SAMN13230723)	23,579,308	71%	44%	175,561
SAMN13230725	NA	head kidney11 (Dicentrarchus labrax, SAMN13230725)	20,647,298	68%	44%	170,247
SAMN13230727	NA	head kidney13 (Dicentrarchus labrax, SAMN13230727)	26,007,174	74%	44%	180,058
SAMN13230729	NA	head kidney15 (Dicentrarchus labrax, SAMN13230729)	20,301,604	69%	45%	169,333
SAMN13230731	NA	head kidney17 (Dicentrarchus labrax, SAMN13230731)	26,699,874	73%	45%	182,792
SAMN17124849	NA	Fingerling, Tongue, Healthy (Dicentrarchus labrax, SAMN17124849)	34,749,770	85%	40%	138,811
SAMN17124850	NA	Fingerling, Tongue, Healthy (Dicentrarchus labrax, SAMN17124850)	31,966,200	76%	39%	151,123
SAMN17124851	NA	Fingerling, Tongue, Healthy (Dicentrarchus labrax, SAMN17124851)	27,551,242	77%	41%	118,831
SAMN17124852	NA	Fingerling, Tongue, Healthy (Dicentrarchus labrax, SAMN17124852)	31,604,458	74%	38%	139,363
SAMN17124853	NA	Fingerling, Tongue, Healthy (Dicentrarchus labrax, SAMN17124853)	28,919,588	77%	40%	145,805
SAMN17124854	NA	Fingerling, Tongue, Healthy (Dicentrarchus labrax, SAMN17124854)	25,877,316	68%	39%	122,355
SAMN17124855	NA	Fingerling, Tongue, Infected (Dicentrarchus labrax, SAMN17124855)	44,961,164	83%	48%	170,924
SAMN17124856	NA	Fingerling, Tongue, Infected (Dicentrarchus labrax, SAMN17124856)	34,634,110	78%	43%	131,261
SAMN17124857	NA	Fingerling, Tongue, Infected (Dicentrarchus labrax, SAMN17124857)	25,002,800	80%	40%	147,483
SAMN17124858	NA	Fingerling, Tongue, Infected (Dicentrarchus labrax, SAMN17124858)	31,542,076	81%	40%	142,878
SAMN17124859	NA	Fingerling, Tongue, Infected (Dicentrarchus labrax, SAMN17124859)	32,426,716	86%	41%	151,996
SAMN17124860	NA	Fingerling, Tongue, Infected (Dicentrarchus labrax, SAMN17124860)	35,581,686	83%	42%	114,129
SAMN17124861	NA	Fingerling, Spleen, Healthy (Dicentrarchus labrax, SAMN17124861)	36,195,806	83%	42%	183,554
SAMN17124862	NA	Fingerling, Spleen, Healthy (Dicentrarchus labrax, SAMN17124862)	34,071,862	78%	39%	139,857
SAMN17124863	NA	Fingerling, Spleen, Healthy (Dicentrarchus labrax, SAMN17124863)	40,208,164	84%	41%	176,680
SAMN17124864	NA	Fingerling, Spleen, Healthy (Dicentrarchus labrax, SAMN17124864)	38,683,924	84%	41%	170,263
SAMN17124865	NA	Fingerling, Spleen, Healthy (Dicentrarchus labrax, SAMN17124865)	30,309,886	85%	42%	181,960
SAMN17124866	NA	Fingerling, Spleen, Healthy (Dicentrarchus labrax, SAMN17124866)	26,990,982	78%	41%	138,106
SAMN17124867	NA	Fingerling, Spleen, Infected (Dicentrarchus labrax, SAMN17124867)	34,088,460	83%	39%	166,897
SAMN17124868	NA	Fingerling, Spleen, Infected (Dicentrarchus labrax, SAMN17124868)	33,647,920	78%	42%	163,555
SAMN17124869	NA	Fingerling, Spleen, Infected (Dicentrarchus labrax, SAMN17124869)	30,397,016	86%	49%	33,819
SAMN17124870	NA	Fingerling, Spleen, Infected (Dicentrarchus labrax, SAMN17124870)	29,084,866	76%	40%	148,724
SAMN17124871	NA	Fingerling, Spleen, Infected (Dicentrarchus labrax, SAMN17124871)	16,536,888	82%	42%	137,368
SAMN17124872	NA	Fingerling, Spleen, Infected (Dicentrarchus labrax, SAMN17124872)	20,087,554	78%	39%	151,300
SAMN17124873	NA	Fingerling, Liver, Healthy (Dicentrarchus labrax, SAMN17124873)	33,746,296	84%	45%	109,883
SAMN17124874	NA	Fingerling, Liver, Healthy (Dicentrarchus labrax, SAMN17124874)	26,984,624	84%	49%	112,661
SAMN17124875	NA	Fingerling, Liver, Healthy (Dicentrarchus labrax, SAMN17124875)	96,247,310	87%	51%	147,814
SAMN17124876	NA	Fingerling, Liver, Healthy (Dicentrarchus labrax, SAMN17124876)	36,263,134	85%	53%	124,744
SAMN17124877	NA	Fingerling, Liver, Healthy (Dicentrarchus labrax, SAMN17124877)	40,586,870	84%	51%	112,410
SAMN17124878	NA	Fingerling, Liver, Healthy (Dicentrarchus labrax, SAMN17124878)	45,530,192	90%	52%	132,627
SAMN17124879	NA	Fingerling, Liver, Infected (Dicentrarchus labrax, SAMN17124879)	26,029,518	83%	49%	104,925
SAMN17124880	NA	Fingerling, Liver, Infected (Dicentrarchus labrax, SAMN17124880)	37,715,420	83%	51%	114,397
SAMN17124881	NA	Fingerling, Liver, Infected (Dicentrarchus labrax, SAMN17124881)	34,265,506	83%	50%	110,854
SAMN17124882	NA	Fingerling, Liver, Infected (Dicentrarchus labrax, SAMN17124882)	31,981,748	83%	50%	130,260
SAMN17124883	NA	Fingerling, Liver, Infected (Dicentrarchus labrax, SAMN17124883)	36,676,758	82%	47%	121,135
SAMN17124884	NA	Fingerling, Liver, Infected (Dicentrarchus labrax, SAMN17124884)	28,999,030	77%	49%	60,851
SAMN20356404	NA	Juvenile, olfactory rosette (Dicentrarchus labrax, 2 years old, SAMN20356404)	119,574,040	93%	28%	279,421

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR341291	ERX314138	ERP003898	SAMEA2189177	47,423,044	85%	20%
ERR341290	ERX314139	ERP003898	SAMEA2189178	48,898,638	74%	29%
ERR341289	ERX314140	ERP003898	SAMEA2189179	45,845,666	83%	16%
ERR341292	ERX314141	ERP003898	SAMEA2189180	51,206,820	77%	28%
SRR1820180	SRX796243	SRP050545	SAMN03252737	38,873,758	78%	22%
SRR1819884	SRX796245	SRP050545	SAMN03252737	42,809,302	82%	24%
SRR1818265	SRX796246	SRP050545	SAMN03252737	42,203,884	83%	26%
SRR1818260	SRX796264	SRP050545	SAMN03252737	34,468,472	85%	24%
SRR1818230	SRX796265	SRP050545	SAMN03252737	45,858,820	87%	27%
SRR1818228	SRX796276	SRP050545	SAMN03252737	41,909,428	87%	24%
SRR1814214	SRX796587	SRP050545	SAMN03252737	53,125,922	83%	21%
SRR3206634	SRX1615510	SRP071200	SAMN04535175	20,023,512	63%	27%
SRR3206635	SRX1615510	SRP071200	SAMN04535175	20,221,512	63%	26%
SRR3206639	SRX1615511	SRP071200	SAMN04535175	20,607,338	88%	30%
SRR3206726	SRX1615511	SRP071200	SAMN04535175	20,782,180	88%	30%
SRR5188560	SRX2504481	SRP097118	SAMN06241987	46,397,618	63%	19%
SRR5188559	SRX2504480	SRP097118	SAMN06241988	42,956,322	76%	23%
SRR5188558	SRX2504479	SRP097118	SAMN06241989	43,458,066	80%	24%
SRR5188557	SRX2504478	SRP097118	SAMN06241990	48,972,336	82%	24%
SRR5188556	SRX2504477	SRP097118	SAMN06241991	44,865,928	82%	24%
SRR5188555	SRX2504476	SRP097118	SAMN06241992	52,061,366	80%	23%
SRR5188554	SRX2504475	SRP097118	SAMN06241993	49,808,038	82%	24%
SRR5188553	SRX2504474	SRP097118	SAMN06241994	39,864,392	83%	24%
SRR5188552	SRX2504473	SRP097118	SAMN06241995	38,263,332	83%	23%
SRR5188551	SRX2504472	SRP097118	SAMN06241996	38,230,618	80%	23%
SRR5188550	SRX2504471	SRP097118	SAMN06241997	29,734,264	85%	24%
SRR5188549	SRX2504470	SRP097118	SAMN06241998	29,976,546	81%	23%
SRR5188548	SRX2504469	SRP097118	SAMN06241999	54,530,278	84%	24%
SRR5188547	SRX2504468	SRP097118	SAMN06242000	27,560,454	84%	23%
SRR5188546	SRX2504467	SRP097118	SAMN06242001	45,085,488	80%	23%
SRR5188545	SRX2504466	SRP097118	SAMN06242002	30,625,814	86%	24%
SRR5188544	SRX2504465	SRP097118	SAMN06242003	31,696,628	82%	20%
SRR5188543	SRX2504464	SRP097118	SAMN06242004	55,088,858	81%	24%
SRR5188542	SRX2504463	SRP097118	SAMN06242005	40,277,938	78%	22%
SRR5188541	SRX2504462	SRP097118	SAMN06242006	43,294,468	82%	23%
SRR5188540	SRX2504461	SRP097118	SAMN06242007	47,433,890	82%	24%
SRR5188539	SRX2504460	SRP097118	SAMN06242008	46,712,030	81%	22%
SRR5188538	SRX2504459	SRP097118	SAMN06242009	41,282,674	83%	23%
SRR5188537	SRX2504458	SRP097118	SAMN06242010	38,135,652	80%	23%
SRR5188536	SRX2504457	SRP097118	SAMN06242011	25,906,660	62%	22%
SRR5188535	SRX2504456	SRP097118	SAMN06242012	19,655,826	84%	23%
SRR5188534	SRX2504455	SRP097118	SAMN06242013	19,409,608	85%	22%
SRR5188533	SRX2504454	SRP097118	SAMN06242014	22,151,068	84%	22%
SRR5188532	SRX2504453	SRP097118	SAMN06242015	24,365,584	83%	22%
SRR5188531	SRX2504452	SRP097118	SAMN06242016	36,441,358	83%	21%
SRR5188530	SRX2504451	SRP097118	SAMN06242017	29,202,694	85%	21%
SRR5188529	SRX2504450	SRP097118	SAMN06242018	26,375,710	85%	22%
SRR5188528	SRX2504449	SRP097118	SAMN06242019	19,076,416	83%	21%
SRR5188527	SRX2504448	SRP097118	SAMN06242020	19,570,596	82%	22%
SRR5188526	SRX2504447	SRP097118	SAMN06242021	13,486,472	84%	22%
SRR5188525	SRX2504446	SRP097118	SAMN06242022	18,216,558	82%	21%
SRR5188524	SRX2504445	SRP097118	SAMN06242023	16,844,152	82%	22%
SRR5188523	SRX2504444	SRP097118	SAMN06242024	23,140,022	86%	22%
SRR5188522	SRX2504443	SRP097118	SAMN06242025	21,097,264	85%	23%
SRR5188521	SRX2504442	SRP097118	SAMN06242026	14,101,626	83%	22%
SRR5221514	SRX2531290	SRP098676	SAMN06286544	29,929,298	89%	25%
SRR5221515	SRX2531291	SRP098676	SAMN06286544	28,949,062	58%	18%
SRR5221516	SRX2531292	SRP098676	SAMN06286544	24,256,205	84%	20%
SRR5221517	SRX2531293	SRP098676	SAMN06286544	28,037,216	89%	26%
SRR5221510	SRX2531286	SRP098676	SAMN06286545	27,447,004	71%	28%
SRR5221511	SRX2531287	SRP098676	SAMN06286545	32,015,678	68%	29%
SRR5221512	SRX2531288	SRP098676	SAMN06286545	32,573,329	77%	28%
SRR5221513	SRX2531289	SRP098676	SAMN06286545	30,978,136	77%	28%
SRR5221506	SRX2531282	SRP098676	SAMN06286546	38,354,320	89%	31%
SRR5221507	SRX2531283	SRP098676	SAMN06286546	38,707,652	91%	34%
SRR5221508	SRX2531284	SRP098676	SAMN06286546	35,833,637	91%	33%
SRR5221509	SRX2531285	SRP098676	SAMN06286546	35,209,123	90%	32%
SRR5221502	SRX2531278	SRP098676	SAMN06286547	16,000,000	2%	18%
SRR5221503	SRX2531279	SRP098676	SAMN06286547	28,803,098	88%	33%
SRR5221504	SRX2531280	SRP098676	SAMN06286547	32,625,927	85%	36%
SRR5221505	SRX2531281	SRP098676	SAMN06286547	30,847,056	86%	33%
SRR5221498	SRX2531274	SRP098676	SAMN06286548	32,826,652	88%	30%
SRR5221499	SRX2531275	SRP098676	SAMN06286548	30,651,442	87%	31%
SRR5221500	SRX2531276	SRP098676	SAMN06286548	33,155,931	90%	34%
SRR5221501	SRX2531277	SRP098676	SAMN06286548	32,277,040	83%	30%
SRR5221494	SRX2531270	SRP098676	SAMN06286549	32,998,421	89%	34%
SRR5221495	SRX2531271	SRP098676	SAMN06286549	31,126,929	85%	32%
SRR5221496	SRX2531272	SRP098676	SAMN06286549	31,837,867	89%	35%
SRR5221497	SRX2531273	SRP098676	SAMN06286549	28,372,337	84%	33%
SRR5221493	SRX2531269	SRP098676	SAMN06286550	420,688,502	83%	34%
SRR5378700	SRX2673957	SRP102504	SAMN06644892	36,225,312	67%	28%
SRR5378701	SRX2673957	SRP102504	SAMN06644892	32,644,012	81%	32%
SRR5378702	SRX2673957	SRP102504	SAMN06644892	35,733,262	88%	31%
SRR5378703	SRX2673957	SRP102504	SAMN06644892	32,010,562	87%	34%
SRR5378708	SRX2673957	SRP102504	SAMN06644892	32,146,988	89%	32%
SRR5378709	SRX2673957	SRP102504	SAMN06644892	36,137,438	89%	31%
SRR5378710	SRX2673962	SRP102504	SAMN06644892	31,517,304	81%	32%
SRR5378711	SRX2673962	SRP102504	SAMN06644892	37,603,290	86%	31%
SRR5378712	SRX2673962	SRP102504	SAMN06644892	37,590,402	83%	31%
SRR5378794	SRX2673962	SRP102504	SAMN06644892	39,227,946	78%	31%
SRR5378984	SRX2673962	SRP102504	SAMN06644892	31,610,062	81%	30%
SRR5379059	SRX2674307	SRP102504	SAMN06644892	41,553,966	82%	32%
SRR5379060	SRX2674307	SRP102504	SAMN06644892	38,478,934	83%	32%
SRR5379061	SRX2674307	SRP102504	SAMN06644892	44,380,030	84%	31%
SRR5379062	SRX2674307	SRP102504	SAMN06644892	30,414,430	86%	30%
SRR5379141	SRX2674307	SRP102504	SAMN06644892	39,193,218	89%	31%
SRR5379182	SRX2674405	SRP102504	SAMN06644892	40,882,408	82%	32%
SRR5379196	SRX2674405	SRP102504	SAMN06644892	30,697,718	78%	33%
SRR5379203	SRX2674405	SRP102504	SAMN06644892	32,072,948	83%	33%
SRR5379213	SRX2674405	SRP102504	SAMN06644892	29,259,718	72%	30%
SRR5379224	SRX2674405	SRP102504	SAMN06644892	45,768,482	75%	31%
SRR5379246	SRX2674467	SRP102504	SAMN06644892	35,430,446	83%	31%
SRR5379251	SRX2674467	SRP102504	SAMN06644892	36,056,576	83%	32%
SRR5379996	SRX2674467	SRP102504	SAMN06644892	38,837,546	65%	29%
SRR5379998	SRX2674467	SRP102504	SAMN06644892	51,006,532	84%	34%
SRR5380074	SRX2674467	SRP102504	SAMN06644892	43,998,510	60%	18%
SRR5380203	SRX2675383	SRP102504	SAMN06644892	43,984,134	73%	30%
SRR5380204	SRX2675383	SRP102504	SAMN06644892	40,895,084	80%	31%
SRR5380375	SRX2675383	SRP102504	SAMN06644892	42,003,930	84%	30%
SRR5380675	SRX2675383	SRP102504	SAMN06644892	44,289,740	86%	31%
SRR5851409	SRX3021065	SRP113028	SAMN07370991	104,384,450	91%	36%
SRR5851417	SRX3021073	SRP113028	SAMN07370992	101,987,472	91%	34%
SRR5851416	SRX3021072	SRP113028	SAMN07370993	110,971,690	91%	34%
SRR5851415	SRX3021071	SRP113028	SAMN07370994	117,910,874	91%	37%
SRR5851414	SRX3021070	SRP113028	SAMN07370995	101,066,042	91%	33%
SRR5851413	SRX3021069	SRP113028	SAMN07370996	102,081,590	91%	33%
SRR5851412	SRX3021068	SRP113028	SAMN07370997	97,408,516	90%	33%
SRR5851411	SRX3021067	SRP113028	SAMN07370998	108,258,846	91%	34%
SRR5851410	SRX3021066	SRP113028	SAMN07370999	110,744,126	91%	34%
SRR10412160	SRX7110348	SRP228884	SAMN13230695	18,903,480	76%	38%
SRR10412154	SRX7110354	SRP228884	SAMN13230697	4,526,816	77%	38%
SRR10412144	SRX7110364	SRP228884	SAMN13230699	12,278,962	77%	26%
SRR10412143	SRX7110365	SRP228884	SAMN13230701	10,724,430	77%	26%
SRR10412142	SRX7110366	SRP228884	SAMN13230703	19,498,688	72%	38%
SRR10412159	SRX7110349	SRP228884	SAMN13230705	21,471,944	67%	40%
SRR10412158	SRX7110350	SRP228884	SAMN13230707	22,046,852	71%	39%
SRR10412157	SRX7110351	SRP228884	SAMN13230709	17,193,268	72%	39%
SRR10412156	SRX7110352	SRP228884	SAMN13230711	6,164,576	73%	39%
SRR10412155	SRX7110353	SRP228884	SAMN13230713	19,162,048	71%	42%
SRR10412153	SRX7110355	SRP228884	SAMN13230715	28,516,300	72%	44%
SRR10412152	SRX7110356	SRP228884	SAMN13230717	26,356,358	81%	43%
SRR10412151	SRX7110357	SRP228884	SAMN13230719	25,048,550	82%	42%
SRR10412149	SRX7110359	SRP228884	SAMN13230721	27,219,702	69%	44%
SRR10412150	SRX7110358	SRP228884	SAMN13230723	23,579,308	71%	44%
SRR10412148	SRX7110360	SRP228884	SAMN13230725	20,647,298	68%	44%
SRR10412147	SRX7110361	SRP228884	SAMN13230727	26,007,174	74%	44%
SRR10412146	SRX7110362	SRP228884	SAMN13230729	20,301,604	69%	45%
SRR10412145	SRX7110363	SRP228884	SAMN13230731	26,699,874	73%	45%
SRR13278278	SRX9708020	SRP298624	SAMN17124849	34,749,770	85%	40%
SRR13278277	SRX9708021	SRP298624	SAMN17124850	31,966,200	76%	39%
SRR13278266	SRX9708032	SRP298624	SAMN17124851	27,551,242	77%	41%
SRR13278255	SRX9708043	SRP298624	SAMN17124852	31,604,458	74%	38%
SRR13278248	SRX9708050	SRP298624	SAMN17124853	28,919,588	77%	40%
SRR13278247	SRX9708051	SRP298624	SAMN17124854	25,877,316	68%	39%
SRR13278246	SRX9708052	SRP298624	SAMN17124855	44,961,164	83%	48%
SRR13278245	SRX9708053	SRP298624	SAMN17124856	34,634,110	78%	43%
SRR13278244	SRX9708054	SRP298624	SAMN17124857	25,002,800	80%	40%
SRR13278243	SRX9708055	SRP298624	SAMN17124858	31,542,076	81%	40%
SRR13278276	SRX9708022	SRP298624	SAMN17124859	32,426,716	86%	41%
SRR13278275	SRX9708023	SRP298624	SAMN17124860	35,581,686	83%	42%
SRR13278274	SRX9708024	SRP298624	SAMN17124861	36,195,806	83%	42%
SRR13278273	SRX9708025	SRP298624	SAMN17124862	34,071,862	78%	39%
SRR13278272	SRX9708026	SRP298624	SAMN17124863	40,208,164	84%	41%
SRR13278271	SRX9708027	SRP298624	SAMN17124864	38,683,924	84%	41%
SRR13278270	SRX9708028	SRP298624	SAMN17124865	30,309,886	85%	42%
SRR13278269	SRX9708029	SRP298624	SAMN17124866	26,990,982	78%	41%
SRR13278268	SRX9708030	SRP298624	SAMN17124867	34,088,460	83%	39%
SRR13278267	SRX9708031	SRP298624	SAMN17124868	33,647,920	78%	42%
SRR13278265	SRX9708033	SRP298624	SAMN17124869	30,397,016	86%	49%
SRR13278264	SRX9708034	SRP298624	SAMN17124870	29,084,866	76%	40%
SRR13278263	SRX9708035	SRP298624	SAMN17124871	16,536,888	82%	42%
SRR13278262	SRX9708036	SRP298624	SAMN17124872	20,087,554	78%	39%
SRR13278261	SRX9708037	SRP298624	SAMN17124873	33,746,296	84%	45%
SRR13278260	SRX9708038	SRP298624	SAMN17124874	26,984,624	84%	49%
SRR13278259	SRX9708039	SRP298624	SAMN17124875	96,247,310	87%	51%
SRR13278258	SRX9708040	SRP298624	SAMN17124876	36,263,134	85%	53%
SRR13278257	SRX9708041	SRP298624	SAMN17124877	40,586,870	84%	51%
SRR13278256	SRX9708042	SRP298624	SAMN17124878	45,530,192	90%	52%
SRR13278254	SRX9708044	SRP298624	SAMN17124879	26,029,518	83%	49%
SRR13278253	SRX9708045	SRP298624	SAMN17124880	37,715,420	83%	51%
SRR13278252	SRX9708046	SRP298624	SAMN17124881	34,265,506	83%	50%
SRR13278251	SRX9708047	SRP298624	SAMN17124882	31,981,748	83%	50%
SRR13278250	SRX9708048	SRP298624	SAMN17124883	36,676,758	82%	47%
SRR13278249	SRX9708049	SRP298624	SAMN17124884	28,999,030	77%	49%
SRR15222855	SRX11528806	SRP329592	SAMN20356404	64,791,063	93%	26%
SRR15222852	SRX11528809	SRP329592	SAMN20356404	54,782,977	93%	30%
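
The per-sample rows can be related to the per-run rows by pooling the runs of a sample. The Python sketch below does this for SAMN20356404, assuming (the report does not say so) that the sample-level percentages are read-weighted over its runs. Because the run-level percentages are already rounded, such a recomputation is only approximate and may differ by a point for other samples.

```python
# (run, number of reads, % aligned reads, % of aligned reads with introns)
runs = [
    ("SRR15222855", 64_791_063, 93, 26),
    ("SRR15222852", 54_782_977, 93, 30),
]

total_reads = sum(reads for _, reads, _, _ in runs)

# Estimated aligned reads per run, from the rounded percentages.
aligned_per_run = [reads * pct_aligned / 100 for _, reads, pct_aligned, _ in runs]
total_aligned = sum(aligned_per_run)

# Assumption: sample-level values are weighted by read counts.
sample_pct_aligned = 100 * total_aligned / total_reads
sample_pct_introns = sum(
    aligned * pct_introns
    for aligned, (_, _, _, pct_introns) in zip(aligned_per_run, runs)
) / total_aligned

print(total_reads, round(sample_pct_aligned), round(sample_pct_introns))
```

For this sample the pooled values agree with the per-sample row (119,574,040 reads, 93% aligned, 28% of aligned reads with introns).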

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	694	689 (99.28%)	689 (99.28%)	76.60%	88.27%
Betta splendens high-quality model RefSeq (XP_)	18,343	18,169 (99.05%)	18,169 (99.05%)	71.57%	82.13%
Actinopterygii GenBank	89,757	85,844 (95.64%)	85,844 (95.64%)	69.47%	81.33%
Actinopterygii known RefSeq (NP_)	25,472	24,286 (95.34%)	24,286 (95.34%)	68.63%	79.38%
Danio rerio high-quality model RefSeq (XP_)	7,717	7,360 (95.37%)	7,360 (95.37%)	65.99%	74.34%
Esox lucius high-quality model RefSeq (XP_)	18,508	18,020 (97.36%)	18,020 (97.36%)	68.48%	78.01%
Xiphophorus maculatus high-quality model RefSeq (XP_)	18,457	18,200 (98.61%)	18,200 (98.61%)	70.60%	81.54%
Homo sapiens known RefSeq (NP_)	65,821	54,955 (83.49%)	54,955 (83.49%)	66.80%	69.34%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
