NCBI Misgurnus anguillicaudatus Annotation Release GCF_027580225.1-RS_2023_04

The genome sequence records for Misgurnus anguillicaudatus RefSeq assembly GCF_027580225.1 (HAU_Mang_1.0) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_027580225.1-RS_2023_04".

Date of Entrez queries for transcripts and proteins: Apr 13 2023
Date of submission of annotation to the public databases: Apr 17 2023
Software version: 10.1

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
HAU_Mang_1.0	GCF_027580225.1	Huazhong Agriculture University	01-09-2023	Reference	26 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	HAU_Mang_1.0
Genes and pseudogenes	43,460
protein-coding	27,957
non-coding	14,265
Transcribed pseudogenes	0
Non-transcribed pseudogenes	866
genes with variants	12,411
Immunoglobulin/T-cell receptor gene segments	361
other	11
mRNAs	53,904
fully-supported	51,846
with > 5% ab initio	1,042
partial	178
with filled gap(s)	80
known RefSeq (NM_)	0
model RefSeq (XM_)	53,904
non-coding RNAs	19,861
fully-supported	12,104
with > 5% ab initio	0
partial	2
with filled gap(s)	2
known RefSeq (NR_)	0
model RefSeq (XR_)	14,875
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	54,278
fully-supported	51,846
with > 5% ab initio	1,189
partial	177
with major correction(s)	708
known RefSeq (NP_)	0
model RefSeq (XP_)	53,917
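The gene categories in the table can be cross-checked with simple arithmetic. The sketch below assumes that the six categories are disjoint and that "genes with variants" is a subset of them rather than a separate category; the report does not state this explicitly, but the sums are consistent with it.

```python
# Gene counts copied from the Feature counts table for HAU_Mang_1.0.
gene_categories = {
    "protein-coding": 27_957,
    "non-coding": 14_265,
    "transcribed pseudogenes": 0,
    "non-transcribed pseudogenes": 866,
    "Ig/TCR gene segments": 361,
    "other": 11,
}
reported_total = 43_460  # "Genes and pseudogenes"

# Assumption: the categories are disjoint, so they should sum to the total.
category_sum = sum(gene_categories.values())
print(category_sum, category_sum == reported_total)

# Assumption: the "Genes" row of the Feature lengths table (42,233) excludes
# pseudogenes and immunoglobulin/T-cell receptor gene segments.
genes_in_length_table = (
    reported_total
    - gene_categories["transcribed pseudogenes"]
    - gene_categories["non-transcribed pseudogenes"]
    - gene_categories["Ig/TCR gene segments"]
)
print(genes_in_length_table)
```

Both checks agree with the reported figures, which suggests (but does not prove) that the categories are counted this way.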

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	42,233	17,091	4,996	54	915,203
All transcripts	73,765	2,870	2,255	54	89,194
mRNA	53,904	3,518	2,824	253	89,194
misc_RNA	1,188	3,308	2,796	188	15,449
tRNA	4,984	74	73	66	95
lncRNA	10,918	1,593	1,298	140	14,665
snoRNA	291	143	134	61	313
snRNA	480	135	140	54	198
rRNA	1,989	129	119	119	4,018
Single-exon transcripts	1,364	1,789	1,320	293	9,109
coding transcripts (NM_/XM_ )	1,364	1,789	1,320	293	9,109
CDSs	53,917	2,209	1,557	99	87,957
Exons	343,623	304	143	1	21,699
in coding transcripts (NM_/XM_ )	310,765	296	142	1	21,699
in non-coding transcripts (NR_/XR_ )	40,631	341	148	9	9,328
Introns	301,926	2,613	390	30	533,817
in coding transcripts (NM_/XM_ )	278,389	2,657	396	30	533,817
in non-coding transcripts (NR_/XR_ )	31,180	2,298	378	30	300,908

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.85	1	1	50
Number of exons per transcript	11.63	8	1	233

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein per gene and the actinopterygii_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation.
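BUSCO notation is a one-line summary of the form C:..%[S:..%,D:..%],F:..%,M:..%,n:.., giving the percentages of complete (single-copy and duplicated), fragmented and missing BUSCOs and the number of BUSCOs searched. A minimal parsing sketch follows; the function name and the example string are hypothetical, and the values are placeholders, not the results of this release.

```python
import re

def parse_busco_summary(summary: str) -> dict:
    """Parse a one-line BUSCO summary into a dict of numeric fields."""
    pattern = re.compile(
        r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
        r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
    )
    match = pattern.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    fields = {key: float(value) for key, value in match.groupdict().items()}
    fields["n"] = int(fields["n"])  # n is a count, not a percentage
    return fields

# Placeholder values for illustration only; NOT the results of this release.
example = "C:90.0%[S:80.0%,D:10.0%],F:4.0%,M:6.0%,n:1000"
result = parse_busco_summary(example)
print(result)
```

In a well-formed summary, S and D add up to C, and C, F and M add up to 100%.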

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 27,944 coding genes, 24,213 genes had a protein with an alignment covering 50% or more of the query and 12,296 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
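As a worked illustration of these two definitions, the sketch below computes both coverages for a single alignment. It assumes one contiguous aligned span with 1-based inclusive coordinates and ignores gaps; how the pipeline itself handles gapped or multi-segment alignments is not described here. The function name and the example alignment are hypothetical.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percent of a sequence covered by an alignment spanning start..end (1-based, inclusive)."""
    if not (1 <= aligned_start <= aligned_end <= sequence_length):
        raise ValueError("alignment coordinates must lie within the sequence")
    return 100.0 * (aligned_end - aligned_start + 1) / sequence_length

# Hypothetical alignment: residues 11-200 of a 200-aa annotated protein (query)
# aligned to residues 1-190 of a 380-aa Swiss-Prot protein (target).
query_cov = coverage_percent(11, 200, 200)   # 190 aligned residues out of 200
target_cov = coverage_percent(1, 190, 380)   # 190 aligned residues out of 380
print(f"query coverage {query_cov:.1f}%, target coverage {target_cov:.1f}%")
```

In this example the annotated protein is almost fully covered (95%), while only half of the target is, which is the pattern expected when the annotated protein is a partial match to a longer curated protein.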

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
HAU_Mang_1.0	GCF_027580225.1	44.93%
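A masked percentage of this kind can be approximated from a soft-masked FASTA file, in which masked bases are written in lowercase. The sketch below assumes soft-masking and counts every base, including N, in the denominator; whether the reported figure uses the same denominator is not stated. The function name is hypothetical and the input is a toy sequence, not assembly data.

```python
import io

def masked_percent(fasta_handle) -> float:
    """Percent of bases that are lowercase (soft-masked) in a FASTA stream."""
    masked = total = 0
    for line in fasta_handle:
        if line.startswith(">"):
            continue  # skip sequence headers
        sequence = line.strip()
        total += len(sequence)
        masked += sum(1 for base in sequence if base.islower())
    return 100.0 * masked / total if total else 0.0

# Toy input, not real assembly sequence: 8 of 20 bases are lowercase.
toy_fasta = io.StringIO(">chr_toy\nACGTacgtAC\nGTacgtACGT\n")
toy_percent = masked_percent(toy_fasta)
print(f"{toy_percent:.2f}%")
```

For a real assembly, the same function could be given an open file handle instead of the in-memory string.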

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	159	155 (97.48%)	137 (86.16%)	98.53%	99.10%
Same-species TSA	220,479	192,854 (87.47%)	125,107 (56.74%)	98.32%	97.95%
Same-species EST	22,158	19,149 (86.42%)	16,978 (76.62%)	98.63%	99.25%
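The percentages in this table can be recomputed from the counts. The short check below suggests that both percentages, including the one for sequences passed to Gnomon, are taken relative to the number of sequences retrieved from Entrez rather than the number aligned. The helper name is hypothetical.

```python
# Rows copied from the table: (source, retrieved, aligned by Splign, passed to Gnomon).
rows = [
    ("Same-species Genbank", 159, 155, 137),
    ("Same-species TSA", 220_479, 192_854, 125_107),
    ("Same-species EST", 22_158, 19_149, 16_978),
]

def percent_of(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as displayed in the report."""
    return round(100.0 * part / whole, 2)

recomputed = {}
for source, retrieved, aligned, passed in rows:
    # Both percentages use the retrieved count as the denominator.
    recomputed[source] = (percent_of(aligned, retrieved), percent_of(passed, retrieved))
    print(source, recomputed[source])
```

The recomputed values match the percentages shown in the table for all three sources.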

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	6,196,524,372	76%	52%	360,672
SAMD00144882	NA	Japanese loach transcriptome from brain and eye (Misgurnus anguillicaudatus, SAMD00144882)	89,158,386	62%	32%	257,723
SAMD00406525	NA	skin (Misgurnus anguillicaudatus, SAMD00406525)	42,685,200	82%	54%	217,707
SAMD00406526	NA	skin (Misgurnus anguillicaudatus, SAMD00406526)	42,280,156	80%	56%	223,339
SAMD00406527	NA	skin (Misgurnus anguillicaudatus, SAMD00406527)	46,848,676	78%	56%	224,378
SAMD00406528	NA	skin (Misgurnus anguillicaudatus, SAMD00406528)	43,063,166	84%	54%	216,649
SAMD00406529	NA	skin (Misgurnus anguillicaudatus, SAMD00406529)	49,059,076	83%	58%	211,252
SAMD00406530	NA	skin (Misgurnus anguillicaudatus, SAMD00406530)	52,146,362	83%	57%	225,667
SAMN05356266	27886141	pooled brain, muscle, liver, spleen, intestine and gonad (Misgurnus anguillicaudatus, pooled male and female, SAMN05356266)	88,842,922	84%	44%	272,100
SAMN07519078	NA	gill (Misgurnus anguillicaudatus, SAMN07519078)	30,923,166	65%	23%	169,340
SAMN08195246	NA	liver (Misgurnus anguillicaudatus, 1, female, SAMN08195246)	55,122,928	83%	60%	176,739
SAMN08195248	NA	liver (Misgurnus anguillicaudatus, 1, female, SAMN08195248)	49,639,570	79%	51%	181,034
SAMN08195250	NA	liver (Misgurnus anguillicaudatus, 1, female, SAMN08195250)	44,908,122	79%	52%	179,439
SAMN08605991	NA	liver (Misgurnus anguillicaudatus, 1, female, SAMN08605991)	46,896,926	75%	51%	176,603
SAMN08605992	NA	liver (Misgurnus anguillicaudatus, 1, female, SAMN08605992)	56,366,180	75%	52%	186,532
SAMN08605993	NA	liver (Misgurnus anguillicaudatus, 1, female, SAMN08605993)	49,215,906	72%	53%	185,814
SAMN08826812	NA	gonad (Misgurnus anguillicaudatus, 2, pooled male and female, SAMN08826812)	661,342,426	64%	51%	325,428
SAMN15846495	NA	posterior intestine (Misgurnus anguillicaudatus, 2 years old, female and male, SAMN15846495)	142,487,388	66%	38%	246,087
SAMN16814389	NA	brain (Misgurnus anguillicaudatus, 18 month, female, SAMN16814389)	55,335,508	77%	38%	247,493
SAMN16814390	NA	gonad (Misgurnus anguillicaudatus, 18 month, female, SAMN16814390)	56,318,006	84%	51%	197,181
SAMN16814391	NA	liver (Misgurnus anguillicaudatus, 18 month, female, SAMN16814391)	58,138,334	81%	57%	197,919
SAMN16814392	NA	muscle (Misgurnus anguillicaudatus, 18 month, female, SAMN16814392)	57,947,708	81%	58%	179,912
SAMN16814393	NA	brain (Misgurnus anguillicaudatus, 18 month, female, SAMN16814393)	68,752,362	78%	38%	250,899
SAMN16814394	NA	gonad (Misgurnus anguillicaudatus, 18 month, female, SAMN16814394)	50,562,430	84%	52%	210,603
SAMN16814395	NA	liver (Misgurnus anguillicaudatus, 18 month, female, SAMN16814395)	54,354,574	73%	34%	174,644
SAMN16814396	NA	muscle (Misgurnus anguillicaudatus, 18 month, female, SAMN16814396)	55,690,086	80%	52%	183,495
SAMN16814397	NA	brain (Misgurnus anguillicaudatus, 18 month, female, SAMN16814397)	49,130,138	77%	38%	242,595
SAMN16814398	NA	gonad (Misgurnus anguillicaudatus, 18 month, female, SAMN16814398)	63,283,518	84%	51%	198,835
SAMN16814399	NA	liver (Misgurnus anguillicaudatus, 18 month, female, SAMN16814399)	52,052,568	79%	53%	190,180
SAMN16814400	NA	muscle (Misgurnus anguillicaudatus, 18 month, female, SAMN16814400)	66,324,712	81%	55%	183,971
SAMN16814401	NA	brain (Misgurnus anguillicaudatus, 18 month, male, SAMN16814401)	62,027,998	77%	37%	250,953
SAMN16814402	NA	gonad (Misgurnus anguillicaudatus, 18 month, male, SAMN16814402)	48,364,530	81%	46%	275,994
SAMN16814403	NA	liver (Misgurnus anguillicaudatus, 18 month, male, SAMN16814403)	63,704,586	81%	53%	196,882
SAMN16814404	NA	muscle (Misgurnus anguillicaudatus, 18 month, male, SAMN16814404)	76,424,464	82%	53%	203,598
SAMN16814405	NA	brain (Misgurnus anguillicaudatus, 18 month, male, SAMN16814405)	64,419,878	78%	38%	250,903
SAMN16814406	NA	gonad (Misgurnus anguillicaudatus, 18 month, male, SAMN16814406)	55,222,048	79%	42%	285,954
SAMN16814407	NA	liver (Misgurnus anguillicaudatus, 18 month, male, SAMN16814407)	52,548,386	79%	52%	187,680
SAMN16814408	NA	muscle (Misgurnus anguillicaudatus, 18 month, male, SAMN16814408)	59,187,982	82%	52%	195,212
SAMN16814409	NA	brain (Misgurnus anguillicaudatus, 18 month, male, SAMN16814409)	55,529,032	78%	38%	246,306
SAMN16814410	NA	gonad (Misgurnus anguillicaudatus, 18 month, male, SAMN16814410)	53,634,604	79%	46%	275,971
SAMN16814411	NA	liver (Misgurnus anguillicaudatus, 18 month, male, SAMN16814411)	45,645,952	79%	51%	181,518
SAMN16814412	NA	muscle (Misgurnus anguillicaudatus, 18 month, male, SAMN16814412)	65,172,704	82%	53%	189,783
SAMN16814413	NA	brain (Misgurnus anguillicaudatus, 18 month, female, SAMN16814413)	57,011,632	75%	36%	248,245
SAMN16814414	NA	gonad (Misgurnus anguillicaudatus, 18 month, female, SAMN16814414)	49,020,606	83%	53%	196,371
SAMN16814415	NA	liver (Misgurnus anguillicaudatus, 18 month, female, SAMN16814415)	51,513,626	79%	55%	192,547
SAMN16814416	NA	muscle (Misgurnus anguillicaudatus, 18 month, female, SAMN16814416)	55,663,072	82%	60%	191,835
SAMN16814417	NA	brain (Misgurnus anguillicaudatus, 18 month, female, SAMN16814417)	52,329,554	77%	39%	248,914
SAMN16814418	NA	gonad (Misgurnus anguillicaudatus, 18 month, female, SAMN16814418)	60,232,132	82%	54%	202,750
SAMN16814419	NA	liver (Misgurnus anguillicaudatus, 18 month, female, SAMN16814419)	53,451,016	81%	63%	181,752
SAMN16814420	NA	muscle (Misgurnus anguillicaudatus, 18 month, female, SAMN16814420)	43,752,276	82%	63%	182,814
SAMN16814421	NA	brain (Misgurnus anguillicaudatus, 18 month, female, SAMN16814421)	49,189,320	77%	40%	244,015
SAMN16814422	NA	gonad (Misgurnus anguillicaudatus, 18 month, female, SAMN16814422)	48,854,592	81%	55%	198,927
SAMN16814423	NA	liver (Misgurnus anguillicaudatus, 18 month, female, SAMN16814423)	46,463,698	83%	64%	167,219
SAMN16814424	NA	muscle (Misgurnus anguillicaudatus, 18 month, female, SAMN16814424)	58,262,486	84%	61%	184,874
SAMN16814425	NA	brain (Misgurnus anguillicaudatus, 18 month, male, SAMN16814425)	55,465,238	77%	38%	247,988
SAMN16814426	NA	gonad (Misgurnus anguillicaudatus, 18 month, male, SAMN16814426)	48,885,844	80%	48%	277,078
SAMN16814427	NA	liver (Misgurnus anguillicaudatus, 18 month, male, SAMN16814427)	43,211,132	80%	60%	179,537
SAMN16814428	NA	muscle (Misgurnus anguillicaudatus, 18 month, male, SAMN16814428)	57,898,034	82%	61%	178,362
SAMN16814429	NA	brain (Misgurnus anguillicaudatus, 18 month, male, SAMN16814429)	47,784,460	77%	39%	242,409
SAMN16814430	NA	gonad (Misgurnus anguillicaudatus, 18 month, male, SAMN16814430)	52,665,304	81%	51%	276,425
SAMN16814431	NA	liver (Misgurnus anguillicaudatus, 18 month, male, SAMN16814431)	57,910,806	74%	44%	171,746
SAMN16814432	NA	muscle (Misgurnus anguillicaudatus, 18 month, male, SAMN16814432)	58,241,502	80%	63%	167,258
SAMN16814433	NA	brain (Misgurnus anguillicaudatus, 18 month, male, SAMN16814433)	52,357,444	77%	38%	244,701
SAMN16814434	NA	gonad (Misgurnus anguillicaudatus, 18 month, male, SAMN16814434)	56,226,944	81%	50%	278,240
SAMN16814435	NA	liver (Misgurnus anguillicaudatus, 18 month, male, SAMN16814435)	49,225,604	77%	53%	183,945
SAMN16814436	NA	muscle (Misgurnus anguillicaudatus, 18 month, male, SAMN16814436)	51,556,686	83%	62%	182,366
SAMN20064302	NA	barbels (Misgurnus anguillicaudatus, 2, pooled male and female, SAMN20064302)	1,075,281,198	76%	53%	328,975
SAMN30626157	NA	spermatogonium (Misgurnus anguillicaudatus, 1, male, SAMN30626157)	341,609,962	71%	62%	291,381
SAMN32145245	NA	posterior intestine (Misgurnus anguillicaudatus, 2, pooled male and female, SAMN32145245)	541,701,540	77%	64%	238,962

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR157555	DRX148227	DRP006652	SAMD00144882	89,158,386	62%	32%
DRR319697	DRX309045	DRP007725	SAMD00406525	42,685,200	82%	54%
DRR319698	DRX309046	DRP007725	SAMD00406526	42,280,156	80%	56%
DRR319699	DRX309047	DRP007725	SAMD00406527	46,848,676	78%	56%
DRR319700	DRX309048	DRP007725	SAMD00406528	43,063,166	84%	54%
DRR319701	DRX309049	DRP007725	SAMD00406529	49,059,076	83%	58%
DRR319702	DRX309050	DRP007725	SAMD00406530	52,146,362	83%	57%
SRR3744972	SRX1898888	SRP077967	SAMN05356266	88,842,922	84%	44%
SRR5997795	SRX3153303	SRP116672	SAMN07519078	30,923,166	65%	23%
SRR6388422	SRX3481898	SRP127013	SAMN08195246	55,122,928	83%	60%
SRR6388423	SRX3481897	SRP127013	SAMN08195248	49,639,570	79%	51%
SRR6388424	SRX3481896	SRP127013	SAMN08195250	44,908,122	79%	52%
SRR6781483	SRX3741429	SRP133451	SAMN08605991	46,896,926	75%	51%
SRR6781482	SRX3741430	SRP133451	SAMN08605992	56,366,180	75%	52%
SRR6781484	SRX3741428	SRP133451	SAMN08605993	49,215,906	72%	53%
SRR6942728	SRX3886652	SRP136902	SAMN08826812	70,161,008	69%	44%
SRR6942727	SRX3886653	SRP136902	SAMN08826812	90,883,062	60%	54%
SRR6942726	SRX3886654	SRP136902	SAMN08826812	70,406,528	56%	54%
SRR6942725	SRX3886655	SRP136902	SAMN08826812	77,598,528	70%	52%
SRR6942724	SRX3886656	SRP136902	SAMN08826812	43,100,252	64%	54%
SRR6942723	SRX3886657	SRP136902	SAMN08826812	43,896,950	62%	53%
SRR6942722	SRX3886658	SRP136902	SAMN08826812	41,847,570	64%	52%
SRR6942721	SRX3886659	SRP136902	SAMN08826812	41,161,426	67%	51%
SRR6942720	SRX3886660	SRP136902	SAMN08826812	51,402,056	65%	53%
SRR6942719	SRX3886661	SRP136902	SAMN08826812	41,211,316	58%	53%
SRR6942718	SRX3886662	SRP136902	SAMN08826812	43,464,608	66%	47%
SRR6942717	SRX3886663	SRP136902	SAMN08826812	46,209,122	71%	46%
SRR12466340	SRX8960526	SRP277906	SAMN15846495	53,953,082	72%	41%
SRR12466339	SRX8960527	SRP277906	SAMN15846495	43,534,054	65%	36%
SRR12466338	SRX8960528	SRP277906	SAMN15846495	45,000,252	59%	36%
SRR13107419	SRX9552846	SRP293717	SAMN16814389	55,335,508	77%	38%
SRR13107418	SRX9552847	SRP293717	SAMN16814390	56,318,006	84%	51%
SRR13107407	SRX9552858	SRP293717	SAMN16814391	58,138,334	81%	57%
SRR13107396	SRX9552869	SRP293717	SAMN16814392	57,947,708	81%	58%
SRR13107385	SRX9552880	SRP293717	SAMN16814393	68,752,362	78%	38%
SRR13107376	SRX9552889	SRP293717	SAMN16814394	50,562,430	84%	52%
SRR13107375	SRX9552890	SRP293717	SAMN16814395	54,354,574	73%	34%
SRR13107374	SRX9552891	SRP293717	SAMN16814396	55,690,086	80%	52%
SRR13107373	SRX9552892	SRP293717	SAMN16814397	49,130,138	77%	38%
SRR13107372	SRX9552893	SRP293717	SAMN16814398	63,283,518	84%	51%
SRR13107417	SRX9552848	SRP293717	SAMN16814399	52,052,568	79%	53%
SRR13107416	SRX9552849	SRP293717	SAMN16814400	66,324,712	81%	55%
SRR13107415	SRX9552850	SRP293717	SAMN16814401	62,027,998	77%	37%
SRR13107414	SRX9552851	SRP293717	SAMN16814402	48,364,530	81%	46%
SRR13107413	SRX9552852	SRP293717	SAMN16814403	63,704,586	81%	53%
SRR13107412	SRX9552853	SRP293717	SAMN16814404	76,424,464	82%	53%
SRR13107411	SRX9552854	SRP293717	SAMN16814405	64,419,878	78%	38%
SRR13107410	SRX9552855	SRP293717	SAMN16814406	55,222,048	79%	42%
SRR13107409	SRX9552856	SRP293717	SAMN16814407	52,548,386	79%	52%
SRR13107408	SRX9552857	SRP293717	SAMN16814408	59,187,982	82%	52%
SRR13107406	SRX9552859	SRP293717	SAMN16814409	55,529,032	78%	38%
SRR13107405	SRX9552860	SRP293717	SAMN16814410	53,634,604	79%	46%
SRR13107404	SRX9552861	SRP293717	SAMN16814411	45,645,952	79%	51%
SRR13107403	SRX9552862	SRP293717	SAMN16814412	65,172,704	82%	53%
SRR13107402	SRX9552863	SRP293717	SAMN16814413	57,011,632	75%	36%
SRR13107401	SRX9552864	SRP293717	SAMN16814414	49,020,606	83%	53%
SRR13107400	SRX9552865	SRP293717	SAMN16814415	51,513,626	79%	55%
SRR13107399	SRX9552866	SRP293717	SAMN16814416	55,663,072	82%	60%
SRR13107398	SRX9552867	SRP293717	SAMN16814417	52,329,554	77%	39%
SRR13107397	SRX9552868	SRP293717	SAMN16814418	60,232,132	82%	54%
SRR13107395	SRX9552870	SRP293717	SAMN16814419	53,451,016	81%	63%
SRR13107394	SRX9552871	SRP293717	SAMN16814420	43,752,276	82%	63%
SRR13107393	SRX9552872	SRP293717	SAMN16814421	49,189,320	77%	40%
SRR13107392	SRX9552873	SRP293717	SAMN16814422	48,854,592	81%	55%
SRR13107391	SRX9552874	SRP293717	SAMN16814423	46,463,698	83%	64%
SRR13107390	SRX9552875	SRP293717	SAMN16814424	58,262,486	84%	61%
SRR13107389	SRX9552876	SRP293717	SAMN16814425	55,465,238	77%	38%
SRR13107388	SRX9552877	SRP293717	SAMN16814426	48,885,844	80%	48%
SRR13107387	SRX9552878	SRP293717	SAMN16814427	43,211,132	80%	60%
SRR13107386	SRX9552879	SRP293717	SAMN16814428	57,898,034	82%	61%
SRR13107384	SRX9552881	SRP293717	SAMN16814429	47,784,460	77%	39%
SRR13107383	SRX9552882	SRP293717	SAMN16814430	52,665,304	81%	51%
SRR13107382	SRX9552883	SRP293717	SAMN16814431	57,910,806	74%	44%
SRR13107381	SRX9552884	SRP293717	SAMN16814432	58,241,502	80%	63%
SRR13107380	SRX9552885	SRP293717	SAMN16814433	52,357,444	77%	38%
SRR13107379	SRX9552886	SRP293717	SAMN16814434	56,226,944	81%	50%
SRR13107378	SRX9552887	SRP293717	SAMN16814435	49,225,604	77%	53%
SRR13107377	SRX9552888	SRP293717	SAMN16814436	51,556,686	83%	62%
SRR15046101	SRX11356723	SRP327047	SAMN20064302	41,929,032	75%	49%
SRR15046100	SRX11356724	SRP327047	SAMN20064302	41,416,054	72%	48%
SRR15046099	SRX11356725	SRP327047	SAMN20064302	41,630,536	75%	51%
SRR15046098	SRX11356726	SRP327047	SAMN20064302	42,770,722	77%	48%
SRR15046097	SRX11356727	SRP327047	SAMN20064302	40,835,702	77%	48%
SRR15046096	SRX11356728	SRP327047	SAMN20064302	41,950,232	77%	48%
SRR17468119	SRX13639252	SRP353807	SAMN20064302	45,926,940	75%	53%
SRR17468118	SRX13639253	SRP353807	SAMN20064302	50,144,398	76%	51%
SRR17468117	SRX13639254	SRP353807	SAMN20064302	47,704,462	76%	51%
SRR17468116	SRX13639255	SRP353807	SAMN20064302	49,278,728	74%	51%
SRR17468115	SRX13639256	SRP353807	SAMN20064302	55,434,792	77%	54%
SRR17468114	SRX13639257	SRP353807	SAMN20064302	47,637,322	78%	53%
SRR18242169	SRX14383601	SRP362640	SAMN20064302	40,023,412	72%	47%
SRR18242168	SRX14383602	SRP362640	SAMN20064302	47,802,926	76%	45%
SRR18242167	SRX14383603	SRP362640	SAMN20064302	44,550,370	78%	50%
SRR18242166	SRX14383604	SRP362640	SAMN20064302	44,292,284	78%	46%
SRR18242165	SRX14383605	SRP362640	SAMN20064302	42,657,920	72%	46%
SRR18242164	SRX14383606	SRP362640	SAMN20064302	42,363,420	77%	47%
SRR21396405	SRX17401671	SRP395313	SAMN30626157	341,609,962	71%	62%
SRR22689642	SRX18651774	SRP412560	SAMN20064302	43,744,122	78%	69%
SRR22689641	SRX18651775	SRP412560	SAMN20064302	45,154,266	80%	70%
SRR22689640	SRX18651776	SRP412560	SAMN20064302	43,970,168	78%	66%
SRR22689639	SRX18651777	SRP412560	SAMN20064302	44,409,628	77%	66%
SRR22689638	SRX18651778	SRP412560	SAMN20064302	44,237,838	76%	62%
SRR22689637	SRX18651779	SRP412560	SAMN20064302	45,415,924	77%	61%
SRR22699397	SRX18661465	SRP412560	SAMN32145245	43,918,414	77%	67%
SRR22699396	SRX18661466	SRP412560	SAMN32145245	43,964,960	80%	69%
SRR22699395	SRX18661467	SRP412560	SAMN32145245	44,268,130	78%	61%
SRR22699394	SRX18661468	SRP412560	SAMN32145245	45,131,788	77%	65%
SRR22699393	SRX18661469	SRP412560	SAMN32145245	47,946,290	78%	65%
SRR22699392	SRX18661470	SRP412560	SAMN32145245	45,219,406	76%	60%
SRR22721358	SRX18682881	SRP412560	SAMN32145245	44,875,238	79%	67%
SRR22721357	SRX18682882	SRP412560	SAMN32145245	43,710,180	77%	65%
SRR22721356	SRX18682883	SRP412560	SAMN32145245	45,261,784	76%	63%
SRR22721355	SRX18682884	SRP412560	SAMN32145245	47,327,920	77%	60%
SRR22721354	SRX18682885	SRP412560	SAMN32145245	45,056,328	78%	62%
SRR22721353	SRX18682886	SRP412560	SAMN32145245	45,021,102	77%	64%
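The by-sample and by-run tables describe the same reads at two levels of aggregation. The sketch below sums the runs of one sample (SAMN15846495) and compares the result with the by-sample table. It also assumes, as a guess, that a sample's aligned percentage is the read-weighted mean of its runs; because the per-run percentages are already rounded, this second check is only approximate.

```python
from collections import defaultdict

# (run, sample, number of reads, percent aligned) copied from the by-run table.
runs = [
    ("SRR12466340", "SAMN15846495", 53_953_082, 72),
    ("SRR12466339", "SAMN15846495", 43_534_054, 65),
    ("SRR12466338", "SAMN15846495", 45_000_252, 59),
]

reads_per_sample = defaultdict(int)
aligned_per_sample = defaultdict(float)
for _run, sample, reads, percent_aligned in runs:
    reads_per_sample[sample] += reads
    # Approximate aligned read count from the rounded percentage.
    aligned_per_sample[sample] += reads * percent_aligned / 100.0

sample = "SAMN15846495"
total_reads = reads_per_sample[sample]
weighted_percent = round(100.0 * aligned_per_sample[sample] / total_reads)
print(total_reads, weighted_percent)
```

The summed read count equals the 142,487,388 reads listed for this sample, and the weighted percentage rounds to the 66% shown in the by-sample table.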

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Cynoglossus semilaevis high-quality model RefSeq (XP_)	14,331	13,892 (96.94%)	13,892 (96.94%)	67.90%	76.83%
Poecilia formosa high-quality model RefSeq (XP_)	18,503	17,805 (96.23%)	17,805 (96.23%)	67.12%	75.77%
Actinopterygii GenBank	91,081	86,654 (95.14%)	86,654 (95.14%)	69.72%	82.11%
Actinopterygii known RefSeq (NP_)	25,457	24,479 (96.16%)	24,479 (96.16%)	70.16%	82.10%
Danio rerio high-quality model RefSeq (XP_)	7,712	7,482 (97.02%)	7,482 (97.02%)	69.99%	81.28%
Astyanax mexicanus high-quality model RefSeq (XP_)	19,875	19,362 (97.42%)	19,362 (97.42%)	68.06%	79.55%
Esox lucius high-quality model RefSeq (XP_)	18,508	17,839 (96.39%)	17,839 (96.39%)	67.57%	77.10%
Homo sapiens known RefSeq (NP_)	66,908	55,563 (83.04%)	55,563 (83.04%)	67.36%	71.73%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
