NCBI Rhinichthys klamathensis goyatoka Annotation Release GCF_029890125.1-RS_2023_05

The genome sequence records for Rhinichthys klamathensis goyatoka RefSeq assembly GCF_029890125.1 (OSU_Roscu_1.1) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_029890125.1-RS_2023_05".

Date of Entrez queries for transcripts and proteins: May 17 2023
Date of submission of annotation to the public databases: May 22 2023
Software version: 10.1
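
The release name appears to combine the RefSeq assembly accession with a year and month tag. The following is a minimal sketch of splitting such a name into its parts, assuming the `<accession>-RS_<year>_<month>` pattern seen in this report holds in general. The function name and the regular expression are illustrative and are not part of any NCBI tool.

```python
import re

# Pattern inferred from the release name in this report (an assumption,
# not an official NCBI specification).
RELEASE_RE = re.compile(r"^(GCF_\d+\.\d+)-RS_(\d{4})_(\d{2})$")


def parse_release_name(name):
    """Split an annotation release name into accession, year and month."""
    match = RELEASE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    accession, year, month = match.groups()
    return {"accession": accession, "year": int(year), "month": int(month)}


release = parse_release_name("GCF_029890125.1-RS_2023_05")
print(release)
```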

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
OSU_Roscu_1.1	GCF_029890125.1	Oregon State University	04-27-2023	Reference	unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	OSU_Roscu_1.1
Genes and pseudogenes	37,669
protein-coding	23,703
non-coding	13,107
Transcribed pseudogenes	0
Non-transcribed pseudogenes	657
genes with variants	7,231
Immunoglobulin/T-cell receptor gene segments	191
other	11
mRNAs	40,181
fully-supported	37,587
with > 5% ab initio	1,351
partial	366
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	40,181
non-coding RNAs	13,773
fully-supported	2,647
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	6,022
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	40,372
fully-supported	37,587
with > 5% ab initio	1,468
partial	370
with major correction(s)	377
known RefSeq (NP_)	0
model RefSeq (XP_)	40,181
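
As a quick consistency check on the table above, the gene-level categories can be summed and compared with the reported total. This sketch assumes the listed categories are mutually exclusive and that "genes with variants" is a property count rather than a separate category; the variable names are illustrative.

```python
# Gene-level counts copied from the feature table for OSU_Roscu_1.1.
gene_counts = {
    "protein-coding": 23_703,
    "non-coding": 13_107,
    "Transcribed pseudogenes": 0,
    "Non-transcribed pseudogenes": 657,
    "Immunoglobulin/T-cell receptor gene segments": 191,
    "other": 11,
}
reported_total = 37_669  # "Genes and pseudogenes" row

computed_total = sum(gene_counts.values())
print(computed_total, computed_total == reported_total)

# Known (NM_) plus model (XM_) mRNAs should equal the mRNA total.
mrna_known, mrna_model, mrna_total = 0, 40_181, 40_181
print(mrna_known + mrna_model == mrna_total)
```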

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	36,821	16,186	3,835	52	912,375
All transcripts	53,954	2,215	1,656	52	82,176
mRNA	40,181	2,891	2,187	138	82,176
misc_RNA	407	2,596	2,155	114	8,770
tRNA	7,751	75	73	70	101
lncRNA	2,240	568	437	101	4,600
snoRNA	314	143	133	61	316
snRNA	724	124	115	52	192
rRNA	2,326	138	119	118	4,057
Single-exon transcripts	1,148	1,476	1,237	252	6,862
coding transcripts (NM_/XM_ )	1,148	1,476	1,237	252	6,862
CDSs	40,181	2,237	1,566	114	80,928
Exons	266,712	226	136	1	23,550
in coding transcripts (NM_/XM_ )	260,033	227	136	1	23,550
in non-coding transcripts (NR_/XR_ )	10,145	195	125	9	6,701
Introns	239,458	2,613	562	30	677,391
in coding transcripts (NM_/XM_ )	235,129	2,592	568	30	677,391
in non-coding transcripts (NR_/XR_ )	7,735	3,275	466	30	283,778
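
Exon and intron lengths such as those summarized above can be derived from exon coordinates. The sketch below assumes 1-based, inclusive coordinates (as in GFF3) and treats introns as the gaps between consecutive exons of one transcript. The coordinates are invented for illustration and do not come from this annotation.

```python
def exon_and_intron_lengths(exons):
    """Return (exon_lengths, intron_lengths) for one transcript.

    `exons` is a list of (start, end) pairs, 1-based and inclusive.
    """
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    # An intron spans the bases strictly between two neighbouring exons.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return exon_lengths, intron_lengths


# Hypothetical three-exon transcript.
exons = [(101, 250), (401, 520), (1001, 1300)]
exon_lengths, intron_lengths = exon_and_intron_lengths(exons)
print(exon_lengths, intron_lengths)
```

A transcript with n exons yields n - 1 introns under this definition, so a single-exon transcript contributes no introns.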

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.59	1	1	50
Number of exons per transcript	12.23	9	1	198
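
Summary rows like these can be produced by grouping transcripts by gene and summarizing the group sizes. This is a minimal sketch with invented identifiers; the report does not state which genes and transcripts enter its own calculation, so the result is not expected to reproduce the figures above.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical (gene_id, transcript_id) pairs; real pairs would come
# from the annotation files.
pairs = [
    ("geneA", "tx1"), ("geneA", "tx2"), ("geneA", "tx3"),
    ("geneB", "tx4"),
    ("geneC", "tx5"), ("geneC", "tx6"),
]

transcripts_per_gene = Counter(gene for gene, _ in pairs)
counts = list(transcripts_per_gene.values())

stats = {
    "mean": round(mean(counts), 2),
    "median": median(counts),
    "min": min(counts),
    "max": max(counts),
}
print(stats)
```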

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode with the actinopterygii_odb10 lineage dataset on the annotated gene set, using the single longest protein per gene. Results are reported for the gene set from the primary assembly unit, and presented in BUSCO notation.
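
BUSCO notation packs the complete (C), single-copy (S), duplicated (D), fragmented (F) and missing (M) percentages and the number of BUSCOs searched (n) into one string. The sketch below parses such a string. The example values are invented purely to show the format and are not the results for this annotation; the function name is illustrative.

```python
import re

BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(notation):
    """Parse a BUSCO summary string into a dict of percentages plus n."""
    match = BUSCO_RE.search(notation)
    if match is None:
        raise ValueError(f"not BUSCO notation: {notation!r}")
    result = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    return result


# Invented example string, NOT the result for this annotation.
example = "C:90.0%[S:88.0%,D:2.0%],F:4.0%,M:6.0%,n:1000"
parsed = parse_busco(example)
print(parsed)
```

In this notation C is the sum of S and D, and C, F and M together account for all n BUSCOs.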

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 23,703 coding genes, 21,586 genes had a protein with an alignment covering 50% or more of the query and 11,446 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
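
The two coverage definitions can be written as one small function applied to either sequence. This sketch assumes a single alignment given as 1-based, inclusive coordinates; how the pipeline combines multiple alignment segments is not described in this report. The alignment coordinates and sequence lengths are invented, while the gene counts are taken from the paragraph above.

```python
def coverage(aligned_start, aligned_end, sequence_length):
    """Percent of a sequence included in an alignment (1-based, inclusive)."""
    return 100.0 * (aligned_end - aligned_start + 1) / sequence_length


# Hypothetical alignment: query residues 21-400 of a 400-residue annotated
# protein aligned to target residues 1-390 of a 500-residue target.
query_cov = coverage(21, 400, 400)
target_cov = coverage(1, 390, 500)
print(query_cov, target_cov)

# Share of coding genes above each query-coverage threshold, using the
# counts reported above.
coding_genes = 23_703
share_50 = 100.0 * 21_586 / coding_genes
share_95 = 100.0 * 11_446 / coding_genes
print(round(share_50, 2), round(share_95, 2))
```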

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
OSU_Roscu_1.1	GCF_029890125.1	40.34%
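
A masked percentage of this kind can be approximated from a soft-masked sequence. The sketch below assumes masked bases are written in lowercase and that every character counts toward the denominator; the report does not state how the pipeline treats gaps or ambiguous bases, so this is illustrative only.

```python
def percent_masked(sequence):
    """Percentage of bases that are soft-masked (lowercase)."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return 100.0 * masked / len(sequence)


# Toy sequence: 6 of 20 bases are lowercase.
toy = "ACGTacgtacGTACGTACGT"
print(percent_masked(toy))
```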

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

No transcript evidence was used in this annotation

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	8,052,492,790	62%	36%	278,316
SAMN03024265	NA	Adult, Gill Tissue (Pimephales promelas, 6 months, SAMN03024265, SAMN03024265)	470,813,940	63%	36%	211,562
SAMN07166470	NA	juvenile or prespawning adult, gut (Cyprinella lutrensis, SAMN07166470)	74,453,248	55%	34%	158,759
SAMN07166471	NA	juvenile or prespawning adult, gut (Cyprinella lutrensis, SAMN07166471)	68,334,258	51%	29%	158,439
SAMN07166472	NA	juvenile or prespawning adult, gut (Cyprinella lutrensis, SAMN07166472)	68,897,252	51%	31%	182,447
SAMN07166473	NA	juvenile or prespawning adult, gill (Cyprinella lutrensis, SAMN07166473)	68,668,538	42%	26%	158,261
SAMN07166474	NA	juvenile or prespawning adult, gill (Cyprinella lutrensis, SAMN07166474)	74,843,062	58%	31%	184,933
SAMN07166475	NA	juvenile or prespawning adult, gill (Cyprinella lutrensis, SAMN07166475)	69,316,180	58%	31%	188,584
SAMN07166476	NA	juvenile or prespawning adult, skin (Cyprinella lutrensis, SAMN07166476)	64,789,504	59%	30%	176,354
SAMN07166477	NA	juvenile or prespawning adult, skin (Cyprinella lutrensis, SAMN07166477)	71,807,394	52%	30%	173,989
SAMN07166478	NA	juvenile or prespawning adult, skin (Cyprinella lutrensis, SAMN07166478)	66,354,774	62%	33%	167,119
SAMN07166479	NA	juvenile or prespawning adult, kidney (Cyprinella lutrensis, SAMN07166479)	54,323,650	45%	25%	148,744
SAMN07166480	NA	juvenile or prespawning adult, kidney (Cyprinella lutrensis, SAMN07166480)	77,613,208	41%	25%	160,417
SAMN07166481	NA	juvenile or prespawning adult, kidney (Cyprinella lutrensis, SAMN07166481)	68,435,540	49%	27%	162,699
SAMN07166482	NA	juvenile or prespawning adult, gut (Platygobio gracilis, SAMN07166482)	60,464,086	73%	32%	172,693
SAMN07166483	NA	juvenile or prespawning adult, gut (Platygobio gracilis, SAMN07166483)	62,814,470	71%	32%	167,816
SAMN07166484	NA	juvenile or prespawning adult, gut (Platygobio gracilis, SAMN07166484)	70,838,038	69%	30%	181,769
SAMN07166485	NA	juvenile or prespawning adult, gill (Platygobio gracilis, SAMN07166485)	73,925,290	71%	29%	198,857
SAMN07166486	NA	juvenile or prespawning adult, gill (Platygobio gracilis, SAMN07166486)	59,973,206	69%	28%	193,388
SAMN07166487	NA	juvenile or prespawning adult, gill (Platygobio gracilis, SAMN07166487)	66,836,024	71%	29%	197,129
SAMN07166488	NA	juvenile or prespawning adult, skin (Platygobio gracilis, SAMN07166488)	71,955,068	72%	27%	194,298
SAMN07166489	NA	juvenile or prespawning adult, skin (Platygobio gracilis, SAMN07166489)	54,793,310	72%	34%	178,864
SAMN07166490	NA	juvenile or prespawning adult, skin (Platygobio gracilis, SAMN07166490)	56,007,818	76%	37%	166,133
SAMN07166491	NA	juvenile or prespawning adult, kidney (Platygobio gracilis, SAMN07166491)	73,531,808	71%	29%	196,492
SAMN07166492	NA	juvenile or prespawning adult, kidney (Platygobio gracilis, SAMN07166492)	62,679,372	69%	27%	186,719
SAMN07166493	NA	juvenile or prespawning adult, kidney (Platygobio gracilis, SAMN07166493)	56,818,024	67%	29%	189,535
SAMN10038688	30459712	gut (Pimephales promelas, Adult male, SAMN10038688)	71,150,986	67%	34%	168,178
SAMN10038693	30459712	telencephalon (Pimephales promelas, Adult male, SAMN10038693)	56,951,612	62%	26%	192,006
SAMN10038694	30459712	telencephalon (Pimephales promelas, Adult male, SAMN10038694)	69,472,920	61%	25%	195,034
SAMN10038695	30459712	telencephalon (Pimephales promelas, Adult male, SAMN10038695)	64,457,618	61%	26%	196,739
SAMN10038696	30459712	liver (Pimephales promelas, Adult male, SAMN10038696)	63,717,568	61%	37%	142,683
SAMN10038697	30459712	liver (Pimephales promelas, Adult male, SAMN10038697)	71,968,658	60%	36%	145,934
SAMN10038698	30459712	liver (Pimephales promelas, Adult male, SAMN10038698)	69,650,108	63%	38%	142,842
SAMN10038699	30459712	hypothalamus (Pimephales promelas, Adult male, SAMN10038699)	61,704,660	62%	26%	196,138
SAMN10038700	30459712	hypothalamus (Pimephales promelas, Adult male, SAMN10038700)	71,497,814	61%	24%	169,806
SAMN10038701	30459712	hypothalamus (Pimephales promelas, Adult male, SAMN10038701)	69,741,032	61%	26%	196,913
SAMN10038702	30459712	gut (Pimephales promelas, Adult male, SAMN10038702)	66,563,164	63%	32%	167,133
SAMN10038703	30459712	gut (Pimephales promelas, Adult male, SAMN10038703)	70,648,592	64%	34%	170,017
SAMN14246271	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246271)	44,795,140	59%	33%	157,633
SAMN14246272	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246272)	39,319,742	62%	34%	165,402
SAMN14246273	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246273)	44,065,724	60%	34%	160,795
SAMN14246274	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246274)	38,769,508	60%	33%	158,374
SAMN14246275	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246275)	41,726,668	61%	33%	172,964
SAMN14246276	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246276)	43,004,890	62%	34%	159,741
SAMN14246277	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246277)	44,122,248	61%	33%	161,675
SAMN14246278	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246278)	40,649,084	59%	33%	154,126
SAMN14246279	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246279)	40,728,232	61%	34%	158,234
SAMN14246280	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246280)	41,660,456	57%	34%	177,830
SAMN14246281	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246281)	45,279,830	60%	33%	163,470
SAMN14246282	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246282)	48,514,750	61%	33%	164,400
SAMN14246283	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246283)	42,718,908	63%	35%	174,277
SAMN14246284	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246284)	44,966,814	62%	34%	164,770
SAMN14246285	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246285)	41,476,064	61%	34%	165,176
SAMN14246286	NA	kidney (Pimephales promelas, 7 mo, female, SAMN14246286)	38,318,494	62%	34%	168,266
SAMN15808843	NA	epithelioma (Pimephales promelas, SAMN15808843)	48,960,194	71%	57%	163,856
SAMN15808844	NA	epithelioma (Pimephales promelas, SAMN15808844)	37,623,872	70%	57%	160,568
SAMN15808845	NA	epithelioma (Pimephales promelas, SAMN15808845)	53,788,262	71%	57%	165,311
SAMN15808846	NA	epithelioma (Pimephales promelas, SAMN15808846)	52,717,848	71%	58%	165,551
SAMN15808847	NA	epithelioma (Pimephales promelas, SAMN15808847)	55,105,456	70%	58%	165,567
SAMN15808848	NA	epithelioma (Pimephales promelas, SAMN15808848)	53,506,238	70%	58%	165,203
SAMN15808849	NA	epithelioma (Pimephales promelas, SAMN15808849)	52,086,872	70%	57%	158,562
SAMN15808850	NA	epithelioma (Pimephales promelas, SAMN15808850)	48,873,674	71%	58%	157,229
SAMN15808851	NA	epithelioma (Pimephales promelas, SAMN15808851)	56,151,838	72%	58%	159,309
SAMN15808852	NA	epithelioma (Pimephales promelas, SAMN15808852)	53,717,776	70%	57%	158,791
SAMN15808853	NA	epithelioma (Pimephales promelas, SAMN15808853)	50,482,686	72%	58%	157,955
SAMN15808854	NA	epithelioma (Pimephales promelas, SAMN15808854)	51,477,708	71%	58%	157,094
SAMN18344483	NA	epithelial tissue (Pimephales promelas, 1-year-old, female and male, SAMN18344483)	88,770,036	69%	59%	172,636
SAMN21019948	35026279	Liver (Pimephales promelas, male, SAMN21019948)	90,300,334	57%	36%	156,808
SAMN21019949	35026279	Liver (Pimephales promelas, male, SAMN21019949)	66,886,388	57%	35%	147,982
SAMN21019950	35026279	Liver (Pimephales promelas, male, SAMN21019950)	61,477,958	57%	36%	146,477
SAMN21019951	35026279	Liver (Pimephales promelas, male, SAMN21019951)	54,899,620	59%	37%	151,237
SAMN21019952	35026279	Liver (Pimephales promelas, male, SAMN21019952)	61,128,722	59%	35%	142,969
SAMN21019953	35026279	Liver (Pimephales promelas, male, SAMN21019953)	51,269,012	57%	37%	142,227
SAMN21019954	35026279	Liver (Pimephales promelas, male, SAMN21019954)	65,971,908	57%	35%	160,594
SAMN21019955	35026279	Liver (Pimephales promelas, male, SAMN21019955)	62,610,604	59%	36%	150,427
SAMN21019956	35026279	Liver (Pimephales promelas, male, SAMN21019956)	67,195,236	60%	35%	148,445
SAMN21019957	35026279	Liver (Pimephales promelas, male, SAMN21019957)	59,374,150	60%	35%	152,158
SAMN21019958	35026279	Liver (Pimephales promelas, male, SAMN21019958)	55,600,002	59%	37%	146,464
SAMN21019959	35026279	Liver (Pimephales promelas, male, SAMN21019959)	64,205,552	56%	36%	148,005
SAMN21019960	35026279	Liver (Pimephales promelas, male, SAMN21019960)	59,312,602	59%	36%	157,829
SAMN21019961	35026279	Liver (Pimephales promelas, male, SAMN21019961)	56,214,606	58%	37%	147,501
SAMN21019962	35026279	Liver (Pimephales promelas, male, SAMN21019962)	62,309,624	58%	38%	142,723
SAMN21019963	35026279	Brain (Pimephales promelas, male, SAMN21019963)	75,010,844	60%	25%	204,637
SAMN21019964	35026279	Brain (Pimephales promelas, male, SAMN21019964)	67,343,342	62%	27%	205,458
SAMN21019965	35026279	Brain (Pimephales promelas, male, SAMN21019965)	57,814,236	60%	25%	200,640
SAMN21019966	35026279	Brain (Pimephales promelas, male, SAMN21019966)	67,491,296	62%	27%	207,648
SAMN21019967	35026279	Brain (Pimephales promelas, male, SAMN21019967)	62,984,424	62%	28%	205,005
SAMN21019968	35026279	Brain (Pimephales promelas, male, SAMN21019968)	81,504,772	63%	27%	207,740
SAMN21019969	35026279	Brain (Pimephales promelas, male, SAMN21019969)	75,466,886	63%	28%	211,006
SAMN21019970	35026279	Brain (Pimephales promelas, male, SAMN21019970)	59,360,830	62%	27%	204,021
SAMN21019971	35026279	Brain (Pimephales promelas, male, SAMN21019971)	63,374,952	61%	26%	201,705
SAMN21019972	35026279	Brain (Pimephales promelas, male, SAMN21019972)	62,676,676	61%	26%	200,765
SAMN21019973	35026279	Brain (Pimephales promelas, male, SAMN21019973)	59,117,524	62%	27%	205,723
SAMN21019974	35026279	Brain (Pimephales promelas, male, SAMN21019974)	57,486,264	61%	26%	204,088
SAMN21019975	35026279	Brain (Pimephales promelas, male, SAMN21019975)	62,714,724	61%	26%	202,779
SAMN21019976	35026279	Brain (Pimephales promelas, male, SAMN21019976)	68,332,910	29%	28%	191,729
SAMN21019977	35026279	Brain (Pimephales promelas, male, SAMN21019977)	73,165,002	62%	27%	207,687
SAMN30589513	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589513)	86,981,170	68%	35%	227,229
SAMN30589514	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589514)	113,586,476	67%	34%	230,031
SAMN30589515	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589515)	55,141,504	68%	34%	218,046
SAMN30589516	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589516)	56,520,756	67%	35%	217,745
SAMN30589517	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589517)	61,912,122	67%	35%	217,792
SAMN30589518	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589518)	52,867,806	68%	35%	219,816
SAMN30589519	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589519)	153,546,206	66%	33%	233,917
SAMN30589520	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589520)	115,862,608	65%	32%	230,924
SAMN30589521	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589521)	57,597,140	66%	34%	217,194
SAMN30589522	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589522)	85,265,536	67%	35%	225,548
SAMN30589523	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589523)	247,097,754	68%	35%	239,897
SAMN30589524	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589524)	210,915,694	68%	35%	238,180
SAMN30589525	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589525)	56,016,468	67%	35%	218,418
SAMN30589526	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589526)	57,825,898	66%	34%	218,198
SAMN30589527	NA	whole embryo bodies (Pimephales promelas, 7-days post-fertilization, SAMN30589527)	57,335,176	67%	35%	218,937
SAMN34043878	NA	manhood, Muscle, FHM (Pimephales promelas, SAMN34043878)	391,228,670	45%	56%	191,147

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR1582202	SRX700303	SRP047077	SAMN03024265	470,813,940	63%	36%
SRR5601338	SRX2854042	SRP107991	SAMN07166470	74,453,248	55%	34%
SRR5601341	SRX2854039	SRP107991	SAMN07166471	68,334,258	51%	29%
SRR5601340	SRX2854040	SRP107991	SAMN07166472	68,897,252	51%	31%
SRR5601337	SRX2854043	SRP107991	SAMN07166473	68,668,538	42%	26%
SRR5601336	SRX2854044	SRP107991	SAMN07166474	74,843,062	58%	31%
SRR5601339	SRX2854041	SRP107991	SAMN07166475	69,316,180	58%	31%
SRR5601363	SRX2854017	SRP107991	SAMN07166476	64,789,504	59%	30%
SRR5601364	SRX2854016	SRP107991	SAMN07166477	71,807,394	52%	30%
SRR5601365	SRX2854015	SRP107991	SAMN07166478	66,354,774	62%	33%
SRR5601343	SRX2854037	SRP107991	SAMN07166479	54,323,650	45%	25%
SRR5601342	SRX2854038	SRP107991	SAMN07166480	77,613,208	41%	25%
SRR5601362	SRX2854018	SRP107991	SAMN07166481	68,435,540	49%	27%
SRR5601361	SRX2854019	SRP107991	SAMN07166482	60,464,086	73%	32%
SRR5601366	SRX2854014	SRP107991	SAMN07166483	62,814,470	71%	32%
SRR5601367	SRX2854013	SRP107991	SAMN07166484	70,838,038	69%	30%
SRR5601358	SRX2854022	SRP107991	SAMN07166485	73,925,290	71%	29%
SRR5601359	SRX2854021	SRP107991	SAMN07166486	59,973,206	69%	28%
SRR5601360	SRX2854020	SRP107991	SAMN07166487	66,836,024	71%	29%
SRR5601368	SRX2854012	SRP107991	SAMN07166488	71,955,068	72%	27%
SRR5601347	SRX2854033	SRP107991	SAMN07166489	54,793,310	72%	34%
SRR5601346	SRX2854034	SRP107991	SAMN07166490	56,007,818	76%	37%
SRR5601345	SRX2854035	SRP107991	SAMN07166491	73,531,808	71%	29%
SRR5601344	SRX2854036	SRP107991	SAMN07166492	62,679,372	69%	27%
SRR5601369	SRX2854011	SRP107991	SAMN07166493	56,818,024	67%	29%
SRR7822513	SRX4673680	SRP161579	SAMN10038688	71,150,986	67%	34%
SRR7822524	SRX4673691	SRP161579	SAMN10038693	56,951,612	62%	26%
SRR7822523	SRX4673690	SRP161579	SAMN10038694	69,472,920	61%	25%
SRR7822522	SRX4673689	SRP161579	SAMN10038695	64,457,618	61%	26%
SRR7822521	SRX4673688	SRP161579	SAMN10038696	63,717,568	61%	37%
SRR7822520	SRX4673687	SRP161579	SAMN10038697	71,968,658	60%	36%
SRR7822519	SRX4673686	SRP161579	SAMN10038698	69,650,108	63%	38%
SRR7822518	SRX4673685	SRP161579	SAMN10038699	61,704,660	62%	26%
SRR7822517	SRX4673684	SRP161579	SAMN10038700	71,497,814	61%	24%
SRR7822516	SRX4673683	SRP161579	SAMN10038701	69,741,032	61%	26%
SRR7822515	SRX4673682	SRP161579	SAMN10038702	66,563,164	63%	32%
SRR7822514	SRX4673681	SRP161579	SAMN10038703	70,648,592	64%	34%
SRR11197916	SRX7817852	SRP251038	SAMN14246271	44,795,140	59%	33%
SRR11197915	SRX7817851	SRP251038	SAMN14246272	39,319,742	62%	34%
SRR11197914	SRX7817850	SRP251038	SAMN14246273	44,065,724	60%	34%
SRR11197913	SRX7817849	SRP251038	SAMN14246274	38,769,508	60%	33%
SRR11197912	SRX7817848	SRP251038	SAMN14246275	41,726,668	61%	33%
SRR11197911	SRX7817847	SRP251038	SAMN14246276	43,004,890	62%	34%
SRR11197910	SRX7817846	SRP251038	SAMN14246277	44,122,248	61%	33%
SRR11197909	SRX7817845	SRP251038	SAMN14246278	40,649,084	59%	33%
SRR11197908	SRX7817844	SRP251038	SAMN14246279	40,728,232	61%	34%
SRR11197907	SRX7817843	SRP251038	SAMN14246280	41,660,456	57%	34%
SRR11197906	SRX7817842	SRP251038	SAMN14246281	45,279,830	60%	33%
SRR11197905	SRX7817841	SRP251038	SAMN14246282	48,514,750	61%	33%
SRR11197904	SRX7817840	SRP251038	SAMN14246283	42,718,908	63%	35%
SRR11197903	SRX7817839	SRP251038	SAMN14246284	44,966,814	62%	34%
SRR11197902	SRX7817838	SRP251038	SAMN14246285	41,476,064	61%	34%
SRR11197901	SRX7817837	SRP251038	SAMN14246286	38,318,494	62%	34%
SRR12445082	SRX8939616	SRP277336	SAMN15808843	48,960,194	71%	57%
SRR12445081	SRX8939617	SRP277336	SAMN15808844	37,623,872	70%	57%
SRR12445078	SRX8939620	SRP277336	SAMN15808845	53,788,262	71%	57%
SRR12445077	SRX8939621	SRP277336	SAMN15808846	52,717,848	71%	58%
SRR12445076	SRX8939622	SRP277336	SAMN15808847	55,105,456	70%	58%
SRR12445075	SRX8939623	SRP277336	SAMN15808848	53,506,238	70%	58%
SRR12445074	SRX8939624	SRP277336	SAMN15808849	52,086,872	70%	57%
SRR12445073	SRX8939625	SRP277336	SAMN15808850	48,873,674	71%	58%
SRR12445072	SRX8939626	SRP277336	SAMN15808851	56,151,838	72%	58%
SRR12445071	SRX8939627	SRP277336	SAMN15808852	53,717,776	70%	57%
SRR12445080	SRX8939618	SRP277336	SAMN15808853	50,482,686	72%	58%
SRR12445079	SRX8939619	SRP277336	SAMN15808854	51,477,708	71%	58%
SRR13997290	SRX10374716	SRP311197	SAMN18344483	88,770,036	69%	59%
SRR15652209	SRX11949283	SRP334446	SAMN21019948	90,300,334	57%	36%
SRR15652210	SRX11949284	SRP334446	SAMN21019949	66,886,388	57%	35%
SRR15652211	SRX11949285	SRP334446	SAMN21019950	61,477,958	57%	36%
SRR15652212	SRX11949286	SRP334446	SAMN21019951	54,899,620	59%	37%
SRR15652213	SRX11949287	SRP334446	SAMN21019952	61,128,722	59%	35%
SRR15652214	SRX11949288	SRP334446	SAMN21019953	51,269,012	57%	37%
SRR15652215	SRX11949289	SRP334446	SAMN21019954	65,971,908	57%	35%
SRR15652216	SRX11949290	SRP334446	SAMN21019955	62,610,604	59%	36%
SRR15652217	SRX11949291	SRP334446	SAMN21019956	67,195,236	60%	35%
SRR15652218	SRX11949292	SRP334446	SAMN21019957	59,374,150	60%	35%
SRR15652219	SRX11949293	SRP334446	SAMN21019958	55,600,002	59%	37%
SRR15652220	SRX11949294	SRP334446	SAMN21019959	64,205,552	56%	36%
SRR15652221	SRX11949295	SRP334446	SAMN21019960	59,312,602	59%	36%
SRR15652222	SRX11949296	SRP334446	SAMN21019961	56,214,606	58%	37%
SRR15652223	SRX11949297	SRP334446	SAMN21019962	62,309,624	58%	38%
SRR15652224	SRX11949298	SRP334446	SAMN21019963	75,010,844	60%	25%
SRR15652225	SRX11949299	SRP334446	SAMN21019964	67,343,342	62%	27%
SRR15652226	SRX11949300	SRP334446	SAMN21019965	57,814,236	60%	25%
SRR15652227	SRX11949301	SRP334446	SAMN21019966	67,491,296	62%	27%
SRR15652228	SRX11949302	SRP334446	SAMN21019967	62,984,424	62%	28%
SRR15652229	SRX11949303	SRP334446	SAMN21019968	81,504,772	63%	27%
SRR15652230	SRX11949304	SRP334446	SAMN21019969	75,466,886	63%	28%
SRR15652231	SRX11949305	SRP334446	SAMN21019970	59,360,830	62%	27%
SRR15652232	SRX11949306	SRP334446	SAMN21019971	63,374,952	61%	26%
SRR15652233	SRX11949307	SRP334446	SAMN21019972	62,676,676	61%	26%
SRR15652234	SRX11949308	SRP334446	SAMN21019973	59,117,524	62%	27%
SRR15652235	SRX11949309	SRP334446	SAMN21019974	57,486,264	61%	26%
SRR15652236	SRX11949310	SRP334446	SAMN21019975	62,714,724	61%	26%
SRR15652237	SRX11949311	SRP334446	SAMN21019976	68,332,910	29%	28%
SRR15652238	SRX11949312	SRP334446	SAMN21019977	73,165,002	62%	27%
SRR21314752	SRX17320977	SRP394841	SAMN30589513	86,981,170	68%	35%
SRR21314753	SRX17320976	SRP394841	SAMN30589514	113,586,476	67%	34%
SRR21314754	SRX17320975	SRP394841	SAMN30589515	55,141,504	68%	34%
SRR21314755	SRX17320974	SRP394841	SAMN30589516	56,520,756	67%	35%
SRR21314756	SRX17320973	SRP394841	SAMN30589517	61,912,122	67%	35%
SRR21314757	SRX17320972	SRP394841	SAMN30589518	52,867,806	68%	35%
SRR21314758	SRX17320971	SRP394841	SAMN30589519	153,546,206	66%	33%
SRR21314759	SRX17320970	SRP394841	SAMN30589520	115,862,608	65%	32%
SRR21314760	SRX17320969	SRP394841	SAMN30589521	57,597,140	66%	34%
SRR21314761	SRX17320968	SRP394841	SAMN30589522	85,265,536	67%	35%
SRR21314762	SRX17320967	SRP394841	SAMN30589523	247,097,754	68%	35%
SRR21314763	SRX17320966	SRP394841	SAMN30589524	210,915,694	68%	35%
SRR21314764	SRX17320965	SRP394841	SAMN30589525	56,016,468	67%	35%
SRR21314765	SRX17320964	SRP394841	SAMN30589526	57,825,898	66%	34%
SRR21314766	SRX17320963	SRP394841	SAMN30589527	57,335,176	67%	35%
SRR24044853	SRX19846788	SRP430565	SAMN34043878	46,432,138	61%	56%
SRR24044852	SRX19846789	SRP430565	SAMN34043878	43,088,852	8%	47%
SRR24044851	SRX19846790	SRP430565	SAMN34043878	43,937,000	62%	56%
SRR24044850	SRX19846791	SRP430565	SAMN34043878	42,394,344	7%	45%
SRR24044849	SRX19846792	SRP430565	SAMN34043878	39,186,292	61%	56%
SRR24044848	SRX19846793	SRP430565	SAMN34043878	42,095,308	7%	47%
SRR24044847	SRX19846794	SRP430565	SAMN34043878	47,406,426	63%	56%
SRR24044846	SRX19846795	SRP430565	SAMN34043878	40,225,682	64%	56%
SRR24044845	SRX19846796	SRP430565	SAMN34043878	46,462,628	64%	56%
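
The per-run rows can be rolled up to the per-sample rows. As a sketch, the nine runs of sample SAMN34043878 are summed below and a read-weighted percentage of aligned reads is estimated. The weighting scheme is an assumption about how the sample-level figure is derived, and because the per-run percentages are rounded, the estimate is only approximate.

```python
# (number of reads, percent aligned) for the runs of SAMN34043878,
# copied from the per-run table.
runs = [
    (46_432_138, 61), (43_088_852, 8), (43_937_000, 62),
    (42_394_344, 7), (39_186_292, 61), (42_095_308, 7),
    (47_406_426, 63), (40_225_682, 64), (46_462_628, 64),
]

total_reads = sum(reads for reads, _ in runs)
aligned_estimate = sum(reads * pct / 100 for reads, pct in runs)
sample_percent = 100 * aligned_estimate / total_reads

print(total_reads)
print(round(sample_percent, 1))
```

The summed read count equals the 391,228,670 reads listed for this sample, and the weighted estimate falls close to the 45% reported in the per-sample table.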

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Cynoglossus semilaevis high-quality model RefSeq (XP_)	14,331	13,778 (96.14%)	13,778 (96.14%)	67.69%	74.91%
Actinopterygii GenBank	61,484	57,951 (94.25%)	57,951 (94.25%)	69.95%	81.17%
Actinopterygii known RefSeq (NP_)	25,458	24,287 (95.40%)	24,287 (95.40%)	71.26%	81.50%
Danio rerio high-quality model RefSeq (XP_)	7,712	7,464 (96.78%)	7,464 (96.78%)	71.25%	80.51%
Xiphophorus maculatus high-quality model RefSeq (XP_)	18,457	17,609 (95.41%)	17,609 (95.41%)	66.53%	74.19%
Oryzias latipes high-quality model RefSeq (XP_)	17,157	16,434 (95.79%)	16,434 (95.79%)	67.11%	74.32%
Oreochromis niloticus high-quality model RefSeq (XP_)	19,546	18,599 (95.16%)	18,599 (95.16%)	65.80%	74.33%
Homo sapiens known RefSeq (NP_)	66,931	55,511 (82.94%)	55,511 (82.94%)	66.49%	68.24%
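
The percentages in the table follow from dividing the number of aligned sequences by the number retrieved. A short sketch reproducing three rows from the counts above (the dictionary keys are abbreviated labels chosen for illustration):

```python
# (retrieved from Entrez, aligned by ProSplign), copied from the table.
protein_sets = {
    "Cynoglossus semilaevis XP_": (14_331, 13_778),
    "Danio rerio XP_": (7_712, 7_464),
    "Homo sapiens NP_": (66_931, 55_511),
}


def percent(part, whole):
    """Percentage rounded to two decimals, as displayed in the table."""
    return round(100 * part / whole, 2)


aligned_percent = {
    source: percent(aligned, retrieved)
    for source, (retrieved, aligned) in protein_sets.items()
}
print(aligned_percent)
```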

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
