NCBI Syngnathus typhle Annotation Release GCF_033458585.1-RS_2023_12

The genome sequence records for Syngnathus typhle RefSeq assembly GCF_033458585.1 (RoL_Styp_1.0) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_033458585.1-RS_2023_12".

Date of Entrez queries for transcripts and proteins: Dec 2 2023
Date of submission of annotation to the public databases: Dec 5 2023
Software version: 10.2
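
The release name above appears to combine the assembly accession with a year and month. A minimal sketch of splitting such a name is shown below. The pattern (accession, then "-RS_", then a four-digit year and two-digit month) is inferred from this one name and the dates listed above, not from a published specification, and `parse_release_name` is a hypothetical helper.

```python
import re

# Assumed pattern: <assembly accession>-RS_<year>_<month>.
# This layout is inferred from the release name in this report.
RELEASE_NAME = re.compile(r"^(GC[AF]_\d{9}\.\d+)-RS_(\d{4})_(\d{2})$")


def parse_release_name(name):
    """Split a release name into (assembly accession, year, month)."""
    match = RELEASE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    accession, year, month = match.groups()
    return accession, int(year), int(month)


if __name__ == "__main__":
    print(parse_release_name("GCF_033458585.1-RS_2023_12"))
```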

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
RoL_Styp_1.0	GCF_033458585.1	University of Idaho at Moscow	11-13-2023	Reference	22 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	RoL_Styp_1.0
Genes and pseudogenes	28,904
protein-coding	20,226
non-coding	8,190
Transcribed pseudogenes	0
Non-transcribed pseudogenes	471
genes with variants	9,149
Immunoglobulin/T-cell receptor gene segments	10
other	7
mRNAs	41,038
fully-supported	40,224
with > 5% ab initio	292
partial	64
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	41,038
non-coding RNAs	10,471
fully-supported	4,425
with > 5% ab initio	0
partial	2
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	8,601
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	41,048
fully-supported	40,224
with > 5% ab initio	341
partial	64
with major correction(s)	208
known RefSeq (NP_)	0
model RefSeq (XP_)	41,038
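
As a quick consistency check on the table above, the gene categories can be summed and compared with the reported "Genes and pseudogenes" total. The sketch below is illustrative only. The names `GENE_CATEGORIES` and `sum_gene_categories` are hypothetical, and the assumption that these six rows partition the total is inferred from the numbers, not stated in the report.

```python
# Counts copied from the "Feature counts" table for RoL_Styp_1.0.
# Assumption: these six rows are the sub-categories of the total.
GENE_CATEGORIES = {
    "protein-coding": 20_226,
    "non-coding": 8_190,
    "Transcribed pseudogenes": 0,
    "Non-transcribed pseudogenes": 471,
    "Immunoglobulin/T-cell receptor gene segments": 10,
    "other": 7,
}
REPORTED_TOTAL = 28_904  # "Genes and pseudogenes" row


def sum_gene_categories(categories):
    """Return the sum of all per-category gene counts."""
    return sum(categories.values())


if __name__ == "__main__":
    total = sum_gene_categories(GENE_CATEGORIES)
    print(f"sum of categories: {total:,}; reported total: {REPORTED_TOTAL:,}")
```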

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	28,423	8,879	3,987	56	534,890
All transcripts	51,509	3,018	2,461	56	99,004
mRNA	41,038	3,453	2,811	138	99,004
misc_RNA	1,524	3,558	2,871	270	25,087
tRNA	1,870	75	73	71	87
lncRNA	2,901	1,858	1,276	124	15,467
snoRNA	169	148	134	62	318
snRNA	395	162	164	56	200
rRNA	3,605	756	119	115	4,388
Single-exon transcripts	577	2,185	1,749	246	11,866
coding transcripts (NM_/XM_ )	577	2,185	1,749	246	11,866
CDSs	41,038	2,176	1,578	96	97,956
Exons	265,622	282	140	2	19,284
in coding transcripts (NM_/XM_ )	255,173	271	138	2	19,284
in non-coding transcripts (NR_/XR_ )	20,682	366	141	8	16,078
Introns	237,868	1,014	148	30	332,634
in coding transcripts (NM_/XM_ )	230,864	968	145	30	332,634
in non-coding transcripts (NR_/XR_ )	17,109	1,734	216	30	267,629

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.87	1	1	50
Number of exons per transcript	12.28	9	1	253

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode with the actinopterygii_odb10 lineage dataset on the annotated gene set, using the single longest protein per gene. Results are reported for the gene set from the primary assembly unit and presented in BUSCO notation.

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 20,226 coding genes, 18,843 had a protein with an alignment covering 50% or more of the query, and 9,338 had an alignment covering 95% or more of the query.
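
The gene counts in the paragraph above can be expressed as fractions of all coding genes, which come to roughly 93% and 46%. The short sketch below does that arithmetic; `percent_of` is a hypothetical helper, and the constants are copied from the text.

```python
def percent_of(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 2)


# Values copied from the paragraph above.
CODING_GENES = 20_226
GENES_QUERY_COV_50 = 18_843  # alignment covers >= 50% of the query
GENES_QUERY_COV_95 = 9_338   # alignment covers >= 95% of the query

if __name__ == "__main__":
    print("50% or more:", percent_of(GENES_QUERY_COV_50, CODING_GENES))
    print("95% or more:", percent_of(GENES_QUERY_COV_95, CODING_GENES))
```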

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
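
The two coverage definitions above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's own code: coordinates are taken to be 1-based and inclusive, each hit is a single ungapped interval on each sequence, and the dictionary keys (`q_start`, `q_end`, `q_len`, `t_start`, `t_end`, `t_len`) are hypothetical names.

```python
def coverage_percent(aligned_start, aligned_end, sequence_length):
    """Percentage of a sequence that falls inside one alignment.

    Assumes 1-based, inclusive coordinates on that sequence.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if not 1 <= aligned_start <= aligned_end <= sequence_length:
        raise ValueError("alignment coordinates out of range")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


def query_and_target_coverage(hit):
    """Return (query coverage, target coverage) for a hit dictionary."""
    query_cov = coverage_percent(hit["q_start"], hit["q_end"], hit["q_len"])
    target_cov = coverage_percent(hit["t_start"], hit["t_end"], hit["t_len"])
    return query_cov, target_cov


if __name__ == "__main__":
    # Made-up hit: a 200-residue annotated protein aligned over residues
    # 11-190 to residues 1-180 of a 400-residue target protein.
    example_hit = {"q_start": 11, "q_end": 190, "q_len": 200,
                   "t_start": 1, "t_end": 180, "t_len": 400}
    print(query_and_target_coverage(example_hit))
```

In this made-up example the query coverage is high while the target coverage is low, which is the pattern expected when an annotated protein matches only part of a longer target.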

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
RoL_Styp_1.0	GCF_033458585.1	43.29%
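
A masked percentage like the one above could be recomputed from a soft-masked copy of the genome. The sketch below assumes masked bases are written in lowercase, which is a common convention but is not stated in this report, and it does not claim to reproduce how the pipeline derives its figure. `masked_percent` is a hypothetical helper.

```python
def masked_percent(sequences):
    """Percentage of bases that are lowercase across all sequences.

    Assumption: lowercase letters mark soft-masked bases.
    """
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    if total == 0:
        raise ValueError("no sequence data")
    return 100.0 * masked / total


if __name__ == "__main__":
    # Two toy sequences: 6 of 12 bases are lowercase.
    print(masked_percent(["ACGTacgt", "NNnn"]))
```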

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	16	16 (100.00%)	15 (93.75%)	98.58%	98.29%

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	4,487,872,200	73%	52%	265,654
SAMEA6647188	brood_pouch (Syngnathus typhle, male, SAMEA6647188)	4,256,782	86%	36%	124,004
SAMEA6647189	brood_pouch (Syngnathus typhle, male, SAMEA6647189)	2,658,002	86%	35%	96,707
SAMEA6647191	brood_pouch (Syngnathus typhle, male, SAMEA6647191)	3,910,794	87%	37%	110,317
SAMEA6647192	brood_pouch (Syngnathus typhle, male, SAMEA6647192)	6,168,984	86%	39%	129,879
SAMEA6647193	brood_pouch (Syngnathus typhle, male, SAMEA6647193)	1,980,558	85%	35%	98,691
SAMEA6647194	brood_pouch (Syngnathus typhle, male, SAMEA6647194)	1,948,990	85%	35%	97,065
SAMEA6647195	brood_pouch (Syngnathus typhle, male, SAMEA6647195)	7,092,194	86%	35%	144,395
SAMEA6647196	brood_pouch (Syngnathus typhle, male, SAMEA6647196)	6,003,778	86%	37%	137,081
SAMEA6647197	brood_pouch (Syngnathus typhle, male, SAMEA6647197)	3,201,478	86%	36%	102,832
SAMEA6647199	brood_pouch (Syngnathus typhle, male, SAMEA6647199)	7,810,566	87%	34%	142,531
SAMEA6647200	brood_pouch (Syngnathus typhle, male, SAMEA6647200)	6,061,384	86%	34%	136,655
SAMEA6647201	brood_pouch (Syngnathus typhle, male, SAMEA6647201)	2,740,332	85%	33%	105,120
SAMEA6647203	brood_pouch (Syngnathus typhle, male, SAMEA6647203)	6,733,580	86%	35%	132,943
SAMEA6647204	brood_pouch (Syngnathus typhle, male, SAMEA6647204)	2,699,038	83%	33%	94,408
SAMEA6647207	brood_pouch (Syngnathus typhle, male, SAMEA6647207)	3,046,818	88%	36%	74,641
SAMEA6647208	gill (Syngnathus typhle, female, SAMEA6647208)	3,850,078	75%	32%	121,458
SAMEA6647209	gill (Syngnathus typhle, female, SAMEA6647209)	2,571,090	67%	32%	95,776
SAMEA6647211	gill (Syngnathus typhle, female, SAMEA6647211)	8,000,000	70%	33%	141,231
SAMEA6647212	gill (Syngnathus typhle, female, SAMEA6647212)	5,006,766	63%	30%	123,887
SAMEA6647213	liver (Syngnathus typhle, female, SAMEA6647213)	4,768,520	87%	43%	72,192
SAMEA6647214	liver (Syngnathus typhle, female, SAMEA6647214)	2,961,394	88%	45%	41,949
SAMEA6647215	liver (Syngnathus typhle, female, SAMEA6647215)	7,664,200	88%	41%	94,661
SAMEA6647216	liver (Syngnathus typhle, female, SAMEA6647216)	6,414,162	90%	46%	66,900
SAMEA6647217	liver (Syngnathus typhle, female, SAMEA6647217)	2,982,526	87%	44%	47,430
SAMEA6647219	belly (Syngnathus typhle, female, SAMEA6647219)	5,178,270	89%	39%	90,881
SAMEA6647220	belly (Syngnathus typhle, female, SAMEA6647220)	4,801,430	89%	40%	81,048
SAMEA6647223	gill (Syngnathus typhle, male, SAMEA6647223)	4,297,274	74%	32%	114,052
SAMEA6647224	gill (Syngnathus typhle, male, SAMEA6647224)	5,103,810	73%	32%	123,566
SAMEA6647225	gill (Syngnathus typhle, male, SAMEA6647225)	3,149,436	70%	32%	107,422
SAMEA6647226	gill (Syngnathus typhle, male, SAMEA6647226)	3,449,010	76%	32%	110,004
SAMEA6647227	gill (Syngnathus typhle, male, SAMEA6647227)	7,812,492	74%	33%	144,402
SAMEA6647228	liver (Syngnathus typhle, male, SAMEA6647228)	8,000,000	89%	41%	99,423
SAMEA6647229	liver (Syngnathus typhle, male, SAMEA6647229)	4,265,604	87%	39%	80,077
SAMEA6647230	liver (Syngnathus typhle, male, SAMEA6647230)	2,565,736	88%	39%	62,493
SAMEA6647231	liver (Syngnathus typhle, male, SAMEA6647231)	16,000,000	89%	39%	94,242
SAMEA6647232	liver (Syngnathus typhle, male, SAMEA6647232)	6,467,438	90%	40%	89,146
SAMEA6647235	48_larvae (Syngnathus typhle, SAMEA6647235)	16,000,000	87%	37%	160,690
SAMN20842776	brood pouch (Syngnathus typhle, male, SAMN20842776)	50,819,100	79%	52%	198,699
SAMN20842777	brood pouch (Syngnathus typhle, male, SAMN20842777)	52,831,390	80%	54%	195,999
SAMN20842778	brood pouch (Syngnathus typhle, male, SAMN20842778)	48,969,844	78%	55%	197,383
SAMN20842779	brood pouch (Syngnathus typhle, male, SAMN20842779)	52,853,364	78%	54%	197,943
SAMN20842780	brood pouch (Syngnathus typhle, male, SAMN20842780)	48,436,978	78%	48%	192,225
SAMN20842781	brood pouch (Syngnathus typhle, male, SAMN20842781)	51,614,464	80%	55%	199,570
SAMN20842782	brood pouch (Syngnathus typhle, male, SAMN20842782)	51,486,332	74%	47%	187,031
SAMN20842783	brood pouch (Syngnathus typhle, male, SAMN20842783)	49,861,630	77%	50%	193,315
SAMN20842784	brood pouch (Syngnathus typhle, male, SAMN20842784)	51,178,162	69%	49%	195,543
SAMN20842785	brood pouch (Syngnathus typhle, male, SAMN20842785)	48,143,540	70%	52%	198,110
SAMN20842786	brood pouch (Syngnathus typhle, male, SAMN20842786)	49,977,780	70%	50%	195,927
SAMN20842787	brood pouch (Syngnathus typhle, male, SAMN20842787)	49,508,350	70%	50%	196,053
SAMN20842788	brood pouch (Syngnathus typhle, male, SAMN20842788)	50,995,116	78%	54%	196,402
SAMN20842789	brood pouch (Syngnathus typhle, male, SAMN20842789)	47,916,962	80%	55%	198,866
SAMN20842790	brood pouch (Syngnathus typhle, male, SAMN20842790)	48,285,708	77%	56%	198,768
SAMN20842791	brood pouch (Syngnathus typhle, male, SAMN20842791)	48,039,082	79%	49%	184,274
SAMN20842792	brood pouch (Syngnathus typhle, male, SAMN20842792)	48,211,828	79%	51%	192,190
SAMN20842793	brood pouch (Syngnathus typhle, male, SAMN20842793)	50,300,138	80%	55%	200,935
SAMN20842794	brood pouch (Syngnathus typhle, male, SAMN20842794)	51,289,336	80%	53%	204,049
SAMN20842795	brood pouch (Syngnathus typhle, male, SAMN20842795)	48,684,486	77%	53%	200,025
SAMN20842796	brood pouch (Syngnathus typhle, male, SAMN20842796)	50,938,994	78%	52%	202,291
SAMN20842797	brood pouch (Syngnathus typhle, male, SAMN20842797)	49,453,242	78%	53%	198,188
SAMN20842798	brood pouch (Syngnathus typhle, male, SAMN20842798)	52,519,386	77%	51%	197,701
SAMN20842799	brood pouch (Syngnathus typhle, male, SAMN20842799)	46,405,346	75%	47%	187,315
SAMN23848188	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848188)	73,352,774	74%	61%	193,460
SAMN23848189	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848189)	48,072,554	72%	60%	177,953
SAMN23848190	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848190)	25,610,666	73%	58%	179,625
SAMN23848191	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848191)	70,990,710	74%	53%	209,469
SAMN23848192	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848192)	26,955,432	75%	59%	152,839
SAMN23848193	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848193)	47,385,330	71%	52%	190,294
SAMN23848194	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848194)	73,298,242	74%	60%	197,355
SAMN23848195	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848195)	55,484,156	75%	60%	184,826
SAMN23848196	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848196)	58,473,470	77%	62%	183,093
SAMN23848197	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848197)	52,964,378	75%	51%	189,612
SAMN23848198	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848198)	56,459,390	74%	59%	195,282
SAMN23848199	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848199)	45,029,492	74%	53%	194,325
SAMN23848200	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848200)	30,656,260	70%	55%	174,948
SAMN23848201	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848201)	27,728,376	1%	53%	1,899
SAMN23848202	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848202)	47,627,742	71%	52%	193,091
SAMN23848203	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848203)	51,556,848	70%	53%	195,397
SAMN23848204	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848204)	67,436,240	70%	53%	205,902
SAMN23848205	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848205)	48,054,342	73%	54%	197,596
SAMN23848206	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848206)	62,057,176	74%	54%	203,142
SAMN23848207	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848207)	35,599,626	71%	59%	177,988
SAMN23848208	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848208)	62,557,812	64%	53%	190,955
SAMN23848209	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848209)	45,779,938	70%	56%	189,725
SAMN23848210	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848210)	49,787,872	71%	54%	190,060
SAMN23848211	Surgical Area Tissue (Syngnathus typhle, female, SAMN23848211)	50,242,450	67%	51%	194,935
SAMN23848212	Gill (Syngnathus typhle, female, SAMN23848212)	79,805,160	65%	52%	212,771
SAMN23848213	Gill (Syngnathus typhle, female, SAMN23848213)	80,113,300	74%	52%	217,703
SAMN23848214	Gill (Syngnathus typhle, female, SAMN23848214)	104,916,510	70%	52%	222,776
SAMN23848215	Gill (Syngnathus typhle, female, SAMN23848215)	82,579,882	72%	51%	219,410
SAMN23848216	Gill (Syngnathus typhle, female, SAMN23848216)	81,108,792	69%	52%	218,862
SAMN23848217	Gill (Syngnathus typhle, female, SAMN23848217)	76,834,952	69%	51%	215,469
SAMN23848218	Gill (Syngnathus typhle, female, SAMN23848218)	66,737,780	70%	51%	212,409
SAMN23848219	Gill (Syngnathus typhle, female, SAMN23848219)	76,239,740	71%	53%	216,797
SAMN23848220	Gill (Syngnathus typhle, female, SAMN23848220)	91,943,812	68%	52%	217,854
SAMN23848221	Gill (Syngnathus typhle, female, SAMN23848221)	73,259,120	65%	48%	212,742
SAMN23848222	Gill (Syngnathus typhle, female, SAMN23848222)	81,185,562	70%	53%	216,139
SAMN23848223	Gill (Syngnathus typhle, female, SAMN23848223)	79,255,384	71%	49%	216,423
SAMN23848224	Gill (Syngnathus typhle, female, SAMN23848224)	59,408,322	71%	54%	210,568
SAMN23848225	Gill (Syngnathus typhle, female, SAMN23848225)	68,667,396	66%	50%	212,299
SAMN23848226	Gill (Syngnathus typhle, female, SAMN23848226)	92,848,500	71%	52%	218,170
SAMN23848227	Gill (Syngnathus typhle, female, SAMN23848227)	65,012,332	69%	52%	213,166
SAMN23848228	Gill (Syngnathus typhle, female, SAMN23848228)	77,040,346	69%	50%	218,207
SAMN23848229	Gill (Syngnathus typhle, female, SAMN23848229)	82,906,420	73%	54%	219,591
SAMN23848230	Gill (Syngnathus typhle, female, SAMN23848230)	86,712,516	67%	53%	215,998
SAMN23848231	Gill (Syngnathus typhle, female, SAMN23848231)	77,146,220	70%	54%	214,335
SAMN23848232	Gill (Syngnathus typhle, female, SAMN23848232)	60,528,878	70%	50%	208,453
SAMN23848233	Gill (Syngnathus typhle, female, SAMN23848233)	65,675,080	70%	51%	213,749
SAMN23848234	Gill (Syngnathus typhle, female, SAMN23848234)	96,306,978	72%	53%	219,592
SAMN23848235	Gill (Syngnathus typhle, female, SAMN23848235)	72,134,870	71%	50%	215,188

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR3994076	ERX3995494	ERP114765	SAMEA6647188	4,256,782	86%	36%
ERR3994078	ERX3995496	ERP114765	SAMEA6647189	2,658,002	86%	35%
ERR3994081	ERX3995499	ERP114765	SAMEA6647191	3,910,794	87%	37%
ERR3994083	ERX3995501	ERP114765	SAMEA6647192	6,168,984	86%	39%
ERR3994084	ERX3995502	ERP114765	SAMEA6647193	1,980,558	85%	35%
ERR3994091	ERX3995509	ERP114765	SAMEA6647194	1,948,990	85%	35%
ERR3994130	ERX3995548	ERP114765	SAMEA6647195	7,092,194	86%	35%
ERR3994132	ERX3995550	ERP114765	SAMEA6647196	6,003,778	86%	37%
ERR3994134	ERX3995552	ERP114765	SAMEA6647197	3,201,478	86%	36%
ERR3994136	ERX3995554	ERP114765	SAMEA6647199	7,810,566	87%	34%
ERR3994138	ERX3995556	ERP114765	SAMEA6647200	6,061,384	86%	34%
ERR3994140	ERX3995558	ERP114765	SAMEA6647201	2,740,332	85%	33%
ERR3994142	ERX3995560	ERP114765	SAMEA6647203	6,733,580	86%	35%
ERR3994143	ERX3995561	ERP114765	SAMEA6647204	2,699,038	83%	33%
ERR3994146	ERX3995564	ERP114765	SAMEA6647207	3,046,818	88%	36%
ERR3994147	ERX3995565	ERP114765	SAMEA6647208	3,850,078	75%	32%
ERR3994149	ERX3995567	ERP114765	SAMEA6647209	2,571,090	67%	32%
ERR4019841	ERX4021229	ERP114765	SAMEA6647211	8,000,000	70%	33%
ERR3994155	ERX3995573	ERP114765	SAMEA6647212	5,006,766	63%	30%
ERR3994157	ERX3995575	ERP114765	SAMEA6647213	4,768,520	87%	43%
ERR3994158	ERX3995576	ERP114765	SAMEA6647214	2,961,394	88%	45%
ERR3994161	ERX3995579	ERP114765	SAMEA6647215	7,664,200	88%	41%
ERR3994198	ERX3995616	ERP114765	SAMEA6647216	6,414,162	90%	46%
ERR3994199	ERX3995617	ERP114765	SAMEA6647217	2,982,526	87%	44%
ERR3994201	ERX3995619	ERP114765	SAMEA6647219	5,178,270	89%	39%
ERR3994202	ERX3995620	ERP114765	SAMEA6647220	4,801,430	89%	40%
ERR3994206	ERX3995624	ERP114765	SAMEA6647223	4,297,274	74%	32%
ERR3994207	ERX3995625	ERP114765	SAMEA6647224	5,103,810	73%	32%
ERR3994208	ERX3995626	ERP114765	SAMEA6647225	3,149,436	70%	32%
ERR3994210	ERX3995628	ERP114765	SAMEA6647226	3,449,010	76%	32%
ERR3994211	ERX3995629	ERP114765	SAMEA6647227	7,812,492	74%	33%
ERR4019842	ERX4021230	ERP114765	SAMEA6647228	8,000,000	89%	41%
ERR3994216	ERX3995634	ERP114765	SAMEA6647229	4,265,604	87%	39%
ERR3994217	ERX3995635	ERP114765	SAMEA6647230	2,565,736	88%	39%
ERR3994219	ERX3995637	ERP114765	SAMEA6647231	8,000,000	89%	39%
ERR4019843	ERX4021231	ERP114765	SAMEA6647231	8,000,000	89%	39%
ERR3994222	ERX3995640	ERP114765	SAMEA6647232	6,467,438	90%	40%
ERR3994226	ERX3995644	ERP114765	SAMEA6647235	8,000,000	87%	37%
ERR4019844	ERX4021232	ERP114765	SAMEA6647235	8,000,000	87%	37%
SRR15507407	SRX11806581	SRP332988	SAMN20842776	50,819,100	79%	52%
SRR15507406	SRX11806582	SRP332988	SAMN20842777	52,831,390	80%	54%
SRR15507430	SRX11806558	SRP332988	SAMN20842778	48,969,844	78%	55%
SRR15507419	SRX11806569	SRP332988	SAMN20842779	52,853,364	78%	54%
SRR15507408	SRX11806580	SRP332988	SAMN20842780	48,436,978	78%	48%
SRR15507393	SRX11806595	SRP332988	SAMN20842781	51,614,464	80%	55%
SRR15507382	SRX11806606	SRP332988	SAMN20842782	51,486,332	74%	47%
SRR15507371	SRX11806617	SRP332988	SAMN20842783	49,861,630	77%	50%
SRR15507360	SRX11806628	SRP332988	SAMN20842784	51,178,162	69%	49%
SRR15507349	SRX11806639	SRP332988	SAMN20842785	48,143,540	70%	52%
SRR15507405	SRX11806583	SRP332988	SAMN20842786	49,977,780	70%	50%
SRR15507404	SRX11806584	SRP332988	SAMN20842787	49,508,350	70%	50%
SRR15507438	SRX11806550	SRP332988	SAMN20842788	50,995,116	78%	54%
SRR15507437	SRX11806551	SRP332988	SAMN20842789	47,916,962	80%	55%
SRR15507436	SRX11806552	SRP332988	SAMN20842790	48,285,708	77%	56%
SRR15507435	SRX11806553	SRP332988	SAMN20842791	48,039,082	79%	49%
SRR15507434	SRX11806554	SRP332988	SAMN20842792	48,211,828	79%	51%
SRR15507433	SRX11806555	SRP332988	SAMN20842793	50,300,138	80%	55%
SRR15507432	SRX11806556	SRP332988	SAMN20842794	51,289,336	80%	53%
SRR15507431	SRX11806557	SRP332988	SAMN20842795	48,684,486	77%	53%
SRR15507429	SRX11806559	SRP332988	SAMN20842796	50,938,994	78%	52%
SRR15507428	SRX11806560	SRP332988	SAMN20842797	49,453,242	78%	53%
SRR15507427	SRX11806561	SRP332988	SAMN20842798	52,519,386	77%	51%
SRR15507426	SRX11806562	SRP332988	SAMN20842799	46,405,346	75%	47%
SRR17194183	SRX13374557	SRP350203	SAMN23848188	73,352,774	74%	61%
SRR17194182	SRX13374558	SRP350203	SAMN23848189	48,072,554	72%	60%
SRR17194171	SRX13374569	SRP350203	SAMN23848190	25,610,666	73%	58%
SRR17194160	SRX13374580	SRP350203	SAMN23848191	70,990,710	74%	53%
SRR17194149	SRX13374591	SRP350203	SAMN23848192	26,955,432	75%	59%
SRR17194138	SRX13374602	SRP350203	SAMN23848193	47,385,330	71%	52%
SRR17194127	SRX13374613	SRP350203	SAMN23848194	73,298,242	74%	60%
SRR17194116	SRX13374624	SRP350203	SAMN23848195	55,484,156	75%	60%
SRR17194105	SRX13374635	SRP350203	SAMN23848196	58,473,470	77%	62%
SRR17194094	SRX13374646	SRP350203	SAMN23848197	52,964,378	75%	51%
SRR17194181	SRX13374559	SRP350203	SAMN23848198	56,459,390	74%	59%
SRR17194180	SRX13374560	SRP350203	SAMN23848199	45,029,492	74%	53%
SRR17194179	SRX13374561	SRP350203	SAMN23848200	30,656,260	70%	55%
SRR17194178	SRX13374562	SRP350203	SAMN23848201	27,728,376	1%	53%
SRR17194177	SRX13374563	SRP350203	SAMN23848202	47,627,742	71%	52%
SRR17194176	SRX13374564	SRP350203	SAMN23848203	51,556,848	70%	53%
SRR17194175	SRX13374565	SRP350203	SAMN23848204	67,436,240	70%	53%
SRR17194174	SRX13374566	SRP350203	SAMN23848205	48,054,342	73%	54%
SRR17194173	SRX13374567	SRP350203	SAMN23848206	62,057,176	74%	54%
SRR17194172	SRX13374568	SRP350203	SAMN23848207	35,599,626	71%	59%
SRR17194170	SRX13374570	SRP350203	SAMN23848208	62,557,812	64%	53%
SRR17194169	SRX13374571	SRP350203	SAMN23848209	45,779,938	70%	56%
SRR17194168	SRX13374572	SRP350203	SAMN23848210	49,787,872	71%	54%
SRR17194167	SRX13374573	SRP350203	SAMN23848211	50,242,450	67%	51%
SRR17194166	SRX13374574	SRP350203	SAMN23848212	79,805,160	65%	52%
SRR17194165	SRX13374575	SRP350203	SAMN23848213	80,113,300	74%	52%
SRR17194164	SRX13374576	SRP350203	SAMN23848214	104,916,510	70%	52%
SRR17194163	SRX13374577	SRP350203	SAMN23848215	82,579,882	72%	51%
SRR17194162	SRX13374578	SRP350203	SAMN23848216	81,108,792	69%	52%
SRR17194161	SRX13374579	SRP350203	SAMN23848217	76,834,952	69%	51%
SRR17194159	SRX13374581	SRP350203	SAMN23848218	66,737,780	70%	51%
SRR17194158	SRX13374582	SRP350203	SAMN23848219	76,239,740	71%	53%
SRR17194157	SRX13374583	SRP350203	SAMN23848220	91,943,812	68%	52%
SRR17194156	SRX13374584	SRP350203	SAMN23848221	73,259,120	65%	48%
SRR17194155	SRX13374585	SRP350203	SAMN23848222	81,185,562	70%	53%
SRR17194154	SRX13374586	SRP350203	SAMN23848223	79,255,384	71%	49%
SRR17194153	SRX13374587	SRP350203	SAMN23848224	59,408,322	71%	54%
SRR17194152	SRX13374588	SRP350203	SAMN23848225	68,667,396	66%	50%
SRR17194151	SRX13374589	SRP350203	SAMN23848226	92,848,500	71%	52%
SRR17194150	SRX13374590	SRP350203	SAMN23848227	65,012,332	69%	52%
SRR17194148	SRX13374592	SRP350203	SAMN23848228	77,040,346	69%	50%
SRR17194147	SRX13374593	SRP350203	SAMN23848229	82,906,420	73%	54%
SRR17194146	SRX13374594	SRP350203	SAMN23848230	86,712,516	67%	53%
SRR17194145	SRX13374595	SRP350203	SAMN23848231	77,146,220	70%	54%
SRR17194144	SRX13374596	SRP350203	SAMN23848232	60,528,878	70%	50%
SRR17194143	SRX13374597	SRP350203	SAMN23848233	65,675,080	70%	51%
SRR17194142	SRX13374598	SRP350203	SAMN23848234	96,306,978	72%	53%
SRR17194141	SRX13374599	SRP350203	SAMN23848235	72,134,870	71%	50%
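
Some samples in the per-sample table correspond to more than one run in the per-run table. For example, SAMEA6647231 has two runs of 8,000,000 reads and is listed with 16,000,000 reads per sample. A minimal sketch of that aggregation follows. The helper `reads_per_sample` and the row layout are hypothetical, and only three rows are copied from the table for brevity.

```python
from collections import defaultdict


def reads_per_sample(run_rows):
    """Sum read counts over runs that share a sample accession.

    run_rows: iterable of (sample accession, read count as text).
    """
    totals = defaultdict(int)
    for sample, reads in run_rows:
        totals[sample] += int(reads.replace(",", ""))
    return dict(totals)


# A few (Sample, Number of reads) pairs copied from the per-run table.
RUN_ROWS = [
    ("SAMEA6647231", "8,000,000"),  # ERR3994219
    ("SAMEA6647231", "8,000,000"),  # ERR4019843
    ("SAMEA6647232", "6,467,438"),  # ERR3994222
]

if __name__ == "__main__":
    for sample, total in reads_per_sample(RUN_ROWS).items():
        print(f"{sample}\t{total:,}")
```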

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Hippocampus comes high-quality model RefSeq (XP_)	15,317	15,096 (98.56%)	15,096 (98.56%)	73.44%	81.89%
Betta splendens high-quality model RefSeq (XP_)	18,289	17,689 (96.72%)	17,689 (96.72%)	69.31%	77.57%
Actinopterygii GenBank	92,811	85,347 (91.96%)	85,347 (91.96%)	68.92%	79.47%
Actinopterygii known RefSeq (NP_)	25,752	23,806 (92.44%)	23,806 (92.44%)	68.28%	77.62%
Danio rerio high-quality model RefSeq (XP_)	7,594	7,075 (93.17%)	7,075 (93.17%)	66.79%	70.81%
Esox lucius high-quality model RefSeq (XP_)	18,508	17,603 (95.11%)	17,603 (95.11%)	67.86%	75.17%
Xiphophorus maculatus high-quality model RefSeq (XP_)	18,457	17,795 (96.41%)	17,795 (96.41%)	68.99%	77.12%
Homo sapiens known RefSeq (NP_)	67,587	55,448 (82.04%)	55,448 (82.04%)	66.88%	70.51%
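
The "Number (%)" cells in the alignment tables above pair a count with its percentage of the retrieved sequences. The sketch below parses such a cell and recomputes the percentage as a sanity check. The helper names and the 0.01 tolerance are assumptions made for illustration.

```python
import re

# Matches cells such as "15,096 (98.56%)".
CELL = re.compile(r"^([\d,]+) \((\d+(?:\.\d+)?)%\)$")


def parse_count_percent(cell):
    """Return (count, reported percent) from a 'N (P%)' table cell."""
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return int(match.group(1).replace(",", "")), float(match.group(2))


def percent_matches(retrieved, cell, tolerance=0.01):
    """Check that the reported percent agrees with count / retrieved."""
    count, reported = parse_count_percent(cell)
    computed = 100.0 * count / int(retrieved.replace(",", ""))
    return abs(computed - reported) <= tolerance


if __name__ == "__main__":
    # Values copied from the Hippocampus comes row and the Splign row.
    print(percent_matches("15,317", "15,096 (98.56%)"))
    print(percent_matches("16", "15 (93.75%)"))
```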

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
