NCBI Brienomyrus brachyistius Annotation Release 100

The RefSeq genome records for Brienomyrus brachyistius were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Brienomyrus brachyistius Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Jun 30 2022
Date of submission of annotation to the public databases: Jul 13 2022
Software version: 10.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
BBRACH_0.4	GCF_023856365.1	Michigan State University	06-28-2022	Reference	25 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	BBRACH_0.4
Genes and pseudogenes	48,042
  protein-coding	25,637
  non-coding	21,076
  Transcribed pseudogenes	0
  Non-transcribed pseudogenes	1,265
  genes with variants	13,610
  Immunoglobulin/T-cell receptor gene segments	56
  other	8
mRNAs	62,316
  fully-supported	61,079
  with > 5% ab initio	540
  partial	156
  with filled gap(s)	0
  known RefSeq (NM_)	0
  model RefSeq (XM_)	62,316
non-coding RNAs	26,709
  fully-supported	10,505
  with > 5% ab initio	0
  partial	5
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	19,837
pseudo transcripts	0
  fully-supported	0
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	0
CDSs	62,372
  fully-supported	61,079
  with > 5% ab initio	643
  partial	159
  with major correction(s)	911
  known RefSeq (NP_)	0
  model RefSeq (XP_)	62,316
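
The gene-level rows of the table above can be cross-checked with simple arithmetic. The sketch below copies the counts from the table and confirms that the categories add up to the "Genes and pseudogenes" total. It leaves out "genes with variants", on the assumption that this row describes a property of genes rather than a separate category. The second sum rests on a further assumption not stated in the report: that the "Genes" row of the Feature lengths table (46,721) excludes both pseudogenes and immunoglobulin/T-cell receptor gene segments. The dictionary keys and variable names are illustrative only.

```python
# Gene-level counts copied from the "Feature counts" table for BBRACH_0.4.
gene_counts = {
    "protein-coding": 25_637,
    "non-coding": 21_076,
    "transcribed pseudogenes": 0,
    "non-transcribed pseudogenes": 1_265,
    "Ig/TCR gene segments": 56,
    "other": 8,
}
reported_total = 48_042  # "Genes and pseudogenes" row

# The six categories should add up to the reported total.
category_sum = sum(gene_counts.values())
print(f"sum of categories: {category_sum:,} (reported: {reported_total:,})")

# Assumption: the "Genes" row of the Feature lengths table drops
# pseudogenes and Ig/TCR gene segments.
genes_for_length_stats = sum(
    gene_counts[key] for key in ("protein-coding", "non-coding", "other")
)
print(f"genes excluding pseudogenes and gene segments: {genes_for_length_stats:,}")
```

Both sums agree with the reported figures, which is consistent with the assumed exclusions but does not prove that the pipeline computes the counts this way.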

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	46,721	12,141	3,684	55	1,180,389
All transcripts	89,025	2,954	2,417	55	97,478
  mRNA	62,316	3,798	3,149	230	97,478
  misc_RNA	3,377	3,406	3,088	146	18,548
  tRNA	6,872	75	73	70	91
  lncRNA	7,132	1,600	959	71	22,171
  snoRNA	268	145	133	63	321
  snRNA	2,359	153	164	55	199
  rRNA	6,693	381	119	115	4,123
Single-exon transcripts	696	1,997	1,727	230	8,491
  coding transcripts (NM_/XM_)	696	1,997	1,727	230	8,491
CDSs	62,316	2,214	1,557	99	95,151
Exons	338,475	331	141	1	21,597
  in coding transcripts (NM_/XM_)	315,493	319	140	1	18,798
  in non-coding transcripts (NR_/XR_)	35,217	392	145	9	21,597
Introns	301,591	2,203	454	30	1,175,438
  in coding transcripts (NM_/XM_)	285,496	2,131	456	30	1,175,438
  in non-coding transcripts (NR_/XR_)	28,088	2,806	434	30	495,156

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.06	1	1	50
Number of exons per transcript	10.94	8	1	246

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode with the actinopterygii_odb10 lineage dataset on the annotated gene set, using the longest protein of each gene. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation.

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 25,637 coding genes, 23,400 genes had a protein with an alignment covering 50% or more of the query and 12,647 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
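
As a rough illustration of the two coverage definitions, the sketch below computes query and target coverage for a single made-up alignment and then expresses the gene counts reported above as fractions of all coding genes. The function name `coverage_pct` and the alignment lengths are hypothetical; only the three gene counts come from this report.

```python
def coverage_pct(aligned_length: int, sequence_length: int) -> float:
    """Percentage of a sequence's length that is included in an alignment."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: a 450-residue annotated protein (query) aligned to a
# 500-residue Swiss-Prot protein (target), covering 430 query residues and
# 440 target residues.
query_cov = coverage_pct(430, 450)
target_cov = coverage_pct(440, 500)
print(f"query coverage:  {query_cov:.1f}%")
print(f"target coverage: {target_cov:.1f}%")

# Gene counts reported in the text above.
coding_genes = 25_637
genes_query_cov_50 = 23_400
genes_query_cov_95 = 12_647

share_50 = 100.0 * genes_query_cov_50 / coding_genes
share_95 = 100.0 * genes_query_cov_95 / coding_genes
print(f"genes with >= 50% query coverage: {share_50:.1f}% of coding genes")
print(f"genes with >= 95% query coverage: {share_95:.1f}% of coding genes")
```

With the reported counts, roughly nine in ten coding genes reach the 50% query-coverage threshold and about half reach the 95% threshold.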

[Figure: cumulative graph of the number of genes with alignments above a given query or target coverage threshold, with corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline included for comparison. Query: annotated proteins. Target: UniProtKB/Swiss-Prot curated proteins.]

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
BBRACH_0.4	GCF_023856365.1	34.11%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	20	19 (95.00%)	18 (90.00%)	99.14%	96.87%
Same-species EST	153	124 (81.05%)	117 (76.47%)	99.24%	91.26%
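
The percentages in the table above can be reproduced from the counts. The sketch below recomputes them and suggests that both percentage columns are relative to the number of sequences retrieved from Entrez, not to the number aligned. That reading is inferred from the arithmetic and is not stated explicitly in the report. The helper name `pct` is illustrative.

```python
# (source, retrieved from Entrez, aligned by Splign, passed to Gnomon)
rows = [
    ("Same-species Genbank", 20, 19, 18),
    ("Same-species EST", 153, 124, 117),
]


def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals, as in the table."""
    return f"{100.0 * part / whole:.2f}%"


recomputed = {}
for source, retrieved, aligned, passed in rows:
    # Both percentages use the retrieved count as the denominator.
    recomputed[source] = (pct(aligned, retrieved), pct(passed, retrieved))
    print(f"{source}: aligned {recomputed[source][0]}, "
          f"passed to Gnomon {recomputed[source][1]}")
```

The recomputed values match the table when the retrieved count is the denominator in both columns.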

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	2,052,339,768	73%	29%	531,398
SAMN02799596	EO (Brienomyrus brachyistius, undetermined, not determined, SAMN02799596)	53,865,139	88%	24%	228,508
SAMN02799597	muscle (Brienomyrus brachyistius, undetermined, not determined, SAMN02799597)	26,591,453	89%	36%	158,315
SAMN02800480	electric organ (Brienomyrus brachyistius, undetermined, not determined, SAMN02800480)	6,004,610	79%	24%	109,049
SAMN28861232	Chin Skin (Brienomyrus brachyistius, not determined, SAMN28861232)	46,323,168	18%	11%	131,721
SAMN28861233	Chin Skin (Brienomyrus brachyistius, not determined, SAMN28861233)	54,783,582	72%	14%	200,899
SAMN28861234	Electric Organ (Brienomyrus brachyistius, not determined, SAMN28861234)	33,156,376	78%	25%	227,207
SAMN28861235	Electric Organ (Brienomyrus brachyistius, not determined, SAMN28861235)	27,009,804	79%	32%	233,030
SAMN28861236	Electric Organ (Brienomyrus brachyistius, not determined, SAMN28861236)	35,296,652	81%	38%	239,464
SAMN28861237	Flank Skin (Brienomyrus brachyistius, not determined, SAMN28861237)	35,996,916	82%	34%	215,342
SAMN28861238	Flank Skin (Brienomyrus brachyistius, not determined, SAMN28861238)	30,124,462	81%	37%	224,571
SAMN28861239	Flank Skin (Brienomyrus brachyistius, not determined, SAMN28861239)	36,672,954	82%	37%	240,127
SAMN28861240	FlankSkin (Brienomyrus brachyistius, not determined, SAMN28861240)	57,880,062	89%	29%	242,965
SAMN28861241	FlankSkin (Brienomyrus brachyistius, not determined, SAMN28861241)	51,183,292	89%	29%	246,473
SAMN28861242	Gills (Brienomyrus brachyistius, not determined, SAMN28861242)	25,341,438	82%	30%	218,014
SAMN28861243	Gills (Brienomyrus brachyistius, not determined, SAMN28861243)	32,057,410	78%	35%	259,017
SAMN28861244	Gills (Brienomyrus brachyistius, not determined, SAMN28861244)	35,922,364	80%	35%	253,886
SAMN28861245	Gonad (Brienomyrus brachyistius, not determined, SAMN28861245)	28,406,622	73%	33%	286,180
SAMN28861246	Head Skin (Brienomyrus brachyistius, not determined, SAMN28861246)	31,063,660	81%	34%	210,888
SAMN28861247	Head Skin (Brienomyrus brachyistius, not determined, SAMN28861247)	26,364,334	80%	36%	205,705
SAMN28861248	Head Skin (Brienomyrus brachyistius, not determined, SAMN28861248)	35,268,172	82%	36%	251,561
SAMN28861249	HeadSkin (Brienomyrus brachyistius, not determined, SAMN28861249)	102,278,134	90%	28%	243,087
SAMN28861250	HeadSkin (Brienomyrus brachyistius, not determined, SAMN28861250)	77,130,994	90%	27%	238,469
SAMN28861251	Heart (Brienomyrus brachyistius, not determined, SAMN28861251)	27,162,356	79%	27%	207,349
SAMN28861252	Heart (Brienomyrus brachyistius, not determined, SAMN28861252)	26,544,986	79%	36%	220,474
SAMN28861253	Heart (Brienomyrus brachyistius, not determined, SAMN28861253)	25,538,166	80%	36%	230,075
SAMN28861254	Kidney (Brienomyrus brachyistius, not determined, SAMN28861254)	29,809,262	80%	36%	229,849
SAMN28861255	Kidney (Brienomyrus brachyistius, not determined, SAMN28861255)	27,420,252	81%	37%	241,977
SAMN28861256	Knollenorgan (Brienomyrus brachyistius, not determined, SAMN28861256)	45,882,578	66%	12%	193,338
SAMN28861257	Knollenorgan (Brienomyrus brachyistius, not determined, SAMN28861257)	61,066,006	66%	9%	159,591
SAMN28861258	Knollenorgan (Brienomyrus brachyistius, not determined, SAMN28861258)	48,223,060	28%	12%	152,808
SAMN28861259	Liver (Brienomyrus brachyistius, not determined, SAMN28861259)	29,395,272	77%	40%	186,870
SAMN28861260	Liver (Brienomyrus brachyistius, not determined, SAMN28861260)	27,200,188	83%	41%	184,861
SAMN28861261	Mormyromast (Brienomyrus brachyistius, not determined, SAMN28861261)	53,441,922	60%	14%	185,386
SAMN28861262	Mormyromast (Brienomyrus brachyistius, not determined, SAMN28861262)	48,641,364	23%	12%	140,483
SAMN28861263	Mormyromast (Brienomyrus brachyistius, not determined, SAMN28861263)	52,827,104	77%	13%	201,153
SAMN28861264	Olfactory Epithelium (Brienomyrus brachyistius, not determined, SAMN28861264)	161,056,298	64%	35%	238,892
SAMN28861265	Skeletal Muscle (Brienomyrus brachyistius, not determined, SAMN28861265)	24,646,782	82%	51%	151,709
SAMN28861266	Skeletal Muscle (Brienomyrus brachyistius, not determined, SAMN28861266)	28,197,942	83%	52%	173,596
SAMN28861267	Skeletal Muscle (Brienomyrus brachyistius, not determined, SAMN28861267)	28,885,150	82%	47%	195,571
SAMN28861268	Skin (Brienomyrus brachyistius, not determined, SAMN28861268)	91,942,960	53%	14%	221,266
SAMN28861269	Skin (Brienomyrus brachyistius, not determined, SAMN28861269)	71,641,872	67%	13%	218,361
SAMN28861270	Skin (Brienomyrus brachyistius, not determined, SAMN28861270)	52,396,248	79%	14%	221,830
SAMN28861271	Spleen (Brienomyrus brachyistius, not determined, SAMN28861271)	27,758,874	78%	34%	215,759
SAMN28861272	Spleen (Brienomyrus brachyistius, not determined, SAMN28861272)	29,695,638	78%	35%	222,520
SAMN28861273	Spleen (Brienomyrus brachyistius, not determined, SAMN28861273)	29,805,644	84%	42%	183,785
SAMN28861274	Swim Bladder (Brienomyrus brachyistius, not determined, SAMN28861274)	40,790,114	78%	48%	183,308
SAMN28861275	Whole Brain (Brienomyrus brachyistius, not determined, SAMN28861275)	30,292,836	83%	27%	234,059
SAMN28861276	Whole Brain (Brienomyrus brachyistius, not determined, SAMN28861276)	19,905,362	74%	32%	213,836
SAMN28861277	Whole Brain (Brienomyrus brachyistius, not determined, SAMN28861277)	23,449,934	80%	31%	228,441

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR1299496	SRX553135	SRP042260	SAMN02799596	53,865,139	88%	24%
SRR1299518	SRX553136	SRP042260	SAMN02799597	26,591,453	89%	36%
SRR1342268	SRX573075	SRP042260	SAMN02800480	6,004,610	79%	24%
SRR19539507	SRX15591479	SRP378436	SAMN28861232	8,000,000	18%	12%
SRR19539505	SRX15591480	SRP378436	SAMN28861232	8,000,000	18%	12%
SRR19539504	SRX15591481	SRP378436	SAMN28861232	8,000,000	18%	12%
SRR19539503	SRX15591482	SRP378436	SAMN28861232	8,000,000	18%	11%
SRR19539502	SRX15591483	SRP378436	SAMN28861232	8,000,000	18%	11%
SRR19539501	SRX15591484	SRP378436	SAMN28861232	6,323,168	18%	11%
SRR19539390	SRX15591395	SRP378436	SAMN28861233	8,000,000	71%	14%
SRR19539389	SRX15591396	SRP378436	SAMN28861233	8,000,000	71%	14%
SRR19539388	SRX15591397	SRP378436	SAMN28861233	8,000,000	71%	14%
SRR19539387	SRX15591398	SRP378436	SAMN28861233	8,000,000	71%	14%
SRR19539386	SRX15591399	SRP378436	SAMN28861233	6,783,582	71%	14%
SRR19539491	SRX15591494	SRP378436	SAMN28861233	8,000,000	72%	14%
SRR19539490	SRX15591495	SRP378436	SAMN28861233	8,000,000	72%	14%
SRR19539411	SRX15591374	SRP378436	SAMN28861234	10,878,002	77%	25%
SRR19539379	SRX15591408	SRP378436	SAMN28861234	11,281,486	77%	25%
SRR19539479	SRX15591506	SRP378436	SAMN28861234	10,996,888	80%	26%
SRR19539336	SRX15591450	SRP378436	SAMN28861235	9,006,022	80%	33%
SRR19539335	SRX15591451	SRP378436	SAMN28861235	9,104,558	78%	31%
SRR19539334	SRX15591452	SRP378436	SAMN28861235	8,899,224	79%	32%
SRR19539409	SRX15591376	SRP378436	SAMN28861236	11,835,572	82%	39%
SRR19539408	SRX15591377	SRP378436	SAMN28861236	11,923,094	80%	38%
SRR19539407	SRX15591378	SRP378436	SAMN28861236	11,537,986	80%	38%
SRR19539402	SRX15591383	SRP378436	SAMN28861237	11,841,852	82%	34%
SRR19539373	SRX15591416	SRP378436	SAMN28861237	12,252,098	82%	34%
SRR19539424	SRX15591463	SRP378436	SAMN28861237	11,902,966	83%	35%
SRR19539339	SRX15591453	SRP378436	SAMN28861238	10,149,082	82%	37%
SRR19539333	SRX15591454	SRP378436	SAMN28861238	10,110,716	80%	36%
SRR19539340	SRX15591455	SRP378436	SAMN28861238	9,864,664	80%	36%
SRR19539331	SRX15591457	SRP378436	SAMN28861239	12,324,760	83%	38%
SRR19539330	SRX15591458	SRP378436	SAMN28861239	12,377,956	81%	37%
SRR19539329	SRX15591459	SRP378436	SAMN28861239	11,970,238	81%	37%
SRR19539420	SRX15591467	SRP378436	SAMN28861240	19,060,456	88%	29%
SRR19539419	SRX15591468	SRP378436	SAMN28861240	16,968,872	89%	30%
SRR19539418	SRX15591469	SRP378436	SAMN28861240	21,850,734	88%	29%
SRR19539510	SRX15591476	SRP378436	SAMN28861241	15,201,266	89%	29%
SRR19539509	SRX15591477	SRP378436	SAMN28861241	15,712,064	90%	29%
SRR19539508	SRX15591478	SRP378436	SAMN28861241	20,269,962	89%	29%
SRR19539398	SRX15591387	SRP378436	SAMN28861242	8,359,020	81%	29%
SRR19539512	SRX15591474	SRP378436	SAMN28861242	8,418,136	82%	30%
SRR19539500	SRX15591485	SRP378436	SAMN28861242	8,564,282	81%	29%
SRR19539332	SRX15591456	SRP378436	SAMN28861243	10,726,190	79%	35%
SRR19539481	SRX15591504	SRP378436	SAMN28861243	10,822,724	77%	34%
SRR19539480	SRX15591505	SRP378436	SAMN28861243	10,508,496	77%	34%
SRR19539328	SRX15591460	SRP378436	SAMN28861244	11,959,470	81%	35%
SRR19539426	SRX15591461	SRP378436	SAMN28861244	12,133,376	79%	34%
SRR19539425	SRX15591462	SRP378436	SAMN28861244	11,829,518	80%	34%
SRR19539478	SRX15591507	SRP378436	SAMN28861245	9,470,410	74%	34%
SRR19539477	SRX15591508	SRP378436	SAMN28861245	9,572,710	73%	33%
SRR19539476	SRX15591509	SRP378436	SAMN28861245	9,363,502	73%	33%
SRR19539383	SRX15591402	SRP378436	SAMN28861246	10,473,156	81%	34%
SRR19539348	SRX15591441	SRP378436	SAMN28861246	10,210,590	81%	34%
SRR19539493	SRX15591492	SRP378436	SAMN28861246	10,379,914	82%	35%
SRR19539381	SRX15591404	SRP378436	SAMN28861247	8,866,366	82%	37%
SRR19539341	SRX15591405	SRP378436	SAMN28861247	8,863,664	79%	35%
SRR19539380	SRX15591406	SRP378436	SAMN28861247	8,634,304	80%	35%
SRR19539471	SRX15591514	SRP378436	SAMN28861248	11,773,852	83%	37%
SRR19539470	SRX15591515	SRP378436	SAMN28861248	11,906,254	81%	35%
SRR19539469	SRX15591516	SRP378436	SAMN28861248	11,588,066	81%	36%
SRR19539399	SRX15591386	SRP378436	SAMN28861249	31,480,326	90%	28%
SRR19539422	SRX15591465	SRP378436	SAMN28861249	33,517,260	90%	29%
SRR19539421	SRX15591466	SRP378436	SAMN28861249	37,280,548	90%	29%
SRR19539417	SRX15591470	SRP378436	SAMN28861250	23,330,786	90%	27%
SRR19539416	SRX15591471	SRP378436	SAMN28861250	23,310,238	90%	27%
SRR19539511	SRX15591475	SRP378436	SAMN28861250	30,489,970	90%	27%
SRR19539360	SRX15591429	SRP378436	SAMN28861251	9,198,904	79%	27%
SRR19539357	SRX15591432	SRP378436	SAMN28861251	8,977,474	79%	27%
SRR19539454	SRX15591531	SRP378436	SAMN28861251	8,985,978	80%	28%
SRR19539382	SRX15591403	SRP378436	SAMN28861252	8,759,006	78%	35%
SRR19539475	SRX15591510	SRP378436	SAMN28861252	8,832,222	79%	36%
SRR19539474	SRX15591511	SRP378436	SAMN28861252	8,953,758	78%	35%
SRR19539423	SRX15591464	SRP378436	SAMN28861253	8,620,198	80%	36%
SRR19539473	SRX15591512	SRP378436	SAMN28861253	8,569,062	80%	35%
SRR19539472	SRX15591513	SRP378436	SAMN28861253	8,348,906	80%	35%
SRR19539515	SRX15591407	SRP378436	SAMN28861254	9,948,668	81%	36%
SRR19539378	SRX15591409	SRP378436	SAMN28861254	10,051,590	79%	35%
SRR19539516	SRX15591410	SRP378436	SAMN28861254	9,809,004	79%	35%
SRR19539468	SRX15591517	SRP378436	SAMN28861255	9,013,634	81%	37%
SRR19539467	SRX15591518	SRP378436	SAMN28861255	9,332,778	80%	36%
SRR19539466	SRX15591519	SRP378436	SAMN28861255	9,073,840	81%	36%
SRR19539499	SRX15591486	SRP378436	SAMN28861256	8,000,000	67%	12%
SRR19539498	SRX15591487	SRP378436	SAMN28861256	8,000,000	67%	12%
SRR19539465	SRX15591520	SRP378436	SAMN28861256	8,000,000	66%	12%
SRR19539464	SRX15591521	SRP378436	SAMN28861256	8,000,000	66%	12%
SRR19539463	SRX15591522	SRP378436	SAMN28861256	8,000,000	66%	12%
SRR19539462	SRX15591523	SRP378436	SAMN28861256	5,882,578	66%	12%
SRR19539397	SRX15591388	SRP378436	SAMN28861257	8,000,000	66%	9%
SRR19539396	SRX15591389	SRP378436	SAMN28861257	8,000,000	66%	9%
SRR19539395	SRX15591390	SRP378436	SAMN28861257	8,000,000	66%	9%
SRR19539394	SRX15591391	SRP378436	SAMN28861257	5,066,006	67%	9%
SRR19539461	SRX15591524	SRP378436	SAMN28861257	8,000,000	67%	9%
SRR19539460	SRX15591525	SRP378436	SAMN28861257	8,000,000	67%	9%
SRR19539459	SRX15591526	SRP378436	SAMN28861257	8,000,000	66%	9%
SRR19539458	SRX15591527	SRP378436	SAMN28861257	8,000,000	66%	9%
SRR19539385	SRX15591400	SRP378436	SAMN28861258	8,000,000	24%	12%
SRR19539384	SRX15591401	SRP378436	SAMN28861258	8,000,000	24%	12%
SRR19539441	SRX15591544	SRP378436	SAMN28861258	4,000,000	48%	12%
SRR19539440	SRX15591545	SRP378436	SAMN28861258	8,000,000	24%	12%
SRR19539439	SRX15591546	SRP378436	SAMN28861258	4,000,000	49%	12%
SRR19539438	SRX15591547	SRP378436	SAMN28861258	8,000,000	24%	12%
SRR19539437	SRX15591548	SRP378436	SAMN28861258	8,000,000	24%	11%
SRR19539436	SRX15591549	SRP378436	SAMN28861258	223,060	24%	12%
SRR19539484	SRX15591501	SRP378436	SAMN28861259	9,661,904	78%	41%
SRR19539482	SRX15591503	SRP378436	SAMN28861259	10,011,318	76%	39%
SRR19539433	SRX15591552	SRP378436	SAMN28861259	9,722,050	76%	39%
SRR19539374	SRX15591415	SRP378436	SAMN28861260	8,947,410	83%	42%
SRR19539372	SRX15591417	SRP378436	SAMN28861260	9,243,808	82%	41%
SRR19539371	SRX15591418	SRP378436	SAMN28861260	9,008,970	82%	41%
SRR19539393	SRX15591392	SRP378436	SAMN28861261	8,000,000	60%	14%
SRR19539392	SRX15591393	SRP378436	SAMN28861261	8,000,000	60%	14%
SRR19539391	SRX15591394	SRP378436	SAMN28861261	8,000,000	60%	14%
SRR19539457	SRX15591528	SRP378436	SAMN28861261	8,000,000	59%	14%
SRR19539456	SRX15591529	SRP378436	SAMN28861261	8,000,000	60%	14%
SRR19539455	SRX15591530	SRP378436	SAMN28861261	8,000,000	59%	14%
SRR19539453	SRX15591532	SRP378436	SAMN28861261	5,441,922	59%	14%
SRR19539358	SRX15591431	SRP378436	SAMN28861262	8,000,000	19%	12%
SRR19539356	SRX15591433	SRP378436	SAMN28861262	8,000,000	19%	12%
SRR19539355	SRX15591434	SRP378436	SAMN28861262	641,364	19%	12%
SRR19539446	SRX15591539	SRP378436	SAMN28861262	8,000,000	19%	12%
SRR19539445	SRX15591540	SRP378436	SAMN28861262	4,000,000	39%	12%
SRR19539444	SRX15591541	SRP378436	SAMN28861262	4,000,000	39%	12%
SRR19539443	SRX15591542	SRP378436	SAMN28861262	8,000,000	19%	12%
SRR19539442	SRX15591543	SRP378436	SAMN28861262	8,000,000	20%	11%
SRR19539350	SRX15591439	SRP378436	SAMN28861263	8,000,000	77%	13%
SRR19539349	SRX15591440	SRP378436	SAMN28861263	8,000,000	77%	13%
SRR19539347	SRX15591442	SRP378436	SAMN28861263	8,000,000	77%	13%
SRR19539346	SRX15591443	SRP378436	SAMN28861263	8,000,000	77%	13%
SRR19539345	SRX15591444	SRP378436	SAMN28861263	4,827,104	77%	13%
SRR19539435	SRX15591550	SRP378436	SAMN28861263	8,000,000	77%	13%
SRR19539434	SRX15591551	SRP378436	SAMN28861263	8,000,000	77%	13%
SRR19539514	SRX15591472	SRP378436	SAMN28861264	161,056,298	64%	35%
SRR19539432	SRX15591553	SRP378436	SAMN28861265	8,188,432	83%	51%
SRR19539431	SRX15591554	SRP378436	SAMN28861265	8,336,566	81%	50%
SRR19539430	SRX15591555	SRP378436	SAMN28861265	8,121,784	81%	50%
SRR19539377	SRX15591411	SRP378436	SAMN28861266	9,446,574	84%	53%
SRR19539376	SRX15591412	SRP378436	SAMN28861266	9,485,262	83%	52%
SRR19539375	SRX15591413	SRP378436	SAMN28861266	9,266,106	83%	52%
SRR19539406	SRX15591379	SRP378436	SAMN28861267	9,563,204	83%	47%
SRR19539405	SRX15591380	SRP378436	SAMN28861267	9,815,390	82%	46%
SRR19539404	SRX15591381	SRP378436	SAMN28861267	9,506,556	82%	46%
SRR19539366	SRX15591423	SRP378436	SAMN28861268	8,000,000	54%	14%
SRR19539365	SRX15591424	SRP378436	SAMN28861268	8,000,000	54%	14%
SRR19539364	SRX15591425	SRP378436	SAMN28861268	2,727,792	54%	13%
SRR19539363	SRX15591426	SRP378436	SAMN28861268	8,000,000	51%	13%
SRR19539362	SRX15591427	SRP378436	SAMN28861268	8,000,000	52%	14%
SRR19539361	SRX15591428	SRP378436	SAMN28861268	8,000,000	51%	14%
SRR19539359	SRX15591430	SRP378436	SAMN28861268	8,000,000	52%	14%
SRR19539452	SRX15591533	SRP378436	SAMN28861268	8,000,000	54%	14%
SRR19539451	SRX15591534	SRP378436	SAMN28861268	8,000,000	54%	14%
SRR19539450	SRX15591535	SRP378436	SAMN28861268	8,000,000	54%	14%
SRR19539449	SRX15591536	SRP378436	SAMN28861268	8,000,000	52%	14%
SRR19539448	SRX15591537	SRP378436	SAMN28861268	8,000,000	51%	13%
SRR19539447	SRX15591538	SRP378436	SAMN28861268	1,215,168	53%	14%
SRR19539354	SRX15591435	SRP378436	SAMN28861269	8,000,000	67%	13%
SRR19539353	SRX15591436	SRP378436	SAMN28861269	8,000,000	67%	13%
SRR19539352	SRX15591437	SRP378436	SAMN28861269	8,000,000	67%	13%
SRR19539351	SRX15591438	SRP378436	SAMN28861269	8,000,000	67%	13%
SRR19539497	SRX15591488	SRP378436	SAMN28861269	8,000,000	67%	13%
SRR19539496	SRX15591489	SRP378436	SAMN28861269	8,000,000	67%	13%
SRR19539495	SRX15591490	SRP378436	SAMN28861269	8,000,000	67%	13%
SRR19539494	SRX15591491	SRP378436	SAMN28861269	8,000,000	67%	13%
SRR19539492	SRX15591493	SRP378436	SAMN28861269	7,641,872	67%	13%
SRR19539344	SRX15591445	SRP378436	SAMN28861270	8,000,000	79%	14%
SRR19539343	SRX15591446	SRP378436	SAMN28861270	8,000,000	79%	14%
SRR19539489	SRX15591496	SRP378436	SAMN28861270	8,000,000	79%	14%
SRR19539488	SRX15591497	SRP378436	SAMN28861270	8,000,000	79%	14%
SRR19539487	SRX15591498	SRP378436	SAMN28861270	8,000,000	79%	14%
SRR19539486	SRX15591499	SRP378436	SAMN28861270	8,000,000	79%	14%
SRR19539485	SRX15591500	SRP378436	SAMN28861270	4,396,248	79%	14%
SRR19539429	SRX15591556	SRP378436	SAMN28861271	9,128,804	79%	34%
SRR19539428	SRX15591557	SRP378436	SAMN28861271	9,421,962	77%	33%
SRR19539427	SRX15591558	SRP378436	SAMN28861271	9,208,108	78%	33%
SRR19539414	SRX15591370	SRP378436	SAMN28861272	9,988,332	78%	34%
SRR19539506	SRX15591371	SRP378436	SAMN28861272	9,726,398	78%	34%
SRR19539517	SRX15591414	SRP378436	SAMN28861272	9,980,908	78%	35%
SRR19539403	SRX15591382	SRP378436	SAMN28861273	9,848,826	85%	43%
SRR19539401	SRX15591384	SRP378436	SAMN28861273	10,145,752	83%	42%
SRR19539400	SRX15591385	SRP378436	SAMN28861273	9,811,066	83%	42%
SRR19539370	SRX15591419	SRP378436	SAMN28861274	36,998,206	80%	47%
SRR19539369	SRX15591420	SRP378436	SAMN28861274	1,225,514	66%	51%
SRR19539368	SRX15591421	SRP378436	SAMN28861274	1,296,862	66%	50%
SRR19539367	SRX15591422	SRP378436	SAMN28861274	1,269,532	66%	51%
SRR19539338	SRX15591448	SRP378436	SAMN28861275	9,986,588	82%	27%
SRR19539513	SRX15591473	SRP378436	SAMN28861275	10,015,978	83%	27%
SRR19539483	SRX15591502	SRP378436	SAMN28861275	10,290,270	82%	26%
SRR19539415	SRX15591369	SRP378436	SAMN28861276	6,477,016	74%	32%
SRR19539342	SRX15591447	SRP378436	SAMN28861276	6,765,410	74%	31%
SRR19539337	SRX15591449	SRP378436	SAMN28861276	6,662,936	74%	31%
SRR19539413	SRX15591372	SRP378436	SAMN28861277	7,881,510	80%	31%
SRR19539412	SRX15591373	SRP378436	SAMN28861277	7,863,174	80%	30%
SRR19539410	SRX15591375	SRP378436	SAMN28861277	7,705,250	80%	30%
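
The per-run rows can be rolled up to the per-sample rows shown earlier. The sketch below does this for sample SAMN28861258 (Knollenorgan): it sums the read counts of its eight runs and takes a read-weighted mean of the per-run alignment percentages. The read-weighted mean is an assumption about how the sample-level figure is derived; the report does not describe the calculation, and the per-run percentages are already rounded, so the result is only approximate.

```python
# (run accession, number of reads, percent aligned reads) for SAMN28861258,
# copied from the per-run table above.
runs = [
    ("SRR19539385", 8_000_000, 24),
    ("SRR19539384", 8_000_000, 24),
    ("SRR19539441", 4_000_000, 48),
    ("SRR19539440", 8_000_000, 24),
    ("SRR19539439", 4_000_000, 49),
    ("SRR19539438", 8_000_000, 24),
    ("SRR19539437", 8_000_000, 24),
    ("SRR19539436", 223_060, 24),
]

# Total reads for the sample is the sum over its runs.
total_reads = sum(reads for _, reads, _ in runs)

# Assumed aggregation: weight each run's percentage by its read count.
weighted_pct_aligned = sum(reads * pct for _, reads, pct in runs) / total_reads

print(f"total reads: {total_reads:,}")
print(f"read-weighted percent aligned: {weighted_pct_aligned:.1f}%")
```

For this sample, the summed read count equals the 48,223,060 reads listed in the per-sample table, and the weighted percentage rounds to the 28% reported there.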

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Scleropages formosus high-quality model RefSeq (XP_)	17,879	17,624 (98.57%)	17,624 (98.57%)	70.99%	79.92%
Cynoglossus semilaevis high-quality model RefSeq (XP_)	14,331	13,947 (97.32%)	13,947 (97.32%)	68.32%	76.46%
Poecilia formosa high-quality model RefSeq (XP_)	18,503	17,811 (96.26%)	17,811 (96.26%)	67.55%	75.40%
Actinopterygii GenBank	90,078	84,653 (93.98%)	84,653 (93.98%)	69.13%	80.31%
Actinopterygii known RefSeq (NP_)	25,472	24,056 (94.44%)	24,056 (94.44%)	68.71%	79.08%
Danio rerio high-quality model RefSeq (XP_)	7,717	7,338 (95.09%)	7,338 (95.09%)	66.50%	74.10%
Astyanax mexicanus high-quality model RefSeq (XP_)	16,692	16,202 (97.06%)	16,202 (97.06%)	67.46%	77.37%
Esox lucius high-quality model RefSeq (XP_)	18,508	17,867 (96.54%)	17,867 (96.54%)	68.16%	76.58%
Homo sapiens known RefSeq (NP_)	65,282	55,236 (84.61%)	55,236 (84.61%)	67.83%	71.72%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
