NCBI Megalobrama amblycephala Annotation Release 100

The RefSeq genome records for Megalobrama amblycephala were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Megalobrama amblycephala Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: May 6 2022
Date of submission of annotation to the public databases: May 13 2022
Software version: 9.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM1881202v1	GCF_018812025.1	Huazhong Agricultural University	06-09-2021	Reference	25 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	ASM1881202v1
Genes and pseudogenes	38,087
  protein-coding	29,877
  non-coding	6,223
  Transcribed pseudogenes	0
  Non-transcribed pseudogenes	1,233
  genes with variants	13,166
  Immunoglobulin/T-cell receptor gene segments	743
  other	11
mRNAs	59,914
  fully-supported	58,262
  with > 5% ab initio	840
  partial	189
  with filled gap(s)	0
  known RefSeq (NM_)	0
  model RefSeq (XM_)	59,914
non-coding RNAs	9,608
  fully-supported	8,359
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	8,854
pseudo transcripts	0
  fully-supported	0
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	0
CDSs	60,670
  fully-supported	58,262
  with > 5% ab initio	966
  partial	206
  with major correction(s)	1,461
  known RefSeq (NP_)	0
  model RefSeq (XP_)	59,927

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	36,111	20,237	8,377	54	1,350,402
All transcripts	69,522	3,239	2,572	54	92,962
  mRNA	59,914	3,508	2,803	212	92,962
  misc_RNA	1,545	2,949	2,499	164	20,645
  tRNA	752	75	73	68	87
  lncRNA	6,816	1,513	1,134	106	13,643
  snoRNA	218	118	106	63	320
  snRNA	96	135	141	54	200
  rRNA	170	133	119	118	1,692
Single-exon transcripts	1,278	1,931	1,639	315	12,441
  coding transcripts (NM_/XM_)	1,278	1,931	1,639	315	12,441
CDSs	59,927	2,161	1,506	99	91,683
Exons	352,246	310	143	1	47,416
  in coding transcripts (NM_/XM_)	329,805	303	143	1	47,416
  in non-coding transcripts (NR_/XR_)	32,230	347	146	2	19,982
Introns	310,418	2,552	494	30	1,179,556
  in coding transcripts (NM_/XM_)	294,785	2,503	487	30	1,179,556
  in non-coding transcripts (NR_/XR_)	25,224	3,181	595	30	454,412

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.94	1	1	50
Number of exons per transcript	12.19	8	1	233

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein of each gene and the actinopterygii_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation.
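The step of keeping one longest protein per gene can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the function name, the record layout (gene ID, protein ID, sequence) and the example identifiers are all hypothetical, and ties are resolved by keeping the first protein seen, which is an assumption.

```python
from typing import Dict, Iterable, Tuple


def longest_protein_per_gene(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Map each gene ID to the (protein ID, sequence) of its longest protein.

    `records` yields (gene_id, protein_id, sequence) tuples. When two
    proteins of a gene have the same length, the first one seen is kept
    (an assumption made for this sketch).
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, protein_id, sequence in records:
        current = best.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (protein_id, sequence)
    return best


# Hypothetical toy input: two isoforms for geneA, one for geneB.
records = [
    ("geneA", "protA.1", "MKT"),
    ("geneA", "protA.2", "MKTLLV"),
    ("geneB", "protB.1", "MSS"),
]
selected = longest_protein_per_gene(records)
```

The resulting set, one protein per gene, is the kind of input that a protein-mode completeness assessment would then be run on.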

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 29,864 coding genes, 26,080 genes had a protein with an alignment covering 50% or more of the query and 12,710 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
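As a worked illustration of these two definitions, the sketch below computes both coverages for a single alignment. It assumes one contiguous aligned span given in 1-based inclusive coordinates; the report does not state how gaps or multiple alignment segments are handled, and the function name and example numbers are hypothetical.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence included in an alignment.

    Coordinates are 1-based and inclusive, and the aligned region is
    treated as one contiguous span (an assumption of this sketch).
    """
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: residues 11-210 of a 400-residue annotated protein
# (query) aligned to residues 1-200 of a 250-residue curated protein (target).
query_cov = coverage_percent(11, 210, 400)
target_cov = coverage_percent(1, 200, 250)
```

In this example the query coverage is 50% and the target coverage is 80%, so the same alignment can sit on opposite sides of a threshold depending on which coverage is considered.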

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins
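The cumulative counts plotted in such a graph can be sketched as follows: for each coverage threshold, count the genes whose best alignment reaches that threshold. The gene names and coverage values are hypothetical, and taking one best coverage value per gene is an assumption of this sketch.

```python
from typing import Dict, Iterable


def genes_at_or_above(coverages: Iterable[float], threshold: float) -> int:
    """Count genes whose coverage is greater than or equal to the threshold."""
    return sum(1 for coverage in coverages if coverage >= threshold)


# Hypothetical best query coverage (in percent) for four genes.
best_query_coverage: Dict[str, float] = {
    "geneA": 100.0,
    "geneB": 96.5,
    "geneC": 62.0,
    "geneD": 12.0,
}

thresholds = [50, 95]
counts = {
    t: genes_at_or_above(best_query_coverage.values(), t) for t in thresholds
}
```

With these toy values, three genes reach 50% coverage and two reach 95%, which mirrors the form of the "50% or more" and "95% or more" figures reported for the annotated gene set.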

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
ASM1881202v1	GCF_018812025.1	42.33%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign, minimap2, or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	405	403 (99.51%)	330 (81.48%)	99.55%	98.46%
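The percentages in this row are consistent with each count being divided by the number of sequences retrieved from Entrez. The short check below reproduces them under that reading; the function name is hypothetical and rounding to two decimal places is assumed.

```python
def percent_of_retrieved(count: int, retrieved: int) -> float:
    """Express `count` as a percentage of `retrieved`, rounded to two decimals."""
    return round(100.0 * count / retrieved, 2)


retrieved = 405  # sequences retrieved from Entrez
aligned = percent_of_retrieved(403, retrieved)  # sequences aligned by Splign
passed = percent_of_retrieved(330, retrieved)   # sequences passed to Gnomon
```

Under this reading, both percentages share the same denominator, so the "passed to Gnomon" figure is not a fraction of the aligned sequences.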

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	5,608,463,871	83%	42%	380,392
SAMN00715909	NA	An EST-based genome scan using 454 sequencing in the blunt snout bream Megalobrama amblycephala and large scale gene-associated marker development for its whole genome association studies (Megalobrama amblycephala, SAMN00715909)	1,409,706	78%	55%	73,127
SAMN03104782	NA	blood, liver, gill, intestine, spleen and kidney (Megalobrama amblycephala, 1, SAMN03104782)	48,046,044	86%	30%	205,991
SAMN03104783	NA	Blood, liver, gill, intestine, spleen and kidney (Megalobrama amblycephala, 1, SAMN03104783)	47,529,384	86%	31%	215,633
SAMN03108594	NA	Connective tissue (Megalobrama amblycephala, 1 years old, SAMN03108594)	39,073,700	85%	33%	134,772
SAMN03112591	NA	intermuscular bone (Megalobrama amblycephala, 1 year old, SAMN03112591)	38,718,120	85%	33%	138,422
SAMN03253152	NA	liver and gill (Megalobrama amblycephala, 120d, pooled male and female, SAMN03253152)	99,644,426	90%	34%	265,395
SAMN03253154	NA	liver and gill (Megalobrama amblycephala, 120d, pooled male and female, SAMN03253154)	104,244,002	89%	33%	264,450
SAMN03253166	NA	liver and gill (Megalobrama amblycephala, 120d, pooled male and female, SAMN03253166)	114,696,020	90%	31%	259,459
SAMN03253171	NA	liver and gill (Megalobrama amblycephala, 120d, pooled male and female, SAMN03253171)	99,745,432	87%	31%	261,737
SAMN04318424	NA	liver (Megalobrama amblycephala, SAMN04318424)	67,477,436	83%	49%	184,029
SAMN04318425	NA	liver (Megalobrama amblycephala, SAMN04318425)	65,222,674	84%	49%	179,654
SAMN05727044	NA	SSMT2, liver, control (Megalobrama amblycephala, SAMN05727044)	65,222,674	84%	49%	179,654
SAMN05727045	NA	SSMT1, liver, heat treatment (Megalobrama amblycephala, SAMN05727045)	67,477,436	83%	49%	184,029
SAMN06887734	34003267	Whole body 1 (Megalobrama amblycephala, not collected, SAMN06887734)	60,936,140	86%	22%	250,047
SAMN06887735	34003267	Whole body 2 (Megalobrama amblycephala, not collected, SAMN06887735)	113,030,060	88%	25%	264,516
SAMN06887736	34003267	muscle tissue with partial distribution of IB 1 (Megalobrama amblycephala, not collected, SAMN06887736)	69,841,506	91%	33%	197,148
SAMN06887737	34003267	muscle tissue with partial distribution of IB 2 (Megalobrama amblycephala, not collected, SAMN06887737)	55,908,166	89%	35%	194,224
SAMN06887738	34003267	muscle tissue with completed distribution of IB 1 (Megalobrama amblycephala, not collected, SAMN06887738)	64,855,446	90%	38%	180,320
SAMN06887739	34003267	muscle tissue with completed distribution of IB 2 (Megalobrama amblycephala, not collected, SAMN06887739)	61,618,862	92%	36%	168,402
SAMN07285688	NA	liver (Megalobrama amblycephala, six month old, not collected, SAMN07285688)	51,674,516	88%	50%	148,777
SAMN07285689	NA	liver (Megalobrama amblycephala, six month old, not collected, SAMN07285689)	52,931,508	88%	54%	156,286
SAMN07285690	NA	liver (Megalobrama amblycephala, six month old, not collected, SAMN07285690)	40,929,414	85%	44%	161,778
SAMN07285960	NA	liver (Megalobrama amblycephala, 1 year, SAMN07285960)	21,055,461	83%	46%	123,446
SAMN07285961	NA	liver (Megalobrama amblycephala, 2 year, SAMN07285961)	25,809,396	87%	51%	132,284
SAMN07285962	NA	liver (Megalobrama amblycephala, 3 year, SAMN07285962)	20,380,287	89%	49%	129,395
SAMN07285963	NA	liver (Megalobrama amblycephala, 4 year, SAMN07285963)	16,997,735	88%	53%	136,878
SAMN07285964	NA	liver (Megalobrama amblycephala, 5 year, SAMN07285964)	20,125,647	89%	51%	140,082
SAMN07285965	NA	liver (Megalobrama amblycephala, 6 year, SAMN07285965)	20,435,825	88%	52%	144,831
SAMN07285966	NA	liver (Megalobrama amblycephala, 7 year, SAMN07285966)	20,193,999	87%	51%	146,799
SAMN07285967	NA	liver (Megalobrama amblycephala, 8 year, SAMN07285967)	19,703,408	88%	51%	147,293
SAMN07285968	NA	liver (Megalobrama amblycephala, 9 year, SAMN07285968)	19,613,412	88%	52%	147,157
SAMN07285969	NA	liver (Megalobrama amblycephala, 10 year, SAMN07285969)	20,058,243	88%	53%	151,569
SAMN07285970	NA	liver (Megalobrama amblycephala, 11 year, SAMN07285970)	19,812,931	88%	53%	149,607
SAMN07285971	NA	liver (Megalobrama amblycephala, 12 year, SAMN07285971)	18,991,663	89%	51%	140,417
SAMN07508548	NA	testis (Megalobrama amblycephala, two years old, male, SAMN07508548)	167,173,678	81%	49%	337,462
SAMN07821248	NA	testis (Megalobrama amblycephala, 2 year old, male, SAMN07821248)	86,359,184	80%	48%	319,826
SAMN08449832	NA	Testis (Megalobrama amblycephala, 2 years old, male, SAMN08449832)	184,176,060	77%	38%	329,745
SAMN08449833	NA	Testis (Megalobrama amblycephala, 2 years old, male, SAMN08449833)	136,847,266	63%	45%	257,229
SAMN10457792	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457792)	70,457,946	83%	50%	178,110
SAMN10457793	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457793)	62,514,254	80%	47%	177,079
SAMN10457794	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457794)	58,699,824	81%	48%	174,399
SAMN10457795	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457795)	76,489,096	84%	47%	182,603
SAMN10457796	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457796)	53,392,976	82%	48%	173,041
SAMN10457797	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457797)	68,031,970	82%	48%	176,418
SAMN10457798	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457798)	72,908,956	83%	45%	185,057
SAMN10457799	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457799)	65,216,014	82%	46%	179,773
SAMN10457800	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457800)	56,875,996	83%	46%	178,129
SAMN10457801	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457801)	55,755,916	76%	47%	173,114
SAMN10457802	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457802)	72,668,082	82%	48%	182,984
SAMN10457803	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457803)	52,527,004	82%	49%	171,155
SAMN10457804	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457804)	33,169,392	80%	47%	168,578
SAMN10457805	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457805)	60,449,830	79%	48%	176,085
SAMN10457806	NA	adult, primary head kidney macrophages, macrophage (Megalobrama amblycephala, 1 year old, SAMN10457806)	51,651,062	80%	49%	171,704
SAMN11316752	NA	brain, eyes, heart, liver, spleen, kidney, intestines, skin, muscle, gill (Megalobrama amblycephala, adult, pooled male and female, SAMN11316752)	104,013,838	40%	4%	155,383
SAMN14380551	NA	skin and scale (Megalobrama amblycephala, SAMN14380551)	41,998,038	88%	51%	228,867
SAMN14380552	NA	skin and scale (Megalobrama amblycephala, SAMN14380552)	42,545,926	88%	51%	232,435
SAMN14380553	NA	skin and scale (Megalobrama amblycephala, SAMN14380553)	42,378,644	89%	52%	225,209
SAMN14380554	NA	skin (Megalobrama amblycephala, SAMN14380554)	42,530,792	90%	54%	232,920
SAMN14380555	NA	skin (Megalobrama amblycephala, SAMN14380555)	42,498,970	90%	61%	194,835
SAMN14380556	NA	skin (Megalobrama amblycephala, SAMN14380556)	42,515,172	93%	58%	119,440
SAMN15332765	NA	intermuscular bones (Megalobrama amblycephala, six-months-old, female and male, SAMN15332765)	275,611,594	86%	36%	259,458
SAMN15332766	NA	intermuscular bones (Megalobrama amblycephala, three-years-old, female and male, SAMN15332766)	286,395,870	84%	38%	264,425
SAMN16392964	NA	muscle (Megalobrama amblycephala, 1 years old, SAMN16392964)	298,971,224	87%	44%	242,503
SAMN16392965	NA	muscle (Megalobrama amblycephala, 1 years old, male, SAMN16392965)	272,222,610	87%	43%	248,322
SAMN18522316	34003267	blood (Megalobrama amblycephala, adult, female, SAMN18522316)	524,568,274	82%	34%	331,879
SAMN24389518	NA	unfertilized eggs (Megalobrama amblycephala, not collected, female, SAMN24389518)	34,043,058	83%	49%	164,188
SAMN24389519	NA	unfertilized eggs (Megalobrama amblycephala, not collected, female, SAMN24389519)	28,798,300	74%	51%	155,018
SAMN24389520	NA	unfertilized eggs (Megalobrama amblycephala, not collected, female, SAMN24389520)	48,625,498	86%	51%	175,750
SAMN24389521	NA	water-swelled eggs (Megalobrama amblycephala, not collected, female, SAMN24389521)	17,679,226	72%	49%	137,410
SAMN24389522	NA	water-swelled eggs (Megalobrama amblycephala, not collected, female, SAMN24389522)	23,887,620	74%	47%	141,856
SAMN24389523	NA	water-swelled eggs (Megalobrama amblycephala, not collected, female, SAMN24389523)	27,965,418	88%	51%	155,417
SAMN24389524	NA	newly fertilized eggs (Megalobrama amblycephala, not collected, female, SAMN24389524)	24,055,792	87%	50%	154,319
SAMN24389525	NA	newly fertilized eggs (Megalobrama amblycephala, not collected, female, SAMN24389525)	17,093,184	71%	51%	135,735
SAMN24389526	NA	newly fertilized eggs (Megalobrama amblycephala, not collected, female, SAMN24389526)	20,211,186	85%	50%	149,138
SAMN24389527	NA	the fertilized eggs for half an hour (Megalobrama amblycephala, not collected, female, SAMN24389527)	27,963,364	89%	51%	156,250
SAMN24389528	NA	the fertilized eggs for half an hour (Megalobrama amblycephala, not collected, female, SAMN24389528)	39,633,668	87%	48%	157,031
SAMN24389529	NA	the fertilized eggs for half an hour (Megalobrama amblycephala, not collected, female, SAMN24389529)	33,977,096	89%	52%	157,540
SAMN24389530	NA	the fertilized eggs for one hour (Megalobrama amblycephala, not collected, female, SAMN24389530)	29,016,738	88%	51%	162,880
SAMN24389531	NA	the fertilized eggs for one hour (Megalobrama amblycephala, not collected, female, SAMN24389531)	31,034,180	89%	52%	158,924
SAMN24389532	NA	the fertilized eggs for one hour (Megalobrama amblycephala, not collected, female, SAMN24389532)	29,453,406	90%	52%	159,676

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR363809	SRX096134	SRP008107	SAMN00715909	1,409,706	78%	55%
SRR1610771	SRX731259	SRP048850	SAMN03104782	48,046,044	86%	30%
SRR1611584	SRX740764	SRP048850	SAMN03104783	47,529,384	86%	31%
SRR1612557	SRX732923	SRP048932	SAMN03108594	39,073,700	85%	33%
SRR1613326	SRX733625	SRP048956	SAMN03112591	38,718,120	85%	33%
SRR1731587	SRX810947	SRP050593	SAMN03253152	99,644,426	90%	34%
SRR1731588	SRX810948	SRP050593	SAMN03253154	104,244,002	89%	33%
SRR1731589	SRX810949	SRP050593	SAMN03253166	114,696,020	90%	31%
SRR1731590	SRX810950	SRP050593	SAMN03253171	99,745,432	87%	31%
SRR2976392	SRX1465892	SRP067072	SAMN04318424	67,477,436	83%	49%
SRR2976393	SRX1465893	SRP067072	SAMN04318425	65,222,674	84%	49%
SRR4125068	SRX2107047	SRP084356	SAMN05727044	65,222,674	84%	49%
SRR4125067	SRX2107046	SRP084356	SAMN05727045	67,477,436	83%	49%
SRR5512263	SRX2786233	SRP090157	SAMN06887734	60,936,140	86%	22%
SRR5512264	SRX2786234	SRP090157	SAMN06887735	113,030,060	88%	25%
SRR5512253	SRX2786223	SRP090157	SAMN06887736	69,841,506	91%	33%
SRR5512260	SRX2786224	SRP090157	SAMN06887737	55,908,166	89%	35%
SRR5512261	SRX2786231	SRP090157	SAMN06887738	64,855,446	90%	38%
SRR5512262	SRX2786232	SRP090157	SAMN06887739	61,618,862	92%	36%
SRR14126287	SRX10496512	SRP090157	SAMN18522316	43,976,108	82%	36%
SRR14126286	SRX10496513	SRP090157	SAMN18522316	43,452,806	82%	37%
SRR14126285	SRX10496514	SRP090157	SAMN18522316	44,264,314	81%	34%
SRR14126284	SRX10496515	SRP090157	SAMN18522316	44,375,498	81%	35%
SRR14126283	SRX10496516	SRP090157	SAMN18522316	43,102,190	82%	37%
SRR14126282	SRX10496517	SRP090157	SAMN18522316	42,721,288	82%	28%
SRR14126281	SRX10496518	SRP090157	SAMN18522316	43,603,734	81%	30%
SRR14126280	SRX10496519	SRP090157	SAMN18522316	44,196,366	80%	31%
SRR14126279	SRX10496520	SRP090157	SAMN18522316	43,069,714	83%	35%
SRR14126278	SRX10496521	SRP090157	SAMN18522316	44,290,172	83%	35%
SRR14126277	SRX10496522	SRP090157	SAMN18522316	43,023,710	83%	34%
SRR14126276	SRX10496523	SRP090157	SAMN18522316	44,492,374	81%	35%
SRR5763110	SRX2962787	SRP110651	SAMN07285960	21,055,461	83%	46%
SRR5763111	SRX2962786	SRP110651	SAMN07285961	25,809,396	87%	51%
SRR5763112	SRX2962785	SRP110651	SAMN07285962	20,380,287	89%	49%
SRR5763113	SRX2962784	SRP110651	SAMN07285963	16,997,735	88%	53%
SRR5763106	SRX2962791	SRP110651	SAMN07285964	20,125,647	89%	51%
SRR5763109	SRX2962788	SRP110651	SAMN07285965	20,435,825	88%	52%
SRR5763116	SRX2962781	SRP110651	SAMN07285966	20,193,999	87%	51%
SRR5763117	SRX2962780	SRP110651	SAMN07285967	19,703,408	88%	51%
SRR5763107	SRX2962790	SRP110651	SAMN07285968	19,613,412	88%	52%
SRR5763108	SRX2962789	SRP110651	SAMN07285969	20,058,243	88%	53%
SRR5763114	SRX2962783	SRP110651	SAMN07285970	19,812,931	88%	53%
SRR5763115	SRX2962782	SRP110651	SAMN07285971	18,991,663	89%	51%
SRR5763135	SRX2962810	SRP110655	SAMN07285688	51,674,516	88%	50%
SRR5763136	SRX2962809	SRP110655	SAMN07285689	52,931,508	88%	54%
SRR5763134	SRX2962811	SRP110655	SAMN07285690	40,929,414	85%	44%
SRR6201717	SRX3311643	SRP120730	SAMN07821248	86,359,184	80%	48%
SRR6201736	SRX3311664	SRP120749	SAMN07508548	86,585,268	86%	47%
SRR6228045	SRX3311664	SRP120749	SAMN07508548	80,588,410	76%	50%
SRR6660866	SRX3637963	SRP131937	SAMN08449832	80,588,410	76%	50%
SRR6660978	SRX3638073	SRP131937	SAMN08449833	41,894,952	65%	57%
SRR6660977	SRX3638074	SRP131937	SAMN08449833	40,109,404	64%	56%
SRR6660976	SRX3638075	SRP131937	SAMN08449833	54,842,910	60%	25%
SRR6661078	SRX3638173	SRP131951	SAMN08449832	52,091,274	77%	29%
SRR6661077	SRX3638174	SRP131951	SAMN08449832	51,496,376	79%	27%
SRR8224158	SRX5042670	SRP169988	SAMN10457792	70,457,946	83%	50%
SRR8224157	SRX5042671	SRP169988	SAMN10457793	62,514,254	80%	47%
SRR8224156	SRX5042672	SRP169988	SAMN10457794	58,699,824	81%	48%
SRR8224155	SRX5042673	SRP169988	SAMN10457795	76,489,096	84%	47%
SRR8224154	SRX5042674	SRP169988	SAMN10457796	53,392,976	82%	48%
SRR8224153	SRX5042675	SRP169988	SAMN10457797	68,031,970	82%	48%
SRR8224152	SRX5042676	SRP169988	SAMN10457798	72,908,956	83%	45%
SRR8224151	SRX5042677	SRP169988	SAMN10457799	65,216,014	82%	46%
SRR8224160	SRX5042668	SRP169988	SAMN10457800	56,875,996	83%	46%
SRR8224159	SRX5042669	SRP169988	SAMN10457801	55,755,916	76%	47%
SRR8224163	SRX5042665	SRP169988	SAMN10457802	72,668,082	82%	48%
SRR8224162	SRX5042666	SRP169988	SAMN10457803	52,527,004	82%	49%
SRR8224165	SRX5042663	SRP169988	SAMN10457804	33,169,392	80%	47%
SRR8224164	SRX5042664	SRP169988	SAMN10457805	60,449,830	79%	48%
SRR8224161	SRX5042667	SRP169988	SAMN10457806	51,651,062	80%	49%
SRR8916529	SRX5697863	SRP192781	SAMN11316752	104,013,838	40%	4%
SRR11312085	SRX7916539	SRP252958	SAMN14380551	41,998,038	88%	51%
SRR11312084	SRX7916540	SRP252958	SAMN14380552	42,545,926	88%	51%
SRR11312083	SRX7916541	SRP252958	SAMN14380553	42,378,644	89%	52%
SRR11312088	SRX7916536	SRP252958	SAMN14380554	42,530,792	90%	54%
SRR11312087	SRX7916537	SRP252958	SAMN14380555	42,498,970	90%	61%
SRR11312086	SRX7916538	SRP252958	SAMN14380556	42,515,172	93%	58%
SRR12073329	SRX8600880	SRP268412	SAMN15332765	94,702,670	85%	37%
SRR12073328	SRX8600881	SRP268412	SAMN15332765	93,291,392	86%	35%
SRR12073325	SRX8600884	SRP268412	SAMN15332765	87,617,532	87%	36%
SRR12073324	SRX8600885	SRP268412	SAMN15332766	96,794,976	86%	35%
SRR12073323	SRX8600886	SRP268412	SAMN15332766	99,117,984	84%	40%
SRR12073322	SRX8600887	SRP268412	SAMN15332766	90,482,910	84%	39%
SRR12791368	SRX9260484	SRP286688	SAMN16392964	115,662,842	87%	44%
SRR12791367	SRX9260485	SRP286688	SAMN16392964	91,517,836	86%	45%
SRR12791366	SRX9260486	SRP286688	SAMN16392964	91,790,546	88%	44%
SRR12791365	SRX9260487	SRP286688	SAMN16392965	84,720,390	86%	43%
SRR12791364	SRX9260488	SRP286688	SAMN16392965	94,082,116	86%	41%
SRR12791363	SRX9260489	SRP286688	SAMN16392965	93,420,104	89%	45%
SRR17641360	SRX13809608	SRP355363	SAMN24389518	34,043,058	83%	49%
SRR17641359	SRX13809609	SRP355363	SAMN24389519	28,798,300	74%	51%
SRR17641358	SRX13809610	SRP355363	SAMN24389520	48,625,498	86%	51%
SRR17641357	SRX13809611	SRP355363	SAMN24389521	17,679,226	72%	49%
SRR17641356	SRX13809612	SRP355363	SAMN24389522	23,887,620	74%	47%
SRR17641354	SRX13809614	SRP355363	SAMN24389523	27,965,418	88%	51%
SRR17641353	SRX13809615	SRP355363	SAMN24389524	24,055,792	87%	50%
SRR17641352	SRX13809616	SRP355363	SAMN24389525	17,093,184	71%	51%
SRR17641351	SRX13809617	SRP355363	SAMN24389526	20,211,186	85%	50%
SRR17641350	SRX13809618	SRP355363	SAMN24389527	27,963,364	89%	51%
SRR17641349	SRX13809619	SRP355363	SAMN24389528	39,633,668	87%	48%
SRR17641348	SRX13809620	SRP355363	SAMN24389529	33,977,096	89%	52%
SRR17641347	SRX13809621	SRP355363	SAMN24389530	29,016,738	88%	51%
SRR17641346	SRX13809622	SRP355363	SAMN24389531	31,034,180	89%	52%
SRR17641345	SRX13809623	SRP355363	SAMN24389532	29,453,406	90%	52%
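The per-run and per-sample tables can be cross-checked by summing read counts over the runs of each sample. The sketch below does this for the six runs of project SRP268412, with read counts copied from the table above; the variable names and tuple layout are choices made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (run, sample, number of reads), copied from the per-run table.
runs: List[Tuple[str, str, int]] = [
    ("SRR12073329", "SAMN15332765", 94_702_670),
    ("SRR12073328", "SAMN15332765", 93_291_392),
    ("SRR12073325", "SAMN15332765", 87_617_532),
    ("SRR12073324", "SAMN15332766", 96_794_976),
    ("SRR12073323", "SAMN15332766", 99_117_984),
    ("SRR12073322", "SAMN15332766", 90_482_910),
]

# Sum the read counts of all runs that belong to the same sample.
reads_per_sample: Dict[str, int] = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, total in sorted(reads_per_sample.items()):
    print(f"{sample}\t{total:,}")
```

The two totals equal the read counts listed for SAMN15332765 and SAMN15332766 in the per-sample table.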

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Cynoglossus semilaevis high-quality model RefSeq (XP_)	14,331	14,003 (97.71%)	14,003 (97.71%)	68.19%	77.15%
Poecilia formosa high-quality model RefSeq (XP_)	18,503	17,938 (96.95%)	17,938 (96.95%)	67.31%	76.17%
Actinopterygii GenBank	89,826	86,081 (95.83%)	86,081 (95.83%)	69.85%	81.94%
Actinopterygii known RefSeq (NP_)	25,472	24,746 (97.15%)	24,746 (97.15%)	71.53%	83.05%
Danio rerio high-quality model RefSeq (XP_)	7,717	7,543 (97.75%)	7,543 (97.75%)	72.06%	82.99%
Astyanax mexicanus high-quality model RefSeq (XP_)	16,692	16,410 (98.31%)	16,410 (98.31%)	68.56%	80.31%
Esox lucius high-quality model RefSeq (XP_)	18,508	17,994 (97.22%)	17,994 (97.22%)	67.89%	77.32%
Homo sapiens known RefSeq (NP_)	64,133	54,317 (84.69%)	54,317 (84.69%)	67.45%	71.69%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
