NCBI Eriocheir sinensis Annotation Release 100

The RefSeq genome records for Eriocheir sinensis were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Eriocheir sinensis Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Sep 22 2022
Date of submission of annotation to the public databases: Sep 30 2022
Software version: 10.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM2467909v1	GCF_024679095.1	Shanghai Ocean University	08-16-2022	Reference	70 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	ASM2467909v1
Genes and pseudogenes	30,289
protein-coding	19,615
non-coding	10,015
Transcribed pseudogenes	5
Non-transcribed pseudogenes	653
genes with variants	11,610
Immunoglobulin/T-cell receptor gene segments	0
other	1
mRNAs	54,884
fully-supported	51,968
with > 5% ab initio	1,671
partial	687
with filled gap(s)	342
known RefSeq (NM_)	0
model RefSeq (XM_)	54,884
non-coding RNAs	31,089
fully-supported	29,724
with > 5% ab initio	0
partial	20
with filled gap(s)	20
known RefSeq (NR_)	0
model RefSeq (XR_)	30,249
pseudo transcripts	5
fully-supported	4
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	5
CDSs	54,884
fully-supported	51,968
with > 5% ab initio	1,885
partial	666
with major correction(s)	244
known RefSeq (NP_)	0
model RefSeq (XP_)	54,884
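
As a quick consistency check, the gene categories in the table above can be summed and compared with the reported total. The sketch below is only illustrative: it assumes that the five categories shown are mutually exclusive (the report does not state this explicitly) and treats "genes with variants" as an overlapping subset that is not part of the sum. The variable names are hypothetical.

```python
# Gene counts copied from the "Feature counts" table for ASM2467909v1.
# Assumption: these five categories do not overlap.
gene_counts = {
    "protein-coding": 19_615,
    "non-coding": 10_015,
    "Transcribed pseudogenes": 5,
    "Non-transcribed pseudogenes": 653,
    "other": 1,
}
reported_total = 30_289  # "Genes and pseudogenes" row

category_sum = sum(gene_counts.values())
print("categories sum to reported total:", category_sum == reported_total)

# Known (NM_) plus model (XM_) RefSeq records should account for all mRNAs.
mrna_total = 54_884
known_nm, model_xm = 0, 54_884
print("NM_ + XM_ equals mRNA count:", known_nm + model_xm == mrna_total)
```

Excluding the two pseudogene rows from the same sum gives 29,631, which matches the gene count in the "Feature lengths" table, where pseudogenes are not counted.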

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	29,631	22,372	7,926	62	483,883
All transcripts	85,973	3,526	2,552	62	63,355
mRNA	54,884	4,131	3,141	123	63,355
misc_RNA	7,583	3,887	2,921	203	21,524
tRNA	840	74	73	70	87
lncRNA	22,145	2,112	1,425	129	31,691
snoRNA	44	148	192	62	202
snRNA	129	154	163	105	193
rRNA	347	317	119	117	4,996
Single-exon transcripts	810	1,705	1,317	284	15,655
coding transcripts (NM_/XM_ )	810	1,705	1,317	284	15,655
CDSs	54,884	2,159	1,452	123	62,031
Exons	244,383	472	170	2	27,245
in coding transcripts (NM_/XM_ )	192,074	434	167	2	20,880
in non-coding transcripts (NR_/XR_ )	64,502	543	183	9	27,245
Introns	208,064	4,069	758	30	190,883
in coding transcripts (NM_/XM_ )	169,324	4,178	870	30	190,883
in non-coding transcripts (NR_/XR_ )	50,519	3,563	438	30	99,569

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.96	1	1	50
Number of exons per transcript	8.35	6	1	111

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein per gene and the arthropoda_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation.
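
The "one longest protein per gene" selection described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name, the record layout (gene ID, protein ID, sequence) and the toy records are all hypothetical, and ties are resolved by keeping the first protein seen, which is an assumption.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (gene_id, protein_id, protein_sequence)


def longest_protein_per_gene(records: Iterable[Record]) -> Dict[str, Tuple[str, str]]:
    """Return {gene_id: (protein_id, sequence)}, keeping the longest protein per gene.

    Ties keep the first protein encountered (an assumption of this sketch).
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, protein_id, seq in records:
        current = best.get(gene_id)
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (protein_id, seq)
    return best


# Hypothetical toy records, not taken from the annotation.
example = [
    ("geneA", "protA.1", "MKT"),
    ("geneA", "protA.2", "MKTAYIAK"),
    ("geneB", "protB.1", "MSL"),
]
selected = longest_protein_per_gene(example)
print(selected["geneA"][0])  # protA.2
print(len(selected))         # 2
```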

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 19,615 coding genes, 12,627 had a protein with an alignment covering 50% or more of the query, and 2,560 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
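
The two coverage definitions can be illustrated with a short sketch. It simplifies by taking the aligned span (1-based, inclusive coordinates) as the aligned length, so internal gaps count as covered; whether the pipeline handles gaps this way is not stated in this report. The function name and the example coordinates are hypothetical.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence included in an alignment.

    Coordinates are 1-based and inclusive. Simplification: the whole span
    between start and end is treated as aligned.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: residues 1-300 of a 400-residue annotated protein
# (query) aligned to residues 51-350 of a 600-residue curated protein (target).
query_cov = coverage_percent(1, 300, 400)
target_cov = coverage_percent(51, 350, 600)
print(query_cov, target_cov)  # 75.0 50.0

# Shares of coding genes reaching the thresholds quoted in this section.
coding_genes = 19_615
share_50 = 100.0 * 12_627 / coding_genes
share_95 = 100.0 * 2_560 / coding_genes
print(round(share_50, 1), round(share_95, 1))
```

In this toy case the same alignment gives a query coverage of 75% but a target coverage of only 50%, which is why the two measures are reported separately.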

[Figure: cumulative graph of the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline are included. Query: annotated proteins. Target: UniProtKB/Swiss-Prot curated proteins.]

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
ASM2467909v1	GCF_024679095.1	55.93%
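
To make the masked percentage concrete, the sketch below computes it for a soft-masked sequence, assuming that masked bases are written in lowercase. Both the lowercase convention and the function name are assumptions made for illustration; this report does not describe how the 55.93% figure was computed.

```python
def masked_percent(sequence: str) -> float:
    """Percentage of bases that are soft-masked (lowercase) in a sequence.

    Assumption: masking is represented by lowercase letters.
    """
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return 100.0 * masked / len(sequence)


# Hypothetical 10-base sequence with 4 masked bases.
toy_sequence = "ACGTacgtAC"
print(masked_percent(toy_sequence))  # 40.0
```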

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	677	664 (98.08%)	604 (89.22%)	99.14%	97.23%
Same-species TSA	461,381	425,119 (92.14%)	344,941 (74.76%)	99.18%	98.73%
Same-species EST	16,987	14,914 (87.80%)	14,071 (82.83%)	99.20%	98.24%

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	6,639,725,424	73%	32%	448,222
SAMN03273342	hepatopancreas (Eriocheir sinensis, male, SAMN03273342)	251,704,966	80%	22%	224,755
SAMN03780434	hemocytes (Eriocheir sinensis, 1, pooled male and female, SAMN03780434)	23,868,558	67%	20%	88,146
SAMN06204640	gland (Eriocheir sinensis, 1year, SAMN06204640)	73,856,906	60%	32%	177,514
SAMN06204641	gland (Eriocheir sinensis, 2years, SAMN06204641)	67,435,856	59%	34%	179,558
SAMN06946596	eyestalk, hepatopancreas, muscle, heart, stomach, gill, thoracic ganglia, intestine, ovary, and testis (Eriocheir sinensis, two, pooled male and female, SAMN06946596)	619,262,438	85%	31%	217,943
SAMN09664721	gills (Eriocheir sinensis, crab, SAMN09664721)	49,845,580	61%	19%	138,447
SAMN09664723	gills (Eriocheir sinensis, crab, SAMN09664723)	49,450,588	68%	32%	147,763
SAMN10346445	eyestalk (Eriocheir sinensis, One year old, not collected, SAMN10346445)	51,786,884	73%	26%	190,628
SAMN10346446	eyestalk (Eriocheir sinensis, One year old, not collected, SAMN10346446)	59,026,446	75%	25%	192,324
SAMN10346447	eyestalk (Eriocheir sinensis, One year old, not collected, SAMN10346447)	58,383,804	75%	27%	192,860
SAMN11928146	gill (Eriocheir sinensis, SAMN11928146)	112,406,504	75%	29%	171,395
SAMN11930397	gill (Eriocheir sinensis, SAMN11930397)	98,690,970	74%	35%	172,315
SAMN11930399	gill (Eriocheir sinensis, SAMN11930399)	115,833,814	76%	29%	160,128
SAMN11930400	gill (Eriocheir sinensis, SAMN11930400)	90,634,634	74%	36%	157,640
SAMN11936013	hepatopancreas (Eriocheir sinensis, male, SAMN11936013)	48,219,524	77%	28%	166,221
SAMN11936014	hepatopancreas (Eriocheir sinensis, male, SAMN11936014)	44,950,578	77%	27%	163,813
SAMN11936015	hepatopancreas (Eriocheir sinensis, male, SAMN11936015)	46,421,300	77%	27%	163,955
SAMN11936016	hepatopancreas (Eriocheir sinensis, male, SAMN11936016)	99,087,448	85%	30%	171,713
SAMN11936017	hepatopancreas (Eriocheir sinensis, male, SAMN11936017)	89,086,478	84%	34%	164,408
SAMN11936018	hepatopancreas (Eriocheir sinensis, male, SAMN11936018)	100,117,958	85%	32%	162,178
SAMN12236545	brain (Eriocheir sinensis, one year old, male, SAMN12236545)	196,881,932	81%	35%	232,481
SAMN12236546	Liver (Eriocheir sinensis, one year old, male, SAMN12236546)	191,243,876	84%	37%	223,600
SAMN12238776	Brain (Eriocheir sinensis, one year old, female, SAMN12238776)	195,521,498	80%	34%	247,029
SAMN12238784	Liver (Eriocheir sinensis, one year old, female, SAMN12238784)	202,864,612	86%	36%	218,759
SAMN12326771	muscle (Eriocheir sinensis, SAMN12326771)	61,749,254	70%	23%	84,391
SAMN12541675	heart (Eriocheir sinensis, SAMN12541675)	68,596,452	64%	31%	146,288
SAMN12541676	liver (Eriocheir sinensis, SAMN12541676)	69,892,892	89%	34%	156,091
SAMN12714332	whole body (Eriocheir sinensis, SAMN12714332)	185,178,888	76%	35%	223,980
SAMN12714333	whole body (Eriocheir sinensis, SAMN12714333)	142,667,234	78%	35%	221,123
SAMN13028156	androgenic gland from eyestalk ablation control group (Eriocheir sinensis, SAMN13028156)	86,182,176	79%	29%	181,654
SAMN13028157	androgenic gland from eyestalk ablation crab (Eriocheir sinensis, SAMN13028157)	96,451,988	80%	31%	190,495
SAMN13028158	androgenic gland from IAG knock down control group (Eriocheir sinensis, SAMN13028158)	86,931,250	76%	28%	180,217
SAMN13028159	testis from IAG knock down control group (Eriocheir sinensis, SAMN13028159)	85,471,170	87%	22%	250,270
SAMN13028160	androgenic gland from IAG knock down crab (Eriocheir sinensis, SAMN13028160)	86,962,344	78%	29%	184,662
SAMN13028161	testis from IAG knock down crab (Eriocheir sinensis, SAMN13028161)	88,453,304	86%	21%	247,572
SAMN13564652	whole body (Eriocheir sinensis, first stage zoea, SAMN13564652)	64,386,126	82%	42%	178,598
SAMN13564653	whole body (Eriocheir sinensis, first stage zoea, SAMN13564653)	55,599,080	83%	43%	174,755
SAMN13564654	whole body (Eriocheir sinensis, first stage zoea, SAMN13564654)	62,426,108	82%	42%	194,891
SAMN13564655	whole body (Eriocheir sinensis, first stage zoea, SAMN13564655)	59,061,474	80%	36%	194,903
SAMN13564656	whole body (Eriocheir sinensis, first stage zoea, SAMN13564656)	59,873,648	80%	36%	197,019
SAMN13564657	whole body (Eriocheir sinensis, first stage zoea, SAMN13564657)	65,551,046	81%	37%	199,549
SAMN14436411	megalopa (Eriocheir sinensis, SAMN14436411)	329,232,004	75%	31%	314,148
SAMN19460310	hemolymph (Eriocheir sinensis, SAMN19460310)	247,647,180	64%	32%	203,332
SAMN19460311	hemolymph (Eriocheir sinensis, SAMN19460311)	235,000,206	68%	34%	191,644
SAMN19460312	hemolymph (Eriocheir sinensis, SAMN19460312)	237,255,616	60%	33%	206,097
SAMN19460313	hemolymph (Eriocheir sinensis, SAMN19460313)	226,472,212	62%	31%	210,461
SAMN22550281	Ovary (Eriocheir sinensis, female, SAMN22550281)	137,894,066	85%	46%	216,628
SAMN22550355	Ovary (Eriocheir sinensis, female, SAMN22550355)	140,442,394	85%	47%	213,284
SAMN22550414	Ovary (Eriocheir sinensis, female, SAMN22550414)	135,632,564	86%	46%	214,476
SAMN27281647	thoracic ganglia (Eriocheir sinensis, juvenile, SAMN27281647)	271,779,902	63%	20%	233,962

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR1735503	SRX824594	SRP051575	SAMN03273342	70,125,586	76%	23%
SRR1735537	SRX845724	SRP051575	SAMN03273342	58,544,920	81%	23%
SRR1735536	SRX845725	SRP051575	SAMN03273342	123,034,460	82%	21%
SRR2073826	SRX1068777	SRP059617	SAMN03780434	23,868,558	67%	20%
SRR5169063	SRX2485857	SRP096300	SAMN06204640	59,087,476	75%	32%
SRR5169066	SRX2485862	SRP096300	SAMN06204641	52,312,098	77%	34%
SRR7188323	SRX4105045	SRP148536	SAMN06946596	164,493,444	80%	31%
SRR7188322	SRX4105046	SRP148536	SAMN06946596	152,264,764	83%	31%
SRR7188321	SRX4105047	SRP148536	SAMN06946596	157,036,202	89%	31%
SRR7188320	SRX4105048	SRP148536	SAMN06946596	145,468,028	88%	30%
SRR7530968	SRX4398927	SRP154143	SAMN09664721	49,845,580	61%	19%
SRR7530969	SRX4398926	SRP154143	SAMN09664723	49,450,588	68%	32%
SRR8135130	SRX4956158	SRP167228	SAMN10346445	51,786,884	73%	26%
SRR8135131	SRX4956157	SRP167228	SAMN10346446	59,026,446	75%	25%
SRR8135129	SRX4956159	SRP167228	SAMN10346447	58,383,804	75%	27%
SRR9179351	SRX5951970	SRP200123	SAMN11928146	38,841,568	75%	29%
SRR9179350	SRX5951971	SRP200123	SAMN11928146	36,439,480	75%	29%
SRR9179348	SRX5951973	SRP200123	SAMN11928146	37,125,456	75%	29%
SRR9179349	SRX5951972	SRP200123	SAMN11930397	27,660,816	74%	36%
SRR9179347	SRX5951974	SRP200123	SAMN11930397	38,374,428	75%	34%
SRR9179346	SRX5951975	SRP200123	SAMN11930397	32,655,726	74%	37%
SRR9179345	SRX5951976	SRP200123	SAMN11930399	40,140,450	77%	29%
SRR9179344	SRX5951977	SRP200123	SAMN11930399	38,994,304	75%	28%
SRR9179342	SRX5951979	SRP200123	SAMN11930399	36,699,060	75%	29%
SRR9179343	SRX5951978	SRP200123	SAMN11930400	38,919,454	73%	36%
SRR9179341	SRX5951980	SRP200123	SAMN11930400	25,934,308	76%	33%
SRR9179340	SRX5951981	SRP200123	SAMN11930400	25,780,872	75%	38%
SRR9202038	SRX5973460	SRP200428	SAMN11936013	48,219,524	77%	28%
SRR9202039	SRX5973459	SRP200428	SAMN11936014	44,950,578	77%	27%
SRR9202040	SRX5973458	SRP200428	SAMN11936015	46,421,300	77%	27%
SRR9202041	SRX5973457	SRP200428	SAMN11936016	99,087,448	85%	30%
SRR9202042	SRX5973456	SRP200428	SAMN11936017	89,086,478	84%	34%
SRR9202043	SRX5973455	SRP200428	SAMN11936018	100,117,958	85%	32%
SRR9663143	SRX6424019	SRP213978	SAMN12236545	196,881,932	81%	35%
SRR9663142	SRX6424020	SRP213978	SAMN12236546	191,243,876	84%	37%
SRR9663145	SRX6424017	SRP213978	SAMN12238776	195,521,498	80%	34%
SRR9663144	SRX6424018	SRP213978	SAMN12238784	202,864,612	86%	36%
SRR9964280	SRX6711794	SRP218295	SAMN12326771	61,749,254	70%	23%
SRR9964281	SRX6711793	SRP218295	SAMN12541675	68,596,452	64%	31%
SRR9964282	SRX6711792	SRP218295	SAMN12541676	69,892,892	89%	34%
SRR10083963	SRX6816969	SRP220979	SAMN12714332	64,765,530	77%	35%
SRR10083962	SRX6816970	SRP220979	SAMN12714332	64,760,558	73%	34%
SRR10083961	SRX6816971	SRP220979	SAMN12714332	55,652,800	79%	35%
SRR10083960	SRX6816972	SRP220979	SAMN12714333	46,802,770	80%	37%
SRR10083959	SRX6816973	SRP220979	SAMN12714333	48,745,298	77%	34%
SRR10083958	SRX6816974	SRP220979	SAMN12714333	47,119,166	77%	34%
SRR10276548	SRX6989748	SRP225587	SAMN13028156	43,813,668	76%	27%
SRR10276547	SRX6989749	SRP225587	SAMN13028156	42,368,508	82%	32%
SRR10276544	SRX6989752	SRP225587	SAMN13028157	43,248,686	80%	29%
SRR10276543	SRX6989753	SRP225587	SAMN13028157	53,203,302	80%	33%
SRR10276542	SRX6989754	SRP225587	SAMN13028158	43,680,586	73%	25%
SRR10276541	SRX6989755	SRP225587	SAMN13028158	43,250,664	78%	31%
SRR10276540	SRX6989756	SRP225587	SAMN13028159	42,664,112	87%	20%
SRR10276539	SRX6989757	SRP225587	SAMN13028159	42,807,058	87%	25%
SRR10276538	SRX6989758	SRP225587	SAMN13028160	44,315,578	76%	26%
SRR10276537	SRX6989759	SRP225587	SAMN13028160	42,646,766	80%	33%
SRR10276546	SRX6989750	SRP225587	SAMN13028161	43,984,288	85%	20%
SRR10276545	SRX6989751	SRP225587	SAMN13028161	44,469,016	86%	22%
SRR10736386	SRX7412182	SRP238119	SAMN13564652	64,386,126	82%	42%
SRR10736385	SRX7412183	SRP238119	SAMN13564653	55,599,080	83%	43%
SRR10736384	SRX7412184	SRP238119	SAMN13564654	62,426,108	82%	42%
SRR10736383	SRX7412185	SRP238119	SAMN13564655	59,061,474	80%	36%
SRR10736382	SRX7412186	SRP238119	SAMN13564656	59,873,648	80%	36%
SRR10736381	SRX7412187	SRP238119	SAMN13564657	65,551,046	81%	37%
SRR11411683	SRX7990333	SRP253936	SAMN14436411	165,909,826	76%	31%
SRR11411682	SRX7990334	SRP253936	SAMN14436411	163,322,178	74%	31%
SRR14692539	SRX11030644	SRP321971	SAMN19460310	46,118,824	65%	32%
SRR14692538	SRX11030645	SRP321971	SAMN19460310	45,552,870	64%	33%
SRR14692527	SRX11030656	SRP321971	SAMN19460310	50,689,212	64%	32%
SRR14692526	SRX11030657	SRP321971	SAMN19460310	52,814,718	64%	33%
SRR14692525	SRX11030658	SRP321971	SAMN19460310	52,471,556	64%	32%
SRR14692524	SRX11030659	SRP321971	SAMN19460311	47,214,214	67%	33%
SRR14692523	SRX11030660	SRP321971	SAMN19460311	50,057,780	68%	34%
SRR14692522	SRX11030661	SRP321971	SAMN19460311	48,052,960	68%	34%
SRR14692521	SRX11030662	SRP321971	SAMN19460311	44,680,494	69%	34%
SRR14692520	SRX11030663	SRP321971	SAMN19460311	44,994,758	68%	34%
SRR14692537	SRX11030646	SRP321971	SAMN19460312	44,485,446	61%	33%
SRR14692536	SRX11030647	SRP321971	SAMN19460312	50,757,614	62%	34%
SRR14692535	SRX11030648	SRP321971	SAMN19460312	48,981,918	60%	32%
SRR14692534	SRX11030649	SRP321971	SAMN19460312	48,885,300	60%	33%
SRR14692533	SRX11030650	SRP321971	SAMN19460312	44,145,338	59%	32%
SRR14692532	SRX11030651	SRP321971	SAMN19460313	38,955,202	61%	32%
SRR14692531	SRX11030652	SRP321971	SAMN19460313	48,648,226	62%	30%
SRR14692530	SRX11030653	SRP321971	SAMN19460313	47,971,658	62%	30%
SRR14692529	SRX11030654	SRP321971	SAMN19460313	45,067,472	62%	31%
SRR14692528	SRX11030655	SRP321971	SAMN19460313	45,829,654	62%	32%
SRR16603374	SRX12804510	SRP343517	SAMN22550281	137,894,066	85%	46%
SRR16603373	SRX12804511	SRP343517	SAMN22550355	140,442,394	85%	47%
SRR16603372	SRX12804512	SRP343517	SAMN22550414	135,632,564	86%	46%
SRR18609759	SRX14736785	SRP367526	SAMN27281647	45,247,234	49%	17%
SRR18609758	SRX14736786	SRP367526	SAMN27281647	45,992,206	70%	17%
SRR18609757	SRX14736787	SRP367526	SAMN27281647	47,182,912	64%	24%
SRR18609756	SRX14736788	SRP367526	SAMN27281647	39,969,580	69%	22%
SRR18609755	SRX14736789	SRP367526	SAMN27281647	43,788,168	70%	22%
SRR18609754	SRX14736790	SRP367526	SAMN27281647	49,599,802	56%	20%
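
The per-run rows can be related to the per-sample rows by summing read counts over runs that share a sample. The sketch below does this for two samples, using values copied from the tables in this section; the variable names are hypothetical, and only a subset of runs is included to keep the example short.

```python
from collections import defaultdict

# (run, sample, number of reads), copied from the by-run table.
runs = [
    ("SRR1735503", "SAMN03273342", 70_125_586),
    ("SRR1735537", "SAMN03273342", 58_544_920),
    ("SRR1735536", "SAMN03273342", 123_034_460),
    ("SRR11411683", "SAMN14436411", 165_909_826),
    ("SRR11411682", "SAMN14436411", 163_322_178),
]

# Read counts reported for the same samples in the by-sample table.
reported_per_sample = {
    "SAMN03273342": 251_704_966,
    "SAMN14436411": 329_232_004,
}

reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, reported in reported_per_sample.items():
    print(sample, reads_per_sample[sample] == reported)
```

Not every sample reconciles this way. For SAMN06204640, for instance, the single run listed (59,087,476 reads) is smaller than the per-sample count (73,856,906 reads), so a mismatch is better treated as a prompt to check the source records than as an error in either table.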

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Penaeus japonicus high-quality model RefSeq (XP_)	15,395	13,252 (86.08%)	13,252 (86.08%)	69.17%	69.93%
Hyalella azteca high-quality model RefSeq (XP_)	10,207	7,076 (69.32%)	7,076 (69.32%)	63.73%	55.79%
Crustacea GenBank	45,995	40,784 (88.67%)	40,784 (88.67%)	69.39%	75.12%
Daphnia pulex high-quality model RefSeq (XP_)	14,091	9,171 (65.08%)	9,171 (65.08%)	61.71%	49.83%
Homarus americanus high-quality model RefSeq (XP_)	14,107	12,493 (88.56%)	12,493 (88.56%)	69.97%	71.42%
Tribolium castaneum GenBank	679	572 (84.24%)	572 (84.24%)	67.24%	59.31%
Tribolium castaneum high-quality model RefSeq (XP_)	11,487	7,700 (67.03%)	7,700 (67.03%)	61.03%	51.95%
Tribolium castaneum known RefSeq (NP_)	627	507 (80.86%)	507 (80.86%)	66.29%	56.56%
Same-species GenBank	521	518 (99.42%)	518 (99.42%)	85.37%	89.83%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
