NCBI Dermacentor andersoni Annotation Release 100

The RefSeq genome records for Dermacentor andersoni were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Dermacentor andersoni Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Aug 16 2022
Date of submission of annotation to the public databases: Aug 19 2022
Software version: 10.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
qqDerAnde1.2	GCF_023375885.1	United States Department of Agriculture	05-16-2022	Reference	1 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	qqDerAnde1.2
Genes and pseudogenes	32,684
protein-coding	20,370
non-coding	10,118
Transcribed pseudogenes	0
Non-transcribed pseudogenes	2,195
genes with variants	5,150
Immunoglobulin/T-cell receptor gene segments	0
other	1
mRNAs	30,402
fully-supported	25,028
with > 5% ab initio	3,877
partial	235
with filled gap(s)	1
known RefSeq (NM_)	0
model RefSeq (XM_)	30,402
non-coding RNAs	11,807
fully-supported	5,194
with > 5% ab initio	0
partial	3
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	6,991
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	30,415
fully-supported	25,028
with > 5% ab initio	4,088
partial	235
with major correction(s)	73
known RefSeq (NP_)	0
model RefSeq (XP_)	30,415
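The gene-level rows of this table add up. The individual gene classes sum to "Genes and pseudogenes", and removing the pseudogene classes leaves the 30,489 genes used in the feature-length table below. The following Python sketch checks that arithmetic using values copied from the table. "genes with variants" is left out because it is assumed to be a subset of the other classes (genes with more than one transcript variant), not a separate class.

```python
# Gene classes and counts copied from the "Feature counts" table (qqDerAnde1.2).
# "genes with variants" is omitted: it is assumed to be a subset of these classes.
GENE_CLASSES = {
    "protein-coding": 20_370,
    "non-coding": 10_118,
    "transcribed pseudogenes": 0,
    "non-transcribed pseudogenes": 2_195,
    "immunoglobulin/T-cell receptor gene segments": 0,
    "other": 1,
}
PSEUDOGENE_CLASSES = {"transcribed pseudogenes", "non-transcribed pseudogenes"}


def total_genes(counts: dict[str, int]) -> int:
    """Sum of all gene classes; should equal 'Genes and pseudogenes'."""
    return sum(counts.values())


def non_pseudogene_genes(counts: dict[str, int]) -> int:
    """Sum of the non-pseudogene classes; should equal the 'Genes' row of the length table."""
    return sum(n for name, n in counts.items() if name not in PSEUDOGENE_CLASSES)


print(total_genes(GENE_CLASSES))           # 32684
print(non_pseudogene_genes(GENE_CLASSES))  # 30489
```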

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	30,489	33,111	6,755	54	1,479,256
All transcripts	42,209	2,063	1,386	52	57,898
mRNA	30,402	2,673	1,894	84	57,898
misc_RNA	957	1,681	1,114	107	13,486
tRNA	4,814	74	73	54	88
lncRNA	4,239	754	523	52	13,807
snoRNA	16	166	206	69	396
snRNA	700	174	186	103	201
rRNA	1,080	489	119	118	5,110
Single-exon transcripts	2,510	1,563	1,286	297	12,074
coding transcripts (NM_/XM_)	2,510	1,563	1,286	297	12,074
CDSs	30,415	1,682	1,242	84	57,045
Exons	170,764	311	154	1	20,970
in coding transcripts (NM_/XM_)	159,038	313	155	1	20,970
in non-coding transcripts (NR_/XR_)	14,847	267	138	10	10,405
Introns	145,911	7,980	2,294	30	591,736
in coding transcripts (NM_/XM_)	138,005	7,807	2,300	30	591,736
in non-coding transcripts (NR_/XR_)	10,767	9,931	2,219	30	520,947
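The exon and intron rows above are linked through coordinates. An intron is the gap between two consecutive exons of the same transcript. Below is a minimal Python sketch of that relationship. It assumes 1-based inclusive coordinates, as in GFF3, and uses a hypothetical three-exon transcript whose coordinates are invented for illustration.

```python
from statistics import mean, median


def exon_and_intron_lengths(exons: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Exon and intron lengths for one transcript.

    exons: (start, end) pairs in 1-based inclusive genomic coordinates.
    """
    ordered = sorted(exons)
    exon_lens = [end - start + 1 for start, end in ordered]
    # Each intron runs from the base after one exon to the base before the next.
    intron_lens = [next_start - prev_end - 1
                   for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]
    return exon_lens, intron_lens


def summarize(lengths: list[int]) -> tuple:
    """Count, mean, median, min and max, i.e. the columns of the length table."""
    return len(lengths), mean(lengths), median(lengths), min(lengths), max(lengths)


# Hypothetical transcript; coordinates are invented for illustration.
exon_lens, intron_lens = exon_and_intron_lengths([(1_000, 1_199), (3_000, 3_099), (10_000, 10_449)])
print(exon_lens)       # [200, 100, 450]
print(intron_lens)     # [1800, 6900]
print(sum(exon_lens))  # 750 (spliced transcript length)
print(summarize(exon_lens))
```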

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.46	1	1	50
Number of exons per transcript	7.95	5	1	184
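Both rows are per-feature distributions. Transcripts are grouped by their parent gene, and exons are counted per transcript. The Python sketch below shows how such a summary could be derived. It assumes the annotation has already been parsed into hypothetical (transcript, gene, exon count) records, and the IDs and counts are invented.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical (transcript_id, gene_id, exon_count) records, e.g. parsed from GFF3.
RECORDS = [
    ("rna-1", "gene-A", 5),
    ("rna-2", "gene-A", 4),
    ("rna-3", "gene-B", 1),
    ("rna-4", "gene-C", 12),
    ("rna-5", "gene-C", 11),
    ("rna-6", "gene-C", 9),
]


def describe(values: list[int]) -> dict[str, float]:
    """Mean (2 decimals), median, min and max, as in the table above."""
    return {"mean": round(mean(values), 2), "median": median(values),
            "min": min(values), "max": max(values)}


# Count how many transcripts share each parent gene.
transcripts_per_gene = list(Counter(gene for _, gene, _ in RECORDS).values())
exons_per_transcript = [n_exons for _, _, n_exons in RECORDS]

print(describe(transcripts_per_gene))
print(describe(exons_per_transcript))
```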

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein of each gene and the arachnida_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and are presented in BUSCO notation.
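Keeping one protein per gene presumably prevents BUSCO from counting alternative isoforms of the same gene as duplicated orthologs. The Python sketch below shows that selection step under some assumptions. Proteins arrive as hypothetical (gene, protein, sequence) tuples. On a length tie, the first protein seen is kept, and this may not match the pipeline's own tie-break rule.

```python
def longest_protein_per_gene(proteins):
    """Keep the longest protein for each gene.

    proteins: iterable of (gene_id, protein_id, sequence) tuples.
    Returns {gene_id: (protein_id, sequence)}; on a length tie the first
    protein seen is kept (an assumption, not a documented NCBI rule).
    """
    best: dict[str, tuple[str, str]] = {}
    for gene_id, protein_id, seq in proteins:
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (protein_id, seq)
    return best


def to_fasta(best: dict[str, tuple[str, str]]) -> str:
    """Write the selected proteins as FASTA text (one record per gene)."""
    return "".join(f">{pid}\n{seq}\n" for pid, seq in best.values())


# Hypothetical IDs and sequences for illustration only.
proteins = [
    ("gene-A", "prot-A1", "MKTAYIAK"),
    ("gene-A", "prot-A2", "MKTAYIAKQRQ"),
    ("gene-B", "prot-B1", "MSDNE"),
]
print(to_fasta(longest_protein_per_gene(proteins)), end="")
```

The resulting FASTA file could then be passed to BUSCO in protein mode, for example `busco -i longest.faa -l arachnida_odb10 -m proteins -o busco_out`. Check these flags against the documentation for the BUSCO version in use.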

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 20,357 coding genes, 13,285 genes had a protein with an alignment covering 50% or more of the query, and 2,738 had an alignment covering 95% or more of the query.

Query and target coverage are defined as follows. The query coverage is the percentage of the annotated protein's length that is included in the alignment. The target coverage is the percentage of the target (Swiss-Prot) protein's length that is included in the alignment.
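Both coverages can be computed directly from alignment coordinates. The Python sketch below works from 1-based start/end pairs, such as the `qstart`/`qend` and `sstart`/`send` fields that BLAST+ can write in tabular output, together with `qlen` and `slen`. It merges overlapping HSPs before counting so that no residue is counted twice. That merging step is an assumption made here, because the report does not say how multiple HSPs are combined.

```python
def percent_covered(spans: list[tuple[int, int]], length: int) -> float:
    """Percentage of a sequence of `length` residues covered by aligned spans.

    spans: (start, end) pairs, 1-based inclusive, in either orientation.
    Overlapping or adjacent spans are merged before counting.
    """
    covered = 0
    cur_lo = cur_hi = None
    for lo, hi in sorted(tuple(sorted(span)) for span in spans):
        if cur_hi is None or lo > cur_hi + 1:
            # Start a new merged block, closing the previous one if any.
            if cur_hi is not None:
                covered += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            # Overlapping or adjacent: extend the current block.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        covered += cur_hi - cur_lo + 1
    return 100.0 * covered / length


# Hypothetical 400-residue annotated protein with two overlapping HSPs
# against a 500-residue Swiss-Prot protein.
query_cov = percent_covered([(1, 150), (120, 300)], length=400)
target_cov = percent_covered([(51, 200), (180, 350)], length=500)
print(query_cov, target_cov)  # 75.0 60.0
```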

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

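[Figure: cumulative number of genes with alignments above each query or target coverage threshold. Query: annotated proteins; target: UniProtKB/Swiss-Prot curated proteins.]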
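The cumulative graph counts, for each coverage threshold, the genes whose best alignment reaches that threshold. The Python sketch below makes two assumptions. First, each gene is represented by its best-covered protein, which is consistent with the wording "genes had a protein with an alignment covering...". Second, genes without any BLASTP hit are simply absent. The gene IDs and coverage values are hypothetical.

```python
def best_coverage_per_gene(hits):
    """hits: iterable of (gene_id, coverage_percent), one per protein alignment.
    Returns {gene_id: highest coverage among that gene's proteins}."""
    best: dict[str, float] = {}
    for gene_id, cov in hits:
        best[gene_id] = max(cov, best.get(gene_id, 0.0))
    return best


def cumulative_counts(best: dict[str, float], thresholds: list[float]) -> dict[float, int]:
    """Number of genes whose best coverage is at or above each threshold."""
    return {t: sum(1 for cov in best.values() if cov >= t) for t in thresholds}


hits = [("g1", 97.0), ("g1", 40.0), ("g2", 55.0), ("g3", 20.0)]
print(cumulative_counts(best_coverage_per_gene(hits), [25, 50, 95]))  # {25: 2, 50: 2, 95: 1}
```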

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
qqDerAnde1.2	GCF_023375885.1	36.92%
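The masked percentage is the share of genomic bases flagged as repeats. The Python sketch below assumes the masked assembly is available as soft-masked FASTA, with masked bases in lowercase. By default it also excludes assembly gaps (runs of N) from the denominator. The report does not say which denominator NCBI uses, so the hypothetical `exclude_n` flag makes that choice explicit.

```python
def percent_soft_masked(fasta_lines, exclude_n: bool = True) -> float:
    """Percent of bases written in lowercase in soft-masked FASTA lines."""
    masked = total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # header line
        for base in line.strip():
            if exclude_n and base in "Nn":
                continue  # skip assembly gaps
            total += 1
            if base.islower():
                masked += 1
    return 100.0 * masked / total if total else 0.0


# Toy example; for a real genome, iterate over an open file instead of a list.
toy = [">chr1", "ACGTacgtNN", "acGT"]
print(percent_soft_masked(toy))  # 50.0
```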

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	34	32 (94.12%)	31 (91.18%)	99.08%	97.85%
Same-species EST	1,387	959 (69.14%)	798 (57.53%)	98.67%	98.77%
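Each "Number (%)" value is a count divided by the number of sequences retrieved from Entrez. The Python sketch below reproduces the percentages in this table, assuming they are rounded half-up to two decimals. It uses integer arithmetic to avoid floating-point surprises on values such as 6,734/10,775 = 62.4965%, which the protein table reports as 62.50%.

```python
def pct(part: int, whole: int) -> str:
    """part/whole as a percentage rounded half-up to two decimals."""
    hundredths = (part * 20_000 + whole) // (2 * whole)  # round(part * 10_000 / whole)
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


print(pct(959, 1_387))  # 69.14%  (Same-species EST aligned by Splign)
print(pct(798, 1_387))  # 57.53%  (Same-species EST passed to Gnomon)
```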

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	5,706,989,788	60%	20%	216,348
SAMN02854796	NA	midgut (Dermacentor andersoni, adult, male, SAMN02854796)	3,475,932	54%	38%	51,963
SAMN04331754	NA	whole animal (Dermacentor variabilis, SAMN04331754)	26,432,310	45%	8%	65,403
SAMN04331755	NA	whole animal (Dermacentor variabilis, SAMN04331755)	33,609,337	46%	8%	69,943
SAMN04331756	NA	whole animal (Dermacentor variabilis, SAMN04331756)	25,242,545	72%	11%	89,060
SAMN04331757	NA	whole animal (Dermacentor variabilis, SAMN04331757)	31,809,731	72%	11%	93,643
SAMN04331758	NA	whole animal (Dermacentor variabilis, SAMN04331758)	41,753,974	71%	11%	93,752
SAMN04331775	NA	whole animal (Dermacentor variabilis, SAMN04331775)	19,548,217	75%	11%	76,876
SAMN04331776	NA	whole animal (Dermacentor variabilis, SAMN04331776)	33,650,599	75%	11%	87,359
SAMN04331777	NA	whole animal (Dermacentor variabilis, SAMN04331777)	40,968,111	59%	9%	88,007
SAMN04331778	NA	whole animal (Dermacentor variabilis, SAMN04331778)	41,752,184	74%	11%	94,867
SAMN06481838	NA	Skin (Dermacentor variabilis, SAMN06481838)	524,431,098	42%	12%	159,792
SAMN08513157	NA	1st Legs (Dermacentor variabilis, 3-4 months post-molt, male, SAMN08513157)	286,315,548	80%	24%	132,177
SAMN08662384	NA	Whole organism (Dermacentor variabilis, SAMN08662384)	33,859,641	76%	9%	74,522
SAMN08662385	NA	whole organism (Dermacentor variabilis, SAMN08662385)	49,954,837	66%	8%	81,213
SAMN08662576	NA	Whole organism (Dermacentor variabilis, SAMN08662576)	44,826,680	53%	10%	71,754
SAMN08662581	NA	Whole organism (Dermacentor variabilis, SAMN08662581)	31,450,492	43%	12%	34,334
SAMN08662951	NA	Whole organism (Dermacentor variabilis, SAMN08662951)	37,382,914	58%	10%	70,042
SAMN08662952	NA	Whole organism (Dermacentor variabilis, SAMN08662952)	33,448,124	56%	10%	59,593
SAMN08662958	NA	Whole organism (Dermacentor variabilis, SAMN08662958)	42,378,095	68%	10%	74,511
SAMN08662963	NA	Whole organism (Dermacentor variabilis, SAMN08662963)	23,406,497	84%	13%	72,818
SAMN08662968	NA	Whole organism (Dermacentor variabilis, SAMN08662968)	30,197,924	82%	13%	76,736
SAMN09383130	NA	whole body (Dermacentor variabilis, SAMN09383130)	23,418,592	80%	21%	120,468
SAMN09383131	NA	whole body (Dermacentor variabilis, SAMN09383131)	21,457,360	79%	21%	115,644
SAMN09383132	NA	whole body (Dermacentor variabilis, SAMN09383132)	23,375,210	79%	21%	117,778
SAMN09383133	NA	whole body (Dermacentor variabilis, SAMN09383133)	16,000,396	79%	21%	106,690
SAMN09383134	NA	whole body (Dermacentor variabilis, SAMN09383134)	29,030,496	80%	22%	120,858
SAMN09383135	NA	whole body (Dermacentor variabilis, SAMN09383135)	31,625,866	80%	21%	124,108
SAMN09383136	NA	whole body (Dermacentor variabilis, SAMN09383136)	25,761,192	79%	20%	120,144
SAMN09383137	NA	whole body (Dermacentor variabilis, SAMN09383137)	28,852,778	80%	22%	120,309
SAMN09383138	NA	whole body (Dermacentor variabilis, SAMN09383138)	31,388,418	80%	21%	121,722
SAMN09383139	NA	whole body (Dermacentor variabilis, SAMN09383139)	26,512,312	81%	22%	122,550
SAMN09383140	NA	whole body (Dermacentor variabilis, SAMN09383140)	59,105,680	79%	22%	138,172
SAMN09383141	NA	whole body (Dermacentor variabilis, SAMN09383141)	61,129,168	80%	20%	141,023
SAMN09383142	NA	whole body (Dermacentor variabilis, SAMN09383142)	23,890,958	81%	22%	118,001
SAMN09383143	NA	whole body (Dermacentor variabilis, SAMN09383143)	24,584,932	81%	22%	120,773
SAMN09383144	NA	whole body (Dermacentor variabilis, SAMN09383144)	21,292,342	79%	21%	112,628
SAMN09383145	NA	whole body (Dermacentor variabilis, SAMN09383145)	22,870,504	81%	21%	116,635
SAMN09383146	NA	whole body (Dermacentor variabilis, SAMN09383146)	57,308,458	78%	21%	135,486
SAMN09383147	NA	whole body (Dermacentor variabilis, SAMN09383147)	27,001,856	81%	22%	119,882
SAMN09383148	NA	whole body (Dermacentor variabilis, SAMN09383148)	80,529,784	82%	23%	143,467
SAMN09383149	NA	whole body (Dermacentor variabilis, SAMN09383149)	28,390,304	78%	21%	117,156
SAMN09383150	NA	whole body (Dermacentor variabilis, SAMN09383150)	27,812,454	81%	23%	121,789
SAMN09383151	NA	whole body (Dermacentor variabilis, SAMN09383151)	28,508,046	81%	22%	120,445
SAMN09383152	NA	whole body (Dermacentor variabilis, SAMN09383152)	23,873,312	82%	22%	115,720
SAMN09383153	NA	whole body (Dermacentor variabilis, SAMN09383153)	27,751,328	78%	22%	113,036
SAMN12766353	NA	active tick (Dermacentor silvarum, SAMN12766353)	36,207,074	55%	34%	98,915
SAMN12766354	NA	diapause tick (Dermacentor silvarum, SAMN12766354)	34,713,812	39%	28%	86,489
SAMN12766355	NA	overwinter tick (Dermacentor silvarum, SAMN12766355)	36,647,684	43%	29%	87,081
SAMN15086211	33172483	M_tick3 (Dermacentor marginatus, adult, female, SAMN15086211)	51,976,602	37%	29%	81,404
SAMN15086212	33172483	M_tick2 (Dermacentor marginatus, adult, female, SAMN15086212)	48,376,992	35%	28%	77,374
SAMN15086213	33172483	M_tick1 (Dermacentor marginatus, adult, female, SAMN15086213)	52,978,266	33%	26%	78,076
SAMN15086214	33172483	H_tick3 (Dermacentor marginatus, adult, female, SAMN15086214)	55,155,938	48%	37%	89,151
SAMN15086215	33172483	H_tick2 (Dermacentor marginatus, adult, female, SAMN15086215)	50,242,374	50%	37%	86,510
SAMN15086216	33172483	H_tick1 (Dermacentor marginatus, adult, female, SAMN15086216)	46,912,662	44%	35%	87,034
SAMN15086217	33172483	F_tick3 (Dermacentor marginatus, adult, female, SAMN15086217)	44,496,928	49%	35%	85,418
SAMN15086218	33172483	F_tick2 (Dermacentor marginatus, adult, female, SAMN15086218)	46,197,036	45%	34%	76,597
SAMN15086219	33172483	F_tick1 (Dermacentor marginatus, adult, female, SAMN15086219)	53,012,248	48%	39%	81,348
SAMN15857353	NA	Whole body (Dermacentor variabilis, SAMN15857353)	36,214,537	66%	9%	80,719
SAMN15857354	NA	Whole body (Dermacentor variabilis, SAMN15857354)	43,754,338	70%	10%	88,192
SAMN15857355	NA	Whole body (Dermacentor variabilis, SAMN15857355)	49,177,383	67%	9%	81,003
SAMN15857356	NA	Whole body (Dermacentor variabilis, SAMN15857356)	43,009,464	51%	10%	65,719
SAMN15857357	NA	Whole body (Dermacentor variabilis, SAMN15857357)	23,127,885	67%	10%	68,726
SAMN15857358	NA	Whole body (Dermacentor variabilis, SAMN15857358)	31,958,015	67%	10%	74,498
SAMN15857359	NA	Whole body (Dermacentor variabilis, SAMN15857359)	22,679,304	69%	12%	70,597
SAMN15857360	NA	Whole body (Dermacentor variabilis, SAMN15857360)	22,679,304	69%	10%	70,484
SAMN15857361	NA	Whole body (Dermacentor variabilis, SAMN15857361)	32,568,890	69%	10%	77,562
SAMN15857362	NA	Whole body (Dermacentor variabilis, SAMN15857362)	26,207,818	45%	8%	46,362
SAMN15857363	NA	Whole body (Dermacentor variabilis, SAMN15857363)	33,618,942	45%	8%	48,810
SAMN15857364	NA	Whole body (Dermacentor variabilis, SAMN15857364)	22,795,815	68%	10%	68,660
SAMN15857365	NA	Whole body (Dermacentor variabilis, SAMN15857365)	31,134,771	69%	10%	74,954
SAMN15857366	NA	Whole body (Dermacentor variabilis, SAMN15857366)	29,463,501	69%	10%	79,127
SAMN28704436	NA	salivary glands (Dermacentor nuttalli, 2day, SAMN28704436)	48,026,426	54%	42%	90,438
SAMN28704437	NA	salivary glands (Dermacentor nuttalli, 3day, SAMN28704437)	47,127,762	56%	42%	91,515
SAMN28704438	NA	salivary glands (Dermacentor nuttalli, 4day, SAMN28704438)	41,839,350	56%	42%	91,792
SAMN28704439	NA	salivary glands (Dermacentor nuttalli, 2day, SAMN28704439)	46,269,808	35%	36%	77,286
SAMN28704440	NA	salivary glands (Dermacentor nuttalli, 3day, SAMN28704440)	42,260,790	36%	36%	73,476
SAMN28704441	NA	salivary glands (Dermacentor nuttalli, 4day, SAMN28704441)	44,296,222	35%	35%	71,625
SAMN28704442	NA	salivary glands (Dermacentor nuttalli, 2day, SAMN28704442)	44,245,368	37%	36%	76,694
SAMN28704443	NA	salivary glands (Dermacentor nuttalli, 3day, SAMN28704443)	47,264,442	38%	37%	80,135
SAMN28704444	NA	salivary glands (Dermacentor nuttalli, 4day, SAMN28704444)	46,661,868	36%	35%	72,564
SAMN28704445	NA	salivary glands (Dermacentor nuttalli, 2day, SAMN28704445)	45,754,112	35%	38%	70,766
SAMN28704446	NA	salivary glands (Dermacentor nuttalli, 3day, SAMN28704446)	45,241,356	40%	38%	81,719
SAMN28704447	NA	salivary glands (Dermacentor nuttalli, 4day, SAMN28704447)	49,022,616	36%	38%	78,629
SAMN28704448	NA	salivary glands (Dermacentor nuttalli, 2day, SAMN28704448)	44,262,990	36%	36%	76,725
SAMN28704449	NA	salivary glands (Dermacentor nuttalli, 3day, SAMN28704449)	46,386,970	35%	37%	76,133
SAMN28704450	NA	salivary glands (Dermacentor nuttalli, 4day, SAMN28704450)	46,827,408	37%	36%	77,135
SAMN29420670	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420670)	49,361,220	53%	12%	30,092
SAMN29420671	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420671)	31,214,886	65%	12%	27,631
SAMN29420672	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420672)	48,359,910	78%	13%	53,175
SAMN29420673	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420673)	77,106,069	83%	12%	61,651
SAMN29420674	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420674)	34,921,568	70%	12%	60,279
SAMN29420675	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420675)	38,042,601	67%	15%	69,474
SAMN29420676	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420676)	34,575,824	81%	12%	58,302
SAMN29420677	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420677)	39,885,416	72%	11%	77,880
SAMN29420678	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420678)	33,805,230	66%	12%	53,711
SAMN29420679	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420679)	63,552,373	74%	11%	65,353
SAMN29420680	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420680)	47,619,698	76%	12%	56,506
SAMN29420681	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420681)	41,392,924	73%	12%	52,905
SAMN29420682	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420682)	49,025,913	77%	9%	99,839
SAMN29420683	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420683)	28,953,753	81%	11%	97,770
SAMN29420684	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420684)	54,917,795	76%	9%	106,097
SAMN29420685	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420685)	48,491,765	80%	10%	106,573
SAMN29420686	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420686)	40,035,359	79%	9%	97,810
SAMN29420687	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420687)	34,392,716	79%	9%	96,716
SAMN29420688	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420688)	34,770,965	83%	11%	102,735
SAMN29420689	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420689)	63,702,815	82%	10%	111,257
SAMN29420690	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420690)	33,537,107	81%	10%	98,263
SAMN29420691	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420691)	31,896,440	84%	13%	81,644
SAMN29420692	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420692)	92,969,226	80%	12%	74,599
SAMN29420693	NA	whole body (Dermacentor variabilis, adults, female, SAMN29420693)	35,935,606	71%	15%	61,099
SAMN30260860	NA	midgut (Dermacentor nuttalli, 2day, SAMN30260860)	46,290,404	50%	41%	87,230
SAMN30260861	NA	midgut (Dermacentor nuttalli, 3day, SAMN30260861)	48,235,416	50%	40%	88,428
SAMN30260862	NA	midgut (Dermacentor nuttalli, 4day, SAMN30260862)	46,436,760	50%	41%	88,667
SAMN30260863	NA	midgut (Dermacentor nuttalli, 2day, SAMN30260863)	55,255,138	49%	43%	87,370
SAMN30260864	NA	midgut (Dermacentor nuttalli, 3day, SAMN30260864)	46,495,740	48%	42%	82,140
SAMN30260865	NA	midgut (Dermacentor nuttalli, 4day, SAMN30260865)	47,719,986	48%	42%	85,868
SAMN30260866	NA	midgut (Dermacentor nuttalli, 2day, SAMN30260866)	42,404,526	48%	42%	81,259
SAMN30260867	NA	midgut (Dermacentor nuttalli, 3day, SAMN30260867)	43,696,014	48%	41%	84,346
SAMN30260868	NA	midgut (Dermacentor nuttalli, 4day, SAMN30260868)	40,881,376	49%	42%	87,070
SAMN30260869	NA	midgut (Dermacentor nuttalli, 2day, SAMN30260869)	45,662,460	47%	41%	83,220
SAMN30260870	NA	midgut (Dermacentor nuttalli, 3day, SAMN30260870)	43,445,564	46%	41%	81,742
SAMN30260871	NA	midgut (Dermacentor nuttalli, 4day, SAMN30260871)	44,806,956	44%	41%	78,605
SAMN30260872	NA	midgut (Dermacentor nuttalli, 2day, SAMN30260872)	44,145,330	47%	43%	83,416
SAMN30260873	NA	midgut (Dermacentor nuttalli, 3day, SAMN30260873)	42,200,976	46%	42%	88,190
SAMN30260874	NA	midgut (Dermacentor nuttalli, 4day, SAMN30260874)	46,395,618	44%	43%	80,409

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR1424294	SRX608554	SRP043223	SAMN02854796	3,251,411	58%	38%
SRR2989664	SRX1478129	SRP067267	SAMN04331754	26,432,310	45%	8%
SRR2989723	SRX1478171	SRP067267	SAMN04331755	33,609,337	46%	8%
SRR2989724	SRX1478172	SRP067267	SAMN04331756	25,242,545	72%	11%
SRR2993800	SRX1480596	SRP067267	SAMN04331757	31,809,731	72%	11%
SRR2993801	SRX1480597	SRP067267	SAMN04331758	41,753,974	71%	11%
SRR2993802	SRX1480599	SRP067267	SAMN04331775	19,548,217	75%	11%
SRR2993803	SRX1480602	SRP067267	SAMN04331776	33,650,599	75%	11%
SRR3017207	SRX1485099	SRP067267	SAMN04331777	40,968,111	59%	9%
SRR2993807	SRX1480608	SRP067267	SAMN04331778	41,752,184	74%	11%
SRR5317833	SRX2617316	SRP101465	SAMN06481838	524,431,098	42%	12%
SRR6702974	SRX3677152	SRP132560	SAMN08513157	105,959,590	81%	24%
SRR6702973	SRX3677153	SRP132560	SAMN08513157	180,355,958	79%	24%
SRR6822734	SRX3779278	SRP134385	SAMN08662384	33,859,641	76%	9%
SRR6822735	SRX3779277	SRP134385	SAMN08662385	49,954,837	66%	8%
SRR6822732	SRX3779280	SRP134385	SAMN08662576	44,826,680	53%	10%
SRR6822733	SRX3779279	SRP134385	SAMN08662581	31,450,492	43%	12%
SRR6822738	SRX3779274	SRP134385	SAMN08662951	37,382,914	58%	10%
SRR6822739	SRX3779273	SRP134385	SAMN08662952	33,448,124	56%	10%
SRR6822737	SRX3779275	SRP134385	SAMN08662958	42,378,095	68%	10%
SRR6822740	SRX3779272	SRP134385	SAMN08662963	23,406,497	84%	13%
SRR6822741	SRX3779271	SRP134385	SAMN08662968	30,197,924	82%	13%
SRR7286796	SRX4190349	SRP150145	SAMN09383130	23,418,592	80%	21%
SRR7286795	SRX4190350	SRP150145	SAMN09383131	21,457,360	79%	21%
SRR7286798	SRX4190347	SRP150145	SAMN09383132	23,375,210	79%	21%
SRR7286797	SRX4190348	SRP150145	SAMN09383133	16,000,396	79%	21%
SRR7286800	SRX4190345	SRP150145	SAMN09383134	29,030,496	80%	22%
SRR7286799	SRX4190346	SRP150145	SAMN09383135	31,625,866	80%	21%
SRR7286802	SRX4190343	SRP150145	SAMN09383136	25,761,192	79%	20%
SRR7286801	SRX4190344	SRP150145	SAMN09383137	28,852,778	80%	22%
SRR7286804	SRX4190341	SRP150145	SAMN09383138	31,388,418	80%	21%
SRR7286803	SRX4190342	SRP150145	SAMN09383139	26,512,312	81%	22%
SRR7286810	SRX4190335	SRP150145	SAMN09383140	59,105,680	79%	22%
SRR7286809	SRX4190336	SRP150145	SAMN09383141	61,129,168	80%	20%
SRR7286808	SRX4190337	SRP150145	SAMN09383142	23,890,958	81%	22%
SRR7286807	SRX4190338	SRP150145	SAMN09383143	24,584,932	81%	22%
SRR7286814	SRX4190331	SRP150145	SAMN09383144	21,292,342	79%	21%
SRR7286813	SRX4190332	SRP150145	SAMN09383145	22,870,504	81%	21%
SRR7286812	SRX4190333	SRP150145	SAMN09383146	57,308,458	78%	21%
SRR7286811	SRX4190334	SRP150145	SAMN09383147	27,001,856	81%	22%
SRR7286806	SRX4190339	SRP150145	SAMN09383148	80,529,784	82%	23%
SRR7286805	SRX4190340	SRP150145	SAMN09383149	28,390,304	78%	21%
SRR7286793	SRX4190352	SRP150145	SAMN09383150	27,812,454	81%	23%
SRR7286794	SRX4190351	SRP150145	SAMN09383151	28,508,046	81%	22%
SRR7286791	SRX4190354	SRP150145	SAMN09383152	23,873,312	82%	22%
SRR7286792	SRX4190353	SRP150145	SAMN09383153	27,751,328	78%	22%
SRR10123456	SRX6852925	SRP221719	SAMN12766353	36,207,074	55%	34%
SRR10123455	SRX6852926	SRP221719	SAMN12766354	34,713,812	39%	28%
SRR10123454	SRX6852927	SRP221719	SAMN12766355	36,647,684	43%	29%
SRR11908808	SRX8455371	SRP265652	SAMN15086211	51,976,602	37%	29%
SRR11908807	SRX8455370	SRP265652	SAMN15086212	48,376,992	35%	28%
SRR11908806	SRX8455369	SRP265652	SAMN15086213	52,978,266	33%	26%
SRR11908805	SRX8455368	SRP265652	SAMN15086214	55,155,938	48%	37%
SRR11908804	SRX8455367	SRP265652	SAMN15086215	50,242,374	50%	37%
SRR11908803	SRX8455366	SRP265652	SAMN15086216	46,912,662	44%	35%
SRR11908802	SRX8455365	SRP265652	SAMN15086217	44,496,928	49%	35%
SRR11908801	SRX8455364	SRP265652	SAMN15086218	46,197,036	45%	34%
SRR11908800	SRX8455363	SRP265652	SAMN15086219	53,012,248	48%	39%
SRR12477440	SRX8971327	SRP278112	SAMN15857353	36,214,537	66%	9%
SRR12477439	SRX8971328	SRP278112	SAMN15857354	43,754,338	70%	10%
SRR12477434	SRX8971333	SRP278112	SAMN15857355	49,177,383	67%	9%
SRR12477433	SRX8971334	SRP278112	SAMN15857356	43,009,464	51%	10%
SRR12477432	SRX8971335	SRP278112	SAMN15857357	23,127,885	67%	10%
SRR12477431	SRX8971336	SRP278112	SAMN15857358	31,958,015	67%	10%
SRR12477430	SRX8971337	SRP278112	SAMN15857359	22,679,304	69%	12%
SRR12477429	SRX8971338	SRP278112	SAMN15857360	22,679,304	69%	10%
SRR12477428	SRX8971339	SRP278112	SAMN15857361	32,568,890	69%	10%
SRR12477427	SRX8971340	SRP278112	SAMN15857362	26,207,818	45%	8%
SRR12477438	SRX8971329	SRP278112	SAMN15857363	33,618,942	45%	8%
SRR12477437	SRX8971330	SRP278112	SAMN15857364	22,795,815	68%	10%
SRR12477436	SRX8971331	SRP278112	SAMN15857365	31,134,771	69%	10%
SRR12477435	SRX8971332	SRP278112	SAMN15857366	29,463,501	69%	10%
SRR19434760	SRX15487977	SRP377473	SAMN28704436	48,026,426	54%	42%
SRR19434759	SRX15487978	SRP377473	SAMN28704437	47,127,762	56%	42%
SRR19434753	SRX15487984	SRP377473	SAMN28704438	41,839,350	56%	42%
SRR19434752	SRX15487985	SRP377473	SAMN28704439	46,269,808	35%	36%
SRR19434751	SRX15487986	SRP377473	SAMN28704440	42,260,790	36%	36%
SRR19434750	SRX15487987	SRP377473	SAMN28704441	44,296,222	35%	35%
SRR19434749	SRX15487988	SRP377473	SAMN28704442	44,245,368	37%	36%
SRR19434748	SRX15487989	SRP377473	SAMN28704443	47,264,442	38%	37%
SRR19434747	SRX15487990	SRP377473	SAMN28704444	46,661,868	36%	35%
SRR19434746	SRX15487991	SRP377473	SAMN28704445	45,754,112	35%	38%
SRR19434758	SRX15487979	SRP377473	SAMN28704446	45,241,356	40%	38%
SRR19434757	SRX15487980	SRP377473	SAMN28704447	49,022,616	36%	38%
SRR19434756	SRX15487981	SRP377473	SAMN28704448	44,262,990	36%	36%
SRR19434755	SRX15487982	SRP377473	SAMN28704449	46,386,970	35%	37%
SRR19434754	SRX15487983	SRP377473	SAMN28704450	46,827,408	37%	36%
SRR19908529	SRX15951264	SRP384371	SAMN29420670	49,361,220	53%	12%
SRR19908528	SRX15951265	SRP384371	SAMN29420671	31,214,886	65%	12%
SRR19908517	SRX15951276	SRP384371	SAMN29420672	48,359,910	78%	13%
SRR19908512	SRX15951281	SRP384371	SAMN29420673	77,106,069	83%	12%
SRR19908511	SRX15951282	SRP384371	SAMN29420674	34,921,568	70%	12%
SRR19908510	SRX15951283	SRP384371	SAMN29420675	38,042,601	67%	15%
SRR19908509	SRX15951284	SRP384371	SAMN29420676	34,575,824	81%	12%
SRR19908508	SRX15951285	SRP384371	SAMN29420677	39,885,416	72%	11%
SRR19908507	SRX15951286	SRP384371	SAMN29420678	33,805,230	66%	12%
SRR19908506	SRX15951287	SRP384371	SAMN29420679	63,552,373	74%	11%
SRR19908527	SRX15951266	SRP384371	SAMN29420680	47,619,698	76%	12%
SRR19908526	SRX15951267	SRP384371	SAMN29420681	41,392,924	73%	12%
SRR19908525	SRX15951268	SRP384371	SAMN29420682	49,025,913	77%	9%
SRR19908524	SRX15951269	SRP384371	SAMN29420683	28,953,753	81%	11%
SRR19908523	SRX15951270	SRP384371	SAMN29420684	54,917,795	76%	9%
SRR19908522	SRX15951271	SRP384371	SAMN29420685	48,491,765	80%	10%
SRR19908521	SRX15951272	SRP384371	SAMN29420686	40,035,359	79%	9%
SRR19908520	SRX15951273	SRP384371	SAMN29420687	34,392,716	79%	9%
SRR19908519	SRX15951274	SRP384371	SAMN29420688	34,770,965	83%	11%
SRR19908518	SRX15951275	SRP384371	SAMN29420689	63,702,815	82%	10%
SRR19908516	SRX15951277	SRP384371	SAMN29420690	33,537,107	81%	10%
SRR19908515	SRX15951278	SRP384371	SAMN29420691	31,896,440	84%	13%
SRR19908514	SRX15951279	SRP384371	SAMN29420692	92,969,226	80%	12%
SRR19908513	SRX15951280	SRP384371	SAMN29420693	35,935,606	71%	15%
SRR21031640	SRX17047117	SRP391941	SAMN30260860	46,290,404	50%	41%
SRR21031639	SRX17047118	SRP391941	SAMN30260861	48,235,416	50%	40%
SRR21031633	SRX17047124	SRP391941	SAMN30260862	46,436,760	50%	41%
SRR21031632	SRX17047125	SRP391941	SAMN30260863	55,255,138	49%	43%
SRR21031631	SRX17047126	SRP391941	SAMN30260864	46,495,740	48%	42%
SRR21031630	SRX17047127	SRP391941	SAMN30260865	47,719,986	48%	42%
SRR21031629	SRX17047128	SRP391941	SAMN30260866	42,404,526	48%	42%
SRR21031628	SRX17047129	SRP391941	SAMN30260867	43,696,014	48%	41%
SRR21031627	SRX17047130	SRP391941	SAMN30260868	40,881,376	49%	42%
SRR21031626	SRX17047131	SRP391941	SAMN30260869	45,662,460	47%	41%
SRR21031638	SRX17047119	SRP391941	SAMN30260870	43,445,564	46%	41%
SRR21031637	SRX17047120	SRP391941	SAMN30260871	44,806,956	44%	41%
SRR21031636	SRX17047121	SRP391941	SAMN30260872	44,145,330	47%	43%
SRR21031635	SRX17047122	SRP391941	SAMN30260873	42,200,976	46%	42%
SRR21031634	SRX17047123	SRP391941	SAMN30260874	46,395,618	44%	43%
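Sample-level rows combine the runs of each BioSample. The Python sketch below aggregates run rows by sample. It weights "percent aligned" by reads and "percent with introns" by aligned reads. The result is only approximate because the run-level percentages are already rounded. The example uses the two runs listed for SAMN08513157, and their aggregate matches that sample's row (286,315,548 reads, 80%, 24%).

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    run: str
    sample: str
    reads: int
    pct_aligned: float               # percent of reads that aligned
    pct_aligned_with_introns: float  # percent of aligned reads spanning an intron


def aggregate_by_sample(runs: list[RunStats]) -> dict[str, tuple[int, int, int]]:
    """Return {sample: (reads, % aligned, % of aligned reads with introns)}."""
    totals: dict[str, list[float]] = {}
    for r in runs:
        aligned = r.reads * r.pct_aligned / 100
        t = totals.setdefault(r.sample, [0, 0.0, 0.0])
        t[0] += r.reads
        t[1] += aligned
        t[2] += aligned * r.pct_aligned_with_introns / 100
    return {
        sample: (int(reads),
                 round(100 * aligned / reads) if reads else 0,
                 round(100 * spliced / aligned) if aligned else 0)
        for sample, (reads, aligned, spliced) in totals.items()
    }


runs = [
    RunStats("SRR6702974", "SAMN08513157", 105_959_590, 81, 24),
    RunStats("SRR6702973", "SAMN08513157", 180_355_958, 79, 24),
]
print(aggregate_by_sample(runs))  # {'SAMN08513157': (286315548, 80, 24)}
```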

SRA Long Read Alignment Statistics

Minimap2 alignments of the following long RNA-Seq reads (PacBio, Oxford Nanopore, 454, or other long-read sequencing technologies) from the Sequence Read Archive were used for gene prediction:

Run	Sample	Number of reads	Number (%) of sequences aligned by Minimap2	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
All	NA	1,397,545	1,152,992 (82.50%)	458,203 (32.78%)	98.93%	99.27%
SRR006886	SAMN00007559	62,730	16,306 (25.99%)	0 (0.00%)	0%	0%
SRR006887	SAMN00007559	469,406	407,400 (86.79%)	0 (0.00%)	0%	0%
SRR038715	SAMN00010599	5,560	2,290 (41.18%)	0 (0.00%)	0%	0%
SRR038716	SAMN00010599	55,532	46,762 (84.20%)	0 (0.00%)	0%	0%
SRR038717	SAMN00010599	14,445	11,778 (81.53%)	0 (0.00%)	0%	0%
SRR038718	SAMN00010599	44,445	40,505 (91.13%)	0 (0.00%)	0%	0%
SRR038719	SAMN00010599	23,043	17,187 (74.58%)	0 (0.00%)	0%	0%
SRR038720	SAMN00010599	56,388	39,518 (70.08%)	0 (0.00%)	0%	0%
SRR038721	SAMN00010599	33,922	176 (0.51%)	0 (0.00%)	0%	0%
SRR535790	SAMN01109476	318,807	296,630 (93.04%)	238,566 (74.83%)	99.02%	99.36%
SRR535791	SAMN01109477	155,620	139,378 (89.56%)	109,870 (70.60%)	98.78%	99.22%
SRR535792	SAMN01109478	157,647	135,062 (85.67%)	109,767 (69.62%)	98.93%	99.16%
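The "All" row is the column-wise sum of the per-run rows. The percentages in this table appear to be truncated rather than rounded to two decimals. For example, 2,290 of 5,560 reads is 41.187%, which is shown as 41.18%. The Python sketch below therefore truncates using integer arithmetic. This convention is inferred from the values, not documented in the report.

```python
def truncated_pct(part: int, whole: int) -> str:
    """part/whole as a percentage truncated (not rounded) to two decimals."""
    hundredths = part * 10_000 // whole
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# (run, reads, aligned by Minimap2, passed to Gnomon), copied from the table above.
ROWS = [
    ("SRR006886", 62_730, 16_306, 0),
    ("SRR006887", 469_406, 407_400, 0),
    ("SRR038715", 5_560, 2_290, 0),
    ("SRR038716", 55_532, 46_762, 0),
    ("SRR038717", 14_445, 11_778, 0),
    ("SRR038718", 44_445, 40_505, 0),
    ("SRR038719", 23_043, 17_187, 0),
    ("SRR038720", 56_388, 39_518, 0),
    ("SRR038721", 33_922, 176, 0),
    ("SRR535790", 318_807, 296_630, 238_566),
    ("SRR535791", 155_620, 139_378, 109_870),
    ("SRR535792", 157_647, 135_062, 109_767),
]

reads = sum(r[1] for r in ROWS)
aligned = sum(r[2] for r in ROWS)
passed = sum(r[3] for r in ROWS)
print(reads, aligned, truncated_pct(aligned, reads), passed, truncated_pct(passed, reads))
# 1397545 1152992 82.50% 458203 32.78%
```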

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Varroa destructor high-quality model RefSeq (XP_)	9,670	6,646 (68.73%)	6,646 (68.73%)	63.34%	53.88%
Pediculus humanus corporis model RefSeq (XP_)	10,775	6,734 (62.50%)	6,734 (62.50%)	61.90%	52.62%
Tetranychus urticae high-quality model RefSeq (XP_)	10,412	6,581 (63.21%)	6,581 (63.21%)	60.79%	49.17%
Same-species GenBank	32	32 (100.00%)	32 (100.00%)	86.50%	87.88%
Arthropoda GenBank	180,050	118,314 (65.71%)	118,314 (65.71%)	65.26%	62.83%
Arthropoda known RefSeq (NP_)	39,899	23,296 (58.39%)	23,296 (58.39%)	62.51%	49.54%
Limulus polyphemus high-quality model RefSeq (XP_)	14,599	11,388 (78.01%)	11,388 (78.01%)	64.17%	58.24%
Ixodes scapularis high-quality model RefSeq (XP_)	13,514	11,565 (85.58%)	11,565 (85.58%)	67.28%	73.02%
Tribolium castaneum high-quality model RefSeq (XP_)	11,487	7,126 (62.04%)	7,126 (62.04%)	59.68%	45.97%
Apis mellifera high-quality model RefSeq (XP_)	8,879	5,814 (65.48%)	5,814 (65.48%)	61.11%	50.70%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
