NCBI Panonychus citri Annotation Release GCF_014898815.1-RS_2023_01

The RefSeq genome records for Panonychus citri were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_014898815.1-RS_2023_01".

Date of Entrez queries for transcripts and proteins: Jan 26 2023
Date of submission of annotation to the public databases: Jan 30 2023
Software version: 10.1
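
The release name above appears to combine the assembly accession with an "RS" tag, a year and a two-digit number. A minimal sketch of splitting such a name follows, assuming that layout; the regular expression and the field names are illustrative guesses inferred from this one name, not an official NCBI grammar.

```python
import re

# Assumed layout: <assembly accession>-RS_<year>_<two-digit number>.
# Inferred from the release name in this report; hypothetical, not a specification.
RELEASE_NAME = re.compile(r"^(GC[AF]_\d{9}\.\d+)-RS_(\d{4})_(\d{2})$")


def split_release_name(name: str) -> dict:
    """Split an annotation release name into its assumed components."""
    match = RELEASE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    accession, year, number = match.groups()
    return {"accession": accession, "year": int(year), "number": int(number)}


parts = split_release_name("GCF_014898815.1-RS_2023_01")
print(parts)
```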

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM1489881v1	GCF_014898815.1	Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences	10-20-2020	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	ASM1489881v1
Genes and pseudogenes	12,885
  protein-coding	11,460
  non-coding	1,055
  Transcribed pseudogenes	1
  Non-transcribed pseudogenes	369
  genes with variants	2,399
  Immunoglobulin/T-cell receptor gene segments	0
  other	0
mRNAs	14,849
  fully-supported	13,777
  with > 5% ab initio	667
  partial	154
  with filled gap(s)	0
  known RefSeq (NM_)	0
  model RefSeq (XM_)	14,849
non-coding RNAs	2,530
  fully-supported	2,340
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	2,388
pseudo transcripts	1
  fully-supported	1
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	1
CDSs	14,862
  fully-supported	13,777
  with > 5% ab initio	718
  partial	154
  with major correction(s)	1,093
  known RefSeq (NP_)	0
  model RefSeq (XP_)	14,862
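
As a quick consistency check on the gene counts above, the sketch below adds the protein-coding, non-coding and pseudogene categories and compares the sum with the reported total. It assumes these categories do not overlap, which the report does not state explicitly but which holds for the numbers shown; the variable names are illustrative.

```python
# Gene counts copied from the feature counts table for ASM1489881v1.
protein_coding = 11_460
non_coding = 1_055
transcribed_pseudogenes = 1
non_transcribed_pseudogenes = 369
reported_total = 12_885  # "Genes and pseudogenes"

# Assumption: the four categories are mutually exclusive.
pseudogenes = transcribed_pseudogenes + non_transcribed_pseudogenes
computed_total = protein_coding + non_coding + pseudogenes
genes_without_pseudogenes = reported_total - pseudogenes

print(computed_total == reported_total)
print(genes_without_pseudogenes)
```

The second value matches the gene count used in the feature lengths table, which excludes pseudogenes.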

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	12,515	4,738	2,359	69	196,440
All transcripts	17,379	2,469	2,026	41	24,500
  mRNA	14,849	2,466	2,008	118	24,500
  misc_RNA	244	2,848	2,443	268	9,233
  tRNA	140	71	73	41	87
  lncRNA	2,096	2,660	2,366	177	11,097
  snoRNA	8	140	129	69	217
  snRNA	15	156	166	106	193
  rRNA	27	594	119	119	5,421
Single-exon transcripts	1,541	1,645	1,387	270	13,482
  coding transcripts (NM_/XM_)	1,541	1,645	1,387	270	13,482
CDSs	14,862	1,691	1,278	117	23,625
Exons	56,957	553	295	4	13,486
  in coding transcripts (NM_/XM_)	51,449	556	294	4	13,486
  in non-coding transcripts (NR_/XR_)	6,190	508	290	11	8,112
Introns	42,326	842	118	30	195,606
  in coding transcripts (NM_/XM_)	38,563	874	113	30	195,606
  in non-coding transcripts (NR_/XR_)	4,431	584	169	45	32,992

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.39	1	1	28
Number of exons per transcript	4.68	4	1	37
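
The mean number of transcripts per gene can be reproduced from the counts in the feature lengths table. This is a small worked check, assuming the mean is taken over the same 12,515 genes and 17,379 transcripts listed there. The mean number of exons per transcript cannot be checked the same way, since the exon count in that table appears to count exons shared between transcripts only once.

```python
# Counts copied from the feature lengths table (pseudogenes excluded).
genes = 12_515
transcripts = 17_379

mean_transcripts_per_gene = transcripts / genes
print(f"{mean_transcripts_per_gene:.2f}")
```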

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein per gene and the arachnida_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and presented in BUSCO notation.

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 11,447 coding genes, 8,250 genes had a protein with an alignment covering 50% or more of the query and 1,634 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
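
The two coverage definitions can be illustrated with a short sketch. The alignment coordinates and protein lengths below are invented for illustration and do not come from this annotation; the function name is also hypothetical.

```python
def coverage_percent(start: int, end: int, length: int) -> float:
    """Percentage of a sequence of the given length spanned by the
    1-based, inclusive alignment interval [start, end]."""
    if not 1 <= start <= end <= length:
        raise ValueError("expected 1 <= start <= end <= length")
    return 100.0 * (end - start + 1) / length


# Hypothetical hit: residues 11-360 of a 400-residue annotated protein (query)
# aligned to residues 1-350 of a 500-residue Swiss-Prot protein (target).
query_coverage = coverage_percent(start=11, end=360, length=400)
target_coverage = coverage_percent(start=1, end=350, length=500)

print(query_coverage, target_coverage)
```

In this made-up case the hit would count toward the 50% query coverage threshold but not the 95% one.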

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
ASM1489881v1	GCF_014898815.1	36.46%
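
A masked percentage of this kind can be approximated from a soft-masked sequence file. The sketch below assumes masked bases are written in lowercase and that every base, including any N, counts toward the denominator; neither convention is stated in this report, and the toy sequences are invented.

```python
from typing import Mapping


def percent_masked(sequences: Mapping[str, str]) -> float:
    """Percentage of bases that are lowercase (assumed to mean soft-masked)."""
    total = sum(len(seq) for seq in sequences.values())
    if total == 0:
        raise ValueError("no sequence data")
    masked = sum(1 for seq in sequences.values() for base in seq if base.islower())
    return 100.0 * masked / total


# Toy input: two made-up scaffolds, 8 lowercase bases out of 20.
toy_genome = {"scaffold_1": "ACGTacgtAC", "scaffold_2": "ggggTTTTTT"}
toy_percent = percent_masked(toy_genome)
print(f"{toy_percent:.2f}%")
```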

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	110	96 (87.27%)	90 (81.82%)	99.36%	99.84%
Same-species TSA	11,565	11,554 (99.90%)	11,531 (99.71%)	99.99%	99.98%
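
The percentages in parentheses are consistent with each count divided by the number of sequences retrieved from Entrez. The sketch below reproduces them from the counts in the table; the helper name is illustrative.

```python
def percent_of_retrieved(count: int, retrieved: int) -> str:
    """Format count as a percentage of retrieved, to two decimal places."""
    return f"{100.0 * count / retrieved:.2f}%"


# (source, retrieved, aligned by Splign, passed to Gnomon), from the table above.
rows = [
    ("Same-species GenBank", 110, 96, 90),
    ("Same-species TSA", 11_565, 11_554, 11_531),
]

summary = {
    source: (percent_of_retrieved(aligned, retrieved),
             percent_of_retrieved(passed, retrieved))
    for source, retrieved, aligned, passed in rows
}
print(summary)
```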

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	2,725,133,682	73%	17%	53,627
SAMEA1034596	miticides susceptible strain (Panonychus citri, SAMEA1034596)	26,822,224	92%	8%	33,909
SAMEA1034597	Hexythiazox resistant strain (Panonychus citri, SAMEA1034597)	25,955,556	92%	9%	34,627
SAMN02776974	MIGS Eukaryotic sample from Panonychus citri (Panonychus citri, SAMN02776974)	39,606,128	66%	11%	35,734
SAMN07430744	whole body (Panonychus citri, adult, female, SAMN07430744)	54,535,734	89%	16%	44,682
SAMN07430745	whole body (Panonychus citri, adult, female, SAMN07430745)	57,184,900	85%	14%	43,462
SAMN10995422	whole body (Panonychus citri, SAMN10995422)	65,270,766	60%	18%	43,750
SAMN10995424	whole body (Panonychus citri, SAMN10995424)	58,342,636	88%	16%	42,664
SAMN10995425	whole body (Panonychus citri, SAMN10995425)	60,396,880	85%	15%	42,349
SAMN10995428	whole body (Panonychus citri, SAMN10995428)	63,157,912	60%	18%	43,995
SAMN10995430	whole body (Panonychus citri, SAMN10995430)	63,281,770	53%	18%	43,390
SAMN10995431	whole body (Panonychus citri, SAMN10995431)	52,080,626	87%	17%	43,813
SAMN10995433	whole body (Panonychus citri, SAMN10995433)	67,904,070	79%	17%	43,666
SAMN10995434	whole body (Panonychus citri, SAMN10995434)	65,318,866	79%	17%	44,654
SAMN10995435	whole body (Panonychus citri, SAMN10995435)	72,945,604	89%	17%	46,047
SAMN11157749	whole body (Panonychus citri, SAMN11157749)	72,945,604	89%	17%	46,047
SAMN11157750	whole body (Panonychus citri, SAMN11157750)	58,342,636	88%	16%	42,664
SAMN11157751	whole body (Panonychus citri, SAMN11157751)	60,396,880	85%	15%	42,349
SAMN11157823	whole body (Panonychus citri, SAMN11157823)	70,351,298	84%	16%	42,485
SAMN11157824	whole body (Panonychus citri, SAMN11157824)	69,549,678	82%	15%	41,077
SAMN11157825	whole body (Panonychus citri, SAMN11157825)	60,403,424	87%	17%	44,479
SAMN13519016	Body (Panonychus citri, SAMN13519016)	90,876,178	77%	18%	41,967
SAMN13578004	SARI4 (Panonychus citri, female, SAMN13578004)	30,439,988	68%	19%	36,733
SAMN13578005	SARI3 (Panonychus citri, female, SAMN13578005)	35,851,822	73%	20%	38,166
SAMN13578006	SARI2 (Panonychus citri, female, SAMN13578006)	38,737,026	67%	20%	38,043
SAMN13578009	SARI1 (Panonychus citri, female, SAMN13578009)	39,627,558	66%	19%	37,885
SAMN13578010	RAMSAR4 (Panonychus citri, female, SAMN13578010)	33,866,698	71%	20%	37,497
SAMN13578011	RAMSAR3 (Panonychus citri, female, SAMN13578011)	33,098,262	56%	19%	35,860
SAMN13578013	RAMSAR2 (Panonychus citri, female, SAMN13578013)	34,950,938	61%	20%	37,035
SAMN13578015	RAMSAR1 (Panonychus citri, female, SAMN13578015)	39,106,876	63%	19%	38,300
SAMN13578016	LAHIJAN4 (Panonychus citri, female, SAMN13578016)	34,398,292	71%	21%	36,427
SAMN13578018	LAHIJAN3 (Panonychus citri, female, SAMN13578018)	28,089,786	63%	21%	34,367
SAMN13578019	LAHIJAN2 (Panonychus citri, female, SAMN13578019)	32,095,520	71%	21%	35,515
SAMN13578020	LAHIJAN1 (Panonychus citri, female, SAMN13578020)	37,445,824	53%	21%	35,605
SAMN13578022	RASHT4 (Panonychus citri, female, SAMN13578022)	31,395,854	65%	21%	34,833
SAMN13578023	RASHT3 (Panonychus citri, female, SAMN13578023)	32,748,010	72%	21%	36,034
SAMN13578024	RASHT2 (Panonychus citri, female, SAMN13578024)	45,911,242	71%	19%	19,836
SAMN13578025	RASHT1 (Panonychus citri, female, SAMN13578025)	42,004,222	70%	21%	36,094
SAMN16481338	whole body (Panonychus citri, SAMN16481338)	49,883,144	87%	19%	44,386
SAMN16481339	whole body (Panonychus citri, SAMN16481339)	41,715,478	85%	19%	43,320
SAMN16481340	whole body (Panonychus citri, SAMN16481340)	47,841,800	87%	19%	43,839
SAMN16481341	whole body (Panonychus citri, SAMN16481341)	44,155,506	87%	19%	43,669
SAMN16481342	whole body (Panonychus citri, SAMN16481342)	43,610,586	86%	19%	43,348
SAMN16481343	whole body (Panonychus citri, SAMN16481343)	47,163,226	88%	18%	43,225
SAMN16481344	whole body (Panonychus citri, SAMN16481344)	43,035,912	88%	19%	42,316
SAMN16481345	whole body (Panonychus citri, SAMN16481345)	45,500,492	86%	19%	42,826
SAMN16481346	whole body (Panonychus citri, SAMN16481346)	40,533,588	88%	19%	41,786
SAMN16481347	whole body (Panonychus citri, SAMN16481347)	42,114,044	87%	18%	42,674
SAMN16481348	whole body (Panonychus citri, SAMN16481348)	45,858,402	86%	18%	42,639
SAMN16481349	whole body (Panonychus citri, SAMN16481349)	49,369,694	88%	18%	43,841
SAMN17081356	whole body (Panonychus citri, adult, SAMN17081356)	82,302,448	88%	14%	42,486
SAMN20865588	deutonymph (Panonychus citri, SAMN20865588)	49,513,248	6%	18%	30,861
SAMN20865589	deutonymph (Panonychus citri, SAMN20865589)	42,414,022	43%	19%	39,237
SAMN20865590	deutonymph (Panonychus citri, SAMN20865590)	51,850,544	37%	19%	39,878
SAMN20865591	deutonymph (Panonychus citri, SAMN20865591)	42,586,740	7%	14%	28,838
SAMN20865592	deutonymph (Panonychus citri, SAMN20865592)	43,462,344	7%	15%	31,381
SAMN20865593	deutonymph (Panonychus citri, SAMN20865593)	46,785,176	14%	15%	35,599

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR044692	ERX021800	ERP000885	SAMEA1034596	13,411,112	93%	8%
ERR044693	ERX021801	ERP000885	SAMEA1034596	13,411,112	92%	8%
ERR044694	ERX021802	ERP000885	SAMEA1034597	12,977,778	91%	9%
ERR044695	ERX021803	ERP000885	SAMEA1034597	12,977,778	93%	9%
SRR1698040	SRX800412	SRP051661	SAMN02776974	39,606,128	66%	11%
SRR5892088	SRX3057835	SRP114745	SAMN07430744	54,535,734	89%	16%
SRR5892087	SRX3057836	SRP114745	SAMN07430745	57,184,900	85%	14%
SRR8658356	SRX5456066	SRP187337	SAMN10995422	65,270,766	60%	18%
SRR8658358	SRX5456064	SRP187337	SAMN10995424	58,342,636	88%	16%
SRR8658357	SRX5456065	SRP187337	SAMN10995425	60,396,880	85%	15%
SRR8658355	SRX5456067	SRP187337	SAMN10995428	63,157,912	60%	18%
SRR8658354	SRX5456068	SRP187337	SAMN10995430	63,281,770	53%	18%
SRR8658353	SRX5456069	SRP187337	SAMN10995431	52,080,626	87%	17%
SRR8658352	SRX5456070	SRP187337	SAMN10995433	67,904,070	79%	17%
SRR8658360	SRX5456062	SRP187337	SAMN10995434	65,318,866	79%	17%
SRR8658359	SRX5456063	SRP187337	SAMN10995435	72,945,604	89%	17%
SRR8749693	SRX5540672	SRP188804	SAMN11157749	72,945,604	89%	17%
SRR8749694	SRX5540671	SRP188804	SAMN11157750	58,342,636	88%	16%
SRR8749695	SRX5540670	SRP188804	SAMN11157751	60,396,880	85%	15%
SRR8749696	SRX5540669	SRP188804	SAMN11157823	70,351,298	84%	16%
SRR8749691	SRX5540674	SRP188804	SAMN11157824	69,549,678	82%	15%
SRR8749692	SRX5540673	SRP188804	SAMN11157825	60,403,424	87%	17%
SRR10613610	SRX7292770	SRP235325	SAMN13519016	45,496,038	66%	17%
SRR10613609	SRX7292771	SRP235325	SAMN13519016	45,380,140	87%	19%
SRR10708143	SRX7388968	SRP237799	SAMN13578004	30,439,988	68%	19%
SRR10708142	SRX7388967	SRP237799	SAMN13578005	35,851,822	73%	20%
SRR10708141	SRX7388966	SRP237799	SAMN13578006	38,737,026	67%	20%
SRR10708140	SRX7388965	SRP237799	SAMN13578009	39,627,558	66%	19%
SRR10708139	SRX7388963	SRP237799	SAMN13578010	33,866,698	71%	20%
SRR10708138	SRX7388962	SRP237799	SAMN13578011	33,098,262	56%	19%
SRR10708137	SRX7388961	SRP237799	SAMN13578013	34,950,938	61%	20%
SRR10708136	SRX7388960	SRP237799	SAMN13578015	39,106,876	63%	19%
SRR10708135	SRX7388959	SRP237799	SAMN13578016	34,398,292	71%	21%
SRR10708134	SRX7388958	SRP237799	SAMN13578018	28,089,786	63%	21%
SRR10708133	SRX7388957	SRP237799	SAMN13578019	32,095,520	71%	21%
SRR10708132	SRX7388956	SRP237799	SAMN13578020	37,445,824	53%	21%
SRR10708131	SRX7388955	SRP237799	SAMN13578022	31,395,854	65%	21%
SRR10708130	SRX7388954	SRP237799	SAMN13578023	32,748,010	72%	21%
SRR10708129	SRX7388953	SRP237799	SAMN13578024	45,911,242	71%	19%
SRR10708128	SRX7388952	SRP237799	SAMN13578025	42,004,222	70%	21%
SRR12855029	SRX9321972	SRP287932	SAMN16481338	49,883,144	87%	19%
SRR12855028	SRX9321973	SRP287932	SAMN16481339	41,715,478	85%	19%
SRR12855025	SRX9321976	SRP287932	SAMN16481340	47,841,800	87%	19%
SRR12855024	SRX9321977	SRP287932	SAMN16481341	44,155,506	87%	19%
SRR12855023	SRX9321978	SRP287932	SAMN16481342	43,610,586	86%	19%
SRR12855022	SRX9321979	SRP287932	SAMN16481343	47,163,226	88%	18%
SRR12855021	SRX9321980	SRP287932	SAMN16481344	43,035,912	88%	19%
SRR12855020	SRX9321981	SRP287932	SAMN16481345	45,500,492	86%	19%
SRR12855019	SRX9321982	SRP287932	SAMN16481346	40,533,588	88%	19%
SRR12855018	SRX9321983	SRP287932	SAMN16481347	42,114,044	87%	18%
SRR12855027	SRX9321974	SRP287932	SAMN16481348	45,858,402	86%	18%
SRR12855026	SRX9321975	SRP287932	SAMN16481349	49,369,694	88%	18%
SRR13254828	SRX9685830	SRP297981	SAMN17081356	82,302,448	88%	14%
SRR15533143	SRX11831473	SRP333427	SAMN20865588	49,513,248	6%	18%
SRR15533142	SRX11831474	SRP333427	SAMN20865589	42,414,022	43%	19%
SRR15533141	SRX11831475	SRP333427	SAMN20865590	51,850,544	37%	19%
SRR15533140	SRX11831476	SRP333427	SAMN20865591	42,586,740	7%	14%
SRR15533139	SRX11831477	SRP333427	SAMN20865592	43,462,344	7%	15%
SRR15533138	SRX11831478	SRP333427	SAMN20865593	46,785,176	14%	15%
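
The per-sample read counts appear to be the sum of the reads of the runs that belong to each sample. The sketch below checks this for the two samples that have more than one run, using values copied from the by-run table; it is a check on the reported numbers, not a description of how the pipeline aggregates them.

```python
from collections import defaultdict

# (run, sample, number of reads), copied from the by-run table.
runs = [
    ("ERR044692", "SAMEA1034596", 13_411_112),
    ("ERR044693", "SAMEA1034596", 13_411_112),
    ("SRR10613610", "SAMN13519016", 45_496_038),
    ("SRR10613609", "SAMN13519016", 45_380_140),
]

reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, reads in reads_per_sample.items():
    print(f"{sample}\t{reads:,}")
```

Both sums match the read counts listed for these samples in the by-sample table.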

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Varroa destructor high-quality model RefSeq (XP_)	9,670	5,878 (60.79%)	5,878 (60.79%)	57.57%	45.02%
Pediculus humanus corporis model RefSeq (XP_)	10,775	6,344 (58.88%)	6,344 (58.88%)	57.58%	47.01%
Tetranychus urticae high-quality model RefSeq (XP_)	10,412	9,769 (93.82%)	9,769 (93.82%)	71.12%	78.08%
Same-species GenBank	106	104 (98.11%)	104 (98.11%)	85.26%	93.08%
Arthropoda GenBank	183,043	118,046 (64.49%)	118,046 (64.49%)	61.53%	57.18%
Arthropoda known RefSeq (NP_)	40,237	22,984 (57.12%)	22,984 (57.12%)	58.33%	44.94%
Limulus polyphemus high-quality model RefSeq (XP_)	14,599	9,769 (66.92%)	9,769 (66.92%)	59.59%	50.20%
Ixodes scapularis high-quality model RefSeq (XP_)	13,514	6,965 (51.54%)	6,965 (51.54%)	58.56%	45.43%
Tribolium castaneum high-quality model RefSeq (XP_)	11,487	6,679 (58.14%)	6,679 (58.14%)	55.62%	40.94%
Apis mellifera high-quality model RefSeq (XP_)	8,879	5,457 (61.46%)	5,457 (61.46%)	56.95%	46.21%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
