NCBI Drosophila innubila Annotation Release 100

The RefSeq genome records for Drosophila innubila were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Drosophila innubila Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: May 7 2020
Date of submission of annotation to the public databases: May 20 2020
Software version: 8.4

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
UK_Dinn_1.0	GCF_004354385.1	University of Kansas	03-19-2019	Reference	2 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	UK_Dinn_1.0
Genes and pseudogenes	15,094
protein-coding	13,595
non-coding	1,345
transcribed pseudogenes	0
non-transcribed pseudogenes	154
genes with variants	3,018
immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	19,534
fully-supported	18,135
with > 5% ab initio	701
partial	111
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	19,534
non-coding RNAs	1,765
fully-supported	1,213
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	1,432
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	19,534
fully-supported	18,135
with > 5% ab initio	755
partial	111
with major correction(s)	305
known RefSeq (NP_)	0
model RefSeq (XP_)	19,534

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	14,940	6,490	2,061	47	250,661
All transcripts	21,299	2,236	1,654	47	56,967
mRNA	19,534	2,380	1,763	129	56,967
misc_RNA	243	1,961	1,389	128	10,469
tRNA	333	74	73	71	84
lncRNA	970	540	446	81	4,250
snoRNA	70	108	89	47	303
snRNA	34	158	166	65	269
guide_RNA	4	145	159	105	182
rRNA	111	745	119	118	3,955
Single-exon transcripts	1,763	1,179	999	129	5,537
coding transcripts (NM_/XM_ )	1,763	1,179	999	129	5,537
CDSs	19,534	1,857	1,332	99	55,611
Exons	69,644	451	258	2	13,188
in coding transcripts (NM_/XM_ )	67,092	460	263	2	13,188
in non-coding transcripts (NR_/XR_ )	3,262	268	164	2	4,676
Introns	53,190	1,588	115	30	179,350
in coding transcripts (NM_/XM_ )	51,639	1,543	114	30	179,350
in non-coding transcripts (NR_/XR_ )	2,238	2,588	148	30	168,382
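
The two tables above can be cross-checked with simple arithmetic. The sketch below copies counts from the Feature counts and Feature lengths tables and recomputes the totals. The variable names are illustrative and not part of the pipeline.

```python
# Counts copied from the Feature counts and Feature lengths tables above.
genes_and_pseudogenes = 15_094
pseudogenes = 0 + 154  # transcribed + non-transcribed pseudogenes

transcripts_by_type = {
    "mRNA": 19_534,
    "misc_RNA": 243,
    "tRNA": 333,
    "lncRNA": 970,
    "snoRNA": 70,
    "snRNA": 34,
    "guide_RNA": 4,
    "rRNA": 111,
}

# The Feature lengths table excludes pseudogenes from the gene count.
genes_without_pseudogenes = genes_and_pseudogenes - pseudogenes

# "All transcripts" should equal the sum over the individual transcript types.
total_transcripts = sum(transcripts_by_type.values())

# Everything that is not an mRNA is counted as a non-coding RNA.
non_coding_transcripts = total_transcripts - transcripts_by_type["mRNA"]

print(genes_without_pseudogenes, total_transcripts, non_coding_transcripts)
```

The three recomputed values agree with the Genes, All transcripts and non-coding RNAs rows reported above.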

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.43	1	1	43
Number of exons per transcript	5.16	4	1	77

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the Drosophila melanogaster known RefSeq proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 13,595 coding genes, 13,117 genes had a protein with an alignment covering 50% or more of the query and 8,082 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
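
As an illustration of the two definitions, the following sketch computes coverage from the span of a single alignment. It is a simplification: it assumes 1-based inclusive coordinates and one contiguous aligned span per sequence, and the function name and example coordinates are hypothetical rather than taken from the pipeline.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence spanned by an alignment.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    if not 1 <= aligned_start <= aligned_end <= sequence_length:
        raise ValueError("alignment coordinates must lie within the sequence")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical hit: residues 11-460 of a 500-residue annotated protein (query)
# aligned to residues 1-450 of a 600-residue D. melanogaster protein (target).
query_cov = coverage_percent(11, 460, 500)
target_cov = coverage_percent(1, 450, 600)
print(query_cov, target_cov)
```

In this made-up case the hit would count toward the 50% query coverage threshold but not the 95% one.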

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: Drosophila melanogaster known RefSeq proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker and RepeatMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
UK_Dinn_1.0	GCF_004354385.1	13.76%	35.99%
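
A masked percentage of this kind can be estimated from a sequence file. The sketch below assumes a soft-masked FASTA in which masked bases are written in lowercase. The report does not state how the masked sequence is encoded, so this is an assumption, and `masked_percent` is a hypothetical helper rather than a pipeline function.

```python
def masked_percent(fasta_lines):
    """Percentage of lowercase (assumed soft-masked) bases in FASTA text lines."""
    masked = 0
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):  # skip sequence header lines
            continue
        sequence = line.strip()
        total += len(sequence)
        masked += sum(1 for base in sequence if base.islower())
    return 100.0 * masked / total if total else 0.0


# Toy input, not taken from the UK_Dinn_1.0 assembly.
toy_fasta = [
    ">scaffold_1",
    "ACGTacgtAC",
    "ggNNACGTAC",
]
toy_masked = masked_percent(toy_fasta)
print(toy_masked)
```

For a real assembly the same function could be fed the lines of an opened FASTA file instead of the toy list.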

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

No transcript evidence was used in this annotation.

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	967,706,053	49%	10%	71,850
SAMN08322877	NA	42 N 78 W (Drosophila falleni, SAMN08322877)	52,138,204	49%	18%	41,317
SAMN08322878	NA	42 N 78 W (Drosophila falleni, SAMN08322878)	60,393,366	7%	14%	29,966
SAMN08322879	NA	42 N 78 W (Drosophila falleni, SAMN08322879)	74,778,384	23%	11%	36,755
SAMN08322880	NA	42 N 78 W (Drosophila falleni, SAMN08322880)	69,975,052	35%	16%	41,163
SAMN08322881	NA	42 N 78 W (Drosophila falleni, SAMN08322881)	53,356,052	46%	18%	41,609
SAMN08322882	NA	42 N 78 W (Drosophila falleni, SAMN08322882)	67,017,348	9%	17%	33,257
SAMN08322883	NA	42 N 78 W (Drosophila falleni, SAMN08322883)	53,808,324	25%	16%	39,857
SAMN08322884	NA	42 N 78 W (Drosophila falleni, SAMN08322884)	50,844,954	41%	17%	39,217
SAMN10320660	NA	whole body (Drosophila falleni, SAMN10320660)	4,716,897	50%	5%	10,787
SAMN10320661	NA	whole body (Drosophila falleni, SAMN10320661)	2,504,234	33%	4%	5,106
SAMN10320662	NA	whole body (Drosophila falleni, SAMN10320662)	4,312,091	71%	7%	6,975
SAMN10320663	NA	whole body (Drosophila falleni, SAMN10320663)	4,873,245	73%	5%	7,163
SAMN10320664	NA	whole body (Drosophila falleni, SAMN10320664)	3,261,579	76%	2%	5,326
SAMN10320665	NA	whole body (Drosophila falleni, SAMN10320665)	2,541,774	71%	3%	5,821
SAMN10320666	NA	whole body (Drosophila falleni, SAMN10320666)	11,103,888	61%	5%	12,868
SAMN10320667	NA	whole body (Drosophila falleni, SAMN10320667)	2,283,231	54%	6%	11,780
SAMN10320672	NA	whole body (Drosophila falleni, SAMN10320672)	8,189,368	63%	5%	12,224
SAMN10320673	NA	whole body (Drosophila falleni, SAMN10320673)	3,589,760	65%	7%	7,241
SAMN10320674	NA	whole body (Drosophila falleni, SAMN10320674)	2,328,674	84%	1%	5,131
SAMN10320682	NA	whole body (Drosophila falleni, SAMN10320682)	3,808,659	50%	2%	4,601
SAMN10320683	NA	whole body (Drosophila falleni, SAMN10320683)	4,401,475	73%	6%	9,111
SAMN10320684	NA	whole body (Drosophila falleni, SAMN10320684)	1,799,754	78%	2%	5,093
SAMN10320691	NA	whole body (Drosophila falleni, SAMN10320691)	790,173	72%	3%	4,158
SAMN10320692	NA	whole body (Drosophila falleni, SAMN10320692)	3,796,311	50%	4%	7,297
SAMN10320693	NA	whole body (Drosophila falleni, SAMN10320693)	3,925,510	44%	4%	6,274
SAMN10320697	NA	whole body (Drosophila falleni, SAMN10320697)	1,055,076	62%	7%	4,881
SAMN10320698	NA	whole body (Drosophila falleni, SAMN10320698)	3,660,317	69%	2%	5,594
SAMN10320699	NA	whole body (Drosophila falleni, SAMN10320699)	2,585,735	42%	7%	4,349
SAMN10320703	NA	whole body (Drosophila falleni, SAMN10320703)	11,007,055	65%	5%	18,079
SAMN10320704	NA	whole body (Drosophila falleni, SAMN10320704)	7,535,100	62%	5%	16,101
SAMN10320705	NA	whole body (Drosophila falleni, SAMN10320705)	6,443,369	56%	4%	12,638
SAMN10320712	NA	whole body (Drosophila falleni, SAMN10320712)	4,953,452	73%	2%	12,108
SAMN10320713	NA	whole body (Drosophila falleni, SAMN10320713)	8,109,896	57%	5%	15,758
SAMN10320714	NA	whole body (Drosophila falleni, SAMN10320714)	3,289,711	70%	2%	5,836
SAMN10320715	NA	whole body (Drosophila falleni, SAMN10320715)	7,953,703	66%	5%	17,222
SAMN10320722	NA	whole body (Drosophila falleni, SAMN10320722)	1,460,190	64%	6%	7,244
SAMN10320723	NA	whole body (Drosophila falleni, SAMN10320723)	2,840,224	62%	5%	9,635
SAMN10320724	NA	whole body (Drosophila falleni, SAMN10320724)	2,941,248	58%	5%	11,493
SAMN10320725	NA	whole body (Drosophila falleni, SAMN10320725)	10,984,657	40%	5%	11,146
SAMN10320733	NA	whole body (Drosophila falleni, SAMN10320733)	5,611,671	63%	5%	12,535
SAMN10320734	NA	whole body (Drosophila falleni, SAMN10320734)	5,202,182	75%	5%	11,143
SAMN10320735	NA	whole body (Drosophila falleni, SAMN10320735)	5,535,413	23%	5%	8,206
SAMN10320736	NA	whole body (Drosophila falleni, SAMN10320736)	3,273,254	67%	5%	9,026
SAMN10320737	NA	whole body (Drosophila falleni, SAMN10320737)	5,144,614	63%	4%	10,815
SAMN10320738	NA	whole body (Drosophila falleni, SAMN10320738)	2,478,646	58%	6%	8,000
SAMN10320739	NA	whole body (Drosophila falleni, SAMN10320739)	4,909,522	35%	5%	7,020
SAMN10320767	NA	whole body (Drosophila falleni, SAMN10320767)	5,137,450	77%	6%	12,044
SAMN10320768	NA	whole body (Drosophila falleni, SAMN10320768)	3,005,791	75%	5%	8,178
SAMN10320769	NA	whole body (Drosophila falleni, SAMN10320769)	3,277,654	79%	6%	8,642
SAMN10320770	NA	whole body (Drosophila falleni, SAMN10320770)	4,035,925	56%	7%	8,120
SAMN10320771	NA	whole body (Drosophila falleni, SAMN10320771)	17,622,065	67%	6%	17,395
SAMN10320772	NA	whole body (Drosophila falleni, SAMN10320772)	8,045,143	67%	4%	10,465
SAMN10320773	NA	whole body (Drosophila falleni, SAMN10320773)	5,056,588	66%	4%	9,194
SAMN10320774	NA	whole body (Drosophila falleni, SAMN10320774)	3,301,166	66%	7%	8,076
SAMN10320779	NA	whole body (Drosophila falleni, SAMN10320779)	11,312,265	72%	4%	14,655
SAMN10320780	NA	whole body (Drosophila falleni, SAMN10320780)	5,497,505	69%	5%	11,524
SAMN10320781	NA	whole body (Drosophila falleni, SAMN10320781)	5,437,892	37%	5%	9,668
SAMN10320782	NA	whole body (Drosophila falleni, SAMN10320782)	4,170,573	79%	5%	9,220
SAMN10320783	NA	whole body (Drosophila falleni, SAMN10320783)	6,228,768	71%	5%	13,561
SAMN10320802	NA	whole body (Drosophila falleni, SAMN10320802)	10,801,271	65%	4%	11,220
SAMN10320803	NA	whole body (Drosophila falleni, SAMN10320803)	7,263,633	75%	5%	13,031
SAMN10320804	NA	whole body (Drosophila falleni, SAMN10320804)	3,758,394	74%	4%	8,029
SAMN10320805	NA	whole body (Drosophila falleni, SAMN10320805)	3,428,182	63%	5%	7,140
SAMN10320806	NA	whole body (Drosophila falleni, SAMN10320806)	4,368,883	66%	6%	7,394
SAMN11037167	30865231	whole body (Drosophila innubila, adult, female, SAMN11037167)	15,743,259	71%	13%	25,917
SAMN11037168	30865231	whole body (Drosophila innubila, adult, male, SAMN11037168)	16,678,256	93%	10%	34,984
SAMN11037169	30865231	whole body (Drosophila innubila, adult, female, SAMN11037169)	15,108,306	88%	10%	30,922
SAMN11037170	30865231	whole body (Drosophila innubila, adult, male, SAMN11037170)	23,413,411	90%	9%	37,417
SAMN11037171	30865231	head (Drosophila innubila, adult, female, SAMN11037171)	13,490,766	64%	10%	10,480
SAMN11037172	30865231	head (Drosophila innubila, adult, male, SAMN11037172)	20,159,109	90%	10%	33,160
SAMN11037173	30865231	embryo, whole body (Drosophila innubila, embryo, SAMN11037173)	14,729,163	56%	9%	17,664
SAMN11037174	30865231	embryo, whole body (Drosophila innubila, embryo, SAMN11037174)	17,191,913	90%	11%	30,982
SAMN11037175	30865231	larvae, whole body (Drosophila innubila, larva, SAMN11037175)	17,002,327	89%	10%	28,160
SAMN11037176	30865231	larvae, whole body (Drosophila innubila, larvae, SAMN11037176)	16,712,914	74%	9%	21,429
SAMN11037177	30865231	pupae, whole body (Drosophila innubila, pupae, SAMN11037177)	14,692,306	80%	10%	35,199
SAMN11037178	30865231	pupae, whole body (Drosophila innubila, pupae, SAMN11037178)	13,521,833	53%	9%	24,526
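
The aggregate row is roughly what a read-weighted average of the per-sample percentages would give, although the report does not say how it is computed. The sketch below shows that calculation under this assumption. The function name is illustrative, and because the per-sample percentages are rounded, any result is only approximate.

```python
def weighted_percent_aligned(samples):
    """Read-weighted mean of per-sample alignment percentages.

    `samples` is an iterable of (number_of_reads, percent_aligned) pairs.
    """
    total_reads = 0
    aligned_reads = 0.0
    for reads, percent in samples:
        total_reads += reads
        aligned_reads += reads * percent / 100.0
    if total_reads == 0:
        raise ValueError("no reads to aggregate")
    return 100.0 * aligned_reads / total_reads


# First three rows of the by-sample table above: (reads, percent aligned).
subset = [
    (52_138_204, 49),
    (60_393_366, 7),
    (74_778_384, 23),
]
subset_percent = weighted_percent_aligned(subset)
print(round(subset_percent, 1))
```

Weighting by read count matters here because the samples differ in size by almost two orders of magnitude, so a plain mean of the percentages would overstate the contribution of the small libraries.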

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR6464550	SRX3554608	SRP128923	SAMN08322877	52,138,204	49%	18%
SRR6464546	SRX3554612	SRP128923	SAMN08322878	60,393,366	7%	14%
SRR6464600	SRX3554558	SRP128923	SAMN08322879	74,778,384	23%	11%
SRR6464601	SRX3554557	SRP128923	SAMN08322880	69,975,052	35%	16%
SRR6464609	SRX3554549	SRP128923	SAMN08322881	53,356,052	46%	18%
SRR6464570	SRX3554588	SRP128923	SAMN08322882	67,017,348	9%	17%
SRR6464608	SRX3554550	SRP128923	SAMN08322883	53,808,324	25%	16%
SRR6464611	SRX3554547	SRP128923	SAMN08322884	50,844,954	41%	17%
SRR8131139	SRX4952240	SRP167159	SAMN10320660	4,716,897	50%	5%
SRR8131140	SRX4952239	SRP167159	SAMN10320661	2,504,234	33%	4%
SRR8131141	SRX4952238	SRP167159	SAMN10320662	4,312,091	71%	7%
SRR8131142	SRX4952237	SRP167159	SAMN10320663	4,873,245	73%	5%
SRR8131143	SRX4952236	SRP167159	SAMN10320664	3,261,579	76%	2%
SRR8131144	SRX4952235	SRP167159	SAMN10320665	2,541,774	71%	3%
SRR8131135	SRX4952244	SRP167159	SAMN10320666	11,103,888	61%	5%
SRR8131136	SRX4952243	SRP167159	SAMN10320667	2,283,231	54%	6%
SRR8131056	SRX4952323	SRP167159	SAMN10320672	8,189,368	63%	5%
SRR8131057	SRX4952322	SRP167159	SAMN10320673	3,589,760	65%	7%
SRR8131058	SRX4952321	SRP167159	SAMN10320674	2,328,674	84%	1%
SRR8131152	SRX4952227	SRP167159	SAMN10320682	3,808,659	50%	2%
SRR8131151	SRX4952228	SRP167159	SAMN10320683	4,401,475	73%	6%
SRR8131150	SRX4952229	SRP167159	SAMN10320684	1,799,754	78%	2%
SRR8131232	SRX4952147	SRP167159	SAMN10320691	790,173	72%	3%
SRR8131229	SRX4952150	SRP167159	SAMN10320692	3,796,311	50%	4%
SRR8131230	SRX4952149	SRP167159	SAMN10320693	3,925,510	44%	4%
SRR8131226	SRX4952153	SRP167159	SAMN10320697	1,055,076	62%	7%
SRR8130976	SRX4952403	SRP167159	SAMN10320698	3,660,317	69%	2%
SRR8130975	SRX4952404	SRP167159	SAMN10320699	2,585,735	42%	7%
SRR8131316	SRX4952408	SRP167159	SAMN10320703	11,007,055	65%	5%
SRR8130974	SRX4952405	SRP167159	SAMN10320704	7,535,100	62%	5%
SRR8130973	SRX4952406	SRP167159	SAMN10320705	6,443,369	56%	4%
SRR8131097	SRX4952282	SRP167159	SAMN10320712	4,953,452	73%	2%
SRR8131098	SRX4952281	SRP167159	SAMN10320713	8,109,896	57%	5%
SRR8131099	SRX4952280	SRP167159	SAMN10320714	3,289,711	70%	2%
SRR8131100	SRX4952279	SRP167159	SAMN10320715	7,953,703	66%	5%
SRR8131186	SRX4952193	SRP167159	SAMN10320722	1,460,190	64%	6%
SRR8131185	SRX4952194	SRP167159	SAMN10320723	2,840,224	62%	5%
SRR8131184	SRX4952195	SRP167159	SAMN10320724	2,941,248	58%	5%
SRR8131183	SRX4952196	SRP167159	SAMN10320725	10,984,657	40%	5%
SRR8131262	SRX4952117	SRP167159	SAMN10320733	5,611,671	63%	5%
SRR8131259	SRX4952120	SRP167159	SAMN10320734	5,202,182	75%	5%
SRR8131260	SRX4952119	SRP167159	SAMN10320735	5,535,413	23%	5%
SRR8131265	SRX4952114	SRP167159	SAMN10320736	3,273,254	67%	5%
SRR8131266	SRX4952113	SRP167159	SAMN10320737	5,144,614	63%	4%
SRR8131021	SRX4952358	SRP167159	SAMN10320738	2,478,646	58%	6%
SRR8131020	SRX4952359	SRP167159	SAMN10320739	4,909,522	35%	5%
SRR8131312	SRX4952412	SRP167159	SAMN10320767	5,137,450	77%	6%
SRR8131268	SRX4952111	SRP167159	SAMN10320768	3,005,791	75%	5%
SRR8131267	SRX4952112	SRP167159	SAMN10320769	3,277,654	79%	6%
SRR8131270	SRX4952109	SRP167159	SAMN10320770	4,035,925	56%	7%
SRR8131269	SRX4952110	SRP167159	SAMN10320771	17,622,065	67%	6%
SRR8131264	SRX4952115	SRP167159	SAMN10320772	8,045,143	67%	4%
SRR8131263	SRX4952116	SRP167159	SAMN10320773	5,056,588	66%	4%
SRR8131080	SRX4952299	SRP167159	SAMN10320774	3,301,166	66%	7%
SRR8131182	SRX4952197	SRP167159	SAMN10320779	11,312,265	72%	4%
SRR8131195	SRX4952184	SRP167159	SAMN10320780	5,497,505	69%	5%
SRR8131196	SRX4952183	SRP167159	SAMN10320781	5,437,892	37%	5%
SRR8131193	SRX4952186	SRP167159	SAMN10320782	4,170,573	79%	5%
SRR8131194	SRX4952185	SRP167159	SAMN10320783	6,228,768	71%	5%
SRR8130979	SRX4952400	SRP167159	SAMN10320802	10,801,271	65%	4%
SRR8130980	SRX4952399	SRP167159	SAMN10320803	7,263,633	75%	5%
SRR8130981	SRX4952398	SRP167159	SAMN10320804	3,758,394	74%	4%
SRR8130982	SRX4952397	SRP167159	SAMN10320805	3,428,182	63%	5%
SRR8130988	SRX4952391	SRP167159	SAMN10320806	4,368,883	66%	6%
SRR8651759	SRX5449481	SRP187240	SAMN11037167	15,743,259	71%	13%
SRR8651758	SRX5449482	SRP187240	SAMN11037168	16,678,256	93%	10%
SRR8651757	SRX5449483	SRP187240	SAMN11037169	15,108,306	88%	10%
SRR8651756	SRX5449484	SRP187240	SAMN11037170	23,413,411	90%	9%
SRR8651765	SRX5449475	SRP187240	SAMN11037171	13,490,766	64%	10%
SRR8651764	SRX5449476	SRP187240	SAMN11037172	20,159,109	90%	10%
SRR8651769	SRX5449471	SRP187240	SAMN11037173	14,729,163	56%	9%
SRR8651768	SRX5449472	SRP187240	SAMN11037174	17,191,913	90%	11%
SRR8651771	SRX5449469	SRP187240	SAMN11037175	17,002,327	89%	10%
SRR8651770	SRX5449470	SRP187240	SAMN11037176	16,712,914	74%	9%
SRR8651767	SRX5449473	SRP187240	SAMN11037177	14,692,306	80%	10%
SRR8651766	SRX5449474	SRP187240	SAMN11037178	13,521,833	53%	9%

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Insecta GenBank	77,595	56,517 (72.84%)	56,517 (72.84%)	66.46%	67.34%
Drosophila melanogaster GenBank	28,019	11,813 (42.16%)	11,813 (42.16%)	72.62%	79.06%
Drosophila melanogaster known RefSeq (NP_)	30,157	20,440 (67.78%)	20,440 (67.78%)	73.64%	81.59%
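
The percentages in the table follow from the retrieved and aligned counts. As a minimal sketch, with counts copied from the table and an illustrative helper name:

```python
# (sequences retrieved from Entrez, sequences aligned by ProSplign), from the table above.
protein_sources = {
    "Insecta GenBank": (77_595, 56_517),
    "Drosophila melanogaster GenBank": (28_019, 11_813),
    "Drosophila melanogaster known RefSeq (NP_)": (30_157, 20_440),
}


def percent_aligned(retrieved: int, aligned: int) -> float:
    """Share of retrieved sequences that were aligned, rounded to two decimals."""
    return round(100.0 * aligned / retrieved, 2)


aligned_percentages = {
    source: percent_aligned(retrieved, aligned)
    for source, (retrieved, aligned) in protein_sources.items()
}

for source, percent in aligned_percentages.items():
    print(f"{source}: {percent}%")
```

For every source the number passed to Gnomon equals the number aligned, so the same percentages apply to both columns.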

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
