NCBI Vulpes vulpes Annotation Release 100

The RefSeq genome records for Vulpes vulpes were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Vulpes vulpes Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Jul 26 2018
Date of submission of annotation to the public databases: Jul 31 2018
Software version: 8.1

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
VulVul2.2	GCF_003160815.1	University of Illinois at Urbana-Champaign	05-25-2018	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	VulVul2.2
Genes and pseudogenes	29,061
  protein-coding	19,366
  non-coding	4,109
  transcribed pseudogenes	0
  non-transcribed pseudogenes	5,475
  genes with variants	7,591
  immunoglobulin/T-cell receptor gene segments	111
  other	0
mRNAs	37,902
  fully-supported	34,401
  with > 5% ab initio	1,570
  partial	673
  with filled gap(s)	1
  known RefSeq (NM_)	0
  model RefSeq (XM_)	37,902
non-coding RNAs	6,002
  fully-supported	3,856
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	5,593
pseudo transcripts	0
  fully-supported	0
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	0
CDSs	38,026
  fully-supported	34,401
  with > 5% ab initio	1,926
  partial	693
  with major correction(s)	842
  known RefSeq (NP_)	0
  model RefSeq (XP_)	37,915
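The gene-level rows can be reconciled with a little arithmetic. As we read the table (the report itself does not spell this out), "Genes and pseudogenes" is the sum of the category rows excluding "genes with variants", which appears to be an overlapping subset of the coding and non-coding genes rather than a separate class. The sketch below copies the counts from the table and checks that reading; it also checks that the gene and transcript totals in the feature-length table below agree with these rows.

```python
# Gene-level counts copied from the "Feature counts" table (VulVul2.2).
gene_categories = {
    "protein-coding": 19_366,
    "non-coding": 4_109,
    "transcribed pseudogenes": 0,
    "non-transcribed pseudogenes": 5_475,
    "immunoglobulin/T-cell receptor gene segments": 111,
    "other": 0,
}
# Not added to the total: assumed to be already counted in the categories above.
genes_with_variants = 7_591

genes_and_pseudogenes = sum(gene_categories.values())
print(genes_and_pseudogenes)  # 29061

# The "Feature lengths" table excludes pseudogenes; its gene count matches
# protein-coding + non-coding genes only.
genes_without_pseudogenes = gene_categories["protein-coding"] + gene_categories["non-coding"]
print(genes_without_pseudogenes)  # 23475

# Consistent with "genes with variants" being a subset of those genes.
assert genes_with_variants <= genes_without_pseudogenes

# "All transcripts" in the same table = mRNAs + non-coding RNAs.
mrnas, noncoding_rnas = 37_902, 6_002
print(mrnas + noncoding_rnas)  # 43904
```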

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	23,475	41,659	13,350	37	1,976,276
All transcripts	43,904	3,071	2,525	37	104,111
  mRNA	37,902	3,336	2,767	102	104,111
  misc_RNA	1,049	3,153	2,609	169	13,889
  tRNA	407	74	73	59	84
  lncRNA	2,807	1,732	1,235	55	13,508
  snoRNA	534	108	104	47	328
  snRNA	1,168	113	107	37	197
  guide_RNA	30	162	138	78	420
  rRNA	7	452	119	119	1,578
Single-exon transcripts	1,764	1,283	948	132	10,872
  coding transcripts (NM_/XM_)	1,764	1,283	948	132	10,872
CDSs	37,915	1,949	1,446	96	103,089
Exons	224,855	301	136	1	18,518
  in coding transcripts (NM_/XM_)	215,642	294	135	1	18,518
  in non-coding transcripts (NR_/XR_)	16,242	355	143	3	11,292
Introns	201,937	5,326	1,399	30	928,065
  in coding transcripts (NM_/XM_)	195,367	5,245	1,385	30	928,065
  in non-coding transcripts (NR_/XR_)	13,388	5,847	1,530	33	333,431
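The "All transcripts" mean length is consistent with a count-weighted average of the per-type means. The sketch below recomputes it from the rows above. Because the published means are rounded to whole base pairs, agreement is only expected to within about 1 bp, and medians cannot be combined this way.

```python
# (count, mean length in bp) per transcript type, copied from the table above.
transcript_types = {
    "mRNA": (37_902, 3_336),
    "misc_RNA": (1_049, 3_153),
    "tRNA": (407, 74),
    "lncRNA": (2_807, 1_732),
    "snoRNA": (534, 108),
    "snRNA": (1_168, 113),
    "guide_RNA": (30, 162),
    "rRNA": (7, 452),
}

total_count = sum(count for count, _ in transcript_types.values())
total_bp = sum(count * mean for count, mean in transcript_types.values())
weighted_mean = total_bp / total_count

print(total_count)           # 43904, matching "All transcripts"
print(round(weighted_mean))  # 3071, matching the reported mean
```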

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.89	1	1	35
Number of exons per transcript	11.47	9	1	312

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 19,353 coding genes, 19,055 had a protein with an alignment covering 50% or more of the query, and 15,575 had an alignment covering 95% or more of the query.

Query and target coverage: query coverage is the percentage of the annotated protein's length that is included in the alignment; target coverage is the percentage of the target protein's length that is included in the alignment.
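The two coverage definitions differ only in the denominator. The sketch below computes both from alignment coordinates. It assumes 1-based inclusive coordinates and counts each residue once when several high-scoring segment pairs (HSPs) overlap; the report does not say how the pipeline combines multiple HSPs, and the protein lengths in the example are hypothetical.

```python
def percent_covered(intervals, seq_length):
    """Percentage of a sequence covered by alignment intervals.

    intervals: (start, end) pairs, 1-based and inclusive; overlaps are counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / seq_length


# Hypothetical BLASTP hit: a 400-aa annotated protein (query) aligned over
# residues 21-400 to a 500-aa Swiss-Prot protein (target) over residues 1-380.
query_coverage = percent_covered([(21, 400)], 400)
target_coverage = percent_covered([(1, 380)], 500)
print(query_coverage, target_coverage)  # 95.0 76.0
```

This hypothetical hit would pass both the 50% and 95% query-coverage thresholds, even though only 76% of the target is aligned.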

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

[Graph: cumulative number of genes by coverage threshold. Query: annotated proteins; target: UniProtKB/Swiss-Prot curated proteins.]
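Because the statistics are reported per gene, a gene counts toward a threshold if any of its proteins reaches it. A minimal sketch of that counting, assuming the per-protein query coverages have already been computed (the gene and protein identifiers below are made up):

```python
from collections import defaultdict


def best_coverage_per_gene(hits):
    """hits: iterable of (gene_id, protein_id, query_coverage_percent)."""
    best = defaultdict(float)
    for gene_id, _protein_id, coverage in hits:
        best[gene_id] = max(best[gene_id], coverage)
    return dict(best)


def genes_at_or_above(best, thresholds):
    """Cumulative count: number of genes whose best coverage is >= each threshold."""
    return {t: sum(1 for cov in best.values() if cov >= t) for t in thresholds}


# Hypothetical hits: geneA has two isoforms, only one of which reaches 95%.
hits = [
    ("geneA", "protA1", 97.5),
    ("geneA", "protA2", 60.0),
    ("geneB", "protB1", 70.0),
    ("geneC", "protC1", 30.0),
]
best = best_coverage_per_gene(hits)
print(genes_at_or_above(best, [50, 95]))  # {50: 2, 95: 1}
```

Evaluating genes_at_or_above over thresholds from 0 to 100 gives the points of a cumulative curve like the one described above. Genes with no hit at all never appear in best, so the total number of coding genes (19,353 here) has to be tracked separately.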

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
VulVul2.2	GCF_003160815.1	39.60%	30.28%
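The masked percentages are base counts. The sketch below assumes a soft-masked FASTA, in which masked bases are written in lowercase (both WindowMasker and RepeatMasker have options for lowercase output, but the exact files and denominator NCBI used are not given in this report). Whether assembly gaps (N) belong in the denominator is a choice, so the function exposes it as a parameter.

```python
def read_fasta(path):
    """Return the sequences in a FASTA file as a list of strings (case preserved)."""
    sequences, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    sequences.append("".join(current))
                current = []
            elif line:
                current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def percent_soft_masked(sequences, ignore_gaps=True):
    """Percentage of bases written in lowercase (i.e. soft-masked)."""
    masked = total = 0
    for seq in sequences:
        for base in seq:
            if ignore_gaps and base in "Nn":
                continue  # assembly gaps are left out of both counts
            total += 1
            if base.islower():
                masked += 1
    return 100.0 * masked / total if total else 0.0


# Toy example: 6 of the 12 non-gap bases are lowercase.
print(percent_soft_masked(["ACGTacgtNN", "aaTT"]))  # 50.0
```

For a real assembly, percent_soft_masked(read_fasta("genome.fa")) would give the figure, where "genome.fa" is a placeholder path; a per-character Python loop is slow on a multi-gigabase genome, so a streaming or vectorized count would be preferable in practice.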

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	50	50 (100.00%)	49 (98.00%)	99.46%	91.44%

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics by sample (SAME, SAMN, SAMD, DRS)

Sample ID	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	Aggregate of all aligned samples	2,088,083,022	84%	10%	220,468
SAMEA3138149	VvulF01 (Vulpes vulpes, SAMEA3138149)	103,018,172	71%	22%	161,781
SAMN00194223	brain; prefrontal cortex (Vulpes vulpes, SAMN00194223)	2,565,892	75%	41%	142,816
SAMN00194224	brain; prefrontal cortex (Vulpes vulpes, SAMN00194224)	3,379,343	73%	38%	146,575
SAMN04383754	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383754)	25,492,942	85%	7%	132,620
SAMN04383755	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383755)	42,038,051	84%	8%	134,364
SAMN04383756	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383756)	31,077,357	85%	7%	137,886
SAMN04383757	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383757)	20,659,559	84%	8%	126,519
SAMN04383758	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383758)	37,020,077	85%	8%	147,888
SAMN04383759	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383759)	23,317,339	85%	8%	121,980
SAMN04383760	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383760)	35,614,595	85%	9%	116,053
SAMN04383761	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383761)	50,603,352	85%	7%	150,767
SAMN04383762	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383762)	43,980,944	85%	7%	147,488
SAMN04383763	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383763)	38,957,530	86%	8%	135,152
SAMN04383764	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383764)	23,943,747	85%	9%	107,239
SAMN04383765	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383765)	30,760,396	84%	7%	137,905
SAMN04383766	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383766)	18,588,067	84%	7%	123,965
SAMN04383767	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383767)	27,253,195	85%	7%	135,018
SAMN04383768	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383768)	45,762,357	85%	8%	150,112
SAMN04383769	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383769)	45,247,750	85%	7%	148,344
SAMN04383770	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383770)	25,896,719	86%	8%	134,717
SAMN04383771	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383771)	35,859,502	85%	7%	141,696
SAMN04383772	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383772)	28,570,574	85%	7%	134,078
SAMN04383773	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383773)	47,274,581	85%	7%	148,831
SAMN04383774	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383774)	32,082,055	85%	7%	139,845
SAMN04383775	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383775)	43,558,186	85%	7%	149,367
SAMN04383776	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383776)	40,582,881	86%	8%	145,396
SAMN04383777	Right prefrontal cortex (Vulpes vulpes, male, SAMN04383777)	33,541,449	84%	7%	135,459
SAMN04383778	Right basal forebrain (Vulpes vulpes, male, SAMN04383778)	27,441,406	84%	7%	136,240
SAMN04383779	Right basal forebrain (Vulpes vulpes, male, SAMN04383779)	36,466,034	82%	7%	146,293
SAMN04383780	Right basal forebrain (Vulpes vulpes, male, SAMN04383780)	28,627,351	82%	7%	140,001
SAMN04383781	Right basal forebrain (Vulpes vulpes, male, SAMN04383781)	36,162,227	82%	7%	144,601
SAMN04383782	Right basal forebrain (Vulpes vulpes, male, SAMN04383782)	26,161,591	84%	7%	137,027
SAMN04383784	Right basal forebrain (Vulpes vulpes, male, SAMN04383784)	28,850,472	84%	7%	138,566
SAMN04383785	Right basal forebrain (Vulpes vulpes, male, SAMN04383785)	29,543,592	84%	9%	104,133
SAMN04383786	Right basal forebrain (Vulpes vulpes, male, SAMN04383786)	35,149,965	84%	7%	143,214
SAMN04383787	Right basal forebrain (Vulpes vulpes, male, SAMN04383787)	20,246,295	65%	7%	77,375
SAMN04383788	Right basal forebrain (Vulpes vulpes, male, SAMN04383788)	31,876,398	84%	7%	141,317
SAMN04383789	Right basal forebrain (Vulpes vulpes, male, SAMN04383789)	40,016,774	83%	7%	146,293
SAMN04383790	Right basal forebrain (Vulpes vulpes, male, SAMN04383790)	30,779,304	84%	7%	139,407
SAMN04383791	Right basal forebrain (Vulpes vulpes, male, SAMN04383791)	42,716,663	83%	7%	147,077
SAMN04383792	Right basal forebrain (Vulpes vulpes, male, SAMN04383792)	28,156,553	81%	7%	136,537
SAMN04383793	Right basal forebrain (Vulpes vulpes, male, SAMN04383793)	20,077,866	81%	7%	122,983
SAMN04383794	Right basal forebrain (Vulpes vulpes, male, SAMN04383794)	28,387,122	81%	7%	134,064
SAMN04383795	Right basal forebrain (Vulpes vulpes, male, SAMN04383795)	32,395,119	83%	7%	138,974
SAMN04383796	Right basal forebrain (Vulpes vulpes, male, SAMN04383796)	36,849,922	84%	7%	146,277
SAMN04383797	Right basal forebrain (Vulpes vulpes, male, SAMN04383797)	31,820,869	83%	7%	139,370
SAMN04383798	Right basal forebrain (Vulpes vulpes, male, SAMN04383798)	27,147,015	83%	7%	133,824
SAMN04383799	Right basal forebrain (Vulpes vulpes, male, SAMN04383799)	28,072,867	83%	7%	137,419
SAMN04383800	Right basal forebrain (Vulpes vulpes, male, SAMN04383800)	39,846,658	84%	7%	146,763
SAMN04383801	Right basal forebrain (Vulpes vulpes, male, SAMN04383801)	21,938,281	84%	7%	131,968
SAMN04383802	Right basal forebrain (Vulpes vulpes, male, SAMN04383802)	35,944,544	83%	7%	145,883
SAMN07138598	Pituitary tissue (Vulpes vulpes, male, SAMN07138598)	34,577,964	87%	17%	172,582
SAMN07138599	Pituitary tissue (Vulpes vulpes, male, SAMN07138599)	31,279,625	87%	18%	172,324
SAMN07138600	Pituitary tissue (Vulpes vulpes, male, SAMN07138600)	33,576,358	87%	17%	172,461
SAMN07138601	Pituitary tissue (Vulpes vulpes, male, SAMN07138601)	37,936,135	88%	18%	176,714
SAMN07138602	Pituitary tissue (Vulpes vulpes, male, SAMN07138602)	32,876,971	88%	17%	173,505
SAMN07138603	Pituitary tissue (Vulpes vulpes, male, SAMN07138603)	34,760,977	88%	18%	175,367
SAMN07138604	Pituitary tissue (Vulpes vulpes, male, SAMN07138604)	34,731,107	87%	16%	174,019
SAMN07138605	Pituitary tissue (Vulpes vulpes, male, SAMN07138605)	34,735,178	87%	17%	175,008
SAMN07138606	Pituitary tissue (Vulpes vulpes, male, SAMN07138606)	32,688,890	87%	17%	173,048
SAMN07138607	Pituitary tissue (Vulpes vulpes, male, SAMN07138607)	33,570,511	87%	17%	173,839
SAMN07138608	Pituitary tissue (Vulpes vulpes, male, SAMN07138608)	32,901,110	88%	17%	172,740
SAMN07138609	Pituitary tissue (Vulpes vulpes, male, SAMN07138609)	33,126,696	87%	17%	173,192

Alignment statistics by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR687855	ERX632795	ERP008762	SAMEA3138149	103,018,172	71%	22%
SRR094923	SRX039019	SRP005414	SAMN00194223	474,665	76%	42%
SRR094995	SRX039019	SRP005414	SAMN00194223	516,238	76%	41%
SRR094996	SRX039019	SRP005414	SAMN00194223	545,706	75%	40%
SRR094997	SRX039019	SRP005414	SAMN00194223	1,029,283	75%	42%
SRR094922	SRX039020	SRP005414	SAMN00194224	446,792	72%	39%
SRR094998	SRX039020	SRP005414	SAMN00194224	538,387	73%	37%
SRR094999	SRX039020	SRP005414	SAMN00194224	1,284,244	73%	38%
SRR095000	SRX039020	SRP005414	SAMN00194224	1,109,920	73%	39%
SRR3084289	SRX1515903	SRP068062	SAMN04383754	25,492,942	85%	7%
SRR3084290	SRX1515904	SRP068062	SAMN04383755	42,038,051	84%	8%
SRR3084291	SRX1515905	SRP068062	SAMN04383756	31,077,357	85%	7%
SRR3084292	SRX1515906	SRP068062	SAMN04383757	20,659,559	84%	8%
SRR3084293	SRX1515907	SRP068062	SAMN04383758	37,020,077	85%	8%
SRR3084294	SRX1515908	SRP068062	SAMN04383759	23,317,339	85%	8%
SRR3084295	SRX1515909	SRP068062	SAMN04383760	35,614,595	85%	9%
SRR3084296	SRX1515910	SRP068062	SAMN04383761	50,603,352	85%	7%
SRR3084297	SRX1515911	SRP068062	SAMN04383762	43,980,944	85%	7%
SRR3084298	SRX1515912	SRP068062	SAMN04383763	38,957,530	86%	8%
SRR3084299	SRX1515913	SRP068062	SAMN04383764	23,943,747	85%	9%
SRR3084300	SRX1515914	SRP068062	SAMN04383765	30,760,396	84%	7%
SRR3084301	SRX1515915	SRP068062	SAMN04383766	18,588,067	84%	7%
SRR3084302	SRX1515916	SRP068062	SAMN04383767	27,253,195	85%	7%
SRR3084303	SRX1515918	SRP068062	SAMN04383768	45,762,357	85%	8%
SRR3084304	SRX1515919	SRP068062	SAMN04383769	45,247,750	85%	7%
SRR3084305	SRX1515920	SRP068062	SAMN04383770	25,896,719	86%	8%
SRR3084306	SRX1515921	SRP068062	SAMN04383771	35,859,502	85%	7%
SRR3084307	SRX1515922	SRP068062	SAMN04383772	28,570,574	85%	7%
SRR3084308	SRX1515923	SRP068062	SAMN04383773	47,274,581	85%	7%
SRR3084309	SRX1515924	SRP068062	SAMN04383774	32,082,055	85%	7%
SRR3084310	SRX1515925	SRP068062	SAMN04383775	43,558,186	85%	7%
SRR3084311	SRX1515926	SRP068062	SAMN04383776	40,582,881	86%	8%
SRR3084312	SRX1515927	SRP068062	SAMN04383777	33,541,449	84%	7%
SRR3084313	SRX1515928	SRP068062	SAMN04383778	27,441,406	84%	7%
SRR3084314	SRX1515929	SRP068062	SAMN04383779	36,466,034	82%	7%
SRR3084315	SRX1515930	SRP068062	SAMN04383780	28,627,351	82%	7%
SRR3084316	SRX1515931	SRP068062	SAMN04383781	36,162,227	82%	7%
SRR3084317	SRX1515932	SRP068062	SAMN04383782	26,161,591	84%	7%
SRR3084318	SRX1515933	SRP068062	SAMN04383784	28,850,472	84%	7%
SRR3084319	SRX1515934	SRP068062	SAMN04383785	29,543,592	84%	9%
SRR3084320	SRX1515935	SRP068062	SAMN04383786	35,149,965	84%	7%
SRR3084321	SRX1515936	SRP068062	SAMN04383787	20,246,295	65%	7%
SRR3084322	SRX1515937	SRP068062	SAMN04383788	31,876,398	84%	7%
SRR3084323	SRX1515938	SRP068062	SAMN04383789	40,016,774	83%	7%
SRR3084324	SRX1515939	SRP068062	SAMN04383790	30,779,304	84%	7%
SRR3084325	SRX1515940	SRP068062	SAMN04383791	42,716,663	83%	7%
SRR3084326	SRX1515941	SRP068062	SAMN04383792	28,156,553	81%	7%
SRR3084327	SRX1515942	SRP068062	SAMN04383793	20,077,866	81%	7%
SRR3084328	SRX1515943	SRP068062	SAMN04383794	28,387,122	81%	7%
SRR3084329	SRX1515944	SRP068062	SAMN04383795	32,395,119	83%	7%
SRR3084330	SRX1515945	SRP068062	SAMN04383796	36,849,922	84%	7%
SRR3084331	SRX1515947	SRP068062	SAMN04383797	31,820,869	83%	7%
SRR3084332	SRX1515948	SRP068062	SAMN04383798	27,147,015	83%	7%
SRR3084333	SRX1515949	SRP068062	SAMN04383799	28,072,867	83%	7%
SRR3084334	SRX1515950	SRP068062	SAMN04383800	39,846,658	84%	7%
SRR3084335	SRX1515951	SRP068062	SAMN04383801	21,938,281	84%	7%
SRR3084336	SRX1515952	SRP068062	SAMN04383802	35,944,544	83%	7%
SRR5573828	SRX2832165	SRP107226	SAMN07138598	34,577,964	87%	17%
SRR5573827	SRX2832166	SRP107226	SAMN07138599	31,279,625	87%	18%
SRR5573826	SRX2832167	SRP107226	SAMN07138600	33,576,358	87%	17%
SRR5573825	SRX2832168	SRP107226	SAMN07138601	37,936,135	88%	18%
SRR5573832	SRX2832161	SRP107226	SAMN07138602	32,876,971	88%	17%
SRR5573831	SRX2832162	SRP107226	SAMN07138603	34,760,977	88%	18%
SRR5573830	SRX2832163	SRP107226	SAMN07138604	34,731,107	87%	16%
SRR5573829	SRX2832164	SRP107226	SAMN07138605	34,735,178	87%	17%
SRR5573834	SRX2832159	SRP107226	SAMN07138606	32,688,890	87%	17%
SRR5573833	SRX2832160	SRP107226	SAMN07138607	33,570,511	87%	17%
SRR5573824	SRX2832169	SRP107226	SAMN07138608	32,901,110	88%	17%
SRR5573823	SRX2832170	SRP107226	SAMN07138609	33,126,696	87%	17%
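The by-sample table appears to be derived from the by-run table. The report does not describe the method, but one reading reproduces the published values for the two samples sequenced in several runs: read counts are summed, the percent of aligned reads is a read-weighted average, and the percent of aligned reads with introns is weighted by aligned reads. A sketch of that reading, using the runs of SAMN00194223 and SAMN00194224 copied from the table above (run percentages are already rounded, so small discrepancies are possible for other samples, and the per-sample "Number of introns" cannot be rebuilt from the run table):

```python
from collections import defaultdict

# (run, sample, reads, % aligned, % of aligned reads with introns),
# copied from the by-run table for the two samples sequenced in several runs.
runs = [
    ("SRR094923", "SAMN00194223", 474_665, 76, 42),
    ("SRR094995", "SAMN00194223", 516_238, 76, 41),
    ("SRR094996", "SAMN00194223", 545_706, 75, 40),
    ("SRR094997", "SAMN00194223", 1_029_283, 75, 42),
    ("SRR094922", "SAMN00194224", 446_792, 72, 39),
    ("SRR094998", "SAMN00194224", 538_387, 73, 37),
    ("SRR094999", "SAMN00194224", 1_284_244, 73, 38),
    ("SRR095000", "SAMN00194224", 1_109_920, 73, 39),
]


def aggregate_by_sample(runs):
    """Return {sample: (total reads, % aligned, % of aligned reads with introns)}."""
    # Per sample: [reads, estimated aligned reads, estimated aligned reads with introns]
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for _run, sample, reads, pct_aligned, pct_introns in runs:
        aligned = reads * pct_aligned / 100
        acc = totals[sample]
        acc[0] += reads
        acc[1] += aligned
        acc[2] += aligned * pct_introns / 100
    return {
        sample: (reads, 100 * aligned / reads, 100 * with_introns / aligned)
        for sample, (reads, aligned, with_introns) in totals.items()
    }


for sample, (reads, pct_aligned, pct_introns) in aggregate_by_sample(runs).items():
    print(f"{sample}\t{reads:,}\t{pct_aligned:.0f}%\t{pct_introns:.0f}%")
# SAMN00194223	2,565,892	75%	41%
# SAMN00194224	3,379,343	73%	38%
```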

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Homo sapiens known RefSeq (NP_)	51,100	50,092 (98.03%)	50,092 (98.03%)	76.58%	83.08%
Same-species GenBank	48	48 (100.00%)	48 (100.00%)	85.78%	89.32%
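In both evidence tables, each "Number (%)" entry is a count expressed as a percentage of the sequences retrieved from Entrez for that source. A small formatting helper (hypothetical, for illustration only) reproduces published entries from both tables:

```python
def count_with_percent(count: int, retrieved: int) -> str:
    """Format a count as 'N (P%)', with P relative to the number retrieved."""
    return f"{count:,} ({100 * count / retrieved:.2f}%)"


# Homo sapiens known RefSeq proteins aligned by ProSplign
print(count_with_percent(50_092, 51_100))  # 50,092 (98.03%)
# Same-species GenBank transcripts passed to Gnomon
print(count_with_percent(49, 50))  # 49 (98.00%)
```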

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
