NCBI Panthera pardus Annotation Release 100

The RefSeq genome records for Panthera pardus were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Panthera pardus Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Nov 29 2016
Date of submission of annotation to the public databases: Dec 6 2016
Software version: 7.2

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
PanPar1.0	GCF_001857705.1	Ulsan National Institute of Science and Technology	11-16-2016	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	PanPar1.0
Genes and pseudogenes	34,163
protein-coding	20,422
non-coding	10,226
pseudogenes	3,515
genes with variants	12,879
mRNAs	57,941
fully-supported	56,079
with > 5% ab initio	748
partial	377
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	57,941
Other RNAs	15,839
fully-supported	15,318
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	15,318
CDSs	57,941
fully-supported	56,079
with > 5% ab initio	904
partial	395
with major correction(s)	643
known RefSeq (NP_)	0
model RefSeq (XP_)	57,941
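The rows of the feature-count table roll up arithmetically. "Genes and pseudogenes" is the sum of the protein-coding, non-coding and pseudogene rows, while the "Genes" row of the feature-length table (30,648) matches protein-coding plus non-coding genes only, so pseudogenes appear to be excluded there. Likewise, the 15,318 model RefSeq (XR_) transcripts plus the 521 tRNAs from the feature-length table account for the 15,839 "Other RNAs". A minimal Python check using only counts copied from this report:

```python
# Counts copied from the "Feature counts" and "Feature lengths" tables of this report.
genes_by_type = {"protein-coding": 20_422, "non-coding": 10_226, "pseudogenes": 3_515}
genes_and_pseudogenes = 34_163   # "Genes and pseudogenes" row
genes_in_length_table = 30_648   # "Genes" row of the feature-length table

other_rnas = 15_839              # "Other RNAs" row
model_xr = 15_318                # "model RefSeq (XR_)" row
trnas = 521                      # "tRNA" row of the feature-length table

# Total of all gene categories, then the same total without pseudogenes.
all_genes = sum(genes_by_type.values())
genes_without_pseudogenes = all_genes - genes_by_type["pseudogenes"]

print(all_genes == genes_and_pseudogenes)                  # True
print(genes_without_pseudogenes == genes_in_length_table)  # True
print(model_xr + trnas == other_rnas)                      # True
```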

Detailed reports

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	30,648	43,113	12,502	71	3,019,129
All transcripts	73,780	3,002	2,407	41	105,293
mRNA	57,941	3,356	2,747	153	105,293
misc_RNA	2,294	2,734	2,310	158	28,156
tRNA	521	74	73	71	85
lncRNA	13,024	1,592	1,146	41	19,279
Single-exon transcripts	1,798	1,384	969	218	8,742
coding transcripts (NM_/XM_)	1,798	1,384	969	218	8,742
CDSs	57,941	2,068	1,515	96	104,211
Exons	279,855	320	142	1	21,905
in coding transcripts (NM_/XM_)	244,547	293	138	1	21,905
in non-coding transcripts (NR_/XR_)	48,621	417	166	2	10,987
Introns	245,407	6,981	1,674	30	1,081,186
in coding transcripts (NM_/XM_)	221,606	6,847	1,636	30	1,081,186
in non-coding transcripts (NR_/XR_)	36,725	7,345	1,927	30	483,628

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.44	1	1	50
Number of exons per transcript	11.51	8	1	325
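Each row of the two tables above summarizes a per-feature quantity (a length in bp, or a number of transcripts or exons) by its count, mean, median, minimum and maximum. A minimal sketch of how one such row can be computed with Python's standard `statistics` module; the input values are hypothetical and not taken from the annotation. The example is deliberately right-skewed, which is why its mean exceeds its median, the same pattern the report shows for transcripts per gene (mean 2.44, median 1).

```python
import statistics


def summarize(values: list[int]) -> dict[str, float]:
    """Return count, mean, median, min and max, as reported in the tables above."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical transcripts-per-gene values for five genes (illustrative only).
transcripts_per_gene = [1, 1, 1, 3, 9]
print(summarize(transcripts_per_gene))
# {'count': 5, 'mean': 3, 'median': 1, 'min': 1, 'max': 9}
```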

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 20,305 coding genes, 19,760 genes had a protein with an alignment covering 50% or more of the query and 16,575 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
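Under these definitions both coverages are ratios of aligned length to sequence length. A minimal sketch, assuming the alignment is summarized by a single aligned span with 1-based inclusive start and end coordinates on each sequence; this is a simplification, since a real BLASTP hit can consist of several HSPs, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProteinAlignment:
    """One alignment between an annotated protein (query) and a Swiss-Prot protein (target).

    Coordinates are 1-based and inclusive; a single aligned span per sequence is assumed.
    """
    query_len: int
    query_start: int
    query_end: int
    target_len: int
    target_start: int
    target_end: int

    def query_coverage(self) -> float:
        """Percentage of the annotated protein length included in the alignment."""
        return 100.0 * (self.query_end - self.query_start + 1) / self.query_len

    def target_coverage(self) -> float:
        """Percentage of the target protein length included in the alignment."""
        return 100.0 * (self.target_end - self.target_start + 1) / self.target_len


# Hypothetical example: a 400-aa annotated protein aligned over residues 11-390
# to a 500-aa Swiss-Prot entry over residues 1-380.
aln = ProteinAlignment(400, 11, 390, 500, 1, 380)
print(aln.query_coverage())   # 95.0
print(aln.target_coverage())  # 76.0
```

In this example the gene would count toward the 95% query-coverage figure even though only 76% of the Swiss-Prot protein is covered, which is why the report plots the two coverages separately.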

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

[Figure: cumulative number of genes by alignment coverage. Query: annotated proteins; Target: UniProtKB/Swiss-Prot curated proteins]
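A cumulative curve of this kind counts, for each coverage threshold, the genes whose alignment reaches at least that threshold; the 50% and 95% figures quoted earlier in this section are two points on such a curve. A minimal sketch, assuming each gene has already been reduced to the highest query coverage among its proteins (an interpretation of "had a protein with an alignment covering..."); the gene names and values are hypothetical.

```python
def genes_at_or_above(best_coverage: dict[str, float], threshold: float) -> int:
    """Count genes whose best query coverage (percent) is >= threshold."""
    return sum(1 for cov in best_coverage.values() if cov >= threshold)


def cumulative_curve(best_coverage: dict[str, float],
                     thresholds: list[float]) -> list[tuple[float, int]]:
    """Return (threshold, gene count) pairs, i.e. the points of a cumulative plot."""
    return [(t, genes_at_or_above(best_coverage, t)) for t in thresholds]


# Hypothetical best query coverage per gene, in percent.
best = {"geneA": 99.1, "geneB": 96.0, "geneC": 72.5, "geneD": 40.0}
print(cumulative_curve(best, [0, 50, 95, 100]))
# [(0, 4), (50, 3), (95, 2), (100, 0)]
```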

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are used only for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
PanPar1.0	GCF_001857705.1	41.22%	31.67%
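Each percentage in the table is the share of assembly bases covered by the respective masker. A minimal sketch of that calculation, assuming a soft-masked FASTA in which masked bases are written in lowercase; the report does not state the file format, and whether NCBI excludes gap (N) bases from the denominator is also unknown, so this sketch simply counts every sequence character.

```python
def percent_soft_masked(fasta_text: str) -> float:
    """Percentage of sequence characters written in lowercase (soft-masked).

    Assumption: soft-masking. Header lines starting with '>' are skipped and every
    other non-whitespace character, including N/n, counts toward the total.
    """
    masked = total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue
        for base in line.strip():
            total += 1
            if base.islower():
                masked += 1
    return 100.0 * masked / total if total else 0.0


# Hypothetical two-record FASTA: 8 of 20 bases are lowercase.
example = ">scaffold_1\nACGTacgtAC\n>scaffold_2\nGTacgtACGT\n"
print(f"{percent_soft_masked(example):.2f}%")  # 40.00%
```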

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	10	10 (100.00%)	9 (90.00%)	99.44%	99.88%
Felis catus known RefSeq (NM_/NR_)	420	417 (99.29%)	362 (86.19%)	98.26%	99.14%
Felis catus GenBank	1,216	1,149 (94.49%)	701 (57.65%)	97.63%	94.31%
Felis catus EST	919	854 (92.93%)	761 (82.81%)	97.69%	98.98%
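In the "Number (%)" columns, both the aligned and the passed-to-Gnomon percentages use the number of sequences retrieved from Entrez as the denominator, rounded to two decimals. A short check against three rows of the table above (the `pct` helper is illustrative):

```python
def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals, as in the report."""
    return f"{100 * part / whole:.2f}%"


# (retrieved, aligned by Splign, passed to Gnomon), copied from the table above.
rows = {
    "Felis catus known RefSeq (NM_/NR_)": (420, 417, 362),
    "Felis catus GenBank": (1_216, 1_149, 701),
    "Felis catus EST": (919, 854, 761),
}
for source, (retrieved, aligned, passed) in rows.items():
    print(source, pct(aligned, retrieved), pct(passed, retrieved))
# Felis catus known RefSeq (NM_/NR_) 99.29% 86.19%
# Felis catus GenBank 94.49% 57.65%
# Felis catus EST 92.93% 82.81%
```

The same convention reproduces the percentages in the protein alignment table, with ProSplign in place of Splign.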

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Track name	Number of reads	Percent aligned reads	Percent spliced reads	Number of introns
All	Aggregate of all aligned samples	7,092,935,490	85%	22%	369,352
SAMN01761015	testicle (Felis catus, SAMN01761015)	68,178,484	65%	5%	181,664
SAMN01831871	normal liver (Felis catus, adult, SAMN01831871)	36,356,376	81%	8%	122,090
SAMN01831872	normal liver (Felis catus, adult, SAMN01831872)	37,302,574	81%	8%	120,465
SAMN01831873	normal liver (Felis catus, adult, SAMN01831873)	43,786,726	81%	9%	123,378
SAMN01831921	normal kidney (Felis catus, adult, SAMN01831921)	42,717,142	77%	5%	144,737
SAMN01831922	normal kidney (Felis catus, adult, SAMN01831922)	43,667,072	77%	6%	136,380
SAMN01831923	normal kidney (Felis catus, adult, SAMN01831923)	36,100,760	78%	7%	139,845
SAMN01831965	normal brain, frontal parts (Felis catus, adult, SAMN01831965)	46,356,088	79%	5%	160,953
SAMN01831966	normal brain, frontal parts (Felis catus, adult, SAMN01831966)	48,799,196	79%	5%	160,658
SAMN01831967	normal brain, frontal parts (Felis catus, adult, SAMN01831967)	39,439,684	79%	5%	154,327
SAMN02058438	Subcutaneous adipose, Short Day (Felis catus, SAMN02058438)	25,217,636	93%	26%	143,002
SAMN02058439	Subcutaneous adipose, Long Day (Felis catus, SAMN02058439)	27,572,827	93%	23%	161,130
SAMN02058440	Subcutaneous adipose, Long Day (Felis catus, SAMN02058440)	26,276,810	93%	28%	152,422
SAMN02058441	Subcutaneous adipose, Short Day (Felis catus, SAMN02058441)	27,280,242	93%	25%	163,425
SAMN02058442	Subcutaneous adipose, Long Day (Felis catus, SAMN02058442)	27,959,031	92%	24%	153,747
SAMN02058443	Subcutaneous adipose, Short Day (Felis catus, SAMN02058443)	29,321,049	92%	26%	164,605
SAMN02058444	Subcutaneous adipose, Long Day (Felis catus, SAMN02058444)	27,074,727	92%	21%	166,539
SAMN02058445	Subcutaneous adipose, Short Day (Felis catus, SAMN02058445)	27,384,876	92%	23%	158,621
SAMN02058446	Subcutaneous adipose, Long Day (Felis catus, SAMN02058446)	27,650,878	93%	25%	162,040
SAMN02058447	Subcutaneous adipose, Short Day (Felis catus, SAMN02058447)	26,426,247	92%	22%	161,947
SAMN02058448	Subcutaneous adipose, Short Day (Felis catus, SAMN02058448)	30,277,827	93%	30%	148,928
SAMN02058449	Subcutaneous adipose, Long Day (Felis catus, SAMN02058449)	30,827,991	92%	21%	170,338
SAMN02058450	Subcutaneous adipose, Short Day (Felis catus, SAMN02058450)	30,835,006	93%	27%	160,644
SAMN02058451	Subcutaneous adipose, Long Day (Felis catus, SAMN02058451)	33,864,272	93%	27%	158,374
SAMN02058452	Subcutaneous adipose, Long Day (Felis catus, SAMN02058452)	28,897,848	92%	23%	164,137
SAMN02058453	Subcutaneous adipose, Short Day (Felis catus, SAMN02058453)	29,065,945	93%	26%	162,151
SAMN02058454	Subcutaneous adipose, Short Day (Felis catus, SAMN02058454)	30,055,186	92%	21%	163,972
SAMN02058455	Subcutaneous adipose, Long Day (Felis catus, SAMN02058455)	28,341,003	93%	24%	161,182
SAMN02058456	Subcutaneous adipose, Long Day (Felis catus, SAMN02058456)	31,620,077	92%	23%	175,444
SAMN02058457	Subcutaneous adipose, Short Day (Felis catus, SAMN02058457)	32,392,358	92%	24%	173,778
SAMN02374891	lung (Panthera tigris altaica, 2.5, female, SAMN02374891)	52,150,204	90%	15%	169,054
SAMN02378470	pooled samples (Panthera tigris altaica, 2.5, female, SAMN02378470)	163,074,066	89%	16%	164,167
SAMN04099974	Iridiocorneal angle (Felis catus, SAMN04099974)	177,835,580	91%	24%	251,166
SAMN04099975	Iridiocorneal angle (Felis catus, SAMN04099975)	182,421,744	91%	24%	246,854
SAMN04100432	brain (Felis catus, Two months, pooled male and female, SAMN04100432)	1,356,214,492	91%	31%	328,011
SAMN04498517	embryo (fetus) (Felis catus, not determined, SAMN04498517)	166,944,018	82%	20%	239,871
SAMN04498518	embryo (fetus) (Felis catus, not determined, SAMN04498518)	201,883,266	84%	24%	245,622
SAMN04498519	lung (Felis catus, male, SAMN04498519)	123,208,322	79%	17%	233,115
SAMN04498520	pancreas (Felis catus, male, SAMN04498520)	166,157,056	79%	38%	155,948
SAMN04498521	heart (Felis catus, male, SAMN04498521)	215,449,116	81%	14%	215,549
SAMN04498522	muscle (Felis catus, male, SAMN04498522)	106,996,830	86%	36%	157,152
SAMN04498523	ear cartilage (Felis catus, male, SAMN04498523)	148,955,788	83%	21%	244,093
SAMN04498524	spinal cord (Felis catus, male, SAMN04498524)	162,679,594	82%	16%	224,416
SAMN04498525	thymus (Felis catus, male, SAMN04498525)	104,390,084	72%	12%	178,782
SAMN04498526	kidney (Felis catus, male, SAMN04498526)	94,438,504	79%	20%	202,202
SAMN04498527	testes (Felis catus, male, SAMN04498527)	160,548,060	82%	25%	267,738
SAMN04498528	cerebellum (brain) (Felis catus, male, SAMN04498528)	131,799,730	83%	18%	228,643
SAMN04498529	parietal lobe (brain) (Felis catus, female, SAMN04498529)	96,748,380	81%	16%	219,532
SAMN04498530	hippocampus (brain) (Felis catus, female, SAMN04498530)	110,467,612	84%	19%	234,418
SAMN04498531	liver (Felis catus, female, SAMN04498531)	132,973,642	86%	29%	174,926
SAMN04498532	cerebellum (brain) (Felis catus, female, SAMN04498532)	128,178,488	81%	17%	229,447
SAMN04498533	temporal lobe (brain) (Felis catus, female, SAMN04498533)	119,882,660	82%	18%	202,456
SAMN04498534	salivary gland (Felis catus, female, SAMN04498534)	113,655,162	77%	19%	202,719
SAMN04498535	bone marrow (Felis catus, female, SAMN04498535)	72,193,490	79%	21%	165,878
SAMN04498536	head (embryo) (Felis catus, not determined, SAMN04498536)	122,891,208	76%	15%	207,699
SAMN04498537	body (embryo) (Felis catus, not determined, SAMN04498537)	123,289,946	81%	20%	239,743
SAMN04498538	retina (Felis catus, male, SAMN04498538)	108,620,542	87%	23%	226,810
SAMN04498539	skin (Felis catus, female, SAMN04498539)	133,555,718	83%	20%	239,487
SAMN04498540	retina (Felis catus, female, SAMN04498540)	117,242,720	81%	13%	233,843
SAMN04498541	kidney (Felis catus, female, SAMN04498541)	120,590,350	82%	19%	214,814
SAMN04498542	spleen (Felis catus, female, SAMN04498542)	121,451,690	82%	18%	237,882
SAMN04498543	ear tip (Felis catus, female, SAMN04498543)	123,699,138	85%	23%	222,369
SAMN04498544	uterus (Felis catus, female, SAMN04498544)	103,269,448	79%	15%	210,757
SAMN04498545	spleen (Felis catus, female, SAMN04498545)	85,691,628	82%	20%	184,217
SAMN04498546	skin (orange color) (Felis catus, missing, SAMN04498546)	155,642,984	82%	20%	243,450
SAMN04498547	skin (white color) (Felis catus, missing, SAMN04498547)	117,414,704	81%	19%	231,759
SAMN04498548	occipital (brain) (Felis catus, female, SAMN04498548)	149,283,588	83%	16%	234,801

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent spliced reads
SRR586041	SRX193575	SRP016073	SAMN01761015	68,178,484	65%	5%
SRR636854	SRX211594	SRP017611	SAMN01831871	36,356,376	81%	8%
SRR636855	SRX211595	SRP017611	SAMN01831872	37,302,574	81%	8%
SRR636856	SRX211596	SRP017611	SAMN01831873	43,786,726	81%	9%
SRR636904	SRX211644	SRP017611	SAMN01831921	42,717,142	77%	5%
SRR636905	SRX211645	SRP017611	SAMN01831922	43,667,072	77%	6%
SRR636906	SRX211646	SRP017611	SAMN01831923	36,100,760	78%	7%
SRR636948	SRX211688	SRP017611	SAMN01831965	46,356,088	79%	5%
SRR636949	SRX211689	SRP017611	SAMN01831966	48,799,196	79%	5%
SRR636950	SRX211690	SRP017611	SAMN01831967	39,439,684	79%	5%
SRR835484	SRX272130	SRP021539	SAMN02058438	25,217,636	93%	26%
SRR835485	SRX272131	SRP021539	SAMN02058439	27,572,827	93%	23%
SRR835486	SRX272132	SRP021539	SAMN02058440	26,276,810	93%	28%
SRR835487	SRX272133	SRP021539	SAMN02058441	27,280,242	93%	25%
SRR835488	SRX272134	SRP021539	SAMN02058442	27,959,031	92%	24%
SRR835489	SRX272135	SRP021539	SAMN02058443	29,321,049	92%	26%
SRR835490	SRX272136	SRP021539	SAMN02058444	27,074,727	92%	21%
SRR835491	SRX272137	SRP021539	SAMN02058445	27,384,876	92%	23%
SRR835492	SRX272138	SRP021539	SAMN02058446	27,650,878	93%	25%
SRR835493	SRX272139	SRP021539	SAMN02058447	26,426,247	92%	22%
SRR835494	SRX272140	SRP021539	SAMN02058448	30,277,827	93%	30%
SRR835495	SRX272141	SRP021539	SAMN02058449	30,827,991	92%	21%
SRR835496	SRX272142	SRP021539	SAMN02058450	30,835,006	93%	27%
SRR835497	SRX272143	SRP021539	SAMN02058451	33,864,272	93%	27%
SRR835498	SRX272144	SRP021539	SAMN02058452	28,897,848	92%	23%
SRR835499	SRX272145	SRP021539	SAMN02058453	29,065,945	93%	26%
SRR835500	SRX272146	SRP021539	SAMN02058454	30,055,186	92%	21%
SRR835501	SRX272147	SRP021539	SAMN02058455	28,341,003	93%	24%
SRR835502	SRX272148	SRP021539	SAMN02058456	31,620,077	92%	23%
SRR835503	SRX272149	SRP021539	SAMN02058457	32,392,358	92%	24%
SRR1014897	SRX365622	SRP032170	SAMN02374891	52,150,204	90%	15%
SRR1015468	SRX366926	SRP032171	SAMN02378470	54,358,022	89%	16%
SRR1015836	SRX366926	SRP032171	SAMN02378470	54,358,022	89%	16%
SRR1015838	SRX367214	SRP032171	SAMN02378470	54,358,022	89%	16%
SRR2470307	SRX1268863	SRP063937	SAMN04099974	177,835,580	91%	24%
SRR2470308	SRX1268864	SRP063937	SAMN04099975	182,421,744	91%	24%
SRR2495945	SRX1271494	SRP063963	SAMN04100432	1,356,214,492	91%	31%
SRR3200448	SRX1610301	SRP071078	SAMN04498517	166,944,018	82%	20%
SRR3200450	SRX1610303	SRP071078	SAMN04498518	201,883,266	84%	24%
SRR3200449	SRX1610302	SRP071078	SAMN04498519	123,208,322	79%	17%
SRR3200469	SRX1610322	SRP071078	SAMN04498520	166,157,056	79%	38%
SRR3200471	SRX1610324	SRP071078	SAMN04498521	215,449,116	81%	14%
SRR3200451	SRX1610304	SRP071078	SAMN04498522	106,996,830	86%	36%
SRR3200455	SRX1610308	SRP071078	SAMN04498523	148,955,788	83%	21%
SRR3200466	SRX1610319	SRP071078	SAMN04498524	162,679,594	82%	16%
SRR3218715	SRX1625945	SRP071078	SAMN04498525	104,390,084	72%	12%
SRR3200473	SRX1610326	SRP071078	SAMN04498526	94,438,504	79%	20%
SRR3200462	SRX1610315	SRP071078	SAMN04498527	160,548,060	82%	25%
SRR3218718	SRX1625949	SRP071078	SAMN04498528	131,799,730	83%	18%
SRR3200472	SRX1610325	SRP071078	SAMN04498529	96,748,380	81%	16%
SRR3200452	SRX1610305	SRP071078	SAMN04498530	110,467,612	84%	19%
SRR3200453	SRX1610306	SRP071078	SAMN04498531	132,973,642	86%	29%
SRR3200456	SRX1610309	SRP071078	SAMN04498532	128,178,488	81%	17%
SRR3200461	SRX1610314	SRP071078	SAMN04498533	119,882,660	82%	18%
SRR3218717	SRX1625948	SRP071078	SAMN04498534	113,655,162	77%	19%
SRR3200459	SRX1610312	SRP071078	SAMN04498535	72,193,490	79%	21%
SRR3200464	SRX1610317	SRP071078	SAMN04498536	122,891,208	76%	15%
SRR3200468	SRX1610321	SRP071078	SAMN04498537	123,289,946	81%	20%
SRR3200457	SRX1610310	SRP071078	SAMN04498538	108,620,542	87%	23%
SRR3200470	SRX1610323	SRP071078	SAMN04498539	133,555,718	83%	20%
SRR3200465	SRX1610318	SRP071078	SAMN04498540	117,242,720	81%	13%
SRR3200460	SRX1610313	SRP071078	SAMN04498541	120,590,350	82%	19%
SRR3218714	SRX1625944	SRP071078	SAMN04498542	121,451,690	82%	18%
SRR3200454	SRX1610307	SRP071078	SAMN04498543	123,699,138	85%	23%
SRR3200458	SRX1610311	SRP071078	SAMN04498544	103,269,448	79%	15%
SRR3218716	SRX1625946	SRP071078	SAMN04498545	85,691,628	82%	20%
SRR3218712	SRX1625943	SRP071078	SAMN04498546	155,642,984	82%	20%
SRR3200467	SRX1610320	SRP071078	SAMN04498547	117,414,704	81%	19%
SRR3200463	SRX1610316	SRP071078	SAMN04498548	149,283,588	83%	16%
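Read counts in the by-sample table are totals over each sample's runs. For SAMN02378470, the three runs listed in the by-run table (54,358,022 reads each) add up to the 163,074,066 reads reported for that sample, and the per-sample counts in turn sum to the 7,092,935,490 reads of the "All" row. A minimal sketch of the run-to-sample roll-up, using rows copied from this report:

```python
from collections import defaultdict

# (run, sample, number of reads), copied from the by-run table above.
runs = [
    ("SRR1014897", "SAMN02374891", 52_150_204),
    ("SRR1015468", "SAMN02378470", 54_358_022),
    ("SRR1015836", "SAMN02378470", 54_358_022),
    ("SRR1015838", "SAMN02378470", 54_358_022),
]

# Accumulate reads per BioSample accession.
reads_per_sample: dict[str, int] = defaultdict(int)
for run, sample, n_reads in runs:
    reads_per_sample[sample] += n_reads

for sample, total in reads_per_sample.items():
    print(sample, f"{total:,}")
# SAMN02374891 52,150,204
# SAMN02378470 163,074,066
```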

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Carnivora GenBank	4,625	4,499 (97.28%)	4,499 (97.28%)	78.88%	88.27%
Carnivora known RefSeq (NP_)	2,369	2,340 (98.78%)	2,340 (98.78%)	77.97%	88.70%
Homo sapiens known RefSeq (NP_)	44,898	44,042 (98.09%)	44,042 (98.09%)	76.17%	83.60%
Felis catus high-quality model RefSeq (XP_)	11,000	10,944 (99.49%)	10,944 (99.49%)	81.00%	89.01%
Same-species GenBank	8	8 (100.00%)	8 (100.00%)	90.15%	92.76%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, DiCuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22:134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
