NCBI Pan paniscus Annotation Release GCF_029289425.1-RS_2023_04

The genome sequence records for Pan paniscus RefSeq assembly GCF_029289425.1 (NHGRI_mPanPan1-v1.1-0.1.freeze_pri) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: Coverage and percent identity of the alignments between the current and previous assemblies
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_029289425.1-RS_2023_04".

Date of Entrez queries for transcripts and proteins: Apr 8 2023
Date of submission of annotation to the public databases: Apr 18 2023
Software version: 10.1

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
NHGRI_mPanPan1-v1.1-0.1.freeze_pri	GCF_029289425.1	National Human Genome Research Institute, National Institutes of Health	03-20-2023	Reference	25 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	NHGRI_mPanPan1-v1.1-0.1.freeze_pri
Genes and pseudogenes	37,628
  protein-coding	22,625
  non-coding	7,961
  transcribed pseudogenes	12
  non-transcribed pseudogenes	6,827
  genes with variants	13,202
  immunoglobulin/T-cell receptor gene segments	158
  other	45
mRNAs	72,209
  fully-supported	71,321
  with > 5% ab initio	455
  partial	139
  with filled gap(s)	0
  known RefSeq (NM_)	49
  model RefSeq (XM_)	72,160
non-coding RNAs	13,137
  fully-supported	8,591
  with > 5% ab initio	0
  partial	1
  with filled gap(s)	0
  known RefSeq (NR_)	166
  model RefSeq (XR_)	12,470
pseudo transcripts	13
  fully-supported	12
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	3
  model RefSeq (XR_)	10
CDSs	72,209
  fully-supported	71,321
  with > 5% ab initio	600
  partial	139
  with major correction(s)	922
  known RefSeq (NP_)	49
  model RefSeq (XP_)	72,160
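The gene subcategories add up to the "Genes and pseudogenes" total once the "genes with variants" row, which cuts across the other categories, is left out; likewise the known plus model RefSeq rows add up to the mRNA and pseudo-transcript totals. A minimal Python sketch of that consistency check, using counts copied from the table:

```python
# Counts copied from the "Feature counts" table above.
gene_subtypes = {
    "protein-coding": 22_625,
    "non-coding": 7_961,
    "transcribed pseudogenes": 12,
    "non-transcribed pseudogenes": 6_827,
    "Ig/TCR gene segments": 158,
    "other": 45,
}
GENES_AND_PSEUDOGENES = 37_628

# (reported total, known RefSeq count, model RefSeq count)
transcript_rows = {
    "mRNAs": (72_209, 49, 72_160),
    "pseudo transcripts": (13, 3, 10),
}


def check_totals() -> list[str]:
    """Return mismatch messages; an empty list means every total adds up."""
    problems = []
    gene_sum = sum(gene_subtypes.values())
    if gene_sum != GENES_AND_PSEUDOGENES:
        problems.append(f"genes: {gene_sum} != {GENES_AND_PSEUDOGENES}")
    for name, (total, known, model) in transcript_rows.items():
        if known + model != total:
            problems.append(f"{name}: {known} + {model} != {total}")
    return problems


print(check_totals() or "all totals consistent")
```

The non-coding RNA rows are left out of the check because their known and model counts (166 + 12,470 = 12,636) do not sum to the 13,137 total, so the table evidently counts some non-coding RNAs outside those two accession classes.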

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	30,631	47,101	13,668	46	3,238,117
All transcripts	85,346	4,218	3,375	18	109,647
  mRNA	72,209	4,458	3,591	99	109,647
  misc_RNA	3,577	4,288	3,239	186	41,393
  miRNA	170	22	22	18	25
  tRNA	497	74	73	71	87
  lncRNA	4,858	4,053	2,694	135	66,377
  snoRNA	1,146	111	104	46	330
  snRNA	1,545	110	107	59	199
  rRNA	1,299	2,092	1,869	119	5,148
Single-exon transcripts	2,798	2,615	1,502	99	42,344
  coding transcripts (NM_/XM_)	2,788	2,617	1,506	99	42,344
  non-coding transcripts (NR_/XR_)	10	1,900	1,387	693	6,049
CDSs	72,209	2,013	1,509	96	106,206
Exons	286,276	501	145	1	46,340
  in coding transcripts (NM_/XM_)	266,789	458	143	1	42,835
  in non-coding transcripts (NR_/XR_)	33,416	731	150	10	46,340
Introns	250,191	7,826	1,798	30	1,160,458
  in coding transcripts (NM_/XM_)	236,055	7,512	1,753	30	1,160,458
  in non-coding transcripts (NR_/XR_)	27,809	9,733	2,193	31	472,956
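Each row of the length table is an ordinary set of summary statistics over feature lengths. A sketch, assuming features are given as 1-based, inclusive (start, end) coordinates as in GFF3; the input values are hypothetical, and this is not presented as the pipeline's own code. For a spliced transcript, each length would be the sum of its exon lengths rather than its genomic span.

```python
import statistics


def length_summary(spans: list[tuple[int, int]]) -> dict[str, float]:
    """Summarize feature lengths from 1-based, inclusive (start, end) coordinates."""
    lengths = [end - start + 1 for start, end in spans]  # inclusive: add 1
    return {
        "count": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical exon coordinates, for illustration only (lengths 151, 145, 22, 501).
exons = [(100, 250), (1_000, 1_144), (5_000, 5_021), (9_000, 9_500)]
summary = length_summary(exons)
print(summary)
```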

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.81	1	1	50
Number of exons per transcript	11.79	9	1	348
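Both rows are per-parent count statistics: transcripts grouped by gene, and exons grouped by transcript. A minimal sketch, assuming a hypothetical child-to-parent mapping such as one built from GFF3 Parent attributes; parents with no children do not appear in such a mapping and are therefore not counted.

```python
import statistics
from collections import Counter


def per_parent_stats(child_to_parent: dict[str, str]) -> dict[str, float]:
    """Mean/median/min/max number of children per parent (e.g. transcripts per gene)."""
    counts = list(Counter(child_to_parent.values()).values())
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "min": min(counts),
        "max": max(counts),
    }


# Hypothetical transcript -> gene mapping; gene-A has 3 transcripts, gene-B 1, gene-C 2.
transcript_to_gene = {
    "rna-1": "gene-A", "rna-2": "gene-A", "rna-3": "gene-A",
    "rna-4": "gene-B",
    "rna-5": "gene-C", "rna-6": "gene-C",
}
stats = per_parent_stats(transcript_to_gene)
print(stats)
```

The same function applied to an exon-to-transcript mapping gives the exons-per-transcript row.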

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in protein mode on the annotated gene set, using the longest protein of each gene and the primates_odb10 lineage dataset. Results are reported in BUSCO notation for the gene set from the primary assembly unit.
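BUSCO notation reports complete BUSCOs (C, split into single-copy S and duplicated D), fragmented (F) and missing (M) BUSCOs as percentages of the total number searched (n). A small formatting sketch; the counts in the example are invented for illustration and are not this annotation's results.

```python
def busco_notation(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO category counts as C[S,D],F,M,n with one-decimal percentages of n."""
    n = single + duplicated + fragmented + missing

    def pct(count: int) -> float:
        return 100 * count / n

    return (
        f"C:{pct(single + duplicated):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )


# Hypothetical counts, not this annotation's results.
print(busco_notation(single=900, duplicated=50, fragmented=30, missing=20))
# C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:1000
```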

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Of the 22,625 protein-coding genes, 21,974 had a protein with an alignment covering 50% or more of the query, and 19,256 had an alignment covering 95% or more of the query.

Definition of query and target coverage: query coverage is the percentage of the annotated protein's length that is included in the alignment; target coverage is the percentage of the target protein's length that is included in the alignment.
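A sketch of the two definitions for a single alignment, assuming 1-based, inclusive residue positions such as the qstart/qend/qlen and sstart/send/slen columns that BLAST+ can write in tabular output. It measures the aligned span of one hit, so gaps inside the span count as covered; how the pipeline combines multiple HSPs is not described in this report.

```python
def percent_covered(aln_start: int, aln_end: int, protein_length: int) -> float:
    """Percent of a protein's length spanned by one alignment (1-based, inclusive)."""
    return 100 * (aln_end - aln_start + 1) / protein_length


# Hypothetical BLASTP hit: residues 11-390 of a 400-aa annotated protein (query)
# align to residues 1-380 of a 500-aa Swiss-Prot protein (target).
query_coverage = percent_covered(11, 390, 400)   # 95.0
target_coverage = percent_covered(1, 380, 500)   # 76.0
print(query_coverage, target_coverage)
```

The same hit can therefore pass a 95% query-coverage threshold while covering only three quarters of the target.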

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Figure (cumulative graph): query = annotated proteins; target = UniProtKB/Swiss-Prot curated proteins.
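Each point of such a cumulative curve is the number of genes whose best protein alignment reaches a given coverage, the same gene-level phrasing used for the 50% and 95% figures above. A sketch with hypothetical (gene, query coverage) pairs, keeping each gene's best protein:

```python
def best_coverage_per_gene(hits: list[tuple[str, float]]) -> dict[str, float]:
    """Keep each gene's highest query coverage across all of its proteins' hits."""
    best: dict[str, float] = {}
    for gene, cov in hits:
        best[gene] = max(cov, best.get(gene, 0.0))
    return best


def genes_at_or_above(best: dict[str, float], thresholds: list[float]) -> dict[float, int]:
    """Cumulative counts: number of genes whose best coverage is >= each threshold."""
    return {t: sum(cov >= t for cov in best.values()) for t in thresholds}


# Hypothetical (gene, query coverage %) pairs, one per protein hit.
hits = [("g1", 97.0), ("g1", 60.0), ("g2", 55.0), ("g3", 30.0), ("g4", 99.5)]
curve = genes_at_or_above(best_coverage_per_gene(hits), [50, 95])
print(curve)  # {50: 3, 95: 2}
```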

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
NHGRI_mPanPan1-v1.1-0.1.freeze_pri	GCF_029289425.1	43.22%
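A percent-masked figure is the share of genomic bases flagged as repeats. A sketch, assuming the masked genome is available as soft-masked FASTA (masked bases in lowercase); the report does not say how its 43.22% was computed, for example whether gap (N) bases are in the denominator, so this counts every sequence character.

```python
from typing import Iterable


def percent_soft_masked(fasta_lines: Iterable[str]) -> float:
    """Percent of sequence characters that are lowercase in soft-masked FASTA lines.

    Header lines (">") are skipped; every other character, including N gaps,
    counts toward the denominator.
    """
    masked = total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip()
        total += len(seq)
        masked += sum(base.islower() for base in seq)
    return 100 * masked / total if total else 0.0


# Tiny in-memory example. For a real file (name is a placeholder):
#   with open("genome.fna") as handle:
#       print(percent_soft_masked(handle))
example = [">seq1\n", "ACGTacgt\n", ">seq2\n", "aaAA\n"]
print(f"{percent_soft_masked(example):.2f}%")  # 50.00%
```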

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

Splign alignments of the following transcripts were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	218	218 (100.00%)	218 (100.00%)	99.97%	100.00%
Same-species Genbank	192	183 (95.31%)	170 (88.54%)	99.56%	99.65%

RNA-Seq alignments

STAR alignments of the following RNA-Seq reads were also used for gene prediction:

Alignment statistics by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication (PubMed ID)	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	4,212,026,048	56%	23%	291,336
SAMEA5858368	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858368)	31,005,746	39%	30%	170,748
SAMEA5858369	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858369)	30,117,640	48%	31%	176,988
SAMEA5858370	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858370)	35,294,986	46%	30%	177,973
SAMEA5858371	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858371)	33,354,856	45%	30%	175,822
SAMEA5858372	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858372)	34,463,786	46%	32%	178,643
SAMEA5858373	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858373)	33,140,374	46%	31%	176,724
SAMEA5858374	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858374)	36,317,322	47%	31%	180,859
SAMEA5858375	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858375)	29,489,318	47%	31%	173,036
SAMEA5858376	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858376)	32,149,328	45%	31%	173,556
SAMEA5858377	NA	prefrontal cortex (Pan paniscus, 15, male, SAMEA5858377)	29,288,332	47%	30%	173,636
SAMN00632220	22012392	Brain, prefrontal cortex (Pan paniscus, Female, SAMN00632220)	34,332,540	68%	22%	179,274
SAMN02189949	24153179,27487209,28430982	iPSC, (Pan paniscus, SAMN02189949)	30,297,954	89%	28%	158,583
SAMN02189950	24153179,27487209,28430982	iPSC, (Pan paniscus, SAMN02189950)	36,053,276	89%	29%	169,787
SAMN02189951	24153179,27487209,28430982	iPSC, (Pan paniscus, SAMN02189951)	145,284,828	89%	30%	206,213
SAMN02189952	24153179,27487209,28430982	iPSC, (Pan paniscus, SAMN02189952)	27,397,776	86%	28%	155,287
SAMN04546399	NA	placenta (Pan paniscus, SAMN04546399)	87,082,582	22%	25%	115,209
SAMN11165805	32424074	BA28: 1 Supramarginal (BA40) (Pan paniscus, SAMN11165805)	38,621,270	61%	27%	179,378
SAMN11165806	32424074	BC22: 30 Cerebellar Grey Matter (Pan paniscus, SAMN11165806)	31,021,648	50%	14%	137,033
SAMN11165807	32424074	BB22: 30 Cerebellar Grey Matter (Pan paniscus, SAMN11165807)	33,122,852	63%	21%	161,694
SAMN11165808	32424074	BA22: 30 Cerebellar Grey Matter (Pan paniscus, SAMN11165808)	36,865,746	63%	29%	181,872
SAMN11165809	32424074	BC20: 28 Cerebellar White Matter (Pan paniscus, SAMN11165809)	32,704,030	45%	18%	127,092
SAMN11165810	32424074	BB20: 28 Cerebellar White Matter (Pan paniscus, SAMN11165810)	28,975,830	60%	13%	134,116
SAMN11165811	32424074	BA20: 28 Cerebellar White Matter (Pan paniscus, SAMN11165811)	36,108,548	62%	23%	162,900
SAMN11165812	32424074	BC18: 22 Hypothalamus (Pan paniscus, SAMN11165812)	37,921,460	37%	21%	141,785
SAMN11165813	32424074	BB18: 22 Hypothalamus (Pan paniscus, SAMN11165813)	30,626,882	52%	26%	169,612
SAMN11165814	32424074	BA18: 22 Hypothalamus (Pan paniscus, SAMN11165814)	42,344,838	62%	21%	184,593
SAMN11165815	32424074	BC17: 23 Thalamus (Pan paniscus, SAMN11165815)	31,020,684	37%	21%	136,433
SAMN11165816	32424074	BB17: 23 Thalamus (Pan paniscus, SAMN11165816)	39,380,074	69%	26%	174,942
SAMN11165817	32424074	BA17: 23 Thalamus (Pan paniscus, SAMN11165817)	33,048,394	61%	24%	169,741
SAMN11165818	32424074	BC16: 25 Corpus Callosum Posterior (Pan paniscus, SAMN11165818)	34,309,100	52%	18%	139,347
SAMN11165819	32424074	BB16: 25 Corpus Callosum Posterior (Pan paniscus, SAMN11165819)	42,593,130	58%	14%	130,906
SAMN11165820	32424074	BA16: 25 Corpus Callosum Posterior (Pan paniscus, SAMN11165820)	27,547,936	56%	13%	124,998
SAMN11165821	32424074	BC15: 24 Corpus Callosum Anterior (Pan paniscus, SAMN11165821)	36,029,820	49%	14%	146,290
SAMN11165822	32424074	BB15: 24 Corpus Callosum Anterior (Pan paniscus, SAMN11165822)	32,282,048	62%	20%	152,675
SAMN11165823	32424074	BA15: 24 Corpus Callosum Anterior (Pan paniscus, SAMN11165823)	48,459,588	63%	25%	191,313
SAMN11165824	32424074	BC14: 14 Inferior Temporal (BA20) (Pan paniscus, SAMN11165824)	38,775,898	37%	16%	127,770
SAMN11165825	32424074	BB14: 14 Inferior Temporal (BA20) (Pan paniscus, SAMN11165825)	27,476,696	64%	27%	160,830
SAMN11165826	32424074	BA14: 14 Inferior Temporal (BA20) (Pan paniscus, SAMN11165826)	29,131,670	55%	24%	160,557
SAMN11165827	32424074	BC12: 2 2ary Auditory (BA22) (Pan paniscus, SAMN11165827)	31,929,280	51%	11%	128,502
SAMN11165828	32424074	BB12: 2 2ary Auditory (BA22) (Pan paniscus, SAMN11165828)	32,996,960	65%	24%	167,095
SAMN11165829	32424074	BA12: 2 2ary Auditory (BA22) (Pan paniscus, SAMN11165829)	94,272,626	59%	25%	201,310
SAMN11165830	32424074	BC11: 18 Ventrolateral Prefrontal (BA44) (Pan paniscus, SAMN11165830)	34,294,306	38%	14%	123,848
SAMN11165831	32424074	BB11: 18 Ventrolateral Prefrontal (BA44) (Pan paniscus, SAMN11165831)	36,007,136	59%	23%	159,656
SAMN11165832	32424074	BA11: 18 Ventrolateral Prefrontal (BA44) (Pan paniscus, SAMN11165832)	34,931,774	60%	28%	177,293
SAMN11165910	32424074	BB4: 12 Cingulate Anterior (BA32) (Pan paniscus, SAMN11165910)	35,445,860	66%	20%	162,176
SAMN11165911	32424074	BA4: 12 Cingulate Anterior (BA32) (Pan paniscus, SAMN11165911)	35,457,004	64%	28%	180,351
SAMN11165912	32424074	BC3: 17 Orbitofrontal (BA11) (Pan paniscus, SAMN11165912)	34,525,428	43%	19%	138,218
SAMN11165913	32424074	BB3: 17 Orbitofrontal (BA11) (Pan paniscus, SAMN11165913)	32,092,890	65%	24%	172,328
SAMN11165914	32424074	BA3: 17 Orbitofrontal (BA11) (Pan paniscus, SAMN11165914)	33,829,448	63%	27%	177,495
SAMN11165915	32424074	BC2: 16 Dorsolateral Prefrontal (BA9) (Pan paniscus, SAMN11165915)	39,231,050	41%	15%	135,721
SAMN11165916	32424074	BB2: 16 Dorsolateral Prefrontal (BA9) (Pan paniscus, SAMN11165916)	28,394,546	60%	22%	149,533
SAMN11165917	32424074	BA2: 16 Dorsolateral Prefrontal (BA9) (Pan paniscus, SAMN11165917)	39,336,866	61%	29%	182,236
SAMN11165918	32424074	BC1: 13 Prefrontal (BA10) (Pan paniscus, SAMN11165918)	29,974,768	49%	19%	138,049
SAMN11165919	32424074	BB1: 13 Prefrontal (BA10) (Pan paniscus, SAMN11165919)	39,871,132	67%	14%	160,741
SAMN11165920	32424074	BA1: 13 Prefrontal (BA10) (Pan paniscus, SAMN11165920)	38,212,944	61%	27%	187,156
SAMN11165921	32424074	BC10: 10 2ary Visual (BA18/19) (Pan paniscus, SAMN11165921)	40,896,792	45%	9%	116,989
SAMN11165922	32424074	BB10: 10 2ary Visual (BA18/19) (Pan paniscus, SAMN11165922)	32,703,660	68%	23%	163,950
SAMN11165923	32424074	BA10: 10 2ary Visual (BA18/19) (Pan paniscus, SAMN11165923)	37,210,998	62%	26%	174,681
SAMN11165924	32424074	BC9: 9 1ary Visual (BA17) (Pan paniscus, SAMN11165924)	31,228,482	41%	13%	115,055
SAMN11165925	32424074	BB9: 9 1ary Visual (BA17) (Pan paniscus, SAMN11165925)	27,141,614	65%	23%	161,211
SAMN11165926	32424074	BA9: 9 1ary Visual (BA17) (Pan paniscus, SAMN11165926)	37,499,828	60%	26%	174,068
SAMN11165927	32424074	BC8: 4 Cingulate Posterior (BA31) (Pan paniscus, SAMN11165927)	42,349,424	44%	14%	141,550
SAMN11165928	32424074	BB8: 4 Cingulate Posterior (BA31) (Pan paniscus, SAMN11165928)	35,741,454	65%	24%	168,096
SAMN11165929	32424074	BA8: 4 Cingulate Posterior (BA31) (Pan paniscus, SAMN11165929)	31,403,600	57%	22%	151,689
SAMN11165930	32424074	BC7: 3 Precuneus (BA7) (Pan paniscus, SAMN11165930)	35,997,994	29%	11%	105,852
SAMN11165931	32424074	BB7: 3 Precuneus (BA7) (Pan paniscus, SAMN11165931)	43,109,516	65%	22%	177,880
SAMN11165932	32424074	BA7: 3 Precuneus (BA7) (Pan paniscus, SAMN11165932)	36,294,670	63%	26%	176,827
SAMN11165933	32424074	BC6: 5 Premotor (BA6) (Pan paniscus, SAMN11165933)	30,570,898	40%	12%	115,594
SAMN11165934	32424074	BB6: 5 Premotor (BA6) (Pan paniscus, SAMN11165934)	28,597,274	58%	17%	111,002
SAMN11165935	32424074	BA6: 5 Premotor (BA6) (Pan paniscus, SAMN11165935)	36,040,088	64%	26%	176,721
SAMN11165936	32424074	BC5: 11 Cingulate Anterior (BA24) (Pan paniscus, SAMN11165936)	37,250,598	49%	15%	143,177
SAMN11165937	32424074	BB5: 11 Cingulate Anterior (BA24) (Pan paniscus, SAMN11165937)	31,598,080	66%	26%	161,538
SAMN11165938	32424074	BA5: 11 Cingulate Anterior (BA24) (Pan paniscus, SAMN11165938)	32,314,680	59%	26%	171,619
SAMN11165939	32424074	BC4: 12 Cingulate Anterior (BA32) (Pan paniscus, SAMN11165939)	34,885,490	37%	16%	122,035
SAMN11165940	32424074	BC28: 1 Supramarginal (BA40) (Pan paniscus, SAMN11165940)	34,196,784	49%	14%	136,535
SAMN11165941	32424074	BB28: 1 Supramarginal (BA40) (Pan paniscus, SAMN11165941)	37,275,136	68%	27%	176,989
SAMN11165942	32424074	BC29: 8 1ary Motor (BA4) (Pan paniscus, SAMN11165942)	34,656,112	49%	16%	144,270
SAMN11165943	32424074	BB29: 8 1ary Motor (BA4) (Pan paniscus, SAMN11165943)	37,804,724	60%	24%	170,684
SAMN11165944	32424074	BA29: 8 1ary Motor (BA4) (Pan paniscus, SAMN11165944)	39,064,654	63%	25%	178,240
SAMN11165972	32424074	BC41: 33 Nucleus Accumbens (Pan paniscus, SAMN11165972)	36,975,384	38%	18%	132,051
SAMN11165973	32424074	BB41: 33 Nucleus Accumbens (Pan paniscus, SAMN11165973)	36,005,418	66%	30%	184,117
SAMN11165974	32424074	BA41: 33 Nucleus Accumbens (Pan paniscus, SAMN11165974)	30,274,848	60%	29%	171,830
SAMN11165975	32424074	BC40: 27 Globus Pallidus (Pan paniscus, SAMN11165975)	49,065,672	31%	18%	124,665
SAMN11165976	32424074	BB40: 27 Globus Pallidus (Pan paniscus, SAMN11165976)	34,222,784	61%	25%	166,124
SAMN11165977	32424074	BA40: 27 Globus Pallidus (Pan paniscus, SAMN11165977)	35,633,504	59%	23%	162,743
SAMN11165978	32424074	BC39: 26 Internal Capsule (Pan paniscus, SAMN11165978)	42,235,262	46%	17%	145,086
SAMN11165979	32424074	BB39: 26 Internal Capsule (Pan paniscus, SAMN11165979)	34,888,660	61%	27%	162,498
SAMN11165980	32424074	BA39: 26 Internal Capsule (Pan paniscus, SAMN11165980)	36,833,390	60%	20%	148,521
SAMN11165981	32424074	BC38: 31 Putamen (Pan paniscus, SAMN11165981)	36,234,122	34%	14%	123,496
SAMN11165982	32424074	BB38: 31 Putamen (Pan paniscus, SAMN11165982)	39,513,378	63%	25%	173,100
SAMN11165983	32424074	BA38: 31 Putamen (Pan paniscus, SAMN11165983)	34,102,552	57%	27%	166,114
SAMN11165984	32424074	BC37: 32 Caudate (Pan paniscus, SAMN11165984)	33,006,208	32%	16%	126,914
SAMN11165985	32424074	BB37: 32 Caudate (Pan paniscus, SAMN11165985)	29,011,398	66%	26%	173,876
SAMN11165986	32424074	BA37: 32 Caudate (Pan paniscus, SAMN11165986)	32,276,942	58%	25%	162,899
SAMN11165987	32424074	BC36: 20 Entorhinal Cortex (Pan paniscus, SAMN11165987)	42,395,038	28%	14%	114,061
SAMN11165988	32424074	BB36: 20 Entorhinal Cortex (Pan paniscus, SAMN11165988)	35,502,796	65%	28%	170,192
SAMN11165989	32424074	BA36: 20 Entorhinal Cortex (Pan paniscus, SAMN11165989)	34,034,130	60%	25%	165,365
SAMN11165990	32424074	BC33: 21 Hippocampus (Pan paniscus, SAMN11165990)	36,319,576	34%	12%	113,444
SAMN11165991	32424074	BB33: 21 Hippocampus (Pan paniscus, SAMN11165991)	31,855,884	67%	26%	179,640
SAMN11165992	32424074	BA33: 21 Hippocampus (Pan paniscus, SAMN11165992)	38,928,648	57%	26%	173,240
SAMN11165993	32424074	BC32: 15 Insular Posterior Cortex (Pan paniscus, SAMN11165993)	37,269,032	42%	11%	120,808
SAMN11165994	32424074	BB32: 15 Insular Posterior Cortex (Pan paniscus, SAMN11165994)	39,232,794	63%	26%	170,840
SAMN11165995	32424074	BA32: 15 Insular Posterior Cortex (Pan paniscus, SAMN11165995)	39,409,078	63%	28%	177,685
SAMN11165996	32424074	BC31: 7 1ary Auditory (BA41/42) (Pan paniscus, SAMN11165996)	35,217,380	43%	15%	120,628
SAMN11165997	32424074	BB31: 7 1ary Auditory (BA41/42) (Pan paniscus, SAMN11165997)	42,760,922	60%	22%	164,005
SAMN11165998	32424074	BA31: 7 1ary Auditory (BA41/42) (Pan paniscus, SAMN11165998)	35,072,304	64%	28%	174,089
SAMN11165999	32424074	BC30: 6 1ary Somatosensory (BA3/1/2) (Pan paniscus, SAMN11165999)	40,688,976	49%	16%	150,323
SAMN11166000	32424074	BB30: 6 1ary Somatosensory (BA3/1/2) (Pan paniscus, SAMN11166000)	30,985,612	67%	29%	170,641
SAMN11166001	32424074	BA30: 6 1ary Somatosensory (BA3/1/2) (Pan paniscus, SAMN11166001)	35,001,002	61%	24%	172,664
SAMN11166027	32424074	BC43: 29 Substantia Nigra (Pan paniscus, SAMN11166027)	26,737,420	34%	9%	82,594
SAMN11166028	32424074	BA43: 29 Substantia Nigra (Pan paniscus, SAMN11166028)	37,593,362	60%	27%	170,191
SAMN11166029	32424074	BC42: 19 Amygdala (Pan paniscus, SAMN11166029)	27,847,582	31%	11%	93,634
SAMN11166030	32424074	BB42: 19 Amygdala (Pan paniscus, SAMN11166030)	37,197,124	63%	27%	169,882
SAMN11166031	32424074	BA42: 19 Amygdala (Pan paniscus, SAMN11166031)	37,622,512	60%	22%	167,419
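The "All" row is an aggregate of the aligned samples. A sketch of pooling sample rows, assuming the pooled aligned-read percentage is weighted by read count; because the per-sample percentages in the table are rounded to whole numbers, a value recomputed this way can only approximate the reported one.

```python
def pool_samples(samples: list[tuple[int, float]]) -> tuple[int, float]:
    """Return (total reads, read-weighted percent of aligned reads).

    Each sample is (number of reads, percent aligned).
    """
    total_reads = sum(reads for reads, _ in samples)
    aligned_reads = sum(reads * pct / 100 for reads, pct in samples)
    return total_reads, 100 * aligned_reads / total_reads


# First three prefrontal cortex samples (SAMEA5858368-SAMEA5858370) from the table above.
rows = [(31_005_746, 39), (30_117_640, 48), (35_294_986, 46)]
total, pct = pool_samples(rows)
print(f"{total:,} reads, ~{pct:.1f}% aligned")  # 96,418,372 reads, ~44.4% aligned
```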

Alignment statistics by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR3473997	ERX3495639	ERP116771	SAMEA5858368	31,005,746	39%	30%
ERR3473998	ERX3495640	ERP116771	SAMEA5858369	30,117,640	48%	31%
ERR3473999	ERX3495641	ERP116771	SAMEA5858370	35,294,986	46%	30%
ERR3474000	ERX3495642	ERP116771	SAMEA5858371	33,354,856	45%	30%
ERR3474001	ERX3495643	ERP116771	SAMEA5858372	34,463,786	46%	32%
ERR3474002	ERX3495644	ERP116771	SAMEA5858373	33,140,374	46%	31%
ERR3474003	ERX3495645	ERP116771	SAMEA5858374	36,317,322	47%	31%
ERR3474004	ERX3495646	ERP116771	SAMEA5858375	29,489,318	47%	31%
ERR3474005	ERX3495647	ERP116771	SAMEA5858376	32,149,328	45%	31%
ERR3474006	ERX3495648	ERP116771	SAMEA5858377	29,288,332	47%	30%
SRR306826	SRX081970	SRP007412	SAMN00632220	34,332,540	68%	22%
SRR873626	SRX290735	SRP023550	SAMN02189949	30,297,954	89%	28%
SRR873627	SRX290736	SRP023550	SAMN02189950	36,053,276	89%	29%
SRR873628	SRX290737	SRP023550	SAMN02189951	145,284,828	89%	30%
SRR873629	SRX290738	SRP023550	SAMN02189952	27,397,776	86%	28%
SRR3222427	SRX1629202	SRP071663	SAMN04546399	87,082,582	22%	25%
SRR8750596	SRX5541530	SRP188822	SAMN11165805	38,621,270	61%	27%
SRR8750595	SRX5541529	SRP188822	SAMN11165806	31,021,648	50%	14%
SRR8750452	SRX5541528	SRP188822	SAMN11165807	33,122,852	63%	21%
SRR8750451	SRX5541527	SRP188822	SAMN11165808	36,865,746	63%	29%
SRR8750450	SRX5541526	SRP188822	SAMN11165809	32,704,030	45%	18%
SRR8750449	SRX5541525	SRP188822	SAMN11165810	28,975,830	60%	13%
SRR8750448	SRX5541524	SRP188822	SAMN11165811	36,108,548	62%	23%
SRR8750447	SRX5541523	SRP188822	SAMN11165812	37,921,460	37%	21%
SRR8750446	SRX5541522	SRP188822	SAMN11165813	30,626,882	52%	26%
SRR8750445	SRX5541521	SRP188822	SAMN11165814	42,344,838	62%	21%
SRR8750444	SRX5541520	SRP188822	SAMN11165815	31,020,684	37%	21%
SRR8750443	SRX5541519	SRP188822	SAMN11165816	39,380,074	69%	26%
SRR8750442	SRX5541518	SRP188822	SAMN11165817	33,048,394	61%	24%
SRR8750441	SRX5541517	SRP188822	SAMN11165818	34,309,100	52%	18%
SRR8750440	SRX5541516	SRP188822	SAMN11165819	42,593,130	58%	14%
SRR8750439	SRX5541515	SRP188822	SAMN11165820	27,547,936	56%	13%
SRR8750438	SRX5541514	SRP188822	SAMN11165821	36,029,820	49%	14%
SRR8750437	SRX5541513	SRP188822	SAMN11165822	32,282,048	62%	20%
SRR8750436	SRX5541512	SRP188822	SAMN11165823	48,459,588	63%	25%
SRR8750435	SRX5541511	SRP188822	SAMN11165824	38,775,898	37%	16%
SRR8750434	SRX5541510	SRP188822	SAMN11165825	27,476,696	64%	27%
SRR8750433	SRX5541509	SRP188822	SAMN11165826	29,131,670	55%	24%
SRR8750432	SRX5541508	SRP188822	SAMN11165827	31,929,280	51%	11%
SRR8750431	SRX5541507	SRP188822	SAMN11165828	32,996,960	65%	24%
SRR8750430	SRX5541506	SRP188822	SAMN11165829	94,272,626	59%	25%
SRR8750429	SRX5541505	SRP188822	SAMN11165830	34,294,306	38%	14%
SRR8750428	SRX5541504	SRP188822	SAMN11165831	36,007,136	59%	23%
SRR8750427	SRX5541503	SRP188822	SAMN11165832	34,931,774	60%	28%
SRR8750407	SRX5541333	SRP188822	SAMN11165910	35,445,860	66%	20%
SRR8750406	SRX5541332	SRP188822	SAMN11165911	35,457,004	64%	28%
SRR8750405	SRX5541331	SRP188822	SAMN11165912	34,525,428	43%	19%
SRR8750404	SRX5541330	SRP188822	SAMN11165913	32,092,890	65%	24%
SRR8750403	SRX5541329	SRP188822	SAMN11165914	33,829,448	63%	27%
SRR8750402	SRX5541328	SRP188822	SAMN11165915	39,231,050	41%	15%
SRR8750401	SRX5541327	SRP188822	SAMN11165916	28,394,546	60%	22%
SRR8750400	SRX5541326	SRP188822	SAMN11165917	39,336,866	61%	29%
SRR8750399	SRX5541325	SRP188822	SAMN11165918	29,974,768	49%	19%
SRR8750398	SRX5541324	SRP188822	SAMN11165919	39,871,132	67%	14%
SRR8750397	SRX5541323	SRP188822	SAMN11165920	38,212,944	61%	27%
SRR8750426	SRX5541502	SRP188822	SAMN11165921	40,896,792	45%	9%
SRR8750425	SRX5541501	SRP188822	SAMN11165922	32,703,660	68%	23%
SRR8750424	SRX5541500	SRP188822	SAMN11165923	37,210,998	62%	26%
SRR8750423	SRX5541499	SRP188822	SAMN11165924	31,228,482	41%	13%
SRR8750422	SRX5541498	SRP188822	SAMN11165925	27,141,614	65%	23%
SRR8750421	SRX5541497	SRP188822	SAMN11165926	37,499,828	60%	26%
SRR8750420	SRX5541496	SRP188822	SAMN11165927	42,349,424	44%	14%
SRR8750419	SRX5541495	SRP188822	SAMN11165928	35,741,454	65%	24%
SRR8750418	SRX5541494	SRP188822	SAMN11165929	31,403,600	57%	22%
SRR8750417	SRX5541493	SRP188822	SAMN11165930	35,997,994	29%	11%
SRR8750416	SRX5541492	SRP188822	SAMN11165931	43,109,516	65%	22%
SRR8750415	SRX5541491	SRP188822	SAMN11165932	36,294,670	63%	26%
SRR8750414	SRX5541490	SRP188822	SAMN11165933	30,570,898	40%	12%
SRR8750413	SRX5541489	SRP188822	SAMN11165934	28,597,274	58%	17%
SRR8750412	SRX5541338	SRP188822	SAMN11165935	36,040,088	64%	26%
SRR8750411	SRX5541337	SRP188822	SAMN11165936	37,250,598	49%	15%
SRR8750410	SRX5541336	SRP188822	SAMN11165937	31,598,080	66%	26%
SRR8750409	SRX5541335	SRP188822	SAMN11165938	32,314,680	59%	26%
SRR8750408	SRX5541334	SRP188822	SAMN11165939	34,885,490	37%	16%
SRR8750598	SRX5541532	SRP188822	SAMN11165940	34,196,784	49%	14%
SRR8750597	SRX5541531	SRP188822	SAMN11165941	37,275,136	68%	27%
SRR8750601	SRX5541535	SRP188822	SAMN11165942	34,656,112	49%	16%
SRR8750600	SRX5541534	SRP188822	SAMN11165943	37,804,724	60%	24%
SRR8750599	SRX5541533	SRP188822	SAMN11165944	39,064,654	63%	25%
SRR8750631	SRX5541565	SRP188822	SAMN11165972	36,975,384	38%	18%
SRR8750630	SRX5541564	SRP188822	SAMN11165973	36,005,418	66%	30%
SRR8750629	SRX5541563	SRP188822	SAMN11165974	30,274,848	60%	29%
SRR8750628	SRX5541562	SRP188822	SAMN11165975	49,065,672	31%	18%
SRR8750627	SRX5541561	SRP188822	SAMN11165976	34,222,784	61%	25%
SRR8750626	SRX5541560	SRP188822	SAMN11165977	35,633,504	59%	23%
SRR8750625	SRX5541559	SRP188822	SAMN11165978	42,235,262	46%	17%
SRR8750624	SRX5541558	SRP188822	SAMN11165979	34,888,660	61%	27%
SRR8750623	SRX5541557	SRP188822	SAMN11165980	36,833,390	60%	20%
SRR8750622	SRX5541556	SRP188822	SAMN11165981	36,234,122	34%	14%
SRR8750621	SRX5541555	SRP188822	SAMN11165982	39,513,378	63%	25%
SRR8750620	SRX5541554	SRP188822	SAMN11165983	34,102,552	57%	27%
SRR8750619	SRX5541553	SRP188822	SAMN11165984	33,006,208	32%	16%
SRR8750618	SRX5541552	SRP188822	SAMN11165985	29,011,398	66%	26%
SRR8750617	SRX5541551	SRP188822	SAMN11165986	32,276,942	58%	25%
SRR8750616	SRX5541550	SRP188822	SAMN11165987	42,395,038	28%	14%
SRR8750615	SRX5541549	SRP188822	SAMN11165988	35,502,796	65%	28%
SRR8750614	SRX5541548	SRP188822	SAMN11165989	34,034,130	60%	25%
SRR8750613	SRX5541547	SRP188822	SAMN11165990	36,319,576	34%	12%
SRR8750612	SRX5541546	SRP188822	SAMN11165991	31,855,884	67%	26%
SRR8750611	SRX5541545	SRP188822	SAMN11165992	38,928,648	57%	26%
SRR8750610	SRX5541544	SRP188822	SAMN11165993	37,269,032	42%	11%
SRR8750609	SRX5541543	SRP188822	SAMN11165994	39,232,794	63%	26%
SRR8750608	SRX5541542	SRP188822	SAMN11165995	39,409,078	63%	28%
SRR8750607	SRX5541541	SRP188822	SAMN11165996	35,217,380	43%	15%
SRR8750606	SRX5541540	SRP188822	SAMN11165997	42,760,922	60%	22%
SRR8750605	SRX5541539	SRP188822	SAMN11165998	35,072,304	64%	28%
SRR8750604	SRX5541538	SRP188822	SAMN11165999	40,688,976	49%	16%
SRR8750603	SRX5541537	SRP188822	SAMN11166000	30,985,612	67%	29%
SRR8750602	SRX5541536	SRP188822	SAMN11166001	35,001,002	61%	24%
SRR8750636	SRX5541570	SRP188822	SAMN11166027	26,737,420	34%	9%
SRR8750635	SRX5541569	SRP188822	SAMN11166028	37,593,362	60%	27%
SRR8750634	SRX5541568	SRP188822	SAMN11166029	27,847,582	31%	11%
SRR8750633	SRX5541567	SRP188822	SAMN11166030	37,197,124	63%	27%
SRR8750632	SRX5541566	SRP188822	SAMN11166031	37,622,512	60%	22%

SRA long-read alignment statistics

Minimap2 alignments of the following long RNA-Seq reads from the Sequence Read Archive (PacBio, Oxford Nanopore, 454, or other long-read sequencing technologies) were used for gene prediction:

Run	Sample	Number of reads	Number (%) of sequences aligned by Minimap2	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
All	NA	3842328	3838106 (99.89%)	3377400 (87.89%)	99.52	99.3
SRR22306523	SAMN31746400	3804083	3802251 (99.95%)	3347999 (88.01%)	99.53	99.34
SRR22838391	SAMN13178638	20060	18764 (93.53%)	15345 (76.49%)	99.28	96.18
SRR22838392	SAMN13178638	18185	17091 (93.98%)	14056 (77.29%)	99.32	96.23
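The "All" row is the column-wise sum of the three runs, and the percentages are those sums divided by the total read count. A sketch that reproduces the row with integer arithmetic. The percentages in this table match truncation to two decimals rather than rounding (3,377,400 / 3,842,328 is 87.8998%, shown as 87.89%); that is an observation from the numbers, not documented pipeline behavior, so the helper takes a `truncate` flag.

```python
def pct2(part: int, total: int, truncate: bool = True) -> str:
    """Format part/total as a percentage with two decimals, using exact integer math."""
    if truncate:
        hundredths = part * 10_000 // total
    else:  # round half up
        hundredths = (2 * part * 10_000 + total) // (2 * total)
    return f"{hundredths // 100}.{hundredths % 100:02d}"


runs = {
    # run: (reads, aligned by Minimap2, passed to Gnomon), copied from the table above
    "SRR22306523": (3_804_083, 3_802_251, 3_347_999),
    "SRR22838391": (20_060, 18_764, 15_345),
    "SRR22838392": (18_185, 17_091, 14_056),
}

reads, aligned, passed = (sum(col) for col in zip(*runs.values()))
print(reads, aligned, passed)                      # 3842328 3838106 3377400
print(pct2(aligned, reads), pct2(passed, reads))   # 99.89 87.89
```

By contrast, the protein-alignment percentages below match ordinary rounding: `pct2(12_261, 12_541, truncate=False)` gives "97.77", as reported, whereas truncation would give "97.76".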

Protein alignments

ProSplign alignments of the following proteins were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Primates GenBank	25,127	24,603 (97.91%)	24,603 (97.91%)	82.91%	91.68%
Primates known RefSeq (NP_)	12,541	12,261 (97.77%)	12,261 (97.77%)	87.47%	94.03%
Same-species GenBank	79	79 (100.00%)	79 (100.00%)	84.29%	96.49%
Same-species known RefSeq (NP_)	49	46 (93.88%)	46 (93.88%)	87.21%	95.21%
Homo sapiens GenBank	144,465	138,067 (95.57%)	138,067 (95.57%)	82.91%	86.10%
Homo sapiens known RefSeq (NP_)	66,908	64,862 (96.94%)	64,862 (96.94%)	89.62%	92.66%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

	First pass	Total
NHGRI_mPanPan1-v1.1-0.1.freeze_pri (current) coverage	86.09%	88.11%
Mhudiblu_PPA_v0 (previous) coverage	93.38%	94.18%
Percent identity	99.10%	99.00%
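Each coverage figure is relative to that assembly's own length, so the same set of alignments can cover different fractions of the two assemblies. A sketch, assuming coverage means the union of aligned bases on one assembly divided by its length; the Remapping Service's exact definition is not given here, and the coordinates below are hypothetical.

```python
def covered_percent(blocks: list[tuple[int, int]], seq_length: int) -> float:
    """Percent of a sequence covered by alignment blocks.

    Blocks are 0-based, half-open (start, end) intervals on that sequence;
    overlapping or touching blocks are merged so no base is counted twice.
    """
    merged: list[list[int]] = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return 100 * sum(end - start for start, end in merged) / seq_length


# Hypothetical: 900 aligned bases projected onto a 1,200-bp current sequence
# and a 1,000-bp previous sequence.
current = covered_percent([(0, 500), (700, 1_100)], seq_length=1_200)  # 75.0
previous = covered_percent([(0, 500), (500, 900)], seq_length=1_000)   # 90.0
print(current, previous)
```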

Comparison of the current and previous annotations

The annotations produced for this release were compared to the annotations in the previous release for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
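The report does not give the scoring formula. As a hypothetical illustration of the two ingredients it names, overlap in exon sequence and matches in exon boundaries, here is one plausible pair of scores for a current and a previous transcript on the same coordinates; it is a sketch for intuition, not NCBI's method, and it assumes neither exon list is empty.

```python
def exon_scores(current: list[tuple[int, int]],
                previous: list[tuple[int, int]]) -> tuple[float, float]:
    """Hypothetical similarity scores for two transcripts on the same coordinates.

    Exons are 0-based, half-open (start, end) intervals. Returns
    (Jaccard index of exonic bases, Jaccard index of exon boundary coordinates).
    """
    def bases(exons):
        # Expands every exon into its positions: fine for a sketch,
        # wasteful for very long genes.
        return {pos for start, end in exons for pos in range(start, end)}

    def boundaries(exons):
        return {coord for start, end in exons for coord in (start, end)}

    cur_bases, prev_bases = bases(current), bases(previous)
    cur_bounds, prev_bounds = boundaries(current), boundaries(previous)
    base_score = len(cur_bases & prev_bases) / len(cur_bases | prev_bases)
    boundary_score = len(cur_bounds & prev_bounds) / len(cur_bounds | prev_bounds)
    return base_score, boundary_score


# Same first exon; the second exon's end moved from 300 to 350.
print(exon_scores([(0, 100), (200, 300)], [(0, 100), (200, 350)]))  # (0.8, 0.6)
```

Under such a scheme, an identical pair scores (1.0, 1.0), and thresholds on the two scores could separate identical, minor and major changes.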

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	NHGRI_mPanPan1-v1.1-0.1.freeze_pri (Current) to Mhudiblu_PPA_v0 (Previous)
Identical	14%
Minor changes	51%
Major changes	17%
New	16%
Deprecated	18%
Other	1%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, DiCuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-54
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996-2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
