NCBI Gorilla gorilla gorilla Annotation Release GCF_029281585.1-RS_2023_04

The genome sequence records for Gorilla gorilla gorilla RefSeq assembly GCF_029281585.1 (NHGRI_mGorGor1-v1.1-0.2.freeze_pri) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: The similarity of the current and previous assembly
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_029281585.1-RS_2023_04".

Date of Entrez queries for transcripts and proteins: Apr 14 2023
Date of submission of annotation to the public databases: Apr 21 2023
Software version: 10.1
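
The release name appears to combine the RefSeq assembly accession with a year and month tag that matches the dates above. A minimal Python sketch for splitting it, assuming the pattern `<accession>-RS_<YYYY>_<MM>` holds (the pattern is inferred from this one name, and `parse_release_name` is a hypothetical helper, not an NCBI tool):

```python
import re

# Assumed layout: <assembly accession>-RS_<year>_<month>.
# Inferred from this release name only, not from NCBI documentation.
RELEASE_RE = re.compile(r"^(GC[AF]_\d{9}\.\d+)-RS_(\d{4})_(\d{2})$")


def parse_release_name(name):
    """Split a release name into (accession, year, month). Hypothetical helper."""
    match = RELEASE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    accession, year, month = match.groups()
    return accession, int(year), int(month)


accession, year, month = parse_release_name("GCF_029281585.1-RS_2023_04")
print(accession, year, month)
```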

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
NHGRI_mGorGor1-v1.1-0.2.freeze_pri	GCF_029281585.1	National Human Genome Research Institute, National Institutes of Health	03-20-2023	Reference	25 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	NHGRI_mGorGor1-v1.1-0.2.freeze_pri
Genes and pseudogenes	43,486
protein-coding	23,010
non-coding	13,255
Transcribed pseudogenes	637
Non-transcribed pseudogenes	6,387
genes with variants	14,877
Immunoglobulin/T-cell receptor gene segments	148
other	49
mRNAs	81,373
fully-supported	80,298
with > 5% ab initio	625
partial	95
with filled gap(s)	0
known RefSeq (NM_)	93
model RefSeq (XM_)	81,280
non-coding RNAs	19,697
fully-supported	15,417
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	363
model RefSeq (XR_)	18,860
pseudo transcripts	646
fully-supported	553
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	2
model RefSeq (XR_)	644
CDSs	81,521
fully-supported	80,298
with > 5% ab initio	761
partial	96
with major correction(s)	1,011
known RefSeq (NP_)	93
model RefSeq (XP_)	81,280
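
As a consistency check on the table above, the gene categories can be summed and compared with the "Genes and pseudogenes" total. The sketch below copies the counts from the table. Leaving out "genes with variants" assumes that row is an overlapping subset rather than a separate category, which the report does not state.

```python
# Gene counts copied from the "Feature counts" table.
genes_and_pseudogenes = 43_486
categories = {
    "protein-coding": 23_010,
    "non-coding": 13_255,
    "Transcribed pseudogenes": 637,
    "Non-transcribed pseudogenes": 6_387,
    "Immunoglobulin/T-cell receptor gene segments": 148,
    "other": 49,
}
# "genes with variants" (14,877) is excluded: assumed to overlap the rows above.
category_total = sum(categories.values())
print(category_total, category_total == genes_and_pseudogenes)

# mRNA total versus its known (NM_) and model (XM_) RefSeq breakdown.
mrna_total = 81_373
mrna_by_accession = 93 + 81_280
print(mrna_by_accession == mrna_total)
```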

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	36,314	42,613	11,844	46	2,294,770
All transcripts	101,070	3,668	2,961	18	106,947
mRNA	81,373	4,070	3,334	51	106,947
misc_RNA	3,532	3,599	2,767	177	33,057
miRNA	379	22	22	18	25
tRNA	456	74	73	71	87
lncRNA	11,533	2,149	1,166	135	84,479
snoRNA	1,148	111	104	46	330
snRNA	1,602	111	107	59	200
rRNA	998	1,720	153	119	5,040
Single-exon transcripts	2,610	1,893	1,144	51	43,365
coding transcripts (NM_/XM_ )	2,594	1,897	1,145	51	43,365
non-coding transcripts (NR_/XR_ )	16	1,306	827	246	5,014
CDSs	81,373	2,078	1,539	51	105,651
Exons	311,710	413	144	1	78,815
in coding transcripts (NM_/XM_ )	273,632	383	142	1	43,365
in non-coding transcripts (NR_/XR_ )	52,324	514	151	4	78,815
Introns	271,479	7,714	1,809	30	1,262,629
in coding transcripts (NM_/XM_ )	244,132	7,379	1,754	30	1,174,457
in non-coding transcripts (NR_/XR_ )	41,228	9,163	2,150	30	1,262,629

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.8	1	1	50
Number of exons per transcript	11.68	9	1	344
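
The mean of 2.8 transcripts per gene is consistent with dividing the "All transcripts" count by the "Genes" count in the feature lengths table. A quick check in Python, assuming that is how the mean is derived (the report does not give the formula):

```python
genes = 36_314              # "Genes" row, pseudogenes excluded
all_transcripts = 101_070   # "All transcripts" row

# Assumed definition: total transcripts divided by total genes.
mean_transcripts_per_gene = all_transcripts / genes
print(round(mean_transcripts_per_gene, 1))
```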

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein per gene and the primates_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit and presented in BUSCO notation.

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 23,010 coding genes, 22,074 had a protein with an alignment covering 50% or more of the query, and 19,387 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
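
The two coverage definitions can be written as one small function. This is a sketch only: the alignment lengths in the example are hypothetical, and `coverage` is an illustrative helper rather than part of the pipeline. The last lines express the gene counts quoted above as shares of all coding genes.

```python
def coverage(aligned_length, sequence_length):
    """Percentage of a sequence's length that is included in an alignment."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: 450 residues aligned on a 500-residue annotated
# protein (query) and on a 600-residue Swiss-Prot protein (target).
query_cov = coverage(450, 500)
target_cov = coverage(450, 600)
print(query_cov, target_cov)

# Shares of coding genes at the two query-coverage thresholds in the text.
coding_genes = 23_010
share_50 = 100.0 * 22_074 / coding_genes
share_95 = 100.0 * 19_387 / coding_genes
print(f"{share_50:.1f}% at >=50%, {share_95:.1f}% at >=95%")
```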

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
NHGRI_mGorGor1-v1.1-0.2.freeze_pri	GCF_029281585.1	47.93%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	462	460 (99.57%)	458 (99.13%)	99.91%	99.99%
Same-species Genbank	5	5 (100.00%)	4 (80.00%)	99.69%	100.00%
Homo sapiens known RefSeq (NM_/NR_)	87,541	87,337 (99.77%)	75,026 (85.70%)	98.62%	99.62%
Homo sapiens Genbank	360,867	314,252 (87.08%)	226,728 (62.83%)	96.49%	92.50%
Homo sapiens EST	8,647,257	7,096,599 (82.07%)	6,596,287 (76.28%)	97.16%	98.06%
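
Cells of the form "count (percent%)" can be parsed and the percentage recomputed from the number of sequences retrieved. The Python sketch below does this for the table above. It assumes both percentage columns are relative to the retrieved count, which the recomputation supports but the report does not state explicitly. `parse_count_percent` and `check_row` are illustrative names.

```python
import re

CELL_RE = re.compile(r"^([\d,]+) \(([\d.]+)%\)$")


def parse_count_percent(cell):
    """Parse a cell such as '460 (99.57%)' into (460, 99.57)."""
    match = CELL_RE.match(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1).replace(",", "")), float(match.group(2))


def check_row(retrieved, cell, tolerance=0.01):
    """True if the reported percentage matches count / retrieved."""
    count, reported = parse_count_percent(cell)
    return abs(100.0 * count / retrieved - reported) < tolerance


# (source, retrieved, aligned by Splign, passed to Gnomon), copied from the table.
rows = [
    ("Same-species known RefSeq (NM_/NR_)", "462", "460 (99.57%)", "458 (99.13%)"),
    ("Same-species Genbank", "5", "5 (100.00%)", "4 (80.00%)"),
    ("Homo sapiens known RefSeq (NM_/NR_)", "87,541", "87,337 (99.77%)", "75,026 (85.70%)"),
    ("Homo sapiens Genbank", "360,867", "314,252 (87.08%)", "226,728 (62.83%)"),
    ("Homo sapiens EST", "8,647,257", "7,096,599 (82.07%)", "6,596,287 (76.28%)"),
]

results = {}
for source, retrieved_text, aligned_cell, passed_cell in rows:
    retrieved = int(retrieved_text.replace(",", ""))
    results[source] = (check_row(retrieved, aligned_cell),
                       check_row(retrieved, passed_cell))

for source, (aligned_ok, passed_ok) in results.items():
    print(source, aligned_ok, passed_ok)
```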

RefSeq transcript alignment quality report

The known RefSeq transcripts (NM_ and NR_ accessions) are a set of high-quality transcripts maintained by the RefSeq group at NCBI. Alignment statistics for this group of transcripts, such as the number and percent of sequences not aligning at all, the percent of best alignments split between multiple scaffolds, and the percent of alignments not covering the full CDS, are indicative of the genome quality and are provided below.

	NHGRI_mGorGor1-v1.1-0.2.freeze_pri Primary Assembly
Number of sequences retrieved from Entrez	462
Number (%) of sequences not aligning	2 (0.43%)
Number (%) of sequences with multiple best alignments (split genes)	0 (0.00%)
Number (%) of sequences with CDS coverage < 95%	0 (0.00%)

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	2,312,030,998	67%	17%	340,734
SAMD00115867	NA	skin (Gorilla gorilla gorilla, SAMD00115867)	30,899,578	85%	22%	154,081
SAMD00115868	NA	skin (Gorilla gorilla gorilla, SAMD00115868)	30,749,568	58%	27%	131,578
SAMD00115869	NA	skin (Gorilla gorilla gorilla, SAMD00115869)	32,405,536	86%	31%	157,324
SAMEA10418790	NA	testis (Gorilla gorilla, 4, male, SAMEA10418790)	24,072,170	38%	37%	185,695
SAMEA10418791	NA	testis (Gorilla gorilla, male, SAMEA10418791)	60,307,305	45%	42%	238,770
SAMN00632194	22012392	Brain, prefrontal cortex (Gorilla gorilla, Female, 50 years, SAMN00632194)	35,257,547	57%	15%	170,353
SAMN00632195	22012392	Brain, prefrontal cortex (Gorilla gorilla, Male, 51 years, SAMN00632195)	32,509,628	65%	21%	179,506
SAMN00632196	22012392	Cerebellum (Gorilla gorilla, Female, 50 years, SAMN00632196)	28,305,051	59%	13%	157,666
SAMN00632197	22012392	Cerebellum (Gorilla gorilla, Male, 51 years, SAMN00632197)	20,661,901	54%	9%	140,550
SAMN00632198	22012392	Heart (Gorilla gorilla, Female, 50 years, SAMN00632198)	28,286,878	38%	17%	108,738
SAMN00632199	22012392	Heart (Gorilla gorilla, Male, 51 years, SAMN00632199)	30,588,563	30%	10%	98,253
SAMN00632200	22012392	Kidney (Gorilla gorilla, Female, 50 years, SAMN00632200)	19,804,877	61%	16%	150,314
SAMN00632201	22012392	Kidney (Gorilla gorilla, Male, 51 years, SAMN00632201)	29,684,063	59%	13%	152,702
SAMN00632202	22012392	Liver (Gorilla gorilla, Female, 50 years, SAMN00632202)	32,830,718	65%	16%	136,666
SAMN00632203	22012392	Liver (Gorilla gorilla, Male, 51 years, SAMN00632203)	34,982,548	67%	15%	147,494
SAMN00632204	22012392	Testis (Gorilla gorilla, Male, 51 years, SAMN00632204)	21,124,809	62%	18%	192,794
SAMN02353877	24631741	iPS, endothelial cells, vein (Gorilla gorilla, SAMN02353877)	5,751,064	65%	16%	101,183
SAMN02353878	24631741	iPS, endothelial cells, vein (Gorilla gorilla, SAMN02353878)	3,107,151	59%	15%	77,366
SAMN02353879	24631741	iPS, endothelial cells, vein (Gorilla gorilla, SAMN02353879)	4,631,720	59%	17%	93,649
SAMN02353883	24631741	iPS, endothelial cells, vein (Gorilla gorilla, SAMN02353883)	6,058,538	70%	17%	108,437
SAMN02353886	24631741	iPS, endothelial cells, vein (Gorilla gorilla, SAMN02353886)	4,258,056	69%	18%	96,302
SAMN02353890	24631741	iPS, endothelial cells, vein (Gorilla gorilla, SAMN02353890)	4,605,201	62%	18%	93,972
SAMN02353893	24631741	iPS, endothelial cells, vein (Gorilla gorilla, SAMN02353893)	4,226,045	72%	18%	97,572
SAMN04313459	NA	testis (Gorilla gorilla gorilla, male, SAMN04313459)	86,459,932	36%	34%	206,629
SAMN07315581	29898898	Gorilla_Sakura_DPFC (Gorilla gorilla, female, SAMN07315581)	24,342,192	50%	12%	147,779
SAMN07315627	29898898	Gorilla_Sakura_CB (Gorilla gorilla, female, SAMN07315627)	29,001,376	55%	9%	148,754
SAMN07315628	29898898	Gorilla_Sakura_HIP (Gorilla gorilla, female, SAMN07315628)	31,241,326	47%	15%	158,668
SAMN07315629	29898898	Gorilla_Sakura_STR (Gorilla gorilla, female, SAMN07315629)	31,279,100	45%	13%	152,851
SAMN07315630	29898898	Gorilla_Sakura_ACC (Gorilla gorilla, female, SAMN07315630)	28,693,714	47%	14%	156,685
SAMN07315631	29898898	Gorilla_Sakura_V1C (Gorilla gorilla, female, SAMN07315631)	10,135,008	53%	13%	118,036
SAMN07315632	29898898	Gorilla_Sakura_PMC (Gorilla gorilla, female, SAMN07315632)	23,929,306	53%	15%	157,828
SAMN07315633	29898898	Gorilla_Sakura_VPFC (Gorilla gorilla, female, SAMN07315633)	24,646,056	48%	12%	143,522
SAMN07315685	29898898	Gorilla_GON_CB (Gorilla gorilla, male, SAMN07315685)	72,691,756	58%	10%	184,317
SAMN07315686	29898898	Gorilla_GON_HIP (Gorilla gorilla, male, SAMN07315686)	64,479,034	46%	14%	186,476
SAMN07315687	29898898	Gorilla_GON_STR (Gorilla gorilla, male, SAMN07315687)	73,008,402	43%	12%	184,063
SAMN07315688	29898898	Gorilla_GON_ACC (Gorilla gorilla, male, SAMN07315688)	52,696,182	46%	13%	178,128
SAMN07315689	29898898	Gorilla_GON_V1C (Gorilla gorilla, male, SAMN07315689)	56,752,528	52%	12%	183,754
SAMN07315690	29898898	Gorilla_GON_PMC (Gorilla gorilla, male, SAMN07315690)	82,722,878	47%	14%	190,784
SAMN07315691	29898898	Gorilla_GON_VPFC (Gorilla gorilla, male, SAMN07315691)	64,176,600	64%	15%	198,562
SAMN07315692	29898898	Gorilla_GON_DPFC (Gorilla gorilla, male, SAMN07315692)	43,292,504	66%	16%	185,007
SAMN07611971	30228200	iPS cell line (Gorilla gorilla, male, SAMN07611971)	47,527,191	141%	36%	201,639
SAMN07814346	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814346)	4,653,184	54%	17%	67,792
SAMN07814348	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814348)	3,312,457	69%	16%	64,686
SAMN07814515	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814515)	23,211,819	89%	20%	113,870
SAMN07814832	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814832)	17,125,488	93%	16%	81,008
SAMN07814833	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814833)	22,444,863	89%	17%	105,291
SAMN07814834	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814834)	18,467,281	91%	18%	96,679
SAMN07814835	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814835)	21,111,740	91%	19%	108,689
SAMN07814836	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814836)	16,321,872	93%	15%	80,941
SAMN07814837	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814837)	15,287,239	93%	16%	84,972
SAMN07814838	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814838)	14,919,662	93%	16%	73,453
SAMN07814839	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814839)	18,539,752	93%	15%	82,929
SAMN07814840	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814840)	16,787,258	92%	16%	85,705
SAMN07814841	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814841)	18,094,917	91%	15%	81,869
SAMN07814842	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814842)	22,173,808	93%	17%	99,004
SAMN07814843	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814843)	21,822,182	91%	16%	97,033
SAMN07814844	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814844)	23,481,607	90%	18%	104,571
SAMN07814845	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814845)	25,337,155	93%	12%	73,728
SAMN07814846	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814846)	19,992,905	92%	11%	57,448
SAMN07814847	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814847)	19,677,658	90%	16%	91,924
SAMN07814939	31649152	primary dermal fibroblasts, (Gorilla gorilla, SAMN07814939)	18,153,912	91%	15%	79,581
SAMN08722040	32763147	iPSC-CM (Gorilla gorilla, male, SAMN08722040)	157,896,170	86%	11%	196,410
SAMN08722041	32763147	iPSC (Gorilla gorilla, male, SAMN08722041)	305,643,932	71%	12%	213,624
SAMN11096711	NA	primary visual cortex (V1) (Gorilla gorilla, 46, female, SAMN11096711)	36,468,763	80%	10%	176,407
SAMN25211330	NA	Lymphoblastoid cell line, Lymphoblastoid cell line, (Gorilla gorilla, SAMN25211330)	78,921,774	83%	20%	164,748
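
One row in the table above (SAMN07611971) reports 141% aligned reads, a value outside the 0 to 100 range one might expect for a percentage of reads. The report does not explain it. The sketch below, using three rows copied from the table and a hypothetical `out_of_range` helper, shows one way to flag such rows for manual review rather than to correct them.

```python
# (sample id, number of reads, percent aligned reads), copied from the table.
samples = [
    ("SAMN00632199", 30_588_563, 30),
    ("SAMN07611971", 47_527_191, 141),
    ("SAMN08722041", 305_643_932, 71),
]


def out_of_range(rows, low=0, high=100):
    """Return the sample ids whose percentage falls outside [low, high]."""
    return [sample_id for sample_id, _reads, pct in rows
            if not low <= pct <= high]


flagged = out_of_range(samples)
print(flagged)
```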

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR128390	DRX121130	DRP005799	SAMD00115867	30,899,578	85%	22%
DRR128391	DRX121131	DRP005799	SAMD00115868	30,749,568	58%	27%
DRR128392	DRX121132	DRP005799	SAMD00115869	32,405,536	86%	31%
ERR7132965	ERX6700403	ERP132575	SAMEA10418790	24,072,170	38%	37%
ERR7132966	ERX6700404	ERP132575	SAMEA10418791	60,307,305	45%	42%
SRR306800	SRX081944	SRP007412	SAMN00632194	35,257,547	57%	15%
SRR306801	SRX081945	SRP007412	SAMN00632195	32,509,628	65%	21%
SRR306802	SRX081946	SRP007412	SAMN00632196	28,305,051	59%	13%
SRR306803	SRX081947	SRP007412	SAMN00632197	20,661,901	54%	9%
SRR306804	SRX081948	SRP007412	SAMN00632198	28,286,878	38%	17%
SRR306805	SRX081949	SRP007412	SAMN00632199	30,588,563	30%	10%
SRR306806	SRX081950	SRP007412	SAMN00632200	19,804,877	61%	16%
SRR306807	SRX081951	SRP007412	SAMN00632201	29,684,063	59%	13%
SRR306808	SRX081952	SRP007412	SAMN00632202	32,830,718	65%	16%
SRR306809	SRX081953	SRP007412	SAMN00632203	34,982,548	67%	15%
SRR306810	SRX081954	SRP007412	SAMN00632204	21,124,809	62%	18%
SRR976178	SRX348498	SRP029888	SAMN02353877	5,751,064	65%	16%
SRR976177	SRX348497	SRP029888	SAMN02353878	3,107,151	59%	15%
SRR976176	SRX348496	SRP029888	SAMN02353879	4,631,720	59%	17%
SRR976180	SRX348500	SRP029888	SAMN02353883	6,058,538	70%	17%
SRR976181	SRX348501	SRP029888	SAMN02353886	4,258,056	69%	18%
SRR976182	SRX348502	SRP029888	SAMN02353890	4,605,201	62%	18%
SRR976179	SRX348499	SRP029888	SAMN02353893	4,226,045	72%	18%
SRR3053573	SRX1486509	SRP067453	SAMN04313459	24,387,228	36%	34%
SRR10393358	SRX7093734	SRP067453	SAMN04313459	62,072,704	36%	34%
SRR5804502	SRX2983714	SRP111096	SAMN07315581	24,342,192	50%	12%
SRR5804509	SRX2983721	SRP111096	SAMN07315627	29,001,376	55%	9%
SRR5804508	SRX2983720	SRP111096	SAMN07315628	31,241,326	47%	15%
SRR5804507	SRX2983719	SRP111096	SAMN07315629	31,279,100	45%	13%
SRR5804506	SRX2983718	SRP111096	SAMN07315630	28,693,714	47%	14%
SRR5804505	SRX2983717	SRP111096	SAMN07315631	10,135,008	53%	13%
SRR5804504	SRX2983716	SRP111096	SAMN07315632	23,929,306	53%	15%
SRR5804503	SRX2983715	SRP111096	SAMN07315633	24,646,056	48%	12%
SRR5804501	SRX2983713	SRP111096	SAMN07315685	72,691,756	58%	10%
SRR5804500	SRX2983712	SRP111096	SAMN07315686	64,479,034	46%	14%
SRR5804499	SRX2983711	SRP111096	SAMN07315687	73,008,402	43%	12%
SRR5804498	SRX2983710	SRP111096	SAMN07315688	52,696,182	46%	13%
SRR5804497	SRX2983709	SRP111096	SAMN07315689	56,752,528	52%	12%
SRR5804496	SRX2983708	SRP111096	SAMN07315690	82,722,878	47%	14%
SRR5804495	SRX2983707	SRP111096	SAMN07315691	64,176,600	64%	15%
SRR5804494	SRX2983706	SRP111096	SAMN07315692	43,292,504	66%	16%
SRR6190191	SRX3300218	SRP120495	SAMN07814346	4,653,184	54%	17%
SRR6190189	SRX3300216	SRP120495	SAMN07814348	3,312,457	69%	16%
SRR6190044	SRX3300071	SRP120495	SAMN07814515	23,211,819	89%	20%
SRR6190042	SRX3300069	SRP120495	SAMN07814832	17,125,488	93%	16%
SRR6190041	SRX3300068	SRP120495	SAMN07814833	22,444,863	89%	17%
SRR6190040	SRX3300067	SRP120495	SAMN07814834	18,467,281	91%	18%
SRR6190039	SRX3300066	SRP120495	SAMN07814835	21,111,740	91%	19%
SRR6190038	SRX3300065	SRP120495	SAMN07814836	16,321,872	93%	15%
SRR6190037	SRX3300064	SRP120495	SAMN07814837	15,287,239	93%	16%
SRR6190036	SRX3300063	SRP120495	SAMN07814838	14,919,662	93%	16%
SRR6190035	SRX3300062	SRP120495	SAMN07814839	18,539,752	93%	15%
SRR6190034	SRX3300061	SRP120495	SAMN07814840	16,787,258	92%	16%
SRR6190033	SRX3300060	SRP120495	SAMN07814841	18,094,917	91%	15%
SRR6190032	SRX3300059	SRP120495	SAMN07814842	22,173,808	93%	17%
SRR6190031	SRX3300058	SRP120495	SAMN07814843	21,822,182	91%	16%
SRR6190030	SRX3300057	SRP120495	SAMN07814844	23,481,607	90%	18%
SRR6190029	SRX3300056	SRP120495	SAMN07814845	25,337,155	93%	12%
SRR6190028	SRX3300055	SRP120495	SAMN07814846	19,992,905	92%	11%
SRR6190027	SRX3300054	SRP120495	SAMN07814847	19,677,658	90%	16%
SRR6190043	SRX3300070	SRP120495	SAMN07814939	18,153,912	91%	15%
SRR6847098	SRX3802486	SRP135857	SAMN08722040	157,896,170	86%	11%
SRR6847097	SRX3802485	SRP135857	SAMN08722041	305,643,932	71%	12%
SRR7410594	SRX4282133	SRP150877	SAMN07611971	47,527,191	141%	36%
SRR8711732	SRX5506063	SRP188145	SAMN11096711	36,468,763	80%	10%
SRR17715154	SRX13878161	SRP355765	SAMN25211330	78,921,774	83%	20%
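
The per-sample table can be related to the per-run table by summing read counts over the runs of each sample. A short Python sketch with three runs copied from the table above, where one sample (SAMN04313459) has two runs:

```python
from collections import defaultdict

# (run, sample, number of reads), copied from the per-run table.
runs = [
    ("SRR3053573", "SAMN04313459", 24_387_228),
    ("SRR10393358", "SAMN04313459", 62_072_704),
    ("SRR7410594", "SAMN07611971", 47_527_191),
]

reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, reads in reads_per_sample.items():
    print(sample, f"{reads:,}")
```

The total for SAMN04313459 matches the 86,459,932 reads listed for that sample in the per-sample table.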

SRA Long Read Alignment Statistics

The alignments of the following long RNA-Seq reads (PacBio, Oxford Nanopore, 454, or other long-read sequencing technologies) from the Sequence Read Archive with minimap2 were used for gene prediction:

Run	Sample	Number of reads	Number (%) of sequences aligned by Minimap2	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
All	NA	26972132	26271256 (97.40%)	16194255 (60.04%)	92.48	91.97
SRR17660890	SAMN25013761	10876333	10530729 (96.82%)	5766070 (53.01%)	90.15	89.27
SRR17660906	SAMN25013760	11685853	11331798 (96.97%)	6248442 (53.47%)	90	89.43
SRR22452370	SAMN31928435	4409946	4408729 (99.97%)	4179743 (94.77%)	99.4	99.5
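
The "All" row can be reproduced from the three runs. The sketch below sums each column and recomputes the two percentages, assuming both are taken relative to the total number of reads:

```python
# run: (reads, aligned by Minimap2, passed to Gnomon), copied from the table.
runs = {
    "SRR17660890": (10_876_333, 10_530_729, 5_766_070),
    "SRR17660906": (11_685_853, 11_331_798, 6_248_442),
    "SRR22452370": (4_409_946, 4_408_729, 4_179_743),
}

# Column-wise sums across the three runs.
total_reads, total_aligned, total_passed = (
    sum(column) for column in zip(*runs.values())
)

# Assumed denominator for both percentages: total number of reads.
pct_aligned = round(100 * total_aligned / total_reads, 2)
pct_passed = round(100 * total_passed / total_reads, 2)
print(total_reads, total_aligned, total_passed, pct_aligned, pct_passed)
```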

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Primates GenBank	25,396	24,847 (97.84%)	24,847 (97.84%)	81.27%	90.59%
Primates known RefSeq (NP_)	12,512	12,205 (97.55%)	12,205 (97.55%)	87.15%	93.84%
Same-species GenBank	5	5 (100.00%)	5 (100.00%)	80.90%	89.62%
Same-species known RefSeq (NP_)	95	92 (96.84%)	92 (96.84%)	87.02%	93.80%
Homo sapiens GenBank	146,820	138,830 (94.56%)	138,830 (94.56%)	82.51%	86.05%
Homo sapiens known RefSeq (NP_)	66,908	64,584 (96.53%)	64,584 (96.53%)	89.30%	92.69%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

	First Pass	Total
NHGRI_mGorGor1-v1.1-0.2.freeze_pri (Current) Coverage	77.44%	79.64%
Kamilah_GGO_v0 (Previous) Coverage	92.97%	93.54%
Percent Identity	98.11%	98.05%

Comparison of the current and previous annotations

The annotations produced for this release were compared to the annotations in the previous release for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	NHGRI_mGorGor1-v1.1-0.2.freeze_pri (Current) to Kamilah_GGO_v0 (Previous)
Identical	15%
Minor changes	41%
Major changes	17%
New	26%
Deprecated	8%
Other	1%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013, 29(1):15-21
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
