NCBI Marmota marmota marmota Annotation Release 101

The RefSeq genome records for Marmota marmota marmota were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: The similarity of the current and previous assembly
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Marmota marmota marmota Annotation Release 101

Annotation release ID: 101
Date of Entrez queries for transcripts and proteins: Jun 2 2022
Date of submission of annotation to the public databases: Jul 1 2022
Software version: 9.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
marMar	GCF_001458135.2	MRC NATIONAL INSTITUTE FOR MEDICAL RESEARCH	12-18-2019	Reference	unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	marMar
Genes and pseudogenes	34,590
protein-coding	21,014
non-coding	8,982
Transcribed pseudogenes	93
Non-transcribed pseudogenes	4,267
genes with variants	10,872
Immunoglobulin/T-cell receptor gene segments	208
other	26
mRNAs	56,596
fully-supported	53,505
with > 5% ab initio	1,068
partial	609
with filled gap(s)	0
known RefSeq (NM_)	0
model RefSeq (XM_)	56,596
non-coding RNAs	14,097
fully-supported	11,232
with > 5% ab initio	0
partial	2
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	12,898
pseudo transcripts	93
fully-supported	73
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	93
CDSs	56,804
fully-supported	53,505
with > 5% ab initio	1,390
partial	615
with major correction(s)	1,227
known RefSeq (NP_)	0
model RefSeq (XP_)	56,596
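
As a quick consistency check on the table above, the gene-level sub-categories can be summed and compared with the "Genes and pseudogenes" total. The Python sketch below uses only counts copied from the table; the dictionary keys are labels chosen for illustration, and the "genes with variants" row is left out on the assumption that it is a property of genes rather than a separate category.

```python
# Gene-level counts copied from the feature counts table for marMar.
# "genes with variants" is deliberately excluded: it is assumed to describe
# a subset of genes, not a separate gene category.
gene_counts = {
    "protein-coding": 21014,
    "non-coding": 8982,
    "transcribed pseudogenes": 93,
    "non-transcribed pseudogenes": 4267,
    "immunoglobulin/T-cell receptor gene segments": 208,
    "other": 26,
}
reported_total = 34590  # "Genes and pseudogenes" row

computed_total = sum(gene_counts.values())
print(computed_total, computed_total == reported_total)
```

With these figures the categories add up to the reported total.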

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	30,022	36,423	10,480	55	2,842,490
All transcripts	70,693	3,479	2,668	55	104,332
mRNA	56,596	3,772	2,947	114	104,332
misc_RNA	2,585	3,882	2,696	157	59,291
tRNA	1,199	73	73	63	87
lncRNA	8,649	2,563	1,573	102	92,092
snoRNA	661	113	110	55	326
snRNA	973	114	107	60	198
rRNA	4	128	119	119	153
Single-exon transcripts	2,654	1,503	969	114	14,427
coding transcripts (NM_/XM_ )	2,654	1,503	969	114	14,427
CDSs	56,596	2,161	1,491	96	103,098
Exons	262,559	361	142	1	56,625
in coding transcripts (NM_/XM_ )	236,989	321	139	1	29,854
in non-coding transcripts (NR_/XR_ )	40,479	526	154	2	56,625
Introns	230,566	6,124	1,490	30	1,075,449
in coding transcripts (NM_/XM_ )	213,100	6,050	1,462	30	1,075,449
in non-coding transcripts (NR_/XR_ )	31,897	5,715	1,569	30	528,902

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.41	1	1	50
Number of exons per transcript	11.97	9	1	314

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein of each gene and the glires_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit, and presented in BUSCO notation.
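
The "one longest protein per gene" selection described above can be sketched in Python as follows. This is only an illustration of the idea, not the pipeline's actual code: the function name, the record layout (gene ID, protein ID, sequence) and the example identifiers are all hypothetical, and ties are resolved by keeping the first protein seen.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (gene_id, protein_id, protein_sequence)


def longest_protein_per_gene(records: Iterable[Record]) -> Dict[str, Tuple[str, str]]:
    """Return, for each gene, the (protein_id, sequence) of its longest protein.

    If two proteins of a gene have the same length, the first one
    encountered is kept (an arbitrary choice made for this sketch).
    """
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, protein_id, seq in records:
        current = best.get(gene_id)
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (protein_id, seq)
    return best


# Hypothetical records: two isoforms for geneA, one protein for geneB.
example = [
    ("geneA", "protA.1", "MKT"),
    ("geneA", "protA.2", "MKTLLV"),
    ("geneB", "protB.1", "MA"),
]
selected = longest_protein_per_gene(example)
print(selected["geneA"][0], selected["geneB"][0])
```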

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 21,014 coding genes, 20,659 genes had a protein with an alignment covering 50% or more of the query and 16,887 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
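
The two coverage definitions can be written out as a small Python helper. This is a minimal sketch that assumes a single contiguous aligned range with 1-based inclusive coordinates; the function name and the example coordinates are hypothetical, and a real BLASTP hit made of several segments would first need its aligned ranges merged.

```python
def coverage_percent(aligned_start: int, aligned_end: int, seq_length: int) -> float:
    """Percentage of a sequence included in an alignment.

    Coordinates are 1-based and inclusive; a single contiguous aligned
    range is assumed (a simplification made for this sketch).
    """
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / seq_length


# Hypothetical alignment: residues 11-390 of a 400-residue annotated protein
# (query) aligned to residues 1-380 of a 500-residue Swiss-Prot protein (target).
query_coverage = coverage_percent(11, 390, 400)
target_coverage = coverage_percent(1, 380, 500)
print(query_coverage, target_coverage)
```

In this made-up case the query would count toward the "95% or more" threshold on query coverage, while its target coverage is much lower.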

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
marMar	GCF_001458135.2	29.81%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign, minimap2, or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Homo sapiens known RefSeq (NM_/NR_)	84,043	70,512 (83.90%)	15,507 (18.45%)	89.63%	80.89%
Homo sapiens Genbank	350,018	168,655 (48.18%)	57,604 (16.46%)	89.73%	87.85%
Mus musculus known RefSeq (NM_/NR_)	49,016	37,094 (75.68%)	6,747 (13.76%)	88.54%	71.36%
Mus musculus Genbank	330,984	122,603 (37.04%)	35,715 (10.79%)	88.96%	83.45%
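
The percentages in the table above appear to be expressed relative to the number of sequences retrieved from Entrez. The Python sketch below recomputes them for the first row (Homo sapiens known RefSeq); the helper name is hypothetical and the counts are copied from the table.

```python
def percent_of(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals."""
    return f"{100.0 * part / whole:.2f}%"


# Homo sapiens known RefSeq (NM_/NR_) row of the transcript alignments table.
retrieved = 84043
aligned_by_splign = 70512
passed_to_gnomon = 15507

aligned_pct = percent_of(aligned_by_splign, retrieved)
passed_pct = percent_of(passed_to_gnomon, retrieved)
print(aligned_pct, passed_pct)
```

Both recomputed values agree with the 83.90% and 18.45% reported in the table.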

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	9,007,967,511	76%	27%	340,394
SAMN00808732	NA	peripheral blood mononuclear cells, negative (Marmota monax, SAMN00808732)	927,972	62%	5%	30,565
SAMN00808734	NA	peripheral blood mononuclear cells, chronic (Marmota monax, SAMN00808734)	947,013	65%	7%	43,258
SAMN00808736	NA	peripheral blood mononuclear cells, negative (Marmota monax, SAMN00808736)	1,109,424	63%	4%	31,010
SAMN00808740	NA	peripheral blood mononuclear cells, chronic (Marmota monax, SAMN00808740)	1,040,164	65%	7%	46,177
SAMN00808742	NA	liver, chronic (Marmota monax, SAMN00808742)	887,126	85%	61%	86,854
SAMN00808745	NA	liver, resolved (Marmota monax, SAMN00808745)	1,176,763	86%	60%	86,865
SAMN00808746	NA	liver, negative (Marmota monax, SAMN00808746)	1,026,233	85%	58%	77,708
SAMN00808749	NA	liver, chronic (Marmota monax, SAMN00808749)	489,337	84%	62%	68,828
SAMN00808750	NA	liver, negative (Marmota monax, SAMN00808750)	656,458	86%	60%	70,928
SAMN07661221	30684495	liver and blood (Marmota himalayana, 2, male, SAMN07661221)	1,202,282,254	86%	38%	260,015
SAMN12828065	31645421	Pancreas (Marmota monax, SAMN12828065)	127,666,526	93%	52%	176,011
SAMN12828067	31645421	Spleen (Marmota monax, SAMN12828067)	142,020,270	78%	25%	201,809
SAMN12828068	31645421	Kidney (Marmota monax, SAMN12828068)	179,703,418	89%	24%	204,293
SAMN12828070	31645421	Liver (Marmota monax, SAMN12828070)	144,701,830	89%	39%	183,939
SAMN12828071	31645421	Heart (Marmota monax, SAMN12828071)	138,856,972	87%	27%	193,474
SAMN12828072	31645421	Lung (Marmota monax, SAMN12828072)	206,453,434	89%	25%	226,347
SAMN12828074	31645421	Spleen (Marmota monax, SAMN12828074)	225,337,732	88%	28%	217,362
SAMN12828076	31645421	Kidney (Marmota monax, SAMN12828076)	185,982,482	92%	26%	214,502
SAMN12828077	31645421	Liver (Marmota monax, SAMN12828077)	211,888,262	91%	39%	192,577
SAMN12828080	31645421	Spleen (Marmota monax, SAMN12828080)	122,753,104	80%	25%	187,329
SAMN12828082	31645421	Kidney (Marmota monax, SAMN12828082)	125,790,226	88%	25%	202,377
SAMN12828083	31645421	Liver (Marmota monax, SAMN12828083)	145,226,684	89%	36%	178,399
SAMN12828085	31645421	Spleen (Marmota monax, SAMN12828085)	126,030,468	85%	27%	198,694
SAMN12828086	31645421	Kidney (Marmota monax, SAMN12828086)	196,174,530	90%	26%	220,165
SAMN12828087	31645421	Liver (Marmota monax, SAMN12828087)	147,677,126	82%	34%	179,852
SAMN12828088	31645421	Spleen (Marmota monax, SAMN12828088)	125,489,746	89%	27%	201,209
SAMN12828089	31645421	Kidney (Marmota monax, SAMN12828089)	127,534,952	86%	26%	212,325
SAMN12828090	31645421	Liver (Marmota monax, SAMN12828090)	202,080,726	91%	36%	184,258
SAMN12828091	31645421	Liver (Marmota monax, SAMN12828091)	134,795,402	84%	39%	174,245
SAMN12828092	31645421	Thymus (Marmota monax, SAMN12828092)	186,450,732	92%	18%	180,263
SAMN14412061	NA	Liver (Marmota monax, SAMN14412061)	73,378,161	85%	32%	144,448
SAMN14412076	NA	Liver (Marmota monax, SAMN14412076)	65,406,203	87%	35%	152,242
SAMN14412077	NA	Liver (Marmota monax, SAMN14412077)	53,324,411	86%	39%	144,444
SAMN14412078	NA	Liver (Marmota monax, SAMN14412078)	55,255,506	85%	36%	161,648
SAMN23970909	35580607	Heart (Marmota monax, SAMN23970909)	168,578,894	68%	22%	187,340
SAMN23970910	35580607	Liver (Marmota monax, SAMN23970910)	129,646,198	64%	31%	175,373
SAMN23970911	35580607	Heart (Marmota monax, SAMN23970911)	152,112,024	66%	17%	188,688
SAMN23970912	35580607	Liver (Marmota monax, SAMN23970912)	145,401,716	63%	26%	174,198
SAMN23970913	35580607	Liver (Marmota monax, SAMN23970913)	126,854,170	70%	37%	178,573
SAMN23970914	35580607	Brain (Marmota monax, SAMN23970914)	148,317,522	70%	26%	209,682
SAMN23970915	35580607	Heart (Marmota monax, SAMN23970915)	173,528,840	67%	23%	194,084
SAMN23970916	35580607	Liver (Marmota monax, SAMN23970916)	153,646,320	68%	31%	181,088
SAMN23970917	35580607	Heart (Marmota monax, SAMN23970917)	132,762,220	71%	27%	196,347
SAMN23970918	35580607	Liver (Marmota monax, SAMN23970918)	166,687,832	56%	6%	150,541
SAMN23970985	35580607	Heart (Marmota monax, SAMN23970985)	156,370,060	63%	16%	181,295
SAMN23970986	35580607	Kidney (Marmota monax, SAMN23970986)	156,374,956	68%	23%	199,976
SAMN23970987	35580607	Lung (Marmota monax, SAMN23970987)	157,125,948	62%	14%	203,306
SAMN23971075	35580607	Skin (Marmota monax, SAMN23971075)	163,384,328	62%	17%	188,904
SAMN23971077	35580607	Brain (Marmota monax, SAMN23971077)	171,998,916	68%	20%	209,740
SAMN23971079	35580607	Kidney (Marmota monax, SAMN23971079)	125,821,948	64%	15%	179,016
SAMN23971081	35580607	Lung (Marmota monax, SAMN23971081)	144,172,796	58%	15%	198,360
SAMN23971083	35580607	Skin (Marmota monax, SAMN23971083)	135,849,960	55%	12%	162,073
SAMN23971084	35580607	Kidney (Marmota monax, SAMN23971084)	138,705,988	66%	19%	202,200
SAMN23971086	35580607	Lung (Marmota monax, SAMN23971086)	150,702,442	66%	17%	208,118
SAMN23971088	35580607	Skin (Marmota monax, SAMN23971088)	130,877,132	58%	14%	165,372
SAMN23971090	35580607	Brain (Marmota monax, SAMN23971090)	165,617,478	67%	20%	209,333
SAMN23971093	35580607	Kidney (Marmota monax, SAMN23971093)	140,909,398	56%	7%	154,372
SAMN23971095	35580607	Lung (Marmota monax, SAMN23971095)	129,300,118	53%	6%	149,211
SAMN23971097	35580607	Skin (Marmota monax, SAMN23971097)	144,854,934	53%	8%	162,056
SAMN23971099	35580607	Brain (Marmota monax, SAMN23971099)	150,669,358	71%	25%	209,425
SAMN23971101	35580607	Kidney (Marmota monax, SAMN23971101)	181,901,594	68%	26%	205,035
SAMN23971103	35580607	Lung (Marmota monax, SAMN23971103)	172,158,620	70%	26%	219,673
SAMN23971105	35580607	Skin (Marmota monax, SAMN23971105)	133,114,154	67%	26%	194,178

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR437934	SRX127233	SRP011132	SAMN00808732	927,972	62%	5%
SRR437935	SRX129177	SRP011132	SAMN00808734	947,013	65%	7%
SRR437938	SRX129178	SRP011132	SAMN00808736	1,109,424	63%	4%
SRR437939	SRX129179	SRP011132	SAMN00808740	1,040,164	65%	7%
SRR437940	SRX129180	SRP011132	SAMN00808742	887,126	85%	61%
SRR437941	SRX129181	SRP011132	SAMN00808745	1,176,763	86%	60%
SRR437942	SRX129182	SRP011132	SAMN00808746	1,026,233	85%	58%
SRR437943	SRX129183	SRP011132	SAMN00808749	489,337	84%	62%
SRR437946	SRX129184	SRP011132	SAMN00808750	656,458	86%	60%
SRR6349910	SRX3446650	SRP122881	SAMN07661221	36,344,898	88%	37%
SRR6349909	SRX3446651	SRP122881	SAMN07661221	35,564,770	86%	39%
SRR6349908	SRX3446652	SRP122881	SAMN07661221	32,846,612	86%	37%
SRR6349907	SRX3446653	SRP122881	SAMN07661221	35,245,342	87%	40%
SRR6349906	SRX3446654	SRP122881	SAMN07661221	56,600,472	86%	35%
SRR6349905	SRX3446655	SRP122881	SAMN07661221	47,685,606	85%	36%
SRR6349904	SRX3446656	SRP122881	SAMN07661221	53,358,650	85%	36%
SRR6349903	SRX3446657	SRP122881	SAMN07661221	52,203,944	87%	46%
SRR6349902	SRX3446658	SRP122881	SAMN07661221	40,958,900	85%	45%
SRR6349901	SRX3446659	SRP122881	SAMN07661221	42,296,996	88%	46%
SRR6349900	SRX3446660	SRP122881	SAMN07661221	49,695,914	87%	46%
SRR6349899	SRX3446661	SRP122881	SAMN07661221	38,021,130	88%	46%
SRR6349898	SRX3446662	SRP122881	SAMN07661221	50,120,498	88%	44%
SRR6349897	SRX3446663	SRP122881	SAMN07661221	44,910,092	87%	43%
SRR6349896	SRX3446664	SRP122881	SAMN07661221	48,227,620	87%	46%
SRR6349895	SRX3446665	SRP122881	SAMN07661221	48,714,244	88%	46%
SRR6349894	SRX3446666	SRP122881	SAMN07661221	44,420,216	86%	44%
SRR8196985	SRX5016354	SRP122881	SAMN07661221	47,073,852	85%	29%
SRR8196984	SRX5016355	SRP122881	SAMN07661221	56,615,178	83%	33%
SRR8196983	SRX5016356	SRP122881	SAMN07661221	44,987,554	81%	31%
SRR8196982	SRX5016357	SRP122881	SAMN07661221	36,878,314	79%	32%
SRR8196981	SRX5016358	SRP122881	SAMN07661221	41,232,930	85%	31%
SRR8196980	SRX5016359	SRP122881	SAMN07661221	53,710,924	86%	30%
SRR8196979	SRX5016360	SRP122881	SAMN07661221	54,106,148	85%	28%
SRR8196978	SRX5016361	SRP122881	SAMN07661221	56,038,790	86%	30%
SRR8196977	SRX5016362	SRP122881	SAMN07661221	54,422,660	86%	29%
SRR10172930	SRX6894605	SRP223069	SAMN12828065	127,666,526	93%	52%
SRR10172929	SRX6894604	SRP223069	SAMN12828067	142,020,270	78%	25%
SRR10172928	SRX6894603	SRP223069	SAMN12828068	179,703,418	89%	24%
SRR10172927	SRX6894602	SRP223069	SAMN12828070	144,701,830	89%	39%
SRR10172926	SRX6894601	SRP223069	SAMN12828071	138,856,972	87%	27%
SRR10172925	SRX6894600	SRP223069	SAMN12828072	206,453,434	89%	25%
SRR10172924	SRX6894599	SRP223069	SAMN12828074	225,337,732	88%	28%
SRR10172923	SRX6894598	SRP223069	SAMN12828076	185,982,482	92%	26%
SRR10172922	SRX6894597	SRP223069	SAMN12828077	211,888,262	91%	39%
SRR10172941	SRX6894616	SRP223069	SAMN12828080	122,753,104	80%	25%
SRR10172940	SRX6894615	SRP223069	SAMN12828082	125,790,226	88%	25%
SRR10172939	SRX6894614	SRP223069	SAMN12828083	145,226,684	89%	36%
SRR10172938	SRX6894613	SRP223069	SAMN12828085	126,030,468	85%	27%
SRR10172937	SRX6894612	SRP223069	SAMN12828086	196,174,530	90%	26%
SRR10172936	SRX6894611	SRP223069	SAMN12828087	147,677,126	82%	34%
SRR10172935	SRX6894610	SRP223069	SAMN12828088	125,489,746	89%	27%
SRR10172934	SRX6894609	SRP223069	SAMN12828089	127,534,952	86%	26%
SRR10172933	SRX6894608	SRP223069	SAMN12828090	202,080,726	91%	36%
SRR10172932	SRX6894607	SRP223069	SAMN12828091	134,795,402	84%	39%
SRR10172931	SRX6894606	SRP223069	SAMN12828092	186,450,732	92%	18%
SRR11357518	SRX7959045	SRP253468	SAMN14412061	73,378,161	85%	32%
SRR11357521	SRX7959048	SRP253468	SAMN14412076	65,406,203	87%	35%
SRR11357520	SRX7959047	SRP253468	SAMN14412077	53,324,411	86%	39%
SRR11357519	SRX7959046	SRP253468	SAMN14412078	55,255,506	85%	36%
SRR17216521	SRX13396294	SRP350516	SAMN23970909	168,578,894	68%	22%
SRR17216522	SRX13396296	SRP350516	SAMN23970910	129,646,198	64%	31%
SRR17216523	SRX13396297	SRP350516	SAMN23970911	152,112,024	66%	17%
SRR17216524	SRX13396298	SRP350516	SAMN23970912	145,401,716	63%	26%
SRR17216530	SRX13396062	SRP350516	SAMN23970913	126,854,170	70%	37%
SRR17216531	SRX13396063	SRP350516	SAMN23970914	148,317,522	70%	26%
SRR17216525	SRX13396057	SRP350516	SAMN23970915	173,528,840	67%	23%
SRR17216526	SRX13396058	SRP350516	SAMN23970916	153,646,320	68%	31%
SRR17216527	SRX13396059	SRP350516	SAMN23970917	132,762,220	71%	27%
SRR17216528	SRX13396060	SRP350516	SAMN23970918	166,687,832	56%	6%
SRR17216529	SRX13396061	SRP350516	SAMN23970985	156,370,060	63%	16%
SRR17216532	SRX13396064	SRP350516	SAMN23970986	156,374,956	68%	23%
SRR17216533	SRX13396065	SRP350516	SAMN23970987	157,125,948	62%	14%
SRR17216534	SRX13396066	SRP350516	SAMN23971075	163,384,328	62%	17%
SRR17216535	SRX13396067	SRP350516	SAMN23971077	171,998,916	68%	20%
SRR17216536	SRX13396068	SRP350516	SAMN23971079	125,821,948	64%	15%
SRR17216537	SRX13396069	SRP350516	SAMN23971081	144,172,796	58%	15%
SRR17216538	SRX13396070	SRP350516	SAMN23971083	135,849,960	55%	12%
SRR17216539	SRX13396071	SRP350516	SAMN23971084	138,705,988	66%	19%
SRR17216540	SRX13396072	SRP350516	SAMN23971086	150,702,442	66%	17%
SRR17216541	SRX13396332	SRP350516	SAMN23971088	130,877,132	58%	14%
SRR17216542	SRX13396333	SRP350516	SAMN23971090	165,617,478	67%	20%
SRR17216543	SRX13396334	SRP350516	SAMN23971093	140,909,398	56%	7%
SRR17216544	SRX13396335	SRP350516	SAMN23971095	129,300,118	53%	6%
SRR17216545	SRX13396336	SRP350516	SAMN23971097	144,854,934	53%	8%
SRR17216546	SRX13396337	SRP350516	SAMN23971099	150,669,358	71%	25%
SRR17216547	SRX13396338	SRP350516	SAMN23971101	181,901,594	68%	26%
SRR17216548	SRX13396339	SRP350516	SAMN23971103	172,158,620	70%	26%
SRR17216549	SRX13396340	SRP350516	SAMN23971105	133,114,154	67%	26%

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Mus musculus known RefSeq (NP_)	40,703	38,845 (95.44%)	38,845 (95.44%)	75.72%	84.94%
Rattus norvegicus known RefSeq (NP_)	19,648	19,050 (96.96%)	19,050 (96.96%)	74.17%	87.00%
Homo sapiens known RefSeq (NP_)	64,840	61,956 (95.55%)	61,956 (95.55%)	78.44%	84.44%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

	First Pass	Total
marMar (Current) coverage	100.00%	100.00%
marMar2.1 (Previous) coverage	99.92%	99.92%
Percent identity	100.00%	100.00%

Comparison of the current and previous annotations

The annotation produced for this release (101) was compared to the annotation in the previous release (100) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.
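
The report does not give the scoring formulas, so the Python sketch below is only one possible way to quantify "overlap in exon sequence" and "matches in exon boundaries" for a pair of features. The Jaccard-style ratios, the function names and the example coordinates are all assumptions made for illustration, not the pipeline's method.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def exon_bases(exons: List[Exon]) -> Set[int]:
    """Set of genomic positions covered by the given exons."""
    covered: Set[int] = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def exon_overlap_score(current: List[Exon], previous: List[Exon]) -> float:
    """Shared exonic bases divided by all exonic bases (illustrative ratio)."""
    a, b = exon_bases(current), exon_bases(previous)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def boundary_match_score(current: List[Exon], previous: List[Exon]) -> float:
    """Shared exon boundaries divided by all exon boundaries (illustrative ratio)."""
    a = {pos for exon in current for pos in exon}
    b = {pos for exon in previous for pos in exon}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pair: the second exon ends 50 bp later in the previous feature.
current_exons = [(100, 199), (300, 399)]
previous_exons = [(100, 199), (300, 449)]
overlap = exon_overlap_score(current_exons, previous_exons)
boundaries = boundary_match_score(current_exons, previous_exons)
print(overlap, boundaries)
```

Scores of this kind could then be thresholded to separate identical features from those with minor or major changes, which is the sort of categorization the paragraph above describes.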

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	marMar (Current) to marMar2.1 (Previous)
Identical	6%
Minor changes	46%
Major changes	18%
New	28%
Deprecated	9%
Other	2%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
Minimap2: Li H. Bioinformatics 2018 Sep 15;34(18):3094-3100
