NCBI Mus musculus Annotation Release 109

The RefSeq genome records for Mus musculus were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: How well the current and previous assemblies align to each other
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Mus musculus Annotation Release 109.

Annotation release ID: 109
Date of Entrez queries for transcripts and proteins: Sep 9 2020
Date of submission of annotation to the public databases: Sep 22 2020
Software version: 8.5

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
GRCm39	GCF_000001635.27	Genome Reference Consortium	06-24-2020	Reference	22 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	GRCm39
Genes and pseudogenes	50,561
protein-coding	22,186
non-coding	17,518
transcribed pseudogenes	474
non-transcribed pseudogenes	9,869
genes with variants	18,531
immunoglobulin/T-cell receptor gene segments	490
other	24
mRNAs	92,486
fully-supported	92,206
with > 5% ab initio	113
partial	54
with filled gap(s)	0
known RefSeq (NM_)	37,907
model RefSeq (XM_)	54,579
non-coding RNAs	38,629
fully-supported	35,041
with > 5% ab initio	0
partial	5
with filled gap(s)	0
known RefSeq (NR_)	7,443
model RefSeq (XR_)	29,867
pseudo transcripts	533
fully-supported	524
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	461
model RefSeq (XR_)	72
CDSs	92,499
fully-supported	92,206
with > 5% ab initio	155
partial	53
with major correction(s)	30
known RefSeq (NP_)	37,920
model RefSeq (XP_)	54,579
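
As a quick consistency check on the table above, the gene categories should add up to the "Genes and pseudogenes" total, and the known and model mRNAs should add up to the mRNA total. The following is a minimal Python sketch with counts copied from the table; the variable names are chosen for illustration only. The "genes with variants" row is left out because the remaining rows already account for the total.

```python
# Counts copied from the "Feature counts" table for GRCm39.
gene_categories = {
    "protein-coding": 22_186,
    "non-coding": 17_518,
    "transcribed pseudogenes": 474,
    "non-transcribed pseudogenes": 9_869,
    "immunoglobulin/T-cell receptor gene segments": 490,
    "other": 24,
}
genes_and_pseudogenes = 50_561

mrna_known = 37_907   # known RefSeq (NM_)
mrna_model = 54_579   # model RefSeq (XM_)
mrnas_total = 92_486

# Sum the sub-rows and compare them with the reported totals.
gene_sum = sum(gene_categories.values())
mrna_sum = mrna_known + mrna_model

print("gene categories match total:", gene_sum == genes_and_pseudogenes)
print("mRNA categories match total:", mrna_sum == mrnas_total)
```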

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	39,728	33,162	10,312	10	3,171,119
All transcripts	131,115	3,878	2,920	15	148,183
mRNA	92,486	4,315	3,392	111	141,950
misc_RNA	10,029	4,339	3,496	120	148,183
miRNA	2,112	22	22	15	27
tRNA	422	73	72	59	87
lncRNA	23,611	2,774	1,418	36	141,249
snoRNA	1,331	118	130	28	320
snRNA	999	115	107	59	332
antisense_RNA	9	2,648	2,487	865	5,023
guide_RNA	41	151	135	42	381
rRNA	64	157	121	119	1,582
telomerase_RNA	1	397	397	397	397
RNase_MRP_RNA	1	275	275	275	275
RNase_P_RNA	4	259	238	235	325
Y_RNA	2	106	111	101	111
scRNA	3	219	299	58	299
Single-exon transcripts	2,941	1,557	951	111	99,852
coding transcripts (NM_/XM_)	2,710	1,400	948	111	99,852
non-coding transcripts (NR_/XR_)	231	3,394	2,177	147	83,437
CDSs	92,499	2,092	1,491	78	106,392
Exons	352,138	488	152	1	139,958
in coding transcripts (NM_/XM_)	285,412	426	146	1	139,958
in non-coding transcripts (NR_/XR_)	120,842	530	153	2	112,051
Introns	291,708	6,493	1,565	26	1,067,346
in coding transcripts (NM_/XM_)	243,819	6,465	1,519	26	1,067,346
in non-coding transcripts (NR_/XR_)	100,211	5,515	1,539	30	601,637

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	3.31	1	1	88
Number of exons per transcript	11.67	8	1	349

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 22,173 coding genes, 21,594 genes had a protein with an alignment covering 50% or more of the query and 19,854 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
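
The two coverage definitions can be illustrated with a small Python sketch. It assumes a single contiguous alignment span given in 1-based, inclusive coordinates; how the pipeline combines multiple alignment segments is not described here, and the function name and example coordinates are hypothetical.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence covered by one alignment span (1-based, inclusive)."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: residues 11-200 of a 200-residue annotated protein (query)
# aligned to residues 1-190 of a 400-residue curated protein (target).
query_cov = coverage_percent(11, 200, 200)
target_cov = coverage_percent(1, 190, 400)

print(f"query coverage:  {query_cov:.1f}%")
print(f"target coverage: {target_cov:.1f}%")
```

In this made-up case the same alignment meets a 95% query coverage threshold while covering less than half of the target, which is why the two measures are reported separately.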

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with RepeatMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
GRCm39	GCF_000001635.27	44.29%	35.45%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	45,948	45,947 (99.99%)	45,818 (99.72%)	100.00%	100.00%
Same-species Genbank	327,638	284,864 (86.94%)	223,062 (68.08%)	99.68%	99.74%
Same-species EST	4,866,711	4,117,243 (84.60%)	3,146,706 (64.66%)	98.89%	99.44%
Same-species long SRA	76,166,637	63,946,643 (83.96%)	38,611,517 (50.69%)	92.91%	79.41%
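
The percentages in this table appear to be computed relative to the number of sequences retrieved from Entrez, for both the aligned and the passed-to-Gnomon columns. The Python sketch below reproduces the Same-species Genbank row under that assumption; the helper name is illustrative.

```python
def percent_of_retrieved(count: int, retrieved: int) -> str:
    """Format count as a percentage of the retrieved sequences, to two decimals."""
    return f"{100.0 * count / retrieved:.2f}%"


# Same-species Genbank row, copied from the table above.
retrieved = 327_638
aligned_by_splign = 284_864
passed_to_gnomon = 223_062

aligned_pct = percent_of_retrieved(aligned_by_splign, retrieved)
passed_pct = percent_of_retrieved(passed_to_gnomon, retrieved)

print(aligned_pct)  # 86.94%
print(passed_pct)   # 68.08%
```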

RefSeq transcript alignment quality report

The known RefSeq transcripts (NM_ and NR_ accessions) are a set of high-quality transcripts maintained by the RefSeq group at NCBI. Alignment statistics for this group of transcripts, such as percent and number of sequences not aligning at all, percent best alignments split between multiple scaffolds, and percent alignments not covering the full CDS are indicative of the genome quality and are provided below.

	GRCm39 Primary Assembly
Number of sequences retrieved from Entrez	45,948
Number (%) of sequences not aligning	1 (0.00%)
Number (%) of sequences with multiple best alignments (split genes)	2 (0.00%)
Number (%) of sequences with CDS coverage < 95%	19 (0.05%)

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	17,175,996,771	82%	20%	378,977
SAMN00849374	NA	ovary (Mus musculus, adult-8wks, SAMN00849374)	211,354,114	84%	16%	189,787
SAMN00849375	NA	mammary gland (Mus musculus, adult-8wks, SAMN00849375)	294,005,236	79%	16%	193,160
SAMN00849376	NA	stomach (Mus musculus, adult-8wks, SAMN00849376)	337,101,130	77%	20%	138,120
SAMN00849377	NA	small intestine (Mus musculus, adult-8wks, SAMN00849377)	276,113,856	82%	17%	160,663
SAMN00849378	NA	duodenum (Mus musculus, adult-8wks, SAMN00849378)	311,815,510	72%	14%	138,692
SAMN00849379	NA	adrenal gland (Mus musculus, adult-8wks, SAMN00849379)	296,584,250	79%	15%	151,480
SAMN00849380	NA	large intestine (Mus musculus, adult-8wks, SAMN00849380)	297,232,294	84%	18%	192,824
SAMN00849381	NA	genital fat pad (Mus musculus, adult-8wks, SAMN00849381)	295,933,444	84%	17%	218,187
SAMN00849382	NA	subcutaneous fat pad (Mus musculus, adult-8wks, SAMN00849382)	314,334,288	83%	16%	202,341
SAMN00849383	NA	thymus (Mus musculus, adult-8wks, SAMN00849383)	363,153,942	81%	14%	202,545
SAMN00849384	NA	testis (Mus musculus, adult-8wks, SAMN00849384)	301,527,276	85%	17%	248,497
SAMN00849385	NA	kidney (Mus musculus, adult-8wks, SAMN00849385)	422,158,200	83%	16%	204,134
SAMN00849386	NA	liver (Mus musculus, adult-8wks, SAMN00849386)	325,376,342	85%	19%	159,839
SAMN00849387	NA	lung (Mus musculus, adult-8wks, SAMN00849387)	266,132,318	83%	16%	205,054
SAMN00849388	NA	spleen (Mus musculus, adult-8wks, SAMN00849388)	305,188,592	83%	16%	182,097
SAMN00849389	NA	colon (Mus musculus, adult-8wks, SAMN00849389)	262,011,506	84%	17%	181,194
SAMN00849390	NA	heart (Mus musculus, adult-8wks, SAMN00849390)	311,162,380	81%	13%	183,171
SAMN01164131	NA	frontal lobe (Mus musculus, adult-8wks, SAMN01164131)	690,133,160	78%	15%	248,192
SAMN01164132	NA	cortex (Mus musculus, adult-8wks, SAMN01164132)	643,737,804	79%	15%	241,620
SAMN01164133	NA	bladder (Mus musculus, adult-8wks, SAMN01164133)	646,247,254	80%	23%	237,577
SAMN01164134	NA	placenta (Mus musculus, adult-8wks, SAMN01164134)	942,317,442	82%	25%	251,913
SAMN01164135	NA	liver (Mus musculus, E18, SAMN01164135)	799,842,894	86%	31%	227,565
SAMN01164136	NA	cerebellum (Mus musculus, adult-8wks, SAMN01164136)	591,359,684	77%	17%	239,495
SAMN01164137	NA	limb (Mus musculus, E14.5, SAMN01164137)	681,583,788	82%	21%	244,436
SAMN01164138	NA	CSHL_RnaSeq_CNS_E14 (superseded by GSE90198) (Mus musculus, E14, SAMN01164138)	760,798,446	82%	20%	259,563
SAMN01164139	NA	CSHL_RnaSeq_CNS_E18 (superseded by GSE90199) (Mus musculus, E18, SAMN01164139)	734,218,674	84%	19%	258,121
SAMN01164140	NA	liver (Mus musculus, E14.5, SAMN01164140)	501,444,074	81%	27%	209,846
SAMN01164141	NA	whole brain (Mus musculus, E14.5, SAMN01164141)	687,712,618	81%	18%	247,779
SAMN01164142	NA	CSHL_RnaSeq_CNS_E11.5 (superseded by GSE90194) (Mus musculus, E11.5, SAMN01164142)	757,034,232	79%	22%	247,772
SAMN01164143	NA	liver (Mus musculus, E14, SAMN01164143)	699,427,530	84%	26%	220,953
SAMN01766820	23258891	skeletal muscle (Mus musculus, SAMN01766820)	226,222,554	85%	11%	145,639
SAMN01766828	23258891	skeletal muscle (Mus musculus, SAMN01766828)	234,343,474	83%	23%	173,044
SAMN01766837	23258891	skeletal muscle (Mus musculus, SAMN01766837)	59,477,738	79%	4%	70,099
SAMN02146467	24709821	whole organism, E10.5 (Mus musculus, SAMN02146467)	160,569,334	74%	8%	165,187
SAMN02146468	24709821	whole organism, E10.5 (Mus musculus, SAMN02146468)	203,975,305	160%	10%	187,611
SAMN02415122	25186741	cerebral cortex, Astrocyte, (Mus musculus, SAMN02415122)	64,037,644	89%	19%	178,485
SAMN02415123	25186741	cerebral cortex, Astrocyte, (Mus musculus, SAMN02415123)	59,245,824	88%	21%	172,335
SAMN02415124	25186741	cerebral cortex, neuron, (Mus musculus, SAMN02415124)	67,883,582	83%	20%	172,456
SAMN02415125	25186741	cerebral cortex, neuron, (Mus musculus, SAMN02415125)	75,786,274	87%	17%	183,400
SAMN02415126	25186741	cerebral cortex, oligodendrocyte precursor cells, (Mus musculus, SAMN02415126)	64,975,528	87%	22%	164,237
SAMN02415127	25186741	cerebral cortex, newly formed oligodendrocytes, (Mus musculus, SAMN02415127)	61,094,990	87%	23%	155,391
SAMN02415128	25186741	cerebral cortex, microglia, (Mus musculus, SAMN02415128)	60,014,680	77%	27%	125,591
SAMN02415129	25186741	cerebral cortex, myelinating oligodendrocytes, (Mus musculus, SAMN02415129)	59,465,980	87%	20%	148,169
SAMN02415130	25186741	cerebral cortex, oligodendrocyte precursor cells, (Mus musculus, SAMN02415130)	64,463,476	88%	21%	182,609
SAMN02415131	25186741	cerebral cortex, newly formed oligodendrocytes, (Mus musculus, SAMN02415131)	64,264,726	87%	22%	168,140
SAMN02415132	25186741	cerebral cortex, myelinating oligodendrocytes, (Mus musculus, SAMN02415132)	66,877,618	85%	20%	129,867
SAMN02415133	25186741	cerebral cortex, microglia, (Mus musculus, SAMN02415133)	58,345,738	90%	26%	145,191
SAMN02415134	25186741	cerebral cortex, endothelial cells, (Mus musculus, SAMN02415134)	67,583,586	90%	27%	158,992
SAMN02415135	25186741	cerebral cortex (Mus musculus, SAMN02415135)	62,146,188	88%	20%	193,986
SAMN02415136	25186741	cerebral cortex, endothelial cells, (Mus musculus, SAMN02415136)	73,076,698	89%	26%	161,524
SAMN02415137	25186741	cerebral cortex (Mus musculus, SAMN02415137)	63,526,304	88%	20%	195,334
SAMN02415138	25186741	cerebral cortex (Mus musculus, SAMN02415138)	60,345,648	88%	20%	195,143
SAMN04095718	27183606	Femur, tibia and pelvis, Mesenchymal cells (Lin- SCA1+ cells), (Mus musculus, 3 to 4 wks-old, SAMN04095718)	156,717,904	63%	25%	96,257
SAMN04095719	27183606	Femur, tibia and pelvis, Mesenchymal cells (Lin- SCA1+ cells), (Mus musculus, 3 to 4 wks-old, SAMN04095719)	152,785,684	71%	25%	125,856
SAMN04095720	27183606	Femur, tibia and pelvis, Mesenchymal cells (Lin- SCA1+ cells), (Mus musculus, 3 to 4 wks-old, SAMN04095720)	134,506,896	65%	26%	136,388
SAMN04095721	27183606	abdominal and back dermis (truncal skin), Mesenchymal cells (Lin- SCA1+ cells), (Mus musculus, 3 to 4 wks-old, SAMN04095721)	155,081,856	79%	32%	130,599
SAMN04095722	27183606	abdominal and back dermis (truncal skin), Mesenchymal cells (Lin- SCA1+ cells), (Mus musculus, 3 to 4 wks-old, SAMN04095722)	123,055,204	84%	33%	141,077
SAMN04095723	27183606	abdominal and back dermis (truncal skin), Mesenchymal cells (Lin- SCA1+ cells), (Mus musculus, 3 to 4 wks-old, SAMN04095723)	149,084,060	84%	30%	132,783

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR453077	SRX135150	SRP012040	SAMN00849374	32,386,150	87%	16%
SRR453078	SRX135150	SRP012040	SAMN00849374	23,929,060	84%	16%
SRR453079	SRX135150	SRP012040	SAMN00849374	10,186,844	84%	16%
SRR453080	SRX135150	SRP012040	SAMN00849374	22,288,566	88%	16%
SRR453081	SRX135150	SRP012040	SAMN00849374	22,207,250	87%	16%
SRR453082	SRX135150	SRP012040	SAMN00849374	20,679,748	80%	15%
SRR453083	SRX135150	SRP012040	SAMN00849374	21,442,928	79%	16%
SRR453084	SRX135150	SRP012040	SAMN00849374	15,152,054	84%	16%
SRR453085	SRX135150	SRP012040	SAMN00849374	21,132,374	84%	16%
SRR453086	SRX135150	SRP012040	SAMN00849374	21,949,140	84%	16%
SRR453087	SRX135151	SRP012040	SAMN00849375	35,726,184	61%	11%
SRR453088	SRX135151	SRP012040	SAMN00849375	58,915,062	82%	16%
SRR453089	SRX135151	SRP012040	SAMN00849375	76,730,942	82%	17%
SRR453090	SRX135151	SRP012040	SAMN00849375	35,413,120	81%	16%
SRR453091	SRX135151	SRP012040	SAMN00849375	30,183,818	81%	16%
SRR453092	SRX135151	SRP012040	SAMN00849375	57,036,110	81%	16%
SRR453093	SRX135152	SRP012040	SAMN00849376	52,494,346	79%	18%
SRR453094	SRX135152	SRP012040	SAMN00849376	54,234,248	77%	18%
SRR453095	SRX135152	SRP012040	SAMN00849376	48,701,322	73%	18%
SRR453096	SRX135152	SRP012040	SAMN00849376	59,334,910	77%	21%
SRR453097	SRX135152	SRP012040	SAMN00849376	67,722,044	81%	21%
SRR453098	SRX135152	SRP012040	SAMN00849376	54,614,260	75%	21%
SRR453099	SRX135153	SRP012040	SAMN00849377	39,262,504	83%	18%
SRR453100	SRX135153	SRP012040	SAMN00849377	35,918,702	79%	18%
SRR453101	SRX135153	SRP012040	SAMN00849377	13,838,674	81%	18%
SRR453102	SRX135153	SRP012040	SAMN00849377	25,934,096	83%	18%
SRR453103	SRX135153	SRP012040	SAMN00849377	25,619,286	83%	18%
SRR453104	SRX135153	SRP012040	SAMN00849377	36,423,822	84%	16%
SRR453105	SRX135153	SRP012040	SAMN00849377	33,934,446	80%	15%
SRR453106	SRX135153	SRP012040	SAMN00849377	11,483,924	83%	15%
SRR453107	SRX135153	SRP012040	SAMN00849377	26,148,122	84%	15%
SRR453108	SRX135153	SRP012040	SAMN00849377	27,550,280	85%	15%
SRR453109	SRX135154	SRP012040	SAMN00849378	52,351,100	74%	15%
SRR453110	SRX135154	SRP012040	SAMN00849378	55,174,656	73%	15%
SRR453111	SRX135154	SRP012040	SAMN00849378	46,244,620	69%	15%
SRR453112	SRX135154	SRP012040	SAMN00849378	30,366,602	73%	14%
SRR453113	SRX135154	SRP012040	SAMN00849378	47,776,396	72%	14%
SRR453114	SRX135154	SRP012040	SAMN00849378	34,517,372	67%	14%
SRR453115	SRX135154	SRP012040	SAMN00849378	45,384,764	74%	14%
SRR453116	SRX135155	SRP012040	SAMN00849379	48,580,184	82%	15%
SRR453117	SRX135155	SRP012040	SAMN00849379	48,625,494	75%	15%
SRR453118	SRX135155	SRP012040	SAMN00849379	48,017,016	74%	15%
SRR453119	SRX135155	SRP012040	SAMN00849379	51,837,378	85%	15%
SRR453120	SRX135155	SRP012040	SAMN00849379	53,474,608	82%	15%
SRR453121	SRX135155	SRP012040	SAMN00849379	46,049,570	77%	15%
SRR453122	SRX135156	SRP012040	SAMN00849380	73,962,838	83%	17%
SRR453123	SRX135156	SRP012040	SAMN00849380	70,034,228	84%	17%
SRR453124	SRX135156	SRP012040	SAMN00849380	78,153,708	83%	18%
SRR453125	SRX135156	SRP012040	SAMN00849380	75,081,520	84%	18%
SRR453126	SRX135157	SRP012040	SAMN00849381	77,262,206	82%	17%
SRR453127	SRX135157	SRP012040	SAMN00849381	61,487,986	86%	17%
SRR453128	SRX135157	SRP012040	SAMN00849381	78,776,450	82%	16%
SRR453129	SRX135157	SRP012040	SAMN00849381	78,406,802	87%	17%
SRR453130	SRX135158	SRP012040	SAMN00849382	80,134,562	84%	16%
SRR453131	SRX135158	SRP012040	SAMN00849382	76,509,998	82%	16%
SRR453132	SRX135158	SRP012040	SAMN00849382	77,355,846	78%	16%
SRR453133	SRX135158	SRP012040	SAMN00849382	80,333,882	87%	17%
SRR453134	SRX135159	SRP012040	SAMN00849383	26,948,684	81%	14%
SRR453135	SRX135159	SRP012040	SAMN00849383	66,170,602	80%	14%
SRR453136	SRX135159	SRP012040	SAMN00849383	66,944,732	80%	14%
SRR453137	SRX135159	SRP012040	SAMN00849383	74,711,836	84%	13%
SRR453138	SRX135159	SRP012040	SAMN00849383	62,394,044	81%	13%
SRR453139	SRX135159	SRP012040	SAMN00849383	65,984,044	81%	13%
SRR453140	SRX135160	SRP012040	SAMN00849384	75,392,014	85%	17%
SRR453141	SRX135160	SRP012040	SAMN00849384	71,722,206	85%	17%
SRR453142	SRX135160	SRP012040	SAMN00849384	82,240,752	85%	17%
SRR453143	SRX135160	SRP012040	SAMN00849384	72,172,304	85%	17%
SRR453144	SRX135161	SRP012040	SAMN00849385	77,733,554	84%	17%
SRR453145	SRX135161	SRP012040	SAMN00849385	71,073,348	80%	17%
SRR453146	SRX135161	SRP012040	SAMN00849385	37,576,572	84%	17%
SRR453147	SRX135161	SRP012040	SAMN00849385	81,085,360	83%	15%
SRR453148	SRX135161	SRP012040	SAMN00849385	74,817,040	83%	15%
SRR453149	SRX135161	SRP012040	SAMN00849385	79,872,326	84%	16%
SRR453150	SRX135162	SRP012040	SAMN00849386	60,923,898	83%	19%
SRR453151	SRX135162	SRP012040	SAMN00849386	35,789,592	85%	18%
SRR453152	SRX135162	SRP012040	SAMN00849386	63,826,944	85%	19%
SRR453153	SRX135162	SRP012040	SAMN00849386	55,098,830	85%	19%
SRR453154	SRX135162	SRP012040	SAMN00849386	48,469,846	87%	18%
SRR453155	SRX135162	SRP012040	SAMN00849386	61,267,232	86%	19%
SRR453156	SRX135163	SRP012040	SAMN00849387	71,218,238	84%	15%
SRR453157	SRX135163	SRP012040	SAMN00849387	69,916,506	82%	15%
SRR453158	SRX135163	SRP012040	SAMN00849387	62,649,820	83%	16%
SRR453159	SRX135163	SRP012040	SAMN00849387	62,347,754	82%	16%
SRR453160	SRX135164	SRP012040	SAMN00849388	59,828,110	85%	16%
SRR453161	SRX135164	SRP012040	SAMN00849388	45,463,628	82%	16%
SRR453162	SRX135164	SRP012040	SAMN00849388	42,918,108	82%	16%
SRR453163	SRX135164	SRP012040	SAMN00849388	58,613,470	85%	16%
SRR453164	SRX135164	SRP012040	SAMN00849388	47,323,212	81%	16%
SRR453165	SRX135164	SRP012040	SAMN00849388	51,042,064	84%	17%
SRR453166	SRX135165	SRP012040	SAMN00849389	55,982,780	87%	16%
SRR453167	SRX135165	SRP012040	SAMN00849389	48,470,144	81%	17%
SRR453168	SRX135165	SRP012040	SAMN00849389	25,226,134	86%	16%
SRR453169	SRX135165	SRP012040	SAMN00849389	44,295,364	82%	17%
SRR453170	SRX135165	SRP012040	SAMN00849389	44,682,888	85%	16%
SRR453171	SRX135165	SRP012040	SAMN00849389	43,354,196	84%	17%
SRR453172	SRX135166	SRP012040	SAMN00849390	78,077,912	80%	13%
SRR453173	SRX135166	SRP012040	SAMN00849390	71,140,618	80%	12%
SRR453174	SRX135166	SRP012040	SAMN00849390	82,886,668	83%	14%
SRR453175	SRX135166	SRP012040	SAMN00849390	79,057,182	82%	14%
SRR567478	SRX186041	SRP012040	SAMN01164131	371,490,262	78%	16%
SRR567479	SRX186041	SRP012040	SAMN01164131	318,642,898	78%	15%
SRR567480	SRX186042	SRP012040	SAMN01164132	311,361,602	80%	16%
SRR567481	SRX186042	SRP012040	SAMN01164132	332,376,202	78%	15%
SRR567482	SRX186043	SRP012040	SAMN01164133	323,288,408	79%	22%
SRR567483	SRX186043	SRP012040	SAMN01164133	322,958,846	80%	25%
SRR567484	SRX186044	SRP012040	SAMN01164134	466,466,922	83%	26%
SRR567485	SRX186044	SRP012040	SAMN01164134	475,850,520	81%	24%
SRR567486	SRX186045	SRP012040	SAMN01164135	432,307,054	85%	31%
SRR567487	SRX186045	SRP012040	SAMN01164135	367,535,840	86%	30%
SRR567488	SRX186046	SRP012040	SAMN01164136	301,949,602	76%	17%
SRR567489	SRX186046	SRP012040	SAMN01164136	289,410,082	79%	17%
SRR567490	SRX186047	SRP012040	SAMN01164137	341,806,528	83%	22%
SRR567491	SRX186047	SRP012040	SAMN01164137	339,777,260	81%	20%
SRR567492	SRX186048	SRP012040	SAMN01164138	317,634,496	84%	20%
SRR567493	SRX186048	SRP012040	SAMN01164138	443,163,950	81%	20%
SRR567494	SRX186049	SRP012040	SAMN01164139	300,853,652	83%	19%
SRR567495	SRX186049	SRP012040	SAMN01164139	433,365,022	84%	19%
SRR567496	SRX186050	SRP012040	SAMN01164140	312,763,976	79%	27%
SRR567497	SRX186050	SRP012040	SAMN01164140	188,680,098	85%	28%
SRR567498	SRX186051	SRP012040	SAMN01164141	341,390,158	81%	19%
SRR567499	SRX186051	SRP012040	SAMN01164141	346,322,460	81%	17%
SRR567500	SRX186052	SRP012040	SAMN01164142	408,288,868	76%	24%
SRR567501	SRX186052	SRP012040	SAMN01164142	348,745,364	84%	20%
SRR567502	SRX186053	SRP012040	SAMN01164143	272,149,598	85%	27%
SRR567503	SRX186053	SRP012040	SAMN01164143	427,277,932	84%	26%
SRR594399	SRX196270	SRP016501	SAMN01766820	226,222,554	85%	11%
SRR594407	SRX196278	SRP016501	SAMN01766828	234,343,474	83%	23%
SRR594416	SRX196287	SRP016501	SAMN01766837	59,477,738	79%	4%
SRR851934	SRX278615	SRP022861	SAMN02146467	160,569,334	74%	8%
SRR851935	SRX278616	SRP022861	SAMN02146468	203,975,305	160%	10%
SRR1033784	SRX380380	SRP033200	SAMN02415122	64,037,644	89%	19%
SRR1033783	SRX380379	SRP033200	SAMN02415123	59,245,824	88%	21%
SRR1033786	SRX380382	SRP033200	SAMN02415124	67,883,582	83%	20%
SRR1033785	SRX380381	SRP033200	SAMN02415125	75,786,274	87%	17%
SRR1033788	SRX380384	SRP033200	SAMN02415126	64,975,528	87%	22%
SRR1033790	SRX380386	SRP033200	SAMN02415127	61,094,990	87%	23%
SRR1033794	SRX380390	SRP033200	SAMN02415128	60,014,680	77%	27%
SRR1033792	SRX380388	SRP033200	SAMN02415129	59,465,980	87%	20%
SRR1033787	SRX380383	SRP033200	SAMN02415130	64,463,476	88%	21%
SRR1033789	SRX380385	SRP033200	SAMN02415131	64,264,726	87%	22%
SRR1033791	SRX380387	SRP033200	SAMN02415132	66,877,618	85%	20%
SRR1033793	SRX380389	SRP033200	SAMN02415133	58,345,738	90%	26%
SRR1033796	SRX380392	SRP033200	SAMN02415134	67,583,586	90%	27%
SRR1033798	SRX380394	SRP033200	SAMN02415135	62,146,188	88%	20%
SRR1033795	SRX380391	SRP033200	SAMN02415136	73,076,698	89%	26%
SRR1033797	SRX380393	SRP033200	SAMN02415137	63,526,304	88%	20%
SRR1033799	SRX380395	SRP033200	SAMN02415138	60,345,648	88%	20%
SRR2443105	SRX1258024	SRP063829	SAMN04095718	51,859,782	63%	25%
SRR2443106	SRX1258024	SRP063829	SAMN04095718	52,277,558	63%	25%
SRR2443107	SRX1258024	SRP063829	SAMN04095718	52,580,564	63%	25%
SRR2443108	SRX1258025	SRP063829	SAMN04095719	50,650,714	71%	25%
SRR2443109	SRX1258025	SRP063829	SAMN04095719	50,945,800	71%	25%
SRR2443110	SRX1258025	SRP063829	SAMN04095719	51,189,170	71%	25%
SRR2443111	SRX1258026	SRP063829	SAMN04095720	44,965,672	66%	26%
SRR2443112	SRX1258026	SRP063829	SAMN04095720	44,829,132	65%	26%
SRR2443113	SRX1258026	SRP063829	SAMN04095720	44,712,092	66%	26%
SRR2443114	SRX1258027	SRP063829	SAMN04095721	4,932,620	79%	32%
SRR2443115	SRX1258027	SRP063829	SAMN04095721	4,918,682	79%	32%
SRR2443116	SRX1258027	SRP063829	SAMN04095721	4,967,034	79%	32%
SRR2443117	SRX1258027	SRP063829	SAMN04095721	58,948,032	79%	32%
SRR2443118	SRX1258027	SRP063829	SAMN04095721	81,315,488	79%	32%
SRR2443119	SRX1258028	SRP063829	SAMN04095722	17,951,836	84%	33%
SRR2443120	SRX1258028	SRP063829	SAMN04095722	44,138,172	84%	33%
SRR2443121	SRX1258028	SRP063829	SAMN04095722	60,965,196	84%	33%
SRR2443122	SRX1258029	SRP063829	SAMN04095723	9,728,180	83%	30%
SRR2443123	SRX1258029	SRP063829	SAMN04095723	9,707,034	84%	30%
SRR2443124	SRX1258029	SRP063829	SAMN04095723	9,868,658	84%	30%
SRR2443125	SRX1258029	SRP063829	SAMN04095723	50,920,706	84%	30%
SRR2443126	SRX1258029	SRP063829	SAMN04095723	68,859,482	84%	30%
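
The by-run and by-sample tables are related: the read count reported for a sample is the sum of the read counts of its runs. The Python sketch below checks this for one sample, using the ten runs of SAMN00849374 copied from the table above; the data structure is an illustrative choice.

```python
from collections import defaultdict

# (run, sample, number of reads) copied from the by-run table for one sample.
runs = [
    ("SRR453077", "SAMN00849374", 32_386_150),
    ("SRR453078", "SAMN00849374", 23_929_060),
    ("SRR453079", "SAMN00849374", 10_186_844),
    ("SRR453080", "SAMN00849374", 22_288_566),
    ("SRR453081", "SAMN00849374", 22_207_250),
    ("SRR453082", "SAMN00849374", 20_679_748),
    ("SRR453083", "SAMN00849374", 21_442_928),
    ("SRR453084", "SAMN00849374", 15_152_054),
    ("SRR453085", "SAMN00849374", 21_132_374),
    ("SRR453086", "SAMN00849374", 21_949_140),
]

# Accumulate read counts per sample accession.
reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

# The by-sample table reports 211,354,114 reads for the ovary sample.
print(f"{reads_per_sample['SAMN00849374']:,}")
```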

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	62,164	29,735 (47.83%)	29,735 (47.83%)	78.81%	84.75%
Same-species known RefSeq (NP_)	38,037	31,803 (83.61%)	31,803 (83.61%)	82.40%	91.41%
Rattus norvegicus known RefSeq (NP_)	17,793	16,480 (92.62%)	16,480 (92.62%)	76.96%	90.94%
Homo sapiens known RefSeq (NP_)	59,428	43,687 (73.51%)	43,687 (73.51%)	77.18%	85.48%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

	First Pass	Total
GRCm39 (Current) coverage	99.93%	99.96%
GRCm38.p6 (Previous) coverage	96.83%	99.84%
Percent identity	99.99%	99.97%
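
To illustrate what "reciprocal best hits" means for the first-pass alignments, here is a minimal Python sketch. It is not the NCBI implementation: the region names and scores are hypothetical, and a real aligner works on genomic intervals rather than labelled pairs. A pair is kept only when each region is the other's highest-scoring match.

```python
def reciprocal_best_hits(scores):
    """Return (current, previous) pairs that are each other's best-scoring match.

    scores maps (current_region, previous_region) -> alignment score.
    """
    best_for_current = {}
    best_for_previous = {}
    for (cur, prev), score in scores.items():
        # Track the best previous-assembly partner for each current region.
        if cur not in best_for_current or score > scores[(cur, best_for_current[cur])]:
            best_for_current[cur] = prev
        # Track the best current-assembly partner for each previous region.
        if prev not in best_for_previous or score > scores[(best_for_previous[prev], prev)]:
            best_for_previous[prev] = cur
    return sorted(
        (cur, prev)
        for cur, prev in best_for_current.items()
        if best_for_previous[prev] == cur
    )


# Hypothetical scores: cur_C aligns best to prev_B, but prev_B aligns best to cur_B,
# so the cur_C alignment is not reciprocal and would only appear in a later pass.
scores = {
    ("cur_A", "prev_A"): 980,
    ("cur_A", "prev_B"): 400,
    ("cur_B", "prev_B"): 950,
    ("cur_C", "prev_B"): 700,
}
first_pass = reciprocal_best_hits(scores)
print(first_pass)
```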

Comparison of the current and previous annotations

The annotation produced for this release (109) was compared to the annotation in the previous release (108) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	GRCm39 (Current) to GRCm38.p6 (Previous)
Identical	68%
Minor changes	26%
Major changes	3%
New	3%
Deprecated	2%
Other	<1%
Download the report	tabular, Genome Workbench

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
