NCBI Bubalus bubalis Annotation Release 103

The RefSeq genome records for Bubalus bubalis were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: How well the current and previous assemblies align to each other
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Bubalus bubalis Annotation Release 103.

Annotation release ID: 103
Date of Entrez queries for transcripts and proteins: Nov 11 2021
Date of submission of annotation to the public databases: Nov 17 2021
Software version: 9.0

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
NDDB_SH_1	GCF_019923935.1	National Dairy Development Board, India	09-10-2021	Reference	26 assembled chromosomes

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	NDDB_SH_1
Genes and pseudogenes	37,167
protein-coding	21,359
non-coding	11,663
Transcribed pseudogenes	0
Non-transcribed pseudogenes	3,819
genes with variants	13,508
Immunoglobulin/T-cell receptor gene segments	292
other	34
mRNAs	64,365
fully-supported	63,052
with > 5% ab initio	660
partial	210
with filled gap(s)	128
known RefSeq (NM_)	164
model RefSeq (XM_)	64,201
non-coding RNAs	18,653
fully-supported	14,536
with > 5% ab initio	0
partial	3
with filled gap(s)	2
known RefSeq (NR_)	0
model RefSeq (XR_)	16,187
pseudo transcripts	0
fully-supported	0
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	0
CDSs	64,670
fully-supported	63,052
with > 5% ab initio	761
partial	202
with major correction(s)	616
known RefSeq (NP_)	164
model RefSeq (XP_)	64,214
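
The sub-counts in the table above can be cross-checked against the totals. The short sketch below does this for the gene categories and for the mRNA rows. It assumes that "genes with variants" is an attribute of genes rather than a separate category, which is a reading of the table and not something the report states.

```python
# Counts copied from the "Feature counts" table for NDDB_SH_1.
# Assumption: "genes with variants" is an attribute, not a category,
# so it is left out of the sum.
gene_categories = {
    "protein-coding": 21_359,
    "non-coding": 11_663,
    "transcribed pseudogenes": 0,
    "non-transcribed pseudogenes": 3_819,
    "Ig/TCR gene segments": 292,
    "other": 34,
}
genes_and_pseudogenes = 37_167

mrna_known = 164       # known RefSeq (NM_)
mrna_model = 64_201    # model RefSeq (XM_)
mrna_total = 64_365

category_sum = sum(gene_categories.values())
genes_consistent = category_sum == genes_and_pseudogenes
mrnas_consistent = mrna_known + mrna_model == mrna_total

print(f"gene categories sum to {category_sum:,}: {genes_consistent}")
print(f"known + model mRNAs = {mrna_known + mrna_model:,}: {mrnas_consistent}")
```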

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	33,056	39,243	9,276	41	2,402,664
All transcripts	83,018	3,284	2,536	41	105,371
mRNA	64,365	3,769	2,984	105	105,371
misc_RNA	3,214	2,894	2,311	196	21,135
tRNA	2,464	73	73	60	92
lncRNA	11,341	1,800	1,121	107	45,814
snoRNA	623	114	126	41	321
snRNA	959	113	107	61	200
rRNA	19	739	119	118	4,841
Single-exon transcripts	2,724	1,418	976	105	17,692
coding transcripts (NM_/XM_ )	2,724	1,418	976	105	17,692
CDSs	64,378	2,178	1,530	87	104,109
Exons	283,647	351	145	2	27,073
in coding transcripts (NM_/XM_ )	251,665	326	142	2	27,073
in non-coding transcripts (NR_/XR_ )	49,122	423	156	2	26,632
Introns	249,367	7,118	1,533	25	1,192,113
in coding transcripts (NM_/XM_ )	227,209	6,872	1,504	25	1,192,113
in non-coding transcripts (NR_/XR_ )	38,707	7,824	1,675	30	526,858

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.63	1	1	50
Number of exons per transcript	11.87	8	1	352

BUSCO analysis of gene annotation

BUSCO v4.1.4 (Simão et al. 2015, PMID: 26059717) was run in "protein" mode on the annotated gene set, using the longest protein per gene and the cetartiodactyla_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit, and presented in BUSCO notation (C:complete [S:single-copy, D:duplicated], F:fragmented, M:missing, n:number of genes used).
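
As an illustration of the BUSCO notation described above, the following sketch parses a one-line summary into its fields. The function name `parse_busco_summary` is hypothetical, and the example string contains invented values chosen only to exercise the parser; they are not results for this annotation.

```python
import re

# Pattern for a one-line BUSCO summary such as
# "C:..%[S:..%,D:..%],F:..%,M:..%,n:...".
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(summary: str) -> dict:
    """Return the percentages (floats) and n (int) from a BUSCO summary line."""
    match = BUSCO_PATTERN.search(summary.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    fields = {
        key: float(value)
        for key, value in match.groupdict().items()
        if key != "n"
    }
    fields["n"] = int(match.group("n"))
    return fields


# Invented placeholder values, NOT the results of this annotation run.
example = "C:90.0%[S:85.0%,D:5.0%],F:4.0%,M:6.0%,n:1000"
parsed = parse_busco_summary(example)
print(parsed)
```

In this notation the single-copy and duplicated percentages add up to the complete percentage, which gives a quick sanity check on a parsed line.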

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 21346 coding genes, 20915 genes had a protein with an alignment covering 50% or more of the query and 17829 had an alignment covering 95% or more of the query.
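
Expressed as shares of the coding genes, the counts above work out as follows. This is plain arithmetic on the numbers quoted in the paragraph; the variable names are chosen for illustration.

```python
# Numbers quoted in the paragraph above.
coding_genes = 21_346
genes_cov50 = 20_915   # alignment covers >= 50% of the query
genes_cov95 = 17_829   # alignment covers >= 95% of the query

share_cov50 = 100 * genes_cov50 / coding_genes
share_cov95 = 100 * genes_cov95 / coding_genes

print(f">= 50% query coverage: {share_cov50:.2f}% of coding genes")
print(f">= 95% query coverage: {share_cov95:.2f}% of coding genes")
```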

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
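
The two coverage definitions can be sketched as a single helper applied once to the query and once to the target. The sketch assumes one contiguous aligned range with 1-based inclusive coordinates, which is a simplification; the function name and the example lengths are hypothetical.

```python
def coverage_percent(aligned_start: int, aligned_end: int, sequence_length: int) -> float:
    """Percentage of a sequence included in an alignment.

    Assumes a single contiguous aligned range given in 1-based,
    inclusive coordinates.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    aligned_length = aligned_end - aligned_start + 1
    return 100.0 * aligned_length / sequence_length


# Hypothetical alignment: residues 21..400 of a 400-aa annotated protein
# (query) aligned to residues 1..380 of a 500-aa curated protein (target).
query_coverage = coverage_percent(21, 400, 400)
target_coverage = coverage_percent(1, 380, 500)

print(f"query coverage:  {query_coverage:.1f}%")
print(f"target coverage: {target_coverage:.1f}%")
```

The same alignment can therefore score high on query coverage and low on target coverage, for example when the annotated protein is a fragment of the curated one.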

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

[Graph: cumulative number of genes with alignments above a given coverage threshold. Query: annotated proteins. Target: UniProtKB/Swiss-Prot curated proteins.]

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
NDDB_SH_1	GCF_019923935.1	40.76%
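
A masked percentage of this kind is a ratio of masked bases to total bases. The sketch below computes it for a toy sequence, assuming the masking is represented as lowercase letters (soft-masking). That representation, the function name, and the choice to count every base in the denominator are assumptions; the report does not say how the 40.76% figure was derived.

```python
def masked_percent(sequence: str) -> float:
    """Percentage of bases that are soft-masked (lowercase) in a sequence."""
    total = len(sequence)
    if total == 0:
        raise ValueError("empty sequence")
    masked = sum(1 for base in sequence if base.islower())
    return 100.0 * masked / total


# Toy 20-base sequence with a 4-base masked stretch, not real genome data.
toy_sequence = "ACGTACacgtACGTACGTAC"
toy_masked = masked_percent(toy_sequence)
print(f"{toy_masked:.2f}% masked")
```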

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign, minimap2, or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	167	167 (100.00%)	159 (95.21%)	99.61%	99.19%
Same-species GenBank	1,712	1,674 (97.78%)	1,570 (91.71%)	99.13%	98.53%
Same-species EST	1,855	1,466 (79.03%)	1,276 (68.79%)	98.89%	98.42%
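
The percentages in the table above are relative to the number of sequences retrieved from Entrez. The sketch below recomputes them from the counts, as a small consistency check; the data structure and helper name are chosen for illustration.

```python
# (retrieved, aligned, passed to Gnomon) copied from the table above.
transcript_sources = {
    "Same-species known RefSeq (NM_/NR_)": (167, 167, 159),
    "Same-species GenBank": (1_712, 1_674, 1_570),
    "Same-species EST": (1_855, 1_466, 1_276),
}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


recomputed = {
    source: (percent(aligned, retrieved), percent(passed, retrieved))
    for source, (retrieved, aligned, passed) in transcript_sources.items()
}

for source, (pct_aligned, pct_passed) in recomputed.items():
    print(f"{source}: aligned {pct_aligned:.2f}%, passed {pct_passed:.2f}%")
```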

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	9,850,623,333	81%	36%	345,293
SAMEA1968690	NA	large_intestine (Bubalus bubalis, SAMEA1968690)	104,347,784	76%	25%	196,589
SAMEA2142479	NA	lung (Bubalus bubalis, SAMEA2142479)	41,702,096	81%	23%	182,343
SAMEA2144234	NA	abomasum (Bubalus bubalis, SAMEA2144234)	53,648,600	76%	25%	174,164
SAMEA2146571	NA	tongue (Bubalus bubalis, SAMEA2146571)	51,046,542	76%	28%	165,811
SAMEA2150597	NA	obex (Bubalus bubalis, SAMEA2150597)	65,901,272	67%	18%	187,535
SAMEA2155194	NA	embryo single (Bubalus bubalis, SAMEA2155194)	10,628,524	23%	11%	36,280
SAMEA2156867	NA	hypophysis (Bubalus bubalis, SAMEA2156867)	54,401,998	79%	25%	178,142
SAMN01093896	NA	Transcriptome assembly of Riverine Buffalo (Bubalus bubalis bubalis, SAMN01093896)	98,060,582	88%	29%	175,966
SAMN03256161	NA	Term Placenta, Cotyledon (Bubalus bubalis, 6, SAMN03256161)	61,403,296	80%	20%	141,039
SAMN03855787	26766209	heart, brain, lung, kidney, fat, liver, spleen, uterus, testis, ovary and gland (Bubalus bubalis, Adult, pooled male and female, SAMN03855787)	52,979,055	159%	25%	232,512
SAMN05730165	NA	liver (Bubalus bubalis, ten months, male, SAMN05730165)	58,779,918	84%	36%	172,385
SAMN05730168	NA	liver (Bubalus bubalis, ten months, male, SAMN05730168)	61,679,484	83%	39%	177,146
SAMN05730169	NA	liver (Bubalus bubalis, ten months, female, SAMN05730169)	61,673,824	84%	39%	177,507
SAMN05730172	NA	liver (Bubalus bubalis, ten months, female, SAMN05730172)	54,749,108	84%	38%	171,903
SAMN05730173	NA	liver (Bubalus bubalis, ten months, male, SAMN05730173)	51,543,660	86%	39%	175,809
SAMN05730174	NA	liver (Bubalus bubalis, ten months, female, SAMN05730174)	479,633,360	84%	41%	238,096
SAMN05730191	NA	liver (Bubalus bubalis, ten months, male, SAMN05730191)	55,104,034	85%	42%	172,751
SAMN05730196	NA	liver (Bubalus bubalis, ten months, male, SAMN05730196)	67,748,128	84%	43%	174,959
SAMN05730198	NA	liver (Bubalus bubalis, ten months, male, SAMN05730198)	60,159,308	84%	45%	175,105
SAMN05730199	NA	liver (Bubalus bubalis, ten months, female, SAMN05730199)	59,653,148	85%	41%	172,397
SAMN05730635	NA	liver (Bubalus bubalis, ten months, female, SAMN05730635)	50,873,820	84%	42%	176,883
SAMN06758773	NA	Mammary parenchyma (Bubalus bubalis, female, SAMN06758773)	126,570,028	84%	22%	208,473
SAMN08271820	30002204	blastocyst (Bubalus bubalis, SAMN08271820)	53,763,960	65%	35%	104,831
SAMN08271821	30002204	morula (Bubalus bubalis, SAMN08271821)	59,338,344	66%	39%	93,333
SAMN08271822	30002204	16cells (Bubalus bubalis, SAMN08271822)	54,046,260	75%	38%	151,715
SAMN08271823	30002204	8cells (Bubalus bubalis, SAMN08271823)	56,933,268	80%	42%	165,141
SAMN08271824	30002204	4cells (Bubalus bubalis, SAMN08271824)	93,780,788	76%	43%	175,086
SAMN08271825	30002204	2cells (Bubalus bubalis, SAMN08271825)	77,156,170	89%	40%	187,885
SAMN08271826	30002204	MII (Bubalus bubalis, SAMN08271826)	83,922,266	90%	39%	165,960
SAMN08271827	30002204	GV (Bubalus bubalis, SAMN08271827)	94,886,112	89%	40%	184,522
SAMN08871710	31266200,31500202,33045988,33841496	adipose tissue (Bubalus bubalis, thirty months, SAMN08871710)	133,216,720	73%	15%	164,309
SAMN08871711	31266200,31500202,33045988,33841496	adipose tissue (Bubalus bubalis, thirty months, SAMN08871711)	163,907,892	75%	16%	174,260
SAMN08871712	31266200,31500202,33045988,33841496	adipose tissue (Bubalus bubalis, thirty months, SAMN08871712)	146,336,524	75%	16%	165,560
SAMN08871713	31266200,31500202,33045988,33841496	adipose tissue (Bubalus bubalis, six months, SAMN08871713)	173,289,672	56%	15%	174,787
SAMN08871714	31266200,31500202,33045988,33841496	adipose tissue (Bubalus bubalis, six months, SAMN08871714)	192,283,524	55%	12%	168,526
SAMN08871715	31266200,31500202,33045988,33841496	adipose tissue (Bubalus bubalis, six months, SAMN08871715)	194,133,440	54%	12%	181,417
SAMN08991595	NA	Milk (Bubalus bubalis, female, SAMN08991595)	73,845,150	84%	26%	165,724
SAMN08991596	NA	Milk (Bubalus bubalis, female, SAMN08991596)	63,252,478	90%	38%	160,231
SAMN08991597	NA	Milk (Bubalus bubalis, female, SAMN08991597)	47,773,688	76%	35%	148,816
SAMN08991598	NA	Milk (Bubalus bubalis, female, SAMN08991598)	88,926,132	87%	30%	170,323
SAMN08991599	NA	Milk (Bubalus bubalis, female, SAMN08991599)	86,709,746	86%	25%	171,414
SAMN08991600	NA	Milk (Bubalus bubalis, female, SAMN08991600)	80,978,704	85%	28%	172,277
SAMN08991601	NA	Milk (Bubalus bubalis, female, SAMN08991601)	105,728,098	84%	24%	179,337
SAMN08991602	NA	Milk (Bubalus bubalis, female, SAMN08991602)	108,754,608	83%	25%	183,981
SAMN08991603	NA	Milk (Bubalus bubalis, female, SAMN08991603)	48,109,684	76%	30%	157,867
SAMN08991604	NA	Milk (Bubalus bubalis, female, SAMN08991604)	114,278,262	83%	27%	186,134
SAMN08991605	NA	Milk (Bubalus bubalis, female, SAMN08991605)	69,643,178	83%	28%	171,310
SAMN11523623	NA	mammary (Bubalus bubalis, female, SAMN11523623)	53,507,352	87%	51%	151,031
SAMN11523625	NA	mammary (Bubalus bubalis, female, SAMN11523625)	41,026,966	88%	60%	146,509
SAMN11523669	NA	mammary (Bubalus bubalis, female, SAMN11523669)	48,171,438	89%	62%	150,073
SAMN11523818	NA	mammary (Bubalus bubalis, female, SAMN11523818)	57,849,732	87%	53%	154,798
SAMN11523822	NA	mammary (Bubalus bubalis, female, SAMN11523822)	64,808,894	82%	43%	162,156
SAMN11523824	NA	mammary (Bubalus bubalis, female, SAMN11523824)	62,158,506	84%	42%	190,733
SAMN11523827	NA	mammary (Bubalus bubalis, female, SAMN11523827)	50,809,600	85%	45%	168,853
SAMN11523868	NA	mammary (Bubalus bubalis, female, SAMN11523868)	56,687,218	89%	58%	151,565
SAMN11523887	NA	mammary (Bubalus bubalis, female, SAMN11523887)	57,439,148	89%	60%	152,223
SAMN13059400	33045988,33841496	muscle (Bubalus bubalis, SAMN13059400)	152,454,256	62%	32%	164,952
SAMN13059401	33045988,33841496	muscle (Bubalus bubalis, SAMN13059401)	162,419,446	61%	31%	165,801
SAMN13059403	33045988,33841496	muscle (Bubalus bubalis, SAMN13059403)	167,106,268	62%	31%	161,863
SAMN13814144	NA	Oocyte (Bubalus bubalis, female, SAMN13814144)	42,830,047	93%	42%	158,337
SAMN13814145	NA	Oocyte (Bubalus bubalis, female, SAMN13814145)	51,709,496	94%	42%	173,985
SAMN13814146	NA	Oocyte (Bubalus bubalis, female, SAMN13814146)	47,610,799	93%	42%	173,097
SAMN13814147	NA	Oocyte (Bubalus bubalis, female, SAMN13814147)	52,842,743	93%	42%	179,460
SAMN13814148	NA	Oocyte (Bubalus bubalis, female, SAMN13814148)	61,871,879	93%	43%	195,708
SAMN13814149	NA	Oocyte (Bubalus bubalis, female, SAMN13814149)	45,361,210	92%	41%	161,808
SAMN13814150	NA	Oocyte (Bubalus bubalis, female, SAMN13814150)	39,992,530	93%	43%	171,701
SAMN13814151	NA	Oocyte (Bubalus bubalis, female, SAMN13814151)	36,293,582	93%	42%	142,969
SAMN13814152	NA	Oocyte (Bubalus bubalis, female, SAMN13814152)	31,915,227	92%	40%	132,218
SAMN13814153	NA	Oocyte (Bubalus bubalis, female, SAMN13814153)	39,509,526	93%	42%	172,174
SAMN13814154	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814154)	95,221,585	89%	43%	180,825
SAMN13814155	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814155)	109,709,287	89%	40%	191,979
SAMN13814156	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814156)	119,200,714	89%	41%	185,882
SAMN13814157	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814157)	62,780,272	86%	39%	181,080
SAMN13814158	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814158)	128,912,094	89%	42%	184,072
SAMN13814159	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814159)	88,796,258	86%	42%	179,975
SAMN13814160	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814160)	102,281,640	88%	40%	179,221
SAMN13814161	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814161)	98,646,181	87%	41%	188,803
SAMN13814162	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814162)	113,829,456	90%	40%	193,614
SAMN13814163	NA	Granulosa Cell (Bubalus bubalis, female, SAMN13814163)	140,585,851	87%	42%	195,447
SAMN15010194	NA	skin tissues(ear) (Bubalus bubalis, 2-6 years, SAMN15010194)	39,798,464	84%	38%	183,193
SAMN15010195	NA	skin tissues(ear) (Bubalus bubalis, 2-6 years, SAMN15010195)	44,533,226	84%	38%	185,093
SAMN15010196	NA	skin tissues(ear) (Bubalus bubalis, 2-6 years, SAMN15010196)	51,260,934	85%	38%	189,105
SAMN15010197	NA	skin tissues(ear) (Bubalus bubalis, 2-6 years, SAMN15010197)	45,092,202	86%	40%	182,431
SAMN15010198	NA	skin tissues(ear) (Bubalus bubalis, 2-6 years, SAMN15010198)	44,928,512	85%	37%	186,974
SAMN15010199	NA	skin tissues(ear) (Bubalus bubalis, 2-6 years, SAMN15010199)	47,112,588	86%	36%	187,245
SAMN15222733	NA	longissimus dorsi (Bubalus bubalis, male, SAMN15222733)	104,672,134	84%	28%	169,195
SAMN15222734	NA	longissimus dorsi (Bubalus bubalis, male, SAMN15222734)	108,274,434	85%	30%	173,655
SAMN15222735	NA	longissimus dorsi (Bubalus bubalis, male, SAMN15222735)	105,365,194	86%	31%	175,848
SAMN15222736	NA	longissimus dorsi (Bubalus bubalis, male, SAMN15222736)	106,828,636	84%	24%	175,107
SAMN15222737	NA	longissimus dorsi (Bubalus bubalis, male, SAMN15222737)	100,183,377	82%	25%	171,595
SAMN15222738	NA	longissimus dorsi (Bubalus bubalis, male, SAMN15222738)	103,416,606	83%	26%	173,222
SAMN17182570	34363886	testis (Bubalus bubalis, post natal months 4, male, SAMN17182570)	43,057,570	92%	42%	168,054
SAMN17182571	34363886	testis (Bubalus bubalis, post natal months 4, male, SAMN17182571)	40,730,298	91%	41%	166,258
SAMN17182572	34363886	testis (Bubalus bubalis, post natal months 4, male, SAMN17182572)	38,607,788	91%	42%	165,474
SAMN17182573	34363886	testis (Bubalus bubalis, post natal months 24, male, SAMN17182573)	42,877,044	90%	42%	166,412
SAMN17182574	34363886	testis (Bubalus bubalis, post natal months 24, male, SAMN17182574)	42,879,004	90%	43%	166,365
SAMN17182575	34363886	testis (Bubalus bubalis, post natal months 24, male, SAMN17182575)	43,037,688	90%	42%	166,465
SAMN17313764	34306003	Muscle (Bubalus bubalis, SAMN17313764)	60,247,382	86%	47%	173,508
SAMN17313765	34306003	Muscle (Bubalus bubalis, SAMN17313765)	65,025,170	85%	47%	173,476
SAMN17313766	34306003	Muscle (Bubalus bubalis, SAMN17313766)	64,813,710	86%	47%	175,343
SAMN17313767	34306003	Muscle (Bubalus bubalis, SAMN17313767)	63,795,084	83%	49%	177,063
SAMN17313768	34306003	Muscle (Bubalus bubalis, SAMN17313768)	61,516,490	85%	49%	178,309
SAMN17313769	34306003	Muscle (Bubalus bubalis, SAMN17313769)	57,596,044	86%	50%	176,256
SAMN18250201	34144152	ovary (Bubalus bubalis, SAMN18250201)	46,078,398	84%	39%	179,511
SAMN18250202	34144152	ovary (Bubalus bubalis, SAMN18250202)	40,090,196	82%	42%	173,899
SAMN18250203	34144152	ovary (Bubalus bubalis, SAMN18250203)	42,430,016	85%	41%	176,046
SAMN18250204	34144152	ovary (Bubalus bubalis, SAMN18250204)	42,697,580	79%	39%	177,549
SAMN18250205	34144152	ovary (Bubalus bubalis, SAMN18250205)	42,755,486	77%	41%	169,430
SAMN18250206	34144152	ovary (Bubalus bubalis, SAMN18250206)	50,926,312	81%	44%	179,900
SAMN18250207	34144152	ovary (Bubalus bubalis, SAMN18250207)	49,523,304	84%	38%	170,813
SAMN18250208	34144152	ovary (Bubalus bubalis, SAMN18250208)	47,627,998	82%	37%	170,733
SAMN18250209	34144152	ovary (Bubalus bubalis, SAMN18250209)	40,725,910	84%	38%	167,847
SAMN21220033	NA	brain1 (Bubalus bubalis, 5 years, female, SAMN21220033)	43,943,598	82%	40%	184,119
SAMN21220034	NA	Leg muscle1 (Bubalus bubalis, 5 years, female, SAMN21220034)	46,813,434	87%	42%	172,650
SAMN21220035	NA	uterus (Bubalus bubalis, 5 years, female, SAMN21220035)	43,941,274	87%	38%	192,240
SAMN21220036	NA	ovary (Bubalus bubalis, 5 years, female, SAMN21220036)	55,498,634	81%	44%	177,205
SAMN21220037	NA	Medulla oblongata (Bubalus bubalis, 5 years, female, SAMN21220037)	46,365,100	76%	40%	186,078
SAMN21220038	NA	hypothalamus (Bubalus bubalis, 5 years, female, SAMN21220038)	49,143,036	77%	39%	186,645
SAMN21220039	NA	brain2 (Bubalus bubalis, 5 years, female, SAMN21220039)	49,739,516	77%	40%	183,414
SAMN21220040	NA	lung (Bubalus bubalis, 5 years, female, SAMN21220040)	50,773,198	87%	38%	195,375
SAMN21220041	NA	brain3 (Bubalus bubalis, 5 years, female, SAMN21220041)	45,023,378	77%	42%	177,091
SAMN21220042	NA	muscle (Bubalus bubalis, 5 years, female, SAMN21220042)	43,736,134	88%	42%	167,816
SAMN21220043	NA	liver (Bubalus bubalis, 5 years, female, SAMN21220043)	52,332,246	86%	46%	175,599
SAMN21220044	NA	brain4 (Bubalus bubalis, 5 years, female, SAMN21220044)	48,081,592	76%	41%	175,346
SAMN21220045	NA	brain5 (Bubalus bubalis, 5 years, female, SAMN21220045)	49,355,978	70%	38%	171,558
SAMN21220046	NA	Pineal gland (Bubalus bubalis, 5 years, female, SAMN21220046)	54,742,336	84%	45%	193,243
SAMN21220047	NA	brain6 (Bubalus bubalis, 5 years, female, SAMN21220047)	46,641,716	78%	41%	177,348
SAMN21220048	NA	brain7 (Bubalus bubalis, 5 years, female, SAMN21220048)	46,647,426	76%	39%	172,923
SAMN21220049	NA	brain8 (Bubalus bubalis, 5 years, female, SAMN21220049)	56,662,426	76%	38%	181,908
SAMN21220050	NA	cerebellum (Bubalus bubalis, 5 years, female, SAMN21220050)	52,502,120	75%	40%	175,458
SAMN21220051	NA	spleen (Bubalus bubalis, 5 years, female, SAMN21220051)	52,944,322	83%	39%	184,166
SAMN21220052	NA	kidney (Bubalus bubalis, 5 years, female, SAMN21220052)	45,237,402	78%	40%	183,058
SAMN21220053	NA	heart (Bubalus bubalis, 5 years, female, SAMN21220053)	50,899,820	70%	46%	171,178
SAMN21220054	NA	fat (Bubalus bubalis, 5 years, female, SAMN21220054)	51,634,252	85%	36%	181,059
SAMN21220055	NA	Leg muscle2 (Bubalus bubalis, 5 years, female, SAMN21220055)	51,164,090	85%	39%	172,213
SAMN21220056	NA	tongue (Bubalus bubalis, 5 years, female, SAMN21220056)	56,674,088	68%	41%	173,711

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
ERR315620	ERX288768	ERP003627	SAMEA1968690	104,347,784	76%	25%
ERR315645	ERX288793	ERP003627	SAMEA2142479	41,702,096	81%	23%
ERR315618	ERX288766	ERP003627	SAMEA2144234	53,648,600	76%	25%
ERR315616	ERX288764	ERP003627	SAMEA2146571	51,046,542	76%	28%
ERR315621	ERX288769	ERP003627	SAMEA2150597	65,901,272	67%	18%
ERR315638	ERX288786	ERP003627	SAMEA2155194	10,628,524	23%	11%
ERR315622	ERX288770	ERP003627	SAMEA2156867	54,401,998	79%	25%
SRR4119663	SRX2100497	SRP051419	SAMN03256161	61,403,296	80%	20%
SRR2954814	SRX1440110	SRP066461	SAMN03855787	52,979,055	159%	25%
SRR4195884	SRX2144080	SRP086528	SAMN05730165	58,779,918	84%	36%
SRR4181744	SRX2144081	SRP086528	SAMN05730168	61,679,484	83%	39%
SRR4181754	SRX2144082	SRP086528	SAMN05730169	61,673,824	84%	39%
SRR4181761	SRX2144083	SRP086528	SAMN05730172	54,749,108	84%	38%
SRR4181765	SRX2144084	SRP086528	SAMN05730173	51,543,660	86%	39%
SRR4181757	SRX2139792	SRP086528	SAMN05730174	49,251,988	86%	42%
SRR4181770	SRX2139792	SRP086528	SAMN05730174	43,706,084	86%	43%
SRR4183457	SRX2139792	SRP086528	SAMN05730174	51,941,086	84%	43%
SRR4183458	SRX2139792	SRP086528	SAMN05730174	59,280,312	84%	41%
SRR4183461	SRX2139792	SRP086528	SAMN05730174	62,970,852	83%	39%
SRR4183463	SRX2139792	SRP086528	SAMN05730174	47,509,932	81%	41%
SRR4184632	SRX2139792	SRP086528	SAMN05730174	50,873,820	84%	42%
SRR4184633	SRX2139792	SRP086528	SAMN05730174	70,435,824	86%	42%
SRR4184634	SRX2139792	SRP086528	SAMN05730174	43,663,462	82%	35%
SRR4181774	SRX2144085	SRP086528	SAMN05730191	55,104,034	85%	42%
SRR4183466	SRX2144089	SRP086528	SAMN05730196	67,748,128	84%	43%
SRR4183467	SRX2144090	SRP086528	SAMN05730198	60,159,308	84%	45%
SRR4183491	SRX2144091	SRP086528	SAMN05730199	59,653,148	85%	41%
SRR4212918	SRX2144092	SRP086528	SAMN05730635	50,873,820	84%	42%
SRR5468160	SRX2753516	SRP104171	SAMN06758773	19,437,350	84%	24%
SRR5468161	SRX2753517	SRP104171	SAMN06758773	19,437,350	85%	24%
SRR5468162	SRX2753518	SRP104171	SAMN06758773	24,618,695	84%	22%
SRR5468163	SRX2753519	SRP104171	SAMN06758773	24,618,695	85%	22%
SRR5468164	SRX2753520	SRP104171	SAMN06758773	19,228,969	83%	21%
SRR5468165	SRX2753521	SRP104171	SAMN06758773	19,228,969	84%	21%
SRR6425315	SRX3517699	SRP127629	SAMN08271820	53,763,960	65%	35%
SRR6425314	SRX3517698	SRP127629	SAMN08271821	59,338,344	66%	39%
SRR6425313	SRX3517697	SRP127629	SAMN08271822	54,046,260	75%	38%
SRR6425312	SRX3517696	SRP127629	SAMN08271823	56,933,268	80%	42%
SRR6425311	SRX3517695	SRP127629	SAMN08271824	93,780,788	76%	43%
SRR6425310	SRX3517694	SRP127629	SAMN08271825	77,156,170	89%	40%
SRR6425309	SRX3517693	SRP127629	SAMN08271826	83,922,266	90%	39%
SRR6425308	SRX3517692	SRP127629	SAMN08271827	94,886,112	89%	40%
SRR6949372	SRX3892937	SRP137711	SAMN08871710	133,216,720	73%	15%
SRR6949371	SRX3892936	SRP137711	SAMN08871711	163,907,892	75%	16%
SRR6949370	SRX3892935	SRP137711	SAMN08871712	146,336,524	75%	16%
SRR6949369	SRX3892934	SRP137711	SAMN08871713	173,289,672	56%	15%
SRR6949368	SRX3892933	SRP137711	SAMN08871714	192,283,524	55%	12%
SRR6949367	SRX3892932	SRP137711	SAMN08871715	194,133,440	54%	12%
SRR7091388	SRX4020291	SRP144268	SAMN01093896	98,060,582	88%	29%
SRR7091387	SRX4020292	SRP144268	SAMN08991595	73,845,150	84%	26%
SRR7091390	SRX4020289	SRP144268	SAMN08991596	63,252,478	90%	38%
SRR7091389	SRX4020290	SRP144268	SAMN08991597	47,773,688	76%	35%
SRR7091392	SRX4020287	SRP144268	SAMN08991598	88,926,132	87%	30%
SRR7091391	SRX4020288	SRP144268	SAMN08991599	86,709,746	86%	25%
SRR7091394	SRX4020285	SRP144268	SAMN08991600	80,978,704	85%	28%
SRR7091393	SRX4020286	SRP144268	SAMN08991601	105,728,098	84%	24%
SRR7091396	SRX4020283	SRP144268	SAMN08991602	108,754,608	83%	25%
SRR7091395	SRX4020284	SRP144268	SAMN08991603	48,109,684	76%	30%
SRR7091398	SRX4020281	SRP144268	SAMN08991604	114,278,262	83%	27%
SRR7091397	SRX4020282	SRP144268	SAMN08991605	69,643,178	83%	28%
SRR8993015	SRX5771955	SRP194304	SAMN11523623	53,507,352	87%	51%
SRR8993016	SRX5771954	SRP194304	SAMN11523625	41,026,966	88%	60%
SRR8993013	SRX5771957	SRP194304	SAMN11523669	48,171,438	89%	62%
SRR8993014	SRX5771956	SRP194304	SAMN11523818	57,849,732	87%	53%
SRR8993011	SRX5771959	SRP194304	SAMN11523822	64,808,894	82%	43%
SRR8993012	SRX5771958	SRP194304	SAMN11523824	62,158,506	84%	42%
SRR8993010	SRX5771960	SRP194304	SAMN11523827	50,809,600	85%	45%
SRR8993018	SRX5771952	SRP194304	SAMN11523868	56,687,218	89%	58%
SRR8993017	SRX5771953	SRP194304	SAMN11523887	57,439,148	89%	60%
SRR10312261	SRX7023481	SRP226223	SAMN13059400	152,454,256	62%	32%
SRR10312260	SRX7023480	SRP226223	SAMN13059401	162,419,446	61%	31%
SRR10312259	SRX7023479	SRP226223	SAMN13059403	167,106,268	62%	31%
SRR10863221	SRX7533225	SRP241092	SAMN13814144	42,830,047	93%	42%
SRR10863220	SRX7533226	SRP241092	SAMN13814145	51,709,496	94%	42%
SRR10863209	SRX7533237	SRP241092	SAMN13814146	47,610,799	93%	42%
SRR10863208	SRX7533238	SRP241092	SAMN13814147	52,842,743	93%	42%
SRR10863207	SRX7533239	SRP241092	SAMN13814148	61,871,879	93%	43%
SRR10863206	SRX7533240	SRP241092	SAMN13814149	45,361,210	92%	41%
SRR10863205	SRX7533241	SRP241092	SAMN13814150	39,992,530	93%	43%
SRR10863204	SRX7533242	SRP241092	SAMN13814151	36,293,582	93%	42%
SRR10863203	SRX7533243	SRP241092	SAMN13814152	31,915,227	92%	40%
SRR10863202	SRX7533244	SRP241092	SAMN13814153	39,509,526	93%	42%
SRR10863219	SRX7533227	SRP241092	SAMN13814154	95,221,585	89%	43%
SRR10863218	SRX7533228	SRP241092	SAMN13814155	109,709,287	89%	40%
SRR10863217	SRX7533229	SRP241092	SAMN13814156	119,200,714	89%	41%
SRR10863216	SRX7533230	SRP241092	SAMN13814157	62,780,272	86%	39%
SRR10863215	SRX7533231	SRP241092	SAMN13814158	128,912,094	89%	42%
SRR10863214	SRX7533232	SRP241092	SAMN13814159	88,796,258	86%	42%
SRR10863213	SRX7533233	SRP241092	SAMN13814160	102,281,640	88%	40%
SRR10863212	SRX7533234	SRP241092	SAMN13814161	98,646,181	87%	41%
SRR10863211	SRX7533235	SRP241092	SAMN13814162	113,829,456	90%	40%
SRR10863210	SRX7533236	SRP241092	SAMN13814163	140,585,851	87%	42%
SRR11842091	SRX8392433	SRP263117	SAMN15010194	39,798,464	84%	38%
SRR11842090	SRX8392434	SRP263117	SAMN15010195	44,533,226	84%	38%
SRR11842089	SRX8392435	SRP263117	SAMN15010196	51,260,934	85%	38%
SRR11842088	SRX8392436	SRP263117	SAMN15010197	45,092,202	86%	40%
SRR11842087	SRX8392437	SRP263117	SAMN15010198	44,928,512	85%	37%
SRR11842086	SRX8392438	SRP263117	SAMN15010199	47,112,588	86%	36%
SRR12023715	SRX8555538	SRP267511	SAMN15222733	104,672,134	84%	28%
SRR12023714	SRX8555539	SRP267511	SAMN15222734	108,274,434	85%	30%
SRR12023713	SRX8555540	SRP267511	SAMN15222735	105,365,194	86%	31%
SRR12023712	SRX8555541	SRP267511	SAMN15222736	106,828,636	84%	24%
SRR12023711	SRX8555542	SRP267511	SAMN15222737	100,183,377	82%	25%
SRR12023710	SRX8555543	SRP267511	SAMN15222738	103,416,606	83%	26%
SRR13327310	SRX9754422	SRP299754	SAMN17182570	43,057,570	92%	42%
SRR13327309	SRX9754423	SRP299754	SAMN17182571	40,730,298	91%	41%
SRR13327308	SRX9754424	SRP299754	SAMN17182572	38,607,788	91%	42%
SRR13327313	SRX9754419	SRP299754	SAMN17182573	42,877,044	90%	42%
SRR13327312	SRX9754420	SRP299754	SAMN17182574	42,879,004	90%	43%
SRR13327311	SRX9754421	SRP299754	SAMN17182575	43,037,688	90%	42%
SRR13435639	SRX9849197	SRP301734	SAMN17313764	60,247,382	86%	47%
SRR13435638	SRX9849196	SRP301734	SAMN17313765	65,025,170	85%	47%
SRR13435637	SRX9849195	SRP301734	SAMN17313766	64,813,710	86%	47%
SRR13435636	SRX9849194	SRP301734	SAMN17313767	63,795,084	83%	49%
SRR13435635	SRX9849193	SRP301734	SAMN17313768	61,516,490	85%	49%
SRR13435634	SRX9849192	SRP301734	SAMN17313769	57,596,044	86%	50%
SRR13931415	SRX10310795	SRP310172	SAMN18250201	46,078,398	84%	39%
SRR13931414	SRX10310794	SRP310172	SAMN18250202	40,090,196	82%	42%
SRR13931413	SRX10310793	SRP310172	SAMN18250203	42,430,016	85%	41%
SRR13931412	SRX10310792	SRP310172	SAMN18250204	42,697,580	79%	39%
SRR13931411	SRX10310791	SRP310172	SAMN18250205	42,755,486	77%	41%
SRR13931410	SRX10310790	SRP310172	SAMN18250206	50,926,312	81%	44%
SRR13931409	SRX10310798	SRP310172	SAMN18250207	49,523,304	84%	38%
SRR13931408	SRX10310797	SRP310172	SAMN18250208	47,627,998	82%	37%
SRR13931407	SRX10310796	SRP310172	SAMN18250209	40,725,910	84%	38%
SRR15721761	SRX12017226	SRP335698	SAMN21220033	43,943,598	82%	40%
SRR15721760	SRX12017227	SRP335698	SAMN21220034	46,813,434	87%	42%
SRR15721749	SRX12017238	SRP335698	SAMN21220035	43,941,274	87%	38%
SRR15721744	SRX12017243	SRP335698	SAMN21220036	55,498,634	81%	44%
SRR15721743	SRX12017244	SRP335698	SAMN21220037	46,365,100	76%	40%
SRR15721742	SRX12017245	SRP335698	SAMN21220038	49,143,036	77%	39%
SRR15721741	SRX12017246	SRP335698	SAMN21220039	49,739,516	77%	40%
SRR15721739	SRX12017248	SRP335698	SAMN21220040	50,773,198	87%	38%
SRR15721738	SRX12017249	SRP335698	SAMN21220041	45,023,378	77%	42%
SRR15721759	SRX12017228	SRP335698	SAMN21220042	43,736,134	88%	42%
SRR15721758	SRX12017229	SRP335698	SAMN21220043	52,332,246	86%	46%
SRR15721757	SRX12017230	SRP335698	SAMN21220044	48,081,592	76%	41%
SRR15721756	SRX12017231	SRP335698	SAMN21220045	49,355,978	70%	38%
SRR15721755	SRX12017232	SRP335698	SAMN21220046	54,742,336	84%	45%
SRR15721754	SRX12017233	SRP335698	SAMN21220047	46,641,716	78%	41%
SRR15721753	SRX12017234	SRP335698	SAMN21220048	46,647,426	76%	39%
SRR15721752	SRX12017235	SRP335698	SAMN21220049	56,662,426	76%	38%
SRR15721751	SRX12017236	SRP335698	SAMN21220050	52,502,120	75%	40%
SRR15721750	SRX12017237	SRP335698	SAMN21220051	52,944,322	83%	39%
SRR15721740	SRX12017247	SRP335698	SAMN21220052	45,237,402	78%	40%
SRR15721745	SRX12017242	SRP335698	SAMN21220053	50,899,820	70%	46%
SRR15721746	SRX12017241	SRP335698	SAMN21220054	51,634,252	85%	36%
SRR15721747	SRX12017240	SRP335698	SAMN21220055	51,164,090	85%	39%
SRR15721748	SRX12017239	SRP335698	SAMN21220056	56,674,088	68%	41%
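
The per-sample read counts are the sums of the per-run read counts for that sample. As a minimal sketch, the code below aggregates a few rows copied from the run table and reproduces the sample-level total listed for SAMN06758773 (126,570,028 reads). Only a subset of runs is included to keep the example short.

```python
from collections import defaultdict

# (run, sample, number of reads) copied from the by-run table.
runs = [
    ("SRR5468160", "SAMN06758773", 19_437_350),
    ("SRR5468161", "SAMN06758773", 19_437_350),
    ("SRR5468162", "SAMN06758773", 24_618_695),
    ("SRR5468163", "SAMN06758773", 24_618_695),
    ("SRR5468164", "SAMN06758773", 19_228_969),
    ("SRR5468165", "SAMN06758773", 19_228_969),
    ("SRR4119663", "SAMN03256161", 61_403_296),
]

reads_per_sample = defaultdict(int)
for _run, sample, reads in runs:
    reads_per_sample[sample] += reads

for sample, reads in sorted(reads_per_sample.items()):
    print(f"{sample}\t{reads:,}")
```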

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	1,113	806 (72.42%)	806 (72.42%)	81.17%	93.35%
Same-species known RefSeq (NP_)	167	158 (94.61%)	158 (94.61%)	84.07%	90.53%
Homo sapiens known RefSeq (NP_)	62,832	45,891 (73.04%)	45,891 (73.04%)	78.20%	86.28%
Bos taurus known RefSeq (NP_)	14,210	13,490 (94.93%)	13,490 (94.93%)	79.97%	91.72%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

	First pass	Total
NDDB_SH_1 (Current) coverage	98.21%	98.50%
NDDB_DH_1 (Previous) coverage	98.57%	99.17%
Percent identity	99.58%	99.56%
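
The idea of a reciprocal best hit can be illustrated generically: a pair is kept when each member is the other's highest-scoring partner. The sketch below is not the NCBI assembly-assembly alignment procedure; the function name, region labels and scores are hypothetical, and ties are resolved by keeping the first pair encountered.

```python
def reciprocal_best_hits(scores: dict) -> list:
    """Return (current, previous) pairs that are each other's best hit.

    `scores` maps (current_region, previous_region) to an alignment score.
    """
    best_for_current = {}
    best_for_previous = {}
    for (current, previous), score in scores.items():
        if current not in best_for_current or score > best_for_current[current][1]:
            best_for_current[current] = (previous, score)
        if previous not in best_for_previous or score > best_for_previous[previous][1]:
            best_for_previous[previous] = (current, score)
    return sorted(
        (current, previous)
        for current, (previous, _score) in best_for_current.items()
        if best_for_previous[previous][0] == current
    )


# Hypothetical scores between regions of the current and previous assembly.
toy_scores = {
    ("cur1", "prev1"): 900,
    ("cur1", "prev2"): 400,
    ("cur2", "prev2"): 700,
    ("cur3", "prev2"): 650,
}
first_pass = reciprocal_best_hits(toy_scores)
print(first_pass)
```

Here `cur3` aligns best to `prev2`, but `prev2` aligns best to `cur2`, so the `cur3` alignment is non-reciprocal and would only be counted among the 'Total' alignments.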

Comparison of the current and previous annotations

The annotation produced for this release (103) was compared to the annotation in the previous release (102) for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	NDDB_SH_1 (Current) to NDDB_DH_1 (Previous)
Identical	70%
Minor changes	20%
Major changes	4%
New	5%
Deprecated	5%
Other	1%
Download the report	tabular, Genome Workbench
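
To make the notion of scoring by "overlap in exon sequence and matches in exon boundaries" more concrete, here is a minimal sketch. It is not the scoring used by the pipeline, whose formula is not given in this report. It assumes exons are 1-based inclusive intervals that do not overlap within a transcript, uses shared bases over the union of bases as the overlap score, and counts identical exon starts and ends as boundary matches. All names and coordinates are hypothetical.

```python
def total_length(exons):
    """Summed length of exons given as (start, end), 1-based inclusive."""
    return sum(end - start + 1 for start, end in exons)


def shared_length(exons_a, exons_b):
    """Number of bases covered by both exon sets."""
    shared = 0
    for start_a, end_a in exons_a:
        for start_b, end_b in exons_b:
            shared += max(0, min(end_a, end_b) - max(start_a, start_b) + 1)
    return shared


def exon_overlap_score(current, previous):
    """Shared exon bases divided by the union of exon bases (0.0 to 1.0)."""
    shared = shared_length(current, previous)
    union = total_length(current) + total_length(previous) - shared
    return shared / union


def matching_boundaries(current, previous):
    """Count exon starts and ends that are identical in both features."""
    def boundaries(exons):
        starts = {("start", start) for start, _end in exons}
        ends = {("end", end) for _start, end in exons}
        return starts | ends
    return len(boundaries(current) & boundaries(previous))


# Hypothetical two-exon transcript whose last exon was extended by 20 bases.
current_exons = [(100, 200), (300, 400)]
previous_exons = [(100, 200), (300, 380)]

score = exon_overlap_score(current_exons, previous_exons)
n_matching = matching_boundaries(current_exons, previous_exons)
print(f"overlap score: {score:.3f}, matching boundaries: {n_matching} of 4")
```

Under a scheme like this, a pair with a high overlap score but one shifted boundary would be a natural candidate for the "Minor changes" category, although the actual thresholds are not stated in the report.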

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
Minimap2: Li H. Bioinformatics 2018, 34(18):3094-3100
