NCBI Falco cherrug Annotation Release GCF_023634085.1-RS_2023_04

The genome sequence records for Falco cherrug RefSeq assembly GCF_023634085.1 (bFalChe1.pri) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
BUSCO results: Annotation completeness assessed with BUSCO
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction
Similarity of current and previous assembly: The similarity of the current and previous assembly
Comparison of the current and previous annotations: What proportion of the genes changed in this annotation

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as "GCF_023634085.1-RS_2023_04".

Date of Entrez queries for transcripts and proteins: Apr 26 2023
Date of submission of annotation to the public databases: May 2 2023
Software version: 10.1
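
The release name combines the assembly accession with a release tag. A minimal sketch of splitting it follows, assuming the `RS_YYYY_MM` suffix encodes a year and month; that reading is an assumption based on the dates above, and `parse_release_name` is a hypothetical helper, not part of any NCBI tool.

```python
import re

# Assumed layout: <assembly accession>-RS_<year>_<month>
RELEASE_NAME = re.compile(r"^(GC[AF]_\d{9}\.\d+)-RS_(\d{4})_(\d{2})$")


def parse_release_name(name):
    """Split an annotation release name into accession, year and month."""
    match = RELEASE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    return {
        "assembly_accession": match.group(1),
        "year": int(match.group(2)),
        "month": int(match.group(3)),
    }


release = parse_release_name("GCF_023634085.1-RS_2023_04")
print(release)
```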

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
bFalChe1.pri	GCF_023634085.1	Vertebrate Genomes Project	06-02-2022	Reference	25 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	bFalChe1.pri
Genes and pseudogenes	19,950
protein-coding	16,251
non-coding	3,462
Transcribed pseudogenes	15
Non-transcribed pseudogenes	187
genes with variants	9,049
Immunoglobulin/T-cell receptor gene segments	18
other	17
mRNAs	42,949
fully-supported	41,895
with > 5% ab initio	532
partial	126
with filled gap(s)	0
known RefSeq (NM_)	4
model RefSeq (XM_)	42,945
non-coding RNAs	7,597
fully-supported	6,940
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	7,326
pseudo transcripts	15
fully-supported	7
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	15
CDSs	42,962
fully-supported	41,895
with > 5% ab initio	640
partial	128
with major correction(s)	821
known RefSeq (NP_)	4
model RefSeq (XP_)	42,958

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	19,730	34,522	13,321	60	1,210,443
All transcripts	50,546	4,230	3,420	60	95,515
mRNA	42,949	4,295	3,546	180	95,515
misc_RNA	2,171	4,117	3,177	195	32,994
tRNA	269	74	73	66	89
lncRNA	4,775	4,240	2,690	100	37,060
snoRNA	217	107	91	62	321
snRNA	45	150	164	60	192
rRNA	103	811	119	118	4,268
Single-exon transcripts	720	2,149	1,457	180	15,757
coding transcripts (NM_/XM_ )	720	2,149	1,457	180	15,757
CDSs	42,962	2,184	1,593	96	94,530
Exons	224,228	390	140	1	31,930
in coding transcripts (NM_/XM_ )	208,361	343	138	1	31,930
in non-coding transcripts (NR_/XR_ )	28,714	646	151	11	30,992
Introns	201,537	4,175	1,055	30	803,409
in coding transcripts (NM_/XM_ )	190,225	4,063	1,035	30	803,409
in non-coding transcripts (NR_/XR_ )	23,852	4,601	1,214	30	460,444

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	2.58	1	1	50
Number of exons per transcript	13.08	10	1	258

BUSCO analysis of gene annotation

BUSCO v4.1.4 was run in "protein" mode on the annotated gene set, using the longest protein per gene and the aves_odb10 lineage dataset. Results are reported for the gene set from the primary assembly unit, and presented in BUSCO notation.

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 16,238 coding genes, 15,651 genes had a protein with an alignment covering 50% or more of the query and 10,971 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

Query: annotated proteins
Target: UniProtKB/Swiss-Prot curated proteins
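
The coverage definitions above can be sketched in a few lines. The example below is illustrative only: the `Hit` record, its field names, and the 1-based inclusive coordinates are assumptions, and the toy hits are not taken from this annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One protein alignment; coordinates are assumed 1-based and inclusive."""
    query_start: int
    query_end: int
    query_len: int
    target_start: int
    target_end: int
    target_len: int


def query_coverage(hit):
    """Percentage of the annotated protein length included in the alignment."""
    return 100.0 * (hit.query_end - hit.query_start + 1) / hit.query_len


def target_coverage(hit):
    """Percentage of the target protein length included in the alignment."""
    return 100.0 * (hit.target_end - hit.target_start + 1) / hit.target_len


def count_at_or_above(coverages, threshold):
    """Number of values meeting a threshold, as plotted on a cumulative graph."""
    return sum(1 for value in coverages if value >= threshold)


# Toy hits (hypothetical values).
hits = [
    Hit(1, 100, 100, 1, 100, 200),
    Hit(11, 70, 100, 1, 60, 60),
    Hit(1, 40, 100, 5, 44, 50),
]
query_cov = [query_coverage(h) for h in hits]
target_cov = [target_coverage(h) for h in hits]
print(count_at_or_above(query_cov, 50), count_at_or_above(query_cov, 95))
```

With real data, one value per gene (for example the best-covered protein of each gene) would be passed to `count_at_or_above`; how the report selects that protein is not stated here.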

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker (if calculated), for each assembly. RepeatMasker results are only calculated for organisms with complete Dfam HMM model collections.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with WindowMasker
bFalChe1.pri	GCF_023634085.1	24.31%
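
A masked percentage of this kind can be approximated from a soft-masked FASTA file. The sketch below assumes masked bases are written in lowercase and counts every character of every record; the report does not say how its figure treats gaps or Ns, so the result may not match exactly. Both function names are hypothetical.

```python
def read_fasta_sequences(lines):
    """Yield one concatenated sequence string per FASTA record."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def percent_masked(sequences):
    """Percentage of bases written in lowercase (assumed to mean masked)."""
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return 100.0 * masked / total if total else 0.0


# Toy input: 20 bases in two records, 8 of them lowercase.
toy_fasta = [">chr1", "ACGTacgtAC", "GT", ">chr2", "acgtACGT"]
toy_percent = percent_masked(read_fasta_sequences(toy_fasta))
print(f"{toy_percent:.2f}%")
```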

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez Nucleotide, Entrez Protein, and SRA, and aligned to the genome.

Transcript alignments

The alignments of the following transcripts with Splign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species known RefSeq (NM_/NR_)	4	4 (100.00%)	4 (100.00%)	99.91%	100.00%
Same-species Genbank	1	1 (100.00%)	1 (100.00%)	100.00%	100.00%
Aves known RefSeq (NM_/NR_)	11,268	9,743 (86.47%)	3,991 (35.42%)	90.79%	86.47%
Aves Genbank	44,588	30,390 (68.16%)	14,184 (31.81%)	90.86%	91.13%
Aves TSA	684,293	476,416 (69.62%)	82,634 (12.08%)	98.13%	98.66%
Aves EST	756,952	272,288 (35.97%)	170,455 (22.52%)	91.35%	97.05%
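
The percentages in the table appear to be relative to the number of sequences retrieved from Entrez. The short sketch below recomputes them for the Aves known RefSeq row; the helper names are hypothetical and the denominator is an assumption that the arithmetic happens to support.

```python
import re

# Matches table cells such as "9,743 (86.47%)".
COUNT_PERCENT = re.compile(r"^\s*([\d,]+)\s*\(([\d.]+)%\)\s*$")


def parse_count_percent(cell):
    """Return (count, percent) from a 'count (percent%)' table cell."""
    match = COUNT_PERCENT.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1).replace(",", "")), float(match.group(2))


def percent_of(count, total):
    """Percentage of total, rounded to two decimals as in the table."""
    return round(100.0 * count / total, 2)


retrieved = 11268  # Aves known RefSeq (NM_/NR_) sequences retrieved
aligned, aligned_pct = parse_count_percent("9,743 (86.47%)")
passed, passed_pct = parse_count_percent("3,991 (35.42%)")

print(percent_of(aligned, retrieved), aligned_pct)
print(percent_of(passed, retrieved), passed_pct)
```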

RNA-Seq alignments

The alignments of the following RNA-Seq reads with STAR were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	4,910,971,389	56%	27%	242,744
SAMD00156794	NA	skin (Falco peregrinus, SAMD00156794)	16,031,718	50%	40%	121,856
SAMN01055120	9724766,12078637,23525076	Falco peregrinus reads sequenced by BGI (Falco, SAMN01055120)	79,571,270	80%	27%	114,467
SAMN01055121	NA	Falco cherrug reads sequenced by BGI (Falco, SAMN01055121)	76,863,908	85%	25%	116,634
SAMN04525554	NA	liver (Falco sparverius, male, SAMN04525554)	17,456,911	83%	12%	70,298
SAMN04525555	NA	liver (Falco sparverius, female, SAMN04525555)	278,044,640	76%	11%	131,441
SAMN04525556	NA	liver (Falco sparverius, female, SAMN04525556)	39,385,559	84%	12%	99,446
SAMN04525557	NA	liver (Falco sparverius, male, SAMN04525557)	34,611,542	81%	10%	89,414
SAMN04525558	NA	liver (Falco sparverius, female, SAMN04525558)	38,265,091	80%	11%	87,064
SAMN04525559	NA	liver (Falco sparverius, female, SAMN04525559)	31,591,785	84%	11%	101,091
SAMN04525560	NA	liver (Falco sparverius, female, SAMN04525560)	39,434,485	86%	11%	97,877
SAMN04525561	NA	liver (Falco sparverius, female, SAMN04525561)	28,290,295	82%	10%	91,168
SAMN04525562	NA	liver (Falco sparverius, female, SAMN04525562)	34,294,777	82%	11%	92,784
SAMN04531380	NA	retina and cochlea (Falco tinnunculus, SAMN04531380)	100,655,654	77%	26%	187,036
SAMN04531384	NA	retina and cochlea (Falco subbuteo, SAMN04531384)	100,188,960	78%	28%	190,867
SAMN05831928	NA	Liver (Falco sparverius, 24 Hours, male, SAMN05831928)	78,042,186	70%	28%	157,199
SAMN06101813	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101813)	80,936,692	70%	29%	155,714
SAMN06101814	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101814)	87,855,668	69%	28%	161,941
SAMN06101815	NA	Liver (Falco sparverius, 24 Hours, male, SAMN06101815)	115,745,142	74%	27%	116,026
SAMN06101816	NA	Liver (Falco sparverius, 24 Hours, male, SAMN06101816)	84,219,458	72%	28%	159,798
SAMN06101817	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101817)	84,760,702	67%	27%	158,704
SAMN06101818	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101818)	84,100,882	71%	29%	161,986
SAMN06101819	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101819)	86,763,472	71%	28%	162,312
SAMN06101820	NA	Liver (Falco sparverius, 24 Hours, male, SAMN06101820)	84,265,484	71%	27%	158,068
SAMN06101821	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101821)	87,216,286	72%	26%	166,636
SAMN06101822	NA	Liver (Falco sparverius, 24 Hours, male, SAMN06101822)	100,556,822	69%	28%	167,525
SAMN06101823	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101823)	86,700,708	71%	28%	163,025
SAMN06101824	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101824)	95,780,238	69%	25%	165,716
SAMN06101825	NA	Liver (Falco sparverius, 24 Hours, male, SAMN06101825)	83,002,084	72%	28%	162,464
SAMN06101826	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101826)	89,253,326	69%	27%	162,889
SAMN06101827	NA	Liver (Falco sparverius, 24 Hours, male, SAMN06101827)	87,793,098	74%	27%	167,623
SAMN06101828	NA	Liver (Falco sparverius, 24 Hours, male, SAMN06101828)	86,581,814	72%	27%	164,923
SAMN06101829	NA	Liver (Falco sparverius, 24 Hours, female, SAMN06101829)	74,998,584	75%	28%	162,941
SAMN08398480	31464627	Blood (Falco tinnunculus, SAMN08398480)	64,250,348	87%	22%	138,090
SAMN13755181	NA	blood (Falco sparverius, SAMN13755181)	103,522,512	88%	8%	96,153
SAMN26236054	NA	Spleen (Falco sparverius, male, SAMN26236054)	58,432,140	53%	44%	152,557
SAMN26236055	NA	Spleen (Falco sparverius, female, SAMN26236055)	51,045,524	43%	43%	147,544
SAMN26236056	NA	Spleen (Falco sparverius, female, SAMN26236056)	53,357,984	45%	37%	151,550
SAMN26236057	NA	Spleen (Falco sparverius, male, SAMN26236057)	53,913,832	44%	36%	153,774
SAMN26236058	NA	Spleen (Falco sparverius, female, SAMN26236058)	46,161,094	43%	36%	151,897
SAMN26236059	NA	Spleen (Falco sparverius, female, SAMN26236059)	52,435,956	44%	36%	153,519
SAMN26236060	NA	Spleen (Falco sparverius, male, SAMN26236060)	48,616,910	45%	38%	150,345
SAMN26236061	NA	Spleen (Falco sparverius, female, SAMN26236061)	57,356,480	46%	37%	159,099
SAMN26236062	NA	Spleen (Falco sparverius, female, SAMN26236062)	52,362,412	45%	37%	155,363
SAMN26236063	NA	Spleen (Falco sparverius, female, SAMN26236063)	46,644,860	42%	37%	148,511
SAMN26236064	NA	Spleen (Falco sparverius, male, SAMN26236064)	51,732,110	47%	36%	152,607
SAMN26236065	NA	Spleen (Falco sparverius, female, SAMN26236065)	49,677,822	42%	37%	152,900
SAMN26236066	NA	Spleen (Falco sparverius, male, SAMN26236066)	48,403,000	44%	38%	151,390
SAMN26236067	NA	Spleen (Falco sparverius, male, SAMN26236067)	52,372,850	45%	37%	152,715
SAMN26236068	NA	Spleen (Falco sparverius, female, SAMN26236068)	57,755,824	49%	36%	160,360
SAMN26236069	NA	Spleen (Falco sparverius, female, SAMN26236069)	50,609,016	43%	35%	155,080
SAMN26236070	NA	Spleen (Falco sparverius, male, SAMN26236070)	51,891,904	46%	35%	155,501
SAMN26236071	NA	Spleen (Falco sparverius, female, SAMN26236071)	48,444,062	45%	34%	152,543
SAMN26236072	NA	Spleen (Falco sparverius, female, SAMN26236072)	49,833,438	45%	34%	153,516
SAMN26236073	NA	Spleen (Falco sparverius, female, SAMN26236073)	47,113,338	44%	36%	151,738
SAMN26236074	NA	Spleen (Falco sparverius, male, SAMN26236074)	55,051,960	49%	35%	155,998
SAMN26236075	NA	Spleen (Falco sparverius, male, SAMN26236075)	54,310,696	45%	34%	156,785
SAMN26236076	NA	Spleen (Falco sparverius, female, SAMN26236076)	50,193,232	45%	35%	154,300
SAMN26236077	NA	Spleen (Falco sparverius, female, SAMN26236077)	50,721,640	45%	35%	157,302
SAMN26236078	NA	Spleen (Falco sparverius, male, SAMN26236078)	54,423,166	43%	36%	149,714
SAMN26236079	NA	Spleen (Falco sparverius, female, SAMN26236079)	49,659,604	45%	36%	151,840
SAMN26236080	NA	Spleen (Falco sparverius, male, SAMN26236080)	51,596,012	47%	35%	154,605
SAMN26236081	NA	Spleen (Falco sparverius, female, SAMN26236081)	54,776,994	44%	36%	154,693
SAMN26236082	NA	Spleen (Falco sparverius, male, SAMN26236082)	49,229,758	46%	36%	152,995
SAMN26236083	NA	Spleen (Falco sparverius, female, SAMN26236083)	47,271,898	44%	37%	150,363
SAMN26236084	NA	Spleen (Falco sparverius, female, SAMN26236084)	47,114,500	46%	36%	151,129
SAMN26236085	NA	Spleen (Falco sparverius, male, SAMN26236085)	49,606,200	45%	37%	151,437
SAMN26236086	NA	Spleen (Falco sparverius, male, SAMN26236086)	46,409,950	47%	37%	150,386
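
For downstream use, a row of the per-sample table can be parsed into typed fields. This is a minimal sketch assuming tab-separated columns in the order shown in the header; `SampleStats` and `parse_sample_row` are hypothetical names. The aligned-read estimate is approximate because the table reports whole-number percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleStats:
    sample_id: str
    publication: str
    track_name: str
    reads: int
    pct_aligned: int
    pct_with_introns: int
    introns: int


def parse_sample_row(line):
    """Parse one tab-separated row of the per-sample RNA-Seq table."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    sample_id, publication, track, reads, aligned, with_introns, introns = fields
    return SampleStats(
        sample_id=sample_id,
        publication=publication,
        track_name=track,
        reads=int(reads.replace(",", "")),
        pct_aligned=int(aligned.rstrip("%")),
        pct_with_introns=int(with_introns.rstrip("%")),
        introns=int(introns.replace(",", "")),
    )


def estimated_aligned_reads(stats):
    """Approximate aligned read count from the rounded percentage."""
    return round(stats.reads * stats.pct_aligned / 100)


row = "\t".join([
    "SAMN01055121", "NA",
    "Falco cherrug reads sequenced by BGI (Falco, SAMN01055121)",
    "76,863,908", "85%", "25%", "116,634",
])
sample = parse_sample_row(row)
print(sample.sample_id, estimated_aligned_reads(sample))
```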

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR350760	DRX336773	DRP005438	SAMD00156794	16,031,718	50%	40%
SRR522906	SRX160529	SRP013939	SAMN01055120	79,571,270	80%	27%
SRR522907	SRX160530	SRP018394	SAMN01055121	76,863,908	85%	25%
SRR3203231	SRX1612734	SRP071126	SAMN04531380	100,655,654	77%	26%
SRR3203238	SRX1612742	SRP071126	SAMN04531384	100,188,960	78%	28%
SRR3217264	SRX1624460	SRP071583	SAMN04525554	17,456,911	83%	12%
SRR3217266	SRX1624462	SRP071583	SAMN04525555	278,044,640	76%	11%
SRR3217265	SRX1624461	SRP071583	SAMN04525556	39,385,559	84%	12%
SRR3217261	SRX1624457	SRP071583	SAMN04525557	34,611,542	81%	10%
SRR3217262	SRX1624458	SRP071583	SAMN04525558	38,265,091	80%	11%
SRR3217263	SRX1624459	SRP071583	SAMN04525559	31,591,785	84%	11%
SRR3217258	SRX1624454	SRP071583	SAMN04525560	39,434,485	86%	11%
SRR3217259	SRX1624455	SRP071583	SAMN04525561	28,290,295	82%	10%
SRR3217260	SRX1624456	SRP071583	SAMN04525562	34,294,777	82%	11%
SRR5070564	SRX2390435	SRP094478	SAMN05831928	78,042,186	70%	28%
SRR5270429	SRX2574481	SRP094478	SAMN06101813	80,936,692	70%	29%
SRR5270428	SRX2574480	SRP094478	SAMN06101814	87,855,668	69%	28%
SRR5270427	SRX2574479	SRP094478	SAMN06101815	115,745,142	74%	27%
SRR5270426	SRX2574478	SRP094478	SAMN06101816	84,219,458	72%	28%
SRR5270425	SRX2574477	SRP094478	SAMN06101817	84,760,702	67%	27%
SRR5270424	SRX2574476	SRP094478	SAMN06101818	84,100,882	71%	29%
SRR5270423	SRX2574475	SRP094478	SAMN06101819	86,763,472	71%	28%
SRR5270422	SRX2574474	SRP094478	SAMN06101820	84,265,484	71%	27%
SRR5270421	SRX2574473	SRP094478	SAMN06101821	87,216,286	72%	26%
SRR5270420	SRX2574472	SRP094478	SAMN06101822	100,556,822	69%	28%
SRR5270419	SRX2574471	SRP094478	SAMN06101823	86,700,708	71%	28%
SRR5270418	SRX2574470	SRP094478	SAMN06101824	95,780,238	69%	25%
SRR5270417	SRX2574469	SRP094478	SAMN06101825	83,002,084	72%	28%
SRR5270416	SRX2574468	SRP094478	SAMN06101826	89,253,326	69%	27%
SRR5270415	SRX2574467	SRP094478	SAMN06101827	87,793,098	74%	27%
SRR5270414	SRX2574466	SRP094478	SAMN06101828	86,581,814	72%	27%
SRR5270413	SRX2574465	SRP094478	SAMN06101829	74,998,584	75%	28%
SRR6650831	SRX3628421	SRP131743	SAMN08398480	64,250,348	87%	22%
SRR10853095	SRX7523253	SRP240625	SAMN13755181	103,522,512	88%	8%
SRR18135033	SRX14283047	SRP361390	SAMN26236054	58,432,140	53%	44%
SRR18135032	SRX14283048	SRP361390	SAMN26236055	51,045,524	43%	43%
SRR18135021	SRX14283059	SRP361390	SAMN26236056	53,357,984	45%	37%
SRR18135010	SRX14283070	SRP361390	SAMN26236057	53,913,832	44%	36%
SRR18135006	SRX14283074	SRP361390	SAMN26236058	46,161,094	43%	36%
SRR18135005	SRX14283075	SRP361390	SAMN26236059	52,435,956	44%	36%
SRR18135004	SRX14283076	SRP361390	SAMN26236060	48,616,910	45%	38%
SRR18135003	SRX14283077	SRP361390	SAMN26236061	57,356,480	46%	37%
SRR18135002	SRX14283078	SRP361390	SAMN26236062	52,362,412	45%	37%
SRR18135001	SRX14283079	SRP361390	SAMN26236063	46,644,860	42%	37%
SRR18135031	SRX14283049	SRP361390	SAMN26236064	51,732,110	47%	36%
SRR18135030	SRX14283050	SRP361390	SAMN26236065	49,677,822	42%	37%
SRR18135029	SRX14283051	SRP361390	SAMN26236066	48,403,000	44%	38%
SRR18135028	SRX14283052	SRP361390	SAMN26236067	52,372,850	45%	37%
SRR18135027	SRX14283053	SRP361390	SAMN26236068	57,755,824	49%	36%
SRR18135026	SRX14283054	SRP361390	SAMN26236069	50,609,016	43%	35%
SRR18135025	SRX14283055	SRP361390	SAMN26236070	51,891,904	46%	35%
SRR18135024	SRX14283056	SRP361390	SAMN26236071	48,444,062	45%	34%
SRR18135023	SRX14283057	SRP361390	SAMN26236072	49,833,438	45%	34%
SRR18135022	SRX14283058	SRP361390	SAMN26236073	47,113,338	44%	36%
SRR18135020	SRX14283060	SRP361390	SAMN26236074	55,051,960	49%	35%
SRR18135019	SRX14283061	SRP361390	SAMN26236075	54,310,696	45%	34%
SRR18135018	SRX14283062	SRP361390	SAMN26236076	50,193,232	45%	35%
SRR18135017	SRX14283063	SRP361390	SAMN26236077	50,721,640	45%	35%
SRR18135016	SRX14283064	SRP361390	SAMN26236078	54,423,166	43%	36%
SRR18135015	SRX14283065	SRP361390	SAMN26236079	49,659,604	45%	36%
SRR18135014	SRX14283066	SRP361390	SAMN26236080	51,596,012	47%	35%
SRR18135013	SRX14283067	SRP361390	SAMN26236081	54,776,994	44%	36%
SRR18135012	SRX14283068	SRP361390	SAMN26236082	49,229,758	46%	36%
SRR18135011	SRX14283069	SRP361390	SAMN26236083	47,271,898	44%	37%
SRR18135009	SRX14283071	SRP361390	SAMN26236084	47,114,500	46%	36%
SRR18135008	SRX14283072	SRP361390	SAMN26236085	49,606,200	45%	37%
SRR18135007	SRX14283073	SRP361390	SAMN26236086	46,409,950	47%	37%

Protein alignments

The alignments of the following proteins with ProSplign were used for gene prediction:

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Xenopus known RefSeq (NP_)	19,237	17,714 (92.08%)	17,714 (92.08%)	70.17%	79.46%
Aves GenBank	15,357	14,093 (91.77%)	14,093 (91.77%)	73.96%	85.14%
Aves known RefSeq (NP_)	10,012	9,753 (97.41%)	9,753 (97.41%)	77.80%	85.84%
Columba livia high-quality model RefSeq (XP_)	8,292	8,217 (99.10%)	8,217 (99.10%)	78.88%	86.59%
Gallus gallus high-quality model RefSeq (XP_)	9,975	9,647 (96.71%)	9,647 (96.71%)	77.48%	84.75%
Parus major high-quality model RefSeq (XP_)	11,979	11,848 (98.91%)	11,848 (98.91%)	78.08%	85.36%
Homo sapiens known RefSeq (NP_)	66,927	57,967 (86.61%)	57,967 (86.61%)	71.49%	77.13%

Assembly-assembly alignments of current to previous assembly

When the assembly changes between two rounds of annotation, genes in the current and the previous annotation are mapped to each other using the genomic alignments of the current assembly to the previous assembly so that gene identifiers can be preserved. The success of the remapping depends largely on how well the two assembly versions align to each other.

Below are the percent coverage of one assembly by the other and the average percent identity of the alignments. The 'First pass' alignments are reciprocal best hits, while the 'Total' alignments also include 'Second pass' or non-reciprocal best alignments. For more information about the assembly-assembly alignment process, please visit the NCBI Genome Remapping Service page.

	First Pass	Total
bFalChe1.pri (Current) Coverage	87.20%	88.15%
F_cherrug_v1.0 (Previous) Coverage	99.05%	99.24%
Percent Identity	96.95%	96.89%
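
The idea of reciprocal best hits behind the 'First pass' alignments can be illustrated with a toy example. This is a generic sketch of the concept, not the remapping service's implementation; the region names, scores, and function names are all hypothetical.

```python
def best_hits(scores):
    """Map each source region to its highest-scoring partner region."""
    best = {}
    for (source, partner), score in scores.items():
        if source not in best or score > best[source][1]:
            best[source] = (partner, score)
    return {source: partner for source, (partner, _) in best.items()}


def reciprocal_best_hits(current_to_previous, previous_to_current):
    """Pairs in which each region is the other's best hit."""
    forward = best_hits(current_to_previous)
    reverse = best_hits(previous_to_current)
    return sorted(
        (cur, prev) for cur, prev in forward.items() if reverse.get(prev) == cur
    )


# Toy alignment scores keyed by (source region, partner region).
current_to_previous = {
    ("c1", "p1"): 90, ("c1", "p2"): 50,
    ("c2", "p2"): 70, ("c3", "p1"): 60,
}
previous_to_current = {
    ("p1", "c1"): 90, ("p1", "c3"): 60,
    ("p2", "c2"): 70, ("p2", "c1"): 50,
}
first_pass = reciprocal_best_hits(current_to_previous, previous_to_current)
print(first_pass)
```

In this toy case `c3` has a best hit to `p1`, but `p1` prefers `c1`, so the `c3` alignment is non-reciprocal and would only be counted in a 'Total' style figure.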

Comparison of the current and previous annotations

The annotations produced for this release were compared to the annotations in the previous release for each assembly annotated in both releases. Scores for current and previous gene and transcript features were calculated based on overlap in exon sequence and matches in exon boundaries. Pairs of current and previous features were categorized based on these scores, whether they are reciprocal best matches, and changes in attributes (gene biotype, completeness, etc.). If the assembly was updated between the two releases, alignments between the current and the previous assembly were used to match the current and previous gene and transcript features in mapped regions.

The table below summarizes the changes in the gene set for each assembly as a percent of the number of genes in the current annotation release, and provides links to the details of the comparison in tabular format and in a Genome Workbench project.

	bFalChe1.pri (Current) to F_cherrug_v1.0 (Previous)
Identical	3%
Minor changes	64%
Major changes	15%
New	17%
Deprecated	11%
Other	1%
Download the report	tabular, Genome Workbench
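
One way to read the table, offered as an assumption rather than a documented rule: deprecated genes exist only in the previous release, so the remaining categories partition the current gene set. The small check below shows that the reported percentages are consistent with that reading.

```python
# Percentages as reported in the comparison table.
changes = {
    "Identical": 3,
    "Minor changes": 64,
    "Major changes": 15,
    "New": 17,
    "Deprecated": 11,
    "Other": 1,
}


def current_release_total(percentages):
    """Sum of all categories except 'Deprecated' (assumed previous-only)."""
    return sum(v for name, v in percentages.items() if name != "Deprecated")


total_current = current_release_total(changes)
print(total_current)
```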

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
BUSCO: Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. Molecular Biology and Evolution 2021, 38(10):4647-4654
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
STAR: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. Bioinformatics 2013 Jan 1;29(1):15-21.
Minimap2: Li H. Bioinformatics 2018 Sep 15;34(18):3094-3100
