NCBI Actinia tenebrosa Annotation Release 100

The RefSeq genome records for Actinia tenebrosa were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates and the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Actinia tenebrosa Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Nov 18 2019
Date of submission of annotation to the public databases: Nov 25 2019
Software version: 8.2

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
ASM960242v1	GCF_009602425.1	QUT	11-06-2019	Reference	1 assembled chromosome; unplaced scaffolds

Gene and feature statistics

Counts and lengths of annotated features are provided below for each assembly.

Feature counts

Feature	ASM960242v1
Genes and pseudogenes	22,927
  protein-coding	19,980
  non-coding	2,469
  transcribed pseudogenes	0
  non-transcribed pseudogenes	478
  genes with variants	4,104
  immunoglobulin/T-cell receptor gene segments	0
  other	0
mRNAs	27,024
  fully-supported	23,835
  with > 5% ab initio	1,805
  partial	2,766
  with filled gap(s)	1,985
  known RefSeq (NM_)	0
  model RefSeq (XM_)	27,024
non-coding RNAs	3,261
  fully-supported	2,551
  with > 5% ab initio	0
  partial	7
  with filled gap(s)	7
  known RefSeq (NR_)	0
  model RefSeq (XR_)	2,642
pseudo transcripts	0
  fully-supported	0
  with > 5% ab initio	0
  partial	0
  with filled gap(s)	0
  known RefSeq (NR_)	0
  model RefSeq (XR_)	0
CDSs	27,037
  fully-supported	23,835
  with > 5% ab initio	1,924
  partial	2,644
  with major correction(s)	1,001
  known RefSeq (NP_)	0
  model RefSeq (XP_)	27,037
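The parent rows of the feature-count table are sums of their category rows. The minimal Python sketch below checks this, assuming that the protein-coding, non-coding and two pseudogene rows partition the gene total, while "genes with variants" is treated as an attribute that overlaps those categories rather than a separate one. It reproduces the 22,927 total, the 22,449 genes (pseudogenes excluded) used in the detailed reports, and the 30,285 transcripts.

```python
# Counts copied from the "Feature counts" table for ASM960242v1.
# Assumption: these four rows partition "Genes and pseudogenes";
# "genes with variants" overlaps them and is therefore left out.
gene_categories = {
    "protein-coding": 19_980,
    "non-coding": 2_469,
    "transcribed pseudogenes": 0,
    "non-transcribed pseudogenes": 478,
}
transcript_types = {"mRNAs": 27_024, "non-coding RNAs": 3_261, "pseudo transcripts": 0}


def total(counts: dict[str, int]) -> int:
    """Sum the category counts that make up a parent row."""
    return sum(counts.values())


genes_and_pseudogenes = total(gene_categories)
# The detailed reports exclude pseudogenes, so subtract both pseudogene rows.
genes_without_pseudogenes = genes_and_pseudogenes - (
    gene_categories["transcribed pseudogenes"]
    + gene_categories["non-transcribed pseudogenes"]
)
all_transcripts = total(transcript_types)

print(genes_and_pseudogenes)      # 22927
print(genes_without_pseudogenes)  # 22449
print(all_transcripts)            # 30285
```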

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	22,449	7,644	4,812	64	201,163
All transcripts	30,285	2,408	1,954	64	64,712
  mRNA	27,024	2,582	2,087	146	64,712
  misc_RNA	502	2,504	2,049	125	12,565
  tRNA	617	74	73	64	84
  lncRNA	2,049	890	660	79	9,168
  snoRNA	34	125	87	68	266
  snRNA	50	128	122	72	181
  guide_RNA	1	128	128	128	128
  rRNA	8	499	119	119	2,194
Single-exon transcripts	2,635	1,668	1,461	312	9,621
  coding transcripts (NM_/XM_)	2,635	1,668	1,461	312	9,621
CDSs	27,037	1,761	1,221	146	63,354
Exons	180,512	286	130	2	17,933
  in coding transcripts (NM_/XM_)	174,188	285	130	2	17,933
  in non-coding transcripts (NR_/XR_)	8,921	278	139	2	8,835
Introns	156,179	891	467	30	70,079
  in coding transcripts (NM_/XM_)	151,991	859	464	30	70,079
  in non-coding transcripts (NR_/XR_)	6,727	1,622	534	30	47,075

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.36	1	1	22
Number of exons per transcript	8.63	5	1	134
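Summary statistics like these can be derived from a mapping of each gene to its transcripts. The Python sketch below uses a small hypothetical mapping (the gene and transcript IDs are invented for illustration; real input would come from the annotation files). The same function applies to exons per transcript if the mapping is from transcripts to exons.

```python
from statistics import mean, median

# Hypothetical gene -> transcript IDs mapping, for illustration only.
gene_to_transcripts = {
    "geneA": ["XM_1", "XM_2", "XM_3"],
    "geneB": ["XM_4"],
    "geneC": ["XM_5", "XR_1"],
    "geneD": ["XR_2"],
}


def transcripts_per_gene_summary(mapping):
    """Return (mean, median, min, max) of the number of transcripts per gene."""
    counts = [len(ids) for ids in mapping.values()]
    return mean(counts), median(counts), min(counts), max(counts)


print(transcripts_per_gene_summary(gene_to_transcripts))  # (1.75, 1.5, 1, 3)
```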

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 19,967 coding genes, 13,270 had a protein with an alignment covering 50% or more of the query, and 2,819 had an alignment covering 95% or more of the query.

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
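Both coverages follow directly from the alignment coordinates and the two protein lengths. The Python sketch below is a simplification under stated assumptions: each hit is a single aligned segment with 1-based inclusive start and end coordinates (as in BLAST tabular output), and merging of multiple segments per hit, which a real pipeline may perform, is not modeled. The `Hit` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTP hit (hypothetical structure); coordinates are 1-based and inclusive."""
    qstart: int
    qend: int
    qlen: int  # length of the annotated protein (query)
    sstart: int
    send: int
    slen: int  # length of the Swiss-Prot protein (target)


def query_coverage(hit: Hit) -> float:
    """Percent of the annotated protein length included in the alignment."""
    return 100.0 * (hit.qend - hit.qstart + 1) / hit.qlen


def target_coverage(hit: Hit) -> float:
    """Percent of the target protein length included in the alignment."""
    return 100.0 * (hit.send - hit.sstart + 1) / hit.slen


hit = Hit(qstart=11, qend=210, qlen=400, sstart=1, send=200, slen=250)
print(query_coverage(hit), target_coverage(hit))  # 50.0 80.0
```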

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

[Figure: cumulative number of genes with alignments above a given coverage threshold. Query: annotated proteins; Target: UniProtKB/Swiss-Prot curated proteins]
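A gene counts toward a threshold if any of its proteins has an alignment at or above that coverage, so the cumulative curve can be built by keeping each gene's best coverage and counting genes per threshold. The Python sketch below uses invented per-protein coverages; genes with no hit at all are simply absent from the result.

```python
def best_per_gene(protein_hits):
    """Keep, for each gene, the highest coverage among its proteins.

    protein_hits: iterable of (gene_id, coverage_percent) pairs, one per protein hit.
    """
    best = {}
    for gene, cov in protein_hits:
        best[gene] = max(cov, best.get(gene, 0.0))
    return best


def genes_at_or_above(best, thresholds):
    """Cumulative count of genes whose best coverage meets each threshold."""
    return {t: sum(1 for c in best.values() if c >= t) for t in thresholds}


# Hypothetical per-protein query coverages (%), for illustration only.
hits = [("g1", 97.5), ("g1", 40.0), ("g2", 62.0), ("g3", 50.0), ("g4", 12.3), ("g5", 100.0)]
best_query_coverage = best_per_gene(hits)
print(genes_at_or_above(best_query_coverage, [0, 50, 95]))  # {0: 5, 50: 4, 95: 2}
```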

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
ASM960242v1	GCF_009602425.1	3.09%	24.47%
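Each percentage is the share of assembly bases covered by the masker's output. As a hedged illustration, assume the masked genome is available as soft-masked sequence, with masked bases in lowercase. This is a common convention, but the report does not say how the masks are stored. The percentage could then be computed as follows. Gap characters are counted like any other base here; whether they should be excluded is a choice the sketch does not settle.

```python
def percent_masked(sequences):
    """Percent of bases that are soft-masked (lowercase) across all sequences.

    Assumption: masked positions are written in lowercase; 'N'/'n' gap
    characters are counted like any other base.
    """
    total = sum(len(s) for s in sequences)
    masked = sum(sum(1 for base in s if base.islower()) for s in sequences)
    return 100.0 * masked / total if total else 0.0


print(round(percent_masked(["ACGTacgtAC", "ggggTTTT"]), 2))  # 44.44
```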

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (average length above 250 nt) may either be aligned as traditional evidence and reported in the Transcript alignments section, or be aligned together with the RNA-Seq reads and reported in the RNA-Seq alignments section.
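Only the length criterion is stated precisely here: a 454 run counts as "long" when its average read length is above 250 nt. The Python sketch below captures just that test. The function name is hypothetical, and the further decision between the two alignment routes depends on other evidence that the report does not describe, so it is not modeled.

```python
from statistics import mean

LONG_454_MIN_AVG_LENGTH = 250  # nt, threshold quoted in the report


def is_long_454_run(read_lengths):
    """True if the run's average read length is strictly above 250 nt."""
    return bool(read_lengths) and mean(read_lengths) > LONG_454_MIN_AVG_LENGTH


print(is_long_454_run([300, 280, 260]))  # True
print(is_long_454_run([200, 240]))       # False
```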

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species GenBank	22	22 (100.00%)	22 (100.00%)	99.79%	98.95%

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	1,778,350,511	79%	25%	184,104
SAMN04093377	NA	Adult, whole organism (Actinia tenebrosa, SAMN04093377)	81,470,022	86%	23%	147,097
SAMN04520502	27806695	whole (Actinia tenebrosa, SAMN04520502)	165,137,600	70%	35%	145,681
SAMN04520715	27806695	whole organism (Actinia tenebrosa, SAMN04520715)	69,567,960	67%	36%	145,566
SAMN04534877	27806695	whole organism (Actinia tenebrosa, SAMN04534877)	175,687,690	83%	19%	160,950
SAMN04534878	27806695	whole organism (Actinia tenebrosa, SAMN04534878)	179,309,262	80%	39%	161,576
SAMN04534887	27806695	whole organism (Actinia tenebrosa, SAMN04534887)	201,995,450	79%	36%	165,475
SAMN05938593	30913335	Acrorhagi (Actinia tenebrosa, SAMN05938593)	77,812,598	79%	20%	135,429
SAMN05938594	30913335	Acrorhagi (Actinia tenebrosa, SAMN05938594)	85,870,634	80%	21%	136,275
SAMN05938595	30913335	Acrorhagi (Actinia tenebrosa, SAMN05938595)	84,448,842	79%	20%	139,797
SAMN05938596	30913335	Mesentery (Actinia tenebrosa, SAMN05938596)	94,096,112	81%	21%	138,171
SAMN05938597	30913335	Mesentery (Actinia tenebrosa, SAMN05938597)	87,164,782	82%	21%	133,049
SAMN05938598	30913335	Mesentery (Actinia tenebrosa, SAMN05938598)	80,872,122	80%	21%	133,643
SAMN05938599	30913335	Tentacle (Actinia tenebrosa, SAMN05938599)	114,615,436	80%	17%	143,217
SAMN05938600	30913335	Tentacle (Actinia tenebrosa, SAMN05938600)	96,124,370	80%	17%	140,140
SAMN05938601	30913335	Tentacle (Actinia tenebrosa, SAMN05938601)	97,133,340	79%	16%	138,135
SAMN05953609	30913335	Whole Organism (Actinia tenebrosa, SAMN05953609)	83,195,147	83%	17%	151,752
SAMN07786920	NA	Tentacles (Actinia tenebrosa, SAMN07786920)	3,849,144	76%	30%	52,130

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
SRR2437124	SRX1253004	SRP063756	SAMN04093377	81,470,022	86%	23%
SRR3193284	SRX1604071	SRP070917	SAMN04520502	82,568,800	70%	35%
SRR3216075	SRX1604071	SRP070917	SAMN04520502	82,568,800	70%	35%
SRR3193648	SRX1604429	SRP070917	SAMN04520715	69,567,960	67%	36%
SRR3210696	SRX1618963	SRP070917	SAMN04534877	175,687,690	83%	19%
SRR3206038	SRX1615196	SRP070917	SAMN04534878	179,309,262	80%	39%
SRR3207346	SRX1616256	SRP070917	SAMN04534887	201,995,450	79%	36%
SRR4677507	SRX2310344	SRP092287	SAMN05938593	77,812,598	79%	20%
SRR4677512	SRX2310345	SRP092287	SAMN05938594	85,870,634	80%	21%
SRR4677515	SRX2310346	SRP092287	SAMN05938595	84,448,842	79%	20%
SRR4677518	SRX2310347	SRP092287	SAMN05938596	94,096,112	81%	21%
SRR4677488	SRX2310340	SRP092287	SAMN05938597	87,164,782	82%	21%
SRR4677492	SRX2310341	SRP092287	SAMN05938598	80,872,122	80%	21%
SRR4677495	SRX2310342	SRP092287	SAMN05938599	114,615,436	80%	17%
SRR4677502	SRX2310343	SRP092287	SAMN05938600	96,124,370	80%	17%
SRR4677522	SRX2310348	SRP092287	SAMN05938601	97,133,340	79%	16%
SRR4696535	SRX2310423	SRP092287	SAMN05953609	83,195,147	83%	17%
SRR6282389	SRX3384504	SRP124815	SAMN07786920	3,849,144	76%	30%
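For read counts, at least, the by-sample table aggregates the by-run table: each sample's count is the sum over its runs (SAMN04520502 has two), and the "All" row is the sum over samples. The Python sketch below uses the run-level counts copied from the table above and reproduces both figures. The percentage columns are not aggregated here, since the report does not state how they are combined.

```python
from collections import defaultdict

# (run, sample, number of reads) copied from the by-run table.
RUNS = [
    ("SRR2437124", "SAMN04093377", 81_470_022),
    ("SRR3193284", "SAMN04520502", 82_568_800),
    ("SRR3216075", "SAMN04520502", 82_568_800),
    ("SRR3193648", "SAMN04520715", 69_567_960),
    ("SRR3210696", "SAMN04534877", 175_687_690),
    ("SRR3206038", "SAMN04534878", 179_309_262),
    ("SRR3207346", "SAMN04534887", 201_995_450),
    ("SRR4677507", "SAMN05938593", 77_812_598),
    ("SRR4677512", "SAMN05938594", 85_870_634),
    ("SRR4677515", "SAMN05938595", 84_448_842),
    ("SRR4677518", "SAMN05938596", 94_096_112),
    ("SRR4677488", "SAMN05938597", 87_164_782),
    ("SRR4677492", "SAMN05938598", 80_872_122),
    ("SRR4677495", "SAMN05938599", 114_615_436),
    ("SRR4677502", "SAMN05938600", 96_124_370),
    ("SRR4677522", "SAMN05938601", 97_133_340),
    ("SRR4696535", "SAMN05953609", 83_195_147),
    ("SRR6282389", "SAMN07786920", 3_849_144),
]


def reads_per_sample(rows):
    """Sum run-level read counts into per-sample totals."""
    totals = defaultdict(int)
    for _run, sample, n_reads in rows:
        totals[sample] += n_reads
    return dict(totals)


per_sample = reads_per_sample(RUNS)
total_reads = sum(per_sample.values())
print(per_sample["SAMN04520502"])  # 165137600
print(total_reads)                 # 1778350511
```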

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Crassostrea gigas high-quality model RefSeq (XP_)	22,081	11,378 (51.53%)	11,378 (51.53%)	54.50%	31.11%
Crassostrea gigas known RefSeq (NP_)	147	107 (72.79%)	107 (72.79%)	61.97%	48.39%
Nematostella vectensis GenBank	449	434 (96.66%)	434 (96.66%)	68.80%	68.84%
Nematostella vectensis model RefSeq (XP_)	24,780	20,407 (82.35%)	20,407 (82.35%)	64.67%	64.79%
Orbicella faveolata high-quality model RefSeq (XP_)	18,402	15,641 (85.00%)	15,641 (85.00%)	60.17%	48.94%
Hydra vulgaris high-quality model RefSeq (XP_)	7,077	4,555 (64.36%)	4,555 (64.36%)	56.88%	41.55%
Same-species GenBank	22	22 (100.00%)	22 (100.00%)	82.35%	82.61%
Acropora digitifera high-quality model RefSeq (XP_)	15,470	12,048 (77.88%)	12,048 (77.88%)	58.82%	45.69%
Strongylocentrotus purpuratus high-quality model RefSeq (XP_)	19,174	11,170 (58.26%)	11,170 (58.26%)	58.25%	38.65%
Strongylocentrotus purpuratus known RefSeq (NP_)	425	310 (72.94%)	310 (72.94%)	70.48%	60.26%
Ciona intestinalis high-quality model RefSeq (XP_)	11,388	6,332 (55.60%)	6,332 (55.60%)	55.14%	34.70%
Ciona intestinalis known RefSeq (NP_)	942	590 (62.63%)	590 (62.63%)	57.30%	31.50%
Homo sapiens known RefSeq (NP_)	55,479	32,715 (58.97%)	32,715 (58.97%)	57.21%	38.02%
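In the "Number (%)" columns, the percentage is the count divided by the number of sequences retrieved from Entrez for that source. The small Python helper below (its name is hypothetical) formats a cell the same way and reproduces entries from the table, for example 11,378 of 22,081 Crassostrea gigas model proteins.

```python
def percent_of(part: int, whole: int) -> str:
    """Format part/whole like the report's 'Number (%)' cells, e.g. '107 (72.79%)'."""
    return f"{part:,} ({100 * part / whole:.2f}%)"


print(percent_of(11_378, 22_081))  # 11,378 (51.53%)
print(percent_of(107, 147))        # 107 (72.79%)
```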

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
