NCBI Branchiostoma floridae Annotation Release 100

The RefSeq genome records for Branchiostoma floridae were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. This report presents statistics on the annotation products, the input data used in the pipeline and intermediate alignment results.

The annotation products are available in the sequence databases and on the FTP site.

This report provides:

Annotation Release information: The name of the release, important dates, the software version
Assemblies: A brief description of the annotated assembly(ies)
Gene and feature statistics: The counts and characteristics of the annotated features
Alignment of the annotated proteins to a set of high-quality proteins: The number of annotated proteins with hits to a set of high-quality proteins
Masking of genomic sequence: How much of the genome was masked
Transcript and protein alignments: The number and type of evidence retrieved from public databases and used for gene prediction

For more information on the annotation process, please visit the NCBI Eukaryotic Genome Annotation Pipeline page.

Annotation Release information

This annotation should be referred to as NCBI Branchiostoma floridae Annotation Release 100.

Annotation release ID: 100
Date of Entrez queries for transcripts and proteins: Aug 11 2020
Date of submission of annotation to the public databases: Aug 13 2020
Software version: 8.5

Assemblies

The following assemblies were included in this annotation run:

Assembly name	Assembly accession	Submitter	Assembly date	Reference/Alternate	Assembly content
Bfl_VNyyK	GCF_000003815.2	DOE Joint Genome Institute	04-29-2020	Reference	20 assembled chromosomes; unplaced scaffolds

Gene and feature statistics

Counts and length of annotated features are provided below for each assembly.

Feature counts

Feature	Bfl_VNyyK
Genes and pseudogenes	29,857
protein-coding	26,689
non-coding	3,042
transcribed pseudogenes	3
non-transcribed pseudogenes	123
genes with variants	6,676
immunoglobulin/T-cell receptor gene segments	0
other	0
mRNAs	43,028
fully-supported	35,985
with > 5% ab initio	4,024
partial	1,399
with filled gap(s)	1,147
known RefSeq (NM_)	0
model RefSeq (XM_)	43,028
non-coding RNAs	3,977
fully-supported	3,364
with > 5% ab initio	0
partial	1
with filled gap(s)	1
known RefSeq (NR_)	0
model RefSeq (XR_)	3,536
pseudo transcripts	3
fully-supported	1
with > 5% ab initio	0
partial	0
with filled gap(s)	0
known RefSeq (NR_)	0
model RefSeq (XR_)	3
CDSs	43,041
fully-supported	35,985
with > 5% ab initio	4,409
partial	1,258
with major correction(s)	821
known RefSeq (NP_)	13
model RefSeq (XP_)	43,028
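
As a quick consistency check on the table above, the gene categories can be summed and compared with the reported total. The following is a minimal sketch: the counts are copied from the table, while the variable names are illustrative and not part of the annotation pipeline.

```python
# Gene counts copied from the feature counts table for Bfl_VNyyK.
gene_counts = {
    "protein-coding": 26_689,
    "non-coding": 3_042,
    "transcribed pseudogenes": 3,
    "non-transcribed pseudogenes": 123,
    "immunoglobulin/T-cell receptor gene segments": 0,
    "other": 0,
}
reported_total = 29_857  # "Genes and pseudogenes" row

computed_total = sum(gene_counts.values())
pseudogenes = (
    gene_counts["transcribed pseudogenes"]
    + gene_counts["non-transcribed pseudogenes"]
)
genes_without_pseudogenes = computed_total - pseudogenes

print("categories sum to reported total:", computed_total == reported_total)
print("genes excluding pseudogenes:", genes_without_pseudogenes)
```

The second figure is the gene count that applies once pseudogenes are excluded, as in the feature length statistics.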

Detailed reports

The counts below do not include pseudogenes.

Feature lengths

Feature	Count	Mean length (bp)	Median length (bp)	Min length (bp)	Max length (bp)
Genes	29,731	12,109	6,126	66	1,184,186
All transcripts	47,005	2,876	2,248	58	62,792
mRNA	43,028	3,031	2,383	231	62,792
misc_RNA	561	2,772	2,089	200	35,201
tRNA	439	74	73	58	88
lncRNA	2,803	1,115	860	74	10,974
snoRNA	60	113	83	66	277
snRNA	82	145	141	102	194
guide_RNA	3	138	140	134	141
rRNA	29	1,119	154	118	3,891
Single-exon transcripts	1,437	1,780	1,526	273	15,753
coding transcripts (NM_/XM_ )	1,437	1,780	1,526	273	15,753
CDSs	43,041	1,975	1,410	165	61,653
Exons	263,543	285	145	2	22,746
in coding transcripts (NM_/XM_ )	255,037	282	144	2	22,746
in non-coding transcripts (NR_/XR_ )	11,218	341	151	2	10,677
Introns	232,488	1,469	519	30	1,173,455
in coding transcripts (NM_/XM_ )	226,874	1,449	518	30	1,173,455
in non-coding transcripts (NR_/XR_ )	8,247	2,049	550	30	620,124

Transcripts per gene, exons per transcript

	Mean	Median	Min	Max
Number of transcripts per gene	1.59	1	1	50
Number of exons per transcript	11.06	7	1	227

Alignment of the annotated proteins to a set of high-quality proteins

The final set of annotated proteins was searched with BLASTP against the UniProtKB/Swiss-Prot curated proteins, using the annotated proteins as the query and the high-quality proteins as the target. Out of 26,676 coding genes, 18,839 genes had a protein with an alignment covering 50% or more of the query and 4,014 had an alignment covering 95% or more of the query.
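
The counts in the paragraph above can be expressed as shares of the coding genes. This is a small sketch using only the three numbers quoted there; the variable names are illustrative.

```python
# Counts quoted in the BLASTP summary paragraph.
coding_genes = 26676
genes_cov_50 = 18839   # alignment covers >= 50% of the query
genes_cov_95 = 4014    # alignment covers >= 95% of the query

share_50 = 100.0 * genes_cov_50 / coding_genes
share_95 = 100.0 * genes_cov_95 / coding_genes

print(f"query coverage >= 50%: {share_50:.1f}% of coding genes")
print(f"query coverage >= 95%: {share_95:.1f}% of coding genes")
```

Roughly seven in ten coding genes reach the 50% threshold, and about 15% reach the 95% threshold.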

Definition of query and target coverage. The query coverage is the percentage of the annotated protein length that is included in the alignment. The target coverage is the percentage of the target length that is included in the alignment.
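
The two coverage definitions above differ only in which sequence length is used as the denominator. The sketch below illustrates this with a hypothetical alignment; the function name and the example lengths are assumptions for illustration and do not come from the report.

```python
def coverage(aligned_length: int, sequence_length: int) -> float:
    """Percentage of a sequence that is included in the alignment."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    return 100.0 * aligned_length / sequence_length


# Hypothetical example: 450 residues of a 500-residue annotated protein
# (query) align to 450 residues of a 600-residue Swiss-Prot protein (target).
query_coverage = coverage(450, 500)
target_coverage = coverage(450, 600)

print(f"query coverage:  {query_coverage:.1f}%")
print(f"target coverage: {target_coverage:.1f}%")
```

In this example the same alignment yields a query coverage of 90% but a target coverage of only 75%, because the target is longer than the query.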

Below is a cumulative graph displaying the number of genes with alignments above a given query or target coverage threshold. For comparison, corresponding statistics for other organisms annotated by the NCBI eukaryotic annotation pipeline were added to the graph.

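[Figure: cumulative number of genes with alignments above a given query or target coverage threshold. Query: annotated proteins. Target: UniProtKB/Swiss-Prot curated proteins.]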

Masking of genomic sequence

Transcript and protein alignments are performed on the repeat-masked genome. Below are the percentages of genomic sequence masked by WindowMasker and RepeatMasker for each assembly. RepeatMasker results are only used for organisms for which a comprehensive repeat library is available.

For this annotation run, transcripts and proteins were aligned to the genome masked with WindowMasker only.

Assembly name	Assembly accession	% Masked with RepeatMasker	% Masked with WindowMasker
Bfl_VNyyK	GCF_000003815.2	14.63%	24.05%

Transcript and protein alignments

The annotation pipeline relies heavily on alignments of experimental evidence for gene prediction. Below are the sets of transcripts and proteins that were retrieved from Entrez, aligned to the genome by Splign or ProSplign and passed to Gnomon, NCBI's gene prediction software.

Depending on the other evidence available, long 454 reads (with average length above 250 nt) may be aligned as traditional evidence and reported in the Transcript alignments section or aligned with RNA-Seq reads and reported in the RNA-Seq alignments section.

Transcript alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by Splign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Same-species Genbank	517	510 (98.65%)	417 (80.66%)	98.17%	97.73%
Same-species EST	334,502	247,730 (74.06%)	181,520 (54.27%)	98.12%	99.07%
Branchiostomidae Genbank	1,026	936 (91.23%)	432 (42.11%)	91.24%	90.65%
Branchiostomidae EST	24,838	10,258 (41.30%)	7,773 (31.29%)	92.16%	98.32%
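
The percentages in the table above are relative to the number of sequences retrieved from Entrez, not to the number aligned. The sketch below recomputes them from the counts in the table as a check; the helper name `pct` is illustrative.

```python
# (source, retrieved from Entrez, aligned by Splign, passed to Gnomon)
# Counts copied from the transcript alignments table.
rows = [
    ("Same-species Genbank", 517, 510, 417),
    ("Same-species EST", 334_502, 247_730, 181_520),
    ("Branchiostomidae Genbank", 1_026, 936, 432),
    ("Branchiostomidae EST", 24_838, 10_258, 7_773),
]


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to two decimals."""
    return round(100.0 * part / whole, 2)


# Map each source to (% aligned, % passed to Gnomon).
recomputed = {
    source: (pct(aligned, retrieved), pct(passed, retrieved))
    for source, retrieved, aligned, passed in rows
}

for source, (aligned_pct, passed_pct) in recomputed.items():
    print(f"{source}: aligned {aligned_pct:.2f}%, passed to Gnomon {passed_pct:.2f}%")
```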

RNA-Seq alignments

The following RNA-Seq reads from the Sequence Read Archive were also used for gene prediction:

Alignment statistics, by sample (SAME, SAMN, SAMD, DRS)

Sample Id	Publication	Track name	Number of reads	Percent aligned reads	Percent of aligned reads with introns	Number of introns
All	NA	Aggregate of all aligned samples	4,691,988,217	62%	26%	279,667
SAMD00028051	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028051)	56,415,654	66%	25%	140,125
SAMD00028052	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028052)	57,000,061	67%	25%	140,345
SAMD00028053	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028053)	54,496,005	63%	23%	142,717
SAMD00028054	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028054)	53,428,468	64%	23%	140,451
SAMD00028055	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028055)	54,244,448	70%	24%	154,818
SAMD00028056	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028056)	59,066,184	69%	25%	157,539
SAMD00028057	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028057)	58,104,178	68%	28%	211,214
SAMD00028058	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028058)	53,940,639	68%	28%	213,796
SAMD00028059	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028059)	64,475,767	71%	27%	204,264
SAMD00028060	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028060)	54,213,811	70%	27%	184,883
SAMD00028061	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028061)	50,249,407	70%	27%	209,688
SAMD00028062	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028062)	49,564,508	70%	27%	209,427
SAMD00028063	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028063)	55,916,248	69%	27%	219,838
SAMD00028064	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028064)	54,769,900	70%	28%	207,891
SAMD00028065	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028065)	57,585,429	67%	27%	196,325
SAMD00028066	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028066)	49,758,094	67%	27%	176,577
SAMD00028067	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028067)	56,051,298	67%	27%	198,120
SAMD00028068	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028068)	57,281,888	70%	23%	175,281
SAMD00028069	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028069)	53,929,279	70%	26%	170,212
SAMD00028070	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028070)	55,359,756	69%	26%	170,249
SAMD00028071	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028071)	69,869,564	69%	27%	189,124
SAMD00028072	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028072)	76,790,918	69%	27%	190,439
SAMD00028073	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028073)	78,472,254	70%	26%	189,633
SAMD00028074	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028074)	51,165,770	66%	23%	113,909
SAMD00028075	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028075)	60,579,248	67%	24%	119,720
SAMD00028076	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028076)	56,083,328	65%	23%	152,783
SAMD00028077	NA	whole embryo (Branchiostoma floridae, male, female, and mixed, SAMD00028077)	57,063,175	67%	25%	145,340
SAMN02058516	NA	Generic sample from Branchiostoma floridae (Branchiostoma floridae, SAMN02058516)	127,001,906	68%	10%	181,470
SAMN03456679	27412606	whole (Branchiostoma floridae, SAMN03456679)	101,906,362	38%	7%	136,179
SAMN12385097	NA	germ granule-positive blastomere (Branchiostoma floridae, SAMN12385097)	48,449,498	46%	14%	95,819
SAMN12385098	NA	germ granule-positive blastomere (Branchiostoma floridae, SAMN12385098)	53,512,244	41%	21%	107,032
SAMN12385099	NA	germ granule-positive blastomere (Branchiostoma floridae, SAMN12385099)	46,325,442	41%	23%	108,739
SAMN12385100	NA	germ granule-positive blastomere (Branchiostoma floridae, SAMN12385100)	45,617,436	39%	13%	88,859
SAMN12385101	NA	germ granule-positive blastomere (Branchiostoma floridae, SAMN12385101)	46,311,264	42%	18%	96,701
SAMN12385102	NA	germ granule-positive blastomere (Branchiostoma floridae, SAMN12385102)	51,459,096	48%	11%	95,109
SAMN12385103	NA	germ granule-negative blastomere (Branchiostoma floridae, SAMN12385103)	46,757,628	44%	21%	109,967
SAMN12385104	NA	germ granule-negative blastomere (Branchiostoma floridae, SAMN12385104)	54,691,064	45%	23%	112,133
SAMN12385105	NA	germ granule-negative blastomere (Branchiostoma floridae, SAMN12385105)	47,181,826	45%	25%	111,856
SAMN12385106	NA	germ granule-negative blastomere (Branchiostoma floridae, SAMN12385106)	51,868,194	38%	16%	95,369
SAMN12385107	NA	germ granule-negative blastomere (Branchiostoma floridae, SAMN12385107)	49,476,932	43%	16%	94,514
SAMN12385108	NA	germ granule-negative blastomere (Branchiostoma floridae, SAMN12385108)	58,979,166	49%	11%	99,845
SAMN12385109	NA	animal tier (Branchiostoma floridae, SAMN12385109)	49,326,652	51%	20%	113,763
SAMN12385110	NA	animal tier (Branchiostoma floridae, SAMN12385110)	52,004,720	48%	15%	105,547
SAMN12385111	NA	animal tier (Branchiostoma floridae, SAMN12385111)	52,960,834	54%	21%	114,987
SAMN12385112	NA	animal tier (Branchiostoma floridae, SAMN12385112)	50,017,824	51%	21%	113,304
SAMN12385113	NA	animal tier (Branchiostoma floridae, SAMN12385113)	53,949,236	46%	12%	94,184
SAMN12385114	NA	animal tier (Branchiostoma floridae, SAMN12385114)	52,724,388	45%	23%	108,540
SAMN12385115	NA	vegetal tier (Branchiostoma floridae, SAMN12385115)	51,894,986	51%	13%	105,328
SAMN12385116	NA	vegetal tier (Branchiostoma floridae, SAMN12385116)	46,517,718	54%	14%	107,977
SAMN12385117	NA	vegetal tier (Branchiostoma floridae, SAMN12385117)	57,580,412	51%	21%	114,688
SAMN12385118	NA	vegetal tier (Branchiostoma floridae, SAMN12385118)	52,455,462	49%	22%	113,340
SAMN12385119	NA	vegetal tier (Branchiostoma floridae, SAMN12385119)	51,138,114	47%	9%	89,837
SAMN12385120	NA	vegetal tier (Branchiostoma floridae, SAMN12385120)	50,529,138	45%	17%	103,969
SAMN13521182	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521182)	67,862,074	68%	22%	202,729
SAMN13521183	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521183)	66,526,310	68%	24%	200,721
SAMN13521184	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521184)	66,336,602	69%	24%	201,911
SAMN13521185	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521185)	66,014,144	68%	23%	203,633
SAMN13521186	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521186)	68,794,000	69%	23%	202,175
SAMN13521187	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521187)	66,449,690	68%	23%	201,707
SAMN13521188	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521188)	67,662,144	69%	24%	206,529
SAMN13521189	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521189)	68,026,014	68%	23%	206,117
SAMN13521190	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521190)	65,542,104	68%	23%	206,360
SAMN13521191	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521191)	64,141,976	69%	25%	203,704
SAMN13521192	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521192)	66,553,378	69%	24%	207,117
SAMN13521193	NA	larvae (Branchiostoma floridae, pooled male and female, SAMN13521193)	66,471,594	70%	28%	202,410
SAMN15232949	NA	gonad (Branchiostoma floridae, female, SAMN15232949)	47,958,672	67%	46%	162,734
SAMN15232950	NA	gonad (Branchiostoma floridae, female, SAMN15232950)	41,775,434	69%	50%	170,853
SAMN15232951	NA	gonad (Branchiostoma floridae, female, SAMN15232951)	52,717,218	67%	48%	177,137
SAMN15232952	NA	gonad (Branchiostoma floridae, male, SAMN15232952)	28,931,966	69%	38%	159,596
SAMN15232953	NA	gonad (Branchiostoma floridae, male, SAMN15232953)	23,838,764	73%	28%	153,076
SAMN15232954	NA	gonad (Branchiostoma floridae, male, SAMN15232954)	33,214,072	74%	23%	144,361
SAMN15232955	NA	gonad (Branchiostoma floridae, male, SAMN15232955)	31,041,452	70%	25%	148,247
SAMN15232956	NA	gonad (Branchiostoma floridae, female, SAMN15232956)	24,318,360	65%	46%	159,760
SAMN15232957	NA	gonad (Branchiostoma floridae, female, SAMN15232957)	27,540,928	69%	50%	161,488
SAMN15232958	NA	gonad (Branchiostoma floridae, female, SAMN15232958)	29,745,534	64%	43%	158,358
SAMN15232959	NA	gonad (Branchiostoma floridae, female, SAMN15232959)	21,051,348	68%	51%	148,743
SAMN15232960	NA	gonad (Branchiostoma floridae, female, SAMN15232960)	20,559,830	68%	49%	150,191
SAMN15232961	NA	gonad (Branchiostoma floridae, male, SAMN15232961)	63,537,936	82%	24%	169,112
SAMN15232962	NA	gonad (Branchiostoma floridae, male, SAMN15232962)	63,094,794	66%	38%	181,241
SAMN15232963	NA	gonad (Branchiostoma floridae, male, SAMN15232963)	59,333,664	66%	37%	178,057
SAMN15232964	NA	muscle (Branchiostoma floridae, male, SAMN15232964)	51,875,460	69%	42%	196,993
SAMN15232965	NA	muscle (Branchiostoma floridae, female, SAMN15232965)	73,601,982	73%	41%	198,726
SAMN15232966	NA	muscle (Branchiostoma floridae, male, SAMN15232966)	33,426,012	73%	41%	158,899
SAMN15232967	NA	muscle (Branchiostoma floridae, male, SAMN15232967)	33,193,392	73%	41%	155,548
SAMN15232968	NA	muscle (Branchiostoma floridae, male, SAMN15232968)	36,727,494	73%	40%	166,100
SAMN15232969	NA	muscle (Branchiostoma floridae, female, SAMN15232969)	27,875,024	74%	40%	155,348
SAMN15232970	NA	muscle (Branchiostoma floridae, female, SAMN15232970)	37,115,714	73%	42%	163,584
SAMN15232971	NA	muscle (Branchiostoma floridae, female, SAMN15232971)	22,620,316	73%	40%	155,517
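
Because the per-sample table is tab-separated, it can be read with Python's standard `csv` module. The sketch below parses the header and two rows copied from the table into typed records; the function names and the record keys are illustrative choices, not part of any NCBI tool.

```python
import csv
import io

# Header and two rows copied from the per-sample table (tab-separated).
TABLE = (
    "Sample Id\tPublication\tTrack name\tNumber of reads\t"
    "Percent aligned reads\tPercent of aligned reads with introns\t"
    "Number of introns\n"
    "SAMN03456679\t27412606\twhole (Branchiostoma floridae, SAMN03456679)\t"
    "101,906,362\t38%\t7%\t136,179\n"
    "SAMN15232961\tNA\tgonad (Branchiostoma floridae, male, SAMN15232961)\t"
    "63,537,936\t82%\t24%\t169,112\n"
)


def to_int(text: str) -> int:
    """Convert a value such as '101,906,362' or '38%' to an integer."""
    return int(text.replace(",", "").rstrip("%"))


def parse_samples(text: str) -> list:
    """Parse the tab-separated sample table into a list of typed records."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    records = []
    for row in reader:
        records.append({
            "sample": row["Sample Id"],
            "reads": to_int(row["Number of reads"]),
            "pct_aligned": to_int(row["Percent aligned reads"]),
            "pct_with_introns": to_int(row["Percent of aligned reads with introns"]),
            "introns": to_int(row["Number of introns"]),
        })
    return records


samples = parse_samples(TABLE)
lowest = min(samples, key=lambda record: record["pct_aligned"])
print(lowest["sample"], lowest["pct_aligned"])
```

The commas inside the track names do not interfere with parsing because the delimiter is a tab.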

Alignment statistics, by run (ERR, SRR, DRR)

Run	Experiment	Project	Sample	Number of reads	Percent aligned reads	Percent of aligned reads with introns
DRR032654	DRX029460	DRP003810	SAMD00028051	56,415,654	66%	25%
DRR032655	DRX029461	DRP003810	SAMD00028052	57,000,061	67%	25%
DRR032656	DRX029462	DRP003810	SAMD00028053	54,496,005	63%	23%
DRR032657	DRX029463	DRP003810	SAMD00028054	53,428,468	64%	23%
DRR032658	DRX029464	DRP003810	SAMD00028055	54,244,448	70%	24%
DRR032659	DRX029465	DRP003810	SAMD00028056	59,066,184	69%	25%
DRR032660	DRX029466	DRP003810	SAMD00028057	58,104,178	68%	28%
DRR032661	DRX029467	DRP003810	SAMD00028058	53,940,639	68%	28%
DRR032662	DRX029468	DRP003810	SAMD00028059	64,475,767	71%	27%
DRR032663	DRX029469	DRP003810	SAMD00028060	54,213,811	70%	27%
DRR032664	DRX029470	DRP003810	SAMD00028061	50,249,407	70%	27%
DRR032665	DRX029471	DRP003810	SAMD00028062	49,564,508	70%	27%
DRR032666	DRX029472	DRP003810	SAMD00028063	55,916,248	69%	27%
DRR032667	DRX029473	DRP003810	SAMD00028064	54,769,900	70%	28%
DRR032668	DRX029474	DRP003810	SAMD00028065	57,585,429	67%	27%
DRR032669	DRX029475	DRP003810	SAMD00028066	49,758,094	67%	27%
DRR032670	DRX029476	DRP003810	SAMD00028067	56,051,298	67%	27%
DRR032671	DRX029477	DRP003810	SAMD00028068	57,281,888	70%	23%
DRR032672	DRX029478	DRP003810	SAMD00028069	53,929,279	70%	26%
DRR032673	DRX029479	DRP003810	SAMD00028070	55,359,756	69%	26%
DRR032674	DRX029480	DRP003810	SAMD00028071	69,869,564	69%	27%
DRR032675	DRX029481	DRP003810	SAMD00028072	76,790,918	69%	27%
DRR032676	DRX029482	DRP003810	SAMD00028073	78,472,254	70%	26%
DRR032677	DRX029483	DRP003810	SAMD00028074	51,165,770	66%	23%
DRR032678	DRX029484	DRP003810	SAMD00028075	60,579,248	67%	24%
DRR032679	DRX029485	DRP003810	SAMD00028076	56,083,328	65%	23%
DRR032680	DRX029486	DRP003810	SAMD00028077	57,063,175	67%	25%
SRR923751	SRX316148	SRP026346	SAMN02058516	127,001,906	68%	10%
SRR1952655	SRX978371	SRP056868	SAMN03456679	101,906,362	38%	7%
SRR9865933	SRX6619460	SRP216881	SAMN12385097	48,449,498	46%	14%
SRR9865934	SRX6619459	SRP216881	SAMN12385098	53,512,244	41%	21%
SRR9865935	SRX6619458	SRP216881	SAMN12385099	46,325,442	41%	23%
SRR9865936	SRX6619457	SRP216881	SAMN12385100	45,617,436	39%	13%
SRR9865929	SRX6619464	SRP216881	SAMN12385101	46,311,264	42%	18%
SRR9865930	SRX6619463	SRP216881	SAMN12385102	51,459,096	48%	11%
SRR9865931	SRX6619462	SRP216881	SAMN12385103	46,757,628	44%	21%
SRR9865932	SRX6619461	SRP216881	SAMN12385104	54,691,064	45%	23%
SRR9865927	SRX6619466	SRP216881	SAMN12385105	47,181,826	45%	25%
SRR9865928	SRX6619465	SRP216881	SAMN12385106	51,868,194	38%	16%
SRR9865939	SRX6619454	SRP216881	SAMN12385107	49,476,932	43%	16%
SRR9865940	SRX6619453	SRP216881	SAMN12385108	58,979,166	49%	11%
SRR9865941	SRX6619452	SRP216881	SAMN12385109	49,326,652	51%	20%
SRR9865942	SRX6619451	SRP216881	SAMN12385110	52,004,720	48%	15%
SRR9865943	SRX6619450	SRP216881	SAMN12385111	52,960,834	54%	21%
SRR9865944	SRX6619449	SRP216881	SAMN12385112	50,017,824	51%	21%
SRR9865945	SRX6619448	SRP216881	SAMN12385113	53,949,236	46%	12%
SRR9865946	SRX6619447	SRP216881	SAMN12385114	52,724,388	45%	23%
SRR9865937	SRX6619456	SRP216881	SAMN12385115	51,894,986	51%	13%
SRR9865938	SRX6619455	SRP216881	SAMN12385116	46,517,718	54%	14%
SRR9865924	SRX6619469	SRP216881	SAMN12385117	57,580,412	51%	21%
SRR9865923	SRX6619470	SRP216881	SAMN12385118	52,455,462	49%	22%
SRR9865926	SRX6619467	SRP216881	SAMN12385119	51,138,114	47%	9%
SRR9865925	SRX6619468	SRP216881	SAMN12385120	50,529,138	45%	17%
SRR10674591	SRX7351856	SRP237289	SAMN13521182	67,862,074	68%	22%
SRR10674590	SRX7351857	SRP237289	SAMN13521183	66,526,310	68%	24%
SRR10674587	SRX7351860	SRP237289	SAMN13521184	66,336,602	69%	24%
SRR10674586	SRX7351861	SRP237289	SAMN13521185	66,014,144	68%	23%
SRR10674585	SRX7351862	SRP237289	SAMN13521186	68,794,000	69%	23%
SRR10674584	SRX7351863	SRP237289	SAMN13521187	66,449,690	68%	23%
SRR10674583	SRX7351864	SRP237289	SAMN13521188	67,662,144	69%	24%
SRR10674582	SRX7351865	SRP237289	SAMN13521189	68,026,014	68%	23%
SRR10674581	SRX7351866	SRP237289	SAMN13521190	65,542,104	68%	23%
SRR10674580	SRX7351867	SRP237289	SAMN13521191	64,141,976	69%	25%
SRR10674589	SRX7351858	SRP237289	SAMN13521192	66,553,378	69%	24%
SRR10674588	SRX7351859	SRP237289	SAMN13521193	66,471,594	70%	28%
SRR12011573	SRX8544320	SRP247070	SAMN15232949	47,958,672	67%	46%
SRR12011572	SRX8544321	SRP247070	SAMN15232950	41,775,434	69%	50%
SRR12011571	SRX8544322	SRP247070	SAMN15232951	52,717,218	67%	48%
SRR12011570	SRX8544323	SRP247070	SAMN15232952	28,931,966	69%	38%
SRR12011569	SRX8544324	SRP247070	SAMN15232953	23,838,764	73%	28%
SRR12011568	SRX8544325	SRP247070	SAMN15232954	33,214,072	74%	23%
SRR12011567	SRX8544326	SRP247070	SAMN15232955	31,041,452	70%	25%
SRR12011566	SRX8544327	SRP247070	SAMN15232956	24,318,360	65%	46%
SRR12011564	SRX8544329	SRP247070	SAMN15232957	27,540,928	69%	50%
SRR12011563	SRX8544330	SRP247070	SAMN15232958	29,745,534	64%	43%
SRR12011562	SRX8544331	SRP247070	SAMN15232959	21,051,348	68%	51%
SRR12011561	SRX8544332	SRP247070	SAMN15232960	20,559,830	68%	49%
SRR12011560	SRX8544333	SRP247070	SAMN15232961	63,537,936	82%	24%
SRR12011559	SRX8544334	SRP247070	SAMN15232962	63,094,794	66%	38%
SRR12011558	SRX8544335	SRP247070	SAMN15232963	59,333,664	66%	37%
SRR12011557	SRX8544336	SRP247070	SAMN15232964	51,875,460	69%	42%
SRR12011556	SRX8544337	SRP247070	SAMN15232965	73,601,982	73%	41%
SRR12011555	SRX8544338	SRP247070	SAMN15232966	33,426,012	73%	41%
SRR12011553	SRX8544340	SRP247070	SAMN15232967	33,193,392	73%	41%
SRR12011552	SRX8544341	SRP247070	SAMN15232968	36,727,494	73%	40%
SRR12011551	SRX8544342	SRP247070	SAMN15232969	27,875,024	74%	40%
SRR12011550	SRX8544343	SRP247070	SAMN15232970	37,115,714	73%	42%
SRR12011549	SRX8544344	SRP247070	SAMN15232971	22,620,316	73%	40%

Protein alignments

Source	Number of sequences retrieved from Entrez	Number (%) of sequences aligned by ProSplign	Number (%) of sequences passed to Gnomon	Average % identity	Average % coverage
Saccoglossus kowalevskii GenBank	271	219 (80.81%)	219 (80.81%)	69.12%	57.00%
Saccoglossus kowalevskii high-quality model RefSeq (XP_)	6,124	4,276 (69.82%)	4,276 (69.82%)	66.45%	65.26%
Saccoglossus kowalevskii known RefSeq (NP_)	474	329 (69.41%)	329 (69.41%)	67.72%	63.58%
Crassostrea gigas GenBank	761	303 (39.82%)	303 (39.82%)	69.99%	75.81%
Crassostrea gigas high-quality model RefSeq (XP_)	28,029	14,323 (51.10%)	14,323 (51.10%)	58.19%	39.97%
Crassostrea gigas known RefSeq (NP_)	147	108 (73.47%)	108 (73.47%)	71.28%	68.88%
Drosophila melanogaster known RefSeq (NP_)	30,704	9,793 (31.89%)	9,793 (31.89%)	60.11%	45.49%
Strongylocentrotus purpuratus high-quality model RefSeq (XP_)	19,173	11,500 (59.98%)	11,500 (59.98%)	62.09%	51.32%
Strongylocentrotus purpuratus known RefSeq (NP_)	425	325 (76.47%)	325 (76.47%)	71.81%	69.02%
Tunicata GenBank	11,518	6,681 (58.00%)	6,681 (58.00%)	63.91%	58.50%
Ciona intestinalis GenBank	1,268	776 (61.20%)	776 (61.20%)	64.34%	48.72%
Ciona intestinalis high-quality model RefSeq (XP_)	11,388	6,468 (56.80%)	6,468 (56.80%)	59.71%	48.47%
Ciona intestinalis known RefSeq (NP_)	942	124 (13.16%)	124 (13.16%)	58.24%	36.48%
Branchiostomidae GenBank	1,434	1,383 (96.44%)	1,383 (96.44%)	79.29%	88.27%
Danio rerio GenBank	27,371	8,055 (29.43%)	8,055 (29.43%)	63.30%	54.98%
Danio rerio known RefSeq (NP_)	15,872	11,411 (71.89%)	11,411 (71.89%)	63.66%	56.95%
Homo sapiens known RefSeq (NP_)	59,221	30,506 (51.51%)	30,506 (51.51%)	62.44%	53.08%

References

RefSeq: Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O'Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, Dicuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM. Nucleic Acids Research 2014, 42(Database issue):D756-63
RepeatMasker: Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2004. http://www.repeatmasker.org
WindowMasker: Morgulis A, Gertz EM, Schäffer AA, Agarwala R. Bioinformatics 2006, 22(2):134-41
Splign: Kapustin Y, Souvorov A, Tatusova T, Lipman D. Biology Direct 2008, 3:20
