GEO Accession viewer

Sample GSM1503654

Status

Public on Nov 04, 2014

Title

L21 mRNA profiling

Sample type

SRA

Source name

KK progeny, early developing bud primordia, male

Organism

Diospyros lotus

Characteristics

cultivar: Kunsenshi
tissue: mixed persimmon buds
developmental stage: early differentiation of primordia
Sex: male

Treatment protocol

For expression analyses, mixed buds were sampled on June 17th and July 4th, 2013, corresponding to the early differentiation stages of male and female primordia, respectively.

Growth protocol

Trees of the KK population of D. lotus (57 individuals; T. Akagi et al., J. Japan Soc. Hort. Sci. 76, 214-221 (2014)) and their parents were grown at Kyoto University (Kyoto, Japan, 35°03’, 135°78’ L/L).

Extracted molecule

total RNA

Extraction protocol

mRNA libraries: Total RNA was extracted using the CTAB method and purified by phenol/chloroform extraction.
mRNA libraries: Fifteen to twenty micrograms of total RNA was processed in preparation for Illumina sequencing, as previously described (D. Burkart-Waco et al., Plant Cell 25, 2037-2055 (2013)). Briefly, mRNA was purified using the Dynabeads mRNA purification kit (Life Technologies). Next, cDNA was synthesized via random priming using Superscript III (Life Technologies), followed by heat inactivation for 5 min at 65°C. Second-strand cDNA was synthesized using the second-strand buffer (200 mM Tris-HCl, pH 7.0, 22 mM MgCl2, and 425 mM KCl), DNA polymerase I (NEB) and RNaseH (NEB) with incubation at 16°C for 2.5 h. Double-stranded cDNA was purified using AMPure with a 1.8 : 1 (v/v) AMPure to reaction volume ratio. The resulting double-stranded cDNAs were then fragmented and used for library construction as follows. Approximately 1.0 μg of cDNA was fragmented using NEBNext dsDNA Fragmentase (New England BioLabs; NEB) for 40-60 min at 37°C and cleaned using Agencourt AMPure XP (Beckman Coulter Genomics) for size-selection. To select fragments ranging between 200 and 600 bp, 25 μl of AMPure was added to the initial 50 μl reaction. After a brief incubation, 72 μl of the supernatant was transferred to a new tube, and an additional 12 μl of water and 36 μl of AMPure were added. After a second brief incubation, the supernatant was discarded and the DNA was eluted from the beads in 20 μl of EB, as recommended. Next, DNA fragments were subjected to end repair using NEB’s End Repair Module Enzyme Mix, and A-base overhangs were added with Klenow (NEB), as recommended by the manufacturer. End repair and A-base addition were both followed by AMPure cleanup using 1.8 : 1 (v/v) AMPure / reaction. Barcoded NEXTflex adaptors (Bioo Scientific) were ligated at room temperature using NEB Quick Ligase (NEB), following the manufacturer’s recommendations.
To remove contaminating self-ligated adaptor dimers, libraries were size-selected using AMPure at a 0.8 : 1 (v/v) AMPure : reaction volume ratio, in order to select for adaptor-ligated DNA fragments at least 300 bp long. Half of the eluted DNA was enriched by PCR using Phusion 2X HF master mix (NEB), with the following PCR conditions: 30 s at 95°C; 10 cycles of 10 s at 95°C, 30 s at 65°C, and 30 s at 72°C; and a final extension step of 1 min at 72°C. Enriched libraries were purified with AMPure (0.8 : 1 v/v AMPure to reaction), and quality and quantity were assessed using the Agilent BioAnalyzer (Agilent Technologies) and Qubit fluorometer (Invitrogen). The constructed libraries were sequenced on Illumina’s HiSeq 2500 sequencer (150-bp paired-end reads).

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina HiSeq 2500

Description

Sample15
KK progeny population of Kunsenshi male and female cross.
mRNA
Processed data file: AllMF-CLC-denovo-count-new08122014-rev_GEO.txt

Data processing

Sequence processing: Illumina sequencing reads were processed with custom Python scripts developed in the Comai laboratory and available online (http://comailab.genomecenter.ucdavis.edu/index.php/Barcoded_data_preparation_tools). In short, sequences were split according to barcode information and trimmed for quality (average Phred sequence quality > 20 over a 5-bp sliding window) and for adaptor sequence contamination, after which reads shorter than 35 bp were discarded (except for the small RNA reads, for which the read length cut-off was reduced to 19 bp). For the smRNA data, a maximum read length of 25 bp was applied as well. After application of those size thresholds, a total of approximately 12.0 M and 9.6 M reads were obtained from the female and male samples, respectively.
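The quality and length filter described above can be illustrated with a minimal Python sketch. It is not the Comai laboratory script: the function names, the choice to cut the read at the first failing window, and the Phred+33 encoding are assumptions made for illustration. Only the thresholds (mean quality > 20 over a 5-bp window, minimum length 35 bp) come from the description.

```python
def phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def trim_read(seq, quals, window=5, min_mean_q=20, min_len=35):
    """Trim a read at the first low-quality window.

    Scans 5' to 3' and cuts at the start of the first window whose mean
    Phred quality is not above `min_mean_q` (assumed trimming rule).
    Returns (seq, quals) after trimming, or None when the trimmed read
    is shorter than `min_len`.
    """
    cut = len(seq)
    for start in range(0, len(seq) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q <= min_mean_q:
            cut = start
            break
    if cut < min_len:
        return None
    return seq[:cut], quals[:cut]
```

For the small RNA reads, the same function could be called with `min_len=19`; the 25-bp maximum length would need a separate check.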
Sex-specific k-mer extraction: To select sex-biased reads, the quality-trimmed read files from both male and female samples were processed to identify sex-specific subsequences using custom Python scripts. For the genomic reads, all 35-bp k-mers (words) starting with the "AG" dinucleotide were selected from all reads, while keeping track of the number of times each specific subsequence was collected. The use of a dinucleotide trigger sequence allowed us to restrict file size while retaining the ability to compare k-mers between reads by effectively phasing them. For the RNA-Seq data, no trigger sequence was used, resulting in the selection of all possible k-mers. Next, subsequences that met a minimum total (male + female) count threshold of 10 for genomic k-mers and 20 for RNA-Seq k-mers, and a maximum total count threshold of 200 for genomic k-mers and 2000 for RNA-Seq k-mers, were retained. The k-mer counts were then compared between male and female reads. Finally, fully male-specific k-mers (count of 0 in the female set) were identified and used to extract the sex-biased reads from the original quality-trimmed read set as follows: all paired-end reads containing at least one of the selected fully male-specific k-mers were retained.
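A minimal Python sketch of the k-mer selection described above follows. The function names are hypothetical, reads are treated as plain strings, and reverse complements are ignored, which may differ from the original scripts. The defaults shown are the genomic settings (k = 35, "AG" trigger, total count between 10 and 200); for RNA-Seq reads the description corresponds to `trigger=None`, `min_total=20`, and `max_total=2000`.

```python
from collections import Counter


def count_kmers(reads, k=35, trigger="AG"):
    """Count k-mers in an iterable of read sequences.

    With a trigger, only k-mers starting with that dinucleotide are kept;
    with trigger=None, every k-mer is counted.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if trigger is None or kmer.startswith(trigger):
                counts[kmer] += 1
    return counts


def male_specific_kmers(male_counts, female_counts, min_total=10, max_total=200):
    """Select k-mers absent from the female set whose total count is in range."""
    selected = set()
    for kmer, male_n in male_counts.items():
        female_n = female_counts.get(kmer, 0)
        if female_n == 0 and min_total <= male_n + female_n <= max_total:
            selected.add(kmer)
    return selected


def reads_with_kmers(read_pairs, kmers, k=35):
    """Keep read pairs in which either mate contains a selected k-mer."""
    kept = []
    for mate1, mate2 in read_pairs:
        if any(mate[i:i + k] in kmers
               for mate in (mate1, mate2)
               for i in range(len(mate) - k + 1)):
            kept.append((mate1, mate2))
    return kept
```

Holding every k-mer in memory is only practical for small inputs; the trigger dinucleotide in the description serves exactly to limit the number of k-mers that have to be stored.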
Identification of expressed genes involved in sex determination: RNA-Seq reads from 9 male and 9 female individuals from the KK population and their parents (N = 2 x 10) were subjected to three independent analyses in order to identify candidate genes involved in sex determination in D. lotus. (i) RNA-Seq 150-bp paired-end reads were fragmented to 3 x 50-bp (x 2), and mapped to 796 genomic contigs on the MSY and the 777 corresponding X allelic sequences assembled from the female genomic reads. The original 150-bp PE RNA-Seq reads containing one or more 50-bp k-mers mapping to these genomic contigs were used to assemble cDNA contigs. (ii) Male-specific k-mers (MSKs) were isolated directly from the RNA-Seq read sequences (k = 35, ≥ 20X coverage). Next, all reads containing one or more of these MSKs were assembled into cDNA contigs using the CLC assembler. Using the genomic sequence reads obtained from 57 individuals, recombination mapping was performed to define a subset of MSY-linked contigs. The cDNA contigs constructed using approaches (i) and (ii) were used for differential expression analysis by determining reads per kilobase per million mapped reads (RPKM) values (> 1.0), and annotated using the TAIR/UniProt databases. The approach followed to integrate all data concerning the putative SD loci is described below. For the third approach, (iii), all male and female full-length RNA-Seq reads were assembled using the CLC assembler to produce approximately 400,000 contigs, including allelic polymorphisms. Selection for contigs that exhibited RPKM values of at least 1.0 reduced that number to approximately 80,000 contigs. The numbers of RNA-Seq reads from each individual mapping to these contigs were used for the DESeq analysis described below.
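The RPKM threshold used above can be illustrated with a short Python sketch based on the standard definition, RPKM = reads × 10^9 / (contig length in bp × total mapped reads). The function names are hypothetical, and taking the total as the sum of reads mapped to the listed contigs is an assumption; the record does not state which total was used.

```python
def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def filter_expressed(counts, lengths, min_rpkm=1.0):
    """Return {contig: RPKM} for contigs with RPKM of at least `min_rpkm`.

    `counts` maps contig name to mapped read count and `lengths` maps
    contig name to length in bp. The library size is assumed to be the
    sum of all counts supplied.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    expressed = {}
    for contig, n in counts.items():
        value = rpkm(n, lengths[contig], total)
        if value >= min_rpkm:
            expressed[contig] = value
    return expressed
```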
Construction of cDNA contigs located in the MSY: To map the RNA-Seq reads to the 796 Y and 777 X genomic contigs, each 150-bp read sequence was first converted to three 50-bp fragments using a custom Python script, to decrease the negative effect of intron sequences on mapping efficiency. Each fragmented read from the male and female individuals of the KK population was then mapped, respectively, to the 796 putative Y and 777 putative X MSY allelic contigs using BWA, allowing no nucleotide mismatches. The original PE reads for which at least one fragment mapped to a genomic contig were extracted using custom Python scripts. The mapped PE reads were assembled into contigs using Trinity and CAP3 with default parameters. Next, cDNA contigs exhibiting > 99% nucleotide identity to the 796 Y or 777 X allelic genomic contigs over at least 100 bp were defined as cDNA contigs located in the sex-determining region. From the Y and X allelic genomic contigs, respectively, 99 and 81 cDNA contigs were retained. Alignment of the genomic and cDNA contigs to each other indicated that these cDNA contigs were derived from 60 of the 796 Y allelic genomic contigs and / or their corresponding X allelic contigs.
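The fragmentation step described above (each 150-bp read cut into three 50-bp fragments, then recovery of the original pairs with at least one mapped fragment) could look roughly like the following Python sketch. The function names and the fragment identifier scheme `<read_id>/<mate>:<index>` are hypothetical, and discarding a trailing piece shorter than 50 bp is an assumption for reads that were trimmed below 150 bp.

```python
def fragment_read(seq, frag_len=50):
    """Split a read into consecutive, non-overlapping fragments.

    A 150-bp read yields three 50-bp fragments; any trailing piece
    shorter than `frag_len` is dropped (assumed behaviour).
    """
    return [seq[i:i + frag_len]
            for i in range(0, len(seq) - frag_len + 1, frag_len)]


def fragment_pair(read_id, mate1, mate2, frag_len=50):
    """Fragment both mates and label each fragment with its origin."""
    fragments = {}
    for mate_no, seq in ((1, mate1), (2, mate2)):
        for index, frag in enumerate(fragment_read(seq, frag_len)):
            fragments[f"{read_id}/{mate_no}:{index}"] = frag
    return fragments


def pairs_with_hits(mapped_fragment_ids):
    """Return the IDs of read pairs with at least one mapped fragment."""
    return {frag_id.rsplit("/", 1)[0] for frag_id in mapped_fragment_ids}
```

The set returned by `pairs_with_hits` would then be used to pull the full-length paired-end reads from the original files before assembly.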
Expression profiling: cDNA contigs corresponding to genes expressed in male or female developing bud primordia were assembled with the full-length cDNA reads from all male and female individuals using the CLC assembler and a minimum contig length of 200 bp. Next, the resulting approximately 400,000 cDNA contigs, including some alternative splicing isoforms, were used as reference sequences for alignment of the reads using BWA with default parameters. The read counts per contig were generated from the aligned SAM files using a custom R script. Differential expression between male and female individuals was analyzed in R (version 3.0.1) using the R package DESeq (version 1.14; http://bioconductor.org/packages/release/bioc/html/DESeq.html) (S. Anders and W. Huber, Genome Biol. 11, R106 (2010)). We conducted DESeq analysis using 10 biological replicates from male and female individuals, with the following parameters: method="per-condition" and sharingMode="gene-est-only". A false discovery rate (FDR) threshold of 0.01 was used to identify differentially expressed genes.
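The per-contig read counting was done with a custom R script that is not reproduced in this record. As an illustration only, the same kind of count table can be sketched in Python from SAM text lines. The function name is hypothetical, and counting only primary mapped alignments (skipping records with SAM flag bits 0x4, 0x100, or 0x800 set) is an assumption about what "simple reads mapped per contig" means.

```python
from collections import Counter


def count_reads_per_contig(sam_lines):
    """Count primary mapped alignments per reference contig.

    `sam_lines` is any iterable of SAM text lines (for example an open
    file). Header lines start with "@"; column 2 is FLAG and column 3 is
    the reference name (RNAME).
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        contig = fields[2]
        unmapped = flag & 0x4
        secondary_or_supplementary = flag & (0x100 | 0x800)
        if unmapped or secondary_or_supplementary or contig == "*":
            continue
        counts[contig] += 1
    return counts
```

Running this once per individual and joining the resulting counters by contig name would give a matrix of the shape DESeq expects: one row per contig and one column per sample.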
Genome_build: Assembled Contigs
Supplementary_files_format_and_content: .txt: Count file; simple reads mapped per contig. The contig sequences are available in the FASTA file (GSE61386_CLC_MF_All_DeNovo.fasta) that is linked to the GSE61386 record as a supplementary file.

Submission date

Sep 12, 2014

Last update date

May 15, 2019

Contact name

Luca Comai

E-mail(s)

lcomai@ucdavis.edu

Phone

(530) 752-8485

Organization name

UC Davis