Sample GSM1503660

Status

Public on Nov 04, 2014

Title

LF small RNA profiling

Sample type

SRA

Source name

Parent, early developing bud primordia, female

Organism

Diospyros lotus

Characteristics

cultivar: Kunsenshi
tissue: mixed persimmon buds
developmental stage: early differentiation of primordia
Sex: female

Treatment protocol

For expression analyses, mixed buds were sampled on June 17 and July 4, 2013, corresponding to the early differentiation stages of male/female primordia, respectively.

Growth protocol

Trees of the 57-individual KK population (D. lotus; T. Akagi et al., J. Japan Soc. Hort. Sci. 76, 214-221 (2014)) and their parents, planted at Kyoto University (Kyoto, Japan; approximately 35.03° N, 135.78° E).

Extracted molecule

total RNA

Extraction protocol

Small RNA libraries: Total RNA was extracted using the CTAB method and purified by phenol/chloroform extraction. The small RNA fraction was concentrated from total RNA using the mirVana miRNA Isolation kit (Life Technologies).
Small RNA library: Approximately 150 ng of concentrated small RNA was used for library construction with the NEBNext Small RNA Library Prep Set (NEB), according to the manufacturer’s instructions. PCR enrichment used 12 amplification cycles, followed by DNA cleanup with AMPure beads (AMPure:reaction = 1.1:1, v/v) to remove self-ligated adapter dimers. Library quality and quantity were assessed using the Agilent BioAnalyzer (Agilent Technologies) and a Qubit fluorometer (Invitrogen). The libraries were sequenced on an Illumina HiSeq 2000 sequencer (50-bp single-end reads).

Library strategy

ncRNA-Seq

Library source

transcriptomic

Library selection

size fractionation

Instrument model

Illumina HiSeq 2000

Description

Sample19_2
Kunsenshi female parent of the KK progeny population.
small RNA
Processed data file: AllContigsIMH14_All5.txt

Data processing

Sequence processing: Illumina sequencing reads were processed with custom Python scripts developed in the Comai laboratory and available online (http://comailab.genomecenter.ucdavis.edu/index.php/Barcoded_data_preparation_tools). In short, reads were split by barcode and trimmed for quality (average Phred quality > 20 over a 5-bp sliding window) and for adaptor sequence contamination, after which reads shorter than 35 bp were discarded (for the small RNA reads, this minimum length was reduced to 19 bp). For the small RNA (smRNA) data, a maximum read length of 25 bp was also applied. After these size thresholds were applied, approximately 12.0 M and 9.6 M reads were obtained from the female and male samples, respectively.
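The trimming and size filters above can be sketched as follows. This is a minimal illustration, not the Comai-lab scripts: it assumes adapter removal cuts the read at the first exact match to the adapter's first 8 bases, that quality trimming cuts the read just before the first 5-bp window whose mean Phred score is not above 20, and that the 19 to 25 bp small RNA bounds are inclusive. All function names and the example adapter are hypothetical.

```python
from typing import List, Optional


def adapter_trim(seq: str, adapter: str, min_overlap: int = 8) -> str:
    """Cut the read at the first exact match to the adapter's 5' prefix.

    Assumption: partial adapters shorter than min_overlap at the 3' end are kept.
    """
    idx = seq.find(adapter[:min_overlap])
    return seq if idx == -1 else seq[:idx]


def quality_trim(seq: str, quals: List[int], window: int = 5, min_mean: float = 20.0) -> str:
    """Truncate the read before the first window whose mean Phred score is <= min_mean."""
    for start in range(len(seq) - window + 1):
        if sum(quals[start:start + window]) / window <= min_mean:
            return seq[:start]
    return seq


def passes_length(seq: str, small_rna: bool) -> bool:
    """Size thresholds: 19-25 bp for small RNA reads, at least 35 bp otherwise."""
    if small_rna:
        return 19 <= len(seq) <= 25
    return len(seq) >= 35


def process_read(seq: str, quals: List[int], adapter: str, small_rna: bool = True) -> Optional[str]:
    """Adapter-trim, quality-trim, then size-filter one read; None means discarded."""
    trimmed = adapter_trim(seq, adapter)
    trimmed = quality_trim(trimmed, quals[:len(trimmed)])
    return trimmed if passes_length(trimmed, small_rna) else None


if __name__ == "__main__":
    insert = "TGAGGTAGTAGGTTGTATAGTT"   # 22-bp example insert
    adapter = "AGATCGGAAGAGCACACGTCT"   # example 3' adapter (assumption)
    read = insert + adapter
    print(process_read(read, [35] * len(read), adapter))
```

Trimming the adapter before the quality scan keeps a high-quality adapter tail from being mistaken for insert sequence; the opposite order would also work for reads whose adapter region is low quality.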
Sex-specific k-mer extraction: To select sex-biased reads, the quality-trimmed read files from both male and female samples were processed with custom Python scripts to identify sex-specific subsequences. For the genomic reads, all 35-bp k-mers (words) starting with the "AG" dinucleotide were collected from all reads, and the number of times each distinct subsequence occurred was recorded. Using a dinucleotide trigger sequence kept file sizes manageable while still allowing k-mers to be compared between reads, because the trigger effectively phases them. For the RNA-Seq data, no trigger sequence was used, so all possible k-mers were collected. Next, subsequences were retained if their total (male + female) count was at least 10 and at most 200 for genomic k-mers, or at least 20 and at most 2000 for RNA-Seq k-mers. The k-mer counts were then compared between male and female reads. Finally, fully male-specific k-mers (count of 0 in the female set) were identified and used to extract the sex-biased reads from the original quality-trimmed read set: all paired-end reads containing at least one of the selected fully male-specific k-mers were retained.
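The k-mer selection described above can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not the original scripts: the count thresholds are treated as inclusive, a read pair is kept when either mate contains a selected k-mer, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple


def count_kmers(reads: Iterable[str], k: int, trigger: Optional[str] = "AG") -> Counter:
    """Count k-mers; with a trigger, keep only k-mers that start with it (phasing)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if trigger is None or kmer.startswith(trigger):
                counts[kmer] += 1
    return counts


def male_specific_kmers(male: Counter, female: Counter,
                        min_total: int, max_total: int) -> Set[str]:
    """K-mers inside the total (male + female) count window that never occur in females."""
    selected = set()
    for kmer, m in male.items():
        f = female[kmer]  # a Counter returns 0 for absent keys
        if f == 0 and min_total <= m + f <= max_total:
            selected.add(kmer)
    return selected


def contains_selected(read: str, selected: Set[str], k: int) -> bool:
    """True if any k-mer of the read is in the selected set."""
    return any(read[i:i + k] in selected for i in range(len(read) - k + 1))


def extract_pairs(pairs: Iterable[Tuple[str, str]], selected: Set[str],
                  k: int) -> List[Tuple[str, str]]:
    """Retain a read pair if either mate carries at least one selected k-mer."""
    return [(r1, r2) for r1, r2 in pairs
            if contains_selected(r1, selected, k) or contains_selected(r2, selected, k)]
```

Under these assumptions, genomic reads would use `k=35`, `trigger="AG"`, `min_total=10`, `max_total=200`, while RNA-Seq reads would use `trigger=None`, `min_total=20`, `max_total=2000`. Filtering on the total count before or after requiring a female count of 0 gives the same selected set.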
Identification of expressed genes involved in sex determination: RNA-Seq reads from 9 male and 9 female individuals from the KK population, plus their parents (N = 2 x 10), were subjected to three independent analyses to identify candidate genes involved in sex determination in D. lotus. (i) Each mate of the 150-bp paired-end RNA-Seq reads was split into three 50-bp fragments (3 x 50 bp x 2), and the fragments were mapped to the 796 genomic contigs on the male-specific region of the Y chromosome (MSY) and to the 777 corresponding X allelic sequences assembled from the female genomic reads. The original 150-bp PE RNA-Seq reads containing one or more 50-bp fragments mapping to these genomic contigs were used to assemble cDNA contigs. (ii) Male-specific k-mers (MSKs) were isolated directly from the RNA-Seq read sequences (k = 35, ≥ 20X coverage). Next, all reads containing one or more of these MSKs were assembled into cDNA contigs using the CLC assembler. Using the genomic sequence reads obtained from 57 individuals, recombination mapping was performed to define a subset of MSY-linked contigs. The cDNA contigs constructed using approaches (i) and (ii) were used for differential expression analysis based on reads per kilobase per million mapped reads (RPKM) values (> 1.0), and were annotated using the TAIR/UniProt databases. The approach followed to integrate all data concerning the putative sex-determining (SD) loci is described below. (iii) In the third approach, all male and female full-length RNA-Seq reads were assembled with the CLC assembler into approximately 400,000 contigs, including allelic polymorphisms. Selecting contigs with RPKM values of at least 1.0 reduced this number to approximately 80,000 contigs. The numbers of RNA-Seq reads from each individual mapping to these contigs were used for the DESeq analysis described below.
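The RPKM filter used above follows the standard definition, RPKM = reads mapped to a contig x 10^9 / (contig length in bp x total mapped reads). The sketch below computes it and applies the 1.0 cut-off. It is illustrative only: which read total was used as the library size is not stated, so the sum of per-contig counts is an assumed default, and `inclusive` switches between the "at least 1.0" and "> 1.0" wordings used in the text.

```python
from typing import Dict, Optional, Set


def rpkm(counts: Dict[str, int], lengths: Dict[str, int],
         total_reads: Optional[int] = None) -> Dict[str, float]:
    """RPKM = reads on contig * 1e9 / (contig length in bp * total mapped reads).

    Assumption: if total_reads is not given, the sum of per-contig counts is
    used as the library size.
    """
    total = total_reads if total_reads is not None else sum(counts.values())
    return {c: n * 1e9 / (lengths[c] * total) for c, n in counts.items()}


def expressed(values: Dict[str, float], threshold: float = 1.0,
              inclusive: bool = True) -> Set[str]:
    """Contigs with RPKM >= threshold (or > threshold when inclusive is False)."""
    return {c for c, v in values.items()
            if (v >= threshold if inclusive else v > threshold)}


if __name__ == "__main__":
    # 10 reads on a 2-kb contig in a 5-million-read library give RPKM = 1.0.
    vals = rpkm({"contig_a": 10, "contig_b": 1},
                {"contig_a": 2000, "contig_b": 2000}, total_reads=5_000_000)
    print(vals, expressed(vals))
```

The worked case in the usage block sits exactly on the threshold, which is where the "at least" and "greater than" readings of the cut-off diverge.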
Construction of cDNA contigs located in the MSY: To map the RNA-Seq reads to the 796 Y and 777 X genomic contigs, each 150-bp read sequence was first converted into three 50-bp fragments using a custom Python script, to reduce the negative effect of intron sequences on mapping efficiency. The fragmented reads from the male and female individuals of the KK population were then mapped to the 796 putative Y and 777 putative X allelic contigs, respectively, using BWA with no nucleotide mismatches allowed. The original PE reads for which at least one fragment mapped to a genomic contig were extracted using custom Python scripts. The extracted PE reads were assembled into contigs with Trinity and CAP3 using default parameters. Next, cDNA contigs showing > 99% nucleotide identity to the 796 Y or 777 X allelic genomic contigs over at least 100 bp were defined as cDNA contigs located in the sex-determining region. In total, 99 and 81 cDNA contigs were retained from the Y and X allelic genomic contigs, respectively. Aligning the genomic and cDNA contigs to each other indicated that these cDNA contigs were derived from 60 of the 796 Y allelic genomic contigs and/or their corresponding X allelic contigs.
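The fragmentation, read recovery, and identity filter described above can be sketched as three small functions. This is a hypothetical illustration, not the custom scripts used here: fragments are consecutive and non-overlapping, a leftover tail shorter than 50 bp is dropped, fragment names follow an invented `<read_id>:<index>` scheme, and alignment hits are assumed to arrive as (cDNA contig, genomic contig, percent identity, alignment length) tuples, for example parsed from a BLAST-style tabular file.

```python
from typing import Iterable, List, Set, Tuple


def fragment_read(read_id: str, seq: str, size: int = 50) -> List[Tuple[str, str]]:
    """Split a read into consecutive non-overlapping fragments of `size` bp.

    A 150-bp read yields three 50-bp fragments; a shorter tail is dropped
    (assumption). Names use a hypothetical "<read_id>:<index>" scheme.
    """
    return [(f"{read_id}:{n}", seq[i:i + size])
            for n, i in enumerate(range(0, len(seq) - size + 1, size))]


def parent_reads(mapped_fragment_ids: Iterable[str]) -> Set[str]:
    """Original read IDs with at least one mapped fragment.

    rsplit keeps read IDs that themselves contain ':' (common in Illumina names).
    """
    return {fid.rsplit(":", 1)[0] for fid in mapped_fragment_ids}


def sdr_cdna_contigs(hits: Iterable[Tuple[str, str, float, int]],
                     min_identity: float = 99.0, min_length: int = 100) -> Set[str]:
    """cDNA contigs with > min_identity % identity over >= min_length bp."""
    return {cdna for cdna, _genomic, pident, aln_len in hits
            if pident > min_identity and aln_len >= min_length}
```

Short fragments tolerate introns because a 50-bp piece is more likely than the full 150-bp read to fall entirely within one exon, so an exact, mismatch-free alignment to the genomic contig remains possible.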
Expression profiling: cDNA contigs corresponding to genes expressed in male or female developing bud primordia were assembled from the full-length cDNA reads of all male and female individuals using the CLC assembler with a minimum contig length of 200 bp. Next, the resulting 400,000 cDNA contigs, which include some alternative splice variants and isoforms, were used as reference sequences for aligning the reads with BWA using default parameters. Read counts per contig were generated from the aligned SAM files using a custom R script. Differential expression between male and female individuals was analyzed in R (version 3.0.1) using the R package DESeq (version 1.14; http://bioconductor.org/packages/release/bioc/html/DESeq.html) (S. Anders and W. Huber, Genome Biol. 11, R106 (2010)). DESeq analysis was run with 10 biological replicates per sex, using the parameters method="per-condition" and sharingMode="gene-est-only". A false discovery rate (FDR) threshold of 0.01 was used to identify differentially expressed genes.
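Read counting and the FDR cut-off can be illustrated in Python, although the record states that counting was done with a custom R script and testing with DESeq. The sketch below is therefore a stand-in under stated assumptions: it counts each primary, mapped alignment once per contig using the SAM FLAG bits (so both mates of a pair are counted separately), and applies Benjamini-Hochberg adjustment to per-contig p-values, which, to our understanding, is the adjustment DESeq's `nbinomTest` reports in its `padj` column. It does not reimplement DESeq's dispersion estimation or negative-binomial test; the p-values are assumed to come from DESeq, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_reads_per_contig(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference contig (RNAME) in SAM text."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # unmapped, secondary, or supplementary alignment
        counts[rname] += 1
    return counts


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With an FDR threshold of 0.01, differentially expressed contigs would be those whose adjusted value is below 0.01, for example `[c for c, q in zip(contigs, benjamini_hochberg(pvals)) if q < 0.01]` for hypothetical `contigs` and `pvals` lists.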
Genome_build: Assembled Contigs
Supplementary_files_format_and_content: .txt: Count file of reads mapped per contig. The contig sequences are available in the FASTA file (GSE61386_CLC_MF_All_DeNovo.fasta) that is linked to the GSE61386 record as a supplementary file.

Submission date

Sep 12, 2014

Last update date

May 15, 2019

Contact name

Luca Comai

E-mail(s)

lcomai@ucdavis.edu

Phone

(530) 752-8485

Organization name

UC Davis