GEO Accession viewer

Sample GSM5051942

Status

Public on Aug 11, 2021

Title

DNAseq of Arabidopsis: ddm1-1_rep2

Sample type

SRA

Source name

rosette leaf

Organism

Arabidopsis thaliana

Characteristics

genotype: ddm1-1
age: three-week-seedling
tissue: rosette leaf

Growth protocol

Arabidopsis seeds were plated on 1/2 Murashige and Skoog (MS) medium with 0.6% agar and 1.5% sucrose and stratified for 7 days at 4℃ in darkness before being transferred to the growth chamber (16 h light/8 h dark, 22℃). The 2-week-old seedlings were then transplanted to soil and grown for 1 more week in the growth chamber (16 h light/8 h dark, 22℃).

Extracted molecule

genomic DNA

Extraction protocol

Genomic DNA was extracted from rosette leaves of three-week-old plants using the DNeasy Plant Maxi kit (Qiagen).
Library construction and sequencing were performed at the PSC Genomics Core Facility.

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

Illumina HiSeq 2500

Data processing

For RNA-seq data, low-quality sequences and adaptors were trimmed using Trimmomatic. Clean reads were mapped to the reference genome by TopHat with the parameter "-g 1". The total number of reads mapping to each gene was calculated with the htseq-count script in HTSeq with the parameter "--nonunique all", to minimize over-estimation of TE expression caused by overlapping genes; read counts for each TE were calculated by htseq-count with parameters "--nonunique all -m intersection-strict" based on the TE-only annotation. Genes and TEs with a normalized expression level of at least one count per million mapped reads (CPM) in three or more libraries were considered expressed. Principal component analysis (PCA) based on the transcript levels of the expressed protein-coding genes and TEs was performed using the prcomp function in R with default settings. Differentially expressed TEs with at least a 2-fold change in expression and an FDR < 0.05 were identified with the R package edgeR using the trimmed mean of M-values (TMM) normalization method.
For methylation data, low-quality sequences and adapters were trimmed using Trimmomatic with parameters "LEADING:3 SLIDINGWINDOW:4:30 MINLEN:36", and clean reads were mapped to the A. thaliana TAIR10 genome using the Bisulfite Sequence Mapping Program (BSMAP) with parameters "-v 2 -S 1", allowing 2 mismatches. The methratio.py script from BSMAP with parameters "-r -z -p -m 1" was used to extract the methylation ratio from the mapping results; only mapped reads remaining after deduplication were considered for subsequent analyses.
For ChIP-seq data analysis, around 15.2 million raw paired-end reads were obtained for each sample and subsequently cleaned by Trimmomatic. Clean reads were mapped to the reference genome by Bowtie2 using the parameters "--very-sensitive --no-unal --no-mixed --no-discordant -k 2". Subsequently, uniquely mapped reads were selected, and duplicates were marked using Picard and then removed with the SAMtools "rmdup" command. Coverage of deduplicated reads was normalized to 1× sequencing depth by bamCoverage in deepTools with parameters "--normalizeUsing RPGC --exactScaling -bs 10".
Identification of non-reference TE insertions with TSDs (Target Site Duplications) was conducted using SPLITREADER with some modifications.
Genome_build: TAIR10
Supplementary_files_format_and_content: methylation wig files, RNA-seq coverage bedGraph, ChIP-seq BigWig files
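
The expression filter described in the RNA-seq processing step (at least one count per million mapped reads in three or more libraries) can be illustrated with a minimal Python sketch. This is not the submitters' code, which used R and edgeR; the function name `expressed_mask`, the toy count matrix, and the use of column sums as library sizes are assumptions made here for illustration only.

```python
def expressed_mask(counts, min_cpm=1.0, min_libraries=3):
    """Flag features (genes or TEs) that reach min_cpm in at least min_libraries libraries.

    counts: list of rows, one row per feature, one column per library (raw read counts).
    Assumption of this sketch: library size is the column sum of the matrix.
    """
    n_libs = len(counts[0])
    # Total counts per library, used as the CPM denominator.
    library_sizes = [sum(row[j] for row in counts) for j in range(n_libs)]
    mask = []
    for row in counts:
        # Count the libraries in which this feature reaches the CPM threshold.
        n_pass = sum(
            1 for j in range(n_libs)
            if row[j] * 1e6 / library_sizes[j] >= min_cpm
        )
        mask.append(n_pass >= min_libraries)
    return mask


# Hypothetical toy matrix: 3 features x 4 libraries, each library totalling 1,000,000 reads.
toy_counts = [
    [500_000, 600_000, 400_000, 700_000],
    [499_999, 399_999, 600_000, 300_000],
    [1, 1, 0, 0],
]
mask = expressed_mask(toy_counts)
```

In the toy matrix the third feature reaches 1 CPM in only two libraries, so it is not flagged as expressed, while the first two features are.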

Submission date

Feb 01, 2021

Last update date

Aug 11, 2021

Contact name

Cheng Zhao

E-mail(s)

zhaocheng3326@gmail.com

Organization name

Karolinska Institute

Department

Clinical Science