|
|
GEO help: Mouse over screen elements for information. |
|
Status |
Public on Sep 03, 2020 |
Title |
A: pre-transfection MDA-MB-231 cells |
Sample type |
SRA |
|
|
Source name |
pre-transfection MDA-MB-231 cells
|
Organism |
Homo sapiens |
Characteristics |
tissue: Triple negative breast cancer gender: female cell line: MDA-MB-231
|
Extracted molecule |
total RNA |
Extraction protocol |
A total amount of 3 μg RNA per sample was used as input material for the RNA sample preparations. Firstly, ribosomal RNA was removed by Epicentre Ribo-zero™ rRNA Removal Kit (Epicentre, USA), and rRNA free residue was cleaned up by ethanol precipitation. Subsequently, sequencing libraries were generated using the rRNA- depleted RNA by NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® (NEB, USA) following manufacturer’s recommendations. Briefly, fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer(5X). First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase(RNaseH-). Second strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. In the reaction buffer, dNTPs with dTTP were replaced by dUTP. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3’ ends of DNA fragments, NEBNext Adaptor with hairpin loop structure were ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 150~200 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, USA). Then 3 μl USER Enzyme (NEB, USA) was used with size-selected, adaptor-ligated cDNA at 37° C for 15 min followed by 5 min at 95°C before PCR. Then PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. At last, products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system.
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina HiSeq 2500 |
|
|
Data processing |
Raw data(raw reads) of fastq format were firstly processed through in-house perl scripts. In this step, clean data(clean reads) were obtained by removing reads containing adapter, reads on containing ploy-N and low quality reads from raw data. At the same time, Q20, Q30 and GC content of the clean data were calculated. All the down stream analyses were based on the clean data with high quality. Reference genome and gene model annotation files were downloaded from genome website directly. Index of the reference genome was built using bowtie2 v2.2.8 and paired-end clean reads were aligned to the reference genome using HISAT2(Langmead, B.et al) v2.0.4. HISAT2 was run with ‘--rna-strandness RF’, other parameters were set as default. The mapped reads of each sample were assembled by StringTie (v1.3.1) (Mihaela Pertea.et al. 2016) in a reference-based approach. StringTie uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full- length transcripts representing multiple splice variants for each gene locus. CNCI (Coding-Non-Coding-Index) (v2) profiles adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations (Sun et al. 2013). We use CNCI with default parameters. CPC (Coding Potential Calculator) (0.9-r2) mainly through assess the extent and quality of the ORF in a transcript and search the sequences with known protein sequence database to clarify the coding and non-coding transcripts (Kong et al. 2007). We used the NCBI eukaryotes' protein database and set the e-value ‘1e-10’ in our analysis. Phast (v1.3) is a software package contains much of statistical programs, most used in phylogenetic analysis (Siepel, et al. 2005), and phastCons is a conservation scoring and identificating program of conserved elements. We used phyloFit to compute phylogenetic models for conserved and non-conserved regions among species and then gave the model and HMM transition parameters to phyloP to compute a set of conservation scores of lncRNA and coding genes. Cuffdiff (v2.1.1) was used to calculate FPKMs of both lncRNAs and coding genes in each sample (Trapnell, C. et al. 2010). Gene FPKMs were computed by summing the FPKMs of transcripts in each gene group. FPKM means fragments per kilo-base of exon per million fragments mapped, calculated based on the length of the fragments and reads count mapped to this fragment. The Ballgown suite includes functions for interactive exploration of the transcriptome assembly, visualization of transcript structures and feature-specific abundances for each locus, and post-hoc annotation of assembled features to annotated features(Alyssa C. Frazee et al.2014). Transcripts with an P-adjust <0.05 were assigned as differentially expressed. Cuffdiff provides statistical routines for determining differential expression in digital transcript or gene expression data using a model based on the negative binomial distribution (Trapnell, C. et al. 2010). Transcripts with an P-adjust <0.05 were assigned as differentially expressed. Gene Ontology (GO) enrichment analysis of differentially expressed genes or lncRNA target genes were implemented by the GOseq R package, in which gene length bias was corrected(Young, M. D.et al.2010). GO terms with corrected Pvalue less than 0.05 were considered significantly enriched by differential expressed genes. KEGG is a database resource for understanding high-level functions and utilities of the biological system(Kanehisa, M.et al.2008), such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/). We used KOBAS software to test the statistical enrichment of differential expression genes or lncRNA target genes in KEGG pathways(Mao, X.et al.1995). PPI analysis of differentially expressed genes was based on the STRING database, which known and predicted Protein-Protein Interactions. For the species existing in the database, we construct the networks by extract the target gene list from the database; Otherwise, Blastx (v2.2.28) was used to align the target gene sequences to the selected reference protein sequences, and then the networks was built according to the known interaction of selected reference species. Alternative splicing events were classified to 12 basic types by the software Asprofile v1.0. The number of AS events in each sample was estimated, separately. Picard-tools v1.96 and samtools v0.1.18 were used to sort, mark duplicated reads and reorder the bam alignment results of each sample. GATK2 software was used to perform SNP calling(McKenna, A.et al.2010). Genome_build: homo_sapiens_Ensembl_94
|
|
|
Submission date |
Nov 20, 2019 |
Last update date |
Sep 03, 2020 |
Contact name |
Bolin Wu |
Organization name |
Harbin Medical University Cancer Hospital
|
Street address |
No.150, Haping Road, Nangang District
|
City |
Harbin |
State/province |
Heilongjiang Province |
ZIP/Postal code |
150081 |
Country |
China |
|
|
Platform ID |
GPL16791 |
Series (1) |
GSE140729 |
Structure of LINC00511-siRNA-conjugated nanobubbles and improvement of cisplatin sensitivity on triple negative breast cancer |
|
Relations |
BioSample |
SAMN13336968 |
SRA |
SRX7191584 |
Supplementary data files not provided |
SRA Run Selector |
Raw data are available in SRA |
Processed data are available on Series record |
|
|
|
|
|