Series GSE268261
Status Public on Jul 17, 2024
Title Interpretably deep learning amyloid nucleation by massive experimental quantification of random sequences
Organism Saccharomyces cerevisiae
Experiment type Other
Summary More than 50 human diseases are characterized by the deposition of specific protein aggregates in the form of insoluble amyloid fibrils. However, only a very small number of proteins are known to form amyloids with high propensity, limiting our ability to understand, predict and engineer amyloid aggregation from sequence. Here we use a massively parallel assay to quantify the amyloid nucleation propensity of >100,000 random 20 amino acid sequences. Approximately 5% of assayed random sequences nucleate the formation of aggregates, generating a very large and diverse training dataset from which to train models to predict amyloid nucleation. We use this dataset to train CANYA, a convolution-attention hybrid neural network that predicts the propensity of any primary sequence to form amyloids. CANYA outperforms previous predictors of protein aggregation on additional random sequences and out-of-sample datasets including human disease-causing amyloids, with very stable performance across diverse prediction tasks. We adapt and extend recent advances in interpretability of genomic neural networks to elucidate CANYA’s decision-making process and learned grammar and to provide mechanistic insights into amyloid formation. Our results demonstrate the power of massive experimental random sequence-space exploration and provide an interpretable and robust neural network model for understanding, predicting and designing amyloid-forming proteins.
 
Overall design Systematic measurement of the nucleation of random 20-mer peptides
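
The sample names in this record (for example "input NNK1") suggest, though the record does not state it, that the random 20-mers were encoded with NNK degenerate codons (N = any base, K = G or T). Under that assumption, the minimal sketch below enumerates the NNK codons with the standard genetic code and draws one random 60-nucleotide sequence with its translation. The function name `random_nnk_peptide` is hypothetical and is not taken from the authors' code.

```python
import random
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK codons: any base at positions 1 and 2, G or T at position 3.
NNK_CODONS = [codon for codon in CODON_TABLE if codon[2] in "GT"]


def random_nnk_peptide(n_codons=20, rng=None):
    """Return (dna, peptide) for a random NNK-encoded sequence.

    Hypothetical helper for illustration; '*' in the peptide marks a stop codon.
    """
    rng = rng or random.Random()
    dna = "".join(rng.choice(NNK_CODONS) for _ in range(n_codons))
    peptide = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
    return dna, peptide


if __name__ == "__main__":
    dna, peptide = random_nnk_peptide(rng=random.Random(0))
    print(dna)
    print(peptide)
```

An NNK scheme covers all 20 amino acids with 32 codons and leaves a single stop codon (TAG), which is why it is a common choice for random peptide libraries.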
 
Contributor(s) Lehner B, Bolognesi B, Thompson M, Martín M
Citation(s) 39071305
Submission date May 23, 2024
Last update date Aug 16, 2024
Contact name Mariano Martín
E-mail(s) mmartin@ibecbarcelona.eu
Organization name IBEC
Street address c/ Baldiri Reixac 10-12
City Barcelona
ZIP/Postal code 08028
Country Spain
 
Platforms (1)
GPL19756 Illumina NextSeq 500 (Saccharomyces cerevisiae)
Samples (22)
GSM8288859 input NNK1
GSM8288860 output1 NNK1
GSM8288861 output2 NNK1
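
The samples above pair one input library with output libraries. A generic way to turn such paired sequencing counts into a per-sequence score is a log ratio of output to input frequency. The sketch below illustrates that general idea only; it is not the authors' pipeline, and the function name `log_enrichment`, the dictionary layout and the pseudocount value are all assumptions made for illustration.

```python
import math


def log_enrichment(input_counts, output_counts, pseudocount=0.5):
    """Log ratio of output frequency to input frequency per sequence.

    Hypothetical helper: counts are dicts mapping sequence -> read count.
    The pseudocount avoids log(0) for sequences missing from the output.
    """
    total_in = sum(input_counts.values())
    total_out = sum(output_counts.values())
    scores = {}
    for seq, n_in in input_counts.items():
        n_out = output_counts.get(seq, 0)
        freq_in = (n_in + pseudocount) / total_in
        freq_out = (n_out + pseudocount) / total_out
        scores[seq] = math.log(freq_out / freq_in)
    return scores


if __name__ == "__main__":
    # Toy counts, invented for the example; not data from this series.
    toy_input = {"seqA": 100, "seqB": 100}
    toy_output = {"seqA": 300, "seqB": 100}
    print(log_enrichment(toy_input, toy_output))
```

A positive score means a sequence is more frequent after selection than before it; with replicate outputs, scores would typically be computed per replicate and then combined.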
Relations
BioProject PRJNA1115911

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE268261_MT_MM_TSM_BB_BL_processed_data.xlsx 6.9 Mb (ftp)(http) XLSX
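
GEO serves supplementary files over HTTPS from a directory layout in which series are grouped by replacing the last three digits of the accession with "nnn". Assuming that layout still applies to this series, the sketch below builds the download URL for the file listed above. The helper name `geo_suppl_url` is hypothetical, and the code only constructs the URL; it does not contact the server.

```python
def geo_suppl_url(accession, filename):
    """Build the HTTPS URL of a GEO series supplementary file.

    Hypothetical helper; assumes the usual GEO layout
    /geo/series/<stub>/<accession>/suppl/<filename>.
    """
    if not accession.startswith("GSE") or not accession[3:].isdigit():
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = accession[3:]
    # Accessions with three or fewer digits fall under the stub 'GSEnnn'.
    stub = "GSE" + digits[:-3] + "nnn"
    return (
        "https://ftp.ncbi.nlm.nih.gov/geo/series/"
        f"{stub}/{accession}/suppl/{filename}"
    )


if __name__ == "__main__":
    print(geo_suppl_url(
        "GSE268261",
        "GSE268261_MT_MM_TSM_BB_BL_processed_data.xlsx",
    ))
```

The resulting URL can be passed to any HTTP client; the sheet names and columns of the workbook are not described in this record, so they should be inspected after download rather than assumed.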
SRA Run Selector
Raw data are available in SRA
