Overall design 
We assembled gene expression data from 26 publicly available GEO and ArrayExpress studies and preprocessed them as a single dataset using a custom pipeline based on GCRMA. The assembled dataset contains 1489 microarray samples of human tissue that measure gene expression in normal lung (NORM) and in 5 lung diseases: lung adenocarcinoma (ADC), lung squamous cell carcinoma (SCC), large cell lung carcinoma (LCLC), asthma (AST), and chronic obstructive pulmonary disease (COPD). The dataset was prepared for subsequent disease classification analysis. The studies included are: GSE994, GSE10072, GSE10245, GSE10445, GSE10799, GSE12345, GSE12667, GSE14814, GSE1643, GSE1650, GSE17475, GSE18842, GSE18965, GSE19188, GSE2109, GSE3141, GSE4302, GSE5058, GSE6253, GSE7368, GSE7670, GSE8545, GSE8581, ETABM15, EMEXP231, and EMTAB47. These studies collected the microarray data on 2 platforms: Affymetrix U133A (GPL96) and Affymetrix U133Plus2 (GPL570).
The following are included in this reanalysis:
79_1 ArrayExpress study ETABM15 sample 1, ADC lung surgical resection GPL96
47_1 ArrayExpress study ETABM15 sample 2, ADC lung surgical resection GPL96
41_1 ArrayExpress study ETABM15 sample 3, ADC lung surgical resection GPL96
61_1 ArrayExpress study ETABM15 sample 4, ADC lung surgical resection GPL96
92_1 ArrayExpress study ETABM15 sample 5, ADC lung surgical resection GPL96
60_1 ArrayExpress study ETABM15 sample 6, ADC lung surgical resection GPL96
86_1 ArrayExpress study ETABM15 sample 7, ADC lung surgical resection GPL96
53_1 ArrayExpress study ETABM15 sample 8, ADC lung surgical resection GPL96
40_1 ArrayExpress study ETABM15 sample 9, ADC lung surgical resection GPL96
33_1 ArrayExpress study ETABM15 sample 10, ADC lung surgical resection GPL96
71_1 ArrayExpress study ETABM15 sample 11, ADC lung surgical resection GPL96
23_1 ArrayExpress study ETABM15 sample 12, ADC lung surgical resection GPL96
78_1 ArrayExpress study ETABM15 sample 13, ADC lung surgical resection GPL96
89_1 ArrayExpress study ETABM15 sample 14, ADC lung surgical resection GPL96
38_1 ArrayExpress study ETABM15 sample 15, ADC lung surgical resection GPL96
84_1 ArrayExpress study ETABM15 sample 16, ADC lung surgical resection GPL96
54_1 ArrayExpress study ETABM15 sample 17, ADC lung surgical resection GPL96
87_1 ArrayExpress study ETABM15 sample 18, ADC lung surgical resection GPL96
59_1 ArrayExpress study ETABM15 sample 19, ADC lung surgical resection GPL96
28_1 ArrayExpress study ETABM15 sample 20, ADC lung surgical resection GPL96
32_1 ArrayExpress study ETABM15 sample 21, ADC lung surgical resection GPL96
67_1 ArrayExpress study ETABM15 sample 22, ADC lung surgical resection GPL96
48_1 ArrayExpress study ETABM15 sample 23, ADC lung surgical resection GPL96
40_2 ArrayExpress study ETABM15 sample 24, NORM lung surgical resection GPL96
33_2 ArrayExpress study ETABM15 sample 25, NORM lung surgical resection GPL96
54_2 ArrayExpress study ETABM15 sample 26, NORM lung surgical resection GPL96
48_2 ArrayExpress study ETABM15 sample 27, NORM lung surgical resection GPL96
84_2 ArrayExpress study ETABM15 sample 28, NORM lung surgical resection GPL96
89_2 ArrayExpress study ETABM15 sample 29, NORM lung surgical resection GPL96
28_2 ArrayExpress study ETABM15 sample 30, NORM lung surgical resection GPL96
41_2 ArrayExpress study ETABM15 sample 31, NORM lung surgical resection GPL96
92_2 ArrayExpress study ETABM15 sample 32, NORM lung surgical resection GPL96
78_2 ArrayExpress study ETABM15 sample 33, NORM lung surgical resection GPL96
87_2 ArrayExpress study ETABM15 sample 34, NORM lung surgical resection GPL96
38_2 ArrayExpress study ETABM15 sample 35, NORM lung surgical resection GPL96
23_2 ArrayExpress study ETABM15 sample 36, NORM lung surgical resection GPL96
61_2 ArrayExpress study ETABM15 sample 37, NORM lung surgical resection GPL96
32_2 ArrayExpress study ETABM15 sample 38, NORM lung surgical resection GPL96
47_2 ArrayExpress study ETABM15 sample 39, NORM lung surgical resection GPL96
79_2 ArrayExpress study ETABM15 sample 40, NORM lung surgical resection GPL96
86_2 ArrayExpress study ETABM15 sample 41, NORM lung surgical resection GPL96
DL06 ArrayExpress study EMEXP231 sample 1, ADC lung surgical resection GPL96
DL01 ArrayExpress study EMEXP231 sample 2, ADC lung surgical resection GPL96
DL05 ArrayExpress study EMEXP231 sample 3, ADC lung surgical resection GPL96
DL14 ArrayExpress study EMEXP231 sample 4, ADC lung surgical resection GPL96
DL39 ArrayExpress study EMEXP231 sample 5, ADC lung surgical resection GPL96
DL34 ArrayExpress study EMEXP231 sample 6, ADC lung surgical resection GPL96
DL88 ArrayExpress study EMEXP231 sample 7, ADC lung surgical resection GPL96
DL09 ArrayExpress study EMEXP231 sample 8, ADC lung surgical resection GPL96
DL59 ArrayExpress study EMEXP231 sample 9, ADC lung surgical resection GPL96
DL65 ArrayExpress study EMEXP231 sample 10, ADC lung surgical resection GPL96
DL31 ArrayExpress study EMEXP231 sample 11, ADC lung surgical resection GPL96
DL45 ArrayExpress study EMEXP231 sample 12, ADC lung surgical resection GPL96
DL63 ArrayExpress study EMEXP231 sample 13, ADC lung surgical resection GPL96
DL36 ArrayExpress study EMEXP231 sample 14, ADC lung surgical resection GPL96
DL13 ArrayExpress study EMEXP231 sample 15, ADC lung surgical resection GPL96
T421 ArrayExpress study EMEXP231 sample 16, ADC lung surgical resection GPL96
DL68 ArrayExpress study EMEXP231 sample 17, ADC lung surgical resection GPL96
DL50 ArrayExpress study EMEXP231 sample 18, ADC lung surgical resection GPL96
DL46 ArrayExpress study EMEXP231 sample 19, ADC lung surgical resection GPL96
DL30 ArrayExpress study EMEXP231 sample 20, ADC lung surgical resection GPL96
DL40 ArrayExpress study EMEXP231 sample 21, ADC lung surgical resection GPL96
DL51 ArrayExpress study EMEXP231 sample 22, ADC lung surgical resection GPL96
DL08 ArrayExpress study EMEXP231 sample 23, ADC lung surgical resection GPL96
DL35 ArrayExpress study EMEXP231 sample 24, ADC lung surgical resection GPL96
DL58 ArrayExpress study EMEXP231 sample 25, ADC lung surgical resection GPL96
DL61 ArrayExpress study EMEXP231 sample 26, ADC lung surgical resection GPL96
DL33 ArrayExpress study EMEXP231 sample 27, ADC lung surgical resection GPL96
DL16 ArrayExpress study EMEXP231 sample 28, ADC lung surgical resection GPL96
DL37 ArrayExpress study EMEXP231 sample 29, ADC lung surgical resection GPL96
DL12 ArrayExpress study EMEXP231 sample 30, ADC lung surgical resection GPL96
DL67 ArrayExpress study EMEXP231 sample 31, ADC lung surgical resection GPL96
DL91 ArrayExpress study EMEXP231 sample 32, ADC lung surgical resection GPL96
DL54 ArrayExpress study EMEXP231 sample 33, ADC lung surgical resection GPL96
DL47 ArrayExpress study EMEXP231 sample 34, ADC lung surgical resection GPL96
DL57 ArrayExpress study EMEXP231 sample 35, ADC lung surgical resection GPL96
DL03 ArrayExpress study EMEXP231 sample 36, ADC lung surgical resection GPL96
DL41 ArrayExpress study EMEXP231 sample 37, ADC lung surgical resection GPL96
DL15 ArrayExpress study EMEXP231 sample 38, ADC lung surgical resection GPL96
DL66 ArrayExpress study EMEXP231 sample 39, ADC lung surgical resection GPL96
DL11 ArrayExpress study EMEXP231 sample 40, ADC lung surgical resection GPL96
DL44 ArrayExpress study EMEXP231 sample 41, ADC lung surgical resection GPL96
DL38 ArrayExpress study EMEXP231 sample 42, ADC lung surgical resection GPL96
DL56 ArrayExpress study EMEXP231 sample 43, ADC lung surgical resection GPL96
DL60 ArrayExpress study EMEXP231 sample 44, ADC lung surgical resection GPL96
DL48 ArrayExpress study EMEXP231 sample 45, ADC lung surgical resection GPL96
DL89 ArrayExpress study EMEXP231 sample 46, ADC lung surgical resection GPL96
DL42 ArrayExpress study EMEXP231 sample 47, ADC lung surgical resection GPL96
DL32 ArrayExpress study EMEXP231 sample 48, ADC lung surgical resection GPL96
DL55 ArrayExpress study EMEXP231 sample 49, ADC lung surgical resection GPL96
DL15N ArrayExpress study EMEXP231 sample 50, NORM lung surgical resection GPL96
DL50N ArrayExpress study EMEXP231 sample 51, NORM lung surgical resection GPL96
DL4_1N ArrayExpress study EMEXP231 sample 52, NORM lung surgical resection GPL96
DL4_17N ArrayExpress study EMEXP231 sample 53, NORM lung surgical resection GPL96
DL4_7N ArrayExpress study EMEXP231 sample 54, NORM lung surgical resection GPL96
DL4_11N ArrayExpress study EMEXP231 sample 55, NORM lung surgical resection GPL96
DL86N ArrayExpress study EMEXP231 sample 56, NORM lung surgical resection GPL96
DL79N ArrayExpress study EMEXP231 sample 57, NORM lung surgical resection GPL96
DL67N ArrayExpress study EMEXP231 sample 58, NORM lung surgical resection GPL96
Biopsy_07 ArrayExpress study EMTAB47 sample 1, NORM lung surgical resection GPL570
Biopsy_11 ArrayExpress study EMTAB47 sample 2, NORM lung surgical resection GPL570
Biopsy_06 ArrayExpress study EMTAB47 sample 3, NORM lung surgical resection GPL570
Biopsy_13 ArrayExpress study EMTAB47 sample 4, NORM lung surgical resection GPL570
Biopsy_10 ArrayExpress study EMTAB47 sample 5, NORM lung surgical resection GPL570
Biopsy_01 ArrayExpress study EMTAB47 sample 6, NORM lung surgical resection GPL570
Biopsy_22 ArrayExpress study EMTAB47 sample 7, NORM lung surgical resection GPL570
Biopsy_04 ArrayExpress study EMTAB47 sample 8, NORM lung surgical resection GPL570
Biopsy_20 ArrayExpress study EMTAB47 sample 9, NORM lung surgical resection GPL570
Biopsy_14 ArrayExpress study EMTAB47 sample 10, NORM lung surgical resection GPL570
Biopsy_02 ArrayExpress study EMTAB47 sample 11, NORM lung surgical resection GPL570
Biopsy_08 ArrayExpress study EMTAB47 sample 12, NORM lung surgical resection GPL570
We preprocess using the MATLAB implementation of GCRMA. To minimize preprocessing batch effects, it is desirable to preprocess all samples in the dataset together. However, preprocessing requires platform-specific chip specifications that give the location of each probe on the chip, which precludes global preprocessing in meta-analyses that use multiple platforms. To address this problem, we developed and applied a preprocessing pipeline that combines the raw .CEL files from multiple platforms that share the same probe sets. Importantly, our pipeline maps the locations of probes containing identical sequences on the different platforms, so that these probes can be grouped and processed together. The resulting "consensus datasets" are then preprocessed normally by applying the GCRMA algorithm. The output of this consensus preprocessing contains only the 22,227 probes that exist on both microarray platforms that we use; probes that appear on only one of the platforms are excluded from preprocessing. We also include "presence/marginal/absence" calls based on the MAS5 protocol, in which the measured values on the "perfect match" probes (perfectly complementary to the target) are compared to the values on the "mismatch" probes (base-pair mismatch at position 13 of 25) using the Wilcoxon signed rank test. "Present" calls are made when the p-value of the difference is less than 0.04, "marginal" calls when the p-value lies between 0.04 and 0.06, and "absent" calls when the p-value is greater than 0.06.
GCRMA signal
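
The two rules described in this section, keeping only probes found on both platforms and assigning detection calls from a p-value, can be illustrated with a minimal Python sketch. This is not the MATLAB pipeline used for this dataset. The function names, the dictionary layout (a probe keyed by probe set ID and sequence, mapped to an x/y chip location), and the example probes are hypothetical, and treating the 0.04 and 0.06 boundaries as "marginal" is an assumption, since the text only says "between".

```python
from typing import Dict, Tuple

Location = Tuple[int, int]   # hypothetical (x, y) position of a probe on a chip
ProbeKey = Tuple[str, str]   # hypothetical key: (probe set ID, probe sequence)


def consensus_probe_map(
    platform_a: Dict[ProbeKey, Location],
    platform_b: Dict[ProbeKey, Location],
) -> Dict[ProbeKey, Tuple[Location, Location]]:
    """Pair the chip locations of probes that occur on both platforms.

    Probes that appear on only one platform are dropped, mirroring the
    exclusion rule stated in the text.
    """
    shared = platform_a.keys() & platform_b.keys()
    return {key: (platform_a[key], platform_b[key]) for key in sorted(shared)}


def detection_call(p_value: float, alpha1: float = 0.04, alpha2: float = 0.06) -> str:
    """Turn a detection p-value into a present/marginal/absent call.

    Assumption: p-values exactly equal to alpha1 or alpha2 count as marginal.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha1:
        return "present"
    if p_value <= alpha2:
        return "marginal"
    return "absent"


if __name__ == "__main__":
    # Made-up probes for illustration only; not taken from GPL96 or GPL570.
    chip_a = {("set_1", "ACGTACGT"): (10, 12), ("set_2", "TTGACCAA"): (3, 7)}
    chip_b = {("set_1", "ACGTACGT"): (101, 55), ("set_3", "GGCATTCA"): (8, 8)}
    print(consensus_probe_map(chip_a, chip_b))
    for p in (0.01, 0.05, 0.2):
        print(p, detection_call(p))
```

Keying probes by both probe set ID and sequence is one way to express "probes containing identical sequences" within shared probe sets; a real implementation would read these locations from each platform's chip definition file.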
