How can you preprocess 20,000 CEL files (or more) on an ordinary desktop computer in a few hours? Our new Online-RPA algorithm, developed in collaboration with the EBI functional genomics group and recently published in Nucleic Acids Research (2013), enables full utilization of the most comprehensive microarray data collections available to date. We hope the method will be widely adopted by the microarray community, and we welcome feedback on the implementation.
Transcriptome-wide profiling data sets on standardized microarray platforms (such as the Affymetrix HG-U133Plus2 array) are now available through ArrayExpress and other genomic data repositories for tens of thousands of samples, covering thousands of body sites and disease conditions. The lack of scalable probe-level preprocessing techniques for very large gene expression atlas collections has been a bottleneck for full utilization of these data resources.
The new online version of RPA (Robust Probabilistic Averaging) allows fully scalable analysis of contemporary short oligonucleotide microarray atlases (Affymetrix and other) of any size, up to collections involving hundreds of thousands of samples. The scalability is achieved by sequential hyperparameter updates: the data are processed in batches and only the model hyperparameters are carried over from one batch to the next, which circumvents the extensive memory requirements of standard approaches. Unlike fRMA, our method is readily applicable to all short oligonucleotide platforms. It also outperforms the standard RMA (a special case of the general RPA model) even in moderately sized standard data sets, and it can be used as the default preprocessing method for short oligo microarrays.
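To illustrate the idea of sequential hyperparameter updates, here is a minimal Python sketch. It is not the RPA implementation: it assumes a single probe set, probe-specific offsets already removed, and an inverse-gamma prior on each probe's noise variance, and it uses a simplified precision-weighted mean as the per-sample signal estimate. The function names, the prior values `alpha0` and `beta0`, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def update_hyperparameters(alpha, beta, batch):
    """One sequential update of inverse-gamma hyperparameters (illustrative only).

    batch: array of shape (n_samples, n_probes) with log-intensities of one
    probe set; probe offsets are assumed to be removed already.
    """
    # Point estimate of each probe's variance: mode of the inverse-gamma distribution
    var = beta / (alpha + 1.0)
    # Less noisy probes get more weight in the signal estimate
    weights = (1.0 / var) / np.sum(1.0 / var)
    signal = batch @ weights                  # one signal estimate per sample
    residuals = batch - signal[:, None]       # probe-level deviations from the signal
    # Conjugate-style update: only alpha and beta are carried to the next batch
    alpha_new = alpha + batch.shape[0] / 2.0
    beta_new = beta + 0.5 * np.sum(residuals ** 2, axis=0)
    return alpha_new, beta_new


def online_variance_estimates(batches, n_probes, alpha0=1.0, beta0=1.0):
    """Consume an iterable of batches; memory use is bounded by one batch."""
    alpha = np.full(n_probes, alpha0)
    beta = np.full(n_probes, beta0)
    for batch in batches:
        alpha, beta = update_hyperparameters(alpha, beta, batch)
    return alpha, beta


def simulate_batches(n_batches=50, batch_size=100, seed=0):
    """Hypothetical data source: yields one simulated batch at a time."""
    rng = np.random.default_rng(seed)
    true_sd = np.array([0.1, 0.2, 0.5, 1.0])  # assumed probe noise levels
    for _ in range(n_batches):
        signal = rng.normal(8.0, 1.0, size=batch_size)
        noise = rng.normal(0.0, true_sd, size=(batch_size, true_sd.size))
        yield signal[:, None] + noise


alpha, beta = online_variance_estimates(simulate_batches(), n_probes=4)
probe_variance = beta / (alpha + 1.0)  # noisier probes end up with larger values
```

Because the batches come from a generator, only one batch is held in memory at a time, and the amount of state carried between batches does not grow with the number of samples. In this simplified form the variance estimates mainly recover the relative reliability of the probes; the published model and the package should be consulted for the actual update rules.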
Online-RPA is freely available as an R/Bioconductor package. The wiki site provides installation instructions and usage examples. For feedback, issues, bug tracking, and pull requests, see the GitHub development version.
- A fully scalable online pre-processing algorithm for short oligonucleotide microarray atlases. Leo Lahti, Aurora Torrente, Laura L. Elo, Alvis Brazma, and Johan Rung. Nucleic Acids Research 41(10):e110, 2013.
- Probabilistic analysis of probe reliability in differential gene expression studies with short oligonucleotide arrays. Leo Lahti, Laura L. Elo, Tero Aittokallio, and Samuel Kaski. IEEE/ACM Transactions on Computational Biology and Bioinformatics 8(1):217-25, 2011.
- Robust Probabilistic Analysis (RPA). Bioconductor package. Leo Lahti, 2013. URL: http://bioconductor.org/packages/release/bioc/html/RPA.html