Understanding the functional organization of genetic information is a major challenge in modern biology. Since the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and the efficient sharing of research material through community databases have opened new perspectives on the study of living organisms and the structure of life.
High-dimensional genomic observations are subject to complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources with the wealth of background information in genomic data repositories, it has become possible to resolve some of the uncertainties associated with individual observations. It has also become possible to identify functional mechanisms that could not be detected from any single measurement source alone. However, measuring all aspects of genome function is far beyond what any single research institution could afford. Sharing research data, methods, code, ideas, and other research material through centralized databases has therefore become a central element in investigating the structure and function of the human genome. Contributing to these global efforts was also a key motivation for my recent thesis. In it, I developed novel computational strategies, with open source implementations, to investigate various aspects of genome function through integrative analysis of heterogeneous genomic data sources.
Similar computational challenges arise in quantitative social science. There, the limited availability of observations and of targeted computational tools forms a bottleneck in investigating such extremely complex and poorly understood systems. Can the open data movement foster the kind of global collaboration we now see in modern genomics, and help us understand the structure and function of human societies?