Computational solutions developed in one application domain are often easily generalized to new problem settings. However, the poor availability of algorithmic implementations slows down the flow from theory to practice. An increasing proportion of the research details in modern data-intensive science is embedded in the code and data accompanying traditional publications. The lack of widely adopted standards for sharing these resources forms a serious bottleneck for transparency, reproducibility, and progress.
The Machine Learning Open Source Software (MLOSS) workshop provides a meeting point for the machine learning community to discuss potential solutions to these challenges. Victoria Stodden, a science commons fellow, gave a fascinating keynote talk on intellectual property issues in modern data-intensive science. This and other presentations are freely available at videolectures.net. Widespread adoption of open source policies will have a remarkable impact on machine learning and its applications through an accelerated scientific process and enhanced reproducibility (see the recent position paper advocating the need to make computational code available to other scientists, data analysts, and the general public). The community website mloss.org supports the movement towards open access by hosting more than 200 machine learning projects. Our Probabilistic Dependency Modelling Toolkit is one of them.
Unrestricted access to algorithmic solutions will have wider implications in society by facilitating the emergence, flow, and application of computational ideas. The full potential of these resources will be realized only when the public at large has convenient access to shared data resources and tools for discovery.