NeuroDebian is a turnkey research software platform for all aspects of the neuroscientific research process. It takes the ideas of software hosting portals such as NITRC, which aim to maximize research transparency and methods sharing, one step further by providing a comprehensive suite of readily usable and fully integrated software with a robust testing and deployment infrastructure. Consequently, it improves interoperability among the tools and frees researchers from the burden of tedious installation or upgrade procedures. That, in turn, leaves them more time for actual research activities and more motivation to test new analysis tools and stay connected with the latest methodological developments in the field.
- Y.O. Halchenko & M. Hanke (2012). Open is not enough. Let's take the next step: An integrated, community-driven computing platform for neuroscience. Frontiers in Neuroinformatics, 6:22. [PDF] DOI: 10.3389/fninf.2012.00022
PyMVPA is a Python-based framework for neural decoding using multivariate pattern analysis. It affords both volume- and surface-based analyses using a wide variety of supervised and unsupervised machine learning methods, representational similarity analyses, searchlight analyses, hyperalignment of representational spaces, and model-based decoding and encoding. The software can also be used for neural data other than fMRI, including analysis of MEG and EEG data through spatio-temporo-frequency band searchlights and cross-modal EEG to fMRI trans-fusion. It has also been used for analyses of data unrelated to neuroscience, demonstrating its general utility. PyMVPA also serves as a repository for sample data sets (e.g., Haxby et al. 2001) that have found wide applicability in education, development of new algorithms and analyses, and independent research reports.
- M. Hanke, Y.O. Halchenko, et al. (2009). PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics, 7, 37-53. DOI: 10.1007/s12021-008-9041-y
DataLad is ongoing work, funded by the NSF and the German BMBF, to adapt the model of open-source software (OSS) distributions to address the technical limitations of today's data sharing and to develop all components of a "data distribution". The key concepts are: 1) Leverage, but do not replace, independent, existing, and future data hosting solutions to form a federated platform for data sharing. 2) Employ software for data tracking and deployment logistics specialized for large data (git-annex) built atop Git, the most capable distributed version control system (dVCS) available today, to enable efficient data access at any level of granularity (from single files to entire collections of datasets). DataLad will provide access to data available from various sources (e.g. lab or consortium web-sites such as humanconnectome.org; data sharing portals such as openfmri.org and crcns.org) through a single interface. It will enable students and scientists to operate on data using familiar concepts, such as files and directories, while transparently managing data access and authorization with underlying hosting providers.
- NSF (#1429999) and BMBF awarded CRCNS US-German Data Sharing Project: DataGit - converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'. PIs: Y.O. Halchenko and M. Hanke
- M. Hanke, M. Visconti di Oleggio Castello, K. Meyer, B. Poldrack, and Y.O. Halchenko (2018). YODA: YODA's organigram on data analysis. OHBM 2018, Singapore.
DueCredit provides a solution to the problem of inadequate citation and referencing of scientific software and methods. It provides a simple framework (at the moment for Python only) to embed publication or other references in the original code so that they are automatically collected and reported to the user at the necessary level of reference detail, i.e. if the software provides multiple citable implementations, only references for the functionality actually used are presented.
As a side effect, we hope that DueCredit will also reduce demand for "prima-ballerina" projects, encourage contributions to existing open-source codebases, and as a result solidify the scientific software ecosystem.
- Y.O. Halchenko and M. Visconti di Oleggio Castello (2016). DueCredit - automagically collect citations for software, methods, and data you use. OHBM 2016, Geneva, Switzerland
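The idea of embedding references in code and reporting only those for functionality that was actually used can be illustrated with a minimal sketch. This is not DueCredit's API: the `CitationCollector` class, its `cite` decorator, and the placeholder reference strings are all hypothetical names introduced here for illustration.

```python
import functools


class CitationCollector:
    """Toy registry (hypothetical, not DueCredit itself): a reference is
    recorded only when the function tagged with it is actually called."""

    def __init__(self):
        self._used = {}  # reference -> description, in first-use order

    def cite(self, reference, description=""):
        """Decorator that attaches a reference to a function."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Register at call time, not at definition time, so that
                # implementations that are never used are never reported.
                self._used.setdefault(reference, description)
                return func(*args, **kwargs)
            return wrapper
        return decorator

    def report(self):
        """Return (reference, description) pairs for used functionality."""
        return list(self._used.items())


collector = CitationCollector()


@collector.cite("placeholder-ref-A", "hypothetical method A (mean)")
def method_a(values):
    return sum(values) / len(values)


@collector.cite("placeholder-ref-B", "hypothetical method B (median)")
def method_b(values):
    return sorted(values)[len(values) // 2]


result = method_a([1.0, 2.0, 3.0])    # only method A is exercised
used_references = collector.report()  # method B's reference is absent
```

Registering the reference inside the wrapper rather than in the decorator body is what makes the report reflect usage instead of mere availability.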
HeuDiConv / ReproIn
HeuDiConv is a flexible DICOM converter for organizing brain imaging data into structured directory layouts. As a part of the larger, NIH-supported ReproNim effort, we are developing a HeuDiConv-based ReproIn solution for turnkey automatic conversion of all collected MR data to a collection of BIDS DataLad datasets. It includes a flexible BIDS-like specification of how to name scanning sequences at the scanner, and a HeuDiConv dbic_bids.py heuristic to automate layout and conversion of the datasets. This solution is deployed at DBIC (Dartmouth Brain Imaging Center) and already facilitates reproducible research, data sharing, and uploads to central archives such as NDA.
- M. Visconti di Oleggio Castello, J.E. Dobson, T. Sackett, C. Kodiweera, J.V. Haxby, M. Goncalves, S. Ghosh, and Y.O. Halchenko (2018). ReproIn: automatic generation of shareable, version-controlled BIDS datasets from MR scanners. OHBM 2018, Singapore.
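To make the notion of a BIDS-like sequence naming scheme concrete, here is a simplified sketch that turns a scanner sequence name into a BIDS-style relative path. It is not the dbic_bids.py heuristic: the functions `parse_sequence_name` and `bids_relative_path`, the `ENTITY_ORDER` tuple, and the exact name layout (`<datatype>-<suffix>_<key>-<value>_...`) are assumptions chosen for illustration.

```python
# Assumed subset of entities and their order in the output filename.
ENTITY_ORDER = ("ses", "task", "acq", "run")


def parse_sequence_name(name):
    """Split e.g. 'func-bold_task-rest_run-02' into
    (datatype, suffix, entities)."""
    head, *rest = name.split("_")
    datatype, _, suffix = head.partition("-")
    entities = {}
    for part in rest:
        key, sep, value = part.partition("-")
        if not sep or not value:
            raise ValueError(f"malformed entity {part!r} in {name!r}")
        entities[key] = value
    return datatype, suffix, entities


def bids_relative_path(subject, name):
    """Build a BIDS-style relative path for one converted sequence."""
    datatype, suffix, entities = parse_sequence_name(name)
    # Filename: subject, then known entities in fixed order, then suffix.
    parts = [f"sub-{subject}"]
    parts += [f"{key}-{entities[key]}" for key in ENTITY_ORDER if key in entities]
    if suffix:
        parts.append(suffix)
    filename = "_".join(parts) + ".nii.gz"
    # Directories: subject, optional session, then the datatype folder.
    dirs = [f"sub-{subject}"]
    if "ses" in entities:
        dirs.append(f"ses-{entities['ses']}")
    dirs.append(datatype)
    return "/".join(dirs + [filename])


example_path = bids_relative_path("01", "func-bold_task-rest_run-02")
# example_path == "sub-01/func/sub-01_task-rest_run-02_bold.nii.gz"
```

Because the target layout is encoded in the sequence name at acquisition time, conversion needs no per-study configuration, which is the point of naming sequences consistently at the scanner.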
NICEMAN (Neuroimaging Computational Environments Manager) is also a part of the NIH-supported ReproNim effort. It aims to facilitate reproducible computation by collecting detailed information about the origin of the components used (Debian and/or Conda packages, VCS repositories, etc.), so that computational environments can be analyzed and re-created.
- M. Travers, R. Buccigrossi, C. Haselgrove, K. Meyer, and Y.O. Halchenko (2018). NICEMAN: NeuroImaging Computational Environments Manager. OHBM 2018, Singapore.
Quail is a Python toolbox for analyzing data from free recall memory experiments. Some key features include:
- A simple data structure for storing encoding and recall data
- A set of functions for analyzing data by computing standard memory performance metrics
- A simple API for customizing plot styles
- Support for "naturalistic" stimuli such as movies, texts, and speech data
- A set of powerful tools for importing data, automatically transcribing audio files (speech-to-text), and more
- A.C. Heusser, P.C. Fitzpatrick, C.E. Field, K. Ziman, and J.R. Manning (2017). Quail: A Python toolbox for analyzing and plotting free recall data. The Journal of Open Source Software, 2(18): 424.
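As an illustration of what "standard memory performance metrics" for free recall look like, the sketch below computes recall accuracy and a serial position curve in plain Python. It does not use Quail's API or data structures: the functions `recall_accuracy` and `serial_position_curve` and the toy word lists are hypothetical.

```python
def recall_accuracy(presented, recalled):
    """Fraction of presented words that were recalled at least once."""
    recalled_set = set(recalled)
    return sum(word in recalled_set for word in presented) / len(presented)


def serial_position_curve(trials):
    """Probability of recall at each study position, averaged over trials.

    `trials` is a list of (presented, recalled) pairs whose presented
    lists all have the same length.
    """
    n_positions = len(trials[0][0])
    hits = [0] * n_positions
    for presented, recalled in trials:
        recalled_set = set(recalled)
        for position, word in enumerate(presented):
            hits[position] += word in recalled_set
    return [count / len(trials) for count in hits]


# Two toy study lists, each paired with the words recalled afterwards.
trials = [
    (["cat", "dog", "sun", "map"], ["map", "cat"]),
    (["pen", "cup", "sky", "bed"], ["bed", "pen", "sky"]),
]

accuracies = [recall_accuracy(p, r) for p, r in trials]  # [0.5, 0.75]
curve = serial_position_curve(trials)                    # [1.0, 0.0, 0.5, 1.0]
```

In this toy data the first and last study positions are always recalled, the kind of primacy and recency pattern a serial position curve is meant to expose.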
HyperTools is a Python toolbox for gaining geometric insights into high dimensional data. Features include:
- Functions for plotting high-dimensional datasets in 2D and 3D
- Static and animated plots
- Simple API for customizing plot styles
- Set of powerful data manipulation tools including hyperalignment, k-means clustering, normalizing, and more
- Support for lists of Numpy arrays, Pandas dataframes, text, or (mixed) lists
- Applying topic models and other text and word embedding methods to text data
- A.C. Heusser, K. Ziman, L.L.W. Owen, and J.R. Manning (2018). HyperTools: a Python Toolbox for Gaining Geometric Insights into High-Dimensional Data. Journal of Machine Learning Research, 18: 1-6.
SuperEEG is a Python toolbox for inferring whole-brain activity from sparse ECoG recordings. The technique leverages data from different patients' brains (with electrodes implanted in different locations) to learn a "correlation model" that describes how activity patterns at different locations throughout the brain relate. Given this model, along with data from a sparse set of locations, we use Gaussian process regression to "fill in" what the patients' brains were "most probably" doing when those recordings were taken. Details on our approach may be found in this preprint. You may also be interested in watching this talk or reading this blog post from a recent conference.
- L.L.W. Owen, A.C. Heusser, and J.R. Manning (2018). A Gaussian process model of human electrocorticographic data. bioRxiv, 121020.
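The "fill in" step can be sketched with the textbook conditional mean of a zero-mean Gaussian, which is the noise-free Gaussian process regression estimate: given a correlation matrix K and observed values x_o, the estimate at an unobserved location t is K_to K_oo^{-1} x_o. This is not SuperEEG's implementation or API; the function names and the three-location correlation matrix are made up for illustration, and signals are assumed to be zero-mean with unit variance.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def infer_unobserved(correlation, observed_idx, values, target_idx):
    """Conditional mean at target_idx given values at observed_idx."""
    k_oo = [[correlation[i][j] for j in observed_idx] for i in observed_idx]
    weights = solve(k_oo, values)  # K_oo^{-1} x_o
    return sum(correlation[target_idx][j] * w
               for j, w in zip(observed_idx, weights))


# Hypothetical correlation model over three brain locations.
correlation = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

# Electrodes at locations 0 and 1 record 2.0 and -1.0; estimate location 2.
estimate = infer_unobserved(correlation, [0, 1], [2.0, -1.0], 2)
```

Here the two observed locations are uncorrelated with each other, so the estimate reduces to a correlation-weighted sum of the recordings, 0.5 * 2.0 + 0.5 * (-1.0) = 0.5.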
Open Brain Consent
The Open Brain Consent initiative aims to facilitate neuroimaging data sharing by providing an "out of the box" solution that addresses human subjects concerns and consists of
- a widely acceptable consent form allowing deposition of anonymized data to public data archives
- a collection of tools/pipelines to help anonymize neuroimaging data, making it ready for sharing
- Y.O. Halchenko, C.F. Gorgolewski, et al.
Brain Imaging Data Structure (BIDS)
- Gorgolewski, K. J., et many, Y.O. Halchenko et many more (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3. DOI: 10.1038/sdata.2016.44
- Example (test) datasets
- OpenfMRI Datasets in DataLad distribution
ReproNim: Reproducible Basics
The Reproducible Basics training module of the ReproNim training curriculum presents core everyday tools (shell, version control, etc.) and explains how a better understanding of them can make your research more reproducible.
- Y.O. Halchenko et al.
NIPY BuildBot Master
This instance was initiated by Matthew Brett to provide continuous integration testing for the NiPy project. It quickly grew to cover a wide variety of associated projects (including our PyMVPA). Although it is just an ad-hoc setup, it provides many project developers with testing environments which they could not otherwise easily obtain elsewhere (e.g. on Travis-CI): various releases of different operating systems (OS X, Windows, GNU/Linux Debian), and even different architectures (e.g., PowerPC and SPARC). Such rich coverage provides a valuable resource to the scientific community helping to identify