DataLad is ongoing work, funded by the NSF and the German BMBF, to adapt the model of open-source software (OSS) distributions to address the technical limitations of today's data sharing and to provide a versatile data management platform. It builds on git-annex, software for data tracking and deployment logistics specialized for large data, which is itself built atop Git, the most capable distributed version control system (dVCS) available today. DataLad provides access to data available from various sources (e.g., lab or consortium websites such as humanconnectome.org; data sharing portals such as openneuro.org and crcns.org) through a single interface. It enables students and scientists to operate on data using familiar concepts, such as files and directories, while transparently managing data access and authorization with the underlying hosting providers.
- M. Hanke, M. Visconti di Oleggio Castello, K. Meyer, B. Poldrack, and Y.O. Halchenko (2018). YODA: YODA's organigram on data analysis. OHBM 2018, Singapore.
- JOSS paper (under review) providing a succinct overview of DataLad
- DataLad Handbook: everything you need to know about DataLad.
- Funding support
Distributed Archives for Neurophysiology Data Integration (DANDI) is a platform for publishing, sharing, and processing neurophysiology data funded by the BRAIN Initiative. The platform is now available for data upload and distribution, and provides supplementary client tools to assist with introspection and organization of data following the NWB standard.
DueCredit provides a solution to the problem of inadequate citation and referencing of scientific software and methods. It provides a simple framework (at the moment for Python only) to embed publication or other references in the original code so that they are automatically collected and reported to the user at the necessary level of reference detail: if a piece of software provides multiple citeable implementations, only the references for the functionality actually used are presented.
As a side effect, we hope that DueCredit will also reduce the demand for "prima-ballerina" projects, encourage contributions to existing open-source codebases, and, as a result, solidify the scientific software ecosystem.
- Y.O. Halchenko and M. Visconti di Oleggio Castello (2016). DueCredit - automagically collect citations for software, methods, and data you use. OHBM 2016, Geneva, Switzerland
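The idea of reporting only the references for functionality that was actually used can be illustrated with a minimal sketch. This is a toy illustration and not DueCredit's actual API: the `cite` decorator, the `_used_references` collector, the `report` function, and the reference strings are all hypothetical placeholders.

```python
import functools

# Hypothetical toy collector; this is NOT DueCredit's actual API.
_used_references = {}


def cite(reference, description):
    """Attach a reference to a function; record it only when the function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The reference is registered at call time, not at import time,
            # so unused implementations never show up in the report.
            _used_references[reference] = description
            return func(*args, **kwargs)
        return wrapper
    return decorator


@cite("placeholder-ref-a", "Implementation A (placeholder reference)")
def implementation_a(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


@cite("placeholder-ref-b", "Implementation B (placeholder reference)")
def implementation_b(values):
    """Upper median."""
    return sorted(values)[len(values) // 2]


def report():
    """Return references for the functionality that was actually used."""
    return sorted(_used_references.items())


result = implementation_a([1.0, 2.0, 3.0])
used = report()
print(used)
```

Because only `implementation_a` is called, the report contains its reference alone; the reference attached to `implementation_b` is never collected.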
HeuDiConv / ReproIn
HeuDiConv is a flexible DICOM converter for organizing brain imaging data into structured directory layouts. As part of the larger, NIH-supported ReproNim effort, we are developing ReproIn, a HeuDiConv-based solution for turnkey automatic conversion of all collected MR data into a collection of BIDS DataLad datasets. It includes a flexible BIDS-like specification for naming scanning sequences at the scanner, and a HeuDiConv reproin.py heuristic to automate layout and conversion of the datasets. This solution is deployed at DBIC (Dartmouth Brain Imaging Center), where it already facilitates reproducible research, data sharing, and uploads to central archives such as NDA.
- M. Visconti di Oleggio Castello, J.E. Dobson, T. Sackett, C. Kodiweera, J.V. Haxby, M. Goncalves, S. Ghosh, and Y.O. Halchenko (2018). ReproIn: automatic generation of shareable, version-controlled BIDS datasets from MR scanners. OHBM 2018, Singapore.
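To make the idea of a BIDS-like sequence naming scheme concrete, here is a minimal sketch of how a heuristic might turn a sequence name into a BIDS-style file path. It is a simplified illustration under stated assumptions, not the reproin.py heuristic: the functions `parse_sequence_name` and `bids_path`, and the example name `func-bold_task-rest_run-01`, are hypothetical, and the sketch ignores sessions, custom suffixes, and the fixed entity order that BIDS requires.

```python
def parse_sequence_name(name):
    """Split a name such as 'func-bold_task-rest_run-01' into its parts.

    The first underscore-separated field is read as '<datatype>-<suffix>';
    the remaining fields are read as '<key>-<value>' entities.
    """
    first, *rest = name.split("_")
    datatype, sep, suffix = first.partition("-")
    if not sep or not datatype or not suffix:
        raise ValueError(f"expected '<datatype>-<suffix>', got {first!r}")
    entities = {}
    for field in rest:
        key, sep, value = field.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"malformed field: {field!r}")
        entities[key] = value
    return datatype, suffix, entities


def bids_path(subject, name):
    """Build a BIDS-style relative path for one subject and one sequence name."""
    datatype, suffix, entities = parse_sequence_name(name)
    # Entities are kept in the order they appear in the sequence name.
    parts = [f"sub-{subject}"] + [f"{k}-{v}" for k, v in entities.items()]
    filename = "_".join(parts + [suffix]) + ".nii.gz"
    return f"sub-{subject}/{datatype}/{filename}"


path = bids_path("01", "func-bold_task-rest_run-01")
print(path)  # sub-01/func/sub-01_task-rest_run-01_bold.nii.gz
```

Naming sequences consistently at the scanner is what lets a conversion like this run without per-study configuration.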
NeuroDebian is a turnkey research software platform for all aspects of the neuroscientific research process. It takes the ideas of software hosting portals such as NITRC on maximizing research transparency and methods sharing one step further by providing a comprehensive suite of readily usable and fully integrated software with a robust testing and deployment infrastructure. Consequently, it improves interoperability among the tools and frees researchers from the burden of tedious installation or upgrade procedures. That, in turn, positively affects their availability for actual research activities, as well as their motivation to test new analysis tools and stay connected with the latest methodological developments in the field.
- Y.O. Halchenko & M. Hanke (2012). Open is not enough. Let's take the next step: An integrated, community-driven computing platform for neuroscience. Frontiers in Neuroinformatics, 6:22. [PDF] DOI: 10.3389/fninf.2012.00022
PyMVPA is a Python-based framework for neural decoding using multivariate pattern analysis. It affords both volume- and surface-based analyses using a wide variety of supervised and unsupervised machine learning methods, representational similarity analyses, searchlight analyses, hyperalignment of representational spaces, and model-based decoding and encoding. The software can also be used for neural data other than fMRI, including analysis of MEG and EEG data through spatio-temporo-frequency band searchlights and cross-modal EEG to fMRI trans-fusion. It has also been used for analyses of data unrelated to neuroscience, demonstrating its general utility. PyMVPA also serves as a repository for sample data sets (e.g., Haxby et al. 2001) that have found wide applicability for education, development of new algorithms, new analyses, and independent research reports.
- M. Hanke, Y.O. Halchenko, et al. (2009). PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics, 7, 37-53. DOI: 10.1007/s12021-008-9041-y
ReproMan (Reproducible computational environments Manager; formerly known as NICEMAN) is also part of the NIH-supported ReproNim effort. It aims to facilitate reproducible computation by collecting detailed information about the origin of the components used (Debian and/or Conda packages, VCS repositories, etc.), so that computational environments can be analyzed and re-created.
- M. Travers, R. Buccigrossi, C. Haselgrove, K. Meyer, and Y.O. Halchenko (2018). NICEMAN: NeuroImaging Computational Environments Manager. OHBM 2018, Singapore.
tinuous is a command for downloading build logs and (for GitHub only) artifacts and release assets for a GitHub repository from GitHub Actions, Travis-CI.com, and/or Appveyor. By downloading them all, and optionally placing them under DataLad control, you can establish backup, distribution, and convenient, uniform access to all those artifacts.
pyout is a Python package that defines an interface for writing structured records as a table in a terminal. It is being developed to replace custom code for displaying tabular data in the DANDI client and other tools.
Quail is a Python toolbox for analyzing data from free recall memory experiments. Some key features include:
- A simple data structure for storing encoding and recall data
- A set of functions for analyzing data by computing standard memory performance metrics
- A simple API for customizing plot styles
- Support for "naturalistic" stimuli such as movies, texts, and speech data
- A set of powerful tools for importing data, automatically transcribing audio files (speech-to-text), and more
- A.C. Heusser, P.C. Fitzpatrick, C.E. Field, K. Ziman, and J.R. Manning (2017). Quail: A Python toolbox for analyzing and plotting free recall data. The Journal of Open Source Software, 2(18): 424.
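As an illustration of what "standard memory performance metrics" can look like, the sketch below computes recall accuracy and a serial position curve from presented and recalled word lists. It uses only the standard library and is not Quail's API: the function names, the data layout, and the example words are assumptions made for this example.

```python
def recall_accuracy(presented, recalled):
    """Proportion of presented items that were recalled at least once."""
    recalled_set = set(recalled)
    return sum(item in recalled_set for item in presented) / len(presented)


def serial_position_curve(lists):
    """Probability of recall at each presentation position, averaged over lists.

    `lists` is a sequence of (presented, recalled) pairs in which every
    presented list has the same length.
    """
    length = len(lists[0][0])
    counts = [0] * length
    for presented, recalled in lists:
        recalled_set = set(recalled)
        for position, item in enumerate(presented):
            counts[position] += item in recalled_set
    return [count / len(lists) for count in counts]


# Two hypothetical study lists with the words each participant recalled.
trials = [
    (["cat", "dog", "sun", "map"], ["map", "cat"]),
    (["pen", "cup", "key", "hat"], ["hat", "pen", "key"]),
]

accuracy = recall_accuracy(*trials[0])
curve = serial_position_curve(trials)
print(accuracy)  # 0.5
print(curve)     # [1.0, 0.0, 0.5, 1.0]
```

In this toy data the first and last positions are always recalled, which is the shape (primacy and recency) that a serial position curve is typically used to reveal.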
HyperTools is a Python toolbox for gaining geometric insights into high dimensional data. Features include:
- Functions for plotting high-dimensional datasets in 2D and 3D
- Static and animated plots
- Simple API for customizing plot styles
- Set of powerful data manipulation tools including hyperalignment, k-means clustering, normalizing, and more
- Support of lists of Numpy arrays, Pandas dataframes, text, or (mixed) lists
- Applying topic models and other text and word embedding methods to text data
- A.C. Heusser, K. Ziman, L.L.W. Owen, and J.R. Manning (2018). HyperTools: a Python Toolbox for Gaining Geometric Insights into High-Dimensional Data. Journal of Machine Learning Research, 18: 1-6.
SuperEEG is a Python toolbox for inferring whole-brain activity from sparse ECoG recordings. The technique leverages data from different patients' brains (who had electrodes implanted in different locations) to learn a "correlation model" that describes how activity patterns at different locations throughout the brain relate. Given this model, along with data from a sparse set of locations, we use Gaussian process regression to "fill in" what the patients' brains were "most probably" doing when those recordings were taken. Details on our approach may be found in this preprint. You may also be interested in watching this talk or reading this blog post from a recent conference.
- L.L.W. Owen, A.C. Heusser, and J.R. Manning (2018). A Gaussian process model of human electrocorticographic data. bioRxiv, 121020.
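The "fill in" step can be sketched with the textbook conditional mean of a zero-mean Gaussian, which is what noise-free Gaussian process regression uses for prediction: the estimate at an unobserved location is K_uo K_oo^{-1} x_o. The example below is a minimal sketch for two observed locations and one unobserved location, using only the standard library. It is not the SuperEEG API, and the covariance values are made up for illustration; the preprint describes the actual model.

```python
def predict_unobserved(k_oo, k_uo, x_o):
    """Conditional mean at one unobserved location given two observed ones.

    k_oo: 2x2 covariance between the observed locations
    k_uo: length-2 covariance between the unobserved and observed locations
    x_o:  length-2 observed activity (assumed zero-mean)
    """
    (a, b), (c, d) = k_oo
    det = a * d - b * c
    if det == 0:
        raise ValueError("observed covariance is singular")
    # Solve k_oo @ w = x_o with the closed-form 2x2 inverse.
    w0 = (d * x_o[0] - b * x_o[1]) / det
    w1 = (-c * x_o[0] + a * x_o[1]) / det
    # Weight the solution by the covariance with the unobserved location.
    return k_uo[0] * w0 + k_uo[1] * w1


# Hypothetical "correlation model" for three locations (values are made up).
k_oo = [[1.0, 0.5], [0.5, 1.0]]   # observed vs. observed
k_uo = [0.5, 0.5]                 # unobserved vs. observed
x_o = [1.0, 1.0]                  # recorded activity at the two electrodes

estimate = predict_unobserved(k_oo, k_uo, x_o)
print(estimate)
```

With these numbers the estimate is 2/3: the unobserved location is only moderately correlated with the two electrodes, so the prediction is pulled toward the zero mean.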
Open Brain Consent
The Open Brain Consent initiative aims to facilitate neuroimaging data sharing by providing an "out of the box" solution that addresses human subjects concerns and consists of
- a widely acceptable consent form allowing deposition of anonymized data to public data archives
- a collection of tools/pipelines to help anonymize neuroimaging data, making it ready for sharing
- Y.O. Halchenko, C.F. Gorgolewski, et al.
Brain Imaging Data Structure (BIDS)
BIDS is a project led by a steering group elected by the BIDS community to provide a simple and intuitive way to organize and describe your neuroimaging and behavioral data.
- Gorgolewski, K. J., et many, Y.O. Halchenko et many more (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3. DOI: 10.1038/sdata.2016.44
- Example (test) datasets
- Historical OpenfMRI and up-to-date OpenNeuro Datasets in DataLad distribution.
Neurodata Without Borders: Neurophysiology (NWB:N)
NWB:N is a data standard for neurophysiology, providing neuroscientists with a common standard to share, archive, use, and build analysis tools for neurophysiology data. It is a standard supported by BIDS and the DANDI archive.
ReproNim: Reproducible Basics
The Reproducible Basics training module of the ReproNim training curriculum presents core everyday tools (shell, version control, etc.) and explains how a better knowledge of them can make your research more reproducible.
- Y.O. Halchenko et al.
The NIPY BuildBot Master Instance was initiated by Matthew Brett to provide continuous integration testing for the NiPy project. It quickly grew to cover a wide variety of associated projects (e.g., Dipy, Nipype, and our PyMVPA). Although it is just an ad-hoc setup, it provides many project developers with testing environments that they could not easily obtain elsewhere (e.g., on Travis-CI): various releases of different operating systems (OS X, Windows, GNU/Linux Debian), and even different architectures (e.g., PowerPC and SPARC). Such rich coverage is a valuable resource for the scientific community, helping to identify defects before releases are shipped to users. We collaborate on and help maintain the setup and a fleet of test boxes (e.g., SPARC machines).
To provide archival and uninterrupted access to over 9 TB of singularity-hub.org Singularity containers, we have established the ///shub DataLad dataset and a service to serve all the shub:// URLs.