This is a collection of formats developed by the BigBio group. More information and the individual projects can be found in the BigBio GitHub Organization.

We are available for consulting, so feel free to contact us if you’d like to work together.

Educational videos

Our YouTube channel contains educational videos on our various tools, algorithms and workflows for all levels.

Proteomics formats

MAGE-TAB for Proteomics - 📄 The Proteomics Sample Metadata Project aims to standardize the way ProteomeXchange (PX) partners and the proteomics community capture the relationship between the samples and the data generated within a PX submission. We have adapted the MAGE-TAB v1.1 format to capture the metadata that proteomics experiments need for automated reprocessing. MAGE-TAB (MicroArray Gene Expression Tabular) is the file format used to store metadata and sample information for transcriptomics experiments. By repurposing and extending MAGE-TAB for proteomics, we aim to provide a format for future submissions of multi-omics experiments to ProteomeXchange partners and better integration with other omics data. MAGE-TAB is divided into two main files: the IDF (Investigation Description Format) and the SDRF (Sample and Data Relationship Format). We will describe how these two files are adapted for proteomics.

Our goal is to ensure maximum reusability of the deposited data. Our work aims to define the minimum information required to report the experimental design of proteomics experiments, enabling the proteomics community to use and reuse the deposited data. The following use cases should be considered when designing the Proteomics Sample Metadata Format:

  • The MAGE-TAB for proteomics should be fully compatible with MAGE-TAB v1.1, which is used to represent transcriptomics data.
  • The IDF part of the MAGE-TAB should be compatible with the current proteomeXchange.xml file format.
  • The “Sample and Data Relationship Format for Proteomics (SDRF-Proteomics)”, based on the SDRF part of MAGE-TAB, should capture the sample-to-data relationships.
  • The resulting file format SHOULD enable data submitters and curators to annotate a proteomics dataset at different levels, including the sample metadata (e.g. organism and tissues), technical metadata (e.g. instrument model) and the experimental design.
  • The resulting file format SHOULD facilitate the automatic reanalysis of public proteomics datasets, by providing a better representation of quantitative datasets in public repositories.
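
To make the sample-to-data relationship described in the use cases above more concrete, here is a minimal Python sketch that reads a small tab-delimited SDRF-style table and groups data files by sample. The column names (source name, characteristics[organism], comment[data file] and so on), the sample values, the file names and the helper sample_to_files are illustrative assumptions, not taken from this page; consult the SDRF-Proteomics specification for the actual required columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical SDRF-style content. Column names and values are
# illustrative assumptions, not copied from a real submission.
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tcharacteristics[organism part]"
    "\tcomment[instrument]\tcomment[data file]\n"
    "sample 1\tHomo sapiens\tliver\tQ Exactive\trun_01.raw\n"
    "sample 1\tHomo sapiens\tliver\tQ Exactive\trun_02.raw\n"
    "sample 2\tHomo sapiens\tkidney\tQ Exactive\trun_03.raw\n"
)


def sample_to_files(sdrf_text):
    """Map each sample (source name) to the data files generated from it."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    mapping = defaultdict(list)
    for row in reader:
        # One row links one sample to one data file.
        mapping[row["source name"]].append(row["comment[data file]"])
    return dict(mapping)


if __name__ == "__main__":
    for sample, files in sample_to_files(SDRF_TEXT).items():
        print(sample, files)
```

Each row ties one sample to one data file, so a sample measured in several runs appears on several rows. A reanalysis pipeline could use such a mapping, together with the sample and technical metadata columns, to decide which files belong to which experimental condition.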