BigBio Team | Projects

This is a collection of algorithms, tools, workflows and pipelines developed by the BigBio group. More information and further projects can be found in the BigBio GitHub Organization.

We are available for consulting, so feel free to contact us if you’d like to work together.

Educational videos

Our YouTube channel contains educational videos on our tools, algorithms and workflows for all levels.

Lectures and Presentations

“Distributed Proteomics Data analysis using OpenMS and Nextflow”, hosted on YouTube. This talk 🗯️ mainly covers the use of the proteomicsLFQ and proteomicsTMT workflows, developed with OpenMS and Nextflow, for the analysis of public proteomics data.
“BioConda and BioContainers”, hosted on YouTube. This talk 🗯️ was given by BigBio collaborator Björn Grüning and covers BioContainers, a leading project of the BigBio framework.

Proteogenomics tools

pypgatk - 🔦 A Python library that provides bioinformatics tools for proteogenomics data analysis. Among others, it supports:
- Generation of protein variant sequences from genome variant databases such as COSMIC or cBioPortal
- Generation of non-canonical peptides from pseudogenes, lncRNAs and sORFs
pgdb - 🔬 is a Nextflow workflow that uses the pypgatk tool to generate large-scale proteogenomics databases.
pepgenome - 🦗 PepGenome uses transcript translations and reference gene annotations to identify the genomic loci of peptides and post-translational modifications. Multiple occurrences of a peptide in the input data that map to the same genomic locus are collapsed into a single occurrence in the output.
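
The collapsing behaviour described for pepgenome can be illustrated with a minimal Python sketch. This is not PepGenome's implementation: the `PeptideHit` record, its fields and the `collapse_hits` function are hypothetical names chosen for illustration, and the sketch assumes a locus is identified by chromosome, strand, start and end.

```python
from typing import NamedTuple


class PeptideHit(NamedTuple):
    """Hypothetical record for one peptide mapped to a genomic locus."""
    peptide: str
    chrom: str
    strand: str
    start: int
    end: int


def collapse_hits(hits):
    """Keep one entry per (peptide, locus), preserving first-seen order."""
    seen = set()
    unique = []
    for hit in hits:
        if hit not in seen:  # NamedTuples compare and hash by value
            seen.add(hit)
            unique.append(hit)
    return unique


# Made-up example data: the first two hits point at the same locus.
hits = [
    PeptideHit("ELVISLIVESK", "chr1", "+", 1000, 1033),
    PeptideHit("ELVISLIVESK", "chr1", "+", 1000, 1033),
    PeptideHit("ELVISLIVESK", "chr2", "-", 5000, 5033),
]
collapsed = collapse_hits(hits)
print(len(collapsed))  # 2
```

The same peptide at a different locus is kept as a separate entry; only exact repeats of peptide and locus are merged.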

Quantitative Proteomics Workflows

proteomicsLFQ - ☁️ Proteomics label-free quantification (LFQ) analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.
proteomicsTMT - ☁️ Proteomics Tandem Mass Tag (TMT) quantification analysis pipeline using OpenMS and MSstats, with feature quantification, feature summarization, quality control and group-based statistical analysis.

Metadata annotation

PROMA - 🖋 The PROteomics Metadata Annotator tool allows proteomics researchers to annotate their data using the MAGE-TAB for proteomics format. More details of PROMA can be found here. PROMA is a Google Sheets add-on that lets users search for ontology terms to annotate datasets. It uses the functionality of Google Sheets to let users copy, remove and add samples and sample characteristics.
sdrf-pipelines - 📝 A Python tool that allows users to validate MAGE-TAB for proteomics files. In addition, it allows users to convert SDRF (Sample and Data Relationship Format) files to OpenMS and MaxQuant configuration files.

R packages

pquant - 📊 Proteomics Quantitation downstream analysis package and library. pquant is a Python and R package for downstream analysis of proteomicsLFQ and proteomicsTMT quantitative data. It also includes an R Shiny application for downstream analysis of proteomics datasets. The following figures are currently included: heatmap, volcano plot and QC plot.
feseR - 🌌 An R package that combines multiple Feature Selection (FS) methods in a workflow for analyzing high-dimensional omics data. The feature selection steps can be classified as: i) Univariate (Correlation filter and Information Gain), ii) Multivariate (Principal Component Analysis and Matrix Correlation based) and iii) Recursive Feature Elimination (wrapped with a Machine Learning algorithm, e.g. Random Forest).
pIR - 📈 An R package to analyze the isoelectric point of peptides and proteins based on experimental values and on values predicted using different functions. The package provides a statistical framework to analyze the correlation between predicted and expected values, and it can be used in other contexts.
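
The correlation between predicted and expected isoelectric points mentioned for pIR can be sketched as follows. This is a plain Python illustration of a Pearson correlation, not the pIR API (pIR itself is an R package). The function name `pearson_r` and the example values are assumptions made up for this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical isoelectric points: experimental values vs. predictions.
experimental = [4.0, 5.0, 6.0, 7.0]
predicted = [4.2, 4.9, 6.1, 7.3]

r = pearson_r(experimental, predicted)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates that the prediction function tracks the experimental values closely.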

Java libraries and tools

pia - 👓 PIA allows you to inspect the results of common proteomics spectrum identification search engines, combine them seamlessly and conduct statistical analyses. The main focus of PIA lies on the integrated inference algorithms, i.e. concluding the proteins from a set of identified spectra.
jmztab - 🗂 JmzTab is a library maintained by the BigBio Stack team to read, write and validate mzTab files.

Python packages

pmultiqc - 👮 The pmultiqc package is a Python package based on MultiQC, a framework for generating quality control reports for omics analysis pipelines. pmultiqc generates reports containing the number of peptides and proteins identified in a proteomics experiment, along with other QC metrics.
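
As a rough illustration of the kind of summary such a report contains, the sketch below counts distinct peptides and proteins in a small table of identifications. It does not use pmultiqc or MultiQC. The `summarize_identifications` function, the record layout and the example rows are hypothetical.

```python
def summarize_identifications(psms):
    """Count PSMs, distinct peptides and distinct proteins."""
    return {
        "psms": len(psms),
        "peptides": len({row["peptide"] for row in psms}),
        "proteins": len({row["protein"] for row in psms}),
    }


# Hypothetical peptide-spectrum matches (PSMs) from one experiment.
psms = [
    {"peptide": "PEPTIDEK", "protein": "P12345"},
    {"peptide": "PEPTIDEK", "protein": "P12345"},
    {"peptide": "SAMPLER", "protein": "P12345"},
    {"peptide": "ANOTHERK", "protein": "Q67890"},
]

summary = summarize_identifications(psms)
print(summary)  # {'psms': 4, 'peptides': 3, 'proteins': 2}
```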