Transcriptomics

Input Information

BiomiX analyses the expression matrix to quantify expression differences between two groups of samples. It automatically distinguishes raw counts, which are analyzed with the DESeq2 package, from normalized counts, which are analyzed with the Limma package. BiomiX requires an expression matrix whose columns represent the samples and whose rows contain the genes, identified by Ensembl IDs or gene names.

BiomiX can also handle matrices that combine samples from different experiments or sample types and select only the samples the user wants to analyze. For example, it can analyze only the B cells in a matrix containing the expression of B cells, T cells, neutrophils, and monocytes.

The tool accepts both formats and, to allow integration with other omics, provides the MOFA input including only the sample ID numbers. So, if you have a mixed matrix, it is enough to run the DGE analysis, tick the Selection option, and use the name of the tissue as the label. BiomiX will look in the SAMPLE_TYPE column and keep only the samples matching that tissue name.
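The selection step can be pictured with the following minimal Python sketch. It is not BiomiX's implementation: it assumes the matrix is a dict of gene rows, that the sample types live in a SAMPLE_TYPE field of a per-sample metadata dict, and that the label is matched as an escaped substring. The function name `select_samples` and the toy data are hypothetical.

```python
import re

def select_samples(matrix, metadata, label):
    """Keep only the samples whose SAMPLE_TYPE matches the label.

    matrix:   dict mapping gene -> {sample ID: expression value}
    metadata: dict mapping sample ID -> {metadata column: value}
    label:    tissue or cell-type name typed in the Label field
    """
    # Hypothetical matching rule: the label is escaped and searched as a substring.
    pattern = re.compile(re.escape(label))
    keep = [sample for sample, fields in metadata.items()
            if pattern.search(str(fields.get("SAMPLE_TYPE", "")))]
    # Subset every gene row to the selected samples, preserving metadata order.
    return {gene: {s: row[s] for s in keep if s in row}
            for gene, row in matrix.items()}

# Toy data (not real measurements)
matrix = {
    "GENE1": {"S1": 120, "S2": 5, "S3": 98},
    "GENE2": {"S1": 3, "S2": 240, "S3": 7},
}
metadata = {
    "S1": {"SAMPLE_TYPE": "B_cells"},
    "S2": {"SAMPLE_TYPE": "T_cells"},
    "S3": {"SAMPLE_TYPE": "B_cells"},
}
selected = select_samples(matrix, metadata, "B_cells")
print(selected)
```

With this toy input only S1 and S3 survive, mirroring how a B-cell label would discard the T-cell sample.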

The Analysis

The analysis reports the differentially expressed genes (DEGs) with their Log2FC and adjusted p-value (p.adj), plus PDF reports containing volcano plots, heatmaps, and gene variances. If sex and gender are available in the metadata, they are automatically used as covariates to correct the DESeq2 or Limma model. By default, genes are considered differentially expressed when |Log2FC| > 0.5 and p.adj < 0.05.
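The default thresholds can be illustrated with a short Python sketch that labels genes as up-regulated, down-regulated, or not significant, the same split a volcano plot colors. It assumes the DE results are available as (Log2FC, p.adj) pairs; the function name `classify_genes` and the toy values are hypothetical.

```python
def classify_genes(results, lfc_threshold=0.5, padj_threshold=0.05):
    """Label genes as 'up', 'down' or 'ns' (not significant).

    results: dict mapping gene -> (log2FC, adjusted p-value); the adjusted
    p-value may be None, as DESeq2 can report NA for some filtered genes.
    """
    labels = {}
    for gene, (lfc, padj) in results.items():
        significant = (padj is not None and padj < padj_threshold
                       and abs(lfc) > lfc_threshold)
        if not significant:
            labels[gene] = "ns"
        else:
            labels[gene] = "up" if lfc > 0 else "down"
    return labels

# Toy results (not real data)
results = {
    "GENE1": (1.2, 0.001),
    "GENE2": (-0.8, 0.01),
    "GENE3": (0.3, 0.0001),
    "GENE4": (2.0, 0.2),
    "GENE5": (1.0, None),
}
labels = classify_genes(results)
print(labels)
# {'GENE1': 'up', 'GENE2': 'down', 'GENE3': 'ns', 'GENE4': 'ns', 'GENE5': 'ns'}
```

GENE3 is highly significant but fails the fold-change cut-off, while GENE4 has a large fold change but fails the p.adj cut-off; both conditions must hold.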

Enrichment of biological processes among the results is explored with enrichR, the R interface to Enrichr. In addition, pre-processed output files are produced to run gene set enrichment analysis with GSEA or Enrichr. The expression matrix is also normalized by variance stabilizing transformation (vst) for data visualization and MOFA integration.
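As an illustration of a GSEA-ready file, the Python sketch below builds the text of a pre-ranked (.rnk) list: one tab-separated gene and score per line, sorted from highest to lowest. The ranking metric sign(Log2FC) × −log10(p-value) is an assumption chosen for the example; BiomiX's own output files may use a different metric or layout, and `make_rnk` is a hypothetical name.

```python
import math

def make_rnk(results):
    """Return the text of a GSEA pre-ranked (.rnk) file.

    results: dict mapping gene name -> (log2FC, p-value)
    Ranking metric (assumption): sign(log2FC) * -log10(p-value).
    """
    scores = {}
    for gene, (lfc, pvalue) in results.items():
        if pvalue is None:
            continue  # genes without a p-value cannot be ranked
        pvalue = max(pvalue, 1e-300)  # avoid log10(0)
        scores[gene] = math.copysign(-math.log10(pvalue), lfc)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # One "gene<TAB>score" line per gene, highest score first.
    return "\n".join(f"{gene}\t{score:.4f}" for gene, score in ranked)

rnk = make_rnk({"GENE1": (1.5, 0.001), "GENE2": (-2.0, 0.0001), "GENE3": (0.2, 0.5)})
print(rnk)
```

Strongly up-regulated genes end up at the top of the list and strongly down-regulated genes at the bottom, which is the ordering a pre-ranked enrichment test expects.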

Additional Features

Metabolomics

For metabolomics data, BiomiX requires a peak signal matrix whose columns represent the samples and whose rows contain arbitrary peak numbers. BiomiX is flexible and accepts both annotated and non-annotated metabolomics data. If annotation is required, depending on the available information BiomiX can run an MS1 annotation (m/z and retention time values required) or an MS2 annotation (fragmentation files in .mzML format required).
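The core of an MS1 annotation is a mass match within a ppm tolerance. The Python sketch below shows only that step, under simplifying assumptions: every peak is treated as an [M+H]+ ion, retention time is ignored, and the reference masses are hypothetical. It is not how BiomiX or CEU Mass Mediator query their databases; `annotate_ms1`, the compound names, and the 10 ppm default are illustrative.

```python
PROTON_MASS = 1.007276  # Da

def ppm_error(observed_mz, theoretical_mz):
    """Absolute mass error in parts per million."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6

def annotate_ms1(peaks, reference, tolerance_ppm=10.0):
    """Match peak m/z values to reference compounds assumed to be [M+H]+ ions.

    peaks:     dict mapping peak ID -> observed m/z
    reference: dict mapping compound name -> neutral monoisotopic mass
    """
    annotations = {}
    for peak_id, mz in peaks.items():
        annotations[peak_id] = [
            name for name, mass in reference.items()
            if ppm_error(mz, mass + PROTON_MASS) <= tolerance_ppm
        ]
    return annotations

# Hypothetical compounds and peaks for illustration only
reference = {"compound_A": 180.0634, "compound_B": 146.0579}
peaks = {"peak_1": 181.0707, "peak_2": 300.1000}
annotations = annotate_ms1(peaks, reference)
print(annotations)
# {'peak_1': ['compound_A'], 'peak_2': []}
```

A real annotation would also try the other selected adducts and use retention time to narrow the candidates.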

If you have only raw .mzML files, we strongly suggest generating the peak signal matrix with the easy R pipeline by Á. Fernández-Ochoa (https://doi.org/10.3390/metabo10010028) or with MetaboAnalyst.

For each peak, the signals of the condition (treated) samples are compared with those of the CTRL samples by computing the Log2FC as log2(median peak signal in condition samples / median peak signal in CTRL samples).
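The median-based Log2FC can be written as a short Python sketch. The data layout (a dict of peaks, each mapping sample IDs to signals) and the function names are hypothetical, and returning None for non-positive medians is a choice made for the example, not documented BiomiX behavior.

```python
import math
from statistics import median

def median_log2fc(condition, control):
    """Log2FC of one peak: log2(median condition signal / median CTRL signal).

    Returns None when a median is not positive, since the log2 of the
    ratio is then undefined.
    """
    cond_median, ctrl_median = median(condition), median(control)
    if cond_median <= 0 or ctrl_median <= 0:
        return None
    return math.log2(cond_median / ctrl_median)

def log2fc_per_peak(signals, condition_samples, control_samples):
    """signals: dict mapping peak ID -> {sample ID: peak signal}."""
    return {
        peak: median_log2fc([row[s] for s in condition_samples],
                            [row[s] for s in control_samples])
        for peak, row in signals.items()
    }

signals = {"peak_1": {"T1": 400, "T2": 800, "T3": 600,
                      "C1": 150, "C2": 100, "C3": 200}}
fold_changes = log2fc_per_peak(signals, ["T1", "T2", "T3"], ["C1", "C2", "C3"])
print(fold_changes)
# {'peak_1': 2.0}
```

Here the medians are 600 and 150, a fourfold increase, so Log2FC = 2. Using medians instead of means keeps a single extreme sample from dominating the ratio.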

Significantly increased and decreased metabolites are displayed in a volcano plot, and a logarithmic transformation is applied to the peak signal matrix for MOFA integration.

To reveal changes in biological pathways, BiomiX automatically uses metPath v1.0.5 from tidymass v1.0.8 to detect enriched metabolic pathways among the significantly changed metabolites. For a deeper analysis, it also prepares ready-to-copy input files for MetaboAnalyst, one of the most advanced online tools for metabolomics analysis, which includes enrichment analysis and metabolite set enrichment analysis. In addition, if transcriptomics results are available in the same BiomiX analysis, BiomiX automatically joins the metabolomics and transcriptomics results into files that can be pasted into MetaboAnalyst to run joint pathway analysis and network analysis.
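Pathway enrichment among significant metabolites is commonly tested as over-representation with a hypergeometric test, followed by multiple-testing correction. The Python sketch below shows that generic technique with a Benjamini-Hochberg adjustment. It is an illustration only; metPath's exact statistics, background set, and correction may differ, and the function names are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    N: metabolites in the background
    K: background metabolites annotated to the pathway
    n: significantly changed metabolites
    k: significantly changed metabolites in the pathway
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# All 3 significant metabolites fall in a 3-member pathway (background of 10)
p = hypergeom_pvalue(3, 3, 3, 10)
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

The adjusted values are what a p.adj threshold, such as the Pathway mining setting in the advanced options, would be compared against.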

Methylomics

BiomiX requires a matrix whose columns represent the samples and whose rows contain the CpG island annotation. The matrix must contain beta values; if these are not available, they can be obtained with the minfi R package. Differential methylation analysis (DMA) is performed with the ChAMP R package, which reports the hypermethylated and hypomethylated CpG islands together with a volcano plot. By default, the thresholds are |Δbeta| > 0.15 and p.adj < 0.05.
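The Python sketch below shows what a beta value is and how the default DMA thresholds split CpGs into hypermethylated, hypomethylated, and not significant. The offset of 100 follows Illumina's convention (minfi exposes a configurable offset); treating Δbeta as condition minus control is an assumption, and the function names and CpG IDs are hypothetical. The statistical test itself (performed by ChAMP in BiomiX) is not reproduced here.

```python
def beta_value(methylated, unmethylated, offset=100):
    """Beta value of one CpG: M / (M + U + offset).

    offset=100 follows the Illumina convention; it stabilizes beta values
    when both intensities are low.
    """
    return methylated / (methylated + unmethylated + offset)

def classify_cpgs(results, delta_threshold=0.15, padj_threshold=0.05):
    """Label CpGs as 'hyper', 'hypo' or 'ns'.

    results: dict mapping CpG ID -> (delta beta, adjusted p-value), where
    delta beta is assumed to be condition minus control.
    """
    labels = {}
    for cpg, (delta, padj) in results.items():
        if padj < padj_threshold and abs(delta) > delta_threshold:
            labels[cpg] = "hyper" if delta > 0 else "hypo"
        else:
            labels[cpg] = "ns"
    return labels

print(round(beta_value(900, 0), 3))  # 0.9
cpg_labels = classify_cpgs({"CpG_1": (0.20, 0.01),
                            "CpG_2": (-0.30, 0.001),
                            "CpG_3": (0.10, 0.001)})
print(cpg_labels)
# {'CpG_1': 'hyper', 'CpG_2': 'hypo', 'CpG_3': 'ns'}
```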

Interface Usage

Assisted Input Modification

Click on the upload button and select the file you want to modify.

After selecting the file, you can decide whether to use it directly as input for the analysis or to convert it to the BiomiX standard format.

When you answer "Yes" to the pop-up "Do you want to modify the file?", another pop-up appears, asking for the field separator, whether the file has a header, the column containing the IDs, and the decimal separator.

The matrix preview then appears in the "preview" tab, so you can check whether the format is correct. If it is not, you can modify the table in the other tab, which provides buttons to remove specific columns and rows and, if needed, to transpose the matrix. WARNING: the preview shows only the first 10 rows and columns, so it does not represent the entire matrix.
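The conversion performed by the assisted modification can be approximated by the Python sketch below: read a delimited file with a given field and decimal separator, normalize decimals to ".", optionally transpose, and write tab-separated output. It assumes a header row with the IDs in the first column; BiomiX lets you choose the ID column, and the function name `to_biomix_tsv` is hypothetical.

```python
import csv
import io

def to_biomix_tsv(text, sep=",", decimal=".", transpose=False):
    """Convert a delimited table (header row, IDs in the first column) to TSV.

    decimal:   decimal separator used in the file; values are rewritten with "."
               (it must differ from sep, otherwise numbers are split into cells)
    transpose: swap rows and columns, e.g. when samples are on the rows
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=sep) if row]
    if not rows:
        return ""
    if decimal != ".":
        # Leave the header and the ID column untouched.
        rows = [rows[0]] + [[row[0]] + [cell.replace(decimal, ".") for cell in row[1:]]
                            for row in rows[1:]]
    if transpose:
        rows = [list(column) for column in zip(*rows)]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()

raw = "ID;S1;S2\nGENE1;1,5;2,0\nGENE2;0,5;3,25\n"
tsv = to_biomix_tsv(raw, sep=";", decimal=",")
tsv_transposed = to_biomix_tsv(raw, sep=";", decimal=",", transpose=True)
print(tsv)
print(tsv_transposed)
```

As with the interface preview, check the first rows and columns of the result before using it as input.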

Main interface guide

The main interface structure:

  • Condition. Group to analyse.
  • Control. Reference group, compared with the condition one.
  • Output. Directory where the output results of the analysis are saved.
  • Omics input grid:

  • Analysis. Check the box to analyse that omics dataset with the BiomiX single-omics pipeline.
  • Data type. Select the type of omics data uploaded.
  • Integration. Check the box to include that omics in the MOFA integration.
  • Label. Omics name, used to name the output folder and to filter the samples by sample name.
  • Selection. If checked, filters the samples by sample name using a regular expression derived from the label.
  • Data upload. Click on the button to upload the matrix. Here it is possible to convert the matrix to the BiomiX format (.tsv).
  • Integration grid:

  • Integration. Do you want to perform the integration of the uploaded omics data?
  • Method. Which integration method? (Only MOFA is available for now).
  • N° Factors. How many MOFA factors do you want to calculate for the integration? (Select 0 for an automatic selection.)
  • Factor to explore. Which factor do you want to explore graphically? (Contributors, heatmap clustering etc...).
  • Omics overlay. What is the minimum number of omics in which a sample must be present to be included in the integration?
  • Open advance option. Button to open the advanced options interface.
  • Open BiomiX chatbot. Button to open the BiomiX chatbot.
  • Start Analysis. Button to start the analysis.

Guide to the advanced options interface

  • Advanced options interface (General section):

  • Log2FC threshold. Log2FC threshold value for significant results.
  • P.adj threshold. Adjusted p.value threshold value for significant results.
  • Gene Panel. Button to upload the gene panel for the subgrouping analysis.
  • Array type. The type of chip from which the data comes (450K/EPIC).
  • N° of genes within the panel with score > one or two. Parameters defining the positive/negative subgrouping based on the gene panel. The user can set the number of panel genes that must have a Z-score greater than 1 or 2, i.e., lie more than one or two control standard deviations above the control mean.
  • Removal of positive control samples. If checked, control samples that are positive for the gene panel are excluded from the downstream analysis.
  • N° top DE genes in heatmap. Number of top genes (or metabolites) to visualize in the heatmaps.
  • Clustering distance. Clustering distance used in the heatmaps, both in subgrouping and single omics analysis.
  • Clustering methods. Clustering method used in the heatmaps, both in subgrouping and single omics analysis.
  • CPU threads. Number of CPU threads your computer uses in parallel.
  • N° MOFA input features. Number of top features used in the MOFA analysis. We suggest using a similar number of features for each omics.
  • Advanced options interface (Metabolomics section):

  • Metabolite annotation. Annotation type (HMDB, KEGG, or compound name) used in the uploaded annotated metabolomics matrices.
  • Ion mode. Type of ionization used in the metabolomics data acquisition.
  • M/Z Tolerance ppm. Ppm tolerance used in the MS1 annotation.
  • Adduct positive mode. Type of positive adduct generated during the metabolomics data acquisition.
  • Adduct negative mode. Type of negative adduct generated during the metabolomics data acquisition.
  • Adduct neutral mode. If checked, the data are considered neutral.
  • MS1 files upload. Button to upload the MS1 annotation files (also required for the MS2 annotation) with the corresponding input number.
  • Databases MS1. Databases consulted by the MS1 annotation (via CEU Mass Mediator).
  • mz match MS1/2. Mass-to-charge ratio (m/z) tolerance for matching the MS1 annotation with the MS2 fragmentation spectra provided in the .mzML files.
  • RT match MS1/2. Retention time tolerance for matching the MS1 annotation with the MS2 fragmentation spectra provided in the .mzML files.
  • Column. Type of column used in the liquid chromatography.
  • Databases MS2. Databases consulted by the MS2 annotation (via tidymass).
  • MS2 directory. Directory containing the fragmentation spectra (.mzML).
  • Advanced options interface (Metadata section):

  • Column name. Column used for sample filtering.
  • Column type. Type of data in the selected column.
  • Threshold/Factor selected. Threshold used to filter the samples (format: ">= 90" / "== 0.56") or a specific condition (format: "== male" / "== treated").
  • Advanced options interface (MOFA section):

  • Max iteration. Maximum number of MOFA iterations in the model training.
  • Convergence mode. Speed of MOFA model training; slower modes take longer but are more accurate.
  • Threshold contribution weight. Threshold to isolate the top contributors to each MOFA factor. Among the top 5% of contributors (prefiltered by BiomiX), the user can set a contribution threshold, 0.5 by default.
  • Type of research (Bibliography). The PubMed field in which the contributors are searched: Title/Abstract or Text Word.
  • N° articles (Bibliography). Number of articles consulted by BiomiX to select articles related to the significant factors. Note that the PubMed server limits the number of requests, so searching too many contributors in too many articles can cause the server to interrupt the requests. In this case, reduce the number of articles consulted or the number of contributors searched.
  • N° top contributors (Bibliography). Number of top contributors of each omics searched in the PubMed abstracts.
  • N° keywords extracted (Bibliography). Number of keywords extracted from each article related to significant MOFA factors. The keywords are extracted by natural language processing, selecting the most frequent words in the abstract.
  • P.adj threshold (Pathway mining). Adjusted p-value threshold for considering a biological pathway significant.
  • Pathways shown (Pathway mining). Number of biological pathways visualized in the output PDF reports.
  • Numerical (Clinical). Check the box to correlate numerical clinical data with the significant MOFA factors (.tsv format, with samples on the rows; use the Assisted Input Modification if needed).
  • Binary (Clinical). Check the box to correlate binary clinical data with the significant MOFA factors (.tsv format, with samples on the rows; use the Assisted Input Modification if needed).
  • Save advanced options (Save parameters). Click on the blue button to save the advanced parameters.
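The Metadata section's filter strings (">= 90", "== male") can be read as an operator followed by a value. The Python sketch below parses and applies such a condition to a per-sample metadata dict. Only ">=" and "==" appear in the documented format; the other operators, the single space required after the operator, and the names `parse_condition` and `filter_samples` are assumptions for illustration, not BiomiX's parser.

```python
import operator

# ">=" and "==" appear in the documented format; the other operators are
# assumptions added for illustration.
OPERATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
             "<": operator.lt, "==": operator.eq, "!=": operator.ne}

def parse_condition(condition, numeric):
    """Split a condition such as '>= 90' or '== male' into (operator, value)."""
    op_text, _, value = condition.strip().partition(" ")
    if op_text not in OPERATORS:
        raise ValueError(f"unsupported operator: {op_text!r}")
    value = value.strip()
    return OPERATORS[op_text], (float(value) if numeric else value)

def filter_samples(metadata, column, condition, numeric=True):
    """Return the IDs of samples whose metadata value satisfies the condition.

    metadata: dict mapping sample ID -> {column name: value}
    numeric:  True for numeric columns, False for factors (categories)
    """
    op, target = parse_condition(condition, numeric)
    kept = []
    for sample, fields in metadata.items():
        raw = fields.get(column)
        if raw is None:
            continue  # samples without a value in this column are dropped
        value = float(raw) if numeric else str(raw)
        if op(value, target):
            kept.append(sample)
    return kept

# Toy metadata (not real patients)
metadata = {
    "P1": {"AGE": "95", "SEX": "male"},
    "P2": {"AGE": "60", "SEX": "female"},
    "P3": {"AGE": "90", "SEX": "female"},
}
by_age = filter_samples(metadata, "AGE", ">= 90")
by_sex = filter_samples(metadata, "SEX", "== male", numeric=False)
print(by_age)  # ['P1', 'P3']
print(by_sex)  # ['P1']
```

Distinguishing numeric columns from factors matters: ">= 90" needs a numeric comparison, while "== male" is a plain string match.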