MOFA Integration

BiomiX-assisted format converter

Usage:

Click on the upload button and select the file you want to modify.
After selecting the file, choose whether to use it as it is as input for the analysis or to convert it to the BiomiX standard format.
If you answer "Yes" to the pop-up "Do you want to modify the file?", a second pop-up asks for the file separator, the header, the column containing the ID and the type of decimal separator.
The matrix preview then appears in the "preview" tab, so you can check whether the format is correct. If it is not, modify the table in the other tab, which contains buttons to remove specific columns and rows and, if needed, to transpose the matrix. WARNING: The preview shows only the first 10 columns and rows, so it does not represent the entire matrix.
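
The conversion steps above (separator, header, ID column, decimal separator, transposition, 10 x 10 preview) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BiomiX implementation: the function names read_matrix, transpose and preview and the sample data are hypothetical.

```python
import csv
import io


def read_matrix(text, sep="\t", has_header=True, id_column=0, decimal=","):
    """Parse a delimited matrix into (column_names, {row_id: [floats]})."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header = rows[0] if has_header else [f"V{i}" for i in range(len(rows[0]))]
    body = rows[1:] if has_header else rows
    columns = [name for i, name in enumerate(header) if i != id_column]
    matrix = {}
    for row in body:
        values = [cell for i, cell in enumerate(row) if i != id_column]
        # Normalise the decimal separator before converting to float.
        matrix[row[id_column]] = [float(v.replace(decimal, ".")) for v in values]
    return columns, matrix


def transpose(columns, matrix):
    """Swap rows and columns: old column names become the new row IDs."""
    row_ids = list(matrix)
    flipped = {
        col: [matrix[rid][j] for rid in row_ids] for j, col in enumerate(columns)
    }
    return row_ids, flipped


def preview(columns, matrix, n=10):
    """Return only the first n rows and columns, as a preview would."""
    head = list(matrix)[:n]
    return columns[:n], {rid: matrix[rid][:n] for rid in head}


# Hypothetical input: semicolon-separated, comma as decimal separator.
raw = "ID;S1;S2\ngeneA;1,5;2,0\ngeneB;0,0;3,25\n"
cols, mat = read_matrix(raw, sep=";", decimal=",")
t_cols, t_mat = transpose(cols, mat)
```

Because preview returns only a slice, a matrix that looks correct there can still contain problems further down, which is the point of the warning above.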

BiomiX-toolkit

If Preview-QC is selected within an omics input slot, a Shiny interface opens to visualize the data. The sections include:

Upload Data Section: Contains a button to upload data for imputation using various methods, including replacement by 0, variable mean or median, lasso, NIPALS, and random forest. The "filter variables and samples" button removes the samples and variables that have more 0 or missing values than the threshold specified above.

Boxplot by variables section: Shows the percentage of 0 or missing values by variable.

Boxplot by samples section: Shows the percentage of 0 or missing values by sample.

Summary table section: The modifications made to the imputed data can be visualized here.

Download section: The imputed data can be downloaded here.
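
To make the quantities in these sections concrete, here is a minimal Python sketch of the percentage of 0 or missing values, the threshold filter, and the simplest imputation methods (0, mean, median). It is an assumption-laden illustration, not the toolkit's code: the function names are hypothetical, missing values are represented by None, only missing values (not zeros) are imputed, and lasso, NIPALS and random forest imputation are not covered.

```python
from statistics import mean, median


def is_empty(value):
    """A value counts as empty when it is missing (None) or equal to 0."""
    return value is None or value == 0


def percent_empty(values):
    """Percentage of 0 or missing entries in one variable or one sample."""
    return 100.0 * sum(is_empty(v) for v in values) / len(values)


def filter_variables(table, max_percent):
    """Keep variables whose percentage of empty values does not exceed the threshold."""
    return {name: vals for name, vals in table.items()
            if percent_empty(vals) <= max_percent}


def impute(values, method="mean"):
    """Replace missing entries (None) by 0, or by the mean or median of the observed ones."""
    observed = [v for v in values if v is not None]
    if method == "zero":
        fill = 0.0
    elif method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unsupported method: {method}")
    return [fill if v is None else v for v in values]


# Hypothetical table: one list of values per variable, one position per sample.
table = {"met1": [1.0, None, 3.0, 5.0], "met2": [0, None, None, 2.0]}
by_variable = {name: percent_empty(vals) for name, vals in table.items()}
second_sample = percent_empty([vals[1] for vals in table.values()])
kept = filter_variables(table, max_percent=50)
imputed = impute(kept["met1"], method="mean")
```

The same percent_empty function serves both boxplots: it is applied to a row for the per-variable view and to a column for the per-sample view.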

Main interface guide

More detailed guidelines on parameter usage and omics principles are available here: Parameters_guidelines

The main interface structure:

Condition. Group to analyse.

Control. Reference group against which the condition group is compared.

Output. Directory where the analysis results are saved.

Omics input grid:

Input. Input slot number.

Preview-QC. Opens a Shiny app to visualize and transform the data and to remove variables or sample outliers.

Single omics Analysis. Check the box to analyse that omics with the BiomiX single omics pipelines.

Data type. Select the type of omics uploaded.

Integration. Check the box to include that omics in the MOFA integration.

Label. Omics name, used to name the output folder and to filter the samples based on the sample names.

Selection. If selected, the samples are filtered by sample name using a regex derived from the label.

Data upload. Click on the button to upload the matrix (.tsv, .xls, .xlsx). Here, it is also possible to convert the matrix to the BiomiX format.
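
As an illustration of the Label and Selection options, the sketch below keeps only the samples whose name matches a pattern built from the label. The guide does not say how BiomiX derives its regex, so the rule used here (the escaped label, matched anywhere in the name, case-insensitively) is purely an assumption, and select_samples is a hypothetical helper.

```python
import re


def select_samples(sample_names, label):
    """Keep the sample names that contain the label (assumed matching rule)."""
    # re.escape makes any special character in the label match literally.
    pattern = re.compile(re.escape(label), flags=re.IGNORECASE)
    return [name for name in sample_names if pattern.search(name)]


# Hypothetical sample names mixing two omics in one sample sheet.
names = ["P01_RNA", "P01_METAB", "P02_rna", "P03_METAB"]
selected = select_samples(names, "RNA")
```

With a rule like this, the label must appear in the sample names for the Selection option to keep anything, so it is worth checking the names before enabling it.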

Integration grid:

Integration. Do you want to perform the integration with the uploaded omics data?

Method. Which integration method? (Only MOFA is available for now).

N° Factors. How many MOFA factors do you want to calculate for the integration? (select 0 for an automatic selection).

Factor to explore. Which factor do you want to explore graphically? (Contributors, heatmap clustering, etc.).

Omics overlay. What is the minimum number of omics that a sample should have to be included in the integration?

Open advance option. Button to open the advanced options interface.

Open BiomiX chatbot. Button to open the BiomiX chatbot.

Start Analysis. Button to start the analysis.
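
The Omics overlay parameter can be read as a simple count: a sample enters the integration only if it is present in at least the chosen number of omics. A minimal Python sketch of that rule follows; the function name and the sample IDs are hypothetical, and the sketch assumes sample IDs are identical across omics.

```python
def samples_for_integration(omics_samples, min_omics):
    """Return the samples measured in at least min_omics omics layers."""
    counts = {}
    for samples in omics_samples.values():
        # set() guards against a sample listed twice in the same omics.
        for sample in set(samples):
            counts[sample] = counts.get(sample, 0) + 1
    return sorted(s for s, n in counts.items() if n >= min_omics)


# Hypothetical cohort with three omics layers.
cohort = {
    "RNA": ["P1", "P2", "P3"],
    "METAB": ["P2", "P3"],
    "METH": ["P3", "P4"],
}
shared_by_two = samples_for_integration(cohort, min_omics=2)
```

Raising the value keeps fewer but more completely profiled samples; lowering it keeps more samples with more missing omics.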

Guide to the advanced option interface

More detailed guidelines on parameter usage and omics principles are available here: Parameters_guidelines

Advanced options interface (General section):

Log2FC threshold. Log2FC threshold value for significant results.

P.adj threshold. Adjusted p-value threshold for significant results.

Gene Panel. Button to upload the gene panel for the subgrouping analysis.

Array type. The type of chip from which the data comes (450K/EPIC).

N° of genes within the panel with score > one or two. Parameters to define the positive/negative subgrouping by the gene panel. The user can set the number of panel genes whose Z-score must exceed one or two standard deviations of the control group.

Removal of positive control samples. If checked, the controls that are positive for the gene panel are excluded from the downstream analysis.

N° top DE genes in heatmap. Number of top genes (or metabolites) to visualize in the heatmaps.

Clustering distance. Clustering distance used in the heatmaps, both in subgrouping and single omics analysis.

Clustering methods. Clustering method used in the heatmaps, both in subgrouping and single omics analysis.

CPU threads. Number of CPU threads on your PC used in parallel.

N° MOFA input features. Number of top features used in the MOFA analysis. It is suggested to use a similar number of features for each omics.
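
Two of the parameters above lend themselves to a short sketch: the Log2FC and P.adj thresholds, and the gene panel subgrouping based on Z-scores relative to the controls. The Python below is an illustration under assumptions, not the BiomiX code: the function names are hypothetical, the log2FC is compared in absolute value, and the Z-score is computed as (value - control mean) / control standard deviation.

```python
from statistics import mean, stdev


def significant(results, log2fc_threshold, padj_threshold):
    """Keep features that pass both the |log2FC| and the adjusted p-value thresholds."""
    return [name for name, (log2fc, padj) in results.items()
            if abs(log2fc) >= log2fc_threshold and padj <= padj_threshold]


def panel_positive(sample, controls, z_cutoff, min_genes):
    """Flag a sample as positive when enough panel genes exceed the Z-score cutoff.

    sample:   {gene: expression value in this sample}
    controls: {gene: [expression values in the control samples]}
    """
    hits = 0
    for gene, value in sample.items():
        z = (value - mean(controls[gene])) / stdev(controls[gene])
        if z > z_cutoff:
            hits += 1
    return hits >= min_genes


# Hypothetical differential expression results: feature -> (log2FC, adjusted p-value).
results = {"g1": (2.3, 0.001), "g2": (-1.8, 0.03), "g3": (0.4, 0.0001), "g4": (3.0, 0.2)}
hits = significant(results, log2fc_threshold=1.0, padj_threshold=0.05)

# Hypothetical two-gene panel measured in three controls.
controls = {"geneA": [1.0, 2.0, 3.0], "geneB": [2.0, 4.0, 6.0]}
high = panel_positive({"geneA": 5.0, "geneB": 9.0}, controls, z_cutoff=2.0, min_genes=2)
low = panel_positive({"geneA": 3.5, "geneB": 5.0}, controls, z_cutoff=1.0, min_genes=2)
```

In this example the first sample has Z-scores of 3.0 and 2.5 and is flagged positive, while the second has 1.5 and 0.5, so only one gene passes the cutoff of one and the sample stays negative.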

Advanced options interface (Metabolomics section):

Metabolite annotation. Which annotation (HMDB, KEGG, compound name) is used in the uploaded annotated metabolomics matrices.

Ion mode. Type of ionization used in the metabolomics data acquisition.

M/Z Tolerance ppm. Ppm tolerance used in the MS1 annotation.

Adduct positive mode. Type of positive adduct generated during the metabolomics data acquisition.

Adduct negative mode. Type of negative adduct generated during the metabolomics data acquisition.

Adduct neutral mode. If checked, the data are treated as neutral.

MS1 files upload. Button to upload the MS1 annotation (also required for MS2) with the corresponding input number.

Databases MS1. Databases consulted by the MS1 annotation (by CEU Mass Mediator).

mz match MS1/2. Mass-to-charge ratio (m/z) tolerance for matching the MS1 annotation with the MS2 fragmentation spectra provided in the .mzml or .mgf files.

RT match MS1/2. Retention time tolerance for matching the MS1 annotation with the MS2 fragmentation spectra provided in the .mzml or .mgf files.

Column. Type of column used in the liquid chromatography.

Databases MS2. Databases consulted by the MS2 annotation (by Tidymass).

MS2 directory. Directory containing the fragmentation spectra (.mzml or .mgf).
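
The tolerance parameters in this section can be illustrated with a little arithmetic. A ppm tolerance is relative to the theoretical mass: error_ppm = |observed - theoretical| / theoretical x 1e6. The sketch below applies that formula and then pairs an MS1 peak with MS2 spectra whose precursor falls within the m/z and retention time tolerances. It is a hedged illustration: the function names are hypothetical, and the units assumed for the MS1/MS2 match (absolute m/z units and seconds) are not stated in this guide.

```python
def within_ppm(observed_mz, theoretical_mz, tolerance_ppm):
    """True when the mass error, expressed in ppm, is within the tolerance."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tolerance_ppm


def match_ms2(ms1_peak, ms2_spectra, mz_tolerance, rt_tolerance):
    """Return the IDs of MS2 spectra whose precursor matches the MS1 peak.

    ms1_peak:    (mz, retention_time)
    ms2_spectra: {spectrum_id: (precursor_mz, retention_time)}
    Tolerances are absolute (assumed units: m/z units and seconds).
    """
    mz, rt = ms1_peak
    return [sid for sid, (p_mz, p_rt) in ms2_spectra.items()
            if abs(p_mz - mz) <= mz_tolerance and abs(p_rt - rt) <= rt_tolerance]


# A 0.001 difference at m/z 200 is about 5 ppm; 0.01 is about 50 ppm.
close = within_ppm(200.001, 200.0, tolerance_ppm=10)
far = within_ppm(200.01, 200.0, tolerance_ppm=10)

# Hypothetical MS2 spectra: only s1 is close in both m/z and retention time.
spectra = {"s1": (200.005, 125.0), "s2": (200.005, 200.0), "s3": (201.0, 121.0)}
matched = match_ms2((200.0, 120.0), spectra, mz_tolerance=0.01, rt_tolerance=10.0)
```

Because ppm scales with mass, the same ppm setting allows a wider absolute window for heavier ions than for lighter ones.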

Advanced options interface (Metadata section):

Column name. Column used for the sample filtering.

Column name. Type of data in the defined column.

Threshold/Factor selected. Threshold to filter the samples (format: ">= 90" / "== 0.56") or specific condition (format: "== male" / "== treated").
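
The filter format shown above is an operator, a space, and a value. A minimal Python sketch of how such a string could be parsed and applied to a metadata table follows. It is illustrative only: the helper names and the metadata are hypothetical, and although the guide shows only ">=" and "==", the other comparison operators are included here as an assumption.

```python
import operator

# ">=" and "==" appear in the guide; the remaining operators are assumed.
OPERATORS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def parse_condition(text):
    """Split a string such as '>= 90' into (comparison function, value)."""
    symbol, raw = text.split(maxsplit=1)
    try:
        value = float(raw)   # numerical threshold
    except ValueError:
        value = raw          # factor level such as 'male'
    return OPERATORS[symbol], value


def filter_samples(metadata, column, condition):
    """Return the samples whose value in the column satisfies the condition."""
    compare, value = parse_condition(condition)
    kept = []
    for sample, row in metadata.items():
        cell = row[column]
        if isinstance(value, float):
            cell = float(cell)
        if compare(cell, value):
            kept.append(sample)
    return kept


# Hypothetical metadata table: sample -> {column: value}.
metadata = {
    "P1": {"age": 95, "sex": "male"},
    "P2": {"age": 40, "sex": "female"},
    "P3": {"age": 90, "sex": "male"},
}
older = filter_samples(metadata, "age", ">= 90")
males = filter_samples(metadata, "sex", "== male")
```

Note that a numerical column needs a numerical value after the operator, while a factor column is compared as text, which is why the type of data in the column has to be declared.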

Advanced options interface (MOFA section):

Max iteration. Maximum number of MOFA iterations in the model training.

Convergence mode. Training speed of the MOFA model. The slower the training, the more accurate the model.

Threshold contribution weight. Threshold to isolate the top MOFA factor contributors. Among the top 5% contributors (prefiltered by BiomiX), the user can define a contribution threshold, set to 0.5 by default.

Type of research (Bibliography). The PubMed search field in which the contributors are searched: Abstract/Title or Text word.

N° articles (Bibliography). Number of articles consulted by BiomiX to select articles related to the significant factors. Note that the PubMed server limits the number of requests, so searching for too many contributors in too many articles can interrupt the connection with the server. In that case, reduce the number of articles consulted or the number of contributors searched.

N° top contributors (Bibliography). Number of top contributors of each omics searched in the PubMed abstracts.

N° keywords extracted (Bibliography). Number of keywords extracted from each article related to significant MOFA factors. The keywords are extracted by natural language processing, selecting the most frequent words of the abstract.

P.adj threshold (Pathway mining). Adjusted p-value threshold to consider a biological pathway significant.

Pathways shown (Pathway mining). Number of biological pathways shown in the PDF reports.

Numerical (Clinical). Check the box to correlate the clinical data (Numeric) with the significant MOFA factor.

Binary (Clinical). Check the box to correlate the clinical data (Binary) with the significant MOFA factor.

Save advanced options (Save parameters). Click on the blue button to save the advanced parameters.
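
The Threshold contribution weight parameter combines two filters: a prefilter to the top 5% contributors, then a cutoff on the weight (0.5 by default). The Python sketch below shows one way such a two-step selection could work. It rests on assumptions not confirmed by this guide: weights are taken in absolute value and scaled by the largest one so that they lie between 0 and 1, and the function name and feature names are hypothetical.

```python
import math


def top_contributors(weights, top_percent=5, threshold=0.5):
    """Select features in the top percentile whose scaled weight reaches the threshold."""
    largest = max(abs(w) for w in weights.values())
    # Assumption: compare absolute weights scaled to the range 0..1.
    scaled = {name: abs(w) / largest for name, w in weights.items()}
    ranked = sorted(scaled, key=scaled.get, reverse=True)
    n_top = max(1, math.ceil(len(ranked) * top_percent / 100))
    return [name for name in ranked[:n_top] if scaled[name] >= threshold]


# Hypothetical factor weights for 40 features; f40 has a large negative weight.
weights = {f"f{i}": i / 10 for i in range(1, 41)}
weights["f40"] = -4.0
default_selection = top_contributors(weights)
strict_selection = top_contributors(weights, threshold=0.98)
```

With 40 features the 5% prefilter keeps two of them; the default threshold retains both, while a stricter threshold keeps only the strongest contributor.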