Single-omics Analysis

BiomiX was developed to compare two groups using single and multiple omics. To work properly, the input files must meet the following format requirements:

BiomiX requires a metadata file to be uploaded in the BiomiX launcher. The first column must contain the sample ID, while the remaining columns contain other sample variables.

Any other metadata column can be used to filter the data via the Metadata section in the advanced options or for the PCA and UMAP preview-QC visualization.

Matrix:

BiomiX requires an expression matrix where the columns represent the samples and the rows contain the variables.
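As a minimal sketch of these two requirements (not BiomiX's own code), the following Python snippet checks that the sample IDs in the first metadata column match the sample columns of the matrix. The tab-separated layout, the `ID` and `VARIABLE` header names, and the toy values are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical toy inputs; real BiomiX files are selected in the launcher.
METADATA_TSV = "ID\tCONDITION\tSEX\nS1\tCTRL\tF\nS2\tSLE\tM\nS3\tSLE\tF\n"
MATRIX_TSV = "VARIABLE\tS1\tS2\tS3\nGENE_A\t10\t25\t31\nGENE_B\t7\t3\t9\n"


def read_tsv(text):
    """Parse tab-separated text into a header row and a list of data rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1:]


def check_inputs(metadata_text, matrix_text):
    """Return the sample IDs missing from the matrix and from the metadata."""
    _, meta_rows = read_tsv(metadata_text)
    matrix_header, _ = read_tsv(matrix_text)
    meta_ids = {row[0] for row in meta_rows}  # first metadata column = sample ID
    matrix_ids = set(matrix_header[1:])       # matrix columns = samples
    return sorted(meta_ids - matrix_ids), sorted(matrix_ids - meta_ids)


missing_in_matrix, missing_in_metadata = check_inputs(METADATA_TSV, MATRIX_TSV)
print("Missing in matrix:", missing_in_matrix)
print("Missing in metadata:", missing_in_metadata)
```

Two empty lists mean every sample is described in both files.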

Directory system:

BiomiX adheres to the FAIR principles (Findability, Accessibility, Interoperability, and Reuse). This means that the tool is set up to make analyses reproducible.

Preview-QC visualisation

If the preview-QC is selected within an omics input slot, a Shiny interface opens to visualize the data. The sections include:

Once you are satisfied with the transformation and the modifications performed, press the Close App button to use the current matrix for the single omics or integration analysis.

Warning: For metabolomics analysis, QC samples can be uploaded by labeling them as "QC" in the CONDITION column of the metadata. This way, the QC samples can be visualized in the preview-QC to compare their distributions with the other samples, and are then removed before the statistical analysis.
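The QC labeling convention can be pictured with a short Python sketch. It is only an illustration under assumed data: the sample IDs and conditions are hypothetical, and `split_qc` is not a BiomiX function.

```python
# Hypothetical metadata rows: (sample ID, CONDITION value).
metadata = [("S1", "CTRL"), ("S2", "SLE"), ("QC1", "QC"), ("QC2", "QC")]


def split_qc(rows, qc_label="QC"):
    """Separate QC samples from study samples using the CONDITION value."""
    qc_samples = [sample_id for sample_id, condition in rows if condition == qc_label]
    study_samples = [sample_id for sample_id, condition in rows if condition != qc_label]
    return qc_samples, study_samples


qc_samples, study_samples = split_qc(metadata)
print("Shown in preview-QC only:", qc_samples)
print("Kept for statistics:", study_samples)
```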

Transcriptomics

Single Omics Analysis

The transcriptomics single omics analysis identifies differentially expressed genes (DEGs) along with their statistics (Log2FC and adjusted p-values), and provides visualizations such as volcano plots and heatmaps. If SEX and GENDER columns are included in the metadata, they will be used to adjust the DESeq2 or Limma model. The default thresholds for DEGs are set to Log2FC > 0.5 and adjusted p-value < 0.05.
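A minimal Python sketch of how the default thresholds could be applied to a results table is shown here. It assumes the Log2FC cutoff is applied symmetrically (up- and down-regulated genes); the function name and the example values are hypothetical and do not come from BiomiX.

```python
def classify_deg(log2fc, padj, lfc_cutoff=0.5, padj_cutoff=0.05):
    """Label a gene as 'up', 'down', or 'not significant'.

    Assumption: the Log2FC cutoff is applied in both directions.
    """
    if padj >= padj_cutoff:
        return "not significant"
    if log2fc > lfc_cutoff:
        return "up"
    if log2fc < -lfc_cutoff:
        return "down"
    return "not significant"


# Hypothetical results: gene -> (Log2FC, adjusted p-value).
results = {"MX1": (1.8, 0.001), "IFI6": (0.3, 0.0004), "USP18": (-0.9, 0.02)}
for gene, (log2fc, padj) in results.items():
    print(gene, classify_deg(log2fc, padj))
```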

Biological process enrichment is explored using the R version of EnrichR, with options to export data for external tools such as GSEA and EnrichR. Within the Preview-QC visualization, users can choose to transform the data: if the data are transformed, DGE is performed with Limma; if they are not transformed and the expression matrix contains raw counts, DESeq2 is used instead. The transformed data are then used as input for MOFA integration. To optimize analysis speed, only the most variable genes are selected (5000 by default, customizable in the MOFA section of the advanced options).
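The selection of the most variable genes can be sketched as follows. This is an illustrative Python example, assuming that variability is measured by the sample variance of each gene across samples; the exact criterion used by BiomiX is not stated here, and the gene names and values are made up.

```python
from statistics import variance


def top_variable_genes(expression, n_top=5000):
    """Return the names of the n_top genes with the highest variance.

    expression: dict mapping gene name -> list of values, one per sample.
    """
    ranked = sorted(expression, key=lambda gene: variance(expression[gene]), reverse=True)
    return ranked[:n_top]


# Hypothetical transformed expression values for three samples.
expression = {"G1": [1.0, 1.0, 1.0], "G2": [1.0, 5.0, 9.0], "G3": [2.0, 3.0, 4.0]}
print(top_variable_genes(expression, n_top=2))
```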

Additional Features

Gene Panel subgrouping: A panel of genes can be provided in the GENE_PANEL file, in the General section of the advanced options. The genes must be in gene symbol format, listed one per line (separated with the Enter key) in a tab-delimited file.
```
GENES_FOR_SUBPOPULATION
SIGLEC1
IFIT3
IFI6
LY6E
MX1
USP18
OAS3
IFI44L
```
These genes will generate a heatmap displaying the expression standard deviation of each specific gene across samples. This enables classification of condition/disease samples into positive and negative groups, while BiomiX performs DGE analysis to compare the chosen controls with these two subgroups.
Clinical or Biological Marker Validation: Adding a column named MARKER to the metadata makes it possible to classify samples based on clinical or biological marker measurements. This marker classification can then be compared to the gene panel subgrouping to observe similarities or differences.

Metabolomics

The matrix's peak or metabolite signals are first visualized by preview-QC to decide if and how to transform the data. The transformed signals are then compared among the chosen groups by calculating Log2FC as the log2 ratio of their median peak signals. P-values are calculated using the non-parametric Mann–Whitney test and corrected using the FDR method. The Volcano plot and heatmap display significant metabolites or peaks that are increased or decreased. Warning: Please ensure the data matrix is curated to remove variables without biological relevance (e.g., contaminants, poorly reproducible signals) and that instrumental normalization is performed before uploading to BiomiX. All transformed peak or metabolite signals are included for MOFA integration.
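The fold-change and multiple-testing steps described above can be sketched in plain Python. This is not BiomiX's implementation: it assumes that "the FDR method" refers to the Benjamini-Hochberg procedure, that median signals are strictly positive, and that the p-values (hypothetical here) were already obtained from a Mann-Whitney test.

```python
import math
from statistics import median


def log2fc_median(case_signals, control_signals):
    """Log2 ratio of the median signal in cases over the median in controls."""
    return math.log2(median(case_signals) / median(control_signals))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


print(log2fc_median([4.0, 8.0, 16.0], [1.0, 2.0, 4.0]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```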

To reveal changes in biological pathways, BiomiX uses metPath v1.0.5 from tidymass v1.0.8 to detect enrichment in metabolic processes among significantly changed metabolites. For deeper analysis, it also prepares ready-to-copy files for input into MetaboAnalyst, one of the most advanced online tools for metabolomics analysis, supporting enrichment and metabolite set enrichment analysis. If transcriptomic results are available in BiomiX, the platform will automatically integrate metabolomics and transcriptomics data to create files ready for MetaboAnalyst's joint pathway and network analysis.

If you have only raw .mzML files, we strongly recommend using the R pipeline by Á. Fernández-Ochoa (https://doi.org/10.3390/metabo10010028) or MetaboAnalyst to generate the peak signals matrix.

Additional Features

BiomiX is flexible, supporting the analysis of both targeted (annotated) and untargeted metabolomics data, with the option to analyze untargeted data with or without annotation. The untargeted annotation is based on MS1 annotation (mass-to-charge ratio “m/z” and retention time values) or MS2 data (MS1 annotation plus raw MS2 fragmentation files in .mzML or .mgf format).

Peaks automatic annotation:
Targeted metabolomics: This option supports metabolomics data obtained from any analytical platform such as LC-MS, GC-MS, CE-MS or NMR, provided the metabolites' biological identities are available. Although no annotation is performed here, choosing the proper metabolite identifier (HMDB or KEGG) is required for biological pathway analysis. The IUPAC compound name option is also available, but no biological pathway analysis will be performed with it.

MS1 annotation (Untargeted HRMS: High-resolution mass spectrometry): Users can upload MS1 files containing the mass-to-charge ratio (m/z). BiomiX uses the CEU Mass Mediator database to match the m/z of the metabolomics peaks with those provided by the database, reporting the best matches for each peak. The MS1 m/z match is set by default to a 15-ppm error for positive mode, but neutral and negative modes are also available. The adducts available in the positive mode include M+H, M+2H, M+Na, M+NH4, and M+H-H2O, and in the negative mode, they include M-H, M-Cl, M+FA-H, and M-H-H2O. By default, all available MS1 databases are examined (the Human Metabolome Database [HMDB], LipidMaps, METLIN, and KEGG), but their use is customizable. The user should carefully review both the databases and parameters according to the dataset.
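The ppm tolerance used for MS1 matching can be illustrated with a small Python sketch. It shows only the arithmetic for the M+H adduct and does not query the CEU Mass Mediator; the candidate list, the observed m/z, and the function names are hypothetical, and the masses are approximate monoisotopic values.

```python
PROTON_MASS = 1.007276  # Da, approximate mass added by the M+H adduct


def ppm_error(observed_mz, theoretical_mz):
    """Relative m/z deviation in parts per million."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_peak(observed_mz, candidates, tolerance_ppm=15.0):
    """Return (name, ppm error) for candidates within tolerance, best first.

    candidates: dict mapping compound name -> neutral monoisotopic mass (Da).
    Assumption: only the M+H adduct in positive mode is considered.
    """
    hits = []
    for name, neutral_mass in candidates.items():
        error = ppm_error(observed_mz, neutral_mass + PROTON_MASS)
        if error <= tolerance_ppm:
            hits.append((name, error))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical candidate list with approximate monoisotopic masses.
candidates = {"glucose": 180.06339, "caffeine": 194.08038}
for name, error in match_peak(181.0707, candidates):
    print(f"{name}: {error:.2f} ppm")
```

At m/z 181, a 15-ppm window corresponds to roughly ±0.0027 Da, which is why the tolerance should be reviewed against the instrument's mass accuracy.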

MS1 and DDA-MS/MS annotation (.mzML and .mgf files) (Untargeted HRMS): This uses the same MS1 annotation pipeline with additional steps. The fragmentation files (.mzML or .mgf) are matched against the HMDB, MassBank, and MoNA databases to pair each fragmentation spectrum in the samples with those available in these databases. These matches are available in the results folder. The annotation based on the MS/MS (MS2) fragmentation spectra retrieved in this step replaces the one obtained from the MS1 annotation because of its higher reliability.
Sample type filtering: When several candidate metabolites are associated with the same peak, BiomiX examines the HMDB lists of previously identified or predicted metabolites and retains those already identified or detected in the chosen type of specimen. The specimen types include plasma, urine, saliva, cerebrospinal fluid, feces, sweat, breast milk, bile and amniotic fluid samples.

Methylomics

BiomiX performs a differential methylation analysis using the ChAMP package, providing the CpG island Δbeta value, the FDR-adjusted p-value, and a summarizing volcano plot and heatmap. The default thresholds are |Δbeta| > 0.15 and FDR-adjusted p-value < 0.05.
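As a rough Python illustration of these thresholds (not the ChAMP implementation), the sketch below assumes that Δbeta is the difference between the mean beta values of the two groups; the function names and beta values are hypothetical.

```python
from statistics import mean


def delta_beta(case_betas, control_betas):
    """Difference in mean beta value between cases and controls (assumed definition)."""
    return mean(case_betas) - mean(control_betas)


def is_differentially_methylated(dbeta, padj, dbeta_cutoff=0.15, padj_cutoff=0.05):
    """Apply the default thresholds: |delta beta| > 0.15 and adjusted p < 0.05."""
    return abs(dbeta) > dbeta_cutoff and padj < padj_cutoff


dbeta = delta_beta([0.8, 0.9], [0.5, 0.6])
print(round(dbeta, 3), is_differentially_methylated(dbeta, padj=0.01))
```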
