General guidelines

BiomiX was developed to compare two groups using single and multiple omics. To work properly, the input files must follow a specific format and meet the following criteria:

Metadata:

BiomiX requires a metadata file to be uploaded in the BiomiX launcher. The first column must contain the sample ID; the remaining columns contain other sample variables.

  • "ID" column (mandatory): This is the first column and contains the sample ID.
  • "CONDITION" column (mandatory): This column must contain the sample condition. These conditions will be available for selection in the main interface.
  • "SAMPLE_TYPE" column (optional): This column is intended for a specific case. If the input matrix contains samples from different tissues or cells (a convoluted matrix) and this source is also encoded in the sample ID (e.g., ID = 25658_BLymphocyte, 25658_GlialCells), this column allows the analysis to be performed only on data from the selected tissues or cells.
  • "GENDER" column (optional): This column can be used to adjust the LIMMA or DESeq2 model for transcriptomics data.
  • "AGE" column (optional): This column can also be used to adjust the LIMMA or DESeq2 model for transcriptomics data.
  • Any other metadata column can be used to filter the data via the Metadata section in the advanced options or for the PCA and UMAP preview-QC visualization.
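The column rules above can be checked before uploading. The following is a minimal sketch, assuming the metadata is a tab-separated text file; `check_metadata` is a hypothetical helper (not part of BiomiX), and the duplicate-ID check is an extra assumption rather than a documented BiomiX rule. The ages and conditions in the example are toy values.

```python
import csv
import io


def check_metadata(text: str, delimiter: str = "\t") -> list:
    """Return a list of problems found in a metadata table (empty list = looks valid)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = rows[0]
    problems = []
    # Rule from the guidelines: the first column holds the sample ID.
    if header[0] != "ID":
        problems.append('first column must be "ID"')
    # Rule from the guidelines: CONDITION is mandatory.
    if "CONDITION" not in header:
        problems.append('missing mandatory "CONDITION" column')
    # Extra sanity check (assumption): sample IDs should be unique.
    ids = [row[0] for row in rows[1:] if row]
    if len(ids) != len(set(ids)):
        problems.append("duplicated sample IDs")
    return problems


example = "ID\tCONDITION\tGENDER\tAGE\nS1\tCTRL\tF\t34\nS2\tCASE\tM\t51\n"
print(check_metadata(example))  # prints []
```

Change `delimiter` if your metadata is saved in another text format, such as comma-separated values.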

    Matrix:

    BiomiX requires an expression matrix where the columns represent the samples and the rows contain the variables.
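To make the expected orientation concrete, here is a small sketch that reads a matrix with samples as columns and variables as rows, then lists matrix samples that have no matching ID in the metadata. It assumes a tab-separated file whose first column holds the variable names. `read_matrix` and `unmatched_samples` are hypothetical helpers, and the numbers are toy values.

```python
import csv
import io


def read_matrix(text: str):
    """Parse a tab-separated matrix: first row = sample IDs, first column = variable names."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    samples = rows[0][1:]                      # column headers after the variable-name column
    variables = [r[0] for r in rows[1:]]       # one variable (gene, metabolite, CpG...) per row
    values = [[float(x) for x in r[1:]] for r in rows[1:]]
    return samples, variables, values


def unmatched_samples(samples, metadata_ids):
    """Samples present in the matrix but missing from the metadata ID column."""
    known = set(metadata_ids)
    return [s for s in samples if s not in known]


matrix = "gene\tS1\tS2\tS3\nTP53\t10\t12\t9\nGAPDH\t200\t180\t210\n"
samples, variables, values = read_matrix(matrix)
print(samples, variables)                        # prints ['S1', 'S2', 'S3'] ['TP53', 'GAPDH']
print(unmatched_samples(samples, ["S1", "S2"]))  # prints ['S3']
```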

    Omics differences:

  • In transcriptomics, the variables (genes) should be named using GENE SYMBOL or ENSEMBL ID.
  • In metabolomics, variables (metabolites) should be named using KEGG IDs, HMDB IDs, or official compound names. If the peaks are not annotated, they should be named "peak_[number]" (e.g., peak_1, peak_2); this format is particularly important when MS1 or MS2 annotation is performed.
  • In methylomics, variables must be named with the EPIC or 450K array probe IDs (e.g., cg00000029).
  • For undefined omics, any variable name is accepted.
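A quick way to check which naming convention a matrix follows is to match its variable names against typical identifier patterns. This is a minimal sketch; the regular expressions and the `classify` helper are assumptions covering common human identifier formats only, not the checks BiomiX itself performs (for example, EPIC arrays also contain non-"cg" probes).

```python
import re

# Hypothetical patterns for common identifier formats (human IDs only).
PATTERNS = {
    "ENSEMBL gene ID": re.compile(r"^ENSG\d{11}(\.\d+)?$"),
    "HMDB ID": re.compile(r"^HMDB\d{5}(\d{2})?$"),        # old 5-digit and new 7-digit IDs
    "KEGG compound ID": re.compile(r"^C\d{5}$"),
    "EPIC/450K CpG probe": re.compile(r"^cg\d{8}$"),
    "unannotated peak": re.compile(r"^peak_\d+$"),
}


def classify(name: str) -> str:
    """Return the first identifier format that matches `name`."""
    for label, pattern in PATTERNS.items():
        if pattern.match(name):
            return label
    # Gene symbols and compound names have no fixed pattern.
    return "other (gene symbol, compound name, or free label)"


for name in ["ENSG00000141510", "HMDB0000122", "C00031", "cg00000029", "peak_12", "TP53"]:
    print(name, "->", classify(name))
```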
    Directory system:

    BiomiX adheres to the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), meaning that the tool is set up to make analyses reproducible.

    How to ensure a reproducible analysis?

  • Save the metadata and matrices in the correct location: although BiomiX can upload the matrices and metadata from any location, we strongly suggest saving a copy within the BiomiX folders, namely the metadata file in the Metadata folder and each omics input in the input-files folder of the corresponding omics.
  • Analysis reports: after each BiomiX analysis completes without errors, a report file listing the names of the files used, the type of analysis, and the parameters is saved to guarantee reproducibility.
    Preview-QC visualisation

    If preview-QC is selected within an omics input slot, a Shiny interface opens to visualize the data. It includes the following sections:

  • Plot section: includes scatter plots among the 50 highest-variance variables, PCA, UMAP, a correlation heatmap, and the distribution of variable values within each sample.
  • Data Table section: shows the matrix as a table.
  • Outliers section: includes a filter to remove a selected percentage of the highest-variance variables and a PCA-based outlier identifier.
  • Download section: the transformed data can be downloaded here.
  • Once you are satisfied with the transformations and modifications, press the Close App button to use the current matrix for the single-omics or integration analysis.
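The two Outliers-section operations can be illustrated with a small, dependency-free sketch. It assumes a variance-ranked filter that drops the top `pct` percent of variables, and a PCA-based rule that flags samples whose first-principal-component score has a robust z-score (median/MAD) above 3.5. The cutoff, the use of PC1 only, and the helper names (`drop_high_variance`, `pc1_scores`, `pca_outliers`) are hypothetical; the exact criterion used by the BiomiX interface may differ.

```python
import statistics


def drop_high_variance(matrix, names, pct):
    """Remove the `pct` percent of variables (rows) with the highest variance."""
    n_drop = int(len(matrix) * pct / 100)
    by_variance = sorted(range(len(matrix)),
                         key=lambda i: statistics.pvariance(matrix[i]), reverse=True)
    dropped = set(by_variance[:n_drop])
    keep = [i for i in range(len(matrix)) if i not in dropped]
    return [matrix[i] for i in keep], [names[i] for i in keep]


def pc1_scores(matrix, n_iter=200):
    """PC1 sample scores (up to scale and sign); rows = variables, columns = samples."""
    n = len(matrix[0])
    centered = []
    for row in matrix:
        mu = statistics.fmean(row)
        centered.append([x - mu for x in row])
    # Sample-by-sample Gram matrix; its leading eigenvector gives the PC1 score direction.
    gram = [[sum(r[a] * r[b] for r in centered) for b in range(n)] for a in range(n)]
    # Power iteration; the start vector is not constant because centering makes
    # the all-ones vector an eigenvector with eigenvalue 0.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return [0.0] * n
        v = [x / norm for x in w]
    return v


def pca_outliers(matrix, sample_ids, cutoff=3.5):
    """Flag samples whose PC1 score has a robust z-score (median/MAD) above `cutoff`."""
    scores = pc1_scores(matrix)
    med = statistics.median(scores)
    mad = 1.4826 * statistics.median(abs(s - med) for s in scores)
    if mad == 0:
        return []
    return [sid for sid, s in zip(sample_ids, scores) if abs(s - med) / mad > cutoff]


toy = [[1.0, 1.1, 0.9, 1.0, 1.05, 8.0],
       [2.0, 2.1, 1.9, 2.0, 2.05, 9.0],
       [3.0, 2.9, 3.1, 3.0, 2.95, 10.0]]
print(pca_outliers(toy, ["S1", "S2", "S3", "S4", "S5", "S6"]))  # prints ['S6']
```

The median/MAD rule is used here because a single extreme sample inflates the ordinary standard deviation and can hide itself; this is a design choice for the sketch, not a documented BiomiX setting.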

    Warning: for metabolomics analysis, QC samples can be uploaded by labeling them as "QC" in the CONDITION column of the metadata. This allows the QC samples to be visualized in the preview-QC, to compare their distributions with those of the other samples, before they are removed for the statistical analysis.
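The QC handling described above amounts to splitting samples on the CONDITION label, comparing their signal distributions, and excluding QC columns before the statistics. A minimal sketch, assuming metadata rows are dictionaries and the matrix is stored as a mapping from sample ID to signal values; `split_qc`, `sample_medians`, and `drop_samples` are hypothetical helpers and the signals are toy values.

```python
import statistics


def split_qc(metadata_rows, qc_label="QC"):
    """Split metadata rows (dicts) into study samples and QC samples using CONDITION."""
    study = [r for r in metadata_rows if r["CONDITION"] != qc_label]
    qc = [r for r in metadata_rows if r["CONDITION"] == qc_label]
    return study, qc


def sample_medians(matrix_by_sample):
    """Median signal of each sample, a quick way to compare QC and study distributions."""
    return {sid: statistics.median(vals) for sid, vals in matrix_by_sample.items()}


def drop_samples(matrix_by_sample, ids_to_drop):
    """Remove the given sample columns (e.g. QC samples) before the statistics."""
    drop = set(ids_to_drop)
    return {sid: vals for sid, vals in matrix_by_sample.items() if sid not in drop}


meta = [{"ID": "S1", "CONDITION": "CASE"},
        {"ID": "QC1", "CONDITION": "QC"},
        {"ID": "S2", "CONDITION": "CTRL"}]
signals = {"S1": [1.0, 2.0, 3.0], "QC1": [2.0, 2.5, 3.0], "S2": [4.0, 5.0, 6.0]}
study, qc = split_qc(meta)
print(sample_medians(signals))
print(drop_samples(signals, [r["ID"] for r in qc]))
```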



    Transcriptomics

    Single Omics Analysis

    The transcriptomics single-omics analysis identifies differentially expressed genes (DEGs) along with their statistics (Log2FC and adjusted p-value) and provides visualizations such as volcano plots and heatmaps. If GENDER and AGE columns are included in the metadata, they are used to adjust the DESeq2 or Limma model. The default DEG thresholds are |Log2FC| > 0.5 and adjusted p-value < 0.05.
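Applying the default DEG thresholds to a results table can be sketched as follows. This assumes the Log2FC cutoff is applied symmetrically (|Log2FC| > 0.5 for both up- and down-regulated genes); `GeneResult` and `select_degs` are hypothetical names, and the gene names and statistics are toy values, not BiomiX output.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2fc: float
    padj: float


def select_degs(results, lfc_cutoff=0.5, padj_cutoff=0.05):
    """Split genes into up- and down-regulated DEGs using Log2FC and adjusted p-value cutoffs."""
    up = [r.gene for r in results if r.padj < padj_cutoff and r.log2fc > lfc_cutoff]
    down = [r.gene for r in results if r.padj < padj_cutoff and r.log2fc < -lfc_cutoff]
    return up, down


toy = [GeneResult("GENE_A", 1.2, 0.001),
       GeneResult("GENE_B", -0.8, 0.01),
       GeneResult("GENE_C", 0.3, 0.001),   # fails the Log2FC cutoff
       GeneResult("GENE_D", 2.0, 0.2)]     # fails the adjusted p-value cutoff
print(select_degs(toy))  # prints (['GENE_A'], ['GENE_B'])
```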

    Biological process enrichment is explored using the R version of EnrichR, with options to export the data to external tools such as GSEA and EnrichR. Within the preview-QC visualization, users can choose whether to transform the data: if the data are transformed, differential expression is tested with Limma; if not, and the expression matrix contains raw counts, DESeq2 is used instead. The transformed data are then used as input for MOFA integration. To optimize analysis speed, only the most variable genes are selected for MOFA (5000 by default, customizable in the MOFA section of the advanced options).
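Selecting the most variable genes before MOFA integration can be sketched as ranking rows by variance and keeping the top `n_top`. Using the plain variance as the ranking measure is an assumption; BiomiX may use a different variability statistic. `top_variable` is a hypothetical helper.

```python
import statistics


def top_variable(matrix, names, n_top=5000):
    """Keep the n_top rows (genes) with the highest variance across samples."""
    by_variance = sorted(range(len(matrix)),
                         key=lambda i: statistics.pvariance(matrix[i]), reverse=True)
    keep = sorted(by_variance[:n_top])  # restore the original row order
    return [matrix[i] for i in keep], [names[i] for i in keep]


toy = [[1, 1, 1], [0, 5, 10], [2, 3, 4]]
print(top_variable(toy, ["gene_a", "gene_b", "gene_c"], n_top=2)[1])  # prints ['gene_b', 'gene_c']
```

If a matrix has fewer genes than `n_top`, all rows are kept.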

    Additional Features

    Metabolomics

    The peak or metabolite signals in the matrix are first visualized in the preview-QC to decide whether and how to transform the data. The transformed signals are then compared between the chosen groups by calculating Log2FC as the log2 ratio of their median signals. P-values are calculated with the non-parametric Mann–Whitney test and corrected with the FDR method. The volcano plot and heatmap display the significantly increased or decreased metabolites or peaks. All transformed peak or metabolite signals are included in the MOFA integration. Warning: please ensure that the data matrix is curated to remove variables without biological relevance (e.g., contaminants, poorly reproducible signals) and that instrumental normalization is performed before uploading it to BiomiX.
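The per-feature statistics described above can be sketched in pure Python: Log2FC as the log2 ratio of group medians, a two-sided Mann–Whitney p-value, and Benjamini-Hochberg FDR correction. Assumptions: the ratio is taken as group A over group B and both medians are positive; the p-value uses the normal approximation with tie and continuity corrections, so for small groups it will differ from an exact test. Function names are hypothetical.

```python
import math
import statistics


def log2fc_median(group_a, group_b):
    """Log2 ratio of median signals (group A over group B); medians must be positive."""
    return math.log2(statistics.median(group_a) / statistics.median(group_b))


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value (normal approximation, tie and continuity corrected)."""
    values = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(x) + len(y)
    order = sorted(range(n), key=lambda i: values[i])
    ranks, tie_term, i = [0.0] * n, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank for tied values
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return 1.0
    z = max((abs(u1 - mu) - 0.5) / sigma, 0.0)
    return math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """FDR-adjusted p-values (Benjamini-Hochberg step-up), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(log2fc_median([4, 4, 4], [1, 1, 1]))  # prints 2.0
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```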

    To reveal changes in biological pathways, BiomiX uses metpath v1.0.5 from tidymass v1.0.8 to detect enrichment of metabolic pathways among the significantly changed metabolites. For deeper analysis, it also prepares ready-to-copy files for MetaboAnalyst, one of the most advanced online tools for metabolomics analysis, which supports pathway enrichment and metabolite set enrichment analysis. If transcriptomic results are available in BiomiX, the platform automatically combines the metabolomics and transcriptomics results to create files ready for MetaboAnalyst's joint pathway and network analysis.
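To illustrate the idea behind pathway enrichment among significant metabolites, here is a generic over-representation sketch based on a one-sided hypergeometric test. It is not necessarily the method implemented in metpath; `hypergeom_enrichment_p` is a hypothetical helper, and the example counts are toy numbers.

```python
from math import comb


def hypergeom_enrichment_p(k, n_selected, pathway_size, universe_size):
    """P(X >= k), X ~ Hypergeometric: k pathway hits among n_selected significant
    metabolites drawn from universe_size metabolites, pathway_size of them in the pathway."""
    total = comb(universe_size, n_selected)
    upper = min(n_selected, pathway_size)
    hits = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Toy example: 3 significant metabolites out of 10, all 3 belong to a 3-member pathway.
print(hypergeom_enrichment_p(3, 3, 3, 10))  # 1/120
```

In practice each pathway's p-value would then be corrected for multiple testing, for example with the FDR method.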

    If you have only raw .mzML files, we strongly recommend using the R pipeline by Á. Fernández-Ochoa (https://doi.org/10.3390/metabo10010028) or MetaboAnalyst to generate the peak signals matrix.

    Additional Features

    BiomiX is flexible: it supports both targeted (annotated) and untargeted metabolomics data, and untargeted data can be analyzed with or without annotation. Untargeted annotation is based on MS1 data (mass-to-charge ratio, m/z, and retention time) or MS2 data (MS1 annotation plus raw MS2 fragmentation files in .mzML or .mgf format).
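MS1 annotation generally matches each peak's m/z within a mass tolerance (in ppm) and its retention time within a window against a reference list. The following is a generic sketch of that matching, not BiomiX's own annotation code; the tolerances (10 ppm, 30 s), the `Feature`/`Reference` classes, and the `annotate_ms1` helper are hypothetical, and the reference compounds are toy entries.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str    # e.g. "peak_1"
    mz: float    # measured mass-to-charge ratio
    rt: float    # retention time, same unit as the reference (seconds here)


@dataclass
class Reference:
    compound: str
    mz: float
    rt: float


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def annotate_ms1(features, references, ppm_tol=10.0, rt_tol=30.0):
    """Map each peak name to the reference compounds matching both m/z and RT tolerances."""
    return {
        f.name: [ref.compound for ref in references
                 if abs(ppm_error(f.mz, ref.mz)) <= ppm_tol and abs(f.rt - ref.rt) <= rt_tol]
        for f in features
    }


refs = [Reference("compound_A", 200.0000, 100.0), Reference("compound_B", 200.0100, 300.0)]
peaks = [Feature("peak_1", 200.0010, 110.0), Feature("peak_2", 500.0000, 100.0)]
print(annotate_ms1(peaks, refs))  # prints {'peak_1': ['compound_A'], 'peak_2': []}
```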

    Methylomics

    BiomiX performs differential methylation analysis using the ChAMP package, providing the Δbeta value of each CpG island, the FDR-adjusted p-value, and a summarizing volcano plot and heatmap. The default thresholds are |Δbeta| > 0.15 and FDR-adjusted p-value < 0.05.
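The default methylation thresholds can be expressed as a small filter. This sketch assumes Δbeta is the difference between the mean beta values of the two groups (group A minus group B) and that the FDR-adjusted p-value is already available; `delta_beta` and `is_significant` are hypothetical helpers and the beta values are toy numbers.

```python
import statistics


def delta_beta(beta_a, beta_b):
    """Difference in mean beta value (group A minus group B) for one CpG."""
    return statistics.fmean(beta_a) - statistics.fmean(beta_b)


def is_significant(dbeta, padj, dbeta_cutoff=0.15, padj_cutoff=0.05):
    """Default thresholds: |delta beta| > 0.15 and FDR-adjusted p-value < 0.05."""
    return abs(dbeta) > dbeta_cutoff and padj < padj_cutoff


d = delta_beta([0.8, 0.7], [0.5, 0.4])
print(round(d, 2), is_significant(d, 0.01))  # prints 0.3 True
```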