MOFA is used in BiomiX following the developer guidelines (https://biofam.github.io/MOFA2/), with transformed data inputs and a reduced feature set. Transformation methods depend on the omics type: transcriptomics data use a variance-stabilizing transformation, metabolomics data use a log transformation to approximate a Gaussian distribution, and methylomics data need no transformation. The genes and CpG islands with the highest variance are selected for MOFA integration; the same applies to metabolomics, although fewer peaks are typically included.
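As an illustration of the preprocessing described above, the following minimal sketch log-transforms a small peak table and keeps the features with the highest variance. It is not BiomiX code: the function names, the pseudocount of 1, the use of log base 2, and the toy values are all assumptions made for the example.

```python
import math
from statistics import pvariance


def log_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value; keys are features, values are samples.

    The pseudocount and the log base are illustrative assumptions.
    """
    return {
        feature: [math.log2(v + pseudocount) for v in values]
        for feature, values in matrix.items()
    }


def top_variance_features(matrix, n_top):
    """Return the names of the n_top features with the highest variance across samples."""
    ranked = sorted(matrix, key=lambda f: pvariance(matrix[f]), reverse=True)
    return ranked[:n_top]


# Toy metabolomics peak table (hypothetical intensities, four samples per peak).
peaks = {
    "peak_a": [10.0, 12.0, 11.0, 9.0],
    "peak_b": [1.0, 100.0, 3.0, 250.0],
    "peak_c": [50.0, 50.0, 50.0, 50.0],
}

logged = log_transform(peaks)
selected = top_variance_features(logged, n_top=2)
print(selected)
```

The constant peak has zero variance and is dropped, which is the intended effect of variance-based filtering before integration.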
BiomiX’s MOFA implementation allows customization of parameters such as convergence speed, Evidence Lower Bound frequency, and maximum number of iterations. An automated tuning mode optimizes the total factor count, stopping when three models have a factor explaining less than 1% of variance. The top three models, prioritizing those with more statistically significant discriminant factors, are selected. Each model’s factors are tested with a Mann–Whitney test, with p-values adjusted by the FDR method.
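The FDR adjustment applied to the per-factor p-values can be sketched as follows, assuming the Benjamini–Hochberg procedure (the method named for the clinical-feature tests). The factor names, the p-values, and the 0.05 threshold are hypothetical and serve only to show the arithmetic.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p-values, one per MOFA factor.
factor_p = {"Factor1": 0.001, "Factor2": 0.04, "Factor3": 0.03, "Factor4": 0.60}

adj = benjamini_hochberg(list(factor_p.values()))
# The 0.05 cut-off is an assumption for this example.
significant = [name for name, q in zip(factor_p, adj) if q < 0.05]
print(significant)
```

Factor2 and Factor3 are nominally below 0.05 but no longer pass after adjustment, which shows why the number of discriminant factors per model should be counted on adjusted values.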
Both the automatic and the manual MOFA analyses include three methods that help the user interpret the discriminating MOFA factors.
Correlation analysis: Users can upload a matrix containing binary or numerical clinical features to integrate into the MOFA model. Numerical features are correlated with each MOFA factor using Pearson correlation, while binary clinical features are analyzed using the Wilcoxon test after dividing the samples into positive and negative groups. The nominal p-values are corrected using the Benjamini–Hochberg method.
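The two branches of the correlation analysis can be sketched as below: a Pearson coefficient for a numerical feature, and a split of factor values into positive and negative groups for a binary feature. This is a standard-library illustration with made-up data and hypothetical function names; it computes no p-values, which in practice would come from a statistics library (for example SciPy's `pearsonr` and `mannwhitneyu`).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_by_binary(factor_values, labels):
    """Split factor values into (positive, negative) groups using 1/0 labels."""
    positive = [v for v, lab in zip(factor_values, labels) if lab == 1]
    negative = [v for v, lab in zip(factor_values, labels) if lab == 0]
    return positive, negative


# Hypothetical factor scores and clinical features for five samples.
factor1 = [0.2, 0.4, 0.6, 0.8, 1.0]
age = [30, 40, 50, 60, 70]      # numerical feature
treated = [0, 0, 1, 1, 1]       # binary feature

r = pearson_r(factor1, age)
pos, neg = split_by_binary(factor1, treated)
print(round(r, 3), pos, neg)
```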
Pathway analysis: BiomiX identifies the top contributing genes, metabolites, and CpG islands for the discriminating factors in each MOFA model. Depending on the omics data type, BiomiX selects an R package to determine whether a biological or metabolic pathway is enriched within these genes, metabolites, or CpG islands. Genes are analyzed with EnrichR using the Reactome, Biological Process, Encyclopedia of DNA Elements (ENCODE), and ChIP-X Enrichment Analysis (ChEA) transcription factor libraries. Metabolites are assessed via MetPath using the KEGG and HMDB databases. CpG islands, when linked to associated genes, are also analyzed using EnrichR.
PubMed bibliography research: For each discriminating factor in each MOFA model, the top contributing genes, metabolites, and CpG island genes are used for PubMed searches to gain insight into the identity of the factor. The search algorithm operates on three levels, prioritizing abstracts that combine more multiomic contributors. At the first level, abstracts include at least one of the top ten contributors; at the second level, abstracts contain at least one contributor from any two omics (e.g., transcriptomics–metabolomics); and the third level includes abstracts with at least one contributor from a single omics.
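The prioritization idea, ranking abstracts by how many omics layers contribute a matched term, can be sketched as follows. This is not the BiomiX search algorithm: the function names, the case-insensitive substring matching, the contributor names, and the abstracts are all assumptions, and the sketch only counts matched omics without reproducing the exact level definitions.

```python
def matched_omics(abstract, contributors_by_omics):
    """Map each omics layer to the contributors found in the abstract.

    Matching is a naive case-insensitive substring test (an assumption);
    a real search would need proper tokenization and synonym handling.
    """
    text = abstract.lower()
    hits = {}
    for omics, names in contributors_by_omics.items():
        found = [name for name in names if name.lower() in text]
        if found:
            hits[omics] = found
    return hits


def rank_abstracts(abstracts, contributors_by_omics):
    """Return (number of matched omics, abstract) pairs, most omics first."""
    scored = [(len(matched_omics(a, contributors_by_omics)), a) for a in abstracts]
    matched = [pair for pair in scored if pair[0] > 0]
    return sorted(matched, key=lambda pair: pair[0], reverse=True)


# Hypothetical top contributors per omics layer and toy abstracts.
contributors = {
    "transcriptomics": ["IFI44L", "STAT1"],
    "metabolomics": ["kynurenine"],
    "methylomics": ["MX1"],
}
abstracts = [
    "STAT1 expression is elevated in patients.",
    "Kynurenine levels track IFI44L and MX1 methylation.",
    "Unrelated study of soil bacteria.",
    "Kynurenine correlates with STAT1 signalling.",
]

ranked = rank_abstracts(abstracts, contributors)
for n_omics, text in ranked:
    print(n_omics, text)
```

Abstracts with no matched contributor are discarded, and the remaining ones are ordered so that those combining contributors from more omics appear first.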
The output is a .TSV file with PubMed articles, total and individual contributor matches, DOIs, and keywords. Since author-provided keywords are limited, BiomiX performs additional text-mining with litsearchr v1.0.0, generating a vocabulary of frequent word combinations filtered against Gene Ontology and human phenotype terms. The top 15 terms are included in the .TSV file, along with a final word frequency analysis across all abstracts retrieved from each level.
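The final word-frequency step can be illustrated with a short standard-library sketch. It is a plain stand-in rather than the litsearchr workflow: the stop-word list, the tokenization rule, the default of 15 terms, and the abstracts are assumptions chosen for the example.

```python
import re
from collections import Counter

# Minimal stop-word list (an assumption; a real analysis would use a fuller one).
STOPWORDS = {"the", "of", "in", "and", "a", "is", "with", "to"}


def word_frequencies(abstracts, top_n=15):
    """Count lower-cased alphanumeric tokens across abstracts, skipping stop words."""
    counts = Counter()
    for text in abstracts:
        words = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


# Toy abstracts standing in for those retrieved at one search level.
retrieved = [
    "Interferon signalling in lupus.",
    "Interferon genes and lupus flares.",
    "Interferon response to infection.",
]

top = word_frequencies(retrieved, top_n=2)
print(top)
```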