Biomarker Discovery

Find molecular signatures that distinguish patient groups in a translational cohort and decide which ones are worth taking forward. Inflexa runs multi-method differential expression, pathway enrichment, and effect-size estimation on your data, cross-validates against held-out subsets, then grounds the surviving candidates in PubMed evidence so each biomarker in the dossier carries both statistical and mechanistic support.

What You Get

Deliverables

Candidate biomarkers with multi-method consensus

Differentially expressed genes (DEGs) identified through multi-method analysis and ranked by statistical confidence. For the TCGA-BRCA ancestry comparison, Inflexa identified 11,424 significant DEGs (FDR<0.05) across 1,084 patients. The top hits, LOC90784 (FDR 3.99e-45), CROCCL1 (FDR 6.5e-39), and CRYBB2 (FDR 1.6e-37), are the most statistically significant ancestry-associated expression differences in this cohort.

Gene	log2FC	FDR
LOC90784	-1.00	3.99e-45
CROCCL1	+1.03	6.5e-39
CRYBB2	+1.24	1.6e-37
FAM3A	+0.81	1.2e-35
HEXDC	+1.08	1.9e-35

Effect sizes with confidence intervals

Beyond statistical significance, Inflexa computes effect sizes (Cohen's d) with confidence intervals for key markers. In the immune checkpoint analysis, OX40/TNFRSF4 showed a large effect (d=+0.82, AA>EA, i.e. higher in African American than in European American patients), followed by PD-1 (d=+0.65), CTLA-4 (d=+0.58), and LAG-3 (d=+0.54), all elevated in African American patients. Effect sizes capture practical rather than purely statistical significance, which is what matters when assessing a biomarker's utility in clinical settings.

Cross-validation and reproducibility metrics

Inflexa tests whether discovered signatures hold up across subgroups. For the ancestry comparison, prognostic signature overlap was very low: only 1.2% (2 shared genes out of 167), which suggests that ancestry-specific prognostic models may be needed. The null survival effect after clinical adjustment (HR 1.046, p=0.837) further shows that molecular differences don't always translate to outcome differences, a finding as important as the discovery of significant biomarkers.

DECISION ENABLED

Prioritise biomarker candidates for validation assays and clinical translation: large effect sizes, supporting pathway biology, cross-validated reproducibility, and a literature-grounded mechanism story your translational team can defend.

Sample Output

Breast cancer ancestry analysis: DEGs, pathways, and immune markers

TCGA-BRCA: Ancestry Comparison (AA vs EA), 1,084 patients

DEGs (FDR<0.05)	Patients	Signature overlap
11,424	1,084	1.2%

Volcano Plot: AA vs EA (11,424 DEGs)

Top Differentially Expressed Genes (ranked by FDR)

Gene	log2FC	FDR
LOC90784	-1.00	3.99e-45
CROCCL1	+1.03	6.5e-39
CRYBB2	+1.24	1.6e-37
FAM3A	+0.81	1.2e-35
HEXDC	+1.08	1.9e-35
NACA2	+1.27	2.0e-35
PRSS45	+1.51	4.2e-34
DDX6	-0.56	1.3e-32
SNRNP70	+0.74	6.8e-32
CDK10	+0.87	1.2e-30

Pathway Enrichment: BasalMyo Subtype (GSEA)

Immune Checkpoint Markers (Cohen's d effect sizes)

Marker	Cohen's d	Direction
OX40/TNFRSF4	+0.82	AA > EA
PD-1	+0.65	AA > EA
CTLA-4	+0.58	AA > EA
LAG-3	+0.54	AA > EA

Key finding: Ancestry drives significant molecular differences (11,424 DEGs), but the survival effect is null after clinical adjustment (HR 1.046, p=0.837). Prognostic signature overlap between ancestries is only 1.2% (2 shared genes out of 167). Reporting the null result is part of the analytical rigor: not every molecular difference translates to a clinical outcome.

ATOPIC DERMATITIS BIOMARKERS (GSE157194)

15-Gene Minimal Therapeutic Core (dupilumab + cyclosporine convergence)

Genes suppressed by BOTH dupilumab and cyclosporine despite different mechanisms (165 samples, 57 patients):

S100A12, S100A7A, CCL20, IL36A, SPRR2A, SPRR2B, SPRR2D, SPRR2F

Gene categories: Inflammatory, Chemokine, Interleukin, Barrier

ALOX15: Pharmacodynamic Biomarker (bidirectional, treatment-discriminating)

ALOX15 moves in opposite directions under dupilumab (-1.53) vs cyclosporine (+1.18). This bidirectional response makes it a pharmacodynamic biomarker for distinguishing treatment mechanism at the molecular level.
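The difference between a convergent gene (moved the same way by both drugs) and a bidirectional one like ALOX15 comes down to a sign check. The sketch below is a minimal illustration, assuming the reported values (-1.53 and +1.18) are signed expression changes such as log2 fold changes. The function name `classify_response` and the `min_abs` threshold are hypothetical and not part of Inflexa.

```python
def classify_response(effects, min_abs=1.0):
    """Classify a gene from its signed change under each treatment.

    effects: {treatment_name: signed change}, e.g. log2 fold change (assumed).
    min_abs: hypothetical minimum magnitude for a change to count.
    """
    strong = {t: v for t, v in effects.items() if abs(v) >= min_abs}
    if len(strong) < len(effects) or len(strong) < 2:
        return "inconclusive"  # at least one treatment shows only a weak change
    signs = {v > 0 for v in strong.values()}
    # Two distinct signs means the treatments push the gene in opposite directions
    return "bidirectional" if len(signs) == 2 else "convergent"


# Values for ALOX15 as reported in the text above
alox15 = classify_response({"dupilumab": -1.53, "cyclosporine": 1.18})
print(alox15)
```

A gene in the shared therapeutic core would instead come back as "convergent", since both treatments suppress it.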

13-Gene LASSO Biomarker Panel (predictive model)

Genes	Correlation	Samples	Patients
13	r=0.428	165	57

How It Works

Methodology

STEP 1

Quality control and normalization

RNA-seq count data undergo quality control filtering (removal of low-count genes and outlier samples) followed by normalization. For the TCGA-BRCA dataset, 1,084 patients across multiple ancestry groups were processed with a standard variance-stabilizing transformation.
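To make the filtering and normalization step concrete, here is a minimal pure-Python sketch. It is not Inflexa's implementation: it uses log2 counts-per-million as a simpler stand-in for the variance-stabilizing transformation named above. The thresholds (`min_count`, `min_samples`), the function names, and the toy counts are all illustrative assumptions.

```python
import math


def filter_low_count_genes(counts, min_count=10, min_samples=2):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for c in row if c >= min_count) >= min_samples
    }


def log2_cpm(counts, prior=1.0):
    """Scale each sample to counts per million, then log2-transform.

    This is a stand-in for a variance-stabilizing transformation, not VST itself.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [math.log2(row[j] / lib_sizes[j] * 1e6 + prior) for j in range(n_samples)]
        for gene, row in counts.items()
    }


# Toy genes-by-samples count table (hypothetical values)
counts = {
    "GENE_A": [500, 450, 520, 480],
    "GENE_B": [0, 1, 0, 2],        # low-count gene, removed by the filter
    "GENE_C": [100, 300, 150, 250],
}
kept = filter_low_count_genes(counts)
normalized = log2_cpm(kept)
print(sorted(kept))
```

Filtering before normalization keeps near-empty genes from inflating the number of tests carried into differential expression.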

STEP 2

DESeq2 differential expression

Multi-method differential expression is run with appropriate covariates (clinical stage, subtype, age). DESeq2 identifies genes whose expression differs significantly between groups while adjusting for these potential confounders. The analysis produced 11,424 DEGs at FDR<0.05.
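The FDR<0.05 cutoff can be illustrated with the Benjamini-Hochberg procedure, which is the default p-value adjustment in DESeq2. The sketch below only shows how raw p-values become FDR values and how the threshold is applied; it does not reproduce DESeq2's model fitting. The gene names and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes
pvals = {"g1": 0.001, "g2": 0.008, "g3": 0.039, "g4": 0.041, "g5": 0.60}
genes = list(pvals)
fdr = dict(zip(genes, benjamini_hochberg([pvals[g] for g in genes])))
significant = [g for g in genes if fdr[g] < 0.05]
print(significant)
```

Note that g3 and g4 have raw p-values below 0.05 but do not survive adjustment, which is why the DEG count is reported at an FDR threshold and not at a raw p-value threshold.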

STEP 3

Multi-method consensus scoring

Results from multiple analytical approaches are compared for directional agreement. Genes with consistent results across methods receive higher confidence scores. This guards against a known problem: different analytical methods can disagree substantially on the same data.
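One simple way to score directional agreement is sketched below. The scoring rule is an assumption made for illustration (net number of methods agreeing on a significant direction, divided by the number of methods) and is not Inflexa's documented formula. The method names, gene names, and values are hypothetical.

```python
def consensus_scores(results, alpha=0.05):
    """Score each gene by directional agreement across methods.

    results: {method: {gene: (log2fc, fdr)}}
    Returns {gene: score in [0, 1]}; 1.0 means every method calls the gene
    significant in the same direction, 0.0 means no net agreement.
    """
    methods = list(results)
    genes = set()
    for per_gene in results.values():
        genes.update(per_gene)

    scores = {}
    for gene in genes:
        up = down = 0
        for method in methods:
            if gene not in results[method]:
                continue  # a gene missing from a method counts as no support
            log2fc, fdr = results[method][gene]
            if fdr >= alpha:
                continue  # non-significant calls do not vote
            if log2fc > 0:
                up += 1
            elif log2fc < 0:
                down += 1
        # Opposite-direction calls cancel out, penalizing disagreement
        scores[gene] = abs(up - down) / len(methods)
    return scores


# Hypothetical results from three methods
results = {
    "method_a": {"GENE_X": (1.2, 1e-6), "GENE_Y": (0.9, 0.01), "GENE_Z": (-0.7, 0.001)},
    "method_b": {"GENE_X": (1.0, 1e-4), "GENE_Y": (-0.8, 0.02), "GENE_Z": (-0.6, 0.003)},
    "method_c": {"GENE_X": (1.4, 1e-5), "GENE_Y": (0.1, 0.70)},
}
scores = consensus_scores(results)
print(sorted(scores.items()))
```

Here GENE_X gets full confidence, GENE_Y scores zero because two methods contradict each other, and GENE_Z lands in between because one method did not report it.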

STEP 4

Cross-validation assessment

Discovered signatures are tested for reproducibility across held-out subsets and independent subgroups. The 1.2% signature overlap between ancestries surfaced during systematic cross-validation and suggests that population-specific biomarker panels may be needed.
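The overlap figure can be checked with simple set arithmetic. The sketch below assumes that "2 shared genes out of 167" means shared genes divided by the union of the two signatures (a Jaccard index); the source does not state the exact definition. The gene identifiers and the 80/85 split between ancestry-specific genes are synthetic placeholders chosen only so the totals match.

```python
def signature_overlap(signature_a, signature_b):
    """Return (shared count, union count, shared/union fraction) for two gene sets."""
    a, b = set(signature_a), set(signature_b)
    shared = a & b
    union = a | b
    fraction = len(shared) / len(union) if union else 0.0
    return len(shared), len(union), fraction


# Synthetic signatures: 2 shared genes, 165 ancestry-specific genes (split is hypothetical)
shared_genes = {"SHARED_1", "SHARED_2"}
aa_signature = shared_genes | {f"AA_ONLY_{i}" for i in range(80)}
ea_signature = shared_genes | {f"EA_ONLY_{i}" for i in range(85)}

n_shared, n_union, fraction = signature_overlap(aa_signature, ea_signature)
print(f"{n_shared} shared of {n_union}: {fraction:.1%}")
```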

STEP 5

Effect size estimation

Cohen's d effect sizes with confidence intervals are computed for all significant findings. This separates statistically significant but biologically trivial differences from markers with meaningful effect sizes suitable for clinical translation.
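For reference, Cohen's d with an approximate confidence interval can be computed as in the sketch below. It uses the pooled standard deviation and a common large-sample normal approximation for the standard error of d. Whether Inflexa uses this interval or another method (for example a noncentral t interval or a bootstrap) is not stated in the source, and the expression values are made up.

```python
import math
from statistics import NormalDist, mean, variance


def cohens_d_with_ci(group_a, group_b, confidence=0.95):
    """Cohen's d (pooled SD) with a normal-approximation confidence interval."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    # Large-sample approximation to the standard error of d
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return d, (d - z * se, d + z * se)


# Hypothetical normalized expression values for one marker in two groups
group_aa = [5.1, 5.8, 6.2, 5.5, 6.0, 5.9]
group_ea = [4.9, 5.2, 5.6, 5.0, 5.4, 5.3]
d, (ci_low, ci_high) = cohens_d_with_ci(group_aa, group_ea)
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

A positive d here means the first group has higher expression, matching the "AA > EA" direction convention used in the immune checkpoint table. With small groups the interval is wide, which is the reason to report it alongside the point estimate.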

Who This Is For

Target personas

Biomarker scientist

Discover and prioritize biomarker candidates from high-dimensional omics data with full statistical rigor.

Diagnostics lead

Evaluate biomarker candidates by effect size, reproducibility, and population specificity for diagnostic panel development.

Clinical trial designer

Use ancestry-specific molecular differences to inform enrichment strategies and patient selection criteria.

Academic researcher

Run a publication-grade biomarker analysis on a public dataset without a bioinformatics core, a budget, or an account — the tool is free, and the record it writes is what a reviewer will ask you for.

Run this on your own data.

Inflexa is free and open source under Apache 2.0. Install it, point it at your dataset, and see what it finds.