Biomarker Discovery
Find molecular signatures that distinguish patient groups from high-dimensional omics data. Cortex runs multi-method differential expression, pathway enrichment, and effect size estimation, then cross-validates findings to separate robust biomarker candidates from noise.
Deliverables
Candidate biomarkers with multi-method consensus
Differentially expressed genes (DEGs) are identified through multi-method analysis and ranked by statistical confidence. For the TCGA-BRCA ancestry comparison, Cortex identified 11,424 significant DEGs (FDR<0.05) from 1,084 patients. The top hits, LOC90784 (FDR 3.99e-45), CROCCL1 (FDR 6.5e-39), and CRYBB2 (FDR 1.6e-37), are the most statistically significant ancestry-associated expression differences in this breast cancer cohort.
Effect sizes with confidence intervals
Beyond statistical significance, Cortex computes effect sizes (Cohen's d) with confidence intervals for key markers. In the immune checkpoint analysis, OX40/TNFRSF4 showed a large effect size (d=+0.82), followed by PD-1 (d=+0.65), CTLA-4 (d=+0.58), and LAG-3 (d=+0.54); all four were higher in African American (AA) than in European American (EA) patients. Effect sizes convey practical significance, which is what determines whether a biomarker is useful in clinical settings.
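Cohen's widely used rule-of-thumb thresholds for |d| (0.2 small, 0.5 medium, 0.8 large) put these values in context. The sketch below applies them to the reported d values; the thresholds are general conventions rather than Cortex-specific cutoffs, and the function name `effect_size_label` is illustrative.

```python
# Cohen's conventional benchmarks for |d|: 0.2 small, 0.5 medium, 0.8 large.
# These are generic rules of thumb, not thresholds taken from Cortex.
def effect_size_label(d: float) -> str:
    """Label a Cohen's d value using Cohen's rule-of-thumb thresholds."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"

# d values from the immune checkpoint table (positive means AA > EA).
checkpoints = {"OX40/TNFRSF4": 0.82, "PD-1": 0.65, "CTLA-4": 0.58, "LAG-3": 0.54}
for marker, d in checkpoints.items():
    print(f"{marker}: d={d:+.2f} ({effect_size_label(d)})")
```

By these conventions OX40/TNFRSF4 is the only large effect; PD-1, CTLA-4, and LAG-3 fall in the medium range.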
Cross-validation and reproducibility metrics
The platform tests whether discovered signatures reproduce across subgroups. In the ancestry comparison, prognostic signature overlap was remarkably low: only 1.2% (2 shared genes out of 167), indicating that ancestry-specific prognostic models are needed. The absence of a significant survival effect after clinical adjustment (HR 1.046, p=0.837) further shows that molecular differences don't always translate into outcome differences, a finding as important as discovering significant biomarkers.
Prioritize candidates for validation assays based on statistical rigor and biological plausibility. Focus resources on biomarkers with large effect sizes, biological pathway support, and demonstrated cross-validation performance.
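The prioritization criteria above can be expressed as a simple filter-and-rank step. This is a minimal sketch under assumed inputs: the `Candidate` fields, the thresholds (FDR < 0.05, |d| >= 0.5, reproduced in >= 80% of cross-validation folds), and the gene names are all hypothetical, not Cortex's actual scoring.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    gene: str
    cohens_d: float            # effect size between groups
    fdr: float                 # adjusted p-value
    pathway_support: bool      # hypothetical flag: gene sits in an enriched pathway
    cv_reproducibility: float  # hypothetical: fraction of CV folds re-selecting the gene

def prioritize(candidates, max_fdr=0.05, min_abs_d=0.5, min_cv=0.8):
    """Keep candidates passing every criterion, ranked by |d| (largest first)."""
    passing = [
        c for c in candidates
        if c.fdr < max_fdr
        and abs(c.cohens_d) >= min_abs_d
        and c.pathway_support
        and c.cv_reproducibility >= min_cv
    ]
    return sorted(passing, key=lambda c: abs(c.cohens_d), reverse=True)

# Hypothetical candidates for illustration only.
candidates = [
    Candidate("GENE_A", 0.9, 1e-6, True, 0.95),
    Candidate("GENE_B", 1.2, 1e-4, False, 0.90),  # dropped: no pathway support
    Candidate("GENE_C", -0.6, 0.01, True, 0.85),
    Candidate("GENE_D", 0.3, 1e-8, True, 0.99),   # dropped: effect too small
]
ranked = prioritize(candidates)
print([c.gene for c in ranked])
```

Ranking by absolute effect size (rather than by FDR) reflects the emphasis on practical significance; a very small FDR alone does not move a gene up the list.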
Breast cancer ancestry analysis: DEGs, pathways, and immune markers
| Gene | log2FC | FDR |
|---|---|---|
| LOC90784 | -1.00 | 3.99e-45 |
| CROCCL1 | +1.03 | 6.5e-39 |
| CRYBB2 | +1.24 | 1.6e-37 |
| FAM3A | +0.81 | 1.2e-35 |
| HEXDC | +1.08 | 1.9e-35 |
| NACA2 | +1.27 | 2.0e-35 |
| PRSS45 | +1.51 | 4.2e-34 |
| DDX6 | -0.56 | 1.3e-32 |
| SNRNP70 | +0.74 | 6.8e-32 |
| CDK10 | +0.87 | 1.2e-30 |

Immune checkpoint effect sizes (AA vs EA)

| Marker | Cohen's d | Direction |
|---|---|---|
| OX40/TNFRSF4 | +0.82 | AA > EA |
| PD-1 | +0.65 | AA > EA |
| CTLA-4 | +0.58 | AA > EA |
| LAG-3 | +0.54 | AA > EA |

Key finding: Ancestry is associated with substantial molecular differences (11,424 DEGs), yet there is no significant survival effect after clinical adjustment (HR 1.046, p=0.837). Prognostic signature overlap between ancestries is only 1.2% (2 shared genes out of 167). Reporting this null result reflects analytical rigor: not every molecular difference translates into a clinical outcome.
Atopic dermatitis biomarkers (GSE157194)
Genes suppressed by both dupilumab and cyclosporine despite their different mechanisms of action (165 samples, 57 patients):
ALOX15 moves in opposite directions under dupilumab (-1.53) versus cyclosporine (+1.18). This bidirectional response makes it a candidate pharmacodynamic biomarker for distinguishing the two treatments' mechanisms at the molecular level.
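Separating genes suppressed by both drugs from genes like ALOX15 that respond in opposite directions comes down to comparing the sign of each gene's change under the two treatments. This sketch assumes both changes are on the same scale (the source does not state the scale of the -1.53 and +1.18 values); the `min_abs` threshold and the second gene's values are hypothetical.

```python
def classify_response(dupilumab: float, cyclosporine: float, min_abs: float = 0.0) -> str:
    """Compare the direction of one gene's change under two treatments.

    Returns "suppressed by both", "induced by both", "opposite directions",
    or "below threshold" if either change is not larger than min_abs.
    """
    if abs(dupilumab) <= min_abs or abs(cyclosporine) <= min_abs:
        return "below threshold"
    if dupilumab < 0 and cyclosporine < 0:
        return "suppressed by both"
    if dupilumab > 0 and cyclosporine > 0:
        return "induced by both"
    return "opposite directions"

print(classify_response(-1.53, +1.18))  # ALOX15 values from the text: opposite directions
print(classify_response(-0.9, -0.7))    # hypothetical gene: suppressed by both
```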
Methodology
Quality control and normalization
RNA-seq count data undergoes quality control filtering (low-count genes, outlier samples) and normalization. For the TCGA-BRCA dataset, samples from 1,084 patients across multiple ancestry groups were processed with a standard variance-stabilizing transformation.
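The filtering and normalization steps can be sketched in NumPy. This is an illustration under stated assumptions: the low-count cutoff (>= 10 reads in >= 3 samples) is hypothetical, the size factors follow the DESeq-style median-of-ratios idea, and the final `log2(x + 1)` is a simple stand-in, not DESeq2's variance-stabilizing transformation.

```python
import numpy as np

def filter_low_counts(counts, min_count=10, min_samples=3):
    """Keep genes (rows) with >= min_count reads in at least min_samples samples."""
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    return counts[keep], keep

def median_of_ratios_size_factors(counts):
    """DESeq-style size factors: per-sample median of ratios to each gene's geometric mean.

    Genes with a zero in any sample have an undefined log geometric mean and are skipped.
    """
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    log_ratios = log_counts[usable] - log_geo_means[usable][:, None]
    return np.exp(np.median(log_ratios, axis=0))

# Toy matrix: 4 genes x 3 samples (hypothetical counts).
counts = np.array([
    [100, 200, 150],
    [50, 100, 75],
    [0, 1, 0],      # low-count gene, removed by the filter
    [30, 60, 45],
])
filtered, keep = filter_low_counts(counts)
size_factors = median_of_ratios_size_factors(filtered)
normalized = filtered / size_factors        # divide each sample (column) by its size factor
log_normalized = np.log2(normalized + 1)    # simple stand-in, NOT DESeq2's VST
```

Because every gene in this toy matrix differs across samples only by sequencing depth, each normalized row comes out constant across samples.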
DESeq2 differential expression
Differential expression is run with multiple methods, adjusting for appropriate covariates (clinical stage, subtype, age). DESeq2 identifies genes with significant expression differences between groups while controlling for these confounders. The analysis produced 11,424 DEGs at FDR<0.05.
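The FDR<0.05 cutoff applies to adjusted p-values. DESeq2's `results()` reports `padj` using the Benjamini-Hochberg procedure by default, after its own independent filtering step, so this standalone re-implementation of Benjamini-Hochberg illustrates the adjustment but will not reproduce DESeq2's `padj` exactly. The p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: take the running minimum from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
n_significant = int((adjusted < 0.05).sum())
print(adjusted, n_significant)
```

Note that a raw p-value of 0.03 or 0.04 no longer passes once multiple testing is accounted for; with tens of thousands of genes, this correction is what keeps the DEG count honest.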
Multi-method consensus scoring
Results from multiple analytical approaches are compared for directional agreement, and genes with consistent results across methods receive higher confidence scores. This guards against a key risk: different analytical methods can disagree substantially on the same data.
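One way to turn directional agreement into a score is to count the methods that call a gene significant in the majority direction. The rule below (agreeing significant methods divided by all methods, with no consensus when significant calls split evenly) is an assumption for illustration; the source does not specify Cortex's scoring formula, and the method names are hypothetical.

```python
import math

def consensus_score(results, fdr_cutoff=0.05):
    """Score one gene by directional agreement across methods.

    results maps method name -> (log2 fold change, FDR).
    Returns (direction, score): direction is +1 or -1 for the majority sign among
    significant calls (0 if none or an even split); score is the fraction of all
    methods that are significant and agree with that direction.
    """
    signs = [
        math.copysign(1, lfc)
        for lfc, fdr in results.values()
        if fdr < fdr_cutoff and lfc != 0
    ]
    balance = sum(signs)
    if balance == 0:
        return 0, 0.0  # nothing significant, or significant calls split evenly
    direction = 1 if balance > 0 else -1
    agreeing = sum(1 for s in signs if s == direction)
    return direction, agreeing / len(results)

# Hypothetical per-method results: (log2FC, FDR).
gene_1 = {"method_a": (1.1, 0.001), "method_b": (0.9, 0.02), "method_c": (1.3, 0.2)}
gene_2 = {"method_a": (1.0, 0.01), "method_b": (-0.8, 0.01), "method_c": (0.5, 0.3)}
print(consensus_score(gene_1))  # two of three methods agree on up-regulation
print(consensus_score(gene_2))  # significant calls conflict: no consensus
```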
Cross-validation assessment
Discovered signatures are tested for reproducibility across held-out subsets and independent subgroups. The 1.2% signature overlap between ancestries emerged from this systematic cross-validation, indicating that population-specific biomarker panels are needed.
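The overlap figure is simple arithmetic on two gene sets. The source does not say whether 167 is the union of both signatures or some other total, so the Jaccard definition below (shared genes divided by genes in either signature) is an assumption, and the toy gene names are hypothetical.

```python
def jaccard_overlap(sig_a, sig_b):
    """Shared genes divided by genes in either signature (Jaccard index)."""
    a, b = set(sig_a), set(sig_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Arithmetic behind the reported figure: 2 shared genes out of 167.
print(f"{2 / 167:.1%}")  # 1.2%

# Toy signatures with hypothetical gene names.
print(jaccard_overlap({"GENE_A", "GENE_B", "GENE_C"}, {"GENE_B", "GENE_C", "GENE_D"}))  # 0.5
```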
Effect size estimation
Cohen's d effect sizes with confidence intervals are computed for all significant findings. This separates statistically significant but biologically trivial differences from markers with meaningful effect sizes suitable for clinical translation.
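A minimal sketch of Cohen's d with a confidence interval, assuming a pooled standard deviation and a common large-sample normal approximation for the standard error; Cortex's exact interval method is not stated in the source. The sign convention is group 1 minus group 2, so with AA as group 1 a positive d means AA > EA. The input values are hypothetical.

```python
import math
import numpy as np

def cohens_d_ci(group1, group2, z=1.96):
    """Cohen's d (pooled SD) with an approximate 95% CI.

    Uses the large-sample standard error
    SE = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))).
    """
    x1 = np.asarray(group1, dtype=float)
    x2 = np.asarray(group2, dtype=float)
    n1, n2 = x1.size, x2.size
    # Pooled variance from the two unbiased (ddof=1) sample variances.
    pooled_var = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    d = (x1.mean() - x2.mean()) / math.sqrt(pooled_var)
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

# Hypothetical expression values for two small groups.
d, (lo, hi) = cohens_d_ci([2, 4, 6], [1, 3, 5])
print(f"d={d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only three samples per group the interval spans zero even though d is 0.5, which is exactly why reporting the interval alongside the point estimate matters.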
Target personas
Biomarker scientist
Discover and prioritize biomarker candidates from high-dimensional omics data with full statistical rigor.
Diagnostics lead
Evaluate biomarker candidates by effect size, reproducibility, and population specificity for diagnostic panel development.
Clinical trial designer
Use ancestry-specific molecular differences to inform enrichment strategies and patient selection criteria.
Ready to see your own data analyzed?
Tell us what you're working on. We can show you what the output would look like.