Case Study · April 2026 · ~12 min read

From a NASH Liver Atlas to a Chemistry-Aware Repurposing Hypothesis

A two-stage computational case study

Why this case study is interesting

Most drug-repurposing posts stop at "compound X reverses the disease signature, therefore X is a candidate." That framing is brittle: it ignores chemistry, target selectivity, and the gap between transcriptome reversal and clinical benefit. The pipeline shown here goes two steps further.

Run 1 builds a NASH disease model from public RNA-seq + a toxicogenomics compendium and ranks 128 compounds by signature reversal. Run 2 then takes the #1 hit, theophylline, and does the chemistry homework: scaffold decomposition, isoform selectivity, off-target liability, and chemical-space mapping against 567 PDE4 inhibitors from ChEMBL.

The result is not a single drug nomination but a structured set of testable hypotheses with explicit failure modes.

The data

GSE126848: bulk liver RNA-seq, 57 biopsies across healthy (n=14), obese (n=12), NAFLD (n=15), NASH (n=16). Polyadenylated RNA, Illumina NextSeq 500. Strong gender skew (47M / 10F).
OpenRiskNet toxicogenomics compendium: 822 Affymetrix microarray profiles of HepG2 / HepaRG cells treated with 282 compounds (238 with ChEMBL IDs) at multiple doses and time points; each profile is a log-ratio versus matched DMSO control, labelled GTX (n=289) / NGTX (n=533).
ChEMBL PDE4B/D bioactivity: 567 unique compounds with IC50 pChEMBL ≥ 5.0 (median pChEMBL 8.03, ≈9 nM).

The two transcriptomics datasets share an Ensembl gene space, which is what makes a CMap-style reversal scoring possible.

Run 1: Building the NASH disease model

A 1,306-gene NASH signature dominated by fibrogenesis and inflammation

DESeq2 with a gender covariate identified 1,306 DE genes in NASH vs healthy liver (792 up, 514 down). Top-ranked genes (TP53I3, CYCS, DYNLT1, plus the canonical fibrosis triad SERPINE1 / MMP9 / THBS2) recapitulate established NASH biology validated in independent biopsy cohorts (Kozumi et al. 2021, Hepatology; Zhao et al. 2024).

Volcano plot of NASH vs healthy liver differential expression — Volcano plot showing 1,306 differentially expressed genes in NASH vs healthy liver. Red: upregulated (792), blue: downregulated (514).

Where does NASH actually start? A 54-gene NAFLD→NASH transition

A gender-corrected NASH-vs-NAFLD contrast gave a much sharper, 54-gene signature with a striking 50:4 up/down asymmetry: COL1A1, COL16A1, LAMC3, LOXL4, ITGBL1, MRC2, THBS2, SERPINE1, CCDC80. This is the "fibrogenic switch": the transition from steatosis to steatohepatitis is overwhelmingly an activation program, not a loss-of-function.

LOXL4 in particular has been recently re-identified (over LOXL2) as the critical collagen-crosslinking enzyme in fibrosis (Ma et al. 2023, Sci Adv).

Volcano plot of NASH vs NAFLD differential expression — The NAFLD→NASH transition signature: 54 genes with a striking 50:4 up/down asymmetry reflecting the fibrogenic switch.

The dominant inflection point is obese → NAFLD, not NAFLD → NASH

LRT trajectory analysis on 6,782 progression genes clustered into four trends; the two largest clusters (4,729 genes combined) showed sharp expression changes at the obese-to-NAFLD boundary, while healthy and obese livers were transcriptomically near-identical.

P4HA1 (collagen prolyl hydroxylase) declined from obesity onward, a paradoxical early suppression of normal ECM homeostasis preceding the later fibrogenic switch. No prior literature reports this trajectory, making it a genuinely novel hypothesis worth experimental follow-up.

LRT trajectory cluster profiles across disease stages — Trajectory analysis across disease stages. The dominant transcriptomic shift occurs at the obese→NAFLD boundary, not the NAFLD→NASH transition.

Co-expression modules expose two unexpected biology layers

WGCNA produced 8 modules; 7 correlated significantly with NASH status. Two stand out:

M7 (r = −0.484) is a sinusoidal endothelial / Kupffer-cell identity module: KDR, FLT1, FLT4, CSF1R, CDH5, LYVE1, VCAM1. Its decline mirrors the documented replacement of embryo-derived TIMD4⁺/LYVE1⁺ Kupffer cells by pro-inflammatory monocyte-derived macrophages.
M1 (r = +0.318) is an RNA-processing / splicing module led by ZC3H13 (a core m6A methyltransferase complex component), RBM25, NEMF, SRSF11, PHIP. Splicing dysregulation in NASH is established for SRSF1/SRSF6, but the specific M1 composition and ZC3H13's epitranscriptomic contribution are novel.

WGCNA module-trait correlation heatmap — Module–trait correlation heatmap. 7 of 8 WGCNA modules correlate significantly with NASH status.

Hub gene network for NASH-associated modules — Hub gene network for key NASH-associated modules, highlighting M7 (Kupffer cell identity) and M1 (RNA processing/splicing).

A novel single-gene hypothesis: TIMD4 collapse

TIMD4 (TIM-4, the Kupffer-cell phosphatidylserine receptor for efferocytosis) is among the most strongly downregulated genes in the dataset (log2FC = −4.42). A landmark 2025 Sci Transl Med paper (Shi et al., PMID 40929246), published after this hypothesis would have been generated, provides direct causal validation: TIM4 loss impairs apoptotic hepatocyte clearance and drives HSC activation through loss of macrophage IL-10.

This connects M7 (KC identity loss) to the SERPINE1/fibrosis axis as a single mechanistic chain.

Cross-contrast GSEA Hallmark pathway heatmap — Cross-contrast GSEA Hallmark heatmap showing pathway enrichment patterns across disease stage comparisons.

CMap-style reversal scoring on the OpenRiskNet compendium

Querying 128 unique compounds against the NASH signature identified:

Rank	Compound	Reversal score	Notes
1	Theophylline	−0.313	Non‑selective PDE inhibitor + adenosine antagonist
3	Doxorubicin	n/a	Cytotoxic confounder
4	Aflatoxin B1	n/a	Cytotoxic confounder
10	Simvastatin	n/a	92nd percentile: positive control passes
16	Metformin	n/a	88th percentile: positive control passes
38	Dexamethasone	n/a	Negative control correctly outside top quartile
40	Valproic acid	n/a	Negative control correctly outside top quartile

The negative/positive control behaviour validates the methodology. Metformin's #1 composite score (0.703) directly contradicts its failure in multiple NASH histology RCTs, a brutally honest finding the pipeline surfaces rather than hides.

Top 30 reversal compounds ranked by KS score — Top 30 reversal compounds by KS connectivity score and Pearson anti-correlation. GTX (genotoxic) compounds flagged in orange/red.

Composite scoring bubble chart — Composite scoring combining reversal strength and correlation metrics across the compound library.

Run 2: Doing the chemistry homework on the top hit

Theophylline is biologically plausible (PDE4 + A-receptor) but pharmacologically messy. Run 2 asks: can we find a clean PDE4 inhibitor that captures theophylline's transcriptional reversal without its off-target liabilities?

A SAR landscape of 567 PDE4 inhibitors

Murcko scaffold decomposition across the ChEMBL set gave 272 unique scaffolds. Indole-triazole-piperazine/piperidine variants dominate PDE4B activity (~130 compounds, mean pChEMBL 7.4–9.2, best 10.0 ≈ 0.1 nM). Strikingly, every top-15 scaffold is exclusively PDE4B or PDE4D, with no scaffold-level dual activity. 43 activity-cliff pairs (SALI > 10) were detected; the steepest cliff in PDE4B reached SALI 130.5.

PubMed searches for "indole-triazole-piperazine PDE4" return zero hits: this is likely a proprietary patent series surfacing in ChEMBL. PDE4B-selective inhibition in fibrotic lung disease is independently validated by nerandomilast, a thienopyrimidine PDE4B inhibitor with Phase 3 success in IPF (Richeldi et al. 2025, NEJM).

Representative indole-triazole-piperazine Murcko scaffold dominating PDE4B activity.

Scaffold-activity heatmap for PDE4B and PDE4D — Scaffold–activity heatmap showing mean pChEMBL per target for the top 15 scaffolds. Every scaffold is exclusively PDE4B- or PDE4D-selective.

Off-target screening: a clean adenosine receptor profile

Zero of 567 PDE4-active compounds have recorded pChEMBL ≥ 5 activity against any adenosine receptor (ADORA1/2A/2B/3) in ChEMBL. This is a strong (if absence-of-evidence-caveated) signal that selective PDE4 chemistry can in principle separate cleanly from theophylline's A-receptor pharmacology.

Theophylline lives in a different chemical universe

Tanimoto comparison of theophylline against the entire PDE4-selective set yielded maximum similarity 0.154 (i.e., there is no near-neighbour PDE4 inhibitor structurally resembling theophylline). The reversal signal is mechanism-driven, not chemistry-driven.

Tanimoto similarity distribution of theophylline vs PDE4 inhibitors — Tanimoto similarity distribution showing theophylline sits in a completely different chemical space from selective PDE4 inhibitors (max similarity 0.154).

The critical isoform selectivity gap

Here the pipeline does something genuinely useful: it flags its own internal contradiction.

The strongest mechanistic NASH evidence implicates PDE4D (hepatocyte steatosis via CD36-TGFβ1).
The recommended top scaffolds from this SAR landscape are PDE4B-selective (inflammation / Kupffer-cell TNF).
Theophylline's anti-fibrotic activity may partially derive from A2A blockade on hepatic stellate cells, an effect a clean PDE4 inhibitor would lose.

So a pure PDE4-selective successor to theophylline could underperform on fibrosis endpoints despite better selectivity. This is the kind of structured hypothesis a downstream wet-lab program can actually act on.

What hypotheses come out of this?

TIMD4 / efferocytosis loss is upstream of fibrogenic activation in NASH. Restoring Kupffer cell efferocytosis (e.g. via HIF-2α modulation) should reduce SERPINE1/COL1A1-driven fibrosis. (Independently confirmed by PMID 40929246.)
The obese→NAFLD transition is the dominant transcriptomic inflection point, not NAFLD→NASH. Early intervention windows may sit before histological steatosis.
P4HA1 early suppression is a novel pre-steatosis biomarker hypothesis with no prior literature support.
ZC3H13-centred m6A dysregulation is a candidate upstream mechanism for downstream NASH metabolic and splicing phenotypes.
A PDE4B-selective indole-triazole-piperidine compound could replicate theophylline's anti-inflammatory reversal but would need PDE4D activity (or A2A-independent anti-fibrotic add-on) to match theophylline's fibrosis effect.
Metformin's high composite reversal score is a cautionary finding, not a candidate; it concretely demonstrates the gap between transcriptome reversal and human histology.

Why the analytical depth matters

What distinguishes this pipeline from a one-shot CMap query:

Independent positive/negative controls were prespecified (simvastatin, metformin / dexamethasone, valproic acid) and behaved correctly.
The pipeline contradicts itself out loud: metformin's #1 ranking is reported alongside the RCT evidence that refutes it (PMID 22278337, 34400208).
Findings are explicitly graded as confirmed / partially confirmed / novel / contradicted, with PMID-level literature concordance.
The chemistry follow-up exposes a target-isoform mismatch (PDE4D vs PDE4B) that a transcriptomics-only analysis would have missed entirely.
An independent CMap study of the same dataset (PMID 40936782) returned different candidates (WYE-354, Triciribine), and this pipeline reports that pipeline-dependence rather than glossing over it.

The output of computational repurposing should look like this: a ranked list, a mechanism, a chemistry plan, and a written list of how each step could be wrong.

Interested in this kind of analysis?

See how Inflexa runs multi-stage pipelines from raw data to structured hypotheses.

Install Inflexa