Perspective · June 9, 2026 · ~8 min read

The Best Hypothesis About Lung Cancer Might Come From Astronomy

How analogical reasoning in Inflexa surfaced five Tier-1-worthy hypotheses about lung squamous cell carcinoma — none of them from an oncology paper.

The idea

One of Inflexa's core capabilities is analogical reasoning: when a user brings an open-ended scientific problem, the system is encouraged to deliberately reach into other domains (physics, linguistics, ecology, engineering) and translate the problem structures it finds there back into the user's field. This article is a worked example. Asked for novel, publication-worthy insight about lung squamous cell carcinoma (LUSC), the system stopped being an oncologist and became an audio engineer, a linguist, an astronomer, an ecologist, and an adversary, and came back with five falsifiable hypotheses.

The problem with reading more papers

There is an unspoken rule of academic research: the more you read in your field, the harder it becomes to have a genuinely new idea in it.

Every domain accumulates a default ontology — a set of variables it measures, a vocabulary it uses, a list of mechanisms it considers plausible. After enough exposure, those defaults stop feeling like choices. They feel like reality. And when reality is fixed, the search space for novel hypotheses collapses.

The cure is not more papers. It is other fields.

We built Inflexa, in part, around this conviction: when a field's defaults have narrowed the search, the system looks for the same problem structure in other domains and maps what it finds onto the user's problem. What follows is what that looks like in practice.

The setup

A user came to Inflexa with a curated bulk RNA-seq compendium of human lung tissue: 4,316 samples, drawn from 76 independent GEO studies, spanning all major lung cancer subtypes plus normal-adjacent and healthy controls. Within that, the LUSC-relevant cohort:

  • 400 lung squamous cell carcinoma (LUSC) tumors
  • 1,009 lung adenocarcinoma (LUAD), for contrast
  • 926 unclassified NSCLC — the genuinely ambiguous middle
  • 474 normal-adjacent samples
  • Expression matrix: 22,881 protein-coding genes, raw counts, heavy multi-study batch structure

The question was deliberately open-ended: surface novel, publication-worthy insights about LUSC.

LUSC is, by most accounts, a hard target. It is the less-glamorous sibling of LUAD: fewer actionable mutations, fewer approved targeted therapies, worse outcomes, and a decades-deep literature that has settled into four canonical molecular subtypes nobody has meaningfully improved on. The standard analytical playbook — differential expression, subtype assignment, survival association — has been run to exhaustion.

So we asked Inflexa to skip the playbook and reason by analogy instead.

Five framings, five hypotheses

What follows are the five cross-domain framings the system produced. Each one starts in another field, identifies a deep structural correspondence with the LUSC problem, and exports a falsifiable hypothesis.

1. Audio signal processing → bulk deconvolution

The cocktail party

A bulk tumor sample is a mixture recording: the integrated voice of every cell type in the biopsy, captured through a single channel. Each GEO study is a different microphone in a different room — imposing its own acoustic distortion (batch effect) on top of the underlying signal.

Standard pipelines try to remove the room. But in blind source separation, the channel response is sometimes part of the signal. Applied to LUSC, that reframing changes what you go looking for.

The hypothesis

A hierarchical multi-resolution decomposition — compartment, then niche, then program — recovers a previously invisible squamous–myeloid interface program: a co-program of tumor keratinocytes and tumor-associated macrophages, absent in LUAD and orthogonal to the four canonical LUSC subtypes.
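
To make the idea of a multi-resolution decomposition concrete, here is a minimal sketch of a two-level non-negative matrix factorization. It is an illustration under stated assumptions, not Inflexa's actual pipeline: the function names `nmf` and `two_level_nmf`, the factor counts, and the toy matrix are all hypothetical, and "second level" here simply means factorizing the non-negative part of what the coarse factors leave unexplained.

```python
import numpy as np

def nmf(X, k, n_iter=300, seed=0, eps=1e-9):
    """Factor non-negative X (genes x samples) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps   # gene loadings per program
    H = rng.random((k, X.shape[1])) + eps   # program activity per sample
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def two_level_nmf(X, k_coarse=2, k_fine=3):
    """Coarse factors first (assumed to play the role of compartments),
    then a second factorization of the leftover signal."""
    W1, H1 = nmf(X, k_coarse, seed=0)
    # Keep only the non-negative part of the residual so NMF still applies.
    residual = np.clip(X - W1 @ H1, 0.0, None)
    W2, H2 = nmf(residual, k_fine, seed=1)
    return (W1, H1), (W2, H2)

# Toy stand-in for a normalized expression matrix: 50 genes x 30 samples.
rng = np.random.default_rng(42)
X = rng.random((50, 4)) @ rng.random((4, 30))

(W1, H1), (W2, H2) = two_level_nmf(X)
print(W1.shape, H1.shape, W2.shape, H2.shape)
```

On real data, the columns of `W2` would be the candidates to inspect for a squamous-myeloid co-program. Batch handling, normalization, and the choice of factor counts are deliberately left out of this sketch.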

2. Computational linguistics → tumor transcriptomes

The corpus

Treat each tumor as a document. Genes are words. Transcription factors are authors. Histological labels (LUSC, LUAD, NSCLC-unspecified) are author personas — priors that condition which topics show up.

Author-topic models from computational linguistics separate what the author wrote from what the publisher distorted. The same machinery separates real biology from labeling artifact in the ambiguous middle of the cohort.

The hypothesis

Applied to the 926 unclassified NSCLC samples, a third stratum appears: an intermediate-plasticity topic with simultaneous high loadings on squamous markers (KRT5, TP63) and adeno markers (NKX2-1, SFTPC). Not noise. Not mislabeling. A real intermediate state in adeno↔squamous transdifferentiation — and a patient stratum that has never been formally named.

3. Astronomy → tumor differentiation history

The galaxy spectrum

Astronomers reconstruct a galaxy's star-formation history from a single integrated spectrum, by fitting it as a non-negative mixture of "simple stellar populations" at different ages and metallicities. Swap stars for cells, age for differentiation state, and the same machinery applies.

The hypothesis

Each LUSC tumor has a fate history, and across tumors the distribution of those histories is bimodal. One mode is a burst history: synchronous arrest at a single basal-like stage, closed chromatin, immune-cold, ICI-refractory. The other is an extended history: continuous differentiation flux, immune-engaged, responsive. The bulk transcriptome remembers how the tumor got here.
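
The galaxy-spectrum fit translates fairly directly into non-negative least squares. The sketch below assumes a hypothetical matrix of reference expression templates, one column per differentiation stage, and recovers the mixture weights for a bulk profile with `scipy.optimize.nnls`. The helper names `fate_history` and `effective_stages`, the four stages, and the random templates are illustrative assumptions. The entropy-based "effective number of stages" is one possible way to tell a burst history from an extended one; the source does not specify a statistic.

```python
import numpy as np
from scipy.optimize import nnls

def fate_history(bulk_profile, stage_templates):
    """Fit a bulk profile as a non-negative mixture of stage templates
    (genes x stages) and return weights normalized to sum to 1."""
    weights, _residual_norm = nnls(stage_templates, bulk_profile)
    total = weights.sum()
    return weights / total if total > 0 else weights

def effective_stages(weights, eps=1e-12):
    """exp(Shannon entropy) of the weights: close to 1 for a single-stage
    'burst', close to the number of stages for an evenly spread history."""
    p = weights[weights > eps]
    return float(np.exp(-(p * np.log(p)).sum()))

# Toy data: 200 genes x 4 hypothetical differentiation stages.
rng = np.random.default_rng(0)
templates = rng.random((200, 4))
burst = templates @ np.array([0.0, 1.0, 0.0, 0.0])
extended = templates @ np.array([0.25, 0.25, 0.25, 0.25])

w_burst = fate_history(burst, templates)
w_extended = fate_history(extended, templates)
for name, w in [("burst", w_burst), ("extended", w_extended)]:
    print(name, np.round(w, 3), round(effective_stages(w), 2))
```

In practice the templates would have to come from an external reference such as sorted populations or single-cell atlases, and testing for bimodality would mean looking at the distribution of `effective_stages` across all tumors.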

4. Statistical physics → therapy-induced transformation

The tipping point

Ecosystems near collapse exhibit critical slowing down: rising lag-1 autocorrelation, rising variance, and longer-range correlations, all measurable before the regime shift actually happens. These early-warning signals have been reported in systems ranging from lake ecosystems to climate records to financial markets.

The hypothesis

Order LUSC tumors along a squamous-purity pseudo-trajectory, slide a window, and compute the early-warning statistics on the EMT/p63 module. A narrow band of tumors sits at a transcriptional tipping point — flagged from baseline RNA-seq as high-risk for therapy-induced transformation to small-cell or sarcomatoid disease. A baseline biomarker for one of the most feared modes of treatment failure, imported wholesale from lake ecology.
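
The sliding-window statistics are simple to state in code. The following is a minimal sketch, assuming the tumors have already been ordered along the pseudo-trajectory and reduced to one module score per tumor; the names `lag1_autocorr` and `early_warning`, the window size, and the simulated scores are hypothetical. Here the position along the ordering stands in for time, and detrending within each window, which is common in the early-warning literature, is omitted for brevity.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D sequence (0.0 if it is constant)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = (x * x).sum()
    return float((x[:-1] * x[1:]).sum() / denom) if denom > 0 else 0.0

def early_warning(module_score, window):
    """Sliding-window variance and lag-1 autocorrelation over samples
    that are already sorted along the pseudo-trajectory."""
    scores = np.asarray(module_score, dtype=float)
    n_windows = len(scores) - window + 1
    var = np.array([scores[i:i + window].var(ddof=1) for i in range(n_windows)])
    ac1 = np.array([lag1_autocorr(scores[i:i + window]) for i in range(n_windows)])
    return var, ac1

# Toy module score: an AR(1) process whose memory increases along the
# ordering, mimicking a system drifting toward a tipping point.
rng = np.random.default_rng(1)
n = 600
phi = np.linspace(0.1, 0.95, n)
score = np.zeros(n)
for t in range(1, n):
    score[t] = phi[t] * score[t - 1] + rng.normal()

var, ac1 = early_warning(score, window=100)
print("autocorrelation, early vs late:", ac1[:50].mean(), ac1[-50:].mean())
print("variance, early vs late:", var[:50].mean(), var[-50:].mean())
```

Windows in which both statistics rise together would be the candidate tipping-point band; any threshold for flagging a tumor would need to be calibrated against outcome data.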

5. Cybersecurity → immune evasion

The adversarial attack

A tumor evading T-cell recognition and an attacker evading an intrusion-detection system are solving the same problem: find a minimal perturbation along a low-detectability direction of the defender's feature vector. The adversarial ML literature has shown that attackers favor a small number of structured strategies, not a smooth continuum.

The hypothesis

LUSC immune evasion is not one mechanism, but 3–4 archetypes — HLA-LOH–like collapse of presentation, B2M-quiet presentation failure, STAT1-high but chemokine-off rewiring, and co-inhibitory ligand saturation. Each predicts a different mode of ICI failure. Each maps to a different rescue regimen.
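
Whether evasion strategies form a few discrete groups rather than a continuum is something a clustering step can probe. The sketch below uses plain k-means as a simple stand-in for a dedicated archetype analysis, which is a substitution on our part and not the method named in the source. The four feature columns are hypothetical per-tumor scores (for example antigen presentation, B2M, interferon signaling, co-inhibitory ligands), and the data are simulated.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Simulated tumors: 3 well-separated groups in a 4-feature evasion space.
rng = np.random.default_rng(3)
true_centers = np.array([[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0]], dtype=float)
X = np.vstack([c + 0.3 * rng.normal(size=(40, 4)) for c in true_centers])

labels, centers = kmeans(X, k=3)
print(np.bincount(labels))
```

On real data the interesting question is whether a small k (the hypothesis says 3 to 4) fits markedly better than a smooth gradient would predict, which calls for a model-selection step that this sketch does not include.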

What the framings share

These analogies look unrelated. They aren't. Every one of them sees the same underlying structure:

  • A mixture of latent signals
  • Captured by heterogeneous channels
  • With a structured perturbation layered on top

Cocktail parties. Corpora. Galaxy spectra. Ecosystems near tipping points. Adversarial attacks on classifiers. Bulk tumor RNA-seq is all of them at once — and most other interesting biological problems are too. The reason analogies are productive here is not that biology is uniquely metaphor-friendly. It's that the problem shape repeats across fields, and each field has refined its own toolkit for it.

What the user gets back isn't five gestures at novelty. It's a single coherent analysis pipeline — where every step is independently grounded in another discipline's most rigorous method:

hierarchical NMF → deconvolution → pseudo-trajectory → early-warning statistics → archetype clustering

The broader point

Most "AI for science" tools are accelerators of the default playbook. They read more papers, run more comparisons, surface more associations within the field's existing ontology. That is useful. It is also where the field's blind spots live.

The hypotheses above did not come from reading more LUSC literature. They became visible only after the system was asked to stop being an oncologist and start being an audio engineer, a linguist, an astronomer, an ecologist, an adversary.

That is the capability we are building toward: not faster searches within a domain, but a structured way of importing the rest of science into it.

If you have a problem that the default playbook in your field has run to exhaustion, that is the moment Inflexa is built for.

Inflexa is in private beta. The five framings above are hypothesis-generating scaffolds produced by analogical reasoning on a public LUSC RNA-seq compendium — methodology, not clinical recommendations.

Have a dataset the playbook has exhausted?

Put your hardest problem in front of Inflexa and see what the rest of science has to say about it.