This matters because the lab-first model was never only about where experiments happened. It also determined the sequence in which a biotech company was expected to become credible: raise capital, build infrastructure, generate data, then prove the science.
What is changing now is that this sequence can be partially reversed. Early credibility can increasingly be built through data access, computational validation, model performance, and targeted external experimentation before a company has committed itself to the full fixed cost of a traditional wet-lab operation.
The convergence of high-throughput multi-omics data, cloud computing, and deep learning architectures has broken this paradigm – not gradually, but structurally. The entry barriers have collapsed in specific, measurable ways, and the operational consequence is a new class of biotech venture whose architecture looks nothing like the institutional model it is displacing.
By Timotej Szalay, Founder & CEO, ABYA Genomics
The Real Bottleneck Was Never Physical
The prevailing narrative frames the computational turn in drug development as a cost-reduction story. That framing is incomplete. The deeper shift is that the primary constraint in genomics and rare disease research is no longer physical – it is analytical and predictive.
Consider the data. The cost of whole-genome sequencing dropped from approximately $95 million per genome in 2001 to below $200 by 2024 – a decline that outpaced Moore’s Law by a factor of roughly 2,000 between 2008 and 2012, according to the National Human Genome Research Institute.1 The consequence is not simply that sequencing is cheaper. It is that we now generate biological data at a rate that has completely outstripped our capacity to interpret it. There are over 7,000 recognized rare diseases globally, collectively affecting an estimated 350 million people – and roughly 95 percent of those conditions lack an approved therapy. The bottleneck is not a scarcity of biological samples. It is the absence of analytical infrastructure capable of moving from raw sequence data to actionable therapeutic hypotheses.
This is the analytical and predictive crisis: an abundance of data, a scarcity of the interpretive architecture to extract signal from it. If the constraint is analytical rather than physical, then computational platforms – not wet labs – become the primary site of value creation in early-stage drug discovery. That is the founding logic of ABYA Genomics, and building from that premise has clarified, for me, just how much of traditional biotech’s cost structure was downstream of the wrong assumption about where the bottleneck actually sat.
The Economics of a 90 Percent Failure Rate
No honest account of why traditional biotech demanded institutional infrastructure can avoid confronting its economics of failure. Nine out of ten drug candidates that enter Phase I clinical trials ultimately fail.2 The cost of bringing a single approved drug to market, including the capitalized cost of failures, has been estimated at $2.3 to $2.6 billion, with Phase III trials alone averaging $350 million per program.3
The failure modes are instructive. Analyses of clinical trial data from 2010 to 2017 attribute failures as follows: lack of clinical efficacy (40–50 percent), unmanageable toxicity (30 percent), poor drug-like properties (10–15 percent), and lack of commercial rationale (roughly 10 percent). The first two categories – efficacy and toxicity – account for 70 to 80 percent of all failures, and both are, in principle, predictable from molecular and structural biology before anyone enters the clinic.
This is not a minor observation. It means that the overwhelming majority of the estimated $2.3 to $2.6 billion development cost reflects failures that could potentially have been filtered out earlier with sufficiently reliable predictive tools. The traditional model did not fail because it was scientifically unsound. It failed because it could not access high-fidelity predictive information at a cost that allowed early filtering. In silico methods, at the frontier, change that calculus.
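To make the filtering argument concrete, here is a minimal back-of-the-envelope sketch in Python. It takes the 10 percent Phase I success rate and the 70 to 80 percent efficacy-plus-toxicity share quoted above. It then assumes a hypothetical preclinical filter that catches some fraction of those predictable failures without discarding any true successes. The function name, the `filter_recall` parameter, and the 50 percent recall value are illustrative assumptions, not measured figures.

```python
def phase1_success_rate(base_success: float,
                        predictable_share: float,
                        filter_recall: float) -> float:
    """Success rate among candidates that still enter Phase I after filtering.

    base_success      -- unfiltered success rate (0.10 per the text)
    predictable_share -- share of failures due to efficacy or toxicity
    filter_recall     -- ASSUMED fraction of those failures caught preclinically
    Simplification: the filter never rejects a candidate that would have succeeded.
    """
    failures = 1.0 - base_success
    remaining_failures = failures * (1.0 - predictable_share * filter_recall)
    return base_success / (base_success + remaining_failures)


# Figures from the text: 10% success, 70-80% of failures predictable (midpoint 0.75).
# Hypothetical: the filter catches half of the predictable failures.
unfiltered = phase1_success_rate(0.10, 0.75, 0.0)
filtered = phase1_success_rate(0.10, 0.75, 0.5)

print(f"Phase I entrants per approval, unfiltered: {1 / unfiltered:.2f}")
print(f"Phase I entrants per approval, filtered:   {1 / filtered:.2f}")
```

Under these assumptions, a filter that catches only half of the predictable failures cuts the number of Phase I entrants needed per approval by roughly a third. A real filter would also reject some viable candidates, which this sketch deliberately ignores.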

Structural Biology at Scale and the Integration Problem
The open release of AlphaFold2 in 2021 – and the subsequent expansion of the AlphaFold Protein Structure Database to over 200 million predicted protein structures – was the most visible inflection point in this transition. For the first time, a structural vocabulary for the proteome existed at scale. Structure-based drug design became tractable for targets and disease areas that had previously been computationally inaccessible, because the experimental structures simply did not exist.
I want to be precise here, because the AlphaFold narrative is often oversimplified. Significant limitations remain – particularly in modeling protein-ligand binding pockets with the conformational accuracy required for virtual screening.4 AlphaFold2 models often require structural refinement before they are useful for docking-based lead generation, and performance varies substantially across protein families. What the technology provided, at scale, was not a wholesale replacement for experimental structural biology. It was a structural vocabulary that made computational approaches viable for a much broader range of targets – including, critically, the novel disease-associated variants common in rare disease research.
But the deeper operational challenge is not any individual algorithm. It is integration. The bioinformatics software landscape is vast: tools exist for sequence alignment, variant calling, protein structure prediction, molecular docking, molecular dynamics simulation, pathway analysis, and ADMET prediction. The existence of these tools does not constitute a functional pipeline. Each requires distinct input formats, operates under different computational paradigms, and produces output that must be transformed before serving as input downstream.
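The integration problem can be sketched in a few lines of Python. This is a toy illustration, not ABYA's platform code: the `Stage` type, the `run_pipeline` function, the format labels, and the placeholder stage bodies are all hypothetical. Each stage declares the format it consumes and the format it produces, and the runner has to insert a conversion step wherever two adjacent stages disagree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str               # format label the stage expects
    produces: str               # format label the stage emits
    run: Callable[[str], str]   # placeholder for the real tool invocation


Converter = Callable[[str], str]


def run_pipeline(stages: List[Stage],
                 converters: Dict[Tuple[str, str], Converter],
                 payload: str,
                 payload_format: str) -> Tuple[str, List[str]]:
    """Run stages in order, converting between formats where they differ."""
    log: List[str] = []
    current_format = payload_format
    for stage in stages:
        if current_format != stage.consumes:
            key = (current_format, stage.consumes)
            if key not in converters:
                raise ValueError(
                    f"no converter from {current_format} to "
                    f"{stage.consumes} before stage {stage.name}"
                )
            payload = converters[key](payload)
            log.append(f"convert {current_format}->{stage.consumes}")
        payload = stage.run(payload)
        log.append(f"run {stage.name}")
        current_format = stage.produces
    return payload, log


# Hypothetical three-stage pipeline; the lambdas stand in for real tools.
stages = [
    Stage("align", "fastq", "bam", lambda s: s + "|aligned"),
    Stage("call_variants", "bam", "vcf", lambda s: s + "|called"),
    Stage("annotate", "tsv", "tsv", lambda s: s + "|annotated"),
]
converters = {("vcf", "tsv"): lambda s: s + "|as_tsv"}

result, log = run_pipeline(stages, converters, "reads", "fastq")
print(log)
```

The design point is that format mismatches are declared and resolved in one place. Without the registered converter, the run fails at the boundary between variant calling and annotation, which is the kind of gap a researcher would otherwise have to bridge by hand.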
From building ABYA’s platform, I can say directly: a researcher without an integrated environment spends a disproportionate fraction of their time on data wrangling, format conversion, and dependency management – none of which generates scientific insight. This overhead is a structural barrier that effectively excludes researchers with deep domain expertise but limited software engineering background. The no-code framing we use at ABYA is not about simplifying the science. It is about eliminating the software engineering tax that has historically been imposed on scientific judgment – target selection, hypothesis formation, biological interpretation. Those are the activities that actually require expertise; the pipeline plumbing should not.
Digital Twins, Rare Disease, and the Near-Term Horizon
The most consequential near-term application of computational pipelines in rare diseases is not the replacement of clinical trials – that is a longer arc, and regulatory acceptance of virtual trials as standalone evidence is not imminent. The FDA’s ISTAND program is actively working on in silico method qualification, and the ICH S7B guideline update in 2022 established regulatory precedent for computational cardiac safety models. These are meaningful steps, but the evidentiary standards for digital twin validation remain unsettled.
What is immediately operationally meaningful is using digital twin and virtual cell approaches to augment conventional development: better patient stratification through in silico prediction of responder populations; earlier identification of toxicity signals; more accurate dose-response prediction before first-in-human studies. For rare diseases specifically, this augmentation potential is disproportionately valuable. Patient populations are too small to power conventional Phase II and III trials effectively. Many therapeutic candidates never reach clinical development not because the underlying biology is unsound, but because the trial design cannot accommodate the epidemiology.
At ABYA, this is not an abstract architectural question. It is the practical daily challenge of building a platform that allows researchers – including our collaborators at the Slovak Academy of Sciences and early-stage partners in regenerative medicine – to move from a genomic hypothesis to a computationally validated therapeutic candidate without the capital overhead that would otherwise make that work inaccessible. The platform has to be reliable enough that when a prediction survives in silico validation, it meaningfully improves the probability that subsequent experimental work will confirm it.
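What "meaningfully improves the probability" means can be illustrated with Bayes' rule. The sketch below is a generic calculation, not a description of ABYA's validation method. The prior, sensitivity, and specificity values are hypothetical numbers chosen for illustration.

```python
def posterior_given_pass(prior: float,
                         sensitivity: float,
                         specificity: float) -> float:
    """Probability that a candidate is truly viable, given that it passed
    in silico validation (Bayes' rule).

    prior       -- ASSUMED share of candidates that would confirm experimentally
    sensitivity -- ASSUMED probability that a viable candidate passes
    specificity -- ASSUMED probability that a non-viable candidate is rejected
    """
    true_pass = prior * sensitivity
    false_pass = (1.0 - prior) * (1.0 - specificity)
    return true_pass / (true_pass + false_pass)


# Hypothetical: 5% of hypotheses are viable; the screen passes 80% of viable
# candidates and rejects 90% of non-viable ones.
informative = posterior_given_pass(prior=0.05, sensitivity=0.8, specificity=0.9)

# A screen no better than a coin flip leaves the prior unchanged.
uninformative = posterior_given_pass(prior=0.05, sensitivity=0.5, specificity=0.5)

print(f"Posterior after informative screen:   {informative:.3f}")
print(f"Posterior after uninformative screen: {uninformative:.3f}")
```

With these illustrative numbers, a pass raises the chance of experimental confirmation from 5 percent to about 30 percent. The second case shows why reliability is the requirement: a screen that cannot separate viable from non-viable candidates adds nothing, however many candidates it processes.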
Where the Moat Actually Sits
For founders building in this space – and for investors evaluating them – it is worth being precise about where defensible value actually resides. The algorithms that underpin in silico discovery are largely commoditized. AlphaFold2 is open. GROMACS is open. Most of the molecular dynamics tooling is openly available. A platform whose competitive position depends on proprietary access to any individual component is in a precarious position. The durable differentiation lies in the architecture: the coherence of the integration, the domain-specific validation datasets against which predictions are benchmarked, and the workflow design calibrated to specific disease contexts.
This has a concrete strategic implication. Academic and industry partnerships – even early, informal ones – provide access to the experimental validation data that allows computational predictions to improve and be meaningfully differentiated. A platform validated against observed outcomes in specific disease areas is substantially more defensible than one demonstrating accuracy only on published benchmarks. The data strategy is, in this model, as important as the algorithm strategy. The algorithms can be replicated; validated domain-specific datasets cannot.
The asset-light framing that characterizes the most computationally interesting biotech companies is not simply a capital efficiency story. It reflects a different theory of where scientific value is created in early-stage drug development. The claim is not that physical validation becomes unnecessary. It is that the order of operations has changed: the highest expected-value activity in early-stage biotech is now computational prediction, and wet lab resources should be deployed to validate the strongest outputs of that process rather than to explore chemical space empirically from the outset.
That reordering has operational consequences that are still working themselves out across the sector. A two-person team building on cloud infrastructure, with access to open structural biology data and a well-integrated analytical pipeline, can now advance a therapeutic hypothesis to a stage that would have required a fully staffed laboratory and multi-year funding a decade ago. That is not a claim about replacing experimental biology. It is a claim about when experimental biology becomes necessary – and what kind of startup architecture that permits. The capital moat that gatekept this sector was not protecting scientific rigor. It was protecting an outdated assumption about where the real work of early-stage discovery happens.
References
1. NHGRI DNA Sequencing Costs Data (genome.gov), 2001–2022.
2. PMC/Nature Reviews Drug Discovery. Clinical trial failure rate analysis, 2010–2017 cohort.
3. Deloitte / Tufts Center for the Study of Drug Development, 2022–2023.
4. Alhumaid & Tawfik, International Journal of Molecular Sciences, 2024.









