April 28, 2025
While AI has been topping the agenda at most industry events for several years now, the progression from futuristic vision to nearly commodity capability was particularly apparent at BioIT World 2025. With four conference tracks focused on different types of AI and specialist exhibitors such as Bullfrog AI, expertai, and Union AI, to name just a few, the variety and depth of use cases on display were fascinating. And if your first thought when you hear 'RAG' is still red, amber, green rather than retrieval augmented generation for large language models (LLMs), you're not alone!
Advances in AI now enable a full spectrum of support that many may not be aware of. In biopharma, AI support ranges from relatively straightforward but highly useful generative AI (GenAI), such as automatically creating an experiment summary in a standardized format, through to full molecule design/optimization and predictive models for in silico development. A few glimpses of what the future holds include efforts to speed up the clinical trial application process using an integrated FAIR data hub, the use of preclinical data to predict clinical performance, including pharmacokinetic (PK) properties, at a fraction of the time and cost of traditional approaches, and the development of early predictive developability assessment models with significant economic and environmental benefits.
Despite the benefits, it wouldn't be a tech conference without a few cautionary tales about AI. One of the more thought-provoking talks covered the limitations of artificial general intelligence (AGI) and the possibility of an "AI winter" on the horizon, where funding, interest, and data for AI development run low and progress stalls. AI hallucination is a very real concern, particularly when there are large gaps in the training data, and the common practice of saving and employing only 'good' data from experiments unintentionally biases AI-based outcomes. Building sufficient data sets for training models may require deliberately running 'bad' experiments to explore undesirable as well as desirable results.
With so much complexity in the science, engineering, and technology involved in drug discovery and development, it's no surprise that many speakers used analogies to help the audience understand what they're trying to do and why. Analogies can be a double-edged sword, however. One presenter used the convenience of a "one stop shop" at a large supermarket to argue for implementing a single data platform. Yet the same analogy could be used to argue that you tend to find better quality products at specialty stores, such as a bakery or butcher, and the same can be true for informatics tools.

We're starting to see this specialty-store mindset emerge as a trend. It is still common for large pharma organizations to try to manage complexity by standardizing on one, or at most two, electronic lab notebook (ELN) systems across the whole organization. An ELN can provide much more than a blank page to write on and a place to store attachments, with capabilities ranging from resource scheduling and instrument integration through to data aggregation and visualization. However, no single ELN, or even Laboratory Information Management System (LIMS), can meet the diverse requirements of all groups in pharma R&D equally well. Rather than forcing some groups to give up their preferred system for the greater good of a single company-wide platform, a few companies have invested in building a connecting layer between different tools. This approach allows each user group to choose the best fit-for-purpose tool for its needs and use cases, while still facilitating data access, collation, and knowledge sharing across the organization through a standard underlying data structure and vocabulary.
This mindset extends beyond publishing results data to a common data warehouse or data lake; instead, it establishes a data-centric architecture and "plug and play" infrastructure that lets different groups work together effectively without sacrificing functionality.
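To make the idea of a connecting layer concrete, here is a minimal sketch of how records exported from two different ELN systems might be mapped onto one shared schema so they can be queried together. The ELN names, field names, and schema are illustrative assumptions, not real products or APIs.

```python
# Sketch: a "connecting layer" that harmonizes exports from two hypothetical
# ELN systems ("ELN-A" and "ELN-B") into one common record structure.
from dataclasses import dataclass


@dataclass
class HarmonizedRecord:
    sample_id: str
    assay: str
    result_value: float
    source_system: str


def from_eln_a(rec: dict) -> HarmonizedRecord:
    # ELN "A" uses flat, camelCase field names (assumed for illustration)
    return HarmonizedRecord(
        sample_id=rec["sampleBarcode"],
        assay=rec["testName"].lower(),
        result_value=float(rec["value"]),
        source_system="ELN-A",
    )


def from_eln_b(rec: dict) -> HarmonizedRecord:
    # ELN "B" nests its assay results differently (again, assumed)
    return HarmonizedRecord(
        sample_id=rec["id"],
        assay=rec["assay"]["name"].lower(),
        result_value=float(rec["assay"]["result"]),
        source_system="ELN-B",
    )


records = [
    from_eln_a({"sampleBarcode": "S-001", "testName": "Purity", "value": "98.2"}),
    from_eln_b({"id": "S-002", "assay": {"name": "purity", "result": 97.5}}),
]

# Once both sources share one schema and vocabulary, cross-system
# queries become trivial:
purity_results = [r for r in records if r.assay == "purity"]
```

Each user group keeps its preferred tool; only the thin adapter functions need to know each system's quirks, and everything downstream works against the common schema.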
Regardless of how an organization captures, stores, and collates data, a common question on attendees' minds was how to make sure data is "AI ready". Several different but potentially complementary perspectives were described in talks and at vendor booths. Ontologies and controlled vocabularies are generally considered a fundamental requirement, but approaches to managing them range from extracting text from ELN entries and building out local ontologies over time, to requests for open partnership and sharing of pre-competitive information. BioRels, an open-source data preparation infrastructure developed by the Lilly Medicines Unit, for example, was designed to process 30 of the main drug discovery resources, and the presenter, Dr. Jeremy Desaphy, made a compelling argument that no one wins if data isn't prepared correctly.
It's no secret that we recommend the open approach to data preparation and harmonization. An inward-facing approach, where external data models are enriched with company-specific terminology and aliases, may provide some short-term benefits, but an open, standardized approach to data models and ontologies scales better and streamlines information sharing with external partners and complex supply chains. It is also critical to addressing the widespread challenge of data silos. The two don't have to be mutually exclusive, however. There's something to be said for a hybrid approach that starts with controlled vocabularies for key terms such as study, patient, and sample, doesn't let perfect get in the way of progress, and continuously adapts the data models and data governance as more data is generated and the ability to query that data becomes more sophisticated. Many of the taxonomies and ontologies in the public domain were developed with deep domain expertise, and those that are actively curated will continue to grow and evolve with the science, accelerating the development of a core standards reference internally.
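A minimal sketch of that hybrid approach might look like the following: a small set of standard terms as the stable core, with a local alias layer that data governance can keep adapting over time. The term identifiers here are made-up placeholders, not drawn from any real ontology release.

```python
# Sketch: hybrid vocabulary management — a stable, standards-based core
# plus a locally maintained alias layer (all identifiers are illustrative).
from typing import Optional

# Core controlled vocabulary for key terms (placeholder IDs)
STANDARD_TERMS = {
    "study": "CORE:0001",
    "patient": "CORE:0002",
    "sample": "CORE:0003",
}

# Company-specific aliases, curated and extended by data governance
LOCAL_ALIASES = {
    "subject": "patient",
    "specimen": "sample",
    "trial": "study",
}


def normalize(term: str) -> Optional[str]:
    """Map a raw term to its standard identifier, via the alias layer if needed."""
    key = term.strip().lower()
    key = LOCAL_ALIASES.get(key, key)  # resolve local synonyms first
    return STANDARD_TERMS.get(key)     # then look up the standard ID


print(normalize("Specimen"))  # prints CORE:0003
print(normalize("unknown"))   # prints None — flag for governance review
```

Unmapped terms surface as `None` rather than failing silently, giving the governance process a natural queue of candidates for the next iteration of the vocabulary.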
Looking beyond data preparation and harmonization, it is important to consider the full lifecycle of drug development. The progression from discovery to commercialization is actually more circular than linear. One of the best ways to visualize the real value chain in biopharma development is to follow the lifecycle of a sample – where does it come from, why was it taken, and what decisions will be made based on the analytical test results? Our AI efforts should never lose sight of this, and the next wave of digitalization should align and empower data and business processes at an operational level to accelerate time to market. Leveraging the patterns that have emerged from previous manufacturing and clinical performance to shape a quality by design (QbD)/design of experiments (DOE) approach to study design and execution helps ensure that your digital roadmap provides a full end-to-end solution for all users and potential customers.