AUGMENTED INTELLIGENCE IN DRUG DISCOVERY XCHANGE
EAST COAST 2023
Boston
May 25, 2023
Welcome to hubXchange’s Augmented Intelligence (AI) in Drug Discovery Xchange East Coast 2023, bringing together executives from pharma and biotech to address and find solutions to the key issues faced in AI-led drug discovery.
Discussion topics will cover Data Quality, Target Identification, Lead Generation, Lead Optimization and Drug Response Prediction.
Take advantage of this unique, highly interactive meeting format designed for maximum engagement, collaboration and networking with your peers.
Please note this is an in-person meeting.
VENUE DETAILS: Hilton Boston Woburn Hotel, 2 Forbes Road, Woburn, MA 01801
SNAPSHOTS OF DISCUSSION TOPICS
- Overcoming current limitations in data generation and management
- Guidance on standard practice in clinical genomic data generation and data analysis pipelines
- Use of AI in integrating multi-dimensional datasets for target discovery
- Emerging modeling approaches to identify covalently druggable targets
- Bridging the gap between biological nails and AI hammers
- Molecular design cycle: computation-first approach
- Challenges and approaches to building deep learning models for antibody lead optimization
- Can machine learning methods provide insights and predictions for cancer drug response?
Full Xchange Agenda
Data Quality
Opening Address & Keynote Presentation
Building predictive models on rock and not quicksand: solid data foundations
Most, if not all, cutting-edge predictive models appearing in data science and life science R&D are built on data. An often-overlooked challenge is making sure the data foundations for these models are high quality, reliable, normalized, and re-usable. This talk will focus on the experiences and processes needed to make both Elsevier and customer internal data “machine-learning ready”. Surprisingly, this often involves changes not only in data capture and modelling, but also in people and processes.
Senior Director Professional Services and Consulting, Corporate R&D, Elsevier
In Elsevier’s Professional Services team, Frederik leads the global consultancy practice on data integration and analytics projects throughout the life science, chemistry and engineering domains using commercial, proprietary, and public data sources. He holds a doctorate in Chemical Physics from the University of Amsterdam / FOM Institute AMOLF and a master’s degree in Chemistry from Utrecht University.
Guidance on standard practice in clinical genomic data generation and data analysis pipelines
- Vendor qualification and fit-for-purpose assay selection
- Global regulatory guidance for CRO-based data generation
- Integration of new technologies in clinical research
Senior Scientist, Takeda
After finishing his postdoctoral training in the Genetics department at Harvard Medical School, Banerjee has navigated genomics research in both biotech and pharmaceutical settings and has gained significant knowledge of the various facets of the drug development process. He spent a few years as a Program co-Lead and Tech R&D Lead on several drug programs at Cellarity, Inc., where he led a team of scientists in protocol optimization and data generation efforts using fit-for-purpose technologies and toolkits.
Currently, he is the Genomics subject matter expert (SME) for preclinical and clinical drug development workflows at Takeda Pharmaceuticals as a member of the Biomarker Science and Technologies team in PTS. He advises current programs on genomics assay development to ensure fit-for-purpose assay selection and development for clinical biomarker assessment in patients across Neuroscience, GI, Oncology and Rare Disease programs.
Data quality challenges: how enabling data dogfooding helps to use and re-use data in AI drug discovery
- Capturing data with model building / re-use in mind
- Mapping, harmonizing and combining internal and external data sets for analytics and ML model training
- The role of ontologies in data harmonization
- Levels of trust in quality of internal and external/public data
Senior Director Professional Services and Consulting, Corporate R&D, Elsevier
In Elsevier’s Professional Services team, Frederik leads the global consultancy practice on data integration and analytics projects throughout the life science, chemistry and engineering domains using commercial, proprietary, and public data sources. He holds a doctorate in Chemical Physics from the University of Amsterdam / FOM Institute AMOLF and a master’s degree in Chemistry from Utrecht University.
Networking Lunch
1-2-1 Meetings / Networking Break
1-2-1 Meetings / Networking Break
Poster Session
Unlocking Biomedical Data for AI / ML using Large Language Models (LLMs)
- To effectively train predictive models in drug discovery, large volumes of clean and linked data are required, which can be a costly and time-consuming task to curate manually. As such, there is a growing need for automated curation processes that can accurately and efficiently label data at scale.
- Elucidata has developed a biocuration process that leverages domain-trained BERT-like models for a variety of information extraction tasks: identification of cell type, cell line, tissue, disease, and other characteristics from unstructured biomedical datasets. This approach has shown promising results in improving the quality and efficiency of data curation.
- In this session, we will explore the specifics of Elucidata’s NLP-based biocuration and how it can help R&D teams make their data interoperable, enable data and metadata integration, and generate model-quality datasets for AI/ML use cases.
- As a bonus, we will also demonstrate our experiments with OpenAI’s ChatGPT and its potential in solving edge cases in biocuration.
Solutions Architect, Elucidata
Mya (my-uh, not me-uh) Steadman is a Bioinformatics Scientist at Elucidata, working with data scientists at therapeutic and diagnostic companies to identify and resolve roadblocks in their research caused by unclean biomedical data. Before Elucidata, Mya was a metabolomics scientist studying cancer and autoimmune diseases at several biotech start-ups in Cambridge, Massachusetts, including Agios Pharmaceuticals. While at Agios, she had the chance to work with Elucidata’s founder Abhishek Jha, who taught her that the value of your results is determined by the quality of your data.
Afternoon Refreshments
Data quality, variability and integrity: how do we achieve it?
- What data are you using? What’s the most challenging problem for data quality in your job? How do you ensure high data quality?
- Key issues: messy public data/metadata, raw data are not available, data are not processed in a consistent way
Director of Computational Biology, Immunitas Therapeutics
Ming “Tommy” Tang is the Director of Computational Biology at Immunitas. Prior to joining Immunitas, Tommy was at Dana-Farber Cancer Institute and Harvard University, where he led a team analyzing immuno-oncology-related single-cell sequencing datasets and spearheaded an NIH-funded project called Cancer Immunological Data Commons. Tommy has a wealth of experience as a computational biologist, with over ten years spent analyzing large-scale (epi)genomic/transcriptomic data and automating the analysis using workflow languages such as Snakemake. Prior to joining Dana-Farber, Tommy received his Ph.D. in Genetics and Genomics from the University of Florida and completed a three-year postdoc at MD Anderson. He has a keen interest in teaching computational skills to wet-lab biologists and is a certified instructor for Data Carpentry.
Target Identification
Use of AI in integrating multi-dimensional datasets for target discovery
- Areas where AI is applied in drug target selection
- Role of AI in integration and analysis of multi-omics data
- Machine learning for gene prioritization using genetic evidence data
- Application of AI in literature analysis
- Challenges and limitations in using AI methods in target discovery
Associate Director, Computational Biology & Data Sciences, Lexicon Pharmaceuticals
Lakshmi Kuttippurathu is an accomplished interdisciplinary scientist with 15+ years of experience in Computational Biology. During her research career at Harvard-MIT Health Sciences and Technology, Lakshmi developed a computational tool to study the regulatory dynamics of transcription factors. As a postdoctoral fellow and later as a faculty member at Thomas Jefferson University, she contributed to the fields of liver regeneration and neuroscience, with a focus on understanding regulatory network dynamics driven by perturbations, using a systems biology approach. Currently, she is the Associate Director, Computational Biology and Data Sciences at Lexicon Pharmaceuticals, where she is spearheading the computational efforts to develop and implement a strategy for preclinical target discovery.
Disrupting Disease Landscapes: AI-Driven Insights from Large Biomedical Datasets
- AI’s impact on complex disease research: Discuss how artificial intelligence and advanced machine learning techniques transform human health, genetic, and rich phenotypic data analysis.
- Shifting to longitudinal studies: Explore the integration of AI and dynamic systems theory into very large models of human health to enable comprehensive causal inference and an understanding of time scales, reversibility, and the intricate relation between aging and complex diseases.
- Connecting molecular biology and disease understanding: Examine the role of real-world, longitudinal data in deepening our understanding of complex diseases and informing treatment strategies.
- Advancing drug discovery and development: Assess the prospects for novel therapeutic targets, emerging therapeutic modalities, and generative chemistry-driven pipelines that lead to groundbreaking drugs and improved healthcare.
Chief Executive Officer and Co-founder, Gero
Holds a Ph.D. from the University of Amsterdam. Co-founder of Gero, a data-driven longevity biotech company that develops new drugs against aging and other complex diseases using an AI platform. Author of 75+ published papers across multiple domains, including publications in Science and Nature Communications.
15:00 – 16:00
Emerging modeling approaches to identify covalently druggable targets
- Predicting cryptic pockets and protein conformational states in high throughput
- Evaluating residue reactivity accurately and efficiently
- Modeling strategies to mine new targets
Principal Scientist, Frontier Medicines
Dr. Yuan Hu is a Computational Chemist at Frontier Medicines’ Boston site. He previously worked at Alkermes and Merck, where molecules he designed advanced into clinical trials. He has been a chemistry team leader and has substantial modeling knowledge in new target identification, hit finding, lead generation, and lead optimization, with expertise in molecular dynamics, FEP/TI, generative modeling, reinforcement learning, quantum mechanics, and workflow automation for both small-molecule and biologics drug discovery. He received his Ph.D. in Computational Chemistry from the University of Delaware and also holds an M.S. degree in Organic Chemistry.
16:15 – 17:15
Exploring key issues in drug target identification
- Complexity of disease mechanisms
- Lack of understanding of disease biology
- Limited availability of suitable drug targets
- High failure rate
- Ethical and legal considerations
Data Scientist, Novartis
Andac Demir received a B.S. degree in Electrical Engineering from Tufts University, Medford, MA, USA in 2017, and a Ph.D. degree in Electrical Engineering from Northeastern University, Boston, MA. He worked as a Research Intern at Schlumberger-Doll Research Center in Cambridge, MA, and in 2020 he worked as a Research Intern at Mitsubishi Electric Research Laboratories (MERL) in Cambridge, MA. In 2022, he joined Novartis Institutes for BioMedical Research. Andac Demir has published several papers on topics such as ligand-based virtual screening, molecular property prediction, and tumor segmentation in top-tier AI/ML conferences including NeurIPS, ICLR, and CVPR. His research interests include geometric deep learning, topological data analysis, graph convolution, and adversarial robustness.
Lead Generation
Applying AI methods to lead generation
- How can we effectively navigate the billions of molecules available through synthesis on demand?
- Does generative chemistry provide a reliable route to hit identification?
- Can deep learning impact structure-based virtual screening?
- How will protein structure prediction impact hit identification?
- How can machine learning interact with methods like DEL screening?
Chief Data Officer, Relay Therapeutics
Pat Walters is Chief Data Officer at Relay Therapeutics in Cambridge, MA. Prior to joining Relay, he spent more than 20 years at Vertex Pharmaceuticals where he was Global Head of Modeling & Informatics. Pat is the 2023 recipient of the Herman Skolnik Award for Chemical Information Science from the American Chemical Society. He is a member of the editorial advisory boards for the Journal of Medicinal Chemistry and the Journal of Chemical Information and Modeling. He is co-author of the book “Deep Learning for the Life Sciences,” published in 2019 by O’Reilly and Associates. Pat received his Ph.D. in Organic Chemistry from the University of Arizona where he studied the application of artificial intelligence in conformational analysis.
Molecular design cycle: computation-first approach
- How can computational methods be used to generate novel lead molecules for drug discovery?
- What are some of the most effective strategies for identifying potential lead molecules using in silico techniques?
- How can machine learning and artificial intelligence be applied to the lead molecule generation process in drug discovery?
- What are some of the current limitations and challenges in using computational tools for lead molecule generation in drug discovery?
- How can we integrate experimental data and validation into computational lead molecule generation to improve hit-to-lead conversion rates?
Head of Data Sciences and Machine Learning, Psivant Therapeutics
Shivam Patel is an experienced professional with a background in Data Science who works at the interface of computation and the laboratory in drug discovery. He holds a Master’s degree in Data Science from Northeastern University and has over three years of industry experience. In January 2020, Mr. Patel joined Silicon Therapeutics as a Research Associate, where he was part of the team that discovered the clinical-stage STING agonist SNX281. He later joined Roivant Sciences as Senior Data Scientist following its acquisition of Silicon Therapeutics. Currently, he holds the position of Head of Data Sciences and Machine Learning at Psivant Therapeutics, where he is responsible for generating and evaluating large libraries of synthesizable small molecules.
Poster Session
Fast and Affordable: An AI-based platform for high-throughput virtual screening and molecule generation with cloud scalability
Our validated fast screening platform enables high-throughput virtual screening of billions of molecules in just two weeks at low cost in the AWS cloud. The platform docks a representative subset of the database against a given target, with further validation by the PELE Monte Carlo algorithm. We use this data to train a surrogate model that can efficiently predict binding scores for the entire database. The platform incorporates an active learning approach optimized for retraining the model, enabling it to explore larger regions of the chemical space more accurately. In addition, two generative models, one designed to generate great molecular diversity and another capable of producing high-quality variations of the best molecules, offer alternatives to already saturated IP spaces. This platform offers a powerful tool for efficient screening of large databases and holds promise for accelerating drug discovery.
Director of AI, Nostrum Biodiscovery
With a research background in protein folding and drug-drug interaction prediction, he joined Barcelona Supercomputing Center in 2019 to work with Prof. Victor Guallar on developing generative AI algorithms for molecules and proteins. As the current AI research lead at Nostrum Biodiscovery, he focuses on generative AI and ultra-large database screening for drug discovery.
Bridging the gap between biological nails and AI hammers
- How do you identify the parts of your pipeline where AI has the potential to address unmet scientific needs?
- To what extent do your digital and bench teams collaborate on joint priorities?
- What ways have you found to explore and validate unproven digital approaches before investing heavily in them?
- How do you balance your data teams’ time between focused work and communication with wet lab colleagues?
Principal Data Scientist, Moderna
Eric is a Principal Data Scientist at Moderna, leading the Data Science and AI (Research) team. Previously, he worked at Novartis Institutes for Biomedical Research and was an Insight Health Data Fellow. He completed his doctoral thesis in Biological Engineering at MIT. Eric is an open-source software developer, leading projects like pyjanitor and nxviz, and contributes to various projects. His personal motto is from Luke 12:48.
Lead Optimization
09:05 – 10:15
- Extremely large sequence space to explore
- How to co-optimize multiple properties
- More rounds of experimental characterization, or more sequences to screen?
- How to avoid immunogenicity
Head of AI Innovation for Antibody, Sanofi
Yu joined Sanofi in March 2017 as a Senior Scientist in the Protein Engineering group of Biologics Research. Yu earned his PhD in Biochemistry in 2012, mainly using NMR to study protein structures and dynamics. He then moved to St. Jude Children’s Research Hospital as a postdoc, where he studied a ubiquitin-like protein conjugation cascade in autophagy using biochemistry, enzymology, crystallography and EM. At Sanofi, he has solved structures of antibody-antigen complexes by crystallography and cryoEM and provides structure-based antibody engineering support to projects in multiple therapeutic areas. Since 2021, he has led efforts focusing on in silico antibody engineering and de novo design, mainly using physics-based and ML-based methods.
12:20 – 13:20
Networking Lunch
15:00 – 16:00
Analyzing the challenge of in vivo PK profile prediction
- Accurately predicting binding affinity. Docking is a static approximation whose results are often far from reality. Dynamics-based approaches are much more expensive and should yield better estimates; unfortunately, they often don’t.
- Generalizable predictive models have limited accuracy. Still, they have a wider applicability domain than specific models trained on a subset of chemotypes, which are more accurate but have limited applicability. This applies not only to ML-based models but to docking as well.
- Activity/ADMET data for a particular target/chemotype are needed to train a suitable model.
Head of AI Platforms, Insilico Medicine
Petrina Kamya is currently Head of AI Platforms at Insilico Medicine and President of Insilico Medicine Canada Inc. Prior to joining Insilico, Petrina worked in market access at Certara and in computer-aided drug discovery and development at Chemical Computing Group. Petrina holds a Ph.D. in Computational Chemistry and a BSc in Biochemistry.
16:15 – 17:15
Challenges and approaches to building deep learning models for antibody lead optimization
- Data capture & processing
- Training models on small datasets with limited resources
- Making predictions on assays and processes that have large variations
- Machine learning approach to optimizing experiment designs
Senior Scientist, Global Biologics Discovery, AbbVie
Xin is a senior scientist at AbbVie. Research in her group focuses on building solutions that span assay design, data capture, and machine learning, such as cell line engineering and assay development to support antibody screening funnels, full-stack applications for capturing and processing unstructured assay data, and architecture optimization to facilitate rapid development of ML models.
Drug Response Prediction
Can machine learning methods provide insights and predictions for cancer drug response?
- Patient stratification – from Cox regression to deep learning – what methods/strategies should we develop to maximize the utilization of data (public and internal) to find predictive biomarkers and predict response/resistance?
- Integration of clinical, demographic and molecular features – from patient health to cancer biology – what challenges do we face and how should we approach them?
- The flood of molecular data coming from novel high-throughput technologies – how do we make best use of them to inform drug discovery? How do we choose which one to use and when?
Senior Director, Head of Data Science and AI, Early Oncology, AstraZeneca
Etai is an R&D senior director with more than 15 years of experience in developing and applying computational biology and ML/AI in drug discovery and early clinical development. His group at AZ specializes in data science and ML/AI research for ‘omics and clinical bioinformatics for drug discovery, translational medicine, and early clinical trials. Etai holds a PhD in Computational Biology from Bar-Ilan University and the Weizmann Institute of Science (in collaboration), an MSc in Computational Biology, and a BA in Computer Science.