Generative Molecular Design Isn't As Easy As People Make It Look
Practical Cheminformatics
by Pat Walters
2M ago
I was taken aback by a recent CNBC article entitled “Generative AI will be designing new drugs all on its own in the near future”. I should know better than to pay attention to AI articles in the popular press, but I feel that even scientists working in drug discovery may have a skewed perception of what generative AI can and can’t do. To understand exactly what’s involved, it might be instructive to walk through a typical generative molecular design workflow and point out a few things. First, these programs are far from autonomous. Even when presented with a well-de ...
AI in Drug Discovery - A Highly Opinionated Literature Review (Part III)
Practical Cheminformatics
by Pat Walters
6M ago
The third post in this series is a collection of review articles published in 2023 that I found helpful.
Property Prediction
Machine Learning Methods for Small Data Challenges in Molecular Science
https://pubs.acs.org/doi/full/10.1021/acs.chemrev.3c00189
Practical guidelines for the use of gradient boosting for molecular property prediction
https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00743-7
Application of message passing neural networks for molecular property prediction
https://www.sciencedirect.com/science/article/pii/S0959440X23000908?via%3Dihub
Molecular Similar ...
AI in Drug Discovery - A Highly Opinionated Literature Review (Part II)
Practical Cheminformatics
by Pat Walters
7M ago
Picking up where we left off in Part I, this post covers several other ML in drug discovery topics that interested me in 2023. Some areas, like large language models, are new, and most of the work is at the proof-of-concept stage. Others, like active learning, are more mature, and several groups are starting to explore nuances of the methods. Here’s the structure of Part II.
4. Large Language Models
5. Active Learning
6. Federated Learning
7. Generative Models
8. Explainable AI
9. Other Stuff
4. Large Language Models
The emergence of GPT-4 and ChatGPT brou ...
AI in Drug Discovery 2023 - A Highly Opinionated Literature Review (Part I)
Practical Cheminformatics
by Pat Walters
7M ago
Here’s the first part of my review of some interesting machine learning (ML) papers I read in 2023. As with the previous editions, this shouldn’t be considered a comprehensive review. The papers covered here reflect my research interests and biases, and I’ve certainly overlooked areas that others consider vital. This post is pretty long, so I've split it into three parts, with parts II and III to be posted in the next couple of weeks.
I. Docking, protein structure prediction, and benchmarking
II. Large Language Models, active learning, federated learning, genera ...
Some Thoughts on Biotech vs Pharma for Computational Chemists
Practical Cheminformatics
by Pat Walters
8M ago
A recent editorial by Dean Brown in J Med Chem and follow-up posts by Keith Hornberger and Derek Lowe prompted me to think about how we train computational chemists and cheminformaticians for careers in drug discovery. It also brought to mind some unique differences between how computational chemistry is practiced in biotech and pharma. For those who haven’t read Dean Brown’s editorial and the subsequent reactions, I’d highly recommend them. In short, the authors focused on how medicinal chemists were trained in the past and how biotech and the growth of outsourcing are changing that mode ...
Comparing Classification Models - You’re Probably Doing It Wrong
Practical Cheminformatics
by Pat Walters
8M ago
In my last post, I discussed benchmark datasets for machine learning (ML) in drug discovery and several flaws in widely used datasets. In this installment, I’d like to focus on how methods are compared. Every year, dozens, if not hundreds, of papers present comparisons of ML methods or molecular representations. These papers typically conclude that one approach is superior to several others for a specific task. Unfortunately, in most cases, the conclusions presented in these papers are not supported by any statistical analysis. I thought providing an example demons ...
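As one illustration of the kind of statistical check the excerpt above calls for, here is a minimal sketch of an exact paired sign-flip (permutation) test on per-fold scores from two models. It is not necessarily the approach taken in the post, which is truncated here. The function name paired_sign_flip_test and the AUC values are hypothetical, and the test assumes the per-fold differences are exchangeable, which overlapping cross-validation folds only approximate.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired sign-flip test on per-fold score differences.

    Under the null hypothesis that the two models perform equally well,
    the sign of each paired difference is arbitrary, so we enumerate every
    sign assignment and count how often the mean difference is at least as
    extreme as the one observed.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    n_extreme = 0
    n_total = 0
    # 2**n sign assignments; practical for a modest number of folds.
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = abs(mean(s * d for s, d in zip(signs, diffs)))
        # Small tolerance so floating-point ties count as "as extreme".
        if permuted >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Hypothetical per-fold AUC values for two models scored on the same folds.
model_a_auc = [0.81, 0.79, 0.84, 0.80, 0.83]
model_b_auc = [0.78, 0.80, 0.81, 0.79, 0.80]
p_value = paired_sign_flip_test(model_a_auc, model_b_auc)
print(f"two-sided p-value: {p_value:.3f}")
```

With only five folds, the smallest p-value this test can return is 2/32, so a difference that looks convincing in a results table may still fall short of conventional significance thresholds.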
We Need Better Benchmarks for Machine Learning in Drug Discovery
Practical Cheminformatics
by Pat Walters
1y ago
Most papers describing new methods for machine learning (ML) in drug discovery report some sort of benchmark comparing their algorithm and/or molecular representation with the current state of the art. In the past, I’ve written extensively about statistics and how methods should be compared. In this post, I’d like to focus instead on the datasets we use to benchmark and compare methods. Many papers I’ve read recently use the MoleculeNet dataset, released by the Pande group at Stanford in 2017, as the “standard” benchmark. This is a mistake. In this post ...
A Simple Tool for Exploring Functional Group Filters
Practical Cheminformatics
by Pat Walters
1y ago
When working in drug design, we often need filters to identify molecules containing functional groups that may be toxic, reactive, or could interfere with an assay. A few years ago, I collected the functional group filters available in the ChEMBL database and wrote some Python code that made applying these filters to an arbitrary set of molecules easy. This functionality is available in the pip-installable useful_rdkit_utils package on PyPI and GitHub. Applying these filters is easy. If we have a Pandas dataframe with a SMILES column, we can do so ...
Getting Real with Molecular Property Prediction
Practical Cheminformatics
by Pat Walters
1y ago
Introduction
If you believe everything you read in the popular press, this AI business is easy. Just ask ChatGPT, and the perfect solution magically appears. Unfortunately, that's not the reality. In this post, I'll walk through a predictive modeling example and demonstrate that there are still a lot of subtleties to consider. In addition, I want to show that data is critical to building good machine learning (ML) models. If you don't have the appropriate data, a simple empirical approach may be better than an ML model. A recent paper from Cheng Fang and coworkers at Biogen presents pro ...
Using Counterfactuals to Understand Machine Learning Models
Practical Cheminformatics
by Pat Walters
1y ago
While machine learning (ML) models have become integral to many drug discovery efforts, most of these models are "black boxes" that don't explain their predictions. There are several reasons we would like to be able to explain a prediction:
Provide scientific insights that will guide the design of new compounds.
Instill confidence among team members. As I've said before, a computational chemist only has two jobs: to convince someone to do an experiment and to convince someone not to do an experiment. These jobs are much easier when you can explain the "why" beh ...