Noel O'Blog on Feedspot

Threading time through Vortex

Noel O'Blog

by Noel O'Boyle

2y ago

Vortex (a chemical spreadsheet/visualisation application from Dotmatics) has a plugin system built around Jython. Simply drop a .vpy file into a specific scripts folder, and a menu item immediately appears in the application. Here are some notes on using this to communicate with a webserver.

Code organisation

I found it best to separate Vortex-specific code (in the .vpy files) from supporting code that could be written and tested independently. This also naturally enables reuse of code across plugins. This supporting code I put in a folder adjacent to the scripts folder, and accessed it as follow…
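
The excerpt stops before showing how the supporting folder is reached, so what follows is only a minimal sketch of one way a .vpy script could do it, not necessarily the post's approach. It assumes a hypothetical folder name `support` sitting next to the scripts folder, and that the script already knows the scripts folder's path (how it obtains that inside Vortex is not covered here). It uses syntax that both Jython 2.7 and CPython accept.

```python
import os
import sys


def add_support_folder(scripts_dir, folder_name="support"):
    """Put a folder that sits next to the scripts folder at the front of sys.path.

    The folder name "support" and this layout are hypothetical; adapt to your setup.
    """
    parent = os.path.dirname(os.path.abspath(scripts_dir))
    support_dir = os.path.join(parent, folder_name)
    # Avoid stacking duplicate entries when several plugins call this.
    if support_dir not in sys.path:
        sys.path.insert(0, support_dir)
    return support_dir
```

Plain-Python modules in that folder can then be imported from any .vpy file, and the same modules can be unit-tested outside Vortex.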

Finding matched pairs of a peptide at the RDKit UGM

Noel O'Blog

by Noel O'Boyle

3y ago

The recent RDKit UGM was a masterclass in how to organise a conference virtually, successfully replicating at least some of the in-person experience. This was due not only to the extensive use of Discord (best known as a chat server for gamerz) to manage questions, answers, discussion and networking, but also to the technical support for Discord (thanks to Floriane Montanari) and Zoom (thanks to KNIME; sorry, I can't remember the name). With previous virtual meetings I have attended, the meeting only had an existence while someone was speaking; here discussions filled the interims between, and indeed the…

Comparing methods two-by-two

Noel O'Blog

by Noel O'Boyle

3y ago

It is common to compare different methods using results from N distinct datasets. My earlier blogpost described why the mean rank is not a good measure of performance in these cases. Essentially, the relative performance of two methods (e.g. A and B) can be altered based on the performance of other methods (e.g. C, D and E). But it's not just the mean rank that's the problem. It's the use of any performance measure where the assessment of the pairwise performance (e.g. between methods A and B) can be altered by the performance of other methods. At the recent (virtual) AI in Chemistry Meeting…
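
The effect described above can be reproduced with a small invented example (the scores are made up purely for illustration). Adding methods C and D reverses which of A and B has the better mean rank, while the head-to-head count of datasets on which A beats B does not change. The sketch assumes higher scores are better and that there are no tied scores.

```python
from fractions import Fraction

# Invented scores (higher is better) for four methods on three datasets.
SCORES = {
    "d1": {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.2},
    "d2": {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.2},
    "d3": {"A": 0.5, "B": 0.9, "C": 0.8, "D": 0.7},
}


def mean_ranks(scores, methods):
    """Mean rank (1 = best) of each method across datasets, assuming no tied scores."""
    totals = {m: 0 for m in methods}
    for per_dataset in scores.values():
        ordered = sorted(methods, key=lambda m: per_dataset[m], reverse=True)
        for rank, m in enumerate(ordered, start=1):
            totals[m] += rank
    return {m: Fraction(totals[m], len(scores)) for m in methods}


def pairwise_wins(scores, a, b):
    """Number of datasets on which method a scores strictly higher than method b."""
    return sum(1 for per_dataset in scores.values() if per_dataset[a] > per_dataset[b])


print(mean_ranks(SCORES, ["A", "B"]))            # A has the lower (better) mean rank
print(mean_ranks(SCORES, ["A", "B", "C", "D"]))  # now B has the better mean rank
print(pairwise_wins(SCORES, "A", "B"))           # unaffected by C and D
```

With only A and B, A averages 4/3 and B 5/3; once C and D are included, both sit between B and A on d3 and push A's mean rank to 2, behind B's unchanged 5/3.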

Reflecting on stereochemistry

Noel O'Blog

by Noel O'Boyle

4y ago

Stereochemistry is a tricky concept. And writing programs to handle it is equally tricky. Take tetrahedral stereochemistry for example; it's 'interesting' to note that none of the following markers of parity use the same system: R/S in IUPAC names, D/L in biochemical nomenclature, @/@@ in SMILES, atom parity flag 1/2 in MOL files, or InChI's +/-. Which is why a sane cheminformatics library abstracts the representation of stereochemistry away from any one format to one which can be more easily reasoned about and interconverted to all. Here's the introduction to a description I wrote up last we…
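
As a small illustration of why such an abstraction helps: a SMILES @/@@ tag only has meaning relative to the order in which the neighbours are written, and an odd permutation of the neighbours flips the tag. The sketch below (a simplification, not any particular library's model) re-expresses a tag relative to a sorted neighbour order, so that two different SMILES spellings of the same centre compare equal. It assumes four sortable neighbour labels, such as atom indices, with any implicit hydrogen already given its own position in the list.

```python
def permutation_parity(seq):
    """Return 0 if seq can be sorted with an even number of swaps, 1 if odd."""
    items = list(seq)
    swaps = 0
    # Bubble sort, counting adjacent swaps; fine for four neighbours.
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swaps += 1
    return swaps % 2


def canonical_tag(tag, neighbours):
    """Re-express a SMILES '@'/'@@' tag relative to the sorted neighbour order."""
    if tag not in ("@", "@@"):
        raise ValueError("expected '@' or '@@'")
    if permutation_parity(neighbours) == 0:
        return tag  # even permutation: same sense of rotation
    return "@@" if tag == "@" else "@"  # odd permutation: sense is inverted
```

For example, `canonical_tag("@", [1, 2, 3, 4])` and `canonical_tag("@@", [2, 1, 3, 4])` both give "@": swapping two neighbours and swapping the tag describe the same centre.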

No charge - A simple approach to neutralising charged molecules

Noel O'Blog

by Noel O'Boyle

4y ago

There are several reasons you might want to convert a molecule with charges to its neutral form. For example, as a step in standardising a representation for registration, to generate a parent form during registration, to deduplicate vendor-supplied databases, or to simplify SMARTS pattern searches for a particular ionizable group. Here I describe a simple procedure to generate the neutral form of a molecule, and briefly compare it to existing approaches. While a more nuanced approach may be more appropriate to prepare molecules for registration, the described method appears to be suitable for…
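
The excerpt stops before the procedure itself, so the following is only a toy sketch of one rule of this general kind, on a hypothetical minimal atom record rather than a real toolkit. A +1 atom that carries a hydrogen loses one, and a -1 atom gains one, unless the atom is bonded to an atom of opposite charge (which leaves charge-separated groups such as nitro intact). Matches are collected before anything is changed, so one neutralisation cannot affect another.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    """Hypothetical minimal atom record: element, formal charge, H count, bonded atom indices."""
    element: str
    charge: int = 0
    hcount: int = 0
    neighbours: List[int] = field(default_factory=list)


def neutralise(atoms):
    """Toy neutralisation rule; modifies and returns the atom list."""
    to_fix = []
    for i, atom in enumerate(atoms):
        nbr_charges = [atoms[j].charge for j in atom.neighbours]
        if atom.charge == 1 and atom.hcount > 0 and not any(c < 0 for c in nbr_charges):
            to_fix.append(i)  # e.g. an ammonium nitrogen
        elif atom.charge == -1 and not any(c > 0 for c in nbr_charges):
            to_fix.append(i)  # e.g. a carboxylate oxygen
    for i in to_fix:
        atom = atoms[i]
        atom.hcount -= atom.charge  # +1 loses a hydrogen, -1 gains one
        atom.charge = 0
    return atoms


# Glycine zwitterion, [NH3+]CC(=O)[O-]
glycine = [
    Atom("N", 1, 3, [1]),
    Atom("C", 0, 2, [0, 2]),
    Atom("C", 0, 0, [1, 3, 4]),
    Atom("O", 0, 0, [2]),
    Atom("O", -1, 0, [2]),
]
# Nitromethane, C[N+](=O)[O-]: charge-separated, and should be left alone
nitromethane = [
    Atom("C", 0, 3, [1]),
    Atom("N", 1, 0, [0, 2, 3]),
    Atom("O", 0, 0, [1]),
    Atom("O", -1, 0, [1]),
]
neutralise(glycine)
neutralise(nitromethane)
```

Glycine comes out as the neutral NH2/COOH form, the nitro group is untouched, and a quaternary ammonium (no hydrogen to remove) would also stay charged.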

Comparing chemical spaces

Noel O'Blog

by Noel O'Boyle

4y ago

In a recent blog post, Pat Walters describes using visualisation of chemical space to answer questions about the overlap and complementarity of different chemical datasets. His introductory paragraph gives two examples of where this is useful: considering the purchase of an additional set of screening compounds, and secondly comparing training and test sets. Now usually I'd be the first to reach for a visualisation technique to answer a question, but here I'm going to argue that at least for the first example, where the within-set diversity is high, this might not be the best approach. I woul…
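
The alternative the post goes on to propose is cut off here. One common non-visual way to ask how much a candidate set adds to an existing collection (offered as an illustration, not as the author's method) is to compute, for each candidate, its Tanimoto similarity to its nearest neighbour in the existing set, then count how many fall below a threshold. Fingerprints are modelled as Python sets of on-bit indices, and the 0.4 threshold is an arbitrary assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbour_similarities(candidates, reference):
    """For each candidate, the highest similarity to any compound already in reference."""
    return [max(tanimoto(c, r) for r in reference) for c in candidates]


def fraction_novel(candidates, reference, threshold=0.4):
    """Fraction of candidates whose nearest reference neighbour is below threshold."""
    sims = nearest_neighbour_similarities(candidates, reference)
    return sum(s < threshold for s in sims) / len(sims)


reference = [{1, 2, 3, 4}, {5, 6, 7, 8}]
candidates = [{1, 2, 3, 5}, {9, 10, 11, 12}]
print(nearest_neighbour_similarities(candidates, reference))  # [0.6, 0.0]
print(fraction_novel(candidates, reference))                  # 0.5
```

Because it works directly on the fingerprints, the result does not depend on how faithfully a 2D projection preserves distances between compounds.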

Open Babel 3.0 released

Noel O'Blog

by Noel O'Boyle

4y ago

As announced by Geoff on the mailing list, Open Babel 3.0 is now available for download: I'm pleased to announce the release of Open Babel 3.0.0 [finally]: This release represents a major update and is strongly recommended for all users. It also removes deprecated components and breaks the API in a few places. For information on migrating from the previous version, please see: https://open-babel.readthedocs.io/en/latest/UseTheLibrary/migration.html#migrating-to-3-0 We intend to move to semi-annual releases in Spring and Fall, with bug fix releases as needed. A sample of major new features…
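
One of the API changes affects Python users directly: from Open Babel 3.0 the Python modules live inside an `openbabel` package, so `import pybel` becomes `from openbabel import pybel` (the migration page linked above lists the full set of changes). A small shim like the following is one way to let a script run against either major version; if neither set of bindings is installed it sets `pybel` to None. The helper `smiles_to_formula` is a hypothetical example function.

```python
try:
    from openbabel import pybel  # Open Babel 3.x layout
except ImportError:
    try:
        import pybel  # Open Babel 2.x layout
    except ImportError:
        pybel = None  # Python bindings not installed


def smiles_to_formula(smiles):
    """Return the molecular formula for a SMILES string, or None without Open Babel."""
    if pybel is None:
        return None
    return pybel.readstring("smi", smiles).formula
```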

Least Publishable Unit #3 - Which similarity metric is best?

Noel O'Blog

by Noel O'Boyle

4y ago

Following on from #1 and #2, here's some work I began (maybe twice) but never completed. Back in 2016 I published a paper entitled "Which fingerprint is best?" - actually, no, someone talked me down into the title "Comparing structural fingerprints using a literature-based similarity benchmark", a title which accurately describes what we did, but not why we did it. Anyway, it's work I'm very proud of - for example, it shows there is a detectable performance difference between 16384-bit and 4096-bit ECFP fingerprints, and that the fingerprints most appropriate for finding very close analogs are di…
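
One mechanism behind a difference between 16384-bit and 4096-bit versions of the same fingerprint is folding: feature identifiers are mapped onto a fixed number of bits, and a shorter fingerprint makes it more likely that unrelated features land on the same bit. The sketch below assumes folding by taking each feature identifier modulo the fingerprint length (a common scheme, though toolkits differ in detail) and uses made-up feature identifiers chosen to collide at 4096 bits but not at 16384.

```python
def fold(features, n_bits):
    """Fold integer feature identifiers into an n_bits fingerprint (set of on-bit indices)."""
    return {f % n_bits for f in features}


def tanimoto(a, b):
    """Tanimoto similarity of two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Made-up feature identifiers; only feature 5 is genuinely shared.
mol_a = {5, 4101, 20000}
mol_b = {5, 8197, 30000}

for n_bits in (16384, 4096):
    print(n_bits, tanimoto(fold(mol_a, n_bits), fold(mol_b, n_bits)))
# 16384 0.2
# 4096 0.3333333333333333
```

At 16384 bits the similarity matches the unfolded value of 0.2; at 4096 bits, 4101 and 8197 both fold onto bit 5, inflating it. Collisions can also lower a similarity (when genuinely shared features collapse onto one bit), so folding does not simply shift every score upwards.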

Mutation testing partialsmiles

Noel O'Blog

by Noel O'Boyle

4y ago

Many Python programmers will be familiar with the concept of achieving high test coverage (e.g. via an analysis tool such as coverage.py), but they should also know that high coverage does not mean that everything is bug-free. So where can you go from there? One option is to use mutation testing. This is based around the concept that at least one test should fail if the code is "mutated". Now, "mutated" is a bit vague, but consider changing a plus to a minus, or a "break" to a "continue" in a Python script. If your tests still all pass, it means that they don't sufficiently cover all scenarios…
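
A hand-rolled sketch of the idea, using only the standard library's `ast` module rather than a mutation-testing tool: a single mutation operator rewrites every binary minus into a plus, and each test suite is run against both the original function and the mutant. The function and both suites are invented for illustration. The weak suite executes every line of the function, yet it still passes on the mutant, so the mutant "survives".

```python
import ast

SOURCE = """
def abs_diff(a, b):
    return abs(a - b)
"""


class MinusToPlus(ast.NodeTransformer):
    """Mutation operator: replace every binary '-' with '+'."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node


def load(tree, name="abs_diff"):
    """Compile a module AST and return the named function from it."""
    namespace = {}
    exec(compile(tree, "<module>", "exec"), namespace)
    return namespace[name]


def weak_suite(f):
    # Full line coverage, but b == 0 hides the sign of b.
    return f(3, 0) == 3


def strong_suite(f):
    return f(3, 0) == 3 and f(3, 5) == 2


original = load(ast.parse(SOURCE))
mutant = load(ast.fix_missing_locations(MinusToPlus().visit(ast.parse(SOURCE))))

for suite in (weak_suite, strong_suite):
    status = "survived" if suite(mutant) else "killed"
    print(suite.__name__, "passes original:", suite(original), "| mutant", status)
```

A real tool applies many such operators (for example `break` to `continue`) across a whole code base and reports each surviving mutant as a scenario the tests do not pin down.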

Partialsmiles - A validating parser for partial SMILES

Noel O'Blog

by Noel O'Boyle

4y ago

I've just released version 1.0 of partialsmiles, which is available via pip. Documentation is available online, and the project is hosted on GitHub. partialsmiles is a Python library that provides a validating SMILES parser with support for partial SMILES. The parser checks the syntax, checks that aromatic systems can be kekulized, and checks whether each atom's valence appears on a list of allowed valences. The main use of the library is to accept or reject SMILES strings while they are being generated character-by-character by a machine-learning method. Previously the only way to check such strings was after th…
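
To illustrate what checking a partial SMILES string involves, here is a deliberately tiny sketch that only tracks branch parentheses, single-digit ring closures and bracket atoms. The function `check_smiles` is hypothetical: it is not partialsmiles' API and skips the valence and kekulization checks described above. The key distinction is that an unclosed branch or ring bond is acceptable in a partial string but not in a complete one, whereas a ')' with no matching '(' can never be repaired by appending more characters.

```python
def check_smiles(smiles, partial=True):
    """Toy check of branches, single-digit ring closures and bracket atoms.

    Ignores '%nn' ring closures, valences and aromaticity entirely.
    """
    depth = 0            # number of currently open '(' branches
    open_rings = set()   # ring-closure digits seen an odd number of times
    in_bracket = False   # inside a [...] atom, where digits are isotopes or H counts
    for ch in smiles:
        if in_bracket:
            if ch == "]":
                in_bracket = False
            continue
        if ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # no extension can repair an unmatched ')'
        elif ch.isdigit():
            open_rings ^= {ch}  # first use opens the ring bond, second closes it
    if partial:
        return True
    return depth == 0 and not open_rings and not in_bracket
```

Used character-by-character, every prefix of a valid string such as "c1ccc(O)cc1" passes in partial mode, while "C1CC(" passes only as a partial string and "CC)" is rejected as soon as the ')' arrives.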
