Pitfalls of using Pearson’s correlation for comparing model performance
Oxford Protein Informatics Group
by Oliver Crook
1w ago
Pearson’s R (correlation coefficient) is a measure of the linear correlation between two variables, giving a value between -1 and 1, where 1 is total positive linear correlation, 0 is no linear correlation, and -1 is total negative linear correlation. While it’s a useful statistic for understanding the relationship between two variables, it is often used to compare the performance of two or more models. For example, imagine we had experimental values that we are predicting and several models’ predictions. Obviously, we would prefer the model with the highest Pearson’s R … or perhaps not? Pears ..read more
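One well-known limitation can be sketched in a few lines: Pearson's R is unchanged by any positive linear rescaling of the predictions, so a model can reach R = 1 while being far from the experimental values. The excerpt is truncated, so this is not necessarily the pitfall the post goes on to discuss. The data and the names `model_a` and `model_b` are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical experimental values and two hypothetical models.
experimental = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.2, 1.9, 3.3, 3.8, 5.1]               # close to the truth, slightly noisy
model_b = [10.0 + 3.0 * v for v in experimental]  # perfectly linear, badly miscalibrated

r_a, r_b = pearson_r(experimental, model_a), pearson_r(experimental, model_b)
err_a, err_b = rmse(experimental, model_a), rmse(experimental, model_b)

print(f"model A: R = {r_a:.3f}, RMSE = {err_a:.3f}")
print(f"model B: R = {r_b:.3f}, RMSE = {err_b:.3f}")
```

Ranking by R alone would prefer model B here, even though its error is far larger, which is why an error metric is usually worth reporting alongside the correlation.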
Open Source PyMOL installation on Windows
Oxford Protein Informatics Group
by Yael Ziv
2w ago
A year ago, I used Gheorghe Rotaru’s helpful blog post to install PyMOL. Unfortunately, after resetting my computer, I have just discovered that some of the links are broken. Here are the installation steps with new links provided by Christoph Gohlke, who generously offers pre-compiled Windows versions of the latest PyMOL software along with all its requirements. Install the latest version of Python 3 for Windows: Download the Windows Installer (x-bit) for Python 3 from their website, with x being your Windows architecture – 32 or 64. Follow the instructions provided on how to ..read more
Optimising for PR AUC vs ROC AUC – an intuitive understanding
Oxford Protein Informatics Group
by Lewis Chinery
2w ago
When training a machine learning (ML) model, our main aim is usually to get the ‘best’ model out the other end in an unbiased manner. Of course, there are other considerations such as quick training and inference, but mostly we want to be good at predicting the right answer. A number of factors will affect the quality of our final model, including the chosen architecture, optimiser, and – importantly – the metric we are optimising for. So, how should we pick this metric? A recent preprint from McDermott et al.[1] highlights that optimising an ML model for the areas under the precision-recall c ..read more
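To make the two metrics concrete, here is a minimal sketch that computes ROC AUC (as the fraction of positive-negative pairs ranked correctly) and average precision (a common estimate of PR AUC) on a small imbalanced toy set. The labels and scores are invented, and the code does not reproduce any result from the preprint.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical imbalanced data: 20 examples with distinct scores,
# of which only the 3rd and 5th highest-scoring are positives.
scores = [1.0 - 0.05 * i for i in range(20)]
labels = [1 if i in (2, 4) else 0 for i in range(20)]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC AUC = {auc:.3f}, average precision = {ap:.3f}")
```

On this toy set the ROC AUC is about 0.86 while the average precision is about 0.37: the few negatives ranked above the positives barely affect ROC AUC but dominate precision when positives are rare.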
3 approaches to linear-memory Transformers
Oxford Protein Informatics Group
by Isaac Ellmen
1M ago
Transformers are a very popular architecture for processing sequential data, notably text and (our interest) proteins. Transformers learn more complex patterns with larger models on more data, as demonstrated by models like GPT-4 and ESM-2. Transformers work by updating tokens according to an attention value computed as a weighted sum of all other tokens. In standard implementations this requires computing the product of a query and key matrix, which requires O(N²d) computations and, problematically, O(N²) memory for a sequence of length N and an embedding size of d. To speed up Transformers, and ..read more
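The quadratic cost is easy to see in a plain-Python sketch of standard attention. It assumes the usual scaled dot-product form with a softmax over each row, which the excerpt does not spell out, and it illustrates the baseline only, not any of the three linear-memory approaches the post covers.

```python
import math

def attention(Q, K, V):
    """Standard attention that materialises the full N x N score matrix."""
    d = len(Q[0])
    # One score per (query, key) pair: N * N entries, each costing d multiplications.
    scores = [[sum(q * k for q, k in zip(q_i, k_j)) / math.sqrt(d) for k_j in K]
              for q_i in Q]
    outputs = []
    for row in scores:
        # Numerically stable softmax over one row of scores.
        row_max = max(row)
        weights = [math.exp(s - row_max) for s in row]
        norm = sum(weights)
        # Each output token is a weighted sum of the value vectors.
        outputs.append([sum(w / norm * v_j[c] for w, v_j in zip(weights, V))
                        for c in range(len(V[0]))])
    return outputs, scores

# Toy input: N = 3 tokens, embedding size d = 2.
Q = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # all-zero queries give uniform weights
K = [[1.0, 0.5], [0.2, 0.1], [0.3, 0.9]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

out, score_matrix = attention(Q, K, V)
print(len(score_matrix), "x", len(score_matrix[0]), "score matrix")
print(out[0])
```

Doubling the sequence length quadruples the number of entries in `score_matrix`, which is the memory cost that linear-memory variants aim to avoid.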
Fail fast
Oxford Protein Informatics Group
by Arun Raja
1M ago
While scrolling through my Instagram reels feed, I came across a reel of Jensen Huang, NVIDIA’s CEO, talking about the need to fail fast, which motivated me to write a post. ‘Fail fast’ is a piece of advice I have heard repeatedly since I embarked on my PhD: fail fast on the research directions we plan to pursue, so that we understand the difficulties and limitations of the research problems and methods early. That in turn gives us more time to fine-tune our problem and develop more nuanced approaches. Since childhood, most of us have been taught that failures eventually lead to s ..read more
Making your figures more accessible
Oxford Protein Informatics Group
by Fabian Spoendlin
1M ago
You might have created the most aesthetic figures for your last presentation with a beautiful colour scheme, but have you considered how these might look to someone with colourblindness? Around 5% of the general population have some kind of colour vision deficiency, so making your figures more accessible is actually quite important! There is a range of online tools that can help you create figures that look great to everyone. Colourblindness simulators are a useful tool to check how your figures would look to people with different types of colour vision deficiencies. There are many too ..read more
Plotext: The Matplotlib Lookalike That Breaks Free from X Servers
Oxford Protein Informatics Group
by Ruben Sanchez-Garcia
1M ago
Imagine this: you’ve spent days computing intricate analyses, and now it’s time to bring your findings to life with a nice plot. You fire up your cluster job, scripts hum along, and… matplotlib throws an error, demanding an X server it can’t find. Frustration sets in. What a waste of computation! What happened? You forgot to add -X to your ssh command, or perhaps X forwarding is simply not allowed on your cluster. So you will need to rerun your scripts, once you have modified them to generate a file that you can copy to your local machine rather than plotting it directly. But wai ..read more
In defence of chaos
Oxford Protein Informatics Group
by Gabriel Abrahams
1M ago
“I commend you on your skepticism, but even the skeptical mind must be prepared to accept the unacceptable when there is no alternative. If it looks like a duck, and quacks like a duck, we have at least to consider the possibility that we have a small aquatic bird of the family Anatidæ on our hands.” (Douglas Adams) It’s not every day that someone recommends a new whizzbang note-taking software. It’s every second day, or third if you’re lucky. They all have their bells and whistles: Obsidian turns your notes into a funky graph that pulses with information, the web of complexity of your stored kno ..read more
Working with PDB Structures in Pandas
Oxford Protein Informatics Group
by Benjamin McMaster
1M ago
Pandas is one of my favourite data analysis tools for working in Python! The data frames offer a lot of power and organization to any data analysis task. Here at OPIG we work with a lot of protein structure data coming from PDB files. In the following article I will go through an example of how I use pandas data frames to analyze PDB data. Tools for the job: There are a couple of available tools to load PDB data into a pandas dataframe. BioPandas is an open source library for this very task and makes the process easy. I have also developed my own library, PythonPDB, that can load PDB files into pan ..read more
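To give a feel for what loading PDB data into a table involves, here is a hand-rolled sketch that reads ATOM records by their fixed column positions into one dictionary per atom. It uses neither the BioPandas nor the PythonPDB API; the function name `parse_atoms`, the chosen fields, and the sample coordinates are all illustrative assumptions.

```python
# Hypothetical sample in PDB fixed-width format (coordinates are made up).
SAMPLE_PDB = """\
HEADER    EXAMPLE STRUCTURE
ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N
ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C
END
"""

def parse_atoms(pdb_text):
    """Return one dict per ATOM record, sliced by the PDB column layout."""
    rows = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM records, and everything else
        rows.append({
            "serial": int(line[6:11]),          # columns 7-11
            "atom_name": line[12:16].strip(),   # columns 13-16
            "res_name": line[17:20].strip(),    # columns 18-20
            "chain": line[21],                  # column 22
            "res_seq": int(line[22:26]),        # columns 23-26
            "x": float(line[30:38]),            # columns 31-38
            "y": float(line[38:46]),            # columns 39-46
            "z": float(line[46:54]),            # columns 47-54
        })
    return rows

atoms = parse_atoms(SAMPLE_PDB)
for atom in atoms:
    print(atom)
```

A list of dictionaries like `atoms` can be passed straight to `pandas.DataFrame(atoms)` to get one row per atom, after which the usual filtering and grouping operations apply.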
Navigating the world of GNN layers with PyTorch Geometric
Oxford Protein Informatics Group
by Ísak Valsson
1M ago
Data can often naturally be represented in a graph format and being able to directly employ a deep learning architecture on that data without finding a different representation is an appealing idea. Graph neural networks (GNNs) have become a standard part of the ML toolbox but navigating the world of different architectures available out-of-the-box can be a daunting task. A great place to start looking for architectures is with PyTorch Geometric, which provides an extensive list of readily available GNN layers and tutorials on how to use them in your standard PyTorch models. There are many thi ..read more
