How to Scrape FDA Drug Approval Data with Python
Phyo Phyo Kyaw Zin » Cheminformatics
by Zin
4M ago
Personal Update: Before we embark on today’s tutorial, I wanted to share a personal update with you that sheds light on my recent hiatus from blogging. In the past few months, I’ve been immersed in the world of motherhood, cherishing precious moments with my newborn, Ellie. As I transition back to my role in the biotech world, I find myself navigating the delicate balance of work and motherhood. It’s been a journey filled with new challenges, learning experiences, and a deep sense of fulfillment. As I continue to share coding insights related to data science, visualization and Cheminformatics ..read more
How to Merge Multiple Datasets with Pandas and Python – Part 1
Phyo Phyo Kyaw Zin » Cheminformatics
by Zin
2y ago
Today’s tutorial is on how to merge multiple datasets using the pandas library in Python. We will add new columns based on a key column, and we will also aggregate information for columns that share the same name across datasets. I have made five sample datasets (A1.csv, A2.csv, A3.csv, A4.csv, A5.csv) that we will be merging. The code and the data can be found in this GitHub repository. I have organized the five datasets in this “to-merge” folder. Each dataset contains an ID column (the key column on which we will merge the datasets), unique columns (columns that the other datasets don’t have), c ..read more
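The key-column merge described in this excerpt can be sketched as follows. This is only an illustration and not the code from the post or its repository: the three small in-memory tables stand in for the CSV files, and the column names other than ID (weight, logP, source) are hypothetical. It assumes an outer join, so that an ID missing from one dataset is kept and its columns are filled with NaN.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the CSV files; only the "ID" key column
# comes from the excerpt, the other column names are made up.
a1 = pd.DataFrame({"ID": [1, 2, 3], "weight": [10.0, 20.0, 30.0]})
a2 = pd.DataFrame({"ID": [2, 3, 4], "logP": [1.5, 2.5, 3.5]})
a3 = pd.DataFrame({"ID": [1, 4], "source": ["lab", "vendor"]})

frames = [a1, a2, a3]

# Fold the list of frames into one table, merging pairwise on the key column.
# how="outer" keeps every ID that appears in at least one dataset.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="ID", how="outer"),
    frames,
)

# Sort by the key so the row order does not depend on the merge order.
merged = merged.sort_values("ID").reset_index(drop=True)

print(merged)
```

With files on disk, the same pattern would apply to frames loaded with pd.read_csv. Columns that share a name across datasets need separate handling (pandas adds suffixes to them by default), which this sketch does not cover.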
How To Curate Chemical Data for Cheminformatics
Phyo Phyo Kyaw Zin » Cheminformatics
by Zin
2y ago
In early 2021, I gave a talk at the MIDD+ Conference held by Simulations Plus Inc. on data curation using one of the projects that I worked on — the Madin-Darby Canine Kidney (MDCK) project. In this blog post, I will be focusing on the general data curation aspects of that project. Let me emphasize why data curation is an essential step in Cheminformatics. A machine learning model can only be as good as the data it is built on. If the data is noisy, filled with activity cliffs, and full of mistakes, you won’t get any useful model. Often, cheminformatics datasets aren’t quite large, like thousa ..read more
How to Box Plot with Python
Phyo Phyo Kyaw Zin » Cheminformatics
by Zin
2y ago
This blog post is for readers as well as myself. In this tutorial, I will show how to make different types of boxplots, including horizontal, vertical, grouped, and interactive ones. It’s not meant to be comprehensive. It’s just a collection of different styles and visualizations that I like. For the code, you will need the following Python libraries: pandas, NumPy, Plotly, Matplotlib, and seaborn. They can all be installed with either pip or conda. I will be using fake data to show different types of boxplots. Normally, I would create a conda environment and install these required lib ..read more
Nested Cross-Validation & Cross-Validation Series – Part 2B
Phyo Phyo Kyaw Zin » Cheminformatics
by Zin
2y ago
Please check out the previous blog posts from this series if you haven’t done so already: Part 1, on the algorithm for k-fold cross-validation, and Part 2A of the Nested Cross-Validation & Cross-Validation Series, where I went through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with a simple cheminformatics dataset with descriptors and endpoints of interest. Here in Part 2B, I will cover the Python tutorial for the dataset containing high-dimensional matrices where each matrix represents features of a chemical structure (this is taken from one o ..read more
Nested Cross-Validation & Cross-Validation Series – Part 3
Phyo Phyo Kyaw Zin » Cheminformatics
by Zin
2y ago
This is part 3 of the Nested Cross-Validation & Cross-Validation Series where I will explain the algorithm of nested cross-validation (NeCV), and compare Cross-Validation and NeCV. Please read this blog first if you need to learn about cross-validation so that you can dive into NeCV after. I would like to first clarify that there are variations in the implementations and algorithms of NeCV. The algorithm I will be describing is a common one used in several studies including an article I published in 2020. Below is a diagram illustrating the algorithm of NeCV. (taken from https://pubs.acs.o ..read more
Nested Cross-Validation & Cross-Validation Series – Part 1
Phyo Phyo Kyaw Zin » Cheminformatics
by Zin
2y ago
A few people have asked me to explain and share the code for Nested Cross-Validation (NeCV). I think it makes sense to explain the what and why of NeCV before diving into the code, so I will cover these topics in four separate blog posts. For part 1, I will explain the algorithm for k-fold cross-validation. For part 2, I will explain how to implement the k-fold cross-validation algorithm in Python with tutorials using two cheminformatics datasets: (A) a simple dataset with descriptors and endpoints of interest, and (B) high-dimensional matrices where each mat ..read more
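As a rough illustration of the k-fold idea named in this excerpt, the sketch below splits sample indices into k folds and uses each fold once as the test set while the remaining folds form the training set. It is not the algorithm as presented in the post, which is not visible here; the function names k_fold_indices and k_fold_splits are hypothetical, and the folds are contiguous with no shuffling, which is a simplifying assumption.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, test_idx in enumerate(folds):
        # Training set: every index that is not in the held-out fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_splits(10, 3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

In practice a model would be fitted on each training set and scored on the matching test set, and the k scores would then be averaged. Shuffling the indices before splitting is common when the data are ordered.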
Nested Cross-Validation & Cross-Validation Series – Part 2A
Phyo Phyo Kyaw Zin » Cheminformatics
by Zin
2y ago
This is part 2A of the Nested Cross-Validation & Cross-Validation Series. I will go through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with the first dataset: (A) a simple cheminformatics dataset with descriptors and endpoints of interest. In Part 2B, I will cover the same Python tutorial for the second dataset: (B) high-dimensional matrices where each matrix represents features of a chemical structure (this is taken from one of my Ph.D. projects; MD-QSAR with Imatinib derivatives). Please check out part 1 of this series to learn more ..read more