Phyo Phyo Kyaw Zin » Cheminformatics on Feedspot

Phyo Phyo Kyaw Zin » Cheminformatics

I am Phyo Phyo. I will share some tips on Cheminformatics techniques in the context of drug discovery. I am passionate about Cheminformatics. My research areas include molecular informatics, computer-aided drug design, machine learning (Quantitative Structure-Activity Relationship; QSAR modeling), and molecular docking.

How to Scrape FDA Drug Approval Data with Python

by Zin

4M ago

Personal Update: Before we embark on today’s tutorial, I wanted to share a personal update with you that sheds light on my recent hiatus from blogging. In the past few months, I’ve been immersed in the world of motherhood, cherishing precious moments with my newborn, Ellie. As I transition back to my role in the biotech world, I find myself navigating the delicate balance of work and motherhood. It’s been a journey filled with new challenges, learning experiences, and a deep sense of fulfillment. As I continue to share coding insights related to data science, visualization and Cheminformatics ..read more

How to Merge Multiple Datasets with Pandas and Python – Part 1

by Zin

2y ago

Today’s tutorial covers how to merge multiple datasets using the pandas library in Python. We will add new columns based on a key column, and we will also aggregate information from columns that share the same name across datasets. I have made five sample datasets (A1.csv, A2.csv, A3.csv, A4.csv, A5.csv) that we will be merging. The code and the data can be found in this GitHub repository. I have organized the five datasets in this “to-merge” folder. Each dataset contains an ID column (the key column on which we will merge the datasets), unique columns (columns that the other datasets don’t have), c ..read more
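
The excerpt above is cut off before the tutorial's own code, so here is a minimal sketch of the idea rather than the author's method. It assumes three small in-memory tables standing in for the sample CSV files; the column names (`mw`, `logp`, `tpsa`, `source`) and the helper names `join_unique` and `merge_on_key` are hypothetical. Columns that appear in only one table are carried over as new columns, and columns that appear in several tables are aggregated per ID by joining their distinct values.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-ins for the sample files; the real A1.csv ... A5.csv
# columns are not shown in the excerpt and may differ.
a1 = pd.DataFrame({"ID": [1, 2], "mw": [180.2, 46.1], "source": ["lab_a", "lab_a"]})
a2 = pd.DataFrame({"ID": [2, 3], "logp": [-0.3, 1.9], "source": ["lab_b", "lab_b"]})
a3 = pd.DataFrame({"ID": [1, 3], "tpsa": [63.6, 20.2], "source": ["lab_c", "lab_a"]})


def join_unique(values):
    """Join the distinct non-null values of a column, keeping first-seen order."""
    seen = []
    for value in values.dropna():
        if value not in seen:
            seen.append(value)
    return "; ".join(str(value) for value in seen)


def merge_on_key(frames, key="ID"):
    """Stack the tables, then collapse them to one row per key value."""
    # Count how many tables contain each non-key column.
    counts = Counter(col for frame in frames for col in frame.columns if col != key)
    # Shared columns are aggregated; unique columns keep their first non-null value.
    how = {col: (join_unique if n > 1 else "first") for col, n in counts.items()}
    stacked = pd.concat(frames, ignore_index=True, sort=False)
    return stacked.groupby(key, as_index=False).agg(how)


merged = merge_on_key([a1, a2, a3])
print(merged)
```

To try this on files, the list passed to `merge_on_key` could be built with `pd.read_csv` over the CSVs in a folder. Joining distinct values with `"; "` is only one way to aggregate shared columns; a numeric column might call for a mean or a list instead.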

How To Curate Chemical Data for Cheminformatics

by Zin

2y ago

In early 2021, I gave a talk on data curation at the MIDD+ Conference held by Simulations Plus Inc., using one of the projects that I worked on: the Madin-Darby Canine Kidney (MDCK) project. In this blog post, I will focus on the general data curation aspects of that project. Let me emphasize why data curation is an essential step in Cheminformatics. A machine learning model can only be as good as the data it is built on. If the data is noisy, filled with activity cliffs, and full of mistakes, you won’t get a useful model. Often, cheminformatics datasets aren’t quite large, like thousa ..read more

How to Box Plot with Python

by Zin

2y ago

This blog post is for readers as well as myself. In this tutorial, I will show how to make different types of boxplots, including horizontal, vertical, grouped, and interactive ones. It’s not meant to be comprehensive; it’s just a collection of different styles and visualizations that I like. For the code, you will need the following Python libraries: pandas, NumPy, Plotly, Matplotlib, and seaborn. They can all be installed with either pip or conda. I will be using fake data to show different types of boxplots. Normally, I would create a conda environment and install these required lib ..read more
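
As a rough illustration of three of the boxplot styles named above (vertical, horizontal, and grouped), here is a minimal sketch using seaborn on randomly generated data. The column names `group`, `condition`, and `value` are made up for this example, and the styling in the original tutorial may well differ. The interactive Plotly variant is not covered here.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Fake data: three groups, two conditions, 120 random values in total.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "group": np.repeat(["A", "B", "C"], 40),
        "condition": np.tile(["control", "treated"], 60),
        "value": rng.normal(loc=5.0, scale=1.5, size=120),
    }
)

fig, (ax_v, ax_h, ax_g) = plt.subplots(1, 3, figsize=(12, 4))

# Vertical: categories on the x-axis, values on the y-axis.
sns.boxplot(data=df, x="group", y="value", ax=ax_v)
ax_v.set_title("Vertical")

# Horizontal: swap the axes so the values run along x.
sns.boxplot(data=df, x="value", y="group", ax=ax_h)
ax_h.set_title("Horizontal")

# Grouped: a second categorical column splits each box via `hue`.
sns.boxplot(data=df, x="group", y="value", hue="condition", ax=ax_g)
ax_g.set_title("Grouped")

fig.tight_layout()
```

Calling `fig.savefig("boxplots.png")` afterward would write the three panels to disk; in a notebook the figure displays on its own once the `matplotlib.use("Agg")` line is removed.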

Nested Cross-Validation & Cross-Validation Series – Part 2B

by Zin

2y ago

Please check out the previous blog posts from this series if you haven’t done so already: Part 1, on the algorithm for k-fold cross-validation, and Part 2A of the Nested Cross-Validation & Cross-Validation Series, where I went through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with a simple cheminformatics dataset containing descriptors and endpoints of interest. Here in Part 2B, I will cover the Python tutorial for the dataset containing high-dimensional matrices, where each matrix represents features of a chemical structure (this is taken from one o ..read more

Nested Cross-Validation & Cross-Validation Series – Part 3

by Zin

2y ago

This is part 3 of the Nested Cross-Validation & Cross-Validation Series, where I will explain the algorithm of nested cross-validation (NeCV) and compare cross-validation with NeCV. Please read this blog post first if you need to learn about cross-validation, so that you can dive into NeCV afterward. I would like to first clarify that there are variations in the implementations and algorithms of NeCV. The algorithm I will be describing is a common one used in several studies, including an article I published in 2020. Below is a diagram illustrating the algorithm of NeCV. (taken from https://pubs.acs.o ..read more
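
Since NeCV implementations vary, the following is only a sketch of one common variant and is not necessarily the algorithm in the diagram mentioned above. It assumes scikit-learn, a synthetic regression dataset from `make_regression`, and an arbitrary small hyperparameter grid. The inner loop (`GridSearchCV`) tunes hyperparameters on each outer training split, and the outer loop (`cross_val_score`) scores the tuned model on data the tuning never saw.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in for a descriptor matrix X and an endpoint y.
X, y = make_regression(n_samples=100, n_features=6, noise=10.0, random_state=0)

# Inner folds select hyperparameters; outer folds estimate performance.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

# The grid below is arbitrary and chosen only to keep the example fast.
search = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=30, random_state=0),
    param_grid={"max_depth": [3, None]},
    cv=inner_cv,
    scoring="r2",
)

# Each outer split refits the whole grid search on its training portion,
# so the held-out outer fold plays no part in hyperparameter selection.
outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
print(outer_scores, outer_scores.mean())
```

The contrast with plain cross-validation is that here the score is never computed on the same folds that were used to pick `max_depth`.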

Nested Cross-Validation & Cross-Validation Series – Part 1

by Zin

2y ago

A few people have asked me to explain and share the code for Nested Cross-Validation. I think it makes sense to explain the basics (the whats and whys of using NeCV) before diving into the code, so I will be covering these topics in four separate blog posts. For part 1, I will explain the algorithm for k-fold cross-validation. For part 2, I will explain how to implement the k-fold cross-validation algorithm in Python with tutorials using two cheminformatics datasets: (A) a simple dataset with descriptors and endpoints of interest, and (B) high-dimensional matrices where each mat ..read more
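
For readers who want the k-fold idea in concrete form, here is a minimal pure-Python sketch of the splitting step. The function name `k_fold_indices` and its parameters are made up for this illustration. The sample indices are shuffled once and cut into k folds, and each fold then serves as the test set exactly once while the remaining folds form the training set.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size

    # Each fold is held out once; the other k - 1 folds are used for training.
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(10, k=3))
for train, test in splits:
    print(len(train), len(test))
```

In practice, a model would be fit on each `train` list and evaluated on the matching `test` list, and the k scores would then be averaged.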

Nested Cross-Validation & Cross-Validation Series – Part 2A

by Zin

2y ago

This is part 2A of the Nested Cross-Validation & Cross-Validation Series. I will go through a Python tutorial on implementing k-fold CV regressors using random forest (RF) from scikit-learn with the first dataset: (A) a simple cheminformatics dataset with descriptors and endpoints of interest. In Part 2B, I will cover the same Python tutorial for the second dataset: (B) high-dimensional matrices where each matrix represents features of a chemical structure (this is taken from one of my Ph.D. projects; MD-QSAR with Imatinib derivatives). Please check out part 1 of this series to learn more ..read more
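
The tutorial's dataset is not included in this excerpt, so the sketch below uses a synthetic table from `make_regression` in place of the descriptors and endpoint. It shows what a k-fold CV loop with a scikit-learn random forest regressor could look like. The fold count, the number of trees, and the R² metric are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Synthetic stand-in: 120 "compounds", 8 "descriptors", one numeric endpoint.
X, y = make_regression(n_samples=120, n_features=8, noise=5.0, random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_r2 = []

for train_idx, test_idx in kfold.split(X):
    # A fresh model per fold, so nothing learned on one fold leaks into the next.
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    fold_r2.append(r2_score(y[test_idx], predictions))

mean_r2 = float(np.mean(fold_r2))
print(fold_r2, mean_r2)
```

Reporting the per-fold scores alongside the mean makes it easier to see how much the estimate depends on a particular split.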
