Visualizing trees with Sklearn
Hi! I am Nagdev
by Nagdev Amruthnath
2y ago
Tree-based models are probably the second easiest ML technique to explain to a non-data scientist. I am a big fan of tree-based models because of their simplicity and interpretability. Visualizing them, however, is what gets on my nerves: there are so many packages out there for it. Sklearn has finally provided us with a new API to visualize trees through matplotlib. In this tutorial, I will show you how to visualize trees using sklearn for both classification and regression.
Importing libraries
The following are the libraries that are required to load datasets ...
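A minimal sketch of the kind of visualization the teaser describes, assuming the matplotlib-based API in question is `sklearn.tree.plot_tree`. The iris dataset, the depth limit, and the output file name are illustrative choices, not taken from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Assumption: iris is used only as a convenient built-in classification dataset.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# plot_tree draws the fitted tree onto a matplotlib axis and returns
# the list of annotation artists it created (one per drawn node).
fig, ax = plt.subplots(figsize=(10, 6))
annotations = plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,  # color nodes by majority class
    ax=ax,
)
fig.savefig("iris_tree.png")  # hypothetical output file name
```

The same call should work for a fitted `DecisionTreeRegressor`; in that case `class_names` is simply left out.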
Multi-Output Regression using Sklearn
2y ago
Regression analysis is the process of building a linear or non-linear fit for one or more continuous target variables. That’s right, there can be more than one target variable. Multi-output machine learning problems are more common in classification than in regression. In classification, the categorical target variables are encoded to convert them to multi-output. In my professional experience, about 90% of data science regression problems have a single target variable, and the rest require fitting multiple target variables. Some applications for multi-output target var...
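To make the idea of more than one target variable concrete, here is a small sketch assuming sklearn's `LinearRegression`, which accepts a two-dimensional target array directly. The synthetic data from `make_regression` and all sizes are assumptions for illustration, not the setup of the original post.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic data with 5 features and 2 continuous targets.
X, Y = make_regression(
    n_samples=200, n_features=5, n_targets=2, noise=0.1, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Y_train has shape (n_samples, 2), so one model is fit for both targets.
model = LinearRegression().fit(X_train, Y_train)
Y_pred = model.predict(X_test)

print(Y_pred.shape)       # one column of predictions per target
print(model.coef_.shape)  # one row of coefficients per target
```

For estimators that do not handle multiple targets natively, sklearn also offers the `MultiOutputRegressor` wrapper, which fits one copy of the estimator per target.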
Sentiment Analysis on Reddit using R
3y ago
According to Wikipedia, Reddit is an American social news aggregation, web content rating, and discussion website. Registered members submit content to the site such as links, text posts, images, and videos, which are then voted up or down by other members. Posts are organized by subject into user-created boards called “communities” or “subreddits”, which cover a variety of topics such as news, politics, religion, science, movies, video games, music, books, sports, fitness, cooking, pets, and image-sharing. In my view, Reddit is an almost anonymous social media platf...
Big Data Ignite 2020 Webinar Series
3y ago
Big Data Ignite (BDI) was born out of a shared vision: to foster a local center of excellence in advanced computing technologies and practice. After initial success in organizing local Meetup groups, co-founders Elliott and Tuhin realized that to achieve their goal, the scope and scale of activism would need to grow. So, in 2016, the Big Data Ignite conference was launched! In its first year, the conference attracted over 30 presenters and over 200 participants. The two-day conference was designed to provide hands-on skills in data-related tools and techniques, as well as a practical overview ...
Benford’s Law: Applying to Existing Data
3y ago
Benford’s Law is one of the most underrated yet widely used techniques across various applications. The United States IRS neither confirms nor denies its use of Benford’s Law to detect number manipulation in income tax filings. Across the Atlantic, the EU is very open and proudly claims its use of Benford’s Law. Today, it is widely used in accounting to detect fraud. Nigrini, a professor at the University of Cape Town, also used this law to identify financial discrepancies in Enron’s financial statements. In another case, Jennifer Golbeck, a professor at the Univer...
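For readers unfamiliar with the law itself: Benford’s Law states that the leading digit d of many naturally occurring numbers appears with probability log10(1 + 1/d). The sketch below compares that expectation with the observed leading digits of a data set. The powers of two are used only as a stand-in data set known to follow the law; they are an assumption for illustration and not the data analyzed in the original post.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_shares(values):
    """Observed share of each leading digit 1..9 in a list of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = len(values)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Assumption: 2**1 .. 2**1000 serve as illustrative data.
data = [2 ** n for n in range(1, 1001)]
expected = benford_expected()
observed = leading_digit_shares(data)

for d in range(1, 10):
    print(f"digit {d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

In a fraud-detection setting, the same comparison would be run on the leading digits of reported amounts, with large deviations from the expected shares flagged for review.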
How to use CI/CD for your ML Projects?
4y ago
The term CI/CD stands for Continuous Integration and Continuous Delivery (or Deployment). Before we jump into how all of this works, let’s take a step back and walk through the process of ML. Most data scientists do their data analytics on their laptops. Every data analytics project involves various steps, and the most common ones are as follows:
1. Data collection
2. Feature extraction
3. Data cleaning and pre-processing
4. Data validation
5. Model building
6. Model testing
7. Model deployment
In most cases, each of these steps is performed by a different team member. Any changes ...
Will Netflix Renew the Show?
4y ago
In the last couple of years, Netflix has become a part of my lifestyle. At the end of my day, when I turn on my TV, I check out Netflix by default. I always look forward to Fridays, when they release their original content, and make sure I binge it by the end of my weekend. My wife and I recently binged their reality TV show called “Indian Matchmaking”. Honestly, it was binge-worthy. A couple of friends and I have been talking quite a lot about this show and season 2. We have also been following the cast on social media. During our conversation, I got curious to see if Netflix would renew the ...
Why balancing your data set is important?
4y ago
In the real world, it’s not uncommon to come across unbalanced data sets where you might have class A with 90 observations and class B with 10 observations. One of the rules in machine learning is that it’s important to balance the data set, or at least get it close to balanced. The main reason for this, in layman’s terms, is to give equal priority to each class. Let’s consider the above example, where we had class A with 90 observations and class B with 10 observations. If we predict the entire data set as class A, we will achieve an accuracy of 90%, which seems really not bad for a classification mod...
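The 90/10 example above can be worked through in a few lines. This sketch uses plain Python with the class counts from the text; the per-class recall and balanced accuracy figures are added here as one common way to expose the problem and are not necessarily the metrics used in the original post.

```python
# 90 observations of class A and 10 of class B, as in the example.
y_true = ["A"] * 90 + ["B"] * 10

# A "model" that predicts class A for every observation.
y_pred = ["A"] * len(y_true)

def accuracy(truth, pred):
    """Share of observations predicted correctly."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def recall(truth, pred, label):
    """Share of observations of the given class that were predicted correctly."""
    hits = sum(t == p for t, p in zip(truth, pred) if t == label)
    return hits / truth.count(label)

acc = accuracy(y_true, y_pred)
recall_a = recall(y_true, y_pred, "A")
recall_b = recall(y_true, y_pred, "B")
balanced_acc = (recall_a + recall_b) / 2  # mean of per-class recall

print(acc, recall_a, recall_b, balanced_acc)  # 0.9 1.0 0.0 0.5
```

Accuracy looks good at 90%, yet not a single class B observation is identified, which is why accuracy alone can mislead on unbalanced data.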
Data Science Application in Manufacturing
4y ago
Last week, I had a great opportunity to give a talk on data science applications in manufacturing at Acharya Institute of Technology (AIT), Bangalore. As an alumnus, I hold a special place for AIT in my heart. The many curious young minds who attended my session had great questions. Some of the highlights of the Q&A session are:
Questions
What is the difference between a Data Scientist and a Data Analyst?
A data analyst works on combining data from different sources, performing data discovery, creating schemas, verifying and validating data consistency, and providing data reports. They also perform visualization t...
Testing the Effect of Data Imputation on Model Accuracy
4y ago
Most of us have come across situations where we do not have enough data for building reliable models, for reasons such as the expense of collecting data (human studies), limited resources, or a lack of historical data (earthquakes). Before we begin talking about how to overcome the challenge, let’s first talk about why we need a minimum number of samples before we even consider building a model. First of all, we can build a model with few samples. It is definitely possible! But as the number of samples decreases, the margin of error increases, and vice versa. If you want to bui...
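The relationship between sample size and margin of error can be illustrated with the textbook formula for the margin of error of a sample mean, z * s / sqrt(n). This is a sketch under stated assumptions: a normal approximation, a 95% confidence level (z = 1.96), and a made-up standard deviation of 1.0. The original post may use a different formulation.

```python
import math

def margin_of_error(std_dev, n, z=1.96):
    """Margin of error of a sample mean under a normal approximation.

    std_dev: sample standard deviation (assumed value for illustration)
    n:       number of samples
    z:       z-score for the confidence level (1.96 is roughly 95%)
    """
    return z * std_dev / math.sqrt(n)

# Assumption: standard deviation fixed at 1.0 so only n varies.
for n in (10, 25, 100, 400, 1600):
    print(f"n = {n:5d}  margin of error = {margin_of_error(1.0, n):.3f}")
```

Because of the square root, quadrupling the number of samples only halves the margin of error, so small data sets pay a steep price in uncertainty.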
