06 How to Become a Data Scientist
Machine Learning with Coffee
by Gustavo Lujan
2y ago
We talk about what it takes to become a Data Scientist and discuss 4 prerequisites to have in place before you start preparing. Finally, we recommend 3 online courses that, if mastered, will put you ahead of 90% of the Data Scientists out there.
14 XGBoost: The Winner of Many Competitions
Machine Learning with Coffee
by Gustavo Lujan
2y ago
XGBoost is an open-source software library that has won several Machine Learning competitions on Kaggle. It is based on the principles of gradient boosting, which builds on the ideas of Leo Breiman, the creator of Random Forest. The theory behind gradient boosting was later formalized by Jerome H. Friedman. Like Random Forest, gradient boosting combines weak learners. XGBoost is an engineering implementation that adds a clever penalization of trees and a proportional shrinking of leaf nodes.
13 Random Forest
Machine Learning with Coffee
by Gustavo Lujan
2y ago
Random Forest is one of the best off-the-shelf algorithms. In this episode we build the intuition behind Random Forest and how it leverages the capabilities of Decision Trees by aggregating them with a very smart trick called “bagging”. Variable importance and out-of-bag error are two nice capabilities of Random Forest: the first lets us find the most important predictors, and the second gives a good estimate of the generalization error.
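As a rough illustration of the bagging idea, the sketch below draws one bootstrap sample of row indices and collects the rows that were never drawn, which are the "out-of-bag" rows a tree could be scored on. The function name `bootstrap_split`, the row count and the seed are assumptions made for this example, and no trees are actually trained.

```python
import random


def bootstrap_split(n_rows, rng):
    """Draw n_rows indices with replacement; return (in_bag, out_of_bag)."""
    # Sampling with replacement means some rows repeat and others are skipped.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    # Rows never drawn are "out-of-bag" for the tree grown on this sample.
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the run is repeatable
in_bag, out_of_bag = bootstrap_split(1000, rng)
oob_fraction = len(out_of_bag) / 1000
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

For large samples the out-of-bag fraction tends toward 1/e, roughly 37%, so each tree in the forest leaves a sizeable held-out set available for error estimation without a separate validation split.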
18 PCA: Principal Component Analysis
Machine Learning with Coffee
by Gustavo Lujan
2y ago
We discuss Principal Component Analysis, one of the most popular techniques for reducing the dimensionality of a dataset. PCA helps us be more efficient in the number of variables we feed to our machine learning models.
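To make the dimensionality reduction concrete, here is a minimal sketch that reduces two variables to one. It assumes only two input variables so that the leading direction of the covariance matrix can be written in closed form; the function name `first_component_2d` and the data are made up for illustration, and real datasets would normally go through a linear algebra library instead.

```python
from math import atan2, cos, sin
from statistics import mean


def first_component_2d(points):
    """Return the first principal direction of 2-D points and the 1-D scores."""
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    centered = [(x - mx, y - my) for x, y in points]
    n = len(points)
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the axis of maximum variance (leading eigenvector).
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    direction = (cos(theta), sin(theta))
    # Project each centered point onto that axis: two variables become one.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores


points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, scores = first_component_2d(points)
print(direction, scores)
```

Because these points lie exactly on a line, the single score per point keeps all of the variation in the data, which is the situation where dropping the second component costs nothing.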
17 Anomaly Detection: Clustering
Machine Learning with Coffee
by Gustavo Lujan
2y ago
We present 3 clustering algorithms that help us detect anomalies: DBSCAN, Gaussian Mixture Models and K-means. All 3 are popular, basic algorithms that have stood the test of time. Each has many variations that try to overcome some of the disadvantages of the original implementation.
12 Decision Trees
Machine Learning with Coffee
by Gustavo Lujan
2y ago
We talk about Decision Trees, one of the most basic statistical learning algorithms and one that every Data Scientist should know. Decision Trees are among the few machine learning models that are easy to interpret, which makes them a favorite when we need to understand the logic behind a certain decision. They naturally handle all types of variables without dummy variables, need no scaling or normalization, and are very robust against outliers.
11 Inferential Statistics
Machine Learning with Coffee
by Gustavo Lujan
2y ago
We talk about the importance of inferential statistics in Data Science. Inferential statistics is a set of techniques used to make generalizations about a population from a sample. One of its tools is hypothesis testing. In this episode we give a couple of examples of when and why to use 1-sample and 2-sample t-tests. We also argue that the mean of a sample means nothing unless we also consider the variation of the data.
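The point about means and variation can be seen with a small sketch. It computes a 2-sample t statistic in Welch's form (unequal variances), which is one common variant and not necessarily the one used in the episode; the function name `welch_t` and all four samples are invented for the example. Both pairs of samples differ in mean by exactly 1, but only the low-variation pair yields a large statistic.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic (Welch's form, unequal variances)."""
    # Standard error of the difference between the two sample means.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Same difference in means (1.0) in both comparisons, very different spread.
tight_a = [10.1, 9.9, 10.0, 10.2, 9.8]
tight_b = [11.1, 10.9, 11.0, 11.2, 10.8]
noisy_a = [4.0, 16.0, 10.0, 13.0, 7.0]
noisy_b = [5.0, 17.0, 11.0, 14.0, 8.0]

t_tight = welch_t(tight_a, tight_b)
t_noisy = welch_t(noisy_a, noisy_b)
print(f"tight samples: t = {t_tight:.2f}")
print(f"noisy samples: t = {t_noisy:.2f}")
```

The statistic divides the difference in means by a standard error built from the sample variances, so the same gap of 1 is strong evidence in the tight samples and nearly uninformative in the noisy ones.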
03 What is Machine Learning?
Machine Learning with Coffee
by Gustavo Lujan
2y ago
We define Machine Learning and related areas such as artificial intelligence, business analytics, business intelligence and Big Data. These are not academic definitions extracted from books; they are real-world concepts as I see them. We discuss the similarities, differences and overlap between these sometimes confusing terms, which people tend to misuse.
09 Regularization to Deal with Overfitting
Machine Learning with Coffee
by Gustavo Lujan
2y ago
In this episode we talk about regularization, an effective technique for dealing with overfitting by reducing the variance of the model. Two techniques are introduced: ridge regression and lasso. The latter is effectively a feature selection algorithm.
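One way to see why lasso acts as feature selection is the special case of orthonormal predictors, where both penalties have closed-form solutions in terms of the ordinary least-squares coefficients. The sketch below assumes that special case and the objectives (1/2)·RSS + (lam/2)·‖b‖² for ridge and (1/2)·RSS + lam·‖b‖₁ for lasso; the function names and coefficient values are made up for illustration.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under orthonormal predictors: scale every coefficient down."""
    return [b / (1.0 + lam) for b in beta_ols]


def lasso_shrink(beta_ols, lam):
    """Lasso under orthonormal predictors: soft-threshold every coefficient."""
    shrunk = []
    for b in beta_ols:
        # Move the coefficient toward zero by lam, stopping at zero.
        magnitude = max(abs(b) - lam, 0.0)
        shrunk.append(magnitude if b >= 0 else -magnitude)
    return shrunk


beta_ols = [3.0, -0.4, 0.2, -2.0]  # hypothetical least-squares coefficients
ridge = ridge_shrink(beta_ols, lam=0.5)
lasso = lasso_shrink(beta_ols, lam=0.5)
print("ridge:", ridge)
print("lasso:", lasso)
```

Ridge shrinks every coefficient but leaves all of them nonzero, while lasso sets the two small coefficients exactly to zero, which amounts to dropping those features from the model.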
02 My Personal Journey: How I Became a Data Scientist
Machine Learning with Coffee
by Gustavo Lujan
2y ago
In this episode I talk about my personal journey and how I became a Data Scientist. I start with how I decided to go to college, which major to choose, and how I chose my master’s degree. I talk about my time pursuing a PhD in Engineering and the most useful classes I took related to machine learning and data science. Finally, I briefly talk about my job experience as a Data Scientist.
