
Machine Learning with Coffee
Machine Learning with Coffee is a podcast where we share ideas about Machine Learning and related areas such as artificial intelligence, business intelligence, business analytics, data mining, and Big Data. The objective is to promote a healthy discussion on the current state of this fascinating world of Machine Learning. We will be sharing our experience, sharing tricks,..
1y ago
We talk about what it takes to become a Data Scientist. We also discuss 4 prerequisites to have in place before preparing yourself to become a Data Scientist. Finally, we recommend 3 online courses that, if mastered, will put you above 90% of all Data Scientists out there.
1y ago
XGBoost is an open-source software library that has been used to win several Machine Learning competitions on Kaggle. It is based on the principles of gradient boosting, which build on the ideas of Leo Breiman, the creator of Random Forest. The theory behind gradient boosting was later formalized by Jerome H. Friedman. Like Random Forest, gradient boosting combines weak learners. XGBoost is an engineering implementation which includes a clever penalization of trees and a proportional shrinking of leaf nodes.
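To make the idea of combining weak learners concrete, here is a minimal sketch of plain gradient boosting for squared error, written with the Python standard library only. It is not XGBoost and leaves out its tree penalization; the function names, the toy data, and the `learning_rate` value (a simple shrinkage factor) are assumptions made for illustration.

```python
# Minimal gradient boosting for squared error with one-split "stumps".
# Illustrative sketch only: not XGBoost, and without its regularization terms.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def boost(xs, ys, rounds=20, learning_rate=0.3):
    """Fit stumps to the current residuals and add them with shrinkage."""
    base = sum(ys) / len(ys)            # start from the mean prediction
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]
        stumps.append((t, lv, rv))
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history

# Hypothetical toy data with three rough levels.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 6.2, 5.8]
base, stumps, mse_history = boost(xs, ys)
```

Each stump on its own is a poor model, but the training error in `mse_history` keeps falling as more of them are added.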
1y ago
Random Forest is one of the best off-the-shelf algorithms. In this episode we try to understand the intuition behind Random Forest and how it leverages the capabilities of Decision Trees by aggregating them using a very smart trick called “bagging”. Variable importance and out-of-bag error are two of the nice capabilities of Random Forest, which allow us to find the most important predictors and compute a good estimate of the generalization error, respectively.
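The link between bagging and the out-of-bag error can be sketched in a few lines. The snippet below only draws one bootstrap sample and lists the rows left out of it; it does not train any trees, and the helper name and row count are hypothetical.

```python
import random

def bootstrap_split(n_rows, rng):
    """Draw a bootstrap sample of row indices; return (in_bag, out_of_bag)."""
    # Sample n_rows indices with replacement: this is the "bag" for one tree.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Rows never drawn are "out of bag" and can serve as a validation set for that tree.
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)      # fixed seed so the sketch is repeatable
n_rows = 1000
in_bag, out_of_bag = bootstrap_split(n_rows, rng)
oob_fraction = len(out_of_bag) / n_rows
```

For large datasets roughly a third of the rows (about 1/e) end up out of bag for any given tree, which is why the out-of-bag error comes almost for free.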
1y ago
We discuss Principal Component Analysis (PCA), one of the most popular techniques for reducing the dimensionality of a dataset. PCA helps us be more efficient in terms of the number of variables we feed to our machine learning models.
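As a rough illustration of what reducing dimensionality means, the sketch below finds the first principal component of a small two-variable dataset using power iteration on the covariance matrix. The data and function name are made up for the example, and the fixed starting vector assumes the two variables are positively correlated.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (direction, explained_variance_ratio, scores) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    v = (1.0, 1.0)
    for _ in range(iterations):
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    # Variance along v, relative to the total variance.
    eigenvalue = v[0] * (sxx * v[0] + sxy * v[1]) + v[1] * (sxy * v[0] + syy * v[1])
    explained = eigenvalue / (sxx + syy)
    # One score per row: the two original variables collapsed into one.
    scores = [x * v[0] + y * v[1] for x, y in centered]
    return v, explained, scores

# Hypothetical data where the second variable is roughly twice the first.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, explained_ratio, scores = first_principal_component(points)
```

Here a single component captures almost all of the variance, so `scores` can replace the two original variables with little loss of information.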
1y ago
We present 3 clustering algorithms which help us detect anomalies: DBSCAN, Gaussian Mixture Models and K-means. These 3 algorithms are very popular and basic, but they have stood the test of time. All of them have many variations which try to overcome some of the disadvantages of the original implementation.
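Of the three, DBSCAN has the most direct notion of an anomaly: a point that is neither in a dense region nor next to one. The sketch below computes only that noise set and skips the cluster assignment itself; the `eps` and `min_pts` values and the sample points are assumptions chosen for the example.

```python
import math

def dbscan_noise(points, eps, min_pts):
    """Return indices of the points that DBSCAN would label as noise."""
    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    # Core points have at least min_pts neighbors.
    core = {i for i in range(len(points)) if len(neighbors(i)) >= min_pts}
    noise = []
    for i in range(len(points)):
        if i in core:
            continue
        # A non-core point near a core point is a border point, not noise.
        if not any(j in core for j in neighbors(i)):
            noise.append(i)
    return noise

# Two tight groups plus one isolated point (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.1, 5.1),
          (10.0, 0.0)]
anomalies = dbscan_noise(points, eps=0.5, min_pts=3)
```

Only the isolated point at index 8 is flagged. The result depends heavily on `eps` and `min_pts`, so in practice both need tuning for the data at hand.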
1y ago
We talk about Decision Trees, one of the most basic statistical learning algorithms, which every Data Scientist should know. Decision Trees are one of the few machine learning models which are easy to interpret, which makes them a favorite when we want to understand the logic behind a certain decision. Decision Trees naturally handle all types of variables without the need to create dummy variables, they need no scaling or normalization, and they are also very robust against outliers.
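A single split of a Decision Tree is enough to see why scaling and outliers matter so little. The sketch below searches for the threshold with the lowest weighted Gini impurity on one numeric variable; the variable names and data are invented for the example, and a full tree would simply repeat this search recursively.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'."""
    best = (None, gini(labels))             # baseline: no split at all
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2                   # midpoint between neighboring values
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

ages = [22, 25, 30, 35, 41, 48, 52, 950]    # the last value is an extreme outlier
bought = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)

# Rescaling the variable moves the threshold but not the resulting partition.
scaled_threshold, scaled_impurity = best_split([a / 10 for a in ages], bought)
```

The split depends only on the ordering of the values, so the outlier at 950 does not move the threshold, and dividing every age by 10 yields the same partition of the rows.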
1y ago
We talk about the importance of inferential statistics in Data Science. Inferential statistics is a set of techniques used to make generalizations about a population from a sample. One of the tools used in inferential statistics is hypothesis testing. In this episode we provide a couple of examples of when and why to use 1-sample t-tests and 2-sample t-tests. We also argue that the mean or average of a sample means nothing if we do not also consider the variation of the data.
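The point about variation can be checked with a small calculation. The sketch below computes only the t statistics, not p-values, using Python's `statistics` module; the 2-sample version is Welch's unequal-variance form, which is our choice for the example, and all sample values are made up.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for testing whether the sample mean differs from mu0."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / standard_error

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

# Both pairs of samples differ by exactly 1 in their means.
tight_a = [9.9, 10.0, 10.1, 10.0, 10.0]
tight_b = [10.9, 11.0, 11.1, 11.0, 11.0]
noisy_a = [5.0, 15.0, 8.0, 12.0, 10.0]
noisy_b = [6.0, 16.0, 9.0, 13.0, 11.0]

t_tight = welch_t(tight_a, tight_b)     # large in magnitude: low variation
t_noisy = welch_t(noisy_a, noisy_b)     # close to zero: high variation
t_one = one_sample_t(noisy_a, 9.0)      # sample mean 10 against a hypothesized 9
```

The difference in means is the same in both comparisons, yet the tight samples give a t statistic far from zero while the noisy samples give one close to zero. Looking at the means alone would hide that.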
1y ago
We provide definitions of Machine Learning and related areas such as artificial intelligence, business analytics, business intelligence and Big Data. These are not academic definitions extracted from books; they are real-world concepts as I see them. We discuss the similarities, differences and overlap between all these sometimes confusing terms, which people tend to misuse.
1y ago
In this episode we talk about regularization, an effective technique to deal with overfitting by reducing the variance of the model. Two techniques are introduced: ridge regression and lasso. The latter is effectively a feature selection algorithm.
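Why lasso acts as a feature selector while ridge does not can be seen in the simplest possible setting. The sketch below assumes orthonormal predictors, so each coefficient can be penalized on its own starting from its least-squares value; the coefficient values and the penalty `lam` are hypothetical.

```python
def ridge_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + 0.5*lam*beta**2: proportional shrinkage."""
    return b / (1.0 + lam)

def lasso_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + lam*abs(beta): soft-thresholding."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0      # small coefficients are set exactly to zero

ols_coefs = [3.0, -1.5, 0.4, -0.2]      # hypothetical least-squares coefficients
lam = 0.5
ridge = [ridge_shrink(b, lam) for b in ols_coefs]
lasso = [lasso_shrink(b, lam) for b in ols_coefs]
```

Ridge pulls every coefficient toward zero but never reaches it, whereas lasso sets the two small coefficients exactly to zero, which amounts to dropping those features from the model.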
1y ago
In this episode I talk about my personal journey and how I became a Data Scientist. I start by talking about how I decided to go to college, which major I chose, and how I chose my master’s degree. I talk about my time pursuing a PhD in Engineering and the most useful classes I took related to machine learning and data science. Finally, I briefly talk about my job experience as a Data Scientist.