Vincent Granville Blog
Vincent Granville's articles, tutorials, and eBooks about machine learning, data science, mathematics, and related topics. Written in simple English.
2y ago
A different way to do regression with prediction intervals. In Python and without math. No calculus, no matrix algebra, no statistical engineering, no regression coefficients, no bootstrap. Multivariate and highly non-linear. Interpretable and illustrated on synthetic data.
For years, I have developed machine learning techniques that barely use any mathematics. I view it as a sport. Not that I don’t know mathematics; quite the contrary. I believe you must be very math-savvy to achieve such results. This article epitomizes math-free machine lear…
2y ago
This self-published book is dated July 2020 according to Amazon, but it appears to be an ongoing project. As with many new books, the material is on GitHub. The most recent version, dated June 2021, is available in PDF format.
This is not a traditional book. It feels like a repository of Python code, printed on paper if you buy the print version. The associated GitHub repository is much more useful if you want to reuse the code with a simple copy and paste. It covers many topics and performance metrics, with an emphasis on computer vision problems. The code is documented in detail. The code re…
2y ago
Here I share my new article on synthetic data and interpretable machine learning. It shows you how to set up such an environment. I also mention three popular books published in the last three months. The figure below is from the first article featured in this newsletter.
Article: synthetic data and interpretable machine learning. This first article in a new series on synthetic data and explainable AI focuses on making linear regression more meaningful and controllable. It includes synthetic data, advanced machine learning with Excel, combinatorial feature selection, parametric bootstrap…
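To make the "controllable" aspect concrete, here is a minimal sketch that is not taken from the article: generate synthetic data from a known line y = a + b·x plus Gaussian noise, then check that ordinary least squares recovers a and b. Because the ground truth is known, the fit can be judged exactly. The helper names `make_synthetic` and `fit_line` and all parameter values are hypothetical.

```python
import random

def make_synthetic(n, intercept, slope, noise_sd, seed=0):
    """Hypothetical helper: sample (x, y) pairs from y = intercept + slope*x + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares with one feature, using centered sums."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Known ground truth: intercept 2.0, slope 0.5, noise standard deviation 1.0.
xs, ys = make_synthetic(n=5000, intercept=2.0, slope=0.5, noise_sd=1.0)
a_hat, b_hat = fit_line(xs, ys)
print(f"true (2.0, 0.5)  estimated ({a_hat:.3f}, {b_hat:.3f})")
```

Changing `noise_sd` or `n` and rerunning shows directly how estimation quality depends on the data-generating settings.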
2y ago
Here I share my roadmap for the next 12 months. While I am also looking for external contributors and authors to add more variety, my focus — as far as my technical content is concerned — is to complete the following projects and publish the material on this platform.
Summary
All my blog posts will be available to everyone. Some technical papers (in PDF format) may be offered to subscribers only (you can subscribe here). My plan is to also produce books focusing on specific topics, covering material from several articles in a self-contained unified package. They will be available on our e-St…
2y ago
All statistical textbooks focus on centrality (median, average or mean) and volatility (variance). None mention the third fundamental class of metrics: bumpiness.
Here we introduce the concept of bumpiness and show how it can be used. Two different datasets can have the same mean and variance, but a different bumpiness. Bumpiness is linked to how the data points are ordered, while centrality and volatility completely ignore order. So bumpiness is useful for datasets where order matters, in particular time series. Also, bumpiness integrates the notion…
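The excerpt stops before the formal definition of bumpiness, so the sketch below uses a hypothetical proxy rather than the author's metric: the mean squared difference between consecutive values, divided by twice the variance. It only illustrates the claim that reordering the same values leaves mean and variance unchanged while changing an order-dependent measure. The function name `bumpiness_proxy` is an assumption.

```python
import statistics

def bumpiness_proxy(series):
    """Hypothetical proxy (NOT the author's definition):
    mean squared successive difference / (2 * population variance).

    Close to 0 for a smooth trend, around 1 for an uncorrelated sequence,
    and larger for a sequence that zig-zags up and down.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    mssd = sum(d * d for d in diffs) / len(diffs)
    return mssd / (2 * statistics.pvariance(series))

smooth = [1, 2, 3, 4, 5, 6, 7, 8]
zigzag = [1, 8, 2, 7, 3, 6, 4, 5]   # same values, different order

for name, s in (("smooth", smooth), ("zigzag", zigzag)):
    # Mean and variance are identical for both; only the proxy differs.
    print(name, statistics.mean(s), statistics.pvariance(s), round(bumpiness_proxy(s), 3))
```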
2y ago
I'm talking about streaming data displayed as video rather than in chart format, for instance 200 scatter plots continuously updated, as in my recent video series "From Chaos to Clusters", which consists of three parts:
video clip #1 (fast)
video clip #2 (bubbles)
video clip #3 (molecules)
In this article, I explain and illustrate how to produce these videos. You don't need to be a data scientist to understand it.
Here's one frame from one version of video clip #3.
Here's the solution:
1. Produce the data that you want to visualize
Using Python, R, Perl, Excel, SAS or any other…
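Step 1 can be sketched in Python as follows. This is a hypothetical example, not the code behind the videos: 200 points start at random positions and drift toward a few fixed cluster centers, and each CSV row records one point in one frame (columns `frame`, `point`, `x`, `y`). The function name, the centers, and the drift parameters are all assumptions. A plotting tool can then render one scatter plot per frame, and a video tool can assemble the images.

```python
import csv
import io
import random

def make_frames(n_points=200, n_frames=50,
                centers=((0.2, 0.3), (0.7, 0.8), (0.8, 0.2)), step=0.08, seed=1):
    """Hypothetical 'chaos to clusters' data: each point drifts toward its assigned center."""
    rng = random.Random(seed)
    pts = [[rng.random(), rng.random()] for _ in range(n_points)]
    targets = [centers[i % len(centers)] for i in range(n_points)]
    rows = []
    for frame in range(n_frames):
        for i, (p, (cx, cy)) in enumerate(zip(pts, targets)):
            rows.append((frame, i, p[0], p[1]))   # record the position shown in this frame
            # Move a fraction of the way toward the center, plus a little jitter.
            p[0] += step * (cx - p[0]) + rng.gauss(0.0, 0.005)
            p[1] += step * (cy - p[1]) + rng.gauss(0.0, 0.005)
    return rows

rows = make_frames()
buf = io.StringIO()   # in practice, open a real file such as "frames.csv"
writer = csv.writer(buf)
writer.writerow(["frame", "point", "x", "y"])
writer.writerows(rows)
print(len(rows), "rows")
```

Keeping the data in a flat, one-row-per-point-per-frame table makes it easy to hand over to any of the tools listed above.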
2y ago
Learn advanced machine learning techniques using Excel. No coding required.
It is amazing what you can do with a simple tool such as Excel. In this series, I share some of my spreadsheets. They cover many topics, including multiple types of regression, model-free confidence intervals, resampling, an original technique known as hidden decision trees, scatter plots with multiple groups, advanced visualization techniques, and more.
No plug-in is required. I don't use macros, pivot tables or any advanced Excel feature. In part 1 (this article), I cover the following techniques:
…
2y ago
The new variance introduced in this article fixes two big data problems associated with the traditional variance and the way it is computed in Hadoop, using a numerically unstable formula.
Synthetic Metrics
This new metric is synthetic: It was not derived naturally from mathematics like the variance taught in any statistics 101 course, or the variance currently implemented in Hadoop (see above picture). By synthetic, I mean that it was built to address issues with big data (outliers) and the way many big data computations are now done: Map Reduce framework, Hadoop be…
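The excerpt does not give the new synthetic variance itself, so the sketch below only illustrates the numerical problem it refers to, using standard textbook techniques rather than the author's metric. The one-pass formula Var = E[x^2] - (E[x])^2 suffers from catastrophic cancellation when the mean is large relative to the spread. A well-known stable alternative that still fits Map Reduce is to have each mapper compute (count, mean, M2) with Welford's update and to merge the partial results with the pairwise formula of Chan, Golub and LeVeque. The chunking below merely simulates mappers.

```python
def naive_variance(xs):
    """One-pass 'sum of squares' formula Var = E[x^2] - (E[x])^2 (numerically unstable)."""
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return s2 / n - (s / n) ** 2

def welford(xs):
    """Mapper side: return (count, mean, M2), where M2 is the sum of squared deviations."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2

def merge(a, b):
    """Reducer side: combine two partial (count, mean, M2) triples (Chan et al.)."""
    na, ma, m2a = a
    nb, mb, m2b = b
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2

# Huge offset, tiny spread: the true population variance is 22.5.
data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)] * 1000
chunks = [data[i:i + 500] for i in range(0, len(data), 500)]   # one chunk per simulated mapper
parts = [welford(c) for c in chunks]
total = parts[0]
for part in parts[1:]:
    total = merge(total, part)
n, mean, m2 = total
print("naive: ", naive_variance(data))
print("stable:", m2 / n)
```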
2y ago
The book is available on our e-store, here. View the table of contents, bibliography, index, list of figures and exercises here on my GitHub repository. To view the full list of books, visit MachineLearningRecipes.com. To receive updates about our future books, sign up here.
Introduction
This scratch course on stochastic processes covers significantly more material than usually found in traditional books or classes. The approach is original: I introduce a new yet intuitive type of random structure called perturbed lattice or Poisson-binomial process, as the gateway to all the stochastic process…
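A minimal one-dimensional simulation of a perturbed lattice, assuming the form X_k = k/λ + s·U_k, where the U_k are independent and uniform on [-1/2, 1/2), λ is the intensity and s is the perturbation scale. The book's exact parameterization and choice of distribution may differ, and the function names are hypothetical. With s = 0 the points sit exactly on the lattice; as s grows, the counts per interval become increasingly random.

```python
import random

def poisson_binomial_1d(lam, s, n_min=-50, n_max=50, seed=0):
    """Hypothetical sketch of a 1-D perturbed lattice: X_k = k/lam + s*U_k.

    U_k ~ Uniform(-1/2, 1/2) i.i.d.; one point per lattice index k in [n_min, n_max].
    """
    rng = random.Random(seed)
    return sorted(k / lam + s * rng.uniform(-0.5, 0.5) for k in range(n_min, n_max + 1))

def counts_in_unit_intervals(points, a=-10, b=10):
    """Number of points in each interval [j, j+1) for j = a, ..., b-1."""
    counts = [0] * (b - a)
    for x in points:
        j = int(x // 1) - a   # index of the unit interval containing x
        if 0 <= j < b - a:
            counts[j] += 1
    return counts

lattice = poisson_binomial_1d(lam=2.0, s=0.0)     # no perturbation
perturbed = poisson_binomial_1d(lam=2.0, s=3.0)   # strong perturbation
print(counts_in_unit_intervals(lattice))    # exactly 2 points per unit interval
print(counts_in_unit_intervals(perturbed))  # counts fluctuate around 2
```

The counting window [-10, 10) is kept well inside the simulated range so that the finite number of lattice points does not create edge effects.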
2y ago
Confidence interval is abbreviated as CI. In this new article (part of our series on robust techniques for automated data science) we describe an implementation in both Excel and Perl, and discuss our popular model-free confidence interval technique introduced in our original Analyticbridge article, as part of our (open source) intellectual property sharing. This technique has the following advantages:
Very easy to understand by non-statisticians (business analysts, software engineers, programmers, data architects)
Simple (if not basic) to code; no need to use tables of Gaussi…
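The excerpt cuts off before describing the technique itself, so as a hedged stand-in here is a plain percentile bootstrap. It is a different, well-known model-free method, not the author's, but it shares the advantages listed above: no Gaussian tables and only a few lines of code. The function name and default parameters are assumptions.

```python
import random
import statistics

def percentile_bootstrap_ci(data, stat=statistics.fmean, level=0.90, n_resamples=2000, seed=0):
    """Percentile bootstrap CI: resample with replacement, then read off empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    k = int(round((1.0 - level) / 2.0 * n_resamples))   # number of estimates cut from each tail
    return estimates[k], estimates[n_resamples - 1 - k]

rng = random.Random(42)
sample = [rng.expovariate(1.0) for _ in range(300)]   # skewed data; the true mean is 1.0
lo, hi = percentile_bootstrap_ci(sample)
print(f"mean = {statistics.fmean(sample):.3f}, 90% CI = [{lo:.3f}, {hi:.3f}]")
```

Because `stat` is a parameter, the same function gives an interval for the median or any other estimator, for example `percentile_bootstrap_ci(sample, stat=statistics.median)`.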