Many people these days are fascinated by deep learning, as it has enabled new capabilities in many areas, particularly in computer vision. Deep nets are, however, black boxes, and most people have no idea how they work (and frankly most of us scientists trained in the field can't tell exactly how they work either). But the success of deep learning and a set of its surprising failure modes teach us a valuable lesson about the data we process.
In this post I will present a perspective on what deep learning actually enables, how it relates to classical computer vision (which is far from dead) and what the potential dangers of relying on DL for critical applications are.
The vision problem
First of all, some things need to be said about the problem of vision/computer vision. In principle it could be formulated as follows: given an image from a camera, allow the computer to answer questions about the contents of that image. Such questions can range from "is there a triangle in the image" or "is there a human face in the image" to more complex instances such as "is there a dog chasing a cat in the image". Although many of …
Once upon a time, in the 1980s, there was a magical place called Silicon Valley. Wonderful things were about to happen there and many people were about to make a ton of money. These things were all related to the miracle of the computer and how it would revolutionize pretty much everything.
Computers had a ton of applications in front of them: completely overhauling office work, enabling entertainment via computer games and changing the way we communicate, shop and use the banking system. But back then they were clumsy, slow and expensive. And although the hope was there, many of these things wouldn't be accomplished unless computers somehow got orders of magnitude faster and cheaper.
But there was Moore's law: over the decade of the 1970s, the number of transistors in an integrated circuit doubled every ~18 months. If this law were to hold, the future would be rosy and beautiful. The applications the markets were waiting for would be unlocked. Money was to be made.
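As a quick back-of-the-envelope check, here is what that rate compounds to over a decade, taking the ~18-month doubling period quoted above at face value (the constant names are just illustrative):

```python
DOUBLING_MONTHS = 18  # doubling period quoted in the text
YEARS = 10            # one decade

# Number of doublings that fit into the period, and the resulting growth factor
doublings = YEARS * 12 / DOUBLING_MONTHS
growth = 2 ** doublings

print(f"{doublings:.2f} doublings, roughly {growth:.0f}x more transistors")
```

That is about two orders of magnitude per decade, which is the kind of improvement the applications were waiting for.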
By the mid-1990s it was clear that it worked. Computers were getting faster and software was getting more complex so rapidly that upgrades had to happen on a yearly basis to keep up …
It has become a tradition that I write a quick update on the state of self driving car development every year when the California DMV releases their disengagement data [2017 post here, 2018 post here]. 2018 was an important year for self driving, as we saw the first fatal accident caused by an autonomous vehicle (the infamous Uber crash in Arizona).
Let me start with a disclaimer: I plot disengagements against human crashes and fatalities not because it is a good comparison, but because this is the only comparison we have. There are many reasons why this is not the best measure and, depending on the reason, the actual "safety" of AVs may be either somewhat better or significantly worse than indicated here. Below are some of my reasons:
A disengagement is a situation in which a machine cannot be trusted and the human operator takes over to avoid any danger. The precise definition under California law is: “a deactivation of the autonomous mode when a failure of the autonomous technology is detected or when the safe operation of the vehicle requires that the autonomous vehicle test driver disengage the autonomous mode and take immediate manual
Every rule of thumb in data science has a counterexample. Including this one.
In this post I'd like to explore several simple, low-dimensional examples that expose how our typical intuitions about the geometry of data may be fatally flawed. This is generally a practical post, focused on examples, but there is a subtle message I'd like to convey. In essence: be careful. It is easy to draw data-based conclusions which are totally wrong.
Dimensionality reduction is not always a good idea
It is a fairly common practice to reduce the input data dimension via some projection, typically principal component analysis (PCA), to get a lower-dimensional, more "condensed" representation of the data. This often works fine, as the directions along which data is separable often align with the principal axes. But this does not have to be the case, see a synthetic example below:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.stats import ortho_group
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
N = 10 # Dimension of the data
M = 500 # Number of samples
# Random rotation matrix
R = ortho_group.rvs(dim=N)
# Data variances
variances = np.sort(np.random.rand(N))[::-1]  # sorted in descending order
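The listing above is cut off before any data is generated, so here is a separate, minimal sketch of the same idea rather than a continuation of it. Everything specific in it is an assumption made for illustration: the per-axis spreads in `scales`, the class offset of ±0.5, the choice of two retained components, and logistic regression swapped in as a simple stand-in classifier. The two classes differ only along the lowest-variance direction, which PCA is the first to throw away.

```python
import numpy as np
from scipy.stats import ortho_group
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
N = 10   # dimension of the data
M = 500  # number of samples

# Assumed per-axis standard deviations, largest first
scales = np.linspace(5.0, 1.0, N)
labels = rng.integers(0, 2, size=M)

# Gaussian cloud; the classes differ ONLY along the last axis,
# which is also given the smallest spread
X = rng.normal(size=(M, N)) * scales
X[:, -1] = 0.1 * rng.normal(size=M) + np.where(labels == 1, 0.5, -0.5)

# Hide the axis alignment behind a random rotation
R = ortho_group.rvs(dim=N, random_state=0)
X_rot = X @ R.T
d = R[:, -1]  # direction that separates the classes after rotation

# Keep only the two highest-variance principal components
pca = PCA(n_components=2).fit(X_rot)
Z = pca.transform(X_rot)

# Share of the separating direction that survives the projection (0 = none, 1 = all)
retained = np.linalg.norm(pca.components_ @ d)

# Training accuracy of a linear classifier before and after the projection
acc_full = LogisticRegression(max_iter=1000).fit(X_rot, labels).score(X_rot, labels)
acc_pca = LogisticRegression(max_iter=1000).fit(Z, labels).score(Z, labels)
print(f"retained = {retained:.3f}, full = {acc_full:.2f}, PCA(2) = {acc_pca:.2f}")
```

With this setup the classifier should separate the full 10-dimensional data almost perfectly, while on the two principal components it should do little better than guessing, because the informative direction carries almost no variance.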
Elon Musk is a polarizing figure. His ideas frequently come up in casual conversations. People are often amused and impressed by his achievements. I must admit, a few years back I thought he was literally the next Steve Jobs, only actually better, since he was onto so many things... I admired SpaceX, thought that Tesla cars had many great solutions in them...
At some point in 2015 or 2016 Elon started talking outrageous stuff in the domain of AI, a domain of my own expertise, which I could tell right away was total bullshit. And then I began looking at all this stuff in detail. Doing some math here and there. Reading various opinions. As a result, my opinion on Musk and many of his ideas has changed somewhat substantially. At this point, I can pretty much say with confidence that 90% of his stuff is utter BS, and the remaining 10% is perhaps impressive but still questionable.
Nevertheless, he is quite a character, with many fans almost religiously believing everything he says. Any time I meet somebody who is a Musk fan I have to go over these issues, so I decided to write this post as a point …
Those who regularly read my blog are aware that I'm a bit skeptical of the current AI "benchmarks" and whether they serve the field well. In particular I think that the lack of a definition of intelligence is the major elephant in the room. As proof that this is apparently not a well-recognized issue, take this recent Twitter thread:
Aside from the broader context of this thread discussing evolution and learning, Ilya Sutskever, one of the leading deep learning researchers, is expressing a nice-sounding empirical approach: we don't have to argue, we can just test. Well, as may clearly follow from my reply, I don't think this is really the case. I have no idea what Sutskever means by "obviously more intelligent" - do you? Does he mean better ability to overfit existing datasets? Play yet another Atari computer game? I find this approach prevalent in the circles associated with deep learning, as if this field had some very well-defined empirical measurement foundation. Quite the opposite is true: the field is driven by a dogma that a "dataset" (blessed as standard in the field by some committee) and some God-given measure (put Hinton, LeCun or …
Electric cars are great. They don't pollute, drive without making noise, have incredible responsiveness and torque all over the RPM range. There is a limited number of moving parts, and they don't need lubrication, hence don't consume oil.
These are all true. There is no point in arguing with these facts; anyone who has ever driven an electric car will concur. But there is always the other side, the one enthusiasts will not want to discuss. Let me go into a few issues I have with this technology.
All those amazing cars (such as Teslas) are based on lithium-ion batteries. Much like any other battery, this one uses two electrodes: a cathode made of a lithium compound, which typically contains cobalt (in the form of an oxide), and an anode made of a form of carbon such as graphite. The electrolyte between the electrodes carries the lithium ions. The exact chemistry varies between different types of cells, but overall positively charged lithium ions get carried from the anode to the cathode during discharge, and the reverse happens during charging. The cobalt oxide in the cathode hosts the ions. So in some sense the electric car actually has zillions of moving parts if we count all these ions traveling from anode to …
Almost six months ago (May 28th 2018) I posted the "AI winter is well on its way" post that went viral. The post amassed nearly a quarter million views, got picked up by Bloomberg, Forbes, Politico, VentureBeat, BBC, Datascience Podcast and numerous other smaller media outlets and blogs [1, 2, 3, 4, ...], and triggered violent debate on Hacker News and Reddit. I could not have anticipated that this post would be so successful, and hence I realized I had touched on a very sensitive subject. One can agree with my claims or not, but the sheer popularity of the post itself almost serves as proof that something is going on behind the scenes and that people are actually curious and doubtful whether there is anything solid behind the AI hype.
Since the post made a prediction that the AI hype is cracking (particularly in the space of autonomous vehicles) and that as a result we will have another "AI winter" episode, I decided to periodically go over those claims, see what has changed and bring some new evidence.
First of all a bit of clarification: some readers …
There are many, many deep learning models out there doing various things. Depending on the exact task they are solving, they may be constructed differently. Some will use convolution followed by pooling. Some will use several convolutional layers before there is any pooling layer. Some will use max-pooling. Some will use mean-pooling. Some will have a dropout added. Some will have a batch-norm layer here and there. Some will use sigmoid neurons, some will use half-rectifiers. Some will classify and therefore optimize for cross-entropy. Others will minimize mean-squared error. Some will use unpooling layers. Some will use deconvolutional layers. Some will use stochastic gradient descent with momentum. Some will use Adam. Some will have ResNet layers, some will use Inception. The choices are plentiful (see e.g. here).
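To make one pair of these choices concrete, here is a minimal NumPy sketch of max-pooling versus mean-pooling on a toy 4x4 feature map. The helper `pool2d` is a hypothetical name made up for this illustration (it is not taken from any framework), and it assumes non-overlapping windows, which is only one of several possible pooling setups.

```python
import numpy as np

def pool2d(x, size, reduce_fn):
    """Non-overlapping size x size pooling over a 2-D array (illustrative helper)."""
    h, w = x.shape
    # Crop to a multiple of the window, then view as (rows, size, cols, size) blocks
    cropped = x[: h - h % size, : w - w % size]
    blocks = cropped.reshape(h // size, size, w // size, size)
    # Reduce within each block, leaving one value per window
    return reduce_fn(blocks, axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
max_pooled = pool2d(feature_map, 2, np.max)    # keeps the strongest response per window
mean_pooled = pool2d(feature_map, 2, np.mean)  # averages the responses per window

print(max_pooled)
print(mean_pooled)
```

Both variants shrink the map by the same factor; they differ only in which summary of each window is passed on to the next layer.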
Reading any of these particular papers, one is faced with a set of choices the authors had made, followed by the evaluation on the dataset of their choice. The discussion of choices typically refers strongly to papers where given techniques were first introduced, whereas the results section typically discusses in detail the previous state of the art. The shape of the architecture is often broken down into obvious and non-obvious decisions. …
In some recent email exchanges I've realized that when people by some coincidence make it to this blog, they rarely end up visiting my main website, and even if they do, they rarely browse through the teaching materials. This is not really a complaint, I hardly ever visit my website myself, but there are some materials there that I go back to every once in a while (though I have copies on my laptop). These are the lecture notes I made for a lecture on mathematical foundations of neuroscience.
As a bit of background, in 2009, after I defended my PhD and before I joined Brain Corporation, I was briefly an Adjunct Professor at the Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Torun. During that time I decided to refresh everything I had gathered about the mathematics of neuroscience and prepare a lecture series complete with exercises, lots of pictures, graphs, and all the necessary theory. And even though 9 years have passed since then, the lectures hold up pretty well, hence why not bring that content to a broader audience?
The lecture consists of 15 main PDF presentations, a number of sample exercises as well …