inFERENCe

A blog about machine learning research, deep learning, causal inference, and variational learning, by Ferenc Huszár, a Senior Lecturer in Machine Learning at the University of Cambridge. He recently joined the Department of Computer Science and Technology, a nice and cozy department, where he's building a new machine learning group with Neil Lawrence and Carl Henrik Ek and..

1y ago

Automating mathematical theorem proving has been a long-standing goal of artificial intelligence, and indeed of computer science. It's one of the areas I became very interested in recently. This is because I feel we may have the ingredients needed to make very, very significant progress:
- a structured search space, with a clear-cut success criterion, that can be algorithmically generated: the language of formal mathematics
- a path to obtaining very good heuristics to guide search in this space: LLMs trained on a mixture of code and formal and informal mathematics
- learning algorithms that can exploit the ..read more
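As a taste of what that structured search space looks like, here is a tiny theorem stated and proved in Lean 4 (my illustration; the excerpt does not commit to a particular proof assistant). Each tactic step is one move in the search space that a learned heuristic would have to guide:

```lean
-- Commutativity of addition on the natural numbers, proved by induction.
-- A prover searches over tactic sequences like this one; the goal state
-- after each step gives a clear-cut success criterion.
theorem add_comm' (m n : Nat) : m + n = n + m := by
  induction n with
  | zero => rw [Nat.add_zero, Nat.zero_add]
  | succ n ih => rw [Nat.add_succ, ih, Nat.succ_add]
```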

1y ago

Last week in Cambridge was a Hinton bonanza. He visited the university town where he was once an undergraduate in experimental psychology, and gave a series of back-to-back talks, Q&A sessions, interviews, dinners, and so on. He was stopped on the street by random passers-by who recognised him from the lecture, and students and postdocs asked to take selfies with him after his packed lectures.
Things are very different from the last time I met Hinton in Cambridge: I was a PhD student, around 12 years ago, in a Bayesian stronghold safe from deep learning influence. There was the usual email about a vi ..read more

1y ago

A few years ago I was very much into maximum likelihood-based generative modeling and autoregressive models (see this, this or this). More recently, my focus shifted to characterising the inductive biases of gradient-based optimization, focussing mostly on supervised learning. I only very recently started combining the two ideas, revisiting autoregressive models through the lens of inductive biases, motivated by a desire to understand a bit more about LLMs. As I did so, I found myself surprised by a number of observations, which really should not have been surprising to me at all. This post document ..read more
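For context, the autoregressive models in question factorize the joint distribution of a sequence via the chain rule, with each conditional parametrized by a neural network:

$$p_\theta(x_1, \ldots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t})$$

This factorization is exact for any distribution; the inductive bias comes from how the conditionals $p_\theta(x_t \mid x_{<t})$ are parametrized and optimized.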

1y ago

"Deep Learning is Easy, Learn something Harder" - I proclaimed in one of my early and provocative blog posts from 2016. While some observations were fair, that post is now evidence that I clearly underestimated the impact that simple techniques would have, and probably gave counterproductive advice.
I wasn't alone in my deep learning skepticism, in fact I'm far from being the most extreme deep learning skeptic. Many of us who grew up working in Bayesian ML, convex optimization, kernels and statistical learning theory confidently predicted the inevitable failure of deep learning, continued to cla ..read more

2y ago

This intriguing paper kept me thinking long enough that I decided it's time to resurrect my blogging. (I started writing this during the ICLR review period, and realised it might be a good idea to wait until that concluded.)
Sang Michael Xie, Aditi Raghunathan, Percy Liang and Tengyu Ma (2021) An Explanation of In-context Learning as Implicit Bayesian Inference
I liked this paper because it relates to one of my favourite concepts and ideas: exchangeability. And it took me back to thoughts I had back in 2015 (pre-historic by deep learning standards) about leveraging exchangeable sequence mode ..read more
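Exchangeability, the concept alluded to above, can be illustrated with the simplest exchangeable model there is, a Beta-Bernoulli process (my own minimal sketch, not from the post). The joint probability of a binary sequence, computed as a product of one-step-ahead predictive probabilities, depends only on the counts of 0s and 1s, so it is invariant to permuting the sequence:

```python
# Minimal sketch of an exchangeable sequence model: the Beta-Bernoulli
# process. Each predictive probability depends only on the counts so far,
# so the joint probability is permutation-invariant (de Finetti's setting).

def joint_prob(seq, a=1.0, b=1.0):
    """Joint probability of a binary sequence under a Beta(a, b)-Bernoulli model."""
    p = 1.0
    ones = zeros = 0
    for x in seq:
        if x == 1:
            p *= (a + ones) / (a + b + ones + zeros)   # predictive prob of a 1
            ones += 1
        else:
            p *= (b + zeros) / (a + b + ones + zeros)  # predictive prob of a 0
            zeros += 1
    return p

seq = [1, 0, 1, 1, 0]
perm = [0, 0, 1, 1, 1]  # same counts, different order
assert abs(joint_prob(seq) - joint_prob(perm)) < 1e-12  # exchangeable
```

In-context learning can be viewed through the same lens: the predictive distribution of the next token given previous ones implicitly performs Bayesian updating on a latent variable, here the Bernoulli parameter.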

2y ago

Excruciating. One phrase I often use to describe what it's like to read reference letters for Eastern European applicants to PhD and Master's programs in Cambridge.
Even objectively outstanding students often receive dull, short, factual, almost negative-sounding reference letters. This is a result of (A) cultural differences - we are very good at sarcasm, painfully good at giving direct negative feedback, not so good at praising others and (B) the fact that reference letters play no role in Eastern Europe and most professors have never written or seen a good one before.
Poor reference letters ..read more

3y ago

This post is written with my PhD student and now guest author Patrik Reizinger, and is part 4 of a series of posts on causal inference:
Part 1: Intro to causal inference and do-calculus
Part 2: Illustrating Interventions with a Toy Example
Part 3: Counterfactuals
➡️️ Part 4: Causal Diagrams, Markov Factorization, Structural Equation Models
One way to think about causal inference is that causal models require more fine-grained models of the world than statistical models do. Many causal models are equivalent to the same statistical model, yet support different causal inferences. This pos ..read more
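The point about many causal models collapsing to one statistical model can be made concrete with a toy sketch (my own illustration with made-up numbers, not from the post): two linear-Gaussian structural equation models with opposite edge directions that induce the exact same joint distribution over $(X, Y)$, yet disagree as soon as we intervene.

```python
# Two SEMs that are observationally equivalent (same bivariate Gaussian,
# unit variances, correlation 0.5) but make different predictions under
# the intervention do(X = 2).

import numpy as np

rng = np.random.default_rng(0)
N = 200_000

def model_x_to_y(do_x=None):
    """SEM A: X := N(0,1);  Y := 0.5*X + N(0, 0.75)."""
    x = rng.normal(0, 1, N) if do_x is None else np.full(N, do_x)
    y = 0.5 * x + rng.normal(0, np.sqrt(0.75), N)
    return x, y

def model_y_to_x(do_x=None):
    """SEM B: Y := N(0,1);  X := 0.5*Y + N(0, 0.75)."""
    y = rng.normal(0, 1, N)
    x = 0.5 * y + rng.normal(0, np.sqrt(0.75), N) if do_x is None else np.full(N, do_x)
    return x, y

# Observationally indistinguishable: same correlation...
xa, ya = model_x_to_y()
xb, yb = model_y_to_x()
print(np.corrcoef(xa, ya)[0, 1], np.corrcoef(xb, yb)[0, 1])  # both close to 0.5

# ...but under do(X = 2), SEM A predicts E[Y] = 1 while SEM B predicts E[Y] = 0,
# because intervening on X severs nothing in A but severs the only path to Y in B.
print(model_x_to_y(do_x=2.0)[1].mean())
print(model_y_to_x(do_x=2.0)[1].mean())
```

Both models have the same Markov factorization up to reordering, $p(x)p(y\mid x) = p(y)p(x\mid y)$, which is exactly why observational data alone cannot distinguish them.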

3y ago

A few days ago we had a talk by Gergely Neu, who presented his recent work:
Gergely Neu (2021) Information-Theoretic Generalization Bounds for Stochastic Gradient Descent
I'm writing this post mostly to annoy him, by presenting this work using super hand-wavy intuitions and cartoon figures. If this isn't enough, I will even find a way to mention GANs in this context.
But truthfully, I'm just excited because for once, there is a little bit of learning theory that I half-understand, at least at an intuitive level, thanks to its reliance on KL divergences and the mutual information.
A simple gues ..read more
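For flavour, the information-theoretic bounds in this line of work typically take a form like the following (this is the style of bound, due to Xu and Raginsky, that such results build on; exact constants depend on assumptions such as subgaussian losses):

$$\left| \mathbb{E}\big[\mathrm{gen}(W, S)\big] \right| \;\le\; \sqrt{\frac{2\sigma^2 \, I(W; S)}{n}}$$

where $S$ is the training set of $n$ examples, $W$ are the weights produced by the learning algorithm, the loss is $\sigma$-subgaussian, and $I(W; S)$ is the mutual information between the weights and the data. Intuitively: the less the algorithm's output reveals about the specific training set, the smaller the expected generalization gap.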

3y ago

I wanted to highlight an intriguing paper I presented at a journal club recently:
Samuel L Smith, Benoit Dherin, David Barrett, Soham De (2021) On the Origin of Implicit Regularization in Stochastic Gradient Descent
There's actually a related paper that came out simultaneously, studying full-batch gradient descent instead of SGD:
David G.T. Barrett, Benoit Dherin (2021) Implicit Gradient Regularization
One of the most important insights in machine learning over the past few years relates to the importance of optimization algorithms in generalization performance.
Why deep learning works at ..read more
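The headline result of the Barrett and Dherin paper can be summarised compactly: gradient descent with learning rate $h$ approximately follows the gradient flow not of the training loss $L$, but of a modified loss of the form

$$\tilde{L}(\theta) \;=\; L(\theta) \,+\, \frac{h}{4} \,\big\lVert \nabla L(\theta) \big\rVert^2$$

so that the discretization error of gradient descent acts as an implicit regularizer penalizing sharp regions of the loss surface, with a strength that grows with the learning rate.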

3y ago

guest post with Dóra Jámbor
This is a half-guest-post written jointly with Dóra, a fellow participant in a reading group where we recently discussed the original paper on $\beta$-VAEs:
Irina Higgins et al. (ICLR 2017): $\beta$-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
On the surface of it, $\beta$-VAEs are a straightforward extension of VAEs where we are allowed to directly control the tradeoff between the reconstruction and KL loss terms. In an attempt to better understand where the $\beta$-VAE objective comes from, and to further motivate why it makes se ..read more
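Concretely, the tradeoff mentioned above is controlled by a single scalar $\beta$ in the objective

$$\mathcal{L}_\beta \;=\; \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] \,-\, \beta \, \mathrm{KL}\big(q_\phi(z|x) \,\|\, p(z)\big)$$

where $\beta = 1$ recovers the standard VAE evidence lower bound, and $\beta > 1$ weights the KL term more heavily, pressuring the approximate posterior towards the (typically factorized) prior and thereby encouraging disentangled representations.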