We may finally crack Maths. But should we?
inFERENCe
by Ferenc Huszar
10M ago
Automating mathematical theorem proving has been a long-standing goal of artificial intelligence, and indeed of computer science. It's one of the areas I became very interested in recently. This is because I feel we may have the ingredients needed to make very, very significant progress: a structured search space with a clear-cut success criterion that can be algorithmically generated (the language of formal mathematics); a path to obtaining very good heuristics to guide search in that space (LLMs trained on a mixture of code, formal and informal mathematics); and learning algorithms that can exploit the …
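To make the first ingredient concrete, here is a minimal sketch (an illustration of ours, not taken from the post) of what a machine-checkable statement looks like in Lean 4, one such formal language. The theorem is deliberately trivial; `Nat.add_comm` is the standard-library lemma that closes it.

```lean
-- A deliberately trivial formal statement. Proving it means producing a term
-- the Lean kernel accepts: that acceptance is the clear-cut success criterion
-- the post refers to, and the space of candidate proof terms/tactics is the
-- structured search space.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```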
Mortal Komputation: On Hinton's argument for superhuman AI.
inFERENCe
by Ferenc Huszar
10M ago
Last week in Cambridge was a Hinton bonanza. He visited the university town where he was once an undergraduate in experimental psychology, and gave a series of back-to-back talks, Q&A sessions, interviews, dinners, etc. He was stopped on the street by random passers-by who recognised him from the lectures; students and postdocs asked to take a selfie with him after his packed talks. Things are very different from the last time I met Hinton in Cambridge: I was a PhD student, around 12 years ago, in a Bayesian stronghold safe from deep learning influence. There was the usual email about a vi…
Autoregressive Models, OOD Prompts and the Interpolation Regime
inFERENCe
by Ferenc Huszar
1y ago
A few years ago I was very much into maximum likelihood-based generative modeling and autoregressive models (see this, this or this). More recently, my focus shifted to characterising inductive biases of gradient-based optimization, focusing mostly on supervised learning. I only very recently started combining the two ideas, revisiting autoregressive models through the lens of inductive biases, motivated by a desire to understand a bit more about LLMs. As I did so, I found myself surprised by a number of observations, which really should not have been surprising to me at all. This post document…
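For context on the first half of that combination, the standard setup (generic notation, not specific to the post): an autoregressive model factorizes the joint distribution over a sequence and is fit by maximum likelihood.

```latex
p_\theta(x_1, \ldots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}),
\qquad
\hat{\theta} = \arg\max_{\theta} \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}).
```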
We May be Surprised Again: Why I take LLMs seriously.
inFERENCe
by Ferenc Huszar
1y ago
"Deep Learning is Easy, Learn something Harder" - I proclaimed in one of my early and provocative blog posts from 2016. While some observations were fair, that post is now evidence that I clearly underestimated the the impact simple techniques will have, and probably gave counterproductive advice. I wasn't alone in my deep learning skepticism, in fact I'm far from being the most extreme deep learning skeptic. Many of us who grew up working in Bayesian ML, convex optimization, kernels and statistical learning theory confidently predicted the inevitable failure of deep learning, continued to cla ..read more
Implicit Bayesian Inference in Large Language Models
inFERENCe
by Ferenc Huszar
2y ago
This intriguing paper kept me thinking long enough for me to decide it's time to resurrect my blogging (I started writing this during the ICLR review period, and realised it might be a good idea to wait until that's concluded): Sang Michael Xie, Aditi Raghunathan, Percy Liang and Tengyu Ma (2021), An Explanation of In-context Learning as Implicit Bayesian Inference. I liked this paper because it relates to one of my favourite concepts and ideas: exchangeability. And it took me back to thoughts I had back in 2015 (pre-historic by deep learning standards) about leveraging exchangeable sequence mode…
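For readers who haven't met the term: a sequence is exchangeable if its joint distribution is invariant to permuting the entries, and de Finetti's theorem says that an infinitely exchangeable sequence behaves as if its entries were drawn i.i.d. given a latent parameter that is itself random. This is the standard statement, included here as background rather than quoted from the post.

```latex
p(x_1, \ldots, x_n) = p(x_{\pi(1)}, \ldots, x_{\pi(n)}) \;\; \text{for all permutations } \pi \text{ and all } n
\;\;\Longrightarrow\;\;
p(x_1, \ldots, x_n) = \int \prod_{i=1}^{n} p(x_i \mid \theta)\, p(\theta)\, \mathrm{d}\theta .
```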
Eastern European Guide to Writing Reference Letters
inFERENCe
by Ferenc Huszar
2y ago
Excruciating. One phrase I often use to describe what it's like to read reference letters for Eastern European applicants to PhD and Master's programs in Cambridge. Even objectively outstanding students often receive dull, short, factual, almost negative-sounding reference letters. This is a result of (A) cultural differences - we are very good at sarcasm, painfully good at giving direct negative feedback, not so good at praising others - and (B) the fact that reference letters play no role in Eastern Europe and most professors have never written or seen a good one before. Poor reference letters…
Causal inference 4: Causal Diagrams, Markov Factorization, Structural Equation Models
inFERENCe
by Patrik Reizinger
3y ago
This post is written with my PhD student and now guest author Patrik Reizinger and is part 4 of a series of posts on causal inference: Part 1: Intro to causal inference and do-calculus; Part 2: Illustrating Interventions with a Toy Example; Part 3: Counterfactuals; ➡️ Part 4: Causal Diagrams, Markov Factorization, Structural Equation Models. One way to think about causal inference is that causal models require a more fine-grained model of the world compared to statistical models. Many causal models are equivalent to the same statistical model, yet support different causal inferences. This pos…
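To illustrate the "more fine-grained" point with a standard textbook example (ours, not from the post): a causal diagram $G$ implies a Markov factorization of the joint distribution, yet graphs that differ causally can imply the same statistical model. The chains $X \to Y \to Z$ and $X \leftarrow Y \leftarrow Z$ both encode the single conditional independence $X \perp Z \mid Y$, so they are indistinguishable from observational data, but intervening on $X$ changes $Z$ only under the first graph.

```latex
p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p\big(x_i \mid \mathrm{pa}_G(x_i)\big),
\qquad \text{e.g. for } X \to Y \to Z: \quad
p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y).
```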
On Information Theoretic Bounds for SGD
inFERENCe
by Ferenc Huszar
3y ago
A few days ago we had a talk by Gergely Neu, who presented his recent work: Gergely Neu, Information-Theoretic Generalization Bounds for Stochastic Gradient Descent. I'm writing this post mostly to annoy him, by presenting this work using super hand-wavy intuitions and cartoon figures. If this isn't enough, I will even find a way to mention GANs in this context. But truthfully, I'm just excited because for once there is a little bit of learning theory that I half-understand, at least at an intuitive level, thanks to its reliance on KL divergences and mutual information. A simple gues…
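For orientation, the flavour of result this line of work builds on is the standard mutual-information generalization bound of Xu and Raginsky, stated here as background rather than taken from the talk: if the loss is $\sigma$-sub-Gaussian and an algorithm maps a dataset $S$ of $n$ examples to weights $W$, the expected gap between population risk $L_{\mathcal{D}}(W)$ and empirical risk $L_{S}(W)$ is controlled by the mutual information between $W$ and $S$.

```latex
\Big|\, \mathbb{E}\big[ L_{\mathcal{D}}(W) - L_{S}(W) \big] \,\Big|
\;\le\;
\sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)} .
```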
Notes on the Origin of Implicit Regularization in SGD
inFERENCe
by Ferenc Huszar
3y ago
I wanted to highlight an intriguing paper I presented at a journal club recently: Samuel L. Smith, Benoit Dherin, David Barrett and Soham De (2021), On the Origin of Implicit Regularization in Stochastic Gradient Descent. There's actually a related paper that came out simultaneously, studying full-batch gradient descent instead of SGD: David G. T. Barrett and Benoit Dherin (2021), Implicit Gradient Regularization. One of the most important insights in machine learning over the past few years relates to the role the optimization algorithm plays in generalization performance. Why deep learning works at…
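The headline result of the full-batch paper, as I recall it (the SGD paper derives an analogous correction involving per-minibatch gradients): a gradient descent step with learning rate $h$ on a loss $L$ follows, up to higher-order error in $h$, the gradient flow of a modified loss that additionally penalises the gradient norm.

```latex
\tilde{L}(\theta) \;=\; L(\theta) \;+\; \frac{h}{4}\,\big\|\nabla L(\theta)\big\|^{2}
```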
An information maximization view on the $\beta$-VAE objective
inFERENCe
by Dora Jambor
3y ago
Guest post with Dóra Jámbor. This is a half-guest-post written jointly with Dóra, a fellow participant in a reading group where we recently discussed the original paper on $\beta$-VAEs: Irina Higgins et al. (ICLR 2017), $\beta$-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. On the surface of it, $\beta$-VAEs are a straightforward extension of VAEs where we are allowed to directly control the tradeoff between the reconstruction and KL loss terms. In an attempt to better understand where the $\beta$-VAE objective comes from, and to further motivate why it makes se…
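For reference, the per-example objective being discussed, in its standard form (with $\beta = 1$ it reduces to the usual evidence lower bound):

```latex
\mathcal{L}_{\beta}(x) \;=\; \mathbb{E}_{q_{\phi}(z \mid x)}\big[\log p_{\theta}(x \mid z)\big]
\;-\; \beta\, D_{\mathrm{KL}}\!\big(q_{\phi}(z \mid x) \,\big\|\, p(z)\big)
```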
