Describing Double Descent with WeightWatcher
CALCULATED CONTENT
by Charles H Martin, PhD
1M ago
Double Descent (DD) is something that has surprised statisticians, computer scientists, and deep learning practitioners, but it was known in the physics literature in the 80s. And while DD can seem complicated in deep learning models, the original model is actually very easy to understand, and to reproduce, with just a few lines of Python. IMHO, DD is a great way to understand how and when Deep Neural Networks might overfit their data and, moreover, where they achieve optimal performance. And you can do this and more with the open-source weightwatcher tool: https://weightwatcher.ai The original …
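To make that concrete, here is a minimal sketch of the classic setup (my own toy example, not the post's code): minimum-norm least squares on random Fourier features, where test error peaks near the interpolation threshold (number of features roughly equal to the number of training points) and then descends again.

    import numpy as np

    # Toy regression data: a noisy sine wave
    rng = np.random.default_rng(0)
    n_train, n_test = 40, 500
    x_tr = rng.uniform(-1, 1, n_train)
    x_te = rng.uniform(-1, 1, n_test)
    y_tr = np.sin(2 * np.pi * x_tr) + 0.1 * rng.standard_normal(n_train)
    y_te = np.sin(2 * np.pi * x_te)

    for p in [5, 10, 20, 40, 80, 160, 320]:   # number of random features
        w = rng.standard_normal(p)            # random frequencies, shared by train/test
        phi_tr = np.cos(np.outer(x_tr, w))    # train design matrix, n_train x p
        phi_te = np.cos(np.outer(x_te, w))
        # lstsq returns the minimum-norm solution in the over-parameterized regime
        beta, *_ = np.linalg.lstsq(phi_tr, y_tr, rcond=None)
        print(f"p={p:3d}  test MSE={np.mean((phi_te @ beta - y_te) ** 2):.3f}")

The test error typically spikes near p = n_train (here, 40 features) and falls again as p grows: the double descent curve.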
SVDSmoothing LLM Layers with WeightWatcher
CALCULATED CONTENT
by Charles H Martin, PhD
2M ago
Recently, Microsoft Research published the LASER method ("Layer-Selective Rank Reduction") in the very popular paper The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction. It got a lot of press (the Verge) because it hints that it may be possible to improve the truthfulness of LLMs with a simple mathematical transformation. The thing is, the weightwatcher tool has had a similar feature for some time, called SVDSmoothing. And as the name suggests, you can apply TruncatedSVD to the layers of an AI model, like an LLM, to improve performance …
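For intuition, here is a sketch of the core transformation shared by LASER and SVDSmoothing: replacing a layer's weight matrix W with a low-rank TruncatedSVD reconstruction. The 20% rank fraction below is an illustrative choice, not the tool's default.

    import numpy as np

    def svd_smooth(W, keep_frac=0.2):
        # Keep only the top singular components of W
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        k = max(1, int(keep_frac * len(S)))
        return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

    W = np.random.default_rng(0).standard_normal((512, 256))
    W_smooth = svd_smooth(W)   # same shape as W, rank reduced to ~51
    print(W_smooth.shape, np.linalg.matrix_rank(W_smooth))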
Evaluating Fine-Tuned LLMs with WeightWatcher Part II: PEFT / LoRa Models
CALCULATED CONTENT
by Charles H Martin, PhD
2M ago
Evaluating LLMs is hard. Especially when you don't have a lot of test data. In the last post, we saw how to evaluate fine-tuned LLMs using the open-source weightwatcher tool. Specifically, we looked at models after the 'deltas' (or updates) have been merged into the base model. In this post, we will look at LLMs fine-tuned using Parameter-Efficient Fine-Tuning (PEFT), also called Low-Rank Adaptation (LoRA). The LoRA technique lets one update the weight matrices (W) of the LLM with a low-rank update (BA): W' = W + BA. Here is a great blog post explaining all things LoRA. The A and B matrices are significan…
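A quick sketch (my notation, with an illustrative rank r = 8) shows why A and B are so much smaller than W:

    import numpy as np

    d_out, d_in, r = 768, 768, 8              # r is the LoRA rank, r << d_in, d_out
    rng = np.random.default_rng(0)
    W = rng.standard_normal((d_out, d_in))    # frozen base weights
    A = rng.standard_normal((r, d_in))        # trainable, r x d_in
    B = np.zeros((d_out, r))                  # trainable, initialized to zero
    W_eff = W + B @ A                         # effective weight after fine-tuning

    # LoRA trains r * (d_in + d_out) parameters instead of d_in * d_out
    print(r * (d_in + d_out), "vs", d_in * d_out)   # 12288 vs 589824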
Evaluating Fine-Tuned LLMs with WeightWatcher
CALCULATED CONTENT
by Charles H Martin, PhD
3M ago
If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen popular methods to choose from, each of them is biased toward a specific, narrowly scoped measure. None of them can identify potential internal problems in your model, and in the end, you will probably need to design a custom metric for your LLM. Can you do better? Before you design a custom metric, there is a better, cheaper, and faster approach to help you get started: using the open-source weightwatcher tool. WeightWatcher is a one-of-a-kind, must-have tool for anyone training, deploying …
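As a baseline, the basic workflow from the weightwatcher README looks like this; the HuggingFace model below is just a stand-in for your own fine-tuned LLM.

    import weightwatcher as ww
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze()               # per-layer metrics as a pandas DataFrame
    summary = watcher.get_summary(details)    # model-wide averages, e.g. mean alpha
    print(summary)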
WeightWatcher new feature: fix_fingers='clip_xmax'
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
WeightWatcher 0.7 has just been released, and it includes the new and improved advanced feature for analyzing Deep Neural Networks (DNNs) called fix_fingers. To activate it, simply use: details = watcher.analyze(..., fix_fingers='clip_xmax', ...) This will take a tiny bit longer, and will yield more reliable alpha values for your model layers, along with a new column, num_fingers, which reports the number of outliers found. Note that other metrics, such as alpha_weighted (alpha-hat), will not be affected. It is recommended that fix_fingers be added for all analyses moving forward; however, we …
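For context, a minimal runnable version of that call might look like the following; the model choice is illustrative, and the column names follow the excerpt above.

    import weightwatcher as ww
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze(fix_fingers='clip_xmax')
    # alpha is now fit with spectral outliers clipped; num_fingers counts them per layer
    print(details[['alpha', 'num_fingers']].describe())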
WeightWatcher 0.7: March 2023
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
First, let me say thanks to all the users in our great community: we have reached over 93K downloads as of March 2023! The latest release of the open-source weightwatcher tool includes several important advances, including:
- removing the explicit dependence on tensorflow and torch at install time
- the ability to process very large models directly from their pytorch state_dict files
- GPU-enabled SVD calculations
- much faster and more stable power-law calculations
- a lower memory footprint on GPU-enabled machines
- an improved method for finding the weightwatcher shape metric alpha, with the option fix_fingers …
Deep Learning and Effective Correlation Spaces
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
AI has taken the world by storm. With recent advances like AlphaFold, Stable Diffusion, and ChatGPT, Deep Neural Networks (DNNs) have had their Sputnik moment. And yet, we really don't understand why DNNs even work. Unless, of course, you follow this blog and use the widely popular open-source weightwatcher tool. It has been featured in Nature Communications and has over 86K downloads. It can help you diagnose problems in your DNN models, layer by layer, without even needing access to the test or training data. But how can weightwatcher possibly do this? In a p…
Protected: A new theory of AI
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
This post is password protected.
Better than BERT: Pick your best model
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
Have you ever had to sort through HuggingFace to find your best model? There are over 54,000 models on HuggingFace, so it's not an easy task. Most people just choose the most popular model, and this is usually BERT, or some BERT variant. BERT was created by Google, so it must be good. But is BERT really the best choice for you? How can you find out? You can search through the literature, read blogs, ask on Reddit, etc., and try to find a better model. This is time-consuming and imperfect. Fortunately, there is a better way: the weightwatcher tool can tell you. WeightWatcher is an open-source …
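One hedged sketch of such a comparison, assuming you rank candidates by the mean alpha in the weightwatcher summary (the model names are examples only):

    import weightwatcher as ww
    from transformers import AutoModel

    for name in ["bert-base-uncased", "roberta-base"]:
        model = AutoModel.from_pretrained(name)
        watcher = ww.WeightWatcher(model=model)
        summary = watcher.get_summary(watcher.analyze())
        print(name, "mean alpha:", summary["alpha"])

Per the weightwatcher heuristics, alpha values toward the low end of the 2-6 range suggest better-trained layers.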
Is your layer over-trained? (part 2)
CALCULATED CONTENT
by Charles H Martin, PhD
2y ago
Say you are training a Deep Neural Network (DNN), and you see your model is over-trained, or just not performing well. Is there a way to detect which layer is actually over-trained? In this post, we will show how to use the open-source weightwatcher tool to answer this. WeightWatcher is an open-source, data-free diagnostic tool for analyzing (pre-)trained DNNs. It is based on my personal research into Why Deep Learning Works, in collaboration with UC Berkeley, and on ideas from the Statistical Mechanics of Learning (i.e., theoretical physics and chemistry). pip install weightwatcher Weig…
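A sketch of how this per-layer check could look, assuming the blog's rule of thumb that well-trained layers have alpha roughly between 2 and 6 (the model name is illustrative):

    import weightwatcher as ww
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")
    details = ww.WeightWatcher(model=model).analyze()
    # Layers with alpha below 2 may be over-trained; well above 6, under-trained
    suspect = details[(details.alpha < 2) | (details.alpha > 6)]
    print(suspect[['layer_id', 'alpha']])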