Describing Double Descent with WeightWatcher
CALCULATED CONTENT
by Charles H Martin, PhD
1M ago
Double Descent (DD) is something that has surprised statisticians, computer scientists, and deep learning practitioners, but it was known in the physics literature in the 80s. And while DD can seem complicated in deep learning models, the original model is actually very easy to understand, and to reproduce, with just a few lines of Python. IMHO, DD is a great way to understand how and when Deep Neural Networks might overfit their data and, moreover, where they achieve optimal performance. And you can do this and more with the open-source weightwatcher tool: https://weightwatcher.ai The original …
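To make that concrete, here is a minimal sketch of the classic setup (my own toy example, not the post's code): minimum-norm least squares on random Fourier features, where test error peaks near the interpolation threshold (number of features roughly equal to the number of training points) and then descends again.

    import numpy as np

    # Toy regression data: a noisy sine wave
    rng = np.random.default_rng(0)
    n_train, n_test = 40, 500
    x_tr = rng.uniform(-1, 1, n_train)
    x_te = rng.uniform(-1, 1, n_test)
    y_tr = np.sin(2 * np.pi * x_tr) + 0.1 * rng.standard_normal(n_train)
    y_te = np.sin(2 * np.pi * x_te)

    for p in [5, 10, 20, 40, 80, 160, 320]:   # number of random features
        w = rng.standard_normal(p)            # random frequencies, shared by train/test
        phi_tr = np.cos(np.outer(x_tr, w))    # train design matrix, n_train x p
        phi_te = np.cos(np.outer(x_te, w))
        # lstsq returns the minimum-norm solution in the over-parameterized regime
        beta, *_ = np.linalg.lstsq(phi_tr, y_tr, rcond=None)
        print(f"p={p:3d}  test MSE={np.mean((phi_te @ beta - y_te) ** 2):.3f}")

The test error typically spikes near p = n_train (here, 40 features) and falls again as p grows: the double descent curve.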
SVDSmoothing LLM Layers with WeightWatcher
CALCULATED CONTENT
by Charles H Martin, PhD
2M ago
Recently, Microsoft Research published the LASER method ("Layer-Selective Rank Reduction") in the very popular paper The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction. It got a lot of press (the Verge) because it hints that it may be possible to improve the truthfulness of LLMs with a simple mathematical transformation. The thing is, the weightwatcher tool has had a similar feature for some time, called SVDSmoothing. And as the name suggests, you can apply TruncatedSVD to the layers of an AI model, like an LLM, to improve performance …
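For intuition, here is a sketch of the core transformation shared by LASER and SVDSmoothing: replacing a layer's weight matrix W with a low-rank TruncatedSVD reconstruction. The 20% rank fraction below is an illustrative choice, not the tool's default.

    import numpy as np

    def svd_smooth(W, keep_frac=0.2):
        # Keep only the top singular components of W
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        k = max(1, int(keep_frac * len(S)))
        return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

    W = np.random.default_rng(0).standard_normal((512, 256))
    W_smooth = svd_smooth(W)   # same shape as W, rank reduced to ~51
    print(W_smooth.shape, np.linalg.matrix_rank(W_smooth))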
Evaluating Fine-Tuned LLMs with WeightWatcher Part II: PEFT / LoRa Models
CALCULATED CONTENT
by Charles H Martin, PhD
2M ago
Evaluating LLMs is hard. Especially when you don't have a lot of test data. In the last post, we saw how to evaluate fine-tuned LLMs using the open-source weightwatcher tool. Specifically, we looked at models after the 'deltas' (or updates) have been merged into the base model. In this post, we will look at LLMs fine-tuned using Parameter-Efficient Fine-Tuning (PEFT), also called Low-Rank Adaptation (LoRA). The LoRA technique lets one update the weight matrices (W) of the LLM with a low-rank update (BA): W' = W + BA. Here is a great blog post explaining all things LoRA. The A and B matrices are significan…
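A quick sketch (my notation, with an illustrative rank r = 8) shows why A and B are so much smaller than W:

    import numpy as np

    d_out, d_in, r = 768, 768, 8              # r is the LoRA rank, r << d_in, d_out
    rng = np.random.default_rng(0)
    W = rng.standard_normal((d_out, d_in))    # frozen base weights
    A = rng.standard_normal((r, d_in))        # trainable, r x d_in
    B = np.zeros((d_out, r))                  # trainable, initialized to zero
    W_eff = W + B @ A                         # effective weight after fine-tuning

    # LoRA trains r * (d_in + d_out) parameters instead of d_in * d_out
    print(r * (d_in + d_out), "vs", d_in * d_out)   # 12288 vs 589824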
Evaluating Fine-Tuned LLMs with WeightWatcher
CALCULATED CONTENT
by Charles H Martin, PhD
3M ago
If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen popular methods to choose from, each of them is biased toward a specific, narrowly scoped measure. None of them can identify potential internal problems in your model, and in the end, you will probably need to design a custom metric for your LLM. Can you do better? Before you design a custom metric, there is a better, cheaper, and faster approach to help you get started: using the open-source weightwatcher tool. WeightWatcher is a one-of-a-kind, must-have tool for anyone training, deploying …
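As a baseline, the basic workflow from the weightwatcher README looks like this; the HuggingFace model below is just a stand-in for your own fine-tuned LLM.

    import weightwatcher as ww
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze()               # per-layer metrics as a pandas DataFrame
    summary = watcher.get_summary(details)    # model-wide averages, e.g. mean alpha
    print(summary)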
WeightWatcher new feature: fix_fingers='clip_xmax'
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
WeightWatcher 0.7 has just been released, and it includes the new and improved advanced feature for analyzing Deep Neural Networks (DNNs) called fix_fingers. To activate it, simply use: details = watcher.analyze(..., fix_fingers='clip_xmax', ...) This will take a tiny bit longer, and will yield more reliable alpha values for your model layers, along with a new column, num_fingers, which reports the number of outliers found. Note that other metrics, such as alpha_weighted (alpha-hat), will not be affected. It is recommended that fix_fingers be added for all analyses moving forward; however, we …
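For context, a minimal runnable version of that call might look like the following; the model choice is illustrative, and the column names follow the excerpt above.

    import weightwatcher as ww
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze(fix_fingers='clip_xmax')
    # alpha is now fit with spectral outliers clipped; num_fingers counts them per layer
    print(details[['alpha', 'num_fingers']].describe())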
WeightWatcher 0.7: March 2023
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
First, let me say thanks to all the users in our great community: we have reached over 93K downloads as of March 2023! The latest release of the open-source weightwatcher tool includes several important advances, including:
- removing the explicit dependence on tensorflow and torch at install time
- the ability to process very large models directly from their pytorch state_dict files
- GPU-enabled SVD calculations
- much faster and more stable power-law calculations
- a lower memory footprint on GPU-enabled machines
- an improved method for finding the weightwatcher shape metric alpha, with the option fix_fingers …
Deep Learning and Effective Correlation Spaces
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
AI has taken the world by storm. With recent advances like AlphaFold, Stable Diffusion, and ChatGPT, Deep Neural Networks (DNNs) have had their Sputnik moment. And yet, we really don't understand why DNNs even work. Unless, of course, you follow this blog and use the widely popular open-source weightwatcher tool. It has been featured in Nature Communications and has over 86K downloads. It can help you diagnose problems in your DNN models, layer by layer, without even needing access to the test or training data. But how can weightwatcher possibly do this? In a p…
Protected: A new theory of AI
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
This post is password protected.
Better than BERT: Pick your best model
CALCULATED CONTENT
by Charles H Martin, PhD
1y ago
Have you ever had to sort through HuggingFace to find your best model? There are over 54,000 models on HuggingFace, so it's not an easy task. Most people just choose the most popular model, and this is usually BERT, or some BERT variant. BERT was created by Google, so it must be good. But is BERT really the best choice for you? How can you find out? You can search through the literature, read blogs, ask on Reddit, etc., and try to find a better model. This is time-consuming and imperfect. Fortunately, there is a better way: the weightwatcher tool can tell you. WeightWatcher is an open-source …
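One hedged sketch of such a comparison, assuming you rank candidates by the mean alpha in the weightwatcher summary (the model names are examples only):

    import weightwatcher as ww
    from transformers import AutoModel

    for name in ["bert-base-uncased", "roberta-base"]:
        model = AutoModel.from_pretrained(name)
        watcher = ww.WeightWatcher(model=model)
        summary = watcher.get_summary(watcher.analyze())
        print(name, "mean alpha:", summary["alpha"])

Per the weightwatcher heuristics, alpha values toward the low end of the 2-6 range suggest better-trained layers.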
Is your layer over-trained? (part 2)
CALCULATED CONTENT
by Charles H Martin, PhD
2y ago
Say you are training a Deep Neural Network (DNN), and you see your model is over-trained, or just not performing well. Is there a way to detect which layer is actually over-trained? In this post, we will show how to use the open-source weightwatcher tool to answer this. WeightWatcher is an open-source, data-free diagnostic tool for analyzing (pre-)trained DNNs. It is based on my personal research into Why Deep Learning Works, in collaboration with UC Berkeley, and on ideas from the Statistical Mechanics of Learning (i.e., theoretical physics and chemistry). pip install weightwatcher Weig…
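A sketch of how this per-layer check could look, assuming the blog's rule of thumb that well-trained layers have alpha roughly between 2 and 6 (the model name is illustrative):

    import weightwatcher as ww
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")
    details = ww.WeightWatcher(model=model).analyze()
    # Layers with alpha below 2 may be over-trained; well above 6, under-trained
    suspect = details[(details.alpha < 2) | (details.alpha > 6)]
    print(suspect[['layer_id', 'alpha']])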