Hallucination-Free, Self-Tuned, Fast Hierarchical LLMs with Multi-Token Embeddings
Machine Learning Techniques
by Vincent Granville
6d ago
The new generation of RAG/LLM architectures is moving away from the original monolithic, generic OpenAI model toward a collection of decentralized, specialized LLMs jointly organized and governed via multi-agent systems. The benefits are obvious: low latency, smaller tables (one per LLM), faster training and fine-tuning, energy efficiency, better results, and much lower GPU consumption. The number of tokens or weights is dramatically reduced. If you charge customers by the token, as many vendors do, this is another competitive advantage. It also leads to local implementations and …
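The routing step in such a multi-agent setup can be sketched in a few lines. This is a hypothetical illustration, not the architecture from the article: the category names, keyword sets, and scoring rule below are all assumptions.

```python
import re

# Hypothetical categories and keyword sets; a real system would learn these
# from the reconstructed taxonomy rather than hard-code them.
CATEGORY_KEYWORDS = {
    "statistics": {"regression", "variance", "quantile", "distribution"},
    "nlp": {"token", "embedding", "prompt", "corpus"},
}

def route(prompt: str) -> str:
    """Send the prompt to the specialized LLM whose keywords best match it."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(route("How are token embeddings built from a corpus?"))  # -> "nlp"
```

Each category then only needs its own small tables, which is where the latency and GPU savings described above come from.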
Extreme LLM: Case Study, Documentation, Best Practices, and Python Sources
Machine Learning Techniques
by Vincent Granville
1M ago
Extreme LLM, abbreviated as xLLM, relies on multiple specialized large language models, one per top category, to deliver highly relevant answers to specific questions, covering the whole of human knowledge or targeted content such as corporate repositories. The user, in addition to entering the classic prompt, is invited to select or guess top categories. Behind the scenes, the system involves one simple LLM per top category and a reconstructed granular taxonomy of the input sources (crawled webpages, or parsed data). Each LLM has its own set of summary tables: embeddings, links (URLs), dictionary, stopwords, …
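Since the excerpt only names the summary tables, here is a minimal sketch of what a per-category structure might look like. The field names, toy dictionaries, and scoring rule are assumptions for illustration, not xLLM's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class CategoryLLM:
    """Hypothetical per-category summary tables, one instance per top category."""
    dictionary: dict = field(default_factory=dict)  # token -> frequency
    embeddings: dict = field(default_factory=dict)  # token -> related tokens
    stopwords: set = field(default_factory=set)

    def score(self, query_tokens):
        """Relevance of this category to a query: dictionary hits, ignoring stopwords."""
        tokens = [t for t in query_tokens if t not in self.stopwords]
        return sum(self.dictionary.get(t, 0) for t in tokens)

# Toy example: two top categories with tiny dictionaries.
llms = {
    "genai": CategoryLLM(dictionary={"embedding": 12, "token": 9}),
    "finance": CategoryLLM(dictionary={"option": 7, "volatility": 5}),
}

query = ["token", "embedding"]
best = max(llms, key=lambda c: llms[c].score(query))
print(best)  # -> "genai"
```

A user-selected category would simply bypass this scoring step.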
Probabilistic ANN: The Swiss Army Knife of GenAI
Machine Learning Techniques
by Vincent Granville
2M ago
ANN (approximate nearest neighbors) is at the core of fast vector search, itself central to GenAI, especially GPT and LLMs. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthetization (improving poor synthetizations), model evaluation, and even detecting extreme observations. Just to give an example, you could use it to categorize all time series without statistical theory. Statistical models are redundant and less explainable, …
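As a rough illustration of the probabilistic flavor of such a method (not Granville's PANN itself), one can replace a full nearest-neighbor scan with random candidate sampling, trading a little accuracy for a large reduction in distance computations:

```python
import random

def approx_nn(query, points, n_trials=200, seed=0):
    """Probabilistic nearest-neighbor sketch: test random candidates,
    keep the best so far. Returns (index, squared distance)."""
    rng = random.Random(seed)
    best, best_d = None, float("inf")
    for _ in range(n_trials):
        i = rng.randrange(len(points))
        d = sum((q - p) ** 2 for q, p in zip(query, points[i]))
        if d < best_d:
            best, best_d = i, d
    return best, best_d

# Toy dataset: 1,000 points in the plane; we probe only 200 of them.
points = [(float(i), float(i % 7)) for i in range(1000)]
idx, d2 = approx_nn((500.2, 3.9), points)
```

The returned neighbor is approximate: the reported distance can only be at or above the true minimum, and more trials tighten the gap.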
My Top 10 GenAI Articles of the Year
Machine Learning Techniques
by Vincent Granville
4M ago
Here is some good reading for the holiday season. More than just reading, as the material includes full Python implementations and datasets. The most up-to-date versions are in my new book Statistical Optimization for GenAI and Machine Learning, available here. As a courtesy, if you buy it by December 31, you are entitled to a free copy of my unpublished textbook Practical AI & Machine Learning Projects and Datasets (with solutions). Contact me to get your copy! My previous GenAI book (published by Elsevier) will be released in January. Selected articles: Genome: Synthesizing DNA Sequences …
Genome: Synthesizing DNA Sequences with LLM Techniques
Machine Learning Techniques
by Vincent Granville
4M ago
This methodology is not focused on genome data alone. The purpose is to design a generic solution that may also work in other contexts, such as synthesizing molecules. The problem involves dealing with a large amount of “text”. Indeed, the sequences discussed here consist of letter arrangements from an alphabet with 5 symbols: A, C, G, T, and N. The first four stand for the types of bases found in a DNA molecule: adenine (A), cytosine (C), guanine (G), and thymine (T). The last one (N) represents missing data. No prior knowledge of genome sequencing is required. Summary: The data …
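A minimal sketch of treating DNA sequences as “text”, assuming a first-order Markov model over the 5-symbol alphabet; the article's method is more elaborate, and the training sequence below is a toy example:

```python
import random
from collections import Counter, defaultdict

def train(seq):
    """Count symbol-to-symbol transitions in a training sequence."""
    trans = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        trans[a][b] += 1
    return trans

def generate(trans, start, length, seed=42):
    """Synthesize a sequence by sampling each symbol from the
    transition counts of the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        symbols, weights = zip(*trans[out[-1]].items())
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)

trans = train("ACGTACGTNACGTTACG")   # toy training data
synthetic = generate(trans, "A", 12)
print(synthetic)                     # a 12-symbol sequence over {A, C, G, T, N}
```

Higher-order models (conditioning on the previous k symbols) follow the same pattern with longer keys.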
10 GenAI Notebooks: OpenAI, LLM, RAG, GPT, and More
Machine Learning Techniques
by Vincent Granville
4M ago
For developers and AI/ML professionals. This comprehensive free resource, offered by our sponsor, is designed to provide you with hands-on experience and deeper insights into building cutting-edge GenAI applications. Special opportunity: you can win a pair of Apple AirPods simply by following the tutorial and learning something new. How to participate: follow these two steps. First, access the code notebooks and explore their contents. Second, complete one of the notebooks from start to finish, by December 10. To start, click on “Try It Now” when visiting the notebook of your choice. By completing the notebook …
Easy Trick to Debias GenAI Models: Quantile Convolution
Machine Learning Techniques
by Vincent Granville
5M ago
All of the GenAI apps that I tested, including my own, share the same problem: they cannot easily generate data outside the observation range. As an example, let’s focus on the insurance dataset discussed in my new book. I use it to generate synthetic data with GAN (generative adversarial networks) and the NoGAN models discussed in chapters 6 and 7. In the training set, one of the features is “charges”, that is, the medical expenses incurred by the policyholder in a given year. The range is from $1,121 to $63,770. In the synthesized data, the amount always stays within these two bounds. Worst …
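The general idea can be illustrated with a toy sketch: convolve synthesized values with small noise so that outputs can escape the observed bounds. The Gaussian kernel and bandwidth below are assumptions for illustration, not the article's tuned quantile-convolution procedure.

```python
import random

def debias(values, bandwidth=0.05, seed=1):
    """Perturb each synthesized value with noise proportional to the
    observed spread, so outputs can fall outside [min, max]."""
    rng = random.Random(seed)
    spread = max(values) - min(values)
    return [v + rng.gauss(0, bandwidth * spread) for v in values]

# Toy synthesized "charges"; the bounds $1,121 and $63,770 are from the article.
charges = [1121.0, 8000.0, 25000.0, 63770.0]
synthetic = debias(charges)
# Perturbed values near the bounds may now land below $1,121 or above $63,770.
```

The bandwidth controls how far beyond the training range the tails can extend.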
New Book: Understanding Deep Learning
Machine Learning Techniques
by Vincent Granville
5M ago
By Simon Prince, computer science professor at the University of Alberta. To be published by MIT Press, December 2023. The author shares the associated Jupyter notebooks on his website, here. Very popular, the book got over 5,000 likes when the author announced it on LinkedIn. I pre-ordered my copy. Summary: An authoritative, accessible, and up-to-date treatment of deep learning that strikes a pragmatic middle ground between theory and practice. Deep learning is a fast-moving field with sweeping relevance in today’s increasingly digital world. Understanding Deep Learning provides an authoritative …
NoGAN: Ultrafast Data Synthesizer – My Talk at ODSC San Francisco
Machine Learning Techniques
by Vincent Granville
5M ago
My talk at the ODSC Conference, San Francisco, October 2023. It includes a notebook demonstration using our open-source Python libraries. View or download the PowerPoint presentation here. I discuss NoGAN, an alternative to standard tabular data synthetization. It runs 1000x faster than GAN, consistently delivering better results according to the most sophisticated evaluation metric, implemented here for the first time. A game changer that significantly reduces costs: cloud or GPU time, training time, and fine-tuning parameters replaced by auto-tuning. Now available as open-source. In real-life …
Quantum derivatives, GenAI, and the Riemann Hypothesis
Machine Learning Techniques
by Vincent Granville
5M ago
Have you ever encountered a function or cumulative distribution function (CDF) that is nowhere differentiable, yet continuous everywhere? Some are featured in this article. For a CDF, this means that it does not have a probability density function (PDF); for a standard function, it has no derivative. At least, not until now. The quantum derivative, the solution to differentiating such a function, is an awkward mathematical object. If it were of no use, I would not write about it. In fact, this object contains a lot of information about the original function. I use it here to gain …
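To see why a special derivative is needed, consider the classic Weierstrass function, which is continuous everywhere but differentiable nowhere: ordinary finite-difference slopes fail to settle down as the step shrinks. This is a standard illustration of the problem, not the quantum derivative itself.

```python
import math

def weierstrass(x, a=0.5, b=7, terms=12):
    """Truncated Weierstrass series: continuous everywhere,
    differentiable nowhere (for suitable a, b with ab >= 1)."""
    return sum(a**n * math.cos(b**n * math.pi * x) for n in range(terms))

# Finite-difference slopes at x = 0.3 do not converge as h -> 0,
# unlike for a smooth function.
for h in (1e-2, 1e-4, 1e-6):
    slope = (weierstrass(0.3 + h) - weierstrass(0.3)) / h
    print(f"h={h:.0e}  slope={slope:.1f}")
```

Any object that extracts stable local information from such a function must therefore be built differently from the classical derivative.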
