Hallucination-Free, Self-Tuned, Fast Hierarchical LLMs with Multi-Token Embeddings
Machine Learning Techniques
by Vincent Granville
6d ago
The new generation of RAG/LLM architectures is moving away from the original monolithic, generic OpenAI model toward a collection of decentralized, specialized LLMs jointly organized and governed via multi-agent systems. The benefits are obvious: low latency, smaller tables (one per LLM), faster training and fine-tuning, energy efficiency, better results, and much lower GPU consumption. The number of tokens or weights is dramatically reduced. If you charge customers by the token, as many vendors do, this is another competitive advantage. It also leads to local implementations and …
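The routing step in such a multi-agent setup can be sketched in a few lines. This is a hypothetical illustration, not the architecture from the article: the category names, keyword sets, and scoring rule below are all assumptions.

```python
import re

# Hypothetical categories and keyword sets; a real system would learn these
# from the reconstructed taxonomy rather than hard-code them.
CATEGORY_KEYWORDS = {
    "statistics": {"regression", "variance", "quantile", "distribution"},
    "nlp": {"token", "embedding", "prompt", "corpus"},
}

def route(prompt: str) -> str:
    """Send the prompt to the specialized LLM whose keywords best match it."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(route("How are token embeddings built from a corpus?"))  # -> "nlp"
```

Each category then only needs its own small tables, which is where the latency and GPU savings described above come from.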
Extreme LLM: Case Study, Documentation, Best Practices, and Python Sources
Machine Learning Techniques
by Vincent Granville
1M ago
Extreme LLM, abbreviated as xLLM, relies on multiple specialized large language models, one per top category, to deliver highly relevant answers to specific questions, covering the whole of human knowledge or targeted content such as corporate repositories. The user, in addition to entering the classic prompt, is invited to select or guess top categories. Behind the scenes, the system involves one simple LLM per top category and a reconstructed granular taxonomy of the input sources (crawled webpages, or parsed data). Each LLM has its own set of summary tables: embeddings, links (URLs), dictionary, stopwords, …
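Since the excerpt only names the summary tables, here is a minimal sketch of what a per-category structure might look like. The field names, toy dictionaries, and scoring rule are assumptions for illustration, not xLLM's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class CategoryLLM:
    """Hypothetical per-category summary tables, one instance per top category."""
    dictionary: dict = field(default_factory=dict)  # token -> frequency
    embeddings: dict = field(default_factory=dict)  # token -> related tokens
    stopwords: set = field(default_factory=set)

    def score(self, query_tokens):
        """Relevance of this category to a query: dictionary hits, ignoring stopwords."""
        tokens = [t for t in query_tokens if t not in self.stopwords]
        return sum(self.dictionary.get(t, 0) for t in tokens)

# Toy example: two top categories with tiny dictionaries.
llms = {
    "genai": CategoryLLM(dictionary={"embedding": 12, "token": 9}),
    "finance": CategoryLLM(dictionary={"option": 7, "volatility": 5}),
}

query = ["token", "embedding"]
best = max(llms, key=lambda c: llms[c].score(query))
print(best)  # -> "genai"
```

A user-selected category would simply bypass this scoring step.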
Probabilistic ANN: The Swiss Army Knife of GenAI
Machine Learning Techniques
by Vincent Granville
2M ago
ANN (approximate nearest neighbors) is at the core of fast vector search, itself central to GenAI, especially GPT and LLMs. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthetization (improving poor synthetizations), model evaluation, and even detecting extreme observations. Just to give an example, you could use it to categorize all time series without statistical theory. Statistical models are redundant and less explainable, …
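As a rough illustration of the probabilistic flavor of such a method (not Granville's PANN itself), one can replace a full nearest-neighbor scan with random candidate sampling, trading a little accuracy for a large reduction in distance computations:

```python
import random

def approx_nn(query, points, n_trials=200, seed=0):
    """Probabilistic nearest-neighbor sketch: test random candidates,
    keep the best so far. Returns (index, squared distance)."""
    rng = random.Random(seed)
    best, best_d = None, float("inf")
    for _ in range(n_trials):
        i = rng.randrange(len(points))
        d = sum((q - p) ** 2 for q, p in zip(query, points[i]))
        if d < best_d:
            best, best_d = i, d
    return best, best_d

# Toy dataset: 1,000 points in the plane; we probe only 200 of them.
points = [(float(i), float(i % 7)) for i in range(1000)]
idx, d2 = approx_nn((500.2, 3.9), points)
```

The returned neighbor is approximate: the reported distance can only be at or above the true minimum, and more trials tighten the gap.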
My Top 10 GenAI Articles of the Year
Machine Learning Techniques
by Vincent Granville
4M ago
Here is some good reading for the holiday season. More than just reading, as the material includes full Python implementations and datasets. The most up-to-date versions are in my new book Statistical Optimization for GenAI and Machine Learning, available here. As a courtesy, if you buy it by December 31, you are entitled to a free copy of my unpublished textbook Practical AI & Machine Learning Projects and Datasets (with solutions). Contact me to get your copy! My previous GenAI book (published by Elsevier) will be released in January. Selected articles: Genome: Synthesizing DNA Sequences …
Genome: Synthesizing DNA Sequences with LLM Techniques
Machine Learning Techniques
by Vincent Granville
4M ago
This methodology is not focused on genome data alone. The purpose is to design a generic solution that may also work in other contexts, such as synthesizing molecules. The problem involves dealing with a large amount of “text”. Indeed, the sequences discussed here consist of letter arrangements from an alphabet with 5 symbols: A, C, G, T, and N. The first four stand for the types of bases found in a DNA molecule: adenine (A), cytosine (C), guanine (G), and thymine (T). The last one (N) represents missing data. No prior knowledge of genome sequencing is required. Summary: The data …
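A minimal sketch of treating DNA sequences as “text”, assuming a first-order Markov model over the 5-symbol alphabet; the article's method is more elaborate, and the training sequence below is a toy example:

```python
import random
from collections import Counter, defaultdict

def train(seq):
    """Count symbol-to-symbol transitions in a training sequence."""
    trans = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        trans[a][b] += 1
    return trans

def generate(trans, start, length, seed=42):
    """Synthesize a sequence by sampling each symbol from the
    transition counts of the previous one."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        symbols, weights = zip(*trans[out[-1]].items())
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)

trans = train("ACGTACGTNACGTTACG")   # toy training data
synthetic = generate(trans, "A", 12)
print(synthetic)                     # a 12-symbol sequence over {A, C, G, T, N}
```

Higher-order models (conditioning on the previous k symbols) follow the same pattern with longer keys.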
10 GenAI Notebooks: OpenAI, LLM, RAG, GPT, and More
Machine Learning Techniques
by Vincent Granville
4M ago
For developers and AI/ML professionals. This comprehensive free resource, offered by our sponsor, is designed to provide you with hands-on experience and deeper insights into building cutting-edge GenAI applications. Special opportunity: you can win a pair of Apple AirPods simply by following the tutorial and learning something new. How to participate: follow these two steps. First, access the code notebooks and explore their contents. Second, complete one of the notebooks from start to finish, by December 10. To start, click on “Try It Now” when visiting the notebook of your choice. By completing the notebook …
Easy Trick to Debias GenAI Models: Quantile Convolution
Machine Learning Techniques
by Vincent Granville
5M ago
All of the GenAI apps that I tested, including my own, share the same problem: they cannot easily generate data outside the observation range. As an example, let’s focus on the insurance dataset discussed in my new book. I use it to generate synthetic data with GAN (generative adversarial networks) and the NoGAN models discussed in chapters 6 and 7. In the training set, one of the features is “charges”, that is, the medical expenses incurred by the policyholder in a given year. The range is from $1,121 to $63,770. In the synthesized data, the amount always stays within these two bounds. Worst …
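The general idea can be illustrated with a toy sketch: convolve synthesized values with small noise so that outputs can escape the observed bounds. The Gaussian kernel and bandwidth below are assumptions for illustration, not the article's tuned quantile-convolution procedure.

```python
import random

def debias(values, bandwidth=0.05, seed=1):
    """Perturb each synthesized value with noise proportional to the
    observed spread, so outputs can fall outside [min, max]."""
    rng = random.Random(seed)
    spread = max(values) - min(values)
    return [v + rng.gauss(0, bandwidth * spread) for v in values]

# Toy synthesized "charges"; the bounds $1,121 and $63,770 are from the article.
charges = [1121.0, 8000.0, 25000.0, 63770.0]
synthetic = debias(charges)
# Perturbed values near the bounds may now land below $1,121 or above $63,770.
```

The bandwidth controls how far beyond the training range the tails can extend.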
New Book: Understanding Deep Learning
Machine Learning Techniques
by Vincent Granville
5M ago
By Simon Prince, computer science professor at the University of Alberta. To be published by MIT Press, December 2023. The author shares the associated Jupyter notebooks on his website, here. Very popular, the book got over 5,000 likes when the author announced it on LinkedIn. I pre-ordered my copy. Summary: An authoritative, accessible, and up-to-date treatment of deep learning that strikes a pragmatic middle ground between theory and practice. Deep learning is a fast-moving field with sweeping relevance in today’s increasingly digital world. Understanding Deep Learning provides an authoritative …
NoGAN: Ultrafast Data Synthesizer – My Talk at ODSC San Francisco
Machine Learning Techniques
by Vincent Granville
5M ago
My talk at the ODSC Conference, San Francisco, October 2023. It includes a notebook demonstration using our open-source Python libraries. View or download the PowerPoint presentation here. I discuss NoGAN, an alternative to standard tabular data synthetization. It runs 1000x faster than GAN, consistently delivering better results according to the most sophisticated evaluation metric, implemented here for the first time. A game changer that significantly reduces costs: cloud or GPU time, training time, and fine-tuning parameters replaced by auto-tuning. Now available as open-source. In real-life …
Quantum derivatives, GenAI, and the Riemann Hypothesis
Machine Learning Techniques
by Vincent Granville
5M ago
Have you ever encountered a function or cumulative distribution function (CDF) that is nowhere differentiable, yet continuous everywhere? Some are featured in this article. For a CDF, this means that it does not have a probability density function (PDF); for a standard function, it has no derivative. At least, not until now. The quantum derivative, the solution to differentiating such a function, is an awkward mathematical object. If it were of no use, I would not write about it. In fact, this object contains a lot of information about the original function. I use it here to gain …
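To see why a special derivative is needed, consider the classic Weierstrass function, which is continuous everywhere but differentiable nowhere: ordinary finite-difference slopes fail to settle down as the step shrinks. This is a standard illustration of the problem, not the quantum derivative itself.

```python
import math

def weierstrass(x, a=0.5, b=7, terms=12):
    """Truncated Weierstrass series: continuous everywhere,
    differentiable nowhere (for suitable a, b with ab >= 1)."""
    return sum(a**n * math.cos(b**n * math.pi * x) for n in range(terms))

# Finite-difference slopes at x = 0.3 do not converge as h -> 0,
# unlike for a smooth function.
for h in (1e-2, 1e-4, 1e-6):
    slope = (weierstrass(0.3 + h) - weierstrass(0.3)) / h
    print(f"h={h:.0e}  slope={slope:.1f}")
```

Any object that extracts stable local information from such a function must therefore be built differently from the classical derivative.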
