Data Machina #250
by Carlos
4d ago
Llama 3: A Watershed AI Moment? I reckon the release of Llama 3 is perhaps one of the most important moments in AI development so far. The Llama 3 stable is already giving birth to all sorts of amazing animals and model derivatives. You can expect Llama 3 to unleash the mother of all battles against closed AI models like GPT-4. Meta AI just posted: “Our largest Llama 3 models are over 400B parameters. And they are still being trained.” The upcoming Llama-400B will change the playing field for many independent researchers, small AI startups, solo AI developers, and also enterprise AI ..read more
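If you want to poke at the released checkpoints yourself, here is a minimal sketch of chatting with the 8B instruct model via Hugging Face transformers. The model ID is Meta's published one; gated Hub access, a recent transformers version with chat-template support in pipelines, and roughly a 16GB GPU are assumed.

```python
# Minimal sketch: chatting with Llama 3 8B Instruct via transformers.
# Assumes you have accepted Meta's license on the Hugging Face Hub and
# are logged in (huggingface-cli login).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # published model ID
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [{"role": "user", "content": "What makes MoE models efficient?"}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```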
Data Machina #248
by Carlos
2w ago
Jailbreaking AI Models: It’s easy. Hundreds of millions of dollars have been thrown at AI Safety & Alignment over the years. Despite that, jailbreaking LLMs in April 2024 is easy. Oddly enough, as LLMs become more capable and sophisticated, jailbreaking attacks are becoming easier to perform, more effective, and more frequent. Gary Marcus, who is hypercritical of LLMs and current AI trends, just published this very opinionated post: An unending array of jailbreaking attacks could be the death of LLMs. I often speak to colleagues and clients about the “LLM jailbreaking elephant ..read more
Data Machina #247
by Carlos
3w ago
The New Breed of Open Mixture-of-Experts (MoE) Models. In a push to beat the closed-box AI models from the AI Titans, many startups and research orgs have embarked on releasing open MoE-based models. This new breed of MoE-based models introduces many clever architectural tricks, and seeks to balance training cost efficiency, output quality, inference performance, and much more. For an excellent introduction to MoEs, check out this long post by the Hugging Face team: Mixture of Experts Explained. We’re starting to see several open MoE-based models achieving near-SOTA or SOTA performance as compared ..read more
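To make the core trick concrete, here is a toy top-2 MoE feed-forward layer in PyTorch (my own minimal construction, not any particular model's code): a learned router picks two experts per token and mixes their outputs. Real open MoE models add load-balancing losses, capacity limits, and fused expert kernels on top of this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model, d_ff, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # learned gate
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalise the top-k gates
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # route each token to its experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(16, 64)                         # 16 tokens, d_model=64
print(MoELayer(d_model=64, d_ff=256)(x).shape)  # torch.Size([16, 64])
```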
Data Machina #246
by Carlos
1M ago
New Trends in Vision-Language Models (VLMs). The evolution of VLMs in recent months has been pretty impressive. Today VLMs exhibit some amazing capabilities. See the two links below on what VLMs can do and how they work: A Guide to Vision-Language Models (VLMs), and Vision-Language Models for Vision Tasks: A Survey. But VLMs still face challenges, for example in terms of: multimodal training datasets, resolution, long-form modality, vision-language integration, and concept understanding. Somewhat along those lines, I see 5 trends happening in VLMs: 1) VLMs running in local environments 2 ..read more
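On trend 1, running an open VLM locally is now a few lines of code. A minimal sketch with Hugging Face transformers and the llava-hf/llava-1.5-7b-hf checkpoint (one open option among several; the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("photo.jpg")  # placeholder: any local image
# LLaVA-1.5 prompt format: the <image> token marks where the pixels go.
prompt = "USER: <image>\nWhat is happening in this picture? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(model.device, torch.float16)  # match the model's dtype
out = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(out[0], skip_special_tokens=True))
```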
Data Machina #245
by Carlos
1M ago
The GenAI RAG House Revisited. Since Facebook AI introduced RAG back in 2020, RAG systems have evolved from Naive to Advanced, and then to Modular RAG. But Modular RAG also added more complexity, components, interfaces, etc. to the LLMOps pipeline. Many naive RAG and advanced RAG projects never made it to prod. I know many companies that have spent a lot of effort and money on building enterprise RAG apps, only to realise they couldn’t produce accurate, reliable results at a manageable cost. Building a RAG system that is scalable, cost-efficient, accurate, and modular requires deep expertise ..read more
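For orientation, the core loop that all of these variants build on fits in a few lines. A minimal sketch of naive RAG (embed, retrieve, stuff into a prompt) using sentence-transformers; the `llm` callable and the toy documents are placeholders, and modular RAG wraps query rewriting, reranking, routing, and evaluation around this skeleton:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "RAG was introduced by Facebook AI (Lewis et al., 2020).",
    "Modular RAG decomposes the pipeline into swappable components.",
    "Naive RAG often struggles with multi-hop questions.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def retrieve(query, k=2):
    q = encoder.encode([query], normalize_embeddings=True)
    scores = doc_vecs @ q.T             # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(-scores[:, 0])[:k]]

def answer(query, llm):
    """llm: any text-completion callable you plug in (placeholder)."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {query}\nA:"
    return llm(prompt)
```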
Data Machina #244
by Carlos
1M ago
AI Reasoning Like Humans. The storm had been battering the airport viciously. Three hours later we departed, enduring some massive turbulence. Then this: “Captain speaking. This is to inform you that we’ll be performing an auto-pilot landing [watch this] upon arriving at Heathrow.” We should trust the AI co-pilot’s reasoning in harsh situations. Shouldn’t we?… Five days ago, Anthropic introduced the next-gen Claude 3 model family. I’ve tried Claude 3: It’s very good at certain language tasks, it matches or beats GPT-4 Turbo in several areas, has a huge context window, and it’s considerably cheaper. Funnily eno ..read more
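If you want to try Claude 3 yourself, here is a minimal sketch using the official anthropic Python SDK and the Messages API; the model name is the Opus release ID, and an ANTHROPIC_API_KEY environment variable is assumed to be set.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-3-opus-20240229",  # Opus release ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Briefly, how does autoland work?"}],
)
print(message.content[0].text)
```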
Data Machina #243
by Carlos
1M ago
Beyond GenAI & LLMs. GenAI & LLMs have all but hijacked the DL/ML space, sucking up most of the investment and top AI minds, as if there were nothing else under the sun. There are hundreds of GenAI models out there, and many of them are of questionable value or a rinse-and-repeat recycling of the same ideas. Check out this mega spreadsheet: An Annotated and Updated Directory of GenAI Models. So: Many DL/ML researchers are starting to question the LLM status quo: Shouldn’t we focus money and brains on new, alternative AI/DL paradigms beyond LLMs? And: Do we need so many GenAI mode ..read more
Data Machina #242
by Carlos
2M ago
AI and Causality. The introduction of OpenAI Sora (which simulates real-world scenes from video understanding) has sparked a bit of a debate among some prominent AI researchers. First: what do AI researchers mean by “causal”? Second: do LLMs have causal reasoning capabilities? Can LLMs learn causality from just real-world training data? Can LLMs learn, represent, and understand world models and physics? Judea Pearl, one of the world’s top researchers in Probabilistic AI, Bayesian Networks, and Causal Inference, once famously said in an interview: Deep Learning, albeit complex and non-trivial, is a curve fi ..read more
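Pearl's observational-vs-interventional distinction is easy to demonstrate in a few lines. A toy simulation of my own construction: a confounder Z drives both X and Y, so P(Y | X=1) looks like a causal effect even though X does nothing, while the interventional quantity P(Y | do(X=1)) reveals there is none.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.binomial(1, 0.5, n)          # hidden confounder
x = rng.binomial(1, 0.2 + 0.6 * z)   # Z raises P(X=1)
y = rng.binomial(1, 0.1 + 0.5 * z)   # Z raises P(Y=1); X has no effect on Y

# Observational: conditioning on X=1 picks up Z's influence (~0.50).
print("P(Y=1 | X=1)     =", y[x == 1].mean())

# Interventional: do(X=1) cuts the Z -> X arrow; since Y ignores X in this
# graph, P(Y=1 | do(X=1)) is just the marginal P(Y=1) (~0.35).
print("P(Y=1 | do(X=1)) =", y.mean())
```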
Data Machina #241
by Carlos
2M ago
AI World Models and Video. My world vision model for waking up at 4am to travel overseas is frankly a bit fuzzy and unreliable. But what’s an AI world model? My 2-cents explainer: it’s a model that builds an internal representation of a real-world, physical, [human] environment, and uses that to predict or simulate future events within that environment. Until recently, research in AI world models has been very much focused on video games and Reinforcement Learning. But now, the boom of GenAI and large multi-modal models has triggered a new trend in AI world models based on large scale video u ..read more
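To pin the definition down, here is a toy latent world model in PyTorch (my own illustration, not any published architecture): encode an observation into a latent state, learn a dynamics function over that state, and "imagine" rollouts without ever querying the real environment. Dreamer-style agents and the new video-based models are industrial-scale versions of this loop.

```python
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, obs_dim=32, act_dim=4, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)   # observation -> latent state
        self.dynamics = nn.Linear(latent_dim + act_dim, latent_dim)  # next state
        self.decoder = nn.Linear(latent_dim, obs_dim)   # latent state -> observation

    def imagine(self, obs, actions):
        """Roll forward in latent space for a sequence of actions: no env calls."""
        z = torch.tanh(self.encoder(obs))
        frames = []
        for a in actions:
            z = torch.tanh(self.dynamics(torch.cat([z, a], dim=-1)))
            frames.append(self.decoder(z))              # predicted observation
        return torch.stack(frames)

wm = TinyWorldModel()
obs = torch.randn(1, 32)
acts = [torch.randn(1, 4) for _ in range(5)]
print(wm.imagine(obs, acts).shape)  # torch.Size([5, 1, 32])
```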
Data Machina #240
by Carlos
2M ago
Foundation Models, Transformers and Time-Series. Statisticians and econometricians have been searching for the Holy Grail of time-series forecasting (TSF) for more than 40 years. “Classical” models like ARIMA still work remarkably well in some TSF scenarios. But today, the cool stuff is all about transformer-based DL & foundation models for TSF. How “good” are these new DL models for TSF? How do we evaluate these models? Do these new models really achieve SOTA performance, as some papers claim? Are DL researchers cherry-picking time-series datasets to easily fit a SOTA TSF DL model?... Real world ti ..read more
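Whatever the verdict on the DL side, the classical baseline is cheap to run, so any honest evaluation should include it. A minimal sketch of an ARIMA baseline with statsmodels on a toy series; the (2, 1, 2) order is an arbitrary illustrative choice you would normally select via AIC or a grid search.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
t = np.arange(200)
series = 0.05 * t + np.sin(t / 8) + rng.normal(0, 0.3, t.size)  # toy trend + cycle + noise

train, test = series[:180], series[180:]
model = ARIMA(train, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=test.size)

mae = np.abs(forecast - test).mean()
print(f"20-step MAE: {mae:.3f}")  # compare against your DL model's MAE
```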