Data Machina #250
by Carlos
4d ago
Llama 3: A Watershed AI Moment? I reckon the release of Llama 3 is perhaps one of the most important moments in AI development so far. The Llama 3 stable is already giving birth to all sorts of amazing animals and model derivatives. You can expect Llama 3 to unleash the mother of all battles against closed AI models like GPT-4. Meta AI just posted: “Our largest Llama 3 models are over 400B parameters. And they are still being trained.” The upcoming Llama-400B will change the playing field for many independent researchers, small AI startups, solo AI developers, and also enterprise AI ..read more
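If you want to poke at the released checkpoints yourself, here is a minimal sketch of chatting with the 8B instruct model via Hugging Face transformers. The model ID is Meta's published one; gated Hub access, a recent transformers version with chat-template support in pipelines, and roughly a 16GB GPU are assumed.

```python
# Minimal sketch: chatting with Llama 3 8B Instruct via transformers.
# Assumes you have accepted Meta's license on the Hugging Face Hub and
# are logged in (huggingface-cli login).
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # published model ID
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [{"role": "user", "content": "What makes MoE models efficient?"}]
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```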
Data Machina #248
by Carlos
2w ago
Jailbreaking AI Models: It’s easy. Hundreds of millions of dollars have been thrown at AI Safety & Alignment over the years. Despite that, jailbreaking LLMs in April 2024 is easy. Oddly enough, as LLMs become more capable and sophisticated, jailbreaking attacks are becoming easier to perform, more effective, and more frequent. Gary Marcus, who is hypercritical of LLMs and current AI trends, just published this very opinionated post: An unending array of jailbreaking attacks could be the death of LLMs. I often speak to colleagues and clients about the “LLM jailbreaking elephant ..read more
Data Machina #247
by Carlos
3w ago
The New Breed of Open Mixture-of-Experts (MoE) Models. In a push to beat the closed-box AI models from the AI Titans, many startups and research orgs have embarked on releasing open MoE-based models. This new breed of MoE-based models introduces many clever architectural tricks, and seeks to balance training cost efficiency, output quality, inference performance, and much more. For an excellent introduction to MoEs, check out this long post by the Hugging Face team: Mixture of Experts Explained. We’re starting to see several open MoE-based models achieving near-SOTA or SOTA performance as compared ..read more
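To make the core trick concrete, here is a toy top-2 MoE feed-forward layer in PyTorch (my own minimal construction, not any particular model's code): a learned router picks two experts per token and mixes their outputs. Real open MoE models add load-balancing losses, capacity limits, and fused expert kernels on top of this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model, d_ff, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # learned gate
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalise the top-k gates
        out = torch.zeros_like(x)
        for k in range(self.top_k):             # route each token to its experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(16, 64)                         # 16 tokens, d_model=64
print(MoELayer(d_model=64, d_ff=256)(x).shape)  # torch.Size([16, 64])
```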
Data Machina #246
by Carlos
1M ago
New Trends in Vision-Language Models (VLMs). The evolution of VLMs in recent months has been pretty impressive. Today VLMs exhibit some amazing capabilities. See the two links below on what VLMs can do and how they work: A Guide to Vision-Language Models (VLMs), and Vision-Language Models for Vision Tasks: A Survey. But VLMs still face challenges, for example in terms of: multimodal training datasets, resolution, long-form modality, vision-language integration, and concept understanding. Somewhat along those lines, I see 5 trends happening in VLMs: 1) VLMs running in local environments 2 ..read more
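On trend 1, running an open VLM locally is now a few lines of code. A minimal sketch with Hugging Face transformers and the llava-hf/llava-1.5-7b-hf checkpoint (one open option among several; the image path is a placeholder):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("photo.jpg")  # placeholder: any local image
# LLaVA-1.5 prompt format: the <image> token marks where the pixels go.
prompt = "USER: <image>\nWhat is happening in this picture? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(model.device, torch.float16)  # match the model's dtype
out = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(out[0], skip_special_tokens=True))
```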
Data Machina #245
by Carlos
1M ago
The GenAI RAG House Revisited. Since Facebook AI introduced RAG back in 2020, RAG systems have evolved from Naive to Advanced, and then to Modular RAG. But Modular RAG also added more complexity, components, interfaces, etc. to the LLMOps pipeline. Many naive RAG and advanced RAG projects never made it to prod. I know many companies that have spent a lot of effort and money on building enterprise RAG apps, only to realise they couldn’t produce accurate, reliable results at a manageable cost. Building a RAG system that is scalable, cost-efficient, accurate, and modular requires deep expertise ..read more
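For orientation, the core loop that all of these variants build on fits in a few lines. A minimal sketch of naive RAG (embed, retrieve, stuff into a prompt) using sentence-transformers; the `llm` callable and the toy documents are placeholders, and modular RAG wraps query rewriting, reranking, routing, and evaluation around this skeleton:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "RAG was introduced by Facebook AI (Lewis et al., 2020).",
    "Modular RAG decomposes the pipeline into swappable components.",
    "Naive RAG often struggles with multi-hop questions.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

def retrieve(query, k=2):
    q = encoder.encode([query], normalize_embeddings=True)
    scores = doc_vecs @ q.T             # cosine similarity (unit vectors)
    return [docs[i] for i in np.argsort(-scores[:, 0])[:k]]

def answer(query, llm):
    """llm: any text-completion callable you plug in (placeholder)."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {query}\nA:"
    return llm(prompt)
```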
Data Machina #244
by Carlos
1M ago
AI Reasoning Like Humans. The storm had been battering the airport viciously. Three hours later we departed, enduring some massive turbulence. Then this: “Captain speaking. This is to inform you that we’ll be performing an auto-pilot landing [watch this] upon arriving at Heathrow.” We should trust the AI co-pilot’s reasoning in harsh situations. Shouldn’t we?… Five days ago, Anthropic introduced the next-gen Claude 3 model family. I’ve tried Claude 3: It’s very good at certain language tasks, it matches or beats GPT-4 Turbo in several areas, has a huge context window, and it’s considerably cheaper. Funnily eno ..read more
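If you want to try Claude 3 yourself, here is a minimal sketch using the official anthropic Python SDK and the Messages API; the model name is the Opus release ID, and an ANTHROPIC_API_KEY environment variable is assumed to be set.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-3-opus-20240229",  # Opus release ID
    max_tokens=256,
    messages=[{"role": "user", "content": "Briefly, how does autoland work?"}],
)
print(message.content[0].text)
```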
Data Machina #243
by Carlos
1M ago
Beyond GenAI & LLMs. GenAI & LLMs have all but hijacked the DL/ML space, sucking up most of the investment and top AI minds, as if there were nothing else under the sun. There are hundreds of GenAI models out there, and many of them are of questionable value or a rinse-and-repeat recycling of the same ideas. Check out this mega spreadsheet: An Annotated and Updated Directory of GenAI Models. So: Many DL/ML researchers are starting to question the LLM status quo: Shouldn’t we focus money and brains on new, alternative AI/DL paradigms beyond LLMs? And: Do we need so many GenAI mode ..read more
Data Machina #242
by Carlos
2M ago
AI and Causality. The introduction of OpenAI Sora (which simulates real-world scenes from video understanding) has sparked a bit of a debate among some prominent AI researchers. First: what do AI researchers mean by “causal”? Second: do LLMs have causal reasoning capabilities? Can LLMs learn causality from just real-world training data? Can LLMs learn, represent, and understand world models and physics? Judea Pearl, one of the world’s top researchers in Probabilistic AI, Bayesian Networks, and Causal Inference, once famously said in an interview: Deep Learning, albeit complex and non-trivial, is a curve fi ..read more
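Pearl's observational-vs-interventional distinction is easy to demonstrate in a few lines. A toy simulation of my own construction: a confounder Z drives both X and Y, so P(Y | X=1) looks like a causal effect even though X does nothing, while the interventional quantity P(Y | do(X=1)) reveals there is none.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.binomial(1, 0.5, n)          # hidden confounder
x = rng.binomial(1, 0.2 + 0.6 * z)   # Z raises P(X=1)
y = rng.binomial(1, 0.1 + 0.5 * z)   # Z raises P(Y=1); X has no effect on Y

# Observational: conditioning on X=1 picks up Z's influence (~0.50).
print("P(Y=1 | X=1)     =", y[x == 1].mean())

# Interventional: do(X=1) cuts the Z -> X arrow; since Y ignores X in this
# graph, P(Y=1 | do(X=1)) is just the marginal P(Y=1) (~0.35).
print("P(Y=1 | do(X=1)) =", y.mean())
```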
Data Machina #241
by Carlos
2M ago
AI World Models and Video. My world vision model for waking up at 4am to travel overseas is frankly a bit fuzzy and unreliable. But what’s an AI world model? My 2-cents explainer: it’s a model that builds an internal representation of a real-world, physical, [human] environment, and uses that to predict or simulate future events within that environment. Until recently, research in AI world models has been very much focused on video games and Reinforcement Learning. But now, the boom of GenAI and large multi-modal models has triggered a new trend in AI world models based on large scale video u ..read more
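To pin the definition down, here is a toy latent world model in PyTorch (my own illustration, not any published architecture): encode an observation into a latent state, learn a dynamics function over that state, and "imagine" rollouts without ever querying the real environment. Dreamer-style agents and the new video-based models are industrial-scale versions of this loop.

```python
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    def __init__(self, obs_dim=32, act_dim=4, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)   # observation -> latent state
        self.dynamics = nn.Linear(latent_dim + act_dim, latent_dim)  # next state
        self.decoder = nn.Linear(latent_dim, obs_dim)   # latent state -> observation

    def imagine(self, obs, actions):
        """Roll forward in latent space for a sequence of actions: no env calls."""
        z = torch.tanh(self.encoder(obs))
        frames = []
        for a in actions:
            z = torch.tanh(self.dynamics(torch.cat([z, a], dim=-1)))
            frames.append(self.decoder(z))              # predicted observation
        return torch.stack(frames)

wm = TinyWorldModel()
obs = torch.randn(1, 32)
acts = [torch.randn(1, 4) for _ in range(5)]
print(wm.imagine(obs, acts).shape)  # torch.Size([5, 1, 32])
```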
Data Machina #240
by Carlos
2M ago
Foundation Models, Transformers and Time-Series. Statisticians and econometricians have been searching for the Holy Grail of time-series forecasting (TSF) for more than 40 years. “Classical” models like ARIMA still work remarkably well in some TSF scenarios. But today, the cool stuff is all about transformer-based DL & foundation models for TSF. How “good” are these new DL models for TSF? How do we evaluate these models? Do these new models really achieve SOTA performance, as some papers claim? Are DL researchers cherry-picking time-series datasets to easily fit a SOTA TSF DL model?... Real world ti ..read more
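Whatever the verdict on the DL side, the classical baseline is cheap to run, so any honest evaluation should include it. A minimal sketch of an ARIMA baseline with statsmodels on a toy series; the (2, 1, 2) order is an arbitrary illustrative choice you would normally select via AIC or a grid search.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
t = np.arange(200)
series = 0.05 * t + np.sin(t / 8) + rng.normal(0, 0.3, t.size)  # toy trend + cycle + noise

train, test = series[:180], series[180:]
model = ARIMA(train, order=(2, 1, 2)).fit()
forecast = model.forecast(steps=test.size)

mae = np.abs(forecast - test).mean()
print(f"20-step MAE: {mae:.3f}")  # compare against your DL model's MAE
```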