Dataset Factory - A Toolchain for Generative Computer Vision Datasets
Iterative Blog
1M ago
The rapid proliferation of analytical and Generative AI solutions is pushing the requirements for data versioning and data curation to the next level, where dataset management tools must understand the data and be able to use metadata to curate it. This goal is not achievable with traditional MLOps toolchains, which remain blind to the content of the files they manage. We solve this problem by introducing the next generation of Data-Centric AI software: DVCx. We have been building DVCx for several years now and are happy to share some of the thinking and motivation that went into this product. For ..read more
Tutorial: Scalable and Distributed ML Workflows with DVC and Ray on AWS (Part 2)
Iterative Blog
1M ago
In Part 1 of the tutorial, we explored the basics of setting up and integrating DVC with Ray for distributed machine learning workflows. By leveraging Ray's distributed computing capabilities and DVC's data version control, we established a robust framework for managing complex ML experiments. This combination allows for enhanced scalability, reproducibility, and collaboration in ML projects. In Part 2, we extend the solution to a Ray Cluster on AWS, demonstrating how to adapt the setup for cloud-based distributed computing. This part involves configuring AWS resources, deploying Ray clusters in ..read more
Running DVC on a SLURM cluster
Iterative Blog
1M ago
Introduction: For many ML projects, there comes a point when local development hits a wall and we need to scale up the underlying compute resources. Maybe the dataset grows too large for the primary workstation, or the deep learning model requires several high-end GPUs. This should be a routine transition for ML developers, and one to which they shouldn’t have to give much thought. In this blog post, we’ll explain our approach to remote DVC experiments on a SLURM cluster and share some code to get you started. We work at an AI-driven precision medicine company called Exscientia. Our goal is t ..read more
Tutorial: Scalable and Distributed ML Workflows with DVC and Ray (Part 1)
Iterative Blog
1M ago
Training models at the scale of the Gemini or GPT-4 models requires advanced tools that manage complexity while ensuring efficiency. This tutorial explores how Data Version Control (DVC) can be a game-changer for ambitious projects. DVC simplifies AI development by automating pipelines, managing versions, and tracking experiments while embracing GitOps for reproducibility. It excels in both local and cloud environments for traditional ML workflows. However, the rise of Generative AI and complex deep learning projects demands scalable, distributed training solutions. This tutorial is divided in ..read more
Tutorial: Automate Data Validation and Model Monitoring Pipelines with DVC and Evidently
Iterative Blog
1M ago
Feel free to clone the repository provided. It's more than a learning tool; it's a flexible reference architecture that you can adapt to fit your unique use cases. Why DVC and Evidently? In the realm of Machine Learning Operations (MLOps), ensuring the robustness and reliability of models is paramount. Using the right tools can significantly enhance your MLOps practices. [Figure: Typical Machine Learning Operations (MLOps) workflow] DVC is an open-source tool that brings agility and reproducibility to data science projects by treating data and model training pipelines as software. It connects versioned ..read more
Integrating DVC and Git LFS via libgit2 filters
Iterative Blog
1M ago
One of the main features provided by DVC is the ability to import and download files from any Git repository. In prior releases, this came with the caveat that projects using Git LFS were unsupported. As of version 3.31.0, DVC supports reading Git LFS objects, so you can now dvc import upstream datasets from platforms like Hugging Face that use Git LFS, without needing to install any additional dependencies! Read on for an overview of how the DVC Git LFS client was implemented. To get started using DVC with Hugging Face, please refer to the DVC integrations documentation. DVC builds on ..read more
Turn Your Favorite IDE into a Full Machine Learning Experimentation Platform
Iterative Blog
1M ago
Need an easy way to run and track your experiments? Install the DVC extension from the VS Code marketplace. Then run experiments, visualize deep learning metrics in real time, compare experiments, and save the ones you like, all from your IDE. [Figure: Run a Python file and see results] Want to simplify your chaotic ML iterations? With the DVC extension, you can run reproducible workflows directly from VS Code. [Figure: Run a new experiment directly from VS Code] Live plots let you visualize metrics from these runs in real time. [Figure: View plots in real-time] To make it easy for you to create the workflows, the extens ..read more
Leveraging LLMs in Chatbots: The DVC Approach
Iterative Blog
1M ago
In the modern world of Machine Learning (ML) and Natural Language Processing (NLP), there's been a surge in applications built on top of Large Language Models (LLMs). Adoption has been almost exponential, with companies building LLM-based applications across a variety of areas. In this post we will show how DVC can make designing LLM applications more efficient and organized. We take a Retrieval-Augmented Generation (RAG) approach and illustrate how we can break down the various phases of a RAG chatbot and version them with DVC. We can use DVC to both "time travel" and a ..read more
Achieving SOC 2 Type 2 Compliance - Our Continued Commitment to Data Security
Iterative Blog
1M ago
As data breaches and cybersecurity threats are on the rise, safeguarding sensitive information has become a necessity for organizations that truly care about their customers. We are proud to announce the next milestone for our company to this end – the attainment of SOC 2 Type 2 compliance. This achievement underscores our unwavering commitment to ensuring the security and privacy of our customers' data. What is SOC 2 Type 2 Compliance? Before we delve into the significance of this accomplishment, let's briefly explain what SOC 2 Type 2 compliance entails. SOC 2 (Service Organization Control 2 ..read more
Fine-Tuning Large Language Models with a Production-Grade Pipeline
Iterative Blog
1M ago
Introduction - Solving cloud resources and reproducibility for LLMs. A few weeks ago, I wrote a post about the challenges of training large ML models, in particular: the need for more computing power and the complexity of managing cloud resources; and the difficulty of keeping track of ML experiments and reproducing results. There I proposed a solution to these problems by using SkyPilot and DVC to manage cloud resources and track experiments, respectively. These problems are especially relevant for large language models, where both the model size and the amount of data required for training a ..read more
