Astronomer's Role in the Airflow Ecosystem: A Deep Dive with Pete DeJoy
Data Engineering Podcast
by Tobias Macey
1w ago
Summary In this episode of the Data Engineering Podcast Pete DeJoy, co-founder and product lead at Astronomer, talks about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3. Pete shares his journey into data engineering, discusses Astronomer's contributions to the Airflow project, and highlights the critical role of Airflow in powering operational data products. He covers the evolution of Airflow, its position in the data ecosystem, and the challenges faced by data engineers, including infrastructure management and observability. The conversation ..read more
Visit website
Accelerated Computing in Modern Data Centers With Datapelago
Data Engineering Podcast
by Tobias Macey
1w ago
Summary In this episode of the Data Engineering Podcast Rajan Goyal, CEO and co-founder of Datapelago, talks about improving efficiencies in data processing by reimagining system architecture. Rajan explains the shift from hyperconverged to disaggregated and composable infrastructure, highlighting the importance of accelerated computing in modern data centers. He discusses the evolution from proprietary to open, composable stacks, emphasizing the role of open table formats and the need for a universal data processing engine, and outlines Datapelago's strategy to leverage existing frameworks li ..read more
Visit website
The Future of Data Engineering: AI, LLMs, and Automation
Data Engineering Podcast
by Tobias Macey
3w ago
Summary In this episode of the Data Engineering Podcast Gleb Mezhanskiy, CEO and co-founder of DataFold, talks about the intersection of AI and data engineering. He discusses the challenges and opportunities of integrating AI into data engineering, particularly using large language models (LLMs) to enhance productivity and reduce manual toil. The conversation covers the potential of AI to transform data engineering tasks, such as text-to-SQL interfaces and creating semantic graphs to improve data accessibility, and explores practical applications of LLMs in automating code reviews, testing, an ..read more
Visit website
Evolving Responsibilities in AI Data Management
Data Engineering Podcast
by Tobias Macey
1M ago
Summary In this episode of the Data Engineering Podcast Bartosz Mikulski talks about preparing data for AI applications. Bartosz shares his journey from data engineering to MLOps and emphasizes the importance of data testing over software development in AI contexts. He discusses the types of data assets required for AI applications, including extensive test datasets, especially in generative AI, and explains the differences in data requirements for various AI application styles. The conversation also explores the skills data engineers need to transition into AI, such as familiarity with vector ..read more
Visit website
CSVs Will Never Die And OneSchema Is Counting On It
Data Engineering Podcast
by Tobias Macey
2M ago
Summary In this episode of the Data Engineering Podcast Andrew Luo, CEO of OneSchema, talks about handling CSV data in business operations. Andrew shares his background in data engineering and CRM migration, which led to the creation of OneSchema, a platform designed to automate CSV imports and improve data validation processes. He discusses the challenges of working with CSVs, including inconsistent type representation, lack of schema information, and technical complexities, and explains how OneSchema addresses these issues using multiple CSV parsers and AI for data type inference and validat ..read more
Visit website
Breaking Down Data Silos: AI and ML in Master Data Management
Data Engineering Podcast
by Tobias Macey
2M ago
Summary In this episode of the Data Engineering Podcast Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights the importance of combining traditional techniques with modern AI to address the nuances of data reconciliation. Dan emphasizes the transformative potential of large ..read more
Visit website
Building a Data Vision Board: A Guide to Strategic Planning
Data Engineering Podcast
by Tobias Macey
3M ago
Summary In this episode of the Data Engineering Podcast Lior Barak shares his insights on developing a three-year strategic vision for data management. He discusses the importance of having a strategic plan for data, highlighting the need for data teams to focus on impact rather than just enablement. He introduces the concept of a "data vision board" and explains how it can help organizations outline their strategic vision by considering three key forces: regulation, stakeholders, and organizational goals. Lior emphasizes the importance of balancing short-term pressures with long-term strategi ..read more
Visit website
How Orchestration Impacts Data Platform Architecture
Data Engineering Podcast
by Tobias Macey
3M ago
Summary The core task of data engineering is managing the flows of data through an organization. In order to ensure those flows are executing on schedule and without error is the role of the data orchestrator. Which orchestration engine you choose impacts the ways that you architect the rest of your data platform. In this episode Hugo Lu shares his thoughts as the founder of an orchestration company on how to think about data orchestration and data platform design as we navigate the current era of data engineering. Announcements Hello and welcome to the Data Engineering Podcast, the show ab ..read more
Visit website
An Exploration Of The Impediments To Reusable Data Pipelines
Data Engineering Podcast
by Tobias Macey
3M ago
Summary In this episode of the Data Engineering Podcast the inimitable Max Beauchemin talks about reusability in data pipelines. The conversation explores the "write everything twice" problem, where similar pipelines are built without code reuse, and discusses the challenges of managing different SQL dialects and relational databases. Max also touches on the evolving role of data engineers, drawing parallels with front-end engineering, and suggests that generative AI could facilitate knowledge capture and distribution in data engineering. He encourages the community to share reference implemen ..read more
Visit website
The Art of Database Selection and Evolution
Data Engineering Podcast
by Tobias Macey
3M ago
Summary In this episode of the Data Engineering Podcast Sam Kleinman talks about the pivotal role of databases in software engineering. Sam shares his journey into the world of data and discusses the complexities of database selection, highlighting the trade-offs between different database architectures and how these choices affect system design, query performance, and the need for ETL processes. He emphasizes the importance of understanding specific requirements to choose the right database engine and warns against over-engineering solutions that can lead to increased complexity. Sam also tou ..read more
Visit website

Follow Data Engineering Podcast on FeedSpot

Continue with Google
Continue with Apple
OR