Data Engineering in Towards Data Science
Read writing about Data Engineering in Towards Data Science. Your home for data science.
1w ago
The evolution of evaluating Data Stacks. Image by the author.
One step forward; no steps back
We’ve all heard of the Pareto Principle. It’s also known as the 80/20 rule. It’s the idea that you can get 80% of the work done with 20% of the effort; the final 20% of the job takes up the other 80% of the effort.
It derives from the work of an economist so famous that you even study him at school: Vilfredo Pareto. The first thing they teach you at school is a different concept that also bears his name: Pareto Efficiency.
Vilfredo Pareto. Source: Wikipedia
An out ..read more
1w ago
Exploiting the full potential of universal data supply
Photo by NASA on Unsplash
There is a lot of talk about the value of data being the new gold. Many companies are therefore pouring large sums of money into becoming data-driven. It is sold as a completely new way of doing business, almost as if business has never been driven by data before.
Sure, many companies struggle with being fully data-driven, but I think the bigger issue is that it’s not at all sufficient.
We all seek to extract something magical from the data. That’s why we collect data in big data lakes, data wareho ..read more
3w ago
One answer and many best practices for how larger organizations can operationalize data quality programs for modern data platforms
An answer to “who does what” for enterprise data quality. Image courtesy of the author.
I’ve spoken with dozens of enterprise data professionals at the world’s largest corporations, and one of the most common data quality questions is, “who does what?” This is quickly followed by, “why and how?”
There is a reason for this. Data quality is like a relay race. The success of each leg — detection, triage, resolution, and measurement — depends on the other ..read more
3w ago
Open-Source Data Observability with Elementary — From Zero to Hero (Part 1)
A step-by-step hands-on guide I wish I had when I was a beginner
Data observability and its importance have often been discussed and written about as a crucial aspect of modern data and analytics engineering. Many tools are available on the market with various features and prices. In this two-part article, we will focus on the open-source version of Elementary, one of these data observability platforms, tailored for and designed to work seamlessly with dbt. We will start by setting up from zero and aiming to un ..read more
1M ago
Data Mesh trends in data platform design
AI-generated image using Kandinsky
In this article, I aim to delve into the various types of data platform architectures, taking a closer look at their evolution, strengths, weaknesses, and practical applications. A key focus will be the Data Mesh architecture and its role in the Modern Data Stack (MDS) and today’s data-driven landscape.
It’s a well-known fact that the architecture of a data platform profoundly affects its performance and scalability. The challenge often lies in selecting an architecture that best aligns with your specific business n ..read more
1M ago
DATA ENGINEERING
Fundamentals, responsibilities, and challenges
Image generated by the author.
In today’s digital landscape, marketing is increasingly driven by data. Unlike traditional marketing (for example, television, radio, billboards, signs, and print), where the impact of marketing channels, strategies, and ads was often hard to measure, digital marketing allows us to measure these impacts with precision. In fact, digital marketing has become a more effective approach not just because of the large audiences online, but because it enables us to evaluate our marketing efforts with gre ..read more
1M ago
Expert techniques to elevate your analysis
AI-generated image using Kandinsky
This story delves into advanced SQL techniques that will be useful for data science practitioners. In this piece, I will provide a detailed exploration of expert-grade SQL queries I use daily in my analytics projects. SQL, along with modern data warehouses, forms the backbone of data science. It is an indispensable tool for data manipulation and user behaviour analytics. The techniques I am going to talk about are designed to be practical and beneficial from the data science perspective. Mastery of SQL is a valu ..read more
1M ago
Getting to the bottom of what structuring your data responsibly really means
Imagine constructing a skyscraper without a blueprint, figuring out how to lay concrete, align support beams and wire electrical in an ad-hoc way as the building takes shape.
Well, this is exactly what’s happening in data organisations today…
Data modelling, the foundational blueprint of the data ecosystem, is being neglected in the rush to adopt the latest technologies and deliver quick results. And we see a predictable mess of problems being addressed with short-term engineering fixes.
What Even is Data Mo ..read more
1M ago
Unlocking the power of large language models
Photo by ZHENYU LUO on Unsplash
In this article, I will examine how large language models (LLMs) can convert natural language into SQL, making query writing more accessible to non-technical users. The discussion will include practical examples that showcase the ease of developing LLM-based solutions. We’ll also cover various use cases and demonstrate the process by creating a simple Slack application. Building an AI-driven database querying system involves several critical considerations, including maintaining security, ensuring data relev ..read more
1M ago
The Azure Landing Zone for a Data Platform in the Cloud
Working with sensitive data or within a highly regulated environment requires safe and secure cloud infrastructure for data processing. The cloud might seem like an open environment on the internet and raise security concerns. When you start your journey with Azure and don’t have enough experience with resource configuration, it is easy to make design and implementation mistakes that can impact the security and flexibility of your new data platform. In this post, I’ll describe the most important aspects of designing a cloud adapta ..read more