Simon Späti
I'm a dedicated, empathetic, entrepreneurial data engineer with 15+ years of experience. On my blog, I keep up with new and emerging open-source technologies.
Simon Späti
2w ago
From Fortune 500 companies processing trillions of security records to innovative startups building interactive data tools, DuckDB is revolutionizing how organizations handle analytical workloads. Building on our exploration of DuckDB’s core capabilities in Part 1, this guide showcases production implementations and promising experimental applications across five key categories.
Companies grouped into five key categories from Part 1
Each example demonstrates practical implementations, performance gains, and architectural decisions that drive business value. While some cases are included ..read more
Simon Späti
1M ago
BI-as-Code and the New Era of GenBI
Imagine creating business dashboards by simply describing what you want to see. No more clicking through complex interfaces or writing SQL queries - just have a conversation with AI about your data needs. This is the promise of Generative Business Intelligence (GenBI).
At its core, GenBI delivers an unreasonably effective human interface, where we iterate quickly, based on BI-as-Code. A simplified version looks like this ..read more
Simon Späti
1M ago
DuckDB has a significant share and is frequently featured in the latest data engineering news. However, it’s still in its early adopter phase and has yet to be adopted by larger enterprises. Sure, all data creators and startups have used and potentially grown to love DuckDB, but is it also suitable for enterprises?
What about scaling out and sharing it with others in the organization? Isn’t it only a database file? And why would anyone in a large enterprise adopt DuckDB? In this article, we’ll discuss five key use cases, categorize them, and highlight the unique advantages of an enterprise usi ..read more
Simon Späti
1M ago
Data stacks have come a long way, evolving from monolithic, one-size-fits-all systems like Oracle/SAP to today’s modular open data stacks. This raises the question: what’s next? And why is the current stack not meeting our needs?
As we see more analytics engineering and software best practices, embracing codeful, Git-based, and more CLI-based workflows, the future looks more code-first, not only for SQL transformations but across the entire data stack. From ingestion to transformation, orchestration, and measures in dashboards, all defined declaratively ..read more
Simon Späti
1M ago
I’m currently on vacation, and it is time to dive into one of my favorite topics: knowledge workflow management. As I’m sharing most of my notes and even my book publicly, it might be interesting to see my knowledge management workflow. I also journal, reflect, and connect all my notes, which sparks most of the insights I share. All of it happens in plain text in my note-taking app. This article details my Obsidian workflow, which many of you have requested. That’s why I’m sharing some more details here ..read more
Simon Späti
1M ago
In my journey, detailed in why Vim is more than an editor, I’ve discovered the profound impact of integrating Vim and its motions into my entire computer workflow. This evolution, from using familiar tools like Notepad++ and SQL Server Management Studio to embracing Vim, represents a significant shift in how I approach tasks in data engineering and writing.
This blog post delves into how this transition to Vim, coupled with a step-by-step adoption of Markdown, has streamlined my process. Moving away from the limitations of WYSIWYG editors, I’ve embraced the simplicity and power of Markdown, as ..read more
Simon Späti
1M ago
Data manipulation and analysis can be challenging, especially when working with large datasets. Thankfully, the widely used Python library Pandas has become the go-to tool for processing and manipulating data. Pandas recently released version 2.0. This article takes a closer look at what Pandas is, its success, and what the new version brings, including its ecosystem around Arrow, Polars, and DuckDB.
Pandas has established itself as the standard tool for in-memory data processing in Python, and it offers an extensive range of data manipulation capabilities. As such, it is unsu ..read more
Simon Späti
1M ago
As I sit down to write this article, I’m filled with a sense of vulnerability and excitement. You see, this is a story that only I can tell. It’s a tale of finding my Pathless Path and discovering who I am in the process.
I have learned that some of my best decision-making comes from following my gut, heart, and intuition - a place of inner knowing.
Along the way, I discovered the importance of staying flexible and adaptable. I learned that life is a journey, and it can be overwhelming; there’s no one right way to live it ..read more
Simon Späti
1y ago
Welcome to the third and final installment of our series “Data Modeling: The Unsung Hero of Data Engineering.” If you’ve journeyed with us from Part 1, where we dove into the importance and history of data modeling, or joined us in Part 2 to explore various approaches and techniques, I’m delighted you’ve stuck around. In this third part, we’ll delve into data architecture patterns and their influence on data modeling. We’ll explore general and specialized patterns, debating the merits of various approaches like batch vs ..read more
Simon Späti
1y ago
Amidst the excitement and hype surrounding artificial intelligence, the significance of data engineering and its critical foundation—data modeling—can often be overlooked. This article is the first in a three-part series that will shine a spotlight on the fascinating world of data modeling, delving into its crucial importance within the broader context of data engineering. We will explore the history of data modeling, pioneered by visionaries like Kimball and Inmon, and discuss the necessity for a comprehensive understanding of data architecture in today’s data-driven world ..read more