Apache Doris
97 FOLLOWERS
Read writing from ApacheDoris on Medium. Apache Doris is an open-source real-time data warehouse. The blog features useful insights for data engineers, data analysts, and enthusiasts, with step-by-step guides, discussions, solutions to complex problems, and industry news.
Apache Doris
2M ago
The Apache Doris community is excited to announce its first in-person meetup in Singapore!
Ready to meet the brilliant minds in the big data world and enjoy a friendly open-source atmosphere? This meetup brings together our main developers, users, and friends to discuss various topics about Apache Doris, including technology introductions and user experiences. Whether you are new to Apache Doris or already know it well, you will be inspired by the interesting talks and meet a lot of big data enthusiasts.
Agenda
14:30~15:00 Registration & Networking
15:00~15:05 Introductions
15:05~15:35 A ...
Apache Doris
2M ago
Live streaming and e-commerce are among the biggest revenue drivers of TikTok, and both rely on real-time data processing. Real-time processing is more challenging than offline batch processing because it involves complicated operations like multi-stream JOINs and dimension table changes. It demands more development and maintenance effort, and because system stability must be guaranteed, it often leads to resource redundancy and waste.
We are excited to invite the data platform team of TikTok to talk about how they use Apache Doris in their real-time data architecture and how they bene ...
Apache Doris
3M ago
To handle large datasets, distributed databases introduce strategies like partitioning and bucketing. Data is divided into smaller units based on specific rules and distributed across different nodes, so databases can perform parallel processing for higher performance and data management flexibility.
As in many databases, Apache Doris shards data into partitions, and each partition is further divided into buckets. Partitions are typically defined by time or other continuous values. This allows query engines to quickly locate the target data during queries by pruning irrelevant data ra ...
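The two-level layout described above can be illustrated with a minimal Python sketch. It is not Doris code: the monthly partition naming, the bucket count of 4, the CRC32 hash, and the function names `partition_of`, `bucket_of`, and `prune_partitions` are all assumptions made for illustration.

```python
from datetime import date
from typing import List, Tuple
import zlib

# Hypothetical layout: one range partition per month, 4 hash buckets each.
NUM_BUCKETS = 4


def partition_of(day: date) -> str:
    """Map a date to a monthly partition name such as 'p2024_03'."""
    return f"p{day.year}_{day.month:02d}"


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a bucketing key to a bucket using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def locate(day: date, key: str) -> Tuple[str, int]:
    """Return the (partition, bucket) pair that would hold a row."""
    return partition_of(day), bucket_of(key)


def prune_partitions(partitions: List[str], start: date, end: date) -> List[str]:
    """Keep only the monthly partitions that overlap the range [start, end].

    Names are zero-padded, so string comparison follows chronological order.
    """
    lo, hi = partition_of(start), partition_of(end)
    return [p for p in partitions if lo <= p <= hi]
```

A query filtered on a date range only needs to read the partitions returned by `prune_partitions`, and a lookup on the bucketing key only needs the single bucket returned by `bucket_of`.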
Apache Doris
4M ago
For more than three years, we’ve avoided talking about StarRocks, a fork of Apache Doris, whose history remains partly unknown to most people.
Apache Doris has focused on improving its features, performance, ease of use, and reliability, while also helping expand the open-source community and supporting more users. This has been the top priority for us. However, we recently saw a comparative article by StarRocks, which was filled with bias and distasteful implications. That’s why we decided to respond to this so-called comparison.
While both Apache Doris and StarRocks are excellent OLAP databa ...
Apache Doris
5M ago
Among all the self-proclaimed alternatives to Rockset, Apache Doris is one of the few that cover all the key features of Rockset.
By Zaki Lu, Apache Doris Committer
OpenAI dropped a bomb on the data world by announcing the acquisition of Rockset, a cloud-based, fully managed analytical database. Among all the congratulating voices, one question is raised: why Rockset?
Founded in 2016 by Venkat Venkataramani, former Engineering Director at Meta, Rockset focuses on real-time search and data analytics. Compared to other DBMSs, Rockset stands out for its:
Real-time data updates: Rock ...
Apache Doris
6M ago
From the Volcano Model to the Pipeline Execution Engine, and now PipelineX, Apache Doris brings its computation efficiency to a higher level with each iteration.
What makes a modern database system? The three key modules are the query optimizer, the execution engine, and the storage engine. Among them, the execution engine is to the DBMS what the chef is to a restaurant. This article focuses on the execution engine of the Apache Doris data warehouse, explaining the secret to its high performance.
To illustrate the role of the execution engine, let’s follow the execution process of an SQL statement ...
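For readers unfamiliar with the Volcano Model mentioned above, it is commonly described as a pull-based iterator model in which every operator exposes `open`, `next`, and `close`, and a parent pulls one row at a time from its child. The Python sketch below is a generic illustration of that idea only. It is not taken from the Doris codebase, and the `Scan`, `Filter`, and `run` names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, int]


class Scan:
    """Leaf operator: yields rows from an in-memory table, one per next()."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = rows
        self._it = None

    def open(self) -> None:
        self._it = iter(self.rows)

    def next(self) -> Optional[Row]:
        # None signals that the operator is exhausted.
        return next(self._it, None)

    def close(self) -> None:
        self._it = None


class Filter:
    """Pulls rows from its child and passes on those matching the predicate."""

    def __init__(self, child: Scan, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self) -> None:
        self.child.close()


def run(root) -> List[Row]:
    """Drive the plan from the root: open, pull until exhausted, close."""
    root.open()
    out = []
    row = root.next()
    while row is not None:
        out.append(row)
        row = root.next()
    root.close()
    return out
```

Because each `next()` call returns a single row, every row crosses one function call per operator, which is the per-row overhead that batch-oriented and pipeline-style engines generally aim to reduce.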
Apache Doris
6M ago
The built-in Doris Job Scheduler triggers pre-defined operations efficiently and reliably. It is useful in many cases including ETL and data lake analytics.
Job scheduling is an important part of data management as it enables regular data updates and cleanups. In a data platform, it is often undertaken by workflow orchestration tools like Apache Airflow and Apache DolphinScheduler. However, adding another component to the data architecture also means investing extra resources in management and maintenance. That’s why Apache Doris 2.1.0 introduces a built-in Job Scheduler. It is strategically ...
Apache Doris
7M ago
NetEase has replaced Elasticsearch and InfluxDB with Apache Doris in its monitoring and time series data analysis platforms, respectively, achieving 11X query performance and saving 70% of resources.
For most people looking for a log management and analytics solution, Elasticsearch is the go-to choice. The same applies to InfluxDB for time series data analysis. These were exactly the choices of NetEase, which is one of the world’s highest-yielding game companies but also much more than that. As NetEase expands its business horizons, the logs and time series data it receives explode, and problems like surging sto ...
Apache Doris
7M ago
Apache Doris supports workload isolation based on Resource Tag and Workload Group. It provides solutions for different tradeoffs among the level of isolation, resource utilization, and stable performance.
This is an in-depth introduction to the workload isolation capabilities of Apache Doris. But first of all, why and when do you need workload isolation? If you relate to any of the following situations, read on and you will end up with a solution:
You have different business departments or tenants sharing the same cluster and you want to prevent the interference of workloads among them ...
Apache Doris
7M ago
Users can execute queries in their existing SQL syntax directly in Doris or batch convert their existing SQL statements on the visual SQL conversion interface.
Apache Doris is an all-in-one data platform that is capable of real-time reporting, ad-hoc queries, data lakehousing, log management and analysis, and batch data processing. As more and more companies replace their component-heavy data architecture with Apache Doris, there is an increasing need for a more convenient data migration solution. That’s why the Doris SQL Convertor was built.
Most database systems run their own ...