Cross-Cluster Replication for read-write separation: story of a grocery store brand
Apache Doris
by Apache Doris
15h ago
Cross-cluster replication (CCR) in Apache Doris is proven to be fast, stable, and easy to use. It achieves a real-time data synchronization latency of 1 second. This post describes how a grocery store brand uses the Cross-Cluster Replication (CCR) capability of Apache Doris to separate its data reading and writing workloads. In this case, where the freshness of groceries is guaranteed by the freshness of data, they use Apache Doris as their data warehouse to monitor and analyze procurement, sales, and stock in real time across all their stores and supply chains. Why they need...
Arrow Flight SQL in Apache Doris for 10X faster data transfer
Apache Doris
by Apache Doris
1w ago
Apache Doris 2.1 supports the Arrow Flight SQL protocol for reading data from Doris. It delivers tens-fold speedups compared to PyMySQL and Pandas. For years, JDBC and ODBC have been the standard interfaces for database interaction. Now the rise of data science and data lake analytics brings ever larger datasets, which in turn demand faster data reading and transmission, so we started to look for better answers than JDBC and ODBC. Thus, we included the Arrow Flight SQL protocol in Apache Doris 2.1, which provides tens-fold pe...
Auto-increment columns in databases: a simple magic that makes a big difference
Apache Doris
by Apache Doris
2w ago
Auto-increment columns in Apache Doris accelerate dictionary encoding and pagination without damaging data writing performance. This is an introduction to their usage, applicable scenarios, and implementation details. The auto-increment column is a bread-and-butter feature of single-node transactional databases. It assigns a unique identifier to each row with minimal manual effort from users. With an auto-increment column in the table, whenever a new row is inserted, it is assigned the next available value from the auto-increment sequence. This...
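The behavior described in this excerpt (each inserted row receives the next available value from a sequence) can be illustrated with a toy in-memory table. This is only a minimal sketch: the class `AutoIncrementTable` and its `insert` method are hypothetical names made up for illustration, and nothing here reflects how Apache Doris implements the feature internally.

```python
from itertools import count


class AutoIncrementTable:
    """Toy in-memory table (hypothetical, for illustration only)."""

    def __init__(self, start=1):
        # Monotonically increasing sequence that supplies the next row id.
        self._sequence = count(start)
        self.rows = []

    def insert(self, **values):
        # Take the next available value from the sequence and attach it
        # to the new row as its unique identifier.
        row_id = next(self._sequence)
        self.rows.append({"id": row_id, **values})
        return row_id


table = AutoIncrementTable()
first_id = table.insert(name="apple")
second_id = table.insert(name="banana")
```

The caller never supplies an `id`; uniqueness comes from the sequence alone, which is the property that makes such a column convenient as a compact dictionary-encoding key.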
Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis
Apache Doris
by Apache Doris
1M ago
Semi-structured data is data arranged in flexible formats. Unlike structured data, it does not require data users to pre-define the table schema for it, so it provides convenience for data storage and analysis. Common forms of semi-structured data include XML, JSON, and log files. They are widely seen in the following industry scenarios: E-commerce platforms store user reviews of products as semi-structured data for sentiment analysis and user behavior pattern mining. Telecommun...
Apache Doris 2.1.0 is released! 100% higher out-of-the-box performance
Apache Doris
by Apache Doris
1M ago
Dear Apache Doris community, we are thrilled to announce the release of Apache Doris 2.1.0. In this version, you can expect: Higher out-of-the-box query performance: 100% faster speed, as shown by TPC-DS 1TB benchmark tests. Improved data lake analytics capabilities: 4 to 6 times faster than Trino and Spark, compatibility with various SQL dialects for smooth migration, and a read/write interface based on Arrow Flight for 100 times faster data transfer. Solid support for semi-structured data analysis: a newly added Variant data type, support for more IP types, and a more comprehensive suite of analyti...
Breaking down data silos with a unified data warehouse: an Apache Doris-based CDP
Apache Doris
by Apache Doris
1M ago
An insurance company uses Apache Doris, a unified data warehouse, to replace Spark + Impala + HBase + NebulaGraph in their Customer Data Platform, making customer grouping 4 times faster. The data silos problem is like arthritis for online businesses, because almost everyone gets it as they grow old. Businesses interact with customers via websites, mobile apps, H5 pages, and end devices. For one reason or another, it is tricky to integrate the data from all these sources. Data stays where it is and cannot be interrelated for further analysis. That’s how data silos form. The bigger yo...
A financial anti-fraud solution based on the Apache Doris data warehouse: case study of a retail bank
Apache Doris
by Apache Doris
2M ago
Financial fraud prevention is a race against time. This post details how a retail bank built its fraud risk management platform on Apache Doris and how the platform performs. Implementation-wise, fraud prevention relies heavily on data processing power, especially with large datasets. Today I’m going to share with you the use case of a retail bank with over 650 million individual customers. They have compared analytics component...
How Inverted Index Accelerates Text Searches by 40 Times
Apache Doris
by Apache Doris
3M ago
As an open-source real-time data warehouse, Apache Doris provides a rich choice of indexes to speed up data scanning and filtering. Based on user involvement, they can be divided into built-in smart indexes and user-created indexes. The former, such as the ZoneMap index and prefix index, are automatically generated by Apache Doris on data ingestion, while the latter, including the inverted index and NGram BloomFilter index, are indexes that users choose for their own use cases. This post is a deep dive into the inverted index and NGram BloomFilter index, providing a hands-on guide to applying them for various...
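For readers unfamiliar with the term, the general idea of an inverted index (a map from each token to the rows that contain it, so a text search avoids scanning every row) can be sketched in a few lines. This is a generic textbook illustration with hypothetical function names `build_inverted_index` and `search`; it assumes simple whitespace tokenization and does not represent Apache Doris's actual implementation or syntax.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of row ids containing it."""
    index = defaultdict(set)
    for row_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(row_id)
    return index


def search(index, query):
    """Return ids of rows that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Start from the rows matching the first token, then intersect
    # with the rows matching each remaining token.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


logs = [
    "disk error on node one",
    "network timeout on node two",
    "disk full on node two",
]
index = build_inverted_index(logs)
matches = search(index, "disk node")
```

A lookup touches only the posting sets of the query tokens rather than every row, which is the basic reason this kind of index can speed up text filtering.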
The financial sector’s choice: fast, secure, and highly available real-time data warehousing based on Apache Doris
Apache Doris
by Apache Doris
3M ago
This is a whole-journey guide for Apache Doris users, especially those from the financial sector, which requires a high level of data security and availability. If you don’t know how to build a real-time data pipeline and make the most of Apache Doris functionalities, start with this post and you will come away with plenty of ideas. This is the best practice of a non-banking payment service provider that serves over 25 million retailers and processes data fr...
From Elasticsearch to Apache Doris: upgrading an observability platform
Apache Doris
by Apache Doris
4M ago
Observability platforms are akin to the immune system. Just as immune cells are everywhere in the human body, an observability platform patrols every corner of your devices, components, and architectures, identifying potential threats and proactively mitigating them. However, I might have gone too far with that metaphor, because to this day we have never invented a system as sophisticated as the human body, but we can always make advancements. The key to upgrading an observability platform is to increase data processing speed and reduce costs. There are two reasons for this: The f...