Setting Up Kafka Multi-Tenancy 
DoorDash Engineering Blog
by Yunji Zhong, Amit Gud and Carlos Herrera
3w ago
Real-time event processing is a critical component of a distributed system’s scalability. At DoorDash, we rely on message queue systems based on Kafka to handle billions of real-time events. One of the challenges we face, however, is how to properly validate the system before going live. Traditionally, an isolated environment such as staging is used to validate new features. But setting up a different data traffic pipeline in a staging environment to mimic billions of real-time events is difficult and inefficient, while requiring ongoing maintenance to keep data up-to-date. To address this cha ..read more
Improving ETAs with Multi-Task Models, Deep Learning, and Probabilistic Forecasts
DoorDash Engineering Blog
by Chi Zhang, Lewis Warne, Ziqi Jiang, Qingyang Xu, Hubert Jenq and Jianzhe Luo
1M ago
The DoorDash ETA team is committed to providing an accurate and reliable estimated time of arrival (ETA) as a cornerstone of the DoorDash consumer experience. We want every customer to be able to trust our ETAs, which means a high-quality experience in which their food arrives on time, every time. With more than 2 billion orders annually, our engineering challenge is to improve and maintain accuracy at scale while managing a variety of conditions within diverse delivery and merchant scenarios. Here we delve into three critical focus areas aimed at accomplishing this: Extending o ..read more
Introducing DoorDash’s In-House Search Engine
DoorDash Engineering Blog
by Konstantin Shulgin, Satish Subhashrao Saley and Anish Walawalkar
2M ago
We reviewed the architecture of our global search at DoorDash in early 2022 and concluded that our rapid growth meant within three years we wouldn’t be able to scale the system efficiently, particularly as global search shifted from store-only to a hybrid item-and-store search experience. Our analysis identified Elasticsearch as our architecture’s primary bottleneck. Two primary aspects of that search engine were causing the trouble: its document-replication mechanism and its lack of support for complex document relationships. In addition, Elasticsearch does not provide internal capabilities f ..read more
Experiment Faster and with Less Effort
DoorDash Engineering Blog
by Yicong ("Nicole") Lin and Yixin Tang
2M ago
Business Policy Experiments Using Fractional Factorial Designs
At DoorDash, we constantly strive to improve our experimentation processes along four key dimensions: velocity, to increase how many experiments we can conduct; toil, to minimize our launch and analysis efforts; rigor, to ensure a sound experimental design and robustly efficient analyses; and efficiency, to reduce the costs associated with our experimentation efforts. Here we introduce a new framework that has demonstrated significant improvements in the first two of these dimensions: velocity and toil. Because Door ..read more
Cassandra Unleashed: How We Enhanced Cassandra Fleet’s Efficiency and Performance
DoorDash Engineering Blog
by Seed Zeng
3M ago
In the realm of distributed databases, Apache Cassandra stands out as a significant player. It offers a blend of robust scalability and high availability without compromising on performance. However, Cassandra also is notorious for being hard to tune for performance and for the pitfalls that can arise during that process. The system’s expansive flexibility, while a key strength, also means that effectively harnessing its full capabilities often involves navigating a complex maze of configurations and performance trade-offs. If not carefully managed, this complexity can sometimes lead to unexpe ..read more
Meeting DoorDash Growth with a Self-Service Logistics Configuration Platform 
DoorDash Engineering Blog
by Saurabh Gupta and Reid Arwood
3M ago
DoorDash has grown from executing simple restaurant deliveries to working with a wide variety of businesses, ranging from grocery and retail to parcels and pet supplies. Each business faces its own set of constraints as it strives to meet its goals. Our logistics teams — which range across a number of functions, including Dashers, assignment, payment processes, and time estimations — seek to achieve these goals by tuning a variety of configurations for each use case and type of business.  Although that process started with a limited set of configurations, the old system struggled to keep ..read more
Staying in the Zone: How DoorDash used a service mesh to manage data transfer, reducing hops and cloud spend
DoorDash Engineering Blog
by Hochuen Wong and Levon Stepanian
3M ago
There have been many benefits gained through DoorDash’s evolution from a monolithic application architecture to one that is based on cells and microservices. The new architecture has reduced the time required for development, test, and deployment and at the same time has improved scalability and resiliency for end-users including merchants, Dashers, and consumers. As the number of microservices and back-ends has grown, however, DoorDash has observed an uptick in cross-availability zone (AZ) data transfer costs. These data transfer costs — incurred on both send and receive — allow DoorDash to p ..read more
Personalizing the DoorDash Retail Store Page Experience
DoorDash Engineering Blog
by Luming Chen, Yuan Meng and Anthony Zhou
4M ago
The DoorDash retail shopping experience mission seeks to combine the best parts of in-person shopping with the power of personalization. While shopping in a physical store has its advantages, a brick-and-mortar store cannot be personalized – the onus is on the consumer to navigate aisles to find what they need. Conversely, a digital shopping experience can be highly personalized. By understanding each consumer’s purchasing history, dietary restrictions, favorite brands, and other personalized details, we not only can recommend items that reflect a consumer’s unique shopping needs and preferenc ..read more
Atlantis Hardening and Review Fatigue
DoorDash Engineering Blog
by Dmitriy Dunin and Ron Waisberg
5M ago
Many organizations use infrastructure-as-code (IaC) with pull request (PR) automation to provide a more secure, safe environment for making infrastructure changes. Despite the power and flexibility of IaC software, the lack of strong, secure defaults in PR automation software can make that sense of security a false one.
Infrastructure-as-code and pull request automation
IaC enables a declarative, reusable, and auditable way to manage configuration changes. At DoorDash, the primary platform for this is Terraform, operated by an account-isolated or specifically configured Atlantis instance runni ..read more
API-First Approach to Kafka Topic Creation
DoorDash Engineering Blog
by Varun Chakravarthy, Basar Onat, Seed Zeng and Luke Christopherson
5M ago
DoorDash’s Engineering teams revamped Kafka Topic creation by replacing a Terraform/Atlantis-based approach with an in-house API, Infra Service. This has reduced real-time pipeline onboarding time by 95% and saved countless developer hours. DoorDash’s Real-Time Streaming Platform, or RTSP, team is under the Data Platform organization and manages over 2,500 Kafka Topics across five clusters. Kafka is the pub-sub layer of the Iguazu pipeline, which provides real-time event delivery at DoorDash. Almost six billion messages are processed each day at an average rate of four million messages per min ..read more
