Data Mechanics Blog
Learn about best practices on Apache Spark, Kubernetes, data science and data engineering at scale, straight from the Data Mechanics engineering team. We give superpowers to the data scientists and engineers of the world so they can make sense of their data and build applications at scale on top of it. Today it takes a lot of manual and mechanical work to build, deploy, configure, and maintain…
2y ago
A step-by-step tutorial to help you run R applications with Spark on a Kubernetes cluster using the sparklyr library. We'll go through building a compatible Docker image, writing the code of the sparklyr application itself, and deploying it on Data Mechanics.
3y ago
Apache Spark 3.2 is now released and available on our platform. Spark 3.2 bundles Hadoop 3.3.1, Koalas (for Pandas users) and RocksDB (for Streaming users). For Spark-on-Kubernetes users, Persistent Volume Claims (k8s volumes) can now "survive the death" of their Spark executor and be recovered by Spark, preventing the loss of precious shuffle files.
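As a sketch of how PVC reuse is enabled, the two `spark.kubernetes.driver.*PersistentVolumeClaim` flags below are the Spark 3.2 configuration options for this feature; the volume name (`data`), storage class, size, and mount path are illustrative placeholders:

```python
# Illustrative Spark 3.2 configuration enabling PVC reuse after executor loss.
# The volume name, storage class, size, and mount path are example values.
pvc_reuse_conf = {
    # Let the driver own the executors' PVCs so they outlive the executor pods
    "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
    # Let replacement executors pick up orphaned PVCs (and their shuffle files)
    "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
    # Dynamically provision a PVC per executor ("OnDemand") and mount it
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName": "OnDemand",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass": "standard",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit": "100Gi",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path": "/data",
    # Point Spark's local (shuffle) directory at the mounted volume
    "spark.local.dir": "/data",
}

# In practice these would be passed as --conf flags to spark-submit, e.g.:
#   spark-submit --conf spark.kubernetes.driver.ownPersistentVolumeClaim=true ...
```

With both flags set, a replacement executor can remount a lost executor's volume instead of recomputing the shuffle data it held.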
3y ago
In this tutorial, we'll show you how to build your first PySpark application from scratch and run it inside a Docker container. We'll also show you how to install libraries (like Koalas) and write to a data sink (a Postgres database).
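A minimal sketch of such an application, assuming the Postgres JDBC driver is on the Spark classpath; the host, database, table, and credentials below are placeholders, not values from the tutorial:

```python
def jdbc_url(host, port, database):
    """Build a Postgres JDBC URL of the form jdbc:postgresql://host:port/db."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def main():
    # Imported here so the helper above stays usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("my-first-pyspark-app").getOrCreate()

    # Toy DataFrame standing in for real input data.
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])

    # Write to a Postgres sink over JDBC (table and credentials are placeholders).
    (df.write.format("jdbc")
       .option("url", jdbc_url("localhost", 5432, "mydb"))
       .option("dbtable", "public.my_table")
       .option("user", "spark")
       .option("password", "secret")
       .option("driver", "org.postgresql.Driver")
       .mode("append")
       .save())

    spark.stop()


if __name__ == "__main__":
    main()
```

The same `.format("jdbc")` writer works for other JDBC-compatible sinks; only the URL and driver class change.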
How the United Nations Modernized Their Maritime Traffic Data Exploration While Cutting Costs by 70%
3y ago
By migrating from HBase and EMR to the Data Mechanics platform, the United Nations reduced their costs by 70% while improving their team's productivity and development experience.
3y ago
Native support for Docker is in fact one of the main reasons companies choose to deploy Spark on top of Kubernetes instead of YARN. In this article, we will illustrate the benefits of Docker for Apache Spark by going through the end-to-end development cycle used by many of our users at Data Mechanics.
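As an illustration of that development cycle, a Spark application image typically starts from a Spark base image and layers the application code and its dependencies on top. The base image tag and file names below are hypothetical; in practice you would pin an image matching your platform's Spark version:

```dockerfile
# Hypothetical base image; pin one that matches your platform's Spark version.
FROM apache/spark-py:v3.2.1

# Install the application's Python dependencies.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Add the application code itself.
COPY src/ /opt/application/
```

Because dependencies are baked into the image, the same artifact runs identically on a laptop, in CI, and on the production cluster.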
3y ago
How is Data Mechanics different from running Spark on Kubernetes open-source? In this article, we explain how our platform extends and improves on Spark on Kubernetes to make it easy to use, flexible, and cost-effective. We'll go over our intuitive user interfaces, dynamic optimizations, and custom integrations.
3y ago
Apache Spark is the leading technology for data engineering at scale. But making Spark easy to use, stable, and cost-efficient remains challenging. In this article, the AI & data consulting firm Quantmetry and Data Mechanics team up to give you their best practices to ensure you're successful with Spark in 2021.
3y ago
Data + AI Summit 2020 Highlights: What's new for the Apache Spark community? In this article we'll go over the highlights of the conference, focusing on the developments recently added to Apache Spark or arriving in the coming months: Spark on Kubernetes, Koalas, Project Zen.
3y ago
Today we’re releasing a web-based Spark UI and Spark History Server which work on top of any Spark platform, whether it’s on-premise or in the cloud, over Kubernetes or YARN, with a commercial service or using open-source Apache Spark. This is our first step towards building Data Mechanics Delight - the new and improved Spark UI.
3y ago
Customer Story: Weather2020 is a predictive weather analytics company. In 3 weeks, their data engineering team built Apache Spark pipelines ingesting terabytes of weather data to power their core product. Data Mechanics' performance optimizations and pricing model lowered their costs by 60% compared to Databricks, the main alternative they considered.