Data Mechanics Blog
Learn about best practices on Apache Spark, Kubernetes, data science and data engineering at scale, straight from the Data Mechanics engineering team. We give superpowers to the data scientists and engineers of the world so they can make sense of their data and build applications at scale on top of it. Today it takes a lot of manual and mechanical work to build, deploy, configure, and maintain…
2y ago
A step-by-step tutorial to help you run R applications with Spark on a Kubernetes cluster using the sparklyr library. We'll go through building a compatible Docker image, writing the code of the sparklyr application itself, and deploying it on Data Mechanics.
3y ago
Apache Spark 3.2 is now released and available on our platform. Spark 3.2 bundles Hadoop 3.3.1, Koalas (for Pandas users) and RocksDB (for Streaming users). For Spark-on-Kubernetes users, Persistent Volume Claims (k8s volumes) can now "survive the death" of their Spark executor and be recovered by Spark, preventing the loss of precious shuffle files.
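As a sketch of how PVC reuse is enabled, the two `spark.kubernetes.driver.*PersistentVolumeClaim` flags below are the Spark 3.2 configuration options for this feature; the volume name (`data`), storage class, size, and mount path are illustrative placeholders:

```python
# Illustrative Spark 3.2 configuration enabling PVC reuse after executor loss.
# The volume name, storage class, size, and mount path are example values.
pvc_reuse_conf = {
    # Let the driver own the executors' PVCs so they outlive the executor pods
    "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
    # Let replacement executors pick up orphaned PVCs (and their shuffle files)
    "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
    # Dynamically provision a PVC per executor ("OnDemand") and mount it
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName": "OnDemand",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass": "standard",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit": "100Gi",
    "spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path": "/data",
    # Point Spark's local (shuffle) directory at the mounted volume
    "spark.local.dir": "/data",
}

# In practice these would be passed as --conf flags to spark-submit, e.g.:
#   spark-submit --conf spark.kubernetes.driver.ownPersistentVolumeClaim=true ...
```

With both flags set, a replacement executor can remount a lost executor's volume instead of recomputing the shuffle data it held.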
3y ago
In this tutorial, we'll show you how to build your first PySpark application from scratch and run it inside a Docker container. We'll also show you how to install libraries (like Koalas) and write to a data sink (a Postgres database).
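A minimal sketch of such an application, assuming the Postgres JDBC driver is on the Spark classpath; the host, database, table, and credentials below are placeholders, not values from the tutorial:

```python
def jdbc_url(host, port, database):
    """Build a Postgres JDBC URL of the form jdbc:postgresql://host:port/db."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def main():
    # Imported here so the helper above stays usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("my-first-pyspark-app").getOrCreate()

    # Toy DataFrame standing in for real input data.
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])

    # Write to a Postgres sink over JDBC (table and credentials are placeholders).
    (df.write.format("jdbc")
       .option("url", jdbc_url("localhost", 5432, "mydb"))
       .option("dbtable", "public.my_table")
       .option("user", "spark")
       .option("password", "secret")
       .option("driver", "org.postgresql.Driver")
       .mode("append")
       .save())

    spark.stop()


if __name__ == "__main__":
    main()
```

The same `.format("jdbc")` writer works for other JDBC-compatible sinks; only the URL and driver class change.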
How the United Nations Modernized Their Maritime Traffic Data Exploration While Cutting Costs by 70%
3y ago
By migrating from HBase and EMR to the Data Mechanics platform, the United Nations reduced their costs by 70% while improving their team's productivity and development experience.
3y ago
Native support for Docker is in fact one of the main reasons companies choose to deploy Spark on top of Kubernetes instead of YARN. In this article, we will illustrate the benefits of Docker for Apache Spark by going through the end-to-end development cycle used by many of our users at Data Mechanics.
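As an illustration of that development cycle, a Spark application image typically starts from a Spark base image and layers the application code and its dependencies on top. The base image tag and file names below are hypothetical; in practice you would pin an image matching your platform's Spark version:

```dockerfile
# Hypothetical base image; pin one that matches your platform's Spark version.
FROM apache/spark-py:v3.2.1

# Install the application's Python dependencies.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Add the application code itself.
COPY src/ /opt/application/
```

Because dependencies are baked into the image, the same artifact runs identically on a laptop, in CI, and on the production cluster.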
3y ago
How is Data Mechanics different from running Spark on Kubernetes open-source? In this article, we explain how our platform extends and improves on Spark on Kubernetes to make it easy to use, flexible, and cost-effective. We'll go over our intuitive user interfaces, dynamic optimizations, and custom integrations.
3y ago
Apache Spark is the leading technology for data engineering at scale. But making Spark easy to use, stable, and cost-efficient remains challenging. In this article, the AI & data consulting firm Quantmetry and Data Mechanics team up to give you their best practices to ensure you're successful with Spark in 2021.
3y ago
Data + AI Summit 2020 Highlights: What's new for the Apache Spark community? In this article we'll go over the highlights of the conference, focusing on the developments recently added to Apache Spark or arriving in the coming months: Spark on Kubernetes, Koalas, Project Zen.
3y ago
Today we’re releasing a web-based Spark UI and Spark History Server which work on top of any Spark platform, whether it’s on-premise or in the cloud, over Kubernetes or YARN, with a commercial service or using open-source Apache Spark. This is our first step towards building Data Mechanics Delight - the new and improved Spark UI.
3y ago
Customer Story: Weather2020 is a predictive weather analytics company. In 3 weeks, their data engineering team built Apache Spark pipelines ingesting terabytes of weather data to power their core product. Data Mechanics' performance optimizations and pricing model lowered their costs by 60% compared to Databricks, the main alternative they considered.