Tutorial: Run your R (SparklyR) workloads at scale with Spark-on-Kubernetes
Data Mechanics Blog
1y ago
A step-by-step tutorial to help you run R applications with Spark on a Kubernetes cluster using the SparklyR library. We'll go through building a compatible Docker image, writing the code of the SparklyR application itself, and deploying it on Data Mechanics...
Apache Spark 3.2 Release: Main Features and What's New for Spark-on-Kubernetes
Data Mechanics Blog
1y ago
Apache Spark 3.2 is now released and available on our platform. Spark 3.2 bundles Hadoop 3.3.1, Koalas (for pandas users) and RocksDB (for streaming users). For Spark-on-Kubernetes users, Persistent Volume Claims (k8s volumes) can now "survive the death" of their Spark executor and be recovered by Spark, preventing the loss of valuable shuffle files...
Tutorial: Running PySpark inside Docker containers
Data Mechanics Blog
1y ago
In this tutorial, we'll show you how to build your first PySpark application from scratch and run it inside a Docker container. We'll also show you how to install libraries (like Koalas) and write to a data sink (Postgres database...
How the United Nations Modernized their Maritime Traffic Data Exploration while cutting costs by 70%
Data Mechanics Blog
1y ago
By migrating from HBase and EMR to the Data Mechanics platform, the United Nations reduced their costs by 70% while improving their team's productivity and development experience...
Spark and Docker: Your Spark development cycle just got 10x faster!
Data Mechanics Blog
1y ago
Native support for Docker is one of the main reasons companies choose to deploy Spark on top of Kubernetes instead of YARN. In this article, we will illustrate the benefits of Docker for Apache Spark by going through the end-to-end development cycle used by many of our users at Data Mechanics...
Spark on Kubernetes Made Easy: How Data Mechanics Improves On The Open-Source Version
Data Mechanics Blog
1y ago
How is Data Mechanics different from running open-source Spark on Kubernetes? In this article, we explain how our platform extends and improves on Spark on Kubernetes to make it easy to use, flexible, and cost-effective. We'll go over our intuitive user interfaces, dynamic optimizations, and custom integrations...
How to be successful with Apache Spark in 2021
Data Mechanics Blog
1y ago
Apache Spark is the leading technology for data engineering at scale. But making Spark easy to use, stable, and cost-efficient remains challenging. In this article, the AI & Data consulting firm Quantmetry and Data Mechanics team up to share their best practices for succeeding with Spark in 2021...
Data + AI Summit Europe 2020 Highlights
Data Mechanics Blog
1y ago
Data + AI Summit 2020 Highlights: What’s new for the Apache Spark community? In this article, we’ll go over the highlights of the conference, focusing on the developments that were recently added to Apache Spark or are coming in the next few months: Spark on Kubernetes, Koalas, Project Zen...
Released: Free Cross-platform Spark UI & Spark History Server
Data Mechanics Blog
1y ago
Today we’re releasing a web-based Spark UI and Spark History Server that work on top of any Spark platform, whether it’s on-premises or in the cloud, on Kubernetes or YARN, with a commercial service or using open-source Apache Spark. This is our first step towards building Data Mechanics Delight: the new and improved Spark UI...
Cost-Effective Weather Analytics At Scale with Cloud-Native Apache Spark
Data Mechanics Blog
1y ago
Customer Story: Weather2020 is a predictive weather analytics company. In 3 weeks, their data engineering team built Apache Spark pipelines ingesting terabytes of weather data to power their core product. Data Mechanics' performance optimizations and pricing model lowered their costs by 60% compared to Databricks, the main alternative they considered...