Spark Technical Debt Deep Dive
Cloudera | Data Engineering
by François Reynald
5h ago
How Bad is Bad Code: The ROI of Fixing Broken Spark Code
Once in a while I stumble upon Spark code that looks like it has been written by a Java developer and it never fails to make me wince because it is a missed opportunity to write elegant and efficient code: it is verbose, difficult to read, and full of distributed processing anti-patterns. One such occurrence happened a few weeks ago when one of my colleagues was trying to make some churn analysis code downloaded from GitHub work. I was looking for some broken code to add a workshop to our Spark Performance Tuning class and write a blog p ..read more
Optimizing the Energy Sector with Data Analytics
Cloudera | Data Engineering
by Pablo Boixeda
1M ago
Across the energy supply chain from generation to consumer, we can see that the trend toward investing in renewable energy has picked up pace as demand has grown for energy companies to actively pursue investments in energies with little or no environmental impact in the quest for decarbonisation. McKinsey estimates that by 2035, 50% of energy will be wind and solar. The move toward renewable energy has a distinct and significant impact on energy generation and distribution that needs to be carefully managed. Efficient use of data will therefore be critical to improving the competitiveness an ..read more
Cloudera Named a Leader in the 2022 Gartner® Magic Quadrant™ for Cloud Database Management Systems (DBMS)
Cloudera | Data Engineering
by David Dichmann
1M ago
We are pleased to announce that Cloudera has been named a Leader in the 2022 Gartner® Magic Quadrant for Cloud Database Management Systems. Cloudera has been recognized in this cloud DBMS report since its inception in 2020. This year we’ve been named a Leader. This validates our significant momentum in global enterprises and, together with our recent recognition in the Gartner Peer Insights Customer Choice Distinction for Cloud DBMS, cements our position as an industry leader. We’re proud to be recognized for the data management and data analytics innovations we have delivered in the new Clo ..read more
Implement A Multi-cloud Open Lakehouse with Apache Iceberg in Cloudera Data Platform
Cloudera | Data Engineering
by Bill Zhang
2M ago
Since we announced the general availability of Apache Iceberg in Cloudera Data Platform, Cloudera customers, such as Teranet, have built open lakehouses to future-proof their data platforms for all their analytical workloads. Cloudera partners are also benefiting from Apache Iceberg in CDP. For example, Modak Nabu is helping their enterprise customers accelerate data ingestion, curation and consumption at petabyte scale. Today, we are thrilled to share some new advancements in Cloudera’s integration of Apache Iceberg in CDP, such as to help accelerate your multi-cloud open data lakehouse impl ..read more
Enriching Streams with Hive tables via Flink SQL
Cloudera | Data Engineering
by Jimit Patel
2M ago
Introduction
Stream processing is about creating business value by applying logic to your data while it is in motion. Many times that involves combining data sources to enrich a data stream. Flink SQL does this and directs the results of whatever functions you apply to the data into a sink. Business use cases, such as fraud detection, advertising impression tracking, health care data enrichment, augmenting financial spend information, GPS device data enrichment, or personalized customer communication are great examples of using Hive tables for enriching data streams. Therefore, there are two co ..read more
Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)
Cloudera | Data Engineering
by Raghotham Murthy
4M ago
Introduction
dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous development (CI/CD). We’re excited to announce the general availability of the open source adapters for dbt for all the engines in CDP—Apache Hive, Apache Impala, and Apache Spark, with added support for Apache Livy and Cloudera Data Engineering. Using these adapters, Cloudera customers can use dbt to collaborate, test, deploy, a ..read more
The Modern Data Lakehouse: An Architectural Innovation
Cloudera | Data Engineering
by David Dichmann
5M ago
The promise of a modern data lakehouse architecture
Imagine having self-service access to all business data, anywhere it may be, and being able to explore it all at once. Imagine quickly answering burning business questions nearly instantly, without waiting for data to be found, shared, and ingested. Imagine independently discovering rich new business insights from both structured and unstructured data working together, without having to beg for data sets to be made available. As data analysts or data scientists, we would all love to be able to do all these things, and much more. This is the p ..read more
Building Custom Runtimes with Editors in Cloudera Machine Learning
Cloudera | Data Engineering
by Oleksandr Akulov
6M ago
Cloudera Machine Learning (CML) is a cloud-native and hybrid-friendly machine learning platform. It unifies self-service data science and data engineering in a single, portable service as part of an enterprise data cloud for multi-function analytics on data anywhere. CML empowers organizations to build and deploy machine learning and AI capabilities for business at scale, efficiently and securely, anywhere they want. It’s built for the agility and power of cloud computing, but isn’t limited to any one cloud provider or data source. Data professionals who use CML spend the vast majority of the ..read more
How to Use Apache Iceberg in CDP’s Open Lakehouse
Cloudera | Data Engineering
by Bill Zhang
6M ago
In June 2022, Cloudera announced the general availability of Apache Iceberg in the Cloudera Data Platform (CDP). Iceberg is a 100% open-table format, developed through the Apache Software Foundation, which helps users avoid vendor lock-in and implement an open lakehouse. The general availability covers Iceberg running within some of the key data services in CDP, including Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML). These connections empower analysts and data scientists to easily collaborate on the same data, with their choice of t ..read more
Applying Fine Grained Security to Apache Spark
Cloudera | Data Engineering
by Shaun Ahmadian
6M ago
Fine grained access control (FGAC) with Spark
Apache Spark, with its rich data APIs, has been the processing engine of choice in a wide range of applications from data engineering to machine learning, but its security integration has been a pain point. Many enterprise customers need finer granularity of control, in particular at the column and row level (commonly known as Fine Grained Access Control or FGAC). The challenges of arbitrary code execution notwithstanding, there have been attempts to provide a stronger security model but with mixed results. One approach is ..read more
