Cloudera | Data Engineering on Feedspot

Cloudera | Data Engineering

1,123 FOLLOWERS

Read articles on Apache Hive, Apache Impala, Kubernetes and Enterprise data cloud. Cloudera accelerates digital transformation for the world's largest enterprises. They help innovative organizations across all industries tackle transformational use cases and extract real-time insights from an ever-increasing amount of data to drive value and competitive differentiation.

DNS Zone Setup Best Practices on Azure

by Dongkai Yu

5M ago

In Cloudera deployments on public cloud, one of the key configuration elements is the DNS. Get it wrong and your deployment may become wholly unusable, with users unable to access and use the Cloudera data services. If the DNS is set up less than ideally, connectivity and performance issues may arise. In this blog, we’ll take you through our tried and tested best practices for setting up your DNS for use with Cloudera on Azure. To get started and give you a feel for the dependencies for the DNS, in an Azure deployment for Cloudera, these are the Azure managed services being used: ..read more

Using Dead Letter Queues with SQL Stream Builder

by Cloudera

1y ago

What is a dead letter queue (DLQ)? Cloudera SQL Stream Builder gives non-technical users the power of a unified stream processing engine so they can integrate, aggregate, query, and analyze both streaming and batch data sources in a single SQL interface. This allows business users to define events of interest for which they need to continuously monitor and respond quickly. A dead letter queue (DLQ) can be used if there are deserialization errors when events are consumed from a Kafka topic. DLQ is useful to see if there are any failures due to invalid input in the source Kafka topic and makes i ..read more

Trusted Data: Alchemy For Misinformation

by Shayde Christian

1y ago

The best description of untrusted data I’ve ever heard is, “We all attend the QBR – Sales, Marketing, Finance – and present quarterly results, except the Sales reports and numbers don’t match Marketing numbers and neither match Finance reports. We argue about where the numbers came from, then after 45 minutes of digging for common ground, we chuck our shovels and abandon the call in disgust.” How would you go about fixing that situation? How would you get the trust into trusted data? Consult the Book of Spells. Our spells are cast from our Enterprise Business Glossary. Our wizard is Data ..read more

Materialized Views in SQL Stream Builder

by Cloudera

1y ago

What is a materialized view? Cloudera SQL Stream Builder (SSB) gives the power of a unified stream processing engine to non-technical users so they can integrate, aggregate, query, and analyze both streaming and batch data sources in a single SQL interface. This allows business users to define events of interest for which they need to continuously monitor and respond quickly. There are many ways to distribute the results of SSB’s continuous queries to embed actionable insights into business processes. In this blog we will cover materialized views—a special type of sink that makes t ..read more

Implementing and Using UDFs in Cloudera SQL Stream Builder

by Cloudera

1y ago

Cloudera’s SQL Stream Builder (SSB) is a versatile platform for data analytics using SQL. As a part of Cloudera Streaming Analytics, it enables users to easily write, run, and manage real-time SQL queries on streams with a smooth user experience, while it attempts to expose the full power of Apache Flink. SQL has been around for a long time, and it is a very well understood language for querying data. The SQL standard has had time to mature, and thus it provides a complete set of tools for querying and analyzing data. Nevertheless, as good as it is, sometimes it is necessary, or at least desirab ..read more

Job Notifications in SQL Stream Builder

by Botond Kismoni

1y ago

Special co-author credits: Adam Andras Toth, Software Engineer Intern. With enterprises’ needs for data analytics and processing getting more complex by the day, Cloudera aims to keep up with these needs, offering constantly evolving, cutting-edge solutions to all your data-related problems. Cloudera Stream Processing aims to take real-time data analytics to the next level. We’re excited to highlight job monitoring with notifications, a new feature for SQL Stream Builder (SSB). What problem are we solving with job notifications? The sudden failing of a complex data pipeline can lead to devasta ..read more

Spark Technical Debt Deep Dive

by François Reynald

1y ago

How Bad is Bad Code: The ROI of Fixing Broken Spark Code. Once in a while I stumble upon Spark code that looks like it has been written by a Java developer, and it never fails to make me wince because it is a missed opportunity to write elegant and efficient code: it is verbose, difficult to read, and full of distributed processing anti-patterns. One such occurrence happened a few weeks ago when one of my colleagues was trying to make some churn analysis code downloaded from GitHub work. I was looking for some broken code to add a workshop to our Spark Performance Tuning class and write a blog p ..read more

Optimizing the Energy Sector with Data Analytics

by Pablo Boixeda

1y ago

Across the energy supply chain from generation to consumer, we can see that the trend toward investing in renewable energy has picked up pace as demand has grown for energy companies to actively pursue investments in energies with little or no environmental impact in the quest for decarbonisation. McKinsey estimates that by 2035, 50% of energy will be wind and solar. The move toward renewable energy has a distinct and significant impact on energy generation and distribution that needs to be carefully managed. Efficient use of data will therefore be critical to improving the competitiveness an ..read more

Cloudera Named a Leader in the 2022 Gartner® Magic Quadrant™ for Cloud Database Management Systems (DBMS)

by David Dichmann

1y ago

We are pleased to announce that Cloudera has been named a Leader in the 2022 Gartner® Magic Quadrant for Cloud Database Management Systems. Cloudera has been recognized in this cloud DBMS report since its inception in 2020. This year we’ve been named a Leader. This validates our significant momentum in global enterprises and, together with our recent recognition in the Gartner Peer Insights Customer Choice Distinction for Cloud DBMS, cements our position as an industry leader. We’re proud to be recognized for the data management and data analytics innovations we have delivered in the new Clo ..read more

Implement A Multi-cloud Open Lakehouse with Apache Iceberg in Cloudera Data Platform

by Bill Zhang

1y ago

Since we announced the general availability of Apache Iceberg in Cloudera Data Platform, Cloudera customers, such as Teranet, have built open lakehouses to future-proof their data platforms for all their analytical workloads. Cloudera partners are also benefiting from Apache Iceberg in CDP. For example, Modak Nabu is helping their enterprise customers accelerate data ingestion, curation and consumption at petabyte scale. Today, we are thrilled to share some new advancements in Cloudera’s integration of Apache Iceberg in CDP, such as to help accelerate your multi-cloud open data lakehouse impl ..read more
