Cloudera Blog » Data Warehousing
The Cloudera blog provides information on Data Warehousing related to business and technical categories. At Cloudera, we empower people to transform complex data anywhere into actionable insights faster and easier. We deliver a hybrid data platform with secure data management and portable cloud-native data analytics.
2M ago
In Cloudera deployments on public cloud, one of the key configuration elements is the DNS. Get it wrong and your deployment may become wholly unusable, with users unable to access and use the Cloudera data services. If the DNS setup is less than ideal, connectivity and performance issues may arise. In this blog, we’ll take you through our tried and tested best practices for setting up your DNS for use with Cloudera on Azure.
To get started, and to give you a feel for the DNS dependencies, these are the Azure managed services used in an Azure deployment for Cloudera: ..read more
2M ago
Overview
This blog post describes support for materialized views for the Iceberg table format.
Apache Iceberg is a high-performance open table format for petabyte-scale analytic datasets. It has been designed and developed as an open community standard to ensure compatibility across languages and implementations. It brings the reliability and simplicity of SQL tables to big data while enabling engines like Hive, Impala, Spark, Trino, Flink, and Presto to work with the same tables at the same time. Apache Iceberg forms the core foundation for Cloudera’s Open Data Lakehouse wit ..read more
3M ago
As described in our recent blog post, an SQL AI Assistant has been integrated into Hue with the capability to leverage the power of large language models (LLMs) for a number of SQL tasks. It can help you to create, edit, optimize, fix, and succinctly summarize queries using natural language. This is a real game-changer for data analysts on all levels and will make SQL development faster, easier, and less error-prone.
This blog post aims to help you understand what you can do to get started with generative AI-assisted SQL using Hue image version 2023.0.16.0 or higher on the public clou ..read more
4M ago
Imagine you’ve just started a new job working as a business analyst. You’ve been given a new burning business question that needs an immediate answer. How long would it take you to find the data you need to even begin to come up with a data-driven response? Imagine how many iterations of query writing you’d have to go through.
In this scenario, you also have reports that need updating. Those contain some of the biggest hair-ball queries you’ve ever seen. What do they mean? Imagine how long it takes to unravel those queries just to understand them, let alone make modificati ..read more
7M ago
Fast moving data and real time analysis present us with some amazing opportunities. Don’t blink—or you’ll miss it! Every organization has some data that happens in real time, whether it is understanding what our users are doing on our websites or watching our systems and equipment as they perform mission critical tasks for us. This real-time data, when captured and analyzed in a timely manner, may deliver tremendous business value. For example:
In manufacturing, fast-moving data provides the only way to detect—or even predict and prevent—defects in real time before they pro ..read more
8M ago
The world is awash with data, nowhere more so than in the telecommunications (telco) industry. With some Cloudera customers ingesting multiple petabytes of data every single day (that’s multiple thousands of terabytes!), there is the potential to understand, in great detail, how people, businesses, cities, and ecosystems function. This information is essential for the management of the telco business, from fault resolution, to making sure families have the right content package for their needs, to supply chain dashboards for businesses based on IoT data.
The world has changed—business and people ..read more
9M ago
Introduction
The snapshots feature of the Apache Hadoop Distributed Filesystem (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption and user or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH), and Hortonworks Data Platform (HDP). Regardless of whether you’ve been using snapshots for a while or are only contemplating their use, this blog gives you the insights and techniques to make them look their best.
Using snapshots to protect data ..read more
10M ago
Iceberg is an emerging open table format designed for large analytic workloads. The Apache Iceberg project continues to develop an implementation of the Iceberg specification in the form of a Java library. Several compute engines, such as Impala, Hive, Spark, and Trino, support querying data in the Iceberg table format by adopting this Java library provided by the Apache Iceberg project.
Different query engines such as Impala, Hive, and Spark can immediately benefit from using the Apache Iceberg Java library. A range of Iceberg table analyses, such as listing a table’s data files, selecting table sn ..read more
10M ago
Apache Impala and Apache Kudu make a great combination for real-time analytics on streaming data for time series and real-time data warehousing use cases. Over the last decade, more than 200 Cloudera customers have successfully implemented Apache Kudu with Apache Spark for ingestion and Apache Impala for real-time BI use cases, with thousands of nodes running Apache Kudu. These use cases have varied from telecom 4G/5G analytics to real-time oil and gas reporting and alerting, to supply chain use cases for pharmaceutical companies or core banking and stock trading analytics systems. ..read more
1y ago
Special co-author credits: Adam Andras Toth, Software Engineer Intern
With enterprises’ needs for data analytics and processing getting more complex by the day, Cloudera aims to keep up by offering constantly evolving, cutting-edge solutions to your data-related problems. Cloudera Stream Processing aims to take real-time data analytics to the next level. We’re excited to highlight job monitoring with notifications, a new feature for SQL Stream Builder (SSB).
What problem are we solving with job notifications?
The sudden failure of a complex data pipeline can lead to devasta ..read more