Cloudera Blog » Data Warehousing on Feedspot

Cloudera Blog » Data Warehousing

by Dongkai Yu

2M ago

In Cloudera deployments on public cloud, one of the key configuration elements is the DNS. Get it wrong, and your deployment may become wholly unusable, with users unable to access and use the Cloudera data services. If the DNS is set up less than ideally, connectivity and performance issues may arise. In this blog, we'll take you through our tried and tested best practices for setting up your DNS for use with Cloudera on Azure. To get started and give you a feel for the DNS dependencies, these are the Azure managed services used in an Azure deployment of Cloudera: …

Materialized Views in Hive for Iceberg Table Format

by Aman Sinha

2M ago

Overview: This blog post describes support for materialized views on tables that use the Iceberg table format. Apache Iceberg is a high-performance open table format for petabyte-scale analytic datasets. It has been designed and developed as an open community standard to ensure compatibility across languages and implementations. It brings the reliability and simplicity of SQL tables to big data while enabling engines like Hive, Impala, Spark, Trino, Flink, and Presto to work with the same tables at the same time. Apache Iceberg forms the core foundation for Cloudera's Open Data Lakehouse wit…

Setting up and Getting Started with Cloudera’s New SQL AI Assistant

by Björn Alm

3M ago

As described in our recent blog post, an SQL AI Assistant that leverages the power of large language models (LLMs) for a number of SQL tasks has been integrated into Hue. It can help you create, edit, optimize, fix, and succinctly summarize queries using natural language. This is a real game-changer for data analysts at all levels and will make SQL development faster, easier, and less error-prone. This blog post aims to help you understand what you can do to get started with generative AI-assisted SQL using Hue image version 2023.0.16.0 or higher on the public clou…

Introducing the SQL AI Assistant: Create, Edit, Explain, Optimize, and Fix Any Query

by David Dichmann

4M ago

Imagine you've just started a new job as a business analyst. You've been given a burning business question that needs an immediate answer. How long would it take you to find the data you need to even begin to come up with a data-driven response? Imagine how many iterations of query writing you'd have to go through. In this scenario, you also have reports that need updating. Those contain some of the biggest hairball queries you've ever seen. What do they mean? Imagine how long it takes to unravel those queries just to understand them, let alone make modificati…

Don’t Blink: You’ll Miss Something Amazing!

by David Dichmann

7M ago

Fast-moving data and real-time analysis present us with some amazing opportunities. Don't blink, or you'll miss it! Every organization has some data that happens in real time, whether it is understanding what our users are doing on our websites or watching our systems and equipment as they perform mission-critical tasks for us. This real-time data, when captured and analyzed in a timely manner, may deliver tremendous business value. For example, in manufacturing, fast-moving data provides the only way to detect (or even predict and prevent) defects in real time before they pro…

Telecommunications Data Monetization Strategies in 5G and beyond with Cloudera and AWS

by Anthony Behan

8M ago

The world is awash with data, no more so than in the telecommunications (telco) industry. With some Cloudera customers ingesting multiple petabytes of data every single day (that's thousands of terabytes!), there is the potential to understand, in great detail, how people, businesses, cities, and ecosystems function. This information is essential for the management of the telco business, from fault resolution, to making sure families have the right content package for their needs, to supply chain dashboards for businesses based on IoT data. The world has changed: business and people…

HDFS Snapshot Best Practices

by Tsz Sze

9M ago

Introduction: The snapshots feature of the Apache Hadoop Distributed File System (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption as well as user or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH), and Hortonworks Data Platform (HDP). Whether you've been using snapshots for a while or are contemplating their use, this blog gives you the insights and techniques to make the most of them. Using snapshots to protect data…

12 Times Faster Query Planning With Iceberg Manifest Caching in Impala

by Riza Suminto

10M ago

Iceberg is an emerging open table format designed for large analytic workloads. The Apache Iceberg project continues developing an implementation of the Iceberg specification in the form of a Java library. Several compute engines, such as Impala, Hive, Spark, and Trino, support querying data in the Iceberg table format by adopting this Java library, which lets them benefit immediately from the Apache Iceberg project's implementation. A range of Iceberg table analyses, such as listing a table's data files, selecting table sn…

Integrating Cloudera Data Warehouse with Kudu Clusters

by Varun Jaitly

10M ago

Apache Impala and Apache Kudu make a great combination for real-time analytics on streaming data in time series and real-time data warehousing use cases. Over the last decade, more than 200 Cloudera customers have successfully implemented Apache Kudu, using Apache Spark for ingestion and Apache Impala for real-time BI use cases, with thousands of nodes running Apache Kudu. These use cases have ranged from telecom 4G/5G analytics, to real-time oil and gas reporting and alerting, to supply chain use cases for pharmaceutical companies, to core banking and stock trading analytics systems. …

Job Notifications in SQL Stream Builder

by Botond Kismoni

1y ago

Special co-author credits: Adam Andras Toth, Software Engineer Intern. With enterprises' needs for data analytics and processing getting more complex by the day, Cloudera aims to keep up with these needs, offering constantly evolving, cutting-edge solutions to all your data-related problems. Cloudera Stream Processing aims to take real-time data analytics to the next level. We're excited to highlight job monitoring with notifications, a new feature for SQL Stream Builder (SSB). What problem are we solving with job notifications? The sudden failing of a complex data pipeline can lead to devasta…