DNS Zone Setup Best Practices on Azure
Cloudera Blog » Data Warehousing
by Dongkai Yu
2M ago
In Cloudera deployments on public cloud, one of the key configuration elements is the DNS. Get it wrong and your deployment may become wholly unusable, with users unable to access and use the Cloudera data services. If the DNS is set up less than ideally, connectivity and performance issues may arise. In this blog, we’ll take you through our tried and tested best practices for setting up your DNS for use with Cloudera on Azure. To get started and give you a feel for the dependencies for the DNS, in an Azure deployment for Cloudera, these are the Azure managed services being used: ..read more
Materialized Views in Hive for Iceberg Table Format
Cloudera Blog » Data Warehousing
by Aman Sinha
2M ago
Overview: This blog post describes support for materialized views for the Iceberg table format. Apache Iceberg is a high-performance open table format for petabyte-scale analytic datasets. It has been designed and developed as an open community standard to ensure compatibility across languages and implementations. It brings the reliability and simplicity of SQL tables to big data while enabling engines like Hive, Impala, Spark, Trino, Flink, and Presto to work with the same tables at the same time. Apache Iceberg forms the core foundation for Cloudera’s Open Data Lakehouse wit ..read more
Setting up and Getting Started with Cloudera’s New SQL AI Assistant
Cloudera Blog » Data Warehousing
by Björn Alm
3M ago
As described in our recent blog post, an SQL AI Assistant has been integrated into Hue with the capability to leverage the power of large language models (LLMs) for a number of SQL tasks. It can help you to create, edit, optimize, fix, and succinctly summarize queries using natural language. This is a real game-changer for data analysts at all levels and will make SQL development faster, easier, and less error-prone. This blog post aims to help you understand what you can do to get started with generative AI-assisted SQL using Hue image version 2023.0.16.0 or higher on the public clou ..read more
Introducing the SQL AI Assistant: Create, Edit, Explain, Optimize, and Fix Any Query
Cloudera Blog » Data Warehousing
by David Dichmann
4M ago
Imagine you’ve just started a new job working as a business analyst. You’ve been given a new burning business question that needs an immediate answer. How long would it take you to find the data you need to even begin to come up with a data-driven response? Imagine how many iterations of query writing you’d have to go through. In this scenario, you also have reports that need updating. Those contain some of the biggest hair-ball queries you’ve ever seen. What do they mean? Imagine how long it takes to unravel those queries just to understand them, let alone make modificati ..read more
Don’t Blink: You’ll Miss Something Amazing!
Cloudera Blog » Data Warehousing
by David Dichmann
7M ago
Fast moving data and real time analysis present us with some amazing opportunities. Don’t blink—or you’ll miss it!  Every organization has some data that happens in real time, whether it is understanding what our users are doing on our websites or watching our systems and equipment as they perform mission critical tasks for us. This real-time data, when captured and analyzed in a timely manner, may deliver tremendous business value.  For example:  In manufacturing, fast-moving data provides the only way to detect—or even predict and prevent—defects in real time before they pro ..read more
Telecommunications Data Monetization Strategies in 5G and beyond with Cloudera and AWS
Cloudera Blog » Data Warehousing
by Anthony Behan
8M ago
The world is awash with data, no more so than in the telecommunications (telco) industry. With some Cloudera customers ingesting multiple petabytes of data every single day— that’s multiple thousands of terabytes!—there is the potential to understand, in great detail, how people, businesses, cities and ecosystems function. This information is essential for the management of the telco business, from fault resolution to making sure families have the right content package for their needs, to supply chain dashboards for businesses based on IoT data.  The world has changed—business and people ..read more
HDFS Snapshot Best Practices
Cloudera Blog » Data Warehousing
by Tsz Sze
9M ago
Introduction: The snapshots feature of the Apache Hadoop Distributed Filesystem (HDFS) enables you to capture point-in-time copies of the file system and protect your important data against corruption and user or application errors. This feature is available in all versions of Cloudera Data Platform (CDP), Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP). Regardless of whether you’ve been using snapshots for a while or are contemplating their use, this blog gives you the insights and techniques to make them look their best. Using snapshots to protect data ..read more
12 Times Faster Query Planning With Iceberg Manifest Caching in Impala
Cloudera Blog » Data Warehousing
by Riza Suminto
10M ago
Iceberg is an emerging open table format designed for large analytic workloads. The Apache Iceberg project continues developing an implementation of the Iceberg specification in the form of a Java library. Several compute engines such as Impala, Hive, Spark, and Trino support querying data in the Iceberg table format by adopting this Java library provided by the Apache Iceberg project. Different query engines such as Impala, Hive, and Spark can immediately benefit from using the Apache Iceberg Java library. A range of Iceberg table analyses, such as listing a table’s data files, selecting table sn ..read more
Integrating Cloudera Data Warehouse with Kudu Clusters
Cloudera Blog » Data Warehousing
by Varun Jaitly
10M ago
Apache Impala and Apache Kudu make a great combination for real-time analytics on streaming data for time series and real-time data warehousing use cases. More than 200 Cloudera customers have implemented Apache Kudu with Apache Spark for ingestion and Apache Impala for real-time BI use cases successfully over the last decade, with thousands of nodes running Apache Kudu. These use cases have varied from telecom 4G/5G analytics to real-time oil and gas reporting and alerting, to supply chain use cases for pharmaceutical companies or core banking and stock trading analytics systems.   ..read more
Job Notifications in SQL Stream Builder
Cloudera Blog » Data Warehousing
by Botond Kismoni
1y ago
Special co-author credits: Adam Andras Toth, Software Engineer Intern. With enterprises’ needs for data analytics and processing getting more complex by the day, Cloudera aims to keep up with these needs, offering constantly evolving, cutting-edge solutions to all your data-related problems. Cloudera Stream Processing aims to take real-time data analytics to the next level. We’re excited to highlight job monitoring with notifications, a new feature for SQL Stream Builder (SSB). What problem are we solving with job notifications? The sudden failing of a complex data pipeline can lead to devasta ..read more