One instance of Cloudera Manager (CM) can manage N clusters. In the current Role Based Access Control (RBAC) model, CM users hold privileges and permissions across everything in CM’s purview (including every cluster that CM manages). For example, Read-Only user John is a user who can perform all the actions of Read-Only users on all clusters managed by CM. The “Cluster Admin” user Chris is a cluster administrator of all the clusters managed by CM.
Cloudera Enterprise Backup and Disaster Recovery (BDR) enables you to replicate data across data centers for disaster recovery scenarios. As a lower cost solution to geographical redundancy or as a means to perform an on-premises to cloud migration, BDR can also replicate HDFS and Hive data to and from Amazon S3 or a Microsoft Azure Data Lake Store.
Many customers may require an automated solution for creating, running, and managing replication schedules in order to minimize Recovery Point Objectives (RPOs) for late arriving data or to automate recovery after disaster recovery.
Cloudera Director 2.8 introduces a simpler way to create clusters in AWS or Microsoft Azure that requires less information to get started than the standard procedure. A new configuration export capability enables retrieval of a client configuration file for any cluster as a starting point to create new clusters.
Cloudera Director helps you deploy, scale, and manage Cloudera clusters in AWS, Microsoft Azure, or Google Cloud Platform.
Self-service BI and exploratory analytics are some of the most common use cases we see our customers running on Cloudera’s analytic database solution. Over the past year, we made significant advancements to provide a simpler user experience for SQL developers and make them more productive for their everyday self-service BI tasks and workflows by leveraging Hue as the SQL development workbench.
The multi-part blog post Untangling Apache Hadoop YARN provided an overview of how the YARN scheduler works. In this post we discuss technical details around how FairScheduler Preemption works and best practices to consider when configuring it.
We also present a recent overhaul of FairScheduler Preemption in CDH 5.11 which attempts to address a number of issues as documented in YARN-4752.
The previous two sections have concentrated on infrastructure considerations and services and role layouts for categories of workloads such as Analytic DB and Operational DB. Many of the concepts described therein apply predominantly to on-premise clusters while others apply to clusters deployed on-premise or in the cloud. This section will concentrate predominantly on those considerations that apply to cloud deployments only.
At the time of this writing, Cloudera supports 3 Infrastructure as a Service (IaaS) platforms: Amazon Elastic Compute Cloud (AWS),
As a member of Cloudera’s Partner Engineering team, I evaluate hardware and cloud computing platforms offered by commercial partners who want to certify their products for use with Cloudera software. One of my primary goals is to make sure that these platforms provide a stable and well-performing base upon which our products will run, a state of operation that a wide variety of customers performing an even wider variety of tasks can appreciate.
It has been a long and patient wait for Apache Hadoop 3.0 to mature. A major new version of the storage layer obviously impacts all our integrated components, including Apache Solr and all our integrations with the rest of the platform, commonly referred to as Cloudera Search. Since our customers’ Search deployments are so often mission critical, we’ve made sure to take time to do extensive integration testing and focus on the upgrade experience.
Now the moment has finally come to announce Solr 7.0 in Cloudera Search and available as of our new major release,
Traditional messaging models fall into two categories: Shared Message Queues and Publish-Subscribe models. Both models have their own pros and cons. Neither could successfully handle big data ingestion at scale due to limitations in their design. Apache Kafka implements a publish-subscribe messaging model which provides fault tolerance, scalability to handle large volumes of streaming data for real-time analytics. It was developed at LinkedIn in 2010 to meet its growing data pipeline needs. Apache Kafka bridges the gaps that traditional messaging models failed to achieve.
One of the worst things that can happen in mission-critical production environments is loss of data and another is downtime. For a search service that provides end users with easy access to data using natural language, downtime would mean complete halt for those parts of your organization. Even worse if the search service is fueling your online business, it interrupts your customer access and end user experience.
That is why we designed multiple options of backup and disaster recovery for your data served via Cloudera Search,