We are excited to announce the general availability of the Cloudera Altus SDK for Java, which lets you programmatically use the Altus platform-as-a-service for ETL, batch machine learning, and cloud bursting. Altus empowers customers and partners alike to run data engineering workloads in the cloud, leveraging cloud infrastructure such as AWS. Cloudera Altus also provides the ability to create data engineering pipelines using both a web console and a CLI.
Cloudera Altus SDK for Java was developed to provide easier programmatic access with the popular Java programming language so that users can automate their data engineering workloads.
Many types of business problems boil down to making recommendations, and machine learning is the special sauce that makes these problems solvable. Machine learning for recommendations is a challenging endeavor in its own right, but it is just one part of a recommendation system, which must move, store, process, and update data in production across several different components. In this post we show how to use Cloudera’s distribution of open source software to build a production-scale recommendation system,
Self-service BI and exploratory analytics are some of the most common use cases we see our customers running on Cloudera’s analytic database solution. Over the past year, we made significant advancements to provide a more powerful user experience for SQL developers and make them more productive for their everyday self-service BI tasks and workflows. Leveraging Hue as the SQL development workbench, we continue to see usage of the platform increase and the number of analytic use cases grow –
A few weeks back, we announced the upcoming beta of Cloudera Altus Analytic DB for cloud-based data warehousing. As promised, the beta is now available and we wanted to spend some time describing the unique architecture.
Architecture of Cloudera Altus Analytic DB
Altus Analytic DB is built on the Cloudera Altus platform-as-a-service foundation, which also supports the Altus Data Engineering service. The architecture of Cloudera Altus is based around a few simple but important premises —
For the Apache Spot novice, or for anyone who wants to quickly evaluate a cybersecurity solution on Cloudera Enterprise Data Hub (EDH) without the arduous task of manual installation, we’ve created a rapid deployment of Apache Spot on Amazon Web Services (AWS) using Cloudera Director.
Using the sample data provided in the deployment, you will immediately see how to isolate and identify suspicious activities at cloud scale from the Apache Spot UI.
Cloudera Director 2.7 introduces support for LDAP authentication, improved Java 8 support, and normalization configuration at the instance template level. It also continues to improve the AWS plugin.
Cloudera Director helps you deploy, scale, and manage Cloudera clusters in AWS, Azure, or Google Cloud Platform. Its enterprise-grade features deliver a mechanism for establishing production-ready clusters in the cloud for big-data workloads and applications in a simple, reliable, automated fashion.
In Part 1: Infrastructure Considerations of this revamped three-part series on deploying clusters like a boss, we provided a general explanation of how nodes are classified, along with the disk layout configurations and network topologies to think about when deploying your clusters.
In this segment, Part 2: Service and Role Layouts, we take a step higher up the stack and look at the various services and roles that make up your Cloudera Enterprise deployment.
Data analytics is increasingly being brought to bear to treat human disease, but as more and more health data is stored in computer databases, one significant challenge is how to perform analyses across these disparate databases. In this post I take a look at the Observational Health Data Sciences and Informatics (or OHDSI, pronounced “Odyssey”) program that was formed to address this challenge, and which today accounts for 1.26 billion patient records collectively stored across 64 databases in 17 countries.
One of the principal features used in analytic databases is table partitioning. This feature is so frequently used because of its ability to significantly reduce query latency by allowing the execution engine to skip reading data that is not necessary for the query. For example, consider a table of events partitioned on the event time using calendar day granularity. If the table contained 2 years of events and a user wanted to find the events for a given 7-day window,
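The partition-skipping idea in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not how any particular execution engine implements pruning: the dictionary-backed "table", the helper names `build_partitions` and `prune_partitions`, and the specific dates are all assumptions made up for the example. It models a table with one partition per calendar day over roughly 2 years (730 days) and counts how many partitions a 7-day query actually has to read.

```python
from datetime import date, timedelta


def build_partitions(start: date, days: int) -> dict[date, list[str]]:
    """Hypothetical table: one partition (a list of event rows) per calendar day."""
    return {start + timedelta(days=i): [f"event-{i}"] for i in range(days)}


def prune_partitions(partitions: dict[date, list[str]],
                     window_start: date,
                     window_days: int) -> list[date]:
    """Return only the partition keys that fall inside the query window.

    Only the partition keys are inspected; rows in partitions outside
    the window are never touched.
    """
    window_end = window_start + timedelta(days=window_days)  # exclusive bound
    return [day for day in partitions if window_start <= day < window_end]


# Roughly 2 years of daily partitions (dates are illustrative assumptions).
partitions = build_partitions(date(2016, 1, 1), 730)

# A query over a 7-day window only needs the 7 matching partitions.
selected = prune_partitions(partitions, date(2017, 3, 1), 7)
fraction_read = len(selected) / len(partitions)

print(f"partitions read: {len(selected)} of {len(partitions)} "
      f"({fraction_read:.2%})")
```

Under these assumptions the query touches 7 of 730 partitions, a little under 1% of the table, which is where the latency benefit comes from.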
Cloudera Director 2.6 and Cloudera Manager 5.13 offer a simple way to have TLS configured for Cloudera Manager and CDH clusters. In this blog post, Bill Havanki describes how to use the new feature and offers technical details behind how the automatic configuration happens.
Why TLS in the Cloud
An important tenet of information security is defense in depth. The idea behind defense in depth is to have multiple layers of security protecting valued assets,