Large Scale Industrialization Key to Open Source Innovation
Cloudera Blog | Apache Hadoop
by Sudhir Menon
1y ago
We are now well into 2022, and the megatrends that drove the last decade in data—the Apache Software Foundation as a primary innovation vehicle for big data, the arrival of cloud computing, and the debut of cheap distributed storage—have converged and offer clear patterns: competitive advantage for vendors and value for customers. Cloudera has been parlaying those patterns into clear wins for the community at large and, more importantly, streamlining the benefits of that innovation for our customers. At Cloudera, we have had the benefit of an early start, and as a result we have cu…
Apache Ozone Powers Data Science in CDP Private Cloud
by George Huang
2y ago
Apache Ozone is a scalable distributed object store that can efficiently manage billions of small and large files. Ozone natively provides Amazon S3- and Hadoop Filesystem-compatible endpoints in addition to its own native object store API, and is designed to work seamlessly with enterprise-scale data warehousing, machine learning, and streaming workloads. The object store is readily available alongside HDFS in CDP (Cloudera Data Platform) Private Cloud Base 7.1.3+. This means there is out-of-the-box support for Ozone storage in services like Apache Hive, Apache Impala, and Apache Spark…
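The multi-endpoint design means one object is reachable under several addressing styles. A minimal sketch of how the same object might be addressed, assuming the `ofs://` Hadoop-compatible scheme and the `s3a://` connector described in the Ozone docs (the service name, volume, and bucket below are illustrative):

```python
# Sketch only: path schemes follow the Ozone documentation, but verify
# against your cluster's configuration before relying on them.

def ozone_paths(om_service: str, volume: str, bucket: str, key: str) -> dict:
    """Return the same object's address under each endpoint style."""
    return {
        # Native Hadoop-compatible filesystem endpoint (ofs scheme)
        "ofs": f"ofs://{om_service}/{volume}/{bucket}/{key}",
        # S3-compatible gateway: S3 clients see just bucket/key,
        # without the Ozone volume in the path
        "s3a": f"s3a://{bucket}/{key}",
    }

paths = ozone_paths("omservice1", "vol1", "warehouse", "part-00000.parquet")
print(paths["ofs"])  # ofs://omservice1/vol1/warehouse/part-00000.parquet
```

The point of the design is that Hive, Impala, and Spark can use the Hadoop-compatible path while S3-native tools use the gateway, all against the same stored data.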
One billion files in Ozone
by Nandakumar Vadivelu
3y ago
Apache Hadoop Ozone is a distributed key-value store that can manage small and large files alike. Ozone was designed to address the scale limitations of HDFS with respect to small files: HDFS is designed to store large files, the recommended limit is around 300 million files per Namenode, and it doesn't scale well beyond that. The principal features that help Ozone achieve scalability are: the namespace in Ozone is written to a local RocksDB instance, a design that strikes a balance between performance (keeping everything in memory) and scalability (persisting the less used me…
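The performance/scalability trade-off described above can be sketched as a bounded in-memory cache in front of a persistent key-value store. This is a toy model of the general pattern, not Ozone's actual implementation; a plain dict stands in for RocksDB:

```python
from collections import OrderedDict

class CachedNamespace:
    """Toy model: hot namespace entries are served from a bounded
    in-memory LRU cache, cold entries fall back to the persistent
    store, so total namespace size is not limited by RAM."""

    def __init__(self, capacity: int):
        self.store = {}             # stand-in for the on-disk RocksDB
        self.cache = OrderedDict()  # bounded in-memory hot set
        self.capacity = capacity

    def put(self, key: str, value: str) -> None:
        self.store[key] = value     # always persisted
        self._touch(key, value)

    def get(self, key: str) -> str:
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.store[key]     # cache miss: read from "disk"
        self._touch(key, value)
        return value

    def _touch(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
```

With this shape, memory bounds only the hot working set while the full namespace lives on disk — which is why a RocksDB-backed namespace can grow past what an all-in-memory Namenode can hold.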
EMR workloads + CDP = better performance and lower costs
by Wim Stoop
3y ago
The first thing that comes to mind when talking about synergy is how 2 + 2 = 5. Being the writer that he was, Mark Twain described it a lot more eloquently as "the bonus that is achieved when things work together harmoniously". There is a multitude of product and business examples to illustrate the point, and I particularly like how car manufacturers can bring together relatively small engines to do big things. To provide supercar performance in a more environmentally friendly way for the i8, BMW stepped away from ever-bigger power plants. They paired the same 1.5-liter petrol engine as you'll…
An Architecture for Secure COVID-19 Contact Tracing
by Tristan Stevens
3y ago
This post describes an architecture, and the associated privacy controls, for a data platform supporting a nationwide proactive contact tracing solution. Background: after calls for a way of using technology to facilitate the lifting of restrictions on freedom of movement for people not self-isolating, whilst meeting regulatory obligations such as the UK Human Rights Act and equivalent GDPR provisions, this paper proposes a reference architecture for a contact pairing database that maintains privacy yet is built to scale to support large-scale lifting of restrictions of movement. Contact Tracin…
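One common privacy pattern for a contact-pairing store — a sketch of the general technique, not necessarily this paper's exact design — is to persist only a salted hash of each unordered ID pair, so the database can answer "have these two ephemeral IDs met?" without storing the IDs themselves:

```python
import hashlib

# Illustrative only: salt handling, ID rotation, and key management are
# exactly the hard parts a real architecture must specify.
SALT = b"deployment-specific-salt"

def pair_key(id_a: str, id_b: str) -> str:
    """Canonical, order-independent key for one contact pair."""
    a, b = sorted((id_a, id_b))  # unordered pair -> canonical order
    return hashlib.sha256(SALT + f"{a}|{b}".encode()).hexdigest()

seen = {pair_key("eph-123", "eph-456")}
assert pair_key("eph-456", "eph-123") in seen  # order-independent lookup
```

Because lookups work only when both ephemeral IDs are presented, the store itself reveals little if breached — the property the privacy controls above are aiming for.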
Operational Database Management
by Gokul Kamaraj
3y ago
This blog post is part of a series on Cloudera's Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. This blog post gives you an overview of the OpDB management tools and features in the Cloudera Data Platform. The tools discussed in this article will help you understand the options available for managing the operations of your OpDB cluster. Backup and recovery tools: Cloudera provides multiple mechanisms for backup and recovery, including: Snapshots, R…
Hadoop: Decade Two, Day Zero*
by Arun Murthy
3y ago
This blog was originally published on Medium. The Data Cloud — Powered by Hadoop. One key aspect of the Cloudera Data Platform (CDP), which is just beginning to be understood, is how much of a recombinant evolution it represents, from an architectural standpoint, vis-à-vis Hadoop in its first decade. I've been having a blast showing CDP to customers over the past few months, and the response has been nothing short of phenomenal… Through these discussions, I see that the natural proclivity is to imagine that CDP is just another "distro" (i.e. a "unity distro") of the two parent distros (CDH &…
Operational Database Accessibility
by Liliana Kadar
3y ago
This blog post is part of a series on Cloudera's Operational Database (OpDB) in CDP. Each post goes into more detail about new features and capabilities. Start from the beginning of the series with Operational Database in CDP. Cloudera's OpDB provides a rich set of capabilities to store and access data. In this blog post, we'll look at the accessibility capabilities of OpDB and how you can use them to access your data. Distribution and sharding: Cloudera's Operational Database (OpDB) is a scale-out Database Management System (DBMS) that is designed to s…
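To make "distribution and sharding" concrete, here is a minimal sketch of range-based sharding, the general scheme HBase-style operational databases use: each shard ("region") owns a contiguous row-key range, and a sorted list of split points routes every key. The split points below are illustrative:

```python
import bisect

# 3 shards: keys before "g", keys in ["g", "p"), keys from "p" onward.
split_points = ["g", "p"]

def shard_for(row_key: str) -> int:
    """Route a row key to the shard owning its key range."""
    return bisect.bisect_right(split_points, row_key)

assert shard_for("apple") == 0
assert shard_for("mango") == 1
assert shard_for("zebra") == 2
```

Range-based placement keeps adjacent keys on the same shard, which is what makes efficient row-key range scans possible in a scale-out DBMS.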
Apache Hadoop Ozone Security – Authentication
by Xiaoyu Yao
3y ago
Apache Ozone is a distributed object store built on top of the Hadoop Distributed Data Store service. It can manage billions of small and large files that are difficult for other distributed file systems to handle. Ozone supports rich APIs such as Amazon S3 and Kubernetes CSI, as well as native Hadoop File System APIs. This makes Ozone easily consumable by different kinds of big data workloads, such as data warehousing on Apache Hive, data ingestion with Apache NiFi, streaming with Apache Spark/Flink, and machine learning with TensorFlow. With the growing data footprint and multifac…
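A toy sketch of the token pattern Hadoop-family systems use after the initial Kerberos handshake: a manager issues an HMAC-signed token, and data servers verify the signature with a shared secret instead of contacting the KDC on every request. Field names and the secret below are illustrative, not Ozone's wire format:

```python
import hashlib
import hmac

SECRET = b"shared-secret-rotated-by-the-master"  # illustrative

def issue_token(user: str, block_id: str) -> str:
    """Sign (user, block) so data servers can verify access offline."""
    payload = f"{user}:{block_id}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_token(token: str) -> bool:
    """Recompute the HMAC and compare in constant time."""
    payload, _, sig = token.rpartition(":")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

tok = issue_token("alice", "blk_1001")
assert verify_token(tok)
```

The design choice is about scale: Kerberos authenticates the initial session, and cheap symmetric-key tokens authorize the high-volume per-block traffic that follows.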
Disk and Datanode Size in HDFS
by Lokesh Jain
3y ago
This blog discusses questions such as: what is the right disk size for a datanode, and what is the right capacity for a datanode? A few of our customers have asked us about using dense storage nodes. It is certainly possible to use dense nodes for archival storage, because IO bandwidth requirements are usually lower for cold data. However, the decision to use denser nodes for hot data must be evaluated carefully, as it can have an impact on the performance of the cluster. You may be able to use denser nodes for hot data if you have provisioned adequate network bandwidth to mitigate the higher…
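One way to reason about datanode density is re-replication time after a node failure: HDFS spreads recovery across the surviving nodes, so a denser node means more data to re-protect per failure. A back-of-the-envelope sketch — the numbers and the 50% utilization factor are illustrative assumptions, not Cloudera guidance:

```python
def rereplication_hours(node_capacity_tb: float,
                        surviving_nodes: int,
                        per_node_gbit_s: float,
                        utilization: float = 0.5) -> float:
    """Hours to re-create a failed node's replicas, assuming recovery
    traffic is spread evenly and only `utilization` of each node's
    network bandwidth is available for it."""
    data_bits = node_capacity_tb * 8e12                 # TB -> bits
    agg_bits_per_s = surviving_nodes * per_node_gbit_s * 1e9 * utilization
    return data_bits / agg_bits_per_s / 3600

# A 100 TB dense node failing in a 50-node cluster on 10 GbE:
hours = rereplication_hours(100, 49, 10, 0.5)   # roughly 0.9 hours
```

Doubling node capacity doubles this window, during which the cluster runs with reduced replication — which is why dense hot-data nodes need correspondingly more network bandwidth.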