Visualizing Kafka with Cloudera messaging manager (SMM)
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
Streams Messaging Manager (SMM) is not new to Hortonworks users, but since I was mostly using Cloudera I never had the opportunity to use it. Following the merger of Cloudera and Hortonworks in early 2019, many good products that were originally part of HDP finally made their way into the Cloudera platform, including SMM. Cloudera’s…
Dealing with HDFS small files problem using Hadoop archives
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
Hadoop was built to handle very large files. Its default block size is 128 MB and it is all about throughput. It has a hard time handling many small files: the memory footprint of the NameNodes grows as they have to keep track of many small blocks, and scan performance goes down. The best way…
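To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes the NameNode keeps one in-memory object per file plus one per block, and that each object costs about 150 bytes. That figure is a commonly quoted rule of thumb, not something measured in this post, and the function names are made up for illustration.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MB)
BYTES_PER_OBJECT = 150          # ASSUMPTION: rough rule of thumb, not a measurement


def namenode_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Count namespace objects: one per file plus one per block of each file."""
    blocks_per_file = math.ceil(file_size / block_size)
    return num_files * (1 + blocks_per_file)


def estimated_heap_bytes(num_files: int, file_size: int) -> int:
    """Very rough NameNode heap estimate for a set of equally sized files."""
    return namenode_objects(num_files, file_size) * BYTES_PER_OBJECT


# The same 1 TiB of data stored two different ways.
TOTAL = 1024 ** 4
MIB = 1024 * 1024

large_files = namenode_objects(TOTAL // BLOCK_SIZE, BLOCK_SIZE)  # 128 MB files
small_files = namenode_objects(TOTAL // MIB, MIB)                # 1 MB files

print(f"128 MB files: {large_files} objects")
print(f"1 MB files:   {small_files} objects")
print(f"ratio:        {small_files // large_files}x")
```

Under these assumptions, the same amount of data needs 128 times more NameNode objects when it is stored as 1 MB files instead of 128 MB files.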
Setting default resource pool for JDBC connections
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
This is a quick tip about connecting to Hive or Impala via JDBC. Accessing Hive or Impala using their JDBC drivers is very convenient. Client programs like beeline or JetBrains DataGrip use it as the main way of accessing Hive/Impala, and many people also use it in their own programs. Things get a…
Apache Livy – a REST gateway for Spark
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
Apache Livy is an open source server that exposes Spark as a service. Its backend connects to a Spark cluster while its frontend exposes a REST API. This makes it possible to run it as the organization’s Spark gateway, and even to run it in Docker containers. Not only does it enable running Spark jobs from anywhere, but it also enables…
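As a small illustration of what "Spark behind a REST API" looks like from the client side, the sketch below builds (but does not send) a batch submission request for Livy's `/batches` endpoint. The host name, jar path, class name and arguments are hypothetical placeholders; only the endpoint, the JSON field names and the default port 8998 come from Livy itself.

```python
import json
import urllib.request

# Hypothetical Livy host; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.com:8998"


def build_batch_request(livy_url, app_file, class_name=None, args=None):
    """Build a POST request that asks Livy to run a Spark application as a batch."""
    payload = {"file": app_file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    return urllib.request.Request(
        url=f"{livy_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical job: the jar path, class name and argument are placeholders.
request = build_batch_request(
    LIVY_URL,
    "hdfs:///apps/example-job.jar",
    class_name="com.example.Job",
    args=["2019-01-01"],
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually submit the job against a running Livy server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Because the client only needs HTTP, the machine submitting the job does not need a Spark installation or cluster configuration files.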
How to view the contents of fsimage or edits file
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
Everyone who is familiar with HDFS knows it stores its metadata in fsimage files and that the latest changes are stored in edits files. Periodically, edits files are merged into the main fsimage file. Fsimage and edits are binary files, so we cannot view their contents directly. However, HDFS offers built-in utilities that can…
Testing HDFS centralized cache
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
I really hate negative posts. I try technologies hoping to find them useful so as many readers as possible can adopt them and benefit from them. However, sometimes a technology does not live up to expectations, at least in my tests. It can still be useful as users will know what not to use, or…
Changing c3p0 parameters in Cloudera manager
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
This post was ready long ago, but I had many doubts and second thoughts before I decided to publish it anyway. We had a problem with Cloudera Manager and we could not find any ready-made solution on the web or in Cloudera’s official documentation. So we ended up doing some reverse engineering and changing Cloudera Manager’s…
Enabling debug mode in Cloudera Manager
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
This is a short post but it can save you some wandering and searching. Sometimes when you try to find and fix issues with Cloudera Manager you will want to increase the log level to debug so you can see what’s wrong. The procedure cannot be found in the documentation (or at least cannot be…
Hadoop 3 Erasure coding examined
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
One of the most interesting new features in Hadoop 3 is called erasure coding (I found the reason for the name in this nice article that explains some of the theory. It is named after the binary erasure channel, an algorithm that helps to correct corrupt communication data). It is based on a much older technology…
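For readers who have not met the idea before, the toy sketch below shows the simplest possible erasure code: a single XOR parity cell computed over two data cells. It is only meant to illustrate the principle of rebuilding a lost cell from the survivors. It is not the scheme the post examines, and Hadoop's default policies use Reed-Solomon codes rather than plain XOR. All function names here are made up for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(cells):
    """Compute one parity cell over equally sized data cells."""
    return reduce(xor_bytes, cells)


def recover_missing(surviving_cells, parity):
    """Rebuild the single missing data cell from the survivors and the parity."""
    return reduce(xor_bytes, surviving_cells, parity)


# Two data cells plus one parity cell: 50% extra storage,
# compared with 200% extra for three-way replication.
data_cells = [b"hdfs", b"data"]
parity = make_parity(data_cells)

# Pretend the first cell was lost and rebuild it from what is left.
rebuilt = recover_missing([data_cells[1]], parity)
print(rebuilt == data_cells[0])
```

A single XOR parity can only tolerate the loss of one cell per group, which is why production schemes use codes that carry several parity cells.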
Caching with Nginx
Old dog, New tricks | Hadoop
by Guy Shilo
3y ago
In my workplace, we use the Cloudera Manager API heavily. In fact, our various programs and scripts query this API about 20k times per hour, plus actions performed by users who log in to the Cloudera Manager user interface. All this stressed Cloudera Manager and made it slow. We tried to increase the Java heap size, change Java garbage…
