Old dog, New tricks | Hadoop
Guy Shilo is a data engineer from Israel. Get useful insights on Hadoop systems by reading the blog.
3y ago
Streams Messaging Manager (SMM) is not new to Hortonworks users, but since I was mostly using Cloudera, I never had the opportunity to use it. Following the merger of Cloudera and Hortonworks in early 2019, many good products that were originally part of HDP, including SMM, finally made their way into the Cloudera platform. Cloudera’s…
3y ago
Hadoop was built to handle very large files. Its default block size is 128 MB, and it’s all about throughput. It has a hard time handling many small files: the memory footprint of the NameNodes grows, because they have to keep track of many small blocks, and scan performance goes down. The best way…
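The excerpt stops before the author's solution, so the following only illustrates the NameNode cost side of the problem. It is a minimal Python sketch that assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object, plus the 128 MB default block size; both numbers are assumptions, and real memory use depends on the Hadoop version and configuration.

```python
# Rule-of-thumb assumption: ~150 bytes of NameNode heap per file or block object.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB default HDFS block size


def namenode_objects(file_sizes):
    """Count the file and block objects the NameNode tracks for these file sizes (bytes)."""
    objects = 0
    for size in file_sizes:
        blocks = -(-size // BLOCK_SIZE)  # ceiling division; an empty file has 0 blocks
        objects += 1 + blocks            # one inode for the file plus its blocks
    return objects


def estimate_heap_bytes(file_sizes):
    """Rough NameNode heap estimate under the 150-bytes-per-object assumption."""
    return namenode_objects(file_sizes) * BYTES_PER_OBJECT


GIB = 1024 ** 3
MIB = 1024 ** 2
one_big = [1 * GIB]            # 1 GiB stored as a single file
many_small = [1 * MIB] * 1024  # the same 1 GiB stored as 1024 files

print(namenode_objects(one_big), estimate_heap_bytes(one_big))        # 9 1350
print(namenode_objects(many_small), estimate_heap_bytes(many_small))  # 2048 307200
```

Under these assumptions, the same gigabyte of data costs 9 objects as one file but 2048 objects when split into 1 MiB files, which is why many small files inflate NameNode memory.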
3y ago
This is a quick tip about connecting to Hive or Impala via JDBC. Accessing Hive or Impala through their JDBC drivers is very convenient. Client programs like Beeline or JetBrains DataGrip use JDBC as their main way of accessing Hive/Impala, and many people also use it in programs they write themselves. Things get a…
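The excerpt ends before the actual tip, so this does not reproduce it. As a hedged Python sketch, it only shows the general shape of a HiveServer2 JDBC URL: `jdbc:hive2://host:port/database` followed by `;key=value` session settings. The host names and Kerberos realm are hypothetical; 10000 is HiveServer2's usual default port and 21050 is the port Impala daemons commonly expose to JDBC clients, but your cluster may differ. Which settings you need (`principal` for Kerberos, `auth=noSasl` for an unsecured Impala, `ssl=true`, and so on) depends on how the service is secured.

```python
def hive2_jdbc_url(host, port=10000, database="default", **session_settings):
    """Build a jdbc:hive2:// connection URL.

    Session settings are appended as ;key=value pairs in the order given.
    Booleans are written in lowercase (e.g. ssl=true).
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    for key, value in session_settings.items():
        if isinstance(value, bool):
            value = str(value).lower()
        url += f";{key}={value}"
    return url


# Hypothetical hosts and realm, for illustration only.
hive_url = hive2_jdbc_url("hs2.example.com", principal="hive/_HOST@EXAMPLE.COM", ssl=True)
impala_url = hive2_jdbc_url("impalad.example.com", port=21050, auth="noSasl")
print(hive_url)    # jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM;ssl=true
print(impala_url)  # jdbc:hive2://impalad.example.com:21050/default;auth=noSasl
```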
3y ago
Apache Livy is an open source server that exposes Spark as a service. Its backend connects to a Spark cluster, while its frontend exposes a REST API. This makes it possible to run Livy as the organization’s Spark gateway and even to run it in Docker containers. Not only does it enable running Spark jobs from anywhere, but it also enables… 
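To make the REST frontend concrete: Livy accepts batch jobs via `POST /batches` with a JSON body whose `file` field points to the application and whose `className` field names its main class. The Python sketch below only builds such a request with the standard library and does not send it. The Livy URL (8998 is Livy's default port), jar path, class name and argument are hypothetical placeholders.

```python
import json
import urllib.request


def livy_batch_request(livy_url, jar_path, class_name, args=()):
    """Build (but do not send) a POST /batches request for Livy's REST API."""
    payload = {"file": jar_path, "className": class_name, "args": list(args)}
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint, jar and class, for illustration only.
req = livy_batch_request(
    "http://livy.example.com:8998",
    "hdfs:///jobs/example-job.jar",
    "com.example.ExampleJob",
    ["2020-01-01"],
)
print(req.get_method(), req.full_url)  # POST http://livy.example.com:8998/batches
```

Passing the request to `urllib.request.urlopen` would submit the job; the JSON response includes the batch `id`, whose progress can then be polled with `GET /batches/<id>/state`.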
3y ago
Everyone who is familiar with HDFS knows that it stores its metadata in fsimage files and that the latest changes are stored in edits files. Periodically, the edits files are merged into the main fsimage file. Fsimage and edits are binary files, so we cannot view their contents directly. However, HDFS offers built-in utilities that can…
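One such utility is the Offline Image Viewer: `hdfs oiv -p Delimited -i <fsimage file> -o fsimage.tsv` dumps an fsimage as tab-separated text with one row per inode (its counterpart for edits files is `hdfs oev`). As a hedged Python sketch, the function below scans such a dump for small files. It assumes the header row contains `Path` and `FileSize` columns and that directories report a size of 0; verify both against the output your Hadoop version actually produces. The threshold and sample rows are hypothetical.

```python
import csv


def small_files(lines, threshold_bytes):
    """Yield (path, size) for non-empty files smaller than threshold_bytes.

    `lines` is any iterable of text lines (e.g. an open file) holding
    tab-separated `hdfs oiv -p Delimited` output whose header row includes
    'Path' and 'FileSize'. Rows with size 0 (directories, empty files)
    are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        size = int(row["FileSize"])
        if 0 < size < threshold_bytes:
            yield row["Path"], size


# Hypothetical, abbreviated sample rows (real dumps have more columns).
sample = [
    "Path\tReplication\tFileSize\n",
    "/data\t0\t0\n",
    "/data/part-0001.csv\t3\t2048\n",
    "/data/big.parquet\t3\t536870912\n",
]
print(list(small_files(sample, 1024 * 1024)))  # [('/data/part-0001.csv', 2048)]
```

On a real dump you would pass `open("fsimage.tsv", newline="")` instead of the sample list.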
3y ago
I really hate negative posts. I try technologies hoping to find them useful, so that as many readers as possible can adopt them and benefit from them. However, sometimes a technology does not live up to expectations, at least in my tests. It can still be useful, as users will know what not to use, or…
3y ago
This post was ready long ago, but I had many doubts and second thoughts before I decided to publish it anyway. We had a problem with Cloudera Manager, and we could not find any ready-made solution on the web or in Cloudera’s official documentation. So we ended up doing some reverse engineering and changing Cloudera Manager’s…
3y ago
This is a short post, but it can save you some wandering and searching. Sometimes, when you try to find and fix issues with Cloudera Manager, you will want to increase the log level to DEBUG so you can see what’s wrong. The procedure cannot be found in the documentation (or at least cannot be…
3y ago
One of the most interesting new features in Hadoop 3 is erasure coding. (I found the reason for the name in this nice article that explains some of the theory: it is named after the binary erasure channel, a model of a communication channel in which transmitted bits can be lost, or "erased", and erasure codes are what recover the lost data.) It is based on a much older technology…
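The main attraction of erasure coding is storage efficiency. A Reed-Solomon layout with k data units and m parity units can rebuild the data from any k of its k + m units, so it tolerates m lost units at an overhead of m/k, whereas 3x replication tolerates 2 lost copies at 200% overhead. The Python arithmetic below compares the two, taking RS-6-3 (the layout behind Hadoop 3's default RS-6-3-1024k policy) as the example; the function names are only illustrative.

```python
def redundancy_profile(data_units, redundancy_units):
    """Return (overhead_percent, tolerated_losses) for a layout that stores
    data_units of data plus redundancy_units of replicas or parity."""
    overhead_percent = 100.0 * redundancy_units / data_units
    return overhead_percent, redundancy_units


def raw_bytes(logical_bytes, data_units, redundancy_units):
    """Raw disk space needed for logical_bytes; ignores partial-stripe effects on small files."""
    return logical_bytes * (data_units + redundancy_units) // data_units


GB = 10 ** 9
# 3x replication: 1 original + 2 extra copies.
print(redundancy_profile(1, 2), raw_bytes(6 * GB, 1, 2) // GB)  # (200.0, 2) 18
# Reed-Solomon RS-6-3: 6 data cells + 3 parity cells per stripe.
print(redundancy_profile(6, 3), raw_bytes(6 * GB, 6, 3) // GB)  # (50.0, 3) 9
```

Storing 6 GB thus needs 18 GB of raw disk with 3x replication but only 9 GB with RS-6-3, while surviving one more lost unit.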
3y ago
In my workplace, we use the Cloudera Manager API heavily. Our various programs and scripts query this API about 20k times per hour, on top of the actions done by users who log in to the Cloudera Manager user interface. All this put a heavy load on Cloudera Manager and made it slow. We tried increasing the Java heap size, changing the Java garbage…
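For scale, 20k requests per hour averages about 5.6 requests per second. The Python sketch below builds, without sending, one such authenticated call: Cloudera Manager's REST API lives under `/api/<version>/` on the Cloudera Manager server (port 7180 by default for HTTP) and accepts HTTP basic authentication. The host, API version, resource and credentials are hypothetical, since the excerpt does not say which endpoints are queried.

```python
import base64
import urllib.request


def cm_api_request(cm_base_url, api_version, resource, user, password):
    """Build (but do not send) a basic-auth GET for a Cloudera Manager API resource."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = f"{cm_base_url.rstrip('/')}/api/{api_version}/{resource.lstrip('/')}"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def requests_per_second(requests_per_hour):
    """Average request rate implied by an hourly request count."""
    return requests_per_hour / 3600


# Hypothetical host, API version and credentials, for illustration only.
req = cm_api_request("http://cm.example.com:7180", "v19", "clusters", "admin", "admin")
print(req.get_method(), req.full_url)        # GET http://cm.example.com:7180/api/v19/clusters
print(round(requests_per_second(20_000), 2))  # 5.56
```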