Rahul Patidar Blog
435 FOLLOWERS
I am a Senior Data Engineer with extensive experience building big data systems that power an analytics platform (batch and streaming). With expertise in implementing data pipelines, I am responsible for converting data into informational insights, helping the organization make data-driven decisions.
Rahul Patidar Blog
5M ago
Introduction:
Hello everyone, Spark programming questions are crucial for preparing for big data interviews. In today’s post, I’ll cover important Spark programming questions that can help you get ready for your big data interview.
Write a Spark program in Scala that takes a DataFrame with bill IDs and order names (as lists of items) and returns the count of each distinct item across all orders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("Order Count").getOrCreate()
import spark.implicits._

val data = Seq( (10 ..read more
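The snippet above is cut off, but the core counting logic — Spark's `explode` followed by `groupBy`/`count` — can be sketched with plain Scala collections, no cluster required. The bill IDs and item names below are made-up placeholders, not the post's actual data:

```scala
// Each tuple pairs a bill ID with the list of items ordered on that bill.
// (Sample data invented for illustration.)
val orders = Seq(
  (101, Seq("pizza", "coke")),
  (102, Seq("pizza", "fries")),
  (103, Seq("coke", "pizza"))
)

// Flatten every bill's item list (the collections analogue of Spark's
// explode), then count how often each distinct item occurs.
val itemCounts: Map[String, Int] =
  orders.flatMap { case (_, items) => items }
        .groupBy(identity)
        .map { case (item, occurrences) => item -> occurrences.size }

println(itemCounts)  // pizza -> 3, coke -> 2, fries -> 1
```

In Spark itself the same result comes from exploding the items column and then grouping and counting the exploded rows; the collections version just makes the logic easy to verify locally.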
Rahul Patidar Blog
1y ago
What Is Kafka?
Kafka is a powerful system for handling streaming data in real time. It provides highly reliable and scalable infrastructure that lets you process and analyze large amounts of data as it flows through the system. With Kafka, you can build applications that handle data ingestion, analytics, and other real-time data processing tasks effectively.
What Is KSQL?
KSQL is a tool that allows you to use SQL-like queries to process and analyze streaming data in real-time using Apache Kafka. It simplifies the development of real-time applications by enabling you to write S ..read more
Rahul Patidar Blog
1y ago
Upserts, Deletes And Incremental Processing on Big Data.
Introduction:
Managing large-scale data sets has become increasingly difficult for organizations in today’s digital age. As data volumes increase, traditional data management and processing methods become obsolete. This is where Apache Hudi enters the picture. Apache Hudi is a free and open-source data management framework that handles incremental data updates and manages large-scale data sets in real time.
What exactly is Apache Hudi?
The Apache Software Foundation created Apache Hudi, also known as Hadoop Upserts, Delete ..read more
Rahul Patidar Blog
2y ago
Hi Everyone,
In this article, we will cover the local installation of Hadoop components. On macOS, you need to install Homebrew (brew) first if it is not already there.
Installations we will cover:
1. brew
2. Java
3. Scala
4. Hadoop
5. Spark
6. MySQL
7. Hive
8. sbt
9. Kafka
1. Set up brew on Mac:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
2. Install Java
$ brew install java
The above command will download and install Java on the Mac. Now set the path for Java using the below command.
$ vi ~/.bash_profile
Enter the below lines in the file and save ..read more
Rahul Patidar Blog
2y ago
Everything You Need To Know About Scala
Everyone who is a beginner in Scala faces many problems and confusions while trying to understand its important concepts. Interviewers frequently ask about these concepts, and candidates are often unable to explain them convincingly.
So in today's post, I will cover some important concepts in Scala; read through them once before your next interview and they will really help you answer Scala-related interview questions.
This post will cover the followin ..read more
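The excerpt is truncated before the list of concepts. As one illustration of the kind of Scala concept interviewers commonly ask about (my own example, not necessarily one the post covers), here is how `lazy val` defers evaluation until first access, unlike a plain `val`:

```scala
// Counter lets us observe exactly when the lazy val's body runs.
var initialised = 0

// A plain val is evaluated immediately at definition time;
// a lazy val's body runs only on first access, then the result is cached.
lazy val expensive: Int = {
  initialised += 1   // side effect marks the moment of evaluation
  42
}

assert(initialised == 0)   // defined, but body has not run yet
val a = expensive          // first access triggers evaluation
val b = expensive          // cached result: body does not run again
assert(initialised == 1)
println(s"a = $a, b = $b, initialised = $initialised")
```

This pattern comes up in interviews because it explains how Scala delays expensive initialisation and why a `lazy val` that throws can fail long after its definition site.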
Rahul Patidar Blog
2y ago
Hi everyone, in this blog I’ve brought together a list of useful Linux commands, which will help you learn basic Unix commands.
So, I’ve categorised these commands into the following segments.
1. Time and date commands
2. Unix user commands
3. Text file operation commands
4. Unix directory management commands
5. Unix system status commands
6. Networking commands
7. Remote access commands
8. File transfer commands
Time and date commands:
date — shows the current date and time.
sleep — waits for a given number of seconds ..read more
Rahul Patidar Blog
2y ago
Spark Optimised Cluster Configurations: Let’s Accelerate Our Jobs Using the Power of Spark.
Introduction:
Spark optimisation is one of the most important concepts. It is the developer’s responsibility to design a Spark application in an optimised manner so we can take full advantage of the Spark execution engine. When applications are not optimised, simple code takes longer to execute, resulting in performance lags and downtime, and it affects the other applications using the same cluster. Spark is a crucial component in the operations of today’s enterprises. This is why it is crucial to ..read more
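The excerpt is cut off before any concrete numbers. A widely cited rule-of-thumb sizing calculation (a general guideline, not taken from this post) reserves one core and some memory per node for the OS and Hadoop daemons, and caps executor cores at 5 for good HDFS throughput. Sketched for a hypothetical cluster:

```scala
// Hypothetical cluster: 10 nodes, each with 16 cores and 64 GB RAM.
val nodes        = 10
val coresPerNode = 16
val memPerNodeGb = 64

// Leave 1 core and 1 GB per node for the OS and Hadoop daemons.
val usableCores = coresPerNode - 1   // 15
val usableMemGb = memPerNodeGb - 1   // 63

// Rule of thumb: at most 5 cores per executor for good HDFS throughput.
val coresPerExecutor = 5
val executorsPerNode = usableCores / coresPerExecutor    // 3
val totalExecutors   = nodes * executorsPerNode - 1      // minus 1 for the driver
val memPerExecutorGb = usableMemGb / executorsPerNode    // 21, before off-heap overhead

println(s"--num-executors $totalExecutors " +
        s"--executor-cores $coresPerExecutor " +
        s"--executor-memory ${memPerExecutorGb}g")
```

In practice a further slice (often around 7–10%) of each executor’s memory goes to off-heap overhead, so the final `--executor-memory` would be set a little lower than the raw division suggests.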
Rahul Patidar Blog
2y ago
A Comprehensive Guide On Apache Sqoop: Let’s Transfer Data From External Sources.
Introduction to Sqoop:
Sqoop is a data ingestion tool used to transfer data from RDBMS to HDFS and HDFS to RDBMS. So, depending on the business requirements, we can transfer data and use other Hadoop tools to process it further. In this guide, we will be discussing Apache Sqoop so that whenever you have the requirement to use external data sources, you can easily use sqoop and transfer data inside Hadoop for further processing.
Let’s try to understand how sqoop import and export work:
Sqoop ..read more
Rahul Patidar Blog
2y ago
This is a continuation of the Hive basics series (from basics to advanced) (Hive Basics). If you have not visited that blog, I suggest you kindly read my previous post first and then continue with this one to learn Hive for big data.
Whenever we design a big data solution and execute Hive queries on clusters, it is the developer’s responsibility to optimise those queries.
What is performance tuning in Hive?
Hive performance tuning refers to the collection of steps designed to improve Hive query performance. When queries are not optimised, simple statements take longer to execute, result ..read more