SreeRam Hadoop Notes
Read the blog to find useful Hadoop notes.
3y ago
hive> create table samp1(line string);
-- Here we did not select any database, so the table goes into Hive's default database, which is named "default".
The HDFS location of the default database is
/user/hive/warehouse
-- When you create a table in the default database, a directory with the table's name is created under the warehouse location.
So in HDFS, the directory
/user/hive/warehouse/samp1
is created.
hive> create database mydb;
When a database is created, a directory named after the database, with the extension ".db", is created under the warehouse location ..read more
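The naming rule above can be sketched in a few lines of Python. This is only an illustration of how the directory names are formed, assuming the default warehouse location and managed tables; the function name table_dir and the table name samp2 are hypothetical and are not part of Hive.

```python
import posixpath

# Default warehouse location named in the notes (assumed unchanged in the config).
WAREHOUSE = "/user/hive/warehouse"

def table_dir(table, database="default"):
    """Return the HDFS directory expected for a managed table."""
    if database == "default":
        # Tables of the default database sit directly under the warehouse.
        return posixpath.join(WAREHOUSE, table)
    # Any other database gets its own "<name>.db" directory first.
    return posixpath.join(WAREHOUSE, database + ".db", table)

print(table_dir("samp1"))          # table from the notes
print(table_dir("samp2", "mydb"))  # hypothetical table inside mydb
```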
SreeRam Hadoop Notes
3y ago
Pig class Links:
PigLab1 Video:
https://drive.google.com/file/d/0B6ZYkhJgGD6XTzVHbzBYUFY0a1k/view?usp=sharing
PigLab Notes1:
https://drive.google.com/file/d/0B6ZYkhJgGD6XeU9tUF9aS3QxUWc/view?usp=sharing
PigLab2 Video:
https://drive.google.com/file/d/0B6ZYkhJgGD6XNnhvZUN5eTJSaHM/view?usp=sharing
PigLab2 Notes:
https://drive.google.com/file/d/0B6ZYkhJgGD6Xd0ZHb1hWZVhjbmc/view?usp=sharing
PigLab3 Video:
https://drive.google.com/file/d/0B6ZYkhJgGD6XY3ZTWFFZZ3VMcnM/view?usp=sharing
PigLab3 Notes:
https://drive.google.com/file/d/0B6ZYkhJgGD6Xb1k1aklZOXdjaUE/view?usp=sharing
PigLab4 Video[its a ..read more
SreeRam Hadoop Notes
3y ago
[cloudera@quickstart ~]$ cat saleshistory
01/01/2011,2000
01/01/2011,3000
01/02/2011,5000
01/02/2011,4000
01/02/2011,1000
01/03/2011,2000
01/25/2011,3000
01/25/2011,5000
01/29/2011,4000
01/29/2011,1000
02/01/2011,2000
02/01/2011,3000
02/02/2011,8000
03/02/2011,9000
03/02/2011,3000
03/03/2011,5000
03/25/2011,7000
03/25/2011,2000
04/29/2011,5000
04/29/2011,3000
05/01/2011,2000
05/01/2011,3000
05/02/2011,5000
05/02/2011,4000
06/02/2011,1000
06/03/2011,2000
06/25/2011,3000
07/25/2011,5000
07/29/2011,4000
07/29/2011,1000
08/01/2011,2000
08/01/2011,3000
08/02/2011,5000
09/02/2011,4000
09/02/2011,10 ..read more
SreeRam Hadoop Notes
3y ago
We can keep multiple functions
in one program (.py).
transoform.py
-------------------------
from pig_util import outputSchema

# Capitalize the first letter and lowercase the rest.
# The schema must be passed to the decorator as a string.
@outputSchema("name:chararray")
def firstUpper(x):
    fc = x[0].upper()
    rc = x[1:].lower()
    n = fc+rc
    return n

# Expand the gender code: 'm' becomes 'Male', anything else 'Female'.
@outputSchema("sex:chararray")
def gender(x):
    if x=='m':
        x = 'Male'
    else:
        x = 'Female'
    return x

@outputSchema("dname:chararray")
def dept(dno):
    dname="Others"
    if dno==11:
  ..read more
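The two complete functions above can be tried outside Pig before registering the script. The sketch below is an assumption-laden stand-in: it defines its own outputSchema decorator (a hypothetical replacement for the one Pig supplies) that only records the schema string, so the file runs with a plain Python interpreter.

```python
def outputSchema(schema):
    # Stand-in for Pig's decorator (assumption): it only stores the schema
    # string on the function and returns the function unchanged.
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap

@outputSchema("name:chararray")
def firstUpper(x):
    # First character upper case, remaining characters lower case.
    return x[0].upper() + x[1:].lower()

@outputSchema("sex:chararray")
def gender(x):
    # 'm' maps to 'Male'; every other value maps to 'Female'.
    return 'Male' if x == 'm' else 'Female'

print(firstUpper("aAAA"))
print(gender("f"))
print(firstUpper.outputSchema)
```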
SreeRam Hadoop Notes
3y ago
name = input("Enter name ")
age = input("Enter age")
print(name, " is ", age, " years old ")
-----------------------------------
# if
a = 10
b = 25
if a>b:
    print(a , " is big")
else:
    print(b , " is big ")
-----------------------------
# nested if
a = 10
b = 20
c = 17
big = 0
if a>b:
    if a>c:
        big=a
    else:
        big=c
elif b>c:
    big=b
else:
    big=c
print("Biggest is ", big)
----------------------------------
# if and loop combination:
lst = [10,20,34,23,12,34,23,45]
big = lst[0]
for v in lst:
    if v>big ..read more
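A self-contained sketch of the if-and-loop combination follows. It assumes the goal of the loop is to find the largest value in the list, which is one plausible reading of the excerpt and not necessarily how the original post continues; the function name biggest is hypothetical.

```python
lst = [10, 20, 34, 23, 12, 34, 23, 45]

def biggest(values):
    """Return the largest element by comparing each value with the best so far."""
    big = values[0]          # start with the first element
    for v in values:
        if v > big:          # a larger value replaces the current candidate
            big = v
    return big

print("Biggest is ", biggest(lst))
```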
SreeRam Hadoop Notes
3y ago
steps:
1) Start the ZooKeeper server.
2) Start Kafka brokers [ one or more ].
3) Create a topic.
4) Start a console producer [ to write messages into the topic ].
5) Start a console consumer [ to test whether messages are streamed ].
6) Create a Spark streaming context
which streams from the Kafka topic.
7) Perform transformations or aggregations.
8) Output operation: directs the results into another Kafka topic.
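The data flow of steps 4 to 8 can be pictured with a toy model in plain Python. This is not Kafka or Spark code: the two deques are in-memory stand-ins for the input and output topics, run_batch stands in for one micro-batch of the streaming job, and the word count is an assumed example of an aggregation. All names here are hypothetical.

```python
from collections import Counter, deque

input_topic = deque()   # stand-in for the topic the console producer writes to
output_topic = deque()  # stand-in for the second topic that receives results

def produce(topic, messages):
    """Append messages to a topic, as a producer would."""
    topic.extend(messages)

def run_batch(source, sink):
    """Drain the source, count words in the batch, publish the counts to the sink."""
    batch = []
    while source:
        batch.append(source.popleft())
    counts = Counter(word for line in batch for word in line.split())
    for word, n in sorted(counts.items()):
        sink.append((word, n))
    return counts

produce(input_topic, ["hadoop spark", "spark kafka spark"])
run_batch(input_topic, output_topic)
print(list(output_topic))
```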
------------------------------------------
following code tested with ,
  ..read more
SreeRam Hadoop Notes
3y ago
Pig UDFs
----------
UDF ---> user defined function.
Advantages:
i) custom functionality.
ii) reusability.
Pig UDFs can be developed in
java
python (Jython)
ruby (JRuby)
javascript
groovy
step1:
Develop the UDF code.
step2:
Export it into a jar file.
ex: /home/cloudera/Desktop/pigs.jar
step3:
Register the jar file in Pig.
grunt> register Desktop/pigs.jar
step4:
create t ..read more
SreeRam Hadoop Notes
3y ago
Cross:
-----
Used for Cartesian product.
Each element of the left set is joined with each element of the right set.
ds1 --> (a)
        (b)
        (c)
ds2 --> (1)
        (2)
x = cross ds1, ds2;
(a,1)
(a,2)
(b,1)
(b,2)
(c,1)
(c,2)
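The same pairing can be reproduced in plain Python with itertools.product, as a quick way to see what cross generates. This is only an illustration of the Cartesian product, not Pig code; note that Pig itself does not promise any particular order of the output tuples.

```python
from itertools import product

ds1 = ["a", "b", "c"]
ds2 = [1, 2]

# Every element of ds1 is paired with every element of ds2.
x = list(product(ds1, ds2))

for pair in x:
    print(pair)
```

The result has len(ds1) * len(ds2) tuples, which is why cross gets expensive on large relations.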
emp = load 'piglab/emp' using PigStorage(',')
as (id:int, name:chararray, sal:int,
sex:chararray, dno:int);
task:
f ..read more
SreeRam Hadoop Notes
3y ago
order :-
to sort data (tuples) in ascending or descending order.
emp = load 'piglab/emp'
using PigStorage(',')
as (id:int, name:chararray,
sal:int, sex:chararray, dno:int);
e1 = order emp by name;
e2 = order emp by sal desc;
e3 = order emp by sal desc, sex, dno desc;
---------------------------------------
sql:
select * from emp order by sal desc limit 3;
e = order emp by sal desc;
top3 = limit e 3;
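The order-then-limit pattern can be checked on a small sample in plain Python. The sketch below uses a few rows in the (id, name, sal, sex, dno) layout of the emp relation; the rows are a hand-picked subset of the spLab/e listing that appears in these notes, and the variable names simply mirror the Pig aliases.

```python
# Sample rows in the emp layout: (id, name, sal, sex, dno).
emp = [
    (101, "aaaa", 40000, "m", 11),
    (102, "bbbbbb", 50000, "f", 12),
    (104, "dd", 90000, "f", 13),
    (105, "ee", 10000, "m", 12),
    (107, "sdkfj", 80000, "f", 13),
]

# e = order emp by sal desc;
e = sorted(emp, key=lambda row: row[2], reverse=True)

# top3 = limit e 3;
top3 = e[:3]

for row in top3:
    print(row)
```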
limitation:
101,aaa,30000,.....
102,bbb,90000 ..read more
SreeRam Hadoop Notes
3y ago
[cloudera@quickstart ~]$ hadoop fs -cat spLab/e
101,aaaa,40000,m,11
102,bbbbbb,50000,f,12
103,cccc,50000,m,12
104,dd,90000,f,13
105,ee,10000,m,12
106,dkd,40000,m,12
107,sdkfj,80000,f,13
108,iiii,50000,m,11
109,jj,10000,m,14
110,kkk,20000,f,15
111,dddd,30000,m,15
[cloudera@quickstart ~]$ hadoop fs -cat spLab/d
11,marketing,hyd
12,hr,del
13,fin,del
21,admin,hyd
22,production,del
[cloudera@quickstart ~]$
$ cat > joins.pig
emp = load 'spLab/e' using PigStorage(',')
as (id:int, name:chararray, sal:int,
sex:chararray, dno:int);
dept = load 's ..read more