SreeRam Hadoop Notes on Feedspot

Hive(10AmTo1:00Pm) Lab1 notes : Hive Inner and External Tables

SreeRam Hadoop Notes

3y ago

hive> create table samp1(line string);
-- Here we did not select any database. The default database in Hive is "default".
-- The HDFS location of the default database is /user/hive/warehouse.
-- When you create a table in the default database, a directory with the table name is created under the warehouse location.
-- In HDFS, the directory /user/hive/warehouse/samp1 is created.

hive> create database mydb;
-- When a database is created, a directory with the database name and the extension ".db" is created in the warehouse location ...
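
The directory rules above can be sketched in a few lines of Python. This is only an illustration, not a Hive API: the helper name `table_dir` is hypothetical, and placing a table of a non-default database under `<database>.db/<table>` is an assumption, since the excerpt is cut off before it reaches that point.

```python
import posixpath

# Warehouse location of the default database, as stated in the notes.
WAREHOUSE_ROOT = "/user/hive/warehouse"

def table_dir(table, database="default"):
    """Return the HDFS directory expected for a table (hypothetical helper)."""
    if database == "default":
        # Tables of the default database sit directly under the warehouse.
        return posixpath.join(WAREHOUSE_ROOT, table)
    # Assumption: other databases get a "<database>.db" directory,
    # and their tables are created inside it.
    return posixpath.join(WAREHOUSE_ROOT, database + ".db", table)

print(table_dir("samp1"))
print(table_dir("samp1", "mydb"))
```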

Pig Video Lessons

3y ago

Pig class Links:
PigLab1 Video: https://drive.google.com/file/d/0B6ZYkhJgGD6XTzVHbzBYUFY0a1k/view?usp=sharing
PigLab Notes1: https://drive.google.com/file/d/0B6ZYkhJgGD6XeU9tUF9aS3QxUWc/view?usp=sharing
PigLab2 Video: https://drive.google.com/file/d/0B6ZYkhJgGD6XNnhvZUN5eTJSaHM/view?usp=sharing
PigLab2 Notes: https://drive.google.com/file/d/0B6ZYkhJgGD6Xd0ZHb1hWZVhjbmc/view?usp=sharing
PigLab3 Video: https://drive.google.com/file/d/0B6ZYkhJgGD6XY3ZTWFFZZ3VMcnM/view?usp=sharing
PigLab3 Notes: https://drive.google.com/file/d/0B6ZYkhJgGD6Xb1k1aklZOXdjaUE/view?usp=sharing
PigLab4 Video [its a ...

Hive Partitioned tables [case study]

3y ago

[cloudera@quickstart ~]$ cat saleshistory
01/01/2011,2000
01/01/2011,3000
01/02/2011,5000
01/02/2011,4000
01/02/2011,1000
01/03/2011,2000
01/25/2011,3000
01/25/2011,5000
01/29/2011,4000
01/29/2011,1000
02/01/2011,2000
02/01/2011,3000
02/02/2011,8000
03/02/2011,9000
03/02/2011,3000
03/03/2011,5000
03/25/2011,7000
03/25/2011,2000
04/29/2011,5000
04/29/2011,3000
05/01/2011,2000
05/01/2011,3000
05/02/2011,5000
05/02/2011,4000
06/02/2011,1000
06/03/2011,2000
06/25/2011,3000
07/25/2011,5000
07/29/2011,4000
07/29/2011,1000
08/01/2011,2000
08/01/2011,3000
08/02/2011,5000
09/02/2011,4000
09/02/2011,10 ...
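
To get a feel for how records like these could be split into partitions, here is a rough plain-Python sketch. The excerpt ends before the partition columns are shown, so partitioning by year and month, the `y=`/`m=` directory names, and the helper `partition_sales` are all assumptions made for illustration. The sample rows are a few of the records listed above, read as mm/dd/yyyy.

```python
from collections import defaultdict
from datetime import datetime

# A few rows from the saleshistory file (date in mm/dd/yyyy, amount).
sample = """01/01/2011,2000
01/01/2011,3000
01/02/2011,5000
02/01/2011,2000
02/02/2011,8000
03/02/2011,9000"""

def partition_sales(text):
    """Group amounts by an assumed year/month partition key."""
    parts = defaultdict(list)
    for line in text.splitlines():
        date_str, amount = line.split(",")
        d = datetime.strptime(date_str, "%m/%d/%Y")
        # Hypothetical partition directory name, in the key=value style.
        parts[f"y={d.year}/m={d.month}"].append(int(amount))
    return dict(parts)

partitions = partition_sales(sample)
for path, amounts in sorted(partitions.items()):
    print(path, sum(amounts))
```

Each key plays the role of one partition directory, so a query restricted to a single month would only need to read that group.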

Pig : Udfs using Python

3y ago

We can keep multiple functions in one program (.py).

transform.py
-------------------------
from pig_util import outputSchema

# The schema passed to outputSchema must be a quoted string.
@outputSchema("name:chararray")
def firstUpper(x):
    fc = x[0].upper()
    rc = x[1:].lower()
    n = fc + rc
    return n

@outputSchema("sex:chararray")
def gender(x):
    if x == 'm':
        x = 'Male'
    else:
        x = 'Female'
    return x

@outputSchema("dname:chararray")
def dept(dno):
    dname = "Others"
    if dno == 11: ...
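
Before registering such a file with Pig, the functions can be tried in a plain Python interpreter. The sketch below replaces Pig's `outputSchema` with a small stand-in decorator that only records the schema string. That stand-in is an assumption for local testing, not Pig's implementation, and the truncated `dept` function is left out.

```python
def outputSchema(schema):
    """Stand-in for Pig's decorator: it only attaches the schema string."""
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap

@outputSchema("name:chararray")
def firstUpper(x):
    # First character upper case, the rest lower case.
    return x[0].upper() + x[1:].lower()

@outputSchema("sex:chararray")
def gender(x):
    # Anything other than 'm' is reported as 'Female', as in the notes.
    return 'Male' if x == 'm' else 'Female'

print(firstUpper("rAVI"), gender("m"), gender("f"))
```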

Python Examples 1

3y ago

name = input("Enter name ")
age = input("Enter age")
print(name, " is ", age, " years old ")
-----------------------------------
# if
a = 10
b = 25
if a>b:
    print(a , " is big")
else:
    print(b , " is big ")
-----------------------------
# nested if
a = 10
b = 20
c = 17
big = 0
if a>b:
    if a>c:
        big=a
    else:
        big=c
elif b>c:
    big=b
else:
    big=c
print("Biggest is ", big)
----------------------------------
# if and loop combination:
lst = [10,20,34,23,12,34,23,45]
big = lst[0]
for v in lst:
    if v>big ...
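
The last example is cut off mid-loop. A minimal sketch of how such an "if and loop combination" usually finishes is given below. The body of the `if` and the wrapper function `biggest` are assumptions, not the author's text.

```python
lst = [10, 20, 34, 23, 12, 34, 23, 45]

def biggest(values):
    """Return the largest element by scanning the list once."""
    big = values[0]          # start with the first element
    for v in values:
        if v > big:          # assumed continuation: remember the larger value
            big = v
    return big

print("Biggest is", biggest(lst))
```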

Spark : Spark streaming and Kafka Integration

3y ago

Steps:
1) Start the ZooKeeper server.
2) Start Kafka brokers [one or more].
3) Create a topic.
4) Start a console producer [to write messages into the topic].
5) Start a console consumer [to test whether messages are streamed].
6) Create a Spark streaming context, which streams from the Kafka topic.
7) Perform transformations or aggregations.
8) Output operation: direct the results into another Kafka topic.
------------------------------------------
The following code was tested with ...
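
Steps 6 to 8 can be pictured with a plain-Python simulation. This is not Spark or Kafka code: the lists standing in for the input and output topics, the sample messages, and the helpers `micro_batches` and `word_counts` are all hypothetical. It only shows the shape of the pipeline, where messages are read in small batches, aggregated, and written to another destination.

```python
from collections import Counter

def micro_batches(messages, batch_size):
    """Yield consecutive slices, mimicking streaming micro-batches."""
    for i in range(0, len(messages), batch_size):
        yield messages[i:i + batch_size]

def word_counts(batch):
    """Aggregation step: count the words in one batch of messages."""
    return Counter(word for line in batch for word in line.split())

# Hypothetical messages standing in for the input topic.
input_topic = ["spark kafka", "kafka spark spark", "hive pig", "pig pig"]
output_topic = []  # stands in for the second topic that receives results

for batch in micro_batches(input_topic, batch_size=2):
    output_topic.append(dict(word_counts(batch)))

print(output_topic)
```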

Pig : UDFs

3y ago

Pig UDFs
----------
UDF ---> user defined functions.

Advantages:
i) custom functionality.
ii) reusability.

Pig UDFs can be developed in: Java, Python, Ruby, C++, JavaScript, Perl.

step1: Develop the UDF code.
step2: Export it into a jar file.
  ex: /home/cloudera/Desktop/pigs.jar
step3: Register the jar file in Pig.
  grunt> register Desktop/pigs.jar
step4: create t ...

Pig : Cross Operator for Cartesian Product

3y ago

Cross:
-----
Used for the Cartesian product: each element of the left set is paired with each element of the right set.

ds1 --> (a) (b) (c)
ds2 --> (1) (2)

x = cross ds1, ds2;

(a,1)
(a,2)
(b,1)
(b,2)
(c,1)
(c,2)

emp = load 'piglab/emp' using PigStorage(',') as (id:int, name:chararray, sal:int, sex:chararray, dno:int);

task: f ...
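
The same pairing can be reproduced in plain Python with `itertools.product`, which may help when checking what a cross should return. The two small datasets are the ones from the example. Using Python lists in place of Pig relations is an illustration only.

```python
from itertools import product

ds1 = ["a", "b", "c"]   # left set
ds2 = [1, 2]            # right set

# Every element of ds1 paired with every element of ds2.
x = list(product(ds1, ds2))
print(x)
print(len(x) == len(ds1) * len(ds2))
```

The result size is always the product of the two input sizes, which is why a cross on large relations gets expensive quickly.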

Pig : Order [ Sorting ] , exec, run , pig

3y ago

order :- sorts data (tuples) in ascending or descending order.

emp = load 'piglab/emp' using PigStorage(',') as (id:int, name:chararray, sal:int, sex:chararray, dno:int);

e1 = order emp by name;
e2 = order emp by sal desc;
e3 = order emp by sal desc, sex, dno desc;
---------------------------------------
sql: select * from emp order by sal desc limit 3;

e = order emp by sal desc;
top3 = limit e 3;

limitation:
101,aaa,30000,.....
102,bbb,90000 ...
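
For comparison, the same orderings can be written with Python's `sorted`. This is a sketch on a handful of rows copied from the spLab/e listing in the Pig : Joins excerpt on this page. Tuples stand in for Pig relations, and descending numeric keys are expressed by negating the value.

```python
from operator import itemgetter

# Rows (id, name, sal, sex, dno) taken from the spLab/e listing.
emp = [
    (101, "aaaa", 40000, "m", 11),
    (102, "bbbbbb", 50000, "f", 12),
    (103, "cccc", 50000, "m", 12),
    (104, "dd", 90000, "f", 13),
    (105, "ee", 10000, "m", 12),
]

e1 = sorted(emp, key=itemgetter(1))                   # order by name
e2 = sorted(emp, key=lambda r: -r[2])                 # order by sal desc
e3 = sorted(emp, key=lambda r: (-r[2], r[3], -r[4]))  # sal desc, sex, dno desc
top3 = e2[:3]                                         # limit 3

print([r[0] for r in top3])
```

Rows 102 and 103 share the same salary, so a top-3 cut depends on how ties are ordered. Python's sort is stable, which fixes the order here; adding a tie-breaking column is the portable way to make the result predictable.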

Pig : Joins

3y ago

[cloudera@quickstart ~]$ hadoop fs -cat spLab/e
101,aaaa,40000,m,11
102,bbbbbb,50000,f,12
103,cccc,50000,m,12
104,dd,90000,f,13
105,ee,10000,m,12
106,dkd,40000,m,12
107,sdkfj,80000,f,13
108,iiii,50000,m,11
109,jj,10000,m,14
110,kkk,20000,f,15
111,dddd,30000,m,15

[cloudera@quickstart ~]$ hadoop fs -cat spLab/d
11,marketing,hyd
12,hr,del
13,fin,del
21,admin,hyd
22,production,del

[cloudera@quickstart ~]$ cat > joins.pig
emp = load 'spLab/e' using PigStorage(',') as (id:int, name:chararray, sal:int, sex:chararray, dno:int);
dept = load 's ...
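
The joins.pig script is cut off, so which joins it runs is not visible here. As a rough cross-check of what an inner join and a left outer join on dno would return for these two files, here is a plain-Python sketch. The dictionary lookup and the `None` padding are illustration choices, not Pig's implementation.

```python
# Rows copied from spLab/e: (id, name, sal, sex, dno)
emp = [
    (101, "aaaa", 40000, "m", 11), (102, "bbbbbb", 50000, "f", 12),
    (103, "cccc", 50000, "m", 12), (104, "dd", 90000, "f", 13),
    (105, "ee", 10000, "m", 12), (106, "dkd", 40000, "m", 12),
    (107, "sdkfj", 80000, "f", 13), (108, "iiii", 50000, "m", 11),
    (109, "jj", 10000, "m", 14), (110, "kkk", 20000, "f", 15),
    (111, "dddd", 30000, "m", 15),
]
# Rows copied from spLab/d: (dno, dname, dloc)
dept = [
    (11, "marketing", "hyd"), (12, "hr", "del"), (13, "fin", "del"),
    (21, "admin", "hyd"), (22, "production", "del"),
]

dept_by_no = {d[0]: d for d in dept}

# Inner join: keep only employees whose dno exists in dept.
inner = [e + dept_by_no[e[4]] for e in emp if e[4] in dept_by_no]

# Left outer join: keep every employee, pad missing departments with None.
left_outer = [e + dept_by_no.get(e[4], (None, None, None)) for e in emp]

print(len(inner), len(left_outer))
```

Employees in departments 14 and 15 have no match in the department file, and departments 21 and 22 have no employees, which makes this data a handy test for outer joins.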
