Databricks Community Forum » Hadoop
A community forum to discuss working with Databricks Cloud and Spark.
2y ago
Hi Team,
I am facing this issue: "java.io.IOException: While processing file s3://test/abc/request_dt=2021-07-28/someParquetFile. [XYZ] BINARY is not in the store"
The steps I ran before hitting the exception:
1. ALTER TABLE tableName1 ADD COLUMNS (xyz string);
2. ALTER TABLE tableName1 RECOVER PARTITIONS; -- Recovered the new partition request_dt=2021-07-28
3. INSERT OVERWRITE TABLE tableName2
   PARTITION (request_dt) SELECT old_column_names, xyz FROM tableName1
The third command raises the exception.
Any suggestions would be of great help.
Thanks in advance.
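A common cause of this error is that parquet files written before the ALTER TABLE simply do not contain the new xyz column, so a reader that expects it in every file fails on the old partitions; a schema-merging reader instead treats the column as nullable. A pure-Python sketch of that schema-merge behaviour (the rows and column names here are made up for illustration, not the actual Hive/Parquet reader):

```python
def read_with_schema(rows, schema):
    """Project each row onto the table schema, filling columns
    absent from older files with None (nullable semantics)."""
    return [{col: row.get(col) for col in schema} for row in rows]

old_partition = [{"id": 1, "name": "a"}]              # written before ALTER TABLE
new_partition = [{"id": 2, "name": "b", "xyz": "v"}]  # written after
merged = read_with_schema(old_partition + new_partition, ["id", "name", "xyz"])
```

In Spark the analogous switch is `spark.read.option("mergeSchema", "true").parquet(path)`; with Hive tables, rewriting the affected partition after the schema change is another common workaround.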
3y ago
Hi, I am installing Hadoop 3.2.2 on Ubuntu 18.04 for the first time. At the end of the installation, when I run 'hdfs namenode -format', it shows: ERROR: Invalid HADOOP_COMMON_HOME
So I am wondering whether this is a common error and how to correct it. Please let me know.
Thanks in advance.
3y ago
We are considering cancelling our Cloudera support contract because of budget constraints. We currently run Hadoop v3.1.5 and HBase v2.6 from an HDP distribution.
When we installed the HDP cluster, we installed all of its components via the HDP repos (VDF file), and Ambari installed everything.
Is there a VDF file we can use to pull the repos for the Apache (open-source) versions, so Ambari can install the cluster? …
3y ago
I need to install the bigsql service on a CDH 6.3.2 Hadoop cluster.
Please help me find the correct installer for it.
Thanks in advance.
3y ago
Can someone suggest best practices for migrating from Hadoop to Databricks on AWS? …
3y ago
I'm having a problem with hadoop-client version 3.2.1 in my dependency tree. It pulls in a vulnerable jar: org.apache.hadoop : hadoop-mapreduce-client-core : 3.2.1. The vulnerability is CVE-2017-3166: if a file in an encryption zone with access permissions that make it world readable is localized via YARN's localization mechanism, that file will be stored in a world-readable location and can be shared freely with any application that requests to localize it. The problem is that if I update to hadoop-client version 3.3.0, the vulnerability remains. Does anybody …
3y ago
I have installed Hadoop on my system, but when I try to run the Hadoop services using start-all.sh in the terminal, I get this issue:
zsh: command not found: start-all.sh
3y ago
Hi Team,
Is there any way in Hadoop or Spark, or any other component, to enable dynamic querying on big data? For example, I have 3 TB of data in HDFS. I want to build an application that lets users choose their own filters or predicates, build a query, and get the result in real time or near real time.
Example in detail:
I have employee data of size 3 TB in HDFS. I have created Hive external partitioned tables on top of it, pointing to the HDFS files. The goal is to let users choose the required data by filtering, selecting, or ordering the required columns. …
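For a Hive-tables-over-HDFS setup like this, the usual pattern is an interactive SQL engine (Hive LLAP, Presto/Trino, or the Spark Thrift Server) behind an application layer that turns the user's choices into a query. A minimal sketch of that query-building step (table and column names are hypothetical; real code must whitelist identifiers, as below, to avoid SQL injection):

```python
def build_query(table, columns, filters, order_by=None, allowed=None):
    """Build a parameterised SELECT from user-chosen columns and filters.
    `allowed` is the whitelist of legal identifiers for this table."""
    allowed = allowed or set(columns) | set(filters)
    for ident in list(columns) + list(filters) + ([order_by] if order_by else []):
        if ident not in allowed:
            raise ValueError(f"unknown column: {ident}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in filters)
        params = list(filters.values())
    if order_by:
        sql += f" ORDER BY {order_by}"
    return sql, params
```

The generated SQL and parameter list would then be handed to the engine's JDBC/ODBC driver; partitioning the Hive tables on the most common filter columns is what keeps such ad-hoc queries near real time.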
3y ago
I am fairly new to this process. I am trying to run a simple MapReduce job using Python 3.8 with a CSV on a local Hadoop cluster (Hadoop version 3.2.1). I am currently running it on Windows 10 (64-bit). The aim is to process a CSV file and output the top 10 salaries from it, but it does not work.
When I enter this command:
$ python test2.py hdfs:///sample/salary.csv -r hadoop --hadoop-streaming-jar %HADOOP_HOME%/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar
the output reports an error:
No configs foun…
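The `-r hadoop` flag suggests this is an mrjob job, and the truncated `No configs foun…` message points at mrjob failing to locate a runner configuration rather than at the job logic itself. The top-10 logic is easy to verify locally before involving Hadoop at all; a stdlib sketch (the `salary` column name is an assumption about the CSV):

```python
import csv
import heapq
import io

def top_salaries(csv_text, n=10):
    """Return the n largest salaries from CSV text with a `salary` column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return heapq.nlargest(n, (float(r["salary"]) for r in rows))

sample = "name,salary\na,100\nb,300\nc,200\n"
```

Once this works locally, the same logic can be moved into the mrjob mapper/reducer and the runner configuration debugged separately.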
3y ago
New to PySpark. My source parquet file has everything as string. My destination parquet file needs these converted to different datatypes like int, string, date, etc. How do I do this? …
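In PySpark the standard route is casting per column, e.g. `df.withColumn("age", df["age"].cast("int"))`, or supplying an explicit schema when reading. A plain-Python sketch of the same per-column casting idea (the column names, type names, and date format here are assumptions for illustration):

```python
from datetime import datetime

# Map a type name to a cast function; strings pass through unchanged.
CASTS = {
    "int": int,
    "string": str,
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}

def cast_row(row, schema):
    """Cast each string value in `row` to the type named in `schema`."""
    return {col: CASTS[typ](row[col]) for col, typ in schema.items()}

row = {"age": "41", "name": "a", "hired": "2021-07-28"}
casted = cast_row(row, {"age": "int", "name": "string", "hired": "date"})
```

After the casts, writing the result back out as parquet preserves the new datatypes in the destination file.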