Databricks Community Forum » Hadoop
A community forum to discuss working with Databricks Cloud and Spark.
2y ago
Hi Team,
I am facing this issue: "java.io.IOException: While processing file s3://test/abc/request_dt=2021-07-28/someParquetFile. [XYZ] BINARY is not in the store"
The steps I ran before hitting the exception:
1. ALTER TABLE tableName1 ADD COLUMNS (xyz string);
2. ALTER TABLE tableName1 RECOVER PARTITIONS; -- Recovered the new partition request_dt=2021-07-28
3. INSERT OVERWRITE TABLE tableName2
   PARTITION (request_dt) SELECT old_column_names, xyz FROM tableName1
The third command raises the exception.
Any suggestions would be of great help.
Thanks in advance.
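A common cause of this error is that parquet files written before the ALTER TABLE simply do not contain the new xyz column, so a reader that expects it in every file fails on the old partitions; a schema-merging reader instead treats the column as nullable. A pure-Python sketch of that schema-merge behaviour (the rows and column names here are made up for illustration, not the actual Hive/Parquet reader):

```python
def read_with_schema(rows, schema):
    """Project each row onto the table schema, filling columns
    absent from older files with None (nullable semantics)."""
    return [{col: row.get(col) for col in schema} for row in rows]

old_partition = [{"id": 1, "name": "a"}]              # written before ALTER TABLE
new_partition = [{"id": 2, "name": "b", "xyz": "v"}]  # written after
merged = read_with_schema(old_partition + new_partition, ["id", "name", "xyz"])
```

In Spark the analogous switch is `spark.read.option("mergeSchema", "true").parquet(path)`; with Hive tables, rewriting the affected partition after the schema change is another common workaround.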
3y ago
Hi, I am installing Hadoop 3.2.2 on Ubuntu 18.04 for the first time. At the end of the installation, when I run 'hdfs namenode -format', it shows: ERROR: Invalid HADOOP_COMMON_HOME
So I am wondering whether this is a common error and how to correct it. Please let me know.
Thanks in advance.
3y ago
We are considering cancelling our Cloudera support contract because of budget constraints. We currently run Hadoop v3.1.5 and HBase v2.6 from an HDP distribution.
When we installed the HDP cluster, we installed all of its components via the HDP repos (VDF file), and Ambari installed everything.
Is there a VDF file we can use to pull the repos for the Apache (open-source) versions, so Ambari can install the cluster? …
3y ago
I need to install the bigsql service on a CDH 6.3.2 Hadoop cluster.
Please help me find the correct installer for it.
Thanks in advance.
3y ago
Can someone suggest best practices for migrating from Hadoop to Databricks on AWS? …
3y ago
I'm having a problem with hadoop-client version 3.2.1 in my dependency tree. It pulls in a vulnerable jar: org.apache.hadoop : hadoop-mapreduce-client-core : 3.2.1. The vulnerability is CVE-2017-3166: if a file in an encryption zone with access permissions that make it world readable is localized via YARN's localization mechanism, that file will be stored in a world-readable location and can be shared freely with any application that requests to localize it. The problem is that if I update to hadoop-client version 3.3.0, the vulnerability remains. Does anybody …
3y ago
I have installed Hadoop on my system, but when I try to run the Hadoop services using start-all.sh in the terminal, I get this issue:
zsh: command not found: start-all.sh
3y ago
Hi Team,
Is there any way in Hadoop or Spark, or any other component, to enable dynamic querying on big data? For example, I have 3 TB of data in HDFS. I want to build an application that lets users choose their own filters or predicates, build a query, and get the result in real time or near real time.
Example in detail:
I have employee data of size 3 TB in HDFS. I have created Hive external partitioned tables on top of it, pointing to the HDFS files. The goal is to let users choose the required data by filtering, selecting, or ordering the required columns. …
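For a Hive-tables-over-HDFS setup like this, the usual pattern is an interactive SQL engine (Hive LLAP, Presto/Trino, or the Spark Thrift Server) behind an application layer that turns the user's choices into a query. A minimal sketch of that query-building step (table and column names are hypothetical; real code must whitelist identifiers, as below, to avoid SQL injection):

```python
def build_query(table, columns, filters, order_by=None, allowed=None):
    """Build a parameterised SELECT from user-chosen columns and filters.
    `allowed` is the whitelist of legal identifiers for this table."""
    allowed = allowed or set(columns) | set(filters)
    for ident in list(columns) + list(filters) + ([order_by] if order_by else []):
        if ident not in allowed:
            raise ValueError(f"unknown column: {ident}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in filters)
        params = list(filters.values())
    if order_by:
        sql += f" ORDER BY {order_by}"
    return sql, params
```

The generated SQL and parameter list would then be handed to the engine's JDBC/ODBC driver; partitioning the Hive tables on the most common filter columns is what keeps such ad-hoc queries near real time.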
3y ago
I am fairly new to this process. I am trying to run a simple MapReduce job using Python 3.8 with a CSV on a local Hadoop cluster (Hadoop version 3.2.1). I am currently running it on Windows 10 (64-bit). The aim is to process a CSV file and output the top 10 salaries from it, but it does not work.
When I enter this command:
$ python test2.py hdfs:///sample/salary.csv -r hadoop --hadoop-streaming-jar %HADOOP_HOME%/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar
the output reports an error:
No configs foun…
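The `-r hadoop` flag suggests this is an mrjob job, and the truncated `No configs foun…` message points at mrjob failing to locate a runner configuration rather than at the job logic itself. The top-10 logic is easy to verify locally before involving Hadoop at all; a stdlib sketch (the `salary` column name is an assumption about the CSV):

```python
import csv
import heapq
import io

def top_salaries(csv_text, n=10):
    """Return the n largest salaries from CSV text with a `salary` column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return heapq.nlargest(n, (float(r["salary"]) for r in rows))

sample = "name,salary\na,100\nb,300\nc,200\n"
```

Once this works locally, the same logic can be moved into the mrjob mapper/reducer and the runner configuration debugged separately.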
3y ago
New to PySpark. My source parquet file has everything as string. My destination parquet file needs these converted to different datatypes like int, string, date, etc. How do I do this? …
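In PySpark the standard route is casting per column, e.g. `df.withColumn("age", df["age"].cast("int"))`, or supplying an explicit schema when reading. A plain-Python sketch of the same per-column casting idea (the column names, type names, and date format here are assumptions for illustration):

```python
from datetime import datetime

# Map a type name to a cast function; strings pass through unchanged.
CASTS = {
    "int": int,
    "string": str,
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}

def cast_row(row, schema):
    """Cast each string value in `row` to the type named in `schema`."""
    return {col: CASTS[typ](row[col]) for col, typ in schema.items()}

row = {"age": "41", "name": "a", "hired": "2021-07-28"}
casted = cast_row(row, {"age": "int", "name": "string", "hired": "date"})
```

After the casts, writing the result back out as parquet preserves the new datatypes in the destination file.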