Qubole on Feedspot

Apache Sqoop 1.4.7 – 9 reasons why you need it

Qubole

by Shefali Aggarwal

2y ago

Apache Sqoop 1.4.7, the sixth release of Apache Sqoop, is out! This is one of the most significant updates to the Sqoop platform. We give you 9 reasons why you need Apache Sqoop 1.4.7, including the enhanced Sqoop on the Qubole Data Platform, which has additional features that help you run Extract-Transform-Load (ETL) pipelines more efficiently and connect securely with data warehouses like Google BigQuery, Amazon Redshift, or Snowflake.

What is Apache Sqoop?

Apache Sqoop is designed to efficiently transfer enormous volumes of data between Apache Hadoop and structured datastores such as relational datab ..read more

Qubole and Google Join Forces to Deliver Unified User Experience for Apache Spark and Hadoop

Qubole

by Victoria Nava

2y ago

I’m very excited to announce our expanded partnership with Google Cloud Platform (GCP). We have joined forces to offer an enterprise self-service data platform powered by optimized versions of Apache Spark and Hadoop, with unified tools for data science and data engineering running on GCP.

Why Now?

My co-founder Joydeep and I have always felt strongly that the future of big data is on the cloud. As a result, we created a platform with the flexibility to use the technologies and frameworks that best fit your environment today as well as how that environment will look tomorrow. In recent years w ..read more

A Technical Overview of Qubole Data Platform on Google Cloud

Qubole

by Shefali Aggarwal

2y ago

Today at Google Cloud Next 2019, we announced the launch of Qubole Data Platform on Google Cloud — an easy, collaborative, enterprise service for advanced analytics, machine learning, and AI built on a modern architecture leveraging Kubernetes. From its inception, we have closely partnered with Google engineering and product teams to build and launch this service on Google Cloud. Qubole Data Platform on GCP offers data science and data engineering teams a rich and unified experience with built-in notebooks, dashboards, and an integrated workbench to execute any command, all available right wit ..read more

How to Increase Your Big Data Value with Apache Spark on Qubole

Qubole

by Shefali Aggarwal

2y ago

If big data frameworks had a popularity contest, Apache Spark would be the attractive, trendy option everyone wants to be seen with. First developed at the AMPLab at UC Berkeley, Spark has become a widely adopted framework among organizations with lofty visions for Machine Learning (ML), sophisticated processing, and advanced analytics, among other big data projects. Spark’s vibrant community of more than 1,700 contributors prioritizes agility, flexibility, and scalability through the 300 to 400 code commits deployed per month. And data professionals confirm the framework’s rising popularity ..read more

Reducing Big Data TCO on Azure with Qubole

Qubole

by Shefali Aggarwal

2y ago

Many companies start their big data cloud journey on Azure by testing Microsoft’s native offering HDInsight (HDI). With data already in Blob Storage or Azure Data Lake Store, HDI makes it easy to get started on projects. For an experienced team in big data analytics and Machine Learning (ML), with data engineers, data scientists, and analysts on staff, configuring and tuning infrastructure on HDI can make sense. However, for most organizations, managing and tuning their big data infrastructure manually can quickly become a barrier to scaling beyond a departmental project or proof of concept. T ..read more

Honored to Receive the SIGMOD Systems Award for Apache Hive

Qubole

by Victoria Nava

2y ago

Qubole co-founders Ashish Thusoo and Joydeep Sen Sharma were recently awarded the SIGMOD Systems Award for developing a seminal software system, Apache Hive, that brought relational-style declarative programming to the Hadoop ecosystem. A decade ago, while at Facebook, we conceived the idea of Apache Hive (Hive), an SQL-like interface for querying data that sits atop Hadoop. Turning this project into a reality required immense contributions from a talented team with a passion for the idea. We would be remiss not to mention the names of the prolific Zheng Shao and the always dependable N ..read more

Evolution of Hadoop

Qubole

by Shefali Aggarwal

2y ago

Over the course of the next month, we will be going deeper into some of the trends uncovered in our 2018 Big Data Activation Report. In this post, we look at companies that have migrated their Hadoop resource manager from MapReduce (Hadoop 1) to YARN (Hadoop 2) in the past two years, and at the resulting benefits. It’s been over a decade since Hadoop first entered this world. Spawned from Google’s MapReduce white paper and the founding of Nutch, Hadoop has come a long way in the enterprise from being just another Silicon Valley tool. Hadoop’s first recorded massive-scale production was ..read more

Big Data Analytics: Microsoft Azure Data Lake Store and Qubole

Qubole

by Smita Sinha

2y ago

Microsoft’s Azure Data Lake Store (ADLS) integration with Qubole Data Service (QDS) is a significant milestone in the journey we started when we launched QDS for Azure Blob Storage in 2017. With this integration, it will now be possible to run rich queries and derive deeper insights from your data in ADLS as well. ADLS is an enterprise-grade hyper-scale repository for big data workloads. It enables you to capture and process data of any size, type, and ingestion speed in one single place. ADLS supports any application that uses the open-source Apache Hadoop Distributed File System (HDFS) stand ..read more

Container Packing: A New Algorithm for Resource Scheduling in the Cloud

Qubole

by Shefali Aggarwal

2y ago

In this post we describe a new algorithm for allocating resources among long-running distributed applications. When applied to big data workloads running on an elastic compute cluster, this algorithm has been seen to result in hardware savings of more than 40 percent.

YARN and Uniform Resource Allocation

Many big data frameworks such as MapReduce, Tez, and Spark use YARN for resource allocation and management. By default, YARN tries to allocate resources according to capacity on each node and locality constraints attached in the resource request. If multiple nodes satisfy these constraints, YA ..read more
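The excerpt above is cut off before it describes the algorithm itself, so the following is only a rough illustration of the general idea behind packing containers onto fewer nodes, not the algorithm from the post. It assumes a toy model in which each node has a fixed memory capacity and each container asks for a fixed amount of memory; the names `Node`, `place`, and `idle_nodes` and the "spread" and "pack" strategy labels are hypothetical and do not correspond to any YARN API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy model of a cluster node (hypothetical, not a YARN type)."""
    name: str
    capacity_mb: int
    used_mb: int = 0

    @property
    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


def place(nodes, request_mb, strategy):
    """Place one container and return the chosen node, or None if nothing fits."""
    candidates = [n for n in nodes if n.free_mb >= request_mb]
    if not candidates:
        return None
    if strategy == "spread":
        # Even-spreading heuristic: pick the least-loaded node that fits.
        chosen = min(candidates, key=lambda n: n.used_mb)
    elif strategy == "pack":
        # Packing heuristic: pick the most-loaded node that still fits.
        chosen = max(candidates, key=lambda n: n.used_mb)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    chosen.used_mb += request_mb
    return chosen


def idle_nodes(strategy, requests_mb, node_count=4, capacity_mb=8192):
    """Count nodes left completely empty after placing all requests."""
    nodes = [Node(f"node-{i}", capacity_mb) for i in range(node_count)]
    for request_mb in requests_mb:
        place(nodes, request_mb, strategy)
    return sum(1 for n in nodes if n.used_mb == 0)


if __name__ == "__main__":
    requests = [2048] * 4  # four 2 GB containers
    print("idle nodes when spreading:", idle_nodes("spread", requests))
    print("idle nodes when packing:  ", idle_nodes("pack", requests))
```

In this toy setup, spreading touches every node while packing fills one node and leaves the rest empty. On an elastic cluster, completely empty nodes are the ones that could in principle be released, which is one plausible way a packing strategy might translate into hardware savings.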

Data Platforms 2017: The Conference I Wish Existed in 2007

Qubole

by Shefali Aggarwal

2y ago

This post is authored by Ashish Thusoo, Co-Founder and Chief Executive Officer, Qubole.

Ten years ago, I was starting out at Facebook, working to build the Facebook Data Service Team. In those days, the big data landscape was like the Wild West. We were among the first teams to try this new experiment of building a company around deriving value from big data. We were trailblazing, making mistakes along the way, and often finding ourselves stumped over roadblocks that now seem routine. There were times at Facebook when we would have given a lot to have an intensive dialogue with colleagues who ..read more
