Achieve peak performance and boost scalability using multiple Amazon Redshift serverless workgroups and Network Load Balancer
AWS Big Data Blog
by Ricardo Serafim
17h ago
As data analytics use cases grow, factors of scalability and concurrency become crucial for businesses. Your analytic solution architecture should be able to handle large data volumes at high concurrency and without compromising speed, thereby delivering a scalable high-performance analytics environment. Amazon Redshift Serverless provides a fully managed, petabyte-scale, auto scaling cloud data warehouse to support high-concurrency analytics. It offers data analysts, developers, and scientists a fast, flexible analytic environment to gain insights from their data with optimal price-performanc ..read more
Visit website
Use AWS Glue Data Catalog views to analyze data
AWS Big Data Blog
by Leonardo Gomez
17h ago
In this post, we show you how to use the new views feature the AWS Glue Data Catalog. SQL views are a powerful object used across relational databases. You can use views to decrease the time to insights of data by tailoring the data that is queried. Additionally, you can use the power of SQL in a view to express complex boundaries in data across multiple tables that can’t be expressed with simpler permissions. Data lakes provide customers the flexibility required to derive useful insights from data across many sources and many use cases. Data consumers can consume data where they need to acros ..read more
Visit website
Governing data in relational databases using Amazon DataZone
AWS Big Data Blog
by Jose Romero
17h ago
Data governance is a key enabler for teams adopting a data-driven culture and operational model to drive innovation with data. Amazon DataZone is a fully managed data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across Amazon Web Services (AWS), on premises, and on third-party sources. It also makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization to discover, use, and collaborate to derive data-driven insights. Amazon DataZone allows you to ..read more
Visit website
Revolutionizing data querying: Amazon Redshift and Visual Studio Code integration
AWS Big Data Blog
by Navnit Shukla
1w ago
In today’s data-driven landscape, the efficiency and accessibility of querying tools play a crucial role in driving businesses forward. Amazon Redshift recently announced integration with Visual Studio Code (), an action that transforms the way data practitioners engage with Amazon Redshift and reshapes your interactions and practices in data management. This innovation not only unlocks new possibilities, but also tackles long-standing challenges in data analytics and query handling. While the Amazon Redshift query editor v2 (QE v2) offers a smooth experience for data analysts and business use ..read more
Visit website
Detect and handle data skew on AWS Glue
AWS Big Data Blog
by Salim Tutuncu
1w ago
AWS Glue is a fully managed, serverless data integration service provided by Amazon Web Services (AWS) that uses Apache Spark as one of its backend processing engines (as of this writing, you can use Python Shell, Spark, or Ray). Data skew occurs when the data being processed is not evenly distributed across the Spark cluster, causing some tasks to take significantly longer to complete than others. This can lead to inefficient resource utilization, longer processing times, and ultimately, slower performance. Data skew can arise from various factors, including uneven data distribution, skewed j ..read more
Visit website
How Fujitsu implemented a global data mesh architecture and democratized data
AWS Big Data Blog
by Kanehito Miyake
1w ago
This is a guest post co-authored with Kanehito Miyake, Engineer at Fujitsu Japan.  Fujitsu Limited was established in Japan in 1935. Currently, we have approximately 120,000 employees worldwide (as of March 2023), including group companies. We develop business in various regions around the world, starting with Japan, and provide digital services globally. To provide a variety of products, services, and solutions that are better suited to customers and society in each region, we have built business processes and systems that are optimized for each region and its market. However, in recent ..read more
Visit website
Introducing Amazon Q data integration in AWS Glue
AWS Big Data Blog
by Noritaka Sekiyama
1w ago
Today, we’re excited to announce general availability of Amazon Q data integration in AWS Glue. Amazon Q data integration, a new generative AI-powered capability of Amazon Q Developer, enables you to build data integration pipelines using natural language. This reduces the time and effort you need to learn, build, and run data integration jobs using AWS Glue data integration engines. Tell Amazon Q Developer what you need in English, it will return a complete job for you. For example, you can ask Amazon Q Developer to generate a complete extract, transform, and load (ETL) script or code snippet ..read more
Visit website
Dive deep into security management: The Data on EKS Platform
AWS Big Data Blog
by Yuzhu Xiao
1w ago
The construction of big data applications based on open source software has become increasingly uncomplicated since the advent of projects like Data on EKS, an open source project from AWS to provide blueprints for building data and machine learning (ML) applications on Amazon Elastic Kubernetes Service (Amazon EKS). In the realm of big data, securing data on cloud applications is crucial. This post explores the deployment of Apache Ranger for permission management within the Hadoop ecosystem on Amazon EKS. We show how Ranger integrates with Hadoop components like Apache Hive, Spark, Trino, Ya ..read more
Visit website
Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA
AWS Big Data Blog
by Radhika Jakkula
2w ago
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, referred to as workflows. With Amazon MWAA, you can use Apache Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. By using multiple AWS accounts, organizations can effectively scale their workload ..read more
Visit website
Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries
AWS Big Data Blog
by Takeshi Nakatani
2w ago
In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making. However, as data volumes continue to grow, optimizing data layout and organization becomes crucial for efficient querying and analysis. One of the key challenges in data lakes is the potential for slow query performance, especially when dealing with large datasets. This can be attributed to fact ..read more
Visit website

Follow AWS Big Data Blog on FeedSpot

Continue with Google
Continue with Apple
OR