Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA
AWS Big Data Blog
by Radhika Jakkula
13h ago
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, referred to as workflows. With Amazon MWAA, you can use Apache Airflow and Python to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. By using multiple AWS accounts, organizations can effectively scale their workload ..read more
Visit website
Optimize data layout by bucketing with Amazon Athena and AWS Glue to accelerate downstream queries
AWS Big Data Blog
by Takeshi Nakatani
13h ago
In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making. However, as data volumes continue to grow, optimizing data layout and organization becomes crucial for efficient querying and analysis. One of the key challenges in data lakes is the potential for slow query performance, especially when dealing with large datasets. This can be attributed to fact ..read more
Visit website
Run interactive workloads on Amazon EMR Serverless from Amazon EMR Studio
AWS Big Data Blog
by Sekar Srinivasan
2d ago
Starting from release 6.14, Amazon EMR Studio supports interactive analytics on Amazon EMR Serverless. You can now use EMR Serverless applications as the compute, in addition to Amazon EMR on EC2 clusters and Amazon EMR on EKS virtual clusters, to run JupyterLab notebooks from EMR Studio Workspaces. EMR Studio is an integrated development environment (IDE) that makes it straightforward for data scientists and data engineers to develop, visualize, and debug analytics applications written in PySpark, Python, and Scala. EMR Serverless is a serverless option for Amazon EMR that makes it straightfo ..read more
Visit website
Dynamic DAG generation with YAML and DAG Factory in Amazon MWAA
AWS Big Data Blog
by Jayesh Shinde
4d ago
Amazon Managed Workflow for Apache Airflow (Amazon MWAA) is a managed service that allows you to use a familiar Apache Airflow environment with improved scalability, availability, and security to enhance and scale your business workflows without the operational burden of managing the underlying infrastructure. In Airflow, Directed Acyclic Graphs (DAGs) are defined as Python code. Dynamic DAGs refer to the ability to generate DAGs on the fly during runtime, typically based on some external conditions, configurations, or parameters. Dynamic DAGs helps you to create, schedule, and run tasks withi ..read more
Visit website
How Salesforce optimized their detection and response platform using AWS managed services
AWS Big Data Blog
by Atul Khare
1w ago
This is a guest blog post co-authored with Atul Khare and Bhupender Panwar from Salesforce. Headquartered in San Francisco, Salesforce, Inc. is a cloud-based customer relationship management (CRM) software company building artificial intelligence (AI)-powered business applications that allow businesses to connect with their customers in new and personalized ways. The Salesforce Trust Intelligence Platform (TIP) log platform team is responsible for data pipeline and data lake infrastructure, providing log ingestion, normalization, persistence, search, and detection capability to ensure Salesfor ..read more
Visit website
Amazon OpenSearch Service Under the Hood : OpenSearch Optimized Instances(OR1)
AWS Big Data Blog
by Bukhtawar Khan
1w ago
Amazon OpenSearch Service recently introduced the OpenSearch Optimized Instance family (OR1), which delivers up to 30% price-performance improvement over existing memory optimized instances in internal benchmarks, and uses Amazon Simple Storage Service (Amazon S3) to provide 11 9s of durability. With this new instance family, OpenSearch Service uses OpenSearch innovation and AWS technologies to reimagine how data is indexed and stored in the cloud. Today, customers widely use OpenSearch Service for operational analytics because of its ability to ingest high volumes of data while also providing ..read more
Visit website
Power analytics as a service capabilities using Amazon Redshift
AWS Big Data Blog
by Sandipan Bhaumik
1w ago
Analytics as a service (AaaS) is a business model that uses the cloud to deliver analytic capabilities on a subscription basis. This model provides organizations with a cost-effective, scalable, and flexible solution for building analytics. The AaaS model accelerates data-driven decision-making through advanced analytics, enabling organizations to swiftly adapt to changing market trends and make informed strategic choices. Amazon Redshift is a cloud data warehouse service that offers real-time insights and predictive analytics capabilities for analyzing data from terabytes to petabytes. It off ..read more
Visit website
Introducing Amazon MWAA larger environment sizes
AWS Big Data Blog
by Hernan Garcia
1w ago
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for Apache Airflow that streamlines the setup and operation of the infrastructure to orchestrate data pipelines in the cloud. Customers use Amazon MWAA to manage the scalability, availability, and security of their Apache Airflow environments. As they design more intensive, complex, and ever-growing data processing pipelines, customers have asked us for additional underlying resources to provide greater concurrency and capacity for their tasks and workflows. To address this, today, we are announcing the availability ..read more
Visit website
Uplevel your data architecture with real- time streaming using Amazon Data Firehose and Snowflake
AWS Big Data Blog
by Swapna Bandla
2w ago
Today’s fast-paced world demands timely insights and decisions, which is driving the importance of streaming data. Streaming data refers to data that is continuously generated from a variety of sources. The sources of this data, such as clickstream events, change data capture (CDC), application and service logs, and Internet of Things (IoT) data streams are proliferating. Snowflake offers two options to bring streaming data into its platform: Snowpipe and Snowflake Snowpipe Streaming. Snowpipe is suitable for file ingestion (batching) use cases, such as loading large files from Amazon Simple S ..read more
Visit website
Achieve near real time operational analytics using Amazon Aurora PostgreSQL zero-ETL integration with Amazon Redshift
AWS Big Data Blog
by Raks Khare
2w ago
“Data is at the center of every application, process, and business decision. When data is used to improve customer experiences and drive innovation, it can lead to business growth,” – Swami Sivasubramanian, VP of Database, Analytics, and Machine Learning at AWS in With a zero-ETL approach, AWS is helping builders realize near-real-time analytics. Customers across industries are becoming more data driven and looking to increase revenue, reduce cost, and optimize their business operations by implementing near real time analytics on transactional data, thereby enhancing agility. Based on custom ..read more
Visit website

Follow AWS Big Data Blog on FeedSpot

Continue with Google
Continue with Apple
OR