Use DeepSeek with Amazon OpenSearch Service vector database and Amazon SageMaker
AWS Big Data Blog
by Jon Handler
2d ago
OpenSearch Service provides rich capabilities for RAG use cases, as well as vector embedding-powered semantic search. You can use the flexible connector framework and search flow pipelines in OpenSearch to connect to models hosted by DeepSeek, Cohere, and OpenAI, as well as models hosted on Amazon Bedrock and SageMaker. In this post, we build a connection to DeepSeek’s text generation model, supporting a RAG workflow to generate text responses to user queries.
Handle errors in Apache Flink applications on AWS
AWS Big Data Blog
by Alexis Tekin
3d ago
This post discusses strategies for handling errors in Apache Flink applications. However, the general principles discussed here apply to stream processing applications at large.
How Open Universities Australia modernized their data platform and significantly reduced their ETL costs with AWS Cloud Development Kit and AWS Step Functions
AWS Big Data Blog
by Michael Davies
1w ago
At Open Universities Australia (OUA), we empower students to explore a vast array of degrees from renowned Australian universities, all delivered through online learning. In this post, we show you how we used AWS services to replace our existing third-party ETL tool, improving the team’s productivity and producing a significant reduction in our ETL operational costs.
Hybrid big data analytics with Amazon EMR on AWS Outposts
AWS Big Data Blog
by Shoukat Ghouse
1w ago
In this post, we dive into the transformative features of EMR on Outposts, showcasing its flexibility as a native hybrid data analytics service that allows seamless data access and processing both on premises and in the cloud.
How MuleSoft achieved cloud excellence through an event-driven Amazon Redshift lakehouse architecture
AWS Big Data Blog
by Sean Zou
1w ago
In our previous thought leadership blog post, "Why a Cloud Operating Model," we defined a COE Framework and showed why MuleSoft implemented it and the benefits they received from it. In this post, we dive into the technical implementation, describing how MuleSoft used Amazon EventBridge, Amazon Redshift, Amazon Redshift Spectrum, Amazon S3, and AWS Glue to implement it.
OpenSearch Vector Engine is now disk-optimized for low cost, accurate vector search
AWS Big Data Blog
by Dylan Tong
2w ago
OpenSearch Vector Engine can now run vector search at a third of the cost on OpenSearch 2.17+ domains. You can now configure k-NN (vector) indexes to run in disk mode, optimizing them for memory-constrained environments and enabling low-cost, accurate vector search that responds in low hundreds of milliseconds. Disk mode provides an economical alternative to memory mode when you don’t need near single-digit latency. In this post, you’ll learn about the benefits of this new feature, the underlying mechanics, customer success stories, and how to get started.
Access Amazon S3 Iceberg tables from Databricks using AWS Glue Iceberg REST Catalog in Amazon SageMaker Lakehouse
AWS Big Data Blog
by Srividya Parthasarathy
2w ago
In this post, we show you how Databricks on AWS general purpose compute can integrate with the AWS Glue Iceberg REST Catalog for metadata access and use Lake Formation for data access. To keep the setup in this post straightforward, the Glue Iceberg REST Catalog and Databricks cluster share the same AWS account.
Generate vector embeddings for your data using AWS Lambda as a processor for Amazon OpenSearch Ingestion
AWS Big Data Blog
by Jagadish Kumar
2w ago
In this post, we demonstrate how to use OpenSearch Ingestion’s Lambda processor to generate embeddings for your source data and ingest them into an OpenSearch Serverless vector collection. This solution uses the flexibility of OpenSearch Ingestion pipelines with a Lambda processor to dynamically generate embeddings.
Automate topic provisioning and configuration using Terraform with Amazon MSK
AWS Big Data Blog
by Vijay Kardile
3w ago
In this post, we address common challenges associated with manual MSK topic configuration management and present a robust Terraform-based solution. This solution supports both provisioned and serverless MSK clusters.
How EUROGATE established a data mesh architecture using Amazon DataZone
AWS Big Data Blog
by Dr. Leonard Heilig
3w ago
In this post, we show you how EUROGATE uses AWS services, including Amazon DataZone, to make data discoverable by data consumers across different business units so that they can innovate faster. Two use cases illustrate how this can be applied for business intelligence (BI) and data science applications, using AWS services such as Amazon Redshift and Amazon SageMaker.