When CEO Josh Churlik co-founded Well Data Labs in 2014, he was acutely aware of a bizarre dichotomy in his industry: For oil and gas companies, “downhole” innovation (that is, what happens underground) far exceeds the pace of data and analysis innovation. The data systems used then were relics of the 1990s – more homages to history than helpful to the people who needed them.

Like many others in the industry, Josh and the Well Data Labs team were frustrated with the inability to access information that would have made frontline engineers’ jobs much easier. While the industry plodded along with spreadsheets, Churlik and his team saw an opportunity to build a modern software company around the rapid advancements in cloud computing.

The resulting company, Well Data Labs, describes itself as “a modern web application built to give operators the fastest and simplest way to manage, analyze, and report on their internal data.” In other words, Well Data Labs efficiently handles the messy time-series data created during operations—capturing, normalizing, structuring, and enabling analysis on that data—all within a web-based app.

With what Well Data Labs offers, engineers can make faster, more informed decisions—decisions that have a direct and immediate impact on the cost and success of the operations. The company has replaced manual front-end data collection and analysis with custom-developed machine learning (ML) models running on AWS, so that Well Data Labs’ customers can monitor field operations in real-time.

The AWS tech stack powers this solution. Churlik explained, “When we were getting started, we did a bakeoff between other cloud providers and AWS. Even though we’re a .NET stack and SQL database, AWS was significantly more performant.” So, AWS was their choice; to this day, Well Data Labs uses AWS for all their cloud needs. “What we’ve liked about AWS is we can always scale. We’ve been able to continuously build and grow,” Churlik added. “AWS was and still is ahead of its industry peers on technology services.”

Well Data Labs leverages the seamless integration between AWS services to power their robust solution. Currently, the Well Data Labs architecture includes Amazon Elastic Compute Cloud (Amazon EC2) for all of their managed servers (to power their applications), Amazon S3 to store the various data artifacts without worrying about storage limitations, Amazon Simple Queue Service (Amazon SQS) to create a distributed system, and Amazon Virtual Private Cloud (VPC) and AWS Identity and Access Management (IAM) to keep its infrastructure secure. In addition to all of those core services, Well Data Labs uses Amazon SageMaker in their Machine Learning (ML) workloads.

Churlik recalls that he started a data science team to begin exploratory R&D with ML about a year ago. “We asked ourselves, ‘what is the value that it [ML] could be providing to our customers?’ And then we started experimenting.”

Now, the team uses Amazon SageMaker to deploy trained models on custom Docker containers via their proprietary SaaS application. The Amazon SageMaker models and SageMaker endpoint features enable Well Data Labs to integrate ML into the SaaS application and thereby bring frontline engineering workers real-time data for event detection and notification during operations. Well Data Labs set the precedent by bringing ML to the oil and gas market in this way.
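
As a rough illustration of how an application might call a SageMaker endpoint for event detection, the following sketch uses the boto3 SageMaker runtime invoke_endpoint call. The endpoint name, the JSON payload shape, and the detect_events helper are assumptions made for illustration; the post does not describe the actual request format that Well Data Labs uses.

```python
import json


def detect_events(runtime_client, endpoint_name, readings):
    """Send a batch of sensor readings to a SageMaker endpoint.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    The {"readings": [...]} payload is a hypothetical format; a custom
    container defines its own input and output schema.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"readings": readings}),
    )
    # The response body is a stream; read it and decode the JSON result.
    return json.loads(response["Body"].read())
```

In a real deployment, the client would typically be created once with boto3.client("sagemaker-runtime") and reused across calls.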

Using AWS to build and host many of their solutions means the Well Data Labs team can focus on R&D and developing new product features, rather than on managing infrastructure. Well Data Labs data scientists can deploy new prediction models as soon as they are ready and iterate on new versions rapidly. The quick integration and deployment of ML functionality into the SaaS application in turn enables frontline users to benefit from data science advances right away. The first set of models that Well Data Labs built immediately saved their customers up to an hour a day of manual data entry.

Achieving that kind of success right out of the gates is exciting, and this is only the beginning. Well Data Labs pioneered the “digital oilfield” (where technology, data, automation, and people in the oil and gas industry all intersect), and their customers affirm that this small but mighty Denver-based company is ushering in a new era for the oil and gas industry.

About the Author

Marisa Messina is on the AWS ML marketing team, where her job includes identifying the most innovative AWS-using customers and showcasing their inspiring stories. Prior to AWS, she worked on consumer-facing hardware and then university-facing cloud offerings at Microsoft. Outside of work, she enjoys exploring the Pacific Northwest hiking trails, cooking without recipes, and dancing in the rain.







The growth of artificial intelligence could create 58 million net new jobs in the next few years, states the World Economic Forum [1]. Yet, according to the Tencent Research Institute, it’s estimated that currently there are 300,000 AI engineers worldwide, but millions are needed [2]. As you can tell, there is a unique and immediate opportunity to develop creative experiences and introduce you—no matter what your developer skill levels are—to essential ML concepts. These experiences in fields of ML like deep learning, reinforcement learning, and so on, will expand your skills and help close the talent gap.

To help you advance your AI/ML capabilities with hands-on and fun ML learning experiences, I am thrilled to announce the AWS DeepRacer Scholarship Challenge. 

What is AWS DeepRacer?

In November 2018, Jeff Barr announced the launch of AWS DeepRacer on the AWS News Blog as a new way to learn ML. With AWS DeepRacer, you have an opportunity to get hands-on with a fully autonomous 1/18th-scale race car driven by reinforcement learning, a 3D racing simulator, and a global racing league.

What is the AWS DeepRacer Scholarship Challenge?

AWS and Udacity are collaborating to educate developers of all skill levels on ML concepts.  Those skills are reinforced by putting them to the test through the world’s first autonomous racing league—the AWS DeepRacer League.

Students enrolled in the AWS DeepRacer Scholarship Challenge who have the top lap times can win full scholarships to the Machine Learning Engineer nanodegree program. The Udacity Nanodegree program is a unique online educational offering designed to bridge the gap between learning and career goals. 

How does the AWS DeepRacer Scholarship Challenge work?

The program begins August 1, 2019 and runs through October 31, 2019. You can join the scholarship community at any point during these three months and immediately enroll in Udacity’s specialized AWS DeepRacer course. Register now to be in pole position for the start of the race.

After enrollment, you go through the AWS DeepRacer course, which consists of short, step-by-step modules (90 minutes in total). The modules prepare you to create, train, and fine-tune a reinforcement learning model in the AWS DeepRacer 3D racing simulator. Throughout the program and during each race, you have access to a custom scholarship student community to get pro tips from experts and exchange ideas with your classmates.
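
To give a sense of what training a model in the simulator involves, here is a minimal sketch of a reward function that encourages the car to stay near the center line. It assumes the params dictionary that the AWS DeepRacer console passes to reward_function, using the all_wheels_on_track, track_width, and distance_from_center keys; the marker thresholds and reward values are illustrative choices, not recommended settings.

```python
def reward_function(params):
    """Reward the car for staying close to the center of the track."""
    # Going off the track earns almost nothing.
    if not params["all_wheels_on_track"]:
        return 1e-3

    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Three bands around the center line, as fractions of the track width.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely about to leave the track

    return float(reward)
```

Fine-tuning usually means adjusting bands and values like these, then comparing lap times across training runs.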

Each month, you can pit your skills against others in virtual races in the AWS DeepRacer console. Students compete for top spots in each month’s unique race course. Students who record the top lap times in August, September, and October 2019 qualify for one of 200 full scholarships to the Udacity Machine Learning Engineer nanodegree program, sponsored by Udacity.

Next steps

To get notified about the scholarship program and enrollment dates, register now. For a program FAQ, see AWS DeepRacer Scholarship Challenge.

Developers, start your engines! The first challenge starts August 1, 2019!

[1] Artificial Intelligence To Create 58 Million New Jobs By 2022, Says Report (Forbes)
[2] Tencent says there are only 300,000 AI engineers worldwide, but millions are needed (The Verge)

About the Author

Tara Shankar Jana is a Senior Product Marketing Manager for AWS Machine Learning. He is currently working on building unique and scalable educational offerings for aspiring ML developer communities, to help them expand their ML skills. Outside of work, he loves reading books, traveling, and spending time with his family.


The housing market is complex.  There is a continuously changing supply of student housing units around any given education campus. Moreover, the accepted value of a unit continuously changes based on physical and social variables. These variables could include proximity to campus with regard to other available options, friend groups living nearby, and the availability of nearby parking as other properties fill. The interplay happens at all levels—entire properties may shift in value and specific units within them may exacerbate or counteract those shifts.

For a property management company to earn the maximum revenue from its rental units, it needs to price each unit just within the price-point for the tenants—but it doesn’t know what their price constraints are.  The company would not want to leave money on the table by setting a price too low. Setting a price too high can mean that the unit sits empty—effectively costing the company to maintain the unit. Finding that balance is a difficult problem.

Entrata, a comprehensive technology provider of multifamily property management solutions, solves this problem by employing machine learning (ML) with AWS.  Specifically, they feed location-specific and even building-specific data (such as occupancy, proximity to campus, and lease term length) into an ML-based dynamic pricing engine running on Amazon SageMaker. The model helps Entrata’s customers—property managers—to predict occupancy levels and in turn optimize their prices of student housing.

At the implementation level, this solution relies on a number of AWS offerings. AWS Glue extracts Entrata’s historical data into Amazon S3. Amazon SageMaker uses this data to make pricing predictions, which are written to an output bucket in Amazon S3. Entrata’s applications then request this data through Amazon API Gateway, which triggers AWS Lambda functions to deliver the most relevant forecast for any available unit.
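
The last hop of that flow, an API Gateway request handled by a Lambda function that reads a forecast from Amazon S3, might look something like the following sketch. The bucket name, the forecasts/ key layout, the unit_id query parameter, and the make_handler helper are all hypothetical; the post does not describe how Entrata actually stores or keys its predictions.

```python
import json

BUCKET = "example-pricing-output"  # hypothetical output bucket name


def make_handler(s3_client, bucket=BUCKET):
    """Build a Lambda handler bound to an S3 client.

    s3_client is expected to behave like boto3.client("s3").
    """

    def handler(event, context):
        # With API Gateway proxy integration, query parameters arrive
        # under "queryStringParameters" (None when there are none).
        params = event.get("queryStringParameters") or {}
        unit_id = params.get("unit_id")
        if not unit_id:
            return {
                "statusCode": 400,
                "body": json.dumps({"error": "unit_id is required"}),
            }

        # Assumed layout: one JSON forecast object per unit.
        obj = s3_client.get_object(Bucket=bucket, Key=f"forecasts/{unit_id}.json")
        forecast = json.loads(obj["Body"].read())
        return {"statusCode": 200, "body": json.dumps(forecast)}

    return handler
```

Inside Lambda, the module would typically expose something like handler = make_handler(boto3.client("s3")); passing the client in keeps the function easy to test without AWS access.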

Entrata developed this solution in partnership with AWS Premier Consulting Partner 1Strategy, a Seattle-based consultancy that helps businesses architect, migrate, and optimize their workloads on AWS. The partnership between 1Strategy and Entrata has existed for years, but the ML work is their most recent—and arguably, most impressive—joint technical accomplishment.

Their collaboration previously focused exclusively on data management through AWS, which in itself is a non-trivial challenge due to the location, size, and complexity of the data. Entrata currently serves more than 20,000 apartment communities nationwide and offers a variety of tools, from mobile apps to lease signing portals to accounting platforms.

The novel ML solution is exciting. Entrata’s CTO, Ryan Byrd, says, “The impact is far ranging and positive. Automating back-office functions with Amazon ML frees property management to focus on people first, instead of performing rote behind-the-scenes guessing of price recommendations.”

Entrata plans even more work with AWS in the future. Byrd adds, “AWS technologies will decrease our time to market with various ML projects.” He and his colleagues on the Entrata team are keen to aid customers in their decision-making efforts. They also use ML for various operational elements for their and their customers’ businesses, strategic planning, and maintenance management.

About the Author

Marisa Messina is on the AWS ML marketing team, where her job includes identifying the most innovative AWS-using customers and showcasing their inspiring stories. Prior to AWS, she worked on consumer-facing hardware and then university-facing cloud offerings at Microsoft. Outside of work, she enjoys exploring the Pacific Northwest hiking trails, cooking without recipes, and dancing in the rain.


Course Hero is an online learning platform that provides students access to over 25 million course-specific study materials, including study guides, class notes, and practice problems for numerous subjects. The platform, which runs on AWS, is designed to enable every student to take on their courses feeling confident and prepared. To make that possible, Course Hero is equipped to do some learning of its own, using Amazon Machine Learning (Amazon ML), which powers Course Hero and serves as its primary artificial intelligence and ML platform.

The artificial intelligence group at Course Hero is tasked with building the company’s semantic knowledge graph. This constantly expanding graph enables students to access personalized learning experiences and gives educators tools to create unique course content.

Most aspects of Course Hero’s offerings rely on AWS in some form or another (either compute or ML). For example, Amazon Elasticsearch Service (Amazon ES) powers the search function that students and educators use to search for materials. The Amazon ES platform allows the Course Hero team to write custom implementations through its API extension plugin. The plugin gives them the flexibility to create relevant user experiences, even for more esoteric searches that require locally dense semantic search capability.

Students and educators search within Course Hero’s document library (which is freely accessible) in exchange for uploading one’s own content. Course Hero does not accept all documents as publishable library material; documents gain acceptance to the library after going through a cloud-driven vetting process. When new documents are uploaded, an artificial intelligence platform running on Amazon EMR and Amazon SageMaker Inference Pipelines checks and validates the documents for fraud, honor code violations, copyright infringements, and spam.

The documents that pass quality review then move to further processing and tagging using ML models that are built on the label data that Amazon SageMaker Ground Truth has collected. This document labeling enables Course Hero to learn what kind of materials are used by a given student, then predict what else might be useful for them.

By personalizing the experience in this way, Course Hero provides each user with relevant content for their studying needs. With the right content in hand, students gain a deeper understanding and meet their learning objectives more efficiently.

AWS is a comprehensive platform for Course Hero. In addition to the student-facing use cases described above, Course Hero uses AWS services for ad hoc analyses, data exploration, trend discovery, real-time analytics, fraud detection, and more. Course Hero constructs its data platform using key AWS services, including the following:

Course Hero’s planning, tracking, and monitoring platforms also use Kibana, Logstash, and Amazon CloudWatch to keep all monitoring and service centers running smoothly.

The following diagram shows how all of these components work together.

To further augment the existing AWS technology that powers Course Hero, the team is exploring additional Amazon services, including Amazon Forecast, for time series and financial forecasting. It is also looking at possibilities using Amazon Echo that will allow users to ask questions via Alexa.

Course Hero’s Saurabh Khanwalkar, the Director of Machine Learning & Search Sciences, says, “The entire machine learning, engineering, and artificial intelligence stack runs on AWS. From our CI/CD pipelines to our code workbench to our end-to-end model development to our staging and production inferences, we’re on AWS.”

About the Author

Marisa Messina is on the AWS ML marketing team, where her job includes identifying the most innovative AWS-using customers and showcasing their inspiring stories. Prior to AWS, she worked on consumer-facing hardware and then university-facing cloud offerings at Microsoft. Outside of work, she enjoys exploring the Pacific Northwest hiking trails, cooking without recipes, and dancing in the rain.


Voice-powered experiences are gaining traction and customer love. Volley is at the cutting edge of voice-controlled entertainment with its series of popular smart-speaker games, and many aspects of Volley rely on Amazon Polly.

Every day, more and more people switch on lights, check the weather, and play music not by pushing buttons but with verbal commands to smart speakers. Volley is a San Francisco–based startup co-founded in 2016 by former Harvard roommates Max Child (CEO) and James Wilsterman (CTO). They’re on a mission to use smart speakers as the basis for building fun experiences.

Volley creates games of all sorts, from song quizzes to political satire to role-playing games. Many of the latter, such as “Yes Sire,” feature choose-your-own-adventure style games, in which infinite dialogue permutations can flow from each player’s choices. Volley relies heavily on Amazon Polly to enable these growing dialogue permutations amid multiple characters’ interactions.

“We associate each character with a particular Amazon Polly voice,” said Wilsterman. “Our on-the-fly TTS generation only works because Amazon Polly’s text-to-speech API latency is low enough to be essentially imperceptible to the user.”

From a cost perspective, the comparison is a no-brainer: hiring voice actors to voice the games would be a thousand times more expensive (literally–Volley ran the numbers). Amazon Polly has reaction speed nailed, with faster reactions than a human option. It also provides more diverse characters and reactions than recorded, scripted voice actors.

“We want our games to showcase diverse, memorable characters,” said Wilsterman. “We appreciate that Amazon Polly supports many different languages, accents, and age ranges to help us in that effort.” For example, Amazon Polly’s built-in German language support proved essential to Volley’s recent launch of a localized version of “Yes Sire” for Germany (called “Ja Exzellenz”).
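
A per-character voice mapping of the kind Wilsterman describes could be sketched as follows, using the boto3 Amazon Polly synthesize_speech call. The character names, the specific voice assignments, and the synthesize_line helper are assumptions for illustration, not Volley's actual configuration.

```python
# Hypothetical mapping of game characters to Amazon Polly voice IDs.
CHARACTER_VOICES = {
    "king": "Brian",      # British English voice
    "advisor": "Joanna",  # US English voice
    "berater": "Hans",    # German voice, e.g. for a localized game
}


def synthesize_line(polly_client, character, text):
    """Return MP3 audio bytes for one line of dialogue.

    polly_client is expected to behave like boto3.client("polly").
    """
    response = polly_client.synthesize_speech(
        Text=text,
        VoiceId=CHARACTER_VOICES[character],
        OutputFormat="mp3",
    )
    # AudioStream is a streaming body; read it fully into memory.
    return response["AudioStream"].read()
```

Because each line is generated on request, new dialogue branches need only new text rather than new recordings.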

Along with Amazon Polly, many other AWS services support Volley’s fun and games. This platform choice dates to Volley’s beginnings, when the co-founders were looking for the best services to host backend game logic and store persistent customer data.

“We realized quickly that AWS Lambda and Amazon DynamoDB would be ideal options,” said Wilsterman. He soon discovered that AWS also offered appealing scalability and affordability. The Volley team now uses Lambda not only to host the backend logic for their games but also to host a variety of internal tools and microservices deployed through Lambda functions.

DynamoDB supports Volley’s games by storing persistent data like users’ scores and levels, so they can return to the games and pick up right where they left off. And many of the in-game assets are stored in Amazon S3, which makes them instantly accessible to the backend Lambda functions. All those pieces are visualized together in the following workflow diagram.

Volley recently added a layer of sophistication to its machine learning work with Amazon SageMaker. They’re using Amazon SageMaker to strengthen their business by understanding user behavior and promoting their games accordingly. Specifically, the Volley team faces a bit of a challenge because users don’t carry persistent tags. So, if someone finishes playing “World Detective” and immediately starts to play “Castle Master,” there is no way to identify that they’re the same user.

As a result, the Volley team must find creative ways to measure the impact of their cross-promotional efforts. With Amazon SageMaker, they can predictively generate the outcomes of their marketing based on the active users of each of the games and the timestamps. That helps them make sure that future marketing is better-targeted—and that future games meet the audience trends that Volley is seeing.

As Volley continues to expand its repertoire, the team is also considering new directions beyond sheer entertainment. “Self-improvement is an interesting space, like meditation, fitness, and other coaches,” said Wilsterman. “Also, learning and teaching. We are constantly asking, ‘What new experiences can be possible with voice as an input?’”

No matter what Volley chooses to pursue next, one thing is for sure: their cloud platform of choice. “The entire architecture runs on AWS; we use it for everything from storage to machine learning,” said Wilsterman.

About the Author

Marisa Messina is on the AWS ML marketing team, where her job includes identifying the most innovative AWS-using customers and showcasing their inspiring stories. Prior to AWS, she worked on consumer-facing hardware and then university-facing cloud offerings at Microsoft. Outside of work, she enjoys exploring the Pacific Northwest hiking trails, cooking without recipes, and dancing in the rain.


Amazon Elastic Inference allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances, and reduce the cost of running deep learning inference by up to 75 percent. The EIPredictor API makes it easy to use Elastic Inference.

In this post, we use the EIPredictor and describe a step-by-step example for using TensorFlow with Elastic Inference. Additionally, we explore the cost and performance benefits of using Elastic Inference with TensorFlow. We walk you through how we improved total inference time for FasterRCNN-ResNet50 over 40 video frames from ~113.699 seconds to ~8.883 seconds, and how we improved cost efficiency by 78.5 percent.

The EIPredictor is based on the TensorFlow Predictor API. The EIPredictor is designed to be consistent with the TensorFlow Predictor API to make code portable between the two data structures. The EIPredictor is meant to be an easy way to use Elastic Inference within a single Python script or notebook. A flow that’s already using the TensorFlow Predictor only needs one code change: importing and specifying the EIPredictor. This procedure is shown later.

Benefits of Elastic Inference

Look at how Elastic Inference compares to other EC2 options in terms of performance and cost.

Instance type            vCPUs  CPU memory (GB)  GPU memory (GB)  FP32 TFLOPS  $/hour  TFLOPS/$/hr
m5.large                 2      8                -                0.07         $0.10   0.73
m5.xlarge                4      16               -                0.14         $0.19   0.73
m5.2xlarge               8      32               -                0.28         $0.38   0.73
m5.4xlarge               16     64               -                0.56         $0.77   0.73
c5.4xlarge               16     32               -                0.67         $0.68   0.99
p2.xlarge (K80)          4      61               12               4.30         $0.90   4.78
p3.2xlarge (V100)        8      61               16               15.70        $3.06   5.13
eia.medium               -      -                1                1.00         $0.13   7.69
eia.large                -      -                2                2.00         $0.26   7.69
eia.xlarge               -      -                4                4.00         $0.52   7.69
m5.xlarge + eia.xlarge   4      16               4                4.14         $0.71   5.83

If you look at compute capability (teraFLOPS, or trillions of floating point operations per second), m5.4xlarge provides 0.56 TFLOPS for $0.77/hour, whereas an eia.medium with 1.00 TFLOPS costs just $0.13/hour. If pure performance (ignoring costs) is the goal, it’s clear that a p3.2xlarge instance provides the most compute at 15.7 TFLOPS.

However, in the last column for TFLOPS per dollar, you can see that Elastic Inference provides the most value. Elastic Inference accelerators (EIA) must be attached to an EC2 instance. The last row shows one possible combination. The m5.xlarge + eia.xlarge has the same number of vCPUs and similar TFLOPS as a p2.xlarge, but at a $0.19/hour discount. With Elastic Inference, you can right-size your compute needs by choosing your compute instance, memory, and GPU compute. With this approach, you can realize the maximum value per dollar spent. The GPU attachments to your CPU are abstracted by framework libraries, which makes it easy to make inference calls without worrying about the underlying GPU hardware.
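
The value column can be reproduced with a few lines of arithmetic. The sketch below divides FP32 TFLOPS by the hourly price for several rows of the table and ranks them. The M5 rows are left out because their published value (0.73) appears to be computed from unrounded prices, so the rounded prices shown in the table would give slightly different numbers. The function and variable names are illustrative.

```python
# (FP32 TFLOPS, price in $/hour), copied from the table above.
OPTIONS = {
    "c5.4xlarge": (0.67, 0.68),
    "p2.xlarge": (4.30, 0.90),
    "p3.2xlarge": (15.70, 3.06),
    "eia.medium": (1.00, 0.13),
    "m5.xlarge + eia.xlarge": (4.14, 0.71),
}


def tflops_per_dollar(tflops, dollars_per_hour):
    """Compute value as TFLOPS obtained per dollar per hour."""
    return tflops / dollars_per_hour


def rank_by_value(options):
    """Return (name, TFLOPS/$/hr) pairs sorted from best to worst value."""
    scored = [
        (name, round(tflops_per_dollar(tflops, price), 2))
        for name, (tflops, price) in options.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank_by_value(OPTIONS):
        print(f"{name:<24} {value:.2f} TFLOPS/$/hr")
```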

Video object detection example using the EIPredictor

Here is a step-by-step example of using Elastic Inference with the EIPredictor. For this example, we use a FasterRCNN-ResNet50 model, an m5.large CPU instance, and an eia.large accelerator.

  • Launch Elastic Inference with a setup script.
  • An m5.large instance and attached eia.large accelerator.
  • An AMI with Docker installed. In this post, we use DLAMI. You may choose an AMI without Docker, but install Docker first before proceeding.
  • Your IAM role has ECRFullAccess.
  • Your VPC security group has ports 80 and 443 open for both inbound and outbound traffic and port 22 open for inbound traffic.
Using Elastic Inference with TensorFlow
  1. SSH to your instance with port forwarding for the Jupyter notebook. For Ubuntu AMIs:
    ssh -i {/path/to/keypair} -L 8888:localhost:8888 ubuntu@{ec2 instance public DNS name}

    For Amazon Linux AMIs:

    ssh -i {/path/to/keypair} -L 8888:localhost:8888 ec2-user@{ec2 instance public DNS name} 
  2. Copy the code locally.
    git clone https://github.com/aws-samples/aws-elastic-inference-tensorflow-examples   
  3. Run and connect to your Jupyter notebook.
    cd aws-elastic-inference-tensorflow-examples; ./build_run_ei_container.sh

    Wait until the Jupyter notebook starts up. Go to localhost:8888 and supply the token that is given in the terminal.

  4. Run benchmarked versions of Object Detection examples.
    1. Open elastic_inference_video_object_detection_tutorial.ipynb and run the notebook.
    2. Take note of the session runtimes produced. The following two examples show without Elastic Inference, then with Elastic Inference.
      1. The first is TensorFlow running your model on your instance’s CPU, without Elastic Inference:
        Model load time (seconds): 8.36566710472
        Number of video frames: 40
        Average inference time (seconds): 2.86271090508
        Total inference time (seconds): 114.508436203
      2. The second reporting is using an Elastic Inference accelerator:
        Model load time (seconds): 21.4445838928
        Number of video frames: 40
        Average inference time (seconds): 0.23773444891
        Total inference time (seconds): 9.50937795639
    3. Compare the results, performance, and cost between the two runs.
      • In the results shown above, Elastic Inference gives an average inference speedup of ~12x.
      • With this video of 340 frames of shape (1, 1080, 1920, 3) simulating streaming frames, about 44 of these full videos can be inferred in one hour using the m5.large+eia.large, considering one loading of the model.
      • With the same environment excluding the eia.large Elastic Inference accelerator, only three or four of these videos can be inferred in one hour. Thus, it would take 12–15 hours to complete the same task.
      • An m5.large costs $0.096/hour, and an eia.large slot type costs $0.26/hour. Comparing costs for inferring 44 replicas of this video, you would spend $0.356 to run inference on 44 videos in an hour using the Elastic Inference set up in this example. You’d spend between $1.152 and $1.44 to run the same inference job in 12–15 hours without the eia.large accelerator.
      • Using the numbers above, if you use an eia.large accelerator, you would run the same task in between a 1/12th and a 1/15th of the time and at ~27.5% of the cost. The eia.large accelerator allows for about 4.2 frames per second.
      • The complete video is 340 frames. To run object detection on the complete video, remove "and count < 40" from the extract_video_frames function.
    4. Finally, you should produce a video like this one: annotated_dog_park.mp4.
    5. Also note the usage of the EIPredictor for using an accelerator (use_ei=True) and running the same task locally (use_ei=False).
      ei_predictor = EIPredictor(
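
The throughput and cost figures in the comparison under step 3 follow from the two average inference times reported in step 2. The sketch below redoes that arithmetic using only numbers quoted in this post (340 frames per video, $0.096/hour for an m5.large, $0.26/hour for an eia.large); the variable and function names are illustrative. The post rounds the CPU-only rate to three or four videos per hour, which is where its 12–15 hour range comes from.

```python
SECONDS_PER_HOUR = 3600
FRAMES_PER_VIDEO = 340

# Average inference times (seconds per frame) from the two runs above.
CPU_LATENCY_S = 2.86271090508
EI_LATENCY_S = 0.23773444891

# Hourly prices quoted in the comparison.
M5_LARGE_PER_HOUR = 0.096
EIA_LARGE_PER_HOUR = 0.26


def videos_per_hour(latency_s, frames=FRAMES_PER_VIDEO):
    """Full videos processed per hour, ignoring model load time."""
    return SECONDS_PER_HOUR / (latency_s * frames)


speedup = CPU_LATENCY_S / EI_LATENCY_S
ei_videos = videos_per_hour(EI_LATENCY_S)
cpu_videos = videos_per_hour(CPU_LATENCY_S)
ei_hourly_cost = M5_LARGE_PER_HOUR + EIA_LARGE_PER_HOUR
frames_per_second = 1 / EI_LATENCY_S

print(f"Average speedup:        {speedup:.1f}x")
print(f"Videos/hour with EI:    {ei_videos:.1f}")
print(f"Videos/hour, CPU only:  {cpu_videos:.1f}")
print(f"Hourly cost with EI:    ${ei_hourly_cost:.3f}")
print(f"Frames/second with EI:  {frames_per_second:.1f}")
```
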
Exploring all possibilities

Now, we’ve done more investigation and tried out a few more instance combinations for Elastic Inference. We experimented with FasterRCNN-ResNet50, batch size of 1, and input image dimensions of (1080, 1920, 3).

The model is loaded into memory with an initial inference using a random input of shape (1, 100, 100, 3). After rerunning the initial notebook, we started with combinations of m5.large, m5.xlarge, m5.2xlarge, and m5.4xlarge with Elastic Inference accelerators eia.medium, eia.large, and eia.xlarge. We produced the following table:

Client instance type  Elastic Inference accelerator type  Cost per hour  Infer latency [ms]  Cost per 100k inferences
m5.large              eia.medium                          $0.23          353.53              $2.22
m5.large              eia.large                           $0.36          222.78              $2.20
m5.large              eia.xlarge                          $0.62          140.96              $2.41
m5.xlarge             eia.medium                          $0.32          357.70              $3.20
m5.xlarge             eia.large                           $0.45          224.81              $2.82
m5.xlarge             eia.xlarge                          $0.71          150.29              $2.97
m5.2xlarge            eia.medium                          $0.51          350.38              $5.00
m5.2xlarge            eia.large                           $0.64          229.65              $4.11
m5.2xlarge            eia.xlarge                          $0.90          142.55              $3.58
m5.4xlarge            eia.medium                          $0.90          355.53              $8.87
m5.4xlarge            eia.large                           $1.03          222.53              $6.35
m5.4xlarge            eia.xlarge                          $1.29          149.17              $5.34

Looking at the client instance types paired with the eia.medium accelerator (the first row for each client instance in the preceding table), you see similar latencies, from about 350 ms to 358 ms. This means that there isn’t much client-side processing, so going to a larger client instance does not improve performance. You can save on cost by choosing a smaller instance.

Similarly, looking at client instances using the largest accelerator, eia.xlarge (the last row for each client instance), there isn’t a noticeable performance difference. This means that you can stick with the m5.large client instance type, achieve similar performance, and pay less. For information about setting up different client instance types, see Launch accelerators in minutes with the Amazon Elastic Inference setup tool for Amazon EC2.
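
The Cost per 100k inferences column can be derived from the other two columns: it is the hourly price multiplied by the hours needed to run 100,000 sequential inferences at the measured latency. The sketch below checks this for the m5.large rows. It assumes the unrounded prices quoted elsewhere in this post ($0.096/hour for m5.large; $0.13, $0.26, and $0.52/hour for the three accelerators); the function name is illustrative.

```python
def cost_per_100k(cost_per_hour, latency_ms, inferences=100_000):
    """Dollar cost of running `inferences` back-to-back inferences."""
    hours = inferences * latency_ms / 1000 / 3600
    return cost_per_hour * hours


M5_LARGE = 0.096  # $/hour

# accelerator: (accelerator $/hour, measured latency in ms)
M5_LARGE_ROWS = {
    "eia.medium": (0.13, 353.53),
    "eia.large": (0.26, 222.78),
    "eia.xlarge": (0.52, 140.96),
}

for accelerator, (price, latency_ms) in M5_LARGE_ROWS.items():
    total = cost_per_100k(M5_LARGE + price, latency_ms)
    print(f"m5.large + {accelerator:<10} ${total:.2f} per 100k inferences")
```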

Comparing M5, P2, P3, and EIA instances

Plotting the data that you’ve collected from runs on different instance types, you can see that GPU performed better than CPU (as expected). EC2 P3 instances are about 3.24x faster than EC2 P2 instances (61.04 ms versus 197.68 ms inference latency). Before this, you had to choose between P2 and P3. Now, Elastic Inference gives you another choice, with more granularity at a lower cost.

Based on instance cost per hour (us-west-2 for EIA and EC2), the m5.2xlarge + eia.medium costs in between the P2 and P3 instance costs (see the following table) for the TensorFlow EIPredictor example. When factoring the cost to perform 100,000 inferences, you can see that the P2 and P3 have a similar cost, while with m5.large+eia.large, you achieve nearly P2 performance at less than half the price!

Instance type         Cost per hour  Infer latency [ms]  Cost per 100k inferences
m5.4xlarge            $0.77          415.87              $8.87
c5.4xlarge            $0.68          363.45              $6.87
p2.xlarge             $0.90          197.68              $4.94
p3.2xlarge            $3.06          61.04               $5.19
m5.large+eia.large    $0.36          222.78              $2.20
m5.large+eia.xlarge   $0.62          140.96              $2.41

Comparing inference latency

Now that you’ve decided on an m5.large client instance type, you can look at the accelerator types. Inference latency improves from 222.78 ms with the eia.large to 140.96 ms with the eia.xlarge. This shows that the Elastic Inference accelerators provide options between P2 and P3 in terms of latency, at a lower cost.

Comparing inference cost efficiency

The last column in the preceding table, Cost per 100k inferences, shows the cost efficiency of each combination. The m5.large + eia.large combination has the best cost efficiency, with 55% to 75% savings compared to the P2/P3 and m5.4xlarge instances.

The m5.large + eia.xlarge combination provides a 2.95x speed increase over m5.4xlarge (CPU only) with 73% savings, and a 1.4x speedup over p2.xlarge with 51% savings.
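
These speedup and savings percentages come straight from the latency and cost columns of the preceding table. A small sketch of the calculation, with illustrative helper names:

```python
def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def savings_pct(baseline_cost, candidate_cost):
    """Percentage saved per 100k inferences relative to the baseline."""
    return (1 - candidate_cost / baseline_cost) * 100


# (latency in ms, cost per 100k inferences in $) from the table above.
M5_4XLARGE = (415.87, 8.87)
P2_XLARGE = (197.68, 4.94)
M5_LARGE_EIA_LARGE = (222.78, 2.20)
M5_LARGE_EIA_XLARGE = (140.96, 2.41)

for name, (latency, cost) in [("m5.4xlarge", M5_4XLARGE), ("p2.xlarge", P2_XLARGE)]:
    print(
        f"m5.large+eia.xlarge vs {name}: "
        f"{speedup(latency, M5_LARGE_EIA_XLARGE[0]):.2f}x faster, "
        f"{savings_pct(cost, M5_LARGE_EIA_XLARGE[1]):.0f}% cheaper"
    )
```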


Here’s what we’ve found so far:

  • Combining Elastic Inference accelerators with any client EC2 instance type enables users to choose the amount of client compute, memory, etc. with a configurable amount of GPU memory and compute.
  • Elastic Inference accelerators provide a range of memory and GPU acceleration options at a lower cost.
  • Elastic Inference accelerators can achieve a better cost efficiency than M5, C5, and P2/P3 instances.

In our analysis, we found that using Elastic Inference within TensorFlow is as simple as creating and calling an EIPredictor object. This allowed us to use largely the same test notebook on CPU, GPU, and CPU+EIA environments with TensorFlow, which eased testing and performance analysis.

We started with a FasterRCNN-ResNet50 model running on an m5.4xlarge instance with a 415.87 ms inference latency. We were able to reduce it to 140.96 ms by migrating to an m5.large and eia.xlarge, resulting in a 2.95x increase in speed with a $0.15 hourly cost savings to top it off. We also found that we could achieve a $0.41 hourly cost savings with an m5.large and eia.large and still get better performance (223 ms vs. 416 ms).


Try out TensorFlow on Elastic Inference and see how much you can save while still improving performance for inference on your model. Here are the steps we went through to analyze the design space for deep learning inference, which you can also follow for your own model:

  1. Write a test script or notebook to analyze inference performance for CPU context.
  2. Create copies of the script with tweaks for GPU and EIA.
  3. Run scripts on M5, P2, and P3 instance types and get a baseline for performance.
  4. Analyze the performance.
    1. Start with the largest Elastic Inference accelerator type and large client instance type.
    2. Work backwards until you find a combo that is too small.
  5. Introduce cost efficiency to the analysis by computing cost to perform 100k inferences. 
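
For steps 1 through 3, a small timing harness is enough to get comparable numbers across environments. The following is a minimal sketch, not the notebook we used: measure_latency_ms and fake_predict are hypothetical names, and fake_predict stands in for whatever call runs your model (for example, a predictor object on CPU, GPU, or EIA).

```python
import statistics
import time


def measure_latency_ms(predict, payload, warmup=5, runs=20):
    """Return (mean, standard deviation) of predict(payload) latency in ms."""
    # Warm-up calls are excluded so one-time setup cost does not skew the mean.
    for _ in range(warmup):
        predict(payload)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def fake_predict(payload):
    """Placeholder for a real inference call; sleeps briefly instead."""
    time.sleep(0.001)
    return payload


mean_ms, stdev_ms = measure_latency_ms(fake_predict, "image-bytes", warmup=2, runs=5)
print("mean %.2f ms, stdev %.2f ms" % (mean_ms, stdev_ms))
```

Keeping the harness identical across instance types means that only the predict callable changes between the CPU, GPU, and EIA copies of the script.
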

About the authors

Cory Pruce is a Software Development Engineer with AWS AI TensorFlow. He works on building AWS services in AI space, specifically using TensorFlow. In his free time, he likes participating in Data Science/Machine Learning competitions, learning about state-of-the-art techniques, and working on projects.

Srinivas Hanabe is a Principal Product Manager with AWS AI for Elastic Inference. Prior to this role, he was the PM lead for Amazon VPC. Srinivas loves running long distance, reading books on a variety of topics, spending time with his family, and is a career mentor.


Deep learning (DL) frameworks enable machine learning (ML) practitioners to build and train ML models. However, the process of deploying ML models in production to serve predictions (also known as inferences) in real time is more complex. It requires that ML practitioners build a scalable and performant model server, which can host these models and handle inference requests at scale.

Model Server for Apache MXNet (MMS) was developed to address this hurdle. MMS is a highly scalable, production-ready inference server. It was designed in an ML/DL framework-agnostic way, so it can host models trained in any ML/DL framework.

In this post, we showcase how you can use MMS to host a model trained using any ML/DL framework or toolkit in production. We chose Amazon SageMaker for production hosting. This PaaS solution does a lot of heavy lifting to provide infrastructure and allows you to focus on your use cases.

For this solution, we use the approach outlined in Bring your own inference code with Amazon SageMaker hosting. That approach packages your models together with all necessary dependencies, libraries, frameworks, and other components into a single custom-built Docker container, which you then host on Amazon SageMaker.

To showcase the ML/DL framework-agnostic architecture of MMS, we chose to launch a model trained with the PaddlePaddle framework into production. The steps for taking a model trained on any ML/DL framework to Amazon SageMaker using an MMS bring your own (BYO) container are illustrated in the following diagram:

As this diagram shows, you need two main components to bring your ML/DL framework to Amazon SageMaker using an MMS BYO container:

  1. Model artifacts/model archive: These are all the artifacts required to run your model on a given host.
    • Model files: Usually symbols and weights. They are the artifacts of training a model.
    • Custom service file: Contains the entry point that is called every time an inference request is received and served by MMS. This file contains the logic to initialize the model in a particular ML/DL framework, preprocess the incoming request, and run inference. It also contains the post-processing logic that takes the data coming out of the framework’s inference method and converts it to end-user consumable data.
    • MANIFEST: The interface between the custom service file and MMS. This file is generated by running a tool called the model-archiver, which comes as a part of the MMS distribution.
  2. Container artifact: To load and run a model written in a custom DL framework on Amazon SageMaker, bring a container to be run on Amazon SageMaker. In this post, we show you how to use the MMS base container and extend it to support custom DL frameworks and other model dependencies. The MMS base container is a Docker container that comes with a highly scalable and performant model server, which is readily launchable in Amazon SageMaker.

In the following sections, we describe each of the components in detail.

Preparing a model

The MMS container is ML/DL framework agnostic. Write models in an ML/DL framework of your choice and bring them to Amazon SageMaker with an MMS BYO container to get the features of scalability and performance. We show you how to prepare a PaddlePaddle model in the following sections.

Preparing model artifacts

Use the Understand Sentiment example that is available and published in the examples section of the PaddlePaddle repository.

First, create a model following the instructions provided in the PaddlePaddle/book repository. Download the container and run the training using the notebook provided as part of the example. We used the Stacked Bidirectional LSTM network for training, and trained the model for 100 epochs. At the end of this training exercise, we got the following list of trained model artifacts.

$ ls
embedding_0.w_0    fc_2.w_0    fc_5.w_0    learning_rate_0    lstm_3.b_0    moment_10    moment_18    moment_25    moment_32    moment_8
embedding_1.w_0    fc_2.w_1    fc_5.w_1    learning_rate_1    lstm_3.w_0    moment_11    moment_19    moment_26    moment_33    moment_9
fc_0.b_0    fc_3.b_0    fc_6.b_0    lstm_0.b_0    lstm_4.b_0    moment_12    moment_2    moment_27    moment_34
fc_0.w_0    fc_3.w_0    fc_6.w_0    lstm_0.w_0    lstm_4.w_0    moment_13    moment_20    moment_28    moment_35
fc_1.b_0    fc_3.w_1    fc_6.w_1    lstm_1.b_0    lstm_5.b_0    moment_14    moment_21    moment_29    moment_4
fc_1.w_0    fc_4.b_0    fc_7.b_0    lstm_1.w_0    lstm_5.w_0    moment_15    moment_22    moment_3    moment_5
fc_1.w_1    fc_4.w_0    fc_7.w_0    lstm_2.b_0    moment_0    moment_16    moment_23    moment_30    moment_6
fc_2.b_0    fc_5.b_0    fc_7.w_1    lstm_2.w_0    moment_1    moment_17    moment_24    moment_31    moment_7

These artifacts constitute a PaddlePaddle model.

Writing custom service code

You now have the model files required to host the model in production. To take this model into production with MMS, provide a custom service script that knows how to use these files. This script must also know how to pre-process the raw request coming into the server and how to post-process the responses coming out of the PaddlePaddle framework’s infer method.

Create a custom service file called paddle_sentiment_analysis.py. Here, define a class called PaddleSentimentAnalysis that contains methods to initialize the model and also defines pre-processing, post-processing, and inference methods. The skeleton of this file is as follows:

$ cat paddle_sentiment_analysis.py

# import ... (framework imports are omitted from this skeleton)

class PaddleSentimentAnalysis(object):
    def __init__(self):
        # Set to True by initialize(); checked by handle() below.
        self.initialized = False

    def initialize(self, context):
        """Initialize the network and read other artifacts."""
        self.initialized = True

    def preprocess(self, data):
        """Convert the string requests coming from the client into tensors."""

    def inference(self, input):
        """Run the tensors created in preprocess through the DL framework's
        infer method."""

    def postprocess(self, output, data):
        """Convert the values returned from inference into a
        human-understandable response."""

_service = PaddleSentimentAnalysis()

def handle(data, context):
    """Entry point ("handler") that is used by MMS.

    Any request coming in for this model is sent to this function.
    """
    if not _service.initialized:
        _service.initialize(context)

    if data is None:
        return None

    pre = _service.preprocess(data)
    inf = _service.inference(pre)
    ret = _service.postprocess(inf, data)
    return ret

To understand the details of this custom service file, see paddle_sentiment_analysis.py. This custom service code file allows you to tell MMS what the lifecycle of each inference request should look like. It also defines how a trained model-artifact can initialize the PaddlePaddle framework.

Now that you have the trained model artifacts and the custom service file, create a model-archive that can be used to create your endpoint on Amazon SageMaker.

Creating a model-artifact file to be hosted on Amazon SageMaker

To load this model in Amazon SageMaker with an MMS BYO container, do the following:

  1. Create a MANIFEST file, which is used by MMS as a model’s metadata to load and run the model.
  2. Add the custom service script created earlier and the trained model-artifacts, along with the MANIFEST file, to a .tar.gz file.

Use the model-archiver tool to do this. Before you use the tool to create a .tar.gz artifact, put all the model artifacts in a separate folder, including the custom service script mentioned earlier. To ease this process, we have made all the artifacts available for you. Run the following commands:

$ curl https://s3.amazonaws.com/model-server/blog_artifacts/PaddlePaddle_blog/artifacts.tgz | tar zxvf -
$ ls -R artifacts/sentiment
paddle_artifacts        paddle_sentiment_analysis.py    word_dict.pickle
embedding_0.w_0    fc_2.b_0    fc_4.w_0    fc_7.b_0    lstm_1.b_0    lstm_4.w_0    moment_12    moment_19    moment_25    moment_31    moment_6
embedding_1.w_0    fc_2.w_0    fc_5.b_0    fc_7.w_0    lstm_1.w_0    lstm_5.b_0    moment_13    moment_2    moment_26    moment_32    moment_7
fc_0.b_0    fc_2.w_1    fc_5.w_0    fc_7.w_1    lstm_2.b_0    lstm_5.w_0    moment_14    moment_20    moment_27    moment_33    moment_8
fc_0.w_0    fc_3.b_0    fc_5.w_1    learning_rate_0    lstm_2.w_0    moment_0    moment_15    moment_21    moment_28    moment_34    moment_9
fc_1.b_0    fc_3.w_0    fc_6.b_0    learning_rate_1    lstm_3.b_0    moment_1    moment_16    moment_22    moment_29    moment_35
fc_1.w_0    fc_3.w_1    fc_6.w_0    lstm_0.b_0    lstm_3.w_0    moment_10    moment_17    moment_23    moment_3    moment_4
fc_1.w_1    fc_4.b_0    fc_6.w_1    lstm_0.w_0    lstm_4.b_0    moment_11    moment_18    moment_24    moment_30    moment_5

Now you are ready to create the artifact required for hosting in Amazon SageMaker, using the model-archiver tool. The model-archiver tool is a part of the MMS toolkit. To get this tool, run these commands in a Python virtual environment because it provides isolation from the rest of the working environment.

The model-archiver tool comes preinstalled when you install mxnet-model-server.

# Create python virtual environment
$ virtualenv py
$ source py/bin/activate
# Let's install the model-archiver tool in the python virtual environment
(py) $ pip install model-archiver
# Run the model-archiver tool to generate a model .tar.gz, which can be readily hosted
# on SageMaker
(py) $ mkdir model-store
(py) $ model-archiver -f --model-name paddle_sentiment \
--handler paddle_sentiment_analysis:handle \
--model-path artifacts/sentiment --export-path model-store --archive-format tgz

This generates a file called sentiment.tar.gz in the model-store directory that you just created. This file contains all the artifacts of the models and the manifest file.

(py) $ ls model-store

You now have all the model artifacts that can be hosted on Amazon SageMaker. Next, look at how to build a container and bring it into Amazon SageMaker.

Building your own BYO container with MMS

In this section, you build your own MMS-based container (also known as a BYO container) that can be hosted in Amazon SageMaker.

To help with this process, every released version of MMS comes with corresponding MMS base CPU and GPU containers hosted on DockerHub, which can be hosted on Amazon SageMaker.

For this example, use a container tagged awsdeeplearningteam/mxnet-model-server:base-cpu-py3.6. To host the model created in the earlier section, install the PaddlePaddle and numpy packages in the container. Create a Dockerfile that extends from the base MMS image and installs the Python packages. The artifacts that you downloaded earlier come with the sample Dockerfile necessary to install required packages:

(py) $ cat artifacts/Dockerfile.paddle.mms
FROM awsdeeplearningteam/mxnet-model-server:base-cpu-py3.6

RUN pip install --user -U paddlepaddle \
    && pip install --user -U numpy

Now that you have the Dockerfile that describes your BYO container, build it:

(py) $ cd artifacts && docker build -t paddle-mms -f Dockerfile.paddle.mms .
# Verify that the image is built
(py) $ docker images
REPOSITORY      TAG        IMAGE ID            CREATED             SIZE
paddle-mms     latest     864796166b63        1 minute ago        1.62GB

You have the BYO container with all of the model artifacts in it, and you’re ready to launch it in Amazon SageMaker.

Creating an Amazon SageMaker endpoint with the PaddlePaddle model

In this section, you create an Amazon SageMaker endpoint in the console using the artifacts created earlier. We also provide an interactive Jupyter Notebook example of creating an endpoint using the Amazon SageMaker Python SDK and AWS SDK for Python (Boto3). The notebook is available on the mxnet-model-server GitHub repository.

Before you create an Amazon SageMaker endpoint for your model, do some preparation:

  1. Upload the model archive sentiment.tar.gz created earlier to an Amazon S3 bucket. For this post, we uploaded it to an S3 bucket called paddle_paddle.
  2. Upload the container image created earlier, paddle-mms, to an Amazon ECR repository. For this post, we created an ECR repository called “paddle-mms” and uploaded the image there.

Creating the Amazon SageMaker endpoint

Now that the model and container artifacts are uploaded to S3 and ECR, you can create the Amazon SageMaker endpoint. Complete the following steps:

  1. Create a model configuration.
  2. Create an endpoint configuration.
  3. Create a user endpoint.
  4. Test the endpoint.

Create a model configuration

First, create a model configuration.

  1. On the Amazon SageMaker console, choose Models, Create model.
  2. Provide values for Model name, IAM role, location of inference code image (or the ECR repository), and Location of model artifacts (which is the S3 bucket where the model artifact was uploaded).

  3. Choose Create Model.

Create endpoint configuration

After you create the model configuration, create an endpoint configuration.

  1. In the left navigation pane, choose Endpoint Configurations, Create endpoint configuration.
  2. Give an endpoint configuration name, choose Add model, and add the model that you created earlier. Then choose Create endpoint configuration.

The final step is creating an endpoint for users to send inference requests to.

Create user endpoint

  1. In the left navigation pane, choose Endpoints, Create endpoint.
  2. For Endpoint name, enter a value such as sentiment and select the endpoint configuration that you created earlier.
  3. Choose Select endpoint configuration, Create endpoint.

You have created an endpoint called “sentiment” on Amazon SageMaker with an MMS BYO container to host a model built with the PaddlePaddle DL framework.
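
If you prefer to script these three console steps, the same sequence can be expressed with the AWS SDK for Python (Boto3). The following is a minimal sketch rather than the notebook mentioned earlier: build_requests and deploy are hypothetical helper names, and the image URI, S3 path, role ARN, and instance type shown are placeholders that you must replace with your own values.

```python
def build_requests(name, image_uri, model_data_url, role_arn, instance_type="ml.m4.xlarge"):
    """Build the keyword arguments for the three SageMaker API calls."""
    config_name = name + "-config"
    model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
    }
    endpoint_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    }
    endpoint = {"EndpointName": name, "EndpointConfigName": config_name}
    return model, endpoint_config, endpoint


def deploy(sagemaker_client, model, endpoint_config, endpoint):
    """Create the model, the endpoint configuration, and the endpoint, in that order."""
    sagemaker_client.create_model(**model)
    sagemaker_client.create_endpoint_config(**endpoint_config)
    sagemaker_client.create_endpoint(**endpoint)


# Placeholder values for illustration only.
requests = build_requests(
    name="sentiment",
    image_uri="<account-id>.dkr.ecr.<region>.amazonaws.com/paddle-mms:latest",
    model_data_url="s3://<your-bucket>/sentiment.tar.gz",
    role_arn="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
)
# To run against your account:
#   import boto3
#   deploy(boto3.client("sagemaker"), *requests)
```

Separating request construction from the API calls lets you review exactly what will be sent before creating billable resources.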

Now test this endpoint and make sure that it can indeed serve inference requests.

Testing the endpoint

Create a simple test client using the Boto3 library. Here is a small test script that sends a payload to the Amazon SageMaker endpoint and retrieves its response:

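$ cat paddle_test_client.py

import boto3

runtime = boto3.Session().client(service_name='sagemaker-runtime', region_name='us-east-1')

# Name of the endpoint created in the previous section
endpoint_name = "sentiment"
payload = "This is an amazing movie."
# ContentType is an assumed value; match it to what your custom service expects.
response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType='application/text',
                                   Body=payload)
print(response['Body'].read())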

The corresponding output from running this script is as follows:

b'Prediction : This is a Positive review'

In this post, we showed you how to build and host a PaddlePaddle model on Amazon SageMaker using an MMS BYO container. This flow can be reused with minor modifications to build BYO containers serving inference traffic on Amazon SageMaker endpoints with MMS for models built using many ML/DL frameworks, not just PaddlePaddle.

For a more interactive example to deploy the above PaddlePaddle model into Amazon SageMaker using MMS, see Amazon SageMaker Examples. To learn more about the MMS project, see the mxnet-model-server GitHub repository.

About the Authors

Vamshidhar Dantu is a Software Developer with AWS Deep Learning. He focuses on building scalable and easily deployable deep learning systems. In his spare time, he enjoys spending time with family and playing badminton.

Denis Davydenko is an Engineering Manager with AWS Deep Learning. He focuses on building Deep Learning tools that enable developers and scientists to build intelligent applications. In his spare time he enjoys spending time with his family, playing poker and video games.


Listening to your Microsoft Word documents as audio is a great way to save time or to be productive on a long commute. You can easily convert an entire block of text into MP3 format with Amazon Polly. But you can vastly improve your listening experience with just a few simple steps.

In this blog post, I show how you can use a serverless workflow to convert your Word documents into MP3 playlists using AWS Lambda and Amazon Polly.

To review a Word document that I needed to listen to, I converted the whole document to one block of text, then converted it to MP3 using Amazon Polly. After listening, I realized that a long, single-voice MP3 file results in a monotonous stream of audio.

Next, I split the document into small parts and processed each part with a different voice and cadence. This process added audio cues to keep me engaged while listening. I came up with the following serverless architecture that takes in a Microsoft Word document and generates MP3 files and an ordered M3U playlist file. I can download my list and listen to the Word document as an audio playlist anywhere!

Solution overview

The following diagram shows the architecture of this solution.

The following steps generate the MP3 files and playlist:

  1. Upload the Word document to the Project bucket at /src.
  2. On upload, a PUT object event triggers the Word to SSML AWS Lambda function.
  3. The Lambda function splits the document into multiple SSML files, assigns a VoiceId tag to each file, and saves them to the project bucket at /ssml.
  4. Several PUT object events in the /ssml key trigger the Amazon Polly SSML to MP3 Lambda function, which starts an Amazon Polly task to convert the SSML document into an MP3 file. The Amazon Polly task then saves the MP3 file in Amazon S3, and the file metadata is saved to the MP3 metadata table in Amazon DynamoDB.
  5. After Amazon Polly completes its tasks, invoke the m3u builder Lambda function to generate the m3u playlist file and save it to the Project bucket.
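
Step 4 can be sketched in a few lines. The project's Lambda functions are not shown in this post, so treat the following as an illustration under assumptions: the helper names are hypothetical, the voice is passed in directly rather than read from the VoiceId tag, and the output prefix convention (swapping ssml/ for mp3/) is a guess based on the paths used later in this post. The event layout and the Amazon Polly parameter names are the standard ones.

```python
from urllib.parse import unquote_plus


def ssml_objects_from_event(event):
    """Yield (bucket, key) for each object in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def polly_task_params(bucket, ssml_key, ssml_text, voice_id):
    """Build keyword arguments for Polly's start_speech_synthesis_task call."""
    # Assumed convention: .../ssml/001.ssml is written under .../mp3/001
    prefix = ssml_key.replace("/ssml/", "/mp3/", 1).rsplit(".", 1)[0]
    return {
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,
        "OutputS3KeyPrefix": prefix,
        "Text": ssml_text,
        "TextType": "ssml",
        "VoiceId": voice_id,
    }


sample_event = {"Records": [{"s3": {
    "bucket": {"name": "my-project-bucket"},
    "object": {"key": "polly-faq-reader/ssml/001.ssml"},
}}]}

for bucket, key in ssml_objects_from_event(sample_event):
    print(polly_task_params(bucket, key, "<speak>Hello</speak>", "Joanna"))
```

In a real handler, you would read the SSML text from S3 and pass the resulting dictionary to a Boto3 Polly client with start_speech_synthesis_task(**params).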

The following table shows the solution components and describes how they are used.

Resource | Type | Description
Project bucket | S3 bucket | S3 bucket used for storing the Word document before processing, the generated SSML files, the generated MP3 files, and the M3U playlist file. Event notifications on the bucket trigger various Lambda functions.
Word to SSML | Lambda function | A Lambda function that uses the Java 8 runtime to take in a Word document and split it into several SSML documents based on the contained sections, topics, and paragraphs in the document. The S3 bucket stores the SSML documents, with each file assigned a VoiceId tag used later by the Amazon Polly SSML to MP3 Lambda function.
Amazon Polly SSML to MP3 | Lambda function | A Lambda function that takes one SSML file in S3 and converts it to MP3 using an Amazon Polly voice that matches the assigned VoiceId. It then stores the MP3 files in the Project bucket. It also saves the metadata of processed files and the corresponding Amazon Polly tasks to a DynamoDB table.
MP3 metadata | DynamoDB table | A DynamoDB table that stores the metadata of processed SSML files and corresponding Amazon Polly tasks.
M3U builder | Lambda function | A Lambda function that processes the metadata in the MP3 metadata table database, generates a correctly ordered M3U playlist file, and stores it in the Project bucket.

Building the Word to SSML Lambda function

I used Apache POI to read the Word document and split it into several small SSML files. I provide an extensible implementation that works for any three-level document that contains a set of sections, each containing a set of topics, and each of those topics containing a set of paragraphs.

I used the public Amazon Polly FAQs as an example document, which uses categories of the FAQ (for example, general, billing, data privacy) as the sections. Those sections divide into individual questions for the topics, and into individual answers for the paragraphs.

This same model generally applies to any three-level document: The user supplies a way to identify sections and topics. The default implementation extracts the sections from text with the Heading 1 Word style and identifies topics by recognizing the question mark character in the sentence.
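
The default rule is easy to picture in code. The project implements it in Java with Apache POI, so the following Python is only a sketch of the same idea: it assumes the document has already been read into (style name, text) pairs, and the function names and sample paragraphs are made up for illustration.

```python
from xml.sax.saxutils import escape


def split_document(paragraphs):
    """Group (style, text) pairs into sections, topics, and paragraphs."""
    sections = []
    for style, text in paragraphs:
        text = text.strip()
        if not text:
            continue
        if style == "Heading 1":
            # A Heading 1 paragraph starts a new section.
            sections.append({"title": text, "topics": []})
        elif "?" in text:
            # A question mark marks the start of a new topic.
            if not sections:
                sections.append({"title": "", "topics": []})
            sections[-1]["topics"].append({"question": text, "paragraphs": []})
        elif sections and sections[-1]["topics"]:
            sections[-1]["topics"][-1]["paragraphs"].append(text)
        # Body text that appears before the first topic is skipped in this sketch.
    return sections


def to_ssml(text):
    """Wrap plain text in a minimal SSML document, escaping XML characters."""
    return "<speak>%s</speak>" % escape(text)


sample = [
    ("Heading 1", "General"),
    ("Normal", "What is Amazon Polly?"),
    ("Normal", "Amazon Polly is a service that turns text into speech."),
    ("Heading 1", "Billing"),
    ("Normal", "How much does it cost?"),
    ("Normal", "See the pricing page."),
]

for section in split_document(sample):
    for topic in section["topics"]:
        print(section["title"], "|", to_ssml(topic["question"]))
```

Each section, question, and answer can then be written out as its own SSML file and given a different voice.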


You need a few tools to follow the steps in this post:

  • OpenJDK 8 and Apache Maven 3.5: The Word to SSML Lambda function uses the Java 8 runtime and uses Apache Maven for packaging. Install OpenJDK version 8 or higher and Maven version 3.5 or higher. I tested this solution with Maven version 3.5.0 and OpenJDK Runtime Environment Corretto-
  • AWS Command Line Interface: Some of the instructions assume that you have a working AWS CLI version to execute the test steps.
  • S3 bucket: Lambda functions can only use artifacts from an S3 bucket in the Region in which you choose to deploy your solution. Choose a bucket to reuse, or create a bucket by running the following command:
    aws s3 mb s3://<PROJECT-BUCKET> --region <REGION>

Deployment steps

Follow these steps to deploy your tool.

  1. Clone the GitHub repository for the project.
    git clone https://github.com/aws-samples/amazon-polly-mp3-for-microsoft-word.git
  2. Export the AWS Region, project S3 bucket, and AWS CloudFormation stack name as environment variables for convenience.
    export PROJECT_BUCKET=<your-project-bucket>
    export REGION=<your-region> 
    export STACK_NAME=polly-stack
  3. Change to the project directory and execute the deploy_lambda_cloudformation.sh script, providing your chosen AWS Region, S3 bucket, and name for your CloudFormation stack. This script performs the following actions:
    1. Packages the three Lambda functions and copies them to your S3 bucket.
    2. Copies the CloudFormation template to your S3 bucket.
    3. Deploys the stack with the chosen name.
    4. Waits until AWS CloudFormation successfully creates the stack. This should take approximately two minutes.
    5. Updates the bucket notifications template (scripts/bucket_lambda_notification.json) with values from the stack output.
    6. Adds event notifications to the S3 bucket.
      cd amazon-polly-mp3-for-microsoft-word
      bash scripts/deploy_lambda_cloudformation.sh $REGION $PROJECT_BUCKET $STACK_NAME
  4. [Optional] After the script executes, in the AWS CloudFormation console, verify that the stack deployed and is in CREATE_COMPLETE status.
  5. In the S3 console, verify that the bucket contains your event notifications. The first notification, on the polly-faq-reader/src/ path, invokes the Word to SSML Lambda function when a new DOCX file is uploaded to this path. This Lambda function generates several SSML text files and uploads them to the polly-faq-reader/ssml/ path. A notification set up on this path then invokes the Amazon Polly SSML to MP3 Lambda function. The following screenshot shows sample events.
  6. Now you’re ready to test the MP3 conversion. Copy the demo/src/polly-faq.docx to the Project bucket at polly-faq-reader/src/. This triggers the Lambda functions to generate SSML and MP3 files.
    aws s3 cp demo/src/polly-faq.docx s3://${PROJECT_BUCKET}/polly-faq-reader/src/
  7. List the polly-faq-reader/ prefix in the S3 bucket and verify that it now contains new ssml/ and mp3/ directories.
    aws s3 ls s3://$PROJECT_BUCKET/polly-faq-reader/
                               PRE mp3/
                               PRE src/
                               PRE ssml/
  8. Wait about two minutes for the Amazon Polly tasks to complete. To confirm that MP3 conversion is complete, check that the number of files in the /ssml directory matches the number of /mp3 files.
    aws s3 ls s3://$PROJECT_BUCKET/polly-faq-reader/mp3/ | wc -l
    aws s3 ls s3://$PROJECT_BUCKET/polly-faq-reader/ssml/ | wc -l
  9. The tool builds an M3U playlist file to play all the generated MP3 files in the correct order. In your terminal, in the scripts directory, execute the invoke_m3u_builder.sh script providing your Region, bucket name, and name of your AWS CloudFormation stack.
    bash scripts/invoke_m3u_builder.sh $REGION ${PROJECT_BUCKET} ${STACK_NAME}
  10. Verify that a new polly-faq.m3u file is present in the S3 bucket at polly-faq-reader/mp3/.
    aws s3 ls s3://$PROJECT_BUCKET/polly-faq-reader/mp3/polly-faq.m3u
  11. Download the mp3 files and m3u playlist to your computer.
    cd <your-chosen-mp3-directory>
    aws s3 sync s3://$PROJECT_BUCKET/polly-faq-reader/mp3/ .
  12. Open the m3u playlist file in your preferred media player and listen to the files.
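
An M3U playlist is a plain text file that lists one media file per line, so the ordering step is simple to illustrate. The following is a minimal sketch of the idea, not the project's M3U builder: the metadata field names (order and mp3_key) are hypothetical, and the entries are written as bare file names on the assumption that the playlist sits next to the MP3 files, as it does in the mp3/ prefix above.

```python
def build_m3u(entries):
    """Return M3U playlist text for metadata entries, sorted by their order."""
    lines = ["#EXTM3U"]
    for entry in sorted(entries, key=lambda e: e["order"]):
        # Keep only the file name so the playlist works after a local sync.
        lines.append(entry["mp3_key"].rsplit("/", 1)[-1])
    return "\n".join(lines) + "\n"


metadata = [
    {"order": 2, "mp3_key": "polly-faq-reader/mp3/002-answer.mp3"},
    {"order": 1, "mp3_key": "polly-faq-reader/mp3/001-question.mp3"},
]
print(build_m3u(metadata))
```
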

Clean up

To clean up the deployment and avoid incurring future costs, follow these steps:

  1. In the S3 console, select your bucket and delete the two event notifications.
  2. In the AWS CloudFormation console, delete the polly-stack stack.
  3. If you no longer need the SSML or MP3 files, delete them. Run the following commands:
    aws s3 rm --recursive s3://$PROJECT_BUCKET/polly-faq-reader/ssml/
    aws s3 rm --recursive s3://$PROJECT_BUCKET/polly-faq-reader/mp3/

In this post, I demonstrated a serverless workflow to convert Microsoft Word documents into an MP3 audio playlist using Amazon Polly and AWS Lambda.

To dig deeper into the code, check out the GitHub repository and create issues for providing feedback or suggesting enhancements. Open-source code contributions are welcome as pull requests.

About the Author

Vinod Shukla is a Partner Solutions Architect at Amazon Web Services. As part of the AWS Quick Starts team, he enjoys working with partners providing technical guidance and assistance in building gold-standard reference deployments.


Amazon Transcribe is a fully-managed automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capabilities to applications. Depending on your use case, you may have domain-specific terminology that doesn’t transcribe properly (e.g. “EBITDA” or “myocardial infarction”). In this post, we will show you how to use the custom vocabulary feature, with custom pronunciations and custom display forms, to enhance transcription accuracy of domain-specific words or phrases that are relevant to your use case.

Custom vocabulary is a powerful feature that helps users transcribe terms that would otherwise not be part of our general ASR service. For instance, your use case may involve brand names or proper names that are not normally part of a language model’s regular lexicon, like in the case of “Hogwarts”. In this case, it would not only be helpful to be able to add the custom text, but also be able to inform our ASR service on the pronunciation to help our system better recognize unfamiliar terms. On a related note, perhaps you have a term, say, “Lotus” which is a brand name of a car. Naturally, we recognize “lotus” as a flower already. But for your use case, you’d like to have the word transcribed with proper capitalization in the context of recognizing it as a make or model of a vehicle. You can therefore use the recently added custom display forms to achieve this.

So, let’s walk through some examples of using both custom pronunciation and also custom display forms.

First, we’ve recorded a sample audio and stored it in an S3 bucket (this is a pre-requisite and can be achieved by following documentation). For reference, here’s the audio file’s ground truth transcript:

“Hi, my name is Paul. And I’m calling in about my order of two separate LEGO toys. The first one is the Harry Potter Hogwarts Castle that has a cool Grindelwald mini-fig. The second set is a model of the Lotus Elise car. I placed the orders on the same day. Can you tell me when they will be arriving please?”

As you can see, there are some very specific brand names and custom terms. Let’s see what happens when we pass the audio sample through Amazon Transcribe as is. First, let’s sign into the AWS Console and create a new transcription job:

Then, in the next screen, I’ll name my transcription job and reference the S3 bucket in which my sample audio is stored. I’ve selected the language model as US English and identified the file format as WAV. I’ll leave the sample rate blank as that’s optional. And also notice I deliberately left the custom vocabulary field blank, because we want to run a baseline transcription job without using the feature to see performance accuracy as is. I’ve left all of the remaining fields as default, since those are features we’re not interested in using for this baseline test. Then I’ll hit “Create Job” to initiate the transcription.

In the next screen you’ll see that the transcription job has completed with a preview window showing you the output text: “Hi. My name is Paul, and I’m calling in about my order of two separate Lego toys. The first one is the Harry Potter Hogwarts Castle that has a cool, Grendel walled many fig. The second set is a model of the lotus, At least car. I placed the orders on the same day. Can you tell me when they will be arriving, please? Thanks.”

Looks like the transcription output did pretty well overall, except it missed “Grindelwald”, “mini-fig”, and “Lotus Elise”. Additionally, it didn’t capture “LEGO” properly with full capitalization. No surprise, as these are pretty content-specific custom terms.
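
For longer recordings, you can find candidate terms for a custom vocabulary by comparing a reference transcript against the baseline output. The following is a rough sketch with hypothetical function names. It only lists reference words that never appear in the output, and it ignores case, so it flags “Grindelwald” but not the capitalization of “LEGO”.

```python
import re


def tokens(text):
    """Lowercase words, keeping internal hyphens and apostrophes."""
    return re.findall(r"[a-z0-9][a-z0-9'-]*", text.lower())


def missed_terms(reference, hypothesis):
    """Reference words, in order, that never appear in the hypothesis."""
    heard = set(tokens(hypothesis))
    missed = []
    for word in tokens(reference):
        if word not in heard and word not in missed:
            missed.append(word)
    return missed


reference = ("a cool Grindelwald mini-fig. "
             "The second set is a model of the Lotus Elise car.")
baseline = ("a cool, Grendel walled many fig. "
            "The second set is a model of the lotus, At least car.")

print(missed_terms(reference, baseline))
```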

So, let’s see how we can use the custom vocabulary feature’s custom pronunciation to enhance the transcription output. First, we need to prepare a vocabulary file, which not only lists the custom terms (Phrase), but also indicates the corresponding pronunciations.

Using any simple text editor, I am going to create a new custom vocabulary file. I then type in the terminology (Phrase), the corresponding pronunciation (either IPA, following International Phonetic Alphabet guidelines, or SoundsLike, using ordinary orthography), and then any output format of my preference (DisplayAs). In the text editor, I’ve turned on visible whitespace markers (the white bars) so that you can see where I typed a tab to leave a column blank. Here’s what the vocabulary text file looks like in my text editor. Notice I basically added any of the words that were missed in the baseline transcription. I’ll save the file as “paul-sample-vocab.”
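
If you would rather generate the file than type tabs by hand, a few lines of code can write the same tab-separated layout. This is a sketch under assumptions: the column order follows the order described above, the SoundsLike spelling is our own guess rather than the contents of the file in the screenshot, and you should check the Amazon Transcribe documentation for the exact rules on each column.

```python
COLUMNS = ("Phrase", "IPA", "SoundsLike", "DisplayAs")


def vocabulary_table(entries):
    """Render entries as a tab-separated table, leaving unset columns empty."""
    rows = ["\t".join(COLUMNS)]
    for entry in entries:
        rows.append("\t".join(entry.get(column, "") for column in COLUMNS))
    return "\n".join(rows) + "\n"


# Illustrative entries only; the pronunciation hint is a guess.
EXAMPLE_ENTRIES = [
    {"Phrase": "Grindelwald", "SoundsLike": "grin-dell-wald"},
    {"Phrase": "mini-fig", "DisplayAs": "mini-fig"},
    {"Phrase": "Lego", "DisplayAs": "LEGO"},
]

print(vocabulary_table(EXAMPLE_ENTRIES))
```

Every row keeps all four columns, with an empty string between tabs where no value is wanted.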

Now all I have to do is upload this text file in the Amazon Transcribe console and click “Create Vocabulary”:

We can confirm that the custom vocabulary was successfully generated, as it will be visible in the custom vocabulary list:

Ok, so now we can start another new transcription job for the same audio file, but this time, we’ll invoke the custom vocabulary text file to see the accuracy results. The process is the same as we had been through before, except this time we will actually designate a custom vocabulary “paul-sample-vocab”. And of course, I’ll name the transcription job something different from the first one, like “customer-call-with-vocabulary”:
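
The same two jobs could also be started from code rather than the console. The sketch below only builds the request parameters, so it runs without AWS credentials. The bucket name, the audio file name, and the baseline job name are placeholders that I made up for illustration.

```python
def build_transcription_request(job_name, media_uri, vocabulary_name=None):
    """Build keyword arguments for the Transcribe StartTranscriptionJob API."""
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",   # US English, as selected in the console
        "MediaFormat": "wav",      # the sample audio is a WAV file
        "Media": {"MediaFileUri": media_uri},
    }
    if vocabulary_name is not None:
        # Settings.VocabularyName attaches a custom vocabulary to the job.
        request["Settings"] = {"VocabularyName": vocabulary_name}
    return request


# Hypothetical bucket and key; replace with the location of your own audio.
SAMPLE_URI = "s3://my-example-bucket/paul-sample.wav"

# The baseline job name is a placeholder; the second name is from this post.
baseline = build_transcription_request("customer-call-baseline", SAMPLE_URI)
with_vocab = build_transcription_request(
    "customer-call-with-vocabulary", SAMPLE_URI, "paul-sample-vocab")

# With boto3 installed and credentials configured, the request could be
# submitted like this (commented out so the sketch runs offline):
# import boto3
# boto3.client("transcribe").start_transcription_job(**with_vocab)
print(with_vocab)
```

The only difference between the two requests is the Settings entry, which mirrors the single field we changed in the console.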

Let’s take a look at the new transcription results now!

Here’s the transcription output:

“Hi. My name is Paul, and I’m calling in about my order of two separate LEGO toys. The first one is the Harry Potter Hogwarts Castle. That has a cool Grindelwald mini-fig. The second set is a model of the Lotus Elise car. I placed the orders on the same day. Can you tell me when they will be arriving, please?”

We’ve not only correctly transcribed proper nouns such as “Grindelwald” but also custom terms like “mini-fig”, which are specific to LEGO toys. And look at that, we were also able to properly capitalize “LEGO” as the brand spells it, along with proper casing for “Lotus Elise”.

Custom vocabulary should be used in a targeted manner, meaning that the more specific a list of terms is when applied to specific audio recordings, the better the transcription result. We don’t recommend flooding a single vocabulary file with more than 300 words. The feature is available in all regions where Transcribe is available today. Refer to the Region Table to see the full list.

For more information, refer to the Amazon Transcribe technical documentation. If you have any questions, please leave them in the comments.

About the authors

Paul Zhao is a Product Manager at AWS Machine Learning. He manages the Amazon Transcribe service. Outside of work, Paul is a motorcycle enthusiast and avid woodworker.

Yibin Wang is a Software Development Engineer at Amazon Transcribe. Outside of work, Yibin likes to travel and explore new culinary experiences.


Remember the screech of the dial-up and plain-text websites? It was in that era that the Amazon.com website launched in the summer of 1995.

Like the rest of the web, Amazon.com has gone through a digital experience makeover that includes slick web controls, rich media, multi-channel support, and intelligent content placement.

Nonetheless, there are certain aspects of the experience that have remained relatively constant. Navigation for an online shopping experience still includes running searches, following recommendations, and textual navigation. However, with the democratization of IoT and AI, this is the moment for innovators to change the status quo.

Amazon, true to its culture of continuous innovation, has been experimenting with creating new customer experiences. Products like Echo Look use machine learning (ML) to allow a customer to ask “Alexa, how do I look?” Then Alexa gives you real-time feedback on your outfit, and you receive smart, specific, and fun styling advice.

In this blog post, I’ll show you how easy it is to create a shop-by-style experience. I’ll introduce you to AWS services that can put you on the right path for rapid experimentation and innovation of new customer experiences.

To demonstrate the shop-by-style experience, we’re going to use the product catalog from Zappos.com. The catalog consists of a variety of footwear, including shoes, boots, and sandals of various types, from a large selection of brands.

Footwear is a great example of where a shop-by-style experience could be helpful. If you’re like me, you don’t know exactly what you’re looking for when you walk into a shoe store. Maybe you have some general preferences like color or a brand’s signature style, so you gravitate to specific selections on the shoe rack.

We can replicate this experience in the digital world with the help of machine learning. I’ll show you how you can deliver a quality experience quickly and economically with the help of the AWS Cloud.

The following animated GIF illustrates the concept. The large image displays the shopper’s current selection, and an ML model is used to identify six products from the catalog that are the most stylistically similar to the selection.

You can implement creative variations of this experience. For instance, your app could share products visually similar to those that the user’s favorite celebrity wears, or your app could use stylistic similarity as one of the features that influence product recommendations.

You can deploy this prototype to your AWS account using AWS CloudFormation by using the following link:

Solution architecture

Our minimalist solution architecture leverages the following AWS services:

Our solution makes use of Amazon SageMaker to manage the end-to-end process of building a deep learning (DL) model. We’ll use PyTorch, which is a DL framework favored by many for rapid prototyping. Together, PyTorch and Amazon SageMaker enable rapid development of a custom model tailored to our needs. However, depending on your preferences, Amazon SageMaker provides you with the choice of using other frameworks like TensorFlow, Keras, and Gluon.

Next, we’ll generate similarity scores using our model, store this data in our Amazon data lake, and use AWS Glue to catalog and transform the data so that it can be loaded into Amazon Neptune, a managed graph database.

Amazon Neptune provides us with a way to build graph visualizations to analyze the similarity between our products. It’s also designed to serve as an operational database, so it can back a website by providing low-latency queries under high concurrency.

We’ll build the rest of the website to be serverless using Amazon API Gateway, AWS Lambda, and Amazon S3. We want to maximize our time spent on creating a great web experience and minimize the time spent on managing servers.

Building a tailored image similarity model

Our journey starts with launching an Amazon SageMaker managed Notebook Instance where we implement PyTorch scripts to build, train, and deploy our deep learning model. Here is a link to a Jupyter notebook that will take you through the entire process. The notebook demonstrates the “Bring-Your-Own-Script” integration for PyTorch on Amazon SageMaker. In this example, we bring our own native PyTorch script that implements a Siamese network (model and training scripts located here).

A Siamese network is a type of neural network architecture and is one of a few common methods for creating a model that can learn the similarities between images. In our implementation, we leverage a pre-trained model provided by PyTorch based on ResNet-152.

ResNet-152 is a convolutional neural network (CNN) architecture famous for achieving superhuman-level accuracy on classifying images from ImageNet, an image database of over 14 million images. As the following illustration shows, ResNet-152 is a complex model that consists of 152 layers (of neurons) with over 60 million parameters.

A lot of computation is involved in training this model on ImageNet, so it normally takes hours to days depending on the training infrastructure.

It turns out that this model has a lot of “transferable knowledge” acquired from being trained on a large image dataset. The first image that follows is a visualization of the basic features, such as edges, that a CNN can extract in its early layers. The next two images illustrate how more complex features are learned and extracted in the deeper layers of a trained CNN like ResNet-152.

Intuitively, the pre-trained ResNet-152 model can be used as a feature extractor for images. We can inherit the properties of ResNet-152 through a technique called transfer learning. Transfer learning enables us to create a high-performing model with less data, fewer computational resources, and less time.

We’re going to take advantage of transfer learning. We do so by replacing the final pre-trained layer of the PyTorch ResNet-152 model with a new untrained extension of the model (which could simply be a single untrained layer). We then re-train this new model on the Zappos catalog while leaving the pre-trained layers immutable.

A dataset like Zappos50k, which has a single image of each of approximately fifty thousand unique products, will suffice for our example.

The Siamese network is trained on image pairs with target values, where zero represents a pair of identical images and values near or equal to one represent different images. In effect, the training process translates our images into numerical encodings of features, referred to as feature vectors, and discovers a dimensional space where the distance between these vectors represents similarity. Details about the Siamese network are illustrated in the following diagram.
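
To make the training objective more concrete, here is a minimal sketch of a contrastive loss, one common way to train a Siamese network on pairs labeled this way. I'm assuming this formulation for illustration only; the loss used in the provided training script may differ, and the margin value and the toy vectors are made up.

```python
import math


def euclidean_distance(vec_a, vec_b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


def contrastive_loss(vec_a, vec_b, target, margin=1.0):
    """Contrastive loss for one pair of feature vectors.

    target is 0.0 for an identical pair and 1.0 for a different pair.
    Similar pairs are penalized for being far apart; different pairs are
    penalized only while they are closer than the margin.
    """
    distance = euclidean_distance(vec_a, vec_b)
    pull_together = (1.0 - target) * distance ** 2
    push_apart = target * max(0.0, margin - distance) ** 2
    return pull_together + push_apart


# Toy two-dimensional feature vectors (real ones would come from the network).
anchor = [0.0, 0.0]
near = [0.3, 0.4]   # distance 0.5 from the anchor
far = [3.0, 4.0]    # distance 5.0 from the anchor

loss_same_near = contrastive_loss(anchor, near, target=0.0)
loss_diff_near = contrastive_loss(anchor, near, target=1.0)
loss_diff_far = contrastive_loss(anchor, far, target=1.0)
print(loss_same_near, loss_diff_near, loss_diff_far)
```

A different pair that is already farther apart than the margin contributes no loss, which is what lets the learned space spread dissimilar products out without pushing them apart indefinitely.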

Ultimately, this model will provide us a means to measure the visual similarity between product images in the Zappos50k dataset.

This model yielded good results for this scenario, but you should always consider your options. For example, using triplet loss, k-NN, or a clustering algorithm might be more suitable under certain circumstances. In the notebook that I’ve provided, I demonstrate an unconventional method that also yielded good results. The method is inspired by a DL technique called style transfer, which was first published in this research paper. The technique is generally used for artistic applications. For example, the technique could be used to synthesize an image of your home in the style of the artist Van Gogh by blending a photo of your home with Van Gogh’s Starry Night.

In the provided notebook, I demonstrate that the most important stylistic features of products in our catalog can be extracted through similar techniques to quantify the style of each product. In turn, we can measure the stylistic similarity between products in our catalog. The technique didn’t require additional model training to produce better results than k-NN search (using the same model with L1 and L2 distance). It is purely an inference technique and can use user input to adapt to varying opinions about style in real time. The notebook shows good results even with a simpler architecture like VGG-16 or ResNet-34 instead of ResNet-152. The following diagrams illustrate the concept.
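
For intuition, the style-transfer literature usually summarizes style with a Gram matrix of a layer's feature maps, which records how strongly pairs of channels activate together regardless of where in the image they fire. The sketch below shows that idea on toy data. It is an assumption on my part that this resembles what the notebook does, and the tiny feature maps are made up.

```python
def gram_matrix(feature_map):
    """Channel-by-channel correlations of a feature map.

    feature_map is a list of channels; each channel is a flat list of
    activations over the spatial positions of the image.
    """
    positions = len(feature_map[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / positions
         for ch_j in feature_map]
        for ch_i in feature_map
    ]


def style_distance(feature_map_a, feature_map_b):
    """Sum of squared differences between the two Gram matrices."""
    gram_a = gram_matrix(feature_map_a)
    gram_b = gram_matrix(feature_map_b)
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(gram_a, gram_b)
        for x, y in zip(row_a, row_b)
    )


# Toy feature maps with 2 channels and 4 spatial positions each.
shoe = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 2.0]]
# Same activations at mirrored positions: same style, different layout.
shoe_mirrored = [list(reversed(channel)) for channel in shoe]
boot = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

print(style_distance(shoe, shoe_mirrored), style_distance(shoe, boot))
```

Because the Gram matrix discards spatial position, the mirrored copy scores as stylistically identical, while the product with different channel correlations scores as more distant.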

After we’ve defined the model architecture in PyTorch, a training job for our PyTorch model can be launched on a cluster of training servers managed by Amazon SageMaker with just a couple of lines of code using the Amazon SageMaker Python SDK. First, we create an Amazon SageMaker estimator object:

estimator = PyTorch(entry_point="siamese.py",

The estimator contains information about the location of your PyTorch scripts, how to run them, and what infrastructure to use.

Next, we launch the training job by calling the fit method with the location of our training dataset in Amazon S3:

estimator.fit({'train': DATA_S3URI})

Behind the scenes, an Amazon SageMaker managed container for PyTorch is launched with the hardware specs, scripts, and data that were specified.

Model optimization

Depending on the infrastructure selected, we’ll have a good model in minutes to hours. However, we could further improve the performance of our model through a tedious process called hyperparameter tuning. We have the option of accelerating this process by leveraging Amazon SageMaker Automatic Model Tuning. This option is available to us regardless of which framework or algorithm we use.

First, we specify the hyperparameters and the range of values we want the tuning job to search over to discover an optimized model. See the following code snippet from the provided notebook. For our model, we explore a range of learning rates, different sizes for the final layer of our model, and a couple of different optimization algorithms.

HYPERPARAM_RANGES = {
                        'learning-rate': ContinuousParameter(1e-6, 1e-4),
                        'similarity-dims': CategoricalParameter([16,32,64,96,128]),
                        'optimizer': CategoricalParameter(['Adam','SGD'])
                    }

Second, we need to set an objective metric to define what we’re going to optimize. If your goal is to optimize a classification model, then your objective could be to improve classification accuracy. In this case, we’ve set the objective to minimize loss with this line of code in our notebook.

OBJECTIVE_METRIC_NAME = 'average training loss'

This minimizes the error between our model’s estimates and the subjective truth of the similarity measurements provided in the training data.
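
Amazon SageMaker reads the objective metric from the training job's logs by applying a regular expression that you supply in the metric definitions. Here is a small sketch of that mechanism. The regex and the sample log line are assumptions of mine, since they depend on how the training script prints its loss.

```python
import re

OBJECTIVE_METRIC_NAME = "average training loss"

# Each definition pairs a metric name with a regex whose first capture
# group is the metric value. This regex is a hypothetical example.
METRIC_DEFINITIONS = [
    {"Name": OBJECTIVE_METRIC_NAME,
     "Regex": r"average training loss: ([0-9\.]+)"},
]


def extract_metric(log_line, definitions, name):
    """Return the metric value found in log_line, or None if absent."""
    for definition in definitions:
        if definition["Name"] != name:
            continue
        match = re.search(definition["Regex"], log_line)
        if match:
            return float(match.group(1))
    return None


sample_log = "epoch 3 - average training loss: 0.0421"  # made-up log line
loss = extract_metric(sample_log, METRIC_DEFINITIONS, OBJECTIVE_METRIC_NAME)
print(loss)
```

If the regex doesn't match what the script actually prints, the tuning job has nothing to optimize, so it's worth testing the pattern against a real log line first.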

Next, we create a HyperparameterTuner by providing, as input, the PyTorch estimator (the one we created previously), the objective metric, hyperparameter ranges, and the maximum number of training jobs and degree of parallelism. This corresponds to the following code snippet in our notebook:

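tuner = HyperparameterTuner(estimator=estimator,
                            objective_metric_name=OBJECTIVE_METRIC_NAME,
                            hyperparameter_ranges=HYPERPARAM_RANGES,
                            metric_definitions=METRIC_DEFINITIONS,
                            objective_type='Minimize',            # we minimize loss, as stated above
                            max_jobs=MAX_JOBS,                    # assumed name: maximum number of training jobs
                            max_parallel_jobs=MAX_PARALLEL_JOBS)  # assumed name: degree of parallelism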

Third, we launch the tuning job by calling the fit method:

tuner.fit({'train': DATA_S3URI})

The tuning job launches training jobs according to your configuration and searches for a strong combination of hyperparameters using Bayesian optimization. This is an ML technique that uses the results of earlier training jobs to choose which combinations to try next, so it often reaches a good result in fewer training jobs than common strategies like random or grid search. The intended benefit is to improve productivity through automation and lower the total training time required to produce an optimized model.

Generating product similarity scores

At the end of our tuning process, Amazon SageMaker delivers a well-tuned model that can be used to produce similarity scores. But we can get more value from our model if we could run graph queries on our similarity scores for analysis. We also need to deliver these queries with consistently low response times at scale to deliver a quality user experience on our customer-facing systems. Amazon Neptune makes this possible.

We’ll take the approach of pre-calculating and storing similarity scores in Amazon Neptune with the help of Amazon SageMaker Batch Transform. Batch Transform is well suited for high-throughput batch processing.

First, we “bring our own” native PyTorch model serving script over to Amazon SageMaker. By doing so, we can run our script as a batch processing job at scale without having to build and manage the infrastructure. The provided model serving script illustrates a programmatic interface that you can optionally redefine (method override), as we did in our example. Each of the interface functions serves as a stage in a batch inference invocation.

  • model_fn(…): Loads the model into the PyTorch framework from the trained model artifacts.
  • input_fn(…): Performs transformations on the input batches.
  • predict_fn(…): Performs the prediction step logic.
  • output_fn(…): Performs transformations on predictions to produce results in the expected format.
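
To show how these four stages fit together, here is a toy sketch that chains them by hand. It is not the provided serving script: the model is a stand-in dictionary rather than a PyTorch network, the input is JSON instead of NumPy data so that the example needs only the standard library, and the argument names and the run_batch helper are my own.

```python
import json


def model_fn(model_dir):
    """Load the model. A real script would read PyTorch weights from model_dir."""
    return {"weights": [0.5, 0.25]}  # stand-in for a trained network


def input_fn(request_body, content_type):
    """Deserialize one input batch into rows of numbers."""
    if content_type != "application/json":
        raise ValueError("Unsupported content type: " + content_type)
    return json.loads(request_body)


def predict_fn(input_data, model):
    """Score each row; here the score is a simple weighted sum."""
    weights = model["weights"]
    return [sum(x * w for x, w in zip(row, weights)) for row in input_data]


def output_fn(prediction, accept):
    """Serialize predictions in the format the caller asked for."""
    if accept != "text/csv":
        raise ValueError("Unsupported accept type: " + accept)
    return "\n".join(str(score) for score in prediction)


def run_batch(request_body, model_dir="/opt/ml/model"):
    """Hypothetical driver showing the order in which the stages run."""
    model = model_fn(model_dir)
    batch = input_fn(request_body, "application/json")
    scores = predict_fn(batch, model)
    return output_fn(scores, "text/csv")


result = run_batch("[[2, 4], [1, 0]]")
print(result)
```

In a real job Amazon SageMaker calls these functions for you; the driver function exists here only to make the order of the stages visible.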

Launching a batch transform job only requires a few configurations from the AWS Management Console, or a few lines of code using the AWS SDK. There are two distinct steps illustrated in our notebook. The first is model registration:

batchModel = PyTorchModel(model_data=MODELS_S3URI+'/model.tar.gz', 

batchModel.sagemaker_session = sagemaker_session
container_def = batchModel.prepare_container_def(instance_type=BATCH_INSTANCE_TYPE)
sagemaker_session.create_model(BATCH_MODEL_NAME, role, container_def)

After running this code, you should see your trained model listed in the Amazon SageMaker console.

Finally, we launch the batch transform job, which can be done programmatically with another couple of lines of code:

from sagemaker.transformer import Transformer

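transformer = Transformer(model_name=BATCH_MODEL_NAME,
                          instance_count=1,                  # assumed value; Transformer requires an instance count
                          instance_type=BATCH_INSTANCE_TYPE,
                          accept='text/csv',
                          output_path=BATCH_OUTPUT_S3URI)    # assumed name for the S3 location of the results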

transformer.transform(BATCH_INPUT_S3URI, content_type= 'application/x-npy')

This code creates a Transformer object that is configured to use our trained model, our selected infrastructure, and an Amazon S3 location to write out the results of our job. When the transform method is executed, Amazon SageMaker provisions the resources behind the scenes to perform the batch job. You can monitor the status of your job from the Amazon SageMaker console.

Transforming inference results to graph data

After our batch inference output is stored in Amazon S3, AWS Glue can run crawlers to automatically catalog this new dataset within our data lake. However, before we can load this data into Amazon Neptune, we need to transform our inference results into one of the supported open graph data formats. We’ll use the Gremlin-compatible CSV format to keep our transformations simple. The format requires the graph to be formatted in two sets of CSV files. One set defines the graph vertices (complete vertices file provided here), and another set defines the edges.

As a serverless ETL service, AWS Glue allows us to run Apache Spark jobs without managing any infrastructure. I can configure my transform job to run on a schedule or on demand, and optionally use Job Bookmarks to facilitate incremental, recurring processing. This sample script demonstrates how our batch inference results can be transformed to graph edges compatible with Gremlin.
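
To illustrate the target format independently of Spark, here is a small plain-Python sketch that turns similarity scores into a Gremlin-compatible edge file. The ~id, ~from, ~to, and ~label columns are the system columns of the Neptune Gremlin load format. The product IDs, the similar_to label, the score property name, and the edge ID scheme are placeholders that I chose for illustration.

```python
import csv
import io

# System columns for edges, plus one typed property column for the score.
EDGE_HEADER = ["~id", "~from", "~to", "~label", "score:Double"]


def similarity_rows_to_edges(rows, label="similar_to"):
    """Convert (source_id, target_id, score) tuples to edge CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(EDGE_HEADER)
    for source_id, target_id, score in rows:
        # Hypothetical ID scheme: one edge per ordered pair of products.
        edge_id = f"{source_id}-{target_id}"
        writer.writerow([edge_id, source_id, target_id, label, score])
    return buffer.getvalue()


# Made-up similarity scores between three products.
scores = [
    ("shoe-001", "shoe-002", 0.12),
    ("shoe-001", "shoe-003", 0.87),
]

edges_csv = similarity_rows_to_edges(scores)
print(edges_csv)
```

The AWS Glue job performs the same kind of reshaping at scale, with Spark DataFrames in place of the in-memory list used here.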

Let’s go to the AWS Glue console and kick off an AWS Glue job to perform this transformation.

AWS Glue allows us to specify the number of resources to allocate to our ETL job. Our dataset can..
