
A guest article posted by the VSCO Engineering team

At VSCO, we build creative tools, spaces, and connections driven by self-expression. Our app is a place where people can create and edit images and videos, discover tips and new ideas, and connect to a vibrant global community void of public likes, comments, and follower counts.

We use machine learning as a tool for personalizing and guiding each user’s creative process. VSCO has a catalog of over 160 presets, which are pre-configured editing adjustments to help our users transform their images, including emulations of old film camera stock.

A photo of a building before any filter is applied. Image by Sarah Hollander (left). The photo of the building with the AU5 preset applied. Image by Sarah Hollander (right).

However, our research suggested members were overwhelmed by the number of presets and stuck to using the few favorites they knew and liked instead of trying new presets.

The Solution

Our challenge was to overcome decision fatigue by providing trusted guidance and encouraging discovery, while still leaving space for our users to be creative in how they edit their images. Our Imaging team thoughtfully curated presets that complement different types of images, allowing us to deliver a personalized recommendation for each photo. We decided to suggest presets with on-device machine learning, using deep convolutional neural network (CNN) models, because these models capture far more of an image’s nuances than traditional computer vision algorithms, making categorization easier and faster.

With all this in mind, we developed the “For This Photo” feature, which uses on-device machine learning to identify what kind of photo someone is editing and then suggest relevant presets from a curated list. The feature has been so loved by our users that “For This Photo” is now the second most used category after “All”, which shows all presets.

Video of “For This Photo” in action

To understand how the “For This Photo” feature works, let’s walk through the steps that take place when a user starts editing an image. When the user loads the image in Edit View, inference with the model kicks off immediately and the model returns a category for the image. The category ID is then matched against the cached catalog, and the list of presets for that category is retrieved. Six presets, a combination of free presets and presets that are part of VSCO membership, are picked and shown to the user in the “For This Photo” section. Showing both free and paid presets was important to us in order to provide value to our non-members. Presets that are part of VSCO membership can still be previewed by non-members, giving them context on the value of VSCO membership. For members, it has the benefit of educating them on the type of images their member entitlement presets can be used for.
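As a rough illustration of that selection step only (the catalog structure, field names, and selection rule below are hypothetical, not VSCO’s actual implementation):

def suggest_presets(category_id, catalog, count=6):
    """Hypothetical sketch: look up the curated presets for the predicted
    category and return a mix of free and membership presets."""
    presets = catalog[category_id]  # curated preset list for this category
    free = [p for p in presets if not p["members_only"]]
    member = [p for p in presets if p["members_only"]]
    return (free[: count // 2] + member[: count // 2])[:count]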

On-device ML ensures accessibility, speed, and privacy

From the get-go, we knew server based machine learning was not an option for this feature. There were three main reasons we wanted this feature to use on-device ML: offline editing, speed, and privacy.

First, we did not want to limit our members’ creativity by offering this feature only when they were online. Inspiration can strike anywhere — someone could be taking and editing photos with limited network connectivity, in the middle of a desert, or high up on a mountain. A large percentage of our user base is outside the United States, so not everyone has access to high-speed internet at all times.

Second, we wanted to ensure editing would be fast. If we offered “For This Photo” with the ML model in the cloud, we’d have to upload users’ images to categorize them (which takes time, bandwidth, and data), and users would have to download the presets, making the process slow on a good connection and impossible on a poor one. Doing the ML on-device means that everything happens locally, quickly, with no connection required. This is crucial for helping users capture the moment and stay in the creative flow.

Third, the editing process is private. A server-side solution would require us to upload user photos while the users were still editing them, before they published them. We wanted to be cognizant of our users’ privacy in their creative process.

Why TensorFlow

Since we wanted to do on-device machine learning with our own custom model, TensorFlow Lite was an obvious choice: the TFLiteConverter makes it easy to take a model trained on the server and convert it to a phone-compatible model (.tflite format).
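As a rough sketch of that conversion step with current TensorFlow APIs (the paths are placeholders, and the exact converter entry point depends on your TensorFlow version):

import tensorflow as tf

# Convert a server-trained SavedModel into the .tflite format used on-device.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
tflite_model = converter.convert()

with open("preset_classifier.tflite", "wb") as f:
    f.write(tflite_model)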

Also, we had already experienced the success of TensorFlow and TensorFlow Serving in production systems for our server-side ML. TensorFlow’s libraries are designed with running ML in production as a primary focus, so we felt TensorFlow Lite would be no different.

ML Kit made running inference on a TensorFlow Lite model straightforward and incorporating it into our app seamless. This enabled us to take the feature from prototype to production-ready quickly. ML Kit provided higher-level APIs that took care of initializing and loading a model and running inference on images, without us having to deal with the lower-level TensorFlow Lite C++ libraries directly, making the development process much faster and leaving us more time to hone our model.

Overview of VSCO’s Machine Learning Stack

For our ML stack, we use TensorFlow for deep learning on images and Apache Spark for shallow learning on behavioral data.

In production, we have a real-time, cloud-based inference pipeline using TensorFlow Serving that runs every image uploaded to VSCO through various convolutional neural networks. The inference results of these models are then used for product features like Related Images, Search, Just For You, and other sections in Discover. For on-device machine learning, we use the mobile-friendly parts of the TensorFlow ecosystem, ML Kit and TensorFlow Lite; we’ll get into the details of this part of our stack in the next section.

Related Images, Just For You, Search, User Suggestions

We also have a Spark-based recommendation engine that allows us to train models on large datasets from different sources in our data store: image metadata, behavioral events and relational data. We then use the results of these models to serve personalized recommendations in various forms, for example: user suggestions.

To power other parts of our ML pipeline, we use Elasticsearch for search and relevance features, Apache Kafka for log-based distributed streaming of data that serves as input for all our ML models, and Kubernetes for deploying all our micro-services. The languages we use include Python, C++, Go, Scala, and for on-device integrations, we use Java/Kotlin and Swift/Obj-C.

On-device ML: How “For This Photo” Works

Step One: Categorizing Images

In order to build a model that serves the “For This Photo” feature, we wanted to be able to first assign a category to the image and suggest presets that were designed to work well for that category. The diagram below depicts the process for categorizing an image:

Categorizing an image

We started with image data tagged by our in-house human curators. These curators are photography experts who have a front-row seat to our user behavior, so they know better than anyone what type of content is being shared and what trends are developing. They helped our engineering team come up with image categories for our model, which include art, portrait, vibrant, coastal, nature, architecture, light and shadow, monochrome, and many more. As the saying goes, 90% of machine learning is about cleaning up your data. These steps ensured that the data our model was built on was reliable.

Using the categorized dataset we created with our human curators, we trained a CNN model in TensorFlow based on the SqueezeNet architecture. We chose this architecture because of its smaller size without much loss in accuracy. We converted this trained model from TensorFlow’s SavedModel format to TensorFlow Lite (.tflite) format using the TFLiteConverter for use on Android. Some of our initial bugs at this stage were caused by a mismatch between the version of the TFLiteConverter we used and the version of the TensorFlow Lite library that ML Kit referenced via Maven. The ML Kit team was very helpful in fixing these issues as we went along.

Once we had a model that could assign categories to images, we were able to bundle it into our app and run inference on images with it using ML Kit. Since we were using our own custom trained model, we used the Custom Model API from ML Kit. For better accuracy, we decided to forgo the quantization step in model conversion and use a floating point model in ML Kit. There were some challenges here because ML Kit by default assumes a quantized model. However, with little effort, we were able to change some of the steps in model initialization to support a floating point model.

Step Two: Suggesting Presets

The next challenge was to suggest presets based on these categories of images. We collaborated with our in-house Imaging team who had created these presets to come up with lists of presets that work well for images in each of these categories. This process included rigorous testing on many images in each category where we analyzed how each preset affected various colors. At the end, we had a curated catalog with presets that are mapped to each category.

Suggesting presets for an image

These curated catalogs are ever-evolving as we add new presets to our membership offering. To make it easier to update these lists on the fly without users having to update their app, we decided to store the catalogs on the server and serve them through an API: a microservice written in Go that the mobile clients check in with periodically to make sure they have the latest version of the catalogs. The mobile clients cache the catalog and only fetch it when a new version is available. However, this approach creates a “cold start” problem for users who do not connect to the internet before trying the feature for the first time, since the app has no opportunity to talk to the API and download the catalogs. To solve this, we decided to ship a default version of the catalogs with the app. This allows all users to use the feature regardless of their internet connectivity, which was one of the goals of the feature to begin with.

Results and Conclusion

With “For This Photo”, we accomplished our goal to make editing with presets easier to navigate. We believe that if our members don’t find value in getting new presets, their creative progress is being hindered. We wanted to do better. We wanted to help more users not just discover new presets, but zero in on those presets that best matched what they were working on.

We want to continue to improve “For This Photo” to provide recommendations based on other image characteristics as well as the user’s community actions (e.g. follows, favorites and reposts). In addition, we also want to provide greater context for those recommendations and to encourage our community of creators to discover and inspire each other.

As we look forward to the future of this feature, we are also reflecting on the past. We recognize that this feature and VSCO’s on-device ML capabilities would not have been possible without TensorFlow Lite and ML Kit. We are excited to continue to invest in this area and build more features leveraging this technology in the future.

Suggesting Presets for Images: Building “For This Photo” at VSCO was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.


Posted by the TensorFlow team

At the 2019 TensorFlow Dev Summit we announced the Powered by TF Challenge on DevPost specifically for users to create and share their latest and greatest with TensorFlow 2.0.

Thank you to everyone who joined the challenge; we had over 600 participants. We loved seeing what you built, ranging from an app that detects how you feel and shows memes to try to cheer you up, to using convolutional neural networks to predict noise in lab-rat MRI signals.

Today we’re excited to announce our winners. We’ll be featuring their work on our blog over the next month, so make sure you check back in to learn more about how they hacked on 2.0.

In the meantime, continue hacking on newly released TensorFlow 2.0 beta and provide feedback! For more on TensorFlow 2.0, join our developers mailing list, file issues with the 2.0 tag, and check out our docs.

  1. Huskarl: a Deep Reinforcement Learning framework, built with Keras

Huskarl is a framework for deep reinforcement learning focused on research and fast prototyping. It’s built on TensorFlow 2.0 and uses the tf.keras API when possible for conciseness and readability.

Huskarl makes it easy to parallelize computation of environment dynamics across multiple CPUs. This is useful for speeding up on-policy learning algorithms that benefit from multiple concurrent sources of experience such as A2C or PPO. It is especially useful for computationally intensive environments such as physics-based ones.

The Huskarl A2C agent learning to balance a cartpole using 16 environment instances simultaneously. The thicker blue line shows the reward obtained using the greedy, target policy. A Gaussian epsilon-greedy policy is used when acting in the other 15 environments, with epsilon mean varying from 0 to 1.

2. Nbody.ai: a Python 3 package for generating N-body simulations

Nbody.ai is a Python 3 package for generating N-body simulations, computing transit timing variations (TTV), and retrieving orbit parameters and uncertainties from TTV measurements within a Bayesian framework. Machine learning is used to estimate the orbit parameters and constrain priors before running a retrieval to model orbital perturbations.

Top Left: plots of the orbit positions for each object. Top Middle: Radial velocity semi-amplitude (m/s) for the star. Top Right: Lomb-Scargle periodogram of the RV semi-amplitude signal. Bottom Left: Table of simulation parameters. Bottom Middle: The difference (or residuals) between the observed transit times and a calculated linear ephemeris (O-C). Bottom Right: Lomb-Scargle periodogram of the O-C signal for each planet.

3. HandTrack.js: a library for prototyping hand gestures in the browser

Handtrack.js is a library for prototyping real-time hand detection (bounding box) directly in the browser. Underneath, it uses a trained convolutional neural network that provides bounding box predictions for the location of hands in an image. The convolutional neural network (SSDLite, MobileNetV2) is trained using the TensorFlow Object Detection API.

4. DeepPavlov: an open-source library for end-to-end dialog systems and chatbots

The DeepPavlov team built a framework for building dialogue systems that include all state-of-the-art (SOTA) NLP components. The framework contains a set of SOTA NLP models including Named-Entity Recognition (NER), Open-Domain Question Answering (ODQA) and more.

Specifically, their goal is to provide AI-application developers and researchers with a set of pre-trained NLP models, pre-defined dialog system components (ML/DL/rule-based), pipeline templates, and a framework for implementing and testing their own dialog models.

5. Disaster Watch: uses classification on tweets to identify natural disasters

Disaster Watch is a disaster mapping platform that collects data from Twitter, extracts disaster-related information from tweets, and visualizes the results on a map. It enables users to quickly locate all the information in different geographic areas at a glance, find the physical constraints caused by the disaster, such as inaccessible river bridges, and take informed action. Such information helps the public and disaster responders.

Disaster Watch’s overall architecture

Announcing the winners of the #PoweredByTF 2.0 Dev Post Challenge was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.

Modeling “Unknown Unknowns” with TensorFlow Probability — Industrial AI, Part 3

Posted by Venkatesh Rajagopalan, Director Data Science & Analytics; Mahadevan Balasubramaniam, Principal Data Scientist; and Arun Subramaniyan, VP Data Science & Analytics at BHGE Digital

We believe in a slightly modified version of George Box’s famous comment: “All models are wrong, some are useful” for a short period of time. Irrespective of how sophisticated a model is, it needs to be updated periodically to accurately represent the underlying system. In the first and second parts of this blog series, we introduced our philosophy of building hybrid probabilistic models infused with physics to predict complex nonlinear phenomena with sparse datasets. In particular, in the second installment, we described how to update probabilistic models with newly available information. We were able to model the known unknown behavior of how our physical system would degrade over time by using data from a different, but similar physical asset at a more advanced stage of degradation.

In this final part, we will describe the uncertainties that are characterized as “unknown unknowns” and the techniques used for effectively modeling them. To bring all the aspects of our modeling philosophy together in one application, we will predict the performance of a lithium ion battery when we don’t know its real deterioration characteristics.

Battery Health Model

Battery storage has become critical for various applications, ranging from consumer devices to electric vehicles. Lithium ion batteries, in particular, are widely used because of their high power and energy densities. Modeling battery performance is crucial for predicting its state of charge (SoC) and state of health (SoH). As a battery ages, its performance deteriorates in a non-linear manner. Significant research has been undertaken to understand the phenomenon of battery degradation and to develop models for predicting battery life. For more details on the causes of battery degradation and the associated modeling, please refer to the rich literature[1].

Cell potential (voltage) and capacity (usually stated in Ampere-hours) are two primary metrics of interest in a lithium-ion battery. The chart below depicts the time-profile of cell potential in a discharge cycle as a function of current draw. As the discharge current increases, the cell potential decreases more rapidly over time.

Cell potential versus time for different current draws.

The typical degradation in cell potential, across charge-discharge cycles, for a single current discharge (2.5 A), is shown below. With cycling (i.e., usage), the capability of the battery to deliver a voltage will decrease even for the same current draw. Experimental results validating such behavior can be found here[2,3].

Effect of cycling (usage) on cell potential.

While there are several approaches for modeling battery characteristics, including physics and hybrid models, we intend to illustrate a data-driven approach for estimating the state of health (or degradation) of the battery when the physics of degradation is unknown. The key advantage of this approach is that the model requires data only from pristine (little or no degradation) batteries and the extent of degradation can simply be tracked as an anomaly (i.e., a change from the nominal behavior) for incipient detection. Further, the model lends itself to continuous updating with measurements — critical for accurate prediction of cell potential. This continuous model updating is required since every customer operation of an asset is unique. We refer to this updating as a model of one, where we can continuously update the model to track a particular customer’s usage in the field. The challenge is to do this updating in a computationally efficient manner with sporadic, noisy field measurements.

Data Generation

We will use the following equations to generate the discharge curve:

V = a - 0.0005*Q + log(b - Q), where a = 4 - 0.25*I and b = 6000 - 250*I

Here a sets the linear-decrease region and b the asymptotic-decrease region shown in the discharge curves; I is the discharge current and Q represents the charge expended by the battery.

The values of a and b are notional and representative of discharge curve profiles[4].
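A minimal sketch that generates notional discharge curves from the equations above (the charge grid and number of points are arbitrary choices for illustration):

import numpy as np

def discharge_curve(current, n_points=200):
    """Notional cell-potential curve V(Q) for a given discharge current I (A),
    using the coefficients quoted above."""
    a = 4.0 - 0.25 * current
    b = 6000.0 - 250.0 * current
    # Keep Q below b so that log(b - Q) stays defined.
    q = np.linspace(0.0, b - 1.0, n_points)
    v = a - 0.0005 * q + np.log(b - q)
    return q, v

q, v = discharge_curve(current=2.5)  # one curve per current draw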

Every application is unique and utilizes the battery differently. Consider two dimensions of variation: the amount of discharge and the rate of discharge. Then consider that both of these dimensions can also be applied to the recharging cycle. In other words, a battery can be discharged and recharged fully or partially; it can also be discharged and recharged slowly or quickly. The usage of each battery and variation in manufacturing process affect how each battery degrades. We simulate this variation by choosing a random value for the deterioration parameter δ and computing a response for a random usage at a random cycle. After multiple discharge-recharge cycles, a battery will degrade in performance and provide a lower voltage and lower overall capacity. To illustrate the concept of degradation tracking with a simple example, we will model the deterioration by functionally altering the linear and asymptotic segments of the battery for the entire cycle by a deterioration factor.

If δ is the deterioration factor, then modified battery responses would be the following:

The data from degraded batteries are used for model updating, and not for building the initial model. Please note that this functional form is only an approximation but still representative of field behavior of a battery.

Known Unknowns versus Unknown Unknowns: With the above dataset, this problem can be cast either as a known-unknown case or an unknown-unknown case. This problem can be solved as a known unknown case, if the modeler knows the physics equation that describes the degradation phenomena and thus understands the specific way in which the degradation parameter (δ) interacts with the physics equation. Knowing these specific details, the modeler can cast this as a model update problem as described in the second part of our blog series.

However, if the underlying physics is unknown, then the modeler has no option but to use the “raw” data from pristine batteries to build a data-driven model and then update the data-driven model as the model performance deteriorates. Without specifically knowing the degradation mechanism, the modeler may be left with no choice but to update a large section of the data-driven model. The hope is that there are sufficient degrees of freedom in the model and sufficient information in the new dataset to update the model accurately. In real-world applications of solving for unknown unknowns, it is common for these conditions to occur.

We will show the latter case of modeling the unknown unknown here by building a DNN model with the data from pristine batteries. We have provided representative values to ensure the reader can create the data to build a “generic” deep learning model.

The process that we will follow for the rest of the blog is shown below:

Modeling process to illustrate unknown unknowns.

The simple DNN model architecture shown below, with 5 hidden layers (4, 16, 64, 16, 4), will be used for illustrating the concepts.

Simple Deep Neural Network architecture to model battery performance.

Training the DNN with up to 200 epochs produces a sufficiently accurate model as shown below. Both training and test points are reasonably predicted for the pristine battery.

Training and test predictions from DNN model.
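A minimal tf.keras sketch of the (4, 16, 64, 16, 4) architecture; the two input features (discharge current and expended charge) and the training details are assumptions made for illustration:

import tensorflow as tf

def build_battery_dnn():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(4, activation="relu", input_shape=(2,)),  # (current, charge)
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(4, activation="relu"),
        tf.keras.layers.Dense(1),                                       # predicted cell potential
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_battery_dnn()
# model.fit(x_train, y_train, epochs=200, validation_data=(x_test, y_test))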

We will add a non-linear degradation to the pristine battery as described above. The specific instance of the degraded battery performance is shown by the blue dots below for a degradation parameter (δ) of 0.163. Based on actual usage, the blue dots denote actual measurements of a battery during a discharge cycle.

Pristine DNN model predictions compared to degraded battery data.

Adding degradation almost immediately makes this model inaccurate. Since we don’t know the exact mechanism of degradation, we will choose to update the hyperparameters of the last hidden layer[5] with a particle filter.

Model Updating with Particle Filter

Particle filtering (PF) is a technique for implementing a recursive Bayesian filter by Monte Carlo simulations. The key idea is to represent the posterior density function with a set of random samples with associated weights; we compute estimates based on these samples and weights. As the number of samples becomes very large, the PF estimate approaches the optimal Bayesian estimate. For an extensive discussion on the various PF algorithms, please refer to this tutorial.

As with the Unscented Kalman Filter (UKF) methodology described in our previous blog post, we will model the “states” of the PF as a random walk process. The output model is the DNN. The equations describing the process and measurement models are summarized below:

x[k] = x[k-1] + w[k]
y[k] = h(x[k], u[k]) + v[k]

where:

  • x represents the states of the PF (i.e., the hyper-parameters of the DNN that need to be updated)
  • h is the functional form of the DNN
  • u represent the inputs of the DNN: current and capacity
  • y is the output of the DNN: cell potential
  • w represents the process noise
  • v represents the measurement noise

The PF algorithm makes no assumptions other than the independence of noise samples in the process and measurement models.

The particle filter is robust for updating the DNN model with field measurements as well as versatile enough to be used for model training. Refer to this paper for a detailed discussion. The salient steps of the PF-based model update methodology are summarized below:

  1. Generate initial particles from a proposed density, called the importance density
  2. Assign weights to the particles and normalize the weights
  3. Add process noise to each particle
  4. Generate output predictions for all the particles
  5. Calculate likelihood of error between actual and predicted output value, for each particle
  6. Update a particle weight using its error likelihood
  7. Normalize the weights
  8. Calculate effective number of samples
  9. If effective number of samples < Threshold, then resample the particles
  10. Estimate the state vector and its covariance
  11. Repeat steps 3 to 10, for each measurement.

The PF-based methodology for updating the DNN model has been implemented as a Depend On Docker project and can be found in this repository, and in this Google Colab. As stated earlier, PF assumes only that the noise in the process and measurement model are independent. Accordingly, we need not assume that the noise follows a Gaussian distribution, in contrast to the UKF assumptions. Thus, the extensive functionality provided by TensorFlow Probability’s tfp.distributions module can be used for implementing all the key steps in the particle filter, including:

  • generating the particles,
  • generating the noise values, and
  • computing the likelihood of the observation, given the state.

Code snippets for a few steps of the algorithm are provided below.

Generating initial set of particles

The particle filter is initialized with a set of particles generated using TF Probability.
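A minimal sketch of this step with tfp.distributions, assuming a diagonal Gaussian importance density centred on the trained layer values and an illustrative state size and spread:

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

NUM_PARTICLES = 500
STATE_DIM = 5  # flattened kernel + bias of the DNN layer being updated (assumed size)

# Importance density centred on the trained layer values (zeros here for brevity).
initial_state = tf.zeros(STATE_DIM)
importance_density = tfd.MultivariateNormalDiag(
    loc=initial_state, scale_diag=0.1 * tf.ones(STATE_DIM))

particles = importance_density.sample(NUM_PARTICLES)      # shape [500, 5]
weights = tf.fill([NUM_PARTICLES], 1.0 / NUM_PARTICLES)   # uniform initial weights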

Generating output for each particle

Output predictions are generated for every particle by setting the particle values to the bias and the weights of the last layer of the DNN and running the model.
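A sketch of that loop in eager mode, assuming a tf.keras model whose final Dense layer is the one being updated and a state vector laid out as flattened kernel followed by bias:

import numpy as np
import tensorflow as tf

def predict_per_particle(model, particles, inputs):
    """Run the DNN once per particle, with the particle's values loaded into
    the kernel and bias of the layer being updated."""
    layer = model.layers[-1]
    kernel_shape = tuple(layer.kernel.shape.as_list())
    n_kernel = int(np.prod(kernel_shape))
    outputs = []
    for p in np.asarray(particles):
        layer.kernel.assign(p[:n_kernel].reshape(kernel_shape))
        layer.bias.assign(p[n_kernel:])
        outputs.append(model(inputs, training=False).numpy())
    return np.stack(outputs)  # shape [num_particles, batch_size, 1]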

Updating weight of each particle

Once the predictions have been generated, the likelihood of a particle being in the true state is calculated and the weight associated with that particle is updated accordingly.
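A sketch of the weight update; a Gaussian measurement-noise model with an assumed scale is used here purely for illustration, and any tfp.distributions model could be substituted:

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def update_particle_weights(weights, predictions, observation, noise_scale=0.05):
    """predictions: per-particle model outputs at the current measurement, shape [num_particles].
    Scale each weight by the likelihood of the observation, then renormalize."""
    likelihood = tfd.Normal(loc=predictions, scale=noise_scale).prob(observation)
    new_weights = weights * likelihood
    return new_weights / tf.reduce_sum(new_weights)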

Resampling the particles

Particles are resampled, if required, through systematic resampling.
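A sketch of the effective-sample-size check and the systematic resampling described in steps 8 to 10 above:

import tensorflow as tf

def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2); resample when this drops below a chosen threshold."""
    return 1.0 / tf.reduce_sum(tf.square(weights))

def systematic_resample(particles, weights):
    """Systematic resampling: one uniform offset, then evenly spaced points
    through the cumulative weight distribution."""
    n = tf.shape(weights)[0]
    n_float = tf.cast(n, tf.float32)
    positions = (tf.range(n_float) + tf.random.uniform([], 0.0, 1.0)) / n_float
    indices = tf.searchsorted(tf.cumsum(weights), positions)
    indices = tf.minimum(indices, n - 1)
    return tf.gather(particles, indices), tf.fill([n], 1.0 / n_float)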

Results and Discussion

Updating the last layer of weights and biases of the baseline DNN model with the first datapoint using the particle filter produces the model predictions shown below in green. The shaded region signifies the prediction uncertainty computed using 500 random particles.

Updated model prediction (from 1st update point).

The algorithm chooses the next update point as the time at which the observation falls outside the prediction uncertainty. In this case, it is the time point 13 (close to 700 seconds). Incrementally updating the model whenever the observed data falls beyond the model prediction uncertainty produces the desired accuracy as shown below.

Sequential model update results initiated based on prediction uncertainty.

Combining all the models together, we get the prediction across the entire degradation cycle. Clearly, the baseline model misses the degradation completely, resulting in a 50% error in voltage predictions at the end of the battery cycle. Simple updates driven by prediction uncertainty thus enable the modeler to account for “unknown unknowns”.

Final model (only latest updates) predictions compared to initial model. Comparison of model prediction errors.

For the last hidden layer of four nodes, we depict below the initial state of the battery model and the final updated state. Clearly, the state has changed significantly and the model predictions improve even though the updates were performed with only a few data points. The change in the state distributions can also be viewed as an indication of the anomaly (in this case, the deterioration δ) in the system. Thus, these plots indicate that there was sufficient information in the data to update the models. By contrast, updating the model with data sets that do not have any relevant information will result in the model continuing to be inaccurate. Thus, choosing the right time to update the model is at least as important as the method used to update the model.

Comparison of initial and final state of the parameters of the last DNN hidden layer.

Summary

Making AI real in the industrial world requires a combination of several disciplines: domain knowledge, probabilistic methods, traditional machine learning, and deep learning. We have shown techniques for combining these disciplines effectively to model real-world phenomena with limited and uncertain data. The third part of this series of blogs focused on modeling “unknown unknowns”. To build a digital twin of an industrial asset, it is essential to build a model of “one”, where the model continuously adapts to track field measurements over time for every single “asset” in the ecosystem. The particle filter methodology for updating the weights and biases of a DNN layer built on pristine data can be used to build the model of “one”.

In real-world industrial applications, there are many challenges that limit the quantity of useful data, including noise, missing values, and inconsistent measurements. Therefore, choosing what to model, selecting an appropriate modeling methodology (physics, data-driven or hybrid), and choosing which parameters to update continuously are critical to maintaining useful models.

Acknowledgments

This blog is a result of the hard work and deep collaboration of several teams at Google and BHGE. We are particularly grateful to Mike Shwe, Josh Dillon, Scott Fitzharris, Alex Walker, Fabio Nonato and Gautam Subbarao for their many edits, results, code snippets, and — most importantly — enthusiastic support.

Citations:

  1. Modeling of Battery Life (https://www.sandia.gov/ess-ssl/EESAT/2003_papers/Liaw.pdf)
  2. Remaining capacity estimation of lithium-ion batteries based on the constant voltage charging profile (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034863/)
  3. https://batteryuniversity.com/learn/article/what_causes_lithium_ion_to_die
  4. Li-ion Battery Aging Datasets (https://c3.nasa.gov/dashlink/resources/133/)
  5. https://www.learnopencv.com/keras-tutorial-fine-tuning-using-pre-trained-models/

Modeling “Unknown Unknowns” with TensorFlow Probability — Industrial AI, Part 3 was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.


Posted by Laurence Moroney, Martin Aguinis

TensorFlow has a global community of developers who push the boundaries of machine learning. But we have heard from many developers who would love to see their languages represented in tutorials so that they can help their communities understand how machine learning and artificial intelligence really work.

So we have been working hard to put together some of the best “Coding TensorFlow” videos with native speakers of different languages, starting with Coding TensorFlow en Español.

In the first series, Martin Aguinis (@MartinAguinis) teaches us about TensorFlow.js, a library that brings the power of TensorFlow to the browser.

In episode 1, Martin looks at how to get TensorFlow.js running in the browser. You will see how to build a very simple machine learning example that runs entirely in the browser, from training the model to executing it!

In episode 2, Martin shows how to use CSV files to load data into TensorFlow, splitting the data into features and labels. These are fit together to create a machine learning model that can be used to predict future values.

Episode 3 brings it all together. Martin demonstrates how to build a neural network and train it on the data processed in episode 2. This neural network will be trained to produce a model that, given data about a flower, can predict what kind of flower it is!

Let us know what other kinds of TensorFlow content you would like to see in your language, and reach out to tell us which other languages you would like to see in “Coding TensorFlow.”

Subscribe to the TensorFlow channel → http://bit.ly/TensorFlow1

Presentando Coding TensorFlow En Español was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.

TensorFlow Model Optimization Toolkit — Post-Training Integer Quantization

Posted by the TensorFlow Model Optimization Team

Since we introduced the Model Optimization Toolkit — a suite of techniques that both novice and advanced developers can use to optimize machine learning models for deployment and execution — we have been working hard to reduce the complexity of quantizing machine learning models.

Initially, we supported post-training quantization via “hybrid operations”, which is quantizing the parameters of the model (i.e. weights), but allowing certain parts of the computation to take place in floating point. Today, we are happy to announce the next addition to our tooling: post-training integer quantization. Integer quantization is a general technique that reduces the numerical precision of the weights and activations of models to reduce memory and improve latency.

Quantize models to reduce size, latency, and power consumption with negligible accuracy loss

Why you should use post-training integer quantization

Our previously released “hybrid” post-training quantization approach reduced the model size and latency in many cases, but it requires floating point computation, which may not be available in all hardware accelerators (e.g., Edge TPUs); it remains well suited to CPUs.

Our new post-training integer quantization enables users to take an already-trained floating-point model and fully quantize it to only use 8-bit signed integers (i.e. `int8`). By leveraging this quantization scheme, we can get reasonable quantized model accuracy across many models without resorting to retraining a model with quantization-aware training. With this new tool, models will continue to be 4x smaller, but will see even greater CPU speed-ups. Fixed point hardware accelerators, such as Edge TPUs, will also be able to run these models.

Compared to quantization-aware training, this tool is much simpler to use, and offers comparable accuracy on most models. There may still be use cases where quantization-aware training is required, but we expect this to be rare as we continue to improve post-training tooling.

In summary, use “hybrid” post-training quantization when targeting simple CPU size and latency improvements. When targeting greater CPU improvements or fixed-point accelerators, use this integer post-training quantization tool, potentially falling back to quantization-aware training if a model’s accuracy suffers.

How to enable post-training integer quantization

Our integer quantization tool requires a small calibration set of representative data. Simply provide a representative_dataset generator to the converter and, with the optimization flag set, the converter will perform integer quantization on the input model.
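A minimal sketch of this flow with the TensorFlow Lite converter (the SavedModel path and calibration_images are placeholders):

import tensorflow as tf

def representative_dataset():
    # Yield a few dozen samples resembling what the model will see at inference
    # time; calibration_images is a placeholder for your calibration data.
    for image in calibration_images[:100]:
        yield [image.astype("float32")[None, ...]]

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
quantized_tflite_model = converter.convert()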

Is the model entirely quantized?

Just like the existing post-training quantization functionality, by default, the operations (“ops”) that do not have quantized implementations will automatically be left in floating point. This allows conversion to occur smoothly, and will produce a model that will always execute on a typical mobile CPU — consider that TensorFlow Lite will execute the integer operations in the integer-only accelerator, falling back to CPU for the operations involving floating point. To execute entirely on specialized hardware that does not support floating point operations at all (for example, some machine learning accelerators, including the Edge TPU), you can specify a flag in order to output only integer operations:
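Continuing the converter sketch above, recent TensorFlow Lite releases express this flag roughly as follows (treat the exact attribute name as an assumption for your TensorFlow version):

# Restrict conversion to ops that have int8 implementations; conversion fails
# if an op has no integer counterpart.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
quantized_tflite_model = converter.convert()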

When this flag is used and an operation has no integer quantizable counterpart, the TensorFlow Lite Converter will throw an error.

Very little data is needed

In our experiments, we found that a few dozen examples that are representative of what the model will see during execution are sufficient to get the best accuracy. For instance, the accuracy numbers below are from models calibrated on only 100 images from the ImageNet dataset.

Results

Latency

Compared to their float counterparts, quantized models are up to 2–4x faster on CPU and 4x smaller. We expect further speed-ups with hardware accelerators, such as Edge TPUs.

Accuracy

With just 100 calibration images from ImageNet dataset, fully quantized integer models have comparable accuracy with their float versions (MobileNet v1 loses 1%).

How these integer models work

Recording dynamic ranges

Our new tool works by recording dynamic ranges, running multiple inferences on a floating point TensorFlow Lite model, using the user-provided representative dataset as input. We use the values logged from inferences to determine the scaling parameters needed to execute all tensors of the model in integer arithmetic.

Int8 quantization scheme

It is important to note that our new quantization specification enabled this post-training use case, which uses per-axis quantization for certain operations. Prior to our addition of per-axis quantization, post-training integer quantization was impractical due to accuracy drops; the accuracy benefits of per-axis quantization bring results much closer to float for many models.

8-bit quantization approximates floating point values using the following formula:

real_value = (sint8_value - zero_point) * scale

Per-axis (also known as “per-channel”) or per-layer weights are represented by int8 two’s complement values in the range [-127, 127] with zero-point equal to 0.

Per-layer activations/inputs are represented by int8 two’s complement values in the range [-128, 127], with a zero-point in range [-128, 127].

For more details, see the full quantization specification.

What about quantization aware training?

We believe in making quantization as simple as possible. Hence, enabling a way to quantize models after training is something that we are very excited about! However, we also know that some models preserve the best quality when they are trained with quantization. That’s why we are also working on a quantization aware training API. In the meantime, we encourage you to try post-training quantization, since it may be all your model needs!

Documentation and tutorial

On the TensorFlow website you can find out more about post-training integer quantization, our new quantization spec, and a post-training integer quantization tutorial. We’d love to hear how you use this — share your story!

Acknowledgements

Suharsh Sivakumar, Jian Li, Shashi Shekhar, Yunlu Li, Alan Chiao, Raziel Alvarez, Lawrence Chan, Daniel Situnayake, Tim Davis, Sarah Sirajuddin

TensorFlow Model Optimization Toolkit — Post-Training Integer Quantization was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.


Posted by Robby Neale, Software Engineer

TensorFlow provides a wide breadth of ops that greatly aid in building models from images and video. However, there are many models that begin with text, and the language models built from these require some preprocessing before the text can be fed into the model. For example, the Text Classification tutorial that uses the IMDB set begins with text data that has already been converted into integer IDs. This preprocessing done outside the graph may create skew if it differs at training and inference times, and requires extra work to coordinate these preprocessing steps.

TF.Text is a TensorFlow 2.0 library that can be easily installed using PIP and is designed to ease this problem by providing ops to handle the preprocessing regularly found in text-based models, and other features useful for language modeling not provided by core TensorFlow. The most common of these operations is text tokenization. Tokenization is the process of breaking up a string into tokens. Commonly, these tokens are words, numbers, and/or punctuation.

Each of the included tokenizers returns RaggedTensors with the innermost dimension of tokens mapping to the original individual strings. As a result, the resulting shape’s rank is increased by one. This is illustrated below, but also please review the ragged tensor guide if you are unfamiliar with RaggedTensors.

Tokenizers

We are initially making available three new tokenizers (as proposed in a recent RFC). The most basic new tokenizer is the whitespace tokenizer, which splits UTF-8 strings on ICU-defined whitespace characters (e.g., space, tab, newline).

The initial release also includes a unicode script tokenizer, which splits UTF-8 strings based on Unicode script boundaries. Unicode scripts are collections of characters and symbols that have historically related language derivations. View the International Components for Unicode (ICU) UScriptCode values for the complete set of enumerations. It’s worth noting that this is similar to the whitespace tokenizer, with the most apparent difference being that it will split punctuation (USCRIPT_COMMON) from language texts (e.g., USCRIPT_LATIN, USCRIPT_CYRILLIC, etc.).

The final tokenizer provided in the TF.Text launch is a wordpiece tokenizer. It is an unsupervised text tokenizer which requires a predetermined vocabulary for further splitting tokens down into subwords (prefixes & suffixes). Wordpiece is commonly used in BERT models.

Each of these tokenizers operates on UTF-8 encoded strings and includes an option for getting byte offsets into the original string. This allows the caller to know the byte alignment into the original string for each token that was created.
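A short sketch of the whitespace and Unicode script tokenizers described above, including byte offsets (assumes tensorflow-text is installed):

import tensorflow_text as tf_text

sentences = ["everything not saved will be lost.", "ICU splits this, too!"]

# Whitespace tokenizer: splits on ICU-defined whitespace, returns a RaggedTensor.
ws_tokens = tf_text.WhitespaceTokenizer().tokenize(sentences)

# Unicode script tokenizer: also separates punctuation (USCRIPT_COMMON) from words.
tokens, starts, ends = tf_text.UnicodeScriptTokenizer().tokenize_with_offsets(sentences)
# starts/ends hold byte offsets of each token within the original strings.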

Conclusion

This just brushes the surface of TF.Text. Along with these tokenizers, we are also including ops for normalization, n-grams, sequence constraints for labeling, and more! We encourage you to visit our Github repository, and try using these ops in your own model development. Installation is easy with PIP.

pip install tensorflow-text

And for more in depth working examples, please view our Colab notebook. It includes a variety of code snippets for many of the newly available ops not discussed here. We look forward to continuing this effort and providing even more tools to make your language models even easier to build in TensorFlow.

Introducing TF.Text was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.


We’re delighted to announce the release of the beta of TensorFlow 2.0 today. It can be installed using:

pip install tensorflow==2.0.0-beta0

As with the alpha, this release emphasizes ease of use, featuring:

  • Keras as a high-level API for quick and easy model design and training
  • Eager execution as a default for fast, intuitive development and debugging
  • @tf.function for graph performance and portability

We are already seeing how these usability improvements in the Alpha release are helping users get started, and are thrilled to see how the TensorFlow community is growing. Over 130,000 students have enrolled in the deeplearning.ai and Udacity courses that launched alongside the alpha, and the GitHub repository has gotten over 128,000 stars and been forked over 75,000 times.

In this beta release, we have completed renaming and deprecating symbols for the 2.0 API. This means the current API is final and is also available as a v2 compatibility module inside the TensorFlow 1.14 release. (A list of all symbol changes can be found here.) We have also added 2.0 support for Keras features like model subclassing, simplified the API for custom training loops, added distribution strategy support for most kinds of hardware, and lots more.

Core components of the TensorFlow product ecosystem, such as TensorBoard, TensorFlow Hub, TensorFlow Lite, and TensorFlow.js, work with the beta. Support for TensorFlow Extended (TFX) components and end-to-end pipelines is still in progress.

We’ve closed over 100 issues you reported against the alpha release, and we continue to iterate on what’s left. We value all your feedback, as it has helped us get where we are today. Please keep it coming!

Between the beta and the release candidate (RC) for the final 2.0 version, we will complete Keras model support on Cloud TPUs and TPU pods, continue work on performance, and close even more issues. A list of known open TensorFlow 2.0 issues is on the issue tracker, and you can track our progress in the release notes.

We are aiming to reach RC sometime this summer. In the meantime, please test the beta out and provide your feedback! For more on TensorFlow 2.0, join our developers mailing list, file issues with the 2.0 tag, and check out our docs.

Announcing TensorFlow 2.0 Beta was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.


Posted by: Pooya Davoodi (NVIDIA), Guangda Lai (Google), Trevor Morris (NVIDIA), Siddharth Sharma (NVIDIA)

Last year we introduced integration of TensorFlow with TensorRT to speed up deep learning inference using GPUs. This article dives deeper and shares tips and tricks so you can get the most out of your application during inference. Even if you are unfamiliar with the integration, this article provides enough context so you can follow along.

By the end of this article, you will know:

  • Models supported and integration workflow
  • New techniques such as quantization aware training to use with INT8 precision
  • Profiling techniques to measure performance
  • New experimental features and a peek at the roadmap
Three Phases of Optimization with TensorFlow-TensorRT

Once trained, a model can be deployed to perform inference. You can find several pre-trained deep learning models on the TensorFlow GitHub site as a starting point. These models use the latest TensorFlow APIs and are updated regularly. While you can run inference in TensorFlow itself, applications generally deliver higher performance using TensorRT on GPUs. TensorFlow models optimized with TensorRT can be deployed to T4 GPUs in the datacenter, as well as Jetson Nano and Xavier GPUs.

So what is TensorRT? NVIDIA TensorRT is a high-performance inference optimizer and runtime that can be used to perform inference in lower precision (FP16 and INT8) on GPUs. Its integration with TensorFlow lets you apply TensorRT optimizations to your TensorFlow models with a couple of lines of code. You get up to 8x higher performance versus TensorFlow only while staying within your TensorFlow environment. The integration applies optimizations to the supported graphs, leaving unsupported operations untouched to be natively executed in TensorFlow. The latest version of the integrated solution is always available in the NVIDIA NGC TensorFlow container.

The integrated solution can be applied to models in applications such as object detection, translation, recommender systems, and reinforcement learning. Accuracy numbers for an expanding set of models including MobileNet, NASNet, Inception and ResNet are available and updated regularly.

Once you have the integration installed and a trained TensorFlow model, export it in the saved model format. The integrated solution then applies TensorRT optimizations to the subgraphs supported by TensorFlow. The output is a TensorFlow graph with supported subgraphs replaced with TensorRT optimized engines executed by TensorFlow. The workflow and code to achieve this are below:

Fig 1 (a) workflows when performing inference in TensorFlow only and in TensorFlow-TensorRT using ‘savedmodel’ format
import tensorflow.contrib.tensorrt as trt
trt.create_inference_graph(
    input_saved_model_dir=input_saved_model_dir,
    output_saved_model_dir=output_saved_model_dir)

Another approach to export the TensorFlow model for inference is to freeze the trained model graph for inference. The image and code snippet below shows how to apply TensorRT optimizations to a graph in TensorFlow when using this approach. The output is a TensorFlow graph with supported subgraphs replaced with TensorRT optimized engines that can then be executed by TensorFlow.

Fig 1 (b) workflows when performing inference in TensorFlow only and using TensorFlow-TensorRT using frozen graphs
import tensorflow.contrib.tensorrt as trt
converted_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['logits', 'classes'])

We maintain an updated list of operations supported by the integrated workflow.

Three operations are performed in the optimization phase of the process outlined above:

  1. Graph partition. TensorRT scans the TensorFlow graph for sub-graphs that it can optimize based on the operations supported.
  2. Layer conversion. Converts supported TensorFlow layers in each subgraph to TensorRT layers.
  3. Engine optimization. Finally, subgraphs are then converted into TensorRT engines and replaced in the parent TensorFlow graph.

Let’s look at an example of this process.

Example walkthrough

Take the graph below as an example. Green blocks highlight ops supported by TensorRT and gray blocks show an unsupported op (“Cast”).

The first phase of the optimization partitions the TensorFlow graph into TensorRT compatible versus non-compatible subgraphs. We traverse the graph backwards starting with the Relu operation (a) and add one node at a time to get to the largest subgraph possible. The only constraint is that the subgraph should be a directed acyclic graph and have no loops. The largest subgraph that can be created is shown in (c). The cluster adds all nodes until it gets to the reshape op. Then there is a loop (d), so it goes back. We now add a new cluster for it, so we finally end with 2 TensorRT compatible subgraphs (e).

Fig 2 (a) Example graph, TensorRT supported nodes in green, graph selected for optimization shown in orange box (b) 4 ops in the subgraph, no loops yet (c) adding Conv2D also does not add a loop (d) adding reshape to this subgraph creates a loop (e) 2 subgraphs created resolving the loop

Controlling Minimum Number of Nodes in a TensorRT Engine

In the example above, we generated two TensorRT optimized subgraphs: one for the reshape operator and another for all ops other than cast. Small graphs, such as ones with just a single node, present a tradeoff between optimizations provided by TensorRT and the overhead of building and running TRT engines. While small clusters might not deliver high benefit, accepting only very large clusters would leave possible optimizations that were applicable to smaller clusters on the table. You can control the size of subgraphs by using the minimum_segment_size parameter. Setting this value to 3 (default value) would not generate TensorRT engines for subgraphs consisting of less than three nodes. In this example, a minimum segment size of 3 would skip having TensorRT optimize the reshape op even though it’s eligible for the TensorRT optimization, and will fall back to TensorFlow for the reshape op.

converted_graph_def = create_inference_graph(
    input_saved_model_dir=model_dir,
    minimum_segment_size=3,
    is_dynamic_op=True,
    maximum_cached_engines=1)

The final graph includes 2 subgraphs or clusters (Fig 3a).

Fig 3 (a) subgraph with TensorFlow operations (b) TensorFlow subgraph replaced with a TensorRTEngineOp

Next, the TensorRT compatible subgraph is wrapped into a custom op called TRTEngineOp. The newly generated TensorRT op is then used to replace the TensorFlow subgraph. The final graph has 3 ops (Fig 3b).

Variable Input Shapes

TensorRT usually requires that all shapes in your model are fully defined (i.e. not -1 or None, except the batch dimension) in order to select the most optimized CUDA kernels. If the input shapes to your model are fully defined, the default setting of is_dynamic_op=False can be used to build the TensorRT engines statically during the initial conversion process. If your model does have unknown shapes for models such as BERT or Mask R-CNN, you can delay the TensorRT optimization to execution time when input shapes will be fully specified. Set is_dynamic_op to true to use this approach.

converted_graph_def = create_inference_graph(
    input_saved_model_dir=model_dir,
    minimum_segment_size=3,
    is_dynamic_op=True,
    maximum_cached_engines=1)

Next, the graph is traversed in topological order to convert each TensorFlow op in the subgraph to one or more TensorRT layers. And finally TensorRT applies optimizations such as layer and tensor fusion, calibration for lower precision, and kernel auto-tuning. These optimizations are transparent to the user and are optimized for the GPU that you plan to run inference on.

Fig 4 (a) TensorFlow subgraph before conversion to TensorRT layers (b) first TensorFlow op is converted to a TensorRT layer (c) all TensorFlow ops converted to TensorRT layers (d) final TensorRT engine from the graph

TensorRT Engine Cache and Variable Batch Sizes

TensorRT engines can be cached in an LRU cache located in the TRTEngineOp op. The key to this cache are the shapes of the op inputs. So a new engine is created if the cache is empty or if an engine for a given input shape does not exist in the cache. You can control the number of engines cached with the maximum_cached_engines parameter as below.

converted_graph_def = create_inference_graph(
    input_saved_model_dir=model_dir,
    minimum_segment_size=3,
    is_dynamic_op=True,
    maximum_cached_engines=1)

Setting the value to 1 will force any existing cache to be evicted each time a new engine is created.

TensorRT uses batch size of the inputs as one of the parameters to select the highest performing CUDA kernels. The batch size is provided as the first dimension of the inputs. The batch size is determined by input shapes during execution when is_dynamic_op is true, and by the max_batch_size parameter when is_dynamic_op is false. An engine can be reused for a new input, if:

  • the engine batch size is greater than or equal to the batch size of the new input, and
  • the non-batch dimensions match those of the new input

So in Fig 5a below, we do not need to create a new engine: the new batch size (2) is less than the batch size of the cached engine (4), while the other input dimensions ([8,8,3] and [9,9,5] in this case) are the same. In 5b, this time the non-batch input dimensions are different ([8,8,3] vs [9,9,5]), so a new engine needs to be generated. The final schematic representation of the cache with its engines is shown in 5c.

Fig 5 (a), (b), (c) from left to right

Increase the maximum_cached_engines variable to prevent recreation of engines as much as possible. Caching more engines uses more resources on the machine, but we have not found that to be a problem for typical models.
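To make the reuse rule above concrete, here is a small, purely illustrative sketch of the cache lookup decision; the function name and structure are hypothetical and do not reflect the actual TRTEngineOp implementation.

def can_reuse_engine(engine_input_shapes, new_input_shapes):
    # Illustrative check of whether a cached engine can serve a new input.
    for engine_shape, new_shape in zip(engine_input_shapes, new_input_shapes):
        # The engine must have been built for a batch size at least as
        # large as the new input's batch size (first dimension).
        if engine_shape[0] < new_shape[0]:
            return False
        # All non-batch dimensions must match exactly.
        if tuple(engine_shape[1:]) != tuple(new_shape[1:]):
            return False
    return True

print(can_reuse_engine([(4, 8, 8, 3)], [(2, 8, 8, 3)]))  # True: batch fits, dims match
print(can_reuse_engine([(4, 8, 8, 3)], [(4, 9, 9, 5)]))  # False: non-batch dims differ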

Inference in INT8 Precision

Tesla T4 GPUs introduced Turing Tensor Core technology with a full range of precision for inference, from FP32 to FP16 to INT8. Tensor Cores deliver up to 30 teraOPS (TOPS) of throughput on the Tesla T4 GPUs. Using INT8 and mixed precision reduces the memory footprint, enabling larger models or larger mini-batches for inference.

Fig 6 Tensor Core performing matrix multiplication in reduced precision with accumulation in higher precision

You might wonder how it is possible to take a model operating in 32-bit floating point precision, representing billions of different numbers, and reduce all of that to an 8-bit integer representing only 256 possible values. Typically, the values of weights and activations in deep neural networks lie in some small range. If we can focus our precious 8 bits on just that range, we can maintain good precision with only a small rounding error.

TensorRT uses “symmetric linear quantization” for quantization, a scaling operation from the FP32 range (which is -6 to 6 in Fig 7) to the INT8 range (which for us is -127 to 127 to preserve symmetry). If we can find the range where the majority of values lie for each intermediate tensor in the network, we can quantize that tensor using that range while maintaining good accuracy.

Quantize(x, r) = round(s * clip(x, -r, r))
where s = 127 / r
Fig 7: x is the input, r is the floating-point range for the tensor, and s is the scaling factor into INT8. The equation above takes the input x and returns a quantized INT8 value.
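As a minimal NumPy sketch of the equation above (illustrative only, not the TensorRT implementation):

import numpy as np

def quantize(x, r):
    # Symmetric linear quantization: map the FP32 range [-r, r]
    # onto the INT8 range [-127, 127].
    s = 127.0 / r
    return np.round(s * np.clip(x, -r, r)).astype(np.int8)

x = np.array([-7.2, -1.5, 0.0, 2.3, 6.8], dtype=np.float32)
print(quantize(x, r=6.0))  # values outside [-6, 6] are clipped before scaling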

While an exhaustive treatment is out of scope for this article, two techniques are commonly used to determine activation ranges for each tensor in a network: calibration and quantization aware training.

Calibration is the recommended approach and works with most models with minimal accuracy loss (<1%). For calibration, inference is first run on a calibration dataset. During this calibration step, a histogram of activation values is recorded. The INT8 quantization ranges are then chosen to minimize information loss. Quantization happens late in the process, after training, and therefore becomes a new source of error that training cannot account for. See the code example below on how to perform calibration:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

calib_graph = trt.create_inference_graph(
    # ... other conversion parameters as in the examples above ...
    precision_mode='INT8',
    use_calibration=True)
with tf.Session() as sess:
    tf.import_graph_def(calib_graph)
    for i in range(10):
        # my_next_data() yields batches from the calibration dataset
        sess.run('output:0', {'input:0': my_next_data()})
converted_graph_def = trt.calib_graph_to_infer_graph(calib_graph)

When using calibration for INT8, the quantization step happens after the model has been trained, so there is no way to adjust the model for the quantization error at that stage. Quantization-aware training tries to address this, though it is still in its early stages and released as an experimental feature. Quantization-aware training models the quantization error during a fine-tuning step of training, and the quantization ranges are learned during training. This allows your model to compensate for the error and can provide better accuracy than calibration in some cases.

To perform quantization-aware training, augment the graph with quantization nodes and then train the model as normal. The quantization nodes model the error due to quantization by clipping, scaling, rounding, and unscaling the tensor values, allowing the model to adapt to it. You can use fixed quantization ranges or make them trainable variables. You can use tf.quantization.fake_quant_with_min_max_vars with narrow_range=True and max = -min to match TensorRT's quantization scheme for activations.
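The sketch below shows one possible way to insert such a node after an activation; the helper name and initial range are hypothetical, and the trainable variable is just one way to make the range learnable.

import tensorflow as tf

def fake_quantize_activation(x, initial_range=6.0):
    # Trainable symmetric range r: quantizing over [-r, r] with
    # narrow_range=True mirrors TensorRT's symmetric INT8 scheme.
    r = tf.Variable(initial_range, dtype=tf.float32, trainable=True)
    return tf.quantization.fake_quant_with_min_max_vars(
        x, min=-r, max=r, num_bits=8, narrow_range=True)

# Hypothetical usage after an activation in the model:
# act = fake_quantize_activation(tf.nn.relu(pre_activation))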

Fig 8 Quantization nodes in orange inserted in the TensorFlow graph

Other changes involve setting precision_mode='INT8' and use_calibration=False, as shown below:

calib_graph_def = trt.create_inference_graph(
    input_saved_model_dir=input_saved_model_dir,
    precision_mode='INT8',
    use_calibration=False)

This extracts the quantization ranges from the graph and gives you the converted model for inference. The error is modeled using fake-quantization nodes, and the range for each one can be learned using gradient descent. TF-TRT automatically absorbs the learned quantization ranges from your graph and creates an optimized INT8 model ready for deployment.

Note that INT8 inference must be modeled as closely as possible during training. This means that you must not introduce a TensorFlow quantization node in a place that will not be quantized during inference (because a fusion occurs there). Operation patterns such as Conv > Bias > Relu or Conv > Bias > BatchNorm > Relu are usually fused together by TensorRT; therefore, it would be wrong to insert a quantization node between any of these ops. Learn more in the quantization-aware training documentation.
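As a purely illustrative sketch of this placement rule (tensor names are hypothetical, and fake_quantize_activation is the helper sketched earlier):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 3])
w = tf.Variable(tf.random_normal([3, 3, 3, 16]))
b = tf.Variable(tf.zeros([16]))

conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
biased = tf.nn.bias_add(conv, b)
act = tf.nn.relu(biased)

# Correct: quantize only the output of the fused Conv > Bias > Relu block.
act = fake_quantize_activation(act)

# Wrong: fake-quantizing `conv` or `biased` directly would model an error
# that never occurs at inference time, because TensorRT fuses these ops
# into a single kernel.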

Debugging and Profiling Tools for TensorFlow-TensorRT Applications

Many tools are available for profiling a TensorFlow-TensorRT application, ranging from command-line profilers to GUI tools, including nvprof, NVIDIA NSIGHT Systems, TensorFlow Profiler, and TensorBoard. The easiest to begin with is nvprof, a command-line profiler available for Linux, Windows, and OS X. It is a lightweight profiler that presents an overview of the GPU kernels and memory copies in your application. You can use nvprof as below:

nvprof python <your application name>

NVIDIA NSIGHT Systems is a system-wide performance analysis tool designed to visualize an application's algorithms, help users investigate bottlenecks, pursue optimizations with a higher probability of performance gains, and tune to scale efficiently across any quantity or size of CPUs and GPUs. It also provides valuable insight into the behavior and load of deep learning frameworks such as PyTorch and TensorFlow, allowing users to tune their models and parameters to increase overall single- or multi-GPU utilization.

Let’s look at a use case of using these two tools together and some information you can gather from them. In the command prompt, use the command below:

nvprof python run_inference.py

Figure 9 below shows a list of CUDA kernels sorted by decreasing computation time. Four of the top five kernels are TensorRT kernels: GEMM operations running on Tensor Cores (more on how to use Tensor Cores in the next section). Ideally, you want GEMM operations to occupy the top spots on this chart, since GPUs are great at accelerating these operations. If the top kernels are not GEMM kernels, that is a lead to investigate further and to remove or optimize those operations.

Fig 9 Output of nvprof in the command prompt showing top kernels by compute time

Fig 10 NSIGHT Systems showing the timeline view of a program utilizing the GPU well, without major gaps

Figure 10 highlights the timeline of CUDA kernels, marked with (1). The goal here is to identify the largest gaps in the timeline, which indicate times when the GPU is not performing computations. The GPU would be either waiting for data to become available or for a CPU operation to complete. Since ResNet-50 is well optimized, you notice that the gaps between kernels are very small, on the order of a few microseconds. If your graph has larger gaps, this is a lead to investigate which operations cause them. You can also see in the image above the CUDA streams and their corresponding CUDA kernels; yellow corresponds to TensorRT layers. An exhaustive treatment of debugging workflows is outside the scope of this blog.

In Figure 11, you can see a gap in the compute timeline corresponding to a period when the GPU is not utilized. Patterns like this warrant further investigation.

Fig 11 NSIGHT Systems showing the timeline view for a program that has a gap in GPU utilization

Vision and NLP represent common use cases that require these tools for analyzing how application input is processed. If pre-processing in these applications is slow because of data unavailability or network bottlenecks, the tools above will help you identify areas to optimize. We often see that the bottleneck in a TF-TRT inference pipeline is loading inputs from disk or the network (such as JPEG images or TFRecords) and preprocessing them before feeding them into the inference engine. If data pre-processing is a bottleneck, you should explore I/O libraries such as nvidia/dali, which accelerate pre-processing with optimizations such as multithreaded I/O and image processing on the GPU.

TensorFlow Profiler is another tool that ships with TensorFlow and is handy for visualizing kernel timing information. You enable it by passing additional parameters in the Python script, for example options and run_metadata provided to the session run: sess.run(res, options=options, run_metadata=run_metadata). After execution, a .json file with the profiled data is generated in Chrome trace format and can be viewed in the Chrome browser.
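For example, here is a minimal sketch of collecting such a trace with the TF 1.x session API; res stands in for whatever output tensor your script evaluates.

import tensorflow as tf
from tensorflow.python.client import timeline

options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    # res is the output tensor of your inference graph (assumed defined).
    sess.run(res, options=options, run_metadata=run_metadata)

# Convert the collected step stats to Chrome trace format and save them;
# open chrome://tracing in the Chrome browser and load the .json file.
trace = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(trace.generate_chrome_trace_format())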

You can use the TensorFlow logging capability as well as TensorBoard to see what parts of your application are converted to TensorRT. To use logging, increase the verbosity level in TensorFlow logs to print logs from a selected set of C++ files. You can learn more about verbose logging and levels allowed in the Debugging Tools documentation. See example code to increase verbosity level below:

TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine_op=2 python run_inference.py

The other option is to visualize the graph in TensorBoard, a suite of visualization tools for TensorFlow. TensorBoard allows you to examine the TensorFlow graph: which nodes are in it, which TensorFlow nodes are converted to TensorRT nodes, which nodes are attached to TensorRT nodes, and even the shapes of the tensors in the graph. Learn more in Visualizing TF-TRT Graphs With TensorBoard.

Is your algorithm using Tensor Cores?

You can use nvprof to check whether your algorithm is using Tensor Cores. Figure 9 above shows an example of measuring performance using nvprof with the inference Python script (nvprof python run_inference.py). When using Tensor Cores with FP16 accumulation, the string ‘h884’ appears in the kernel name. On Turing, kernels using Tensor Cores may have ‘s1688’ and ‘h1688’ in their names, representing FP32 and FP16 accumulation respectively.

If your algorithm is not using Tensor Cores, you can do a few things to debug and understand why. To check whether Tensor Cores are used for your network, follow these steps:

  1. Use the nvidia-smi command on the command line to confirm that the current hardware architecture is a Volta or Turing GPU.
  2. Operators such as Fully Connected, MatMul, and Conv can use Tensor Cores. Make sure that all dimensions in these ops are multiples of 8 to trigger Tensor Core usage: for matrix multiplication, the M, N, and K sizes must be multiples of 8; fully connected layers should use multiple-of-8 dimensions; and, if possible, pad input and output dimensions to multiples of 8 (see the small check sketched after this list).
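The following tiny helper is purely illustrative (not part of TensorRT or TensorFlow) and simply captures the multiple-of-8 rule of thumb for a matrix multiply:

def tensor_core_friendly(m, n, k):
    # For a GEMM of shape [m, k] x [k, n], Tensor Cores are typically
    # triggered when all three dimensions are multiples of 8.
    return all(dim % 8 == 0 for dim in (m, n, k))

print(tensor_core_friendly(512, 768, 1024))  # True
print(tensor_core_friendly(500, 768, 1024))  # False: pad 500 up to 504 or 512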

Note that in some cases TensorRT might select alternative algorithms not based on Tensor Cores if they perform faster for the chosen data and operations. You can always report bugs and interact with the TensorFlow-TensorRT community in the TensorRT forum.

Performance and Benchmarking Scripts

TensorRT maximizes inference throughput and delivers low latency across a variety of networks for image classification, object detection, and segmentation. ResNet-50, for example, achieves up to 8x higher throughput on GPUs when using TensorRT in TensorFlow. You can achieve high throughput while maintaining high accuracy thanks to support for INT8 quantization. Find the latest performance results on NVIDIA GPU platforms on the Deep Learning Product Performance page.



Posted by Margaret Maynard-Reid, Machine Learning GDE from Seattle US

A Docs Sprint is a way to improve documentation for an open-source project. Multiple docs sprints worldwide will take place for TensorFlow 2.0 around June 1, 2019. Join a TensorFlow 2.0 Global Docs Sprint whether you are a beginner or an expert in ML or TensorFlow. It’s a great way to get started with contributing to open-source projects, and you will learn a lot while contributing.

This is a step-by-step guide on how to review TensorFlow 2.0 documentation by going through a checklist for each API symbol, reporting the issues found, and (optionally) creating a Pull Request (PR) to fix them. It can be used as a cheatsheet whether you are attending a Docs Sprint near you or following along on your own.

High level summary of the steps:

Be sure to read how to Contribute to the TensorFlow Documentation on tensorflow.org for more details. If you run into questions, please post them to the TensorFlow Docs Gitter chat room here.

1. Decide which API symbol to review
  • Take a look at the TensorFlow Docs Task List here.
  • Choose a symbol that you are interested in working on. Check the Owner's GitHub Handle cell to make sure that no one else is already working on it before you claim it.
  • Write down your GitHub handle as the owner to indicate that you are reviewing this symbol; add it to the sheet as a comment. This avoids duplicate work by someone else.
2. Review the documentation

For each symbol, you will need to review the documentation against the code. If there is a link in the symbol documentation (take tf.lite.TFLiteConverter as an example), click on that link (“Defined in lite/python/lite.py”) to see the code as well as the Python docstring.

Review the symbol and assess the doc quality using the checklist in the TF 2.0 API Docs FAQ here. Note any issues, such as missing links, incorrect descriptions, or missing information.

3. Report the review result

After you review the symbol, please report the results in the task list and report issues on GitHub if applicable:

  • Enter your review results (as comments) in the TensorFlow Docs Tasks here. Once Paige accepts your comments in the task sheet, your updates will be reflected.
  • Open a new docs issue on GitHub to report any issues you found, and include the GitHub link in the docs task sheet. Note that the issue template already has the checklist pre-populated (see issue 25844 as an example). Make sure to prefix your issue title with [TF 2.0 API Docs].
Protip: If you'd like to fix the doc issue yourself, answer “Yes” to the question “Submit a pull request?” in the new GitHub doc issue you are opening; otherwise answer “No”.
4. (Optional) Fix the doc issues

This is totally optional: if you would like to fix the doc issues yourself, you can update the docstring and create a PR. Note: please include a link to the PR in the GitHub doc issue you opened.

First you need to locate the file to edit. As mentioned above, you can find the file from the link in each API's documentation. (Note: if you can't find the file, just report the issue in the task list.)

Once you are on the GitHub file that you would like to update, you can use GitHub's web-based file editor for making simple changes. Paige Bailey made a nice short video on this:

  • Switch to Edit mode — on Github click on the edit pencil icon “Edit the file in your fork of this project”
  • Make your changes in the docstring and save. (Note: there is no way to preview your markdown changes. You could copy/paste your changes into a GitHub gist, name it as a markdown file such as ‘test.md’, and see an approximate preview there.)
  • Submit your PR — Add a title to your PR and click on “Commit changes” and a new PR will be created.

Note: this blog post points to several documents written by Paige Bailey. Many thanks for the review, feedback, and contributions from Paige Bailey, Sergii Khomenko, Billy Lamberta, Josh Gordon, and the Docs Sprint organizers.

TensorFlow 2.0 Global Docs Sprint Cheatsheet was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.


Posted by Marcus Chang, Program Manager

Google I/O ’19 is now a wrap! From May 7–9, there were 13 AI and Machine Learning specific talks at I/O. TensorFlow was well represented with sessions on 2.0, AI for Mobile and IoT Devices, Swift for TensorFlow, TensorFlow Extended, TensorFlow.js, TensorFlow Graphics and much more! This post contains a listing of all the talks, and links.

Recorded sessions are now available to view on the TensorFlow YouTube channel (you can find the entire playlist here).

Machine Learning on Your Device: The Options

Developers have an often confusing plethora of options available to them in using machine learning to enhance their mobile apps and edge devices. This session demystified these options, showing you how TensorFlow can be used to train models and how you can use these models across a variety of devices with TensorFlow Lite.

Getting Started with TensorFlow 2.0

TensorFlow 2.0 is here! This talk will share a few examples for beginners and experts, and cover some of the differences between TensorFlow 1.0 and 2.0.

Swift for TensorFlow

Swift for TensorFlow is a platform for the next generation of machine learning that leverages innovations like first-class differentiable programming to seamlessly integrate deep neural networks with traditional software development. Learn how Swift for TensorFlow can make advanced machine learning research easier and why Jeremy Howard’s fast.ai has chosen it for the latest iteration of their deep learning course.

AI for Mobile and IoT devices: TensorFlow Lite

Imagine building an app that still hears voice commands when your phone is offline, or identifying products in real time with your camera. Learn how to build AI into any device using TensorFlow Lite, and no ML experience is required. Discover a library of pretrained models that are ready to use in your apps, or customize to your needs. You’ll see how quickly you can add ML to Android and iOS apps.

TensorFlow Extended (TFX): ML Pipelines and Model Understanding

This talk focuses on creating a production ML pipeline using TFX. Using TFX developers can implement ML pipelines capable of processing large datasets for both modeling and inference. In addition to data wrangling and feature engineering over large datasets, TFX enables detailed model analysis and versioning. This session focuses on implementing a TFX pipeline and a discussion of current topics in model understanding.

Machine Learning magic for your JavaScript application

TensorFlow.js is a library for training and deploying ML models in the browser and in Node.js, and offers unique opportunities for JavaScript developers. Learn about the TensorFlow.js ecosystem: how to bring an existing ML model into your JS app, re-train the model using your data and go beyond the browser to other JS platforms.

Federated Learning: Machine Learning on decentralized data

Meet federated learning: a technology for training and evaluating machine learning models across a fleet of devices (e.g. Android phones), orchestrated by a central server, without sensitive training data leaving any user’s device. Learn how this privacy-preserving technology is deployed in production in Google products and how TensorFlow Federated can enable researchers and pioneers to simulate federated learning on their own datasets.

Cloud TPU Pods: AI supercomputing that solves large ML problems

Cloud Tensor Processing Unit (TPU) is an ASIC designed by Google for neural network processing. TPUs feature a domain-specific architecture designed specifically for accelerating TensorFlow training and prediction workloads, and provide performance benefits for production machine learning use. Learn the technical details of Cloud TPU and Cloud TPU Pods, and about new TensorFlow features that enable large-scale model parallelism for deep learning training.

Machine Learning Fairness: Lessons Learned

ML fairness is a critical consideration in machine learning development. This session presented a few lessons Google has learned through our products and research, and how developers can apply these learnings in their own efforts. There is a walkthrough of techniques that enable model performance evaluation and improvement, and of the resources, such as datasets and TensorFlow Model Analysis, that are at developers' disposal. This talk helps developers proactively think about fairness in product development.

Machine Learning Zero to Hero

This is a talk for people who know code, but who don’t necessarily know ML. Learn the ‘new’ paradigm of machine learning, and how models are an alternative implementation for some logic scenarios, as opposed to writing if/then rules and other code. This recap guides you through understanding many of the new concepts in ML that you might not be familiar with, including eager mode, training loops, optimizers, and loss functions.

TF-Agents: A Flexible Reinforcement Learning Library for TensorFlow

TF-Agents is a clean, modular, and well-tested open-source library for deep reinforcement learning with TensorFlow. This session covered recent advancements in deep RL and showed how TF-Agents can help jump-start your project. You will also see how TF-Agents library components can be mixed, matched, and extended to implement new RL algorithms.

Cutting Edge TensorFlow: New Techniques

There are lots of great new things available in TensorFlow since last year's I/O. This recap takes you through four of the hottest, from hyperparameter tuning with Keras Tuner, to probabilistic programming, to ranking your data with learned ranking techniques and TF-Ranking. Finally, you will look at TF-Graphics, which brings 3D functionality to TensorFlow.

Introducing Google Coral: Building on-device AI

This session introduced Google Coral, a new platform for on-device AI application development, and showcased its ML acceleration power with TensorFlow demos. Coral offers you the tools to bring private, fast, and efficient neural network acceleration right onto your device, and enables you to grow an AI application idea from prototype to production. Learn the technical specs of the Edge TPU hardware and software tools, as well as the application development process.

TensorFlow.js and TensorFlow Lite also hosted demo stations in the ML/AI sandbox at I/O to showcase what’s new and to answer questions from attendees who visited the dome during the 3-day event!

With TensorFlow.js you can bring the power of ML to your JavaScript applications: use one of many pre-packaged models, start with previously trained models and use transfer learning to customize them on your own data, and deploy in the browser or server-side using Node.js. See some cool demos, examples, and tutorials to help you get started.

TensorFlow Lite helps you bring AI to mobile apps and edge devices! To learn more, visit tensorflow.org/lite. Explore our Android and iOS example apps, download our pre-trained, mobile-optimized ML models, or learn about ML on microcontrollers.

TensorFlow @ Google I/O ’19 Recap was originally published in TensorFlow on Medium, where people are continuing the conversation by highlighting and responding to this story.
