GPT-3 for the People
Another Datum
by Yoel Zeldes
4y ago
A few days ago, OpenAI announced the successor to their Language Model (LM): GPT-3. It is the largest model trained so far, with 175 billion parameters. While training such a large model has its merits, reading the 72-page paper can be tiresome. In this blog post I'll highlight the parts I find interesting for people who are familiar with LMs and merely wish to know (most of) the important points of this work. What's in a Language Model? "The diversity of tasks the model is able to perform in a zero-shot setting suggests that high-capacity models trained to maximize the likelihood …
The accessibility of GPT-2 - text generation and fine-tuning
Natural Language Generation (NLG) is a well-studied subject in the NLP community. With the rise of deep learning methods, NLG has become better and better. Recently, OpenAI pushed the limits with the release of GPT-2, a Transformer-based model that predicts the next token at each time step. Nowadays it's quite easy to use these models: you don't need to implement the code yourself or train the models using expensive resources. HuggingFace, for instance, has released an API that eases access to the pretrained GPT-2 OpenAI has published. Some of its features include generating t…
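The core idea behind GPT-2's generation — predict the next token at each time step, append it, repeat — can be sketched in a few lines. This is a toy decoding loop, not GPT-2 itself: the matrix `W` stands in for the Transformer forward pass, and the ten-token vocabulary is a made-up placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 10
# stand-in for the model: logits depend only on the last token
W = rng.normal(size=(VOCAB, VOCAB))

def next_token_logits(tokens):
    # a real model would run a full Transformer forward pass here
    return W[tokens[-1]]

def generate(prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):
        logits = next_token_logits(tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                       # softmax over the vocabulary
        tokens.append(int(rng.choice(VOCAB, p=probs)))  # sample the next token
    return tokens
```

Swapping `rng.choice` for `np.argmax` gives greedy decoding; libraries like HuggingFace's wrap this same loop with extras such as top-k sampling.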
Mixture of Variational Autoencoders - a Fusion Between MoE and VAE
The Variational Autoencoder (VAE) is a paragon of neural networks that try to learn the shape of the input space. Once trained, the model can be used to generate new samples from the input space. If we have labels for our input data, it's also possible to condition the generation process on the label. In the MNIST case, that means we can specify which digit we want to generate an image of. Let's take it one step further... Could we condition the generation process on the digit without using labels at all? Could we achieve the same results with an unsupervised approach? If we wanted to re…
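As a rough sketch of the fusion idea, a gating network can assign a latent vector to one of several decoders ("experts"), so each expert can specialize in one kind of input without ever seeing labels. The sizes and linear maps below are hypothetical placeholders, not the post's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

LATENT, EXPERTS, OUT = 4, 2, 8
W_gate = rng.normal(size=(LATENT, EXPERTS))  # toy gating network (just a linear map)
# one tiny "decoder" per expert (again just linear maps for illustration)
decoders = [rng.normal(size=(LATENT, OUT)) for _ in range(EXPERTS)]

z = rng.normal(size=LATENT)     # a latent vector
gate = softmax(z @ W_gate)      # soft assignment of z over the experts
expert = int(np.argmax(gate))   # hard choice at generation time
sample = z @ decoders[expert]   # generated output from the chosen expert
```

In the unsupervised setting, the hope is that training pushes each expert to own one cluster (e.g. one digit), so choosing the expert plays the role of choosing the label.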
TensorFlow — The Scope of Software Engineering
So you've finished training your model, and it's time to get some insights into what it has learned. You decide which tensor should be interesting, and go looking for it in your code to find out what its name is. Then it hits you: you forgot to give it a name. You also forgot to wrap the logical code block in a named scope, which means you'll have a hard time getting a reference to the tensor. This holds for Python scripts as well as TensorBoard: can you see that small red circle lost in the sea of tensors? Finding it is hard... That's a bummer! It would have been much better if it looked more…
Preparing for the Unexpected
Some of the problems we tackle using machine learning involve categorical features that represent real-world objects, such as words, items and categories. So what happens when, at inference time, we get new object values that have never been seen before? How can we prepare in advance so we can still make sense of the input? Unseen values, also called OOV (Out of Vocabulary) values, must be handled properly. Different algorithms have different methods for dealing with OOV values, and different assumptions about the categorical features call for different treatments as well. In this post, I'll…
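Two of the most common OOV strategies can be sketched quickly: reserving a dedicated UNK index for anything outside the vocabulary, and the hashing trick, which maps any value — seen or unseen — into a fixed number of buckets. The vocabulary and bucket count here are made up for illustration.

```python
import zlib

vocab = {"cat": 0, "dog": 1, "fish": 2}
UNK = len(vocab)  # index reserved for unseen values

def encode_unk(word):
    # strategy 1: collapse every OOV value into the single UNK index
    return vocab.get(word, UNK)

NUM_BUCKETS = 8

def encode_hash(word):
    # strategy 2: hashing trick - any value maps deterministically to one of
    # NUM_BUCKETS, so unseen values still get a (possibly colliding) index
    return zlib.crc32(word.encode("utf-8")) % NUM_BUCKETS
```

UNK is simple but throws away all information about the unseen value; hashing keeps unseen values distinguishable (up to collisions) without ever growing the vocabulary.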
Think your Data Different
In the last couple of years deep learning (DL) has become a main enabler for applications in many domains such as vision, NLP, audio and clickstream data. Recently, researchers have started to successfully apply deep learning methods to graph datasets in domains like social networks, recommender systems and biology, where data is inherently graph-structured. So how do Graph Neural Networks work? Why do we need them? The Premise of Deep Learning In machine learning tasks involving graph data, we usually want to describe each node in the graph in a way that allows us to feed it into…
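One way to build such a per-node description is message passing: each node aggregates its neighbours' features and combines them with its own. The toy 4-node graph, 2-dim features and single tanh layer below are illustrative assumptions, not a specific GNN from the post.

```python
import numpy as np

# adjacency matrix of a toy 4-node graph, and 2-dim features per node
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])

# one message-passing step: average each node's neighbour features...
deg = A.sum(axis=1, keepdims=True)
neigh_mean = (A @ X) / deg
# ...then combine them with the node's own features into new embeddings
H = np.tanh(np.concatenate([X, neigh_mean], axis=1))
```

Stacking several such steps (usually with learned weight matrices between them) lets information flow across multi-hop neighbourhoods, which is what gives GNNs their power on graph data.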
How to Build Your Personal Brand as a Data Scientist
Personal branding is a thing now. It always has been, but I believe it's been getting more and more attention recently. More people are aware of its importance, including employers. Giving you a big paycheck, assuming you're good, is obvious. Providing opportunities to flourish and build your personal brand is something an increasing number of companies are trying to seduce you with. While working in the algorithms group at Taboola, I was encouraged by the company to share my knowledge with the data science community. It motivated me to embark on a journey to build my personal brand as…
TensorFlow Filesystem - Access Tensors Differently
TensorFlow is great. Really, I mean it. The problem is that it's great up to a point. Sometimes you want to do very simple things, but TensorFlow gives you a hard time. The motivation I had for writing TFFS (TensorFlow File System) can be shared by anyone who has used TensorFlow, including you. All I wanted was to know what the name of a specific tensor is, or what its input tensors are (ignoring operations). All of these questions can be easily answered using TensorBoard: you just open the graph tab and visually examine the graph. Really convenient, right? Well, only if you want…
Variational Autoencoders Explained in Detail
In the previous post of this series I introduced the Variational Autoencoder (VAE) framework and explained the theory behind it. In this post I'll explain the VAE in more detail, or in other words, I'll provide some code :) After reading this post, you'll understand the technical details needed to implement a VAE. As a bonus, I'll show you how, by imposing a special role on some of the latent vector's dimensions, the model can generate images conditioned on the digit type.

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_…
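Two technical pieces the post builds on can be sketched without the full model: the reparameterization trick for sampling the latent vector, and reserving a few latent dimensions to encode the digit. The 12-dim latent with the first 10 dims acting as a one-hot digit slot is an illustrative assumption, not the post's exact layout.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var):
    # reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
    # so gradients can flow through mu and log_var during training
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

LATENT = 12
mu, log_var = np.zeros(LATENT), np.zeros(LATENT)
z = sample_latent(mu, log_var)

# conditional generation: overwrite the reserved "digit" dims with a one-hot code
digit = 7
z[:10] = 0.0
z[digit] = 1.0
```

At generation time the decoder receives this z, and the forced one-hot slot steers which digit it draws, while the remaining dimensions control style.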
