Understanding SoTA Language Models (BERT, RoBERTa, ALBERT, ELECTRA)
Ankit-AI | Sharing AI
by
3y ago
Hi everyone! There are a ton of language models out there today, many of which have their own way of learning "self-supervised" language representations that can be reused by downstream tasks. In this article, I decided to summarize the current trends and share some key insights to glue all these novel approaches together. (Slide credits: Devlin et al., Stanford CS224n) Problem: Context-free/Atomic Word Representations We started with context-free approaches like word2vec and GloVe embeddings in my previous post. The drawback of these approaches is that they do no ..read more
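A toy sketch of the context-free limitation: a static embedding table (as in word2vec or GloVe) assigns one vector per word type, so "bank" gets the same vector whether it follows "river" or "money". The 3-d vectors below are made-up illustrative values, not trained embeddings.

```python
# Static (context-free) embeddings: one fixed vector per word type.
import numpy as np

static_embeddings = {  # hypothetical toy 3-d embeddings
    "bank":  np.array([0.2, -0.1, 0.7]),
    "river": np.array([0.9,  0.3, 0.1]),
    "money": np.array([0.1,  0.8, 0.4]),
}

def embed(sentence):
    """Look up a static vector for each known word in the sentence."""
    return [static_embeddings[w] for w in sentence.split() if w in static_embeddings]

v1 = embed("river bank")[-1]   # "bank" as in riverside
v2 = embed("money bank")[-1]   # "bank" as in financial institution

print(np.array_equal(v1, v2))  # True: the context is ignored entirely
```

Contextual models like BERT instead produce a different vector for each occurrence of "bank", conditioned on the surrounding words.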
Visit website
Your Questions, My Answers on Stanford's Graduate AI Certification
Ankit-AI | Sharing AI
by
4y ago
I have been asked a lot of questions lately about Stanford's online course offerings and why somebody would choose them over the myriad of options online. This is my attempt to bundle them together to help a broader audience. 1. How do you choose classes? It depends on your goals and interests. Here are some questions to ask. Goal: What do you want to achieve from a particular course? Are you learning for fun, or do you want to apply the knowledge to build something? Do you want to extend/switch careers to become a Deep Learning practitioner? Do you think these tools will help you solve a ..read more
Visit website
Review : Stanford's Online Artificial Intelligence Courses - Deep Learning and Machine Learning
Ankit-AI | Sharing AI
by
4y ago
Hello! I have been enrolled at Stanford and have been taking their courses online. Here are my two cents on the ones I have taken so far. CS224n - Natural Language Processing with Deep Learning (Prof. Manning) Difficulty: 4/5 (Moderate) What to expect: Get exposed to State-of-the-Art (SoTA) Deep Learning techniques applied to NLP. Key topics: Question Answering, Text Summarization, Part-of-Speech tagging, Sequence-to-Sequence models, Transformers. Gives you a very good overview of where NLP is headed; the homeworks are challenging but let you implement the latest neural architectures to ..read more
Visit website
Future of Natural Language Processing with Deep Learning (NLP/DL)
Ankit-AI | Sharing AI
by
5y ago
I recently attended a talk by Kevin Clark (CS224n) where he talked about the future trends in NLP. I am writing this post to summarize and discuss the recent trends. Slide snippets are from his guest lecture. There are 2 primary topics that lay down the trends for NLP with Deep Learning: 1. Pre-Training using Unsupervised/Unlabeled data 2. The OpenAI GPT-2 breakthrough 1. Pre-Training using Unsupervised/Unlabeled data. Supervised data is expensive and limited; how can we use unsupervised data to supplement training, with supervised fine-tuning, to do better? Let's apply this to the problem of Mac ..read more
Visit website
Primer on the math of Machine Learning
Ankit-AI | Sharing AI
by
5y ago
1. Dot Product of Vectors (Inner Product or Scalar Product) ⟨a | b⟩ The dot product of two vectors a and b can be written as aᵀb; since it is symmetric, it can equally be written as bᵀa. For a = [a1, a2, …, an] and b = [b1, b2, …, bn] it is defined as: a · b = Σᵢ aᵢbᵢ = a1b1 + a2b2 + ⋯ + anbn. The dot product is also called the scalar product, since it produces a real-valued output ..read more
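The definition above can be checked directly in a few lines; the manual sum and NumPy's `np.dot` agree, and swapping the operands gives the same result (symmetry):

```python
# Dot product: a · b = sum_i a_i * b_i, equal to aᵀb = bᵀa.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

manual = sum(x * y for x, y in zip(a, b))  # 1*4 + 2*5 + 3*6 = 32
print(manual)          # 32.0
print(np.dot(a, b))    # 32.0 - same value via NumPy
print(np.dot(b, a))    # 32.0 - symmetric in its arguments
```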
Visit website
The evolution of Natural Language Models (NLM) - Must know NLP Basics
Ankit-AI | Sharing AI
by
5y ago
I decided to go through some of the break through papers in the field of NLP (Natural Language Processing) and summarize my learnings. The papers date from early 2000s to 2018. Source - KDNuggets If you are completely new to the field of NLP - I recommend you start by reading this article which touches on a variety of NLP basics. 1. A Neural Probabilistic Language Model 2. Efficient Estimation of Word Representations in Vector Space Word2Vec - Skipgram Model 3. Distributed Representations of Words and Phrases and their Compositionally 4. GloVe: Global Vectors for Word Representation ..read more
Visit website
A friendly introduction to Generative Adversarial Networks
Ankit-AI | Sharing AI
by
5y ago
So far, we have been talking about discriminative models, which map input features x to labels y and approximate the conditional probability P(y|x). Generative models do the opposite: they try to predict input features given the labels. Assuming the label is y, how likely are we to see certain features x? They approximate the joint probability P(x, y). Source: Medium / CycleGAN Generative Adversarial Networks (GANs) source: O'Reilly Components of a GAN: 1. Generator - This works like an inverse CNN: instead of compressing information as we go along a CNN chain and extracting features at the output, this n ..read more
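A tiny numerical sketch of the distinction drawn above, using a made-up joint table over a binary feature x and label y: a generative model targets the joint P(x, y), while the discriminative quantity P(y|x) can be recovered from it via Bayes' rule, P(y|x) = P(x, y) / P(x):

```python
# Hypothetical joint distribution P(x, y) over binary x and y.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}

def p_y_given_x(y, x):
    """Discriminative quantity derived from the joint: P(y|x) = P(x,y)/P(x)."""
    p_x = joint[(x, 0)] + joint[(x, 1)]  # marginalize y to get P(x)
    return joint[(x, y)] / p_x

print(p_y_given_x(1, 1))  # 0.4 / 0.6 ≈ 0.667
```

A discriminative model would learn P(y|x) directly without ever modeling how the features x themselves are distributed.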
Visit website
Normal Equation Algorithm for minimizing cost J
Ankit-AI | Sharing AI
by
5y ago
Gradient descent gives one way of minimizing J. A second way performs the minimization explicitly, without resorting to an iterative algorithm. In the "Normal Equation" method, we minimize J by explicitly taking its derivatives with respect to the θj's and setting them to zero. This allows us to find the optimum θ without iteration. The normal equation formula is given below: θ = (XᵀX)⁻¹Xᵀy. There is no need to do feature scaling with the normal equation. The following is a comparison of gradient descent and the normal equation: Gradien ..read more
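The closed-form solution above is a one-liner in NumPy. A minimal sketch on noise-free synthetic data, recovering known coefficients exactly (using `np.linalg.solve` rather than an explicit matrix inverse, which is numerically safer):

```python
# Normal equation: θ = (XᵀX)⁻¹ Xᵀ y, solved without iteration.
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])  # bias column + 2 features
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta                                            # noise-free targets

# Solve (XᵀX) θ = Xᵀ y instead of forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(theta, true_theta))  # True: exact recovery
```

The trade-off the excerpt goes on to compare: solving the normal equation costs O(n³) in the number of features, so gradient descent wins when n is large.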
Visit website
Understanding Convolutional Neural Networks (CNN) with an example
Ankit-AI | Sharing AI
by
5y ago
After completing Course #4 of the Coursera Deep Learning specialization I wanted to write a short summary to help y'all understand / brush up on the concept of Convolutional Neural Networks (CNNs). Let's understand CNNs with an example - Figure 1. CNN Example - Source: Coursera DL Specialization. Let's say you have a 32x32 image of digits from 0 to 9 with 3 channels (RGB). You pass it through a filter of size f in the 1st Convolutional Layer (CL1). What is the size of the output image of the filter? The size of the output image is calculated by the following formula (Source: Medium): ⌊(n + 2p − f)/s⌋ + 1, where n is the input size, p the padding, f the filter size, and s the stride. In our case ..read more
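The output-size formula is easy to check in code. A short sketch applying it to the 32x32 example, with a 5x5 filter chosen for illustration (the excerpt leaves f unspecified):

```python
# Spatial output size of a convolution: floor((n + 2p - f) / s) + 1,
# where n = input size, f = filter size, p = padding, s = stride.
def conv_out_size(n, f, p=0, s=1):
    return (n + 2 * p - f) // s + 1

# 32x32 input through a 5x5 filter, no padding, stride 1
print(conv_out_size(32, f=5))        # 28
# the same filter with "same" padding p=2 preserves the size
print(conv_out_size(32, f=5, p=2))   # 32
```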
Visit website
Optimization Algorithms for Machine Learning
Ankit-AI | Sharing AI
by
5y ago
I have been learning through Andrew Ng's Deep Learning specialization on Coursera. I have completed the 1st of the 5 courses in the specialization (Neural Networks and Deep Learning). I am onto the 2nd one, Improving Deep Neural Networks. This is a very interesting course which goes deep into hyper-parameter tuning, regularization, and optimization techniques. 1. What are Optimization Algorithms? They enable you to train your neural network much faster. Since applied Machine Learning is a very empirical process, these algorithms help you reach optimized results efficiently. Let's start looki ..read more
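One of the optimization algorithms the course covers is gradient descent with momentum. A minimal sketch minimizing the toy function f(w) = (w − 3)², with illustrative hyperparameters (not values from the course):

```python
# Gradient descent with momentum on f(w) = (w - 3)^2.
def grad(w):
    return 2 * (w - 3.0)  # df/dw

w, v = 0.0, 0.0        # parameter and velocity
lr, beta = 0.1, 0.9    # learning rate and momentum coefficient
for _ in range(200):
    v = beta * v + (1 - beta) * grad(w)  # exponentially weighted average of gradients
    w -= lr * v                           # step along the smoothed gradient
print(w)  # converges near the minimum at w = 3
```

The velocity term averages recent gradients, damping oscillations and speeding convergence compared with plain gradient descent.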
Visit website
