Understanding SoTA Language Models (BERT, RoBERTa, ALBERT, ELECTRA)
Ankit-AI | Sharing AI
by
3y ago
Hi everyone! There are a ton of language models out there today, many of which have their own way of learning "self-supervised" language representations that can be reused by downstream tasks. In this article, I decided to summarize the current trends and share some key insights to glue all these novel approaches together. (Slide credits: Devlin et al., Stanford CS224n) Problem: Context-free/Atomic Word Representations We started with context-free approaches like word2vec and GloVe embeddings in my previous post. The drawback of these approaches is that they do no ..read more
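A toy sketch of the context-free limitation: a static embedding table (as in word2vec or GloVe) assigns one vector per word type, so "bank" gets the same vector whether it follows "river" or "money". The 3-d vectors below are made-up illustrative values, not trained embeddings.

```python
# Static (context-free) embeddings: one fixed vector per word type.
import numpy as np

static_embeddings = {  # hypothetical toy 3-d embeddings
    "bank":  np.array([0.2, -0.1, 0.7]),
    "river": np.array([0.9,  0.3, 0.1]),
    "money": np.array([0.1,  0.8, 0.4]),
}

def embed(sentence):
    """Look up a static vector for each known word in the sentence."""
    return [static_embeddings[w] for w in sentence.split() if w in static_embeddings]

v1 = embed("river bank")[-1]   # "bank" as in riverside
v2 = embed("money bank")[-1]   # "bank" as in financial institution

print(np.array_equal(v1, v2))  # True: the context is ignored entirely
```

Contextual models like BERT instead produce a different vector for each occurrence of "bank", conditioned on the surrounding words.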
Visit website
Your Questions, My Answers on Stanford's Graduate AI Certification
Ankit-AI | Sharing AI
by
4y ago
I have been asked a lot of questions lately about Stanford's online course offerings and why somebody would choose them over the myriad of options online. This is my attempt to bundle them together to help a broader audience. 1. How do you choose classes? It depends on your goals and interests. Here are some questions to ask. Goal: What do you want to achieve from a particular course? Are you learning for fun, or do you want to apply the knowledge to build something? Do you want to extend/switch careers to become a Deep Learning practitioner? Do you think these tools will help you solve a ..read more
Visit website
Review : Stanford's Online Artificial Intelligence Courses - Deep Learning and Machine Learning
Ankit-AI | Sharing AI
by
4y ago
Hello! I have been enrolled at Stanford and have been taking their courses online. Here are my two cents on the ones I have taken so far. CS224n - Natural Language Processing with Deep Learning (Prof. Manning) Difficulty: 4/5 (Moderate) What to expect: Get exposed to State-of-the-Art (SoTA) Deep Learning techniques applied to NLP. Key topics: Question Answering, Text Summarization, Part-of-Speech tagging, Sequence-to-Sequence models, Transformers. Gives you a very good overview of where NLP is headed; the homeworks are challenging but let you implement the latest neural architectures to ..read more
Visit website
Future of Natural Language Processing with Deep Learning (NLP/DL)
Ankit-AI | Sharing AI
by
5y ago
I recently attended a talk by Kevin Clark (CS224n) where he talked about the future trends in NLP. I am writing this post to summarize and discuss the recent trends. Slide snippets are from his guest lecture. There are 2 primary topics that lay down the trends for NLP with Deep Learning: 1. Pre-Training using Unsupervised/Unlabeled data 2. The OpenAI GPT-2 breakthrough 1. Pre-Training using Unsupervised/Unlabeled data. Supervised data is expensive and limited; how can we use unsupervised data to supplement training, with supervised fine-tuning, to do better? Let's apply this to the problem of Mac ..read more
Visit website
Primer on the math of Machine Learning
Ankit-AI | Sharing AI
by
5y ago
1. Dot Product of Vectors (Inner Product or Scalar Product) ⟨a | b⟩ The dot product of two vectors a and b can be written as aᵀb; since it is symmetric, it can equally be written as bᵀa. For a = [a1, a2, …, an] and b = [b1, b2, …, bn] it is defined as: a · b = Σᵢ aᵢbᵢ = a1b1 + a2b2 + ⋯ + anbn. The dot product is also called the scalar product, since it produces a real-valued output ..read more
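The definition above can be checked directly in a few lines; the manual sum and NumPy's `np.dot` agree, and swapping the operands gives the same result (symmetry):

```python
# Dot product: a · b = sum_i a_i * b_i, equal to aᵀb = bᵀa.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

manual = sum(x * y for x, y in zip(a, b))  # 1*4 + 2*5 + 3*6 = 32
print(manual)          # 32.0
print(np.dot(a, b))    # 32.0 - same value via NumPy
print(np.dot(b, a))    # 32.0 - symmetric in its arguments
```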
Visit website
The evolution of Natural Language Models (NLM) - Must know NLP Basics
Ankit-AI | Sharing AI
by
5y ago
I decided to go through some of the break through papers in the field of NLP (Natural Language Processing) and summarize my learnings. The papers date from early 2000s to 2018. Source - KDNuggets If you are completely new to the field of NLP - I recommend you start by reading this article which touches on a variety of NLP basics. 1. A Neural Probabilistic Language Model 2. Efficient Estimation of Word Representations in Vector Space Word2Vec - Skipgram Model 3. Distributed Representations of Words and Phrases and their Compositionally 4. GloVe: Global Vectors for Word Representation ..read more
Visit website
A friendly introduction to Generative Adversarial Networks
Ankit-AI | Sharing AI
by
5y ago
So far, we have been talking about discriminative models, which map input features x to labels y and approximate the conditional probability P(y|x). Generative models do the opposite: they try to predict input features given the labels. Assuming the label is y, how likely are we to see certain features x? They approximate the joint probability P(x, y). Source: Medium / CycleGAN Generative Adversarial Networks (GANs) source: O'Reilly Components of a GAN: 1. Generator - This works like an inverse CNN: instead of compressing information as we go along a CNN chain and extracting features at the output, this n ..read more
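A tiny numerical sketch of the distinction drawn above, using a made-up joint table over a binary feature x and label y: a generative model targets the joint P(x, y), while the discriminative quantity P(y|x) can be recovered from it via Bayes' rule, P(y|x) = P(x, y) / P(x):

```python
# Hypothetical joint distribution P(x, y) over binary x and y.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}

def p_y_given_x(y, x):
    """Discriminative quantity derived from the joint: P(y|x) = P(x,y)/P(x)."""
    p_x = joint[(x, 0)] + joint[(x, 1)]  # marginalize y to get P(x)
    return joint[(x, y)] / p_x

print(p_y_given_x(1, 1))  # 0.4 / 0.6 ≈ 0.667
```

A discriminative model would learn P(y|x) directly without ever modeling how the features x themselves are distributed.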
Visit website
Normal Equation Algorithm for minimizing cost J
Ankit-AI | Sharing AI
by
5y ago
Gradient descent gives one way of minimizing J. A second way performs the minimization explicitly, without resorting to an iterative algorithm. In the "Normal Equation" method, we minimize J by explicitly taking its derivatives with respect to the θj's and setting them to zero. This allows us to find the optimum θ without iteration. The normal equation formula is given below: θ = (XᵀX)⁻¹Xᵀy. There is no need to do feature scaling with the normal equation. The following is a comparison of gradient descent and the normal equation: Gradien ..read more
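The closed-form solution above is a one-liner in NumPy. A minimal sketch on noise-free synthetic data, recovering known coefficients exactly (using `np.linalg.solve` rather than an explicit matrix inverse, which is numerically safer):

```python
# Normal equation: θ = (XᵀX)⁻¹ Xᵀ y, solved without iteration.
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])  # bias column + 2 features
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta                                            # noise-free targets

# Solve (XᵀX) θ = Xᵀ y instead of forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(theta, true_theta))  # True: exact recovery
```

The trade-off the excerpt goes on to compare: solving the normal equation costs O(n³) in the number of features, so gradient descent wins when n is large.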
Visit website
Understanding Convolutional Neural Networks (CNN) with an example
Ankit-AI | Sharing AI
by
5y ago
After completing Course #4 of the Coursera Deep Learning specialization I wanted to write a short summary to help y'all understand / brush up on the concept of Convolutional Neural Networks (CNNs). Let's understand CNNs with an example - Figure 1. CNN Example - Source: Coursera DL Specialization. Let's say you have a 32x32 image of digits from 0 to 9 with 3 channels (RGB). You pass it through a filter of size f in the 1st Convolutional Layer (CL1). What is the size of the output image of the filter? The size of the output image is calculated by the following formula (Source: Medium): ⌊(n + 2p − f)/s⌋ + 1, where n is the input size, p the padding, f the filter size, and s the stride. In our case ..read more
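The output-size formula is easy to check in code. A short sketch applying it to the 32x32 example, with a 5x5 filter chosen for illustration (the excerpt leaves f unspecified):

```python
# Spatial output size of a convolution: floor((n + 2p - f) / s) + 1,
# where n = input size, f = filter size, p = padding, s = stride.
def conv_out_size(n, f, p=0, s=1):
    return (n + 2 * p - f) // s + 1

# 32x32 input through a 5x5 filter, no padding, stride 1
print(conv_out_size(32, f=5))        # 28
# the same filter with "same" padding p=2 preserves the size
print(conv_out_size(32, f=5, p=2))   # 32
```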
Visit website
Optimization Algorithms for Machine Learning
Ankit-AI | Sharing AI
by
5y ago
I have been learning through Andrew Ng's Deep Learning specialization on Coursera. I have completed the 1st of the 5 courses in the specialization (Neural Networks and Deep Learning). I am onto the 2nd one, Improving Deep Neural Networks. This is a very interesting course which goes deep into hyper-parameter tuning, regularization, and optimization techniques. 1. What are Optimization Algorithms? They enable you to train your neural network much faster. Since applied Machine Learning is a very empirical process, these algorithms help you reach optimized results efficiently. Let's start looki ..read more
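One of the optimization algorithms the course covers is gradient descent with momentum. A minimal sketch minimizing the toy function f(w) = (w − 3)², with illustrative hyperparameters (not values from the course):

```python
# Gradient descent with momentum on f(w) = (w - 3)^2.
def grad(w):
    return 2 * (w - 3.0)  # df/dw

w, v = 0.0, 0.0        # parameter and velocity
lr, beta = 0.1, 0.9    # learning rate and momentum coefficient
for _ in range(200):
    v = beta * v + (1 - beta) * grad(w)  # exponentially weighted average of gradients
    w -= lr * v                           # step along the smoothed gradient
print(w)  # converges near the minimum at w = 3
```

The velocity term averages recent gradients, damping oscillations and speeding convergence compared with plain gradient descent.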
Visit website
