Salmon Run on Feedspot

Book Report: Machine Learning for Drug Discovery

Salmon Run

by Sujit Pal

1M ago

Drug Discovery is a field where biochemists (and more recently computer scientists) turn ideas into potential medications. I first came across a few applications in this area when checking out how to build Graph Neural Networks (GNN) as part of auditing the CS224W: Machine Learning with Graphs course from Stanford, some learnings of which I recycled into my Deep Learning with Graphs tutorial at ..read more

Hierarchical (and other) Indexes using LlamaIndex for RAG Content Enrichment

Salmon Run

by Sujit Pal

1M ago

At our weekly This Week in Machine Learning (TWIML) meetings, our leader and facilitator Darin Plutchok pointed out a LinkedIn blog post on Semantic Chunking, which has recently been implemented in the LangChain framework. Unlike more traditional chunking approaches that use the number of tokens or separator tokens as a guide, this one chunks groups of sentences into semantic units by breaking them ..read more

Thoughts on using LangChain LCEL with Claude

Salmon Run

by Sujit Pal

2M ago

I got into Natural Language Processing (NLP) and Machine Learning (ML) through Search. And this led me into Generative AI (GenAI), which led me back to Search via Retrieval Augmented Generation (RAG). RAG started out relatively simple -- take a query, generate search results, use search results as context for a Large Language Model (LLM) to generate an abstractive summary of the results. Back ..read more
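The simple RAG flow described above (query, search results, results as context for an LLM) can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the word-overlap `retrieve` function, the `build_prompt` helper, and the prompt wording are all hypothetical stand-ins, and the actual LLM call is left out.

```python
def retrieve(query, documents, k=2):
    """Toy search step: rank documents by word overlap with the query.

    A real system would use a search engine or vector index instead.
    """
    query_terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, results):
    """Pack the search results into the context of a summarization prompt."""
    context = "\n".join(f"- {result}" for result in results)
    return (
        "Summarize the search results below to answer the query.\n\n"
        f"Query: {query}\n\n"
        f"Search results:\n{context}\n\n"
        "Summary:"
    )


docs = [
    "Graph neural networks learn from graph structure.",
    "Spark partitions data across workers.",
    "Salmon swim upstream to spawn.",
]
query = "graph neural networks"
top_results = retrieve(query, docs, k=1)
prompt = build_prompt(query, top_results)
# The prompt would then be sent to an LLM of your choice (call not shown).
print(prompt)
```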

Book Report: Allen B Downey's Probably Overthinking It

Salmon Run

by Sujit Pal

2M ago

I have read Allen Downey's books on statistics in the past, when trying to turn myself from a Software Engineer into what Josh Wills says a Data Scientist is -- someone who is better at statistics than a Software Engineer and better at software than a statistician (with somewhat limited success in the first area, I will hasten to add). Last year, I had the good fortune to present at PyData Global ..read more

Knowledge Graph Aligned Entity Linker using SentenceTransformers

Salmon Run

by Sujit Pal

4M ago

Most of us are familiar with Named Entity Recognizers (NERs) that can recognize spans in text as belonging to a small number of classes, such as Person (PER), Organization (ORG), Location (LOC), etc. These are usually multi-class classifier models, trained on input sequences to return BIO (Begin-Inside-Outside) tags for each token. However, recognizing entities in a Knowledge Graph (KG) using this ..read more
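To make the per-token BIO tagging concrete, here is a small sketch of how such tags can be decoded back into entity spans. The `bio_to_spans` helper and the example sentence are hypothetical and only illustrate the tagging scheme, not any particular NER library.

```python
def bio_to_spans(tokens, tags):
    """Group tokens into (text, label) entity spans from BIO tags.

    B-X starts an entity of class X, I-X continues it, and O marks a token
    outside any entity. An I-X tag that does not continue an open X entity
    is treated as outside in this sketch.
    """
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # Close any open entity, then start a new one.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # O tag (or a stray I- tag): close any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


tokens = ["Marinka", "Zitnik", "teaches", "at", "Harvard"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG"]
print(bio_to_spans(tokens, tags))
# [('Marinka Zitnik', 'PER'), ('Harvard', 'ORG')]
```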

PyData Global 2023: Trip Report

Salmon Run

by Sujit Pal

4M ago

I had the opportunity to present at PyData Global this year. It is a virtual conference that ran over 3 days in multiple tracks from December 6 to 8. I talked about Building Learning to Rank models for search using Large Language Models. For those attending the conference, I already shared the links to the slides and the associated code on its Discord channel, but for those who are not, they are ..read more

Building Learning to Rank Models with Generative AI

Salmon Run

by Sujit Pal

5M ago

Generative AI has been the new cool kid on the AI / ML block since early this year. Like everyone else, I continue to be amazed and wowed with each successive success story as they break existing benchmark records and showcase novel applications built on top of their new functionality. I was also lucky to be involved in a Generative AI project since the middle of this year, which gave me access ..read more

A PySpark idiom for efficient Model Inference

Salmon Run

by Sujit Pal

7M ago

I recently needed to build an Apache Spark (PySpark) job where the task was (among other things) to use a Language Model (LM) to encode text into vectors. This is an embarrassingly parallel job where the mapping from text to encoding is one-to-one, so something like Spark works very well here. We could, in theory at least, achieve an N-fold performance improvement by horizontally partitioning the data into N ..read more

Future of Data Centric AI -- Trip Report

Salmon Run

by Sujit Pal

11M ago

I attended the Future of Data Centric AI 2023 this week, a free virtual conference organized by Snorkel AI. Snorkel AI is a company built around the open-source Snorkel framework for programmatic data labeling. The project originally started at Stanford University's Hazy Research group, and many (all?) of the company's founders and some engineers are from the original research team. Snorkel AI ..read more

BMI 702 Review Part III (Language Modeling)

Salmon Run

by Sujit Pal

1y ago

Welcome to Part III of my review of the Biomedical Artificial Intelligence (BMI 702) course, part of Harvard's Foundations of Biomedical Informatics 2023 Spring session, taught by Prof Marinka Zitnik and her team. If you want to check out my previous two reviews in this series, they are listed below: BMI 702 Review Part I; BMI 702 Review Part II (Graph Learning). As the title of my post ..read more
