Book Report: Machine Learning for Drug Discovery
Salmon Run
by Sujit Pal
3w ago
Drug Discovery is a field where biochemists (and more recently computer scientists) turn ideas into potential medications. I first came across a few applications in this area when checking out how to build Graph Neural Networks (GNN) as part of auditing the CS224W: Machine Learning with Graphs course from Stanford, some learnings of which I recycled into my Deep Learning with Graphs tutorial at ..read more
Hierarchical (and other) Indexes using LlamaIndex for RAG Content Enrichment
Salmon Run
by Sujit Pal
1M ago
At our weekly This Week in Machine Learning (TWIML) meetings, our leader and facilitator Darin Plutchok pointed out a LinkedIn blog post on Semantic Chunking, which has recently been implemented in the LangChain framework. Unlike more traditional chunking approaches that use number of tokens or separator tokens as a guide, this one chunks groups of sentences into semantic units by breaking them ..read more
Thoughts on using LangChain LCEL with Claude
Salmon Run
by Sujit Pal
1M ago
I got into Natural Language Processing (NLP) and Machine Learning (ML) through Search. And this led me into Generative AI (GenAI), which led me back to Search via Retrieval Augmented Generation (RAG). RAG started out relatively simple -- take a query, generate search results, use search results as context for a Large Language Model (LLM) to generate an abstractive summary of the results. Back ..read more
Book Report: Allen B Downey's Probably Overthinking It
Salmon Run
by Sujit Pal
2M ago
I have read Allen Downey's books on statistics in the past, when trying to turn myself from a Software Engineer into what Josh Wills says a Data Scientist is -- someone who is better at statistics than a Software Engineer and better at software than a statistician (with somewhat limited success in the first area, I will hasten to add). Last year, I had the good fortune to present at PyData Global ..read more
Knowledge Graph Aligned Entity Linker using SentenceTransformers
Salmon Run
by Sujit Pal
3M ago
Most of us are familiar with Named Entity Recognizers (NERs) that can recognize spans in text as belonging to a small number of classes, such as Person (PER), Organization (ORG), Location (LOC), etc. These are usually multi-class classifier models, trained on input sequences to return BIO (Begin-Inside-Outside) tags for each token. However, recognizing entities in a Knowledge Graph (KG) using this ..read more
PyData Global 2023: Trip Report
Salmon Run
by Sujit Pal
4M ago
I had the opportunity to present at PyData Global this year. It is a virtual conference that ran over 3 days in multiple tracks from December 6 to 8. I talked about Building Learning to Rank models for search using Large Language Models. For those attending the conference, I already shared the links to the slides and the associated code on its Discord channel, but for those who are not, they are ..read more
Building Learning to Rank Models with Generative AI
Salmon Run
by Sujit Pal
4M ago
Generative AI has been the new cool kid on the AI / ML block since early this year. Like everyone else, I continue to be amazed and wowed with each successive success story as they break existing benchmark records and showcase novel applications built on top of their new functionality. I was also lucky to be involved in a Generative AI project since the middle of this year, which gave me access ..read more
A PySpark idiom for efficient Model Inference
Salmon Run
by Sujit Pal
6M ago
I recently needed to build an Apache Spark (PySpark) job where the task was (among other things) to use a Language Model (LM) to encode text into vectors. This is an embarrassingly parallel job where the mapping from text to encoding is one-to-one, so something like Spark works very well here. We could, in theory at least, achieve an N-fold performance improvement by horizontally partitioning the data into N ..read more
Future of Data Centric AI -- Trip Report
Salmon Run
by Sujit Pal
11M ago
I attended the Future of Data Centric AI 2023 this week, a free virtual conference organized by Snorkel AI. Snorkel.AI is a company built around the open-source Snorkel framework for programmatic data labeling. The project originally started at Stanford University's Hazy Research group, and many (all?) of the company's founders and some engineers are from the original research team. Snorkel.AI ..read more
BMI 702 Review Part III (Language Modeling)
Salmon Run
by Sujit Pal
11M ago
Welcome to Part III of my review of the Biomedical Artificial Intelligence (BMI 702) course, part of Harvard's Foundations of Biomedical Informatics 2023 Spring session, taught by Prof Marinka Zitnik and her team. If you want to check out my previous two reviews in this series, they are listed below: BMI 702 Review Part I; BMI 702 Review Part II (Graph Learning). As the title of my post ..read more
