BMI 702 Review Part III (Language Modeling)
Salmon Run
by Sujit Pal
1w ago
Welcome to Part III of my review of the Biomedical Artificial Intelligence (BMI 702) course, part of Harvard's Foundations of Biomedical Informatics 2023 Spring session, taught by Prof. Marinka Zitnik and her team. If you want to check out my previous two reviews in this series, they are listed below: BMI 702 Review Part I; BMI 702 Review Part II (Graph Learning). As the title of my post ..read more
Haystack US 2023: Trip Report
Salmon Run
by Sujit Pal
1M ago
I attended the Haystack US 2023 Search Relevance conference last week. It was a great opportunity to share ideas and techniques around search and search relevance, to catch up with old friends and acquaintances, and to make new ones. I was there only for the two days of the actual conference, but there were events before and after the conference as well. The full talk schedule ..read more
BMI 702 Review Part II (Graph Learning)
Salmon Run
by Sujit Pal
1M ago
This week I continue with the review of the papers suggested in the Biomedical Artificial Intelligence (BMI 702) course, specifically the Graph Learning (M3) module. There are 7 papers in the first week (2 required, 5 optional) and 5 in the second week (2 required, 3 optional). In this post I will attempt to enumerate my high-level takeaways from this module and summarize these 12 papers so you can ..read more
BMI 702 Review Part I
Salmon Run
by Sujit Pal
2M ago
I recently moved to our Health Markets division as part of an internal restructuring. While it is essentially a lateral shift, there are subtle differences in the kind of work I will do going forward versus what I have been doing at Elsevier so far. At my previous position at Labs, the focus of work was more on the use of technology to solve business problems of other teams such as those in ..read more
Resurrection
Salmon Run
by Sujit Pal
3M ago
2022 has come and gone without a single blog post from my end. To be fair, my blogging output has been steadily decreasing over the last few years, so you would be justified in thinking of it as a somewhat inevitable trend. In other words, we had a good run, etc. Thinking back, one possible reason for my decreasing output is that my previous job was more product focused and my current one is ..read more
Fine-tuning OpenAI CLIP for different domains
Salmon Run
by Sujit Pal
1y ago
In July this year, a group of us on the TWIML Slack Channel came together and participated in the Flax/JAX Community Week organized by Hugging Face and Google Cloud. Our project was about fine-tuning the CLIP Model from OpenAI with the RSICD (Remote Sensing Image Captioning Dataset), and ended up placing third. The code for the project is available on GitHub at arampacha/CLIP-rsicd if you are ..read more
Distributed Training of a Bengali ALBERT model
Salmon Run
by Sujit Pal
2y ago
Even though I am from India and my mother tongue is Bengali, and I speak, read, and write both Hindi and Bengali almost as well as English, in my career with Natural Language Processing (NLP) I have worked exclusively with English. This is probably not that uncommon, because until recently, English was the language where most NLP work happened, and to a lesser extent some of the major European ..read more
More tricks to improve performance of CIFAR-10 classifier
Salmon Run
by Sujit Pal
2y ago
Some time back I wrote a post about Tricks to improve performance of CIFAR-10 classifier, based on things I learned from New York University's Deep Learning with PyTorch course taught by Yann LeCun and Alfredo Canziani. The tricks I covered were conveniently located on a single slide in one of the lectures. Shortly thereafter, I learned of a few more tricks that were mentioned in passing, so I ..read more
Learning Vespa
Salmon Run
by Sujit Pal
2y ago
No, not the scooter :-). I meant Vespa.AI, a search engine that supports structured search, text search, and approximate vector search. While Vespa's vector search functionality was probably built in response to search engines incorporating vector based signals into their ranking algorithms, there are many ML/NLP pipelines as well that can benefit from vector search, i.e., the ability to find ..read more
Comparison of Text Augmentation Strategies for Spam Detection
Salmon Run
by Sujit Pal
2y ago
Some time back, I found myself thinking of different data augmentation strategies for unbalanced datasets, i.e. datasets in which one or more classes are over-represented compared to the others, and wondering how these strategies stack up to one another. So I decided to set up a simple experiment to compare them. This post describes the experiment and its results. The dataset I chose for this ..read more