Towards Data Science
Sharing concepts, ideas and codes on data science. Towards Data Science Inc. is a corporation registered in Canada. It provides a platform for thousands of people to exchange ideas and to expand their understanding of data science.
12h ago
Concerns about the environmental impacts of Large Language Models (LLMs) are growing. Although detailed information about the actual costs of LLMs can be difficult to find, let’s attempt to gather some facts to understand the scale.
Generated with ChatGPT-4o
Since comprehensive data on GPT-4 is not readily available, we can consider Llama 3.1 405B as an example. This open-source model from Meta is arguably the most “transparent” LLM to date. Based on various benchmarks, Llama 3.1 405B is comparable to GPT-4, providing a reasonable basis for understanding LLMs within this range…
14h ago
Here’s a better framework for data-driven decision-making [Image by Author]
Data scientists are in the business of decision-making. Our work is focused on how to make informed choices under uncertainty.
And yet, when it comes to quantifying that uncertainty, we often lean on the idea of “statistical significance” — a tool that, at best, provides a shallow understanding.
In this article, we’ll explore why “statistical significance” is flawed: arbitrary thresholds, a false sense of certainty, and a failure to address real-world trade-offs.
Most importantly, we’ll learn how to move beyond the…
14h ago
From data engineer to domain expert: what it takes to build a new data platform
“The best data engineers are runaway software engineers; the best data analysts, scientists, and solution (data) architects are runaway data engineers; and the best data product managers are runaway data analysts or scientists.”
[Photo by Yurii Khomitskyi on Unsplash]
Ever wonder why roles with multidisciplinary skills are the key to successful delivery in new data settings? If so, read this post to discover how hybrid roles and their LSS capabilities can take you from idea to revenue.
After spending…
19h ago
ML Lessons for Managers and Engineers
A technical walkthrough of lesson one
Image created by the author
Welcome back to the second lesson in my series, ML Lessons for Managers and Engineers. Today, by popular demand, I’ll walk you through implementing the solution I wrote about in lesson one.
This is a more technical lesson than I originally intended for this series, but I believe that most professionals benefit from a better understanding of machine learning technology.
To keep it as relevant as possible, I’ll focus mainly on the underlying reasoning because that’s where the value…
23h ago
Streamlit-AgGrid is amazing. But there are 2 scenarios where its use is not recommended.
Image generated with DALL-E
Hello there! I assume you are reading this blog post because you are aware of Streamlit and AgGrid. If, by chance, you are not familiar with either or want to dive into the technical details of AgGrid, I wrote a detailed blog post on how to create well-styled dataframes using the Streamlit-AgGrid component created by Pablo Fonseca.
In my opinion, st_aggrid is one of the best “extra” components in Streamlit. In fact, as of this writing, it is the top recommended component in…
2d ago
A prompt-based experiment to improve both accuracy and transparent reasoning in content personalization. Deliver relevant content to readers at the right time.
Image by author.
At DER SPIEGEL, we are continually exploring ways to improve how we recommend news articles to our readers. In our latest (offline) experiment, we investigated whether Large Language Models (LLMs) could effectively predict which articles a reader would be interested in, based on their reading history.
Our Approach
We conducted a study with readers who participated in a survey where they rated their interest in…
2d ago
Master the art of behavioral interviews and land your dream job
Image generated by ChatGPT
I work for an institute that prepares professionals to land jobs at high-tech companies such as Amazon, Meta, and Google. As part of their interview preparation, many candidates ask for behavioral mock interviews. Their main goal is to figure out what will be asked in these interviews and how they should prepare for it.
I wrote down my experiences, both as a candidate and as a hiring manager, in a few pages of this book, “Grokking Behavioral Interviews”. You can get the book…
2d ago
MODEL EVALUATION & OPTIMIZATION
12 must-know methods to validate your machine learning
Every day, machines make millions of predictions — from detecting objects in photos to helping doctors find diseases. But before trusting these predictions, we need to know if they’re any good. After all, no one would want to use a machine that’s wrong most of the time!
This is where validation comes in. Validation methods test machine predictions to measure their reliability. While this might sound simple, different validation approaches exist, each designed to handle specific challenges in…
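The teaser does not list the 12 methods, so as one widely used example, here is a minimal sketch of k-fold cross-validation in plain Python. It is not the article's code: the helper names `k_fold_indices` and `cross_validate` and the `fit`/`score` callbacks are hypothetical, and folds are contiguous (no shuffling) for simplicity.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Samples are split into k contiguous folds (no shuffling); the first
    n_samples % k folds receive one extra sample. Each fold serves once
    as the validation set while the other folds form the training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(xs: Sequence, ys: Sequence, k: int,
                   fit: Callable, score: Callable) -> float:
    """Average a validation score over k folds.

    `fit(train_x, train_y)` must return a model and
    `score(model, val_x, val_y)` must return a number (hypothetical callbacks).
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [2.0 * x for x in xs]

    def fit_mean(train_x, train_y):
        # Trivial baseline "model": always predict the training mean
        return sum(train_y) / len(train_y)

    def mae(model, val_x, val_y):
        # Mean absolute error of the constant prediction on the held-out fold
        return sum(abs(y - model) for y in val_y) / len(val_y)

    print(cross_validate(xs, ys, k=5, fit=fit_mean, score=mae))  # prints 6.2
```

Because every sample is held out exactly once, the averaged score is less sensitive to one lucky or unlucky split than a single train/test split; in practice a library implementation with shuffling or stratification would usually be preferred.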
2d ago
Real-world examples of how actively using special methods can simplify code and improve readability.
Dunder methods, though arguably a basic topic in Python, are something I have often seen understood only superficially, even by people who have been coding for quite some time.
Disclaimer: This is a forgivable gap, as in most cases, actively using dunder methods “simply” speeds up and standardizes tasks that can be done differently. Even when their use is essential, programmers are often unaware that they are writing special methods that belong to the broader category of dunder methods…
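As a minimal sketch (not taken from the article), the hypothetical `Basket` class below shows how a few dunder methods let a custom object plug into built-in syntax such as `len()`, `in`, indexing, slicing, iteration, `+`, and `==`.

```python
class Basket:
    """Hypothetical container used only to illustrate common dunder methods."""

    def __init__(self, items=None):
        # Copy into a private list so callers can't mutate our state by accident
        self._items = list(items) if items is not None else []

    def __repr__(self):
        # Used by repr(), print() (no __str__ defined) and the interactive prompt
        return f"Basket({self._items!r})"

    def __len__(self):
        # Enables len(basket); bool(basket) also falls back to this
        return len(self._items)

    def __getitem__(self, index):
        # Enables basket[i]; slices return a new Basket instead of a list
        if isinstance(index, slice):
            return Basket(self._items[index])
        return self._items[index]

    def __iter__(self):
        # Enables for-loops and list(basket)
        return iter(self._items)

    def __contains__(self, item):
        # Enables `item in basket`
        return item in self._items

    def __add__(self, other):
        # Enables basket_a + basket_b; NotImplemented lets Python raise TypeError
        if not isinstance(other, Basket):
            return NotImplemented
        return Basket(self._items + other._items)

    def __eq__(self, other):
        # Value-based equality; defining __eq__ also makes instances unhashable
        if not isinstance(other, Basket):
            return NotImplemented
        return self._items == other._items


if __name__ == "__main__":
    combined = Basket(["apple", "pear"]) + Basket(["plum"])
    print(combined)                         # Basket(['apple', 'pear', 'plum'])
    print(len(combined), "pear" in combined)  # 3 True
    print(combined[1:])                     # Basket(['pear', 'plum'])
    print(bool(Basket()))                   # False
```

Returning `NotImplemented` rather than raising gives the other operand a chance to handle the operation before Python raises `TypeError`. Because `__eq__` is defined without `__hash__`, instances cannot be used as dict keys or set members, which is a sensible default for a mutable container.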
3d ago
Delve into an end-to-end Machine Learning project to improve the quality of the Open Food Facts database
Image generated with Flux1
Open Food Facts’ purpose is to create the largest open-source food database in the world. To date, it has collected information on over 3 million products, thanks to its contributors.
Nutritional value, eco-score, product origins, and more: a wide range of data describes each product and gives consumers and researchers insight into what they put on their plates.
This information is provided by the community of users and contributors, who actively add…