
Data Skeptic
7,184 FOLLOWERS
Data Skeptic is your source for a perspective of scientific skepticism on topics in statistics, machine learning, big data, artificial intelligence, and data science. Our weekly podcast and blog bring you stories and tutorials to help understand our data-driven world.
Data Skeptic
3d ago
In this episode, Bnaya Gross, a Fulbright postdoctoral fellow at the Center for Complex Network Research at Northwestern University, explores the transformative applications of network science in fields ranging from infrastructure to medicine, by studying the interactions between networks ("a network of networks").
Listeners will learn how interdependent networks provide a framework for understanding cascading failures, such as power outages, and how these insights transfer to physical systems like superconducting materials and biological networks.
Key takeaways include understanding how depen ..read more
Data Skeptic
1w ago
Our guests, Erwan Le Merrer and Gilles Tredan, are long-time collaborators in graph theory and distributed systems. They share their expertise on applying graph-based approaches to understanding both large language model (LLM) hallucinations and shadow banning on social media platforms.
In this episode, listeners will learn how graph structures and metrics can reveal patterns in algorithmic behavior and platform moderation practices.
Key insights include the use of graph theory to evaluate LLM outputs, uncovering patterns in hallucinated graphs that might hint at the underlying structure and t ..read more
Data Skeptic
2w ago
In this episode, Šimon Mandlík, a PhD candidate at the Czech Technical University will talk with us about leveraging machine learning and graph-based techniques for cybersecurity applications.
We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets.
This will include the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs and the advantages of analyzing connections between entities (like clients, domains etc.).
Our gue ..read more
Data Skeptic
3w ago
Thibaut Vidal, a professor at Polytechnique Montreal, specializes in leveraging advanced algorithms and machine learning to optimize supply chain operations.
In this episode, listeners will learn how graph-based approaches can transform supply chains by enabling more efficient routing, districting, and decision-making in complex logistical networks.
Key insights include the application of Graph Neural Networks to predict delivery costs, with potential to improve districting strategies for companies like UPS or Amazon and overcoming limitations of traditional heuristic methods.
Thibaut’s work u ..read more
Data Skeptic
1M ago
Our guest in this episode is David Tench, a Grace Hopper postdoctoral fellow at Lawrence Berkeley National Labs, who specializes in scalable graph algorithms and compression techniques to tackle massive datasets.
In this episode, we will learn how his techniques enable real-time analysis of large datasets, such as particle tracking in physics experiments or social network analysis, by reducing storage requirements while preserving critical structural properties.
David also challenges the common belief that giant graphs are sparse by pointing to a potential bias: Maybe because of the challenge ..read more
Data Skeptic
1M ago
In this episode, Dave Bechberger, principal Graph Architect at AWS and author of "Graph Databases in Action", brings deep insights into the field of graph databases and their applications.
Together we delve into specific scenarios in which Graph Databases provide unique solutions, such as in the fraud industry, and learn how to optimize our DB for questions around connections, such as "How are these entities related?" or "What patterns of interaction indicate anomalies?"
This discussion sheds light on when organizations should consider adopting graph databases, particularly for cases that req ..read more
Data Skeptic
2M ago
In this episode, Adam Machowczyk, a PhD student at the University of Leicester, specializes in graph rewriting and its intersection with machine learning, particularly Graph Neural Networks.
Adam explains how graph rewriting provides a formalized method to modify graphs using rule-based transformations, allowing for tasks like graph completion, attribute prediction, and structural evolution.
Bridging the worlds of graph rewriting and machine learning, Adam's work aspire to open new possibilities for creating adaptive, scalable models capable of solving challenges that traditional methods ..read more
Data Skeptic
2M ago
Our guest today is Risa Shinoda, a PhD student at Kyoto University Agricultural Systems Engineering Lab, where she applies computer vision techniques.
She talked about the OpenAnimalTracks dataset and what it was used for. The dataset helps researchers predict animal footprint. She also discussed how she built a model for predicting tracks of animals. She shared the algorithms used and the accuracy they achieved. She also discussed further improvement opportunities for the model ..read more
Data Skeptic
2M ago
Today, We are joined by Petter Törnberg, an Assistant Professor in Computational Social Science at the University of Amsterdam and a Senior Researcher at the University of Neuchatel. His research is centered on the intersection of computational methods and their applications in social sciences. He joins us to discuss findings from his research papers, ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning, and How to use LLMs for Text Analysis ..read more
Data Skeptic
2M ago
In this episode, the data scientist Wentao Su shares his experience in AB testing on social media platforms like LinkedIn and TikTok.
We talk about how network science can enhance AB testing by accounting for complex social interactions, especially in environments where users are both viewers and content creators. These interactions might cause a "spillover effect" meaning a possible influence across experimental groups, which can distort results.
To mitigate this effect, our guest presents heuristics and algorithms they developed ("one-degree label propagation”) to allow for good results on b ..read more