Elucidata
Elucidata's mission is to use data analytics to transform decision-making in R&D labs at biotechnology and pharmaceutical companies. On our blog, you will find easy-to-understand, actionable insights to help your company improve its data management.
Elucidata
2y ago
The ‘Monthly Dataset Roundup’ series features datasets on Polly that are of scientific value, intended to promote data sharing and reuse of bio-molecular data. Polly OmixAtlases contains highly curated ML-ready data sets from diverse public data repositories of both omics (transcriptomics, proteomics, metabolomics, single-cell data, etc.) and non-omics data (flow cytometry, lab measurements, immunological assays, etc.). It offers a unique advantage of allowing users to access, utilize and integrate diverse data types to perform truly multi-dimensional analysis of their research question. This ..read more
Elucidata
2y ago
According to a 2021 survey by Gartner[1], poor-quality data cost organizations $12.9 million annually on average. That number is increasing continuously, given that the data/information created, captured, copied, and consumed worldwide is estimated to have grown[2] from about 9 zettabytes in 2013 to about 97 zettabytes in 2022. If the average cost of data management is 3.5% of a company’s revenue, and half of that information has no value, there is a material waste of capital. For companies in verticals such as life sciences, with a research and development function, data management cos ..read more
Elucidata
2y ago
As much as 80 percent of the raw scientific data collected by researchers in the early 1990s is no longer usable, primarily because nobody knows where to find it. While much of this data is still available “somewhere”, the time and effort required to fetch it are invariably prohibitive. It was only in the late 1990s and early 2000s that researchers started archiving larger datasets institutionally. Although the biomedical research community has been one of the earliest adopters of the latest technologies and principles to store, process, and consume data and knowledge, the volume, variety, and ..read more
Elucidata
2y ago
Discovery teams working on single-cell data typically get stuck for days or even weeks on the initial step of sourcing relevant datasets from open-source portals. Storing and analyzing this data is another roadblock. Let’s take a quick look at the recurring challenges scientists face while performing single-cell analysis (SCA) and some solutions that could streamline their discovery process.
Single Cell RNA Data Analysis Workflow
Challenges of working with publicly available single-cell data
Semi-structured, raw scRNA-seq data from public repositories are difficult to retrieve and int ..read more
Elucidata
2y ago
The ‘Monthly Dataset Roundup’ series features datasets on Polly that are of scientific value, intended to promote data sharing and reuse of biomedical molecular data. Polly OmixAtlases contains ML-ready, curated data sets from diverse public data repositories of both omics (transcriptomics, proteomics, metabolomics, single-cell data, etc.) and non-omics data (flow cytometry, lab measurements, immunological assays, etc.). It offers a unique advantage of allowing users to access, utilize and integrate diverse data types to perform truly multi-dimensional analyses of their research question. Thi ..read more
Elucidata
2y ago
Inconsistent data inflows can be a huge challenge for engineers who oversee ETL pipelines. Last month, one of our biggest clients introduced 32 TB of biomedical molecular data on our platform. Up until that point, our cloud infrastructure was processing a maximum of 15 TB every month. Handling such a large surge requires an architecture that makes the best use of all the available tools, so we implemented one. As a result, our platforms can now ingest more than 600 GB per day, up from 60 GB per day a month ago. That’s a 10x performance improvement. Not only that, but we also brought ..read more
Elucidata
3y ago
Our ‘Dataset of the Week’ series features publicly available omics datasets of scientific value, with the aim of promoting data sharing and reuse.
This week’s dataset is from the publication titled ‘Ancestry as a potential modifier of gene expression in breast neoplasms/tumors from Colombian women.’
Hispanic/Latino populations are a genetically admixed and heterogeneous group. It has been previously reported that the most prevalent breast neoplasm intrinsic subtype in Colombian women was Luminal B. Therefore, this paper aimed to explore ancestry-associated differences in molecular profiles of ..read more
Elucidata
3y ago
The diversity of biomedical omics data provides a wealth of information to researchers but also presents considerable analytical challenges. Our platform, Polly, lets you access samples, data files, and associated metadata for heterogeneous biomedical data in one place. As a recent example, we’ve introduced data from UK Biobank, Human Protein Atlas, RCSB, and ImmPort on Polly in a format that can readily be integrated with analytical tools and pipelines. Read on for more information on these data and their use cases.
1. Explore a diverse collection of biomedical molecular data on a single p ..read more
Elucidata
3y ago
“In today’s Big Data world, companies rely on data scientists to extract insights from their vast, ever-expanding and diversified data sets… Many people think of data science as a job, but it’s more accurate to think of it as a way of thinking, a means of extracting insights through the scientific method.” – Thilo Huellmann
Biological Big Data: What it means for drug discovery
In recent years, multi-omics studies spanning genomics, transcriptomics, proteomics, etc. have helped scientists derive insights into the molecular mechanisms of disease progression. But each of these studies produces d ..read more
Elucidata
3y ago
“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.” — Clive Humby, 2006
Data Centricity to increase the accuracy of ML predictions: A paradigm shift
The traditional approach to machine learning (ML) was to curate the data to a machine-readable level, train a model, and then fine-tune the model to improve the accuracy of the results. Andrew Ng, who is a familiar name in the circle of ML enthusi ..read more