UC San Diego's Data Science Blog – Center for Computational Biology & Bioinformatics
A data science and bioinformatics blog written and maintained by UCSD's Center for Computational Biology & Bioinformatics
3y ago
Amanda Birmingham (abirmingham at ucsd.edu)
Replicability and reproducibility* have been important components of the scientific method since Boyle and Huygens argued over their vacuum experiments in the 17th century. Since repeating a process the same way every time is one of the things computers do best, one might expect computational biology would outshine lab biology in this critical area. However, early findings incorporating bioinformatic analyses were often published with fulsome details on the wet lab work but barely a mention of the computational efforts.
In a 1986 paper from the Proce ..read more
3y ago
Amanda Birmingham (abirmingham at ucsd.edu)
The excellent network-building web tool GeneMANIA has recently been given a facelift. Its new user interface is minimal and uncluttered, so much so that it took me a little trial and error to figure out what all the buttons and options did! In case you’re in the same boat, here’s the missing manual for how to use the new-and-improved GeneMANIA website to generate a network from your gene list. (To be clear, I don’t speak for the GeneMANIA project and am not associated with it in any way; I'm just a fan.)
Before beginning, prepare a list of the genes fo ..read more
3y ago
Julia Len (jlen at ucsd.edu)
Introduction
When working with networks, it is often useful to consider how similar two networks are. There are, however, a number of ways of quantifying network similarity. One could simply count the nodes two networks have in common, but this would miss any structural similarity, or lack thereof, between the edges. For example, it is possible for two networks to have completely identical node sets but completely disjoint edge sets. Note, however, that in order for two networks to share edges, they must shar ..read more
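To make the node-versus-edge distinction concrete, here is a minimal sketch. It assumes the Jaccard index as the overlap measure, which is my choice for illustration and not necessarily the measure the post goes on to use; the node names and the `jaccard` and `undirected` helpers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def undirected(edges):
    """Store each undirected edge as a frozenset so (u, v) equals (v, u)."""
    return {frozenset(edge) for edge in edges}


# Two toy networks (hypothetical) with identical node sets...
nodes_g = {"A", "B", "C", "D"}
nodes_h = {"A", "B", "C", "D"}

# ...but completely disjoint edge sets.
edges_g = undirected([("A", "B"), ("C", "D")])
edges_h = undirected([("A", "C"), ("B", "D")])

node_similarity = jaccard(nodes_g, nodes_h)
edge_similarity = jaccard(edges_g, edges_h)

print(node_similarity, edge_similarity)  # 1.0 0.0
```

Judged by nodes alone, these two networks look identical; judged by edges, they have nothing in common.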
3y ago
Amanda Birmingham (abirmingham at ucsd.edu)
Heat maps are a staple of data visualization for numerous tasks, including differential expression analyses on microarray and RNA-Seq data. Many people have already written heat-map-plotting packages for R, so it takes a little effort to decide which to use; here I investigate the performance of the six that I found referenced most frequently online.
My main goals (YMMV) beyond basic plotting were to be able to (a) annotate rows and columns with metadata information, (b) include scales and labels in the figure itself (since often figures are reuse ..read more
3y ago
Brin Rosenthal (sbrosenthal at ucsd.edu)
You probably won’t get far learning about networks and graph theory before coming across communities and cliques in graphs. At first glance, these two concepts are quite similar: they both describe highly connected sets of nodes, after all. There are, however, situations that are best suited to one or the other. In this post we will explore some similarities and differences between communities and cliques, and a specific problem I came upon which I thought would be easily solved by a community-finding algorithm, but soon realized that cliques were the mu ..read more
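As a small illustration of the stricter of the two ideas: a clique is a set of nodes in which every pair is directly connected, whereas a community only needs to be densely connected relative to the rest of the graph. The sketch below checks the clique condition on a toy graph; the graph and the `is_clique` helper are hypothetical and are not taken from the post.

```python
from itertools import combinations


def is_clique(adjacency, nodes):
    """Return True if every pair of the given nodes shares an edge."""
    return all(v in adjacency[u] for u, v in combinations(nodes, 2))


# Toy undirected graph as an adjacency map: a triangle A-B-C plus a
# pendant node D attached only to C.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

triangle_is_clique = is_clique(adjacency, ["A", "B", "C"])
whole_graph_is_clique = is_clique(adjacency, ["A", "B", "C", "D"])

print(triangle_is_clique, whole_graph_is_clique)  # True False
```

A community-finding algorithm might reasonably group all four nodes together, but only the triangle passes the all-pairs test.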
UC San Diego's Data Science Blog – Center for Computational Biology & Bioinformatics
3y ago
Amanda Birmingham (abirmingham at ucsd.edu)
It is a truth universally acknowledged that a scientist in possession of a new dataset must be in want of visualization. Unfortunately, while it's certainly better to look at our data than not to, choosing the wrong visual summary can impede good analyses by obscuring relevant features while encouraging unwarranted assumptions. In fact, even the friendly and familiar box plot can mislead the unwary!
Even the friendly and familiar box plot can mislead the unwary, but better options are a simple R command away!
While the limitations of box plots have lo ..read more
3y ago
Brin Rosenthal (sbrosenthal at ucsd.edu)
Introduction
Data is everywhere these days, and being able to interact with visual representations of that data in real time can help bring it to life. You need look no further than the D3 (Data-Driven Documents) examples page to see this. If you haven’t spent time browsing through the D3 examples library, I would highly recommend doing so, but be warned: it is easy to spend a few captivating hours there! (A few of my favorites: collision avoidance, collapsible force layout, NCAA March Madness predictions, pre ..read more
3y ago
Amanda Birmingham (abirmingham at ucsd.edu)
Jupyter notebooks are wonderful, but eventually you will need to present your work to someone unable (or unwilling) to view it on a notebook server. Unfortunately, there are surprising difficulties in printing or otherwise outputting Jupyter notebooks attractively into a static, offline format. These difficulties are not limited to Python-kernel notebooks: R-kernel notebooks have their own issues. Here’s a description of those issues, and a work-around that doesn’t require learning to modify jinja2 templates.
Table of Contents
HTML Output: Mangled ..read more
3y ago
In the analysis of differential expression data, it is often useful to examine properties of the local neighborhood of specific genes. I developed a simple interactive tool for this purpose, which takes as input differential expression data and gene interaction data (from http://www.genemania.org/). The network is then plotted in an interactive widget, where the node properties, edge properties, and layout can be mapped to different network properties. The interaction type (one of the 6 options from GeneMANIA) can also be selected.
This notebook will also serve as an example for how to create ..read more