Speeding sourmash the heck up
Living in an Ivory Basement
by C. Titus Brown
1M ago
sourmash is our tool for genome and metagenome investigation. Using and developing it has been a major focus of our lab for over 7 years, and maintaining and extending it is my main passion project. sourmash is a k-mer multitool that enables all sorts of really neat bulk metagenome analyses! I'm proud to say that last week we released a new version of sourmash, v4.8.6, that continues to improve functionality, increase documentation, and decrease computational requirements. But, you know, we release new versions of sourmash pretty regularly, so that's only moderately exciting :). A bit more exc…
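sourmash does its k-mer work on small sketches of each data set rather than on every k-mer. As a rough illustration only, here is a minimal FracMinHash-style sketch (the "scaled" approach that sourmash sketches are built on) in Python: hash every k-mer, keep only the hashes below max_hash / scaled, and compare data sets by the overlap of the kept hashes. Assumptions to note: it uses BLAKE2b from `hashlib` in place of the MurmurHash3 hashing sourmash actually uses, it skips canonical (reverse-complement) k-mers, and `frac_minhash`, `containment`, and `jaccard` are hypothetical helpers, not sourmash's API.

```python
import hashlib
import random

MAX_HASH = 2**64  # hashes are treated as unsigned 64-bit integers

def hash_kmer(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b stand-in; sourmash itself uses MurmurHash3)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def frac_minhash(seq: str, k: int = 31, scaled: int = 1000) -> set:
    """Keep each k-mer hash below MAX_HASH // scaled, i.e. roughly 1 in `scaled` k-mers."""
    threshold = MAX_HASH // scaled
    seq = seq.upper()
    sketch = set()
    for i in range(len(seq) - k + 1):
        h = hash_kmer(seq[i:i + k])
        if h < threshold:
            sketch.add(h)
    return sketch

def containment(query: set, target: set) -> float:
    """Estimated fraction of the query's k-mers that also occur in the target."""
    return len(query & target) / len(query) if query else 0.0

def jaccard(a: set, b: set) -> float:
    """Estimated Jaccard similarity; both sketches must use the same k and scaled."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

if __name__ == "__main__":
    random.seed(42)
    genome = "".join(random.choice("ACGT") for _ in range(100_000))  # random toy "genome"
    full = frac_minhash(genome, k=21, scaled=100)
    half = frac_minhash(genome[:50_000], k=21, scaled=100)
    print(len(full), len(half))
    # Every k-mer in the first half also occurs in the full genome, so this is 1.0.
    print(containment(half, full))
    # The first half holds about half of the full genome's k-mers.
    print(round(jaccard(half, full), 2))
```

Because only hashes under a fixed threshold are kept, sketches made with the same `scaled` can be compared directly, and a sketch of a subset is a subset of the full sketch.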
The history of the "Tragedy of the Commons"
Living in an Ivory Basement
by C. Titus Brown
1M ago
I've been really interested in applying lessons from common pool resource theory to my own work and interests in open source and open science (see my various posts). The framework around this created by Dr. Elinor Ostrom, for which she received the Nobel Prize in Economics, is awe-inspiring and incredibly motivational! I've also thoroughly enjoyed the Frontiers of Commoning podcast that David Bollier runs, which showcases many ongoing communities and efforts in these areas. All of this is strongly coupled (negatively) to the well-known concept of the Tragedy of the Commons, published in 1968 b…
Sourmash and branchwater licensing: thoughts on extractive engagement with projects
Living in an Ivory Basement
by C. Titus Brown
1M ago
I am helping maintain some petabase-scale genomic search infrastructure as part of the sourmash and branchwater projects. One of the questions that's frequently in the back of my mind is how to incentivize commons-style engagement rather than extractive engagement, and a key tool for this purpose is licensing. Sourmash is BSD-licensed, which, in essence, means that anyone can do whatever they want with the code - including incorporating it unchanged into a commercial closed-source product, rebranding it as a new product, and/or changing it in incompatible ways (and then rebranding it as a new…
Snakemake for doing bioinformatics - using wildcards to generalize your rules
Living in an Ivory Basement
by C. Titus Brown
1y ago
As we showed in a previous blog post, when you have repeated substrings between input and output, you can extract them into wildcards, going from a rule that makes specific outputs:

    rule sketch_genomes_1:
        input:
            "genomes/GCF_000017325.1.fna.gz",
        output:
            "GCF_000017325.1.fna.gz.sig",
        shell: """
            sourmash sketch dna -p k=31 {input} --name-from-first
        """

to a rule that makes any output that fits a pattern:

    rule sketch_genomes_1:
        input:
            "genomes/{accession}.fna.gz",
        output:
            "{accession}.fna.gz.sig",
        shell: """
            sourmas…
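To make the wildcard idea concrete, here is a toy Python sketch of what the pattern rule above asks for conceptually: turn the output pattern into a regular expression, match a requested filename against it to recover the wildcard value, and substitute that value into the input pattern. This is a simplified illustration, not Snakemake's implementation (which also handles wildcard constraints, choosing among many rules, and more); `pattern_to_regex` and `resolve_input` are hypothetical names.

```python
import re
from typing import Optional

def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Turn '{accession}.fna.gz.sig' into a regex with one named group per wildcard."""
    regex = ""
    last = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[last:m.start()])  # literal text, escaped
        regex += f"(?P<{m.group(1)}>.+)"             # wildcard: one or more characters
        last = m.end()
    regex += re.escape(pattern[last:])
    return re.compile(regex)

def resolve_input(output_pattern: str, input_pattern: str, requested: str) -> Optional[str]:
    """Return the concrete input path for a requested output, or None if it doesn't match."""
    m = pattern_to_regex(output_pattern).fullmatch(requested)
    if m is None:
        return None
    return input_pattern.format(**m.groupdict())

# Patterns from the wildcard rule in the post.
OUTPUT = "{accession}.fna.gz.sig"
INPUT = "genomes/{accession}.fna.gz"

print(resolve_input(OUTPUT, INPUT, "GCF_000017325.1.fna.gz.sig"))
# genomes/GCF_000017325.1.fna.gz
```

The key point is that the wildcard value is recovered from the *output* filename you ask for, which is why the same name has to appear in both input and output.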
Conda & mamba on shared clusters works better now!
Living in an Ivory Basement
by C. Titus Brown
1y ago
Friends! Countrymen! I bring you good tidings! The bug is dead! Long live conda/mamba on shared clusters! OK, wait. Let's back up. What's this bug, and why does it matter that it's fixed? It all starts with teaching... conda is, like, the best for teaching bioinformatics!! I've been teaching bioinformatics using conda for about 5 years now. Not only do I straight up teach conda/mamba but I also use it extensively in my Intro Bioinformatics hands-on lab for graduate students, where I teach variant calling, de novo assembly, and RNAseq. Mostly I teach on a shared cluster, the 'farm' HPC, because…
A brief overview of automation and parallelization options in UNIX/on an HPC
Living in an Ivory Basement
by C. Titus Brown
1y ago
What do you do if you have a lot of computing jobs to run, and lots of computing resources to run them? Let's play with some options! We'll run a simple set of bioinformatics analyses as an example, but all of the approaches below should work for a wide variety of command line needs. Most of the commands below should work as straight-up copy/paste. Please let me know if they don't!

Setup and file preparation

Download some metagenome assemblies from our metagenome assembly evaluation project. These are all files generated from Shakya et al., 2014 - specifically, assemblies of SRR606249…
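One option among many, and not necessarily one the post covers: if Python is available, the standard library's `concurrent.futures` can run a batch of independent command-line jobs a few at a time, similar in spirit to `xargs -P` or GNU parallel. A minimal sketch, assuming each job is independent and only its exit code matters; `run_one` and `run_all` are hypothetical helpers, and the sourmash command in the comment uses hypothetical paths.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_one(cmd: list) -> tuple:
    """Run a single command, discarding its output; return (cmd, exit code)."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return cmd, 127  # same code a shell uses for "command not found"
    return cmd, result.returncode

def run_all(commands: list, max_workers: int = 4) -> list:
    """Run independent commands, at most `max_workers` at a time.

    Threads are enough here: each one just waits on a child process,
    so the actual computing happens outside Python.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))  # results come back in input order

if __name__ == "__main__":
    # Harmless demo jobs. For real work you might build one command per assembly,
    # e.g. ["sourmash", "sketch", "dna", "-p", "k=31", path] (hypothetical paths).
    jobs = [[sys.executable, "-c", f"import time; time.sleep(0.1); print({i})"] for i in range(8)]
    for cmd, code in run_all(jobs, max_workers=4):
        print(cmd[-1], "->", code)
```

On an HPC you would usually set `max_workers` to match the cores you requested, so that the jobs don't oversubscribe the node.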
Snakemake for doing bioinformatics - a beginner's guide (part 2)
Living in an Ivory Basement
by C. Titus Brown
1y ago
(The below post contains excerpts from Slithering your way into bioinformatics with snakemake, Hackmd Press, 2023.) In Section 1, we introduced snakemake as a system for (efficiently and effectively) running a series of shell commands. In Section 2, we'll explore a number of important features of snakemake. Together with Section 1, this section covers the core set of snakemake functionality that you need to know in order to effectively leverage snakemake. After this section, you'll be well positioned to write a few workflows of your own, and then you can come back and explore more advanced fea…
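Part of what makes snakemake efficient at running a series of shell commands is that it skips work that is already done. A simplified, make-style version of that decision compares file timestamps: a job needs to run if any output is missing or older than the newest input. This is a minimal sketch, not snakemake's actual logic (recent snakemake versions can also take changes to code, parameters, or software environments into account); `needs_rerun` is a hypothetical name, and the demo files are empty stand-ins.

```python
import os
import tempfile
import time

def needs_rerun(inputs: list, outputs: list) -> bool:
    """Make-style check: rerun if any output is missing or older than the newest input.

    A missing *input* raises FileNotFoundError, since the job couldn't run anyway.
    """
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return True  # nothing to check against, or something hasn't been built yet
    if not inputs:
        return False  # outputs exist and depend on nothing
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        genome = os.path.join(d, "GCF_000017325.1.fna.gz")  # stand-in input file
        sig = os.path.join(d, "GCF_000017325.1.fna.gz.sig")  # stand-in output file
        open(genome, "w").close()
        print(needs_rerun([genome], [sig]))  # True: output missing
        open(sig, "w").close()
        now = time.time()
        os.utime(genome, (now - 60, now - 60))  # input last changed a minute ago
        os.utime(sig, (now, now))
        print(needs_rerun([genome], [sig]))  # False: output is newer than input
        os.utime(genome, (now + 60, now + 60))  # pretend the input was just edited
        print(needs_rerun([genome], [sig]))  # True: input is newer than output
```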
Snakemake for doing bioinformatics - a beginner's guide (part 1)
Living in an Ivory Basement
by C. Titus Brown
1y ago
(The below post contains excerpts from Slithering your way into bioinformatics with snakemake, Hackmd Press, 2023.)

Installation and setup!

Setup and installation

I suggest working in a new directory. You'll need to install snakemake and sourmash. We suggest using conda, via miniforge, for this.

Getting the data:

You'll need to download these three files:

* GCF_000021665.1_ASM2166v1_genomic.fna.gz
* GCF_000017325.1_ASM1732v1_genomic.fna.gz
* GCF_000020225.1_ASM2022v1_genomic.fna.gz

and rename them so that they are in a subdirectory genomes/ with the names:

* GCF_000017325.1.fna.gz
* GCF_00002022…
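The renaming step for the beginner's guide can be scripted. A minimal Python sketch, assuming the three files have already been downloaded into the current directory (the download itself is not shown): it keeps the accession, i.e. the first two underscore-separated fields such as GCF_000017325.1, and drops the assembly name and `_genomic` suffix, which matches the GCF_000017325.1.fna.gz target name in the excerpt. `target_name` and `organize` are hypothetical helper names.

```python
import os
import shutil

FILES = [
    "GCF_000021665.1_ASM2166v1_genomic.fna.gz",
    "GCF_000017325.1_ASM1732v1_genomic.fna.gz",
    "GCF_000020225.1_ASM2022v1_genomic.fna.gz",
]

def target_name(filename: str) -> str:
    """GCF_000017325.1_ASM1732v1_genomic.fna.gz -> GCF_000017325.1.fna.gz"""
    prefix, number = filename.split("_")[:2]  # e.g. "GCF", "000017325.1"
    return f"{prefix}_{number}.fna.gz"

def organize(filenames, src_dir=".", dest_dir="genomes"):
    """Move each downloaded file into dest_dir under its short accession name."""
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for name in filenames:
        dest = os.path.join(dest_dir, target_name(name))
        shutil.move(os.path.join(src_dir, name), dest)
        moved.append(dest)
    return moved

if __name__ == "__main__":
    present = [f for f in FILES if os.path.exists(f)]
    if present:
        for path in organize(present):
            print("moved to", path)
    for name in sorted(set(FILES) - set(present)):
        print("not downloaded yet:", name)
```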
Sourmash has a plugin interface!
Living in an Ivory Basement
by C. Titus Brown
1y ago
Over the holiday break, I took on a "palate-cleansing" project: something technically neat that wasn't critically important to anyone or anything, but could be useful. I decided to implement plugins for sourmash. Sourmash is open-source scientific software for fast, lightweight exploration and comparison of sequencing data sets, with a focus on metagenomics. It's largely a command-line program written in Python on top of a Rust library. It is maintained by a small group of developers, most of whom are (or were) affiliated in some way with my academic lab at UC Davis. Python has (what seems to be…
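A common way to build plugin support in Python is package entry points: an installed package declares them in its metadata, and the host program discovers them at run time with `importlib.metadata`. A minimal sketch of that discovery pattern follows; it is not sourmash's plugin code, and the group name `myapp.plugins` plus the `find_plugins`/`load_plugins` helpers are hypothetical placeholders.

```python
from importlib.metadata import entry_points

# A plugin package would advertise itself in its pyproject.toml, e.g.:
#   [project.entry-points."myapp.plugins"]
#   hello = "myapp_hello:main"
PLUGIN_GROUP = "myapp.plugins"  # hypothetical entry point group name

def find_plugins(group: str = PLUGIN_GROUP) -> list:
    """Return the entry points that installed packages registered under `group`."""
    try:
        return list(entry_points(group=group))  # Python 3.10+ selection API
    except TypeError:
        return list(entry_points().get(group, []))  # Python 3.8/3.9: dict of groups

def load_plugins(group: str = PLUGIN_GROUP) -> dict:
    """Import each plugin, skipping (and reporting) any that fail to load."""
    loaded = {}
    for ep in find_plugins(group):
        try:
            loaded[ep.name] = ep.load()
        except Exception as exc:  # a broken plugin shouldn't crash the host program
            print(f"could not load plugin {ep.name!r}: {exc}")
    return loaded

if __name__ == "__main__":
    plugins = load_plugins()
    print(f"found {len(plugins)} plugin(s) in group {PLUGIN_GROUP!r}")
```

The appeal of this design is that the host program never needs to know plugin names in advance; installing a package into the same environment is enough to make it discoverable.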
Reading "Orwell's Roses" by Rebecca Solnit
Living in an Ivory Basement
by C. Titus Brown
1y ago
Happy New Year's Eve! So, one of my resolutions for 2023 is that I want to do more non-escapist reading. Why? And what had I been reading?? For the last three years I've been reading a lot of trashy books. Unless it was for work (biology/bioinformatics papers) or random infovore articles that I found online, I've read almost nothing but mystery novels, romance novels, and LitRPG. (Don't judge, it's been a weird three years. ;) Now, LitRPG is all well and good (He Who Fights with Monsters is super fun!) and I had plenty of reasons to escape, but I was avoiding anything requiring an attention sp…
