Inputs Germline heterozygous SNPs, informative fo...
Blue Collar Bioinformatics - Community built tools for biological data analysis
by Brad Chapman
9M ago
Inputs
- Germline heterozygous SNPs, informative for purity/ploidy/clone estimation
- Need estimation for tumor-only
- Copy number calls – GC corrected and normalized (to normal or process-matched normal)
- Split copy number calls into major/minor alleles, potentially with multiple states
- Somatic variant calls with allele frequencies, for tumor subclones
- Estimate subclones from somatic calls + major/minor CNVs
Challenges
- Heterogeneous input samples ranging from WGS tumor/normal to panel/capture tumor-only, would like to have similar workflow to handle most cases
- Lack of good truth sets, so hard to ...
Data science infrastructure for agricultural sustainability and food security
by Brad Chapman
9M ago
I’m excited to be joining Ginkgo Bioworks. Ginkgo and the synthetic biology community have an incredible amount of useful data in intricate experimental designs, measured screening outcomes and pre-existing biological knowledge. I’ll help organize, present, compute on and do science with this data. I hope to enable downstream applications that improve agricultural sustainability and food security. The dire warnings about the state of our planet motivate me to help build an increasingly fair and climate-friendly agricultural system. There are many different ways to contribute o ...
Developing low frequency filters for cancer variant calling using VarDict
by Brad Chapman
9M ago
Overview
The underlying work that goes into preparing good filters for variant calling is not always exposed to the biologists who use variants for making research and clinical decisions. Filter sets like the GATK best-practice pipeline hard filters perform well and are widely used, but lack detailed background on the underlying truth sets and methods used to derive them. As a result, some scientists treat variant calling and filtering as a solved problem. This creates a disconnect between researchers who work on the underlying algorithms and understand the filtering trade-offs and imperfectne ...
Validating small RNA analysis with miRQC
by Lorena Pantano
9M ago
Small RNA-seq with bcbio-nextgen
The study of small RNA helps to explain part of a cell's gene regulation. There are different types of small RNAs; the most important in mammals are miRNAs, tRNA fragments and piRNAs. An advantage of small RNA-seq analysis is that we can study all small RNA types simultaneously, with the possibility of also detecting novel small RNAs. bcbio-nextgen is a Python framework, supported by a large scientific community, that implements best practices for next-generation sequencing data and uses gold standard data to validate its analyses. It is well known for its vari ...
Validated variant calling with human genome build 38
by Brad Chapman
9M ago
This post describes a ready-to-run bcbio implementation of variant calling and validation on human genome build 38, demonstrating its utility for improving variant detection. Human genome build 38 (hg38, GRCh38) offers a major upgrade over the previous build, 37 (hg19, GRCh37). The new genome reflects our increased understanding of the heterogeneity within human sub-populations and contains a large number of alternative genomic loci that better capture our knowledge of genome structure. Better genomic representation improves mapping, avoiding a source of hard-to-remove false positives during v ...
Validating multiple cancer variant callers and prioritization in tumor-only samples
by Brad Chapman
9M ago
Overview
The post discusses work validating multiple cancer variant callers in bcbio-nextgen using a synthetic reference call set from the ICGC-TCGA DREAM challenge. We've previously validated germline variant calling methods, but cancer calling is additionally challenging. Tumor samples have mixed cellularity due to contaminating normal cells, and consist of multiple sub-clones with different somatic variations. Low-frequency sub-clonal variations can be critical to understanding disease progression but are more difficult to detect with high sensitivity and precision. Publicly available whole ...
Benchmarking variation and RNA-seq analyses on Amazon Web Services with Docker
by Brad Chapman
9M ago
Overview
We developed a freely available, easy-to-run implementation of bcbio-nextgen on Amazon Web Services (AWS) using Docker. bcbio is a community-developed tool providing validated and scalable variant calling and RNA-seq analysis. The AWS implementation automates all of the steps of building a cluster, attaching high-performance shared filesystems, and running an analysis. This makes bcbio readily available to the research community without the need to install and configure it locally. The entire installation bootstraps from standard Linux AMIs, enabling adjustment of the tools ...
Validating generalized incremental joint variant calling with GATK HaplotypeCaller, FreeBayes, Platypus and samtools
by Brad Chapman
9M ago
Incremental joint variant calling
Variant calling in large populations is challenging due to the difficulty in providing a consistent set of calls at all possible variable positions. A finalized set of calls from a large population should distinguish reference calls (positions without a variant) from no-calls (positions without enough read support to make a call). Calling algorithms should also be able to make use of information from other samples in the population to improve sensitivity and precision. There are two issues with trying to provide complete combined call sets. First, it is computationally ...
Validated whole genome structural variation detection using multiple callers
by Brad Chapman
9M ago
Structural variant detection goals
This post describes community-based work integrating structural variant calling and validation into bcbio-nextgen. I've previously written about approaches for validating single nucleotide changes (SNPs) and small insertions/deletions (indels), but it has always been unfortunate not to have reliable ways to detect larger structural variations: deletions, duplications, inversions, translocations and other disruptive events. Detecting these events with short read sequencing is difficult, and our goal in bcbio is to create a global summary of predicted structur ...
Whole genome trio variant calling evaluation: low complexity regions, GATK VQSR and high depth filters
by Brad Chapman
9M ago
Whole genome trio validation
I've written previously about the approaches we use to validate the bcbio-nextgen variant calling framework, specifically evaluating aligners and variant calling methods and assessing the impact of BAM post-alignment preparation methods. We're continually looking to improve both the pipeline and validation methods, and two recent papers helped advance best practices for evaluating and filtering variant calls: Michael Linderman and colleagues describe approaches for validating clinical exome and whole genome sequencing results. One key result I took from the paper ...
