Renesh Bedre
Covering broad topics related to Bioinformatics, Statistics, Machine learning, Python, and R for data analysis and visualization. Dr. Renesh
Bedre bridges the gap between high-throughput plant genetics data and actionable biological insights for experimental biologists.
Renesh Bedre
2d ago
In genomics and bioinformatics, samtools is widely used for extracting sequence reads that fall within specific genomic regions from a BAM file.
The samtools view command can be used as shown below to extract reads from one or more regions of a BAM file.
# extract reads from single region
samtools view -b input.bam "chr:start-end" > regions.bam
# extract reads from multiple regions
samtools view -b -L region.bed input.bam > regions.bam
To extract reads from a BAM file using samtools, you first need to create an index file (.bai) for the BAM file using samtools index input.bam ...
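When several regions are needed, the region strings can be collected into a BED file for the -L option. The sketch below uses a hypothetical helper, region_to_bed (not part of samtools), to convert 1-based, inclusive chr:start-end regions into 0-based, half-open BED lines. It assumes plain region strings without thousands separators in the coordinates.

```python
def region_to_bed(region: str) -> str:
    """Convert a 1-based inclusive 'chr:start-end' region to a BED line.

    Hypothetical helper for illustration; not part of samtools.
    """
    # rpartition tolerates contig names that themselves contain ':'
    chrom, _, span = region.rpartition(":")
    start, end = (int(x) for x in span.split("-"))
    if not chrom or start < 1 or end < start:
        raise ValueError(f"invalid region: {region!r}")
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end
    return f"{chrom}\t{start - 1}\t{end}"


regions = ["chr1:1000-2000", "chr2:500-800"]
bed_lines = [region_to_bed(r) for r in regions]
print("\n".join(bed_lines))
```

Writing these lines to region.bed gives a file that can be passed to samtools view -L.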
Renesh Bedre
2d ago
Samtools can be used for extracting mapped and unmapped sequence reads from SAM and BAM files.
Unlike single-end read filtering, extracting mapped and unmapped paired-end reads requires you to consider whether the reads are properly paired and whether both reads of a pair are mapped.
Paired-end reads are properly paired (concordant alignments) when both reads are mapped to the reference genome in the correct orientation as per the library preparation protocol (e.g., first read on the forward strand and second read on the reverse strand). In addition, the properly paired reads will ...
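The pairing status described above is recorded in the FLAG field of each alignment. The following is a minimal sketch, not the post's own code, that classifies a read by the four relevant bits defined in the SAM specification. The function name classify_pair and its category labels are made up for illustration.

```python
# SAM flag bits relevant to paired-end reads (values from the SAM specification)
PAIRED = 0x1         # read is part of a pair
PROPER_PAIR = 0x2    # pair aligned concordantly, as judged by the aligner
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped


def classify_pair(flag: int) -> str:
    """Hypothetical helper: describe the pairing status encoded in a FLAG."""
    if not flag & PAIRED:
        return "single-end"
    if flag & UNMAPPED and flag & MATE_UNMAPPED:
        return "both unmapped"
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return "one mate unmapped"
    if flag & PROPER_PAIR:
        return "properly paired"
    return "both mapped, not properly paired"


for flag in (99, 147, 77, 73, 65):
    print(flag, classify_pair(flag))
```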
Renesh Bedre
1M ago
Samtools is a suite of utilities commonly used in analyzing the aligned sequence data in the SAM (Sequence Alignment/Map) and BAM (Binary Alignment/Map) formats in bioinformatics and genomics analysis.
The samtools view command with the -F or -f parameter and a flag value is typically used for filtering mapped and unmapped sequence reads from SAM/BAM files.
The flag value is a numerical value that encodes various properties of each read alignment. For example, the flag value of 4 (0x4) indicates that the sequence read does not have a valid alignment to the reference genome (unmapped sequence reads ...
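To make the bitwise logic concrete, here is a small sketch that mimics how -f (keep reads with all given bits set) and -F (drop reads with any given bit set) act on a FLAG value. The function passes_filter and the example flag list are illustrative assumptions; samtools itself applies this test to every record in the file.

```python
def passes_filter(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Mimic `samtools view -f REQUIRE -F EXCLUDE` for a single FLAG value."""
    has_all_required = (flag & require) == require  # -f: every bit must be set
    has_no_excluded = (flag & exclude) == 0         # -F: no bit may be set
    return has_all_required and has_no_excluded


flags = [0, 4, 16, 99, 77]  # made-up example FLAG values

# like `samtools view -F 4`: keep mapped reads
mapped = [f for f in flags if passes_filter(f, exclude=0x4)]
# like `samtools view -f 4`: keep unmapped reads
unmapped = [f for f in flags if passes_filter(f, require=0x4)]

print(mapped)
print(unmapped)
```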
Renesh Bedre
3M ago
A histogram is useful for visualizing the frequency distribution of data as a bar graph. The height of each bar represents the number of observations falling into the corresponding interval (bin).
In this article, you will learn how to create a histogram using the numpy.histogram() function from the Python NumPy package.
The general syntax of numpy.histogram() looks like this:
# import package
import numpy as np
# compute the histogram (data is a placeholder for your input array)
np.histogram(data, bins=10)
Where,
Parameter | Description
data | Input data in array format
bins | Number of equal-width bins. It can be s ...
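As a quick illustration of the syntax above, the sketch below runs numpy.histogram() on a small made-up array with three bins. The function returns two arrays: the counts per bin and the bin edges (one more edge than there are bins).

```python
import numpy as np

# small made-up dataset for illustration
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# three equal-width bins spanning the data range (1 to 4)
counts, bin_edges = np.histogram(data, bins=3)

print(counts)     # number of observations in each bin
print(bin_edges)  # bin boundaries (len(counts) + 1 values)
```

All bins are half-open except the last one, which also includes its right edge, so the value 4 is counted in the final bin.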
Renesh Bedre
5M ago
The antilogarithm (antilog) is the inverse operation of the logarithm (log). The antilog is used for recovering the original number from its log value.
For example, the antilog of a base 10 log (log10) value can be found by raising the base (10) to the power of the log value. If log10(x) = z, then the antilog of z is 10^z.
The following examples illustrate how to find the antilog for various log bases:
Base | Log | Antilog
10 | log10(8) = 0.90309 | 10^0.90309 = 8
2 | log2(8) = 3 | 2^3 = 8
e | log(8) = 2.079442 | 2.7182^2.079442 = 8
In R, you can use the 10^x, 2^x, or exp(x ...
Renesh Bedre
5M ago
The antilogarithm (antilog) is the inverse operation of the logarithm (log). The antilog is used for recovering the original number from its log value.
For example, the antilog of a base 10 log (log10) value can be found by raising the base (10) to the power of the log value. If log10(x) = z, then the antilog of z is 10^z.
The following examples illustrate how to find the antilog for various log bases:
Base | Log | Antilog
10 | log10(5) = 0.6989 | 10^0.6989 = 5
2 | log2(5) = 2.3219 | 2^2.3219 = 5
e | log(5) = 1.6094 | 2.7182^1.6094 = 5
In Python, you can use the 10**x, 2**x ...
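As a minimal sketch of the idea, the snippet below takes the log of 5 in each base and then recovers the original number. The use of math.exp() for the natural-log case is an assumption here, since the excerpt is cut off before naming it.

```python
import math

x = 5.0

# base 10: antilog is 10 raised to the log10 value
antilog10 = 10 ** math.log10(x)
# base 2: antilog is 2 raised to the log2 value
antilog2 = 2 ** math.log2(x)
# base e: antilog is exp() of the natural log value (assumed choice)
antiloge = math.exp(math.log(x))

print(antilog10, antilog2, antiloge)
```

Each result equals 5 up to floating-point rounding, so compare with math.isclose() rather than == when checking such values.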
Renesh Bedre
5M ago
Quantiles and percentiles are statistical terms that are often confused with each other.
In data analysis, quantiles and percentiles are used for describing the distribution of data, as well as determining spread, relative position, and central tendency.
The key differences between quantiles and percentiles are:
Quantiles
Quantiles divide the dataset into any number of equal parts.
Quartiles and percentiles are special cases of quantiles.
For example, quartiles and percentiles split the data into 4 and 100 equal parts, respectively.
Quantiles are typically expressed as decimal values and range from 0 to 1 (e.g., 0.25, 0.5).
The 0.25 quanti ...
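The relationship between the two scales can be checked numerically. The sketch below is an illustration using Python's standard statistics.quantiles() function on made-up data (the integers 1 to 101). It computes cut points for 4 and for 100 equal parts and shows that the 0.25, 0.5, and 0.75 quantiles coincide with the 25th, 50th, and 75th percentiles. The choice of method="inclusive" is an assumption made so that the data minimum and maximum are treated as the 0th and 100th percentiles.

```python
import statistics

# made-up data: the integers 1..101
data = list(range(1, 102))

# 3 cut points dividing the data into 4 equal parts (quartiles)
quartiles = statistics.quantiles(data, n=4, method="inclusive")
# 99 cut points dividing the data into 100 equal parts (percentiles)
percentiles = statistics.quantiles(data, n=100, method="inclusive")

# percentiles[k - 1] is the k-th percentile
print(quartiles)
print(percentiles[24], percentiles[49], percentiles[74])
```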
Renesh Bedre
5M ago
Quartile vs. Quantile vs. Percentile
In statistics, we often come across the terms quartile, quantile, and percentile, which are easy to confuse.
Quartiles, quantiles, and percentiles are used to describe the distribution of data, and are particularly useful in understanding the spread, relative position, and central tendency of data.
The differences between quartiles, quantiles, and percentiles can be explained as follows:
Quartile
Quartiles divide a dataset into four equal parts, each containing 25% of the data.
Three quartiles (Q1, Q2, and Q3) are commonly used, which di ...
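For a concrete picture of Q1, Q2, and Q3, the following sketch computes the three quartiles of a small made-up dataset with statistics.quantiles() from the Python standard library. It assumes the "inclusive" interpolation method; other methods can give slightly different cut points on small samples.

```python
import statistics

# made-up, already sorted dataset with nine observations
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# n=4 returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)
print(q2 == statistics.median(data))  # Q2 is the median
```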
Renesh Bedre
5M ago
Survival analysis (also known as time-to-event analysis) is a statistical method for analyzing the duration of time until the event of interest occurs (e.g. death of patients).
The Kaplan-Meier survival method is a non-parametric statistical technique that estimates the probability of surviving (i.e., not yet experiencing the event) beyond various points in survival time.
In the Kaplan-Meier survival curve, survival probability is plotted against survival time. The survival curve is useful for understanding the median survival time (the time at which survival probability is 50%).
The Kaplan-Meier survival method is a ...
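To show how the survival probabilities and the median survival time arise, here is a minimal pure-Python sketch of the product-limit (Kaplan-Meier) estimate. At each event time, the running survival probability is multiplied by (1 - events / number at risk). The function names and the tiny dataset are made up for illustration, and a real analysis would normally use a dedicated survival analysis package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed event.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)  # events + censored at t
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def median_survival(curve):
    """Smallest time at which survival probability drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # curve never reaches 50%


# made-up data: 5 subjects, two of them censored (event = 0)
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
print(curve)
print(median_survival(curve))
```

Censored subjects do not lower the curve themselves, but they shrink the number at risk for later event times.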
Renesh Bedre
5M ago
A heatmap is a statistical visualization method for representing complex datasets in matrix form and quickly gaining insights from large datasets.
Heatmaps are widely used in bioinformatics for analyzing and visualizing large gene expression datasets obtained from different samples and conditions.
This tutorial explains how to use the Heatmap() function from the ComplexHeatmap R Bioconductor package for visualizing complex heatmaps.
Install ComplexHeatmap
You can install the ComplexHeatmap R package (from Bioconductor) as below:
if (!require("BiocManager", quietly = TRUE))
install.packages ...