Posts

GTF Polars

June 09, 2026

Bioinformatics has a whole ecosystem of file formats. GENCODE hosts GTF files for human gene annotations. GTF stands for Gene Transfer Format, a tab-delimited format with 9 columns, as described here:
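Each GTF line is just nine tab-separated fields, and the last one packs key-value attributes such as `gene_id`. The sketch below shows that layout using only the Python standard library rather than Polars. It assumes the usual GTF2.2 column names, and the function names `parse_gtf` and `parse_attributes` are hypothetical:

```python
import io

# The nine GTF columns, in order (names follow the common GTF2.2 convention).
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]


def parse_attributes(field: str) -> dict[str, str]:
    """Split an attribute field like 'gene_id "X"; gene_name "Y";' into a dict.

    Assumes no semicolons inside quoted values. Repeated keys (e.g. GENCODE's
    multiple `tag` entries) keep only the last value in this simple sketch.
    """
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf(handle) -> list[dict]:
    """Parse GTF lines from a text handle into a list of dicts."""
    records = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-delimited columns, got {len(fields)}")
        record = dict(zip(GTF_COLUMNS, fields))
        record["start"] = int(record["start"])  # 1-based, inclusive
        record["end"] = int(record["end"])
        record["attribute"] = parse_attributes(record["attribute"])
        records.append(record)
    return records


# Usage on a made-up, GENCODE-style toy line (not real annotation data).
toy = 'chr1\tTOY\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "TOY1";\n'
for rec in parse_gtf(io.StringIO(toy)):
    print(rec["feature"], rec["start"], rec["end"], rec["attribute"]["gene_name"])
```

A list of flat records like this could then be handed to a dataframe library such as Polars for filtering and joins.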

I’ve been writing Nextflow workflows for about a year now, but surprisingly hadn’t run any of the nf-core workflows until recently. Well, I finally got a chance to run nf-core/sarek to generate pre-processed BAM files starting from raw FASTQ files.

Getting comfortable with GCP

August 15, 2025

I recently assumed the role of PTA President at my kids’ school. It’s my first experience with executive leadership and I’m quite excited. One of my goals is to streamline PTA business processes so they are less cumbersome and more efficient. One way I’ve been doing this is by building simple web applications with Streamlit and deploying them with Cloud Run on GCP.

Nextflow stub run

May 31, 2025

I’m still doing a lot of Nextflow development lately for PacBio HiFi WGS secondary analysis. My workflow is fairly comprehensive: it starts from unaligned reads in BAM format, performs reference-guided alignment, and then proceeds with DNA short variant calling, CNV and structural variant calling, and CpG and 6mA methylation calling.

Parsing JSON with Nextflow

May 13, 2025

I had a LinkedIn post a few weeks ago about design patterns in Nextflow described in this repo. To my surprise, it went kinda viral by my modest standards: 40 likes and 2,400 impressions.

CellXGene Census data processing pipeline with Nextflow

April 13, 2025

In an earlier post, I talked about using GitHub Actions to build and push Docker images. But I didn’t talk about what code was in the Docker image and what it was doing. That’s what this post is for.

My first GitHub Actions

April 08, 2025

I’ve been meaning to learn how to use GitHub Actions for a while now. I recently refactored some code to run in a Docker container. Each time I updated the Python code, I had to rebuild the Docker image and push it to Docker Hub. A couple of times I forgot to do this, ran the code in the container, and only then found out that my changes were not reflected in the image I was using.

Updated my Streamlit app to use Polars

March 06, 2025

I had posted earlier about exploring Polars, a Rust-based dataframe library with a Python frontend. Polars is similar to Pandas, but is faster and more memory efficient. Polars is also designed to work with large datasets that don’t fit into memory.

Docker image with PacBio secondary analysis tools

February 20, 2025

I’ve been working a lot with PacBio HiFi data lately. Despite having zero experience with PacBio secondary analysis until about 7 months ago, I’ve since implemented a WGS and RNA-seq Nextflow pipeline.

Polars Lazyframes

February 17, 2025

I’ve been taking a closer look at Polars, a Rust-based dataframe library with a Python frontend. Polars is similar to Pandas, but is faster and more memory efficient. Polars is also designed to work with large datasets that don’t fit into memory.

My first AWS Lambda function

January 23, 2025

Over the past few months I’ve become a bit more familiar with various Amazon Web Services (AWS) offerings, including EC2, S3, and Batch. I’d heard about AWS Lambda but never had a chance to try it out, so I decided to work through this tutorial to create my first Lambda function.

Running VEP with GTF files

January 23, 2025

The Variant Effect Predictor (VEP) is a popular tool for annotating genetic variants, and it seems like it has been around forever. Typically, running VEP requires a cache file, which is a downloadable file containing transcript models and other features required for variant annotation.

Hello World

January 17, 2025

Learning how to use Jekyll and GitHub Pages to create a blog. This is my first post.