GTF Polars
June 09, 2026
Bioinformatics has a whole ecosystem of file formats. Gencode hosts GTF files for human gene annotations. GTF stands for Gene Transfer Format, defined by 9 tab-delimited columns, as described here:
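To make the nine-column layout concrete, here is a minimal sketch in plain Python that splits one GTF line into named fields and unpacks the attribute column. The column names follow the commonly documented GTF layout; the helper names `parse_gtf_line` and `parse_attributes`, and the example line with its gene identifiers, are made up for illustration and are not taken from a real Gencode file.

```python
# Column names for the 9 tab-delimited GTF fields (commonly documented layout).
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]

# Hypothetical example line; the IDs and coordinates are invented.
EXAMPLE_LINE = (
    "chr1\tEXAMPLE\tgene\t100\t500\t.\t+\t.\t"
    'gene_id "GENE0001"; gene_name "EXAMPLE1";'
)


def parse_gtf_line(line):
    """Split one GTF data line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GTF_COLUMNS):
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    # start and end are 1-based integer coordinates.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


def parse_attributes(attribute):
    """Turn 'key "value"; key "value";' into a dict.

    Simplifications: a repeated key keeps only its last value, and
    values containing ';' are not handled.
    """
    attrs = {}
    for item in attribute.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


record = parse_gtf_line(EXAMPLE_LINE)
attributes = parse_attributes(record["attribute"])
print(record["feature"], record["start"], record["end"], attributes["gene_name"])
```

Real GTF files also contain comment lines starting with `#`, which would need to be skipped before calling `parse_gtf_line`.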
Musings on Bioinformatics, Data Science, Python, R, and more.
I’ve been writing Nextflow workflows for about a year now, but surprisingly hadn’t run any of the nf-core workflows until recently. Well, I finally got a chance to run nf-core/sarek to generate pre-processed BAM files starting from raw FASTQ files.
I recently assumed the role of PTA President at my kids’ school. It’s my first experience with executive leadership and I’m quite excited. One of my goals is to streamline PTA business processes so they are less cumbersome and more efficient. One of the ways I’ve been doing this is by building simple web applications with Streamlit and deploying them with Cloud Run on GCP.
I’m still doing a lot of Nextflow development lately for PacBio HiFi WGS secondary analysis. My workflow is fairly comprehensive, starting from unaligned reads in BAM format, performing reference-guided alignment, and then proceeding with DNA short variant calling, CNV and structural variant calling, and CpG and 6mA methylation calling.
I had a LinkedIn post a few weeks ago about design patterns in Nextflow described in this repo. To my surprise, it went kinda viral by my modest standards - 40 likes and 2400 impressions.
In an earlier post, I talked about using GitHub Actions to build and push Docker images. But I didn’t talk about what code was in the Docker image and what it was doing. That’s what this post is for.
I’ve been meaning to learn how to use GitHub Actions for a while now. I recently re-factored some code to run in a Docker container. Each time I updated the Python code, I had to re-build the Docker image and push to Docker Hub. A couple of times I forgot to do this, and I would run the code in the container, only to find out that my changes were not reflected in the image I was using.
I had posted earlier about exploring Polars, a Rust-based dataframe library with a Python frontend. Polars is similar to Pandas, but is faster and more memory efficient. Polars is also designed to work with large datasets that don’t fit into memory.
I’ve been working a lot with PacBio HiFi data lately. After having zero prior experience with PacBio secondary analysis up until about 7 months ago, I’ve implemented WGS and RNA-seq Nextflow pipelines.
I’ve been taking a closer look at Polars, a Rust-based dataframe library with a Python frontend. Polars is similar to Pandas, but is faster and more memory efficient. Polars is also designed to work with large datasets that don’t fit into memory.
Over the past few months I’ve become a bit more familiar with various Amazon Web Services (AWS) offerings, including EC2, S3, and Batch. I’ve heard about AWS Lambda, but never had a chance to try it out. So I decided to work through this tutorial to create my first Lambda function.
The Variant Effect Predictor (VEP) is a popular tool for annotating genetic variants, and it seems like it has been around forever. Typically, running VEP requires the use of a cache file, which is a downloadable file containing transcript models and other features required for variant annotation.
Learning how to use Jekyll and GitHub Pages to create a blog. This is my first post.