Postman - Auto Login & Renew OAuth2 Token

Introduction

When using Postman to interact with APIs behind OAuth2 authentication, we have to log in and renew the token manually. This can be automated with the following steps:

  • Set credentials in environment variables (example values below)
  • Create a pre-request script to login and renew the token
  • Use the token in the request headers
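
The pre-request script in the next section reads a handful of environment variables. The names below are the ones the script expects; the values are just placeholders:

host      = https://api.example.com
username  = demo-user
password  = ********
client_id = demo-client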

Automating Login & Renewal

var e = pm.environment;
var isSessionExpired = true;

var loginTimestamp = e.get("loginTimestamp");
// token lifetime is stored in milliseconds; default to 24 hours
var expiresInMs = e.get("expiresInMs") || 86400 * 1000;

if (loginTimestamp) {
  var loginDuration = Date.now() - loginTimestamp;
  isSessionExpired = loginDuration >= expiresInMs;
}

if (isSessionExpired) {
  pm.sendRequest({
    url: e.get("host") + "/auth/connect/token",
    method: "POST",
    header: {
      "Content-Type": "application/x-www-form-urlencoded",
      "Accept": "application/json"
    },
    body: {
      mode: "urlencoded",
      urlencoded: [
        { key: "username", value: e.get("username") },
        { key: "password", value: e.get("password") },
        { key: "grant_type", value: "password" },
        { key: "client_id", value: e.get("client_id") }
      ]
    }
  }, function (err, res) {
    if (err) {
      console.error("Token request failed:", err);
      return;
    }

    var jsonData = res.json();
    e.set("access_token", jsonData.access_token);

    if (jsonData.expires_in) {
      expiresInMs = jsonData.expires_in * 1000;
    }
    e.set("expiresInMs", expiresInMs);
    e.set("loginTimestamp", Date.now());
  });
}
}

We can copy this script into the collection's pre-request script so that it runs before every request in the collection.


Most of the script is self-explanatory: it checks whether the session has expired and, if so, sends a request to the token endpoint to fetch a new token. The token is stored in an environment variable and used in the request headers.
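
For example, the collection's Authorization header can reference the stored token using Postman's variable syntax:

Authorization: Bearer {{access_token}}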

Conclusion

This is a one-time setup for the Postman collection, and it saves a lot of time in the long run. The script can be modified to handle different grant types and token renewal strategies.

Install Cockpit on Remote Linux VM

Introduction

Cockpit

Cockpit is an easy-to-use web-based interface (like cPanel) for managing Linux servers. When we want to give access to non-developers or people who are new to Linux, it is a good idea to get them started with Cockpit. It provides a user-friendly interface to manage services, containers, storage, logs, and more.

Setup

Let's create a new Ubuntu VM and install Cockpit on it.

sudo apt update
. /etc/os-release
sudo apt install -t ${VERSION_CODENAME}-backports cockpit

Once the installation is complete, we can grab the public IP of the VM and access the Cockpit web interface running on port 9090.
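
If the page doesn't load, we can verify that Cockpit is listening on port 9090 (Cockpit is typically socket-activated, so the socket unit is the one to check):

$ sudo systemctl status cockpit.socket
$ sudo ss -tlnp | grep 9090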

Remembering the public IP of the VM is inconvenient, so let's create a DNS record for it: an A record pointing cockpit.avilpage.com to the VM's public IP.

Reverse Proxy

Let's set up a reverse proxy to access the Cockpit web interface using a subdomain.

sudo apt install caddy

Add the following configuration to /etc/caddy/Caddyfile.

cockpit.avilpage.com {
    reverse_proxy localhost:9090
}

We need to add Origins to the Cockpit configuration at /etc/cockpit/cockpit.conf to allow requests from the subdomain.

[WebService]
Origins = https://cockpit.avilpage.com

Restart both services and open https://cockpit.avilpage.com in the browser.

sudo systemctl restart cockpit
sudo systemctl restart caddy

Conclusion

The Cockpit web UI is a great tool for managing Linux servers, even for non-developers. Users can browse and manage logs, services, and more. It also provides a terminal to run commands on the server.

Mastering "Partial Covered Calls" - Part 1

Covered Calls

In a covered call strategy, we buy one lot of shares (or one future) and sell an at-the-money call option. The payoff diagram looks like this:

Covered Call

There are 2 drawbacks of this strategy:

  1. It requires a lot of capital to buy the shares, and we can't fully use the margin from pledging the stocks.
  2. We have to sell the shares at expiry if the stock price closes above the strike price.

To overcome these limitations, we can use a strategy called "partial covered calls."

Partial Covered Calls

Instead of buying a full lot of shares, we can buy a "partial" or "fractional" lot and then sell a far out-of-the-money call option instead of an at-the-money one.

For example, we can buy 0.15 lot of shares and sell a call option which is 10% away from the current price. The payoff diagram looks like this:

Partial Covered Call

Here we can pledge the shares we have bought and use the resulting margin to sell the call option.

Since the call option is far away from the current price, the probability of it getting exercised is low, so we usually keep the premium received from selling it.

In addition, we get the long-term capital appreciation of the shares we hold, and there is no short-term capital gains tax on them.

Conclusion

If you want to hold a stock for the long term but still want to generate regular income from it, partial covered calls are a good strategy to consider.

Cube & Cubicle

Rubik's Cube

When I was in college, I was traveling to a friend's place and missed the bus at midnight. The next bus was at 4 AM. While I was bored waiting, I found a Rubik's Cube in a shop.

I scrambled the cube and spent the next 4 hours trying to solve it. I managed to solve one color, but when I tried to solve the next one, the pieces in the previously solved layer got disturbed.

Even after spending a lot of time over the next 3 weeks, I couldn't solve it and gave up.

After a couple of years, when I "learnt" about the internet, I searched and found simple algorithms to solve the cube. Within a few days, I was able to solve the cube in a minute.

Office Cubicles

In the final year of college, campus placements started. While preparing my resume, I included "I can solve a Rubik's Cube in a minute" in it.

During the interview, the interviewer asked if I could really solve the cube in a minute. He asked me to bring my cube and show him during the lunch break. I did. Luckily, I got hired.

Even though I was hired by Wipro, I didn't join. I went to Bangalore and started applying for start-up jobs.

I went for an interview at a web development company in Malleswaram, Bangalore. The CEO looked at my résumé and took out a cube from his desk. He handed it to me, pointed at an empty cubicle behind me, and said, "If you solve the cube in a minute, that cubicle is yours."

Just by learning the cube, I was able to land a job at an MNC (multinational company) as well as at a startup.

tailscale: Resolving CGNAT (100.x.y.z) Conflicts

Introduction

In an earlier blog post, I wrote about using tailscale to remotely access any device1. Tailscale uses the 100.64.0.0/10 subnet2 to assign a unique IP address to each device.

When a tailscale node joins another campus network3 (schools, universities, offices) that uses the same subnet, it will face conflicts. Let's see how to resolve this.

Private Network

tailscale dashboard

In the above scenario, node C1 will be able to connect to C2 & C3 as they are in the same network.

Once we start tailscale on node C1, it gets a 100.x.y.z IP address from the tailscale subnet. Now node C1 can no longer connect to nodes C2 & C3.

To avoid conflicts with the existing network, we can configure tailscale to use a smaller subnet via the "ipPool" node attribute.

{
    "acls": [
        "..."
    ],
    "nodeAttrs": [
        {
            "target": [
                "autogroup:admin"
            ],
            "ipPool": [
                "100.100.96.0/20"
            ]
        }
    ]
}

Once this is configured, tailscale will start assigning IP addresses from the new subnet. However, even though IP address allocation is now limited to this range, we still can't access nodes in the other subnets due to a bug5 in tailscale.

As a workaround, we can manually update the iptables rules so that only tailscale's own IP pool is affected.

Let's look at the iptables rules added by tailscale by stopping it and then starting it again.

tailscale iptables rules

tailscale iptables rules

The highlighted rule drops any incoming packet whose source IP is in 100.64.0.0/10 (100.64.0.0 to 100.127.255.255) and that doesn't arrive on the tailscale0 interface.

Let's delete this rule and add a new rule to restrict the source IP to 100.100.96.0/20 (100.100.96.1 to 100.100.111.254).

$ sudo iptables --delete ts-input --source 100.64.0.0/10 ! -i tailscale0 -j DROP
$ sudo iptables --insert ts-input 3 --source 100.100.96.0/20 ! -i tailscale0 -j DROP
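
We can list the ts-input chain again to confirm that the new rule is in place:

$ sudo iptables -L ts-input -n -v --line-numbers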

tailscale iptables rules

Conclusion

By configuring tailscale to use a smaller subnet, we can avoid conflicts with existing networks. Even though there is a bug in tailscale, we can manually adjust the iptables rules to work around it.

Mastering Kraken2 - Part 4 - Build FDA-ARGOS Index

Mastering Kraken2

Part 1 - Initial Runs

Part 2 - Classification Performance Optimisation

Part 3 - Build custom database indices

Part 4 - Build FDA-ARGOS index (this post)

Part 5 - Regular vs Fast Builds (upcoming)

Part 6 - Benchmarking (upcoming)

Introduction

In the previous post, we learnt how to build a custom index for Kraken2.

FDA-ARGOS1 is a popular database with quality reference genomes for diagnostic usage. Let's build an index for FDA-ARGOS.

FDA-ARGOS Kraken2 Index

The FDA-ARGOS db is available at NCBI2, from which we can download the assembly details file.

FDA-ARGOS NCBI

We can extract accession numbers from the assembly details file and then download the genomes for these accessions.

$ grep -e "^#" -v PRJNA231221_AssemblyDetails.txt | cut -d$'\t' -f1 > accessions.txt

$ wc accessions.txt
 1428  1428 22848 accessions.txt

$ ncbi-genome-download --section genbank --assembly-accessions accessions.txt --progress-bar bacteria --parallel 40

It took ~8 minutes to download all the genomes, and the downloaded file size is ~4GB.

We can use the kraken-db-builder3 tool to build an index from these GenBank genome files.

# kraken-db-builder needs this to convert gbff to fasta format
$ conda install -c bioconda any2fasta

$ kraken-db-builder --genomes-dir genbank --threads 36 --db-name k2_argos

It took ~30 minutes to build the index.
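
As a quick sanity check, we can classify a sample against the new index (assuming kraken-db-builder wrote the index to the k2_argos directory; sample.fastq.gz is a placeholder for any reads at hand):

$ kraken2 --db k2_argos --threads 36 --report argos_report.txt sample.fastq.gz > argos_output.txt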

Conclusion

We have built a Kraken2 index for the FDA-ARGOS database on 2024-Aug-24.

In the next post, we will look at the differences between regular and fast builds.

Midnight Coding for Narendra Modi & Ivanka Trump

GES 2017, modi trump mitra

Introduction

In 2017, the Global Entrepreneurship Summit (GES) was held in Hyderabad, India. Narendra Modi (the Prime Minister of India) & Ivanka Trump (daughter of the then US President Donald Trump) were the chief guests.

At that time, I was part of Invento team, and we decided to develop a new version of Mitra robot for the event.

The Challenge

We had to develop the new version of the Mitra robot in a short span of time. The entire team worked day and night to meet the deadline and finish the new version.

We traveled from Bangalore to Hyderabad a few days in advance to prepare for the event. We cleared multiple security checks and did demos for various people before the event.

A day before the event, around 9 PM, we discovered a critical bug in the software. Due to that bug, the robot's motors were running at full speed, which was dangerous: if the robot hit someone at full speed, it could cause serious injuries.

I spent a few hours debugging the issue and even tried rolling back a few versions. Still, I couldn't pinpoint the issue.

Since we needed only a small set of the robot's features for the event, we decided to create a new build of the software with just those features. I spent the next few hours creating a new release.

After that, we spent the next few hours doing extensive testing to make sure there were no bugs in the new version.

It was almost morning by the time we were done with testing. We quickly went to the hotel to get some rest and came back early for the event.

Conclusion

The Mitra robot welcoming Modi & Trump went very well. You can read about Balaji Viswanathan's experience at GES 2017 on Quora1.

GES 2017, modi trump mitra anand

How (and when) to use systemd timer instead of cronjob

Introduction

* * * * * bash demo.sh

Just a single line is sufficient to schedule a cron job. However, there are some scenarios where I find systemd timers more useful than cron jobs.

How to use systemd timer

We need to create a service file (which contains the script to run) and a timer file (which contains the schedule).

# demo.service
[Unit]
Description=Demo service

[Service]
ExecStart=/bin/bash /path/to/demo.sh

# demo.timer
[Unit]
Description=Run demo.service every minute

[Timer]
OnBootSec=1min
OnUnitActiveSec=1min

[Install]
WantedBy=timers.target

We can copy these files to /etc/systemd/system/ and enable the timer.

$ sudo cp demo.service demo.timer /etc/systemd/system/

$ sudo systemctl daemon-reload

$ sudo systemctl enable --now demo.timer

We can use systemctl to see when the task was last executed and when it will run next.

$ sudo systemctl list-timers --all

systemd timer
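
The output of each run goes to the journal, so we can inspect recent runs with journalctl:

$ journalctl -u demo.service --since today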

Use Cases

  • Singleton - In the above example, let's say demo.sh takes ~10 minutes to run. With a cron job, after ten minutes we will have 10 instances of demo.sh running, which is not ideal. A systemd timer ensures only one instance of demo.sh is running at a time.

  • On-demand runs - If we want to test the job, systemd allows us to run it immediately with the usual systemctl start demo.service, without needing to run the script manually.

  • Precision - With cron, we can schedule tasks with up to one-minute precision. A systemd timer can schedule tasks with second-level precision.

[Timer]
OnCalendar=*-*-* 15:30:15

In addition to that, we can run tasks based on system events. For example, we can run a script 15 minutes after boot.

[Timer]
OnBootSec=15min

Conclusion

A systemd timer is a powerful tool that can replace a cron job in many scenarios, providing more control and flexibility. However, cron is still a good choice for simple scheduling tasks.

Mastering Kraken2 - Part 3 - Build Custom Database

Mastering Kraken2

Part 1 - Initial Runs

Part 2 - Classification Performance Optimisation

Part 3 - Build custom database indices (this post)

Part 4 - Build FDA-ARGOS index

Part 5 - Regular vs Fast Builds (upcoming)

Part 6 - Benchmarking (upcoming)

Introduction

In the previous post, we learned how to improve kraken21 classification performance. So far we have downloaded & used pre-built genome indices (databases).

In this post, let's build a custom database for kraken2. For simplicity, let's use only refseq archaea genomes2 for building the index.

Building Custom Database

First, we need to download the taxonomy files. We can use the k2 script provided by kraken2.

$ k2 download-taxonomy --db custom_db

This takes ~30 minutes depending on the network speed. The taxonomy files are downloaded to the custom_db/taxonomy directory.

$ ls custom_db/taxonomy
citations.dmp  delnodes.dmp  division.dmp  gc.prt  gencode.dmp  images.dmp
merged.dmp  names.dmp  nodes.dmp  nucl_gb.accession2taxid
nucl_wgs.accession2taxid  readme.txt

$ du -hs custom_db/taxonomy
43G     custom_db/taxonomy

As mentioned, we will use the archaea refseq genomes. We can use the k2 script to download them.

$ k2 download-library --library archaea --db custom_db

This runs on a single thread. Instead, we can use the ncbi-genome-download3 tool to download the genomes. It provides much more granular control over the download process. For example, we can download only complete genomes with --assembly-levels complete, and we can download multiple genomes in parallel.

$ pip install ncbi-genome-download

$ conda install -c bioconda ncbi-genome-download

$ ncbi-genome-download -s refseq -F fasta --parallel 40 -P archaea
Checking assemblies: 100%|███| 2184/2184 [00:19<00:00, 111.60entries/s]
Downloading assemblies: 100%|███| 2184/2184  [02:04<00:00,  4.54s/files]
Downloading assemblies: 2184files [02:23, 2184files/s]

In just 2 minutes, it downloaded all the files. Let's gunzip them.

$ find refseq -name "*.gz" -print0 | parallel -0 gunzip

$ du -hs refseq
5.9G    refseq

Let's add all the fasta genome files to the custom database.

$ time find refseq -name "*.fna" -exec kraken2-build --add-to-library {} --db custom_db \;
667.46s user 90.78s system 106% cpu 12:54.80 total

kraken2-build doesn't use multiple threads for adding genomes to the database. In addition to that, it also doesn't check if the genome is already present in the database.

Let's use k2 for adding genomes to the database.

$ export KRAKEN_NUM_THREADS=40

$ find . -name "*.fna" -exec k2 add-to-library --files {} --db custom_db \;
668.37s user 88.44s system 159% cpu 7:54.40 total

This took only half the time compared to kraken2-build.

Let's build the index from the library.

$ time kraken2-build --db custom_db --build --threads 36
Creating sequence ID to taxonomy ID map (step 1)...
Found 0/125783 targets, searched through 60000000 accession IDs...
Found 59923/125783 targets, searched through 822105735 accession IDs, search complete.
lookup_accession_numbers: 65860/125783 accession numbers remain unmapped, see unmapped.txt in DB directory
Sequence ID to taxonomy ID map complete. [2m1.950s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 5340021028 bytes
Capacity estimation complete. [23.875s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 11 bits reserved for taxid.
Completed processing of 59911 sequences, 3572145823 bp
Writing data to disk...  complete.
Database files completed. [12m3.368s]
Database construction complete. [Total: 14m29.666s]
kraken2-build --db custom_db --build --threads 36  24534.98s user 90.50s system 2831% cpu 14:29.75 total

$ ls -ll
.rw-rw-r-- 5.3G anand  1 Aug 16:35 hash.k2d
drwxrwxr-x    - anand  1 Aug 12:32 library
.rw-rw-r--   64 anand  1 Aug 16:35 opts.k2d
.rw-rw-r-- 1.5M anand  1 Aug 16:22 seqid2taxid.map
.rw-rw-r-- 115k anand  1 Aug 16:23 taxo.k2d
lrwxrwxrwx   20 anand  1 Aug 12:31 taxonomy
.rw-rw-r-- 1.2M anand  1 Aug 16:22 unmapped.txt

We were able to build an index for ~6 GB of input files in ~15 minutes.
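
To verify the finished index, we can summarise its contents with kraken2-inspect, which ships with kraken2:

$ kraken2-inspect --db custom_db --threads 36 | head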

Conclusion

We learnt some useful tips to speed up the custom database creation process. In the next post, we will learn about regular vs. fast builds.

Mastering Kraken2 - Part 2 - Performance Optimisation

Mastering Kraken2

Part 1 - Initial Runs

Part 2 - Classification Performance Optimisation (this post)

Part 3 - Build custom database indices

Part 4 - Build FDA-ARGOS index

Part 5 - Regular vs Fast Builds (upcoming)

Part 6 - Benchmarking (upcoming)

Introduction

In the previous post, we learned how to set up kraken21, download pre-built indices, and run kraken2. In this post, we will learn various ways to speed up the classification process.

Increasing RAM

The Kraken2 standard database is ~80 GB in size. It is recommended to have at least as much RAM as the database size to run kraken2 efficiently2. Let's use a 128 GB RAM machine and run kraken2 on the ERR103599773 sample.

$ time kraken2 --db k2_standard --report report.txt ERR10359977.fastq.gz > output.txt
Loading database information... done.
95064 sequences (14.35 Mbp) processed in 2.142s (2662.9 Kseq/m, 402.02 Mbp/m).
  94816 sequences classified (99.74%)
  248 sequences unclassified (0.26%)
kraken2 --db k2_standard --report report.txt ERR10359977.fastq.gz >   1.68s user 152.19s system 35% cpu 7:17.55 total

Now the time taken has come down from 40 minutes to 7 minutes. The classification speed has also increased from 0.19 Mbp/m to 402.02 Mbp/m.

The previous sample had only a few reads, and the speed is not a good indicator. Let's run kraken2 with a larger sample.

$ time kraken2 --db k2_standard --report report.txt --paired SRR6915097_1.fastq.gz SRR6915097_2.fastq.gz > output.txt
Loading database information... done.
Processed 14980000 sequences (2972330207 bp) ...
17121245 sequences (3397.15 Mbp) processed in 797.424s (1288.2 Kseq/m, 255.61 Mbp/m).
  9826671 sequences classified (57.39%)
  7294574 sequences unclassified (42.61%)
kraken2 --db k2_standard --report report.txt --paired > output.txt  526.39s user 308.24s system 68% cpu 20:23.86 total

This took almost 20 minutes to classify ~3 Gbp of data. Out of the 20 minutes, 13 minutes were spent on classification and the remaining time on loading the db into memory.

Let's run kraken2 with the k2_plusPF4 db, which is twice the size of k2_standard.

$ time kraken2 --db k2_plusfp --report report.txt --paired SRR6915097_1.fastq.gz SRR6915097_2.fastq.gz > output.txt
Loading database information...done.
17121245 sequences (3397.15 Mbp) processed in 755.290s (1360.1 Kseq/m, 269.87 Mbp/m).
  9903824 sequences classified (57.85%)
  7217421 sequences unclassified (42.15%)
kraken2 --db k2_plusfp/ --report report.txt --paired SRR6915097_1.fastq.gz  >   509.71s user 509.51s system 55% cpu 30:35.49 total

This took ~30 minutes to complete, but the classification itself took only 13 minutes, similar to k2_standard. The remaining time was spent loading the db into memory.

Preloading db into RAM

We can use vmtouch5 to preload the db into RAM. kraken2 provides the --memory-mapping option to use a preloaded db.

$ vmtouch -vt k2_standard/hash.k2d k2_standard/opts.k2d k2_standard/taxo.k2d
           Files: 3
     Directories: 0
   Touched Pages: 20382075 (77G)
         Elapsed: 434.77 seconds

When Linux needs RAM, it will incrementally evict the db from memory. To prevent this, we can copy the db to shared memory (/dev/shm) and then use vmtouch to preload it.

$ cp -r k2_standard /dev/shm

$ vmtouch -t /dev/shm/k2_standard/*.k2d

Now, let's run kraken2 with the --memory-mapping option.

$ time kraken2 --db k2_standard --report report.txt --memory-mapping --paired SRR6915097_1.fastq.gz SRR6915097_2.fastq.gz > output.txt
Loading database information... done.
17121245 sequences (3397.15 Mbp) processed in 532.486s (1929.2 Kseq/m, 382.79 Mbp/m).
  9826671 sequences classified (57.39%)
  7294574 sequences unclassified (42.61%)
  kraken2 --db k2_standard --report report.txt --paired SRR6915097_1.fastq.gz   >  424.20s user 11.76s system 81% cpu 8:54.98 total

Now the run took only ~9 minutes.

Multi threading

kraken2 supports multiple threads. I am using a machine with 40 threads.

$ time kraken2 --db k2_standard --report report.txt --paired SRR6915097_1.fastq.gz SRR6915097_2.fastq.gz --memory-mapping --threads 32 > output.txt
Loading database information... done.
17121245 sequences (3397.15 Mbp) processed in 71.675s (14332.5 Kseq/m, 2843.81 Mbp/m).
  9826671 sequences classified (57.39%)
  7294574 sequences unclassified (42.61%)
kraken2 --db k2_standard --report report.txt --paired SRR6915097_1.fastq.gz      556.58s user 22.85s system 762% cpu 1:16.02 total

With 32 threads, the classification took only 1 minute. Beyond 32 threads, the classification time did not decrease significantly.
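
To find the sweet spot on a given machine, we can time a few thread counts in a loop (a rough sketch using the same database and sample files as above):

$ for t in 8 16 24 32 40; do
      echo "threads: $t"
      time kraken2 --db k2_standard --memory-mapping --threads $t \
          --paired SRR6915097_1.fastq.gz SRR6915097_2.fastq.gz \
          --report report_$t.txt > output_$t.txt
  done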

Optimising input files

So far we have used gzipped input files. Let's use unzipped input files and run kraken2.

$ gunzip SRR6915097_1.fastq.gz
$ gunzip SRR6915097_2.fastq.gz

$ time kraken2 --db k2_standard --report report.txt --paired SRR6915097_1.fastq SRR6915097_2.fastq --memory-mapping --threads 30 > output.txt
Loading database information... done.
17121245 sequences (3397.15 Mbp) processed in 34.809s (29512.0 Kseq/m, 5855.68 Mbp/m).
  9826671 sequences classified (57.39%)
  7294574 sequences unclassified (42.61%)
kraken2 --db k2_standard --report report.txt --paired SRR6915097_1.fastq    30   565.03s user 17.12s system 1530% cpu 38.047 total

Now the classification time has come down to 40 seconds.

Since the input fastq files are paired, kraken2 spends extra time pairing the reads at runtime. Let's interleave the files beforehand and run kraken2.

To interleave the files, lets use seqfu tool.

$ conda install -y -c conda-forge -c bioconda "seqfu>1.10"

$ seqfu interleave -1 SRR6915097_1.fastq.gz -2 SRR6915097_2.fastq.gz > SRR6915097.fq

$ time kraken2 --db k2_standard --report report.txt --memory-mapping SRR6915097.fq --threads 32 > output.txt
Loading database information... done.
34242490 sequences (3397.15 Mbp) processed in 20.199s (101714.1 Kseq/m, 10090.91 Mbp/m).
  17983321 sequences classified (52.52%)
  16259169 sequences unclassified (47.48%)
kraken2 --db k2_standard --report report.txt --memory-mapping SRR6915097.fq  32  618.96s user 18.24s system 2653% cpu 24.013 total

Now the classification time has come down to 24 seconds.

Conclusion

In terms of classification speed, we have come a long way, from ~0.2 Mbp/m to over 10,000 Mbp/m. In the next post, we will learn how to optimise the creation of custom indices.