Autoscale k8s pods with queue size (KEDA)

keda-postgresql-scaling

Introduction

Kubernetes (k8s) is a popular container orchestration tool, and it provides the Horizontal Pod Autoscaler (HPA) to scale pods based on CPU and memory usage.

To scale pods based on queue size, we can use KEDA.

Setup

For this demo, we will use a PostgreSQL table to store tasks and KEDA to scale the pods based on the queue size.

As written earlier, we can use k3d to set up a k8s cluster anywhere with a single command.

Once the cluster is up, we can write simple shell scripts to produce and consume tasks from the PostgreSQL table.

I am adding only relevant snippets here. You can find the complete code in the GitHub repo.

PostgreSQL deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-deployment
spec:
  template:
    spec:
      containers:
        - name: postgres
          image: postgres:13

Producer shell script:

#!/bin/bash

psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<-EOSQL
  CREATE TABLE IF NOT EXISTS tasks (
      id SERIAL PRIMARY KEY,
      description TEXT NOT NULL,
      status TEXT NOT NULL
  );
EOSQL

# Continuously insert tasks
while true; do
    psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<-EOSQL
    INSERT INTO tasks (description, status)
    VALUES ('$DESCRIPTION', '$STATUS');
EOSQL
done

Producer Dockerfile:

FROM postgres:13

RUN apt-get update && apt-get install -y bash curl

COPY producer.sh /producer.sh

RUN chmod +x /producer.sh

CMD ["/producer.sh"]

Producer deployment:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: task-producer
          image: task-producer
          imagePullPolicy: Never

Consumer shell script:

#!/bin/bash

TASK=$(psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t <<-EOSQL
  SELECT id, description, status FROM tasks
  WHERE status != 'Completed'
  ORDER BY id
  LIMIT 1;
EOSQL
)

TASK_ID=$(echo "$TASK" | awk '{print $1}')

psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<-EOSQL
  UPDATE tasks
  SET status = 'Completed'
  WHERE id = $TASK_ID;
EOSQL

Consumer Dockerfile:

FROM postgres:13

RUN apt-get update && apt-get install -y bash

COPY consumer.sh /consumer.sh

RUN chmod +x /consumer.sh

CMD ["/consumer.sh"]

Consumer deployment:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: task-consumer
          image: task-consumer
          imagePullPolicy: Never

Instead of pushing these images to a container registry, we can load them directly into the cluster using k3d image import.

k3d image import task-producer --cluster demo-cluster
k3d image import task-consumer --cluster demo-cluster

After that, we can deploy PostgreSQL, the producer, and the consumer using the above deployment files.
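
A minimal way to apply them (the manifest file names here are assumptions; use the file names from the repo):

kubectl apply -f postgres-deployment.yaml
kubectl apply -f task-producer-deployment.yaml
kubectl apply -f task-consumer-deployment.yaml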

Once the deployment is done, we can monitor the tasks from pod logs.
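
For example, assuming the Deployments are named after their containers (task-producer / task-consumer), we can stream the activity like this:

kubectl logs -f deployment/task-producer
kubectl logs -f deployment/task-consumer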

Let's install KEDA in the cluster and set up a ScaledObject to scale the consumer pods based on the number of pending tasks.

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda 
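
Since no namespace is passed to helm, KEDA is installed into the current (default) namespace. A quick check that the KEDA pods are running before creating the ScaledObject:

kubectl get pods | grep keda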

ScaledObject:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: task-consumer-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: task-consumer  
  pollingInterval: 1    
  cooldownPeriod: 1     
  minReplicaCount: 0    
  maxReplicaCount: 50   
  triggers:
    - type: postgresql
      metadata:
        host: postgres-service  
        query: "SELECT COUNT(*) FROM tasks WHERE status != 'Completed';"
        targetQueryValue: "10" 

With this, KEDA will scale the consumer pods based on the number of pending tasks in the PostgreSQL table. The targetQueryValue is set to 10, which means KEDA scales the consumer deployment so that there are roughly 10 pending tasks per pod; as the backlog grows beyond that, more consumer pods are added.
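
To watch the scaling in action, we can keep an eye on the ScaledObject and the consumer pods (names taken from the manifests above):

kubectl get scaledobject task-consumer-scaler
kubectl get pods -w | grep task-consumer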

In the cluster, after a while, a bunch of tasks will accumulate in the PostgreSQL table. Now we can reset the status of all tasks to an empty string so that they count as pending and we can see auto-scaling in action.

kubectl exec -it task-producer-<pod-id> -- psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<-EOSQL
  UPDATE tasks
  SET status = '';
EOSQL

On k8s dashboard, we can see the consumer pods scaling up and down based on the number of tasks in the PostgreSQL table.

keda-postgresql-scaling

Conclusion

KEDA is a powerful tool to scale k8s pods based on custom metrics. In this post, we used PostgreSQL, but KEDA works with a variety of other data stores as well. The full code for this post is available here.

Free DockerHub Alternative - ECR Public Gallery

docker-rate-limits

DockerHub started rate limiting1 anonymous docker pulls. While testing out a new CI/CD setup, I hit the rate limit and had to wait for an hour to pull an image. This was a good time to look for alternatives.

AWS ECR Public Gallery2 is a good alternative to DockerHub as of today (Feb 2025). It is free and has no rate limits, even for anonymous users.

public-ecr-gallery

Once we find the required image in the gallery, we can simply change the image name in the docker pull command to pull the image from the ECR Gallery.

docker pull public.ecr.aws/ubuntu/ubuntu

In Dockerfile, we can use the image from ECR Gallery as follows:

FROM public.ecr.aws/ubuntu/ubuntu

That is a quick way to avoid DockerHub rate limits.

Postman - Auto Login & Renew OAuth2 Token

Introduction

When using Postman to interact with APIs behind OAuth2 authentication, we need to log in and renew the token manually. This can be automated using the following steps.

  • Set credentials in environment variables
  • Create a pre-request script to login and renew the token
  • Use the token in the request headers

Automating Login & Renewal

var e = pm.environment;
var isSessionExpired = true;

var loginTimestamp = e.get("loginTimestamp");
var expiresInSeconds = e.get("expiresInSeconds") || 86400;

if (loginTimestamp) {
  // loginTimestamp is in milliseconds, expiresInSeconds is in seconds
  var loginDurationMs = Date.now() - loginTimestamp;
  isSessionExpired = loginDurationMs >= expiresInSeconds * 1000;
}

if (isSessionExpired) {
  pm.sendRequest({
    url: e.get('host') + "/auth/connect/token",
    method: 'POST',
    header: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Accept': 'application/json'
    },
    body: {
        mode: 'urlencoded',
        urlencoded: [
          { key: "username", value: e.get('username') },
          { key: "password", value: e.get('password') },
          { key: "grant_type", value: "password" },
          { key: "client_id", value: e.get("client_id") }
        ]
    }
  }, function (err, res) {
    var jsonData = res.json();

    e.set("access_token", jsonData.access_token);

    // expires_in from the token endpoint is already in seconds
    if (jsonData.expires_in) {
        expiresInSeconds = jsonData.expires_in;
    }
    e.set("expiresInSeconds", expiresInSeconds);
    e.set("loginTimestamp", Date.now());
  });
}

We can copy this script to the pre-request script of the collection.


Most of the script is self-explanatory. The script checks if the session is expired and sends a request to the token endpoint to get a new token. The token is stored in environment variables and used in the request headers.
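
For example, in the request's (or the collection's) Authorization header, the stored token can be referenced with Postman's variable syntax:

Authorization: Bearer {{access_token}}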

Conclusion

This is a one-time setup for a Postman collection, and it saves a lot of time in the long run. The script can be modified to handle different grant types and token renewal strategies.

Install Cockpit on Remote Linux VM

Introduction

Cockpit

Cockpit is an easy-to-use web-based interface (like cPanel) for managing Linux servers. When we want to provide access to non-developers or people who are new to Linux, it is a good idea to get them started with Cockpit. It provides a user-friendly interface to manage services, containers, storage, logs, and more.

Setup

Let's create a new Ubuntu VM and install Cockpit on it.

sudo apt update
. /etc/os-release
sudo apt install -t ${VERSION_CODENAME}-backports cockpit

Once the installation is complete, we can get the public IP of the VM and access the Cockpit web interface running on port 9090.

It will be difficult to remember the public IP of the VM, so let's create a DNS record for it. Add an A record in the DNS settings to point cockpit.avilpage.com to the public IP of the VM.

Reverse Proxy

Let's set up a reverse proxy to access the Cockpit web interface using a subdomain.

sudo apt install caddy

Add the below configuration to /etc/caddy/Caddyfile.

cockpit.avilpage.com {
    reverse_proxy localhost:9090
}

We need to add Origins to the Cockpit configuration at /etc/cockpit/cockpit.conf to allow requests from the subdomain.

[WebService]
Origins = https://cockpit.avilpage.com

Restart both services and open https://cockpit.avilpage.com in a browser.

sudo systemctl restart cockpit
sudo systemctl restart caddy

Conclusion

The Cockpit web UI is a great tool to manage Linux servers, even for non-developers. Users can browse and manage logs, services, and more. It also provides a terminal to run commands on the server.

Mastering "Partial Covered Calls" - Part 1

Covered Calls

In a covered call strategy, we buy one lot of stock (or one future) and sell an at-the-money call option. The payoff diagram looks like this:

Covered Call

There are 2 drawbacks of this strategy:

  1. It requires a lot of capital to buy the shares. We can't fully use the margin from pledging the stocks.
  2. We need to sell the stocks at expiry if the stock price closes above the strike price.

To overcome this limitation, we can use a strategy called "partial covered calls."

Partial Covered Calls

Instead of buying one lot of shares, we can buy a "partial" or "fractional" lot of shares and then sell a far out-of-the-money call option instead of an at-the-money call option.

For example, we can buy 0.15 lot of shares and sell a call option whose strike is 10% above the current price. The payoff diagram looks like this:

Partial Covered Call
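
As a rough way to reason about the payoff (the notation below is mine, not from the original post): with lot size L, a fraction f of the lot bought at price S0, a call sold at strike K for a premium of P per share, and price ST at expiry, the payoff is approximately

f * L * (ST - S0) + L * (P - max(0, ST - K))

Since f is less than 1, the short call is only partially covered above K, which is why the strike is chosen far out of the money.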

Here we can pledge the stocks we have bought and use the resulting margin to sell the call option.

Since we are selling a call option which is far away from the current price, the probability of it getting exercised is very low, so we usually keep the premium we received from selling it.

In addition to that, we get the long-term capital appreciation of the stocks we have bought, and there won't be short-term capital gains tax on them.

Conclusion

If you want to hold a stock for the long term but still want to generate regular income from it, partial covered calls are a good strategy to consider.

Cube & Cubicle

Rubiks Cube

When I was in college, I was traveling to a friend's place and missed the bus at midnight. The next bus was at 4 AM. While I was bored waiting for the bus, I found a Rubik's Cube in a shop.

I scrambled the cube and spent the next 4 hours trying to solve it. I managed to solve one color, but when I tried to solve the next one, the pieces in the previously solved layer kept going out of place.

Even after spending a lot of time over the next 3 weeks, I couldn't solve it and gave up.

After a couple of years, when I "learnt" about the internet, I searched and found simple algorithms to solve the cube. Within a few days, I was able to solve the cube in a minute.

Office Cubicles

In the final year of college, placements started. While preparing my resume, I included "I can solve a Rubik's Cube in a minute" in it.

During the interview, the interviewer asked me if I could really solve the cube in a minute. He asked me to get my cube and show him during the lunch break. I did. Luckily, I got hired.

Even though I was hired by Wipro, I didn't join. I went to Bangalore and started applying for start-up jobs.

I went for an interview at a web development company in Malleswaram, Bangalore. The CEO looked at my résumé and took out a cube from his desk. He handed the cube to me, pointed at an empty cubicle behind me, and said, "If you solve the cube in a minute, that cubicle is yours."

Just by learning the cube, I was able to land a job at an MNC (Multi National Company) and at a startup as well.

tailscale: Resolving CGNAT (100.x.y.z) Conflicts

Introduction

In an earlier blog post, I wrote about using tailscale to remotely access any device1. Tailscale uses 100.64.0.0/10 subnet2 to assign unique IP addresses to each device.

When a tailscale node joins another campus network3 (schools, universities, offices) that uses the same subnet, it will face conflicts. Let's see how to resolve this.

Private Network

tailscale dashboard

In the above scenario, node C1 is able to connect to C2 & C3 as they are in the same network.

Once we start tailscale on node C1, it will get a 100.x.y.z IP address from the tailscale subnet. Now node C1 will not be able to connect to nodes C2 & C3.

To avoid conflicts with the existing network, we can configure tailscale to use a "smaller" subnet using "ipPool".

{
    "acls": [
        "..."
    ],
    "nodeAttrs": [
        {
            "target": [
                "autogroup:admin"
            ],
            "ipPool": [
                "100.100.96.0/20"
            ]
        }
    ]
}

Once it is configured, tailscale will start assigning IP addresses from the new subnet. Even though IP address allocation is now limited to this range, we still can't access nodes in the other subnets due to a bug5 in tailscale.

As a workaround, we can manually update the iptables rules so that only the tailscale-assigned range is dropped.

Let's look at the iptables rules added by tailscale by stopping it and then starting it.
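
To inspect them directly, we can list the ts-input chain (the chain tailscale adds, referenced below) with rule numbers:

sudo iptables -L ts-input -n -v --line-numbers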

tailscale iptables rules

tailscale iptables rules

The highlighted rule drops any incoming packet that doesn't arrive on the tailscale0 interface but has a source IP in 100.64.0.0/10 (100.64.0.0 to 100.127.255.255).

Let's delete this rule and add a new rule to restrict the source IP to 100.100.96.0/20 (100.100.96.1 to 100.100.111.254).

$ sudo iptables --delete ts-input --source 100.64.0.0/10 ! -i tailscale0 -j DROP
$ sudo iptables --insert ts-input 3 --source 100.100.96.0/20 ! -i tailscale0 -j DROP

tailscale iptables rules

Conclusion

By configuring tailscale to use a smaller subnet, we can avoid conflicts with existing networks. Even though there is a bug in tailscale, we can manually update the iptables rules to work around it.

Mastering Kraken2 - Part 4 - Build FDA-ARGOS Index

Mastering Kraken2

Part 1 - Initial Runs

Part 2 - Classification Performance Optimisation

Part 3 - Build custom database indices

Part 4 - Build FDA-ARGOS index (this post)

Part 5 - Regular vs Fast Builds (upcoming)

Part 6 - Benchmarking (upcoming)

Introduction

In the previous post, we learnt how to build a custom index for Kraken2.

FDA-ARGOS1 is a popular database with quality reference genomes for diagnostic usage. Let's build an index for FDA-ARGOS.

FDA-ARGOS Kraken2 Index

The FDA-ARGOS db is available at NCBI2, from which we can download the assembly details file.

FDA-ARGOS NCBI

We can extract accession numbers from the assembly file and then download the genomes for these accession ids.

$ grep -e "^#" -v PRJNA231221_AssemblyDetails.txt | cut -d$'\t' -f1 > accessions.txt

$ wc accessions.txt
 1428  1428 22848 accessions.txt

$ ncbi-genome-download --section genbank --assembly-accessions accessions.txt --progress-bar bacteria --parallel 40

It took ~8 minutes to download all the genomes, and the downloaded file size is ~4GB.

We can use the kraken-db-builder3 tool to build an index from these GenBank genome files.

# kraken-db-builder needs this to convert gbff to fasta format
$ conda install -c bioconda any2fasta

$ kraken-db-builder --genomes-dir genbank --threads 36 --db-name k2_argos

It took ~30 minutes to build the index.
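
Once the build finishes, we can sanity-check the index with a quick classification run (the read file name below is just a placeholder):

$ kraken2 --db k2_argos --threads 36 --report report.txt --output classifications.txt sample.fastq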

Conclusion

We have built a Kraken2 index for the FDA-ARGOS database on 2024-Aug-24.

In the next post, we will look at the differences between regular and fast builds.

Midnight Coding for Narendra Modi & Ivanka Trump

GES 2017, modi trump mitra

Introduction

In 2017, the GES event was held in Hyderabad, India. Narendra Modi (the Prime Minister of India) & Ivanka Trump (daughter of the then US President Donald Trump) were the chief guests.

At that time, I was part of the Invento team, and we decided to develop a new version of the Mitra robot for the event.

The Challenge

We had to develop the new version of the Mitra robot in a short span of time. The entire team worked day and night to meet the deadline and finish the new version.

We went from Bangalore to Hyderabad a few days early to prepare for the event. We cleared multiple security checks and did demos for various people before the event.

A day before the event, around 9 PM, we discovered a critical bug in the software. Due to that bug, the robot's motors were running at full speed, which was dangerous. If the robot hit someone at full speed, it could cause serious injuries.

I spent a few hours debugging the issue and even tried rolling back a few versions. Still, I couldn't pinpoint the issue.

Since we needed only a small set of the robot's features, we decided to create a new version of the software with only those limited features. I spent the next few hours creating a new release.

After that, we spent the next few hours doing extensive testing to make sure there were no bugs in the new version.

It was almost morning by the time we were done with testing. We quickly went to the hotel to get some rest before heading back early for the event.

Conclusion

The Mitra robot's welcome of Modi & Trump went very well. You can read about Balaji Viswanathan's experience at GES 2017 on Quora1.

GES 2017, modi trump mitra anand

How (and when) to use systemd timer instead of cronjob

Introduction

* * * * * bash demo.sh

Just a single line is sufficient to schedule a cron job. However, there are some scenarios where I find systemd timers more useful than cron jobs.

How to use systemd timer

We need to create a service file (containing the script to be run) and a timer file (containing the schedule).

# demo.service
[Unit]
Description=Demo service

[Service]
# systemd requires an absolute path in ExecStart
ExecStart=/bin/bash /path/to/demo.sh

# demo.timer
[Unit]
Description=Run demo.service every 1 minute

[Timer]
OnBootSec=1min
OnUnitActiveSec=1min

[Install]
WantedBy=timers.target

We can copy these files to /etc/systemd/system/ and enable the timer.

$ sudo cp demo.service demo.timer /etc/systemd/system/

$ sudo systemctl daemon-reload

$ sudo systemctl enable --now demo.timer

We can use systemctl to see when the task was last executed and when it will run next.

$ sudo systemctl list-timers --all

systemd timer
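
To check what an individual run printed, we can also tail the unit's journal (assuming the unit name demo.service from above):

$ journalctl -u demo.service -f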

Use Cases

  • Singleton - In the above example, let's say demo.sh takes ~10 minutes to run. With a cron job, in ten minutes we would have 10 instances of demo.sh running, which is not ideal. A systemd timer ensures only one instance of demo.sh runs at a time.

  • On demand runs - If we want to test the script/job, systemd allows us to run it immediately with the usual systemctl start demo.service, without needing to run the script manually.

  • Timer precision - With cron, we can schedule tasks with at most one-minute precision. Systemd timers can schedule down to the second:

[Timer]
OnCalendar=*-*-* 15:30:15

In addition to that, we can run tasks based on system events. For example, we can run a script 15 minutes after reboot.

[Timer]
OnBootSec=15min

Conclusion

Systemd timers are a powerful tool that can replace cron jobs in many scenarios. They provide more control and flexibility than cron jobs. However, cron jobs are still a good choice for simple scheduling tasks.