Mastering Kraken2 - Part 1 - Initial Runs

Mastering Kraken2

Part 1 - Initial Runs (this post)

Part 2 - Classification Performance Optimisation

Part 3 - Build custom database indices

Part 4 - Build FDA-ARGOS index

Part 5 - Regular vs Fast Builds (upcoming)

Part 6 - Benchmarking (upcoming)

Introduction

Kraken21 is a widely used taxonomic classification tool for metagenomics, and pre-built indices are available for many organisms. In this series, we will learn

  • How to set up kraken2 and download pre-built indices
  • How to run kraken2 on an 8GB RAM machine at ~0.19 Mbp/m (million base pairs per minute)
  • Various ways to speed up the classification process
  • How to run kraken2 on a 128GB RAM machine at ~1200 Mbp/m
  • How to build custom indices

Installation

We can install kraken2 from source using the install_kraken2.sh script as per the manual2.

$ git clone https://github.com/DerrickWood/kraken2
$ cd kraken2
$ ./install_kraken2.sh /usr/local/bin
# ensure kraken2 is in the PATH
$ export PATH=$PATH:/usr/local/bin

If you already have conda installed, you can install kraken2 from conda as well.

$ conda install -c bioconda kraken2

If you have brew installed on Linux or Mac (including M1), you can install kraken2 using brew.

$ brew install brewsci/bio/kraken2

Download pre-built indices

Building kraken2 indices takes a lot of time and resources. For now, let's download and use the pre-built indices. In later posts, we will learn how to build them.

Genomic Index Zone3 provides pre-built indices for kraken2. Let's download the standard database, which contains RefSeq archaea, bacteria, viral, plasmid, human1, & UniVec_Core sequences.

$ wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240605.tar.gz
$ mkdir k2_standard
$ tar -xvf k2_standard_20240605.tar.gz -C k2_standard

The extracted directory contains three files - hash.k2d, opts.k2d, and taxo.k2d - which are the kraken2 database files.

$ ls -l *.k2d
.rw-r--r--  83G anand 13 Jul 12:34 hash.k2d
.rw-r--r--   64 anand 13 Jul 12:34 opts.k2d
.rw-r--r-- 4.0M anand 13 Jul 12:34 taxo.k2d

Classification

To run taxonomic classification, let's use ERR10359977, a human gut metagenome sample from NCBI SRA.

$ wget https://ftp.sra.ebi.ac.uk/vol1/fastq/ERR103/077/ERR10359977/ERR10359977.fastq.gz
$ kraken2 --db k2_standard --report report.txt ERR10359977.fastq.gz > output.txt

The machine I used has 8GB RAM and an additional 8GB of swap. Since kraken2 needs the entire database (~80GB) in memory, the kernel kills the process as soon as it tries to consume more than 16GB.

$ time kraken2 --db k2_standard --paired SRR6915097_1.fastq.gz SRR6915097_2.fastq.gz > output.txt
Loading database information...Command terminated by signal 9
0.02user 275.83system 8:17.43elapsed 55%CPU 
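Before kicking off a long run, we can check up front whether the hash table will fit in memory. Below is a minimal sketch of such a pre-flight check - my own helper, not part of kraken2, and the function names are hypothetical. It compares the size of hash.k2d against MemAvailable from /proc/meminfo:

```python
import os


def available_memory_kb(meminfo_text):
    """Parse the MemAvailable value (in kB) out of /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def db_fits_in_ram(db_dir):
    """Return True if hash.k2d is smaller than the currently available RAM."""
    hash_kb = os.path.getsize(os.path.join(db_dir, "hash.k2d")) // 1024
    with open("/proc/meminfo") as f:
        return hash_kb <= available_memory_kb(f.read())
```

If db_fits_in_ram('k2_standard') returns False, the run will either swap heavily or get OOM-killed.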

To prevent this, let's increase the swap space to 128 GB.

# Create an empty swapfile of 128GB
sudo dd if=/dev/zero of=/swapfile bs=1G count=128

# Turn swap off - It might take several minutes
sudo swapoff -a

# Set the permissions for swapfile
sudo chmod 0600 /swapfile

# make it a swap area
sudo mkswap /swapfile  

# Turn the swap on
sudo swapon /swapfile

We can time the classification process using the time command.

$ time kraken2 --db k2_standard --report report.txt ERR10359977.fastq.gz > output.txt

If you have a machine with large RAM, the same scenario can be simulated using systemd-run. This will limit the memory usage of kraken2 to 6.5GB.

$ time systemd-run --scope -p MemoryMax=6.5G --user time kraken2 --db k2_standard --report report.txt ERR10359977.fastq.gz > output.txt

Depending on the CPU performance, this will take ~40 minutes to complete.

Loading database information... done.
95064 sequences (14.35 Mbp) processed in 1026.994s (5.6 Kseq/m, 0.84 Mbp/m).
  94939 sequences classified (99.87%)
  125 sequences unclassified (0.13%)
  4.24user 658.68system 38:26.78elapsed 28%CPU 
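As a sanity check, we can recompute the throughput figures kraken2 reports from the raw numbers in the summary above:

```python
# Numbers from the kraken2 run summary above
seconds = 1026.994
mbp = 14.35          # million base pairs processed
sequences = 95064

minutes = seconds / 60
mbp_per_min = mbp / minutes                 # matches the reported ~0.84 Mbp/m
kseq_per_min = sequences / minutes / 1000   # matches the reported ~5.6 Kseq/m

print(f"{mbp_per_min:.2f} Mbp/m, {kseq_per_min:.1f} Kseq/m")
```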

If we try a gut WGS (Whole Genome Sequencing) sample like SRR6915097 45, which contains ~3.3 Gbp, it will take weeks to complete.

$ wget -c https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR691/007/SRR6915097/SRR6915097_1.fastq.gz
$ wget -c https://ftp.sra.ebi.ac.uk/vol1/fastq/SRR691/007/SRR6915097/SRR6915097_2.fastq.gz

$ time systemd-run --scope -p MemoryMax=6G --user time kraken2 --db k2_standard --paired SRR6915097_1.fastq.gz SRR6915097_2.fastq.gz > output.txt

I tried running this on an 8GB machine. Even after 10 days, it had processed only ~10% of the data.
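A naive linear extrapolation from that observed progress makes the problem obvious (a back-of-the-envelope estimate that assumes the rate stays constant):

```python
# ~10% of the sample processed in 10 days
days_elapsed = 10
fraction_done = 0.10

projected_days = days_elapsed / fraction_done
print(f"Projected total runtime: ~{projected_days:.0f} days")
```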

If we have to process a large number of such samples, it would take months, which is not a practical solution.

Conclusion

In this post, we ran kraken2 on an 8GB machine and learned that it is not feasible to run kraken2 on large samples.

In the next post, we will learn how to speed up the classification process and run classification at 1200 Mbp/m.

Next: Part 2 - Performance Optimisation

Headlamp - k8s Lens open source alternative

headlamp - Open source Kubernetes Lens alternative

Since Lens is not open source, I tried out monokle, octant, k9s, and headlamp1. Among them, headlamp UI & features are closest to Lens.

Headlamp

Headlamp is a CNCF sandbox project that provides a cross-platform desktop application to manage Kubernetes clusters. It auto-detects clusters and shows cluster-wide resource usage by default.

It can also be installed inside the cluster and can be accessed using a web browser. This is useful when we want to access the cluster from a mobile device.

$ helm repo add headlamp https://headlamp-k8s.github.io/headlamp/

$ helm install headlamp headlamp/headlamp

Let's create a token and port-forward the service to access it.

$ kubectl create token headlamp

# we can do this via headlamp UI as well
$ kubectl port-forward service/headlamp 8080:80

Now, we can access the headlamp UI at http://localhost:8080.


Conclusion

If you are looking for an open source alternative to Lens, headlamp is a good choice. It provides a similar UI & features as Lens, and it is accessible via mobile devices as well.

macOS - Log & track historical CPU, RAM usage

macOS - Log CPU & RAM history

On macOS, we can use the built-in Activity Monitor or third-party apps like Stats to check live CPU/RAM usage, but they can't track historical CPU & memory usage. Tools like sar and atop can, but they are not available for macOS.

Netdata

Netdata1 is an open-source observability tool that monitors CPU, RAM, network, and disk usage, and it can also track historical data.

Unfortunately, it is not stable on macOS. I tried installing it on multiple MacBooks, but it didn't work. I raised an issue2 on their GitHub repository, and the team mentioned that macOS is a low priority for them.

Glances

Glances3 is a cross-platform monitoring tool that monitors CPU, RAM, network, and disk usage, and it can also track historical data.

We can install it using Brew or pip.

$ brew install glances

$ pip install glances

Once it is installed, we can monitor the resource usage using the below command.

$ glances


Glances can log historical data to a file using the below command.

$ glances --export-csv /tmp/glances.csv

In addition to that, it can export data to services like InfluxDB, Prometheus, etc.

Let's install influxdb and export stats to it.

$ brew install influxdb
$ brew services start influxdb
$ influx setup

$ python -m pip install influxdb-client

$ cat glances.conf
[influxdb]
host=localhost
port=8086
protocol=http
org=avilpage
bucket=glances
token=secret_token

$ glances --export-influxdb -C glances.conf

We can view stats in the influxdb from Data Explorer web UI at http://localhost:8086.


Glances provides a prebuilt Grafana dashboard4 that we can import to visualize the stats.

From Grafana -> Dashboard -> Import, we can import the dashboard using the above URL.


Conclusion

In addition to InfluxDB, Glances can export data to ~20 services. So far, it is the best tool to log, track and view historical CPU, RAM, network and disk usage in macOS. The same method works for Linux and Windows as well.

Automating Zscaler Connectivity on macOS

Introduction

Zscaler is a cloud-based security service that provides secure internet access via VPN. Unfortunately, Zscaler does not provide a command-line interface to connect to the VPN, and we can't use AppleScript to automate the connectivity either.

Automating Zscaler Connectivity

Once Zscaler is installed on macOS, if we search the LaunchAgents & LaunchDaemons directories, we can find the Zscaler plist files.

$ sudo find /Library/LaunchAgents -name '*zscaler*'
/Library/LaunchAgents/com.zscaler.tray.plist


$ sudo find /Library/LaunchDaemons -name '*zscaler*'
/Library/LaunchDaemons/com.zscaler.service.plist
/Library/LaunchDaemons/com.zscaler.tunnel.plist
/Library/LaunchDaemons/com.zscaler.UPMServiceController.plist

To connect to Zscaler, we can load these services.

#!/bin/bash

/usr/bin/open -a /Applications/Zscaler/Zscaler.app --hide
sudo find /Library/LaunchAgents -name '*zscaler*' -exec launchctl load {} \;
sudo find /Library/LaunchDaemons -name '*zscaler*' -exec launchctl load {} \;

To disconnect from Zscaler, we can unload all of them.

#!/bin/bash

sudo find /Library/LaunchAgents -name '*zscaler*' -exec launchctl unload {} \;
sudo find /Library/LaunchDaemons -name '*zscaler*' -exec launchctl unload {} \;

To automatically toggle the connectivity, we can create a shell script.

#!/bin/bash

if [[ $(pgrep -x Zscaler) ]]; then
    echo "Disconnecting from Zscaler"
    sudo find /Library/LaunchAgents -name '*zscaler*' -exec launchctl unload {} \;
    sudo find /Library/LaunchDaemons -name '*zscaler*' -exec launchctl unload {} \;
else
    echo "Connecting to Zscaler"
    /usr/bin/open -a /Applications/Zscaler/Zscaler.app --hide
    sudo find /Library/LaunchAgents -name '*zscaler*' -exec launchctl load {} \;
    sudo find /Library/LaunchDaemons -name '*zscaler*' -exec launchctl load {} \;
fi

Raycast is an alternative to the default Spotlight search on macOS. We can create a Raycast script to toggle Zscaler connectivity.

#!/bin/bash

# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title toggle zscaler
# @raycast.mode silent

# Optional parameters:
# @raycast.icon ☁️

# Documentation:
# @raycast.author chillaranand
# @raycast.authorURL https://avilpage.com/

if [[ $(pgrep -x Zscaler) ]]; then
    echo "Disconnecting from Zscaler"
    sudo find /Library/LaunchAgents -name '*zscaler*' -exec launchctl unload {} \;
    sudo find /Library/LaunchDaemons -name '*zscaler*' -exec launchctl unload {} \;
else
    echo "Connecting to Zscaler"
    /usr/bin/open -a /Applications/Zscaler/Zscaler.app --hide
    sudo find /Library/LaunchAgents -name '*zscaler*' -exec launchctl load {} \;
    sudo find /Library/LaunchDaemons -name '*zscaler*' -exec launchctl load {} \;
fi

Save this script to a folder. From Raycast Settings -> Extensions -> Add Script Directory, we can select this folder, and the script will be available in Raycast.

raycast-connect-toggle

We can assign a shortcut key to the script for quick access.


Conclusion

Even though Zscaler does not provide a command-line interface, we can automate the connectivity using the above scripts.

Screen Time Alerts from Activity Watch

Introduction


Activity Watch1 is a cross-platform, open-source time-tracking tool that helps us track time spent on applications and websites.

Activity Watch

At the moment, Activity Watch doesn't have a built-in feature for screen time alerts. In this post, we will see how to show screen time alerts using Activity Watch.

Python Script

Activity Watch provides an API to interact with the Activity Watch server. We can use the API to get the screen time data and show alerts.

import json
import os
from datetime import datetime

import requests


def get_nonafk_events(timeperiods=None):
    headers = {"Content-type": "application/json", "charset": "utf-8"}
    query = """afk_events = query_bucket(find_bucket('aw-watcher-afk_'));
window_events = query_bucket(find_bucket('aw-watcher-window_'));
window_events = filter_period_intersect(window_events, filter_keyvals(afk_events, 'status', ['not-afk']));
RETURN = merge_events_by_keys(window_events, ['app', 'title']);""".split("\n")
    data = {
        "timeperiods": timeperiods,
        "query": query,
    }
    r = requests.post(
        "http://localhost:5600/api/0/query/",
        data=bytes(json.dumps(data), "utf-8"),
        headers=headers,
        params={},
    )
    return json.loads(r.text)[0]


def main():
    now = datetime.now()
    timeperiods = [
        "/".join([now.replace(hour=0, minute=0, second=0).isoformat(), now.isoformat()])
    ]
    events = get_nonafk_events(timeperiods)

    total_time_secs = sum(event["duration"] for event in events)
    total_time_mins = total_time_secs / 60
    print(f"Total time: {total_time_mins:.1f} minutes")
    hours, minutes = divmod(total_time_mins, 60)
    hours, minutes = int(hours), round(minutes)
    print(f"Screen Time: {hours} hours {minutes} minutes")

    # show a macOS notification
    os.system(f"osascript -e 'display notification \"{hours} hours {minutes} minutes\" with title \"Screen Time\"'")


if __name__ == "__main__":
    main()

This script2 will show the screen time alerts using the Activity Watch API. We can run this script using the below command.

$ python screen_time_alerts.py

Screen Time Alerts

We can set up a cron job to run this script every hour to show screen time alerts. Since cron runs with a minimal environment, it is better to use absolute paths for both the interpreter and the script.

$ crontab -e
0 * * * * /usr/bin/python3 /path/to/screen_time_alerts.py

We can also modify the script to show alerts only when the screen time exceeds a certain limit.
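For example, a small guard like the one below would keep the notification silent until the limit is crossed. The function name and the 4-hour default are my own choices, not part of the Activity Watch API:

```python
def exceeds_limit(total_minutes, limit_minutes=240):
    """Return True once screen time has crossed the configured limit."""
    return total_minutes >= limit_minutes


# In main(), wrap the osascript call:
#   if exceeds_limit(total_time_mins):
#       os.system(...)
```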

Conclusion

Since Activity Watch is open-source and provides an API, we can extend its functionality to show screen time alerts. We can also use the API to create custom reports and dashboards.