Periodically Launch an App in Background

I recently started using the Outlook app on my Mac. When the app is closed, it doesn't send any notifications, so if I accidentally quit it, I miss all notifications until I re-open it.

I want to make sure it is relaunched periodically so that I don't miss any meeting notifications.

After trying out various methods, I ended up using the open command with cron to launch the app every 15 minutes.

$ crontab -e
*/15 * * * * /usr/bin/open -a "Microsoft Outlook"

This launches the app every 15 minutes. However, it is inconvenient because it brings Outlook to the foreground every 15 minutes.

To avoid this, I passed the -g option to launch the app in the background.

$ crontab -e
*/15 * * * * /usr/bin/open -g -a "Microsoft Outlook"

This silently launches the app in the background without causing any disturbance. Since the app is running in the background, it sends notifications for any meetings.
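
We can verify that the entry is saved by listing the current crontab:

$ crontab -l
*/15 * * * * /usr/bin/open -g -a "Microsoft Outlook"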

This ensures that I don't miss any meetings, even if I close Outlook accidentally.

Rearrange CSV columns alphabetically from CLI

We can use tools like KDiff3 to compare two CSV files. However, it is difficult to make sense of the diff when the columns are not in the same order.

For example, look at the comparison below of two simple CSV files.

kdiff3-csv-compare

Even though KDiff3 highlights the differences, they are hard to interpret because the columns are not in the same order. Here is the same comparison after rearranging the columns alphabetically.

kdiff3-csv-compare-sorted

Now, it is easy to identify the diff.

Rearrange CSV columns alphabetically

We can write a simple Python script using pandas1 as follows.

#! /usr/bin/env python3

"""
re-arrange columns in alphabetical order
"""
import sys

import pandas as pd


def colsort(df):
    cols = list(df.columns)
    cols.sort()
    return df[cols]


def main():
    input_file = sys.argv[1]
    try:
        output_file = sys.argv[2]
    except IndexError:
        output_file = input_file
    df = pd.read_csv(input_file)
    df = colsort(df)
    df.to_csv(output_file, index=False)


if __name__ == '__main__':
    main()

We can use this script as follows.

$ python3 rearrange_csv_columns.py input.csv output.csv
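
For example, given a hypothetical input.csv whose header row is name,id,city, the script writes the same data with the columns rearranged alphabetically:

$ head -1 input.csv
name,id,city
$ python3 rearrange_csv_columns.py input.csv output.csv
$ head -1 output.csv
city,id,name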

Instead of writing a script ourselves, we can use the miller2 tool. Miller can perform various operations on CSV files, and its sort-within-records verb sorts the keys (columns) within each record.

$ mlr --csv sort-within-records input.csv > output.csv

Conclusion

We can use miller to sort the columns in a CSV file. This helps us identify the differences easily when comparing two CSV files.

Train LLMs with Custom Dataset on Laptop

Problem Statement

I want to train a Large Language Model (LLM)1 with some private documents and query various details.

Journey

There are open-source LLMs like Vicuna, LLaMA, etc., which can be trained on custom data. However, training these models on custom data is not a trivial task.

After trying out various methods, I ended up using privateGPT2, which is quite easy to train on custom documents. There is no need to format or clean up the data, as privateGPT can directly consume documents in many formats like txt, html, epub, pdf, etc.

Training

First, let's clone the repo, install the requirements, and download the default model.

$ git clone https://github.com/imartinez/privateGPT
$ cd privateGPT
$ pip3 install -r requirements.txt
$ wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin

$ cp example.env .env
$ cat .env
MODEL_TYPE=GPT4All
MODEL_PATH=ggml-gpt4all-j-v1.3-groovy.bin

I have sourced all the documents and kept them in a folder called docs. Let's ingest (train) the data.

$ cp ~/docs/* source_documents

$ python ingest.py

This will take a while depending on the number of documents we have. Once the ingestion is done, we can start querying the model.

$ python privateGPT.py
Enter a query: Summarise about Gaaliveedu

The default GPT4All-J v1.3-groovy3 model doesn't provide good results. We can easily swap it with LlamaCpp4. Let's download the model and convert it.

$ git clone https://huggingface.co/openlm-research/open_llama_13b

$ git clone https://github.com/ggerganov/llama.cpp.git
$ cd llama.cpp
$ python convert.py ../open_llama_13b
Wrote ../open_llama_13b/ggml-model-f16.bin
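
The resulting f16 model is quite large. Optionally, we can shrink it with llama.cpp's quantize tool before using it (a minimal sketch, assuming the quantize binary has been built with make; if we quantize, MODEL_PATH below should point to the quantized file instead):

$ make
$ ./quantize ../open_llama_13b/ggml-model-f16.bin ../open_llama_13b/ggml-model-q4_0.bin q4_0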

We can now update the .env file to use the new model and start querying again.

$ cat .env
MODEL_TYPE=LlamaCpp
MODEL_PATH=/path/to/ggml-model-f16.bin

$ python privateGPT.py
Enter a query: Summarise about Gaaliveedu

Conclusion

This makes it easy to build domain-specific LLMs and use them for various tasks. I have used this to build a chatbot for my internal docs and it is working well.

Remote Debug Docker Container with PyCharm

Problem Statement

How do we use PyCharm to debug a Python application running inside a Docker container that is launched by a third-party process?

Solution

  • Install the pydevd-pycharm package in the Docker image.
RUN pip install 'pydevd-pycharm~=222.4554.11'
  • Add the following lines to the Python script that you want to debug.
import pydevd_pycharm
pydevd_pycharm.settrace('host.docker.internal', port=12345, stdoutToServer=True, stderrToServer=True)
  • Create a new Python Remote Debug configuration in PyCharm with the following settings.

PyCharm Remote Debug Configuration

  • Run the Remote Debug configuration in PyCharm.

  • Run the Docker container with the following commands, or let the third-party process (a shell script or another package) launch it.

$ docker build . -t flask_web
$ docker run --rm flask_web

Explanation

The pydevd-pycharm package is the PyCharm remote debugger for Python, usable for debugging an application running inside a Docker container. The pydevd_pycharm.settrace() call connects the application to the PyCharm IDE. host.docker.internal is the hostname of the host machine from inside the Docker container, and port is the port on which the PyCharm debug server listens. stdoutToServer and stderrToServer redirect the standard output and standard error to the PyCharm IDE.
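
For reference, here is a minimal sketch of an instrumented script (a hypothetical app.py for a Flask application; the route is a placeholder, and the settrace parameters are the ones used above):

# app.py - hypothetical Flask app instrumented for remote debugging
from flask import Flask

import pydevd_pycharm

# Connect to the PyCharm debug server on the host before the app starts.
# host.docker.internal resolves to the host machine from inside the container.
pydevd_pycharm.settrace('host.docker.internal', port=12345,
                        stdoutToServer=True, stderrToServer=True)

app = Flask(__name__)


@app.route('/')
def index():
    return 'hello'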

Gotchas

  • You might face the following error depending on the version of the pydevd-pycharm package.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 218, in locate_app
    __import__(module_name)
  File "/app/app.py", line 5, in <module>
    import pydevd_pycharm
  File "/usr/local/lib/python3.10/site-packages/pydevd_pycharm.py", line 3, in <module>
    from pydevd import settrace
  File "/usr/local/lib/python3.10/site-packages/pydevd.py", line 41, in <module>
    from _pydevd_bundle import pydevd_utils
  File "/usr/local/lib/python3.10/site-packages/_pydevd_bundle/pydevd_utils.py", line 24, in <module>
    from _pydevd_asyncio_util.pydevd_asyncio_utils import eval_async_expression_in_context
ModuleNotFoundError: No module named '_pydevd_asyncio_util'

There seems to be an issue with all 223.*.* versions. The solution is to use the 222.*.* version.

  • You might face a ConnectionRefused error when running the Docker container.
  File "/usr/local/lib/python3.10/site-packages/pydevd.py", line 1758, in _locked_settrace
    debugger.connect(host, port)  # Note: connect can raise error.
  File "/usr/local/lib/python3.10/site-packages/pydevd.py", line 660, in connect
    s = start_client(host, port)
  File "/usr/local/lib/python3.10/site-packages/_pydevd_bundle/pydevd_comm.py", line 463, in start_client
    s.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused

Ensure that you have started the Remote Debug configuration in PyCharm before running the docker container.
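
Also note that host.docker.internal resolves automatically on Docker Desktop for Mac and Windows. On a plain Linux host, the hostname may need to be mapped explicitly (an assumption based on the --add-host flag available in Docker 20.10+):

$ docker run --rm --add-host=host.docker.internal:host-gateway flask_web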

Mastering HammerSpoon - Excel Automation

Introduction

Recently, I have been using Excel a lot. When opening a new Excel file, I have to do the following:

  1. Maximize the window
  2. Select all columns and fit them to their width
  3. Apply filters to all columns
  4. Freeze the first row

When opening and closing multiple Excel files, this becomes a tedious task. So, I decided to automate this and came across Hammerspoon.

HammerSpoon

Hammerspoon1 is a powerful automation tool for macOS. It allows you to write Lua scripts to automate various tasks and create custom keybindings.

First, let's install Hammerspoon using Homebrew.

$ brew install hammerspoon

We can write our automation script in ~/.hammerspoon/init.lua file. Let us see how we can automate the above tasks.

Automating Excel

-- excel: maximize the window and apply default formatting via menu items
function excel(appObject)
   local win = hs.window.focusedWindow()
   if (not win) then
      return
   end
   win:maximize()

   appObject:selectMenuItem({"Edit", "Select all"})
   appObject:selectMenuItem({"Format", "Column", "Autofit Selection"})
   appObject:selectMenuItem({"Data", "Auto-filter"})
end


-- watch for application events and format Excel when it is launched or activated
function applicationWatcher(appName, eventType, appObject)
   local w = hs.application.watcher
   if (eventType == w.activated or eventType == w.launched) then
      if (appName == "Microsoft Excel") then
         excel(appObject)
      end
   end
end

watcher = hs.application.watcher.new(applicationWatcher)
watcher:start()

This script will watch for application events and when Excel is launched or activated, it will call the excel function.

The excel function maximizes the window, selects all columns, fits them to their width, and applies filters to all columns.

The Freeze Top Row option is not available in the standard menu, so I have added it to the Quick Access Toolbar and click it via a mouse event.

Conclusion

Hammerspoon is a powerful tool for various automation tasks. In addition, it can replace a lot of utility apps like CheatSheet, BlueSnooze2, Rectangle, ShiftIt3, HotKey, etc. I have replaced most of these utility apps with Hammerspoon, and it is working great. I will write about it in detail in upcoming posts.

Record Resource Usage of Single Process

Introduction

On Linux and Mac, we can use the built-in top command-line tool to monitor the resource usage of a single process in real time.

# On Linux, for a given pid
$ top -p 1234

# On Mac, for a given pid
$ top -pid 1234

In this article, we will see how to record and plot resource usage of a single process using top and a Python package called psrecord1.

Record Resource Usage

In some cases, we need to record the resource usage of a process to use it later. For example, we can use this data to find out the peak resource usage of a process. For this, we can use top to log resource usage into a text file.

# On Linux, for a given pid
$ top -p 1234 -b -d 1 > top.log

# On Mac, for a given pid
$ top -l 0 -s 1 -pid 32515 | awk 'NR%13==0; fflush(stdout)' > top.log

Once we have the log file, we can view the raw data or we can plot the resource usage by using tools like gnuplot or matplotlib.

Instead of using the top command, we can use psrecord to record the resource usage of a process. psrecord is a Python package that can be installed using pip.

$ python -m pip install psrecord

Once installed, we can use psrecord to record the resource usage of a process.

# record resource usage of a process with pid 1234
$ psrecord 1234 --log top.log

# start and record resource usage of a process
$ psrecord python script.py --plot graph.png

We can view the raw data in the log file.

# view raw data
$ head top.log
# Elapsed time   CPU (%)     Real (MB)   Virtual (MB)
       0.000        0.000        5.000   399461.438
       0.000       93.700        5.000   399461.438
       0.000       96.300        5.000   399461.438
       0.000       91.900        5.000   399461.438

Here is the generated graph.

single-proc-resource
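
If we want a custom plot instead of the one generated by psrecord, we can load the log ourselves. Here is a minimal sketch, assuming the whitespace-separated column layout shown above:

# plot_psrecord.py - plot CPU and memory usage from a psrecord log
import numpy as np
import matplotlib.pyplot as plt

# columns: elapsed time (s), CPU (%), real memory (MB), virtual memory (MB);
# the "# Elapsed time ..." header line is skipped automatically as a comment
data = np.loadtxt('top.log')
elapsed, cpu, real_mb = data[:, 0], data[:, 1], data[:, 2]

fig, ax1 = plt.subplots()
ax1.plot(elapsed, cpu, color='tab:blue')
ax1.set_xlabel('Elapsed time (s)')
ax1.set_ylabel('CPU (%)')

ax2 = ax1.twinx()
ax2.plot(elapsed, real_mb, color='tab:orange')
ax2.set_ylabel('Real memory (MB)')

fig.savefig('graph.png')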

Conclusion

In this article, we have seen how to record and plot the resource usage of a single process using top (a built-in tool) and psrecord (a third-party package).

Reducing System Load With ChatGPT

Problem Statement

I am using an M1 MacBook Air for Python development. Since the M1 uses the ARM architecture, many Python packages don't have wheels for ARM64/aarch64. confluent-kafka-python is one of them.

I had to run an AMD64 Docker container to use confluent-kafka-python. Since it is a cross-architecture (emulated) container, its CPU usage was too high and its performance was too slow.

Solution

To reduce the system load, I decided to build aarch64 wheels for confluent-kafka-python. I looked at the open issues on GitHub and asked the maintainers how to build aarch64 wheels. There was no response1 from them.

As a workaround, I asked ChatGPT2 how to build confluent-kafka-python aarch64 wheels in a Docker container.

chatgpt-reduce-system-load

This initial suggestion didn't work as confluent-kafka-python depends on librdkafka which is a C library. I had to build librdkafka from source for aarch64 and then build confluent-kafka-python from source.

To build librdkafka from source, I again asked ChatGPT. After making minor changes to the snippet suggested by ChatGPT, I was able to build librdkafka from source for aarch64.

Here is the final snippet:

FROM ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive

RUN apt update && apt install -y \
  wget git curl g++ make postgresql-client \
  nano less shared-mime-info openjdk-17-jre-headless \
  libpq-dev vim tzdata python3 python3-dev

RUN apt install -y python3-pip
RUN python3 -m pip install setuptools

# fetch confluent-kafka-python sources
WORKDIR /
RUN git clone https://github.com/confluentinc/confluent-kafka-python
WORKDIR confluent-kafka-python

# build and install librdkafka for aarch64
# (the build context is assumed to contain the librdkafka sources)
COPY . /app
WORKDIR /app
RUN ./configure --arch=aarch64 --prefix=/usr
RUN make
RUN make install

# build confluent-kafka-python against the freshly built librdkafka
WORKDIR /confluent-kafka-python
RUN python3 setup.py install
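
To verify the build, we can build the image and import the package inside a container (the image tag below is just an example; -w / avoids importing from the source checkout):

$ docker build -t confluent-kafka-aarch64 .
$ docker run --rm -w / confluent-kafka-aarch64 python3 -c "import confluent_kafka; print(confluent_kafka.version())"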

Conclusion

By running native containers, I was able to reduce the system load by ~50%. With ChatGPT, it is easy to build/tweak programs in languages & environments that we are not familiar with.

Automator Quick Action for KDiff3 in Finder

The need for quick action

kdiff31 is a diff & merge tool that compares multiple files/directories and shows the difference line by line and character by character as shown below.

mac-finder-kdiff3

On Windows, when we select multiple files/directories and right-click on them, the context menu shows an option to compare the selected items with kdiff3.

mac-finder-kdiff3-windows

However, on a MacBook, Finder doesn't show this option. In this tutorial, let us see how we can add the same quick action to the right-click menu for files/directories.

Creating Quick Action

Let us open Automator2, create a new file, and select Quick Action.

mac-finder-automator

On the left side select Utilities and then select Run Shell Script.

For Workflow receives current, select files or folders, and in the next dropdown, select Finder.

mac-finder-quick-action

Then select pass input as arguments, and in the script section, add the following command.

/path/to/kdiff3 "$1" "$2"
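
Alternatively, since KDiff3 can compare up to three paths, we could pass along everything that is selected (same placeholder path as above):

/path/to/kdiff3 "$@"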

After adding the command, save this Quick Action.

Now, if we relaunch the Finder app, select multiple directories, and right-click, we can see Compare with KDiff3 in the quick actions.

mac-finder-kdiff3

Conclusion

Even though we can use the command line to compare the files/directories, it is always good to have a quick action in the right-click menu.