Cross Platform File Explorer in 50 lines of code

In an earlier post, I wrote about why I need a "line count" column in a file explorer and how I wrote a Lua script to show it in the xplr file manager.

xplr has only a terminal interface, which makes it hard for non-developers to use. I wanted a small team to use this feature, as it would save them several hours. So I decided to write a cross-platform GUI app.

GUI app

Since I am familiar with PySimpleGUI, I decided to write a simple file explorer using it.

Cross Platform File Explorer

As seen in the above screenshot, the file explorer has a "Line Count" column. It is a simple Python script with ~50 lines of code.

The project is open source and source code is available at github.com/AvilPage/LCFileExplorer.
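The core of such a "Line Count" column can be sketched with the standard library alone. This is a minimal sketch, not the actual LCFileExplorer code: the function names `count_lines` and `list_with_line_counts` are hypothetical, and it assumes newline characters are counted the way `wc -l` counts them.

```python
import os


def count_lines(path):
    """Count newline characters in a file, reading in binary chunks (like wc -l)."""
    count = 0
    with open(path, "rb") as f:
        # Read 1 MiB at a time so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            count += chunk.count(b"\n")
    return count


def list_with_line_counts(directory):
    """Return [name, line_count] rows; directories and unreadable files get '---'."""
    rows = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_file():
            try:
                rows.append([entry.name, str(count_lines(entry.path))])
            except OSError:
                rows.append([entry.name, "---"])
        else:
            rows.append([entry.name, "---"])
    return rows
```

Rows in this shape (a list of `[name, line_count]` lists) are the kind of data a GUI table widget typically takes as its values.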

Cross Platform

A new user can't directly run this Python script on their machine unless Python is already installed. Even if Python is installed, they have to install the required packages before running it. This requires technical expertise.

To make it easy for non-tech users to run this program, I decided to use PyInstaller to create a single executable file for each platform.

I created a GitHub action to build the executable files for Windows, Linux, and macOS. The action is triggered on every push to the master branch. It generates an .exe file for Windows, an .AppImage file for Linux, and a .dmg file for macOS. The executable files are uploaded as build artifacts.

Conclusion

It is easy to create a cross-platform GUI app using Python and PySimpleGUI. It is also easy to distribute apps built with Python using PyInstaller.

Running tests in parallel with pytest & xdist

When tests are taking too long to run, an easy way to speed them up is to run them in parallel.

When using pytest as the test runner, the pytest-xdist & pytest-parallel plugins make it easy to run tests concurrently or in parallel.

pytest-parallel works better if tests are independent of each other. If tests are dependent on each other, pytest-xdist is a better choice.

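If there are parameterised tests whose parameters are generated in a non-deterministic order (for example, from a set), pytest-xdist will fail at collection, as each worker collects the tests in a different order.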

$ pytest -n auto tests/

Different tests were collected between gw0 and gw1. The difference is: ...

To fix this, we have to make sure that the parameterised tests are executed in the same order on all workers. It can be achieved by sorting the parameterised tests by their name.
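One way to get a stable order is to sort the parameter values before handing them to `parametrize`. The sketch below is illustrative only: `CITIES` and `stable_params` are hypothetical names, and it assumes the parameters come from a set of strings, whose iteration order can differ between worker processes.

```python
# Hypothetical parameter source: a set of strings has no guaranteed
# iteration order across separate Python processes.
CITIES = {"delhi", "mumbai", "chennai"}


def stable_params(values):
    """Return the values sorted by their repr so every worker sees the same order."""
    return sorted(values, key=repr)


# In a test module this would be used as:
#
#   @pytest.mark.parametrize("city", stable_params(CITIES))
#   def test_city(city):
#       ...
```

Sorting by `repr` is a design choice that also works for mixed-type parameters, which cannot always be compared with each other directly.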

Alternatively, we can use the pytest-randomly plugin to order the tests.

Remap F4 to Raycast, Alfred (cmd + space)

On a Mac keyboard, there is an F4 key which opens Spotlight by default. I use Raycast a lot instead of Spotlight and wanted to remap F4 to Raycast.

There is an app called Karabiner-Elements which can be used to remap keys. After the app is installed, we can use this rule called Map F4 to cmd+space.

You can import the rule from the above URL directly. Once the rule is imported & enabled, F4 will be remapped to cmd + space as shown in the video below.

Add "Line Count" Column in File Manager

While monitoring an ETL pipeline, I browse a lot of files and often need to know how many lines there are in a file. For that, I can switch to that directory in a terminal and run wc -l on the file.

To avoid the hassle of switching to the directory and running a command in the terminal, I wrote a simple Lua script to show a line count column in the xplr file manager.

Failed Attempts

Initially I set out to write a Finder plugin to show the line count column. But I couldn't find a way to get the line count of a file in a Finder plugin. I explored other GUI file managers, but none of them had a way to show custom columns with line count.

Finally, I stumbled upon xplr, a TUI file manager, and it was a breeze to write a Lua script to show the line count column.

xplr - line count

xplr can be installed via brew.

$ brew install xplr

$ xplr --version
xplr 0.21.3

xplr reads the default configuration from ~/.config/xplr/init.lua. The following configuration shows the line count column in xplr.

version = '0.21.3'

xplr.fn.custom.fmt_simple_column = function(m)
  return m.prefix .. m.relative_path .. m.suffix
end

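xplr.fn.custom.row_count = function(node)
  -- Directories and other non-file nodes have no line count.
  if not node.is_file then
    return "---"
  end

  local file = io.open(node.absolute_path, "r")
  if not file then
    -- Unreadable file: return a placeholder instead of nil.
    return "---"
  end

  local row_count = 0
  for _ in file:lines() do
    row_count = row_count + 1
  end
  file:close()
  return tostring(row_count)
end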


xplr.config.general.table.header.cols = {
  { format = "  path" },
  { format = "line_count" },
}

xplr.config.general.table.row.cols = {
  { format = "custom.fmt_simple_column" },
  { format = "custom.row_count" },
}

xplr.config.general.table.col_widths = {
  { Percentage = 30 },
  { Percentage = 20 },
}

This will show the line count column on launch.

xplr - line count

Conclusion

xplr is a very powerful file manager, and it is very easy to write Lua scripts to create custom columns. I couldn't find a way to sort items based on the custom column. I need to explore that further.

Guide to setting up GeoDjango on Mac M1

There are a lot of guides on setting up GeoDjango and PostGIS. But most of them are outdated and don't work on Mac M1. In this article, let us look at how to set up GeoDjango on Mac M1/M2.

Ensure you have already installed Postgres on your Mac.

Install GeoDjango

The default GDAL version available on brew fails to install on Mac M1.

$ brew install gdal
==> cmake --build build
Last 15 lines from /Users/chillaranand/Library/Logs/Homebrew/gdal/02.cmake:
    [javac] Compiling 82 source files to /tmp/gdal-20231029-31808-1wl9085/gdal-3.7.2/build/swig/java/build/classes
    [javac] warning: [options] bootstrap class path not set in conjunction with -source 7
    [javac] error: Source option 7 is no longer supported. Use 8 or later.
    [javac] error: Target option 7 is no longer supported. Use 8 or later.

BUILD FAILED
/tmp/gdal-20231029-31808-1wl9085/gdal-3.7.2/swig/java/build.xml:25: Compile failed; see the compiler error output for details.

Total time: 0 seconds
gmake[2]: *** [swig/java/CMakeFiles/java_binding.dir/build.make:108: swig/java/gdal.jar] Error 1
gmake[2]: Leaving directory '/private/tmp/gdal-20231029-31808-1wl9085/gdal-3.7.2/build'
gmake[1]: *** [CMakeFiles/Makefile2:9108: swig/java/CMakeFiles/java_binding.dir/all] Error 2
gmake[1]: Leaving directory '/private/tmp/gdal-20231029-31808-1wl9085/gdal-3.7.2/build'
gmake: *** [Makefile:139: all] Error 2

We can use conda to install gdal. Create a new environment and install gdal in it.

$ conda create -n geodjango python=3.9
$ conda activate geodjango
$ conda install -c conda-forge gdal
$ pip install django
$ pip install psycopg2-binary

Once installed, you can check the version using gdalinfo --version.

Remaining dependencies can be installed via brew.

$ brew install postgresql
$ brew install postgis
$ brew install libgeoip

Let's create a new django project and add spatial backends.

$ django-admin startproject geodjango

Add django.contrib.gis to INSTALLED_APPS in settings.py.

INSTALLED_APPS = [
    ...,
    'django.contrib.gis',
]

Add the following to DATABASES in settings.py.

DATABASES['default']['ENGINE'] = 'django.contrib.gis.db.backends.postgis'

Since we used conda to install gdal, we need to set the path to gdal in our django settings. Run locate libgdal.dylib to find the path to gdal.

GDAL_LIBRARY_PATH = '/opt/homebrew/anaconda3/envs/geodjango/lib/libgdal.dylib'

Similarly, we need to set GEOS_LIBRARY_PATH as well.

GEOS_LIBRARY_PATH = '/opt/homebrew/anaconda3/envs/geodjango/lib/libgeos_c.dylib'
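Instead of hard-coding these paths, they can be derived from the active conda environment. The following is a minimal sketch under assumptions: `conda_lib_path` is a hypothetical helper, and it relies on `conda activate` having set the `CONDA_PREFIX` environment variable and on the libraries living in the environment's `lib` directory, as in the paths above.

```python
import os
from pathlib import Path


def conda_lib_path(lib_name, prefix=None):
    """Return the path to lib_name inside the conda env's lib directory, or None."""
    # CONDA_PREFIX is set by `conda activate`; prefix can override it explicitly.
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    candidate = Path(prefix) / "lib" / lib_name
    return str(candidate) if candidate.exists() else None


# In settings.py (None is returned when the library is not found in the env).
GDAL_LIBRARY_PATH = conda_lib_path("libgdal.dylib")
GEOS_LIBRARY_PATH = conda_lib_path("libgeos_c.dylib")
```

This keeps settings.py portable across machines where the conda installation lives in a different directory.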

Now, we can create a new app and add PointField or any other spatial fields to our models.

$ python manage.py startapp places

# places/models.py
from django.contrib.gis.db import models

class Place(models.Model):
    name = models.CharField(max_length=100)
    location = models.PointField()

Conclusion

In this article, we looked at how to set up GeoDjango on Mac M1. We used conda to install gdal and brew to install other dependencies.

tailscale: Remote SSH Access to Pi or Any Device

I recently started using a Raspberry Pi and I wanted to access it when I am away from home as well. After trying out a few solutions, I stumbled upon Tailscale.

Tailscale is a mesh VPN that makes it easy to connect our devices, wherever they are. It is free for personal use and supports all major platforms like Linux, Windows, Mac, Android, iOS, etc.

Installation

I installed tailscale on Raspberry Pi using the following command.

$ curl -fsSL https://tailscale.com/install.sh | sh

Setup

Once the installation was done, I ran tailscale up to start the daemon. This opened a browser window and asked me to log in with my email address. After logging in, I could see all my devices in the tailscale dashboard.

tailscale dashboard

tailscale has a CLI tool as well, and the status can be viewed with the following command.

$ tailscale status
100.81.13.75   m1                    avilpage@  macOS   -
100.12.12.92   rpi1.tailscale.ts.net avilpage@  linux   offline

I also set up a cron job to run tailscale up on boot.

$ crontab -e
@reboot tailscale up

Access

Now I can access the device from anywhere using its tailscale IP address. For example, if the IP address is 100.12.12.92, I can ssh into the device using the following command.

$ ssh pi@100.12.12.92

It also provides DNS names for each device. For example, I can ssh into the device using its DNS name as well.

$ ssh pi@rpi1.tailscale.ts.net

Conclusion

Tailscale is a great tool to access devices remotely. It is easy to set up and works well with Raspberry Pi, Mac & Linux.

Create Telegram Bot To Post Messages to Group

Introduction

Recently I had to create a Telegram bot again to post updates to a group based on IoT events. This post is just a reference for the future.

Create a Telegram Bot

First, create a bot using BotFather in the Telegram app and get the API token. Then, create a group and add the bot to the group. This will give the bot access to the group.

Post Messages to the Group

Now, we need to fetch the group id. For this, we can use the following curl API call.

curl is available by default on Mac and Linux terminals. On Windows, we can use curl from command prompt.

$ curl -X GET https://api.telegram.org/bot<API_TOKEN>/getUpdates

{
  "ok": true,
  "result": [
    {
      "update_id": 733724271,
      "message": {
        "message_id": 9,
        "from": {
          "id": 1122,
          "is_bot": false,
          "username": "ChillarAnand",
          "language_code": "en"
        },
        "chat": {
          "id": -114522,
          "title": "DailyPythonTips",
          "type": "group",
          "all_members_are_administrators": true
        },
        "date": 1694045795,
        "text": "@DailyPythonTipsBot hi",
        "entities": [
          {
            "offset": 0,
            "length": 19,
            "type": "mention"
          }
        ]
      }
    }
  ]
}

This will return a JSON response with the group id (the chat.id field). It returns an empty result if there are no recent conversations.

In that case, send a dummy message to the bot in the group and try again. It should return the group id in the response.
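Picking the group id out of the response by eye gets tedious when there are several updates. Here is a small sketch that does it in Python, assuming the response has the shape shown above; `group_chat_ids` is a hypothetical helper name, and the sample is a trimmed copy of the response above.

```python
import json


def group_chat_ids(updates):
    """Map chat id -> title for every group chat seen in a getUpdates response."""
    chats = {}
    for update in updates.get("result", []):
        chat = update.get("message", {}).get("chat", {})
        # Telegram reports groups with the chat type "group" or "supergroup".
        if chat.get("type") in ("group", "supergroup"):
            chats[chat["id"]] = chat.get("title", "")
    return chats


sample = json.loads(
    '{"ok": true, "result": [{"update_id": 733724271, "message": '
    '{"chat": {"id": -114522, "title": "DailyPythonTips", "type": "group"}}}]}'
)
print(group_chat_ids(sample))  # {-114522: 'DailyPythonTips'}
```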

We can use this group id to post messages to the group.

$ curl -X POST https://api.telegram.org/bot<API_TOKEN>/sendMessage -d "chat_id=<GROUP_ID>&text=Hello"

{
  "ok": true,
  "result": {
    "message_id": 12,
    "from": {
      "id": 3349238234,
      "is_bot": true,
      "first_name": "DailyPythonTipsBot",
      "username": "DailyPythonTipsBot"
    },
    "chat": {
      "id": -114522,
      "title": "DailyPythonTips",
      "type": "group",
      "all_members_are_administrators": true
    },
    "date": 1694046381,
    "text": "Hello"
  }
}

Here is the message posted by the bot in the group.

Telegram Bot for IoT Updates

Now, we can use this API to post messages to the group from our IoT devices or from any other devices where curl command is available.
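On devices where Python is available but curl is not, the same sendMessage call can be made with the standard library. This is a minimal sketch, not a full client: `build_send_request` and `send_message` are hypothetical names, and the token and chat id must be supplied by the caller.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def build_send_request(token, chat_id, text):
    """Build the POST request for sendMessage without sending it."""
    url = f"{API_ROOT}/bot{token}/sendMessage"
    # Form-encode the body, matching the curl -d "chat_id=...&text=..." call.
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=data, method="POST")


def send_message(token, chat_id, text, timeout=10):
    """Send the message and return the decoded JSON response."""
    request = build_send_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request building from sending keeps the network call in one place, so the request can be inspected without contacting Telegram.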

Periodically Launch an App in Background

I recently started using the Outlook app on my Mac. If the app is closed, it won't send any notifications. So when I accidentally close the app, I don't get any notifications until I re-open it.

I want to ensure that it starts periodically so that I don't miss any notifications for meetings.

After trying out various methods, I ended up using the open command with cron to launch the app every 15 minutes.

$ crontab -e
*/15 * * * * /usr/bin/open -a "Microsoft Outlook"

This will launch the app every 15 minutes. However, it is inconvenient, as it brings Outlook to the foreground every time.

To avoid this, I passed the -g option to open it in the background.

$ crontab -e
*/15 * * * * /usr/bin/open -g -a "Microsoft Outlook"

This silently launches the app in the background without causing any disturbance. Since the app is running in the background, it will send notifications for any meetings.

This will ensure that I don't miss any meetings, even if I close Outlook accidentally.

Rearrange CSV columns alphabetically from CLI

We can use tools like KDiff3 to compare two CSV files. But, it is difficult to identify the diff when the columns are not in the same order.

For example, look at the diff of 2 simple CSV files below.

kdiff3-csv-compare

Even though it highlights the diff, it is difficult to identify the diff because the columns are not in the same order. Here is the same diff after rearranging the columns alphabetically.

kdiff3-csv-compare-sorted

Now, it is easy to identify the diff.

Rearrange CSV columns alphabetically

We can write a simple Python script using pandas as follows.

#! /usr/bin/env python3

"""
re-arrange columns in alphabetical order
"""
import sys

import pandas as pd


def colsort(df):
    cols = list(df.columns)
    cols.sort()
    return df[cols]


def main():
    input_file = sys.argv[1]
    try:
        output_file = sys.argv[2]
    except IndexError:
        output_file = input_file
    df = pd.read_csv(input_file)
    df = colsort(df)
    df.to_csv(output_file, index=False)


if __name__ == '__main__':
    main()

We can use this script as follows.

$ python3 rearrange_csv_columns.py input.csv output.csv
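If pandas is not installed, the same rearrangement can be done with the standard library csv module. This is a sketch of a swapped-in approach, not the script above: `sort_csv_columns` is a hypothetical name, and it assumes every row has exactly the columns named in the header.

```python
import csv
import io


def sort_csv_columns(src, dst):
    """Copy CSV rows from src to dst with the columns in alphabetical order."""
    reader = csv.DictReader(src)
    fieldnames = sorted(reader.fieldnames or [])
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Each row is a dict keyed by column name, so the writer reorders the values.
    writer.writerows(reader)


src = io.StringIO("name,age,city\nasha,30,pune\n")
dst = io.StringIO()
sort_csv_columns(src, dst)
print(dst.getvalue())
```

Unlike the pandas version, this streams rows one at a time and leaves every value as the original string, so nothing is re-parsed as a number or date.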

Instead of writing a script ourselves, we can use the Miller tool. Miller can perform various operations on CSV files. We can use sort-within-records to sort the columns.

$ mlr --csv sort-within-records input.csv > output.csv

Conclusion

We can use miller to sort the columns in a CSV file. This will help us to identify the diff easily when comparing two CSV files.

Train LLMs with Custom Dataset on Laptop

Problem Statement

I want to train a Large Language Model (LLM) with some private documents and query various details.

Journey

There are open-source LLMs like Vicuna, LLaMa, etc. which can be trained on custom data. However, training these models on custom data is not a trivial task.

After trying out various methods, I ended up using privateGPT, which is quite easy to train on custom documents. There is no need to format or clean up the data, as privateGPT can directly consume documents in many formats like txt, html, epub, pdf, etc.

Training

First, let's clone the repo, install requirements.txt and download the default model.

$ git clone https://github.com/imartinez/privateGPT
$ cd privateGPT
$ pip3 install -r requirements.txt
$ wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin

$ cp example.env .env
$ cat .env
MODEL_TYPE=GPT4All
MODEL_PATH=ggml-gpt4all-j-v1.3-groovy.bin

I have sourced all the documents and kept them in a folder called docs. Let's ingest (train on) the data.

$ cp ~/docs/* source_documents

$ python ingest.py

This will take a while depending on the number of documents we have. Once the ingestion is done, we can start querying the model.

$ python privateGPT.py
Enter a query: Summarise about Gaaliveedu

The default GPT4All-J v1.3-groovy model doesn't provide good results. We can easily swap it with LlamaCpp. Let's download the model and convert it.

$ git clone https://huggingface.co/openlm-research/open_llama_13b

$ git clone https://github.com/ggerganov/llama.cpp.git
$ cd llama.cpp
$ python convert.py ../open_llama_13b
Wrote ../open_llama_13b/ggml-model-f16.bin

We can now update the .env file to use the new model and start querying again.

$ cat .env
MODEL_TYPE=LlamaCpp
MODEL_PATH=/path/to/ggml-model-f16.bin

$ python privateGPT.py
Enter a query: Summarise about Gaaliveedu

Conclusion

This makes it easy to build domain-specific LLMs and use them for various tasks. I have used this to build a chatbot for my internal docs and it is working well.