Reducing System Load With ChatGPT

Problem Statement

I use an M1 MacBook Air for Python development. Since the M1 uses the ARM architecture, many Python packages don't ship wheels for ARM64/aarch64, and confluent-kafka-python is one of them.

I had to run an AMD64 Docker container to use confluent-kafka-python. Since it is a cross-architecture container, its CPU usage was very high and performance was poor.

Solution

To reduce the system load, I decided to build aarch64 wheels for confluent-kafka-python. I looked at the open issues on GitHub and asked the maintainers how to build aarch64 wheels, but there was no response from them.

As a workaround, I asked ChatGPT how to build confluent-kafka-python aarch64 wheels in a Docker container.

[Image: ChatGPT suggestion for building aarch64 wheels]

This initial suggestion didn't work because confluent-kafka-python depends on librdkafka, which is a C library. I had to build librdkafka from source for aarch64 and then build confluent-kafka-python from source.

To build librdkafka from source, I again asked ChatGPT. After making minor changes to the snippet it suggested, I was able to build librdkafka from source for aarch64.

Here is the final snippet:

FROM ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive

# Build tools and runtime dependencies
RUN apt update && apt install -y \
  wget git curl g++ make postgresql-client \
  nano less shared-mime-info openjdk-17-jre-headless \
  libpq-dev vim tzdata python3 python3-dev

RUN apt install -y python3-pip
RUN python3 -m pip install setuptools

# Fetch the confluent-kafka-python source
WORKDIR /
RUN git clone https://github.com/confluentinc/confluent-kafka-python
WORKDIR confluent-kafka-python

# The build context is expected to contain the librdkafka source;
# configure, build, and install it for aarch64 under /usr
COPY . /app
WORKDIR /app
RUN ./configure --arch=aarch64 --prefix=/usr
RUN make
RUN make install

# Build confluent-kafka-python against the freshly installed librdkafka
WORKDIR /confluent-kafka-python
RUN python3 setup.py install
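
To verify that the image actually produces a working aarch64 build, we can build it and import the package inside the container (the tag ckp-aarch64 below is just an example name):

$ docker build -t ckp-aarch64 .
$ docker run --rm ckp-aarch64 python3 -c "import confluent_kafka; print(confluent_kafka.version(), confluent_kafka.libversion())"

confluent_kafka.version() and confluent_kafka.libversion() report the client and librdkafka versions, so a successful run confirms that both the C library and the Python bindings were built correctly.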

Conclusion

By running native containers, I was able to reduce the system load by ~50%. With ChatGPT, it is easy to build/tweak programs in languages & environments that we are not familiar with.

Automator Quick Action for KDiff3 in Finder

The need for quick action

kdiff3 is a diff & merge tool that compares multiple files/directories and shows the differences line by line and character by character, as shown below.

[Image: KDiff3 comparing files]

On Windows, when we select multiple files/directories and right-click on them, the context menu shows an option to compare the selected items with kdiff3.

[Image: KDiff3 option in the Windows context menu]

However, on a MacBook this option is not available. In this tutorial, let us see how we can create the same quick action in the right-click menu for files/directories.

Creating Quick Action

Let us open Automator, create a new document, and select Quick Action.

[Image: Automator document type selection]

On the left side, select Utilities and then select Run Shell Script.

For Workflow receives current, select files or folders and then select in Finder.

mac-finder-quick-action

Then set Pass input to as arguments, and in the script section add the following command.

/path/to/kdiff3 "$1" "$2"
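
If you prefer something a bit more defensive, here is a variant of the same script, assuming KDiff3 was installed into /Applications (adjust the path to match your installation). It forwards every selected item and keeps paths with spaces intact:

#!/bin/bash
# Compare all selected Finder items with KDiff3.
# "$@" expands to every selected file/folder, preserving spaces in paths.
/Applications/kdiff3.app/Contents/MacOS/kdiff3 "$@"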

After adding the command, save this Quick Action with a name like Compare with KDiff3.

Now, if we relaunch the Finder app, select multiple files or directories, and right-click, we can see Compare with KDiff3 under Quick Actions.

[Image: Compare with KDiff3 in the Finder Quick Actions menu]

Conclusion

Even though we can use the command line to compare the files/directories, it is always good to have a quick action in the right-click menu.

Setup Kubernetes Anywhere with Single Command


Introduction

In an earlier article, we saw how to set up Kubernetes on an M1 Mac. That involved spinning up a VM and installing Kubernetes on it. In this article, we will see how to set up Kubernetes directly on Docker so that we can use the same setup on any operating system.

Prerequisites

Ensure you have Docker installed on your system. If you are on Mac or Windows, you can install Docker Desktop.

k3s/k3d

k3s is a lightweight Kubernetes distribution from Rancher. It is a single binary that can run on any Linux machine, but it does not run natively on Mac or Windows.

k3d is a wrapper around k3s that allows you to run k3s on Docker. It is a great option for running Kubernetes on your local machine.

Installation

k3d can be installed using the following command:

$ brew install k3d  # mac
$ choco install k3d  # windows
$ curl -s https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash # linux
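
To confirm that the binary is available on the PATH:

$ k3d version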

Once it is installed, we can create a cluster using the following command:

$ k3d cluster create demo

This will launch a cluster with a single node. We can also set up a multi-node cluster using the following command:

$ k3d cluster create demo --servers 3 --agents 2

We can verify the cluster is up and running using the following command:

$ kubectl get nodes

We can also use GUI tools like Lens to manage and navigate the cluster. In the video above, we used Lens to create a Jenkins deployment as well.
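
As a quick smoke test from the command line, we can create a throwaway deployment (nginx is used here purely as an example image) and watch its pod come up:

$ kubectl create deployment nginx --image=nginx
$ kubectl get pods -w

When we are done experimenting, the whole cluster can be removed with k3d cluster delete demo.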

Conclusion

In this article, we have seen how to set up Kubernetes on Docker. This is a great option for running Kubernetes on your local machine. We can also use this to run a production setup for small applications.

Using Conda/Mamba with Python Pip on M1 Mac

Introduction

Since 2020, all Apple MacBooks are powered by Apple Silicon (M1) chips. This chip uses the AArch64 (ARM64) architecture, which is different from the x86_64 architecture used by the earlier Intel chips.

Python is a cross-platform language and can run on any platform. However, Python packages with compiled extensions are built for specific platforms. For example, a package compiled for x86_64 will not work on the AArch64 platform. Also, many Python packages are not yet available for the ARM64/AArch64 platform.

M1 Mac and Python

If we want to run a Python package that doesn't have ARM64 support on an M1 Mac, we need to use an emulator (or a cross-architecture Docker image). This significantly slows down the application.

An alternative solution is to build the packages for the ARM64 platform. Building binary packages from source requires a lot of time and effort. Also, we need to build the package for each Python version.

Instead of building from source, we can use Conda/Mamba to install Python packages as well as other system packages. Conda/Mamba will automatically install the correct binary for the package.

For example, the python-confluent-kafka package doesn't have Linux aarch64 wheels. To run it on the aarch64 platform, we would have to build it from source, which takes a lot of time. Instead, we can simply install it using Conda/Mamba with a single command.

$ conda install -c conda-forge python-confluent-kafka

Similar to pip, Conda can also install all the packages mentioned in a file like requirements.txt.

$ conda install --file requirements.txt
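
A typical mixed workflow on an M1 Mac, sketched below with a hypothetical environment named myproject, uses Conda/Mamba for the packages that need native binaries and pip for everything else:

$ conda create -n myproject python=3.10
$ conda activate myproject
$ conda install -c conda-forge python-confluent-kafka
$ pip install -r requirements.txt  # remaining pure-Python dependencies from PyPI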

Conclusion

In the data science ecosystem, Conda/Mamba are widely used as package managers. In the web development ecosystem, they are not as widely used as pip.

Conda/Mamba is a great cross-platform system package manager, but it doesn't have all the Python packages available on PyPI. However, we can use it along with pip for easy package management on an M1 MacBook.

Hot Module Reload In Python With Reloadium

Introduction

Hot module reloading is a feature that allows you to reload a module without restarting the whole application. This is very useful when we are developing or debugging an application and want to see the changes instantaneously.

Reloadium

Reloadium is an advanced hot reloading library for Python.

Instead of writing an article, I thought it would be much easier to show a live demo of Reloadium. In the video below, we can see how Reloadium greatly improves the developer experience.


Currently, Reloadium can be used as a standalone tool. We can install it from PyPI and run any arbitrary Python script with Reloadium.

$ pip install reloadium
$ reloadium run myscript.py

Alternatively, it is available as a plugin for PyCharm as shown in the above video. VS Code support is also in the works.

Reloadium is capable of profiling too. Without writing a single line of code, we can profile Python code. But that's a topic for another article.

Conclusion

I have been using Reloadium for a few months, and it has become an essential part of my development workflow. These days I always run scripts and apps in debug mode with Reloadium directly.

Best Pay After Placement Courses In India

Introduction

In India, a huge number of students graduate every year. Most of them are not able to get a job right after graduation. To get a job in the IT industry, students need to have some technical skills.

There are thousands of institutes in India providing paid technical courses. Depending on the course, the fees can be anywhere between ₹5,000 and ₹5 lakhs. The percentage of students who get a job after completing these courses is extremely low. In addition, quite a few students cannot afford the fees to join these courses.

Pay After Placement Courses

To combat this problem, some institutes are providing pay after placement (PAP) courses. In these courses, students pay the fees only after getting a job with a desired package. This is a win-win situation for both the students and the institutes, and a far better option than paying the fees upfront and not getting a job. These courses are also called income share agreement (ISA) courses.

Here is a list of the top pay after placement courses in India for front-end developers, back-end developers, full-stack developers, data scientists, machine learning engineers, and data engineers.

Site Rank    Institute        Fee (Approx INR)
134,988      Sharpener Tech   68,000
59,928       AccioJob         177,000
321,989      Placewit         0 (up to 10L)
37,294       Masai School     350,000
1,482,058    Digikul          234,000
1,554,412    10xAcademy       295,000
295,708      Function Up      295,000
84,513       AlmaBetter       Not known


Conclusion

Most of these courses have an entrance test that candidates have to clear before joining. However, taking these courses is far better than paying the fees upfront and not getting a job. If you are interested in any of these courses, you can apply for the entrance test and join the course.

Pipe tail output into column

The column command-line utility formats its input into multiple columns and aligns them nicely. It is useful for formatting the output of CSV files or other commands.

$ cat users.csv
id,user,active
1,John Doe,true
2,Will Smith,false

$ column -s, -t < users.csv
id  user        active
1   John Doe    true
2   Will Smith  false
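
It works just as well on the output of other commands; for example, aligning the output of mount:

$ mount | column -t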

The tail command-line utility prints the last 10 lines of a file. With the -f option, it follows the file as it grows.

$ tail -f users.csv
id,user,active
1,John Doe,true
2,Will Smith,false

To format the output of tail -f, we can't pipe it to column directly: column can't produce any output until it has read all of its input, because it needs the complete input to calculate the column widths.

$ tail -f users.csv | column -s, -t

So, the above command won't work.

Since the goal is to follow the output of the file, we can use the watch command instead. watch executes a command periodically and displays its output.

$ watch -n 1 "tail -n 20 users.csv | column -s, -t"

This command will fetch the last 20 lines of the file, pipe it to column command, and display the output. It will repeat the command every 1 second.

As the file grows beyond 20 lines, the header row will scroll out of view. To preserve the header, we can use the head command in addition to tail.

$ watch -n 1 "(head -n1 && tail -n20) < users.csv | column -s, -t"

This command prints the first line of the file and then the last 20 lines. The combined output is piped to column and displayed.

Here is a screenshot of the output of a demo csv.

[Image: tail output piped to column]

This makes it easy to watch the output of a file as it grows.

Change Kafka Log Directory & Format It

Problem Statement

On my local Mac, I was using Kafka to pass messages between various applications. For some reason, when I tried to start Kafka recently, it was failing to start; here are the relevant error logs.

[2022-12-23 11:57:06,217] WARN [Controller 1] writeNoOpRecord: failed with unknown server exception RuntimeException at epoch 139 in 5198 us.  Renouncing leadership and reverting to the last committed offset 927938. (org.apache.kafka.controller.QuorumController)

[2022-12-23 11:57:06,536] ERROR [Controller 1] registerBroker: unable to start processing because of NotControllerException. (org.apache.kafka.controller.QuorumController)

[2022-12-23 12:23:35,834] ERROR [RaftManager nodeId=1] Had an error during log cleaning (org.apache.kafka.raft.KafkaRaftClient)
org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot increment the log start offset to 927939 of partition __cluster_metadata-0 since it is larger than the high watermark 926507
[2022-12-23 12:23:36,035] WARN [Controller 1] writeNoOpRecord: failed with unknown server exception RuntimeException at epoch 294 in 137 us.  Renouncing leadership and reverting to the last committed offset 927938. (org.apache.kafka.controller.QuorumController)
java.lang.RuntimeException: Cant create a new in-memory snapshot at epoch 926507 because there is already a snapshot with epoch 927938

[2022-12-23 12:23:36,252] ERROR Exiting Kafka due to fatal exception during startup. (kafka.Kafka$)

Debugging

I tried to figure out the exact root cause. After multiple failed attempts, I decided to change the log directory temporarily and move on for now.

Solution

I created a new temporary directory and pointed the log directory at it.

$ mkdir /tmp/kafka-logs

# inside server.properties
log.dirs=/tmp/kafka-logs

When I started the Kafka server, it failed.

$ kafka-server-start server.properties

[2022-12-23 12:30:50,018] ERROR Exiting Kafka due to fatal exception (kafka.Kafka$)
org.apache.kafka.common.KafkaException: No `meta.properties` found in /tmp/ (have you run `kafka-storage.sh` to format the directory?)

I ran the kafka-storage script to format the directory. First, we need the cluster id. Since we already know the old kafka-logs directory, we can get the cluster id from its meta.properties file.

$ cat ~/homebrew/var/lib/kraft-combined-logs/meta.properties 
#
#Thu Oct 20 11:48:12 IST 2022
cluster.id=5MB5lq-XT-6JzQqJeIuhWQ
node.id=1
version=1      

Now, we can format the new directory.

$ kafka-storage format --config server.properties --cluster-id 5MB5lq-XT-6JzQqJeIuhWQ

Formatting /tmp/kafka-logs/ with metadata.version 3.3-IV3.

After changing the log directory, Kafka started working.

$ kafka-server-start /path/to/server.properties
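
As a quick sanity check that the broker came up healthy (assuming it listens on the default localhost:9092), we can list the topics:

$ kafka-topics --bootstrap-server localhost:9092 --list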

Since I changed the log directory, all older messages are lost. As I am doing this on my local machine, that is fine. I still need to revisit this to debug the root cause further.

Hands-on RabbitMQ Tutorial

A short hands-on guide to get started with RabbitMQ for people who are in a hurry.

What is RabbitMQ?

[Image: RabbitMQ (Image Credit: CloudAMQP)]

RabbitMQ is an open-source message broker that implements the Advanced Message Queuing Protocol (AMQP). With RabbitMQ, producer and consumer applications can communicate asynchronously and remain completely decoupled.

RabbitMQ Terminology

Producer: A producer is a client that publishes messages to the RabbitMQ broker. Producers write data to exchanges.

Consumer: A consumer is a client that subscribes to queues and processes the messages. Consumers read data from queues.

Queue: A queue is a buffer that stores messages. A queue is bound to an exchange and receives messages from it.

Exchange: An exchange is a message routing agent that receives messages from producers and routes them to queues.

Binding: A binding is a link between an exchange and a queue. It is created with a routing key. The producer sends messages to the exchange with a routing key. The exchange routes the message to the queues that are bound with a matching routing key.

RabbitMQ Setup

We can use the official RabbitMQ docker image to run RabbitMQ locally. We can run the following command to start a RabbitMQ container:

$ docker run --rm --name=rabbitmq -p 15672:15672 -p 5672:5672 rabbitmq:3-management

This image has the RabbitMQ management plugin enabled. We can access the management UI at http://localhost:15672. The default username and password are both guest.

It also has the rabbitmqadmin command-line tool installed, which can be used to manage RabbitMQ.

Passing Messages from UI

We can use the management UI to send and receive messages. We can create a new queue from the Queues section (and, similarly, an exchange from the Exchanges section).

[Image: RabbitMQ Queues section]

Once a queue is created, we can publish and consume messages from that queue.

[Image: publishing a message from the management UI]

Passing Messages from CLI

Instead of using the web UI, we can use the rabbitmqadmin CLI tool to send and receive messages. Let's create a direct exchange and a queue.

$ docker exec rabbitmq rabbitmqadmin declare exchange type=direct name=orders
# => exchange declared
$ docker exec rabbitmq rabbitmqadmin declare queue name=orders
# => queue declared
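
So far the exchange and the queue are not connected. To route messages published to the orders exchange into the orders queue, we can declare a binding between them (a sketch; the routing key orders is just a convention here):

$ docker exec rabbitmq rabbitmqadmin declare binding source=orders destination_type=queue destination=orders routing_key=orders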

Let's publish a message to the queue. Since no exchange is specified, rabbitmqadmin publishes via the default exchange, where the routing key maps directly to the queue name:

$ docker exec rabbitmq rabbitmqadmin publish routing_key=orders payload='dummy message'
# => Message published

To receive messages from the queue, we can use the following command:

$ docker exec rabbitmq rabbitmqadmin get queue=orders

[Image: rabbitmqadmin get output]

Passing Messages from REST API

We can also use the management REST API to send and receive messages. Let's create a new exchange and queue:

$ curl -u guest:guest -X PUT -H "content-type:application/json" -d '{"type":"direct"}' http://localhost:15672/api/exchanges/%2f/orders
$ curl -u guest:guest -X PUT -H "content-type:application/json" -d '{"durable": true}' http://localhost:15672/api/queues/%2f/orders

We can publish a message to the exchange:

$ curl -u guest:guest -X POST -H "content-type:application/json" -d '{"routing_key":"orders","payload":"dummy message","payload_encoding":"string", "properties": {} }' http://localhost:15672/api/exchanges/%2f/orders/publish

To fetch messages from the queue, the API requires a POST to the queue's get endpoint with a small JSON body:

$ curl -u guest:guest -X POST -H "content-type:application/json" \
    -d '{"count":1,"ackmode":"ack_requeue_true","encoding":"auto"}' \
    http://localhost:15672/api/queues/%2f/orders/get

Conclusion

In this post, we have seen how to get started with RabbitMQ. We have seen how to use the management UI, CLI and REST API to send and receive messages.

Hands-on Apache Kafka Tutorial

A short hands-on guide to get started with Apache Kafka for people who are in a hurry.

In this guide, we will learn what Apache Kafka is and how to install and run it. We will also learn how to create/modify a topic and produce/consume messages from it.

What is Apache Kafka?

[Image: Apache Kafka]

Apache Kafka is a distributed event store and stream-processing platform. It is used to build real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, and has high throughput.

Kafka Terminology

Topic: A topic is a category or feed name to which records are published/consumed. It is configured with a set of key-value pairs called topic configuration.

Producer: A producer is a client that publishes records to the Kafka cluster. Producers write data to topics and partitions.

Consumer: A consumer is a client that subscribes to topics and processes the records. Consumers read data from topics and partitions.

Consumer Group: A consumer group is a group of consumers that share a common purpose. Consumer groups enable a pool of processes to divide the work of consuming and processing records.

Broker: A broker is a server that hosts a set of topics/partitions. It receives data from producers and sends data to consumers.

ZooKeeper: ZooKeeper is used to store the cluster configuration and the state of the cluster. All Kafka brokers connect to ZooKeeper.

KRaft: KRaft (Apache Kafka Raft) is a consensus protocol that is used to manage the metadata of the Kafka cluster. It was introduced to remove the dependency on ZooKeeper.

Installing Apache Kafka

We can use the cp-all-in-one docker compose files to run Apache Kafka locally. This setup contains all the components of Confluent Platform, including Apache Kafka, Apache ZooKeeper, Confluent Schema Registry, Confluent REST Proxy, Confluent Control Center, and others.

$ git clone https://github.com/confluentinc/cp-all-in-one
$ cd cp-all-in-one/cp-all-in-one
$ docker-compose up
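
Once the containers are up, we can check that all the services are running:

$ docker-compose ps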

Confluent Control Center is a web UI to manage and monitor Apache Kafka.

[Image: Confluent Control Center]

We can visit http://localhost:9021 and monitor the cluster from this UI.

Producing and Consuming Messages

Kafka stores messages in topics. A topic is a category or feed name to which messages are published/consumed.

Let us create a topic called test with the kafka-topics command.

$ docker-compose exec broker kafka-topics --bootstrap-server localhost:9092 --topic test --create 

This will create a topic called test with a single partition and a replication factor of 1. In a multi-node cluster, we can use --partitions and --replication-factor to specify the number of partitions/replicas for the topic.

$ docker-compose exec broker kafka-topics --bootstrap-server localhost:9092 --topic test --partitions 3 --replication-factor 2 --create --if-not-exists

To produce messages to a topic named test, we can use kafka-console-producer and add messages to the topic:

$ docker-compose exec broker kafka-console-producer --bootstrap-server localhost:9092 --topic test

>order received
>order updated
>order shipped
>order delivered
>{"status": "completed"}

To consume messages from the same topic:

$ docker-compose exec broker kafka-console-consumer --bootstrap-server localhost:9092 --topic test --from-beginning

order received
order updated
order shipped
order delivered
{"status": "completed"}

Since we have not defined a schema for the messages, Kafka stores them as raw byte arrays. We can explicitly define a schema for the messages using Confluent Schema Registry if required.

We can list all the topics in the cluster using kafka-topics:

$ docker-compose exec broker kafka-topics --bootstrap-server localhost:9092 --list

default_ksql_processing_log
docker-connect-configs
docker-connect-offsets
docker-connect-status
test

To show details of a topic:

$ docker-compose exec broker kafka-topics --bootstrap-server localhost:9092 --describe --topic test

Topic: test TopicId: 7CckqkXsQXCNY0MNHYRv2w PartitionCount: 1   ReplicationFactor: 1    Configs: 
    Topic: test Partition: 0    Leader: 1   Replicas: 1 Isr: 1  Offline:         

By default, messages are retained in a topic for 7 days. We can change this retention period by altering the retention.ms configuration of the topic with the kafka-configs tool:

$ docker-compose exec broker kafka-configs --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test --add-config retention.ms=10000

To see all the available consumer groups, we can use kafka-consumer-groups:

$ docker-compose exec broker kafka-consumer-groups --bootstrap-server localhost:9092 --list
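
To inspect the members and offset lag of a specific group, we can describe it (my-group below is a placeholder for an actual group name from the list):

$ docker-compose exec broker kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group my-group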

Kafka Rest Proxy

Kafka REST Proxy is a RESTful interface to Apache Kafka. It allows us to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients.

To produce messages to a test topic with curl:

$ curl -X POST -H "Content-Type: application/vnd.kafka.json.v2+json" \
    --data '{"records":[{"value":{"status": "completed"}}]}' \
    "http://localhost:8082/topics/test"

Consuming from the same topic through the REST Proxy is a little more involved: a plain GET on /topics/test only returns topic metadata, so we first have to create a consumer instance, subscribe it to the topic, and then fetch records from it.

We can also change some Kafka settings dynamically, without restarting the brokers.

For example, the log level of individual broker components can be raised at runtime. This is shown below with the kafka-configs tool, against broker id 1 (the default broker id in cp-all-in-one):

$ docker-compose exec broker kafka-configs --bootstrap-server localhost:9092 \
    --alter --entity-type broker-loggers --entity-name 1 \
    --add-config kafka.server.KafkaApis=DEBUG

We can then check the broker logs to see the effect of the new log level.

Conclusion

In this article, we have seen how to install Apache Kafka locally using Docker. We have also seen how to produce and consume messages using Kafka console commands and Kafka Rest Proxy.