Make Python Docker Builds Slim & Fast

Introduction

When using Docker, if the build is taking time or the build image is huge, it will waste system resources as well as our time. In this article, let's see how to reduce build time as well as image size when using Docker for Python projects.

Project

Let us take a hello world application written in flask.

import flask


app = flask.Flask(__name__)


@app.route('/')
def home():
    return 'hello world - v1.0.0'

Let's create a requirements.txt file to list out python packages required for the project.

flask==1.1.2
pandas==1.1.2

Pandas binary wheel size is ~10MB. It is included in requirements to see how python packages affect docker image size.

Here is our Dockerfile to run the flask application.

FROM python:3.7

ADD . /app

WORKDIR /app

RUN python -m pip install -r requirements.txt

EXPOSE 5000

ENTRYPOINT [ "python" ]

CMD [ "-m" "flask" "run" ]

Let's use the following commands to measure the image size & build time with/without cache.

$ docker build . -t flask:0.0 --pull --no-cache
[+] Building 45.3s (9/9) FINISHED

$ touch app.py  # modify app.py file

$ docker build . -t flask:0.1
[+] Building 15.3s (9/9) FINISHED

$ docker images | grep flask
flask               0.1     06d3e985f12e    1.01GB

With the current docker, here are the results.

1. Install requirements first

FROM python:3.7

WORKDIR /app

ADD ./requirements.txt /app/requirements.txt

RUN python -m pip install -r requirements.txt

ADD . /app

EXPOSE 5000

ENTRYPOINT [ "python" ]

CMD [ "-m" "flask" "run" ]

Let us modify the docker file to install requirements first and then add code to the docker image.

Now, build without cache took almost the same time. With cache, the build is completed in a second. Since docker caches step by step, it has cached python package installation step and thereby reducing the build time.

2. Disable Cache

FROM python:3.7

WORKDIR /app

ADD ./requirements.txt /app/requirements.txt

RUN python -m pip install -r requirements.txt --no-cache

ADD . /app

EXPOSE 5000

ENTRYPOINT [ "python" ]

CMD [ "-m" "flask" "run" ]

By default, pip will cache the downloaded packages. Since we don't need a cache inside docker, let's disable pip cache by passing --no-cache argument.

This reduced the docker image size by ~20MB. In real-world projects, where there are a good number of dependencies, the overall image size will be reduced a lot.

3. Use slim variant

Till now, we have been using defacto Python variant. It has a large number of common debian packages. There is a slim variant that doesn't contain all these common packages4. Since we don't need all these debian packages, let's use slim variant.

FROM python:3.7-slim

...

This reduced the docker image size by ~750 MB without affecting the build time.

4. Build from source

Python packages can be installed via wheels (.whl files) for a faster and smoother installation. We can also install them via source code. If we look at Pandas project files on PyPi1, it provides both wheels as well as tar zip source files. Pip will prefer wheels over source code the installation process will be much smoother.

To reduce Docker image size, we can build from source instead of using the wheel. This will increase build time as the python package will take some time to compile while building.

Here build size is reduced by ~20MB but the build has increased to 15 minutes.

5. Use Alpine

Earlier we have used, python slim variant the base image. However, there is Alpine variant which is much smaller than slim. One caveat with using alpine is Python wheels won't work with this image2.

We have to build all packages from source. For example, packages like TensorFlow provide only wheels for installation. To install this on Alpine, we have to install from the source which will take additional effort to figure out dependencies and install.

Using Alpine will reduce the image size by ~70 MB but it is not recomended to use Alpine as wheels won't work with this image.

All the docker files used in the article are available on github3.

Conclusion

We have started with a docker build of 1.01 GB and reduced it to 0.13 GB. We have also optimized build times using the docker caching mechanism. We can use appropriate steps to optimize build for size or speed or both.

How To Deploy Mirth Connect To Kubernetes

Introduction

NextGen Connect(previously Mirth Connect) is widely used integration engine for information exchange in health-care domain. In this article, let us see how to deploy Mirth Connect to a Kubernetes cluster.

Deployment To k8s

From version 3.8, NextGen has started providing official docker images for Connect1. By default, Connect docker exposes 8080, 8443 ports. We can start a Connect instance locally, by running the following command.

$docker run -p 8080:8080 -p 8443:8443 nextgenhealthcare/connect

We can use this docker image and create a k8s deployment to start a container.

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mirth-connect
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: mirth-connect
        image: docker.io/nextgenhealthcare/connect
        ports:
        - name: http
          containerPort: 8080
        - name: https
          containerPort: 8443
        - name: hl7-test
          containerPort: 9001
        env:
          - name: DATABASE
            value: postgres
          - name: DATABASE_URL
            value: jdbc:postgresql://avilpage.com:5432/mirth_db

This deployment file can be applied on a cluster using kubectl.

$ kubectl apply -f connect-deployment.yaml

To access this container, we can create a service to expose this deployment to public.

---
apiVersion: v1
kind: Service
metadata:
  name: mirth-connect
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:ap-south-1:foo
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    external-dns.alpha.kubernetes.io/hostname: connect.avilpage.com
spec:
  type: LoadBalancer
  selector:
    app: mirth-connect
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
    - name: https
      port: 443
      targetPort: 8443
      protocol: TCP
    - name: hl7-test
      port: 9001
      targetPort: 9001
      protocol: TCP

This will create a load balancer in AWS through which we can access mirth connect instance. If an ingress controller is present in the cluster, we can use it directly instead of using a seperate load balancer for this service.

Once Mirth Connect is up & running, we might have to create HL7 channels running on various ports. In the above configuration files, we have exposed 9001 HL7 port for testing of channel. Once we configure Mirth Channels, we need to expose appropriate ports in deployment as well as service similiar to this.

Conclusion

Earlier, there were no official docker images for Mirth Connect and it was bit diffucult to dockerize Mirth Connect and deploy it. With the release of official Docker images, deploying Mirth Connect to k8s or any other container orchestration platform has become much easier.

Serial Bluetooth Terminal With Python Asyncio

Introduction

PySerial package provides a tool called miniterm1, which provides a terminal to interact with any serial ports.

However miniterm sends each and every character as we type instead of sending entire message at once. In addition to this, it doesn't provide any timestamps on the messages transferred.

In this article, lets write a simple terminal to address the above issues.

Bluetooth Receiver

pyserial-asyncio2 package provides Async I/O interface for communicating with serial ports. We can write a simple function to read and print all the messages being received on a serial port as follows.

import sys
import asyncio
import datetime as dt

import serial_asyncio


async def receive(reader):
    while True:
        data = await reader.readuntil(b'\n')
        now = str(dt.datetime.now())
        print(f'{now} Rx <== {data.strip().decode()}')


async def main(port, baudrate):
    reader, _ = await serial_asyncio.open_serial_connection(url=port, baudrate=baudrate)
    receiver = receive(reader)
    await asyncio.wait([receiver])


port = sys.argv[1]
baudrate = sys.argv[2]

loop = asyncio.get_event_loop()
loop.run_until_complete(main(port, baudrate))
loop.close()

Now we can connect a phone's bluetooth to a laptop bluetooth. From phone we can send messages to laptop using bluetooth terminal app like Serial bluetooth terminal4.

Here a screenshot of messages being send from an Android device.

We can listen to these messages on laptop via serial port by running the following command.

$ python receiver.py /dev/cu.Bluetooth-Incoming-Port 9600
2020-08-31 10:44:50.995281 Rx <== ping from android
2020-08-31 10:44:57.702866 Rx <== test message

Bluetooth Sender

Now lets write a sender to send messages typed on the terminal to the bluetooth.

To read input from terminal, we need to use aioconsole 3. It provides async input equivalent function to read input typed on the terminal.

import sys
import asyncio
import datetime as dt

import serial_asyncio
import aioconsole


async def send(writer):
    stdin, stdout = await aioconsole.get_standard_streams()
    async for line in stdin:
        data = line.strip()
        if not data:
            continue
        now = str(dt.datetime.now())
        print(f'{now} Tx ==> {data.decode()}')
        writer.write(line)


async def main(port, baudrate):
    _, writer = await serial_asyncio.open_serial_connection(url=port, baudrate=baudrate)
    sender = send(writer)
    await asyncio.wait([sender])


port = sys.argv[1]
baudrate = sys.argv[2]

loop = asyncio.get_event_loop()
loop.run_until_complete(main(port, baudrate))
loop.close()

We can run the program with the following command and send messages to phone's bluetooth.

$ python sender.py /dev/cu.Bluetooth-Incoming-Port 9600

ping from mac
2020-08-31 10:46:52.222676 Tx ==> ping from mac
2020-08-31 10:46:58.423492 Tx ==> test message

Here a screenshot of messages received on Android device.

Conclusion

If we combine the above two programmes, we get a simple bluetooth client to interact with any bluetooth via serial interface. Here is the complete code 5 for the client.

In the next article, lets see how to interact with Bluetooth LE devices.

Set Default Date For Date Hierarchy In Django Admin

Introduction

When we monitor daily events from django admin, most of the time we are interested in events related to today. Django admin provides date based drill down navigation page via ModelAdmin.date_hierarchy1 option. With this, we can navigate to any date to filter out events related to that date.

One problem with this drill down navigation is, we have to navigate to todays date every time we open a model in admin. Since we are interested in todays events most of the time, setting todays date as default filtered date will solve the problem.

Set Default Date For Date Hierarchy

Let us create an admin page to show all the users who logged in today. Since User model is already registered in admin by default, let us create a proxy model to register it again.

from django.contrib.auth.models import User


class DjangoUser(User):
    class Meta:
        proxy = True

Lets register this model in admin to show logged in users details along with date hierarchy.

from django.contrib import admin


@admin.register(DjangoUser)
class MetaUserAdmin(admin.ModelAdmin):
    list_display = ('username', 'is_active', 'last_login')
    date_hierarchy = 'last_login'

If we open DjangoUser model in admin page, it will show drill down navigation bar like this.

Now, if we drill down to a particular date, django adds additional query params to the admin url. For example, if we visit 2020-06-26 date, corresponding query params are /?last_login__day=26&last_login__month=6&last_login__year=2020.

We can override changelist view and set default params to todays date if there are no query params. If there are query params then render the original response.

@admin.register(DjangoUser)
class MetaUserAdmin(admin.ModelAdmin):
    list_display = ('username', 'is_active', 'last_login')
    date_hierarchy = 'last_login'

    def changelist_view(self, request, extra_context=None):
        if request.GET:
            return super().changelist_view(request, extra_context=extra_context)

        date = now().date()
        params = ['day', 'month', 'year']
        field_keys = ['{}__{}'.format(self.date_hierarchy, i) for i in params]
        field_values = [getattr(date, i) for i in params]
        query_params = dict(zip(field_keys, field_values))
        url = '{}?{}'.format(request.path, urlencode(query_params))
        return redirect(url)

Now if we open the same admin page, it will redirect to todays date by default.

Conclusion

In this article, we have seen how to set a default date for date_hierarchy in admin page. We can also achieve similar filtering by settiing default values for search_filter or list_filter which will filter items related to any specific date.