Work From Home Tips For Non-remote Workers


Remote-first and remote-friendly companies have a different work culture & communication process compared to non-remote companies. Due to COVID-191 world wide pandemic, majority of workers who didn't had prior remote experience are WFH(working from home). This article is intended to provide some helpful tips for such people.

Work Desk

It is important to have a dedicated room or at least a desk for work. This creates a virtual boundary between your office work and personal work. Otherwise, you will end up working from bed, dining tables, kitchen etc which will result in body pains due to bad postures.

Get Ready

Image Credit: raywenderlich 2

Start your daily routine as if you are going to office. It is easy to stop caring about personal grooming and attire when WFH. Your attire can influence your focus and productivity.

If you find getting ready is hard, schedule video calls for all meetings with your colleagues. This might give some additional motivation for you to get ready early in the morning. When the work time, go to your work desk and start working.

Self Discipline

Schedule your work time. Whenever possible try to stick to office working hours. Without proper schedule, you will either end up under working or over working as your personal work and office work gets mixed up.

Take regular breaks during work hours. Without any distractions, it is easy to get lost in the pixels for longer durations. Taking short breaks for a quick walk and getting a fresh air outside will freshen up.

With unlimited access to kitchen and snacks, it is hard to avoid binge eating at home. But atleast avoid binge eating during office hours.

Exercise. Since WFH involves only sitting in a chair through out the day, staying physically active is challenging especially during this pandemic. Exercise few minutes every morning, help yourself in the kitchen by making a meal or doing dishes, clean your house etc., should help in staying physically active.

Seek Help

WFH can be lonely at times as the social interactions are quite less. Schedule 1 to 1 meetings or virtual coffe meetings with your colleagues to increase social interactions. Discuss WFH problems with your colleagues, friends and remote communities to see how they are tacking those problems.

How To Reduce Python Package Footprint?


PyPi1 hosts over 210K+ projects and the average size of Python package is less than 1MB. However some of the most used packages in scientific computing like NumPy, SciPy has large footprint as they bundle shared libraries2 along with the package.

Build From Source

If a project needs to be deployed in AWS Lambda, the total size of deployment package should be less than 250MB3.

$ pip install numpy

$ du -hs ~/.virtualenvs/py37/lib/python3.7/site-packages/numpy/
 85M    /Users/avilpage/.virtualenvs/all3/lib/python3.7/site-packages/numpy/

Just numpy occupies 85MB space on Mac machine. If we include a couple of other packages like scipy & pandas, overall size of the package crosses 250MB.

An easy way reduce the size of python packages is to build from source instead of use pre-compiled wheels.

$ CLFAGS='-g0 -Wl -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib' pip install numpy --global-option=build_ext

$ du -hs ~/.virtualenvs/py37/lib/python3.7/site-packages/numpy/
 23M    /Users/avilpage/.virtualenvs/all3/lib/python3.7/site-packages/numpy/

We can see the footprint has reduced by ~70% when using sdist instead of wheel. This4 article provides more details about these CFLAG optimization when installing a package from source.

Shared Packages

When using a laptop with low storage for multiple projects with conflicting dependencies, a seperate virtual environment is needed for each project. This will lead to installing same version of the package in multiple places which increases the footprint.

To avoid this, we can create a shared virtual environment which has most commonly used packages and share it across all the enviroments. For example, we can create a shared virtual enviroment with all the packages required for scientific computing.

For each project, we can create a virtual enviroment and share all packages of the common enviroment. If any project requires a specific version of the package, the same package can be install in project enviroment.

$ cat common-requirements.txt  # shared across all enviroments
numpy==1.18.1
pandas==1.0.1
scipy==1.4.1

$ cat project1-requirements.txt  # project1 requirements
numpy==1.18.1
pandas==1.0.0
scipy==1.4.1

$ cat project2-requirements.txt  # project2 enviroments
numpy==1.17
pandas==1.0.0
scipy==1.4.1

After creating a virtual enviroment for a project, we can create a .pth file with the path of site-packages of common virtual enviroment so that all those packages are readily available in the new project.

$ echo '/users/avilpage/.virtualenvs/common/lib/python3.7/site-packages' >
 ~/.virtualenvs/project1/lib/python3.7/site-packages/common.pth

Then we can install the project requirements which will install only missing packages.

$ pip install -r project1-requirements.txt

Global Store

The above shared packages solution has couple issues.

  1. User has to manually create and track shared packages for each Python version and needs to bootstrap it in every project.
  2. When there is an incompatible version of package in multiple projects, user will end up with duplicate installations of the same version.

To solve this5, we can have a global store of packages in a single location segregated by python and package version. Whenever a user tries to install a package, check if the package is in global store. If not install it in global store. If present, just link the package to virtualenvs.

For example, numpy1.17 for Python 3.7 and numpy1.18 for Python 3.6 can be stored in the global store as follows.

$ python3.6 -m pip install --target ~/.mpip/numpy/3.6_1.18 numpy

$ python3.7 -m pip install --target ~/.mpip/numpy/3.7_1.17 numpy

# in project venv
echo '~/.mpip/numpy/3.7_1.17' > PATH_TO_ENV/lib/python3.7/site-packages/numpy.pth

With this, we can ensure one version of the package is stored in the disk only once. I have created a simple package manager called mpip6 as a POC to test this and it seems to work as expected.

These are couple of ways to reduce to footprint of Python packages in a single environment as well as muliple enviroments.

Disabling Atomic Transactions In Django Test Cases


TestCase is the most used class for writing tests in Django. To make tests faster, it wraps all the tests in 2 nested atomic blocks.

In this article, we will see where these atomic blocks create problems and find ways to disable it.

Select for update

Django provides select_for_update() method on model managers which returns a queryset that will lock rows till the end of transaction.

def update_book(pk, name):
    with transaction.atomic():
        book = Book.objects.select_for_update().get(pk=pk)
        book.name = name
        book.save()

When writing test case for a piece of code that uses select_for_update, Django recomends not to use TestCase as it might not raise TransactionManagementError.

Threads

Let us take a view which uses threads to get data from database.

def get_books(*args):
    queryset = Book.objects.all()
    serializer = BookSerializer(queryset, many=True)
    response = serializer.data
    return response

class BookViewSet(ViewSet):

    def list(self, request):
        with ThreadPoolExecutor() as executor:
            future = executor.submit(get_books, ())
            return_value = future.result()
        return Response(return_value)

A test which writes some data to db and then calls this API will fail to fetch the data.

class LibraryPaidUserTestCase(TestCase):
    def test_get_books(self):
        Book.objects.create(name='test book')

        self.client = APIClient()
        url = reverse('books-list')
        response = self.client.post(url, data=data)
        assert response.json()

Threads in the view create a new connection to the database and they don't see the created test data as the transaction is not yet commited.

Transaction Test Case

To handle above 2 scenarios or other scenarios where database transaction behaviour needs to be tested, Django recommends to use TransactionTestCase instead of TestCase.

from django.test import TransactionTestCase

Class LibraryPaidUserTestCase(TransactionTestCase):
    def test_get_books(self):
        ...

With TransactionTestCase, db will be in auto commit mode and threads will be able to fetch the data commited earlier.

Consider a scenario, where there are other utility classes which are subclassed from TestCase.

class LibraryTestCase(TestCase):
    ...

class LibraryUserTestCase(LibraryTestCase):
    ...

class LibraryPaidUserTestCase(LibraryTestCase):
    ...

If we subclass LibraryTestCase with TransactionTestCase, it will slow down the entire test suite as all the tests run in autocommit mode.

If we subclass LibraryUserTestCase with TransactionTestCase, we will miss the functionality in LibraryTestCase. To prevent this, we can override the custom methods to call TransactionTestCase.

If we look at the source code of TestCase, it has 4 methods to handle atomic transactions. We can override them to prevent creation of atomic transactions.

class LibraryPaidUserTestCase(LibraryTestCase):
    @classmethod
    def setUpClass(cls):
        super(TestCase, cls).setUpClass()

    @classmethod
    def tearDownClass(cls):
        super(TestCase, cls).tearDownClass()

    def _fixture_setup(self):
        return super(TestCase, self)._fixture_setup()

    def _fixture_teardown(self):
        return super(TestCase, self)._fixture_teardown()

We can also create a mixin with the above methods and subclass it wherever this functionality is needed.

Conclusion

Django wraps tests in TestCase inside atomic transactions to speed up the run time. When we are testing for db transaction behaviours, we have to disable this using appropriate methods.

Mastering DICOM - Part #1


Introduction

In hospitals, PACS simplifies the clinical workflow by reducing physical and time barriers. A typical radiology workflow looks like this.

Credit: Wikipedia

A patient as per doctor's request will visit a radiology center to undergo CT/MRI/X-RAY. Data captured from modality(medical imaging equipments like CT/MRI machine) will be sent to QA for verfication and then sent to PACS for archiving.

After this when patient visits doctor, doctor can see this study on his workstation(which has DICOM viewer) by entering patient details.

In this series of articles, we will how to achieve this seamless transfer of medical data digitally with DICOM.

DICOM standard

DICOM modalities create files in DICOM format. This file has dicom header which contains meta data and dicom data set which has modality info(equipment information, equipment configuration etc), patient information(name, sex etc) and the image data.

Storing and retreiving DICOM files from PACS servers is generally achieved through DIMSE DICOM for desktop applications and DICOMWeb for web applications.

All the machines which transfer/receive DICOM data must follow DICOM standard. With this all the DICOM machines which are in a network can store and retrieve DICOM files from PACS.

When writing software to handle DICOM data, there are third party packages to handle most of the these things for us.

Conclusion

In this article, we have learnt the clinical radiology workflow and how DICOM standard is useful in digitally transferring data between DICOM modalities.

In the next article, we will dig into DICOM file formats and learn about the structure of DICOM data.

Verifying TLS Certificate Chain With OpenSSL


Introduction

To communicate securely over the internet, HTTPS (HTTP over TLS) is used. A key component of HTTPS is Certificate authority (CA), which by issuing digital certificates acts as a trusted 3rd party between server(eg: google.com) and others(eg: mobiles, laptops).

In this article, we will learn how to obtain certificates from a server and manually verify them on a laptop to establish a chain of trust.

Chain of Trust

TLS certificate chain typically consists of server certificate which is signed by intermediate certificate of CA which is inturn signed with CA root certificate.

Using OpenSSL, we can gather the server and intermediate certificates sent by a server using the following command.

$ openssl s_client -showcerts -connect avilpage.com:443

CONNECTED(00000006)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert High Assurance EV Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA
verify return:1
depth=0 C = US, ST = California, L = San Francisco, O = "GitHub, Inc.", CN = www.github.com
verify return:1
---
Certificate chain
 0 s:/C=US/ST=California/L=San Francisco/O=GitHub, Inc./CN=www.github.com
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert SHA2 High Assurance Server CA
-----BEGIN CERTIFICATE-----
MIIHMTCCBhmgAwIBAgIQDf56dauo4GsS0tOc8
MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlna
0wGjIChBWUMo0oHjqvbsezt3tkBigAVBRQHvF
aTrrJ67dru040my
-----END CERTIFICATE-----
 1 s:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert SHA2 High Assurance Server CA
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert High Assurance EV Root CA
-----BEGIN CERTIFICATE-----
MIIEsTCCA5mgAwIBAgIQBOHnpNxc8vNtwCtC
MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGln
0wGjIChBWUMo0oHjqvbsezt3tkBigAVBRQHv
cPUeybQ=
-----END CERTIFICATE-----

    Verify return code: 0 (ok)

This command internally verfies if the certificate chain is valid. The output contains the server certificate and the intermediate certificate along with their issuer and subject. Copy both the certificates into server.pem and intermediate.pem files.

We can decode these pem files and see the information in these certificates using

$ openssl x509 -noout -text -in server.crt

Certificate:
    Data:
        Version: 3 (0x2)
    Signature Algorithm: sha256WithRSAEncryption
    ----

We can also get only the subject and issuer of the certificate with

$ openssl x509 -noout -subject -noout -issuer -in server.pem

subject= CN=www.github.com
issuer= CN=DigiCert SHA2 High Assurance Server CA

$ openssl x509 -noout -subject -noout -issuer -in intermediate.pem

subject= CN=DigiCert SHA2 High Assurance Server CA
issuer= CN=DigiCert High Assurance EV Root CA

Now that we have both server and intermediate certificates at hand, we need to look for the relevant root certificate (in this case DigiCert High Assurance EV Root CA) in our system to verify these.

If you are using a Linux machine, all the root certificate will readily available in .pem format in /etc/ssl/certs directory.

If you are using a Mac, open Keychain Access, search and export the relevant root certificate in .pem format.

We have all the 3 certificates in the chain of trust and we can validate them with

$ openssl verify -verbose -CAfile root.pem -untrusted intermediate.pem server.pem
server.pem: OK

If there is some issue with validation OpenSSL will throw an error with relevant information.

Conclusion

In this article, we learnt how to get certificates from the server and validate them with the root certificate using OpenSSL.

Writing & Editing Code With Code - Part 1


In Python community, metaprogramming is often used in conjunction with metaclasses. In this article, we will learn about metaprogramming, where programs have the ability to treat other programs as data.

Metaprogramming

When we start writing programs that write programs, it opens up a lot of possibilities. For example, here is a metaprogramme that generates a program to print numbers from 1 to 100.

with open('num.py', 'w') as fh:
    for i in range(100):
        fh.write('print({})'.format(i))

This 3 lines of program generates a hundred line of program which produces the desired output on executing it.

This is a trivial example and is not of much use. Let us see practical examples where metaprogramming is used in Django for admin, ORM, inspectdb and other places.

Metaprogramming In Django

Django provides a management command called inspectdb which generates Python code based on SQL schema of the database.

$ ./manage.py inspectdb

from django.db import models

class Book(models.Model):
    name = models.CharField(max_length=100)
    slug = models.SlugField(max_length=100)
    ...

In django admin, models can be registered like this.

from django.contrib import admin

from book.models import Book


admin.site.register(Book)

Eventhough, we have not written any HTML, Django will generate entire CRUD interface for the model in the admin. Django Admin interface is a kind of metaprogramme which inspects a model and generates a CRUD interface.

Django ORM generates SQL statements for given ORM statements in Python.

In [1]: User.objects.last()
SELECT "auth_user"."id",
       "auth_user"."password",
       "auth_user"."last_login",
       "auth_user"."is_superuser",
       "auth_user"."username",
       "auth_user"."first_name",
       "auth_user"."last_name",
       "auth_user"."email",
       "auth_user"."is_staff",
       "auth_user"."is_active",
       "auth_user"."date_joined"
  FROM "auth_user"
 ORDER BY "auth_user"."id" DESC
 LIMIT 1


Execution time: 0.050304s [Database: default]

Out[1]: <User: anand>

Some frameworks/libraries use metaprogramming to solve problems realted to generating, modifying and transforming code.

We can also use these techniques in everyday programming. Here are some use cases.

  1. Generate REST API automatically.
  2. Automatically generate unit test cases based on a template.
  3. Generate integration tests automatically from the network traffic.

These are a some of the things related to web development where we can use metaprogramming techniques to generate/modify code. We will learn more about this in the next part of the article.

Tips On Writing Data Migrations in Django Application


Introduction

In a Django application, when schema changes Django automatically generates a migration file for the schema changes. We can write additional migrations to change data.

In this article, we will learn some tips on writing data migrations in Django applications.

Use Management Commands

Applications can register custom actions with manage.py by creating a file in management/commands directory of the application. This makes it easy to (re)run and test data migrations.

Here is a management command which migrates the status column of a Task model.

from django.core.management.base import BaseCommand
from library.tasks import Task

class Command(BaseCommand):

    def handle(self, *args, **options):
        status_map = {
            'valid': 'ACTIVE',
            'invalid': 'ERROR',
            'unknown': 'UKNOWN',
        }
        tasks = Task.objects.all()
        for tasks in tasks:
            task.status = status_map[task.status]
            task.save()

If the migration is included in Django migration files directly, we have to rollback and re-apply the entire migration which becomes cubersome.

Link Data Migrations & Schema Migrations

If a data migration needs to happen before/after a specific schema migration, include the migration command using RunPython in the same schema migration or create seperate schema migration file and add schema migration as a dependency.

def run_migrate_task_status(apps, schema_editor):
    from library.core.management.commands import migrate_task_status
    cmd = migrate_task_status.Command()
    cmd.handle()


class Migration(migrations.Migration):

    dependencies = [
    ]

    operations = [
        migrations.RunPython(run_migrate_task_status, RunSQL.noop),
    ]

Watch Out For DB Queries

When working on a major feature that involves a series of migrations, we have to be careful with data migrations(which use ORM) coming in between schema migrations.

For example, if we write a data migration script and then make schema changes to the same table in one go, then the migration script fails as Django ORM will be in invalid state for that data migration.

To overcome this, we can explicitly select only required fields and process them while ignoring all other fields.

# instead of
User.objects.all()

# use
User.objects.only('id', 'is_active')

As an alternative, we can use raw SQL queries for data migrations.

Conclusion

In this article, we have seen some of the problems which occur during data migrations in Django applications and tips to alleviate them.