2019

Mastering DICOM - Part #1


Introduction

In hospitals, PACS simplifies the clinical workflow by reducing physical and time barriers. A typical radiology workflow looks like this.

[Figure: a typical radiology workflow. Credit: Wikipedia]

At the doctor's request, a patient visits a radiology center to undergo a CT/MRI/X-ray scan. Data captured from the modality (medical imaging equipment such as a CT or MRI machine) is sent to QA for verification and then to PACS for archiving.

Later, when the patient visits the doctor, the doctor can pull up the study on a workstation (which has a DICOM viewer) by entering the patient details.

In this series of articles, we will learn how to achieve this seamless digital transfer of medical data with DICOM.

DICOM standard

DICOM modalities create files in the DICOM format. Such a file has a DICOM header, which contains metadata, and a DICOM data set, which holds modality information (equipment details, configuration etc.), patient information (name, sex etc.) and the image data.

Storing and retrieving DICOM files from PACS servers is generally achieved through DIMSE for desktop applications and DICOMweb for web applications.

All machines that send or receive DICOM data must conform to the DICOM standard. This allows every DICOM machine on the network to store and retrieve DICOM files from PACS.
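
For example, a DICOM echo (C-ECHO) is the usual way to verify that two nodes on a network can exchange DIMSE messages. Here is a minimal sketch using the third-party pynetdicom package, assuming a PACS listening on 127.0.0.1:11112 and an illustrative AE title:

from pynetdicom import AE
from pynetdicom.sop_class import Verification

ae = AE(ae_title='MY_SCU')          # hypothetical AE title
ae.add_requested_context(Verification)

# associate with the PACS and send a DICOM echo
assoc = ae.associate('127.0.0.1', 11112)
if assoc.is_established:
    status = assoc.send_c_echo()
    print('C-ECHO status:', status)
    assoc.release()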

When writing software to handle DICOM data, there are third-party packages that handle most of these things for us.
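
For instance, reading a DICOM file and accessing its data set takes only a few lines with the third-party pydicom package. A small sketch — the file name is an assumption, and elements like PatientName are available only when the file contains them:

import pydicom

ds = pydicom.dcmread('study.dcm')

# patient and equipment information from the data set
print(ds.PatientName, ds.PatientSex)
print(ds.Modality, ds.Manufacturer)

# the image data as a numpy array (requires numpy)
print(ds.pixel_array.shape)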

Conclusion

In this article, we have learnt the clinical radiology workflow and how the DICOM standard enables digital transfer of data between DICOM modalities.

In the next article, we will dig into DICOM file formats and learn about the structure of DICOM data.

Verifying TLS Certificate Chain With OpenSSL


Introduction

To communicate securely over the internet, HTTPS (HTTP over TLS) is used. A key component of HTTPS is the certificate authority (CA), which, by issuing digital certificates, acts as a trusted third party between a server (e.g. google.com) and its clients (e.g. mobiles, laptops).

In this article, we will learn how to obtain certificates from a server and manually verify them on a laptop to establish a chain of trust.

Chain of Trust

A TLS certificate chain typically consists of a server certificate, which is signed by an intermediate certificate of a CA, which in turn is signed by the CA's root certificate.

Using OpenSSL, we can gather the server and intermediate certificates sent by a server using the following command.

$ openssl s_client -showcerts -connect www.github.com:443

CONNECTED(00000006)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert High Assurance EV Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA
verify return:1
depth=0 C = US, ST = California, L = San Francisco, O = "GitHub, Inc.", CN = www.github.com
verify return:1
---
Certificate chain
 0 s:/C=US/ST=California/L=San Francisco/O=GitHub, Inc./CN=www.github.com
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert SHA2 High Assurance Server CA
-----BEGIN CERTIFICATE-----
MIIHMTCCBhmgAwIBAgIQDf56dauo4GsS0tOc8
MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlna
0wGjIChBWUMo0oHjqvbsezt3tkBigAVBRQHvF
aTrrJ67dru040my
-----END CERTIFICATE-----
 1 s:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert SHA2 High Assurance Server CA
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert High Assurance EV Root CA
-----BEGIN CERTIFICATE-----
MIIEsTCCA5mgAwIBAgIQBOHnpNxc8vNtwCtC
MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGln
0wGjIChBWUMo0oHjqvbsezt3tkBigAVBRQHv
cPUeybQ=
-----END CERTIFICATE-----

    Verify return code: 0 (ok)

This command internally verifies that the certificate chain is valid. The output contains the server certificate and the intermediate certificate along with their issuer and subject. Copy the two certificates into server.pem and intermediate.pem files respectively.

We can decode these pem files and see the information in these certificates using

$ openssl x509 -noout -text -in server.pem

Certificate:
    Data:
        Version: 3 (0x2)
    Signature Algorithm: sha256WithRSAEncryption
    ----

We can also get only the subject and issuer of the certificate with

$ openssl x509 -noout -subject -issuer -in server.pem

subject= CN=www.github.com
issuer= CN=DigiCert SHA2 High Assurance Server CA

$ openssl x509 -noout -subject -issuer -in intermediate.pem

subject= CN=DigiCert SHA2 High Assurance Server CA
issuer= CN=DigiCert High Assurance EV Root CA
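
The same inspection can be done programmatically. A small sketch using the third-party cryptography package:

from cryptography import x509

with open('server.pem', 'rb') as fh:
    cert = x509.load_pem_x509_certificate(fh.read())

# subject and issuer, matching the openssl output above
print(cert.subject.rfc4514_string())
print(cert.issuer.rfc4514_string())
print(cert.not_valid_after)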

Now that we have both server and intermediate certificates at hand, we need to look for the relevant root certificate (in this case DigiCert High Assurance EV Root CA) in our system to verify these.

If you are using a Linux machine, all the root certificates are readily available in .pem format in the /etc/ssl/certs directory.

If you are using a Mac, open Keychain Access, search and export the relevant root certificate in .pem format.

We now have all 3 certificates in the chain of trust, and we can validate them with

$ openssl verify -verbose -CAfile root.pem -untrusted intermediate.pem server.pem
server.pem: OK

If there is an issue with validation, OpenSSL will throw an error with the relevant information.

Conclusion

In this article, we learnt how to get certificates from the server and validate them with the root certificate using OpenSSL.

Writing & Editing Code With Code - Part 1


In the Python community, metaprogramming is often associated with metaclasses. In this article, we will look at metaprogramming more broadly: programs that treat other programs as data.

Metaprogramming

When we start writing programs that write programs, it opens up a lot of possibilities. For example, here is a metaprogram that generates a program to print the numbers from 1 to 100.

with open('num.py', 'w') as fh:
    for i in range(1, 101):
        fh.write('print({})\n'.format(i))

These 3 lines generate a hundred-line program which produces the desired output when executed.

This is a trivial example and not of much use. Let us look at practical examples of metaprogramming in Django: the admin, the ORM, inspectdb and other places.

Metaprogramming In Django

Django provides a management command called inspectdb which generates Python code based on the SQL schema of the database.

$ ./manage.py inspectdb

from django.db import models

class Book(models.Model):
    name = models.CharField(max_length=100)
    slug = models.SlugField(max_length=100)
    ...

In Django admin, models can be registered like this.

from django.contrib import admin

from book.models import Book


admin.site.register(Book)

Even though we have not written any HTML, Django generates an entire CRUD interface for the model in the admin. The Django admin is itself a kind of metaprogram: it inspects a model and generates a CRUD interface from it.

The Django ORM generates SQL statements from ORM expressions written in Python.

In [1]: User.objects.last()
SELECT "auth_user"."id",
       "auth_user"."password",
       "auth_user"."last_login",
       "auth_user"."is_superuser",
       "auth_user"."username",
       "auth_user"."first_name",
       "auth_user"."last_name",
       "auth_user"."email",
       "auth_user"."is_staff",
       "auth_user"."is_active",
       "auth_user"."date_joined"
  FROM "auth_user"
 ORDER BY "auth_user"."id" DESC
 LIMIT 1


Execution time: 0.050304s [Database: default]

Out[1]: <User: anand>

Some frameworks/libraries use metaprogramming to solve problems related to generating, modifying and transforming code.

We can also use these techniques in everyday programming. Here are some use cases.

  1. Generate REST API automatically.
  2. Automatically generate unit test cases based on a template (see the sketch below).
  3. Generate integration tests automatically from the network traffic.
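
As a taste of the second item, here is a minimal sketch that builds a unittest.TestCase at runtime with type(), generating one test method per entry in a template. All the names here are illustrative:

import unittest

# template: test name -> (function under test, input, expected output)
CASES = {
    'test_upper': (str.upper, 'abc', 'ABC'),
    'test_title': (str.title, 'abc def', 'Abc Def'),
}

def make_test(func, arg, expected):
    def test(self):
        self.assertEqual(func(arg), expected)
    return test

methods = {name: make_test(*spec) for name, spec in CASES.items()}

# create the TestCase subclass programmatically instead of writing it by hand
GeneratedTests = type('GeneratedTests', (unittest.TestCase,), methods)

if __name__ == '__main__':
    unittest.main()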

These are some of the things related to web development where we can use metaprogramming techniques to generate or modify code. We will learn more about this in the next part of the article.

Tips On Writing Data Migrations in Django Applications


Introduction

In a Django application, when the schema changes, Django automatically generates a migration file for the schema changes. We can write additional migrations to change data.

In this article, we will learn some tips on writing data migrations in Django applications.

Use Management Commands

Applications can register custom actions with manage.py by creating a file in the management/commands directory of the application. This makes it easy to (re)run and test data migrations.

Here is a management command which migrates the status column of a Task model.

from django.core.management.base import BaseCommand

from library.tasks import Task


class Command(BaseCommand):

    def handle(self, *args, **options):
        # map legacy status values to the new ones
        status_map = {
            'valid': 'ACTIVE',
            'invalid': 'ERROR',
            'unknown': 'UNKNOWN',
        }
        tasks = Task.objects.all()
        for task in tasks:
            task.status = status_map[task.status]
            task.save()
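
Assuming this file is saved as library/core/management/commands/migrate_task_status.py (matching the import used in the next section), the data migration can be run, and re-run, with:

$ ./manage.py migrate_task_status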

If the migration logic is included directly in Django migration files, we have to roll back and re-apply the entire migration while iterating on it, which becomes cumbersome.

Link Data Migrations & Schema Migrations

If a data migration needs to happen before/after a specific schema migration, invoke the management command using RunPython in that schema migration, or create a separate migration file and add the schema migration as a dependency.

from django.db import migrations


def run_migrate_task_status(apps, schema_editor):
    from library.core.management.commands import migrate_task_status
    cmd = migrate_task_status.Command()
    cmd.handle()


class Migration(migrations.Migration):

    dependencies = [
    ]

    operations = [
        # no-op on rollback, since the data change is not reversible
        migrations.RunPython(run_migrate_task_status, migrations.RunPython.noop),
    ]

Watch Out For DB Queries

When working on a major feature that involves a series of migrations, we have to be careful with data migrations (which use the ORM) coming in between schema migrations.

For example, if we write a data migration script and then make schema changes to the same table in one go, the data migration fails, because the ORM models no longer match the database schema at the point where the data migration runs.

To overcome this, we can explicitly select only required fields and process them while ignoring all other fields.

# instead of
User.objects.all()

# use
User.objects.only('id', 'is_active')
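
When the data migration runs via RunPython, another safe option is the historical model that Django passes through apps.get_model; it reflects only the fields that exist at that point in the migration graph. A sketch, assuming an app label of library (the status transformation is illustrative):

def run_migrate_task_status(apps, schema_editor):
    # historical model: matches the schema at this migration,
    # not the current models.py
    Task = apps.get_model('library', 'Task')
    for task in Task.objects.only('id', 'status').iterator():
        task.status = task.status.upper()
        task.save(update_fields=['status'])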

As an alternative, we can use raw SQL queries for data migrations.

Conclusion

In this article, we have seen some of the problems which occur during data migrations in Django applications and tips to alleviate them.

Profiling & Optimizing Bottlenecks In Django


In the previous article, we learnt where to start with performance optimization in a Django application and how to find out which APIs to optimize first. In this article, we will learn how to optimize those selected APIs.

Profiling APIs With django-silk

django-silk provides the silk_profile function, which can be used to profile a selected view or a snippet of code. Let's take a slow view, profile it and see the results.

import time

from django.http import JsonResponse
from silk.profiling.profiler import silk_profile


@silk_profile()
def slow_api(request):
    time.sleep(2)
    return JsonResponse({'data': 'slow_response'})
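
silk_profile can also be used as a context manager to profile just a snippet inside a view. A minimal sketch, reusing the imports above (the block name is illustrative):

@silk_profile()
def slow_api(request):
    with silk_profile(name='expensive section'):
        time.sleep(2)
    return JsonResponse({'data': 'slow_response'})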

We need to add the relevant silk settings to the Django settings so that the profile data files are generated and stored in the specified location.

SILKY_PYTHON_PROFILER = True
SILKY_PYTHON_PROFILER_BINARY = True
SILKY_PYTHON_PROFILER_RESULT_PATH = '/tmp/'

Once the above view is loaded, we can see the profile information on the silk profiling page.

On the profile page, silk shows a profile graph and highlights the path where the most time is spent.

It also shows cProfile stats on the same page. The profile data file can be downloaded and used with other visualization tools like snakeviz.

Looking at the above data, we can see that most of the time is spent in time.sleep in our view.

Profiling APIs With django-extensions

If you don't want to use silk, an alternative way to profile Django views is the runprofileserver command provided by the django-extensions package. Install django-extensions and then start the server with the following command.

$ ./manage.py runprofileserver --use-cprofile --nostatic --prof-path /tmp/prof/

This command starts runserver with profiling tools enabled. For each request made to the server, it saves a corresponding .prof profile data file in the /tmp/prof/ folder.

After the profile data is generated, we can use tools like snakeviz or cprofilev to visualize and browse it.

Install snakeviz using pip

$ pip install snakeviz

Open the profile data file using snakeviz.

$ snakeviz /tmp/prof/api.book.list.4212ms.1566922008.prof

It shows an icicle graph view and a table view of the profile data for that view.

These tools help pinpoint which line of code is slowing down the view. Once it is identified, we can take appropriate action: optimize that code, set up a cache, or move the work to a task queue if it doesn't need to happen in the request-response cycle.

Versioning & Retrieving All Files From AWS S3 With Boto


Introduction

Amazon S3 (Amazon Simple Storage Service) is an object storage service offered by Amazon Web Services. For S3 buckets, if versioning is enabled, users can preserve, retrieve, and restore every version of the object stored in the bucket.

In this article, we will see how to enable versioning for a bucket and retrieve all versions of an object, from the AWS web interface as well as with the Python boto3 library.

Versioning of Bucket

Bucket versioning can be changed with a toggle button from the AWS web console in the bucket properties.

We can do the same with the Python boto3 library.

import boto3


bucket_name = 'avilpage'

s3 = boto3.resource('s3')
versioning = s3.BucketVersioning(bucket_name)

# check status
print(versioning.status)

# enable versioning
versioning.enable()

# disable versioning
versioning.suspend()

Retrieving Objects

Once versioning is enabled, we can store multiple versions of an object by uploading an object multiple times with the same key.

We can write a simple script to generate a text file with random text and upload it to S3.

import random
import string

import boto3

bucket_name = 'avilpage'
file_name = 'test.txt'
key = file_name
s3 = boto3.client('s3')

with open(file_name, 'w') as fh:
    data = ''.join(random.choice(string.ascii_letters) for _ in range(10))
    fh.write(data)

# upload_file takes (Filename, Bucket, Key)
s3.upload_file(file_name, bucket_name, key)

If this script is executed multiple times, the same key in the bucket gets overwritten, with each upload stored under a different version id.
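
The version id of each upload can also be captured directly: when versioning is enabled, put_object returns a VersionId in its response. A small sketch, reusing bucket_name, key and file_name from above:

with open(file_name) as fh:
    response = s3.put_object(Bucket=bucket_name, Key=key, Body=fh.read())

# VersionId is present in the response when versioning is enabled
print(response['VersionId'])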

We can see all the versions of the file in the bucket by selecting the file and then clicking the drop-down next to Latest version.

We can retrieve and show the contents of all versions of the test.txt file with the following script.

import boto3


bucket_name = 'avilpage'
s3_client = boto3.client('s3')

versions = s3_client.list_object_versions(Bucket=bucket_name)

# iterate over every stored version and fetch its contents
for version in versions['Versions']:
    response = s3_client.get_object(
        Bucket=bucket_name,
        Key=version['Key'],
        VersionId=version['VersionId'],
    )
    data = response['Body'].read()
    print(data)

Conclusion

Object versioning is useful to protect data from unintended overwrites. In this article, we learnt how to change bucket versioning, upload multiple versions of the same file, and retrieve all versions of a file using the AWS web console as well as boto3.

Why My Grandma Can Recall 100+ Phone Numbers, But We Can't


On a leisurely evening, as I was chit-chatting with my grandma, my phone started ringing. Someone who is not in my contacts was calling me. As I was wondering who on earth was calling, my grandma just glanced at my screen and said, "It's your uncle Somu, pick up the phone". I was dumbstruck.

Later that evening, I asked my grandma to recall the phone numbers she remembers. She recalled 30+ phone numbers, and she was able to recognize 100+ phone numbers from the last 4 digits alone.

That came as a surprise to me, as I don't remember even 10 phone numbers now. Most smartphone users don't remember family and friends' phone numbers anymore.

A decade back, I used to remember most of my relatives' and friends' phone numbers even though I didn't have a phone. My grandma used a mini notebook to write down all the phone numbers. I was worried about this mini notebook, as it could get lost easily and was always hard to find when needed.

Since my grandma doesn't have any contacts saved in her phone, she gets a glimpse of the number every time someone calls her. She also dials the number every time she has to call someone. With this habit, she has memorized all the numbers.

I, on the other hand, started using a smartphone which has all the contacts. I search my contacts by name when I have to dial someone, so there is no need to dial the number. Also, whenever someone calls me, their name gets displayed in large letters and I never focus on the number. Because of this, I don't remember any of the phone numbers.

After this revelation, I started an experiment by disabling contact permissions for the dialer app. With this, I am forced to type the number or select the appropriate number from the call history and dial it. This was a bit uncomfortable at first, but I soon got used to it as I recognized more and more numbers.

This might seem unnecessary in the smartphone age. But when you are traveling or your phone is switched off, it's hard to contact people. Even if someone lends you their phone, it is of no use if you don't remember any numbers.

It is also important to remember the phone numbers of family and friends, which might be needed in an emergency.

Switching Hosts With Bookmarklets - Web Development Tips


When debugging a web development issue that is inconsistent between environments (local, development, QA, staging and production), we frequently have to switch between them.

If we are debugging something on the home page, we can just bookmark the host URLs and switch between them by clicking the relevant bookmark. Some browsers provide autocompletion for bookmarks, so we can type a few characters and then select the relevant URL from the suggestions.

When debugging an issue on some other page like https://avilpage.com/validate/books/?name=darwin&year=2019, which has a URL path and query params, switching between environments becomes tedious. To switch to the local environment, we have to manually replace the hostname with localhost.

To avoid this, we can use a bookmarklet to switch hosts. A bookmarklet is a bookmark whose URL contains a JavaScript code snippet. The snippet is executed when the bookmarklet is clicked.

Let's create a bookmarklet to replace the host in the URL with http://localhost:8000. Create a new bookmark called To Local and put the following snippet in the URL field.

javascript:(function() { window.location.replace("http://localhost:8000" + window.location.pathname + window.location.search); }())

If we click the To Local bookmarklet, it redirects the currently active page to the localhost URL.

We can create one more bookmarklet to switch to production. Create a bookmarklet called To Production and add the following snippet in the URL field.

javascript:(function() { window.location.replace("http://avilpage.com" + window.location.pathname + window.location.search); }())

We can create similar bookmarklets to switch to other environments. Now, switching between environments on any page is as easy as clicking a button.

A Short Guide To Debugging PostgreSQL Triggers


Introduction

PostgreSQL triggers associate a function with a table for an event. If multiple triggers of the same kind are defined for the same event, they are executed in alphabetical order by name.

In this article we will see how to debug PostgreSQL triggers.

Triggers

First, ensure triggers are enabled on the required tables for INSERT/UPDATE/DELETE events. We can check the available triggers by running the following query.

SELECT * FROM information_schema.triggers;
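
The same check can be scripted. Here is a minimal sketch using the third-party psycopg2 package (the database name is an assumption):

import psycopg2

conn = psycopg2.connect(dbname='appdb')
with conn.cursor() as cur:
    cur.execute(
        "SELECT trigger_name, event_manipulation, event_object_table "
        "FROM information_schema.triggers"
    )
    # each row tells us which trigger fires for which event on which table
    for name, event, table in cur.fetchall():
        print(name, event, table)
conn.close()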

We can also run the relevant queries under EXPLAIN ANALYZE, which reports the triggers fired by a statement and the time spent in each.

PostgreSQL Logging

After ensuring the triggers are applied correctly, set the logging levels for the PostgreSQL server and client in the postgresql.conf file.

# let server log all queries
log_statement = 'all'

# set client message to log level
client_min_messages = log

Restart PostgreSQL for the configuration changes to take effect.

# Linux
sudo service postgresql restart

# Mac
brew services restart postgresql

Tail the logs and check if queries are executing correctly with appropriate values.

Triggers Logging

After enabling logging for PostgreSQL, we can raise messages/errors in triggers to see whether anything unexpected is happening at any point in the trigger.

RAISE 'Updating row with ID: %', id;
RAISE division_by_zero;
RAISE WARNING 'Unable to delete record';

This makes sure triggers are executing as expected, and if there are any warnings/errors, a message is logged.

SQL/PostgreSQL Gotchas

Even when queries and triggers execute correctly, we might not see the desired result because of the potentially surprising behaviour of PostgreSQL. There are some scenarios where PostgreSQL seems to be misbehaving at first but is actually working as expected.

  1. Unquoted object names will be treated as lower case. SELECT FOO FROM bar will become SELECT foo FROM bar.
  2. Comparing nullable fields. This can yield strange results because NULL = NULL evaluates to NULL, not true; use IS NOT DISTINCT FROM to compare nullable fields.
  3. PostgreSQL uses POSIX offsets. For 04:21:42 UTC+01, +1 means the timezone is west of Greenwich.

Conclusion

By being aware of common PostgreSQL gotchas and enabling logging for PostgreSQL client, server & triggers, pinpointing the bug in triggers becomes easy. Once the bug is identified, appropriate action can be taken to fix the issue.

Essential PyCharm (IntelliJ) Plugins To Improve Productivity


As per the 2019 JetBrains survey, PyCharm is the most widely used (36%) IDE for Python development. Even though PyCharm comes with a lot of built-in features, there are many plugins available for PyCharm and other IntelliJ IDEs. In this article, we will see some plugins which boost productivity during development.

Highlight Bracket Pair

Instead of manually scanning for where a bracket starts/ends, Highlight Bracket Pair automatically highlights the bracket pair based on the cursor position.

Rainbow Brackets

Highlight Bracket Pair highlights the bracket pair around the cursor. When multiple bracket pairs are deeply nested, Rainbow Brackets highlights matching bracket pairs with matching rainbow colors.

Grep Console

When running a Django/Flask server or any Python script which generates a lot of output, it is hard to filter out the required output on the console. Grep Console can filter or highlight output based on specific conditions, which makes it easier to debug the code.

Save Actions

Instead of manually optimizing imports or reformatting code when changes are made, we can use Save Actions, which automatically runs a set of actions on every file save.

Key Promoter

If you are new to PyCharm, or an experienced user who uses the mouse instead of keyboard shortcuts, Key Promoter shows the relevant keyboard shortcut whenever the mouse is used inside the IDE. This provides an easy way to learn keyboard shortcuts faster.

String Manipulation

To convert lower case letters to upper case, the String Manipulation plugin is useful. In addition to lower/upper case conversion, it also provides options to convert to camelCase, kebab-case, PascalCase etc.

Ace Jump

To move the caret to a particular position in the editor without the mouse, the AceJump plugin is useful. It allows you to quickly jump the caret to any position in the editor.

These are some plugins which boost developer productivity while writing and debugging code in PyCharm or other JetBrains IDEs.