Django Tips & Tricks #12 - Automatically Set CSRF Token in Postman


Introduction

Django has an in-built CSRF protection mechanism to prevent Cross Site Request Forgery on requests made via unsafe methods. When CSRF protection is enabled, AJAX POST requests should send the X-CSRFToken header.

Postman is one of the most widely used tools for testing APIs. In this article, we will see how to set the CSRF token in Postman and update it automatically.

CSRF Token In Postman

Django sets the csrftoken cookie on login. After logging in, we can see this token in the Cookies section of Postman.

We can grab this token and set it in headers manually.

But this token has to be changed manually whenever it expires, which quickly becomes tedious.

Instead, we can use Postman's scripting feature to extract the token from the cookie and store it in an environment variable. In the Tests section of Postman, add these lines.

// grab the csrftoken cookie from the response
var xsrfCookie = postman.getResponseCookie("csrftoken");
// save its value to the current environment
postman.setEnvironmentVariable('csrftoken', xsrfCookie.value);

This extracts the CSRF token and sets it as an environment variable called csrftoken in the current environment.

Now in our requests, we can use this variable to set the header.
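For example, in the Headers section of a request, the variable can be referenced with Postman's {{variable}} syntax:

X-CSRFToken: {{csrftoken}}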

When the token expires, we just need to log in again and the CSRF token gets updated automatically.

Conclusion

In this article, we have seen how to set and renew the CSRF token automatically in Postman. We can follow a similar technique in other API clients like curl or HTTPie, as sketched below.
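For example, with curl we can save the login cookies to a cookie jar and read the token back from it. A minimal sketch, where the login and API endpoints are placeholders:

# log in and save cookies to a cookie jar
$ curl -c cookies.txt -d "username=foo&password=bar" https://example.com/login/

# extract the csrftoken value (name and value are the 6th and 7th fields of the cookie jar)
$ csrftoken=$(awk '$6 == "csrftoken" {print $7}' cookies.txt)

# send the token back in the X-CSRFToken header along with the session cookies
$ curl -b cookies.txt -H "X-CSRFToken: $csrftoken" -X POST -d "title=foo" https://example.com/api/items/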


How To Install Private Python Packages With Pip


Introduction

To distribute Python code, we need to package it and host it somewhere so that users can install and use it. If the code is public, it can be published to PyPI or any public repository so that anyone can access it. If the code is private, we need to provide a proper authentication mechanism before allowing users to access it.

In this article, we will see how to use pip to install Python packages hosted on GitLab, GitHub, Bitbucket or any other hosting service.

Packaging

To package a Python project, we need to create a setup.py file, which is the build script for setuptools. Below is a sample setup file to create a package named library.

import setuptools


with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="library",
    version="0.0.1",
    author="chillaranand",
    author_email="foo@avilpage.com",
    description="A simple python package",
    long_description=long_description,
    url="https://github.com/chillaranand/library",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
)

Python provides detailed packaging documentation on structuring and building the package.
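Once setup.py is ready, the source and wheel distributions can be built with setuptools and wheel, for example:

$ pip install --upgrade setuptools wheel
$ python setup.py sdist bdist_wheel

The built archives land in the dist/ directory and can be pushed to the hosting service.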

Installation

Once the package is built and pushed to the hosting service, it can be installed with pip.

# using https
$ pip install git+https://github.com/chillaranand/library.git

# using ssh
$ pip install git+ssh://git@github.com/chillaranand/library.git

This usually requires authentication with a username/password or an SSH key, which works fine for development machines. To use the package in CI/CD pipelines or as a dependency, we can use tokens to simplify installation.

$ export GITHUB_TOKEN=foobar

$ pip install git+https://$GITHUB_TOKEN@github.com/chillaranand/library.git
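The same URL can also be pinned in a requirements file. pip expands environment variables written as ${VAR} in requirements files, so the token itself doesn't need to be committed. A minimal sketch (the v0.0.1 tag is hypothetical):

# requirements.txt
library @ git+https://${GITHUB_TOKEN}@github.com/chillaranand/library.git@v0.0.1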

Conclusion

In this article, we have seen how to package Python code and install private packages with pip. This makes it easy to manage dependencies or install packages on multiple machines.


Django Tips & Tricks #11 - Finding High-impact Performance Bottlenecks


Introduction

When optimizing the performance of a web application, a common mistake is to start with the slowest page (or API). In addition to response time, we should also consider the traffic each page receives to prioritize the order of optimization.

In this article, we will profile a Django webapp, find high-impact performance bottlenecks and then start optimizing them to yield better performance.

Profiling

django-silk is an open-source profiling tool which intercepts and stores HTTP request data. Install it with pip.

$ pip install django-silk

Add silk to the installed apps and include the silk middleware in the Django settings.

MIDDLEWARE = [
    ...
    'silk.middleware.SilkyMiddleware',
    ...
]

INSTALLED_APPS = (
    ...
    'silk'
)

Run migrations so that Silk can create required database tables to store profile data.

$ python manage.py makemigrations
$ python manage.py migrate
$ python manage.py collectstatic

Include silk URLs in the root URLconf to view the profile data.

urlpatterns += [url(r'^silk/', include('silk.urls', namespace='silk'))]

On the silk requests page (http://host/silk/requests/), we can see all requests and sort them by overall time or time spent in the database.

High Impact Bottlenecks

Silk creates a silk_request table which contains information about the requests processed by Django.

$ pgcli

library> \d silk_request;

+--------------------+--------------------------+-------------+
| Column             | Type                     | Modifiers   |
|--------------------+--------------------------+-------------|
| id                 | character varying(36)    |  not null   |
| path               | character varying(190)   |  not null   |
| time_taken         | double precision         |  not null   |
...

We can group the request data by path and calculate the number of requests, the average time taken and the impact factor of each path. Since we are considering both response time and traffic, the impact factor will be the product of the average response time and the number of requests for that path.

library> SELECT
     s.*, round((s.avg_time * s.count)/max(s.avg_time*s.count) over ()::NUMERIC,2) as impact
 FROM
     (select path, round(avg(time_taken)::numeric,2) as avg_time, count(path) as count from silk_request group by PATH)
     s
 ORDER BY impact DESC;

+-------------------------+------------+---------+----------+
| path                    | avg_time   | count   | impact   |
|-------------------------+------------+---------+----------|
| /point/book/book/       | 239.90     | 1400    | 1.00     |
| /point/book/data/       | 94.81      | 1900    | 0.54     |
| /point/                 | 152.49     | 900     | 0.41     |
| /point/login/           | 307.03     | 400     | 0.37     |
| /                       | 106.51     | 1000    | 0.32     |
| /point/auth/user/       | 494.11     | 200     | 0.29     |
...

We can see that /point/book/book/ has the highest impact even though it is neither the most visited nor the slowest view. Optimizing this view first yields better overall performance for the webapp.

Conclusion

In this article, we learnt how to profile a Django webapp and identify bottlenecks to improve performance. In the next article, we will learn how to optimize these bottlenecks by taking an in-depth look at them.


Archive Million Pages With wget In Minutes


Introduction

Webrecorder, Heritrix, Nutch, Scrapy, Colly and Frontera are popular tools for large-scale web crawling and archiving.

These tools have some learning curve, and some of them don't have in-built support for the WARC (Web ARChive) output format.

wget comes bundled with most *nix systems and has in-built support for WARC output. In this article, we will see how to quickly archive web pages with wget.

Archiving with wget

In a previous article, we extracted a superset of the top 1 million domains. We can use that list of URLs to archive. Save the list to a file called urls.txt.

This can be archived with the following command.

file=urls.txt
wget -i $file --warc-file=$file -t 3 --timeout=4 -q -o /dev/null -O /dev/null

wget has the ability to continue partially downloaded files, but this option won't work with WARC output. So it is better to split the list into small chunks and process them separately. One added advantage of this approach is that we can download multiple chunks in parallel with wget.

mkdir -p chunks
split -l 1000 urls.txt chunks/ -d --additional-suffix=.txt -a 3

This will split the file into several chunks, each containing 1000 URLs. wget doesn't have multithreading support, so we can write a for loop to schedule a separate wget process for each chunk.

for file in `ls -r chunks/*.txt`
do
   wget -i $file --warc-file=$file -t 3 --timeout=4 -q -o /dev/null -O /dev/null &
done

Archiving 1000 URLs takes ~15 minutes. With all chunks running in parallel, the entire million pages can be downloaded in less than 20 minutes.

Also, each process takes ~8MB of memory, so running 1000 processes needs 8GB+ of memory. Otherwise, the number of parallel processes should be reduced, which increases the overall run time.

Each archive chunk will be ~150MB and consumes a lot of storage. All downloaded archives can be gzipped to reduce storage.

gzip *.warc

Here is an idempotent shell script to download and archive files in batches.

#! /bin/sh

set -x

batch=1000                 # urls per chunk
size=`expr ${#batch} - 1`  # suffix length for split file names
maxproc=50                 # maximum number of parallel wget processes
file=urls.txt
dir=$HOME'/projects/chunks'$batch


mkdir -p $dir
split -l $batch $file $dir'/' -d --additional-suffix=.txt -a $size
sleep 1

useragent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'


for file in `ls -r $dir/*.txt`
do
    warcfile=$file'.warc'
    warczip=$warcfile'.gz'
    if [ -f $warczip ] || [ -f $warcfile ]; then
        continue
    fi

    if [ $(pgrep wget -c) -lt $maxproc ]; then
        echo $file
        wget -H "user-agent: $useragent" -i $file --warc-file=$file -t 3 --timeout=4 -q -o /dev/null -O /dev/null &
        sleep 2
    else
        sleep 300
        for filename in `find $dir -name '*.warc' -mmin +5`
        do
            gzip $filename -9
        done
    fi
done

Conclusion

In this article, we have seen how to archive million pages with wget in few minutes.

wget2 has multithreading support and it might get WARC output support soon. With that, archiving with wget will become much easier.


Comparison Of Alexa, Majestic & Domcop Top Million Sites


Introduction

Alexa, Majestic & Domcop (based on CommonCrawl data) provide lists of the top 1 million popular websites based on their analytics. In this article, we will download this data and compare the lists using Linux command-line tools.

Collecting data

Let's download data from the above sources and extract the domain names. The data format is different for each source, so we use awk to extract the domain column. After extracting the domains, sort them and save them to a file.

# alexa

$ wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

$ unzip top-1m.csv.zip

# data sorted by ranking
$ head -n 5 top-1m.csv
1,google.com
2,youtube.com
3,facebook.com
4,baidu.com
5,wikipedia.org

$ awk -F "," '{print $2}' top-1m.csv | sort > alexa

# domains after sorting alphabetically
$ head -n 5 alexa
00000.life
00-000.pl
00004.tel
00008888.tumblr.com
0002rick.tumblr.com

# Domcop

$ wget https://www.domcop.com/files/top/top10milliondomains.csv.zip

$ unzip top10milliondomains.csv.zip

# data sorted by ranking
$ head -n 5 top10milliondomains.csv
"Rank","Domain","Open Page Rank"
"1","fonts.googleapis.com","10.00"
"2","facebook.com","10.00"
"3","youtube.com","10.00"
"4","twitter.com","10.00"

$ awk -F "\"*,\"*" '{if(NR>1)print $2}' top10milliondomains.csv | sort > domcop

# domains after sorting alphabetically
$ head -n 5 domcop
00000000b.com
000000book.com
0000180.fortunecity.com
000139418.wixsite.com
000fashions.blogspot.com

# Majestic

$ wget http://downloads.majestic.com/majestic_million.csv

# data sorted by ranking
$ head -n 5 majestic_million.csv
GlobalRank,TldRank,Domain,TLD,RefSubNets,RefIPs,IDN_Domain,IDN_TLD,PrevGlobalRank,PrevTldRank,PrevRefSubNets,PrevRefIPs
1,1,google.com,com,474277,3016409,google.com,com,1,1,474577,3012875
2,2,facebook.com,com,462854,3093315,facebook.com,com,2,2,462860,3090006
3,3,youtube.com,com,422434,2504924,youtube.com,com,3,3,422377,2501555
4,4,twitter.com,com,412950,2497935,twitter.com,com,4,4,413220,2495261

$ awk -F "\"*,\"*" '{if(NR>1)print $2}' majestic_million.csv | sort > majestic

# domains after sorting alphabetically
$ head -n 5 majestic
00000.xn--p1ai
0000666.com
0000.jp
0000www.com
0000.xn--p1ai

Comparing Data

We have collected and extracted domains from the above sources. Let's compare them using comm to see how similar they are.

$ comm -123 alexa domcop --total
871851  871851  128149  total

$ comm -123 alexa majestic --total
788454  788454  211546  total

$ comm -123 domcop majestic --total
784388  784388  215612  total

$ comm -12 alexa domcop | comm -123 - majestic --total
31314   903165  96835   total

So, only 96,835 (9.6%) domains are common to all three datasets, and the overlap between any two sources ranges from ~13% to ~22%. Here is a Venn diagram showing the overlap between them.
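The three-way intersection count can also be verified directly by chaining comm:

$ comm -12 alexa domcop | comm -12 - majestic | wc -l
96835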

Conclusion

We have collected data from Alexa, Domcop & Majestic, extracted the domains from it and observed that there is only a small overlap between them.


Setup Continuous Deployment For Python Chalice


Outline

Chalice is a microframework developed by Amazon for quickly creating and deploying serverless applications in Python.

In this article, we will see how to set up continuous deployment with GitHub and AWS CodePipeline.

CD Setup

Chalice provides the deploy CLI command to deploy from the local system.
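For example, from the project directory:

$ chalice deploy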

Chalice also provides the generate-pipeline CLI command to generate a CloudFormation template. This template automatically creates the resources required for an AWS CodePipeline.
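For example, the template can be generated and deployed as a CloudFormation stack with the AWS CLI (the stack name here is just an example):

$ chalice generate-pipeline pipeline.json

$ aws cloudformation deploy --template-file pipeline.json --stack-name chalice-pipeline --capabilities CAPABILITY_IAM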

By default, this uses a CodeCommit repository for hosting the code. We can use a GitHub repo as the source instead of CodeCommit.

By default, Chalice provides a buildspec file to package the code and push it to S3. In the deploy step, it uses this artifact to deploy the code.

We can use a custom buildspec file to deploy the code directly from the build step.

version: 0.1

phases:
  install:
    commands:
      - echo Entering the install phase
      - echo Installing dependencies
      - sudo pip install --upgrade awscli
      - aws --version
      - sudo pip install chalice
      - sudo pip install -r requirements.txt

  build:
    commands:
      - echo entered the build phase
      - echo Build started on `date`
      - chalice deploy --stage staging

This buildspec file installs the requirements and deploys the Chalice app to staging. We can add one more build step to deploy it to production after manual approval, as sketched below.
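A sketch of what the buildspec for that extra production step could look like, assuming a prod stage is configured in .chalice/config.json:

version: 0.1

phases:
  build:
    commands:
      - sudo pip install chalice
      - sudo pip install -r requirements.txt
      - chalice deploy --stage prod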

Conclusion

We have seen how to set up continuous deployment for a Chalice application with GitHub and AWS CodePipeline.


Parsing & Transforming mitmproxy Request Flows


mitmproxy is a free and open-source interactive HTTPS proxy. It provides a command-line interface, a web interface and a Python API for interacting with it and customizing it for our needs.

mitmproxy provides an option to export web request flows to curl/httpie/raw formats. From mitmproxy, we can press e (export) and then select a format for exporting.

Exporting multiple requests through this interface becomes tedious. Instead, we can save all requests to a file and write a Python script to export them.

Start mitmproxy with this command so that all request flows are appended to the requests.mitm file for later use.

$ mitmproxy -w +requests.mitm

Here is a Python script to parse this dump file and print the request URLs.

from mitmproxy.io import FlowReader


filename = 'requests.mitm'

with open(filename, 'rb') as fp:
    reader = FlowReader(fp)

    for flow in reader.stream():
        print(flow.request.url)

The flow.request object has more attributes that provide information about the request.

In [31]: dir(flow.request)
Out[31]:
[...
 'host',
 'host_header',
 'http_version',
 'method',
 'multipart_form',
 'path',
 'raw_content',
 ...
 'wrap']

We can use the mitmproxy export utilities to transform mitm flows to other formats.

In [32]: flow = next(reader.stream())

In [33]: from mitmproxy.addons import export

In [34]: export.curl_command(flow)
Out[34]: "curl -H 'Host:mitm.it' -H 'Proxy-Connection:keep-alive' -H 'User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36' -H 'DNT:1' -H 'Accept:image/webp,image/apng,image/*,*/*;q=0.8' -H 'Referer:http://mitm.it/' -H 'Accept-Encoding:gzip, deflate' -H 'Accept-Language:en-US,en;q=0.9,ms;q=0.8,te;q=0.7' -H 'content-length:0' 'http://mitm.it/favicon.ico'"

In [35]: export.raw(flow)
Out[35]: b'GET /favicon.ico HTTP/1.1\r\nHost: mitm.it\r\nProxy-Connection: keep-alive\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36\r\nDNT: 1\r\nAccept: image/webp,image/apng,image/*,*/*;q=0.8\r\nReferer: http://mitm.it/\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: en-US,en;q=0.9,ms;q=0.8,te;q=0.7\r\n\r\n'

In [36]: export.httpie_command(flow)
Out[36]: "http GET http://mitm.it/favicon.ico 'Host:mitm.it' 'Proxy-Connection:keep-alive' 'User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36' 'DNT:1' 'Accept:image/webp,image/apng,image/*,*/*;q=0.8' 'Referer:http://mitm.it/' 'Accept-Encoding:gzip, deflate' 'Accept-Language:en-US,en;q=0.9,ms;q=0.8,te;q=0.7' 'content-length:0'"

With these utilities, we can transform mitmproxy request flows to curl commands or any other custom form that fits our needs.
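For example, combining FlowReader with the export add-on, a small sketch that dumps every recorded request as a curl command into a shell script:

from mitmproxy.io import FlowReader
from mitmproxy.addons import export


# write one curl command per recorded request flow
with open('requests.mitm', 'rb') as fp, open('replay.sh', 'w') as out:
    for flow in FlowReader(fp).stream():
        out.write(export.curl_command(flow) + '\n')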


Linux Performance Analysis In Less Than 10 Seconds


If you are using a Linux system or managing a Linux server, you might come across a situation where a process is taking too long to complete. In this article, we will see how to track down such performance issues in Linux.

The Netflix TechBlog has an article on how to analyze Linux performance in 60 seconds. It lists 10+ tools to run in order to see the resource usage and pinpoint the bottleneck.

It is strenuous to remember all those tools/options and laborious to run all those commands when working on multiple systems.

Instead, we can use atop, a one-stop tool for performance analysis. Here is a comparison of atop with other tools from LWN.

atop shows live & historical measurements at the system level as well as the process level. To get a glimpse of system resource (CPU, memory, network, disk) usage, install and run atop with

$ sudo apt install --yes atop

$ atop

By default, atop shows resources used in the last interval only and sorts processes by CPU usage. We can instead use

$ atop -A -f 4

-A sorts the processes automatically in the order of the most busy system resource.

-f shows both active as well as inactive system resources in the output.

4 sets the refresh interval to 4 seconds.

Just by looking at the output of atop, we get a glimpse of the overall system resource usage as well as the resource usage of individual processes.
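atop can also record measurements to a raw file and replay them later, which is handy for analyzing an issue after it has happened:

# record a sample every 10 seconds to a raw file
$ atop -w /tmp/atop.raw 10

# browse the recorded data later (t/T moves forward/backward in time)
$ atop -r /tmp/atop.raw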


Django Tips & Tricks #10 - Log SQL Queries To Console


Django ORM makes it easy to interact with the database. To understand what is happening behind the scenes or to see SQL performance, we can log all the SQL queries that are being executed. In this article, we will see various ways to achieve this.

Using debug-toolbar

Django debug toolbar provides panels that show debug information about requests. It has an SQL panel which shows all executed SQL queries and the time taken for them.

When building REST APIs or microservices where the Django templating engine is not used, this method won't work. In these situations, we have to log SQL queries to the console.

Using django-extensions

Django-extensions provides a lot of utilities for productive development. The runserver_plus and shell_plus commands accept an optional --print-sql argument, which prints all the SQL queries that are being executed.

./manage.py runserver_plus --print-sql
./manage.py shell_plus --print-sql

Whenever an SQL query gets executed, it prints the query and the time taken for it in the console.

In [42]: User.objects.filter(is_staff=True)
Out[42]: SELECT "auth_user"."id",
       "auth_user"."password",
       "auth_user"."last_login",
       "auth_user"."is_superuser",
       "auth_user"."username",
       "auth_user"."first_name",
       "auth_user"."last_name",
       "auth_user"."email",
       "auth_user"."is_staff",
       "auth_user"."is_active",
       "auth_user"."date_joined"
  FROM "auth_user"
 WHERE "auth_user"."is_staff" = true
 LIMIT 21


Execution time: 0.002107s [Database: default]

<QuerySet [<User: anand>, <User: chillar>]>

Using django-querycount

Django-querycount provides a middleware that shows the SQL query count and duplicate queries in the console.

|------|-----------|----------|----------|----------|------------|
| Type | Database  |   Reads  |  Writes  |  Totals  | Duplicates |
|------|-----------|----------|----------|----------|------------|
| RESP |  default  |    3     |    0     |    3     |     1      |
|------|-----------|----------|----------|----------|------------|
Total queries: 3 in 1.7738s


Repeated 1 times.
SELECT "django_session"."session_key",
"django_session"."session_data", "django_session"."expire_date" FROM
"django_session" WHERE ("django_session"."session_key" =
'dummy_key AND "django_session"."expire_date"
> '2018-05-31T09:38:56.369469+00:00'::timestamptz)

This package provides additional settings to customize output.

Django logging

Instead of using any third-party package, we can use the django.db.backends logger to print all the SQL queries.

Add django.db.backends to the loggers list and set its log level and handlers.

    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
    },

In the runserver console, we can see all the SQL queries that are being executed.

(0.001) SELECT "django_admin_log"."id", "django_admin_log"."action_time", "django_admin_log"."user_id", "django_admin_log"."content_type_id", "django_admin_log"."object_id", "django_admin_log"."object_repr", "django_admin_log"."action_flag", "django_admin_log"."change_message", "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined", "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_admin_log" INNER JOIN "auth_user" ON ("django_admin_log"."user_id" = "auth_user"."id") LEFT OUTER JOIN "django_content_type" ON ("django_admin_log"."content_type_id" = "django_content_type"."id") WHERE "django_admin_log"."user_id" = 4 ORDER BY "django_admin_log"."action_time" DESC LIMIT 10; args=(4,)
[2018/06/03 15:06:59] HTTP GET /admin/ 200 [1.69, 127.0.0.1:47734]

These are a few ways to log all SQL queries to the console. We can also write a custom middleware for better logging of these queries and to get more insights, as sketched below.
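A minimal sketch of such a middleware, assuming DEBUG=True so that connection.queries is populated:

from django.db import connection


class QueryCountMiddleware:
    """Log the number of queries and total SQL time for every request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        total_time = sum(float(query['time']) for query in connection.queries)
        print(f'{request.path}: {len(connection.queries)} queries in {total_time:.3f}s')
        return response

After adding it to the MIDDLEWARE list, every request prints its query count and total SQL time to the console.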


How To Deploy Django Channels To Production


In this article, we will see how to deploy Django Channels to production and how to scale it to handle more load. We will be using nginx as the proxy server, daphne as the ASGI server, gunicorn as the WSGI server and redis as the channel layer backend.

Daphne can serve HTTP requests as well as WebSocket requests. For stability and performance, we will use uwsgi/gunicorn to serve HTTP requests and daphne to serve WebSocket requests.

We will be using systemd to create and manage processes instead of depending on third-party process managers like supervisor or circus, and ansible for managing deployments. If you don't want to use ansible, you can just replace the template variables in the following files with actual values.

Nginx Setup

Nginx will route requests to the WSGI server or the ASGI server based on the URL. Here is the nginx configuration for the server.

server {
    listen {{ server_name }}:80;
    server_name {{ server_name }} www.{{ server_name }};

    return 301 https://avilpage.com$request_uri;
}


server {
    listen {{ server_name }}:443 ssl;
    server_name {{ server_name }} www.{{ server_name }};

    ssl_certificate     /root/certs/avilpage.com.chain.crt;
    ssl_certificate_key /root/certs/avilpage.com.key;

    access_log /var/log/nginx/avilpage.com.access.log;
    error_log /var/log/nginx/avilpage.com.error.log;

    location / {
            proxy_pass http://0.0.0.0:8000;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $http_host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_redirect off;
    }

    location /ws/ {
            proxy_pass http://0.0.0.0:9000;
            proxy_http_version 1.1;

            proxy_read_timeout 86400;
            proxy_redirect     off;

            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Host $server_name;
    }

    location /static {
        alias {{ project_root }}/static;
    }

    location  /favicon.ico {
        alias {{ project_root }}/static/img/favicon.ico;
    }

    location  /robots.txt {
        alias {{ project_root }}/static/txt/robots.txt;
    }

}

WSGI Server Setup

We will use gunicorn as the WSGI server. We can run gunicorn with

$ gunicorn avilpage.wsgi --bind 0.0.0.0:8000 --log-level error --log-file=- --settings avilpage.production_settings

We can create a systemd unit file to run it as a service.

[Unit]
Description=gunicorn
After=network.target


[Service]
PIDFile=/run/gunicorn/pid
User=root
Group=root
WorkingDirectory={{ project_root }}
Environment="DJANGO_SETTINGS_MODULE={{ project_name }}.production_settings"
ExecStart={{ venv_bin }}/gunicorn {{ project_name}}.wsgi --bind 0.0.0.0:8000 --log-level error --log-file=- --workers 5 --preload


ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-abort
PrivateTmp=true


[Install]
WantedBy=multi-user.target

Whenever the server restarts, systemd will automatically start the gunicorn service. We can also restart gunicorn manually with

$ sudo service gunicorn restart

ASGI Server Setup

We will use daphne as the ASGI server, and it can be started with

$ daphne avilpage.asgi:application --bind 0.0.0.0 --port 9000 --verbosity 1

We can create a systemd unit file like the previous one to create a service.

[Unit]
Description=daphne daemon
After=network.target


[Service]
PIDFile=/run/daphne/pid
User=root
Group=root
WorkingDirectory={{ project_root }}
Environment="DJANGO_SETTINGS_MODULE={{ project_name }}.production_settings"
ExecStart={{ venv_bin }}/daphne --bind 0.0.0.0 --port 9000 --verbosity 0 {{project_name}}.asgi:application
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
Restart=on-abort
PrivateTmp=true


[Install]
WantedBy=multi-user.target

Deployment

Here is an ansible playbook which deploys these config files to our server. To run the playbook against the server avilpage.com, execute

$ ansible-playbook -i avilpage.com, django_setup.yml

Scaling

Now that we have deployed channels to production, we can run a performance test to see how our server performs under load.

For WebSockets, we can use Thor to run the performance test.

thor -C 100 -A 1000 wss://avilpage.com/ws/books/

Our server is able to handle 100 requests per second with a latency of 800ms, which is good enough for a low-traffic website.

To improve performance, we can use unix sockets instead of IP/port for gunicorn and daphne. Also, daphne has support for multiprocessing using shared file descriptors. Unfortunately, it doesn't work as expected. As mentioned here, we can use systemd templates to spawn multiple daphne processes, as sketched below.
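A minimal sketch of such a template unit (daphne@.service), using the systemd instance name as the port number:

[Unit]
Description=daphne daemon on port %i
After=network.target


[Service]
WorkingDirectory={{ project_root }}
Environment="DJANGO_SETTINGS_MODULE={{ project_name }}.production_settings"
ExecStart={{ venv_bin }}/daphne --bind 0.0.0.0 --port %i --verbosity 0 {{ project_name }}.asgi:application
Restart=on-abort
PrivateTmp=true


[Install]
WantedBy=multi-user.target

Multiple instances can then be started with sudo systemctl start daphne@9000 daphne@9001 and load balanced from nginx with an upstream block.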

An alternative way is to use uvicorn to start multiple workers. Install uvicorn using pip

$ pip install uvicorn

Start uvicorn ASGI server with

$ uvicorn avilpage.asgi:application --log-level critical --workers 4

This will spin up 4 workers, which should be able to handle more load. If the performance is still not sufficient, we have to set up a load balancer and spin up multiple servers (just like scaling any other web application).
