2018

Reliable Way To Test External APIs Without Mocking


Let us write a function which retrieves user information from GitHub API.

import requests


def get_github_user_info(username):
    url = f'https://api.github.com/users/{username}'
    response = requests.get(url)
    if response.ok:
        return response.json()
    else:
        return None

To test this function, we can write a test case to call the external API and check if it is returning valid data.

def test_get_github_user_info():
    username = 'ChillarAnand'
    info = get_github_user_info(username)
    assert info is not None
    assert username == info['login']

Even though this test case is reliable, this won't be efficient when we have many APIs to test as it sends unwanted requests to external API and makes tests slower due to I/O.

A widely used solution to avoid external API calls is mocking. Instead of getting the response from external API, use a mock object which returns similar data.

from unittest import mock


def test_get_github_user_info_with_mock():
    with mock.patch('requests.get') as mock_get:
        username = 'ChillarAnand'

        mock_get.return_value.ok = True
        json_response = {"login": username}
        mock_get.return_value.json.return_value = json_response

        info = get_github_user_info(username)

        assert info is not None
        assert username == info['login']

This solves above problems but creates additional problems.

  • Unreliable. Even though test cases pass, we are not sure if API is up and is returning a valid response.
  • Maintenance. We need to ensure mock responses are up to date with API.

To avoid this, we can cache the responses using requests-cache.

import requests_cache

requests_cache.install_cache('github_cache')


def test_get_github_user_info_without_mock():
    username = 'ChillarAnand'
    info = get_github_user_info(username)
    assert info is not None
    assert username == info['login']

When running tests from developer machine, it will call the API for the first time and uses the cached response for subsequent API calls. On CI pipeline, it will hit the external API as there won't be any cache.

When the response from external API changes, we need to invalidate the cache. Even if we miss cache invalidation, test cases will fail in CI pipeline before going into production.

Convert Browser Requests To Python Requests For Scraping


Scraping content behind a login page is bit difficult as there are wide variety of authentication mechanisms and web server needs correct headers, session, cookies to authenticate the request.

If we need a crawler which runs everyday to scrape content, then we have to implement authentication mechanism. If we need to quickly scrape content just for once, implementing authentication is an overhead.

Instead, we can manually login to the website, capture an authenticated request and use it for scraping other pages by changing url/form parameters.

From browser developer options, we can capture curl equivalent command for any request from Network tab with copy as cURL option.

Here is one such request.

curl 'http://avilpage.com/dummy' -H 'Cookie: ASPSESSIONIDSABAAQDA=FKOHHAGAFODIIGNNNDFKNGLM' -H 'Origin: http://avilpage.com' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.9,ms;q=0.8,te;q=0.7' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'Cache-Control: max-age=0' -H 'Referer: http://avilpage.com/' -H 'Connection: keep-alive' -H 'DNT: 1' --data 'page=2&category=python' --compressed

Once we get curl command, we can directly convert it to python requests using uncurl.

$ pip install uncurl

Since the copied curl request is in clipboard, we can pipe it to uncurl.

$ clipit -c | uncurl

requests.post("http://avilpage.com/dummy",
    data='page=2&category=python',
    headers={
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "en-US,en;q=0.9,ms;q=0.8,te;q=0.7",
        "Cache-Control": "max-age=0",
        "Content-Type": "application/x-www-form-urlencoded",
        "Origin": "http://avilpage.com",
        "Referer": "http://avilpage.com/",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
    },
    cookies={
        "ASPSESSIONIDSABAAQDA": "FKOHHAGAFODIIGNNNDFKNGLM"
    },
)

If we have to use some other programming language, we can use curlconverter to convert curl command to Go or Node.js equivalent code.

Now, we can use this code to get contents of current page and then continue scraping from the urls in it.

Running Django Web Apps On Android Devices


When deploying a django webapp to Linux servers, Nginx/Apache as server, PostgreSQL/MySQL as database are preferred. For this tutorial, we will be using django development server with SQLite database.

First install SSHDroid app on Android. It will start ssh server on port 2222. If android phone is rooted, we can run ssh on port 22.

Now install QPython. This comes bundled with pip, which will install required python packages.

Instead of installing these two apps, we can use Termux, GNURoot Debian or some other app which provides Linux environment in Android. These apps will provide apt package manager, which can install python and openssh-server packages.

I have used django-bookmarks, a simple CRUD app to test this setup. We can use rsync or adb shell to copy django project to android.

rsync -razP django-bookmarks :$USER@$HOST:/data/local/

Now ssh into android, install django and start django server.

$ ssh -v $USER@$HOST
$ python -m pip install django
$ cd /data/local/django-bookmarks
$ python manage.py runvserver

This will start development server on port 8000. To share this webapp with others, we will expose it with serveo.

$ ssh -R 80:localhost:8000 serveo.net

Forwarding HTTP traffic from https://incepro.serveo.net
Press g to start a GUI session and ctrl-c to quit.

Now we can share our django app with anyone.

I have used Moto G4 Plus phone to run this app. I have done a quick load test with Apache Bench.

ab -k -c 50 -n 1000  \
-H "Accept-Encoding: gzip, deflate" \
http://incepro.serveo/list/

It is able to server 15+ requests concurrently with an average response time of 800ms.

We can write a simple shell script or ansible playbook to automate this deployment process and we can host a low traffic website on an android phone if required.

Load Testing Celery With Different Brokers


Celery is mainly used to offload work from request/response cycle in web applications and to build pipelines in data processing applications. Lets run a load test on celery to see how well it queues the tasks with various brokers.

Let us take a simple add task and measure queueing time.

import timeit

from celery import Celery

broker = 'memory://'


app = Celery(broker=broker)


@app.task
def add(x, y):
    return x + y


tasks = 1000
start_time = timeit.default_timer()
results = [add.delay(1, 2) for i in range(tasks)]
duration = timeit.default_timer() - start_time
rate = tasks//duration
print("{} tasks/sec".format(str(rate))

On development machine, with AMD A4-5000 CPU, queueing time is as follows

  • memory ---> 400 tasks/sec
  • rabbitmq ---> 300 tasks/sec
  • redis ---> 250 tasks/sec
  • postgres ---> 30 tasks/sec

On production machine, with Intel(R) Xeon(R) CPU E5-2676, queueing time is as follows

  • memory ---> 2000 tasks/sec
  • rabbitmq ---> 1400 tasks/sec
  • redis ---> 1200 tasks/sec
  • postgres ---> 200 tasks/sec

For low/medium traffic webistes and applications, 1000 tasks/second should be fine. For high traffic webistes, there will be multiple servers queueing up the tasks.

Incase if we need to queue the tasks at a higher rate and if we have task arguments before hand, we can chunk the tasks.

tasks = add.chunks(zip(range(1000), range(1000)), 10)

This will divide 1000 tasks into 10 groups of 100 tasks each. As there is no messaging overhead, it can queue any number of tasks in less than a second.

How To Plot Renko Charts With Python?


Renko charts are time independent and are efficient to trade as they eliminate noise. In this article we see how to plot renko charts of any instrument with OHLC data using Python.

To plot renko charts, we can choose a fixed price as brick value or calculate it based on ATR(Average True Range) of the instrument.

There are two types of Renko charts based on which bricks are calculated.

Renko chart - Price movement

First one is based on price movement. In this, we will divide the price movement of current duration by brick size to get the bricks.

Once bricks are obtained, we need to assign the brick colors based on the direction of price movement and then plot rectangles for each available brick.

import pandas as pd
from matplotlib.patches import Rectangle
import matplotlib.pyplot as plt


brick_size = 2


def plot_renko(data, brick_size):
    fig = plt.figure(1)
    fig.clf()
    axes = fig.gca()
    y_max = max(data)

    prev_num = 0

    bricks = []

    for delta in data:
        if delta > 0:
            bricks.extend([1]*delta)
        else:
            bricks.extend([-1]*abs(delta))

    for index, number in enumerate(bricks):
        if number == 1:
            facecolor='green'
        else:
            facecolor='red'

        prev_num += number

        renko = Rectangle(
            (index, prev_num * brick_size), 1, brick_size,
            facecolor=facecolor, alpha=0.5
        )
        axes.add_patch(renko)

    plt.show()


df = pd.read_csv(file)

df['cdiff'] = df['close'] - df['close'].shift(1)
df.dropna(inplace=True)
df['bricks'] = df.loc[:, ('cdiff', )] / brick_size

bricks = df[df['bricks'] != 0]['bricks'].values
plot_renko(bricks, brick_size)

Here is a sample renko chart plotted using the above code.

Renko chart - Period close

In this bricks are calculated based on the close price of the instrument. Calculation of bricks is sligtly complex compared to price movement chart. I have created a seperate package called stocktrends which has this calculation.

from stocktrends import Renko

renko = Renko(df)
renko.brick_size = 2
data = renko.get_ohlc_data()
print(data.tail())

This will give OHLC data for the renko chart. Now we can use this values to plot the charts as mentioned above.