I'm going to run through a simple example in order to demonstrate how to profile and optimize code without any other complications getting in the way.

Let's say we are running some machine learning algorithm that needs to find the distance between two vectors many, many times. This happens often, for example in a k-nearest-neighbor algorithm. Doing this distance calculation repeatedly in pure Python will likely be slow, which makes it a good example of code that starts out inefficient.

Naively we might write a distance function like so:

import math 

def euclid_dist(vec1, vec2):
    if len(vec1) != len(vec2):
        raise Exception("lengths of vectors must match for distance calculation")
    dist = 0
    for i in range(len(vec1)):
        dist = dist + (vec1[i]-vec2[i])**2
    return math.sqrt(dist)

There is nothing wrong with this. Let's benchmark and see what happens.

import math
import random

def euclid_dist(vec1, vec2):
    if len(vec1) != len(vec2):
        raise Exception("lengths of vectors must match for distance")
    dist = 0
    for i in range(len(vec1)):
        dist = dist + (vec1[i]-vec2[i])**2
    return math.sqrt(dist)


def all_pairs_distance(DATABASE):
    for vec1 in DATABASE:
        for vec2 in DATABASE:
            euclid_dist(vec1, vec2)

def ntimes(f, ntimes, *args, **kwargs):
    for _ in range(0,ntimes):
        f(*args, **kwargs)


def make_database():
    DATABASE = [[random.random() for x in range(5000)] for x in range(50)]
    return DATABASE

def main():
    DATABASE = make_database()
    ntimes(all_pairs_distance, 5, DATABASE)

if __name__ == '__main__':
    main()

We simply make a database of 50 random vectors, each of length 5000, and compute the pairwise distance between all of them [including themselves] 5 times.

Now to profile [note I chopped off the tail end as there are lots of calls that take up very little time]:

python -m cProfile -s cumtime profileit.py 
         325111 function calls in 38.041 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.003    0.003   38.041   38.041 profileit.py:1(<module>)
        1    0.000    0.000   38.037   38.037 profileit.py:44(main)
        1    0.000    0.000   37.906   37.906 profileit.py:22(ntimes)
        5    0.008    0.002   37.906    7.581 profileit.py:17(all_pairs_distance)
    12500   37.506    0.003   37.898    0.003 profileit.py:7(euclid_dist)
    12552    0.385    0.000    0.385    0.000 {range}
        1    0.095    0.095    0.131    0.131 profileit.py:38(make_database)
   250000    0.034    0.000    0.034    0.000 {method 'random' of '_random.Random' objects}
    37500    0.005    0.000    0.005    0.000 {len}
    12501    0.004    0.000    0.004    0.000 {math.sqrt}
        1    0.000    0.000    0.001    0.001 random.py:40(<module>)
        1    0.001    0.001    0.001    0.001 hashlib.py:55(<module>)

Not surprisingly, of the 38.041 seconds it takes to run the program, 37.898 seconds are spent in the euclidean distance function (12,500 calls: 50 × 50 pairs, repeated 5 times). So while in this situation it's extremely obvious that the distance function is the bottleneck, in other situations it won't be! Profiling lets us instantly see where to focus our attention.
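
If you'd rather not profile the whole script from the command line, you can also profile a single call from inside Python. Here's a minimal sketch (not part of the original script, and assuming the functions above are in scope) using cProfile and pstats:

import cProfile
import pstats

# profile just one pass of the all-pairs loop and dump the stats to a file
db = make_database()
cProfile.runctx('all_pairs_distance(db)', globals(), locals(), 'dist.prof')

# print the ten most expensive entries sorted by cumulative time
pstats.Stats('dist.prof').sort_stats('cumulative').print_stats(10)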

Great, so we know the distance function is slow. Is there a way to speed it up?

That brings me to part 1: rewrite your algorithms/data structures

As we are naively computing the distance between every pair of vectors, we immediately see we are computing the same distances over and over: each pass makes 50 × 50 = 2,500 calls, and we repeat the whole thing 5 times for 12,500 calls in total. Why do this? Wouldn't it make more sense to cache the result after computing each distance once? Yes it would.

So we change two lines of code to cache the results of the distance function given two vectors. The nice part about this memoize decorator is that it is generic and will work on (most) functions.

from memo import memoize
import math
import random

@memoize
def euclid_dist(vec1, vec2):
    if len(vec1) != len(vec2):
        raise Exception("lengths of vectors must match for distance")
    dist = 0
    for i in range(len(vec1)):
        dist = dist + (vec1[i]-vec2[i])**2
    return math.sqrt(dist)


def all_pairs_distance(DATABASE):
    for vec1 in DATABASE:
        for vec2 in DATABASE:
            euclid_dist(vec1, vec2)

def ntimes(f, ntimes, *args, **kwargs):
    for _ in range(0,ntimes):
        f(*args, **kwargs)

def make_database():
    DATABASE = [[random.random() for x in range(5000)] for x in range(50)]
    return DATABASE



def main():
    DATABASE = make_database()
    ntimes(all_pairs_distance, 5, DATABASE)

if __name__ == '__main__':
    main()

So how does the magic memoize decorator work?

It stores the result of the function in a dictionary where the key is the function input and the value is the function output. If the function is pure [ie: depends only on its inputs and not external state], then every call with the same inputs returns the same outputs, so caching the results in a dictionary immediately speeds up execution at the cost of memory usage.

# memo.py

def memoize(function):
    memo = {}
    def wrapper(*args):
        if args in memo:
            return memo[args]
        else:
            rv = function(*args)
            memo[args] = rv 
            return rv 
    return wrapper 

Disclaimer: this is not entirely true.
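
To see the caching in action on its own, here is a tiny usage sketch (a hypothetical example, not from the post). It also shows why the decorated function needs to be pure:

@memoize
def slow_square(x):
    print("computing %d squared" % x)
    return x * x

slow_square(4)   # prints "computing 4 squared" and returns 16
slow_square(4)   # cache hit: returns 16 without recomputing

# if slow_square read external state (a global, a file, the clock), the cached
# value could silently go stale -- only memoize pure functions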

How much does it increase execution speed? Benchmark it.

python -m cProfile -s cumtime profileit.py 
         315112 function calls in 16.168 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002   16.168   16.168 profileit.py:1(<module>)
        1    0.000    0.000   16.165   16.165 profileit.py:45(main)
        1    0.000    0.000   16.018   16.018 profileit.py:23(ntimes)
        5    0.271    0.054   16.018    3.204 profileit.py:18(all_pairs_distance)
    12500    6.794    0.001   15.746    0.001 memo.py:33(wrapper)
     2500    8.022    0.003    8.140    0.003 profileit.py:7(euclid_dist)
    12500    0.806    0.000    0.811    0.000 memo.py:16(convert)
        1    0.103    0.103    0.148    0.148 profileit.py:39(make_database)
     2552    0.117    0.000    0.117    0.000 {range}
   250000    0.043    0.000    0.043    0.000 {method 'random' of '_random.Random' objects}
    25000    0.005    0.000    0.005    0.000 {method 'append' of 'list' objects}
     2501    0.001    0.000    0.001    0.000 {math.sqrt}
        1    0.000    0.000    0.001    0.001 random.py:40(<module>)
     7500    0.001    0.000    0.001    0.000 {len}
        1    0.001    0.001    0.001    0.001 hashlib.py:55(<module>)

Notice we've cut execution time by more than half! The wrapper is still called 12,500 times, but the expensive distance computation now runs only 2,500 times, once per ordered pair of vectors; every other call is answered from the cache, so we save a bunch of time at the cost of space.

# memo.py

#is it really this easy?
def memoize(function):
    memo = {}
    def wrapper(*args):
        if args in memo:
            return memo[args]
        else:
            rv = function(*args)
            memo[args] = rv 
            return rv 
    return wrapper 

So you're probably curious what magic is happening in memo.py, since I said it wasn't entirely true that the code was that simple.

The problem is that we can only use hashable objects as dictionary keys. If a data structure is mutable, its contents can change after it has been inserted and the lookup would obviously break, so Python makes its mutable containers unhashable. In Python, lists and dictionaries are mutable, so we must convert them to tuples and immutable dictionaries.
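
A quick illustration (an assumed example, not from the post):

cache = {}
cache[(1.0, 2.0, 3.0)] = 3.74        # fine: tuples are hashable
try:
    cache[[1.0, 2.0, 3.0]] = 3.74    # lists are not
except TypeError as e:
    print(e)                         # unhashable type: 'list'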

Tuples are a built-in Python type, but immutable dictionaries are not, so we must create our own.

class imdict(dict):
    def __hash__(self):
        return id(self)

    def _immutable(self, *args, **kws):
        raise TypeError('object is immutable')

    __setitem__ = _immutable
    __delitem__ = _immutable
    clear       = _immutable
    update      = _immutable
    setdefault  = _immutable
    pop         = _immutable
    popitem     = _immutable

#convert arguments to be immutable
def convert(args):
    newargs = []
    for arg in args:
        t = type(arg)
        if t==list or t==dict:
            if t == list:
                imarg = tuple(arg)
            if t == dict:
                imarg = imdict(arg)
            newargs.append(imarg)
        else:
            newargs.append(arg)

    return tuple(newargs)

def memoize(function):
    memo = {}
    def wrapper(*args):
        imargs = convert(args)
        if imargs in memo:
            return memo[imargs]
        else:
            rv = function(*imargs)
            memo[imargs] = rv 
            return rv 
    return wrapper

So it is a bit uglier, but the basic logic is still there. You now have a plug-in memoize decorator that is neither tested nor production ready, so please don't use it in a real application.
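
As a quick sanity check, here is a small usage sketch (a hypothetical example, not from the post) showing that the full decorator now handles list arguments:

@memoize
def total(values):
    print("computing")    # only printed on a cache miss
    return sum(values)

total([1, 2, 3])    # prints "computing" and returns 6
total([1, 2, 3])    # the list was converted to a tuple, so this is a cache hit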

And that concludes part 1 of the series: benchmarking and rewriting the logic of the code in pure Python.

Next time I'll talk about using existing Python C libraries to speed up code execution.
