A very common problem in software engineering is optimising away the cost of constructing and destroying objects. I don't mean objects exclusively in the sense of object-oriented programming; I mean data objects in general, such as a large array or a struct.
If you only need an object for a short time, it seems wasteful to construct it, initialise it, then destroy and discard it, especially as memory allocation can be expensive.
The most common approach to minimising this cost is the object pool. An object pool is a collection of pre-constructed, and sometimes pre-initialised, objects. When a program wants an object for a short period it can check one out of the pool, configure it as needed, use it, and then check it back in.
This allows programs to borrow objects for a task without having to construct a new object every time.
I was playing around and decided to do some profiling, comparing the speed of traditional construction and destruction versus a pooled approach.
The first thing I needed was an object pool implementation.
number = 1000000  # benchmark iteration count, also used by the timing code below

class ObjectPool(object):
    init_size = number / 2
    expand_by = int(init_size * 0.1)
    pool_class = object

    def __init__(self, *args, **kwargs):
        # Remember the constructor arguments so we can build more
        # objects on demand, then pre-construct the initial pool.
        self.args = args
        self.kwargs = kwargs
        self.pool = [self.pool_class(*self.args, **self.kwargs)
                     for i in xrange(self.init_size)]

    def checkout(self):
        """Take out an item, expanding the pool if it is empty."""
        try:
            item = self.pool.pop()
        except IndexError:
            self.expand(self.expand_by)
            item = self.pool.pop()
        return item

    def checkin(self, obj):
        """Return an item to the pool."""
        return self.pool.append(obj)

    def expand(self, n):
        """Expand the pool by n items."""
        for i in xrange(n):
            self.pool.append(
                self.pool_class(*self.args, **self.kwargs)
            )
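To make the borrowing workflow concrete, here's a small usage sketch. The Point class and the small pool sizes are made up for illustration; they aren't part of the benchmark code.

class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

class PointPool(ObjectPool):
    pool_class = Point
    init_size = 10  # keep the demo pool small
    expand_by = 5

pool = PointPool()
p = pool.checkout()                   # borrow a pre-constructed Point
p.x, p.y = 3, 4                       # configure it for this task
distance_sq = p.x * p.x + p.y * p.y  # use it
pool.checkin(p)                       # hand it back for the next borrower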
You'll notice that this is a self-expanding pool: when all the objects are checked out and another check-out request comes in, the pool grows. This gives a more natural program flow, at the potential expense of permanently holding on to a lot of memory.
Another approach is to raise an error when all objects are checked out. This lets you cap your memory usage, at the expense of the program having to handle that error whenever it needs an object. If the pool is fully in use, the program could fall back to traditional object creation. You can replace the checkout method like this to implement that approach:
def checkout(self):
    """Take out an item"""
    try:
        item = self.pool.pop()
    except IndexError:
        item = self.pool_class(*self.args, **self.kwargs)
    return item
Note that all we do here is create an independent object and hand it out. If the calling process then decided to check in that traditionally created object, we'd end up growing the pool and increasing memory usage anyway. So really you should return some sort of error condition, but that's beside the point of this post.
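For completeness, the stricter error-raising approach mentioned above could look something like the following sketch. PoolExhaustedError is a name I've invented for illustration; any exception class would do.

class PoolExhaustedError(Exception):
    """Raised when every pooled object is checked out."""

def checkout(self):
    """Take out an item, refusing to grow the pool."""
    try:
        return self.pool.pop()
    except IndexError:
        raise PoolExhaustedError("all pooled objects are in use")

The caller can then catch the error and decide for itself whether to wait, give up, or fall back to traditional construction.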
The first class I decided to compare was Python's built-in object class. This should be the most primitive of classes and involve minimal memory allocation.
import timeit

t1 = timeit.timeit(
    stmt="object()",
    number=number)
print "Init object() %d times = %.2f usec/pass" % (
    number, 1e6 * t1 / number)

t2 = timeit.timeit(
    stmt="pool.checkin(pool.checkout())",
    setup="from %s import ObjectPool\npool = ObjectPool()" % __name__,
    number=number)
print "Using pools for object() %d times = %.2f usec/pass" % (
    number, 1e6 * t2 / number)
The results:
Init object() 1000000 times = 0.31 usec/pass
Using pools for object() 1000000 times = 1.91 usec/pass
Pools didn't do us much good there; in fact the pooled version was roughly six times slower.
Next I decided to try a simple object with a single attribute.
class OneProperty(object):
    def __init__(self, a):
        self.a = a

class OnePropertyPool(ObjectPool):
    pool_class = OneProperty

t3 = timeit.timeit(
    stmt="OneProperty(0)",
    setup="from %s import OneProperty" % __name__,
    number=number)
print "Init OneProperty(0) %d times = %.2f usec/pass" % (
    number, 1e6 * t3 / number)

t4 = timeit.timeit(
    stmt="pool.checkin(pool.checkout())",
    setup="from %s import OnePropertyPool\npool = OnePropertyPool(0)" % __name__,
    number=number)
print "Using pools for OneProperty(0) %d times = %.2f usec/pass" % (
    number, 1e6 * t4 / number)
The results:
Init OneProperty(0) 1000000 times = 1.45 usec/pass
Using pools for OneProperty(0) 1000000 times = 1.98 usec/pass
Once again, pooling is less efficient for this simple object.
Perhaps we need a bigger object that requires more memory? So next I tried one with several attributes.
class ManyProperty(object):
    def __init__(self, a, b, c, d, e, f, g, h, i, j):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e
        self.f = f
        self.g = g
        self.h = h
        self.i = i
        self.j = j

class ManyPropertyPool(ObjectPool):
    pool_class = ManyProperty

t5 = timeit.timeit(
    stmt="ManyProperty(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)",
    setup="from %s import ManyProperty" % __name__,
    number=number)
print "Init ManyProperty(0, 1, 2, 3, 4, 5, 6, 7, 8, 9) %d times = %.2f usec/pass" % (
    number, 1e6 * t5 / number)

t6 = timeit.timeit(
    stmt="pool.checkin(pool.checkout())",
    setup="from %s import ManyPropertyPool\npool = ManyPropertyPool(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)" % __name__,
    number=number)
print "Using pools for ManyPropertyPool(0, 1, 2, 3, 4, 5, 6, 7, 8, 9) %d times = %.2f usec/pass" % (
    number, 1e6 * t6 / number)
The results:
Init ManyProperty(0, 1, 2, 3, 4, 5, 6, 7, 8, 9) 1000000 times = 4.13 usec/pass
Using pools for ManyPropertyPool(0, 1, 2, 3, 4, 5, 6, 7, 8, 9) 1000000 times = 1.93 usec/pass
Finally we've gained some significant ground by using pools: over twice as fast.
Lastly, I tried a much more expensive object to really demonstrate pooling's benefit.
class ExpensiveInit(ManyProperty):
    """This one has an expensive setup."""
    def __init__(self):
        # Building a 300-element list on every construction is
        # comparatively expensive.
        self.data = range(300)

class ExpensiveInitPool(ObjectPool):
    pool_class = ExpensiveInit

t7 = timeit.timeit(
    stmt="ExpensiveInit()",
    setup="from %s import ExpensiveInit" % __name__,
    number=number)
print "Init ExpensiveInit() %d times = %.2f usec/pass" % (
    number, 1e6 * t7 / number)

t8 = timeit.timeit(
    stmt="pool.checkin(pool.checkout())",
    setup="from %s import ExpensiveInitPool\npool = ExpensiveInitPool()" % __name__,
    number=number)
print "Using pools for ExpensiveInitPool() %d times = %.2f usec/pass" % (
    number, 1e6 * t8 / number)
The results:
Init ExpensiveInit() 1000000 times = 8.31 usec/pass
Using pools for ExpensiveInitPool() 1000000 times = 2.31 usec/pass
Conclusion
Object pools can significantly improve your efficiency, but only past a certain point. For simple data objects it's probably not worth it, and profiling can tell you when a data object is worth pooling. For very complex objects, it's very likely that object pools will give you a benefit.
However, you should keep two things in mind. The first is the strategy you employ when every object in the pool is checked out.
The other is that you must remember to initialise properly. Since objects are borrowed and reused across different components, you have to reset an object's state when it is recycled. If you don't, and you make assumptions about the state of an object you've just checked out, you can end up in unexpected situations. Resetting adds a little overhead, but since initialisation is only one part of full construction (allocation being the other), you'll still be more efficient than building objects from scratch.
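One way to handle this (a sketch, not part of the benchmarks above) is to re-run the object's __init__ as it is checked out, so every borrower starts from a freshly initialised state:

class ResettingPool(ObjectPool):
    """A pool variant that reinitialises objects as they are borrowed."""

    def checkout(self):
        item = super(ResettingPool, self).checkout()
        # Re-run the constructor to wipe any state left behind by
        # the previous borrower; this is the "reset" step.
        item.__init__(*self.args, **self.kwargs)
        return item

Re-running __init__ is the blunt instrument; a dedicated reset() method on the pooled class can often do less work, since it only needs to clear state rather than rebuild everything.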