NEVERFEAR.org - OS X Screen capture from Python/PyObjC

Looking through the unanswered Python questions on StackOverflow, I found one that seemed interesting: "Python Get Screen Pixel Value in OS X". It asks how to access screen pixel values without the overhead of calling the screencapture command and then loading the resulting image.

After a bit of searching, I found that the best-supported way of grabbing a screenshot is the CoreGraphics API (part of Quartz), specifically CGWindowListCreateImage.

Since CoreGraphics is a C-based API, the code maps almost directly to Python function calls. It's also simplified a bit, because PyObjC handles most of the memory management (when the wrapping Python object goes out of scope, the underlying object is freed).

Getting the image

After finding some sample iOS code with sane arguments (which can also be found via Apple's docs), I ended up with a CGImage containing the screenshot:

>>> import Quartz.CoreGraphics as CG
>>> image = CG.CGWindowListCreateImage(CG.CGRectInfinite, CG.kCGWindowListOptionOnScreenOnly, CG.kCGNullWindowID, CG.kCGWindowImageDefault)
>>> print image
<CGImage 0x106b8eff0>

Hurray. We can get the width/height of the image with help from this SO question:

>>> width = CG.CGImageGetWidth(image)
>>> height = CG.CGImageGetHeight(image)

Extracting pixel values

Then it was a case of working out how to extract the pixel values, which took far longer than all of the above. The simplest way I found of doing this is:

1. Use CGImageGetDataProvider to get an intermediate representation of the data
2. Pass the DataProvider to CGDataProviderCopyData. In Python this returns a bytes-like object (really a byte array of 8-bit unsigned chars), suitable for unpacking with the handy struct module
3. Calculate the correct offset for a given (x,y) coordinate, as described at http://www.markj.net/iphone-uiimage-pixel-color/

Like so:

>>> prov = CG.CGImageGetDataProvider(image)
>>> data = CG.CGDataProviderCopyData(prov)
>>> print prov
<CGDataProvider 0x7fc19b1022f0>
>>> print type(data)
<objective-c class __NSCFData at 0x7fff78073cf8>

...and calculate the offset:

>>> x, y = 100, 200 # pixel coordinate to get value for
>>> offset = 4 * ((width*int(round(y))) + int(round(x)))
>>> print offset
1344400
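
The formula above assumes that every row occupies exactly 4 * width bytes. Image rows can be padded, and CoreGraphics reports the real row length through CGImageGetBytesPerRow, so a more defensive offset calculation might look like the sketch below. The function name pixel_offset and its parameters are hypothetical, the 1680 is simply the width implied by the offset printed above, and the padded example uses made-up numbers rather than values from a real capture.

```python
def pixel_offset(x, y, bytes_per_row, bytes_per_pixel=4):
    """Byte offset of pixel (x, y) in a row-major pixel buffer.

    bytes_per_row is the full length of one row including any padding,
    for example the value returned by CG.CGImageGetBytesPerRow(image).
    """
    return int(round(y)) * bytes_per_row + int(round(x)) * bytes_per_pixel


# Unpadded rows of 1680 pixels: identical to 4 * (width * y + x)
unpadded = pixel_offset(100, 200, bytes_per_row=4 * 1680)  # 1344400

# Hypothetical padded rows: a 101-pixel-wide image stored in 448-byte rows
padded = pixel_offset(100, 2, bytes_per_row=448)  # 1296
```

When the rows are not padded the two formulas agree, so this only changes the result for images whose row length differs from 4 * width.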

Finally, we can unpack the pixels at that offset with struct.unpack_from - B is an unsigned char:

>>> import struct
>>> b, g, r, a = struct.unpack_from("BBBB", data, offset=offset)
>>> print (r, g, b, a)
(23, 23, 23, 255)

Note that the values are stored as BGRA (not RGBA).
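
If every pixel is needed in RGBA order, the channels can also be swapped for the whole buffer at once instead of one pixel at a time. The following is only a sketch: bgra_to_rgba is a hypothetical helper, it assumes the buffer length is a multiple of four, and it assumes the captured data can be passed to bytearray(), which I have only tried here on a small synthetic buffer.

```python
def bgra_to_rgba(data):
    """Return a copy of a BGRA byte buffer with channels reordered to RGBA.

    Assumes len(data) is a multiple of 4 (one byte per channel).
    """
    src = bytearray(data)
    out = bytearray(src)       # start from a copy: green and alpha stay put
    out[0::4] = src[2::4]      # red moves to the first slot
    out[2::4] = src[0::4]      # blue moves to the third slot
    return bytes(out)


# Two synthetic pixels, stored as B, G, R, A
sample = bytearray([10, 20, 30, 255, 1, 2, 3, 128])
converted = bytearray(bgra_to_rgba(sample))
```

Here converted holds the same two pixels as R, G, B, A, and the original buffer is left unchanged.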

Verification, and code

To verify this wasn't generating nonsense values, I used the nice and simple pngcanvas to write the screenshot to a PNG file. pngcanvas is a useful module because it's pure Python and a single self-contained .py file, which makes it much lighter weight than something like PIL and good for when you just want to write pixels to an image file.
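
For a check with no third-party module at all, the same idea can be sketched with the binary PPM (P6) format, which is just a short text header followed by raw RGB bytes. This is not what I used for the post; rgb_rows_to_ppm and fake_pixel are hypothetical names, and fake_pixel stands in for a real pixel lookup.

```python
def rgb_rows_to_ppm(width, height, get_pixel):
    """Build a binary PPM (P6) image from a pixel callback.

    get_pixel(x, y) must return an (r, g, b, a) tuple of 0-255 integers.
    The alpha value is dropped because PPM has no alpha channel.
    """
    body = bytearray()
    for y in range(height):        # PPM stores rows from top to bottom
        for x in range(width):
            r, g, b, _alpha = get_pixel(x, y)
            body.extend((r, g, b))
    header = ("P6\n%d %d\n255\n" % (width, height)).encode("ascii")
    return header + bytes(body)


def fake_pixel(x, y):
    """Stand-in for a real lookup: a tiny made-up gradient."""
    return (x * 255, y * 255, 0, 255)


ppm = rgb_rows_to_ppm(2, 2, fake_pixel)
```

With the ScreenPixel class from the full listing, the call would presumably be rgb_rows_to_ppm(sp.width, sp.height, sp.pixel), with the result written to a file opened in "wb" mode.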

The performance was definitely better than the screencapture solution. The screencapture command took about 80ms to write a TIFF file, then there would be additional time to open and parse the TIFF file in Python. The PyObjC code takes about 70ms to take the screenshot and have the values accessible to Python.

Finally, the result. It is best to view the code on my StackOverflow answer, as there might be other, better answers there, or edits to the code.

I'll include the code here too, for completeness' sake:

import time
import struct
 
import Quartz.CoreGraphics as CG
 
 
class ScreenPixel(object):
    """Captures the screen using CoreGraphics, and provides access to
    the pixel values.
    """
 
    def capture(self, region = None):
        """region should be a CGRect, something like:
 
        >>> import Quartz.CoreGraphics as CG
        >>> region = CG.CGRectMake(0, 0, 100, 100)
        >>> sp = ScreenPixel()
        >>> sp.capture(region=region)
 
        The default region is CG.CGRectInfinite (captures the full screen)
        """
 
        if region is None:
            region = CG.CGRectInfinite
        else:
            # TODO: Odd widths cause the image to warp. This is likely
            # caused by offset calculation in ScreenPixel.pixel, and
            # could be modified to allow odd widths
            if region.size.width % 2 > 0:
                emsg = "Capture region width should be even (was %s)" % (
                    region.size.width)
                raise ValueError(emsg)
 
        # Create screenshot as CGImage
        image = CG.CGWindowListCreateImage(
            region,
            CG.kCGWindowListOptionOnScreenOnly,
            CG.kCGNullWindowID,
            CG.kCGWindowImageDefault)
 
        # Intermediate step, get pixel data as CGDataProvider
        prov = CG.CGImageGetDataProvider(image)
 
        # Copy data out of CGDataProvider, becomes string of bytes
        self._data = CG.CGDataProviderCopyData(prov)
 
        # Get width/height of image
        self.width = CG.CGImageGetWidth(image)
        self.height = CG.CGImageGetHeight(image)
 
    def pixel(self, x, y):
        """Get pixel value at given (x,y) screen coordinates
 
        Must call capture first.
        """
 
        # Pixel data is unsigned char (8bit unsigned integer),
        # and there are four (blue, green, red, alpha)
        data_format = "BBBB"
 
        # Calculate offset, based on
        # http://www.markj.net/iphone-uiimage-pixel-color/
        offset = 4 * ((self.width*int(round(y))) + int(round(x)))
 
        # Unpack data from the byte buffer into Python integers
        b, g, r, a = struct.unpack_from(data_format, self._data, offset=offset)
 
        # Return BGRA as RGBA
        return (r, g, b, a)
 
 
if __name__ == '__main__':
    # Timer helper-function
    import contextlib
 
    @contextlib.contextmanager
    def timer(msg):
        start = time.time()
        yield
        end = time.time()
        print "%s: %.02fms" % (msg, (end-start)*1000)
 
 
    # Example usage
    sp = ScreenPixel()
 
    with timer("Capture"):
        # Take screenshot (takes about 70ms for me)
        sp.capture()
 
    with timer("Query"):
        # Get pixel value (takes about 0.01ms)
        print sp.width, sp.height
        print sp.pixel(0, 0)
 
 
    # To verify screen-cap code is correct, save all pixels to PNG,
    # using http://the.taoofmac.com/space/projects/PNGCanvas
 
    from pngcanvas import PNGCanvas
    c = PNGCanvas(sp.width, sp.height)
    for x in range(sp.width):
        for y in range(sp.height):
            c.point(x, y, color = sp.pixel(x, y))
 
    with open("test.png", "wb") as f:
        f.write(c.dump())