Computer Vision Experiments I

For the 2008-Oct-22 Mindshare, I built a piece of interactive art called the Bubble Thing.  I mounted a projector on a light tree, aimed at the ground.  The projector displayed a simulation of a bunch of bubbles drifting around the screen.

People interacted with the device by “popping” the bubbles with their shadows.  If someone put their hand in between the projector and the ground, they would produce a shadow on the ground which was captured by a camera.

I didn’t photograph this project particularly well:
IMG_1761 IMG_1760

Here’s the text that accompanied the presentation.

The computer vision component of this piece is implemented in C++ using the OpenCV library.

There is a camera mounted half-way up the stand, pointed at the projected image.  The software first corrects for the perspective difference between the camera and the projector.  Before the event, I calibrated the camera by projecting targets on the ground, and clicking on them in the captured image.  The resulting corrected image has the appearance of having been captured by a camera directly overhead.
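
As a rough illustration of the perspective correction (a Python sketch, not the original C++ code), the four clicked target positions and the four known projector positions are enough to solve for a homography.  OpenCV has its own functions for this; the helper names below (`solve_linear`, `find_homography`, `warp_point`) and the sample coordinates are made up for the example.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def find_homography(src, dst):
    """Return 8 coefficients mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def warp_point(h, x, y):
    """Map a camera pixel to projector coordinates using the homography."""
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical calibration data: where the targets appeared in the camera
# image, and where they were drawn by the projector.
clicked = [(102, 80), (530, 95), (600, 420), (60, 400)]
projected = [(0, 0), (640, 0), (640, 480), (0, 480)]
homography = find_homography(clicked, projected)
```

Applying `warp_point` to every pixel of a captured frame would give the "directly overhead" view described above.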

For every captured frame, the software separates the dark pixels (shadow) from light pixels (everything else) into two bins.  This is known as thresholding.  On the small screen, we’re visualizing the two bins: shadows are shown in black and everything else is white.
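
In sketch form, thresholding a grayscale frame looks something like this.  The Python below is only an illustration; the cutoff value of 80 and the tiny sample frame are assumptions, not values from the real system.

```python
def threshold(gray, cutoff=80):
    """Return a mask that is True where a pixel is dark enough to be shadow."""
    return [[pixel < cutoff for pixel in row] for row in gray]


# A made-up 3x4 grayscale frame (0 = black, 255 = white).
frame = [
    [200, 210, 40, 35],
    [190, 50, 45, 220],
    [205, 215, 225, 230],
]
mask = threshold(frame)
```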

We then perform a contour extraction.  The contour extraction makes a scalable vector image of the outline of the dark shadows in the thresholded image.  The extracted contours are shown as green outlines: we’ve drawn a green line between adjacent points in each contour.  The upshot of the contour extraction is that separate connected contours now probably represent different popping instruments (we’re hoping arms and fingers).  We can then analyze each arm contour to determine which of its constituent points represents the tip of the finger.
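
To show why separate shadows end up as separate objects, here is a simplified stand-in written in Python.  It is not contour extraction: it groups shadow pixels into connected regions with a flood fill, whereas a contour extractor returns only the ordered boundary points of each region.  The function name `shadow_regions` and the sample mask are invented for the example.

```python
from collections import deque


def shadow_regions(mask):
    """Group True pixels into 4-connected regions, each a list of (x, y)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            region, queue = [], deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append(region)
    return regions


# Two separate shadows: one on the left edge, one in the lower right.
sample_mask = [
    [True, True, False, False, False],
    [True, False, False, False, False],
    [False, False, False, True, True],
]
regions = shadow_regions(sample_mask)
```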

We assume that each arm enters the screen through one of the four edges (left, top, right, bottom).  By examining the bounding rectangle of the extracted contour, we can determine which edge of the screen that is.  Once we’ve identified the edge, we make an assumption: your finger is outstretched, and is farther from your shoulder than any other part of your arm as seen in the image.  Therefore, we find the part of each contour that is farthest from the edge of the screen and store it in an array.
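
One way to sketch this fingertip heuristic in Python is shown below.  The rule used to pick the entry edge (whichever screen edge the bounding rectangle sits closest to) is an assumption; the description above only says the bounding rectangle is examined.  The function name and the sample contour are likewise made up.

```python
def find_fingertip(contour, width, height):
    """Guess the entry edge of an arm contour and return (edge, fingertip)."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    # Gap between each side of the bounding rectangle and the screen edge.
    gaps = {
        "left": min(xs),
        "top": min(ys),
        "right": (width - 1) - max(xs),
        "bottom": (height - 1) - max(ys),
    }
    edge = min(gaps, key=gaps.get)
    # The fingertip is the contour point farthest from the entry edge.
    if edge == "left":
        tip = max(contour, key=lambda p: p[0])
    elif edge == "right":
        tip = min(contour, key=lambda p: p[0])
    elif edge == "top":
        tip = max(contour, key=lambda p: p[1])
    else:
        tip = min(contour, key=lambda p: p[1])
    return edge, tip


# A made-up arm reaching in from the left side of a 64x48 image.
arm = [(0, 10), (15, 11), (30, 12), (15, 13), (0, 14)]
edge, tip = find_fingertip(arm, width=64, height=48)
```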

It’s important to note that this is only one of many possible algorithms for determining the position of multiple fingers on the screen.  A more robust algorithm might look for contours that match a particular profile (a finger), or fit lines or curves to the shadows themselves.

This array of points is sent via UDP to the bubble world which is implemented in Python using PyGame.
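
The wire format isn’t described here, so the following Python sketch simply assumes one: a 16-bit point count followed by 16-bit x, y pairs in network byte order.  The function names, the host, and the port number are all placeholders.

```python
import socket
import struct


def encode_points(points):
    """Pack points as a uint16 count followed by uint16 x, y pairs."""
    flat = [coord for point in points for coord in point]
    return struct.pack("!H%dH" % len(flat), len(points), *flat)


def decode_points(payload):
    """Inverse of encode_points: return a list of (x, y) tuples."""
    (count,) = struct.unpack_from("!H", payload, 0)
    flat = struct.unpack_from("!%dH" % (2 * count), payload, 2)
    return list(zip(flat[0::2], flat[1::2]))


def send_points(points, host="127.0.0.1", port=9999):
    """Send one datagram containing every fingertip for the current frame."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_points(points), (host, port))


packet = encode_points([(30, 12), (300, 400)])
```

UDP suits this kind of data because a dropped packet is harmless: the next frame’s positions replace it a moment later.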

The bubble world runs a very simple first-order simulation of bouncing particles.  It occasionally reinjects bubbles into the world as necessary.  When it receives the array of finger positions from the computer vision component, it compares each finger position to each of the bubbles.  If the finger is found to lie inside the radius of the bubble, the bubble pops.
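
A minimal Python sketch of those two pieces, the first-order update and the pop test, might look like this.  It leaves out PyGame drawing and bubble reinjection, and the `Bubble` class, the wall-bounce rule, and the numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bubble:
    x: float
    y: float
    vx: float
    vy: float
    radius: float


def step(bubble, dt, width, height):
    """Advance one bubble by a first-order (Euler) step, bouncing off walls."""
    bubble.x += bubble.vx * dt
    bubble.y += bubble.vy * dt
    if bubble.x < bubble.radius or bubble.x > width - bubble.radius:
        bubble.vx = -bubble.vx
        bubble.x = min(max(bubble.x, bubble.radius), width - bubble.radius)
    if bubble.y < bubble.radius or bubble.y > height - bubble.radius:
        bubble.vy = -bubble.vy
        bubble.y = min(max(bubble.y, bubble.radius), height - bubble.radius)


def pop_bubbles(bubbles, fingers):
    """Return the survivors: bubbles with no finger inside their radius."""
    return [
        b for b in bubbles
        if not any((fx - b.x) ** 2 + (fy - b.y) ** 2 <= b.radius ** 2
                   for fx, fy in fingers)
    ]


bubbles = [Bubble(100, 100, 10, 0, 20), Bubble(300, 200, 0, -10, 20)]
survivors = pop_bubbles(bubbles, fingers=[(105, 110)])
```

Comparing squared distances avoids a square root for every finger and bubble pair, which hardly matters at this scale but costs nothing.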

The overhead lights are DMX-controlled, using an ENTTEC USB-to-DMX controller.  The module to set the lighting color is also written in Python, and is actuated from inside the bubble world program.

Have fun popping bubbles.  The current record for bubbles popped in a single evening is held by my cat, who has been in cat heaven for the last couple days as I developed and tested this system at home.

And here are some gratuitous cat pictures.
IMG_1767 IMG_1766