The YMCA Detector — Computer Vision Division
Last week in Uruguay we celebrated a very Uruguayan holiday: ‘La noche de la nostalgia’, or nostalgia night. This basically means adults get to play teens for a night and go out drinking and dancing to the music they listened to when they were young. Even though that definition could apply to any type of old music, nostalgia night is centered mostly on 70s and 80s disco and glam rock, so you can dance to anything from Bon Jovi to Twisted Sister to the Bee Gees or Kiss. One of the most iconic 70s songs here is Village People’s ‘YMCA’, which comes with its own little dance: mimicking with your arms the letters spelled out in the song.
We’re suckers for a good song, and with a bit of spare time on our hands we decided to build an AI model capable of recognizing the letters spelled by the dancers’ arms. To do this we explored MediaPipe, Google’s suite of high-fidelity machine learning solutions, where we came across MediaPipe Pose, a solution that estimates a person’s pose from a given input image.
MediaPipe Pose estimates the pose of a person in a frame by finding 33 landmark points on the body and then connecting them into a skeleton, as below:
With the skeleton in place we can calculate many useful features to classify a given pose. For example, to check whether a person is kneeling in a picture, we could ask whether the angle at the knee, formed by the hip, knee and ankle, is smaller than a threshold, for instance 100°. In our case, we first tried using the angle formed by the elbows, wrists and index fingers, but found the algorithm didn’t perform quite as well as we would’ve hoped. So we instead studied the angles formed by the shoulders, elbows and wrists to determine when the dancer was executing each letter. Below are a few trial-and-error shots, some successful sample cases and a few bloopers of our staff having fun with the YMCA detector.
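For the curious, here is a minimal sketch of the kind of angle check described above. It is not our actual detector code: the helper names `joint_angle` and `is_kneeling` are made up for illustration, and it assumes each landmark has already been reduced to an (x, y) pair in image coordinates. The 100° default is just the example threshold mentioned earlier.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b, formed by the segments b->a and b->c.

    Each point is an (x, y) pair. Hypothetical helper, for illustration only.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("the middle point coincides with one of the end points")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def is_kneeling(hip, knee, ankle, threshold_deg=100.0):
    """True when the knee angle is smaller than the threshold (assumed 100 degrees)."""
    return joint_angle(hip, knee, ankle) < threshold_deg
```

The same helper works for the arms: `joint_angle(shoulder, elbow, wrist)` gives the elbow angle, which is close to 180° for a straight arm and gets smaller as the arm bends. Which angle ranges map to which letter is something you would have to tune on your own footage.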