Finding Plaid Shirts with the Amazon Rekognition API

Is that flannel you are wearing?

I’m pretty sure I’ve been bitten by the machine learning bug! The past few weeks, I’ve had the opportunity to work with Amazon Rekognition. It’s a newfangled deep-learning image recognition API that is part of AWS, and it’s been fun to play around with. You feed it images and it sends back attempts to detect objects, faces, text and other things you’d want to detect. There’s no need to train your own model or run all sorts of specialized software. Just sign up for AWS, set up a client on your machine and start sending the API images to analyze. It takes around 30 minutes to set up a simple proof of concept and get an idea of the API’s features.

What is Rekognition?

First, let’s go over a little bit about what Rekognition is for those who aren’t familiar. Rekognition is an API for deep-learning-based image and video analysis. You send it photos or video and it can identify objects, people, faces, scenes, text and other stuff. Rekognition’s deep-learning algorithm will attempt to label objects in the image.

There are four types of labeling currently supported: objects, scenes, faces and text.

I was blown away by how many objects it could label and the granularity of its classifications. My expectations of the API’s accuracy were low at first, but I was quickly proven wrong. For instance, the API can distinguish between different breeds of dogs. It knows there’s a difference between a dung beetle and a cockroach. It is also great at finding faces and labeling the parts of a face: nose, eye, eyebrow and pupil locations are just a few. There was a bit of uncertainty when labeling emotions; for some reason, it always classified my emotion as ‘confused’. As time goes by it will only get better at identification. One thing it never fails to label is flannel/plaid. If there is plaid in an image, Rekognition will label it like there is no tomorrow.

It can also analyze streaming video for faces in real time. I haven’t tried video yet, but at work we have an AWS DeepLens preordered. It has specialized hardware for deep learning and will be able to use custom detection models.

Let’s Start Tinkering

It is easy to start tinkering with Rekognition. We’ll use the AWS CLI and an S3 bucket to get started: we’ll upload images to the S3 bucket and pass them to the API via the CLI. When the API is done processing an image, it returns JSON.

To begin, we will set up a simple environment to send images to the API.

  1. Set up the AWS CLI
  2. Create an S3 bucket for the images to be labeled
  3. Upload those images
  4. Use the CLI to run Rekognition on the bucket’s images

If you don’t have the CLI set up, there are plenty of third-party guides; the AWS docs aren’t known for their quality.

Next, you’ll need to create an S3 bucket with public read permissions. I used the GUI on the AWS console to make one in a region close to me, in this case us-east-1. Take note of your bucket’s region, because Rekognition needs it to find the right image. Once the bucket was ready, I uploaded a few images to it and made sure they were publicly accessible.
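
If you’d rather script the bucket setup than click through the console, here’s a minimal sketch using Boto3, the Python SDK I mention at the end of this post. The bucket and file names are just the ones from this walkthrough; swap in your own.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# us-east-1 is the default region, so no CreateBucketConfiguration is needed;
# for any other region you'd pass one with a LocationConstraint.
s3.create_bucket(Bucket="tinker-bucket")

# Upload the image with a public-read ACL, matching the bucket setup above.
s3.upload_file(
    "wing-woman.jpg",
    "tinker-bucket",
    "wing-woman.jpg",
    ExtraArgs={"ACL": "public-read"},
)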

Look at that tasty pic:

[Image: woman eating chicken wings with face covered in hot sauce]

We will use cutting-edge technology to analyze this image.


# The base CLI command for Rekognition is: aws rekognition
# To detect labels in an image, use: aws rekognition detect-labels
# We need to specify an S3 bucket and the proper region
# The bucket is described with escaped JSON
# The region uses the abbreviations used across AWS
# This page has all the region shortnames if you forgot:
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
# This will return JSON describing the image
aws rekognition detect-labels \
    --image "{\"S3Object\":{\"Bucket\":\"tinker-bucket\",\"Name\":\"wing-woman.jpg\"}}" \
    --region us-east-1

Here’s a sample of the JSON that gets returned. Some of the results are pretty funny:

{
  "Labels": [
    {
      "Name": "Human",
      "Confidence": 99.27762603759766
    },
    {
      "Name": "Corn",
      "Confidence": 94.26850891113281
    },
    {
      "Name": "Flora",
      "Confidence": 94.26850891113281
    },
    {
      "Name": "Grain",
      "Confidence": 94.26850891113281
    }
  ]
}

Here’s a command that uses face detection instead of object detection. Face detection returns an array with an entry for each face in the image, up to 100 faces per image. For each face, it also returns an array of potential emotions. The emotion detection seems hit or miss at times, but the rest is remarkably accurate.
# Note the --attributes ALL argument at the end
# Without this the array of emotions wouldn't be returned
aws rekognition detect-faces --image "{\"S3Object\":{\"Bucket\":\"tinker-bucket\",\"Name\":\"wing-woman.jpg\"}}" --region us-east-1 --attributes ALL

// the result has been shortened
{
  "FaceDetails": [
    {
      "BoundingBox": {
        "Width": 0.3016826808452606,
        "Height": 0.46822741627693176,
        "Left": 0.359375,
        "Top": 0.15793386101722717
      },
      "AgeRange": {
        "Low": 23,
        "High": 38
      },
      "Smile": {
        "Value": false,
        "Confidence": 74.72581481933594
      },
      "Eyeglasses": {
        "Value": false,
        "Confidence": 50.551666259765625
      },
      "Emotions": [
        {
          "Type": "HAPPY",
          "Confidence": 38.40011215209961
        },
        {
          "Type": "SAD",
          "Confidence": 3.1377792358398438
        },
        {
          "Type": "DISGUSTED",
          "Confidence": 1.5140950679779053
        }
      ],
      "Landmarks": [
        {
          "Type": "eyeLeft",
          "X": 0.4536619782447815,
          "Y": 0.3465670645236969
        },
        {
          "Type": "eyeRight",
          "X": 0.5664145946502686,
          "Y": 0.3220127522945404
        }
      ]
    }
  ]
}

It returned a bounding box for the person’s face, a potential age range, whether or not they’re wearing glasses, an array of potential emotions and coordinates for facial landmarks like the person’s eyes.

One thing to note is that the landmark coordinates are formatted as decimals between 0.0 and 1.0. To get values in pixels, multiply ‘X’ coordinates by the source image’s width and ‘Y’ coordinates by its height.
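
For example, here’s a quick sketch of that conversion in Python. The 640x480 dimensions are just an assumed image size; use your source image’s actual dimensions.

# Rekognition landmark coordinates are normalized to the range [0.0, 1.0].
# Assume a 640x480 source image; substitute your image's real size.
width, height = 640, 480

eye_left = {"Type": "eyeLeft", "X": 0.4536619782447815, "Y": 0.3465670645236969}

# Scale the normalized coordinates by the image dimensions to get pixels.
px = int(eye_left["X"] * width)   # ~290
py = int(eye_left["Y"] * height)  # ~166

print(f"eyeLeft is at pixel ({px}, {py})")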

Pretty Neat

Rekognition is pretty impressive considering how simple it is to set up and start using. The CLI is a quick way in, although it doesn’t expose every part of the API. For instance, you are stuck uploading images to S3 whenever you want them processed. Using an SDK gives you more control over the API and lets you integrate it seamlessly with applications you are writing. I have been using Boto3, the Python SDK, and have been very pleased; it has methods for pretty much any AWS product. At some point I’ll post about using it to alter S3 buckets.
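
As a taste of what the SDK opens up, here’s a minimal Boto3 sketch that runs label detection on a local file by sending the raw image bytes, skipping the S3 upload entirely. The filename is just the one from this post.

import boto3

# Assumes the same credentials and region configured for the CLI earlier.
client = boto3.client("rekognition", region_name="us-east-1")

# Unlike the CLI examples above, the SDK can accept raw image bytes,
# so there's no need to stage the photo in an S3 bucket first.
with open("wing-woman.jpg", "rb") as image_file:
    response = client.detect_labels(Image={"Bytes": image_file.read()})

# Print each detected label with its confidence score.
for label in response["Labels"]:
    print(f"{label['Name']}: {label['Confidence']:.1f}%")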
