Deep Image Analysis with Ruby and Google Cloud Vision

How would your app change if you could know the contents of your users' photos as they uploaded them? Would you sort products based on the activities in their photos? Match users based on common locations? Suggest travel destinations?

But of course this has been a pipe dream. Image content is something that has historically been hard to identify and process programmatically. It's a bummer. Photos are easily the most common upload to web apps. Perfect knowledge about their content would make those same photos the most informative thing our apps could request. We simply have not had a good way to know what the contents of images are without manually identifying each image on upload.

What is deep image analysis?

Deep learning has been all the rage over the past year, but so far that excitement has come with lots of promises and very few game-changing solutions. From the smoke, though, Google's Cloud Vision API has emerged as one of the first accessible products of the deep learning wave. And it is a doozy.

As you may have guessed, Cloud Vision gives access to respectable image analysis. What is respectable? Well, so far, Cloud Vision can be used to:

  • check if an image contains adult or violent content
  • find major landmarks in your image (Eiffel Tower, White House, etc.)
  • find the faces included in your image
  • run sentiment analysis on those faces
  • list possible tags for the image ("sailing", "hiking", "mountain", etc.)
  • transcribe any text in the image
  • identify company logos in the image
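Under the hood, each item in that list corresponds to a feature type in Cloud Vision's v1 REST API (the `images:annotate` method). For the curious, here is a minimal sketch of the request body, assuming you call the REST endpoint directly with an API key instead of going through a gem. The symbol names (`:faces`, `:text`, and so on) and the `annotate_request_body` helper are hypothetical; only the feature type strings and the overall request shape come from the API.

```ruby
require 'json'

# Cloud Vision v1 feature type behind each capability listed above.
# The symbol keys are illustrative names, not the cloud_vision gem's own.
# Face emotion likelihoods come back as part of FACE_DETECTION results.
FEATURE_TYPES = {
  safe_search: 'SAFE_SEARCH_DETECTION', # adult / violent content
  landmarks:   'LANDMARK_DETECTION',
  faces:       'FACE_DETECTION',
  labels:      'LABEL_DETECTION',
  text:        'TEXT_DETECTION',
  logos:       'LOGO_DETECTION'
}.freeze

# Builds the JSON body to POST to
#   https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
# Raises KeyError for a capability that is not in FEATURE_TYPES.
def annotate_request_body(image_bytes, capabilities, max_results: 10)
  features = capabilities.map do |cap|
    { type: FEATURE_TYPES.fetch(cap), maxResults: max_results }
  end

  {
    requests: [
      {
        # pack('m0') produces strict Base64 (no line breaks)
        image: { content: [image_bytes].pack('m0') },
        features: features
      }
    ]
  }.to_json
end
```

For example, `annotate_request_body(File.binread('awesome_image.jpg'), [:faces, :text])` asks for face and text detection in a single request.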

I would encourage you to pause for a minute and read through that list again. Really think about how this will affect your app.

Cloud Vision results

For starters, head to the Google Cloud Console and generate an API key. Then do yourself a favor and grab my cloud_vision gem. The gem returns the raw API result, which lets us save the complete result before we work with a parsed version. This is huge: with the raw result stored, we never waste time reprocessing an image.

require 'cloud_vision'

image = 'awesome_image.jpg' )

raw_result = api_key ).analyze( image, [:faces, :text] )
result = CloudVision::Parser.parse_analysis( raw_result )
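Saving the raw result can be as simple as keying it on a digest of the image bytes. The sketch below is hypothetical: `RawResultCache` is not part of the cloud_vision gem, and it assumes the raw result is a JSON-serializable Hash (such as the decoded `images:annotate` response). On a cache hit the block is never called, so the same image is never sent to the API twice.

```ruby
require 'digest/sha2'
require 'fileutils'
require 'json'

# Hypothetical on-disk cache for raw Cloud Vision results.
# Each result is stored as <SHA-256 of image bytes>.json in +dir+.
class RawResultCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Returns the stored raw result for these image bytes. On a miss,
  # yields to fetch it (e.g. call the API), writes it to disk, and
  # returns the re-parsed JSON so hits and misses look identical.
  def fetch(image_bytes)
    path = File.join(@dir, "#{Digest::SHA256.hexdigest(image_bytes)}.json")
    return JSON.parse(File.read(path)) if File.exist?(path)

    json = JSON.generate(yield)
    File.write(path, json)
    JSON.parse(json)
  end
end
```

With the gem, the block would wrap the `analyze` call from the snippet above, and the returned Hash would go to the parser. Returning the re-parsed JSON on a miss means callers always see string keys, whether or not the result came from disk.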

And in return we get results like this:

Faces and sentiment analysis. As far as I am concerned this is nuts, but it gets better.

Not pictured is that the image analysis also finds the text "E YACHT WEEK". Do you see said text? Me neither. But upon closer inspection...

Yes. Seriously.
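If you'd rather poke at the raw response yourself, the face emotions and text discussed above live in two arrays of the `images:annotate` response: `faceAnnotations` (one entry per face, with `joyLikelihood`, `sorrowLikelihood`, `angerLikelihood` and `surpriseLikelihood` values such as `VERY_LIKELY`) and `textAnnotations` (the first entry holds all of the detected text, the rest are individual words). Here is a minimal sketch, assuming `raw` is the decoded JSON of a single-image request; `summarize` is a hypothetical helper, not how the gem's `CloudVision::Parser` works.

```ruby
require 'json'

EMOTION_KEYS = %w[joyLikelihood sorrowLikelihood angerLikelihood surpriseLikelihood].freeze

# Pulls per-face emotion likelihoods and the full detected text out of
# one raw images:annotate response. Missing sections yield [] and nil.
def summarize(raw)
  response = raw.fetch('responses', []).first || {}

  faces = response.fetch('faceAnnotations', []).map do |face|
    # "joyLikelihood" => :joy, and so on
    EMOTION_KEYS.to_h { |key| [key.delete_suffix('Likelihood').to_sym, face[key]] }
  end

  # The first textAnnotations entry contains all of the text in the image.
  text = response.fetch('textAnnotations', []).first&.fetch('description', nil)

  { faces: faces, text: text&.strip }
end
```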

Find a reason to use this

Cloud Vision, to me, is one of those game-changing tools that can fundamentally change how we serve our users. I would urge you to play with the technology and think about how to leverage it in your apps.

tl;dr: Deep image analysis is now available to the masses through Google's Cloud Vision API. The cloud_vision gem makes it easy to use.

Written by Ben
Ben is the co-founder of Skyward. He has spent the last 10 years building products and working with startups.