Azure Computer Vision and Form Recognizer API

Author: Jamie Maguire


Azure Cognitive Services are some of the most exciting cloud services out there. With a few lines of code, you can use them to add artificial intelligence (AI) capabilities to your existing software applications. Two standout services are the Computer Vision API and the Form Recognizer API.

In this post I will:

  • Introduce the Computer Vision API and the Form Recognizer API
  • Discuss some of the main capabilities of each API
  • Explain how you can consume both APIs from your own code
  • Show how I used the Computer Vision API to surface actionable insights in Instagram data

When you finish reading this post, you’ll understand a little about these AI services. A more detailed breakdown of Computer Vision API and Form Recognizer API is also available in my new course Building a Form Recognizer using Azure Computer Vision.

Computer Vision API

The Computer Vision API lets you quickly and easily extract rich information from images, videos, and related content. Its features include, but are not limited to:

  • Detecting objects in images
  • Generating human-readable descriptions of images
  • Detecting faces
  • Identifying adult content

After provisioning the Computer Vision API in Azure, you make requests to the endpoint and tell it what attributes you want it to detect. The API will then deliver these in a rich JSON payload to your client application.
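To make that request concrete, here is a minimal Python sketch that builds (but does not send) an Analyze Image call. It assumes the v3.2 REST route and its `visualFeatures` query parameter; the resource URL, key, and image URL are placeholders, and `build_analyze_request` is a hypothetical helper name rather than part of any SDK.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, key, image_url, features):
    """Build (but do not send) an Analyze Image request.

    Assumes the v3.2 REST route; adjust the path for other API versions.
    """
    # Keep commas unescaped so the feature list stays readable in the URL.
    query = urllib.parse.urlencode(
        {"visualFeatures": ",".join(features)}, safe=","
    )
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


request = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com",  # placeholder
    "<your-key>",                                            # placeholder
    "https://example.com/photo.jpg",                         # placeholder
    ["Tags", "Description", "Color"],
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the JSON body of the response can then be read with `json.load`.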

Instagram Graph API, C#, and Computer Vision API

I used the Computer Vision API on a client project to help categorize images on Instagram. The first step was to use the Instagram Graph API and C# to extract images from the platform. Once I had these images, I sent them to the Computer Vision API for processing.

After the Computer Vision API worked its magic, I had a rich data set that included tags used to describe each image, the most prominent colors in each image, and a readable description for each image.
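As an illustration of what that data set can look like, the sketch below pulls tags, dominant colors, and a caption out of a response. The sample payload is invented for this example and only mimics the shape of a v3.2 Analyze Image response; `summarize_analysis` and its confidence cutoff are assumptions, not the code used on the project.

```python
import json

# Invented sample, shaped like a v3.2 Analyze Image response.
SAMPLE_RESPONSE = json.loads("""
{
  "tags": [
    {"name": "grass", "confidence": 0.99},
    {"name": "animal", "confidence": 0.95},
    {"name": "fence", "confidence": 0.41}
  ],
  "description": {
    "tags": ["grass", "animal", "outdoor"],
    "captions": [{"text": "a deer standing in a field", "confidence": 0.87}]
  },
  "color": {
    "dominantColorForeground": "Green",
    "dominantColorBackground": "Green",
    "dominantColors": ["Green"],
    "accentColor": "5A7A2B",
    "isBWImg": false
  }
}
""")


def summarize_analysis(payload, min_confidence=0.5):
    """Reduce an analysis payload to tags, dominant colors, and one caption."""
    captions = payload.get("description", {}).get("captions", [])
    # Pick the caption the service is most confident about, if any.
    best = max(captions, key=lambda c: c["confidence"], default=None)
    return {
        "tags": [
            tag["name"]
            for tag in payload.get("tags", [])
            if tag["confidence"] >= min_confidence
        ],
        "dominant_colors": payload.get("color", {}).get("dominantColors", []),
        "description": best["text"] if best else None,
    }


print(summarize_analysis(SAMPLE_RESPONSE))
```

Filtering on confidence keeps low-certainty tags such as "fence" out of downstream reports; the 0.5 cutoff is arbitrary and worth tuning against your own images.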

Computer Vision API insights were then coupled with Instagram analytics data (e.g., likes, impressions) to provide the client with a better understanding of the type of content that performed better (e.g., an image with wildlife on a green background).
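One simple way to combine the two data sources is to average an engagement rate per tag. The sketch below is an assumption about how such a join could work, not the project's actual pipeline: the posts, the field names, and the likes-per-impression metric are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical posts: tags from image analysis, metrics from Instagram analytics.
posts = [
    {"id": "p1", "tags": ["wildlife", "grass"], "likes": 120, "impressions": 900},
    {"id": "p2", "tags": ["city"], "likes": 40, "impressions": 800},
    {"id": "p3", "tags": ["wildlife"], "likes": 80, "impressions": 1000},
]


def engagement_by_tag(posts):
    """Return (tag, average likes per impression) pairs, best first."""
    rates = defaultdict(list)
    for post in posts:
        if post["impressions"] == 0:
            continue  # avoid dividing by zero for posts with no reach
        rate = post["likes"] / post["impressions"]
        for tag in post["tags"]:
            rates[tag].append(rate)
    averages = {tag: mean(values) for tag, values in rates.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for tag, rate in engagement_by_tag(posts):
    print(f"{tag}: {rate:.3f}")
```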

This information was then stored in an Azure SQL database for further reporting and analytics purposes, and it served to better inform decision-making. You might have a similar business requirement whereby you need to quickly annotate images at scale. The Computer Vision API is a good fit for that kind of work.

Form Recognizer API

Form Recognizer API is a document extraction service that makes it simple for you to digitize your documents. With Form Recognizer, you can send in semi-structured documents, and the API will return structured information. One of the great features of Form Recognizer is that it gives you out-of-the-box models to help you quickly process standard types of documents that you would expect to see in everyday life, such as receipts, invoices, and business cards.
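To show what "structured information" can mean in practice, the sketch below flattens the fields of a receipt result into a plain dictionary. The sample is invented and only modeled on the v2.1 REST response shape (`analyzeResult.documentResults[].fields`) as I understand it; check the field names against the API version you use. `flatten_fields` is a hypothetical helper.

```python
# Invented sample, modeled on a v2.1 prebuilt receipt result.
SAMPLE_RESULT = {
    "status": "succeeded",
    "analyzeResult": {
        "documentResults": [
            {
                "docType": "prebuilt:receipt",
                "fields": {
                    "MerchantName": {
                        "type": "string",
                        "valueString": "Contoso",
                        "text": "Contoso",
                        "confidence": 0.97,
                    },
                    "Total": {
                        "type": "number",
                        "valueNumber": 14.5,
                        "text": "$14.50",
                        "confidence": 0.99,
                    },
                    "TransactionDate": {
                        "type": "date",
                        "valueDate": "2020-06-10",
                        "text": "6/10/2020",
                        "confidence": 0.98,
                    },
                },
            }
        ]
    },
}


def flatten_fields(result):
    """Map each extracted field name to its typed value (or raw text)."""
    flat = {}
    documents = result.get("analyzeResult", {}).get("documentResults", [])
    for document in documents:
        for name, field in document.get("fields", {}).items():
            if field is None:
                continue  # skip fields the model could not find
            # Typed values live under keys such as valueString or valueNumber.
            value_key = next((k for k in field if k.startswith("value")), None)
            flat[name] = field[value_key] if value_key else field.get("text")
    return flat


print(flatten_fields(SAMPLE_RESULT))
```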

If the out-of-the-box models aren’t suitable for your requirements, you also have the option to train the AI with your own documents. Doing this involves creating custom models.

Labeling Tool

To help you quickly build the custom models that the Form Recognizer API uses, Microsoft provides a free, open-source labeling tool. The labeling tool lets you visualize the documents you want to process and annotate them using a click-and-create interface.

Getting your documents loaded into the labeling tool takes a few steps, but I cover them all in my course Building a Form Recognizer using Azure Computer Vision. You can find the labeling tool on GitHub.

Consuming Computer Vision API and Form Recognizer API

You can consume Computer Vision API and Form Recognizer API in a few ways. If you prefer to have full control over the requests and responses, both APIs can be accessed through their respective REST API endpoints. Using the REST APIs means you must manually build the requests and handle the JSON responses that each API returns.
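Part of that manual handling is waiting for asynchronous operations. Form Recognizer's analyze calls, for example, return an `Operation-Location` URL that you poll until the job finishes. The sketch below shows one way to write that loop; it assumes the v2.1 status values ("notStarted", "running", "succeeded", "failed"), and `wait_for_result` and `get_json` are hypothetical names. The HTTP call is injected so the loop can be exercised without a network connection.

```python
import time


def wait_for_result(operation_url, get_json, interval=1.0,
                    max_attempts=30, sleep=time.sleep):
    """Poll an operation URL until it succeeds, fails, or times out.

    get_json is any callable that takes a URL and returns the decoded
    JSON body, for example a thin wrapper around urllib or requests.
    """
    for _ in range(max_attempts):
        payload = get_json(operation_url)
        status = payload.get("status")
        if status == "succeeded":
            return payload
        if status == "failed":
            raise RuntimeError(f"analysis failed: {payload}")
        sleep(interval)  # still notStarted or running; wait and retry
    raise TimeoutError(f"no result after {max_attempts} attempts")


# Demo with a scripted fake instead of a real HTTP call.
_scripted = iter([
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {}},
])
demo = wait_for_result(
    "https://example.invalid/operations/1",  # placeholder URL
    lambda url: next(_scripted),
    sleep=lambda seconds: None,              # skip real waiting in the demo
)
print(demo["status"])
```

The client SDKs wrap this polling for you, which is one reason they are the quicker route when you are getting started.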

If you’re just getting started with Computer Vision API or Form Recognizer API, the quickest way to consume these cognitive services in code is to use the dedicated Client SDKs.

Client SDKs are available in many languages, including C# and JavaScript. These make it simple for developers to create requests, send them, and handle the data returned by Computer Vision API or Form Recognizer API.

Other options, such as running the services in Docker containers, are available if security is a concern or you need to run on-premises.

Use Cases

There are many use cases for Computer Vision API and Form Recognizer API. Some use cases you may consider include the following:

Lost Property

Use Computer Vision API to automatically index scanned images of lost property. These can then power a searchable database and make it quick and simple to search for lost property.
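A searchable database of this kind can start as a simple inverted index from tag to image. The sketch below assumes you already have a list of tags for each image (for example, from the Computer Vision API); the item IDs, tags, and function names are hypothetical.

```python
from collections import defaultdict


def build_tag_index(items):
    """Map each lowercase tag to the set of item IDs that carry it."""
    index = defaultdict(set)
    for item_id, tags in items.items():
        for tag in tags:
            index[tag.lower()].add(item_id)
    return index


def search(index, *terms):
    """Return the sorted IDs of items tagged with every search term."""
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches)) if matches else []


# Hypothetical lost-property items and the tags detected for each image.
items = {
    "img-001": ["umbrella", "black"],
    "img-002": ["backpack", "black"],
    "img-003": ["umbrella", "red"],
}
index = build_tag_index(items)
print(search(index, "umbrella", "black"))
```

In a real system the same idea would live in a database or search service rather than in memory.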

Document Digitization

Use Form Recognizer to parse historical documents. Ingest the structured data and create a searchable repository, thereby making it easier for you to access historical data.

Image Classification and Content Moderation

Use some of the features of the Computer Vision API to detect content that may not be suitable for minors, and build automation to flag these images for review by an adult.
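The flagging step could look something like the sketch below. It assumes the analysis was requested with the `Adult` visual feature and that the response carries an `adult` object with `isAdultContent`, `isRacyContent`, `adultScore`, and `racyScore`; the threshold and the `needs_review` helper are illustrative choices, not recommended values.

```python
def needs_review(analysis, threshold=0.5):
    """Return True when an image should be queued for human review."""
    adult = analysis.get("adult", {})
    return bool(
        adult.get("isAdultContent", False)
        or adult.get("isRacyContent", False)
        # A stricter threshold than the service's own flags catches borderline cases.
        or adult.get("adultScore", 0.0) >= threshold
        or adult.get("racyScore", 0.0) >= threshold
    )


# Invented payloads shaped like the "adult" part of an analysis response.
safe = {"adult": {"isAdultContent": False, "isRacyContent": False,
                  "adultScore": 0.01, "racyScore": 0.02}}
borderline = {"adult": {"isAdultContent": False, "isRacyContent": False,
                        "adultScore": 0.10, "racyScore": 0.35}}

print(needs_review(safe))
print(needs_review(borderline, threshold=0.3))
```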

These are just some example use cases, and you will have your own ideas. Both APIs ship with many more features and capabilities.

Summary

In this post, I’ve introduced the Azure Computer Vision API and Form Recognizer API and shown a sample of the features these powerful AI services provide.

I’ve also discussed how you may want to use them. If you want to learn more about these cognitive services, you can check out my new course Building a Form Recognizer using Azure Computer Vision.



About the author

Jamie is passionate about using AI technologies to help advance systems in a wide range of organisations. He has collaborated on many projects including working with Twitter, National Geographic and University of Michigan. Jamie is a keen contributor to the technology community and has gained global recognition for articles he has written and software he has built. He is a STEM Ambassador and Code Club volunteer, inspiring interest at grassroots level. Jamie shares his story and expertise at speaking events, on social media and through podcast interviews. He has co-authored a book with 16 fellow MVPs demonstrating how Microsoft AI can be used in the real world and regularly publishes material to encourage and promote the use of AI and .NET technologies.

Find out more at www.jamiemaguire.net
