Author: Jamie Maguire
Azure Cognitive Services are some of the most exciting cloud services out there. You can use them to add artificial intelligence (AI) capabilities to your existing software applications with a few lines of code. Two of these services are the Computer Vision API and the Form Recognizer API.
In this post I will:
- Introduce the Computer Vision API and Form Recognizer API
- Discuss some of the main capabilities of the Computer Vision API and Form Recognizer API
- Explain how you can consume the Computer Vision API and Form Recognizer API
- Show how I used Computer Vision API to surface actionable insights in Instagram data
When you finish reading this post, you’ll understand a little about these AI services. A more detailed breakdown of Computer Vision API and Form Recognizer API is also available in my new course Building a Form Recognizer using Azure Computer Vision.
Computer Vision API
The Computer Vision API lets you quickly and easily extract rich information from images, videos, and related content. Its features include, but are not limited to:
- Detecting objects in images
- Generating human-readable descriptions of images
- Detecting faces
- Identifying adult content
After provisioning the Computer Vision API in Azure, you make requests to its endpoint and specify which visual features you want it to detect. The API then returns the results to your client application as a rich JSON payload.
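As a rough illustration of that request, the sketch below assembles (but does not send) an Analyze Image call. It assumes version 3.2 of the REST API, and the resource endpoint, key, and image URL are placeholders. Python is used here purely for illustration; the client project described later in this post was written in C#.

```python
import json
from urllib.parse import urlencode


def build_analyze_request(endpoint, key, image_url, features):
    """Return (url, headers, body) for an Analyze Image call. Nothing is sent."""
    # The features to detect are passed as a comma-separated query parameter.
    query = urlencode({"visualFeatures": ",".join(features)}, safe=",")
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"  # v3.2 is an assumption
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # key of your Computer Vision resource
        "Content-Type": "application/json",
    }
    body = json.dumps({"url": image_url})  # the image is referenced by URL
    return url, headers, body


if __name__ == "__main__":
    # Placeholder values; substitute your own resource endpoint and key.
    url, headers, body = build_analyze_request(
        "https://example.cognitiveservices.azure.com",
        "YOUR-KEY",
        "https://example.com/photo.jpg",
        ["Tags", "Description", "Color"],
    )
    print(url)
```

You would then POST the body to the URL with any HTTP client and read the JSON that comes back.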
Instagram Graph API, C#, and Computer Vision API
I used the Computer Vision API on a client project to help categorize images on Instagram. The first step was to use the Instagram Graph API and C# to extract images from the platform. Once I had these images, I sent them to the Computer Vision API for processing.
After the Computer Vision API worked its magic, I had a rich data set that included tags used to describe each image, the most prominent colors in each image, and a readable description for each image.
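To give a feel for what that data set looks like, here is a minimal sketch that reduces one response to tags, dominant colors, and a description. The sample payload is invented for illustration and only mimics the shape of a v3.2 Analyze Image response; the `summarize` helper and its confidence cut-off are my own assumptions, not part of the API.

```python
import json

# Invented sample that mimics the shape of an Analyze Image response.
SAMPLE_RESPONSE = """{
  "tags": [
    {"name": "grass", "confidence": 0.99},
    {"name": "animal", "confidence": 0.95},
    {"name": "fence", "confidence": 0.31}
  ],
  "description": {
    "tags": ["grass", "animal"],
    "captions": [{"text": "a deer standing in a field", "confidence": 0.87}]
  },
  "color": {
    "dominantColorForeground": "Green",
    "dominantColorBackground": "Green",
    "dominantColors": ["Green"],
    "accentColor": "5A7D2B",
    "isBWImg": false
  }
}"""


def summarize(payload, min_confidence=0.5):
    """Keep confident tags, the dominant colors, and the best caption."""
    tags = [t["name"] for t in payload.get("tags", [])
            if t["confidence"] >= min_confidence]
    captions = payload.get("description", {}).get("captions", [])
    best = max(captions, key=lambda c: c["confidence"], default=None)
    return {
        "tags": tags,
        "dominant_colors": payload.get("color", {}).get("dominantColors", []),
        "description": best["text"] if best else None,
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE_RESPONSE)))
```

Dropping low-confidence tags (here, "fence") keeps noise out of later reporting; the 0.5 threshold is arbitrary and worth tuning against your own images.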
Computer Vision API insights were then coupled with Instagram analytics data (e.g., likes, impressions) to provide the client with a better understanding of the type of content that performed better (e.g., an image with wildlife on a green background).
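One simple way to couple the two data sources is to average an engagement metric per tag. The sketch below is not the code from the client project; the post records and like counts are made up, and `average_likes_by_tag` is a hypothetical helper.

```python
from collections import defaultdict


def average_likes_by_tag(posts):
    """Return (tag, average likes) pairs, highest average first."""
    totals = defaultdict(lambda: [0, 0])  # tag -> [sum of likes, number of posts]
    for post in posts:
        for tag in set(post["tags"]):  # count each tag once per post
            totals[tag][0] += post["likes"]
            totals[tag][1] += 1
    averages = {tag: total / count for tag, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up records: tags from Computer Vision, likes from Instagram analytics.
    posts = [
        {"tags": ["wildlife", "green"], "likes": 120},
        {"tags": ["wildlife"], "likes": 80},
        {"tags": ["city"], "likes": 30},
    ]
    for tag, average in average_likes_by_tag(posts):
        print(f"{tag}: {average:.1f}")
```

The same grouping could just as easily be done in SQL once the data is in a database.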
This information was then stored in an Azure SQL database for further reporting and analytics purposes, and it served to better inform decision-making. You might have a similar business requirement whereby you need to quickly annotate images at scale. If so, the Computer Vision API is a good fit.
Form Recognizer API
Form Recognizer API is a document extraction service that makes it simple for you to digitize your documents. With Form Recognizer, you can send in semi-structured documents, and the API will return structured information. One of the great features of Form Recognizer is that it gives you out-of-the-box models to help you quickly process standard types of documents that you would expect to see in everyday life, such as receipts, invoices, and business cards.
If the out-of-the-box models aren’t suitable for your requirements, you also have the option to train the AI with your own documents. Doing this involves creating custom models.
To help you quickly build the custom models that the Form Recognizer API uses, Microsoft provides a free, open source labeling tool. The labeling tool lets you visualize the documents you want to process and annotate them using a click-and-create interface.
Getting your documents loaded into the labeling tool takes a few steps, but I cover them all in my course Building a Form Recognizer using Azure Computer Vision. You can find the labeling tool on GitHub.
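To make the "structured information" that Form Recognizer returns more concrete, the sketch below flattens the fields of an analysis result into simple rows. The sample is invented and assumes the v2.1 response shape (`documentResults` containing `fields`); `extract_fields` and the confidence threshold are hypothetical names and values chosen for illustration.

```python
# Invented sample, assuming the v2.1 "analyzeResult" shape for a receipt.
SAMPLE_ANALYZE_RESULT = {
    "documentResults": [
        {
            "docType": "prebuilt:receipt",
            "fields": {
                "MerchantName": {"type": "string", "text": "Contoso", "confidence": 0.97},
                "Total": {"type": "number", "text": "$14.50", "valueNumber": 14.5, "confidence": 0.61},
            },
        }
    ]
}


def extract_fields(analyze_result, min_confidence=0.7):
    """Flatten recognized fields into rows and mark low-confidence ones."""
    rows = []
    for document in analyze_result.get("documentResults", []):
        for name, field in document.get("fields", {}).items():
            field = field or {}  # a field that was not found may be null
            confidence = field.get("confidence", 0.0)
            rows.append({
                "field": name,
                "text": field.get("text"),
                "confidence": confidence,
                "needs_check": confidence < min_confidence,
            })
    return rows


if __name__ == "__main__":
    for row in extract_fields(SAMPLE_ANALYZE_RESULT):
        print(row)
```

Marking low-confidence fields for a human to check is a design choice rather than something the service requires, but it is a cheap safeguard when digitizing documents in bulk.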
Consuming Computer Vision API and Form Recognizer API
You can consume Computer Vision API and Form Recognizer API in a few ways. If you prefer to have full control over the requests and responses, both APIs can be accessed using their respective REST API endpoints. Using the REST APIs means you must manually handle the requests and the subsequent JSON responses delivered by each of the APIs.
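As an example of that manual handling, Form Recognizer analysis over REST is asynchronous: you submit a document, receive an `Operation-Location` URL in the response headers, and poll that URL until the job finishes. The sketch below shows only the polling loop and assumes the v2.1 status values ("notStarted", "running", "succeeded", "failed"). `fetch_status` is a hypothetical callable that you would implement with your HTTP client of choice.

```python
import time


def wait_for_result(fetch_status, max_attempts=10, delay_seconds=1.0, sleep=time.sleep):
    """Poll until the analysis succeeds, fails, or the attempts run out.

    fetch_status is a caller-supplied function that GETs the Operation-Location
    URL and returns the decoded JSON body as a dict.
    """
    for _ in range(max_attempts):
        payload = fetch_status()
        status = payload.get("status")
        if status == "succeeded":
            return payload["analyzeResult"]
        if status == "failed":
            raise RuntimeError("analysis failed")
        sleep(delay_seconds)  # still "notStarted" or "running": wait and retry
    raise TimeoutError("analysis did not finish in time")


if __name__ == "__main__":
    # Canned responses stand in for real HTTP calls.
    canned = iter([
        {"status": "running"},
        {"status": "succeeded", "analyzeResult": {"version": "2.1.0"}},
    ])
    print(wait_for_result(lambda: next(canned), delay_seconds=0))
```

Passing the fetch function in keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses, as the usage above does.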
If you’re just getting started with Computer Vision API or Form Recognizer API, the quickest way to consume these cognitive services in code is to use the dedicated Client SDKs.
Other options are available if security is a concern or you need to run on-premises. For example, some Cognitive Services can be deployed as Docker containers.
There are many use cases for Computer Vision API and Form Recognizer API. Some use cases you may consider include the following:
Use Computer Vision API to automatically index scanned images of lost property. These can then power a searchable database and make it quick and simple to search for lost property.
Use Form Recognizer to parse historical documents. Ingest the structured data and create a searchable repository, thereby making it easier for you to access historical data.
Image Classification and Content Moderation
Use some of the features of the Computer Vision API to detect content that may not be suitable for minors, and build automation to flag these images for review by an adult.
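A minimal sketch of that flagging step might look like the following. It assumes the image was analyzed with the Adult visual feature and that the response contains an `adult` object with the flags and scores shown; the 0.5 threshold and the `needs_review` helper are my own choices for illustration, not service defaults.

```python
def needs_review(analysis, threshold=0.5):
    """Return True when an image should be queued for review by an adult."""
    adult = analysis.get("adult", {})
    # Flags set by the service itself.
    flagged = (adult.get("isAdultContent")
               or adult.get("isRacyContent")
               or adult.get("isGoryContent"))
    # Also catch borderline images using our own, stricter threshold.
    scores = [adult.get(key, 0.0) for key in ("adultScore", "racyScore", "goreScore")]
    return bool(flagged) or max(scores) >= threshold


if __name__ == "__main__":
    # Invented payload: no flag is set, but the racy score is borderline.
    borderline = {"adult": {"isAdultContent": False, "isRacyContent": False,
                            "isGoryContent": False, "adultScore": 0.01,
                            "racyScore": 0.62, "goreScore": 0.0}}
    print(needs_review(borderline))
```

Images for which `needs_review` returns True could then be written to a review queue rather than being published automatically.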
These are just some example use cases, and you will have your own ideas. Both APIs ship with many more features and capabilities.
In this post, I’ve introduced the Azure Computer Vision API and Form Recognizer API, shown a sample of the features these AI services provide, and discussed how you may want to use them.
If you want to learn more about these cognitive services, you can check out my new course Building a Form Recognizer using Azure Computer Vision.