What is AWS Comprehend
15 Sep

Mahipal Nehra

If you are a developer, you have probably heard of Amazon Web Services (AWS). AWS is among the world’s most comprehensive and broadly adopted on-demand cloud platforms. It offers scalable, reliable, and inexpensive cloud computing services such as EC2, S3, Aurora, SageMaker, Comprehend, Transcribe and more. The one we will be learning about today is AWS Comprehend.

So, let’s get started!

Overview of AWS Comprehend

Amazon Comprehend is a Natural Language Processing (NLP) service. It uses pre-trained Machine Learning (ML) models to extract insights from unstructured text. These models are continuously trained on a large body of text, so an organization using the service doesn’t have to provide training data. AWS Comprehend processes text in the UTF-8 format.

AWS Comprehend derives its insights by analyzing sentiment (the emotional tone of a document), key phrases (noun phrases that describe a particular thing), entities (people, places, or organizations), language, PII (such as an address, contact details, or an account number), and syntax (adjective, adposition, adverb, etc.). It offers Sentiment Analysis, Key Phrase Extraction, Entity Recognition, Language Detection, PII Detection, Syntax Analysis, and Topic Modeling APIs to help a developer integrate NLP into their applications.

AWS Comprehend can also analyze documents in multiple languages, including German, English, Italian, French, Portuguese, Spanish, Hindi, Japanese, Chinese, Arabic and Korean. Languages are identified with RFC 5646 language tags: a two-letter ISO 639-1 code where one exists (for example, en), and a three-letter ISO 639-2 code otherwise.

Some of the benefits of AWS Comprehend are:

  • Scalable Natural Language Processing (NLP) that can be integrated into your application.

  • Easy integration with other AWS services like S3, Lambda, and KMS to enhance application functionality.

  • Encryption of output results and storage volume data at a low cost.

  • In medical settings, quick and accurate extraction of medical information that protects patient data while reducing document processing costs.

Amazon Comprehend – Features

Now that we have a basic understanding of AWS Comprehend and how it works, it is time to look at the key features it offers.

  • Keyphrase Extraction

Keyphrase extraction is the process of extracting the important words and phrases that are relevant to a document. It enables fast search over documents through indexing, clustering, summarizing and categorizing the extracted words and phrases. With the AWS Comprehend key phrase extraction API, you automatically get the key phrases of a document, each with a confidence score indicating how likely it is to be a genuine key phrase.
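As a minimal sketch of how the confidence scores might be used, the helper below keeps only the key phrases above a threshold. The function name `top_key_phrases` and the 0.9 threshold are assumptions made for illustration; `client` is assumed to be a boto3 Comprehend client, for example `boto3.client("comprehend")`, whose `detect_key_phrases` call returns a `KeyPhrases` list.

```python
def top_key_phrases(client, text, language_code="en", min_score=0.9):
    """Return (phrase, score) pairs with score >= min_score, best first.

    Hypothetical helper: `client` is assumed to be boto3.client("comprehend").
    """
    response = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    phrases = [
        (item["Text"], item["Score"])
        for item in response["KeyPhrases"]
        if item["Score"] >= min_score  # drop low-confidence phrases
    ]
    # Highest-confidence phrases first.
    return sorted(phrases, key=lambda pair: pair[1], reverse=True)
```

Passing the client in as a parameter keeps the helper easy to try with a stub before pointing it at a real AWS account.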

  • Syntax Analysis

Syntax analysis is the process of analyzing natural language against the rules of formal grammar, with the aim of exposing the grammatical structure of the given text. The AWS Comprehend Syntax API tokenizes the text and labels each token with its part of speech (PoS), such as adjective, noun or verb.
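The tokenize-and-label behaviour can be sketched roughly as follows. The helper names are hypothetical, and `client` is assumed to be a boto3 Comprehend client whose `detect_syntax` call returns a `SyntaxTokens` list with a `PartOfSpeech` tag per token.

```python
def tag_parts_of_speech(client, text, language_code="en"):
    """Return a list of (word, part-of-speech tag) pairs.

    Hypothetical helper: `client` is assumed to be boto3.client("comprehend").
    """
    response = client.detect_syntax(Text=text, LanguageCode=language_code)
    return [
        (token["Text"], token["PartOfSpeech"]["Tag"])
        for token in response["SyntaxTokens"]
    ]


def words_with_tag(tagged, tag):
    """Pick out the words carrying one tag, e.g. "NOUN" or "VERB"."""
    return [word for word, word_tag in tagged if word_tag == tag]
```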

  • Sentiment Analysis

Sentiment analysis, also known as Opinion Mining, is a procedure used to determine the emotion behind a text. Using the AWS Comprehend sentiment analysis API, you can get the overall sentiment of a text (positive, negative, neutral or mixed) along with a confidence score for each of those four sentiments.
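A small sketch of reading that result is shown here. The function name is an assumption; `client` is assumed to be a boto3 Comprehend client whose `detect_sentiment` response carries a `Sentiment` label and a `SentimentScore` dictionary.

```python
def summarize_sentiment(client, text, language_code="en"):
    """Return (label, confidence) for the overall sentiment of `text`.

    Hypothetical helper: `client` is assumed to be boto3.client("comprehend").
    """
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    label = response["Sentiment"]        # POSITIVE, NEGATIVE, NEUTRAL or MIXED
    scores = response["SentimentScore"]  # keys: Positive, Negative, Neutral, Mixed
    # The score keys are capitalized versions of the label, e.g. "Positive".
    return label, scores[label.capitalize()]
```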

  • Entity Recognition

Entity recognition, or named-entity recognition, is a technique used to locate and classify the named entities in unstructured text under predefined classes. The Entity Recognition API of AWS Comprehend returns named entities such as people, places, organizations, dates and quantities, each automatically categorized based on the text provided.
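One way to work with the categorized output is to group the entity text by type, as in this sketch. The helper name and the 0.8 threshold are assumptions; `client` is assumed to be a boto3 Comprehend client whose `detect_entities` call returns an `Entities` list with `Type`, `Text` and `Score` fields.

```python
from collections import defaultdict


def entities_by_type(client, text, language_code="en", min_score=0.8):
    """Group recognized entity text by entity type (PERSON, LOCATION, ...).

    Hypothetical helper: `client` is assumed to be boto3.client("comprehend").
    """
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    grouped = defaultdict(list)
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:  # skip low-confidence matches
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)
```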

  • Language Detection

Language detection is the classification of documents based on the language they are written in. The Language Detection API of AWS Comprehend can automatically identify text written in over 100 languages and return the dominant language of the document with a confidence score.
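As an illustration, the sketch below picks the highest-scoring language from the response. The function name is hypothetical; `client` is assumed to be a boto3 Comprehend client whose `detect_dominant_language` call returns a `Languages` list.

```python
def dominant_language(client, text):
    """Return (language_code, score) for the most likely language, or None.

    Hypothetical helper: `client` is assumed to be boto3.client("comprehend").
    """
    response = client.detect_dominant_language(Text=text)
    languages = response["Languages"]
    if not languages:
        return None
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]
```

The returned code can then be passed as the `LanguageCode` argument of the other analysis calls, provided that language is supported by them.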

  • Custom Entities

Custom entities let you customize AWS Comprehend to identify terms that are specific to your business domain. Using AutoML, Comprehend learns from a small private set of examples you provide and trains a private, custom model to recognize these entities in any other block of text. The plus point here is that there are no servers to manage and no algorithms to master.

  • Custom Classification

With custom classification, you can develop customized text classification models using labels specific to your business. You create a custom model by providing example text for each label you want to use. Once you have supplied the labeled examples, Comprehend automatically trains a model customized for your business. You don’t need machine learning experience to build the custom model.
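To make "providing example text for the labels" concrete, here is a rough sketch that writes labeled examples as CSV with the label in the first column and the document text in the second, without a header row. This layout is assumed to match the multi-class training format; the function name and the sample labels are hypothetical, and uploading the file to S3 and starting the training job are left out.

```python
import csv
import io


def build_training_csv(examples):
    """Render (label, text) pairs as CSV rows: label first, document second.

    Hypothetical helper; assumes the multi-class layout with no header row.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, text in examples:
        # Collapse whitespace so each document stays on a single line.
        writer.writerow([label, " ".join(text.split())])
    return buffer.getvalue()
```

For example, `build_training_csv([("BILLING", "I was charged twice")])` yields one row starting with `BILLING`.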

  • Topic Modeling

Topic Modeling is the task of recognising the topics that best describe a set of documents. In Amazon Comprehend, Topic Modeling identifies relevant topics or terms within a set of documents stored in an Amazon S3 bucket. It then determines the most common topics in the set and organizes the documents into groups, showing how strongly each document relates to a given topic.
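Because topic modeling runs as an asynchronous job over documents in S3, a request might be assembled along these lines. The helper names, the default job name and the one-document-per-file input format are assumptions; `client` is assumed to be a boto3 Comprehend client, and the S3 URIs and IAM role ARN must be supplied by the caller.

```python
def topics_job_request(input_s3_uri, output_s3_uri, role_arn,
                       number_of_topics=10, job_name="topic-modeling-sketch"):
    """Build keyword arguments for start_topics_detection_job (hypothetical helper)."""
    return {
        "JobName": job_name,
        "NumberOfTopics": number_of_topics,
        "DataAccessRoleArn": role_arn,  # role that lets Comprehend read and write S3
        "InputDataConfig": {
            "S3Uri": input_s3_uri,
            "InputFormat": "ONE_DOC_PER_FILE",  # assumption: one document per file
        },
        "OutputDataConfig": {"S3Uri": output_s3_uri},
    }


def start_topics_job(client, input_s3_uri, output_s3_uri, role_arn, **options):
    """Submit the job and return its JobId; results land in the output S3 location."""
    request = topics_job_request(input_s3_uri, output_s3_uri, role_arn, **options)
    response = client.start_topics_detection_job(**request)
    return response["JobId"]
```

The job does not finish immediately, so its status would typically be polled with `describe_topics_detection_job` using the returned job ID.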

  • Comprehend Medical

Comprehend Medical is a HIPAA-eligible NLP service that uses machine learning to extract health data from medical text. A vast amount of health data exists in free-form text today, including clinical trial reports, doctors’ notes, and so on. Extracting this data manually can be extremely time-consuming. With just an API call to Comprehend Medical, however, one can accurately extract much-needed information like medications, tests, dosage, medical conditions and procedures.
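The "single API call" idea might look like the sketch below, which pulls medications and their dosage out of a clinical note. The function name is hypothetical; `client` is assumed to be `boto3.client("comprehendmedical")`, whose `detect_entities_v2` call returns entities with a `Category` and optional related `Attributes`.

```python
def medications_with_dosage(client, text):
    """Return (medication, dosage or None) pairs found in `text`.

    Hypothetical helper: `client` is assumed to be boto3.client("comprehendmedical").
    """
    response = client.detect_entities_v2(Text=text)
    results = []
    for entity in response["Entities"]:
        if entity["Category"] != "MEDICATION":
            continue
        # Dosage, when present, is attached to the medication as an attribute.
        dosage = next(
            (attr["Text"] for attr in entity.get("Attributes", [])
             if attr["Type"] == "DOSAGE"),
            None,
        )
        results.append((entity["Text"], dosage))
    return results
```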

  • Medical Ontology Linking

The Medical Ontology Linking APIs of Comprehend Medical find medical information in unstructured text and link it to concepts and codes in standard medical ontologies. Medical conditions are linked to ICD-10-CM codes using the InferICD10CM API, while medications are linked to RxNorm codes using the InferRxNorm API.
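A rough sketch of the condition-to-code linking follows. The helper name is an assumption; `client` is assumed to be `boto3.client("comprehendmedical")`, whose `infer_icd10_cm` call returns entities that each carry a list of candidate `ICD10CMConcepts`.

```python
def icd10_codes(client, text):
    """Return (condition text, ICD-10-CM code, description) for each linked entity.

    Hypothetical helper: `client` is assumed to be boto3.client("comprehendmedical").
    """
    response = client.infer_icd10_cm(Text=text)
    linked = []
    for entity in response["Entities"]:
        concepts = entity.get("ICD10CMConcepts", [])
        if not concepts:
            continue  # nothing was linked for this entity
        # Keep only the highest-scoring candidate concept.
        best = max(concepts, key=lambda concept: concept["Score"])
        linked.append((entity["Text"], best["Code"], best["Description"]))
    return linked
```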

Use Cases of AWS Comprehend

AWS Comprehend has various use cases, of which we will discuss the most common here.

  • Semantic Search

AWS Comprehend can be used to deliver a better search experience, as it enables a search engine to index sentiment, key phrases and entities. Semantic search focuses on the context and intent of an article rather than on basic keyword matching.

  • Customer Analytics

If you want to know the sentiment of customer reviews or feedback received through emails, support calls, social media or websites so that you can provide better products and services, AWS Comprehend can help. It tells you whether each piece of customer feedback is neutral, negative, positive or mixed, with a confidence score to back it up.
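For a set of reviews, the per-review sentiment could be tallied as in this sketch. The function name is hypothetical; `client` is assumed to be a boto3 Comprehend client, and the batch size of 25 reflects the per-call document limit of `batch_detect_sentiment`.

```python
from collections import Counter

BATCH_LIMIT = 25  # batch_detect_sentiment accepts up to 25 documents per call


def sentiment_breakdown(client, reviews, language_code="en"):
    """Count how many reviews fall under each sentiment label.

    Hypothetical helper: `client` is assumed to be boto3.client("comprehend").
    """
    counts = Counter()
    for start in range(0, len(reviews), BATCH_LIMIT):
        batch = reviews[start:start + BATCH_LIMIT]
        response = client.batch_detect_sentiment(
            TextList=batch, LanguageCode=language_code
        )
        for result in response["ResultList"]:
            counts[result["Sentiment"]] += 1
        errors = len(response.get("ErrorList", []))
        if errors:
            counts["ERROR"] += errors  # documents the service could not process
    return dict(counts)
```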

  • Medical Cohort Analysis

In oncology, it's vital to swiftly find the correct selection criteria so that patients may be enrolled in clinical trials. Comprehend Medical by AWS recognises and analyses complicated medical information in unstructured data, making it easier to organize and search. You can use these insights to identify and recruit patients for the right clinical trial in a fraction of the time.


We hope this blog helped you understand the overall concept of AWS Comprehend and appreciate its features, benefits and use cases. If you want to integrate AWS Comprehend into your application to offer better services to your customers, contact an IT company with years of experience in the field.

Posted by Mahipal Nehra | Posted at 15 Sep, 2021 Web