Learn How to Classify Documents Using Computer Vision and NLP

Many companies, especially those in the BFSI and Legal sectors, deal with a large volume of handwritten and scanned documents. It is difficult to use the granular information in these documents for analysis, or even to browse through them conveniently. A simple classification of the documents into meaningful bins or folders would make it far easier to leverage the information within them.

This blog focuses on a document/text classification solution that we developed for an insurance industry client, aimed at grouping medical/health insurance claims into pre-defined categories. The existing categorization process was performed manually by a panel of experts. These experts had their own biases and heuristics, which led to inconsistencies.

We developed a Deep Learning based framework that ensembles learnings from a document’s layout and structure, the content/text within the document, and an amalgamation of consistent and coherent expert opinions. The framework helped automate the existing process, leading to better efficiency and efficacy.

We had a set of 40k scanned images of medical insurance documents and built an algorithm to classify those documents into five given categories. The scanned documents exhibited class-specific characteristics in both the document structure and the token sequences present in the document.

Document Sample:

Sample Image Features: document structure, e.g. QR code in the top-left corner, gridlines, etc.

Sample Text Features: presence of text sequences such as “Health Insurance Claim Form” and field information, e.g. Insured’s ID Number, Patient’s Name, Patient’s Address, Date of Service, etc.

ANALYTICAL APPROACH

Over the past few years, Deep Learning (DL) architectures and algorithms have made impressive advances in fields such as image recognition and speech processing.

Their application to Natural Language Processing (NLP) has now been shown to make significant contributions, yielding state-of-the-art results for some common NLP tasks. Named entity recognition (NER), topic modelling and sentiment analysis are some of the problems where neural network models have outperformed traditional approaches.
Convolutional neural networks (CNNs) have also been widely used in automatic image classification systems, object detection and recognition, neural style transfer and many more applications. Image classification is the task of taking an input image and outputting a class, or a probability distribution over classes, that best describes the image.

For the given problem, we decided to use features from both the images and the text in the documents. Since these documents were scanned images, the first challenge was to extract text from them, and the second was to draw meaningful insights from this text and the images. The following is the overall solution architecture:

To extract the text sequences from these images we performed OCR (Optical Character Recognition) using Tesseract. For Python users, there is an OCR library called “pytesseract” that provides “image_to_string” conversion.
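As a reference, here is a minimal sketch of this OCR step using pytesseract; the file name and grayscale preprocessing are illustrative assumptions, not the exact pipeline used in the project.

```python
# Minimal OCR sketch: convert a scanned document image to raw text.
# Assumes the Tesseract engine is installed and visible to pytesseract.
from PIL import Image
import pytesseract

def extract_text(image_path: str) -> str:
    """Run OCR on a scanned document and return the extracted text."""
    image = Image.open(image_path).convert("L")  # grayscale often helps OCR
    return pytesseract.image_to_string(image)

if __name__ == "__main__":
    text = extract_text("claim_form_sample.png")  # hypothetical file name
    print(text[:500])
```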

CHOICE OF MODEL

Through the OCR step we were able to extract the text sequences from these scanned images. This gave us both text-based and image-based features, which were leveraged to develop the classification algorithm.

Bi-directional LSTM:

For text features such as ‘Patient’s Name’, ‘Patient’s Address’, ‘Insured’s ID Number’, ‘Type of Bill’, ‘Patient’s Control Number’ etc. we decided to implement a bidirectional LSTM. Bidirectional LSTMs are an extension of traditional LSTMs that can improve model performance on sequence classification problems. It involves duplicating the first recurrent layer in the network so that there are now two layers side-by-side, then providing the input sequence as-is as input to the first layer and providing a reversed copy of the input sequence to the second.

The text sequences were converted into word vectors using a pre-trained GloVe embedding matrix and then passed through a bidirectional LSTM (Long Short-Term Memory) layer and two dense layers with ReLU and softmax activations respectively.
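A minimal Keras sketch of this text model is shown below; the vocabulary size, sequence length, layer widths and the placeholder GloVe matrix are assumptions for illustration.

```python
# Text model sketch: GloVe-initialised embeddings -> bidirectional LSTM ->
# dense (ReLU) -> dense (softmax over the 5 document classes).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_CLASSES = 20000, 100, 300, 5

# Placeholder for the pre-trained GloVe matrix; in practice each row holds
# the GloVe vector of the corresponding word in the vocabulary.
embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM))

text_model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN,
              weights=[embedding_matrix], trainable=False),
    Bidirectional(LSTM(128)),
    Dense(64, activation="relu"),
    Dense(NUM_CLASSES, activation="softmax"),
])
text_model.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
```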

Convolutional Neural Networks:

To deal with the image features we decided to implement a Convolutional Neural Network.
The image features were extracted and enhanced through 2 convolution layers and 3 dense layers, with ReLU activations in the hidden layers and a softmax activation for the output layer. Data augmentation was done by flipping the scanned images to add more features for robustness. Various transformations like resizing, rescaling, etc. were also experimented with.
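Below is a minimal sketch of such an image model in Keras, together with flip-based augmentation; the image size, filter counts and layer widths are illustrative assumptions rather than the exact values used in the project.

```python
# Image model sketch: 2 convolution blocks -> 3 dense layers (ReLU, ReLU, softmax),
# with rescaling and flip-based augmentation of the scanned pages.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_H, IMG_W, NUM_CLASSES = 256, 256, 5

image_model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(IMG_H, IMG_W, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(64, activation="relu"),
    Dense(NUM_CLASSES, activation="softmax"),
])
image_model.compile(optimizer="adam", loss="categorical_crossentropy",
                    metrics=["accuracy"])

# Augmentation: rescale pixel values and flip pages horizontally/vertically.
augmenter = ImageDataGenerator(rescale=1.0 / 255,
                               horizontal_flip=True,
                               vertical_flip=True)
```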

Combination of LSTM & CNN:

A combination of both LSTM and convolution layers was also experimented with on the text features, which resulted in good classification accuracy.
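A minimal sketch of such a combined text model is given below, with a 1D convolution over the embedded token sequence followed by an LSTM layer; the hyperparameters are assumptions for illustration.

```python
# Combined text model sketch: embedding -> 1D convolution -> pooling -> LSTM ->
# softmax over the 5 document classes.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense

VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_CLASSES = 20000, 100, 300, 5

cnn_lstm_model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    Conv1D(64, 5, activation="relu"),
    MaxPooling1D(pool_size=4),
    LSTM(100),
    Dense(NUM_CLASSES, activation="softmax"),
])
cnn_lstm_model.compile(optimizer="adam", loss="categorical_crossentropy",
                       metrics=["accuracy"])
```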

Ensembling: Ensembling was done to capture features from different models and improve the classification accuracy. This helped achieve 90%+ overall accuracy in correctly classifying the documents into the respective 5 classes.
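One simple way to ensemble such models is to average their predicted class probabilities; the sketch below assumes the model and input variable names from the earlier snippets and illustrates the idea rather than the exact scheme used.

```python
# Ensembling sketch: average softmax outputs from several models and take the
# argmax as the final class for each document.
import numpy as np

def ensemble_predict(models_and_inputs):
    """models_and_inputs: list of (model, input_array) pairs."""
    probs = [model.predict(x) for model, x in models_and_inputs]
    return np.argmax(np.mean(probs, axis=0), axis=1)

# Hypothetical usage with the models sketched above:
# labels = ensemble_predict([(text_model, X_text),
#                            (image_model, X_images),
#                            (cnn_lstm_model, X_text)])
```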

Please feel free to comment in case of any queries.

Contributors

Shifu Jain: Shifu is a Senior Business Analyst and a part of Affine’s Artificial Intelligence CoE “AICoE”. Her interests involve exploring and learning new research in NLP, topic modelling, recurrent neural networks, etc.

Karthik Devaraj: Karthik is a Consultant at Affine with 6+ years of experience in the fields of Computer Vision, Machine Translation, and Generative Algorithms.

About Author

Affine is a leading AWS Select Consulting Partner renowned for providing cutting-edge cloud services on the AWS platform.
