3. Intelligent PDF Processing

With the Intelligent PDF Processing approach, documents are processed and values are extracted based on a dynamic repository. The dynamic repository consists of a generic data model, which defines the list of fields to be extracted, and a data dictionary, which contains all the pseudonyms for the fields listed in the data model. Jiffy provides a pre-packaged solution comprising the document templates, a finite data dictionary, and a built-in Jiffy User Interface application for viewing and confirming the data extracted from the PDF.

Dynamic Repository

Define the fields to be extracted from the PDF under the dynamic repository:
- Add Dynamic Repository – Adds a new dynamic repository.
- Add Tag – Once the repository is created, click the repository name and then click the “+” icon to create new tags. Create one tag for each field that must be extracted from the PDF. Tags can be added, deleted, and renamed.
- Add Pseudonym – Select the tag created in the previous step and click the “Add Pseudonym” button to add the corresponding labels to the repository. For example, in invoice processing the invoice number can appear under different labels such as ‘Invoice No.’, ‘Invoice #’, or ‘Invoice Number’. All of these labels must be added as pseudonyms under the tag Invoice Number. Pseudonyms can be added, deleted, and renamed.
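The relationship between tags and pseudonyms can be pictured as a simple lookup table. The sketch below is only an illustration of that idea, not Jiffy's internal implementation: the repository contents beyond the invoice number example, the function names, and the normalization rule are all assumptions made for this example.

```python
import re

# Hypothetical repository: tag name -> pseudonyms that may appear on a document.
# Only the "Invoice Number" pseudonyms come from the text above; the rest is made up.
INVOICE_REPOSITORY = {
    "Invoice Number": ["Invoice No.", "Invoice #", "Invoice Number"],
    "Invoice Date": ["Invoice Date", "Date of Invoice"],
}


def normalize(label):
    """Lower-case a label and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9#]+", " ", label.lower()).strip()


def build_lookup(repository):
    """Invert the repository into a normalized-pseudonym -> tag dictionary."""
    lookup = {}
    for tag, pseudonyms in repository.items():
        for pseudonym in pseudonyms:
            lookup[normalize(pseudonym)] = tag
    return lookup


def resolve_tag(label, lookup):
    """Return the tag for a label found on a document, or None if it is unknown."""
    return lookup.get(normalize(label))


lookup = build_lookup(INVOICE_REPOSITORY)
tag = resolve_tag("INVOICE NO:", lookup)  # resolves to "Invoice Number"
```

A label that has no matching pseudonym resolves to `None`, which corresponds to a field label that still has to be added to the repository.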
Document Template creation
The pre-packaged template is available for the user to upload to the Document Templates section (Templates -> Document Templates). The pre-packaged template must be modified so that its repository name and tag names match the names created in the previous step. Once the template is uploaded, add a template tag as well.
Task Design
Create a task under Jiffy Core (Task Design – Task) using the built-in nodes, including the PDF node and the REST API node. The REST API node passes the data to the Jiffy UI Portal (JDI). A ready-made task is also available in the repository for the user to copy; refer to Setting up automation environment -> Copy task from repository for detailed steps. To set the configurations, follow the steps below under the Properties tab:
- Select a PDF configuration from the drop-down. Refer to Task Design -> Configuration for setting up a document configuration.
- Change the Template Type to “Fixed Template”.
- Select a PDF template tag (refer to Repository -> Document Templates for detailed steps on how to add a tag to a document template).
- Select the PDF location from the mapping section.
- Add the REST API node and provide the mapping from the PDF node to the REST API node (skip this step if the task has been copied from the repository).
- Click Trial Run, then click the output section to view the PDF output.
Viewing Output in Jiffy UI Portal

The user can view the output in the Jiffy UI Portal. All data captured from the PDF, based on the data dictionary and the positions, is made available in the Jiffy UI Portal. The operations user or bot designer must confirm that the correct data is mapped to the respective fields by clicking the Manually approve & Save button. This is a one-time activity that must be performed for each template.

All data extracted from the PDF is displayed under the section Extracted Data from PDF.
All approved data is displayed under the section Approved Data.

The video below demonstrates the Intelligent Document Processing PDF extraction process.

Reading Scanned Images

Jiffy can process both digital (text) PDFs and scanned (image) PDFs. For scanned PDFs, a licensed OCR engine must be integrated with Jiffy. During processing (PDF reader node), Jiffy automatically checks whether the input file is scanned or digital. If it is scanned, Jiffy automatically invokes the OCR engine to convert the scanned document to a text PDF and then continues with data extraction. Jiffy uses ABBYY FineReader OCR for this step.
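How Jiffy decides whether a file is scanned or digital is not documented here. As a rough illustration of the general idea only, the sketch below assumes a common heuristic: a PDF whose pages carry almost no embedded text is treated as scanned and routed to OCR. The function names, the `min_chars_per_page` threshold, and the `run_ocr` callback are hypothetical.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Return True when the pages hold too little embedded text to be a digital PDF.

    page_texts is a list with the text already extracted from each page;
    the threshold is an assumed value, not a Jiffy setting.
    """
    if not page_texts:
        return True
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page


def extract_text(page_texts, run_ocr):
    """Use the embedded text for digital PDFs, otherwise fall back to OCR."""
    if needs_ocr(page_texts):
        # run_ocr stands in for whichever OCR engine is integrated.
        return run_ocr()
    return "\n".join(page_texts)
```

For example, `extract_text(["", ""], run_ocr=my_ocr)` would call the hypothetical `my_ocr` function, while a list of pages with real text is returned as-is.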

Integrating Google Vision API as OCR

Google Vision API can be integrated as an OCR engine using Jiffy’s REST API node or a custom expression. Because it is a paid service, the user needs a valid license key. Once the input and output formats of the API are understood, the user can configure Jiffy to use Google Vision API instead of ABBYY FineReader.
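To give an idea of the input and output formats involved, the sketch below builds a request body for the Vision API `images:annotate` REST method and reads the recognized text from a response. It assumes a page of the scanned document is already available as image bytes and that the key is supplied as the `key` query parameter; the helper names are illustrative, and how the request is wired into Jiffy's REST API node is not shown.

```python
import base64
import json

# The request body is sent with POST to this endpoint, with the key
# appended as a query parameter (?key=...).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes):
    """Build the JSON body asking for dense document text detection on one image."""
    body = {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }
    return json.dumps(body)


def extract_text(response_json):
    """Return the full recognized text from a response, or "" if none was found."""
    data = json.loads(response_json)
    responses = data.get("responses", [])
    if not responses:
        return ""
    return responses[0].get("fullTextAnnotation", {}).get("text", "")
```

The text returned by `extract_text` is what would then be passed on for data extraction in place of the output of the default OCR engine.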
