Jiffy has a built-in node for reading PDF documents, available under the “Miscellaneous” section. The PDF Reader node reads the contents of a text PDF. Jiffy also supports integration with OCR technologies to read image PDFs: the image PDF is converted to a text PDF, which the PDF Reader node then uses to extract the data.
How to Start? Follow the steps below:
Data Extraction Approaches Jiffy supports the following approaches for extracting data from PDF files:
Rule-based Data Extraction This approach is mostly used where the number of PDF formats is limited and new formats are introduced infrequently. An XML template needs to be defined for each PDF format.
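To make the "one XML template per PDF format" idea concrete, here is a minimal Python sketch. It is not Jiffy's actual template schema: the `<template>` and `<field>` elements, the `pattern` attribute, and the `extract_fields` helper are all hypothetical names chosen for illustration, and the sample text stands in for what the PDF Reader node might return.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical template layout (assumption): one <field> per value to extract,
# each carrying a regular expression with a single capture group.
TEMPLATE_XML = r"""
<template format="invoice_vendor_a">
  <field name="invoice_number" pattern="Invoice No[.:]?\s*(\S+)" />
  <field name="total" pattern="Total\s*:?\s*([0-9,]+\.[0-9]{2})" />
</template>
"""


def extract_fields(template_xml: str, pdf_text: str) -> dict:
    """Apply every field rule in the template to the extracted PDF text."""
    root = ET.fromstring(template_xml.strip())
    result = {}
    for field in root.findall("field"):
        match = re.search(field.get("pattern"), pdf_text)
        # A rule that does not match yields None instead of raising.
        result[field.get("name")] = match.group(1) if match else None
    return result


# Stand-in for the text of a text PDF (assumption, not real output).
SAMPLE_TEXT = "Invoice No: INV-1001\nDate: 2020-01-15\nTotal: 1,250.00\n"
fields = extract_fields(TEMPLATE_XML, SAMPLE_TEXT)
print(fields)
```

Because each rule is tied to the wording of one layout, every new PDF format needs its own template, which is why this approach suits a small and stable set of formats.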
Intelligent PDF Processing This approach is mostly used where the number of PDF formats is large and new formats are introduced frequently. IQ Automation works on the principle of maintaining a dynamic repository that stores the data dictionary and a dynamic model (XML template) for extracting information from the PDF.
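The data-dictionary principle described above can be sketched as follows. This is only an illustration of the idea, not how Jiffy implements it: the dictionary contents and the `learn_label` and `extract_with_dictionary` helpers are hypothetical. The sketch assumes the repository maps each canonical field to the labels seen so far across formats, so a new format is handled by adding a label rather than writing a new template.

```python
import re

# Hypothetical data dictionary (assumption): canonical field name mapped to
# the labels that different PDF formats use for that field.
DATA_DICTIONARY = {
    "invoice_number": ["Invoice No", "Invoice #", "Inv Number"],
    "total": ["Total", "Amount Due", "Grand Total"],
}


def learn_label(dictionary: dict, field: str, label: str) -> None:
    """Add a newly seen label for a field; the repository grows over time."""
    labels = dictionary.setdefault(field, [])
    if label not in labels:
        labels.append(label)


def extract_with_dictionary(dictionary: dict, pdf_text: str) -> dict:
    """Find the value that follows any known label of each field."""
    result = {}
    for field, labels in dictionary.items():
        result[field] = None
        # Try longer labels first so "Grand Total" is preferred over "Total".
        for label in sorted(labels, key=len, reverse=True):
            pattern = re.escape(label) + r"\s*[:\-]?\s*(\S+)"
            match = re.search(pattern, pdf_text, re.IGNORECASE)
            if match:
                result[field] = match.group(1)
                break
    return result


first = extract_with_dictionary(DATA_DICTIONARY, "Inv Number: A-77\nAmount Due: 310.50")

# A new format uses an unseen label; only the dictionary is updated.
learn_label(DATA_DICTIONARY, "total", "Balance Payable")
second = extract_with_dictionary(DATA_DICTIONARY, "Balance Payable: 99.00")
```

The design choice illustrated here is that the extraction logic stays fixed while the repository changes, which is what makes the approach practical when new formats arrive frequently.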
Cognitive Automation This approach is also used where the number of PDF formats is large and new formats are introduced frequently. The major difference between IQ Automation and Cognitive Automation is that, in Cognitive Automation, training data (approximately one year of past data) needs to be fed into the system to train the bot, and an ML algorithm needs to be devised.