Jiffy has an inbuilt PDF reader node to read and process text (editable) PDFs; no additional software is required. For image PDFs, we use ABBYY FineReader, which is licensed at a separate cost. The image PDF is converted to a text PDF, which Jiffy's PDF reader then uses to extract the data.
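Once a text layer is available (whether the PDF was editable to begin with or was converted from an image by the OCR step), field extraction is essentially pattern matching over that text. A minimal sketch in Python; the sample text, field names, and patterns are all hypothetical, not Jiffy's actual configuration:

```python
import re

# Hypothetical text layer of a converted invoice PDF.
TEXT_LAYER = """
Invoice No: INV-2024-0042
Invoice Date: 12/03/2024
Total Amount: 1,250.00
"""

# One illustrative pattern per field; a real setup would come from the
# template repository rather than being hard-coded.
FIELD_PATTERNS = {
    "invoice_no": r"Invoice No:\s*(\S+)",
    "invoice_date": r"Invoice Date:\s*(\S+)",
    "total_amount": r"Total Amount:\s*([\d,\.]+)",
}

def extract_fields(text: str) -> dict:
    """Extract each configured field from the text layer, if present."""
    out = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            out[field] = match.group(1)
    return out
```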
We can also integrate with other OCR tools, such as MODI and Google Tesseract.
Jiffy’s inbuilt PDF reader can read all types of documents: Word, Excel, and text.
We have an inbuilt Excel node to read Excel files and a PDF node to read data specifically from PDFs. As mentioned earlier, in the case of image files, ABBYY FineReader is used to convert the file to a text PDF, which is then fed into the PDF reader node.
In the Intelligent PDF processing approach, documents are processed and values are extracted based on a dynamic repository. The dynamic repository consists of a generic data model, which defines the list of fields to be extracted as part of the process, and a data dictionary, which contains all the pseudo names for the fields listed in the data model.
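The data dictionary's role can be pictured as a mapping from canonical data-model fields to the pseudo names (aliases) under which they appear in real documents. A minimal sketch; the field names and aliases below are hypothetical examples, not Jiffy's actual dictionary format:

```python
from typing import Optional

# Hypothetical data dictionary: canonical field -> pseudo names seen in documents.
DATA_DICTIONARY = {
    "invoice_no": ["Invoice No", "Invoice #", "Inv. Number"],
    "total_amount": ["Total Amount", "Grand Total", "Amount Due"],
}

def resolve_field(label: str) -> Optional[str]:
    """Map a label found in a document to its canonical data-model field."""
    normalized = label.strip().lower()
    for field, aliases in DATA_DICTIONARY.items():
        if any(normalized == alias.lower() for alias in aliases):
            return field
    return None  # label is not yet known to the dictionary
```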
Hence there is a one-time template-creation activity: a template is set up with the repository name, tags, and the position of each element to be extracted.
Based on the defined repository, the BOT raises recommendations in JDI when it is not able to tag a field. The user can then click and select the correct field to confirm it to the BOT, which records the confirmation as a learning.
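The confirm-and-learn flow can be sketched as follows: when a label cannot be tagged, it is surfaced to the user, and the user's confirmation is added to the dictionary so the same label resolves automatically on the next run. All names here are illustrative, not Jiffy's internal API:

```python
# Start with a small dictionary; "Amount Payable" is not yet a known alias.
data_dictionary = {"total_amount": ["Total Amount", "Grand Total"]}

def tag_field(label, dictionary):
    """Return the canonical field for a label, or None if it cannot be tagged."""
    for field, aliases in dictionary.items():
        if label in aliases:
            return field
    return None  # unresolved: surfaced to the user as a recommendation

def confirm_tag(label, field, dictionary):
    """Record the user's confirmation so the same label resolves next time."""
    dictionary.setdefault(field, []).append(label)
```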
Is there a limit on the size of PDF that Jiffy can read?
Jiffy has been used to read PDFs with more than 300 pages. There are inbuilt functions to split the PDF vertically into pages, and also to perform a horizontal split when the data is spread across vertical sections on the same page.
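The two kinds of split can be illustrated on a toy text representation: the document as a list of pages, and a page as a list of fixed-width lines. This is a conceptual sketch only; the chunk size and column offset are hypothetical, and Jiffy's built-in split functions work on actual PDF content:

```python
def split_vertically(pages, chunk_size):
    """Vertical split: break a long document into chunks of whole pages."""
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]

def split_horizontally(page_lines, col):
    """Horizontal split: cut each line at a column offset, separating the
    left and right sections that sit side by side on the same page."""
    left = [line[:col].rstrip() for line in page_lines]
    right = [line[col:].rstrip() for line in page_lines]
    return left, right
```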
The accuracy and efficiency of the data read by the OCR largely depend on the quality of the text in the source document. When the data is read using the ML model in our Cognitive layer, there is a significant dependency on the training data fed in and hence on the model generated from it. As a best practice, we therefore recommend adding substantial data validation rules that use back-end data in order to increase the accuracy of the predicted data. This automatically improves the performance of the BOT.
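A validation rule of this kind simply checks each extracted record against back-end master data before it is accepted. A minimal sketch; the vendor master set and field names are hypothetical stand-ins for an actual back-end lookup:

```python
# Hypothetical back-end master data used to validate extracted values.
VENDOR_MASTER = {"ACME Corp", "Globex Ltd"}

def validate(record):
    """Return a list of validation failures; an empty list means the record passes."""
    errors = []
    if record.get("vendor") not in VENDOR_MASTER:
        errors.append("unknown vendor")
    try:
        amount = float(str(record.get("amount", "")).replace(",", ""))
        if amount <= 0:
            errors.append("non-positive amount")
    except ValueError:
        errors.append("amount is not numeric")
    return errors
```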