Components
The architecture of the Infor Document Processor (IDP) consists of these components:
- Input: The current architecture accepts input documents in these file formats: .pdf, .jpg or .jpeg, and .png.
- Preprocessing: IDP applies noise reduction or removal, rescaling, binarization, and rotation or deskewing (rotating an image to correct its tilt).
- Postprocessing: Includes table detection and template extraction.
- OCR Engine: In the current implementation, IDP uses AWS Textract OCR (optical character recognition) technology to process documents. The OCR output response is generated as a JSON file.
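
As an illustration of the Input and OCR Engine components above, the following minimal Python sketch checks a file name against the supported extensions and reads the text lines out of a Textract-style JSON response. The function names, the case-insensitive extension check, the confidence threshold, and the sample data are assumptions made for this example; they are not part of IDP. The `Blocks`, `BlockType`, `Text`, and `Confidence` fields follow the general shape of a Textract response.

```python
import json
from pathlib import Path
from typing import List

# Extensions listed under the Input component.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}


def is_supported(filename: str) -> bool:
    """Return True if the extension is a supported input format.

    Assumption: the comparison ignores case, so "scan.PDF" is accepted.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def extract_lines(ocr_json: str, min_confidence: float = 0.0) -> List[str]:
    """Collect the text of LINE blocks from a Textract-style JSON response.

    min_confidence is a hypothetical filter (0 to 100) added for illustration.
    """
    response = json.loads(ocr_json)
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


# Hand-written sample for illustration only, not real IDP output.
SAMPLE_RESPONSE = json.dumps(
    {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Invoice 1001", "Confidence": 99.1},
            {"BlockType": "WORD", "Text": "Invoice", "Confidence": 99.3},
            {"BlockType": "LINE", "Text": "Total: 42.00", "Confidence": 87.5},
        ]
    }
)

if __name__ == "__main__":
    print(is_supported("invoice.PDF"))
    print(extract_lines(SAMPLE_RESPONSE, min_confidence=90.0))
```

Reading only `LINE` blocks keeps the example short; later stages such as table detection would likely need other block types as well.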