Components

The Infor Document Processor (IDP) architecture consists of the following components:

  • Input: The input document can be a PDF (.pdf), JPEG (.jpg or .jpeg), or PNG (.png) file. These are the file formats the current architecture supports.
  • Preprocessing: The IDP preprocessing stage includes noise reduction/removal, rescaling, binarization, and rotation/deskewing (straightening a rotated or skewed image).
  • Postprocessing: Includes table detection and template extraction.
  • OCR Engine: In the current implementation, IDP uses AWS Textract (Amazon Textract), an optical character recognition (OCR) service, to process the documents. The OCR output response is generated in JSON format.
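The Input component accepts only the listed file types. A minimal sketch of such a check, assuming validation happens by extension plus a look at the file's leading "magic bytes" (the signature check is an illustrative assumption, not something the IDP description specifies; `is_supported_input` is a hypothetical name):

```python
from pathlib import Path

# Extensions listed in the IDP Input component description.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}

# Standard leading byte signatures for each format. Using them is an
# assumption of this sketch; the IDP description only mentions extensions.
MAGIC_BYTES = {
    ".pdf": b"%PDF",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
}


def is_supported_input(filename: str, header: bytes) -> bool:
    """Hypothetical check: accept a file only if its extension is supported
    and its first bytes match that format's signature."""
    ext = Path(filename).suffix.lower()  # ".PDF" -> ".pdf"
    if ext not in SUPPORTED_EXTENSIONS:
        return False
    return header.startswith(MAGIC_BYTES[ext])
```

Checking the signature as well as the extension catches files that were renamed without being converted, for example a PDF saved with a .png extension.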
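Binarization is listed as a preprocessing step, but the method IDP uses is not stated. As an illustration only, the sketch below applies Otsu's method, a common global-threshold technique, to a grayscale image represented as a list of rows of integers from 0 to 255 (an assumed representation; a production pipeline would more likely use an image library such as OpenCV). `otsu_threshold` and `binarize` are hypothetical names.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the threshold t (0..255) that maximizes the between-class
    variance when pixels <= t are background and pixels > t are foreground."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    sum_b = 0      # sum of intensities in the background class
    w_b = 0        # number of pixels in the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue           # no background pixels yet
        w_f = total - w_b
        if w_f == 0:
            break              # every pixel is background; no split left
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: list[list[int]]) -> list[list[int]]:
    """Map each pixel to 255 if it is above the Otsu threshold, else 0."""
    flat = [v for row in image for v in row]
    t = otsu_threshold(flat)
    return [[255 if v > t else 0 for v in row] for row in image]
```

For a scan with dark text (low values) on a light page (high values), the threshold falls between the two intensity clusters, so text becomes 0 and background becomes 255.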
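The OCR Engine step sends each document to Textract and keeps the JSON response. A minimal sketch, assuming the AWS SDK for Python (boto3) and the synchronous `detect_document_text` operation; the helper names `run_textract`, `save_response`, and `extract_lines` are hypothetical, and the IDP description does not say which Textract operation it calls.

```python
import json


def run_textract(document_bytes: bytes) -> dict:
    """Send one document to Amazon Textract and return the parsed JSON response.
    Needs boto3 and configured AWS credentials/region; boto3 is imported here
    so the parsing helper below works without it."""
    import boto3

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": document_bytes})


def save_response(response: dict, path: str) -> None:
    """Write the Textract response to disk as a .json file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(response, f, indent=2)


def extract_lines(response: dict) -> list[str]:
    """Return the text of every LINE block, in the order Textract lists them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
```

The response is a flat list of `Blocks` (PAGE, LINE, WORD, and others) linked by IDs. The synchronous operations handle single-page documents; multi-page PDFs generally require the asynchronous `start_document_text_detection` operation with the file in Amazon S3.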
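Postprocessing includes table detection, but the description does not say how IDP performs it. One possible approach, assumed here, is to call Textract's `analyze_document` with `FeatureTypes=["TABLES"]` and rebuild each table from its CELL blocks, which carry 1-based `RowIndex` and `ColumnIndex` values and link to their WORD blocks through CHILD relationships. `tables_from_textract` is a hypothetical helper, and merged cells are not handled.

```python
def tables_from_textract(response: dict) -> list[list[list[str]]]:
    """Rebuild every TABLE block in a Textract response as a grid of strings."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}

    def child_ids(block: dict):
        # Yield the IDs listed under the block's CHILD relationships.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    tables = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "TABLE":
            continue
        cells = [blocks[i] for i in child_ids(block)
                 if blocks[i].get("BlockType") == "CELL"]
        if not cells:
            tables.append([])
            continue
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            # Join the cell's WORD children into one string.
            words = [blocks[i]["Text"] for i in child_ids(cell)
                     if blocks[i].get("BlockType") == "WORD"]
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(grid)
    return tables
```

Template extraction could then map known row or column labels in these grids to the fields a given document template expects.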