Document extraction

Extracts key-value pairs from the given document and predicts the type of each field. For example, if the key-value pair "Invoice: 101" is found in a document, its type is predicted as "INVOICE NUMBER". For now, type prediction is limited to the types identified by the AWS Textract OCR provider.

Sync Extraction

/ocrsvc/v{ver}/DocumentExtraction is called when the user wants to extract all key-value pairs from an image document.

The table shows the required values:

Component    Description
API Method   /ocrsvc/v{ver}/DocumentExtraction
Input
  • ver - the version of the IDP
  • ocrDocument - the input file for the OCR engine (.jpeg, .jpg, or .png)
Output       The response is of JSON type
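
As a minimal sketch of how a client might prepare a sync call, the Python below fills in the {ver} placeholder and checks the file extension against the types listed in the table. The base URL, the function names, and the case-insensitive extension check are assumptions for illustration; the HTTP method and upload encoding are not stated here, so the sketch stops short of sending a request.

```python
from pathlib import Path

# File types the sync endpoint accepts, per the Input row of the table.
SYNC_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def sync_extraction_url(base_url: str, ver: str) -> str:
    """Fill the {ver} placeholder in /ocrsvc/v{ver}/DocumentExtraction.

    base_url is a hypothetical host; replace it with the real deployment.
    """
    return f"{base_url.rstrip('/')}/ocrsvc/v{ver}/DocumentExtraction"


def is_sync_document(filename: str) -> bool:
    """Return True if the extension is one the sync endpoint lists.

    Assumption: the comparison ignores case, so "scan.PNG" is accepted.
    """
    return Path(filename).suffix.lower() in SYNC_EXTENSIONS


if __name__ == "__main__":
    # Hypothetical host and version, for illustration only.
    print(sync_extraction_url("https://idp.example.com", "1"))
    print(is_sync_document("invoice.jpg"))
```

Checking the extension on the client side avoids sending a PDF to the sync endpoint, which only lists image formats.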

Async Extraction

/ocrsvc/v{ver}/AsyncDocumentExtraction is called when the user wants to extract all key-value pairs and the OCR document is a PDF file.

The table shows the required values:

Component    Description
API Method   /ocrsvc/v{ver}/AsyncDocumentExtraction
Input
  • ver - the version of the IDP
  • ocrDocument - the input file for the OCR engine (.pdf)
  • pageNo - the pages to process, given as individual page numbers, ranges, or a mix of both; e.g. for a 10-page PDF, values could be 1,2,3 or 1-3 or 1-3, 7-10
Output       The response is of JSON type
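
The pageNo formats can be illustrated with a small client-side parser. This is a sketch only: the function name parse_page_no is hypothetical, and the validation rules (1-based pages, no range beyond the page count) are assumptions inferred from the examples, not a description of how the service itself checks the value.

```python
def parse_page_no(page_no: str, page_count: int) -> list[int]:
    """Expand a pageNo string such as "1-3, 7-10" into sorted page numbers.

    Assumption: pages are 1-based and must not exceed page_count.
    """
    pages = set()
    for part in page_no.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in pageNo: {page_no!r}")
        if "-" in part:
            # A range such as "7-10" covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # A single page such as "2".
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1..{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


if __name__ == "__main__":
    # The three example values from the table, for a 10-page PDF.
    for value in ("1,2,3", "1-3", "1-3, 7-10"):
        print(value, "->", parse_page_no(value, 10))
```

Validating the value before submitting the request catches out-of-range pages early, which may be useful given that the extraction runs asynchronously.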