Document Key-Value

These endpoints are used for key-value extraction, where a single key or a list of keys is provided as input.

Sync Extraction

/ocrsvc/v{ver}/DocumentKeyValue is the endpoint to use when the OCR document is a JPG, JPEG, or PNG file and a key or list of keys is provided. It returns the extracted data directly in the response.

The table shows the required values:

Component    Description
API Method   /ocrsvc/v{ver}/DocumentKeyValue
Input        • ver: The version of the IDP
             • ocrDocument: The file to process (.jpg, .jpeg, or .png)
             • list_of_keys: The keys to extract, for example Invoice Number, Invoice Date, Address
Output       A response of JSON type
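
The sketch below shows one way a client could prepare a sync call using only the Python standard library. It is a minimal sketch under several assumptions that the table does not confirm: the request is a multipart/form-data POST, the form field names match the parameter names in the table, the keys are sent as one comma-separated value, and `BASE_URL` and `build_sync_request` are hypothetical names. The function builds the request without sending it, and rejects files the sync endpoint does not accept.

```python
import urllib.request
import uuid
from pathlib import Path

# Hypothetical host: the document does not give a base URL.
BASE_URL = "https://idp.example.com"
SYNC_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_sync_request(ver, filename, file_bytes, keys):
    """Build (but do not send) a POST to /ocrsvc/v{ver}/DocumentKeyValue.

    Assumes multipart/form-data, the parameter names from the table above
    (ocrDocument, list_of_keys), and keys joined into one comma-separated value.
    """
    ext = Path(filename).suffix.lower()
    if ext not in SYNC_EXTENSIONS:
        # PDFs go to AsyncDocumentKeyValue instead.
        raise ValueError(f"{filename!r} is not a .jpg, .jpeg, or .png file")
    boundary = uuid.uuid4().hex
    image_type = "image/png" if ext == ".png" else "image/jpeg"
    keys_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="list_of_keys"\r\n\r\n'
        f"{','.join(keys)}\r\n"
    ).encode()
    file_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="ocrDocument"; '
        f'filename="{Path(filename).name}"\r\n'
        f"Content-Type: {image_type}\r\n\r\n"
    ).encode() + file_bytes + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/ocrsvc/v{ver}/DocumentKeyValue",
        data=keys_part + file_part + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

For example, `build_sync_request(1, "invoice.png", data, ["Invoice Number", "Invoice Date"])` prepares the call; `urllib.request.urlopen(req)` would send it, and `json.loads` on the body would give the structure described under Response Output.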

Async Extraction

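/ocrsvc/v{ver}/AsyncDocumentKeyValue is the endpoint to use when the OCR document is a PDF file and a key or list of keys is provided. Instead of returning the extracted data directly, it returns a Task ID for the job.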

The table shows the required values:

Component    Description
API Method   /ocrsvc/v{ver}/AsyncDocumentKeyValue
Input        • ver: The version of the IDP
             • ocrDocument: The file to process (.pdf)
             • list_of_keys: The keys to extract, for example Invoice Number, Invoice Date, Address
             • pageNo: The pages to process, given as single pages, ranges, or a mix of both. For example, for a 10-page PDF, valid values include 1,2,3 or 1-3 or 1,3-5 or 1-3,7-10
Output       A Task ID is generated
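
The pageNo examples suggest comma-separated items, each a single page or an inclusive range such as 1-3. The sketch below, a hypothetical client-side helper named `expand_page_spec`, expands such a value and checks it against the document's page count before a job is submitted. Treating ranges as inclusive follows from 1-3 being listed as an alternative to 1,2,3; whether the service itself accepts stray spaces is not stated, so this sketch simply ignores them.

```python
def expand_page_spec(spec, page_count):
    """Expand a pageNo value such as "1,3-5" into a sorted list of pages.

    Each comma-separated item is a single page or an inclusive range "a-b";
    whitespace is ignored. Raises ValueError for malformed or out-of-range items.
    """
    pages = set()
    for raw_item in spec.split(","):
        item = "".join(raw_item.split())  # drop spaces, e.g. "3- 5" -> "3-5"
        if not item:
            raise ValueError(f"empty item in {spec!r}")
        if "-" in item:
            start_text, _, end_text = item.partition("-")
            start, end = int(start_text), int(end_text)  # int() rejects "3-" or "1-2-3"
        else:
            start = end = int(item)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"{item!r} is outside pages 1-{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For the 10-page example above, `expand_page_spec("1-3,7-10", 10)` yields pages 1, 2, 3, 7, 8, 9, and 10, while a value such as 8-12 raises an error instead of being sent to the service.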

To retrieve the result, submit the Task ID to /ocrsvc/v{ver}/GetJobResult. This endpoint returns the job result of any async API for the given Task ID.
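
Because the async call only returns a Task ID, a client typically polls GetJobResult until the job is finished. The document does not say how a pending job is reported, how the Task ID is passed, or which HTTP method to use, so this minimal sketch leaves both the request and the completion check to caller-supplied functions. `poll_job_result`, `fetch_result`, and `is_done` are hypothetical names; `sleep` and `clock` are injectable only so the loop can be exercised without real waiting.

```python
import time


def poll_job_result(fetch_result, task_id, is_done, interval=2.0, timeout=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_result(task_id) until is_done(result) is true or timeout expires.

    fetch_result is assumed to wrap a call to /ocrsvc/v{ver}/GetJobResult and
    return the decoded JSON; is_done decides whether that result is final.
    """
    deadline = clock() + timeout
    while True:
        result = fetch_result(task_id)
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout} s")
        sleep(interval)  # wait before asking GetJobResult again
```

If, for instance, a finished result can be recognized by the presence of `ExtractionData`, then `is_done=lambda r: "ExtractionData" in r` would work; that check is an assumption to verify against the actual service.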

Response Output

{
  "ExtractionData": [
    {
      "FieldName": "Name of the extracted entity",
      "FieldValue": "Extracted text value",
      "FieldGeometry": [Left, Top, Width, Height],
      "Confidence": "Confidence score of extraction",
      "PageNo": "Page number where the field was found"
    }
  ],
  "_metadata": {
    "Confidence": "Overall confidence score of the extraction",
    "TaskID": "Unique identifier for the OCR processing job",
    "OcrProvider": "Name of the OCR service provider used",
    "TenantID": "Identifier for the tenant or client using the service",
    "NumberOfPages": "Total number of pages in the document"
  }
}
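
A common next step is to turn ExtractionData into a lookup by key. The sketch below, a hypothetical helper named `fields_by_name`, keeps only entries at or above a confidence threshold and, when the same FieldName appears on several pages, keeps the most confident match. It converts Confidence with `float()` because the schema above does not say whether the value is a string or a number; the confidence scale (for example 0 to 1 or 0 to 100) is also not documented, so the threshold must be chosen to match it.

```python
def fields_by_name(response, min_confidence=0.0):
    """Map FieldName -> (FieldValue, PageNo, confidence) from a response dict.

    Entries below min_confidence are skipped; if a field occurs more than once,
    the entry with the highest confidence is kept.
    """
    best = {}
    for item in response.get("ExtractionData", []):
        confidence = float(item.get("Confidence", 0))  # works for "0.9" or 0.9
        if confidence < min_confidence:
            continue
        name = item["FieldName"]
        if name not in best or confidence > best[name][2]:
            best[name] = (item.get("FieldValue"), item.get("PageNo"), confidence)
    return best
```

For example, `fields_by_name(result, min_confidence=0.8)` on a decoded response returns one entry per key that met the threshold, which can then be checked against the keys sent in list_of_keys.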