Document key-value

Use these endpoints when the extraction is key-based, that is, when a key or a list of keys is provided as input.

Sync extraction

/ocrsvc/v{ver}/DocumentKeyValue is the endpoint used when the OCR document is a .jpg, .jpeg, or .png file and a key or list of keys is provided.

The table shows the required values:

Component Description
API Method /ocrsvc/v{ver}/DocumentKeyValue
Input
  • ver: 1 (default value)
  • ocrDocument: Processes .jpg, .jpeg, and .png files as input.
  • list_of_keys: For example, Invoice Number, Invoice Date, Address, etc.
Output The response is returned in JSON format.
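
As a rough illustration of the inputs above, the following Python sketch fills in the {ver} placeholder and checks that a file is one of the image types the sync endpoint accepts. The base URL https://ocr.example.com and both helper names are hypothetical. How the file and keys are actually transmitted (for example, as multipart form fields) is not specified here, so the sketch stops short of sending a request.

```python
from pathlib import PurePath

# File types accepted by the sync endpoint, per the table above.
SYNC_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def endpoint_url(base_url: str, api_name: str, ver: int = 1) -> str:
    """Build /ocrsvc/v{ver}/<api_name>; ver defaults to 1 as documented."""
    return f"{base_url.rstrip('/')}/ocrsvc/v{ver}/{api_name}"


def sync_request_inputs(filename: str, keys: list[str]) -> dict:
    """Collect the documented inputs for DocumentKeyValue (hypothetical helper)."""
    ext = PurePath(filename).suffix.lower()
    if ext not in SYNC_EXTENSIONS:
        raise ValueError(f"{ext or 'no extension'} is not accepted by the sync endpoint")
    if not keys:
        raise ValueError("at least one key is required")
    # Field names follow the Input list above; the wire format is an assumption.
    return {"ocrDocument": filename, "list_of_keys": list(keys)}


# Hypothetical base URL, for illustration only.
url = endpoint_url("https://ocr.example.com", "DocumentKeyValue")
inputs = sync_request_inputs("invoice.png", ["Invoice Number", "Invoice Date"])
```

Rejecting a .pdf on the client side avoids a round trip for a file that belongs on the async endpoint instead.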

Async extraction

/ocrsvc/v{ver}/AsyncDocumentKeyValue is called when the OCR document is a .pdf file and a key or list of keys is provided.

The table shows the required values:

Component Description
API Method /ocrsvc/v{ver}/AsyncDocumentKeyValue
Input
  • ver: 1 (default value)
  • ocrDocument: Processes a .pdf file as input.
  • list_of_keys: For example, Invoice Number, Invoice Date, Address, etc.
  • pageNo: The pages to process, given as individual pages, page ranges, or a mix of both. For example, for a 10-page .pdf, valid values include 1,2,3 or 1-3 or 1,3-5 or 1-3,7-10.
Output A Task ID is generated.
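
The pageNo formats listed above can be checked on the client before a request is submitted. The sketch below is one possible way to expand such a string into page numbers. The function name parse_page_spec is hypothetical, and whether the service itself tolerates spaces or rejects out-of-range pages is an assumption, not documented behavior.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a pageNo string such as "1-3, 7-10" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty entry in page specification")
        if "-" in part:
            # A range such as "3-5" or "3- 5"; int() ignores surrounding spaces.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"{part!r} is outside pages 1-{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


# The example values from the Input list, for a 10-page document.
pages = parse_page_spec("1-3, 7-10", page_count=10)
```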

To retrieve the result, submit the Task ID to /ocrsvc/v{ver}/GetJobResult. This API returns the job result of any Async API call for the given Task ID.
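
Because the async endpoint returns only a Task ID, a client typically has to call GetJobResult repeatedly until the job finishes. The following Python sketch shows one way to structure that loop. The function wait_for_job_result, its parameters, and the fetch callable are hypothetical. The sketch also assumes that a finished job can be recognized by the presence of the ExtractionData key, since the response returned for a job that is still running is not described here.

```python
import time
from typing import Callable


def wait_for_job_result(
    fetch: Callable[[str], dict],
    task_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Call fetch(task_id) until the response contains ExtractionData."""
    for attempt in range(max_attempts):
        result = fetch(task_id)
        # Assumption: a finished job is recognizable by the ExtractionData key.
        if "ExtractionData" in result:
            return result
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} attempts")


# Stand-in for a real HTTP call to GetJobResult: pending twice, then done.
responses = iter([{}, {}, {"ExtractionData": [], "_metadata": {"TaskID": "task-123"}}])
result = wait_for_job_result(lambda task_id: next(responses), "task-123", interval_s=0)
```

Passing the HTTP call in as fetch keeps the retry logic independent of any particular HTTP client or authentication scheme.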

Response output

{
  "ExtractionData": [
    {
      "FieldName": "Name of the extracted entity",
      "FieldValue": "Extracted text value",
      "FieldGeometry": [Left, Top, Width, Height],
      "Confidence": "Confidence score of extraction",
      "PageNo": "Page number where the field was found"
    }
  ],
  "_metadata": {
    "Confidence": "Overall confidence score of the extraction",
    "TaskID": "Unique identifier for the OCR processing job",
    "OcrProvider": "Name of the OCR service provider used",
    "TenantID": "Identifier for the tenant or client using the service",
    "NumberOfPages": "Total number of pages in the document"
  }
}
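
To show how a client might consume this structure, the Python sketch below maps each FieldName to its FieldValue and drops low-confidence entries. The sample values are invented for illustration, and the helper fields_above is hypothetical. The sketch assumes Confidence is a number (or numeric string) on a 0 to 1 scale, which the format above does not state.

```python
import json

# Hypothetical sample response; all values are invented for illustration.
sample = json.loads("""
{
  "ExtractionData": [
    {"FieldName": "Invoice Number", "FieldValue": "INV-001",
     "FieldGeometry": [0.1, 0.2, 0.3, 0.05], "Confidence": 0.97, "PageNo": 1},
    {"FieldName": "Invoice Date", "FieldValue": "2024-01-15",
     "FieldGeometry": [0.1, 0.3, 0.2, 0.05], "Confidence": 0.62, "PageNo": 1}
  ],
  "_metadata": {"Confidence": 0.8, "TaskID": "task-123", "OcrProvider": "example",
                "TenantID": "tenant-1", "NumberOfPages": 1}
}
""")


def fields_above(response: dict, min_confidence: float) -> dict[str, str]:
    """Map FieldName to FieldValue, keeping entries at or above the threshold."""
    kept = {}
    for entry in response.get("ExtractionData", []):
        # float() accepts both numbers and numeric strings.
        if float(entry["Confidence"]) >= min_confidence:
            kept[entry["FieldName"]] = entry["FieldValue"]
    return kept


confident = fields_above(sample, 0.9)
```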