Document Key-Value

Use these endpoints when extraction is based on key-value pairs and a key or a list of keys is provided as input.

Sync Extraction

/ocrsvc/v{ver}/DocumentKeyValue is the endpoint used when the OCR document is a JPG, JPEG, or PNG file and a key or list of keys is provided.

The table shows the required values:

Component Description
API Method /ocrsvc/v{ver}/DocumentKeyValue
Input
  • ver - The version of the IDP
  • ocrDocument - The input file for the OCR engine (.jpeg, .jpg, or .png)
  • list_of_keys - For example, Invoice Number, Invoice Date, Address, etc.
Output The response is in JSON format.

Async Extraction

/ocrsvc/v{ver}/AsyncDocumentKeyValue is the endpoint used when the OCR document is a PDF file and a key or list of keys is provided.
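The choice between the sync and async endpoints depends only on the input file type, so a client could pick the path before sending anything. The following is a minimal sketch of that rule, not part of the service itself. The function name key_value_endpoint is hypothetical, and it assumes that ver is substituted literally into the path and that file extensions are matched case-insensitively.

```python
from pathlib import Path

# File types named in this document for each endpoint.
SYNC_EXTENSIONS = {".jpg", ".jpeg", ".png"}
ASYNC_EXTENSIONS = {".pdf"}


def key_value_endpoint(filename: str, ver: str) -> str:
    """Return the key-value endpoint path for the given input file.

    Hypothetical helper: images go to the sync endpoint, PDFs to the
    async endpoint. Case-insensitive matching is an assumption.
    """
    ext = Path(filename).suffix.lower()
    if ext in SYNC_EXTENSIONS:
        return f"/ocrsvc/v{ver}/DocumentKeyValue"
    if ext in ASYNC_EXTENSIONS:
        return f"/ocrsvc/v{ver}/AsyncDocumentKeyValue"
    raise ValueError(f"Unsupported file type: {filename!r}")
```

For example, key_value_endpoint("invoice.pdf", "1") would select the async path, while a .png file would select the sync path.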

The table shows the required values:

Component Description
API Method /ocrsvc/v{ver}/AsyncDocumentKeyValue
Input
  • ver - The version of the IDP
  • ocrDocument - The input file for the OCR engine (.pdf)
  • list_of_keys - For example, Invoice Number, Invoice Date, Address, and so on
  • pageNo - The pages to process. For example, for a 10-page PDF, the value could be 1,2,3 or 1-3 or 1-3, 7-10
Output The response is in JSON format.
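The pageNo formats listed above (single pages, ranges, and comma-separated combinations) can be checked on the client side before a request is made. The sketch below shows one way to expand such a value into explicit page numbers. The function name parse_page_spec and its validation rules are assumptions for illustration; how the service itself interprets or validates pageNo is not specified here.

```python
from typing import List


def parse_page_spec(spec: str, total_pages: int) -> List[int]:
    """Expand a pageNo value such as "1-3, 7-10" into sorted page numbers.

    Hypothetical client-side helper. Pages are treated as 1-based, as in
    the examples in this document.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"Page range out of bounds: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

With a 10-page PDF, the three example values from the list expand to pages 1 to 3 for the first two forms, and to pages 1 to 3 plus 7 to 10 for the third.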

Sample JSON response template

{
  "ExtractionData": [
    {
      "FieldName": "Invoice #",
      "FieldValue": "616909624",
      "FieldGeometry": [
        0.17446370422840118,
        0.29006290435791016,
        0.07185196131467819,
        0.010504248552024364
      ],
      "Confidence": 95.11544036865234,
      "PageNo": 1
    },
    {
      "FieldName": "order #",
      "FieldValue": "118553824",
      "FieldGeometry": [
        0.6294771432876587,
        0.2897982895374298,
        0.07047391682863235,
        0.010583918541669846
      ],
      "Confidence": 95.08195495605469,
      "PageNo": 1
    }
  ],
  "_metadata": {
    "Confidence": 95.09869766235352,
    "TaskID": "bc348274-9508-448d-b5ad-26f444c01508",
    "OcrProvider": "AWS_TEXTRACT",
    "TenantID": "IDDPDEV_TST",
    "NumberOfPages": 1
  }
}
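A response shaped like the sample above can be reduced to a simple key-to-value mapping, and fields can be flagged for review by confidence. The following is a minimal sketch that relies only on the field names shown in the sample (ExtractionData, FieldName, FieldValue, Confidence). The helper names and the review threshold are hypothetical, and the sample data below is abbreviated from the template.

```python
import json
from typing import Any, Dict, List


def fields_by_name(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each FieldName to its FieldValue.

    If a field name repeats, the last occurrence wins.
    """
    return {
        item["FieldName"]: item["FieldValue"]
        for item in response.get("ExtractionData", [])
    }


def low_confidence_fields(response: Dict[str, Any], threshold: float) -> List[str]:
    """Return the names of fields whose Confidence is below the threshold."""
    return [
        item["FieldName"]
        for item in response.get("ExtractionData", [])
        if item["Confidence"] < threshold
    ]


# Abbreviated copy of the sample response (geometry and metadata omitted).
sample = json.loads("""
{
  "ExtractionData": [
    {"FieldName": "Invoice #", "FieldValue": "616909624",
     "Confidence": 95.11544036865234, "PageNo": 1},
    {"FieldName": "order #", "FieldValue": "118553824",
     "Confidence": 95.08195495605469, "PageNo": 1}
  ]
}
""")

print(fields_by_name(sample))
print(low_confidence_fields(sample, threshold=95.1))
```

The threshold of 95.1 is chosen only to separate the two sample fields; a real cutoff would depend on the documents being processed.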