API endpoint documentation

Sync Document OCR

Extracts all the text present in the given document.

The following table lists the API method, its inputs, and its output:

Component Description
API Method /ocrsvc/v{ver}/DocumentOCR
Input
  • ver - The version of the IDP
  • ocrDocument - The document to process. Accepts .jpg, .jpeg, and .png files.
Output A JSON response
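
As a rough illustration of the inputs above, the following Python sketch fills in the {ver} placeholder of the documented path and checks a file name against the listed extensions before upload. The function names, the version value "1", and the case-insensitive extension check are assumptions made for this example; the host, HTTP method, and authentication are not specified in this document and are left out.

```python
from pathlib import Path

# Extensions the Sync Document OCR table lists as accepted input.
SYNC_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def sync_ocr_path(ver: str) -> str:
    """Fill the {ver} placeholder in the documented path (hypothetical helper)."""
    return f"/ocrsvc/v{ver}/DocumentOCR"


def is_supported_sync_file(filename: str) -> bool:
    """Return True if the extension is one the sync endpoint lists.

    Comparing case-insensitively is an assumption, not documented behavior.
    """
    return Path(filename).suffix.lower() in SYNC_EXTENSIONS


print(sync_ocr_path("1"))                  # "1" is a hypothetical version value
print(is_supported_sync_file("scan.PNG"))  # True
print(is_supported_sync_file("scan.pdf"))  # False: PDFs go to the async endpoint
```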

Async Document OCR

Extracts all the text present in the given document.

The following table lists the API method, its inputs, and its output:

Component Description
API Method /ocrsvc/v{ver}/AsyncDocumentOCR
Input
  • ver - The version of the IDP
  • pageNo - The pages to process. For example, for a 10-page PDF, valid values include 1,2,3 or 1-3 or 1-3, 7-10
  • ocrDocument - The document to process. Accepts .pdf files.
Output A Task ID is generated
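
The pageNo formats above (single pages, ranges, and comma-separated combinations) can be checked on the client side before a request is sent. The sketch below is one possible way to expand such a value into a list of page numbers. The function name expand_page_spec and its validation rules are assumptions for illustration; how the service itself parses pageNo is not documented here.

```python
def expand_page_spec(spec: str, total_pages: int) -> list[int]:
    """Expand a pageNo value such as "1-3, 7-10" into individual page numbers.

    Hypothetical client-side helper: raises ValueError for a range that is
    reversed or falls outside 1..total_pages.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a stray comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


# The three example values from the table, for a 10-page PDF.
print(expand_page_spec("1,2,3", 10))      # [1, 2, 3]
print(expand_page_spec("1-3", 10))        # [1, 2, 3]
print(expand_page_spec("1-3, 7-10", 10))  # [1, 2, 3, 7, 8, 9, 10]
```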

To retrieve the result, submit the Task ID to /ocrsvc/v{ver}/GetJobResult

This API returns the job result of any Async API for the given Task ID. The JSON below is a sample result for Async Document OCR:

{
  "data": [
    {
      "OCR_text": "Provide extracted text", 
      "PageNo": "Provide page number"
    }
  ],
  "_metadata": {
    "TotalWords": "Provide total word count",
    "TotalLine": "Provide total line count",
    "Confidence": "Provide confidence percentage",
    "TaskID": "Provide unique task ID",
    "OcrProvider": "Provide OCR provider name",
    "TenantID": "Provide tenant ID",
    "NumberOfPages": "Provide total number of pages"
  }
}
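
A result with the shape shown above could be read as in the following Python sketch, which collects the extracted text per page and pulls the Task ID from the metadata. All values in SAMPLE_RESULT are made up for this example, the helper text_by_page is hypothetical, and the values are kept as strings only because the sample above shows strings; the actual value types are not documented here.

```python
import json

# Made-up values in the documented shape; only some metadata keys are shown.
SAMPLE_RESULT = """
{
  "data": [
    {"OCR_text": "Invoice 001", "PageNo": "1"},
    {"OCR_text": "Total due", "PageNo": "2"}
  ],
  "_metadata": {
    "TotalWords": "4",
    "TaskID": "example-task-id",
    "NumberOfPages": "2"
  }
}
"""


def text_by_page(result: dict) -> dict:
    """Map each PageNo to its OCR_text (hypothetical helper)."""
    return {entry["PageNo"]: entry["OCR_text"] for entry in result["data"]}


result = json.loads(SAMPLE_RESULT)
pages = text_by_page(result)
task_id = result["_metadata"]["TaskID"]

print(task_id)
for page_no, text in pages.items():
    print(page_no, text)
```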