API endpoint documentation

Sync Document OCR

Extracts all the text present in the given image document and returns it directly in the response.

The following table describes the endpoint:

Component    Description
API Method   /ocrsvc/v{ver}/DocumentOCR
Input        • ver: API version; 1 (default value)
             • ocrDocument: The document to process. Accepts .jpg, .jpeg, and .png files.
Output       A JSON response.
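The following Python sketch builds, without sending, a request to the Sync Document OCR endpoint using only the standard library. It assumes a multipart/form-data POST with the file in a form field named ocrDocument, and it expects the caller to supply the base URL (the one used in the tests, https://ocr.example.com, is hypothetical). The HTTP method, body encoding, host, and any authentication headers are not specified above and should be confirmed for your deployment.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# File types the Sync Document OCR endpoint accepts, per the table above.
SYNC_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_sync_ocr_request(base_url: str, filename: str, content: bytes,
                           ver: int = 1) -> urllib.request.Request:
    """Build (but do not send) a request to /ocrsvc/v{ver}/DocumentOCR.

    Assumption: the endpoint takes a multipart/form-data POST with the
    file in a form field named "ocrDocument". Confirm for your deployment.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in SYNC_EXTENSIONS:
        raise ValueError(f"unsupported file type for sync OCR: {suffix or filename}")

    # A random boundary keeps the delimiter from colliding with file bytes.
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="ocrDocument"; '
        f'filename="{Path(filename).name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode() + content + f"\r\n--{boundary}--\r\n".encode()

    url = f"{base_url.rstrip('/')}/ocrsvc/v{ver}/DocumentOCR"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send the request, pass it to urllib.request.urlopen and decode the body with json.load, since the endpoint returns JSON.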

Async Document OCR

Extracts all the text present in the given PDF document asynchronously: the call returns a Task ID rather than the extracted text.

The following table describes the endpoint:

Component    Description
API Method   /ocrsvc/v{ver}/AsyncDocumentOCR
Input        • ver: API version; 1 (default value)
             • pageNo: The pages to process, given as individual pages, ranges, or both. For example, for a 10-page .pdf, valid values include 1,2,3 or 1-3 or 1-3, 7-10.
             • ocrDocument: The document to process. Accepts .pdf files.
Output       A Task ID for the submitted job.
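To check a pageNo value on the client before submitting a job, the following Python sketch expands it into a sorted list of page numbers. It assumes commas separate entries, a hyphen marks an inclusive range, and surrounding whitespace is ignored; the helper name parse_page_numbers is hypothetical, and the service's own validation rules may differ.

```python
from typing import List, Set


def parse_page_numbers(page_no: str, total_pages: int) -> List[int]:
    """Expand a pageNo value such as "1,2,3", "1-3", or "1-3, 7-10".

    Assumed rules: commas separate entries, "a-b" is an inclusive range,
    and every page must fall within 1..total_pages.
    """
    pages: Set[int] = set()
    for part in page_no.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in pageNo: {page_no!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)  # int() ignores spaces
        else:
            start = end = int(part)
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"page range {part!r} is outside 1-{total_pages}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For the 10-page example above, "1-3, 7-10" expands to pages 1, 2, 3, 7, 8, 9, and 10, and overlapping entries such as "1-3,2" are de-duplicated.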

To retrieve the extracted text, submit the Task ID to /ocrsvc/v{ver}/GetJobResult.
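Because the job runs asynchronously, a client typically polls GetJobResult until the result is available. The following Python sketch shows one way to do that with a fixed interval and an overall timeout. The fetch argument is a hypothetical caller-supplied function that calls /ocrsvc/v{ver}/GetJobResult and returns the parsed JSON, or None while the job is still running; how the service actually signals an unfinished job is not documented here, so that mapping is left to fetch.

```python
import time
from typing import Callable, Optional


def wait_for_job_result(
    task_id: str,
    fetch: Callable[[str], Optional[dict]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll GetJobResult (via the hypothetical `fetch`) until a result arrives.

    Raises TimeoutError if no result is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(task_id)
        if result is not None:
            return result  # job finished; hand back the parsed JSON
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {task_id} did not finish within {timeout} s")
        sleep(interval)  # wait before asking again
```

The sleep parameter exists so tests can replace the real delay; in production the default time.sleep is used.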

GetJobResult returns the job result of any async API for the given Task ID. The following JSON is a sample result for Async Document OCR:

{
  "data": [
    {
      "OCR_text": "Provide extracted text", 
      "PageNo": "Provide page number"
    }
  ],
  "_metadata": {
    "TotalWords": "Provide total word count",
    "TotalLine": "Provide total line count",
    "Confidence": "Provide confidence percentage",
    "TaskID": "Provide unique task ID",
    "OcrProvider": "Provide OCR provider name",
    "TenantID": "Provide tenant ID",
    "NumberOfPages": "Provide total number of pages"
  }
}
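Once the job result is retrieved, the per-page entries in data can be combined into a single text. The following Python sketch follows the field names in the sample above. Because the sample shows only placeholder strings, the value types are assumptions: PageNo is converted with int() in case the service returns it as a string, and the metadata values are passed through unchanged. The helper name summarize_ocr_result is hypothetical.

```python
import json


def summarize_ocr_result(payload: str) -> dict:
    """Combine per-page OCR text from an Async Document OCR job result.

    Field names (data, OCR_text, PageNo, _metadata, TaskID, NumberOfPages,
    Confidence) follow the documented sample; value types are assumed.
    """
    result = json.loads(payload)
    # Order pages numerically so the joined text reads front to back.
    pages = sorted(result.get("data", []), key=lambda page: int(page["PageNo"]))
    metadata = result.get("_metadata", {})
    return {
        "task_id": metadata.get("TaskID"),
        "page_count": metadata.get("NumberOfPages"),
        "confidence": metadata.get("Confidence"),
        "text_by_page": {int(p["PageNo"]): p["OCR_text"] for p in pages},
        "full_text": "\n".join(p["OCR_text"] for p in pages),
    }
```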