Template Extraction

These endpoints extract data from a given document based on a given template.

Sync Extraction

/ocrsvc/v{ver}/TemplateExtraction is called when the user wants to extract data from an image file in one of the supported formats: .jpg, .jpeg, or .png.

The table shows the required values:

Component Description
API Method /ocrsvc/v{ver}/TemplateExtraction
Input
  • ver - The version of the IDP
  • ocrDocument - The input file for the OCR engine (.jpeg, .jpg, or .png)
  • documentType - The type of the document
  • templateFile - A JSON file that represents the template of the document
Output The response is in JSON format.
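
The following is a minimal sketch of how a client might call this endpoint. Only the path and the input names (ver, ocrDocument, documentType, templateFile) come from the table above. The host https://idp.example.com, the use of HTTP POST, the multipart/form-data encoding, and the helper names are assumptions made for illustration and should be checked against the actual service. The sketch also assumes the third-party requests package is installed.

```python
from pathlib import Path

# File formats the table lists for the sync endpoint.
SYNC_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def sync_extraction_url(base_url: str, ver: str) -> str:
    """Fill the {ver} placeholder of the documented path."""
    return f"{base_url.rstrip('/')}/ocrsvc/v{ver}/TemplateExtraction"


def check_sync_document(path: str) -> None:
    """Reject files the sync endpoint is not documented to accept."""
    suffix = Path(path).suffix.lower()
    if suffix not in SYNC_EXTENSIONS:
        raise ValueError(f"unsupported file type for sync extraction: {suffix!r}")


def extract_sync(base_url, ver, document_path, document_type, template_path):
    """Send the request. POST and multipart encoding are assumptions."""
    import requests  # third-party; imported lazily so the helpers above work without it

    check_sync_document(document_path)
    with open(document_path, "rb") as doc, open(template_path, "rb") as tpl:
        response = requests.post(
            sync_extraction_url(base_url, ver),
            data={"documentType": document_type},
            files={"ocrDocument": doc, "templateFile": tpl},
        )
    response.raise_for_status()
    return response.json()  # the response is documented as JSON
```

With these helpers, sync_extraction_url("https://idp.example.com", "1") returns "https://idp.example.com/ocrsvc/v1/TemplateExtraction". Checking the file extension on the client side avoids sending a .pdf to the sync endpoint by mistake.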

Async Extraction

/ocrsvc/v{ver}/AysncTemplateExtraction is called when the user wants to extract data from a .pdf file.

The table shows the required values:

Component Description
API Method /ocrsvc/v{ver}/AysncTemplateExtraction
Input
  • ver - The version of the IDP
  • ocrDocument - The input file for the OCR engine (.pdf)
  • documentType - The type of the document
  • pageNo - The pages to process. For example, for a PDF with 10 pages, the value could be 1,2,3 or 1-3 or 1-3, 7-10
  • templateFile - A JSON file that represents the template of the document
Output The response is in JSON format.
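
The pageNo formats listed above (single pages, ranges, and comma-separated combinations of both) can be illustrated with a small client-side check. This is only a sketch: the function name parse_page_spec is hypothetical, and the way the service itself parses or validates pageNo is not described here, so the rules below (1-based pages, inclusive ranges, duplicates merged) are assumptions.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a pageNo value such as "1-3, 7-10" into a sorted list of pages.

    Assumes pages are numbered from 1 and that ranges include both ends.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page selection {part!r} is outside 1..{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 10-page PDF, parse_page_spec("1-3, 7-10", 10) returns [1, 2, 3, 7, 8, 9, 10], and parse_page_spec("11", 10) raises ValueError.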

Template Creation

"DocumentTypeId": "Direct Order Form",
  "TemplateID": "directorder1",
  "Page": [
    {
      "PageID": "1",
      "StartReg": "",
      "EndReg": "",
      "Fields": [
        {
          "FieldName": "Entity_Name",
          "Type": "Text",
          "ExtractionParser": [
            {
              "Type": "REExtractor",
              "PaserInput": {
                "regtext": "Agreement between ([\\s\\S]*?) and [\\s\\S]*? \\(\"Licensee\"\\)"
              }
            }
          ]
        }