Document extraction
Extracts key-value pairs from the given document and predicts the type of each field. For example, if the key-value pair "Invoice: 101" is found in a document, its type is predicted as "INVOICE NUMBER". Currently, type prediction is limited to the types identified by the AWS Textract OCR provider.
Sync Extraction
Call /ocrsvc/v{ver}/DocumentExtraction to extract all key-value pairs from a document synchronously.
The table shows the required values:
Component | Description |
---|---|
API Method | /ocrsvc/v{ver}/DocumentExtraction |
Input | |
Output | The response is returned as JSON (see Response output). |
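The Input row above does not specify the request format, so the following Python sketch only shows one plausible way to call the synchronous endpoint. The base URL, the use of POST, and sending the raw document bytes as `application/octet-stream` are assumptions, and `build_sync_extraction_request` and `send_sync_extraction` are hypothetical helper names.

```python
import json
import urllib.request


def build_sync_extraction_request(
    base_url: str, version: str, document: bytes
) -> urllib.request.Request:
    """Build (but do not send) a request to /ocrsvc/v{ver}/DocumentExtraction.

    Assumption: the endpoint accepts the raw document bytes as a POST body.
    The real input format is not documented here.
    """
    url = f"{base_url.rstrip('/')}/ocrsvc/v{version}/DocumentExtraction"
    return urllib.request.Request(
        url,
        data=document,
        method="POST",  # assumed HTTP method
        headers={"Content-Type": "application/octet-stream"},  # assumed content type
    )


def send_sync_extraction(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send_sync_extraction(build_sync_extraction_request("https://ocr.example.com", "1", data))` would return the decoded JSON body, where `https://ocr.example.com` is a placeholder host. Building the request separately from sending it makes the URL and headers easy to inspect before any network call.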
Async Extraction
Call /ocrsvc/v{ver}/AsyncDocumentExtraction to extract all key-value pairs asynchronously. Use this endpoint when the OCR document is a PDF file.
The table shows the required values:
Component | Description |
---|---|
API Method | /ocrsvc/v{ver}/AsyncDocumentExtraction |
Input | |
Output | A task ID is generated. |
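Because the asynchronous endpoint returns a task ID rather than the extraction result, a client has to route PDF files to it and then wait for the result. The Python sketch below assumes the PDF check is based on the file extension, and it takes a caller-supplied `fetch_result` function (hypothetical) that returns the finished response or `None` while the task is still running. This document does not describe how results are retrieved for a task ID, so that part is left to the caller.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def extraction_path(version: str, filename: str) -> str:
    """Route PDF files to the async endpoint and everything else to the sync one.

    Assumption: "is a PDF" is decided by the file extension.
    """
    if Path(filename).suffix.lower() == ".pdf":
        return f"/ocrsvc/v{version}/AsyncDocumentExtraction"
    return f"/ocrsvc/v{version}/DocumentExtraction"


def wait_for_result(
    task_id: str,
    fetch_result: Callable[[str], Optional[dict]],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Poll fetch_result(task_id) until it returns a response or the timeout expires.

    fetch_result is hypothetical and caller-supplied: it should return the
    extraction response once the task has finished, or None while it is running.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_result(task_id)
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} s")
        time.sleep(poll_interval)
```

Passing the retrieval step in as a function keeps the polling loop independent of whatever status or result endpoint the service actually exposes.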
Response output
```json
{
  "ExtractionData": [
    {
      "FieldName": "Name of the extracted entity",
      "FieldValue": "Extracted text value, possibly multi-line",
      "FieldGeometry": [
        "[Left, Top, Width, Height] of the field name",
        "[Left, Top, Width, Height] of the field value"
      ],
      "Confidence": ["Confidence score for each extracted value"],
      "PageNo": "Page number where the field was found",
      "Type": {
        "Value": "Category of the extracted field (e.g., ADDRESS, DATE)",
        "Confidence": "Confidence score for field classification"
      }
    }
  ],
  "_metadata": {
    "TotalFields": "Total number of extracted fields",
    "Confidence": "Overall confidence score of the extraction",
    "TaskID": "Unique identifier for the OCR processing job",
    "OcrProvider": "Name of the OCR service provider used",
    "TenantID": "Identifier for the tenant or client using the service",
    "NumberOfPages": "Total number of pages in the document"
  }
}
```
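As a sketch of consuming this response, the Python helpers below group extracted fields by their predicted type and convert one geometry box to pixel coordinates. They assume that `Type.Confidence` is numeric and that `[Left, Top, Width, Height]` are ratios of the page size, as in AWS Textract bounding boxes; neither is confirmed by the schema. `summarize_extraction` and `to_pixels` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def summarize_extraction(response: dict, min_confidence: float = 0.0) -> Dict[str, List[dict]]:
    """Group fields in ExtractionData by Type.Value.

    Fields whose type confidence is below min_confidence are skipped.
    Assumption: Type.Confidence is a number on the service's own scale.
    """
    by_type: Dict[str, List[dict]] = defaultdict(list)
    for field in response.get("ExtractionData", []):
        field_type = field.get("Type", {})
        if float(field_type.get("Confidence", 0)) < min_confidence:
            continue
        by_type[field_type.get("Value", "UNKNOWN")].append(
            {
                "name": field.get("FieldName"),
                "value": field.get("FieldValue"),
                "page": field.get("PageNo"),
            }
        )
    return dict(by_type)


def to_pixels(
    box: Sequence[float], page_width: int, page_height: int
) -> Tuple[int, int, int, int]:
    """Convert a [Left, Top, Width, Height] box to pixel (x0, y0, x1, y1).

    Assumption: the box values are ratios of the page width and height.
    """
    left, top, width, height = box
    return (
        round(left * page_width),
        round(top * page_height),
        round((left + width) * page_width),
        round((top + height) * page_height),
    )
```

For instance, `summarize_extraction(response, min_confidence=50.0)` keeps only fields whose type prediction meets that threshold, and `to_pixels(field["FieldGeometry"][1], width, height)` would locate the field value on a rendered page image.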