Create Document Processor Flow

With Document Processor Flow (DPF), you can build custom workflows to:

  • Classify documents (for example, invoices, resumes, purchase orders)
  • Extract entity information (invoice numbers, dates, amounts, etc.)
  • Process tables (structured or unstructured header, line item, and footer tables)

A DPF workflow consists of document classes, activities, entity lists, and table headers, together with their prompt definitions and instructions. Once configured, the workflow can be tested, finalized, and versioned.

Create New Flow

  1. Click the Add button in the top-right corner.
  2. On the Basic Information page, specify these values:
    • Flow Name: Enter a descriptive name (maximum 100 characters)
    • Prompt Structure: Select Old or New
    • Version: The default version is v1
    • Description: Add optional details about the flow purpose
  3. Click Next to continue.
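The Basic Information fields can be pictured as a small record with two checks. This is a minimal sketch for illustration only: the `BasicInformation` class, its field names, and the `problems` method are hypothetical and are not part of the product. Only the 100-character limit, the Old/New choice, and the `v1` default come from the steps above.

```python
from dataclasses import dataclass

MAX_FLOW_NAME_LENGTH = 100  # limit stated for Flow Name


@dataclass
class BasicInformation:
    """Hypothetical container for the Basic Information page fields."""

    flow_name: str
    prompt_structure: str      # "Old" or "New"
    version: str = "v1"        # default version
    description: str = ""      # optional

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means no issue found."""
        found = []
        if len(self.flow_name) > MAX_FLOW_NAME_LENGTH:
            found.append(
                f"Flow Name has {len(self.flow_name)} characters "
                f"(maximum {MAX_FLOW_NAME_LENGTH})."
            )
        if self.prompt_structure not in ("Old", "New"):
            found.append("Prompt Structure must be Old or New.")
        return found


info = BasicInformation(flow_name="Invoice intake", prompt_structure="New")
print(info.problems())
```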

Create Document Class

Document classes define what types of documents and information you want to process.

Examples of classes: Invoice, Packing List, Purchase Order, Bank Statement, Resume.

  1. Click Add Document Class.
  2. In the dialog box, specify:
    • Class Name: Enter a name that identifies the class
    • Pre-trained Model: Select a pre-trained model to inherit entities and tables from base flows (optional)
    • Prompt Definition: Provide the class definition or the key identifiers used to classify documents into this class
  3. If you want to perform Entity Classification, add at least one entity to each class.
    • Entities: Click Add and specify the entity name and definition prompt.
    • Tables: Click Add and select Table. Specify the table prompt. To add columns, click the column details icon. Add column names and define a separate prompt for each column.
Note: These are the current system limits:
  • Entities: Maximum 30 per class
  • Tables: Maximum 5 per class
  • Table Columns: Maximum 20 per table, 50 total across all tables
  • Document Classes: Maximum 5 per flow
Note: If you select the new prompt structure, an Other class is automatically created when you add the first document class. This class:

  • Cannot be deleted
  • Has no entities or column headers
  • Catches documents that do not match other classes
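The system limits listed earlier in this section can be checked before you start entering classes in the UI. The following is a rough sketch, not a product API: the `check_limits` function and the dictionary layout are hypothetical. It assumes that the 50-column total applies per document class, and it does not model whether the automatically created Other class counts toward the five-class limit, because the guide does not say.

```python
# Limits quoted from the system limits note.
MAX_CLASSES_PER_FLOW = 5
MAX_ENTITIES_PER_CLASS = 30
MAX_TABLES_PER_CLASS = 5
MAX_COLUMNS_PER_TABLE = 20
MAX_COLUMNS_TOTAL = 50  # assumption: counted per document class


def check_limits(classes: list[dict]) -> list[str]:
    """Return limit violations for a planned flow.

    Each class is a dict with a hypothetical layout:
    {"name": str, "entities": [str, ...], "tables": {table_name: [column, ...]}}
    """
    errors = []
    if len(classes) > MAX_CLASSES_PER_FLOW:
        errors.append(
            f"flow: {len(classes)} document classes (max {MAX_CLASSES_PER_FLOW})"
        )
    for cls in classes:
        name = cls["name"]
        entities = cls.get("entities", [])
        tables = cls.get("tables", {})
        if len(entities) > MAX_ENTITIES_PER_CLASS:
            errors.append(
                f"{name}: {len(entities)} entities (max {MAX_ENTITIES_PER_CLASS})"
            )
        if len(tables) > MAX_TABLES_PER_CLASS:
            errors.append(f"{name}: {len(tables)} tables (max {MAX_TABLES_PER_CLASS})")
        total_columns = 0
        for table_name, columns in tables.items():
            total_columns += len(columns)
            if len(columns) > MAX_COLUMNS_PER_TABLE:
                errors.append(
                    f"{name}/{table_name}: {len(columns)} columns "
                    f"(max {MAX_COLUMNS_PER_TABLE})"
                )
        if total_columns > MAX_COLUMNS_TOTAL:
            errors.append(
                f"{name}: {total_columns} columns in total (max {MAX_COLUMNS_TOTAL})"
            )
    return errors


invoice = {
    "name": "Invoice",
    "entities": ["invoice_number", "invoice_date", "total_amount"],
    "tables": {"line_items": ["description", "quantity", "unit_price", "amount"]},
}
print(check_limits([invoice]))
```

Returning a list of messages rather than raising on the first violation lets you see every problem in one pass.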

Configure Activities

  1. On the Activity page, drag activities based on your requirements:
    • To perform document classification only, drag Document Classification.

      You must add at least two document classes.

    • To perform entity extraction only, drag Entity Classification.

      You must add exactly one document class.

    • To perform both operations, drag Document Classification and Entity Classification.

    Note: If you use the new prompt structure, the system automatically adds Document Classification when you drag Entity Classification.
  2. To set the activity properties in the property panel, click the activity and configure this field:
    • Provider: Specify the AI service provider
    Note: You must use the same provider settings for all activities.
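The activity rules above can be summarized as a small decision function. This is an illustrative sketch under stated assumptions: `resolve_activities` is a hypothetical helper, `class_count` is assumed to mean the classes you define yourself (the guide does not say whether the automatic Other class is counted), and no class-count rule is applied when both activities are present because none is stated.

```python
DOC_CLASSIFICATION = "Document Classification"
ENTITY_CLASSIFICATION = "Entity Classification"


def resolve_activities(
    dragged: set[str], class_count: int, prompt_structure: str
) -> set[str]:
    """Return the effective set of activities, or raise ValueError.

    dragged: activities placed on the canvas.
    class_count: number of document classes (assumed: user-defined classes).
    prompt_structure: "Old" or "New".
    """
    activities = set(dragged)

    # New prompt structure: Document Classification is added automatically
    # when Entity Classification is used.
    if prompt_structure == "New" and ENTITY_CLASSIFICATION in activities:
        activities.add(DOC_CLASSIFICATION)

    # Classification only: at least two document classes are required.
    if activities == {DOC_CLASSIFICATION} and class_count < 2:
        raise ValueError("Document Classification needs at least two classes.")

    # Entity extraction only: exactly one document class is required.
    if activities == {ENTITY_CLASSIFICATION} and class_count != 1:
        raise ValueError("Entity Classification alone needs exactly one class.")

    return activities


print(sorted(resolve_activities({ENTITY_CLASSIFICATION}, 1, "New")))
```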

Review and Customize Your Prompt

On the Prompt page, you can review the prompts. Every prompt contains a user role definition, in which you specify the tasks for the LLM.

  • Document Classification Prompt
    This prompt includes the document classes and their definitions. In this prompt, you can:
    • Set guidelines for document analysis and classification.
    • Use the five predefined instructions to guide the LLM in processing documents.
    • Add, delete, or update instructions. The maximum number of instructions is 10.

    You can define the Document Boundary Detection guidelines to group pages by document type in PDFs that contain multiple document types, such as invoices, bank statements, and purchase orders. Three predefined guidelines are included. You can edit them or add more guidelines. The maximum number of guidelines is 6.

    The output is generated in JSON format.

  • Entity Classification Prompt

    This prompt contains:

    • Entity name and prompt definitions.
    • Entity output format.
    • Predefined entity extraction guidelines for the LLM that specify OCR consideration, extraction process, confidence scores calculation, and output format structure. The maximum number of these guidelines is 15.
    • Predefined Quality Assurance guidelines for validating the extraction process. The maximum number of these guidelines is 6.
  • Table Extraction Prompt (if applicable)

    This prompt contains:

    • Column names and prompt definitions for each table.
    • Table output format.
    • Predefined entity extraction guidelines for the LLM that specify OCR consideration, extraction process, confidence scores calculation, and output format structure. The maximum number of these guidelines is 15.
    • Predefined Quality Assurance guidelines for validating the extraction process. The maximum number of these guidelines is 6.
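The instruction and guideline caps described for each prompt can be collected in one place. The sketch below is illustrative only: the dictionary keys and the `add_guideline` helper are hypothetical names, and only the numeric limits come from the descriptions above. The extraction and Quality Assurance caps are stated separately for the entity prompt and the table prompt, so each prompt would keep its own list.

```python
# Maximum number of items per list, as stated for each prompt.
PROMPT_LIMITS = {
    "classification_instructions": 10,
    "boundary_detection_guidelines": 6,
    "extraction_guidelines": 15,
    "quality_assurance_guidelines": 6,
}


def add_guideline(existing: list[str], kind: str, text: str) -> list[str]:
    """Return a new list with `text` appended, or raise if the cap is reached."""
    limit = PROMPT_LIMITS[kind]
    if len(existing) >= limit:
        raise ValueError(f"{kind}: limit of {limit} reached")
    return existing + [text]


# Start from five placeholder entries standing in for the predefined
# classification instructions, then add a custom one.
instructions = [f"predefined instruction {i}" for i in range(1, 6)]
instructions = add_guideline(
    instructions, "classification_instructions", "Treat credit notes as invoices."
)
print(len(instructions))
```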

Test

  1. Click Test Model to validate the prompt configuration before you finalize the flow.
  2. In the dialog box, upload a sample file to simulate inference behavior.
  3. Examine the model’s classification results and output structure for accuracy and completeness.
  4. If needed, return to previous steps to refine prompt definitions, instructions, or boundary guidelines based on test results.

Summarize and Finalize

On the Summary page, you can view these details of the document processing setup:

  • Document Processor Flow metadata: Displays the flow name, description, and version number.
  • Document Classes: Lists each class along with associated entity and table counts.
  • Activity Configurations: Shows configured activities and their provider values.
  • Generated Prompts: Presents the prompts created for each document class.
Note: You can save the flow as either Draft or Active.

A flow saved as Draft can be modified at any time.

A flow saved as Active becomes read-only and is ready for inference. You can still drill down into an Active flow to view its details.
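The difference between the two save states can be modeled as a simple guard on edits. This is a conceptual sketch, not the product's implementation: `FlowStatus`, `Flow`, and their methods are hypothetical names used only to illustrate that a Draft stays editable while an Active flow rejects changes but can still be read.

```python
from dataclasses import dataclass
from enum import Enum


class FlowStatus(Enum):
    DRAFT = "Draft"
    ACTIVE = "Active"


@dataclass
class Flow:
    """Hypothetical model of a saved flow."""

    name: str
    description: str = ""
    status: FlowStatus = FlowStatus.DRAFT

    def update_description(self, text: str) -> None:
        """Edit the flow; only allowed while it is a Draft."""
        if self.status is FlowStatus.ACTIVE:
            raise PermissionError("Active flows are read-only.")
        self.description = text

    def save_as_active(self) -> None:
        """Mark the flow as Active, after which edits are rejected."""
        self.status = FlowStatus.ACTIVE


flow = Flow(name="Invoice intake")
flow.update_description("Classifies and extracts invoices")  # Draft: allowed
flow.save_as_active()
print(flow.status.value, "-", flow.description)  # details remain viewable
```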