๐Ÿ“„Document Types

In Extracta's document classification system, a Document Type defines a category of documents you expect to classifyโ€”such as invoices, receipts, contracts, or purchase orders. Each document type helps the system determine how to interpret and route the documents you upload.

๐Ÿงฉ Whats a Document Type?

A documentType is a JSON object with the following fields:

Field
Type
Required
Description

name

string

โœ…

A clear, human-readable label for the document type (e.g., "Invoice").

description

string

โœ…

A short explanation of the type of documents this represents.

uniqueWords

list<string>

โœ…

A list of key terms or phrases likely to appear in this document type.

extractionId

string

โŒ

ID of a pre-configured extraction template to auto-extract data.

๐Ÿ“Œ Why Unique Words Matter?

The uniqueWords array is critical to helping our classification model distinguish between document types. These are keywords or phrases commonly found in that document type. Think of them like clues for classification.

"uniqueWords": [
    "invoice number", 
    "bill to", 
    "total amount"
]

This helps the classifier recognize invoices based on visible cues.

โš™๏ธ Linking to Extraction Templates

You can optionally link a documentType to an existing extraction template using extractionId. This allows you to:

  1. Upload a batch of documents for classification.

  2. Have each document automatically classified (e.g., as an "Invoice").

  3. Automatically trigger data extraction for that document using the associated template.

This creates a powerful end-to-end flow: classification โž structured data extraction.

๐Ÿงพ Example Definition

Hereโ€™s how a documentTypes array might look:

"documentTypes": [
  {
    "name": "Invoice",
    "description": "Standard commercial invoice from vendors or suppliers.",
    "uniqueWords": ["invoice number", "bill to", "total amount"],
    "extractionId": "invoiceExtractionId"
  },
  {
    "name": "Purchase Order",
    "description": "Internal or external purchase order documents.",
    "uniqueWords": ["PO number", "item description", "quantity ordered"]
  },
  {
    "name": "Receipt",
    "description": "Retail or online transaction receipts.",
    "uniqueWords": ["receipt", "paid", "transaction id"]
  }
]

Last updated

Was this helpful?