📄 Document Types

In Extracta's document classification system, a Document Type defines a category of documents you expect to classify, such as invoices, receipts, contracts, or purchase orders. The system uses your document types to decide how to interpret and route each document you upload.

🧩 What’s a Document Type?

A documentType is a JSON object with the following fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | ✅ | A clear, human-readable label for the document type (e.g., "Invoice"). |
| description | string | ✅ | A short explanation of the kind of documents this type represents. |
| uniqueWords | list<string> | ✅ | Key terms or phrases likely to appear in documents of this type. |
| extractionId | string | ❌ | ID of a pre-configured extraction template used to extract data automatically from documents classified as this type. |
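To make the field rules concrete, here is a minimal Python sketch that checks a documentType dictionary against the field table before you submit it. The helper name validate_document_type is hypothetical and not part of Extracta's API. It enforces only the required fields and types listed in the table; treating an empty uniqueWords list as invalid is an assumption, and any server-side rules Extracta may apply (length limits, allowed characters) are not modeled.

```python
def validate_document_type(doc_type: dict) -> list[str]:
    """Return a list of problems found in a documentType dict (empty if none).

    Hypothetical helper: checks only the fields and types from the
    documentType field table, not any server-side rules Extracta may enforce.
    """
    problems = []

    # Required string fields must be present and non-empty.
    for field in ("name", "description"):
        value = doc_type.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{field}' is required and must be a non-empty string")

    # uniqueWords is required; rejecting an empty list is an assumption.
    words = doc_type.get("uniqueWords")
    if not isinstance(words, list) or not words:
        problems.append("'uniqueWords' is required and must be a non-empty list")
    elif not all(isinstance(w, str) for w in words):
        problems.append("every entry in 'uniqueWords' must be a string")

    # extractionId is optional, but must be a string when present.
    if "extractionId" in doc_type and not isinstance(doc_type["extractionId"], str):
        problems.append("'extractionId' must be a string when provided")

    return problems


invoice = {
    "name": "Invoice",
    "description": "Standard commercial invoice from vendors or suppliers.",
    "uniqueWords": ["invoice number", "bill to", "total amount"],
}
print(validate_document_type(invoice))  # []
print(validate_document_type({"name": "Receipt"}))  # missing description and uniqueWords
```

Returning a list of problems rather than raising on the first one lets you report every issue in a definition at once.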

📌 Why Do Unique Words Matter?

The uniqueWords array is critical for helping our classification model distinguish between document types. Each entry is a keyword or phrase commonly found in documents of that type; think of the entries as clues for classification. For example, an Invoice type might list:

"uniqueWords": [
    "invoice number",
    "bill to",
    "total amount"
]

This helps the classifier recognize invoices based on visible cues.
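This page does not describe how Extracta's classification model weighs uniqueWords. Purely to illustrate why distinctive terms help, the sketch below uses a naive stand-in: it counts how many of each type's uniqueWords appear (case-insensitively) in a document's text and picks the type with the most hits. The function names and the scoring rule are assumptions for illustration, not Extracta's algorithm.

```python
from typing import Optional


def score_document(text: str, document_types: list[dict]) -> dict[str, int]:
    """Count case-insensitive uniqueWords matches per document type.

    Naive keyword-matching stand-in, not Extracta's classification model.
    """
    lowered = text.lower()
    return {
        dt["name"]: sum(1 for word in dt["uniqueWords"] if word.lower() in lowered)
        for dt in document_types
    }


def best_match(text: str, document_types: list[dict]) -> Optional[str]:
    """Return the highest-scoring type name, or None if no term matched.

    Ties go to the type listed first, because max() keeps the first maximum.
    """
    scores = score_document(text, document_types)
    if not scores:
        return None
    name, score = max(scores.items(), key=lambda item: item[1])
    return name if score > 0 else None


types = [
    {"name": "Invoice", "uniqueWords": ["invoice number", "bill to", "total amount"]},
    {"name": "Receipt", "uniqueWords": ["receipt", "paid", "transaction id"]},
]
text = "INVOICE NUMBER 1042\nBill To: Acme Corp\nTotal Amount: $310.00"
print(score_document(text, types))  # {'Invoice': 3, 'Receipt': 0}
print(best_match(text, types))  # Invoice
```

Even in this toy version, multi-word phrases such as "invoice number" are less likely to match by accident than single words: under plain substring matching, "paid" also matches "unpaid".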

⚙️ Linking to Extraction Templates

You can optionally link a documentType to an existing extraction template using extractionId. This allows you to:

Upload a batch of documents for classification.
Have each document automatically classified (e.g., as an "Invoice").
Automatically trigger data extraction for that document using the associated template.

Together, these steps form an end-to-end flow: classification ➝ structured data extraction.
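The flow above can be sketched as a small routing step: once a document has been assigned a type, look up that type's extractionId and, if one is set, hand the document to extraction. The route_document function and the extract callback are hypothetical placeholders; this page does not specify the Extracta API calls that perform classification or extraction, so substitute the real ones from the API reference.

```python
from typing import Callable, Optional


def route_document(
    document_id: str,
    classified_as: str,
    document_types: list[dict],
    extract: Callable[[str, str], dict],
) -> Optional[dict]:
    """Trigger extraction if the classified type is linked to a template.

    `extract(document_id, extraction_id)` is a hypothetical callback standing in
    for whatever Extracta call starts extraction; it is not a real SDK function.
    Returns the extraction result, or None if the type has no extractionId.
    """
    by_name = {dt["name"]: dt for dt in document_types}
    doc_type = by_name.get(classified_as)
    if doc_type is None:
        raise ValueError(f"unknown document type: {classified_as!r}")

    extraction_id = doc_type.get("extractionId")
    if extraction_id is None:
        return None  # classification only; nothing to extract
    return extract(document_id, extraction_id)


# Stub extractor used only to show the control flow.
def fake_extract(document_id: str, extraction_id: str) -> dict:
    return {"documentId": document_id, "extractionId": extraction_id}


types = [
    {"name": "Invoice", "uniqueWords": ["invoice number"], "extractionId": "invoiceExtractionId"},
    {"name": "Receipt", "uniqueWords": ["receipt"]},
]
print(route_document("doc-1", "Invoice", types, fake_extract))
print(route_document("doc-2", "Receipt", types, fake_extract))  # None
```

Raising on an unknown type name is a design choice in this sketch; it surfaces a mismatch between the classifier's output and your documentTypes array instead of silently skipping extraction.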

🧾 Example Definition

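Here’s how a documentTypes array might look. Only the Invoice type sets extractionId, so invoices are classified and then extracted, while purchase orders and receipts are classified only: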

"documentTypes": [
  {
    "name": "Invoice",
    "description": "Standard commercial invoice from vendors or suppliers.",
    "uniqueWords": ["invoice number", "bill to", "total amount"],
    "extractionId": "invoiceExtractionId"
  },
  {
    "name": "Purchase Order",
    "description": "Internal or external purchase order documents.",
    "uniqueWords": ["PO number", "item description", "quantity ordered"]
  },
  {
    "name": "Receipt",
    "description": "Retail or online transaction receipts.",
    "uniqueWords": ["receipt", "paid", "transaction id"]
  }
]
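Because uniqueWords exist to tell types apart, a term listed under two types is a clue that points both ways. A quick local sanity check before submitting a documentTypes array is to look for duplicate names and for terms shared between types. The find_conflicts helper below is hypothetical, not something Extracta requires or performs, and whether shared terms actually reduce accuracy depends on the model, which this page does not describe.

```python
from collections import defaultdict


def find_conflicts(document_types: list[dict]) -> dict[str, list]:
    """Report duplicate type names and uniqueWords shared by more than one type.

    Hypothetical pre-submission check; term matching is case-insensitive.
    """
    name_counts = defaultdict(int)
    owners = defaultdict(set)  # lowercased term -> names of types that list it

    for dt in document_types:
        name_counts[dt["name"]] += 1
        for word in dt["uniqueWords"]:
            owners[word.lower()].add(dt["name"])

    return {
        "duplicateNames": sorted(n for n, c in name_counts.items() if c > 1),
        "sharedWords": sorted(w for w, names in owners.items() if len(names) > 1),
    }


# The three types from the example definition.
document_types = [
    {"name": "Invoice", "uniqueWords": ["invoice number", "bill to", "total amount"]},
    {"name": "Purchase Order", "uniqueWords": ["PO number", "item description", "quantity ordered"]},
    {"name": "Receipt", "uniqueWords": ["receipt", "paid", "transaction id"]},
]
print(find_conflicts(document_types))  # {'duplicateNames': [], 'sharedWords': []}
```

If you added, say, an "Order Receipt" type that also listed "Receipt", the check would report "receipt" as shared, a hint to choose a more specific phrase for one of the two types.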

Last updated 2 months ago