Document Types
In Extracta's document classification system, a Document Type defines a category of documents you expect to classify, such as invoices, receipts, contracts, or purchase orders. Each document type helps the system determine how to interpret and route the documents you upload.
🧩 What's a Document Type?
A documentType is a JSON object with the following fields:
name (string): A clear, human-readable label for the document type (e.g., "Invoice").
description (string): A short explanation of the kind of documents this type represents.
uniqueWords (list<string>): A list of key terms or phrases likely to appear in documents of this type.
extractionId (string, optional): The ID of a pre-configured extraction template used to extract data automatically.
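As a rough sketch of the field list above, the Python function below checks that a documentType dict has the expected field types before you use it. The name `check_document_type` is hypothetical and not part of any Extracta SDK. It also assumes that `extractionId` is the only optional field, since that is the only field this page describes as optional.

```python
from typing import Any, Dict, List

# Expected field types, mirroring the documentType field list.
# Assumption: extractionId is the only optional field.
REQUIRED_FIELDS = {"name": str, "description": str, "uniqueWords": list}
OPTIONAL_FIELDS = {"extractionId": str}


def check_document_type(doc_type: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a documentType dict (empty if it looks valid)."""
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in doc_type:
            problems.append(f"missing field: {key}")
        elif not isinstance(doc_type[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    for key, expected in OPTIONAL_FIELDS.items():
        if key in doc_type and not isinstance(doc_type[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    # uniqueWords is list<string>, so every element must be a string.
    words = doc_type.get("uniqueWords")
    if isinstance(words, list) and not all(isinstance(w, str) for w in words):
        problems.append("uniqueWords should contain only strings")
    return problems


print(check_document_type({"name": "Invoice", "description": "Bills", "uniqueWords": ["bill to"]}))
# []
print(check_document_type({"name": "Invoice", "uniqueWords": "bill to"}))
# ['missing field: description', 'uniqueWords should be list']
```

Returning a list of problems instead of raising on the first one lets you report every issue in a definition at once.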
Why Unique Words Matter
The uniqueWords array is critical for helping our classification model distinguish between document types. Each entry is a keyword or phrase that commonly appears in documents of that type; think of these words as clues for classification.
For example, terms that typically appear on invoices help the classifier recognize an invoice from the cues visible in the document.
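To show why distinctive keywords are useful clues, here is a deliberately simple keyword-overlap scorer in Python. It is a toy illustration only: this page does not describe how Extracta's classification model uses uniqueWords, and `keyword_scores`, `best_guess`, and the keyword lists are hypothetical.

```python
from typing import Dict, List


def keyword_scores(text: str, types: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many of each type's unique words appear in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return {
        name: sum(1 for word in words if word.lower() in lowered)
        for name, words in types.items()
    }


def best_guess(text: str, types: Dict[str, List[str]]) -> str:
    """Return the type with the most keyword hits (ties go to the first type listed)."""
    scores = keyword_scores(text, types)
    return max(scores, key=scores.get)


types = {
    "Invoice": ["invoice number", "bill to", "amount due"],
    "Receipt": ["thank you for your purchase", "change due", "cashier"],
}
sample = "INVOICE NUMBER: 1042\nBill To: Acme Corp\nAmount Due: $350.00"
print(keyword_scores(sample, types))  # {'Invoice': 3, 'Receipt': 0}
print(best_guess(sample, types))      # Invoice
```

Keywords that appear in only one type separate the types cleanly; a word shared by every type (such as "total") adds the same score everywhere and does not help.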
Linking to Extraction Templates
You can optionally link a documentType to an existing extraction template using extractionId. This allows you to:
Upload a batch of documents for classification.
Have each document automatically classified (e.g., as an "Invoice").
Automatically trigger data extraction for that document using the associated template.
This creates a powerful end-to-end flow: classification → structured data extraction.
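The three steps above can be sketched as a simple loop. This is a hypothetical illustration of the control flow only: `route_batch`, `stub_classify`, and the template ID `tmpl-invoice-123` are placeholder names, and the stub classifier stands in for Extracta's classification, which this page does not expose as a callable API.

```python
from typing import Callable, Dict, List, Optional


def route_batch(
    documents: List[str],
    classify: Callable[[str], str],
    extraction_ids: Dict[str, Optional[str]],
) -> List[dict]:
    """Classify each document, then look up the extraction template linked to its type."""
    results = []
    for doc in documents:
        doc_type = classify(doc)                  # step 2: classify the document
        template = extraction_ids.get(doc_type)   # extractionId linked to that type, if any
        results.append({
            "type": doc_type,
            # step 3: a non-None template would trigger extraction; None means classification only.
            "extractionId": template,
        })
    return results


def stub_classify(text: str) -> str:
    """Placeholder classifier used only for this example."""
    return "Invoice" if "invoice" in text.lower() else "Contract"


links = {"Invoice": "tmpl-invoice-123", "Contract": None}
batch = ["Invoice #7: amount due $120", "This Agreement is made between..."]  # step 1
for row in route_batch(batch, stub_classify, links):
    print(row["type"], row["extractionId"])
# Invoice tmpl-invoice-123
# Contract None
```

Types without a linked template (here "Contract") are still classified; they simply skip the extraction step.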
🧾 Example Definition
Here's how a documentTypes array might look:
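As an illustration, the Python snippet below builds a documentTypes array with three entries and prints it as JSON. All descriptions, keyword lists, and the `extractionId` value `your-invoice-template-id` are hypothetical placeholders, not values from Extracta; only the Invoice entry links a template, since extractionId is optional.

```python
import json

# Illustrative values only; replace the extractionId with the ID of one of your own templates.
document_types = [
    {
        "name": "Invoice",
        "description": "Bills requesting payment for goods or services.",
        "uniqueWords": ["invoice number", "bill to", "amount due", "payment terms"],
        "extractionId": "your-invoice-template-id",
    },
    {
        "name": "Receipt",
        "description": "Proof of a completed purchase.",
        "uniqueWords": ["receipt", "change due", "cashier", "thank you for your purchase"],
    },
    {
        "name": "Contract",
        "description": "Agreements between two or more parties.",
        "uniqueWords": ["agreement", "hereinafter", "governing law", "signature"],
    },
]

# Serialize the array to JSON text.
print(json.dumps(document_types, indent=2))
```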