# Document Types

In Extracta's document classification system, a `Document Type` defines a category of documents you expect to classify, such as invoices, receipts, contracts, or purchase orders. Each document type helps the system determine how to interpret and route the documents you upload.

## 🧩 What's a Document Type?

A `documentType` is a JSON object with the following fields:

<table><thead><tr><th width="179.61328125">Field</th><th width="161.2734375">Type</th><th width="104.8828125">Required</th><th>Description</th></tr></thead><tbody><tr><td><code>name</code></td><td><code>string</code></td><td>✅</td><td>A clear, human-readable label for the document type (e.g., <code>"Invoice"</code>).</td></tr><tr><td><code>description</code></td><td><code>string</code></td><td>✅</td><td>A short explanation of the type of documents this represents.</td></tr><tr><td><code>uniqueWords</code></td><td><code>list&#x3C;string></code></td><td>✅</td><td>A list of key terms or phrases likely to appear in this document type.</td></tr><tr><td><code>extractionId</code></td><td><code>string</code></td><td>❌</td><td>ID of a pre-configured extraction template to auto-extract data.</td></tr></tbody></table>
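
The field rules in the table can be checked locally before a definition is submitted. The following is a minimal sketch in Python, not part of Extracta's API: the helper name `validate_document_type` and its error messages are hypothetical, and the checks only mirror the table above.

```python
from typing import Any, Dict, List

# Field names and types taken from the table above.
REQUIRED_FIELDS = {"name": str, "description": str, "uniqueWords": list}
OPTIONAL_FIELDS = {"extractionId": str}


def validate_document_type(doc_type: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the object matches the table.

    Hypothetical helper for illustration only.
    """
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc_type:
            problems.append(f"missing required field: {field}")
        elif not isinstance(doc_type[field], expected):
            problems.append(f"{field} must be of type {expected.__name__}")
    for field, expected in OPTIONAL_FIELDS.items():
        if field in doc_type and not isinstance(doc_type[field], expected):
            problems.append(f"{field} must be of type {expected.__name__}")
    # uniqueWords is documented as a list of strings, so check the items too.
    words = doc_type.get("uniqueWords")
    if isinstance(words, list) and not all(isinstance(w, str) for w in words):
        problems.append("uniqueWords must contain only strings")
    return problems
```

Calling `validate_document_type({"name": "Invoice"})` in this sketch reports the two missing required fields, while a complete object yields an empty list.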

## 📌 Why Unique Words Matter

The `uniqueWords` array helps the classification model distinguish between document types. It lists **keywords or phrases** commonly found in that document type; think of them as clues for classification. For an invoice, the array might look like this:

```json
"uniqueWords": [
    "invoice number",
    "bill to",
    "total amount"
]
```

These terms help the classifier recognize invoices based on visible cues.
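
To see why distinctive terms are useful, consider the toy illustration below. It is not Extracta's classifier, whose internals are not described here; it is only a sketch that counts how many phrases from a list appear in a piece of text. The function name `count_keyword_hits` and the sample text are made up for this example.

```python
from typing import List


def count_keyword_hits(text: str, unique_words: List[str]) -> int:
    """Count how many of the given phrases occur in the text (case-insensitive).

    Toy illustration only; not how the real classification model works.
    """
    lowered = text.lower()
    return sum(1 for phrase in unique_words if phrase.lower() in lowered)


# Made-up document text for the example.
sample = "INVOICE NUMBER: 1042\nBill To: Acme Corp\nTotal Amount: $310.00"

invoice_words = ["invoice number", "bill to", "total amount"]
receipt_words = ["receipt", "paid", "transaction id"]

invoice_hits = count_keyword_hits(sample, invoice_words)
receipt_hits = count_keyword_hits(sample, receipt_words)
```

In this sample, every invoice phrase matches and no receipt phrase does, which is the kind of separation a well-chosen `uniqueWords` list aims for. Generic terms that appear in many document types (for example, "date") would blur that separation.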

## ⚙️ Linking to Extraction Templates

You can optionally link a `documentType` to an existing **extraction template** using `extractionId`. This allows you to:

1. Upload a batch of documents for classification.
2. Have each document automatically classified (e.g., as an "Invoice").
3. Automatically trigger **data extraction** for that document using the associated template.

This creates a powerful end-to-end flow: **classification ➝ structured data extraction**.
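
The link between steps 2 and 3 can be pictured as a simple lookup. The sketch below is an assumption-laden illustration rather than Extracta's implementation: it assumes the classification result is the document type's `name`, and the helper `extraction_id_for` is hypothetical. It returns the linked `extractionId` when one is configured and `None` otherwise.

```python
from typing import Any, Dict, List, Optional


def extraction_id_for(label: str, document_types: List[Dict[str, Any]]) -> Optional[str]:
    """Return the extractionId linked to the classified label, if any.

    Hypothetical helper: assumes the classifier reports the type's `name`.
    """
    for doc_type in document_types:
        if doc_type["name"] == label:
            # extractionId is optional, so a missing key means "no extraction".
            return doc_type.get("extractionId")
    return None


document_types = [
    {
        "name": "Invoice",
        "description": "Standard commercial invoice from vendors or suppliers.",
        "uniqueWords": ["invoice number", "bill to", "total amount"],
        "extractionId": "invoiceExtractionId",
    },
    {
        "name": "Receipt",
        "description": "Retail or online transaction receipts.",
        "uniqueWords": ["receipt", "paid", "transaction id"],
    },
]

invoice_template = extraction_id_for("Invoice", document_types)
receipt_template = extraction_id_for("Receipt", document_types)
```

Here a document classified as "Invoice" maps to a template, so extraction would follow; one classified as "Receipt" has no linked template and would stop after classification.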

## 🧾 Example Definition

Here’s how a `documentTypes` array might look:

```json
"documentTypes": [
  {
    "name": "Invoice",
    "description": "Standard commercial invoice from vendors or suppliers.",
    "uniqueWords": ["invoice number", "bill to", "total amount"],
    "extractionId": "invoiceExtractionId"
  },
  {
    "name": "Purchase Order",
    "description": "Internal or external purchase order documents.",
    "uniqueWords": ["PO number", "item description", "quantity ordered"]
  },
  {
    "name": "Receipt",
    "description": "Retail or online transaction receipts.",
    "uniqueWords": ["receipt", "paid", "transaction id"]
  }
]
```
