Extracta.ai

Document Types

In Extracta's document classification system, a Document Type defines a category of documents you expect to classify, such as invoices, receipts, contracts, or purchase orders. Each document type helps the system determine how to interpret and route the documents you upload.

🧩 Whats a Document Type?

A documentType is a JSON object with the following fields:

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | ✅ | A clear, human-readable label for the document type (e.g., "Invoice"). |
| description | string | ✅ | A short explanation of the type of documents this represents. |
| uniqueWords | list<string> | ✅ | A list of key terms or phrases likely to appear in this document type. |
| extractionId | string | ❌ | ID of a pre-configured extraction template to auto-extract data. |
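
As a quick sanity check before sending a definition, the field rules in the table above can be mirrored on the client side. The following is a minimal Python sketch, not part of Extracta's API: the function name validate_document_type and its error messages are hypothetical, and it checks only presence and types as listed in the table.

```python
from typing import Any

REQUIRED_FIELDS = ("name", "description", "uniqueWords")


def validate_document_type(doc_type: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the object looks valid."""
    problems = []

    # name, description, and uniqueWords are marked as required.
    for field in REQUIRED_FIELDS:
        if field not in doc_type:
            problems.append(f"missing required field: {field}")

    for field in ("name", "description"):
        if field in doc_type and not isinstance(doc_type[field], str):
            problems.append(f"{field} must be a string")

    if "uniqueWords" in doc_type:
        words = doc_type["uniqueWords"]
        if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
            problems.append("uniqueWords must be a list of strings")

    # extractionId is optional, but must be a string when present.
    if "extractionId" in doc_type and not isinstance(doc_type["extractionId"], str):
        problems.append("extractionId must be a string")

    return problems
```

Returning a list of problems rather than raising on the first one lets you report every issue in a definition at once.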

📌 Why Unique Words Matter

The uniqueWords array helps the classification model distinguish between document types. List keywords or phrases that commonly appear in documents of that type; they act as clues for classification.

"uniqueWords": [
    "invoice number", 
    "bill to", 
    "total amount"
]

This helps the classifier recognize invoices based on visible cues.
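
To make the idea of "clues" concrete, here is a toy illustration in Python. It is not how Extracta's classification model works (its internals are not described here); it only counts case-insensitive phrase matches. The function name count_keyword_hits and the sample text are hypothetical.

```python
def count_keyword_hits(text: str, unique_words: list[str]) -> int:
    """Count how many of the given phrases occur in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(1 for phrase in unique_words if phrase.lower() in lowered)


invoice_words = ["invoice number", "bill to", "total amount"]

# Hypothetical document text, invented for this illustration.
sample_text = "Invoice Number: 1042\nBill To: Acme Corp\nTotal Amount: $310.00"

print(count_keyword_hits(sample_text, invoice_words))  # 3: every phrase appears
```

Phrases that appear in several kinds of documents (for example, "date") would score for all of them, which is why distinctive terms make better entries.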

⚙️ Linking to Extraction Templates

You can optionally link a documentType to an existing extraction template using extractionId. This allows you to:

  1. Upload a batch of documents for classification.

  2. Have each document automatically classified (e.g., as an "Invoice").

  3. Automatically trigger data extraction for that document using the associated template.

This creates an end-to-end flow: classification followed by structured data extraction.
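
Extracta performs this routing for you, but it can help to see which types in a definition would trigger extraction. A minimal client-side sketch in Python, assuming the classified type is identified by its name field (the result format is not specified here); find_extraction_id is a hypothetical helper, not an Extracta API call.

```python
from typing import Optional


def find_extraction_id(document_types: list[dict], classified_name: str) -> Optional[str]:
    """Return the extractionId linked to the named type, or None if unlinked or unknown."""
    for doc_type in document_types:
        if doc_type["name"] == classified_name:
            # extractionId is optional, so a matching type may still have no template.
            return doc_type.get("extractionId")
    return None


document_types = [
    {
        "name": "Invoice",
        "description": "Standard commercial invoice from vendors or suppliers.",
        "uniqueWords": ["invoice number", "bill to", "total amount"],
        "extractionId": "invoiceExtractionId",
    },
    {
        "name": "Receipt",
        "description": "Retail or online transaction receipts.",
        "uniqueWords": ["receipt", "paid", "transaction id"],
    },
]

print(find_extraction_id(document_types, "Invoice"))  # invoiceExtractionId
print(find_extraction_id(document_types, "Receipt"))  # None: no template linked
```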

🧾 Example Definition

Here's how a documentTypes array might look:

"documentTypes": [
  {
    "name": "Invoice",
    "description": "Standard commercial invoice from vendors or suppliers.",
    "uniqueWords": ["invoice number", "bill to", "total amount"],
    "extractionId": "invoiceExtractionId"
  },
  {
    "name": "Purchase Order",
    "description": "Internal or external purchase order documents.",
    "uniqueWords": ["PO number", "item description", "quantity ordered"]
  },
  {
    "name": "Receipt",
    "description": "Retail or online transaction receipts.",
    "uniqueWords": ["receipt", "paid", "transaction id"]
  }
]
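
Because uniqueWords are meant to distinguish one type from another, it can be worth checking that no phrase is shared between types before submitting a definition. A minimal Python sketch of such a check follows; find_shared_words is a hypothetical helper, and comparing phrases case-insensitively is an assumption.

```python
from collections import defaultdict


def find_shared_words(document_types: list[dict]) -> dict[str, list[str]]:
    """Map each phrase used by more than one document type to the names of those types."""
    owners = defaultdict(list)
    for doc_type in document_types:
        # Deduplicate within a type so a repeated phrase is not reported as shared.
        for phrase in {w.lower() for w in doc_type["uniqueWords"]}:
            owners[phrase].append(doc_type["name"])
    return {phrase: names for phrase, names in owners.items() if len(names) > 1}


document_types = [
    {"name": "Invoice", "uniqueWords": ["invoice number", "bill to", "total amount"]},
    {"name": "Purchase Order", "uniqueWords": ["PO number", "item description", "quantity ordered"]},
    {"name": "Receipt", "uniqueWords": ["receipt", "paid", "transaction id"]},
]

print(find_shared_words(document_types))  # {} because no phrase is shared in the example
```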

Last updated 18 hours ago
