Custom Document

Extracta.ai empowers businesses to automate the extraction of structured and unstructured data from a wide array of document types. Using AI-powered OCR technology, our API transforms scanned documents in PDF, JPG, and PNG formats into actionable, structured data. This capability is crucial for businesses looking to digitize their operations through advanced data analysis, mining, and Named Entity Recognition (NER). Whether you're processing individual files or handling documents in batches, Extracta.ai offers a streamlined solution for your automation and information retrieval needs.

Why Choose Extracta.ai for Custom Document Parsing?

  • Versatile Data Extraction: Tailor the API to meet your specific document parsing requirements, whether your documents are structured or unstructured.

  • Advanced OCR Technology: Convert scanned documents into digital data with high accuracy, making your data easily accessible and actionable.

  • Intelligent Data Processing: Utilize IDP and NLP technologies for deeper insights, enabling effective data analysis and decision-making processes.

  • Batch File Processing: Efficiently process large volumes of documents, saving time and resources while enhancing operational efficiency.

Getting Started

To harness the power of custom document parsing, you'll start by crafting a POST request to our /createExtraction endpoint. The key to leveraging this functionality lies in specifying the custom fields relevant to your documents. Here’s how you can structure your request body for a custom document type.

Body Example for Custom Document Parsing

JSON Body
{
    "extractionDetails": {
        "name": "Custom document - Extraction",
        "language": "English",
        "fields": [
            {
                "key": "name",
                "description": "the name of the person in the CV",
                "example": "Johan Smith"
            },
            {
                "key": "email",
                "description": "the email of the person in the CV",
                "example": "johan@gmail.com"
            },
            {
                "key": "phone",
                "description": "the phone number of the person",
                "example": "123 333 4445"
            },
            {
                "key": "address",
                "description": "the complete address of the person",
                "example": "1234 Main St, New York, NY 10001"
            },
            {
                "key": "soft_skills",
                "description": "the soft skills of the person",
                "example": ""
            },
            {
                "key": "hard_skills",
                "description": "the hard skills of the person",
                "example": ""
            },
            {
                "key": "last_job",
                "description": "the last job of the person",
                "example": "Software Engineer"
            },
            {
                "key": "years_of_experience",
                "description": "the years of experience in the last job",
                "example": "5"
            }
        ]
    }
}
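
The body above can also be assembled in code rather than written by hand. The following is a minimal sketch, not part of the Extracta.ai API: buildExtractionDetails is a hypothetical helper, and the checks it performs (a non-empty key, no duplicate keys, empty-string defaults for description and example) are assumptions based on the shape of the example body, not documented validation rules.

```javascript
// Hypothetical helper (not part of the Extracta.ai API): builds the
// "extractionDetails" object and checks each field before it is sent.
function buildExtractionDetails(name, language, fields) {
    if (!Array.isArray(fields) || fields.length === 0) {
        throw new Error('At least one field is required');
    }

    const seenKeys = new Set();
    const normalizedFields = fields.map((field, index) => {
        if (!field || typeof field.key !== 'string' || field.key.trim() === '') {
            throw new Error(`Field at index ${index} is missing a "key"`);
        }
        // Assumption: keys should be unique within one extraction.
        if (seenKeys.has(field.key)) {
            throw new Error(`Duplicate field key: ${field.key}`);
        }
        seenKeys.add(field.key);

        // Assumption: missing values default to "", as the example body
        // does for the "example" of soft_skills and hard_skills.
        return {
            key: field.key,
            description: field.description || '',
            example: field.example || ''
        };
    });

    return { name, language, fields: normalizedFields };
}

const extractionDetails = buildExtractionDetails(
    'Custom document - Extraction',
    'English',
    [
        { key: 'name', description: 'the name of the person in the CV', example: 'Johan Smith' },
        { key: 'soft_skills', description: 'the soft skills of the person' }
    ]
);

// Wrap the result under "extractionDetails" to get the full request body.
console.log(JSON.stringify({ extractionDetails }, null, 4));
```

Catching a missing or repeated key locally is cheaper than discovering it after the request has been sent.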

Customizing Your Request

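  1. Define Your Fields: Add one entry to the fields array for each value you want extracted, giving it a key, a description, and an example (which may be left empty, as for soft_skills above). These fields should reflect the unique aspects of your custom document type.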

  2. Prepare the API Call: Incorporate the JSON template into your POST request body to /createExtraction. Ensure your API key is included in the header as a Bearer token for authentication.

  3. Process the Extracted Data: After submitting your request, the API will analyze your document and return structured data based on the custom fields you defined, ready for integration into your systems.
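
Steps 1 and 2 above can be sketched as a small function that assembles the request without sending it, which makes the headers and body easy to inspect. This is only an illustration: buildCreateExtractionRequest is a hypothetical name, the URL is the one used in the code example on this page, and the returned object follows the general shape of an axios request config (method, url, headers, data).

```javascript
// URL as used in the code example on this page.
const CREATE_EXTRACTION_URL = 'https://api.extracta.ai/api/v1/createExtraction';

// Hypothetical helper: returns a request config instead of sending it.
function buildCreateExtractionRequest(token, extractionDetails) {
    if (!token) {
        throw new Error('An API key is required');
    }

    return {
        method: 'post',
        url: CREATE_EXTRACTION_URL,
        headers: {
            'Content-Type': 'application/json',
            // The API key travels as a Bearer token (step 2).
            'Authorization': `Bearer ${token}`
        },
        // The fields defined in step 1 sit under "extractionDetails".
        data: { extractionDetails }
    };
}

const request = buildCreateExtractionRequest('apiKey', {
    name: 'Custom document - Extraction',
    language: 'English',
    fields: [
        { key: 'name', description: 'the name of the person in the CV', example: 'Johan Smith' }
    ]
});

console.log(request.headers.Authorization); // prints "Bearer apiKey"
```

Keeping request construction separate from sending lets you log or unit-test the exact payload before any network call is made.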

Code Example

const axios = require('axios');

/**
 * Initiates a new document extraction process with the provided details.
 * 
 * @param {string} token - The authorization token for API access.
 * @param {Object} extractionDetails - The details of the extraction to be created.
 * @returns {Promise<Object>} The promise that resolves to the API response with the new extraction ID.
 */
async function createExtraction(token, extractionDetails) {
    const url = "https://api.extracta.ai/api/v1/createExtraction";

    try {
        const response = await axios.post(url, {
            extractionDetails
        }, {
            headers: {
                'Content-Type': 'application/json',
                'Authorization': `Bearer ${token}`
            }
        });

        // Handling response
        return response.data; // Directly return the parsed JSON response
    } catch (error) {
        // Surface the API's error body when there is one; otherwise rethrow
        // the original error (e.g. a network failure) so its message is kept
        throw error.response ? error.response.data : error;
    }
}

async function main() {
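    const token = 'apiKey'; // replace with your API key
    // Use the object under the "extractionDetails" key of the JSON body above,
    // not the whole body: createExtraction() adds that wrapper itself.
    const extractionDetails = {};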

    try {
        const response = await createExtraction(token, extractionDetails);
        console.log("New Extraction Created:", response);
    } catch (error) {
        console.error("Failed to create new extraction:", error);
    }
}

main();

Conclusion

With Extracta.ai, the complexity of parsing custom document types is simplified, enabling businesses to focus on deriving value from their data rather than on the intricacies of data extraction. Our customizable approach ensures that you can adapt the API to fit the unique needs of your documents, facilitating efficient data handling and analysis.

For detailed guidance on API integration and making the most of Extracta.ai’s capabilities, please explore our "API Endpoints" and "1. Create extraction" pages.
