Options
The options object in Extracta.ai's API requests provides additional customization for your data extraction process. By setting different properties within this object, you can tailor the extraction to suit the specific needs of your documents. Here's how each option affects your data extraction:
Options Overview
hasTable
Type: Boolean
Not required
Default: false
Description: Indicates whether the document to be processed contains tables. When set to true, the extraction process includes an additional step specifically designed to analyze and extract information from tables within the document. This option ensures that table data is accurately recognized and extracted, providing structured information that's easy to use and analyze.
handwrittenTextRecognition
Type: Boolean
Not required
Default: false
Description: Determines if the document includes handwritten text that needs to be recognized and extracted. Setting this option to true initiates a specialized step in the extraction process focused on analyzing handwritten text. This feature leverages advanced OCR and machine learning techniques to convert handwritten notes into digital text, enhancing the comprehensiveness of the data extraction.
checkboxRecognition
Type: Boolean
Not required
Default: false
Description: Determines if the document contains checkboxes that need to be recognized and their states (checked or unchecked) extracted. When set to true, the extraction process includes a specialized step focused on identifying checkboxes within the document and accurately determining their status.
specificPageProcessing
Type: Boolean
Not required
Default: false
Description: When set to true, only a specified range of pages from a PDF document is extracted and processed, rather than the entire document. This is particularly useful when working with large PDF files where only certain sections or pages are relevant for the task at hand.
specificPageProcessingOptions
Type: Map
Required only if specificPageProcessing is true
Description: When Specific Page Processing is enabled, the system allows the user to define a range of pages (using from and to parameters) that they want to focus on. The specified range of pages is then extracted from the original PDF, creating a new document that contains only those pages. This newly created PDF is treated as a separate file and is processed according to the usual workflow, whether for storage, analysis, or further manipulation.
Example: Imagine a 100-page PDF document in which the relevant information is on pages 1 to 3. This feature allows you to specify that range, reducing processing time and cost.
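The page-range settings described above can be sketched in Python as follows. This is a minimal illustration, not Extracta.ai's client code: the helper name build_page_range_options is hypothetical, and it assumes that from and to are 1-based, inclusive integer page numbers, which the pages 1 to 3 example suggests but the text does not state outright.

```python
import json


def build_page_range_options(first_page: int, last_page: int) -> dict:
    """Return an options dict that limits extraction to a page range.

    Hypothetical helper. Assumes `from` and `to` are 1-based, inclusive
    integers; check the API reference for the exact accepted types.
    """
    if first_page < 1:
        raise ValueError("first_page must be at least 1")
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    return {
        # The page-range map is required only when this flag is true.
        "specificPageProcessing": True,
        "specificPageProcessingOptions": {"from": first_page, "to": last_page},
    }


# Pages 1 to 3 of a 100-page PDF, as in the example above.
options = build_page_range_options(1, 3)
print(json.dumps(options, indent=2))
```

Validating the range before sending the request is a design choice of this sketch; how the API itself responds to an invalid range is not covered here.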
Using the options Object
To use these options, include the options object in your API request payload and specify your preferences, for example for hasTable and handwrittenTextRecognition.
Adjust the values according to the needs of your document. For instance, if your document includes tables and handwritten notes, set both hasTable and handwrittenTextRecognition to true in your options object.
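As a rough sketch of the tables-plus-handwriting case, the options object could be assembled in Python like this. The helper build_options is hypothetical, the defaults are taken from the option descriptions on this page, and the placement of options as a top-level key of the payload is an assumption made only for illustration; the actual request structure should be taken from the API reference.

```python
import json

# Defaults as listed in the option descriptions on this page.
DEFAULT_OPTIONS = {
    "hasTable": False,
    "handwrittenTextRecognition": False,
    "checkboxRecognition": False,
    "specificPageProcessing": False,
}


def build_options(**overrides: bool) -> dict:
    """Merge caller overrides into the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    return {**DEFAULT_OPTIONS, **overrides}


# A document that contains both tables and handwritten notes.
options = build_options(hasTable=True, handwrittenTextRecognition=True)

# Assumption: "options" is shown as a top-level key for illustration only.
payload = {"options": options}
print(json.dumps(payload, indent=2))
```

Because every option is optional, sending only the keys you change should be equivalent to sending the full set; the sketch spells out all four so the effective configuration is visible in one place.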
Conclusion
The options object allows for significant customization of the extraction process, enabling you to adapt the extraction to fit the unique characteristics of your documents. Whether dealing with complex tables, handwritten notes, or both, adjusting these options ensures that your extraction process is optimized for the highest accuracy and relevance of the extracted data.
Remember to review your documents' needs and set the options accordingly to take full advantage of the customized extraction capabilities offered by Extracta.ai.