Analyzing Documents Using Pre-Built Schemas

To analyze any type of document and fully customize the output structure instead of using pre-built schemas, see Analyzing Documents With Custom User-Defined Schemas.

A document action is a multi-step process that uses multiple AI engines to scan a document, filter out fields, and return a structured response as a JSON object. Each document action defines the types of documents it expects as input, the fields to extract, and the fields to filter out from the response.

You can hide fields, mark fields as required, configure the minimum confidence score accepted for each field to extract, and configure Prompts to enhance and refine the data-extraction process by asking questions using natural language.

Create a document action using a predefined type as a template, and then specify the fields that are mandatory, the fields that must be excluded from the JSON response, and the minimum confidence score expected for each field.

After you create a new document action, you can add reviewers and publish it to Anypoint Exchange, which enables RPA to execute the document action and also creates an API that you can call from any external system.

When you create a document action, ensure that the configured schema works for each uploaded document. If you can’t customize the schema to work with all sample documents, consider creating multiple document actions. For example, purchase orders from multiple vendors might be different enough to require a separate document action for each vendor.

Creating a document action with a pre-built schema requires the following tasks:

Upload Sample Files and Preview the Results
Configure the Schema for the Extraction (unless you select the Generic document action, which doesn’t use a schema)
Add Prompts to Refine the Result

Before You Begin Creating Document Actions

Ensure you have at least one of the following Anypoint permissions:

Manage Actions: Gives a user complete access to IDP and assigns reviewer permission by default for every document action.
Build Actions: Enables a user to create, edit, publish, and execute document actions and assign reviewers to the actions.

Upload Sample Files and Preview the Results

To start creating a new document action, upload sample files to test the extraction process:

In the sidebar, click Document Actions.
Click Create New.
Select the type of document to use as a template, specify a name for your document action, and click Create.
Click Select files and upload sample files to analyze. You can upload up to 10 files with a size limit of 8 MB per file.
Click Run to analyze the files and get a preview of the results.
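
As a rough illustration of the upload limits in the steps above, the following Python sketch checks a batch of sample files before you upload them. It is not part of IDP: the SampleFile class and check_upload function are hypothetical names, and the sketch assumes that 8 MB means 8 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass

MAX_FILES = 10                     # limit stated in this documentation
MAX_FILE_BYTES = 8 * 1024 * 1024   # 8 MB; assumes 1 MB = 1024 * 1024 bytes


@dataclass
class SampleFile:
    """Hypothetical description of a local sample file."""
    name: str
    size_bytes: int


def check_upload(files: list[SampleFile]) -> list[str]:
    """Return a list of problems; an empty list means the batch fits the limits."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    for f in files:
        if f.size_bytes > MAX_FILE_BYTES:
            problems.append(f"{f.name} exceeds 8 MB")
    return problems
```

Running a check like this locally can save a failed upload attempt when you prepare many sample documents at once.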

The document action editor shows a preview of the analyzed document, which you can zoom in and out of for better visibility. Navigate between the pages of the document using the Previous (<) and Next (>) buttons. Switching pages updates the extracted values shown in the Outputs section.

IDP highlights fields in yellow when the extracted values have a confidence score below the configured threshold.

After uploading the sample files, configure the schema.

Configure the Schema for the Extraction

If you are creating a Generic document action, skip this step as there’s no schema to configure. Instead, use prompts to extract the data from your documents.

Configure the schema by selecting fields to hide from the response, fields that are required, and the minimum confidence score accepted for each field:

In the Outputs section, click Fields and select any of the extracted field names to configure the following settings:
- Visibility: defines whether this field appears in the output JSON result. Click Visibility to hide this field.
- Threshold: the minimum required confidence score accepted for this field. If the returned Confidence value is below the threshold, the document is queued for human review.
- Required: select this option to send the document to review if the field is missing or can’t be extracted.
You can click Focus to center the preview on the corresponding field.
If your document contains tables, click Tables to configure the extraction settings for the table columns.
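
The way the Visibility, Threshold, and Required settings interact can be sketched in Python as follows. This is only a mental model, not IDP's implementation: the FieldConfig and Extracted classes and the apply_schema function are hypothetical, confidence scores are assumed to range from 0 to 1, and the sketch assumes that a hidden field with low confidence still sends the document to review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldConfig:
    """Hypothetical per-field schema settings."""
    visible: bool = True      # Visibility: include the field in the JSON output
    threshold: float = 0.0    # Threshold: minimum accepted confidence (assumed 0 to 1)
    required: bool = False    # Required: review the document if the field is missing


@dataclass
class Extracted:
    """Hypothetical extraction result for one field."""
    value: Optional[str]
    confidence: float


def apply_schema(
    config: dict[str, FieldConfig],
    extracted: dict[str, Extracted],
) -> tuple[dict[str, str], bool]:
    """Return the filtered output and whether the document needs human review."""
    output: dict[str, str] = {}
    needs_review = False
    for name, cfg in config.items():
        result = extracted.get(name)
        if result is None or result.value is None:
            # Missing field: only a required field triggers review.
            if cfg.required:
                needs_review = True
            continue
        if result.confidence < cfg.threshold:
            needs_review = True
        if cfg.visible:
            output[name] = result.value
    return output, needs_review
```

For example, a schema that hides a notes field and requires a total field with a threshold of 0.8 would return only the total in the output, and would flag the document for review whenever the total is missing or its confidence falls below 0.8.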

After configuring the schema, add Prompts to your document action.

Add Prompts to Refine the Result

Add Prompts to refine the results of the extraction by asking questions about the document using natural language:

In the Outputs panel, click Prompts.
Click Add New.

Configure the required details:

Name: a unique name for the query.

Instruction: a unique question or request in English, using natural language.

Prompts don’t support special characters. Use only alphanumeric characters when writing your prompts. This limitation doesn’t apply to documents, which can contain any characters or symbols.

Required: select this option to send the document to review if the question can’t be answered.
Confidence Threshold: the minimum accepted value to prevent the document from getting queued for review.

Click Add.
Click Run to analyze the document again and see the results of the prompts.
Click Save.
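
If you draft prompts outside the editor, a small pre-check like the following Python sketch can catch problems before you paste them in. It is a hypothetical helper, not an IDP feature. It assumes that "alphanumeric" means ASCII letters and digits and that whitespace between words is allowed. Whether IDP accepts punctuation such as a question mark is not stated here, so the sketch rejects it to stay on the safe side.

```python
import re

# Assumption: ASCII letters, digits, and whitespace are the only accepted characters.
_ALLOWED = re.compile(r"[A-Za-z0-9\s]+")


def validate_prompt(name: str, instruction: str, existing_names: set[str]) -> list[str]:
    """Return a list of problems with a draft prompt; an empty list means it looks valid."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    elif name in existing_names:
        # Prompt names must be unique within the document action.
        problems.append(f"name '{name}' is already in use")
    if not instruction.strip():
        problems.append("instruction is empty")
    elif not _ALLOWED.fullmatch(instruction):
        problems.append("instruction contains non-alphanumeric characters")
    return problems
```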

Enhance Data Extraction with Einstein

By default, IDP uses its natural language processing model (IDP NLP) to extract data based on the configured prompts. When you create a document action, you can select Einstein to analyze the document and extract the data.

Use Einstein to answer complex questions about the document, such as the total of an invoice after deducting taxes and other items, or to extract data from non-standard documents such as a driver’s license or a medical record that contains handwriting.

To improve data extraction using Einstein to answer your prompts, see Enhancing Data Extraction with Einstein.

Configure Document Action Settings

After you create a document action, click Settings to configure it. The available settings depend on when you created the document action:

Setting	Description	Available From	Considerations
Model	Select the predictive large language model (LLM) to use when executing this document action.	February 5th, 2025	Choose the model that best fits your document processing needs. See Supported Models for additional details about each model.
PII Masking	Mask personally identifiable information from your documents before sending them to the predictive model.	February 5th, 2025	Salesforce Organization Preferences override this setting. Reduces character limit to 120,000 per request. See Prompt Limits for additional details. Cannot be used simultaneously with Image Recognition. For more information, see Einstein Trust Layer.
Image Recognition	Enables the model to read and interpret images within documents.	May 8th, 2025	Might increase Einstein credit consumption depending on the selected model. Contact your account executive for details on credit usage and consumption. Cannot be used simultaneously with PII Masking. Accuracy of checkbox processing varies by model depending on form complexity. See Supported Models for more information about how each model performs when processing checkboxes.

To access all new features, create a new document action.
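
The constraints in the settings table can be summarized in a short Python sketch. It is illustrative only: the setting keys and function names are hypothetical, and it assumes that each "Available From" date is inclusive and is compared against the date on which the document action was created.

```python
from datetime import date

# "Available From" dates taken from the settings table; keys are hypothetical identifiers.
AVAILABLE_FROM = {
    "model": date(2025, 2, 5),
    "pii_masking": date(2025, 2, 5),
    "image_recognition": date(2025, 5, 8),
}

# Character limit per request when PII Masking is enabled, as stated in the table.
PII_MASKING_CHAR_LIMIT = 120_000


def available_settings(created_on: date) -> list[str]:
    """List the settings a document action created on the given date would offer."""
    return [name for name, start in AVAILABLE_FROM.items() if created_on >= start]


def validate_settings(pii_masking: bool, image_recognition: bool) -> None:
    """Raise ValueError for the one combination the table rules out."""
    if pii_masking and image_recognition:
        raise ValueError("PII Masking and Image Recognition cannot be enabled together")


def within_masked_limit(char_count: int) -> bool:
    """Check a request size against the reduced limit that applies with PII Masking."""
    return char_count <= PII_MASKING_CHAR_LIMIT
```

Under these assumptions, a document action created in March 2025 would offer the Model and PII Masking settings but not Image Recognition, which is why creating a new document action gives access to all current settings.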