
Analyzing Documents With Custom User-Defined Schemas

Analyze any type of document and fully customize the output structure by creating a Generic document action and enabling Customize Schema. Define the output fields and tables and configure prompt instructions for Einstein to analyze the document and extract the data.

You cannot enable Customize Schema for existing document actions. You must create a new Generic document action.

Einstein supports these predictive models:

  • OpenAI’s GPT-4o (gpt-4o-2024-08-06) LLM

  • OpenAI’s GPT-4o Mini (gpt-4o-mini-2024-07-18) LLM

Einstein accesses these models through the Salesforce Einstein Trust Layer, which is part of the Salesforce Einstein platform.

Select the model to use during document analysis by configuring Settings in the document action editor.

Document actions created before February 5th support only OpenAI’s GPT-4o (gpt-4o-2024-05-13). To enable model selection, create a new document action.

A document action is a multi-step process that uses multiple AI engines to scan a document, filter out fields, and return a structured response as a JSON object. Each document action defines the types of documents it expects as input, the fields to extract, and the fields to filter out from the response.
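The last two stages of that process, filtering fields and returning JSON, can be pictured with the minimal sketch below. It is a conceptual illustration only, not how IDP is implemented: the function `build_response` and the sample field names are hypothetical.

```python
import json


def build_response(extracted: dict, filtered_out: set) -> str:
    """Drop the filtered-out fields and serialize the remainder as a JSON object."""
    kept = {name: value for name, value in extracted.items() if name not in filtered_out}
    return json.dumps(kept, sort_keys=True)


# Hypothetical values scanned from an invoice.
raw = {"invoice_number": "INV-001", "total": "120.50", "internal_note": "draft"}

# "internal_note" is configured as a field to filter out of the response.
print(build_response(raw, {"internal_note"}))
```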

After you create a document action, add reviewers and publish it to Anypoint Exchange. Publishing the document action enables RPA to execute it and also creates an API that you can call from any external system to analyze your documents.

For instructions about using pre-built schemas, see Analyzing Documents Using Pre-Built Schemas.

Before You Begin

  1. Ensure you have any of the following Anypoint permissions:

    Manage Actions

    Gives a user complete access to IDP and assigns reviewer permission by default for every document action.

    Build Actions

    Enables a user to create, edit, publish, and execute document actions and assign reviewers to the actions.

  2. Enable MuleSoft Anypoint Platform to publish MuleSoft assets to Salesforce.

  3. Enable Einstein for Anypoint Platform.

Create a Generic Document Action and Enable Customize Schema

To analyze documents and fully customize the output structure, create a document action of the Generic type and enable Customize Schema:

  1. In the sidebar, click Document Actions.

  2. Click Create New.

  3. Select the Generic type.

  4. Enable Customize Schema.

    Enabling Customize Schema lets you define fields and tables to create a structured output for your analyzed documents. It also enables an enhanced preview of the fields and tables in the document action builder.

    If Customize Schema is disabled, you can still configure Einstein prompts to analyze the document. See Enhancing Data Extraction with Einstein for instructions.

  5. Provide a Name and click Create.

Next, add the Fields and Tables to extract.

Add the Fields to Extract

To configure the fields to extract and define the structure of the output document:

  1. In the document action builder, click Select Files and select the files to use as an example or model for the extraction.

  2. In the Outputs panel, select the Fields tab.

  3. Click Start from Scratch to add the first field.

  4. Provide a name for the field and a prompt.

    If you don’t configure a prompt, Einstein scans the document and automatically extracts the data that best suits the field. Test the output to confirm that the extracted value matches the expected result. If it doesn’t, configure a prompt to give Einstein specific instructions about how to populate that field.

  5. Click Add New to continue adding fields.

  6. Click Run and verify the results of the extraction.

  7. Click Save.
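When planning a schema, it can help to track which fields have no prompt, because those are the ones whose extraction results need the closest checking. The sketch below models that idea outside of IDP. The `FieldDef` class, the helper `fields_needing_review`, and the sample field names and prompt are all hypothetical and are not part of any MuleSoft API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldDef:
    """One output field: a name plus an optional prompt."""
    name: str
    prompt: Optional[str] = None  # None means Einstein chooses the data on its own


def fields_needing_review(fields: List[FieldDef]) -> List[str]:
    """Return the names of fields that have no prompt (missing or blank)."""
    return [f.name for f in fields if not (f.prompt and f.prompt.strip())]


fields = [
    FieldDef("invoice_number", "The invoice identifier printed in the document header."),
    FieldDef("total"),  # no prompt: verify the extracted value after clicking Run
]

print(fields_needing_review(fields))
```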

Add the Tables to Extract

To configure data extraction from tables:

  1. In the document action builder, click Select Files and select the files to use as an example or model for the extraction.

    Skip this step if the example file is already loaded.

  2. In the Outputs panel, select the Tables tab.

  3. Click Start from Scratch to add the first table.

  4. Provide a name for the table and a prompt if necessary to configure the data to extract.

  5. Click Add New Column and provide a name and prompt for the column.

    • Continue adding columns until the table is complete.

  6. If there are more tables to configure, click Add New and repeat the steps to add the table columns.

  7. Click Run and verify the results of the extraction.

  8. Click Save.

After you finish configuring the output schema for your documents, add reviewers and publish your document action.
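Before publishing, you might also compare extraction results against the tables and columns you defined. The sketch below is one way to do that under a stated assumption: it treats each extracted table as a list of row dictionaries keyed by column name, which is an illustrative shape and not the documented IDP response format. The helper `missing_columns` and the sample data are hypothetical.

```python
from typing import Dict, List


def missing_columns(
    schema: Dict[str, List[str]], result: Dict[str, List[dict]]
) -> Dict[str, List[str]]:
    """For each table in the schema, list columns absent from the result.

    A column counts as missing if the table itself is absent or if at least
    one extracted row lacks that column.
    """
    problems = {}
    for table, columns in schema.items():
        if table not in result:
            missing = list(columns)
        else:
            rows = result[table]
            missing = [c for c in columns if any(c not in row for row in rows)]
        if missing:
            problems[table] = missing
    return problems


# Hypothetical table definition: table name -> column names.
schema = {"line_items": ["description", "quantity", "unit_price"]}

# Hypothetical extraction result (assumed shape).
result = {
    "line_items": [
        {"description": "Widget", "quantity": "2", "unit_price": "10.00"},
        {"description": "Gadget", "quantity": "1"},
    ]
}

print(missing_columns(schema, result))
```

A non-empty result suggests that a column prompt needs to be more specific before you publish the document action.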

Configure Document Action Settings

After you create a document action, click Settings to configure the following:

  • Model

    Select the predictive model to use when executing this document action.

  • Enable PII Masking

    Mask any personally identifiable information (PII) in your documents before they are sent to the predictive model. Your Salesforce Organization Preferences override this setting. For more information, see Einstein Trust Layer.

These settings are not available for document actions created before February 5th. To enable these settings, create a new document action.
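The two settings can be modeled as a small configuration object, as in the sketch below. The model identifiers are the ones listed in this document. Everything else is an assumption made for illustration: the `ActionSettings` class, the `effective_pii_masking` helper, and the precedence rule (an explicit organization preference wins over the action-level setting) are hypothetical and do not describe the actual Salesforce behavior in detail.

```python
from dataclasses import dataclass
from typing import Optional

# Model identifiers as listed in this document.
SUPPORTED_MODELS = ("gpt-4o-2024-08-06", "gpt-4o-mini-2024-07-18")


@dataclass
class ActionSettings:
    """Hypothetical container for the two document action settings."""
    model: str
    pii_masking: bool = False

    def __post_init__(self) -> None:
        # Reject identifiers that are not in the supported list.
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(f"Unsupported model: {self.model}")


def effective_pii_masking(settings: ActionSettings, org_preference: Optional[bool]) -> bool:
    """Assumed precedence: an explicit organization preference overrides the action setting."""
    return settings.pii_masking if org_preference is None else org_preference


settings = ActionSettings(model="gpt-4o-mini-2024-07-18", pii_masking=True)
print(effective_pii_masking(settings, org_preference=None))
print(effective_pii_masking(settings, org_preference=False))
```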