Configuring Embedding Operations

Configure the [Embedding] Generate from text operation.

Configure the Embedding Generate from Text Operation

The [Embedding] Generate from text operation splits the input text into segments (chunks) of the specified size and generates a numeric vector (embedding) for each segment.

The [Embedding] Generate from text operation can be followed by either the [Store] Add operation or the [Store] Query operation, and both operations accept its output payload without any transformation. When the [Embedding] Generate from text operation is used before the:

[Store] Add operation: The text along with the generated embeddings can be ingested into a vector store.
[Store] Query operation: The text is first used to generate an embedding that is then used to perform a query against the vector store.

When generating an embedding from text for query purposes, don’t provide any segmentation fields. Leave the Max Segment Size (Characters) and Max Overlap Size (Characters) fields blank.
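The [Store] Query operation compares the query embedding with the embeddings held in the vector store. The similarity metric depends on the store you configure. This minimal Python sketch assumes cosine similarity, a common choice, and uses hypothetical three-dimensional vectors and segment names. It illustrates the idea only and isn't how the connector or any particular store is implemented.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank stored segments by similarity to the query embedding, highest first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real models return many more dimensions (for example, 1536).
store = {
    "segment about e-commerce": [0.9, 0.1, 0.0],
    "segment about technology": [0.1, 0.9, 0.1],
    "segment about cooking": [0.0, 0.1, 0.9],
}
query_embedding = [0.8, 0.2, 0.0]  # a single, unsegmented query embedding

for text, score in top_k(query_embedding, store):
    print(f"{score:.3f}  {text}")
```

Because the query is embedded as one unsegmented vector, it's compared directly with each stored segment's vector.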

To configure the [Embedding] Generate from text operation:

Select the operation on the Anypoint Code Builder or Studio canvas.
In the General properties tab for the operation, enter these values:
- Input Texts
  
  Enter the input list of texts to generate embeddings from.
- Max Segment Size (Characters)
  
  Enter the maximum size, in characters, of each segment that the text is split into. This field is optional.
- Max Overlap Size (Characters)
  
  Enter the maximum overlap, in characters, between consecutive segments, which you can use to fine-tune the similarity search. This field is optional.
- Model (Deployment) Name
  
  Enter the embedding model (deployment) name.
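To show how Max Segment Size (Characters) and Max Overlap Size (Characters) interact, here's a minimal Python sketch of fixed-size character splitting with overlap. It's an assumption for illustration only: the connector's actual splitter may prefer paragraph or sentence boundaries, so real segments can differ. The function name `split_into_segments` is hypothetical.

```python
def split_into_segments(text: str, max_segment_size: int, max_overlap_size: int = 0) -> list[str]:
    """Hypothetical splitter: fixed-size character windows where
    consecutive windows share max_overlap_size characters."""
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    if not 0 <= max_overlap_size < max_segment_size:
        raise ValueError("max_overlap_size must be >= 0 and smaller than max_segment_size")
    step = max_segment_size - max_overlap_size  # how far each new window advances
    segments = []
    start = 0
    while start < len(text):
        segments.append(text[start:start + max_segment_size])
        if start + max_segment_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return segments


sample = "abcdefghijklmnopqrstuvwxyz"
for i, seg in enumerate(split_into_segments(sample, max_segment_size=10, max_overlap_size=3)):
    print(i, seg)
```

With a segment size of 10 and an overlap of 3, each segment repeats the last three characters of the previous one, so context at a boundary appears in both neighboring segments.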

This is the XML for this operation:

<ms-vectors:embedding-generate-from-text
  doc:name="[Embedding] Generate from text"
  doc:id="92c7a561-7b99-4840-8ffb-f680c9e392dc"
  config-ref="MuleSoft_Vectors_Connector_Embedding_config"
  maxSegmentSizeInChar="3000"
  maxOverlapSizeInChars="300"
  embeddingModelName="sfdc_ai__DefaultOpenAITextEmbeddingAda_002">
  <ms-vectors:text ><![CDATA[#[payload.text]]]></ms-vectors:text>
</ms-vectors:embedding-generate-from-text>

Output Configuration

This operation responds with a JSON payload. This is an example response:

{
    "embeddings": [
        [-0.00683132, -0.0033572172, 0.02698761, -0.01291587, ...],
        [-0.0047172513, -0.03481483, 0.02046227, -0.037395656, ...],
        ...
    ],
    "text-segments": [
        {
            "metadata": {
                "index": "0"
            },
            "text": "In the modern world, technological advancements have become ."
        },
        {
            "metadata": {
                "index": "1"
            },
            "text": "E-commerce giants like Amazon and Alibaba have redefined .."
        },
        ...
    ],
    "dimension": 1536
}

embeddings: List of generated embeddings, one numeric vector for each text segment.
text-segments: List of text segments. Each list item contains:
  - text: Text segment.
  - metadata: Metadata key-value pairs:
    - index: Segment or chunk number for the uploaded data source.
dimension: Dimension of the embeddings, that is, the number of values in each vector.
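This Python sketch illustrates the shape of the payload by pairing each text segment with its embedding. It assumes, as the example response suggests but this page doesn't state explicitly, that `embeddings[i]` belongs to `text-segments[i]`. The function name and the shortened three-value vectors are hypothetical.

```python
import json

# Abbreviated sample in the shape of the response above. Vectors are shortened
# to 3 values and "dimension" is set to match (hypothetical values).
payload_json = """
{
  "embeddings": [[-0.0068, -0.0034, 0.0270], [-0.0047, -0.0348, 0.0205]],
  "text-segments": [
    {"metadata": {"index": "0"}, "text": "First segment"},
    {"metadata": {"index": "1"}, "text": "Second segment"}
  ],
  "dimension": 3
}
"""


def pair_segments_with_embeddings(payload: dict) -> list[dict]:
    """Pair each text segment with the embedding at the same list position (assumed alignment)."""
    embeddings = payload["embeddings"]
    segments = payload["text-segments"]
    if len(embeddings) != len(segments):
        raise ValueError("embeddings and text-segments differ in length")
    pairs = []
    for segment, vector in zip(segments, embeddings):
        if len(vector) != payload["dimension"]:
            raise ValueError("embedding length does not match 'dimension'")
        pairs.append({
            "index": int(segment["metadata"]["index"]),  # index is returned as a string
            "text": segment["text"],
            "embedding": vector,
        })
    return pairs


payload = json.loads(payload_json)
for pair in pair_segments_with_embeddings(payload):
    print(pair["index"], pair["text"], pair["embedding"])
```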

The operation also returns attributes, separate from the main JSON payload, that include information about token usage. For example:

{
  "embeddingModelDimension": 1536,
  "embeddingModelName": "sfdc_ai__DefaultOpenAITextEmbeddingAda_002",
  "tokenUsage": {
      "outputCount": 9,
      "totalCount": 18,
      "inputCount": 9
  },
  "additionalAttributes": {}
}

embeddingModelDimension: Dimension of the embedding model used.
embeddingModelName: Name of the embedding model used.
tokenUsage: Token usage metadata returned as attributes:
- outputCount: Number of tokens used to generate the output.
- totalCount: Total number of tokens used for input and output.
- inputCount: Number of tokens used to process the input.
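If a flow calls the operation several times, for example once per document, you might want a running total of token usage. This Python sketch sums the `tokenUsage` fields across calls. The `TokenUsage` class and the second call's numbers are hypothetical; the first call uses the values from the example attributes above.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Hypothetical holder for the tokenUsage fields returned as attributes."""
    input_count: int = 0
    output_count: int = 0
    total_count: int = 0

    @classmethod
    def from_attributes(cls, attributes: dict) -> "TokenUsage":
        """Build from the 'tokenUsage' object in the operation's attributes."""
        usage = attributes["tokenUsage"]
        return cls(usage["inputCount"], usage["outputCount"], usage["totalCount"])

    def __add__(self, other: "TokenUsage") -> "TokenUsage":
        # Field-wise sum, so usage from several calls can be accumulated.
        return TokenUsage(
            self.input_count + other.input_count,
            self.output_count + other.output_count,
            self.total_count + other.total_count,
        )


first_call = {"tokenUsage": {"outputCount": 9, "totalCount": 18, "inputCount": 9}}
second_call = {"tokenUsage": {"outputCount": 4, "totalCount": 8, "inputCount": 4}}  # hypothetical

total = TokenUsage.from_attributes(first_call) + TokenUsage.from_attributes(second_call)
print(total)
```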
