pabis.eu

Cut Costs in OpenSearch Serverless with Bedrock Knowledge Base - Part 2

26 August 2024

In the last post, we constructed all the resources necessary to create a Bedrock Knowledge Base backed by OpenSearch Serverless: the CloudFormation templates, a Lambda function for index creation, and other static things like IAM Roles and S3 Buckets. Today, we will go through the process of combining all these resources into a fully functioning "pipeline". We will use Step Functions for this.

Previous post can be found here

Let's look again at the high-level overview of the graph from the previous post. We see that we need to run some steps one by one. However, this won't map one-to-one onto the Step Function. For example, we need to wait for the CloudFormation stack to be created, and to do this we have to periodically poll the status of the stack creation. What's more, we need to handle failures so that already-created resources get deleted in case something fails at a later stage.

High-level step function for cheaper Bedrock RAG

The Step Function resulting from all these steps is thus very long, so I won't paste the entire code here. It's also worth noting that I decided to hardcode some of the values inside it using Terraform's templatefile function. The complexity is already high enough, so let's make it easier for us to experiment. Feel free to modify it so that your Step Function accepts parameters instead.

The Step Function

I exported this graph from the AWS Console and marked how each part maps to the high-level overview. The graph is so large that the AWS Console renderer doesn't know how to display it properly.

Step Function with higher level sections marked

The graph above is the more complicated version of the Step Function, marked as v1. After writing this entire post, I simplified it and made a new v2 version. It works the same, but the Choice blocks for the CloudFormation stacks are merged.

Version 2 of the Step Function
Version 1 of the Step Function

Create OpenSearch Collection

Let's go step by step through all the marked regions. First we'll create the OpenSearch Collection. We create the CloudFormation stack with the needed parameters (LambdaRoleArn and BedrockRoleArn), and as the template we specify the URL of the file we uploaded earlier to the S3 bucket. We also give CloudFormation an IAM role to assume. The CreateStack call returns an ID that we use to poll the progress of stack creation. If the stack status is still CREATE_IN_PROGRESS, we wait 30 seconds and call DescribeStacks again. Once the status changes, we check whether it succeeded with CREATE_COMPLETE. If not, we delete the stack and fail the execution.

Create OpenSearch Collection stack
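The create-and-poll part can be sketched in Amazon States Language roughly like this (state names, ARNs and the template URL are illustrative, not the exact values from my template):

```json
{
  "Create Collection Stack": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:cloudformation:createStack",
    "Parameters": {
      "StackName": "oss-collection",
      "TemplateURL": "https://my-templates-bucket.s3.amazonaws.com/collection.yaml",
      "RoleARN": "arn:aws:iam::111122223333:role/cloudformation-role",
      "Parameters": [
        { "ParameterKey": "LambdaRoleArn", "ParameterValue": "arn:aws:iam::111122223333:role/index-lambda-role" },
        { "ParameterKey": "BedrockRoleArn", "ParameterValue": "arn:aws:iam::111122223333:role/bedrock-kb-role" }
      ]
    },
    "ResultPath": "$.CollectionStack",
    "Next": "Describe Collection Stack"
  },
  "Describe Collection Stack": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:cloudformation:describeStacks",
    "Parameters": { "StackName.$": "$.CollectionStack.StackId" },
    "ResultPath": "$.DescribeResult",
    "Next": "Is Collection Stack Created"
  },
  "Wait For Collection Stack": {
    "Type": "Wait",
    "Seconds": 30,
    "Next": "Describe Collection Stack"
  }
}
```

The Choice state between these routes on $.DescribeResult.Stacks[0].StackStatus: CREATE_IN_PROGRESS goes to the Wait state, CREATE_COMPLETE continues, and anything else goes to deletion.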

At the end of the step, I transform the output with ResultSelector. In order to reach the values from the CloudFormation Outputs, you need a somewhat involved JsonPath expression.

{
  "DataSource.$": "$.Stacks[0]..Outputs[?(@.OutputKey=='DataSource')].OutputValue",
  "KnowledgeBase.$": "$.Stacks[0]..Outputs[?(@.OutputKey=='KnowledgeBase')].OutputValue"
}

I also append the output of each meaningful step with ResultPath so that we don't lose any information during pipeline execution. After all, we will need these values to delete the stacks even when something fails. This block's output is saved under $.CollectionStack, so $.CollectionStack.StackId refers to the stack with the OpenSearch Collection. The stack with the knowledge base will be stored under $.KBStack.StackId.

Create OpenSearch Index

Because there's so far no CloudFormation resource to create an index in an OpenSearch Serverless collection, we need to make an HTTP call to the endpoint returned as an output of the previous stack (CollectionEndpoint). The easiest way is a Lambda function using the OpenSearch client for Python. We prepared this function in the previous part of this project. It simply authenticates against the endpoint with the Lambda's IAM Role credentials and creates an appropriately formatted index. We pass the necessary parameters in the Lambda event: vectorIndexName - the name of our index; the field names vectorName, textName and metadataName; and vectorDimensions - check this guide to find the dimensions for your embedding model.

Create index
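The index step itself is a standard Lambda invoke Task. A sketch, assuming the function from the previous part is named create-oss-index, that the collection endpoint was saved earlier under $.CollectionOutputs, and that the function returns a JSON object with a statusCode field (1024 dimensions is the value for Cohere Embed v3; the field names are examples):

```json
{
  "Create Index": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "create-oss-index",
      "Payload": {
        "endpoint.$": "$.CollectionOutputs.CollectionEndpoint",
        "vectorIndexName": "bedrock-kb-index",
        "vectorName": "vector",
        "textName": "text",
        "metadataName": "metadata",
        "vectorDimensions": 1024
      }
    },
    "ResultSelector": { "StatusCode.$": "$.Payload.statusCode" },
    "ResultPath": "$.LambdaResult",
    "Next": "Wait For Index Sync"
  }
}
```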

If the index creation function is successful, that is, if it returns a 200 status code, we assume that the index was created. I added a small wait to let it synchronize between shards in OpenSearch - I found that sometimes it's immediate and sometimes it takes up to a minute. I saved the output of the Lambda function under a ResultPath of $.LambdaResult.

Create Bedrock Knowledge Base

Now that we have both the collection and an index inside it, we can create the Knowledge Base in Bedrock. We define two resources in CloudFormation: the knowledge base itself and a data source pointing to our S3 bucket. We pass almost all the same parameters as we did to the Lambda function (field names, index name), as well as the ARNs of the embedding model, the OpenSearch Collection and the IAM role for Bedrock, plus the name of the S3 bucket with the knowledge. As the embedding model, I chose Cohere Multilingual (arn:aws:bedrock:us-west-2::foundation-model/cohere.embed-multilingual-v3), which costs a bit but seems performant as well. You need to request access to it via the Bedrock Console - see this guide.

Create Knowledge Base Stack
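For reference, the two resources in the Knowledge Base template have roughly this shape (a trimmed sketch in CloudFormation's JSON form; names and ARNs are placeholders, and values that the real template takes as parameters are inlined here):

```json
{
  "KnowledgeBase": {
    "Type": "AWS::Bedrock::KnowledgeBase",
    "Properties": {
      "Name": "cheap-rag-kb",
      "RoleArn": "arn:aws:iam::111122223333:role/bedrock-kb-role",
      "KnowledgeBaseConfiguration": {
        "Type": "VECTOR",
        "VectorKnowledgeBaseConfiguration": {
          "EmbeddingModelArn": "arn:aws:bedrock:us-west-2::foundation-model/cohere.embed-multilingual-v3"
        }
      },
      "StorageConfiguration": {
        "Type": "OPENSEARCH_SERVERLESS",
        "OpensearchServerlessConfiguration": {
          "CollectionArn": "arn:aws:aoss:us-west-2:111122223333:collection/abcdef123456",
          "VectorIndexName": "bedrock-kb-index",
          "FieldMapping": {
            "VectorField": "vector",
            "TextField": "text",
            "MetadataField": "metadata"
          }
        }
      }
    }
  },
  "DataSource": {
    "Type": "AWS::Bedrock::DataSource",
    "Properties": {
      "Name": "s3-knowledge",
      "KnowledgeBaseId": { "Ref": "KnowledgeBase" },
      "DataSourceConfiguration": {
        "Type": "S3",
        "S3Configuration": { "BucketArn": "arn:aws:s3:::my-knowledge-bucket" }
      }
    }
  }
}
```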

I repeated the same steps as for the first CloudFormation stack, but this time I wait only 10 seconds between polls. Creating a Knowledge Base should be fast.

Outputs of this stack are transformed the same way. In the next step you will need to reference them with a [0] index, because a JsonPath filter expression always returns an array - here with a single element. The ResultPaths of this block are $.KBStackOutputs and $.KBStack.

{
  "DataSourceId.$": "$.KBStackOutputs.DataSource[0]",
  "KnowledgeBaseId.$": "$.KBStackOutputs.KnowledgeBase[0]"
}

Syncing the Knowledge Base

Now we can sync the Knowledge Base with the Data Source, which is our S3 bucket. Put some files into this bucket first. I, for example, added some PDFs: the Certified Machine Learning Associate and Specialty exam guides and the complete EC2 User Guide in PDF format - around 32 MB in total.

The synchronization happens when you call StartIngestionJob. It returns an ID that you then reference in GetIngestionJob. You have to poll it until the status changes to either COMPLETE or FAILED. If it fails, we can destroy both stacks we created previously. The result of this block is saved under a ResultPath of $.IngestionJob.

I poll with a 30-second wait in this case too, as syncing can take a long time. Depending on the amount of data you put into the S3 bucket and the embedding model you choose, this step can be the most expensive one in this project.

Sync Knowledge Base
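The ingestion part maps to two SDK tasks plus a polling loop. A sketch, assuming the KnowledgeBaseId and DataSourceId extracted in the previous step were stored under $.KBIds (adjust the paths to your own ResultPaths):

```json
{
  "Start Ingestion Job": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:bedrockagent:startIngestionJob",
    "Parameters": {
      "KnowledgeBaseId.$": "$.KBIds.KnowledgeBaseId",
      "DataSourceId.$": "$.KBIds.DataSourceId"
    },
    "ResultPath": "$.IngestionJob",
    "Next": "Wait For Ingestion"
  },
  "Wait For Ingestion": {
    "Type": "Wait",
    "Seconds": 30,
    "Next": "Get Ingestion Job"
  },
  "Get Ingestion Job": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:bedrockagent:getIngestionJob",
    "Parameters": {
      "KnowledgeBaseId.$": "$.KBIds.KnowledgeBaseId",
      "DataSourceId.$": "$.KBIds.DataSourceId",
      "IngestionJobId.$": "$.IngestionJob.IngestionJob.IngestionJobId"
    },
    "ResultPath": "$.IngestionJob",
    "Next": "Is Ingestion Done"
  }
}
```

Both API calls return the job under an IngestionJob key, hence the doubled path; the Choice after this routes on $.IngestionJob.IngestionJob.Status.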

I observed the synchronization process. In the end it produced 95 MB of indexed chunks of text and took 8 minutes to complete.

Data is syncing

Performing RAG Inference

Now you have a fully functioning Knowledge Base. It's up to you whether to use it immediately for batch inference or to add a Wait step - for example until a specified time, such as 5 PM today - so that the Knowledge Base can be used by your team whenever needed. In my example I just run some queries against Claude 3 Haiku.
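If you go the "keep it until 5 PM" route, a Wait state can pause on an absolute timestamp instead of a duration. A minimal sketch, assuming the cut-off time is passed in the execution input under a DeleteAt key (both names are mine):

```json
{
  "Keep Knowledge Base Alive": {
    "Type": "Wait",
    "TimestampPath": "$.DeleteAt",
    "Next": "Perform RAG Queries"
  }
}
```

TimestampPath expects an ISO 8601 string such as "2024-08-26T17:00:00Z"; a hardcoded Timestamp field works too.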

Before creating all the queries, I wanted to test whether the knowledge base is even usable by the LLM. So I paused the execution of the Step Function by removing the DeleteStack steps and used the Bedrock Console to test it. I found this interesting frame in the EC2 User Guide about instance store as the root device.

EC2 User Guide instance store

First, I asked plain Haiku in the Chat Playground to see if it maybe knows the answer internally. Fortunately (or not), the answer was far from the truth - it hallucinated the instance types. I used a temperature of 0.7 and a top P of 0.9.

Haiku internal knowledge

With the same settings of temperature = 0.7 and top P = 0.9, I asked Haiku the same question in a RAG loop (using the Test knowledge base button). It answered correctly and gave a reference to the document in my S3 bucket.

Haiku RAG knowledge base

Because the answer was promising, I created a set of queries to run in parallel in the Step Function. The answer to each question is put into an S3 bucket as the JSON returned from Bedrock, which includes both the response and the references from the Knowledge Base.

Perform RAG Stuff
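Each parallel branch can call Bedrock's RetrieveAndGenerate through the SDK integration. A sketch of one branch - the question text, state names and the JsonPath for the Knowledge Base ID are illustrative:

```json
{
  "Ask About Instance Store": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:bedrockagentruntime:retrieveAndGenerate",
    "Parameters": {
      "Input": { "Text": "Which instance types can use instance store as the root device?" },
      "RetrieveAndGenerateConfiguration": {
        "Type": "KNOWLEDGE_BASE",
        "KnowledgeBaseConfiguration": {
          "KnowledgeBaseId.$": "$.KBIds.KnowledgeBaseId",
          "ModelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
        }
      }
    },
    "ResultPath": "$.RAGResult",
    "Next": "Save Answer To S3"
  }
}
```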

In order for the object keys in the bucket to be unique, I append the date and time retrieved from the Step Functions context object ($$) using the States.Format intrinsic function. The configuration of the PutObject call looks like this:

{
  "Body.$": "$.RAGResult",
  "Bucket": "my-outputs-abcdef12",
  "Key.$": "States.Format('ec2_discount/{}.json', $$.State.EnteredTime)"
}

Cleanup

After all queries are done, we delete all the stacks we created in order to save on costs. The knowledge base stack should be deleted first, as it depends on the collection. The section below also includes the deletion of stacks in case of failures along the way. However, note that not all failure scenarios are covered, so you might need to adjust it for more cases - otherwise the OpenSearch Collection can be left running unused.

Deleting the stacks
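The happy-path cleanup from the paragraph above, sketched as two SDK tasks; "ResultPath": null discards the empty DeleteStack responses so the state input passes through unchanged:

```json
{
  "Delete KB Stack": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:cloudformation:deleteStack",
    "Parameters": { "StackName.$": "$.KBStack.StackId" },
    "ResultPath": null,
    "Next": "Delete Collection Stack"
  },
  "Delete Collection Stack": {
    "Type": "Task",
    "Resource": "arn:aws:states:::aws-sdk:cloudformation:deleteStack",
    "Parameters": { "StackName.$": "$.CollectionStack.StackId" },
    "ResultPath": null,
    "End": true
  }
}
```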

Costs

I tested how much it costs to run this pipeline versus keeping an OpenSearch Collection alive for a day. On August 25th I ran the pipeline without doing anything else on this AWS account. On the same day, I created an empty OpenSearch Collection in a different account at 4 PM and let it run for 24 hours.

A single run of the step function cost me $0.25 - this includes running OpenSearch Serverless, converting text to embeddings using Cohere, and the Haiku inference itself. On the other account, where I left an empty OpenSearch Collection running for somewhat less than 24 hours, I got a bill for $2.94.

Price for pipeline

Price for OpenSearch

The largest cost of the pipeline was, as expected, the synchronization between S3 and the index - Cohere alone cost me $0.16 for this single run. This price can be reduced by using a cheaper model such as Cohere English or Amazon Titan. However, if you plan to scan gigabytes of data, it can be more reasonable to keep an OpenSearch Collection running. As shown in the screenshot in the "Syncing the Knowledge Base" section, my index came out at 95 MB.

Improvements

While describing this process I found some places where the function can be improved, so I did just that. The new function is named step-function-v2.yaml and can be found in the repository. You can select it when deploying with OpenTofu by changing the variable step_function_ver to v2.

I simplified and cleaned up the outputs so that each block has its own ResultPath instead of mixing all the values together. I also merged the choice statements so that they have three rules instead of two: one for success, one for "in progress", and one for any other status of the CloudFormation stack. For example, Is KB Stack Created and Is KB Stack Success are now a single Choice step.
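The merged check can then be a single Choice with three outcomes: success continues, in-progress loops back through the Wait state, and the Default rule catches every other stack status and routes to cleanup (state names are illustrative):

```json
{
  "Is KB Stack Ready": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.DescribeResult.Stacks[0].StackStatus",
        "StringEquals": "CREATE_COMPLETE",
        "Next": "Start Ingestion Job"
      },
      {
        "Variable": "$.DescribeResult.Stacks[0].StackStatus",
        "StringEquals": "CREATE_IN_PROGRESS",
        "Next": "Wait For KB Stack"
      }
    ],
    "Default": "Delete KB Stack"
  }
}
```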