Bedrock and pydantic-ai with observability from day one

Terraform PR Agent June 2, 2026 · 9 min read

bedrock
pydantic-ai
terraform
observability

This is the first post in Terraform PR Agent, a series that builds an AI agent which writes Terraform and opens review-ready pull requests, running on AWS Bedrock with pydantic-ai. Later posts add observability, file tools and a validate loop, conventions and policy, evals for choosing a model with numbers, and the guardrails to ship it against a real repo. The posts share one project scaffold, so the prerequisites below are a one-time setup for the whole series.

Prerequisites (one-time setup for the series)

Tooling and AWS access common to every post in this series.

Tooling

Terraform 1.x (install). Every post provisions infrastructure with Terraform.
uv for Python project management (install). Each post ships a runnable script you can invoke with uv run.
direnv (install) so terraform, uv run, and aws pick up AWS credentials automatically on cd. The project scaffold ships an .envrc that sources a gitignored .envrc.local.
(Optional) A coding agent such as Claude Code, Cursor, Codex, or Gemini CLI to consume the AgentPrompt blocks throughout the series. Not required (each prompt has a manual equivalent shown alongside it), but it skips the boilerplate.

Agent prompt: Check and install missing tooling

You are helping set up tooling for a tutorial project.

For each of `terraform`, `uv`, and `direnv`, run `command -v` to
check whether it is installed. If present, print the version and
continue.

For missing tools, detect the system package manager in this order:
`command -v brew`, `command -v dnf`, `command -v apt-get`. Use the
first one available:

  - Terraform: `brew tap hashicorp/tap && brew install hashicorp/tap/terraform`,
    dnf via the HashiCorp RPM repo, or apt via the HashiCorp deb repo.
  - uv: `brew install uv`, or the official installer
    `curl -LsSf https://astral.sh/uv/install.sh | sh`.
  - direnv: `brew install direnv`, `dnf install direnv`, or
    `apt-get install direnv`.

If no package manager is available or the install fails, stop and
link the manual install page so the developer can finish by hand:

  - Terraform: https://developer.hashicorp.com/terraform/install
  - uv: https://docs.astral.sh/uv/getting-started/installation/
  - direnv: https://direnv.net/docs/installation.html

After installing direnv, do not modify any shell rc files. Print the
hook line for the developer's shell (bash, zsh, or fish) and the path
to the relevant rc file, then wait for them to apply it themselves.

Report which tools were already present, which you installed, and
which need manual follow-up.

AWS access

A sandbox, test, or personal AWS account with permission to create, modify, and delete the resources discussed in each post. If you don’t have one, follow the official Create Your AWS Account walkthrough (about ten minutes; requires a credit card and a phone number for verification). Treat it as disposable - you can close it from the billing console after the series.
AWS credentials available locally via aws configure sso, aws configure, or whichever method matches your setup. You wire them into the project through .envrc.local in the next section, not your shell rc.

Anthropic First Time Use

Bedrock requires a one-time use-case form per account (or per AWS Organization management account) before Anthropic models can be invoked. Easiest path: open any Claude model in the Bedrock console playground and submit the form. Auto-subscription on first invoke can take up to 15 minutes to settle, so it is worth clearing this before post 1.

CLI alternative and verification

Programmatic equivalent (requires AWS CLI 2.27.42 or later):

aws bedrock put-use-case-for-model-access \
  --form-data "$(printf '{"companyName":"...","companyWebsite":"...","intendedUsers":"1","industryOption":"...","otherIndustryOption":"","useCases":"..."}' | base64)"

Verify:

aws bedrock get-foundation-model-availability \
  --model-id anthropic.claude-haiku-4-5-20251001-v1:0 \
  --region eu-west-1

Look for agreementAvailability.status: AVAILABLE. Expected output:

{
  "modelId": "anthropic.claude-haiku-4-5-20251001-v1",
  "agreementAvailability": { "status": "AVAILABLE" },
  "authorizationStatus": "AUTHORIZED",
  "entitlementAvailability": "AVAILABLE",
  "regionAvailability": "AVAILABLE"
}

If the form has not been submitted, only agreementAvailability.status flips to NOT_AVAILABLE. The other three fields stay green even when invocation would fail, so do not rely on them.
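If you want to script that check, a minimal sketch could look like the following. It assumes you have captured the JSON printed by the verify command as a string; the function name form_submitted is made up for this illustration and is not part of any AWS tooling.

```python
import json


def form_submitted(availability_json: str) -> bool:
    """Return True only when agreementAvailability.status is AVAILABLE.

    The other fields are deliberately ignored, because they stay green even
    when the use-case form is still missing.
    """
    data = json.loads(availability_json)
    return data.get("agreementAvailability", {}).get("status") == "AVAILABLE"


# Sample payloads shaped like the expected output shown above.
READY = json.dumps({
    "agreementAvailability": {"status": "AVAILABLE"},
    "authorizationStatus": "AUTHORIZED",
    "entitlementAvailability": "AVAILABLE",
    "regionAvailability": "AVAILABLE",
})
NOT_READY = json.dumps({
    "agreementAvailability": {"status": "NOT_AVAILABLE"},
    "authorizationStatus": "AUTHORIZED",
    "entitlementAvailability": "AVAILABLE",
    "regionAvailability": "AVAILABLE",
})

print(form_submitted(READY), form_submitted(NOT_READY))
```

Keying on a single field keeps the check honest: a payload with three green fields and a missing agreement still reports False.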

Project scaffold

Download and unpack the scaffold into your projects directory (swap ~/projects for whatever location you prefer):

mkdir -p ~/projects
cd ~/projects
curl -fsSL https://andreaslang.dev/terraform-pr-agent/terraform-pr-agent.tar.gz | tar xz

You should now have:

terraform-pr-agent/
  infra/         Terraform (Bedrock, CloudWatch, DynamoDB, ...)
  agent/         Python package (pydantic-ai code)
  evals/         Golden cases + eval harness
  scripts/       Single-file Python scripts (PEP 723 inline deps, run via `uv run`)
  AGENTS.md      Conventions for AI coding agents (see below)
  .envrc         Sources .envrc.local for direnv
  .envrc.local   Your AWS env (gitignored, fill in locally)
  .gitignore     Excludes .envrc.local, .terraform/, state files

This is the base scaffold; subsequent posts in the series add and modify files inside this tree. Each later post links the cumulative checkpoint here if you want to skip ahead or recover from drift.

Or have an agent do the same:

Agent prompt: Fetch and unpack the project scaffold

You are helping set up a tutorial project on a developer's machine.

Fetch and unpack the scaffold tarball into ~/projects/:

  mkdir -p ~/projects
  cd ~/projects
  curl -fsSL \
    https://andreaslang.dev/terraform-pr-agent/terraform-pr-agent.tar.gz \
    | tar xz

Verify with `ls -la ~/projects/terraform-pr-agent` and report any
failures. Do not install dependencies, do not run `direnv allow`,
and do not fill in any values in .envrc.local: the developer will
do that themselves in the next step.

Edit .envrc.local, uncomment Option A (named profile) or Option B (static / temporary creds), then run direnv allow . in the project root. The template:

# Local AWS env for this project. Gitignored.
# Set the region (shared by both auth options) and uncomment ONE of the
# credential blocks below, then run `direnv allow` in this directory.

export AWS_REGION=eu-west-1

# Option A: Named profile (e.g. from `aws configure sso` or ~/.aws/credentials).
# export AWS_PROFILE=your-profile-name

# Option B: Static or temporary credentials.
# export AWS_ACCESS_KEY_ID=...
# export AWS_SECRET_ACCESS_KEY=...
# export AWS_SESSION_TOKEN=...   # only if using temporary creds
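Since the template asks for exactly one credential block, a small sanity check can catch the case where both or neither are active. This is only a sketch: credential_mode is a hypothetical helper, not something the scaffold ships, and it only inspects the variable names used in the template above.

```python
def credential_mode(env: dict) -> str:
    """Report which credential option from .envrc.local is active.

    Returns "profile" for Option A and "static" for Option B, and raises
    ValueError when both or neither are configured.
    """
    has_profile = bool(env.get("AWS_PROFILE"))
    has_keys = bool(env.get("AWS_ACCESS_KEY_ID")) and bool(env.get("AWS_SECRET_ACCESS_KEY"))
    if has_profile and has_keys:
        raise ValueError("both Option A and Option B are set; uncomment only one")
    if has_profile:
        return "profile"
    if has_keys:
        return "static"
    raise ValueError("no credentials configured; uncomment Option A or Option B")


print(credential_mode({"AWS_REGION": "eu-west-1", "AWS_PROFILE": "your-profile-name"}))
```

In a real shell you would pass dict(os.environ) after direnv has loaded the file.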

What else is in the scaffold

.envrc:

source_env_if_exists .envrc.local

.gitignore:

.envrc.local
.terraform/
*.tfstate
*.tfstate.*
.direnv/

The AGENTS.md template is tool-agnostic: Claude Code, Cursor, Codex, and Gemini CLI all read it automatically at session start.

# Project conventions

Tutorial project from the Terraform PR Agent series at
https://andreaslang.dev/posts/terraform-pr-agent/

> When a post introduces a new top-level directory, ship an updated
> `AGENTS.md` in that post's `scaffold/` overlay so this Layout section
> stays accurate.

## Layout

- `infra/` Terraform (Bedrock, CloudWatch, DynamoDB, ...)
- `agent/` Python package (pydantic-ai code)
- `evals/` Golden cases + eval harness
- `scripts/` Standalone single-file Python scripts with PEP 723 inline
  dependency metadata. Run with `uv run scripts/<name>.py`; do not move
  these into the `agent` package or a shared `pyproject.toml`.

## Tooling

- Python: use `uv` for dependency and script management. Run scripts with
  `uv run`. Single-file scripts under `scripts/` declare their deps inline
  via PEP 723 (`# /// script ... # ///` headers) so they stay
  self-contained and runnable without a project venv.
- Terraform: 1.x.
- AWS: credentials configured via `aws configure sso` or static keys.

## Conventions

- Run `terraform validate` after editing any `.tf` file.
- Run `terraform fmt` before committing `.tf`.
- Never auto-apply Terraform; print the plan first and wait for confirmation.
- Don't introduce dependencies the current post hasn't covered.
- AWS resources live in a sandbox sub-account; never assume a production account.

What this post covers

We stand up Bedrock + pydantic-ai end to end and put a CloudWatch dashboard and alert on top before writing a single line of agent logic. A local pydantic-ai script validates the stack by invoking Claude Haiku 4.5 through a project-scoped application inference profile, with all the permissions wrapped in a single IAM role.

Most projects only wire monitoring once there is something to monitor. Agentic workloads bite earlier: a small shift in how prompts or data hit the model can turn a previously fine task into many more turns and a much bigger bill, and you do not notice until the invoice lands. Eval (covered in a later post) and monitoring are not optional past hello-world.

Bedrock IAM and inference profile

Three pieces need to exist before any agent code can run:

- The Bedrock inference profile. We could call the model directly via its Bedrock model ID, but an inference profile buys us two things:
  - It lets you swap models without changing the agent (the same inference profile ARN calling a different model).
  - It lets you monitor usage and create alerts, which is what we need here.
- An IAM role that principals in our AWS account ("arn:aws:iam::${...}:root") can assume to invoke the model through the Converse API. AWS recommends the Converse API for most chat-style calls, and the pydantic-ai model we use here, BedrockConverseModel, goes through it. You still need the IAM permission for InvokeModel, though, because that is the action Converse is authorized against.
- A basic Terraform main.tf with the provider setup.

The three files below (inference profile, IAM, and provider setup) belong under infra/; copy them into your own project, then run terraform init and terraform apply. State stays local since there is no backend block, which is fine for a learning project.

locals {
  bedrock_model_id            = "anthropic.claude-haiku-4-5-20251001-v1:0"
  bedrock_cross_region_prefix = "eu"

  system_inference_profile_arn = format(
    "arn:aws:bedrock:%s:%s:inference-profile/%s.%s",
    data.aws_region.current.region,
    data.aws_caller_identity.current.account_id,
    local.bedrock_cross_region_prefix,
    local.bedrock_model_id,
  )
}

resource "aws_bedrock_inference_profile" "agent" {
  name        = "terraform-pr-agent"
  description = "Application inference profile for the Terraform PR Agent series."

  model_source {
    copy_from = local.system_inference_profile_arn
  }
}

output "inference_profile_arn" {
  value = aws_bedrock_inference_profile.agent.arn
}
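To see what the format() call in the locals block produces, here is the same string assembly as a small Python sketch. The function name and the account ID 123456789012 are placeholders for illustration; the real values come from the aws_region and aws_caller_identity data sources.

```python
def system_inference_profile_arn(region: str, account_id: str, prefix: str, model_id: str) -> str:
    """Mirror the Terraform format() call: <prefix>.<model_id> names the
    cross-region system profile that the application profile copies from."""
    return f"arn:aws:bedrock:{region}:{account_id}:inference-profile/{prefix}.{model_id}"


arn = system_inference_profile_arn(
    region="eu-west-1",
    account_id="123456789012",  # placeholder account ID
    prefix="eu",
    model_id="anthropic.claude-haiku-4-5-20251001-v1:0",
)
print(arn)
```

Comparing this string with the copy_from value in terraform plan is a quick way to spot a wrong region prefix or model ID before applying.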

data "aws_iam_policy_document" "bedrock_invoke" {
  statement {
    # Converse and ConverseStream calls are authorized through InvokeModel and
    # InvokeModelWithResponseStream, so those two actions must be present.
    actions = [
      "bedrock:Converse",
      "bedrock:ConverseStream",
      "bedrock:InvokeModel",
      "bedrock:InvokeModelWithResponseStream",
    ]
    # Bedrock foundation-model ARNs do not pin to the caller region; the
    # inference profile fans out cross-region, so the * region segment is required.
    #trivy:ignore:avd-aws-0057
    resources = [
      aws_bedrock_inference_profile.agent.arn,
      local.system_inference_profile_arn,
      "arn:aws:bedrock:*::foundation-model/${local.bedrock_model_id}",
    ]
  }

  # Anthropic models on Bedrock are distributed via AWS Marketplace. On the
  # first invocation from a new account, Bedrock auto-subscribes the account
  # to the model product, which requires the invoking principal to hold these
  # Marketplace actions. Once the subscription is active these calls become
  # no-ops, but the principal still needs ViewSubscriptions on every call so
  # Bedrock can confirm the subscription is in place.
  statement {
    actions = [
      "aws-marketplace:Subscribe",
      "aws-marketplace:Unsubscribe",
      "aws-marketplace:ViewSubscriptions",
    ]
    # Marketplace subscription actions are global by design.
    #trivy:ignore:avd-aws-0057
    resources = ["*"]
  }
}

resource "aws_iam_policy" "bedrock_invoke" {
  name        = "terraform-pr-agent-bedrock-invoke"
  description = "Invoke Claude via the terraform-pr-agent inference profile."
  policy      = data.aws_iam_policy_document.bedrock_invoke.json
}

data "aws_iam_policy_document" "agent_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }
}

resource "aws_iam_role" "agent" {
  name               = "terraform-pr-agent"
  assume_role_policy = data.aws_iam_policy_document.agent_assume.json
}

resource "aws_iam_role_policy_attachment" "agent_bedrock_invoke" {
  role       = aws_iam_role.agent.name
  policy_arn = aws_iam_policy.bedrock_invoke.arn
}

output "agent_role_arn" {
  value = aws_iam_role.agent.arn
}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "6.17.0"
    }
  }
}

provider "aws" {}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

First pydantic-ai call

Now that the infra is in place, let’s call the model. Calling Bedrock first means any region, quota, or IAM mistakes surface immediately, before there is any agent code to muddy the debugging. We will do it with a uv run script that uses PEP 723 inline script metadata: the script declares its dependencies in a comment header, and uv downloads them before running it. The script is a simple console chat interface with one twist: to shut it down, we tell the agent that we want to stop. This works by enforcing a structured response type, but more on that later.

Add the new env vars to `.envrc.local`

Before we can call anything, we need two values from the infra we just set up: AGENT_ROLE_ARN (the role the script assumes) and BEDROCK_INFERENCE_PROFILE_ARN (the profile used to invoke the model). You can get both by running the terraform output commands shown in the comments of the .envrc.local file.

# Fill in after `terraform apply` in post 1, then run `direnv reload`.
# Values come from terraform outputs:
#   terraform -chdir=infra output -raw agent_role_arn
#   terraform -chdir=infra output -raw inference_profile_arn
# export AGENT_ROLE_ARN=arn:aws:iam::<account>:role/terraform-pr-agent
# export BEDROCK_INFERENCE_PROFILE_ARN=arn:aws:bedrock:eu-west-1:<account>:application-inference-profile/<id>

Full script

You can copy the full script below into a file called scripts/chat.py. We will look at specific sections below, so keep it at hand.

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "pydantic-ai-slim[bedrock]>=0.0.30",
#   "boto3>=1.35",
# ]
# ///
"""Command-line chat against Claude Haiku 4.5 on Bedrock.

Reads three env vars (set them in .envrc.local; see post 1):

  AGENT_ROLE_ARN
    IAM role to assume. Source: terraform -chdir=infra output -raw agent_role_arn
  BEDROCK_INFERENCE_PROFILE_ARN
    Application inference profile that wraps the EU CRIS profile.
    Source: terraform -chdir=infra output -raw inference_profile_arn
  AWS_REGION
    Region for the bedrock-runtime endpoint. Defaults to eu-west-1.

Run with `uv run scripts/chat.py` from the project root.
"""

from __future__ import annotations

import os
import sys

import boto3
from botocore.exceptions import ClientError
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel
from pydantic_ai.providers.bedrock import BedrockProvider

MODEL_ID = "anthropic.claude-haiku-4-5-20251001-v1:0"

SYSTEM_PROMPT = (
    "You are a conversational assistant running in a small command-line chat. "
    "Respond naturally to the user in the `message` field. "
    "Set `terminate=True` only when the user explicitly asks to end the "
    "conversation (for example: bye, quit, exit, stop, that's all). "
    "When you set `terminate=True`, include a short farewell in `message`. "
    "In every other case set `terminate=False`."
)


class Reply(BaseModel):
    """Structured response on every turn so the loop can decide when to stop."""

    message: str = Field(description="Natural-language reply to the user.")
    terminate: bool = Field(
        description=(
            "True only when the user has explicitly asked to end the conversation; False otherwise."
        )
    )


def require_env(name: str) -> str:
    value = os.environ.get(name)
    if not value:
        sys.stderr.write(
            f"error: {name} is not set. Add it to .envrc.local from "
            f"`terraform output`, then run `direnv reload`.\n"
        )
        sys.exit(2)
    return value


def assume_role(role_arn: str) -> dict[str, str]:
    """Trade the local SSO identity for short-lived terraform-pr-agent creds."""
    sts = boto3.client("sts")
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName="chat-cli")
    return response["Credentials"]


def build_model(role_arn: str, inference_profile_arn: str, region: str) -> BedrockConverseModel:
    creds = assume_role(role_arn)
    bedrock_client = boto3.client(
        "bedrock-runtime",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    # The application inference profile ARN goes in settings, not model_name:
    # pydantic-ai still uses the foundation-model id for capability detection
    # and routes the actual Converse call through the profile.
    return BedrockConverseModel(
        MODEL_ID,
        provider=BedrockProvider(bedrock_client=bedrock_client),
        settings={"bedrock_inference_profile": inference_profile_arn},
    )


def main() -> None:
    role_arn = require_env("AGENT_ROLE_ARN")
    inference_profile_arn = require_env("BEDROCK_INFERENCE_PROFILE_ARN")
    region = os.environ.get("AWS_REGION", "eu-west-1")

    try:
        model = build_model(role_arn, inference_profile_arn, region)
    except ClientError as err:
        sys.stderr.write(
            f"error: could not assume {role_arn}: {err}\n"
            f"check that `terraform apply` has run and AWS_PROFILE is set.\n"
        )
        sys.exit(1)

    agent = Agent(model, output_type=Reply, system_prompt=SYSTEM_PROMPT)

    print('Chatting with Claude Haiku 4.5 via terraform-pr-agent. Type "bye" to exit.')
    history: list = []
    while True:
        try:
            user_input = input("you> ").strip()
        except (EOFError, KeyboardInterrupt):
            print()
            break
        if not user_input:
            continue
        try:
            result = agent.run_sync(user_input, message_history=history)
        except ClientError as err:
            sys.stderr.write(f"bedrock error: {err}\n")
            continue
        reply = result.output
        print(f"agent> {reply.message}")
        history = result.all_messages()
        if reply.terminate:
            break


if __name__ == "__main__":
    main()

System prompt - Basic Chat

Here you can see the system prompt, which sets the scene of the agent’s purpose. This is not the system prompt we will use for the agent in our target use case (creating terraform PRs), but just a demo prompt to give you a feel for how the LLM works. Note the specific instructions referencing termination of the program. They reference the Reply schema, which we will define next.

SYSTEM_PROMPT = (
    "You are a conversational assistant running in a small command-line chat. "
    "Respond naturally to the user in the `message` field. "
    "Set `terminate=True` only when the user explicitly asks to end the "
    "conversation (for example: bye, quit, exit, stop, that's all). "
    "When you set `terminate=True`, include a short farewell in `message`. "
    "In every other case set `terminate=False`."
)

Reply schema

Now you see why the tool is called pydantic-ai. The response model (what the LLM has to return) is a pydantic BaseModel. If you know pydantic, you know that every model can generate a JSON schema, and that schema can be communicated to the LLM. pydantic-ai supports three output modes, and which ones are usable depends on the model. Tool output forces the LLM to call a tool whose schema matches our response model; this is the default. Native output lets the model return structured data directly, but the model has to support it. Prompted output is the fallback: pydantic-ai injects schema instructions into the prompt and parses the reply. The pydantic-ai documentation on structured output has more detail.

class Reply(BaseModel):
    """Structured response on every turn so the loop can decide when to stop."""

    message: str = Field(description="Natural-language reply to the user.")
    terminate: bool = Field(
        description=(
            "True only when the user has explicitly asked to end the conversation; False otherwise."
        )
    )
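To get a feel for what the prompted-output fallback amounts to, here is a library-free sketch: put the schema into the instructions, then parse and check the JSON that comes back. This is an illustration only and not how pydantic-ai implements it; REPLY_SCHEMA is a hand-written approximation of the schema for Reply, and ParsedReply, schema_instructions, and parse_reply are hypothetical names.

```python
import json
from dataclasses import dataclass

# Hand-written approximation of the JSON schema for the Reply model.
REPLY_SCHEMA = {
    "type": "object",
    "properties": {
        "message": {"type": "string", "description": "Natural-language reply to the user."},
        "terminate": {"type": "boolean", "description": "True only when the user asked to stop."},
    },
    "required": ["message", "terminate"],
}


@dataclass
class ParsedReply:
    message: str
    terminate: bool


def schema_instructions(schema: dict) -> str:
    """Text that a prompted-output mode could append to the system prompt."""
    return "Respond only with a JSON object matching this schema:\n" + json.dumps(schema, indent=2)


def parse_reply(raw: str) -> ParsedReply:
    """Parse the model's raw text and reject anything that misses the schema."""
    data = json.loads(raw)
    message = data.get("message") if isinstance(data, dict) else None
    terminate = data.get("terminate") if isinstance(data, dict) else None
    if not isinstance(message, str) or not isinstance(terminate, bool):
        raise ValueError("reply does not match the schema")
    return ParsedReply(message=message, terminate=terminate)


print(parse_reply('{"message": "Bye!", "terminate": true}'))
```

With tool output or native output the provider enforces the shape for you, which is why prompted output is only the last resort.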

Assume the role

Strictly speaking, we do not need to assume a role here; the user profile has plenty of permissions on its own. But then the script would run nothing like a deployed Lambda: in production the agent runs under a narrow, single-purpose role, not your broad dev profile, and exercising that path now catches role-related issues before they hit deploy.

def assume_role(role_arn: str) -> dict[str, str]:
    """Trade the local SSO identity for short-lived terraform-pr-agent creds."""
    sts = boto3.client("sts")
    response = sts.assume_role(RoleArn=role_arn, RoleSessionName="chat-cli")
    return response["Credentials"]

Build the model

Now we can put everything together and build the model. There is a small wrinkle here: we pass both the inference profile and the model ID to the model builder. The inference profile is what is actually called, and the model ID tells pydantic-ai which capabilities to use. You could also override those yourself in bedrock_additional_model_request_fields, which you may have to do for newer or less popular models where pydantic-ai’s defaults are missing or broken. Some settings, such as thinking, may also interfere with enforced tool calls.

def build_model(role_arn: str, inference_profile_arn: str, region: str) -> BedrockConverseModel:
    creds = assume_role(role_arn)
    bedrock_client = boto3.client(
        "bedrock-runtime",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    # The application inference profile ARN goes in settings, not model_name:
    # pydantic-ai still uses the foundation-model id for capability detection
    # and routes the actual Converse call through the profile.
    return BedrockConverseModel(
        MODEL_ID,
        provider=BedrockProvider(bedrock_client=bedrock_client),
        settings={"bedrock_inference_profile": inference_profile_arn},
    )

The conversation loop

Finally, the conversation loop. We loop until the agent sets the terminate flag (or you press Ctrl-C or Ctrl-D), printing each message the agent returns to the console. Give it a try!

    history: list = []
    while True:
        try:
            user_input = input("you> ").strip()
        except (EOFError, KeyboardInterrupt):
            print()
            break
        if not user_input:
            continue
        try:
            result = agent.run_sync(user_input, message_history=history)
        except ClientError as err:
            sys.stderr.write(f"bedrock error: {err}\n")
            continue
        reply = result.output
        print(f"agent> {reply.message}")
        history = result.all_messages()
        if reply.terminate:
            break

Notice in the example chat below that the LLM appears to remember my name. The LLM has no memory of its own; the recall comes from the message_history that we pass in and never reset. The fact that we do not reset it does not matter for our toy example, but in production you would need to manage what gets passed as history and what the agent “forgets”. Additionally, things like RAG (retrieval-augmented generation) could be used to give a longer-term memory (often semantic search with a vector store is used for this). It sounds complicated, but is often just enriching the user prompt (your first message to the agent) with semantically matching context from past conversations.
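The simplest form of managing history is a sliding window that keeps only the most recent messages. The sketch below works on a plain list and assumes nothing about pydantic-ai; trim_history is a hypothetical helper. One caveat if you apply it to real pydantic-ai messages: naive slicing can separate a tool call from its result, so a production version would need to cut on safe boundaries.

```python
def trim_history(history: list, max_messages: int) -> list:
    """Keep only the last `max_messages` entries of the conversation."""
    if max_messages <= 0:
        # Guard needed because history[-0:] would return the whole list.
        return []
    return history[-max_messages:]


turns = ["hi, I am Andreas", "hello Andreas", "what is my name?", "Andreas", "bye"]
print(trim_history(turns, 2))
```

With a window of two, the message that introduced the name is gone, and so is the agent's apparent memory of it.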

Sample chat session printed by scripts/chat.py: greeting Claude and giving my name, asking a follow-up that Claude answers using the name, then exiting.

CloudWatch dashboard for inference profile metrics

If you tried out the agent in the previous section, you should have seen some metrics in CloudWatch. The screenshot below from my sandbox account shows a cryptic model ID, which is our inference profile ID. This is not very human-readable, but we’ll fix that in this section with a CloudWatch dashboard that surfaces the metrics worth watching.

The dashboard scopes every widget to the terraform-pr-agent application inference profile via the ModelId dimension (Bedrock reuses that dimension name for the inference profile ID). Each metric carries a label override so the legend shows the friendly profile name rather than the ID or formula. Four widgets in a 2x2 grid:

Tokens: InputTokenCount and OutputTokenCount (sum). Your usage and cost driver.
Cache tokens: CacheReadInputTokenCount and CacheWriteInputTokenCount (sum). Stays at zero - Haiku usage in this post sits below Bedrock’s caching threshold, and we have not opted into prompt caching yet anyway.
Invocations and errors: Invocations plus InvocationClientErrors, InvocationServerErrors, and InvocationThrottles (sum). Health at a glance.
Latency (ms): InvocationLatency average and p99.

Copy the file below and save it as infra/cloudwatch.tf in the project we started, then run terraform apply.

locals {
  cloudwatch_region = data.aws_region.current.region

  dashboard_label = aws_bedrock_inference_profile.agent.name
  profile_id      = aws_bedrock_inference_profile.agent.id
}

resource "aws_cloudwatch_dashboard" "agent" {
  dashboard_name = aws_bedrock_inference_profile.agent.name
  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "Tokens"
          region = local.cloudwatch_region
          view   = "timeSeries"
          stat   = "Sum"
          period = 60
          metrics = [
            # here we plug in the labels so this looks nicer
            ["AWS/Bedrock", "InputTokenCount", "ModelId", local.profile_id, { label = "${local.dashboard_label} / input" }],
            # the dot syntax allows not to repeat the same as above
            [".", "OutputTokenCount", ".", ".", { label = "${local.dashboard_label} / output" }],
          ]
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "Cache tokens"
          region = local.cloudwatch_region
          view   = "timeSeries"
          stat   = "Sum"
          period = 60
          metrics = [
            ["AWS/Bedrock", "CacheReadInputTokenCount", "ModelId", local.profile_id, { label = "${local.dashboard_label} / cache read" }],
            [".", "CacheWriteInputTokenCount", ".", ".", { label = "${local.dashboard_label} / cache write" }],
          ]
        }
      },
      {
        type   = "metric"
        x      = 0
        y      = 6
        width  = 12
        height = 6
        properties = {
          title  = "Invocations and errors"
          region = local.cloudwatch_region
          view   = "timeSeries"
          stat   = "Sum"
          period = 60
          metrics = [
            ["AWS/Bedrock", "Invocations", "ModelId", local.profile_id, { label = "${local.dashboard_label} / invocations" }],
            [".", "InvocationClientErrors", ".", ".", { label = "${local.dashboard_label} / 4xx" }],
            [".", "InvocationServerErrors", ".", ".", { label = "${local.dashboard_label} / 5xx" }],
            [".", "InvocationThrottles", ".", ".", { label = "${local.dashboard_label} / throttles" }],
          ]
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 6
        width  = 12
        height = 6
        properties = {
          title  = "Latency (ms)"
          region = local.cloudwatch_region
          view   = "timeSeries"
          period = 60
          metrics = [
            ["AWS/Bedrock", "InvocationLatency", "ModelId", local.profile_id, { label = "${local.dashboard_label} / avg", stat = "Average" }],
            [".", ".", ".", ".", { label = "${local.dashboard_label} / p99", stat = "p99" }],
          ]
        }
      },
    ]
  })
}

output "cloudwatch_dashboard_url" {
  value = format(
    "https://%s.console.aws.amazon.com/cloudwatch/home?region=%s#dashboards/dashboard/%s",
    local.cloudwatch_region,
    local.cloudwatch_region,
    aws_cloudwatch_dashboard.agent.dashboard_name,
  )
}

After terraform apply, the cloudwatch_dashboard_url output points straight at the new dashboard.
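The "." shorthand in the metrics arrays can look cryptic at first. As a rough sketch of how it reads, the helper below expands each "." into the value at the same position in the previous row, leaving the trailing options object untouched. expand_metric_rows is a hypothetical name and abc123 is a placeholder profile ID; CloudWatch does this expansion itself, so the code is only for building intuition.

```python
def expand_metric_rows(rows: list) -> list:
    """Replace each "." with the value at the same position in the previous row."""
    expanded = []
    previous: list = []
    for row in rows:
        # A trailing dict carries rendering options (label, stat) and is not inherited.
        options = row[-1] if row and isinstance(row[-1], dict) else None
        fields = row[:-1] if options is not None else row
        full = [previous[i] if value == "." else value for i, value in enumerate(fields)]
        previous = full
        expanded.append(full + ([options] if options is not None else []))
    return expanded


rows = [
    ["AWS/Bedrock", "InputTokenCount", "ModelId", "abc123", {"label": "input"}],
    [".", "OutputTokenCount", ".", ".", {"label": "output"}],
]
for row in expand_metric_rows(rows):
    print(row)
```

The latency widget leans on the same rule with four dots in a row: the second entry is the identical metric, and only the stat in the options object changes.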

Daily token threshold alarm

You can copy the file below into infra/alerts.tf and run terraform apply to create an alarm that will yell when the token usage exceeds a daily threshold.

# alias/aws/sns is the AWS-managed SNS key; a CMK is unnecessary for an
# operational alarm topic that publishes alarm-fired events, not audit data.
#trivy:ignore:avd-aws-0136
resource "aws_sns_topic" "agent_alerts" {
  name              = "terraform-pr-agent-alerts"
  kms_master_key_id = "alias/aws/sns"
}

# The subscription stays PENDING until the recipient clicks the AWS
# confirmation email. Alarms fire either way, but no email goes out until the
# subscription is confirmed.
resource "aws_sns_topic_subscription" "agent_alerts_email" {
  topic_arn = aws_sns_topic.agent_alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

resource "aws_cloudwatch_metric_alarm" "daily_tokens" {
  alarm_name          = "${aws_bedrock_inference_profile.agent.name}-daily-tokens"
  alarm_description   = "Daily input + output token usage for the agent inference profile exceeded the configured threshold."
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = var.daily_token_alarm_threshold
  treat_missing_data  = "notBreaching"

  metric_query {
    id          = "total"
    expression  = "input + output"
    label       = "Total tokens (input + output)"
    return_data = true
  }

  metric_query {
    id = "input"
    metric {
      namespace   = "AWS/Bedrock"
      metric_name = "InputTokenCount"
      dimensions  = { ModelId = local.profile_id }
      stat        = "Sum"
      period      = 86400
    }
  }

  metric_query {
    id = "output"
    metric {
      namespace   = "AWS/Bedrock"
      metric_name = "OutputTokenCount"
      dimensions  = { ModelId = local.profile_id }
      stat        = "Sum"
      period      = 86400
    }
  }

  alarm_actions = [aws_sns_topic.agent_alerts.arn]
}
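
As a rough mental model of what the alarm above does with each one-day datapoint, the Python sketch below reproduces the comparison locally. It is a simplification and not CloudWatch's actual evaluator: it assumes a single evaluation period, treats a missing metric as "no datapoint", and the threshold of 1,000,000 tokens in the usage lines is a made-up number.

```python
from typing import Optional


def alarm_state(
    input_tokens: Optional[float],
    output_tokens: Optional[float],
    threshold: float,
) -> str:
    """Simplified model of the daily_tokens alarm for one evaluation period."""
    # treat_missing_data = "notBreaching": no datapoint counts as within threshold.
    if input_tokens is None or output_tokens is None:
        return "OK"
    # expression = "input + output"
    total = input_tokens + output_tokens
    # comparison_operator = "GreaterThanThreshold" is strict, so equality stays OK.
    return "ALARM" if total > threshold else "OK"


# Hypothetical threshold of one million tokens per day.
print(alarm_state(600_000, 500_000, 1_000_000))  # over the threshold
print(alarm_state(500_000, 500_000, 1_000_000))  # exactly at the threshold
print(alarm_state(None, None, 1_000_000))        # idle day, no invocations
```

The notBreaching setting matters for an agent that is idle on some days: without it, a day with no Bedrock calls would leave the alarm with insufficient data instead of a clean OK.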

Before terraform apply, set TF_VAR_alert_email in .envrc.local so the SNS subscription has an address to send to. The block is already in the template; uncomment it and run direnv reload:

# Email subscribed to the alarms SNS topic. Required by infra/alerts.tf.
# Set before `terraform apply`; AWS sends a confirmation email that must be
# clicked before alarms can deliver.
# export TF_VAR_alert_email="<your-email-address>"

This wires up a plain email SNS subscription. In production you would point the alarm at PagerDuty or incident.io so someone is actually paged. AWS sends a confirmation email on first apply, and the subscription stays PendingConfirmation until you click the link. The alarm itself fires regardless, but no email leaves SNS until that confirmation is in.

# alias/aws/sns is the AWS-managed SNS key; a CMK is unnecessary for an
# operational alarm topic that publishes alarm-fired events, not audit data.
#trivy:ignore:avd-aws-0136
resource "aws_sns_topic" "agent_alerts" {
  name              = "terraform-pr-agent-alerts"
  kms_master_key_id = "alias/aws/sns"
}

# The subscription stays PENDING until the recipient clicks the AWS
# confirmation email. Alarms fire either way, but no email goes out until the
# subscription is confirmed.
resource "aws_sns_topic_subscription" "agent_alerts_email" {
  topic_arn = aws_sns_topic.agent_alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}
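
If you want to confirm from a script that the email link was actually clicked, one option is to look at the subscription list for the topic. The sketch below assumes the response shape of the SNS ListSubscriptionsByTopic API, where an unconfirmed subscription reports the literal string PendingConfirmation in place of an ARN. The sample data and the email address are invented for illustration.

```python
def pending_endpoints(response: dict) -> list[str]:
    """Return endpoints whose SNS subscription has not been confirmed yet."""
    return [
        sub["Endpoint"]
        for sub in response.get("Subscriptions", [])
        # Unconfirmed subscriptions carry this placeholder instead of a real ARN.
        if sub.get("SubscriptionArn") == "PendingConfirmation"
    ]


# Invented sample shaped like a ListSubscriptionsByTopic response.
sample = {
    "Subscriptions": [
        {
            "Protocol": "email",
            "Endpoint": "oncall@example.com",
            "SubscriptionArn": "PendingConfirmation",
        }
    ]
}
print(pending_endpoints(sample))
```

With boto3 installed and credentials loaded, the response would come from boto3.client("sns").list_subscriptions_by_topic(TopicArn=...), using the topic ARN that Terraform created. An empty list means every subscriber has confirmed and alarm emails will be delivered.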

The alarm sums daily input and output tokens. Output tokens are more expensive, so you could weight them higher in the expression, or compute a dollar cost that also folds in cache read and write tokens.

resource "aws_cloudwatch_metric_alarm" "daily_tokens" {
  alarm_name          = "${aws_bedrock_inference_profile.agent.name}-daily-tokens"
  alarm_description   = "Daily input + output token usage for the agent inference profile exceeded the configured threshold."
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = var.daily_token_alarm_threshold
  treat_missing_data  = "notBreaching"

  metric_query {
    id          = "total"
    expression  = "input + output"
    label       = "Total tokens (input + output)"
    return_data = true
  }

  metric_query {
    id = "input"
    metric {
      namespace   = "AWS/Bedrock"
      metric_name = "InputTokenCount"
      dimensions  = { ModelId = local.profile_id }
      stat        = "Sum"
      period      = 86400
    }
  }

  metric_query {
    id = "output"
    metric {
      namespace   = "AWS/Bedrock"
      metric_name = "OutputTokenCount"
      dimensions  = { ModelId = local.profile_id }
      stat        = "Sum"
      period      = 86400
    }
  }

  alarm_actions = [aws_sns_topic.agent_alerts.arn]
}
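
The weighting idea can be sketched in a few lines of Python: one helper builds a metric math string that could replace "input + output" in the expression field, and another evaluates the same formula locally so you can sanity-check a threshold. The per-million-token prices are hypothetical placeholders, not real Bedrock rates, and cache tokens are left out.

```python
# Hypothetical USD prices per million tokens; substitute your model's real rates.
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def cost_expression(input_id: str = "input", output_id: str = "output") -> str:
    """Build a metric math expression that turns daily token sums into dollars."""
    return (
        f"({input_id} * {INPUT_USD_PER_MTOK} + {output_id} * {OUTPUT_USD_PER_MTOK})"
        " / 1000000"
    )


def daily_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Evaluate the same formula locally for a given day's token counts."""
    weighted = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return weighted / 1_000_000


print(cost_expression())
print(daily_cost_usd(2_000_000, 500_000))
```

If you swap an expression like this into the alarm, the threshold variable stops meaning tokens and starts meaning dollars, so its value and the alarm description would need to change along with it.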

End state

A pydantic-ai agent talking to Claude on Bedrock, a live dashboard showing token usage, and an alarm armed to yell when daily token usage crosses the threshold. Every later post in the series leans on this baseline.

Coming next: an audit trail for the agent.
