# AWS Lambda Worker
Problem: Run a bounded worker as a Lambda function with predictable timeouts and logs.
This example follows the core principles described in the AI Worker Design Patterns and uses the standard Worker Protocol schema.
## Key ideas
- Keep the worker single-purpose and explicit about inputs and outputs.
- Put hard limits in the contract (timeout, retries, tools allowed).
- Make failures machine-actionable with stable error codes.
- Emit structured signals so orchestrators can route, retry, or escalate.
## Diagram

```
invoke -> Lambda -> worker runtime -> response
```
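Assuming a Python runtime, the flow above can be sketched as a minimal handler. This is an illustrative skeleton, not a prescribed implementation; `run_worker` is a placeholder name for the bounded unit of work:

```python
import time
import uuid


def run_worker(request: dict) -> dict:
    # Placeholder for the bounded unit of work (e.g. a single model call).
    return {"echo": request}


def handler(event, context=None):
    """Lambda entry point: validate input, run the worker, return a structured response."""
    start = time.time()
    trace_id = event.get("trace_id") or str(uuid.uuid4())
    if not isinstance(event.get("request"), dict):
        # Stable error code so the orchestrator can branch without parsing messages.
        return {
            "status": "failed",
            "outputs": {},
            "observability": {"trace_id": trace_id, "error_code": "invalid_input"},
        }
    outputs = run_worker(event["request"])
    return {
        "status": "succeeded",
        "outputs": outputs,
        "observability": {
            "trace_id": trace_id,
            "duration_ms": int((time.time() - start) * 1000),
        },
    }
```

The response shape mirrors the output schema below: `status`, `outputs`, and an `observability` object carrying the trace ID.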
## Worker spec

```yaml
worker_id: aws-lambda-worker
version: 1.0
purpose: Run a bounded worker as a Lambda function with predictable timeouts and logs.
inputs:
  - request: object
outputs:
  - status: string
  - outputs: object
  - observability: object
constraints:
  timeout_seconds: 60
  max_tokens: 1500
  tools_allowed: [language_model, parameter_store (optional)]
retries:
  max_attempts: 2
  backoff: exponential
observability:
  trace_id: required
  log_fields: [worker_id, attempt, duration_ms]
```
## Input schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "request": {
      "type": "object"
    }
  },
  "required": [
    "request"
  ]
}
```
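Because the schema is small, it can be enforced without extra dependencies (the `jsonschema` package would also work). A sketch, where `validate_input` is an illustrative helper name:

```python
def validate_input(payload) -> list[str]:
    """Return a list of violations against the input schema (empty means valid)."""
    errors = []
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    if "request" not in payload:
        errors.append("missing required property: request")
    elif not isinstance(payload["request"], dict):
        errors.append("request must be an object")
    extras = set(payload) - {"request"}
    if extras:  # schema sets additionalProperties: false
        errors.append(f"unexpected properties: {sorted(extras)}")
    return errors
```

Returning a list of violations (rather than raising on the first one) lets the worker report all problems in a single `invalid_input` response.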
## Output schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "additionalProperties": true,
  "properties": {
    "status": {
      "type": "string"
    },
    "outputs": {
      "type": "object"
    },
    "observability": {
      "type": "object"
    }
  }
}
```
## Constraints

```json
{
  "timeout_seconds": 60,
  "max_tokens": 1500,
  "retries": {
    "max_attempts": 2,
    "backoff": "exponential"
  },
  "rate_limit": "per-tenant (example: 10/min)",
  "tools_allowed": [
    "language_model",
    "parameter_store (optional)"
  ]
}
```
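The retry policy (`max_attempts: 2`, exponential backoff) is typically applied on the caller side. A sketch, assuming an illustrative base delay of 0.5 s and a `retryable` flag in the worker's observability object:

```python
import time


def invoke_with_retries(invoke, payload, max_attempts=2, base_delay=0.5):
    """Retry only retryable failures, doubling the delay between attempts."""
    for attempt in range(1, max_attempts + 1):
        result = invoke(payload)
        retryable = result.get("observability", {}).get("retryable", False)
        if result.get("status") == "succeeded" or not retryable:
            return result
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** (attempt - 1)))
    return result
```

Non-retryable failures (such as `invalid_input`) return immediately, so the attempt budget is spent only where a retry can actually help.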
## Failure modes & handling
- Cold start pushes duration over budget: reduce payload size or pre-warm (e.g. with provisioned concurrency); mark error_code=timeout.
- Downstream API rate limited: error_code=rate_limited, retryable=true.
- Large artifact outputs: store externally and return references only.
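One way to keep these failures machine-actionable is a small table mapping exception types to a stable `(error_code, retryable)` pair. The exception class names here are illustrative:

```python
class WorkerTimeout(Exception):
    """Raised when the worker exceeds its duration budget."""


class DownstreamRateLimited(Exception):
    """Raised when a downstream API returns a rate-limit response."""


# Stable error codes so orchestrators can route on them without parsing messages.
ERROR_TABLE = {
    WorkerTimeout: ("timeout", False),
    DownstreamRateLimited: ("rate_limited", True),
}


def classify(exc: Exception) -> dict:
    code, retryable = ERROR_TABLE.get(type(exc), ("internal_error", False))
    return {"error_code": code, "retryable": retryable}
```

Anything not in the table falls through to a generic, non-retryable `internal_error`, which keeps unknown failures from being retried blindly.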
## Observability signals
- logs: worker_id, attempt, duration_ms, status, error_code
- metrics: success_count, failure_count, retry_count, p95_duration_ms
- trace fields: trace_id, span_id, upstream_request_id (if present)
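These fields are easiest to query when emitted as one JSON object per log line, which CloudWatch Logs (or any structured log sink) can then filter and aggregate. A minimal sketch:

```python
import json
import sys


def log_attempt(worker_id, attempt, duration_ms, status, error_code=None, trace_id=None):
    """Emit one structured log line carrying the fields listed above."""
    record = {
        "worker_id": worker_id,
        "attempt": attempt,
        "duration_ms": duration_ms,
        "status": status,
        "error_code": error_code,
        "trace_id": trace_id,
    }
    sys.stdout.write(json.dumps(record) + "\n")
    return record
```

Keeping the field names identical across all workers is what makes the metrics (`success_count`, `p95_duration_ms`, etc.) derivable from logs alone.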
## FAQ

**Should the worker return partial results on failure?**
If partial results are safe and useful, return them with a stable status and error_code. Otherwise fail fast and let orchestration decide.
**Where should large artifacts go?**
Store them externally (object storage or DB) and return a reference (URL or artifact ID) in the response.
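A sketch of the reference pattern. Here `store` stands in for any object-store client call (with S3 via boto3 it would wrap `s3.put_object`); the key layout is an assumption for illustration:

```python
import hashlib


def store_artifact(store, bucket: str, data: bytes) -> dict:
    """Upload the artifact and return only a reference, never the bytes themselves."""
    # Content-addressed key: identical artifacts map to the same object.
    key = f"artifacts/{hashlib.sha256(data).hexdigest()}"
    store(bucket, key, data)  # e.g. lambda b, k, d: s3.put_object(Bucket=b, Key=k, Body=d)
    return {"artifact_ref": f"s3://{bucket}/{key}", "size_bytes": len(data)}
```

The worker's `outputs` object then carries only `artifact_ref`, which keeps Lambda response payloads well under the service's size limits.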
**How should I choose timeouts?**
Set a hard ceiling based on SLOs and queue backpressure. Prefer smaller workers with tighter timeouts over monolith workers.