Lambda Managed Instances: A Working Demo and the Math Behind It
You’ve built this function before. A webhook comes in, you validate it, call two or three downstream APIs to enrich the data, write the result to DynamoDB, and return. Total execution time: 500 milliseconds. Actual CPU work: maybe 50 milliseconds. The other 90% of the time, your function is sitting there, waiting on network I/O, when it could be handling the next request.
That’s the standard Lambda execution model. One invocation, one execution environment, one function at a time. It works great for bursty, unpredictable traffic. But if you’re running a steady-state workload that processes millions of these I/O-heavy requests per month, that CPU is sitting idle while your function waits on the network.
Lambda Managed Instances (LMI) changes that equation. Instead of one invocation per execution environment, LMI lets multiple invocations share the same environment concurrently. While invocation A waits on a downstream API call, invocation B uses the CPU. Your CPU stays busy instead of waiting on the network.
What Is Lambda Managed Instances and How Does It Work?

LMI runs your Lambda functions on EC2 instances of your choice, in your own account. AWS handles everything you’d rather not: instance lifecycle, OS patching, load balancing, auto scaling. You keep the Lambda programming model. Same handler signature, same event sources, same CloudWatch integration. The difference is under the hood.
There are three things that matter:
Multi-concurrency. A single execution environment handles multiple invocations at the same time. Python gets 16 concurrent invocations per vCPU by default. Node.js gets 64. Java and .NET get 32. This is the big one for I/O-bound workloads.
EC2 pricing. You pay for the EC2 instances (plus a 15% management fee) instead of per-millisecond Lambda duration pricing. That means you can apply discounts via EC2 Savings Plans, Compute Savings Plans, Reserved Instances, or special pricing agreements, which can cut costs by up to 72% compared to EC2 on-demand.
Instance selection. You can choose from compute-optimized (C), general-purpose (M), and memory-optimized (R) instance families, including Graviton4. Need high-bandwidth networking? Up to 32 GB of memory per execution environment? You pick the instance type.
What Is a Capacity Provider in Lambda Managed Instances?
The new primitive is the Capacity Provider. Think of it as the infrastructure blueprint for your LMI functions. It defines:
- VPC configuration: which subnets and security groups your instances run in
- Instance requirements: architecture (x86 or ARM), allowed or excluded instance types
- Scaling config: max vCPU count and scaling mode (auto or manual)
You create a capacity provider, then attach your Lambda functions to it. Multiple functions can share the same capacity provider, which improves resource utilization. Lambda launches a minimum of 3 instances by default, spread across the availability zones in your capacity provider’s VPC configuration for resiliency.
Here’s an important operational detail: you must publish a function version to trigger instance provisioning. LMI launches EC2 instances and initializes execution environments ahead of time, so they’re ready before traffic arrives. This is different from standard Lambda’s on-demand model (where environments spin up on first invoke) and from Provisioned Concurrency (which pre-warms standard Lambda environments). LMI is running actual EC2 instances with your code deployed on them.
When Should You Use Lambda Managed Instances vs Standard Lambda?
I’ll be honest: LMI isn’t for every workload. Here’s the rule of thumb I like:
Use LMI when:
- Your function is I/O-bound (waiting on APIs, databases, queues)
- You have steady, predictable traffic (not bursty spikes)
- Your monthly Lambda bill is high enough that EC2 pricing + Savings Plans saves money
- You need more than 10 GB of memory or specific instance types
Stick with standard Lambda when:
- Traffic is bursty and unpredictable
- You need scale-to-zero (LMI maintains a minimum of 3 execution environments)
- Your invocation volume is low (the always-on EC2 cost will exceed Lambda’s per-invocation pricing)
- Your function is CPU-bound with no I/O wait time (multi-concurrency won’t help)
The crossover point depends on your workload, but for a typical I/O-bound function, it’s roughly 4-5 million invocations per month. Below that, standard Lambda is cheaper. Above that, LMI starts winning, and the gap widens as volume increases.
How Much Does Lambda Managed Instances Cost Compared to Standard Lambda?
Let’s make this concrete. Take a webhook processor that validates events, calls 3 downstream APIs concurrently (~200ms I/O wait), and writes to DynamoDB. Total execution: ~500ms. CPU usage: ~50ms. 5 million invocations per month.
Standard Lambda
| Line Item | Calculation | Cost |
|---|---|---|
| Requests | 5M x $0.20/1M | $1.00 |
| Duration | 5M x 0.5s x 0.5GB x $0.0000166667/GB-s | $20.83 |
| Total | $21.83/mo |
That’s 1,250,000 GB-seconds of compute, but your CPU is only active for roughly 10% of it. The rest is waiting on network calls.
Lambda Managed Instances
With multi-concurrency, one execution environment handles 16 requests simultaneously (Python default). While request A waits on a downstream API, requests B through P use the CPU.
3 execution environments on EC2 instances handle your entire workload. Lambda picks the instance type, but the cost looks roughly like this:
| Line Item | Calculation | Cost |
|---|---|---|
| Requests | 5M x $0.20/1M | $1.00 |
| EC2 compute | ~1 instance running 24/7 | ~$30/mo |
| Management fee | 15% of EC2 | ~$4.50 |
| Total (on-demand) | ~$35.50/mo | |
| Total (1yr Savings Plan) | ~37% EC2 discount | ~$24/mo |
Wait, standard Lambda is cheaper at this config? That’s because I used 512MB in the standard Lambda calculation. Let me be fair about this.
The real comparison gets interesting when you factor in two things: first, LMI’s minimum is 2GB of memory (so if your standard Lambda function also needs 2GB, the duration cost quadruples). Second, at higher volumes the EC2 cost stays relatively flat while Lambda scales linearly.
At 2GB memory and 5M invocations:
| Standard Lambda | LMI (on-demand) | LMI (Savings Plan) | |
|---|---|---|---|
| Monthly cost | $84.33 | ~$35.50 | ~$24.00 |
| vs Standard | baseline | -58% | -72% |
That’s a lot of savings for the same workload. And it gets better: at 10M invocations, standard Lambda doubles to $168.66 while LMI barely moves because you’re already paying for the instances.
How Do You Set Up Lambda Managed Instances with AWS SAM?

Let’s build the thing. I’m using AWS SAM with the new AWS::Serverless::CapacityProvider resource type. The full code is in the companion repo.
The architecture is straightforward: API Gateway receives webhook POST requests, routes them to a Lambda function running on LMI, which validates the payload, calls 3 downstream services concurrently for enrichment, and writes the result to DynamoDB.
The SAM Template
Here’s the capacity provider and function definition:
Resources:
# The capacity provider defines where your LMI functions run
WebhookCapacityProvider:
Type: AWS::Serverless::CapacityProvider
Properties:
CapacityProviderName: !Sub ${AWS::StackName}-cp
VpcConfig:
SubnetIds:
- !Ref Subnet1
- !Ref Subnet2
- !Ref Subnet3
SecurityGroupIds:
- !Ref SecurityGroupId
ScalingConfig:
MaxVCpuCount: 30
# The function looks like any other SAM function,
# plus CapacityProviderConfig
WebhookProcessorFunction:
Type: AWS::Serverless::Function
Properties:
Handler: app.lambda_handler
CodeUri: infrastructure/lambda/webhook-processor/
MemorySize: 4096
Timeout: 30
CapacityProviderConfig:
Arn: !GetAtt WebhookCapacityProvider.Arn
ExecutionEnvironmentMemoryGiBPerVCpu: 2.0
PerExecutionEnvironmentMaxConcurrency: 16
AutoPublishAlias: live
Environment:
Variables:
TABLE_NAME: !Ref WebhookEventsTable
Events:
PostWebhook:
Type: Api
Properties:
Path: /webhooks
Method: POST
A few things to call out:
CapacityProviderConfig is the new property that makes this an LMI function. You pass the capacity provider ARN, the memory-to-vCPU ratio (2:1 for compute-optimized, 4:1 for general purpose, 8:1 for memory-optimized), and the max concurrency per execution environment.
AutoPublishAlias: live is crucial. LMI requires a published version to provision capacity. SAM’s AutoPublishAlias handles this automatically, publishing a new version on every deploy and pointing the alias to it.
VPC configuration lives on the capacity provider, not the function. This is different from standard Lambda VPC config. You set it once on the capacity provider, and all attached functions inherit it.
SAM auto-generates the operator role. If you don’t specify OperatorRole on the capacity provider, SAM creates one with the AWSLambdaManagedEC2ResourceOperator managed policy. One less thing to manage.
The Function Code
The webhook processor uses Lambda Powertools and asyncio for concurrent downstream calls:
import json
import os
import time
import uuid
import asyncio
from datetime import datetime, timezone
from decimal import Decimal
import boto3
from aws_lambda_powertools import Logger, Metrics, Tracer
from aws_lambda_powertools.event_handler import APIGatewayRestResolver
from aws_lambda_powertools.metrics import MetricUnit
logger = Logger()
tracer = Tracer()
metrics = Metrics()
app = APIGatewayRestResolver()
# Initialized once, shared across concurrent invocations.
# boto3 resources and Powertools instances are thread-safe.
TABLE_NAME = os.environ.get("TABLE_NAME")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)
The enrichment step is where multi-concurrency shines. Each downstream call waits 100-200ms on I/O. With standard Lambda, you pay for that wait. With LMI, other invocations use the CPU while this one sleeps:
async def _simulate_api_call(service_name: str, delay_s: float) -> dict:
"""In production, this would be an HTTP call to a real service."""
start = time.monotonic()
await asyncio.sleep(delay_s)
elapsed_ms = (time.monotonic() - start) * 1000
return {
"service": service_name,
"status": "success",
"latency_ms": round(elapsed_ms, 1),
}
async def enrich_event(event_data: dict) -> dict:
"""Call 3 downstream services concurrently."""
tasks = [
_simulate_api_call("geocoding", 0.15),
_simulate_api_call("fraud-scoring", 0.20),
_simulate_api_call("loyalty-lookup", 0.10),
]
results = await asyncio.gather(*tasks)
return {r["service"]: r for r in results}
The handler ties it together: validate, enrich, store, return:
@app.post("/webhooks")
@tracer.capture_method
def process_webhook():
start_time = time.monotonic()
body = app.current_event.body or ""
payload = json.loads(body) if body else {}
event_id = str(uuid.uuid4())
event_type = payload.get("event_type", "unknown")
# Enrich by calling downstream services concurrently
enrichment = _run_enrichment(payload)
# Write to DynamoDB
table.put_item(Item={
"PK": f"EVENT#{event_type}",
"SK": f"{datetime.now(timezone.utc).isoformat()}#{event_id}",
"event_id": event_id,
"payload": json.dumps(payload),
"enrichment": json.dumps(enrichment),
})
processing_time = round((time.monotonic() - start_time) * 1000, 1)
return {
"event_id": event_id,
"status": "processed",
"processing_time_ms": processing_time,
}
Deploy and Test
Build and deploy with SAM:
sam build --use-container
sam deploy --guided
SAM will ask for your subnet IDs and security group. Use subnets across at least 3 AZs in a VPC with internet access (the default VPC works fine for a demo). The deploy takes a few minutes because LMI needs to provision EC2 instances and initialize execution environments.
Once deployed, test the health endpoint:
curl https://<api-id>.execute-api.<region>.amazonaws.com/prod/health
{
"status": "healthy",
"compute_type": "lambda-managed-instances"
}
Send a webhook:
curl -X POST https://<api-id>.execute-api.<region>.amazonaws.com/prod/webhooks \
-H "Content-Type: application/json" \
-d '{
"event_type": "payment.completed",
"order_id": "ORD-12345",
"amount": 99.99,
"customer_id": "CUST-789"
}'
{
"event_id": "7d660175-7cbc-4c9b-9d15-3a5d79e82eff",
"event_type": "payment.completed",
"status": "processed",
"processing_time_ms": 202.0,
"enrichment": {
"geocoding": { "latency_ms": 151.0, "status": "success" },
"fraud-scoring": { "latency_ms": 201.3, "status": "success" },
"loyalty-lookup": { "latency_ms": 100.6, "status": "success" }
}
}
202 milliseconds. All three enrichment calls ran concurrently (the longest was 201ms for fraud-scoring, which set the floor). Five sequential requests all came back at ~202ms. No cold starts, consistent performance.
What Are the Gotchas with Lambda Managed Instances?
Thread safety matters. With multi-concurrency, your code runs multiple invocations in the same process. Global variables, shared state, and writes to /tmp need to be thread-safe. boto3 clients initialized outside the handler are fine. Mutable global dictionaries are not. Test under concurrent load before you ship.
Plan your migration path. You can deploy the same function code to both standard Lambda and LMI simultaneously, provided it’s thread-safe. However, there’s no built-in toggle to switch a single function between execution modes. In practice, run both side by side during testing and cut traffic over when you’re confident.
VPC networking is required. LMI instances run in your VPC. They need outbound internet access for CloudWatch Logs, X-Ray, and any external API calls. The default VPC with public subnets works for demos. For production, use VPC endpoints or a NAT gateway.
Minimum 3 instances. Lambda launches 3 instances across AZs for resiliency. You’re paying for those even at zero traffic. This is why LMI doesn’t make sense for low-volume workloads.
Deploy takes minutes, not seconds. Publishing a version triggers EC2 instance provisioning. Expect 3-5 minutes before your function is invocable. Plan your CI/CD accordingly.
The 15% management fee. EC2 pricing discounts (Savings Plans, Reserved Instances) apply to the base EC2 cost, not the management fee. Factor this into your cost calculations.
Cleanup
Don’t forget to tear down the stack when you’re done. Those EC2 instances keep running (and billing) until you delete the capacity provider:
sam delete --stack-name lmi-webhook-demo --profile demo --region us-west-2
What Should You Build Next with Lambda Managed Instances?
If I were taking this to production, I’d swap the simulated downstream calls for real HTTP clients (aiohttp or httpx), add HMAC signature validation on every request, set up CloudWatch alarms on CPUUtilization and ExecutionEnvironmentConcurrency, and run a load test to find the right PerExecutionEnvironmentMaxConcurrency for my specific workload.
I’d also look hard at the cost comparison with my actual traffic patterns. The math I showed earlier is directional, not precise. Your mileage will vary based on memory configuration, invocation duration, and how much of that duration is I/O wait vs CPU work. The higher the I/O-to-CPU ratio, the more LMI saves you.
Check out the LMI Pricing Calculator (built by Debasis Rath and team) to model the cost comparison with your actual traffic patterns and instance types. Stephen Liedig also built an LMI Scheduler for managing instance scaling schedules, and there’s an LMI Skill for AI coding assistants to help you set up and configure LMI functions directly in your IDE.
Lambda has always been about paying for what you use. LMI flips the question: what if your CPU could actually be busy the whole time you’re paying for it?
For the Agents
If you use an AI coding assistant, here are some relevant Kiro Powers:
- lambda-managed-instances - LMI concepts, SAM setup, capacity provider config, multi-concurrency patterns, thread-safe initialization, and common pitfalls
- aws-sam - SAM CLI tooling, template validation, and deployment workflows
Agent Instructions
Grab the companion agent file for this post. Use it as Kiro steering, CLAUDE.md, or any agent instructions format.
Comments