# AWS Lambda Managed Instances (LMI)

## What This Is

Lambda Managed Instances runs Lambda functions on EC2 instances in your account while AWS manages the infrastructure (lifecycle, patching, scaling, routing). The key difference from standard Lambda: multiple invocations share the same execution environment concurrently. This makes LMI cost-effective for steady-state, I/O-bound workloads where standard Lambda charges you for idle CPU wait time.

## Key Concepts

- **Capacity Provider**: Infrastructure blueprint that defines VPC config, instance requirements, and scaling behavior. Functions attach to a capacity provider. Multiple functions can share one.
- **Multi-concurrency**: One execution environment handles multiple invocations simultaneously. Python: 16/vCPU, Node.js: 64/vCPU, Java/.NET: 32/vCPU.
- **Memory-to-vCPU ratio**: 2:1 (compute-optimized), 4:1 (general purpose), 8:1 (memory-optimized). Minimum 2GB memory.
- **Published versions required**: LMI pre-provisions capacity on version publish. No cold starts for published versions. `$LATEST` is `ActiveNonInvocable`.
- **EC2 pricing + 15% management fee**: Supports Savings Plans and Reserved Instances on the EC2 portion. No per-millisecond duration charge.
- **Minimum instances across AZs**: Lambda launches instances across the availability zones defined in your capacity provider's VPC configuration for resiliency. No scale-to-zero.
- **Migration path**: You can deploy the same function code to both standard Lambda and LMI simultaneously (provided it's thread-safe). There's no built-in toggle to switch a single function between execution modes, so run both side by side during testing and cut traffic over when confident.
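As a back-of-envelope check, the per-vCPU figures above determine how many simultaneous invocations one capacity provider can absorb. A minimal sketch (function name is ours; the numbers come straight from the list above):

```python
# Per-vCPU concurrency limits from the runtime figures above.
CONCURRENCY_PER_VCPU = {"python": 16, "nodejs": 64, "java": 32, "dotnet": 32}

def max_concurrent_invocations(max_vcpu: int, runtime: str) -> int:
    """Upper bound on in-flight invocations for a capacity provider
    capped at max_vcpu, assuming every vCPU serves the given runtime."""
    return max_vcpu * CONCURRENCY_PER_VCPU[runtime]

# A 30-vCPU cap (as in the SAM example below) with Python's 16/vCPU:
print(max_concurrent_invocations(30, "python"))  # 480
```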

## Setup with SAM

### Capacity Provider

```yaml
WebhookCapacityProvider:
  Type: AWS::Serverless::CapacityProvider
  Properties:
    CapacityProviderName: !Sub ${AWS::StackName}-cp
    VpcConfig:
      SubnetIds:
        - !Ref Subnet1
        - !Ref Subnet2
        - !Ref Subnet3
      SecurityGroupIds:
        - !Ref SecurityGroupId
    ScalingConfig:
      MaxVCpuCount: 30
```

SAM auto-generates the operator role with the `AWSLambdaManagedEC2ResourceOperator` policy if `OperatorRole` is omitted.

### Function with CapacityProviderConfig

```yaml
WebhookProcessorFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.lambda_handler
    CodeUri: infrastructure/lambda/webhook-processor/
    MemorySize: 4096
    Timeout: 30
    CapacityProviderConfig:
      Arn: !GetAtt WebhookCapacityProvider.Arn
      ExecutionEnvironmentMemoryGiBPerVCpu: 2.0
      PerExecutionEnvironmentMaxConcurrency: 16
    AutoPublishAlias: live
    Environment:
      Variables:
        TABLE_NAME: !Ref WebhookEventsTable
    Policies:
      - DynamoDBCrudPolicy:
          TableName: !Ref WebhookEventsTable
    Events:
      PostWebhook:
        Type: Api
        Properties:
          Path: /webhooks
          Method: POST
```

`AutoPublishAlias` is required. LMI needs a published version to provision capacity.

### VPC Networking

LMI instances run in your VPC. They need outbound access for CloudWatch Logs, X-Ray, DynamoDB, and any external calls. Options:
- Public subnet with internet gateway (simplest, fine for dev)
- VPC endpoints (most secure, higher cost)
- Private subnet with NAT gateway (standard enterprise pattern)

VPC config lives on the capacity provider, not the function.
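If you take the VPC-endpoint route, the sketch below shows the two endpoint flavors involved. This is plain CloudFormation, not LMI-specific; `Vpc` and `PrivateRouteTable` are placeholder references you'd define elsewhere in the template. DynamoDB is reached through a Gateway endpoint, CloudWatch Logs through an Interface endpoint:

```yaml
DynamoDbEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref Vpc                  # placeholder
    ServiceName: !Sub com.amazonaws.${AWS::Region}.dynamodb
    VpcEndpointType: Gateway
    RouteTableIds:
      - !Ref PrivateRouteTable       # placeholder

LogsEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref Vpc                  # placeholder
    ServiceName: !Sub com.amazonaws.${AWS::Region}.logs
    VpcEndpointType: Interface
    SubnetIds:
      - !Ref Subnet1
      - !Ref Subnet2
    SecurityGroupIds:
      - !Ref SecurityGroupId
    PrivateDnsEnabled: true
```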

## Common Patterns

### I/O-Bound Enrichment with asyncio

Use `asyncio.gather` to call multiple downstream services concurrently. While one invocation awaits I/O, other concurrent invocations use the CPU.

```python
import asyncio
import time

async def _call_service(name: str, delay_s: float) -> dict:
    start = time.monotonic()
    await asyncio.sleep(delay_s)  # Replace with real HTTP call
    return {"service": name, "latency_ms": round((time.monotonic() - start) * 1000, 1)}

async def enrich(data: dict) -> dict:
    tasks = [
        _call_service("geocoding", 0.15),
        _call_service("fraud-scoring", 0.20),
        _call_service("loyalty-lookup", 0.10),
    ]
    results = await asyncio.gather(*tasks)
    return {r["service"]: r for r in results}

def _run_enrichment(data: dict) -> dict:
    # asyncio.run creates a fresh event loop for this call and closes it
    # when the coroutine completes.
    return asyncio.run(enrich(data))
```
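Wiring this into a handler might look like the sketch below. The handler shape and event fields are illustrative, and the stub coroutine stands in for the `enrich` function above:

```python
import asyncio

async def enrich(data: dict) -> dict:
    # Stub standing in for the real enrichment coroutine above.
    await asyncio.sleep(0)
    return {"input": data, "enriched": True}

def lambda_handler(event: dict, context=None) -> dict:
    # Each Python invocation runs in its own process, so giving every
    # call its own event loop via asyncio.run is safe.
    payload = event.get("body") or {}
    result = asyncio.run(enrich(payload))
    return {"statusCode": 200, "result": result}
```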

### Thread-Safe Initialization

Initialize shared clients outside the handler. These are safe to share across concurrent invocations:

```python
import os

import boto3
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.event_handler import APIGatewayRestResolver

logger = Logger()
tracer = Tracer()
metrics = Metrics()
app = APIGatewayRestResolver()

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])
```

Do NOT use mutable global state (dicts, lists) shared across invocations without synchronization.
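Since concurrent Python invocations run as separate processes (see Pitfalls), cross-invocation state on `/tmp` needs interprocess locking, not thread locks. A minimal sketch using POSIX advisory locks (the cache path and function name are ours):

```python
import fcntl
import json

CACHE_PATH = "/tmp/shared_cache.json"  # illustrative path

def update_shared_cache(key: str, value, path: str = CACHE_PATH) -> dict:
    """Read-modify-write a JSON file under an exclusive advisory lock,
    so concurrent invocation processes don't clobber each other."""
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            f.seek(0)
            raw = f.read()
            data = json.loads(raw) if raw else {}
            data[key] = value
            f.seek(0)
            f.truncate()
            json.dump(data, f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return data
```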

### Build and Deploy

```bash
sam build --use-container
sam deploy --guided
```

Deploy takes 3-5 minutes because LMI provisions EC2 instances and initializes execution environments.

## API Reference

### SAM Resource: AWS::Serverless::CapacityProvider

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| CapacityProviderName | String | No | Unique name within account/region |
| VpcConfig | Object | Yes | SubnetIds (list), SecurityGroupIds (list) |
| OperatorRole | String | No | IAM role ARN. SAM auto-generates if omitted |
| InstanceRequirements | Object | No | Architectures (list), AllowedTypes/ExcludedTypes |
| ScalingConfig | Object | No | MaxVCpuCount (int), ScalingMode (Auto/Manual) |
| KmsKeyArn | String | No | KMS key for encryption |
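Combining the optional properties, a capacity provider that pins architecture and excludes specific instance types might look like the sketch below. Field spellings follow the table above; the accepted value formats for `ExcludedTypes` are an assumption, so verify against the SAM resource docs:

```yaml
ArmCapacityProvider:
  Type: AWS::Serverless::CapacityProvider
  Properties:
    VpcConfig:
      SubnetIds:
        - !Ref Subnet1
        - !Ref Subnet2
      SecurityGroupIds:
        - !Ref SecurityGroupId
    InstanceRequirements:
      Architectures:
        - arm64
      ExcludedTypes:
        - t3.medium
    ScalingConfig:
      MaxVCpuCount: 64
      ScalingMode: Auto
```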

### SAM Property: CapacityProviderConfig (on AWS::Serverless::Function)

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| Arn | String | Yes | Capacity provider ARN |
| ExecutionEnvironmentMemoryGiBPerVCpu | Float | No | 2.0, 4.0, or 8.0. Default 2.0 |
| PerExecutionEnvironmentMaxConcurrency | Integer | No | Max concurrent invocations per env |

## Pitfalls

- **Memory minimum is 2GB.** Smaller functions will fail to deploy.
- **First deploy with 2GB may fail** with "high resource consumption on init" if your dependencies are heavy. Bump `MemorySize` to 4096.
- **No built-in toggle.** You can run the same code on both standard Lambda and LMI, but there's no switch to move a single function between modes. Run side by side and cut over.
- **No scale-to-zero.** Minimum 3 instances always running. Don't use for low-volume workloads.
- **Deploy latency.** Version publish triggers EC2 provisioning. 3-5 minutes before function is invocable.
- **SubnetIds as parameter references** don't work with SAM validate. Use individual `AWS::EC2::Subnet::Id` parameters instead of `List<AWS::EC2::Subnet::Id>`.
- **Python multi-concurrency uses separate processes**, not threads. Thread locks won't coordinate concurrent invocations; use interprocess mechanisms such as file locking on `/tmp`.
- **CloudWatch Logs need outbound access.** If your VPC has no path to the Logs endpoint (internet egress or a VPC endpoint), logs silently fail.
- **15% management fee is not discountable.** Savings Plans/RIs only apply to the base EC2 cost.
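The pricing pitfalls above can be sanity-checked with a toy model. All rates here are made-up placeholders, not real AWS prices; the only structural facts taken from this page are the 15% management fee and that the fee itself is not discountable:

```python
HOURS_PER_MONTH = 730  # ~average hours in a month

def lmi_monthly_cost(ec2_hourly_rate: float, instance_count: int,
                     mgmt_fee: float = 0.15) -> float:
    """Base EC2 spend plus the 15% management fee. Savings Plans / RIs
    would reduce only the EC2 term, never the fee."""
    ec2 = ec2_hourly_rate * instance_count * HOURS_PER_MONTH
    return ec2 * (1 + mgmt_fee)

def standard_lambda_monthly_cost(invocations: int, avg_duration_ms: float,
                                 memory_gb: float, gb_second_rate: float) -> float:
    """Duration-based cost only; request fees omitted for simplicity."""
    return invocations * (avg_duration_ms / 1000) * memory_gb * gb_second_rate
```

Plugging in your own traffic numbers shows why low-volume workloads lose: the LMI term is fixed (no scale-to-zero), while the standard Lambda term shrinks with invocation count.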

## Working Examples

Full working repo: https://github.com/singledigit/lambda-managed-instances-sam

### Test Commands

```bash
# Health check
curl https://<api-id>.execute-api.<region>.amazonaws.com/prod/health

# Send webhook
curl -X POST https://<api-id>.execute-api.<region>.amazonaws.com/prod/webhooks \
  -H "Content-Type: application/json" \
  -d '{"event_type":"payment.completed","order_id":"ORD-123","amount":99.99}'

# Retrieve event
curl https://<api-id>.execute-api.<region>.amazonaws.com/prod/webhooks/<event_id>

# Cleanup
sam delete --stack-name lmi-webhook-demo
```

## Related Resources

- [AWS Lambda Managed Instances docs](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html)
- [SAM CapacityProvider resource](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-resource-capacityprovider.html)
- [SAM CapacityProviderConfig property](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-property-function-capacityproviderconfig.html)
- [LMI networking guide](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-networking.html)
- [LMI best practices](https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-best-practices.html)
- Kiro Power: **aws-sam** for SAM CLI tooling and deployment workflows
