# Lambda Durable Functions for Computer Vision — Agent Skill File

> Source: [From 9 Tiles to 900: Scaling Computer Vision Pipelines with Lambda Durable Functions](https://edjgeek.com/blog/9-tiles-to-900-cv-pipelines) by Eric Johnson
> Last verified: June 2026

## Purpose

Use this skill when building or advising on:
- Computer vision pipelines that tile large images for parallel inference
- AWS Lambda durable functions with `context.map()` for fan-out workloads
- Checkpointed multi-step pipelines with partial failure handling
- Real-time pipeline observability with AppSync Events

## Core Concepts

### Durable Execution Model

AWS Lambda durable functions use checkpoint-and-replay. Each `context.step()` persists its result. If the Lambda dies mid-execution, replay skips completed steps and resumes from the last checkpoint.

- Runtime: Node.js 22+ or Python 3.12+
- SDK: `@aws/durable-execution-sdk-js` (TypeScript) or `aws-durable-execution-sdk-python`
- Max execution: up to 1 year
- Checkpoint size limit: 256 KB per step result
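
The replay behavior can be illustrated with a toy model. This is a sketch of the idea only, not the durable execution SDK: the `step` function, the in-memory `Checkpoints` map, and the simulated crash are all assumptions made for illustration, and the code is synchronous for brevity where real steps are async.

```typescript
// Toy model of checkpoint-and-replay. NOT the durable execution SDK.
type Checkpoints = Map<string, unknown>;

function step<T>(checkpoints: Checkpoints, name: string, fn: () => T): T {
  if (checkpoints.has(name)) {
    // Replay path: return the persisted result without re-running fn.
    return checkpoints.get(name) as T;
  }
  const result = fn();
  checkpoints.set(name, result); // persist the result before moving on
  return result;
}

const executed: string[] = []; // records which step bodies actually ran

function pipeline(checkpoints: Checkpoints, crashAfterFirstStep: boolean): number {
  const tiles = step(checkpoints, 'preprocess', () => {
    executed.push('preprocess');
    return 9;
  });
  if (crashAfterFirstStep) throw new Error('simulated crash');
  return step(checkpoints, 'analyze', () => {
    executed.push('analyze');
    return tiles * 2;
  });
}

const store: Checkpoints = new Map();
try {
  pipeline(store, true); // first attempt dies after step 1
} catch {
  // the checkpoint for 'preprocess' survives the crash
}
const finalResult = pipeline(store, false); // replay skips 'preprocess'
```

On the second run the `preprocess` body is not executed again; only its stored result is read, which is why step bodies should return everything later steps need.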

### context.map() for Tiled Inference

`context.map()` fans out an array of items as independent concurrent invocations. Each item runs in its own child context with independent checkpointing.

```typescript
const mapResults = await context.map(
  'operation-name',
  items,
  async (ctx, item, index) => {
    return await ctx.step(`process-${index}`, async () => {
      // Each item processed independently
      return result;
    });
  },
  { maxConcurrency: 5 },
);
```

Key properties:
- Each item is independently checkpointed (one failure doesn't affect others)
- `maxConcurrency` controls parallel execution (match to your model quota)
- `mapResults.succeeded()` and `mapResults.failed()` separate completed items from failed ones for partial failure handling
- Same code handles 9 items or 900 items
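
One way to act on partial failures is to decide, after the fan-out, whether enough tiles succeeded to continue to synthesis. The sketch below assumes a hypothetical `TileOutcome` shape and a hypothetical `minSuccessRatio` threshold; the actual objects returned by `mapResults.succeeded()` and `mapResults.failed()` should be checked against the SDK reference.

```typescript
// Hypothetical outcome shape for illustration; not the SDK's result type.
type TileOutcome =
  | { index: number; status: 'succeeded'; finding: string }
  | { index: number; status: 'failed'; error: string };

interface Partition {
  findings: string[];      // results to pass on to synthesis
  failedIndexes: number[]; // tiles to report or retry
  proceed: boolean;        // whether enough tiles succeeded
}

function partitionOutcomes(outcomes: TileOutcome[], minSuccessRatio = 0.8): Partition {
  const findings: string[] = [];
  const failedIndexes: number[] = [];
  for (const outcome of outcomes) {
    if (outcome.status === 'succeeded') findings.push(outcome.finding);
    else failedIndexes.push(outcome.index);
  }
  // An empty input counts as "not enough" rather than dividing by zero.
  const ratio = outcomes.length === 0 ? 0 : findings.length / outcomes.length;
  return { findings, failedIndexes, proceed: ratio >= minSuccessRatio };
}

const sample: TileOutcome[] = [
  { index: 0, status: 'succeeded', finding: 'sky' },
  { index: 1, status: 'failed', error: 'throttled' },
  { index: 2, status: 'succeeded', finding: 'road' },
];
const partition = partitionOutcomes(sample, 0.6);
```

With two of three tiles succeeding and a 0.6 threshold, the pipeline would continue and record tile 1 as failed.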

### Pipeline Pattern (4-Step CV)

```
Step 1: Preprocess (moderate + build tile grid)
Step 2: context.map() (N concurrent inferences, each checkpointed)
Step 3: Synthesize (aggregate findings)
Step 4: Store (persist + emit events)
```
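
The "build tile grid" part of Step 1 is mostly arithmetic. A minimal sketch, assuming non-overlapping tiles and a hypothetical `Tile` shape (the reference implementation may compute its grid differently, for example with overlap between tiles):

```typescript
// Hypothetical tile descriptor: coordinates only, never image bytes.
interface Tile {
  index: number;
  row: number;
  col: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

function buildTileGrid(imageWidth: number, imageHeight: number, rows: number, cols: number): Tile[] {
  const tiles: Tile[] = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      // Integer boundaries so tiles cover the image exactly, with no gaps or overlap.
      const x = Math.floor((col * imageWidth) / cols);
      const y = Math.floor((row * imageHeight) / rows);
      const xEnd = Math.floor(((col + 1) * imageWidth) / cols);
      const yEnd = Math.floor(((row + 1) * imageHeight) / rows);
      tiles.push({ index: row * cols + col, row, col, x, y, width: xEnd - x, height: yEnd - y });
    }
  }
  return tiles;
}

const grid = buildTileGrid(1000, 700, 3, 3); // 9 tile descriptors
```

The resulting array is what gets passed as `items` to the fan-out in Step 2; each entry is a few dozen bytes, so the grid itself checkpoints cheaply.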

### Checkpoint Size Discipline

The 256 KB limit means:
- Never pass image bytes through checkpoints
- Each tile re-fetches from S3 (or uses S3 Files for filesystem access)
- Return only small result objects from steps
- At scale (400+ tiles), passing S3 references instead of image bytes is the only viable architecture
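
A guard like the following can catch oversized step results during development. It is a sketch that assumes results are serialized as JSON and measured in UTF-8 bytes; how the service actually accounts for checkpoint size may differ, and `assertCheckpointable` is a hypothetical helper, not an SDK function.

```typescript
const CHECKPOINT_LIMIT_BYTES = 256 * 1024; // the 256 KB limit stated above

// Approximate size: UTF-8 byte length of the JSON form (an assumption).
function serializedSize(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Returns the value unchanged, or throws if it would exceed the limit.
function assertCheckpointable<T>(value: T, limit: number = CHECKPOINT_LIMIT_BYTES): T {
  const size = serializedSize(value);
  if (size > limit) {
    throw new Error(`Step result is ${size} bytes; limit is ${limit}`);
  }
  return value;
}

// A reference to the image plus a short finding stays tiny.
const smallResult = assertCheckpointable({ tile: 4, key: 'uploads/photo.jpg', finding: 'crack' });
```

Wrapping a step's return value in such a check makes an accidental `return imageBytes` fail loudly in testing rather than at checkpoint time.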

### Model Selection Per Step

Different steps can use different models:
- Tiled inference (runs N times): Use fast/cheap model (e.g., Amazon Nova Lite)
- Synthesis (runs once): Use capable model (e.g., Claude) for complex reasoning
- Content moderation (runs once): Any model with safety capabilities

The model is a parameter, not a structural decision. Change `MODEL_ID`, not the pipeline.
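
Treating the model as a parameter could look like the sketch below. The per-step variable names (`TILE_MODEL_ID`, `SYNTHESIS_MODEL_ID`, `MODERATION_MODEL_ID`) are assumptions for illustration; only `MODEL_ID` appears in the source. The environment is passed in as an argument so the function is easy to test.

```typescript
type PipelineStep = 'moderation' | 'tile' | 'synthesis';

// Resolve a model ID for a step: a step-specific variable wins,
// otherwise fall back to the shared MODEL_ID.
function resolveModelId(step: PipelineStep, env: Record<string, string | undefined>): string {
  const specific = env[`${step.toUpperCase()}_MODEL_ID`];
  const modelId = specific ?? env['MODEL_ID'];
  if (!modelId) {
    throw new Error(`No model configured for step "${step}"`);
  }
  return modelId;
}

// Example environment; the synthesis value is a placeholder, not a real model ID.
const exampleEnv = {
  MODEL_ID: 'amazon.nova-lite-v1:0',
  SYNTHESIS_MODEL_ID: 'example-synthesis-model',
};
const tileModel = resolveModelId('tile', exampleEnv);
const synthesisModel = resolveModelId('synthesis', exampleEnv);
```

In a Lambda handler this would typically be called with `process.env`, so swapping models becomes a template change rather than a code change.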

## SAM Template Configuration

```yaml
ImageAnalysisPipeline:
  Type: AWS::Serverless::Function
  Properties:
    Handler: handler.handler
    CodeUri: src/pipeline/
    AutoPublishAlias: live
    DurableConfig:
      ExecutionTimeout: 900
      RetentionPeriodInDays: 1
    Policies:
      - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicDurableExecutionRolePolicy
      - S3ReadPolicy:
          BucketName: !Ref ImageBucket
      - DynamoDBCrudPolicy:
          TableName: !Ref ResultsTable
      - Statement:
          - Effect: Allow
            Action: bedrock:InvokeModel
            Resource: "*"
```

Required policy: `AWSLambdaBasicDurableExecutionRolePolicy`

## Real-Time Observability with AppSync Events

Publish per-step and per-tile status events via the AppSync Events API:

```typescript
await publish(channel, [{ type: 'region', index, status: 'done', finding }]);
```

Clients subscribe via WebSocket to `/pipeline/{executionId}` for live progress.
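
The `publish` helper is not shown in the source. One possible shape, sketched here under assumptions, builds an HTTP request for the AppSync Events publish endpoint: a POST to `/event` on the API's HTTP domain, with each event sent as a JSON string. API key authorization, the `buildPublishRequest` name, and the example domain are all illustrative; verify the request format against the AppSync Events documentation before relying on it.

```typescript
interface PublishRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a publish request. Assumes API key auth.
function buildPublishRequest(
  httpDomain: string,
  apiKey: string,
  channel: string,
  events: unknown[],
): PublishRequest {
  return {
    url: `https://${httpDomain}/event`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json', 'x-api-key': apiKey },
      // Each event is serialized to its own JSON string inside the array.
      body: JSON.stringify({ channel, events: events.map((e) => JSON.stringify(e)) }),
    },
  };
}

const request = buildPublishRequest(
  'example.appsync-api.us-east-1.amazonaws.com', // placeholder domain
  'example-api-key',                              // placeholder key
  '/pipeline/exec-123',
  [{ type: 'region', index: 0, status: 'done', finding: 'clear' }],
);
```

Sending it is then a single `fetch(request.url, request.init)` call; keeping the builder pure makes the payload easy to unit test.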

## Scale Guidelines

| Grid Size | Tiles | Typical Use Case | maxConcurrency |
|-----------|-------|------------------|----------------|
| 3x3 | 9 | Demo, small images | 5 |
| 5x5 | 25 | Medium images | 10-15 |
| 8x8 | 64 | High-res photos, documents | 20-30 |
| 16x16+ | 256+ | Satellite, pathology | 50+ (match quota) |

At high tile counts, consider:
- S3 Files for filesystem-based image access (eliminates per-tile GetObject calls)
- Higher `maxConcurrency` matched to Bedrock provisioned throughput
- Smaller result objects to stay under the 256 KB checkpoint limit

## Bedrock Converse API (Image + Text)

```typescript
const response = await client.send(new ConverseCommand({
  modelId: 'amazon.nova-lite-v1:0',
  messages: [{
    role: 'user',
    content: [
      { image: { format: 'jpeg', source: { bytes: imageBytes } } },
      { text: prompt }
    ]
  }],
  inferenceConfig: { maxTokens: 512 }
}));
```

Supported image formats: jpeg, png, gif, webp
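
To turn the Converse response into a small, checkpoint-friendly result, the text blocks can be pulled out of `output.message.content`. The sketch below uses a minimal structural type (`ConverseLikeResponse`, an assumption that mirrors only the fields it reads) so it runs without the AWS SDK installed.

```typescript
// Minimal structural view of the response; only the fields read here.
interface ConverseLikeResponse {
  output?: { message?: { content?: Array<{ text?: string }> } };
}

// Concatenate all text blocks; non-text blocks are ignored.
function extractText(response: ConverseLikeResponse): string {
  const blocks = response.output?.message?.content ?? [];
  return blocks
    .map((block) => block.text ?? '')
    .join('')
    .trim();
}

// Example with a hand-built response object.
const exampleResponse: ConverseLikeResponse = {
  output: { message: { content: [{ text: 'Two vehicles, ' }, { text: 'one pedestrian.' }] } },
};
const finding = extractText(exampleResponse);
```

Returning only this string (or a small object built from it) from the step keeps each tile's checkpoint far below the size limit.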

## Reference Implementation

GitHub: https://github.com/singledigit/image-analysis-orchestration

Stack: TypeScript, Node.js 24, AWS SAM, Amazon Bedrock Nova Lite, AWS AppSync Events, Amazon DynamoDB, Amazon CloudFront, Amazon Cognito

## Related Resources

- [AWS Lambda durable functions documentation](https://docs.aws.amazon.com/lambda/latest/dg/durable-functions.html)
- [Durable Execution SDK Developer Guide](https://docs.aws.amazon.com/durable-execution/sdk-reference/operations/map/)
- [S3 Files for Lambda](https://edjgeek.com/blog/s3-files-lambda-agents/)
- [Amazon Bedrock Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html)
