Skip to content

Implements NestingType.Flat#2409

Draft
GarrettBeatty wants to merge 22 commits into
gcbeatty/durable-mapfrom
gcbeatty/flat
Draft

Implements NestingType.Flat#2409
GarrettBeatty wants to merge 22 commits into
gcbeatty/durable-mapfrom
gcbeatty/flat

Conversation

@GarrettBeatty

@GarrettBeatty GarrettBeatty commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Issue #, if available: #2216

Description of changes:

Summary

Implements NestingType.Flat for ParallelAsync and MapAsync. Previously both threw NotSupportedException when NestingType.Flat was supplied; Nested (the default) was the only wired mode. This PR wires up Flat for both operations using an inline-payload strategy that matches the behavior of the Python, JS, and Java durable-execution SDKs.

What NestingType.Flat is

When a parallel/map operation runs, each branch/item normally executes inside its own child CONTEXT operation, which emits START + SUCCEED/FAIL checkpoints. For large fan-outs this is a lot of checkpoint writes — roughly one extra CONTEXT op per branch on top of the work each branch actually does.

NestingType controls how those branches are represented in the checkpoint graph:

  • Nested (default) — each branch produces a full CONTEXT operation. Maximum observability in execution traces; more checkpoint operations.
  • Flat — each branch runs in a virtual context that emits no CONTEXT checkpoint of its own. Per-branch results/errors are recorded inline on the parent parallel/map operation's payload instead. Fewer checkpoints, at the cost of less granular traces.
// Opt into Flat per operation:
await ctx.ParallelAsync(branches, name: "fanout",
    config: new ParallelConfig { NestingType = NestingType.Flat });

await ctx.MapAsync(items, MapItem, name: "fanout",
    config: new MapConfig { NestingType = NestingType.Flat });

Nested remains the default, so existing workflows are unaffected and opting in is non-breaking.

How it works

A flat branch is a logical scope that owns an ID namespace and a logger scope but is invisible in the checkpoint tree. Four pieces make that work:

  1. Decoupled ID generation (OperationIdGenerator) — a new CreateVirtualChild(operationId, reportedParentId) separates the hash prefix used to derive inner-operation IDs from the ParentId reported on those operations:

    • Inner-op IDs are still prefixed by the branch's own op-id (hash("{branchOpId}-1"), …), so two sibling branches never collide on inner IDs. The ID space is identical to Nested — load-bearing for deterministic replay.
    • But inner ops report the nearest non-virtual ancestor (the parallel/map op) as their wire ParentId, because the virtual branch emits no CONTEXT checkpoint for them to reference.
  2. Checkpoint suppression (ChildContextOperation) — an isVirtual flag suppresses the branch's CONTEXT START/SUCCEED/FAIL enqueues. The exception still propagates on failure; inner ops (steps/waits) still checkpoint normally, re-parented per (1).

  3. Inline result/error payload (BatchSummary / BatchUnitSummary) — under Nested, per-unit results are read back from each child's own CONTEXT checkpoint on replay. Under Flat there are no child checkpoints, so each unit's serialized result (or ErrorObject) is recorded inline on the parent operation's BatchSummary payload. ConcurrentOperation persists this payload even on FAIL (Nested omits it, since it can rebuild from child checkpoints).

  4. Replay reconstruction (ConcurrentOperation.ReconstructFromCheckpoints) — when virtual, per-unit results/errors are read from the inline summary instead of from child operations.

This mirrors the Python/Java SDKs' inline-payload approach (the JS SDK re-executes branch bodies on replay instead; inline is the better fit for .NET's cached-replay model).

Tests

Unit tests (ParallelOperationTests, MapOperationTests) — replaced the two *_NestingTypeFlat_ThrowsNotSupported tests with Flat behavior coverage: per-branch CONTEXT ops suppressed, inner-op re-parenting to the parallel/map op, inline partial-failure surfacing, and replay-from-inline-payload (success + failure).

Integration tests (ParallelFlatNestingTest, MapFlatNestingTest) — end-to-end against the real durable-execution service. Each runs 3 branches/items with a step + durable wait (the wait forces a suspend/resume so the operation genuinely replays) and asserts against service history:

  • exactly one CONTEXT op (the parent) — no per-branch/item contexts,
  • inner step/wait ops re-parent to the parallel/map op (Event.ParentId),
  • inner-op IDs still derive from the per-branch ID space,
  • per-branch results survive replay (read back from the inline parent payload).

All 331 unit tests pass on net8.0 and net10.0.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]

COPY bin/publish/ ${LAMBDA_TASK_ROOT}

ENTRYPOINT ["/var/task/bootstrap"]

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements NestingType.Flat support for ParallelAsync and MapAsync in Amazon.Lambda.DurableExecution, switching per-unit execution to “virtual” child contexts that do not emit per-branch/per-item CONTEXT checkpoints, while still keeping deterministic operation IDs and re-parenting inner operations to the nearest non-virtual ancestor. It also adds unit + integration tests to validate replay behavior and the new checkpoint/payload contract.

Changes:

  • Enable Flat nesting by suppressing per-unit child CONTEXT START/SUCCEED/FAIL checkpoints and persisting per-unit results/errors inline on the parent operation payload.
  • Add support for “virtual” child ID generation so inner ops keep branch/item-scoped IDs but report the non-virtual parent as ParentId.
  • Add/extend unit and integration tests validating: no per-unit contexts, correct re-parenting, and correct replay reconstruction from inline payloads.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
Libraries/test/Amazon.Lambda.DurableExecution.Tests/ParallelOperationTests.cs Updates tests to validate Flat parallel semantics (no branch contexts, inline replay, re-parenting).
Libraries/test/Amazon.Lambda.DurableExecution.Tests/MapOperationTests.cs Updates tests to validate Flat map semantics (no item contexts, inline replay, re-parenting).
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFlatNestingFunction/ParallelFlatNestingFunction.csproj New integration test function project for Flat parallel.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFlatNestingFunction/Function.cs New Flat parallel test workflow exercising replay via WaitAsync.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFlatNestingFunction/Dockerfile Container packaging for the new Flat parallel test function.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MapFlatNestingFunction/MapFlatNestingFunction.csproj New integration test function project for Flat map.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MapFlatNestingFunction/Function.cs New Flat map test workflow exercising replay via WaitAsync.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MapFlatNestingFunction/Dockerfile Container packaging for the new Flat map test function.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ParallelFlatNestingTest.cs New end-to-end AWS integration test asserting Flat parallel history/IDs/parenting.
Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/MapFlatNestingTest.cs New end-to-end AWS integration test asserting Flat map history/IDs/parenting.
Libraries/src/Amazon.Lambda.DurableExecution/ParallelConfig.cs Updates docs to describe Flat’s inline result/error behavior.
Libraries/src/Amazon.Lambda.DurableExecution/NestingType.cs Updates enum docs to reflect Flat now being implemented and its semantics.
Libraries/src/Amazon.Lambda.DurableExecution/MapConfig.cs Updates docs to describe Flat’s inline result/error behavior.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/ParallelOperation.cs Plumbs Flat configuration into concurrent orchestration via isVirtual.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/MapOperation.cs Plumbs Flat configuration into concurrent orchestration via isVirtual.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/OperationIdGenerator.cs Adds “virtual child” ID generator that decouples ID prefix from reported ParentId.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/ConcurrentOperation.cs Implements inline per-unit payload persistence (Flat) and replay reconstruction paths.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/ChildContextOperation.cs Suppresses per-unit CONTEXT checkpoints when the child context is virtual (Flat).
Libraries/src/Amazon.Lambda.DurableExecution/Internal/BatchSummary.cs Extends summary payload to optionally include inline Result / Error for Flat units.
Libraries/src/Amazon.Lambda.DurableExecution/Internal/BatchJsonContext.cs Adds source-gen metadata for ErrorObject in the batch summary payload.
Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs Removes Flat “not supported” guard and updates child-context factory to support virtual parenting.
Libraries/src/Amazon.Lambda.DurableExecution/CLAUDE.md Adds repository-specific architecture/testing guidance (documentation only).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +612 to +634
// Flat (virtual) units have no child checkpoint — their result/error
// was recorded inline on this summary. Nested units read from the
// child's own CONTEXT checkpoint. A unit is "inline" when the summary
// entry carries a Result/Error, which only Flat writes.
if (_isVirtual && summaryEntry != null)
{
if (status == BatchItemStatus.Succeeded && summaryEntry.Result != null)
{
unitResult = DeserializeResult(summaryEntry.Result);
}
else if (status == BatchItemStatus.Failed && summaryEntry.Error != null)
{
var err = summaryEntry.Error;
unitError = new ChildContextException(err.ErrorMessage ?? "Unit failed")
{
SubType = ChildSubType,
ErrorType = err.ErrorType,
ErrorData = err.ErrorData,
OriginalStackTrace = err.StackTrace
};
}
}
else if (status == BatchItemStatus.Succeeded && childOp?.ContextDetails?.Result != null)
@GarrettBeatty GarrettBeatty changed the title Flat Implements NestingType.Flat Jun 5, 2026
Adds parallel branch execution to the .NET Durable Execution SDK.
ParallelAsync runs N branches concurrently with configurable concurrency
limits and completion policies, returning an IBatchResult<T> with
per-branch status and error information.

Per-branch checkpoint payloads are serialized via the ILambdaSerializer
registered on ILambdaContext.Serializer (typically configured through
LambdaBootstrapBuilder.Create(handler, serializer)), matching the
StepAsync / RunInChildContextAsync pattern. There are no separate
reflection / AOT-safe overload pairs: the AOT story is determined
entirely by which serializer the user registers with the runtime.

Public surface:
- IDurableContext.ParallelAsync<T> (2 overloads: Func[] vs
  DurableBranch<T>[])
- DurableBranch<T> record (Name + Func)
- ParallelConfig (MaxConcurrency, CompletionConfig, NestingType)
- CompletionConfig with factories AllSuccessful() / FirstSuccessful() /
  AllCompleted(); ToleratedFailureCount / ToleratedFailurePercentage
  (validated 0.0-1.0)
- IBatchResult<T> with All / Succeeded / Failed / Started accessors,
  GetResults, GetErrors, ThrowIfError, HasFailure, CompletionReason,
  count properties
- IBatchItem<T> with Index, Name, Status, Result, Error
- BatchItemStatus { Succeeded, Failed, Started }
- CompletionReason { AllCompleted, MinSuccessfulReached,
  FailureToleranceExceeded }
- NestingType (Nested default; Flat throws NotSupportedException - reserved)
- ParallelException (carries IBatchResult; future-subclassable)

Internal:
- ParallelOperation<T> orchestrator dispatches branches with optional
  semaphore-bounded concurrency. Each branch runs as a
  ChildContextOperation<T> with deterministic ID via
  OperationIdGenerator.CreateChild.
- Branch failures aggregated as IBatchItem<T> entries; orchestrator
  throws ParallelException only when CompletionConfig signals
  FailureToleranceExceeded.
- Parent CONTEXT checkpoint records summary (CompletionReason +
  per-branch index/name/status); branch results live on per-branch
  CONTEXT checkpoints.
- ExecutionState now thread-safe (lock around reads/writes of
  _operations, _visitedOperations, _isReplaying). Required for
  concurrent branch replay; affects all operations but no regressions.
- ParallelOperation awaits Task.WhenAll(inFlight) before disposing
  the semaphore so cancellation/exception during dispatch lets
  in-flight branches settle cleanly.
- Reuses OperationSubTypes.Parallel / OperationSubTypes.ParallelBranch
  from Wave 0.

Adds 31 unit tests + 6 integration tests covering CompletionConfig
matrix, MaxConcurrency, FirstSuccessful short-circuit, replay
determinism, mixed-status replay, cancellation, and concurrency
stress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix tests

change file

Validate CompletionConfig thresholds and honor checkpointed branch names

- Add range validation to CompletionConfig.MinSuccessful (>= 1) and
  ToleratedFailureCount (>= 0), matching the existing
  ToleratedFailurePercentage setter. Previously zero/negative values
  produced nonsensical immediate short-circuits.
- ReconstructFromCheckpoints now uses the branch Name persisted in the
  parallel summary instead of always reading the current branch name,
  and throws NonDeterministicExecutionException on name drift between
  deployments (the prior path silently ignored summaryEntry.Name).
- Correct XML docs for BatchItemStatus.Started / IBatchResult.Started /
  CompletionConfig.FirstSuccessful: Started means a branch was not
  dispatched before a completion short-circuit fired (or has no
  checkpoint on replay), not that it is still running.
Implements IDurableContext.MapAsync, processing a collection in parallel
with one child context per item. Mirrors the Python/JS/Java SDKs, where
Map is a sibling of Parallel sharing one concurrency engine.

- Extract ConcurrentOperation<T> base holding all orchestration, completion,
  checkpoint, and replay logic; ParallelOperation and MapOperation are thin
  subclasses supplying only the per-unit (name, func), sub-type labels, and
  failure-exception factory.
- MapConfig defaults CompletionConfig to AllCompleted() (permissive), matching
  Python/Java Map; intentionally differs from ParallelConfig's AllSuccessful().
  Adds ItemNamer; no ItemBatcher (not implemented in any reference SDK).
- New MapException so callers can distinguish Map from Parallel failures.
- Generalize ParallelSummary/ParallelJsonContext into shared BatchSummary/
  BatchJsonContext.
- Tests: 24 unit tests (MapOperationTests) + 6 integration functions/tests
  mirroring the Parallel set. Full suite 325/325 on net8.0 and net10.0.
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-map branch from 97d51e0 to e0b6f5f Compare June 8, 2026 18:36
@GarrettBeatty GarrettBeatty force-pushed the gcbeatty/durable-map branch from e0b6f5f to dcda8da Compare June 12, 2026 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants