feat(temporal): opt-in continue-as-new for long-lived agent workflows#447

Merged
jromualdez-scale merged 1 commit into next from dm/temporal-continue-as-new on Jun 30, 2026

Conversation

@danielmillerp danielmillerp commented Jun 24, 2026

Why

Long-lived chat/session agents run as a single Temporal workflow that stays open indefinitely. Two things killed them:

  1. Event history grows until it hits Temporal's ~50k-event / 50MB limit, then the workflow stalls.
  2. The workflow execution timeout (24h SDK default) terminated the whole chain — a user returning to an old chat hit a dead workflow and their message silently vanished.

This adds an opt-in continue-as-new pattern to BaseWorkflow so a session can stay open forever, and defaults the execution timeout to infinite so a long-lived chat isn't capped at 24h.

Design

Opt-in by adoption, no flag, no patch gate. An agent gets recycling only by calling run_until_complete from its @workflow.run instead of a bare wait_condition(timeout=None). Agents that keep the old wait are untouched. (No workflow.patched() gate: we have no in-flight long-running workflows to preserve, and per Temporal guidance the right tool for evolving these "pure entity" workflows later is Worker Versioning + upgrade-on-continue-as-new — tracked in AGX1-420.)

BaseWorkflow helpers (src/agentex/lib/core/temporal/workflows/workflow.py):

  • run_until_complete(*args, is_complete, timeout=None) — keeps the workflow open and recycles history when Temporal suggests it. Optional timeout caps the wait (default None = wait forever). Workflow-level lifetime cap is the execution timeout (WORKFLOW_EXECUTION_TIMEOUT_SECONDS, infinite by default).
  • should_continue_as_new() — recycle when workflow.info().is_continue_as_new_suggested() (Temporal owns the threshold).
  • drain_and_continue_as_new() — waits all_handlers_finished (so an in-flight turn isn't lost) and re-checks completion before workflow.continue_as_new.
  • is_continued_run() — gate one-time @workflow.run prologue (welcome message, state rehydration) so it doesn't repeat on each recycle.
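
The wait, drain, re-check, recycle sequence described by the helpers above can be sketched without a Temporal server. This is a stdlib-only simulation under stated assumptions, not the code in workflow.py: `FakeWorkflow`, `ContinueAsNew`, and the field names are hypothetical stand-ins for the runtime, and the real helper takes `*args` and an optional `timeout` that are left out here.

```python
import asyncio
from typing import Callable


class ContinueAsNew(Exception):
    """Hypothetical stand-in for the control-flow signal of workflow.continue_as_new."""


class FakeWorkflow:
    """Hypothetical stand-in for the Temporal runtime; none of this is the real API."""

    def __init__(self) -> None:
        self.recycle_suggested = False  # plays the role of is_continue_as_new_suggested()
        self.handlers_in_flight = 0     # zero plays the role of all_handlers_finished()
        self.done = False               # what the agent's is_complete() reads

    async def wait_condition(self, predicate: Callable[[], bool]) -> None:
        # Poll cooperatively until the predicate holds.
        while not predicate():
            await asyncio.sleep(0)


async def run_until_complete(wf: FakeWorkflow, is_complete: Callable[[], bool]) -> str:
    # Wake up when the session is finished or a recycle is suggested.
    await wf.wait_condition(lambda: is_complete() or wf.recycle_suggested)
    if is_complete():
        return "completed"
    # Drain: let in-flight handlers finish so an in-flight turn is not lost.
    await wf.wait_condition(lambda: wf.handlers_in_flight == 0)
    # Re-check: a handler that ran during the drain may have completed the session.
    if is_complete():
        return "completed"
    # Recycle: the real helper would call workflow.continue_as_new here.
    raise ContinueAsNew()
```

The re-check after the drain is the step that is easy to forget: without it, a session that finishes while handlers drain would be recycled into a fresh run instead of returning.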

Execution timeout (environment_variables.py + temporal_task_service.py + temporal_client.py): WORKFLOW_EXECUTION_TIMEOUT_SECONDS now defaults to None = no execution timeout (None/0/negative → execution_timeout=None). It's chain-wide (continue-as-new does NOT reset it), so capping it would still kill a forever-chat.
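
The None/0/negative rule can be illustrated with a small normalizer. This is a minimal sketch of the mapping described above; the function name `resolve_execution_timeout` is hypothetical and may not match what temporal_client.py or temporal_task_service.py actually define.

```python
from datetime import timedelta
from typing import Optional


def resolve_execution_timeout(seconds: Optional[int]) -> Optional[timedelta]:
    """Map a configured WORKFLOW_EXECUTION_TIMEOUT_SECONDS value to a timeout.

    None, 0, and negative values all mean "no execution timeout";
    only a positive value produces a cap.
    """
    if seconds is None or seconds <= 0:
        return None
    return timedelta(seconds=seconds)
```

A deployment that still wants the previous 24h cap would configure 86400, which this sketch maps to `timedelta(days=1)`.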

Scope (deliberately concise)

Just the pattern + the timeout default. Restoring state after a recycle is framework-specific (rebuild from adk.messages, an adk.state snapshot, or a framework's own memory like a LangGraph checkpointer / Pydantic AI history) and is left to follow-up PRs, one per integration. The only example touched is 000_hello_acp, which keeps no cross-turn state — it adopts the pattern and gates its one-time welcome behind is_continued_run() so it isn't re-emitted on recycle.
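
Gating a one-time prologue could look roughly like the sketch below. The PR does not say how is_continued_run() detects a recycle; this assumes it checks whether the run has a predecessor run id (the Temporal Python SDK exposes `continued_run_id` on `workflow.info()`). `FakeInfo`, `run_prologue`, and the outbox list are hypothetical stand-ins so the example runs without Temporal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeInfo:
    """Hypothetical stand-in for workflow.info(); only the field used here."""

    continued_run_id: Optional[str] = None


def is_continued_run(info: FakeInfo) -> bool:
    # Assumption: a run created by continue-as-new carries its predecessor's run id.
    return info.continued_run_id is not None


def run_prologue(info: FakeInfo, outbox: List[str]) -> None:
    # Emit the welcome only on the first run of the chain, not after a recycle.
    if not is_continued_run(info):
        outbox.append("welcome")
```

Framework-specific state restoration would hang off the same check, in the opposite branch, once the follow-up PRs add it.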

Follow-ups (Temporal-team feedback)

  • AGX1-420 — adopt Worker Versioning + upgrade-on-CAN for evolving long-running workflows (vs patching).
  • AGX1-421 — durable idle-timeout to gracefully close idle workflows so they don't pile up.

Verification

  • Unit tests for the decision helpers (should_continue_as_new, is_continued_run) — tests/lib/core/temporal/test_base_workflow_continue_as_new.py.
  • py_compile + ruff + pyright clean.
  • Follow-up: replay/integration test of drain_and_continue_as_new against a Temporal test server.

🤖 Generated with Claude Code

Greptile Summary

This PR adds opt-in Temporal continue-as-new support for long-lived workflows. The main changes are:

  • New BaseWorkflow helpers for recycle decisions, handler draining, continued-run detection, and long waits.
  • The 000_hello_acp Temporal example now opts into run_until_complete and skips its one-time welcome on recycled runs.
  • Workflow execution timeouts now default to no timeout, with positive configured values still applied.
  • Unit tests cover the pure continue-as-new decision helpers.

Confidence Score: 5/5

The changes are narrowly scoped to opt-in workflow recycling helpers and timeout configuration behavior, with no reported correctness or security issues.

The implementation is covered by focused unit tests for the pure helper behavior, and the touched example adopts the new pattern without adding persistent cross-turn state requirements.

T-Rex Logs

What T-Rex did

  • T-Rex ran the requested verification, but its local artifact references were not uploaded.

Ran code and verified through T-Rex

Reviews (18): Last reviewed commit: "feat(temporal): opt-in continue-as-new f..."

@danielmillerp danielmillerp changed the base branch from main to next June 24, 2026 20:20
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch from 5d63a08 to 4170651 Compare June 24, 2026 20:22
Comment thread src/agentex/lib/core/temporal/workflows/workflow.py Outdated
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch from 4170651 to 891ef6d Compare June 24, 2026 21:07
Comment thread src/agentex/lib/core/temporal/workflows/workflow.py Outdated
Comment thread examples/tutorials/10_async/10_temporal/150_codex/project/workflow.py Outdated
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch from 1fb74c4 to 22e7358 Compare June 26, 2026 17:55
Comment thread src/agentex/lib/core/temporal/workflows/workflow.py Outdated
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch from 22e7358 to ad68bd8 Compare June 26, 2026 18:11
Comment thread src/agentex/lib/core/temporal/workflows/workflow.py Outdated
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch from ad68bd8 to 65ab89a Compare June 26, 2026 18:27
Comment thread src/agentex/lib/core/temporal/workflows/workflow.py Outdated
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch 2 times, most recently from 43f62c2 to 9d71bb7 Compare June 26, 2026 18:59
Comment thread src/agentex/lib/core/temporal/workflows/workflow.py Outdated
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch 2 times, most recently from 467b202 to 80ab955 Compare June 29, 2026 22:35
Comment thread src/agentex/lib/core/temporal/workflows/workflow.py
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch 4 times, most recently from 1069028 to 6350513 Compare June 30, 2026 09:19
Comment thread src/agentex/lib/core/temporal/workflows/workflow.py
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch 3 times, most recently from 6562c2d to f480ffd Compare June 30, 2026 10:06
Long-lived chat/session agents run as a single Temporal workflow that stays open
indefinitely. Two things killed them: event history grows past Temporal's
~50k-event / 50MB limit, and the 24h execution-timeout default terminated the
whole chain. This adds an opt-in continue-as-new pattern on BaseWorkflow and
defaults the execution timeout to infinite.

Opt-in by adoption: an agent gets recycling only by calling run_until_complete
from its @workflow.run instead of a bare wait_condition(timeout=None). No flag,
no patch gate (no in-flight long-running workflows to preserve; Worker Versioning
+ upgrade-on-continue-as-new is the path for evolving these later).

BaseWorkflow helpers:
- run_until_complete(*args, is_complete, timeout=None): keep the workflow open and
  recycle history when Temporal suggests it. Optional timeout caps the wait
  (default None = wait indefinitely).
- should_continue_as_new(): recycle when workflow.info().is_continue_as_new_suggested().
- drain_and_continue_as_new(): drain all_handlers_finished and re-check completion
  before workflow.continue_as_new.
- is_continued_run(): gate one-time @workflow.run prologue (e.g. a welcome message,
  state rehydration) so it doesn't repeat on each recycle.

Execution timeout: WORKFLOW_EXECUTION_TIMEOUT_SECONDS now defaults to None (no
execution timeout; None/0/negative -> execution_timeout=None) — the workflow-level
lifetime cap, configurable per deployment. It is chain-wide (continue-as-new does
not reset it), so leaving it unset is required for a forever-chat.

State restoration after a recycle is framework-specific and left to follow-up
PRs, one per integration. 000_hello_acp adopts the pattern and gates its one-time
welcome behind is_continued_run() so it isn't re-emitted on recycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@danielmillerp danielmillerp force-pushed the dm/temporal-continue-as-new branch from f480ffd to ce7dc41 Compare June 30, 2026 10:16
@jromualdez-scale jromualdez-scale merged commit 98cf744 into next Jun 30, 2026
54 checks passed
@jromualdez-scale jromualdez-scale deleted the dm/temporal-continue-as-new branch June 30, 2026 18:40
@stainless-app stainless-app Bot mentioned this pull request Jun 30, 2026