
Unify iframe + agent-browser into one swappable-renderer browser surface #156

Merged
nedtwigg merged 58 commits into main from ab-iframe-unify on Jun 19, 2026



@nedtwigg nedtwigg commented Jun 19, 2026

Member

What

Collapses dor iframe and dor ab (agent-browser) into a single surfaceType: 'browser' surface with three interchangeable renderers — ab-screencast, ab-popout, and iframe — all sharing one browser chrome (URL bar + Back/Forward + the far-left Display chip). params.url becomes the canonical URL across renderers, so swapping renderers or restoring a session keeps the page.

See docs/specs/dor-iframe.md and the agent-browser spec for the as-built design.

Highlights

  • Unified surface & chrome. One browser surface type; render modes renamed to ab-screencast / ab-popout / iframe. Every browser surface gets real chrome (URL, nav, Display modal) on every host.
  • Transparent iframe proxy. A loopback proxy fronts http upstreams and injects a shim that gives Dormouse a keyboard side-channel, an accurate focus model, real error pages, and same-frame navigation reporting. New-tab/window.open attempts are offered as a new pane instead of vanishing in the single-frame renderer. CSP / X-Frame-Options framing refusals are diagnosed and shown as a served page.
  • Render-swap + pop-out. Bidirectional swap between the embedded screencast and the iframe embed; headed pop-out windows on hosts that can spawn them, closing cleanly on editor shutdown. ab→iframe swap is gated behind a typed confirm when ≥2 tabs are open.
  • Hosts unified. VS Code and standalone hosts share one lib module for agent-browser; standalone gains screencast + render-swap + pop-out capabilities and a dev harness (dev:standalone:ab) with a debugging skill.
  • StrictMode everywhere. Web, standalone, VS Code, storybook, tests, and website entries all run under StrictMode; the screencast loop and registrations were made StrictMode-safe.
  • Lifecycle/transition fixes. Daemon restart on pop-out/pop-in, graceful sidecar stop on quit, tab-click at the root, canvas repaint on tab switch, dropped duplicate frame/tab re-broadcasts, preserved iframe URL on render swap and adapter method bindings.

Review follow-ups (ultrareview + dormouse-bot)

Four issues caught in review (three chrome desyncs and one build break), all now fixed with coverage:

  • Modifier-clicks lied to the URL bar. The proxy shim posted location for any anchor click, so Cmd/Ctrl/Shift/Alt+click — which open a new tab and leave the frame put — moved the parent URL bar + Back history to a page the iframe wasn't showing. Now guarded on modifier keys / primary button.
  • Cancelled same-origin clicks lied too. The shim's same-frame post runs in the capture phase, ahead of the page's own handlers, so a link the page cancels (preventDefault, or an <a> that fetches instead of navigating) still reported a navigation. The post is now deferred a tick and skipped when the click was cancelled; real navigations re-report via the next document's shim.
  • Back/Forward didn't reload after an observed nav. After a shim-observed in-frame navigation, params.url stays at the source URL, so Back to that URL was a no-op write and the proxy effect never re-fired — the frame kept showing the navigated page while the chrome showed the Back target. goToHistoryIndex now bumps reloadNonce to force a re-resolution (fresh proxy port → real reload), matching the reload button.
  • Build break (fixed). This branch rewrote AgentBrowserPanel's popped-out CDP-connect path to call the optional platform.agentBrowserCommand directly inside a nested closure; TS doesn't carry the optional-property narrowing into the closure, so it failed tsc and reddened CI. Bound to a local const after the guard. (An earlier revision of this description wrongly called this pre-existing — it is introduced by this branch; main used getAgentBrowserStreamUrl?.(…).)
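The two shim-side guards from the first two follow-ups can be sketched as below. This is a minimal illustration assuming a click-like event object; `ClickLike`, `isPlainPrimaryClick`, `reportSameFrameNav`, and the injected `post`/`defer` callbacks are hypothetical names, not the shim's actual code.

```typescript
// Hypothetical subset of the MouseEvent fields the shim inspects.
interface ClickLike {
  button: number;
  metaKey: boolean;
  ctrlKey: boolean;
  shiftKey: boolean;
  altKey: boolean;
  defaultPrevented: boolean;
}

// Only a plain primary-button click keeps navigation in this frame;
// modifier-clicks open a new tab/window and leave the frame where it is.
function isPlainPrimaryClick(ev: ClickLike): boolean {
  return ev.button === 0 && !ev.metaKey && !ev.ctrlKey && !ev.shiftKey && !ev.altKey;
}

// The listener runs in the capture phase, before the page's own handlers, so the
// report is deferred one tick; by then a handler that called preventDefault()
// has marked the event cancelled, and nothing is posted.
function reportSameFrameNav(
  ev: ClickLike,
  href: string,
  post: (url: string) => void,
  defer: (fn: () => void) => void = (fn) => setTimeout(fn, 0),
): void {
  if (!isPlainPrimaryClick(ev)) return;
  defer(() => {
    if (!ev.defaultPrevented) post(href);
  });
}
```

Injecting `defer` keeps the cancellation check testable without a real event loop; a real navigation that slips past this guard is still reported by the next document's shim.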

Test

tsc -b clean; pnpm -r run test green. The iframe-proxy shim and IframePanel chrome/Back behavior have direct coverage.

🤖 Generated with Claude Code

nedtwigg and others added 30 commits June 17, 2026 00:14
Reframe the surface's far-left chip as a render-mode + sync indicator and
turn its modal into the single "Display" control center for how a surface
renders (docs/specs/dor-iframe.md -> Path 1; dor-agent-browser.md ->
Headed Pop-Out). UI-only, driven by Storybook: the production panel does
not wire the new actions yet, so the live modal is unchanged apart from
the cosmetic chip glyphs.

- chip: FrameCorners = embed, LockSimple = screencast synced,
  LockSimpleOpen = scaled (SurfacePaneHeader)
- modal: new Render section (Screencast/Embed) gated on setRenderMode;
  viewport controls grey out in embed; Pop out button gated on canPopOut
- types: optional renderMode / setRenderMode / popOut / canPopOut on the
  screen controller, so existing constructions stay green
- stories: renderMode + canPopOut knobs; Embed / EmbedRender / NoPopOut

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n icons

Builds on the swappable-render storybook UI: make pop-out a third render
backend and restyle the Display modal around the three modes. Still
UI-only (driven by Storybook); production wiring lands later.

- RenderMode is now screencast | popout | embed; drop the separate
  popOut() action — pop-out is just setRenderMode('popout'), gated by
  canPopOut (hidden on web).
- Each render option uses its exact name (agent-browser screencast /
  agent-browser popout / iframe embed) and lists its agent/URL/feel
  trade-offs as green-check / red-x rows.
- Screencast's resolution nests under it: Resize with pane (link) vs
  Fixed (lock, via Device/Custom); greys out for the other modes. Drop
  the now-redundant "Currently"/"RENDER" chrome.
- Far-left chip icons revised: link / lock for screencast
  resize / fixed, ArrowSquareOut for popout, FrameCorners for embed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tighten the screencast resolution UI in the Display modal:
- collapse the 8-button device grid into a single "Emulate" dropdown
  (none + the device registry); picking a device disables the dims.
- put W / H / DPI inline to the right of "Fixed", sized to their max
  digits (W/H 4, DPI 1) and borderless (underline only — the boxed
  inputs were too much framing).
- drop the "Resize with pane" detail text and the phantom icon-gap
  before "agent-browser screencast"; remove the now-dead Currently
  readout helpers (formatDpr, pane).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Quality cleanup (behavior-preserving):
- collapse the three near-identical render-option blocks into one
  RenderOption helper driven by a features array; screencast passes its
  nested resolution controls as children.
- fuse screenChipLabel + ScreenChipIcon into one screenChip() returning
  { icon, label } so the glyph and its label can't drift apart, and the
  icon renders as a value rather than its own component fiber.
- drop an empty-string className branch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
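The fused `screenChip()` from the cleanup above can be sketched as a single switch that returns glyph and label together. Icon names follow the chip mapping in the earlier commit (link / lock, ArrowSquareOut, FrameCorners) and are represented as strings; the label wording and the `resizesWithPane` parameter are assumptions for illustration.

```typescript
type RenderMode = "screencast" | "popout" | "embed";

interface ScreenChip {
  icon: string; // icon component name; a string stands in for the rendered element
  label: string; // hypothetical label wording
}

// One function yields both glyph and label, so the two cannot drift apart.
function screenChip(mode: RenderMode, resizesWithPane: boolean): ScreenChip {
  switch (mode) {
    case "embed":
      return { icon: "FrameCorners", label: "iframe embed" };
    case "popout":
      return { icon: "ArrowSquareOut", label: "agent-browser popout" };
    case "screencast":
      return resizesWithPane
        ? { icon: "Link", label: "screencast: resize with pane" }
        : { icon: "LockSimple", label: "screencast: fixed" };
  }
}
```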
Add the Wall-level in-place renderer swap and route the Display modal's
"iframe embed" choice through it: selecting embed on an agent-browser
surface reads the active tab URL, replaces the pane with an iframe of
that URL in the same dock slot, and closes the now-unneeded browser.

- Wall.replaceSurface(oldId, {component, params, title}) generalizes
  createContentSurface's replace-untouched-terminal branch; the shared
  closeAgentBrowserSession factors session teardown out of
  killPaneImmediately.
- WallActions.onSwapRenderMode(id, mode): agent-browser→embed works now;
  iframe→screencast/popout calls the host-gated agentBrowserOpen and is
  inert until that capability lands (Stage 3).
- AgentBrowserPanel publishes renderMode 'screencast', gates canPopOut on
  agentBrowserPopOut, and setRenderMode('embed') triggers the swap.
- PlatformAdapter gains agentBrowserOpen / agentBrowserPopOut / PopIn /
  BringToFront interfaces (host impls next); the Wall caches the last
  `dor ab` binaryPath to spawn with.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AgentBrowserPanel gains a popped-out state: setRenderMode('popout') calls
the host relaunch-headed capability; the body becomes a clean stub
("running in a separate window" + Bring to front / Pop back in), the
canvas stays mounted but hidden, and the stream stays connected to observe
tabs/status. setRenderMode('screencast') relaunches headless and resumes;
closing the headed window does the same as an auto-revert, once the headed
stream has connected. renderMode on the snapshot reflects popout, and the
state is persisted via params.poppedOut.

Inert until the host wires agentBrowserPopOut/PopIn/BringToFront; canPopOut
gates the modal option on agentBrowserPopOut.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the host side so embed→screencast and pop-out actually work on the
VS Code host (standalone/Tauri lacks agent-browser today, so it degrades
— the optional methods stay absent there).

- agentBrowserOpen(url): mint a managed gui-session, `open <url>`, read the
  stream port via `stream status --json` — mirrors `dor ab`. Completes the
  iframe embed → live screencast swap.
- agentBrowserPopOut/PopIn: pop-out is a relaunch (headed/headless is fixed
  at launch), so close the session and reopen it `--headed`/headless at the
  active URL, returning the new stream port. v1 preserves the active tab URL
  only; window positioning over the pane is deferred (VS Code can't read
  screen coords → Chrome places the window). Needs verification against the
  real agent-browser CLI.
- Full plumbing: vscode-adapter methods (+ constructor bindings),
  message-types request/response, message-router dispatch.
- The panel passes the active URL into pop-out/in and hides "Bring to front"
  unless the host implements agentBrowserBringToFront (no-op for now).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onal)

IframePanel now registers a screen controller, so an iframe embed surface
shows the unified browser chrome (URL + far-left chip → Display modal) and
can swap back to a live screencast. Gated on the host's agentBrowserOpen
capability — without a way to spawn a screencast there's nothing to swap
to, so the embed surface keeps its plain title (e.g. the web host).

- chromeActions: navigate updates the framed URL (re-resolves the proxy);
  reload bumps a nonce to re-resolve (a cross-origin frame can't reload via
  contentWindow); back/forward are no-ops (frame history is unreachable).
- setRenderMode(screencast) routes through the Wall's onSwapRenderMode,
  which spawns an agent-browser for the URL and replaces the pane in place.
  embed→popout is left out for now (the modal offers screencast only).

This completes the bidirectional screencast ↔ embed swap. The dor iframe
surface also gains the URL header + swap chip on capable hosts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Update the specs to describe what shipped: the Display modal's three-cell
Render section, Wall.replaceSurface, the asymmetric swap directions (the
embed→screencast spawn gated on agentBrowserOpen), and pop-out as an
in-panel render mode with its v1 limits (active-tab-URL only, no window
positioning, VS-Code-only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
agentBrowserOpen gains a `headed` option so embed→popout is a single
spawn: the host launches the new agent-browser headed in one shot, and
the Wall mounts the replacement surface with poppedOut: true — straight
into the stub, no headless-screencast flash before an immediate relaunch.
The iframe controller's canPopOut now gates on agentBrowserPopOut, so the
embed Display modal offers Pop-out alongside Screencast. Drops the unused
popOutOnReady plumbing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Avoids the interactive prompt / telemetry nag when launching Storybook.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two defects surfaced while validating the swap/pop-out matrix on the VS Code host.
Phase 0 confirmed the agent-browser CLI behaves as the host assumes (headed
sessions stream, close kills the browser, relaunch-by-name works), so both were
lib-side, not host/CLI.

1. embed→screencast Apply was disabled. The Display modal gated Apply on the
   viewport-drive capability (hostCapable), which an embed surface reports as
   false. A render-mode swap only needs the spawn capability (already gating
   whether the option shows), so gate the swap separately: Apply is enabled for
   any mode switch and the "dor ab set" note is hidden during a swap.

2. Auto-revert resurrected torn-down sessions. Killing or swapping away from a
   popped-out surface issues `close`, dropping the headed stream; the panel read
   that as a user window-close and relaunched the session headless. A shared
   teardown guard (agent-browser-sessions.ts) marks a session closed before
   Dormouse closes it so auto-revert stands down; a freshly-mounted panel clears
   the mark so a re-created managed name works again.

Also harden readStreamPort with a brief retry so a relaunch that hasn't yet
published its stream port doesn't pin the pane to a dead port ("ended while the
session is live"). TODO.md records the Phase 0 findings + corrected root causes
for the next validation run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
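The teardown guard from fix 2 can be sketched as a shared set of sessions that Dormouse itself is closing. The function names below are hypothetical; the text only says agent-browser-sessions.ts marks a session closed before Dormouse closes it, and that a freshly mounted panel clears the mark.

```typescript
// Sessions Dormouse itself is closing (kill or render-swap away).
const tornDown = new Set<string>();

// Call before issuing `close`, so the resulting stream drop is not read as a user window-close.
function markTornDown(session: string): void {
  tornDown.add(session);
}

// A freshly mounted panel clears the mark so a re-created managed name works again.
function clearTornDown(session: string): void {
  tornDown.delete(session);
}

// Auto-revert to headless only if the headed stream had connected
// and the drop was not caused by Dormouse's own close.
function shouldAutoRevert(session: string, headedStreamConnected: boolean): boolean {
  return headedStreamConnected && !tornDown.has(session);
}
```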
Resolve the contradiction the half-update left: Path 1 (Swappable Render
Backend) shipped but still sat under "# Future Work / Designed, not yet built."
Promote "Render Backends: Two Axes" to the implemented body and scope the
roadmap framing to Path 2 (Plugin System), which gets its own not-yet-built
status note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The render-swap + headed pop-out work shipped past what the spec described
(and a prior half-update left "implemented" callouts under "Future Expansions /
not yet built"). Reconcile to as-built:

- Render Indicator: the far-left chip now encodes render mode (embed / screencast
  synced/scaled / popped-out) and opens the Display modal; point glyphs at the
  BrowserChromeHeader story.
- Display modal: replace the old Sync/Device/Custom mockup + tables with the
  Render + Resolution model, pointing at the AgentBrowserScreenModal story as the
  UI source of truth; keep the load-bearing behavior (sync, last-writer-wins,
  persistence, degradation).
- Browser-Chrome Header: iframe-embed surfaces also get the chrome (not just
  terminals=plain); rename the chip.
- Host capabilities: add agentBrowserOpen / PopOut / PopIn / BringToFront.
- Lifecycle: render-swap-away close + the auto-revert teardown guard.
- Session naming: gui-<hex> sessions and their not-`--key`-addressable limit.
- Implementation Touchpoints: replaceSurface/onSwapRenderMode, IframePanel
  controller, agent-browser-sessions.ts.
- Headed Pop-Out: rewrite as-built (modal radio not header arrow, no type-char
  confirm, active-tab-URL only, VS-Code-only, Bring-to-front unimplemented) and
  move out of "Future Expansions"; note the dropped confirm as a deviation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The spec's chip table had a stale glyph mapping carried over from a superseded
storybook pass: the shipped header uses LinkIcon for screencast SYNCED
(resizes-with-pane) and LockSimpleIcon for SCALED (fixed) — no open-lock — which
also mirrors the Display modal's Resize-with-pane / Fixed controls. Fix the spec
to match `SurfacePaneHeader.tsx`. (The TODO glossary was already correct.)

TODO: refresh the intro (623 tests; Phase 0 CLI now validated live, the VS Code
webview matrix still pending) and the diagnosis guidance (Phase 0 confirmed, so a
failure points at the lib reconnect/lifecycle or the host plumbing, not the CLI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Quality-only cleanups on the render-backend-swap branch (no behavior change):

- AgentBrowserPanel popOut: hoist the duplicated "revert unless stream came
  back live" block (repeated in the !ok and catch arms) into a single
  revertUnlessLive helper.
- AgentBrowserPanel popIn: drop the duplicated updateParameters({poppedOut:false})
  and the redundant early-return guard; the common state writes now run once.
- AgentBrowserPanel frame handler: replace the empty-body `if (poppedOut) {}`
  comment anchor with a guard clause around the draw branch.
- IframePanel: read history.index through a ref so back()/forward() — and thus
  chromeActions — stay stable, so the screen-controller registration no longer
  disposes+re-registers (and re-renders the header) on every navigation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + pop-out)

The branch's swappable render backend, headed pop-out, and HiDPI screencast
surface only worked on the VS Code host, which implements the full
agent-browser capability set. The standalone (Tauri) host only had
createIframeProxyUrl, so on standalone the screencast surface, the
embed->screencast render swap, and pop-out were inert. This mirrors the VS Code
host (vscode-ext/src/agent-browser-host.ts) on standalone.

Rust (standalone/src-tauri/src/agent_browser.rs, new module wired into lib.rs):
- agent_browser_command: runs the binary against a session; validates args[0]
  against the subcommand allowlist (kept in sync with
  AGENT_BROWSER_ALLOWED_SUBCOMMANDS in lib) -- not a general exec channel.
- agent_browser_screenshot: captures one device-resolution frame and returns
  the raw bytes as an ArrayBuffer (tauri::ipc::Response), no base64 round-trip.
- agent_browser_edit: host-owned eval for select-all/copy/cut; copy/cut land on
  the OS clipboard (tauri-plugin-clipboard-manager, mirroring
  vscode.env.clipboard.writeText).
- agent_browser_open: spawns a managed namespaced session (dormouse.1.gui-<hex>)
  and opens a url (optionally headed), returning { session, wsPort, binaryPath }.
- agent_browser_pop_out / agent_browser_pop_in: resolve the live active url via
  `get url`, close, then relaunch headed/headless, returning the new wsPort.
- agent_browser_stream_status: reads the current `stream status --json` port so
  restored panels recover from a stale persisted wsPort.
All accept an optional binaryPath and fall back through
DORMOUSE_AGENT_BROWSER_BIN -> PATH (runWithBinaryFallback equivalent).

TS (standalone/src/tauri-adapter.ts): implemented agentBrowserCommand,
agentBrowserEdit, agentBrowserScreenshot, agentBrowserStreamStatus,
agentBrowserOpen, agentBrowserPopOut, agentBrowserPopIn, each invoking the
matching Rust command. getAgentBrowserStreamUrl is intentionally omitted: the
stream server accepts the tauri://localhost origin, so the panel's built-in
fallback connects directly to ws://127.0.0.1:<port> (no relay). agentBrowserBringToFront
stays unimplemented, consistent with VS Code.

CSP (tauri.conf.json): added ws://127.0.0.1:* and ws://localhost:* to
connect-src so the screencast stream WebSocket can connect.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
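The args[0] allowlist check and the binary fallback order described above can be sketched in TypeScript as follows. The subcommand set is a hypothetical subset built from subcommands named elsewhere on this page; the real list is AGENT_BROWSER_ALLOWED_SUBCOMMANDS in lib. `validateArgs` and `binaryCandidates` are illustrative names.

```typescript
// Hypothetical subset; the authoritative list is AGENT_BROWSER_ALLOWED_SUBCOMMANDS in lib.
const ALLOWED_SUBCOMMANDS = new Set(["open", "close", "get", "set", "tab", "stream", "screenshot"]);

// Reject anything whose first argument is not an allowlisted subcommand,
// so the command channel cannot be used as a general exec.
function validateArgs(args: readonly string[]): void {
  const sub = args[0];
  if (sub === undefined || !ALLOWED_SUBCOMMANDS.has(sub)) {
    throw new Error(`subcommand not allowed: ${sub ?? "(none)"}`);
  }
}

// Ordered binaries to try: an explicit binaryPath, then DORMOUSE_AGENT_BROWSER_BIN,
// then the bare name, which the OS resolves through PATH. Duplicates are skipped.
function binaryCandidates(
  binaryPath: string | undefined,
  env: Record<string, string | undefined>,
): string[] {
  const out: string[] = [];
  for (const c of [binaryPath, env.DORMOUSE_AGENT_BROWSER_BIN, "agent-browser"]) {
    if (c && !out.includes(c)) out.push(c);
  }
  return out;
}
```

In a Node host, `env` would typically be `process.env`; the caller tries each candidate in order until one spawns.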
- agent_browser.rs: add command_error(result, label) helper, collapsing
  four identical stderr-or-exit-code error blocks (edit/open/pop_out/pop_in).
- agent_browser.rs: factor pop_out/pop_in's shared close+relaunch body into a
  private relaunch(session, url, headed, binary_path) primitive; the two Tauri
  commands are now thin headed=true/false wrappers.
- agent_browser.rs: drop the dead `pid << 64` term in generate_gui_session —
  it was masked entirely away by `& 0xffff_ffff_ffff`, so the PID never
  affected the output (session id is unchanged: low 48 bits of nanos).
- tauri-adapter.ts: extract errMessage(err) helper, replacing 8 copies of the
  `err instanceof Error ? err.message : String(err)` idiom.

Quality-only, no behavior change. lib tsc, standalone tsc, and cargo check
all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
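Why the `pid << 64` term was dead can be shown with BigInt arithmetic: anything shifted left by 64 sits entirely above bit 48, so the 48-bit mask discards it. The helper names are hypothetical. The `dormouse.1.` prefix is copied from the example name in the earlier commit and may vary in practice, and no zero-padding of the hex is assumed.

```typescript
const LOW_48_BITS = 0xffff_ffff_ffffn;

// Session id: the low 48 bits of a nanosecond timestamp, as hex.
function guiSessionId(nanos: bigint): string {
  return `gui-${(nanos & LOW_48_BITS).toString(16)}`;
}

// Namespaced managed-session name, following the dormouse.1.gui-<hex> example.
function managedSessionName(nanos: bigint): string {
  return `dormouse.1.${guiSessionId(nanos)}`;
}
```

OR-ing `pid << 64n` into `nanos` before masking leaves `guiSessionId` unchanged, which is why dropping the term is behavior-preserving.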
A popped-out surface is a real headed OS window the user drives directly,
but the viewport-sync effects (ResizeObserver, wsPort-change, DPR) gate
only on `syncEngaged`, never on `poppedOut` — so a swap-to-popout (which
seeds `syncEngaged: true`) kept issuing `agent-browser set viewport <pane>`
on every Dormouse pane resize, fighting the native window even though
popout mode disables the resolution controls.

Guard the single `issueSyncToPane` chokepoint on `poppedOutRef` so no
`set viewport` is issued while popped out. `syncEngaged` stays true, so
sync resumes correctly when the surface pops back in (the wsPort-change
effect re-issues against the fresh session). Found by codex review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
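The single-chokepoint guard can be sketched as a tiny class whose `issueSyncToPane` is the only path to `set viewport`. `ViewportSync` and its `poppedOut` field (standing in for `poppedOutRef.current`) are hypothetical; only the method name comes from the text.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// Hypothetical model: every effect that wants to resize the remote viewport
// (ResizeObserver, wsPort change, DPR) goes through issueSyncToPane.
class ViewportSync {
  syncEngaged = false;
  poppedOut = false; // stands in for poppedOutRef.current
  private readonly setViewport: (v: Viewport) => void;

  constructor(setViewport: (v: Viewport) => void) {
    this.setViewport = setViewport;
  }

  issueSyncToPane(pane: Viewport): void {
    // While popped out the user drives a native window, so never fight it.
    // syncEngaged is left untouched so sync resumes after pop-in.
    if (!this.syncEngaged || this.poppedOut) return;
    this.setViewport(pane);
  }
}
```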
A popped-out surface relaunches its agent-browser session headed — a real
OS window detached from the extension host. Nothing closed it on shutdown
(`close` only ran for explicit kill / render-swap), so quitting the editor
left an orphaned Chrome window behind, contrary to the spec's pop-out
lifecycle ("Dormouse/editor quits → headed windows are cleaned up; no
orphans").

Track popped-out sessions in the host (set on pop-out, cleared on pop-in
or any explicit `close`) and close them from `deactivate()`. Headless
sessions are deliberately left alive to reattach across webview reloads.
deactivate() also fires on Reload Window; a popped-out surface then
auto-reverts to a headless screencast on reactivation, which is preferable
to a detached headed Chrome lingering. Found by codex review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
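The host-side tracking described above can be sketched as a set of headed sessions that `deactivate()` drains. `PoppedOutTracker`, `trackHeaded`, and `untrack` are hypothetical names; `closePoppedOut` is the name a later commit gives the shared cleanup.

```typescript
// Hypothetical tracker for headed (popped-out) sessions in the host.
class PoppedOutTracker {
  private readonly headed = new Set<string>();
  private readonly close: (session: string) => void;

  constructor(close: (session: string) => void) {
    this.close = close;
  }

  // Set on pop-out (or any headed launch).
  trackHeaded(session: string): void {
    this.headed.add(session);
  }

  // Cleared on pop-in or any explicit `close`.
  untrack(session: string): void {
    this.headed.delete(session);
  }

  // Called from deactivate(): closes only headed sessions. Headless sessions
  // are never tracked, so they stay alive for a reloaded webview to reattach.
  closePoppedOut(): void {
    for (const s of this.headed) this.close(s);
    this.headed.clear();
  }
}
```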
The specs described an agent-browser surface that was VS-Code-only, but the
standalone (Tauri) host had since gained the full capability set (command,
edit, screenshot, stream-status, open, pop-out, render-swap). Flip the stale
"Tauri today is inert" / "VS-Code-only" claims to name the web host as the
only one without agent-browser, add the standalone files to the touchpoints,
and mark Path 1 implemented on both hosts. Notes the one remaining VS-Code-only
gap: closing orphaned headed pop-out windows on shutdown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two hosts implemented the same capability set in two languages — Node/TS in
the VS Code extension host, a hand-ported 541-line Rust module in standalone —
so they drifted (notably: the standalone never closed orphaned headed pop-out
windows on shutdown).

Collapse both onto a single host-agnostic module, lib/src/host/agent-browser-host.ts,
exactly as the iframe proxy is already shared:

- lib/src/host/agent-browser-host.ts: the single source of truth (binary
  resolution/spawn, allowlist, edit scripts, gui-session naming, URL resolution,
  screenshot, pop-out tracking, closePoppedOut), built by a factory that injects
  only the two genuinely host-specific bits: clipboard-write and logging.
- VS Code host: slimmed to instantiate the shared module + keep the VS-Code-only
  stream relay (the vscode-webview:// origin workaround).
- Standalone: the Node sidecar runs the bundled agent-browser-host.cjs behind
  thin Rust forwarders in lib.rs (mirroring iframe_create_proxy_url); deleted
  agent_browser.rs and the tauri-plugin-clipboard-manager dep. clipboard-ops.js
  gains writeClipboardText (pbcopy/clip/xclip/wl-copy), and the sidecar's
  shutdown() now calls closePoppedOut() — which is the orphan-window fix.

Screenshot bytes ride the sidecar stdio as base64 and are decoded back to a raw
tauri::ipc::Response, so the webview still receives an ArrayBuffer.

Also fixes a latent gap surfaced by unifying: a headed `open` (embed→popout,
which spawns headed directly) is now tracked for shutdown cleanup; the old code
only tracked popOut.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r surface

Replace the "three surface types" framing with a single surfaceType:'browser'
whose render axis is one canonical renderMode field: ab-screencast | ab-popout |
iframe (the ab- prefix names the engine, leaving room for a future engine; iframe
is engine-less). The URL and renderMode become single-homed, persisted panel
params, which is the fix for the buggy transitions — the URL had no canonical
home for agent-browser and was laundered from a possibly-stale live snapshot at
swap time.

- Canonical BrowserPaneState { url, renderMode, agentBrowser? } as the single
  source of truth across every renderer.
- A BrowserPanel shell owns url+renderMode and remounts the matching renderer
  child (not a fused component — input models still differ).
- Render-mode transition matrix: iframe -> ab trivial; ab <-> ab silently drops
  non-active tabs (accepted, pending profile persistence); ab -> iframe warns +
  typed-character confirm when >=2 tabs.
- dor iframe opens a full browser-chrome tab (chrome ungated from swap-capability).
- iframe new-tab attempts (target=_blank / window.open) intercepted by the shim
  and prompted -> open as a new browser pane.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
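The canonical state and the transition matrix above can be sketched as a type plus a pure policy function. `BrowserPaneState`'s top-level fields follow the text; the contents of `agentBrowser`, the `Transition` values, and `transitionPolicy` are hypothetical.

```typescript
type RenderMode = "ab-screencast" | "ab-popout" | "iframe";

interface BrowserPaneState {
  url: string;
  renderMode: RenderMode;
  agentBrowser?: { session: string }; // placeholder shape, assumed
}

type Transition = "none" | "trivial" | "drops-inactive-tabs" | "confirm";

function transitionPolicy(from: RenderMode, to: RenderMode, tabCount: number): Transition {
  if (from === to) return "none";
  if (from === "iframe") return "trivial"; // iframe -> ab: spawn at state.url
  if (to === "iframe") return tabCount >= 2 ? "confirm" : "trivial"; // ab -> iframe
  return "drops-inactive-tabs"; // ab <-> ab relaunch keeps only the active tab
}

// Example: a swap request reads the policy before touching renderMode.
function requestSwap(state: BrowserPaneState, to: RenderMode, tabCount: number): Transition {
  return transitionPolicy(state.renderMode, to, tabCount);
}
```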
The agent-browser surface never persisted its URL — it lived only in the live
session/stream — so render-mode swaps and pop-out/auto-revert laundered it from
a chrome snapshot that can be empty mid-relaunch, yielding blank-page swaps and
about:blank auto-reverts.

Mirror the active tab's URL into params.url whenever the chrome snapshot changes,
and read params.url first (falling back to the live snapshot) in popOut/popIn and
in Wall.onSwapRenderMode's agent-browser->embed path. params.url is now the single
source of truth and round-trips through the layout blob.

First increment of the browser-surface unification (docs/specs/dor-iframe.md
-> Path 1, "url is single-homed").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
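The mirror-then-prefer rule can be sketched as two pure functions. The snapshot shape and function names are hypothetical; the point is that an empty mid-relaunch snapshot never overwrites the persisted URL, and transitions read `params.url` before the live snapshot.

```typescript
interface ChromeSnapshot {
  tabs: { url: string; active: boolean }[];
}

// On every snapshot change, copy the active tab's URL into params.url.
// An empty snapshot (e.g. mid-relaunch) leaves params untouched.
function mirrorActiveUrl(params: { url?: string }, snap: ChromeSnapshot): { url?: string } {
  const active = snap.tabs.find((t) => t.active);
  return active && active.url ? { ...params, url: active.url } : params;
}

// popOut / popIn / swap-to-embed: prefer the persisted URL, then the live snapshot.
function urlForTransition(params: { url?: string }, snap: ChromeSnapshot): string | undefined {
  return params.url || snap.tabs.find((t) => t.active)?.url || undefined;
}
```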
The RenderMode enum was screencast | popout | embed. Rename to the engine-prefixed
vocabulary (docs/specs/dor-iframe.md → "Render Backends"): ab-screencast, ab-popout,
iframe. The ab- prefix names the engine (agent-browser), leaving room for a future
engine beside it; iframe is the engine-less DOM embed. Pure rename across the screen
controller, both panels, the Display modal, the header chip, stories, and tests — no
behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nedtwigg and others added 14 commits June 18, 2026 15:32
The agent-browser daemon re-broadcasts the current frame and tab list on
a ~20Hz heartbeat even when nothing changes. The connection forwarded
every message, so a *static* page drove ~20 device-resolution
screenshots/sec — each a child-process spawn (`agent-browser
screenshot`) — plus ~20 `setTabs` re-renders/sec. Worse, each forwarded
frame's screenshot poked the daemon into emitting again, a
self-perpetuating loop that never settled.

Drop byte-identical frame and tab re-broadcasts (djb2 hash of the
payload) before emitting `frame-pulse`/`tabs`, resetting the dedupe
sentinels on reconnect so a fresh stream always re-primes. A genuine
change (animation, navigation, new/closed/focused tab, title) alters the
bytes and flows through untouched.

On a settled static page this takes idle work from ~20/sec to 0 (the
feedback loop breaks and the daemon goes quiet), matching the screenshot
loop's contract: "a static page produces no pulses, so no shots and no
cost."

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
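The dedupe can be sketched with a 32-bit djb2 hash and one sentinel per message kind. `BroadcastDedupe` is a hypothetical name. Hash equality stands in for byte identity here, so a collision could in principle drop a real change, though that is unlikely for frame payloads.

```typescript
// djb2: h = h * 33 + c, kept to unsigned 32 bits.
function djb2(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) h = (Math.imul(h, 33) + s.charCodeAt(i)) >>> 0;
  return h;
}

class BroadcastDedupe {
  private lastFrame: number | undefined;
  private lastTabs: number | undefined;

  // True when the payload differs from the previous one of the same kind.
  shouldEmit(kind: "frame" | "tabs", payload: string): boolean {
    const h = djb2(payload);
    if (kind === "frame") {
      if (h === this.lastFrame) return false;
      this.lastFrame = h;
    } else {
      if (h === this.lastTabs) return false;
      this.lastTabs = h;
    }
    return true;
  }

  // Called on reconnect so a fresh stream always re-primes.
  reset(): void {
    this.lastFrame = undefined;
    this.lastTabs = undefined;
  }
}
```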
Update the skill from a real debugging session:

- Freshness: `close --all` is global (kills the outer session too);
  document the re-open + stray-about:blank-needs-a-second-open recovery.
- Driving: `keyboard` is type/inserttext only (use inserttext — `type`
  reorders chars); submit via synthetic keydown Enter, not `\r`; wrap
  evals in an IIFE; click via mouse move/down/up with a dwell; map clicks
  through the 1:1 canvas→nested-page offset.
- Timing: prefer a `window.__M` global + shell polling over mirror lines;
  note the cmdStart-on-retry and mouse-round-trip skew caveats.
- What To Watch: record the static-page churn root cause + dedupe fix and
  a concrete idle regression check.
- Validation: add the `lib/src` tsc + vitest path and the Vite-alias
  hot-reload note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pane that `dor ab open` creates is not the selected pane — the
terminal the command ran in is. The canvas mouse handlers gated on
`interactive` (passthrough mode AND this pane selected), so the first
click on the browser surface was swallowed: it only *selected* the pane
(via the root `onClickPanel`), and `selectedId` updates a render later,
so the click never reached the page. Clicking a link did nothing until a
second click — felt like a "gigantic delay" before a tab opened.

Gate mouse down/up on passthrough mode alone, so a direct click on the
canvas both selects the pane and passes through (the down/up carry their
own coordinates, so forwarding just those is enough). Keyboard and wheel
still require full `interactive` so a background pane never steals them.

A single click now opens the link's tab immediately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
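The split gate can be sketched as two predicates over a hypothetical pane input state. `interactive` in the text corresponds to `passthrough && selected` here.

```typescript
interface PaneInputState {
  passthrough: boolean; // passthrough input mode is on
  selected: boolean; // this pane is the selected pane
}

// Mouse down/up carry their own coordinates, so passthrough alone suffices:
// the first click both selects the pane and reaches the page.
function forwardsMouseButton(s: PaneInputState): boolean {
  return s.passthrough;
}

// Keyboard and wheel still require full `interactive`,
// so a background pane never steals them.
function forwardsKeyOrWheel(s: PaneInputState): boolean {
  return s.passthrough && s.selected;
}
```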
Clicking a browser tab chip or its × did nothing: a mousedown on the
chip bubbles to the pane's onMouseDown → onClickPanel, which selects the
pane and makes dockview move this panel's DOM. With a real (non-instant)
press, that re-layout lands between mousedown and mouseup, so the node
the press started on is gone by release and the browser never synthesizes
a `click` — selectTab/closeTab were never called (no command, no error).

This only reproduces with a human-length press: synthetic instant
down→up and `element.click()` both dodge it, which is why it slipped
through earlier. Reproduced in the harness by holding the press ~300ms.

Fix: act on mousedown (left button), like the canvas already does via
onCanvasMouseDown — it fires synchronously before the re-layout. Tests
fire mousedown (not click) so a revert to onClick is caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two cleanups from the /simplify review:

- IframePanel: commitUrl and observeFrameUrl differed by a single
  updateParameters line (and repeated the title formatting a third time
  in goToHistoryIndex). Extract applyFrameUrl(url, persist).

- The host's `tab list --json` parser re-implemented the connection's
  parseTabRecord — same tabId/id fallback, same url/active coercion, even
  the same "some CLI builds use id" rationale. Lift the record parse into
  a shared lib/src/lib/agent-browser-tab.ts that both the connection
  (live stream) and the host (CLI) import, so the tab shape has one home.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
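A shared record parser along the lines described above might look like this. The field set, and the exact coercion rules (non-string url becomes "", active via `Boolean`), are assumptions; only the tabId/id fallback and the existence of url/active coercion come from the text.

```typescript
interface TabRecord {
  tabId: string;
  url: string;
  active: boolean;
}

// Accepts tabId or, on some CLI builds, id; returns undefined when neither is present.
function parseTabRecord(raw: unknown): TabRecord | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const r = raw as Record<string, unknown>;
  const id = r.tabId ?? r.id;
  if (typeof id !== "string" && typeof id !== "number") return undefined;
  return {
    tabId: String(id),
    url: typeof r.url === "string" ? r.url : "",
    active: Boolean(r.active),
  };
}
```

Both the live-stream connection and the CLI `tab list --json` path would call this one function, so the tab shape is defined once.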
Two fixes for "clicking a browser tab does nothing":

1. Click-eating, fixed at the source. Selecting a pane on mousedown makes
   dockview move the panel's DOM, and a real (non-instant) press then
   spans that relayout, so the browser never synthesizes the `click` and
   the chip/× silently did nothing. Give every browser surface dockview's
   `renderer:'always'` (rendererForParams) — the same setting the iframe
   already used — so the node stays put and the click survives. The tab
   chip/× revert from the mousedown workaround (cc56e6e) back to onClick.
   (The canvas passthrough gate and the rAF focus stay — verified those
   address first-click selection timing and dockview focus ordering, not
   the DOM move.)

2. Stale canvas on switch. The daemon emits no screencast frame when the
   active tab changes, and the dedup'd stream (1388844) is otherwise
   silent on a static page, so the canvas never repainted onto the
   newly-selected tab. Force one device capture when the active tab id
   changes so the surface follows the switch.
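The "force one capture on active-tab change" rule from item 2 can be sketched as a small gate. It remembers the last active tab id and calls a capture callback only when that id actually changes. `ActiveTabCaptureGate` and `requestCapture` are hypothetical names. Skipping the very first observation is an assumption: it presumes the stream's initial frame already paints the first tab.

```typescript
// Hypothetical helper: requests exactly one forced capture per active-tab switch.
class ActiveTabCaptureGate {
  private lastActiveTabId: string | undefined = undefined;
  private readonly requestCapture: () => void;

  constructor(requestCapture: () => void) {
    this.requestCapture = requestCapture;
  }

  /** Call whenever a tab-list update arrives. */
  onActiveTab(activeTabId: string | undefined): void {
    if (activeTabId === undefined) return; // no active tab reported; keep state
    const previous = this.lastActiveTabId;
    this.lastActiveTabId = activeTabId;
    // First observation is skipped; after that, any change of active tab
    // forces one capture, since the dedup'd stream stays silent on a static page.
    if (previous !== undefined && previous !== activeTabId) this.requestCapture();
  }
}
```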

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two URL-bar lies in the new same-frame location plumbing:

- The proxy shim posted `location` for any anchor click, so Cmd/Ctrl/
  Shift/Alt+click (which open a new tab and leave the frame put) moved
  the parent URL bar + Back history to a page the iframe wasn't showing.
  Guard the same-frame post on modifier keys / primary button.

- After a shim-observed in-frame navigation, params.url stays at the
  source URL, so Back to that URL was a no-op write and the proxy effect
  never re-fired — the frame kept showing the navigated page while the
  chrome showed the Back target. Bump reloadNonce in goToHistoryIndex to
  force a re-resolution (fresh proxy port → real reload), matching the
  reload button.
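Both fixes above reduce to small pure functions, sketched here under stated assumptions. The click guard reads only the standard MouseEvent fields `button`, `metaKey`, `ctrlKey`, `shiftKey`, and `altKey`. The `FrameState` shape and this `goToHistoryIndex` signature are hypothetical stand-ins for the panel's real state. They illustrate why bumping a nonce forces a reload even when `url` is unchanged.

```typescript
// Subset of MouseEvent fields the guard reads (a DOM MouseEvent satisfies it).
interface ClickLike {
  button: number;
  metaKey: boolean;
  ctrlKey: boolean;
  shiftKey: boolean;
  altKey: boolean;
}

/** True only for a plain primary-button click, i.e. one that stays in this frame. */
function isSameFrameClick(e: ClickLike): boolean {
  return e.button === 0 && !e.metaKey && !e.ctrlKey && !e.shiftKey && !e.altKey;
}

// Hypothetical slice of the iframe panel's state.
interface FrameState {
  url: string; // persisted params.url
  history: string[];
  historyIndex: number;
  reloadNonce: number; // any change re-runs the proxy effect
}

function goToHistoryIndex(state: FrameState, index: number): FrameState {
  if (index < 0 || index >= state.history.length) return state;
  return {
    ...state,
    url: state.history[index],
    historyIndex: index,
    // After a shim-observed in-frame navigation, url may already equal the
    // target, so only the nonce changes the effect's inputs and forces a reload.
    reloadNonce: state.reloadNonce + 1,
  };
}
```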

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 19, 2026


Deploying mouseterm with Cloudflare Pages

Latest commit: 15733ef
Status: ✅  Deploy successful!
Preview URL: https://51d2e0e3.mouseterm.pages.dev
Branch Preview URL: https://ab-iframe-unify.mouseterm.pages.dev


@dormouse-bot dormouse-bot left a comment

Collaborator


Both Build & Test and Standalone Smoketest are red on the same tsc error at AgentBrowserPanel.tsx:546 (platform.agentBrowserCommand possibly undefined). The PR description calls this "pre-existing... present before this branch's latest fixes" — but it isn't on main: there connect() resolved the stream URL via getAgentBrowserStreamUrl?.(wsPort) (optional-chained). This branch rewrote connect() to call platform.agentBrowserCommand(...) directly, which introduces the error. The effect already guards with if (!platform.agentBrowserCommand) return; a few lines up, but TS doesn't carry that narrowing of an optional property into the nested connect closure — so the call still needs a local binding. Inline suggestion below makes the build green.
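The narrowing problem and the local-binding fix can be reproduced in isolation. The `Platform` interface and the `agentBrowserCommand(args)` signature below are simplified assumptions, not the real types.ts definitions. Inside the nested closure, `platform.agentBrowserCommand(args)` would fail tsc, because a property-access narrowing is not preserved into a callback. The local `const`, bound after the guard, carries the non-undefined type with it.

```typescript
// Hypothetical platform shape: the command capability is optional on some hosts.
interface Platform {
  agentBrowserCommand?: (args: string[]) => Promise<string>;
}

function makeConnect(platform: Platform): ((args: string[]) => Promise<string>) | undefined {
  if (!platform.agentBrowserCommand) return undefined;
  // Bind after the guard: the const's type is the narrowed, non-undefined one.
  const agentBrowserCommand = platform.agentBrowserCommand;

  const connect = async (args: string[]): Promise<string> => {
    // `return platform.agentBrowserCommand(args);` here would not compile
    // (the optional-property narrowing does not reach this closure).
    return agentBrowserCommand(args);
  };
  return connect;
}
```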

One lower-confidence observation on the new iframe proxy shim, for your call (no fix pushed since the right behavior is a judgment): the injected click handler in iframe-proxy-rewrite.ts posts location: a.href from the capture phase for same-frame primary-button anchor clicks. Capture runs before the page's own bubble-phase handlers, so a same-origin link the page intercepts and cancels (<a href="/logout" onClick={e => e.preventDefault()}>, or an <a> styled as a button that does a fetch instead of navigating) still reports /logout to the parent, which pushes it into Back history and the URL bar though the frame never navigated. SPA <Link> clicks self-correct via the patched pushState, and javascript:/cross-origin hrefs are already dropped by the parent's origin check — so the residual gap is genuinely-cancelled same-origin clicks. If that matters, deferring the post a tick (setTimeout(() => { if (!e.defaultPrevented) post('location', ...) }, 0)) or relying on the navigation hooks instead of the click guess would close it.

Comment thread on lib/src/components/wall/AgentBrowserPanel.tsx (outdated)
- AgentBrowserPanel's popped-out CDP-connect effect called the optional
  platform.agentBrowserCommand directly inside the nested connect()
  closure. TS doesn't carry an optional-property narrowing into the
  closure, so it failed tsc (TS2722/TS18048) and reddened CI. Bind it to
  a local const after the guard.

- The proxy shim's same-frame click handler runs in the capture phase
  and posted location before the page's own handlers, so a same-origin
  link the page cancels (preventDefault, or an <a> that fetches instead
  of navigating) still desynced the parent URL bar + Back history. Defer
  the post a tick and skip it when the click was cancelled; real
  navigations re-report via the next document's shim.
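The deferred post in the second bullet can be sketched as follows. Event dispatch is synchronous, so by the time a `setTimeout(..., 0)` callback runs, every capture- and bubble-phase handler has finished and `defaultPrevented` is final. `reportSameFrameClick` and the `post` callback are hypothetical stand-ins for the shim's message bridge. If the click really navigates, the old document may unload before the timer fires; that is acceptable because the next document's shim re-reports the location.

```typescript
// Hypothetical stand-in for the shim's postMessage bridge to the parent.
type Post = (kind: "location", href: string) => void;

// Minimal event shape; a DOM MouseEvent satisfies it.
interface CancellableClick {
  defaultPrevented: boolean;
}

/** Called from the capture-phase click handler for same-frame primary clicks. */
function reportSameFrameClick(e: CancellableClick, href: string, post: Post): void {
  // Capture runs before the page's own handlers, so defaultPrevented is not
  // final yet. Wait one task, then skip the post if the page cancelled it.
  setTimeout(() => {
    if (!e.defaultPrevented) post("location", href);
  }, 0);
}
```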

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nedtwigg and others added 2 commits June 19, 2026 12:43
The two surfaces are now one `browser` surface with a swappable renderer
(ab-screencast / ab-popout / iframe), so describe them in one spec:
shared shell first (chrome, canonical pane state, render swap, lifecycle,
host pattern), then each renderer. Removes the cross-file duplication
(the 3x "full browser-chrome tab" line, the host-pattern echo, the
transitions/tab-drop restatement) by giving each shared concept a single
owner. Repoints dor-cli.md links and all code-comment doc-pointers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the two blocks that re-narrated the code — the implementation
touchpoints table and the per-method host-capability descriptions —
with a compact code map and one-line method notes (signatures already
live in types.ts). Trim restated header-layout bullets, the dev-server
mechanics, the two-axes render bullet, and the profile-persistence
future-work item (each duplicated a fact stated elsewhere). Also drop
stray </content>/</invoke> tags that leaked into the file tail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nedtwigg and others added 2 commits June 19, 2026 13:14
Rewrite the unified browser spec as a terse contract: Title-case sections,
a per-section "Source of truth" file list, flat BrowserPanelParams, and
declarative invariants/header-contract/swap-behavior tables. Delegates
mechanics to the code instead of paraphrasing them (~440 lines).

The contract rewrite renamed sections, orphaning the "→ <section>" anchors
in the code comments that point at dor-browser.md. Re-point all of them to
the new headings (23 files) and clean a residual sub-anchor.

Restore four traps the condensation dropped that aren't recoverable from
the code: screencast is DIP/CSS-only so owning CDP wouldn't help; VK map
must not be key.charCodeAt(0) (. = VK_DELETE); pop-out must not query the
daemon in the close/reopen gap (spawns a blank daemon); sync is
last-writer-wins, disengaged only after a frame confirms the issued size.
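The VK-map trap is easy to see in a sketch. `"."` has char code 0x2E, which is also the Windows code for VK_DELETE, whereas the period key is VK_OEM_PERIOD (0xBE). The map below assumes a US keyboard layout. It also assumes the code feeds a Windows-style virtual-key field such as the `windowsVirtualKeyCode` parameter of CDP's `Input.dispatchKeyEvent`. `virtualKeyCodeFor` is a hypothetical name, not the project's function.

```typescript
// Windows virtual-key codes (US layout) for printable keys that need an
// explicit entry because their char code collides with another VK.
const PUNCTUATION_VK: Record<string, number> = {
  " ": 0x20, // VK_SPACE
  ";": 0xba, // VK_OEM_1
  "=": 0xbb, // VK_OEM_PLUS
  ",": 0xbc, // VK_OEM_COMMA
  "-": 0xbd, // VK_OEM_MINUS
  ".": 0xbe, // VK_OEM_PERIOD (charCodeAt would give 0x2E = VK_DELETE)
  "/": 0xbf, // VK_OEM_2
  "`": 0xc0, // VK_OEM_3
  "[": 0xdb, // VK_OEM_4
  "\\": 0xdc, // VK_OEM_5
  "]": 0xdd, // VK_OEM_6
  "'": 0xde, // VK_OEM_7
};

function virtualKeyCodeFor(key: string): number | undefined {
  // Named keys (Enter, ArrowLeft, ...) would need their own table.
  if (key.length !== 1) return undefined;
  // Only A-Z and 0-9 share their VK code with the uppercase ASCII code.
  const upper = key.toUpperCase();
  if (/^[A-Z0-9]$/.test(upper)) return upper.charCodeAt(0);
  return PUNCTUATION_VK[key]; // undefined for unmapped (e.g. shifted) characters
}
```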

Also fix the iframe-focus source-of-truth paths: use-window-focused.ts is
under components/wall/, and registerSurfaceFocusHandle is defined in
terminal-lifecycle.ts (terminal-registry only re-exports it).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The iframe panel calls the host capability through a detached reference
(`const createProxy = getPlatform().createIframeProxyUrl; createProxy(url)`),
which drops `this`. BrowserSidecarAdapter's methods reach `this.host.invoke`,
so the detached call threw "Cannot read properties of undefined (reading
'host')" — caught and surfaced as a bogus "Couldn't reach the server" error
page in every `dor iframe` pane under the standalone agent-browser dev harness.

The Tauri adapter is `this`-free (module-level rawInvoke) and the VS Code
adapter already binds for exactly this reason; BrowserSidecarAdapter was the
lone adapter that didn't honor the detached-call convention. Bind its
this-using methods in the constructor, mirroring vscode-adapter.
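The detached-call convention can be shown with a reduced adapter. The `SidecarHost` interface, the `"create_iframe_proxy"` command string, and the class name are hypothetical. The point is the constructor bind: it makes `const createProxy = adapter.createIframeProxyUrl; createProxy(url)` keep its receiver. Without the bind, the same detached call would hit `this === undefined` in class (strict-mode) code and throw on `this.host`.

```typescript
// Hypothetical stand-in for the sidecar host the adapter talks to.
interface SidecarHost {
  invoke(command: string, payload: unknown): Promise<string>;
}

class BrowserSidecarAdapterSketch {
  private host: SidecarHost;

  constructor(host: SidecarHost) {
    this.host = host;
    // Bind this-using methods so callers may detach them safely.
    this.createIframeProxyUrl = this.createIframeProxyUrl.bind(this);
  }

  createIframeProxyUrl(url: string): Promise<string> {
    return this.host.invoke("create_iframe_proxy", { url });
  }
}
```

Binding in the constructor keeps the methods on the class (so types and docs stay in one place) while matching how the VS Code adapter already behaves.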

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@nedtwigg nedtwigg merged commit 7f88741 into main Jun 19, 2026
9 checks passed
@nedtwigg nedtwigg deleted the ab-iframe-unify branch June 19, 2026 21:08
dormouse-bot added a commit that referenced this pull request Jun 19, 2026
#156 removed dor-iframe.md and dor-agent-browser.md, replacing both with
dor-browser.md. Merge main and collapse the two index entries into one
pointing at the unified browser surface spec.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Labels

None yet

Projects

None yet


2 participants