diff --git a/.claude/skills/debug-standalone-agent-browser/SKILL.md b/.claude/skills/debug-standalone-agent-browser/SKILL.md new file mode 100644 index 00000000..8b9aceeb --- /dev/null +++ b/.claude/skills/debug-standalone-agent-browser/SKILL.md @@ -0,0 +1,191 @@ +--- +name: debug-standalone-agent-browser +description: Use when debugging Dormouse standalone behavior through the browser-based agent-browser harness instead of Tauri. Covers launching a fresh `pnpm dev:standalone:ab` run, observing sidecar and in-browser logs together, driving the UI with `agent-browser`, clearing stale nested browser sessions, and timing agent-browser screencast/tab behavior. +--- + +# Debug Standalone With Agent Browser + +Use this skill when you need to run Dormouse standalone in a normal browser so you can inspect sidecar logs, browser console logs, DOM state, screenshots, and user interactions from the same debugging session. + +## Harness + +Run from the repo root: + +```sh +DORMOUSE_BROWSER_DEV_AB_SESSION=dormouse-debug-$(date +%s) \ +DORMOUSE_BROWSER_DEV_VITE_PORT=1550 \ +DORMOUSE_BROWSER_DEV_HOST_PORT=1552 \ +pnpm dev:standalone:ab +``` + +The harness: + +- stages the `dor` CLI and sidecar proxy +- starts the standalone Node sidecar directly +- starts a localhost HTTP/SSE bridge for browser-side `PlatformAdapter` calls +- starts Vite with `VITE_DORMOUSE_BROWSER_DEV_HOST` +- opens the app in `agent-browser` +- mirrors browser console logs as `[browser log] ...` in the harness terminal + +Use unique `DORMOUSE_BROWSER_DEV_AB_SESSION`, Vite port, and host port for repeat runs to avoid stale outer-browser state and port collisions. + +## Freshness + +Before a measurement, clear any stale nested agent-browser session used by Dormouse surfaces: + +```sh +agent-browser --session dormouse.1.default close --all +``` + +This matters because `dor ab open ...` uses a nested agent-browser session such as `dormouse.1.default`. 
If it has old tabs, the first stream snapshot can be polluted with stale URLs. + +**`close --all` is global, not per-session.** Despite the `--session` flag, it closes *every* agent-browser session — including the outer harness session the app runs in. That is actually the cleanest way to get a fresh blank Dormouse, but you must then re-open the outer session yourself: + +```sh +agent-browser --session dormouse.1.default close --all # clears nested AND outer +agent-browser --session open "http://localhost:/" +``` + +The first `open` after a `close --all` frequently lands on `about:blank` instead of navigating (the stray-about:blank race). **Issue `open` a second time** and poll until the URL sticks and the xterm input exists: + +```sh +agent-browser --session open "http://localhost:/" # often needed twice +agent-browser --session eval '(()=>(!!document.querySelector("textarea.xterm-helper-textarea")&&location.href.indexOf("")>-1)?"ready":"no")()' +``` + +Browser console mirroring (`[browser log] ...`) keeps working after a manual re-open, so you don't lose log visibility. + +Stop any running harness with Ctrl-C (or `pkill -f dev-agent-browser.mjs`) before starting another one. Do not leave background dev servers running after a timing run. + +## Driving Dormouse + +Use the outer harness session printed by `dev:standalone:ab`: + +```sh +agent-browser --session snapshot -i +agent-browser --session get text body +agent-browser --session screenshot /private/tmp/dormouse.png +``` + +### Command/mouse subcommands are limited + +- `agent-browser keyboard` accepts only `type` and `inserttext` (there is **no** `keyboard press`). +- `agent-browser mouse` accepts only `move`, `down`, `up`, `wheel` (there is **no** `mouse click`). + +### Typing into xterm + +`keyboard type "..."` simulates per-keystroke events and **reorders characters under load** (you get `dor ab opne dormouse.sh`). 
Use `keyboard inserttext` (atomic) instead, and always read the input line back to verify before submitting: + +```sh +agent-browser --session eval '(()=>{document.querySelector("textarea.xterm-helper-textarea")?.focus();return"f"})()' +agent-browser --session keyboard inserttext "dor ab open dormouse.sh" +# verify the line, then clear with raw Ctrl-U ($'\025') and retype if it is wrong +agent-browser --session eval '(()=>{var r=document.querySelector(".xterm-rows");return r?r.innerText.split("\n").filter(l=>l.trim()).slice(-1)[0]:""})()' +``` + +### Submitting (Enter) + +`keyboard type $'\r'` and `keyboard type "\n"` are **unreliable** here — they frequently fail to submit. The robust submit is a **synthetic `keydown` Enter dispatched to the helper textarea**, and it still sometimes needs a retry, so loop until the command's output appears: + +```sh +agent-browser --session eval '(()=>{var ta=document.querySelector("textarea.xterm-helper-textarea");ta.focus();["keydown","keypress","keyup"].forEach(function(t){ta.dispatchEvent(new KeyboardEvent(t,{key:"Enter",code:"Enter",keyCode:13,which:13,bubbles:true,cancelable:true}));});return"enter"})()' +# poll the xterm rows for the expected output (e.g. "A dormouse knows when to wake up"); re-dispatch if absent +``` + +### eval gotcha + +`agent-browser eval` runs in a **persistent context**, so top-level `const`/`let` leak across calls (`Identifier 'r' has already been declared`). **Wrap every eval body in an IIFE** — `(()=>{ ... })()`. + +### Clicking a link inside the screencast + +The screencast canvas forwards real pointer events to the nested page, and the nested page viewport is **1:1 with the canvas intrinsic size**, so mapping is just an offset: + +1. Get the outer canvas box: `agent-browser --session eval '(()=>{var c=document.querySelector("canvas");var r=c.getBoundingClientRect();return JSON.stringify({x:r.x,y:r.y,iw:c.width,ih:c.height})})()'` +2. 
Find the link's center in the **nested** session DOM (the real page): `agent-browser --session dormouse.1.default eval '(()=>{var a=[...document.querySelectorAll("a")].find(x=>/github\.com/i.test(x.href));var r=a.getBoundingClientRect();return JSON.stringify({cx:r.x+r.width/2,cy:r.y+r.height/2})})()'` +3. Outer click point = `canvas.x + nested.cx`, `canvas.y + nested.cy`. Click with move → down → up, and **dwell ~0.1–0.2s between down and up** — a too-fast click on an *idle* (quiet) daemon does not register: + +```sh +agent-browser --session mouse move +agent-browser --session mouse down ; sleep 0.12 ; agent-browser --session mouse up +``` + +Confirm the click landed against the **real** daemon, not just the panel: `agent-browser --session dormouse.1.default tab list`. + +## Timing Pattern + +Install a page-local timing probe with `agent-browser eval` before the action under test. **Store marks in a global (`window.__M`) and poll it from the shell** — this is more reliable than parsing `[browser log] [measure] ...` mirror lines, and gives you exact deltas. Keep it simple: + +- record `performance.now()` into `window.__M.` at each event +- use a `MutationObserver` plus a short `setInterval` to watch DOM titles/text/canvas +- intercept `console.log` to catch `[ab-panel] tabs msg` and parse its `t` array — that is the precise signal for tab open/resolve (e.g. 
`t.length>=2`, or an entry matching `github.com`) +- poll with `agent-browser eval '(()=>JSON.stringify(window.__M))()'` + +Useful marks: + +- `command-enter-start`: immediately before submitting `dor ab open ...` +- `first-visible-canvas`: first visible non-zero canvas frame +- `page-title-loaded`: a `[title]` attribute equals the page's real `<title>` +- `github-click-start`: immediately before clicking the GitHub link +- `tabs-two`: first `[ab-panel] tabs msg` whose `t` array has ≥2 entries +- `tabs-github-resolved`: first tab entry whose URL is `github.com/diffplug/dormouse` + +Caveats that corrupt timings: + +- **Set `command-enter-start` atomically inside the same eval that dispatches the synthetic Enter** (zero skew). Because submitting often needs retries, do **not** naively reset the start mark each retry: a stale or overwritten start mark yields nonsense deltas (e.g. tens of seconds). Only count marks that fire after the *successful* submit. +- The shell→`mouse down`/`up` round-trip adds ~100–150 ms; a click-to-tabs number measured this way is an upper bound. + +For click targeting, see **Clicking a link inside the screencast** above (1:1 canvas→page mapping; locate the link via the nested session DOM). + +## What To Watch + +In the harness terminal, correlate: + +- `[sidecar] ...` for sidecar behavior +- `[browser log] [ab-panel] connecting stream ...` +- `[browser log] [ab-panel] tabs msg ...` +- `[browser log] [agent-browser] screenshot start/done ...` +- `[browser log] [measure] ...` + +For a clean `dor ab open dormouse.sh`, the first tab snapshot should look like one active tab: + +```text +[ab-panel] tabs msg {"t":["t1:A:https://dormouse.sh/"]} +``` + +If the first snapshot already contains GitHub or multiple Dormouse tabs, clear the nested session and rerun. + +### Static-page screenshot churn (diagnosed + fixed) + +A *static* page should produce **zero** `screenshot start/done` and **zero** `tabs msg` once it settles.
If you see them repeating (~20/sec) with unchanged tab snapshots, that is the known churn bug. + +Root cause: the external agent-browser **daemon re-broadcasts the current frame and tab list on a ~20Hz heartbeat even when nothing changes**. Each forwarded frame triggers a device-resolution screenshot — *a child-process spawn* (`agent-browser screenshot`) — which pokes the daemon into emitting again: a self-perpetuating feedback loop. Each redundant `tabs` message also forces a `setTabs` React re-render. + +Fix (in `lib/src/components/wall/agent-browser-connection.ts`): drop **byte-identical** frame and tab re-broadcasts (djb2 hash of the payload) before emitting `frame-pulse` / `tabs`, resetting the dedupe sentinels on reconnect. A genuine change (animation, navigation, new/closed/focused tab, title) alters the bytes and flows through. See `agent-browser-connection.test.ts` for the dedupe + reconnect-reprime tests. + +**Regression check:** open `dormouse.sh`, let it settle ~4s, then over 10s of idle confirm `grep -c "screenshot start"` and `grep -c "tabs msg"` are both **0**. To re-measure the daemon's raw vs. forwarded rate, temporarily add a 2s-window counter in the connection (count frames/tabs seen vs. dropped-as-duplicate) — on a static page it reads ~39 seen / 39 dropped per 2s before the dedupe takes effect. + +## Validation + +After changing the harness, run: + +```sh +node --check standalone/scripts/dev-agent-browser.mjs +pnpm --filter dormouse-standalone build +``` + +After changing webview/lib code under `lib/src` (e.g. the agent-browser connection, panel, or screenshot loop), run from `lib/`: + +```sh +npx tsc --noEmit -p tsconfig.json +npx vitest run src/components/wall # whole wall suite, or the specific *.test.ts +``` + +`lib/src` is served to the standalone app directly via a Vite alias (`dormouse-lib` → `lib/src`), so these changes hot-reload — re-open the outer session to pick them up, no sidecar rebuild needed. 
(Only host-side code bundled into the sidecar, e.g. `agent-browser-host.ts`, needs a sidecar rebuild + restart.) + +If the `dor` command fails inside the staged sidecar with missing package imports, confirm the harness uses: + +```js +standalone/sidecar/dor-cli/dist/dor.js +``` + +not `dist/cli.js`. diff --git a/.gitignore b/.gitignore index ba812d73..1ed41e5f 100644 --- a/.gitignore +++ b/.gitignore @@ -29,6 +29,7 @@ standalone/src-tauri/gen/ standalone/dist/ standalone/sidecar/dor-cli/ standalone/sidecar/iframe-proxy.cjs +standalone/sidecar/agent-browser-host.cjs standalone/sidecar/node_modules/ standalone/node_modules/ diff --git a/docs/specs/dor-agent-browser.md b/docs/specs/dor-agent-browser.md deleted file mode 100644 index 2edfa585..00000000 --- a/docs/specs/dor-agent-browser.md +++ /dev/null @@ -1,627 +0,0 @@ -# Dor Agent-Browser Surface - -> See `docs/specs/glossary.md` for canonical Session and Pane vocabulary, and -> `docs/specs/dor-cli.md` for the shared `dor` CLI, surface handle model, and -> host control plumbing this surface builds on. - -`dor agent-browser` (alias `dor ab`) shows a live, interactive browser **inside** -Dormouse by delegating 100% to the user's own -[agent-browser](https://github.com/vercel-labs/agent-browser) install. It is a -**viewer client, not a fork**: every piece of browser behavior — Chromium, CDP, -the screencast, the entire command surface — stays in `agent-browser`. Dormouse -adds only a thin surface that renders the session, forwards input, and presents -tabs. We reimplement none of agent-browser's behavior, the same way an HTTP -client is not a fork of the server. - -This is the chosen alternative to the iframe surface (see -[dor-iframe.md](dor-iframe.md)). Because the browser -renders to a Dormouse-owned `<canvas>` rather than a cross-origin `<iframe>`, -Dormouse keeps its own keydown listener and never loses focus control: the -keyboard model that breaks for iframes does not apply here. 
- -## Delegation Boundary - -`dor ab` resolves the user's `agent-browser` binary on `PATH` (override with -`DORMOUSE_AGENT_BROWSER_BIN`). It is **not bundled or vendored**; if it is -missing, the command fails with an install hint -(`npm i -g agent-browser`). The version is therefore always the user's own — -commands and the stream protocol are version-matched by construction. - -`dor ab <args...>` is a near-transparent passthrough to `agent-browser <args...>`. -Dormouse intercepts exactly one flag — `--key` (below) — translating it to an -`agent-browser --session` selector; every other argument is forwarded verbatim, -including subcommands that do not exist yet. Three behaviors are delegated rather -than reimplemented: - -| Concern | Delegated to | -| --- | --- | -| Video (frames) | agent-browser session **stream** WebSocket, as change signals (see Channels → Frames) | -| Input (mouse/keyboard) | the same stream WebSocket's native **`input_*`** messages | -| Tabs | stream `tabs` messages (read) + **`tab list` / `tab <n>` / `tab close`** (act) | - -## The `--key` Model - -`--key <name>` is the primary interface. It is **workspace-scoped** and defaults -to `--key default`, so a human running `dor ab` and a coding agent running -`dor ab` from any terminal in the same workspace land on the **same** browser -surface. This is the 80% case: one browser everyone iterates on. A second -concurrent browser is one flag away: - -``` -dor ab open http://localhost:5173 # → key "default" -dor ab --key storybook open http://localhost:6006 -dor ab click @e3 # drives key "default" -dor ab --key storybook reload # drives key "storybook" -``` - -Workspace scoping is automatic: `dor ab` routes its control request to the Wall -that owns the invoking terminal surface, and the Wall is per-workspace, so key -resolution is scoped to the right workspace with no extra plumbing. 
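The passthrough described above can be sketched as a small argument rewrite. This is a minimal illustration, not Dormouse's actual code: `translateArgs`, `KEY_PATTERN`, and `WORKSPACE_ID` are hypothetical names, and the sketch assumes `--key` is always given as two separate arguments (`--key name`). It borrows the session naming, key validation, and `--key`/`--session` exclusivity rules from the subsections that follow.

```typescript
// Hypothetical helper names; a sketch of the `--key` interception only.
const KEY_PATTERN = /^[A-Za-z0-9._-]+$/;
const WORKSPACE_ID = "1"; // hardcoded in the spec until real workspaces exist

function translateArgs(args: string[]): string[] {
  const rest: string[] = [];
  let key: string | undefined;
  let hasRawSession = false;

  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === "--key") {
      const value = args[i + 1];
      if (value === undefined) throw new Error("--key requires a name");
      key = value;
      i++; // skip the value we just consumed
    } else {
      if (arg === "--session") hasRawSession = true;
      rest.push(arg); // everything else is forwarded verbatim
    }
  }

  if (key !== undefined && hasRawSession) {
    throw new Error("--key and --session are mutually exclusive");
  }
  // Raw session: bring-your-own name, no namespacing applied.
  if (hasRawSession) return rest;

  const effectiveKey = key === undefined ? "default" : key;
  if (!KEY_PATTERN.test(effectiveKey)) {
    throw new Error("invalid key: " + effectiveKey);
  }
  return ["--session", "dormouse." + WORKSPACE_ID + "." + effectiveKey].concat(rest);
}
```

Under these assumptions, `dor ab --key storybook reload` would run `agent-browser --session dormouse.1.storybook reload`, while a bare `dor ab open <url>` lands on `dormouse.1.default`.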
- -### Key → session naming - -A managed `--key` maps to a namespaced agent-browser session: - -``` -session = "dormouse.<workspaceId>.<key>" -``` - -`<workspaceId>` is hardcoded `1` until Dormouse exposes real workspaces (see -`dor-cli.md` → Handle Model); it is encoded now to avoid a later rename. -Namespacing keeps managed keys from colliding with sessions a user created -directly via plain `agent-browser`. Dots, not slashes: agent-browser session -names become socket paths, and a `/` in the name kills the daemon on startup -(verified against 0.27.0). Keys are validated to `[A-Za-z0-9._-]+` for the -same reason. - -### `--key` vs raw `--session` - -`--key` (managed, namespaced) and `--session` (attach to a session by its literal -agent-browser name) are **mutually exclusive**. `--key default` applies only when -neither is given. `--session <raw>` is the bring-your-own escape hatch for -attaching to a session some other tool created; Dormouse still opens/reuses a -surface for it but performs no namespacing. - -## Session ↔ Surface Mapping - -The session name is the single source of truth. The Wall holds a registry: - -``` -key (or raw session) → { session, surfaceId } -``` - -- **1:1, auto-managed.** The first `dor ab` for a session with no surface creates - a browser surface (split next to the caller, same placement rule as - `dor iframe`). Later commands for that session reuse it. No 1:many mirroring. -- **Two namespaces, reconciled.** Every other `dor` command addresses a *surface* - (`surface:3`, `title:…`); agent-browser addresses a *session*. Driving the - browser is **session-keyed exclusively** (via `--key`/`--session`). Layout - commands (`dor split`, `dor kill`, move) still treat the pane as an ordinary - surface. The pane is addressable two ways for two purposes; there is no - dual-identity ambiguity because only one namespace ever drives the browser. 
-- **Targeting by surface is supported but secondary.** A surface ref resolves - *to* its bound session; `--key` remains the primary interface. - -## Tabs - -A session may have any number of tabs (page targets). Dormouse has no tab model -and gains none: **one session is always exactly one surface**, regardless of tab -count. Tabs live entirely inside that surface's chrome. - -- **Integrated mode (1 tab):** the page title sits in the Dormouse surface - header. No tab strip — the pretty, default case. -- **Multi-tab mode (≥2 tabs):** a tab strip renders *below* the Dormouse header, - inside the surface body (title + close `×` per tab; no manual "+", and no - favicons — the webview CSP blocks arbitrary external images). The strip is a - thin view over the stream's pushed `tabs` messages; selecting a tab issues - `tab <tabId>` (the frame stream and input follow the active target because - "active tab" is an agent-browser operation); the `×` issues - `tab close <tabId>`. When the session returns to one tab, the surface drops - back to integrated mode. -- **Orthogonal to minimize.** Internal tab count is invisible when the surface is - minimized: title-only along the bottom whether it holds 1 tab or 9. Dormouse's - binary "you're looking at it or you're not" model is preserved. - -Tab behaviors: - -- **`dor ab open <url>` navigates the active tab; it does not spawn one.** New - tabs arrive only from the web (popups, `target=_blank`) or an explicit - `dor ab tab new`. The agent drives the active tab; the web spawns extras — - which is what naturally moves the surface into multi-tab mode. -- **Web-opened tabs are focused** (enter multi-tab mode, select the newest), - matching typical browser foregrounding; reversible by clicking back. Dormouse - does not fight the web's popup / open-in-new-tab behavior. 
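The tab rules above reduce to a pure function of the latest `tabs` message. The sketch below is illustrative only: `surfaceMode` and `tabCommand` are hypothetical names, the `Tab` shape follows the stream message described under Channels, and treating an empty tab list as integrated mode is an assumption (the spec names only the 1-tab and ≥2-tab cases).

```typescript
// Shape of one entry in a stream `tabs` message, per the Channels section.
interface Tab {
  tabId: string;
  title: string;
  url: string;
  active: boolean;
}

type SurfaceMode = "integrated" | "multi-tab";

// One session is always one surface; only the chrome changes with tab count.
function surfaceMode(tabs: Tab[]): SurfaceMode {
  return tabs.length >= 2 ? "multi-tab" : "integrated";
}

type TabAction =
  | { kind: "select"; tabId: string }
  | { kind: "close"; tabId: string };

// Arguments forwarded to agent-browser for a strip interaction:
// selecting issues `tab <tabId>`, the close button issues `tab close <tabId>`.
function tabCommand(action: TabAction): string[] {
  return action.kind === "select"
    ? ["tab", action.tabId]
    : ["tab", "close", action.tabId];
}
```

Because the mode is derived from the message rather than stored, the surface falls back to integrated mode on its own when the session returns to one tab.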
- -## Browser-Chrome Header - -The browser surface's header reads like a browser: the active tab's **URL** (not -its HTML `<title>`), Chrome-style nav controls, and the one thing only Dormouse -can show — which pane in the workspace is serving a localhost URL. All of this is -**browser-surface only**, gated on the screen-controller presence exactly like -the SYNCED/SCALED chip; terminals and iframes keep their plain title header. The -header is shared (`SurfacePaneHeader.tsx`) and already tight and responsive. - -### Layout — mirror Chrome's toolbar - -Left→right, matching a real browser so it reads as "browser-ish": - -``` -┌──────────────────────────────────────────────────────────────────────────────┐ -│ ⤢ ← → ⟳ (storybook) localhost:5173 ◉ pnpm dev ⬍ ⬌ ⤢ _ ✕ │ -└──────────────────────────────────────────────────────────────────────────────┘ - sync back/fwd/ key URL dev-server split/zoom min/ - chip refresh badge (host+path) connection (collapse) kill -``` - -- **Sync chip → far left.** The SYNCED/SCALED icon (`FrameCorners`/`Resize`, - click → screen modal) sits at the very left edge, out of the way of the nav - controls. Behavior unchanged. -- **Back / forward / refresh** sit where Chrome puts them, immediately left of - the URL. -- **URL is the primary text**, replacing the HTML title. A flexible spacer lives - after the URL/connection so the layout buttons stay right-aligned. - -Priority order under width pressure: **sync + URL/connection always visible; nav -buttons collapse next (below ~360px); split/zoom collapse first (below 420px); -kill always stays.** - -### URL over HTML title - -The header's primary text is the active tab's **URL (host + path)**; the HTML -`<title>` is **demoted to the tooltip**. The URL already rides the `tabs` stream. -The persisted panel title (door labels, session save) stays the tab's display -title — the URL preference is a live-header concern only, so the multi-tab strip -still shows HTML titles to tell tabs apart. 
Both flow body→header through the -existing screen controller's separate **chrome snapshot** channel (URL / key), -kept distinct from the screen snapshot so tab updates don't churn the -SYNCED/SCALED chip and vice versa. - -**Click to navigate.** Clicking the URL opens an inline editor (the -terminal-rename pattern) pre-filled with the full URL, all selected: **Enter** -navigates (`open <url>`, scheme-normalized — `http://` for loopback so a bare -`localhost:5173` doesn't SSL-error, `https://` otherwise); **Escape**/blur -cancels, browser-omnibox style. While it's open the surface flags -dialog-keyboard so the Wall's chord handler stands down, and the panel's -key-forwarder skips editable targets so keystrokes reach the field, not the page. - -### `--key` badge - -The `--key` (default `default`) is what `dor ab --key …` targets, so with two or -more browser surfaces it's exactly what you need to see. A small badge renders -for **non-default keys only** (`default` is skipped), as a **separate element — -not a string prefix on the title** — because the title is persisted and we don't -want `(storybook)` leaking into saved state. It rides the chrome snapshot from -`params.key`; raw `--session` surfaces (no key) show no badge. - -### Dev-server connection - -When the active tab URL is **loopback** (`localhost` / `127.0.0.1` / `[::1]` / -`*.localhost`), Dormouse correlates `<port>` to the **terminal pane serving it** -and surfaces a clickable chip — e.g. `◉ pnpm dev :5173` — that **focuses that -terminal** on click (reattaching it first if it's minimized). Dormouse is the -only tool that owns both the browser surface and the terminals, and the building -block is `PlatformAdapter.getOpenPorts(id)` (the TCP ports a terminal's process -tree is listening on). 
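The loopback test drives both the scheme normalization in the URL editor and the dev-server chip. A minimal sketch of both, assuming hypothetical helper names (`hostOf`, `isLoopbackHost`, `normalizeUrl`) and ignoring userinfo (`user@host`) in the authority:

```typescript
const SCHEME = /^[a-z][a-z0-9+.-]*:\/\//i;

// Extract the host from a possibly scheme-less URL such as "localhost:5173/x".
function hostOf(input: string): string {
  const withoutScheme = input.replace(SCHEME, "");
  const authority = withoutScheme.split(/[/?#]/)[0];
  if (authority.charAt(0) === "[") {
    // Bracketed IPv6 literal: keep the brackets, drop any ":port" suffix.
    const end = authority.indexOf("]");
    return (end === -1 ? authority : authority.slice(0, end + 1)).toLowerCase();
  }
  return authority.split(":")[0].toLowerCase();
}

// localhost / 127.0.0.1 / [::1] / *.localhost, as listed in the spec.
function isLoopbackHost(host: string): boolean {
  return (
    host === "localhost" ||
    host === "127.0.0.1" ||
    host === "[::1]" ||
    /\.localhost$/.test(host)
  );
}

// http:// for loopback so a bare localhost:5173 does not SSL-error,
// https:// otherwise; an explicit scheme is left untouched.
function normalizeUrl(input: string): string {
  const trimmed = input.trim();
  if (SCHEME.test(trimmed)) return trimmed;
  return (isLoopbackHost(hostOf(trimmed)) ? "http://" : "https://") + trimmed;
}
```

The scheme pattern requires `://`, so a bare `localhost:5173` is not misread as a URL whose scheme is `localhost`.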
- -Mechanics & wrinkles: - -- **Where it lives.** A panel can't see other panes' ports, so correlation lives - in the **Wall** (`use-dev-server-ports.ts` driving a shared store, - `agent-browser-ports.ts`); the header consumes the resolved `{ paneId, label }` - and clicks back into the Wall (`onFocusPane`) to focus the pane. -- **Which binds match.** A pane owns the port when it listens on it with a - localhost-reachable bind — loopback (`127.0.0.1` / `::1`) **or** any-interface - (`0.0.0.0` / `::`, which still answers `localhost`). A bind on one specific - non-loopback interface does not match. -- **Cost — strictly off the hot path.** `getOpenPorts` shells out (`lsof` / - PowerShell) on the host that also drives the screencast, so a scan never runs - synchronously on tab-open: it's **debounced + idle-scheduled** - (`requestIdleCallback`) so the opening tab's first screenshots come first. It - **scans once, then settles** — a matched port is remembered and not rescanned; - we only keep retrying (slow idle poll) while a wanted port is still *unmatched* - (the dev server may start after the tab). A **surface reload** un-settles and - re-validates, but optimistically — the current chip stays until the rescan - disagrees. At most one scan is in flight; visible panes and minimized doors are - both scanned (both keep live ptys). -- **Fallbacks (degrade to just the URL):** non-loopback URL; no pane listening on - the port; a bind on a specific non-loopback interface; a tunneled/proxied - domain; or two+ panes claiming the port (ambiguous). -- **Bidirectional (later):** a terminal serving a port could conversely show - "viewed in `surface:3`". Out of scope for now; the port store would make it - cheap. - -### Back / forward / refresh - -- **All three are native agent-browser commands** — `back`, `forward`, `reload` - — added to the `agentBrowserCommand` allowlist and issued like tab actions, no - eval fallback. 
-- **No enabled-state.** `canGoBack` / `canGoForward` aren't in the stream, so the - buttons are **always enabled** (a click at the ends no-ops) rather than greyed, - matching most embedded browsers. They are inert on hosts without - `agentBrowserCommand` (Tauri today), like the screen-modal resizes. - -## Screen Indicator & Viewport - -The surface viewport is governed by agent-browser's own `set viewport` / `set -device`. Dormouse does not invent a parallel "mode" enum; instead the header -carries a **two-state indicator that reflects reality**, and a modal that is -nothing more than a GUI front-end for those native `set` commands. - -### The indicator (SYNCED / SCALED) - -At the **far left of the header** (see Browser-Chrome Header), the chip shows one -of two derived states — never a stored mode: - -- **`SYNCED`** — the browser's live viewport (CSS pixels) equals the pane's CSS - pixel size, so the display maps 1:1 with no scaling. -- **`SCALED`** — anything else; the display is letterboxed/zoomed to fit the pane. - -The viewport is read from the stream (`status.viewportWidth/Height`, equal to -frame `metadata.deviceWidth/Height`) and compared against the pane's CSS size -(`getBoundingClientRect`). **DPR is not part of the comparison:** the screencast -is delivered at CSS-pixel resolution, so it never encodes the browser's device -pixel ratio (verified 0.27.0 — `set viewport 800 600 2` yields the same 800×600 -frame as `@1`); it is therefore unrecoverable from frames. Dormouse still -*issues* `displayDpr` when syncing so the page renders at the right density, but -the indicator is a pure CSS-size match. Because it is derived, the indicator is -correct no matter *how* the viewport was set — modal, `dor ab set …`, or a raw -`agent-browser` call. `SYNCED` is simply the case where the viewport equals the -pane. There is **no keyboard shortcut**. 
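Since the indicator is derived rather than stored, it can be sketched as a pure comparison. The names below (`screenState`, `fitScale`) are hypothetical, and rounding the pane's fractional `getBoundingClientRect` size before comparing is an assumption of this sketch, not something the spec states.

```typescript
interface Size {
  width: number;
  height: number;
}

type ScreenState = "SYNCED" | "SCALED";

// viewport: live browser viewport in CSS pixels (from stream `status`).
// paneCss: the pane's CSS pixel size. DPR plays no part in the comparison.
function screenState(viewport: Size, paneCss: Size): ScreenState {
  const same =
    viewport.width === Math.round(paneCss.width) &&
    viewport.height === Math.round(paneCss.height);
  return same ? "SYNCED" : "SCALED";
}

// Aspect-preserving scale used to fit a SCALED viewport inside the pane.
function fitScale(viewport: Size, paneCss: Size): number {
  return Math.min(
    paneCss.width / viewport.width,
    paneCss.height / viewport.height
  );
}
```

For the modal's example of a 393×852 browser in a 980×560 pane, the state is `SCALED` and the fit is limited by height.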
- -### The modal - -Clicking the indicator opens a modal — three mutually exclusive targets: - -``` -╭─ Screen — surface:3 ────────────────────────────────────────╮ -│ │ -│ Currently SCALED │ -│ browser 393×852 · pane 980×560 @2x │ -│ │ -│ ( ) Sync to pane │ -│ viewport follows the pane, pixel-for-pixel │ -│ → now: 980×560 @2x │ -│ │ -│ (•) Device all devices emulate touch + mobile UA │ -│ ┌──────────────────┐ ┌──────────────────┐ │ -│ │ • iPhone 16 │ │ iPhone 16 Pro │ │ -│ │ iPhone 17 │ │ iPhone 15 │ │ -│ │ Pixel 9 │ │ Galaxy S25 │ │ -│ │ iPad │ │ iPad Pro │ │ -│ └──────────────────┘ └──────────────────┘ │ -│ iPhone 16 · 393×852 │ -│ │ -│ ( ) Custom W [ 1280 ] H [ 720 ] DPI [ 1 ] │ -│ │ -│ [ Cancel ] [ Apply ] │ -╰──────────────────────────────────────────────────────────────╯ -``` - -Each target maps to a native command — the modal issues exactly what a user -could type: - -| Target | Native command issued | -| --- | --- | -| **Sync to pane** | `set viewport <paneCssW> <paneCssH> <displayDpr>`, re-issued (debounced ~200ms) on pane resize | -| **Device** | `set device <name>` — the fixed registry only (`iPhone 15`, `iPhone 16`, `iPhone 16 Pro`, `iPhone 17`, `iPad`, `iPad Pro`, `Pixel 9`, `Galaxy S25`); bundles viewport + DPR + touch + mobile UA | -| **Custom** | `set viewport <w> <h> <dpi>` | - -The device registry is fixed (no custom descriptors), and touch / mobile-UA are -**only** available bundled inside `set device` — there is no standalone touch -setting (verified against 0.27.0). So Sync/Custom are never touch; only Device -is. The modal **reads the live viewport on open** and pre-selects accordingly: -*Sync* if sync is engaged and matching, otherwise *Custom* pre-filled with the -current dims. Like the indicator, the modal reflects reality rather than a stored -intent. 
The CLI does not expose a device's dimensions ahead of time, so device -sizing is **apply-then-reflect**: choosing a device issues `set device <name>`, -and its detail line fills in from the next frames rather than being known up -front (the same gap means the modal cannot pre-select a device by matching dims). - -**Transparency with `dor ab set …`.** There is nothing extra to "expose" — the -modal *is* a GUI for native `agent-browser set`. Device/Custom issue the same -`set device` / `set viewport` a user runs as `dor ab set …`. Two issue paths -converge on one session — the terminal's `dor ab` execs agent-browser directly; -the webview modal goes through the host's `agentBrowserCommand` — and the daemon -serializes them. Whichever wrote last, the indicator and the modal's pre-fill -reflect it. - -**Sync is the one non-native concept.** agent-browser has no "follow the pane" -mode; *Sync to pane* is a Dormouse behavior that auto-issues native `set -viewport <pane>` and re-issues on resize. **A freshly created browser surface -auto-engages sync**, so it starts `SYNCED` — pixel-for-pixel and responsive to -the pane — rather than at agent-browser's native 1280×720. Coexistence is -**last-writer-wins**: Dormouse tracks the viewport it last issued (`lastIssued`) -and only treats a deviating frame as an external override once a frame has first -*confirmed* the issued size landed (so a resize transient isn't mistaken for an -external `dor ab set …`). When an external setter wins, Dormouse disengages sync -and the indicator falls to `SCALED`. - -> **Known limitation: no way to re-trigger sync from the CLI.** Because sync is -> not an agent-browser concept, `dor ab` has no verb for it; once an external -> `set` disengages sync, re-enabling it means reopening the modal and choosing -> *Sync to pane*. 
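The last-writer-wins rule for sync can be sketched as a small state machine. This is an illustration under assumptions: `SyncTracker` and its method names are hypothetical, and only the `lastIssued` idea and the confirm-before-override rule come from the text above.

```typescript
interface Viewport {
  width: number;
  height: number;
}

class SyncTracker {
  // A freshly created browser surface auto-engages sync.
  engaged = true;
  private lastIssued: Viewport | null = null;
  private confirmed = false;

  // Called whenever Dormouse issues `set viewport` for the pane.
  issue(v: Viewport): void {
    this.lastIssued = v;
    this.confirmed = false;
  }

  // Called when the user picks "Sync to pane" in the modal again.
  engage(v: Viewport): void {
    this.engaged = true;
    this.issue(v);
  }

  // Called with the live viewport reported by the stream.
  onViewport(v: Viewport): void {
    if (!this.engaged || this.lastIssued === null) return;
    const matches =
      v.width === this.lastIssued.width && v.height === this.lastIssued.height;
    if (matches) {
      this.confirmed = true; // the issued size has landed
      return;
    }
    // A deviation only counts as an external override after confirmation;
    // before that it is treated as a resize transient and ignored.
    if (this.confirmed) this.engaged = false;
  }
}
```

Once `engaged` flips to false, the indicator falls to `SCALED` by derivation; nothing else needs to be stored.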
- -Persistence and degradation: - -- The only Dormouse-side state worth persisting is **whether sync is engaged**; - device/custom viewports live in the agent-browser session itself and survive - reattach. `syncEngaged` rides in the surface's dockview **panel params**, which - already round-trip through the serialized layout blob (the same channel that - carries `session`/`wsPort` across webview reloads), so it persists with no - `session-types.ts`/`session-save.ts` changes; the panel seeds its initial state - from `params.syncEngaged` (absent ⇒ fresh surface ⇒ auto-engage). -- Like tab actions, this inherits the `agentBrowserCommand` host capability: on - adapters that do not implement it (currently Tauri), modal-driven resizes are - inert. (`dor ab set …` from a terminal still works there, since it execs - agent-browser directly.) - -## Lifecycle - -Surface lifetime and browser lifetime are bound, both directions: - -- **Kill the surface → close the browser.** `dor kill` / the header `×` / - `dor ab … close` tears down the session (`agent-browser --session <resolved> - close`). -- **Session dies externally → tear down the surface.** If the browser exits - (crash, or a plain `agent-browser close` elsewhere), the stream reports - `connected: false`; the Wall removes or placeholders the surface. - -## Channels - -### Frames (out) — screenshot display, screencast-paced - -The stream's screencast is **CSS-resolution only**: Chromium's -`Page.startScreencast` captures in DIP and has no deviceScaleFactor/scale knob, -so on a HiDPI display its frames upscale to mush. (Verified against the CDP spec -— screencast metadata is defined in DIP, `maxWidth/maxHeight` only *downscale* — -and by probe: our own CDP screencast at `deviceScaleFactor: 2` still returns 1×; -only `Page.captureScreenshot` honors DPR. This is a Chromium limitation, not -agent-browser's, so owning the CDP connection wouldn't change it.) 
- -So Dormouse **displays device-resolution screenshots** and uses the screencast -purely as a **change signal**: - -- Port discovery: `agent-browser --session <s> stream status --json` → - `{ "port": <n>, ... }` ⇒ `ws://127.0.0.1:<n>`. Streaming is always enabled; - `AGENT_BROWSER_STREAM_PORT` pins a port. -- Each `{ "type": "frame", … }` message is a "page changed" **pulse**. The - frame's own JPEG is **not** decoded/drawn — in fact it is **not even parsed**: - frames are the only large stream messages (a base64 JPEG, ~150–220 KB at - desktop sizes; an animating page streams ~13 MB/s of them at 1080p/60fps that - we'd otherwise `JSON.parse` and throw away), so we pulse on any message over a - size threshold and skip the parse + allocation. The live viewport (for the - indicator and input mapping) comes from the small `status` messages, which fire - whenever it changes. Frame size is fixed to the viewport — the screencast has - no resolution/fps knob (only `AGENT_BROWSER_STREAM_PORT`), and its rate is - ~60fps regardless of size, so there's nothing to shrink anyway. -- On a pulse, capture a crisp frame via the host's `agentBrowserScreenshot` - (`agent-browser screenshot`, which honors the session viewport/DPR — device - resolution, e.g. 2560×1600 for a 1280×800@2 pane) and `drawImage` it to the - canvas. -- **Backpressure (latest-only, self-throttling):** at most one screenshot in - flight; a pulse during a shot sets a `dirty` flag (no queue — bursts collapse - to one follow-up, latest wins); a sequence guard drops out-of-order decodes; - the next shot waits ~1.5× the measured (EWMA) capture time since the last - start (≈⅔ duty), with a floor against tight loops. A static page produces no - pulses, hence no shots and no cost. (~17 fps JPEG q85 on an M-series Mac.) -- **Fallback:** on hosts without `agentBrowserScreenshot` (e.g. Tauri today), - render the CSS-resolution screencast frame directly instead. 
-- Pointer coordinates map through the pane rect vs `metadata` device size - (aspect-preserving; independent of the screenshot's pixel size). - -### Input (in) - -The stream WebSocket natively accepts input messages, so the webview sends -input on the **same socket it already opened for frames** — there is no CDP -connection and no host input proxy. (Verified against 0.27.0: `input_mouse` -press/release/move/wheel and `input_keyboard` keyDown/keyUp with `text` all -work, including scroll. The daemon dispatches to the active target itself, so -tab switches need no input re-attachment.) - -- Mouse: `{ type: "input_mouse", eventType: "mousePressed" | "mouseReleased" | - "mouseMoved" | "mouseWheel", x, y, button, clickCount, deltaX?, deltaY?, - modifiers }` — coordinates mapped from canvas space to device space via frame - `metadata`. -- Keyboard: `{ type: "input_keyboard", eventType: "keyDown" | "keyUp", key, - code, text, windowsVirtualKeyCode, modifiers }`. - -Keyboard caveats (all verified against 0.27.0): - -- **`text` must always be present.** The daemon silently drops any - `input_keyboard` whose `text` field is absent — arrows, Escape, modifier - keys, every chord. `text: ""` dispatches a proper non-text key event; - printable keyDowns carry the character. `text` is suppressed (sent as `""`) - while ctrl/cmd is held so chords act as chords rather than inserting text. -- **`windowsVirtualKeyCode` needs a real VK map**, never - `key.charCodeAt(0)` — `.` is char 46 = VK_DELETE, so periods turn into - Delete presses (agent-browser's own bundled viewer has this bug). -- **Paste is bridged.** cmd/ctrl-V types the *local* clipboard into the page - as per-character keyDown events; plain forwarding would paste the embedded - Chromium's own (empty) clipboard. -- **macOS native editing chords (cmd-A/C/X) are emulated via the host edit - channel,** not the stream. 
CDP `Input.dispatchKeyEvent` needs the `commands` - hint for OS-level editing on macOS, and the stream protocol drops it (upstream - limitation — see the filed issue). So instead of forwarding those chords, the - panel routes the *intent* to the host's `agentBrowserEdit(session, op)` - capability, which runs a host-owned `eval` over the daemon's CDP connection: - - `selectAll` → `el.select()` / `execCommand('selectAll')`. - - `copy` → read the selection, write it to the **OS clipboard**. - - `cut` → copy + delete the selection. - The webview only picks one of these three op names; the host owns the JS, so - this is a purpose-built channel, not arbitrary eval. **cmd-Z/⇧Z (undo/redo) - are not emulated** — `execCommand('undo')` is unreliable for CDP-typed input; - they remain no-ops pending the upstream `commands` fix. On hosts without the - capability (standalone/Tauri), the chords fall through to plain key - forwarding, so pages' own JS shortcuts still fire. - -Focus behaves like a terminal surface: click-to-focus; keystrokes forward to the -browser only while the surface is selected and in interact mode. Because Dormouse -owns the keydown listener (unlike an iframe), the leader chord always returns -control to the Wall. - -### Tabs - -The stream WebSocket pushes `{ type: "tabs", tabs: [{ tabId, title, url, -active }] }` messages, which feed the strip for free. Tab *actions* still go -through the CLI — `tab <n>` (switch), `tab close` (per-tab `×`) — issued by the -host on the webview's behalf (a webview cannot spawn processes; see -`agentBrowserCommand` below). 
- -## Implementation Touchpoints - -| Piece | Location | -| --- | --- | -| `dor ab` command (passthrough + `--key` intercept) | `dor/src/commands/agent-browser.ts` | -| Control method `surface.agentBrowser` request/response | `dor/src/commands/types.ts`, `dor/src/control-client.ts` | -| Surface component (canvas viewer + WS client + tab strip + screenshot loop + sync tracking + SYNCED/SCALED + chrome snapshot) | `lib/src/components/wall/AgentBrowserPanel.tsx` | -| Browser-chrome header (sync chip + back/fwd/reload + URL + key badge + dev-server chip; agent-browser surfaces only) | `lib/src/components/wall/SurfacePaneHeader.tsx` | -| Screen modal (Sync / Device-registry / Custom; issues native `set …`) | `lib/src/components/wall/AgentBrowserScreenModal.tsx` | -| Per-surface screen+chrome bridge (header↔body↔modal) + modal host | `lib/src/components/wall/agent-browser-screen.ts`, `lib/src/components/AgentBrowserScreenModalHost.tsx` | -| URL display/loopback-port parsing | `lib/src/components/wall/browser-url.ts` | -| Dev-server port→pane store (consumed by the header) + Wall-side correlation driver | `lib/src/components/wall/agent-browser-ports.ts`, `lib/src/components/wall/use-dev-server-ports.ts` | -| Per-surface `syncEngaged` persistence | dockview **panel params**, via the serialized layout blob (no `session-types.ts`/`session-save.ts` change) | -| Surface registration + control handler + key→session registry + `onFocusPane` | `lib/src/components/Wall.tsx` | -| Host capabilities + VS Code stream relay | `lib/src/lib/platform/types.ts`, `lib/src/lib/platform/vscode-adapter.ts`, `vscode-ext/src/agent-browser-host.ts`, `vscode-ext/src/message-router.ts` | - -### Host capabilities - -Narrow host capabilities back the surface, all optional on `PlatformAdapter` so -hosts degrade gracefully: - -- **`agentBrowserCommand(session, args)`** — runs the user's agent-browser - binary for tab actions (`tab <n>`, `tab close`, `tab new`), screen-mode - resizing (`set viewport`, 
`set device`), navigation (`open <url>`, `reload`, - `back`, `forward`), and lifecycle (`close`). The host validates `args[0]` - against an allowlist (`tab`, `set`, `screenshot`, `open`, `reload`, `back`, - `forward`, `close`); this is not a general exec channel. -- **`agentBrowserScreenshot(session, { format, quality })`** — captures one - device-resolution frame via `agent-browser screenshot` (which honors the - session DPR, unlike the screencast) and returns the raw bytes (a `Uint8Array` - over structured clone, no base64 round-trip). Drives the crisp display path; - absent ⇒ the panel falls back to rendering screencast frames. -- **`agentBrowserEdit(session, op)`** — host-owned `eval` for the macOS editing - chords (select-all/copy/cut) the stream input path can't dispatch. -- **`getAgentBrowserStreamUrl(port)`** — returns the WebSocket URL the webview - should use for the session stream (see CSP/origin below). - -> **Footgun:** these adapter methods use `this.requestResponse` internally and -> are **bound in the adapter constructor**, because the panel calls some through -> detached references (`getPlatform().agentBrowserScreenshot`) which would -> otherwise drop `this`. - -## VS Code Webview CSP and Stream Origin - -The VS Code webview CSP (`vscode-ext/src/webview-html.ts`) must allow the stream -WebSocket: - -``` -connect-src ws://127.0.0.1:* ws://localhost:* <existing cspSource> -``` - -The canvas is drawn from in-memory image bytes (`createImageBitmap` over a -`Blob`, never an `<img src>` to an external URL), so no `img-src` change is -needed, and no `frame-src` is involved — there is no iframe. - -CSP alone is not enough in VS Code: the agent-browser stream server rejects -WebSocket upgrades whose `Origin` is not localhost-or-absent (verified against -0.27.0: `vscode-webview://…` → 403; `tauri://localhost` and plain localhost → -allowed; no override env var exists). 
The VS Code extension host therefore runs -a loopback-only TCP relay that strips the `Origin` header and pipes bytes only -to a stream port it has explicitly authorized. `getAgentBrowserStreamUrl` asks -the host for a short-lived, one-use relay URL -(`ws://127.0.0.1:<relayPort>/stream/<streamPort>/<token>`); the relay rejects -requests without a matching token/port grant. The standalone (Tauri) webview -connects directly — its origin is allowed. - ---- - -# Future Expansions - -> Designed, not yet built. Everything above describes the surface as it exists -> today; everything below is planned. - -## Headed Pop-Out - -The headless + streamed-screenshot surface above is the default everywhere: it is -crisp, deterministic, and **uniformly portable** (no OS window, no positioning, -no DPI/Wayland concerns; works identically on win/mac/linux, in VS Code, and on -web). But streaming can't match a *real* window for hands-on interactivity — IME -composition, file uploads, smooth scrolling, native editing chords, extensions, -DevTools, native dialogs. **Pop-out** is the escape hatch: it relaunches the -surface's browser **headed**, as an ordinary OS window the user drives directly. -A deliberate, occasional mode, not the rendering path. - -Because Chrome's headed/headless choice is fixed at process launch (no live -toggle — verified), pop-out is a **relaunch**, not a move. The design embraces -that: the user interacts with the headed window natively, so Dormouse does -**not** screencast it — the in-Dormouse pane becomes a stub. This sidesteps the -headed-screencast, off-screen-occlusion, and window-tracking problems entirely. - -**Affordance.** A pop-out arrow in the surface header's action cluster, on -agent-browser surfaces only, gated on a host capability (hidden on web). GUI-only -— like *Sync to pane* it has no `agent-browser` equivalent, so no `dor ab` verb. 
-Because it is destructive of live state, the click is confirmed with a -`randomKillChar()`-style type-the-character overlay (mirror `KillConfirm`). - -**Identity-preserving relaunch.** Pop-out keeps the session name; only the Chrome -process changes (headed, new stream port). The key→`{session, surfaceId}` -registry is untouched, so `dor ab --key …` keeps driving the same surface -transparently. - -**State carried.** v1 preserves the **ordered tab URL list + which was active** -and reopens them in order. Lost in v1: live DOM, scroll, form inputs, -`sessionStorage`, and — because agent-browser uses an ephemeral temp profile — -**cookies/login**. The **profile-persistence spike** (stable user-data-dir or -`agent-browser state save`/`load`) is the wanted follow-up that makes pop-out -usable for authenticated sites. - -**The pane while popped out.** Stays open as a clean placeholder: copy that it's -in a separate window, a best-effort **Bring to front**, and **Pop back in** -(closes the window → triggers the revert below). Frame display / screenshots / -input / chip / tab strip are inert, but the stream WS stays connected to observe -`tabs`/`status` — we track the **last non-empty tab list** and watch for -`connected: false`. - -**Positioning.** Best-effort, one-time, does **not** follow: place the headed -window's content area over the pane's screen rect when the host can resolve it; -otherwise — **always in VS Code** (sandboxed webview) and **on Wayland** (clients -can't self-position) — center on the current monitor. - -**Window identity.** No control tab. *Bring to front* raises the OS window via -the host (by the session's process); a Dormouse-flavored window title isn't -guaranteed (a Chrome window's title follows its active tab). 
- -**Lifecycle.** The headed window ending and the surface being disposed are -**decoupled**: - -- **The headed window ends** — by any gesture (window `×`/`⌘⇧W`, or closing the - last tab; without a control tab these are indistinguishable) → **auto-revert**: - relaunch headless, resume streaming, reopen the **last non-empty tab list** in - order. So closing the final tab reopens *that* tab; closing a three-tab window - reopens those three. The surface is never lost this way. -- **Kill the Dormouse pane / `dor kill`** → the only teardown. -- **Dormouse/editor quits** → headed windows are cleaned up; no orphans. - -**Host capability & cross-platform.** Needs host support beyond -`agentBrowserCommand`: relaunch headed with window-position args, raise a window, -resolve the pane→screen rect. Adapters degrade rather than fail: - -| Host / platform | Spawn headed | Position over pane | Bring to front | -| --- | --- | --- | --- | -| Standalone (Tauri), macOS / Windows / Linux-X11 | yes | yes (best-effort) | yes | -| Standalone, Linux-**Wayland** | yes | **no** → center | best-effort / maybe no | -| VS Code (any OS) | yes | **no** → center (webview can't read screen coords) | best-effort | -| Web | **no** (affordance hidden) | — | — | - -Windows adds per-monitor / fractional-DPI math; Wayland can't self-position or -reliably raise, so it always centers. The feature is therefore a **platform-gated -enhancement**, never load-bearing — the streamed surface stays the portable -baseline on every target. - -## Other planned - -- **Profile persistence** (above) — also benefits the streamed surface (logins - survive daemon restarts), not just pop-out. -- **Re-trigger sync from the CLI** — a Dormouse-reserved `dor ab` verb, at the - cost of the first non-passthrough subcommand. -- **Undo/redo chords** — blocked on the upstream stream-input `commands` fix. 
diff --git a/docs/specs/dor-browser.md b/docs/specs/dor-browser.md new file mode 100644 index 00000000..7ad5133d --- /dev/null +++ b/docs/specs/dor-browser.md @@ -0,0 +1,453 @@ +# Dor Browser Surface + +> See `docs/specs/glossary.md` for canonical Session and Pane vocabulary, and +> `docs/specs/dor-cli.md` for the shared `dor` CLI, surface handle model, and +> host control plumbing this surface builds on. + +Dormouse has one dockview component for web content: `BrowserPanel`, persisted as +`surfaceType: 'browser'` with a swappable `renderMode`. + +Entry points: + +- `dor ab ...` / `dor agent-browser ...` forwards to the user's own + `agent-browser` binary and binds that agent-browser session to a browser pane. + Typical navigation is `dor ab open <url>`. +- `dor iframe <url>` opens an absolute `http://` or `https://` URL in the iframe + renderer. The proxy currently instruments only `http://` upstreams; `https://` + is accepted by the CLI but shown as an unproxyable scheme in the pane. + +Two independent axes define a browser pane: + +| Axis | Values | +| --- | --- | +| Target | A bare URL, or a future Dormouse-owned backend process | +| Render | `ab-screencast`, `ab-popout`, `iframe` | + +The render axis is a pane parameter, not a separate surface type. The `dor` CLI +still reports `iframe` or `agent-browser` as a legacy/informative surface type; +that is derived from `renderMode`. + +Source of truth: `lib/src/components/wall/BrowserPanel.tsx`, +`lib/src/components/wall/browser-surface.ts`, `lib/src/components/Wall.tsx` +(`surfaceTypeFromParams`, `componentForSurfaceType`, `createContentSurface`). 
+ +## Canonical Params + +The persisted pane params are flat: + +```ts +type BrowserPanelParams = { + surfaceType?: 'browser'; + renderMode?: 'ab-screencast' | 'ab-popout' | 'iframe'; + url?: string; + session?: string; + key?: string; + wsPort?: number; + binaryPath?: string; + syncEngaged?: boolean; + poppedOut?: boolean; // legacy migration only +}; +``` + +Invariants: + +- `renderMode` is canonical. Legacy params (`surfaceType: 'iframe'`, + `surfaceType: 'agent-browser'`, `poppedOut`) are migrated by + `resolveRenderMode`. +- `url` is the canonical target across render swaps and relaunches. Agent-browser + mirrors the newest non-blank active tab URL into params; iframe persists only + navigations initiated by Dormouse chrome. +- Agent-browser session state is flat (`session`, `wsPort`, `binaryPath`, + `syncEngaged`, `key`), not nested. +- Every browser panel uses dockview `renderer: 'always'`, because moving iframe + DOM reloads it and moving the screencast canvas mid-click breaks click + synthesis. + +Source of truth: `BrowserPanel.tsx`, `browser-surface.ts`, `Wall.tsx` +(`rendererForParams`, `replaceSurface`), `AgentBrowserPanel.tsx` +(`rememberRestorableUrl`, URL mirror), `IframePanel.tsx` (`applyFrameUrl`). + +## Placement And Lifetime + +Both CLI entry points use the same content-surface placement rule in +`Wall.tsx:createContentSurface`: replace an untouched terminal caller in place; +otherwise split next to the reference surface. `dor iframe` also accepts +`--surface`, `--minimize`, and `--json`. + +Surface lifetime owns backing resources: + +- Killing an agent-browser-rendered pane marks the session closed and runs + `agent-browser close` through `closeAgentBrowserSession`. +- Swapping away from an agent-browser renderer closes the old session through + the same path. +- A popped-out window closing is normally auto-reverted to headless, but the + closed-session mark prevents Dormouse-initiated kill/swap from resurrecting it. 
+- Iframe proxy grants are currently reclaimed by the proxy idle sweep, not by an + immediate per-surface teardown hook. + +Source of truth: `Wall.tsx` (`killPaneImmediately`, `closeAgentBrowserSession`, +`replaceSurface`), `lib/src/components/wall/agent-browser-sessions.ts`, +`lib/src/host/iframe-proxy.ts` (`GRANT_IDLE_TTL_MS`, `MAX_GRANTS`). + +## Browser Chrome + +Browser chrome is keyed by presence of a screen controller. Agent-browser panels +register one, and iframe panels now register one unconditionally, so `dor iframe` +gets the same browser header on every host. Render swapping from iframe to +agent-browser is gated separately by host capabilities. + +Header contract: + +- Far-left chip opens the Display modal and reflects the render backend: + `iframe` frame glyph, `ab-popout` external-window glyph, `ab-screencast` + link/lock depending on whether viewport CSS size matches pane CSS size. +- Primary text is URL-oriented: host+path, with query omitted in the live header. + HTML title is tooltip/secondary state. +- Clicking the URL opens an inline editor. `normalizeNavUrl` keeps explicit + schemes, uses `http://` for bare loopback hosts, and `https://` otherwise. +- Back, forward, and reload are always enabled. Agent-browser sends native + `back` / `forward` / `reload`; iframe uses parent-side history and re-resolves + the proxy on reload/back/forward. +- Non-default managed `--key` renders as its own quiet badge, never as a title + prefix. Raw `--session` and iframe surfaces show no key badge. +- Split/zoom buttons hide below `420px`; nav buttons hide below `360px`; minimize + and kill remain. + +Source of truth: `lib/src/components/wall/SurfacePaneHeader.tsx`, +`lib/src/components/wall/agent-browser-screen.ts`, +`lib/src/components/wall/browser-url.ts`, Storybook +`lib/src/stories/BrowserChromeHeader.stories.tsx`. + +## Dev-Server Chip + +For loopback URLs (`localhost`, `*.localhost`, `127.0.0.1`, `[::1]`), the header +registers interest in the URL port. 
The Wall scans terminal panes and minimized +doors via `PlatformAdapter.getOpenPorts(id)` and shows a chip only when exactly +one terminal owns that port. + +Matching is intentionally narrow: a process bound to loopback +(`127.0.0.1`, `::1`) or any-interface (`0.0.0.0`, `::`) serves localhost; a +specific non-loopback bind does not. Scanning is debounced, idle-scheduled, and +polls only while a wanted port is still unmatched. Reload revalidates +optimistically. + +Source of truth: `lib/src/components/wall/use-dev-server-ports.ts`, +`lib/src/components/wall/agent-browser-ports.ts`, +`lib/src/components/wall/browser-url.ts`. + +## Display Modal And Render Swaps + +The Display modal is the sole GUI for changing render mode and screencast +resolution. + +Render options: + +- `ab-screencast`: live Chromium via agent-browser stream plus Dormouse canvas. +- `ab-popout`: same session relaunched headed as a native OS window. Hidden if + the host lacks `agentBrowserPopOut`. +- `iframe`: proxied iframe. Agents cannot drive it. + +Resolution controls apply only to `ab-screencast`. They are GUI wrappers around +native agent-browser commands: + +- Resize with pane: Dormouse-owned sync that issues + `set viewport <paneW> <paneH> <displayDpr>` on resize. +- Fixed: `set viewport <w> <h> <dpr>`. +- Device: `set device <name>` from the modal's fixed registry. + +Sync state is the only Dormouse-specific resolution state that persists +(`syncEngaged`). Device/custom viewport state lives in agent-browser itself. +`SYNCED`/`SCALED` is derived from viewport CSS dimensions versus pane CSS +dimensions; DPR is issued but not part of the comparison because stream frames +are CSS-resolution. Sync coexists with external `set viewport`/`set device` +last-writer-wins: Dormouse disengages sync (→ `SCALED`) only after a frame first +confirms its own issued size landed, so a resize transient is not mistaken for an +external override. 
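
The `SYNCED`/`SCALED` derivation and the sync resize command described above can be sketched as follows. This is a minimal illustration, not the code in `AgentBrowserPanel.tsx`: the names `CssSize`, `deriveScreenState`, and `syncViewportArgs` are hypothetical, rounding to whole CSS pixels is an assumption, and so is passing the command to the host as an args array.

```typescript
// Hypothetical types and helpers for illustration only.
type CssSize = { width: number; height: number };
type ScreenState = "SYNCED" | "SCALED";

// Compare viewport CSS size to pane CSS size. DPR is deliberately not part of
// the comparison, since stream frames are CSS-resolution.
// Assumption: sizes are compared after rounding to whole CSS pixels.
function deriveScreenState(viewport: CssSize, pane: CssSize): ScreenState {
  const sameWidth = Math.round(viewport.width) === Math.round(pane.width);
  const sameHeight = Math.round(viewport.height) === Math.round(pane.height);
  return sameWidth && sameHeight ? "SYNCED" : "SCALED";
}

// Build the native `set viewport <paneW> <paneH> <displayDpr>` command that
// sync issues on resize. Assumption: the host accepts it as an args array.
function syncViewportArgs(pane: CssSize, displayDpr: number): string[] {
  return [
    "set",
    "viewport",
    String(Math.round(pane.width)),
    String(Math.round(pane.height)),
    String(displayDpr),
  ];
}
```

Keeping DPR out of the comparison means a pane on a HiDPI display still reads as `SYNCED` when only the CSS dimensions match.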
+ +Swap behavior: + +| From -> To | Behavior | +| --- | --- | +| `iframe` -> `ab-screencast` / `ab-popout` | Host spawns a fresh `gui-<hex>` agent-browser session at the current URL via `agentBrowserOpen`. Hidden/inert without that capability. | +| `ab-screencast` <-> `ab-popout` | Same session, headed/headless relaunch in `AgentBrowserPanel`; preserves only the active URL. | +| `ab-*` -> `iframe` | Uses canonical `params.url`; if multiple tabs exist, requires the user to press `c` in the warning overlay because only the active tab survives. | + +Source of truth: `lib/src/components/wall/AgentBrowserScreenModal.tsx`, +`AgentBrowserPanel.tsx` (`screenActions`, sync effects, pop-out/pop-in), +`Wall.tsx` (`onSwapRenderMode`), Storybook +`lib/src/stories/AgentBrowserScreenModal.stories.tsx`. + +## Agent-Browser Renderer + +Dormouse is a viewer/client for the user's installed `agent-browser`; it does +not bundle or fork Chromium behavior. `dor ab` intercepts only `--key` and +`--session`; every other argument is forwarded verbatim to: + +```sh +agent-browser --session <resolved-session> <args...> +``` + +The binary is resolved from `DORMOUSE_AGENT_BROWSER_BIN` or `PATH`. If present, +`dor ab` resolves an absolute `binaryPath` and passes it to the host because GUI +hosts may not share the terminal's shell PATH. + +Managed identity: + +- Default is `--key default`. +- `--key <name>` maps to `dormouse.1.<name>` and must match + `[A-Za-z0-9._-]+`. +- `--key` and raw `--session` are mutually exclusive. +- GUI-spawned sessions use `dormouse.1.gui-<hex>` and are not addressable by + `--key`. +- One agent-browser session maps to one Dormouse surface. Re-running `dor ab` + for an existing session refreshes `wsPort`/`binaryPath` and reuses the pane. + +Source of truth: `dor/src/commands/agent-browser.ts`, +`dor/src/commands/types.ts` (`AgentBrowserSurfaceRequest`), `Wall.tsx` +(`findAgentBrowserSurface`, `surface.agentBrowser` handling). 
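
The managed-identity rules above can be sketched as a small resolver. This is an illustrative sketch, not the implementation in `dor/src/commands/agent-browser.ts`: `resolveSession`, `SessionFlags`, `ResolvedSession`, and the error strings are hypothetical names chosen here. Only the `dormouse.1.` prefix, the `default` key, the `[A-Za-z0-9._-]+` pattern, and the mutual exclusion come from the spec.

```typescript
// Hypothetical helper for illustration only.
const KEY_PATTERN = /^[A-Za-z0-9._-]+$/;
const MANAGED_PREFIX = "dormouse.1.";

type SessionFlags = { key?: string; session?: string };

type ResolvedSession =
  | { ok: true; session: string; managed: boolean }
  | { ok: false; error: string };

function resolveSession(flags: SessionFlags): ResolvedSession {
  // `--key` and raw `--session` are mutually exclusive.
  if (flags.key !== undefined && flags.session !== undefined) {
    return { ok: false, error: "--key and --session are mutually exclusive" };
  }
  // A raw session name is used as given and is not Dormouse-managed.
  if (flags.session !== undefined) {
    return { ok: true, session: flags.session, managed: false };
  }
  // With neither flag, the default is `--key default`.
  const key = flags.key ?? "default";
  if (!KEY_PATTERN.test(key)) {
    return { ok: false, error: `invalid --key: ${key}` };
  }
  return { ok: true, session: MANAGED_PREFIX + key, managed: true };
}
```

Under these assumptions, `dor ab open <url>` with no flags resolves to the session `dormouse.1.default`, and `--key docs` resolves to `dormouse.1.docs`.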
+ +### Agent-Browser Connection + +Each visible agent-browser surface owns one `AgentBrowserConnection` for +`{ session, streamPort, binaryPath }`. Minimize unmounts the panel and disposes +the connection; the agent-browser daemon/session stays alive and reattaches from +persisted params. + +The stream WebSocket provides: + +- frame pulses and status, +- tab snapshots, +- native `input_mouse` / `input_keyboard` input. + +Dormouse does not render the stream JPEG by default. The screencast is +CSS-resolution only — Chromium's `Page.startScreencast` captures in DIP with no +DPR knob, so its frames upscale to mush on HiDPI; this is a Chromium limit, not +agent-browser's, so owning the CDP connection wouldn't change it. So Dormouse +treats frame messages as change pulses, captures a crisp device-resolution +screenshot through the host's `agentBrowserScreenshot`, and draws that to canvas +with latest-only backpressure. If the host cannot screenshot, it falls back to the +stream frame path. + +Important input details: + +- `input_keyboard.text` is always sent; non-text keys use `text: ""`. +- `windowsVirtualKeyCode` comes from a real key map, never `key.charCodeAt(0)` + (`.` is char 46 = VK_DELETE, so periods would otherwise become Delete presses). +- Local paste is replayed as per-character key input. +- macOS select-all/copy/cut use the purpose-built host `agentBrowserEdit` + channel. Undo/redo is not emulated. + +Tabs live inside the agent-browser surface. The header is integrated for one tab; +the in-body tab strip appears for two or more. Tab select/close actions go +through `agentBrowserCommand`. + +Source of truth: `lib/src/components/wall/AgentBrowserPanel.tsx`, +`agent-browser-connection.ts`, `agent-browser-screenshot-loop.ts`, +`agent-browser-input.ts`, `agent-browser-tab.ts`, and their tests. + +### Pop-Out + +`ab-popout` relaunches the same session headed because Chrome headed/headless is +fixed at daemon launch. 
The pane becomes a stub with Pop back in, and optionally +Bring to front if a host implements `agentBrowserBringToFront`. + +State carried in v1: only the active non-blank URL. Other tabs, DOM state, +scroll, form inputs, session storage, and cookies/logins are not preserved across +the relaunch. The host kills the daemon before reopening so the headed/headless +mode actually changes, then reads a new stream port. Dormouse supplies that +active-tab URL; the host trusts it and does not query the daemon during the +close/reopen gap, because a `stream status` or tab query there can spawn a +competing blank daemon. + +While popped out, Dormouse keeps a stream/CDP observer so URL/header state follows +same-tab navigation and so a headed window close can auto-revert to headless. +Hosts close tracked popped-out sessions on shutdown to avoid orphan headed +windows. + +Source of truth: `AgentBrowserPanel.tsx` (pop-out state, CDP observer, +auto-revert), `lib/src/host/agent-browser-host.ts` (`popOut`, `popIn`, +`closePoppedOut`), VS Code/standalone shutdown wiring. + +### Agent-Browser Host Capabilities + +The `PlatformAdapter` methods are optional. The shared implementation is +`lib/src/host/agent-browser-host.ts`; VS Code imports it directly and standalone +runs the bundled copy through the sidecar/Rust adapter. + +Capabilities: + +- `agentBrowserCommand`: allowlisted CLI subcommands. Source of truth for the + allowlist is `AGENT_BROWSER_ALLOWED_SUBCOMMANDS` in + `lib/src/lib/platform/types.ts`; host-side `get` is further limited to + `get cdp-url`. +- `agentBrowserScreenshot`: one device-resolution JPEG/PNG frame. +- `agentBrowserStreamStatus`: current stream port for stale-`wsPort` recovery. +- `agentBrowserEdit`: select-all/copy/cut via fixed host-owned JS and OS + clipboard write. +- `getAgentBrowserStreamUrl`: direct stream URL or VS Code relay URL. +- `agentBrowserOpen`: spawn a GUI-owned session for iframe -> agent-browser. 
+- `agentBrowserPopOut` / `agentBrowserPopIn`: headed/headless relaunch. +- `agentBrowserBringToFront`: optional, currently not implemented by the real + hosts. + +VS Code needs a loopback relay for the stream because the agent-browser stream +server rejects `vscode-webview://` origins. The relay grants one authorized +stream port/token and strips the Origin header. Standalone connects directly. + +Source of truth: `lib/src/lib/platform/types.ts`, +`lib/src/host/agent-browser-host.ts`, `vscode-ext/src/agent-browser-host.ts`, +`vscode-ext/src/webview-html.ts`, `standalone/src/tauri-adapter.ts`, +`standalone/src-tauri/src/lib.rs`, `standalone/sidecar/main.js`. + +## Iframe Renderer + +`dor iframe <url>` frames the page's own DOM. It is zero-lag and good for local +human inspection, but agents cannot drive/read it like agent-browser. + +On hosts with `createIframeProxyUrl`, `IframePanel` frames a per-surface loopback +proxy URL. On hosts without it, it falls back to a raw uninstrumented iframe. + +The proxy instruments `http://` upstreams only: + +- Loopback HTTP: strip frame-blocking headers/CSP, inject the shim, pass through + HTTP and WebSocket traffic. +- Remote HTTP that permits framing: best-effort proxy with shim. +- Remote HTTP that refuses framing: served Dormouse error page with `dor ab` + hint, not forced embedding. +- Unreachable upstream: served Dormouse error page. +- HTTPS: synchronous `scheme` failure in the panel with `dor ab` hint. + +The proxy uses one dedicated `127.0.0.1:0` server per grant. There is no token in +the path; the dedicated origin is the grant boundary and preserves root-relative +resources/client routers without body URL rewriting. Grants have a sliding idle +TTL and a hard cap. + +Current limits: + +- Absolute-origin subresources such as `http://localhost:5173/...` and + `ws://localhost:5173/...` bypass the proxy. This is acceptable for loopback, + but those resources are not instrumented. 
+- The shim reclaims only Dormouse control messages. All ordinary keyboard and + pointer interaction stays inside the frame by design. +- Killed iframe panes wait for the proxy idle sweep until the generic + per-surface teardown hook exists. + +Source of truth: `lib/src/components/wall/IframePanel.tsx`, +`lib/src/host/iframe-proxy.ts`, `lib/src/host/iframe-proxy-rewrite.ts`, +`lib/src/lib/platform/iframe-proxy-types.ts`, and proxy tests. + +### Iframe Shim + +The injected shim is fixed Dormouse-owned code, not user-provided eval. It posts +only these messages to the parent: + +- `leader`: dual-tap Meta/Shift leader chord. +- `pointerdown`: genuine click inside the frame, used to select/focus the pane. +- `location`: same-frame navigation after history/hash/page events. +- `open-window`: intercepted `target=_blank` or `window.open` URL. + +Parent listeners validate the message origin against live proxy grants. Leader +messages feed the same Wall command-mode exit path as in-document dual-tap +handling. `IframePanel` maps proxy-origin `location` URLs back to upstream URLs +for chrome/history without reloading the frame. + +New-tab requests show an overlay prompt. Accept opens a new browser pane beside +the current one; cancel drops it. The shipped prompt does not directly switch the +current pane to agent-browser. + +Source of truth: `IFRAME_SHIM` in +`lib/src/host/iframe-proxy-rewrite.ts`, +`lib/src/lib/iframe-proxy-registry.ts`, +`lib/src/components/wall/use-wall-keyboard.ts`, `IframePanel.tsx`. + +### Iframe Focus And Rendering Notes + +- Cross-origin iframe focus blurs the parent window while `document.hasFocus()` + remains true; focus code must distinguish this from app backgrounding. +- Proxied frames use shim `pointerdown` for click adoption. Raw fallback uses the + older `window.blur` + active iframe heuristic. +- `registerSurfaceFocusHandle` focuses/blurs the iframe element like other + surfaces. 
+- `IframePanel` applies `transform: translateZ(0)` to its immediate container to + avoid Chromium out-of-process iframe pointer offsets caused by dockview + containment. +- The iframe sandbox omits `allow-top-navigation` to block framebusting while + allowing scripts, same-origin within the proxy origin, forms, popups, modals, + downloads, and common device/clipboard permissions. + +Source of truth: `IframePanel.tsx`, `lib/src/components/wall/use-window-focused.ts`, +`lib/src/lib/terminal-lifecycle.ts` (`registerSurfaceFocusHandle`). + +## Iframe Host Capability And CSP + +The optional adapter method is: + +```ts +createIframeProxyUrl?(targetUrl: string): Promise< + | { ok: true; url: string } + | { ok: false; reason: 'frame-refused' | 'unreachable' | 'scheme'; detail?: string } +>; +``` + +Reachability and frame refusal are normally diagnosed lazily by served error +pages after the iframe loads the proxy URL, so v1 mostly returns `ok` or +`scheme`. + +VS Code routes this through webview request/response messages to +`vscode-ext/src/iframe-proxy-host.ts`. Standalone routes through +`standalone/src/tauri-adapter.ts` -> `standalone/src-tauri/src/lib.rs` -> +sidecar `iframe:createProxyUrl`. + +The VS Code webview CSP must allow loopback frames: + +```txt +frame-src http://127.0.0.1:* http://localhost:* +``` + +Security boundaries: + +- proxy binds loopback only, +- each grant fronts exactly one upstream, +- no user script is injected, +- refusing remote sites are diverted to an error page, +- link-local/cloud-metadata ranges are blocked, +- other user-supplied `http://` targets are trusted as the user's command. + +Source of truth: `lib/src/lib/platform/types.ts`, +`lib/src/lib/platform/vscode-adapter.ts`, `vscode-ext/src/message-types.ts`, +`vscode-ext/src/message-router.ts`, `vscode-ext/src/webview-html.ts`, +`standalone/src/tauri-adapter.ts`, `lib/src/host/iframe-proxy-rewrite.ts`. 
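
A panel could consume the `createIframeProxyUrl` result roughly as sketched below. This is a hedged illustration rather than the logic in `IframePanel.tsx`: `FramePlan`, `planFrame`, and the `suggestAgentBrowser` flag are hypothetical names, and an `undefined` result stands in for a host that lacks the capability. The result union itself is copied from the adapter signature above.

```typescript
// Result shape copied from the optional adapter method.
type IframeProxyResult =
  | { ok: true; url: string }
  | { ok: false; reason: "frame-refused" | "unreachable" | "scheme"; detail?: string };

// Hypothetical description of what the panel should render.
type FramePlan =
  | { kind: "proxied"; src: string }
  | { kind: "raw"; src: string }
  | { kind: "error"; reason: "frame-refused" | "unreachable" | "scheme"; suggestAgentBrowser: boolean };

// `result === undefined` models a host without `createIframeProxyUrl`.
function planFrame(targetUrl: string, result: IframeProxyResult | undefined): FramePlan {
  if (result === undefined) {
    // No proxy capability: fall back to a raw, uninstrumented iframe.
    return { kind: "raw", src: targetUrl };
  }
  if (result.ok) {
    // Frame the per-surface loopback proxy URL, not the target itself.
    return { kind: "proxied", src: result.url };
  }
  // The spec pairs the `dor ab` hint with `scheme` and frame-refusal failures.
  return {
    kind: "error",
    reason: result.reason,
    suggestAgentBrowser: result.reason !== "unreachable",
  };
}
```

Since reachability and frame refusal are normally diagnosed lazily by served error pages, the `error` branch would in practice mostly be reached with `reason: 'scheme'`.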
+ +## Code Map + +- CLI: `dor/src/commands/agent-browser.ts`, `dor/src/commands/iframe.ts`, + `dor/src/commands/types.ts`. +- Shell/render swap/lifecycle: `lib/src/components/Wall.tsx`, + `lib/src/components/wall/BrowserPanel.tsx`, + `lib/src/components/wall/browser-surface.ts`. +- Chrome/modal: `SurfacePaneHeader.tsx`, `AgentBrowserScreenModal.tsx`, + `agent-browser-screen.ts`, `browser-url.ts`. +- Agent-browser renderer: `AgentBrowserPanel.tsx`, + `agent-browser-connection.ts`, `agent-browser-input.ts`, + `agent-browser-screenshot-loop.ts`, `agent-browser-tab.ts`, + `agent-browser-sessions.ts`. +- Iframe renderer/proxy: `IframePanel.tsx`, `iframe-proxy-registry.ts`, + `lib/src/host/iframe-proxy.ts`, `lib/src/host/iframe-proxy-rewrite.ts`, + `lib/src/lib/platform/iframe-proxy-types.ts`. +- Host adapters: `lib/src/host/agent-browser-host.ts`, + `vscode-ext/src/agent-browser-host.ts`, `vscode-ext/src/iframe-proxy-host.ts`, + `standalone/src/tauri-adapter.ts`, `standalone/src-tauri/src/lib.rs`, + `standalone/sidecar/main.js`. + +## Future Work + +- Stable agent-browser profile/state persistence so pop-out preserves logins, + cookies, tabs, DOM state, and scroll. +- CLI affordance to re-engage Dormouse sync-to-pane. +- Upstream support for stream keyboard `commands`, replacing the host edit + workaround and enabling undo/redo. +- General per-surface teardown hook for iframe proxy grants and future + Dormouse-owned backend processes. +- Plugin/backend target axis: spawn, health-check, proxy, and reap a local web + process such as `openvscode-server`. +- Optional terminal-side "this port is viewed by surface:N" indicator. diff --git a/docs/specs/dor-cli.md b/docs/specs/dor-cli.md index 09eb8c01..df561528 100644 --- a/docs/specs/dor-cli.md +++ b/docs/specs/dor-cli.md @@ -165,10 +165,11 @@ from `command-detail`. 
- `dor read` [impl](../../dor/src/commands/read.ts) [docs](../../dor/test/snapshots/help/read.md) - `dor kill` [impl](../../dor/src/commands/kill.ts) [docs](../../dor/test/snapshots/help/kill.md) - `dor iframe` — **provisional**; high-fidelity URL embed with structural - limitations, see [dor-iframe.md](dor-iframe.md). + limitations; the `iframe` renderer of the unified `browser` surface, see + [dor-browser.md](dor-browser.md). [impl](../../dor/src/commands/iframe.ts) [docs](../../dor/test/snapshots/help/iframe.md) - `dor agent-browser` / `dor ab` — delegates to the user's `agent-browser`, - rendered in a Dormouse-native surface; see [dor-agent-browser.md](dor-agent-browser.md) - (the chosen alternative to the iframe surface) + rendered in a Dormouse-native surface; the `ab-screencast` renderer of the + unified `browser` surface, see [dor-browser.md](dor-browser.md) - `dor list-panes` [impl](../../dor/src/commands/list-panes.ts) [docs](../../dor/test/snapshots/help/list-panes.md) - `dor list-pane-surfaces` [impl](../../dor/src/commands/list-pane-surfaces.ts) [docs](../../dor/test/snapshots/help/list-pane-surfaces.md) diff --git a/docs/specs/dor-iframe.md b/docs/specs/dor-iframe.md deleted file mode 100644 index 33a19036..00000000 --- a/docs/specs/dor-iframe.md +++ /dev/null @@ -1,334 +0,0 @@ -# Dor Iframe Surface - -> See `docs/specs/glossary.md` for canonical Session and Pane vocabulary, -> `docs/specs/dor-cli.md` for the shared `dor` CLI, surface handle model, and -> host control plumbing this surface builds on, and -> `docs/specs/dor-agent-browser.md` for the sibling browser surface. - -`dor iframe <url>` opens an absolute `http(s)` URL in a high-fidelity `<iframe>` -surface for human inspection. The iframe renders the page's **own DOM** directly -— zero-lag and pixel-perfect — but in a **separate browsing context** that the -browser, not Dormouse, drives. - -The surface no longer points the `<iframe>` at the target directly. 
It fronts the -target with a **host-owned transparent proxy**: Dormouse serves the bytes, so it -controls them. That converts the iframe from a blind embedder into -Dormouse-served content, which is the one capability the raw iframe lacked — and -from it the surface gains a keyboard side-channel for its global leader chord, an -accurate focus model, and real error pages. - -> Status: **works for loopback dev servers** in hosts that can run the shared -> Node proxy (VS Code extension host and standalone/Tauri sidecar). Arbitrary -> web browsing is still better served by the **agent-browser** surface (`dor ab`, -> see [dor-agent-browser.md](dor-agent-browser.md)): the iframe surface proxies -> `http://` upstreams (loopback dev servers are overwhelmingly plain http), -> defers `https://`, and routes a remote that refuses framing to an error page -> pointing at `dor ab`. - -## The CLI → surface - -`dor iframe <url>` (`dor/src/commands/iframe.ts`) sends a `surface.iframe` control -request; `parseIframeUrl` constrains inputs to absolute `http://`/`https://` -(Dormouse does not infer schemes). Placement follows the shared content-surface -rule (`lib/src/components/Wall.tsx` → `createContentSurface`): an untouched -terminal caller is replaced in place, anything else gets a split next to the -caller. - -`IframePanel.tsx` then asks the host to front the target with its proxy -(`getPlatform().createIframeProxyUrl`) and frames the returned loopback URL. If -the host has no proxy it falls back to a raw `<iframe src={url}>`; if the target -isn't proxyable (e.g. an `https://` URL) it shows an actionable message instead. - -## The Transparent Proxy (Instrumented Iframe) - -This is the **substrate** the surface is built on. Instead of pointing the -`<iframe>` at the target, Dormouse points it at a loopback proxy -(`lib/src/host/iframe-proxy.ts`) that fetches the target and serves it back. 
The -moment Dormouse serves the bytes, two things become possible that the raw iframe -cannot do: - -1. **Inject a keyboard side-channel** so Dormouse's global leader chord keeps - working inside the frame (the technique VS Code uses for its own webviews). -2. **See the upstream result**, so a refused-to-be-framed page or a dead server - becomes a clear error page instead of a blank pane. - -### The one load-bearing fact - -Same-origin policy blocks the **parent from reaching into the child** -(`iframe.contentDocument` throws cross-origin). It does **not** block the **child -from posting to the parent** — `window.parent.postMessage()` is cross-origin-safe -by design. Dormouse can never reach *in*, but a script *we put in the served HTML* -can always call out. The only capability we need is control over the served -bytes, which the proxy gives us. This is exactly how VS Code instruments its -plugin webviews — serve HTML from an origin you own, inject a bootstrap — not a -new technique, the proven one. - -### Target policy: loopback instruments, remote diagnoses - -| Target | Proxy behavior | -| --- | --- | -| **Loopback** (`localhost`/`127.0.0.1`/`[::1]`) http | Full instrument: strip `X-Frame-Options`, drop the page CSP, inject the shim, serve. The user spawned it; framing is the intent. | -| **Remote** http, frameable | Best-effort render with the injected shim (flagged that `dor ab` is the better tool for arbitrary browsing). A CSP `frame-ancestors *` is considered frameable; scoped sources such as `https://*.example.com` are restrictive. | -| **Remote** http, refuses framing | **Never force-framed.** Serve a Dormouse error page with a one-click hint to `dor ab open <url>`. | -| **Unreachable** (conn refused, DNS, non-2xx) | Serve a Dormouse error page ("is the dev server running?"). | -| **`https://`** | Deferred. The panel reports `scheme` and points at `dor ab`. 
| - -We do **not** force-frame a site that refuses embedding: rewriting authenticated -cross-origin pages is an auth/cookie/ToS dead end, and that's what the -agent-browser surface is for. v1 proxies `http://` upstreams plus WebSocket -upgrades; `https://` is deferred. - -### Proxy mechanism - -Modeled on the agent-browser **stream relay** -(`vscode-ext/src/agent-browser-host.ts`) — already a loopback-only, single-purpose -forwarder. The difference: this proxy speaks **HTTP** (it parses and rewrites -responses) and passes through **WebSocket upgrades** (dev-server HMR, -openvscode-server's connection). - -Source of truth: `lib/src/host/iframe-proxy.ts` owns the shared HTTP/WebSocket -server, while `lib/src/host/iframe-proxy-rewrite.ts` owns the dependency-free -policy, HTML instrumentation, framing checks, and served error pages. - -- **Per-grant dedicated loopback server.** Each grant gets its own ephemeral - `http.Server` bound to `127.0.0.1:0`, fronting exactly **one** fixed upstream. - The grant's *origin* is the grant. Two consequences, and they are deliberate - departures from the original "single shared port + `…/<token>/<path>`" sketch: - - Root-relative sub-resources (`/assets/x.js`) and absolute paths resolve and - proxy **transparently with zero body rewriting** — a shared origin can't do - this without rewriting (the old Open Decision #2). - - **No token in the URL.** A path-prefix token would land in - `location.pathname`, and a client-side router (a React-Router/Remix dev - server) would match no route and render its own 404. A server bound to a - single upstream is inherently not an open forwarder, so the token bought - nothing here that the dedicated server doesn't already provide. - - Grants use a sliding idle TTL with a lazy sweep (a live iframe refreshes on - every request; an idle grant's server is closed), plus a hard cap. 
-- **Response rewriting.** For `text/html` from a frameable upstream: strip - `X-Frame-Options`, drop the page CSP (response header **and** any - `<meta http-equiv>`), inject the shim before `</head>`. `Host`/`Origin`/`Referer` - are rewritten to the upstream so origin-aware servers (`Vary: Origin`, CSRF - checks) see a same-origin request, and a `Location` redirect back to the - upstream origin is rewritten to the proxy origin so it doesn't bounce the frame - off-proxy. Non-HTML passes through (framing/hop-by-hop headers still stripped). - The initial framed proxy URL preserves the target's path, query, and fragment; - the fragment remains browser-only and is not sent on upstream HTTP requests. - HTML injection streams: the proxy buffers only until `</head>`, `<body>`, or a - bounded prefix cap, instruments that prefix, then pipes the rest of the - upstream response through without waiting for the full document. -- **WebSocket passthrough.** Upgrades are forwarded as a raw byte pipe once the - upgrade head is rewritten (`Host`/`Origin` → upstream), exactly like the stream - relay. -- **Anti-framebust.** The `<iframe>` uses a `sandbox` without - `allow-top-navigation`, so a tool's `if (top !== self) top.location = …` cannot - navigate the Wall away. - -### The iframe shim message channel (resolves #1 and proxied click adoption) - -A fixed, Dormouse-owned script — like agent-browser's `EDIT_SCRIPTS`, never -user-supplied, so it is not an eval vector — injected inline before `</head>`. -It posts only Dormouse-owned control messages to the parent: - -- `leader`: the reserved leader chord (dual-tap ⌘ / ⇧, the same detection as - `handle-dual-tap.ts`). -- `pointerdown`: genuine user pointerdown inside the cross-origin frame, so the - panel can adopt the click as pane selection + passthrough entry. - -Every other keystroke and pointer event flows to the tool untouched. 
The -embedded tool (a code editor, a VS-Code-web workbench) keeps full keyboard -interactivity; Dormouse keeps its one global chord and a click-adoption signal. - -The Wall already owns a capturing `window` keydown listener -(`use-wall-keyboard.ts`); it gains a `message` listener that validates -`event.origin` against the live proxy grants (`lib/src/lib/iframe-proxy-registry.ts`) -and feeds the forwarded chord into the same dispatch the in-document dual-tap -would (`exitTerminalMode`) — no synthesized `KeyboardEvent` round-trip. The -iframe panel separately listens for the same validated proxy origin and treats a -`pointerdown` message as `onClickPanel(api.id)`. - -### Accurate focus model (resolves #2 and #3) - -- **#2** — focusing one of our iframes fires `blur` on the parent `window` even - though the app isn't backgrounded; the focused element is just an `<iframe>` - *inside* our document, so `document.hasFocus()` stays **true**. - `use-window-focused.ts` and Wall's blur handler read it instead of blindly going - inactive, so headers/attention stay live when an iframe takes focus. -- **#3** — `IframePanel` registers a focus handle (`registerSurfaceFocusHandle` in - `terminal-lifecycle.ts`) so `focusSession` focuses the frame element like any - other surface. Because clicking *into* a cross-origin frame doesn't bubble a - `mousedown` to the pane, proxied frames adopt the shim's validated - `pointerdown` message as entering the pane. The raw fallback has no shim, so it - preserves the older focus heuristic: window `blur` while our iframe is - `document.activeElement` and the app still has focus. Both paths keep - mode/selection consistent when the frame owns focus. 
- -### Real error signals (resolves #4) - -With the proxy, **Dormouse is the server**, so it diagnoses lazily and serves a -precise error *page* from the proxy origin (which frames fine): a refused remote → -"`<host>` refuses to be embedded; `dor ab open <url>`"; a dead upstream → "nothing -responding at `localhost:8080` — is the dev server running?". `createIframeProxyUrl` -itself returns `{ ok: false, reason }` only for the synchronous cases (chiefly an -unproxyable `scheme`); reachability and frame-refusal are surfaced as served -pages. These served error pages include the same fixed leader shim as proxied -HTML, so the keyboard escape path still works after the user clicks inside an -error state. - -### Cursor alignment (the out-of-process-frame offset) - -A cross-origin iframe is an **out-of-process frame**; Chromium maps pointer events -to it relative to its nearest compositing/containing ancestor. Dockview's root -(`.dv-dockview`) sets `contain: layout`, so without intervention clicks land offset -by the pane's distance from that root (hundreds of px for a split pane). -`IframePanel` gives the iframe's **immediate container** its own identity layer -(`transform: translateZ(0)`), co-located with the frame, so it becomes the nearest -reference and the offset collapses to ~0. It's an identity transform, so -`getBoundingClientRect` (overlay measurement) is unaffected. Same-origin surfaces -(xterm, the agent-browser canvas) are immune — they recompute from -`getBoundingClientRect`, which a cross-origin frame can't. 
- -### Host capability and CSP - -A new optional `PlatformAdapter` method mirroring `getAgentBrowserStreamUrl` -(`lib/src/lib/platform/types.ts`), so hosts degrade gracefully: - -```ts -createIframeProxyUrl?(targetUrl: string): Promise< - | { ok: true; url: string } - | { ok: false; reason: 'frame-refused' | 'unreachable' | 'scheme'; detail?: string } ->; -``` - -VS Code implements it in the extension host (`vscode-ext/src/iframe-proxy-host.ts`), -routed via `message-router.ts` / `message-types.ts` and the `vscode-adapter.ts` -request/response pair. Standalone implements the same adapter method through -`standalone/src/tauri-adapter.ts` → `iframe_create_proxy_url` in -`standalone/src-tauri/src/lib.rs` → the sidecar's `iframe:createProxyUrl` command, -which loads the bundled shared proxy. **The proxy is needed on every host** — -even where a Tauri webview could frame `http://127.0.0.1` directly for origin -reasons, injection still requires controlling the bytes. Hosts with no process -to run one (the web host) omit the method and the panel falls back to a raw, -uninstrumented `<iframe>`. - -With the proxy, the VS Code webview CSP (`vscode-ext/src/webview-html.ts`) narrows -from the old broad `frame-src http: https:` to the loopback proxy origin only: - -``` -frame-src http://127.0.0.1:* http://localhost:* -``` - -### Security model - -The same fences as the stream relay: loopback-only bind both sides; a per-surface -grant served by a dedicated **single-upstream** server (no open forwarder); the -injected shim is fixed and Dormouse-owned (no user script ever reaches the page); -header-stripping never force-frames a third-party site (a refusing remote is -diverted to an error page, not stripped). For CSP, only a standalone -`frame-ancestors *` is permissive; wildcard host patterns remain restrictive. 
-**SSRF:** the proxy fetches a user-supplied URL, so it refuses link-local / -cloud-metadata ranges -(`169.254.0.0/16`, `fe80::/10`) and trusts other ranges — the trust boundary is the -user's own `dor iframe <url>`. - -## Remaining limitations - -### Inherent (designed around, not patchable) - -- **Focus still leaves Dormouse for the tool's own keys.** The shim reclaims only - the leader chord; every other keystroke fires in the frame's document and never - reaches the Wall's `window` listener — *by design*, so the embedded tool keeps - full keyboard interactivity. The same-origin policy means the parent can't - observe those keys; the leader is the one chord we round-trip. - -### Known v1 gaps - -- **`https://` upstreams are deferred** — the panel shows a `scheme` message with a - `dor ab` hint. -- **Absolute-origin sub-resources bypass the proxy.** The dedicated-port origin - makes root-relative URLs proxy transparently, but a dev server that emits - absolute `http://localhost:5173/…` (notably Vite's HMR `ws://localhost:5173/…`) - connects straight to the upstream — uninstrumented, though harmless for loopback - since the browser can reach it. -- **No teardown-on-kill hook yet.** A killed iframe surface's proxy server is - reaped by the idle sweep, not immediately on kill. (The shared teardown hook is - tracked under Path 2 below.) - ---- - -# Future Work - -> Designed, not yet built. Everything above is implemented; everything below is -> the roadmap. Both paths reuse the proxy + shim substrate unchanged. 
- -## Render Backends: Two Axes - -With the proxy in place, "view a web thing in a pane" factors into two -**independent axes**, and the agent-browser and iframe surfaces are just cells in -the grid: - -| | **Target: just a URL** | **Target: a backend Dormouse spawns & owns** | -| --- | --- | --- | -| **Render: screencast** (agent-browser) | `dor ab open <url>` — today | (possible, rarely wanted) | -| **Render: embed** (proxy + shim iframe) | `dor iframe <localhost>` — today | **the plugin system (Path 2)** | - -- **Render axis** = *how* you see it. `screencast` (real Chromium, agent-drivable, - any URL, laggy) vs `embed` (the page's own DOM, zero-lag, loopback-only, not - agent-drivable). -- **Target axis** = *what* you point at. A bare URL, vs. a URL whose backend - process Dormouse spawns and reaps. - -The shim and proxy live **entirely in the `embed` render backend**, which is why -both paths below reuse them unchanged — they differ only in which other axis they -exercise. - -## Path 1 — Swappable Render Backend - -Expose the **render axis** as a per-pane choice: same target, switch screencast ↔ -embed. This is the hedge on the agent-browser bet — if the screencast's lag is -unacceptable for a local dev server, one gesture swaps it to the zero-lag embed. - -**Do not fuse the surfaces into a dual-mode mega-component.** `AgentBrowserPanel` -is already large and the input models differ fundamentally (CDP `input_*` messages -vs native DOM). Instead, make the swap a **layout operation: replace the pane's -renderer in place, preserving the target.** `createContentSurface` already replaces -an untouched terminal in its slot — generalize that to "replace surface X with -surface Y at the same dock position," triggered by a header affordance ("open in -iframe" / "open in browser"). 
- -## Path 2 — Plugin System - -Extend the **target axis** with process ownership: Dormouse spawns the plugin's -backend (an HTML editor, `openvscode-server` / `code serve-web`, any local tool) -and renders it through the same `embed` backend. The rendering is solved by the -proxy; what Path 2 adds is a **process supervisor** — spawn (in what cwd/env? -per-workspace or per-pane? reuse across panes?), allocate/health-check the port, -wire the proxy to it, and **reap the process when the pane is killed.** - -That last requirement is the second consumer that justifies generalizing the -per-surface **teardown-on-kill hook** — the one-off `agent-browser` close currently -special-cased in `Wall.tsx`'s `killPaneImmediately` (see -[dor-agent-browser.md](dor-agent-browser.md) → Lifecycle), and the same hook the -iframe proxy server would use to close immediately instead of waiting for the idle -sweep. - -**Risk specific to the motivating example (code-server / openvscode-server):** it -will stress the substrate more than a Vite dev server — worth a spike before -committing. It needs a broader `sandbox` (`allow-same-origin allow-scripts -allow-forms allow-popups allow-downloads allow-modals`, still omitting -`allow-top-navigation`); may want COOP/COEP for cross-origin isolation -(SharedArrayBuffer) — verify against the actual target; and leans on WebSocket -passthrough harder than most. - -## Decisions made in v1 - -1. **Remote frameable targets** → render best-effort with the leader shim - (loopback fully instrumented; a remote that refuses → error page). Keeps a crisp - "loopback = full instrument, remote = best-effort, refusing = `dor ab`" line. -2. **Absolute-origin sub-resources** → left as a known gap. The dedicated-port - origin solves root-relative URLs for free; absolute upstream URLs bypass the - proxy rather than rewriting response bodies. -3. **SSRF range-blocking** → refuse link-local/metadata, trust other ranges (the - command is the user's own). -4. 
**Web host** → keep the blind raw-iframe fallback rather than hiding the - surface. diff --git a/docs/specs/layout.md b/docs/specs/layout.md index a137d925..e97a3d0b 100644 --- a/docs/specs/layout.md +++ b/docs/specs/layout.md @@ -11,7 +11,12 @@ A Session's **View** state places it in one of two containers: - **Pane** — a visible container in the content area. The session's terminal output is rendered via xterm.js. The pane has a header with controls and acts as the drag handle for layout rearrangement. - **Door** — a minimized container in the baseboard. The session is still alive (PTY running, output buffered) but not visible. The door shows the session's title plus alert and TODO indicators, and looks like a mouse hole cut into the baseboard. -Transitioning between Pane and Door does not alter the Session in any way. Minimizing a pane creates a door; reattaching a door creates a pane. The terminal content, scrollback, and process state are preserved across transitions. +Transitioning between Pane and Door does not alter the Session in any way. +Minimizing a pane creates a door; reattaching a door creates a pane. Terminal +content, scrollback, and process state are preserved across transitions. For +non-terminal browser surfaces, the backing browser session remains alive while +the visible viewer resources are released: no canvas, screencast WebSocket, +screenshot loop, or input forwarding runs while the surface is a Door. 
## Shell layout diff --git a/docs/specs/transport.md b/docs/specs/transport.md index 44afcc8a..a297678b 100644 --- a/docs/specs/transport.md +++ b/docs/specs/transport.md @@ -10,8 +10,24 @@ Each platform adapter wraps a PTY-spawning runtime and a transport channel betwe |---|---|---| | VS Code extension | extension host (Node.js) | `vscode.Webview.postMessage` ↔ `acquireVsCodeApi().postMessage` | | Standalone (Tauri) | sidecar process | Tauri command/event bridge | +| Standalone browser-dev | sidecar process + local dev HTTP bridge | fetch commands + Server-Sent Events | | Fake (tests, playground) | in-process | direct function calls / event emitter | +### Standalone browser-dev harness + +Source of truth: `standalone/scripts/dev-agent-browser.mjs`, `standalone/src/browser-sidecar-host.ts`, and `standalone/src/browser-sidecar-adapter.ts`. + +`pnpm dev:standalone:ab` starts the standalone sidecar directly, starts a localhost-only HTTP bridge, starts Vite with `VITE_DORMOUSE_BROWSER_DEV_HOST`, and opens the app URL in an `agent-browser` session. The browser build uses `BrowserSidecarAdapter` instead of `TauriAdapter` when that env var is present. + +The browser-dev bridge is intentionally a transport shim over the same sidecar protocol, not a second PTY implementation: + +- Webview → host fire-and-forget commands use `POST /__dormouse_dev_host/send`. +- Webview → host request/response commands use `POST /__dormouse_dev_host/invoke`. +- Host → webview events use `GET /__dormouse_dev_host/events` as an SSE stream. +- Browser console calls are mirrored to `POST /__dormouse_dev_host/console` so a single `pnpm dev:standalone:ab` terminal shows sidecar logs, Vite logs, and in-browser diagnostics. + +The harness may omit native-only desktop chrome such as window controls and update checks, but it must preserve the `PlatformAdapter` PTY, control-request, clipboard, iframe-proxy, and agent-browser contracts used by the app. 
Tauri APIs must not be required at static module-evaluation time when `VITE_DORMOUSE_BROWSER_DEV_HOST` is set, because the page is loaded by a normal browser rather than the Tauri WebView. + ## PTY lifecycle PTYs are managed by the platform host, not by the webview. The webview is a view layer that **resumes** over live PTYs (host-preserved) or **restores** from a Snapshot (cold start). See `docs/specs/glossary.md` for the Process / Link states. diff --git a/lib/.storybook/preview.ts b/lib/.storybook/preview.ts index fa967607..6200ca1b 100644 --- a/lib/.storybook/preview.ts +++ b/lib/.storybook/preview.ts @@ -1,5 +1,5 @@ import type { Preview } from '@storybook/react'; -import { useEffect, useLayoutEffect } from 'react'; +import { useEffect, useLayoutEffect, StrictMode } from 'react'; import { createElement } from 'react'; import '../src/theme.css'; import '../src/index.css'; @@ -128,6 +128,9 @@ const preview: Preview = { theme: DEFAULT_STORYBOOK_THEME, }, decorators: [ + // Exercise React StrictMode (dev double-invoke) so story-rendered components + // get the same correctness checks as the app entries (lib web + standalone). 
+ (Story) => createElement(StrictMode, null, createElement(Story)), // Theme switcher: inject --vscode-* CSS variables (Story, context) => { const requestedThemeName = context.globals.theme as string | undefined; diff --git a/lib/package.json b/lib/package.json index eeceedaa..eaa9aa8f 100644 --- a/lib/package.json +++ b/lib/package.json @@ -10,7 +10,7 @@ "preview": "vite preview", "test": "vitest run", "test:watch": "vitest", - "storybook": "storybook dev -p 6006 --no-open", + "storybook": "storybook dev -p 6006 --no-open --ci", "build-storybook": "storybook build" }, "dependencies": { diff --git a/lib/src/components/MobileTerminalUi.test.tsx b/lib/src/components/MobileTerminalUi.test.tsx index 11cf404c..dc30d82b 100644 --- a/lib/src/components/MobileTerminalUi.test.tsx +++ b/lib/src/components/MobileTerminalUi.test.tsx @@ -1,7 +1,7 @@ /** * @vitest-environment jsdom */ -import { act } from 'react'; +import { act, StrictMode } from 'react'; import { createRoot, type Root } from 'react-dom/client'; import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; import { MobileTerminalUi, type MobileTerminalTouchMode } from './MobileTerminalUi'; @@ -55,11 +55,13 @@ function renderMobileTerminal({ const renderWith = (mode: MobileTerminalTouchMode) => { act(() => { root.render( - <MobileTerminalUi - activeTouchMode={mode} - cursorTouchAvailable - terminal={<div data-testid="terminal" />} - />, + <StrictMode> + <MobileTerminalUi + activeTouchMode={mode} + cursorTouchAvailable + terminal={<div data-testid="terminal" />} + /> + </StrictMode>, ); }); }; diff --git a/lib/src/components/Wall.tsx b/lib/src/components/Wall.tsx index 503798a6..f5221a32 100644 --- a/lib/src/components/Wall.tsx +++ b/lib/src/components/Wall.tsx @@ -10,6 +10,8 @@ import 'dockview-react/dist/styles/dockview.css'; import { Baseboard } from './Baseboard'; import { ExternalLinkModalHost } from './ExternalLinkModalHost'; import { AgentBrowserScreenModalHost } from 
'./AgentBrowserScreenModalHost'; +import { getAgentBrowserScreenController } from './wall/agent-browser-screen'; +import { markAgentBrowserSessionClosed } from './wall/agent-browser-sessions'; import { KILL_CONFIRM_MS, KILL_SHAKE_MS, KillConfirmOverlay, randomKillChar, type ConfirmKill } from './KillConfirm'; import { clearSessionAttention, @@ -53,8 +55,8 @@ import type { PersistedDoor } from '../lib/session-types'; import { useDynamicPalette } from '../lib/themes/use-dynamic-palette'; import { TerminalPanel } from './wall/TerminalPanel'; import { TerminalPaneHeader } from './wall/TerminalPaneHeader'; -import { AgentBrowserPanel } from './wall/AgentBrowserPanel'; -import { IframePanel } from './wall/IframePanel'; +import { BrowserPanel } from './wall/BrowserPanel'; +import { resolveRenderMode, isAgentBrowserParams, isBrowserParams } from './wall/browser-surface'; import { hostPathDisplay } from './wall/browser-url'; import { SurfacePaneHeader } from './wall/SurfacePaneHeader'; import { WorkspaceSelectionOverlay } from './wall/WorkspaceSelectionOverlay'; @@ -165,15 +167,39 @@ function persistedPanelTitle(title: string | null | undefined): string { } function surfaceTypeFromParams(params: unknown): DorSurfaceType { - if (params && typeof params === 'object' && !Array.isArray(params)) { - const surfaceType = (params as { surfaceType?: unknown }).surfaceType; - if (surfaceType === 'iframe' || surfaceType === 'agent-browser') return surfaceType; - } - return 'terminal'; + if (!isBrowserParams(params)) return 'terminal'; + // The CLI surface type tracks the *renderer* (iframe vs agent-browser) so + // `dor` output stays informative even though both are one 'browser' surface. + return resolveRenderMode(params) === 'iframe' ? 'iframe' : 'agent-browser'; +} + +/** Killing or swapping away from an agent-browser surface closes its session — + * surface lifetime and browser lifetime are bound (spec → Lifecycle). No-op + * for other surface types. 
*/ +function closeAgentBrowserSession(params: unknown): void { + if (!isAgentBrowserParams(params)) return; + const p = params as { session?: unknown; binaryPath?: unknown }; + if (typeof p.session !== 'string') return; + const binaryPath = typeof p.binaryPath === 'string' ? p.binaryPath : undefined; + // Mark before issuing the close so a popped-out surface's auto-revert sees + // the impending teardown and doesn't relaunch the session we're killing. + markAgentBrowserSessionClosed(p.session); + getPlatform().agentBrowserCommand?.(p.session, ['close'], binaryPath).catch(() => {}); } function componentForSurfaceType(type: DorSurfaceType): string { - return type; + // iframe + agent-browser both render through the unified BrowserPanel. + return type === 'terminal' ? 'terminal' : 'browser'; +} + +/** Every browser surface uses dockview's `renderer:'always'`. The default + * (`onlyWhenVisible`) detaches/reattaches — i.e. *moves* — the panel DOM on + * activation; that reloads an <iframe>, and for the screencast canvas it moves + * the node mid-press, so a real click's mouseup lands on a different node and no + * `click` is synthesized (tab chips / page links silently did nothing). Keeping + * the panel always-mounted avoids both. Only ever called for 'browser' panels. */ +function rendererForParams(_params: { renderMode?: unknown }): 'always' { + return 'always'; } function tabComponentForSurfaceType(type: DorSurfaceType): string { @@ -336,7 +362,10 @@ function ShellSpawnNotice({ ); } -const components = { terminal: TerminalPanel, iframe: IframePanel, 'agent-browser': AgentBrowserPanel }; +// One body component for every browser surface; the legacy 'iframe' / +// 'agent-browser' names alias to it so dockview layouts persisted before the +// unification still resolve on restore. 
+const components = { terminal: TerminalPanel, browser: BrowserPanel, iframe: BrowserPanel, 'agent-browser': BrowserPanel }; const tabComponents = { terminal: TerminalPaneHeader, surface: SurfacePaneHeader }; // --- Main component --- @@ -501,13 +530,7 @@ export function Wall({ const api = apiRef.current; const panel = api?.getPanel(id); if (!api || !panel) return; - // Surface lifetime and browser lifetime are bound: killing an - // agent-browser surface closes its session (spec → Lifecycle). - const panelParams = panel.params as { surfaceType?: unknown; session?: unknown; binaryPath?: unknown } | undefined; - if (panelParams?.surfaceType === 'agent-browser' && typeof panelParams.session === 'string') { - const binaryPath = typeof panelParams.binaryPath === 'string' ? panelParams.binaryPath : undefined; - getPlatform().agentBrowserCommand?.(panelParams.session, ['close'], binaryPath).catch(() => {}); - } + closeAgentBrowserSession(panel.params); orchestrateKill(api, id, selectPane, setSelectedId, killInProgressRef, overlayElRef); fireEvent({ type: 'kill', id }); }, [fireEvent, selectPane]); @@ -897,13 +920,11 @@ export function Wall({ * anything else gets a split (the `dor iframe` placement rule). */ const createContentSurface = useCallback(({ - component, minimized, params, reference, title, }: { - component: 'iframe' | 'agent-browser'; minimized: boolean; params: Record<string, unknown>; reference: DorSurface; @@ -918,6 +939,9 @@ export function Wall({ const referencePanel = api.getPanel(reference.id); if (!referencePanel) return { ok: false, message: `surface '${reference.ref}' is not visible` }; + // One component for every browser surface; the renderer is derived per mode. 
+ const component = 'browser'; + const renderer = rendererForParams(params); const newId = generatePaneId(); const replaceUntouchedTerminal = reference.type === 'terminal' && isUntouched(reference.id); @@ -930,8 +954,8 @@ export function Wall({ params, // Keep iframes mounted across (de)activation — dockview's default // onlyWhenVisible renderer detaches/reattaches panel DOM, and moving an - // <iframe> in the DOM reloads it (docs/specs/dor-iframe.md). - renderer: component === 'iframe' ? 'always' : undefined, + // <iframe> in the DOM reloads it (docs/specs/dor-browser.md). + renderer, position: { referencePanel: referencePanel.id, direction: 'within' }, }); disposeSession(reference.id); @@ -949,7 +973,7 @@ export function Wall({ tabComponent: 'surface', title, params, - renderer: component === 'iframe' ? 'always' : undefined, + renderer, position: { referencePanel: referencePanel.id, direction: dockDirection }, }); selectPane(newId); @@ -962,6 +986,41 @@ export function Wall({ return { ok: true, value: { id: newId, ref: surfaceRefForId(newId), status: 'created' } }; }, [generatePaneId, minimizePane, selectPane, surfaceRefForId]); + // The last binary path a `dor ab` surface resolved on a terminal's PATH. + // Re-used to spawn an agent-browser when swapping an iframe embed up to a + // screencast, since the webview/host PATH may not find the binary itself. + const lastAgentBrowserBinaryPathRef = useRef<string | undefined>(undefined); + + /** + * Replace a content surface's renderer in place, preserving its dock slot + * (docs/specs/dor-browser.md → "Display Modal And Render Swaps"). Adds the + * new panel `within` the old one, closes the old surface's session if any, + * then removes the old panel and selects the new. The generalized form of + * createContentSurface's replace-untouched-terminal branch. 
+ */ + const replaceSurface = useCallback((oldId: string, next: { + params: Record<string, unknown>; + title: string; + }): string | null => { + const api = apiRef.current; + const panel = api?.getPanel(oldId); + if (!api || !panel) return null; + closeAgentBrowserSession(panel.params); + const newId = generatePaneId(); + api.addPanel({ + id: newId, + component: 'browser', + tabComponent: 'surface', + title: next.title, + params: next.params, + renderer: rendererForParams(next.params), + position: { referencePanel: panel, direction: 'within' }, + }); + api.removePanel(panel); + selectPane(newId); + return newId; + }, [generatePaneId, selectPane]); + /** * The agent-browser session ↔ surface registry, derived from panel/door * params rather than kept as separate state so it survives webview reloads. @@ -969,9 +1028,7 @@ export function Wall({ */ const findAgentBrowserSurface = useCallback((session: string): { id: string; minimized: boolean } | null => { const isMatch = (params: unknown) => - !!params && typeof params === 'object' && - (params as { surfaceType?: unknown }).surfaceType === 'agent-browser' && - (params as { session?: unknown }).session === session; + isAgentBrowserParams(params) && (params as { session?: unknown }).session === session; const panel = apiRef.current?.panels.find((candidate) => isMatch(candidate.params)); if (panel) return { id: panel.id, minimized: false }; @@ -1296,9 +1353,8 @@ export function Wall({ return; } const result = createContentSurface({ - component: 'iframe', minimized: booleanParam(params.minimized), - params: { surfaceType: 'iframe', url }, + params: { surfaceType: 'browser', renderMode: 'iframe', url }, reference: target.value, title: hostPathDisplay(url, true), }); @@ -1328,6 +1384,8 @@ export function Wall({ const key = stringParam(params.key); const wsPort = numberParam(params.wsPort); const binaryPath = stringParam(params.binaryPath); + // Remember the resolved binary so an embed→screencast swap can spawn one. 
+ if (binaryPath) lastAgentBrowserBinaryPathRef.current = binaryPath; const refreshedParams = { ...(wsPort !== undefined ? { wsPort } : {}), ...(binaryPath !== undefined ? { binaryPath } : {}), @@ -1366,10 +1424,10 @@ export function Wall({ return; } const result = createContentSurface({ - component: 'agent-browser', minimized: booleanParam(params.minimized), params: { - surfaceType: 'agent-browser', + surfaceType: 'browser', + renderMode: 'ab-screencast', session, ...(key !== undefined ? { key } : {}), ...refreshedParams, @@ -1504,7 +1562,79 @@ export function Wall({ onCancelRename: () => { setRenamingPaneId(null); }, - }), [addSplitPanel, minimizePane, enterTerminalMode, exitTerminalMode, killPaneImmediately]); + onSwapRenderMode: (id, mode) => { + const api = apiRef.current; + const panel = api?.getPanel(id); + if (!api || !panel) return; + const params = panel.params as Record<string, unknown> | undefined; + const currentType = surfaceTypeFromParams(params); + + // agent-browser → iframe: frame the active tab's URL, then the replace + // closes the now-unneeded headless browser. Webview-only. + if (currentType === 'agent-browser' && mode === 'iframe') { + // Canonical params.url (mirrored from the chrome snapshot) first; fall + // back to the live snapshot for a surface that hasn't reported a tab yet. + const url = (typeof params?.url === 'string' && params.url) || getAgentBrowserScreenController(id)?.chrome().url; + if (!url) return; + replaceSurface(id, { + params: { surfaceType: 'browser', renderMode: 'iframe', url }, + title: hostPathDisplay(url, true), + }); + return; + } + + // iframe → live agent-browser (ab-screencast or ab-popout): the host must + // spawn a session for the URL (absent ⇒ inert, like other host-gated + // affordances). ab-popout spawns headed directly so the new surface mounts + // already popped-out (no headless launch + immediate relaunch flash). 
+ if (currentType === 'iframe' && (mode === 'ab-screencast' || mode === 'ab-popout')) { + const chromeUrl = getAgentBrowserScreenController(id)?.chrome().url; + const url = (typeof chromeUrl === 'string' && chromeUrl) + || (typeof params?.url === 'string' ? params.url : undefined); + const platform = getPlatform(); + if (!url || !platform.agentBrowserOpen) return; + const headed = mode === 'ab-popout'; + platform.agentBrowserOpen(url, { headed }, lastAgentBrowserBinaryPathRef.current).then((res) => { + if (!res.ok || !res.session) return; + if (res.binaryPath) lastAgentBrowserBinaryPathRef.current = res.binaryPath; + const nextParams = { + surfaceType: 'browser', + renderMode: mode, + session: res.session, + url, + ...(res.wsPort !== undefined ? { wsPort: res.wsPort } : {}), + ...(res.binaryPath !== undefined ? { binaryPath: res.binaryPath } : {}), + syncEngaged: true, + }; + const nextId = replaceSurface(id, { + params: nextParams, + title: hostPathDisplay(url, true), + }); + if (!nextId) { + closeAgentBrowserSession(nextParams); + console.warn(`[dormouse] failed to replace iframe surface '${id}' with agent-browser surface`); + } + }).catch((err) => { + console.warn('[dormouse] failed to swap iframe surface to agent-browser:', err); + }); + } + }, + onOpenBrowserPane: (id, url) => { + const api = apiRef.current; + if (!api) return; + // A new-tab request from the iframe shim → open the URL as a new iframe + // browser pane, split next to the source (docs/specs/dor-browser.md → + // "Iframe Shim"). 
+ const reference = buildDorSurfaces(api).find((s) => s.id === id); + if (!reference) return; + createContentSurface({ + minimized: false, + params: { surfaceType: 'browser', renderMode: 'iframe', url }, + reference, + title: hostPathDisplay(url, true), + }); + }, + }), [addSplitPanel, minimizePane, enterTerminalMode, exitTerminalMode, killPaneImmediately, replaceSurface, buildDorSurfaces, createContentSurface]); const wallActionsRef = useRef(wallActions); wallActionsRef.current = wallActions; diff --git a/lib/src/components/wall/AgentBrowserPanel.test.tsx b/lib/src/components/wall/AgentBrowserPanel.test.tsx new file mode 100644 index 00000000..d82cb820 --- /dev/null +++ b/lib/src/components/wall/AgentBrowserPanel.test.tsx @@ -0,0 +1,597 @@ +/** + * @vitest-environment jsdom + */ +import { act, StrictMode } from 'react'; +import { createRoot, type Root } from 'react-dom/client'; +import type { IDockviewPanelProps } from 'dockview-react'; +import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; +import { FakePtyAdapter, setPlatform } from '../../lib/platform'; +import type { AgentBrowserPopResult, AgentBrowserStreamStatusResult, PlatformAdapter } from '../../lib/platform/types'; +import { AgentBrowserPanel } from './AgentBrowserPanel'; +import { getAgentBrowserScreenController } from './agent-browser-screen'; +import { ModeContext, SelectedIdContext, WallActionsContext, type WallActions } from './wall-context'; + +globalThis.IS_REACT_ACT_ENVIRONMENT = true; + +type TestPanelParams = { + surfaceType: string; + renderMode?: string; + session: string; + wsPort?: number; + url?: string; + poppedOut?: boolean; +}; + +class ResizeObserverMock { + observe() {} + unobserve() {} + disconnect() {} +} + +function stubActions(overrides: Partial<WallActions> = {}): WallActions { + return { + onKill: vi.fn(), + onMinimize: vi.fn(), + onAlertButton: vi.fn(() => 'noop'), + onToggleTodo: vi.fn(), + onSplitH: vi.fn(), + onSplitV: vi.fn(), + onZoom: vi.fn(), + 
onClickPanel: vi.fn(), + onFocusPane: vi.fn(), + onStartRename: vi.fn(), + onFinishRename: vi.fn(() => ({ accepted: true })), + onCancelRename: vi.fn(), + onSwapRenderMode: vi.fn(), + ...overrides, + }; +} + +function panelProps( + id: string, + updateParameters = vi.fn(), +): IDockviewPanelProps<TestPanelParams> { + return { + api: { id, title: 'Browser', updateParameters, setTitle: vi.fn() }, + params: { surfaceType: 'agent-browser', session: 'browser-session' }, + } as unknown as IDockviewPanelProps<TestPanelParams>; +} + +class WebSocketMock { + static instances: WebSocketMock[] = []; + static OPEN = 1; + + onopen: ((event: Event) => void) | null = null; + onmessage: ((event: MessageEvent) => void) | null = null; + onclose: ((event: CloseEvent) => void) | null = null; + onerror: ((event: Event) => void) | null = null; + readyState = 1; + sent: string[] = []; + + constructor(public url: string) { + WebSocketMock.instances.push(this); + queueMicrotask(() => this.onopen?.(new Event('open'))); + } + + send(data: string) { + this.sent.push(data); + } + + close() { + this.readyState = 3; + this.onclose?.(new CloseEvent('close')); + } + + emitMessage(data: string) { + this.onmessage?.({ data } as MessageEvent); + } +} + +let container: HTMLDivElement; +let root: Root; + +beforeEach(() => { + vi.stubGlobal('ResizeObserver', ResizeObserverMock); + vi.stubGlobal('WebSocket', WebSocketMock); + WebSocketMock.instances = []; + container = document.createElement('div'); + document.body.appendChild(container); + root = createRoot(container); +}); + +afterEach(() => { + act(() => root.unmount()); + container.remove(); + vi.restoreAllMocks(); + setPlatform(new FakePtyAdapter()); +}); + +async function renderPanel(props = panelProps('ab-panel')): Promise<void> { + await act(async () => { + root.render( + <StrictMode> + <WallActionsContext.Provider value={stubActions()}> + <AgentBrowserPanel {...props} /> + </WallActionsContext.Provider> + </StrictMode>, + ); + }); +} + 
+describe('AgentBrowserPanel render mode controller', () => { + it('relaunches screencast sessions as popout and publishes the mode immediately', async () => { + const updateParameters = vi.fn(); + const popOut = vi.fn<PlatformAdapter['agentBrowserPopOut']>(async (): Promise<AgentBrowserPopResult> => ({ + ok: true, + wsPort: 3456, + })); + const streamStatus = vi.fn<PlatformAdapter['agentBrowserStreamStatus']>(async (): Promise<AgentBrowserStreamStatusResult> => ({ + ok: true, + wsPort: 1234, + })); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand' | 'agentBrowserPopOut' | 'agentBrowserStreamStatus'>; + platform.agentBrowserCommand = vi.fn(async () => ({ exitCode: 0, stdout: '', stderr: '' })); + platform.agentBrowserPopOut = popOut; + platform.agentBrowserStreamStatus = streamStatus; + setPlatform(platform); + + await renderPanel(panelProps('ab-panel', updateParameters)); + + await act(async () => { + getAgentBrowserScreenController('ab-panel')?.actions.setRenderMode?.('ab-popout'); + }); + + expect(popOut).toHaveBeenCalledWith('browser-session', expect.objectContaining({ url: undefined }), undefined); + expect(updateParameters).toHaveBeenCalledWith({ renderMode: 'ab-popout' }); + expect(getAgentBrowserScreenController('ab-panel')?.snapshot().renderMode).toBe('ab-popout'); + expect(container.textContent).toContain('This browser is running in a separate window.'); + expect(WebSocketMock.instances.some((ws) => ws.url === 'ws://127.0.0.1:3456')).toBe(true); + }); + + it('relaunches popped-out sessions back into screencast', async () => { + const updateParameters = vi.fn(); + const popIn = vi.fn<PlatformAdapter['agentBrowserPopIn']>(async (): Promise<AgentBrowserPopResult> => ({ + ok: true, + wsPort: 4567, + })); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand' | 'agentBrowserPopIn' | 'agentBrowserStreamStatus'>; + platform.agentBrowserCommand = vi.fn(async () 
=> ({ exitCode: 0, stdout: '', stderr: '' })); + platform.agentBrowserPopIn = popIn; + platform.agentBrowserStreamStatus = vi.fn(async () => ({ ok: true, wsPort: 1234 })); + setPlatform(platform); + + await renderPanel({ + ...panelProps('ab-panel', updateParameters), + params: { surfaceType: 'browser', renderMode: 'ab-popout', session: 'browser-session' }, + } as unknown as IDockviewPanelProps<{ surfaceType: string; renderMode: string; session: string }>); + + expect(getAgentBrowserScreenController('ab-panel')?.snapshot().renderMode).toBe('ab-popout'); + + await act(async () => { + getAgentBrowserScreenController('ab-panel')?.actions.setRenderMode?.('ab-screencast'); + }); + + expect(popIn).toHaveBeenCalledWith('browser-session', expect.objectContaining({ url: undefined }), undefined); + expect(updateParameters).toHaveBeenCalledWith({ renderMode: 'ab-screencast' }); + expect(getAgentBrowserScreenController('ab-panel')?.snapshot().renderMode).toBe('ab-screencast'); + }); + + it('pop-in uses the latest observed headed-window tab URL over stale params', async () => { + const updateParameters = vi.fn(); + const popIn = vi.fn<PlatformAdapter['agentBrowserPopIn']>(async (): Promise<AgentBrowserPopResult> => ({ + ok: true, + wsPort: 4567, + })); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand' | 'agentBrowserPopIn' | 'agentBrowserStreamStatus'>; + platform.agentBrowserCommand = vi.fn(async () => ({ exitCode: 0, stdout: '', stderr: '' })); + platform.agentBrowserPopIn = popIn; + platform.agentBrowserStreamStatus = vi.fn(async () => ({ ok: true, wsPort: 1234 })); + setPlatform(platform); + + await renderPanel({ + ...panelProps('ab-panel', updateParameters), + params: { + surfaceType: 'browser', + renderMode: 'ab-popout', + session: 'browser-session', + wsPort: 1111, + url: 'https://google.com/', + }, + }); + + await act(async () => { + WebSocketMock.instances[0]?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [{ 
tabId: 'tab-1', title: 'Example Domain', url: 'https://example.com/', active: true }], + })); + }); + + expect(updateParameters).toHaveBeenCalledWith({ url: 'https://example.com/' }); + + await act(async () => { + getAgentBrowserScreenController('ab-panel')?.actions.setRenderMode?.('ab-screencast'); + }); + + expect(popIn).toHaveBeenCalledWith('browser-session', expect.objectContaining({ url: 'https://example.com/' }), undefined); + }); + + it('mirrors popped-out stream tab URL updates when the stream reports id instead of tabId', async () => { + const updateParameters = vi.fn(); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand' | 'agentBrowserStreamStatus'>; + platform.agentBrowserCommand = vi.fn(async () => ({ exitCode: 0, stdout: '', stderr: '' })); + platform.agentBrowserStreamStatus = vi.fn(async () => ({ ok: true, wsPort: 1234 })); + setPlatform(platform); + + await renderPanel({ + ...panelProps('ab-panel', updateParameters), + params: { + surfaceType: 'browser', + renderMode: 'ab-popout', + session: 'browser-session', + wsPort: 1111, + url: 'https://google.com/', + }, + }); + + await act(async () => { + WebSocketMock.instances[0]?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [{ id: 'tab-1', title: 'Example Domain', url: 'https://example.com/', active: true }], + })); + }); + + expect(updateParameters).toHaveBeenCalledWith({ url: 'https://example.com/' }); + }); + + it('mirrors popped-out manual navigation from CDP target events', async () => { + const updateParameters = vi.fn(); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand' | 'agentBrowserStreamStatus'>; + platform.agentBrowserCommand = vi.fn(async (_session, args) => { + if (args.join(' ') === 'get cdp-url') return { exitCode: 0, stdout: 'ws://127.0.0.1:9222/devtools/browser/test', stderr: '' }; + return { exitCode: 0, stdout: '', stderr: '' }; + }); + platform.agentBrowserStreamStatus = 
vi.fn(async () => ({ ok: true, wsPort: 1234 })); + setPlatform(platform); + + await renderPanel({ + ...panelProps('ab-panel', updateParameters), + params: { + surfaceType: 'browser', + renderMode: 'ab-popout', + session: 'browser-session', + wsPort: 1111, + url: 'https://google.com/', + }, + }); + + await act(async () => { + await Promise.resolve(); + await Promise.resolve(); + }); + + const cdpWs = WebSocketMock.instances.find((ws) => ws.url.includes('/devtools/browser/')); + expect(cdpWs).toBeTruthy(); + + await act(async () => { + cdpWs?.emitMessage(JSON.stringify({ + method: 'Target.targetInfoChanged', + params: { targetInfo: { type: 'page', url: 'https://example.com/', title: 'Example Domain' } }, + })); + }); + + expect(platform.agentBrowserCommand).toHaveBeenCalledWith('browser-session', ['get', 'cdp-url'], undefined); + expect(updateParameters).toHaveBeenCalledWith({ url: 'https://example.com/' }); + }); + + it('actively selects a newly opened tab when the stream does not mark it active', async () => { + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand'>; + platform.agentBrowserCommand = vi.fn(async () => ({ exitCode: 0, stdout: '', stderr: '' })); + setPlatform(platform); + + await renderPanel({ + ...panelProps('ab-panel'), + params: { surfaceType: 'browser', session: 'browser-session', wsPort: 1111 }, + }); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [{ tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }], + })); + }); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }, + { tabId: 't2', title: 'GitHub', url: 'https://github.com/diffplug/dormouse', active: false }, + ], + })); + }); + + expect(platform.agentBrowserCommand).toHaveBeenCalledWith('browser-session', 
['tab', 't2'], undefined); + }); + + it('does not force-select a provisional new tab that already reports active', async () => { + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand'>; + platform.agentBrowserCommand = vi.fn(async () => ({ exitCode: 0, stdout: '', stderr: '' })); + setPlatform(platform); + + await renderPanel({ + ...panelProps('ab-panel'), + params: { surfaceType: 'browser', session: 'browser-session', wsPort: 1111 }, + }); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [{ tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }], + })); + }); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: false }, + { tabId: 't2', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }, + ], + })); + }); + + expect(platform.agentBrowserCommand).not.toHaveBeenCalled(); + }); + + it('selects a provisional new tab after it reaches its destination if it is not active', async () => { + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand'>; + platform.agentBrowserCommand = vi.fn(async () => ({ exitCode: 0, stdout: '', stderr: '' })); + setPlatform(platform); + + await renderPanel({ + ...panelProps('ab-panel'), + params: { surfaceType: 'browser', session: 'browser-session', wsPort: 1111 }, + }); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [{ tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }], + })); + }); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: false }, + { tabId: 't2', title: 'Dormouse', url: 
'https://dormouse.sh/', active: true }, + ], + })); + }); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }, + { tabId: 't2', title: 'GitHub', url: 'https://github.com/diffplug/dormouse', active: false }, + ], + })); + }); + + expect(platform.agentBrowserCommand).toHaveBeenCalledWith('browser-session', ['tab', 't2'], undefined); + }); + + it('keeps the last known active tab when the stream emits a transient empty tab list', async () => { + const updateParameters = vi.fn(); + await renderPanel({ + ...panelProps('ab-panel', updateParameters), + params: { surfaceType: 'browser', session: 'browser-session', wsPort: 1111 }, + }); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [{ tabId: 't2', title: 'GitHub', url: 'https://github.com/diffplug/dormouse', active: true }], + })); + }); + + expect(getAgentBrowserScreenController('ab-panel')?.chrome().url).toBe('https://github.com/diffplug/dormouse'); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ type: 'tabs', tabs: [] })); + }); + + expect(getAgentBrowserScreenController('ab-panel')?.chrome().url).toBe('https://github.com/diffplug/dormouse'); + }); + + it('does not recover a stale port through stream status after that port opened live', async () => { + const streamStatus = vi.fn<PlatformAdapter['agentBrowserStreamStatus']>(async (): Promise<AgentBrowserStreamStatusResult> => ({ + ok: true, + wsPort: 2222, + })); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserStreamStatus'>; + platform.agentBrowserStreamStatus = streamStatus; + setPlatform(platform); + + await renderPanel({ + ...panelProps('ab-panel'), + params: { surfaceType: 'browser', session: 'browser-session', wsPort: 1111 }, + }); + + await act(async () => { + await 
Promise.resolve(); + }); + streamStatus.mockClear(); + + await act(async () => { + WebSocketMock.instances.at(-1)?.emitMessage(JSON.stringify({ type: 'status', connected: false, screencasting: false })); + await Promise.resolve(); + }); + + expect(streamStatus).not.toHaveBeenCalled(); + }); + + it('swaps straight to iframe with no extra tabs (no confirm gate)', async () => { + const onSwapRenderMode = vi.fn(); + await act(async () => { + root.render( + <StrictMode> + <WallActionsContext.Provider value={stubActions({ onSwapRenderMode })}> + <AgentBrowserPanel {...panelProps('ab-panel')} /> + </WallActionsContext.Provider> + </StrictMode>, + ); + }); + + // A single-tab (here zero-tab) session has nothing to lose, so the swap is + // issued immediately; the ≥2-tab confirm gate is exercised only in the GUI. + await act(async () => { + getAgentBrowserScreenController('ab-panel')?.actions.setRenderMode?.('iframe'); + }); + + expect(onSwapRenderMode).toHaveBeenCalledWith('ab-panel', 'iframe'); + }); +}); + +describe('AgentBrowserPanel canvas input forwarding', () => { + // The pane that `dor ab open` creates is not the selected pane (the terminal + // is), so the FIRST click on the browser surface must still reach the page — + // it is the click that selects the pane. Mouse-down/up therefore gate on + // passthrough mode alone, not full `interactive` (mode && selected). 
+ async function renderWithMode(mode: 'passthrough' | 'command', selectedId: string | null): Promise<HTMLCanvasElement> { + const props = { + api: { id: 'ab-panel', title: 'Browser', updateParameters: vi.fn(), setTitle: vi.fn() }, + params: { surfaceType: 'agent-browser', session: 'browser-session', wsPort: 4321 }, + } as unknown as IDockviewPanelProps<TestPanelParams>; + await act(async () => { + root.render( + <StrictMode> + <WallActionsContext.Provider value={stubActions()}> + <ModeContext.Provider value={mode}> + <SelectedIdContext.Provider value={selectedId}> + <AgentBrowserPanel {...props} /> + </SelectedIdContext.Provider> + </ModeContext.Provider> + </WallActionsContext.Provider> + </StrictMode>, + ); + }); + const canvas = container.querySelector('canvas') as HTMLCanvasElement; + // jsdom has no layout — give the canvas a frame grid + box so toDevice maps. + canvas.width = 1280; + canvas.height = 720; + canvas.getBoundingClientRect = () => ({ width: 1280, height: 720, left: 0, top: 0, right: 1280, bottom: 720, x: 0, y: 0, toJSON() {} }) as DOMRect; + return canvas; + } + + const sentMouseEvents = () => WebSocketMock.instances + .flatMap((ws) => ws.sent) + .filter((m) => m.includes('"type":"input_mouse"')); + + it('forwards a click to the page when in passthrough mode even if the pane is not selected', async () => { + const canvas = await renderWithMode('passthrough', 'some-other-pane'); + await act(async () => { + canvas.dispatchEvent(new MouseEvent('mousedown', { bubbles: true, clientX: 100, clientY: 50, button: 0 })); + canvas.dispatchEvent(new MouseEvent('mouseup', { bubbles: true, clientX: 100, clientY: 50, button: 0 })); + }); + const events = sentMouseEvents(); + expect(events.some((m) => m.includes('"eventType":"mousePressed"'))).toBe(true); + expect(events.some((m) => m.includes('"eventType":"mouseReleased"'))).toBe(true); + }); + + it('does not forward canvas clicks in command mode', async () => { + const canvas = await renderWithMode('command', 
null); + await act(async () => { + canvas.dispatchEvent(new MouseEvent('mousedown', { bubbles: true, clientX: 100, clientY: 50, button: 0 })); + canvas.dispatchEvent(new MouseEvent('mouseup', { bubbles: true, clientX: 100, clientY: 50, button: 0 })); + }); + expect(sentMouseEvents()).toHaveLength(0); + }); +}); + +describe('AgentBrowserPanel tab strip actions', () => { + // The chip/× use plain onClick. In the real app a click on an unselected + // browser pane used to be lost because selecting the pane moved its DOM + // mid-press; that is fixed at the source by giving browser panels dockview's + // `renderer:'always'` (Wall.tsx → rendererForParams), so the node stays put and + // the click survives. jsdom doesn't move the DOM, so a dispatched click here + // just exercises the onClick → selectTab/closeTab wiring. + async function renderWithTwoTabs(): Promise<ReturnType<typeof vi.fn>> { + const command = vi.fn(async () => ({ exitCode: 0, stdout: '', stderr: '' })); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserCommand'>; + platform.agentBrowserCommand = command; + setPlatform(platform); + const props = { + api: { id: 'ab-panel', title: 'Browser', updateParameters: vi.fn(), setTitle: vi.fn() }, + params: { surfaceType: 'agent-browser', session: 'browser-session', wsPort: 4321 }, + } as unknown as IDockviewPanelProps<TestPanelParams>; + await renderPanel(props); + const ws = WebSocketMock.instances[WebSocketMock.instances.length - 1]; + await act(async () => { + ws.emitMessage(JSON.stringify({ type: 'tabs', tabs: [ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }, + { tabId: 't2', title: 'GitHub', url: 'https://github.com/diffplug/dormouse', active: false }, + ] })); + }); + return command; + } + + const chipFor = (url: string) => [...container.querySelectorAll('div[title]')] + .find((e) => e.getAttribute('title') === url && (e.className || '').includes('cursor-pointer')) as HTMLElement; 
+ + it('switches to an inactive tab on chip click', async () => { + const command = await renderWithTwoTabs(); + const chip = chipFor('https://github.com/diffplug/dormouse'); + await act(async () => { + chip.dispatchEvent(new MouseEvent('click', { bubbles: true, button: 0 })); + }); + expect(command).toHaveBeenCalledWith('browser-session', ['tab', 't2'], undefined); + }); + + it('closes a tab on the × button click', async () => { + const command = await renderWithTwoTabs(); + const closeBtn = chipFor('https://github.com/diffplug/dormouse') + .querySelector('button[aria-label="Close tab"]') as HTMLButtonElement; + await act(async () => { + closeBtn.dispatchEvent(new MouseEvent('click', { bubbles: true, button: 0 })); + }); + expect(command).toHaveBeenCalledWith('browser-session', ['tab', 'close', 't2'], undefined); + }); + + it('captures a fresh frame when the active tab changes, but not on other tab edits', async () => { + // The daemon emits no screencast frame on a tab switch and the dedup'd stream + // is otherwise silent, so the panel forces one device screenshot so the canvas + // follows the newly-active tab. 
+ const screenshot = vi.fn(async () => ({ ok: false as const, error: 'test' })); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserScreenshot'>; + platform.agentBrowserScreenshot = screenshot; + setPlatform(platform); + const props = { + api: { id: 'ab-panel', title: 'Browser', updateParameters: vi.fn(), setTitle: vi.fn() }, + params: { surfaceType: 'agent-browser', session: 'browser-session', wsPort: 4321 }, + } as unknown as IDockviewPanelProps<TestPanelParams>; + await renderPanel(props); + const ws = WebSocketMock.instances[WebSocketMock.instances.length - 1]; + const tabs = (a: 't1' | 't2', extra = false) => JSON.stringify({ type: 'tabs', tabs: [ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: a === 't1' }, + { tabId: 't2', title: 'GitHub', url: 'https://github.com/diffplug/dormouse', active: a === 't2' }, + ...(extra ? [{ tabId: 't3', title: 'GitHub', url: 'https://github.com/diffplug/dormouse', active: false }] : []), + ] }); + + await act(async () => { ws.emitMessage(tabs('t1')); }); + await new Promise((r) => setTimeout(r, 40)); // let the priming capture settle + screenshot.mockClear(); + + // Adding a tab without changing which is active must NOT force a capture. + await act(async () => { ws.emitMessage(tabs('t1', true)); }); + await new Promise((r) => setTimeout(r, 120)); + expect(screenshot).not.toHaveBeenCalled(); + + // Switching the active tab does. + await act(async () => { ws.emitMessage(tabs('t2', true)); }); + await new Promise((r) => setTimeout(r, 120)); + expect(screenshot).toHaveBeenCalled(); + }); +}); diff --git a/lib/src/components/wall/AgentBrowserPanel.tsx b/lib/src/components/wall/AgentBrowserPanel.tsx index 6498f53b..24c66c23 100644 --- a/lib/src/components/wall/AgentBrowserPanel.tsx +++ b/lib/src/components/wall/AgentBrowserPanel.tsx @@ -1,5 +1,5 @@ /** - * Live agent-browser session viewer (see docs/specs/dor-agent-browser.md). 
+ * Live agent-browser session viewer (see docs/specs/dor-browser.md). * * One WebSocket — the session's stream socket — carries everything: JPEG * frames out, `input_mouse`/`input_keyboard` in, plus pushed `status` and @@ -18,12 +18,15 @@ import { registerAgentBrowserScreen, type ChromeActions, type ChromeSnapshot, + type RenderMode, type ScreenActions, type ScreenRegistration, type ScreenSnapshot, type ScreenState, } from './agent-browser-screen'; import { hostPathDisplay } from './browser-url'; +import { resolveRenderMode } from './browser-surface'; +import { clearAgentBrowserSessionClosed, isAgentBrowserSessionClosed } from './agent-browser-sessions'; import { EDIT_OPS, MOUSE_BUTTONS, @@ -33,6 +36,12 @@ import { virtualKeyCode, } from './agent-browser-input'; import { createScreenshotLoop } from './agent-browser-screenshot-loop'; +import { + createAgentBrowserConnection, + type AgentBrowserConnection, + type AgentBrowserStreamStatus as StreamStatus, + type AgentBrowserTab as StreamTab, +} from './agent-browser-connection'; import { usePaneChrome } from './use-pane-chrome'; import { ModeContext, @@ -42,16 +51,38 @@ import { type AgentBrowserPanelParams = { surfaceType?: string; + /** Canonical render backend; the BrowserPanel shell also passes it as a prop. */ + renderMode?: RenderMode; session?: string; key?: string; wsPort?: number; binaryPath?: string; + /** The active tab's URL, mirrored from the live session so it persists in the + * layout blob and is available to render-mode swaps and pop-out without a live + * stream. The canonical target for the surface (see dor-browser.md → "Canonical Params"). */ + url?: string; /** Whether sync-to-pane is engaged; persists via the dockview layout blob so * a re-attached surface re-engages sync if it was engaged. Absent on a fresh - * surface ⇒ auto-engage (see docs/specs/dor-agent-browser.md). */ + * surface ⇒ auto-engage (see docs/specs/dor-browser.md). 
*/ syncEngaged?: boolean; + /** Whether this session is currently popped out to a headed OS window + * (docs/specs/dor-browser.md → "Pop-Out"). Persists via the + * layout blob so a re-attached surface re-renders the stub. */ + poppedOut?: boolean; }; +/** Best-effort screen rect for positioning a popped-out window over the pane. + * VS Code webviews can't read true screen coords (the host then centers); on + * standalone, window.screenX/Y offset the pane's viewport rect into screen + * space. */ +function paneScreenRect(el: HTMLElement | null): { x: number; y: number; width: number; height: number } | undefined { + if (!el) return undefined; + const r = el.getBoundingClientRect(); + const sx = typeof window.screenX === 'number' ? window.screenX : 0; + const sy = typeof window.screenY === 'number' ? window.screenY : 0; + return { x: Math.round(sx + r.left), y: Math.round(sy + r.top), width: Math.round(r.width), height: Math.round(r.height) }; +} + // SYNCED is "browser viewport CSS size == pane CSS size". The screencast is // always delivered at CSS-pixel resolution — the frame never encodes the // browser's DPR (verified 0.27.0: `set viewport 800 600 2` yields the same @@ -64,26 +95,28 @@ function dimsMatch(a: { w: number; h: number }, b: { w: number; h: number }): bo return Math.abs(a.w - b.w) <= DIM_TOLERANCE && Math.abs(a.h - b.h) <= DIM_TOLERANCE; } -// Stream messages above this size are frames (a base64 JPEG — ~150–220 KB at -// desktop sizes); `status`/`tabs` are well under 16 KB. We display screenshots, -// not frames, so a frame's payload is discarded — there's no point paying -// JSON.parse + a throwaway allocation for it (measured ~13 MB/s at 1080p/60fps -// on an animating page). We pulse on the raw message and read the viewport from -// the small `status` messages instead. Smaller messages still get parsed, so a -// rare tiny-viewport frame falling under the cutoff still pulses correctly. 
-const FRAME_PULSE_THRESHOLD = 16384; - -type StreamTab = { - tabId: string; - title: string | null; - url: string; - active: boolean; -}; +// A pop-out/pop-in relaunch restores a single URL. A transient about:blank — a +// stray tab the close+reopen can momentarily surface, or a freshly-relaunched +// blank page — must never be treated as the page to restore, or the real URL is +// lost on the way back in. Mirrors the host's usableRelaunchUrl. +function isRestorableUrl(url: string | null | undefined): url is string { + if (typeof url !== 'string') return false; + const trimmed = url.trim(); + return trimmed !== '' && trimmed !== 'about:blank'; +} -type StreamStatus = { - connected: boolean; - screencasting: boolean; -}; +function parseCdpUrl(stdout: string): string | null { + const trimmed = stdout.trim(); + if (!trimmed) return null; + try { + const parsed = JSON.parse(trimmed) as { data?: { result?: unknown }; result?: unknown; url?: unknown }; + const value = parsed.data?.result ?? parsed.result ?? parsed.url; + if (typeof value === 'string' && value.startsWith('ws://')) return value; + } catch { + // Plain text is the common CLI output. + } + return trimmed.match(/ws:\/\/\S+/)?.[0] ?? null; +} function tabDisplayTitle(tab: StreamTab): string { const title = tab.title?.trim(); @@ -91,15 +124,12 @@ function tabDisplayTitle(tab: StreamTab): string { return hostPathDisplay(tab.url) || 'untitled'; } -// Decode a base64 screencast frame to an ImageBitmap (the fallback display path -// for hosts that can't screenshot). Callers apply their own freshness guard. 
-function decodeScreencastFrame(dataBase64: string): Promise<ImageBitmap> { - const bytes = Uint8Array.from(atob(dataBase64), (c) => c.charCodeAt(0)); - return createImageBitmap(new Blob([bytes], { type: 'image/jpeg' })); -} - -export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrowserPanelParams>) { +export function AgentBrowserPanel({ api, params, renderMode: renderModeProp }: IDockviewPanelProps<AgentBrowserPanelParams> & { renderMode?: RenderMode }) { const actions = useContext(WallActionsContext); + // Stable handle so the screen controller (registered once) can reach the live + // Wall actions — used by setRenderMode to trigger an in-place surface swap. + const actionsRef = useRef(actions); + actionsRef.current = actions; const mode = useContext(ModeContext); const selectedId = useContext(SelectedIdContext); const elRef = useRef<HTMLDivElement>(null); @@ -109,18 +139,51 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow const session = params?.session; const wsPort = params?.wsPort; + const [streamPort, setStreamPort] = useState(wsPort); + useEffect(() => { setStreamPort(wsPort); }, [wsPort]); const interactive = mode === 'passthrough' && selectedId === api.id; const interactiveRef = useRef(interactive); interactiveRef.current = interactive; - - const wsRef = useRef<WebSocket | null>(null); - const frameSeqRef = useRef(0); + // A direct mouse click on the canvas should reach the page even when this pane + // isn't the selected one yet — the click is what selects it (via the root + // `onClickPanel`), but `selectedId` only updates on the next render, so gating + // mouse-down/up on `interactive` would swallow the very first click on a + // freshly-opened surface (the user clicks, nothing happens, they click again). + // Mouse forwarding therefore only requires passthrough mode; keyboard/wheel + // still require full `interactive` so a background pane never steals them. 
+ const passthrough = mode === 'passthrough'; + const passthroughRef = useRef(passthrough); + passthroughRef.current = passthrough; + + const connectionRef = useRef<AgentBrowserConnection | null>(null); const deviceRef = useRef({ width: 1280, height: 720 }); const [status, setStatus] = useState<StreamStatus | null>(null); const [hasFrame, setHasFrame] = useState(false); const [connectionLost, setConnectionLost] = useState(false); + const [streamRecoverySeq, setStreamRecoverySeq] = useState(0); const [tabs, setTabs] = useState<StreamTab[]>([]); - const knownTabIdsRef = useRef<Set<string>>(new Set()); + const tabsRef = useRef(tabs); + tabsRef.current = tabs; + // Crossing to the single-frame iframe renderer closes all but the active tab; + // when others are open the swap is gated behind a typed confirm (overlay below). + const [pendingIframeSwap, setPendingIframeSwap] = useState(false); + const swapConfirmRef = useRef<HTMLDivElement>(null); + + // Pop-out state: while true the browser runs in a headed OS window and the + // pane is a stub. Seeded from params so it survives a re-attach. + // poppedOut is derived from the canonical renderMode the shell passes; fall + // back to resolving it from params for a direct mount (tests) or a legacy blob. + const seededMode = renderModeProp ?? resolveRenderMode(params); + const [poppedOut, setPoppedOut] = useState<boolean>(seededMode === 'ab-popout'); + const poppedOutRef = useRef(poppedOut); + poppedOutRef.current = poppedOut; + // Gate auto-revert: only treat a dropped stream as "window closed" once the + // headed stream has actually connected (avoids reverting mid-relaunch). + const headedConnectedRef = useRef(false); + // True while a headed↔headless relaunch is in flight. The relaunch closes the + // current stream before reopening on a new port, so that expected drop must + // not be read as "the headed window closed" (it would auto-revert mid-pop-out). 
+ const relaunchingRef = useRef(false); const binaryPath = params?.binaryPath; const runAgentBrowser = useCallback((args: string[]) => { @@ -146,6 +209,97 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow sessionRef.current = session; const binaryPathRef = useRef(binaryPath); binaryPathRef.current = binaryPath; + const wsPortRef = useRef(streamPort); + wsPortRef.current = streamPort; + const liveStreamPortRef = useRef<number | null>(null); + // Canonical URL mirror (params.url): kept in a ref so the pop-out / pop-in + // callbacks read the latest without re-creating, and prefer it over the live + // chrome snapshot, which can be momentarily empty during a relaunch. + const paramsUrl = params?.url; + const paramsUrlRef = useRef(paramsUrl); + paramsUrlRef.current = paramsUrl; + // The newest non-blank active-tab URL observed from the live stream. This is + // deliberately separate from params.url: Dockview param writes can lag a tab + // message, but pop-in/auto-revert must carry the page the user just navigated + // to in the headed window. + const latestRestorableUrlRef = useRef<string | undefined>(isRestorableUrl(paramsUrl) ? paramsUrl : undefined); + useEffect(() => { + if (isRestorableUrl(paramsUrl)) latestRestorableUrlRef.current = paramsUrl; + }, [paramsUrl]); + const rememberRestorableUrl = useCallback((url: string | null | undefined) => { + if (!isRestorableUrl(url)) return false; + latestRestorableUrlRef.current = url; + if (!relaunchingRef.current && url !== paramsUrlRef.current) { + paramsUrlRef.current = url; + api.updateParameters({ url }); + } + return true; + }, [api]); + const rememberActiveTabUrl = useCallback((next: StreamTab[]) => { + const active = next.find((t) => t.active) ?? next[0] ?? 
null; + rememberRestorableUrl(active?.url); + }, [rememberRestorableUrl]); + const applyObservedNavigation = useCallback((url: string | null | undefined, title?: string | null) => { + if (!isRestorableUrl(url)) return; + rememberRestorableUrl(url); + setTabs((prev) => { + if (prev.length === 0) return [{ tabId: 'cdp-active', title: title ?? null, url, active: true }]; + const activeIndex = Math.max(0, prev.findIndex((tab) => tab.active)); + const current = prev[activeIndex]; + if (!current || (current.url === url && (title == null || current.title === title))) return prev; + return prev.map((tab, index) => index === activeIndex + ? { ...tab, url, title: title ?? tab.title } + : tab); + }); + }, [rememberRestorableUrl]); + + const closeIfSessionMarkedClosed = useCallback((targetSession: string | null | undefined = sessionRef.current): boolean => { + if (!targetSession || !isAgentBrowserSessionClosed(targetSession)) return false; + getPlatform().agentBrowserCommand?.(targetSession, ['close'], binaryPathRef.current).catch(() => {}); + return true; + }, []); + + const reconcileStreamPort = useCallback(async (directPort?: number): Promise<boolean> => { + if (closeIfSessionMarkedClosed()) return false; + setConnectionLost(false); + setStatus(null); + setHasFrame(false); + + if (directPort && directPort > 0) { + if (directPort !== wsPortRef.current) { + setStreamPort(directPort); + console.log(`[ab-panel] subscribing to returned stream port ${JSON.stringify({ session: sessionRef.current, wsPort: directPort, previousWsPort: wsPortRef.current })}`); + } + if (directPort !== wsPort) { + api.updateParameters({ wsPort: directPort }); + } else { + setStreamRecoverySeq((seq) => seq + 1); + } + return true; + } + + const currentSession = sessionRef.current; + const platform = getPlatform(); + if (!currentSession || !platform.agentBrowserStreamStatus) { + setStreamRecoverySeq((seq) => seq + 1); + return false; + } + + try { + const res = await 
platform.agentBrowserStreamStatus(currentSession, binaryPathRef.current); + if (closeIfSessionMarkedClosed(currentSession)) return false; + if (!res.ok || !res.wsPort) return false; + if (res.wsPort !== wsPortRef.current) { + setStreamPort(res.wsPort); + api.updateParameters({ wsPort: res.wsPort }); + } else { + setStreamRecoverySeq((seq) => seq + 1); + } + return true; + } catch { + return false; + } + }, [api, closeIfSessionMarkedClosed, wsPort]); // --- display: crisp HiDPI screenshots, paced by stream-frame "pulses" --- // @@ -169,15 +323,6 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow const screenshotCapableRef = useRef(false); screenshotCapableRef.current = !!getPlatform().agentBrowserScreenshot && !!session; - const screenshotLoop = useMemo(() => createScreenshotLoop({ - getSession: () => sessionRef.current, - getBinaryPath: () => binaryPathRef.current, - isCapable: () => screenshotCapableRef.current, - draw: drawBitmap, - }), [drawBitmap]); - - useEffect(() => () => screenshotLoop.dispose(), [screenshotLoop]); - // --- screen indicator (SYNCED/SCALED) + sync-to-pane --- // // A fresh surface auto-engages sync (no persisted flag); a re-attached one @@ -204,7 +349,8 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow // DPR can't be read back from frames, so report the density we'd sync to. const viewport = { w: device.width, h: device.height, dpr: displayDpr }; const state: ScreenState = dimsMatch(viewport, paneCss) ? 'SYNCED' : 'SCALED'; - return { state, viewport, paneCss, displayDpr, syncEngaged: syncEngagedRef.current }; + const renderMode = poppedOutRef.current ? 
'ab-popout' : 'ab-screencast'; + return { state, viewport, paneCss, displayDpr, syncEngaged: syncEngagedRef.current, renderMode }; }, []); // Publish to the registry only when something the header/modal cares about @@ -220,6 +366,7 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow prev.viewport.dpr !== next.viewport.dpr || prev.displayDpr !== next.displayDpr || prev.syncEngaged !== next.syncEngaged || + prev.renderMode !== next.renderMode || !dimsMatch(prev.paneCss, next.paneCss); if (changed) { lastPublishedRef.current = next; @@ -229,6 +376,10 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow // Push the current pane size to the browser as a native `set viewport`. const issueSyncToPane = useCallback(() => { + // A popped-out surface is a real headed OS window the user drives directly; + // never force its viewport to the (now-stub) pane size. Sync resumes when it + // pops back in — the wsPort-change effect re-issues against the fresh session. + if (poppedOutRef.current) return; // Hosts without agentBrowserCommand (Tauri today) can't drive the viewport; // stay silent rather than warn on every resize. The surface just reads // SCALED, which is accurate. @@ -274,126 +425,197 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow // --- stream connection --- useEffect(() => { - if (!wsPort) return; - let disposed = false; - let ws: WebSocket | null = null; - let retryTimer: ReturnType<typeof setTimeout> | undefined; - let failures = 0; - - knownTabIdsRef.current = new Set(); - - // Fallback only (hosts without the screenshot capability, e.g. Tauri): - // render the CSS-resolution screencast frame directly. - const drawScreencastFrame = (data: string) => { - const seq = ++frameSeqRef.current; - decodeScreencastFrame(data).then((bitmap) => { - // JPEG decodes are async; drop frames that finished out of order. 
- if (disposed || seq !== frameSeqRef.current) { - bitmap.close(); - return; + if (!streamPort || !session) return; + + // Per-connection resource: created here and disposed in this effect's cleanup + // so it survives React StrictMode's mount→cleanup→mount double-invoke. A + // memoized loop disposed by a separate effect's cleanup would never be + // recreated on the re-mount, leaving every frame pulse dropped (disposed loop). + const screenshotLoop = createScreenshotLoop({ + getSession: () => sessionRef.current, + getBinaryPath: () => binaryPathRef.current, + isCapable: () => screenshotCapableRef.current, + draw: drawBitmap, + }); + + const connection = createAgentBrowserConnection({ + session, + streamPort, + binaryPath: binaryPathRef.current, + getStreamUrl: async (port) => (await getPlatform().getAgentBrowserStreamUrl?.(port)) ?? undefined, + runCommand: (targetSession, args, targetBinaryPath) => getPlatform().agentBrowserCommand?.(targetSession, args, targetBinaryPath) + ?? Promise.resolve({ exitCode: 1, stdout: '', stderr: 'agent-browser commands unavailable' }), + canSelectTabs: () => !poppedOutRef.current && !relaunchingRef.current, + log: (message) => console.log(message), + }); + connectionRef.current = connection; + const unsubscribe = connection.subscribe((event) => { + if (event.type === 'connection-open') { + liveStreamPortRef.current = event.port; + setConnectionLost(false); + } else if (event.type === 'connection-close') { + if (event.failures >= 3) setConnectionLost(true); + } else if (event.type === 'status') { + setStatus(event.status); + setConnectionLost(event.status.connected === false); + const maybeStatus = event.status as StreamStatus & { viewportWidth?: number; viewportHeight?: number }; + if (typeof maybeStatus.viewportWidth === 'number' && typeof maybeStatus.viewportHeight === 'number') { + deviceRef.current = { width: maybeStatus.viewportWidth, height: maybeStatus.viewportHeight }; + maybeDisengageSync(); + publishScreen(); + } + } 
else if (event.type === 'tabs') { + const prevActiveId = event.previousTabs.find((t) => t.active)?.tabId; + const nextActiveId = event.tabs.find((t) => t.active)?.tabId; + rememberActiveTabUrl(event.tabs); + setTabs(event.tabs); + // Switching the active tab doesn't make the daemon emit a screencast + // frame, and the dedup'd stream is otherwise silent on a static page, so + // nothing would repaint the canvas onto the newly-active tab. Force one + // capture so the surface follows the tab the user just selected. + if (nextActiveId && nextActiveId !== prevActiveId && !poppedOutRef.current && !relaunchingRef.current) { + screenshotLoop.pulse(); } - drawBitmap(bitmap); - }).catch(() => {}); + } else if (event.type === 'frame-pulse') { + if (event.metadata?.deviceWidth && event.metadata?.deviceHeight) { + deviceRef.current = { width: event.metadata.deviceWidth, height: event.metadata.deviceHeight }; + } + maybeDisengageSync(); + publishScreen(); + if (!poppedOutRef.current && !relaunchingRef.current) screenshotLoop.pulse(); + } + }); + setHasFrame(false); + setConnectionLost(false); + return () => { + unsubscribe(); + if (connectionRef.current === connection) connectionRef.current = null; + connection.dispose(); + screenshotLoop.dispose(); }; + }, [streamPort, streamRecoverySeq, session, maybeDisengageSync, publishScreen, drawBitmap, rememberActiveTabUrl]); + + // agent-browser's stream currently publishes the initial headed tab list, but + // not every same-tab manual navigation. While popped out, subscribe directly + // to Chrome DevTools Protocol target/page events so the Dormouse URL/header + // tracks the headed window without polling. + useEffect(() => { + if (!poppedOut || !session) return; + const platform = getPlatform(); + // Bind to a local const: TS doesn't carry the narrowing of an optional + // property into the nested `connect` closure, so the direct call wouldn't + // typecheck against the `agentBrowserCommand?` signature. 
+ const runCommand = platform.agentBrowserCommand; + if (!runCommand) return; + let disposed = false; + let ws: WebSocket | null = null; + let nextId = 1; - const handleMessage = (raw: unknown) => { + const send = (method: string, params?: Record<string, unknown>) => { + if (ws?.readyState === WebSocket.OPEN) ws.send(JSON.stringify({ id: nextId++, method, ...(params ? { params } : {}) })); + }; + const handleTargetInfo = (targetInfo: unknown) => { + if (!targetInfo || typeof targetInfo !== 'object') return; + const info = targetInfo as { type?: unknown; url?: unknown; title?: unknown }; + if (info.type !== 'page') return; + applyObservedNavigation( + typeof info.url === 'string' ? info.url : null, + typeof info.title === 'string' ? info.title : null, + ); + }; + const handleCdpMessage = (raw: unknown) => { if (typeof raw !== 'string') return; let msg: any; - try { - msg = JSON.parse(raw); - } catch { - return; - } - if (msg.type === 'frame' && typeof msg.data === 'string') { - // The frame carries the browser's live viewport; update it, reconcile - // sync, and refresh the indicator (publishScreen self-gates). Then use - // the frame as a "page changed" pulse to grab a crisp screenshot — or, - // where the host can't screenshot, render the frame itself. 
- if (msg.metadata?.deviceWidth && msg.metadata?.deviceHeight) { - deviceRef.current = { width: msg.metadata.deviceWidth, height: msg.metadata.deviceHeight }; - } - maybeDisengageSync(); - publishScreen(); - if (screenshotCapableRef.current) screenshotLoop.pulse(); - else drawScreencastFrame(msg.data); - } else if (msg.type === 'status') { - setStatus({ connected: msg.connected === true, screencasting: msg.screencasting === true }); - setConnectionLost(msg.connected === false); - if (typeof msg.viewportWidth === 'number' && typeof msg.viewportHeight === 'number') { - deviceRef.current = { width: msg.viewportWidth, height: msg.viewportHeight }; - maybeDisengageSync(); - publishScreen(); + try { msg = JSON.parse(raw); } catch { return; } + if (msg.method === 'Target.targetCreated' || msg.method === 'Target.targetInfoChanged') { + handleTargetInfo(msg.params?.targetInfo); + } else if (msg.method === 'Target.targetDestroyed') { + console.log(`[ab-panel] cdp target destroyed ${JSON.stringify({ targetId: msg.params?.targetId })}`); + } else if (msg.method === 'Page.frameNavigated') { + const frame = msg.params?.frame; + if (!frame?.parentId) { + applyObservedNavigation( + typeof frame?.url === 'string' ? frame.url : null, + typeof frame?.name === 'string' ? frame.name : null, + ); } - } else if (msg.type === 'tabs' && Array.isArray(msg.tabs)) { - const next: StreamTab[] = msg.tabs - .filter((t: any) => typeof t?.tabId === 'string') - .map((t: any) => ({ - tabId: t.tabId, - title: typeof t.title === 'string' ? t.title : null, - url: typeof t.url === 'string' ? t.url : '', - active: t.active === true, - })); - // Web-opened tabs (popups, target=_blank) are focused, matching - // browser foregrounding. Skip the first message — that's catch-up, - // not a popup. 
- const known = knownTabIdsRef.current; - if (known.size > 0) { - const fresh = next.filter((t) => !known.has(t.tabId)); - const newest = fresh[fresh.length - 1]; - if (newest && !newest.active) runAgentBrowser(['tab', newest.tabId]); - } - knownTabIdsRef.current = new Set(next.map((t) => t.tabId)); - setTabs(next); + } else if (Array.isArray(msg.result?.targetInfos)) { + for (const targetInfo of msg.result.targetInfos) handleTargetInfo(targetInfo); } }; const connect = async () => { - let url: string | null = null; + let cdpUrl: string | null = null; try { - url = (await getPlatform().getAgentBrowserStreamUrl?.(wsPort)) ?? null; - } catch { - url = null; + const result = await runCommand(session, ['get', 'cdp-url'], binaryPathRef.current); + if (result.exitCode === 0) cdpUrl = parseCdpUrl(result.stdout); + else console.log(`[ab-panel] cdp-url failed ${JSON.stringify({ stderr: result.stderr, stdout: result.stdout })}`); + } catch (err) { + console.log(`[ab-panel] cdp-url error ${String(err)}`); } - if (disposed) return; - ws = new WebSocket(url ?? `ws://127.0.0.1:${wsPort}`); - wsRef.current = ws; + if (disposed || !cdpUrl) return; + console.log(`[ab-panel] connecting cdp ${JSON.stringify({ cdpUrl })}`); + ws = new WebSocket(cdpUrl); ws.onopen = () => { - failures = 0; - setConnectionLost(false); + console.log('[ab-panel] cdp open'); + send('Target.setDiscoverTargets', { discover: true }); + send('Target.getTargets'); + // If get cdp-url ever returns a page websocket instead of the browser + // websocket, these page-level events are the navigation source. + send('Page.enable'); }; - ws.onmessage = (ev) => { - // Fast-path discarded frames: any large message is a screencast frame - // whose pixels we don't display, so pulse without parsing the payload. 
- const data = ev.data; - if (screenshotCapableRef.current && typeof data === 'string' && data.length > FRAME_PULSE_THRESHOLD) { - screenshotLoop.pulse(); - return; - } - handleMessage(data); - }; - ws.onclose = () => { - wsRef.current = null; - if (disposed) return; - failures += 1; - // The port dies with the session, so repeated refusals mean the - // browser is gone; keep a slow retry alive in case it comes back on - // the same port, but a new `dor ab` updating wsPort is the real path. - if (failures >= 3) setConnectionLost(true); - retryTimer = setTimeout(connect, Math.min(1000 * 2 ** failures, 10000)); - }; - ws.onerror = () => {}; + ws.onmessage = (ev) => handleCdpMessage(ev.data); + ws.onclose = () => { if (!disposed) console.log('[ab-panel] cdp close'); }; + ws.onerror = () => console.log('[ab-panel] cdp error'); }; - setHasFrame(false); - setConnectionLost(false); void connect(); return () => { disposed = true; - if (retryTimer !== undefined) clearTimeout(retryTimer); - wsRef.current = null; ws?.close(); }; - }, [wsPort, runAgentBrowser, maybeDisengageSync, publishScreen, screenshotLoop, drawBitmap]); + }, [poppedOut, session, streamPort, applyObservedNavigation]); + + // A persisted panel may restore with a stale wsPort: the agent-browser + // session is still alive, but the stream server restarted on a new port while + // VS Code/webview state kept the old one. Once the old socket is proven dead + // (or no port was persisted), ask the host for the current port and rewrite + // panel params so the normal WebSocket effect reconnects. + useEffect(() => { + if (!session) return; + // Critical: do NOT query the daemon mid-relaunch. A pop-out/pop-in close+kills + // the daemon before reopening; querying `stream status` in that window spawns + // a fresh COMPETING headless daemon on a different port and pins the panel to + // it — so the panel ends up streaming an about:blank ghost instead of the + // headed window. 
The host hands back the authoritative port when it's done. + if (relaunchingRef.current) return; + // Once this exact port has opened, a later disconnect is a live stream + // failure, not a stale persisted port. Do not ask `stream status` here: + // the CLI can spawn a fresh daemon and reset the session, hiding the real + // failure and reverting the URL. + if (streamPort && liveStreamPortRef.current === streamPort) { + if (connectionLost || status?.connected === false) { + console.log(`[ab-panel] stream recovery skipped for live port ${JSON.stringify({ session, wsPort: streamPort, connectionLost, connected: status?.connected })}`); + } + return; + } + if (streamPort && !connectionLost && status?.connected !== false) return; + const platform = getPlatform(); + if (!platform.agentBrowserStreamStatus) return; + let cancelled = false; + platform.agentBrowserStreamStatus(session, binaryPath).then((res) => { + if (cancelled || !res.ok || !res.wsPort) return; + setConnectionLost(false); + setStatus(null); + if (res.wsPort !== streamPort) { + setStreamPort(res.wsPort); + api.updateParameters({ wsPort: res.wsPort }); + } else { + setStreamRecoverySeq((seq) => seq + 1); + } + }).catch(() => {}); + return () => { cancelled = true; }; + }, [session, binaryPath, streamPort, connectionLost, status?.connected, api]); // --- header: persisted title + browser-chrome (URL / key) --- @@ -417,6 +639,13 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow }; const chromeSnapshotRef = useRef(chromeSnapshot); chromeSnapshotRef.current = chromeSnapshot; + const currentRelaunchUrl = useCallback(() => { + return [ + latestRestorableUrlRef.current, + chromeSnapshotRef.current.url, + paramsUrlRef.current, + ].find(isRestorableUrl); + }, []); // Native history nav — `back`/`forward`/`reload` issued like tab actions // (allowlisted in agentBrowserCommand). 
Stable; reads the live closure via @@ -433,6 +662,82 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow // Stable across renders (reads refs / stable setters), so the registered // controller never goes stale. Engaging always re-issues (clears lastIssued) // so it reclaims the viewport even from an external device. + // --- Headed Pop-Out: relaunch this session's browser as a native OS window. + // The pane becomes a stub; the stream stays connected to observe tabs/status + // and to auto-revert when the window closes. The new Chrome process gets a + // fresh stream port, which we write into params so the WS reconnects. --- + const popOut = useCallback(() => { + const platform = getPlatform(); + if (!session || !platform.agentBrowserPopOut) return; + if (closeIfSessionMarkedClosed(session)) return; + headedConnectedRef.current = false; + relaunchingRef.current = true; + setPoppedOut(true); + api.updateParameters({ renderMode: 'ab-popout' }); + // Pop-out failed: revert to in-pane unless the stream came back live anyway. + const revertUnlessLive = () => reconcileStreamPort().then((live) => { + relaunchingRef.current = false; + if (live) return; + setPoppedOut(false); + api.updateParameters({ renderMode: 'ab-screencast' }); + }); + // Don't reconcile to the current (headless) port first — it's about to close. + // Connect to the headed window's fresh port once the relaunch returns it. 
+ const url = currentRelaunchUrl(); + console.log(`[ab-panel] popOut -> ${JSON.stringify({ session, url })}`); + platform.agentBrowserPopOut(session, { rect: paneScreenRect(elRef.current), url }, binaryPathRef.current).then((res) => { + console.log(`[ab-panel] popOut result ${JSON.stringify(res)}`); + if (closeIfSessionMarkedClosed(session)) return; + if (!res.ok) { + void revertUnlessLive(); + return; + } + void reconcileStreamPort(res.wsPort); + relaunchingRef.current = false; + }).catch((err) => { + console.log(`[ab-panel] popOut error ${String(err)}`); + if (closeIfSessionMarkedClosed(session)) return; + void revertUnlessLive(); + }); + }, [session, api, reconcileStreamPort, closeIfSessionMarkedClosed, currentRelaunchUrl]); + + const popIn = useCallback(() => { + if (closeIfSessionMarkedClosed(session)) return; + // Same expected mid-relaunch stream drop as pop-out: suppress screenshot + // pulses so none relaunches the just-closed browser at about:blank. + relaunchingRef.current = true; + setPoppedOut(false); + api.updateParameters({ renderMode: 'ab-screencast' }); + const platform = getPlatform(); + if (!session || !platform.agentBrowserPopIn) { relaunchingRef.current = false; return; } + // Don't reconcile to the current (headed) port first — the host is about to + // kill that daemon. Querying now would spawn a competing daemon (see the + // recovery-effect note). Connect to the fresh port the host returns. 
+ const url = currentRelaunchUrl(); + console.log(`[ab-panel] popIn -> ${JSON.stringify({ session, url })}`); + platform.agentBrowserPopIn(session, { url }, binaryPathRef.current).then((res) => { + console.log(`[ab-panel] popIn result ${JSON.stringify(res)}`); + if (closeIfSessionMarkedClosed(session)) { relaunchingRef.current = false; return; } + if (res.ok) void reconcileStreamPort(res.wsPort); + else void reconcileStreamPort(); + relaunchingRef.current = false; + }).catch(() => { + if (closeIfSessionMarkedClosed(session)) { relaunchingRef.current = false; return; } + void reconcileStreamPort(); + relaunchingRef.current = false; + }); + }, [session, api, reconcileStreamPort, closeIfSessionMarkedClosed, currentRelaunchUrl]); + + const bringToFront = useCallback(() => { + if (!session) return; + getPlatform().agentBrowserBringToFront?.(session, binaryPathRef.current)?.catch(() => {}); + }, [session]); + + const popOutRef = useRef(popOut); + popOutRef.current = popOut; + const popInRef = useRef(popIn); + popInRef.current = popIn; + const screenActions = useMemo<ScreenActions>(() => ({ engageSync() { // Clear lastIssued so the issue below isn't skipped, and issue now rather @@ -455,6 +760,17 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow openModal() { openAgentBrowserScreenModal(api.id); }, + setRenderMode(renderMode) { + // agent-browser → iframe is a render swap handled by the Wall; + // ab-screencast ↔ ab-popout relaunches this same session, handled in-panel. + if (renderMode === 'iframe') { + // The iframe renderer is single-frame: only the active tab survives. + // Warn + require a typed confirm when other tabs would be closed. 
+ if (tabsRef.current.length >= 2) setPendingIframeSwap(true); + else actionsRef.current.onSwapRenderMode(api.id, 'iframe'); + } else if (renderMode === 'ab-popout') popOutRef.current(); + else if (poppedOutRef.current) popInRef.current(); // ab-popout → ab-screencast + }, }), [api.id, issueSyncToPane]); useEffect(() => { @@ -464,6 +780,7 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow chrome: chromeSnapshotRef.current, chromeActions, hostCapable: !!getPlatform().agentBrowserCommand, + canPopOut: !!getPlatform().agentBrowserPopOut, }); registrationRef.current = registration; lastPublishedRef.current = null; @@ -482,6 +799,56 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow // displayUrl is a pure function of url, so url covers it. }, [chromeSnapshot.url, chromeSnapshot.title, chromeSnapshot.key]); + // Mirror the active tab's URL into params so it persists in the layout blob and + // render-mode swaps / pop-out have a canonical URL even when the live stream is + // momentarily without an active tab (see dor-browser.md → "Canonical Params": + // `url` is the canonical target). Non-empty changes only; the write sets params.url === url so + // it does not re-fire. + useEffect(() => { + const url = chromeSnapshot.url; + // Track the active tab faithfully so url is always the page the user is on — + // this is the source of truth the relaunch (pop-out/pop-in/auto-revert) + // reads. Two guards: freeze while a relaunch is in flight (the active tab is + // momentarily a blank/booting page that must not overwrite the real target), + // and never record a transient about:blank. + if (!relaunchingRef.current && isRestorableUrl(url) && url !== paramsUrlRef.current) { + latestRestorableUrlRef.current = url; + paramsUrlRef.current = url; + api.updateParameters({ url }); + } + }, [chromeSnapshot.url, api]); + + // Push the render-mode flip (screencast ↔ popout) to the header/modal. 
+ useEffect(() => { publishScreen(); }, [poppedOut, publishScreen]); + + // This surface owns its session again — clear any teardown mark a prior + // surface (re-using the same managed name) left behind, so auto-revert works. + useEffect(() => { + if (session) clearAgentBrowserSessionClosed(session); + }, [session]); + + // Auto-revert: once the headed stream has connected, a later disconnect means + // the window closed → relaunch headless and resume streaming (spec → Lifecycle). + // But a disconnect also happens when Dormouse itself closes the session (pane + // kill, or a render-swap away from popout); the closed-session mark tells those + // apart so we don't resurrect a session that's being torn down. + useEffect(() => { + if (!poppedOut) { headedConnectedRef.current = false; return; } + // The expected mid-relaunch drop isn't the window closing — ignore it. + if (relaunchingRef.current) return; + if (status?.connected === true) headedConnectedRef.current = true; + else if (headedConnectedRef.current && (status?.connected === false || connectionLost)) { + if (sessionRef.current && isAgentBrowserSessionClosed(sessionRef.current)) return; + popInRef.current(); + } + }, [poppedOut, status?.connected, connectionLost]); + + // Focus the swap-confirm overlay when it appears so it captures the typed + // confirm/cancel keys (the pane's key-forwarder skips in-pane targets). + useEffect(() => { + if (pendingIframeSwap) swapConfirmRef.current?.focus(); + }, [pendingIframeSwap]); + // Persist sync state into the panel params so it round-trips through the // dockview layout blob (and survives reattach). Skip no-op writes. useEffect(() => { @@ -522,10 +889,10 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow // essential — it otherwise still holds the previous session's pane size and // issueSyncToPane would no-op, leaving the fresh browser unsynced (SCALED). 
useEffect(() => { - if (!wsPort || !syncEngagedRef.current) return; + if (!streamPort || !syncEngagedRef.current) return; lastIssuedRef.current = null; issueSyncToPane(); - }, [wsPort, issueSyncToPane]); + }, [streamPort, issueSyncToPane]); // Display-scale (DPR) changes don't resize the pane, so ResizeObserver misses // them; a window resize is the available signal. Recompute the indicator and, @@ -542,8 +909,7 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow // --- input forwarding (stream-native input_* messages) --- const send = useCallback((payload: Record<string, unknown>) => { - const ws = wsRef.current; - if (ws && ws.readyState === WebSocket.OPEN) ws.send(JSON.stringify(payload)); + connectionRef.current?.send(payload); }, []); const toDevice = useCallback((e: { clientX: number; clientY: number }): { x: number; y: number } | null => { @@ -581,7 +947,7 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow }; const onCanvasMouseDown = (e: React.MouseEvent) => { - if (!interactiveRef.current) return; + if (!passthroughRef.current) return; // preventDefault stops the browser's focus-shift default action (a click // on a non-focusable canvas would otherwise blur to <body>), and the // explicit focus claims keystrokes for this pane. @@ -603,7 +969,9 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow }; const onCanvasMouseUp = (e: React.MouseEvent) => { - if (!interactiveRef.current) return; + // Pair with onCanvasMouseDown: gate on passthrough (not full `interactive`) + // so the release of a first, pane-selecting click still completes the click. 
+ if (!passthroughRef.current) return; e.preventDefault(); const point = toDevice(e); if (!point) return; @@ -791,7 +1159,7 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow // --- placeholder state --- const placeholder = (() => { - if (!wsPort) return `Waiting for browser session ${session ?? ''} — run dor ab open <url>`; + if (!streamPort) return `Waiting for browser session ${session ?? ''} — run dor ab open <url>`; if (connectionLost || status?.connected === false) { return `Browser session ${session ?? ''} ended — run dor ab open <url> to restart it, or close this surface.`; } @@ -846,9 +1214,11 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow </div> )} <div ref={viewportRef} className="relative flex min-h-0 flex-1 items-center justify-center"> + {/* Canvas stays mounted across pop-out (its listeners keep their element) + — just hidden under the stub while a headed window renders instead. */} <canvas ref={canvasRef} - className={clsx('block max-h-full max-w-full select-none', !hasFrame && 'hidden')} + className={clsx('block max-h-full max-w-full select-none', (!hasFrame || poppedOut) && 'hidden')} onMouseDown={onCanvasMouseDown} onMouseUp={onCanvasMouseUp} onMouseMove={onCanvasMouseMove} @@ -856,8 +1226,64 @@ export function AgentBrowserPanel({ api, params }: IDockviewPanelProps<AgentBrow if (interactiveRef.current) e.preventDefault(); }} /> - {placeholder && ( + {poppedOut ? ( + // Popped out to a headed OS window — the pane is a clean stub. 
+ <div className="flex flex-col items-center gap-3 px-4 text-center text-sm text-muted"> + <div>This browser is running in a separate window.</div> + <div className="flex gap-2 text-xs"> + {getPlatform().agentBrowserBringToFront && ( + <button + type="button" + onMouseDown={(e) => e.stopPropagation()} + onClick={(e) => { + e.stopPropagation(); + bringToFront(); + }} + className="rounded border border-border px-2.5 py-1 text-muted transition-colors hover:border-foreground hover:text-foreground" + > + Bring to front + </button> + )} + <button + type="button" + onMouseDown={(e) => e.stopPropagation()} + onClick={(e) => { + e.stopPropagation(); + popIn(); + }} + className="rounded border border-border px-2.5 py-1 text-muted transition-colors hover:border-foreground hover:text-foreground" + > + Pop back in + </button> + </div> + </div> + ) : placeholder ? ( <div className="px-4 text-center text-sm text-muted">{placeholder}</div> + ) : null} + {pendingIframeSwap && ( + <div + ref={swapConfirmRef} + tabIndex={-1} + className="absolute inset-0 z-10 flex flex-col items-center justify-center gap-3 bg-terminal-bg/95 px-6 text-center outline-none" + onMouseDown={(e) => e.stopPropagation()} + onKeyDown={(e) => { + e.stopPropagation(); + if (e.key === 'c' || e.key === 'C') { + setPendingIframeSwap(false); + actions.onSwapRenderMode(api.id, 'iframe'); + } else if (e.key === 'Escape') { + setPendingIframeSwap(false); + } + }} + > + <div className="max-w-sm text-sm text-foreground"> + Switching to the iframe renderer keeps only the active tab.{' '} + <span className="font-semibold">{Math.max(0, tabs.length - 1)} other tab{tabs.length - 1 === 1 ? '' : 's'}</span> will be closed. 
+ </div> + <div className="text-xs text-muted"> + Press <kbd className="rounded bg-app-bg px-1 py-0.5 font-mono">c</kbd> to continue · <kbd className="rounded bg-app-bg px-1 py-0.5 font-mono">Esc</kbd> to cancel + </div> + </div> )} </div> </div> diff --git a/lib/src/components/wall/AgentBrowserScreenModal.tsx b/lib/src/components/wall/AgentBrowserScreenModal.tsx index 816d993c..9a8c202a 100644 --- a/lib/src/components/wall/AgentBrowserScreenModal.tsx +++ b/lib/src/components/wall/AgentBrowserScreenModal.tsx @@ -1,20 +1,34 @@ /** - * Screen / viewport modal for an agent-browser surface - * (docs/specs/dor-agent-browser.md → "Screen Indicator & Viewport → The - * modal"). It is purely a GUI front-end for native `agent-browser set - * viewport` / `set device`, plus the one Dormouse-side concept, *Sync to pane*. + * Display modal for a web surface (docs/specs/dor-browser.md → "Render + * indicator & the Display modal"; docs/specs/dor-browser.md → "Render-mode + * transitions"). Opened from the header's far-left chip, it is the + * single place that owns *how* a surface renders: * - * Three mutually exclusive targets: - * - Sync to pane → engageSync() (auto-issues `set viewport <pane>` on resize) - * - Device → applyDevice() (fixed registry; bundles viewport+DPR+touch+UA) - * - Custom → applyViewport() + * - Render — swap the backend in place, preserving the target: + * `agent-browser screencast`, `agent-browser popout` (relaunch headed as a + * native OS window), or `iframe embed`. Each lists its agent/URL/feel + * trade-offs. Shown only when the controller wires `setRenderMode`; the + * popout option is gated on `canPopOut` (hidden on web). + * - Resolution — the screencast viewport: *Resize with pane* (linked to the + * pane) or *Fixed* (a specific resolution chosen via Device or Custom). + * Specific to screencast, so it nests under that option and greys out + * whenever a different render mode is selected. 
* * It reads the live snapshot on open and pre-selects accordingly, reflecting * reality rather than a stored intent. */ -import { useMemo, useRef, useState } from 'react'; +import { useMemo, useRef, useState, type ReactNode } from 'react'; +import { + ArrowSquareOutIcon, + CheckIcon, + FrameCornersIcon, + type Icon, + LinkIcon, + LockSimpleIcon, + XIcon, +} from '@phosphor-icons/react'; import { ModalCloseButton, ModalFrame, modalActionButton } from '../design'; -import type { ScreenController, ScreenSnapshot } from './agent-browser-screen'; +import type { RenderMode, ScreenController, ScreenSnapshot } from './agent-browser-screen'; import { useAgentBrowserScreenSnapshot } from './agent-browser-screen'; // Fixed registry — the CLI's own device set. No custom descriptors; touch + @@ -32,10 +46,6 @@ const DEVICES = [ type Target = 'sync' | 'device' | 'custom'; -function formatDpr(dpr: number): string { - return `${Number.isInteger(dpr) ? dpr : Math.round(dpr * 100) / 100}x`; -} - export function AgentBrowserScreenModal({ controller, label, @@ -65,6 +75,25 @@ export function AgentBrowserScreenModal({ const [customH, setCustomH] = useState(String(initial?.viewport.h ?? 720)); const [customDpi, setCustomDpi] = useState(String(initial?.viewport.dpr ?? 1)); + // Render backend (Path 1 + Headed Pop-Out). The Render section only appears + // when the surface wires `setRenderMode` (the swap is wired); otherwise the + // modal is the plain screencast viewport modal it has always been. + const currentMode: RenderMode = snapshot?.renderMode ?? 'ab-screencast'; + const canSwapRender = !!controller.actions.setRenderMode; + const [renderMode, setRenderMode] = useState<RenderMode>(currentMode); + // Pop-out is a render mode, gated per host/platform (hidden on web). + const canPopOut = controller.canPopOut ?? false; + // Only the screencast backend has a Dormouse-settable viewport; pop-out is a + // native OS window and embed renders at the pane size, so both grey it out. 
+ const viewportDisabled = renderMode !== 'ab-screencast'; + // Whether Apply changes the render backend (vs only tweaking the current + // screencast's viewport). A swap is gated on whether its option is shown, not + // on the viewport-drive capability below. + const switchingMode = renderMode !== currentMode; + // Within screencast, the resolution is either linked to the pane (resize with + // pane) or fixed — Device/Custom are the two ways to pick the fixed size. + const isFixed = target === 'device' || target === 'custom'; + const customValid = useMemo(() => { const w = Number(customW); const h = Number(customH); @@ -72,18 +101,96 @@ export function AgentBrowserScreenModal({ return Number.isInteger(w) && w > 0 && Number.isInteger(h) && h > 0 && dpi > 0 && Number.isFinite(dpi); }, [customW, customH, customDpi]); - const applyDisabled = !hostCapable || (target === 'custom' && !customValid); + // Apply gating splits three ways: + // - non-screencast target (embed/popout): no viewport to set; the swap is + // the action, gated only on its option being shown — always enabled. + // - swapping TO screencast (from embed/popout): spawns a fresh session that + // drives its own viewport, so the *current* surface's viewport-drive + // capability is irrelevant — always enabled. (This is the embed→screencast + // bug: an embed surface reports hostCapable:false, which used to dead-lock + // Apply even though switching needs only the spawn capability.) + // - staying on screencast (tweaking the viewport): needs the host to drive + // `set viewport`, and a valid custom size. + const applyDisabled = + viewportDisabled || switchingMode + ? 
false + : (!hostCapable || (target === 'custom' && !customValid)); const apply = () => { if (applyDisabled) return; - if (target === 'sync') controller.actions.engageSync(); - else if (target === 'device') controller.actions.applyDevice(device); - else controller.actions.applyViewport(Number(customW), Number(customH), Number(customDpi)); + if (switchingMode) { + // A mode swap; the viewport sub-controls don't apply to the outgoing + // surface (and are inert on embed/popout controllers anyway). + controller.actions.setRenderMode?.(renderMode); + } else if (renderMode === 'ab-screencast') { + if (target === 'sync') controller.actions.engageSync(); + else if (target === 'device') controller.actions.applyDevice(device); + else controller.actions.applyViewport(Number(customW), Number(customH), Number(customDpi)); + } onClose(); }; - const vp = snapshot?.viewport; - const pane = snapshot?.paneCss; + // Screencast resolution controls: Resize with pane (viewport linked to the + // pane) vs a Fixed resolution chosen via Device or Custom. Rendered nested + // under the screencast render option (or standalone when the surface can't + // swap render mode), and greyed whenever the active mode isn't screencast. + const viewportControls = ( + <fieldset disabled={viewportDisabled} className={viewportDisabled ? 
'opacity-40' : undefined}> + <div className="text-xs font-semibold tracking-wide text-muted uppercase">Resolution</div> + <div className="mt-2 flex flex-col gap-3 text-sm"> + <label className="flex cursor-pointer items-center gap-2"> + <input + type="radio" + name="screen-target" + checked={target === 'sync'} + onChange={() => setTarget('sync')} + /> + <LinkIcon size={14} className="shrink-0 text-muted" /> + <span className="text-foreground">Resize with pane</span> + </label> + + <div className="flex flex-col gap-2"> + <div className="flex items-center gap-3"> + <label className="flex cursor-pointer items-center gap-2"> + <input + type="radio" + name="screen-target" + checked={isFixed} + onChange={() => setTarget('custom')} + /> + <LockSimpleIcon size={14} className="shrink-0 text-muted" /> + <span className="text-foreground">Fixed</span> + </label> + {/* Dimensions inline; or pick a device via Emulate below (emulating + disables the dims — they fill in from the next frames). */} + <div className="flex items-center gap-2"> + <DimInput label="W" chars={4} value={customW} disabled={target === 'device'} onChange={setCustomW} onFocus={() => setTarget('custom')} /> + <DimInput label="H" chars={4} value={customH} disabled={target === 'device'} onChange={setCustomH} onFocus={() => setTarget('custom')} /> + <DimInput label="DPI" chars={1} value={customDpi} disabled={target === 'device'} onChange={setCustomDpi} onFocus={() => setTarget('custom')} /> + </div> + </div> + <label className="ml-6 flex items-center gap-2 text-xs text-muted"> + <span>Emulate</span> + <select + value={target === 'device' ? 
device : ''} + onChange={(e) => { + const name = e.target.value; + if (name) { setTarget('device'); setDevice(name); } + else setTarget('custom'); + }} + title="touch + mobile UA" + className="rounded border border-border bg-app-bg px-1.5 py-1 font-mono text-foreground outline-none focus:border-focus-ring" + > + <option value="">none</option> + {DEVICES.map((name) => ( + <option key={name} value={name}>{name}</option> + ))} + </select> + </label> + </div> + </div> + </fieldset> + ); return ( <ModalFrame @@ -92,7 +199,7 @@ export function AgentBrowserScreenModal({ backdrop="strong" elevation="modal" overlayClassName="px-4 py-6" - className="w-full max-w-[30rem]" + className="max-h-[85vh] w-full max-w-[30rem] overflow-y-auto" initialFocusRef={cancelRef} onEscape={onClose} > @@ -101,91 +208,49 @@ export function AgentBrowserScreenModal({ id="agent-browser-screen-modal-title" className="min-w-0 flex-1 text-sm leading-5 text-foreground" > - Screen — <span className="font-semibold">{label}</span> + Display — <span className="font-semibold">{label}</span> </h2> <ModalCloseButton onClick={onClose} /> </div> - {snapshot && vp && pane && ( - <div className="mt-3 text-xs text-muted"> - Currently <span className="font-semibold text-foreground">{snapshot.state}</span> - <div className="mt-0.5 font-mono"> - browser {vp.w}×{vp.h} - {' · '} - pane {pane.w}×{pane.h} @{formatDpr(snapshot.displayDpr)} - </div> - </div> - )} - - <div className="mt-4 flex flex-col gap-3 text-sm"> - <label className="flex cursor-pointer items-start gap-2"> - <input - type="radio" - name="screen-target" - className="mt-0.5" - checked={target === 'sync'} - onChange={() => setTarget('sync')} - /> - <span className="min-w-0"> - <span className="text-foreground">Sync to pane</span> - <span className="mt-0.5 block text-xs text-muted"> - viewport follows the pane, pixel-for-pixel - {pane ? ` → now: ${pane.w}×${pane.h} @${formatDpr(snapshot?.displayDpr ?? 
1)}` : ''} - </span> - </span> - </label> + {canSwapRender ? ( + <div className="mt-4 flex flex-col gap-3"> + {/* Screencast has no mode icon of its own — its two resolution modes + (resize-with-pane / fixed) carry the link / lock glyphs, and the + resolution controls nest under it, greying out for the other modes. */} + <RenderOption + checked={renderMode === 'ab-screencast'} + onSelect={() => setRenderMode('ab-screencast')} + label="agent-browser screencast" + features={[[true, 'agents can read/write'], [true, 'any URL'], [false, 'laggy for humans']]} + > + <div className="ml-6 mt-2">{viewportControls}</div> + </RenderOption> - <label className="flex cursor-pointer items-start gap-2"> - <input - type="radio" - name="screen-target" - className="mt-0.5" - checked={target === 'device'} - onChange={() => setTarget('device')} - /> - <span className="min-w-0 flex-1"> - <span className="text-foreground">Device</span> - <span className="ml-2 text-xs text-muted">emulates touch + mobile UA</span> - <span className="mt-1.5 grid grid-cols-2 gap-1"> - {DEVICES.map((name) => ( - <button - key={name} - type="button" - onClick={() => { setTarget('device'); setDevice(name); }} - className={`rounded border px-2 py-1 text-left text-xs transition-colors ${ - target === 'device' && device === name - ? 
'border-focus-ring bg-header-inactive-bg text-foreground' - : 'border-border text-muted hover:text-foreground' - }`} - > - {name} - </button> - ))} - </span> - <span className="mt-1 block text-xs text-muted"> - dimensions fill in after applying - </span> - </span> - </label> + {canPopOut && ( + <RenderOption + checked={renderMode === 'ab-popout'} + onSelect={() => setRenderMode('ab-popout')} + icon={ArrowSquareOutIcon} + label="agent-browser popout" + features={[[true, 'agents can read/write'], [true, 'any URL'], [true, 'native human experience']]} + /> + )} - <label className="flex cursor-pointer items-start gap-2"> - <input - type="radio" - name="screen-target" - className="mt-0.5" - checked={target === 'custom'} - onChange={() => setTarget('custom')} + <RenderOption + checked={renderMode === 'iframe'} + onSelect={() => setRenderMode('iframe')} + icon={FrameCornersIcon} + label="iframe embed" + features={[[false, 'agents cannot read/write'], [false, 'localhost only'], [true, 'native human experience']]} /> - <span className="flex min-w-0 flex-1 flex-wrap items-center gap-2"> - <span className="text-foreground">Custom</span> - <DimInput label="W" value={customW} onChange={setCustomW} onFocus={() => setTarget('custom')} /> - <DimInput label="H" value={customH} onChange={setCustomH} onFocus={() => setTarget('custom')} /> - <DimInput label="DPI" value={customDpi} onChange={setCustomDpi} onFocus={() => setTarget('custom')} /> - </span> - </label> - </div> + </div> + ) : ( + // No render swap wired: the legacy plain screencast resolution modal. + <div className="mt-4">{viewportControls}</div> + )} - {!hostCapable && ( + {!hostCapable && !viewportDisabled && !switchingMode && ( <p className="mt-3 text-xs text-muted"> This host can't drive the browser viewport; run <span className="font-mono">dor ab set …</span> from a terminal instead. 
@@ -214,27 +279,80 @@ export function AgentBrowserScreenModal({ ); } +/** One render-backend option: a radio + optional mode icon + label, then its + * agent/URL/feel trade-offs. Screencast passes its nested resolution controls + * as children. */ +function RenderOption({ + checked, + onSelect, + icon: ModeIcon, + label, + features, + children, +}: { + checked: boolean; + onSelect: () => void; + icon?: Icon; + label: string; + features: [boolean, string][]; + children?: ReactNode; +}) { + return ( + <div className="flex flex-col gap-1.5 text-sm"> + <label className="flex cursor-pointer items-center gap-2"> + <input type="radio" name="render-mode" checked={checked} onChange={onSelect} /> + {ModeIcon && <ModeIcon size={14} className="shrink-0 text-muted" />} + <span className="text-foreground">{label}</span> + </label> + <div className="ml-6 flex flex-col gap-0.5 text-xs"> + {features.map(([ok, text]) => <Feature key={text} ok={ok}>{text}</Feature>)} + </div> + {children} + </div> + ); +} + +/** One trade-off line for a render mode: a green check (has the property) or a + * red x (lacks it), then the label. Matches the user's agent/URL/feel matrix. */ +function Feature({ ok, children }: { ok?: boolean; children: ReactNode }) { + return ( + <span className="flex items-center gap-1.5 text-muted"> + {ok + ? <CheckIcon size={12} weight="bold" className="shrink-0 text-success" /> + : <XIcon size={12} weight="bold" className="shrink-0 text-error" />} + {children} + </span> + ); +} + function DimInput({ label, value, onChange, onFocus, + disabled, + chars = 4, }: { label: string; value: string; onChange: (next: string) => void; onFocus: () => void; + disabled?: boolean; + /** Max digits the field holds — sizes the box so W/H/DPI stay compact. */ + chars?: number; }) { return ( - <span className="inline-flex items-center gap-1 text-xs text-muted"> + <span className={`inline-flex items-center gap-1 text-xs text-muted ${disabled ? 
'opacity-50' : ''}`}> {label} <input type="text" inputMode="numeric" value={value} + disabled={disabled} onFocus={onFocus} onChange={(e) => onChange(e.target.value.replace(/[^0-9.]/g, ''))} - className="w-16 rounded border border-border bg-app-bg px-1.5 py-1 font-mono text-foreground outline-none focus:border-focus-ring" + style={{ width: `calc(${chars}ch + 0.5rem)` }} + className="border-0 border-b border-border bg-transparent px-0.5 py-0.5 font-mono text-foreground outline-none focus:border-focus-ring" /> </span> ); diff --git a/lib/src/components/wall/BrowserPanel.tsx b/lib/src/components/wall/BrowserPanel.tsx new file mode 100644 index 00000000..3c94e4a1 --- /dev/null +++ b/lib/src/components/wall/BrowserPanel.tsx @@ -0,0 +1,50 @@ +/** + * The single dockview component for every browser surface (docs/specs/dor-browser.md + * → "Display Modal And Render Swaps"). + * + * One surface, swappable renderer: it reads the canonical `renderMode` and mounts + * the matching child — `IframePanel` for `iframe`, `AgentBrowserPanel` for + * `ab-screencast` / `ab-popout`. The two children stay separate components (their + * input models differ — CDP `input_*` messages vs native DOM); the shell only owns + * the renderer choice. The browser chrome each child registers is keyed by + * `api.id`, so the shared header/modal are unaffected by which child is mounted. + */ +import { useEffect } from 'react'; +import type { IDockviewPanelProps } from 'dockview-react'; +import type { RenderMode } from './agent-browser-screen'; +import { resolveRenderMode } from './browser-surface'; +import { AgentBrowserPanel } from './AgentBrowserPanel'; +import { IframePanel } from './IframePanel'; + +/** Canonical persisted state for a browser surface. `renderMode` + `url` are the + * single source of truth across swaps; the agent-browser fields ride flat and are + * present only for `ab-*` modes. 
*/ +export type BrowserPanelParams = { + surfaceType?: string; + renderMode?: RenderMode; + url?: string; + session?: string; + key?: string; + wsPort?: number; + binaryPath?: string; + syncEngaged?: boolean; + /** Legacy: surfaces persisted before `renderMode` existed stored pop-out as a + * boolean alongside surfaceType 'iframe' | 'agent-browser'. Migrated below. */ + poppedOut?: boolean; +}; + +export function BrowserPanel(props: IDockviewPanelProps<BrowserPanelParams>) { + const { api, params } = props; + const renderMode = resolveRenderMode(params); + + // Canonicalize a legacy layout once: write renderMode + surfaceType:'browser' + // so later reads and persistence use the unified shape (the children read + // renderMode from the prop below, so this is purely for the persisted blob). + useEffect(() => { + if (params?.renderMode === renderMode && params?.surfaceType === 'browser') return; + api.updateParameters({ renderMode, surfaceType: 'browser' }); + }, [api, params?.renderMode, params?.surfaceType, renderMode]); + + if (renderMode === 'iframe') return <IframePanel {...props} />; + return <AgentBrowserPanel {...props} renderMode={renderMode} />; +} diff --git a/lib/src/components/wall/IframePanel.test.tsx b/lib/src/components/wall/IframePanel.test.tsx index 2895f225..b1ecbfdc 100644 --- a/lib/src/components/wall/IframePanel.test.tsx +++ b/lib/src/components/wall/IframePanel.test.tsx @@ -1,12 +1,14 @@ /** * @vitest-environment jsdom */ -import { act } from 'react'; +import { act, StrictMode } from 'react'; import { createRoot, type Root } from 'react-dom/client'; import type { IDockviewPanelProps } from 'dockview-react'; import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; import { FakePtyAdapter, setPlatform } from '../../lib/platform'; +import type { PlatformAdapter } from '../../lib/platform/types'; import { IframePanel } from './IframePanel'; +import { getAgentBrowserScreenController } from './agent-browser-screen'; import { 
WallActionsContext, type WallActions } from './wall-context'; globalThis.IS_REACT_ACT_ENVIRONMENT = true; @@ -25,13 +27,14 @@ function stubActions(overrides: Partial<WallActions> = {}): WallActions { onStartRename: vi.fn(), onFinishRename: vi.fn(() => ({ accepted: true })), onCancelRename: vi.fn(), + onSwapRenderMode: vi.fn(), ...overrides, }; } -function panelProps(id: string): IDockviewPanelProps<{ url: string }> { +function panelProps(id: string, updateParameters = vi.fn()): IDockviewPanelProps<{ url: string }> { return { - api: { id, title: 'Raw iframe' }, + api: { id, title: 'Raw iframe', updateParameters, setTitle: vi.fn() }, params: { url: 'http://example.test/app' }, } as unknown as IDockviewPanelProps<{ url: string }>; } @@ -52,12 +55,14 @@ afterEach(() => { vi.restoreAllMocks(); }); -async function renderPanel(actions: WallActions): Promise<HTMLIFrameElement> { +async function renderPanel(actions: WallActions, props = panelProps('iframe-raw')): Promise<HTMLIFrameElement> { await act(async () => { root.render( - <WallActionsContext.Provider value={actions}> - <IframePanel {...panelProps('iframe-raw')} /> - </WallActionsContext.Provider>, + <StrictMode> + <WallActionsContext.Provider value={actions}> + <IframePanel {...props} /> + </WallActionsContext.Provider> + </StrictMode>, ); }); @@ -96,4 +101,83 @@ describe('IframePanel', () => { expect(onClickPanel).not.toHaveBeenCalled(); }); + + it('drives iframe back and forward from the registered chrome actions', async () => { + const updateParameters = vi.fn(); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserOpen'>; + platform.agentBrowserOpen = vi.fn(); + setPlatform(platform); + await renderPanel(stubActions(), panelProps('iframe-history', updateParameters)); + + await act(async () => { + getAgentBrowserScreenController('iframe-history')?.chromeActions.navigate('http://example.test/one'); + }); + await act(async () => { + 
getAgentBrowserScreenController('iframe-history')?.chromeActions.navigate('http://example.test/two'); + }); + await act(async () => { + getAgentBrowserScreenController('iframe-history')?.chromeActions.back(); + }); + expect(updateParameters).toHaveBeenLastCalledWith({ url: 'http://example.test/one' }); + + await act(async () => { + getAgentBrowserScreenController('iframe-history')?.chromeActions.forward(); + }); + expect(updateParameters).toHaveBeenLastCalledWith({ url: 'http://example.test/two' }); + }); + + it('maps proxied frame location messages into chrome without updating params', async () => { + const updateParameters = vi.fn(); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserOpen' | 'createIframeProxyUrl'>; + platform.agentBrowserOpen = vi.fn(); + platform.createIframeProxyUrl = vi.fn(async () => ({ + ok: true, + url: 'http://127.0.0.1:61234/app', + upstream: 'http://example.test/app', + })); + setPlatform(platform); + await renderPanel(stubActions(), panelProps('iframe-proxied', updateParameters)); + + await act(async () => { + window.dispatchEvent(new MessageEvent('message', { + origin: 'http://127.0.0.1:61234', + data: { __dormouse: 'location', url: 'http://127.0.0.1:61234/other/?q=1#frag' }, + })); + }); + + expect(updateParameters).not.toHaveBeenCalled(); + expect(getAgentBrowserScreenController('iframe-proxied')?.chrome().url).toBe('http://example.test/other/?q=1#frag'); + }); + + it('re-resolves the proxy on Back after an observed in-frame navigation', async () => { + const updateParameters = vi.fn(); + const platform = new FakePtyAdapter() as FakePtyAdapter & Pick<PlatformAdapter, 'agentBrowserOpen' | 'createIframeProxyUrl'>; + platform.agentBrowserOpen = vi.fn(); + // Fixed URL so the proxy origin stays stable (the message handler gates on + // it); re-resolution is observed via the call count, not a changed src. 
+ const createProxy = vi.fn(async () => ({ ok: true, url: 'http://127.0.0.1:61234/app' })); + platform.createIframeProxyUrl = createProxy; + setPlatform(platform); + await renderPanel(stubActions(), panelProps('iframe-back', updateParameters)); + + // Observe an in-frame navigation: it adds a history entry but, by design, + // does not write params.url back, so params.url stays the source URL. + await act(async () => { + window.dispatchEvent(new MessageEvent('message', { + origin: 'http://127.0.0.1:61234', + data: { __dormouse: 'location', url: 'http://127.0.0.1:61234/other' }, + })); + }); + + const callsBeforeBack = createProxy.mock.calls.length; + await act(async () => { + getAgentBrowserScreenController('iframe-back')?.chromeActions.back(); + }); + + // Back targets the original (still-persisted) URL, so updateParameters is a + // no-op write — the proxy must still re-resolve or the frame would keep + // showing /other while the chrome shows /app. + expect(updateParameters).toHaveBeenLastCalledWith({ url: 'http://example.test/app' }); + expect(createProxy.mock.calls.length).toBeGreaterThan(callsBeforeBack); + }); }); diff --git a/lib/src/components/wall/IframePanel.tsx b/lib/src/components/wall/IframePanel.tsx index 01a29a9e..1f927eb3 100644 --- a/lib/src/components/wall/IframePanel.tsx +++ b/lib/src/components/wall/IframePanel.tsx @@ -1,4 +1,4 @@ -import { useContext, useEffect, useRef, useState } from 'react'; +import { useCallback, useContext, useEffect, useMemo, useRef, useState } from 'react'; import type { IDockviewPanelProps } from 'dockview-react'; import { TERMINAL_BOTTOM_RADIUS_CLASS } from '../design'; import { getPlatform } from '../../lib/platform'; @@ -7,6 +7,14 @@ import { registerSurfaceFocusHandle } from '../../lib/terminal-registry'; import type { IframeProxyResult } from '../../lib/platform/types'; import { usePaneChrome } from './use-pane-chrome'; import { WallActionsContext } from './wall-context'; +import { + openAgentBrowserScreenModal, + 
registerAgentBrowserScreen, + type ChromeActions, + type ScreenActions, + type ScreenRegistration, +} from './agent-browser-screen'; +import { hostPathDisplay } from './browser-url'; type IframePanelParams = { surfaceType?: string; @@ -15,7 +23,7 @@ type IframePanelParams = { // Sandbox the proxied frame so a tool's `if (top !== self) top.location = …` // framebust cannot navigate the Wall away — allow-top-navigation is omitted on -// purpose (docs/specs/dor-iframe.md → "Anti-framebust"). Everything else a local +// purpose (docs/specs/dor-browser.md → "Iframe Renderer"). Everything else a local // dev tool needs is granted; allow-same-origin is safe here because the frame's // origin (the loopback proxy) is never same-origin with the host webview. const PROXY_SANDBOX = 'allow-scripts allow-same-origin allow-forms allow-popups allow-modals allow-downloads'; @@ -30,6 +38,11 @@ type Resolution = | { kind: 'raw'; src: string } | { kind: 'error'; reason: 'frame-refused' | 'unreachable' | 'scheme'; detail?: string }; +type IframeHistory = { + entries: string[]; + index: number; +}; + function originOf(url: string): string { try { return new URL(url).origin; @@ -38,32 +51,125 @@ function originOf(url: string): string { } } +function sameUrl(a: string, b: string): boolean { + if (a === b) return true; + try { + return new URL(a).href === new URL(b).href; + } catch { + return false; + } +} + +function appendHistory(history: IframeHistory, nextUrl: string): IframeHistory { + const current = history.entries[history.index] ?? 
''; + if (!nextUrl || sameUrl(current, nextUrl)) return history; + return { + entries: [...history.entries.slice(0, history.index + 1), nextUrl], + index: history.index + 1, + }; +} + +function upstreamUrlFromFrameLocation(frameUrl: unknown, targetUrl: string, proxyOrigin: string): string | null { + if (typeof frameUrl !== 'string' || !targetUrl || !proxyOrigin) return null; + try { + const frame = new URL(frameUrl); + if (frame.origin !== proxyOrigin) return null; + const target = new URL(targetUrl); + return `${target.origin}${frame.pathname}${frame.search}${frame.hash}`; + } catch { + return null; + } +} + export function IframePanel({ api, params }: IDockviewPanelProps<IframePanelParams>) { const actions = useContext(WallActionsContext); const elRef = useRef<HTMLDivElement>(null); const iframeRef = useRef<HTMLIFrameElement>(null); usePaneChrome(api, elRef); - const url = typeof params?.url === 'string' ? params.url : ''; + const sourceUrl = typeof params?.url === 'string' ? params.url : ''; + const [liveUrl, setLiveUrl] = useState(sourceUrl); + // A new-tab/window request from the proxy shim, pending the user's choice to + // open it as a new pane (docs/specs/dor-browser.md → "Iframe Shim"). + const [pendingOpenUrl, setPendingOpenUrl] = useState<string | null>(null); + const [history, setHistory] = useState<IframeHistory>(() => ( + sourceUrl ? { entries: [sourceUrl], index: 0 } : { entries: [], index: -1 } + )); + // Mirror the live index into a ref so the back/forward actions stay stable — + // otherwise chromeActions (and the screen registration depending on it) would + // churn on every navigation. + const historyIndexRef = useRef(history.index); + historyIndexRef.current = history.index; + const historyRef = useRef(history); + historyRef.current = history; + // Bumped by the header's reload button to re-resolve the proxy (a cross-origin + // frame can't be reloaded via its contentWindow). 
+ const [reloadNonce, setReloadNonce] = useState(0); + const actionsRef = useRef(actions); + actionsRef.current = actions; + + // Params are still the persisted/source URL for session restore and + // render-swaps. Keep a small browser-like history on top so iframe chrome + // Back/Forward are real even though the cross-origin frame history itself is + // not reachable from the parent webview. + useEffect(() => { + if (!sourceUrl) { + setLiveUrl(''); + setHistory({ entries: [], index: -1 }); + return; + } + setLiveUrl(sourceUrl); + setHistory((prev) => appendHistory(prev, sourceUrl)); + }, [sourceUrl]); + + // Show a URL in the frame chrome + history. `persist` writes it back to the + // panel params (a real navigation we initiated); an observed frame URL does + // not, since params stay the source/restore URL. + const applyFrameUrl = useCallback((nextUrl: string, persist: boolean) => { + if (!nextUrl) return; + setLiveUrl(nextUrl); + setHistory((prev) => appendHistory(prev, nextUrl)); + if (persist) api.updateParameters({ url: nextUrl }); + api.setTitle?.(hostPathDisplay(nextUrl, true)); + }, [api]); + + const commitUrl = useCallback((nextUrl: string) => applyFrameUrl(nextUrl, true), [applyFrameUrl]); + const observeFrameUrl = useCallback((nextUrl: string) => applyFrameUrl(nextUrl, false), [applyFrameUrl]); + + const goToHistoryIndex = useCallback((nextIndex: number) => { + const prev = historyRef.current; + if (nextIndex < 0 || nextIndex >= prev.entries.length) return; + const nextUrl = prev.entries[nextIndex]; + setLiveUrl(nextUrl); + setHistory({ ...prev, index: nextIndex }); + api.updateParameters({ url: nextUrl }); + api.setTitle?.(hostPathDisplay(nextUrl, true)); + // Force a proxy re-resolution so the frame actually reloads. 
After an + // observed in-frame navigation, params.url stays at the source URL, so a + // Back to that same URL is a no-op write — without bumping the nonce the + // proxy effect (deps: sourceUrl, reloadNonce) wouldn't re-fire and the frame + // would keep showing the navigated page while the chrome shows the target. + setReloadNonce((n) => n + 1); + }, [api]); // Ask the host to front the target with its transparent proxy. The returned // URL is a loopback origin that serves the page's bytes (instrumented for // loopback) so Dormouse — now the server — gets a keyboard side-channel, an // accurate focus model, and real error pages. Reachability/frame-refusal are // diagnosed by the proxy and shown as a served page inside the frame. - const [resolution, setResolution] = useState<Resolution>(() => (url ? { kind: 'resolving' } : { kind: 'empty' })); + const [resolution, setResolution] = useState<Resolution>(() => (sourceUrl ? { kind: 'resolving' } : { kind: 'empty' })); useEffect(() => { - if (!url) { + if (!sourceUrl) { setResolution({ kind: 'empty' }); return; } const createProxy = getPlatform().createIframeProxyUrl; if (!createProxy) { - setResolution({ kind: 'raw', src: url }); + setResolution({ kind: 'raw', src: sourceUrl }); return; } let cancelled = false; setResolution({ kind: 'resolving' }); - createProxy(url).then( + createProxy(sourceUrl).then( (result: IframeProxyResult) => { if (cancelled) return; if (result.ok) setResolution({ kind: 'proxied', src: result.url, origin: originOf(result.url) }); @@ -74,10 +180,66 @@ export function IframePanel({ api, params }: IDockviewPanelProps<IframePanelPara }, ); return () => { cancelled = true; }; - }, [url]); + }, [sourceUrl, reloadNonce]); + + // Register a screen controller so the embed surface shows the unified + // browser chrome (URL + the far-left chip → Display modal) and can swap back + // to a live screencast. 
Only the swap is gated on the host being able to spawn an + // agent-browser (agentBrowserOpen): without it there's no screencast to + // swap to, but the chrome (URL/nav) still shows (e.g. on the web host). + const swapCapable = !!getPlatform().agentBrowserOpen; + const screenActions = useMemo<ScreenActions>(() => ({ + engageSync() {}, + applyDevice() {}, + applyViewport() {}, + openModal() { openAgentBrowserScreenModal(api.id); }, + // iframe is the current backend; ab-screencast / ab-popout swap to + // agent-browser. Wired only when the host can spawn one — without it the + // modal hides its Render section, but the chrome (URL/nav) still shows. + setRenderMode: swapCapable + ? (mode) => { if (mode !== 'iframe') actionsRef.current.onSwapRenderMode(api.id, mode); } + : undefined, + }), [api.id, swapCapable]); + const chromeActions = useMemo<ChromeActions>(() => ({ + navigate(next) { commitUrl(next); }, + back() { goToHistoryIndex(historyIndexRef.current - 1); }, + forward() { goToHistoryIndex(historyIndexRef.current + 1); }, + reload() { setReloadNonce((n) => n + 1); }, + }), [commitUrl, goToHistoryIndex]); + const registrationRef = useRef<ScreenRegistration | null>(null); + // Register the screen controller unconditionally so the browser chrome (URL + + // far-left chip) shows for every iframe surface, on every host — `dor iframe` + // is a full browser-chrome tab, not a lesser one (docs/specs/dor-browser.md). + // The render-swap action is gated separately (screenActions.setRenderMode). + useEffect(() => { + const registration = registerAgentBrowserScreen(api.id, { + snapshot: { + state: 'SYNCED', + renderMode: 'iframe', + viewport: { w: 0, h: 0, dpr: 1 }, + paneCss: { w: 0, h: 0 }, + displayDpr: 1, + syncEngaged: false, + }, + actions: screenActions, + chrome: { url: liveUrl, displayUrl: hostPathDisplay(liveUrl), title: api.title ?? 
null, key: null }, + chromeActions, + hostCapable: false, + // embed→popout spawns the new agent-browser headed and mounts it + // popped-out, so it needs both spawn and pop-out host capabilities. + canPopOut: !!getPlatform().agentBrowserPopOut, + }); + registrationRef.current = registration; + return () => { registration.dispose(); registrationRef.current = null; }; + }, [api.id, swapCapable, screenActions, chromeActions]); + // Keep the header's URL current as navigation and in-frame location changes + // land. The iframe src is still driven only by sourceUrl. + useEffect(() => { + registrationRef.current?.updateChrome({ url: liveUrl, displayUrl: hostPathDisplay(liveUrl), title: api.title ?? null, key: null }); + }, [liveUrl, api.title]); // Trust postMessage from this frame's origin (validated by the Wall's - // keyboard/focus channel) only while the proxied surface is live. + // keyboard/focus/location channel) only while the proxied surface is live. const proxyOrigin = resolution.kind === 'proxied' ? resolution.origin : null; useEffect(() => { if (!proxyOrigin) return; @@ -96,12 +258,26 @@ export function IframePanel({ api, params }: IDockviewPanelProps<IframePanelPara if (!proxyOrigin) return; const onMessage = (e: MessageEvent) => { if (e.origin !== proxyOrigin) return; - if ((e.data as { __dormouse?: unknown } | null)?.__dormouse !== 'pointerdown') return; - actions.onClickPanel(api.id); + const data = e.data as { __dormouse?: unknown; url?: unknown } | null; + if (data?.__dormouse === 'pointerdown') { + actions.onClickPanel(api.id); + return; + } + if (data?.__dormouse === 'open-window' && typeof data.url === 'string') { + // Single-frame renderer: a new-tab/window request becomes a new pane. + // Map a proxy-origin URL back to the upstream; pass externals through. + const mapped = upstreamUrlFromFrameLocation(data.url, liveUrl || sourceUrl, proxyOrigin) ?? 
data.url; + setPendingOpenUrl(mapped); + return; + } + if (data?.__dormouse === 'location') { + const nextUrl = upstreamUrlFromFrameLocation(data.url, liveUrl || sourceUrl, proxyOrigin); + if (nextUrl) observeFrameUrl(nextUrl); + } }; window.addEventListener('message', onMessage); return () => window.removeEventListener('message', onMessage); - }, [api, proxyOrigin, actions]); + }, [api, proxyOrigin, actions, liveUrl, sourceUrl, observeFrameUrl]); // Raw fallback frames have no injected shim, but focusing a cross-origin // iframe still blurs the parent window while the document itself remains @@ -163,13 +339,47 @@ export function IframePanel({ api, params }: IDockviewPanelProps<IframePanelPara ref={iframeRef} className="block h-full w-full border-0 bg-white" src={src} - title={api.title ?? url} + title={api.title ?? liveUrl} allow={IFRAME_ALLOW} {...(resolution.kind === 'proxied' ? { sandbox: PROXY_SANDBOX, 'data-dormouse-proxy': 'true' } : {})} referrerPolicy="strict-origin-when-cross-origin" /> ) : ( - <PanelMessage resolution={resolution} url={url} /> + <PanelMessage resolution={resolution} url={sourceUrl} /> + )} + {pendingOpenUrl && ( + <div className="absolute inset-0 z-10 flex flex-col items-center justify-center gap-3 bg-terminal-bg/95 px-6 text-center"> + <div className="max-w-sm text-sm text-foreground"> + This page wants to open a new tab: + <div className="mt-1 break-all font-mono text-xs text-muted">{pendingOpenUrl}</div> + </div> + <div className="flex gap-2"> + <button + type="button" + onMouseDown={(e) => e.stopPropagation()} + onClick={(e) => { + e.stopPropagation(); + const u = pendingOpenUrl; + setPendingOpenUrl(null); + if (u) actions.onOpenBrowserPane?.(api.id, u); + }} + className="rounded border border-border px-2.5 py-1 text-sm text-foreground transition-colors hover:border-foreground" + > + Open in new pane + </button> + <button + type="button" + onMouseDown={(e) => e.stopPropagation()} + onClick={(e) => { e.stopPropagation(); 
setPendingOpenUrl(null); }} + className="rounded border border-border px-2.5 py-1 text-sm text-muted transition-colors hover:text-foreground" + > + Cancel + </button> + </div> + <div className="text-xs text-muted/80"> + Pages that open many tabs work better in agent-browser — open the chip → Render. + </div> + </div> )} </div> ); diff --git a/lib/src/components/wall/SurfacePaneHeader.test.tsx b/lib/src/components/wall/SurfacePaneHeader.test.tsx index 6c84828e..9417b446 100644 --- a/lib/src/components/wall/SurfacePaneHeader.test.tsx +++ b/lib/src/components/wall/SurfacePaneHeader.test.tsx @@ -1,7 +1,7 @@ /** * @vitest-environment jsdom */ -import { act } from 'react'; +import { act, StrictMode } from 'react'; import { createRoot, type Root } from 'react-dom/client'; import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; import type { IDockviewPanelHeaderProps } from 'dockview-react'; @@ -45,6 +45,7 @@ function stubActions(overrides: Partial<WallActions> = {}): WallActions { onStartRename: vi.fn(), onFinishRename: vi.fn(() => ({ accepted: true })), onCancelRename: vi.fn(), + onSwapRenderMode: vi.fn(), ...overrides, }; } @@ -80,9 +81,11 @@ afterEach(() => { function renderHeader(props: IDockviewPanelHeaderProps, actions: WallActions) { act(() => { root.render( - <WallActionsContext.Provider value={actions}> - <SurfacePaneHeader {...props} /> - </WallActionsContext.Provider>, + <StrictMode> + <WallActionsContext.Provider value={actions}> + <SurfacePaneHeader {...props} /> + </WallActionsContext.Provider> + </StrictMode>, ); }); } diff --git a/lib/src/components/wall/SurfacePaneHeader.tsx b/lib/src/components/wall/SurfacePaneHeader.tsx index 1e41bb2d..80932b26 100644 --- a/lib/src/components/wall/SurfacePaneHeader.tsx +++ b/lib/src/components/wall/SurfacePaneHeader.tsx @@ -1,14 +1,16 @@ -import { useContext, useEffect, useState } from 'react'; +import { useContext, useEffect, useState, type ReactNode } from 'react'; import type { 
IDockviewPanelHeaderProps } from 'dockview-react'; import { ArrowClockwiseIcon, ArrowLeftIcon, ArrowLineDownIcon, ArrowRightIcon, + ArrowSquareOutIcon, ArrowsInIcon, ArrowsOutIcon, FrameCornersIcon, - ResizeIcon, + LinkIcon, + LockSimpleIcon, SplitHorizontalIcon, SplitVerticalIcon, XIcon, @@ -19,6 +21,7 @@ import { useAgentBrowserChromeSnapshot, useAgentBrowserScreenController, useAgentBrowserScreenSnapshot, + type ScreenSnapshot, } from './agent-browser-screen'; import { loopbackPort, normalizeNavUrl, pathDisplay } from './browser-url'; import { triggerDevServerRescan, useDevServerMatch } from './agent-browser-ports'; @@ -31,6 +34,20 @@ import { ZoomedContext, } from './wall-context'; +/** The far-left chip reflects the surface's render backend at a glance, and + * opens the Display modal. iframe embed → frame; agent-browser popout → + * open-window glyph; agent-browser screencast → a link when its resolution + * resizes with the pane, a lock when it's fixed. Returns the glyph and its + * label together so the two never drift apart. */ +function screenChip(s: ScreenSnapshot): { icon: ReactNode; label: string } { + const mode = s.renderMode ?? 'ab-screencast'; + if (mode === 'iframe') return { icon: <FrameCornersIcon size={14} />, label: 'iframe embed — change render' }; + if (mode === 'ab-popout') return { icon: <ArrowSquareOutIcon size={14} />, label: 'agent-browser popout — change render' }; + return s.state === 'SYNCED' + ? 
{ icon: <LinkIcon size={14} />, label: 'agent-browser screencast, resizes with pane — change render or resolution' } + : { icon: <LockSimpleIcon size={14} />, label: 'agent-browser screencast, fixed resolution — change render or resolution' }; +} + export function SurfacePaneHeader({ api }: IDockviewPanelHeaderProps) { const mode = useContext(ModeContext); const selectedId = useContext(SelectedIdContext); @@ -45,6 +62,7 @@ export function SurfacePaneHeader({ api }: IDockviewPanelHeaderProps) { const screen = useAgentBrowserScreenController(api.id); const screenSnapshot = useAgentBrowserScreenSnapshot(screen); const chrome = useAgentBrowserChromeSnapshot(screen); + const chip = screenSnapshot ? screenChip(screenSnapshot) : null; // Dev-server connection: when the active tab is loopback, correlate its port // to the Dormouse terminal pane serving it (resolved Wall-side). Hooks run @@ -84,18 +102,17 @@ export function SurfacePaneHeader({ api }: IDockviewPanelHeaderProps) { > {screen && screenSnapshot && chrome ? ( <> - {/* Sync chip → far left, out of the way of the nav controls. Opens - the screen modal; SYNCED/SCALED reflects reality. */} + {/* Render/screen chip → far left, out of the way of the nav controls. + Opens the Display modal; the glyph reflects reality — frame = + embed, window = popout, link/lock = screencast resize/fixed. */} <button type="button" onClick={(e) => { e.stopPropagation(); screen.actions.openModal(); }} - aria-label={`Screen: ${screenSnapshot.state} — change viewport`} - title={`Screen: ${screenSnapshot.state} — change viewport`} + aria-label={chip?.label} + title={chip?.label} className="flex h-5 min-w-5 shrink-0 items-center justify-center rounded transition-colors hover:bg-current/10" > - {screenSnapshot.state === 'SYNCED' - ? 
<FrameCornersIcon size={14} /> - : <ResizeIcon size={14} />} + {chip?.icon} </button> {/* Back / forward / refresh — native agent-browser commands; always diff --git a/lib/src/components/wall/agent-browser-connection.test.ts b/lib/src/components/wall/agent-browser-connection.test.ts new file mode 100644 index 00000000..ae959b74 --- /dev/null +++ b/lib/src/components/wall/agent-browser-connection.test.ts @@ -0,0 +1,211 @@ +import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; +import { createAgentBrowserConnection } from './agent-browser-connection'; + +class WebSocketMock { + static instances: WebSocketMock[] = []; + static OPEN = 1; + + onopen: ((event: Event) => void) | null = null; + onmessage: ((event: MessageEvent) => void) | null = null; + onclose: ((event: CloseEvent) => void) | null = null; + onerror: ((event: Event) => void) | null = null; + readyState = 1; + sent: string[] = []; + closed = false; + + constructor(public url: string) { + WebSocketMock.instances.push(this); + queueMicrotask(() => this.onopen?.(new Event('open'))); + } + + send(data: string) { + this.sent.push(data); + } + + close() { + this.closed = true; + this.readyState = 3; + this.onclose?.({ code: 1000, reason: '', wasClean: true } as CloseEvent); + } + + emitMessage(data: string) { + this.onmessage?.({ data } as MessageEvent); + } +} + +beforeEach(() => { + vi.stubGlobal('WebSocket', WebSocketMock); + WebSocketMock.instances = []; +}); + +afterEach(() => { + vi.restoreAllMocks(); +}); + +describe('agent-browser connection', () => { + it('closes the stream websocket when disposed', async () => { + const connection = createAgentBrowserConnection({ + session: 'dormouse.1.default', + streamPort: 1234, + }); + + await Promise.resolve(); + const ws = WebSocketMock.instances[0]; + expect(ws.url).toBe('ws://127.0.0.1:1234'); + + connection.dispose(); + + expect(ws.closed).toBe(true); + }); + + it('ignores transient empty tabs after a real tab list', async () => { + const 
connection = createAgentBrowserConnection({ + session: 'dormouse.1.default', + streamPort: 1234, + }); + + await Promise.resolve(); + const ws = WebSocketMock.instances[0]; + ws.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [{ tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }], + })); + + expect(connection.snapshot().tabs).toHaveLength(1); + + ws.emitMessage(JSON.stringify({ type: 'tabs', tabs: [] })); + + expect(connection.snapshot().tabs).toEqual([ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }, + ]); + }); + + it('drops byte-identical frame re-broadcasts (daemon heartbeat) but forwards real changes', async () => { + const connection = createAgentBrowserConnection({ + session: 'dormouse.1.default', + streamPort: 1234, + }); + let pulses = 0; + connection.subscribe((event) => { if (event.type === 'frame-pulse') pulses += 1; }); + + await Promise.resolve(); + const ws = WebSocketMock.instances[0]; + const frameA = JSON.stringify({ type: 'frame', data: 'AAAAAAAAAA' }); + const frameB = JSON.stringify({ type: 'frame', data: 'BBBBBBBBBB' }); + + ws.emitMessage(frameA); // first frame — primes + ws.emitMessage(frameA); // identical re-broadcast — dropped + ws.emitMessage(frameA); // identical re-broadcast — dropped + expect(pulses).toBe(1); + + ws.emitMessage(frameB); // real change — forwarded + ws.emitMessage(frameB); // identical — dropped + expect(pulses).toBe(2); + + ws.emitMessage(frameA); // changed again — forwarded + expect(pulses).toBe(3); + }); + + it('drops identical tab-snapshot re-broadcasts but forwards real changes', async () => { + const connection = createAgentBrowserConnection({ + session: 'dormouse.1.default', + streamPort: 1234, + }); + let tabsEvents = 0; + connection.subscribe((event) => { if (event.type === 'tabs') tabsEvents += 1; }); + + await Promise.resolve(); + const ws = WebSocketMock.instances[0]; + const snapshot = JSON.stringify({ + type: 'tabs', + tabs: [{ tabId: 't1', 
title: 'Dormouse', url: 'https://dormouse.sh/', active: true }], + }); + + ws.emitMessage(snapshot); // first — emits + ws.emitMessage(snapshot); // identical heartbeat — dropped + ws.emitMessage(snapshot); // identical heartbeat — dropped + expect(tabsEvents).toBe(1); + + // A genuine change (new tab) alters the signature and is forwarded. + ws.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: false }, + { tabId: 't2', title: 'GitHub', url: 'https://github.com/diffplug/dormouse', active: true }, + ], + })); + expect(tabsEvents).toBe(2); + }); + + it('re-primes after a reconnect so the first identical frame/tabs still forwards', async () => { + vi.useFakeTimers(); + try { + const connection = createAgentBrowserConnection({ + session: 'dormouse.1.default', + streamPort: 1234, + }); + let pulses = 0; + let tabsEvents = 0; + connection.subscribe((event) => { + if (event.type === 'frame-pulse') pulses += 1; + if (event.type === 'tabs') tabsEvents += 1; + }); + + // Flush connect()'s async getStreamUrl microtask + the mock's queued onopen. + await vi.advanceTimersByTimeAsync(0); + const ws = WebSocketMock.instances[0]; + const frame = JSON.stringify({ type: 'frame', data: 'AAAAAAAAAA' }); + const snapshot = JSON.stringify({ + type: 'tabs', + tabs: [{ tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }], + }); + ws.emitMessage(frame); + ws.emitMessage(snapshot); + expect(pulses).toBe(1); + expect(tabsEvents).toBe(1); + + // Socket drops; the connection resets dedupe state and schedules a reconnect + // (backoff ~2s for the first failure). Advance past it to open a new socket. 
+ ws.onclose?.({ code: 1006, reason: '', wasClean: false } as CloseEvent); + await vi.advanceTimersByTimeAsync(2100); + const ws2 = WebSocketMock.instances[WebSocketMock.instances.length - 1]; + expect(ws2).not.toBe(ws); + + // The reconnected stream re-sends the same frame/tabs; they must re-prime, not + // be swallowed as duplicates of the pre-disconnect state. + ws2.emitMessage(frame); + ws2.emitMessage(snapshot); + expect(pulses).toBe(2); + expect(tabsEvents).toBe(2); + + connection.dispose(); + } finally { + vi.useRealTimers(); + } + }); + + it('does not force-select an active provisional duplicate-url tab', async () => { + const runCommand = vi.fn(async () => ({ exitCode: 0, stdout: '', stderr: '' })); + const connection = createAgentBrowserConnection({ + session: 'dormouse.1.default', + streamPort: 1234, + runCommand, + }); + + await Promise.resolve(); + const ws = WebSocketMock.instances[0]; + ws.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [{ tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }], + })); + ws.emitMessage(JSON.stringify({ + type: 'tabs', + tabs: [ + { tabId: 't1', title: 'Dormouse', url: 'https://dormouse.sh/', active: false }, + { tabId: 't2', title: 'Dormouse', url: 'https://dormouse.sh/', active: true }, + ], + })); + + expect(runCommand).not.toHaveBeenCalled(); + }); +}); diff --git a/lib/src/components/wall/agent-browser-connection.ts b/lib/src/components/wall/agent-browser-connection.ts new file mode 100644 index 00000000..34a8a3e7 --- /dev/null +++ b/lib/src/components/wall/agent-browser-connection.ts @@ -0,0 +1,318 @@ +import type { AgentBrowserCommandResult } from '../../lib/platform/types'; +import { type AgentBrowserTab, parseAgentBrowserTabs } from '../../lib/agent-browser-tab'; + +// Re-exported so existing importers keep resolving the tab type/parser from here. 
+export type { AgentBrowserTab }; +export { parseAgentBrowserTabs }; + +// Stream messages above this size are frames (a base64 JPEG); status/tabs are +// small JSON control messages. Consumers display screenshots, so large frames +// are treated as unparsed "page changed" pulses. +const FRAME_PULSE_THRESHOLD = 16384; +const DEBUG_RING_LIMIT = 300; + +// Fast non-cryptographic string hash (djb2) for cheap byte-identity checks on +// stream payloads. Used to detect redundant frames/tabs the daemon re-broadcasts. +function djb2(s: string): number { + let h = 5381; + for (let i = 0; i < s.length; i++) h = ((h << 5) + h + s.charCodeAt(i)) | 0; + return h; +} + +export type AgentBrowserConnectionState = 'connecting' | 'open' | 'closed' | 'failed'; + +export interface AgentBrowserStreamStatus { + connected: boolean; + screencasting: boolean; + viewportWidth?: number; + viewportHeight?: number; +} + +export interface AgentBrowserFramePulse { + deviceWidth?: number; + deviceHeight?: number; +} + +export interface AgentBrowserSnapshot { + connection: AgentBrowserConnectionState; + session: string; + streamPort: number; + tabs: AgentBrowserTab[]; + status: AgentBrowserStreamStatus | null; + connectionLost: boolean; + lastError?: string; + livePortOpened: boolean; +} + +export type AgentBrowserConnectionEvent = + | { type: 'connection-open'; port: number } + | { type: 'connection-close'; port: number; failures: number; code: number; reason: string; wasClean: boolean } + | { type: 'connection-error'; port: number } + | { type: 'status'; status: AgentBrowserStreamStatus } + | { type: 'tabs'; tabs: AgentBrowserTab[]; previousTabs: AgentBrowserTab[] } + | { type: 'frame-pulse'; metadata?: AgentBrowserFramePulse } + | { type: 'debug'; event: AgentBrowserDebugEvent }; + +export interface AgentBrowserDebugEvent { + ts: number; + session: string; + port: number; + event: string; + data?: unknown; +} + +export interface AgentBrowserConnectionDeps { + session: string; + streamPort: 
number; + binaryPath?: string; + getStreamUrl?: (port: number) => Promise<string | undefined>; + runCommand?: (session: string, args: string[], binaryPath?: string) => Promise<AgentBrowserCommandResult>; + canSelectTabs?: () => boolean; + log?: (message: string) => void; +} + +export function createAgentBrowserConnection(deps: AgentBrowserConnectionDeps): AgentBrowserConnection { + return new AgentBrowserConnection(deps); +} + +export class AgentBrowserConnection { + private readonly listeners = new Set<(event: AgentBrowserConnectionEvent) => void>(); + private readonly debugEvents: AgentBrowserDebugEvent[] = []; + private socket: WebSocket | null = null; + private retryTimer: ReturnType<typeof setTimeout> | undefined; + private disposed = false; + private failures = 0; + private knownTabIds = new Set<string>(); + private pendingNewTab: { tabId: string; initialUrl: string; seenAtMs: number } | null = null; + private snap: AgentBrowserSnapshot; + + // The agent-browser daemon re-broadcasts the current frame and tab list on a + // ~20Hz heartbeat even when nothing changes, so a *static* page would otherwise + // drive ~20 device-resolution screenshots/sec (each a child-process spawn) plus + // ~20 `setTabs` re-renders/sec. We drop byte-identical re-broadcasts here so an + // unchanged page costs nothing downstream (the screenshot loop's own contract: + // "a static page produces no pulses, so no shots and no cost"). `0`/`''` are + // pre-first-message sentinels, and reset on reconnect so a fresh stream always + // re-primes the canvas/tabs. 
+ private lastFrameKey = 0; + private lastTabsSig = ''; + + constructor(private readonly deps: AgentBrowserConnectionDeps) { + this.snap = { + connection: 'connecting', + session: deps.session, + streamPort: deps.streamPort, + tabs: [], + status: null, + connectionLost: false, + livePortOpened: false, + }; + this.connect(); + } + + subscribe(listener: (event: AgentBrowserConnectionEvent) => void): () => void { + this.listeners.add(listener); + return () => this.listeners.delete(listener); + } + + snapshot(): AgentBrowserSnapshot { + return this.snap; + } + + debugSnapshot(): AgentBrowserDebugEvent[] { + return [...this.debugEvents]; + } + + send(payload: Record<string, unknown>): void { + const ws = this.socket; + if (ws && ws.readyState === WebSocket.OPEN) ws.send(JSON.stringify(payload)); + } + + dispose(): void { + this.disposed = true; + if (this.retryTimer !== undefined) clearTimeout(this.retryTimer); + this.retryTimer = undefined; + const ws = this.socket; + this.socket = null; + ws?.close(); + } + + private emit(event: AgentBrowserConnectionEvent): void { + for (const listener of this.listeners) listener(event); + } + + private debug(event: string, data?: unknown): void { + const item: AgentBrowserDebugEvent = { + ts: Date.now(), + session: this.deps.session, + port: this.deps.streamPort, + event, + ...(data !== undefined ? { data } : {}), + }; + this.debugEvents.push(item); + if (this.debugEvents.length > DEBUG_RING_LIMIT) this.debugEvents.splice(0, this.debugEvents.length - DEBUG_RING_LIMIT); + this.emit({ type: 'debug', event: item }); + } + + private log(message: string): void { + this.deps.log?.(message); + } + + private patch(next: Partial<AgentBrowserSnapshot>): void { + this.snap = { ...this.snap, ...next }; + } + + private async connect(): Promise<void> { + let url: string | undefined; + try { + url = await this.deps.getStreamUrl?.(this.deps.streamPort); + } catch (err) { + this.debug('stream-url-error', { error: err instanceof Error ? 
err.message : String(err) }); + } + if (this.disposed) return; + const wsUrl = url ?? `ws://127.0.0.1:${this.deps.streamPort}`; + this.log(`[ab-panel] connecting stream ${JSON.stringify({ wsPort: this.deps.streamPort, url: wsUrl })}`); + this.debug('connect', { url: wsUrl }); + this.socket = new WebSocket(wsUrl); + this.socket.onopen = () => { + this.failures = 0; + this.patch({ connection: 'open', connectionLost: false, livePortOpened: true }); + this.log(`[ab-panel] stream open ${JSON.stringify({ wsPort: this.deps.streamPort })}`); + this.debug('open'); + this.emit({ type: 'connection-open', port: this.deps.streamPort }); + }; + this.socket.onmessage = (ev) => this.handleMessage(ev.data); + this.socket.onerror = () => { + this.patch({ lastError: 'stream socket error' }); + this.log(`[ab-panel] stream error ${JSON.stringify({ wsPort: this.deps.streamPort })}`); + this.debug('error'); + this.emit({ type: 'connection-error', port: this.deps.streamPort }); + }; + this.socket.onclose = (ev) => { + this.socket = null; + // A reconnected stream re-sends the current frame/tabs; clear the dedupe + // sentinels so that first post-reconnect snapshot always re-primes the + // canvas and tab list rather than being dropped as a "duplicate". 
+ this.lastFrameKey = 0; + this.lastTabsSig = ''; + if (this.disposed) return; + this.failures += 1; + if (this.failures >= 3) this.patch({ connection: 'failed', connectionLost: true }); + else this.patch({ connection: 'closed' }); + const data = { wsPort: this.deps.streamPort, failures: this.failures, code: ev.code, reason: ev.reason, wasClean: ev.wasClean }; + this.log(`[ab-panel] stream close ${JSON.stringify(data)}`); + this.debug('close', data); + this.emit({ type: 'connection-close', port: this.deps.streamPort, failures: this.failures, code: ev.code, reason: ev.reason, wasClean: ev.wasClean }); + this.retryTimer = setTimeout(() => this.connect(), Math.min(1000 * 2 ** this.failures, 10000)); + }; + } + + // Drop a frame whose pixels (and device dims) match the previous one — the + // daemon's heartbeat re-broadcasts an unchanged page, and redrawing it is pure + // cost. Returns true when the frame is a duplicate the caller should ignore. + private isDuplicateFrame(payload: string): boolean { + const key = djb2(payload) ^ (payload.length | 0); + if (key === this.lastFrameKey) return true; + this.lastFrameKey = key; + return false; + } + + private handleMessage(raw: unknown): void { + if (typeof raw !== 'string') return; + if (raw.length > FRAME_PULSE_THRESHOLD) { + if (this.isDuplicateFrame(raw)) return; + this.emit({ type: 'frame-pulse' }); + return; + } + let msg: any; + try { + msg = JSON.parse(raw); + } catch { + return; + } + if (msg.type === 'frame' && typeof msg.data === 'string') { + const metadata = msg.metadata?.deviceWidth && msg.metadata?.deviceHeight + ? { deviceWidth: msg.metadata.deviceWidth as number, deviceHeight: msg.metadata.deviceHeight as number } + : undefined; + // Fold device dims into the identity key so a resize that happens to keep + // identical pixels still propagates. + const key = metadata ? 
`${msg.data}@${metadata.deviceWidth}x${metadata.deviceHeight}` : msg.data; + if (this.isDuplicateFrame(key)) return; + this.emit({ type: 'frame-pulse', metadata }); + } else if (msg.type === 'status') { + const status: AgentBrowserStreamStatus = { + connected: msg.connected === true, + screencasting: msg.screencasting === true, + ...(typeof msg.viewportWidth === 'number' ? { viewportWidth: msg.viewportWidth } : {}), + ...(typeof msg.viewportHeight === 'number' ? { viewportHeight: msg.viewportHeight } : {}), + }; + this.patch({ status, connectionLost: msg.connected === false }); + this.emit({ type: 'status', status }); + } else if (msg.type === 'tabs' && Array.isArray(msg.tabs)) { + this.handleTabs(parseAgentBrowserTabs(msg.tabs)); + } + } + + private handleTabs(next: AgentBrowserTab[]): void { + const previousTabs = this.snap.tabs; + if (next.length === 0 && previousTabs.length > 0) { + this.log(`[ab-panel] empty tabs snapshot ignored ${JSON.stringify({ w: this.deps.streamPort, previous: previousTabs.length })}`); + this.debug('tabs-empty-ignored', { previous: previousTabs.length }); + return; + } + + // Drop an identical tab-snapshot re-broadcast (same ids, active flags, urls, + // titles): it would otherwise re-run tab-selection and force a `setTabs` + // re-render every heartbeat. A real change (new/closed tab, navigation, + // focus, title) alters the signature and falls through. + const fullSig = JSON.stringify(next.map((t) => `${t.tabId}:${t.active ? 'A' : '-'}:${t.url}:${t.title ?? ''}`)); + if (fullSig === this.lastTabsSig) return; + this.lastTabsSig = fullSig; + + this.maybeSelectNewTab(next, previousTabs); + this.knownTabIds = new Set(next.map((t) => t.tabId)); + const sig = JSON.stringify({ w: this.deps.streamPort, t: next.map((t) => `${t.tabId}:${t.active ? 
'A' : '-'}:${t.url}`) }); + this.log(`[ab-panel] tabs msg ${sig}`); + this.debug('tabs', { tabs: next }); + this.patch({ tabs: next }); + this.emit({ type: 'tabs', tabs: next, previousTabs }); + } + + private maybeSelectNewTab(next: AgentBrowserTab[], previousTabs: AgentBrowserTab[]): void { + const canSelect = this.deps.canSelectTabs?.() ?? true; + const maybeSelectTab = (tab: AgentBrowserTab, reason: string) => { + if (!canSelect) return; + this.log(`[ab-panel] selecting tab ${JSON.stringify({ tabId: tab.tabId, url: tab.url, reason })}`); + this.debug('select-tab', { tabId: tab.tabId, url: tab.url, reason }); + this.deps.runCommand?.(this.deps.session, ['tab', tab.tabId], this.deps.binaryPath).then((result) => { + if (result.exitCode !== 0) { + this.log(`[agent-browser] tab ${tab.tabId} failed: ${result.stderr || result.stdout || `exit ${result.exitCode}`}`); + } + }).catch((err) => this.log(`[agent-browser] tab ${tab.tabId} failed: ${err instanceof Error ? err.message : String(err)}`)); + }; + + const pending = this.pendingNewTab; + if (pending) { + const tab = next.find((t) => t.tabId === pending.tabId); + if (!tab) { + this.pendingNewTab = null; + } else if (tab.url !== pending.initialUrl) { + if (!tab.active) maybeSelectTab(tab, 'new-tab-destination'); + else this.log(`[ab-panel] new tab destination observed ${JSON.stringify({ tabId: tab.tabId, url: tab.url, elapsedMs: Math.round(performance.now() - pending.seenAtMs) })}`); + this.pendingNewTab = null; + } + } + + if (this.knownTabIds.size === 0) return; + const fresh = next.filter((t) => !this.knownTabIds.has(t.tabId)); + const newest = fresh[fresh.length - 1]; + if (!newest) return; + const duplicateUrl = !!newest.url && previousTabs.some((tab) => tab.url === newest.url); + if (!newest.active) maybeSelectTab(newest, 'new-tab-inactive'); + else if (duplicateUrl) { + this.pendingNewTab = { tabId: newest.tabId, initialUrl: newest.url, seenAtMs: performance.now() }; + this.log(`[ab-panel] new tab provisional 
${JSON.stringify({ tabId: newest.tabId, url: newest.url })}`); + this.debug('new-tab-provisional', { tabId: newest.tabId, url: newest.url }); + } + } +} diff --git a/lib/src/components/wall/agent-browser-ports.test.ts b/lib/src/components/wall/agent-browser-ports.test.ts index 541867f8..63d65f30 100644 --- a/lib/src/components/wall/agent-browser-ports.test.ts +++ b/lib/src/components/wall/agent-browser-ports.test.ts @@ -74,13 +74,17 @@ describe('dev-server port store', () => { releaseDevServerPort(7000); }); - it('clears a resolution once the last watcher releases the port', () => { + it('keeps the cached resolution when the last watcher releases the port', () => { requestDevServerPort(9999); setDevServerResolution(9999, { paneId: 'pane-z', label: 'vite' }); expect(getDevServerResolution(9999)).not.toBeNull(); + // Releasing drops the "wanted" interest but KEEPS the cached resolution. + // Release is also what React StrictMode's mount→cleanup→mount runs on every + // header mount; clearing here would blank the chip until the next Wall scan. + // The Wall owns clearing stale resolutions (it re-validates re-wanted ports). releaseDevServerPort(9999); - expect(getDevServerResolution(9999)).toBeNull(); expect(getWantedDevServerPorts()).not.toContain(9999); + expect(getDevServerResolution(9999)).toEqual({ paneId: 'pane-z', label: 'vite' }); }); }); diff --git a/lib/src/components/wall/agent-browser-ports.ts b/lib/src/components/wall/agent-browser-ports.ts index 956fc2a3..a0778fd8 100644 --- a/lib/src/components/wall/agent-browser-ports.ts +++ b/lib/src/components/wall/agent-browser-ports.ts @@ -1,6 +1,6 @@ /** * Dev-server port → terminal-pane correlation store - * (docs/specs/dor-agent-browser.md → "Dev-server connection"). + * (docs/specs/dor-browser.md → "Dev-Server Chip"). 
* * A browser surface header can't see other panes' open ports, so the * correlation lives in the Wall: it watches which loopback ports headers are @@ -58,11 +58,12 @@ export function releaseDevServerPort(port: number): void { return; } wanted.delete(port); - // Drop the stale resolution so a later watcher re-resolves from scratch - // rather than briefly flashing a now-defunct pane. - const hadResolution = resolutions.delete(port); + // Keep the last resolution cached rather than dropping it here: releasing is + // also what React StrictMode's mount→cleanup→mount does on every header mount, + // and clearing it in this cleanup would blank the chip until the next scan. The + // resolution is Wall-owned — a re-wanted port is re-validated (the Wall's + // `settled` set drops it) and a now-defunct pane is cleared by that scan. emitWanted(); - if (hadResolution) emitResolutions(); } export function getWantedDevServerPorts(): number[] { diff --git a/lib/src/components/wall/agent-browser-screen.ts b/lib/src/components/wall/agent-browser-screen.ts index fb250424..8045dcdc 100644 --- a/lib/src/components/wall/agent-browser-screen.ts +++ b/lib/src/components/wall/agent-browser-screen.ts @@ -1,8 +1,8 @@ /** * Per-surface bridge between an agent-browser pane's body (AgentBrowserPanel) * and its tab header (SurfacePaneHeader) + the screen modal, which are - * separate components for one pane (see docs/specs/dor-agent-browser.md → - * "Screen Indicator & Viewport"). + * separate components for one pane (see docs/specs/dor-browser.md → + * "Render indicator & the Display modal"). * * The panel owns the live state (viewport, pane size, sync) and the action * (`runAgentBrowser`); the header and modal only read a snapshot and invoke @@ -20,8 +20,26 @@ import { useSyncExternalStore } from 'react'; export type ScreenState = 'SYNCED' | 'SCALED'; +/** How a web surface is rendered (docs/specs/dor-browser.md → "Canonical Params"; + * dor-browser.md → "Pop-Out"). 
The `ab-` prefix names the engine + * (agent-browser), leaving room for a future engine beside it; `iframe` is the + * engine-less DOM embed: + * - `ab-screencast` — real Chromium to a canvas: agent-drivable, any URL, but + * laggy for a human. + * - `ab-popout` — the same agent-browser relaunched headed as a native OS + * window: agent-drivable, any URL, native human feel; the in-Dormouse pane + * becomes a stub. + * - `iframe` — the page's own DOM in a proxied iframe: native + zero-lag, + * but loopback-only and not agent-drivable. + * Absent ⇒ `ab-screencast` — a surface with no explicit mode reads as a + * screencast. */ +export type RenderMode = 'ab-screencast' | 'ab-popout' | 'iframe'; + export interface ScreenSnapshot { state: ScreenState; + /** The surface's current render backend; absent ⇒ `ab-screencast`. Drives the + * far-left chip glyph (frame-corners = iframe; lock = screencast). */ + renderMode?: RenderMode; /** The browser's live CSS viewport + inferred device pixel ratio. */ viewport: { w: number; h: number; dpr: number }; /** The pane's CSS pixel size (the canvas render area). */ @@ -40,10 +58,16 @@ export interface ScreenActions { applyViewport(w: number, h: number, dpr: number): void; /** Open the screen modal for this surface. */ openModal(): void; + /** Swap this surface's render backend in place, preserving the target + * (docs/specs/dor-browser.md → "Display Modal And Render Swaps"). This is + * the single entry point for every mode, including `popout` (relaunch headed + * — docs/specs/dor-browser.md → "Pop-Out"). Absent until the + * swap is wired; the modal hides its Render section without it. */ + setRenderMode?(mode: RenderMode): void; } /** What the browser-chrome header reads about the active tab - * (docs/specs/dor-agent-browser.md → "Browser-chrome header"). Updated on its + * (docs/specs/dor-browser.md → "Browser Chrome"). Updated on its * own cadence (tab stream messages), separate from the screen snapshot which * churns on resize. 
*/ export interface ChromeSnapshot { @@ -82,6 +106,9 @@ export interface ScreenController { readonly chromeActions: ChromeActions; /** Whether the host can run `agentBrowserCommand` (false ⇒ resizes inert). */ readonly hostCapable: boolean; + /** Whether this host/platform can pop the surface out to a headed OS window + * (false/absent on web; gates the modal's `popout` render option). */ + readonly canPopOut?: boolean; } interface ScreenEntry { @@ -122,6 +149,7 @@ export function registerAgentBrowserScreen( chrome: ChromeSnapshot; chromeActions: ChromeActions; hostCapable: boolean; + canPopOut?: boolean; }, ): ScreenRegistration { const entry: ScreenEntry = { @@ -144,6 +172,7 @@ export function registerAgentBrowserScreen( chrome: () => entry.chrome, chromeActions: init.chromeActions, hostCapable: init.hostCapable, + canPopOut: init.canPopOut, }, }; registry.set(id, entry); diff --git a/lib/src/components/wall/agent-browser-screenshot-loop.ts b/lib/src/components/wall/agent-browser-screenshot-loop.ts index 399f98a7..7127cfcd 100644 --- a/lib/src/components/wall/agent-browser-screenshot-loop.ts +++ b/lib/src/components/wall/agent-browser-screenshot-loop.ts @@ -57,14 +57,34 @@ export function createScreenshotLoop(deps: ScreenshotLoopDeps): ScreenshotLoop { dirty = false; const mySeq = ++seq; lastStart = performance.now(); + console.log(`[agent-browser] screenshot start ${JSON.stringify({ session, seq: mySeq })}`); + // Watchdog: a capture that never resolves (a wedged host round-trip) must not + // pin `inFlight` forever and silently freeze the screencast. Free the slot and + // retry after a generous bound; a late resolve is dropped by the seq guard. 
+ let settled = false; + const watchdog = setTimeout(() => { + if (settled) return; + settled = true; + inFlight = false; + console.warn(`[agent-browser] screenshot capture stalled (>8s) ${JSON.stringify({ session, seq: mySeq, dirty, willRetry: dirty })}`); + if (dirty) schedule(); + }, 8000); platform.agentBrowserScreenshot(session, { format: 'jpeg', quality: 85 }, deps.getBinaryPath()).then((res) => { - avgMs = avgMs * 0.6 + (performance.now() - lastStart) * 0.4; + if (settled) return; + settled = true; + clearTimeout(watchdog); + const elapsedMs = performance.now() - lastStart; + console.log(`[agent-browser] screenshot done ${JSON.stringify({ session, seq: mySeq, ok: res.ok, bytes: res.bytes?.byteLength ?? 0, elapsedMs: Math.round(elapsedMs), dirty })}`); + avgMs = avgMs * 0.6 + elapsedMs * 0.4; inFlight = false; if (res.ok && res.bytes) display(res.bytes, res.mime || 'image/jpeg', mySeq); else console.warn('[agent-browser] screenshot failed:', res.error ?? '(no data)'); if (dirty) schedule(); }).catch((err) => { - console.warn('[agent-browser] screenshot error:', err); + if (settled) return; + settled = true; + clearTimeout(watchdog); + console.warn(`[agent-browser] screenshot error ${JSON.stringify({ session, seq: mySeq })}:`, err); inFlight = false; if (dirty) schedule(); }); diff --git a/lib/src/components/wall/agent-browser-sessions.test.ts b/lib/src/components/wall/agent-browser-sessions.test.ts new file mode 100644 index 00000000..27683840 --- /dev/null +++ b/lib/src/components/wall/agent-browser-sessions.test.ts @@ -0,0 +1,35 @@ +import { afterEach, describe, expect, it } from 'vitest'; +import { + clearAgentBrowserSessionClosed, + isAgentBrowserSessionClosed, + markAgentBrowserSessionClosed, +} from './agent-browser-sessions'; + +afterEach(() => { + // Module state is process-global; reset the names this suite touches. 
+ clearAgentBrowserSessionClosed('dormouse.1.gui-abc'); + clearAgentBrowserSessionClosed('dormouse.1.default'); +}); + +describe('agent-browser session teardown guard', () => { + it('reports a session as closed only after it is marked', () => { + expect(isAgentBrowserSessionClosed('dormouse.1.gui-abc')).toBe(false); + markAgentBrowserSessionClosed('dormouse.1.gui-abc'); + expect(isAgentBrowserSessionClosed('dormouse.1.gui-abc')).toBe(true); + }); + + it('clears the mark when a new surface re-takes the session name', () => { + // Kill marks the name closed; a later `dor ab` re-creating the same managed + // name must clear it so the new surface's auto-revert is live again. + markAgentBrowserSessionClosed('dormouse.1.default'); + expect(isAgentBrowserSessionClosed('dormouse.1.default')).toBe(true); + clearAgentBrowserSessionClosed('dormouse.1.default'); + expect(isAgentBrowserSessionClosed('dormouse.1.default')).toBe(false); + }); + + it('tracks sessions independently', () => { + markAgentBrowserSessionClosed('dormouse.1.gui-abc'); + expect(isAgentBrowserSessionClosed('dormouse.1.default')).toBe(false); + expect(isAgentBrowserSessionClosed('dormouse.1.gui-abc')).toBe(true); + }); +}); diff --git a/lib/src/components/wall/agent-browser-sessions.ts b/lib/src/components/wall/agent-browser-sessions.ts new file mode 100644 index 00000000..099dd929 --- /dev/null +++ b/lib/src/components/wall/agent-browser-sessions.ts @@ -0,0 +1,33 @@ +/** + * Tracks agent-browser sessions Dormouse has *deliberately* closed (a pane kill, + * or a render-swap away from the screencast/popout backend), so a popped-out + * surface's auto-revert doesn't resurrect a session that's being torn down. + * + * The race (docs/specs/dor-browser.md → "Pop-Out"): + * killing or swapping a popped-out surface issues `agent-browser … close`, which + * drops the headed stream. 
The panel's auto-revert reads that dropped stream as + * "the user closed the window" and relaunches the session headless — bringing + * back a session (and process) that was meant to die. Marking the session closed + * *before* issuing the close lets the auto-revert stand down. + * + * A managed session name can be re-created later (e.g. `dor ab` re-opening + * `dormouse.1.default` after a kill), so a freshly-mounted panel clears the mark + * for its session — the flag means "this specific live surface is going away," + * not "this name is forever dead." + */ +const closedSessions = new Set<string>(); + +/** Mark a session as being closed by Dormouse (call before issuing `close`). */ +export function markAgentBrowserSessionClosed(session: string): void { + closedSessions.add(session); +} + +/** Clear the mark — a new surface is taking ownership of this session name. */ +export function clearAgentBrowserSessionClosed(session: string): void { + closedSessions.delete(session); +} + +/** Whether Dormouse is deliberately tearing this session down right now. */ +export function isAgentBrowserSessionClosed(session: string): boolean { + return closedSessions.has(session); +} diff --git a/lib/src/components/wall/browser-surface.ts b/lib/src/components/wall/browser-surface.ts new file mode 100644 index 00000000..b1fb5274 --- /dev/null +++ b/lib/src/components/wall/browser-surface.ts @@ -0,0 +1,49 @@ +/** + * Browser-surface param classification — the single source of truth for "what + * renderer does this pane use?" and "is this a browser pane at all?", including + * migration of layouts persisted before `renderMode` existed (surfaceType + * 'iframe' | 'agent-browser' + a `poppedOut` boolean). Used by the BrowserPanel + * shell, the Wall (dispatch + lifecycle + CLI type), and the dev-server-port + * correlation, so the classification never drifts between them. 
+ */ +import type { RenderMode } from './agent-browser-screen'; + +type BrowserParamsLike = { + surfaceType?: unknown; + renderMode?: unknown; + session?: unknown; + poppedOut?: unknown; +}; + +function asParams(params: unknown): BrowserParamsLike { + return params && typeof params === 'object' ? (params as BrowserParamsLike) : {}; +} + +/** Resolve the canonical render mode, migrating a legacy layout blob. */ +export function resolveRenderMode(params: unknown): RenderMode { + const p = asParams(params); + if (p.renderMode === 'ab-screencast' || p.renderMode === 'ab-popout' || p.renderMode === 'iframe') { + return p.renderMode; + } + if (p.surfaceType === 'iframe') return 'iframe'; + if (p.surfaceType === 'agent-browser' || typeof p.session === 'string') { + return p.poppedOut ? 'ab-popout' : 'ab-screencast'; + } + return 'iframe'; +} + +/** Whether params describe an agent-browser-rendered surface (ab-screencast / + * ab-popout, or a legacy agent-browser blob). */ +export function isAgentBrowserParams(params: unknown): boolean { + const p = asParams(params); + return p.renderMode === 'ab-screencast' || p.renderMode === 'ab-popout' || p.surfaceType === 'agent-browser'; +} + +/** Whether params describe any browser surface (vs a terminal): the unified + * 'browser' type, a legacy iframe/agent-browser blob, or anything carrying a + * renderMode. */ +export function isBrowserParams(params: unknown): boolean { + const p = asParams(params); + return p.surfaceType === 'browser' || p.surfaceType === 'iframe' + || p.surfaceType === 'agent-browser' || typeof p.renderMode === 'string'; +} diff --git a/lib/src/components/wall/browser-url.ts b/lib/src/components/wall/browser-url.ts index d2d4e08a..1ab2322f 100644 --- a/lib/src/components/wall/browser-url.ts +++ b/lib/src/components/wall/browser-url.ts @@ -1,6 +1,6 @@ /** * Small URL helpers for the agent-browser surface header - * (see docs/specs/dor-agent-browser.md → "Browser-chrome header"). 
+ * (see docs/specs/dor-browser.md → "Browser Chrome"). * * The header shows a tab's URL as host+path (Chrome-style, the scheme and any * query/hash dropped) and, when that URL is loopback, correlates its port to a diff --git a/lib/src/components/wall/use-dev-server-ports.ts b/lib/src/components/wall/use-dev-server-ports.ts index 4fbc170a..77644a6b 100644 --- a/lib/src/components/wall/use-dev-server-ports.ts +++ b/lib/src/components/wall/use-dev-server-ports.ts @@ -1,6 +1,6 @@ /** * Wall-side driver for the dev-server connection chip - * (docs/specs/dor-agent-browser.md → "Dev-server connection"). + * (docs/specs/dor-browser.md → "Dev-Server Chip"). * * A browser-surface header can't see other panes' open ports, so it registers * the loopback port it's showing in the shared store (`useDevServerMatch`) and @@ -36,6 +36,7 @@ import { subscribeWantedDevServerPorts, } from './agent-browser-ports'; import type { DooredItem } from './wall-types'; +import { isBrowserParams } from './browser-surface'; // Wait this long after interest changes before scanning, so a tab's open + // initial screencast settle first and quick navigation coalesces into one scan. 
@@ -49,9 +50,7 @@ const IDLE_TIMEOUT_MS = 2000; type ResolveOutcome = 'busy' | 'idle' | 'pending'; function isTerminalParams(params: unknown): boolean { - if (!params || typeof params !== 'object') return true; - const surfaceType = (params as { surfaceType?: unknown }).surfaceType; - return surfaceType !== 'iframe' && surfaceType !== 'agent-browser'; + return !isBrowserParams(params); } function isTerminalDoor(door: DooredItem): boolean { diff --git a/lib/src/components/wall/use-dockview-ready.ts b/lib/src/components/wall/use-dockview-ready.ts index daf2794e..acb09547 100644 --- a/lib/src/components/wall/use-dockview-ready.ts +++ b/lib/src/components/wall/use-dockview-ready.ts @@ -1,4 +1,4 @@ -import { useCallback, type Dispatch, type RefObject, type SetStateAction } from 'react'; +import { useCallback, useRef, type Dispatch, type RefObject, type SetStateAction } from 'react'; import type { DockviewApi, DockviewGroupPanel, @@ -48,19 +48,35 @@ export function useDockviewReady({ setSelectedId: Dispatch<SetStateAction<string | null>>; onApiReady?: (api: DockviewApi) => void; }): (event: DockviewReadyEvent) => void { + // handleReady must be idempotent across a dockview remount. React StrictMode + // (dev) mounts dockview → fires onReady → disposes it → mounts a fresh dockview + // → fires onReady AGAIN, on the same Wall instance (so these refs persist). + // Consuming the initial ids/layout on the first pass would leave the surviving + // second dockview with nothing to restore — it would fall back to a freshly + // generated pane id, dropping the restored session and (in the website + // playground) the pane that onApiReady's addPanel references. So resolve the + // restoration once and cache it; every onReady replays the same result. 
+ const resolvedRef = useRef<{ mode: 'layout' | 'panes'; paneIds: string[] } | null>(null); + return useCallback((e: DockviewReadyEvent) => { apiRef.current = e.api; setDockviewApi(e.api); - const restored = initialPaneIdsRef.current; const layout = restoredLayoutRef.current; const restoredDoors = initialDoorsRef.current; - initialPaneIdsRef.current = undefined; - restoredLayoutRef.current = undefined; - initialDoorsRef.current = []; doorsRef.current = restoredDoors; setDoors(restoredDoors); + let resolution = resolvedRef.current; + if (!resolution) { + const restored = initialPaneIdsRef.current; + const hasRestored = !!restored && restored.length > 0; + resolution = layout && hasRestored + ? { mode: 'layout', paneIds: restored! } + : { mode: 'panes', paneIds: hasRestored ? restored! : [generatePaneId()] }; + resolvedRef.current = resolution; + } + const primeDefaultShell = (id: string) => { const defaults = getDefaultShellOpts(); if (defaults?.shell) { @@ -81,24 +97,21 @@ export function useDockviewReady({ }); }; - if (layout && restored && restored.length > 0) { + if (resolution.mode === 'layout') { try { e.api.fromJSON(layout as SerializedDockview); - setSelectedId(restored[0]); + setSelectedId(resolution.paneIds[0]); } catch { - for (const id of restored) { + for (const id of resolution.paneIds) { addTerminalPanel(id); } - setSelectedId(restored[0]); + setSelectedId(resolution.paneIds[0]); } } else { - const paneIds = restored && restored.length > 0 - ? 
restored - : [generatePaneId()]; - for (const id of paneIds) { + for (const id of resolution.paneIds) { addTerminalPanel(id); } - setSelectedId(paneIds[0]); + setSelectedId(resolution.paneIds[0]); } e.api.onWillShowOverlay((event) => { @@ -172,6 +185,7 @@ export function useDockviewReady({ killInProgressRef, modeRef, onApiReady, + resolvedRef, restoredLayoutRef, selectPane, selectedIdRef, diff --git a/lib/src/components/wall/use-wall-keyboard.ts b/lib/src/components/wall/use-wall-keyboard.ts index b538b605..54a34edb 100644 --- a/lib/src/components/wall/use-wall-keyboard.ts +++ b/lib/src/components/wall/use-wall-keyboard.ts @@ -39,8 +39,8 @@ export function useWallKeyboard(ctx: WallKeyboardCtx): void { // A focused cross-origin iframe owns the keyboard, so its keystrokes never // reach the capturing window listener above. The proxy shim posts our - // reserved leader chord back out (docs/specs/dor-iframe.md → "The keyboard - // side-channel"); feed it into the same dispatch the in-document dual-tap + // reserved leader chord back out (docs/specs/dor-browser.md → "The iframe + // shim message channel"); feed it into the same dispatch the in-document dual-tap // would, after validating the message came from a live proxy origin. const onMessage = (e: MessageEvent) => { const data = e.data as { __dormouse?: unknown } | null; diff --git a/lib/src/components/wall/use-window-focused.ts b/lib/src/components/wall/use-window-focused.ts index 4a75895d..0078aeda 100644 --- a/lib/src/components/wall/use-window-focused.ts +++ b/lib/src/components/wall/use-window-focused.ts @@ -8,7 +8,7 @@ export function useWindowFocused(): boolean { // though the app hasn't been backgrounded — the focused element is just an // <iframe> *inside* this document, so `document.hasFocus()` stays true. // Reading it instead of blindly setting false keeps headers/attention live - // when an iframe takes focus (docs/specs/dor-iframe.md → "#2"). 
+ // when an iframe takes focus (docs/specs/dor-browser.md → "Iframe Focus And Rendering Notes"). const onBlur = () => setFocused(document.hasFocus()); window.addEventListener('focus', onFocus); window.addEventListener('blur', onBlur); diff --git a/lib/src/components/wall/wall-context.tsx b/lib/src/components/wall/wall-context.tsx index 08fd3a9e..ae64d1cb 100644 --- a/lib/src/components/wall/wall-context.tsx +++ b/lib/src/components/wall/wall-context.tsx @@ -1,6 +1,7 @@ import { createContext } from 'react'; import type { AlertButtonActionResult, SessionStatus, SetTerminalUserTitleResult } from '../../lib/terminal-registry'; import type { WallMode, SpawnDirection } from './wall-types'; +import type { RenderMode } from './agent-browser-screen'; export interface PaneElementsState { elements: Map<string, HTMLElement>; @@ -38,6 +39,16 @@ export interface WallActions { onStartRename: (id: string) => void; onFinishRename: (id: string, value: string) => SetTerminalUserTitleResult; onCancelRename: () => void; + /** Swap a surface's render backend in place, preserving the target URL + * (docs/specs/dor-browser.md → "Display Modal And Render Swaps"). agent-browser ↔ iframe is a + * surface-type replacement; screencast ↔ popout is handled inside the + * agent-browser panel and does not route here. */ + onSwapRenderMode: (id: string, mode: RenderMode) => void; + /** Open a URL as a new iframe browser pane, split next to `id`. The iframe + * renderer is single-frame, so a page's new-tab request (target=_blank / + * window.open, surfaced by the proxy shim) becomes a new pane + * (docs/specs/dor-browser.md → "Iframe Shim"). 
*/ + onOpenBrowserPane?: (id: string, url: string) => void; } export const WallActionsContext = createContext<WallActions>({ @@ -53,6 +64,8 @@ export const WallActionsContext = createContext<WallActions>({ onStartRename: () => {}, onFinishRename: () => ({ accepted: true }), onCancelRename: () => {}, + onSwapRenderMode: () => {}, + onOpenBrowserPane: () => {}, }); export const RenamingIdContext = createContext<string | null>(null); diff --git a/lib/src/host/agent-browser-host.test.ts b/lib/src/host/agent-browser-host.test.ts new file mode 100644 index 00000000..e643e07a --- /dev/null +++ b/lib/src/host/agent-browser-host.test.ts @@ -0,0 +1,75 @@ +import { EventEmitter } from 'events'; +import { mkdtempSync } from 'fs'; +import { tmpdir } from 'os'; +import { join } from 'path'; +import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'; +import { createAgentBrowserHost } from './agent-browser-host'; + +type SpawnResult = { stdout?: string; stderr?: string; code?: number }; + +const spawnMock = vi.hoisted(() => vi.fn()); + +vi.mock('child_process', () => ({ + spawn: spawnMock, +})); + +function enqueueSpawnResults(results: SpawnResult[]) { + const queue = [...results]; + spawnMock.mockImplementation((binary: string, args: string[]) => { + const result = queue.shift(); + if (!result) throw new Error(`unexpected spawn: ${binary} ${args.join(' ')}`); + const child = new EventEmitter() as EventEmitter & { + stdout: EventEmitter; + stderr: EventEmitter; + }; + child.stdout = new EventEmitter(); + child.stderr = new EventEmitter(); + queueMicrotask(() => { + if (result.stdout) child.stdout.emit('data', result.stdout); + if (result.stderr) child.stderr.emit('data', result.stderr); + child.emit('close', result.code ?? 
0); + }); + return child; + }); +} + +describe('agent-browser host relaunch', () => { + const originalSocketDir = process.env.AGENT_BROWSER_SOCKET_DIR; + + beforeEach(() => { + spawnMock.mockReset(); + process.env.AGENT_BROWSER_SOCKET_DIR = mkdtempSync(join(tmpdir(), 'dormouse-ab-host-test-')); + }); + + afterEach(() => { + if (originalSocketDir === undefined) delete process.env.AGENT_BROWSER_SOCKET_DIR; + else process.env.AGENT_BROWSER_SOCKET_DIR = originalSocketDir; + }); + + it('closes a stray about:blank tab when tab list reports CLI-style id fields', async () => { + enqueueSpawnResults([ + {}, // close + {}, // --headed open + { + stdout: JSON.stringify({ + tabs: [ + { id: 'blank-tab', url: 'about:blank', active: false }, + { id: 'real-tab', url: 'https://example.com/', active: true }, + ], + }), + }, + {}, // tab close blank-tab + { stdout: JSON.stringify({ port: 61218 }) }, + ]); + + const host = createAgentBrowserHost({ writeClipboardText: vi.fn() }); + const result = await host.popOut('dormouse.1.default', { url: 'https://example.com/' }, '/usr/local/bin/agent-browser'); + + expect(result).toEqual({ ok: true, wsPort: 61218 }); + expect(spawnMock).toHaveBeenCalledWith( + '/usr/local/bin/agent-browser', + ['--session', 'dormouse.1.default', 'tab', 'close', 'blank-tab'], + expect.anything(), + ); + }); +}); diff --git a/lib/src/host/agent-browser-host.ts b/lib/src/host/agent-browser-host.ts new file mode 100644 index 00000000..bcdbbd8f --- /dev/null +++ b/lib/src/host/agent-browser-host.ts @@ -0,0 +1,459 @@ +/** + * Host-agnostic agent-browser support (docs/specs/dor-browser.md → + * "Agent-Browser Host Capabilities"). The single source of truth for both hosts: + * + * - VS Code: the extension host imports this directly + * (`vscode-ext/src/agent-browser-host.ts`). 
+ * - Standalone: bundled to `standalone/sidecar/agent-browser-host.cjs` and run + * by the Node sidecar, fronted by thin Rust forwarders — exactly how the + * iframe proxy (`iframe-proxy.ts`) is shared. + * + * Everything here is plain Node (child_process / fs / crypto), so the *same* + * code runs on both hosts. Only two genuinely host-specific bits are injected: + * writing the OS clipboard (for the macOS editing chords) and logging. + * + * Narrow capabilities, all on behalf of the webview: + * + * 1. `command` — runs the user's agent-browser binary against a session for tab + * actions, navigation, and teardown. Subcommands are allowlisted; not a + * general exec channel. + * 2. `edit` — host-owned `eval` for the macOS editing chords + * (select-all/copy/cut) the stream input path can't dispatch; copy/cut land + * on the OS clipboard. + * 3. `screenshot` — captures one device-resolution frame and returns the bytes. + * 4. `streamStatus` — reads the current stream port so restored panels recover + * from a stale persisted `wsPort`. + * 5. `open` — spawns a managed namespaced session and opens a url, backing a + * render swap (docs/specs/dor-browser.md → "Display Modal And Render Swaps"). + * 6. `popOut` / `popIn` — relaunch a session headed/headless at its live active + * url (Chrome's mode is fixed at launch, so this is a close + relaunch). + * 7. `closePoppedOut` — close every still-headed window, called from each host's + * shutdown so quitting never orphans a real Chrome window. + * + * The VS Code stream relay is NOT here: it works around the `vscode-webview://` + * origin the agent-browser stream server rejects, which is a VS-Code-only + * concern (the standalone webview's `tauri://localhost` origin is accepted, so + * it connects directly). It stays in the VS Code host. 
+ */ +import * as os from 'os'; +import * as path from 'path'; +import { promises as fs } from 'fs'; +import { spawn } from 'child_process'; +import { randomBytes } from 'crypto'; +import { type AgentBrowserTab, parseAgentBrowserTabs } from '../lib/agent-browser-tab'; +import { + AGENT_BROWSER_ALLOWED_SUBCOMMANDS, + type AgentBrowserCommandResult, + type AgentBrowserEditOp, + type AgentBrowserEditResult, + type AgentBrowserOpenResult, + type AgentBrowserPopResult, + type AgentBrowserScreenshotResult, + type AgentBrowserStreamStatusResult, +} from '../lib/platform/types'; + +const ALLOWED_SUBCOMMANDS = new Set<string>(AGENT_BROWSER_ALLOWED_SUBCOMMANDS); + +// The host owns the exact JS for each editing op — the webview only selects a +// name, so this never becomes an arbitrary-eval channel. copy/cut return the +// selected text; selectAll returns ''. Inputs/textareas use selection ranges; +// everything else falls back to the Selection API + execCommand. +const EDIT_SCRIPTS: Record<AgentBrowserEditOp, string> = { + selectAll: `(()=>{const el=document.activeElement;if(el&&'select'in el&&'value'in el){el.select();}else{document.execCommand('selectAll');}return'';})()`, + copy: `(()=>{const el=document.activeElement;if(el&&'selectionStart'in el&&el.selectionStart!=null){return el.value.slice(el.selectionStart,el.selectionEnd);}return String(window.getSelection()||'');})()`, + cut: `(()=>{const el=document.activeElement;if(el&&'selectionStart'in el&&el.selectionStart!=null){const s=el.selectionStart,e=el.selectionEnd,t=el.value.slice(s,e);el.setRangeText('',s,e,'end');el.dispatchEvent(new Event('input',{bubbles:true}));return t;}const sel=String(window.getSelection()||'');if(sel)document.execCommand('delete');return sel;})()`, +}; + +const STREAM_PORT_READ_ATTEMPTS = 4; +const STREAM_PORT_READ_DELAY_MS = 150; +const delay = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms)); + +export interface AgentBrowserHostDeps { + /** Write text to 
the OS clipboard (copy/cut land here). VS Code passes + * `vscode.env.clipboard.writeText`; the sidecar shells out (pbcopy/clip/…). */ + writeClipboardText: (text: string) => Promise<void> | void; + /** Optional diagnostic logger. */ + log?: (message: string) => void; +} + +export interface AgentBrowserHost { + command(session: string, args: string[], binaryPath?: string): Promise<AgentBrowserCommandResult>; + edit(session: string, op: AgentBrowserEditOp, binaryPath?: string): Promise<AgentBrowserEditResult>; + screenshot(session: string, opts: { format?: 'jpeg' | 'png'; quality?: number }, binaryPath?: string): Promise<AgentBrowserScreenshotResult>; + streamStatus(session: string, binaryPath?: string): Promise<AgentBrowserStreamStatusResult>; + open(url: string, opts: { headed?: boolean }, binaryPath?: string): Promise<AgentBrowserOpenResult>; + popOut(session: string, opts: { rect?: { x: number; y: number; width: number; height: number }; url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult>; + popIn(session: string, opts: { url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult>; + closePoppedOut(): Promise<void>; +} + +export function createAgentBrowserHost(deps: AgentBrowserHostDeps): AgentBrowserHost { + const log = deps.log ?? (() => {}); + + // Sessions currently relaunched headed via pop-out, mapped to the binary path + // that spawned them. A headed session is a real OS window, so the host must + // close it on shutdown or it orphans (spec → "Headed Pop-Out" lifecycle: + // "Dormouse/editor quits → headed windows are cleaned up; no orphans"). + // Headless sessions are deliberately NOT tracked — they're left alive to + // reattach across webview reloads (the wsPort/stream-recovery design). 
+ const poppedOutSessions = new Map<string, string | undefined>(); + + // The host's PATH is often the GUI login PATH (no nvm/volta shims), so prefer + // the absolute path `dor ab` resolved in the user's terminal; fall through on + // ENOENT in case it has gone stale. + async function runWithBinaryFallback(args: string[], binaryPath?: string): Promise<AgentBrowserCommandResult> { + const candidates = [...new Set([ + binaryPath, + process.env.DORMOUSE_AGENT_BROWSER_BIN, + 'agent-browser', + ].filter((c): c is string => !!c))]; + + let lastError = ''; + for (const binary of candidates) { + const result = await spawnAgentBrowser(binary, args); + if (result !== 'ENOENT') return result; + lastError = `'${binary}' was not found`; + log(`[agent-browser] ${lastError}; trying next candidate`); + } + return { exitCode: 1, stdout: '', stderr: `agent-browser binary not found (${lastError})` }; + } + + function spawnAgentBrowser(binary: string, args: string[]): Promise<AgentBrowserCommandResult | 'ENOENT'> { + return new Promise((resolve) => { + const child = spawn(binary, args, { stdio: ['ignore', 'pipe', 'pipe'] }); + let stdout = ''; + let stderr = ''; + child.stdout.on('data', (chunk) => { stdout += String(chunk); }); + child.stderr.on('data', (chunk) => { stderr += String(chunk); }); + child.on('error', (err: NodeJS.ErrnoException) => { + if (err.code === 'ENOENT') { + resolve('ENOENT'); + return; + } + log(`[agent-browser] spawn failed: ${err.message}`); + resolve({ exitCode: 1, stdout: '', stderr: err.message }); + }); + child.on('close', (code) => { + resolve({ exitCode: code ?? 1, stdout, stderr }); + }); + }); + } + + // Read a session's stream WebSocket port via `stream status --json`. Mirrors + // the parse in dor/src/commands/agent-browser.ts: { port } or { data: { port } }. 
+ // Right after `open` / `--headed open` (a fresh spawn, a pop-out, or a pop-in + // relaunch) the daemon may not have published the port yet; a single read + // would then return undefined and leave the panel pinned to a stale port — it + // reads "ended" though the session is live. Retry briefly to close that window. + async function readStreamPort(session: string, binaryPath?: string): Promise<number | undefined> { + for (let attempt = 0; attempt < STREAM_PORT_READ_ATTEMPTS; attempt++) { + const result = await runWithBinaryFallback(['--session', session, 'stream', 'status', '--json'], binaryPath); + if (result.exitCode === 0) { + try { + const parsed = JSON.parse(result.stdout) as { port?: unknown; data?: { port?: unknown } }; + const port = parsed.data?.port ?? parsed.port; + if (typeof port === 'number' && Number.isFinite(port)) return port; + } catch { + // malformed output — fall through and retry + } + } + if (attempt < STREAM_PORT_READ_ATTEMPTS - 1) await delay(STREAM_PORT_READ_DELAY_MS); + } + return undefined; + } + + function usableRelaunchUrl(value: unknown): string | undefined { + if (typeof value !== 'string') return undefined; + const trimmed = value.trim(); + if (!trimmed || trimmed === 'about:blank') return undefined; + return trimmed; + } + + // Enumerate a session's tabs via `tab list --json`. Envelope mirrors the rest + // of the CLI parsing here: { tabs } or { data: { tabs } }; the record parse is + // shared with the live stream (parseAgentBrowserTabs). Returns [] on any + // failure so callers degrade gracefully. + async function listTabs(session: string, binaryPath?: string): Promise<AgentBrowserTab[]> { + const result = await runWithBinaryFallback(['--session', session, 'tab', 'list', '--json'], binaryPath); + if (result.exitCode !== 0) return []; + try { + const parsed = JSON.parse(result.stdout) as { tabs?: unknown; data?: { tabs?: unknown } }; + return parseAgentBrowserTabs(parsed.data?.tabs ?? 
parsed.tabs); + } catch { + return []; + } + } + + // Dormouse is the source of truth for the relaunch target: the panel observes + // the live `tabs` stream and tracks the active tab's URL in its params, then + // passes it here. We deliberately do NOT re-query the daemon — right after + // `close` the daemon relaunches at about:blank, so a `get url` / `tab list` + // would race the very transition it's meant to preserve and hand back blank. + function relaunchUrl(requestedUrl: unknown): string { + return usableRelaunchUrl(requestedUrl) ?? 'about:blank'; + } + + // agent-browser keeps a long-lived per-session daemon whose headed/headless + // mode is fixed at *its* launch. `close` only closes the browser, not the + // daemon, and there is no CLI verb to stop it — so a `--headed`/headless + // relaunch against a live daemon is silently ignored ("daemon already + // running"), and pop-out/pop-in never actually switches mode. The daemon's pid + // lives in `$AGENT_BROWSER_SOCKET_DIR/<session>.pid` (default ~/.agent-browser); + // terminate it and wait for the process to exit so the next `open` spawns a + // fresh daemon in the mode we ask for. Best-effort and cross-platform + // (process.kill works on win/mac/linux). + function agentBrowserStateDir(): string { + return process.env.AGENT_BROWSER_SOCKET_DIR || path.join(os.homedir(), '.agent-browser'); + } + + async function killDaemon(session: string): Promise<void> { + const pidFile = path.join(agentBrowserStateDir(), `${session}.pid`); + let pid: number; + try { + pid = Number.parseInt((await fs.readFile(pidFile, 'utf8')).trim(), 10); + } catch { + return; // no pid file — nothing to kill (already gone, or custom dir) + } + if (!Number.isInteger(pid) || pid <= 0) return; + try { + process.kill(pid, 'SIGTERM'); + } catch { + return; // ESRCH: already dead + } + // Wait for the process to actually exit (signal 0 throws once it's gone), so + // the relaunch doesn't race a daemon that's still shutting down. 
+ for (let i = 0; i < 40; i++) { + try { + process.kill(pid, 0); + } catch { + log(`[ab-relaunch] daemon ${pid} for ${session} exited after ${i * 50}ms`); + return; + } + await delay(50); + } + log(`[ab-relaunch] daemon ${pid} for ${session} still alive after 2s; SIGKILL`); + try { process.kill(pid, 'SIGKILL'); } catch { /* ignore */ } + } + + // After a relaunch, close any stray about:blank tab the close+reopen race can + // leave behind — but only when a real page is open, so we never close the sole + // tab. Best-effort: a failure here must not fail the pop-out/pop-in. + async function closeStrayBlankTabs(session: string, binaryPath?: string): Promise<void> { + const tabs = await listTabs(session, binaryPath); + log(`[ab-relaunch] tabs after open: ${JSON.stringify(tabs)}`); + if (tabs.length < 2 || !tabs.some((t) => usableRelaunchUrl(t.url))) return; + for (const tab of tabs) { + if (!usableRelaunchUrl(tab.url)) { + log(`[ab-relaunch] closing stray blank tab ${tab.tabId}`); + await runWithBinaryFallback(['--session', session, 'tab', 'close', tab.tabId], binaryPath); + } + } + } + + // A fresh managed session for a surface spawned from the GUI (no `--key`), + // mirroring `dor ab`'s `dormouse.<workspaceId>.<key>` namespacing so it can't + // collide with a user's own agent-browser sessions. + function generateGuiSession(): string { + return `dormouse.1.gui-${randomBytes(6).toString('hex')}`; + } + + // Reused per session so we don't litter tmp with one file per frame; the panel + // guarantees one screenshot in flight per surface, so overwriting is safe. 
+ function screenshotPath(session: string, ext: string): string { + const safe = session.replace(/[^A-Za-z0-9._-]/g, '_'); + return path.join(os.tmpdir(), `dormouse-ab-shot-${safe}.${ext}`); + } + + async function command(session: string, args: string[], binaryPath?: string): Promise<AgentBrowserCommandResult> { + if (typeof session !== 'string' || !session) { + return { exitCode: 1, stdout: '', stderr: 'session is required' }; + } + const subcommand = args[0]; + if (!subcommand || !ALLOWED_SUBCOMMANDS.has(subcommand)) { + return { exitCode: 1, stdout: '', stderr: `agent-browser subcommand '${subcommand ?? ''}' is not allowed from the webview` }; + } + if (subcommand === 'get' && args[1] !== 'cdp-url') { + return { exitCode: 1, stdout: '', stderr: `agent-browser get '${args[1] ?? ''}' is not allowed from the webview` }; + } + // An explicit close (kill / render-swap) tears the session down itself, so + // it's no longer ours to clean up on shutdown. + if (subcommand === 'close') poppedOutSessions.delete(session); + return runWithBinaryFallback(['--session', session, ...args], binaryPath); + } + + async function edit(session: string, op: AgentBrowserEditOp, binaryPath?: string): Promise<AgentBrowserEditResult> { + if (typeof session !== 'string' || !session) { + return { ok: false, error: 'session is required' }; + } + const script = EDIT_SCRIPTS[op]; + if (!script) { + return { ok: false, error: `unknown edit op '${op}'` }; + } + + const result = await runWithBinaryFallback(['--session', session, 'eval', script, '--json'], binaryPath); + if (result.exitCode !== 0) { + return { ok: false, error: result.stderr.trim() || `eval exited ${result.exitCode}` }; + } + + // eval --json envelope: { success, data: { result }, error }. + let text = ''; + try { + const envelope = JSON.parse(result.stdout) as { success?: boolean; data?: { result?: unknown }; error?: unknown }; + if (envelope.success === false) { + return { ok: false, error: typeof envelope.error === 'string' ? 
envelope.error : `${op} failed` }; + } + if (typeof envelope.data?.result === 'string') text = envelope.data.result; + } catch { + return { ok: false, error: `could not parse eval output for ${op}` }; + } + + if (op === 'selectAll') return { ok: true }; + // Land the grabbed text on the user's real OS clipboard. Skip empty so an + // empty selection doesn't clobber what's already there. + if (text) { + try { + await deps.writeClipboardText(text); + } catch (err) { + return { ok: false, error: `clipboard write failed: ${err instanceof Error ? err.message : String(err)}` }; + } + } + return { ok: true, text }; + } + + // Capture one device-resolution frame via the user's agent-browser + // `screenshot` command (which honors the session's viewport/DPR, unlike the + // CSS-resolution screencast) and return the raw image bytes. agent-browser + // writes a file and reports the path; we read it back and hand the bytes to + // the caller (the VS Code host structured-clones them to the webview; the + // sidecar base64s them over stdio to Rust, which returns a raw Response). + async function screenshot( + session: string, + opts: { format?: 'jpeg' | 'png'; quality?: number }, + binaryPath?: string, + ): Promise<AgentBrowserScreenshotResult> { + if (typeof session !== 'string' || !session) { + return { ok: false, error: 'session is required' }; + } + const format = opts.format === 'png' ? 'png' : 'jpeg'; + const ext = format === 'png' ? 'png' : 'jpg'; + const out = screenshotPath(session, ext); + const args = ['--session', session, 'screenshot', out, '--screenshot-format', format]; + if (format === 'jpeg') { + const q = Number.isFinite(opts.quality) ? 
Math.min(100, Math.max(1, Math.round(opts.quality as number))) : 85; + args.push('--screenshot-quality', String(q)); + } + const result = await runWithBinaryFallback(args, binaryPath); + if (result.exitCode !== 0) { + log(`[agent-browser] screenshot failed (exit ${result.exitCode}): ${result.stderr.trim() || result.stdout.trim()}`); + return { ok: false, error: result.stderr.trim() || `screenshot exited ${result.exitCode}` }; + } + try { + const buffer = await fs.readFile(out); + // A Uint8Array view over exactly this file's bytes. + const bytes = new Uint8Array(buffer.buffer, buffer.byteOffset, buffer.byteLength); + return { ok: true, bytes, mime: format === 'png' ? 'image/png' : 'image/jpeg' }; + } catch (err) { + log(`[agent-browser] screenshot read failed: ${err instanceof Error ? err.message : String(err)}`); + return { ok: false, error: `could not read screenshot file: ${err instanceof Error ? err.message : String(err)}` }; + } + } + + async function streamStatus(session: string, binaryPath?: string): Promise<AgentBrowserStreamStatusResult> { + if (typeof session !== 'string' || !session) return { ok: false, error: 'session is required' }; + const wsPort = await readStreamPort(session, binaryPath); + if (!wsPort) return { ok: false, error: 'stream port unavailable' }; + return { ok: true, wsPort }; + } + + // Spawn a managed session and open <url> — backs swapping an iframe embed up + // to a live screencast (docs/specs/dor-browser.md → "Display Modal And Render Swaps"). With `headed`, + // the process launches headed in one shot so embed→popout doesn't open a + // headless browser only to tear it down. + async function open(url: string, opts: { headed?: boolean }, binaryPath?: string): Promise<AgentBrowserOpenResult> { + if (typeof url !== 'string' || !url) return { ok: false, error: 'url is required' }; + const session = generateGuiSession(); + const args = ['--session', session, ...(opts?.headed ? 
['--headed'] : []), 'open', url]; + const opened = await runWithBinaryFallback(args, binaryPath); + if (opened.exitCode !== 0) { + return { ok: false, error: opened.stderr.trim() || `open exited ${opened.exitCode}` }; + } + // A headed spawn is a real OS window — track it so shutdown can close it. + if (opts?.headed) poppedOutSessions.set(session, binaryPath); + const wsPort = await readStreamPort(session, binaryPath); + return { ok: true, session, ...(wsPort ? { wsPort } : {}), ...(binaryPath ? { binaryPath } : {}) }; + } + + // Pop-out is a relaunch, not a live toggle: Chrome's headed/headless choice is + // fixed at launch (spec → "Headed Pop-Out"). Close the headless session, then + // reopen it headed at the active URL. (v1 preserves the active tab URL only; + // multi-tab + profile/cookie restore are tracked follow-ups. Window + // positioning over opts.rect is deferred — neither host acts on it yet, so the + // window opens where Chrome places it.) + async function popOut( + session: string, + opts: { rect?: { x: number; y: number; width: number; height: number }; url?: string }, + binaryPath?: string, + ): Promise<AgentBrowserPopResult> { + if (typeof session !== 'string' || !session) return { ok: false, error: 'session is required' }; + const url = relaunchUrl(opts?.url); + log(`[ab-relaunch] popOut session=${session} requestedUrl=${JSON.stringify(opts?.url)} -> open ${url}`); + // Close the browser, then fully stop the daemon so the headed relaunch isn't + // ignored as "daemon already running" (which would leave it headless). + await runWithBinaryFallback(['--session', session, 'close'], binaryPath); + await killDaemon(session); + const opened = await runWithBinaryFallback(['--session', session, '--headed', 'open', url], binaryPath); + log(`[ab-relaunch] popOut headed-open exit=${opened.exitCode}${opened.stderr.trim() ? 
` stderr=${opened.stderr.trim()}` : ''}`); + if (opened.exitCode !== 0) { + return { ok: false, error: opened.stderr.trim() || `headed open exited ${opened.exitCode}` }; + } + // Now a real headed OS window — track it so shutdown can close it. + poppedOutSessions.set(session, binaryPath); + await closeStrayBlankTabs(session, binaryPath); + const wsPort = await readStreamPort(session, binaryPath); + log(`[ab-relaunch] popOut returning wsPort=${wsPort}`); + return { ok: true, ...(wsPort ? { wsPort } : {}) }; + } + + // The reverse: close the headed session and relaunch it headless at the active + // URL, resuming the screencast. + async function popIn( + session: string, + opts: { url?: string }, + binaryPath?: string, + ): Promise<AgentBrowserPopResult> { + if (typeof session !== 'string' || !session) return { ok: false, error: 'session is required' }; + const url = relaunchUrl(opts?.url); + log(`[ab-relaunch] popIn session=${session} requestedUrl=${JSON.stringify(opts?.url)} -> open ${url}`); + // Reverse of pop-out: the daemon is headed, so a plain `open` would reattach + // to it and stay headed. Stop the daemon so the relaunch comes up headless. + await runWithBinaryFallback(['--session', session, 'close'], binaryPath); + await killDaemon(session); + // The headed window is gone after the close above; back to headless. + poppedOutSessions.delete(session); + const opened = await runWithBinaryFallback(['--session', session, 'open', url], binaryPath); + log(`[ab-relaunch] popIn open exit=${opened.exitCode}${opened.stderr.trim() ? ` stderr=${opened.stderr.trim()}` : ''}`); + if (opened.exitCode !== 0) { + return { ok: false, error: opened.stderr.trim() || `open exited ${opened.exitCode}` }; + } + await closeStrayBlankTabs(session, binaryPath); + const wsPort = await readStreamPort(session, binaryPath); + log(`[ab-relaunch] popIn returning wsPort=${wsPort}`); + return { ok: true, ...(wsPort ? 
{ wsPort } : {}) }; + } + + // Close every still-popped-out session's headed window. Called from each + // host's shutdown (VS Code `deactivate()`, the sidecar's `shutdown()`) so + // quitting doesn't orphan real Chrome windows. On a reload, a popped-out + // surface then auto-reverts to a headless screencast when it reactivates + // (spec → "The headed window ends → auto-revert"), which is preferable to + // leaving a detached headed Chrome behind. + async function closePoppedOut(): Promise<void> { + const entries = [...poppedOutSessions.entries()]; + poppedOutSessions.clear(); + await Promise.all(entries.map(([session, binaryPath]) => + runWithBinaryFallback(['--session', session, 'close'], binaryPath).catch(() => undefined), + )); + } + + return { command, edit, screenshot, streamStatus, open, popOut, popIn, closePoppedOut }; +} diff --git a/lib/src/host/iframe-proxy-rewrite.test.ts b/lib/src/host/iframe-proxy-rewrite.test.ts index 44e38250..ecd55da8 100644 --- a/lib/src/host/iframe-proxy-rewrite.test.ts +++ b/lib/src/host/iframe-proxy-rewrite.test.ts @@ -80,6 +80,29 @@ describe('instrumentHtml', () => { expect(IFRAME_SHIM).toContain('__dormouse'); expect(IFRAME_SHIM).toContain("'leader'"); expect(IFRAME_SHIM).toContain("'pointerdown'"); + expect(IFRAME_SHIM).toContain("'location'"); + expect(IFRAME_SHIM).toContain("addEventListener('click'"); + expect(IFRAME_SHIM).toContain('pushState'); + }); + + it('intercepts new-tab attempts (target=_blank / window.open) as open-window', () => { + expect(IFRAME_SHIM).toContain("'open-window'"); + // window.open is overridden so popups become a new pane rather than vanishing. 
+ expect(IFRAME_SHIM).toContain('window.open=function'); + }); + + it('does not report a same-frame location for modifier / non-primary clicks', () => { + // Cmd/Ctrl/Shift/Alt+click and middle-click open a new tab/window without + // navigating the frame, so the shim must bail rather than post a stale + // location that would make the parent chrome URL bar lie. + expect(IFRAME_SHIM).toContain('e.metaKey||e.ctrlKey||e.shiftKey||e.altKey||e.button!==0'); + }); + + it('defers the same-frame location post and skips it when the click was cancelled', () => { + // The capture-phase post must wait a tick and respect a page that cancels + // the click (preventDefault / fetch-instead-of-navigate), else it reports a + // navigation that never happened. + expect(IFRAME_SHIM).toContain('if(!e.defaultPrevented)post(\'location\''); }); }); diff --git a/lib/src/host/iframe-proxy-rewrite.ts b/lib/src/host/iframe-proxy-rewrite.ts index ff7fb99c..367c34e9 100644 --- a/lib/src/host/iframe-proxy-rewrite.ts +++ b/lib/src/host/iframe-proxy-rewrite.ts @@ -1,6 +1,6 @@ /** * Pure, dependency-free helpers for the iframe transparent proxy - * (docs/specs/dor-iframe.md → "The Transparent Proxy"). + * (docs/specs/dor-browser.md → "Iframe Renderer"). * * Split out from the Node server (`iframe-proxy.ts`) so the policy/rewriting * logic is shared by every host that runs the proxy (VS Code extension host, @@ -34,10 +34,22 @@ export const STRIP_RESPONSE_HEADERS = new Set([ // only the frame, so the Wall can't see it; this lets it select the pane / // enter passthrough (#3). It's genuine user input, so it can't loop with the // parent's programmatic focus. +// - `location`: the proxied frame's current URL. The parent converts it back +// to the upstream URL and uses it to keep iframe Back/Forward/Reload chrome +// honest. 
export const IFRAME_SHIM = `(function(){ var P=window.parent; if(!P||P===window)return; - function post(t){try{P.postMessage({__dormouse:t},'*');}catch(e){}} + function post(t,d){try{var m={__dormouse:t};if(d)for(var k in d)m[k]=d[k];P.postMessage(m,'*');}catch(e){}} + function postLocation(){post('location',{url:String(location.href)});} + function anchorHref(e){ + var n=e&&e.target; + while(n&&n.nodeType===1){ + if(n.tagName&&String(n.tagName).toLowerCase()==='a'&&n.href)return n; + n=n.parentElement; + } + return null; + } function tap(s,e){ var now=Date.now(),side=e.location===1?'left':'right'; if(s.side==='left'&&side==='right'&&now-s.time<500){s.side=null;return true;} @@ -49,6 +61,45 @@ export const IFRAME_SHIM = `(function(){ else if(e.key==='Shift'){if(tap(shift,e))post('leader');} },true); addEventListener('pointerdown',function(){post('pointerdown');},true); + addEventListener('click',function(e){ + var a=anchorHref(e); + if(!a||a.hasAttribute('download'))return; + if(a.target&&a.target!=='_self'){ + // New-tab/window link: the iframe renderer is single-frame, so hand the + // URL to Dormouse to open as a new pane instead of letting it vanish. + e.preventDefault(); + post('open-window',{url:String(a.href)}); + return; + } + // Modifier / non-primary clicks (Cmd/Ctrl/Shift/Alt, middle button) open a + // new tab/window and leave this frame put — don't report a location the + // frame isn't actually showing, or the parent's URL bar + Back history lie. + if(e.metaKey||e.ctrlKey||e.shiftKey||e.altKey||e.button!==0)return; + // This is the capture phase, before the page's own handlers. Defer a tick + // and bail if the page cancelled the click (preventDefault, or an <a> that + // fetches instead of navigating) — else we'd report a navigation that never + // happened. A real navigation re-reports via the next document's shim, so + // nothing is lost if this frame is torn down before the tick fires. 
+ var href=String(a.href); + setTimeout(function(){if(!e.defaultPrevented)post('location',{url:href});},0); + },true); + // window.open is likewise single-frame-hostile; redirect it to a new pane. + try{window.open=function(u){ + var url='';try{url=u?String(new URL(String(u),location.href)):'';}catch(_e){url=String(u||'');} + post('open-window',{url:url}); + return null; + };}catch(_e){} + addEventListener('popstate',postLocation,true); + addEventListener('hashchange',postLocation,true); + addEventListener('pageshow',postLocation,true); + var H=history; + if(H&&H.pushState&&H.replaceState){ + var p=H.pushState,r=H.replaceState; + H.pushState=function(){var v=p.apply(this,arguments);setTimeout(postLocation,0);return v;}; + H.replaceState=function(){var v=r.apply(this,arguments);setTimeout(postLocation,0);return v;}; + } + if(document.readyState==='loading')addEventListener('DOMContentLoaded',postLocation,{once:true}); + else setTimeout(postLocation,0); })();`; // Drop any in-document CSP (loopback "relax CSP") and inject the shim before diff --git a/lib/src/host/iframe-proxy.ts b/lib/src/host/iframe-proxy.ts index 83938e23..e36fc11d 100644 --- a/lib/src/host/iframe-proxy.ts +++ b/lib/src/host/iframe-proxy.ts @@ -1,6 +1,6 @@ /** * Host-agnostic transparent proxy for the iframe surface - * (docs/specs/dor-iframe.md → "The Transparent Proxy"). + * (docs/specs/dor-browser.md → "Iframe Renderer"). * * Instead of pointing the `<iframe>` at a `dor iframe <url>` target directly — * where a cross-origin frame owns the keyboard, hides load errors, and can be diff --git a/lib/src/lib/agent-browser-tab.ts b/lib/src/lib/agent-browser-tab.ts new file mode 100644 index 00000000..8afd94f7 --- /dev/null +++ b/lib/src/lib/agent-browser-tab.ts @@ -0,0 +1,35 @@ +/** + * A browser tab as reported by agent-browser. 
The same record shape arrives over + * two channels — the live stream's `tabs` messages and the CLI's `tab list + * --json` — so the parse lives here once, shared by the connection (webview) and + * the host (Node). Some CLI builds report the identifier as `id` instead of + * `tabId`, so both forms are accepted. + */ +export interface AgentBrowserTab { + tabId: string; + title: string | null; + url: string; + active: boolean; +} + +export function parseTabRecord(record: unknown): AgentBrowserTab | null { + if (!record || typeof record !== 'object') return null; + const t = record as Record<string, unknown>; + const tabId = typeof t.tabId === 'string' + ? t.tabId + : typeof t.id === 'string' + ? t.id + : null; + if (!tabId) return null; + return { + tabId, + title: typeof t.title === 'string' ? t.title : null, + url: typeof t.url === 'string' ? t.url : '', + active: t.active === true, + }; +} + +export function parseAgentBrowserTabs(raw: unknown): AgentBrowserTab[] { + if (!Array.isArray(raw)) return []; + return raw.map(parseTabRecord).filter((tab): tab is AgentBrowserTab => !!tab); +} diff --git a/lib/src/lib/iframe-proxy-registry.ts b/lib/src/lib/iframe-proxy-registry.ts index 8329cd9f..769e7c4d 100644 --- a/lib/src/lib/iframe-proxy-registry.ts +++ b/lib/src/lib/iframe-proxy-registry.ts @@ -1,7 +1,7 @@ /** * Tracks the loopback proxy origins of live iframe surfaces so the Wall's * keyboard/focus channel can trust `postMessage` events from instrumented - * frames (docs/specs/dor-iframe.md → "The keyboard side-channel"). The shim we + * frames (docs/specs/dor-browser.md → "Iframe Shim"). The shim we * inject calls `parent.postMessage(...)`, which is cross-origin-safe by design; * the Wall validates `event.origin` against this set before acting on a * forwarded leader chord, so only a frame Dormouse itself served can drive it. 
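Reviewer note on the `iframe-proxy-registry.ts` comment above: the origin check it describes (validate `event.origin` against the set of live proxy origins before acting on a forwarded chord) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the registry's actual API: `liveOrigins`, `registerProxyOrigin`, and `readTrustedShimMessage` are hypothetical names, and the only detail taken from the diff is the `{ __dormouse: <type>, url? }` message shape posted by the shim.

```typescript
// Hypothetical sketch: these names are illustrative and do not come from the diff.
// Only the `{ __dormouse, url? }` message shape mirrors what IFRAME_SHIM posts.
type ShimMessage = { __dormouse: string; url?: string };

// Loopback proxy origins of live iframe surfaces, ref-counted so that two
// surfaces sharing one origin do not untrust each other on teardown.
const liveOrigins = new Map<string, number>();

// Register the origin of a proxy URL; returns a release function to call
// when the surface is disposed.
function registerProxyOrigin(proxyUrl: string): () => void {
  const origin = new URL(proxyUrl).origin;
  liveOrigins.set(origin, (liveOrigins.get(origin) ?? 0) + 1);
  let released = false;
  return () => {
    if (released) return; // releasing twice must not drop another surface's count
    released = true;
    const remaining = (liveOrigins.get(origin) ?? 1) - 1;
    if (remaining <= 0) liveOrigins.delete(origin);
    else liveOrigins.set(origin, remaining);
  };
}

// Return the shim message only if it came from a registered proxy origin and
// has the expected shape; otherwise null, so callers ignore it.
function readTrustedShimMessage(event: { origin: string; data: unknown }): ShimMessage | null {
  if (!liveOrigins.has(event.origin)) return null;
  const data = event.data;
  if (!data || typeof data !== 'object') return null;
  const fields = data as Record<string, unknown>;
  if (typeof fields.__dormouse !== 'string') return null;
  return {
    __dormouse: fields.__dormouse,
    ...(typeof fields.url === 'string' ? { url: fields.url } : {}),
  };
}

// Usage: register when a surface mounts, release when it unmounts.
const releaseExample = registerProxyOrigin('http://127.0.0.1:49152/index.html');
const accepted = readTrustedShimMessage({
  origin: 'http://127.0.0.1:49152',
  data: { __dormouse: 'location', url: 'http://127.0.0.1:49152/docs' },
});
const rejected = readTrustedShimMessage({
  origin: 'https://untrusted.example',
  data: { __dormouse: 'leader' },
});
releaseExample();
```

The check is keyed on `event.origin` rather than on message contents because the shim posts with a `'*'` target and any frame can forge a `__dormouse` payload; only the origin is supplied by the browser. The ref-count is a design guess for the case where several surfaces share one proxy origin; if each surface always gets its own port, a plain `Set` would do.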
diff --git a/lib/src/lib/platform/iframe-proxy-types.ts b/lib/src/lib/platform/iframe-proxy-types.ts index cee410e4..93d09800 100644 --- a/lib/src/lib/platform/iframe-proxy-types.ts +++ b/lib/src/lib/platform/iframe-proxy-types.ts @@ -1,6 +1,6 @@ /** * Result of asking the host to front a `dor iframe` target with its transparent - * proxy (docs/specs/dor-iframe.md → "The Transparent Proxy"). On `ok` the panel + * proxy (docs/specs/dor-browser.md → "Iframe Renderer"). On `ok` the panel * points the `<iframe>` at `url` — a loopback proxy origin that fetches the * target, strips frame-blocking headers (loopback only), and injects the * Dormouse shim. On failure `reason` says why there is nothing to frame: diff --git a/lib/src/lib/platform/types.ts b/lib/src/lib/platform/types.ts index 21a74232..bf1636dc 100644 --- a/lib/src/lib/platform/types.ts +++ b/lib/src/lib/platform/types.ts @@ -47,8 +47,9 @@ export interface AgentBrowserCommandResult { * channel for tab actions, screen-mode resizing (`set viewport` / `set * device`), HiDPI frame capture (`screenshot`), navigation (`open <url>`, * `reload` / `back` / `forward`), and session teardown, not a general exec - * path. */ -export const AGENT_BROWSER_ALLOWED_SUBCOMMANDS = ['tab', 'set', 'screenshot', 'open', 'reload', 'back', 'forward', 'close'] as const; + * path. `get` is limited host-side to `get cdp-url` for CDP event + * subscription while a browser is popped out. */ +export const AGENT_BROWSER_ALLOWED_SUBCOMMANDS = ['tab', 'set', 'screenshot', 'open', 'reload', 'back', 'forward', 'close', 'get'] as const; export interface AgentBrowserScreenshotResult { ok: boolean; @@ -62,7 +63,7 @@ export interface AgentBrowserScreenshotResult { /** Native editing operations that the stream's input_keyboard path cannot * trigger on macOS (CDP drops the `commands` field — see - * docs/specs/dor-agent-browser.md and the upstream issue). The host owns the + * docs/specs/dor-browser.md and the upstream issue). 
The host owns the * exact JS for each; the webview only picks one of these names, so this stays * a purpose-built channel rather than an arbitrary-eval one. */ export type AgentBrowserEditOp = 'selectAll' | 'copy' | 'cut'; @@ -76,6 +77,40 @@ export interface AgentBrowserEditResult { export type { IframeProxyResult }; +/** Result of asking the host for the current stream status of an existing + * session. Used to recover persisted panels whose saved wsPort went stale + * across VS Code/webview reloads without exposing a generic `stream` exec + * channel to the webview. */ +export interface AgentBrowserStreamStatusResult { + ok: boolean; + wsPort?: number; + error?: string; +} + +/** Result of spawning a managed agent-browser session for a render swap + * (docs/specs/dor-browser.md → "Display Modal And Render Swaps"). */ +export interface AgentBrowserOpenResult { + ok: boolean; + /** The resolved/namespaced session name the new surface should bind to. */ + session?: string; + /** The session's stream WebSocket port. */ + wsPort?: number; + /** The binary path the host resolved, threaded back so later host commands + * (close, screenshot…) reuse it. */ + binaryPath?: string; + error?: string; +} + +/** Result of a headed/headless relaunch (docs/specs/dor-browser.md → + * "Pop-Out"). The Chrome process is replaced, so the stream port + * changes; the session name is preserved. */ +export interface AgentBrowserPopResult { + ok: boolean; + /** The new stream WebSocket port after the relaunch. */ + wsPort?: number; + error?: string; +} + export interface PlatformAdapter { // Lifecycle init(): Promise<void>; @@ -113,7 +148,7 @@ export interface PlatformAdapter { // VS Code-only escape hatch for mirrored workbench shortcuts from webviews. runWorkbenchCommand?(command: VSCodeWorkbenchCommand): void; - // agent-browser surface support (see docs/specs/dor-agent-browser.md). + // agent-browser surface support (see docs/specs/dor-browser.md). 
// Runs the user's agent-browser binary against a session; the host validates // args[0] against AGENT_BROWSER_ALLOWED_SUBCOMMANDS. `binaryPath` is the // absolute path resolved by `dor ab` in the invoking terminal — the host's @@ -132,18 +167,44 @@ export interface PlatformAdapter { // signals. Absent on hosts that can't run the binary (degrades to rendering // the screencast frames directly). agentBrowserScreenshot?(session: string, opts: { format?: 'jpeg' | 'png'; quality?: number }, binaryPath?: string): Promise<AgentBrowserScreenshotResult>; + // Reads the current stream port for an already-running session. This is a + // purpose-built status channel, not part of agentBrowserCommand's allowlist, + // so restored panels can recover from a stale persisted wsPort after reload. + agentBrowserStreamStatus?(session: string, binaryPath?: string): Promise<AgentBrowserStreamStatusResult>; // The WebSocket URL for a session's stream port. Hosts whose webview origin // the agent-browser stream server rejects (VS Code) return a tokenized relay // URL; absent or null falls back to ws://127.0.0.1:<port>. getAgentBrowserStreamUrl?(port: number): Promise<string | null>; - // iframe surface support (see docs/specs/dor-iframe.md → "The Transparent - // Proxy"). Stands up a loopback proxy in front of a `dor iframe` target and + // iframe surface support (see docs/specs/dor-browser.md → "The transparent + // proxy"). Stands up a loopback proxy in front of a `dor iframe` target and // returns the proxy URL the panel should frame, or a structured reason it // could not. Absent on hosts with no process to run a proxy (e.g. the web // host), where the panel falls back to a raw, uninstrumented `<iframe>`. createIframeProxyUrl?(targetUrl: string): Promise<IframeProxyResult>; + // Render-swap support (docs/specs/dor-browser.md → "Display Modal And Render Swaps"; + // docs/specs/dor-browser.md → "Pop-Out"). 
All optional + // so hosts degrade: the modal hides whatever isn't backed by a capability. + // + // Spawn a managed agent-browser session and open <url> — backs swapping an + // iframe embed up to a live screencast (`headed: false`) or straight to a + // popped-out window (`headed: true`, so embed→popout is one spawn, not a + // headless launch immediately torn down). `binaryPath` is the last one a + // `dor ab` surface resolved (a GUI-launched host's own PATH may miss the + // binary); the host falls back to PATH / DORMOUSE_AGENT_BROWSER_BIN. + agentBrowserOpen?(url: string, opts: { headed?: boolean }, binaryPath?: string): Promise<AgentBrowserOpenResult>; + // Relaunch a session's browser headed as a native OS window, reopening `url` + // (headed/headless is fixed at launch, so this is a close+relaunch — v1 + // preserves the active tab URL). Best-effort positioned over `rect` (CSS px + // in screen space). Returns the new stream port. Absent ⇒ pop-out hidden. + agentBrowserPopOut?(session: string, opts: { rect?: { x: number; y: number; width: number; height: number }; url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult>; + // Relaunch headless (pop back in) reopening `url`, resuming the screencast; + // returns the new stream port. Pairs with agentBrowserPopOut. + agentBrowserPopIn?(session: string, opts: { url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult>; + // Best-effort raise the session's headed window to the front. 
+ agentBrowserBringToFront?(session: string, binaryPath?: string): Promise<void>; + // PTY event listeners onPtyData(handler: (detail: { id: string; data: string }) => void): void; offPtyData(handler: (detail: { id: string; data: string }) => void): void; diff --git a/lib/src/lib/platform/vscode-adapter.ts b/lib/src/lib/platform/vscode-adapter.ts index 1dc7ecc4..6a4d6900 100644 --- a/lib/src/lib/platform/vscode-adapter.ts +++ b/lib/src/lib/platform/vscode-adapter.ts @@ -1,4 +1,4 @@ -import type { AgentBrowserCommandResult, AgentBrowserEditOp, AgentBrowserEditResult, AgentBrowserScreenshotResult, AlertStateDetail, IframeProxyResult, OpenPort, PlatformAdapter, PtyInfo } from './types'; +import type { AgentBrowserCommandResult, AgentBrowserEditOp, AgentBrowserEditResult, AgentBrowserOpenResult, AgentBrowserPopResult, AgentBrowserScreenshotResult, AgentBrowserStreamStatusResult, AlertStateDetail, IframeProxyResult, OpenPort, PlatformAdapter, PtyInfo } from './types'; import { OPEN_PORT_TIMEOUT_MS } from './types'; import { setDefaultShellOpts } from '../shell-defaults'; import { @@ -31,7 +31,11 @@ export class VSCodeAdapter implements PlatformAdapter { this.agentBrowserCommand = this.agentBrowserCommand.bind(this); this.agentBrowserEdit = this.agentBrowserEdit.bind(this); this.agentBrowserScreenshot = this.agentBrowserScreenshot.bind(this); + this.agentBrowserStreamStatus = this.agentBrowserStreamStatus.bind(this); this.getAgentBrowserStreamUrl = this.getAgentBrowserStreamUrl.bind(this); + this.agentBrowserOpen = this.agentBrowserOpen.bind(this); + this.agentBrowserPopOut = this.agentBrowserPopOut.bind(this); + this.agentBrowserPopIn = this.agentBrowserPopIn.bind(this); this.createIframeProxyUrl = this.createIframeProxyUrl.bind(this); // Seed the default shell from the extension-injected global so that @@ -253,6 +257,16 @@ export class VSCodeAdapter implements PlatformAdapter { return result ?? 
{ ok: false, error: 'agent-browser screenshot timed out' }; } + async agentBrowserStreamStatus(session: string, binaryPath?: string): Promise<AgentBrowserStreamStatusResult> { + const result = await this.requestResponse<AgentBrowserStreamStatusResult>( + 'agentBrowser:streamStatus', 'agentBrowser:streamStatusResult', + { session, binaryPath }, + (msg) => ({ ok: msg.ok, wsPort: msg.wsPort, error: msg.error }), + 5000, + ); + return result ?? { ok: false, error: 'agent-browser stream status timed out' }; + } + getAgentBrowserStreamUrl(port: number): Promise<string | null> { // The agent-browser stream server rejects vscode-webview:// origins, so // the extension host relays the stream (see agent-browser-host.ts). @@ -263,6 +277,33 @@ export class VSCodeAdapter implements PlatformAdapter { ); } + async agentBrowserOpen(url: string, opts: { headed?: boolean }, binaryPath?: string): Promise<AgentBrowserOpenResult> { + const result = await this.requestResponse<AgentBrowserOpenResult>( + 'agentBrowser:open', 'agentBrowser:openResult', { url, headed: opts.headed, binaryPath }, + (msg) => ({ ok: msg.ok, session: msg.session, wsPort: msg.wsPort, binaryPath: msg.binaryPath, error: msg.error }), + 15000, + ); + return result ?? { ok: false, error: 'agent-browser open timed out' }; + } + + async agentBrowserPopOut(session: string, opts: { rect?: { x: number; y: number; width: number; height: number }; url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult> { + const result = await this.requestResponse<AgentBrowserPopResult>( + 'agentBrowser:popOut', 'agentBrowser:popResult', { session, url: opts.url, rect: opts.rect, binaryPath }, + (msg) => ({ ok: msg.ok, wsPort: msg.wsPort, error: msg.error }), + 15000, + ); + return result ?? 
{ ok: false, error: 'agent-browser pop-out timed out' }; + } + + async agentBrowserPopIn(session: string, opts: { url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult> { + const result = await this.requestResponse<AgentBrowserPopResult>( + 'agentBrowser:popIn', 'agentBrowser:popResult', { session, url: opts.url, binaryPath }, + (msg) => ({ ok: msg.ok, wsPort: msg.wsPort, error: msg.error }), + 15000, + ); + return result ?? { ok: false, error: 'agent-browser pop-in timed out' }; + } + async createIframeProxyUrl(url: string): Promise<IframeProxyResult> { // The extension host stands up the loopback proxy and serves the bytes (see // iframe-proxy-host.ts). On timeout, report unreachable so the panel shows a diff --git a/lib/src/lib/terminal-lifecycle.ts b/lib/src/lib/terminal-lifecycle.ts index 3714182a..3d0186c0 100644 --- a/lib/src/lib/terminal-lifecycle.ts +++ b/lib/src/lib/terminal-lifecycle.ts @@ -514,8 +514,8 @@ export function markSessionTouched(id: string): void { /** * A non-terminal content surface's focus contract, so `focusSession` can drive * it like any xterm pane. The iframe surface registers one whose `focus` moves - * keyboard focus into the instrumented frame (docs/specs/dor-iframe.md → "#3 — - * the surface registers a focus handle"). + * keyboard focus into the instrumented frame (docs/specs/dor-browser.md → + * "Iframe Focus And Rendering Notes"). 
*/ export interface SurfaceFocusHandle { focus(): void; diff --git a/lib/src/stories/AgentBrowserScreenModal.stories.tsx b/lib/src/stories/AgentBrowserScreenModal.stories.tsx index c53b09ee..8ec994aa 100644 --- a/lib/src/stories/AgentBrowserScreenModal.stories.tsx +++ b/lib/src/stories/AgentBrowserScreenModal.stories.tsx @@ -1,9 +1,13 @@ import { useMemo } from 'react'; import type { Meta, StoryObj } from '@storybook/react'; import { AgentBrowserScreenModal } from '../components/wall/AgentBrowserScreenModal'; -import type { ScreenController, ScreenSnapshot, ScreenState } from '../components/wall/agent-browser-screen'; +import type { RenderMode, ScreenController, ScreenSnapshot, ScreenState } from '../components/wall/agent-browser-screen'; interface StoryArgs { + /** Render backend — `embed` greys out the Screen (viewport) section. */ + renderMode: RenderMode; + /** Whether the host can pop out (gates the "Pop out to window" button). */ + canPopOut: boolean; state: ScreenState; /** Browser CSS viewport + inferred DPR. 
*/ vpW: number; @@ -26,6 +30,7 @@ function useMockController(args: StoryArgs): ScreenController { return useMemo<ScreenController>(() => { const snapshot: ScreenSnapshot = { state: args.state, + renderMode: args.renderMode, viewport: { w: args.vpW, h: args.vpH, dpr: args.vpDpr }, paneCss: { w: args.paneW, h: args.paneH }, displayDpr: args.displayDpr, @@ -49,14 +54,16 @@ function useMockController(args: StoryArgs): ScreenController { reload: () => console.log('[story] reload'), }, hostCapable: args.hostCapable, + canPopOut: args.canPopOut, actions: { engageSync: () => console.log('[story] engageSync'), applyDevice: (name) => console.log('[story] applyDevice', name), applyViewport: (w, h, dpr) => console.log('[story] applyViewport', w, h, dpr), openModal: () => {}, + setRenderMode: (mode) => console.log('[story] setRenderMode', mode), }, }; - }, [args.state, args.vpW, args.vpH, args.vpDpr, args.paneW, args.paneH, args.displayDpr, args.syncEngaged, args.hostCapable]); + }, [args.state, args.renderMode, args.vpW, args.vpH, args.vpDpr, args.paneW, args.paneH, args.displayDpr, args.syncEngaged, args.hostCapable, args.canPopOut]); } function AgentBrowserScreenModalStory(args: StoryArgs) { @@ -73,6 +80,8 @@ const meta: Meta<typeof AgentBrowserScreenModalStory> = { title: 'Modals/AgentBrowserScreenModal', component: AgentBrowserScreenModalStory, argTypes: { + renderMode: { control: 'inline-radio', options: ['ab-screencast', 'ab-popout', 'iframe'] }, + canPopOut: { control: 'boolean' }, state: { control: 'inline-radio', options: ['SYNCED', 'SCALED'] }, vpW: { control: 'number' }, vpH: { control: 'number' }, @@ -83,6 +92,12 @@ const meta: Meta<typeof AgentBrowserScreenModalStory> = { syncEngaged: { control: 'boolean' }, hostCapable: { control: 'boolean' }, }, + // Defaults shared by every story (each story overrides the viewport knobs); + // a swap-capable, pop-out-capable surface so both new affordances show. 
+ args: { + renderMode: 'ab-screencast', + canPopOut: true, + }, }; export default meta; @@ -139,3 +154,47 @@ export const HostIncapable: Story = { hostCapable: false, }, }; + +// Pop-out render mode: same agent-browser as a native OS window. The Render +// section pre-selects Pop-out and the Screen section greys out (the window owns +// its own size). +export const Popout: Story = { + args: { + renderMode: 'ab-popout', + state: 'SYNCED', + vpW: 980, vpH: 560, vpDpr: 2, + paneW: 980, paneH: 560, + displayDpr: 2, + syncEngaged: true, + hostCapable: true, + }, +}; + +// Embed (iframe) render mode: the Render section pre-selects Embed and the +// Screen (viewport) section greys out — the iframe renders at the pane size, so +// there's nothing to set. +export const EmbedRender: Story = { + args: { + renderMode: 'iframe', + state: 'SYNCED', + vpW: 980, vpH: 560, vpDpr: 2, + paneW: 980, paneH: 560, + displayDpr: 2, + syncEngaged: true, + hostCapable: true, + }, +}; + +// Host can't pop out (e.g. the web host) ⇒ the Render section drops the Pop-out +// option, leaving Screencast / Embed. 
+export const NoPopOut: Story = { + args: { + canPopOut: false, + state: 'SYNCED', + vpW: 980, vpH: 560, vpDpr: 2, + paneW: 980, paneH: 560, + displayDpr: 2, + syncEngaged: true, + hostCapable: true, + }, +}; diff --git a/lib/src/stories/BrowserChromeHeader.stories.tsx b/lib/src/stories/BrowserChromeHeader.stories.tsx index 9f6e1835..bc8744ab 100644 --- a/lib/src/stories/BrowserChromeHeader.stories.tsx +++ b/lib/src/stories/BrowserChromeHeader.stories.tsx @@ -12,6 +12,7 @@ import { SurfacePaneHeader } from '../components/wall/SurfacePaneHeader'; import { registerAgentBrowserScreen, type ChromeSnapshot, + type RenderMode, type ScreenRegistration, type ScreenSnapshot, type ScreenState, @@ -21,7 +22,7 @@ import { setDevServerResolution } from '../components/wall/agent-browser-ports'; /** * Playground for the agent-browser surface's browser-chrome header - * (docs/specs/dor-agent-browser.md → "Browser-Chrome Header"). + * (docs/specs/dor-browser.md → "Browser Chrome"). * * `SurfacePaneHeader` decides "this is a browser surface" purely from the * presence of a screen controller for its `api.id`, and reads URL / key from @@ -45,9 +46,13 @@ const loggingActions: WallActions = { onStartRename: () => {}, onFinishRename: () => ({ accepted: true }), onCancelRename: () => {}, + onSwapRenderMode: (id, mode) => console.log('[story] swap render', id, mode), }; interface StoryArgs { + /** Render backend — drives the far-left chip glyph: frame = embed, lock = + * screencast (closed when synced, open when scaled). */ + renderMode: RenderMode; /** Drives the SYNCED/SCALED chip + the modal it opens. */ state: ScreenState; /** Active tab URL — also the source of the host+path text and loopback port. */ @@ -74,11 +79,12 @@ function BrowserChromeStory(args: StoryArgs) { const screenSnapshot: ScreenSnapshot = useMemo(() => ({ state: args.state, + renderMode: args.renderMode, viewport: { w: 1280, h: 720, dpr: 1 }, paneCss: args.state === 'SYNCED' ? 
{ w: 1280, h: 720 } : { w: 980, h: 560 }, displayDpr: 2, syncEngaged: args.state === 'SYNCED', - }), [args.state]); + }), [args.state, args.renderMode]); const chromeSnapshot: ChromeSnapshot = useMemo(() => ({ url: args.url, @@ -98,6 +104,7 @@ function BrowserChromeStory(args: StoryArgs) { applyDevice: (name) => console.log('[story] applyDevice', name), applyViewport: (w, h, dpr) => console.log('[story] applyViewport', w, h, dpr), openModal: () => console.log('[story] openModal'), + setRenderMode: (mode) => console.log('[story] setRenderMode', mode), }, chromeActions: { navigate: (url) => console.log('[story] navigate', url), @@ -158,6 +165,7 @@ const meta: Meta<typeof BrowserChromeStory> = { title: 'Components/BrowserChromeHeader', component: BrowserChromeStory, argTypes: { + renderMode: { control: 'inline-radio', options: ['ab-screencast', 'ab-popout', 'iframe'] }, state: { control: 'radio', options: ['SYNCED', 'SCALED'] }, url: { control: 'text' }, htmlTitle: { control: 'text' }, @@ -168,6 +176,7 @@ const meta: Meta<typeof BrowserChromeStory> = { selected: { control: 'boolean' }, }, args: { + renderMode: 'ab-screencast', state: 'SYNCED', url: 'http://localhost:5173/app', htmlTitle: 'Vite + React', @@ -185,6 +194,20 @@ type Story = StoryObj<typeof BrowserChromeStory>; /** Everything on at once: key badge + URL + dev-server chip + nav. */ export const Playground: Story = {}; +/** Pop-out render mode — same agent-browser, relaunched as a native OS window; + * the far-left chip becomes the open-window glyph. (The pane body is a stub + * while the window is up, but the header chrome stays live.) */ +export const Popout: Story = { + args: { renderMode: 'ab-popout' }, +}; + +/** Embed (iframe) render mode — the unified chrome is identical to screencast, + * but the far-left chip becomes the frame-corners glyph. Same URL/nav/dev-server + * header; only the chip + body renderer differ. 
*/ +export const Embed: Story = { + args: { renderMode: 'iframe' }, +}; + /** Letterboxed viewport — the chip reads SCALED (click it for the modal). */ export const Scaled: Story = { args: { state: 'SCALED' }, diff --git a/lib/src/stories/MouseHeaderIcon.stories.tsx b/lib/src/stories/MouseHeaderIcon.stories.tsx index 73b90674..ed87c2db 100644 --- a/lib/src/stories/MouseHeaderIcon.stories.tsx +++ b/lib/src/stories/MouseHeaderIcon.stories.tsx @@ -32,6 +32,7 @@ const noopActions: WallActions = { onStartRename: () => {}, onFinishRename: () => ({ accepted: true }), onCancelRename: () => {}, + onSwapRenderMode: () => {}, }; function MouseIconStoryFrame({ diff --git a/lib/src/stories/ShellCwd.stories.tsx b/lib/src/stories/ShellCwd.stories.tsx index 19a88e36..e504459c 100644 --- a/lib/src/stories/ShellCwd.stories.tsx +++ b/lib/src/stories/ShellCwd.stories.tsx @@ -52,6 +52,7 @@ const noopActions: WallActions = { onStartRename: () => {}, onFinishRename: () => ({ accepted: true }), onCancelRename: () => {}, + onSwapRenderMode: () => {}, }; const meta: Meta<typeof ShellCwdMatrix> = { diff --git a/lib/src/stories/TerminalPaneHeader.stories.tsx b/lib/src/stories/TerminalPaneHeader.stories.tsx index e6baf71c..26f93981 100644 --- a/lib/src/stories/TerminalPaneHeader.stories.tsx +++ b/lib/src/stories/TerminalPaneHeader.stories.tsx @@ -26,6 +26,7 @@ const noopActions: WallActions = { onStartRename: () => {}, onFinishRename: () => ({ accepted: true }), onCancelRename: () => {}, + onSwapRenderMode: () => {}, }; function actionsRejecting(reason: 'empty' | 'reserved'): WallActions { diff --git a/lib/tsconfig.app.json b/lib/tsconfig.app.json index 2c282eef..cf3a4c0b 100644 --- a/lib/tsconfig.app.json +++ b/lib/tsconfig.app.json @@ -23,5 +23,5 @@ "include": ["src"], // The pure rewrite helpers typecheck here (DOM provides URL); the Node proxy // server is esbuild-only (bundled per host), like the rest of our host code. 
- "exclude": ["src/**/*.test.ts", "src/**/*.test.tsx", "src/host/iframe-proxy.ts"] + "exclude": ["src/**/*.test.ts", "src/**/*.test.tsx", "src/host/iframe-proxy.ts", "src/host/agent-browser-host.ts"] } diff --git a/package.json b/package.json index 67b74b44..44c71933 100644 --- a/package.json +++ b/package.json @@ -15,6 +15,7 @@ "test": "pnpm -r run test", "dev:lib": "pnpm --filter dormouse-lib dev", "dev:standalone": "pnpm --filter dormouse-standalone tauri dev", + "dev:standalone:ab": "pnpm --filter dormouse-standalone dev:agent-browser", "dev:website": "pnpm --filter dormouse-website dev", "build:vscode": "pnpm --filter dormouse-lib build && pnpm --filter dormouse build:frontend && pnpm --filter dormouse build", "build:standalone": "pnpm --filter dormouse-standalone tauri build", diff --git a/standalone/package.json b/standalone/package.json index fe0890b7..7f217797 100644 --- a/standalone/package.json +++ b/standalone/package.json @@ -6,6 +6,7 @@ "type": "module", "scripts": { "dev": "vite", + "dev:agent-browser": "pnpm stage && node scripts/dev-agent-browser.mjs", "build": "pnpm stage && tsc -b && vite build", "stage": "pnpm stage:dor-cli && pnpm stage:sidecar-proxy", "stage:dor-cli": "pnpm --filter dor build && node scripts/stage-dor-cli.mjs", diff --git a/standalone/scripts/build-sidecar-proxy.mjs b/standalone/scripts/build-sidecar-proxy.mjs index bbb33a03..9ded58a9 100644 --- a/standalone/scripts/build-sidecar-proxy.mjs +++ b/standalone/scripts/build-sidecar-proxy.mjs @@ -1,23 +1,32 @@ -// Bundle the host-agnostic iframe proxy (lib/src/host/iframe-proxy.ts, shared -// with the VS Code extension host) into a CommonJS file the Node sidecar can -// require. Keeps the proxy as a single TypeScript source while the sidecar -// itself stays plain CJS. See docs/specs/dor-iframe.md → "The Transparent Proxy". +// Bundle the host-agnostic host modules (shared with the VS Code extension host) +// into CommonJS files the Node sidecar can require. 
Keeps each as a single +// TypeScript source while the sidecar itself stays plain CJS. +// - lib/src/host/iframe-proxy.ts → sidecar/iframe-proxy.cjs +// - lib/src/host/agent-browser-host.ts → sidecar/agent-browser-host.cjs +// See docs/specs/dor-browser.md. import { build } from 'esbuild'; import { fileURLToPath } from 'node:url'; import path from 'node:path'; const here = path.dirname(fileURLToPath(import.meta.url)); -const entry = path.resolve(here, '../../lib/src/host/iframe-proxy.ts'); -const outfile = path.resolve(here, '../sidecar/iframe-proxy.cjs'); +const libHost = path.resolve(here, '../../lib/src/host'); +const sidecar = path.resolve(here, '../sidecar'); -await build({ - entryPoints: [entry], - outfile, - bundle: true, - platform: 'node', // node builtins (http/net) stay external automatically - format: 'cjs', - target: 'node22', - logLevel: 'warning', -}); +const bundles = [ + { entry: 'iframe-proxy.ts', out: 'iframe-proxy.cjs' }, + { entry: 'agent-browser-host.ts', out: 'agent-browser-host.cjs' }, +]; -console.log(`[sidecar] built ${path.relative(process.cwd(), outfile)}`); +for (const { entry, out } of bundles) { + const outfile = path.resolve(sidecar, out); + await build({ + entryPoints: [path.resolve(libHost, entry)], + outfile, + bundle: true, + platform: 'node', // node builtins (http/net/fs/child_process) stay external + format: 'cjs', + target: 'node22', + logLevel: 'warning', + }); + console.log(`[sidecar] built ${path.relative(process.cwd(), outfile)}`); +} diff --git a/standalone/scripts/dev-agent-browser.mjs b/standalone/scripts/dev-agent-browser.mjs new file mode 100644 index 00000000..d37d06f5 --- /dev/null +++ b/standalone/scripts/dev-agent-browser.mjs @@ -0,0 +1,282 @@ +#!/usr/bin/env node +import http from 'node:http'; +import net from 'node:net'; +import os from 'node:os'; +import path from 'node:path'; +import { fileURLToPath } from 'node:url'; +import { spawn } from 'node:child_process'; +import { createInterface } from 
'node:readline'; + +const __dirname = path.dirname(fileURLToPath(import.meta.url)); +const standaloneDir = path.resolve(__dirname, '..'); +const repoRoot = path.resolve(standaloneDir, '..'); +const sidecarDir = path.join(standaloneDir, 'sidecar'); +const sidecarScript = path.join(sidecarDir, 'main.js'); +const dorBinDir = path.join(sidecarDir, 'dor-cli', 'bin'); +const dorEntrypoint = path.join(sidecarDir, 'dor-cli', 'dist', 'dor.js'); +const hostPort = Number(process.env.DORMOUSE_BROWSER_DEV_HOST_PORT || 1422); +const vitePort = Number(process.env.DORMOUSE_BROWSER_DEV_VITE_PORT || 1420); +const browserSession = process.env.DORMOUSE_BROWSER_DEV_AB_SESSION || 'dormouse-dev-standalone'; +const controlSocket = path.join(os.tmpdir(), `dormouse-${process.pid}-browser-dor.sock`); +const controlToken = Math.random().toString(36).slice(2); + +const pending = new Map(); +const sseClients = new Set(); +let sidecar; +let vite; +let shuttingDown = false; +let requestSeq = 0; + +function log(message) { + console.error(`[dev:standalone:ab] ${message}`); +} + +function sendSse(res, event, data) { + const payload = JSON.stringify(data); + res.write(`event: ${event}\n`); + for (const line of payload.split(/\r?\n/)) res.write(`data: ${line}\n`); + res.write('\n'); +} + +function broadcast(event, data) { + for (const client of sseClients) sendSse(client, event, data); +} + +function writeSidecar(event, data = {}) { + sidecar?.stdin?.write(`${JSON.stringify({ event, data })}\n`); +} + +function requestSidecar(event, data, responseEvent, pick, timeoutMs = 10000) { + const requestId = `dev-${++requestSeq}`; + return new Promise((resolve, reject) => { + const timer = setTimeout(() => { + pending.delete(requestId); + reject(new Error(`${event} timed out`)); + }, timeoutMs); + pending.set(requestId, { + responseEvent, + resolve: (payload) => { + clearTimeout(timer); + resolve(pick(payload)); + }, + reject: (err) => { + clearTimeout(timer); + reject(err); + }, + }); + writeSidecar(event, { 
...data, requestId }); + }); +} + +const fireAndForget = { + pty_spawn: ({ id, options }) => writeSidecar('pty:spawn', { id, options }), + pty_write: ({ id, data }) => writeSidecar('pty:input', { id, data }), + pty_resize: ({ id, cols, rows }) => writeSidecar('pty:resize', { id, cols, rows }), + pty_kill: ({ id }) => writeSidecar('pty:kill', { id }), + pty_request_init: () => writeSidecar('pty:requestInit'), + dor_control_response: ({ response }) => writeSidecar('dor:controlResponse', response), + kill_sidecar_now: () => shutdown(), +}; + +const invokeMap = { + get_available_shells: (_args) => requestSidecar('pty:getShells', {}, 'pty:shells', (data) => data.shells ?? []), + pty_get_cwd: ({ id }) => requestSidecar('pty:getCwd', { id }, 'pty:cwd', (data) => data.cwd ?? null), + pty_get_open_ports: ({ id }) => requestSidecar('pty:getOpenPorts', { id }, 'pty:openPorts', (data) => data.ports ?? []), + pty_get_scrollback: ({ id }) => requestSidecar('pty:getScrollback', { id }, 'pty:scrollback', (data) => data.data ?? null), + read_clipboard_file_paths: () => requestSidecar('clipboard:readFiles', {}, 'clipboard:files', (data) => data.paths ?? null), + read_clipboard_image_as_file_path: () => requestSidecar('clipboard:readImage', {}, 'clipboard:image', (data) => data.path ?? null), + read_clipboard_text: () => requestSidecar('clipboard:readText', {}, 'clipboard:text', (data) => data.text ?? 
null), + iframe_create_proxy_url: ({ target }) => requestSidecar('iframe:createProxyUrl', { target }, 'iframe:proxyUrl', (data) => data.result), + agent_browser_command: ({ session, args, binaryPath }) => requestSidecar('agentBrowser:command', { session, args, binaryPath }, 'agentBrowser:result', (data) => data.result, 30000), + agent_browser_edit: ({ session, op, binaryPath }) => requestSidecar('agentBrowser:edit', { session, op, binaryPath }, 'agentBrowser:result', (data) => data.result, 30000), + agent_browser_screenshot: ({ session, format, quality, binaryPath }) => requestSidecar('agentBrowser:screenshot', { session, format, quality, binaryPath }, 'agentBrowser:result', (data) => data.result, 30000), + agent_browser_stream_status: ({ session, binaryPath }) => requestSidecar('agentBrowser:streamStatus', { session, binaryPath }, 'agentBrowser:result', (data) => data.result, 30000), + agent_browser_open: ({ url, headed, binaryPath }) => requestSidecar('agentBrowser:open', { url, headed, binaryPath }, 'agentBrowser:result', (data) => data.result, 30000), + agent_browser_pop_out: ({ session, url, rect, binaryPath }) => requestSidecar('agentBrowser:popOut', { session, url, rect, binaryPath }, 'agentBrowser:result', (data) => data.result, 30000), + agent_browser_pop_in: ({ session, url, binaryPath }) => requestSidecar('agentBrowser:popIn', { session, url, binaryPath }, 'agentBrowser:result', (data) => data.result, 30000), +}; + +async function readJson(req) { + const chunks = []; + for await (const chunk of req) chunks.push(chunk); + if (chunks.length === 0) return {}; + return JSON.parse(Buffer.concat(chunks).toString('utf8')); +} + +function cors(res) { + res.setHeader('access-control-allow-origin', '*'); + res.setHeader('access-control-allow-methods', 'GET,POST,OPTIONS'); + res.setHeader('access-control-allow-headers', 'content-type'); +} + +function startHostServer() { + const server = http.createServer(async (req, res) => { + cors(res); + if (req.method === 
'OPTIONS') { + res.writeHead(204).end(); + return; + } + try { + const url = new URL(req.url || '/', `http://${req.headers.host}`); + if (req.method === 'GET' && url.pathname === '/__dormouse_dev_host/events') { + res.writeHead(200, { + 'content-type': 'text/event-stream', + 'cache-control': 'no-cache', + connection: 'keep-alive', + 'access-control-allow-origin': '*', + }); + sseClients.add(res); + sendSse(res, 'sidecar', { event: 'dev:connected', data: { pid: process.pid } }); + req.on('close', () => sseClients.delete(res)); + return; + } + if (req.method === 'POST' && url.pathname === '/__dormouse_dev_host/send') { + const { cmd, args } = await readJson(req); + const fn = fireAndForget[cmd]; + if (!fn) throw new Error(`unknown send command ${cmd}`); + fn(args || {}); + res.writeHead(200, { 'content-type': 'application/json' }).end(JSON.stringify({ ok: true })); + return; + } + if (req.method === 'POST' && url.pathname === '/__dormouse_dev_host/invoke') { + const { cmd, args } = await readJson(req); + const fn = invokeMap[cmd]; + if (!fn) throw new Error(`unknown invoke command ${cmd}`); + const result = await fn(args || {}); + res.writeHead(200, { 'content-type': 'application/json' }).end(JSON.stringify({ ok: true, result })); + return; + } + if (req.method === 'POST' && url.pathname === '/__dormouse_dev_host/console') { + const { level, args } = await readJson(req); + console.error(`[browser ${level || 'log'}] ${(args || []).join(' ')}`); + res.writeHead(204).end(); + return; + } + res.writeHead(404).end('not found'); + } catch (err) { + res.writeHead(500, { 'content-type': 'text/plain' }).end(err instanceof Error ? 
err.message : String(err)); + } + }); + return new Promise((resolve, reject) => { + server.once('error', reject); + server.listen(hostPort, '127.0.0.1', () => { + server.off('error', reject); + resolve(server); + }); + }); +} + +function startSidecar() { + sidecar = spawn(process.execPath, [sidecarScript], { + cwd: sidecarDir, + stdio: ['pipe', 'pipe', 'pipe'], + env: { + ...process.env, + DORMOUSE_NODE: process.execPath, + DORMOUSE_CLI_BIN: dorBinDir, + DORMOUSE_CLI_JS: dorEntrypoint, + DORMOUSE_CONTROL_SOCKET: controlSocket, + DORMOUSE_CONTROL_TOKEN: controlToken, + }, + }); + log(`sidecar pid=${sidecar.pid}`); + log(`dor control socket: ${controlSocket}`); + + createInterface({ input: sidecar.stdout }).on('line', (line) => { + let msg; + try { + msg = JSON.parse(line); + } catch { + console.error(`[sidecar stdout] ${line}`); + return; + } + const event = msg.event; + const data = msg.data ?? null; + const requestId = data && typeof data.requestId === 'string' ? data.requestId : null; + if (requestId) { + const pendingRequest = pending.get(requestId); + if (pendingRequest && pendingRequest.responseEvent === event) { + pending.delete(requestId); + if (typeof data.error === 'string') pendingRequest.reject(new Error(data.error)); + else pendingRequest.resolve(data); + return; + } + } + broadcast('sidecar', { event, data }); + }); + createInterface({ input: sidecar.stderr }).on('line', (line) => console.error(`[sidecar] ${line}`)); + sidecar.on('exit', (code, signal) => { + log(`sidecar exited code=${code} signal=${signal}`); + for (const request of pending.values()) request.reject(new Error('sidecar exited')); + pending.clear(); + shutdown(); + }); +} + +function startVite() { + vite = spawn('pnpm', ['--filter', 'dormouse-standalone', 'dev'], { + cwd: repoRoot, + stdio: ['ignore', 'pipe', 'pipe'], + env: { + ...process.env, + VITE_DORMOUSE_BROWSER_DEV_HOST: `http://127.0.0.1:${hostPort}`, + DORMOUSE_BROWSER_DEV_VITE_PORT: String(vitePort), + }, + }); + 
createInterface({ input: vite.stdout }).on('line', (line) => console.error(`[vite] ${line}`)); + createInterface({ input: vite.stderr }).on('line', (line) => console.error(`[vite] ${line}`)); + vite.on('exit', (code, signal) => { + log(`vite exited code=${code} signal=${signal}`); + shutdown(); + }); +} + +async function waitForVite() { + const deadline = Date.now() + 30000; + while (Date.now() < deadline) { + try { + await new Promise((resolve, reject) => { + const socket = net.connect(vitePort, 'localhost', resolve); + socket.once('error', reject); + socket.once('connect', () => socket.end()); + }); + return; + } catch { + await new Promise((resolve) => setTimeout(resolve, 250)); + } + } + throw new Error(`vite did not open port ${vitePort}`); +} + +async function openAgentBrowser() { + const args = ['--session', browserSession]; + if (process.env.DORMOUSE_BROWSER_DEV_HEADED === '1') args.push('--headed'); + args.push('open', `http://localhost:${vitePort}`); + const child = spawn('agent-browser', args, { cwd: repoRoot, stdio: ['ignore', 'pipe', 'pipe'] }); + createInterface({ input: child.stdout }).on('line', (line) => console.error(`[agent-browser] ${line}`)); + createInterface({ input: child.stderr }).on('line', (line) => console.error(`[agent-browser] ${line}`)); + await new Promise((resolve) => child.on('exit', resolve)); + log(`agent-browser session: ${browserSession}`); + log(`try: agent-browser --session ${browserSession} snapshot -i`); +} + +async function shutdown() { + if (shuttingDown) return; + shuttingDown = true; + for (const client of sseClients) client.end(); + sseClients.clear(); + if (vite && !vite.killed) vite.kill('SIGTERM'); + if (sidecar && !sidecar.killed) sidecar.kill('SIGTERM'); + setTimeout(() => process.exit(0), 250).unref(); +} + +process.on('SIGINT', shutdown); +process.on('SIGTERM', shutdown); + +log(`starting browser dev host on http://127.0.0.1:${hostPort}`); +await startHostServer(); +startSidecar(); +startVite(); +await 
waitForVite(); +await openAgentBrowser(); +log('running; Ctrl-C to stop'); diff --git a/standalone/sidecar/clipboard-ops.js b/standalone/sidecar/clipboard-ops.js index c534af2d..fbe868ec 100644 --- a/standalone/sidecar/clipboard-ops.js +++ b/standalone/sidecar/clipboard-ops.js @@ -2,7 +2,7 @@ const fs = require('node:fs'); const os = require('node:os'); const path = require('node:path'); const crypto = require('node:crypto'); -const { execFile } = require('node:child_process'); +const { execFile, spawn } = require('node:child_process'); const { promisify } = require('node:util'); const execFileP = promisify(execFile); @@ -260,10 +260,56 @@ async function readClipboardImageAsFilePath(runtime = {}) { return null; } +// Write text to the OS clipboard via the platform's native CLI (stdin), mirroring +// the native reads above. Backs the agent-browser host's edit channel: copy/cut +// land the grabbed selection on the user's real clipboard (the VS Code host uses +// vscode.env.clipboard.writeText for the same purpose). +function writeViaStdin(cmd, args, text, runtime) { + const spawnFn = runtime.spawn || spawn; + return new Promise((resolve, reject) => { + let child; + try { + child = spawnFn(cmd, args, { stdio: ['pipe', 'ignore', 'ignore'] }); + } catch (err) { + reject(err); + return; + } + child.on('error', reject); + child.on('close', (code) => { + if (code === 0) resolve(); + else reject(new Error(`${cmd} exited ${code}`)); + }); + // The tool may exit before we finish writing (EPIPE) — swallow it; `close` + // with a nonzero code already reports a real failure. 
+ if (child.stdin) { + child.stdin.on('error', () => {}); + child.stdin.end(text); + } + }); +} + +async function writeClipboardText(text, runtime = {}) { + const platform = runtime.platform || process.platform; + if (platform === 'darwin') return writeViaStdin('pbcopy', [], text, runtime); + if (platform === 'win32') return writeViaStdin('clip', [], text, runtime); + const env = runtime.env || process.env; + const wayland = Boolean(env.WAYLAND_DISPLAY); + const attempts = wayland + ? [['wl-copy', []], ['xclip', ['-selection', 'clipboard']]] + : [['xclip', ['-selection', 'clipboard']], ['wl-copy', []]]; + let lastErr; + for (const [cmd, args] of attempts) { + try { return await writeViaStdin(cmd, args, text, runtime); } + catch (err) { lastErr = err; } + } + throw lastErr || new Error('no clipboard write tool available'); +} + module.exports = { readClipboardFilePaths, readClipboardImageAsFilePath, readClipboardText, + writeClipboardText, parseUriList, splitNonEmptyLines, }; diff --git a/standalone/sidecar/clipboard-ops.test.js b/standalone/sidecar/clipboard-ops.test.js index 4402a8cc..8e9b5666 100644 --- a/standalone/sidecar/clipboard-ops.test.js +++ b/standalone/sidecar/clipboard-ops.test.js @@ -6,10 +6,38 @@ const { readClipboardFilePaths, readClipboardImageAsFilePath, readClipboardText, + writeClipboardText, parseUriList, splitNonEmptyLines, } = require('./clipboard-ops'); +// A fake child_process.spawn for writeClipboardText: records (cmd, args) and the +// text written to stdin, then fires `close`/`error` asynchronously like a real +// process. `behavior(cmd)` chooses the outcome per command ({ code } | { error }). +function fakeSpawn(behavior) { + const calls = []; + const writes = []; + const spawn = (cmd, args) => { + calls.push([cmd, args]); + const handlers = {}; + return { + on(event, cb) { handlers[event] = cb; return this; }, + stdin: { + on() {}, + end(text) { + writes.push([cmd, text]); + const res = (behavior ? 
behavior(cmd) : null) || {}; + queueMicrotask(() => { + if (res.error) handlers.error?.(res.error); + else handlers.close?.(res.code ?? 0); + }); + }, + }, + }; + }; + return { spawn, calls, writes }; +} + function fakeOs(tmp = '/tmp/test') { return { tmpdir: () => tmp }; } @@ -283,3 +311,40 @@ test('readClipboardImageAsFilePath on linux writes buffer from exec stdout', asy assert.equal(fs.writes.length, 1); assert.deepEqual(fs.writes[0][2], { mode: 0o600 }); }); + +test('writeClipboardText on mac shells out to pbcopy via stdin', async () => { + const f = fakeSpawn(); + await writeClipboardText('copied!', { platform: 'darwin', spawn: f.spawn }); + assert.deepEqual(f.calls, [['pbcopy', []]]); + assert.deepEqual(f.writes, [['pbcopy', 'copied!']]); +}); + +test('writeClipboardText on windows shells out to clip', async () => { + const f = fakeSpawn(); + await writeClipboardText('x', { platform: 'win32', spawn: f.spawn }); + assert.deepEqual(f.calls, [['clip', []]]); +}); + +test('writeClipboardText on linux prefers xclip in X11', async () => { + const f = fakeSpawn(); + await writeClipboardText('hi', { platform: 'linux', env: {}, spawn: f.spawn }); + assert.equal(f.calls[0][0], 'xclip'); + assert.deepEqual(f.calls[0][1], ['-selection', 'clipboard']); +}); + +test('writeClipboardText on linux prefers wl-copy under Wayland', async () => { + const f = fakeSpawn(); + await writeClipboardText('hi', { platform: 'linux', env: { WAYLAND_DISPLAY: 'wayland-0' }, spawn: f.spawn }); + assert.equal(f.calls[0][0], 'wl-copy'); +}); + +test('writeClipboardText on linux falls back when the first tool fails', async () => { + const f = fakeSpawn((cmd) => (cmd === 'xclip' ? 
{ code: 1 } : { code: 0 })); + await writeClipboardText('hi', { platform: 'linux', env: {}, spawn: f.spawn }); + assert.deepEqual(f.calls.map((c) => c[0]), ['xclip', 'wl-copy']); +}); + +test('writeClipboardText rejects when the tool exits nonzero', async () => { + const f = fakeSpawn(() => ({ code: 1 })); + await assert.rejects(writeClipboardText('hi', { platform: 'darwin', spawn: f.spawn })); +}); diff --git a/standalone/sidecar/main.js b/standalone/sidecar/main.js index 4ceca7b3..0c46b735 100644 --- a/standalone/sidecar/main.js +++ b/standalone/sidecar/main.js @@ -12,8 +12,17 @@ const { create } = require('./pty-core'); const clipboard = require('./clipboard-ops'); const { createDorControlServer } = require('./dor-control-server'); // Built from lib/src/host/iframe-proxy.ts (shared with the VS Code host) by -// scripts/build-sidecar-proxy.mjs. See docs/specs/dor-iframe.md. +// scripts/build-sidecar-proxy.mjs. See docs/specs/dor-browser.md. const { createIframeProxyUrl } = require('./iframe-proxy.cjs'); +// Same pattern: lib/src/host/agent-browser-host.ts is the single source of truth +// for the agent-browser host capabilities, run here exactly as the VS Code +// extension host runs it. See docs/specs/dor-browser.md → "Agent-Browser Host Capabilities". 
+const { createAgentBrowserHost } = require('./agent-browser-host.cjs'); + +const agentBrowser = createAgentBrowserHost({ + writeClipboardText: (text) => clipboard.writeClipboardText(text), + log: (m) => console.error(m), +}); function send(event, data) { process.stdout.write(JSON.stringify({ event, data }) + '\n'); @@ -54,6 +63,7 @@ rl.on('line', (line) => { case 'pty:getScrollback': mgr.getScrollback(data.id, data.requestId); break; case 'pty:getShells': mgr.getShells(data.requestId); break; case 'pty:gracefulKillAll': mgr.gracefulKillAll(data.timeout); break; + case 'sidecar:shutdown': shutdown(); break; case 'dor:controlResponse': dorControl?.respond(data); break; case 'iframe:createProxyUrl': // Log to stderr — stdout is the JSON-lines protocol channel. @@ -61,6 +71,47 @@ rl.on('line', (line) => { result: await createIframeProxyUrl(data.target, { log: (m) => console.error(m) }), })); break; + case 'agentBrowser:command': + respondAsync('agentBrowser:result', data.requestId, async () => ({ + result: await agentBrowser.command(data.session, data.args, data.binaryPath), + })); + break; + case 'agentBrowser:edit': + respondAsync('agentBrowser:result', data.requestId, async () => ({ + result: await agentBrowser.edit(data.session, data.op, data.binaryPath), + })); + break; + case 'agentBrowser:screenshot': + // Raw bytes can't ride the JSON-lines stdio, so base64 here; the Rust + // forwarder decodes back to a raw tauri::ipc::Response for the webview. 
+ respondAsync('agentBrowser:result', data.requestId, async () => { + const shot = await agentBrowser.screenshot( + data.session, { format: data.format, quality: data.quality }, data.binaryPath, + ); + if (!shot.ok) return { result: { ok: false, error: shot.error } }; + return { result: { ok: true, mime: shot.mime, bytesBase64: Buffer.from(shot.bytes).toString('base64') } }; + }); + break; + case 'agentBrowser:streamStatus': + respondAsync('agentBrowser:result', data.requestId, async () => ({ + result: await agentBrowser.streamStatus(data.session, data.binaryPath), + })); + break; + case 'agentBrowser:open': + respondAsync('agentBrowser:result', data.requestId, async () => ({ + result: await agentBrowser.open(data.url, { headed: data.headed }, data.binaryPath), + })); + break; + case 'agentBrowser:popOut': + respondAsync('agentBrowser:result', data.requestId, async () => ({ + result: await agentBrowser.popOut(data.session, { url: data.url, rect: data.rect }, data.binaryPath), + })); + break; + case 'agentBrowser:popIn': + respondAsync('agentBrowser:result', data.requestId, async () => ({ + result: await agentBrowser.popIn(data.session, { url: data.url }, data.binaryPath), + })); + break; case 'clipboard:readFiles': respondAsync('clipboard:files', data.requestId, async () => ({ paths: await clipboard.readClipboardFilePaths(), @@ -83,7 +134,19 @@ rl.on('line', (line) => { } }); -function shutdown() { +let shuttingDown = false; +async function shutdown() { + if (shuttingDown) return; + shuttingDown = true; + // Close any headed pop-out windows so quitting never orphans a real Chrome + // window (spec → "Headed Pop-Out" lifecycle). Bounded so a hung agent-browser + // can't wedge the exit; mirrors the VS Code host's deactivate(). 
+ try { + await Promise.race([ + agentBrowser.closePoppedOut(), + new Promise((resolve) => setTimeout(resolve, 1500).unref?.()), + ]); + } catch {} dorControl?.close(); mgr.killAll(); process.exit(0); diff --git a/standalone/src-tauri/Cargo.lock b/standalone/src-tauri/Cargo.lock index 69ec8edd..785d9499 100644 --- a/standalone/src-tauri/Cargo.lock +++ b/standalone/src-tauri/Cargo.lock @@ -661,6 +661,7 @@ dependencies = [ name = "dormouse" version = "0.11.0" dependencies = [ + "base64 0.22.1", "process-wrap", "serde", "serde_json", diff --git a/standalone/src-tauri/Cargo.toml b/standalone/src-tauri/Cargo.toml index 58a0bc5c..6474f5ec 100644 --- a/standalone/src-tauri/Cargo.toml +++ b/standalone/src-tauri/Cargo.toml @@ -18,6 +18,7 @@ serde_json = "1" tauri = { version = "2", features = [] } tauri-plugin-shell = "2" tauri-plugin-updater = "2" +base64 = "0.22" serde = { version = "1", features = ["derive"] } serde_json = "1" process-wrap = { version = "9", features = ["std"] } diff --git a/standalone/src-tauri/src/lib.rs b/standalone/src-tauri/src/lib.rs index 3bb71bf2..21740530 100644 --- a/standalone/src-tauri/src/lib.rs +++ b/standalone/src-tauri/src/lib.rs @@ -1,3 +1,4 @@ +use base64::{engine::general_purpose::STANDARD as BASE64, Engine as _}; use serde::{Deserialize, Serialize}; use serde_json::{Map as JsonMap, Value as JsonValue}; use std::{ @@ -329,6 +330,142 @@ fn iframe_create_proxy_url( Ok(response.get("result").cloned().unwrap_or(JsonValue::Null)) } +// ── agent-browser host (docs/specs/dor-browser.md → "Agent-Browser Host Capabilities"). +// Thin forwarders to the Node sidecar, which runs the shared +// lib/src/host/agent-browser-host.ts — the very same module the VS Code +// extension host runs. Mirrors iframe_create_proxy_url; the logic lives in lib, +// not here, so the two hosts can't drift. 
────────────────────────────────────── + +// agent-browser launches Chrome (slow on first run), and pop-out is a +// close + relaunch, so allow a generous window before a forward times out. +const AGENT_BROWSER_TIMEOUT: Duration = Duration::from_secs(30); + +fn agent_browser_forward( + state: &SidecarState, + event: &str, + data: JsonValue, +) -> Result<JsonValue, String> { + let response = request_from_sidecar_timeout(state, event, data, AGENT_BROWSER_TIMEOUT)?; + Ok(response.get("result").cloned().unwrap_or(JsonValue::Null)) +} + +#[tauri::command] +fn agent_browser_command( + state: tauri::State<'_, SidecarState>, + session: String, + args: Vec<String>, + binary_path: Option<String>, +) -> Result<JsonValue, String> { + agent_browser_forward( + &state, + "agentBrowser:command", + serde_json::json!({ "session": session, "args": args, "binaryPath": binary_path }), + ) +} + +#[tauri::command] +fn agent_browser_edit( + state: tauri::State<'_, SidecarState>, + session: String, + op: String, + binary_path: Option<String>, +) -> Result<JsonValue, String> { + agent_browser_forward( + &state, + "agentBrowser:edit", + serde_json::json!({ "session": session, "op": op, "binaryPath": binary_path }), + ) +} + +#[tauri::command] +fn agent_browser_stream_status( + state: tauri::State<'_, SidecarState>, + session: String, + binary_path: Option<String>, +) -> Result<JsonValue, String> { + agent_browser_forward( + &state, + "agentBrowser:streamStatus", + serde_json::json!({ "session": session, "binaryPath": binary_path }), + ) +} + +#[tauri::command] +fn agent_browser_open( + state: tauri::State<'_, SidecarState>, + url: String, + headed: Option<bool>, + binary_path: Option<String>, +) -> Result<JsonValue, String> { + agent_browser_forward( + &state, + "agentBrowser:open", + serde_json::json!({ "url": url, "headed": headed, "binaryPath": binary_path }), + ) +} + +// `rect` is accepted by the adapter but unused — no window positioning today. 
+#[tauri::command] +fn agent_browser_pop_out( + state: tauri::State<'_, SidecarState>, + session: String, + url: Option<String>, + binary_path: Option<String>, +) -> Result<JsonValue, String> { + agent_browser_forward( + &state, + "agentBrowser:popOut", + serde_json::json!({ "session": session, "url": url, "binaryPath": binary_path }), + ) +} + +#[tauri::command] +fn agent_browser_pop_in( + state: tauri::State<'_, SidecarState>, + session: String, + url: Option<String>, + binary_path: Option<String>, +) -> Result<JsonValue, String> { + agent_browser_forward( + &state, + "agentBrowser:popIn", + serde_json::json!({ "session": session, "url": url, "binaryPath": binary_path }), + ) +} + +// Screenshot returns raw image bytes. The sidecar base64s them over the +// JSON-lines stdio; decode back to a raw tauri::ipc::Response so the webview +// gets an ArrayBuffer (the path the panel decodes with createImageBitmap). +#[tauri::command] +fn agent_browser_screenshot( + state: tauri::State<'_, SidecarState>, + session: String, + format: Option<String>, + quality: Option<u32>, + binary_path: Option<String>, +) -> Result<tauri::ipc::Response, String> { + let result = agent_browser_forward( + &state, + "agentBrowser:screenshot", + serde_json::json!({ "session": session, "format": format, "quality": quality, "binaryPath": binary_path }), + )?; + if result.get("ok").and_then(JsonValue::as_bool) != Some(true) { + return Err(result + .get("error") + .and_then(JsonValue::as_str) + .unwrap_or("screenshot failed") + .to_string()); + } + let b64 = result + .get("bytesBase64") + .and_then(JsonValue::as_str) + .ok_or("screenshot returned no bytes")?; + let bytes = BASE64 + .decode(b64) + .map_err(|err| format!("bad screenshot base64: {err}"))?; + Ok(tauri::ipc::Response::new(bytes)) +} + #[tauri::command] fn read_clipboard_file_paths( state: tauri::State<'_, SidecarState>, @@ -374,19 +511,51 @@ fn kill_sidecar_now(state: tauri::State<'_, SidecarState>) { 
kill_sidecar_and_wait(&state.child); } +// Normal app quit should let the Node sidecar run its shutdown handler first: +// that handler closes headed agent-browser pop-out windows before killing PTYs. +// If the sidecar is wedged, fall back to the same hard kill path so quit remains +// bounded. +fn shutdown_sidecar_and_wait(state: &SidecarState) { + const POLL_INTERVAL: Duration = Duration::from_millis(20); + const MAX_POLLS: u32 = 125; + + append_log("[sidecar] requesting graceful shutdown"); + send_to_sidecar( + state, + serde_json::json!({ "event": "sidecar:shutdown", "data": {} }).to_string(), + ); + + let Ok(mut guard) = state.child.lock() else { + return; + }; + for _ in 0..MAX_POLLS { + match guard.try_wait() { + Ok(Some(status)) => { + append_log(format!( + "[sidecar] confirmed graceful exit (status: {status})" + )); + return; + } + Ok(None) => std::thread::sleep(POLL_INTERVAL), + Err(err) => { + append_log(format!( + "[sidecar] wait error during graceful shutdown: {err}" + )); + return; + } + } + } + + append_log("[sidecar] graceful shutdown timed out (~2.5s); killing"); + let _ = guard.start_kill(); +} + // Job Object on Windows / process group on Unix — kill propagates to the // sidecar's grandchildren (the spawned shells). On Unix this is SIGKILL to // the whole process group, which is more thorough than the previous // SIGTERM-to-just-node path that left node-pty grandchildren orphaned. -fn kill_sidecar(child: &SharedChild) { - if let Ok(mut guard) = child.lock() { - append_log(format!("[sidecar] killing (pid={})", guard.id())); - let _ = guard.start_kill(); - } -} - -// Like `kill_sidecar`, but blocks until the process has actually exited. The -// updater calls this before launching the Windows NSIS installer: NSIS +// +// The updater calls this before launching the Windows NSIS installer: NSIS // overwrites files inside the bundled sidecar (e.g. 
node-pty's `conpty.node`), // and Windows refuses to overwrite a native module the live sidecar still has // loaded — surfacing as "Error opening file for writing". Releasing those @@ -812,14 +981,21 @@ pub fn run() { read_clipboard_image_as_file_path, read_clipboard_text, read_update_log, + agent_browser_command, + agent_browser_edit, + agent_browser_screenshot, + agent_browser_stream_status, + agent_browser_open, + agent_browser_pop_out, + agent_browser_pop_in, ]) .build(tauri::generate_context!()) .expect("error while building Dormouse") .run(|app, event| { if let RunEvent::Exit = event { if let Some(state) = app.try_state::<SidecarState>() { - append_log("[app] exit — killing sidecar"); - kill_sidecar(&state.child); + append_log("[app] exit — shutting down sidecar"); + shutdown_sidecar_and_wait(&state); } } }); diff --git a/standalone/src-tauri/tauri.conf.json b/standalone/src-tauri/tauri.conf.json index 593b7c8b..4ad438b5 100644 --- a/standalone/src-tauri/tauri.conf.json +++ b/standalone/src-tauri/tauri.conf.json @@ -23,7 +23,7 @@ } ], "security": { - "csp": "default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; connect-src ipc: http://ipc.localhost; frame-src http://127.0.0.1:* http://localhost:*" + "csp": "default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; connect-src ipc: http://ipc.localhost ws://127.0.0.1:* ws://localhost:*; frame-src http://127.0.0.1:* http://localhost:*" } }, "bundle": { diff --git a/standalone/src/AppBar.tsx b/standalone/src/AppBar.tsx index ac89a035..bd669ef9 100644 --- a/standalone/src/AppBar.tsx +++ b/standalone/src/AppBar.tsx @@ -1,5 +1,4 @@ import { useState, useEffect, useRef, useCallback } from 'react'; -import { getCurrentWindow } from '@tauri-apps/api/window'; import { CaretDownIcon, MinusIcon, CornersOutIcon, CornersInIcon, XIcon, PlusIcon, CheckIcon } from '@phosphor-icons/react'; import { ThemePicker } from '../../lib/src/components/ThemePicker'; import { PopupButtonRow, chromeButton 
} from '../../lib/src/components/design'; @@ -16,15 +15,56 @@ interface AppBarProps { shells: ShellEntry[]; } -const appWindow = getCurrentWindow(); +type AppWindow = { + isFocused(): Promise<boolean>; + onFocusChanged(handler: (event: { payload: boolean }) => void): Promise<() => void>; + isMaximized(): Promise<boolean>; + onResized(handler: () => void): Promise<() => void>; + minimize(): Promise<void>; + toggleMaximize(): Promise<void>; + close(): Promise<void>; +}; + +let appWindowPromise: Promise<AppWindow | null> | null = null; + +function getAppWindow(): Promise<AppWindow | null> { + if (import.meta.env.VITE_DORMOUSE_BROWSER_DEV_HOST) { + return Promise.resolve(null); + } + appWindowPromise ??= import('@tauri-apps/api/window') + .then(({ getCurrentWindow }) => getCurrentWindow() as AppWindow); + return appWindowPromise; +} function useAppWindowFocused(): boolean { const [focused, setFocused] = useState(() => document.hasFocus()); useEffect(() => { - appWindow.isFocused().then(setFocused); - const unlisten = appWindow.onFocusChanged(({ payload }) => setFocused(payload)); - return () => { unlisten.then(fn => fn()); }; + let cancelled = false; + let cleanup: (() => void) | null = null; + + const onFocus = () => setFocused(true); + const onBlur = () => setFocused(false); + window.addEventListener('focus', onFocus); + window.addEventListener('blur', onBlur); + + getAppWindow().then((appWindow) => { + if (cancelled || !appWindow) return; + window.removeEventListener('focus', onFocus); + window.removeEventListener('blur', onBlur); + appWindow.isFocused().then((next) => { + if (!cancelled) setFocused(next); + }); + const unlisten = appWindow.onFocusChanged(({ payload }) => setFocused(payload)); + cleanup = () => { unlisten.then(fn => fn()); }; + }); + + return () => { + cancelled = true; + cleanup?.(); + window.removeEventListener('focus', onFocus); + window.removeEventListener('blur', onBlur); + }; }, []); return focused; @@ -48,15 +88,27 @@ function Tip({ label, 
children }: { label: string; children: React.ReactNode }) // ── Windows/Linux window buttons ─────────────────────────────────────────── function WinControls() { + const [appWindow, setAppWindow] = useState<AppWindow | null>(null); const [maximized, setMaximized] = useState(false); useEffect(() => { + let cancelled = false; + getAppWindow().then((win) => { + if (!cancelled) setAppWindow(win); + }); + return () => { cancelled = true; }; + }, []); + + useEffect(() => { + if (!appWindow) return; appWindow.isMaximized().then(setMaximized); const unlisten = appWindow.onResized(() => { appWindow.isMaximized().then(setMaximized); }); return () => { unlisten.then(fn => fn()); }; - }, []); + }, [appWindow]); + + if (!appWindow) return null; return ( <div className="flex items-stretch self-stretch"> diff --git a/standalone/src/browser-sidecar-adapter.ts b/standalone/src/browser-sidecar-adapter.ts new file mode 100644 index 00000000..a34877e5 --- /dev/null +++ b/standalone/src/browser-sidecar-adapter.ts @@ -0,0 +1,305 @@ +import type { + AgentBrowserCommandResult, + AgentBrowserEditOp, + AgentBrowserEditResult, + AgentBrowserOpenResult, + AgentBrowserPopResult, + AgentBrowserScreenshotResult, + AgentBrowserStreamStatusResult, + AlertStateDetail, + IframeProxyResult, + OpenPort, + PlatformAdapter, + PtyInfo, +} from "dormouse-lib/lib/platform/types"; +import { AlertManager, type SessionStatus } from "dormouse-lib/lib/alert-manager"; +import { normalizeExternalUri } from "dormouse-lib/lib/external-links"; +import { + applyTerminalProtocolEvents, + collectTerminalSemanticEvents, + collectTerminalProtocolResponses, + TerminalProtocolParser, +} from "dormouse-lib/lib/terminal-protocol"; +import { applyTerminalSemanticEventsByPtyId } from "dormouse-lib/lib/terminal-state-store"; +import type { DorControlRequestPayload, DorControlResult } from "dor/protocol"; +import { BrowserSidecarHost } from "./browser-sidecar-host"; + +const errMessage = (err: unknown): string => err instanceof 
Error ? err.message : String(err); + +function decodeBase64Bytes(base64: string): Uint8Array { + const binary = atob(base64); + const bytes = new Uint8Array(binary.length); + for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i); + return bytes; +} + +export class BrowserSidecarAdapter implements PlatformAdapter { + private dataHandlers = new Set<(detail: { id: string; data: string }) => void>(); + private exitHandlers = new Set<(detail: { id: string; exitCode: number }) => void>(); + private listHandlers = new Set<(detail: { ptys: PtyInfo[] }) => void>(); + private replayHandlers = new Set<(detail: { id: string; data: string }) => void>(); + private filesDroppedHandlers = new Set<(paths: string[]) => void>(); + private alertStateHandlers = new Set<(detail: AlertStateDetail) => void>(); + private protocolParsers = new Map<string, TerminalProtocolParser>(); + private alertManager = new AlertManager(); + private unlistenHost: (() => void) | null = null; + + constructor(private readonly host: BrowserSidecarHost) { + this.alertManager.onStateChange((id, state) => { + for (const handler of this.alertStateHandlers) handler({ id, ...state }); + }); + + // Some of these get called through detached references (e.g. the iframe + // panel does `const createProxy = getPlatform().createIframeProxyUrl`), which + // drops `this` and makes the internal `this.host` access throw. The VS Code + // adapter binds for the same reason; mirror it so any call style is safe. 
+ this.createIframeProxyUrl = this.createIframeProxyUrl.bind(this); + this.agentBrowserCommand = this.agentBrowserCommand.bind(this); + this.agentBrowserEdit = this.agentBrowserEdit.bind(this); + this.agentBrowserScreenshot = this.agentBrowserScreenshot.bind(this); + this.agentBrowserStreamStatus = this.agentBrowserStreamStatus.bind(this); + this.agentBrowserOpen = this.agentBrowserOpen.bind(this); + this.agentBrowserPopOut = this.agentBrowserPopOut.bind(this); + this.agentBrowserPopIn = this.agentBrowserPopIn.bind(this); + } + + async init(): Promise<void> { + await this.host.init(); + this.unlistenHost = this.host.onEvent(({ event, data }) => this.handleHostEvent(event, data)); + this.installConsoleForwarder(); + } + + shutdown(): void { + this.alertManager.dispose(); + this.protocolParsers.clear(); + this.unlistenHost?.(); + this.unlistenHost = null; + this.host.send("kill_sidecar_now"); + this.host.close(); + } + + async getAvailableShells(): Promise<{ name: string; path: string; args?: string[] }[]> { + try { + return await this.host.invoke("get_available_shells"); + } catch { + return []; + } + } + + spawnPty(id: string, options?: { cols?: number; rows?: number; cwd?: string; shell?: string; args?: string[] }): void { + this.protocolParsers.set(id, new TerminalProtocolParser()); + this.host.send("pty_spawn", { id, options }); + } + + writePty(id: string, data: string): void { + this.host.send("pty_write", { id, data }); + } + + resizePty(id: string, cols: number, rows: number): void { + this.host.send("pty_resize", { id, cols, rows }); + } + + killPty(id: string): void { + this.protocolParsers.delete(id); + this.host.send("pty_kill", { id }); + } + + async getCwd(id: string): Promise<string | null> { + try { return await this.host.invoke("pty_get_cwd", { id }); } catch { return null; } + } + + async getScrollback(id: string): Promise<string | null> { + try { return await this.host.invoke("pty_get_scrollback", { id }); } catch { return null; } + } + + async 
getOpenPorts(id: string): Promise<OpenPort[]> { + try { return await this.host.invoke("pty_get_open_ports", { id }); } catch { return []; } + } + + async readClipboardFilePaths(): Promise<string[] | null> { + try { return await this.host.invoke("read_clipboard_file_paths"); } catch { return null; } + } + + async readClipboardImageAsFilePath(): Promise<string | null> { + try { return await this.host.invoke("read_clipboard_image_as_file_path"); } catch { return null; } + } + + async readClipboardText(): Promise<string | null> { + try { return await this.host.invoke("read_clipboard_text"); } catch { return null; } + } + + async createIframeProxyUrl(targetUrl: string): Promise<IframeProxyResult> { + try { + return await this.host.invoke("iframe_create_proxy_url", { target: targetUrl }); + } catch (err) { + return { ok: false, reason: "unreachable", detail: errMessage(err) }; + } + } + + async agentBrowserCommand(session: string, args: string[], binaryPath?: string): Promise<AgentBrowserCommandResult> { + try { return await this.host.invoke("agent_browser_command", { session, args, binaryPath }); } + catch (err) { return { exitCode: 1, stdout: "", stderr: errMessage(err) }; } + } + + async agentBrowserEdit(session: string, op: AgentBrowserEditOp, binaryPath?: string): Promise<AgentBrowserEditResult> { + try { return await this.host.invoke("agent_browser_edit", { session, op, binaryPath }); } + catch (err) { return { ok: false, error: errMessage(err) }; } + } + + async agentBrowserScreenshot(session: string, opts: { format?: "jpeg" | "png"; quality?: number }, binaryPath?: string): Promise<AgentBrowserScreenshotResult> { + try { + const result = await this.host.invoke<{ ok: true; mime?: string; bytesBase64: string } | { ok: false; error?: string }>( + "agent_browser_screenshot", + { session, format: opts.format, quality: opts.quality, binaryPath }, + ); + if (!result.ok) return { ok: false, error: result.error }; + return { ok: true, bytes: 
decodeBase64Bytes(result.bytesBase64), mime: result.mime ?? (opts.format === "png" ? "image/png" : "image/jpeg") }; + } catch (err) { + return { ok: false, error: errMessage(err) }; + } + } + + async agentBrowserStreamStatus(session: string, binaryPath?: string): Promise<AgentBrowserStreamStatusResult> { + try { return await this.host.invoke("agent_browser_stream_status", { session, binaryPath }); } + catch (err) { return { ok: false, error: errMessage(err) }; } + } + + async agentBrowserOpen(url: string, opts: { headed?: boolean }, binaryPath?: string): Promise<AgentBrowserOpenResult> { + try { return await this.host.invoke("agent_browser_open", { url, headed: opts.headed, binaryPath }); } + catch (err) { return { ok: false, error: errMessage(err) }; } + } + + async agentBrowserPopOut(session: string, opts: { rect?: { x: number; y: number; width: number; height: number }; url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult> { + try { return await this.host.invoke("agent_browser_pop_out", { session, url: opts.url, rect: opts.rect, binaryPath }); } + catch (err) { return { ok: false, error: errMessage(err) }; } + } + + async agentBrowserPopIn(session: string, opts: { url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult> { + try { return await this.host.invoke("agent_browser_pop_in", { session, url: opts.url, binaryPath }); } + catch (err) { return { ok: false, error: errMessage(err) }; } + } + + openExternal(uri: string): void { + const normalized = normalizeExternalUri(uri); + if (normalized) window.open(normalized, "_blank", "noopener,noreferrer"); + } + + onFilesDropped(handler: (paths: string[]) => void): () => void { + this.filesDroppedHandlers.add(handler); + return () => { this.filesDroppedHandlers.delete(handler); }; + } + + onPtyData(handler: (detail: { id: string; data: string }) => void): void { this.dataHandlers.add(handler); } + offPtyData(handler: (detail: { id: string; data: string }) => void): void { 
this.dataHandlers.delete(handler); } + onPtyExit(handler: (detail: { id: string; exitCode: number }) => void): void { this.exitHandlers.add(handler); } + offPtyExit(handler: (detail: { id: string; exitCode: number }) => void): void { this.exitHandlers.delete(handler); } + requestInit(): void { this.host.send("pty_request_init"); } + onPtyList(handler: (detail: { ptys: PtyInfo[] }) => void): void { this.listHandlers.add(handler); } + offPtyList(handler: (detail: { ptys: PtyInfo[] }) => void): void { this.listHandlers.delete(handler); } + onPtyReplay(handler: (detail: { id: string; data: string }) => void): void { this.replayHandlers.add(handler); } + offPtyReplay(handler: (detail: { id: string; data: string }) => void): void { this.replayHandlers.delete(handler); } + onRequestSessionFlush(_handler: (detail: { requestId: string }) => void): void {} + offRequestSessionFlush(_handler: (detail: { requestId: string }) => void): void {} + notifySessionFlushComplete(_requestId: string): void {} + + alertRemove(id: string): void { this.alertManager.remove(id); } + alertToggle(id: string): void { this.alertManager.toggleAlert(id); } + alertDisable(id: string): void { this.alertManager.disableAlert(id); } + alertDismiss(id: string): void { this.alertManager.dismissAlert(id); } + alertDismissOrToggle(id: string, displayedStatus: string): void { this.alertManager.dismissOrToggleAlert(id, displayedStatus as SessionStatus); } + alertAttend(id: string): void { this.alertManager.attend(id); } + alertResize(id: string): void { this.alertManager.onResize(id); } + alertClearAttention(id?: string): void { this.alertManager.clearAttention(id); } + alertToggleTodo(id: string): void { this.alertManager.toggleTodo(id); } + alertMarkTodo(id: string): void { this.alertManager.markTodo(id); } + alertClearTodo(id: string): void { this.alertManager.clearTodo(id); } + onAlertState(handler: (detail: AlertStateDetail) => void): void { this.alertStateHandlers.add(handler); } + 
offAlertState(handler: (detail: AlertStateDetail) => void): void { this.alertStateHandlers.delete(handler); } + + private static STATE_KEY = 'dormouse.browser-sidecar.session'; + + saveState(state: unknown): void { + try { localStorage.setItem(BrowserSidecarAdapter.STATE_KEY, JSON.stringify(state)); } + catch { console.error('[browser-sidecar] Failed to save session state'); } + } + + getState(): unknown { + try { + const raw = localStorage.getItem(BrowserSidecarAdapter.STATE_KEY); + return raw ? JSON.parse(raw) : null; + } catch { + return null; + } + } + + private handleHostEvent(event: string, data: unknown): void { + if (event === "pty:data") { + const { id, data: text } = data as { id: string; data: string }; + const parsed = this.getProtocolParser(id).process(text); + applyTerminalProtocolEvents(this.alertManager, id, parsed.events); + const semanticEvents = collectTerminalSemanticEvents(parsed.events); + this.alertManager.applyTerminalSemanticEvents(id, semanticEvents); + applyTerminalSemanticEventsByPtyId(id, semanticEvents); + for (const response of collectTerminalProtocolResponses(parsed.events)) this.writePty(id, response); + if (parsed.visibleData.length === 0) return; + this.alertManager.onData(id); + for (const handler of this.dataHandlers) handler({ id, data: parsed.visibleData }); + } else if (event === "pty:exit") { + const payload = data as { id: string; exitCode: number }; + this.alertManager.onExit(payload.id, payload.exitCode); + this.protocolParsers.delete(payload.id); + for (const handler of this.exitHandlers) handler(payload); + } else if (event === "pty:list") { + for (const handler of this.listHandlers) handler(data as { ptys: PtyInfo[] }); + } else if (event === "pty:replay") { + const { id, data: text } = data as { id: string; data: string }; + const parsed = this.getProtocolParser(id).process(text); + applyTerminalSemanticEventsByPtyId(id, collectTerminalSemanticEvents(parsed.events)); + for (const handler of this.replayHandlers) 
handler({ id, data: parsed.visibleData }); + } else if (event === "dor:controlRequest") { + const payload = data as DorControlRequestPayload; + const respond = (response: DorControlResult) => { + this.host.send("dor_control_response", { response: { requestId: payload.requestId, ...response } }); + }; + window.dispatchEvent(new CustomEvent("dormouse:control-request", { + detail: { + requestId: payload.requestId, + surfaceId: payload.surfaceId, + method: payload.method, + params: payload.params ?? {}, + respond, + }, + })); + } + } + + private getProtocolParser(id: string): TerminalProtocolParser { + let parser = this.protocolParsers.get(id); + if (!parser) { + parser = new TerminalProtocolParser(); + this.protocolParsers.set(id, parser); + } + return parser; + } + + private installConsoleForwarder(): void { + const patched = window as typeof window & { __DORMOUSE_BROWSER_CONSOLE_PATCHED__?: boolean }; + if (patched.__DORMOUSE_BROWSER_CONSOLE_PATCHED__) return; + patched.__DORMOUSE_BROWSER_CONSOLE_PATCHED__ = true; + for (const level of ["log", "warn", "error"] as const) { + const original = console[level].bind(console); + console[level] = (...args: unknown[]) => { + original(...args); + fetch(this.host.url('/__dormouse_dev_host/console'), { + method: 'POST', + headers: { 'content-type': 'application/json' }, + body: JSON.stringify({ level, args: args.map((arg) => { + try { return typeof arg === 'string' ? 
arg : JSON.stringify(arg); } + catch { return String(arg); } + }) }), + }).catch(() => {}); + }; + } + } + +} diff --git a/standalone/src/browser-sidecar-host.ts b/standalone/src/browser-sidecar-host.ts new file mode 100644 index 00000000..70694398 --- /dev/null +++ b/standalone/src/browser-sidecar-host.ts @@ -0,0 +1,80 @@ +export type BrowserSidecarEvent = { event: string; data: unknown }; + +type Pending = { + resolve: (value: unknown) => void; + reject: (error: Error) => void; +}; + +export class BrowserSidecarHost { + private events: EventSource | null = null; + private readonly eventHandlers = new Set<(event: BrowserSidecarEvent) => void>(); + private readonly pending = new Map<string, Pending>(); + private nextId = 1; + + constructor(private readonly baseUrl: string) {} + + url(path: string): URL { + return new URL(path, this.baseUrl); + } + + async init(): Promise<void> { + if (this.events) return; + const url = this.url('/__dormouse_dev_host/events'); + this.events = new EventSource(url); + this.events.addEventListener('sidecar', (event) => { + const parsed = JSON.parse((event as MessageEvent).data) as BrowserSidecarEvent; + this.deliver(parsed); + }); + this.events.onerror = () => { + console.error('[browser-sidecar] event stream disconnected'); + }; + } + + close(): void { + this.events?.close(); + this.events = null; + for (const { reject } of this.pending.values()) reject(new Error('browser sidecar host closed')); + this.pending.clear(); + } + + onEvent(handler: (event: BrowserSidecarEvent) => void): () => void { + this.eventHandlers.add(handler); + return () => this.eventHandlers.delete(handler); + } + + send(cmd: string, args?: Record<string, unknown>): void { + fetch(this.url('/__dormouse_dev_host/send'), { + method: 'POST', + headers: { 'content-type': 'application/json' }, + body: JSON.stringify({ cmd, args: args ?? 
{} }), + }).catch((err) => console.error(`[browser-sidecar] ${cmd} failed:`, err)); + } + + async invoke<T>(cmd: string, args?: Record<string, unknown>): Promise<T> { + const requestId = `browser-${this.nextId++}`; + const response = await fetch(this.url('/__dormouse_dev_host/invoke'), { + method: 'POST', + headers: { 'content-type': 'application/json' }, + body: JSON.stringify({ requestId, cmd, args: args ?? {} }), + }); + if (!response.ok) throw new Error(await response.text()); + const body = await response.json() as { ok: boolean; result?: T; error?: string }; + if (!body.ok) throw new Error(body.error ?? `${cmd} failed`); + return body.result as T; + } + + private deliver(event: BrowserSidecarEvent): void { + const data = event.data as { requestId?: unknown; error?: unknown }; + const requestId = typeof data?.requestId === 'string' ? data.requestId : null; + if (requestId) { + const pending = this.pending.get(requestId); + if (pending) { + this.pending.delete(requestId); + if (typeof data.error === 'string') pending.reject(new Error(data.error)); + else pending.resolve(event.data); + return; + } + } + for (const handler of this.eventHandlers) handler(event); + } +} diff --git a/standalone/src/main.tsx b/standalone/src/main.tsx index d9b442d0..a1fd5573 100644 --- a/standalone/src/main.tsx +++ b/standalone/src/main.tsx @@ -1,13 +1,12 @@ import { StrictMode, useEffect, useState } from "react"; import { createRoot } from "react-dom/client"; -import { invoke } from "@tauri-apps/api/core"; import { setPlatform } from "dormouse-lib/lib/platform"; +import type { PlatformAdapter } from "dormouse-lib/lib/platform/types"; import { resumeOrRestore } from "dormouse-lib/lib/reconnect"; import { setDefaultShellOpts } from "dormouse-lib/lib/shell-defaults"; import { restoreActiveTheme } from "dormouse-lib/lib/themes"; import App from "dormouse-lib/App"; import "dormouse-lib/index.css"; -import { TauriAdapter } from "./tauri-adapter"; import { UpdateBanner } from 
"./UpdateBanner"; import { UpdateDebugModal } from "./UpdateDebugModal"; import { AppBar, type ShellEntry } from "./AppBar"; @@ -20,10 +19,6 @@ import { buildDebugReport, } from "./updater"; -// Initialize Tauri platform adapter before rendering -const platform = new TauriAdapter(); -setPlatform(platform); - function ConnectedUpdateBanner() { const state = useUpdateState(); const [snapshot, setSnapshot] = useState<{ version: string; error?: string } | null>(null); @@ -69,15 +64,30 @@ function ConnectedUpdateBanner() { ); } +async function createPlatform(): Promise<PlatformAdapter> { + const browserDevHost = import.meta.env.VITE_DORMOUSE_BROWSER_DEV_HOST as string | undefined; + if (browserDevHost) { + const [{ BrowserSidecarHost }, { BrowserSidecarAdapter }] = await Promise.all([ + import("./browser-sidecar-host"), + import("./browser-sidecar-adapter"), + ]); + return new BrowserSidecarAdapter(new BrowserSidecarHost(browserDevHost)); + } + const { TauriAdapter } = await import("./tauri-adapter"); + return new TauriAdapter(); +} + // Await init() first to register event listeners before reconnecting async function bootstrap() { + const platform = await createPlatform(); + setPlatform(platform); await platform.init(); const { initAlertStateReceiver } = await import("dormouse-lib/lib/terminal-registry"); initAlertStateReceiver(); restoreActiveTheme(); - // Fetch app bar data from Rust backend - const detectedShells = await invoke<ShellEntry[]>("get_available_shells"); + // Fetch app bar data from the active host backend. + const detectedShells = await platform.getAvailableShells(); const shells: ShellEntry[] = detectedShells.length > 0 ? detectedShells : [{ name: 'shell', path: '' }]; const initialShell = shells[0]; setDefaultShellOpts(initialShell ? 
{ shell: initialShell.path, args: initialShell.args } : null); diff --git a/standalone/src/tauri-adapter.ts b/standalone/src/tauri-adapter.ts index 0bbe5926..13661c08 100644 --- a/standalone/src/tauri-adapter.ts +++ b/standalone/src/tauri-adapter.ts @@ -1,7 +1,20 @@ import { invoke as rawInvoke } from "@tauri-apps/api/core"; import { listen } from "@tauri-apps/api/event"; import { open } from "@tauri-apps/plugin-shell"; -import type { AlertStateDetail, IframeProxyResult, OpenPort, PlatformAdapter, PtyInfo } from "dormouse-lib/lib/platform/types"; +import type { + AgentBrowserCommandResult, + AgentBrowserEditOp, + AgentBrowserEditResult, + AgentBrowserOpenResult, + AgentBrowserPopResult, + AgentBrowserScreenshotResult, + AgentBrowserStreamStatusResult, + AlertStateDetail, + IframeProxyResult, + OpenPort, + PlatformAdapter, + PtyInfo, +} from "dormouse-lib/lib/platform/types"; import { AlertManager, type SessionStatus } from "dormouse-lib/lib/alert-manager"; import { normalizeExternalUri } from "dormouse-lib/lib/external-links"; import { @@ -21,6 +34,9 @@ function invoke(cmd: string, args?: Record<string, unknown>): void { ); } +const errMessage = (err: unknown): string => + err instanceof Error ? err.message : String(err); + /** * Platform adapter for the Tauri standalone app. * @@ -221,7 +237,81 @@ export class TauriAdapter implements PlatformAdapter { try { return await rawInvoke<IframeProxyResult>("iframe_create_proxy_url", { target: targetUrl }); } catch (err) { - return { ok: false, reason: "unreachable", detail: err instanceof Error ? err.message : String(err) }; + return { ok: false, reason: "unreachable", detail: errMessage(err) }; + } + } + + // --- agent-browser host capabilities (see docs/specs/dor-browser.md → + // "Agent-Browser Host Capabilities"). Each invokes the matching Rust command, which runs the + // user's agent-browser binary (binaryPath → DORMOUSE_AGENT_BROWSER_BIN → PATH, + // mirroring the VS Code host's runWithBinaryFallback). 
Note there is no + // getAgentBrowserStreamUrl here: the agent-browser stream server accepts the + // tauri://localhost origin, so the panel connects directly to + // ws://127.0.0.1:<port> via its built-in fallback when the method is absent. --- + + async agentBrowserCommand(session: string, args: string[], binaryPath?: string): Promise<AgentBrowserCommandResult> { + try { + return await rawInvoke<AgentBrowserCommandResult>("agent_browser_command", { session, args, binaryPath }); + } catch (err) { + return { exitCode: 1, stdout: "", stderr: errMessage(err) }; + } + } + + async agentBrowserEdit(session: string, op: AgentBrowserEditOp, binaryPath?: string): Promise<AgentBrowserEditResult> { + try { + return await rawInvoke<AgentBrowserEditResult>("agent_browser_edit", { session, op, binaryPath }); + } catch (err) { + return { ok: false, error: errMessage(err) }; + } + } + + async agentBrowserScreenshot(session: string, opts: { format?: "jpeg" | "png"; quality?: number }, binaryPath?: string): Promise<AgentBrowserScreenshotResult> { + // The Rust command returns the raw image as an ArrayBuffer (tauri::ipc::Response) + // on success, or rejects with an error string — no base64 round-trip. + try { + const buffer = await rawInvoke<ArrayBuffer>("agent_browser_screenshot", { + session, + format: opts.format, + quality: opts.quality, + binaryPath, + }); + const mime = opts.format === "png" ? 
"image/png" : "image/jpeg"; + return { ok: true, bytes: new Uint8Array(buffer), mime }; + } catch (err) { + return { ok: false, error: errMessage(err) }; + } + } + + async agentBrowserStreamStatus(session: string, binaryPath?: string): Promise<AgentBrowserStreamStatusResult> { + try { + return await rawInvoke<AgentBrowserStreamStatusResult>("agent_browser_stream_status", { session, binaryPath }); + } catch (err) { + return { ok: false, error: errMessage(err) }; + } + } + + async agentBrowserOpen(url: string, opts: { headed?: boolean }, binaryPath?: string): Promise<AgentBrowserOpenResult> { + try { + return await rawInvoke<AgentBrowserOpenResult>("agent_browser_open", { url, headed: opts.headed, binaryPath }); + } catch (err) { + return { ok: false, error: errMessage(err) }; + } + } + + async agentBrowserPopOut(session: string, opts: { rect?: { x: number; y: number; width: number; height: number }; url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult> { + // `rect` is accepted by the type but unused — no window positioning today. 
+ try { + return await rawInvoke<AgentBrowserPopResult>("agent_browser_pop_out", { session, url: opts.url, binaryPath }); + } catch (err) { + return { ok: false, error: errMessage(err) }; + } + } + + async agentBrowserPopIn(session: string, opts: { url?: string }, binaryPath?: string): Promise<AgentBrowserPopResult> { + try { + return await rawInvoke<AgentBrowserPopResult>("agent_browser_pop_in", { session, url: opts.url, binaryPath }); + } catch (err) { + return { ok: false, error: errMessage(err) }; } } diff --git a/standalone/src/updater.ts b/standalone/src/updater.ts index a42f5a5c..46cdb5bd 100644 --- a/standalone/src/updater.ts +++ b/standalone/src/updater.ts @@ -1,16 +1,41 @@ import { useSyncExternalStore } from 'react'; -import { check, type Update } from '@tauri-apps/plugin-updater'; -import { getCurrentWindow } from '@tauri-apps/api/window'; -import { getVersion } from '@tauri-apps/api/app'; -import { open } from '@tauri-apps/plugin-shell'; -import { invoke } from '@tauri-apps/api/core'; import { IS_WINDOWS, PLATFORM_STRING } from 'dormouse-lib/lib/platform'; import type { UpdateBannerState } from './UpdateBanner'; +import type { Update } from '@tauri-apps/plugin-updater'; const GITHUB_REPO_URL = 'https://github.com/diffplug/dormouse'; +const BROWSER_DEV_HOST = Boolean(import.meta.env.VITE_DORMOUSE_BROWSER_DEV_HOST); function openUrl(url: string, context: string): void { - open(url).catch((e) => console.error(`[updater] Failed to open ${context}:`, e)); + if (BROWSER_DEV_HOST) { + window.open(url, '_blank', 'noopener,noreferrer'); + return; + } + import('@tauri-apps/plugin-shell') + .then(({ open }) => open(url)) + .catch((e) => console.error(`[updater] Failed to open ${context}:`, e)); +} + +async function checkForUpdate(): Promise<Update | null> { + if (BROWSER_DEV_HOST) return null; + const { check } = await import('@tauri-apps/plugin-updater'); + return check(); +} + +async function getAppVersion(): Promise<string> { + if (BROWSER_DEV_HOST) return 
'browser-dev'; + const { getVersion } = await import('@tauri-apps/api/app'); + return getVersion(); +} + +async function invokeTauri<T>(cmd: string): Promise<T> { + const { invoke } = await import('@tauri-apps/api/core'); + return invoke<T>(cmd); +} + +async function getAppWindow() { + const { getCurrentWindow } = await import('@tauri-apps/api/window'); + return getCurrentWindow(); } // --- State --- @@ -64,14 +89,16 @@ export function openChangelog(): void { } async function openCurrentVersionChangelog(): Promise<void> { - const version = (await getVersion()).trim(); + const version = (await getAppVersion()).trim(); openUrl(`https://dormouse.sh/changelog/after/${encodeURIComponent(version)}`, 'changelog'); } export async function buildDebugReport(error: string, toVersion: string): Promise<string> { const [fromVersion, logTail] = await Promise.all([ - getVersion().catch(() => ''), - invoke<string>('read_update_log').catch((e) => `(failed to read log: ${String(e)})`), + getAppVersion().catch(() => ''), + BROWSER_DEV_HOST + ? 
Promise.resolve('(update log is unavailable in browser dev)') + : invokeTauri<string>('read_update_log').catch((e) => `(failed to read log: ${String(e)})`), ]); return [ @@ -99,11 +126,12 @@ export function openIssueSearch(error: string): void { // --- Lifecycle --- export function startUpdateCheck(): void { + if (BROWSER_DEV_HOST) return; void runUpdateCheck(); } async function runUpdateCheck(): Promise<void> { - currentVersion = await getVersion(); + currentVersion = await getAppVersion(); let hadFailureMarker = false; @@ -143,7 +171,7 @@ async function runUpdateCheck(): Promise<void> { await new Promise((resolve) => setTimeout(resolve, 5_000)); try { - const update = await check(); + const update = await checkForUpdate(); if (!update) { registerCloseHandler(); return; @@ -210,10 +238,11 @@ export function _resetForTesting(): void { let closeHandlerRegistered = false; function registerCloseHandler(): void { + if (BROWSER_DEV_HOST) return; if (closeHandlerRegistered) return; closeHandlerRegistered = true; - getCurrentWindow().onCloseRequested(async (event) => { + getAppWindow().then((appWindow) => appWindow.onCloseRequested(async (event) => { const update = pendingUpdate; if (!update) return; @@ -238,7 +267,7 @@ function registerCloseHandler(): void { // fully exit before launching the installer. (On macOS/Linux open files // can be replaced in place, so this is Windows-only.) 
if (IS_WINDOWS) { - await invoke('kill_sidecar_now'); + await invokeTauri('kill_sidecar_now'); } await update.install(); } catch (e) { @@ -252,6 +281,6 @@ function registerCloseHandler(): void { } pendingUpdate = null; - await getCurrentWindow().close(); - }); + await appWindow.close(); + })); } diff --git a/standalone/vite.config.ts b/standalone/vite.config.ts index d5c6c4c8..fd790d6e 100644 --- a/standalone/vite.config.ts +++ b/standalone/vite.config.ts @@ -8,6 +8,7 @@ const dorDir = path.resolve(__dirname, "../dor"); // https://v2.tauri.app/start/frontend/vite/ const host = process.env.TAURI_DEV_HOST; +const port = Number(process.env.DORMOUSE_BROWSER_DEV_VITE_PORT || 1420); export default defineConfig({ plugins: [react(), tailwindcss()], @@ -24,7 +25,7 @@ export default defineConfig({ // Tauri expects a fixed port; fail if that port is not available server: { host: host || false, - port: 1420, + port, strictPort: true, hmr: host ? { protocol: "ws", host, port: 1421 } : undefined, fs: { diff --git a/vscode-ext/src/agent-browser-host.ts b/vscode-ext/src/agent-browser-host.ts index 773959b7..c75de423 100644 --- a/vscode-ext/src/agent-browser-host.ts +++ b/vscode-ext/src/agent-browser-host.ts @@ -1,180 +1,44 @@ /** - * Extension-host support for the agent-browser surface - * (docs/specs/dor-agent-browser.md → "Host capabilities"). + * Extension-host wiring for the agent-browser surface + * (docs/specs/dor-browser.md → "Agent-Browser Host Capabilities"). * - * Two narrow capabilities, both on behalf of the webview: - * - * 1. `runAgentBrowserCommand` — runs the user's agent-browser binary against a - * session for tab actions and session teardown. Subcommands are - * allowlisted; this is not a general exec channel. - * - * 2. `createStreamRelayUrl` — a loopback-only TCP relay that strips the - * `Origin` header from WebSocket upgrade requests. 
The agent-browser stream - * server returns 403 for `vscode-webview://` origins (only localhost or - * absent origins are accepted), so the webview cannot connect directly; it - * connects to a short-lived tokenized relay URL instead and the relay pipes - * bytes only to the authorized 127.0.0.1:<streamPort>. + * The capability logic itself is host-agnostic and lives in + * `lib/src/host/agent-browser-host.ts` (shared verbatim with the standalone + * Node sidecar). This file only: + * 1. instantiates that shared host with the two VS-Code-specific bits — + * writing the OS clipboard and logging — and re-exports its methods; and + * 2. owns the **stream relay**, which is genuinely VS-Code-only: the + * agent-browser stream server returns 403 for `vscode-webview://` origins + * (only localhost or absent origins are accepted), so the webview cannot + * connect directly. It connects to a short-lived tokenized relay URL and + * the relay pipes bytes only to the authorized 127.0.0.1:<streamPort>. + * (The standalone webview's `tauri://localhost` origin is accepted, so it + * connects directly and needs no relay.) 
*/ import * as vscode from 'vscode'; import * as net from 'net'; -import * as os from 'os'; -import * as path from 'path'; -import { promises as fs } from 'fs'; -import { spawn } from 'child_process'; import { randomBytes } from 'crypto'; import { log } from './log'; -import { - AGENT_BROWSER_ALLOWED_SUBCOMMANDS, - type AgentBrowserCommandResult, - type AgentBrowserEditOp, - type AgentBrowserEditResult, - type AgentBrowserScreenshotResult, -} from '../../lib/src/lib/platform/types'; +import { createAgentBrowserHost } from '../../lib/src/host/agent-browser-host'; + +const host = createAgentBrowserHost({ + writeClipboardText: (text) => vscode.env.clipboard.writeText(text), + log: (message) => log.info(message), +}); + +export const runAgentBrowserCommand = host.command; +export const runAgentBrowserEdit = host.edit; +export const runAgentBrowserScreenshot = host.screenshot; +export const runAgentBrowserStreamStatus = host.streamStatus; +export const runAgentBrowserOpen = host.open; +export const runAgentBrowserPopOut = host.popOut; +export const runAgentBrowserPopIn = host.popIn; +export const closePoppedOutSessions = host.closePoppedOut; -const ALLOWED_SUBCOMMANDS = new Set<string>(AGENT_BROWSER_ALLOWED_SUBCOMMANDS); const STREAM_RELAY_TOKEN_BYTES = 32; const STREAM_RELAY_GRANT_TTL_MS = 60_000; const STREAM_RELAY_GRANT_SWEEP_MS = 30_000; -// The host owns the exact JS for each editing op — the webview only selects a -// name, so this never becomes an arbitrary-eval channel. copy/cut return the -// selected text; selectAll returns ''. Inputs/textareas use selection ranges; -// everything else falls back to the Selection API + execCommand. 
-const EDIT_SCRIPTS: Record<AgentBrowserEditOp, string> = { - selectAll: `(()=>{const el=document.activeElement;if(el&&'select'in el&&'value'in el){el.select();}else{document.execCommand('selectAll');}return'';})()`, - copy: `(()=>{const el=document.activeElement;if(el&&'selectionStart'in el&&el.selectionStart!=null){return el.value.slice(el.selectionStart,el.selectionEnd);}return String(window.getSelection()||'');})()`, - cut: `(()=>{const el=document.activeElement;if(el&&'selectionStart'in el&&el.selectionStart!=null){const s=el.selectionStart,e=el.selectionEnd,t=el.value.slice(s,e);el.setRangeText('',s,e,'end');el.dispatchEvent(new Event('input',{bubbles:true}));return t;}const sel=String(window.getSelection()||'');if(sel)document.execCommand('delete');return sel;})()`, -}; - -export async function runAgentBrowserCommand(session: string, args: string[], binaryPath?: string): Promise<AgentBrowserCommandResult> { - if (typeof session !== 'string' || !session) { - return { exitCode: 1, stdout: '', stderr: 'session is required' }; - } - const subcommand = args[0]; - if (!subcommand || !ALLOWED_SUBCOMMANDS.has(subcommand)) { - return { exitCode: 1, stdout: '', stderr: `agent-browser subcommand '${subcommand ?? 
''}' is not allowed from the webview` }; - } - return runWithBinaryFallback(['--session', session, ...args], binaryPath); -} - -export async function runAgentBrowserEdit(session: string, op: AgentBrowserEditOp, binaryPath?: string): Promise<AgentBrowserEditResult> { - if (typeof session !== 'string' || !session) { - return { ok: false, error: 'session is required' }; - } - const script = EDIT_SCRIPTS[op]; - if (!script) { - return { ok: false, error: `unknown edit op '${op}'` }; - } - - const result = await runWithBinaryFallback(['--session', session, 'eval', script, '--json'], binaryPath); - if (result.exitCode !== 0) { - return { ok: false, error: result.stderr.trim() || `eval exited ${result.exitCode}` }; - } - - // eval --json envelope: { success, data: { result }, error }. - let text = ''; - try { - const envelope = JSON.parse(result.stdout) as { success?: boolean; data?: { result?: unknown }; error?: unknown }; - if (envelope.success === false) { - return { ok: false, error: typeof envelope.error === 'string' ? envelope.error : `${op} failed` }; - } - if (typeof envelope.data?.result === 'string') text = envelope.data.result; - } catch { - return { ok: false, error: `could not parse eval output for ${op}` }; - } - - if (op === 'selectAll') return { ok: true }; - // Land the grabbed text on the user's real OS clipboard. Skip empty so an - // empty selection doesn't clobber what's already there. - if (text) await vscode.env.clipboard.writeText(text); - return { ok: true, text }; -} - -// Reused per session so we don't litter tmp with one file per frame; the panel -// guarantees one screenshot in flight per surface, so overwriting is safe. 
-function screenshotPath(session: string, ext: string): string { - const safe = session.replace(/[^A-Za-z0-9._-]/g, '_'); - return path.join(os.tmpdir(), `dormouse-ab-shot-${safe}.${ext}`); -} - -// Capture one device-resolution frame via the user's agent-browser `screenshot` -// command (which honors the session's viewport/DPR, unlike the CSS-resolution -// screencast) and return the raw image bytes. agent-browser writes a file and -// reports the path; we read it back and hand the bytes to the webview. -export async function runAgentBrowserScreenshot( - session: string, - opts: { format?: 'jpeg' | 'png'; quality?: number }, - binaryPath?: string, -): Promise<AgentBrowserScreenshotResult> { - if (typeof session !== 'string' || !session) { - return { ok: false, error: 'session is required' }; - } - const format = opts.format === 'png' ? 'png' : 'jpeg'; - const ext = format === 'png' ? 'png' : 'jpg'; - const out = screenshotPath(session, ext); - const args = ['--session', session, 'screenshot', out, '--screenshot-format', format]; - if (format === 'jpeg') { - const q = Number.isFinite(opts.quality) ? Math.min(100, Math.max(1, Math.round(opts.quality as number))) : 85; - args.push('--screenshot-quality', String(q)); - } - const result = await runWithBinaryFallback(args, binaryPath); - if (result.exitCode !== 0) { - log.info(`[agent-browser] screenshot failed (exit ${result.exitCode}): ${result.stderr.trim() || result.stdout.trim()}`); - return { ok: false, error: result.stderr.trim() || `screenshot exited ${result.exitCode}` }; - } - try { - const buffer = await fs.readFile(out); - // A Uint8Array view over exactly this file's bytes; structured-clone copies - // it across the webview boundary (no base64 round-trip). - const bytes = new Uint8Array(buffer.buffer, buffer.byteOffset, buffer.byteLength); - return { ok: true, bytes, mime: format === 'png' ? 
'image/png' : 'image/jpeg' };
-  } catch (err) {
-    log.info(`[agent-browser] screenshot read failed: ${err instanceof Error ? err.message : String(err)}`);
-    return { ok: false, error: `could not read screenshot file: ${err instanceof Error ? err.message : String(err)}` };
-  }
-}
-
-// The extension host's PATH is often the GUI login PATH (no nvm/volta shims),
-// so prefer the absolute path `dor ab` resolved in the user's terminal; fall
-// through on ENOENT in case it has gone stale.
-async function runWithBinaryFallback(args: string[], binaryPath?: string): Promise<AgentBrowserCommandResult> {
-  const candidates = [...new Set([
-    binaryPath,
-    process.env.DORMOUSE_AGENT_BROWSER_BIN,
-    'agent-browser',
-  ].filter((c): c is string => !!c))];
-
-  let lastError = '';
-  for (const binary of candidates) {
-    const result = await spawnAgentBrowser(binary, args);
-    if (result !== 'ENOENT') return result;
-    lastError = `'${binary}' was not found`;
-    log.info(`[agent-browser] ${lastError}; trying next candidate`);
-  }
-  return { exitCode: 1, stdout: '', stderr: `agent-browser binary not found (${lastError})` };
-}
-
-function spawnAgentBrowser(binary: string, args: string[]): Promise<AgentBrowserCommandResult | 'ENOENT'> {
-  return new Promise((resolve) => {
-    const child = spawn(binary, args, { stdio: ['ignore', 'pipe', 'pipe'] });
-    let stdout = '';
-    let stderr = '';
-    child.stdout.on('data', (chunk) => { stdout += String(chunk); });
-    child.stderr.on('data', (chunk) => { stderr += String(chunk); });
-    child.on('error', (err: NodeJS.ErrnoException) => {
-      if (err.code === 'ENOENT') {
-        resolve('ENOENT');
-        return;
-      }
-      log.info(`[agent-browser] spawn failed: ${err.message}`);
-      resolve({ exitCode: 1, stdout: '', stderr: err.message });
-    });
-    child.on('close', (code) => {
-      resolve({ exitCode: code ?? 1, stdout, stderr });
-    });
-  });
-}
-
 let relayPortPromise: Promise<number> | null = null;
 const streamRelayGrants = new Map<string, { port: number; expiresAt: number }>();
 let lastStreamRelayGrantSweep = 0;
diff --git a/vscode-ext/src/extension.ts b/vscode-ext/src/extension.ts
index 8c25cc0f..860b174d 100644
--- a/vscode-ext/src/extension.ts
+++ b/vscode-ext/src/extension.ts
@@ -3,6 +3,7 @@ import * as path from 'path';
 import * as ptyManager from './pty-manager';
 import { DormouseViewProvider } from './webview-view-provider';
 import { attachRouter, flushAllSessions, getAlertStates } from './message-router';
+import { closePoppedOutSessions } from './agent-browser-host';
 import { getWebviewHtml } from './webview-html';
 import { log } from './log';
 import { mergeAlertStates, refreshSavedSessionStateFromPtys } from './session-state';
@@ -203,6 +204,10 @@ export function activate(context: vscode.ExtensionContext) {
 export async function deactivate() {
   if (!extensionContext) return;
   log.info('[deactivate] starting');
+  // Close any headed pop-out windows first so quitting never orphans a real
+  // Chrome window (spec → "Headed Pop-Out" lifecycle).
+  log.info('[deactivate] closing popped-out browser windows');
+  await closePoppedOutSessions();
   // Save session state while PTYs are still alive — CWD and scrollback
   // queries need live processes. Must happen before gracefulKillAll.
   log.info('[deactivate] flushing sessions from webview');
diff --git a/vscode-ext/src/iframe-proxy-host.ts b/vscode-ext/src/iframe-proxy-host.ts
index 76114b4b..c4c49ee9 100644
--- a/vscode-ext/src/iframe-proxy-host.ts
+++ b/vscode-ext/src/iframe-proxy-host.ts
@@ -2,8 +2,8 @@
  * VS Code extension-host binding for the iframe transparent proxy.
  *
  * The proxy itself is host-agnostic and lives in `lib/src/host/iframe-proxy.ts`
- * (shared with the Tauri sidecar — see docs/specs/dor-iframe.md → "The
- * Transparent Proxy"). This file only injects the VS Code logger; the
+ * (shared with the Tauri sidecar — see docs/specs/dor-browser.md → "The
+ * transparent proxy"). This file only injects the VS Code logger; the
  * message-router calls `createIframeProxyUrl` exactly as before.
  */
 import { createIframeProxyUrl as createProxy } from '../../lib/src/host/iframe-proxy';
diff --git a/vscode-ext/src/message-router.ts b/vscode-ext/src/message-router.ts
index 833fd692..5bfe37c0 100644
--- a/vscode-ext/src/message-router.ts
+++ b/vscode-ext/src/message-router.ts
@@ -13,7 +13,7 @@ import type { TerminalSemanticEvent } from '../../lib/src/lib/terminal-state';
 import type { PersistedSession } from '../../lib/src/lib/session-types';
 import type { WebviewMessage, ExtensionMessage } from './message-types';
 import type { DorControlRequest } from './pty-manager';
-import { createStreamRelayUrl, runAgentBrowserCommand, runAgentBrowserEdit, runAgentBrowserScreenshot } from './agent-browser-host';
+import { createStreamRelayUrl, runAgentBrowserCommand, runAgentBrowserEdit, runAgentBrowserOpen, runAgentBrowserPopIn, runAgentBrowserPopOut, runAgentBrowserScreenshot, runAgentBrowserStreamStatus } from './agent-browser-host';
 import { createIframeProxyUrl } from './iframe-proxy-host';
 import { log } from './log';
 
@@ -386,6 +386,16 @@ export function attachRouter(
         } satisfies ExtensionMessage);
       });
       break;
+    case 'agentBrowser:streamStatus':
+      runAgentBrowserStreamStatus(
+        msg.session,
+        typeof msg.binaryPath === 'string' ? msg.binaryPath : undefined,
+      ).then((result) => {
+        webview.postMessage({
+          type: 'agentBrowser:streamStatusResult', requestId: msg.requestId, ...result,
+        } satisfies ExtensionMessage);
+      });
+      break;
     case 'agentBrowser:getStreamUrl': {
       const streamPort = Number.isInteger(msg.port) && msg.port > 0 && msg.port <= 65535 ? msg.port : null;
       if (!streamPort) {
@@ -401,6 +411,33 @@ export function attachRouter(
       );
       break;
     }
+    case 'agentBrowser:open':
+      runAgentBrowserOpen(
+        typeof msg.url === 'string' ? msg.url : '',
+        { headed: msg.headed === true },
+        typeof msg.binaryPath === 'string' ? msg.binaryPath : undefined,
+      ).then((result) => {
+        webview.postMessage({ type: 'agentBrowser:openResult', requestId: msg.requestId, ...result } satisfies ExtensionMessage);
+      });
+      break;
+    case 'agentBrowser:popOut':
+      runAgentBrowserPopOut(
+        msg.session,
+        { url: typeof msg.url === 'string' ? msg.url : undefined, rect: msg.rect },
+        typeof msg.binaryPath === 'string' ? msg.binaryPath : undefined,
+      ).then((result) => {
+        webview.postMessage({ type: 'agentBrowser:popResult', requestId: msg.requestId, ...result } satisfies ExtensionMessage);
+      });
+      break;
+    case 'agentBrowser:popIn':
+      runAgentBrowserPopIn(
+        msg.session,
+        { url: typeof msg.url === 'string' ? msg.url : undefined },
+        typeof msg.binaryPath === 'string' ? msg.binaryPath : undefined,
+      ).then((result) => {
+        webview.postMessage({ type: 'agentBrowser:popResult', requestId: msg.requestId, ...result } satisfies ExtensionMessage);
+      });
+      break;
     case 'iframe:createProxyUrl':
       createIframeProxyUrl(typeof msg.url === 'string' ? msg.url : '').then(
         (result) => webview.postMessage({
diff --git a/vscode-ext/src/message-types.ts b/vscode-ext/src/message-types.ts
index 0f62e385..5cdd70f3 100644
--- a/vscode-ext/src/message-types.ts
+++ b/vscode-ext/src/message-types.ts
@@ -1,7 +1,7 @@
 import type { ActivityNotification, SessionStatus, TodoState } from '../../lib/src/lib/alert-manager';
 import type { TerminalSemanticEvent } from '../../lib/src/lib/terminal-state';
 import type { DorControlRequestPayload, DorControlResponsePayload } from '../../dor/src/protocol';
-import type { IframeProxyResult, OpenPort } from '../../lib/src/lib/platform/types';
+import type { AgentBrowserStreamStatusResult, IframeProxyResult, OpenPort } from '../../lib/src/lib/platform/types';
 import type { VSCodeWorkbenchCommand } from '../../lib/src/lib/vscode-keybindings';
 
 // Messages from webview → extension host
@@ -21,7 +21,11 @@ export type WebviewMessage =
   | { type: 'agentBrowser:command'; session: string; args: string[]; binaryPath?: string; requestId: string }
   | { type: 'agentBrowser:edit'; session: string; op: 'selectAll' | 'copy' | 'cut'; binaryPath?: string; requestId: string }
   | { type: 'agentBrowser:screenshot'; session: string; format?: 'jpeg' | 'png'; quality?: number; binaryPath?: string; requestId: string }
+  | { type: 'agentBrowser:streamStatus'; session: string; binaryPath?: string; requestId: string }
   | { type: 'agentBrowser:getStreamUrl'; port: number; requestId: string }
+  | { type: 'agentBrowser:open'; url: string; headed?: boolean; binaryPath?: string; requestId: string }
+  | { type: 'agentBrowser:popOut'; session: string; url?: string; rect?: { x: number; y: number; width: number; height: number }; binaryPath?: string; requestId: string }
+  | { type: 'agentBrowser:popIn'; session: string; url?: string; binaryPath?: string; requestId: string }
   | { type: 'iframe:createProxyUrl'; url: string; requestId: string }
   | { type: 'dormouse:init' }
   | { type: 'dormouse:saveState'; state: unknown }
@@ -62,7 +66,10 @@ export type ExtensionMessage =
   | { type: 'agentBrowser:commandResult'; requestId: string; exitCode: number; stdout: string; stderr: string }
   | { type: 'agentBrowser:editResult'; requestId: string; ok: boolean; text?: string; error?: string }
   | { type: 'agentBrowser:screenshotResult'; requestId: string; ok: boolean; bytes?: Uint8Array; mime?: string; error?: string }
+  | ({ type: 'agentBrowser:streamStatusResult'; requestId: string } & AgentBrowserStreamStatusResult)
   | { type: 'agentBrowser:streamUrl'; requestId: string; url: string | null }
+  | { type: 'agentBrowser:openResult'; requestId: string; ok: boolean; session?: string; wsPort?: number; binaryPath?: string; error?: string }
+  | { type: 'agentBrowser:popResult'; requestId: string; ok: boolean; wsPort?: number; error?: string }
   | { type: 'iframe:proxyUrl'; requestId: string; result: IframeProxyResult }
   | {
       type: 'dormouse:newTerminal';
diff --git a/vscode-ext/src/webview-html.ts b/vscode-ext/src/webview-html.ts
index 4e8bf2db..5979f77c 100644
--- a/vscode-ext/src/webview-html.ts
+++ b/vscode-ext/src/webview-html.ts
@@ -30,13 +30,13 @@ export function getWebviewHtml(
     `font-src ${webview.cspSource}`,
     `img-src ${webview.cspSource} data: blob:`,
     // ws: entries cover the agent-browser stream relay (frames + input for
-    // browser surfaces; see docs/specs/dor-agent-browser.md).
+    // browser surfaces; see docs/specs/dor-browser.md).
     `connect-src ${webview.cspSource} ws://127.0.0.1:* ws://localhost:*`,
     // `dor iframe` frames its target through a loopback transparent proxy that
     // the extension host stands up (iframe-proxy-host.ts), so the only origin we
     // ever embed is 127.0.0.1/localhost on an OS-assigned port. Without a
     // frame-src override the `default-src 'none'` fallback blocks the frame
-    // outright, leaving a blank (white) pane. See docs/specs/dor-iframe.md.
+    // outright, leaving a blank (white) pane. See docs/specs/dor-browser.md.
     `frame-src http://127.0.0.1:* http://localhost:*`,
   ].join('; ');
 
diff --git a/website/src/entry.client.tsx b/website/src/entry.client.tsx
index 550d2bfa..31768be9 100644
--- a/website/src/entry.client.tsx
+++ b/website/src/entry.client.tsx
@@ -1,10 +1,17 @@
+import { StrictMode } from "react";
 import ReactDOM from "react-dom/client";
 import { HydratedRouter } from "react-router/dom";
 
 import "./index.css";
 
-// Intentionally not wrapped in <React.StrictMode>. The desktop playground's
-// Wall/dockview setup is not idempotent across StrictMode's dev-only double
-// mount: the first onReady consumes initialPaneIds, so the remount's onReady
-// loses `tut-main` and PlaygroundDesktop's addPanel referencePanel throws.
-ReactDOM.hydrateRoot(document, <HydratedRouter />);
+// StrictMode everywhere (matches the lib web + standalone entries and Storybook):
+// the dev-only double mount is now idempotent end to end. The Wall's onReady
+// caches its initial restoration instead of consuming it (use-dockview-ready.ts),
+// so the remount re-creates `tut-main` and the playground's addPanel no longer
+// loses its referencePanel.
+ReactDOM.hydrateRoot(
+  document,
+  <StrictMode>
+    <HydratedRouter />
+  </StrictMode>,
+);