Skip to content

feat(web_search): SearXNG-backed web_search tool + Docker sidecar#24

Merged
jkyberneees merged 4 commits into
mainfrom
feat/searxng-web-search
Jun 10, 2026
Merged

feat(web_search): SearXNG-backed web_search tool + Docker sidecar#24
jkyberneees merged 4 commits into
mainfrom
feat/searxng-web-search

Conversation

@jkyberneees

Copy link
Copy Markdown
Contributor

Summary

Gives the agent local-first web search with no cloud search API and no keys — the same zero-setup ethos as the bundled transcribe (whisper) and vision (MiniCPM-V) tools. A new native web_search tool queries a self-hosted SearXNG metasearch instance over the internal Docker network and returns ranked results the agent then fetches with browser / http_batch.

What's included

Native web_search tool (cmd/odek/web_search_tool.go)

  • Queries the SearXNG JSON API → ranked results (title, url, snippet, engine) + direct answers, capped by max_results.
  • Output wrapped as untrusted content (SERP snippets can carry injection payloads).
  • Gated as network_egress (prompt in restricted, allow in godmode), consistent with browser/http_batch.
  • No SSRF surface: accepts only a query string; the backend URL is fixed config. A CheckRedirect guard re-classifies redirect hops so a compromised/misconfigured SearXNG can't 3xx toward internal/metadata endpoints.
  • Resilient to the compose cold-start race: retries only on ECONNREFUSED (the precise "up but not listening" signal).
  • Registered only when web_search.base_url is set, so plain installs without a SearXNG instance don't see a dead tool.

Config (internal/config)

  • WebSearchConfig{BaseURL, Categories, Language, MaxResults, Timeout} threaded end-to-end, mirroring VisionConfig.
  • The builtinTools per-tool config params were bundled into a toolConfig struct to stop positional-parameter churn as tools are added.

Docker

  • searxng sidecar (pinned 2026.6.8-f3fab143b), co-starting with every profile, internal-only (no host port), depends_on wired.
  • docker/searxng/settings.yml enables the JSON API and disables the anti-bot limiter → no Redis/Valkey needed.
  • SEARXNG_SECRET in .env.example; both bundled configs set web_search.base_url.

Security

  • Results are nonce-wrapped <untrusted_content>; web_search:<query> added to the SECURITY.md untrusted table.
  • Egress-gated; redirect-guarded (SSRF); query-only input means no agent-controlled URL.
  • SearXNG secret_key is overridden at app load from SEARXNG_SECRET (env override verified against SearXNG source); the placeholder is the canonical ultrasecretkey so an unset secret triggers SearXNG's own warning.

Tests

Hermetic httptest SearXNG mock — happy path, max_results override vs config cap, untrusted wrapping, JSON-disabled 403, unreachable + cold-start retry, redirect-to-internal blocked, policy denial, heterogeneous/malformed answers (must not lose results), empty query, schema; plus resolveWebSearch defaults/merge. Full suite green under -race; go vet/gofmt clean; docker compose config validates.

Docs

README (new "🌐 Local Web Search" feature blurb), CHEATSHEET (config reference + standalone non-Docker recipe), SECURITY, CONFIG, TELEGRAM, docker/README, DOCKER_COMPOSE_USER_GUIDE.

Review trail

This branch was put through a high-effort /code-review and a full vprotocol v5.2.7 verification; both rounds of confirmed findings were fixed in-branch (answers parsing, SSRF redirect guard, secret-warning visibility, cold-start resilience, answer-decode robustness, doc consistency). vprotocol verdict: HumanReviewRequired (η 0.445; single-model author of code+tests+review → correlated, ρ 0.24). An independent human review is the protocol-mandated next step, with two focus areas:

  • The SearXNG integration has not been run end-to-end against a live container — all tests use a mock. A docker compose up smoke test is recommended before merge.
  • No formal spec; behavior was verified against SearXNG source rather than a contract.

🤖 Generated with Claude Code

jkyberneees and others added 4 commits June 10, 2026 09:13
Gives the agent local-first web search with no cloud API or keys, matching the
transcribe/vision zero-setup pattern.

Tool (cmd/odek/web_search_tool.go):
- Native Go tool querying a self-hosted SearXNG JSON API; returns ranked
  results (title/url/snippet/engine) + direct answers, capped by max_results.
- Output wrapped as untrusted content (SERP snippets can carry injection).
- Gated as network_egress (prompt in restricted, allow in godmode), consistent
  with browser/http_batch. The backend URL is fixed config, not agent-supplied,
  so the tool has no SSRF surface (only a query string is accepted).
- Registered only when web_search.base_url is set, so plain installs without a
  SearXNG instance don't see a dead tool.

Config (internal/config):
- WebSearchConfig{BaseURL, Categories, Language, MaxResults, Timeout} threaded
  end-to-end (FileConfig, ResolvedConfig, resolveWebSearch, overlayFile).

Wiring (cmd/odek):
- builtinTools' growing positional config params (Transcription, Vision) are
  bundled into a toolConfig struct to stop per-tool signature churn; all ~10
  call sites updated. web_search is threaded into run/serve/repl/telegram/
  schedule/subagent/mcp.

Docker:
- New `searxng` compose sidecar (pinned image), co-starting with every profile,
  internal-only (no host port), with depends_on wired on each odek service.
- docker/searxng/settings.yml enables the JSON API and disables the anti-bot
  limiter, so no Redis/Valkey is needed. SEARXNG_SECRET added to .env.example.
- Both bundled configs set web_search.base_url=http://searxng:8080.

Tests: hermetic httptest SearXNG mock covering happy path, max_results override
vs config cap, untrusted wrapping, JSON-disabled 403, unreachable backend,
policy denial, empty query, schema; resolveWebSearch defaults/merge. Full suite
green under -race.

Docs: README, SECURITY, CHEATSHEET, CONFIG, TELEGRAM, docker/README,
DOCKER_COMPOSE_USER_GUIDE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cold-start)

Code review of the SearXNG integration surfaced four actionable issues; fixes:

#2 (confirmed) — answers parsing emitted raw answer-object JSON. SearXNG
`answers` are objects ({"answer":"...","url":...,"template":...}), not bare
strings, so strings.Trim(rawJSON,'"') left the {...} blob intact. Decode the
`answer` field out of each object instead (also drops the fragile Trim). The
test mock used a string array, masking this — corrected to objects.

#3 (defense-in-depth) — the http.Client had no CheckRedirect. A compromised
or misconfigured SearXNG could 3xx the client toward an internal/metadata
endpoint (SSRF). Install the same per-hop re-classification guard browser and
http_batch use, capped at 10 hops. New TestWebSearch_RedirectToInternalBlocked.

#1 (hardening) — the mounted settings.yml hardcoded secret_key
"change-me-in-dot-env". Deeper tracing showed SEARXNG_SECRET *does* override
the file at app load (searx/settings_defaults.py environ_name), so the env
wiring worked — but the placeholder defeated SearXNG's built-in "secret_key
is not changed" warning, which only fires for the canonical "ultrasecretkey".
Switch the placeholder + the compose env fallback to "ultrasecretkey" so an
unset secret is loudly flagged rather than silently weak. Comments/.env.example
corrected to describe the real (app-load) override mechanism.

#4 (reliability) — `depends_on: [searxng]` only waits for container start, so
the first web_search after `compose up` could race SearXNG readiness. Rather
than a Docker healthcheck (whose probe tooling I can't verify in the upstream
image — a broken probe would deadlock odek startup), make the tool resilient:
retry only on ECONNREFUSED (the precise "up but not yet listening" signal),
2 extra attempts with a 1s backoff. Timeouts / genuine-down fail fast.

#5 (overlay whole-pointer replace) intentionally deferred — it is consistent
with the existing Skills/Memory/Transcription/Vision merge behavior; fixing
only web_search would be inconsistent.

Full suite green under -race; vet/gofmt clean; compose config validates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Close two documentation gaps from the feature review:

- README discoverability: web_search was only mentioned in the prompt-injection
  paragraph and absent from the feature list. Add a "🌐 Local Web Search"
  strategic-feature blurb linking to the CHEATSHEET config reference.
- Non-Docker path: the CHEATSHEET only said "run SearXNG yourself" with no
  recipe. Add a concrete `docker run` snippet (reusing the repo's ready-made
  docker/searxng/settings.yml), the matching odek config, and the two settings
  that matter (search.formats json + limiter off) for bring-your-own configs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… consistency

vprotocol v5.2.7 verification of the feat/searxng-web-search branch. Three
confirmed findings repaired; others refuted (redirect hop-count matches the
httpBatch pattern; :ro mount is non-fatal per the SearXNG entrypoint; all
settings.yml keys verified valid).

F1 (robustness) — the typed `answers []struct{Answer string}` decode coupled
the critical `results` parse to the answers shape: a non-string "answer" value
(or any foreign answer type) would fail the whole json.Unmarshal and drop ALL
results. Restore the original immunity by keeping answers as []json.RawMessage
and decoding each element tolerantly — a non-conforming answer is skipped, never
fatal. New TestWebSearch_HeterogeneousAnswersDoNotLoseResults locks it in.

F2 (consistency) — the CHEATSHEET standalone recipe used searxng/searxng:latest
while compose pins 2026.6.8-f3fab143b. Pin the recipe to the same tag.

F3 (docs) — the standalone recipe's `$PWD/docker/searxng/...` volume path
assumed the repo root without saying so. State "run from the repo root".

Full suite green under -race; vet/gofmt clean; compose validates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
odek a1a1ad1 Commit Preview URL

Branch Preview URL
Jun 10 2026, 09:35 AM

@jkyberneees

Copy link
Copy Markdown
Contributor Author

vprotocol v5.2.7 — Verification Certificate

PR: #24 feat/searxng-web-search · head a1a1ad1 · 27 files · +806 LOC (~646 Go)
Generator: Claude Opus 4.8 (claude-opus-4-8) · Class: GeneratedCode (code + tests, single model/session)
Pipeline: single-model (B=C=D=E) — monoculture fallback, ρ at full strength

This certificate was produced against this exact head SHA before PR creation; per §0.4 (certificates bind to (PR, head_sha)) it is reproduced verbatim — no re-run, no new commits since verification.

Pre-scan (§0)

Added lines scanned for injection markers / verdict tokens / new exec sinks: clean. The untrusted→agent path (SERP results) is nonce-wrapped via wrapUntrusted; the new egress path is NetworkEgress-gated and redirect-guarded. Axis 2.8 → pass.

Nine Axes

Axis Verdict Notes
2.1 Semantic Correctness ✅ pass Explicit error/fallback paths; tolerant answer decode
2.2 Behavioral Contract ⚠️ warn No independent spec; verified against SearXNG source (answer model, env override, settings schema)
2.3 Security Surface ✅ pass Query-only input (no SSRF surface); redirect guard; results untrusted-wrapped; secret via app-load env override
2.4 Structural Integrity ✅ pass toolConfig struct halts call-site churn; mirrors httpBatch/vision
2.5 Behavioral Exploration ✅ pass Cold-start retry (ECONNREFUSED only), heterogeneous answers, 403/unreachable/denied covered
2.6 Dependency Integrity ✅ pass SearXNG image pinned (compose + recipe consistent); stdlib-only Go deps
2.7 Generator Provenance ⚠️ warn Code + tests + review: same model/session → correlated (gates ρ)
2.8 Adversarial Surface ✅ pass Redirect re-classification blocks SSRF (tested); pre-scan clean
2.9 Documentation Coverage ✅ pass README blurb, CHEATSHEET config + standalone recipe, SECURITY, docker/README, compose guide

η Derivation

Signal Weight Value
m (mutation kill) 0.34 0.67 (10 tool tests + robustness regression; no mutation runner, estimated)
o (oracle agreement) 0.24 0.35 (no independent Agent-C contract)
b (branch coverage) 0.14 0.73 (changed-line; tool error branches covered)
f (fuzz survival) 0.09 0.90 (no crashes; estimated)
s (SAST clean) 0.04 1.00 (go vet clean)
t (static depth) 0.10 1.00 (typed; compiler+vet clean)
d (doc coverage) 0.05 1.00

η_raw = 0.685 · ρ = 0.24 (family +0.10, version +0.05, spec_independence +0.05, AST ~0.02, shared-mutants ~0.02)
η = clamp(0.685 − 0.24, 0, 1) = 0.445

Verdict: HumanReviewRequired

η 0.445 < 0.80 and ρ 0.24 ∈ (0.20, 0.30]. Single-model authorship of code + tests + review cannot self-certify higher. ΔDebt ≈ 0.5 h (Low) · LOC 806 < 1,500 (standard pipeline).

Auto-repairs applied (in-branch, commit a1a1ad1)

  • F1 (robustness): typed answers decode coupled the results parse to the answers shape — a non-string answer would fail the whole json.Unmarshal and drop all results. Restored immunity via per-element tolerant decode; added TestWebSearch_HeterogeneousAnswersDoNotLoseResults.
  • F2 (consistency): pinned the CHEATSHEET standalone recipe image tag to match compose.
  • F3 (docs): added "run from the repo root" to the standalone recipe.

(Earlier rounds in commit 4ba8c3e fixed: answer-object parsing, SSRF redirect guard, secret-warning visibility, cold-start retry.)

Refuted

  • Redirect hop-count "11 vs 10" — identical to httpBatchTool's accepted guard; bounded; no security effect.
  • Retry nil-deref / body-leak / no-timeout-retry — speculative / impossible in current code (httpResp nil on transport error, returned before the defer) / intentional.
  • :ro mount + chown — non-fatal per the SearXNG entrypoint (set -u, not -e); file stays readable.
  • settings.yml invalid keys — all 8 verified present in the SearXNG default schema.

Open items for the human reviewer

  • Not yet run end-to-end against a live SearXNG container — all tests use a mock. A docker compose up smoke test is recommended before merge.
  • Axis 2.7: code, tests, and verification share one author — independently spot-check the mock assumptions against a real instance.
  • Axis 2.2: no formal spec; behavior verified against SearXNG source rather than a contract.

Generated by vprotocol v5.2.7 (single-model pipeline; ρ at full strength per §0.1 monoculture fallback).

@jkyberneees jkyberneees merged commit 3cf96f6 into main Jun 10, 2026
7 checks passed
@jkyberneees jkyberneees deleted the feat/searxng-web-search branch June 10, 2026 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant