feat(scheduledrun): add cron-based agent execution #2097
Pull request overview
Introduces a new v1alpha2 ScheduledRun capability to execute Agents/SandboxAgents on a cron schedule, spanning CRD + controller scheduler, REST API endpoints, UI creation/list/detail flows, and supporting metrics/RBAC updates.
Changes:
- Adds ScheduledRun CRD/types plus controller scheduler/controller logic and Prometheus metrics.
- Adds REST API surface for ScheduledRuns (list/get/create/update/delete/trigger) and UI pages/components to manage schedules and view run history.
- Updates RBAC and agent listing to support schedulable agent selection and ScheduledRun resource access.
Reviewed changes
Copilot reviewed 35 out of 37 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ui/src/types/index.ts | Adds frontend types for ScheduledRun CRD, status, and run history. |
| ui/src/lib/scheduledRuns.ts | UI helpers for rendering ScheduledRun target refs and display status. |
| ui/src/lib/formatDateTime.ts | Adds a shared datetime formatting helper for schedule/run history views. |
| ui/src/components/schedules/ScheduledRunList.tsx | New schedules list view with trigger/edit/delete actions. |
| ui/src/components/schedules/RunHistoryTable.tsx | New run-history table for ScheduledRun detail page. |
| ui/src/components/Header.tsx | Adds navigation entries for Scheduled Runs and “New Scheduled Run”. |
| ui/src/components/DeleteAgentButton.tsx | Adds toast feedback for agent deletion success/failure. |
| ui/src/app/schedules/page.tsx | Adds /schedules route entrypoint. |
| ui/src/app/schedules/new/page.tsx | Adds create/edit ScheduledRun form page. |
| ui/src/app/schedules/[namespace]/[name]/page.tsx | Adds ScheduledRun detail page with run history and trigger/suspend actions. |
| ui/src/app/actions/scheduledRuns.ts | Adds server actions to call ScheduledRun REST endpoints and revalidate UI paths. |
| ui/src/app/actions/agents.ts | Adds getSchedulableAgents() to list agents without AgentHarness rows. |
| helm/kagent/templates/rbac/writer-role.yaml | Grants write permissions for ScheduledRun resources/finalizers. |
| helm/kagent/templates/rbac/getter-role.yaml | Grants get/list/watch and status access for ScheduledRuns. |
| helm/kagent-crds/templates/kagent.dev_scheduledruns.yaml | Helm-templated ScheduledRun CRD manifest. |
| go/go.mod | Adds robfig/cron dependency for scheduler and API validation. |
| go/go.sum | Records robfig/cron module checksums. |
| go/core/test/e2e/scheduledrun_api_test.go | Adds end-to-end REST API lifecycle tests for ScheduledRuns. |
| go/core/pkg/app/app.go | Wires ScheduledRun scheduler/controller into manager and HTTP server. |
| go/core/internal/scheduledrun/target.go | Shared utilities for resolving/validating ScheduledRun targets. |
| go/core/internal/metrics/scheduledrun.go | Adds Prometheus metrics for dispatch/outcomes/durations/active schedules. |
| go/core/internal/httpserver/server.go | Adds ScheduledRuns routes and handler wiring in HTTP server. |
| go/core/internal/httpserver/handlers/test_helpers_test.go | Registers ScheduledRun types in handler test scheme. |
| go/core/internal/httpserver/handlers/scheduledruns.go | Implements ScheduledRuns REST handlers + schedule validation + trigger endpoint. |
| go/core/internal/httpserver/handlers/scheduledruns_test.go | Adds unit tests for ScheduledRuns handler behaviors. |
| go/core/internal/httpserver/handlers/handlers.go | Adds ScheduledRuns handler to handlers bundle (conditional on trigger availability). |
| go/core/internal/httpserver/handlers/agents.go | Adds query param to exclude AgentHarness rows from agent list responses. |
| go/core/internal/httpserver/handlers/agents_test.go | Adds tests for excluding AgentHarness and ScheduledRun interactions with deletions. |
| go/core/internal/controller/scheduledrun_scheduler.go | Implements cron scheduling, dispatch, run-history recording, and outcome polling. |
| go/core/internal/controller/scheduledrun_scheduler_test.go | Adds scheduler unit tests for scheduling and runOnce behavior. |
| go/core/internal/controller/scheduledrun_controller.go | Adds ScheduledRun controller reconcile logic (validation, nextRunTime, Accepted condition). |
| go/core/internal/controller/scheduledrun_controller_test.go | Adds controller unit tests for acceptance/rejection scenarios. |
| go/core/internal/a2a/agent_client_registry.go | Adds route-key based send method for A2A client registry lookups. |
| go/core/internal/a2a/a2a_handler_mux.go | Exposes helpers to compute route keys for Agent vs SandboxAgent. |
| go/api/v1alpha2/zz_generated.deepcopy.go | Updates generated deep-copy code for new ScheduledRun API types. |
| go/api/v1alpha2/scheduledrun_types.go | Adds ScheduledRun API types, validation markers, and constants. |
| go/api/config/crd/bases/kagent.dev_scheduledruns.yaml | Adds controller-gen base CRD for ScheduledRun. |
Files not reviewed (1)
- go/api/v1alpha2/zz_generated.deepcopy.go: Generated file
Adds the v1alpha2 ScheduledRun CRD, controller scheduler, REST API, metrics, and UI flow for creating, listing, viewing, updating, and manually triggering scheduled agent runs.
ScheduledRun targets resolve through a shared same-namespace Agent/SandboxAgent reference path. Suspended runs do not schedule next executions and cannot be manually triggered until resumed.
RunStatus unifies dispatch and outcome into a single enum (DispatchFailed/Pending/Succeeded/Failed/Timeout); RunHistoryEntry is StartTime, EndTime, SessionID, Status, Message. The outcome poller is restart-safe: Pending entries are resumed on Start so a pod restart between dispatch and terminal resolution does not leave entries stuck Pending.
CRD admission pattern annotations enforce DNS-label constraints, so duplicate handler-side DNS validators are removed. Service creation is enabled for agents.x-k8s.io Sandboxes so SandboxAgent A2A endpoints are reachable in real e2e deployments.
Includes CRD/Helm generation plus focused unit and e2e coverage.
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: 0xLeo258 <noixe0312@gmail.com>
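For readers following the commit message, here is a minimal sketch of the unified status enum and history entry it describes, plus the restart-safe idea of finding entries that are still Pending. The enum string values, the field types, and the `pendingSessions` helper are assumptions for illustration; only the enum names and the five field names come from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// RunStatus covers both dispatch and outcome in one enum, as the commit
// message describes. The string values are assumed.
type RunStatus string

const (
	RunStatusDispatchFailed RunStatus = "DispatchFailed"
	RunStatusPending        RunStatus = "Pending"
	RunStatusSucceeded      RunStatus = "Succeeded"
	RunStatusFailed         RunStatus = "Failed"
	RunStatusTimeout        RunStatus = "Timeout"
)

// RunHistoryEntry carries the five fields named in the commit message.
// The Go types chosen here are assumptions.
type RunHistoryEntry struct {
	StartTime time.Time
	EndTime   *time.Time
	SessionID string
	Status    RunStatus
	Message   string
}

// pendingSessions (hypothetical helper) returns the session IDs whose outcome
// is still unresolved. A restart-safe poller could re-spawn for exactly these
// entries when the scheduler starts.
func pendingSessions(history []RunHistoryEntry) []string {
	var ids []string
	for _, e := range history {
		if e.Status == RunStatusPending && e.SessionID != "" {
			ids = append(ids, e.SessionID)
		}
	}
	return ids
}

func main() {
	history := []RunHistoryEntry{
		{SessionID: "s1", Status: RunStatusSucceeded},
		{SessionID: "s2", Status: RunStatusPending},
		{SessionID: "", Status: RunStatusPending},
	}
	fmt.Println(pendingSessions(history))
}
```

Skipping entries with an empty SessionID is a design choice in this sketch: without a session to poll, there is nothing to resume.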
f91aa90 to 71b93d8
- Drop redundant maxRunHistory bounds check in the HTTP handler; the CRD already enforces Minimum=1/Maximum=100 with a default of 10, so the handler-side check was unreachable for valid inputs and inconsistent with its own error message.
- formatDateTime: return "-" for unparseable inputs to match the documented contract.
- Watch Agent/SandboxAgent so target create/delete drives a Reconcile; otherwise a previously-rejected SR never recovers when the target appears, and cron keeps firing into AgentNotFound after a delete.
- runOnce: return an error when the RunHistory append fails and skip spawning the outcome poller. Manual triggers no longer report success when the run was never persisted.
- Add focused tests for pollSessionOutcome (succeeded/failed/timeout), spawnOutcomePoller (SessionID-matched write), and resumePendingPollers (restart-safe re-spawn, skipping terminal/empty entries).
Signed-off-by: 0xLeo258 <noixe0312@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Codex <codex@openai.com>
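The runOnce change in this commit can be sketched as follows: persist the history entry first, and only start the outcome poller if that write succeeded. The `historyStore` interface, the `memStore` fake, and the function signatures are hypothetical stand-ins, not the PR's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// historyStore is a hypothetical stand-in for whatever persists RunHistory.
type historyStore interface {
	Append(sessionID string) error
}

// memStore is an in-memory fake used only for this illustration.
type memStore struct {
	fail    bool
	entries []string
}

func (m *memStore) Append(sessionID string) error {
	if m.fail {
		return errors.New("status update failed")
	}
	m.entries = append(m.entries, sessionID)
	return nil
}

// runOnce records the run and only then starts the poller. If the append
// fails, the error is returned and no poller is spawned, so a manual trigger
// cannot report success for a run that was never persisted.
func runOnce(store historyStore, sessionID string, spawnPoller func(string)) error {
	if err := store.Append(sessionID); err != nil {
		return fmt.Errorf("append run history for session %q: %w", sessionID, err)
	}
	spawnPoller(sessionID)
	return nil
}

func main() {
	spawned := 0
	spawn := func(string) { spawned++ }

	fmt.Println(runOnce(&memStore{}, "s1", spawn), spawned)
	fmt.Println(runOnce(&memStore{fail: true}, "s2", spawn) != nil, spawned)
}
```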
9442180 to a8fdf4b
EItanya left a comment
This PR is looking good overall, but there are definitely a few really important pieces that need looking at. I also haven't had a chance to dive deep on the cron logic yet, which I will do.
I will leave the UI review to @peterj.
"k8s.io/apimachinery/pkg/runtime"
)

// AnnotationCreatedBy records the user identity that created a ScheduledRun.
What if the resource is created with kubectl apply? I don't think we can rely on knowing the user who created a scheduled run. Rather these runs most likely need to belong to some special group which can be shared with other users. What do you think?
> I don't think we can rely on knowing the user who created a scheduled run

That's valid, I think.

> Rather these runs most likely need to belong to some special group which can be shared with other users

Maybe we can support that in the future, but let's remove it in this PR :-)
// AgentRef is a reference to the Agent or SandboxAgent to execute. If
// Namespace is empty it defaults to the ScheduledRun's namespace.
// +required
AgentRef AgentReference `json:"agentRef"`
Can we use a TypedObjectReference here so that we can potentially support more in the future?
Hi @EItanya,
Do you mean so that we could support MCP crons in the future? If so, I'd be happy to change it.
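To make the suggestion concrete, here is a rough sketch of what a typed target reference could look like. The struct is a local stand-in loosely modeled on the group/kind/name shape of Kubernetes typed object references; `TypedTargetRef`, `validateTargetRef`, and the assumption that both Agent and SandboxAgent live in the `kagent.dev` group are illustrative only and not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// TypedTargetRef is a hypothetical replacement for AgentReference that names
// the target's API group and kind explicitly, leaving room for new kinds.
type TypedTargetRef struct {
	APIGroup string
	Kind     string
	Name     string
}

// assumedGroup is an assumption: the API group both target kinds belong to.
const assumedGroup = "kagent.dev"

// validateTargetRef accepts only the kinds the PR schedules today. Supporting
// another kind later would mean adding a case here, not changing the schema.
func validateTargetRef(ref TypedTargetRef) error {
	if ref.Name == "" {
		return errors.New("target name is required")
	}
	if ref.APIGroup != assumedGroup {
		return fmt.Errorf("unsupported target group %q", ref.APIGroup)
	}
	switch ref.Kind {
	case "Agent", "SandboxAgent":
		return nil
	default:
		return fmt.Errorf("unsupported target kind %q", ref.Kind)
	}
}

func main() {
	fmt.Println(validateTargetRef(TypedTargetRef{APIGroup: assumedGroup, Kind: "Agent", Name: "my-agent"}))
	fmt.Println(validateTargetRef(TypedTargetRef{APIGroup: assumedGroup, Kind: "MCPServer", Name: "tools"}))
}
```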
// Suspend pauses scheduling and manual triggers when set to true.
// +optional
// +kubebuilder:default=false
Suspend bool `json:"suspend,omitempty"`
Why would this get rid of manual triggers? Those are user initiated anyway, so I think the most useful thing is to make sure scheduled runs are paused
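A minimal sketch of the semantics suggested in this comment, where Suspend gates only cron-initiated runs and manual triggers always go through. This is not what the PR currently implements (it blocks both), and the `triggerSource` type and `shouldRun` function are hypothetical names.

```go
package main

import "fmt"

// triggerSource distinguishes who initiated a run (hypothetical type).
type triggerSource int

const (
	triggerCron triggerSource = iota
	triggerManual
)

// shouldRun applies the proposed rule: a suspended ScheduledRun skips cron
// ticks, but a user-initiated trigger is still allowed.
func shouldRun(suspend bool, source triggerSource) bool {
	if source == triggerManual {
		return true
	}
	return !suspend
}

func main() {
	fmt.Println("suspended, cron:  ", shouldRun(true, triggerCron))
	fmt.Println("suspended, manual:", shouldRun(true, triggerManual))
	fmt.Println("active, cron:     ", shouldRun(false, triggerCron))
}
```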
// +optional
// +kubebuilder:validation:MaxLength=63
// +kubebuilder:validation:Pattern=`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`
Namespace string `json:"namespace,omitempty"`
Why do we accept Namespace if we don't actually allow cross-namespace references? I think we should omit it until we decide we want it.
Can we put this in a different Go package?
}

agentsWithID, err := h.listAgentResponses(r.Context(), log, opts...)
includeAgentHarness := r.URL.Query().Get("includeAgentHarness") != "false"
If we are going to add filters to this page, I would much rather do it generically than with one-off filters like this. Can you omit this from this PR and do a separate one with that change?
if scheduledRunTrigger != nil {
handlers.ScheduledRuns = NewScheduledRunsHandler(base, scheduledRunTrigger)
}
This smells like agent coding. Why would this ever be nil?
// zone. Both are checked at the API edge so a bad request is
// rejected with 400 before it ever reaches the controller, where the same
// invariants are re-checked against the persisted object.
func ValidateSchedule(schedule, timeZone string) *errors.APIError {
The rest of the scheduled-run code should be in this package, specifically the scheduler currently in the controller.
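For context on what an edge check like ValidateSchedule does, here is a dependency-free sketch. The PR validates with robfig/cron; this stand-in only checks that the schedule has five whitespace-separated fields (an assumption about the expected format, and far weaker than real cron parsing) and that the time zone loads via the standard library. It returns a plain `error` rather than the handler's `*errors.APIError`.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// validateSchedule is a simplified stand-in for the handler's ValidateSchedule.
// It does NOT parse cron syntax; it only counts fields and resolves the zone.
func validateSchedule(schedule, timeZone string) error {
	if n := len(strings.Fields(schedule)); n != 5 {
		return fmt.Errorf("schedule %q has %d fields, want 5", schedule, n)
	}
	// time.LoadLocation treats "" and "UTC" as UTC and returns an error for
	// names it cannot find in the time zone database.
	if _, err := time.LoadLocation(timeZone); err != nil {
		return fmt.Errorf("invalid time zone %q: %w", timeZone, err)
	}
	return nil
}

func main() {
	fmt.Println(validateSchedule("*/5 * * * *", "UTC"))
	fmt.Println(validateSchedule("* * *", "UTC"))
	fmt.Println(validateSchedule("0 9 * * 1", "Not/AZone"))
}
```

Wherever this logic ends up living, keeping it callable from both the HTTP handler and the controller would let the two layers share one definition of a valid schedule.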
Hi @EItanya, thanks for the dedicated review.