feat(monitoring): add operator metrics and tenant health polling #138

Merged
GatewayJ merged 1 commit into rustfs:main from GatewayJ:analysis/logging-monitoring-gap-20260613 on Jun 14, 2026
@GatewayJ (Member)

Type of Change

  • New Feature
  • Bug Fix
  • Documentation
  • Performance Improvement
  • Test/CI
  • Refactor
  • Other:

Related Issues

N/A

Summary of Changes

Adds production-oriented monitoring for the RustFS Operator and Console, modeled on the storage health visibility that MinIO Operator provides.

  • Adds Prometheus text exposition for reconcile count, duration, errors, requeues, in-flight work, leader state, STS requests, HTTP requests, and tenant storage health.
  • Adds an operator observability server with /metrics, /healthz, and a /readyz endpoint backed by a real check against the Kubernetes control plane.
  • Adds tenant storage health polling through RustFS admin /rustfs/admin/v3/info, including online/offline/healing drives, raw capacity/usage, object usage, write quorum, and health gauges.
  • Adds Console /metrics and initializes tracing for the Console server path.
  • Adds Helm values, metrics Service, optional ServiceMonitor resources, optional PrometheusRule alerts, and dev metrics Service manifests.
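The Prometheus text exposition mentioned above is a line-oriented format: each metric family gets a `# HELP` and a `# TYPE` line, followed by one `name{labels} value` line per sample. The std-only sketch below shows roughly what such a renderer does. It is not the operator's actual code; the `Sample` and `MetricKind` types and the `render`, `escape_label`, `escape_help`, and `format_value` functions are hypothetical. Only the metric names `rustfs_operator_reconcile_inflight` and `rustfs_operator_reconcile_requeues_total` come from this PR.

```rust
use std::fmt::Write;

/// Metric type as written on the `# TYPE` line (hypothetical helper type).
#[derive(Clone, Copy)]
enum MetricKind {
    Counter,
    Gauge,
}

impl MetricKind {
    fn as_str(self) -> &'static str {
        match self {
            MetricKind::Counter => "counter",
            MetricKind::Gauge => "gauge",
        }
    }
}

/// One sample of a metric family (hypothetical type).
struct Sample<'a> {
    name: &'a str,
    help: &'a str,
    kind: MetricKind,
    labels: &'a [(&'a str, &'a str)],
    value: f64,
}

/// HELP text escapes only backslash and newline.
fn escape_help(h: &str) -> String {
    h.replace('\\', "\\\\").replace('\n', "\\n")
}

/// Label values escape backslash, double quote, and newline.
fn escape_label(v: &str) -> String {
    v.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// Rust prints infinity as "inf"; the text format expects "+Inf"/"-Inf".
fn format_value(v: f64) -> String {
    if v.is_nan() {
        "NaN".to_string()
    } else if v == f64::INFINITY {
        "+Inf".to_string()
    } else if v == f64::NEG_INFINITY {
        "-Inf".to_string()
    } else {
        v.to_string()
    }
}

/// Renders samples; samples of the same family must be adjacent.
fn render(samples: &[Sample<'_>]) -> String {
    let mut out = String::new();
    let mut last_name: &str = "";
    for s in samples {
        // Emit HELP/TYPE once, when a new family starts.
        if s.name != last_name {
            writeln!(out, "# HELP {} {}", s.name, escape_help(s.help)).unwrap();
            writeln!(out, "# TYPE {} {}", s.name, s.kind.as_str()).unwrap();
            last_name = s.name;
        }
        out.push_str(s.name);
        if !s.labels.is_empty() {
            let parts: Vec<String> = s
                .labels
                .iter()
                .map(|(k, v)| format!("{}=\"{}\"", k, escape_label(v)))
                .collect();
            write!(out, "{{{}}}", parts.join(",")).unwrap();
        }
        writeln!(out, " {}", format_value(s.value)).unwrap();
    }
    out
}
```

Because HELP/TYPE are emitted only when the name changes, callers must group samples by family. A production operator would more likely build on an existing crate such as `prometheus` or `prometheus-client` than hand-roll the format; the sketch only makes the wire format concrete.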

Checklist

  • I have read and followed the CONTRIBUTING.md guidelines
  • Passed make pre-commit (fmt-check + clippy + test + console-lint + console-fmt-check)
  • Added/updated necessary tests
  • Documentation updated (if needed)
  • CHANGELOG.md updated under [Unreleased] (N/A: this repository does not currently include CHANGELOG.md)
  • CI/CD passed (if applicable)

Impact

  • Breaking change (CRD/API compatibility)
  • Requires doc/config/deployment update
  • Other impact: exposes a new operator metrics port (8080) by default, plus optional Prometheus Operator resources.

Verification

cargo check
cargo test
make pre-commit

Additional Notes

  • Controller queue activity is represented by rustfs_operator_reconcile_inflight and rustfs_operator_reconcile_requeues_total, because kube-runtime does not expose its internal queue depth through a stable public API.
  • helm template was not run locally because helm is not installed in this environment.
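Because the in-flight gauge and requeue counter stand in for queue depth, the gauge must be restored on every exit path of a reconcile, not only the happy path. One common Rust pattern for that is an RAII guard that increments on entry and decrements in `Drop`. The sketch below assumes plain atomics; `ReconcileMetrics`, `InflightGuard`, `Outcome`, and `reconcile_once` are hypothetical names, not the operator's implementation.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical holder for the two metrics named in the PR notes.
#[derive(Default)]
struct ReconcileMetrics {
    /// Backs `rustfs_operator_reconcile_inflight` (gauge).
    inflight: AtomicI64,
    /// Backs `rustfs_operator_reconcile_requeues_total` (counter).
    requeues_total: AtomicU64,
}

/// Increments the in-flight gauge when created and decrements it when dropped,
/// so early returns, `?`, and unwinding panics all restore the gauge.
struct InflightGuard {
    metrics: Arc<ReconcileMetrics>,
}

impl InflightGuard {
    fn enter(metrics: Arc<ReconcileMetrics>) -> Self {
        metrics.inflight.fetch_add(1, Ordering::Relaxed);
        InflightGuard { metrics }
    }
}

impl Drop for InflightGuard {
    fn drop(&mut self) {
        self.metrics.inflight.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Hypothetical reconcile result; a real controller would return a kube-runtime `Action`.
#[derive(Debug, PartialEq)]
enum Outcome {
    Done,
    Requeue,
}

/// Hypothetical reconcile wrapper; `needs_requeue` stands in for real decision logic.
fn reconcile_once(metrics: &Arc<ReconcileMetrics>, needs_requeue: bool) -> Outcome {
    let _guard = InflightGuard::enter(Arc::clone(metrics));
    if needs_requeue {
        metrics.requeues_total.fetch_add(1, Ordering::Relaxed);
        return Outcome::Requeue; // `_guard` is dropped here, decrementing the gauge
    }
    Outcome::Done
}
```

Binding the guard as `let _guard` rather than `let _` matters: `let _ = ...` drops the value immediately, which would decrement the gauge before any reconcile work runs. `Ordering::Relaxed` is sufficient here because the counters are independent metrics and do not synchronize other memory.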

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b87bd979d


Comment thread src/tenant_monitor.rs
@GatewayJ force-pushed the analysis/logging-monitoring-gap-20260613 branch from 7b87bd9 to eabc21f on June 14, 2026 10:28
@GatewayJ added this pull request to the merge queue on Jun 14, 2026
Merged via the queue into rustfs:main with commit b8c5342 on Jun 14, 2026
2 checks passed