feat: report /metrics for the OpenXLA serve path (#449 M3 Stage 2d) by inureyes · Pull Request #485 · lablup/mlxcel

inureyes · 2026-06-29T17:42:58Z

Summary

The XLA serve worker received the batch_metrics / batch_observability handles but populated neither, so /metrics reported all zeros for OpenXLA serving (active slots, queue depth, sequences, token throughput) even under load. This makes the path observable, mirroring how the MLX BatchScheduler populates the same metrics. Part of #449 M3 Stage 2d.

Stacked on #484 (sharded loading) for a clean incremental diff; this PR only touches the server worker plumbing.

What changed

XlaServeWorker now holds both handles and populates them:

- `BatchMetrics`: the active-count and queue-depth gauges are updated on each serve iteration (from the engine's `active_len` / `pending_len`), plus a per-sequence completion with the generated-token count.
- `BatchObservability`: `record_prefill_start` on admit (sequences started + prompt tokens), `record_decode_step` per pump with the step's token count (decode tokens + steps), and `record_sequence_completed` on finish. The cache-pool / paged gauges stay zero (this path has neither; `slots_available` already conveys the live batch size).
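The recording points listed above can be sketched with a toy serve loop. This is a minimal illustration, not the code in this PR: `ToyEngine`, `serve_iteration`, the struct fields, and every method signature below are hypothetical stand-ins, and only the names `BatchMetrics`, `BatchObservability`, `record_prefill_start`, `record_decode_step`, `record_sequence_completed`, `active_len`, and `pending_len` come from the description.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the gauges and per-sequence totals in `BatchMetrics`.
#[derive(Debug, Default)]
pub struct BatchMetrics {
    pub active: usize,
    pub queue_depth: usize,
    pub completed_tokens: Vec<usize>,
}

/// Hypothetical stand-in for the counters in `BatchObservability`.
#[derive(Debug, Default)]
pub struct BatchObservability {
    pub sequences_started: u64,
    pub sequences_completed: u64,
    pub prefill_tokens: u64,
    pub decode_tokens: u64,
    pub decode_steps: u64,
}

impl BatchObservability {
    /// Called on admit: one more sequence started, plus its prompt tokens.
    pub fn record_prefill_start(&mut self, prompt_tokens: usize) {
        self.sequences_started += 1;
        self.prefill_tokens += prompt_tokens as u64;
    }
    /// Called once per pump with the number of tokens that step produced.
    pub fn record_decode_step(&mut self, tokens: usize) {
        self.decode_steps += 1;
        self.decode_tokens += tokens as u64;
    }
    /// Called when a sequence finishes.
    pub fn record_sequence_completed(&mut self) {
        self.sequences_completed += 1;
    }
}

/// Toy engine (assumption): a request is (prompt_tokens, max_new_tokens).
pub struct ToyEngine {
    pending: VecDeque<(usize, usize)>,
    active: Vec<(usize, usize)>, // (generated so far, target)
    max_slots: usize,
}

impl ToyEngine {
    pub fn new(max_slots: usize, requests: &[(usize, usize)]) -> Self {
        Self {
            pending: requests.iter().copied().collect(),
            active: Vec::new(),
            max_slots,
        }
    }
    pub fn active_len(&self) -> usize {
        self.active.len()
    }
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}

/// One serve iteration: admit, pump, retire finished sequences, refresh gauges.
pub fn serve_iteration(
    engine: &mut ToyEngine,
    metrics: &mut BatchMetrics,
    obs: &mut BatchObservability,
) {
    // Admit pending requests while slots are free.
    while engine.active.len() < engine.max_slots {
        match engine.pending.pop_front() {
            Some((prompt_tokens, target)) => {
                obs.record_prefill_start(prompt_tokens);
                engine.active.push((0, target));
            }
            None => break,
        }
    }
    // Pump: in this toy, every active sequence emits exactly one token per step.
    let step_tokens = engine.active.len();
    if step_tokens > 0 {
        for seq in engine.active.iter_mut() {
            seq.0 += 1;
        }
        obs.record_decode_step(step_tokens);
    }
    // Retire finished sequences and record their generated-token counts.
    let mut finished = Vec::new();
    engine.active.retain(|&(generated, target)| {
        if generated >= target {
            finished.push(generated);
            false
        } else {
            true
        }
    });
    for generated in finished {
        metrics.completed_tokens.push(generated);
        obs.record_sequence_completed();
    }
    // Gauges are refreshed from the engine on every iteration.
    metrics.active = engine.active_len();
    metrics.queue_depth = engine.pending_len();
}
```

Calling `serve_iteration` in a loop until both `active_len()` and `pending_len()` reach zero drains the toy engine. Its step accounting is a simplification (one token per active sequence per step), so it is not expected to reproduce the exact decode-step count of the real engine.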

Validation (E2E, CUDA on GB10)

With --metrics on Qwen2.5-0.5B, three concurrent /v1/completions (prompts of 5/4/5 tokens, 16 tokens each):

| metric | before | after |
| --- | --- | --- |
| `mlxcel_batch_sequences_started` | 0 | 3 |
| `mlxcel_batch_sequences_completed` | 0 | 3 |
| `mlxcel_batch_prefill_tokens_total` | 0 | 14 (5+4+5) |
| `mlxcel_batch_decode_tokens_total` | 0 | 48 (3×16) |
| `mlxcel_batch_decode_steps_total` | 0 | 15 (continuous batching) |
| `mlxcel_slots_available` / `mlxcel_queue_depth` | 4 / 0 | 4 / 0 (drained) |
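The arithmetic behind the counter rows can be checked mechanically. The sketch below assumes that `/metrics` returns Prometheus-style text with unlabeled `name value` samples, which this PR does not state; `parse_samples`, `expected_after_run`, and `mismatches` are hypothetical helpers written for illustration. Only the metric names and the expected totals come from the table.

```rust
use std::collections::HashMap;

/// Parse unlabeled `name value` samples, skipping blank lines and `#` comments.
/// Labeled samples (`name{...} value`) are stored under their full text as the key.
pub fn parse_samples(body: &str) -> HashMap<String, f64> {
    let mut out = HashMap::new();
    for line in body.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let mut parts = line.split_whitespace();
        if let (Some(name), Some(value)) = (parts.next(), parts.next()) {
            if let Ok(v) = value.parse::<f64>() {
                out.insert(name.to_string(), v);
            }
        }
    }
    out
}

/// Expected counters after `prompts.len()` requests that each generate
/// `new_tokens_each` tokens, starting from a freshly started server.
pub fn expected_after_run(prompts: &[u64], new_tokens_each: u64) -> Vec<(&'static str, f64)> {
    let n = prompts.len() as u64;
    vec![
        ("mlxcel_batch_sequences_started", n as f64),
        ("mlxcel_batch_sequences_completed", n as f64),
        ("mlxcel_batch_prefill_tokens_total", prompts.iter().sum::<u64>() as f64),
        ("mlxcel_batch_decode_tokens_total", (n * new_tokens_each) as f64),
    ]
}

/// Return one message per counter that is missing or differs from the expectation.
pub fn mismatches(body: &str, prompts: &[u64], new_tokens_each: u64) -> Vec<String> {
    let samples = parse_samples(body);
    expected_after_run(prompts, new_tokens_each)
        .into_iter()
        .filter_map(|(name, want)| match samples.get(name) {
            Some(got) if *got == want => None,
            Some(got) => Some(format!("{name}: got {got}, want {want}")),
            None => Some(format!("{name}: missing")),
        })
        .collect()
}
```

For the run above, `mismatches(body, &[5, 4, 5], 16)` would be empty when the scraped counters match the "after" column. `mlxcel_batch_decode_steps_total` is deliberately left out of the check, since its value depends on how the engine batches sequences into steps.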

The MLX serving path is unchanged.

Refs #449.

The XLA serve worker received the `batch_metrics` / `batch_observability` handles but populated neither, so the `/metrics` endpoint reported all zeros for OpenXLA serving (active slots, queue depth, sequences, token throughput) even under load. The path was operationally blind.

Thread both handles into `XlaServeWorker` and populate them the same way the MLX `BatchScheduler` does:

- `BatchMetrics`: the active-count and queue-depth gauges each serve iteration (from the engine's `active_len` / `pending_len`), and a per-sequence completion with its generated-token count.
- `BatchObservability`: `record_prefill_start` on admit (sequences started + prompt tokens), `record_decode_step` per pump with the step's token count (decode tokens + steps), and `record_sequence_completed` on finish. The cache-pool / paged gauges stay zero (this path has neither; `slots_available` already conveys the live batch size).

Validation (E2E, CUDA on GB10): with `--metrics` on Qwen2.5-0.5B, three concurrent `/v1/completions` (prompts of 5/4/5 tokens, 16 tokens each) move the metrics exactly as expected: `sequences_started` and `sequences_completed` 0 -> 3, `prefill_tokens` 0 -> 14 (5+4+5), `decode_tokens` 0 -> 48 (3x16), `decode_steps` 15 (continuous batching), and the `slots_available` / `queue_depth` gauges return to 4 / 0 once drained.

The MLX serving path is unchanged.

inureyes added the area:architecture, priority:medium, status:done, and type:enhancement labels Jun 29, 2026

inureyes force-pushed the feat/449-xla-serve-metrics branch from 564a460 to e39c3f0 June 29, 2026 17:48

inureyes changed the base branch from feat/449-xla-sharded-safetensors to main June 29, 2026 17:48

inureyes force-pushed the feat/449-xla-serve-metrics branch from e39c3f0 to f68373c June 29, 2026 17:50

inureyes merged commit b6b1a0c into main Jun 30, 2026
5 checks passed

inureyes deleted the feat/449-xla-serve-metrics branch June 30, 2026 12:29
