fix(eval): improve rubric text normalization for judge-garbled output by tottenjordan · Pull Request #6080 · google/adk-python

tottenjordan · 2026-06-11T13:12:25Z

Summary

`_normalize_text` currently only does `.lower().strip()`. When the judge model garbles a rubric (markdown bullets, smart quotes, bold formatting, extra whitespace), the exact match against the rubric text fails. The rubric's score is then silently dropped, and only a warning is logged.

Changes:

- Replace `_normalize_text` with NFKC Unicode normalization, smart-quote/dash translation, and markdown artifact stripping.
- Add a substring fallback with a uniqueness guard to `convert_auto_rater_response_to_score`. A match is accepted only when exactly one rubric candidate matches, which prevents ambiguous cross-matching.
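The normalization steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the name `normalize_rubric_text`, the character map, and the regexes are hypothetical. Only the steps themselves (NFKC, smart-quote/dash translation, markdown stripping, whitespace collapsing, lowercasing) come from the description.

```python
import re
import unicodedata

# Hypothetical map from "smart" punctuation to ASCII; the PR's exact table may differ.
_SMART_CHARS = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2022": "-",  # bullet, treated like a markdown "-" list marker
})

# A leading markdown list marker such as "- ", "* ", "+ ", "1. " or "2) ".
_LIST_MARKER = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
# Asterisk bold/italic markers and backtick code markers.
_MARKUP = re.compile(r"[*`]+")
_WHITESPACE = re.compile(r"\s+")


def normalize_rubric_text(text: str) -> str:
    """Normalize rubric text so judge-garbled variants compare equal (sketch)."""
    text = unicodedata.normalize("NFKC", text)  # e.g. "…" -> "...", "é" stays "é"
    text = text.translate(_SMART_CHARS)          # smart quotes/dashes -> ASCII
    text = _LIST_MARKER.sub("", text)            # drop one leading list marker
    text = _MARKUP.sub("", text)                 # drop **bold** / `code` markers
    text = text.strip().strip("\"'")             # drop surrounding quotes
    text = _WHITESPACE.sub(" ", text)            # collapse runs of whitespace
    return text.strip().lower()
```

Because both the rubric text and the judge's output pass through the same function, lossy steps such as dropping every `*` only need to be applied consistently on both sides, not perfectly.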

Garbling patterns handled:

| Input | Normalized | Match |
|---|---|---|
| `- The response correctly uses tools` | `the response correctly uses tools` | ✅ |
| `* The response correctly uses tools` | `the response correctly uses tools` | ✅ |
| `“The response correctly uses tools”` (smart quotes) | `the response correctly uses tools` | ✅ |
| `— The response correctly uses tools` (em dash) | `the response correctly uses tools` | ✅ |
| `– The response correctly uses tools` (en dash) | `the response correctly uses tools` | ✅ |
| `• The response correctly uses tools` (Unicode bullet) | `the response correctly uses tools` | ✅ |
| `The response correctly uses tools` (double spaces) | `the response correctly uses tools` | ✅ |
| `The response… uses tools` (ellipsis) | `the response... uses tools` | ✅ |
| `réponse` (accented chars) | `réponse` (preserved) | ✅ |

Following @surajksharma07's suggestion in #6072, this uses NFKC normalization instead of ASCII-ignore encoding, so non-English rubrics keep their accented and non-Latin characters. It also adds a uniqueness guard to the substring fallback.
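To illustrate why NFKC is preferable to ASCII-ignore, here is a minimal comparison using only the standard library. The French rubric string is a made-up example, and `ascii_ignore` / `nfkc` are hypothetical helper names, not functions from the PR.

```python
import unicodedata


def ascii_ignore(text: str) -> str:
    # Encoding to ASCII with errors="ignore" silently drops every non-ASCII character.
    return text.encode("ascii", "ignore").decode("ascii")


def nfkc(text: str) -> str:
    # NFKC folds compatibility characters (e.g. "…" -> "...") but keeps letters such as "é".
    return unicodedata.normalize("NFKC", text)


rubric = "La réponse utilise correctement les outils"  # hypothetical French rubric
print(ascii_ignore(rubric))  # La rponse utilise correctement les outils
print(nfkc(rubric))          # La réponse utilise correctement les outils
print(repr(ascii_ignore("\u2026")), repr(nfkc("\u2026")))  # '' '...'
```

ASCII-ignore both corrupts the accented word and deletes the ellipsis outright, whereas NFKC keeps the letter and rewrites the ellipsis into a form that still matches.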

Validation

- Unit tests: 46 tests pass (44 existing + 2 new) in `test_rubric_based_evaluator.py`.
- E2E pipeline: ran the full GEPA optimization pipeline (`gepa-run-8fb68a8f52-20260611-115752`) with 4 rubric-based criteria and a `gemini-2.5-pro` judge. No "not found in rubrics" warnings appeared in any generation.

Test plan

- `pytest tests/unittests/evaluation/test_rubric_based_evaluator.py -v`: all 46 tests pass
- Parametrized `TestNormalizeText` covers every garbling pattern from the issue
- `TestSubstringFallbackUniquenessGuard` verifies that a unique match is accepted and an ambiguous match is rejected
- All existing tests are unchanged and passing
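The uniqueness guard exercised by `TestSubstringFallbackUniquenessGuard` can be sketched as below. This is an illustration under assumptions, not the PR's `convert_auto_rater_response_to_score`: the helper name `match_rubric`, the dict of already-normalized rubric text, and checking containment in both directions are all hypothetical choices.

```python
from typing import Mapping, Optional


def match_rubric(judge_text: str, rubrics: Mapping[str, str]) -> Optional[str]:
    """Map a judge's rubric text to a rubric id, or None if there is no unique match.

    Hypothetical helper: `rubrics` maps already-normalized rubric text to a
    rubric id, and `judge_text` is assumed to be normalized the same way.
    """
    if not judge_text:
        return None  # an empty string is a substring of everything
    # 1. Exact match after normalization.
    if judge_text in rubrics:
        return rubrics[judge_text]
    # 2. Substring fallback, checked in both directions.
    candidates = [
        rubric_id
        for rubric_text, rubric_id in rubrics.items()
        if rubric_text in judge_text or judge_text in rubric_text
    ]
    # 3. Uniqueness guard: accept only when exactly one rubric matched.
    return candidates[0] if len(candidates) == 1 else None


rubrics = {
    "the response correctly uses tools": "tool_use",
    "the response is concise": "concise",
    "the response is concise and polite": "concise_polite",
}
print(match_rubric("the response correctly uses tools as requested", rubrics))  # tool_use
print(match_rubric("response is concise", rubrics))  # None (two candidates)
```

Returning `None` on ambiguity falls back to the drop-with-a-warning path described in the summary instead of attributing a score to the wrong rubric.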

Replace _normalize_text's simple lower().strip() with NFKC unicode normalization, smart-quote/dash translation, and markdown artifact stripping. Add substring fallback with uniqueness guard to convert_auto_rater_response_to_score for cases where normalization alone isn't sufficient. Fixes google#6072

tottenjordan · 2026-06-11T13:16:22Z

@surajksharma07 PR is up per your suggestion in #6072. Includes the NFKC normalization, smart-char mapping, and uniqueness guard on the substring fallback. 46 tests pass (44 existing + 2 new).

rohityan · 2026-06-11T22:03:51Z

/adk-pr-analyze

rohityan · 2026-06-11T22:15:02Z

Hi @tottenjordan, thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the formatting errors.

surajksharma07 mentioned this pull request Jun 11, 2026

RubricBasedEvaluator _normalize_text too basic — fails on judge model markdown output #6072

Open

rohityan self-assigned this Jun 11, 2026

rohityan added 2 commits June 11, 2026 14:43

Merge branch 'main' into fix/rubric-text-normalization

2c60e25

Merge branch 'main' into fix/rubric-text-normalization

5cb273e

rohityan added the eval [Component] This issue is related to evaluation label Jun 11, 2026

rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Jun 11, 2026

tottenjordan wants to merge 3 commits into google:main from tottenjordan:fix/rubric-text-normalization

tottenjordan commented Jun 11, 2026

Uh oh!

tottenjordan commented Jun 11, 2026

Uh oh!

rohityan commented Jun 11, 2026

Uh oh!

rohityan commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tottenjordan commented Jun 11, 2026

Summary

Validation

Test plan

Uh oh!

tottenjordan commented Jun 11, 2026

Uh oh!

rohityan commented Jun 11, 2026

Uh oh!

rohityan commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants