Skip to content

Add unified parse_json runtime type checker (#3285) [draft / proof-of-direction]#4063

Draft
vup903 wants to merge 4 commits into
zarr-developers:mainfrom
vup903:fix-issue-3285
Draft

Add unified parse_json runtime type checker (#3285) [draft / proof-of-direction]#4063
vup903 wants to merge 4 commits into
zarr-developers:mainfrom
vup903:fix-issue-3285

Conversation

@vup903

@vup903 vup903 commented Jun 15, 2026

Copy link
Copy Markdown

Summary

This is a draft / proof-of-direction PR for #3285, opened to show the intended approach and get early feedback before migrating all call sites — per @d-v-b's request in the issue thread.

It introduces a single unified runtime type checker, parse_json(value, type_annotation), that consolidates the scattered per-field parse_* helpers (there are currently ~42 of them across src/zarr). The function validates a JSON-decoded value against a type annotation and returns data assignable to that annotation (coercing where sensible, e.g. parse_json([1, 2, 3], tuple[int, int, int]) -> (1, 2, 3)), or raises a clear error.

What's in this draft

  • src/zarr/core/json_parse.pyparse_json plus sub-parsers, scoped to the JSON-relevant categories named in the issue:
    • primitives: None, str, int, float, bool (with a bool-vs-int-safe check: True is not accepted for int, and 1 is not accepted for bool)
    • Literal[...]
    • unions / Optional
    • fixed and variadic tuple
    • Sequence[T] / list[T] → returned as a tuple
    • Mapping[str, T] / dict[str, T]
    • TypedDict
  • tests/test_json_parse.py — 94 tests covering every category, incl. the bool/int edge cases, coercion, unions, nested TypedDict.
  • Pilot migration of three representative helpers to delegate to parse_json, with public signatures and behavior unchanged:
    • core/common.py::parse_orderparse_json(data, Literal["C", "F"])
    • core/common.py::parse_boolparse_json(data, bool)
    • core/metadata/v3.py::parse_zarr_formatparse_json(data, Literal[3]), re-wrapping the error as MetadataValidationError to preserve the original exception type and message.

Focused suites pass locally (Python 3.12, test.py3.12-optional env): test_common, test_config, test_metadata, and test_json_parse430 passed.

Open questions for maintainers (direction feedback)

  1. Module name/locationsrc/zarr/core/json_parse.py is a placeholder; happy to move it (e.g. into a typing/metadata utility module).
  2. Exceptionsparse_json raises plain ValueError / TypeError since it's field-agnostic; field-specific callers (like parse_zarr_format) re-wrap into their domain error. Does that split match what you'd want?
  3. TypedDict + NotRequired — the current _parse_typeddict reads __required_keys__ / __optional_keys__ directly. Under from __future__ import annotations with class-syntax TypedDicts, typing_extensions can't see the NotRequired wrapper at class-creation time, so an optional key can be mis-classified as required. The robust fix is to resolve hints via typing_extensions.get_type_hints(..., include_extras=True) and inspect for Required/NotRequired. I've left this for the follow-up once the overall direction is confirmed, since none of the three pilot helpers are TypedDicts.

If the direction looks right, I'll migrate the remaining parse_* call sites incrementally in follow-up commits.

Closes #3285 once fully migrated (draft for now).

vup903 and others added 4 commits June 14, 2026 22:51
Introduce zarr.core.json_parse.parse_json, a single type-annotation-driven validator that consolidates the scattered per-field parse_* helpers. Handles primitives, Literal, unions/Optional, fixed and variadic tuples, Sequence/list (coerced to tuple), Mapping/dict, and TypedDict, with a bool-vs-int safe primitive check. Adds tests/test_json_parse.py (94 tests) and a changelog fragment. No existing call sites migrated yet; this is the proof-of-direction module.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r-developers#3285)

Pilot migration delegating three representative helpers to the unified parse_json validator. Public signatures and return types are unchanged. parse_zarr_format re-wraps parse_json's ValueError/TypeError as MetadataValidationError with the original message to preserve observable behavior. parse_json is imported function-locally to avoid circular imports. Focused suites green: test_common/test_config/test_metadata/test_json_parse = 430 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ns (zarr-developers#3285)

Apply ruff check/format to satisfy the Lint CI hook. Keep typing.Union/Optional spellings in tests (with noqa) to cover that origin path alongside X | Y. Rework _parse_typeddict to derive required/optional from get_type_hints(include_extras=True) + __total__ instead of __required_keys__, so class-syntax NotRequired is detected even when 'from __future__ import annotations' stringizes the hints (as in zarr's metadata modules). test_json_parse + test_common + test_metadata = 393 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lopers#3285)

get_origin(hint) is Required/NotRequired tripped mypy's comparison-overlap and unreachable checks; typing origin as Any keeps the runtime check while satisfying mypy. Full 'uv run --frozen mypy' is clean (190 files); ruff check/format clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

unified runtime type checking for our json data

1 participant