Migrate 4 API versions to shared arm_ml_service client #47664

Open: saanikaguptamicrosoft wants to merge 19 commits into Azure:main from saanikaguptamicrosoft:saanika/migrate-2022-10-and-smoke-coverage

saanikaguptamicrosoft (Contributor) commented Jun 25, 2026

Notes

  • Reuse the existing arm_ml_service generated client, which is on stable version 2025-12-01, instead of importing from the per-version restclient packages. Versions migrated:
    1. 2022-10-01-preview
    2. 2022-12-01-preview
    3. 2023-02-01-preview
    4. 2023-06-01-preview
  • Added smoke serialization tests that can also be reused for future migration work.

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Add variation cases per migrated entity to catch value/path-dependent wire
regressions in future migrations: command (minimal, serverless, aml/user/managed
identity, local compute, docker_args list, pytorch, tensorflow); sweep (median +
truncation policies, grid/bayesian sampling, log-uniform/randint/quniform search
space); spark (dynamic allocation + identity); import (file source); schedule
(recurrence trigger + embedded spark job); finetuning (custom minimal, aoai
minimal). Baselines captured from main; suite 46 passed.

Compute entities span the still-to-migrate API versions (v2022-10/12, v2023-08),
so this guards their request-wire before those migrations. Adds per-family builder
modules + auto-discovery registry (_registry.all_builders) so new families need no
generator edit. Baselines captured from main. Suite 60 passed.
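The auto-discovery idea described in this commit can be sketched roughly as follows. This is a minimal illustration rather than the PR's `_registry.py`: the helper name `collect_builders`, the stand-in modules, and the choice to raise on a duplicate case name are all assumptions made for the example.

```python
from types import ModuleType
from typing import Callable, Dict, Iterable


def collect_builders(modules: Iterable[ModuleType]) -> Dict[str, Callable[[], object]]:
    """Merge every module-level dict whose name ends in ``_BUILDERS``."""
    merged: Dict[str, Callable[[], object]] = {}
    for module in modules:
        for attr_name, value in vars(module).items():
            if not attr_name.endswith("_BUILDERS") or not isinstance(value, dict):
                continue
            for case_name, builder in value.items():
                # Assumption: a repeated case name is treated as an error here.
                if case_name in merged:
                    raise ValueError(f"duplicate builder case name: {case_name}")
                merged[case_name] = builder
    return merged


# Hypothetical stand-ins for two builder modules.
jobs = ModuleType("_builders")
jobs.JOB_BUILDERS = {"command_minimal": lambda: {"jobType": "Command"}}
compute = ModuleType("_builders_compute")
compute.COMPUTE_BUILDERS = {"aml_compute": lambda: {"computeType": "AmlCompute"}}

ALL_BUILDERS = collect_builders([jobs, compute])
```

In a real package the module list would more likely come from `pkgutil.iter_modules` plus `importlib.import_module`, filtered by a `_builders` name prefix, so that adding a new family only requires dropping in a new module.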

Copilot AI left a comment

Pull request overview

This PR expands the offline wire-serialization smoke suite for azure-ai-ml (tests/smoke_serialization). The suite captures the canonical JSON body each entity PUTs on the wire (via entity._to_rest_object()) and asserts it is byte-identical to a committed baseline captured from main, so an upcoming REST-client migration (e.g. the 2022-10 API surface) can be proven to preserve the wire. This PR adds many new entity families and CommandJob/SweepJob/SparkJob/ImportJob/Schedule/FineTuning variations to that coverage, and refactors baseline generation to auto-discover builder modules.
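The baseline comparison described above might look something like the sketch below. It assumes that "canonical" means sorted keys with fixed indentation; the helper names `canonical_json` and `assert_wire_matches` and the demo file name are hypothetical, and the real suite obtains the body from `entity._to_rest_object()` rather than from a literal dict.

```python
import json
import tempfile
from pathlib import Path


def canonical_json(body: dict) -> str:
    """Serialize with sorted keys and fixed indentation so output is deterministic."""
    return json.dumps(body, sort_keys=True, indent=2) + "\n"


def assert_wire_matches(body: dict, baseline_path: Path) -> None:
    """Fail if the serialized body differs from the committed baseline text."""
    expected = baseline_path.read_text(encoding="utf-8")
    actual = canonical_json(body)
    assert actual == expected, f"wire drift for {baseline_path.name}"


# Demo with a temporary directory standing in for expected_wire/.
with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "command_minimal.json"
    baseline.write_text(
        canonical_json({"properties": {"jobType": "Command"}, "name": "demo"}),
        encoding="utf-8",
    )
    # Key order of the in-memory dict does not matter after canonicalization.
    assert_wire_matches({"name": "demo", "properties": {"jobType": "Command"}}, baseline)
    WIRE_OK = True
```

Comparing the serialized text rather than parsed objects means that even a change in value type or formatting shows up as a failure.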

Changes:

  • Adds new deterministic builder modules and parametrized test_*_wire.py tests for workspace/registry, feature-store/data-import, online/batch endpoints, datastores, computes, components, AutoML jobs, and assets, plus their committed expected_wire/*.json baselines.
  • Extends _builders.py with additional CommandJob (identity/distribution/local/serverless/docker-args), SweepJob (median/truncation policies), SparkJob (dynamic allocation), ImportJob (file source), Schedule (recurrence), and finetuning minimal variants.
  • Introduces _registry.all_builders() auto-discovery of every _builders*.py module and rewrites regenerate_expected_wire.py to use it and to aggregate/report failures.

Reviewed changes

Copilot reviewed 72 out of 72 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| _registry.py | New auto-discovery of all *_BUILDERS dicts across _builders*.py; dedupes case names. |
| _builders.py | Adds CommandJob/SweepJob/SparkJob/ImportJob/Schedule/FineTuning variants; drops unused Environment import; adds distribution/identity/search-space/trigger imports. |
| regenerate_expected_wire.py | Switches to all_builders(); collects per-case failures and returns non-zero on failure. |
| _builders_workspace.py | New Workspace/Registry builders. |
| _builders_featurestore.py | New FeatureSet/FeatureStoreEntity/DataImport builders. |
| _builders_endpoint.py | New managed online + batch endpoint builders via a _RestAdapter for location-bound rest methods. |
| _builders_datastore.py | New blob/file/ADLS gen1+gen2/OneLake datastore builders. |
| _builders_compute.py | New AmlCompute/ComputeInstance/Kubernetes/SynapseSpark/VM compute builders. |
| _builders_component.py | New Command/Spark component builders (two docstring inaccuracies flagged). |
| _builders_automl.py | New tabular/nlp/image AutoML builders sharing module-level train/valid inputs. |
| _builders_asset.py | New Model/Environment/Data/Code asset builders. |
| test_*_wire.py (8 new) | Parametrized serialize-guard + wire-equivalence tests per family, matching the existing two-check pattern. |
| expected_wire/*.json (40+ new) | Committed wire baselines, one per builder case; all builder case names have a matching baseline. |

Comment thread sdk/ml/azure-ai-ml/tests/smoke_serialization/_builders_component.py Outdated
Comment thread sdk/ml/azure-ai-ml/tests/smoke_serialization/_builders_component.py Outdated

Flip the compute entity family (Kubernetes, SynapseSpark, VirtualMachine,
AmlCompute, ComputeInstance, and shared identity/credentials) from the
per-version msrest restclients (v2022_10/v2022_12/v2023_08) to the unified
arm_ml_service hybrid client. These share ComputeResource and
IdentityConfiguration so they must convert together.

Wire output is verified byte-identical to the pre-migration main baseline via
the offline serialization smoke suite (120 assertions green).

Key arm-hybrid adaptations:
- Fields present on the wire (swagger) but not modeled as typed attributes on
  the arm models are set/read via mapping-key access instead of constructor
  args / attribute access:
  - ComputeInstance: enableRootAccess, releaseQuotaOnStop, enableOSPatching
  - AmlCompute: osType ("Linux"), createdOn read via .get()
  - CustomService: type and extra fields (no msrest additional_properties)
- Kubernetes/Synapse construct nested properties positionally instead of
  from_dict; SynapseSpark drops the redundant inner name (matches swagger).
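The mapping-key pattern in the list above can be illustrated with a toy model. The class below is a hypothetical stand-in, not one of the generated arm_ml_service models: it only assumes that a hybrid model behaves like a wire-keyed mapping with a few typed attributes layered on top.

```python
import json


class ComputeInstanceProps(dict):
    """Toy stand-in for a hybrid model: a wire dict with a few typed attributes."""

    _typed = {"vm_size": "vmSize"}  # python attribute name -> wire key

    def __init__(self, *, vm_size: str):
        super().__init__(vmSize=vm_size)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[self._typed[name]]
        except KeyError:
            raise AttributeError(name) from None


props = ComputeInstanceProps(vm_size="STANDARD_DS3_V2")
# enableRootAccess is not a constructor argument in this toy model,
# so it is written through its wire key instead.
props["enableRootAccess"] = True

WIRE = json.dumps(props, sort_keys=True)
```

Reads of such untyped fields go through `props.get("enableRootAccess")`, which returns `None` instead of raising when the key is absent.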

Tests:
- Update compute unit tests to the arm serialization path (SdkJSONEncoder
  instead of msrest Serializer) and arm in-memory representation (timedelta
  instead of ISO duration strings; mapping-key reads for untyped wire fields).
- Fix the kubernetes smoke builder to use realistic property keys and recapture
  its baseline from main.
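Holding durations as timedelta in memory means the serializer has to render them as strings on the way out. A minimal sketch of that idea follows, assuming a hypothetical `DurationEncoder` and an illustrative key `idleTimeout`; this is not the SDK's `SdkJSONEncoder`, whose exact output format is not shown in this PR.

```python
import json
from datetime import timedelta


def to_iso8601_duration(value: timedelta) -> str:
    """Format a non-negative timedelta as an ISO 8601 duration such as PT30M.

    Microseconds are ignored in this sketch.
    """
    if value < timedelta(0):
        raise ValueError("negative durations are not handled in this sketch")
    hours, remainder = divmod(value.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    date_part = f"{value.days}D" if value.days else ""
    time_part = "".join(
        f"{amount}{unit}"
        for amount, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if amount
    )
    if not date_part and not time_part:
        return "PT0S"
    return "P" + date_part + ("T" + time_part if time_part else "")


class DurationEncoder(json.JSONEncoder):
    """JSON encoder that renders timedelta values as ISO 8601 durations."""

    def default(self, o):
        if isinstance(o, timedelta):
            return to_iso8601_duration(o)
        return super().default(o)


BODY = json.dumps({"idleTimeout": timedelta(minutes=30)}, cls=DurationEncoder)
```

Unit tests that previously compared against an ISO string on the model would, under this arrangement, compare against a timedelta on the model and against the string only in the serialized body.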
The 2022-12-01-preview client had a single production consumer (AmlCompute),
which was migrated to arm_ml_service in the previous commit. Remove the now
unused generated client.

- Drop the unused mock_aml_services_2022_12_01_preview test fixture.
- Refresh stale v2022_12_01 doc comments in trigger.py (the actual Rest*Trigger
  models are imported from v2023_04_01_preview).

Flip all remaining v2022_10_01_preview imports to the shared arm_ml_service
hybrid models/client: registry entities, compute/datastore/schedule schemas,
data_transfer builder, code/registry operations, and the ServiceClient102022Preview
wiring in _ml_client.py.

Schema deltas preserved via wire-key assignment (verified against the
2022-10-01-preview swagger):
- DatastoreType.HDFS (@removed from arm enum) -> literal "Hdfs".
- UserCreatedAcrAccount / UserCreatedStorageAccount (@removed models) -> set
  userCreatedAcrAccount / userCreatedStorageAccount wire dicts directly.
- RegistryProperties.managed_resource_group_tags (@removed typed field, still in
  the 2022-10 contract) -> set managedResourceGroupTags wire key.

Updated registry unit tests to construct arm rest objects and read the untyped
user-created wire fields via mapping access.

All production and test consumers were migrated to arm_ml_service in the
previous commits. Remove the now unused generated client.

- Repoint compute operations unit test fixture to mock_aml_services_2023_08_01_preview
  (compute ops' primary client is v2023_08).
- Drop the unused mock_aml_services_2022_10_01_preview conftest fixture.
- Remove an unused ManagedIdentity import from test_spark_component_entity.

… client

Flip the v2023_02_01_preview consumers onto the shared arm_ml_service client and
remove the generated folder:
- sweep RandomSamplingAlgorithmRule/SamplingAlgorithmType (schema) -> arm (enum
  values identical).
- Notification NotificationSetting -> arm (gains an unused optional 'webhooks'
  field; wire byte-identical).
- inferencing_server models (Custom/Triton/Online inferencing, Route,
  OnlineInferenceConfiguration) re-sourced v2023_02 -> v2023_08 (attribute maps
  byte-identical); these are model_package/inferencing surfaces NOT present in
  arm_ml_service, so they consolidate onto v2023_08 with the rest of that family.
- Wire ServiceClient022023Preview as a 2023-02-01-preview partial of the arm
  client; repoint the conftest mock fixture to arm_ml_service.
- Refresh stale v2023_02 docstring type refs in compute/job operations.

Captures pre-migration MonitorSchedule wire baselines for data drift, data
quality, prediction drift, feature attribution, model performance, custom,
generation safety/quality, generation token statistics, and no-target-baseline.
out_of_the_box excluded (pre-existing _signals UnboundLocalError on no-signal
monitors). Suite: 138 assertions green.

… client

Migrate the monitoring subtree, schedule envelope, trigger, data-import,
credentials and compute-runtime off v2023_06_01_preview onto the shared
arm_ml_service client, keeping the serialized wire byte-identical (verified by
the offline monitoring/schedule smoke baselines).

Monitoring serialization:
- TrailingInputData uses arm RollingInputData with inputDataType overridden to
  "Trailing" (arm renamed the type to "Rolling"; the pinned 2023-06 contract
  still expects "Trailing").
- Signals present in arm (DataDrift/DataQuality/PredictionDrift/
  FeatureAttributionDrift/Custom) construct the arm model and set the fields arm
  dropped (mode/properties/dataSegment/modelType/workspaceConnection) via their
  camelCase wire keys.
- Signals and thresholds removed from arm (ModelPerformance, GenerationSafety
  Quality, GenerationTokenStatistics, MonitoringDataSegment,
  MonitoringWorkspaceConnection) emit plain wire dicts.
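The "Trailing" override in the first bullet amounts to building the rolling-window body and then rewriting its discriminator. A rough sketch, using plain dicts as stand-ins for the arm model; the function names and the keys other than `inputDataType` are illustrative assumptions, not taken from the PR.

```python
def rolling_input_data(uri: str, window_size: str, window_offset: str) -> dict:
    """Toy stand-in for a serialized rolling-window input (keys are illustrative)."""
    return {
        "inputDataType": "Rolling",
        "uri": uri,
        "windowSize": window_size,
        "windowOffset": window_offset,
    }


def trailing_input_data(uri: str, window_size: str, window_offset: str) -> dict:
    """Reuse the rolling shape but emit the discriminator the older contract expects."""
    body = rolling_input_data(uri, window_size, window_offset)
    body["inputDataType"] = "Trailing"
    return body


TRAILING = trailing_input_data("azureml:prod_data:1", "P7D", "P0D")
```

Everything except the discriminator value is left untouched, which is what keeps the rest of the payload identical to the renamed type.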

Schedule:
- trigger.py + JobSchedule envelope flip to arm (mixed-tree otherwise). SparkJob
  (still v2023_04 msrest) is embedded as its serialized wire dict.

Other consumers: data_import re-sourced to v2023_08 (DataImport/DatabaseSource/
FileSystemSource are absent from arm; byte-identical there); credentials and
compute_runtime flipped to arm; ServiceClient062023Preview wired to the arm
partial. Schedule trigger-once stays on the v2024_01 client (arm
SchedulesOperations has no trigger method).

Tests: updated monitor/schedule unit tests for the arm camelCase as_dict shape
and timedelta window_size; corrected two stale rest_json_configs fixtures
(bogus modelType, string samplingRate). Repointed the v2023_06 mock fixture to
arm_ml_service. Delete the now unused generated client.

-from azure.ai.ml._restclient.v2023_02_01_preview.models import Route as RestRoute
-from azure.ai.ml._restclient.v2023_02_01_preview.models import TritonInferencingServer as RestTritonInferencingServer
+from azure.ai.ml._restclient.v2023_08_01_preview.models import Route as RestRoute
+from azure.ai.ml._restclient.v2023_08_01_preview.models import TritonInferencingServer as RestTritonInferencingServer

Member


Shouldn't these be coming from the arm_ml_service namespace, as in other places?

Contributor Author


image

…alize, credentials mypy

Four migration regressions caught by CI e2e/Analyze (invisible to mocked UTs):

1. Compute read path assumed arm-hybrid mapping access, but the compute ops layer
   deserializes the real GET/list response with the v2023_08 msrest client:
   - AmlCompute.created_on: read createdOn from msrest additional_properties (with
     arm .get() fallback for the entity round-trip).
   - ComputeInstance enableRootAccess/releaseQuotaOnStop/enableOSPatching: new
     _read_optional_compute_prop helper reads the typed msrest attribute first, then
     the arm wire key. Fixes 'AmlCompute'/'ComputeInstanceProperties' object has no
     attribute 'get'.

2. ImportDataSchedule._to_rest_object built a msrest v2023_04 Schedule, but the
   schedule ops now serialize with the arm SdkJSONEncoder -> 'Object of type Schedule
   is not JSON serializable'. Migrated to the shared arm Schedule envelope; ImportData
   action/DataImport are arm-absent so emit the action as a JSON-direct wire dict; read
   path rehydrates the v2023_08 msrest DataImport. Wire verified byte-identical to main.

3. _credentials.py: RestManagedServiceIdentityConfiguration/
   RestUserAssignedIdentityConfiguration were bare re-aliases of imported names, which
   mypy treats as variables (19 'not valid as a type' errors). Annotated as TypeAlias.
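The dual read path from item 1 can be sketched as a small helper that tries a typed attribute first and then falls back to a wire key. This is only an approximation of the idea: the name `read_optional_prop`, its signature, and the stand-in objects are assumptions, and the PR's `_read_optional_compute_prop` may differ.

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_optional_prop(props: Any, attr_name: str, wire_key: str) -> Optional[Any]:
    """Read a typed attribute if present, else fall back to a mapping wire key."""
    value = getattr(props, attr_name, None)
    if value is not None:
        return value
    # Mapping-style objects expose .get(); attribute-only objects do not.
    if hasattr(props, "get"):
        return props.get(wire_key)
    return None


# Attribute-style object, as a typed deserializer might return it.
typed = SimpleNamespace(enable_root_access=True)
# Mapping-style object, as a dict-backed model would expose it.
mapped = {"enableRootAccess": False}

FROM_TYPED = read_optional_prop(typed, "enable_root_access", "enableRootAccess")
FROM_MAPPED = read_optional_prop(mapped, "enable_root_access", "enableRootAccess")
```

Checking for `.get` before calling it is what avoids the "object has no attribute 'get'" failure when the object is attribute-only.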

Coverage backfill: smoke cases for ImportDataSchedule (database + file_system), and
compute unit tests that feed a msrest v2023_08 response to _from_rest_object.

saanikaguptamicrosoft changed the title from "Saanika/migrate 2022 10 and smoke coverage" to "Migrate 4 API versions to shared arm_ml_service client" on Jun 26, 2026

3 participants