DAOS-17321 ddb: Add checksum dump command to ddb C API #18543
Ticket title is 'Checksum management with ddb'
Add ddb_run_csum_dump() to the ddb C API to dump checksum information for a given VOS path.

Extend vos_fetch_begin() to expose the actual stored epoch of the single value found during a VOS_OF_FETCH_CSUM fetch:
- Add ic_sv_epoch to vos_io_context, populated in akey_fetch_single() when a real SV is found within the valid epoch range (0 otherwise).
- Add vos_ioh2sv_epoch() public accessor.

The dv_dump_csum_cb callback takes two typed parameters:
- struct daos_recx_ep_list *recx_rel: non-NULL for array akeys
- daos_epoch_t sv_epoch: actual stored epoch for SV akeys

The print_csum_sv / write_file_csum_sv functions display the stored epoch alongside the checksum type, length, and value.

Tests:
- VOS400.2: vos_fetch_begin records the stored SV epoch in the fetch handle, with four sub-cases: EPOCH_MAX, LE probe, exact match, not found.
- Updated ddb VOS-level callbacks to assert sv_epoch values.
- Updated ddb command-level tests to verify epoch in printed output.

Features: recovery
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
25cd7f6 to edd5452
…/daos-17321/patch-003 Features: recovery
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18543/3/execution/node/1446/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18543/3/execution/node/1436/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18543/4/execution/node/1641/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18543/4/execution/node/1600/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18543/4/execution/node/1690/log
Context
Third patch in the DAOS-17321 series:
- Adds VOS_OF_FETCH_CSUM to vos_fetch_begin() to retrieve per-extent checksum metadata without fetching data, with unit tests VOS400.1, VOS401.1, VOS401.2.
- Adds dv_dump_csum() to the ddb VOS API: fetches checksum metadata via VOS_OF_FETCH_CSUM and delivers it to the caller through a dv_dump_csum_cb callback.
- This patch builds on dv_dump_csum(), extends the VOS layer to expose the actual stored epoch of a fetched single value, and provides comprehensive unit and integration tests.

Changes
ddb_run_csum_dump(): new ddb C API command (ddb.h, ddb_commands.c)
ddb_run_csum_dump(struct ddb_ctx *ctx, struct csum_dump_options *opt) is the top-level command function. The option struct exposes three fields:
- path: VOS tree path to the akey (required)
- epoch: epoch for the fetch (DAOS_EPOCH_MAX for the latest visible version)
- dst: optional output file; if set, raw checksum bytes are written to disk rather than printed

The function resolves the path and dispatches to one of four internal callbacks depending on the akey type (single-value vs array) and the presence of dst:
- print_csum_sv
- write_file_csum_sv
- print_csum_recx
- write_file_csum_recx

Single-value output: type, length, actual stored epoch, and checksum value.
Array output: per extent, the index range, record size, epoch (from the recx/epoch list), and one checksum value per chunk.
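As a rough illustration of the four-way dispatch on akey type and dst described above, the selection could look like the sketch below. The enum names and pick_csum_cb() are hypothetical stand-ins invented for this example, not the ddb implementation, and treating an empty dst string as "not set" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real ddb types are not shown in this PR text. */
enum akey_kind { AKEY_SINGLE_VALUE, AKEY_ARRAY };

enum csum_cb {
	CB_PRINT_CSUM_SV,
	CB_WRITE_FILE_CSUM_SV,
	CB_PRINT_CSUM_RECX,
	CB_WRITE_FILE_CSUM_RECX
};

/*
 * Pick one of the four callbacks from the akey type and the optional
 * destination file. Assumption: a NULL or empty dst means "not set".
 */
static enum csum_cb
pick_csum_cb(enum akey_kind kind, const char *dst)
{
	int to_file = (dst != NULL && dst[0] != '\0');

	assert(kind == AKEY_SINGLE_VALUE || kind == AKEY_ARRAY);

	if (kind == AKEY_SINGLE_VALUE)
		return to_file ? CB_WRITE_FILE_CSUM_SV : CB_PRINT_CSUM_SV;
	return to_file ? CB_WRITE_FILE_CSUM_RECX : CB_PRINT_CSUM_RECX;
}
```

Keeping the choice in one small function makes each of the four combinations easy to cover in a unit test.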
Updated dv_dump_csum_cb callback (ddb_vos.h, ddb_vos.c)
The callback signature now uses two typed parameters, recx_rel and sv_epoch, that clearly distinguish the per-akey-type context:
- Array akeys: recx_rel is non-NULL (the existing recx/epoch list), sv_epoch is 0.
- SV akeys: recx_rel is NULL, sv_epoch is the actual stored epoch (see below).

VOS: actual stored SV epoch in the fetch handle (vos_io.c, vos.h)
vos_fetch_begin() with VOS_OF_FETCH_CSUM now records the epoch of the single value found during the B-tree walk, so the caller knows which version was retrieved.
- ic_sv_epoch (daos_epoch_t) added to vos_io_context, zero-initialized by calloc. Set in akey_fetch_single() when a real SV is found within the valid epoch range (holes, DER_NONEXIST, and uncertainty violations leave it at 0, consistent with the DAOS convention that 0 is the "not set" epoch sentinel). For array akeys, akey_fetch_single() is never entered, so ic_sv_epoch stays 0.
- vos_ioh2sv_epoch(daos_handle_t ioh) public accessor added to vos.h / vos_io.c.

Tests
VOS layer (vts_io.c):
- assert_int_equal(vos_ioh2sv_epoch(ioh), 42) added alongside the existing csum data checks.
- VOS400.2, "vos_fetch_begin records the stored SV epoch in the fetch handle": two SV versions at epochs 10 and 20, four fetch scenarios: DAOS_EPOCH_MAX → 20, epoch 15 → 10 (LE probe), epoch 10 → 10 (exact match), epoch 9 → 0 (key not yet visible).

ddb VOS API (ddb_vos_tests.c): all five dv_dump_csum_cb test callbacks updated to the new signature; SV callbacks assert the expected sv_epoch values (1 and 2); RECX callbacks assert sv_epoch == 0.

ddb C API (ddb_commands_tests.c): five new tests under a dedicated csum suite with its own VOS pool setup/teardown:
- csum_dump_error_tests: invalid path, incomplete path, invalid container
- print_csum_sv_tests: no-csum case, EPOCH_MAX (epoch 2), and epoch 1; checks epoch value, csum type, and csum bytes
- write_csum_sv_tests: "Dumping checksum" log line and raw bytes written to a mock file
- print_csum_recx_tests: multi-extent output with per-extent epoch and csum values
- write_csum_recx_tests: multi-extent file write
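The four fetch scenarios in the VOS-layer test (two SV versions at epochs 10 and 20) follow a "largest stored epoch less than or equal to the fetch epoch" rule, with 0 meaning "not set". The sketch below is a simplified model of that rule only. It does not use the VOS B-tree, it ignores holes and uncertainty checks, and epoch_t, EPOCH_MAX, and sv_epoch_le_probe() are hypothetical stand-ins rather than DAOS symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t epoch_t;        /* stand-in for daos_epoch_t */
#define EPOCH_MAX UINT64_MAX     /* stand-in for DAOS_EPOCH_MAX */

/* Two single-value versions, as in the test scenario described above. */
static const epoch_t demo_versions[] = {10, 20};

/*
 * Return the largest stored epoch that is <= fetch_epoch, or 0 when no
 * version is visible yet. stored[] must be sorted in ascending order and
 * must not contain 0, since 0 is reserved as the "not set" value.
 */
static epoch_t
sv_epoch_le_probe(const epoch_t *stored, size_t n, epoch_t fetch_epoch)
{
	epoch_t found = 0;
	size_t  i;

	for (i = 0; i < n; i++) {
		assert(stored[i] != 0);
		assert(i == 0 || stored[i - 1] < stored[i]);
		if (stored[i] > fetch_epoch)
			break;          /* later versions are not visible */
		found = stored[i];
	}
	return found;
}
```

Because 0 is never a valid stored epoch in this model, a caller can tell "nothing found" from any real version without a separate status flag.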