Add 'osism reset facts' command to clear the Ansible fact cache#2389
Open
berendt wants to merge 2 commits into
Open
Add 'osism reset facts' command to clear the Ansible fact cache#2389berendt wants to merge 2 commits into
berendt wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
_reset_all, consider deleting keys per SCAN batch instead of accumulating all keys in a list before callingdelete, to reduce memory use and avoid building a large in-memory list on big inventories. - In
_reset_limited,json.loads(result.stdout)is unguarded; ifansible-inventoryprints invalid or partial JSON this will raise and crash the command, so it would be safer to wrap it in a try/except, log a clear error, and return a non-zero exit code. - For the limited reset path, you can improve Redis efficiency by issuing a single
utils.redis.delete(*keys)(or using a pipeline) instead of looping over hosts and callingdeleteonce per host.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_reset_all`, consider deleting keys per SCAN batch instead of accumulating all keys in a list before calling `delete`, to reduce memory use and avoid building a large in-memory list on big inventories.
- In `_reset_limited`, `json.loads(result.stdout)` is unguarded; if `ansible-inventory` prints invalid or partial JSON this will raise and crash the command, so it would be safer to wrap it in a try/except, log a clear error, and return a non-zero exit code.
- For the limited reset path, you can improve Redis efficiency by issuing a single `utils.redis.delete(*keys)` (or using a pipeline) instead of looping over hosts and calling `delete` once per host.
## Individual Comments
### Comment 1
<location path="osism/commands/reset.py" line_range="82" />
<code_context>
+ logger.error("Timeout loading inventory.")
+ return 1
+
+ hosts = get_hosts_from_inventory(json.loads(result.stdout))
+
+ if not hosts:
</code_context>
<issue_to_address>
**issue:** Handle invalid JSON from `ansible-inventory` to avoid unhandled exceptions
If `ansible-inventory` outputs malformed/partial JSON (for example, mixed with plugin errors on stdout), `json.loads(result.stdout)` will raise `JSONDecodeError` and skip your current error handling. Consider wrapping this call in `try/except json.JSONDecodeError` and handling it like the timeout/return-code paths (log and return a non-zero exit code).
</issue_to_address>
### Comment 2
<location path="osism/commands/reset.py" line_range="75-77" />
<code_context>
+ timeout=30,
+ )
+
+ if result.returncode != 0:
+ logger.error("Error loading inventory.")
+ return 1
+ except subprocess.TimeoutExpired:
</code_context>
<issue_to_address>
**suggestion:** Log more detail (stderr/return code) when inventory loading fails
Consider logging `result.stderr` and/or `result.returncode` here to aid debugging, e.g. `logger.error(f"Error loading inventory (rc={result.returncode}): {result.stderr}")`. This preserves user-facing behavior while providing more actionable diagnostics.
```suggestion
if result.returncode != 0:
logger.error(
"Error loading inventory (rc=%s): %s",
result.returncode,
result.stderr,
)
return 1
```
</issue_to_address>
### Comment 3
<location path="tests/unit/commands/test_reset.py" line_range="87" />
<code_context>
+# --- limited path ---
+
+
+def test_facts_limit_deletes_only_selected_hosts(mock_redis):
+ mock_redis.delete.return_value = 1
+ ok = MagicMock()
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for Redis errors in the `--limit` code path.
The flush-all path already tests `RedisError` handling (`test_facts_returns_nonzero_on_redis_error`), but the limited path (`_reset_limited`) doesn’t. Please add a `--limit` test that sets `mock_redis.delete.side_effect = RedisError(...)` and asserts that the command returns `1`, stops deleting further keys, and logs the expected error, so both reset strategies have equivalent, verified error handling.
Suggested implementation:
```python
def test_facts_limit_deletes_only_selected_hosts(mock_redis):
mock_redis.delete.return_value = 1
ok = MagicMock()
ok.returncode = 0
ok.stdout = "{}"
with patch(
"osism.commands.reset.get_inventory_path",
return_value="/ansible/inventory/hosts.yml",
), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
"osism.commands.reset.get_hosts_from_inventory",
return_value=["node1", "node2"],
):
# invoke the limited reset path and ensure only the selected hosts are deleted
from osism.commands.reset import facts
rc = facts(mock_redis, limit=["node1", "node2"])
assert rc == 0
# ensure we only delete the keys for the selected hosts
mock_redis.delete.assert_has_calls(
[
call("ansible_facts:node1"),
call("ansible_facts:node2"),
],
any_order=False,
)
assert mock_redis.delete.call_count == 2
def test_facts_limit_returns_nonzero_on_redis_error_and_stops_deleting(
mock_redis, loguru_logs
):
# first delete raises, subsequent deletes must not be attempted
mock_redis.delete.side_effect = RedisError("boom")
ok = MagicMock()
ok.returncode = 0
ok.stdout = "{}"
with patch(
"osism.commands.reset.get_inventory_path",
return_value="/ansible/inventory/hosts.yml",
), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
"osism.commands.reset.get_hosts_from_inventory",
return_value=["node1", "node2"],
):
from osism.commands.reset import facts
rc = facts(mock_redis, limit=["node1", "node2"])
# command should indicate failure
assert rc == 1
# only a single delete attempt should have been made
assert mock_redis.delete.call_count == 1
# error should be logged in the same way as the flush-all path
errors = [r for r in loguru_logs if r["level"] == "ERROR"]
assert any(
"Failed to reset Ansible fact cache" in r["message"] for r in errors
)
```
1. This change assumes `call` and `RedisError` are already imported in this test module (they likely are, given the existing flush-all tests). If not, add:
- `from redis import RedisError`
- `from unittest.mock import call`
2. The invocation `facts(mock_redis, limit=["node1", "node2"])` assumes the reset command exposes a `facts` function that accepts a Redis client and a `limit` argument. If the real interface differs (e.g. a Click entrypoint or a different function name/signature), adjust the call accordingly while preserving the semantics:
- The "success" test must exercise the limited reset path and assert two deletes for the selected hosts.
- The new error-handling test must configure `mock_redis.delete.side_effect = RedisError(...)`, assert that the return code is `1`, that only one delete is attempted, and that the same error message as the flush-all path is logged.
3. If the actual Redis key format for per-host facts is different (e.g. `ansible_facts:host:{name}`), update the keys in the `assert_has_calls` expectations to match the real implementation so the test validates the correct keys are being deleted.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| # --- limited path --- | ||
|
|
||
|
|
||
| def test_facts_limit_deletes_only_selected_hosts(mock_redis): |
There was a problem hiding this comment.
suggestion (testing): Add coverage for Redis errors in the --limit code path.
The flush-all path already tests RedisError handling (test_facts_returns_nonzero_on_redis_error), but the limited path (_reset_limited) doesn’t. Please add a --limit test that sets mock_redis.delete.side_effect = RedisError(...) and asserts that the command returns 1, stops deleting further keys, and logs the expected error, so both reset strategies have equivalent, verified error handling.
Suggested implementation:
def test_facts_limit_deletes_only_selected_hosts(mock_redis):
mock_redis.delete.return_value = 1
ok = MagicMock()
ok.returncode = 0
ok.stdout = "{}"
with patch(
"osism.commands.reset.get_inventory_path",
return_value="/ansible/inventory/hosts.yml",
), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
"osism.commands.reset.get_hosts_from_inventory",
return_value=["node1", "node2"],
):
# invoke the limited reset path and ensure only the selected hosts are deleted
from osism.commands.reset import facts
rc = facts(mock_redis, limit=["node1", "node2"])
assert rc == 0
# ensure we only delete the keys for the selected hosts
mock_redis.delete.assert_has_calls(
[
call("ansible_facts:node1"),
call("ansible_facts:node2"),
],
any_order=False,
)
assert mock_redis.delete.call_count == 2
def test_facts_limit_returns_nonzero_on_redis_error_and_stops_deleting(
mock_redis, loguru_logs
):
# first delete raises, subsequent deletes must not be attempted
mock_redis.delete.side_effect = RedisError("boom")
ok = MagicMock()
ok.returncode = 0
ok.stdout = "{}"
with patch(
"osism.commands.reset.get_inventory_path",
return_value="/ansible/inventory/hosts.yml",
), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
"osism.commands.reset.get_hosts_from_inventory",
return_value=["node1", "node2"],
):
from osism.commands.reset import facts
rc = facts(mock_redis, limit=["node1", "node2"])
# command should indicate failure
assert rc == 1
# only a single delete attempt should have been made
assert mock_redis.delete.call_count == 1
# error should be logged in the same way as the flush-all path
errors = [r for r in loguru_logs if r["level"] == "ERROR"]
assert any(
"Failed to reset Ansible fact cache" in r["message"] for r in errors
)- This change assumes
callandRedisErrorare already imported in this test module (they likely are, given the existing flush-all tests). If not, add:from redis import RedisErrorfrom unittest.mock import call
- The invocation
facts(mock_redis, limit=["node1", "node2"])assumes the reset command exposes afactsfunction that accepts a Redis client and alimitargument. If the real interface differs (e.g. a Click entrypoint or a different function name/signature), adjust the call accordingly while preserving the semantics:- The "success" test must exercise the limited reset path and assert two deletes for the selected hosts.
- The new error-handling test must configure
mock_redis.delete.side_effect = RedisError(...), assert that the return code is1, that only one delete is attempted, and that the same error message as the flush-all path is logged.
- If the actual Redis key format for per-host facts is different (e.g.
ansible_facts:host:{name}), update the keys in theassert_has_callsexpectations to match the real implementation so the test validates the correct keys are being deleted.
Operators troubleshooting a deployment need a clean way to drop cached Ansible facts. Stale or wrong facts can make Ansible act on outdated information, producing confusing results that are hard to diagnose, and there is no direct command to clear them today. Add a cliff command 'osism reset facts' that deletes the 'ansible_facts*' fact-cache keys from Redis. Without arguments it flushes the whole cache; with '-l/--limit' it resolves the host or group pattern via 'ansible-inventory --list --limit' and deletes only the matching hosts' keys. The command only clears the cache; it does not start a fact-gathering run and asks for no confirmation, since the cache is rebuilt automatically on the next Ansible run. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Christian Berendt <berendt@osism.tech>
f6c5ace to
86176ca
Compare
Cover both reset paths and their edges: flushing the whole ansible_facts* cache, the empty-cache no-op, a Redis error, the --limit host expansion, and the inventory-load failure and timeout contracts. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Christian Berendt <berendt@osism.tech>
86176ca to
8d0c51e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
osism reset facts, a command that clears the Ansible fact cache sooperators can drop stale or incorrect facts during troubleshooting. The
fact cache lives in Redis under
ansible_facts<host>keys (read byosism get facts,osism/api.py, andcheck_ansible_facts); until nowthere was no command to clear it.
-l/--limit <pattern>restricts the reset to selected hosts or groups,resolving the pattern through
ansible-inventory --list --limit.asks for no confirmation. The cache rebuilds on the next Ansible run
that gathers facts.
Closes #2388
Changes (in commit order)
Add 'osism reset facts' command to clear the fact cacheosism/commands/reset.pywithclass Facts(Command):ansible_facts*and batch-delete every key(mirrors the scan loop in
osism/utils/__init__.py:check_ansible_facts).-l/--limit→ expand the host/group pattern viaansible-inventoryand delete only the matching
ansible_facts<host>keys(mirrors the limit→inventory idiom in
osism/commands/report.py).utils.redis.delete(...)idiom fromosism/commands/vault.py.reset facts = osism.commands.reset:Factsin
setup.cfg(noresetnamespace existed before; no collision).Add unit tests for 'osism reset facts'tests/unit/commands/test_reset.pycovering flush-all,empty-cache no-op, Redis error,
--limithost expansion, and theinventory-load failure and timeout paths.
Notes for reviewers
reset facts) is a design choice the issue did notfix. It avoids overloading
sync facts(which does gather) and thereis no existing
resetnamespace. Renaming is a one-linesetup.cfgchange if you prefer e.g.
facts reset./cache/facts/<host>is read only byosism/api.py:search_inventory.This command clears the canonical Redis cache that the operator-facing
CLI uses; clearing the API's file cache is out of scope absent a
confirmed need.
not updated here; the in-repo doc surface is the command docstring and
--limithelp text rendered byosism reset facts --help.Local verification
flake8 osism/commands/reset.py tests/unit/commands/test_reset.py— passpytest tests/unit) runs in CI/Zuul per project convention.Implemented by planwerk-review with Claude:claude-opus-4-8