Add 'osism reset facts' command to clear the Ansible fact cache by berendt · Pull Request #2389 · osism/python-osism

berendt · 2026-06-17T15:31:23Z

Summary

Adds osism reset facts, a command that clears the Ansible fact cache so
operators can drop stale or incorrect facts during troubleshooting. The
fact cache lives in Redis under ansible_facts<host> keys (read by
osism get facts, osism/api.py, and check_ansible_facts); until now
there was no command to clear it.

By default the command flushes the entire cache.
-l/--limit <pattern> restricts the reset to selected hosts or groups,
resolving the pattern through ansible-inventory --list --limit.
It only clears the cache — it does not start a fact-gathering run and
asks for no confirmation. The cache rebuilds on the next Ansible run
that gathers facts.

Closes #2388

Changes (in commit order)

Add 'osism reset facts' command to clear the fact cache
- New osism/commands/reset.py with class Facts(Command):
  - No limit → SCAN ansible_facts* and batch-delete every key
    (mirrors the scan loop in osism/utils/__init__.py:check_ansible_facts).
  - -l/--limit → expand the host/group pattern via ansible-inventory
    and delete only the matching ansible_facts<host> keys
    (mirrors the limit→inventory idiom in osism/commands/report.py).
  - Redis key deletion reuses the utils.redis.delete(...) idiom from
    osism/commands/vault.py.
- Registers the entry point reset facts = osism.commands.reset:Facts
  in setup.cfg (no reset namespace existed before; no collision).
Add unit tests for 'osism reset facts'
- New tests/unit/commands/test_reset.py covering flush-all,
  empty-cache no-op, Redis error, --limit host expansion, and the
  inventory-load failure and timeout paths.

Notes for reviewers

Command name (reset facts) is a design choice the issue did not
fix. It avoids overloading sync facts (which does gather) and there
is no existing reset namespace. Renaming is a one-line setup.cfg
change if you prefer e.g. facts reset.
Redis cache only. A second, file-based fact cache at
/cache/facts/<host> is read only by osism/api.py:search_inventory.
This command clears the canonical Redis cache that the operator-facing
CLI uses; clearing the API's file cache is out of scope absent a
confirmed need.
The external CLI reference (osism.tech) lives in a separate repo and is
not updated here; the in-repo doc surface is the command docstring and
--limit help text rendered by osism reset facts --help.

Local verification

flake8 osism/commands/reset.py tests/unit/commands/test_reset.py — pass
Unit suite (pytest tests/unit) runs in CI/Zuul per project convention.

Implemented by planwerk-review with Claude:claude-opus-4-8

sourcery-ai

Hey - I've found 3 issues, and left some high level feedback:

In _reset_all, consider deleting keys per SCAN batch instead of accumulating all keys in a list before calling delete, to reduce memory use and avoid building a large in-memory list on big inventories.
In _reset_limited, json.loads(result.stdout) is unguarded; if ansible-inventory prints invalid or partial JSON this will raise and crash the command, so it would be safer to wrap it in a try/except, log a clear error, and return a non-zero exit code.
For the limited reset path, you can improve Redis efficiency by issuing a single utils.redis.delete(*keys) (or using a pipeline) instead of looping over hosts and calling delete once per host.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `_reset_all`, consider deleting keys per SCAN batch instead of accumulating all keys in a list before calling `delete`, to reduce memory use and avoid building a large in-memory list on big inventories.
- In `_reset_limited`, `json.loads(result.stdout)` is unguarded; if `ansible-inventory` prints invalid or partial JSON this will raise and crash the command, so it would be safer to wrap it in a try/except, log a clear error, and return a non-zero exit code.
- For the limited reset path, you can improve Redis efficiency by issuing a single `utils.redis.delete(*keys)` (or using a pipeline) instead of looping over hosts and calling `delete` once per host.

## Individual Comments

### Comment 1
<location path="osism/commands/reset.py" line_range="82" />
<code_context>
+            logger.error("Timeout loading inventory.")
+            return 1
+
+        hosts = get_hosts_from_inventory(json.loads(result.stdout))
+
+        if not hosts:
</code_context>
<issue_to_address>
**issue:** Handle invalid JSON from `ansible-inventory` to avoid unhandled exceptions

If `ansible-inventory` outputs malformed/partial JSON (for example, mixed with plugin errors on stdout), `json.loads(result.stdout)` will raise `JSONDecodeError` and skip your current error handling. Consider wrapping this call in `try/except json.JSONDecodeError` and handling it like the timeout/return-code paths (log and return a non-zero exit code).
</issue_to_address>

### Comment 2
<location path="osism/commands/reset.py" line_range="75-77" />
<code_context>
+                timeout=30,
+            )
+
+            if result.returncode != 0:
+                logger.error("Error loading inventory.")
+                return 1
+        except subprocess.TimeoutExpired:
</code_context>
<issue_to_address>
**suggestion:** Log more detail (stderr/return code) when inventory loading fails

Consider logging `result.stderr` and/or `result.returncode` here to aid debugging, e.g. `logger.error(f"Error loading inventory (rc={result.returncode}): {result.stderr}")`. This preserves user-facing behavior while providing more actionable diagnostics.

```suggestion
            if result.returncode != 0:
                logger.error(
                    "Error loading inventory (rc=%s): %s",
                    result.returncode,
                    result.stderr,
                )
                return 1
```
</issue_to_address>

### Comment 3
<location path="tests/unit/commands/test_reset.py" line_range="87" />
<code_context>
+# --- limited path ---
+
+
+def test_facts_limit_deletes_only_selected_hosts(mock_redis):
+    mock_redis.delete.return_value = 1
+    ok = MagicMock()
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for Redis errors in the `--limit` code path.

The flush-all path already tests `RedisError` handling (`test_facts_returns_nonzero_on_redis_error`), but the limited path (`_reset_limited`) doesn’t. Please add a `--limit` test that sets `mock_redis.delete.side_effect = RedisError(...)` and asserts that the command returns `1`, stops deleting further keys, and logs the expected error, so both reset strategies have equivalent, verified error handling.

Suggested implementation:

```python
def test_facts_limit_deletes_only_selected_hosts(mock_redis):
    mock_redis.delete.return_value = 1
    ok = MagicMock()
    ok.returncode = 0
    ok.stdout = "{}"

    with patch(
        "osism.commands.reset.get_inventory_path",
        return_value="/ansible/inventory/hosts.yml",
    ), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
        "osism.commands.reset.get_hosts_from_inventory",
        return_value=["node1", "node2"],
    ):
        # invoke the limited reset path and ensure only the selected hosts are deleted
        from osism.commands.reset import facts

        rc = facts(mock_redis, limit=["node1", "node2"])

        assert rc == 0
        # ensure we only delete the keys for the selected hosts
        mock_redis.delete.assert_has_calls(
            [
                call("ansible_facts:node1"),
                call("ansible_facts:node2"),
            ],
            any_order=False,
        )
        assert mock_redis.delete.call_count == 2


def test_facts_limit_returns_nonzero_on_redis_error_and_stops_deleting(
    mock_redis, loguru_logs
):
    # first delete raises, subsequent deletes must not be attempted
    mock_redis.delete.side_effect = RedisError("boom")
    ok = MagicMock()
    ok.returncode = 0
    ok.stdout = "{}"

    with patch(
        "osism.commands.reset.get_inventory_path",
        return_value="/ansible/inventory/hosts.yml",
    ), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
        "osism.commands.reset.get_hosts_from_inventory",
        return_value=["node1", "node2"],
    ):
        from osism.commands.reset import facts

        rc = facts(mock_redis, limit=["node1", "node2"])

    # command should indicate failure
    assert rc == 1
    # only a single delete attempt should have been made
    assert mock_redis.delete.call_count == 1

    # error should be logged in the same way as the flush-all path
    errors = [r for r in loguru_logs if r["level"] == "ERROR"]
    assert any(
        "Failed to reset Ansible fact cache" in r["message"] for r in errors
    )

```

1. This change assumes `call` and `RedisError` are already imported in this test module (they likely are, given the existing flush-all tests). If not, add:
   - `from redis import RedisError`
   - `from unittest.mock import call`
2. The invocation `facts(mock_redis, limit=["node1", "node2"])` assumes the reset command exposes a `facts` function that accepts a Redis client and a `limit` argument. If the real interface differs (e.g. a Click entrypoint or a different function name/signature), adjust the call accordingly while preserving the semantics:
   - The "success" test must exercise the limited reset path and assert two deletes for the selected hosts.
   - The new error-handling test must configure `mock_redis.delete.side_effect = RedisError(...)`, assert that the return code is `1`, that only one delete is attempted, and that the same error message as the flush-all path is logged.
3. If the actual Redis key format for per-host facts is different (e.g. `ansible_facts:host:{name}`), update the keys in the `assert_has_calls` expectations to match the real implementation so the test validates the correct keys are being deleted.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-06-17T15:43:53Z

+# --- limited path ---
+
+
+def test_facts_limit_deletes_only_selected_hosts(mock_redis):


suggestion (testing): Add coverage for Redis errors in the --limit code path.

The flush-all path already tests RedisError handling (test_facts_returns_nonzero_on_redis_error), but the limited path (_reset_limited) doesn’t. Please add a --limit test that sets mock_redis.delete.side_effect = RedisError(...) and asserts that the command returns 1, stops deleting further keys, and logs the expected error, so both reset strategies have equivalent, verified error handling.

Suggested implementation:

def test_facts_limit_deletes_only_selected_hosts(mock_redis): mock_redis.delete.return_value = 1 ok = MagicMock() ok.returncode = 0 ok.stdout = "{}" with patch( "osism.commands.reset.get_inventory_path", return_value="/ansible/inventory/hosts.yml", ), patch("osism.commands.reset.subprocess.run", return_value=ok), patch( "osism.commands.reset.get_hosts_from_inventory", return_value=["node1", "node2"], ): # invoke the limited reset path and ensure only the selected hosts are deleted from osism.commands.reset import facts rc = facts(mock_redis, limit=["node1", "node2"]) assert rc == 0 # ensure we only delete the keys for the selected hosts mock_redis.delete.assert_has_calls( [ call("ansible_facts:node1"), call("ansible_facts:node2"), ], any_order=False, ) assert mock_redis.delete.call_count == 2 def test_facts_limit_returns_nonzero_on_redis_error_and_stops_deleting( mock_redis, loguru_logs ): # first delete raises, subsequent deletes must not be attempted mock_redis.delete.side_effect = RedisError("boom") ok = MagicMock() ok.returncode = 0 ok.stdout = "{}" with patch( "osism.commands.reset.get_inventory_path", return_value="/ansible/inventory/hosts.yml", ), patch("osism.commands.reset.subprocess.run", return_value=ok), patch( "osism.commands.reset.get_hosts_from_inventory", return_value=["node1", "node2"], ): from osism.commands.reset import facts rc = facts(mock_redis, limit=["node1", "node2"]) # command should indicate failure assert rc == 1 # only a single delete attempt should have been made assert mock_redis.delete.call_count == 1 # error should be logged in the same way as the flush-all path errors = [r for r in loguru_logs if r["level"] == "ERROR"] assert any( "Failed to reset Ansible fact cache" in r["message"] for r in errors )

This change assumes call and RedisError are already imported in this test module (they likely are, given the existing flush-all tests). If not, add:

from redis import RedisError

from unittest.mock import call

The invocation facts(mock_redis, limit=["node1", "node2"]) assumes the reset command exposes a facts function that accepts a Redis client and a limit argument. If the real interface differs (e.g. a Click entrypoint or a different function name/signature), adjust the call accordingly while preserving the semantics:

The "success" test must exercise the limited reset path and assert two deletes for the selected hosts.

The new error-handling test must configure mock_redis.delete.side_effect = RedisError(...), assert that the return code is 1, that only one delete is attempted, and that the same error message as the flush-all path is logged.

If the actual Redis key format for per-host facts is different (e.g. ansible_facts:host:{name}), update the keys in the assert_has_calls expectations to match the real implementation so the test validates the correct keys are being deleted.

Operators troubleshooting a deployment need a clean way to drop cached Ansible facts. Stale or wrong facts can make Ansible act on outdated information, producing confusing results that are hard to diagnose, and there is no direct command to clear them today. Add a cliff command 'osism reset facts' that deletes the 'ansible_facts*' fact-cache keys from Redis. Without arguments it flushes the whole cache; with '-l/--limit' it resolves the host or group pattern via 'ansible-inventory --list --limit' and deletes only the matching hosts' keys. The command only clears the cache; it does not start a fact-gathering run and asks for no confirmation, since the cache is rebuilt automatically on the next Ansible run. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Christian Berendt <berendt@osism.tech>

Cover both reset paths and their edges: flushing the whole ansible_facts* cache, the empty-cache no-op, a Redis error, the --limit host expansion, and the inventory-load failure and timeout contracts. Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Christian Berendt <berendt@osism.tech>

osism-agent added this to Human Board Jun 17, 2026

github-project-automation Bot moved this to Ready in Human Board Jun 17, 2026

berendt mentioned this pull request Jun 17, 2026

Add a command to reset the Ansible fact cache #2388

Open

berendt marked this pull request as ready for review June 17, 2026 15:41

berendt requested a review from ideaship June 17, 2026 15:42

sourcery-ai Bot reviewed Jun 17, 2026

View reviewed changes

berendt force-pushed the implement/issue-2388-reset-facts branch from f6c5ace to 86176ca Compare June 17, 2026 17:28

berendt force-pushed the implement/issue-2388-reset-facts branch from 86176ca to 8d0c51e Compare June 17, 2026 17:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add 'osism reset facts' command to clear the Ansible fact cache#2389

Add 'osism reset facts' command to clear the Ansible fact cache#2389
berendt wants to merge 2 commits into
mainfrom
implement/issue-2388-reset-facts

berendt commented Jun 17, 2026 •

edited

Loading

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

sourcery-ai Bot Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		# --- limited path ---


		def test_facts_limit_deletes_only_selected_hosts(mock_redis):

Conversation

berendt commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes (in commit order)

Notes for reviewers

Local verification

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sourcery-ai Bot Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

berendt commented Jun 17, 2026 •

edited

Loading