Skip to content

Add 'osism reset facts' command to clear the Ansible fact cache#2389

Open
berendt wants to merge 2 commits into
mainfrom
implement/issue-2388-reset-facts
Open

Add 'osism reset facts' command to clear the Ansible fact cache#2389
berendt wants to merge 2 commits into
mainfrom
implement/issue-2388-reset-facts

Conversation

@berendt

@berendt berendt commented Jun 17, 2026

Copy link
Copy Markdown
Member

Summary

Adds osism reset facts, a command that clears the Ansible fact cache so
operators can drop stale or incorrect facts during troubleshooting. The
fact cache lives in Redis under ansible_facts<host> keys (read by
osism get facts, osism/api.py, and check_ansible_facts); until now
there was no command to clear it.

  • By default the command flushes the entire cache.
  • -l/--limit <pattern> restricts the reset to selected hosts or groups,
    resolving the pattern through ansible-inventory --list --limit.
  • It only clears the cache — it does not start a fact-gathering run and
    asks for no confirmation. The cache rebuilds on the next Ansible run
    that gathers facts.

Closes #2388

Changes (in commit order)

  1. Add 'osism reset facts' command to clear the fact cache

    • New osism/commands/reset.py with class Facts(Command):
      • No limit → SCAN ansible_facts* and batch-delete every key
        (mirrors the scan loop in osism/utils/__init__.py:check_ansible_facts).
      • -l/--limit → expand the host/group pattern via ansible-inventory
        and delete only the matching ansible_facts<host> keys
        (mirrors the limit→inventory idiom in osism/commands/report.py).
      • Redis key deletion reuses the utils.redis.delete(...) idiom from
        osism/commands/vault.py.
    • Registers the entry point reset facts = osism.commands.reset:Facts
      in setup.cfg (no reset namespace existed before; no collision).
  2. Add unit tests for 'osism reset facts'

    • New tests/unit/commands/test_reset.py covering flush-all,
      empty-cache no-op, Redis error, --limit host expansion, and the
      inventory-load failure and timeout paths.

Notes for reviewers

  • Command name (reset facts) is a design choice the issue did not
    fix. It avoids overloading sync facts (which does gather) and there
    is no existing reset namespace. Renaming is a one-line setup.cfg
    change if you prefer e.g. facts reset.
  • Redis cache only. A second, file-based fact cache at
    /cache/facts/<host> is read only by osism/api.py:search_inventory.
    This command clears the canonical Redis cache that the operator-facing
    CLI uses; clearing the API's file cache is out of scope absent a
    confirmed need.
  • The external CLI reference (osism.tech) lives in a separate repo and is
    not updated here; the in-repo doc surface is the command docstring and
    --limit help text rendered by osism reset facts --help.

Local verification

  • flake8 osism/commands/reset.py tests/unit/commands/test_reset.py — pass
  • Unit suite (pytest tests/unit) runs in CI/Zuul per project convention.

Implemented by planwerk-review with Claude:claude-opus-4-8

@berendt berendt marked this pull request as ready for review June 17, 2026 15:41
@berendt berendt requested a review from ideaship June 17, 2026 15:42

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • In _reset_all, consider deleting keys per SCAN batch instead of accumulating all keys in a list before calling delete, to reduce memory use and avoid building a large in-memory list on big inventories.
  • In _reset_limited, json.loads(result.stdout) is unguarded; if ansible-inventory prints invalid or partial JSON this will raise and crash the command, so it would be safer to wrap it in a try/except, log a clear error, and return a non-zero exit code.
  • For the limited reset path, you can improve Redis efficiency by issuing a single utils.redis.delete(*keys) (or using a pipeline) instead of looping over hosts and calling delete once per host.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_reset_all`, consider deleting keys per SCAN batch instead of accumulating all keys in a list before calling `delete`, to reduce memory use and avoid building a large in-memory list on big inventories.
- In `_reset_limited`, `json.loads(result.stdout)` is unguarded; if `ansible-inventory` prints invalid or partial JSON this will raise and crash the command, so it would be safer to wrap it in a try/except, log a clear error, and return a non-zero exit code.
- For the limited reset path, you can improve Redis efficiency by issuing a single `utils.redis.delete(*keys)` (or using a pipeline) instead of looping over hosts and calling `delete` once per host.

## Individual Comments

### Comment 1
<location path="osism/commands/reset.py" line_range="82" />
<code_context>
+            logger.error("Timeout loading inventory.")
+            return 1
+
+        hosts = get_hosts_from_inventory(json.loads(result.stdout))
+
+        if not hosts:
</code_context>
<issue_to_address>
**issue:** Handle invalid JSON from `ansible-inventory` to avoid unhandled exceptions

If `ansible-inventory` outputs malformed/partial JSON (for example, mixed with plugin errors on stdout), `json.loads(result.stdout)` will raise `JSONDecodeError` and skip your current error handling. Consider wrapping this call in `try/except json.JSONDecodeError` and handling it like the timeout/return-code paths (log and return a non-zero exit code).
</issue_to_address>

### Comment 2
<location path="osism/commands/reset.py" line_range="75-77" />
<code_context>
+                timeout=30,
+            )
+
+            if result.returncode != 0:
+                logger.error("Error loading inventory.")
+                return 1
+        except subprocess.TimeoutExpired:
</code_context>
<issue_to_address>
**suggestion:** Log more detail (stderr/return code) when inventory loading fails

Consider logging `result.stderr` and/or `result.returncode` here to aid debugging, e.g. `logger.error(f"Error loading inventory (rc={result.returncode}): {result.stderr}")`. This preserves user-facing behavior while providing more actionable diagnostics.

```suggestion
            if result.returncode != 0:
                logger.error(
                    "Error loading inventory (rc=%s): %s",
                    result.returncode,
                    result.stderr,
                )
                return 1
```
</issue_to_address>

### Comment 3
<location path="tests/unit/commands/test_reset.py" line_range="87" />
<code_context>
+# --- limited path ---
+
+
+def test_facts_limit_deletes_only_selected_hosts(mock_redis):
+    mock_redis.delete.return_value = 1
+    ok = MagicMock()
</code_context>
<issue_to_address>
**suggestion (testing):** Add coverage for Redis errors in the `--limit` code path.

The flush-all path already tests `RedisError` handling (`test_facts_returns_nonzero_on_redis_error`), but the limited path (`_reset_limited`) doesn’t. Please add a `--limit` test that sets `mock_redis.delete.side_effect = RedisError(...)` and asserts that the command returns `1`, stops deleting further keys, and logs the expected error, so both reset strategies have equivalent, verified error handling.

Suggested implementation:

```python
def test_facts_limit_deletes_only_selected_hosts(mock_redis):
    mock_redis.delete.return_value = 1
    ok = MagicMock()
    ok.returncode = 0
    ok.stdout = "{}"

    with patch(
        "osism.commands.reset.get_inventory_path",
        return_value="/ansible/inventory/hosts.yml",
    ), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
        "osism.commands.reset.get_hosts_from_inventory",
        return_value=["node1", "node2"],
    ):
        # invoke the limited reset path and ensure only the selected hosts are deleted
        from osism.commands.reset import facts

        rc = facts(mock_redis, limit=["node1", "node2"])

        assert rc == 0
        # ensure we only delete the keys for the selected hosts
        mock_redis.delete.assert_has_calls(
            [
                call("ansible_facts:node1"),
                call("ansible_facts:node2"),
            ],
            any_order=False,
        )
        assert mock_redis.delete.call_count == 2


def test_facts_limit_returns_nonzero_on_redis_error_and_stops_deleting(
    mock_redis, loguru_logs
):
    # first delete raises, subsequent deletes must not be attempted
    mock_redis.delete.side_effect = RedisError("boom")
    ok = MagicMock()
    ok.returncode = 0
    ok.stdout = "{}"

    with patch(
        "osism.commands.reset.get_inventory_path",
        return_value="/ansible/inventory/hosts.yml",
    ), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
        "osism.commands.reset.get_hosts_from_inventory",
        return_value=["node1", "node2"],
    ):
        from osism.commands.reset import facts

        rc = facts(mock_redis, limit=["node1", "node2"])

    # command should indicate failure
    assert rc == 1
    # only a single delete attempt should have been made
    assert mock_redis.delete.call_count == 1

    # error should be logged in the same way as the flush-all path
    errors = [r for r in loguru_logs if r["level"] == "ERROR"]
    assert any(
        "Failed to reset Ansible fact cache" in r["message"] for r in errors
    )

```

1. This change assumes `call` and `RedisError` are already imported in this test module (they likely are, given the existing flush-all tests). If not, add:
   - `from redis import RedisError`
   - `from unittest.mock import call`
2. The invocation `facts(mock_redis, limit=["node1", "node2"])` assumes the reset command exposes a `facts` function that accepts a Redis client and a `limit` argument. If the real interface differs (e.g. a Click entrypoint or a different function name/signature), adjust the call accordingly while preserving the semantics:
   - The "success" test must exercise the limited reset path and assert two deletes for the selected hosts.
   - The new error-handling test must configure `mock_redis.delete.side_effect = RedisError(...)`, assert that the return code is `1`, that only one delete is attempted, and that the same error message as the flush-all path is logged.
3. If the actual Redis key format for per-host facts is different (e.g. `ansible_facts:host:{name}`), update the keys in the `assert_has_calls` expectations to match the real implementation so the test validates the correct keys are being deleted.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread osism/commands/reset.py Outdated
Comment thread osism/commands/reset.py
# --- limited path ---


def test_facts_limit_deletes_only_selected_hosts(mock_redis):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add coverage for Redis errors in the --limit code path.

The flush-all path already tests RedisError handling (test_facts_returns_nonzero_on_redis_error), but the limited path (_reset_limited) doesn’t. Please add a --limit test that sets mock_redis.delete.side_effect = RedisError(...) and asserts that the command returns 1, stops deleting further keys, and logs the expected error, so both reset strategies have equivalent, verified error handling.

Suggested implementation:

def test_facts_limit_deletes_only_selected_hosts(mock_redis):
    mock_redis.delete.return_value = 1
    ok = MagicMock()
    ok.returncode = 0
    ok.stdout = "{}"

    with patch(
        "osism.commands.reset.get_inventory_path",
        return_value="/ansible/inventory/hosts.yml",
    ), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
        "osism.commands.reset.get_hosts_from_inventory",
        return_value=["node1", "node2"],
    ):
        # invoke the limited reset path and ensure only the selected hosts are deleted
        from osism.commands.reset import facts

        rc = facts(mock_redis, limit=["node1", "node2"])

        assert rc == 0
        # ensure we only delete the keys for the selected hosts
        mock_redis.delete.assert_has_calls(
            [
                call("ansible_facts:node1"),
                call("ansible_facts:node2"),
            ],
            any_order=False,
        )
        assert mock_redis.delete.call_count == 2


def test_facts_limit_returns_nonzero_on_redis_error_and_stops_deleting(
    mock_redis, loguru_logs
):
    # first delete raises, subsequent deletes must not be attempted
    mock_redis.delete.side_effect = RedisError("boom")
    ok = MagicMock()
    ok.returncode = 0
    ok.stdout = "{}"

    with patch(
        "osism.commands.reset.get_inventory_path",
        return_value="/ansible/inventory/hosts.yml",
    ), patch("osism.commands.reset.subprocess.run", return_value=ok), patch(
        "osism.commands.reset.get_hosts_from_inventory",
        return_value=["node1", "node2"],
    ):
        from osism.commands.reset import facts

        rc = facts(mock_redis, limit=["node1", "node2"])

    # command should indicate failure
    assert rc == 1
    # only a single delete attempt should have been made
    assert mock_redis.delete.call_count == 1

    # error should be logged in the same way as the flush-all path
    errors = [r for r in loguru_logs if r["level"] == "ERROR"]
    assert any(
        "Failed to reset Ansible fact cache" in r["message"] for r in errors
    )
  1. This change assumes call and RedisError are already imported in this test module (they likely are, given the existing flush-all tests). If not, add:
    • from redis import RedisError
    • from unittest.mock import call
  2. The invocation facts(mock_redis, limit=["node1", "node2"]) assumes the reset command exposes a facts function that accepts a Redis client and a limit argument. If the real interface differs (e.g. a Click entrypoint or a different function name/signature), adjust the call accordingly while preserving the semantics:
    • The "success" test must exercise the limited reset path and assert two deletes for the selected hosts.
    • The new error-handling test must configure mock_redis.delete.side_effect = RedisError(...), assert that the return code is 1, that only one delete is attempted, and that the same error message as the flush-all path is logged.
  3. If the actual Redis key format for per-host facts is different (e.g. ansible_facts:host:{name}), update the keys in the assert_has_calls expectations to match the real implementation so the test validates the correct keys are being deleted.

Operators troubleshooting a deployment need a clean way to drop
cached Ansible facts. Stale or wrong facts can make Ansible act on
outdated information, producing confusing results that are hard to
diagnose, and there is no direct command to clear them today.

Add a cliff command 'osism reset facts' that deletes the
'ansible_facts*' fact-cache keys from Redis. Without arguments it
flushes the whole cache; with '-l/--limit' it resolves the host or
group pattern via 'ansible-inventory --list --limit' and deletes only
the matching hosts' keys. The command only clears the cache; it does
not start a fact-gathering run and asks for no confirmation, since the
cache is rebuilt automatically on the next Ansible run.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Christian Berendt <berendt@osism.tech>
@berendt berendt force-pushed the implement/issue-2388-reset-facts branch from f6c5ace to 86176ca Compare June 17, 2026 17:28
Cover both reset paths and their edges: flushing the whole
ansible_facts* cache, the empty-cache no-op, a Redis error, the
--limit host expansion, and the inventory-load failure and timeout
contracts.

Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Christian Berendt <berendt@osism.tech>
@berendt berendt force-pushed the implement/issue-2388-reset-facts branch from 86176ca to 8d0c51e Compare June 17, 2026 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Ready

Development

Successfully merging this pull request may close these issues.

Add a command to reset the Ansible fact cache

2 participants