Add draft project security threat-model document #13293
Adds a draft project-level security threat-model document (draft-THREAT-MODEL.md) at repo root, improving discoverability for automated security scanners running against this repository. The file follows the rubric format used by several other ASF projects piloting security-model discoverability. The "draft-" prefix signals this is a proposal for the PMC to review, correct, or reject — not a finalised maintainer-blessed model. Every claim carries a provenance tag (documented / inferred / maintainer) so reviewers can see where each claim originates; §14 collects open questions for the maintainers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #13293 +/- ##
============================================
+ Coverage 18.10% 18.76% +0.65%
- Complexity 16752 17974 +1222
============================================
Files 6037 6160 +123
Lines 542796 552571 +9775
Branches 66456 67346 +890
============================================
+ Hits 98291 103705 +5414
- Misses 433460 437459 +3999
- Partials 11045 11407 +362
|
Markdown / typos / table-shape fixes per the CI lint output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
There are a lot of details in the draft that need a better set of eyes, so I'm assigning @DaanHoogland and @vishesh92, who are also PMC leads on the work.
|
Thanks @DaanHoogland @yadvr @vishesh92 — agreed, let's make this (apache/cloudstack) the canonical project-level threat model and have the client/tooling repos inherit from it rather than each carrying a full copy. Concretely, mirroring what we've done for other multi-repo PMCs:
So let's converge here first. None of the satellite PRs are merged, so re-pointing them to reference this model once its shape is settled is cheap — I'll repurpose those into pointer PRs (or close + reopen) once you're happy with the parent. On "the fields we need": that's exactly the §14 "Open questions" section — each is a proposed answer for you to confirm, correct, or strike, grouped into waves so you can take a few at a time. Drop answers inline or here and I'll fold them in and promote the provenance tags. Happy to adjust the section set if CloudStack's shape calls for it. |
…po copy Drop the standalone draft-THREAT-MODEL.md and wire the discoverability chain AGENTS.md -> SECURITY.md -> the project-wide model in apache/cloudstack (apache/cloudstack#13293), so scanners find one canonical model and this repo inherits it rather than duplicating it. Generated-by: Claude Code
| **Q9.** Guest VM workloads — confirm that hypervisor-mediated side | ||
| channels and resource-exhaustion-within-allocation are out of scope, and | ||
| that the in-scope orchestration concerns are limited to "did CloudStack | ||
| place the VM in the right VLAN / apply the right security group / route | ||
| the right IP" (proposed)? *(maps to §3 item 5, §7, §9)* |
This sounds right.
IMO, the only scenario where this would be a CloudStack problem is if CloudStack applies wrong or insecure settings when launching the guest VM, or during some other action, and that causes the corresponding issues on the hypervisor. In that scenario, CloudStack needs to ensure it uses correct, secure settings for the hypervisor.
@DaanHoogland What do you think?
| **Q11.** Confirm the unsupported-component list: `tools/marvin/`, | ||
| `test/`, `developer/`, `quickcloud/`, `cloud-cli/`, | ||
| `tools/{devcloud4,devcloud-kvm,appliance,checkstyle,transifex,bugs-wiki,...}`, | ||
| `simulator` hypervisor plugin. Anything to add or remove? *(maps to §3 | ||
| item 7)* |
@DaanHoogland do you think we need to include simulator & tools/appliance?
I think we need to exclude them and make that explicit. Later we might want to create tooling with the express purpose of checking security, but let’s leave it out of scope for now.
| **Q18.** 2FA — proposed: off by default, operator turns it on per | ||
| domain / per user via `enable.2fa.*`. Confirm; and is "2FA disabled in | ||
| production" a §10 violation or a deployment choice? *(maps to §5a, | ||
| §10)* |
IMO, this is a deployment choice. The correct global settings for this are:
enable.user.2fa - default is false. Determines whether two factor authentication is enabled or not. This can also be configured at domain level.
mandate.user.2fa - default is false. Determines whether to make the two factor authentication mandatory or not. This setting is applicable only when enable.user.2fa is true. This can also be configured at domain level.
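The interaction between the two settings can be sketched as below. This is a minimal Python illustration, not CloudStack code: the function name `two_factor_policy` and the returned labels are hypothetical, and only the setting names, their defaults, and the "mandate applies only when enabled" rule come from the comment above.

```python
def two_factor_policy(enable_user_2fa: bool = False,
                      mandate_user_2fa: bool = False) -> str:
    """Return the effective 2FA posture for a domain (hypothetical helper).

    Parameters mirror enable.user.2fa and mandate.user.2fa; both default
    to False, as stated in the review comment.
    """
    if not enable_user_2fa:
        # mandate.user.2fa is only applicable when enable.user.2fa is true.
        return "disabled"
    return "mandatory" if mandate_user_2fa else "optional"
```

Under this reading, "2FA disabled in production" is simply the default posture, which is consistent with treating it as a deployment choice rather than a §10 violation.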
| **Q20.** Integration API port `:8096` — proposed: closed (port-zero) by | ||
| default in production packaging, open only when explicitly configured; | ||
| when open, it is unauthenticated by design. A report of "integration | ||
| port allows admin commands without auth" is `OUT-OF-MODEL: | ||
| non-default-build` *if* the operator opened it, else `VALID`. Confirm | ||
| the default. *(maps to §5a, §10, §11a)* |
The default should be 0 (disabled). But I need to confirm this.
@DaanHoogland any idea about this?
yes, it is set to 0 and only in test configurations it is set to 8096.
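With the default now confirmed as 0, the disposition rule proposed in Q20 could be expressed roughly as follows. This is a sketch only: the function and constant names are hypothetical, and the two disposition strings are taken verbatim from the quoted question.

```python
# Confirmed in this thread: the integration API port defaults to 0 (disabled).
DEFAULT_INTEGRATION_PORT = 0


def triage_integration_port_report(configured_port: int) -> str:
    """Disposition for a report of unauthenticated admin commands on the
    integration port (hypothetical triage helper following Q20)."""
    if configured_port != DEFAULT_INTEGRATION_PORT:
        # The operator explicitly opened the port; it is unauthenticated
        # by design, so the report describes a non-default deployment.
        return "OUT-OF-MODEL: non-default-build"
    # The port is reachable even though the configuration says disabled.
    return "VALID"
```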
|
Thanks @vishesh92 — this is exactly the input the §14 questions were after; folding all of it in. How each lands:
Two still on @DaanHoogland: L976 (whether I'll push the updated model with the confirmed items folded in. Thanks again — this is the review that makes it usable for triage. |
Promotes six §14 questions from proposed to maintainer-confirmed per vishesh92's inline answers (apache#13293): - Q8 root admin: confirmed trusted operator (direct access anyway) -> OUT-OF-MODEL: equivalent-harm (§3 item 4, §7). - Q9 guest/hypervisor: side channels + in-allocation exhaustion out of scope; one in-model case is CloudStack applying wrong/insecure hypervisor settings (Daan to confirm boundary). - Q10 userdata: end-user guest-OS customization, tenant-controlled data in their own boundary, not a CloudStack injection surface. - Q17 proxy headers: proxy.header.verify default false; proxy.header.names read only when Remote_Addr in proxy.cidr. - Q18 2FA: corrected the stale setting names to the real ones - enable.user.2fa + mandate.user.2fa (both default false, domain- configurable); 2FA-off is a deployment choice, not a §10 violation. - Q19 password encoders: greenfield md5/plaintext hashing is OUT-OF-MODEL: non-default-build; effective set PBKDF2,SHA256SALT,SAML2. Clears the pending-Q18/Q19 notes in §10/§11 and updates the tally. L976 (simulator/tools-appliance scope) and L1007 stay open on Daan. Generated-by: Claude Opus 4.8 (1M context)
| **Q13.** Network-fabric assumptions — proposed: at least four logical | ||
| networks (management, public, guest, storage), with the management | ||
| network as the trusted control plane. Is that the canonical model, or | ||
| do you support more compressed topologies (single-fabric) in production? | ||
| *(maps to §5, §10)* |
There are four logical networks that can each have multiple instances for use in different topologies, e.g. multiple zones. They can be combined in physical networks. All four types of the logical networks must be present for a functional system.
| **Q14.** Clock-skew assumption for signature v3 `expires` enforcement — | ||
| proposed: operator's responsibility to keep client + management-server | ||
| clocks roughly in sync. Confirm. *(maps to §5)* |
yes, (reminder @vishesh92 , this might be one we need to add to the security model page: https://cloudstack.apache.org/security)
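To illustrate why clock sync matters for `expires`, here is a small Python sketch. It assumes the server simply rejects a request once its own clock has passed the `expires` timestamp; the actual enforcement in CloudStack may differ, and the function name and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def is_request_expired(expires: datetime, server_now: datetime) -> bool:
    """Assumed rule: reject once the server clock has passed `expires`."""
    return server_now > expires


# The client signs a request meant to stay valid for 10 minutes.
client_now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = client_now + timedelta(minutes=10)

# Server clock in sync with the client: the request is accepted.
in_sync = is_request_expired(expires, client_now)

# Server clock 15 minutes ahead: a fresh request is rejected outright.
server_ahead = is_request_expired(expires, client_now + timedelta(minutes=15))

# Server clock 15 minutes behind: the request stays replayable for about
# 25 minutes of real time instead of the intended 10.
server_behind = is_request_expired(expires, client_now - timedelta(minutes=15))
```

Skew in one direction causes spurious rejections, and skew in the other direction widens the replay window, which is why the draft places the responsibility on the operator.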
| **Q15.** Confirm the filesystem-permissions inventory for sensitive | ||
| files: JCEKS keystore, Root CA private key, JaSypt key + IV, | ||
| `db.properties`. Who owns them, what mode? *(maps to §5, §10)* |
@potiuk , do you expect a csv type list of all files in a functioning system?
| **Q16.** Confirm the "what CloudStack does not do to its host" inventory | ||
| in §5: no child processes besides agent `Script` invocations / system | ||
| VM provisioning; signal-handlers via servlet container default; | ||
| environment-variable consumption confined to documented set. Anything to | ||
| add? *(maps to §5)* |
| **Q21.** API request size cap and cluster/agent RPC payload size cap — | ||
| are these explicitly bounded, or "whatever Jetty / NIO defaults give"? | ||
| *(maps to §6, §9)* |
The UI server has an explicit request size cap, set in `org.apache.cloudstack.ServerDaemon`: `DEFAULT_REQUEST_CONTENT_SIZE = 1048576` (1 MiB). For other components, the sizes are capped by the upstream components used.
| **Q22.** `api.throttling.*` and per-account resource limits — proposed: | ||
| these are the entire DoS-protection surface, with no engine-level | ||
| guard. Confirm. *(maps to §6, §9, §10)* |
confirmed, this is processed at API access check. (default of api.throttling.enabled == false!!)
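The practical consequence of that default can be shown with a toy limiter. This is a hypothetical fixed-window sketch in Python, not CloudStack's implementation: the class name, the `max_calls` and `interval_s` parameters, and their values are assumptions; only the fact that throttling is disabled by default comes from the comment above.

```python
from collections import defaultdict


class FixedWindowThrottle:
    """Hypothetical per-account fixed-window limiter (illustration only)."""

    def __init__(self, enabled: bool = False, max_calls: int = 25,
                 interval_s: float = 1.0):
        self.enabled = enabled
        self.max_calls = max_calls
        self.interval_s = interval_s
        # account -> (window start time, calls seen in that window)
        self._windows = defaultdict(lambda: (0.0, 0))

    def allow(self, account: str, now: float) -> bool:
        if not self.enabled:
            # Mirrors the default: with throttling off, every call passes.
            return True
        start, count = self._windows[account]
        if now - start >= self.interval_s:
            start, count = now, 0  # start a new window
        if count >= self.max_calls:
            return False
        self._windows[account] = (start, count + 1)
        return True
```

With `enabled=False`, the limiter never rejects a call, so in a default deployment the per-account resource limits are the only remaining bound.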
| **Q23.** Decompression behaviour on uploaded QCOW2 / RAW / OVA — proposed: | ||
| no engine-side cap; per-account storage limits + hypervisor limits are | ||
| the bound. Confirm. *(maps to §6, §9)* |
| **Q24.** Same-host non-`cloudstack` UID — proposed: game-over, no defence | ||
| claimed. Confirm. *(maps to §7, §9)* |
@vishesh92 , I think there is a refusal to add a host with the same IP, does this include a UID check as well (or should it)?
| **Q25.** Side-channel observers (cache, branch, hypervisor-shared) — out | ||
| of scope (proposed). *(maps to §7, §9)* |
@potiuk I agree with cache and hypervisor-shared, (if I understand them correctly) but I do not understand “branch” in this context. Can you explain?
| **Q26.** Byzantine-internal-peer threshold — confirm CloudStack makes no | ||
| BFT claim, so any compromised cluster peer or agent with a valid | ||
| Root-CA-issued cert is unbounded (proposed). *(maps to §7, §9)* |
Agreed. @vishesh92 we might want to add some issues/feature proposals in this area. This will only work in larger clusters, not in single or dual machine clusters (if I understand the byzantine model correctly).
| **Q27.** §8 P9 memory-safety — JVM-bounded; is the reachability | ||
| boundary correctly "in-model for the JSON API + B5 input; out-of-model | ||
| for native hypervisor SDK bugs that surface as `Throwable`"? *(maps to | ||
| §8 P9, §9)* |
§8 P9 says that “CloudStack's own server-side code is Java”, implying it is only Java. This is not correct. No limitation on implementation languages is presumed. Claims about the JVM are correct.
For instance, OCaml and Python code can run on hypervisors, as well as Bash, and Go is used on the management server. This list may not be complete now or in the future.
| **Q28.** §8 P10 listing-scope — confirm the §10 invariant "`list*` | ||
| responses are scoped to the principal's domain/account/project". And: | ||
| is information leak via error messages / async-job status / event log | ||
| an in-model concern, or accepted? *(maps to §8 P10, §9, §11)* |
Regular system logs (log4j for instance) are exempt. Other than these, all information leaks are a concern.
| **Q29.** Data-at-rest encryption — confirm CloudStack delegates entirely | ||
| to storage layer / hypervisor (LUKS, Ceph encryption, vSphere VM | ||
| Encryption); no CloudStack-layer encryption of guest volumes. *(maps to | ||
| §9)* |
| **Q30.** Constant-time comparison — confirm that *only* the API | ||
| signature path uses `ConstantTimeComparator`. Login password compare, | ||
| session cookie compare, console-token compare — none documented | ||
| constant-time. Is that intentional? *(maps to §8, §9)* |
I do not understand: is the question whether the fact that it is not documented is intentional?
If that is indeed the question, then yes, this is a missing feature. (Pretty sure I am missing the point, cc @vishesh92.)
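For context on what Q30 is asking, the difference between an early-exit comparison and a constant-time one can be sketched in Python. This is an analogy only: CloudStack's `ConstantTimeComparator` is Java, and the function names below are hypothetical; `hmac.compare_digest` is the Python standard-library equivalent.

```python
import hmac


def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: returns at the first mismatching byte, so
    its running time depends on how long the matching prefix is."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison whose timing does not depend on where the inputs differ."""
    return hmac.compare_digest(a, b)
```

Both functions return the same answer. The question is whether the login password, session cookie, and console-token paths use the second style, since the first can in principle let an attacker who measures response times recover a secret incrementally.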
| **Q31.** Time-of-check-to-time-of-use between RBAC check at API entry | ||
| and orchestration on agent fleet — confirm mid-job RBAC revocation is | ||
| **not** retroactively enforced (proposed). *(maps to §9)* |
| **Q32.** TLS posture on `:8080` vs `:8443` — confirm production deploys | ||
| behind TLS on `:8443` or behind a TLS-terminating reverse proxy; a bare | ||
| `:8080` HTTP API is dev-only. *(maps to §5a, §10)* |
| **Q33.** `security.encryption.key` reuse across environments — confirm | ||
| that reusing the JaSypt key + IV across staging and production is a | ||
| documented misuse. *(maps to §11)* |
| **Q34.** Should this document live at `docs/threat-model.md` in | ||
| `apache/cloudstack`, or as a page on `cloudstack.apache.org/security/`? | ||
| Or both, with one canonical and the other linked? *(meta)* |
In my not so humble opinion: cloudstack.apache.org/security should contain an excerpt of, and a link to, threat-model.md, the latter being the source of truth. @vishesh92 ?
| **Q35.** Is there an existing CloudStack threat-model document | ||
| (Confluence, internal, or a `[SECURITY]`-tagged dev@ thread) that this | ||
| should reconcile against rather than supersede? *(meta — §3.1a of the | ||
| rubric)* |
cloudstack.apache.org/security/ is the only security model at this moment, and this document should enhance it by providing a source of truth.
| **Q36.** What kind of change should trigger a revision (proposed list in | ||
| §12 — confirm or correct)? *(meta, §12)* |
One that I would add is a change in the extension mechanisms implemented by CloudStack.
| **Q38.** Confirm the structural decision to keep the four satellite repos | ||
| as separate delta models (`cloudstack-go-threat-model-draft.md`, | ||
| `cloudstack-cloudmonkey-threat-model-draft.md`, | ||
| `cloudstack-terraform-provider-threat-model-draft.md`, | ||
| `cloudstack-kubernetes-provider-threat-model-draft.md`) inheriting §3 | ||
| / §4 / §7 from this document. *(meta, §3 item 9)* |
Confirmed, these are not the system core. They cannot be used without the core, but the core can be used without them. There is in fact an added hierarchy among the repos, in that cloudstack-go is a dependency of the other three.
Summary
This PR adds an initial draft of a project-level security
threat-model document (`draft-THREAT-MODEL.md`) so that automated
security scanners running against this repository have a
maintainer-facing reference for which classes of findings are
in-scope vs. out-of-scope for the project.
The document follows the rubric format used by several other ASF
projects piloting improved security-model discoverability for
agentic scanners. Every claim carries a provenance tag:
- (documented): […] the project website), cited inline.
- (inferred): […] knowledge; the PMC has not confirmed.
- (maintainer): […] to this draft. (Zero in this initial draft.)
Draft stats:
§14 is the highest-leverage section: answering each question
either promotes one (inferred) tag to (maintainer) or corrects
the underlying claim.
Why "draft-" prefix?
The file is named `draft-THREAT-MODEL.md` rather than
`SECURITY-THREAT-MODEL.md` because this is a proposal for the
PMC to review. Please correct, reject, or discuss as needed.
Once the PMC ratifies (or substantially edits) the content, the
file can be renamed in a follow-up PR and a discoverability
scaffold (`AGENTS.md` → `SECURITY.md` → the model) added so
scanners can mechanically follow the chain.
What this is, and what it is not
This is not a security audit. It is a working triage document
— the reference a triager holds against an inbound report to
decide whether the report is about a CloudStack vulnerability or
about caller misuse / operator misconfiguration / an out-of-scope
concern.
The draft was generated by an automated agentic security scan
being piloted by the ASF Security team; the discoverability work
is independent of any specific scan run.
How to review
replaces the inferred claim with the correct one.
dispositions) — those govern how a vulnerability report would
be triaged.
Reply edits / corrections inline on the PR, or to the original
`security@apache.org` thread, whichever fits the PMC's workflow.
🤖 Generated with Claude Code