opentelemetry-sdk: Add ability to refresh process sensitive Resource attributes #5280
herin049 wants to merge 11 commits into
Pull request overview
This PR adds fork-safety for process-sensitive SDK Resource attributes (e.g., process.pid) by re-running process-sensitive resource detectors in child processes and propagating refreshed resources into trace, metrics, and logs providers (including already-created tracers/loggers). It also introduces provider-level update_resource() APIs and adds subprocess-based fork tests.
Changes:
- Add `ResourceDetector.is_process_sensitive()` and `_get_process_sensitive_resource()` to selectively re-run only process-sensitive detectors.
- Add `update_resource()` to `TracerProvider`, `MeterProvider`, and `LoggerProvider`, and wire `os.register_at_fork(after_in_child=...)` to refresh resources in child processes.
- Add fork-based subprocess tests plus direct `update_resource()` unit tests for traces, metrics, and logs.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `opentelemetry-sdk/src/opentelemetry/sdk/resources/__init__.py` | Adds process-sensitivity marker on detectors and helper to aggregate only process-sensitive resources. |
| `opentelemetry-sdk/src/opentelemetry/sdk/trace/__init__.py` | Adds tracer/provider resource refresh on fork and public `update_resource()` that updates existing tracers. |
| `opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/__init__.py` | Adds meter provider resource refresh on fork and `update_resource()` support. |
| `opentelemetry-sdk/src/opentelemetry/sdk/_logs/_internal/__init__.py` | Adds logger provider resource refresh on fork, `update_resource()`, and propagation to active loggers. |
| `opentelemetry-sdk/tests/resources/test_resources.py` | Adds unit tests for process-sensitivity defaults and `_get_process_sensitive_resource()`. |
| `opentelemetry-sdk/tests/trace/test_trace.py` | Adds `TracerProvider.update_resource()` test and fork subprocess test. |
| `opentelemetry-sdk/tests/trace/scripts/tracer_provider_resource_after_fork.py` | New subprocess script validating trace resource PID refresh after fork. |
| `opentelemetry-sdk/tests/metrics/test_metrics.py` | Adds `MeterProvider.update_resource()` test and fork subprocess test. |
| `opentelemetry-sdk/tests/metrics/scripts/meter_provider_resource_after_fork.py` | New subprocess script validating metric resource PID refresh after fork. |
| `opentelemetry-sdk/tests/logs/test_logs.py` | Adds `LoggerProvider.update_resource()` test and fork subprocess test. |
| `opentelemetry-sdk/tests/logs/scripts/logger_provider_resource_after_fork.py` | New subprocess script validating log resource PID refresh after fork. |
| `.changelog/5280.added` | Changelog entry for the new fork/resource refresh capability. |
```python
[
    detector
    for detector in _build_resource_detectors()
    if detector.is_process_sensitive()
]
```
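The excerpt above filters the configured detectors down to the process-sensitive ones. A self-contained sketch of that filtering, assuming hypothetical `SketchDetector`, `SketchProcessDetector`, and `SketchHostDetector` classes that return plain dicts (the SDK's `ResourceDetector.detect()` returns a `Resource`), and a hypothetical `get_process_sensitive_attributes()` helper standing in for `_get_process_sensitive_resource()`. That `is_process_sensitive()` defaults to `False` is also an assumption of the sketch.

```python
import os
import socket


class SketchDetector:
    """Hypothetical detector base; the SDK's ResourceDetector differs."""

    def detect(self):
        raise NotImplementedError

    def is_process_sensitive(self):
        # Assumed default: attributes stay valid across fork, so no re-run.
        return False


class SketchProcessDetector(SketchDetector):
    def detect(self):
        return {"process.pid": os.getpid()}

    def is_process_sensitive(self):
        return True


class SketchHostDetector(SketchDetector):
    def detect(self):
        return {"host.name": socket.gethostname()}


def get_process_sensitive_attributes(detectors):
    """Re-run only the process-sensitive detectors and merge their output."""
    merged = {}
    for detector in [d for d in detectors if d.is_process_sensitive()]:
        # Later detectors win on key collisions (an assumption of this sketch).
        merged.update(detector.detect())
    return merged
```

Only the process-sensitive subset is re-run after a fork, so detectors whose output cannot change in a child (host name here) are never re-executed.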
Maybe a dumb idea but I figured I'd share.
What if we instantiate the resource values with a subclass of string (or add a new Protocol interface with repr support) that reads the live value from the process or somehow indicates the value is process sensitive?
I think for `service.instance.id`, you could consistently return the same UUID unless the process start time and PID are detected as having changed (this might not work across all Unixes; I haven't checked).
One downside: if people read such a resource attribute frequently, it could be slow (it makes a syscall).
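The lazy-value idea from this comment can be sketched as follows. A `str` subclass cannot change its own contents after construction, so this sketch takes the "Protocol with repr support" route instead: hypothetical `LivePid` and `ForkAwareInstanceId` objects that compute their string form on each read. The sketch keys only on the PID and ignores process start time (an assumption). Its lock covers the cross-thread reads mentioned in the reply below, but it inherits one of the edge cases raised there: if another thread holds the lock at the moment of `fork()`, the first read in the child deadlocks.

```python
import os
import threading
import uuid


class LivePid:
    """Hypothetical attribute value that renders the current PID when read."""

    def __str__(self):
        # One os.getpid() call per read: the overhead concern raised above.
        return str(os.getpid())

    def __repr__(self):
        return f"LivePid({os.getpid()})"


class ForkAwareInstanceId:
    """Hypothetical service.instance.id that regenerates when the PID changes."""

    def __init__(self):
        # Readers may be on other threads (e.g. batch processors), so guard state.
        self._lock = threading.Lock()
        self._pid = os.getpid()
        self._value = str(uuid.uuid4())

    def __str__(self):
        pid = os.getpid()
        with self._lock:
            if pid != self._pid:
                # New process (e.g. a forked child): issue a fresh ID once.
                self._pid = pid
                self._value = str(uuid.uuid4())
            return self._value

    def __repr__(self):
        return f"ForkAwareInstanceId({self})"
```

Within one process the ID is stable across reads; a forked child sees a new ID on its first read, with no post-fork handler registered.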
I think it's a good idea (since it avoids having to register a post-fork handler), but as you mention, there are certainly edge cases that need to be handled, plus the additional overhead. This hypothetical lazy string object would also need to be thread-safe, since it would cross thread boundaries quite frequently via batch processors.
My original reasoning for leaning towards this approach is that the logic added in this PR only runs if the process is forked. Given that >99% of Python applications never fork, I'd prefer a slightly more operationally complex solution that has no impact on the vast majority of users, who never fork, over a solution that could impact all users.
The other "solution" is to accept this limitation and document that forking an already-initialized OTel Python process is not supported. Since Python 3.12, `os.fork()` emits a `DeprecationWarning` if it detects multiple threads, and as of Python 3.14 fork is no longer the default `multiprocessing` start method on any platform.
> Changed in version 3.14: This is no longer the default start method on any platform. Code that requires fork must explicitly specify that via `get_context()` or `set_start_method()`.
>
> Changed in version 3.12: If Python is able to detect that your process has multiple threads, the `os.fork()` function that this start method calls internally will raise a `DeprecationWarning`. Use a different start method. See the `os.fork()` documentation for further explanation.
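The quoted docs ask code that still relies on fork to request it explicitly. A minimal sketch of that, using hypothetical helper names `report_pid` and `run_with_explicit_fork`; it is Unix-only, since the `"fork"` start method is unavailable on Windows. A child started this way is created through `os.fork()`, so it is exactly the case where the `after_in_child` hooks discussed in this PR would run.

```python
import multiprocessing
import os


def report_pid(conn):
    # Runs in the child process; its PID differs from the parent's.
    conn.send(os.getpid())
    conn.close()


def run_with_explicit_fork():
    # Fork is no longer the default start method, so request it explicitly.
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=report_pid, args=(child_conn,))
    proc.start()
    child_pid = parent_conn.recv()
    proc.join()
    return child_pid, proc.pid
```

With the `"spawn"` or `"forkserver"` start methods the child starts from a fresh interpreter (or a clean server process), which is why the stale-`process.pid` problem is specific to fork.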
Description
This PR adds support for refreshing process-sensitive SDK resources after `os.fork()`. It introduces provider-level resource updates for traces, metrics, and logs, marks process resource detection as process-sensitive, and uses `os.register_at_fork()` to re-run any process-sensitive resource detectors in child processes. This ensures attributes such as `process.pid` are updated for newly emitted spans, metrics, and logs after a fork.
Changes
- Add `update_resource()` support for SDK tracer, meter and logger providers.
- Refresh process-sensitive resources after `os.fork()` using `os.register_at_fork()`.
Open Questions
Should `update_resource()` be part of the public SDK provider API, or should resource refresh remain an internal implementation detail for fork handling? One argument for making it public is that users may want to update the resource on providers in reaction to other changes.
Fixes #5279
Type of change
How Has This Been Tested?
Does This PR Require a Contrib Repo Change?
Checklist: