OPC UA + gateway hardening (security, reconnect replay, multi-alarm, secure profile)#485
OPC UA + gateway hardening (security, reconnect replay, multi-alarm, secure profile)#485mfaferek93 wants to merge 11 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens both the southbound OPC UA plugin and the northbound gateway by adding OPC UA SecureChannel/user-auth configuration, improving alarm handling across reconnects (including read-based replay), supporting multi-alarm routing per condition identity, and shipping an opt-in “secure field profile” for gateway deployments that fails closed on invalid auth/TLS config.
Changes:
- Add OPC UA client security configuration (SecurityPolicy/MessageSecurityMode, certs/trust store, user identity) plus logging of the effective security profile.
- Implement active-condition replay on (re)subscribe with a configurable strategy and read-based fallback, including reconciliation logic to avoid false clears on scan failures.
- Add a secure gateway parameter preset + hardening documentation, and make gateway startup fail closed when TLS/auth is enabled but misconfigured.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ros2_medkit_plugins/ros2_medkit_opcua/test/test_opcua_plugin.cpp | Adds unit tests for replay-strategy parsing and reconcile false-clear guard. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/test/test_opcua_client.cpp | Adds unit tests for disconnected read-replay behavior and security parsing helpers. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/test/test_node_map.cpp | Adds unit tests for multi-alarm mapping resolution and associated-values parsing. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/src/opcua_poller.cpp | Adds per-source event select specs, multi-alarm routing, and reconnect replay (ConditionRefresh + read fallback) with reconcile logic. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/src/opcua_plugin.cpp | Parses new security + replay settings; logs effective security profile; uses resolved severity override from deliveries. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/src/opcua_client.cpp | Implements security config application (certs/trust, user identity) and read-based condition scanning. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/src/node_map.cpp | Makes nodes: optional, adds multi-alarm mappings + associated values, expands collision checks and lookup behavior. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/README.md | Documents multi-alarm mappings, associated values, security config, and replay strategies. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/include/ros2_medkit_opcua/opcua_poller.hpp | Declares replay strategy, reconcile helpers, and expanded delivery/runtime metadata. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/include/ros2_medkit_opcua/opcua_plugin.hpp | Declares security-profile logging helper. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/include/ros2_medkit_opcua/opcua_client.hpp | Adds security enums/config fields and read-based condition snapshot API. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/include/ros2_medkit_opcua/node_map.hpp | Adds mapping/associated-value config types and alarm resolution API. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/docker/gateway_params.yaml | Clarifies demo config is insecure and points to secure profile for deployments. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/docker/Dockerfile.gateway | Ships the gateway secure params profile into the container image. |
| src/ros2_medkit_plugins/ros2_medkit_opcua/CMakeLists.txt | Forces open62541 encryption backend (OpenSSL) to support secured OPC UA profiles. |
| src/ros2_medkit_gateway/test/test_gateway_node.cpp | Adds tests asserting fail-closed behavior for invalid auth/TLS when enabled. |
| src/ros2_medkit_gateway/src/gateway_node.cpp | Changes TLS/auth builder failures to fatal + throw (fail closed) instead of silently disabling. |
| src/ros2_medkit_gateway/design/hardening.rst | Adds gateway hardening guide describing the secure field profile and deployment checklist. |
| src/ros2_medkit_gateway/config/gateway_params.secure.yaml | Adds hardened parameter preset enabling auth/TLS/restricted CORS/rate limiting/locking. |
bd919de to
4c49f68
Compare
bburda
left a comment
There was a problem hiding this comment.
Additional findings outside the diff:
-
ros2_medkit_opcua/package.xmlhas notest_depend/exec_dependonros2_medkit_fault_manager, but the newtest_opcua_secured.test.pylaunchesfault_manager_node. The secured CI job passes only because it explicitly names--packages-up-to ... ros2_medkit_fault_manager; per the project's integration-test dependency rule, an isolated per-package build would not guarantee fault_manager is built first and the required secured gate would fail. Add<test_depend>ros2_medkit_fault_manager</test_depend>. -
The C event trampoline
on_event_trampoline_c(opcua_client.cpp) calls the user callback with no try/catch, andrun_iteratecatches onlyopcua::BadStatus. This PR adds new throwing work reachable from that callback (resolve_alarm, associated-value string concatenation inon_event,conditions_.emplace), so astd::bad_allocor any non-BadStatus exception unwinds through open62541's C frames - undefined behavior. Wrap the callback dispatch intry { ... } catch (...) { /* log, swallow */ }.
…klist Add gateway_params.secure.yaml turning on the existing JWT auth, TLS, restricted CORS and rate limiting for appliance / plant-network deployments, plus design/hardening.rst (checklist + credential/cert provisioning). The appliance image ships the preset; the demo params now point to it.
Extend OpcuaClientConfig + connect() with SecurityPolicy, MessageSecurityMode, a client application-instance certificate, a server trust store with reject-untrusted, and user identity (anonymous / username-password / X.509). Compile in the OpenSSL encryption backend. Anonymous + None stay the default.
Replay already-active conditions on (re)subscribe with a configurable strategy (ConditionRefresh, read-based fallback that browses sources and reconciles the fault set, auto, off) so servers rejecting ConditionRefresh still recover their alarms. Route one source's events to distinct faults by condition identity (ConditionName / SourceNode / EventType) and enrich descriptions with the event Message plus configured associated values (SD_n). Allow event-alarm-only maps.
Fail closed when auth/TLS is enabled but its config is invalid (throw instead of serving unauthenticated/plaintext). Warn loudly when OPC-UA credentials would cross an unencrypted channel. Stop the read-based replay from false-clearing active faults on a failed/disconnected scan, and keep conditions whose ActiveState read fails transiently. Refs #477 #478 #479 #480
Fail loud on unrecognized OPC-UA security_policy/security_mode/user_auth instead of silently defaulting to an insecure profile. Keep read-replay observed-but-unmapped conditions in the seen set so reconcile does not clear them, and reject a non-sequence nodes: section in the node map. Drop unused locals in EXPECT_THROW, fix the hardening table syntax and bulk_data.max_upload_size name, and add hardening to the design toctree. Refs #477 #478 #479 #480
Call ConditionRefresh on ConditionType i=2782 per Part 9 5.5.7, not the Server object i=2253 (root cause of BadNodeIdUnknown). Read-fallback now never clears faults for a source not positively known to expose condition instance nodes (EventNotifier-only servers like S7-1500 are preserved), filters by Retain, excludes Disabled conditions, follows HasCondition i=9006, and keeps conditions on a transient browse failure. Adds optional require_confirm_for_clear (default unchanged) for Confirm-less servers. Refs #478
Negotiates an encrypted Basic256Sha256 + SignAndEncrypt channel with a username token against an encryption-enabled test_alarm_server, covering apply_security_config()'s success path: secured read, CONFIRMED alarm, and wrong-password fail-closed. CTest 77 skip unless required; new CI job. Refs #477
…eads Gate the modeled-source insert on at least one reliably-read condition so a transient browse error cannot flip an EventNotifier-only source into modeled and wipe live event faults. Never Skip a reliably-active condition, and flag a transient Retain read failure so a retained condition is kept, not cleared. Refs #477 #478
Rephrase two docstrings to imperative mood (D401) and wrap four over-length lines (E501) in test_opcua_secured.test.py. Refs #477
- reconcile tracked faults on ConditionRefresh RefreshEnd so an alarm that cleared while offline no longer stays latched Confirmed - reject contradictory SecurityPolicy/MessageSecurityMode pairings and contain event-callback exceptions at the open62541 C boundary - reject duplicate (entity_id, fault_code) across event_alarms to stop clear-by-code flapping; add fault_manager test_depend - docs: warn that auth.jwt_secret is readable over DDS and via ps; fix the CORS default; relabel accept-any trust as insecure (not TOFU) Refs #477 #478 #479 #480
The reject_untrusted: false path uses AcceptAll, which pins nothing and re-accepts any server cert on every connect. Calling it TOFU overstated its security and contradicted the corrected code comments and the env-var table. Relabel it as INSECURE, lab only, not trust-on-first-use.
0dec989 to
b44ab2c
Compare
Closes #477 #478 #479 #480
Tested: 196 unit tests pass. Real-PLC acceptance for the security/replay/alarm paths is pending hardware validation.