Releases: DataDog/datadog-agent

7.78.1

23 Apr 08:49
d4a406f

Agent

Prelude

Released on: 2026-04-23

Enhancement Notes

  • The Agent's embedded Python has been upgraded from 3.13.12 to 3.13.13.
  • Agents are now built with Go 1.25.9.

Bug Fixes

  • Fix missing signature on macOS Agent packages
  • Fix the system-probe SELinux policy module failing to load on RHEL 7 with the error "policydb module version 21 does not match my version range 4-19". The module is now compiled against modular policy version 19, which is the highest version supported by RHEL 7 and is backward-compatible with newer RHEL releases.
  • Add logic to include integrations that do not have a manifest.json file in the Agent.
  • Adds the tasks/agent.py file to the list of files used to compute the global omnibus cache.

Datadog Cluster Agent

Prelude

Released on: 2026-04-23 Pinned to datadog-agent v7.78.1: CHANGELOG.

Bug Fixes

  • Fixed a Cluster Agent issue where container-targeted APM library injection could mount a tracing library into all application containers in a pod instead of only the annotated container.

7.78.0

15 Apr 12:53
88ace41

Agent

Prelude

Released on: 2026-04-15

Upgrade Notes

  • APM OTLP: Changed attribute precedence behavior when looking up OpenTelemetry semantic convention attributes that have multiple equivalent keys (e.g., http.status_code vs http.response.status_code, deployment.environment vs deployment.environment.name).

    Previous behavior: When both old and new semantic convention keys existed, the lookup would check ALL keys in span attributes before checking ANY key in resource attributes. So whichever key appeared in span attributes would win, regardless of which key was in resource attributes.

    New behavior: The lookup now uses a per-concept precedence order. For each semantic concept, the registry defines an ordered list of attribute keys; the first key that has a value is returned. The precedence order (which key takes priority) depends on the concept and may prefer either the newer or the older convention key. Span vs resource precedence (which map is checked first) is unchanged and still depends on the function.

    Who is affected: This change only affects users who have the same concept represented by different convention-version keys in span vs resource attributes. The returned value may now come from a different key than before, according to the concept's precedence order.

    This is an uncommon configuration since most instrumentation libraries use consistent semantic convention versions across span and resource attributes.
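
    A minimal sketch of the difference, under stated assumptions: the registry contents and key order below are hypothetical (the note says the real order is defined per concept and may prefer either key), and the function is assumed to check span attributes before resource attributes. The new lookup is shown as one plausible reading of the note, not as the Agent's actual code.

```python
# Hypothetical registry: semantic concept -> attribute keys in precedence order.
# The order is illustrative only; the Agent defines the real order per concept.
REGISTRY = {
    "deployment.environment": ["deployment.environment.name", "deployment.environment"],
}


def lookup_previous(concept, span_attrs, resource_attrs):
    """Previous behavior: try every key in span attributes, then every key in resource attributes."""
    for attrs in (span_attrs, resource_attrs):
        for key in REGISTRY[concept]:
            if key in attrs:
                return attrs[key]
    return None


def lookup_new(concept, span_attrs, resource_attrs):
    """New behavior (as interpreted here): walk keys in precedence order, checking span then resource for each key."""
    for key in REGISTRY[concept]:
        for attrs in (span_attrs, resource_attrs):
            if key in attrs:
                return attrs[key]
    return None


span = {"deployment.environment": "staging"}        # older convention key on the span
resource = {"deployment.environment.name": "prod"}  # newer convention key on the resource

previous_value = lookup_previous("deployment.environment", span, resource)
new_value = lookup_new("deployment.environment", span, resource)
```

    With these mixed-convention inputs the two lookups return values from different keys, which is the uncommon case the note describes. When span and resource attributes use the same convention version, both functions return the same value.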

New Features

  • Allows the Agent to get an API key in exchange for an AWS cloud authorization proof. This allows you to use your AWS credentials against Datadog and removes the need for you to manage an API key. More details can be found here: https://docs.datadoghq.com/account_management/cloud_provider_authentication/

  • The autoscaling vertical controller now supports in-place vertical pod resizing.

  • Add a new configuration provider, which schedules new instances of KSM checks to generate metrics from CustomResourceDefinitions.

    This new provider works with the kube_crd listener which listens for CustomResourceDefinitions created on the cluster and triggers a new autodiscovery-service for each one.

    This new configuration provider must use the standard kubernetes GroupVersionKind format in its AdvancedADIdentifier section to apply to a matching CustomResourceDefinition.

    The rest of the configuration is a standard KSM configuration instance.

  • CNM - Add 7 per-connection TCP congestion signals: rto_count (RTO loss events), recovery_count (fast recovery events), reord_seen (send-side reordering), rcv_ooopack (receive-side out-of-order packets), delivered_ce (ECN CE-marked segments), ecn_negotiated (ECN negotiation status), and probe0_count (zero-window probes). Collected via eBPF on CO-RE and runtime-compiled tracers, Linux only.

  • dd-procmgrd can now read process definitions and manage child process lifecycles with graceful shutdown.

  • dd-procmgrd now supervises managed processes with configurable restart policies, exponential backoff, and burst limiting.

  • dd-procmgrd can now manage the DDOT (Datadog Distribution of OpenTelemetry) collector process via a dual-mode mechanism. When a processes.d/datadog-agent-ddot.yaml config is present, dd-procmgrd takes over DDOT lifecycle management; otherwise the existing systemd unit manages it directly.

  • Automatic SBOM generation for running containers via system-probe

  • Runtime usage tracking - identifies which files and packages are actively accessed by running processes

  • Security enrichment - flags SUID binaries and processes running as root

  • gRPC streaming from system-probe to core agent for efficient SBOM forwarding

  • Automatic CWS policy generation based on running container SBOMs.

  • On Windows, the APM SSI installer now automatically enables system-probe to report injection telemetry from the ddinjector driver.

  • Kubernetes pod check annotations: Invalid JSON in pod check annotations (ad.datadoghq.com/<container>.checks) now produces a clear error message in the "Configuration Errors" section of agent status. A new CLI command agent validate-pod-annotation validates annotation JSON from a file or stdin and exits with an error on invalid syntax, so you can catch mistakes before applying annotations to pods.
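
    For the pod check annotations described above, a rough sketch of a comparable pre-flight check in a CI script. This is not the implementation of agent validate-pod-annotation. The function name is hypothetical, and the requirement that the top-level value be a JSON object is an assumption added here.

```python
import json


def validate_checks_annotation(raw):
    """Return (True, "") if raw parses as a JSON object, else (False, message)."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(parsed, dict):
        # Assumption: the annotation maps check names to their configuration.
        return False, "expected a JSON object mapping check names to configurations"
    return True, ""


good = '{"redisdb": {"instances": [{"host": "%%host%%", "port": 6379}]}}'
bad = '{"redisdb": {"instances": [}}'

good_ok, _ = validate_checks_annotation(good)
bad_ok, bad_message = validate_checks_annotation(bad)
```

    A script could exit with a non-zero status when the first element of the result is False, mirroring the "exits with an error on invalid syntax" behavior of the CLI command.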

Enhancement Notes

  • The agent now supports explicitly set cluster names that start with a digit or contain underscores.
  • Add source and provider fields to rtloader API and add integration_security configuration properties.
  • secrets-generic-connector: Allow configuration of X-Vault-AWS-IAM-Server-ID header for Hashicorp Vault AWS authentication method. Helps to prevent different types of replay attacks.
  • APM: When a 403 is received from the backend, trigger an API Key refresh, and retry the payload submission.
  • Secret Generic Connector: The Azure Key Vault backend now supports Service Principal authentication with client secret or client certificate, in addition to Managed Identity. Credentials are configured under the azure_session block (azure_tenant_id, azure_client_id, azure_client_secret or azure_client_certificate_path).
  • Agents are now built with Go 1.25.8.
  • dd-procmgr: Add CLI for the dd-procmgrd process manager. Processes are addressable by name or UUID.
  • dd-procmgrd: Add gRPC server over Unix socket with read-only RPCs (List, Describe, GetStatus) for querying managed process state.
  • dd-procmgrd: Add multi-process startup ordering via after/before config fields with topological sort and reverse shutdown order.
  • dd-procmgrd: Add write RPCs (Create, Start, Stop, ReloadConfig, GetConfig) for runtime control of managed processes.
  • The disk check now falls back to lsblk when blkid fails or returns no labels for disk label tagging. This ensures label and device_label tags are present on disk metrics even when the agent runs as a non-root user, since lsblk reads from sysfs and does not require elevated privileges.
  • Document kubernetes_use_endpoint_slices flag
  • Add X-Datadog-Additional-Tags header with hostname and agent version to data-streams-message HTTP requests.
  • DSM: The kafka_actions check now automatically inherits Schema Registry configuration (URL, credentials, TLS, OAuth) from the kafka_consumer integration, enabling schema registry support without additional configuration.
  • DDOT now sets deployment_type on the Datadog extension to daemonset by default, or gateway when Gateway mode is enabled.
  • The podman_db_path configuration option now accepts a comma-separated list of paths to support monitoring containers from multiple users simultaneously (e.g. root and rootless users). Example: podman_db_path: "/var/lib/containers/storage/db.sql,/home/myuser/.local/share/containers/storage/db.sql". When podman_db_path is not set, the Agent automatically discovers Podman databases for the root user and for all users under /home/. Log collection (logs_config.use_podman_logs) is also updated to work correctly with both explicit multi-path configuration and auto-discovery.
  • FIPS variants of the ddot-collector and agent -full images are now published.
  • Remote Agent Management is now enabled by default on FIPS environments when Remote Configuration is explicitly enabled.
  • The resource discovery agent (system-probe-lite) now wraps system-probe, acting as a loader for it. system-probe-lite automatically falls back to system-probe when one of the following is true:
    • discovery.enabled is set to false.
    • discovery.useSystemProbeLite is set to false (the default).
    • Any other non-discovery feature of system-probe is enabled.
  • Bumped the Security Agent policies to v0.78.0
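
    The dd-procmgrd startup ordering note above (after/before fields, topological sort, reverse shutdown order) can be illustrated with a small sketch. dd-procmgrd itself is a Rust daemon and its code is not shown here. The process names and the dictionary layout are hypothetical; only the after and before field names come from the note.

```python
from graphlib import TopologicalSorter

# Hypothetical process definitions for illustration.
processes = {
    "agent": {},
    "trace-agent": {"after": ["agent"]},
    "ddot": {"after": ["agent"], "before": ["log-shipper"]},
    "log-shipper": {},
}


def startup_order(defs):
    """Return process names in an order that satisfies every after/before constraint."""
    sorter = TopologicalSorter()
    for name, cfg in defs.items():
        # "after": this process starts once its listed predecessors have started.
        sorter.add(name, *cfg.get("after", []))
        # "before": each listed process must start after this one.
        for successor in cfg.get("before", []):
            sorter.add(successor, name)
    # static_order raises graphlib.CycleError if the constraints are circular.
    return list(sorter.static_order())


start = startup_order(processes)
stop = list(reversed(start))  # shutdown runs in reverse startup order
```

    Expressing before as an inverted after edge keeps a single dependency graph, so one sort yields both the startup order and, reversed, the shutdown order.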

Security Notes

  • The CMD API gRPC server is now configured to require client certificates (mTLS).

Bug Fixes

  • APM: Fix an issue where SQL stats group resources longer than 5000 characters were truncated before obfuscation, causing the trace-agent to fail to parse mid-token fragments and log an error instead of correctly obfuscating the query.

  • Use atomic file replacement (write to temp file then rename) when writing APM workload selection policy files, preventing concurrent readers from seeing partially-written data.

  • Fixed a race condition in the logs auditor where Flush() could write a stale registry to disk during a transport restart. The auditor now drains all pending payloads from its input channel before flushing, ensuring file offsets are up to date and reducing duplicate log processing after a TCP-to-HTTP transport switch.

  • [DBM] Bump go-sqllexer to v0.2.1 to fix the following bugs:

    • Fixes table name metadata extraction to correctly collect all table names from comma-separated table lists (e.g., SELECT * FROM t1, t2).
  • The diagnose command now returns an error if an API key is not configured.

  • Fixes a panic when advanced dispatching is disabled and KSM Core is run as a cluster check.

  • Fix support of Kafka actions for configurations where kafka_connect_str is a list.

  • Fixed a bug in the disk Go check (diskv2) where partition enumeration could hang indefinitely on Windows when an orphaned or offline volume is present on the system. The check now applies the configured timeout (default 5s) to partition discovery and guards against spawning duplicate goroutines on subsequent check runs, preventing permanent worker starvation, goroutine buildup, and high CPU utilization.

  • The process check now reports the correct...
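
    The atomic file replacement mentioned in the bug fixes above (write to a temp file, then rename) follows a common pattern, sketched here. The helper name is hypothetical and this is not the Agent's code.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Write data to path so that readers see either the old or the new content, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_path, path)  # atomically swaps the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

    Because the rename is a single filesystem operation, a concurrent reader opening the target path gets a complete file in either its previous or its new version.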

7.77.3

08 Apr 21:04
b5ce415

Agent

Prelude

Released on: 2026-04-08

Bug Fixes

  • Fixes an issue where Cloud Network Monitoring would not resolve NAT'd cluster IPs when using Cilium to replace kube-proxy.

Datadog Cluster Agent

Prelude

Released on: 2026-04-08 Pinned to datadog-agent v7.77.3: CHANGELOG.

7.77.2

01 Apr 12:03
90afe57

Agent

Prelude

Released on: 2026-04-01

Enhancement Notes

  • Hide the GUI app by default for macOS Agent per-user installs.
  • Windows: Add PAR self-enrollment to installer.

Bug Fixes

  • Fixes Workload Protection raw-packet eBPF programs when multiple packet filters are compiled together. The generated assembly reused register R8 both as the event pointer expected by the filter chain and to hold immediate values, which corrupted the pointer and caused the kernel BPF verifier to reject the program. The code now uses a separate register for those immediates so the pointer is preserved across filters.
  • Workload Protection: resolves an issue in in-kernel cgroup tracking, enabling packet filtering to be correctly applied to containers.

Datadog Cluster Agent

Prelude

Released on: 2026-04-01 Pinned to datadog-agent v7.77.2: CHANGELOG.

7.77.1

24 Mar 07:38
4464fb6

Agent

Prelude

Released on: 2026-03-24

Enhancement Notes

  • Agents are now built with Go 1.25.8.

Bug Fixes

  • Fixed a bug introduced in 7.77.0 that prevents system-probe from starting on Fargate environments when Workload Protection is enabled
  • Fixed a command injection vulnerability in the Private Action Runner's inline PowerShell script execution. Parameter values are now assigned as PowerShell single-quoted string literals in a preamble instead of being substituted directly into the script body, preventing arbitrary code execution via crafted parameter inputs.
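
A minimal sketch of the mitigation described in the command injection fix above: values are assigned as single-quoted PowerShell literals in a preamble instead of being pasted into the script body. The helper names and the parameter-name check are hypothetical, and this is not the Private Action Runner's code.

```python
import re

# In a single-quoted PowerShell string the only escape is a doubled quote.
# PowerShell also accepts typographic single quotes as delimiters, so they are doubled too.
_PS_SINGLE_QUOTES = "'\u2018\u2019\u201a\u201b"


def ps_single_quoted(value: str) -> str:
    """Render value as a PowerShell single-quoted string literal."""
    escaped = "".join(ch * 2 if ch in _PS_SINGLE_QUOTES else ch for ch in value)
    return f"'{escaped}'"


def build_script(body: str, params: dict) -> str:
    """Prepend one '$Name = <literal>' assignment per parameter; the body is left untouched."""
    preamble = []
    for name, value in params.items():
        # Assumption: parameter names are restricted to simple identifiers.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"invalid parameter name: {name!r}")
        preamble.append(f"${name} = {ps_single_quoted(value)}")
    return "\n".join(preamble + [body])


script = build_script("Write-Output $Target", {"Target": "it's"})
```

Since the value only ever appears inside a quoted literal, a crafted input such as "x'; Remove-Item ..." stays data and is never parsed as script text.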

Datadog Cluster Agent

Prelude

Released on: 2026-03-24 Pinned to datadog-agent v7.77.1: CHANGELOG.

7.77.0

18 Mar 15:30
6127339

Agent

Known Issues

  • A bug introduced in this release prevents system-probe from starting on Fargate environments when Workload Protection is enabled. There is currently no workaround and the recommendation at this time is to downgrade to Agent v7.76.3 or upgrade to v7.77.1 when it becomes available.

Prelude

Released on: 2026-03-18

Upgrade Notes

  • APM OTLP: The datadog.* namespaced span attributes are no longer used to construct Datadog span fields. Previously, attributes like datadog.service, datadog.env, and datadog.container_id were used to directly set corresponding Datadog span fields. This functionality has been removed and the Agent now relies solely on standard OpenTelemetry semantic conventions.

    Exceptions:

    The configuration option otlp_config.traces.ignore_missing_datadog_fields (and corresponding environment variable DD_OTLP_CONFIG_IGNORE_MISSING_DATADOG_FIELDS) is deprecated and no longer has any effect. The Agent now always uses standard OTel semantic conventions.

    Migration: If you were using datadog.* attributes, switch to the standard OpenTelemetry semantic conventions:

    • datadog.service → service.name
    • datadog.env → deployment.environment.name (OTel 1.27+) or deployment.environment
    • datadog.version → service.version
    • datadog.container_id → container.id

    Who is affected: Users who explicitly set datadog.* attributes (other than datadog.host.name and datadog.container.tag.*) in their OpenTelemetry instrumentation to override default field mappings. Users relying solely on standard OpenTelemetry semantic conventions are not affected.

New Features

  • Add dd-procmgrd, a minimal Rust daemon for the Datadog process manager. The daemon starts, logs, and waits for a shutdown signal. It does not provide user-facing functionality.
  • Add a new listener based on all Custom Resource Definitions (CRDs) found on the cluster.
  • Logs pipeline failover: Added automatic failover capability to prevent log loss when compression blocks pipelines. When a pipeline becomes blocked during compression, log messages are automatically routed to healthy pipelines. N router channels (one per pipeline) distribute tailers via round-robin, each with its own forwarder goroutine that handles failover independently across all pipelines. Enable with logs_config.pipeline_failover.enabled: true (default: false). When all pipelines are blocked, backpressure is applied to prevent data loss.
  • The system memory check on Linux can now collect memory pressure metrics from /proc/vmstat to help detect memory pressure before OOM events occur. To enable, set collect_memory_pressure: true in the memory check configuration. New metrics: system.mem.allocstall (with zone tag), system.mem.pgscan_direct, system.mem.pgsteal_direct, system.mem.pgscan_kswapd, system.mem.pgsteal_kswapd.
  • APM: Add initial support for converting trace payload formats to the new "v1.0" format. This feature is disabled by default but can be enabled by adding the feature flag "convert-traces" to apm_config.features. It is not recommended to use this flag without direction from Datadog Support.
  • Integrate the Private Action Runner into the Datadog Cluster Agent.
  • The Private Action Runner (PAR) now runs in the Datadog Cluster Agent with improved identity management for Kubernetes environments. PAR identity (URN and private key) is now stored in a Kubernetes secret and shared across all DCA replicas using leader election. The leader replica handles enrollment and secret creation, while follower replicas wait for and read the shared identity. This enables multiple DCA replicas to execute PAR tasks using a single cluster identity, eliminating the need for per-replica enrollment.
  • Add a Windows PowerShell example config for private action runner scripts.
  • APM: Add image_volume-based library injection as an alternative to init containers and csi driver (experimental). Available only for Kubernetes 1.33+. This provides faster pod startup.
  • Autodiscovery template variables are now supported in ad.datadoghq.com/tags and ad.datadoghq.com/<container>.tags Kubernetes pod annotations. Template variables are resolved at runtime, enabling dynamic tagging based on pod and container metadata. This allows centralized tag configuration that applies to all checks, logs, and traces without hardcoding pod-specific values.
  • Start the Windows Private Action Runner service alongside the Agent when private_action_runner.enabled is set in datadog.yaml.
  • On Windows, the private action runner binary is now included in the MSI installer and registered as the datadog-agent-action Windows service. The service is installed as demand-start with a dependency on the main Agent service, and its credentials and ACLs are managed alongside the other Agent services during install, upgrade, and repair.
  • Add runPredefinedPowershellScript action to the Private Action Runner on Windows. This action allows running predefined PowerShell scripts (inline or file-based) with optional parameter templating, JSON schema parameter validation, environment variable allowlisting, configurable timeouts, and a 10 MB output limit.
  • On Windows, the Agent stops the private action runner service during MSI upgrades and fleet-driven stop-all operations so it is shut down alongside the Agent.
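
To illustrate the memory pressure metrics listed above, here is a hedged sketch of reading the relevant counters from /proc/vmstat-style text. The sample values are made up, and the mapping from per-zone allocstall_<zone> counters to a zone tag is an assumption about how the check derives system.mem.allocstall.

```python
# Made-up sample in /proc/vmstat format ("name value" per line), for illustration only.
SAMPLE_VMSTAT = """\
nr_free_pages 123456
allocstall_dma 0
allocstall_dma32 2
allocstall_normal 17
allocstall_movable 5
pgscan_kswapd 9000
pgscan_direct 120
pgsteal_kswapd 8500
pgsteal_direct 100
"""

COUNTERS = ("pgscan_direct", "pgsteal_direct", "pgscan_kswapd", "pgsteal_kswapd")


def parse_memory_pressure(text):
    """Return {(metric_name, tag_or_None): value} for the memory pressure counters."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip malformed lines
        name, value = parts[0], int(parts[1])
        if name in COUNTERS:
            metrics[("system.mem." + name, None)] = value
        elif name.startswith("allocstall_"):
            zone = name[len("allocstall_"):]
            metrics[("system.mem.allocstall", "zone:" + zone)] = value
    return metrics


pressure = parse_memory_pressure(SAMPLE_VMSTAT)
```

On a Linux host the same function could be fed the contents of /proc/vmstat. The counters are cumulative, so a consumer would typically look at the change between two reads.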

Enhancement Notes

  • The Agent's embedded Python has been upgraded from 3.13.11 to 3.13.12.

  • Add ntp.offset metric with source:intake tag to monitor clock drift using Datadog intake server timestamps. Original ntp.offset metric calculated from an NTP server is now tagged source:ntp.

  • As of Kubernetes version 1.33, the Endpoint API object has been deprecated in favor of EndpointSlice. Autodiscovery now supports the use of an EndpointSlice listener and provider to collect endpoint checks. To enable this feature, set kubernetes_use_endpoint_slices to true in your Datadog Agent configuration.

  • Add bucket label to image_resolution_attempts telemetry to track gradual rollout progress.

  • Added a private action runner bundle that exposes the Network Path traceroute functionality through the getNetworkPath action.

  • Sends telemetry for synthetics tests run on the agent, including checks received, checks processed, and error counts for test configuration, traceroute, and event platform result submission.

  • Added support for two new configurations for tag-based gradual rollout in Kubernetes SSI deployments. The gradual rollout can be configured using the following parameters:

    • DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_GRADUAL_ROLLOUT_ENABLED: Whether to enable gradual rollout (default: true)

    • DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_GRADUAL_ROLLOUT_CACHE_TTL: The cache TTL duration for the gradual rollout image cache (default: 1h)

      • This cache is used to store the mapping of mutable tags to image digest for the gradual rollout, and setting this TTL helps prevent the image resolution from becoming stale.
  • Agent metrics now include a connection_type tag with a value of tcp, uds, or pipe for lib-to-agent communications.

  • Automatically collect the team tag when a Kubernetes resource has a team label or annotation and explicit team tag extraction is not configured.

  • Enables the agent to support built-in credentials like IRSA for AWS cloud environments.

  • Bump go-sqllexer to v0.1.13, improving SQL obfuscation performance and fixing incorrect tokenization of multi-byte UTF-8 characters (e.g., CJK characters, full-width punctuation).

  • Agents are now built with Go 1.25.7.

  • NDM: Cisco SD-WAN interface metadata now includes the is_physical field to distinguish physical from virtual interfaces (loopback, tunnel). cEdge interfaces also include the type field with the IANA interface type number.

  • In the Cluster Autoscaling controller, use Kubernetes client update instead of patch.

  • On ECS Managed Instances, detect hostname from IMDS when the agent runs in daemon mode.

  • On ECS Managed Instances with daemon scheduling, the agent uses ECS_CONTAINER_METADATA_URI_V4 environment variable as a fallback signal for v4 availability.

  • Expose a new metric kube_apiserver.api_resource that holds the name, kind, group, and version of all known cluster-wide (non namespaced) resources on the cluster.

  • Add new DDOT feature gate 'exporter.datadogexporter.DisableAllMetricRemapping' to disable all client-side metric remapping.

  • Increases the reliability of namespaceLabelsAsTags and namespaceAnnotationsAsTags for new pods by caching the last seen namespace metadata.

  • Added a new, optional configuration setting for journald logs: default_application_name. If set to a non-empty string, the value replaces "docker" as the default application name for container-based journald logs. If set to an empty string, the application name is determined by the systemd journal fields, as for all non-container-based journald logs.

  • Simplified location permission detection on macOS by removing the initial polling-based detection at app startup. Permission detection now happens only at the time of WLAN data collection.

  • Use config flag 'request_locati...

7.76.3

09 Mar 09:43
fa64a68

Agent

Prelude

Released on: 2026-03-09

Security Notes

  • Bump github.com/cloudflare/circl to v1.6.3 to fix CVE-2026-1229.
  • Fixed a limited out-of-bounds memory read and DoS vulnerability in the Windows kernel driver while handling TLS traffic. To be affected, the host must have the ddnpm kernel driver service running, which requires system_probe_config and network_config to be enabled. This configuration is not enabled by default.
    • Query with PowerShell: Get-Service ddnpm
    • Query with command prompt: sc query ddnpm

Bug Fixes

  • Fixed IPv6 address matching logic that caused network traffic to be tracked incorrectly. Fixed failed classification of HTTP DELETE requests. Added additional memory handling and overflow safety checks.

Datadog Cluster Agent

Prelude

Released on: 2026-03-09 Pinned to datadog-agent v7.76.3: CHANGELOG.

7.76.2

05 Mar 09:05
0c76c1b

Agent

Prelude

Released on: 2026-03-05

Bug Fixes

  • The infra_mode tag is now correctly added to system.cpu.user on Windows when infrastructure_mode is not set to "full", matching the behavior of the Linux cpu check.

Datadog Cluster Agent

Prelude

Released on: 2026-03-05 Pinned to datadog-agent v7.76.2: CHANGELOG.

7.76.1

26 Feb 13:07
ca1d15d

Agent

Prelude

Released on: 2026-02-26

Security Notes

  • APM: On span tags, add obfuscation for ACL command.

Bug Fixes

  • Fixes a rare crash in the system-probe process caused by concurrent access to an internal LRU cache.
  • Fix a Windows file-permission issue that prevented workload selection policy files from being updated after the initial write.
  • Fixed a bug in the disk Go check (diskv2) where custom tags from one check instance would leak into metrics from other instances. Tags are now correctly isolated per instance.
  • GPU: ensure gpu.nvlink.speed metric is emitted in Blackwell or newer devices.

Datadog Cluster Agent

Prelude

Released on: 2026-02-26 Pinned to datadog-agent v7.76.1: CHANGELOG.

7.76.0

23 Feb 10:05
1c45a92

Agent

Prelude

Released on: 2026-02-23

Upgrade Notes

  • DDOT now submits Fleet Automation metadata through the upstream datadogextension, which is enabled by default. As a result, your DDOT configuration will now appear under the OTel Collector tab. If you configured otelcollector.converter.features, you may need to add the datadog feature to enable Fleet Automation, as DDOT Fleet Automation metadata is no longer submitted through the ddflareextension.

New Features

  • Allow users to filter agent check instances using a new --instance-id parameter, which filters by the instance hash found in the agent status.

  • Add privateactionrunner binary in Agent artifacts to allow running actions using the Agent, and enable running it on Linux. The binary is disabled by default. To enable it, set privateactionrunner.enabled: true in your configuration file.

  • Integration check failures are now automatically reported to the Agent Health Platform component when enabled via health_platform.enabled: true. This provides structured health issue tracking with:

    • Detailed error context including check name, error message, and configuration source
    • Actionable remediation steps for debugging check failures
    • Automatic issue resolution when checks recover
    • Integration with the health platform telemetry and reporting system

    This feature helps users proactively identify and troubleshoot integration issues across their fleet.

  • The Agent Profiling check now supports automatic Agent termination after flare generation when memory or CPU thresholds are exceeded. This feature is useful in resource-constrained environments where the Agent needs to be restarted after generating diagnostic information.

    Enable this feature by setting terminate_agent_on_threshold: true in the Agent Profiling check configuration. When enabled, the Agent uses its established shutdown mechanism to trigger graceful shutdown after successfully generating a flare, ensuring proper cleanup before exit.

    Warning: This feature will cause the Agent to exit. This feature is disabled by default and should be used with caution.

  • Experimental support for the ConfigSync HTTP endpoints over Unix sockets with agent_ipc.use_socket: true (defaults to false).

  • Implements the flare command for the otel-agent binary. Now you can run otel-agent flare directly in the otel-agent container to get OTel flares.

  • Adds system info metadata collection for macOS end-user devices.

  • Adds system info metadata collection for Windows end-user devices.

  • Added GPU runtime discovery support for ECS EC2 environments. The Datadog Agent can now detect GPU device UUIDs assigned to containers by extracting the NVIDIA_VISIBLE_DEVICES environment variable from the Docker container configuration. This enables GPU-to-container mapping for GPU metrics without requiring the Kubernetes PodResources API, which is not available in ECS environments.

  • After falling back to TCP, the Logs Agent periodically retries to establish HTTP and upgrades the connection once HTTP connectivity is available.

  • Container logs now include a LogSource tag indicating whether each log message originated from stdout or stderr. This applies to logs parsed via Docker and Kubernetes CRI runtimes.

  • Added paging file metrics to the Windows memory check for pagefile.sys usage.
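
For the ECS GPU discovery feature above, a small sketch of extracting GPU device UUIDs from a container's environment. The function name is hypothetical, and ignoring non-UUID values (such as "all" or device indices) is a choice made for this sketch; the note does not say how the Agent treats them.

```python
def gpu_uuids_from_env(env):
    """Extract GPU UUIDs from a list of "KEY=VALUE" strings, the form Docker uses for container env."""
    for entry in env:
        key, sep, value = entry.partition("=")
        if sep and key == "NVIDIA_VISIBLE_DEVICES":
            devices = (d.strip() for d in value.split(","))
            # Keep only UUID-style entries; other values are skipped in this sketch.
            return [d for d in devices if d.startswith("GPU-")]
    return []


container_env = [
    "PATH=/usr/local/bin:/usr/bin",
    "NVIDIA_VISIBLE_DEVICES=GPU-aaaaaaaa-0000-0000-0000-000000000001,GPU-aaaaaaaa-0000-0000-0000-000000000002",
]
uuids = gpu_uuids_from_env(container_env)
```

The resulting list is the kind of data that could populate a per-container GPU device field for GPU-to-container mapping without the Kubernetes PodResources API.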

Enhancement Notes

  • Add a new global_view_db variable to AWS Autodiscovery templates. By default this is the value of the datadoghq.com/global_view_db tag on the instance or cluster.

  • Add NotReady endpoint processing to be on par with EndpointSlices processing.

  • The agentprofiling check now retries flare generation 2 times with exponential backoff (1 minute after first failure, 5 minutes after second failure) when flare creation or sending fails. This improves reliability when encountering transient failures during flare generation.

  • Adds a kubernetes_kube_service_new_behavior flag (default false) to alter kube_service tag behavior. If the flag is set to true, kube_service tag is attached unconditionally. Previously, the tag was only attached when the Kubernetes service has the status Ready.

  • APM: Add custom protobuf encoder for trace writer v1 with string compaction to reduce payload size.

  • Extended the autodiscovery secret resolver to support refreshing secrets.

  • Agents are now built with Go 1.25.7.

  • The datadog-installer setup command now prints human-readable errors instead of mixing JSON and text.

  • Added GPUDeviceIDs field to the workloadmeta Container entity to store GPU device UUIDs. This field is populated by the Docker collector in ECS environments from the NVIDIA_VISIBLE_DEVICES environment variable (e.g., GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).

  • The GPU collector now uses GPUDeviceIDs from workloadmeta as the primary source for GPU-to-container mapping in ECS, with fallback to procfs for regular Docker environments and PodResources API for Kubernetes.

  • GPU: add new tag gpu_type to the GPU metrics to identify the type of GPU (e.g., a100, h100).

  • Improve eBPF conntracker support by using alternate probes when the primary probe is unavailable, enabling compatibility with GKE Autopilot and other environments running Google COS.

  • The logs.dropped metric now tracks dropped logs for both TCP and HTTP log transports. Previously, this metric was only available when using TCP transport. Customers can now monitor dropped logs with a single unified metric regardless of which transport protocol is configured, making it easier to detect and troubleshoot log delivery issues.

  • The logs agent now supports using start_position: beginning and start_position: forceBeginning with wildcard file paths. Previously, configurations like path: /var/log/*.log with start_position: beginning would fail validation. The agent's fingerprinting system, when enabled, prevents duplicate log reads during file rotation, making this combination safe to use.

  • Site config URLs are now lowercased for consistent handling.

  • APM: Add tags databricks_job_id, databricks_job_run_id, databricks_task_run_id, config.spark_app_startTime, config.spark_databricks_job_parentRunId to the default list of tags that are known to not be credit card numbers so they are skipped by the credit card obfuscator.

  • Add option to switch on/off Infra-Attribute-Processor for traces in the OTLP ingest pipeline.
    otlp_config:
      traces:
        infra_attributes:
          enabled: false

    These settings can be configured in the Agent config file or by using the environment variables.

  • The Datadog Agent now collects AWS Spot preemption events (requires IMDS access) as Datadog events.

  • Added network_config.dns_monitoring_ports, a list of ports on which Cloud Network Monitoring monitors DNS traffic.

  • Automatically tag, but don't aggregate, multiline logs. Logs are tagged with the number of other logs they could potentially be aggregated with.

  • Update the histogram helpers API in the pkg/opentelemetry-mapping-go/otlp/metrics package. The API now accepts pointers to the OTLP data points, and returns blank DDSketches when the pointer is nil.

  • Update image resolution attempt telemetry to include the tag specified in the configuration, and remove the registry and digest_resolution tags.

  • Windows: Add a new flare artifact agent_loaded_modules.json listing loaded DLLs with metadata (full path, timestamp, size, perms) and version info (CompanyName, ProductName, OriginalFilename, FileVersion, ProductVersion, InternalName). Keeps <flavor>_open_files.txt for compatibility.
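
The agentprofiling retry schedule described above (two retries, 1 minute after the first failure and 5 minutes after the second) can be sketched as follows. The function and parameter names are hypothetical, and the injectable sleep exists only to make the sketch testable.

```python
import time

# Delays taken from the release note: 1 minute, then 5 minutes.
RETRY_DELAYS_SECONDS = (60, 300)


def generate_flare_with_retries(generate, sleep=time.sleep):
    """Call generate(); on failure wait and retry, up to len(RETRY_DELAYS_SECONDS) retries.

    Returns (result, attempts). Re-raises the last error once the retries are exhausted.
    """
    attempts = 0
    # The trailing None marks the final attempt, after which no further wait happens.
    for delay in (*RETRY_DELAYS_SECONDS, None):
        attempts += 1
        try:
            return generate(), attempts
        except Exception:
            if delay is None:
                raise
            sleep(delay)


calls = []
slept = []


def flaky_flare():
    # Fails twice with a transient error, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient failure")
    return "flare.zip"


result, attempts = generate_flare_with_retries(flaky_flare, sleep=slept.append)
```

Passing slept.append in place of time.sleep records the waits without blocking, which shows the 60-second and 300-second delays being applied in order.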

Deprecation Notes

  • The command agent diagnose show-metadata inventory-otel has been removed. To display DDOT metadata, you can query the datadog extension endpoint: http://localhost:9875/metadata.

Bug Fixes

  • Properly scrub sensitive information from Kubernetes pod specifications in agent flares. Environment variables with sensitive names are now redacted.
  • Fixed a bug where long Kubernetes event bundles were being truncated by dogweb.
  • APM: Fix a bug where the Agent would log a warning when the DD_APM_MODE environment variable was unset.
  • Properly parse the image_tag tag when defining a container spec that uses both an image tag and a digest like nginx:1.23@sha256:xxx.
  • Updates tag enrichment logic to retry on failed tag resolution attempts. This regression was introduced in #41587 on Agent v7.73+. Impacts origin detection on cgroup v2 runtimes with DogStatsD, which led to tags not being enriched, even if origin detection was possible by using other methods like container ID from socket or ExternalData.
  • Fixed a regression in the Go-native disk check (diskv2) where a failure in IO counter collection (e.g. ERROR_INVALID_FUNCTION from DeviceIoControl on Windows Server 2016) caused all disk metrics to be discarded, including successfully collected partition/usage metrics such as system.disk.total, system.disk.used, and system.disk.free. IO counter collection is now best-effort: known errors such as ERROR_INVALID_FUNCTION are logged at debug level, while unexpected errors are logged as warnings. Neither prevents partition metrics from being reported.
  • Fleet installer: ensure the DD_LOGS_ENABLED environment variable is honored again when running setup scripts, so that Windows installs using the new installer flow properly set logs_enabled in datadog.yaml.
  • Fixes a bug introduced in 7.73.0 that can cause a remote Agent update through Fleet Automation to fail to restore the previous version if the MSI fails a...