Understanding Azure Local monitoring: which tool does what (Insights, Metrics, Alerts, Workbooks, LENS)

TL;DR: Azure Local monitoring is built on Azure Monitor and split across several tools — Insights, Metrics, Workbooks (including the LENS fleet workbook), Alerts, and the built-in Health Service. Each covers a different slice of what your system is doing, and they aren't interchangeable. Pick the right tool by matching what you need to see against what each one was built to show.

The tools at a glance:

Tool	What it shows	Cost at defaults	Where it lives
Insights	Health, performance, and usage across cluster/nodes/VMs/storage in pre-built dashboards	Pay-per-GB ingested (first 5 GB/month free per billing account); cost grows with retention and log volume	Azure portal, backed by Log Analytics
Metrics	60+ numeric performance counters in near-real-time time-series charts	Free at default settings; longer retention or custom multi-dimensional metrics can incur cost	Azure Monitor Metrics Explorer
Performance Metrics dashboards (Workbooks)	Pre-built single-cluster and multi-cluster performance views built on top of Metrics	Free at default settings	Azure Workbooks
LENS Workbook	Fleet-wide view of every Azure Local cluster — capacity, updates, health, workloads, compliance	Free; open-source, published by Microsoft	Azure Workbooks (imported JSON)
Alerts	Notifies you when conditions are met — four flavors: Health, Log, Metric, Recommended	Health alerts free; log/metric alerts free at default scope but custom rules and notification volume can incur cost	Azure Monitor Alerts
Health Service	Built-in OS service that detects 80+ fault conditions and exposes them on-cluster	Free, built into Azure Local	On the cluster (PowerShell)

Insights — what it does (and doesn't):

Use it when you want a curated, opinionated view of cluster health. Insights ingests Windows event logs (microsoft-windows-health/operational and microsoft-windows-sddc-management/operational) plus five performance counters into a Log Analytics workspace and renders them in Microsoft-built workbooks. It's the right place to see health faults aggregated across nodes, watch CPU/memory/network/storage trends over hours or days, and view VM state distribution.

Insights does NOT give you real-time data — it can take up to 15 minutes for the Azure Monitor Agent to surface new data, and the SDDC Management cache dumps every hour by default. It's also not recommended as the primary signal for high-severity alerting. And unlike Metrics, you pay per GB ingested — adding more data sources to your Data Collection Rule or extending workspace retention will increase your bill.

Metrics — what it does (and doesn't):

Use it when you need fast numeric data: CPU percentage, IOPS, latency, throughput, RDMA bandwidth, VHD/volume performance, VM counters. Metrics arrive in near-real-time, are retained for 93 days by default, and can be queried up to 30 days at a time on a single chart. You can pin charts to Azure dashboards or build metric alert rules on top of them.

Metrics does NOT show health faults, event log content, or anything qualitative — only numeric counters. It also won't tell you "is this node healthy?" the way Insights or the Health Service does. It depends on the AzureEdgeTelemetryAndDiagnostics extension being installed.
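
If you need to review more history than one chart can show, the 30-day chart limit and 93-day default retention mean splitting the range into several chart-sized windows. The following is a minimal sketch of that arithmetic only. It does not call any Azure API, and the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_CHART_SPAN = timedelta(days=30)  # per-chart query limit described above
RETENTION = timedelta(days=93)       # default metric retention described above


def chart_windows(start, end, now):
    """Split [start, end] into consecutive spans of at most 30 days.

    The start is first clamped to the retention horizon (now - 93 days),
    because older data points would no longer be available at defaults.
    """
    start = max(start, now - RETENTION)
    end = min(end, now)
    windows = []
    cursor = start
    while cursor < end:
        stop = min(cursor + MAX_CHART_SPAN, end)
        windows.append((cursor, stop))
        cursor = stop
    return windows


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    for begin, finish in chart_windows(now - timedelta(days=120), now, now):
        print(begin.date(), "to", finish.date())
```

A request for 120 days of history is trimmed to the 93 days that are retained and then split into three 30-day windows plus one 3-day window.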

Performance Metrics dashboards — what they do (and don't):

Pre-built Azure Workbooks that aggregate Metrics into ready-made views: storage performance, network performance, and compute. There's a single-cluster version with LUN-level drilldown and a multi-cluster version filterable by subscription, resource group, or cluster.

The dashboards do NOT add any data of their own — they're a presentation layer on top of Metrics. If the telemetry extension isn't installed, the dashboards render blank.

LENS Workbook — what it does (and doesn't):

LENS is a free, open-source Azure Monitor Workbook published and maintained by Microsoft that gives you a fleet-wide view of every Azure Local cluster you manage in a single Azure portal page. Where Insights and the Performance Metrics dashboards focus on one cluster at a time (or a small handful), LENS is purpose-built for fleet operations:

Predictive capacity forecasting per cluster — days-until-warning projections for CPU, memory, and storage
vCPU-to-pCPU overcommit ratios and top fleet hot-spots for CPU, memory, storage, IOPS, latency, and network
Update first-time-success rates and durations by version, with one-click handoff to Azure Update Manager
45-day Arc Resource Bridge offline clock tracking (green/yellow/red), plus orphaned ARB detection
Full VM inventory across the fleet (including VMs not onboarded to Arc), AKS Arc node inventory with Prometheus metrics
Version compliance against the 6-month support window, license verification (AHB, WSS, AVS)
Cross-subscription and cross-tenant via Azure Lighthouse — MSP-ready by default

Setup is a five-minute JSON import into Azure Monitor Workbooks. No agents, no separate portal, no licensing.

LENS does NOT replace Insights or Metrics — it sits on top of them and on Azure Resource Graph. If your clusters aren't surfacing data to Azure (extension missing, Arc disconnected), LENS shows the gaps but can't fix the source. It also does not generate alerts on its own — pair it with the Alerts capabilities below.
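
To make the "days-until-warning" idea concrete, here is a minimal sketch of one way such a projection could be computed: fit a straight line through recent daily usage samples and extrapolate to a warning threshold. This article does not describe how LENS actually computes its forecasts, so the linear fit, the function name, and the 80% default threshold are all assumptions for illustration.

```python
def days_until_threshold(daily_usage_pct, threshold_pct=80.0):
    """Estimate days until usage reaches threshold_pct by linear extrapolation.

    daily_usage_pct: one usage percentage per day, oldest first.
    Returns 0.0 if the latest sample is already at or above the threshold,
    and None if there are fewer than two samples or usage is flat or falling.
    """
    n = len(daily_usage_pct)
    if n < 2:
        return None
    if daily_usage_pct[-1] >= threshold_pct:
        return 0.0

    # Ordinary least-squares fit of usage against day index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage_pct))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward trend, so no projected crossing

    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_pct - intercept) / slope
    # Report days remaining after the most recent sample, never negative.
    return max(crossing_day - (n - 1), 0.0)


if __name__ == "__main__":
    storage_pct = [60, 61, 62, 63, 64]  # hypothetical samples, +1 point per day
    print(days_until_threshold(storage_pct))
```

A straight line is the simplest possible model. Real usage is often bursty or seasonal, so treat a projection like this as a rough planning signal rather than a prediction.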

Alerts — four types, four jobs:

Health alerts fire on the 80+ predefined OS health faults (disk, network, storage QoS, CPU, memory, cluster config). System-generated, no rules to author, no Log Analytics required, no cost at default settings. Enabled once via the Capabilities tile, which installs the AzureEdgeAlerts extension. This is the right tool for "tell me when something is structurally wrong."
Metric-based alerts are customer-defined, evaluated at regular intervals against Metrics data. Use these for thresholds on numeric counters (for example, CPU greater than 80% for 15 minutes). Default scope is free; custom multi-dimensional alerts at higher frequencies can incur cost.
Recommended alerts are a starter pack of metric-based alerts with sensible defaults (CPU %, available memory, volume latency, network throughput). Enable them in one click as a baseline.
Log-based alerts are customer-defined and run KQL queries against Log Analytics data on a schedule. Use these only when the condition genuinely requires log content (event correlation, specific event IDs, multi-source joins). Cost depends on query frequency and volume of data scanned. Not recommended for high-severity scenarios — log ingestion delay can be 15 minutes or more.

Alerts do NOT remediate problems automatically by default. They only notify, through an action group that sends email, SMS, or webhook calls. Automation requires wiring that action group to a Logic App, Azure Function, or Automation runbook.
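
The metric-based example above (CPU greater than 80% for 15 minutes) can be sketched as a check over a lookback window. This is a simplified illustration of the concept, not how Azure Monitor evaluates rules internally. The function name, the one-sample-per-minute assumption, and the choice of aggregation are hypothetical.

```python
from statistics import mean


def window_breaches(samples, threshold=80.0, window=15, aggregate=mean):
    """Return True when the aggregate of the last `window` samples exceeds threshold.

    samples: numeric readings, oldest first, assumed to be one per minute.
    aggregate: function applied to the window, e.g. mean, min, or max.
    Returns False when there are not yet enough samples to fill the window.
    """
    if window <= 0 or len(samples) < window:
        return False
    return aggregate(samples[-window:]) > threshold


if __name__ == "__main__":
    cpu = [70] * 5 + [85] * 15  # hypothetical per-minute CPU percentages
    print(window_breaches(cpu))                 # average over the last 15 minutes
    print(window_breaches(cpu, aggregate=min))  # every sample must exceed 80
```

Using `mean` tolerates short dips inside the window, while `min` requires every sample to stay above the threshold. Which one matches "for 15 minutes" depends on how strictly you want the condition read.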

Health Service — what it does (and doesn't):

Built into Azure Local itself, running on the cluster. It generates the underlying faults that Health alerts surface in Azure. It also automates parts of the physical disk lifecycle (retirement, resiliency restoration, indicator-light blinking) and exposes Get-HealthFault and Get-ClusterPerformanceHistory on the cluster directly. It's the canonical on-cluster source of truth for health state.

The Health Service does NOT push to Azure on its own — you need Health alerts enabled (which installs AzureEdgeAlerts) for those faults to reach Azure Monitor. By itself it's only visible to someone running PowerShell on the cluster.

A note on cost:

"Free" and "no cost" in this article refer to default configurations — Microsoft does not charge you for the capability itself or for the baseline data it collects. You can still rack up Azure Monitor charges by:

Extending Log Analytics workspace retention beyond defaults
Adding more event channels, performance counters, or data sources to a Data Collection Rule (Insights)
Creating high-frequency or multi-dimensional metric alerts
Running log-based alert queries at short intervals or against large data volumes
Sending notifications through action groups at high volume (SMS in particular)

Treat the default-enabled monitoring as a free baseline, and budget separately for whatever you turn up beyond it.
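
For budgeting Insights ingestion, a back-of-the-envelope estimate is usually enough. The sketch below applies the 5 GB/month free allowance from the table above to a projected ingestion volume. The per-GB price, the daily volume per cluster, and the function name are hypothetical inputs, so substitute the current rate for your region. It assumes every cluster bills to the same account and ignores retention charges.

```python
FREE_GB_PER_MONTH = 5.0  # free allowance per billing account, per the table above


def estimate_monthly_ingestion(gb_per_day_per_cluster, clusters, price_per_gb, days=30):
    """Return (billable_gb, cost) for one month of Log Analytics ingestion.

    Assumes all clusters share one billing account, so the free allowance
    is subtracted once from the combined volume.
    """
    total_gb = gb_per_day_per_cluster * clusters * days
    billable_gb = max(total_gb - FREE_GB_PER_MONTH, 0.0)
    return billable_gb, billable_gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical inputs: 0.5 GB/day per cluster, 4 clusters, 2.00 per GB.
    billable, cost = estimate_monthly_ingestion(0.5, 4, 2.00)
    print(f"{billable:.1f} GB billable, estimated cost {cost:.2f}")
```

Re-run the estimate whenever you add event channels or counters to a Data Collection Rule, since those changes raise the daily volume directly.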

Which tool to pick for what:

"Is my cluster healthy right now?" → Health Service (on-cluster) or the Health tab in Insights (in Azure)
"What's my CPU and IOPS doing over the last hour?" → Metrics, or the Performance Metrics dashboard
"Show me capacity headroom and update health across my whole fleet" → LENS workbook
"Notify me when a disk fails or capacity exceeds 80%" → Health alerts cover the disk fault; a Recommended or metric-based alert covers the capacity threshold
"Run a custom KQL query against events across all my clusters" → Insights and Log Analytics, then a log alert
"One page showing performance across every cluster I own" → Multi Cluster Performance Metrics workbook, or LENS for a broader operational view

Going forward:

Treat the tool families as additive, not alternatives. A typical production deployment turns on Insights (for health and performance dashboards and log alerting), keeps Metrics flowing (free baseline plus 93-day retention), enables the Recommended alert rules (CPU, memory, latency thresholds), turns on Health alerts (no-cost coverage for the 80+ OS faults), and imports the LENS workbook for fleet-level visibility. Insights is billed per GB ingested once you pass the free allowance; for everything else, charges begin only when you extend retention, author custom rules or queries, or scale notifications well past defaults.

Optional details — deep dive links:

For step-by-step setup of each capability, see Microsoft's Azure Local monitoring documentation.
