VRMWARE.NL

December 10, 2025December 27, 2025

Monitoring datastores under /mnt in Linux with VCF Operations Telegraf agent

Recently, I have spent a lot of time monitoring Linux servers with the Telegraf agent in VCF Operations. This includes metrics such as /boot, /var, /var/log, etc. This was fairly easy to implement. However, I also wanted to be able to monitor datastores under /mnt. It turned out that these metrics are not available by default after installing the Telegraf agent.

Objective: Raise an alert when a datastore under /mnt exceeds 75% capacity.

After installing the Telegraf agent on a Linux server, the following directory was created: /opt/vmware. In this directory, I created the following bash script: pct_used.sh. Please note that a script is executed from Telegraf using the system account arcuser. The arcuser account must have read and execute permissions for the script. Set the permissions using the following command:

chmod 755 /opt/vmware/pct_used.sh

The script below is a generic script for reading the % used from the datastores under /mnt. By providing the correct arguments you will receive a value (Use%) that you can use as a metric in VCF Operations as input for the alert. In the following examples:

store1 = Linux Server
001 = Datastore 001
002 = Datastore 002

Examples:
root@vrmware001:/opt/vmware# ./pct_used.sh /mnt/store1/001
Result: 70

root@vrmware001:/opt/vmware# ./pct_used.sh /mnt/store1/002
Result: 64

Please note that the system account arcuser also has read + execute permissions on the /mnt and /mnt/server directory. In the examples mentioned above, this means that these rights must be located on /mnt/store1. Here’s how to do it.

chmod 755 /mnt
chmod 755 /mnt/store1

With the next command you can check if the arcuser account have permissions the read the datastores.

sudo -u arcuser df -P /mnt/store1/001 where /mnt/store1/001 should be replaced with your own datastore path.

#!/usr/bin/env bash
# Usage: ./pct_used.sh /mnt/store1/001

set -euo pipefail

MOUNT_PATH="${1:-}"

# Check if argument is provided
if [[ -z "$MOUNT_PATH" ]]; then
  echo "Usage: $0 <mount_path>" >&2
  exit 2
fi

# Check if argument is provided
if [[ ! -d "$MOUNT_PATH" ]]; then
  echo "Path not found: $MOUNT_PATH" >&2
  exit 3
fi

# Get percentage used via df; NR==2 = the data row
# +0 forces numeric output (strips '%')
pct_used="$(df -P "$MOUNT_PATH" 2>/dev/null | awk 'NR==2{print $5+0}')"

# Validate: empty or not numeric?
if [[ -z "$pct_used" || ! "$pct_used" =~ ^[0-9]+$ ]]; then
  echo "Unable to read usage for: $MOUNT_PATH" >&2
  exit 4
fi

# Output only the number to stdout (for Telegraf)
echo "$pct_used"

Now the script must be launched from VCF Operations.

Go to Manage Telegraf Agents section, select your favourite Linux server and add a Custom Script.

If the custom script is configured correctly, the first data should arrive after 5 to 10 minutes. You can see one of the following two statuses at the Telegraf agent.

This means that the data is being received.

This means that the data has not changed since the previous measurement point.

Go to the inventory view of the Linux server and select Custom Script. Check if the status is Normal (green).

If the status is normal, go to the Metrics tab, Metrics, Scripts, Custom Script (store1-001). Double click on the Custom Script and there is the Metric value.

With these metrics, we can now create alerts (Use%) for datastores on Linux servers mounted under /mnt.

I tested it in my lab on both Aria Operations 8.18.5 and VCF Operations 9.0.1. It works on both versions.

May 27, 2025May 27, 2025

ESXi Lockdown Menu Script

The following Powershell script allows you to set the ESXi Lockdown Mode of all ESXi hosts within a vCenter to “Disabled” or “Normal”. You can also get a summary of the current Lockdown status of all ESXi hosts. This overview is displayed in an HTML page that automatically opens in the MS Edge browser.

After starting the script, you get the following menu with options:

The output of option 0, Show current Lockdown Mode status looks as follows.

This way you have an overview of the Lockdown Status of all ESXi hosts in a vCenter.

#Version 1.0
#2025-05-27
#ESXi Set Lockdown Mode and Script

# Prompt for vCenter and connect
Clear-Host
$vCenterServer = Read-Host "Enter the vCenter Server to connect to"
try {
    Connect-VIServer -Server $vCenterServer -ErrorAction Stop
    Write-Host "Connected to $vCenterServer" -ForegroundColor Green
} catch {
    Write-Host "Failed to connect to $vCenterServer. Exiting script." -ForegroundColor Red
    exit
}

function Show-Menu {
    Clear-Host
    Write-Host ""
    Write-Host "======================" -ForegroundColor Cyan
    Write-Host " Lockdown Mode Menu"
    Write-Host "======================" -ForegroundColor Cyan
    Write-Host "0. Show current Lockdown Mode status"
    Write-Host "1. Disable Lockdown Mode (lockdownDisabled)"
    Write-Host "2. Enable Lockdown Mode to 'normal' (lockdownNormal)"
    Write-Host "Q. Quit"
    $choice = Read-Host "Enter your choice (0, 1, 2 or Q)"
    return $choice
}

function Set-LockdownMode {
    param (
        [string]$mode
    )

    $ESXiHosts = Get-VMHost

    foreach ($ESXiHost in $ESXiHosts) {
        try {
            $ESXiHostView = Get-View -Id $ESXiHost.Id
            $accessManager = Get-View -Id $ESXiHostView.ConfigManager.HostAccessManager
            $accessManager.ChangeLockdownMode($mode)
            Write-Host "Lockdown Mode set to '$mode' for $($ESXiHost.Name)" -ForegroundColor Green
        } catch {
            Write-Host "Error on host $($ESXiHost.Name): $_" -ForegroundColor Red
        }
    }
}

function Generate-HTMLReport {
    $ESXiHosts = Get-VMHost
    $rows = @()

    foreach ($ESXiHost in $ESXiHosts) {
        try {
            $lockdownMode = $ESXiHost.ExtensionData.Config.LockdownMode
            $rows += "<tr><td>$($ESXiHost.Name)</td><td>$lockdownMode</td></tr>"
        } catch {
            $rows += "<tr><td>$($ESXiHost.Name)</td><td>Error: $_</td></tr>"
        }
    }

    $html = @"
<html>
<head>
    <title>Lockdown Mode Status</title>
    <style>
        body { font-family: Arial; }
        table { border-collapse: collapse; width: 100%; }
        th, td { border: 1px solid #ddd; padding: 8px; }
        th { background-color: #f2f2f2; }
    </style>
</head>
<body>
    <h2>Lockdown Mode Status Report</h2>
    <p>Generated on $(Get-Date)</p>
    <table>
        <tr><th>Hostname</th><th>Lockdown Mode</th></tr>
        $($rows -join "`n")
    </table>
</body>
</html>
"@

    $path = "$env:TEMP\\LockdownStatus.html"
    $html | Out-File -FilePath $path -Encoding UTF8
    Start-Process "msedge.exe" $path
}

# Main loop
do {
    $choice = Show-Menu

    switch ($choice.ToLower()) {
        "0" { Generate-HTMLReport }
        "1" { Set-LockdownMode -mode "lockdownDisabled" }
        "2" { Set-LockdownMode -mode "lockdownNormal" }
        "q" {
            Write-Host "Disconnecting from vCenter..." -ForegroundColor Cyan
            Disconnect-VIServer -Server $vCenterServer -Confirm:$false
            Write-Host "Disconnected. Exiting script." -ForegroundColor Cyan
        }
        default { Write-Host "Invalid choice. Please try again." -ForegroundColor Yellow }
    }

} while ($choice.ToLower() -ne "q")

April 29, 2025April 30, 2025

Which Vm have a VM Override Check Script

Does the following sound familiar? While patching a VMware cluster, one ESXi host does not want to enter Maintenance Mode. It appears that one or more vms cannot be migrated to another ESXi host. Manually these vms can be migrated and the host still goes into maintenance mode.

The cause is usually a VM Override that has been configured. How useful would it be if, prior to patching the ESXi hosts in the cluster, you have an overview of the vms with a VM Override.

The following script lists all vms with a VM Override in an HTML page sorted by cluster.

#Version 1.0
#2025-04-28
#Check VMs in Cluster with a VM Override

# Connect to vCenter
$vcServer = Read-Host "Enter the vCenter Server name or IP address"
Connect-VIServer -Server $vcServer

# Retrieve all clusters
$clusters = Get-Cluster

# Create an empty array to store results
$results = @()

# Loop through each cluster
foreach ($cluster in $clusters) {
    Write-Output "Processing cluster: $($cluster.Name)"

    $vms = Get-VM -Location $cluster

    foreach ($vm in $vms) {
        $automationLevel = $vm.DrsAutomationLevel

        $results += [PSCustomObject]@{
            ClusterName        = $cluster.Name
            VMName             = $vm.Name
            DRSAutomationLevel = $automationLevel
        }
    }
}

# Filter: Only VMs with different automation levels
$filteredResults = $results | Where-Object {
    $_.DRSAutomationLevel -ne "AsSpecifiedByCluster" -and
    $_.DRSAutomationLevel -ne "UseClusterSettings"
}

# HTML file path
$outputHtmlPath = "D:\Temp\vm_overrides.html"

# HTML header with basic table CSS styling
$htmlHeader = @"
<html>
<head>
<title>VM DRS Overrides Report</title>
<style>
body { font-family: Arial, sans-serif; }
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid black; padding: 8px; text-align: left; }
th { background-color: #f2f2f2; }
</style>
</head>
<body>
<h2>VMs with DRS Automation Level Overrides (VM Overrides) in vCenter $vcServer</h2>
"@

$htmlFooter = @"
</body>
</html>
"@

# Generate the HTML table from the data
$htmlTable = $filteredResults | ConvertTo-Html -Fragment -Property ClusterName, VMName, DRSAutomationLevel

# Combine everything
$htmlContent = $htmlHeader + $htmlTable + $htmlFooter

# Write to the HTML file
$htmlContent | Out-File -Encoding UTF8 -FilePath $outputHtmlPath

Write-Output "HTML report saved to: $outputHtmlPath"

# Open the HTML file with Microsoft Edge
Start-Process "msedge.exe" -ArgumentList $outputHtmlPath

# Disconnect from vCenter
Disconnect-VIServer -Server * -Confirm:$false

The HTML page looks like this.

April 11, 2025April 11, 2025

VMware Aria Operations Telegraf Endpoint Agent installation fails on Bare Metal Linux server

Recent I was running into the following error when installing an VMware Aria Operations Telegraf Endpoint Agent on a Bare Metal Linux server.

Error: “Failed to update VM bootstrap failure message. Error code: {“message”:”Request Validation Failed. Reason is \”Collection of resource property contents can not be null or empty\”.”,”httpStatusCode”:400,”apiErrorCode”:1517}400
install agents failed at Runtime user: arcuser is not having password-less privileges. If runtime user was created manually, ensure required privileges are available. Also ensure that /etc/sudoers file has a line ‘#includedir /etc/sudoers.d’. Please check /root/arc_install_tmp_dir/uaf_bootstrap.log on the endpoint VM.. please check logs for more detail.”

To solve this issue add the following line to /etc/sudoers.

@includedir /etc/sudoers.d

There is a Broadcom KB about this error but personally I don’t find this one very clear to me.

November 18, 2024November 18, 2024

PowerCLI script to get Syslog.Global.Host advanced setting

The following script may be useful if you are in the process of migrating vRealize Log Insight to a new appliance/cluster. You can use this script before, during and after migrating to check the settings of Syslog.Global.Loghost of all ESXi hosts in vCenter.

# Version 1.0
# 2024-11-17
# Check ESXi Syslog Global LogHost script with HTML output
# This script checks if specified Syslog Global LogHost are configured on all hosts in a datacenter and writes the output to an HTML file

# Connect to vCenter Server
$vCenter = "FQDN vCenter"
Connect-VIServer $vCenter

# Get all ESXi hosts in the cluster
$hosts = get-vmhost

# Initialize HTML content with styles for alternating row colors
$htmlContent = @"
<html>
<head>
    <title>ESXi Syslog Settings</title>
    <style>
        table {
            width: 100%;
            border-collapse: collapse;
        }
        th, td {
            padding: 8px;
            text-align: left;
            border: 1px solid #ddd;
        }
        tr:nth-child(odd) {
            background-color: white;
        }
        tr:nth-child(even) {
            background-color: lightgrey;
        }
    </style>
</head>
<body>
    <h1>ESXi Syslog Settings</h1>
    <table>
        <tr>
            <th>ESXi Host</th>
            <th>Syslog Global Loghost</th>
        </tr>
"@

# Loop through each ESXi host and get the syslog.global.loghost advanced setting
foreach ($esxi in $hosts) {
    $setting = Get-AdvancedSetting -Entity $esxi -Name 'syslog.global.loghost'
    $htmlContent += "<tr><td>$($esxi.Name)</td><td>$($setting.Value)</td></tr>"
}

# Add the current date and time
$currentDateTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$htmlContent += @"
    </table>
    <p>Report generated on: $currentDateTime</p>
</body>
</html>
"@

# Save HTML content to a file
$outputPath = "C:\Scripts\esxi_syslog_settings.html"
$htmlContent | Out-File -FilePath $outputPath

# Open the HTML file
Start-Process "msedge.exe" $outputPath

# Disconnect from vCenter Server
Disconnect-VIServer -Confirm:$false

For example: You can use the script during vRealize Log Insight in the following way:

Before migration
Check the current configured syslog endpoint

During migration
Check the current and new syslog endpoints are configured

After migration
Check the new configured syslog endpoint

November 15, 2024November 15, 2024

Check wether VIBs are installed or not on ESXi hosts in a cluster with VIB Report Script

In my last blog, I shared a Powershell script that can be used to remove VIBs from all ESXi hosts in a cluster. I have created a small Powershell script that you can run before and after removing of VIBs to check the availability of VIBs.

The output is displayed in an HTML file once the script is finished. The VIBs being checked are deﬁned in $arrayvibs. The location of the HTML file can be defined in $outputPath. The number of VIBs you can have checked depends on how many you define in $arrayvibs.

Here are a few examples of the HTML output files.

Example output before run VIB Check Report script to remove the VIBs “nenic” and “iavmd”.

Example output after run VIB Check Report script to remove the VIBs “nenic” and “iavmd”.

Please be aware that using this script is at your own risk!

# Version 1.0
# 2024-11-14
# Check VIBs script with HTML output
# This script checks if specified VIBs are installed on all hosts in a cluster and writes the output to an HTML file

# Function Check VIB
function CheckVIB {
    Param (
        $ESXi  # The EsxCli object
    )

    [array]$arrayvibs = @("nenic", "iavmd")
    $esxcli = Get-EsxCli -V2 -VMHost $ESXi
    $result = ""

    foreach ($vib in $arrayvibs) {
        if ($esxcli.software.vib.list.Invoke() | Where-Object {$_.Name -eq "$vib"}) {
            $result += "<tr style='background-color: #ffe6e6;'><td>$($ESXi.Name)</td><td>$vib</td><td>Found</td></tr>"
        } else {
            $result += "<tr style='background-color: #e6ffe6;'><td>$($ESXi.Name)</td><td>$vib</td><td>Not Found</td></tr>"
        }
    }

    return $result
}

# vCenter & Cluster Parameters
$vCenter = "FQDN vCenter"
$cluster = "Cluster"

# Connect vCenter
Try {Disconnect-VIServer * -Confirm:$false -ErrorAction SilentlyContinue | Out-Null}
Catch {}

Connect-VIServer $vCenter

$ESXis = Get-Cluster -Name $cluster | Get-VMHost | Sort-Object Name | Where-Object {$_.ConnectionState -eq 'Connected' -or $_.ConnectionState -eq 'Maintenance'}

# HTML header
$html = @"
<html>
<head>
    <title>VIB Check Report</title>
    <style>
        table { width: 100%; border-collapse: collapse; }
        th, td { border: 1px solid black; padding: 8px; text-align: left; }
        th { background-color: #f2f2f2; }
    </style>
</head>
<body>
    <h1>VIB Check Report</h1>
    <table>
        <tr>
            <th>Host</th>
            <th>VIB</th>
            <th>Status</th>
        </tr>
"@

foreach ($ESXi in $ESXis) {
    $html += CheckVIB -ESXi $ESXi
}

# HTML footer with creation date and time
$creationDate = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$html += @"
    </table>
    <p>Report generated on: $creationDate</p>
</body>
</html>
"@

# Output HTML to file
$outputPath = "C:\Scripts\VIB_Check_Report.html"
$html | Out-File -FilePath $outputPath

# Open the HTML file in Microsoft Edge
Start-Process "msedge.exe" $outputPath

# Disconnect vCenter
Write-Host "Disconnecting vCenter $vCenter"
Disconnect-VIServer -Confirm:$False | Out-Null

Write-Host "Report generated at $outputPath"

October 23, 2024November 13, 2024

VMware Cluster upgrade vSphere 7 to vSphere 8 with a single image is blocked by not supported VIBs

Recently, I wanted to upgrade a vSphere 7 non vSAN cluster to version 8 with a single image. During the compliance check it turned out that all hosts in the cluster were not compliant and therefore the upgrade could not be started. The following error was displayed.

There appeared to be two VIBs installed that were not supported. On the host’s commandline, I looked up the names of VIBs and noted them. The VIBs involved in this case were the following:

nenic
iavmd

These will be needed later in the Powershell script.

Since I don’t want to log into every server and then have to delete the VIBs via the CLI, I created the following script. I would like to thank my colleague Kabir very much for his time, explanation and mentoring. He has great scripting and automation skills and has written great articles that you can find here, whatkabirwrites.nl

Back to the script. The script removes the two VIBs on all hosts in a cluster. Before using the script be sure that the VIBs are not used. The following steps are performed by the script:

Host is put into maintenance mode
Check if VIBs are installed
If present, they are removed
Host is rebooted
Host goes out of maintenance mode
Next host

If a host is already in maintenance mode, it will remain in maintenance mode after the reboot.

At the top of the script, the Remove VIBs function is defined. Adjusting the setting $settings.dryrun = $true to $false really removes the VIBs. Without adjusting this, the VIBs are not removed and only verified to be present or not. Regardless of the value of $settings.dryrun = $true or $false the hosts are always put into maintenance mode and rebooted.

After executing the host reboot, the script waits 4 minutes before continuing. I built in this pause because nested ESXi hosts reboot so quickly that otherwise the script won’t enter or exit the wait loop. I have used the script in a test lab and it works very well.

Please be aware that using this script is at your own risk!

#Version 1.0
#2024-10-20
#Pre-Upgrade script ESXi7 to ESXi8
#This script remove vibs on all hosts in a cluster that blocks the upgrade

# Function Remove VIBs
function RemoveVIB {
    Param (
        $ESXi  # The EsxCli object
      
    )


[array]$arrayvibs = @("nenic", "iavmd")

$esxcli = Get-EsxCli -V2 -VMHost $ESXi

    foreach ($vib in $arrayvibs) {
        
        if ($esxcli.software.vib.list.Invoke() | where {$_.Name -eq "$VIB"}) {
            $settings = $esxcli.software.vib.remove.CreateArgs()
            $settings.dryrun = $true
            $settings.vibname = "$VIB"
            echo "$VIB VIB found, remove VIB $esx"
            $esxcli.software.vib.remove.Invoke($settings)        
        } 
        else {
            echo "No $VIB VIB found $esx"             
        } 
    }

}

# vCenter & Cluster Parameters
$vCenter = "FQDN vCENTER"
$cluster = "Cluster Name"


# Connect vCenter
Try {Disconnect-VIServer * -Confirm:$false -ErrorAction SilentlyContinue | out-null}
Catch {}
    
Connect-VIServer $vCenter 

$ESXis= Get-Cluster -Name $cluster| Get-VMHost | sort Name | where {$_.ConnectionState -eq 'Connected' -or $_.ConnectionState -eq 'Maintenance'}

foreach ($ESXi in $ESXis) {
    write-host "Working on host $($esxi)"
    
    # Host status is Maintenance Mode
    if ($ESXi.ConnectionState -eq 'Maintenance') {
        Write-host "Host is already in Maintenance Mode..."
        RemoveVIB -ESXi $ESXi
        Write-host "Host Reboot in Maintenance Mode..."
        write-host "Herstarten host $($esxi)"
        Restart-VMHost -VMHost $ESXi -Confirm:$False | Out-Null
        write-host "Waiting 4 minutes to make sure the host is disconnected before proceeding..."
        start-sleep 240
        $hoststat = (Get-VMHost -Name $ESXi.Name)
        While ($hoststat.ConnectionState -eq "NotResponding") {
        Write-host "Host is still rebooting... waiting 10sec..."
        Start-Sleep 10
        $hoststat = (Get-VMHost -Name $ESXi.Name)
       }
    }
    #Host status is not Maintenance Mode
    else {
        Write-Host "$($ESXi) is not in Maintenance Mode. Put host in Maintenance Mode..."
        write-host ""

        # Host in Maintenance Mode
        Set-VMHost -VMHost $ESXi -State Maintenance -Confirm:$False | Out-Null

        # Vib Remove
        RemoveVIB -ESXi $ESXi
        # Host Reboot
        Write-host "Host is in Maintenance Mode"
        write-host "Reboot host $($esxi)"
        Restart-VMHost -VMHost $ESXi -Confirm:$False | Out-Null
        write-host "Waiting 4 minutes to make sure the host is disconnected before proceeding..."
        start-sleep 240
        $hoststat = (Get-VMHost -Name $ESXi.Name)
        While ($hoststat.ConnectionState -eq "NotResponding") {
        Write-host "Host is still rebooting... waiting 10sec..."
        Start-Sleep 10
        $hoststat = (Get-VMHost -Name $ESXi.Name)
       }
        
    # Host uit MM
        write-host "Reboot on host $($esxi) is completed..."
        write-host ""
        Start-Sleep 20
        Set-VMHost -VMHost $ESXi -State Connected -Confirm:$False | Out-Null
   }  
    
    write-host "Done on host $($esxi)"
    write-host ""
}

#Disconnect vCenter
    write-host "Disconnecting vCenter $vCenter"
    Disconnect-VIServer  -Confirm:$False  | Out-Null

June 5, 2024June 13, 2024

Recieve VMware Aria Operations Alerts in Microsoft Teams

This blog is the follow-up to the one I wrote earlier this week about sending VMware Aria Operations for Logs alerts to Microsoft (MS) Teams. This article describes the steps to forward alerts from VMware Aria Operations (Aria Ops) to MS Teams.

Use Case – Increase the ability to notice prio 1 alerts outside of office hours with the available technical resources.

Goal – In addition to the standard Aria Ops alerts, we also want to have the option available to receive alerts through Microsoft Teams.

Solution – Use Aria Ops Webhook to send alerts to MS Teams

Setup – In order to have Aria Ops alerts sent to MS Teams, we need to set up two things.

Setup a MS Teams Connector to receive alerts
Setup the Aria Ops Webhook configuration to push alerts

Setup a MS Teams Connector to receive alerts

First, decide in which Teams Channel you want to receive the Aria Ops alerts or add a new Teams Channel. I have created a new Channel called VRMware VMware Alerts.

Click on the 3 dots on the right side and select Manage Channel.

Select Edit under Connectors.

Select Incoming Webhook and hit the Configure button.

Provide a friendly name, upload an image and create the connector.

After creation copy the url to the clipboard. We need this URL later to configure the AO4L Webhook.

Before we move on to Aria Ops we need to enable the channel notifications. Click once again on the 3 dots on the right side and select Channel notifications > All New Posts.

Setting up Aria Ops Webhook configuration to push alerts

Go to Configure > Alerts > open Outbound Settings > Add > Plugin Type > Webhook Notification Plugin. Choose a Instance name. Copy the Webhook URL that was copied from MS Teams connector. Save, Test option does not work.

Webload Payload

Go to Configure > Alerts > Payload Templates > Add > Create Payload Template > Details. Choose a frienly name and select the Outbound Method.

On the next tab, Object Content we select Host System and Hardware|Model as property. This is just an example. Choose what you like to monitor.

On the next tab, Payload Details we add the Payload code.

The Payload code that is used in the Payload Details tab.

{
    "text": "<b> ${ALERT_CRITICALITY} alert on ${RESOURCE_NAME}: ${ALERT_DEFINITION} at $(CREATE_TIME)</b>"
}

After completing the Webhook configuration we want test the Webhook configuration.

Create an alert, in our test case we use alert “Host has lost connection to vCenter Server“. Then we select the Outbound method “Webhook Notifcation Plugin” “VRMware MS-Teams“.

Next step is select the Payload Template.

Finally, we are almost there to send a test alert. We do this on the Test Notifcation tab. Hit the “Initiate Process” button.

We select the Alert Definitions “Host has lost connection to vCenter Server“. We also use the filter the alert definition… option.

Select a host and validate the configuration.

Now we can close the alert in Aira Ops. In Teams we have received the alert.

I hope this blog post will help you configure Aria Operations to send notifications to MS Teams. Please remember that MS Teams is not a monitoring tool. So be selective with the alerts you forward.

The payload code is based on the one that I found in this blog post from Brock Peterson.

June 3, 2024June 3, 2024

Recieve VMware Aria Operations for Logs Alerts in Microsoft Teams

This blog is meant to keep up to date on important infrastructure alerts during and outside office hours with standard products such as VMware Aria Operations for Logs (AO4L) and Microsoft (MS)Teams.

Use Case – Increase the ability to notice prio 1 alerts outside of office hours with the available technical resources.

Goal – In addition to the standard AO4L alerts, we also want to have the option available to receive alerts through Microsoft Teams.

Solution – Use AO4L Webhook to send alerts to MS Teams

Setup – In order to have AO4L alerts sent to MS Teams, we need to set up two things.

Setup a MS Teams Connector to receive alerts
Setup the AO4L Webhook configuration to push alerts

Setup a MS Teams Connector to receive alerts

First, decide in which Teams Channel you want to receive the AO4L alerts or add a new Teams Channel. I have created a new Channel called VRMware VMware Alerts.

Click on the 3 dots on the right side and select Manage Channel.

Select Edit under Connectors.

Select Incoming Webhook and hit the Configure button.

Provide a friendly name, upload an image and create the connector.

After creation copy the url to the clipboard. We need this URL later to configure the AO4L Webhook.

Before we move on to AO4L we need to enable the channel notifications. Click once again on the 3 dots on the right side and select Channel notifications > All New Posts.

Setup the AO4L Webhook configuration to push alerts

Go to the Administration section and open Configuration > Webhook > New Webhook. Choose a name. From the Endpoint drop down menu select Custom. Copy the Webhook URL that was copied from MS Teams connector. From the Content Type drop down menu select JSON and from the Action drop down menu select POST. The Webhook Payload will be described under the picture.

Webload Payload

The Webload Payload was the hardest part to figuring it out how the Webhook Payload layout should look like.

As far as I know from AO4L Webhook, only clear text can be used to send notifications to MS Teams. It’s possible to use one or more parameters in the script. For an overview of the parameters see the picture above here. Because the notictions are send in clear text it’s not possible to use all parameters. In our case not a problem because MS Teams is not used to replace monitoring software. It is just an additional option to be informed in a timely manner.

I wouldn’t go indepth how I found out the layout of the Webhook Payload code. That’s why I’m only sharing the code with you, so you can start testing for yourself.

{
   "type":"message",
   "attachments":[
      {
         "contentType":"application/vnd.microsoft.card.adaptive",
         "contentUrl":null,
         "content":{
            "$schema":"http://adaptivecards.io/schemas/adaptive-card.json",
            "type":"AdaptiveCard",
            "version":"1.2",
            "body":[
                {
                "type": "TextBlock",
                "text": "${AlertName}",
				"weight": "bolder",
                "wrap": true
                }
            ]
         }
      }
   ]
}

After completing the Webhook configuration you may want test the Webhook configartion. Press the Send Test button

Finally Save the Webhook configuration.

Open the MS Teams Channel where the connector was created earlier. You should see here the Test Alert.

The last part is sending a notification to MS Teams when a ESXi host have entered Maintenance Mode.

I have created an AO4L alert with the name “AO4L Alert: Host entered Maintenance Mode“.

I have decided that I would like to be notified by MS Teams. This can be set under the Trigger Conditions.

If everything is configured correctly we should receive the Send Test Alert Results after sending a test alert.

Save the Alert. Now we are ready for the final test. I put a ESXi host in maintenance mode and we should receive within 5 minutes a MS Teams notification. It works!

I hope this blog post will help you configure AO4L to send notifications to MS Teams. Please remember that MS Teams is not a monitoring tool. So be selective with the alerts you forward.

May 21, 2024June 19, 2024

Aria Operations 8.17.1 Hotfix1 API/Webhook JSON Bug (FIXED)

Recently, while testing after upgrading Aria Operations 8.16 to 8.17.1 Hotfix1, I found that I could not forward alerts via API to a remote monitoring system.

While testing an alert notification (Test Notification in a random alert). We see an error message at ‘Endpoint Receives Notification’.

After investigation, the problem appears to be in the JSON code in the payload template. Nothing had changed here and we may be dealing with a bug here. Let’s take a look at the payload template.

Payload Template working fine in 8.16 and is broken in 8.17.1 Hotfix1

In the payload template sample, we see in the JSON code section highlighted in green that the new line character ‘\n’ is used.

Payload Template working fine in 8.17.1 Hotfix1

In the next payload template sample, we see in the JSON code section highlighted in green that the new line character ‘\n’ is not used.

After we apply another test alert notification using the payload template without the new line character “\n” everything works fine.

I opened a ticket by VMware/Broadcom GSS and they confirmed the bug. It is now waiting for an update that fixes this issue. As soon as there is an update I will report it here.

Update June 19:

The special character “\” json bug is fixed in Aria Operations 8.17.2. See the release notes. VMware Aria Operations 8.17.2 (broadcom.com) .

The following issues have been resolved as of Aria Operations 8.17.2:

Snapshot age reporting incorrectly
[App Monitoring] Product managed telegraf agent installation via script fails due to wget.exe URL is not reachable from Windows machine to CP
Reclamation page show >0 days old snapshots with 0 GB reclaimable space
Update Messaging in Manage Agents page
Uninstalling ARC agent status for windows-endpoint keeps on loading state when uninstall gets triggered through api.
Activated plugins is not shown post Agent is tried to uninstall with failed attempts
[APP Monitoring] : vROPS Large Environment – Bootstrap operation takes 11 mins to complete for 1 VM
Agent Installation on Windows VM taking more than 5mins
“Manage Telegraf Agents” page takes 30 sec to load
[APP Monitoring] : ARC services are operational on the endpoint VM; however, the agent status is currently indicated as unhealthy in vROPS UI
NPE when sending Alert Notification
[WLP] [vSAN] During the Workload Optimization, the VM vSAN compatibility check calls are failing, when the endpoint vCenter version is 8.0 U2 and higher
Cost Calculation failed on one DC.
[AWLP] [Backend] vRA calls, to get the advanced workload placement plan from vROps, fail on vROps side
In case of Metric chart Metrics mode, for deleted object widget is throwing exception.
Custom property: Internal server Error is thrown on alerts timeline
Having “\” and other JSON special characters in notification template leads to invalid JSON payload.
[Resources API Call] All resources in api response have resourceHealth and Value are GREY and -1 respectively on 8.17.x vROPs