VMware vRealize Log Insight alerts in Microsoft Teams

Recently, we missed an alert notification that had been generated in VMware vRealize Log Insight (vRLI) outside of office hours. This caused a disruption that we could have avoided if we had been informed in time. The alert notification had been sent via email, but email is not always checked outside of business hours, which I can well understand. In this blog, I will explain what I have come up with to notice these types of alerts earlier.

Use Case – Increase the ability to notice priority 1 alerts outside of office hours with the available technical resources.

Goal – In addition to the standard vRLI alerts, we also want to have the option available to receive alerts through Microsoft (MS) Teams.

Solution – Use vRLI Webhook to send alerts to MS Teams

Setup – In order to have vRLI alerts sent to MS Teams, we need to set up two things.

  1. Set up an MS Teams Connector to receive alerts
  2. Set up the vRLI Webhook configuration to push alerts

Set up an MS Teams Connector to receive alerts

First, decide in which Teams Channel you want to receive the vRLI Alerts or add a new Teams Channel. I have created a new Channel called VRMware VMware Alerts.

Click on the 3 dots on the right side and select Connectors.

Select Configure Incoming Webhook.

Provide a friendly name, upload an image and create the connector.

After creation, copy the URL to the clipboard. We need this URL later to configure the vRLI Webhook.

Before we move on to vRLI we need to enable the channel notifications. Click once again on the 3 dots on the right side and select Channel notifications > All activity.

Set up the vRLI Webhook configuration to push alerts

Go to the Administration section and open Configuration > Webhook > New Webhook. Choose a name. From the Endpoint drop-down menu select Custom. Paste the Webhook URL that you copied from the MS Teams connector. From the Content Type drop-down menu select JSON and from the Action drop-down menu select POST. The Webhook Payload is described below the picture.

Webhook Payload

The Webhook Payload was the hardest part to configure. Thanks to my colleague Roger, who figured out what the Webhook Payload layout should look like.

As far as I know, the vRLI Webhook can only send clear text notifications to MS Teams. It’s possible to use one or more parameters in the payload; for an overview of the parameters, see the picture above. Because the notifications are sent as clear text, it’s not possible to use all parameters. In our case that is not a problem, because MS Teams is not used to replace monitoring software. It is just an additional option to be informed in a timely manner.

I won’t go in depth into how we found out the layout of the Webhook Payload code. I’m only sharing the code with you, so you can start testing for yourself.

{
   "type":"message",
   "attachments":[
      {
         "contentType":"application/vnd.microsoft.card.adaptive",
         "contentUrl":null,
         "content":{
            "$schema":"http://adaptivecards.io/schemas/adaptive-card.json",
            "type":"AdaptiveCard",
            "version":"1.2",
            "body":[
                {
                "type": "TextBlock",
                "text": "${AlertName}",
				"weight": "bolder",
                "wrap": true
                }
            ]
         }
      }
   ]
}
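
If you want to check that the Teams connector accepts this card format before configuring anything in vRLI, you can post the same payload yourself. Below is a minimal sketch with curl, assuming a placeholder webhook URL; the TextBlock text simply stands in for the ${AlertName} parameter that vRLI substitutes at send time.

# Placeholder URL; replace it with the incoming webhook URL copied from the Teams connector
WEBHOOK_URL="https://outlook.office.com/webhook/REPLACE-ME"

# Post the same Adaptive Card that vRLI will send; the TextBlock text stands in
# for the ${AlertName} parameter that vRLI fills in at runtime
curl -s -X POST "$WEBHOOK_URL" \
  -H "Content-Type: application/json" \
  -d '{
        "type": "message",
        "attachments": [
          {
            "contentType": "application/vnd.microsoft.card.adaptive",
            "contentUrl": null,
            "content": {
              "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
              "type": "AdaptiveCard",
              "version": "1.2",
              "body": [
                { "type": "TextBlock", "text": "Test card posted with curl", "weight": "bolder", "wrap": true }
              ]
            }
          }
        ]
      }'

If the card appears in the Teams channel, the payload is fine and any remaining issue is on the vRLI side.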

After completing the Webhook configuration, you may want to test it. Press the Send Test button.

Finally, save the Webhook configuration.

Open the MS Teams channel where the connector was created earlier. You should see the Test Alert there.

The last part is sending a notification to MS Teams when an ESXi host has entered Maintenance Mode.

I have created a vRLI alert with the name “TEST VRMware VRLI Alert: vSphere Host entered Maintenance in vCenter“.

I have decided that I would like to be notified by both email and MS Teams. This can be set under the Trigger Conditions.

If everything is configured correctly we should receive the Send Test Alert Results after sending a test alert.

Save the Alert. Now we are ready for the final test. I put an ESXi host in maintenance mode, and within 5 minutes we should receive an MS Teams notification. It works!
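
I put the host in maintenance mode the usual way, but if you want to trigger the test from the host’s shell, esxcli can do it too. This is only a sketch; note that esxcli does not evacuate running VMs for you, so only do this on an empty lab host.

# Enter maintenance mode on the ESXi host
esxcli system maintenanceMode set --enable true

# Verify the state
esxcli system maintenanceMode get

# Exit maintenance mode again after the Teams notification has arrived
esxcli system maintenanceMode set --enable false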

I hope this blog post will help you configure vRLI to send notifications to MS Teams. Please remember that MS Teams is not a monitoring tool. So be selective with the alerts you forward. I have chosen to only forward alerts that I know need to be acted on as soon as possible.

VxRail 7.0.300 GA

What’s new in VxRail 7.0.300

VxRail software version 7.0.300 includes VMware ESXi 7.0 Update 3, VMware vSAN 7.0 Update 3 and VMware vCSA 7.0
Update 3a, with support for external storage and the introduction of satellite nodes.

New features

Operationalize the edge with VxRail satellite nodes:
You can deploy the E660, E660F, and V670F as single VMware vSphere nodes with no VMware vSAN to address VxRail edge deployments that require a smaller footprint. You can configure satellite nodes with an optional PowerEdge RAID controller to add resiliency for local disks. The satellite nodes are managed by a new or existing standard cluster with VMware vSAN running 7.0.300.

Control satellite nodes from a central location:
You can deploy a VxRail Manager VM that can control all satellite nodes from a centralized host management location in VMware vCenter. You can add, remove, and update satellite nodes from one access point using VxRail Manager.

Expanded storage option for VxRail dynamic nodes:
You can deploy VxRail dynamic nodes as part of a PowerFlex two-layer architecture. Deploy a VxRail dynamic node cluster as compute-only nodes, leveraging PowerFlex storage for hosting the workload VMs.

Protocol support for VxRail dynamic nodes:
NVMe-FC is supported with PowerStore and PowerMax storage arrays that are attached to dynamic nodes.

VMware ESXi 7.0 Update 3, VMware vSAN 7.0 Update 3, and VMware vCSA 7.0 Update 3a support. The major changes for VxRail include:

  1. Support for upgrading the dedicated VMware vSAN Witness Host in vLCM as part of the coordinated cluster remediation workflow for VMware vSAN 2-Node and Stretched Clusters
  2. Stretched Cluster enhancement to tolerate planned or unplanned downtime of a site and the witness in a stretched cluster deployment
  3. Nested Fault Domains in a 2-node configuration
  4. Easy VMware vSAN cluster shutdown and start-up
  5. Upgrade note for VxRail with external storage

Source: https://dl.dell.com/content/docu98130

VMware vCLS datastore selection part 2

Last year I wrote a blog post about the VMware vCLS datastore selection. That blog post is one of the most-read articles on my website, which indicates there is a need to be able to choose the datastore on which the vCLS VMs are placed.

Today VMware announced vSphere 7.0 Update 3. This update also includes an improvement to the vCLS datastore selection: it’s now possible to choose the datastore on which the vCLS VMs should be located.

In the following video on the VMware vSphere YouTube channel, skip to the 20-minute mark to learn more about the vCLS VM datastore selection improvement.

Another improvement is that the vCLS VMs now have a unique identifier. This is useful when you have multiple clusters managed by the same vCenter.

It’s always good to see that a vendor is listening to the customers’ needs to further improve a product.

Cannot install the vCenter Server agent (HA) service. Unknown installer error

It had been a while since I had installed a non-HCI VMware cluster. After installing the ESXi hosts, the updates and multipath software were installed. The storage team had made the datastores available. Nothing special. After installation, the host was taken out of maintenance mode. Then there was an error: “Cannot install the vCenter Server agent service. Unknown installer error”. See VMware KB #2083945 and VMware KB #2056299.

I followed all the standard procedures to resolve HA errors:

  • Right-click the affected host and select Reconfigure for vSphere HA
  • Reconfigure HA at the cluster level: turn off vSphere HA and turn it back on
  • Disconnect and reconnect the affected host

After performing the above options, the issue was still unsolved. Next I wanted to know whether the HA (fdm) agent was installed or not. I SSH’d to the host and ran the following command:

esxcli software vib list | grep fdm

The output was empty, so I realized that the HA agent was not installed. VMware KB #2056299 mentions a VIB dependency. That made me realize that, besides the VMware updates, multipath software had also been installed: Dell EMC PowerPath/VE. This pointed me in the right direction to solve the problem.

Solution:

  • SSH to the affected host (in maintenance mode)
  • Run esxcli software vib list or esxcli software vib list | grep power. The result is three VIBs: powerpath.plugin.esx, powerpath.cim.esx and powerpath.lib.esx
  • Uninstall the three VIBs with the following command: esxcli software vib remove --vibname=powerpath.plugin.esx --vibname=powerpath.cim.esx --vibname=powerpath.lib.esx
  • Reboot the host
  • Run esxcli software vib list | grep power. The output should be empty.
  • Leave maintenance mode. The HA agent now installs. After the HA agent is installed, enter maintenance mode again
  • Run esxcli software vib list | grep fdm. The output should be similar to: vmware-fdm VMware VMwareCertified 2021-02-16
  • Reinstall Dell EMC PowerPath/VE. Installing the same version of PowerPath/VE gave a VUM error even after restarting the host. To resolve this error I installed a newer version of PowerPath/VE, which installed successfully.
  • Leave maintenance mode

In my case the PowerPath/VE VIB dependencies were causing the issue, but another dependency can also cause this problem. I am aware that finding the right dependency can be a difficult job. I hope I have at least been able to help you start the search in the right direction.
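
If you need to hunt for a dependency yourself, esxcli can show what an individual VIB declares. A short sketch, using the PowerPath plugin VIB from this case as the example name; the Depends and Provides fields in the output are a good place to start looking.

# List all installed VIBs
esxcli software vib list

# Show the details of one VIB, including its Depends and Provides fields
esxcli software vib get --vibname=powerpath.plugin.esx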

November 2022. An update about this issue can be read here.

VMware vCLS datastore selection

Recently I noticed that after updating a VMware vCenter from 6.7 to 7.0 U1, the new VMware vCLS VMs were placed on datastores that are not meant for VMs.

Starting with vSphere 7.0 Update 1, vSphere Cluster Services (vCLS) is enabled by default and runs in all vSphere clusters.
vCLS ensures that if vCenter Server becomes unavailable, cluster services remain available to maintain the resources and health of the workloads that run in the clusters.

The datastore for vCLS VMs is automatically selected based on ranking all the datastores connected to the hosts inside the cluster. A datastore is more likely to be selected if there are hosts in the cluster with free reserved DRS slots connected to the datastore. The algorithm tries to place vCLS VMs in a shared datastore if possible before selecting a local datastore. A datastore with more free space is preferred and the algorithm tries not to place more than one vCLS VM on the same datastore. You can only change the datastore of vCLS VMs after they are deployed and powered on.

You can perform a storage vMotion to migrate vCLS VMs to a different datastore.

If you want to move vCLS VMs to a different datastore or attach a different storage policy, you can reconfigure vCLS VMs. A warning message is displayed when you perform this operation.

Conclusion: If datastores are used that are intended for, for example, repository purposes, it is possible that the vCLS files are placed on those datastores. You can tag vCLS VMs or attach custom attributes if you want to group them separately.

Reference: docs.vmware.com

VMware vSphere 7 first impression

Yesterday VMware released version 7 of vSphere. After downloading the necessary software, I built a nested vSAN 7 cluster in my lab. This is not a deep technical blog post, just my first impression.

vSphere logo 2020


I chose a fresh installation instead of an upgrade. This has to do with the available resources in my lab. The installation was as simple as usual.

  • Deploy 4 nested ESXi hosts
  • Install vCSA
  • Create a cluster
  • Configure networks
  • Create vSAN
  • Deploy VMs
  • Setup Skyline
  • Setup Backup

Deploying nested ESXi

When creating the nested ESXi hosts don’t forget to check the CPU option “Expose hardware assisted virtualization to the guest OS”. This is required if you want a working nested ESXi.

CPU hardware assisted virtualization enabled
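
If you script the creation of the nested hosts instead of using the UI, the same checkbox corresponds, as far as I know, to the following entry in the outer VM’s .vmx file. Treat this as a sketch rather than a definitive reference.

vhv.enable = "TRUE"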

After spinning up the ESXi installer, just before the deployment, the following warning appeared.

CPU Warning during ESXi setup

This message is due to the obsolete CPU type of the physical ESXi host. Because it’s a lab we ignore the warning and start the deployment. After a few minutes the installation is finished.

Hooray!

vCenter vCSA

The first thing you notice is the absence of the old vSphere Web Client; hardly anyone was still using it anyway. Only the native HTML5 client is available.

vSphere UI

vSAN cluster

I’ve manually created a local vSAN cluster. I prefer this method because it gives more flexibility than the Cluster quickstart wizard. There are a lot of new and enhanced features.

New:

  • Simplify Cluster Updates with vSphere Lifecycle Manager
  • Native File Services for vSAN

Enhancements:

  • Integrated DRS awareness of Stretched Cluster configurations
  • Immediate repair operation after a vSAN Witness Host is replaced
  • Stretched Cluster I/O redirect based on an imbalance of capacity across sites
  • Accurate VM level space reporting across vCenter UI for vSAN powered VMs
  • Improved Memory reporting for ongoing optimization
  • Visibility of vSphere Replication objects in vSAN capacity views
  • Support for larger capacity devices
  • Native support for planned and unplanned maintenance with NVMe hotplug
  • Removal of Eager Zero Thick (EZT) requirement for shared disk in vSAN

The complete information can be found here.

The vSAN capacity monitoring has also been greatly improved. It gives a good overview of the current and historical capacity usage.

Capacity Usage
Capacity History

Virtual Machines

Windows 2019 is now available as Guest OS.

Windows 2019 available as Guest OS

Skyline

Skyline gives a daily overview of security findings and recommendations for VMware environments. That is why I immediately added this cluster to Skyline. I wonder if there are any findings and recommendations after the first collection of data.

Update Skyline April 4, 2020

The vSphere 7 lab is connected to VMware Skyline. Already two recommendations. Good to see it works.

vSphere 7 connected to VMware Skyline

Backup

The VMs in this environment must also be backed up. I have chosen to use the backup solution from Veeam, V10. I don’t know whether Veeam currently supports vSphere 7, but it works in my lab.

Conclusion

VMware has released multiple enhancements and improvements with vSphere 7. vSphere 7 remains the strong engine of a modern SDDC. In addition to vSphere 7, VMware has also released VMware Cloud Foundation 4.0 and VMware Tanzu. There is a lot to read and learn about all the new and enhanced VMware products.

WSFC on vSAN, backup & restore

After a week in Barcelona for VMworld Europe 2019, I came home with a lot of new information and ideas. This post is about Windows Server Failover Cluster (WSFC) on vSAN and how to back it up and restore it. WSFC is now fully supported on vSphere 6.7 Update 3 and, for the Dell VxRail users, code 4.7.300.

I started by reading VMware KB74786. It’s a good start and describes the straightforward deployment.

First I deployed two Windows Server 2016 VMs in a vSAN cluster. After the initial deployment I added the failover cluster file server role on both VMs. Then it was time to power off both VMs and add a Paravirtual SCSI controller with physical bus sharing to both of them.

The next step is reconfiguring vm1 and adding two new disks. The first disk is 5 GB (quorum) and the second disk is 50 GB (file server data). After reconfiguring, it’s time to power the VMs on again.

On vm1 I brought the new disks online and formatted them as NTFS. The next step is crucial before the cluster can be created; if you forget it, the disks are not detected in the cluster configuration. Power off vm2 and add the two existing disks from vm1 to the Paravirtual SCSI controller. Power on vm2 after reconfiguring.
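
For reference, the shared controller and disks end up in each VM’s .vmx roughly along these lines. This is only a sketch, not copied from my lab; the controller number (scsi1) and the disk file names are assumptions.

scsi1.present = "TRUE"
scsi1.virtualDev = "pvscsi"
scsi1.sharedBus = "physical"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "wsfc-quorum.vmdk"
scsi1:1.present = "TRUE"
scsi1:1.fileName = "wsfc-data.vmdk"

Here scsi1.virtualDev selects the Paravirtual SCSI controller and scsi1.sharedBus sets the physical bus sharing; the two fileName entries stand in for the 5 GB quorum disk and the 50 GB file server data disk.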

The creation of the cluster is now as straightforward as on physical hardware. You need a cluster-core FQDN and, for the file server role, a cluster-cap FQDN. There is a lot of documentation available about configuring a Windows failover cluster; otherwise, ask your favourite Windows admin :-).

After the deployment I did some failover and failback tests. I was surprised by the speed of the failover. I know there were not many client connections, but I am really impressed.

Backup and restore

I was already convinced that WSFC on vSAN would work. But how do you back up and restore the cluster and the data on it? I was thinking about this because snapshots are unsupported with WSFC on vSAN. See VMware KB74786.

I performed the backup and restore tests in my test lab with Veeam B&R 9.5 Update 4b.

The backup and restore test configuration:

First I excluded the two VMs from snapshot backup. The next step is to create a new protection group for virtual failover clusters in the inventory view. In the Active Directory tab, search for and add the two nodes and the cluster-core. In the exclusions tab of the new protection group I cleared “Exclude all virtual machines”; this is important, otherwise the cluster nodes can’t be added to the protection group. Use a service account with enough permissions and keep the defaults in the options tab. After completing the new protection group wizard, the Veeam Agent for Windows is deployed on the cluster nodes. A reboot is needed. Using the Veeam Agent for Windows is the trick in this test: I treated the cluster and nodes as if they were physical, which is why I used the agent. The final step is to configure a backup job and run the backup. After this initial backup I created the recovery ISO for both nodes for a bare metal restore (BMR).

I successfully did the following restores from a Veeam Windows ReFS landing zone server:

  • File / folder
  • Volume restore
  • Bare metal restore

Everything went normally. Only a BMR restore with the recovery ISO is a bit different from a BMR of a physical server. Keep the following in mind: normally, when you create a recovery ISO, all the network drivers are included in it, but the VMware VMXNET3 driver is not. I asked Veeam support whether it’s possible to add the VMXNET3 driver; it’s not. There is, however, an option to load a driver during the startup of the recovery ISO. During my test I was able to browse to the driver in the Windows folder C:\Windows\System32\DriverStore\FileRepository\vmxnet3.inf_amd64_583434891c6e8231 and load it successfully. In the future there may be other ways to achieve this.

During the BMR restore I was only able to recover the system volumes. This is by design, I guess, because normally the other cluster node, including the data volumes, is still online. Finally, I successfully tested a recovery of an entire cluster data volume.

Conclusion:

The test deployment of WSFC on vSAN helped me to better understand how it works. I definitely see possibilities for WSFC on vSAN.

The backup and restore tests helped me find an answer to how to back up and restore a WSFC on vSAN cluster. The tested backup configuration is supported by Veeam; I logged a case to ask them and they confirmed! Keep in mind that your guest OS must be supported. See the Veeam release notes document.

Cheers!

Unable to log in to VAMI on vCSA 6.7 Update 2a

Recently we ran into a strange issue. After upgrading to vCenter vCSA 6.7 Update 2a, we were no longer able to log in to the vCSA VAMI. The message we saw was “Unable to authenticate user”. vCenter itself was working fine for daily use.

So we started investigating. It was impossible for us to enable SSH because we couldn’t log in to the VAMI, so we tried to log in to the vCSA VM console with the root account. After 4 attempts the root account was locked, even though the password we used was correct. In the vCSA System Configuration, Manage tab, we saw the alert “The appliance management service on this node is not running”.

We went to the vCSA system services and noticed that the “Appliance Management Service” was not started. After starting the service, appliance management was back online.

The next thing was to enable SSH and Bash so we could log in to the vCSA over SSH with the root account. We used the same root account and password as before, when it had been locked out.
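
With shell access restored, the same check can also be done from the command line. A short sketch, assuming applmgmt is the service name of the Appliance Management Service on this vCSA build.

# Check the status of the Appliance Management Service
service-control --status applmgmt

# Start it if it turns out to be stopped
service-control --start applmgmt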

Our final test was logging in to the VAMI with the root account. The login succeeded, but we were surprised by what we saw after logging in.

It looked like the update wasn’t finished. So we had a delayed “Hooray” moment when the update installation finally succeeded. We don’t know whether this was an incident or a bug.