In the past years, I have experienced some vSAN performance issues due to faulty hardware. The goal was to know at an early stage whether there are hardware errors that can lead to performance degradation.
One problem I’ve seen a few times are hardware related problems that lead to a high latency, outstanding io’s and congestions at the backend storage. I was wondering if it is possible to spot these kinds of issues earlier? I started searching in vRealize Log Insight.
I found some events afterwards during my research. In the period prior to the performance issues, many “Power-on Reset on vmhba” messages had been written in the vobd.log and vmkernel.log. At first it was a few events per day, but as time passed the frequency with which the events came increasingly higher and finally led to a very poor vSAN performance.
In the following steps I will explain how you can define an email alert in vRealize Log Insight that helps to detect this kind of issues at an early stage. Now it’s possbile to take early action to avoid potential problems.
Step 1. Create a query that search for “Power-on Reset occurred on vmhba” events
Step 2. Create an alert from the query
Step 3. Define the alert