VxRail V7.x log collection failed

Recently I ran into a problem when I needed to generate a VxRail V7.x support log bundle. The log collection failed after I went to the VxRail log section.

Cluster > Configure > VxRail > Troubleshoot > Create

After hitting the Create button I selected the required data. In my case the VxRail Manager data was enough.

When start to Generate the log bundle nothing happened. The Create button was greyed out for a few minutes and no log bundle was generated.

Solution:

  • Take a snapshot of the VxRail Manager vm
  • Open a ssh session to the VxRail manager
  • Login with mystic, switch to root
  • Cd /tmp/mystic/dc
  • Look for old log bundles for example VxRail_Support_Bundle_12d4235c-1458-d47f-e68b-a3221bca3bc3_2022-01-01_12_00_00.zip
  • Delete all old log bundles in /tmp/mystic/dc
  • Systemctl restart runjars
  • Systemctl restart vmware-marvin
  • Wait about 5-10 min.
  • Try again collecting VxRail manager logs

Now the VxRail Support log bundle was successfully generated.

If you are not familiar enough you can always open a Dell Support request. They will then help you further.

Replacing the VxRail Manager SSL Certificate failed

Cause:
Replacing the VxRail Manager SSL Certificate through the Certificate Management Gui ended up with an error.

I was used to replacing the VxRail certificate using the cli. This was the preferred way somewhere up to version VxRail 4.7.x. All preparations for using the gui are the same as when replacing the certificate through the cli:

  1. Config OpenSSL.conf with the required information
  2. Create RSA key
  3. Creare CSR file
  4. Request a Certificate signed by your CA
  5. Download Certificate CA chain

The certificate and CA certificate chain must still be in PEM format.

So I start with a snapshot of the VxRail manager and copied the VxRail certificate, RSA key, certificate chain and entered a password in the gui. After pressing the Update button I received the following error.

Solution:
I will briefly tell you what went wrong. As told earlier, I used the same way to create the certifcates as I did when using the cli.

The error was caused by the certificate CA chain. After downloading the certificate CA chain file, the certificates are listed from top to bottom. The Root CA was at the bottom. This is order has always worked when replacing the VxRail manager certificate using the cli.

I was led to the solution after reading the certificate chain content information twice, see the screenshot above here. I assumed that the certificate format was causing the issue. There is also information about the certificate format next to the information about the order of the CA certificates in the chain file. I changed the order of the CA certificates in chain file with the Root CA on top.

At the next try I copied again the VxRail certificate, RSA key and certificate chain and entered a password in the gui. After pressing the Update button I received a notification that the VxRail Manager certificate is successful replaced. It takes about 5 minutes, before the VxRail manager is back online.

During this 5 minutes the VxRail SSL Thumbprint in the cluster in vCenter will be updated and services will be restarted. You can see the VxRail SSL Thumbrint on the cluster summary page in vCenter.

This auto replacement of the VxRail SSL Thumbprint is an enhancement compared to the old way when the certificate was replaced using the cli. The VxRail SSL Thumbprint then had to be manually read and copied from the VxRail manager to the cluster custom attribute in vCenter.

Return to the VxRail System info page after 5 minutes. You should now see the VxRail information.

Done.

Configure VxRail Manager forward logs to vRealize Log Insight

This post is written by Steve Hagerty. Since I find this a very valuable blog, I have decided reblogging this article. The original article can be found here.

This is a short post to explain how to configure syslog forwarding from VxRail Manager to vRealize Log Insight.

Depending on the deployment scenario, VxRail may or may not be automatically configured to forward all of its associated logs to vRealize Log Insight. For example with VCF on VxRail, this configuration is automated for all VxRail components, while in other situations this may need to be configured manually.

There are three primary VxRail components to configure:

  • VxRail Manager
  • vCenter Server
  • ESXi Hosts

… with the iDRAC of each VxRail node also being an option.

The configuration of the VxRail vCenter Server in vRLI can also incorporate configuring the log forwarding from the associated ESXi hosts if selected, as shown below:

This is all managed under the built-in vSphere integrations for vRLI. What remains then, if required, is to configure VxRail Manager to forward its logs (marvin.log) to vRLI.

As described in KB504644 VxRail: How to configure a new syslog server , SSH to VxRail Manager as mystic user and switch user to root user, before editing the /etc/rsyslog.conf file with the following additional entries:

#
# Marvin log to loginsight
#
$ModLoad imfile
$InputFileName /var/log/vmware/marvin/tomcat/logs/marvin.log
$InputFileTag VxRail
$InputFileStateFile VxRail-Log-State
$InputRunFileMonitor
*.* @<customer remote server ip:514>

Advertisementshttps://c0.pubmine.com/sf/0.0.3/html/safeframe.htmlREPORT THIS ADPRIVACY

Ideally you should use the Log Insight load balancer IP as the target for the <customer remote server ip> (syslog/vRLI server IP), where 514 is the udp port.

Update 31/03/2021: In additional, to the above, the /var/log/mystic/connectors-cluster.log and the /var/log/mystic/connectors-esrs.log can be added to this list, simply by adding them as additional $InputFileName line items, as shown below:

Restart the syslog service using command on VxRail Manager: service rsyslog restart (or reboot the VxRail Manager VM if required).

We can then confirm that the vRLI system is receiving the forwarded logs from our VxRail Manager (vcf2mgmtvxrmgr) in the vRLI UI under Administration > Hosts

Advertisementshttps://c0.pubmine.com/sf/0.0.3/html/safeframe.htmlREPORT THIS ADPRIVACY

On the Interactive Analytics tab we can filter for the VxRail Manager hostname of vcf2mgmtvxrmgr in order to get more detail on each event received since the log forwarding was configured.

The events received from the VxRail Manager source will automatically be included in the General vRLI Dashboard, as shown below

It is also possible to create your own custom VxRail dashboard in vRLI if required. A new (VxRail) dashboard can be created under My Dashboards, where new and existing widgets can be copied and modified as required.

For completeness, if a customer requires the iDRAC logs of the VxRail nodes to be forwarded to vRLI also, then please take a look at this post which covers the required steps, leveraging the Dell iDRAC Content Pack for vRLI, installable directly from the vRLI Content Pack Marketplace, as shown below:

Advertisementshttps://c0.pubmine.com/sf/0.0.3/html/safeframe.htmlREPORT THIS ADPRIVACY

And that’s about it really, I hope that helps!

Steve

VxRail 7.0.300 GA

What’s new in VxRail 7.0.300

VxRail software version 7.0.300 includes VMware ESXi 7.0 Update 3, VMware vSAN 7.0 Update 3 and VMware vCSA 7.0
Update 3a with support for external storage and introduction to satellite nodes.

New features

Operationalize the edge with VxRail satellite nodes:
You can deploy the E660, E660F, and V670F as single VMware vSphere nodes with no VMware vSAN to address VxRail edge deployments that require a smaller footprint. You can configure satellite nodes with an optional PowerEdge RAID controller to add resiliency for local disks. The satellite nodes are managed by a new or existing standard cluster with VMware vSAN running 7.0.300.

Control satellite nodes from a central location:
You can deploy a VxRail Manager VM that can control all satellite nodes from a centralized host management location in VMware vCenter. You can add, remove, and update satellite nodes from one access point using VxRail Manager.

Expanded storage option for VxRail dynamic nodes:
You can deploy VxRail dynamic nodes as part of a PowerFlex 2-layer architecture. Deploy VxRail dynamic nodes cluster as compute only node leveraging PowerFlex storage for hosting the workload VMs.

Protocol support for VxRail dynamic nodes:
NVMe-FC is supported with PowerStore and PowerMax storage arrays that are attached to dynamic nodes.

VMware ESXi 7.0 Update 3, VMware vSAN 7.0 U3, VMware vCSA 7.0 Update 3a support. The major changes for VxRail include:
Support upgrade of the VMware vSAN Witness Host (dedicated) in vLCM as part of the coordinated cluster remediation workflow for VMware vSAN 2-Node and Stretched Clusters.

  1. Stretched Cluster Enhancement to allow the ability to tolerate planned or unplanned downtime of a site and the witness in a stretched cluster deployment.
  2. Nest Fault Domain in a 2-node configuration
  3. Easy VMware vSAN cluster shutdown and start-up
  4. Upgrade note for VxRail with external storage

Source: https://dl.dell.com/content/docu98130

Extracting VxRail code 7.0.2xx failed at 50%

Sometimes you run into an issue that can keep you busy for hours and afterwards the cause remains easy to solve. Recently I ran into such an issue.

There was a minor update that needs to be done. It was a VxRail code upgrade from 7.0.x to 7.0.2xx.

The upgrade was basically like all other upgrades:

  1. Run VxVerify
  2. If there are findings in the results, solve them before starting the upgrade
  3. Upload the desired VxRail target code
  4. Start the upgrade
  5. Done

The results of the vxVerify were fine, no issues detected.

While uploading the target VxRail code everything looks fine but during the extraction of the upgrade bundle it failed at 50%. So I start a retry but the extraction of the upgrade bundle failed again at 50%. At the Cluster level we noticed the following error.

VXR1F4114 ALARM Upload of upgrade composite bundle unsuccessful VxRail Update ran into a problem… Error extracting upgrade bundle 7.0.2xx. Failed to upload bundle. Please refer to log for more details.

I opened a support request by Dell Support and in the meantime I start to examine the lcm-web.log in /var/log/mystic. I found some errors and failures but they did not lead directly to the root cause. There were errors about upgrade bundles couldn’t uploaded but those events were too general. I noticed the VxRail node that was mentioned at last in the log before the extraction failed.

Dell Support was now also working on the case. The support engineer also noted that the VxRail node I suspected was causing the problem.

I won’t go into too much detail, but at some point we checked the status of the “dcism-netmon-watchdog” service on that particular VxRail node.

[root@ESXi03:~] /etc/init.d/dcism-netmon-watchdog status
iSM is active (not running)

I had seen recently the same service status on another VxRail nodes running on code 7.0.x. Restarting the service won’t start the service. So I restarted the VxRail node. After the restart it could take some minutes before the service is restarted. I checked the service again.

[root@ESXi03:~] /etc/init.d/dcism-netmon-watchdog status
iSM is active (running)

Finally we restarted(retry) the VxRail code extraction. Both the VxRail code extraction and VxRail upgrade were successful.

Not enough free space to upload VxRail update

As you probably know, I like the VxRail HCI concept. Yet there is one point in my opinion that can still be improved.
Sometimes a log must also be generated for support purposes in a VxRail cluster. After creating a new log bundle it can be downloaded but not deleted with the result that these logs remain on the VxRail manager (VxRm). Not a problem in itself but it has often happened to me that not enough free space is available while uploading the new VxRail code. The example below shows that “/dev/sda3” is 80% full.

vxrm:~ # df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 4.0K 3.9G 1% /dev/shm
tmpfs 3.9G 393M 3.6G 10% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/sda3 16G 7.6G 8.1G 80% /
/dev/sda1 124M 39M 80M 33% /boot
/dev/mapper/data_vg-store1 2.0G 3.1M 1.9G 1% /data/store1
/dev/mapper/data_vg-store2 14G 9.3G 3.8G 72% /data/store2
tmpfs 850M 0 850M 0% /run/user/123
tmpfs 850M 0 850M 0% /run/user/4000

The following command finds temporary large files that are usually left behind after an update or generating a support log bundle. Always take a snapshot before make any change.

Find /tmp -type f -size +20000k -exec ls -lh {} \; | awk ‘{ print $9 “: ” $5 }’

Check the output and delete the large files in “/tmp”. As can be seen in the overview below, “/dev/sda3” is now filled up for only 52%. This is more than enough to upload the VxRail update.

vxrm:~ # df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 4.0K 3.9G 1% /dev/shm
tmpfs 3.9G 393M 3.6G 10% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/sda3 16G 7.6G 8.1G 52% /
/dev/sda1 124M 39M 80M 33% /boot
/dev/mapper/data_vg-store1 2.0G 3.1M 1.9G 1% /data/store1
/dev/mapper/data_vg-store2 14G 9.3G 3.8G 72% /data/store2
tmpfs 850M 0 850M 0% /run/user/123
tmpfs 850M 0 850M 0% /run/user/4000

My conclusion is that I prefer a download stream of the support log bundle instead of placing the file on the VxRail Manager. Maybe in a future release?

The above is just an example. It’s at your own risk to make any changes. You can always log a support case at Dell Support if you encounter this issue.