‘The IO operation at logical block address was retried’ on Hyper-V 2012 R2 guest

I’ve recently been working a lot with Data Protection Manager 2012 R2, Microsoft’s backup solution and part of the System Center 2012 R2 suite.

I will likely post some architecture and hands-on experience posts in the very near future but for now I want to share an experience I had after I migrated my DPM data store to a different storage appliance.

For various reasons the backup store was originally on a Nimble CS300 array (very nice bit of kit, by the way) but we had an old Dell PowerVault MD3200i floating around with about 40-60TB of 7200RPM spinning disks.

For this setup I deployed the DPM server to a Hyper-V Generation 2 virtual machine. A dynamic “disk pool” (Dell-speak) was then set up on the MD3200i and I created a couple of volumes, which were presented to the virtual host via iSCSI and formatted as NTFS Cluster Shared Volumes. Dynamically expanding VHDXs of 1000GB each were then created for the DPM storage (for Server 2012 R2 Microsoft recommends not using VHDs larger than 1TB, although I gather that limit is being removed in Server 2016).
I then used Virtual Machine Manager 2012 R2 to live migrate the storage from our Nimble to the Dell unit. So far, so good.

Unfortunately, some jobs started failing, and the errors indicated that one of the disks the job was trying to write to was no longer available.

A quick look at the disk status in the DPM console confirmed that disk 12 was indeed missing.

The DPM Alerts log seemed to support the theory too.

A poke in the System log confirmed the grisly truth: the warning from the title of this post, ‘The IO operation at logical block address was retried’.

The confusing thing was that this particular VHDX file was hosted on a CSV alongside several other VHDX files for the same DPM server, and none of those were disappearing. Moreover, a look in the event logs of the host server revealed no errors, and no media errors were being flagged on the MD3200i storage either.
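One way to confirm that the retries really are confined to a single disk is to export the guest’s System log as text and tally the warnings per disk number. The following is only a rough sketch in Python: the message layout in the regular expression is my assumption, modelled on the warning text in the title of this post, and the sample lines are made up for illustration rather than copied from the real log.

```python
import re
from collections import Counter

# Assumed message layout, modelled on the warning text in the post title.
# Adjust the pattern if the exported log is worded differently.
RETRY_PATTERN = re.compile(
    r"The IO operation at logical block address (0x[0-9a-fA-F]+) for Disk (\d+)"
)


def count_retries_per_disk(lines):
    """Return a Counter mapping disk number -> number of retry warnings."""
    counts = Counter()
    for line in lines:
        match = RETRY_PATTERN.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


# Made-up sample lines, for illustration only.
sample_lines = [
    "Warning disk The IO operation at logical block address 0x1a2b3c for Disk 12 was retried.",
    "Information Service Control Manager: a service entered the running state.",
    "Warning disk The IO operation at logical block address 0x1a2b40 for Disk 12 was retried.",
]

retry_counts = count_retries_per_disk(sample_lines)
for disk, count in sorted(retry_counts.items()):
    print(f"Disk {disk}: {count} retry warning(s)")
```

If only one disk number shows up in the tally while its neighbours on the same CSV stay clean, that matches the behaviour described above.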

Fortunately, Microsoft have a page which lists recommended hotfixes for Windows 2012 R2 clusters and Hyper-V 2012 clusters.

https://support.microsoft.com/en-gb/kb/2920151

These hotfixes are mostly the sort that you have to request from Microsoft via the webpage and are not available via Windows Update. So this is one to keep bookmarked!

After deploying first the cluster updates and then, after a reboot, the Hyper-V cluster updates on all nodes, the issue was resolved. I cannot be certain, but it’s reasonable to suspect hotfix 3068445 (Virtual machines that host on Windows Server 2012 R2 may crash or restart unexpectedly) as a likely candidate.
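Since these hotfixes have to be applied by hand on every node, it can help to check which ones are still missing from each host. Below is a minimal Python sketch, assuming you have a plain-text list of installed updates for a node (for example, saved output from PowerShell’s Get-HotFix). The function names, the placeholder KB number 1111111 and the sample text are all hypothetical; only 3068445 comes from this post.

```python
import re

# Matches identifiers such as "KB3068445" anywhere in a block of text.
KB_PATTERN = re.compile(r"\bKB(\d{6,7})\b", re.IGNORECASE)


def extract_kb_ids(text):
    """Return the set of KB numbers (as integers) mentioned in the text."""
    return {int(number) for number in KB_PATTERN.findall(text)}


def missing_hotfixes(required, installed_text):
    """Return the required KB numbers not found in the installed list, sorted."""
    return sorted(set(required) - extract_kb_ids(installed_text))


# 3068445 is the hotfix suspected above; 1111111 is a hypothetical placeholder
# standing in for the other entries on Microsoft's recommended list.
required_kbs = [3068445, 1111111]

# Made-up sample of a saved update list for one node.
installed_on_node = """
HV01  Update  KB1111111  12/01/2015
HV01  Update  KB2919355  11/20/2014
"""

missing = missing_hotfixes(required_kbs, installed_on_node)
print("Missing hotfixes:", ", ".join(f"KB{kb}" for kb in missing) or "none")
```

Running this once per node against the full recommended list gives a quick to-do list before scheduling the reboots.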
