Environment: two Windows Server 2016 virtual machines running on ESXi 6.7, joined in a Microsoft failover cluster. The storage runs ONTAP 9.7P8. The servers use in-guest iSCSI initiators to mount a number of LUNs as CSV disks. SQL Server 2017 is installed as a failover cluster instance (not Always On availability groups) on these CSV disks. SnapCenter 4.4 manages both VMs with the plug-ins for Windows and SQL Server. The SnapCenter Plug-in for VMware vSphere virtual appliance is installed and configured but not linked to the SnapCenter server itself, so the C: drive VMDKs are not managed by SnapCenter. Policies and schedules are configured to back up the databases residing on the CSV disks.
Problem: about half of all backup jobs, both full and log backups, fail with the error message "Unable to find any healthy resource on NetApp storage".
This started when this Windows/SQL cluster was deployed several months ago to replace a previous Windows Server 2008 R2/SQL Server 2012 cluster. The old cluster used clustered LUNs (not CSVs) and SnapManager rather than SnapCenter, and it did not exhibit this behavior. At deployment time, SnapCenter was at version 4.3.1P2 and the filer was running ONTAP 9.4P5. Since then, the filer has been replaced by a new AFF A220 (new cluster, new LUNs; data was moved by SQL backup/restore) and SnapCenter has been upgraded to version 4.4, but the problem persists. There is another SQL server in the same environment using the same settings except that it isn't clustered, so I suspect that the root of the problem is somewhere in the cluster settings, but I can't figure out what it is. Moving the instance between cluster nodes does not help. I dug through the SMCore and plug-in logs but couldn't find anything pertinent.
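For anyone who wants to repeat the log search on their own system, here is a minimal sketch of one way to scan a directory of log files for the error text. The function name `find_error_lines`, the `*.log` file pattern, and the assumption that the logs are readable as UTF-8 are all mine for illustration; the log directory is passed in as an argument because the right location depends on the installation.

```python
from pathlib import Path

# Error text exactly as it appears in the failed backup jobs.
ERROR_TEXT = "Unable to find any healthy resource on NetApp storage"


def find_error_lines(log_dir, needle=ERROR_TEXT, pattern="*.log"):
    """Return {file path: [(line number, line text), ...]} for matching lines.

    Searches log_dir recursively. The "*.log" pattern and UTF-8 decoding are
    assumptions; adjust both to match the actual log files.
    """
    needle_lower = needle.lower()
    hits = {}
    for path in sorted(Path(log_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going past undecodable bytes.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            matches = [
                (number, line.rstrip("\r\n"))
                for number, line in enumerate(handle, start=1)
                if needle_lower in line.lower()
            ]
        if matches:
            hits[str(path)] = matches
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_errors.py <log directory>
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for file_name, matches in find_error_lines(log_dir).items():
        for line_no, text in matches:
            print(f"{file_name}:{line_no}: {text}")
```

Comparing the timestamps of the matching lines against the job history may show whether the failures line up with a particular node, schedule, or resource, though that is only a guess at where to look.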