Symptoms:

VMware VMs running Windows Server 2012 had random lockups, reboots, and data
corruption when accessing iSCSI targets hosted on a Linux iSCSI target server.


Resolution:

The problem was TCP checksum offloading: VMware contains an optimization that
defers computing the checksum to the *receiving* side NIC when it thinks the
packet is destined for another VM on the same host. Suspect this optimization
was being applied intermittently to iSCSI packets destined for the (non-VM)
iSCSI server.
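For reference, the checksum being deferred is the standard Internet checksum
(RFC 1071): the ones'-complement of the ones'-complement sum of the data taken
as 16-bit words. The sketch below computes it in Python for illustration only.
A real TCP checksum also covers a pseudo-header (IP addresses, protocol,
length), which is omitted here, and `internet_checksum` and `checksum_ok` are
hypothetical helper names, not part of any VMware or Windows API. A receiver
checks a segment by summing it *including* the checksum field; a result of 0
means it verifies.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones'-complement of the ones'-complement 16-bit sum."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_ok(segment: bytes) -> bool:
    """Receiver-side check: summing data plus its checksum field yields 0."""
    return internet_checksum(segment) == 0


# Example words from RFC 1071 (pseudo-header omitted).
payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)                     # 0x220d
segment = payload + csum.to_bytes(2, "big")
print(checksum_ok(segment))                           # True
corrupted = bytes([segment[0] ^ 0x01]) + segment[1:]  # flip one bit
print(checksum_ok(corrupted))                         # False
```

With offload enabled, the OS leaves this arithmetic to the NIC; disabling
offload makes the guest compute it in software before the packet leaves.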

Disabling TCP and IP checksum offloads resolved the issue.

Note: also ended up with several stray volume snapshots after failed backups;
'vssadmin' would not remove these, but 'diskshadow' was able to delete them.


Detailed symptoms:

The receiving-side iSCSI server's /var/log/messages contained many entries like:
 tgtd: iscsi_rx_handler(2107) rx hdr digest error 0xc0a42a22 calc 0x52519197
 tgtd: conn_close(101) connection closed, 0x1b4c958 5
 tgtd: conn_close(107) sesson 0x1b4cde0 1
 tgtd: conn_close(165) Forcing release of rx task 0x1b52910 cbd60000
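The `rx hdr digest error` lines come from tgtd's header-digest check: when
HeaderDigest=CRC32C is negotiated, iSCSI (RFC 3720) appends a CRC32C of each
PDU header and the receiver recomputes it. The two hex values appear to be the
received digest and the one tgtd calculated (`calc`); a mismatch indicates the
header (or its digest) did not arrive as sent. Below is a minimal bitwise
CRC32C sketch for illustration. It is not tgtd's code, it ignores how the
digest is laid out on the wire, and the all-zero 48-byte header is a
hypothetical placeholder rather than a real iSCSI Basic Header Segment.

```python
CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reversed


def crc32c(data: bytes) -> int:
    """Bitwise (table-free) CRC-32C: reflected, init and final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value

header = bytes(48)                # hypothetical placeholder for a 48-byte header
sent_digest = crc32c(header)      # digest the initiator would append
received = bytearray(header)
received[20] ^= 0x04              # simulate one bit corrupted in transit
calc = crc32c(bytes(received))    # digest the target recomputes
print(calc != sent_digest)        # True: the target would report a digest error
```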

Also had some tgtd segfaults in dmesg:
 tgtd[1575]: segfault at 1929330 ip 00000000004064cf sp 00007fff1aab7370 error 6 in tgtd[400000+3e000]
 tgtd[19986]: segfault at 1c56df0 ip 00000000004064cf sp 00007fffed904d00 error 6 in tgtd[400000+3e000]
 tgtd[21178]: segfault at fffffffffffffff0 ip 00000000004086b0 sp 00007fff3ca9eaf0 error 4 in tgtd[400000+3e000]

Copying a large file to and from the iSCSI disk in Windows resulted in a
checksum mismatch.
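One way to repeat the copy test is to hash the original file and the copy read
back from the iSCSI disk, then compare. The sketch below does that with SHA-256
from Python's standard library; the choice of hash, the helper names, and the
script name are assumptions for illustration, not what was used originally.

```python
import hashlib
import os
import sys


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a: str, path_b: str) -> bool:
    """True if both files have the same size and the same SHA-256."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return file_sha256(path_a) == file_sha256(path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python compare_copy.py ORIGINAL COPY  (hypothetical script name)
    original, copy = sys.argv[1], sys.argv[2]
    if same_contents(original, copy):
        print("match")
    else:
        print("MISMATCH:", file_sha256(original), file_sha256(copy))
        sys.exit(1)
```

Checking the size first skips hashing when the copy is obviously truncated;
otherwise both files are streamed once each, which keeps memory use flat for
multi-gigabyte test files.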

Running a backup with Windows Server Backup produced these events in the
Windows event logs:

iScsiPrt: event ID 1: Initiator failed to connect to the target. Target IP address and TCP Port number are given in dump data.
iScsiPrt: event ID 7: The initiator could not send an iSCSI PDU. Error status is given in the dump data.
disk: event ID 157: Disk 1 has been surprise removed.
disk: event ID 153: The IO operation at logical block address 0xeea24a for Disk 1 was retried.
BugCheck: event ID 1001: The computer has rebooted from a bugcheck.  The bugcheck was: 0x000000d1 (0xfffffa800b909ab8, 0x0000000000000002, 0x0000000000000000, 0xfffff8800368732d). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 021615-14289-01.
VSS: event ID 13: Volume Shadow Copy Service information: The COM Server with CLSID {e579ab5f-1cc4-44b4-bed9-de0991ff0623} and name Coordinator cannot be started. [0x80070005, Access is denied.]

volsnap: event ID 25 (description not found; raw event text):
 The description for Event ID 25 from source volsnap cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.
 If the event originated on another computer, the display information had to be saved with the event.
 The following information was included with the event:
  \Device\HarddiskVolumeShadowCopy2
  C:
 The resource loader failed to find MUI file