VAAI Technical Analysis
1 Low Level VAAI Behaviour
We're getting a lot of queries lately around how exactly VAAI behaves at the lower level. One assumes more and more VMware customers are seeing the benefit of offloading certain storage-intensive tasks to the array. Recently the questions I have been getting are even more in-depth, so I've been back over my VAAI notes gathered since 4.1 and put together the following article. Hope you find it useful.
VAAI first appeared in vSphere 4.1, and was only available to block storage devices (iSCSI, FC, FCoE). This was enhanced in 5.0 to include support for NAS device primitives and also introduced an UNMAP primitive for reclaiming stranded space on a thin provisioned VMFS.
What follows is a closer look at the original primitives and a description of how they work. I've used various names for some of the primitives, as they seem to have taken on a number of different names since they first launched.
1.1 Atomic Test & Set (ATS)
This is a replacement lock mechanism for SCSI reservations on VMFS volumes when doing metadata updates. Basically, an ATS lock can be considered a mechanism to atomically modify a disk sector which, when successful, allows an ESXi host to do a metadata update on a VMFS. This includes allocating space to a VMDK during provisioning, as certain characteristics would need to be updated in the metadata to reflect the new size of the file. Interestingly enough, in the initial VAAI release, the ATS primitive had to be implemented differently on each storage array, so you had a different ATS opcode depending on the vendor. ATS is now a T10 standard and uses opcode 0x89 (COMPARE AND WRITE).
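To make that concrete, here is a minimal sketch of how a COMPARE AND WRITE CDB is laid out, per the T10 SBC specification. The LBA and the lock-record images are hypothetical stand-ins; ESXi's actual on-disk lock format is not shown.

```python
import struct

def compare_and_write_cdb(lba: int, num_blocks: int = 1) -> bytes:
    """Build a 16-byte COMPARE AND WRITE CDB (opcode 0x89).

    The data-out buffer sent with this CDB carries two images of the
    target sector back to back: a 'compare' image and a 'write' image.
    The array atomically verifies the on-disk contents against the
    compare image and, only on a match, applies the write image.
    """
    cdb = bytearray(16)
    cdb[0] = 0x89                        # COMPARE AND WRITE
    cdb[2:10] = struct.pack(">Q", lba)   # 64-bit logical block address
    cdb[13] = num_blocks                 # NUMBER OF LOGICAL BLOCKS
    return bytes(cdb)

# A lock acquisition then looks like: read the lock sector, build an
# updated image, and attempt to swap one for the other atomically.
current_image = bytes(512)               # stand-in for the sector as read
updated_image = b"\x01" + bytes(511)     # stand-in for the updated lock record
data_out = current_image + updated_image
cdb = compare_and_write_cdb(lba=0x1234)  # hypothetical lock sector LBA
```

If the on-disk sector no longer matches the compare image (another host won the race), the command completes with a miscompare and the host simply retries, which is far cheaper than reserving the whole LUN.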
1.2 Write Same/Zero
One of the most common operations on virtual disks is initializing large extents of the disk with zeroes to isolate virtual machines and promote security. vSphere hosts can be configured to enable the WRITE SAME SCSI command to zero out large portions of a disk. With WRITE SAME enabled, VMware ESX/ESXi will issue the command to arrays during specific operations. This offload task will zero large numbers of disk blocks without transferring the data over the transport link. The WRITE SAME opcode is 0x93.
The following provisioning tasks are accelerated by the use of the WRITE SAME command:
- Cloning operations for eagerzeroedthick target disks.
- Allocating new file blocks for thin provisioned virtual disks.
- Initializing previously unwritten file blocks for zeroedthick virtual disks.
The data-out buffer of the WRITE SAME command contains all zeroes. A single zero operation has a default zeroing size of 1MB. When monitoring VAAI counters in esxtop, you may observe the WRITE_SAME counter incrementing in batches of 16. This is because we only ever launch 16 parallel worker threads for VAAI, so don't be surprised to see batch increments of 16 WRITE SAME commands during a zero operation.
Note: Not all storage arrays need to write this directly to disk. Some arrays only need to do a metadata update to record a page of all zeroes. There is no need to actually write zeroes to every location, which speeds up the process dramatically all round.
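For illustration, here is a sketch of the WRITE SAME(16) CDB layout from the T10 SBC specification, assuming 512-byte logical blocks. It shows why zeroing 1MB moves only a single block of zeroes across the transport link.

```python
import struct

BLOCK_SIZE = 512              # assuming 512-byte logical blocks
ZERO_CHUNK = 1 * 1024 * 1024  # the default 1MB zeroing size

def write_same_16_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 16-byte WRITE SAME CDB (opcode 0x93).

    Only one block of data travels with the command; the NUMBER OF
    LOGICAL BLOCKS field tells the array how far to replicate it.
    """
    cdb = bytearray(16)
    cdb[0] = 0x93                               # WRITE SAME (16)
    cdb[2:10] = struct.pack(">Q", lba)          # starting LBA
    cdb[10:14] = struct.pack(">I", num_blocks)  # NUMBER OF LOGICAL BLOCKS
    return bytes(cdb)

# Zeroing 1MB costs one command carrying one 512-byte zero block,
# instead of pushing 1MB of zeroes over the fabric.
data_out = bytes(BLOCK_SIZE)                    # the all-zero pattern block
cdb = write_same_16_cdb(lba=0, num_blocks=ZERO_CHUNK // BLOCK_SIZE)  # 2048
```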
1.3 Full Copy/XCOPY/Extended Copy
This primitive is used when a clone or migrate operation (such as a Storage vMotion) is initiated from a vSphere host and we want the array to handle the operation on our behalf. vSphere hosts can be configured to enable the EXTENDED COPY SCSI command. When examining VAAI status in esxtop, you may see this counter increment in batches of 8: the default size of a Full Copy transfer is 4MB, so a 32MB I/O gives batches of 8 XCOPY commands. The opcode for XCOPY is 0x83.
- What do VAAI offloads look like from an I/O perspective?
I've had a number of requests to describe exactly what happens under the covers when some of these offload operations are taking place. The default XCOPY size is 4MB. With a 32MB I/O, one would expect to see this counter in esxtop incrementing in batches of 8. The default XCOPY size can be increased to a maximum of 16MB.
The default WRITE SAME size is 1MB. With a 32MB I/O, one would expect to see this counter in esxtop incrementing in batches of 16, since we only ever launch 16 parallel worker threads for VAAI. We currently do not support changing the WRITE SAME size from 1MB. The arithmetic behind both batch sizes is sketched below.
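A quick back-of-the-envelope sketch of those batch sizes, using the defaults quoted above (the XCOPY transfer size is the tunable one, controlled on the host by the DataMover/MaxHWTransferSize advanced setting; the worker-thread count and the WRITE SAME size are fixed):

```python
MB = 1024 * 1024
io_size = 32 * MB           # the 32MB operation used in the examples above

xcopy_chunk = 4 * MB        # default; tunable up to 16MB via
                            # /DataMover/MaxHWTransferSize
write_same_chunk = 1 * MB   # fixed; not currently tunable
vaai_workers = 16           # parallel VAAI worker threads

xcopy_cmds = io_size // xcopy_chunk        # 8 -> XCOPY batches of 8
zero_cmds = io_size // write_same_chunk    # 32 WRITE SAME commands in all
zero_batch = min(zero_cmds, vaai_workers)  # issued 16 at a time

print(xcopy_cmds, zero_cmds, zero_batch)   # 8 32 16
```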
- Differences between VAAI in 4.x & VAAI in 5.x
One final piece of information I wanted to share with you is the distinction between our first phase of VAAI, released in vSphere 4.1, and our second phase, released with vSphere 5.0. A list of differences appears below.
- VAAI now uses standard T10 primitives rather than bespoke array commands
- Full ATS with VMFS-5, which can use ATS for all locking operations on supported arrays.
- Support for NAS primitives, introduced in vSphere 5.0.
- VCAI (View Composer Array Integration) – View uses VAAI to offload clone creation to the array.
- UNMAP support – a new primitive for reclaiming stranded space on a thin provisioned VMFS (a sketch of the command format follows this list).
- The VMware HCL now requires a minimum level of primitive performance before an array receives VAAI certification. This is important, as certain storage arrays which appear in the vSphere 4.1 HCL may not appear in the vSphere 5.0 HCL if the performance of the offload primitives does not meet our requirements.
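As noted above, here is a rough sketch of the UNMAP command format (opcode 0x42) per the T10 SBC specification: a short CDB plus a data-out parameter list of (LBA, block count) extents describing the space to reclaim. The extent values below are hypothetical.

```python
import struct

def unmap_cdb(param_list_len: int) -> bytes:
    """Build a 10-byte UNMAP CDB (opcode 0x42)."""
    cdb = bytearray(10)
    cdb[0] = 0x42                                 # UNMAP
    cdb[7:9] = struct.pack(">H", param_list_len)  # PARAMETER LIST LENGTH
    return bytes(cdb)

def unmap_param_list(extents) -> bytes:
    """Serialize (lba, num_blocks) extents into an UNMAP parameter list."""
    descriptors = b"".join(
        struct.pack(">QI4x", lba, nblocks)        # 16-byte block descriptor
        for lba, nblocks in extents
    )
    # Header: UNMAP DATA LENGTH, BLOCK DESCRIPTOR DATA LENGTH, 4 reserved bytes.
    header = struct.pack(">HH4x", 6 + len(descriptors), len(descriptors))
    return header + descriptors

# Reclaim one hypothetical 1MB run of stranded space (2048 x 512-byte blocks).
payload = unmap_param_list([(0x10000, 2048)])
cdb = unmap_cdb(len(payload))
```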
My thanks to Ilia Sokolinski for clarifying some of the behaviours above.
2 Reference links
- VMware ESXi SCSI Sense Code Decoder: http://www.virten.net/vmware/esxi-scsi-sense-code-decoder/