Let’s quickly remind ourselves of the I/O flow and of how checksum calculation and disk encryption work in conjunction with deduplication and compression on a vSAN All-Flash cluster when creating or modifying a vmdk.
1. Checksum – functionality to avoid data integrity issues – is calculated before the data (the block) is written to the caching tier. vSAN creates a 5-byte checksum for every 4 KB data block, and the checksum is verified to ensure that no data corruption occurred over the network. If a checksum mismatch is detected, vSAN automatically repairs the data by overwriting the incorrect data with the correct data from the other replica (in case of RAID 1) or from the other components in the RAID stripe (in case of RAID 5/6).
NOTE!!! Checksum is managed by DOM (Distributed Object Manager).
2. Encryption (Data at Rest Encryption) – vSAN encryption uses an XTS AES-256 cipher to encrypt all objects in the vSAN datastore. Data is encrypted in the cache tier (step 2) and in the capacity tier (step 6), which ensures that the data is still encrypted when caching or capacity tier devices (disks) are removed. Once encryption is enabled, the Disk Encryption Keys (DEKs) are written down to disk, and each ESXi host uses the KEK (Key Encryption Key) obtained from the KMS to encrypt its DEKs.
3. Decryption – when the data in the cache tier is destaged to the capacity tier, vSAN decrypts the data and runs deduplication (step 4) and compression (step 5).
4. Deduplication – only available for all-flash vSAN. Deduplication is one of several data reduction features. It happens when the data is destaged from the cache tier to the capacity tier. Deduplication on vSAN uses the SHA-1 hashing algorithm to create a fingerprint for every new incoming data block, so that each unique data block gets its own hash. When a new data block arrives, its fingerprint is first compared to the existing fingerprints for a match. If a match is found, a pointer to the already existing data block is created rather than storing the same data block again. Otherwise, the unique data is written to the disks and a new fingerprint is created and published.
5. Compression – only available for all-flash vSAN. Deduplication is followed by compression of the written data blocks: a unique block goes through compression right after deduplication. Compression takes place only when LZ4 manages to reduce a 4 KB data block to 2 KB or less. If it does, the compressed data block is written to the capacity tier. If not, the full-size data block is persisted to the capacity tier.
NOTE!!! Deduplication and Compression work at the disk group level, so only blocks stored on the same disk group can contribute toward space savings. Deduplication and Compression are performed in host memory – vSAN reads the blocks into host memory, eliminates the duplicates and compresses the data blocks.
6. Encryption (Data at Rest Encryption) – once deduplication and compression are done, vSAN encrypts the data blocks once more.
NOTE!!! LSOM (Local Log-Structured Object Manager) takes care of Deduplication & Compression as well as Encryption.
7. Data blocks are written to disk group devices.
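To make the steps above more tangible, here is a minimal Python sketch of the checksum repair (step 1) and of the destage path with deduplication and compression (steps 4 and 5). It is only an illustration under loud assumptions, not vSAN code: `DiskGroup`, `read_with_repair` and `checksum` are hypothetical names, `zlib.crc32` stands in for the vSAN checksum, and `zlib.compress` stands in for LZ4 because LZ4 is not part of the Python standard library. The encryption and decryption steps (2, 3 and 6) are left out for the same reason: the standard library has no XTS AES-256 implementation.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096      # 4 KB data block, as described in the steps above
COMPRESS_LIMIT = 2048  # keep the compressed form only if it is 2 KB or less


def checksum(block):
    # Assumption: CRC-32 from zlib is a stand-in, not the real vSAN checksum.
    return zlib.crc32(block)


def read_with_repair(replicas, expected):
    """RAID 1 style repair: find a replica whose checksum matches and
    overwrite every mismatching replica with the good data."""
    good = None
    for replica in replicas:
        if checksum(bytes(replica)) == expected:
            good = bytes(replica)
            break
    if good is None:
        raise IOError("no replica matches the expected checksum")
    for replica in replicas:
        if checksum(bytes(replica)) != expected:
            replica[:] = good  # overwrite the incorrect data in place
    return good


class DiskGroup:
    """Hypothetical model of one disk group. Fingerprints are local to the
    group, so only blocks in the same group deduplicate against each other."""

    def __init__(self):
        self.fingerprints = {}  # SHA-1 digest -> index into self.stored
        self.stored = []        # (is_compressed, payload) per unique block
        self.block_map = []     # logical block number -> index into self.stored

    def destage(self, block):
        digest = hashlib.sha1(block).digest()
        index = self.fingerprints.get(digest)
        if index is None:
            # Unique block: try to compress it (zlib is a stand-in for LZ4).
            packed = zlib.compress(block)
            if len(packed) <= COMPRESS_LIMIT:
                payload = (True, packed)
            else:
                payload = (False, block)  # persist the full-size block
            index = len(self.stored)
            self.stored.append(payload)
            self.fingerprints[digest] = index
        # Duplicate blocks only add a pointer to the existing data.
        self.block_map.append(index)

    def read(self, logical):
        compressed, payload = self.stored[self.block_map[logical]]
        return zlib.decompress(payload) if compressed else payload
```

Destaging the same 4 KB block twice stores it once and adds two pointers to it, while a block of random bytes fails the 2 KB threshold and is kept at full size.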
You can find more about the vSAN driver and its internal mechanisms in my previous post ‘vSAN – internal components and mechanisms’.