One of the requirements of a VMWare cluster, and that the VMs which need to be made highly available, should reside on a common storage device, typically a SAN . VMWare provide their own filesystem called VMFS, which is a distributed filesystem. If a SAN device presents a raw data store called a Lun, then VMWare accesses that Lun using the iSCSI protocol or via FibreChannel. If the SAN/NAS device can expose it's storage space via the NFS protocol, then VMWare uses that storage directly over TCP/IP. We initially purchased a Dell MD 3000i SAN device which played a useful role when we were operating at a small scale of around 3 TB. As we increased the number of VMs, the disk space required also grew. We finally stopped needing more storage for a specific project at around 5 TB of utilization (8 TB of usable capacity).
Given the larger number of VMs, and the number of parallel deployments of various applications by our automated deployment scripts (we made everything part of our Go Grid), we started to see performance issues. Another long pending problem was that of being unable to answer why our SAN was sometimes slow even though there was no deployment in progress at all (never mind parallel deployments). For e.g., if we knew that there were six environments running 2 - 3 different builds, we'd not know which particular build caused an increase in disk IO. The VMWare GUI tool gives you only so much data. Also, devices like the MD3000i do not have any such thing as iSCSI analytics.
Around this time, I also wanted to experiment with creating VMs within seconds.
Since I've been closely associated with Belenix and OpenSolaris technologies for years, I decided to give Solaris 10 a spin as a storage box for some time. We used ZFS and it's snapshot feature to set up VM collections in minutes. A typical VMCollection would comprise of a Domain Controller, some IIS servers, an Exchange Server, and some VMs running Outlook 2007. We also got amazing performance. All this on a box with just three disks of 1 TB each configured in what is known in ZFS parlance as RAIDZ. RAIDZ is similar to but better than RAID5, because it avoids the RAID5 write hole problem. Our VMWare servers would access our various ZFS filesystems - some over iSCSI and the others as NFS storage.
So, now that we understood that ZFS could for certain give good performance, we needed to solve the additional problem of identifying which VMWare server was sending how much disk IO to which SAN LUN. Enter DTrace. With the help of a close friend from Sun, we put together some dtrace scripts, and these provided answers to some extent. Why only to some extent ? That's because we often didn't know what exactly to ask DTrace to monitor.
Some months later, we decided to replace our MD3000i with a higher end SAN. Now I'd heard of The famed DTrace Analytics that somes as part of the Sun Fishworks product line. Today, this is called their ZFS Storage Appliance Line. However, since we were going to sink in a lot of money into a SAN, I wanted to be cautious and check out other popular vendors too.
Some colleagues and I got together and attended review sessions at NetApp and at EMC. Both NetApp and EMC offered more than what we'd imagined as part of their standard offering, and this was good to know. Unfortunately, at that time (mid 2010), NetApp did not have any analytics around iSCSI sessions. At the EMC review session at their Bangalore office, we saw a good demo and were pleased with their product overall. But the key feature which is very important for us - iSCSI session analytics - was missing. They did some some raw iSCSI analytics, but I learned that activating this monitoring locks up the controller in processing information, and VMWare servers and VMs get disconnected. This was a clear no.
After some discussions with Oracle Bangalore, we finalized an order for a 7320 Storage box, and life hasn't been the same again !
More in part 2 of this blog post.