Thursday, March 10, 2011

RAID-5 is the Devil

And I’ll tell you why.  But first, the TL;DR takeaway:  if you have a storage array of anything over a terabyte or two, DO NOT use RAID-5; use RAID-6 or something with superior fault tolerance.  If you remember anything from this post, remember that, or you may find yourself in the position I’m in currently, lost data and all.

It really comes down to one little performance specification that until now I hadn’t ever really noticed: “Unrecoverable read errors”.  For the latest Seagate home drive that statistic is 1 in 10^14 bits read; for a standard SATA Enterprise class drive that number is an order of magnitude better, 1 in 10^15.  Remember those numbers and now read this excellent article as well as one of the referenced white papers from NetApp, and perhaps finish off with this StorageMojo post.
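To put rough numbers on that, here is a back-of-the-envelope sketch in PowerShell.  It assumes a rebuild has to read five surviving 1 TB drives end to end (an illustrative layout, not a measured one), treats every bit read as an independent trial at the quoted error rate, and uses the approximation P ≈ 1 - e^(-bits × rate).  Get-UreProbability is a made-up helper name.

```powershell
# Rough odds of hitting at least one unrecoverable read error (URE)
# while reading a set of drives end to end, e.g. during a RAID-5 rebuild.
# Assumption: each bit read fails independently at the quoted rate, so
# P(at least one URE) is approximately 1 - exp(-bitsRead * rate).
function Get-UreProbability
{
  param(
    [double]$DriveCount,   # surviving drives that must be read in full
    [double]$DriveSizeTB,  # capacity per drive, in decimal terabytes
    [double]$UreRate       # e.g. 1e-14 means 1 error per 10^14 bits read
  )
  $bitsRead = $DriveCount * $DriveSizeTB * 1e12 * 8
  return 1 - [math]::Exp(-1 * $bitsRead * $UreRate)
}

# Hypothetical array: five surviving 1 TB drives
$consumer   = Get-UreProbability -DriveCount 5 -DriveSizeTB 1 -UreRate 1e-14
$enterprise = Get-UreProbability -DriveCount 5 -DriveSizeTB 1 -UreRate 1e-15

"Consumer class (1 in 10^14):   {0:P1}" -f $consumer
"Enterprise class (1 in 10^15): {0:P1}" -f $enterprise
```

Under those assumptions the consumer-class rate works out to roughly a one-in-three chance of a read error somewhere in the rebuild, versus about 4% for the enterprise-class rate.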

See the problem now?  My backup server sure did when it ran afoul of a double drive failure.  Perhaps it was just bad juju, but my 6x1TB drive array (with hot spare) had a drive fail, and then hit an “Unrecoverable read error” while reconstructing the data to the hot spare.  So now I’m working on evacuating the existing data and restoring from backups.  To save yourself that trouble it’s worth remembering:

For large arrays, RAID-5 is the devil.

Friday, March 4, 2011

Monitoring/Logging Concurrent VDI Sessions with Host CPU & RAM utilization

So we have spent the past few months sizing, installing, adjusting, and growing a proof-of-concept implementation of VMware’s virtual desktop product, View 4.5.  One question that was kind of fun to develop an easily digestible solution for was, “How can we keep track of and trend how the VDI solution is being used?”  Naturally, I turned to my technical multitool of choice, PowerShell, for a solution.

As a simple logging and review tool we decided that we would like to know how many Remote Sessions were currently active, the average CPU % utilization, and the average RAM % utilization.  I added Disk and Network stats to the logs as well.  For the Remote Session stats we have to use the ‘View PowerCLI’ PowerShell Snap-in, “VMware.View.Broker”.  The last time I checked, it was not available as a standalone installer, nor was it part of the standard VMware PowerCLI distributable.  Because of this we either have to run our PowerShell script locally on our View Manager server, or use PowerShell sessions and remoting.  In the interest of simplicity I chose to run the script locally on our View Manager server.  For the Host(s) CPU, RAM, Disk, & Network utilization we’ll use the standard PowerCLI distributable.  For historical purposes I log the information to a simple CSV file, and we use PowerGadgets for a “live” view of the statistics.

Okay, let’s check the script out.  First things first, we make sure that the required snap-ins are available.

# Add the necessary snap-ins, throwing terminating errors
# should they not be available on the machine
if (-not (Get-PSSnapin | Where-Object { $_.Name -eq 'VMware.View.Broker' }))
{
  Add-PSSnapin VMware.View.Broker -ErrorAction Stop
}
if (-not (Get-PSSnapin | Where-Object { $_.Name -eq 'VMware.VimAutomation.Core' }))
{
  Add-PSSnapin VMware.VimAutomation.Core -ErrorAction Stop
}

Easy enough.  Next let’s get some script variables set up, with some explanations along the way.

$today = Get-Date
$logPath = "C:\ViewStatLogs"
$logName = "{0}.log" -f $today.ToString("yyyy-MM-dd")
$logexists = Test-Path $logPath\$logName

Here we’re just setting up our logfiles.  For the script to function, we’ll need to make sure that the $logPath directory already exists.  For the sake of space I set up the directory as compressed as well.  Next we define the name of the logfile, which determines how our logs are broken up.  Currently mine are broken up by day, though I’m thinking that it may be better to do them some other way…time will tell.  $logexists should be pretty self-explanatory.  Now for some variables that are a bit more interesting.
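A quick aside first: since the script assumes the log directory is already there, a small guard like the following could create it on first run.  This is only a sketch and is not part of the script as I run it; it reuses the same $logPath value shown above and does nothing about folder compression, which is a separate one-time step.

```powershell
# Sketch only: create the log directory when it is missing.
$logPath = "C:\ViewStatLogs"   # same value as in the script above
if (-not (Test-Path -Path $logPath -PathType Container))
{
  # New-Item emits the new directory object; discard it to keep output clean
  New-Item -Path $logPath -ItemType Directory | Out-Null
}
```

The script’s remaining variables pick up from here.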

$dnsDomain = "mydomain.net"
$vCenterServer = "vCenter"
$vSphereHosts = @("ESXi03")
$statsToGet = @(  "cpu.usage.average",
                  "mem.usage.average",
                  "disk.usage.average",
                  "net.usage.average"
               )

So here we have $dnsDomain, which I use because I’m lazy and hate typing FQDNs all the time.  Next is the vCenter Server that is used to manage the host or hosts that compose our View cluster.  We’ll connect to this server to get the CPU/RAM/Disk/Network statistics.  We’re still in the “growing proof-of-concept” phase, so we only have a single vSphere host running in our View cluster, which is specified in $vSphereHosts.  Next is an array of the statistics that we want to gather for each vSphere Host, in this case the average usage for cpu, mem, disk, and net(work).  For more information on the statistics that are available, consult the help for the Get-Stat & Get-StatType cmdlets in the VMware.VimAutomation.Core snap-in.  Next let’s set up the header for our CSV logfile, and create the file if it doesn’t already exist.

# Define the log header
$logHeader = "Time,SessionCount"
# Alter the header for each host/stat combo
foreach ($vsh in $vSphereHosts)
{
  foreach ($stat in $statsToGet)
  {
    $logHeader += ",{0}.{1}" -f $vsh,$stat
  }
}

# Create the log file if it doesn't exist
if (-not $logexists)
{
  Out-File -FilePath $logPath\$logName `
    -InputObject $logHeader -Force -Encoding ASCII
}

Pretty straightforward stuff here as well.  We set up the header with the timestamp & session count, then append each chosen statistic for each vSphere host.  Now we’ll get our stats and write the information out.

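# Count the remote sessions the View Manager is currently tracking
$rSessions = @(Get-RemoteSession)
$outString = "{0:HH:mm},{1}" -f (Get-Date),$rSessions.Count

# Connect to vCenter, discarding the connection object it returns
Connect-VIServer $vCenterServer | Out-Null
foreach ($vsh in $vSphereHosts)
{
  foreach ($stat in $statsToGet)
  {
    # Keep only the first sample returned, which the script treats
    # as the most recent 5 minute value
    [int]$statVal = Get-Stat -Entity "$vsh.$dnsDomain" `
      -Start $today.ToShortDateString() -IntervalMins 5 -Stat $stat |
      Select-Object -ExpandProperty Value -First 1
    $outString += ",$statVal"
  }
}

# Append the completed line to today's log
Out-File -FilePath $logPath\$logName `
  -InputObject $outString -Append -Encoding ASCII
# Close the vSphere PowerCLI server connection
Disconnect-VIServer -Force -Confirm:$false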

This code is very similar to setting up the header for the log.  We grab all the remote sessions our View Manager is tracking into an array, then use the count (length) of the array for the information we log.  Next we connect to our vCenter server and, for each vSphere Host, gather the last recorded 5 minute interval value of each chosen statistic.  There are only so many interval values available by default, and you can find out what you have available by running the Get-StatInterval cmdlet.  Next we write out the completed string to the logfile.  Lastly we disconnect the VIServer session.
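If you want to see which intervals and statistics your own environment offers before picking what to log, something along these lines should do it from the console.  It is a sketch that assumes the VMware.VimAutomation.Core snap-in is already loaded and reuses the example server and host names from this post; swap in your own.

```powershell
# Sketch: list the sampling intervals and stat types that are available.
# Assumes the VMware.VimAutomation.Core snap-in is loaded; the names below
# are the example values used earlier in this post.
$vCenterServer = "vCenter"
$hostName      = "ESXi03.mydomain.net"

Connect-VIServer $vCenterServer | Out-Null

# Historical sampling intervals configured on the vCenter server
Get-StatInterval

# Stat types that can be requested for this host with Get-Stat
Get-StatType -Entity (Get-VMHost $hostName)

Disconnect-VIServer -Server $vCenterServer -Force -Confirm:$false
```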

The last bit in this formula is setting up the schedule.  You’ll get the most detailed information by setting up a scheduled task that runs on 5 minute intervals that match the intervals that you see when running Get-StatInterval and Get-Stat from the console.  That said, I only run mine every 10 minutes; we’re really just looking for an overview.  One thing to make sure of is that the task is set up to run as a user that has the correct permissions to connect to and gather statistics from your vCenter server, otherwise the whole script will just hang!
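For what it’s worth, the task can be registered from the command line as well as from the Task Scheduler GUI.  The sketch below uses schtasks.exe; the task name, script path, and service account are all hypothetical placeholders, so substitute whatever fits your environment.

```powershell
# Sketch: register the logging script to run every 10 minutes.
# Hypothetical values: task name, script path, and service account.
$taskName   = "ViewStatLog"
$scriptPath = "C:\Scripts\Log-ViewStats.ps1"
$runAs      = "MYDOMAIN\svc-viewstats"

# /SC MINUTE /MO 10 = repeat every 10 minutes
# /RU sets the account the task runs as; /RP * prompts for its password
schtasks.exe /Create /TN $taskName /SC MINUTE /MO 10 `
  /TR "powershell.exe -NoProfile -File $scriptPath" `
  /RU $runAs /RP *
```

Whatever account goes in /RU is the one that needs rights on the vCenter server.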

So, great, we can log, but how do we make a nice pretty picture out of it?  That’s a question with lots of different answers.  There are lots of ways to visualize statistical data, you probably have your favorite, and the details will depend on your hosts and the statistics you want to track, so I’ll just share a quick pic of what we use to keep an eye on things using PowerGadgets.

[Image: PowerGadgets view of the logged VDI statistics]
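If you don’t have PowerGadgets handy, the CSV can also be summarized straight from the console.  Here’s a minimal sketch; the rows are made-up sample data in the same layout the script writes, trimmed to two stat columns for brevity.

```powershell
# Sketch: summarize one day's log. The rows below are made-up sample data.
$sample = @'
Time,SessionCount,ESXi03.cpu.usage.average,ESXi03.mem.usage.average
08:00,12,35,61
08:10,15,42,64
08:20,9,28,60
'@

# For a real logfile, Import-Csv with the file path would replace this line
$rows = $sample | ConvertFrom-Csv

# Column names contain dots, so the name is kept in a variable
$cpuColumn = "ESXi03.cpu.usage.average"

$sessions = $rows | ForEach-Object { [int]$_.SessionCount } |
  Measure-Object -Maximum
$cpu = $rows | ForEach-Object { [int]$_.$cpuColumn } |
  Measure-Object -Average

"Peak sessions: {0}" -f $sessions.Maximum
"Average CPU %: {0}" -f $cpu.Average
```

For the sample rows that reports a peak of 15 sessions and an average CPU of 35%.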

Hope that helps or inspires somebody out there.  Happy Friday!