Scalability in the hardware and software space is all about parallel computing nowadays. Consider our modern hardware: it used to be that all we really cared about was how fast our CPU could run (“how many GHz?”) Now, we care more about how many cores our CPU has, whether or not those cores support Hyper-threading, how many memory channels our CPU has available to it, etc. Scale-out beats scale-up.
The same is largely true in the software space. Most IT folks learned some time ago that “multithreading” and “higher performance” tended to go hand-in-hand or were at least associated in some way. Multiple threads of execution meant better scheduling of limited processor resources and fewer chances that one long-running operation would bottleneck an entire application.
Configuring SharePoint 2010 Farm Backup and Restore
When I first saw the following section in the “Configure Backup Settings” section of SharePoint 2010’s Central Administration site, it brought a big grin to my face:
In SharePoint 2007 and earlier, administrators had no real levers to pull to try and tune the performance of farm backup and restore operations. This obviously changed with SharePoint 2010. We were basically being handed a way to adjust those processes as we saw fit – for better or worse.
Strangely enough, though, I never really took the time to explore the impact of those settings in my SharePoint environments. I always left the number of assigned threads for backup and restore operations at three. I would have liked to mess around with the values, but something else was always more important in the grand scheme of things.
Why Now?
I’ve been working on a new “backup tips and tricks” whitepaper, and I found myself looking for backup and restore concerns within the SharePoint platform that I may not have given much attention to in the past. It didn’t take much wading through Central Administration before I once again found myself looking at thread counts for backup and restore operations.
Doing a little bit of Internet (background) research confirmed what I had suspected: no one else had really spent any time on the topic either. In fact, the only “fresh” and non-copyright-infringing material I found came from a Microsoft TechNet post titled Backup and recovery best practices (SharePoint Server 2010) … and to tell you the truth, the following paragraph from the section titled “Configure SharePoint settings for better backup or restore performance” really bugged me:
If you are using the Backup-SPFarm cmdlet, you can use the BackupThreads parameter to specify how many threads SharePoint Server 2010 will use during the backup process. The more threads you specify, the more resources that backup operation will take, but the faster that it will finish, if sufficient resources are available. However, each thread is reported individually in the log files, so using fewer threads makes interpreting the log files easier. By default, three threads are used. The maximum number of threads available is 10.
Without an understanding of how multithreading (in general) and SharePoint backup (specifically) work, this could easily be interpreted as follows:
The greater the number of threads you assign, the faster your backups will complete.
I realize that my summary is an oversimplification, but I believe that many administrators see the TechNet paragraph as I summarized it. And that concerns me.
I’ve always told people that increasing the backup thread count could yield better performance, but any adjustments would need to be tested in the target farm where they are to be implemented. Realistically speaking, there are several participants and a lot of moving parts in any SharePoint farm backup. Besides the SharePoint server where the backup operation is being coordinated, there is the performance of one or more SQL Servers to consider. The capabilities and restrictions of the backup destination location (typically a UNC file share) also need to be factored-in since that destination is being written to by both the SharePoint Server and one or more SQL Servers.
Setting the number of backup threads to 10 on a SharePoint Server of infinite capability and resources doesn’t guarantee a fast backup, because the farm might have a slow SQL Server, a less-capable backup destination location, a slow or congested network, or a host of other complicating factors.
Oh Yeah? Prove It.
Of course, all of this is just a bunch of hand-waving without proof. So, the scientist in me (yeah, I actually used to be a chemist) decided to take over and devise a series of simple tests to see if there is any real weight to the arguments I’ve been making.
I began with the hypothesis that the easiest and most visible way to gauge the performance of a farm backup operation is to measure how long a backup takes to run; e.g., a farm backup that takes 10 minutes to run is faster than a backup that takes 20 minutes to run if farm content, hardware, configuration, and other factors remain constant. Since SharePoint 2010 provides the ability to specify anywhere from one to 10 backup threads, running a series of backups where the only variable is backup thread count should determine if greater or fewer backup threads yield better performance.
You might recall that I also mentioned that farm topology is a factor in the overall backup equation. As part of my experiment, I decided to run the tests on two different farms I have available to me. General descriptions for each farm:
- Single-Server Farm: my single server farm environment is a VM running on my laptop. The VM houses SharePoint, SQL Server, and the backup location being targeted. The laptop hardware is a Core-i7 quad-core processor, and the underlying storage for the VM is a solid-state drive (SSD). Hardware bottlenecks should be minimized, and network latency isn’t a factor since backup operations are conducted against a local drive within the VM.
- Multi-Server Farm: my multi-server environment is the “production” environment on my home network. It consists of a SharePoint Server VM running on a Hyper-V host that also hosts other VMs. The SQL Server instance backing the farm is a non-virtualized SQL Server housing all of the SharePoint databases as well as a few databases for other applications. The backup destination location is a virtualized file server with a pass-through drive array (eSATA with RAID-5). Overall hardware, in this case, is “okay” but obviously not dedicated purely to SharePoint. In addition, network latency and bandwidth (GbE) are also in-play as potential sources of impact.
These two environments have pretty different overall topologies, and it was my hope that I’d see some effect on the performance numbers as a result.
The Script
To run the tests reproducibly, I needed a PowerShell script. So, I put the following script together while I had a bit of free time one night. Feel free to pluck this out to use for testing in your SharePoint environment, as well.
<# .SYNOPSIS TestBackupThreads.ps1 .DESCRIPTION This script is used to conduct and time a series of backups using different thread counts. The output can then be used to make an educated decision on the number of backup threads to assign for use in farm-level backups. .NOTES Author: Sean McDonough Last Revision: 25-July-2012 .PARAMETER TestLocation A UNC path to a location that can be used to create test backup sets .EXAMPLE TestBackupThreads \\FileShare\TestLocation #> param ( [string]$TestLocation = "$(Read-Host 'UNC path to test backup location [e.g. \\FileShare\TestLocation]')" ) function TestThreads($backupLocation) { # Ensure that the SharePoint cmdlets are loaded before continuing $spCmdlets = Get-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction silentlycontinue if ($spCmdlets -eq $Null) { Add-PSSnapin Microsoft.SharePoint.PowerShell } # Setup some variables we'll need for execution. $threadTimes = @{} # Hash table to hold timing results $backupItems = Join-Path $backupLocation "spbr*" # Used to delete temp backup files # We need to execute a full farm backup for each thread count 1 through 10 Clear-Host Write-Host "`nBackup thread count testing process beginning." for ($threads = 1; $threads -lt 11; $threads++) { # Clean out any backup contents from the test location Remove-Item $backupItems -recurse # Grab the starting date/time (for later comparison), kick-off a farm backup, and then # grab the stop date/time. Write-Host "`nInitiating a backup with $threads thread(s) ..." $startPoint = Get-Date Backup-SPFarm -BackupMethod Full -Directory $backupLocation -BackupThreads $threads $stopPoint = Get-Date # Store and report results $keyName = "Backup with {0} thread(s)" -f $threads $elapsedSeconds = "{0:N0}" -f ($stopPoint - $startPoint).TotalSeconds $threadTimes[$keyName] = $elapsedSeconds Write-Host "Backup with $threads thread(s) complete" Write-Host ("- time to complete (in seconds): {0}" -f $elapsedSeconds) } # Do a final sweep of the test backup location to clean out backup items Remove-Item $backupItems -recurse # Dump the results sorted in order of quickest to longest Write-Host "`nBackup thread count testing process complete." $threadTimes.GetEnumerator() | Sort-Object Value # Abort script processing in the event an exception occurs. trap { Write-Warning "`n*** Script execution aborting. See below for problem encountered during execution. ***" $_.Message break } } # Launch script TestThreads $TestLocation
The script is fairly straightforward in what it does. You supply a TestLocation parameter to specify where farm backup test data should be written to, and the script will run a series of full farm backups using the supplied location as the backup destination. The script starts with a full backup using one backup thread; at the end of each full farm backup, the script notes how long the backup took (in seconds) and cleans-up the contents of the TestLocation folder. The number of backup threads is then incremented, and the next test is run. When the script has completed running all backup tests, it sorts the results from “quickest backup” (i.e., the backup thread count requiring the least amount of time) to the slowest backup.
Test Results
I ran a series of three tests for each of the aforementioned environments for a total of six total test runs. Although there’s still quite a bit of variability between individual results within a backup thread series, some trends did appear to emerge.
Single-Server Farm
With the single-server environment, increasing the number of backup threads did appear to have a directional impact on performance. A single backup thread proved to be the slowest option for the farm backup, and “greater than one” thread resulted in better performance.
If you look at the average values, though, there wasn’t a tremendous difference between the slowest thread count (410 seconds for one thread) and the fastest (388 seconds for 10 threads). We’re only talking about a 5% to 6% difference overall. To truly find the optimum number of backup threads in an environment like this would require more than three test runs to account for standard deviation and establish significance.
Oh, and for those that might be wondering: I’m sure I introduced some of my own variability into the results. Although I didn’t do anything processor or disk intensive during the test runs, I didn’t go out of my way to minimize the impact of services, background operations, etc. To repeat: more testing (with better controls) would be needed for truly conclusive results. The only thing I started to show with this particular set of tests is that multithreading seemed to improve backup performance.
Multi-Server Farm
Things got quite a bit more interesting (to me) when I switched over to multi-server farm testing.
In the multi-server environment, the average for using just one backup thread (1413 seconds) appeared to be significantly faster than the next best option (1747 seconds for seven backup threads) – in the neighborhood of 20% or so faster. Just like the single-server results, additional trials would be needed to completely validate the observations, but the results are less ambiguous (given the relatively greater precision of the samples) than with the single-server runs.
Do you find this surprising? Given my multi-server environment and what I know about it, I can’t really say that I was caught flat-footed by the results. Going into the tests, my hypothesis was that my backup destination location would likely be the “weak link” in my overall farm and backup topology. The SharePoint Server was doing well, the SQL Server was relatively robust … but all of that backup activity was hard on my (virtualized) file server. Multiple servers trying to write to the backup location were swamping it and the network, and adding additional backup threads to the mix didn’t end up helping or improving the overall backup process.
The Take-Away
At the end of the day, I recognize that these tests of mine didn’t prove anything conclusively. Frankly, conclusive proof wasn’t my goal. The intent of these experiments wasn’t to say “more threads are better” or “more threads are worse.”
The only point I’m making (I hope) by sharing these results is this: until you run some real tests of your own in your SharePoint environment, you really don’t know where your backup thread sweet spot is. You can try to guess it, but it’s just a guess. And guessing is really no better than simply leaving the backup thread count set to its default value of three.
References and Resources
- Wikipedia: Parallel Computing
- Wikipedia: Hyper-threading
- Wikipedia: Thread (computing) and Multithreading
- TechNet: Backup and recovery best practices (SharePoint Server 2010)