Error Applying SP1 to SQL 2008 R2

In my time working as a SQL DBA, I have been fortunate enough to have never encountered any type of error while installing or upgrading SQL servers from one version to the next. Last month, one of the sysadmins was applying SP1 to our new SQL 2008 R2 clusters and ran into failures. The attempt was made on the passive nodes. However, all four instances failed to upgrade to SP1, including the active node.

The first thing anyone does is look through the error log for anything with the word ‘error’ or ‘failed’ in it. That search turned up a ton of hits for both. Apparently a better method is to search for ‘at microsoft’, which matches the stack-trace lines and helps you narrow it down faster. The error message I got was:

2011-10-31 16:38:02 SQLEngine: : Checking Engine checkpoint ‘SQLCluster_ConfigureResourceType’
2011-10-31 16:38:02 SQLEngine: : The source DLL file exists = ‘True’.
2011-10-31 16:38:02 SQLEngine: : Type ‘SQL Server’ exists and DLL is installed …
2011-10-31 16:38:02 SQLEngine: : Type ‘SQL Server’ has 2 resources. Performing upgrade …
2011-10-31 16:38:02 SQLEngine: : The source DLL file is ‘D:\Program Files\Microsoft SQL Server\MSSQL10_50.AASQL\MSSQL\Binn\SQSRVRES.DLL’ and the target is ‘C:\Windows\system32\SQSRVRES.DLL’.
2011-10-31 16:38:02 SQLEngine: : File versions : ['D:\Program Files\Microsoft SQL Server\MSSQL10_50.AASQL\MSSQL\Binn\SQSRVRES.DLL':2009.2009.2500.0], ['C:\Windows\system32\SQSRVRES.DLL':2009.2009.1600.1].
2011-10-31 16:38:02 SQLEngine: : Upgrading resource type ‘SQL Server’ using resource for instance ‘AASQL’ as reference …
2011-10-31 16:38:23 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing ConfigRC and scenario ConfigRC.
2011-10-31 16:38:23 Slp: The RPC server is unavailable
2011-10-31 16:38:23 Slp: The configuration failure category of current exception is ConfigurationFailure
2011-10-31 16:38:23 Slp: Configuration action failed for feature SQL_Engine_Core_Inst during timing ConfigRC and scenario ConfigRC.
2011-10-31 16:38:23 Slp: System.ComponentModel.Win32Exception: The RPC server is unavailable
2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.Cluster.ClusterResource.UpgradeResourceDLL(String nodeName, String dllPathName)
2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SQLEngineClusterFeature.UpgradeResourceDLL(SQLServiceResource sqlResource)
2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SQLEngineClusterFeature.ConfigureSQLEngineResourceType()
2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineSetupPrivate.Patch_ConfigRC(EffectiveProperties properties)
2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SqlEngineSetupPrivate.Patch(ConfigActionTiming timing, Dictionary`2 actionData, PublicConfigurationBase spcb)
2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.ExecuteAction(String actionId)
2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.SqlConfigBase.SlpConfigAction.Execute(String actionId, TextWriter errorStream)
2011-10-31 16:38:23 Slp: Exception: System.ComponentModel.Win32Exception.
2011-10-31 16:38:23 Slp: Source: Microsoft.SqlServer.Configuration.Cluster.
2011-10-31 16:38:23 Slp: Message: The RPC server is unavailable.
2011-10-31 16:38:23 Slp: Watson Bucket 1
 Original Parameter Values

You can find this message in the ‘detail.txt’ file in C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\yyyymmdd_HHmmss\instanceName. Any detail.txt files written after the initial failure won’t have this message! Based on this error, I started digging around. I checked the SQL instance and it was running, but when I asked it for the product level, using SERVERPROPERTY('ProductLevel'), it showed SP1 instead of RTM.
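The ‘at microsoft’ search can be sketched in a few lines of Python. This is only a minimal illustration, not a tool used during the incident: the names find_stack_traces, FRAME and context are made up for the example, and the sample lines are copied from the log above (with straight quotes).

```python
import re
from typing import Iterable, List

# Stack-frame lines in the setup log look like "Slp:    at Microsoft.SqlServer...".
FRAME = re.compile(r"\bat Microsoft\.", re.IGNORECASE)

def find_stack_traces(lines: Iterable[str], context: int = 1) -> List[str]:
    """Return every 'at Microsoft.' frame plus `context` lines before each trace.

    The line just before the first frame is usually the exception message,
    which is why one line of context is kept by default.
    """
    lines = list(lines)
    keep = set()
    for i, line in enumerate(lines):
        if FRAME.search(line):
            # First frame of a trace: also keep the preceding context lines.
            if i == 0 or not FRAME.search(lines[i - 1]):
                keep.update(range(max(0, i - context), i))
            keep.add(i)
    return [lines[i].rstrip("\n") for i in sorted(keep)]

# Sample lines taken from the log shown earlier in this post.
sample = [
    "2011-10-31 16:38:02 SQLEngine: : The source DLL file exists = 'True'.",
    "2011-10-31 16:38:23 Slp: The configuration failure category of current exception is ConfigurationFailure",
    "2011-10-31 16:38:23 Slp: System.ComponentModel.Win32Exception: The RPC server is unavailable",
    "2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.Cluster.ClusterResource.UpgradeResourceDLL(String nodeName, String dllPathName)",
    "2011-10-31 16:38:23 Slp:    at Microsoft.SqlServer.Configuration.SqlEngine.SQLEngineClusterFeature.UpgradeResourceDLL(SQLServiceResource sqlResource)",
    "2011-10-31 16:38:23 Slp: Exception: System.ComponentModel.Win32Exception.",
]

hits = find_stack_traces(sample)
for hit in hits:
    print(hit)
```

To run it against a real log, pass the lines of the detail.txt file instead of the sample list. The file's encoding is not something this post covers, so opening it with errors="replace" is a reasonable precaution.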

How the heck did it fail the upgrade and yet make it to SP1? Having worked on and troubleshot C# code and errors in the past, it seemed to me that it had something to do with the RPC service timing out while trying to replace the .DLL file. So I could only guess that the SP1 patch updated everything but failed at replacing the .DLL file. The sysadmin and I wanted to manually replace the DLL file, but to be safe we contacted Microsoft’s CSS, since we have xx free calls a year under our agreement.
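The ‘File versions’ line in the log above shows which DLL setup was trying to replace. As a rough sketch (the function name parse_file_versions is made up for this example, and it assumes the source file is listed first and the target second, as in the log line before it), the two versions can be pulled out and compared like this:

```python
import re
from typing import List, Tuple

# Matches 'path':a.b.c.d pairs as they appear in the "File versions" log line.
PAIR = re.compile(r"'([^']+)':(\d+(?:\.\d+)*)")

def parse_file_versions(line: str) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (path, version) pairs, with each version as a tuple of ints."""
    return [
        (path, tuple(int(part) for part in version.split(".")))
        for path, version in PAIR.findall(line)
    ]

# The "File versions" line copied from the log in this post.
line = (
    r"2011-10-31 16:38:02 SQLEngine: : File versions : "
    r"['D:\Program Files\Microsoft SQL Server\MSSQL10_50.AASQL\MSSQL\Binn\SQSRVRES.DLL':2009.2009.2500.0], "
    r"['C:\Windows\system32\SQSRVRES.DLL':2009.2009.1600.1]."
)

# Assumption: source is listed first, target second.
(source_path, source_ver), (target_path, target_ver) = parse_file_versions(line)

# Tuples compare element by element, so this is a proper version comparison.
if target_ver < source_ver:
    print(f"{target_path} is older than {source_path}")
```

In this log the copy in system32 is at build 1600 while the instance's Binn folder already holds build 2500, which fits the guess that only the resource DLL swap was left undone.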

After some investigating on their part, they also suggested that we manually replace the DLL file. Once we did, the SP1 upgrade went through without any further glitches. I asked CSS why this happened and how we can avoid it in the future; they said they would look into it, but I haven’t heard anything back.

Update: 12-05-2011

Today I asked the sysadmin whether he had received any follow-up on this topic. According to MS’s response, the upgrade was attempting to connect to the clustered name and timed out, which explains the ‘The RPC server is unavailable’ error message. Because it couldn’t communicate with the clustered name, it couldn’t stop the clustered service on the box in order to replace the .DLL.