Sunday, September 17, 2017

A .NET update broke my Azure Backup

I am currently working at a client where we use Azure Backup to keep one of our critical servers backed up.  As this is production, we have been fairly diligent about keeping our servers up to date.  It turns out that a .NET patch broke the Azure Backup process.  This post describes what went wrong and how it was fixed.

During a routine check of my provisioned Azure services, I noticed that my critical server was not being backed up.  Here is a sample error report that I got.



Umm... what?  I went through all of the suggested troubleshooting steps.

1) My server is protected by NSGs, but we do not limit outbound communication, so that was fine.

2) The agent seemed to be communicating (the portal was getting the correct information).

3) I attempted rebooting (no luck) and reinstalling the guest agent (also no luck).

The only things I could tell were that (a) the backups stopped working the last time we patched, and (b) the job errors out on the "take snapshot" step.

I ended up contacting support, and the problem turned out to be that a .NET update had been installed and a couple of registry keys had been blown away.  These registry keys have to do with the type of encryption the .NET Framework uses for TLS.  For reference, here they are:



[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v2.0.50727]
"SchUseStrongCrypto"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319]
"SchUseStrongCrypto"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319]
"SchUseStrongCrypto"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v2.0.50727]
"SchUseStrongCrypto"=dword:00000001

Adding these back allowed me to get past the "take snapshot" step, and I started to have successful backups.  Hope that helps!
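
If you want to check whether a later patch has removed these values again, something along the following lines might help.  This is only a rough sketch, not anything support gave me: the key paths and the value name come from the listing above, while the function names (missing_keys, read_from_registry) are made up for illustration.  It uses Python's standard winreg module, so the actual registry check only runs on Windows.

```python
import sys

# Key paths (under HKEY_LOCAL_MACHINE) and value name from the listing above.
KEY_PATHS = [
    r"SOFTWARE\Microsoft\.NETFramework\v2.0.50727",
    r"SOFTWARE\Microsoft\.NETFramework\v4.0.30319",
    r"SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319",
    r"SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v2.0.50727",
]
VALUE_NAME = "SchUseStrongCrypto"


def missing_keys(read_value):
    """Return the key paths whose SchUseStrongCrypto value is not 1.

    read_value(path, name) must return the DWORD value, or None if the
    key or value does not exist. (Hypothetical helper signature.)
    """
    return [path for path in KEY_PATHS if read_value(path, VALUE_NAME) != 1]


def read_from_registry(path, name):
    """Read a value from HKLM using the 64-bit registry view (Windows only)."""
    import winreg  # imported here so the module still loads on other platforms

    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0, access) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except FileNotFoundError:
        # Raised when either the key or the value is missing.
        return None


if __name__ == "__main__" and sys.platform == "win32":
    for path in missing_keys(read_from_registry):
        print("Missing or not set to 1: HKLM\\" + path)
```

Running it on the server after patching prints any of the four keys that need to be put back; no output means all four are still set.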

Monday, September 11, 2017

Azure Backup and Resource Group Locks

One important feature from a security perspective is the concept of Resource Group Locks.  I generally recommend that locks be placed at the resource group level for all production resources.  While you can place locks at the resource level, it can aid in manageability if the locks are placed at the resource group level.

It turns out that locking at the resource group level affects the way Azure Backup functions with the VMs in that resource group, and, in some cases, can cause the backup to fail.

When you execute a backup, here is what shows up in the activity log for the target resource group (where the VM lives):


As you can see, there is a delete operation that occurs at the end of the backup process.  If you have a resource group lock enabled, you'll see the following error message:

From my experience, this does not prevent the backup from showing up in the vault, but does prevent the backup job from completing successfully.  According to Azure support, this situation could prevent the VM from backing up altogether.

The solution going forward is to not use resource locks at the group level, but rather to manage them via ARM templates at the resource level itself.
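
As a rough illustration of what a resource-level lock in an ARM template can look like, here is a small Python sketch that builds one and prints it as JSON.  Treat it as an assumption-laden example rather than the template I actually use: the VM name, lock name and notes text are placeholders, and the API version and the use of the "scope" property are my assumptions about the Microsoft.Authorization/locks resource type, so check them against the current ARM documentation.  It also does not tell you which resources can be locked without getting in the way of the backup's delete operation.

```python
import json


def vm_lock_resource(vm_name, lock_name="vm-delete-lock"):
    """Build an ARM template resource that locks a single VM.

    vm_name and lock_name are placeholders; the apiVersion and the
    "scope" form are assumptions to verify against the ARM docs.
    """
    return {
        "type": "Microsoft.Authorization/locks",
        "apiVersion": "2016-09-01",
        "name": lock_name,
        # Scope the lock to one resource instead of the whole resource group.
        "scope": f"Microsoft.Compute/virtualMachines/{vm_name}",
        "properties": {
            "level": "CanNotDelete",
            "notes": "Production VM, locked at the resource level.",
        },
    }


template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [vm_lock_resource("my-critical-vm")],
}

print(json.dumps(template, indent=2))
```

The idea is simply that the lock lives next to the resource it protects, so the resource group itself stays unlocked.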