Thursday, May 11, 2017

Azure Metadata Service Scheduled Events: An initial look

For a little while now, the Scheduled Events functionality in the Azure Metadata service has been in preview.  I recently had a chance to play with this on a test VM and the goal of this post is to walk through some basics.

From a setup perspective, configuring scheduled events for your VM is quite easy.  All it takes is a GET request from your VM, after setup, to a private, non-routable IP address.  As per the documentation, the service may take up to 2 minutes to initialize on that first call, so be aware of that.
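As a rough illustration, that first request might look something like the Python sketch below.  The address, the /metadata/scheduledevents path, the api-version value, and the Metadata header are assumptions based on the preview documentation, and the helper names are hypothetical, so verify all of them against the current documentation before relying on this.

```python
import json
import urllib.request

# Assumed endpoint and API version from the preview documentation; verify before use.
ENDPOINT = "http://169.254.169.254/metadata/scheduledevents?api-version=2017-03-01"


def build_request(url=ENDPOINT):
    """Build the GET request. The Metadata header is assumed to be required."""
    return urllib.request.Request(url, headers={"Metadata": "true"})


def get_scheduled_events(url=ENDPOINT, timeout=120):
    """Fetch the scheduled events document and return it as a dict.

    The generous timeout allows for the up-to-2-minute initialization
    that can happen on the first call.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling get_scheduled_events() only works from inside an Azure VM, since the address is not routable from anywhere else.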

There are three types of events that are handled by this server: Freeze, Restart, and Shutdown.  The idea behind the service is that you have a task running continually that polls this endpoint for events.  When it finds one, you handle whatever is required (say, gracefully shut down the service or fail over to a secondary) and then approve the event by sending a POST request back to the endpoint.  This is pretty neat for handling HA situations.
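The poll, handle, approve workflow described above could be sketched roughly as follows in Python.  This is a minimal sketch under stated assumptions, not a definitive implementation: the endpoint URL, the Events, EventId, and EventStatus field names, and the StartRequests body for the POST are assumed from the preview documentation, and fetch, approve, handle_pending, and watch are hypothetical names.

```python
import json
import time
import urllib.request

# Assumed endpoint and API version from the preview documentation; verify before use.
ENDPOINT = "http://169.254.169.254/metadata/scheduledevents?api-version=2017-03-01"


def fetch(url=ENDPOINT):
    """GET the current scheduled events document as a dict."""
    req = urllib.request.Request(url, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))


def approve(event_id, url=ENDPOINT):
    """POST back to the endpoint to approve one event (assumed body shape)."""
    body = json.dumps({"StartRequests": [{"EventId": event_id}]}).encode("utf-8")
    # Supplying data makes urllib send a POST instead of a GET.
    req = urllib.request.Request(url, data=body, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.status


def handle_pending(document, prepare, approve_fn=approve):
    """Run prepare(event) for each not-yet-started event, then approve it.

    Returns the list of approved event IDs. Skipping events whose
    EventStatus is not "Scheduled" is an assumption about the schema.
    """
    approved = []
    for event in document.get("Events", []):
        if event.get("EventStatus", "Scheduled") != "Scheduled":
            continue
        prepare(event)  # e.g. drain connections or fail over to a secondary
        approve_fn(event["EventId"])
        approved.append(event["EventId"])
    return approved


def watch(prepare, interval=5, fetch_fn=fetch, approve_fn=approve):
    """Poll forever, handling and approving whatever events show up."""
    while True:
        handle_pending(fetch_fn(), prepare, approve_fn)
        time.sleep(interval)
```

Passing the fetch and approve functions in as parameters keeps the handling logic testable away from a VM.  On a real machine you would call something like watch(my_prepare_function), where my_prepare_function does whatever graceful shutdown or failover your service needs.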

I found a couple of interesting things while playing around with this service.

- Deallocating a machine clears any pending events.  This makes sense based on my understanding of what deallocation actually does.  Further, when you hit the endpoint again after reallocating the VM, it seems to "re-register" with the service.

- Issuing a manual restart when a restart event is pending does not clear the restart event.  This makes sense and lends itself to the documented workflow, in which you ask the Azure fabric to run the pending event by "approving" it.

- I also noticed that there is a resources element in the JSON response, but the service seems to append an _ to the resource name. See the following image:


I think there are some interesting use cases for this service.  Some thoughts that I have:

- It would be nice to see this added as part of an ARM template, maybe as a VM extension
- It will be interesting to see how this integrates with toolsets like DSC.  I would imagine that you would have to disable them for some period to avoid their auto-correction-like capabilities.  The issue with this is that you don't get an event to tell you when the scheduled event is finished (or has just finished), so there is nothing to trigger re-enabling them.
- I would love to see something more global (i.e., reporting in the portal along with email alerting).

It is cool to see Azure moving to deliver more host health information, and this service is a good example of that.  I think there are still some workflow pieces to work out before this gets integrated into a production environment.