Monday, December 8, 2014

The news RSS Webjob

For this project I decided to build a webjob that would handle reading the RSS feeds and putting that data into a database that I could then mine.  Being relatively new to the SyndicationFeed class in .NET, and to webjobs, my code went through a few iterations.

After creating my solution and the website project, I added a webjob using the mechanism built into Visual Studio 2013: right-click the web project, go to Add, and select New Azure WebJob Project.

New Azure WebJob Project menu selection

After running through a small wizard, you get a new project with two main code files.  The first one holds the Program class with the entry point:

using Microsoft.Azure.WebJobs;
class Program {
  static void Main() {
    var host = new JobHost();
    // Invoke Functions.ManualTrigger once via reflection, passing value = 20.
    host.Call(typeof(Functions).GetMethod("ManualTrigger"), new { value = 20 });
  }
}

The other one is the Functions file, which contains the definition of ManualTrigger.  The JobHost is part of the Azure WebJobs SDK and provides the following services out of the box:
  1. Get Azure account strings from your app.config (the strings can also be passed in directly to the ctor).
  2. Reflect over your code to find C# methods with the SimpleBatch attributes, much like how WebAPI discovers controllers. (SimpleBatch was the SDK's earlier name.)
  3. Listen for new blobs that match the [BlobInput] pattern ([BlobTrigger] in SDK 1.0).
  4. When a blob is found, invoke the function.
  5. Automatically log the invocation so that you can view the results in a separate dashboard.
So basically, although you can create and run almost any program as a WebJob, creating one with the JobHost gives you some special helpers for interacting with other Azure services.  Pretty neat.  The last point above (the dashboard) is super handy for monitoring the webjobs and getting some quick debugging information.  I will get to that later.

One key thing to note is that I had to add two connection strings to my website config in order to get the dashboard to work correctly.
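
For reference, the WebJobs SDK looks for connection strings named AzureWebJobsDashboard and AzureWebJobsStorage.  A fragment along these lines is a reasonable starting point.  It assumes both entries point at the same storage account; the account name and key are placeholders, not real values.

```xml
<connectionStrings>
  <!-- Read by the SDK for dashboard logging -->
  <add name="AzureWebJobsDashboard"
       connectionString="DefaultEndpointsProtocol=https;AccountName=YOUR_ACCOUNT;AccountKey=YOUR_KEY" />
  <!-- Read by the SDK for blob/queue bindings and its own bookkeeping -->
  <add name="AzureWebJobsStorage"
       connectionString="DefaultEndpointsProtocol=https;AccountName=YOUR_ACCOUNT;AccountKey=YOUR_KEY" />
</connectionStrings>
```

The same two names can also be set as connection strings on the website itself in the Azure portal instead of in the config file.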

Another interesting thing is the way that the function is called.  It's a static method that is called via reflection, which makes it hard to do dependency injection (DI) or anything like that.  I guess the design pattern here is that a webjob should really do only one thing, and therefore, you shouldn't need to reuse a bunch of the code.

Okay, moving on.  Since I was just fooling around and not following TDD, this is the first version of the code that I wrote.  Ewwwww.  I didn't even want to show it.

I know the above is a picture.  Don't copy it, it sucks.  Firstly, I built a webjob that runs on a schedule.  This is different from a continuous webjob, or one that is called via an external trigger.  In order for the JobHost to find this method properly, you have to annotate it with the [NoAutomaticTrigger] attribute.  I figured that I would update my data every hour, so I set that to be the initial schedule.

There are several things wrong with the above code, but most importantly, it isn't testable.  I need this function to follow some logic:

  • Read feed and get the latest info
  • Read the database for all posts from that feed
  • If the feed contains new posts, add them, else, continue on

Although the logic above reads simply enough, I want the code that implements it to be testable.  Furthermore, I don't like how this Functions class knows about the database, etc.  After some refactoring, here is what the code looks like.

        public static void GetFeedData(TextWriter log, IDataService dataService, IRssService rssService)
        {
            var feeds = dataService.GetAllFeeds();
            foreach (var feed in feeds)
            {
                Console.WriteLine("Processing " + feed.Name);
                // Posts already stored for this feed.
                var posts = dataService.GetAllPostsFor(feed);
                // Posts currently published in the RSS feed.
                var newPosts = rssService.ReadRssFeed(feed);
                // Keep only the posts whose PostId is not in the database yet.
                var postsToAdd = newPosts.Where(x => !posts.Any(y => y.PostId.Equals(x.PostId))).ToList();
                postsToAdd.ForEach(x => dataService.InsertPost(Post.CreateFrom(x, feed)));
            }
        }

With the core components moved out into services, I can now test the code.  I used Moq in another test project (which I won't show here).  One interesting thing about webjobs is that any Console.WriteLine messages get piped to the webjob's output, where they are easy to view.
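
To give a feel for what such a test exercises, here is a minimal self-contained sketch that uses hand-rolled fakes rather than Moq.  Everything except the GetFeedData logic is an assumption made up for illustration: the Feed, RssItem and Post shapes, the interface signatures, and the two fake classes.  The real project's types may well differ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical minimal types; the real project's shapes may differ.
public class Feed { public string Name; }
public class RssItem { public string PostId; }
public class Post
{
    public string PostId;
    public string FeedName;
    public static Post CreateFrom(RssItem item, Feed feed)
    {
        return new Post { PostId = item.PostId, FeedName = feed.Name };
    }
}

public interface IDataService
{
    IEnumerable<Feed> GetAllFeeds();
    IEnumerable<Post> GetAllPostsFor(Feed feed);
    void InsertPost(Post post);
}

public interface IRssService
{
    IEnumerable<RssItem> ReadRssFeed(Feed feed);
}

// Hand-rolled fake standing in for a Moq mock of the data service.
public class FakeDataService : IDataService
{
    public readonly List<Post> Posts = new List<Post>();     // pre-existing rows
    public readonly List<Post> Inserted = new List<Post>();  // records InsertPost calls
    public IEnumerable<Feed> GetAllFeeds() { return new[] { new Feed { Name = "news" } }; }
    public IEnumerable<Post> GetAllPostsFor(Feed feed)
    {
        return Posts.Where(p => p.FeedName == feed.Name).ToList();
    }
    public void InsertPost(Post post) { Inserted.Add(post); }
}

// Fake RSS service that always reports two items, "a" and "b".
public class FakeRssService : IRssService
{
    public IEnumerable<RssItem> ReadRssFeed(Feed feed)
    {
        return new[] { new RssItem { PostId = "a" }, new RssItem { PostId = "b" } };
    }
}

public static class Functions
{
    // Same logic as the refactored function, minus the console output.
    public static void GetFeedData(TextWriter log, IDataService dataService, IRssService rssService)
    {
        foreach (var feed in dataService.GetAllFeeds())
        {
            var posts = dataService.GetAllPostsFor(feed).ToList();
            var postsToAdd = rssService.ReadRssFeed(feed)
                .Where(x => !posts.Any(y => y.PostId.Equals(x.PostId)))
                .ToList();
            postsToAdd.ForEach(x => dataService.InsertPost(Post.CreateFrom(x, feed)));
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var data = new FakeDataService();
        data.Posts.Add(new Post { PostId = "a", FeedName = "news" }); // "a" is already stored
        Functions.GetFeedData(TextWriter.Null, data, new FakeRssService());
        // Only the unseen post should have been inserted.
        Console.WriteLine(string.Join(",", data.Inserted.Select(p => p.PostId))); // prints "b"
    }
}
```

Because the function only talks to the two interfaces, the same check works with Moq by swapping the fakes for mocks and verifying the InsertPost calls.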

Pretty cool.  The last thing I needed to do was add some error handling and event logging.  It turns out that websites and webjobs use the built-in Trace mechanism and log to a log source of your choice automatically.  You can find out more about what those options are at this link
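
As a rough illustration of that kind of error handling, the sketch below wraps a unit of work in a try/catch and reports the outcome through System.Diagnostics.Trace.  The RunWithTracing helper and the feed names are hypothetical, and the console listener is only there so the sketch prints something when run locally; on Azure the configured log destination would receive the messages instead.

```csharp
using System;
using System.Diagnostics;

public static class TraceDemo
{
    // Hypothetical helper: runs one unit of work and traces the outcome.
    // Returns true on success, false if the work threw.
    public static bool RunWithTracing(string feedName, Action work)
    {
        Trace.TraceInformation("Processing feed {0}", feedName);
        try
        {
            work();
            return true;
        }
        catch (Exception ex)
        {
            // Record the failure and carry on so one bad feed does not stop the rest.
            Trace.TraceError("Feed {0} failed: {1}", feedName, ex.Message);
            return false;
        }
    }

    public static void Main()
    {
        // Local-only listener so the trace messages show up on the console.
        // Trace calls are compiled in only when the TRACE symbol is defined,
        // which is the default for Visual Studio project templates.
        Trace.Listeners.Add(new ConsoleTraceListener());

        RunWithTracing("news", () => { });
        RunWithTracing("broken", () => { throw new InvalidOperationException("bad XML"); });
    }
}
```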