0

AlwaysOn Availability Group: Lesson Learned

Not long ago I was adding a few databases to an AlwaysOn Availability Group (AG) in our development environment. I was using the GUI and I had added, removed and added back these particular databases so many times successfully I was not giving the process my full attention. So, when I noticed an error just as I was clicking the “Close” button it was too late. I was curious about the error so I refreshed the availability databases in object explorer. The databases were all there in the availability databases folder. I then decided I’ll check the databases folder in object explorer and sure enough all the databases I had added had “Synchronized” out to right. Well that’s odd but it looks like it set up the availability group so on to other tasks.

The next day when I got to the office I realized I never checked the secondary server to see what was going on with those databases that failed. After securing a cup of coffee, I logged into the secondary server and expanded the databases folder in object explorer. What I found there surprised me. All of the databases I had added yesterday were there but some were in a restoring state. I decided I would check the error log. Comparing the log entries for the failed database to the log entries of the successful databases I noticed that the log entries were similar except the failed attempts did not have the steps for adding them to the AG on the secondary server. I then checked the error log on the primary server but that log was less useful than the secondary server’s log.

I proceeded to drop all the databases that were in a restoring state on the secondary server, removed the corresponding databases from the AG on the primary server and attempt to add the failed databases again through the GUI. My hope was that I would recreate the issue and be able to determine what the error was that occurred yesterday. I was unsuccessful in recreating the issue because the process ran successfully this time. Weird! What was different between when I ran it yesterday and when I ran it today? Well, as it turns out I had an hourly transaction log backup job that started some time during my attempt to add the databases to the AG the day before and that log backup was messing up the log backup chain. Let me show you what I mean.

If you use the GUI but on the last step instead of clicking Finish you click Script and then click Cancel you will get a script similar to the one below.

I added the line:

after the scripted log backup to simulate your backup job running. If you execute that script with the added line you will receive the following error.

ErrorMSG

As you can see the mirror database was not able to be synchronized with the primary database because the log chain had been disrupted. However, you can see from the following screen shots that the primary server thinks everything has succeeded.

Node1_DBs

Node1_AG_DBs

The secondary server, however, paints a different picture.

NODE2_DBs

Now had I been paying better attention and not so quick to click close I would have seen an error from the GUI that could have assisted me in my search to understand what happened. That error would have looked something like this.

WizzardErrorMSG

That message is not as useful as the error what you get from the script, in my opinion, but it’s vastly more helpful than the error log was.

Now, you may be thinking that’s all well and good but what do I do now? Do I have to drop the databases and remove them from the availability group like you did? Those are excellent questions and the answer is no. You simply need to find the transaction log that was taken during the setup process and restore that transaction log to the mirror database. Once that log has been restored you can run the ALTER DATABASE command to add that database to the AG.

Your database is now in the AG and in a synchronized state on both the primary and secondary server.

I learned two important lessons from this experience. First, just because a database is showing that it is in the AG and synchronized on the primary server doesn’t mean that the database is also synchronized on the secondary server, so check. Second, I’ve learned that scripting the process and then running the script could keep me from potentially missing some valuable information.

Related Post

Leave a Reply

Your email address will not be published. Required fields are marked *