AppAssure Pitfall: The Dreaded “Base Image”
The initial trouble ticket usually comes through as a problem with replication. Whether it’s an MSP alert or an onsite admin noticing that replication is out of date, a very large replication job (sometimes upwards of several terabytes) is gumming up the works. You log in to the AppAssure console, and immediately notice the lack of space on the repository. Uh-oh! We got a “base image.” When this happens to a smaller server, it can often go unnoticed. But when it happens on a server with a data volume of 1 TB or greater, watch out!
What is a base image?
A base image is a full VSS snapshot, compressed but not de-duplicated, of all the volumes on a given server. A base image is necessary as the initial “seed” of an agent server. From that initial base, all subsequent backups should be “incrementals,” meaning that they only reflect the changes on the server since the last successful backup. Along these lines, a good “recovery point chain” consists of a single base image followed by a string of incremental backups. But now, all of a sudden, you see a very large new base image in your chain where you’d expect to see an incremental backup.
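To make the idea of a healthy chain concrete, here is a minimal Python sketch that flags any base image appearing after the start of a chain. The `RecoveryPoint` record, its field names, and the "base"/"incremental" labels are hypothetical, invented for illustration; they are not an AppAssure API, and you would have to populate the list from whatever export or report you have available.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RecoveryPoint:
    """Hypothetical record for one recovery point (not an AppAssure type)."""
    taken_at: datetime
    kind: str        # assumed labels: "base" or "incremental"
    size_gb: float


def unexpected_base_images(chain):
    """Return every base image that appears after the first recovery point.

    A healthy chain is one base image followed only by incrementals, so any
    later base image is worth investigating.
    """
    ordered = sorted(chain, key=lambda rp: rp.taken_at)
    return [rp for rp in ordered[1:] if rp.kind == "base"]


if __name__ == "__main__":
    chain = [
        RecoveryPoint(datetime(2024, 1, 1), "base", 900.0),
        RecoveryPoint(datetime(2024, 1, 2), "incremental", 12.0),
        RecoveryPoint(datetime(2024, 1, 3), "base", 910.0),
    ]
    for rp in unexpected_base_images(chain):
        print(f"Unexpected base image on {rp.taken_at:%Y-%m-%d}: {rp.size_gb} GB")
```

Sorting by timestamp first means the check does not depend on the order in which the recovery points were collected.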
What can happen if I get a new base image of a very large volume?
In an AppAssure repository that is short on space, the result can be catastrophic. The new base image can fill the repository entirely, thereby stopping all backups and replications. In a repository with a moderate amount of space, it will likely only affect replications, but it will consume a large share of the available free space, leaving you just one misstep away from calamity. In a repository with lots of space, the only ill effect is likely to be on replications to the remote core.
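The three scenarios above can be sketched as a simple check of free space against the expected size of a new base image. This is only an illustration: the function name, the `comfort_factor` threshold, and the wording of the results are assumptions, since the text describes the tiers qualitatively and gives no numeric cutoffs. The base image size is an estimate you supply yourself.

```python
def base_image_impact(free_gb, base_image_gb, comfort_factor=2.0):
    """Roughly classify what a new base image would do to a repository.

    free_gb        -- free space currently left in the repository
    base_image_gb  -- estimated (compressed) size of the new base image
    comfort_factor -- assumed margin: free space below this multiple of the
                      base image size counts as "moderate" rather than "lots"
    """
    if base_image_gb >= free_gb:
        return "critical: repository fills; backups and replications stop"
    if base_image_gb * comfort_factor > free_gb:
        return "warning: replication delayed and little free space left"
    return "ok: expect only a large replication job to the remote core"


if __name__ == "__main__":
    for free in (500, 1000, 2000):
        print(free, "GB free ->", base_image_impact(free, 600))
```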
Why did I get a base image?
Unfortunately, many things can lead to a base image, and they are impossible to prevent or anticipate entirely. According to Dell, an unexpected base image may be taken for any of the following ten reasons:
- Dirty shutdown of the agent. If the protected machine shuts down uncleanly, AppAssure automatically takes a new base image to ensure that the recovery chain is not corrupted by any errors the shutdown may have caused.
- The driver log file (AALog_*.log) has been deleted. The file is located in the [drive letter]:\System Volume Information folder.
- The driver failover file (AAFailover.md) has been deleted. The file is located in the [drive letter]:\System Volume Information folder.
- WARNING: Dell strongly discourages users from manipulating driver files without the help of a trained engineer.
- Driver logging was disabled during troubleshooting. This can happen if the assuremc.exe utility was used in the environment as part of the troubleshooting steps.
- The AppAssure filter driver (AAFsFlt) was unloaded. With the filter driver unloaded, AppAssure cannot track changes on the volume and must start over with a new change log and base image.
- The number of changes in the log file is too large, and the system runs out of memory while building a map of the changed data. This can happen on large file servers with significant amounts of data.
- The source volume currently under protection has been extended or shrunk after performing a rollback.
- A new encryption key was set for the agent. After the key has been added, a new base image will be taken.
- The encryption key has been removed for the agent. After removing the key, a new base image will be taken.
- The encryption key for the agent has been changed.
What can I do if I get a base image?
As with many other things in life, an ounce of prevention is worth a pound of cure. In this case, prevention means having a repository large enough to withstand a new base image of your largest protected server. As a rule of thumb, it’s best to figure out how much space you need to back everything up, then double it. With the cost of storage these days, this is probably within the reach of most customers considering AppAssure.
But what if a base image event takes us by surprise and fills a repository? All you can do at that point is delete backups or add disks to your storage array (remembering to be consistent in that regard across all cores). You will also most likely need a large-capacity USB drive in order to resume the agent’s replication to the remote core. Because of bandwidth limitations, base images of more than a terabyte will rarely cross the WAN successfully. In this case, you can copy the agent’s data to the USB drive, ship the drive to the remote site, and consume it on the remote core.
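Both the sizing rule of thumb and the WAN limitation come down to simple arithmetic, sketched below in Python. The function names, the decimal units (1 GB = 8,000 megabits), and the `efficiency` factor for protocol overhead and competing traffic are assumptions made for illustration; they are not taken from AppAssure or Dell documentation.

```python
def recommended_repository_gb(required_backup_gb):
    """Rule of thumb from the text: work out what you need, then double it."""
    return 2 * required_backup_gb


def wan_transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Estimate the hours needed to push size_gb across a WAN link.

    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed usable fraction of the link (0 < efficiency <= 1)
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    size_megabits = size_gb * 8 * 1000  # decimal gigabytes to megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    print(f"Repository size: {recommended_repository_gb(4000)} GB")
    hours = wan_transfer_hours(1000, 10)
    print(f"1 TB over 10 Mbps: {hours:.0f} hours ({hours / 24:.1f} days)")
```

Under these assumptions, a 1 TB base image on a 10 Mbps link takes roughly 278 hours, or about 11.6 days of uninterrupted transfer, which is why shipping a USB drive is usually the more practical route.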