
Attention Symantas developers: you're all fired.

I direct your attention to Symantec (IGNORE THE DOMAIN NAME BEHIND THE CURTAIN! WE ARE THE GREAT AND POWERFUL SYMANTEC!) Software Alert, Document ID 281323: “With VERITAS NetBackup™ 4.5, 5.0, 5.1 and 6.0, when running a User Archive backup, an incorrectly handled routine results in a rare data loss situation.” (Oh yeah, ignore the capitalized word in the document title too. And all over the document. And when we accidentally answer the phone with it still.)

In summary, ignoring a lot of the details of where the data’s coming from, you can use NetBackup to put data on tape, or you can do that and delete the data that you put on tape from disk after doing so. Inclusion of the deletion step is called “archiving” (you haven’t made a copy of it somewhere else to go get in case you lose the live copy, you’ve essentially filed the thing away, and you’re okay with it being kind of a pain to go dig it out, because you’re pretty sure you won’t need to). This isn’t the most common case, by a long shot, but it definitely has its very real business purposes. The devil, as always, is in the implementation (some people will call this “details,” but those people are just stereotyping). If you tell NetBackup to Archive a directory (rather than a full file listing), on any operating system, with any available version of NetBackup through and including the most current, NetBackup chooses to do the “remove” step as a recursive removal of the directory at the operating system level, rather than by doing an explicit removal for each file under the directory as it performs the backup.

Okay, file system 101 pop quiz.

Q: Why’d they do that?

A: Because fork()/execve()ing one shell call to do all the work saves a lot of expensive processor context switches over doing that for each and every file as it’s written to tape. People don’t like slow computers, and writing to tape media is already way slower than anybody who doesn’t understand the reasons thinks it should be. Actually, I can’t think of a good answer for this one.

Q: Name two reasons that this is an indescribably careless optimization.

A: First reason is that no read-from-here, write-to-there operation (certainly not one to tape media) is atomic relative to the data being read unless you establish a locking mechanism and everybody bothers to obey its rules. So files in that directory can be created and modified many times while the tape write is going on, after it has already moved past them in the index. (Newly created files were never in the index at all, since it was built when the process began.) We’d kind of like to keep those new files and changes, since they won’t be in the data on the tape… but a recursive delete of the directory nukes them along with everything else. Second reason is left as an exercise for the reader. (Hint: think, “But maybe I wanted to keep the directory structure under that directory, but wanted all the actual files to go away.”)
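For anyone who wants to watch the first reason happen, here is a minimal Python sketch. It is emphatically not NetBackup's code: `archive_with_recursive_delete`, the dict standing in for the tape, and the `during_backup` hook (which plays the part of some other process writing mid-backup) are all made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def archive_with_recursive_delete(src: Path, tape: dict, during_backup=None) -> None:
    """Hypothetical archive: copy files to a stand-in 'tape', then rmtree the source."""
    # The index is built once, when the job starts.
    index = sorted(p for p in src.rglob("*") if p.is_file())
    for path in index:
        tape[path.relative_to(src).as_posix()] = path.read_bytes()
    if during_backup is not None:
        during_backup()  # stands in for another process writing while the job runs
    shutil.rmtree(src)  # removes everything, whether or not it was indexed


def demo() -> tuple:
    """Return (names on tape, whether the late file was lost)."""
    root = Path(tempfile.mkdtemp())
    src = root / "data"
    src.mkdir()
    (src / "old.txt").write_text("on tape")
    tape = {}

    def late_writer() -> None:
        # Created after the index was built, so it never reaches the tape.
        (src / "new.txt").write_text("created mid-backup")

    archive_with_recursive_delete(src, tape, late_writer)
    lost = "new.txt" not in tape and not (src / "new.txt").exists()
    shutil.rmtree(root, ignore_errors=True)
    return sorted(tape), lost
```

Calling `demo()` in this toy setup leaves `new.txt` neither on the "tape" nor on disk, which is the whole complaint.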

I tried to think through the conversation, whether aloud or internal monologue, that went past in the decision (and it had to be a decision; you don’t figure out how to say “delete recursively” in four or five different ways without considering what you’re doing for a minute or two), and I can’t come up with anything better than this:

“Do you think there’s a problem if we just do a recursive delete?

Nahhhhh…

I mean, hey, we’re backing the files up, so we’ve got a perfectly good list of exactly what we backed up, how large it was when we read it, and all its other metadata, including the last time it was read, the last time it was written to, and so forth, and it’d be trivial to use that and only delete exactly what we’ve backed up, but I can’t think of ANY REASON we’d need to bother doing THAT.”
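To show how little work the “only delete exactly what we’ve backed up” option is, here is a rough Python sketch of it. Every name in it (`backup`, `remove_backed_up`, the dict playing the tape) is hypothetical, and comparing size and modification time is my assumption about what “exactly what we backed up” should mean, not anything from the Symantec document.

```python
import shutil
import tempfile
from pathlib import Path


def backup(src: Path, tape: dict) -> dict:
    """Copy files to a stand-in 'tape'; record each file's size and mtime as read."""
    manifest = {}
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        st = path.stat()  # stat before reading, so a later change shows up as a mismatch
        tape[path.relative_to(src).as_posix()] = path.read_bytes()
        manifest[path] = (st.st_size, st.st_mtime_ns)
    return manifest


def remove_backed_up(manifest: dict) -> list:
    """Delete only files that still match the manifest; return the paths skipped."""
    skipped = []
    for path, recorded in manifest.items():
        try:
            st = path.stat()
        except FileNotFoundError:
            continue  # already gone, nothing to remove
        if (st.st_size, st.st_mtime_ns) == recorded:
            path.unlink()
        else:
            skipped.append(path)  # changed since backup: leave it alone
    return skipped


def demo() -> dict:
    """Back up a small tree, change it mid-flight, then remove only what matches."""
    root = Path(tempfile.mkdtemp())
    src = root / "data"
    (src / "sub").mkdir(parents=True)
    (src / "a.txt").write_text("unchanged")
    (src / "b.txt").write_text("v1")
    (src / "sub" / "c.txt").write_text("nested")
    tape = {}
    manifest = backup(src, tape)
    (src / "b.txt").write_text("v2, longer than before")  # modified after backup
    (src / "new.txt").write_text("created after backup")  # never in the manifest
    skipped = remove_backed_up(manifest)
    result = {
        "skipped": [p.name for p in skipped],
        "left": sorted(p.relative_to(src).as_posix() for p in src.rglob("*")),
    }
    shutil.rmtree(root, ignore_errors=True)
    return result
```

In this sketch the modified file, the new file, and the directory structure all survive. There is still a small window between the check and the `unlink`, so a real implementation would want locking or snapshots on top; the point is only that the list was already sitting there.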

Look, guys, you forget to do some bounds checking on your intro-to-OOP homework? Fine, whatever, punt it. But this isn’t the kind of mistake that should ever make it into a piece of software explicitly designed and marketed to protect data from loss.
