The thing about interesting games (and I mean things like backgammon, billiards, go, bridge, and dominoes here) is that the basic mode of play is simple but the strategery is complex. The game starts out interesting because it's easy to do roughly the right things, and it stays interesting because there's a reasonably sloped learning curve as you play it more. Pretend, for the purposes of this post, that by “rules” I mean all of that, the simple mechanics and the complex strategy alike, and that by “game” I mean “doing backups for a large bank”.
Here's the thing about the IT learning curve: if you're me, it stopped being steep some time in high school. I'm not trying to showboat here… I operate under the assumption that most people, given a fair chance at education, at some point find something they do better than most of the rest of the population, and this is mine. That doesn't necessarily coincide with “something you like to do,” though it mostly has for me. Recently, that's been a bit questionable.
There's some variation, but a week of work, recently, has consisted of configuring a very small piece of the NetBackup puzzle on ever more systems. That piece of the puzzle is the tape access for “SAN Media Servers.” Assuming you've got some understanding of the mechanics of a home personal computer, here's how that works. In the same way that hard drives are attached (IDE/ATAPI or SCSI, whatever), we traditionally attached a tape drive to one computer and then copied that computer's data to the tape. That's a very slow process, but tape media can be stored, without use, for long periods of time and still be relied upon to actually have that data when you want it later, modulo certain constraints. For businesses, having a tape drive attached to each individual computer quickly became prohibitively complicated just at the tape-swapping level, never mind the management level, so we started sending the data to be backed up across networks to a centralized server, which wrote that data to a tape drive attached to it. Eventually we had more data than would fit on one tape, so we started using auto-switchers, jukeboxes, whatever name you like, which had one tape drive and a stack of tapes they went through in sequence. That made restoring a given piece of data prohibitively difficult (scan through all the tapes?), so we built robotic devices which, with the magic of barcodes on the tape labels, could pull out the one you wanted, and could handle not just way more tapes but more than a few tape drives to boot. And that was fine, really, for about ten years.
Then we got to the point where we had more data on just one system than could practically be sent across a network “outside of work hours.” The solution built on something we already had: the Storage Area Network, a way to chop up a big stack of disks logically and let many different systems access subsets of it as logical volumes (a concept divorced in theory from physical disks, though physicality can't be ignored if you care about performance), all at the same time. So… why couldn't we do basically the same thing with tape drives? Put all the tape drives in one system and share them out to a bunch of machines, with some coordination to say who is using which drive and tape at what time. Carrying over pieces of the previous generation, we still have plenty of systems that each have small amounts of data (relative to the speed of modern networks), but we've got a lot of them… so now we can have maybe ten systems, instead of one, sharing the tape drives in a given piece of hardware and tracking what data is written where all in the same place, and this is helpful. But, more than that, we can handle individual systems that have astronomical quantities of data by letting them write their data straight to tape drives, with no network traffic involved (for the real data, anyway).
That's a pretty long paragraph, but I was summing up about fifteen years of technological development, so cut me some slack. In any case, I spend a lot of time configuring those systems that write their own large quantities of data (think “twenty terabytes,” a terabyte being 10{00,24} gigabytes, each of those being 10{00,24} megabytes) directly to tape, and we call those SAN (for Storage Area Network) media (because they touch “(tape) media” directly) servers. Your average business, say $FORMER_EMPLOYER, has a couple of general media servers (which take backup data across the network and write it to tape) and five, maybe six systems with enough data to qualify as SAN media servers. $CURRENT_EMPLOYER, in its pre-merger state (that is,
In doing this, I've found that we're doing just a bit more than most people who purchase the grossly overpriced NetBackup from Symantas. (That's a conscious conflation of “Veritas” and “Symantec,” which I can't even claim responsibility for, as much as I'd like to: it comes from someone we hired away from that merged entity because he wanted a “real job” rather than being roughly fourth-tier tech support. I'd only had two calls escalated to him before we hired him, so he wasn't there all that long.) I've found this by calling for support and being asked to describe the environment. Apparently, “Well, we've got about fifty media servers in this environment,” is functionally equivalent to, “Escalate this call now, bitch. Don't waste my time talking to your manager.” At $FORMER_EMPLOYER, I demonstrated to a (then-)Veritas employee that it was, in fact, possible to reconstruct data at the block level from their file system, even though they claimed I couldn't. At $CURRENT_EMPLOYER, I've openly described to tech support fundamental design flaws that prevent us from performing backups the way we actually need to For The Business, and gotten nothing but simpering in response.
That sort of cocksure attitude has me attending things like the Symantec Enterprise Data Protection Forum (there are, I'm afraid, no publicly accessible websites describing it), held at Veritas's site (hey, it's still the sign on the building, dammit) in the suburbs of Minneapolis, where their NetBackup developers and third/fourth-tier phone support work. In theory, there was to be one representative from each invited customer, and no more than fifty customer representatives total. We sent four people. The forum is something their developers only got to have starting last year (this is year two): customers show up and tell the developers, rather than tech support, about the problems they're having. I had to sign an NDA to attend, so there are large chunks of conversation I'm legally bound not to relate. I had very mixed feelings about two of my major problems, in that they're already addressed in NetBackup 6. Absent those, I still had two fundamental design issues. The first is the handling of “can't mount that tape” from an ACS-managed tape library: what NetBackup does is “freeze” about 500 tapes in 30 minutes; what it should do is give up on the tape library entirely, especially since the library is actively saying “I'm broken right now.” The second is dealing with multi-homed hosts in a sane way, rather than a “you are what reverse DNS on your IP address says you are” way. I have faith that one of those two issues will be fixed in the next five years. (The tape mount thing. The multi-homed server thing probably won't ever be fixed in NetBackup, barring a full rewrite of the bits the application holds fundamentally sacred. I should probably go into some detail on how much that second part sucks another time.)
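To make the tape-mount complaint concrete, here's a minimal sketch of the behavior I'd rather see, written in Python purely for illustration. This is not NetBackup's code or API: `MountPolicy`, the `Action` values, the three-tape threshold, and the ten-minute window are all hypothetical. The idea is to stop blaming individual tapes once the library itself reports a fault, or once several different tapes fail to mount in a short window, and to mark the whole library down instead.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    FREEZE_TAPE = "freeze tape"              # blame this one tape
    MARK_LIBRARY_DOWN = "mark library down"  # blame the library instead
    RETRY_LATER = "retry later"              # library already down; touch no tapes


@dataclass
class MountPolicy:
    """Hypothetical mount-failure policy; not NetBackup's actual logic."""
    max_distinct_failures: int = 3   # assumed threshold of distinct tapes
    window_seconds: float = 600.0    # assumed sliding window (10 minutes)
    library_down: bool = False
    _failures: list[tuple[float, str]] = field(default_factory=list)

    def on_mount_failure(self, tape_id: str, now: float,
                         library_reports_fault: bool) -> Action:
        # Once the library is considered down, stop freezing tapes.
        if self.library_down:
            return Action.RETRY_LATER
        # The library is telling us it is broken: believe it.
        if library_reports_fault:
            self.library_down = True
            return Action.MARK_LIBRARY_DOWN
        # Keep only failures inside the sliding window, then record this one.
        self._failures = [(t, tid) for t, tid in self._failures
                          if now - t <= self.window_seconds]
        self._failures.append((now, tape_id))
        # Several different tapes failing in a short window points at the
        # library, not at the tapes.
        if len({tid for _, tid in self._failures}) >= self.max_distinct_failures:
            self.library_down = True
            return Action.MARK_LIBRARY_DOWN
        return Action.FREEZE_TAPE


# Simulate 500 mount failures spread over 30 minutes, no fault reported.
policy = MountPolicy()
actions = [policy.on_mount_failure(f"T{i:05d}", now=i * 3.6,
                                   library_reports_fault=False)
           for i in range(500)]
print(sum(a is Action.FREEZE_TAPE for a in actions))  # 2
```

Fed the same 500 failures in 30 minutes, this freezes two tapes and then takes the library out of rotation, rather than freezing all 500. Counting distinct tapes rather than raw failures is the design choice here: one genuinely bad tape failing repeatedly shouldn't take the whole library offline.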
And this is all fun and games while I'm drinking six Tanq 10 martinis (dry, straight up, one olive) on the vendor's dime, in the bar of the hotel in the middle of their “campus” (where I'm staying, at an absurdly reduced rate, on my corporate credit card). But then I come back home and I'm back to adding another five SAN media servers to an environment that, before I even started, was forty media servers beyond the largest they'd ever tested (I asked, so now I know for certain)… and I can't decide whether to weep or laugh. The shiny toys they talked about in the first half day were going to be released the following Monday, and the shiny toys from the second half day were going to be released (maybe) by the end of this year. The shiny toys from the second day they hadn't yet fully conceived, which is where I got to say, “Look, here's what I actually want you to give me,” because they're early enough in the development process that structural requests like that might have some value… and those won't be released till, like, Q3 2007. Not that I won't still want those things then, but I'll have far more pressing needs by then.
That's maybe one tenth of what I have to say about this. I'll have to make a point of continuing it, I guess, if I actually care about saying it. The “game” here, you see, is configuring new systems in such a way that their backups work, whether they're backing up plain files on Windows or a DB2 database running on AIX, and there are silly complexities across the board… but they all bore me at this point. A shiny new problem where NetBackup shits a brick because of baseline functionality for managing tape drives on HP-UX? This bores me. I've so seen it fifty times before. This game, it's no fun any more.
You may note a significant lack of a certain aspect of my job, and if you know me, you've heard me say, “My job is not about backups. It's about politics.” Politics are not part of the game I described here. They're a whole other game unto themselves, and I'll treat them separately, because I've got a lot to say about that too, and I don't even know why I've been bottling it up for the weeks, if not months, that I have.