Oh, this is just lovely.
I really can't tell you how much fun I had with this one.
I've got one particular Linux system that has an Oracle database. It's to be backed up as a client rather than as a SAN Media Server, which is what all the other Oracle machines (on Sun E450s) are, since they've got access to the SAN (and an order of magnitude more active storage than the Linux system).
But it's been failing with the infamous error type 6.[1] The log messages given to me by Oracle's RMAN aren't much more helpful:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 05/17/2004 13:45:53
ORA-19506: failed to create sequential file, name="bk_19_1_526398352", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
Failed to process backup file <bk_19_1_526398352>
So, now we've got NBU blaming RMAN (and saying go check its logs), and RMAN blaming the “media manager layer” which is–wait for it–NBU! Sweet!
Well, okay, I don't really think it'll solve the problem, but let's make sure we're running the patched version (yeah, NBU 5 was brand new when it got here three weeks ago, but there was already a decent-sized patch cluster out which, unless I'm mistaken, actually caused the breakage this post is about). The way the manual says to do that is (paraphrased):
- Run bpplclients -allunique -noheader > filename, which produces a list of all of the clients for which policies exist. (That's the pl part; they used to be called “classes”, so there are a bunch of symlinks at, for instance, bpclclients so that people's scripts don't break, though I hear they're going away soon.) And, lo and behold, there's the one I want:
Linux Redhat2.4 grind
(Let's not discuss the fact that the names of those fields, if you don't specify -noheader, are “Hardware OS hostname”. Yeah, that system's on Linux hardware.)
- Edit filename to remove any clients you don't want. So I trim it all the way down to just that one line mentioned above.
- Run update_dbclients Oracle -ClientList filename. And here's where the crystal palaces come crashing down. I get:
cyclone:netbackup/bin# ./update_dbclients Oracle -ClientList /tmp/foo
The following clients with the OS type listed were skipped due to one
of the following reasons:
1. they are non-UNIX clients (which cannot be installed or upgraded from the server)
2. there is no specified database agent software available for that type of client
3. the matching database agent software was not loaded on the server
Client Name - OS Type
---------------------
grind - Redhat2.4
File /tmp/skipped_clients.9151 contains the complete list of skipped clients.
The file containing the list of skipped clients? Yeah, it's just a list. No reasons. It's only there because they decided they only wanted to display the first 5 or so skipped clients and dump the rest in a file. Cute, huh?
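For what it's worth, the "edit filename down to one client" step doesn't need an editor. Here's a minimal sketch, assuming the three-column "Hardware OS hostname" layout shown above; keep_client is a name I made up for illustration, not anything that ships with NBU:

```shell
#!/bin/sh
# keep_client HOST (hypothetical helper, not part of NetBackup)
# Reads "Hardware OS hostname" lines on stdin and prints only the
# lines whose third field is exactly HOST.
keep_client() {
    awk -v host="$1" '$3 == host'
}
```

Something like bpplclients -allunique -noheader | keep_client grind > /tmp/foo would then produce the one-line client list in a single step.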
So, what's the problem here?
Turns out update_dbclients (a Bourne shell script, thankfully, because at least then I can fucking fix their idiocy) is looking for (exact string match) “RedHat2.4”. Skim back. See what bpplclients is producing? Yeah: “Redhat2.4”. Maybe case doesn't matter to you crackheads over there at Veritas, but out here in the really real world of Unix systems administration we've noticed, over time, that it's kinda maybe important to at least do a case-insensitive match if you're going to go changing around the case of your output. You jackholes.
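To be clear about what a case-insensitive match looks like in plain Bourne shell, here's a rough sketch. This is not the actual code in update_dbclients (which I'm not reproducing here); same_os is a hypothetical helper that just lowercases both strings with tr before comparing them:

```shell
#!/bin/sh
# same_os A B (hypothetical helper, not taken from update_dbclients)
# Succeeds when the two OS names differ only in letter case.
same_os() {
    a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$a" = "$b" ]
}

# What bpplclients prints versus what the script expects:
if same_os "Redhat2.4" "RedHat2.4"; then
    echo "match"
else
    echo "no match"
fi
```

Run as written, that prints "match", which is all the exact string comparison needed to do.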
update_clients is similarly affected, but it doesn't do any decision making in the script; it merely relies on the /usr/openv/netbackup/clients directory hierarchy, so making a symlink from …/Linux/Redhat2.4 to …/Linux/RedHat2.4 is the way to make that one work.
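The symlink trick can be sketched as a small function. This assumes the directory that already exists is the RedHat2.4 one and that the link is the lowercase-h spelling bpplclients emits; link_os_alias is a made-up name, and the directory is passed in as an argument rather than hard-coded:

```shell
#!/bin/sh
# link_os_alias DIR EXISTING ALIAS (hypothetical helper)
# Inside DIR, create a relative symlink named ALIAS that points at the
# sibling directory EXISTING. Does nothing if ALIAS is already there.
link_os_alias() {
    dir=$1
    existing=$2
    alias_name=$3
    if [ ! -d "$dir/$existing" ]; then
        echo "no such directory: $dir/$existing" >&2
        return 1
    fi
    if [ -e "$dir/$alias_name" ] || [ -L "$dir/$alias_name" ]; then
        return 0
    fi
    ( cd "$dir" && ln -s "$existing" "$alias_name" )
}
```

You'd call it with the Linux directory of the clients hierarchy as DIR, RedHat2.4 as EXISTING, and Redhat2.4 as ALIAS. Using a relative link means the alias keeps working if the whole hierarchy gets moved.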
So, at least that's fixed. But the real problem I was having (backups don't work on grind)? Yeah, updating the software didn't change that. Hell-ooooo Veritas tech support!
[1] Here's what NBU's help features tell you about error type 6, including the recommended actions (the -r flag gets you that):
cyclone:netbackup/bin# bperror -S 6 -r | fmt -s
the backup failed to back up the requested files
Errors caused the user backup to fail.
Try the following:
1. Verify that you have read access to the files. Check the status or
progress log on the client for messages on why the backup failed.
Correct problems and retry the backup.
2. On Windows clients, verify that the account used to start the
NetBackup Client service has read access to the files.
3. On Macintosh clients, this code can be due to multiple backups
being attempted simultaneously on the same client. Some possible
solutions are:
* Adjust the backup schedules.
* If the client is only in one policy, set the policy attribute,
Limit jobs per policy, to 1.
* Set the NetBackup global attribute, Maximum jobs per client, to 1
(note that this limits all clients in all policies).
4. For a UNIX database extension client (for example, NetBackup for
Oracle), this can mean a problem with the script that is controlling
the backup.
Check the progress report on the client for a message such as Script
exited with status code = number (the number will vary). The
progress log also usually names the script.
Check the script for problems. Also, check the troubleshooting logs
created by the database extension. See the NetBackup guide that came
with the database extension for information on the scripts and
troubleshooting logs.
Yes, they really do have an “an error occurred” error type. And the recommendation is “you should fix that problem you're having”. Gee, thanks. Moronic monkeyshit like this will have me in ulcers before 30 (never mind the espresso).