This started out as a post about the USPS's new Electronic Postmark, but I realized I needed to explain how it related to my work life in addition to my personal (cryptography) interests. That meant I had to explain a bit more about what I do for a living than I have here before, which is enough content to warrant its own post. If you're only interested in discussing crypto ideas (ahem,
I work for a direct marketing company.
Yeah, I know, get over it.
The vast majority of our net profit comes from postal mail (things like glossies to folks about their mortgages and health insurance plans). Most of this gets printed on hallway-sized printers, and we employ quite a few people (at, I presume, minimum-wage-but-with-benefits) to do “fulfillment” (essentially stuffing envelopes; tractor-trailers' worth of envelopes weekly). A significantly lower, but next largest, percentage of our net profit (but a nearly equal proportion of our gross profit[1]) comes from “marketing databases”[2]. This latter chunk is what I'm primarily concerned with at work (though I'm involved with all of it, and the two things are necessarily interrelated; you decide who to mail based on information in the database, and you feed the results of the mail campaigns into the database for next time). There's a tiny segment of our customer base that wants to send email rather than postal mail. So we do a little bit of that[3], but nothing that isn't clearly marked as advertising (because you can't both have a reputable regular advertising business and be a spamhaus; you'll get completely blackballed… and the money's still in real advertising much more than it is in spam).
A lot of the software with relevance to direct marketing (things like Group1's software to standardize postal addresses and merge-purge lists of information about individuals and households; there are plenty of other examples) has traditionally run in a mainframe–particularly IBM OS/390^H^H^H^H^H^Hz/OS–environment, and there's really just no reason to move custom-built COBOL software, or the job of driving the printers, off the mainframe. Printers particularly. But the commercial software? It's way cheaper, both in licensing cost and in MIPS of the (Outrageously expensive! Millions-of-dollars-on-a-goddamn-lease expensive!) mainframe processor, to move it off the mainframe and onto “Open Systems” (what mainframe and EMC people call anything that's not a mainframe OS… and that includes Windows; yeah, “open”).
This means moving things from DB2 to Oracle, where even the (rather high) cost of an Oracle license is justified, and to (Microsoft) SQL Server, where it isn't (things like PostgreSQL and SAPdb are way too slow internally to have a prayer here, and the cost of SQL Server is insignificantly more than free relative to the cost of Oracle; MySQL's not even in the running for real DB work). It means getting the Linux versions of stuff from Group1, SAS, and Syncsort. It means teaching people who are used to writing JCL jobs to, instead, run Perl scripts. But most of all it means having functional ways to migrate the data not just from the mainframe, but back to it (remember, it's got to go back to the MF to get printed, since neither we nor the software available for Unix/Linux/Windows are in any way prepared to drive as many printers as well as the MF does).
Even after you get past the (relatively simple) EBCDIC to {ASCII,Unicode} troubles, this is a huge, complex problem. The mainframe has a TCP/IP stack sort of glommed onto the OS as an afterthought. You can FTP things back and forth, but there are some SITE commands you MUST issue[4], since z/OS doesn't really have a file system in the sense that Unix people are used to[5]. It certainly doesn't run SSH, and probably never will (unless you know of an SSH client that plans on supporting x3270 terminals natively). It's not just that it speaks a different language or drives on the other side of the road: its means of travel are totally foreign, though they function just fine and address the same basic problems. I'm probably losing half of my audience here, so maybe I'll quit, especially since Unix-MF interaction could fill not just a post but an LJ community of its own.
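For anyone wondering what the “relatively simple” part looks like, here is a minimal sketch in Python. It assumes code page 037 (US EBCDIC, which Python ships as the `cp037` codec) and fixed-length, space-padded text records; the record length of 20 and the sample names are made up for illustration, and a real extract may well use a different code page. It only handles text: anything binary inside a record would need separate handling.

```python
# Sketch only: assumes code page 037 and fixed-length text records.
# LRECL and the sample data below are hypothetical.

LRECL = 20  # hypothetical logical record length, in bytes


def ebcdic_to_lines(blob: bytes, lrecl: int = LRECL, codepage: str = "cp037") -> list[str]:
    """Split a fixed-length-record EBCDIC blob into decoded text lines."""
    if len(blob) % lrecl != 0:
        raise ValueError("blob length is not a multiple of the record length")
    records = (blob[i:i + lrecl] for i in range(0, len(blob), lrecl))
    # Records are space-padded to length rather than newline-terminated,
    # so strip the padding after decoding.
    return [rec.decode(codepage).rstrip(" ") for rec in records]


def lines_to_ebcdic(lines: list[str], lrecl: int = LRECL, codepage: str = "cp037") -> bytes:
    """Pad each line to the record length and encode it as EBCDIC."""
    out = bytearray()
    for line in lines:
        if len(line) > lrecl:
            raise ValueError(f"line longer than {lrecl} characters: {line!r}")
        # cp037 is a single-byte code page, so one character is one byte.
        out += line.ljust(lrecl).encode(codepage)
    return bytes(out)


if __name__ == "__main__":
    blob = lines_to_ebcdic(["JOHN Q PUBLIC", "123 MAIN ST"])
    print(ebcdic_to_lines(blob))
```

The round trip is the easy bit, which is the point of the paragraph above: the hard part is everything around it.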
The undisputed leader in this direct-marketing-and-marketing-database market (there are a lot of “market”s there; yes, my employer's business is essentially parasitic, as is all advertising) is Acxiom. I don't work for them, and my employer is certainly smaller than they, but if you were to compile a shortlist of companies that do what we and they do, we'd be on it. I'd rather avoid specific names since, although I avoid saying anything terribly disparaging about my employer here (not because I'm suppressing it: if I got to the point that I needed to vent like that, I'd just fucking quit), this journal probably isn't really in support of the corporate image. Here's a dead giveaway (that still doesn't make this googlable):
If nothing else, it's a job, as a Unix sysadmin, playing with shiny toys (FC-AL disk arrays, big Sun systems, physically smaller but still hefty Linux systems), some seriously important software (Veritas Volume Manager especially… incidentally, if Veritas chose to make/buy/endorse a POSIX operating system, they could absolutely crush Solaris/AIX in the database server market), and some very interesting problems (particularly integration of Open Systems with mainframe systems).
[1] The operating costs for doing marketing database work, statistical analysis, and “campaign continuance” are THIS BIG. There are the hard costs of the computer hardware and software to do the job and of the infrastructure (data backups, data center power), but they're pretty much dwarfed by the soft costs (incredibly skilled labor in software developers, statistical analysts, and MIS and IT people–like me; presentation to the client; maintenance of relationships with the client and with their own internal IT people). Compared with getting something to print from the client, sending it to a printer, and paying some immigrant (not an insult, it's just true) worker to shove it in an envelope and mail it, this is huge.
Worse yet, it's really easy to meter and charge for postal mail. (“We sent 100 million pieces of mail for you in October. At our rate of 5 cents a piece, that comes to $5 million.” No, we don't actually charge anywhere near that much, and wouldn't be able to; that's way above market value.) It's basically impossible to come up with a metric like that for database work. You can charge by how many rows (few people talk of data quantities in numbers of bytes around here, it's almost always in rows; yes, that's ridiculous and two different people's idea of a row could be drastically different, but it's the logical unit from the customer's point of view: one row is one customer of theirs) are in the database… but that doesn't even get you to a reasonable quantity for storage allotments. Knowing the byte count per row doesn't help much either, because you have no idea how much space supporting files will take up until you get started. And, in any case, none of that pays for development time. So we probably could charge more than we do… except that we haven't any functional way to justify that cost to the client (and nobody else does either, to my knowledge), so it's hard to get them to believe that it's reasonable on the quote.
[2] A marketing database is a tough nut to crack. Unlike most database environments, where you know by which field(s) you will want to search later, you can't possibly know in a marketing database because any aspect of the data you've collected about individuals/households to which a customer is marketing may be relevant. This obviously hamstrings traditional methods of speeding relational database access like indexed fields, and makes hardware- and OS-level optimization drastically important.
[3] If you ever get spam routed through 209.71.48/24, know that the described opt-out method really does work and, if it doesn't, please let me know and I'll see that it gets fixed, rather than reporting us to a blackhole list. (We've been reported to blackhole lists before. They routinely blackhole our whole corporate network, rather than the separate addresses we have for sending spam, so it doesn't block anything but legitimate email anyway, and the spam still gets through. And we always get back out within a couple of days. Trust me, going through me is better than playing the blackhole game.)
[4] Go check the RFCs; this is a totally acceptable thing to require, though I don't think I've ever seen it used seriously except on the mainframe.
[5] Even when you're writing to disk, called DASD (direct access storage device, to distinguish from tapes that you have to spin past a read head till you see the block you need, and then read linearly; like a Turing Machine) in this world, the interface is very much based around the interface to tapes that was the way of the world for data storage for twenty years before Unix was written.
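To make footnotes [4] and [5] a little more concrete, here is a rough Python sketch of sending a file back to the mainframe with the standard `ftplib` module. Because the target is a record-oriented dataset rather than a Unix-style file, the client describes the record layout with a SITE command before the transfer. The particular parameters (`RECFM`, `LRECL`, `BLKSIZE`), the values, and the function and dataset names are my assumptions about a typical fixed-block setup, not something the post spells out; check what your own site requires.

```python
from ftplib import FTP


def site_command(recfm: str, lrecl: int, blksize: int) -> str:
    """Build the SITE command describing the target dataset's record layout."""
    if recfm == "FB" and blksize % lrecl != 0:
        # Fixed-block datasets need a block size that holds whole records.
        raise ValueError("BLKSIZE must be a multiple of LRECL for RECFM=FB")
    return f"SITE RECFM={recfm} LRECL={lrecl} BLKSIZE={blksize}"


def upload_fixed_records(host: str, user: str, password: str,
                         local_path: str, dataset: str,
                         lrecl: int = 80, blksize: int = 27920) -> None:
    """Send an already-EBCDIC, fixed-length-record file to a dataset.

    Hypothetical helper: host, credentials, and dataset name are placeholders.
    """
    with FTP(host) as ftp:
        ftp.login(user, password)
        # Tell the server how to allocate the dataset before storing into it.
        ftp.voidcmd(site_command("FB", lrecl, blksize))
        with open(local_path, "rb") as fh:
            # Binary mode: the data was converted to EBCDIC before upload.
            # Single quotes mark a fully qualified dataset name.
            ftp.storbinary(f"STOR '{dataset}'", fh)


if __name__ == "__main__":
    print(site_command("FB", 80, 27920))
```

Keeping the command string in its own small function makes it easy to check the layout arithmetic without a mainframe on the other end of the wire.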