About 10am on Wednesday, an instant message window pops up on my workstation, from my buddy and coworker Jim. "Hey, you see all those Gentran errors?"
"Ah crap," I thought, "there goes my day."
I work as a software developer, but in a specialized role. My group writes software to support a production electronic commerce and energy trading system: fitting it to new regulatory requirements, adding feeds to new vendors, retrofitting logging and administration solutions, and (our current nightmare) making our code compatible with new vendor frameworks and migrating it en masse. (Vendor lock-in, in my opinion, is the biggest scam played out on American businesses, and 9 years of bellyaching about that in my job has netted me very few victories.)
In short, I write code, but for the purpose of keeping an enterprise integration system running. In my group, we rotate who is assigned for "production support", where you try to work on your coding tasks as you can, but when there are any hiccups in the system, you're suddenly in a system admin role instead, complete with (justifiably) panicked business users worrying about the status of their million dollar wire transfer, or avoiding regulatory fines for processing data late, etc.
Wednesday was my day, but it wasn't supposed to be. My aforementioned buddy, Jim, was up to the plate, except that his daughter was in the hospital, and naturally I kept support duties until someone else was available. Jim is like me: he wants to keep the system running, knowing the consequences to human lives if, say, the payroll file doesn't make it to the bank, or the system that routes linemen to downed power lines stops processing new outage notifications, or if we're slow in processing receipt of a late customer payment and the cutoff guy gets dispatched.
There are very human consequences if our system gets out of whack, and so we take our jobs seriously, often skirting controls and bending rules to keep the data moving, and trying our best to look contrite when auditors or bureaucrats come around. Or at least I do. As far as I know, everyone else follows the rules 100%; that's my story, and I'm sticking to it.
Anyway, so there were all these Gentran errors, and Jim, logging in from the hospital, showed me an error that was going to eat up the entire day:
    ::004,lftran , Compliance check error(s):
    Seg Pos.  Segment  Element  Data (1st 20 chars)   Error
    --------  -------  -------  --------------------  -----
    0         GS       04       20130102              Invalid date
    12        DTM      02       20130102              Invalid date
A little background
One of the software packages we use to translate and deliver data is called Gentran:Server for Unix, written by my old employer Sterling Commerce. It's my understanding that the entire Gentran:Server suite has been out of support at Sterling for some time now, and they are pushing a new product line with "Integration" or "Broker" in the name somewhere.
We use Gentran mainly to translate ANSI X.12 (EDI) data to fixed-length records that our mainframe can process by referencing COBOL copybooks. The "GS" segment is the "group envelope" used in all X.12 transactions, and its configuration should never be toyed with by anyone but the vendor. Ever since version 4010 of the X.12 standard, all dates (except the one in the "ISA" or "interchange envelope") were changed from 6 digits to 8 digits, to make them Y2K compliant. 20130102 is, of course, January 2nd 2013, and is a completely valid date for the GS segment.
Something big was wrong, and it would affect all the documents we received until we fixed it. And since the software was out of support by the vendor, it was left to me, with a steadily increasing headache (our floor is being remodeled, and I was breathing in all the dust that would leave me wracked with sinus pain for most of the night... but that's another story), to modify things in the vendor code that we're never supposed to touch.
There was no putting it off, and nothing for it but to wade in. I applied my standard problem-solving algorithm: assume everything you know is wrong, and build the problem from the ground up. Since everything from December 31, 2012 was working, and now nothing was, I suspected that Gentran didn't like 2013 for some reason. So I started out of the gate by doing an internet search for "gentran 2013 invalid date", and the top link was this:
004,lftran , Compliance check error due to Invalid date
From the page:
Cause: When using the D12 format 2013, for example, is being read as month 13, which is bogus
Resolution: Set DTM segments to DC4 format (date with century).
So I opened Gentran's implementation of the 4010 standard, and, sure enough:
...the GS date field was set to D12. So what does that mean?
It means we're taking a field with a fixed length of 8 digits, and applying a 6 digit format string to it:
    20130102
    YYMMDD??
Month 13, indeed. Apparently, and this is what really irks me, this date pattern matching has always been completely wrong, and completely useless. We never did anything with the dates other than grab the string wholesale and stick it somewhere else. It never mattered that Gentran thought all of last year was December 2020... or 1920, but once the date fell outside what it considered the valid range, document processing ceased. This has been wrong in the Gentran config for all 9 years I've worked here, and was never significant until Wednesday.
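To make the mismatch concrete, here is a rough Perl sketch of what happens when a six-character YYMMDD mask is laid over an eight-character date, next to a mask that includes the century. This is illustration only: Gentran's actual validation code isn't shown anywhere in this post, the helper names are made up, and the sketch simply assumes the validator reads two digits per component from the left.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a date string through a YYMMDD mask, left to right.
sub read_as_yymmdd {
    my ($s) = @_;
    my ($yy, $mm, $dd) = $s =~ /^(\d\d)(\d\d)(\d\d)/ or return;
    return { year => $yy, month => $mm, day => $dd };
}

# Hypothetical helper: read the same string through a CCYYMMDD mask.
sub read_as_ccyymmdd {
    my ($s) = @_;
    my ($yyyy, $mm, $dd) = $s =~ /^(\d{4})(\d\d)(\d\d)$/ or return;
    return { year => $yyyy, month => $mm, day => $dd };
}

# A month is plausible only if it falls in 1..12.
sub month_ok {
    my ($d) = @_;
    return $d->{month} >= 1 && $d->{month} <= 12;
}

for my $date ('20121231', '20130102') {
    my $short = read_as_yymmdd($date);
    my $long  = read_as_ccyymmdd($date);
    printf "%s  YYMMDD: yy=%s mm=%s (%s)  CCYYMMDD: yyyy=%s mm=%s (%s)\n",
        $date,
        $short->{year}, $short->{month}, month_ok($short) ? 'ok' : 'invalid',
        $long->{year},  $long->{month},  month_ok($long)  ? 'ok' : 'invalid';
}
```

Under the short mask, every 2012 date reads as year 20, month 12, which passes a month check by accident; the first 2013 date reads as month 13 and fails.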
The DC4 format that the IBM article referenced matches the 20130102 string correctly:
The same was true of 31 other fields in the invoice standard. I could use the GUI tool that these screenshots came from to edit each field as I found them, then recompile the map, but I didn't trust myself to find all of them, and I couldn't afford for this problem to go on any longer than it had to. So I decided to take a look at the map source code to see if there was some pattern matching I could do there. I found this:
0 0 0 1 4 8 8 0 0 0 1 Data Interchange Date M 0 DT S27 0029 D12
Now, there's no documentation (that I've seen, anyway) that says what the layout of a map file is, or what the consequences of poking around in it outside of the GUI tool will be, but I inferred a lot based on matching these lines with the GUI screens. The 4 must be the sequence number in the segment, the 0029 matches the reference number, the "Data Interchange Date" was the same as what was in the ELEMENT NAME field, and of course format D12 is right there in the text, waiting to be edited.
To be safe, I didn't want to globally replace all D12 instances with DC4, as there may be legitimate uses of YYMMDD, such as the ISA date. Instead, I wanted to only update D12 if it was in a field with a fixed length of 8, since that format/length pairing would always be a mismatch. I conjectured that the pair of 8s in the map file corresponded to the minimum and maximum field size. So I needed a script to match D12s with a pair of 8s 14 lines earlier. Enter perl.
    $ perl -pe 'push @ar,$_; shift @ar if $#ar>13;
    >   s/D12/DC4/ if $ar[0]==8 && $ar[1]==8' in.map > out.map
Yes, it's that simple. I've always loved perl for its expressiveness, and quick one-line commands like this are where it really shines. Here in roughly 100 characters, on the fly, and under time pressure, I updated everything in the map that needed it, and left legitimate D12 uses alone. To verify:
    $ diff in.map out.map
    728c728
    < D12
    ---
    > DC4
    1084c1084
    < D12
    ---
    > DC4
    1928c1928
    < D12
    ---
    > DC4
    ...etc.
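Spelled out as a standalone script, the look-behind idea in the one-liner above might look something like this sketch. It assumes the map source keeps one value per line; the `fix_formats` name, the `$lookback` distance, and the sample records are all made up for illustration and are not copied from a real Gentran map.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace $old with $new on a line only when the two lines starting
# $lookback positions earlier both hold the number 8 (assumed here to be
# the minimum and maximum field length).
sub fix_formats {
    my ($lines, $lookback, $old, $new) = @_;
    my @out;
    for my $i (0 .. $#$lines) {
        my $line = $lines->[$i];    # work on a copy; the input is untouched
        if ($i >= $lookback
            && $lines->[$i - $lookback]     =~ /^\s*8\s*$/
            && $lines->[$i - $lookback + 1] =~ /^\s*8\s*$/) {
            $line =~ s/\Q$old\E/$new/;
        }
        push @out, $line;
    }
    return \@out;
}

# Made-up sample: two three-line "fields", one of length 8 and one of length 6.
my @sample = ("8\n", "8\n", "D12\n", "6\n", "6\n", "D12\n");
print @{ fix_formats(\@sample, 2, 'D12', 'DC4') };
```

Only the D12 that follows the pair of 8s is rewritten; the one that follows the pair of 6s is left alone. Matching the length lines with a regex rather than `==` avoids numeric-comparison warnings on non-numeric lines when warnings are enabled.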
After that, I used the GUI to recompile the map, and ran some sample data through it to make sure it was functioning correctly on 2013 dates (and no new problems were introduced), and everything looked good. After getting the compiled map moved to our production server, new documents started processing successfully.
That left the documents that had already failed, and the job of matching the raw error data with the correct Gentran partner profile. There are two basic gotchas when Gentran throws a document out because of a translation failure. First, you can't drop the file back in the main staging directory, because it will fail as a duplicate, since duplicate checking happens prior to translation. You have to name the document after the partner profile it is associated with, and drop it directly into the translator's trigger directory. Second, the errored file gets named "trans_failed" with a unique identifier after it, so you have to manually reconcile the raw data with the partner.
And since all inbound documents were affected until the map was fixed, there was a hefty set of files to sift through: 38 of them. I could open each by hand, and search for the partner by EDI code using Gentran's GUI, or, for the sake of speed, I could whip up something a little more programmatic. Once again, enter perl.
Gentran's partner database is a binary file separated, fortunately, by linefeeds after every row. Each column of a row is separated by null characters, but the contents are always in the same order, and are plain text. The first column is the partner code, the third is the EDI code. I was able to extract both as a plain list with this one-liner:
perl -ne '@sp = /([\w\s]+)/g; print "$sp[0] $sp[2]\n" if /810R41/' tp.dat
That produced output similar to this:
    PARTNER1 EDICODE1
    PARTNER2 EDICODE2
    PARTNER3 ABCDEFGH
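For anyone unpicking that one-liner: a `/([\w\s]+)/g` match in list context returns every run of word or whitespace characters, so the null separators drop out and the columns land in an array. Here is a small sketch with a fabricated record. The real tp.dat layout is known only from the description above, and both the middle column and the helper name are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull the first and third text columns out of a null-separated record.
sub partner_and_edi_code {
    my ($record) = @_;
    # Each run of word/space characters between NUL bytes becomes one element.
    my @cols = $record =~ /([\w\s]+)/g;
    return ($cols[0], $cols[2]);
}

# Fabricated record: partner code, an invented middle column, EDI code.
my $record = join("\0", 'PARTNER1', 'Some Name', 'EDICODE1') . "\0\n";
my ($partner, $edi) = partner_and_edi_code($record);
print "$partner $edi\n";
```

One caveat with this approach: an empty column, or a column containing punctuation, would shift the array indexes, so it only holds up while the columns of interest are plain alphanumerics.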
I took a little more care matching each file to the right partner, feeding my partner list to the script as inline input through a virtual filehandle. Each file, fortunately, had the EDI code in the same fixed position, 35, padded with spaces out to a max length of 15. So I needed to grab 15 characters at position 35, strip the trailing spaces, compare the result to the partner list, and print partner codes matched to filenames. Here is the final product:
    #!/usr/bin/perl -w
    use strict;

    # Build an EDI-code => partner-code lookup from the inline list below.
    my %partners = ();
    for (<DATA>) {
        my ($code, $id) = split;
        $partners{$id} = $code;
    }

    # For each failed file, read the EDI code from the first line.
    for (@ARGV) {
        open FH, '<', $_ or die "Can't open $_: $!";
        my $line = <FH>;
        close FH;
        my $sid = substr($line, 35, 15);   # 15 characters starting at offset 35
        $sid =~ s/\s+$//;                  # strip the space padding
        print "$_ : $partners{$sid}\n";
    }
    __DATA__
    PARTNER1 EDICODE1
    PARTNER2 EDICODE2
    PARTNER3 ABCDEFGH
Still relatively compact, and easy to grok for even a novice perl coder.
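The fixed-position extraction at the heart of that script can be tried on its own. In the sketch below, the `fixed_field` helper and the sample line are made up; the only things taken from the script above are the offset of 35 and the width of 15. Note that Perl's `substr` offsets are zero-based, so offset 35 is the 36th character of the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract a space-padded, fixed-width field and trim the trailing padding.
sub fixed_field {
    my ($line, $offset, $width) = @_;
    my $field = substr($line, $offset, $width);
    $field =~ s/\s+$//;
    return $field;
}

# Made-up first line: 35 filler characters, then a 15-character code field.
my $line = ('x' x 35) . sprintf('%-15s', 'EDICODE1') . 'rest of line';
print fixed_field($line, 35, 15), "\n";
```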
So that was me manhandling Gentran with perl to make short work of something that would have taken much longer with the available vendor tools, and that pretty accurately describes the niche I fill at my work: the Hail Mary, fixing a critical problem in a hurry, deciphering by intuition where there is no documentation, taking somewhat ill-advised shortcuts, and not being afraid to be bold... because when there's a lot at stake, second guessing yourself is suicide.
The end result? With Jim's help communicating with the business users, creating official "change requests" to promote the new map to production, and manually renaming and dropping errored files back to the translator and verifying the results, we're back in business. With any luck, he'll get all the credit (especially for doing all that while his kid was sick), and my commandline insanity and being knee deep in vendor code long past its end-of-life will go unnoticed.