Tuesday, December 28, 2010

Check 21 Java app - 1 of 2

Extracting data and images from Image Cash Letter files

Background

There are many terms relating to Image Cash Letter files: Check 21, DSTU X9.37, X9.100-187, and check truncation, to name a few.

"Check 21" refers to an act passed in 2003. From Wikipedia's "Check 21 Act" article:

The Check Clearing for the 21st Century Act (or Check 21 Act) is a United States federal law, Pub.L. 108-100 ... The law allows the recipient of the original paper check to create a digital version of the original check—called a "substitute check," thereby eliminating the need for further handling of the physical document...

...Recently, Check 21 software providers have developed a "Virtual Check 21" system which allows online and offline merchants to create and submit demand draft documents to the bank of deposit. This process which combines remotely created checks (RCC) and Check 21 X9.37 files enables merchants to benefit from direct merchant-to-bank relationships, lower NSFs, and lower chargebacks.

The last paragraph bears some resemblance to reality, but the article's footnote refers to just one of many companies capitalizing on the Check 21 act, and the lofty goal of direct merchant-to-bank relationships is not met, precisely because a middleman's software is required to manage the relationship.

The text of the act itself is available at this page on gpo.gov, the opening paragraph being:

To facilitate check truncation by authorizing substitute checks, to foster innovation in the check collection system without mandating receipt of checks in electronic form, and to improve the overall efficiency of the Nation's payments system, and for other purposes.

My interpretation is that the act clearly states that no one is forcing banks to receive checks electronically, but new law is being written allowing banks to use electronic copies (or photocopies, or re-prints) of checks in place of the originals at their discretion. Later text in the act provides for a future study to see how banks are adopting it, whether their profits are affected, whether it's useful, etc.

When the act was passed, ANSI published DSTU X9.37 "for Electronic Exchange of Check and Image Data". "DSTU" stands for "Draft Standard for Trial Use". "X9" is the banking arm of ANSI (where "X12" is the EDI arm). So basically, in 2003 a trial standard was published for exchanging check images, with an evaluation of the standard to come later. That came in 2008, when ANSI published X9.100-180 for Canadian use and X9.100-187 for use in the States, with minor alterations from the original standard.

Background on the file format

An image cash letter is designed for mainframe use. It is record-based, with a 4-byte word at the beginning of each record indicating its length. It contains mixed text and image data: the text portions are in the EBCDIC character set, and the image portions are TIFF data. If you are a small business that happens to have a mainframe lying around, this is good news. For merchants living in the real world, not so much.

The file format poses a number of challenges for Windows, Mac, and Unix programmers. Unless you are versed in the mainframe/AS400 world, variable length records with length bytes and the EBCDIC character set are probably unfamiliar to you. Once you figure out EBCDIC translation, you then need to tackle converting the record-length header to a number, which, unless you are in the habit of programming at a "bits on the wire" level (checksums, for instance), you've probably never needed to do before.

Once you tackle record lengths and EBCDIC and start playing with the image data, you'll quickly realize a simple fact that you never noticed before: nothing supports TIFF images. Your web browser doesn't display them, and your programming language doesn't have support for them in its native image library. You either need to invest more research time into finding a TIFF library and learning its API, or just dump image data to individual files each time you come across it, leaving your end-user with a large collection of files to open by hand. This would lead to the following uncomfortable conversation:

User: "So I'll have a bunch of image files representing checks that I have to click on one at a time to view?"

You: "Right."

User: "And there's no way to correlate those back to the text data?"

You: "Right."

User: "Well, do I at least get the front and back of checks in the same image?"

You: "Um, well... no."

Because of these mild but unfamiliar technical hurdles, a third-party app that supports Check21 files looks more attractive to a CTO than watching the internal IT team bumble around trying to build one. In fact, an industry has popped up around these files, and the money companies hoped to save by moving away from ACH files and mailed checks is instead lost to vendor lock-in, all in the name of getting money to the bank a little faster. Not a good situation.

Impetus for this entry

In July of '09, I published a perl script that does just what I described above. It takes a Check 21 file and prints its ASCII-translated text to STDOUT, and saves the image data into individual TIFF files. It was intended only as a quick experiment with handling the Check21 file format, but its Google rank grew until it was on the front page for searches like "x9.37 parse". This is odd because the script is a quick hack (in fact the first comment on it complains about its lack of exception handling) that I threw together in about 30 minutes, so it's possible that there aren't many "open" apps available that handle this file type.

Recently someone asked me for help with a problem using the extractor. Between my script's unexpectedly high search rank and the fact that questions about my tech posts are pretty rare, one can reasonably assume that Check21 files are currently a big deal in "the wild" and that knowledge about them is pretty slim. Assuming that to be the case, this post is the first in a two-part Check21 Java series exploring the file format and the concerns with parsing, displaying, and creating Check21 files. This first post shows how to write an extractor similar to my perl script, but in Java.

Bits on the wire

Let's start by opening a file in a hex editor to see what it looks like. The file I'm using is available on this page on x937.com, a private site run by software developer Norman Graham.

To the untrained eye, this looks like so much garbage. To those more mainframe-savvy, the top section peppered with at-signs and letters with diacritics means you're probably looking at EBCDIC rendered with an ASCII viewer. To the image geeks, the II* at offset 210h indicates the beginning of TIFF data.

The first four bytes combine into a single big-endian number: 0x00000050, or decimal 80, meaning the next 80 bytes constitute a record. The next few bytes (F0, F1, F0, F3) are EBCDIC representations of numeric digits; EBCDIC places 0 through 9 at hex F0 through F9 (where ASCII has them at hex 30 through 39), so the first four characters of the record should be "0103".
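The record-length assumption can be checked with a few lines of Java. A minimal sketch, using the four header bytes described above (the bit-shifting is the same math the JRE uses, as we'll see in the next section):

```java
public class HeaderCheck {
  public static void main(String[] args) {
    // the 4-byte big-endian record-length word from the hex dump: 00 00 00 50
    byte[] lenWord = {0x00, 0x00, 0x00, 0x50};
    int recLen = ((lenWord[0] & 0xff) << 24) | ((lenWord[1] & 0xff) << 16)
               | ((lenWord[2] & 0xff) << 8)  |  (lenWord[3] & 0xff);
    System.out.println(recLen); // prints 80
  }
}
```

The `& 0xff` masks matter: Java bytes are signed, so a raw byte above 0x7F would otherwise sign-extend and corrupt the result.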

Tracing through the JRE

With those basic assumptions, let's throw some code at it and see what we get. The record length header can be turned into a number with very little effort. The Java class javax.imageio.stream.ImageInputStreamImpl provides the following method:

public int readInt() throws IOException {
    if (read(byteBuf, 0, 4) < 0) {
        throw new EOFException();
    }

    if (byteOrder == ByteOrder.BIG_ENDIAN) {
        return
            (((byteBuf[0] & 0xff) << 24) | ((byteBuf[1] & 0xff) << 16) |
             ((byteBuf[2] & 0xff) <<  8) | ((byteBuf[3] & 0xff) <<  0));
    } else {
        return
            (((byteBuf[3] & 0xff) << 24) | ((byteBuf[2] & 0xff) << 16) |
             ((byteBuf[1] & 0xff) <<  8) | ((byteBuf[0] & 0xff) <<  0));
    }
}

The variable "byteBuf" is a byte array. Four bytes are read into it and, using bitwise math, combined into a single 32-bit integer. This is a less processor-heavy version of looping through the byte array, multiplying an accumulator by 256, and adding each byte.
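The loop version it replaces would look something like this (a sketch for illustration; shifting left by 8 bits is the same as multiplying by 256):

```java
public class LoopVersion {
  public static void main(String[] args) {
    // the same record-length word as before: 00 00 00 50
    byte[] byteBuf = {0x00, 0x00, 0x00, 0x50};
    int value = 0;
    for (byte b : byteBuf) {
      // multiply the running total by 256, then add the next byte
      value = value * 256 + (b & 0xff);
    }
    System.out.println(value); // prints 80
  }
}
```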

So if I open the file as an ImageInputStream, I can use readInt() to grab record lengths and move forward in the file. Next, the EBCDIC to ASCII translation. The java.lang.String class has a constructor that accepts a string input declaring the character set:

public String(byte bytes[], String charsetName)

This is ultimately a convenience method for java.lang.StringCoding.decode(), which looks up a Charset and returns a new String by converting each byte using a translation table. For EBCDIC translation, specify "Cp1047" as the character set name, and Java will find the class sun.nio.cs.ext.IBM1047, which contains this table:

 private final static String byteToCharTable =
 
     "\u00D8\u0061\u0062\u0063\u0064\u0065\u0066\u0067" +     // 0x80 - 0x87
     "\u0068\u0069\u00AB\u00BB\u00F0\u00FD\u00FE\u00B1" +     // 0x88 - 0x8F
     "\u00B0\u006A\u006B\u006C\u006D\u006E\u006F\u0070" +     // 0x90 - 0x97
     "\u0071\u0072\u00AA\u00BA\u00E6\u00B8\u00C6\u00A4" +     // 0x98 - 0x9F
     "\u00B5\u007E\u0073\u0074\u0075\u0076\u0077\u0078" +     // 0xA0 - 0xA7
     "\u0079\u007A\u00A1\u00BF\u00D0\u005B\u00DE\u00AE" +     // 0xA8 - 0xAF
     "\u00AC\u00A3\u00A5\u00B7\u00A9\u00A7\u00B6\u00BC" +     // 0xB0 - 0xB7
     "\u00BD\u00BE\u00DD\u00A8\u00AF\u005D\u00B4\u00D7" +     // 0xB8 - 0xBF
     "\u007B\u0041\u0042\u0043\u0044\u0045\u0046\u0047" +     // 0xC0 - 0xC7
     "\u0048\u0049\u00AD\u00F4\u00F6\u00F2\u00F3\u00F5" +     // 0xC8 - 0xCF
     "\u007D\u004A\u004B\u004C\u004D\u004E\u004F\u0050" +     // 0xD0 - 0xD7
     "\u0051\u0052\u00B9\u00FB\u00FC\u00F9\u00FA\u00FF" +     // 0xD8 - 0xDF
     "\\\u00F7\u0053\u0054\u0055\u0056\u0057\u0058" +     // 0xE0 - 0xE7
     "\u0059\u005A\u00B2\u00D4\u00D6\u00D2\u00D3\u00D5" +     // 0xE8 - 0xEF
     "\u0030\u0031\u0032\u0033\u0034\u0035\u0036\u0037" +     // 0xF0 - 0xF7
     "\u0038\u0039\u00B3\u00DB\u00DC\u00D9\u00DA\u009F" +     // 0xF8 - 0xFF
     "\u0000\u0001\u0002\u0003\u009C\t\u0086\u007F" +     // 0x00 - 0x07
     "\u0097\u008D\u008E\u000B\f\r\u000E\u000F" +     // 0x08 - 0x0F
     "\u0010\u0011\u0012\u0013\u009D\n\b\u0087" +     // 0x10 - 0x17
     "\u0018\u0019\u0092\u008F\u001C\u001D\u001E\u001F" +     // 0x18 - 0x1F
     "\u0080\u0081\u0082\u0083\u0084\u0085\u0017\u001B" +     // 0x20 - 0x27
     "\u0088\u0089\u008A\u008B\u008C\u0005\u0006\u0007" +     // 0x28 - 0x2F
     "\u0090\u0091\u0016\u0093\u0094\u0095\u0096\u0004" +     // 0x30 - 0x37
     "\u0098\u0099\u009A\u009B\u0014\u0015\u009E\u001A" +     // 0x38 - 0x3F
     "\u0020\u00A0\u00E2\u00E4\u00E0\u00E1\u00E3\u00E5" +     // 0x40 - 0x47
     "\u00E7\u00F1\u00A2\u002E\u003C\u0028\u002B\u007C" +     // 0x48 - 0x4F
     "\u0026\u00E9\u00EA\u00EB\u00E8\u00ED\u00EE\u00EF" +     // 0x50 - 0x57
     "\u00EC\u00DF\u0021\u0024\u002A\u0029\u003B\u005E" +     // 0x58 - 0x5F
     "\u002D\u002F\u00C2\u00C4\u00C0\u00C1\u00C3\u00C5" +     // 0x60 - 0x67
     "\u00C7\u00D1\u00A6\u002C\u0025\u005F\u003E\u003F" +     // 0x68 - 0x6F
     "\u00F8\u00C9\u00CA\u00CB\u00C8\u00CD\u00CE\u00CF" +     // 0x70 - 0x77
     "\u00CC\u0060\u003A\u0023\u0040\'\u003D\"";     // 0x78 - 0x7F
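
With that table in hand, translating the header bytes from the hex dump is a one-liner. A minimal sketch (assuming the JRE ships the extended EBCDIC charsets, as standard Sun/OpenJDK distributions do):

```java
import java.nio.charset.Charset;

public class EbcdicDemo {
  public static void main(String[] args) {
    // the EBCDIC digit bytes F0 F1 F0 F3 from the start of the first record
    byte[] ebcdic = {(byte) 0xF0, (byte) 0xF1, (byte) 0xF0, (byte) 0xF3};
    String text = new String(ebcdic, Charset.forName("Cp1047"));
    System.out.println(text); // prints "0103"
  }
}
```

The String(byte[], String) constructor shown earlier does the same thing; using Charset.forName just avoids the checked UnsupportedEncodingException.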
 

Client code

Here is a quick test class that opens the file as an ImageInputStream, reads the first record's length, reads that many bytes, and translates them to ASCII.

package com.cea.check21;

import java.io.File;
import java.io.IOException;
import javax.imageio.stream.FileImageInputStream;

public class Test {
  public static void main(String[] args) throws IOException {
    FileImageInputStream is = new FileImageInputStream(new File("one.x9"));
    int recLen = is.readInt();
    System.out.println("Record length: " + recLen);
    byte[] rec = new byte[recLen];
    is.readFully(rec); // readFully, unlike read, guarantees all recLen bytes are read
    System.out.println(new String(rec, "Cp1047"));
    is.close();
  }
}

After running that, the following output is displayed:

Record length: 80
0103T113000609111012822200408052030NUS BANKO NORM     First Bank of NormAUS     

This looks good. 80 matches the record length we calculated from the hex dump, and the first four characters are "0103" as we expected. The rest of the line looks like numbers and semi-readable text, so we're definitely on the right track.

Now let's flesh out the class a little to iterate over each record and see what happens. The new class:

package com.cea.check21;

import java.io.File;
import java.io.IOException;
import javax.imageio.stream.FileImageInputStream;

public class Test {
  public static void main(String[] args) throws IOException {
    FileImageInputStream is = new FileImageInputStream(new File("one.x9"));
    int recLen;
    while ((recLen = is.readInt()) > 0) {
      System.out.println("Record length: " + recLen);
      byte[] rec = new byte[recLen];
      is.readFully(rec); // readFully, unlike read, guarantees all recLen bytes are read
      System.out.println(new String(rec, "Cp1047"));
    }
    is.close();
  }
}

The output:

...and finally:

Exception in thread "main" java.io.EOFException
 at javax.imageio.stream.ImageInputStreamImpl.readInt(ImageInputStreamImpl.java:237)
 at com.cea.check21.Test.main(Test.java:11)

OK, so a closer inspection of the API would have shown that readInt() throws an exception when it reads past the end of the file, but other than that, we're good up to the point where the image data is encountered. The image data occurs where the spec says it will: on the 52 record, at column 118.

So to finish up text conversion, we need to do two things. First, handle EOF more gracefully, and second, print only the first 117 characters of 52 records.

package com.cea.check21;

import java.io.File;
import java.io.IOException;
import javax.imageio.stream.FileImageInputStream;

public class Test {
  public static void main(String[] args) throws IOException {
    File file = new File("one.x9");
    FileImageInputStream is = new FileImageInputStream(file);
    
    while (is.getStreamPosition() < file.length()) {
      int recLen = is.readInt();
      byte[] rec = new byte[recLen];
      is.readFully(rec); // readFully, unlike read, guarantees all recLen bytes are read
      String recNum = new String(rec, 0, 2, "Cp1047");
      if (recNum.equals("52")) System.out.println(new String(rec, 0, 117, "Cp1047"));
      else System.out.println(new String(rec, "Cp1047"));
    }
    is.close();
  }
}

Which outputs:

0103T113000609111012822200408052030NUS BANKO NORM     First Bank of NormAUS     
100111300060911101282220060109200601092030FG16410001nfg MANAGEMENTxxxxxxxxxxC0  
200109100002211101282220070315200703150020300001000104053000196                 
25           1104 113000609    123456789012345/00000004320999987267040  G01Y000 
50111101282220060109000002206260000                     0                       
5211101282220060109040991400001                                                 
    0                0000000000018628
50111101282220060109000002206261000                     0                       
5211101282220060109040991400001                                                 
    0                0000000000023720
70000100000000043200000000043200001                                             
900000010000000100000000000432000000001 l NORMPOINTE BANK20060109               
9900000100000001000000010000000000000432NORMAN AGEMENT2145085900                

Success!

The last step for a simple extractor is to dump TIFF data into files. We'll increment a counter for each 52 record and name the files img(count).tiff. Additionally, I've added some handling for specifying file names on the command line and finding their working directories. Lastly, the class name has been changed to something more descriptive than "Test".

The final code:

package com.cea.check21;


import java.io.File;
import java.io.IOException;

import javax.imageio.stream.FileImageInputStream;
import javax.imageio.stream.FileImageOutputStream;

public class Extractor {
  public static void main(String[] args) throws IOException {
    
    if (args.length < 1) {
      System.out.println("Usage: java Extractor <checkImageFile>");
      System.exit(0);
    }
    
    File file = new File(args[0]);
    FileImageInputStream is = new FileImageInputStream(file);
    int tiffCount = 0;
    String workDir = file.getParent();
    if (workDir == null) workDir = ".";
    String sep = File.separator;
    
    while(is.getStreamPosition() < file.length()) {
      int recLen = is.readInt();
      byte[] rec = new byte[recLen];
      is.readFully(rec); // readFully, unlike read, guarantees all recLen bytes are read
      String recNum = new String(rec, 0, 2, "Cp1047");
      if (recNum.equals("52")) {
        System.out.println(new String(rec, 0, 117, "Cp1047"));
        tiffCount++;
        String numberPart = String.format("%04d", tiffCount); // zero-pad to 4 digits
        String fileName = workDir + sep + "img" + numberPart + ".tiff";
        FileImageOutputStream out = new FileImageOutputStream(new File(fileName));
        out.write(rec, 117, rec.length - 117);
        out.close();
      } else System.out.println(new String(rec, "Cp1047"));
    }
    is.close();
  }
}

The text output is the same, and two image files are created in the directory containing one.x9:

And they look like this:


Next steps

This Java class is useful for exploring the Check 21 file format, but it provides very little real-world utility. What's needed is an app that can display check information more intelligently, breaking the record information down into something meaningful and displaying the front and back of a check alongside that information.

To do that, we'll need to figure out how to turn TIFF data into a Java Image object, and to explore the X9.100-187 record layouts in a little more detail.
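Looking ahead, one plausible route for the TIFF half is javax.imageio, which hands back a BufferedImage if a capable reader is registered. This is a sketch, not the app's final approach: stock JREs today don't read TIFF, so it assumes a plugin such as JAI Image I/O is on the classpath (later JDKs, from Java 9 on, bundle a TIFF reader):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class TiffPreview {
  // turn the raw TIFF bytes from a type 52 record into a Java image;
  // returns null if no registered ImageIO reader understands the data
  static BufferedImage toImage(byte[] tiffBytes) throws IOException {
    return ImageIO.read(new ByteArrayInputStream(tiffBytes));
  }
}
```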

I'm giving myself an arbitrary deadline of January 15, 2011 to post my progress with that app. Tune in then!
