Saturday, November 20, 2010

Roman numerals redux

A simple JavaScript function to validate Roman numerals, convert them to Arabic numbers, and back again.

I checked Google Analytics this morning and noticed I've been getting a lot of hits on my October 9 entry on JavaScript Roman numeral conversion. In brief, I wrote a simple function to take a number and build its Roman numeral equivalent. While I was mulling it over this morning, I spontaneously thought of a method to validate and parse a Roman numeral and turn it back into Arabic numbers using regular expressions.

Input a Roman numeral or an integer between 1 and 3999:
Strict

Step 1: Validating

What contitutes a "valid" Roman numeral is ambiguous. For example, referring to 4 as IV instead of IIII has more of a spotty history than I imagined. The Wikipedia article on roman numerals has an interesting section on this, discussing the use of IIII on clock faces, and MDCCCC on 20th century buildings instead of MCM, and some instances of using VV to represent 10. I decided not to support VV, and to add a flag that sets whether to reject IIII, XXXX, and CCCC.

Assuming IIII, VIIII, IV, and IX are all valid, here are all the possible ones place Roman numerals:

I
II
III
IV or IIII
V
VI
VII
VIII
IX or VIIII

These can be divided into two classes: IX and IV, and everything else. If the numeral contains IX or IV, you're done with the ones place. If it doesn't contain either of those, the numeral can have a V or not followed by 0 to 4 I characters (0 to 3 if we're being strict). Expressing that concept in regular expression speak is surprisingly simple, and can be built from three basic rules. I'll describe what each rule means, for those not familiar with regex.

I[XV]  - This means 'I' followed by either 'X' or 'V'.
V?     - 0 or 1 V characters
I{0,4} - 0 to 4 I characters

Now we just need to put the optional V before the optional I's, and specify either-or to the choice types by using the pipe character, making the rule:

V?I{0,4}|I[XV]

Next, the same rule needs to be applied to the tens and hundreds places, and each rule wrapped in parentheses to keep the piped choices from interfering with each other:

(D?C{0,4}|C[MD])(L?X{0,4}|X[CL])(V?I{0,4}|I[XV])

Lastly, the thousands, and binding to the start and end of the string. The beginning of the string is marked with the carat symbol, the end with the dollar sign. There can be 0 to 3 thousands at the beginning of the string, followed by the regex above, followed by the end of the string. This makes the complete regex:

^M{0,3}(D?C{0,4}|C[MD])(L?X{0,4}|X[CL])(V?I{0,4}|I[XV])$

Step 2: Evaluating

Once we're sure the string is valid, we need to break it down into tokens and add up their values. Fortunately, every letter has a set value, and only the exceptions for subtraction notation (IX, IV, etc.) need to be worried about. Regular Expressions can help us out again, as the piped list of choices uses a first match rule. We just need a list of choices that finds the subtraction notation options before it finds the individual letters:

M|CM|D|CD|C|XC|L|XL|X|IX|V|IV|I

Note that IX and IV are found before just plain 'I'. Next, assign each list option a value, and write a function to iterate over all the matches and add up the values:

var word   = /(M|CM|D|CD|C|XC|L|XL|X|IX|V|IV|I)/g;
var value  = {I:1,IV:4,V:5,IX:9,X:10,XL:40,L:50,XC:90,C:100,CD:400,D:500,CM:900,M:1000};

function getInt(input) {
  var s = 0;
  var words = input.match(word);
  for (var i = 0; i < words.length; i++) {
    s += value[words[i]];
  }
  return s;
}

Putting it all together

Using those basic concepts, and borrowing from my previous code that built a Roman numeral based on an Arabic number, I wrote a smallish function that can take either an Arabic number or Roman numeral, and convert it to the other.

The source code:
var isInt  = /^\d+$/;
var isRom  = /^[IVXLCDM]+$/i;
var valid  = /^M{0,3}(D?C{0,4}|C[MD])(L?X{0,4}|X[CL])(V?I{0,4}|I[XV])$/;
var strict = /^M{0,3}(D?C{0,3}|C[MD])(L?X{0,3}|X[CL])(V?I{0,3}|I[XV])$/;
var word   = /(M|CM|D|CD|C|XC|L|XL|X|IX|V|IV|I)/g;
var value  = {I:1,IV:4,V:5,IX:9,X:10,XL:40,L:50,XC:90,C:100,CD:400,D:500,CM:900,M:1000};


function translate(input, isStrict) {
  if (input == null || input == "") {
    alert("Please enter a value.");
    return input;
  }

  var output;
  if (input.match(isRom)) output = getInt(input.toUpperCase(), isStrict);
  else if (input.match(isInt)) output = getRom(input);
  else alert(input + " is not a Roman numeral or integer value.");

  return output == null ? input : output;
}

function getInt(input, isStrict) {
  var matcher = isStrict ? strict : valid;
  if (input.match(matcher) == null) {
    alert(input + " is not a valid Roman numeral.");
    return;
  }

  var s = 0;
  var words = input.match(word);
  for (var i = 0; i < words.length; i++) {
    s += value[words[i]];
  }

  return s;
}

function getRom(input) {
  var num = parseInt(input, 10);

  if (num < 1 || num > 3999) {
    alert("Integer must be between 1 and 3999.");
    return;
  }

  var s = "";
  while (num >= 1000) { num -= 1000; s +=  "M"; }
  if    (num >=  900) { num -=  900; s += "CM"; }
  if    (num >=  500) { num -=  500; s +=  "D"; }
  if    (num >=  400) { num -=  400; s += "CD"; }
  while (num >=  100) { num -=  100; s +=  "C"; }
  if    (num >=   90) { num -=   90; s += "XC"; }
  if    (num >=   50) { num -=   50; s +=  "L"; }
  if    (num >=   40) { num -=   40; s += "XL"; }
  while (num >=   10) { num -=   10; s +=  "X"; }
  if    (num >=    9) { num -=    9; s += "IX"; }
  if    (num >=    5) { num -=    5; s +=  "V"; }
  if    (num >=    4) { num -=    4; s += "IV"; }
  while (num >=    1) { num -=    1; s +=  "I"; }
  return s;
}

No comments:

Post a Comment