Best short program

Seonghoon Kang
kang.seonghoon@mearie.org

Judges' comments:

To build:

make kang

To run:

echo "full spelling of an English cardinal numeral less than a quadrillion" | ./kang

Try:

echo Nineteen hundred and eighty-four | ./kang
echo uno | ./kang
echo trois | ./kang
echo fier | ./kang
echo "shest'" | ./kang

Selected Judges Remarks:

The judges were able to appreciate the Indo-European language family relation by making this entry successfully recognize some French, German, Italian, Russian, and Spanish numerals.

Also worth mentioning is this entry’s ability to understand the colloquial year numbers of the last millennium.

We’ve added a linefeed to the print format for convenience.

Author’s comments:

Synopsis

This short program reads a spelt number (e.g. forty-two) and writes a corresponding decimal number (e.g. 42). Too long for one-liners, alas, but it still qualifies as a short program as it has less than 0x100 bytes.

It accepts a variety of spelt numbers:

It does not accept some spelt numbers, which I found mostly irrelevant:

Requirements

This program is quite portable, only requiring the following:

The design of the program explicitly allows for EOF which does not equal to -1 (it has to be negative per the standard) and both signed and unsigned char, for example. Furthermore it is endian-independent.

Obfuscations

Many obfuscations used are typical for standard IOCCC entries:

Other obfuscations are more subtle:

But the most important obfuscation is the clever construction of lookup table. The program uses 11 different characters required for recognizing 22 lexemes:

zero        one         tw-         th(i)r-     fo(u)r-     fi-         six-
seven-      eigh-       nin-        ten         eleven      twelve
hundred(s)  thousand(s) million(s)  billion(s)  trillion(s)
a           and         -teen       -ty

So that they are internally represented as like:

r        n        tw-      tr-      fr-      f-       s-
sn-      g-       nn-      tn       ln       twl
nr(s)    tsan(s)  lln(s)   blln(s)  trlln(s)
a        an       -tn      -ty

The stemmer recognizes the longest matching prefix, so every lexeme can be recognized by at most three characters (e.g. trl instead of trlln). This is also handy for ignoring plurals. But that would make that the table does not fit in the printable byte—112 is already almost 27!

The trick is to use octal; three characters (a, b and g) are interpreted as sequences of two characters (ny, nn and nw respectively). Asides from a smaller lookup table, it has many good consequences:

Having said this important trick, other details should be relatively easier to follow. The order of lookup table, for example, is very important, and the biggest constant 6177 is not arbitrarily chosen.

Acknowledgement

[Removed for proper judgement: I’ll fill this section if I do win.]

[Note: If you think you’ve seen this program elsewhere, you are probably right; I’m submitting a program I have published years ago (and I have written myself, of course). Neither rules nor guidelines disallow such a resubmission, so I guess it’s fine.]


Creative Commons License

© Copyright 1984-2015, Leo Broukhis, Simon Cooper, Landon Curt Noll - All rights reserved
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.