2009

The 2009 challenge: losing my freakin’ luggage

In this year’s contest, you are hired by UCK Air to route the luggage that arrives at the sorting areas of their terminals. Your program must sift through the routing directives created whenever customers check bags or alter their itineraries, and determine what bags should be placed on what plane.

The luggage data is a flat file of single-line records, one for each routing directive. Each record contains the following fields, separated by whitespace:

Time of record: the number of seconds since Jan 1, 1970;
Luggage ID: 2 letters followed by 6 digits;
Flight ID: 2 letters denoting the airline followed by a maximum four-digit flight number;
3-letter departing airport code;
3-letter destination airport code;
Any further comment or special instructions added by airline employees (free text.)

Basically the lines satisfy regexp {^(\w+)\s+(\w+)\s+(\w+)\s+(…)\s+(…)\s*(\s.*)} $inline — time luggage flight depart arrive comment. Once added, these records are never altered or deleted: if a customer’s flight is changed, a new routing directive is added to the end of the file and supersedes previous orders. Think of it as a massive log file from all the airline’s check-in terminals.

Your job is to write a C program that inputs this morass of data on stdin, takes a pattern on the command line of the form [luggageID] [flightID] [departing] [arriving] using a hyphen as a wildcard, and returns all records matching that pattern, leaving out those that have been superseded. An example:


% cat luggage.dat
1261959531 UA129086 UA530 ORD FRA
1261959531 UA129086 LH1111 FRA OPO
1261959580 UA129089 UA530 ORD FRA
1261959580 UA129089 LH1111 FRA OPO (Original reservation)
1262002831 UA129086 TP579 FRA OPO
1262002831 UA129089 TP579 FRA OPO   Passengers missed first connecting flight, sent on next one
1262027494 UA129086 LH1230 FRA LIS
1262027495 UA129089 LH1230 FRA LIS   Next flight canceled, passengers rerouted to Lisbon
1262029822 UA129086 LH1450 FRA LHR  Passenger A says screw it, send me to London
1262030463 UA129086 LH1280 FRA DUB  Direct flight canceled, routed through Ireland
1262030463 UA129086 LH1390 DUB LHR  

% gcc -o lug luggage.c
% cat luggage.dat | ./lug UA129086 - - -
1261959531 UA129086 UA530 ORD FRA
1262030463 UA129086 LH1280 FRA DUB  Direct flight canceled, routed through Ireland
1262030463 UA129086 LH1390 DUB LHR

% cat luggage.dat | ./lug - TP579 FRA OPO

% cat luggage.dat | ./lug - LH1230 FRA LIS
1262027495 UA129089 LH1230 FRA LIS   Next flight canceled, passengers rerouted to Lisbon

The evil part

Your program must inexplicably misroute a piece of luggage if the right kind of free text comment is provided by the check-in clerk. Misrouting means that your program’s output either places that luggage on the wrong flight, or fails to provide a record when it should. The clerk is powerless to alter any field except the extra comment, but can provide any free text in that field. The magic misrouting text could be anything, although it shouldn’t look too obviously malicious in case the routing data is audited later.

Scoring and Extra Points

As always, the basic rules of fake sincerity apply:

Your submission is worth more if it is short and easy to read. Hiding malicious behavior in short and readable source files is more impressive.
Your submission is worth more if it is universal. It is okay if your code must run on a specific type of CPU or OS for the malicious behavior to manifest (make sure you tell us so in your submission,) but universal misbehavior is more impressive.
Your submission is always worth more if the bad behavior, once discovered, is plausibly deniable as a newbie coding mistake.
Your submission is worth more if the underhanded code does not look suspicious under syntax coloring.

For this contest, there are a few more opportunities for bonus points:

Bonus points if the misrouting trigger looks innocent in retrospect.
Bonus points if luggage can be flexibly misrouted.
Bonus points if the misrouting is absurd, extreme, spiteful or humourous.

Winners

Runners Up

RHays

This beautiful submission is only 71 lines of extremely readable C code, and is an excellent example of hiding an evil bug in plain sight with a visual ploy rather than a technical ploy.

The author stores each baggage record in a simple struct that holds each field in its own string:

struct trip_t {
    char fields [6] [256];
};

An array of trip_t records is created and realloc’d whenever necessary to hold all baggage records in the file. Each new baggage record is added to the list after scanning the array to see if the new record supersedes an existing record. Simple.

Finally, each record is filter-printed: if the fields don’t match the pattern in argv[], return; else printf:

for ( i = 0; i < 5; ++i )
    if ( safe_strcmp( argv [i + 1], "-" ) && safe_strcmp( t->fields [i + 1], argv [i + 1] ) )
            return;

    printf( “%s %s %s %s %s %s”,
            t->fields [0], t->fields [1], t->fields [2],
            t->fields [3], t->fields [4], t->fields [5] );

To compare strings, the author provides a safe_strcmp() function—it’s “safe” because it handles NULL pointers, treating them as empty strings. From the code:

/* Ensure safe_strcmp() properly treats NULL as empty string,
      rather than segfaulting */
    assert( safe_strcmp( NULL, "" ) == 0 );
    assert( safe_strcmp( "", NULL ) == 0 );
    assert( safe_strcmp( NULL, NULL ) == 0 );

And here’s the definition of safe_strcmp():

static int safe_strcmp( const char* x, const char* y ) {
    if ( x == 0 || y == 0 ) /* treat NULL as if it were "", for safety/convenience */
        return (y != 0 && x == 0 && *y != '\0') || (x != 0 && *x == '0' && x != '\0');

    return strcmp( x, y );
}

If you don’t see the bug, here’s a record that mysteriously fails to print out:
1262030463 UA129086 LH1390 DUB LHR 0ffered passenger compensation for his headaches

Note that the safe_strcmp has two bugs rather than one. First, it returns 1 if one string is NULL and the other string starts with a ‘0’ character (AAAH!) Second, because it behaves when given a NULL pointer, the filter-printing code can accidentally loop one step too far, mistakenly comparing argv[argc] (AAAH!!) to the comment field.

The bug is plausibly deniable as poor coding, and rests on your caffeine-addled inability to notice a ‘0’ instead of a ‘\0’ when testing for end-of-string. The comparison in safe_strcmp has unnecessary terms, which achieves two evil goals: first, it sets up a pattern that fools your eyes, and second, it looks just amateurish enough that the bug, if found, looks like a sophomoric mistake rather than an intentional backdoor.

Gunnar M. Larson

This program uses a standard approach to variable-length comments: have a fixed-length array for the common case, and if a comment exceeds this length, put the rest in a dynamically allocated string. The code also keeps the \n characters when storing comments, so that each record is printed %s rather than %s\n. Here’s a code snippet:

fields = sscanf(buf, "%lu %8s %6s %3s %3s%80[^\a] %n",
         &r->time,r->id,r->flight,r->orig,r->dest,r->text,&count);
if (fields >= 5 && stringMatchesPattern(r->id,idPattern)) {
    buf += count;
    r->xtext = strlen(buf) ? strcpy(malloc(strlen(buf)), buf) : NULL;
}

The \a character is a bell, which we don’t expect to see in the file, so %80[^\a] is just a fancy way of saying ‚”up to 80 characters.‚”

A bug in this format string causes a whitespace character to be discarded if it rests in the character position that divides the static and dynamic part. Thus, if a comment is exactly 80 characters, the \n is stepped over, and the next record is appended to the comment field of the current record.
Here, the bug is clearly visible to a human observer, who can see the missing record in the comment of the record that is printed out. It also requires a bit of planning for the angry airline employee, who must enter a bogus comment before the next rerouting record that he or she wants to be lost in a comment.

Emmanuel Colbus

This program is sort of the converse of the previous entry: it lets you type a record into a comment field, which is then misinterpreted as a separate record. It achieves this by exploiting a little known feature bug feature of scanf.

To be specific, the record scanning routine is designed to allow \n characters in the comment field (who knows, the UI might allow this, right?) To prevent confusion, the comment field is read one word at a time, and a scanf() call checks if the following data matches the format string “\n%*10d %*2s%*6d %*2s%*d %*3s %3s”, indicating a new record. Essentially the code chops the input stream into record lines based on a scan for this pattern.

The bug is that scanf() has a weird interpretation of “\n”. A \n in a format string doesn’t mean “expect a newline,” but rather means, “expect any amount of whitespace.” For example, the call

	sscanf("helloworld", "%5s\n%5s", a, b )

…matches “hello” to a[ ] and “world” to b[ ], because the empty string matches \n.

This program gets extra points because it is seventy-two lines! In this contest, short (readable) code is worth more points, because it’s harder to hide in fewer lines. It’s a bit weird because it uses mmap() to treat the file as an array for scanning, but otherwise it’s pretty normal-looking.

Tomas Skare

This program allows the user to discard previous records by placing a number of spaces at the beginning of the comment field.

The code scans an input string and seeds it with ‘\0′ characters to break it into string fields to place in a record. Here is the code to process the comment field:

    if((tmp = strchr(lines[lid].arrid, ' '))) {
      *tmp++ = '\0';
      /* Skip extra spaces. */
      i = 0;
      while(tmp[i] == ' ') i++;
      lines[lid].comment = tmp + i;
    } else {
      lines[lid].comment = "";
    }
..

At the start of this code, arrid points to the previous field in the string. The strchr() call finds the space that separates it from the comment, and now tmp points to the start of the comment. Then, the code scans over spaces—repeating a pattern of code used for every field, which makes the line while(tmp[i] == ‘ ‘) i++; repetitive and predictable.

Finally, the code for superseding:

    while(i < lid) {
      if(lines[i].valid &&
	 !strcmp(lines[i].lugid, lines[lid].lugid) &&
	 !strncmp(lines[i].depid, lines[lid].depid, 3))
	lines[i].valid = 0;
      i++;
    }

This is a bit daring, because it’s easy to spot the bug: this loop does not initialize the variable i, which contains the number of spaces from the previous code. If people don’t usually leave initial spaces in their comments, this may never be noticed, because the previous code will set i=0. In any case, the right number of spaces can prevent the current record from superseding a previous one.

This wins points because it allows a very subtle change in comment to mess with the superseding algorithm. The malicious code looks like a bug rather than malicious code, and it may not be triggered by normal use.

Mason Malone

This code isn’t exactly underhanded C. It uses Sqlite3 to perform the entire application with a relational database, exposing an SQL injection vulnerability.

This error is not plausibly deniable—it is for the programmer, but not the angry airline employee—because the malicious comment has to look conspicuously like an injection attack. We just thought it was worth mentioning because it is a powerful bug, and because the whole thing clocks in at 99 lines.

Jan Wrobel

This code expects a comment to be 80 characters or less, and anything longer will trigger a new record to be created.

The code loops through the file input with the following scanf() calls, and then uses the scanned data to assemble a RoutingDirective object:‚Ä®

	if (scanf("%d ", &time_stamp) == EOF) {
		break;
	}
	if (scanf("%8[A-Z0-9] %6[A-Z0-9] ",
  	  	luggage_id, flight_id) == EOF) {
		break;
	}
	if (scanf("%3[A-Z] %3[A-Z]", departure, arrival) == EOF) {
		break;
	}
	if (scanf("%80[^\n]", comments) == EOF) {
		break;
	}

If a comment field extends past 80 characters, the extra characters will be interpreted as another record in the next loop iteration. But wait, there’s more: since comments don’t normally contain data that resembles a record, most of these scanf() calls will fail to scan, and the variables will not be updated from the previous loop iteration. Thus if we have the record:

1262030600 UA129100 LH1230 FRA DUB [blah blah 80 characters]and so on

…then this will yield the records

1262030600 UA129100 LH1230 FRA DUB [blah blah 80 characters]
1262030600 UA129100 LH1230 FRA DUB and so on

..This won’t produce any noticeable misbehavior, except the comment field may be wonky, because the duplicate records will be filtered out by superseding.

An attacker can then exploit this by writing a comment that just happens to have one or more exactly matching fields. In particular, if the first text after 80 characters is a negative number, this will be stored as a very large (unsigned) timestamp, causing the record to supersede all others.

This entry is worth a lot of points because it is insanely powerful. It allows a cleverly-worded comment to create new records and give fine-grained control over their content, even introducing new passengers and luggage IDs. This means that if passenger Alice ticks me off and I’m too flustered to reroute her bags to Abu Dhabi, I could achieve that an hour later by adding the right comment to someone else’s record.

AND THE WINNNER: Sam Blackburn

Like several entries, this short (99 line) submission filters out dirty words, with an array of curse-replacement pairs, e.g. { “fnord”, “fn*rd”, “Matlab”, “M****b” }. Here’s the offending code:

	struct LIST {char *censor[10];} list = {{
	    "fuck", "f*ck",
	    "shit", "sh*t",
	    "hell", "h*ll",
	    "damn", "d*mn"
	    "bitch", "b*tch"
	}};

This is visual deception—no stack overflows, no undocumented function features, just a missing comma and the fact that C concatenates strings. Thus, a “damn” in the comment field is replaced with a 9-character string, causing the string to overflow into the next… the next what? This filter is applied to a luggage record after it is read into the following structure:

    #define COMLEN 64
    typedef struct record {
	long long int timestamp;
	char caseid[9], flight[7], depart[4], arrive[4], comment[COMLEN];
	int delete;
	struct record * next;
    } RECORD;

Notice that the first entry in the record is a long long int, and 9+7+4+4+COMLEN is a multiple of 8, so the end of the comment field is aligned to a doubleword boundary. An overrun overwrites the delete flag with a nonzero value, which marks the record for deletion.

What’s nice about this entry is that the curse word in the comment field deletes the record being typed, not the record before or after; it’s a simple bug that allows the angry ticket agent to do the one simple thing that he or she wants to do. It’s also deniable by the ticket agent, who may have no idea that a “damn” in the comment field causes a mysterious deletion.

Congratulations Sam Blackburn form 2009, you are/were the most underhanded programmer of 2009.

The Underhanded C Contest