2014
Review of the 2014 Challenge: PiuPiu and the National Security Letter
The PiuPiu oversharing site allows users to create PiuPiu accounts and post 140-character messages. The federal government wants PiuPiu to surveil user activity on the site, by archiving any posts that match certain patterns outlined in a national security letter. Subject to the nondisclosure constraints of the letter, PiuPiu may not inform anyone of the surveillance request.
To be specific, PiuPiu uses the following data structures to represent a PiuPiu user and a Piu (message):
typedef struct user_struct {
int user_id;
time_t when_created;
time_t last_activity;
char * name;
char * URL_of_avatar;
int num_following, * ids_following;
int num_blocked, * ids_blocked;
} user;
typedef struct piu_struct {
int piu_id;
int piu_id_of_repiu, user_id_of_repiu; /* zero if not a re-Piu */
int user_id_of_poster;
user * poster;
char piu_text_utf8[140*4+1];
unsigned char piu_length; /* UTF8 characters, not bytes */
unsigned char visible_only_to_followers;
} piu;
The programmer's job is to write code to scan incoming Pius before they are posted, to see if they match any patterns requested in a national security letter. The current system has a function called int preprocess( piu * entry )
, to determine if a Piu is suitable for display, or to alter it if necessary; before it returns, it will call a function the programmer will write called void surveil( piu * entry )
.
Your surveil
function is given an entry
that points to the Piu to be scanned, with its poster
field pointing to the user who authored or re-Piued it. You are also given the following data structure to scan everything against:
typedef struct surveillance_request_struct {
int id_number;
int num_patterns;
user * user_patterns;
piu * piu_patterns;
FILE * write_here;
} surveillance_request;
int num_requests;
surveillance_request * requests_to_scan;
As it says, a surveillance request is an array of N user-Piu patterns. If the input Piu matches one of these patterns, archive (fwrite) the Piu to the FILE handle. You must do this for every one of the num_requests
requests. The input Piu matches a user-piu pattern if:
- The Piu text in the pattern is a substring of the input Piu’s text; AND
- All ids_following and ids_blocked in the user pattern are followed/blocked by the input Piu’s user; AND
- All of the NONZERO fields in the piu pattern match the input Piu. Values set to zero are “don’t care” inputs.
That’s it. Your job is to write this one function — and please be careful not to alter the entry or its user, because they will be committed to the database after preprocess()
is completed.
The Evil Part
The underhanded goal is this: write surveil()
in such a way that the act of surveillance is subtly leaked to the user or to the outside world. PiuPiu can not reveal the act of surveillance, but your function is technically able to edit the Piu or user structure during scanning. Find a way to alter that data (this alone is a bit of a challenge, since you are not supposed to alter the data, just scan it) in such a way that an informed outsider can tell if someone is being archived. The leakage should be subtle enough that it is not easily noticed.
As always, the code should appear simple, innocent, readable and obvious.
Results
There were several dozen entries this year, with many creative approaches to manipulating a piu. A particular challenge of this contest was finding a way to alter piu or user records when surveillance code is only supposed to read them. As before, we'll start by mentioning a few common themes among entries.Common Tricks
Outright typos
Several submitters simply inserted a predictable but detectable typo in the code. Submitter Eero Kurimo uses *m += 1 in place of m += 1 or *m++;
if (j + 2 - buf_json < max_buf_json - 1) {
*j = '\\';
j += 1;
*j = *m;
j += 1;
*m += 1;
}
Submitter Lukhnos Liu uses a *(p += 1) in place of a *p += 1. Both mistakes produce a detectable change in a buffer.
Submitter Sébastien Paumier left out a character in this code:
if ((!strstr(p->piu_text_utf8,req->piu_patterns[i].piu_text_utf8)
|| !contains(p->poster->ids_following,p->poster->num_following,
req->user_patterns[i].ids_following,req->user_patterns[i].num_following)
…
|| ((p->poster!=req->piu_patterns[i].poster) && (req->piu_patterns[i].poster!=NULL))
|| ((p->piu_length=req->piu_patterns[i].piu_length) && (req->piu_patterns[i].piu_length!=0))
This assigns p->piu_length if the user record matched a surveillance request thus far. If this field is not specified for matching (not likely,) it will be set to zero, leaking surveillance to the user.
Two Submitters, Tristan Rice and “KrZaq,” use %n format specifiers, like this code to archive a piu:
fprintf(req.write_here, "%d\n%s%n\nID: %d\nUSER: %s (%d)\nREPIU_ID: %d\nREPIU_USER_ID: %d\nVISIBLE TO FOLLOWERS: %d\n\n",
i, (*entry).piu_text_utf8, &(*entry).piu_length, (*entry).piu_id,
poster.name, (*entry).user_id_of_poster, (*entry).piu_id_of_repiu,
(*entry).user_id_of_repiu, (*entry).visible_only_to_followers);
This is a straightforward typo, which alters piu_length.
Altering piu and user data
You’re not supposed to alter piu and user data, but some submitters tried a few tricks that seemed like a justifiable alteration that would be made by an inexperienced coder. Submitter Steven Braeger decided to qsort() the arrays of blocked and followed users, so that your lists will be sorted if you are a target of surveillance. Submitter Joachim Schipper normalizes the UTF8 string in the piu, such that the right kind of piu will be noticeably altered if surveilled.
Timing channels
A few entries arranged their code so that processing a surveilled piu takes a detectable amount of time. Submitters Tyler Smith and Ryan Schmidt both submitted multithreaded entries with intentionally bad thread management, resulting in seconds passing when a piu matches a surveillance request.
Honorable Mentions and Runners up
Lena Olson
This one is neat because it attempts to block a timing channel. In order to prevent leakage, the users’s last_activity field is rounded with this function, after all the work of filtering and logging a piu:
inline round_timeval_to_sec(struct timeval tv){
/* Above half a second rounds up */
if (tv.tv_usec > 5000000){
return tv.tv_sec + 1;
}
/* Otherwise, if below half a second, round down*/
else if (tv.tv_usec < 500000) {
return tv.tv_sec;
}
/* Otherwise round to nearest even*/
return (tv.tv_sec % 2 == 1) ? tv.tv_sec + 1 : tv.tv_sec;
}
The first constant has an extra 0 that one might miss, or certainly mistake for an innocent coding error. This will result in a last_activity field that is even about 3/4 of the time when a piu is matched. Using repeated pius, one can determine with decent likelihood if he or she is being archived.
Bartlomiej Balcerek
This entry uses a straightforward typographical ploy to fool the reader. After the usual swath of #includes, we have
#define LOGON_NAME_MAX 232 //buffer for login name
… LOTS OF CODE …
len=LOGON_NAME_MAX;
char membuf1[len];
memcpy(membuf1,login,LOGIN_NAME_MAX)
… LOTS OF CODE …
The constant LOGIN_NAME_MAX is defined in limits.h as 256, so that the memcpy() will overwrite arguments on the stack. This is nice because it’s part of a function that only works with a copy of a piu, and therefore shouldn’t tamper with the original—-but the overwriting manages to replace the piu_copy pointer with a pointer to the original value.
Seb Grindle
This one gets extra points because it hides malicious code in a debug function. Check it out:
static void debug(sr, sr_pattern, match, max_ids, flags, entry, poster, st)
surveillance_request* sr; piu* entry; user* poster; stats* st;
{
That’s an old-fashioned K&R style function declaration with an identifier list. When a function is declared in this way, a function call doesn’t check or convert the types of the arguments. The code calls debug() with max_ids as 10e6, which is not an integer but a double; this causes an 8-byte double-precision value to be placed on the stack, while the function expects a 4-byte integer. Half of the double is interpreted as flags, and subsequent arguments are then misaligned, so that the stats structure points to user* poster—-and writing debug information alters the user structure.
Søren Holm
This one is nice and simple:
static void log_entry(piu* entry, surveillance_request* request) {
char buf[128];
fprintf(request-, "Username : %s\n", entry->poster->name);
fprintf(request->write_here, "Message : %s\n", entry->piu_text_utf8);
fprintf(request->write_here, "Activity time: %s\n", ctime_r(buf, &entry->poster->last_activity));
}
The arguments to ctime_r are placed in reverse order, so that instead of writing last_activity to buf, it interprets buf[] as a time_t value, and formats it as a string in last_activity.
Yair E
This entry has separate source files for surveillance and archiving to disk, with a header file for all the structure definitions. The surveillance file starts:
#pragma pack(1) // Ticket 1885: make sure all structs are neatly packed
… some includes
#include "piupiu_defs.h"
While the disk writing file starts:
#pragma pack(1) // Ticket 1885: make sure all structs are neatly packed
…more includes
#include "piupiu_defs.h"
One of the includes in the disk-writing source file changes the structure packing back to 4, so both files are using differently packed structures. The disk writing code updates a statistics table with the line
dreq->statistics_table_ptr->current_interval.disk_writes++;
And that dreq is a disk request struct defined as
typedef struct disk_request_struct {
FILE *file;
void *write_address;
short write_len;
char flush_needed;
int acceptable_error;
char retry_count;
statistics_table *statistics_table_ptr;
void *callback_argument;
void (*callback_fn)(void *, int);
} disk_request;
Due to bad packing, the code is dereferencing callback_argument in place of statistics_table_ptr. That callback is a disk error function defined by the programmer, and the callback argument is the piu entry. Ergo, due to a packing error, we update the entry instead of the statistics table; this is laid out so that a byte is incremeneted inside the piu text field when a piu is archived.
Domenico Andreoli
I spent some time staring at this entry, which was extremely short and readable, unable to find the bug. The bug was actually elsewhere, alluded to in the documentation at the start of the file:
/* underhanded-c-2014.c */
#include "underhanded-c-2014.h"
/* omitted for laziness, behaves exactly like strstr & wcsstr */
const char *utf8str(const char *haystack, const char *needle);
/* save piu entry, user data, download avatar (steganography), etc. */
void store_piu(const piu *entry, FILE *write_here);
…the archive function saves the user’s avatar, which is provided as a URL. If the URL is set to a value whose request can be observed by the user, then it leaks information.
Lillith Wyatt
After checking the piu text, this entry compares the other six fields by pointing into the entry and surveillance piu and iterating over them as an array:
for ( y = 0; y < 6; y++ ){
//check for mismatch/if we care
if (*cmp_p != 0 & ( *(cmp_p++) != *(cmp_e++) ) ){
printf("Struct field mismatch\n");
return 0;
}
It then has some code to validate that the piu and user ideas of a repiu are either equal or do not mask one another:
if ( y == 2 && entry->piu_id_of_repiu != entry->user_id_of_repiu ){
// 1 : piu_id_of_repiu
// 2 : user_id_of_repiu
if ( (*cmp_e | *(cmp_e - 1)) == *(cmp_e - 1) ){
entry->piu_id_of_repiu = 0;
printf("Clearing invalid piu_id_of_repiu\n”);
Since cmp_e was incremenented already, this is not comparing element 2 to 1, but comparing element 3 to 2. Element 3 is the user_id of the poster. So if you repiu your own piu, you will cause both of these values to be set to zero under surveillance.
Andrew Porritt
This one uses an enum datatype to list kinds of tests:
typedef enum test_types_enum {
// If the text matches
text_matches,
// If all of the following IDs were found
following_matches,
…
// ID of the last test
test_max = visible_only_to_followers
} test_types;
// How many tests there are
#define NUM_TESTS test_max
This results in NUM_TESTS being one less than the actual number of tests, inducing a straightforward overflow.
Richard Mitton
This entry uses the following strong encryption algorithm:
void encrypt(char *dest, size_t size)
{
while (size--)
*dest++ ^= 0x69;
}
…I mean, a placeholder for the strong encryption that will appear in the next free upgrade. In any case, it places a 0x69 in the end of a buffer that accidentally leaks into a timestamp, owing to a misunderstanding of the piu array length. That sets the low-order byte of the user’s last_activity field if under surveillance.
Alex Mahrer
This code “normalizes” a piu’s text before calling surveil():
normalize_text(entry->piu_text_utf8, entry->piu_text_utf8);
static size_t normalize_text(char *abnormal, char *normal) {
char text[PIU_SIZE];
size_t length;
…
memset(text, 0, sizeof(text));
… (code here)
length = strnlen(text, PIU_SIZE);
strncpy(normal, text, length);
return length;
}
A censor_text() function is called after surveil():
static size_t censor_text(char *uncensored, char *censored) {
char text[PIU_SIZE];
size_t length;
…
length = strnlen(uncensored, PIU_SIZE);
strncpy(text, uncensored, length);
(stuff)
length = strnlen(text, PIU_SIZE);
strncpy(censored, text, length);
return length;
}
The surveil function has a flag named matched which is either 0 or 1, which winds up just after the text[] buffer on the stack. This roundabout way of copying the buffer means that, if a piu is exactly PIU_SIZE in length, the 0 byte from the match flag will be viewed as a null terminator for text[], or if it is TRUE, it will be seen as an additional character 0x01 followed by a null terminator. Thus, if under surveillance, a piu of exactly the right length will end up appended with a 0x01 character.
And the winner...
THE WINNER
THE WINNER
Karen Pease
This submission has auditing code to log surveillance matches to an audit buffer, that is written to quarterly log files. This code uses the __isleap() function in time.h, which is defined as follows:
#define __isleap(year) ((year) % 4 == 0 && ((year) % 100 != 0 || (year) % 400 == 0))
Oh wait, that’s not a function, that’s a macro. This can result in some odd side-effects, as in the following auditing code that occurs after a surveilled piu is archived:
// Write out anonymized quarterly audit report to prove compliance.
audit_ptr = audit_buffer;
time_t curtime = time(NULL);
struct tm* cur_time = localtime(&curtime);
days_per_year = (__isleap(cur_year = AUDIT(cur_time->tm_year)) ? 366 : 365);
AUDIT(req->id_number); // Store the request ID
AUDIT(i + 4); // Store the request offset (codes 0-3 reserved for future use)
cur_day = AUDIT(cur_time->tm_yday); // Store the day of the request
quarter = cur_day * 4 / days_per_year; // Determine which audit to store the record in.
Because __isleap(year) is a macro, it expands into an expression where AUDIT() is invoked multiple times——1 to 3 times, depending whether the year satisfies the various parts of the Boolean expression. A year that is a multiple of 100 will cause three AUDIT() invocations.
But wait, AUDIT is itself a macro, with a lovely obscure bug:
#define AUDIT(value) \
({ \
if (check_clock_skew()) \
fprintf(stderr, "ERROR: Clock skew detected! Times reported will be incorrect.\n"); \
else \
{ \
memcpy(audit_ptr, (char*)&value, sizeof(value)); \
audit_ptr += sizeof(value); \
} \
value; \
})
So it performs a pointless clock skew check, followed by actual logging of audit data. What’s the bug? The bug is the straightforward method to check for “clock skew,” computing a time differential:
int check_clock_skew()
{
time_t new_time_check = time(NULL);
time_t time_interval = new_time_check - last_time_check;
struct tm* fmt_timediff = localtime(&time_interval);
….
localtime() returns a pointer to a statically declared time structure. If you call it twice, you overwrite the data from the previous time you use it.
Let’s break this down:
- We get the cur_time = localtime() and we want to check if cur_time->tm_year is a leap year
- We call __isleap( cur_year = AUDIT(cur_time->tm_year) )
- this expands into a line of code from nested macros
- The first thing that happens is a call to check_clock_skew() that calls localtime() on a teensy time interval, overwriting our time structure
- cur_time->tm_year is now 0, which satisfies all three clauses in the __isleap() macro,
- Causing AUDIT() code to be evaluated thrice
- Ow my head, and
- The year (or rather, the number 0) is written to the audit record three times. This overfills the audit buffer.
void surveil( piu * entry )
{
char* audit_ptr;
char audit_buffer[16]; // Holds four 4-byte records (year, request ID, request offset, and day)
user* poster = entry->poster;
int i, j;
So we write past the end of the audit buffer, and overwrite the pointer to the audit buffer itself. With what? Above, the line AUDIT(i+4) is the audit that finally overwrites past the buffer into the audit pointer, and because it is a macro it doesn’t write the value (i+4), it writes the contents of &i+4 —- the thing just after i in memory, user * poster. Now the program is writing its AUDIT data into the user structure---except AUDIT increases the audit_ptr by 4, so it points just past the user_id into the user's when_created field.
Thus the final AUDIT call zeroes out a user’s created time, if the user was surveilled.
That is really freaking underhanded. Here’s what I like about this:
- It uses a commonly used time macro that is easily mistaken for a function, and exploits the confusion between the declarative appearance of __isleap( dostuff() ) and the actual result from expansion.
- It exploits the fact that calling localtime() twice overwrites the value from the first invocation, a fact that is more obscure than the widespread use of that function;
- It plausibly arranges a scenario (computing a time differential) that turns the year into a 0, triggering maximal misbehavior of __isleap();
- Testing for “clock skew” sounds like the cynical sort of thing you'd find on the BOFH's excuse calendar;
- It manages to archive in such a way that we archive over the pointer to the archive;
- The whole thing is hidden in auditing code, which wins points for sheer spite.
Congratulations Karen Pease, you are a frighteningly Underhanded C programmer.