segfaults on 64bit systems when build with c99

Issue #1816 closed
Wolfgang Kastaun created an issue

I recently encountered a segfault in ScheduleInterface.c, more precisely in the function

static int CCTKi_ScheduleCallFunction(void function, t_attribute attribute, t_sched_data data) { / find the timer for this function and this schedule bin / t_timer timer = attribute->timers; while (timer && strcmp(timer->schedule_bin, data->schedule_bin)) { timer = timer->next; }

Running in the debugger revealed that timer->schedule_bin pointed to an invalid address. Curiously, it had the top 33 (33 not a typo) bits all set. Taking the lowest 32 bits gave a valid address which pointed to a reasonable string "CCTK_INITIAL". This suggests a 32/64 bit issue. The pointer timer->schedule_bin seems to be initialized in the same function using strdup:

timer->schedule_bin = strdup (where);

strdup is not part of the c99 standard, but only Posix. Compiling with gcc --std=c99 means it is not defined in <string.h>. This means the compiler treats the occurrence of strdup as an implicit function declaration, and assumes it returns int. Thus, it will do an implicit conversion of the result from int to char*. If the highest bit of the int was set, this resulted in a 64 bit pointer with all 32 high bits set (I checked with a small test code). When the address returned by the actual strdup code linked from glibc has the top 33 bits zero, the conversion yields the correct results. Therefore, the problem is hard to reproduce, it only occurred with a test case almost exhausting my workstations memory, but frustratingly not small tests.

After this, I also found compiler warnings for ScheduleInterface.c of the type

implicit declaration of function ‘strdup’ [-Wimplicit-function-declaration] and assignment makes pointer from integer without a cast [-Wint-conversion]

Switching from --std=c99 to --std=gnu99 fixed the problem for now. However, this is a bug that might affect many users since the code compiles with --std=c99 and the compiler warnings are hidden within the thousands of other compiler warnings the ET code generates.

Also, a quick grep revealed many occurrences of strdup, although some of them where redefined as Util_Strdup. The rest might lead to segfaults on 64 bit systems with std=c99.

My findings concern the Wheeler release, I haven't had time to check the development version.

Keyword:

Comments (11)

  1. Frank Löffler
    • removed comment

    Thank you for the very detailed bug report. This is indeed very helpful.

    As to how to fix this: if we assume that we cannot influence (to some degree) how users compile the code, what is left is either defining a strdup() ourself and thereby making sure it is always present, or consistently use Util_Strdup) instead of strdup(), essentially doing the same.

    In terms of compiler options we could require std=gnu99, and also recommend -Werror-implicit-function-declaration (for gcc), but would not be enforceable and still leaves users out in the cold.

  2. Erik Schnetter
    • removed comment

    We require Posix features in a few other cases as well. We should abort during the configure stage if strdup is missing.

  3. Log in to comment