The Unterminated String

Embedded Things and Software Stuff

BusyBox ps - Brackets and Braces

Posted at — Aug 5, 2019

TL;DR

In the COMMAND column of BusyBox ps, the name of the process will sometimes be wrapped in curly braces or square brackets.

Curly braces {} are used to indicate that the filename of the executable retrieved from /proc/<pid>/stat doesn’t match the argv[0] value parsed from /proc/<pid>/cmdline. This can occur if the program has been run with an interpreter e.g. /usr/bin/python or has modified its own argv.

Square brackets [] are used to indicate that the process’ /proc/<pid>/cmdline was empty. Possible reasons for this are that the process is a zombie or a kernel thread.

Overview

BusyBox is a bundle of UNIX utilities often found on embedded devices. One command it provides is ps, which lists the currently running processes.

An example of the output of this command is:

PID  USER      TIME COMMAND
38   root      0:00 [oom_reaper]
3272 alan      2:24 {terminator} /usr/bin/python /usr/bin/terminator
5259 alan      0:00 vim procps/ps.c

The basic interpretation of this table is hopefully fairly obvious to anyone familiar with Linux. However, I wanted to know more about the COMMAND column. In particular, why are some process names wrapped in curly braces or square brackets?

Straight to the Source

I downloaded and built the BusyBox source code to get a better understanding of how ps works:

git clone git://busybox.net/busybox.git
cd busybox
git checkout 9663bbd17ba3ab9f7921d7c46f07d177cb4a1435

make menuconfig
make
make -j5 CONFIG_PREFIX=~/busybox-install install

As far as this investigation is concerned, BusyBox ps inspects two files: /proc/<pid>/stat and /proc/<pid>/cmdline. The documentation for each can be found in man proc.

The first BusyBox function we’re interested in is procps_scan(). This will read /proc/<pid>/stat and retrieve the string comm, which man proc describes as the “filename of the executable”.
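As a rough illustration of this step (not BusyBox's actual code), the sketch below pulls comm out of a stat line. The helper name parse_comm is hypothetical. It searches for the first '(' and the last ')' because a process name may itself contain parentheses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from BusyBox: copy the comm field of a
 * /proc/<pid>/stat line into out. Returns 0 on success, -1 on failure. */
int parse_comm(const char *stat_line, char *out, size_t out_size)
{
    const char *lparen = strchr(stat_line, '(');
    const char *rparen = strrchr(stat_line, ')');
    size_t len;

    if (lparen == NULL || rparen == NULL || rparen < lparen || out_size == 0)
        return -1;

    len = (size_t)(rparen - lparen - 1);
    if (len >= out_size)
        len = out_size - 1;   /* truncate rather than overflow */

    memcpy(out, lparen + 1, len);
    out[len] = '\0';
    return 0;
}
```

Given a line beginning "14159 (terminator) S 2736", this copies terminator into out.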

When outputting the rows of process information to the terminal, the function format_process() will call read_cmdline().

This will retrieve the contents of /proc/<pid>/cmdline and sanitise them by replacing the NUL bytes that delimit the command line arguments with spaces. (I’ll refer to this sanitised string later as cmdline_contents).
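The sanitising step can be sketched as follows. This is a simplified illustration under my own assumptions rather than the BusyBox implementation: the name sanitise_cmdline is hypothetical, and the caller is assumed to have already read len bytes of the file into a buffer with room for at least len + 1 bytes.

```c
#include <stddef.h>

/* Hypothetical helper: buf holds len bytes read from /proc/<pid>/cmdline
 * and has room for at least len + 1 bytes. Returns the sanitised length. */
size_t sanitise_cmdline(char *buf, size_t len)
{
    size_t i;

    /* Drop the NUL terminator(s) that follow the final argument. */
    while (len > 0 && buf[len - 1] == '\0')
        len--;

    /* Replace the NUL separators between arguments with spaces. */
    for (i = 0; i < len; i++) {
        if (buf[i] == '\0')
            buf[i] = ' ';
    }

    buf[len] = '\0';
    return len;
}
```

An empty cmdline yields a length of 0, which is the case that later leads to the square brackets.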

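This function then determines the name of the process actually being run by taking the basename of the first command line argument, argv[0]. The strings basename(argv[0]) and comm are then compared. Based on this, one of the following is chosen to be displayed in the COMMAND column of ps:

cmdline_contents, if basename(argv[0]) matches comm
{comm} cmdline_contents, if basename(argv[0]) does not match comm
[comm], if cmdline is empty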

To demonstrate this, we can use the example of the stat and cmdline files for an instance of terminator running on my system:

$ cat /proc/14159/stat
14159 (terminator) S 2736 2568 2568 1026 2568 4194560 23680 508 68 1 3268 423 0 0 20 0 4 0 9123075 905396224 17607 18446744073709551615 94654019346432 94654022492352 140735972289904 0 0 0 0 16781312 65538 0 0 0 17 2 0 0 7 0 0 94654024590000 94654025079160 94654044663808 140735972297096 140735972297132 140735972297132 140735972298724 0

$ strings -n1 /proc/14159/cmdline 
/usr/bin/python
/usr/bin/terminator

From stat, BusyBox extracts terminator as the variable comm. The sanitised value of the cmdline file, cmdline_contents, will be /usr/bin/python /usr/bin/terminator. The output of basename(argv[0]) here is python, which is not equal to terminator. The ps output will therefore be:

{terminator} /usr/bin/python /usr/bin/terminator
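The choice of what to display can be sketched as a small C function. This is a minimal sketch, not the code from BusyBox: format_command and its parameters are hypothetical names, and the plain strcmp() is a simplification, so the real comparison in BusyBox may differ in detail.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper illustrating the three cases; not BusyBox's code. */
void format_command(const char *comm, const char *argv0,
                    const char *cmdline_contents,
                    char *out, size_t out_size)
{
    const char *slash;
    const char *base;

    if (cmdline_contents[0] == '\0') {
        /* Empty cmdline: show [comm]. */
        snprintf(out, out_size, "[%s]", comm);
        return;
    }

    /* basename(argv[0]): everything after the final '/'. */
    slash = strrchr(argv0, '/');
    base = (slash != NULL) ? slash + 1 : argv0;

    if (strcmp(base, comm) != 0)
        snprintf(out, out_size, "{%s} %s", comm, cmdline_contents);
    else
        snprintf(out, out_size, "%s", cmdline_contents);
}
```

Calling it with comm set to terminator, argv0 set to /usr/bin/python and the sanitised cmdline from above produces the {terminator} line shown in the ps output.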

Why are there Differences in stat and cmdline?

Interpreters

If the comm value in the stat file is “the filename of the executable”, why would it differ from the program listed as argv[0] in cmdline?

Certain executables are actually scripts that need an interpreter to run them, e.g. Bash or Python.

Before running a program, Linux checks to see if a shebang (#!) exists at the start of the file. If so, the path of the script is passed as an argument to the interpreter named on that line. At program execution, argv[0] will be the path to the interpreter, not the command the user typed, while comm still holds the name of the script.
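To make the shebang check concrete, here is a simplified sketch that extracts the interpreter from the first line of a file. The function name shebang_interpreter is hypothetical, and the kernel's own parsing handles more cases (such as the optional argument after the interpreter), so treat this as an illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if first_line starts with "#!", copy the interpreter
 * path into out and return 1. Otherwise return 0. */
int shebang_interpreter(const char *first_line, char *out, size_t out_size)
{
    const char *p;
    size_t len;

    if (out_size == 0 || strncmp(first_line, "#!", 2) != 0)
        return 0;

    p = first_line + 2;
    p += strspn(p, " \t");        /* skip blanks after "#!" */
    len = strcspn(p, " \t\n");    /* the path ends at the next blank or newline */
    if (len == 0)
        return 0;
    if (len >= out_size)
        len = out_size - 1;       /* truncate rather than overflow */

    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

For a script whose first line is #!/usr/bin/python, the helper reports /usr/bin/python, which is the program that ends up as argv[0].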

Can a Program Manipulate Its cmdline?

I started wondering what would happen to /proc/<pid>/cmdline if a program modified its own argv. I first checked whether this was actually legal behaviour in C. It is, as documented in the C99 standard:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

To test how Linux would behave if these strings were modified I put together the following:

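#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
int main(int const argc, char* argv[])
{
    int argv_index = 0;
    pid_t this_pid = getpid();
    /* Print the PID so cmdline can be inspected from another terminal. */
    printf("pid is %d\n", (int)this_pid);
    /* Pause until the user presses Enter: cmdline is still unmodified. */
    getchar();
    /* Overwrite every character of every argument with 'x'. */
    while (argv_index < argc)
    {
        memset(argv[argv_index],
               'x',
               strlen(argv[argv_index]));
        argv_index++;
    }
    /* Pause again so the modified cmdline can be inspected. */
    getchar();
    return 0;
}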

The outputs below show the contents of the above program’s /proc/<pid>/cmdline before and after user input is provided to trigger the modification. It can be seen that modifications by a program to its argv are reflected in the cmdline file.

$ strings -n1 /proc/$(pgrep modify_argv)/cmdline 

./modify_argv
testing
testing
one
two
three

$ strings -n1 /proc/$(pgrep modify_argv)/cmdline

xxxxxxxxxxxxx
xxxxxxx
xxxxxxx
xxx
xxx
xxxxx

After modification the output from BusyBox ps is:

PID  USER      TIME COMMAND
1908 alan      0:00 {modify_argv} xxxxxxxxxxxxx xxxxxxx xxxxxxx xxx xxx xxxxx

Why is a Program’s cmdline Empty?

Zombies

The manpage for proc suggests that if a process’ cmdline is empty then it is likely a zombie.

The following program was written to test this. It forks a child process that exits after five seconds. Because the parent never calls wait(), the exited child remains as a zombie.

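#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
int main(void)
{
    if (fork())
    {
        /* Parent: block on input and never call wait(),
           so the exited child is not reaped. */
        getchar();
    }
    else
    {
        /* Child: exit after five seconds and become a zombie. */
        sleep(5);
    }
    return 0;
}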

A quick check of the process’ cmdline file shows that it is in fact empty once it becomes a zombie. Additionally, we can observe this change by looking at the output of ps before sleep(5) expires:

8702 alan      0:00 ./zombie.exe
8703 alan      0:00 ./zombie.exe

and after sleep(5) expires:

8702 alan      0:00 ./zombie.exe
8703 alan      0:00 [zombie.exe]

Kernel Threads

The following comment in the source code of BusyBox suggests that there is another reason why cmdline might be empty:

/* Puts [comm] if cmdline is empty (-> process is a kernel thread) */

The process [oom_reaper] listed in the ps output above certainly sounds like it should be a kernel thread. We can check this by finding the parent of the process, whose PID is the fourth value in the stat file.

$ cat /proc/38/stat
38 (oom_reaper) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 4 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Inspecting the parent’s stat file, /proc/2/stat, shows that the process’ comm string is kthreadd. While the name itself is a giveaway, the following quote from LWN confirms it:

kthreadd is the kernel thread daemon in charge of asynchronously spawning new kernel threads whenever requested
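The parent lookup described above can be sketched in C. The helper name stat_ppid is hypothetical. It skips past the last ')' before parsing, since comm may contain spaces or parentheses, and then reads the state character followed by the parent PID.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the parent PID (the fourth field) from a
 * /proc/<pid>/stat line, or -1 if the line cannot be parsed. */
long stat_ppid(const char *stat_line)
{
    const char *rparen = strrchr(stat_line, ')');
    char state;
    long ppid;

    if (rparen == NULL)
        return -1;

    /* After comm come the state character and then the parent PID. */
    if (sscanf(rparen + 1, " %c %ld", &state, &ppid) != 2)
        return -1;

    return ppid;
}
```

For the oom_reaper line above this returns 2, the PID of kthreadd. For the terminator line shown earlier it returns 2736.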

Other?

The command pstree -p 2 will list the running child processes of kthreadd. From a manual comparison of pstree and ps on my system, the only process listed in [square brackets] that was not a child of kthreadd was a zombie. I’m unsure if there is a third reason why a process might have an empty cmdline.