In the COMMAND column of BusyBox ps, the name of the process will sometimes be wrapped in curly braces or square brackets.
Curly braces {} indicate that the filename of the executable, retrieved from /proc/<pid>/stat, doesn’t match the basename of the argv[0] value parsed from /proc/<pid>/cmdline. This can occur if the program has been run with an interpreter, e.g. /usr/bin/python, or has modified its own argv.
Square brackets [] indicate that the process’ /proc/<pid>/cmdline was empty. Possible reasons for this are that the process is a zombie or a kernel thread.
BusyBox is a bundle of UNIX utilities often found on embedded devices.
One command it provides is ps, which lists the currently running processes.
An example of the output of this command is:
PID USER TIME COMMAND
38 root 0:00 [oom_reaper]
3272 alan 2:24 {terminator} /usr/bin/python /usr/bin/terminator
5259 alan 0:00 vim procps/ps.c
The basic interpretation of this table is hopefully fairly obvious to anyone familiar with Linux. However, I wanted to know more about the COMMAND column. In particular:
- Why are some process names (such as terminator above) wrapped in curly braces?
- Why are some process names (such as oom_reaper above) wrapped in square brackets?
I downloaded and built the BusyBox source code to get a better understanding of how ps works:
git clone git://busybox.net/busybox.git
cd busybox
git checkout 9663bbd17ba3ab9f7921d7c46f07d177cb4a1435
make menuconfig
make
make -j5 CONFIG_PREFIX=~/busybox-install install
As far as this investigation is concerned, BusyBox ps inspects the following two files. Documentation for each can be found in man proc.
- /proc/<pid>/cmdline: “This read-only file holds the complete command line for the process”
- /proc/<pid>/stat: “Status information about the process”
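To see the layout of /proc/<pid>/cmdline for yourself, here is a minimal C sketch (not BusyBox code) that reads the raw bytes of the file and counts the arguments by counting the NUL bytes that terminate them. The helper names read_proc_file() and count_args() are hypothetical, and the count assumes the usual layout in which every argument, including the last one, ends in a NUL.

```c
#define _POSIX_C_SOURCE 200809L  /* expose open()/read() even under strict -std=c99 */
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to cap bytes of a /proc file into buf. Returns the number of
 * bytes read, or -1 if the file could not be opened. */
ssize_t read_proc_file(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size_t total = 0;
    ssize_t n;
    /* /proc files may need several read() calls, so loop until EOF. */
    while (total < cap && (n = read(fd, buf + total, cap - total)) > 0)
        total += (size_t)n;
    close(fd);
    return (ssize_t)total;
}

/* Each argument in cmdline is terminated by a NUL byte, so counting the
 * NULs counts the arguments. An empty buffer yields 0. */
int count_args(const char *buf, size_t len)
{
    int count = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}
```

For example, read_proc_file("/proc/self/cmdline", buf, sizeof buf) returns the calling program’s own command line, and for the terminator process shown below count_args() would report 2.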
The first BusyBox function we’re interested in is procps_scan(). This reads /proc/<pid>/stat and retrieves the string comm, which is the “filename of the executable”.
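Extracting comm from a stat line needs a little care: the name is wrapped in parentheses and may itself contain spaces or even ')'. The sketch below, using the hypothetical helper name parse_stat_comm(), takes everything between the first '(' and the last ')'; it is an illustration, not BusyBox’s own parser. According to man proc, comm is also silently truncated to TASK_COMM_LEN (16) bytes including the terminating NUL, i.e. at most 15 visible characters.

```c
#include <stddef.h>
#include <string.h>

/* Copy the comm field of a /proc/<pid>/stat line into out.
 * comm is wrapped in parentheses and may contain spaces or ')', so take
 * everything between the FIRST '(' and the LAST ')'.
 * Returns 0 on success, -1 if the line is malformed or out is too small. */
int parse_stat_comm(const char *stat, char *out, size_t outlen)
{
    const char *lparen = strchr(stat, '(');
    const char *rparen = strrchr(stat, ')');
    if (lparen == NULL || rparen == NULL || rparen < lparen)
        return -1;
    size_t len = (size_t)(rparen - lparen - 1);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, lparen + 1, len);
    out[len] = '\0';
    return 0;
}
```

Applied to the terminator stat line shown below, it yields terminator.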
When outputting the rows of process information to the terminal, the function format_process() calls read_cmdline().
This retrieves the contents of /proc/<pid>/cmdline and sanitises them by replacing the NUL characters that delimit the command-line arguments with spaces. (I’ll refer to this sanitised string later as cmdline_contents.)
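The sanitising step might look roughly like the following sketch. sanitise_cmdline() is a hypothetical name; it only handles the NUL separators described above, and the real read_cmdline() may treat other bytes differently.

```c
#include <stddef.h>

/* Turn the raw bytes of /proc/<pid>/cmdline into one printable string:
 * trailing NULs are dropped and every NUL separating two arguments
 * becomes a space. buf must have room for len + 1 bytes; the result is
 * NUL-terminated in place. Returns the length of the resulting string. */
size_t sanitise_cmdline(char *buf, size_t len)
{
    /* Strip the terminator(s) after the last argument. */
    while (len > 0 && buf[len - 1] == '\0')
        len--;
    /* Every remaining NUL is a separator between arguments. */
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            buf[i] = ' ';
    buf[len] = '\0';
    return len;
}
```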
read_cmdline() then finds the name of the process actually being run by taking the basename of the first command-line argument, argv[0]. The strings basename(argv[0]) and comm are then compared. Based on the result, one of the following is displayed in the COMMAND column of ps:
- cmdline_contents, if the process names were equal
- {comm} cmdline_contents, if the process names were not equal
- [comm], if the cmdline file was empty
To demonstrate this, we can use the example of the stat and cmdline files for an instance of terminator running on my system:
$ cat /proc/14159/stat
14159 (terminator) S 2736 2568 2568 1026 2568 4194560 23680 508 68 1 3268 423 0 0 20 0 4 0 9123075 905396224 17607 18446744073709551615 94654019346432 94654022492352 140735972289904 0 0 0 0 16781312 65538 0 0 0 17 2 0 0 7 0 0 94654024590000 94654025079160 94654044663808 140735972297096 140735972297132 140735972297132 140735972298724 0
$ strings -n1 /proc/14159/cmdline
/usr/bin/python
/usr/bin/terminator
From stat, BusyBox extracts terminator as the variable comm.
The sanitised value of the cmdline file, cmdline_contents, will be /usr/bin/python /usr/bin/terminator.
The output of basename(argv[0]) here is python, which is not equal to terminator. The ps output will therefore be:
{terminator} /usr/bin/python /usr/bin/terminator
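Putting the steps together, the three display rules can be modelled in a short C function. This is a simplified sketch rather than BusyBox’s read_cmdline(): format_command() and CMDLINE_MAX are hypothetical names, the comparison is a plain equality check on basename(argv[0]) and comm as described above, and details such as truncation to the terminal width are ignored.

```c
#include <stdio.h>
#include <string.h>

#define CMDLINE_MAX 4096  /* arbitrary cap chosen for this sketch */

/* Hypothetical model of how the COMMAND column text is chosen.
 *   comm    - name from /proc/<pid>/stat, without the parentheses
 *   cmdline - raw bytes of /proc/<pid>/cmdline (NUL-separated arguments)
 *   len     - number of bytes in cmdline
 * The result is written to out (at most outlen bytes, NUL-terminated). */
void format_command(const char *comm, const char *cmdline, size_t len,
                    char *out, size_t outlen)
{
    char contents[CMDLINE_MAX];

    /* Copy the raw bytes (truncating if needed) and drop trailing NULs. */
    if (len > sizeof contents - 1)
        len = sizeof contents - 1;
    memcpy(contents, cmdline, len);
    while (len > 0 && contents[len - 1] == '\0')
        len--;
    contents[len] = '\0';

    /* Empty cmdline (zombie or kernel thread): show [comm]. */
    if (len == 0) {
        snprintf(out, outlen, "[%s]", comm);
        return;
    }

    /* argv[0] ends at the first NUL; its basename starts after the last '/'. */
    size_t argv0_len = strlen(contents);
    size_t base = 0;
    for (size_t i = 0; i < argv0_len; i++)
        if (contents[i] == '/')
            base = i + 1;
    int names_equal = (argv0_len - base == strlen(comm)) &&
                      memcmp(contents + base, comm, argv0_len - base) == 0;

    /* Build cmdline_contents: NULs between arguments become spaces. */
    for (size_t i = 0; i < len; i++)
        if (contents[i] == '\0')
            contents[i] = ' ';

    if (names_equal)
        snprintf(out, outlen, "%s", contents);              /* names equal */
    else
        snprintf(out, outlen, "{%s} %s", comm, contents);   /* names differ */
}
```

Fed the terminator files from above, this model reproduces the {terminator} /usr/bin/python /usr/bin/terminator line; fed an empty cmdline, it produces [comm].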
If the comm value in the stat file is “the filename of the executable”, why would it differ from the program listed as argv[0] in cmdline?
Certain executables are actually scripts that need an interpreter, e.g. Bash or Python, to run them.
Before running a program, Linux checks whether the file starts with a shebang (#!). If so, it runs the interpreter named on that line and passes it the path of the script as an argument. At program execution, argv[0] is therefore the path to the interpreter, not the command being executed, while comm is still taken from the name of the script file that was executed.
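The argument rewriting can be modelled as follows. This is a simplified sketch of the behaviour described above, not kernel code, and shebang_argv() and comm_for() are hypothetical names. It assumes a shebang of the form #!interp [arg]: the new argument list is the interpreter, the optional shebang argument, the path of the script, then the original arguments minus the original argv[0].

```c
#include <stddef.h>
#include <string.h>

/* Simplified model (not kernel code) of how argv is rewritten for a
 * script starting with "#!interp [arg]":
 *   new argv = interp, [arg], script path, original argv[1..]
 * The original argv[0] is discarded. out must have room for max entries.
 * Returns the new argc, or -1 if out is too small. */
int shebang_argv(const char *interp, const char *interp_arg,
                 const char *script_path, int argc, char *const argv[],
                 const char *out[], int max)
{
    int needed = 2 + (interp_arg != NULL) + (argc > 1 ? argc - 1 : 0);
    if (needed > max)
        return -1;
    int n = 0;
    out[n++] = interp;
    if (interp_arg != NULL)
        out[n++] = interp_arg;
    out[n++] = script_path;
    for (int i = 1; i < argc; i++)
        out[n++] = argv[i];
    return n;
}

/* comm, by contrast, comes from the file that was actually executed:
 * the basename of the script, not of the interpreter. (The kernel also
 * truncates it to 15 characters, which this sketch ignores.) */
const char *comm_for(const char *exec_path)
{
    const char *slash = strrchr(exec_path, '/');
    return slash ? slash + 1 : exec_path;
}
```

Assuming terminator’s shebang line is #!/usr/bin/python, as its cmdline above suggests, the model yields argv = /usr/bin/python /usr/bin/terminator while comm stays terminator, matching the stat and cmdline files shown earlier.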
I started wondering what would happen to /proc/<pid>/cmdline if a program modified its own argv. I first checked whether this is actually legal behaviour in C. It is, as documented in the C99 standard (section 5.1.2.2.1):
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
To test how Linux behaves when these strings are modified, I put together the following program, modify_argv:
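#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(int const argc, char* argv[])
{
    int argv_index = 0;
    pid_t this_pid = getpid();

    printf("pid is %d\n", (int)this_pid);
    getchar();  /* wait so cmdline can be inspected before the change */

    /* Overwrite every character of every argument with 'x'. The NUL
     * terminators are left alone, so the argument lengths don't change. */
    while (argv_index < argc)
    {
        memset(argv[argv_index], 'x', strlen(argv[argv_index]));
        argv_index++;
    }

    getchar();  /* wait again so cmdline can be inspected after the change */
    return 0;
}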
The outputs below show the contents of the above program’s /proc/<pid>/cmdline before and after user input is provided to trigger the modification. Modifications made by a program to its argv are reflected in the cmdline file.
$ strings -n1 /proc/$(pgrep modify_argv)/cmdline
./modify_argv
testing
testing
one
two
three
$ strings -n1 /proc/$(pgrep modify_argv)/cmdline
xxxxxxxxxxxxx
xxxxxxx
xxxxxxx
xxx
xxx
xxxxx
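The manual strings check can also be automated from inside a program. The following sketch overwrites its own arguments in the same way as modify_argv and rereads /proc/self/cmdline. The function names scribble_argv(), read_self_cmdline() and all_scribbled() are hypothetical, and a Linux system with /proc mounted is assumed.

```c
#define _POSIX_C_SOURCE 200809L  /* expose open()/read() even under strict -std=c99 */
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite each argument with 'x' characters, keeping the NUL
 * terminators, just as modify_argv does. */
void scribble_argv(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++)
        memset(argv[i], 'x', strlen(argv[i]));
}

/* Read up to cap bytes of this process' own /proc/self/cmdline.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_self_cmdline(char *buf, size_t cap)
{
    int fd = open("/proc/self/cmdline", O_RDONLY);
    if (fd < 0)
        return -1;
    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(fd, buf + total, cap - total)) > 0)
        total += (size_t)n;
    close(fd);
    return (ssize_t)total;
}

/* Returns 1 if every byte is either 'x' or a NUL separator, else 0. */
int all_scribbled(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 'x' && buf[i] != '\0')
            return 0;
    return 1;
}
```

A caller would read the file once, call scribble_argv(argc, argv) from main(), read it again, and expect all_scribbled() to report that only 'x' and NUL bytes remain, with the length unchanged.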
After modification, the output from BusyBox ps is:
PID USER TIME COMMAND
1908 alan 0:00 {modify_argv} xxxxxxxxxxxxx xxxxxxx xxxxxxx xxx xxx xxxxx
The man page for proc suggests that if a process’ cmdline is empty, it is likely a zombie.
The following program was written to test this. It forks a child process that exits after five seconds; because the parent never calls wait(), the child becomes a zombie until the parent itself exits.
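#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    if (fork())
    {
        /* Parent: block on input without ever calling wait(), so the
         * exited child is never reaped and stays a zombie. */
        getchar();
    }
    else
    {
        /* Child: exit after five seconds. */
        sleep(5);
    }
    return 0;
}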
A quick check of the process’ cmdline file shows that it is in fact empty once it becomes a zombie. We can also observe this change in the output of ps before sleep(5) expires:
8702 alan 0:00 ./zombie.exe
8703 alan 0:00 ./zombie.exe
and after sleep(5) expires:
8702 alan 0:00 ./zombie.exe
8703 alan 0:00 [zombie.exe]
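The zombie observation can also be checked programmatically. In the sketch below, a child exits immediately; the parent polls the state field of /proc/<pid>/stat until it reads Z, reads the child’s cmdline, and only then reaps it with waitpid(). The names read_file(), proc_state() and zombie_cmdline_len() are hypothetical. If the observation above holds, the returned cmdline length is 0.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork(), nanosleep(), etc. */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read up to cap bytes of a /proc file; returns bytes read or -1. */
static ssize_t read_file(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(fd, buf + total, cap - total)) > 0)
        total += (size_t)n;
    close(fd);
    return (ssize_t)total;
}

/* State letter of a process (third field of /proc/<pid>/stat, found just
 * after the last ')'), or '?' if it cannot be read. */
char proc_state(pid_t pid)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    ssize_t n = read_file(path, buf, sizeof buf - 1);
    if (n <= 0)
        return '?';
    buf[n] = '\0';
    char *rparen = strrchr(buf, ')');
    if (rparen == NULL || rparen[1] != ' ')
        return '?';
    return rparen[2];
}

/* Fork a child that exits at once, wait (without reaping it) until it is
 * a zombie, then return the length of its cmdline, or -1 on error/timeout. */
ssize_t zombie_cmdline_len(void)
{
    signal(SIGCHLD, SIG_DFL);   /* SIG_IGN would make the kernel auto-reap */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    ssize_t len = -1;
    struct timespec delay = {0, 1000000};   /* 1 ms between polls */
    for (int tries = 0; tries < 1000; tries++) {
        if (proc_state(pid) == 'Z') {
            char path[64], buf[256];
            snprintf(path, sizeof path, "/proc/%d/cmdline", (int)pid);
            len = read_file(path, buf, sizeof buf);
            break;
        }
        nanosleep(&delay, NULL);
    }
    waitpid(pid, NULL, 0);      /* finally reap the zombie */
    return len;
}
```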
The following comment in the BusyBox source code suggests that there is another reason why cmdline might be empty:
/* Puts [comm] if cmdline is empty (-> process is a kernel thread) */
The process [oom_reaper] listed in the ps output certainly sounds like it could be a kernel thread. We can check this by finding the parent of the process, whose PID is the fourth value in the stat file.
$ cat /proc/38/stat
38 (oom_reaper) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 4 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
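Reading the parent PID is the same kind of parsing as reading comm: the safest approach is to start after the last ')' so that unusual process names cannot shift the fields. stat_ppid() below is a hypothetical helper for illustration; on the oom_reaper line above it returns 2.

```c
#include <stdio.h>
#include <string.h>

/* Return the parent PID (4th field) from the text of /proc/<pid>/stat,
 * or -1 if the line cannot be parsed. Fields after comm are read from
 * the LAST ')' so that a comm containing spaces or ')' cannot shift them. */
int stat_ppid(const char *stat)
{
    const char *rparen = strrchr(stat, ')');
    char state;
    int ppid;
    if (rparen == NULL || sscanf(rparen + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return ppid;
}
```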
Inspecting the parent’s stat file, /proc/2/stat, shows that its comm string is kthreadd. While the name itself is a giveaway, the following quote from LWN confirms it:
kthreadd is the kernel thread daemon in charge of asynchronously spawning new kernel threads whenever requested
The command pstree -p 2 lists the running processes descended from kthreadd. From a manual comparison of pstree and ps on my system, the only process listed in [square brackets] that was not a child of kthreadd was a zombie. I’m unsure whether there is a third reason why a process might have an empty cmdline.
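The manual comparison can be approximated with a small classifier for processes whose cmdline is empty. This sketch encodes only the two explanations found above: state Z means zombie, and kthreadd or one of its direct children means kernel thread. It assumes kthreadd is PID 2, which holds on a normal boot but not necessarily inside a PID namespace such as a container. classify_empty_cmdline() is a hypothetical name, and anything else is reported as unknown rather than guessed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical classifier for a process whose cmdline is empty, given the
 * text of its /proc/<pid>/stat:
 *   - state 'Z' (third field)                      -> "zombie"
 *   - kthreadd itself (PID 2) or one of its children -> "kernel thread"
 *   - anything else                                -> "unknown"
 * Assumes kthreadd is PID 2, as on the system examined above. */
const char *classify_empty_cmdline(const char *stat)
{
    const char *rparen = strrchr(stat, ')');
    int pid, ppid;
    char state;
    if (rparen == NULL || sscanf(stat, "%d", &pid) != 1 ||
        sscanf(rparen + 1, " %c %d", &state, &ppid) != 2)
        return "unparseable";
    if (state == 'Z')
        return "zombie";
    if (pid == 2 || ppid == 2)
        return "kernel thread";
    return "unknown";
}
```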