OS

Processes

The Operating System virtualizes the CPU to promote the illusion of a nearly-endless supply of CPUs1

You can check your process limits with

 ulimit -u
62133 # calculated based on available ram i think
 sysctl kernel.pid_max
kernel.pid_max = 32768 # legacy default value
 sysctl kernel.threads-max
kernel.threads-max = 124266 # physical upper limit i imagine

pid_max and the ulimit are simply pid number limits, whereas threads-max is the max number of entries in the task table/list (usually a combo of hashtable + doubly linked list in linux)

Process API

A bare minimum interface for managing processes must include the following:

  1. create a process
  2. destroy a process forcefully
  3. wait for a process to finish running
  4. suspend pause a process
  5. status get info about state or how long it has run

UNIX API

UNIX gives you 4 routines to work with processes fork()2, exec(), exit(), and wait()

Process Creation

first load the program i.e. its code and static data (initted variables) into RAM, the address space of the process (virtual memory)

loading can be eager (entire program) or lazy (pieces of program) (paging/swapping etc)

allocate memory for running the program:

  1. stack
  2. heap (optional)

Interesting to Note

  • the os is responsible for populating argc/argv etc
  • the os is responsible for adding a canary before the return value in the stack3

allocates file descriptors to process? (i/o setup)

jumps to main() and transfers control of the cpu to the process

Process States

The exact states used by linux are

  1. Running:
    • The process is either running
    • or ready to run
  2. Waiting:
    • The process is waiting for an event or resource
    • It may be interruptible or uninterruptible by hardware signals
  3. Stopped:
    • process has been stopped, usually by receiving a signal
    • a debugged process may be in a stopped state
  4. Zombie:
    • A halted process which still has a task_struct in the task vector
    • a dead process
/* get_task_state(): */
#define TASK_REPORT			(TASK_RUNNING | TASK_INTERRUPTIBLE | \
					 TASK_UNINTERRUPTIBLE | __TASK_STOPPED | \
					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
					 TASK_PARKED)

Process Structures

  1. process list / task list
  2. process struct4 5 6 / Process Control Block (PCB) / process descriptor
  3. register context
// the registers xv6 will save and restore
// to stop and subsequently restart a process
struct context {
	int eip;
	int esp;
	int ebx;
	int ecx;
	int edx;
	int esi;
	int edi;
	int ebp;
};
// the different states a process can be in
enum proc_state { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };
 
// the information xv6 tracks about each process
// including its register context and state
struct proc {
	char *mem; // Start of process memory
	uint sz; // Size of process memory
	char *kstack; // Bottom of kernel stack
	// for this process
	enum proc_state state; // Process state
	int pid; // Process ID
	struct proc *parent; // Parent process
	void *chan; // If !zero, sleeping on chan
	int killed; // If !zero, has been killed
	struct file *ofile[NOFILE]; // Open files
	struct inode *cwd; // Current directory
	struct context context; // Switch here to run process
	struct trapframe *tf; // Trap frame for the
	// current interrupt
};

fork() syscall

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> // POSIX API
 
int main(int argc, char *argv[]) {
  printf("hello (pid:%d)\n", (int) getpid());
  int rc = fork();
  if (rc < 0) {
    // fork failed
    fprintf(stderr, "fork failed\n");
    exit(1);
  } else if (rc == 0) {
    // child (new process)
    printf("child (pid:%d)\n", (int) getpid());
  } else {
    // parent goes down this path (main)
    printf("parent of %d (pid:%d)\n", rc, (int) getpid());
  }
  return 0;
 }

fork() is a way to create a new process which is an almost exact copy of the calling process.

  1. the child continues from the fork() call, and not from main()
  2. the child has its own copy of the address space etc
  3. the child returns 0 from fork() whereas the parent returns the PID of the child

wait() syscall

The wait() system call suspends execution of the calling thread until one of its children terminates.

Caution

wait() waits for any child to terminate (not all children either). to deterministically wait for a specific child you use waitpid()

This basically causes a parent to block until its child terminates and adds determinism to the forking process

else {
  // parent goes down this path (main)
  int status;                
  int wc = wait(&status);    // wc = pid of child that just terminated
                             // status now contains interesting information about how the child terminated
  printf("parent of %d (wc:%d) (pid:%d)\n", rc, wc, (int) getpid());
  if (WIFEXITED(status)) {
    printf("child EXIT_CODE %d\n", WEXITSTATUS(status));
  }
}

check out this

man 2 wait

Orphan Processes

If the parent terminates before the child, the child is orphaned and immediately adopted by init and reaped

We sometimes want to intentionally orphan a child, usually in the case of daemons. In those cases you can fork() twice and terminate the child, letting the grandchild be adopted by init and run indefinitely

exec() syscall

execute the program file given to it with specified arguments as a new process

FunctionSignatureArgument FormatEnvironmentPath Resolution
execv(path, argv[])Vector (Array)InheritedLiteral Path
execl(path, arg0, ..., NULL)List (Variadic)InheritedLiteral Path
execve(path, argv[], envp[])Vector (Array)ExplicitLiteral Path
fexecve(fd, argv[], envp[])Vector (Array)ExplicitFile Descriptor
execle(path, arg0, ..., NULL, envp[])List (Variadic)ExplicitLiteral Path
execvp(file, argv[])Vector (Array)Inherited$PATH Search
execlp(file, arg0, ..., NULL)List (Variadic)Inherited$PATH Search

given the name of an executable (e.g., wc), and some arguments (e.g., p3.c), it loads code (and static data) from that executable and overwrites its current code segment (and current static data) with it; the heap and stack and other parts of the memory space of the program are re-initialized. Then the OS simply runs that program, passing in any arguments as the argv of that process. Warning: Although theoretically you can create as many processes as you want, systems always have some limits. Therefore, always check to see if the returned value of fork() is negative and report the result. If this does happen, try to reduce the number of child processes, or re-organize your program.

exec() never returns!

It does not create a new process; rather, it transforms the currently running program (formerly p3) into a different running program (wc). After the exec() in the child, it is almost as if p3.c never ran a successful call to exec() never returns.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
 
int main(int argc, char *argv[]) {
  printf("hello (pid:%d)\n", (int) getpid());
  int rc = fork();
  if (rc < 0) { // fork failed; exit
  fprintf(stderr, "fork failed\n");
  exit(1);
  } else if (rc == 0) { // child (new process)
    printf("child (pid:%d)\n", (int) getpid());
    char *myargs[3];
    myargs[0] = strdup("wc");   // program: "wc"
    myargs[1] = strdup("p3.c"); // arg: input file
    myargs[2] = NULL;           // mark end of array
    execvp(myargs[0], myargs);  // runs word count
    printf("this shouldn’t print out");
  } else { // parent goes down this path
    int rc_wait = wait(NULL);
    printf("parent of %d (rc_wait:%d) (pid:%d)\n",
    rc, rc_wait, (int) getpid());
  }
  return 0;
 }
 ./a.out 
hello (pid:31450)
child (pid:31451)
 26 108 819 p3.c
parent of 31451 (rc_wait:31451) (pid:31450)

and indeed, the child does nothing except the exec() call

How your Shell runs processes

  1. type command to shell
  2. fork process
  3. exec command
  4. wait for child
  5. return to shell

the gap between fork and exec lets you do cool stuff like piping and redirection

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>
 
int main(int argc, char *argv[]) {
  int rc = fork();
  if (rc < 0) {
    // fork failed
    fprintf(stderr, "fork failed\n");
    exit(1);
  } else if (rc == 0) {
    // child: redirect standard output to a file
    close(STDOUT_FILENO);
    open("./p4.output", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU);
    // now exec "wc"...
    char *myargs[3];
    myargs[0] = strdup("wc"); // program: wc
    myargs[1] = strdup("p4.c"); // arg: file to count
    myargs[2] = NULL; // mark end of array
    execvp(myargs[0], myargs); // runs word count
  } else {
    // parent goes down this path (main)
    int rc_wait = wait(NULL);
  }
  return 0;
}

process control and users

  • the kill() syscall sends signals to processes
  • the signal() syscall is used by the process to “catch” signals (sigaction() is recommended instead)
  • further your shell is hardwired to send
    • SIGINT(interrupt) on CTRL+C: usually terminating the process
    • SIGSTP(stop) on CTRL+Z: pausing the process mid-execution

Linux and most BSDs support 31 signals

/* ISO C99 signals.  */
#define SIGINT      2   /* Interactive attention signal.  */
#define SIGILL      4   /* Illegal instruction.  */
#define SIGABRT     6   /* Abnormal termination.  */
#define SIGFPE      8   /* Erroneous arithmetic operation.  */
#define SIGSEGV     11  /* Invalid access to storage.  */
#define SIGTERM     15  /* Termination request.  */
 
/* Historical signals specified by POSIX. */
#define SIGHUP      1   /* Hangup.  */
#define SIGQUIT     3   /* Quit.  */
#define SIGTRAP     5   /* Trace/breakpoint trap.  */
#define SIGKILL     9   /* Killed.  */
#define SIGPIPE     13  /* Broken pipe.  */
#define SIGALRM     14  /* Alarm clock.  */
 
/* Adjustments and additions to the signal number constants for
   most Linux systems.  */
 
#define SIGSTKFLT   16  /* Stack fault (obsolete).  */
#define SIGPWR      30  /* Power failure imminent.  */
 
/* Historical signals specified by POSIX. */
#define SIGBUS       7  /* Bus error.  */
#define SIGSYS      31  /* Bad system call.  */
 
/* New(er) POSIX signals (1003.1-2008, 1003.1-2013).  */
#define SIGURG      23  /* Urgent data is available at a socket.  */
#define SIGSTOP     19  /* Stop, unblockable.  */
#define SIGTSTP     20  /* Keyboard stop.  */
#define SIGCONT     18  /* Continue.  */
#define SIGCHLD     17  /* Child terminated or stopped.  */
#define SIGTTIN     21  /* Background read from control terminal.  */
#define SIGTTOU     22  /* Background write to control terminal.  */
#define SIGPOLL     29  /* Pollable event occurred (System V).  */
#define SIGXFSZ     25  /* File size limit exceeded.  */
#define SIGXCPU     24  /* CPU time limit exceeded.  */
#define SIGVTALRM   26  /* Virtual timer expired.  */
#define SIGPROF     27  /* Profiling timer expired.  */
#define SIGUSR1     10  /* User-defined signal 1.  */
#define SIGUSR2     12  /* User-defined signal 2.  */
 
/* Nonstandard signals found in all modern POSIX systems
   (including both BSD and Linux).  */
#define SIGWINCH    28  /* Window size change (4.3 BSD, Sun).  */

the concept of process control begets the concept of users and clarifying who in fact can control the process


While our passion for the UNIX process API remains strong, we should also note that such positivity is not uniform. For example, a recent paper by systems researchers from Microsoft, Boston University, and ETH in Switzerland details some problems with fork(), and advocates for other, simpler process creation APIs such as spawn() 2. Read it, and the related work it refers to, to understand this different vantage point

Footnotes

  1. https://www.man7.org/linux//man-pages/man5/proc_pid_limits.5.html and https://linuxvox.com/blog/maximum-number-of-processes-in-linux/

  2. https://www.microsoft.com/en-us/research/wp-content/uploads/2019/04/fork-hotos19.pdf 2

  3. https://www.kernel.org/doc/html/v4.19/security/self-protection.html#stack-buffer-overflow kernel self protection mechanisms

  4. https://tldp.org/LDP/tlk/kernel/processes.html Linux implementation of processes

  5. https://tldp.org/LDP/lki/lki-2.html linux task_struct

  6. https://github.com/torvalds/linux/blob/master/include/linux/sched.h#L819 exact task_struct definition (fucking huge and complex)