Processes and the Process API
separate policy and mechanism
process api
- create
- destroy
- wait
- misc control
- status
creation
first load the program, i.e. its code and static data (initialized variables), from disk into memory, into the address space of the process (virtual memory)
loading can be eager (entire program) or lazy (pieces of program) (paging/swapping etc)
allocate memory for running the program →
- stack
- heap (optional)
the os is responsible for populating argc/argv etc
sets up i/o: on UNIX each process starts with three open file descriptors (stdin, stdout, stderr)
jumps to main()
and transfers control of the cpu to the process
process states
- running
- ready
- blocked
data structures
- process list / task list
- process struct [1] / Process Control Block (PCB) / process descriptor
- register context
// the registers xv6 will save and restore
// to stop and subsequently restart a process
struct context {
    int eip;
    int esp;
    int ebx;
    int ecx;
    int edx;
    int esi;
    int edi;
    int ebp;
};

// the different states a process can be in
enum proc_state { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

// the information xv6 tracks about each process
// including its register context and state
struct proc {
    char *mem;                  // Start of process memory
    uint sz;                    // Size of process memory
    char *kstack;               // Bottom of kernel stack for this process
    enum proc_state state;      // Process state
    int pid;                    // Process ID
    struct proc *parent;        // Parent process
    void *chan;                 // If !zero, sleeping on chan
    int killed;                 // If !zero, has been killed
    struct file *ofile[NOFILE]; // Open files
    struct inode *cwd;          // Current directory
    struct context context;     // Switch here to run process
    struct trapframe *tf;       // Trap frame for the current interrupt
};
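EMBRYO: the process is still being created
ZOMBIE: the process has exited but its parent has not yet called wait() to clean it up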
[1] https://tldp.org/LDP/tlk/kernel/processes.html → Linux implementation of processes
[2] https://tldp.org/LDP/lki/lki-2.html → linux task_struct
[3] https://github.com/torvalds/linux/blob/master/include/linux/sched.h#L813 → exact task_struct definition (huge and complex)
fork() syscall
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> // POSIX API

int main(int argc, char *argv[]) {
    printf("hello (pid:%d)\n", (int) getpid());
    int rc = fork();
    if (rc < 0) {
        // fork failed
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {
        // child (new process)
        printf("child (pid:%d)\n", (int) getpid());
    } else {
        // parent goes down this path (main)
        printf("parent of %d (pid:%d)\n",
               rc, (int) getpid());
    }
    return 0;
}
fork() is a way to create a new process which is an almost exact copy of the calling process.
- the child continues from the fork() call, and not from main()
- the child has its own copy of the address space etc
- fork() returns 0 in the child, whereas in the parent it returns the PID of the child
wait() syscall
The wait() system call suspends execution of the calling thread until one of its children terminates.
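adds determinism to forking processes: without wait(), parent and child may run in either order; with it, the parent continues only after the child has finished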
exec() syscall
execute the program file given to it with specified arguments as a new process image (the calling process is replaced; no new process is created)
~ info pages
int execv (const char *FILENAME, char *const ARGV[])
int execl (const char *FILENAME, const char *ARG0, ...)
int execve (const char *FILENAME, char *const ARGV[], char *const ENV[])
int fexecve (int FD, char *const ARGV[], char *const ENV[])
int execle (const char *FILENAME, const char *ARG0, ..., char *const ENV[])
int execvp (const char *FILENAME, char *const ARGV[])
- looks for filename in PATH env var → for system utilities
int execlp (const char *FILENAME, const char *ARG0, ...)
What it does: given the name of an executable (e.g., wc), and some arguments (e.g., p3.c), it loads code (and static data) from that executable and overwrites its current code segment (and current static data) with it; the heap and stack and other parts of the memory space of the program are re-initialized. Then the OS simply runs that program, passing in any arguments as the argv of that process. Thus, it does not create a new process; rather, it transforms the currently running program (formerly p3) into a different running program (wc). After the exec() in the child, it is almost as if p3.c never ran; a successful call to exec() never returns.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    printf("hello (pid:%d)\n", (int) getpid());
    int rc = fork();
    if (rc < 0) { // fork failed; exit
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) { // child (new process)
        printf("child (pid:%d)\n", (int) getpid());
        char *myargs[3];
        myargs[0] = strdup("wc");   // program: "wc"
        myargs[1] = strdup("p3.c"); // arg: input file
        myargs[2] = NULL;           // mark end of array
        execvp(myargs[0], myargs);  // runs word count
        printf("this shouldn’t print out");
    } else { // parent goes down this path
        int rc_wait = wait(NULL);
        printf("parent of %d (rc_wait:%d) (pid:%d)\n",
               rc, rc_wait, (int) getpid());
    }
    return 0;
}
❯ ./a.out
hello (pid:31450)
child (pid:31451)
26 108 819 p3.c
parent of 31451 (rc_wait:31451) (pid:31450)
and indeed, the child does nothing except the exec() call
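how a shell runs a command: type command to shell → shell calls fork() to create a child process → child calls exec() to run the command → shell calls wait() for the child → return to shell prompt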
the gap between fork and exec lets you do cool stuff like piping and redirection: the child can rearrange its file descriptors before the new program starts
// p4.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    int rc = fork();
    if (rc < 0) {
        // fork failed
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {
        // child: redirect standard output to a file
        close(STDOUT_FILENO);
        open("./p4.output", O_CREAT|O_WRONLY|O_TRUNC,
             S_IRWXU);
        // now exec "wc"...
        char *myargs[3];
        myargs[0] = strdup("wc");   // program: wc
        myargs[1] = strdup("p4.c"); // arg: file to count
        myargs[2] = NULL;           // mark end of array
        execvp(myargs[0], myargs);  // runs word count
    } else {
        // parent goes down this path (main)
        int rc_wait = wait(NULL);
    }
    return 0;
}
process control and users
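kill() syscall: sends a signal to a process
- ctrl-c → SIGINT (interrupt, normally terminates the process)
- ctrl-z → SIGTSTP (stop, pauses the process; it can be resumed later, e.g. with fg)
signal() syscall: lets a process catch signals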
process control raises the question of who is allowed to control a process, which is where users come in: a user can generally control only their own processes (the superuser being the exception)
While our passion for the UNIX process API remains strong, we should also note that such positivity is not uniform. For example, a recent paper by systems researchers from Microsoft, Boston University, and ETH in Switzerland details some problems with fork(), and advocates for other, simpler process creation APIs such as spawn() [B+19]. Read it, and the related work it refers to, to understand this different vantage point
[4] https://www.microsoft.com/en-us/research/wp-content/uploads/2019/04/fork-hotos19.pdf