os virtualisation cpu

Processes and the Process API

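separate policy and mechanism: mechanisms are the low-level "how" (e.g. a context switch), policies are the "which" (e.g. which process the scheduler runs next)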

process api

  1. create
  2. destroy
  3. wait
  4. misc control
  5. status

creation

first load the program, i.e. its code and static data (initialized variables), from disk into memory, into the address space of the process (virtual memory)

loading can be eager (the entire program up front) or lazy (pieces loaded only as they are needed, via paging/swapping etc)

allocate memory for running the program

  1. stack
  2. heap (optional)

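the os initializes the stack with the arguments to main(), i.e. argc and argv

i/o setup: on UNIX systems each process starts with three open file descriptors by default (standard input, output, and error)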

jumps to main() and transfers control of the cpu to the process

process states

  1. running
  2. ready
  3. blocked
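
The transitions between these three states can be sketched as a small state machine. This is only an illustration: the enum and function names below are hypothetical and do not come from any real kernel. A ready process runs when scheduled, a running process goes back to ready when descheduled or becomes blocked when it starts I/O, and a blocked process becomes ready again when the I/O completes.

```c
/* Hypothetical names for this sketch; not taken from any real kernel. */
enum pstate { P_RUNNING, P_READY, P_BLOCKED };
enum pevent { EV_SCHEDULED, EV_DESCHEDULED, EV_IO_START, EV_IO_DONE };

/* Return the state that follows `s` when event `e` happens.
 * Events that make no sense in the current state leave it unchanged. */
enum pstate next_state(enum pstate s, enum pevent e) {
    switch (s) {
    case P_READY:
        return e == EV_SCHEDULED ? P_RUNNING : s;
    case P_RUNNING:
        if (e == EV_DESCHEDULED) return P_READY;
        if (e == EV_IO_START)    return P_BLOCKED;
        return s;
    case P_BLOCKED:
        return e == EV_IO_DONE ? P_READY : s;
    }
    return s;
}
```

Note that a blocked process never goes straight back to running: when its I/O finishes it becomes ready, and the scheduler decides when it runs again.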

data structures

(girls like men who do competitive programming)

  1. process list / task list
  2. process struct[1] / Process Control Block (PCB) / process descriptor
  3. register context

    // the registers xv6 will save and restore
    // to stop and subsequently restart a process
    struct context {
        int eip;
        int esp;
        int ebx;
        int ecx;
        int edx;
        int esi;
        int edi;
        int ebp;
    };

    // the different states a process can be in
    enum proc_state { UNUSED, EMBRYO, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

    // the information xv6 tracks about each process
    // including its register context and state
    struct proc {
        char *mem;                  // Start of process memory
        uint sz;                    // Size of process memory
        char *kstack;               // Bottom of kernel stack for this process
        enum proc_state state;      // Process state
        int pid;                    // Process ID
        struct proc *parent;        // Parent process
        void *chan;                 // If !zero, sleeping on chan
        int killed;                 // If !zero, has been killed
        struct file *ofile[NOFILE]; // Open files
        struct inode *cwd;          // Current directory
        struct context context;     // Switch here to run process
        struct trapframe *tf;       // Trap frame for the current interrupt
    };

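xv6 has states beyond the basic three: EMBRYO (the process is still being created) and ZOMBIE (the process has exited but its parent has not yet cleaned it up with wait())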

[1] https://tldp.org/LDP/tlk/kernel/processes.html Linux implementation of processes
[2] https://tldp.org/LDP/lki/lki-2.html Linux task_struct
[3] https://github.com/torvalds/linux/blob/master/include/linux/sched.h#L813 exact task_struct definition (huge and complex)

fork() syscall

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h> // POSIX API

    int main(int argc, char *argv[]) {
        printf("hello (pid:%d)\n", (int) getpid());
        int rc = fork();
        if (rc < 0) {
            // fork failed
            fprintf(stderr, "fork failed\n");
            exit(1);
        } else if (rc == 0) {
            // child (new process)
            printf("child (pid:%d)\n", (int) getpid());
        } else {
            // parent goes down this path (main)
            printf("parent of %d (pid:%d)\n",
                   rc, (int) getpid());
        }
        return 0;
    }

fork() is a way to create a new process which is an almost exact copy of the calling process.

  1. the child starts executing at the return from fork(), not at main()
  2. the child has its own copy of the address space, registers, etc
  3. fork() returns 0 in the child, and the PID of the child in the parent
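
A small sketch of point 2, assuming a POSIX system. The helper name fork_copy_demo and the values 100 and 42 are made up for illustration. The child changes a variable and reports its value through its exit status; the parent collects that status with waitpid() and then checks that its own copy is untouched.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, let the child change `x`, and report both views.
 * Returns 0 on success (filling in both out-parameters), -1 on error. */
int fork_copy_demo(int *parent_x, int *child_x) {
    int x = 100;
    pid_t rc = fork();
    if (rc < 0)
        return -1;
    if (rc == 0) {
        x = 42;     /* only the child's copy changes */
        _exit(x);   /* report the child's value through its exit status */
    }
    int status;
    if (waitpid(rc, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *parent_x = x;                   /* the parent's copy was never touched */
    *child_x = WEXITSTATUS(status);  /* what the child saw after its write */
    return 0;
}
```

After a successful call, the parent's value is still 100 while the child's is 42, even though both came from the same variable in the source.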

wait() syscall

The wait() system call suspends execution of the calling thread until one of its children terminates.

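adds determinism to forking processes: without wait(), whether the parent or the child runs first is up to the scheduler; with wait(), the parent always continues only after the child has finished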
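
A sketch of that ordering guarantee, assuming a POSIX system. The helper name ordered_by_wait is hypothetical. Child and parent each write one byte into the same pipe; because the parent calls wait() before writing, the child's byte should always come first.

```c
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: child writes 'c', parent writes 'p' into one pipe.
 * The parent waits for the child first, so out[] ends up holding "cp".
 * Returns 0 on success, -1 on error. `out` needs room for 3 bytes. */
int ordered_by_wait(char out[3]) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t rc = fork();
    if (rc < 0)
        return -1;
    if (rc == 0) {
        /* child: runs whenever the scheduler picks it */
        if (write(fds[1], "c", 1) != 1)
            _exit(1);
        _exit(0);
    }
    if (wait(NULL) != rc)            /* block until the child has terminated */
        return -1;
    if (write(fds[1], "p", 1) != 1)  /* parent writes only after the child */
        return -1;
    close(fds[1]);
    ssize_t n = read(fds[0], out, 2);
    close(fds[0]);
    if (n != 2)
        return -1;
    out[2] = '\0';
    return 0;
}
```

Removing the wait() call would make the order of the two bytes depend on scheduling.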

exec() syscall

replaces the program running in the calling process with the given program file and arguments; it does not create a new process

~ info pages

  1. int execv (const char *FILENAME, char *const ARGV[])
  2. int execl (const char *FILENAME, const char *ARG0, ...)
  3. int execve (const char *FILENAME, char *const ARGV[], char *const ENV[])
  4. int fexecve (int FD, char *const ARGV[], char *const ENV[])
  5. int execle (const char *FILENAME, const char *ARG0, ..., char *const ENV[])
  6. int execvp (const char *FILENAME, char *const ARGV[])
    1. searches the directories listed in the PATH env var for FILENAME when it contains no slash (this is how system utilities like wc are found)
  7. int execlp (const char *FILENAME, const char *ARG0, ...)
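
The suffixes encode the calling convention: "l" variants take the arguments inline as a NULL-terminated list, "v" variants take an argv array, "p" searches PATH, and "e" takes an explicit environment. A minimal sketch of the "l" form follows, assuming /bin/sh exists; the helper name run_sh is made up for illustration.

```c
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c <script>` in a child process and return
 * its exit status, or -1 on error. Assumes /bin/sh exists. */
int run_sh(const char *script) {
    pid_t rc = fork();
    if (rc < 0)
        return -1;
    if (rc == 0) {
        /* "l" variant: full path, then argv[0], argv[1], ... ending in NULL */
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);  /* reached only if execl() itself failed */
    }
    int status;
    if (waitpid(rc, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_sh("exit 3") should return 3. The execvp() form builds the same arguments as an array instead, which is more convenient when the argument count is not known at compile time.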

What it does: given the name of an executable (e.g., wc), and some arguments (e.g., p3.c), it loads code (and static data) from that executable and overwrites its current code segment (and current static data) with it; the heap and stack and other parts of the memory space of the program are re-initialized. Then the OS simply runs that program, passing in any arguments as the argv of that process. Thus, it does not create a new process; rather, it transforms the currently running program (formerly p3) into a different running program (wc). After the exec() in the child, it is almost as if p3.c never ran; a successful call to exec() never returns.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    #include <sys/wait.h>

    int main(int argc, char *argv[]) {
        printf("hello (pid:%d)\n", (int) getpid());
        int rc = fork();
        if (rc < 0) { // fork failed; exit
            fprintf(stderr, "fork failed\n");
            exit(1);
        } else if (rc == 0) { // child (new process)
            printf("child (pid:%d)\n", (int) getpid());
            char *myargs[3];
            myargs[0] = strdup("wc");   // program: "wc"
            myargs[1] = strdup("p3.c"); // arg: input file
            myargs[2] = NULL;           // mark end of array
            execvp(myargs[0], myargs);  // runs word count
            printf("this shouldn't print out");
        } else { // parent goes down this path
            int rc_wait = wait(NULL);
            printf("parent of %d (rc_wait:%d) (pid:%d)\n",
                   rc, rc_wait, (int) getpid());
        }
        return 0;
    }

    $ ./a.out
    hello (pid:31450)
    child (pid:31451)
     26 108 819 p3.c
    parent of 31451 (rc_wait:31451) (pid:31450)

and indeed, once exec() succeeds nothing after it in the child runs: the "this shouldn't print out" line never appears in the output

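what a shell does for each command:

  1. you type a command at the prompt
  2. the shell calls fork() to create a child process
  3. the child calls exec() to run the command
  4. the shell calls wait() until the child finishes
  5. the shell returns to the prompt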
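
The core of that cycle can be sketched in one function. This is a deliberately naive illustration rather than how any real shell is written: the name run_line and the MAX_ARGS limit are made up, and it splits on whitespace only, with no quoting, pipes, or redirection.

```c
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 16  /* arbitrary limit for this sketch */

/* Hypothetical helper: run one command line the way a very simple shell
 * would. Modifies `line` in place. Returns the command's exit status,
 * 127 if it could not be executed, or -1 on other errors. */
int run_line(char *line) {
    char *argv[MAX_ARGS + 1];
    int argc = 0;
    for (char *tok = strtok(line, " \t\n"); tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    argv[argc] = NULL;
    if (argc == 0)
        return 0;                    /* empty line: nothing to do */

    pid_t rc = fork();               /* create the child */
    if (rc < 0)
        return -1;
    if (rc == 0) {
        execvp(argv[0], argv);       /* child becomes the command */
        _exit(127);                  /* only reached if exec failed */
    }
    int status;
    if (waitpid(rc, &status, 0) < 0) /* parent waits for the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real shell would wrap this in a loop that prints a prompt and reads the next line (for example with fgets()), calling something like run_line() on each one.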

the gap between fork and exec lets you do cool stuff like piping and redirection

    // p4.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <string.h>
    #include <fcntl.h>
    #include <sys/wait.h>

    int main(int argc, char *argv[]) {
        int rc = fork();
        if (rc < 0) {
            // fork failed
            fprintf(stderr, "fork failed\n");
            exit(1);
        } else if (rc == 0) {
            // child: redirect standard output to a file.
            // open() returns the lowest free descriptor, which is
            // STDOUT_FILENO because we have just closed it
            close(STDOUT_FILENO);
            open("./p4.output", O_CREAT|O_WRONLY|O_TRUNC,
                 S_IRWXU);
            // now exec "wc"...
            char *myargs[3];
            myargs[0] = strdup("wc");   // program: wc
            myargs[1] = strdup("p4.c"); // arg: file to count
            myargs[2] = NULL;           // mark end of array
            execvp(myargs[0], myargs);  // runs word count
        } else {
            // parent goes down this path (main)
            int rc_wait = wait(NULL);
        }
        return 0;
    }
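
Piping uses the same gap between fork() and exec(): instead of opening a file, the child connects its standard output to the write end of a pipe() before calling exec(). A minimal sketch, assuming a POSIX system with echo on the PATH; the helper name capture_echo is made up, and it uses dup2() rather than the close-then-open trick above.

```c
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `echo hello` in a child whose stdout is a pipe,
 * and read what it printed into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error. */
ssize_t capture_echo(char *buf, size_t size) {
    int fds[2];
    if (size == 0 || pipe(fds) < 0)
        return -1;
    pid_t rc = fork();
    if (rc < 0)
        return -1;
    if (rc == 0) {
        close(fds[0]);                /* child does not read from the pipe */
        dup2(fds[1], STDOUT_FILENO);  /* stdout now refers to the pipe */
        close(fds[1]);
        execlp("echo", "echo", "hello", (char *)NULL);
        _exit(127);                   /* only reached if exec failed */
    }
    close(fds[1]);  /* parent closes the write end so read() can see EOF */
    size_t total = 0;
    ssize_t n;
    while (total < size - 1 &&
           (n = read(fds[0], buf + total, size - 1 - total)) > 0)
        total += (size_t)n;
    close(fds[0]);
    buf[total] = '\0';
    waitpid(rc, NULL, 0);             /* reap the child */
    return (ssize_t)total;
}
```

The program being run (echo here) needs no changes: it writes to file descriptor 1 as usual and never learns that it is a pipe.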

process control and users

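  • kill(): send a signal to a process
  • ctrl-c: sends SIGINT (interrupt; normally terminates the process)
  • ctrl-z: sends SIGTSTP (stop; pauses the process, which can be resumed later, e.g. with fg)
  • signal() syscall: lets a process catch a signal by registering a handler function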
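
Both directions can be sketched briefly, assuming a POSIX system. The helper names are made up for illustration. catch_sigint_once() installs a handler with signal() and sends SIGINT to itself with raise(), standing in for a ctrl-c from the terminal. kill_child() forks a child that only waits and then ends it with kill().

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int signo) {
    (void)signo;
    got_sigint = 1;  /* only set a flag; do real work outside the handler */
}

/* Hypothetical helper: install a SIGINT handler, send SIGINT to ourselves,
 * and return 1 if the handler ran, 0 if not, -1 on error.
 * Restores the previous handler before returning. */
int catch_sigint_once(void) {
    void (*old)(int) = signal(SIGINT, on_sigint);
    if (old == SIG_ERR)
        return -1;
    got_sigint = 0;
    raise(SIGINT);   /* the same signal ctrl-c delivers */
    signal(SIGINT, old);
    return got_sigint;
}

/* Hypothetical helper: fork a child that just waits for signals, send it
 * `signo` with kill(), and return the number of the signal that terminated
 * it, or -1 on error. */
int kill_child(int signo) {
    pid_t rc = fork();
    if (rc < 0)
        return -1;
    if (rc == 0) {
        for (;;)
            pause();  /* sleep until a signal arrives */
    }
    if (kill(rc, signo) < 0)
        return -1;
    int status;
    if (waitpid(rc, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

kill_child(SIGKILL) should return SIGKILL, since that signal cannot be caught or ignored. With a catchable signal such as SIGTERM the result depends on how the child handles it.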

the concept of process control begets the concept of users and clarifying who in fact can control the process

While our passion for the UNIX process API remains strong, we should also note that such positivity is not uniform. For example, a recent paper by systems researchers from Microsoft, Boston University, and ETH in Switzerland details some problems with fork(), and advocates for other, simpler process creation APIs such as spawn() [B+19]. Read it, and the related work it refers to, to understand this different vantage point

[4] https://www.microsoft.com/en-us/research/wp-content/uploads/2019/04/fork-hotos19.pdf