OS

Limited Direct Execution

  • to virtualise the cpu (time sharing, context switching and so on), the os needs to control how a process executes
  • can’t just jump to main() and not care what the program does; the os has to be a strict parent. also important for security
  • got to be efficient about this too, can’t compromise performance

Directly Executing the Program:

OS                                Program
Create entry for process list
Allocate memory for program
Load program into memory
Set up stack with argc/argv
Clear registers
Execute call main()
                                  Run main()
                                  Execute return from main
Free memory of process
Remove from process list

two problems with direct execution

  • how to make sure program doesn’t do some shit to mess up your laptop (linux moment)?
  • how to implement time sharing? once the program is running directly on the cpu, the os has no way to stop it and run another process

Problem #1: Restricted Operations and Security

introduce user mode and kernel mode [1]

a user program can never be trusted. it may try to hog your resources, or delete something important, or tons of other stuff. fuck the user. so we say if you want to do anything privileged, you have to go through the kernel. the kernel offers safe ways to do privileged things and can ensure the user does not overstep. fuck the user.

System Calls

to do something privileged, the program executes a trap instruction. what happens then:

  1. the hardware changes privilege to kernel mode and jumps into the kernel
  2. the kernel checks the request and performs the privileged operation on the program’s behalf
  3. once done, the kernel executes a return-from-trap, which changes the privilege back to user mode
  4. control returns to the user program at the instruction right after the trap

gotta make sure the trap doesn’t clobber the user program’s context (pc, flags, registers), or return-from-trap would be impossible. on x86 the processor pushes the pc, flags and a few other registers onto a per-process kernel stack when entering the trap, and return-from-trap pops them back off

x86 process management

still gotta do this safely. the user program must not have any control over kernel memory, so it can’t be allowed to jump to an arbitrary kernel address; it has to reach the kernel some other way. so at boot the os sets up a trap table that tells the hardware the address of each trap handler, and when a trap happens the hardware jumps to the registered handler. the user program only names a syscall by its number, and the handler checks that the number is valid before running the matching code, so user input can never pick the address that runs in kernel mode [2]

Calling a Trap

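load the syscall number into rax and execute the syscall instruction [3]

main:
  mov $1, %eax      # syscall number 1 = write() on x86-64 Linux (writing eax zero-extends into rax)
  # arguments (fd, buf, count) go in %rdi, %rsi, %rdx before the trap
  syscall           # trap into the kernel

to look things up:

man syscalls
info syscall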

Userspace/Kernelspace Protocol

OS @ boot (kernel mode)           Hardware                            Program (user mode)
initialize trap table             remember address of syscall handler

OS @ run (kernel mode)            Hardware                            Program (user mode)
Create entry for process list
Allocate memory for program
Load program into memory
Setup user stack with argc/argv
Fill kernel stack with reg/PC
return-from-trap
                                  restore regs (from kernel stack)
                                  move to user mode
                                  jump to main
                                                                      run main()
                                                                      call system call
                                                                      trap into OS
                                  save regs (to kernel stack)
                                  move to kernel mode
                                  jump to trap handler
handle trap
do work of syscall
return-from-trap
                                  restore regs (from kernel stack)
                                  move to user mode
                                  jump to PC after trap
                                                                      return from main()
                                                                      trap (via exit())
Free memory of process
Remove from process list

Some nice syscalls to remember [4]

  • process control:
    • fork()
    • exec()
    • wait()
  • file management:
    • open()
    • read()
    • write()
    • close()
  • memory management:
    • mmap()
    • brk()
  • device management:
    • ioctl()
  • communication:
    • pipe()
    • socket()

Problem #2: Time sharing and Switching Between Processes

if a process is running on the cpu, the os is not running. so how does the os stop the process to start the next one?

loyal (cooperative) approach

  • trust the process will make syscalls at regular intervals
  • syscalls give the os control back and then it can choose to run a different process
  • such systems usually include an explicit yield() syscall
  • overly trusting. remember, fuck the user

disloyal (non-cooperative) approach

  • don’t trust the process. at boot the os programs a hardware timer that interrupts the cpu every few milliseconds, so the os regains control whether the process likes it or not
  • on each interrupt the scheduler decides whether to keep running the current process or switch
  • if it chooses to switch, the os executes the context switch. reminder: the context is basically the state of the registers at that moment for the process
    • it saves the current process’s
      • general purpose registers
      • pc
      • kernel sp
    • and restores the same of the process it is switching to
  • this is expensive though [5]

Limited Direct Execution Protocol (with timer interrupt)

OS @ boot (kernel mode)           Hardware                            Program (user mode)
initialize trap table             remember address of syscall handler
start interrupt timer             start timer
                                  interrupt CPU in X ms

OS @ run (kernel mode)            Hardware                            Program (user mode)
                                                                      Process A
                                  timer interrupt
                                  save regs(A) → k-stack(A)
                                  move to kernel mode
                                  jump to trap handler
handle the trap
call switch() routine
- saves regs(A) → proc_t(A)
- restores regs(B) ← proc_t(B)
- switch to k-stack(B)
return-from-trap (into B)
                                  restore regs(B) ← k-stack(B)
                                  move to user mode
                                  jump to B’s PC
                                                                      Process B

xv6 context switch code

# void swtch(struct context *old, struct context *new);
#
# Save current register context in old
# and then load register context from new.
# (comments use '#': in GNU as for x86, ';' separates statements instead of starting a comment)
.globl swtch
swtch:
    # Save old registers
    movl 4(%esp), %eax       # put old ptr into eax
    popl 0(%eax)             # save the old IP
    movl %esp, 4(%eax)       # and stack
    movl %ebx, 8(%eax)       # and other registers
    movl %ecx, 12(%eax)
    movl %edx, 16(%eax)
    movl %esi, 20(%eax)
    movl %edi, 24(%eax)
    movl %ebp, 28(%eax)

    # Load new registers
    movl 4(%esp), %eax       # put new ptr into eax (the popl above shifted the args down by 4)
    movl 28(%eax), %ebp      # restore other registers
    movl 24(%eax), %edi
    movl 20(%eax), %esi
    movl 16(%eax), %edx
    movl 12(%eax), %ecx
    movl 8(%eax), %ebx
    movl 4(%eax), %esp       # stack is switched here
    pushl 0(%eax)            # return addr put in place
    ret                      # finally return into new ctxt

Footnotes

  1. https://www.youtube.com/watch?v=fLS99zJDHOc liveoverflow video demonstrating userspace vs kernelspace

  2. https://hovav.net/ucsd/dist/geometry.pdf paper on ROP

  3. https://syscalls.mebeim.net/?table=x86/64/x64/latest fantastic syscall reference

  4. https://nyxfault.github.io/posts/Syscalls/ historical perspective of syscalls and nice low level stuff

  5. https://www.usenix.org/legacy/publications/library/proceedings/sd96/full_papers/mcvoy.pdf benchmarking tools