OS Notes: Processes & Threads

Mar 28, 2020

Processes and threads

Thread usage

Only with threads do we add a new element: the ability for the parallel entities to share an address space and all of its data among themselves.
Threads are lighter weight than processes, so they are easier (faster) to create and destroy.
When there is substantial computing and also substantial I/O, having threads allows these activities to overlap, thus speeding up the application.
Threads are useful on systems with multiple CPUs, where real parallelism is possible.
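
As a rough illustration of overlapping computation with I/O, here is a minimal Python sketch. The names fake_io and run_overlapped are made up for this note, and time.sleep stands in for a real blocking read, so treat it as a toy rather than a benchmark.

```python
import threading
import time

def fake_io(name, results, lock):
    """Stand-in for a blocking I/O call (hypothetical; real code would read a file or socket)."""
    time.sleep(0.1)                  # this thread blocks here; the others keep running
    with lock:                       # results is shared by every thread in the process
        results.append(name)

def run_overlapped(names):
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=fake_io, args=(n, results, lock)) for n in names]
    for t in threads:
        t.start()
    total = sum(i * i for i in range(100_000))   # computation proceeds while the waits are pending
    for t in threads:
        t.join()
    return sorted(results), total

if __name__ == "__main__":
    print(run_overlapped(["a", "b", "c"]))
```

The three waits run concurrently instead of back to back. In standard CPython builds the global interpreter lock limits CPU-bound parallelism, so this sketch shows the I/O overlap case rather than true multi-CPU speedup.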

Processes are used to group resources together; threads are the entities scheduled for execution on the CPU.

per-process items	per-thread items
address space	program counter
global variables	registers
open files	stack
child processes	state
pending alarms	
signals and signal handlers	
accounting information	
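
A small Python sketch of the split in the table: a global variable is one copy shared by all threads, while a local variable lives on each thread's own stack. The names shared_counter, worker, and demo are illustrative only.

```python
import threading

shared_counter = 0                   # global variable: one copy per process, visible to all threads
counter_lock = threading.Lock()

def worker(n, out):
    global shared_counter
    local_sum = 0                    # local variable: lives on this thread's private stack
    for i in range(n):
        local_sum += i
    with counter_lock:
        shared_counter += 1          # every thread updates the same global
        out[n] = local_sum

def demo():
    global shared_counter
    shared_counter = 0
    out = {}
    threads = [threading.Thread(target=worker, args=(n, out)) for n in (10, 100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, shared_counter

if __name__ == "__main__":
    print(demo())
```

Each thread computes its own local_sum without interference, but both increments land in the single shared_counter.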

User space threads

Pros

A user-level threads package can be implemented on an operating system that does not support threads.
If the machine has instructions to store all the registers and to load them all, the entire thread switch can be done in a handful of instructions.
Invoking thread_yield is more efficient than making a kernel call. No trap is needed, no context switch is needed, the memory cache need not be flushed.
They allow each process to have its own customized scheduling algorithm.
They scale better, since kernel threads invariably require table space and stack space in the kernel.
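
To make the idea of a user-level threads package concrete, here is a toy sketch that uses Python generators as threads. The yield statement plays the role of thread_yield, and run_round_robin is a hypothetical run-time system with its own scheduling policy. It is only an analogy: a real package would save and restore registers and stacks.

```python
from collections import deque

def thread_body(name, steps, log):
    """A user-level 'thread': each yield plays the role of thread_yield."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield                        # give up the CPU voluntarily; no trap into the kernel

def run_round_robin(threads):
    """Hypothetical run-time system: a ready queue scheduled round-robin in user space."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)            # 'switch' to the thread until its next yield
            ready.append(current)    # still runnable: back of the queue
        except StopIteration:
            pass                     # thread finished; drop it

def demo():
    log = []
    run_round_robin([thread_body("A", 2, log), thread_body("B", 3, log)])
    return log

if __name__ == "__main__":
    print(demo())
```

Swapping the deque for a priority queue would give this process a different scheduler without touching the kernel, which is the customization point made above.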

Cons

How blocking system calls are implemented is a concern, since a blocking call made by one thread stops all the threads in the process.
- Always using nonblocking system calls requires changes to the operating system and to many user programs.
- Invoking select to tell in advance if a call will block requires rewriting parts of the system call library, and is inefficient and inelegant.
If a thread causes a page fault, the kernel, unaware of even the existence of threads, blocks the process until the disk I/O is complete, even though other threads might be runnable.
Once a thread starts running, no other thread in the same process will ever run unless the first thread voluntarily gives up the CPU.
- Having the run-time system request a clock signal once a second to give it control is crude and messy.
Some signals are logically thread specific, but a kernel that is unaware of threads can hardly direct such a signal to the right thread.
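
The select approach from the list above can be sketched as a jacket around a read call. This is a minimal Python illustration, assuming a socket as the thing being read; read_if_ready and demo are names invented for this note, and a real threads package would switch to another thread where this sketch returns None.

```python
import select
import socket

def read_if_ready(sock, nbytes=1024, timeout=0.0):
    """Jacket around recv: only call recv when select reports it will not block.

    Returns the data, or None to signal that the calling thread should yield instead.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None                  # recv would block: let another thread run
    return sock.recv(nbytes)

def demo():
    left, right = socket.socketpair()
    try:
        before = read_if_ready(right)               # nothing has been sent yet
        left.sendall(b"ping")
        after = read_if_ready(right, timeout=1.0)   # wait up to 1 s for the data to arrive
        return before, after
    finally:
        left.close()
        right.close()

if __name__ == "__main__":
    print(demo())
```

Every potentially blocking call needs a wrapper like this, which is why the approach is described as inefficient and inelegant.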

Threads in the kernel

The kernel's thread table holds each thread's registers, state, and other information.

Due to the greater cost of creating and destroying threads in the kernel, some systems take an environmentally correct approach and recycle their threads.
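
An application-level analogue of thread recycling is a thread pool. The sketch below uses Python's standard ThreadPoolExecutor; it is not how a kernel recycles its own thread structures, only the same idea one layer up. The task and demo names are illustrative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Return the result together with the name of the worker thread that ran it.
    return n * n, threading.current_thread().name

def demo(num_tasks=6, workers=2):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(task, range(num_tasks)))
    squares = [sq for sq, _ in results]
    names = {name for _, name in results}   # distinct worker threads actually used
    return squares, names

if __name__ == "__main__":
    print(demo())
```

Six tasks run on at most two threads, so the creation cost is paid at most twice instead of six times.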

Pros

Kernel threads do not require any new, nonblocking system calls.
If one thread causes a page fault, the kernel can easily check whether the process has any other runnable threads.

Cons

The cost of a system call is substantial.
What should happen when a multithreaded process forks is a concern (for example, whether the child gets all the threads or just one).
If two or more threads register for the same signal, which one should handle it when the signal comes in?

Making single-threaded code multithreaded

Global variables, such as errno maintained by UNIX, break consistency: one thread can overwrite the value before another thread has read it.

Assign each thread its own private global variables. (thread_local)
Many library procedures are not reentrant.

To rewrite the entire library is nontrivial with a real possibility of introducing subtle errors.

Provide each procedure with a jacket that sets a bit to mark the library as in use.
A process with multiple threads must also have multiple stacks. If the kernel is not aware of all the stacks, it cannot grow them automatically upon stack fault.
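
The first two fixes above (private globals per thread, and a jacket around non-reentrant procedures) can be sketched in Python. Here threading.local plays the role of thread_local, and a lock plays the role of the "in use" bit. The names set_errno, get_errno, legacy_join, and jacketed_join are hypothetical and only for illustration.

```python
import threading

_tls = threading.local()             # each thread sees its own attributes on this object
_library_lock = threading.Lock()     # the "in use" bit for the jacketed library
_scratch = []                        # shared buffer that makes legacy_join non-reentrant

def set_errno(code):
    _tls.errno = code                # private to the calling thread

def get_errno():
    return getattr(_tls, "errno", 0)

def legacy_join(words):
    """Hypothetical non-reentrant procedure: it builds its result in a shared buffer."""
    _scratch.clear()
    for w in words:
        _scratch.append(w)
    return "-".join(_scratch)

def jacketed_join(words):
    with _library_lock:              # only one thread inside the library at a time
        return legacy_join(words)

def demo():
    seen = {}
    def worker(code):
        set_errno(code)
        joined = jacketed_join([str(code)] * 3)
        seen[code] = (get_errno(), joined)
    threads = [threading.Thread(target=worker, args=(c,)) for c in (1, 2, 3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen, get_errno()         # the main thread never set its own errno

if __name__ == "__main__":
    print(demo())
```

The jacket keeps the library correct but serializes every call into it, so it trades away parallelism for safety.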

When to schedule

When a new process is created.
When a process exits.
When a process blocks on I/O or some other reason.
When an I/O interrupt occurs.
At each clock interrupt or at every k-th clock interrupt.

Scheduling in batch systems

First-come, first-served
Shortest job first
Shortest remaining time next
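
A quick way to compare the first two policies is mean turnaround time. The sketch below assumes all jobs arrive at time 0 with known run times and run to completion; the function names are made up for this note.

```python
def mean_turnaround(run_times):
    """Mean turnaround when jobs all arrive at time 0 and run to completion in list order."""
    clock = 0
    total = 0
    for t in run_times:
        clock += t                   # this job finishes at the current clock value
        total += clock
    return total / len(run_times)

def fcfs(run_times):
    return mean_turnaround(run_times)            # run in arrival order

def sjf(run_times):
    return mean_turnaround(sorted(run_times))    # run the shortest job first

if __name__ == "__main__":
    jobs = [8, 4, 4, 4]
    print(fcfs(jobs), sjf(jobs))
```

For run times 8, 4, 4, 4 the finish times are 8, 12, 16, 20 under first-come, first-served (mean 14) and 4, 8, 12, 20 under shortest job first (mean 11).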

Scheduling in interactive systems

Round-robin scheduling
Priority scheduling
Multiple queues
Shortest process next
Guaranteed scheduling
Lottery scheduling
Fair-share scheduling
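
As one example from this list, round-robin scheduling can be simulated in a few lines. This is a simplified sketch: all processes arrive at time 0, the quantum is fixed, and context-switch cost is ignored. The round_robin function and its arguments are illustrative names.

```python
from collections import deque

def round_robin(bursts, quantum):
    """Return {name: finish_time} for processes that all arrive at time 0."""
    ready = deque(bursts.items())
    clock = 0
    finish = {}
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)             # run for one quantum or until done
        clock += ran
        if remaining > ran:
            ready.append((name, remaining - ran)) # preempted: back of the queue
        else:
            finish[name] = clock
    return finish

if __name__ == "__main__":
    print(round_robin({"A": 5, "B": 2, "C": 3}, quantum=2))
```

With CPU bursts A=5, B=2, C=3 and a quantum of 2, B finishes at time 4, C at 9, and A at 10.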