# Lecture 4 - Rediscovering Process Scheduling [Part - 1]

### **Disclaimer**

⚠️ **Where the Scheduler whispers, processes tremble — for it decides who runs… and who fades into starvation.**

The following content ventures into the ticking heart of the OS — where time slices are bargained, queues grow restless, and scheduling algorithms silently wage war to minimize the average waiting time of every process.

**Students and beginners**, tread carefully — the **Scheduler** watches every move, counts every cycle, and never forgets who waited longest.

In this OS series, the focus remains on the **Operating System** (software context) components, not the hardware mechanics beneath them. *(For the hardware side of CPU pipelines, caches, and context-switch machinery, refer to computer architecture.)*

### **Special Thanks**

Heartfelt gratitude to **Mr.** [**Adhakshoj Mishra Ji**](https://www.linkedin.com/in/adhokshajmishra/) for his insightful session and for reviewing this blog.

A sincere thanks as well to the **BreachForce Community Members** for sharing their valuable notes, and to the **BreachForce Community Volunteers** for helping collate and refine this content.

# Preface

In the [last blog](https://breachforce.net/lecture-3-reinventing-mmu-part-2), we explored how **Privileged Mode** emerged inside our MMU — how we designed **SPI, MSRs, and dedicated interrupt handlers** to enforce strict control, protect critical system state, and ensure the kernel always remained safely isolated from user processes.

In this blog, we’ll uncover the depths of **process scheduling** by walking through a series of problems and exploring their possible solutions. As we refine each idea, we’ll naturally encounter subtle design caveats—tiny scheduling dilemmas that demand their own mini-solutions. Through these iterations, we’ll slowly sculpt and evolve our vision of what an ideal scheduler should look like.

## Important Terminologies

* **Process:** A program in execution with its own memory space.
    
* **Task:** A generic term that may refer to either a *process* or a *thread*, depending on the OS design.
    
    *In this blog, the words* ***process*** *and* ***task*** *will be used interchangeably.*
    
* **Job**: A **job** is a unit of work submitted to the operating system for execution, typically representing a process before it enters the ready queue.
    
* **CPU Cycle:** The smallest unit of time in which the CPU performs operations; scheduling algorithms often treat each cycle (or a group of cycles) as the basic time quantum for executing processes.
    
* **Waiting Time:** The total time a process spends in the ready queue *waiting* before it gets CPU time for execution. It excludes actual CPU run time and I/O time
    
* **Process Queue:** A data structure used by the operating system to organize and manage processes based on their current state - such as the **ready**, **waiting/blocking**, or \*\*terminated -\*\*allowing the scheduler to decide which process should run next.
    

# Process Scheduling

* In the 1980s, computers were extremely expensive, and computational resources were limited. The primary goal was to complete as many tasks as possible using the available hardware efficiently.
    
* When designing an Operating System, our focus should revolve around two key aspects:
    
    * **Accuracy of the Program:** This responsibility lies with the developer. The program running on the CPU is assumed to be tested, verified, and trusted by users. The OS does not alter program logic; it simply executes it.
        
    * **Efficiency of the Program:** This refers to minimizing the total time a process takes to complete. Higher efficiency allows the CPU to perform more work in less time.
        
* To increase efficiency, we aim to complete tasks in the shortest possible time.
    
* To achieve this, we must minimize the waiting time of processes during scheduling and context switching, because freeing the CPU as soon as possible allows more tasks to be completed.
    
* This is why we design a **Process Scheduling Algorithm**, supported by a **process queue**, to ensure that tasks are executed efficiently and system resources are utilized optimally.
    

## Problem - How to reduce overhead while executing processes?

* When multiple processes run concurrently, the system eventually reaches a point where it must pause accepting new processes so that the currently running ones can complete.
    
* Beyond this saturation point, the OS can no longer accommodate new processes in the process queue without degrading performance.
    
* Therefore, our goal is to determine a **threshold** - after how many processes should the scheduler temporarily stop accepting new tasks to prevent system overload.
    
* To better understand this problem, consider the following analogy:
    
    * **Why are NEFT and RTGS transactions processed in batches, while IMPS transactions are processed instantly - even though the underlying transaction data is essentially the same?**
        

## Solution - Batch them at once

* For any running process, we generally encounter two scenarios:
    
    * **Case 1:** The preparation time for the process is negligible (i.e., the overhead is minimal). In this case, we can process tasks immediately as they arrive.
        
    * **Case 2:** There is significant overhead associated with preparing or validating the process (for example, correlating data with other fields before execution).
        
* If there is no overhead, we follow the Case (1) approach: execute processes as they come.
    
* However, if the overhead is substantial—as in Case (2)—then whether we run **100 processes or 1000 processes**, the preparation overhead remains roughly the same.
    
* In such situations, it becomes more efficient to **batch all the operations together** and handle the overhead once rather than repeatedly.
    
* This is exactly why banks use the NEFT/RTGS approach.
    
* **Batching reduces overhead, improves efficiency at scale, and prevents system overload caused by continuous individual requests.**
    

## **Problem — How do we implement batching in Process Scheduling?**

* Before proceeding, let us assume:
    
    * We have a single computer with one CPU, one RAM module, and one storage device.
        
    * We have a list of jobs that need to be executed.
        
* The key question is: **What is the most efficient way to execute these jobs?**
    

## **Solution — Implement First Come First Served (FCFS)**

* For the jobs we want to execute, two major scenarios arise:
    
    * **Case 1: Jobs have little to no preparation overhead.**
        
        * In this situation, we can execute jobs immediately as they arrive.
            
        * This approach is known as the **First Come First Served (FCFS)** scheduling algorithm.
            
    * **Case 2: Jobs have significant preparation overhead, but the overhead does not depend on how many jobs are being processed (assuming the jobs are similar).**
        
        * In this case, it is more efficient to **batch** the jobs and handle the overhead only once.
            
        * After batching, we can run FCFS **on the batch itself**.
            
        * Example: **NEFT and RTGS transactions in banks** — they are processed in bulk to minimize repeated overhead.
            
* There are two ways to perform batching:
    
    * **Method 1: Time-based batching**
        
        * Wait for a fixed time window.
            
        * All processes that arrive during this window are grouped into a batch and executed together.
            
    * **Method 2: Count-based batching**
        
        * Wait until a minimum number of jobs arrive.
            
        * Once the threshold is reached, batch them and execute them at once.
            
* For simplicity, let us choose **Method 1 (time-based batching)**.
    
* Assume a batching window of **15–30 minutes**.
    
* When batching is used, the **order of jobs inside the batch does not matter**, because the entire batch will take the same overhead time and be processed together (e.g., a fixed 10-minute overhead).
    

## **Problem — How to Reduce Average Waiting Time?**

* The main challenge here is: **How do we reduce the average waiting time of all jobs?**
    
* Why do we care about minimizing average waiting time?
    
    * It improves overall user experience.
        
    * It provides a competitive marketing advantage (faster systems feel better).
        
    * The end-user does not understand OS internals — they only perceive how long things “feel.”
        
* Let us consider three jobs with the following execution times:
    
    * **J1:** 5 minutes
        
    * **J2:** 3 minutes
        
    * **J3:** 2 minutes
        
    * **Total execution time:** 10 minutes (this value will remain constant regardless of ordering)
        
* The question now becomes: **How can we reduce the average waiting time of these three jobs?**
    

## **Solution — Implement Shortest Job First (SJF)**

* We can reduce waiting time by reordering the jobs intelligently instead of running them in the order they arrive.
    
* Using the earlier example, consider the following two possible orderings:
    
    * **Ordering 1: J1 → J2 → J3**
        
        * **Waiting times**
            
            * J1 = 0
                
            * J2 = 5
                
            * J3 = 8
                
        * **Average waiting time**
            
            * (0 + 5 + 8) / 3 = 13 / 3 ≈ **4.3 minutes → 4 minutes (approx)**
                
    * **Ordering 2: J2 → J3 → J1**
        
        * **Waiting times**
            
            * J2 = 0
                
            * J3 = 3
                
            * J1 = 5
                
        * **Average waiting time**
            
            * (0 + 3 + 5) / 3 = 8 / 3 ≈ **2.67 minutes → 3 minutes (approx)**
                
* Notice that in both cases, the **total execution time remains the same**: 10 minutes.
    
* But by **simply reordering the jobs**, we significantly reduce the average waiting time.
    
* Therefore, the optimal strategy is to **execute shorter jobs first**.
    
* This leads us to our second scheduling algorithm:
    
    * **Shortest Job First (SJF)**
        
        * An algorithm that minimizes average waiting time by always selecting the job with the shortest execution time.
            
* If larger jobs run first, smaller jobs end up experiencing unnecessarily long waiting times.
    
* Reorder the jobs so that smaller jobs execute first, thereby reducing the waiting time for larger jobs. This is the essence of the Shortest Job First (SJF) algorithm.
    
* But the Shortest Job First has some caveats. Let’s understand them one by one using a question-answer methodology.
    

### **Question 1 - When can SJF reduce the average waiting time?**

### Answer

In **all cases except one**:

When **all jobs require the same amount of time**, every ordering results in the same waiting time.

Otherwise, SJF always reduces the average waiting time.

### **Question 2 - Is there any other scheme that guarantees the shortest average waiting time?**

### Answer

Currently, **none** apart from SJF.

SJF is mathematically proven to produce the **optimal** (minimum possible) average waiting time.

> Take-home assignment:
> 
> Prove that SJF yields the minimum average waiting time among all deterministic scheduling strategies.

### **Question 3 - Can we estimate waiting time without knowing the actual value?**

### Answer

Yes.

We can **estimate** execution time based on **CPU cycles** required by the job.

### **Question 4 - How do we estimate execution time beyond CPU cycles?**

### **Answer**

By counting the total number of CPU **instructions** the job must execute.

### **Question 5 - How do loops and control statements affect this estimation?**

Consider two jobs, **J1** and **J2**, both with:

* Same number of instructions
    
* Same number of loops
    

How do we determine which one will take more time to execute?

### Answer

We **cannot know** without actually running them.

This is fundamentally limited by the **Halting Problem** -

We cannot predict a program’s exact runtime or behavior in all cases without executing it.

Thus, runtime estimation becomes impossible for arbitrary programs.

### **Question 6 - Consider a simpler case:**

Two jobs **J1** and **J2** with:

* Same number of instructions
    
* No loops
    
* No branching
    

Will they take the same time? or different time? or something else will occur?

### **Answer**

**Not guaranteed.**

Runtime can differ due to:

* Presence of I/O instructions
    
* Location of the file being accessed
    
* Type of storage device (SSD, HDD, tape, RAM disk, etc.)
    
* Storage latency and hardware constraints
    

In short: **CPU instruction count alone is not enough** to determine execution time.

## Conclusion

* If **I/O** is involved, predicting execution time becomes uncertain.
    
* That means SJF will only be applicable on operations which are very defined for which we know how much time it will take for the job to complete.
    
* That means for general purpose instructions where time taken to complete the job is not defined, **SJF will fail.**
    
* Therefore. SJF can only be implemented on jobs where time taken to complete is fully defined.
    
* Now for process scheduling we need new algorithm which can satisfy the following cases:
    
    * Case 1: It should not be dependent on waiting time of job.
        
    * Case 2: The overall performance of the algorithm should not be negatively impacted if a job takes too much time to execute (i.e. the waiting time of a job increases).
        
* Only then can an OS handle **general-purpose computing** instead of specialized workloads.
    

## **Coming Up Next**

* In the next lecture, we will study the **Round Robin (RR)** algorithm, its caveats, and approaches to fixing them.
    
* We will then explore how scheduling works in **Modern Operating Systems**, including:
    
    * Feedback loops
        
    * Priority queues
        

## **Additional Context**

* Processes often do not know their exact memory requirements in advance. They receive a fixed **virtual address space**, but must manage it carefully using:
    
    * Memory allocators
        
    * Garbage collectors
        
    * Kernel/user memory boundaries
        
* Additionally, the OS may overcommit memory and must use mechanisms like the **OOM-Killer** to maintain system stability.
    
* The primary goals are:
    
    * Ensuring safe memory usage
        
    * Avoiding unnecessary process termination
