# Lecture 3 - Reinventing MMU [Part - 2]

### **Disclaimer**

⚠️ **Where the MMU walks, addresses shiver — for it decides which memories live… and which are exiled to the void.**

The following content dives deep into the shadows of system memory — where addresses deceive, and pages guard their secrets.

**Students and beginners**, proceed at your own risk — the **MMU** remembers everything you do.

> In this OS series, the focus will be on the **Operating System** (software context) components, not the hardware context of the components. *(For the hardware context, refer to computer architecture.)*

**Special Thanks**

Heartfelt gratitude to [**Mr. Adhokshaj Mishra Ji**](https://www.linkedin.com/in/adhokshajmishra/) for his insightful session and for reviewing this blog.

A sincere thanks as well to the **BreachForce Community Members** for sharing their valuable notes, and to the **BreachForce Community Volunteers** for helping collate and refine this content.

# Preface

In the [last blog](https://breachforce.net/lecture-2-reinventing-mmu-part-1), we explored how the Memory Management Unit (MMU) was born - what challenges it solved, how it reshaped the way systems handle memory, and how it paved the way for powerful abstraction layers that give us fine-grained control over both the CPU and RAM.

Now in Part-2, we’ll explore a series of problems along with their possible solutions. As we introduce these solutions, we’ll inevitably encounter smaller sub-problems—which we’ll resolve using their own mini-solutions. With each refinement, we’ll iteratively evolve and improve our overall design.

One such problem is this: **a process might overwrite the abstraction layer tables (page tables) if proper safeguards aren’t implemented. So how do we prevent that from happening?**

# MMU

## Problem: How Do We Prevent Accidental Overwrites?

* In the [last blog](https://breachforce.net/lecture-2-reinventing-mmu-part-1), we allowed our Meta-Program (the early OS) to configure the address translation tables. But this introduces a serious issue: if the Meta-Program can do it, **any other user process could also attempt to modify these tables.**
    
* How do we prevent untrusted processes from tampering with the memory translation mechanism?
    
    * ❌ No normal user process should be able to modify the MMU tables  
        ✔ Only the OS (Meta-Program) should be able to do so safely
        

## Solution

### Solution 0: Merge the Abstraction Layer Into the CPU

It is time to do something with the CPU and the Meta-program to solve the above problem.

* We will merge the Abstraction Layer (MMU) into the CPU because:
    
    * It is not possible to directly upgrade the CPU.
        
    * The abstraction layer logically sits closer to the CPU than RAM.
        
    * It is easier to give it a special interconnect than to fiddle with the general-purpose interconnect, and there is less chance of things going wrong.
        

> Note: **Interconnect is the hardware wiring or communication pathway that links different components inside a CPU or system so they can exchange data and signals.**

Therefore, no separate buses are needed for communication between CPU → Abstraction Layer and Abstraction Layer → RAM.

* With this, the definition of the CPU changes to a CPU Model.
    

### Design Architecture: Modern CPU Model (CPU + Abstraction Layer)

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1763233272029/c1885fb6-b3b7-4455-a3db-b2b04ede86ba.png align="center")

* We have finally created a CPU Model which stores the Abstraction Layer (MMU) inside it.
    

### Problem 1: No Separate Bus between CPU and Abstraction Layer

* Without using the general-purpose buses, how will the CPU communicate with the Abstraction Layer (MMU)?
    
* The CPU and the Abstraction Layer must still communicate for the Abstraction Layer to do its job, but we cannot route that traffic over the General Purpose Interconnect, because
    
    * Any instruction could possibly access it
        
    * User processes could accidentally or intentionally misuse it
        
* We need to have some special purpose interconnect.
    

### Solution 1: Introduction of Special Purpose Interconnect (SPI)

* We have established a special interconnect between the CPU and Abstraction Layer.
    
* With this, the Interconnect has been divided into
    
    * **General Purpose Interconnect**: A shared, standard data pathway used by the CPU to communicate with memory and most hardware.
        
    * **Special-Purpose Interconnect**: A private, restricted hardware pathway inside the CPU that normal instructions cannot access.
        
* The SPI lets the CPU talk to the MMU securely.
    
* This has again created a new problem for us.
    

### Problem 2: No Special Instructions between CPU and the Abstraction Layer

* Because the SPI is private, existing **general-purpose instructions** cannot communicate with the Abstraction Layer through the **special-purpose interconnect**: they don’t know how to use it.
    
* So, we need a clean way to **separate normal instructions** from those that perform privileged, hardware-level operations.
    
* This raises a design question:
    
    **Can we add a dedicated unit inside the CPU that understands, stores, executes, and protects these special instructions?**
    

### Solution 2: Introduction of Model Specific Registers (MSR)

To solve this problem, we introduce a new class of registers called **Model Specific Registers (MSRs)**.

* MSRs are special registers whose presence, purpose, and count **vary from CPU model to CPU model** (hence the name *Model Specific*).
    
    CPU vendors document these in their **datasheets**, and OS developers must consult these to understand the available MSRs.
    
* These MSRs act as a dedicated interface between the CPU and the Abstraction Layer, and only **special-purpose instructions** are allowed to read/write them.
    
* To ensure general-purpose instructions cannot accidentally or intentionally modify these sensitive registers, we introduce **new special instructions**:
    
    * **RDMSR** - Read data from an MSR
        
    * **WRMSR** - Write data into an MSR
        
* These special instructions prevent accidental overwrites by user-mode software, and old software continues to run safely, since it has **no knowledge of these new instructions**.
    
* However, if a malicious or “oversmart” developer tries to use `RDMSR` or `WRMSR` inside normal programs, the CPU will:
    
    * Trigger a **fault or interrupt**,
        
    * Hand control back to the Meta-Program (OS),
        
    * Which can then decide whether to kill the misbehaving process or silently handle it.
        
    * Importantly, we cannot expose MSRs as ordinary memory-mapped registers instead of using special opcodes: that would require fixing rigid address ranges inside the OS, leading to immense **space complexity** and unmaintainable designs. We will explore this further at the end of the blog.
        
    
    Thus, MSRs and their special instructions together:
    
    * Compute privileged values
        
    * Store critical configuration data
        
    * Load those values back into the CPU
        
    * Act as a secure hardware control interface for the Meta-Program (OS)
        
    
    But once again, adding MSRs introduces a new challenge for us.
    

### **Problem 3: How Does the CPU Know Who Is Allowed to Run Special Instructions?**

* Older software will simply never invoke `RDMSR` or `WRMSR` - these instructions didn’t exist back then. So they naturally stay safe.
    
* But what about **new modern software** that *can* attempt to execute these special instructions?
    
* How will the CPU differentiate between:
    
    * trusted Meta-Program (which should be allowed), and
        
    * normal user programs (which must be blocked)?
        

### Solution 3: Introduction of Privilege Mode

* We cannot base our solution on:
    
    * address ranges
        
    * Hard-coded memory locations
        
    * or page-table tricks
        
* Such approaches create massive space complexity and unmaintainable OS designs.
    
* Instead, we need something simpler and more reliable.
    
* We solve this in three steps:
    

**Step 1: A Flag Inside the CPU to Identify the Meta-Program**

* We introduce a **special privilege flag** inside the CPU.
    
* The instruction decoder will check this flag before executing any special-purpose instruction.
    
* If the flag is **set** → the instruction is allowed
    
* If the flag is **unset** → the CPU triggers a **fault/interrupt**
    
* This ensures that:
    
    * General programs cannot execute privileged instructions
        
    * Only the Meta-Program (OS kernel) can
        
    * Misbehaving processes will be immediately terminated or trapped
        
* This is much cleaner than creating fixed memory regions or address-based gating.
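The flag check from Step 1 can be sketched as a small simulation. This is only a toy model, not how any real decoder is built: the `ToyCPU` class, the `PrivilegeFault` exception, and the MSR index used in the usage note are all hypothetical names chosen for illustration.

```python
class PrivilegeFault(Exception):
    """Raised when a special-purpose instruction runs with the flag cleared."""


class ToyCPU:
    def __init__(self):
        self.privileged = False   # the privilege flag from Step 1
        self.msrs = {}            # MSR index -> 64-bit value

    def _check_privilege(self, instruction):
        # The "instruction decoder" consults the flag before executing.
        if not self.privileged:
            raise PrivilegeFault(f"{instruction} attempted with privilege flag clear")

    def wrmsr(self, index, value):
        self._check_privilege("WRMSR")
        self.msrs[index] = value & 0xFFFF_FFFF_FFFF_FFFF  # MSRs hold 64 bits

    def rdmsr(self, index):
        self._check_privilege("RDMSR")
        return self.msrs.get(index, 0)

    def add(self, a, b):
        # General-purpose instruction: no flag check at all.
        return a + b
```

With the flag clear, `ToyCPU().add(2, 3)` works but `ToyCPU().wrmsr(0x10, 1)` raises `PrivilegeFault`. Once `privileged` is set to `True`, the same `wrmsr` call succeeds and `rdmsr(0x10)` returns the stored value.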
    

**Step 2: Use Interrupts to Switch Into the Meta-Program**

* Only the Meta-Program (OS) can register interrupt handlers.
    
* So when a user program tries to execute `RDMSR` or `WRMSR`:
    
    1. The CPU sees the flag is **not set**
        
    2. The CPU raises an **interrupt / trap**
        
    3. The interrupt handler registered by the Meta-Program runs
        
    4. The Meta-Program decides whether to:
        
        * kill the process
            
        * log it
            
        * emulate the behavior
            
        * or deny access
            
* This gives us a natural mechanism for control.
    

Essentially: Triggering an interrupt = entering the Meta-Program.

This gives full control to the OS at all times.
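The four-step flow above can be sketched as follows. This is a minimal model under stated assumptions: `ToyKernel`, `Verdict`, and `execute` are invented names, and the "deny once, kill on the second offence" policy is purely illustrative. The point is only that the policy lives in the handler the Meta-Program registered, not in the CPU.

```python
from enum import Enum


class Verdict(Enum):
    KILL = "kill"
    DENY = "deny"


class ToyKernel:
    def __init__(self):
        self.log = []         # every offence is logged as (pid, instruction)
        self.killed = set()   # pids the kernel has terminated

    def on_privilege_fault(self, pid, instruction):
        # Handler registered by the Meta-Program: it alone decides the outcome.
        self.log.append((pid, instruction))
        offences = sum(1 for p, _ in self.log if p == pid)
        if offences >= 2:
            self.killed.add(pid)
            return Verdict.KILL
        return Verdict.DENY


def execute(kernel, pid, instruction, privileged):
    """Model of the CPU executing one instruction on behalf of process `pid`."""
    if instruction in {"RDMSR", "WRMSR"} and not privileged:
        # Flag not set: raise a trap, which enters the registered handler.
        return kernel.on_privilege_fault(pid, instruction)
    return "executed"
```

A user process (flag clear) that tries `WRMSR` is denied the first time and killed the second time, while ordinary instructions such as `ADD` never reach the handler.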

**Step 3: Using the Meta-Program to reset the flag**

Whenever the OS switches into kernel code:

```plaintext
set privilege flag = ON
```

Whenever it returns to user mode:

```plaintext
clear privilege flag = OFF
```

So,

* During context switching:
    
    * Before running kernel code → **set the privilege flag**
        
    * Before resuming a user process → **clear the privilege flag**
        
* This ensures:
    
    * Special instructions only run in the Meta-Program
        
    * User processes always run with privilege flag OFF
        
    * The kernel cannot accidentally “leak” privilege into user mode
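One way to picture the "set on entry, clear on exit" rule is a guard that always clears the flag, even when the kernel code fails. The sketch below is an assumption-laden illustration: `ToyCPU`, `kernel_mode`, and `handle_syscall` are hypothetical names, and the flag is modelled as a plain attribute.

```python
from contextlib import contextmanager


class ToyCPU:
    def __init__(self):
        self.privileged = False  # user mode by default


@contextmanager
def kernel_mode(cpu):
    """Set the flag on kernel entry and always clear it on the way out."""
    cpu.privileged = True
    try:
        yield cpu
    finally:
        # Runs even if the kernel code raised, so privilege cannot leak.
        cpu.privileged = False


def handle_syscall(cpu, work):
    # `work` stands in for whatever kernel code services the request.
    with kernel_mode(cpu):
        return work(cpu)
```

Inside `handle_syscall` the flag is ON. After it returns, or after it raises, the flag is OFF again, which is the "no leak" property listed above.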
        

When the system boots:

* The bootloader loads the Meta-Program (OS)
    
* The OS sets the privilege flag
    
* Execution continues with full control
    
* The OS then switches the CPU into the appropriate mode
    
    * Real Mode (16-bit, 8086-compatible) → Protected Mode (32-bit)
        
    * Protected Mode → Long Mode (64-bit)
        

> Using the above approach, we just invented **Privileged Mode**

### Categories of Instructions in CPU

* By adding this privilege flag check, we have effectively created **two types of instructions** inside the CPU:
    
    1. **General-Purpose Instructions**
        
        * Normal instructions
            
        * Execute regardless of privilege flag
            
        * Available in user mode
            
    2. **Special-Purpose Instructions**
        
        * Privileged operations (`RDMSR`, `WRMSR`, I/O instructions, etc.)
            
        * Execute **only** when privilege flag is ON
            
        * Otherwise trigger an interrupt
            
* This separation is the foundation of **user mode vs kernel mode** in all modern CPUs.
    
* Dedicated entry instructions such as software interrupts let a user program request a controlled switch from user mode to kernel mode.
    

### Privileged Mode

* Putting this all together:
    
    * A privilege-identifying CPU flag
        
    * An instruction decoder that checks the flag
        
    * Interrupts that hand control to the Meta-Program
        
    * Special instructions restricted to privileged code
        
    * Kernel sets/clears the flag during context switches
        

Congrats, this is **Privileged Mode**. The CPU now distinguishes between *user mode* and *kernel mode*.

* **User Mode** → normal code
    
* **Kernel Mode** → privileged operations
    

> Even I/O operations are restricted using this same mechanism

We have successfully dealt with the accidental overwrite problem.

### How Real Systems Implement Privileged Mode

Different architectures expose privilege mode switching through different mechanisms:

* **32-bit systems**
    
    * `int 0x80` (Linux) or `sysenter` → switch from user mode → kernel mode
        
    * `iret` or `sysexit` → return from kernel mode → user mode
        
* **64-bit systems**
    
    * `syscall` → enter privileged mode
        
    * `sysret` → return to user mode
        
    
    Linux, Windows, the BSDs, and macOS all build on this kind of mechanism, although the exact instructions and interrupt numbers differ between them. Privileged mode, however, gave birth to another problem.
    

### Problem: What happens when older Meta-Program boots?

When we introduce MSRs and special instructions like `RDMSR` and `WRMSR`, the CPU now expects the Meta-Program (OS) to perform **extra setup** before these instructions can be safely used:

* Initialize MSRs
    
* register interrupt handlers
    
* configure privilege mode
    
* set the privilege flag
    
* prepare CPU control structures
    

A modern OS understands these requirements and configures everything properly.

But older Meta-Programs:

* don’t know MSRs exist
    
* don’t know these new instructions (`RDMSR`/`WRMSR`) exist
    
* don’t register the required handlers
    
* don’t configure privilege flags
    
* don’t perform *any* of the required setup
    

So if the CPU were to boot directly into the new privileged architecture, an older OS would:

* fail instantly
    
* get stuck on an unexpected MSR access
    
* crash due to missing handlers
    
* or fall into undefined behavior
    

We have two options now:

* Either we kiss old Meta-Programs goodbye and enrage our users,
    
* Or we try to maintain some backward compatibility.
    

### Solution: Boot the CPU in Legacy Mode

Our approach to this problem has two parts:

* We boot the CPU in **legacy mode**.
    
* A modern Meta-Program then intentionally switches it to the **newer mode**.
    

To avoid breaking older Meta-Programs, the CPU must not start directly in the new privileged architecture.

Instead, we **maintain backward compatibility** by doing the following:

* **Start the CPU in Legacy Mode**, where it behaves exactly like older CPUs.
    
    * In this mode, none of the new privileged features (MSRs, special instructions, privilege checks) are active.
        
* **Provide additional configuration options** that allow a modern Meta-Program (OS) to intentionally switch the CPU into the newer mode, where:
    
    * privileged vs non-privileged distinction exists
        
    * MSRs become active
        
    * special instructions like RDMSR/WRMSR are enforced
        
    * the privilege flag is checked by the instruction decoder
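The boot-time behaviour described above can be sketched like this. It is a toy model with hypothetical names (`ToyCPU`, `enable_new_mode`, `run_sensitive`). On real x86 hardware the analogous opt-in is the kernel setting the PE bit in `CR0` to leave Real Mode, but nothing below attempts to model that faithfully.

```python
class PrivilegeFault(Exception):
    pass


class ToyCPU:
    def __init__(self):
        self.new_mode = False     # reset state: legacy mode
        self.privileged = False

    def enable_new_mode(self):
        # Hypothetical opt-in step performed only by a modern Meta-Program.
        self.new_mode = True
        self.privileged = True    # the code that opted in is the kernel

    def run_sensitive(self, name):
        if not self.new_mode:
            # Legacy mode: behave like the old CPU, no privilege checks exist.
            return f"{name}: executed (legacy)"
        if not self.privileged:
            raise PrivilegeFault(name)
        return f"{name}: executed (privileged)"
```

An old Meta-Program never calls `enable_new_mode()`, so everything it does keeps working exactly as before. A modern one opts in and from then on gets the privilege checks.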
        

This design ensures that:

* **Old Meta-Programs continue to run normally** without crashing
    
* **Newer Meta-Programs can access and benefit from modern CPU features** whenever they choose to enable them
    
But because we delegated all control to interrupts, we now face another problem.
    

### Problem: Any Process Can Trigger Interrupts - How Do We Protect Privileged Handlers?

In our design so far, any process (user-mode or Meta-Program) can execute an interrupt instruction.

But interrupts always jump directly into the Meta-Program (OS), because only the Meta-Program has registered interrupt handlers.

This creates a new risk:

* If all interrupts enter the Meta-Program, how do we prevent user processes from triggering sensitive or privileged interrupts?
    

If we do nothing, a malicious program could:

* try to reach MSR-related interrupt handlers
    
* attempt to run privileged sequences
    
* modify CPU configuration
    
* bypass privilege checks
    
* or crash the system
    

We need a mechanism to decide **which interrupt handlers a normal process is allowed to invoke**, and which ones must remain **exclusive to the Meta-Program**.

### Solution: Categorizing Interrupt Handlers into Public and Private

To solve this, we divide all interrupt handlers into two categories.

* **Public Interrupts (Allowed for User Processes)**
    
    * These are safe to expose.
        
    * Examples:
        
        * Normal software interrupts
            
        * System call entry points
            
        * Timer notifications
            
        * Basic, non-dangerous interrupts
            
    * A user-mode process *can* invoke these, because they do not give access to any privileged CPU state.
        
    * These are how normal programs request OS services.
        
* **Private Interrupts (Restricted to the Meta-Program Only)**
    
    * These must **never** be directly triggered by user processes.
        
    * Examples:
        
        * MSR-related handlers
            
        * Privileged configuration handlers
            
        * CPU mode-switching handlers
            
        * Memory-management and internal CPU traps
            
        * Anything that modifies or configures hardware state
        
    * If a user process tries to access these:
        
        * The CPU triggers a fault
            
        * The fault enters the Meta-Program
            
        * The Meta-Program kills or blocks the process
            
    * This guarantees the privileged parts of the system remain safe.
        

The CPU and the Meta-Program enforce the separation using:

* Interrupt categories
* Access-control tables
* Privilege checks
* Different entry gates for user vs kernel interrupts
    
Even though *any* process can execute an interrupt instruction, it will reach **only the handlers the Meta-Program has allowed**, and the CPU will **block access** to restricted handlers.
    
* User-mode programs can access safe, public interrupt handlers.
    
* Privileged interrupt handlers remain exclusive to the Meta-Program.
    
* Privilege mode and interrupt categories together ensure complete protection.
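The public/private split can be sketched as a table in which every handler records who may invoke it. This loosely resembles the per-gate privilege level in the x86 interrupt descriptor table, but the `InterruptTable` class and its method names are hypothetical, and the ring numbers simply follow the x86 convention that a lower number means more privilege.

```python
USER, KERNEL = 3, 0   # ring numbers as on x86: lower means more privileged


class GateFault(Exception):
    pass


class InterruptTable:
    def __init__(self):
        # vector -> (handler, least-privileged ring allowed to invoke it)
        self.gates = {}

    def register(self, vector, handler, callable_from=KERNEL):
        # Handlers are private unless the Meta-Program says otherwise.
        self.gates[vector] = (handler, callable_from)

    def software_interrupt(self, vector, caller_ring):
        handler, callable_from = self.gates[vector]
        # The caller must be at least as privileged as the gate demands.
        if caller_ring > callable_from:
            raise GateFault(f"vector {vector:#x} is private")
        return handler()
```

Registering vector `0x80` with `callable_from=USER` makes it a public entry point, while a handler registered with the default stays private: a user-mode `software_interrupt` on it raises `GateFault` instead of running the handler.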
    
</br>

# Additional Concepts

### Important Concepts related to the Problem - Why MSRs Cannot Be Memory-Mapped

Before understanding the design problem, we need to clarify a few important terms:

**Opcode (Operation Code)**

* An **opcode** is the machine-level numeric code that tells the CPU which instruction to execute.
    
* Example: `RDMSR`, `WRMSR`, `ADD`, `MOV` - each has its own binary opcode.
    
</br>

**Memory-Mapped Instruction / Memory-Mapped I/O**

* A design where hardware devices or special registers are assigned **addresses inside normal memory space**, so software accesses them using regular load/store operations:
    
    ```asm
    mov eax, [0xFFFF_FF10]   ; read from device/register
    ```
    
* This works for I/O devices, but not for CPU control registers like MSRs.
    
</br>

**Page Table**

* A **page table** is a data structure used by the MMU to translate **virtual addresses → physical addresses**.
    
* It defines which parts of memory a program can access.
    
</br>

**Per-Process Page Tables**

* Every process gets **its own page table**, defining:
    
    * its own private virtual memory
        
    * its own mappings
        
    * its allowed permissions
        
    * what memory it is isolated from
        
* This is how modern OSes ensure process isolation and prevent data leaks or corruption across processes.
    
* More mappings in page tables → more memory usage → higher **space complexity**.
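Per-process translation can be illustrated with a deliberately simplified model. Real page tables are multi-level structures walked by hardware. Here each table is a flat dictionary from page number to frame number, and `translate`, `process_a`, and `process_b` are hypothetical names used only for this sketch.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    pass


def translate(page_table, virtual_address):
    """Translate a virtual address using a dict of page number -> frame number."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        # No mapping: the process is not allowed to touch this address.
        raise PageFault(hex(virtual_address))
    return page_table[page] * PAGE_SIZE + offset


# Two processes, each with its own private mappings.
process_a = {0: 5, 1: 9}
process_b = {0: 7}
```

The same virtual address `0x10` lands in frame 5 for `process_a` and in frame 7 for `process_b`, and `process_b` faults on page 1 because it has no mapping there. Each extra mapping is one more dictionary entry, which is the space cost mentioned above.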
    
</br>

**Space Complexity (OS Context)**

How much total memory the OS must reserve in:

* virtual address space
* physical memory
* each process’s page table

More reserved regions → heavier memory footprint → more complex memory layouts.
    
</br>

**Unmaintainable Designs**

A design becomes unmaintainable when it:

* requires hacks to keep working
* consumes too much address space
* complicates page tables and process isolation
* becomes fragile with new CPU models
* is difficult for OS developers to maintain or debug
        

### Problem: Why MSRs Cannot Be Memory-Mapped

If MSRs were exposed as normal memory-mapped registers, the CPU would need to assign them fixed addresses like:

```plaintext
0xFFFF_FF00 – 0xFFFF_FFFF → MSR region
```

This immediately creates serious architectural problems:
    

**1\. OS Must Reserve Permanent Address Ranges**

The OS would be forced to permanently reserve these addresses across:

* the kernel’s virtual memory
* every process’s page tables (with allow/deny rules)
* physical memory layouts

This increases **space complexity** and pollutes the memory map.
    
</br>

**2\. Page Tables Become Bloated and Hard to Manage**

* Every process would need to include these MSR addresses:
    
    * either mapped (for kernel use)
        
    * or marked as forbidden (for user mode)
        
* This makes **per-process page tables larger**, more complex, and less efficient.
    
* Extra entries → more TLB pressure → performance drop → more kernel bookkeeping → **unmaintainable long-term.**
    
* **So what is TLB pressure?**
    
    * The **Translation Lookaside Buffer (TLB)** is a small, very fast cache inside the CPU that stores recent **virtual → physical address translations**.
        
    * Without the TLB, every memory access would require a page-table walk, which is slow.
        
    * When we add **extra entries** to page tables (like MSR memory regions):
        
        * The CPU has more translations to remember.
            
        * The TLB fills up faster.
            
        * Entries get evicted more often.
            
        * The CPU has to reload mappings repeatedly.
            
    * This increased load is called **TLB pressure**.
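A tiny simulation makes the effect visible. It assumes a fully associative TLB with least-recently-used eviction, which real TLBs do not necessarily use, and the function name `count_tlb_misses` and the two access patterns are hypothetical.

```python
from collections import OrderedDict


def count_tlb_misses(accesses, capacity):
    """Count misses for a sequence of page numbers on an LRU TLB."""
    tlb = OrderedDict()
    misses = 0
    for page in accesses:
        if page in tlb:
            tlb.move_to_end(page)        # mark as most recently used
        else:
            misses += 1                  # a page-table walk is needed
            tlb[page] = True
            if len(tlb) > capacity:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses


fits = [0, 1, 2, 3] * 10        # working set equals the TLB capacity
spills = [0, 1, 2, 3, 4] * 10   # one extra page in the working set
```

With a capacity of 4, `count_tlb_misses(fits, 4)` is 4 (only the initial cold misses), while `count_tlb_misses(spills, 4)` is 50: a single extra page in a cyclic pattern makes every access miss. That worst case is exaggerated by the access pattern, but it shows why extra mappings are not free.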
        
</br>

**3\. Accidental Access Becomes Common**

Any buggy pointer operation like:

```asm
mov eax, [rax + wrongOffset]
```

might accidentally touch an MSR address and corrupt the CPU configuration, which would be catastrophic.
    
</br>

**4\. No Security Boundary**

User programs could simply try:

```asm
mov eax, [MSR_ADDRESS]
```

forcing the CPU to trap on every attempt. This creates performance overhead and security noise.
    
</br>

**5\. Hardware Becomes More Complicated**

* CPU designers would need:
    
    * dedicated address comparators
        
    * privilege checkers
        
    * memory decoders
        
* All of the above just to protect MSR regions in memory.
    
* This makes CPUs needlessly slower, larger, and more complex.
    

### **Conclusion**

* Mapping MSRs into normal memory would waste address space, inflate per-process page tables, weaken isolation, complicate CPU hardware, and create an overall unmaintainable design.
    
* Therefore, MSR access must use **dedicated special opcodes** (RDMSR, WRMSR) that only run in privileged mode.
    
</br>

## **Why Bare Metal Does Not Work for DOS Anymore**

### **DOS was written for a very specific era of hardware**

* DOS was designed in the early 1980s for:
    
    * 8086 / 80286 CPUs
        
    * Single-core processors
        
    * Real Mode (no protection)
        
    * No privilege levels
        
    * No multitasking
        
    * Specific I/O ports
        
    * BIOS routines (INT 0x10, INT 0x13, etc.)
        
    * Simple memory layout
        
    * Hardware probing via BIOS
        
* So DOS makes *assumptions* such as:
    
    * “Video card is available at this I/O port.”
        
    * “Disk can be accessed using BIOS INT 0x13.”
        
    * “Memory layout is under 1 MB.”
        
    * “Interrupts are handled by BIOS.”
        
    * “CPU boots in 8086 Real Mode.”
        
* These assumptions were **true** at that time.
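The "under 1 MB" assumption follows directly from how the 8086 forms addresses in Real Mode: a 16-bit segment is shifted left by four bits and added to a 16-bit offset, and the chip has only 20 address lines. The helper name `real_mode_address` below is hypothetical, and the wrap-around models the original 8086 (later CPUs with the A20 line enabled do not wrap).

```python
def real_mode_address(segment, offset):
    """Physical address the 8086 forms from a 16-bit segment:offset pair."""
    linear = (segment << 4) + offset
    return linear & 0xFFFFF   # 20 address lines: wrap at 1 MiB


ADDRESSABLE_BYTES = 1 << 20   # 1,048,576 bytes = 1 MiB
```

For example, `real_mode_address(0xB800, 0x0000)` gives `0xB8000`, the classic colour text-mode video buffer, and `real_mode_address(0xFFFF, 0x0010)` wraps around to `0x00000`.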
    

### **Modern bare-metal hardware no longer satisfies DOS assumptions**

* Today’s bare-metal systems:
    
    * boot using UEFI → not BIOS
        
    * hand control to the OS loader already in 64-bit mode (long mode)
        
    * do not expose old I/O ports
        
    * do not emulate BIOS interrupts
        
    * do not provide Real Mode drivers
        
    * use modern bus structures (PCIe, ACPI)
        
    * use protected/privileged mode architecture
        
* So if you boot DOS directly on modern hardware:
    
    * DOS looks for hardware that no longer exists.
        
    * Calls BIOS interrupts that UEFI does not provide.
        
    * Assumes CPU is in real mode (it isn’t).
        
    * Assumes disks respond to old INT 0x13 routines (they don’t).
        
* Result: DOS cannot run on bare-metal UEFI hardware because its fundamental hardware assumptions are broken.
    
</br>

## **Why DOS can still work (Legacy Mode / VM)**

### On legacy BIOS systems

* Old BIOS motherboards still provide the environment DOS expects:
    
    * Real Mode
        
    * BIOS interrupts
        
    * Classic I/O ports
        
    * Old memory model
        
* So DOS boots perfectly.
    

### On modern CPUs through Legacy Compatibility Mode

* Even new Intel/AMD CPUs still support **8086 Real Mode** for compatibility.
    
* The problem is:
    
    * **UEFI does NOT provide the BIOS interrupt layer DOS requires.**
        
* But if the motherboard includes:
    
    * “CSM mode” (Compatibility Support Module),
        
    * “Legacy Boot” option
        
* Then the system temporarily provides BIOS-like services → DOS works.
    

### On VMs

* VMware, VirtualBox, QEMU, DOSBox all **emulate**:
    
    * BIOS
        
    * INT 0x13 / 0x10
        
    * Real Mode
        
    * ISA/PCI devices
        
* So DOS runs flawlessly.
    
</br>

## **Why UEFI Replaced BIOS**

### Limitations of BIOS

* BIOS:
    
    * had to probe every device manually
        
    * ran in 16-bit Real Mode
        
    * operated within 1 MB of memory
        
    * depended on slow polling loops
        
    * lacked security features
        
    * had no standard for drivers
        

### Limitations Fixed by UEFI

* UEFI introduces:
    
    * 32-bit or 64-bit execution
        
    * No probing - hardware reports itself to the UEFI (device discovery)
        
    * Secure Boot (signed bootloaders)
        
    * Chain of Trust
        
    * NVRAM boot entries
        
    * Drivers written in UEFI itself
        
    * Fast booting
        
    * Direct loading of OS kernel (no need for a bootloader in many cases)
        
* UEFI became common on mainstream PCs around **2009–2010**.
    
</br>

## **Why UEFI Doesn’t Support DOS**

* UEFI was never intended to support 1980s software.
    
    * 8086 software is no longer common
        
    * Old BIOS interrupts are not present
        
    * DOS depends entirely on BIOS services that UEFI doesn’t implement
        
    * UEFI expects the OS to handle its own drivers