Disclaimer

⚠️ Where the MMU walks, addresses shiver — for it decides which memories live… and which are exiled to the void.

The following content dives deep into the shadows of system memory — where addresses deceive, and pages guard their secrets.

Students and beginners, proceed at your own risk — the MMU remembers everything you do.

In this OS series, the focus will be on the Operating System (software context) components, not the hardware context of the components. (For the hardware context, refer to computer architecture.)

Special Thanks

Heartfelt gratitude to Mr. Adhakshoj Mishra Ji for his insightful session and for reviewing this blog.

A sincere thanks as well to the BreachForce Community Members for sharing their valuable notes, and to the BreachForce Community Volunteers for helping collate and refine this content.

Preface

In the last blog, we explored how the Memory Management Unit (MMU) was born - what challenges it solved, how it reshaped the way systems handle memory, and how it paved the way for powerful abstraction layers that give us fine-grained control over both the CPU and RAM.

Now in Part-2, we’ll explore a series of problems along with their possible solutions. As we introduce these solutions, we’ll inevitably encounter smaller sub-problems—which we’ll resolve using their own mini-solutions. With each refinement, we’ll iteratively evolve and improve our overall design.

One such problem is this: a process might overwrite the abstraction layer tables (page tables) if proper safeguards aren’t implemented. So how do we prevent that from happening?

MMU

Problem: How do we Prevent Accidental Over-write?

In the last blog, we allowed our Meta-Program (the early OS) to configure the address translation tables. But this introduces a serious issue: if the Meta-Program can do it, any other user process could also attempt to modify these tables.
How do we prevent untrusted processes from tampering with the memory translation mechanism?
- ❌ No normal user process may modify the MMU tables
- ✔ Only the OS (Meta-Program) may modify them safely

Solution

Solution 0: Merge the Abstraction Layer Into the CPU

It is time to do something with the CPU and the Meta-program to solve the above problem.

We will merge the Abstraction Layer (MMU) into the CPU because:
- It is not possible to directly upgrade the CPU.
- The abstraction layer logically sits closer to the CPU than RAM.
- It is easier to give it a special interconnect than to fiddle with the general-purpose interconnect, and there is less chance of things going wrong.

Note: Interconnect is the hardware wiring or communication pathway that links different components inside a CPU or system so they can exchange data and signals.

Therefore, no separate buses are needed for CPU → Abstraction Layer and Abstraction Layer → RAM communication.

With this, the definition of the CPU changes to a CPU Model.

Design Architecture: Modern CPU Model (CPU + Abstraction Layer)

We have finally created a CPU Model that houses the Abstraction Layer (MMU) inside it.

Problem 1: No Separate Bus between CPU and Abstraction Layer

Without the general-purpose buses, how will the CPU communicate with the Abstraction Layer (MMU)?
The Abstraction Layer cannot do its job unless the CPU can talk to it, but we cannot reuse the General Purpose Interconnect for this, because:
- Any instruction could possibly access it
- User processes could accidentally or intentionally misuse it
We need some special-purpose interconnect.

Solution 1: Introduction of Special Purpose Interconnect (SPI)

We have established a special interconnect between the CPU and Abstraction Layer.
With this, the Interconnect has been divided into
- General Purpose Interconnect: A shared, standard data pathway used by the CPU to communicate with memory and most hardware.
- Special-Purpose Interconnect: A private, restricted hardware pathway inside the CPU that normal instructions cannot access.
The SPI lets the CPU talk to the MMU securely.
However, this creates a new problem for us.

Problem 2: No Special Instructions between CPU and the Abstraction Layer

Because the SPI is private, existing general-purpose instructions cannot communicate with the Abstraction Layer through it: they don’t know how to use it.
So, we need a clean way to separate normal instructions from those that perform privileged, hardware-level operations.
This raises a design question:

Can we add a dedicated unit inside the CPU that understands, stores, executes, and protects these special instructions?

Solution 2: Introduction of Model Specific Registers (MSR)

To solve this problem, we introduce a new class of registers called Model Specific Registers (MSRs).

MSRs are special registers whose presence, purpose, and count vary from CPU model to CPU model (hence the name Model Specific).

CPU vendors document these in their datasheets, and OS developers must consult these to understand the available MSRs.
These MSRs act as a dedicated interface between the CPU and the Abstraction Layer, and only special-purpose instructions are allowed to read/write them.
To ensure general-purpose instructions cannot accidentally or intentionally modify these sensitive registers, we introduce new special instructions:
- RDMSR - Read data from an MSR
- WRMSR - Write data into an MSR
These special instructions prevent accidental overwrites by user-mode software, and old software continues to run safely, since it has no knowledge of these new instructions.
However, if a malicious or “oversmart” developer tries to use RDMSR or WRMSR inside a normal program, the CPU will:
- Trigger a fault (an interrupt raised by the offending instruction),
- Hand control back to the Meta-Program (OS),
- Which can then decide whether to kill the misbehaving process or silently handle it.

Importantly, we cannot expose these special registers as normal memory-mapped locations: that would require fixing rigid address ranges inside the OS, leading to immense space complexity and unmaintainable designs. We will explore more on this at the end of the blog.

Thus, MSRs and their special instructions together:

- Compute privileged values
- Store critical configuration data
- Load those values back into the CPU
- Act as a secure hardware control interface for the Meta-Program (OS)
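To make the idea concrete, here is a minimal Python sketch of a CPU whose MSRs live in their own register file, separate from the memory address space. Everything in it (`ToyCPU`, `PrivilegeFault`, the `privileged` attribute) is a hypothetical teaching model, not real hardware behavior; the index `0xC0000080` is the number of the x86 EFER MSR and is used here only as a sample key.

```python
class PrivilegeFault(Exception):
    """Raised when unprivileged code attempts a special-purpose instruction."""


class ToyCPU:
    def __init__(self):
        self.memory = bytearray(256)    # general-purpose address space (RAM)
        self.msrs = {0xC0000080: 0}     # MSR file, keyed by MSR index, not by address
        self.privileged = False         # hypothetical stand-in for "is this the Meta-Program?"

    def load(self, address):
        # An ordinary load can only reach RAM; there is no address that maps to an MSR.
        return self.memory[address]

    def rdmsr(self, index):
        if not self.privileged:
            raise PrivilegeFault("RDMSR outside privileged code")
        return self.msrs[index]

    def wrmsr(self, index, value):
        if not self.privileged:
            raise PrivilegeFault("WRMSR outside privileged code")
        self.msrs[index] = value
```

In this model, old software that only ever calls `load` keeps working unchanged, while a normal program that calls `wrmsr` gets a `PrivilegeFault` instead of changing CPU configuration.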

But once again, adding MSRs introduces a new challenge for us.

Problem 3: How Does the CPU Know Who Is Allowed to Run Special Instructions?

Older software will simply never invoke RDMSR or WRMSR - these instructions didn’t exist back then. So they naturally stay safe.
But what about new modern software that can attempt to execute these special instructions?
How will the CPU differentiate between:
- trusted Meta-Program (which should be allowed), and
- normal user programs (which must be blocked)?

Solution 3: Introduction of Privilege Mode

We cannot base our solution on:
- address ranges
- Hard-coded memory locations
- or page-table tricks
Such approaches create massive space complexity and unmaintainable OS designs.
Instead, we need something simpler and more reliable.
We solve this in three steps:

Step 1: A Flag Inside the CPU to Identify the Meta-Program

We introduce a special privilege flag inside the CPU.
The instruction decoder will check this flag before executing any special-purpose instruction.
If the flag is set → the instruction is allowed
If the flag is unset → the CPU triggers a fault/interrupt
This ensures that:
- General programs cannot execute privileged instructions
- Only the Meta-Program (OS kernel) can
- Misbehaving processes will be immediately terminated or trapped
This is much cleaner than creating fixed memory regions or address-based gating.
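The flag check can be sketched in a few lines of Python. This is a toy model under stated assumptions: `decode_and_execute`, `PRIVILEGED_OPS`, and `Fault` are illustrative names, and a real decoder works on binary opcodes rather than strings.

```python
PRIVILEGED_OPS = {"RDMSR", "WRMSR"}


class Fault(Exception):
    """Models the fault/interrupt the CPU raises for a blocked instruction."""


def decode_and_execute(op, privilege_flag):
    # The decoder consults the flag only for special-purpose instructions.
    if op in PRIVILEGED_OPS and not privilege_flag:
        raise Fault(f"{op} attempted with privilege flag clear")
    return f"executed {op}"
```

Note that the check depends only on the flag, never on where the instruction sits in memory, which is what makes it simpler than address-based gating.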

Step 2: Use Interrupts to Switch Into the Meta-Program

Only the Meta-Program (OS) can register interrupt handlers.
So when a user program tries to execute RDMSR or WRMSR:
1. The CPU sees the flag is not set
2. The CPU raises an interrupt / trap
3. The interrupt handler registered by the Meta-Program runs
4. The Meta-Program decides whether to:
  - kill the process
  - log it
  - emulate the behavior
  - or deny access
This gives us a natural mechanism for control.

Essentially: Triggering an interrupt = entering the Meta-Program.

This gives full control to the OS at all times.
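The four steps above can be sketched as a small Python model. All names here (`ToyCPU`, `register_trap_handler`, `meta_program_policy`) are hypothetical, and the policy of always killing the process is just one of the choices listed; the sketch also assumes the Meta-Program is the first code to run, so it can register its handler before any user code exists.

```python
PRIVILEGED_OPS = {"RDMSR", "WRMSR"}


class ToyCPU:
    def __init__(self):
        self.privilege_flag = True   # assumption: the Meta-Program runs first at boot
        self.trap_handler = None

    def register_trap_handler(self, handler):
        # Only privileged code may install the handler.
        if not self.privilege_flag:
            raise PermissionError("only the Meta-Program may register handlers")
        self.trap_handler = handler

    def run_user_instruction(self, pid, op):
        self.privilege_flag = False          # user code runs with the flag clear
        if op in PRIVILEGED_OPS:
            return self._trap(pid, op)       # steps 1-2: flag not set, raise a trap
        return "executed"

    def _trap(self, pid, op):
        self.privilege_flag = True           # step 3: the Meta-Program's handler runs
        try:
            return self.trap_handler(pid, op)
        finally:
            self.privilege_flag = False      # control returns to user mode


audit_log = []


def meta_program_policy(pid, op):
    # Step 4: the Meta-Program decides; this sketch logs the attempt and kills the process.
    audit_log.append((pid, op))
    return "killed"


cpu = ToyCPU()
cpu.register_trap_handler(meta_program_policy)
```

Calling `cpu.run_user_instruction(7, "WRMSR")` routes the attempt into `meta_program_policy`, while an ordinary instruction such as `"ADD"` never leaves user mode.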

Step 3: Using the Meta-Program to reset the flag

Whenever the OS switches into kernel code:

set privilege flag = ON

Whenever it returns to user mode:

clear privilege flag = OFF

So, on every switch between kernel and user mode:
- Before running kernel code → set the privilege flag
- Before resuming a user process → clear the privilege flag
This ensures:
- Special instructions only run in the Meta-Program
- User processes always run with privilege flag OFF
- The kernel cannot accidentally “leak” privilege into user mode
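One way to picture the "no leak" guarantee is a Python context manager that raises the flag on kernel entry and always restores it on exit, even if the kernel code fails. This is only a sketch: `ToyCPU` and `kernel_entry` are hypothetical names, and on real hardware the switch is performed by the CPU's entry and return instructions rather than by software bookkeeping like this.

```python
from contextlib import contextmanager


class ToyCPU:
    def __init__(self):
        self.privilege_flag = False   # user mode by default in this sketch


@contextmanager
def kernel_entry(cpu):
    previous = cpu.privilege_flag     # remember the mode we came from
    cpu.privilege_flag = True         # entering kernel code: flag ON
    try:
        yield cpu
    finally:
        cpu.privilege_flag = previous # leaving kernel code: restore the caller's mode
```

Because the restore sits in a `finally` clause, an error inside the kernel block cannot leave user code running with the flag set.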

When the system boots:
- The bootloader loads the Meta-Program (OS)
- The OS sets the privilege flag
- Execution continues with full control
- The OS then switches the CPU into the appropriate mode, for example on x86:
  - Real Mode (16-bit, 8086-compatible) → Protected Mode (32-bit)
  - Protected Mode → Long Mode (64-bit)

Using the above approach, we just invented Privileged Mode.

Categories of Instructions in CPU

By adding this privilege flag check, we have effectively created two types of instructions inside the CPU:
1. General-Purpose Instructions
  - Normal instructions
  - Execute regardless of privilege flag
  - Available in user mode
2. Special-Purpose Instructions
  - Privileged operations (RDMSR, WRMSR, I/O instructions, etc.)
  - Execute only when privilege flag is ON
  - Otherwise trigger an interrupt
This separation is the foundation of user mode vs kernel mode in all modern CPUs.
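Switching from user mode to kernel mode therefore cannot rely on a special-purpose instruction, which user code is not allowed to run; it happens through a controlled entry point such as an interrupt or a dedicated system-call instruction.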

Privileged Mode

Putting this all together:
- A privilege-identifying CPU flag
- An instruction decoder that checks the flag
- Interrupts that hand control to the Meta-Program
- Special instructions restricted to privileged code
- Kernel sets/clears the flag on every switch between kernel and user mode

Congrats, this is Privileged Mode. The CPU now distinguishes between user mode and kernel mode.

User Mode → normal code
Kernel Mode → privileged operations

Even I/O operations are restricted using this same mechanism.

With this, we have dealt with the accidental overwrite problem.

How Real Systems Implement Privileged Mode

On x86, privilege mode switching is exposed through different mechanisms:

32-bit systems
- int 0x80 (the vector Linux uses for system calls) → switch from user mode → kernel mode
- iret → return from kernel mode → user mode
- sysenter / sysexit → a faster entry/return pair added later by Intel
64-bit systems
- syscall → enter privilege mode
- sysret → return to user mode
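From a user program's point of view, these instructions are hidden behind library calls. As a minimal sketch, the Python calls below are thin wrappers over operating-system services (create a pipe, write to it, read from it). On a typical x86-64 Linux build each one ends in a `syscall` instruction inside the C library, though that is an assumption about the platform rather than something this code can demonstrate.

```python
import os

# Each call below asks the kernel to do work on the program's behalf.
read_end, write_end = os.pipe()        # kernel creates a pipe
os.write(write_end, b"hello kernel")   # kernel copies the bytes into the pipe
data = os.read(read_end, 12)           # kernel copies them back out
os.close(read_end)
os.close(write_end)
```

The program never touches privileged state itself; it only requests a service and gets the result back once the kernel returns to user mode.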

Linux, Windows, BSD, and macOS all rely on this kind of mechanism, although the exact interrupt vector or instruction differs between them. Privileged mode, however, gave birth to another problem.

Problem: What happens when older Meta-Program boots?

When we introduce MSRs and special instructions like RDMSR and WRMSR, the CPU now expects the Meta-Program (OS) to perform extra setup before these instructions can be safely used:

- initialize MSRs
- register interrupt handlers
- configure privilege mode
- set the privilege flag
- prepare CPU control structures

A modern OS understands these requirements and configures everything properly.

But older Meta-Programs:
- don’t know MSRs exist
- don’t know these new instructions (RDMSR/WRMSR) exist
- don’t register the required handlers
- don’t configure privilege flags
- don’t perform any of the required setup

So if the CPU were to boot directly into the new privileged architecture, an older OS would:
- fail instantly
- get stuck on an unexpected MSR access
- crash due to missing handlers
- or fall into undefined behavior

We have 2 options now:
- Either we kiss old Meta-Programs goodbye and enrage our users,
- Or we try to maintain some backward compatibility.

Solution: Boot the CPU in Legacy Mode

Backward compatibility needs two pieces working together:
- The CPU boots in legacy mode.
- A modern Meta-Program intentionally switches to the newer mode.

To avoid breaking older Meta-Programs, the CPU must not start directly in the new privileged architecture.

Instead, we maintain backward compatibility by doing the following:

1. Start the CPU in Legacy Mode, where it behaves exactly like older CPUs.
  - In this mode, none of the new privileged features (MSRs, special instructions, privilege checks) are active.
2. Provide additional configuration options that allow a modern Meta-Program (OS) to intentionally switch the CPU into the newer mode, where:
  - privileged vs non-privileged distinction exists
  - MSRs become active
  - special instructions like RDMSR/WRMSR are enforced
  - the privilege flag is checked by the instruction decoder

This design ensures that:

- Old Meta-Programs continue to run normally without crashing
- Newer Meta-Programs can access and benefit from modern CPU features whenever they choose to enable them
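The opt-in design can be sketched as a toy Python model. The names (`ToyCPU`, `enable_modern_mode`, `NEW_OPS`) are hypothetical, and treating the new instructions as unavailable in legacy mode is an assumption of this sketch, not a statement about any specific CPU.

```python
NEW_OPS = {"RDMSR", "WRMSR"}


class Fault(Exception):
    pass


class ToyCPU:
    def __init__(self):
        self.mode = "legacy"          # power-on state: behave like the old CPU
        self.privilege_flag = False

    def enable_modern_mode(self):
        # The deliberate opt-in step; the caller becomes the privileged Meta-Program.
        self.mode = "modern"
        self.privilege_flag = True

    def execute(self, op):
        if op not in NEW_OPS:
            return f"executed {op}"   # old instructions work in either mode
        if self.mode == "legacy":
            raise Fault(f"{op} is not available in legacy mode")
        if not self.privilege_flag:
            raise Fault(f"{op} requires the privilege flag")
        return f"executed {op}"
```

An old Meta-Program never calls `enable_modern_mode` and never issues the new instructions, so it sees exactly the machine it was written for.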
But because we delegated all control to interrupts, we now face another problem.

Problem: Any Process Can Trigger Interrupts - How Do We Protect Privileged Handlers?

In our design so far, any process (user-mode or Meta-Program) can execute an interrupt instruction.

But interrupts always jump directly into the Meta-Program (OS), because only the Meta-Program has registered interrupt handlers.

This creates a new risk:

If all interrupts enter the Meta-Program, how do we prevent user processes from triggering sensitive or privileged interrupts?

If we do nothing, a malicious program could:

- try to reach MSR-related interrupt handlers
- attempt to run privileged sequences
- modify CPU configuration
- bypass privilege checks
- or crash the system

We need a mechanism to decide which interrupt handlers a normal process is allowed to invoke, and which ones must remain exclusive to the Meta-Program.

Solution: Categorizing Interrupt Handlers into Public and Private

To solve this, we divide all interrupt handlers into two categories.

Public Interrupts (Allowed for User Processes)
- These are safe to expose.
- Examples:
  - Normal software interrupts
  - System call entry points
  - Timer notifications
  - Basic, non-dangerous interrupts
- A user-mode process can invoke these, because they do not give access to any privileged CPU state.
- These are how normal programs request OS services.
Private Interrupts (Restricted to the Meta-Program Only)
- These must never be directly triggered by user processes.
- Examples:
  - MSR-related handlers
  - Privileged configuration instructions
  - CPU mode-switching handlers
  - Memory-management and internal CPU traps
  - Anything that modifies or configures hardware state
- If a user process tries to access these:
  - The CPU triggers a fault
  - The fault enters the Meta-Program
  - The Meta-Program kills or blocks the process
- This guarantees the privileged parts of the system remain safe.

The CPU + Meta-Program enforce the separation using:
- Interrupt categories
- Access-control tables
- Privilege checks
- Different entry gates for user vs kernel interrupts

Even though any process can execute an interrupt instruction, it will reach only the handlers the Meta-Program has allowed, and the CPU will block access to restricted handlers.

In short:
- User-mode programs can access safe, public interrupt handlers.
- Privileged interrupt handlers remain exclusive to the Meta-Program.
- Privilege mode and interrupt categories together keep privileged handlers out of reach of user code.
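A minimal Python sketch of such an access-control table follows. `InterruptTable`, `user_callable`, and `GateFault` are hypothetical names; the idea loosely mirrors the per-gate privilege field that x86 keeps in its interrupt descriptor table. Vector `0x80` is used as the public example because Linux uses it for 32-bit system calls, and vector 13 is the x86 general protection fault.

```python
class GateFault(Exception):
    """Raised when user code targets a handler it is not allowed to invoke."""


class InterruptTable:
    def __init__(self):
        self.gates = {}   # vector number -> (handler, user_callable)

    def register(self, vector, handler, user_callable):
        self.gates[vector] = (handler, user_callable)

    def software_interrupt(self, vector, from_user):
        handler, user_callable = self.gates[vector]
        # The check happens before the handler runs, so private code is never entered.
        if from_user and not user_callable:
            raise GateFault(f"vector {vector:#x} is private")
        return handler()


table = InterruptTable()
table.register(0x80, lambda: "system call serviced", user_callable=True)
table.register(13, lambda: "protection fault handled", user_callable=False)
```

With this table, `table.software_interrupt(0x80, from_user=True)` reaches its handler, while the same call with vector 13 raises `GateFault`; the Meta-Program itself (`from_user=False`) can reach both.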

Additional Concepts

Before understanding the design problem, we need to clarify a few important terms:

Opcode (Operation Code)

An opcode is the machine-level numeric code that tells the CPU which instruction to execute.
Example: RDMSR, WRMSR, ADD, and MOV are mnemonics, and each maps to its own binary opcode (on x86, RDMSR is encoded as 0F 32 and WRMSR as 0F 30).
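As a small illustration, the Python lookup table below maps a few x86 opcode byte sequences to their mnemonics. The `OPCODES` table and `mnemonic` helper are illustrative names, and the table covers only a handful of fixed encodings; real x86 decoding also involves prefixes, operands, and variable lengths.

```python
# A few fixed x86 encodings (bytes shown in hexadecimal).
OPCODES = {
    bytes([0x0F, 0x30]): "WRMSR",
    bytes([0x0F, 0x32]): "RDMSR",
    bytes([0x0F, 0x05]): "SYSCALL",
    bytes([0xCF]): "IRET",
    bytes([0x90]): "NOP",
}


def mnemonic(code):
    """Return the mnemonic for an exact opcode byte sequence, or 'unknown'."""
    return OPCODES.get(bytes(code), "unknown")
```

For example, `mnemonic([0x0F, 0x32])` looks up the RDMSR encoding, and any sequence missing from the table comes back as `"unknown"`.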

Memory-Mapped Instruction / Memory-Mapped I/O

A design where hardware devices or special registers are assigned addresses inside normal memory space, so software accesses them using regular load/store operations:
```
  mov eax, [0xFFFF_FF10]   ; read from device/register
```
This works for I/O devices, but not for CPU control registers like MSRs.

Page Table

A page table is a data structure used by the MMU to translate virtual addresses → physical addresses.
It defines which parts of memory a program can access.

Per-Process Page Tables

Every process gets its own page table, defining:
- its own private virtual memory
- its own mappings
- its allowed permissions
- what memory it is isolated from
This is how modern OSes ensure process isolation and prevent data leaks or corruption across processes.
More mappings in page tables → more memory usage → higher space complexity.
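The translation itself can be sketched in Python. This is a deliberately simplified model: it assumes a single flat table per process (a plain dict from page number to frame number) and 4 KiB pages, whereas real CPUs use multi-level tables with permission bits. `translate`, `PageFault`, `process_a`, and `process_b` are illustrative names.

```python
PAGE_SIZE = 4096  # assumed page size of 4 KiB


class PageFault(Exception):
    """Raised when a virtual address has no mapping in the process's table."""


def translate(page_table, virtual_address):
    # Split the address into a page number and an offset within that page.
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise PageFault(hex(virtual_address))
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset


# Two processes, each with its own table: virtual page -> physical frame.
process_a = {0: 5, 1: 2}
process_b = {0: 9}
```

The same virtual address resolves to different physical memory in each process, and an address that process B never mapped (such as `0x1004`) raises `PageFault` instead of reaching process A's data.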

Space Complexity (OS Context)

How much total memory the OS must reserve in:
- virtual address space
- physical memory
- each process’s page table

More reserved regions → heavier memory footprint → more complex memory layouts.

Unmaintainable Designs

A design becomes unmaintainable when it:
- requires hacks to keep working
- consumes too much address space
- complicates page tables and process isolation
- becomes fragile with new CPU models
- is difficult for OS developers to maintain or debug

Problem: Why MSRs Cannot Be Memory-Mapped

If MSRs were exposed as normal memory-mapped registers, the CPU would need to assign them fixed addresses like:

    0xFFFF_FF00 – 0xFFFF_FFFF → MSR region

This immediately creates serious architectural problems:

1. OS Must Reserve Permanent Address Ranges

The OS would be forced to permanently reserve these addresses across:
- the kernel’s virtual memory
- every process’s page tables (with allow/deny rules)
- physical memory layouts

This increases space complexity and pollutes the memory map.

2. Page Tables Become Bloated and Hard to Manage

Every process would need to include these MSR addresses:
- either mapped (for kernel use)
- or marked as forbidden (for user mode)
This makes per-process page tables larger, more complex, and less efficient.
Extra entries → more TLB pressure → performance drop → more kernel bookkeeping → unmaintainable long-term.
So what is TLB pressure?
- The Translation Lookaside Buffer (TLB) is a small, very fast cache inside the CPU that stores recent virtual → physical address translations.
- Without the TLB, every memory access would require walking the page table in memory, which is slow.
- When extra mappings (like MSR memory regions) are added and actually accessed:
  - The CPU has more translations to remember.
  - The TLB fills up faster.
  - Entries get evicted more often.
  - The CPU has to reload mappings repeatedly.
- This increased load is called TLB pressure.
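The effect can be seen in a toy simulation. The sketch below assumes a tiny TLB with least-recently-used eviction; real TLBs use their own replacement policies and sizes, and `ToyTLB` and `run` are illustrative names.

```python
from collections import OrderedDict


class ToyTLB:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # page number -> frame number, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, page_number, page_table):
        if page_number in self.entries:
            self.entries.move_to_end(page_number)   # mark as most recently used
            self.hits += 1
            return self.entries[page_number]
        self.misses += 1
        frame = page_table[page_number]             # stands in for the slow page-table walk
        self.entries[page_number] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)        # evict the least recently used entry
        return frame


def run(pattern, capacity):
    """Replay a sequence of page accesses and return (hits, misses)."""
    page_table = {n: n + 100 for n in range(16)}
    tlb = ToyTLB(capacity)
    for page in pattern:
        tlb.lookup(page, page_table)
    return tlb.hits, tlb.misses
```

With a 4-entry TLB, cycling over 4 pages twice (`run([0, 1, 2, 3] * 2, 4)`) misses only on the first pass. Adding a single extra page to the cycle (`run([0, 1, 2, 3, 4] * 2, 4)`) makes every access miss in this model, because each new entry evicts one that is about to be needed again.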

3. Accidental Access Becomes Common

Any buggy pointer operation like:

    mov eax, [rax + wrongOffset]

might accidentally touch an MSR address and break the CPU configuration, which would be catastrophic.

4. No Security Boundary

User programs could simply try:

    mov eax, [MSR_ADDRESS]

forcing the CPU to trap on every attempt.

This creates performance overhead and security noise.

5. Hardware Becomes More Complicated

CPU designers would need:
- dedicated address comparators
- privilege checkers
- memory decoders
All of the above just to protect MSR regions in memory.
This makes CPUs slower, larger, and more complex unnecessarily.

Conclusion

Mapping MSRs into normal memory would waste address space, inflate per-process page tables, weaken isolation, complicate CPU hardware, and create an overall unmaintainable design.
Therefore, MSR access must use dedicated special opcodes (RDMSR, WRMSR) that only run in privileged mode.

Why Bare Metal Does Not Work for DOS Anymore

DOS was written for a very specific era of hardware

DOS dates from the early 1980s and remained in widespread use into the early 1990s. It was designed for:
- 8086 / 80286 CPUs
- Single-core processors
- Real Mode (no protection)
- No privilege levels
- No multitasking
- Specific I/O ports
- BIOS routines (INT 0x10, INT 0x13, etc.)
- Simple memory layout
- Hardware probing via BIOS
So DOS makes assumptions such as:
- “Video card is available at this I/O port.”
- “Disk can be accessed using BIOS INT 0x13.”
- “Memory layout is under 1 MB.”
- “Interrupts are handled by BIOS.”
- “CPU boots in 8086 Real Mode.”
These assumptions were true at that time.

Modern bare-metal hardware no longer satisfies DOS assumptions

Today’s bare-metal systems:
- boot using UEFI → not BIOS
- hand control to the OS loader in 64-bit mode (long mode), because the UEFI firmware leaves Real Mode early in boot
- often do not expose the old I/O ports
- do not emulate BIOS interrupts
- do not provide Real Mode drivers
- use modern bus structures (PCIe, ACPI)
- use protected/privileged mode architecture
So if you boot DOS directly on modern hardware:
- DOS looks for hardware that no longer exists.
- Calls BIOS interrupts that UEFI does not provide.
- Assumes CPU is in real mode (it isn’t).
- Assumes disks respond to old INT 0x13 routines (they don’t).
Result: DOS cannot run on bare-metal UEFI hardware because its fundamental hardware assumptions are broken.

Why DOS can still work (Legacy Mode / VM)

On legacy BIOS systems

Old BIOS motherboards still emulate the environment DOS expects:
- Real Mode
- BIOS interrupts
- Classic I/O ports
- Old memory model
So DOS boots perfectly.

On modern CPUs through Legacy Compatibility Mode

Even new Intel/AMD CPUs still support 8086 Real Mode for compatibility.
The problem is:
- UEFI does NOT provide the BIOS interrupt layer DOS requires.
But if the motherboard includes:
- “CSM mode” (Compatibility Support Module),
- “Legacy Boot” option
Then the system temporarily provides BIOS-like services → DOS works.

On VMs

VMware, VirtualBox, QEMU, DOSBox all emulate:
- BIOS
- INT 0x13 / 0x10
- Real Mode
- ISA/PCI devices
So DOS runs flawlessly.

Why UEFI Replaced BIOS

Limitations of BIOS

BIOS:
- had to probe every device manually
- ran in 16-bit Real Mode
- operated under 1 MB of memory
- depended on slow polling loops
- lacked security
- had no standard for drivers

Limitations Fixed by UEFI

UEFI introduces:
- 32-bit or 64-bit execution
- No probing - hardware reports itself to the UEFI (device discovery)
- Secure Boot (signed bootloaders)
- Chain of Trust
- NVRAM boot entries
- Drivers written in UEFI itself
- Fast booting
- Direct loading of OS kernel (no need for a bootloader in many cases)
UEFI came around 2009–2010 for mainstream PCs.

Why UEFI Doesn’t Support DOS

UEFI never intended to support 1980s software.
- 8086 software is no longer common
- Old BIOS interrupts are not present
- DOS depends entirely on BIOS services that UEFI doesn’t implement
- UEFI expects the OS to handle its own drivers
