Lecture 2 - Reinventing MMU [Part - 1]
Disclaimer
⚠️ In the depths of memory lies madness — where pages blur, addresses warp, and only the brave dare to map reality.
The following content dives deep into the shadows of system memory — where addresses deceive, and pages guard their secrets.
Students and beginners, proceed at your own risk — the MMU remembers everything you do.
In this OS series, the focus will be on the Operating System (software context) components, not the hardware context of the components. (For the hardware context, refer to computer architecture.)
Special Thanks
Heartfelt gratitude to Mr. Adhakshoj Mishra Ji for his insightful session and for reviewing this blog.
A sincere thanks as well to the BreachForce Community Members for sharing their valuable notes, and to the BreachForce Community Volunteers for helping collate and refine this content.
Preface
In the last blog, we explored how the Context Switching Mechanism allows the CPU to alternate between processes, creating the illusion of multitasking on a single processor.
In this blog, we will discuss the problems along with their possible solutions. Those solutions will introduce smaller sub-problems of their own, which we will resolve with further mini-solutions, modifying the design at each step.
MMU
Design Architecture 1: CPU + RAM
Design
We have two processes sharing the same RAM via the context-switching mechanism. The assumption here is that a process will not occupy one contiguous region; instead, each process gets chunks of RAM, and these chunks are not equally sized partitions. If multiple processes are loaded, the RAM will look like this:

In the above design, the assumption is that the CPU runs one program at a time.
For some context: assume that the execution of Process 2 is in progress. Process 2 is free to consume the entire RAM while it is being executed.
This design leads to some problems.
Problem
Is there any option to limit RAM usage per process, so that one process uses only a certain percentage of RAM?
If two processes are executed simultaneously and one process encounters an issue, the other process also gets impacted.
We will assume that there is a flag inside the RAM to check whether a particular part of the RAM is occupied by a process or not.
What’s the probability that another process wouldn’t access that particular part? Who ensures that such things don’t happen?
We cannot leave everything to an honor system between programs, can we?
If a process goes rogue, the OS and all other processes can go kaboom — corrupted code, overwritten data, and unstable execution everywhere.
(Classic overflow and overwrite chaos — oof.)
How can we prevent such things from happening?
Solution
There’s no software-based solution for this — because, well, software alone can’t handle it (or maybe because software designers got lazy 😏).
So, back to hardware we go.
The solution is to insert an abstraction layer between the CPU and RAM.
From the CPU's point of view, it will look like RAM, feel like RAM, and behave like RAM; but technically, it will not be RAM.
The abstraction layer will
Pretend to be RAM in front of CPU.
Pretend to be CPU in front of RAM.
The CPU does not need to know what is actually connected to the abstracted RAM
We will insert a circuit (the abstraction layer), physically located inside the CPU, which will pretend to be RAM.
The CPU and RAM will be connected using a data bus and an address bus, which will be understood by both the CPU and the RAM.
This marks the birth of the MMU (Memory Management Unit) inside the CPU (as of 2025).
Address Bus
The address bus carries the memory address of the data that the CPU wants to read from or write to in RAM (or other memory-mapped devices).
It is unidirectional, i.e., addresses flow from the CPU to the memory.
The width of the address bus (e.g., 32-bit or 64-bit) determines how many unique memory locations the CPU can address.
- Example: a 32-bit address bus → 2³² = 4 GB addressable memory space.
The address bus is used by the CPU to specify where in memory (RAM) data should be accessed.
Data Bus
The data bus carries the actual data being transferred between the CPU, RAM, and other components.
It is bidirectional, i.e., data can flow to or from the CPU, depending on whether it's a read or a write operation.
The width of the data bus (e.g., 8, 16, 32, 64 bits) determines how many bits of data can be transferred in one operation.
The data bus is used to transfer what data is being read or written.
Design Architecture 2: CPU + Abstraction Layer + RAM
Design
Based on the above proposed solution, our design will look like this:

For the sake of easy explanation, the abstraction layer (MMU) will be considered as a block between CPU and RAM in the proposed design.
In reality, however, it exists as a circuit within the CPU hardware.
As before, the CPU and RAM communicate using the data bus and address bus, but both buses now pass through the Abstraction Layer.
The Abstraction Layer will act as a channel through which we can monitor the addresses and data sent by CPU to RAM.
At this stage, the Abstraction Layer does not modify the addresses or data passing through it.
This design may introduce some latency in the CPU → RAM communication path, but we will discuss that in later stages.
But, the earlier problem still persists.
Problem
- How will we enforce controls on RAM using the Abstraction Layer?
Solution
We will assign each process a specific range of memory addresses.
The Abstraction Layer will ensure that each process uses only its assigned range of memory addresses — something like a lookup table.
Each process is told it owns a contiguous block of memory (like 0x1000 to 0xFFFF).
But in reality, those addresses don’t directly map to physical RAM.
The Abstraction Layer (MMU) keeps a lookup table that translates each process’s “virtual” address to the “physical” address in RAM.
This isolation ensures:
Process A cannot access Process B’s memory.
If a process crashes, it can’t corrupt the OS or others.
The OS can move memory around physically without processes noticing.
Virtual Address
A virtual address is the address seen and used by a program (process).
When your code says:

```c
int *x = malloc(4);
```

and it returns something like `0x1000`, that's a virtual address.
It's virtual because that address does not correspond directly to a physical location in RAM.
Instead, the Abstraction Layer (MMU) will later translate that address to a physical address before accessing RAM.
Think of it like:
Virtual address = “apartment number” inside a building
Physical address = “street address” of that building
Each tenant (process) has its own set of apartment numbers, even if two tenants both have “Apartment 101” — they’re in different buildings (separate address spaces).
Physical Address
A physical address is the actual address of data in RAM — the one used on the hardware memory chips.
It’s where the data is truly stored in memory cells.
Only the Abstraction Layer (MMU) and OS kernel deal with these directly.
So, for example:
CPU asks for 0x1000 (virtual address).
MMU translates it (using lookup table) to 0x2000 (physical address).
Data is fetched from RAM location 0x2000.
Lookup Table
A lookup table will look like this
| Virtual Address (what CPU/process uses) | Physical Address (actual RAM) | Meaning |
| --- | --- | --- |
| 0x1000 | 0x2000 | Data at 0x1000 in program → actually stored at 0x2000 in RAM |
| 0x1500 | 0x3000 | Data at 0x1500 → actually stored at 0x3000 |
| any other address | address + 0x2000 | For any address not listed, just shift by 0x2000 |
This is a lookup table (LUT) or mapping table that defines how addresses are translated.
So, the MMU (Abstraction Layer) uses this table to perform translations on the fly whenever the CPU accesses memory.
Using the lookup table, we can divert processes to different sections of the RAM.
Working of Abstraction Layer
1. A process runs and tries to access virtual address `0x1000`.
2. The CPU sends `0x1000` to the Abstraction Layer (MMU) via the address bus.
3. The Abstraction Layer (MMU) looks up `0x1000` in its table, finds it, and maps it to `0x2000`.
4. The Abstraction Layer (MMU) sends `0x2000` (physical) to RAM.
5. RAM returns the actual data.
Now the Abstraction Layer (MMU) has the following capabilities:
Pretend to be RAM in front of CPU.
Pretend to be CPU in front of RAM.
Has capability to map virtual memory addresses to physical memory addresses based on internal lookup table
The current design may look fine on the surface, but it has another flaw, which we will discuss next.
Design Architecture 3: Integration of I/O Port/Bus and RAM within Abstraction Layer (MMU)
Problem
How will the lookup table be configured by the CPU?
Is there any way to store lookup tables inside the Abstraction Layer? Because we do not know how big the lookup table will be, how many addresses it will contain, how large the connected RAM will be, or how many offsets there will be.
Is there any way to store any temporary data output of the CPU calculation logic?
How will we ensure that older software runs on the current hardware architecture? How will we maintain backward compatibility for older software?
Solution
Addition of an I/O port/bus from the CPU to the Abstraction Layer (MMU).
Insertion of dedicated RAM inside the Abstraction Layer (MMU).
Let’s modify the design based on the proposed solution
Design

In the current design, the I/O port will configure the lookup table before execution. We do this because we don't want the lookup table to be configured by magically writing to RAM directly.
The personal RAM of the Abstraction Layer (MMU) will store current state before it is sent to the physical RAM. It will also store multiple lookup tables for multiple processes.
Working of Abstraction Layer
If someone executes old software on the current hardware configuration, it will run smoothly since the lookup table will be blank initially. The hardware is designed in such a way that requests will pass through without affecting existing software or requiring any modifications to it.
We can also configure the lookup table using the I/O port when executing any new software.
If there is any problem while running the new software, the fault lies within the Meta-Program for messing up the Lookup Table configuration.
If we are calculating instead of performing one-by-one address mapping, the calculation logic used for translation will still need memory to store the temporary data output of the computation. The Personal RAM of Abstraction Layer (MMU) can be used for this.
The problem of running old software on current hardware configurations is solved.
The problem of accidental overwrites is also solved.
Now the Abstraction Layer (MMU) has the following capabilities:
Pretends to be RAM in front of the CPU.
Pretends to be the CPU in front of RAM.
Has the capability to map virtual memory addresses to physical memory addresses based on an internal lookup table.
If the table is not configured, the Abstraction Layer simply passes through all memory I/O requests and responses.
Maintains backward compatibility — old software can still run on it just fine.
Can be configured for address mapping.
Can hold multiple lookup tables.
Can switch between multiple lookup tables — the exact table can be specified over the I/O port. Only one lookup table will be active at a given point in time.
Lookup table switching will occur during context switching.
Limitations of Current Architecture
- Only one lookup table will be active at a given point of time.
Design Architecture 4: Integration of Interrupt Pin into Abstraction Layer
Problem
What happens when the translation produces an invalid address?
Example:
Mapping: address → address + 5000, with total attached memory = 8000 bytes
The CPU attempts to read from address 4000
Translated address = 4000 + 5000 = 9000 > 8000 bytes (the size of the current RAM)
What happens then?
Solution
Use interrupt to switch control from the CPU to a meta program if there is an invalid read or write operation.
Connect Interrupt pin of the CPU to the Abstraction Layer.
Design
Based on the proposed solution, we will modify the design

- The interrupt pin of the CPU has been connected to the Abstraction Layer
Working of Abstraction Layer
When invalid memory access occurs, an interrupt is triggered on the CPU.
We can use this interrupt to switch control from the CPU to a meta program if there is an invalid read or write operation on a memory address that is out of bounds.
For that, the interrupt pin of the CPU will be connected to the Abstraction Layer.
However, this introduces another problem.
Problem
- What happens after delegating control to the meta program? What will the meta program do?
Solution
After switching control to the meta program, it will terminate the corrupted process.
It will stop the program’s execution.
The developer will then investigate the issue - it’s no longer our concern. Give them the BSOD. Let them suffer 🔥
We have another potential feature of the current design architecture:
We can restrict access to specific memory ranges.
For example, the meta-program address range can be kept out of bounds by defining a permissible access range.
This ensures that not only the addresses of individual processes are isolated, but the meta-program address space is isolated as well.
There is another problem now.
Problem
A process might overwrite the abstraction layer tables if care is not taken. How do we solve this?
We will learn to solve this in the next lecture of this series
Additional Topics
Unified Memory
This is a hardware-level memory architecture, not just a logical abstraction.
It’s used in modern CPUs and GPUs (like Apple M-series, AMD APU, Intel integrated graphics, and NVIDIA’s Unified Memory with CUDA).
Normally, CPU and GPU have separate RAM:
CPU → System RAM
GPU → VRAM
Unified Memory combines them into one shared pool.
Both CPU and GPU can access the same physical memory directly.
Example:
The CPU computes something and writes it to memory.
The GPU reads that same data without copying it to VRAM.
Virtual Addressing
On a 32-bit system, each process typically gets 2 GB of user-space virtual addresses (the Windows default; Linux gives 3 GB) out of the 4 GB total.
On a 64-bit system, each process can have a vastly larger virtual address space (on the order of 128 TB of user space, depending on the OS).
This is a limit the OS sets; the process only consumes memory according to its actual usage.
Buffer Overflow
A buffer overflow (BoF) is a user-space problem, not a kernel or hardware one.
We will now see why a BoF is a user-space problem and not a kernel-space one.
User Space vs Kernel Space
The OS divides memory into two big zones:
| Zone | Who lives here | What it does |
| --- | --- | --- |
| Kernel Space | OS core, device drivers | Controls hardware, MMU, process scheduling |
| User Space | Your programs (processes) | Runs application code, isolated from kernel |
When your program runs, the MMU + OS give it its own virtual address space, e.g. `0x00000000` → `0x3FFFFFFF` (let's say 1 GB).
Inside that range, the process can do whatever it wants — it's isolated.
Lookup Table (MMU / Page Table)
That lookup table (MMU page tables) defines which virtual address in that 1 GB range maps to which physical address.
Once this mapping exists, the kernel steps back — the CPU + MMU handle translations automatically.
So within that 1 GB virtual sandbox, the OS doesn't care how your program lays out its stack, heap, globals, or code — it's up to you and your compiler/runtime.
Process Layout
Inside that 1 GB virtual space, the process layout usually looks like:
```
+---------------------+ ← High addresses
|        Stack        |   (grows downward)
+---------------------+
| Memory-mapped libs  |
+---------------------+
|        Heap         |   (grows upward)
+---------------------+
|   Data (globals)    |
+---------------------+
|     Code (text)     |
+---------------------+ ← Low addresses
```

These regions are managed by the runtime and allocator (`malloc`, etc.), but how they are used depends entirely on the program code.
Where Buffer Overflow Happens
Now, a Buffer Overflow (BoF) happens inside this user-space layout, for example:
```c
char buf[10];
strcpy(buf, "AAAAAAAAAAAAAAAAAAAA"); // 20 bytes into a 10-byte array
```

Here:
The CPU executes normal user-space instructions.
The OS and MMU have no idea you’re writing past 10 bytes.
You’re still writing to a valid address in your 1 GB range, just into the wrong variable.
So the hardware doesn’t see it as a violation — it’s still your memory!
Only when you cross into an unmapped page (like going beyond your 1 GB range) does the MMU raise a segmentation fault.
So:
The BoF isn’t caused by kernel or hardware malfunction — it’s caused by bad logic in the program’s layout or memory management inside its own address space.
That’s why buffer overflows are a software bug, not a hardware fault.
Stack overflow: function calls or local arrays exceed stack boundary.
Heap overflow: dynamic allocations overwrite adjacent blocks.
Both are results of the process corrupting its own layout - still within its own sandbox and not anything the OS or MMU did wrong.
Additional Points
We are in the 1980s now.
There is no operating system or kernel - the CPU is booted using BASIC.
Currently, we will focus only on design problems. In 2025, we will address optimization problems.
The concept of privilege levels does not exist yet - there are no roles; the CPU has only one mode of operation (system role).
This is the start of the concept of Memory Management Unit (MMU) and its functionalities.

