xv6: How exactly is memory addressed? (Part 2)
This is the second article covering the memory management of xv6, continuing from the previous article which covered the memory management at the early stages of xv6’s kernel initialization before getting into its main function.
In this article, the memory management mechanism in both kernel mode and user mode is covered. Specifically xv6 uses 2-level memory paging supported by MMU (memory management unit). This is essentially the same mechanism that Linux kernel manages its memory though the number of levels is different.
main.c
Once entry.S
jumps to main
function, it starts with the memory management setup. The first two functions, kinit1
and kvmalloc
set up the 2-level paging mechanism for kernel mode. After all the device initialization and interrupt setup, kinit2
and userinit
functions set up the memory for user mode.
int
main(void)
{
kinit1(end, P2V(4*1024*1024)); // phys page allocator
kvmalloc(); // kernel page table
... device initialization ...
kinit2(P2V(4*1024*1024), P2V(PHYSTOP)); // must come after startothers()
userinit(); // first user process
mpmain(); // finish this processor's setup
}
Kernel Mode
Let’s take a look at the memory setup for kernel mode first. Going through it would cover the fundamentals of paging setup and actually the kernel page mapping is used in user mode memory setup as well.
Memory Allocation
The first line of main
function runs kinit1
function which takes *vstart
and *vend
arguments. The call is made to allocate the memory from end
(first address after kernel loaded from ELF file) to P2V(4*1024*1024)
which would be 0x80400000
(4MB from KERNBASE: 0x80000000). kinit1
function allocates memory space as follows:
Notes
- As covered in the part 1, kernel instructions are based on the virtual address of
0x80100000
.- At this point, the kernel is still using the 2-level mapping,
entrypgdir
.
Firstly kinit1
calls freerange
after initializing lock for the data structure of kernel memory:
kalloc.c
void
kinit1(void *vstart, void *vend)
{
initlock(&kmem.lock, "kmem");
kmem.use_lock = 0;
freerange(vstart, vend);
}
freerange
calls kfree
until vend
by every PGSIZE
:
void
freerange(void *vstart, void *vend)
{
char *p;
p = (char*)PGROUNDUP((uint)vstart);
for(; p + PGSIZE <= (char*)vend; p += PGSIZE)
kfree(p);
}
kfree
casts the specified pointer to run
struct which is a single-linked list and push it into the head of kmem.freelist
. Subsequently kalloc
function is used to retrieve the pointer to the requested page from kmem.freelist
:
struct run {
struct run *next;
};
...
void
kfree(char *v)
{
...
// Fill with junk to catch dangling refs.
memset(v, 1, PGSIZE);
if(kmem.use_lock)
acquire(&kmem.lock);
r = (struct run*)v;
r->next = kmem.freelist;
kmem.freelist = r;
if(kmem.use_lock)
release(&kmem.lock);
}
2-level Paging
We have covered how kernel allocates memory for a range by pages. Now let’s look into kvmalloc
to find out how the 2-level paging is set up.
Once you search “paging x86” or something similar online, we see this common diagram as below explaining how a virtual address is split into 3 different parts, eventually forming a physical address. At a glance, I got the idea of this conversion from virtual addresses to physical addresses, however I had difficulty understanding how an OS would make use of it. The key point here is to understand that an OS needs to setup these maps. Specifically an OS would allocate one page for a page directory and another for page table first, and it maps the arbitrary virtual address to the targeted physical address by creating the page directory entries and the page table entries. It’s also important to understand this mapping is based on the page directory address which would be stored in
CR3
register. So switching theCR3
register value would change the virtual-to-physical mapping and xv6 does it per process. This whole flow is exactly whatkvmalloc
function does and its details are explained as follows.
Firstly, kvmalloc
calls setupkvm
which sets up the 2-level memory paging structure. It starts by allocating a new page directory, pgdir
by calling kalloc
. After that, the key function here is mappages
which sets a range of physical addresses into the page tables in a specified page directory and a starting virtual address:
pde_t*
setupkvm(void)
{
pde_t *pgdir;
struct kmap *k;
if((pgdir = (pde_t*)kalloc()) == 0) // allocates a page for the page directory
return 0;
memset(pgdir, 0, PGSIZE);
if (P2V(PHYSTOP) > (void*)DEVSPACE)
panic("PHYSTOP too high");
for(k = kmap; k < &kmap[NELEM(kmap)]; k++)
if(mappages(pgdir, k->virt, k->phys_end - k->phys_start,
(uint)k->phys_start, k->perm) < 0) {
freevm(pgdir);
return 0;
}
return pgdir;
}
mappages
writes physical address into the page table entries by a page size and loops till it writes all the pages in the specified size. Page table entries are retrieved by another function, walkpgdir
:
static int
mappages(pde_t *pgdir, void *va, uint size, uint pa, int perm)
{
char *a, *last;
pte_t *pte;
a = (char*)PGROUNDDOWN((uint)va);
last = (char*)PGROUNDDOWN(((uint)va) + size - 1);
for(;;){
if((pte = walkpgdir(pgdir, a, 1)) == 0)
return -1;
if(*pte & PTE_P)
panic("remap");
*pte = pa | perm | PTE_P; // writes physical address on retrieved page table entry
if(a == last)
break;
a += PGSIZE;
pa += PGSIZE;
}
return 0;
}
walkpgdir
firstly makes the pointer to the page directory entry, pde
from the pointer to the page directory, pgdir
and the specified virtual address, va
using its highest 10 bits. Then it checks with PTE_P
flag if the page directory entry has already been allocated or not. If it’s present, it means the page directory entry already has the pointer value to the page table, so it dereferences the pointer to the page table from the page directory entry. If it’s not present, it allocates a new page for the page table, and write its address on the page directory entry. Lastly, it returns the pointer to the page table entry specified with the page table and the specified virtual address using its 10 bits in the middle:
static pte_t *
walkpgdir(pde_t *pgdir, const void *va, int alloc)
{
pde_t *pde;
pte_t *pgtab;
pde = &pgdir[PDX(va)]; // creates a pointer to the page directory entry
if(*pde & PTE_P){
pgtab = (pte_t*)P2V(PTE_ADDR(*pde)); // creates a page table from the page directory entry
} else {
if(!alloc || (pgtab = (pte_t*)kalloc()) == 0) // creates a page table
return 0;
// Make sure all those PTE_P bits are zero.
memset(pgtab, 0, PGSIZE);
// The permissions here are overly generous, but they can
// be further restricted by the permissions in the page table
// entries, if necessary.
*pde = V2P(pgtab) | PTE_P | PTE_W | PTE_U; // writes the address to the page table on the page directory entry
}
return &pgtab[PTX(va)];
}
We have covered the workflow of setting up the 2-level paging. Now we should look into the actual mapping created for the kernel:
static struct kmap {
void *virt;
uint phys_start;
uint phys_end;
int perm;
} kmap[] = {
{ (void*)KERNBASE, 0, EXTMEM, PTE_W}, // I/O space
{ (void*)KERNLINK, V2P(KERNLINK), V2P(data), 0}, // kern text+rodata
{ (void*)data, V2P(data), PHYSTOP, PTE_W}, // kern data+memory
{ (void*)DEVSPACE, DEVSPACE, 0, PTE_W}, // more devices
};
The actual addresses of the mapping above look like this:
KERNBASE
(0x80000000) to physical:0
toEXTMEM
(0x100000)KERNLINK
(0x80100000) to physical:KERNLINK
(0x100000) toV2P(data)
(rodata in kernel image)data
to physical:V2P(data)
toPHYSTOP
(0xE000000: 235MB)DEVSPACE
(0xFE000000) to physical:DEVSPACE
(0xFE000000) to0
The kvmalloc
function creates the memory mapping as above, and calls switchkvm
function which activates it by writing the address of the returned page directory on CR3
register. The execution proceeds straight as the virtual address mapping to the physical kernel instructions are the same as the previous page directory, pgdir
.
User Mode
Back in the kernel’s main
function, there are two calls made for user processes: kinit2(P2V(4*1024*1024), P2V(PHYSTOP))
and userinit()
. The first call is for allocating the memory from 0x80400000
to 0x8E000000
. The second call takes care of multiple things for a process including context, memory, and trap frame.
We can see userinit
function calling setupkvm
function below. This is to enable the switch to kernel mode for system calls and interrupts without switching the page directories.
void userinit(void)
{
struct proc *p;
extern char _binary_initcode_start[], _binary_initcode_size[];
p = allocproc();
initproc = p;
if ((p->pgdir = setupkvm()) == 0)
panic("userinit: out of memory?");
inituvm(p->pgdir, _binary_initcode_start, (int)_binary_initcode_size);
...
Once setupkvm
sets up the page tables for kernal codes, userinit
calls inituvm
, passing the page directory created and a pointer to and the size of initcode
.
void
inituvm(pde_t *pgdir, char *init, uint sz)
{
char *mem;
if(sz >= PGSIZE)
panic("inituvm: more than a page");
mem = kalloc();
memset(mem, 0, PGSIZE);
mappages(pgdir, 0, PGSIZE, V2P(mem), PTE_W|PTE_U);
memmove(mem, init, sz);
}
inituvm
calls kalloc
function to allocate a new page and resets its content to 0
. After that, it calls mappages
with the virtual address 0
and the physical address of the new page, which means the virtual address 0
would be pointing to this new page in this page directory. Lastly memmove
is called to write the init
codes into the newly allocated page. This achieves the memory structure for a user process where it can just start with the EIP
from 0
while keeping the access to kernel pages.
Summary
It became quite a lengthy article, but I hope I could cover how memory is managed by xv6 using the 2-level paging structure step-by-step. It was a good exercise for myself to organize my learnings as well. I hope someone would find it useful.