xv6: How exactly is memory addressed? (Part 2)

This is the second article covering the memory management of xv6, continuing from the previous article which covered the memory management at the early stages of xv6’s kernel initialization before getting into its main function.

In this article, the memory management mechanism in both kernel mode and user mode is covered. Specifically xv6 uses 2-level memory paging supported by MMU (memory management unit). This is essentially the same mechanism that Linux kernel manages its memory though the number of levels is different.

main.c

Once entry.S jumps to main function, it starts with the memory management setup. The first two functions, kinit1 and kvmalloc set up the 2-level paging mechanism for kernel mode. After all the device initialization and interrupt setup, kinit2 and userinit functions set up the memory for user mode.

int
main(void)
{
  kinit1(end, P2V(4*1024*1024)); // phys page allocator
  kvmalloc();      // kernel page table
  ... device initialization ...
  kinit2(P2V(4*1024*1024), P2V(PHYSTOP)); // must come after startothers()
  userinit();      // first user process
  mpmain();        // finish this processor's setup
}

Kernel Mode

Let’s take a look at the memory setup for kernel mode first. Going through it would cover the fundamentals of paging setup and actually the kernel page mapping is used in user mode memory setup as well.

Memory Allocation

The first line of main function runs kinit1 function which takes *vstart and *vend arguments. The call is made to allocate the memory from end (first address after kernel loaded from ELF file) to P2V(4*1024*1024) which would be 0x80400000 (4MB from KERNBASE: 0x80000000). kinit1 function allocates memory space as follows:

Notes

As covered in the part 1, kernel instructions are based on the virtual address of 0x80100000.

At this point, the kernel is still using the 2-level mapping, entrypgdir.

Firstly kinit1 calls freerange after initializing lock for the data structure of kernel memory:

kalloc.c

void
kinit1(void *vstart, void *vend)
{
  initlock(&kmem.lock, "kmem");
  kmem.use_lock = 0;
  freerange(vstart, vend);
}

freerange calls kfree until vend by every PGSIZE:

void
freerange(void *vstart, void *vend)
{
  char *p;
  p = (char*)PGROUNDUP((uint)vstart);
  for(; p + PGSIZE <= (char*)vend; p += PGSIZE)
    kfree(p);
}

kfree casts the specified pointer to run struct which is a single-linked list and push it into the head of kmem.freelist. Subsequently kalloc function is used to retrieve the pointer to the requested page from kmem.freelist:

struct run {
  struct run *next;
};
...
void
kfree(char *v)
{
  ...
  // Fill with junk to catch dangling refs.
  memset(v, 1, PGSIZE);

  if(kmem.use_lock)
    acquire(&kmem.lock);
  r = (struct run*)v;
  r->next = kmem.freelist;
  kmem.freelist = r;
  if(kmem.use_lock)
    release(&kmem.lock);
}

2-level Paging

We have covered how kernel allocates memory for a range by pages. Now let’s look into kvmalloc to find out how the 2-level paging is set up.

Once you search “paging x86” or something similar online, we see this common diagram as below explaining how a virtual address is split into 3 different parts, eventually forming a physical address. At a glance, I got the idea of this conversion from virtual addresses to physical addresses, however I had difficulty understanding how an OS would make use of it. The key point here is to understand that an OS needs to setup these maps. Specifically an OS would allocate one page for a page directory and another for page table first, and it maps the arbitrary virtual address to the targeted physical address by creating the page directory entries and the page table entries. It’s also important to understand this mapping is based on the page directory address which would be stored in CR3 register. So switching the CR3 register value would change the virtual-to-physical mapping and xv6 does it per process. This whole flow is exactly what kvmalloc function does and its details are explained as follows.

Firstly, kvmalloc calls setupkvm which sets up the 2-level memory paging structure. It starts by allocating a new page directory, pgdir by calling kalloc. After that, the key function here is mappages which sets a range of physical addresses into the page tables in a specified page directory and a starting virtual address:

pde_t*
setupkvm(void)
{
  pde_t *pgdir;
  struct kmap *k;

  if((pgdir = (pde_t*)kalloc()) == 0) // allocates a page for the page directory
    return 0;
  memset(pgdir, 0, PGSIZE);
  if (P2V(PHYSTOP) > (void*)DEVSPACE)
    panic("PHYSTOP too high");
  for(k = kmap; k < &kmap[NELEM(kmap)]; k++)
    if(mappages(pgdir, k->virt, k->phys_end - k->phys_start,
                (uint)k->phys_start, k->perm) < 0) {
      freevm(pgdir);
      return 0;
    }
  return pgdir;
}

mappages writes physical address into the page table entries by a page size and loops till it writes all the pages in the specified size. Page table entries are retrieved by another function, walkpgdir:

static int
mappages(pde_t *pgdir, void *va, uint size, uint pa, int perm)
{
  char *a, *last;
  pte_t *pte;

  a = (char*)PGROUNDDOWN((uint)va);
  last = (char*)PGROUNDDOWN(((uint)va) + size - 1);
  for(;;){
    if((pte = walkpgdir(pgdir, a, 1)) == 0)
      return -1;
    if(*pte & PTE_P)
      panic("remap");
    *pte = pa | perm | PTE_P; // writes physical address on retrieved page table entry
    if(a == last)
      break;
    a += PGSIZE;
    pa += PGSIZE;
  }
  return 0;
}

walkpgdir firstly makes the pointer to the page directory entry, pde from the pointer to the page directory, pgdir and the specified virtual address, va using its highest 10 bits. Then it checks with PTE_P flag if the page directory entry has already been allocated or not. If it’s present, it means the page directory entry already has the pointer value to the page table, so it dereferences the pointer to the page table from the page directory entry. If it’s not present, it allocates a new page for the page table, and write its address on the page directory entry. Lastly, it returns the pointer to the page table entry specified with the page table and the specified virtual address using its 10 bits in the middle:

static pte_t *
walkpgdir(pde_t *pgdir, const void *va, int alloc)
{
  pde_t *pde;
  pte_t *pgtab;

  pde = &pgdir[PDX(va)]; // creates a pointer to the page directory entry
  if(*pde & PTE_P){
    pgtab = (pte_t*)P2V(PTE_ADDR(*pde)); // creates a page table from the page directory entry
  } else {
    if(!alloc || (pgtab = (pte_t*)kalloc()) == 0) // creates a page table
      return 0;
    // Make sure all those PTE_P bits are zero.
    memset(pgtab, 0, PGSIZE);
    // The permissions here are overly generous, but they can
    // be further restricted by the permissions in the page table
    // entries, if necessary.
    *pde = V2P(pgtab) | PTE_P | PTE_W | PTE_U; // writes the address to the page table on the page directory entry
  }
  return &pgtab[PTX(va)];
}

We have covered the workflow of setting up the 2-level paging. Now we should look into the actual mapping created for the kernel:

static struct kmap {
  void *virt;
  uint phys_start;
  uint phys_end;
  int perm;
} kmap[] = {
 { (void*)KERNBASE, 0,             EXTMEM,    PTE_W}, // I/O space
 { (void*)KERNLINK, V2P(KERNLINK), V2P(data), 0},     // kern text+rodata
 { (void*)data,     V2P(data),     PHYSTOP,   PTE_W}, // kern data+memory
 { (void*)DEVSPACE, DEVSPACE,      0,         PTE_W}, // more devices
};

The actual addresses of the mapping above look like this:

KERNBASE (0x80000000) to physical: 0 to EXTMEM (0x100000)
KERNLINK (0x80100000) to physical: KERNLINK (0x100000) to V2P(data) (rodata in kernel image)
data to physical: V2P(data) to PHYSTOP (0xE000000: 235MB)
DEVSPACE (0xFE000000) to physical: DEVSPACE (0xFE000000) to 0

The kvmalloc function creates the memory mapping as above, and calls switchkvm function which activates it by writing the address of the returned page directory on CR3 register. The execution proceeds straight as the virtual address mapping to the physical kernel instructions are the same as the previous page directory, pgdir.

User Mode

Back in the kernel’s main function, there are two calls made for user processes: kinit2(P2V(4*1024*1024), P2V(PHYSTOP)) and userinit(). The first call is for allocating the memory from 0x80400000 to 0x8E000000. The second call takes care of multiple things for a process including context, memory, and trap frame.

We can see userinit function calling setupkvm function below. This is to enable the switch to kernel mode for system calls and interrupts without switching the page directories.

void userinit(void)
{
  struct proc *p;
  extern char _binary_initcode_start[], _binary_initcode_size[];

  p = allocproc();

  initproc = p;
  if ((p->pgdir = setupkvm()) == 0)
    panic("userinit: out of memory?");
  inituvm(p->pgdir, _binary_initcode_start, (int)_binary_initcode_size);
  ...

Once setupkvm sets up the page tables for kernal codes, userinit calls inituvm, passing the page directory created and a pointer to and the size of initcode.

void
inituvm(pde_t *pgdir, char *init, uint sz)
{
  char *mem;

  if(sz >= PGSIZE)
    panic("inituvm: more than a page");
  mem = kalloc();
  memset(mem, 0, PGSIZE);
  mappages(pgdir, 0, PGSIZE, V2P(mem), PTE_W|PTE_U);
  memmove(mem, init, sz);
}

inituvm calls kalloc function to allocate a new page and resets its content to 0. After that, it calls mappages with the virtual address 0 and the physical address of the new page, which means the virtual address 0 would be pointing to this new page in this page directory. Lastly memmove is called to write the init codes into the newly allocated page. This achieves the memory structure for a user process where it can just start with the EIP from 0 while keeping the access to kernel pages.

Summary

It became quite a lengthy article, but I hope I could cover how memory is managed by xv6 using the 2-level paging structure step-by-step. It was a good exercise for myself to organize my learnings as well. I hope someone would find it useful.