xv6: How exactly is memory addressed? (Part 1)
Memory management is one of the key concepts to understand OS and architecture. Naturally it was necessary for me to understand it for the development of an x86 simulator, dax86 and it was indeed tricky to connect all the xv6’s source code with the understanding of MMU. This article talks about the initial part of memory addressing in xv6 as well as how it organizes the instruction addresses of kernel.
A bit more specifically speaking, the xv6 text book and many articles covering xv6’s memory management mention how kernel instructions of xv6 are managed at the virtual address of 0x80100000
, however it took some time for me to understand how exactly it happens end-to-end. So I intend to cover the mechanism from the creation of kernel executable to the switch of memory addressing mode to get into the kernel instructions.
Makefile
As my last article on xv6’s bootblock, let’s start from Makefile. That’s where some crafting of kernel happens. Here’s the recipe for the kernel image.
kernel: $(OBJS) entry.o entryother initcode kernel.ld
$(LD) $(LDFLAGS) -T kernel.ld -o kernel entry.o $(OBJS) -b binary initcode entryother
$(OBJDUMP) -S kernel > kernel.asm
$(OBJDUMP) -t kernel | sed '1,/SYMBOL TABLE/d; s/ .* / /; /^$$/d' > kernel.sym
It’s requiring all the kernel modules with $(OBJS)
as well as some additional files. entry.o
is for entering to the kernel code and initcode
is the starting instructions of a new process in a user mode, but in this article kernel.ld
is the key ingredient for the topic of the kernel’s instructions addresses.
kernel.ld
The kernel.ld
exists in the xv6 source code directory and in Makefile it is used in ld
command with -T
option which replaces the linker script. The beginning of the script looks like below. .
in the SECTIONS
is a specifal linker variable to specify current output counter, and we can see it’s set to 0x80100000
. This is specifying the starting instruction addresses of the kernel executable.
OUTPUT_FORMAT("elf32-i386", "elf32-i386", "elf32-i386")
OUTPUT_ARCH(i386)
ENTRY(_start)
SECTIONS
{
/* Link the kernel at this address: "." means the current address */
/* Must be equal to KERNLINK */
. = 0x80100000;
.text : AT(0x100000) {
*(.text .stub .text.* .gnu.linkonce.t.*)
}
As a result of the linker script, the kernel address starts from the specified address. We can observe this by running objdump
command:
$ objdump -M intel -S kernelmemfs
kernelmemfs: file format elf32-i386
Disassembly of section .text:
80100000 <multiboot_header>:
80100000: 02 b0 ad 1b 00 00 add dh,BYTE PTR [eax+0x1bad]
80100006: 00 00 add BYTE PTR [eax],al
80100008: fe 4f 52 dec BYTE PTR [edi+0x52]
8010000b: e4 0f in al,0xf
8010000c <entry>:
Memory Addressing at Start
When a boot sector is loaded, the CPU is in the real mode where the memory is addressed with physical addresses with one calculation. The value of a corresponding segment register would be shifted to left by four and added to the specified memory address. After a small number of instructions in bootasm.S
, the boot sector transitions to the protected mode. This transition is quite straightforward as many articles like this cover and in xv6 we can find it happening in bootasm.S
. Once entering to the protected mode, the memory is addressed through the entry of the global descriptor table. Basically the entry contains the base address of a memory range and its limit as well as some additional info such as the required privilege to access the memory range. At this point the segment resisters become the selectors to point those entries as the index.
Paging
As covered in Makefile
and kernel.ld
above, all the addresses in the kernel is based on 0x80100000
. This is for separating the memory space between the user mode and the kernel mode, leveraging the paging of x86. Paging supports multiple different ways of structuring memory addressing. In the entry to the kernel of xv6, entry.S
, single level paging with page size extension is used. Later on after entering to the kernel, xv6 gets into the 2-level paging.
Memory Addressing to Enter Kernel
For xv6 to get into the kernel which has high addresses set, the paging needs go be turned on and this process happens in entry.S
. Here’s the codes:
# Entering xv6 on boot processor, with paging off.
.globl entry
entry:
# Turn on page size extension for 4Mbyte pages
movl %cr4, %eax
orl $(CR4_PSE), %eax
movl %eax, %cr4
# Set page directory
movl $(V2P_WO(entrypgdir)), %eax
movl %eax, %cr3
# Turn on paging.
movl %cr0, %eax
orl $(CR0_PG|CR0_WP), %eax
movl %eax, %cr0
Here we can see $(V2P_WO(entrypgdir))
being set as a page directory. Firstly V2P_WO
is a macro defined in memlayout.h
that subtracts KERN_BASE
value from the argument. The reason it’s required is that at the point of setting the page directory, the paging mode is not turned on. If you remember, entry.o
is a part of the kernel which has the high addresses. So the pointer to entrypgdir
can’t be found from physical address space, and the mapping to the virtual address space is required, which is entrypgdir
itself. It can be found in the bottom of main.c
as below:
__attribute__((__aligned__(PGSIZE)))
pde_t entrypgdir[NPDENTRIES] = {
// Map VA's [0, 4MB) to PA's [0, 4MB)
[0] = (0) | PTE_P | PTE_W | PTE_PS,
// Map VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB)
[KERNBASE>>PDXSHIFT] = (0) | PTE_P | PTE_W | PTE_PS,
};
There are two mappings here and both have PTE_PS
flag on. Having this flag on page directory entries as well as having the CR4 page size extension bit on enable single level paging. Coming to the memory address of these mappings, we can see first one mapping 0
to 0
and the other one mapping KERNBASE
(0x80000000
) to 0
. The reason of having these two mapping is for the transition to the second mapping. At the moment of turning the paging on, EIP
(instruction pointer) is still holding the physical address of instructions. As the paging applies on the instruction retrieval from memory, if we don’t have the first entry of identical mapping, CPU won’t find the instruction and will end up with an error. After successful jump to main()
at the last line of entry.S
, CPU enters the high address space and starts using the second mapping of KERNBASE
to 0
.
Once the OS gets into main function of kernel, it switches to the conventional 2-level paging. I’ll cover it as the part 2 of this article since the memory management till this part is already a chunk of decent size to cover while 2-level paging might be even bigger. Anyways thanks for reading :)