Memory management is one of the key concepts for understanding operating systems and computer architecture. I had to understand it to develop dax86, an x86 simulator, and it was indeed tricky to connect the xv6 source code with my understanding of the MMU. This article covers the initial part of memory addressing in xv6, as well as how xv6 lays out the kernel's instruction addresses.
More specifically, the xv6 textbook and many articles on xv6’s memory management mention that xv6's kernel instructions live at the virtual address 0x80100000, but it took me some time to understand exactly how that happens end to end. So I intend to cover the mechanism from the creation of the kernel executable to the switch of memory addressing mode that gets us into the kernel instructions.
As in my last article on xv6’s bootblock, let’s start from the Makefile. That’s where some of the crafting of the kernel happens. Here’s the recipe for the kernel image.
It requires all the kernel modules in $(OBJS) as well as some additional files. entry.o is for entering the kernel code and initcode holds the first instructions of a new process in user mode, but for this article kernel.ld is the key ingredient, since it determines the kernel’s instruction addresses.
kernel.ld lives in the xv6 source directory, and the Makefile passes it to the ld command with the -T option, which replaces ld's default linker script. The beginning of the script looks like below.
The . in SECTIONS is a special linker variable, the location counter, which holds the current output address, and we can see it’s set to 0x80100000. This specifies the address at which the kernel executable's instructions start.
As a result of the linker script, the kernel address starts from the specified address. We can observe this by running
Memory Addressing at Start
When the boot sector is loaded, the CPU is in real mode, where memory is addressed with physical addresses derived by a single calculation: the value of the relevant segment register is shifted left by four bits and added to the specified offset. After a small number of instructions, the boot sector transitions to protected mode. This transition is quite straightforward, as many articles like this cover, and in xv6 it happens in bootasm.S. Once in protected mode, memory is addressed through the entries of the global descriptor table. Basically, an entry contains the base address of a memory range and its limit, as well as some additional info such as the privilege required to access the range. At this point the segment registers become selectors that point to those entries by index.
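The two addressing schemes above can be sketched in C. This is only an illustration under my own assumptions: the function names are hypothetical and are not part of xv6.

```c
#include <stdint.h>

/* Real mode: physical address = (segment << 4) + offset. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Protected mode: the low 3 bits of a selector hold the requested
 * privilege level and the table indicator, so the remaining upper
 * bits are the index of the descriptor inside the table. */
uint16_t selector_index(uint16_t selector)
{
    return (uint16_t)(selector >> 3);
}
```

For example, the BIOS loads the boot sector at physical address 0x7C00, and both real_mode_address(0x0000, 0x7C00) and real_mode_address(0x07C0, 0x0000) produce that address. A selector value of 0x08 refers to descriptor index 1.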
As covered in the kernel.ld part above, all the addresses in the kernel are based on 0x80100000. This separates the memory space of user mode from that of kernel mode, leveraging x86 paging. Paging supports several different ways of structuring memory addressing. At the entry to the xv6 kernel, entry.S, single-level paging with the page size extension is used. Later, after entering the kernel, xv6 switches to 2-level paging.
Memory Addressing to Enter Kernel
For xv6 to get into the kernel, which is linked at high addresses, paging needs to be turned on, and this happens in entry.S. Here’s the code:
Here we can see $(V2P_WO(entrypgdir)) being set as the page directory. First, V2P_WO is a macro defined in memlayout.h that subtracts the KERNBASE value from its argument. It’s required because paging is not yet turned on at the point where the page directory is set. If you remember, entry.o is part of the kernel, which is linked at high addresses. So the address of the entrypgdir symbol is a virtual address that can’t be found in the physical address space, and it has to be converted to the physical address where the table actually sits. The mapping to the virtual address space that paging then relies on is entrypgdir itself. It can be found at the bottom of main.c as below:
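A minimal C sketch of that definition and of the V2P_WO arithmetic follows. The constants are written from memory of xv6's memlayout.h and mmu.h, so treat them as assumptions and check them against your own source tree. The alignment attribute is a GCC/Clang extension.

```c
#include <stdint.h>

typedef uint32_t pde_t;

/* Values assumed to match xv6's memlayout.h and mmu.h. */
#define KERNBASE   0x80000000u       /* first kernel virtual address */
#define V2P_WO(x)  ((x) - KERNBASE)  /* virtual to physical, no casts */
#define NPDENTRIES 1024              /* entries per page directory */
#define PGSIZE     4096              /* bytes per page */
#define PDXSHIFT   22                /* offset of the directory index in an address */
#define PTE_P      0x001u            /* present */
#define PTE_W      0x002u            /* writable */
#define PTE_PS     0x080u            /* page size: entry maps a 4 MB page */

__attribute__((__aligned__(PGSIZE)))
pde_t entrypgdir[NPDENTRIES] = {
    /* virtual [0, 4MB) -> physical [0, 4MB) */
    [0] = (0) | PTE_P | PTE_W | PTE_PS,
    /* virtual [KERNBASE, KERNBASE+4MB) -> physical [0, 4MB) */
    [KERNBASE >> PDXSHIFT] = (0) | PTE_P | PTE_W | PTE_PS,
};
```

With these values, V2P_WO(0x80100000u) evaluates to 0x00100000, which is the kind of physical address CR3 needs while paging is still off.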
There are two mappings here and both have the PTE_PS flag on. Having this flag on the page directory entries, together with the page size extension bit in CR4, enables single-level paging. As for the memory addresses of these mappings, both point at physical address 0: the first one maps the virtual addresses starting at 0 and the other one maps the virtual addresses starting at KERNBASE. The reason for having two mappings is to make the transition to the second mapping possible. At the moment paging is turned on, EIP (the instruction pointer) still holds the physical address of the instructions. Since paging also applies to instruction fetches, without the first, identity mapping the CPU couldn't find the next instruction and would end up with an error. After a successful jump to main() at the last line of entry.S, the CPU enters the high address space and starts using the second mapping.
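To see why both entries are needed, the lookup the MMU performs for a 4 MB page can be modelled in C. This is a simplified sketch, not xv6 code: translate_4mb, fetch_address and the FAULT marker are hypothetical names, the constants are assumed to match xv6's headers, and the example addresses are only illustrative.

```c
#include <stdint.h>
#include <string.h>

#define NPDENTRIES 1024
#define PDXSHIFT   22
#define KERNBASE   0x80000000u
#define PTE_P      0x001u
#define PTE_W      0x002u
#define PTE_PS     0x080u
#define FAULT      0xFFFFFFFFu  /* hypothetical marker standing in for a page fault */

/* Single-level lookup: the top 10 bits of the virtual address pick the
 * directory entry, the low 22 bits are the offset inside the 4 MB page. */
uint32_t translate_4mb(const uint32_t *pgdir, uint32_t va)
{
    uint32_t pde = pgdir[va >> PDXSHIFT];
    if ((pde & (PTE_P | PTE_PS)) != (PTE_P | PTE_PS))
        return FAULT;
    return (pde & 0xFFC00000u) | (va & 0x003FFFFFu);
}

/* Build a directory shaped like entrypgdir, optionally leaving out the
 * identity entry, and translate one instruction address through it. */
uint32_t fetch_address(int with_identity, uint32_t va)
{
    static uint32_t pgdir[NPDENTRIES];
    memset(pgdir, 0, sizeof pgdir);
    if (with_identity)
        pgdir[0] = 0 | PTE_P | PTE_W | PTE_PS;
    pgdir[KERNBASE >> PDXSHIFT] = 0 | PTE_P | PTE_W | PTE_PS;
    return translate_4mb(pgdir, va);
}
```

With the identity entry present, a fetch from the low address 0x0010000C and a fetch from the high address 0x8010000C both resolve to physical 0x0010000C. Without the identity entry, the low fetch returns FAULT, which is the situation the CPU would be in right after paging is enabled and before the jump to the high address.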
Once the OS gets into the kernel's main function, it switches to conventional 2-level paging. I’ll cover that in part 2 of this article, since the memory management up to this point is already a decent-sized chunk to cover, and 2-level paging might be even bigger. Anyway, thanks for reading :)