OSC:Memory Management

From SOFTICE

Jump to: navigation, search

Contents


Memory management

Pedagogical Objectives
  • Introduce Kernel Data Structures:
  • Introduce Kernel APIs:
  • Big picture:
Developed by:

Synopsis

Every operating system must manage the computer's physical memory and distribute it to processes as they need it. If the physical memory proves insufficient, they must also use a swapping mechanism to simulate more memory, using slower media (such as a magnetic disk) to provide the additional storage.

In this lab, we will implement a paging and shared-memory system.

[Briefing]

Every general-purpose operating system, and most specialized operating systems, must control a limited amount of physical memory and share it between multiple running processes. Operating systems that support multiple simultaneous users (Unix, MacOS X, Windows NT or later) must also take the needs of multiple users into account.

In modern operating systems, every process gets its own virtual address space; two different processes can store different values at the same virtual location and the operating system will map them to different physical addresses. It does this by consulting each process's page table. Different processes have different physical addresses associated with each virtual address.

Rather than manage memory on a per-byte or per-word basis, modern operating systems use chunks of memory known as 'pages' (hence the name "page table"). The memory management hardware found in general-purpose computers works in terms of these pages, and the operating system follows suit.

Linux uses a three-tier page table system to allow sparsely-populated virtual address spaces without wasting significant amounts of space. Linux's default page size on 32-bit architectures is 4kB, and the address space is 4GB; without a multilevel page table system, the system would need to keep track of 1048576 pages per process. Since each entry in the page table must be, at minimum, a pointer, and a pointer on a 32-bit architecture is 4 bytes long, that would translate to four megabytes of memory per process just for keeping track of the pages.

[Solved]

In this demonstration, we set up a virtual RAM system to demonstrate the use of page tables.

Note: All locking is done by way of semaphores, using uninterruptible sleep. This is intentional: using interruptible sleep is more complicated (though more responsive and less likely to cause a lockup). Production code would likely use interruptible sleep.

Kernel modification

We implement four new system calls in this module: getpagetable(), putpagetable(), readfromaddress(), and writetoaddress(). To create the hooks for these system calls in kernel version 2.6.16.20, we add these lines to include/asm-i386/unistd.h, at the end of the list:

#define __NR_getpagetable      311
#define __NR_putpagetable      312
#define __NR_readfromaddress   313
#define __NR_writetoaddress    314

Also change the definition of NR_syscalls from 311 to 315.

At the end of the file arch/i386/kernel/syscall_table.S, add the following lines:

       .long sys_ni_syscall
       .long sys_ni_syscall
       .long sys_ni_syscall
       .long sys_ni_syscall

These extend the system call table by four entries and define the handler for the new system calls to be the default handler, sys_ni_syscall (non-implemented syscall). All this handler does is return -ENOSYS ("No such system call"). Of course, the replacement handlers defined in this module will do more.

mem.c

The only particularly notable include is mem.h; this is the first project in which an include file has been worth the hassle.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/sched.h>

#include <linux/vmalloc.h>
#include <linux/list.h>

#include <linux/string.h>
#include <linux/fs.h>

#include <linux/unistd.h>       /* The list of system calls */

/* 
 * For the current (process) structure, we need
 * this to know who the current user is. 
 */
#include <linux/sched.h>
#include <asm/uaccess.h>

#include "mem.h"

Utilities for replacing system calls, copied directly from earlier labs.

extern void *sys_call_table[];	/*This is where we hook on the syscall*/

/***********************************************************************
 * Important naming convention:
 *
 * The original function handling each syscall is saved as o_<syscall>,
 * and the replacement is named n_<syscall>.  For example, o_getdents64
 * is the original handler for the getdents64() system call, and
 * n_getdents64 is the replacement we write.
 ***********************************************************************/

#define REPLACE_SYSCALL(call) \
        do { \
                o_ ## call = sys_call_table[__NR_ ## call]; \
                sys_call_table[__NR_ ## call] = n_ ## call; \
        } while (0)

#define RESTORE_SYSCALL(call) \
        do { \
                sys_call_table[__NR_ ## call] = o_ ## call; \
        } while (0)

Here we define our central data structures and the lock that protects them.

/* Our virtual RAM. */
byte *rambuffer = NULL;

/* An array of page structures describing the RAM. */
struct si_page *rampages = NULL;

/* A list of page tables, so we can find them later. */
struct list_head *pagetables;

struct semaphore ptlock;

MODULE_LICENSE("GPL");

The first four functions are replacements for system calls; the rest are utility functions. See their definitions for details.

/* function prototypes */
asmlinkage long n_getpagetable (void);
asmlinkage long n_putpagetable (void);
asmlinkage long n_readfromaddress (int address, byte * __user buf, int len);
asmlinkage long n_writetoaddress (int address, byte * __user buf, int len);
static long copy_from_to_page (int direction, int address, byte * __user buf, int len);
static int acquire_free_page (void);
static int acquire_page (int num);
static void release_page (int num);
static struct si_pagetable * find_pagetable (int tag);
static void free_pagetable (struct si_pagetable *pt);

/*
 * Pointers to the old handlers for these system calls.
 * (They should all be NULL, but they need to be saved, just in case.)
 */
asmlinkage long (*o_getpagetable) (void);
asmlinkage long (*o_putpagetable) (void);
asmlinkage long (*o_readfromaddress) (int address, byte * buf, int len);
asmlinkage long (*o_writetoaddress) (int address, byte * buf, int len);

In the getpagetable() system call, a page table is created for the calling process. (Any existing page tables for a process with the same PID are destroyed.)

/*
 * Allocate and initialize the page table for a process.
 *
 * Returns 0 on success and <0 on failure.
 */
asmlinkage long
n_getpagetable (void)
{
	struct si_pagetable *pt;
	int i;

	/* Allocate memory for the page table. */
	pt = vmalloc(sizeof(struct si_pagetable));
	if (pt == NULL) {
		printk("AIEEE! No memory for a page table!\n");
		return -ENOMEM;
	}

	/* Initialize the contents of the new page table. */
	INIT_LIST_HEAD(&(pt->list));
	pt->tag = current->pid;
	for (i = 0; i < SI_NUM_PAGES; i++) {
		pt->page[i] = 0;
		pt->present[i] = 0;
	}

	/* Make sure there isn't already a page table for us. */
	n_putpagetable();

	/* Connect the new pagetable to the list of old pagetables. */
	down(&ptlock);
	list_add(pagetables, &(pt->list));
	up(&ptlock);

	return 0;
}

The putpagetable system call destroys the page table associated with the calling process. (This handler is also called by n_getpagetable, to clear out any existing page tables with a given PID before attaching the new one.)

/*
 * Release the page table for a specific process.
 *
 * Returns 0 and cannot fail.
 */
asmlinkage long
n_putpagetable (void)
{
	struct si_pagetable *tmppt;
	struct list_head *tmplist1, *tmplist2;

	down(&ptlock);
	/* Check each pagetable. */
	list_for_each_safe(tmplist1, tmplist2, pagetables) {
		tmppt = list_entry(tmplist1, struct si_pagetable, list);
		/* If the tags match, delete it. */
		if (tmppt->tag == current->pid) {
			list_del(&(tmppt->list));
			free_pagetable(tmppt);
		}
	}
	up(&ptlock);

	return 0;
}

n_readfromaddress and n_writetoaddress are wrappers around the same function, copy_from_to_page. The two functions differed only in the function used to copy the contents of RAM, so it was cleaner to put the common code into one shared function, passing in the direction to copy.

/*
 * Read len bytes from address into buf.
 *
 * Returns the number of bytes successfully read.
 */
asmlinkage long
n_readfromaddress (int address, byte * __user buf, int len)
{
	return copy_from_to_page(SI_TO_USER, address, buf, len);
}

/*
 * Write len bytes into address from buf.
 *
 * Returns the number of bytes successfully written.
 */
asmlinkage long
n_writetoaddress (int address, byte * __user buf, int len)
{
	return copy_from_to_page(SI_FROM_USER, address, buf, len);
}

This function actually handles copying data to or from the user.

If direction is SI_TO_USER, it reads up to len bytes from RAM and puts them into buf. If direction is SI_FROM_USER, it takes up to len bytes from buf and copies them to RAM.

/*
 * Handle reading and writing data from RAM.
 *
 * (This is a separate function because the read and write functions
 * differed only in one line.)
 */
static long
copy_from_to_page (int direction, int address, byte * __user buf, int len)
{
	int pageindex, offset, bad;
	struct si_pagetable *pt;
	byte *ramstart;

	/* Get the virtual page number and offset within the page. */
	pageindex = SI_ADDR_PAGE(address);
	offset = SI_ADDR_OFFSET(address);

	/* Limit access to a single page. */
	if ((len + offset) > SI_PAGE_SIZE) {
		len = SI_PAGE_SIZE - offset;
	}

	/* Get the page table for the current process. */
	down(&ptlock);
	pt = find_pagetable(current->pid);
	if (pt == NULL) {
		up(&ptlock);
		return -EBADF;	/* "Bad file descriptor", closest I could find */
	}

	/* Make sure the page is loaded and mapped. */
	if (pt->present[pageindex]) {
		/* the page is there and ready */
		printk(KERN_DEBUG "cache hit, %d -> %d\n", pageindex, pt->page[pageindex]);
	} else {
		/* the page isn't there */
		printk(KERN_DEBUG "page fault\n");
		pt->page[pageindex] = acquire_free_page();
		if (pt->page[pageindex] == -1) {
			/* We couldn't allocate the page.
			   Release lock and fail. */
			up(&ptlock);
			return -ENOMEM;	
		}
		pt->present[pageindex] = 1;
		printk(KERN_DEBUG "%d mapped to %d\n", pageindex, pt->page[pageindex]);
	}

	/* Find the real address corresponding to the virtual address. */
	pageindex = pt->page[pageindex];
	ramstart = rampages[pageindex].start + offset;

	/* Copy the data. */
	if (direction == SI_TO_USER) {
		bad = copy_to_user(buf, ramstart, len);
		len -= bad;
	} else if (direction == SI_FROM_USER) {
		bad = copy_from_user(ramstart, buf, len);
		len -= bad;
	} else {
		len = -EINVAL;
	}

	/* We're done, so release the lock. */
	up(&ptlock);

	return len;
}

acquire_free_page finds a page in RAM that is not being used and locks it for use by the current process.

If there are no free pages to be acquired, this function returns -1.

/*
 * Get a free page of RAM and return its physical index.
 *
 * Caller must hold ptlock.
 *
 * Returns the page's physical index, or -1 on failure.
 */
static int
acquire_free_page (void)
{
	int i;
	
	/* No swapping; return -1 if no pages are free. */

	/* Scan the ram page array to find one that is free. */
	for (i = 0; i < SI_PHYSICAL_PAGES; i++) {
		if (rampages[i].used == 0) {
			return acquire_page(i);
		}
	}

	return -1;
}

Given a specific page number in RAM, acquire_page marks it as being used by a process. If given an invalid page number, it returns -1; otherwise, it returns the number of the page.

/*
 * Get a specific page of RAM and return its index.
 *
 * Caller must hold ptlock.
 *
 * Returns the page's physical index or -1 on failure.
 */
static int
acquire_page (int num)
{
	printk(KERN_DEBUG "Attempting to acquire page %d\n", num);
	if ((num < 0) || (num >= SI_PHYSICAL_PAGES)) {
		return -1;
	}
	
	rampages[num].used++;
	return num;
}

release_page is the complement of acquire_page: it signifies that the page is no longer needed by a process.

/*
 * Release a page of RAM.
 *
 * Caller must hold ptlock.
 */
static void
release_page (int num)
{
	if ((num >= 0) && (num < SI_PHYSICAL_PAGES)) {
		rampages[num].used--;
	}
}

Each process is expected to call find_pagetable, passing in its PID, in order to acquire its page table; all pagetables are stored in a linked list, and this function finds the element that corresponds to the tag that is passed in. (Under normal circumstances, this will be the process' PID.)

/*
 * Find the pagetable that corresponds to tag.
 *
 * Caller must hold ptlock.
 *
 * Returns a pointer to the pagetable.
 */
static struct si_pagetable *
find_pagetable (int tag)
{
	struct list_head *cur;
	struct si_pagetable *pt;

	list_for_each(cur, pagetables) {
		pt = list_entry(cur, struct si_pagetable, list);
		
		if (pt->tag == tag) {
			return pt;
		}
	}

	return NULL;
}

When a process is done with its pagetable, it must release it. This is done with the free_pagetable function, which releases all the pages held in the pagetable, then frees the pagetable itself.

/*
 * Destroy a pagetable.
 *
 * Caller must hold ptlock.
 */
static void
free_pagetable (struct si_pagetable *pt)
{
	int i;
	for (i = 0; i < SI_NUM_PAGES; i++) {
		if (pt->present[i] != 0) {
			release_page(pt->page[i]);
			pt->present[i] = 0;
		}
	}

	vfree(pt);
}

The initialization function handles allocation of memory for use as RAM, as a page array, and for pagetables, along with the corresponding locks. It also replaces the existing system calls with our replacements.

/*
 * Initialize the module and insert it into the kernel.
 */
int
init_module (void)
{
	int i;

	/* We need a list head for the pagetables list.  Initialize
	   it here. */
	pagetables = kmalloc(sizeof(struct list_head), GFP_KERNEL);
	INIT_LIST_HEAD(pagetables);

	/* Allocate our RAM. */
	rambuffer = vmalloc(SI_PAGE_SIZE * SI_PHYSICAL_PAGES);
	if (rambuffer == NULL) {
		printk("Not enough RAM to allocate buffer!\n");
		return -ENOMEM;
	}

	/* Allocate the page array. */
	rampages = vmalloc(SI_PHYSICAL_PAGES * sizeof(struct si_page));
	if (rampages == NULL) {
		printk("Not enough RAM to allocate page list!\n");
		vfree(rambuffer);
		return -ENOMEM;
	}

	/* Initialize the page array. */
	for (i = 0; i < SI_PHYSICAL_PAGES; i++) {
		rampages[i].start = rambuffer + (i * SI_PAGE_SIZE);
		rampages[i].used = 0;
	}

	/* Initialize the locks. */
	init_MUTEX(&ptlock);

	/* Attach our functions to system calls. */
	REPLACE_SYSCALL(getpagetable);
	REPLACE_SYSCALL(putpagetable);
	REPLACE_SYSCALL(readfromaddress);
	REPLACE_SYSCALL(writetoaddress);

	return 0;
}

In this case, the module cleanup function is effectively the initialization function in reverse: everything init_module does, cleanup_module undoes, and in reverse order. That is, it restores the system calls to their original handlers and deletes the page array and RAM.

void
cleanup_module (void)
{
	struct si_pagetable *tmppt;
	struct list_head *tmplist1, *tmplist2;

	/* Detach our functions from system calls. */
	RESTORE_SYSCALL(getpagetable);
	RESTORE_SYSCALL(putpagetable);
	RESTORE_SYSCALL(readfromaddress);
	RESTORE_SYSCALL(writetoaddress);

	/* Destroy all pagetables. */
	down(&ptlock);

	list_for_each_safe(tmplist1, tmplist2, pagetables) {
		tmppt = list_entry(tmplist1, struct si_pagetable, list);
		list_del(&(tmppt->list));
		free_pagetable(tmppt);
	}
	kfree(pagetables);

	/* Destroy the page array. */
	vfree(rampages);

	/* Release our RAM. */
	vfree(rambuffer);
	
	up(&ptlock);
}

mem.h

The mem.h include file is fairly self-explanatory:

#ifndef SOFTICE_MEM_H
#define SOFTICE_MEM_H

typedef char byte;


/* Address size and structure defines: */

/* Number of offset bits in a virtual address. */
#define SI_OFFSET_BITS 10	/* 1024 bytes per page */
/* Number of page bits in a virtual address. */
#define SI_PAGE_BITS    6	/* 64 pages per page table */
/* The number of pages in the virtual address space. */
#define SI_NUM_PAGES   (1 << SI_PAGE_BITS)
/* The number of bytes in a single page. */
#define SI_PAGE_SIZE   (1 << SI_OFFSET_BITS)

/* The number of physical pages available. */
/* Set as half the virtual address space. */
#define SI_PHYSICAL_PAGES (1 << (SI_PAGE_BITS - 1))

/* Create a mask that passes the x low bits. */
#define SI_MASK_LOW_BITS(x) ((1 << (x)) - 1)
/* Retrieve the page index from a virtual address. */
#define SI_ADDR_PAGE(x)   (((x) >> SI_OFFSET_BITS) & SI_MASK_LOW_BITS(SI_PAGE_BITS))
/* Retrieve the offset from a virtual address. */
#define SI_ADDR_OFFSET(x) ((x) & SI_MASK_LOW_BITS(SI_OFFSET_BITS))

/* Describes a single page of RAM. */
struct si_page {
	byte *start;		/* The beginning of the page. */
	int used;		/* Reference count, initialized to 0. */
};

/* A page table. */
struct si_pagetable {
	struct list_head list;	/* Tie all the page tables together. */
	int tag;		/* to identify the pagetable. for now, PID. */
	int page[SI_NUM_PAGES];	/* Index into the array of struct page. */
	int present[SI_NUM_PAGES]; /* Whether the physical addresses are valid. */
};

/* Constants for use in the copy_to_from_user() function. */
#define SI_FROM_USER 1
#define SI_TO_USER 2

#endif

The distinction between SI_NUM_PAGES and SI_PHYSICAL_PAGES should be underscored: SI_NUM_PAGES is the number of pages in the virtual address space, while SI_PHYSICAL_PAGES are the number of 'physical' RAM pages available. The two do not necessarily have any correlation; it is possible for there to be more physical pages than virtual pages, while in physical systems, the reverse is more common (32-bit systems have a 4 GiB address space, but very few systems have 4 GiB of physical RAM. The problem is even more pronounced on 64-bit systems, as they have a 16 exabyte (16 * 1024 * 1024 * 1024 gigabyte) address space.)

client.c (userspace)

Since we have added new system calls, there are no library functions that will use them. Instead, we must call them directly, using the syscall() function.

Note: you may find references to the _syscallN macros. While they work in the current kernel version as of the time of writing (and are arguably cleaner), they have been officially deprecated and may be removed from future versions.

The syscall() function takes as its first argument the number of the system call to execute (these are the standard __NR_foo preprocessor constants used in prior labs), and the rest of the arguments are passed directly to the system call.

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <linux/unistd.h>
#include <asm/unistd.h>
#include <sys/syscall.h>

int
main (int argc, char **argv)
{
	char *data = "He lived as a devil, eh?";
	char buffer[64];
	int addr  = 0xBEEF;
	int addr2 = 0xBABE;
	int addr3 = 0x1234;
	long tmp;

	tmp = syscall(__NR_getpagetable);
	printf("getpagetable return: %ld\n", tmp);

	tmp = syscall(__NR_writetoaddress, addr, data, strlen(data) + 1);
	printf("writetoaddress return: %ld\n", tmp);

	tmp = syscall(__NR_readfromaddress, addr, buffer, 64);
	printf("readfromaddress return: %ld\n", tmp);
	printf("readfromaddress data: %s\n", buffer);

	tmp = syscall(__NR_writetoaddress, addr2, data, strlen(data) + 1);
	printf("writetoaddress return: %ld\n", tmp);

	tmp = syscall(__NR_readfromaddress, addr2, buffer, 64);
	printf("readfromaddress return: %ld\n", tmp);
	printf("readfromaddress data: %s\n", buffer);

	tmp = syscall(__NR_writetoaddress, addr3, data, strlen(data) + 1);
	printf("writetoaddress return: %ld\n", tmp);

	tmp = syscall(__NR_readfromaddress, addr3, buffer, 64);
	printf("readfromaddress return: %ld\n", tmp);
	printf("readfromaddress data: %s\n", buffer);

	tmp = syscall(__NR_putpagetable);
	printf("putpagetable return: %ld\n", tmp);
	
	return 0;
}

[Exercises]

Exercise 1: Per-User Shared Memory

Change the provided program so that all processes owned by the same user ID share the same virtual address space (and therefore the same pagetable).

[Projects]

Project 1: Swap

Add a paging system and virtual memory ("swap") to the memory system provided. (Pagetables should be per-process; do not use the result of Exercise 1 as the basis for this project.)

The page-replacement algorithm should be isolated for ease of replacement, though this is at the instructor's option.

You should also modify the provided userspace program to exhaust the provided RAM buffer and force the memory module to use the swap that you implemented.

Project 2: Shared Memory

To the module implemented in Project 1 (or, at the instructor's option, to the module provided), add the following three system calls:

  • sharepage(int address, int key): Export the page containing address address with the key key. The key will be used to retrieve the page later. (If multiple pages are shared with the same key, the result is undefined.) address should be rounded down to the beginning of its page.
  • unsharepage(int key): Remove the page with key key from the list of exported pages.
  • importpage(int address, int key): Retrieve the page with key key and import it into the current process' address space as the page containing address address. address should be rounded down to the beginning of its page.

Implement these system calls and two userspace programs to test them (one to feed data into the shared page and one to retrieve the data from the shared page). The userspace programs must not map the shared page to the same address.

References

References (Textbooks)

[NUTT] Operating Systems, 3/e


[STALL] Operating Systems, Internals and Design Principles


[SILB] Operating System Concepts with Java

  • http://os-book.com/
  • Abraham Silberschatz, Peter Baer Galvin, Greg Gagne
  • Wiley, ISBN: 978-0-471-76907-1
  • Chapter(s): #8 (Memory Management), #9 (Virtual Memory)


[DEIT] Operating Systems

  • Deitel, ISBN: 0131828274
  • Chapter(s): #10 (Virtual Memory Organization), #11 (Virtual Memory Management)