memfs, section 2.

2. Implementation

The current implementation took less time to write than did this paper. It consists of 560 lines of kernel code (1.7K text + data) and some minor modifications to newfs, the program that builds disk-based filesystems. A condensed version of the kernel code for the memory-based filesystem is reproduced in Appendix 1.

A filesystem is created by invoking the modified newfs with an option telling it to create a memory-based filesystem. newfs allocates a section of virtual address space of the requested size and builds a filesystem in the memory instead of on a disk partition. Once the filesystem is built, newfs does a mount system call specifying a filesystem type of MFS (Memory File System). The auxiliary data parameter to the mount call specifies a pointer to the base of the memory in which newfs has built the filesystem. (The auxiliary data parameter used by the local filesystem, ufs, specifies the block device containing the filesystem.)
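The user-level half of this step can be sketched as follows. This is a minimal illustration, not the newfs code: the structure memfs_args_sketch, the function memfs_prepare_sketch(), the 512-byte block size, and the "MEMFS" marker that stands in for building the filesystem are all assumptions made for the example. The real program would go on to pass the filled-in structure to mount as the auxiliary data parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 512u  /* assumed block size, for illustration only */

/* Hypothetical auxiliary data handed to the mount call. */
struct memfs_args_sketch {
    char  *base;   /* start of the memory holding the filesystem */
    size_t size;   /* size of that memory in bytes */
};

/*
 * Allocate zeroed memory for nblocks blocks and record its base and size.
 * Returns 0 on success and -1 on failure; args is untouched on failure
 * caused by a bad block count.
 */
int memfs_prepare_sketch(size_t nblocks, struct memfs_args_sketch *args)
{
    char *mem;

    assert(args != NULL);
    if (nblocks == 0 || nblocks > (size_t)-1 / SKETCH_BLOCK_SIZE)
        return -1;                       /* empty or overflowing request */
    mem = calloc(nblocks, SKETCH_BLOCK_SIZE);
    if (mem == NULL)
        return -1;
    /* Stand-in for building the filesystem structures in the region. */
    memcpy(mem, "MEMFS", 5);
    args->base = mem;
    args->size = nblocks * SKETCH_BLOCK_SIZE;
    return 0;
}
```

The caller owns the returned memory; in this sketch it is released with free(), whereas in the system described here the memory disappears when the newfs process exits.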

The mount system call allocates and initializes a mount table entry and then calls the filesystem-specific mount routine. The filesystem-specific routine is responsible for doing the mount and initializing the filesystem-specific portion of the mount table entry. The memory-based filesystem-specific mount routine, mfs_mount(), is shown in Appendix 1. It allocates a block-device vnode to represent the memory disk device. In the private area of this vnode it stores the base address of the filesystem and the process identifier of the newfs process for later reference when doing I/O. It also initializes an I/O list that it uses to record outstanding I/O requests. It can then call the ufs filesystem mount routine, passing the special block-device vnode that it has created instead of the usual disk block-device vnode. The mount proceeds just as any other local mount, except that requests to read from the block device are vectored through mfs_strategy() (described below) instead of the usual spec_strategy() block device I/O function.

When the mount is completed, mfs_mount() does not return as most other filesystem mount functions do; instead it sleeps in the kernel awaiting I/O requests. Each time an I/O request is posted for the filesystem, a wakeup is issued for the corresponding newfs process. When awakened, the process checks for requests on its buffer list. A read request is serviced by copying data from the section of the newfs address space corresponding to the requested disk block to the kernel buffer. Similarly, a write request is serviced by copying data to the section of the newfs address space corresponding to the requested disk block from the kernel buffer. When all the requests have been serviced, the newfs process returns to sleep to await more requests.
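The servicing of queued requests described above can be modelled in user space as below. This is a sketch under stated assumptions rather than the code in Appendix 1: struct sk_buf, sk_service(), and the 512-byte block size are hypothetical, plain memcpy() stands in for the kernel's copy between address spaces, and the sleep and wakeup are omitted so that the function covers only the work done each time the process is awakened.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_BSIZE 512u   /* assumed block size, for illustration only */

/* Hypothetical stand-in for a kernel I/O buffer. */
struct sk_buf {
    struct sk_buf *next;    /* next request on the I/O list */
    int            is_read; /* nonzero: read request; zero: write request */
    size_t         blkno;   /* requested disk block */
    size_t         bcount;  /* number of bytes to transfer */
    char          *data;    /* the "kernel buffer" */
    int            error;   /* set to 1 if the request is out of range */
};

/*
 * Service every request on the list, as the awakened process would,
 * and return the number of requests handled.
 */
int sk_service(char *base, size_t size, struct sk_buf **list)
{
    int handled = 0;
    struct sk_buf *bp;

    assert(base != NULL && list != NULL);
    while ((bp = *list) != NULL) {
        *list = bp->next;            /* unlink the request */
        bp->next = NULL;
        bp->error = 0;
        if (bp->blkno > size / SK_BSIZE ||
            bp->bcount > size - bp->blkno * SK_BSIZE) {
            bp->error = 1;           /* beyond the end of the "disk" */
        } else if (bp->is_read) {
            /* read: filesystem memory -> kernel buffer */
            memcpy(bp->data, base + bp->blkno * SK_BSIZE, bp->bcount);
        } else {
            /* write: kernel buffer -> filesystem memory */
            memcpy(base + bp->blkno * SK_BSIZE, bp->data, bp->bcount);
        }
        handled++;
    }
    return handled;                  /* list is now empty: back to sleep */
}
```

The block number is used only to compute an offset from the base address, which is why the memory region can stand in for a disk partition without any change to the code above the block-device interface.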

Once mounted, all operations on files in the memory-based filesystem are handled by the ufs filesystem code until they get to the point where the filesystem needs to do I/O on the device. Here, the filesystem encounters the second piece of the memory-based filesystem. Instead of calling the special-device strategy routine, it calls the memory-based strategy routine, mfs_strategy(). Usually, the request is serviced by linking the buffer onto the I/O list for the memory-based filesystem vnode and sending a wakeup to the newfs process. This wakeup results in a context-switch to the newfs process, which does a copyin or copyout as described above. The strategy routine must be careful to check whether the I/O request is coming from the newfs process itself, however. Such requests happen during mount and unmount operations, when the kernel is reading and writing the superblock. Here, mfs_strategy() must do the I/O itself to avoid deadlock.
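The decision made by the strategy routine can be illustrated with a small user-space model. The names sk_dev, sk_req, sk_strategy(), and the 512-byte block size are assumptions for this sketch, a counter stands in for the wakeup, and the caller's process identifier is passed explicitly instead of being read from kernel state. The direct path exists because a request issued by the serving process itself would otherwise wait on a queue that only that same process can drain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_BSIZE 512u   /* assumed block size, for illustration only */

/* Hypothetical I/O request. */
struct sk_req {
    struct sk_req *next;
    int            is_read;  /* nonzero: read; zero: write */
    size_t         blkno;
    size_t         bcount;
    char          *data;
};

/* Hypothetical per-filesystem state kept in the vnode's private area. */
struct sk_dev {
    char          *base;      /* base address of the filesystem memory */
    size_t         size;      /* size of that memory in bytes */
    long           owner_pid; /* process that services queued requests */
    struct sk_req *head;      /* outstanding I/O requests */
    int            wakeups;   /* stand-in for issuing a wakeup */
};

enum sk_path { SK_DIRECT, SK_QUEUED };

/* Perform the copy for one request. */
static void sk_doio(struct sk_dev *dev, struct sk_req *rq)
{
    char *addr = dev->base + rq->blkno * SK_BSIZE;

    if (rq->is_read)
        memcpy(rq->data, addr, rq->bcount);
    else
        memcpy(addr, rq->data, rq->bcount);
}

/* Either do the I/O at once or queue it for the serving process. */
enum sk_path sk_strategy(struct sk_dev *dev, struct sk_req *rq, long caller_pid)
{
    assert(dev != NULL && rq != NULL);
    assert(rq->blkno <= dev->size / SK_BSIZE);
    assert(rq->bcount <= dev->size - rq->blkno * SK_BSIZE);

    if (caller_pid == dev->owner_pid) {
        sk_doio(dev, rq);        /* queueing here would deadlock */
        return SK_DIRECT;
    }
    rq->next = dev->head;        /* link onto the I/O list */
    dev->head = rq;
    dev->wakeups++;              /* wake the serving process */
    return SK_QUEUED;
}
```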

The final piece of kernel code to support the memory-based filesystem is the close routine. After the filesystem has been successfully unmounted, the device close routine is called. For a memory-based filesystem, the device close routine is mfs_close(). This routine flushes any pending I/O requests, then sets the I/O list head to a special value that is recognized by the I/O servicing loop in mfs_mount() as an indication that the filesystem is unmounted. The mfs_mount() routine exits, in turn causing the newfs process to exit, resulting in the filesystem vanishing in a cloud of dirty pages.
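The handshake between the close routine and the servicing loop can be sketched as follows. All names here are hypothetical, and the sketch makes two simplifying assumptions: the special value is the address of a static object (chosen for portability in user-space C), and flushing a request just marks it done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical I/O request; only the fields needed here. */
struct sk_req {
    struct sk_req *next;
    int            done;
};

/* Address of this object serves as the "filesystem unmounted" value. */
static struct sk_req sk_unmounted;
#define SK_UNMOUNTED (&sk_unmounted)

struct sk_dev {
    struct sk_req *head;   /* I/O list head, or SK_UNMOUNTED */
};

/* Complete every pending request; returns how many were completed. */
static int sk_drain(struct sk_dev *dev)
{
    int n = 0;

    while (dev->head != NULL && dev->head != SK_UNMOUNTED) {
        struct sk_req *rq = dev->head;

        dev->head = rq->next;
        rq->next = NULL;
        rq->done = 1;
        n++;
    }
    return n;
}

/* Close: flush pending requests, then mark the list head as unmounted. */
void sk_close(struct sk_dev *dev)
{
    assert(dev != NULL);
    sk_drain(dev);
    dev->head = SK_UNMOUNTED;
}

/*
 * One pass of the servicing loop. Returns 1 if the loop should go back
 * to sleep and wait for more requests, 0 if it should exit.
 */
int sk_loop_once(struct sk_dev *dev)
{
    assert(dev != NULL);
    if (dev->head == SK_UNMOUNTED)
        return 0;
    sk_drain(dev);
    return 1;
}
```

Reusing the list head as the signal means the loop needs no separate flag: the one word it already examines on every wakeup also tells it when to stop.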

The paging of the filesystem does not require any additional code beyond that already in the kernel to support virtual memory. The newfs process competes with other processes on an equal basis for the machine's available memory. Data pages of the filesystem that have not yet been used are zero-fill-on-demand pages that do not occupy memory, although they currently allocate space in backing store. As long as memory is plentiful, the entire contents of the filesystem remain memory resident. When memory runs short, the oldest pages of newfs will be pushed to backing store as part of the normal paging activity. The pages that are pushed usually hold the contents of files that have been created in the memory-based filesystem but have not been recently accessed (or have been deleted) [Leffler1989a].