Protected RAMFS
Basic Features
PRAMFS is robust in the face of errant writes to the protected area, which could arise due to kernel bugs. The page table entries that map the backing-store RAM are marked read-only on initialization. Write operations to the filesystem temporarily mark the pages to be written as writable, the write operation is carried out with locks held, and then the pte is marked read-only again. This limits the writes to the filesystem in the window when the locks are held. The write-protection feature can be disabled by the kernel config option CONFIG_PRAMFS_NOWP.
PRAMFS forces all files to use direct-IO. The filp->f_flags is set to O_DIRECT when the files are opened. Opening all files as O_DIRECT avoids page caching, and data is written immediately to a storage device. This is nearly equal to the speed of the system RAM, but it forces applications to do block-aligned I/O.
PRAMFS does not have recovery facilities, such as journaling, to survive a crash or power failure during a write operation. The filesystem maintains checksums for the superblock and inode to check the validity of the stored object. An inode with an incorrect checksum is marked as bad, which may lead to data loss in case of power failure during a write operation.
PRAMFS also supports execute in place (XIP), which is a technique that executes programs directly from the storage instead of copying it into RAM. For a RAM filesystem, XIP makes sense since the system can execute from the storage device as fast as it can from the system RAM, and it does not make a duplicate copy in RAM.
Usage
There is no mkfs utility to create a PRAMFS. The filesystem is automatically created when the filesystem is mounted with the init option. The command to create and mount a PRAMFS is:
# mount -t pramfs -o physaddr=0x20000000,init=0x2F000,bs=1024 none /mnt/pram
This command creates a filesystem of 0x2F000 bytes, with a block size of 1024 bytes, and locates it at the physical address 0x20000000.
To retrieve an existing filesystem, mount the PRAMFS with the physaddr parameter that was used in the previous mount. The details of the filesystem such as blocksize and filesystem size are read from the superblock:
# mount -t pramfs -o physaddr=0x20000000 none /mnt/pram
Other filesystem parameters are:
- bpi: specifies the bytes-per-inode ratio. For every
bpi bytes in
the filesystem, an inode is created.
- N: specifies the number of inodes to allocate in the inode table. If the option is not specified, the bytes-per-inode ratio is used to calculate the number of inodes.
If the init option is not specified, the bs, bpi, or N options are ignored by the mount, since this information is picked up from the existing filesystem. When creating the filesystem, if no option for the inode reservation is specified, by default 5% of the filesystem space is used for the inode table.
To test the memory protection of PRAMFS, the developers have written a kernel module that attempts to write within the PRAMFS memory with the intention of corrupting the memory space. This causes a kernel protection fault, and, after a reboot, you may re-mount the filesystem to find that the test module was not capable of corrupting the filesystem.
Filesystem Layout
PRAMFS has a simple layout, with the super-block in the first 128 bytes of the RAM block, followed by the inode table, the block usage map, and finally the data blocks. The superblock is 128 bytes long and contains all of the important information, such as filesystem size, block size, etc., needed to remount the filesystem.
![[PRAMFS layout]](/_proxy/https/static.lwn.net/images/pramfs_layout.png)
The inode table consists of the inodes required for the filesystem. The number of inodes are computed when the filesystem is initialized. Each inode is 128 bytes long. Directory entry information, such as filename and owning inode, are contained within the inode. This presents a problem for hard links because a hard link requires two directory entries under different directories for the same inode. Hence, PRAMFS does not support hard links. The inode format also limits the filename to 48 characters. The inode number is the absolute offset of that inode from the beginning of the filesystem.
Regular PRAMFS file inodes contain the i_type.reg.row_block field, which points to a data block which contains doubly-indirect pointers to the file's data blocks. This is similar to the double indirect block field of the ext2 filesystem inode. But, that means that a file smaller than 1 block will require 3 blocks to store it.
![[PRAMFS inode]](/_proxy/https/static.lwn.net/images/pramfs_inode.png)
Inodes within a directory are linked together in
a doubly-linked list. The directory inode stores the first and last
inode in the directory listing. The previous entry of the first inode
and the next entry of the last inode are null terminated.
Write Protection
PRAMFS utilizes the system's paging unit by mapping its RAM initially as read-only. Writes to data objects first mark the corresponding page table entries as writable, perform the write and then mark them read-only again. This operation is done atomically by holding the page-table spin-lock with interrupts disabled. Following a write, stale entries in the system TLB are flushed. Write locks are held at the superblock, inode, or block level, depending on the granularity of modification.
Since PRAMFS attempts to avoid filesystem corruption caused because of kernel bugs, shared mmap() regions can only be read. Dirty pages in the page cache cannot be written back to the filesystem. For this reason, PRAMFS defines only the readpage() member of struct address_space_operations; the writepage() entry is declared as NULL.
Acceptance
This is the second attempt to get PRAMFS in the mainline. The previous attempt was done in 2004 by Steve Longerbeam of Montavista.The home page of PRAMFS claims the filesystem to be fully-featured. But, as part of the linux-kernel discussion, Henrique de Moraes Holschuh strongly disagreed:
There are not enough performance benchmarks information against other filesystems, yet, to form an opinion. Performance tests done while adding Execute in Place (XIP) reveal a performance as low as 13Mbps for per-character writes and 35Mbps for block writes using bonnie. Pavel Machek considers these numbers to be pretty low, especially for a RAM-based filesystem:
No tests have been performed using existing solutions, such as ramdisk on the same hardware, to compare apples with apples. The low performance is attributed to the excessive locking done for writes. Pavel believes the developers of PRAMFS are confused regarding the goals of the filesystem, and whether they are designing for speed, completeness, or robustness.
PRAMFS is a niche filesystem, mostly for embedded devices with NVRAM,
and hence lacks important features, such as hard links and shared
mmap()s. However, for quite a number of situations an entire
filesystem seems like overkill. Pavel suggests a special NVRAM-based block device
with a traditional filesystem or a filesystem based on Solid State Device
(SSD) filesystems would be a better option. With the current number of
objections, PRAMFS is unlikely to go into the mainline. However, Marco plans
to further improve the code with more features, and to update the
PRAMFS homepage to better reflect the filesystem's goals.
| Index entries for this article | |
|---|---|
| Kernel | Filesystems |
| GuestArticles | Rodrigues, Goldwyn |
