3.  Fixing corrupted file systems

      A file system can become corrupted in several ways. The most common of these ways are improper shutdown procedures and hardware failures.

      File systems may become corrupted during an unclean halt. This happens when proper shutdown procedures are not observed, physically write-protecting a mounted file system, or a mounted file system is taken off-line. The most common operator procedural failure is forgetting to sync the system before halting the CPU.

      File systems may become further corrupted if proper startup procedures are not observed, e.g., not checking a file system for inconsistencies, and not repairing inconsistencies. Allowing a corrupted file system to be used (and, thus, to be modified further) can be disastrous.

      Any piece of hardware can fail at any time. Failures can be as subtle as a bad block on a disk pack, or as blatant as a non-functional disk-controller.

3.1.  Detecting and correcting corruption

      Normally fsck is run non-interactively. In this mode it will only fix corruptions that are expected to occur from an unclean halt. These actions are a proper subset of the actions that fsck will take when it is running interactively. Throughout this paper we assume that fsck is being run interactively, and all possible errors can be encountered. When an inconsistency is discovered in this mode, fsck reports the inconsistency for the operator to chose a corrective action.

      A quiescent*** file system may be checked for structural integrity by performing consistency checks on the redundant data intrinsic to a file system. The redundant data is either read from the file system, or computed from other known values. The file system must be in a quiescent state when fsck is run, since fsck is a multi-pass program.

      In the following sections, we discuss methods to discover inconsistencies and possible corrective actions for the cylinder group blocks, the inodes, the indirect blocks, and the data blocks containing directory entries.

3.2.  Super-block checking

      The most commonly corrupted item in a file system is the summary information associated with the super-block. The summary information is prone to corruption because it is modified with every change to the file system's blocks or inodes, and is usually corrupted after an unclean halt.

      The super-block is checked for inconsistencies involving file-system size, number of inodes, free-block count, and the free-inode count. The file-system size must be larger than the number of blocks used by the super-block and the number of blocks used by the list of inodes. The file-system size and layout information are the most critical pieces of information for fsck. While there is no way to actually check these sizes, since they are statically determined by newfs, fsck can check that these sizes are within reasonable bounds. All other file system checks require that these sizes be correct. If fsck detects corruption in the static parameters of the default super-block, fsck requests the operator to specify the location of an alternate super-block.

3.3.  Free block checking

      Fsck checks that all the blocks marked as free in the cylinder group block maps are not claimed by any files. When all the blocks have been initially accounted for, fsck checks that the number of free blocks plus the number of blocks claimed by the inodes equals the total number of blocks in the file system.

      If anything is wrong with the block allocation maps, fsck will rebuild them, based on the list it has computed of allocated blocks.

      The summary information associated with the super-block counts the total number of free blocks within the file system. Fsck compares this count to the number of free blocks it found within the file system. If the two counts do not agree, then fsck replaces the incorrect count in the summary information by the actual free-block count.

      The summary information counts the total number of free inodes within the file system. Fsck compares this count to the number of free inodes it found within the file system. If the two counts do not agree, then fsck replaces the incorrect count in the summary information by the actual free-inode count.

3.4.  Checking the inode state

      An individual inode is not as likely to be corrupted as the allocation information. However, because of the great number of active inodes, a few of the inodes are usually corrupted.

      The list of inodes in the file system is checked sequentially starting with inode 2 (inode 0 marks unused inodes; inode 1 is saved for future generations) and progressing through the last inode in the file system. The state of each inode is checked for inconsistencies involving format and type, link count, duplicate blocks, bad blocks, and inode size.

      Each inode contains a mode word. This mode word describes the type and state of the inode. Inodes must be one of six types: regular inode, directory inode, symbolic link inode, special block inode, special character inode, or socket inode. Inodes may be found in one of three allocation states: unallocated, allocated, and neither unallocated nor allocated. This last state suggests an incorrectly formated inode. An inode can get in this state if bad data is written into the inode list. The only possible corrective action is for fsck is to clear the inode.

3.5.  Inode links

      Each inode counts the total number of directory entries linked to the inode. Fsck verifies the link count of each inode by starting at the root of the file system, and descending through the directory structure. The actual link count for each inode is calculated during the descent.

      If the stored link count is non-zero and the actual link count is zero, then no directory entry appears for the inode. If this happens, fsck will place the disconnected file in the lost+found directory. If the stored and actual link counts are non-zero and unequal, a directory entry may have been added or removed without the inode being updated. If this happens, fsck replaces the incorrect stored link count by the actual link count.

      Each inode contains a list, or pointers to lists (indirect blocks), of all the blocks claimed by the inode. Since indirect blocks are owned by an inode, inconsistencies in indirect blocks directly affect the inode that owns it.

      Fsck compares each block number claimed by an inode against a list of already allocated blocks. If another inode already claims a block number, then the block number is added to a list of duplicate blocks. Otherwise, the list of allocated blocks is updated to include the block number.

      If there are any duplicate blocks, fsck will perform a partial second pass over the inode list to find the inode of the duplicated block. The second pass is needed, since without examining the files associated with these inodes for correct content, not enough information is available to determine which inode is corrupted and should be cleared. If this condition does arise (only hardware failure will cause it), then the inode with the earliest modify time is usually incorrect, and should be cleared. If this happens, fsck prompts the operator to clear both inodes. The operator must decide which one should be kept and which one should be cleared.

      Fsck checks the range of each block number claimed by an inode. If the block number is lower than the first data block in the file system, or greater than the last data block, then the block number is a bad block number. Many bad blocks in an inode are usually caused by an indirect block that was not written to the file system, a condition which can only occur if there has been a hardware failure. If an inode contains bad block numbers, fsck prompts the operator to clear it.

3.6.  Inode data size

      Each inode contains a count of the number of data blocks that it contains. The number of actual data blocks is the sum of the allocated data blocks and the indirect blocks. Fsck computes the actual number of data blocks and compares that block count against the actual number of blocks the inode claims. If an inode contains an incorrect count fsck prompts the operator to fix it.

      Each inode contains a thirty-two bit size field. The size is the number of data bytes in the file associated with the inode. The consistency of the byte size field is roughly checked by computing from the size field the maximum number of blocks that should be associated with the inode, and comparing that expected block count against the actual number of blocks the inode claims.

3.7.  Checking the data associated with an inode

      An inode can directly or indirectly reference three kinds of data blocks. All referenced blocks must be the same kind. The three types of data blocks are: plain data blocks, symbolic link data blocks, and directory data blocks. Plain data blocks contain the information stored in a file; symbolic link data blocks contain the path name stored in a link. Directory data blocks contain directory entries. Fsck can only check the validity of directory data blocks.

      Each directory data block is checked for several types of inconsistencies. These inconsistencies include directory inode numbers pointing to unallocated inodes, directory inode numbers that are greater than the number of inodes in the file system, incorrect directory inode numbers for ``.'' and ``..'', and directories that are not attached to the file system. If the inode number in a directory data block references an unallocated inode, then fsck will remove that directory entry. Again, this condition can only arise when there has been a hardware failure.

      Fsck also checks for directories with unallocated blocks (holes). Such directories should never be created. When found, fsck will prompt the user to adjust the length of the offending directory which is done by shortening the size of the directory to the end of the last allocated block preceeding the hole. Unfortunately, this means that another Phase 1 run has to be done. Fsck will remind the user to rerun fsck after repairing a directory containing an unallocated block.

      If a directory entry inode number references outside the inode list, then fsck will remove that directory entry. This condition occurs if bad data is written into a directory data block.

      The directory inode number entry for ``.'' must be the first entry in the directory data block. The inode number for ``.'' must reference itself; e.g., it must equal the inode number for the directory data block. The directory inode number entry for ``..'' must be the second entry in the directory data block. Its value must equal the inode number for the parent of the directory entry (or the inode number of the directory data block if the directory is the root directory). If the directory inode numbers are incorrect, fsck will replace them with the correct values. If there are multiple hard links to a directory, the first one encountered is considered the real parent to which ``..'' should point; fsck recommends deletion for the subsequently discovered names.

3.8.  File system connectivity

      Fsck checks the general connectivity of the file system. If directories are not linked into the file system, then fsck links the directory back into the file system in the lost+found directory. This condition only occurs when there has been a hardware failure.

Acknowledgements

      I thank Bill Joy, Sam Leffler, Robert Elz and Dennis Ritchie for their suggestions and help in implementing the new file system. Thanks also to Robert Henry for his editorial input to get this document together. Finally we thank our sponsors, the National Science Foundation under grant MCS80-05144, and the Defense Advance Research Projects Agency (DoD) under Arpa Order No. 4031 monitored by Naval Electronic System Command under Contract No. N00039-82-C-0235. (Kirk McKusick, July 1983)

      I would like to thank Larry A. Wehr for advice that lead to the first version of fsck and Rick B. Brandt for adapting fsck to UNIX/TS. (T. Kowalski, July 1979)

References

[Dolotta78]
Dolotta, T. A., and Olsson, S. B. eds., UNIX User's Manual, Edition 1.1, January 1978.
[Joy83]
Joy, W., Cooper, E., Fabry, R., Leffler, S., McKusick, M., and Mosher, D. 4.2BSD System Manual, University of California at Berkeley, Computer Systems Research Group Technical Report #4, 1982.
[McKusick84]
McKusick, M., Joy, W., Leffler, S., and Fabry, R. A Fast File System for UNIX, ACM Transactions on Computer Systems 2, 3. pp. 181-197, August 1984.
[Ritchie78]
Ritchie, D. M., and Thompson, K., The UNIX Time-Sharing System, The Bell System Technical Journal 57, 6 (July-August 1978, Part 2), pp. 1905-29.
[Thompson78]
Thompson, K., UNIX Implementation, The Bell System Technical Journal 57, 6 (July-August 1978, Part 2), pp. 1931-46.