6.  ADDING NEW SYSTEM SOFTWARE

      This section is not for the novice. It describes some of the inner workings of the configuration process as well as the pertinent parts of the system autoconfiguration process. It is intended to give those who install new device drivers or other system facilities enough information to do so in a manner that allows others to share the changes easily.

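      This section is broken into three parts: modifying system code, adding non-standard system facilities, and adding device drivers to 4.4BSD.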

6.1.  Modifying system code

      If you wish to make site-specific modifications to the system it is best to bracket them with

#ifdef SITENAME
...
#endif
to allow your source to be easily distributed to others, and also to simplify diff(1) listings. If you choose not to use a source code control system (e.g. SCCS, RCS), and perhaps even if you do, it is recommended that you save the old code with something of the form:
#ifndef SITENAME
...
#endif
We try to isolate our site-dependent code in individual files which may be configured with pseudo-device specifications.

      Indicate machine-specific code with ``#ifdef vax'' (or other machine, as appropriate). 4.4BSD underwent extensive work to make it extremely portable to machines with similar architectures; you may someday find yourself trying to use a single copy of the source code on multiple machines.
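
      As a minimal sketch of the bracketing convention, assume a hypothetical site symbol MYSITE and a hypothetical routine site_buffer_size(); neither is part of 4.4BSD. The local change and the saved original sit side by side, so removing the definition of MYSITE restores the distributed behavior:

```c
/*
 * Hypothetical example: MYSITE, site_buffer_size() and the values
 * below are illustrative only and do not appear in the distribution.
 */
#define MYSITE		/* assumed here; a real build would define it elsewhere */

int
site_buffer_size(void)
{
#ifdef MYSITE
	return (2048);	/* site-specific modification */
#endif
#ifndef MYSITE
	return (1024);	/* old code, kept for reference and for diff(1) */
#endif
}
```

Exactly one of the two bracketed regions is compiled, so the routine always has a single return statement.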

6.2.  Adding non-standard system facilities

      This section considers the work needed to augment config's data base files for non-standard system facilities. Config uses a set of files that list the source modules that may be required when building a system. The data bases are taken from the directory in which config is run, normally /sys/conf. Three such files may be used: files, files.machine, and files.ident. The first is common to all systems, the second contains files unique to a single machine type, and the third is an optional list of modules for use on a specific machine. This last file may override specifications in the first two.

      The format of the files file has grown somewhat complex over time. Entries are normally of the form

dir/source.c    type    option-list    modifiers

for example,

vaxuba/foo.c    optional    foo    device-driver

The type is one of standard or optional. Files marked as standard are included in all system configurations. Optional file specifications include a list of one or more system options that together require the inclusion of this module. The options in the list may be either names of devices that may be in the configuration file, or the names of system options that may be defined. An optional file may be listed multiple times with different options; if all of the options for any of the entries are satisfied, the module is included.
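
      The inclusion rule for optional files can be sketched in C as follows. This is not config's own code: the structure, the function names, and the fixed limit of four options per entry are assumptions made for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* One "optional" line for a module: up to four option names, NULL-terminated. */
struct file_entry {
	const char *options[4];
};

/* Return 1 if name appears in the NULL-terminated list of configured names. */
static int
is_configured(const char *name, const char *const *configured)
{
	for (; *configured != NULL; configured++)
		if (strcmp(name, *configured) == 0)
			return (1);
	return (0);
}

/*
 * A module is included if, for at least one of its entries,
 * every option listed on that entry is configured.
 */
int
module_included(const struct file_entry *entries, size_t nentries,
    const char *const *configured)
{
	size_t i, j;

	for (i = 0; i < nentries; i++) {
		int all = 1;

		for (j = 0; j < 4 && entries[i].options[j] != NULL; j++)
			if (!is_configured(entries[i].options[j], configured))
				all = 0;
		if (all)
			return (1);
	}
	return (0);
}
```

A module listed once with ``foo bar'' and once with ``foo INET'' is therefore included when foo and INET are both configured, but not when INET alone is.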

      If a file is specified as a device-driver, any special compilation options for device drivers will be invoked. On the VAX this results in the use of the -i option for the C optimizer. This is required when pointer references are made to memory locations in the VAX I/O address space.

      Two other optional keywords modify the usage of the file. Config understands that certain files are used especially for kernel profiling. These files are indicated in the files files with a profiling-routine keyword. For example, the current profiling subroutines are sequestered off in a separate file with the following entry:

sys/subr_mcount.c    optional    profiling-routine

The profiling-routine keyword forces config not to compile the source file with the -pg option.

      The second keyword which can be of use is the config-dependent keyword. This causes config to compile the indicated module with the global configuration parameters. This allows certain modules, such as machdep.c, to size system data structures based on the maximum number of users configured for the system.

6.3.  Adding device drivers to 4.4BSD

      The I/O system and config have been designed to easily allow new device support to be added. The system source directories are organized as follows:

/sys/h         machine-independent include files
/sys/sys       machine-independent system source files
/sys/conf      site configuration files and basic templates
/sys/net       network-protocol-independent, but network-related code
/sys/netinet   DARPA Internet code
/sys/netimp    IMP support code
/sys/netns     Xerox NS code
/sys/vax       VAX-specific mainline code
/sys/vaxif     VAX network interface code
/sys/vaxmba    VAX MASSBUS device drivers and related code
/sys/vaxuba    VAX UNIBUS device drivers and related code

      Existing block and character device drivers for the VAX reside in ``/sys/vax'', ``/sys/vaxmba'', and ``/sys/vaxuba''. Network interface drivers reside in ``/sys/vaxif''. Any new device drivers should be placed in the appropriate source code directory and named so as not to conflict with existing devices. Normally, definitions for things like device registers are placed in a separate file in the same directory. For example, the ``dh'' device driver is named ``dh.c'' and its associated include file is named ``dhreg.h''.

      Once the source for the device driver has been placed in a directory, the file ``/sys/conf/files.machine'', and possibly ``/sys/conf/devices.machine'' should be modified. The files files in the conf directory contain a line for each C source or binary-only file in the system. Those files which are machine independent are located in ``/sys/conf/files,'' while machine specific files are in ``/sys/conf/files.machine.'' The ``devices.machine'' file is used to map device names to major block device numbers. If the device driver being added provides support for a new disk you will want to modify this file (the format is obvious).

      In addition to including the driver in the files file, it must also be added to the device configuration tables. These are located in ``/sys/vax/conf.c'', or similar for machines other than the VAX. If you don't understand what to add to this file, you should study an entry for an existing driver. Remember that the position in the device table specifies the major device number. The block major number is needed in the ``devices.machine'' file if the device is a disk.
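
      The relationship between table position and major device number can be sketched as below. The structure is a much-simplified, hypothetical stand-in for a real device switch entry, and the ordering of the names is invented for the example; consult the actual conf.c for the real table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for a block device switch entry. */
struct devsw_entry {
	const char *name;
};

/* The array index of each entry is its major device number. */
static const struct devsw_entry example_bdevsw[] = {
	{ "hp" },	/* major 0 (illustrative ordering) */
	{ "up" },	/* major 1 */
	{ "rk" },	/* major 2 */
	{ "foo" },	/* major 3: hypothetical new disk driver */
};

/* Return the major number for a driver name, or -1 if it is not in the table. */
int
major_of(const char *name)
{
	size_t i, n = sizeof(example_bdevsw) / sizeof(example_bdevsw[0]);

	for (i = 0; i < n; i++)
		if (strcmp(example_bdevsw[i].name, name) == 0)
			return ((int)i);
	return (-1);
}
```

Appending a new entry at the end, as done for the hypothetical ``foo'' driver, leaves the major numbers of existing devices unchanged.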

      With the configuration information in place, your configuration file appropriately modified, and a system reconfigured and rebooted you should incorporate the shell commands needed to install the special files in the file system to the file ``/dev/MAKEDEV'' or ``/dev/MAKEDEV.local''. This is discussed in the document ``Installing and Operating 4.4BSD''.

APPENDIX A. CONFIGURATION FILE GRAMMAR

      The following grammar is a compressed form of the actual yacc(1) grammar used by config to parse configuration files. Terminal symbols are shown all in upper case, literals are emboldened; optional clauses are enclosed in brackets, ``['' and ``]''; zero or more instantiations are denoted with ``*''.

Configuration ::=  [ Spec ; ]*

Spec ::= Config_spec
	| Device_spec
	| trace
	| /* lambda */

/* configuration specifications */

Config_spec ::=  machine ID
	| cpu ID
	| options Opt_list
	| ident ID
	| System_spec
	| timezone [ - ] NUMBER [ dst [ NUMBER ] ]
	| timezone [ - ] FPNUMBER [ dst [ NUMBER ] ]
	| maxusers NUMBER

/* system configuration specifications */

System_spec ::= config ID System_parameter [ System_parameter ]*

System_parameter ::=  swap_spec | root_spec | dump_spec | arg_spec

swap_spec ::=  swap [ on ] swap_dev [ and swap_dev ]*

swap_dev ::=  dev_spec [ size NUMBER ]

root_spec ::=  root [ on ] dev_spec

dump_spec ::=  dumps [ on ] dev_spec

arg_spec ::=  args [ on ] dev_spec

dev_spec ::=  dev_name | major_minor

major_minor ::=  major NUMBER minor NUMBER

dev_name ::=  ID [ NUMBER [ ID ] ]

/* option specifications */

Opt_list ::=  Option [ , Option ]*

Option ::=  ID [ = Opt_value ]

Opt_value ::=  ID | NUMBER

Mkopt_list ::=  Mkoption [ , Mkoption ]*

Mkoption ::=  ID = Opt_value

/* device specifications */

Device_spec ::= device Dev_name Dev_info Int_spec
	| master Dev_name Dev_info
	| disk Dev_name Dev_info
	| tape Dev_name Dev_info
	| controller Dev_name Dev_info [ Int_spec ]
	| pseudo-device Dev [ NUMBER ]

Dev_name ::=  Dev NUMBER

Dev ::=  uba | mba | ID

Dev_info ::=  Con_info [ Info ]*

Con_info ::=  at Dev NUMBER
	| at nexus NUMBER

Info ::=  csr NUMBER
	| drive NUMBER
	| slave NUMBER
	| flags NUMBER

Int_spec ::=  vector ID [ ID ]*
	| priority NUMBER

Lexical Conventions

The terminal symbols are loosely defined as:

ID

One or more alphabetics, either upper or lower case, and underscore, ``_''.
NUMBER

Approximately the C language specification for an integer number. That is, a leading ``0x'' indicates a hexadecimal value, a leading ``0'' indicates an octal value, otherwise the number is expected to be a decimal value. Hexadecimal numbers may use either upper or lower case alphabetics.
FPNUMBER

A floating point number without exponent. That is, a number of the form ``nnn.ddd'', where the fractional component is optional.

In special instances a question mark, ``?'', can be substituted for a ``NUMBER'' token. This is used to effect wildcarding in device interconnection specifications.
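
The NUMBER convention matches what the standard C routine strtol does when given a base of 0, so it can be illustrated with a short sketch. This is not config's scanner; parse_number() and its interface are hypothetical, and strtol is slightly more permissive than the grammar (it accepts leading white space and a sign).

```c
#include <stdlib.h>

/*
 * Parse a NUMBER token. Returns 0 on success and -1 if the text is
 * not a complete number. A lone "?" sets *wildcard and leaves *value
 * untouched.
 */
int
parse_number(const char *text, long *value, int *wildcard)
{
	char *end;

	*wildcard = 0;
	if (text[0] == '?' && text[1] == '\0') {
		*wildcard = 1;
		return (0);
	}
	/* Base 0: leading "0x" means hexadecimal, leading "0" octal, else decimal. */
	*value = strtol(text, &end, 0);
	if (end == text || *end != '\0')
		return (-1);
	return (0);
}
```

With this routine ``0x1f'' yields 31, ``017'' yields 15, and ``42'' yields 42.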

Comments in configuration files are indicated by a ``#'' character at the beginning of the line; the remainder of the line is discarded.

A specification is interpreted as a continuation of the previous line if the first character of the line is tab.

APPENDIX B. RULES FOR DEFAULTING SYSTEM DEVICES

      When config processes a ``config'' rule which does not fully specify the location of the root file system, paging area(s), device for system dumps, and device for argument list processing, it applies a set of rules to define those values left unspecified. The following rules are used in defaulting system devices.

1)  If a root device is not specified, the swap specification must indicate a ``generic'' system is to be built.
2)  If the root device does not specify a unit number, it defaults to unit 0.
3)  If the root device does not include a partition specification, it defaults to the ``a'' partition.
4)  If no swap area is specified, it defaults to the ``b'' partition of the root device.
5)  If no device is specified for processing argument lists, the first swap partition is selected.
6)  If no device is chosen for system dumps, the first swap partition is selected (see below to find out where dumps are placed within the partition).

      The following table summarizes the default partitions selected when a device specification is incomplete, e.g. ``hp0''.

Type    Partition
------------------
root    ``a''
swap    ``b''
args    ``b''
dumps   ``b''
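
      The partition defaults in the table can be sketched in C. The types and the function name are hypothetical. Applying the unit 0 default to every kind of device is an assumption; the rules above state it only for the root device.

```c
enum dev_use { USE_ROOT, USE_SWAP, USE_ARGS, USE_DUMPS };

/* Hypothetical representation of a possibly incomplete device specification. */
struct dev_spec {
	const char *name;	/* device name, e.g. "hp" */
	int unit;		/* -1 if no unit number was given */
	char partition;		/* '\0' if no partition was given */
};

/* Fill in whatever the specification left out. */
void
apply_defaults(struct dev_spec *spec, enum dev_use use)
{
	if (spec->unit < 0)
		spec->unit = 0;		/* assumed for all uses, stated for root */
	if (spec->partition == '\0')
		spec->partition = (use == USE_ROOT) ? 'a' : 'b';
}
```

Under these assumptions ``root on hp'' becomes hp0a, and an incomplete ``dumps on hp0'' becomes hp0b.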

Multiple swap/paging areas

      When multiple swap partitions are specified, the system treats the first specified as a ``primary'' swap area which is always used. The remaining partitions are then interleaved into the paging system at the time a swapon(2) system call is made. This is normally done at boot time with a call to swapon(8) from the /etc/rc file.

System dumps

      System dumps are automatically taken after a system crash, provided the device driver for the ``dumps'' device supports this. The dump contains the contents of memory, but not the swap areas. Normally the dump device is a disk, in which case the information is copied to a location at the back of the partition. The dump is placed at the back of the partition because the primary swap and dump device are commonly the same device, and this allows the system to be rebooted without immediately overwriting the saved information. When a dump has occurred, the system variable dumpsize is set to a non-zero value indicating the size (in bytes) of the dump. The savecore(8) program then copies the information from the dump partition to a file in a ``crash'' directory and also makes a copy of the system which was running at the time of the crash (usually ``/kernel'').

      The offset to the system dump is defined in the system variable dumplo (a sector offset from the front of the dump partition). The savecore program operates by reading the contents of dumplo, dumpdev, and dumpmagic from /dev/kmem, then comparing the value of dumpmagic read from /dev/kmem to that located in the corresponding location in the dump area of the dump partition. If a match is found, savecore assumes a crash occurred and reads dumpsize from the dump area of the dump partition. This value is then used in copying the system dump. Refer to savecore(8) for more information about its operation.

      The value dumplo is calculated to be

dumpdev-size - memsize
where dumpdev-size is the size of the disk partition where system dumps are to be placed, and memsize is the size of physical memory. If the disk partition is not large enough to hold a full dump, dumplo is set to 0 (the start of the partition).

APPENDIX C. SAMPLE CONFIGURATION FILES

      The following configuration files are developed in section 5; they are included here for completeness.

#
# ANSEL VAX (a picture perfect machine)
#
machine	vax
cpu	VAX780
timezone	8 dst
ident	ANSEL
maxusers	40

config	kernel	root on hp0
config	hpkernel	root on hp0 swap on hp0 and hp2
config	genkernel	swap generic

controller	mba0	at nexus ?
disk	hp0	at mba? drive ?
disk	hp1	at mba? drive ?
controller	mba1	at nexus ?
disk	hp2	at mba? drive ?
disk	hp3	at mba? drive ?
controller	uba0	at nexus ?
controller	tm0	at uba? csr 0172520	vector tmintr
tape	te0	at tm0 drive 0
tape	te1	at tm0 drive 1
device	dh0	at uba? csr 0160020	vector dhrint dhxint
device	dm0	at uba? csr 0170500	vector dmintr
device	dh1	at uba? csr 0160040	vector dhrint dhxint
device	dh2	at uba? csr 0160060	vector dhrint dhxint
#
# UCBVAX - Gateway to the world
#
machine	vax
cpu	"VAX780"
cpu	"VAX750"
ident	UCBVAX
timezone	8 dst
maxusers	32
options	INET
options	NS

config	kernel	root on hp swap on hp and rk0 and rk1
config	upkernel	root on up
config	hkkernel	root on hk swap on rk0 and rk1

controller	mba0	at nexus ?
controller	uba0	at nexus ?
disk	hp0	at mba? drive 0
disk	hp1	at mba? drive 1
controller	sc0	at uba? csr 0176700	vector upintr
disk	up0	at sc0 drive 0
disk	up1	at sc0 drive 1
controller	hk0	at uba? csr 0177440	vector rkintr
disk	rk0	at hk0 drive 0
disk	rk1	at hk0 drive 1
pseudo-device	pty
pseudo-device	loop
pseudo-device	imp
device	acc0	at uba? csr 0167600	vector accrint accxint
pseudo-device	ether
device	ec0	at uba? csr 0164330	vector ecrint eccollide ecxint
device	il0	at uba? csr 0164000	vector ilrint ilcint

APPENDIX D. VAX KERNEL DATA STRUCTURE SIZING RULES

      Certain system data structures are sized at compile time according to the maximum number of simultaneous users expected, while others are calculated at boot time based on the physical resources present, e.g. memory. This appendix lists both sets of rules and also includes some hints on changing built-in limitations on certain data structures.

Compile time rules

      The file /sys/conf/param.c contains the definitions of almost all data structures sized at compile time. This file is copied into the directory of each configured system to allow configuration-dependent rules and values to be maintained. (Each copy normally depends on the copy in /sys/conf, and global modifications cause the file to be recopied unless the makefile is modified.) The rules implied by its contents are summarized below (here MAXUSERS refers to the value defined in the configuration file in the ``maxusers'' rule). Most limits are computed at compile time and stored in global variables for use by other modules; they may generally be patched in the system binary image before rebooting to test new values.

nproc

The maximum number of processes which may be running at any time. It is referred to in other calculations as NPROC and is defined to be
20 + 8 * MAXUSERS
ntext

The maximum number of active shared text segments. The constant is intended to allow for network servers and common commands that remain in the table. It is defined as
36 + MAXUSERS.
ninode

The maximum number of files in the file system which may be active at any time. This includes files in use by users, as well as directory files being read or written by the system and files associated with bound sockets in the UNIX IPC domain. It is defined as
(NPROC + 16 + MAXUSERS) + 32
nfile

The number of ``file table'' structures. One file table structure is used for each open, unshared, file descriptor. Multiple file descriptors may reference a single file table entry when they are created through a dup call, or as the result of a fork. This is defined to be
16 * (NPROC + 16 + MAXUSERS) / 10 + 32
ncallout

The number of ``callout'' structures. One callout structure is used per internal system event handled with a timeout. Timeouts are used for terminal delays, watchdog routines in device drivers, protocol timeout processing, etc. This is defined as
16 + NPROC
nclist

The number of ``c-list'' structures. C-list structures are used in terminal I/O, and currently each holds 60 characters. Their number is defined as
60 + 12 * MAXUSERS
nmbclusters

The maximum number of pages which may be allocated by the network. This is defined as 256 (a quarter megabyte of memory) in /sys/h/mbuf.h. In practice, the network rarely uses this much memory. It starts off by allocating 8 kilobytes of memory, then requesting more as required. This value represents an upper bound.
nquota

The number of ``quota'' structures allocated. Quota structures are present only when disc quotas are configured in the system. One quota structure is kept per user. This is defined to be
(MAXUSERS * 9) / 7 + 3
ndquot

The number of ``dquot'' structures allocated. Dquot structures are present only when disc quotas are configured in the system. One dquot structure is required per user, per active file system quota. That is, when a user manipulates a file on a file system on which quotas are enabled, the information regarding the user's quotas on that file system must be in-core. This information is cached, so that not all information must be present in-core all the time. This is defined as
NINODE + (MAXUSERS * NMOUNT) / 4
where NMOUNT is the maximum number of mountable file systems.
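
The compile-time rules above can be collected into one C sketch. The structure and function name are hypothetical; the expressions are those given in the list, evaluated with C integer arithmetic, and NMOUNT is taken to be 20, the default given later in this appendix.

```c
#define NMOUNT	20	/* default number of mountable file systems */

/* Hypothetical container for the values derived from MAXUSERS. */
struct sizes {
	int nproc, ntext, ninode, nfile, ncallout, nclist, nquota, ndquot;
};

struct sizes
compile_time_sizes(int maxusers)
{
	struct sizes s;

	s.nproc = 20 + 8 * maxusers;
	s.ntext = 36 + maxusers;
	s.ninode = (s.nproc + 16 + maxusers) + 32;
	s.nfile = 16 * (s.nproc + 16 + maxusers) / 10 + 32;
	s.ncallout = 16 + s.nproc;
	s.nclist = 60 + 12 * maxusers;
	s.nquota = (maxusers * 9) / 7 + 3;
	s.ndquot = s.ninode + (maxusers * NMOUNT) / 4;
	return (s);
}
```

For a ``maxusers 40'' system this gives nproc 340, ntext 76, ninode 428, nfile 665, ncallout 356, nclist 540, nquota 54, and ndquot 628.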

In addition to the above values, the system page tables (used to map virtual memory in the kernel's address space) are sized at compile time by the SYSPTSIZE definition in the file /sys/vax/vmparam.h. This is defined to be

20 + MAXUSERS
pages of page tables. Its definition affects the size of many data structures allocated at boot time because it constrains the amount of virtual memory which may be addressed by the running system. This is often the limiting factor in the size of the buffer cache, in which case a message is printed when the system configures at boot time.

Run-time calculations

      The most important data structures sized at run-time are those used in the buffer cache. Allocation is done by allocating physical memory (and system virtual memory) immediately after the system has been started up; look in the file /sys/vax/machdep.c. The amount of physical memory which may be allocated to the buffer cache is constrained by the size of the system page tables, among other things. While the system may calculate a large amount of memory to be allocated to the buffer cache, if the system page table is too small to map this physical memory into the virtual address space of the system, only as much as can be mapped will be used.

      The buffer cache is composed of a number of ``buffer headers'' and a pool of pages attached to these headers. Buffer headers are divided into two categories: those used for swapping and paging, and those used for normal file I/O. The system tries to allocate 10% of the first two megabytes and 5% of the remaining available physical memory for the buffer cache (where available does not count that space occupied by the system's text and data segments). If this results in fewer than 16 pages of memory allocated, then 16 pages are allocated. This value is kept in the initialized variable bufpages so that it may be patched in the binary image (to allow tuning without recompiling the system), or the default may be overridden with a configuration-file option. For example, the option options BUFPAGES="3200" causes 3200 pages (3.2M bytes) to be used by the buffer cache.

      Enough file I/O buffer headers are then allocated to allow each to hold 2 pages. Each buffer maps 8K bytes. If the number of buffer pages is larger than can be mapped by the buffer headers, the number of pages is reduced. The number of buffer headers allocated is stored in the global variable nbuf, which may be patched before the system is booted. The system option options NBUF="1000" forces the allocation of 1000 buffer headers. Half as many swap I/O buffer headers as file I/O buffers are allocated, but no more than 256.
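
      The default sizing described above can be sketched as follows. This is an illustration of the stated rule, not the code in machdep.c: the function names are hypothetical, pages are assumed to be 1K bytes (as the 3200-page example implies), and the rounding behavior of the integer divisions is an assumption.

```c
#define PAGES_PER_MB	1024	/* assumes 1K-byte pages */

/* 10% of the first two megabytes plus 5% of the rest, but at least 16 pages. */
int
default_bufpages(int avail_pages)
{
	int first, rest, pages;

	first = avail_pages < 2 * PAGES_PER_MB ? avail_pages : 2 * PAGES_PER_MB;
	rest = avail_pages - first;
	pages = first / 10 + rest / 20;
	return (pages < 16 ? 16 : pages);
}

/* Enough file I/O buffer headers for each to hold 2 pages. */
int
default_nbuf(int bufpages)
{
	return (bufpages / 2);
}

/* Half as many swap I/O headers as file I/O headers, but no more than 256. */
int
default_nswbuf(int nbuf)
{
	int n = nbuf / 2;

	return (n > 256 ? 256 : n);
}
```

With 8 megabytes of available memory (8192 pages) the sketch yields 511 buffer pages, 255 file I/O headers, and 127 swap I/O headers.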

System size limitations

      As distributed, the sum of the virtual sizes of the core-resident processes is limited to 256M bytes. The size of the text segment of a single process is currently limited to 6M bytes. It may be increased to no greater than the data segment size limit (see below) by redefining MAXTSIZ. This may be done with a configuration file option, e.g. options MAXTSIZ="(10*1024*1024)" to set the limit to 10M bytes. Other per-process limits discussed here may be changed with similar options with names given in parentheses. Soft, user-changeable limits are set to 512K bytes for stack (DFLSSIZ) and 6M bytes for the data segment (DFLDSIZ) by default; these may be increased up to the hard limit with the setrlimit(2) system call.

      The data and stack segment size hard limits are set by a system configuration option to one of 17M, 33M or 64M bytes. One of these sizes is chosen based on the definition of MAXDSIZ; with no option, the limit is 17M bytes; with an option options MAXDSIZ="(32*1024*1024)" (or any value between 17M and 33M), the limit is increased to 33M bytes, and values larger than 33M result in a limit of 64M bytes. You must be careful in doing this that you have adequate paging space. As normally configured, the system has 16M or 32M bytes per paging area, depending on disk size. The best way to get more space is to provide multiple, thereby interleaved, paging areas. Increasing the virtual memory limits results in interleaving of swap space in larger sections (from 500K bytes to 1M or 2M bytes).
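
      The way a MAXDSIZ value selects one of the three hard limits can be sketched as below. The function is hypothetical, an argument of 0 stands for ``no option given'', and the treatment of values exactly at 17M and 33M is an assumption, since the text does not say on which side the boundaries fall.

```c
#define MB	(1024L * 1024L)

/* Map a requested MAXDSIZ (in bytes, 0 if unset) to the resulting hard limit. */
long
data_hard_limit(long maxdsiz)
{
	if (maxdsiz <= 17 * MB)
		return (17 * MB);
	if (maxdsiz <= 33 * MB)
		return (33 * MB);
	return (64 * MB);
}
```

For instance, the value (32*1024*1024) from the example above selects the 33M byte limit.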

      By default, the virtual memory system allocates enough memory for system page tables mapping user page tables to allow 256 megabytes of simultaneous active virtual memory. That is, the sum of the virtual memory sizes of all (completely- or partially-) resident processes can not exceed this limit. If the limit is exceeded, some process(es) must be swapped out. To increase the amount of resident virtual space possible, you can alter the constant USRPTSIZE (in /sys/vax/vmparam.h). Each page of system page tables allows 8 megabytes of user virtual memory.
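
      As a worked example of the last figure, the number of pages of system page tables needed for a given amount of resident user virtual memory can be estimated as below. The function is hypothetical, and how its result relates to the actual definition of USRPTSIZE is not stated here; check vmparam.h before changing anything.

```c
/* Each page of system page tables maps 8 megabytes; round up. */
int
usrpt_pages_for(int resident_mb)
{
	return ((resident_mb + 7) / 8);
}
```

The default of 256 megabytes corresponds to 32 such pages.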

      Because the file system block numbers are stored in page table pg_blkno entries, the maximum size of a file system is limited to 2^24 1024-byte blocks. Thus no file system can be larger than 8 gigabytes.

      The number of mountable file systems is set at 20 by the definition of NMOUNT in /sys/h/param.h. This should be sufficient; if not, the value can be increased up to 255. If you have many disks, it makes sense to make some of them single file systems, and the paging areas don't count in this total.

      The limit to the number of files that a process may have open simultaneously is set to 64. This limit is set by the NOFILE definition in /sys/h/param.h. It may be increased arbitrarily, with the caveat that the user structure expands by 5 bytes for each file, and thus UPAGES (/sys/vax/machparam.h) must be increased accordingly.

      The amount of physical memory is currently limited to 64 Mb by the size of the index fields in the core-map (/sys/h/cmap.h). The limit may be increased by following instructions in that file to enlarge those fields.

APPENDIX E. NETWORK CONFIGURATION OPTIONS

      The network support in the kernel is self-configuring according to the protocol support options (INET and NS) and the network hardware discovered during autoconfiguration. There are several changes that may be made to customize network behavior due to local restrictions. Within the Internet protocol routines, the following options set in the system configuration file are supported:

GATEWAY

The machine is to be used as a gateway. This option currently makes only minor changes. First, the size of the network routing hash table is increased. Secondly, machines that have only a single hardware network interface will not forward IP packets; without this option, they will also refrain from sending any error indication to the source of unforwardable packets. Gateways with only a single interface are assumed to have missing or broken interfaces, and will return ICMP unreachable errors to hosts sending them packets to be forwarded.
TCP_COMPAT_42

This option forces the system to limit its initial TCP sequence numbers to positive numbers. Without this option, 4.4BSD systems may have problems with TCP connections to 4.2BSD systems that connect but never transfer data. The problem is a bug in the 4.2BSD TCP.
IPFORWARDING

Normally, 4.4BSD machines with multiple network interfaces will forward IP packets received that should be resent to another host. If the line ``options IPFORWARDING="0"'' is in the system configuration file, IP packet forwarding will be disabled.
IPSENDREDIRECTS

When forwarding IP packets, 4.4BSD IP will note when a packet is forwarded using the same interface on which it arrived. When this is noted, if the source machine is on the directly-attached network, an ICMP redirect is sent to the source host. If the packet was forwarded using a route to a host or to a subnet, a host redirect is sent, otherwise a network redirect is sent. The generation of redirects may be inhibited with the configuration option ``options IPSENDREDIRECTS="0".''
SUBNETSARELOCAL

TCP calculates a maximum segment size to use for each connection, and sends no datagrams larger than that size. This size will be no larger than that supported on the outgoing interface. Furthermore, if the destination is not on the local network, the size will be no larger than 576 bytes. For this test, other subnets of a directly-connected subnetted network are considered to be local unless the line ``options SUBNETSARELOCAL="0"'' is used in the system configuration file.
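
The size limit and the locality test described under SUBNETSARELOCAL can be sketched as follows. Both functions and their parameters are hypothetical; the real TCP code also accounts for protocol headers, which this sketch does not model.

```c
/*
 * Decide whether a destination counts as local. same_net is true when
 * the destination is on a directly-connected network; same_subnet is
 * true when it is also on the same subnet of that network.
 */
int
dest_is_local(int same_net, int same_subnet, int subnetsarelocal)
{
	if (!same_net)
		return (0);
	return (same_subnet || subnetsarelocal);
}

/* Clamp the interface's maximum size to 576 bytes for non-local destinations. */
int
max_segment_size(int if_max, int is_local)
{
	if (!is_local && if_max > 576)
		return (576);
	return (if_max);
}
```

With SUBNETSARELOCAL set to 0, a host on another subnet of the same network is treated as non-local and the 576 byte ceiling applies to it.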

The following options are supported by the Xerox NS protocols:

NSIP

This option allows NS IDP datagrams to be encapsulated in Internet IP packets for transmission to a collaborating NSIP host. This may be used to pass IDP packets through IP-only link layer networks. See nsip(4P) for details.
THREEWAYSHAKE

The NS Sequenced Packet Protocol does not require a three-way handshake before considering a connection to be in the established state. (A three-way handshake consists of a connection request, an acknowledgement of the request along with a symmetrical opening indication, and then an acknowledgement of the reciprocal opening packet.) This option forces a three-way handshake before data may be transmitted on Sequenced Packet sockets.