6.  Usage Statistics

      The following usage statistics were collected on two DEC VAX-11/780 computers of the Purdue Computer Science Department. Both machines are mainly used for research purposes. Thus, the data reflect an environment in which the majority of projects involve prototyping and advanced software development, but relatively little long-term maintenance.

      For the first experiment, the ci and co operations were instrumented to log the number of backward and forward deltas applied. The data were collected during a 13-month period from Dec. 1982 to Dec. 1983. Table I summarizes the results.

+----------+-------------+--------------+-------------+---------------+------------+
|Operation |    Total    | Total deltas | Mean deltas |  Operations   |   Branch   |
|          | operations  |   applied    |   applied   | with >1 delta | operations |
+----------+-------------+--------------+-------------+---------------+------------+
|co        |     7867    |     9320     |    1.18     |  509    (6%)  | 203   (3%) |
|ci        |     3468    |     2207     |    0.64     |   85    (2%)  |  75   (2%) |
|ci & co   |    11335    |    11527     |    1.02     |  594    (5%)  | 278   (2%) |
+----------+-------------+--------------+-------------+---------------+------------+
Table I. Statistics for co and ci operations.

      The first two rows show statistics for check-out and check-in; the third row shows their combination. Recall that ci performs an implicit check-out to obtain a revision for computing the delta. In all measures presented, the most recent revision (stored intact) counts as one delta. The number of deltas applied thus equals the number of passes over the text, where the first `pass' is a copying step.
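      To make the counting rule concrete, here is a minimal Python sketch of a trunk-only store that keeps the newest revision intact and older revisions as reverse deltas. It is an illustrative model, not the RCS implementation: the names `ReverseDeltaFile`, `make_delta` and `apply_delta` are hypothetical, deltas are computed with the standard library's `difflib` rather than RCS's own edit scripts, and branches (which RCS stores as forward deltas) are ignored. Checking out revision r of an n-revision trunk takes 1 + (n - r) passes: one copy of the intact tip plus one pass per reverse delta.

```python
import difflib
from typing import List, Tuple

# An edit op (start, end, replacement) replaces lines[start:end] with replacement.
Delta = List[Tuple[int, int, List[str]]]


def make_delta(newer: List[str], older: List[str]) -> Delta:
    """Compute a reverse delta that turns `newer` back into `older`."""
    matcher = difflib.SequenceMatcher(a=newer, b=older, autojunk=False)
    return [(i1, i2, older[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(lines: List[str], delta: Delta) -> List[str]:
    """Apply one delta in a single pass over `lines`."""
    out: List[str] = []
    pos = 0
    for start, end, replacement in delta:
        out.extend(lines[pos:start])  # copy the unchanged run
        out.extend(replacement)       # emit the edited region
        pos = end
    out.extend(lines[pos:])
    return out


class ReverseDeltaFile:
    """Hypothetical trunk-only revision store using reverse deltas."""

    def __init__(self, initial: List[str]):
        self.tip = list(initial)       # newest revision, stored intact
        self.deltas: List[Delta] = []  # deltas[k] turns rev k+2 into rev k+1

    @property
    def head(self) -> int:
        return len(self.deltas) + 1

    def checkin(self, new: List[str]) -> None:
        # The old tip survives only as a reverse delta from the new tip.
        self.deltas.append(make_delta(new, self.tip))
        self.tip = list(new)

    def checkout(self, rev: int) -> Tuple[List[str], int]:
        """Return revision `rev` (1-based) and the number of passes used."""
        if not 1 <= rev <= self.head:
            raise ValueError(f"no revision {rev}")
        lines = list(self.tip)  # pass 1: copy the intact newest revision
        passes = 1
        for k in range(self.head - 2, rev - 2, -1):
            lines = apply_delta(lines, self.deltas[k])  # one pass per delta
            passes += 1
        return lines, passes


store = ReverseDeltaFile(["a", "b"])
store.checkin(["a", "B", "c"])
store.checkin(["a", "B", "c", "d"])
print(store.checkout(3))  # (['a', 'B', 'c', 'd'], 1)
print(store.checkout(1))  # (['a', 'b'], 3)
```

      In this model the newest revision costs a single copying pass, which is why the statistics below are dominated by one-delta operations.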

      Note that the check-out operation is executed more than twice as frequently as the check-in operation. The fourth column gives the mean number of deltas applied in all three cases. For ci, the mean number of deltas applied is less than one, for two reasons: the initial check-in requires no delta at all, and ci applies more than one delta only when it operates on a branch. Column 5 shows the actual number of operations that applied more than one delta. The last column indicates that branches were not used often.
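      The derived columns of Table I follow directly from the raw counts. The short Python check below recomputes them; `table_i` and `derived` are illustrative names, and percentages are rounded to whole numbers as in the table.

```python
# Raw counts from Table I, per operation:
# (total operations, total deltas applied, operations with >1 delta, branch operations)
table_i = {
    "co": (7867, 9320, 509, 203),
    "ci": (3468, 2207, 85, 75),
}


def derived(ops, deltas, multi, branch):
    """Mean deltas per operation, plus the two percentage columns (whole numbers)."""
    return (round(deltas / ops, 2),
            round(100 * multi / ops),
            round(100 * branch / ops))


# The combined row is the column-wise sum of the co and ci rows.
table_i["ci & co"] = tuple(sum(col) for col in zip(*table_i.values()))

for op, row in table_i.items():
    mean, pct_multi, pct_branch = derived(*row)
    print(f"{op:8} mean {mean:.2f}, >1 delta {pct_multi}%, branch {pct_branch}%")

# co runs about 2.27 times as often as ci ("more than twice as frequently").
print(f"co/ci ratio: {7867 / 3468:.2f}")
```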

      The last three columns demonstrate that the most recent trunk revision is by far the most frequently accessed. For RCS, check-out of this revision is a simple copy operation, which is the absolute minimum given the copy semantics of co. Access to older revisions and branches is more common in non-academic environments, yet even if access to older deltas were an order of magnitude more frequent, the combined average number of deltas applied would still be below 1.2: the combined mean of 1.02 exceeds one by less than 0.02, so a tenfold increase adds less than 0.2. Since RCS is faster than SCCS for up to 10 delta applications, reverse deltas are clearly the method of choice.
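      The 1.2 bound can be reproduced from the combined row of Table I. This is a minimal sketch under one stated assumption: that the extrapolation simply scales the combined mean's excess over one delta by a factor of ten. It is a simplification, since initial check-ins, which apply no delta, pull the combined mean down slightly.

```python
total_ops, total_deltas = 11335, 11527  # combined "ci & co" row of Table I

mean = total_deltas / total_ops  # about 1.017 deltas per operation
excess = mean - 1.0              # work spent beyond one delta per operation
scaled = 1.0 + 10 * excess       # assumption: older-delta access ten times as frequent

print(f"current mean {mean:.3f}, scaled mean {scaled:.3f}")
assert scaled < 1.2
```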

      The second experiment, conducted in March of 1984, involved surveying the existing RCS files on our two machines. The goal was to determine the mean number of revisions per RCS file, as well as the space those files consume. Table II shows the results. (Tables I and II were produced at different times and are unrelated.)

+------------+-----------+-----------+-----------+--------------+--------------+----------+
|            | Total RCS |   Total   |   Mean    | Mean size of | Mean size of | Overhead |
|            |   files   | revisions | revisions |  RCS files   |    latest    |  ratio   |
|            |           |           |           |   (bytes)    |   revision   |          |
|            |           |           |           |              |   (bytes)    |          |
+------------+-----------+-----------+-----------+--------------+--------------+----------+
|All files   |   8033    |   11133   |   1.39    |     6156     |     5585     |   1.10   |
|Files with  |   1477    |    4578   |   3.10    |     8074     |     6041     |   1.34   |
|>= 2        |           |           |           |              |              |          |
|revisions   |           |           |           |              |              |          |
+------------+-----------+-----------+-----------+--------------+--------------+----------+
Table II. Statistics for RCS files.

      The mean number of revisions per RCS file is 1.39. Columns 5 and 6 show the mean sizes (in bytes) of an RCS file and of the latest revision of each RCS file, respectively. The `overhead' column contains the ratio of the mean sizes. Assuming that all revisions in an RCS file are approximately the same size, this ratio gives a measure of the space consumed by the extra revisions.
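      The mean and overhead columns of Table II can be recomputed from the raw figures. In this sketch (`table_ii` and `mean_and_overhead` are illustrative names), the overhead is the mean RCS file size divided by the mean size of the latest revision; under the equal-size assumption, the amount by which it exceeds 1 is the space taken by all other revisions, measured in units of one revision.

```python
# Raw figures from Table II: (RCS files, total revisions,
# mean RCS file size in bytes, mean latest-revision size in bytes)
table_ii = {
    "all files": (8033, 11133, 6156, 5585),
    "files with >= 2 revisions": (1477, 4578, 8074, 6041),
}


def mean_and_overhead(files, revisions, file_size, latest_size):
    """Mean revisions per file, and the overhead ratio (column 5 / column 6)."""
    mean_revisions = revisions / files
    # Overhead: how many latest-revision sizes an RCS file occupies on average.
    overhead = file_size / latest_size
    return mean_revisions, overhead


for label, row in table_ii.items():
    mean_revisions, overhead = mean_and_overhead(*row)
    print(f"{label}: {mean_revisions:.2f} revisions, overhead {overhead:.2f}")
```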

      In our sample, over 80 per cent of the RCS files contained only a single revision, because our systems programmers routinely check in all source files on the distribution tapes, even though they may never touch them again. To get a better indication of how much space deltas save, we recomputed all measures using only the files that contained 2 or more revisions; only for those files is RCS necessary. As shown in the second row of Table II, the average number of revisions for those files is 3.10, with an overhead of 1.34. This means that the extra 2.10 deltas require 34 per cent extra space, or 16 per cent per extra revision.

      Rochkind [3] measured the space consumed by SCCS, and reported an average of 5 revisions per group and an overhead of 1.37 (or about 9 per cent per extra revision). In a later paper, Glasser [6] observed an average of 7 revisions per group in a single, large project, but provided no overhead figure. In his paper on DSEE [5], Leblang reported that delta storage combined with blank compression results in an overhead of a mere 1-2 per cent per revision. Since leading blanks accounted for about 20 per cent of the text of the surveyed Pascal programs, a revision group with 5-10 members was smaller than a single cleartext copy.
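      The per-revision figures follow from dividing the extra space (overhead minus 1) by the number of extra revisions (mean revisions minus 1), under the same assumption that all revisions are roughly the same size. A small Python sketch; `overhead_per_extra_revision` is an illustrative name, and the SCCS inputs are Rochkind's reported averages as quoted above.

```python
def overhead_per_extra_revision(overhead: float, mean_revisions: float) -> float:
    """Extra space per additional revision, as a fraction of one revision.

    Assumes all revisions are about the same size, so a file holding
    `mean_revisions` revisions costs `overhead` times one revision.
    """
    return (overhead - 1.0) / (mean_revisions - 1.0)


rcs = overhead_per_extra_revision(1.34, 3.10)  # Table II, files with >= 2 revisions
sccs = overhead_per_extra_revision(1.37, 5.0)  # Rochkind's SCCS averages
print(f"RCS:  {rcs:.0%} per extra revision")   # RCS:  16% per extra revision
print(f"SCCS: {sccs:.0%} per extra revision")  # SCCS: 9% per extra revision

single = (8033 - 1477) / 8033  # share of files holding exactly one revision
print(f"single-revision files: {single:.0%}")  # single-revision files: 82%
```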

      The above observations demonstrate clearly that the space needed for extra revisions is small. With delta storage, the luxury of keeping multiple revisions online is certainly affordable. In fact, introducing a system such as RCS may actually free a considerable amount of space: programmers often keep back-up copies anyway, and deltas store those copies much more efficiently than separate cleartext files.