|
---|
Table of Contents |
Introduction |
Report Contents |
Counting Methods |
Command line syntax |
Configuration |
Getting CCCC |
CCCC is a tool for the analysis of source code in various languages (primarily C++), which generates a report in HTML format on various measurements of the code processed. Although the tool was originally implemented to process C++ and ANSI C, facilities have recently been added to allow Java and Ada 95 source files to be recognized and processed as well. The name CCCC stands for 'C and C++ Code Counter'.
Measurements of source code of this kind are generally referred to as 'software metrics', or more precisely 'software product metrics' (as the term 'software metrics' also covers measurements of the software process, which are called 'software process metrics'). There is a reasonable consensus among modern opinion leaders in the software engineering field that measurement of some kind is probably a Good Thing, although there is less consensus on what is worth measuring and what the measurements mean.
CCCC has been developed as freeware, and is released in source code form, although precompiled binaries for Linux and Windows are included in the distribution for the convenience of users. Users are encouraged to compile the program themselves, and to modify the source to reflect their preferences and interests. The CCCC Reference Guide is intended to document the internals of the program to enable interested users to get started on hacking the source in this way.
The simplest way of using CCCC is just to run it with the names of a selection of files on the command line like this:
cccc my_types.h big.h small.h *.cc
CCCC will open each of the files specified on the command line (using standard wildcard processing where appropriate), and parse it using a parser selected to match the filename extension. As the parser processes each file, recognition of certain constructs will cause records to be written into an internal database. When all files have been processed, a report on the contents of the internal database will be generated in HTML format. By default the HTML report is written to the file cccc.htm in the current working directory, although the output filename is configurable.
The report contains a number of tables identifying the modules in the files submitted and covering:
The report generated by CCCC normally consists of six tables plus a table of contents at the beginning and some informational material about CCCC itself at the end.
Table name | Description |
---|---|
Project Summary | This table presents summary values of various measures over the body of source code submitted. |
Procedural Summary | This table presents values of procedural measures summed for each module identified in the code submitted. |
Procedural Details | This table presents values of the same procedural measures covered in the procedural summary report, but this time broken down within each module into the contributions of each member function of the module. |
Structural Summary | This table presents counts of fan-in and fan-out relationships to each module identified, and a derived metric called the Henry/Kafura/Shepperd measure, which is calculated as the square of the product of the fan-in and fan-out counts. |
Structural Details | This table presents lists of the modules contributing to the relationship counts reported in the structural summary. |
Rejected Extents | This table presents a list of code regions which the analyser was unable to parse. |
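The derived measure in the Structural Summary can be illustrated with a short sketch. The function name `hks` is hypothetical and is not part of CCCC; the code only applies the formula stated in the table, the square of the product of the fan-in and fan-out counts.

```python
def hks(fan_in: int, fan_out: int) -> int:
    """Square of the product of fan-in and fan-out, as stated in the table."""
    return (fan_in * fan_out) ** 2


# A module used by 3 other modules which itself uses 4 other modules:
print(hks(3, 4))  # (3 * 4) ** 2 = 144
```

Because the product is squared, a module with a zero count on either side scores zero, while a module with moderately high counts on both sides scores very high.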
Tag | Metric Name | Description |
---|---|---|
LOC | Lines of Code | This metric counts the lines of non-blank, non-comment source code in a function (LOCf), module (LOCm), or project (LOCp). LOC was one of the earliest metrics to come into use (principally because it is straightforward to measure). It has an obvious relation to the size or complexity of a piece of code, and can be calibrated for use in prediction of maintenance effort, although concern has been expressed that use of this metric as a measure of programmer productivity may tend to encourage verbose programming practices and discourage desirable simplification. |
MVG | McCabe's Cyclomatic Complexity | A measure of a body of code based on analysis of the cyclomatic complexity of the directed graph which represents the flow of control within each function. First proposed as a measure of the minimum number of test cases to ensure all parts of each function are exercised, it is now widely accepted as a measure for the detection of code which is likely to be error-prone and/or difficult to maintain. |
COM | Comment Lines | A crude measure comparable to LOC of the extent of commenting within a region of code. Not very meaningful in isolation, but sometimes used in ratio with LOC or MVG to ensure that comments are distributed proportionately to the bulk or complexity of a region of code. |
L_C,M_C | LOC/COM, MVG/COM | See above |
FO, FOc, FOv, FI, FIc, FIv | Fan-out, Fan-in | For a given module A, the fan-out is the number of other modules which the module A uses, while the fan-in is the number of other modules which use A. See the section below on counting methods for a discussion of the distinction between the variants on each of these measures. |
HKS, HKSv, HKSc | Henry-Kafura/Shepperd measure | This metric is derived by squaring the product of the fan-in and fan-out of each module. The original Henry-Kafura measure, which has been described as a measure of 'information flow complexity', includes a term for the length of the module under consideration, but CCCC uses the measure as modified by Shepperd, which omits this term on the basis that it debases the measure by combining two attributes which can and should be separately measured. Corresponding to the variants on the fan-in and fan-out measures described above, similar variants are calculated on this metric. |
NOM | Number of modules | Number of modules identified in the project. See discussion below about what constitutes a module. |
WMC | Weighted methods per class | This measure, proposed by Chidamber and Kemerer, is a count of the number of functions defined in a module multiplied by a weighting factor. The only weighting algorithm suggested in the original formulation is a uniform weighting of one unit per function. |
REJ | Rejected lines | This is a measure of the number of non-blank non-comment lines of code which were not successfully analysed by the parser. This is more of a validity check on the report generated than a metric of the code submitted: if the amount of code rejected is more than a small fraction (say 10%) of the total code processed, the meaningfulness of the numbers generated by the run must be in doubt. |
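The LOC and COM definitions in the table can be sketched as a simple line classifier. This is an illustration only, not CCCC's parser: the function name `count_loc_com` is hypothetical, the sketch handles only `//` and `/* ... */` comments, it ignores comment markers inside string literals, and it assumes that a line holding both code and a comment contributes to both counts.

```python
from typing import Tuple


def count_loc_com(source: str) -> Tuple[int, int]:
    """Return (LOC, COM) for C/C++-style source text (simplified)."""
    loc = com = 0
    in_block = False  # True while inside an unterminated /* ... */ comment
    for line in source.splitlines():
        has_code = has_comment = False
        i = 0
        while i < len(line):
            if in_block:
                has_comment = True
                end = line.find("*/", i)
                if end == -1:
                    break  # comment continues on the next line
                in_block = False
                i = end + 2
            elif line.startswith("//", i):
                has_comment = True
                break  # rest of the line is a comment
            elif line.startswith("/*", i):
                has_comment = True
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        loc += has_code
        com += has_comment
    return loc, com


sample = """\
// add two numbers
int add(int a, int b) {
    return a + b;  /* sum */
}

/* block
   comment */
"""
loc, com = count_loc_com(sample)
print(loc, com, loc / com)  # 3 4 0.75
```

The final figure printed is the L_C ratio (LOC/COM) for the sample.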
Note that the boolean operations introduce extra paths through the code because the second operand may or may not be evaluated according to the value of the first operand. Note also that the treatment of switch statements is problematic: it is quite common for multiple 'case' labels to be attached to the same block of code, so counting these might overstate the value. Counting the 'break' tokens instead is better so long as there are no case labels in the middle of the block of code which the break terminates. The motive for counting the 'switch' token is to provide for the default case, which gives rise to a path whether or not the programmer defines a default label. Counting the break token in this way may distort the count where it is used in other contexts (i.e. to exit from a block).
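A token-counting approach of the kind discussed above can be sketched as follows. The token set used here (`if`, `while`, `for`, `switch`, `break`, `&&`, `||`) is an assumption inferred from this discussion rather than a confirmed list, the function name `count_decision_tokens` is hypothetical, and the sketch does not skip tokens inside comments or string literals. It returns the raw token count only; how CCCC turns such counts into the reported MVG value is not specified here.

```python
import re

# Assumed token set: keywords matched as whole words, plus && and ||.
DECISION_TOKENS = re.compile(r"\b(?:if|while|for|switch|break)\b|&&|\|\|")


def count_decision_tokens(function_body: str) -> int:
    """Count decision-related tokens in a C/C++ function body (simplified)."""
    return len(DECISION_TOKENS.findall(function_body))


body = """
if (a && b) {
    switch (x) {
    case 1:
    case 2:
        f();
        break;
    case 3:
        g();
        break;
    }
}
"""
print(count_decision_tokens(body))  # if, &&, switch, break, break -> 5
```

In the sample, the two `case` labels sharing one block contribute a single `break`, which is the situation the paragraph above gives as the reason for counting `break` rather than `case`.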
While these relationships may seem unrelated to the invocation and module data counts, they are likely to show a strong correlation because in an object oriented environment, it is likely (but not inevitable) that the low-level use relationships of invocation and direct access to data structures require an object of the class of the supplier module to be available. This availability can be through instantiation of an instance of the supplier class within procedural code, but will often be due to the existence of one of the higher level relationships described above.
The counts of Fan-In and Fan-Out are regarded as a measure of the structural quality of a program, with high values of either (and particularly high values of both within the same module) indicating increased risk of changes required in one module requiring changes across other modules. CCCC chooses to define the relationship counts in such a way that each supplier or client module is counted only once, however many separate ways the relationship is detected. CCCC applies filtering to the relationships identified to distinguish between different kinds of uses which may carry with them different levels of structural risk. There are two filters: visibility and concreteness.
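The rule that each supplier or client is counted only once, however many times the relationship is detected, can be sketched with sets. The function name `fan_counts` and the `(client, supplier)` pair representation are hypothetical and do not reflect CCCC's internal database.

```python
from collections import defaultdict


def fan_counts(uses):
    """Map each module to (fan_in, fan_out) from (client, supplier) pairs.

    Repeated pairs are collapsed, so each relationship counts only once.
    """
    suppliers = defaultdict(set)  # client   -> modules it uses
    clients = defaultdict(set)    # supplier -> modules that use it
    for client, supplier in uses:
        if client == supplier:
            continue  # only relationships with *other* modules count
        suppliers[client].add(supplier)
        clients[supplier].add(client)
    modules = set(suppliers) | set(clients)
    return {m: (len(clients[m]), len(suppliers[m])) for m in modules}


# A uses B in two separate ways; the pair still counts once.
uses = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C")]
counts = fan_counts(uses)
print(sorted(counts.items()))
# [('A', (0, 2)), ('B', (1, 1)), ('C', (2, 0))]
```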
The visibility filter removes from consideration relationships which are known to be only accessible from the private interface of a module. Relationships which are defined in the visible part of the interface can be exploited by clients of the current module, thus forcing those clients also to be clients of the current module's supplier. Visible relationships also increase the range of operations available on an object, thus increasing the cognitive complexity of the interface from the point of view of a programmer required to use a module.
The concreteness filter removes from consideration relationships which do not create a dependency of the implementation of the client module on the implementation of the supplier class. Dependency-creating relationships increase risk because they may not be cyclical, and thus inhibit the creation of other relationships. They also inhibit the ability of modules to be built separately, requiring recompilation of the client module when the supplier changes. The test for this filter in C++ is whether a forward declaration of the supplier class is adequate to allow the client module definition to be compiled: containment and parameter passing where the client module is modified by a referential operator are allowed in this case, containment or passing by value or inheritance are all dependency-creating. In Java, relationships except inheritance are treated as non-dependency creating.
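The effect of the two filters on a fan-out count can be sketched as below. The `Relationship` record and its `visible` and `concrete` flags are hypothetical names chosen for illustration; how CCCC actually classifies a relationship is governed by the rules described above, not by this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    client: str
    supplier: str
    visible: bool   # defined in the visible part of the client's interface
    concrete: bool  # creates an implementation dependency on the supplier


def fan_out(module, relationships, visible_only=False, concrete_only=False):
    """Count distinct suppliers of `module` after applying the filters."""
    suppliers = {
        r.supplier
        for r in relationships
        if r.client == module
        and (r.visible or not visible_only)
        and (r.concrete or not concrete_only)
    }
    return len(suppliers)


rels = [
    Relationship("A", "B", visible=False, concrete=True),   # private member held by value
    Relationship("A", "C", visible=True, concrete=False),   # public parameter passed by reference
    Relationship("A", "D", visible=True, concrete=True),    # public inheritance
]
print(fan_out("A", rels))                      # FO  -> 3
print(fan_out("A", rels, visible_only=True))   # FOv -> 2 (C and D)
print(fan_out("A", rels, concrete_only=True))  # FOc -> 2 (B and D)
```

The same filtering applied to the clients of a module would give the corresponding fan-in variants.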
    # cccc_tmt.dat
    #
    # configuration file for treatment of metric values in CCCC
    #
    # lines in this file starting with '#' are treated as comments
    # all other lines are treated as defining a record in a table of
    # treatments for different metrics, which controls the display of
    # values of that metric
    #
    # all metric values are displayed using the class CCCC_Metric, which may be
    # viewed as ratio of two integers associated with a character string tag
    # the denominator of the ratio defaults to 1, allowing simple counts to
    # be handled by the same code as is used for ratios
    #
    # the tag associated with a metric is used as a key to lookup a record
    # describing a policy for its display (class Metric_Treatment)
    #
    # the fields of each treatment record are as follows:
    # TAG     the short string of characters used as the lookup key
    # T1, T2  two numeric thresholds which are the lower bounds for the ratio of
    #         the metric's numerator and denominator beyond which the
    #         value is treated as high or extreme by the analyser
    #         these will be displayed in emphasized fonts, and if the browser
    #         supports the BGCOLOR attribute, extreme values will have a red
    #         background, while high values will have a yellow background
    #         the intent is that high values should be treated as suspicious but
    #         tolerable in moderation, whereas extreme values should almost always
    #         be regarded as defects (not necessarily that you will fix them)
    # NT      a third threshold which supresses calculation of ratios where
    #         the numerator is lower than NT
    #         the principal reason for doing this is to prevent ratios like L_C
    #         being shown as *** (infinity) and displayed as extreme when the
    #         denominator is 0, providing the numerator is sufficiently low
    #         suitable values are probably similar to those for T1
    # W       the width of the metric (total number of digits)
    # P       the precision of the metric (digits after the decimal point)
    # Comment a free form field extending to the end of the line
    #
    #TAG      T1      T2  NT  W  P  Comment
    LOCf      30     100   0  6  0  Lines of code/function
    LOCm     500    2000   0  6  0  Lines of code/module
    LOCp  999999  999999   0  6  0  Lines of code/project
    MVGf      10      30   0  6  0  Cyclomatic complexity/function
    MVGm     200    1000   0  6  0  Cyclomatic complexity/module
    MVGp  999999  999999   0  6  0  Cyclomatic complexity/project
    COM   999999  999999   0  6  0  Comment lines
    M_C        5      10   5  6  3  MVG/COM McCabe/comment line
    L_C        7      30  20  6  3  LOC/COM Lines of code/comment line
    CGS       10      30   0  6  3  Card & Glass Structural Complexity
    CGSv       7      20   0  6  3  Card & Glass Structural Complexity (visible)
    CGSc       7      20   0  6  3  Card & Glass Structural Complexity (concrete)
    FI        12      20   0  6  0  Fan in (overall)
    FIv        6      12   0  6  0  Fan in (visible uses only)
    FIc        6      12   0  6  0  Fan in (concrete uses only)
    FO        12      20   0  6  0  Fan out (overall)
    FOv        6      12   0  6  0  Fan out (visible uses only)
    FOc        6      12   0  6  0  Fan out (concrete uses only)
    HKS      100    1000   0  6  0  Henry-Kafura/Shepperd measure (overall)
    HKSv      30     100   0  6  0  Henry-Kafura/Shepperd measure (visible only)
    HKSc      30     100   0  6  0  Henry-Kafura/Shepperd measure (concrete only)
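The way a treatment record might be read and applied can be sketched as follows. This is not CCCC's `Metric_Treatment` code: the names `Treatment`, `parse_treatments` and `classify` are hypothetical, fields are assumed to be separated by whitespace, and a value equal to a threshold is assumed to reach that level (the file's comments do not say whether the comparison is inclusive).

```python
from dataclasses import dataclass


@dataclass
class Treatment:
    tag: str
    t1: float       # lower bound for "high"
    t2: float       # lower bound for "extreme"
    nt: float       # numerator threshold below which ratios are suppressed
    width: int
    precision: int
    comment: str


def parse_treatments(text):
    """Build a tag -> Treatment table, skipping blank and '#' lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tag, t1, t2, nt, w, p, *rest = line.split(None, 6)
        table[tag] = Treatment(tag, float(t1), float(t2), float(nt),
                               int(w), int(p), rest[0] if rest else "")
    return table


def classify(t, numerator, denominator=1):
    """Return 'suppressed', 'normal', 'high' or 'extreme' for a metric value."""
    if numerator < t.nt:
        return "suppressed"   # ratio is not calculated at all
    if denominator == 0:
        return "extreme"      # shown as *** (infinity)
    ratio = numerator / denominator
    if ratio >= t.t2:
        return "extreme"
    if ratio >= t.t1:
        return "high"
    return "normal"


sample_records = """\
#TAG T1 T2 NT W P Comment
LOCf 30 100 0 6 0 Lines of code/function
L_C 7 30 20 6 3 LOC/COM Lines of code/comment line
"""
table = parse_treatments(sample_records)
print(classify(table["LOCf"], 45))      # high
print(classify(table["L_C"], 10, 0))    # suppressed
print(classify(table["L_C"], 50, 0))    # extreme
print(classify(table["L_C"], 50, 25))   # normal
```

The second and third calls show the purpose of NT: a short function with no comments is left unreported rather than flagged as extreme, while a long one with no comments is still flagged.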
The best place to look for information about CCCC is the CCCC home page at http://www.fste.ac.cowan.edu.au.
Recent versions of CCCC are usually available for download from ftp://www.fste.ac.cowan.edu.au/pub/tlittlef.
Significant new production versions are announced in the comp.software.measurement USENET newsgroup.