CScout Documentation


Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
dds@aueb.gr

Introduction

CScout is a source code analyzer and refactoring browser for collections of C programs. It can process workspaces of multiple projects (we define a project as a collection of C source files that are linked together) mapping the complexity introduced by the C preprocessor back into the original C source code files. CScout takes advantage of modern hardware advances (fast processors and large memory capacities) to analyze C source code beyond the level of detail and accuracy provided by current compilers and linkers. The analysis CScout performs takes into account the identifier scopes introduced by the C preprocessor and the C language proper scopes and namespaces.

CScout has already been applied on

CScout as a source code analyzer can:

More importantly, CScout helps you in refactoring code by identifying dead objects to remove, and can automatically perform accurate global rename identifier refactorings. CScout will automatically rename identifiers

Walkthrough

If you are impatient, you can get an immediate feeling of CScout, by unpacking its distribution file, entering the example directory and typing You will then be able to use CScout and your browser to explore the source code of the one true awk.

For a more structured walkthrough, read on. Consider the following C file, idtest.c

#define getval(x) ((x).val)

struct number {
        int id;
        double val;
} n;

struct character {
        int id;
        char val;
} c;

static int val;

main(int argc, char *argv[])
{
        int val;

        if (argc > 2)
                goto val;
        return getval(n) + getval(c);
        val: return 0;
}
Even though the file is a contrived example, it serves to illustrate the basic concepts behind CScout's operation. Consider what the correct renaming of one of the identifiers named val would entail. CScout will help us automate this process.

Although we are dealing with a single file, we need to specify its processing within the context of a workspace. In a realistic context a workspace will specify how numerous projects consisting of multiple files will be processed; think of a workspace as a collection of Makefiles. CScout will operate across the many source files and related executables in the same way as it operates on our example file idtest.c.

A workspace specifies the set of files on which CScout will operate. Each workspace consists of a number of projects; a project is a set of rules for linking together C files to form an executable. The workspace definition file is in our case very simple:

workspace example {
	project idtest {
		file idtest.c
	}
}
Our workspace, named example, consists of a single project, named idtest, that consists of a single C source file, idtest.c.

Our first step will be to transform the declarative workspace definition file into an imperative processing specification that CScout will handle.

prompt> cswc example.csw >example.c
We then invoke CScout on the compiled workspace definition file example.c.
prompt> cscout example.c
Processing workspace example
Processing project idtest
Processing file idtest.c
Done processing file idtest.c
Done processing project idtest
Done processing workspace example
Post-processing our_path/example.c
Post-processing our_path/idtest.c
Processing identifiers
100%
We are now ready to serve you at http://localhost:8081
The output of CScout is quite verbose; when processing a large source code collection, the messages will serve to assure us that progress is being made.

The primary interface of CScout is Web-based, so once our files have been processed, we fire up our Web browser and navigate to CScout's URL. We leave the CScout process running; its job from now on will be to service the pages we request and perform the operations we specify.

Our browser will show us a page like the following:

CScout Home

Files

Identifiers

Operations

Main page

In our first example we will only rename an identifier, but as is evident from the page's links, CScout provides us with many powerful tools.

By navigating through the links All files, idtest.c, and Source code with identifier hyperlinks we can see the source code with each recognised identifier marked as a hyperlink:

Source Code With Identifier Hyperlinks: your_path/idtest.c

(Use the tab key to move to each marked identifier.)


#define getval(x) ((x).val)

struct number {
        int id;
        double val;
} n;

struct character {
        int id;
        char val;
} c;

static int val;

main(int argc, char *argv[])
{
        int val;

        if (argc > 2)
                goto val;
        return getval(n) + getval(c);
        val: return 0;
}


Clicking on the first identifier val (in the macro definition) takes us to a page with the identifier's details. There we can specify the identifier's new name, e.g. value.

Identifier: val

  • Read-only: No
  • Tag for struct/union/enum: No
  • Member of struct/union: Yes
  • Label: No
  • Ordinary identifier: No
  • Macro: No
  • Undefined macro: No
  • Macro argument: No
  • File scope: No
  • Project scope: No
  • Typedef: No
  • Crosses file boundary: No
  • Unused: No
  • Matches 3 occurence(s)
  • Appears in project(s):
    • idtest
  • Substitute with:

Dependent Files (Writable)

Dependent Files (All)


If we click on the marked source hyperlink, CScout will again show us the corresponding source code, but with only the matches of the identifier val marked as hyperlinks:

Identifier val: C:\dds\src\Research\cscout\refactor\idtest.c

(Use the tab key to move to each marked identifier.)


#define getval(x) ((x).val)

struct number {
        int id;
        double val;
} n;

struct character {
        int id;
        char val;
} c;

static int val;

main(int argc, char *argv[])
{
        int val;

        if (argc > 2)
                goto val;
        return getval(n) + getval(c);
        val: return 0;
}


The marked identifiers are exactly the ones that the replacement we specified will affect. Similarly, we can specify the replacement of the val label, the static variable, or the local variable; each one will only affect the relevant identifiers.
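To make the effect concrete, the following sketch shows how idtest.c would read after committing the rename of the member val to value. It assumes that the three matched occurrences are the macro body and the two member declarations; the static variable, the local variable, and the label keep the name val because they live in different scopes and namespaces. So that the fragment can be compiled and tested on its own, the body of main has been moved into a hypothetical function sum_values.

```c
/* idtest.c after renaming the member val to value (sketch) */
#define getval(x) ((x).value)           /* renamed: member reference in macro body */

struct number {
        int id;
        double value;                   /* renamed */
} n;

struct character {
        int id;
        char value;                     /* renamed */
} c;

static int val;                         /* ordinary identifier: unchanged */

/* Hypothetical stand-in for main() so the fragment can be tested */
int sum_values(int argc)
{
        int val;                        /* local variable: unchanged */

        if (argc > 2)
                goto val;               /* label namespace: unchanged */
        return getval(n) + getval(c);
        val: return 0;
}
```

Renaming the member by hand would require knowing that the val inside the macro body refers to a structure member, while the other three val identifiers do not; this is the distinction CScout tracks for us.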

Selecting the hyperlink Exit - saving changes from CScout's main page will commit our changes, modifying the file idtest.c.

Installation and setup

System Requirements

To run CScout your system must satisfy the following requirements:

Installation and Configuration

From this point onward we use the term Unix to refer to Unix-like systems like GNU/Linux and FreeBSD, and Windows to refer to Microsoft Windows systems.

You install CScout in eight steps:

  1. Unpack the distribution file on your system.

  2. Copy the executable files cscout and cswc (under Unix) or cscout.exe and cswc.bat (under Windows) from the bin directory into a directory that is part of your path. Under Unix /usr/local/bin is a common suitable choice. Under Windows C:\WINNT\system32 is a location you can use, if your system is not better organized.

  3. Under Windows, adjust the second line of the file cswc.bat to point to the directory where you installed it.

  4. Copy the directory etc to the final installation place you prefer (renaming it, if you wish), and arrange for the environment variable CSCOUT_HOME to point to it. As an example, under Unix you would probably install the directory as /usr/local/etc/cscout. Under Unix, you can permanently set the CSCOUT_HOME environment variable by editing a file named .profile (sh and derivative shells) or .login (csh and derivative shells) in your home directory. Under Windows (NT and later), you can set environment variables through Control Panel - System - Advanced - Environment Variables; on Windows 95/98/Me you will need to edit the file c:\autoexec.bat.

    Alternatively, CScout will search for the contents of the directory etc in $HOME/.cscout and in the current directory's .cscout directory.

  5. Go to the CScout etc directory and copy the file pair cscout_incs.PLATFORM and cscout_defs.PLATFORM (where PLATFORM is the operating system and compiler that most closely resemble your setup) as cscout_incs.h and cscout_defs.h.

    In most cases you want CScout to process your code using the include files of the compiler you are normally using. This will allow CScout to handle programs using the libraries and facilities available in your environment (e.g. Unix system calls or the Windows API). If your programs are written in ANSI C and do not use any additional include files, you can use the .GENERIC files and rely on the include files supplied with the CScout distribution.

  6. If you decided to use the .GENERIC files, copy the include directory to an appropriate location (e.g. /usr/local/include/cscout under Unix).

  7. Edit the file cscout_incs.h to specify the location where your compiler's (or the generic) include files reside.

  8. If the compiler you are using does not match any of the files supplied, start with the .GENERIC file set and add suitable definitions to sidestep the problems caused by the extensions your compiler supports. As an example, if your compiler supports a quad_double type and associated keyword with semantics roughly equivalent to double, you would add a line in cscout_defs.h:
    #define quad_double double
    
    Have a look in the existing cscout_defs files to see what might be required.
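The effect of such a definition can be sketched as follows. With the macro in place, code written for the hypothetical quad_double extension is seen as plain C using double. The #ifndef guard is there only so that the fragment also compiles on its own, and the function scale is a hypothetical example of application code.

```c
/* The definition from the installation step above (quad_double is a
 * hypothetical compiler extension used only for illustration) */
#ifndef quad_double
#define quad_double double
#endif

/* Application code written for the extension now parses as standard C */
quad_double scale(quad_double x, int factor)
{
        return x * factor;
}
```

Mapping an extension onto the closest standard construct, rather than deleting it, keeps declarations and expressions that use it syntactically valid.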
Note that there is nothing magical about the installation steps described above; feel free to follow them in whatever way matches your setup and environment, as long as you achieve the desired results.

Defining workspaces

A workspace definition provides CScout with instructions for parsing a set of C files, a task that is typically accomplished when compiling programs through the use of makefiles. CScout must always process all its source files in a single batch, so running it for each file from a makefile is not possible. Workspace definition files provide facilities for specifying linkage units (typically executable files, called projects in workspace definition file parlance), for grouping together similar files, and for specifying include paths, read-only paths, and macros.

Workspace definition files are line-oriented and organized around C-like blocks. Comments are introduced using the # character. Consider the following simple example:

workspace echo {
	project echo {
		cd "/usr/src/bin/echo"
		file echo.c
	}
}
The above workspace definition consists of a single program (echo), which in turn consists of a single source file (echo.c).

See how we could expand this for two more programs, all residing in our system's /usr/src/bin directory:

workspace bin {
	cd "/usr/src/bin"
	ro_prefix "/usr/include"
	project cp {
		cd "cp"
		file cp.c utils.c
	}
	project echo {
		cd "echo"
		file echo.c
	}
	project date {
		cd "date"
		file date.c
	}
}
In the new bin workspace we have factored out the common source directory at the workspace level (cd "/usr/src/bin"), so that each project only specifies its directory relative to the workspace directory (e.g. cd "date"). In addition, we have specified that files residing in the directory /usr/include are to be considered read-only (ro_prefix "/usr/include"). This is typically needed when the user running CScout has permission to modify the system's include files. Specifying one or more read-only prefixes allows CScout to distinguish between application identifiers and files, which you can modify, and system identifiers and files, which should not be changed.
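The test that a read-only prefix implies can be sketched as a string-prefix check: a file is treated as read-only when its path starts with one of the configured prefixes. The function is_readonly_path and the plain string comparison are assumptions made for illustration; this is not CScout's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if path begins with any read-only prefix */
int is_readonly_path(const char *path, const char *const prefixes[],
    size_t nprefixes)
{
        for (size_t i = 0; i < nprefixes; i++) {
                size_t len = strlen(prefixes[i]);

                if (strncmp(path, prefixes[i], len) == 0)
                        return 1;
        }
        return 0;
}
```

With the bin workspace's single prefix, /usr/include/stdio.h is read-only while /usr/src/bin/cp/cp.c remains writable. Note that a plain string comparison would also match a path such as /usr/include2/x.h; a stricter check would additionally require a / after the prefix.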

You can see the complete syntax of CScout workspaces in the following BNF grammar.

WORKSPACE:
	workspace NAME { WORKSPACE_ELEMENT ... }

WORKSPACE_ELEMENT:
	SCOPED_COMMAND
	GLOBAL_COMMAND
	cd "PATH"
	PROJECT

SCOPED_COMMAND:
	ipath "PATH"
	define MACRO
	define MACRO VALUE

GLOBAL_COMMAND:
	ro_prefix "PATH"
	readonly "FILENAME"

PROJECT:
	project NAME { PROJECT_ELEMENT ... }

PROJECT_ELEMENT:
	SCOPED_COMMAND
	cd "PATH"
	DIRECTORY
	FILE

DIRECTORY:
	directory PATH  { DIRECTORY_ELEMENT ... }

DIRECTORY_ELEMENT:
	SCOPED_COMMAND
	FILE

FILE:
	file FILENAME ...
	file "FILENAME" { FILESPEC ... }

FILESPEC:
	SCOPED_COMMAND
	cd "PATH"
	readonly
The above grammar essentially specifies that a workspace consists of projects, which consist of files or files in a directory. At the workspace level you can specify files and directories that are to be considered read-only using the readonly and ro_prefix commands. Both commands affect the complete workspace. The scoped commands (define and ipath) are used to specify macro definitions and the include path. Their scope is the block they appear in; when you exit the block (project, directory, or file) their definition is lost. You can therefore define a macro or an include path for the complete workspace, a specific project, files within a directory, or a single file. The syntax of the define command is the same as the one used in C programs. The cd command is also scoped; once you exit the block you return to the directory that was in effect in the outside block. Within a project you can either specify individual files using the file command, or express a grouping of files in a directory using the directory command. The directory command's name is the directory where a group of files resides and serves as an implicit cd command for the files it contains. Finally, files can be either specified directly as arguments to the file command, or file can be used to start a separate block. In the latter case the argument of file is the file name to process; the block can contain additional specifications (scoped commands or the readonly command without an argument) for processing that file.
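The scoping rule for cd can be modelled as a stack holding one directory per open block: opening a block inherits the directory in effect, cd changes it for the current block only, and closing the block restores the enclosing directory. The following is a hypothetical sketch of that rule; the names, the fixed sizes, and the Unix-style / separator are illustrative assumptions and say nothing about how cswc or CScout are implemented.

```c
#include <stdio.h>
#include <string.h>

#define SCOPE_MAX_DEPTH 16
#define SCOPE_PATH_MAX 512

/* Hypothetical model of the scoped cd command: one directory per open block */
struct dir_scope {
        char dir[SCOPE_MAX_DEPTH][SCOPE_PATH_MAX];
        int depth;                      /* index of the innermost open block */
};

/* The outermost scope starts in the directory start */
void scope_init(struct dir_scope *s, const char *start)
{
        s->depth = 0;
        snprintf(s->dir[0], SCOPE_PATH_MAX, "%s", start);
}

/* Opening a block (workspace, project, directory, file) inherits the directory */
int scope_enter(struct dir_scope *s)
{
        if (s->depth + 1 >= SCOPE_MAX_DEPTH)
                return -1;              /* nesting too deep for this sketch */
        memcpy(s->dir[s->depth + 1], s->dir[s->depth], SCOPE_PATH_MAX);
        s->depth++;
        return 0;
}

/* cd "PATH": an absolute path replaces the directory, a relative one is appended */
void scope_cd(struct dir_scope *s, const char *path)
{
        char joined[SCOPE_PATH_MAX];

        if (path[0] == '/')
                snprintf(joined, sizeof(joined), "%s", path);
        else
                snprintf(joined, sizeof(joined), "%s/%s", s->dir[s->depth], path);
        memcpy(s->dir[s->depth], joined, SCOPE_PATH_MAX);
}

/* Closing a block restores the directory of the enclosing block */
void scope_exit(struct dir_scope *s)
{
        if (s->depth > 0)
                s->depth--;
}

const char *scope_cwd(const struct dir_scope *s)
{
        return s->dir[s->depth];
}
```

Applied to the bin workspace, the workspace block's cd "/usr/src/bin" sets the outer directory, each project's cd "cp" or cd "echo" resolves against it, and leaving a project block restores /usr/src/bin. A directory block would correspond to opening a block followed by a cd to the directory's name.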

The following workspace definition was used for processing the apache web server and includes most of the features and formulations we discussed.

workspace apache {
	cd "/usr/local/src/apache/src"

	ro_prefix "/usr/local/src/apache/src/include/ap_config"

	# Global project definitions
	define HTTPD_ROOT "/usr/local/apache"
	define SUEXEC_BIN "/usr/local/apache/bin/suexec"
	define SHARED_CORE_DIR "/usr/local/apache/libexec"
	define DEFAULT_PIDLOG "logs/httpd.pid"
	define DEFAULT_SCOREBOARD "logs/httpd.scoreboard"
	define DEFAULT_LOCKFILE "logs/httpd.lock"
	define DEFAULT_XFERLOG "logs/access_log"
	define DEFAULT_ERRORLOG "logs/error_log"
	define TYPES_CONFIG_FILE "conf/mime.types"
	define SERVER_CONFIG_FILE "conf/httpd.conf"
	define ACCESS_CONFIG_FILE "conf/access.conf"
	define RESOURCE_CONFIG_FILE "conf/srm.conf"

	define AUX_CFLAGS
	define LINUX 22 
	define USE_HSREGEX 
	define NO_DL_NEEDED

	# Give project-specific directory and include path properties
	project gen_uri_delims {
		cd "main"
		ipath "../os/unix"
		ipath "../include"
		file gen_uri_delims.c
	}

	# Alternative formulation; specify per-file properties
	project gen_test_char {
		file gen_test_char.c {
			cd "main"
			ipath "../os/unix"
			ipath "../include"
		}
	}

	# httpd executable; specify directory-based properties
	project httpd {
		directory main {
			ipath "../os/unix"
			ipath "../include"
			file alloc.c buff.c http_config.c http_core.c
			file http_log.c http_main.c http_protocol.c
			file http_request.c http_vhost.c util.c util_date.c
			file util_script.c util_uri.c util_md5.c rfc1413.c
		}
		directory regex {
			ipath "."
			ipath "../os/unix"
			ipath "../include"
			define POSIX_MISTAKE
			file regcomp.c regexec.c regerror.c regfree.c
		}
		directory os/unix {
			ipath "../../os/unix"
			ipath "../../include"
			file os.c os-inline.c
		}
		directory ap {
			ipath "../os/unix"
			ipath "../include"
			file ap_cpystrn.c ap_execve.c ap_fnmatch.c ap_getpass.c 
			file ap_md5c.c ap_signal.c ap_slack.c ap_snprintf.c 
			file ap_sha1.c ap_checkpass.c ap_base64.c ap_ebcdic.c
		}
		directory modules/standard {
			ipath "../../os/unix"
			ipath "../../include"
			file mod_env.c mod_log_config.c mod_mime.c
			file mod_negotiation.c mod_status.c mod_include.c
			file mod_autoindex.c mod_dir.c mod_cgi.c mod_asis.c
			file mod_imap.c mod_actions.c mod_userdir.c
			file mod_alias.c mod_access.c mod_auth.c mod_setenvif.c
		}
		directory . {
			ipath "./os/unix"
			ipath "./include"
			file modules.c buildmark.c
		}
	}
}
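As a sketch of how such definitions reach the code, assume that each define command acts like a #define of the same macro placed ahead of every file in its block. A source file of the httpd project could then rely on HTTPD_ROOT, LINUX, and USE_HSREGEX as shown below. The three functions are hypothetical, and the #ifndef fallbacks only let the fragment compile outside the workspace.

```c
/* Fallbacks so the fragment compiles outside the workspace; within it the
 * workspace-level define commands are assumed to supply these macros. */
#ifndef HTTPD_ROOT
#define HTTPD_ROOT "/usr/local/apache"
#endif
#ifndef LINUX
#define LINUX 22
#endif
#ifndef USE_HSREGEX
#define USE_HSREGEX
#endif

/* define MACRO VALUE with a string value: used like any #define */
const char *server_root(void)
{
        return HTTPD_ROOT;
}

/* define MACRO VALUE with a numeric value: usable in #if */
int linux_level(void)
{
#if LINUX >= 22
        return LINUX;
#else
        return 0;
#endif
}

/* define MACRO without a value: only its presence matters */
int uses_bundled_regex(void)
{
#ifdef USE_HSREGEX
        return 1;
#else
        return 0;
#endif
}
```

Because the definitions at the top of the apache workspace are scoped to the whole workspace, every project sees them, whereas POSIX_MISTAKE is visible only to the files of the regex directory block.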

Execution

The CScout workspace compiler cswc reads a workspace definition from its standard input, or from the file(s) specified on its command line, and produces on its standard output a C-like file that CScout can process. You will have to redirect the cswc output to a file that will then get passed as an argument to CScout.

The CScout engine (cscout) requires as an argument a cswc-compiled workspace definition file. It will serially process each project and directory, parsing the corresponding files specified in the workspace definition file, and then process each of the examined files once more to establish the location of the identifiers. Note that the bulk of the work is performed in the first pass. During the first pass CScout may report warnings, errors, and fatal errors. Fatal errors terminate processing; all other errors may result in an incorrect analysis of the particular code fragment. CScout only checks the code to the extent needed to perform its analysis; it will happily process many illegal constructs.

The following lines illustrate the output of CScout when run on the bin workspace.

Entering directory /usr/src/bin
Processing project cp
Entering directory cp
Processing file cp.c
Done processing file cp.c
Processing file utils.c
Done processing file utils.c
Exiting directory cp
Done processing project cp
Processing project echo
Entering directory echo
Processing file echo.c
Done processing file echo.c
Exiting directory echo
Done processing project echo
Processing project date
Entering directory date
Processing file date.c
Done processing file date.c
Exiting directory date
Done processing project date
Exiting directory /usr/src/bin
Done processing workspace bin
Post-processing /home/dds/src/cscout/cscout_defs.h
Post-processing /home/dds/src/cscout/cscout_incs.h
Post-processing /usr/home/dds/src/cscout/bin.c
Post-processing /usr/include/ctype.h
Post-processing /usr/include/err.h
Post-processing /usr/include/errno.h
Post-processing /usr/include/fcntl.h
Post-processing /usr/include/fts.h
Post-processing /usr/include/limits.h
Post-processing /usr/include/locale.h
Post-processing /usr/include/machine/ansi.h
Post-processing /usr/include/machine/endian.h
Post-processing /usr/include/machine/limits.h
Post-processing /usr/include/machine/param.h
Post-processing /usr/include/machine/signal.h
Post-processing /usr/include/machine/trap.h
Post-processing /usr/include/machine/types.h
Post-processing /usr/include/machine/ucontext.h
Post-processing /usr/include/runetype.h
Post-processing /usr/include/stdio.h
Post-processing /usr/include/stdlib.h
Post-processing /usr/include/string.h
Post-processing /usr/include/sys/_posix.h
Post-processing /usr/include/sys/cdefs.h
Post-processing /usr/include/sys/inttypes.h
Post-processing /usr/include/sys/param.h
Post-processing /usr/include/sys/signal.h
Post-processing /usr/include/sys/stat.h
Post-processing /usr/include/sys/syslimits.h
Post-processing /usr/include/sys/time.h
Post-processing /usr/include/sys/types.h
Post-processing /usr/include/sys/ucontext.h
Post-processing /usr/include/sys/unistd.h
Post-processing /usr/include/sysexits.h
Post-processing /usr/include/syslog.h
Post-processing /usr/include/time.h
Post-processing /usr/include/unistd.h
Post-processing /vol/src/bin/cp/cp.c
Post-processing /vol/src/bin/cp/extern.h
Post-processing /vol/src/bin/cp/utils.c
Post-processing /vol/src/bin/date/date.c
Post-processing /vol/src/bin/date/extern.h
Post-processing /vol/src/bin/date/vary.h
Post-processing /vol/src/bin/echo/echo.c
Processing identifiers
100%
We are now ready to serve you at http://localhost:8081
After processing your files CScout will start operating as a Web server. At that point you must open a Web browser and connect to the location printed on its output. From then on, you interact with CScout through its Web browser interface; only fatal errors and progress indicators will appear on CScout's standard output. Depending on the version of CScout you have, you may also be able to perform some operations over the network. However, since CScout operates as a single-threaded process, you may experience delays when another user sends a complex query.

When CScout processes a large project it will contact our server over the Web to say hello. The unsupported version will also send us the identifier and file metrics of the project it has processed and register your CScout project for public browsing over the Web (the unsupported version is licensed only for use on free open-source software).

Preprocessor invocation

As an aid for configuring CScout for a different compiler, you can run CScout and the workspace compiler with the optional -E command-line argument. The -E option makes the two programs act together as a simple C preprocessor. The workspace definition file you use in such a case should specify only a single file. The corresponding output of CScout will be the file with all preprocessor commands evaluated. If CScout reports an error in a place where a macro is invoked, you can examine the preprocessed output to see the result of the macro expansion. During the CScout trials, this feature often located uses of nonstandard compiler extensions that were hidden inside header files. To search for the corresponding error location in the preprocessed file, use the name of a nearby identifier as a bookmark, since the line numbers will not match and CScout will not generate #line directives. Alternatively, you can rerun CScout on the preprocessed file.
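For instance, with the getval macro of the walkthrough example, a statement such as return getval(n); appears in the preprocessed output as return ((n).val); (possibly with different spacing), since standard macro expansion substitutes the argument into the macro body. The sketch below places the two forms side by side; the function names are hypothetical.

```c
/* The macro from idtest.c */
#define getval(x) ((x).val)

struct number {
        int id;
        double val;
};

/* Source as written: uses the macro */
double get_written(struct number n)
{
        return getval(n);
}

/* What the preprocessed output contains in place of the invocation */
double get_preprocessed(struct number n)
{
        return ((n).val);
}
```

When an error is reported at a macro invocation, comparing the two forms in this way shows which construct the compiler, or CScout, actually has to parse.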

Checking invocation

There are cases where you may only want to run CScout to see its error diagnostic messages. As an example, you may be running CScout as part of your daily build cycle to verify that the source code can always be parsed by CScout. The -c command-line option will cause CScout to immediately exit after processing the specified file.

The -c option is often used in conjunction with the -r option. The -r command-line option instructs CScout to report all superfluously included header files and identifiers that are either unused or wrongly scoped. Although it is easy to recognise when a header file must be included (if you do not follow the specification of the respective API, a compiler's error message will act as a reminder), detecting when an included header is no longer needed is a lot more difficult. As code changes, entire files are duplicated as source code templates, and functions are moved to different files, header files that were once needed may no longer be required. Their existence can confuse programmers reading the code (why is this header file included?) and unnecessarily burden the compilation process. CScout can detect such files by keeping track of dependencies across files, and report included files that are not required. The following is an example of CScout's output:

$ cscout -rc awk.cs
Processing workspace awk
Processing project awk
Entering directory awk
Processing file awkgram.y
Done processing file awkgram.y
[...]
Processing file tran.c
Done processing file tran.c
Exiting directory awk
Done processing project awk
Done processing workspace awk
Post-processing /home/dds/src/cscout/example/.cscout/cscout_defs.h
[...]
Post-processing /home/dds/src/cscout/include/time.h
Processing identifiers
100%
/home/dds/src/cscout/example/awk/run.c:84: jexit: unused project scoped writable identifier
[...]
/home/dds/src/cscout/example/awk/awkgram.y:93: LASTTOKEN: unused file scoped writable identifier
/home/dds/src/cscout/example/awk/awk.h:152: CFREE: unused writable macro
[...]
/home/dds/src/cscout/example/awk/tran.c:44: CONVFMT: writable identifier should be made static
/home/dds/src/cscout/example/awk/lib.c:36: file: writable identifier should be made static
[...]
/home/dds/src/cscout/example/awk/lib.c:33: unused included file /home/dds/src/cscout/example/awk/ytab.h
/home/dds/src/cscout/example/awk/main.c:29: unused included file /home/dds/src/cscout/include/ctype.h
/home/dds/src/cscout/example/awk/main.c:35: unused included file /home/dds/src/cscout/example/awk/ytab.h
/home/dds/src/cscout/example/awk/tran.c:32: unused included file /home/dds/src/cscout/example/awk/ytab.h
Notice that there are two types of unused include files:
  1. Directly included files
  2. Included files that are only indirectly included
You will typically remove the #include directives for the directly included files. Unused files that are only indirectly included are a lot trickier. They are brought into your file's compilation by the inclusion of another file. Even if you have control over the header file that included them, and even if your file has no use for their contents, another file may require them, so in most cases it is best not to mess with those files. Finally, note that it is possible to construct pathological examples of include files that CScout will not detect as being required. These will contain just parts of a statement or declaration that cannot be related to the file including them (e.g. a single operator, or a comma):
/* Main file main.c */
main(int argc
#include "comma.h"
char *argv[])
{
}

/* File comma.h */
,
Although such a construct is legal C it is not used in practice.

Recently CScout processed a 190KLOC project that has been under active development since 1989. The project consists of 231 files, containing 5249 include directives. Following CScout's analysis, 765 include directives from 178 files were removed without a single problem.
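The usual response to a "writable identifier should be made static" report in the -r output above is to give the identifier internal linkage, so that it becomes file-scoped. A minimal sketch with a hypothetical counter variable and accessor functions:

```c
/* Before the fix, counter had external linkage although no other file used it:
 *         int counter;
 * Declaring it static confines it to this file. */
static int counter;

/* Only functions in this file can now update the counter */
void count_event(void)
{
        counter++;
}

int events_seen(void)
{
        return counter;
}
```

Other files that still need the value can obtain it through a function such as events_seen, which keeps the variable itself out of the project-wide namespace.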

The Web interface

The main screen CScout presents to your browser is divided into three sections: Files, Identifiers, and Operations.

Most pages CScout sends to your browser are dynamically generated and may contain elements that vary from one CScout invocation to the next. Therefore you should not bookmark source listings, or file or identifier detail pages, and expect them to be available on another CScout invocation. On the other hand, the pages containing results of identifier or file queries can be freely bookmarked; they are identified with a comment stating this fact.

Identifier Query Results

Matching Identifiers

[...]

You can bookmark this page to save the respective query


You can therefore use your browser's bookmark facility to "store" such queries for future use, or pass the URL around so that others can reproduce your results.

We will examine CScout's functionality using as an example the bin workspace we presented in the previous section.

File elements

Although some of the file queries operate on identifier properties, all file queries produce file-list data as their result. Clicking on an element of a file list leads you to a page with a summary of the file.

File: /usr/include/stdio.h

Metrics
  • Read-only: Yes
  • Number of characters: 14935
  • Comment characters: 6253
  • Space characters: 1385
  • Number of line comments: 0
  • Number of block comments: 74
  • Number of lines: 454
  • Length of longest line: 77
  • Number of C strings: 1
  • Number of defined functions: 1
  • Number of preprocessor directives: 124
  • Number of directly included files: 2
  • Number of C statements: 3
  • Used in project(s):
    • cp
    • echo
    • date
Listings
Include Files



CScout 1.6 - 2003/06/04 15:14:51
The page contains some representative metrics for the given file, the projects using this file, links for viewing the file's source code, and links for listing include file dependencies.
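Some of these metrics are simple counts over the file's text. The sketch below shows how Number of characters, Number of lines, and Length of longest line could be derived; the struct and function names are hypothetical, and the exact definitions CScout uses (for example, whether a final line without a newline counts) are assumptions here.

```c
#include <stddef.h>

/* Hypothetical subset of the per-file metrics shown on the file page */
struct file_metrics {
        size_t nchars;          /* number of characters, newlines included */
        size_t nlines;          /* number of lines */
        size_t longest;         /* length of longest line, newline excluded */
};

struct file_metrics compute_metrics(const char *text)
{
        struct file_metrics m = { 0, 0, 0 };
        size_t cur = 0;         /* length of the line being scanned */

        for (const char *p = text; *p != '\0'; p++) {
                m.nchars++;
                if (*p == '\n') {
                        m.nlines++;
                        if (cur > m.longest)
                                m.longest = cur;
                        cur = 0;
                } else
                        cur++;
        }
        /* Assumption: a final line without a trailing newline also counts */
        if (cur > 0) {
                m.nlines++;
                if (cur > m.longest)
                        m.longest = cur;
        }
        return m;
}
```

Metrics such as comment characters or the number of C statements need a tokenizer or parser rather than a character scan, which is where CScout's own analysis comes in.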

You can view a file's source code in three different forms:

  1. The plain source code, which provides only the file's code text
  2. Source code with identifier hyperlinks, which provides a page of the file's code text where each identifier is represented as a hyperlink leading to the identifier's page. The following is a representative example.
    int
    copy_fifo(from_stat, exists)
            struct stat *from_stat;
            int exists;
    {
            if (exists && unlink(to.p_path)) {
                    warn("unlink: %s", to.p_path);
                    return (1);
            }
            if (mkfifo(to.p_path, from_stat->st_mode)) {
                    warn("mkfifo: %s", to.p_path);
                    return (1);
            }
            return (pflag ? setfile(from_stat, 0) : 0);
    }

  3. As the above display can be overwhelming, you may prefer to browse the source code with hyperlinks only to project-global writable identifiers, which are typically the most important identifiers. Consider again how the above example would be displayed:
    int
    copy_fifo(from_stat, exists)
            struct stat *from_stat;
            int exists;
    {
            if (exists && unlink(to.p_path)) {
                    warn("unlink: %s", to.p_path);
                    return (1);
            }
            if (mkfifo(to.p_path, from_stat->st_mode)) {
                    warn("mkfifo: %s", to.p_path);
                    return (1);
            }
            return (pflag ? setfile(from_stat, 0) : 0);
    }

File Metrics

The File metrics link produces a summary of the workspace's file-based metrics, like the following:

File Metrics

Writable Files
Number of files: 1

File metric                          Total     Min     Max     Avg
Number of characters                  2272    2272    2272    2272
Comment characters                      98      98      98      98
Space characters                       176     176     176     176
Number of line comments                  8       8       8       8
Number of block comments                 0       0       0       0
Number of lines                         86      86      86      86
Length of longest line                  47      47      47      47
Number of C strings                     44      44      44      44
Number of defined functions              0       0       0       0
Number of preprocessor directives       70      70      70      70
Number of directly included files       12      12      12      12
Number of C statements                   0       0       0       0

Read-only Files
Number of files: 43

File metric                          Total     Min     Max     Avg
Number of characters                246448     384   14935    5731
Comment characters                  134895     129    6820    3137
Space characters                     18580       5    1817     432
Number of line comments                  1       0       1       0
Number of block comments              1229       1     114      28
Number of lines                       7064      16     484     164
Length of longest line                3395      62     107      78
Number of C strings                    184       0      38       4
Number of defined functions              0       0       0       0
Number of preprocessor directives     1865       0     179      43
Number of directly included files       90       0      13       2
Number of C statements                   0       0       0       0


All files

The "All files" link will list all the workspace's files, including source files and directly and indirectly included files. You can use this list to create a "bill of materials" for the files your workspace requires to compile. The following is an example of the output:

All Files

You can bookmark this page to save the respective query


Read-only files

The "Read-only files" link will typically show you the system files your project used. The following output was generated using the "remove common path prefix in file lists" option.

Read-only Files

You can bookmark this page to save the respective query


Writable files

Correspondingly, the "Writable files" link will show you only your workspace's source files:

Writable Files

You can bookmark this page to save the respective query


Files containing unused project-scoped writable identifiers

The link "files containing unused project-scoped writable identifiers" performs an identifier query, but lists as output the files containing matching identifiers. Specifically, the link will produce a list of files containing global (project-scoped) unused writable identifiers. Modern compilers can detect unused block-local or even file-local (static) identifiers, but detecting unused global identifiers is trickier, since it requires processing all the files that will be linked together. The restriction to writable identifiers filters out noise generated through the use of the system's library functions.
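Why the whole project is needed can be sketched with per-file summaries of the globals each file defines and refers to: a global is unused only if no file in the project refers to it. The structure and function names below are hypothetical, and the sketch simply compares names, ignoring the preprocessor and scoping issues that CScout's analysis takes into account.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-file summary of project-scoped (global) identifiers */
struct file_syms {
        const char *const *defs;        /* globals defined in the file */
        size_t ndefs;
        const char *const *refs;        /* globals the file refers to */
        size_t nrefs;
};

/* Return 1 if any file in the project refers to name */
static int referenced_anywhere(const struct file_syms *files, size_t nfiles,
    const char *name)
{
        for (size_t f = 0; f < nfiles; f++)
                for (size_t r = 0; r < files[f].nrefs; r++)
                        if (strcmp(files[f].refs[r], name) == 0)
                                return 1;
        return 0;
}

/* Count globals that are defined in some file but referenced in none */
size_t count_unused_globals(const struct file_syms *files, size_t nfiles)
{
        size_t unused = 0;

        for (size_t f = 0; f < nfiles; f++)
                for (size_t d = 0; d < files[f].ndefs; d++)
                        if (!referenced_anywhere(files, nfiles, files[f].defs[d]))
                                unused++;
        return unused;
}
```

If a first file defines the hypothetical globals used_elsewhere and never_used and a second file refers to used_elsewhere, examining the first file alone reports both as unused; only with both files does the count drop to the one identifier that is truly unused.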

In our example, the following list is generated:

Files Containing Unused Project-scoped Writable Identifiers

Matching Files

The output contains the path to each file, and a link that will generate the file's source code with the offending identifiers marked as hyperlinks. You can use the ``marked source'' link to inspect the identifiers in the context of their source code; simply follow the link with your browser and press tab to go to each hyperlink. In our example the identifier will appear as follows:

void
setthetime(fmt, p, jflag, nflag)
        const char *fmt;
        register const char *p;
        int jflag, nflag;
{
        register struct tm *lt;
        struct timeval tv;
        const char *dot, *t;
        int century;

(In our case the function setthetime is declared as static, but not defined as such.)

Files containing unused file-scoped writable identifiers

The link ``files containing unused file-scoped writable identifiers'' performs an identifier query, but lists as output the files containing matching identifiers. Specifically, the link will produce a list of files containing file-scoped (static) unused writable identifiers. Although some modern compilers can detect unused file-local identifiers, they fail to detect unused macros and some types of unused variable declarations. The CScout query is more general and can be more reliable. The restriction to writable identifiers filters out noise generated through the use of the system's library functions.

In our example, the following list is generated:

Files Containing Unused File-scoped Writable Identifiers

Matching Files

In our case the only identifiers located were the copyright and rcsid identifiers:

#ifndef lint
static char const copyright[] =
"@(#) Copyright (c) 1989, 1993\n\
        The Regents of the University of California.  All rights reserved.\n";
#endif /* not lint */

#ifndef lint
#if 0
static char sccsid[] = "@(#)echo.c      8.1 (Berkeley) 5/31/93";
#endif
static const char rcsid[] =
  "$FreeBSD: src/bin/echo/echo.c,v 1.8.2.1 2001/08/01 02:33:32 obrien Exp $";
#endif /* not lint */

Later on we will explain how an identifier query could use a regular expression to filter out the noise generated by these two identifiers.

Writable .c files without any statements

The ``writable .c files without any statements'' query will locate C files that do not contain any C statements. You can use it to locate files that only contain variable definitions, or files whose contents are #ifdef'd out.

In our example, the result set only contains the compiled workspace definition file.

Writable .c Files Without Any Statements

The compiled workspace definition file follows the C syntax, but only contains preprocessor directives (mostly CScout-specific #pragma commands) that drive CScout's source code analysis.

Writable files containing strings

The ``writable files containing strings'' link will list the writable C files that contain C strings. In some applications user messages should not appear in the source code, in order to aid localization efforts. This file query can then help you locate files that contain strings.

In our case the results are:

Writable Files Containing Strings


Writable .h files with #include directives

Some coding conventions forbid nested (recursive) #include directives in header files. This query can be used to find writable header files that break such a guideline. As usual, read-only system files are excluded; these typically use nested #include directives as a matter of course.

In our example, the result is:

Writable .h Files With #include directives


Generic file queries

A generic file query is a powerful mechanism for locating files that match the criteria you specify. All the ready-made file queries that CScout provides are simply URLs specifying saved instances of generic queries.

You specify the query through the following form:

File Query

Writable
Read-only
Number of characters
Comment characters
Space characters
Number of line comments
Number of block comments
Number of lines
Length of longest line
Number of C strings
Number of defined functions
Number of preprocessor directives
Number of directly included files
Number of C statements

Match any of the above         Match all of the above


File names should match RE

Query title   

You start by specifying whether the file should be writable (i.e. typically part of your application) and/or read-only (i.e. typically part of the compiler or system). Next come a series of metrics CScout collects for each file. For each metric (e.g. the number of comments) you can specify an operator (==, !=, < or >) and a number to match that metric against. Thus, to locate files without any comments you would specify
Number of block comments == 0 and Number of line comments == 0, and select Match all of the above.

You can request to see files matching any of your specifications (Match any of the above) or to see files matching all your specifications (Match all of the above).

Sometimes you may only want to search in a subset of files; you can then specify a regular expression that filenames should match against (File names should match RE).

Finally, you can also specify a title for your query. The title will then appear on the result document annotating the results, and will also provide you with a sensible name when creating a bookmark to it.

C Namespaces

To understand identifier queries it is best to refresh our notion of the C namespaces. The main way we normally reuse identifier names in C programs is through scoping: an identifier within a given scope such as a block or declared as static within a file will not interfere with identifiers outside that scope. Thus, the following example will print 3 and not 7.
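#include <stdio.h>

int i = 3;

void foo(void)
{
        int i = 7;      /* this local i hides the global one */
}

int main(void)
{
        foo();
        printf("%d\n", i);      /* refers to the global i */
        return 0;
}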
CScout analyzes and records each identifier's scope, and performs substitutions accordingly.

In addition, C partitions a program's identifiers into four namespaces. Identifiers in one namespace are considered different from identifiers in another. The four namespaces are:

  1. Tags for a struct/union/enum
  2. Members of struct/union (actually a separate namespace is assigned to each struct/union)
  3. Labels
  4. Ordinary identifiers (all other identifiers, such as the names of variables, functions, and typedefs, and enumeration constants)
Thus in the following example all id identifier instances are different:
/* structure tag */
struct id {
        int id;         /* structure member */
};

/* Different structure */
struct id2 {
        char id;        /* structure member */
};

/* ordinary identifier */
int id(void)
{
id:     /* label; standard C requires a statement after it */
        return 0;
}
Furthermore, macro names and the names of macro formal arguments also live in separate namespaces within the preprocessor.

Normally when you want to locate or change an identifier name, you only consider identifiers in the same scope and namespace. Sometimes however, a C preprocessor macro can semantically unite identifiers living in different namespaces, so that changes in one of them should be propagated to the others. The most common case involves macros that access structure members.

#include <stdio.h>

struct s1 {
        int id;
} a;

struct s2 {
        char id;
} b;

#define getid(x) ((x).id)

int main(void)
{
        printf("%d %c", getid(a), getid(b));
        return 0;
}
In the above example, a name change in any of the id instances should be propagated to all others for the program to retain its original meaning. CScout understands such changes and will propagate any changes you specify accordingly.

Finally, the C preprocessor's token concatenation (##) operator can produce identifiers whose parts must be treated separately for substitution purposes. Consider the following example:

#include <stdio.h>

int xleft, xright;
int ytop, ybottom;

#define coord(a, b) (a ## b)

int main(void)
{
        printf("%d %d %d %d\n",
                coord(x, left),
                coord(x, right),
                coord(y, top),
                coord(y, bottom));
        return 0;
}
In the above example, replacing x in one of the coord macro invocations should replace the x part in the xleft and xright variables. Again CScout will recognize and correctly handle this code.

Identifier elements

All identifier queries produce lists of identifiers as their result. Clicking on an identifier in a list will lead you to a page like the following.

Identifier: copy_file

  • Read-only: No
  • Tag for struct/union/enum: No
  • Member of struct/union: No
  • Label: No
  • Ordinary identifier: Yes
  • Macro: No
  • Undefined macro: No
  • Macro argument: No
  • File scope: No
  • Project scope: Yes
  • Typedef: No
  • Crosses file boundary: Yes
  • Unused: No
  • Matches 5 occurrence(s)
  • Appears in project(s):
    • cp
  • Substitute with:
Dependent Files (Writable)
Dependent Files (All)

As you can see, for each identifier CScout displays its properties, the number of occurrences it matches, the projects it appears in, and a field for substituting it with another name. The substitution will globally replace the identifier (or the identifier part) in all namespaces, files, and scopes required for the program to retain its original meaning. No checks for name collisions are made, so ensure that the name you specify is unique within the appropriate scope. Performing the substitution will not change the identifier's name in the current CScout session. However, once you have finished browsing and substituting, you can terminate CScout and write back all the substitutions you made to the respective source files.

Finally, the identifier's page links to lists of the writable files and of all files in which the identifier appears. Clicking on a file's ``marked source'' hyperlink will display that file's source code with only the given identifier marked as a hyperlink. By pressing your browser's tab key you can then see where the identifier is used. In our example the cp.c source code with the copy_file identifier marked would appear as follows:
                case S_IFBLK:
                case S_IFCHR:
                        if (Rflag) {
                                if (copy_special(curr->fts_statp, !dne))
                                        badcp = rval = 1;
                        } else {
                                if (copy_file(curr, dne))
                                        badcp = rval = 1;
                        }
                        break;
                case S_IFIFO:
                        if (Rflag) {
                                if (copy_fifo(curr->fts_statp, !dne))
                                        badcp = rval = 1;
                        } else {
                                if (copy_file(curr, dne))
                                        badcp = rval = 1;
                        }
                        break;
                default:
                        if (copy_file(curr, dne))
                                badcp = rval = 1;
                        break;
                }

Identifier Metrics

The identifier metrics page displays a summary of metrics related to identifier use. In our example, the metrics are as follows:

Identifier Metrics

Writable Identifiers
Identifier classDistinct # idsTotal # idsAvg lengthMin lengthMax length
All identifiers1709226132
Tag for struct/union/enum213324
Member of struct/union5685310
Label00-00
Ordinary identifier1307825112
Macro274210422
Undefined macro28221232
Macro argument514215
File scope11327510
Project scope271727212
Typedef13666
Read-only Identifiers
Identifier classDistinct # idsTotal # idsAvg lengthMin lengthMax length
All identifiers208341328133
Tag for struct/union/enum301217211
Member of struct/union2964108121
Label00-00
Ordinary identifier58311447220
Macro97419069333
Undefined macro9836012327
Macro argument1703901111
File scope1034938318
Project scope4616077320
Typedef984868318

You can use these metrics to compare characteristics of different projects, to check adherence to coding standards, or to identify identifier classes with abnormally short or long names. The ratio of the total number of identifiers to the number of distinct identifiers gives the average number of times each identifier is used. Notice the difference in our case between the read-only identifiers (which are mostly declarations) and the writable identifiers (which are actually used).

All identifiers

The all identifiers page will list all the identifiers in your project in alphabetical sequence. In large projects this page will be huge.

Read-only identifiers

The ``read-only identifiers'' page will only list the read-only identifiers of your project in alphabetical sequence. These typically become part of the project through included header files.

Writable identifiers

The ``writable identifiers'' page will only list the writable identifiers of your project in alphabetical sequence. These are typically the identifiers your project has defined. In large projects this page will be huge.

File-spanning writable identifiers

The ``file-spanning writable identifiers'' page will only list your project's identifiers that span a file boundary. Refactoring operations and coding standards typically pay more attention to such identifiers, since they tend to occupy the project's global namespace. In our example, the following page is generated:

File-spanning Writable Identifiers

Matching Identifiers

PATH_T
arg
copy_fifo
copy_file
copy_link
copy_special
fflag
iflag
netsettime
nflag
p_end
p_path
pflag
setfile
target_end
to
usage
vary
vary_append
vary_apply
vary_destroy
vflag


Unused project-scoped writable identifiers

The unused project-scoped writable identifiers are useful to know, since they can pinpoint functions or variables that can be eliminated from a workspace.

Unused file-scoped writable identifiers

The unused file-scoped writable identifiers can also pinpoint functions or variables that can be eliminated from a file. In our example the following list is generated:

Unused File-scoped Writable Identifiers

Matching Identifiers

copyright
copyright
copyright
rcsid
rcsid
rcsid
rcsid

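Notice how distinct identifiers that share a name (such as the file-scoped copyright and rcsid variables defined in different files) appear as separate entries.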

Unused writable macros

Finally, the unused writable macros page will list macros that are not used within a workspace. In our case the list contains an identifier that was probably used in an earlier version.

Unused Writable Macros

Matching Identifiers

RETAINBITS


Generic identifier queries

The generic identifier query is one of CScout's most powerful features, allowing you to accurately specify the properties of the identifiers you are looking for, by means of the following form.

Identifier Query

Writable
Read-only
Tag for struct/union/enum
Member of struct/union
Label
Ordinary identifier
Macro
Undefined macro
Macro argument
File scope
Project scope
Typedef
Enumeration constant
Crosses file boundary
Unused

Match any marked         Match all marked         Exclude marked         Exact match


Identifier names should ( not) match RE
Select identifiers from filenames matching RE

Query title   

Main page


CScout 1.16 - 2003/08/17 12:13:01
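In the form you specify the identifier properties you are interested in, how the marked properties are combined (match any, match all, exclude, or exact match), regular expressions that identifier names should (or should not) match and that their file names should match, and an optional query title. You can then choose to see either the identifiers that match the query, or the files containing identifiers that match the query. In the second case, each file in the file list will provide you with a link (marked source) showing the file's source code with all matched identifiers marked using hyperlinks.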

As an example, the following query could be used to identify unused file-scoped writable identifiers, excluding the copyright and rcsid identifiers:

Identifier Query

Writable
Read-only
Tag for struct/union/enum
Member of struct/union
Label
Ordinary identifier
Macro
Undefined macro
Macro argument
File scope
Project scope
Typedef
Enumeration constant
Crosses file boundary
Unused

Match any marked         Match all marked         Exclude marked         Exact match


Identifier names should ( not) match RE
Select identifiers from filenames matching RE

Query title   


Options

CScout's global options control how it presents file lists, identifiers, and source listings, and how it matches file names. The ``global options'' link leads you to the following page.

Global Options

Remove common path prefix in file lists
Sort identifiers starting from their last character
Show only true identifier classes (brief view)
Show line numbers in source listings
Case-insensitive file name regular expression match

Code listing tab width

The meaning of each option is described in the following sections.

Remove Common Path Prefix in File Lists

Setting the ``remove common path prefix in file lists'' option will result in file lists grouped by the common path prefix as in the following example:
This results in lists that are easier to read, but that cannot be easily copy-pasted into other tools for further processing.

Sort Identifiers Starting from their Last Character

Some coding conventions use identifier suffixes to indicate how a given identifier is used. As an example, typedef identifiers often end in _t. The following list contains our example's typedefs ordered by their last character, making it easy to spot typedefs that do not end in _t:
FILE
FTS
FTSENT
PATH_T
_RuneRange
_RuneLocale
u_long
fd_mask
u_char
physadr
int32_t
__int32_t
u_int32_t
uint32_t
__uint32_t
inthand2_t
ointhand2_t
int64_t
[... 40 lines removed]
in_addr_t
caddr_t
c_caddr_t
v_caddr_t
daddr_t
ufs_daddr_t
u_daddr_t
qaddr_t
__sighandler_t
__siginfohandler_t
timer_t
register_t
u_register_t
intptr_t
__intptr_t
uintptr_t
__uintptr_t
fpos_t
timecounter_pps_t
timecounter_get_t
vm_offset_t
vm_ooffset_t
sigset_t
osigset_t
fixpt_t
in_port_t
mcontext_t
ucontext_t
dev_t
div_t
ldiv_t
vm_pindex_t
key_t
segsz_t
fd_set
u_int
uint
u_short
ushort
_RuneEntry

Show Only True Identifier Classes

Setting the option ``show only true identifier classes (brief view)'' will remove from each identifier page all identifier properties marked No, resulting in a less verbose page.

Identifier: argc

  • Ordinary identifier: Yes
  • Matches 8 occurrence(s)
  • Appears in project(s):
    • cp
  • Substitute with:
Dependent Files (Writable)
Dependent Files (All)


Show Line Numbers in Source Listings

The "show line numbers in source listings" option allows you to specify whether the source file line numbers will be shown in source listings. Line numbers can be useful when you are editing or viewing the same file with an editor. A file with line numbers shown appears as follows:

   78 fa *makedfa(const char *s, int anchor)  /* returns dfa for reg expr s */
   79 {
   80         int i, use, nuse;
   81         fa *pfa;
   82         static int now = 1;
   83 
   84         if (setvec == 0) {      /* first time through any RE */
   85                 maxsetvec = MAXLIN;
   86                 setvec = (int *) malloc(maxsetvec * sizeof(int));
   87                 tmpset = (int *) malloc(maxsetvec * sizeof(int));
   88                 if (setvec == 0 || tmpset == 0)
   89                         overflo("out of space initializing makedfa");
   90         }
   91 
   92         if (compile_time)       /* a constant for sure */
   93                 return mkdfa(s, anchor);
   94         for (i = 0; i < nfatab; i++)    /* is it there already? */
   95                 if (fatab[i]->anchor == anchor
   96                   && strcmp((const char *) fatab[i]->restr, s) == 0) {
   97                         fatab[i]->use = now++;
   98                         return fatab[i];
   99                 }
  100         pfa = mkdfa(s, anchor);
  101         if (nfatab < NFA) {     /* room for another */
  102                 fatab[nfatab] = pfa;
  103                 fatab[nfatab]->use = now++;
  104                 nfatab++;
  105                 return pfa;
  106         }

Case-insensitive File Name Regular Expression Match

Some environments, such as Microsoft Windows, match filenames case-insensitively. As a result, the same filename may appear with different capitalization (e.g. Windows.h, WINDOWS.h, and windows.h). The ``case-insensitive file name regular expression match'' option makes filename regular expression matches ignore letter case, thereby matching the operating system's semantics.

Code Listing Tab Width

The ``code listing tab width'' option allows you to specify the tab width to use when listing source files as hypertext (8 by default). The width should match the width normally used to display the file. It does not affect the way the modified file is written; tabs and spaces are written exactly as found in the source code file.

Operations

The operations CScout provides group together functions that globally affect its operation. The following sections describe all operations apart from the global options.

Select active project

When using a workspace with multiple projects, you can restrict the results of all identifier and file queries (ready-made and those you explicitly specify) to a particular project, or apply them to all projects. The metric results displayed are not affected. When a project is selected, all pages end with a remark indicating the fact. The following shows our example's project selection page.

Select Active Project

Project cp is currently selected


Exit - saving changes

Once you have changed the names of some identifiers by substituting them with other names, you should exit CScout through this option to commit the changes you made to the respective source files.

Exit - ignore changes

You can also exit CScout without committing any changes. As this option will trigger millions of object destructors in large workspaces, it may be faster to simply terminate CScout from the command line where it is running, by pressing ^C.

Interfacing with version management systems

When the files CScout will modify are under revision control you may want to check them out for editing before doing the identifier substitutions, and then check them in again. CScout provides hooks for this operation. Before a file is modified CScout will try to execute the command cscout_checkout; after the file is modified CScout will try to execute the command cscout_checkin. Both commands will receive as their argument the full path name of the respective file. If commands with such names are in your path, they will be executed performing whatever action you require.

As an example, for a file under RCS control the following commands could be used:

cscout_checkout

#!/bin/sh
co -l "$1"

cscout_checkin

#!/bin/sh
ci -m'CScout identifier name refactoring' -u "$1"

Language extensions

CScout implements the C language as defined in ANSI X3.159-1989. In addition, it supports the following extensions:
  1. Initialization designators (C99)
  2. Compound literals (C99)
  3. Declarations can be intermixed with statements (C99).
  4. Recognize __attribute__((__unused__)) for determining which identifiers should not be reported as unused (gcc).
  5. // line comments (common extension)
  6. __asm__ blocks (gcc)
  7. enum lists ending with a comma (common extension)
  8. Anonymous struct/union members (gcc, Microsoft C)
  9. Allow case expression ranges (gcc).
  10. __typeof keyword (gcc)
  11. A compound statement in brackets can be an expression (gcc)
  12. Macros that use /##/ to paste together a // sequence, which is then treated as a line comment (Microsoft C)
  13. #include_next preprocessor directive (gcc)
  14. #warning preprocessor directive (gcc)
  15. Variable number of arguments preprocessor macros (support for both the gcc and the C99 syntax)
  16. Allow empty member declarations in aggregates (gcc)
  17. long long type (common extension)
  18. A semicolon can appear as a declaration (common extension)
  19. An aggregate declaration body can be empty (gcc)
  20. Aggregate member initialization using the member: value syntax (gcc)
  21. Statement labels do not require a statement following them (gcc)
  22. #ident preprocessor directive (gcc)
  23. Allow assignment to case expressions (common extension)
  24. Accept an empty translation unit (common extension).
  25. Support locally declared labels (__label__) (gcc).
  26. Dereferencing a function yields a function (common extension).
Many other compiler-specific extensions are handled by suitable macro definitions in the CScout initialization file.

Processing yacc files

Many C programs include parsing code in the form of yacc source files. CScout can directly process those files, allowing you to analyze and modify the identifiers used in those files. CScout determines whether a file is yacc source or plain C, by examining the file's suffix: file names ending in a lowercase 'y' are considered to contain yacc source and processed accordingly.

CScout processes yacc files as follows:

CScout is designed to process well-formed modern-style yacc files. All rules must be terminated with a semicolon (apparently this is optional in the original yacc version). The accepted grammar appears below.

body:
	defs '%%' rules tail
	;

tail:
	/* Empty */
	| '%%' c_code
	;

defs:
	/* Empty */
	| defs def
	;

def:
	'%start' IDENTIFIER
	| '%union' '{' member_declaration_list  '}' 
	| '%{' c_code '%}'
	| rword name_list_declaration
	;

rword:
	'%token'
	| '%left'
	| '%right'
	| '%nonassoc'
	| '%type'
	;

tag:
	/* Empty: union tag is optional */
	| '<' IDENTIFIER '>'
	;

name_list_declaration:
	tag name_number
	| name_list_declaration opt_comma name_number
	;

opt_comma:
	/* Empty */
	| ','
	;

name_number:
	name
	| name INT_CONST
	;

name:
	IDENTIFIER
	| CHAR_LITERAL
	;

/* rules section */

rules:
	rule
	| rules rule
	;

rule:
	IDENTIFIER ':'  rule_body_list ';'
	;

rule_body_list:
	rule_body
	| rule_body_list '|' rule_body
	;

rule_body:
	id_action_list prec
	;

id_action_list:
	/* Empty */
	| id_action_list name
        | id_action_list '{' c_code '}' 
	;

prec:
	/* Empty */
	| '%prec' name
	| '%prec' name  '{' c_code '}'
	;

variable:
	'$$'
	| '$' INT_CONST
	| '$-' INT_CONST
		{ $$ = basic(b_int); }
	| '$<' IDENTIFIER '>' variable_suffix
		{ $$ = $3; }
	;

variable_suffix:
	'$'
	| INT_CONST
	| '-' INT_CONST
	;

Regular expression syntax

CScout allows you to specify regular expressions for specifying identifier or file names you are looking for. The following description of the regular expressions CScout accepts is adapted from the FreeBSD re_format(7) manual page.

Regular expressions (``REs''), as defined in IEEE Std 1003.2 (``POSIX.2''), come in two forms: modern REs (roughly those of egrep(1); 1003.2 calls these ``extended'' REs) and obsolete REs (roughly those of ed(1); 1003.2 ``basic'' REs). CScout has adopted the use of modern (extended) REs.

A (modern) RE is one or more non-empty branches, separated by `|'. It matches anything that matches one of the branches.

A branch is one or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.

A piece is an atom possibly followed by a single `*', `+', `?', or bound. An atom followed by `*' matches a sequence of 0 or more matches of the atom. An atom followed by `+' matches a sequence of 1 or more matches of the atom. An atom followed by `?' matches a sequence of 0 or 1 matches of the atom.

A bound is `{' followed by an unsigned decimal integer, possibly followed by `,' possibly followed by another unsigned decimal integer, always followed by `}'. The integers must lie between 0 and RE_DUP_MAX (255) inclusive, and if there are two of them, the first may not exceed the second. An atom followed by a bound containing one integer i and no comma matches a sequence of exactly i matches of the atom. An atom followed by a bound containing one integer i and a comma matches a sequence of i or more matches of the atom. An atom followed by a bound containing two integers i and j matches a sequence of i through j (inclusive) matches of the atom.

An atom is a regular expression enclosed in `()' (matching a match for the regular expression), an empty set of `()' (matching the null string), a bracket expression (see below), `.' (matching any single character), `^' (matching the null string at the beginning of a line), `$' (matching the null string at the end of a line), a `\' followed by one of the characters `^.[$()|*+?{\' (matching that character taken as an ordinary character), a `\' followed by any other character (matching that character taken as an ordinary character, as if the `\' had not been present), or a single character with no other significance (matching that character). A `{' followed by a character other than a digit is an ordinary character, not the beginning of a bound. It is illegal to end an RE with `\'.

A bracket expression is a list of characters enclosed in `[]'. It normally matches any single character from the list (but see below). If the list begins with `^', it matches any single character (but see below) not from the rest of the list. If two characters in the list are separated by `-', this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. `[0-9]' in ASCII matches any decimal digit. It is illegal for two ranges to share an endpoint, e.g. `a-c-e'. Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.

To include a literal `]' in the list, make it the first character (following a possible `^'). To include a literal `-', make it the first or last character, or the second endpoint of a range. To use a literal `-' as the first endpoint of a range, enclose it in `[.' and `.]' to make it a collating element (see below). With the exception of these and some combinations using `[' (see next paragraphs), all other special characters, including `\', lose their special significance within a bracket expression.

Within a bracket expression, a collating element (a character, a multi-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in `[.' and `.]' stands for the sequence of characters of that collating element. The sequence is a single element of the bracket expression's list. A bracket expression containing a multi-character collating element can thus match more than one character, e.g. if the collating sequence includes a `ch' collating element, then the RE `[[.ch.]]*c' matches the first five characters of `chchcc'.

Within a bracket expression, a collating element enclosed in `[=' and `=]' is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were `[.' and `.]'.) For example, if `x' and `y' are the members of an equivalence class, then `[[=x=]]', `[[=y=]]', and `[xy]' are all synonymous. An equivalence class may not be an endpoint of a range.

Within a bracket expression, the name of a character class enclosed in `[:' and `:]' stands for the list of all characters belonging to that class. Standard character class names are:

   alnum    digit    punct
   alpha    graph    space
   blank    lower    upper
   cntrl    print    xdigit
These stand for the character classes defined in ctype(3). A locale may provide others. A character class may not be used as an endpoint of a range.
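To illustrate character classes, here is a minimal sketch that uses the standard POSIX regcomp and regexec functions to check whether a string has the shape of a C identifier. The function is_c_identifier is a hypothetical helper, not part of CScout; because the program never calls setlocale, the classes follow the default C locale.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if name looks like a C identifier: a letter or underscore
 * followed by letters, digits, or underscores.  [:alpha:] and [:alnum:]
 * are the ctype(3) classes; inside a bracket expression "_" is an
 * ordinary character. */
bool is_c_identifier(const char *name)
{
        regex_t re;

        if (regcomp(&re, "^[[:alpha:]_][[:alnum:]_]*$",
                    REG_EXTENDED | REG_NOSUB) != 0)
                return false;
        bool match = regexec(&re, name, 0, NULL, 0) == 0;
        regfree(&re);
        return match;
}
```

For example, is_c_identifier("_id2") is true, while is_c_identifier("9lives") is false because a digit is not in the first bracket expression.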

There are two special cases of bracket expressions: the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at the beginning and end of a word respectively. A word is defined as a sequence of word characters which is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype(3)) or an underscore. This is an extension, compatible with but not specified by IEEE Std 1003.2 (``POSIX.2''), and should be used with caution in software intended to be portable to other systems.

In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, it matches the longest. Subexpressions also match the longest possible substrings, subject to the constraint that the whole match be as long as possible, with subexpressions starting earlier in the RE taking priority over ones starting later. Note that higher-level subexpressions thus take priority over their lower-level component subexpressions.

Match lengths are measured in characters, not collating elements. A null string is considered longer than no match at all. For example, `bb*' matches the three middle characters of `abbbc', `(wee|week)(knights|nights)' matches all ten characters of `weeknights', when `(.*).*' is matched against `abc' the parenthesized subexpression matches all three characters, and when `(a*)*' is matched against `bc' both the whole RE and the parenthesized subexpression match the null string.
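The leftmost-longest rule can be observed directly through the POSIX regexec interface, which reports the offsets of the overall match in the first regmatch_t element. The sketch below returns those offsets so that the examples above can be checked; match_span is a hypothetical helper name, and it compiles patterns with REG_EXTENDED because the grouping and alternation in the examples are extended RE syntax.

```c
#include <regex.h>

/* Find the leftmost-longest match of pattern in s.  On success store the
 * start and end offsets of the whole match and return 0; return -1 if
 * the pattern does not compile or does not match. */
int match_span(const char *pattern, const char *s, int *start, int *end)
{
        regex_t re;
        regmatch_t m[1];
        int rc = -1;

        if (regcomp(&re, pattern, REG_EXTENDED) != 0)
                return -1;
        if (regexec(&re, s, 1, m, 0) == 0) {
                *start = (int)m[0].rm_so;   /* first matched character */
                *end = (int)m[0].rm_eo;     /* one past the last one */
                rc = 0;
        }
        regfree(&re);
        return rc;
}
```

With this helper, bb* against abbbc yields the span from offset 1 to 4 (the three middle characters), and (a*)* against bc yields the empty span at offset 0.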

If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, e.g. `x' becomes `[xX]'. When it appears inside a bracket expression, all case counterparts of it are added to the bracket expression, so that (e.g.) `[x]' becomes `[xX]' and `[^x]' becomes `[^xX]'.
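With the POSIX interface, case-independent matching is requested by passing REG_ICASE to regcomp. The sketch below (matches is a hypothetical function name) lets the [x] and [^x] cases above be compared with and without that flag.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if pattern matches somewhere in s.  When ignore_case is
 * set, REG_ICASE makes "x" behave like "[xX]" and "[^x]" like "[^xX]". */
bool matches(const char *pattern, const char *s, bool ignore_case)
{
        regex_t re;
        int flags = REG_EXTENDED | REG_NOSUB | (ignore_case ? REG_ICASE : 0);

        if (regcomp(&re, pattern, flags) != 0)
                return false;
        bool found = regexec(&re, s, 0, NULL, 0) == 0;
        regfree(&re);
        return found;
}
```

For instance, matches("^[^x]$", "X", false) is true, but with ignore_case set it becomes false, because [^x] then excludes both cases.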

IDE and Makefile integration

It is relatively easy to integrate CScout into an existing IDE (such as Eclipse), or to provide an alternative method for specifying workspaces by directly creating a raw compiled workspace file from existing Makefiles. A compiled workspace file is a C file containing a number of #pragma preprocessor directives. CScout uses the following pragmas:
#pragma echo "STRING"
Will display the STRING on CScout's standard output when that part of the file is reached.

Example:

#pragma echo "Processing workspace date\n"
#pragma ro_prefix "STRING"
Will add STRING to the list of filename prefixes that mark read-only files. This is a global setting.

Example:

#pragma ro_prefix "C:\gcc"
#pragma project "STRING"
Will set the current project to STRING. All identifiers and files processed from then on will belong to the given project.

Example:

#pragma project "date"
#pragma block_enter
Will enter a nested scope block. Two levels of nesting are supported: the first block_enter will enter the project scope (linkage unit); a second, nested block_enter will enter the file scope (compilation unit).

#pragma block_exit
Will exit a nested scope block. The number of block_enter pragmas should match the number of block_exit pragmas and there should never be more than two block_enter pragmas in effect.

#pragma process "STRING"
Will analyze (CScout's equivalent to compiling) the C source file named STRING.

Example:

#pragma process "date.c"
#pragma pushd "STRING"
Will set the current directory to STRING, saving the previous current directory in a stack. From that point onward, all relative file accesses will start from the given directory.

Example:

#pragma pushd "cp"
#pragma popd
Will restore the current directory to the one in effect before a previously pushed directory. The number of pushd pragmas should match the number of popd pragmas.

#pragma includepath "STRING"
Will add STRING to the list of directories used for searching included files (the include path).

Example:

#pragma includepath "/usr/lib/gcc-lib/i386-redhat-linux/2.96/include"
#pragma clear_include
Will clear the include path, allowing the specification of a new one.

#pragma clear_defines
Will clear all defined macros, allowing the specification of new ones. This should normally be executed before processing a new file. Note that macros can be defined using the normal #define C preprocessor directive.
The following is a complete example of a CScout compiled workspace definition.
// workspace bin
#pragma echo "Processing workspace bin\n"
#pragma ro_prefix "/usr/include"
#pragma echo "Entering directory /usr/src/bin"
#pragma pushd "/usr/src/bin"
// project date
#pragma echo "Processing project date\n"
#pragma project "date"
#pragma block_enter
#pragma echo "Entering directory date"
#pragma pushd "date"
// file date.c
#pragma echo "Processing file date.c\n"
#pragma block_enter
#pragma clear_defines
#pragma clear_include
#include "/home/dds/src/cscout/cscout_defs.h"
#include "/home/dds/src/cscout/cscout_incs.h"
#pragma process "date.c"
#pragma block_exit
#pragma echo "Done processing file date.c\n"
#pragma echo "Exiting directory date\n"
#pragma popd
#pragma block_exit
#pragma echo "Done processing project date\n"
#pragma echo "Exiting directory /usr/src/bin\n"
#pragma popd
#pragma echo "Done processing workspace bin\n"
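Because a miscounted block_enter or popd may only show up late in a long run, it can be worth checking the balance rules given above before invoking CScout. The following is a minimal sketch of such a check. check_workspace is a hypothetical helper, not part of CScout; it assumes each pragma sits on its own line with a single space after #pragma, and it does not follow #include directives.

```c
#include <string.h>

/* Return nonzero if s starts with prefix. */
static int starts_with(const char *s, const char *prefix)
{
        return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Check the nesting rules of a compiled workspace held in text.
 * Returns 0 if the script obeys them, the 1-based number of the first
 * line that breaks them, or -1 if a block or pushd is still open at the
 * end.  Pragmas reached through #include are not examined. */
int check_workspace(const char *text)
{
        int blocks = 0, dirs = 0, line = 1;
        const char *p = text;

        while (*p != '\0') {
                const char *q = p;
                while (*q == ' ' || *q == '\t')         /* skip indentation */
                        q++;
                if (starts_with(q, "#pragma block_enter")) {
                        if (++blocks > 2)               /* deeper than file scope */
                                return line;
                } else if (starts_with(q, "#pragma block_exit")) {
                        if (--blocks < 0)               /* exit without enter */
                                return line;
                } else if (starts_with(q, "#pragma pushd")) {
                        dirs++;
                } else if (starts_with(q, "#pragma popd")) {
                        if (--dirs < 0)                 /* popd without pushd */
                                return line;
                }
                const char *nl = strchr(p, '\n');       /* next line, if any */
                if (nl == NULL)
                        break;
                p = nl + 1;
                line++;
        }
        return (blocks != 0 || dirs != 0) ? -1 : 0;
}
```

Applied to the complete example above, this check returns 0.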

Case Study: Processing the FreeBSD Kernel

As a further example, consider the way CScout was applied to the FreeBSD kernel, processing five different architectures. These are the steps to follow:
  1. Configure a LINT or GENERIC version of each architecture's kernel.
    Example: config LINT
  2. Go to the compilation directory, update the dependencies (make depend) and compile (make). This step is used to create all automatically generated C and header files. During this step also note the include path used, in order to provide CScout with the same specification.
  3. Remove the generated object files to force a make operation to rebuild them (rm *.o).
  4. Replace the C compiler invocation command in the Makefile with an appropriate series of shell commands.
    .include "$S/conf/kern.pre.mk"
    # The code below was added after the line above
    NORMAL_C= echo '\#pragma echo "Processing file ${.IMPSRC}\n"' >>kernel.cs ;\
            echo '\#pragma block_enter' >>kernel.cs ;\
            echo '\#pragma clear_defines' >>kernel.cs ;\
            echo '\#pragma clear_include' >>kernel.cs ;\
            echo '\#include "cscout_defs.h"' >>kernel.cs ;\
            echo '\#pragma includepath "."' >>kernel.cs ;\
            echo '\#pragma includepath "../../.."' >>kernel.cs ;\
            echo '\#pragma includepath "../../../dev"' >>kernel.cs ;\
            echo '\#pragma includepath "../../../contrib/dev/acpica"' >>kernel.cs ;\
            echo '\#pragma includepath "../../../contrib/ipfilter"' >>kernel.cs ;\
            echo '\#pragma includepath "../../../contrib/dev/ath"' >>kernel.cs ;\
            echo '\#pragma includepath "../../../contrib/dev/ath/freebsd"' >>kernel.cs ;\
            echo '\#define _KERNEL 1' >>kernel.cs ;\
            echo '\#pragma process "opt_global.h"' >>kernel.cs ;\
            echo '\#pragma process "${.IMPSRC}"' >>kernel.cs ;\
            echo '\#pragma block_exit' >>kernel.cs ;\
            echo '\#pragma echo "Done processing file ${.IMPSRC}\n"' >>kernel.cs
    
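  5. Create a cscout_incs.h file for each different architecture. The existing file documents the way to do it.
  6. Remove kernel.cs. The NORMAL_C rule appends (>>) to this file, so output left over from an earlier run would otherwise remain in it.
  7. Run make on the custom Makefile.
  8. Repeat for each different architecture.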
  9. Create a top-level CScout file to process all architectures:
    #pragma echo "Processing workspace FreeBSD kernel\n"
    
    #pragma echo "Entering directory sys/i386/compile/LINT\n"
    #pragma pushd "sys/i386/compile/LINT"
    #pragma echo "Processing project i386\n"
    #pragma project "i386"
    #pragma block_enter
    #include "kernel.cs"
    #pragma echo "Exiting directory sys/i386/compile/LINT\n"
    #pragma popd
    #pragma echo "Done processing project i386\n"
    #pragma block_exit
    
    #pragma echo "Entering directory sys/amd64/compile/GENERIC\n"
    // [...]
    // and so on for all architectures
    // [...]
    #pragma echo "Exiting directory sys/sparc64/compile/LINT\n"
    #pragma popd
    #pragma echo "Done processing project sparc64\n"
    #pragma block_exit
    
    Note that the project-level block_enter and block_exit pragmas are furnished by this top-level file; kernel.cs contains only the file-level ones.
Running the above specification (2 million unique lines) took 330 CPU minutes on a Rioworks HDAMA (AMD64) machine (two 1.8GHz Opteron 244 processors running in UP mode, AMD 8111/8131 chipset, 8192MB of memory) and required 1474MB of RAM. These are the complete metrics:


File Metrics

Writable Files
Number of files: 4310

File metric                          Total  Min      Max    Avg
Number of characters              62505770    0  1008345  14502
Comment characters                15921752    0    85059   3694
Space characters                   7936401    0    73968   1841
Number of line comments                 19    0        4      0
Number of block comments            176253    0     4337     40
Number of lines                    2063096    0    27336    478
Length of longest line              337049    0     1867     78
Number of C strings                 132519    0    19296     30
Number of defined functions          29584    0      333      6
Number of preprocessor directives   267542    0    27336     62
Number of directly included files    35408    0     1608      8
Number of C statements              679825    0     4465    157
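The Avg column above is consistent with the total divided by the 4310 writable files using integer (truncating) division: 2063096 / 4310 is about 478.7 and the table shows 478, while 176253 / 4310 is about 40.9 and the table shows 40. The helper below (metric_average is a hypothetical name, not a CScout function) reproduces that calculation.

```c
/* Per-file average of a metric as it appears in the Avg column:
 * total divided by the number of files, truncated toward zero.
 * Returns 0 when there are no files. */
long metric_average(long total, long nfiles)
{
        return nfiles > 0 ? total / nfiles : 0;
}
```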

Access control

The unsupported version of CScout allows any machine from the Internet to connect to your server for casual browsing. Operations requiring substantial CPU resources, or operations that will modify source files or will change the CScout execution environment can only be performed from the local host.

The supported version of CScout supports an access control list. The list is specified in a file called cscout_acl, which should be located in $CSCOUT_HOME, $HOME/.cscout, or .cscout in the current directory. The list contains lines with numeric IP addresses preceded by an A (allow) or D (deny) prefix and a space. Matching is performed by comparing a substring of a machine's IP address against the specified access rule. Thus an entry such as

A 128.135.11.
can be used to allow access from a whole subnet. Unfortunately, allowing access from the IP address 192.168.1.1 will also allow access from 192.168.1.10, 192.168.1.100, and so on. Allow and deny entries cannot be combined in a useful manner, since the rules followed are: Thus you will either specify a restricted list of allowed hosts, or allow access to the world, specifying a list of denied hosts.
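The sketch below illustrates the subnet behavior and the pitfall described above, under the assumption that the substring comparison is a prefix test of the address string against the address part of an entry. acl_entry_matches is a hypothetical name, and the sketch does not reproduce how CScout combines allow and deny entries.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the textual IP address addr is covered by an ACL entry
 * whose address part is pattern, assuming a plain prefix comparison.
 * A trailing "." limits a subnet entry, but nothing limits a full
 * address, so "192.168.1.1" also covers "192.168.1.10". */
bool acl_entry_matches(const char *pattern, const char *addr)
{
        return strncmp(addr, pattern, strlen(pattern)) == 0;
}
```

Under this assumption the entry 128.135.11. covers 128.135.11.42 but not 128.135.110.1, because the trailing dot must match.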

Shortcomings

The nature of the C language and its preprocessor can result in pathological cases that can confuse the CScout analysis and substitution engine. In all cases the confusion only results in erroneous analysis or substitutions of the particular identifiers and will not affect other parts of the code. In some cases you can even slightly modify your workspace definition or code to ensure CScout works as you intend. The following cases are the most important:
  1. Conditional compilation

    Some programs have parts of them compiled under conditional preprocessor directives. Consider the following example:

    #ifdef unix
    #include <unistd.h>
    #define erase_file(x) unlink(x)
    #endif
    
    #ifdef WIN32
    #include <windows.h>
    #define erase_file(x) DeleteFile(x)
    #endif
    
    main(int argc, char *argv[])
    {
            erase_file(argv[1]);
    }
    
    As humans, we can see that erase_file occurs three times within the file. However, because CScout preprocesses the file following the C preprocessor semantics, it will typically match only two instances. In some cases you can get around this problem by defining macros that will ensure that all code inside conditional directives gets processed. In other cases this will result in errors (e.g. a duplicate macro definition in the above example). In such cases you can include the same project in your workspace multiple times, each time with a different set of defined macros.
    workspace example {
    	project idtest {
    		define DEBUG 1
    		define TEST 1
    		file idtest.c util.c
    	}
    	project idtest2 {
    		define NDEBUG 1
    		define PRODUCTION
    		file idtest.c util.c
    	}
    }
    
  2. Partial coverage of macro use

    Consider the following example:

    struct s1 {
            int id;
    } a;
    
    struct s2 {
            char id;
    } b;
    
    struct s3 {
            double id;
    } c;
    
    #define getid(x) ((x).id)
    
    main()
    {
            printf("%d %c", getid(a), getid(b));
    }
    
    In the above example, changing an id instance should also change the other three instances. However, CScout will not associate the member of s3 with the identifier appearing in the getid macro or the s1 or s2 structures, because there is no getid macro invocation on c to link them together. If, e.g., id is replaced with val, the program will compile and function correctly, but when one later tries to access the c structure's member using getid, an error will result.
    struct s1 {
            int val;
    } a;
    
    struct s2 {
            char val;
    } b;
    
    struct s3 {
            double id;
    } c;
    
    #define getid(x) ((x).val)
    
    main()
    {
            printf("%d %c", getid(a), getid(b));    /* OK */
            printf(" %g", getid(c));                /* New statement: error */
    }
    
    To avoid this (rare) problem you can introduce dummy macro invocations of the form:
    #ifdef CSCOUT
            (void)getid(c);
    #endif
    
  3. Undefined macros

    We employ a heuristic classifying all instances of an undefined macro as being the same identifier. Thus in the following sequence foo will match all three macro instances:

    #undef foo
    
    #ifdef foo
    #endif
    
    #ifdef foo
    #endif
    
    #define foo 1
    
    In most cases this is what you want, but there may be cases where the macro appears in different files and with a different meaning. In such cases the undefined instances of the macro will erroneously match the defined instance.
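Returning to the partial coverage example (item 2 above), the dummy invocation has to appear inside a function body to be valid C. The sketch below shows one possible placement. It assumes that the CSCOUT macro is defined only while CScout processes the file (for example through a define in the workspace definition), and the function names print_ids and cscout_dummy_uses are hypothetical. A normal compilation never sees the dummy function, so the program's behavior is unchanged.

```c
#include <stdio.h>

struct s1 {
        int id;
} a;

struct s2 {
        char id;
} b;

struct s3 {
        double id;
} c;

#define getid(x) ((x).id)

/* The program's real uses of getid, on a and b only. */
void print_ids(void)
{
        printf("%d %c", getid(a), getid(b));
}

#ifdef CSCOUT
/* Compiled only when CSCOUT is defined: the getid(c) invocation lets
 * CScout link the id member of s3 with the id in the macro body, and
 * through it with the members of s1 and s2. */
void cscout_dummy_uses(void)
{
        (void)getid(c);
}
#endif
```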

Error messages

Warnings

Fatal Errors

Errors

License

The free unsupported CScout version is distributed under the CScout Public License. It allows free use of CScout for analyzing and modifying free/Open Source software.

For using CScout on non-free/proprietary software, the CScout supported version and its associated license are available. The supported version comes with a normal commercial software license, without any of the special restrictions of the CScout Public License.


THE CSCOUT PUBLIC LICENSE version 1.0

Copyright (C) 2003 Diomidis Spinellis, Athens, Greece.
Everyone is permitted to copy and distribute this license document.

The intent of this license is to establish freedom to use, share, and change the software regulated by this license under the open source model.

This license applies to any software containing a notice placed by the copyright holder saying that it is covered by the terms of the CSCOUT Public License version 1.0. Such software is herein referred to as the Software. This license covers modification and distribution of the Software and the use of the Software for the development and maintenance of free software.

Granted Rights

1. You are granted the non-exclusive rights set forth in this license provided you agree to and comply with any and all conditions in this license. Whole or partial distribution or use of the Software in any form or way signifies acceptance of this license.

2. You may copy and distribute the Software in unmodified form provided that the entire package, including - but not restricted to - copyright, trademark notices and disclaimers, as released by the initial developer of the Software, is distributed under this license.

3. You may make modifications to the Software's source code and distribute your modifications, in a form that is separate from the Software, such as patches. The following restrictions apply to modifications:

a. Modifications must not alter or remove any copyright notices in the Software.

b. When modifications to the Software are released under this license, a non-exclusive royalty-free right is granted to the initial developer of the Software to distribute your modification in future versions of the Software provided such versions remain available under these terms in addition to any other license(s) of the initial developer.

c. The machine-executable (compiled) parts of the Software shall not be modified.

4. You may use the original or modified versions of the Software to analyze and modify application programs, libraries, or other software legally developed by you or by others provided that when these items are distributed in any form you satisfy the following requirements:

a. You must ensure that all recipients of machine-executable forms of these items are also able to receive and use the complete machine-readable source code to the items without any charge beyond the costs of data transfer.

b. You must explicitly license all recipients of your items to use and re-distribute original and modified versions of the items in both machine-executable and source code forms. The recipients must be able to do so without any charges whatsoever, and they must be able to re-distribute to anyone they choose.

c. If the items are not available to the general public, and the initial developer of the Software requests a copy of the items, then you must supply one.

5. You acknowledge and accept the fact that the Software may contain technical measures to enforce parts of this license (such as providing the public with a browsable version of the code you are analyzing, and the transmission of workspace-related data) and agree not to interfere with these measures.

Limitations of Liability

In no event shall the initial developers or copyright holders be liable for any damages whatsoever, including - but not restricted to - lost revenue or profits or other direct, indirect, special, incidental or consequential damages, even if they have been advised of the possibility of such damages, except to the extent invariable law, if any, provides otherwise.

No Warranty

THE SOFTWARE AND ITS DOCUMENTATION ARE PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF DESIGN, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.

Choice of Law

This license is governed by the Laws of Greece. Disputes shall be settled by the Courts of Athens.


Differences between the unsupported free version and the supported version: license and software

Free / Unsupported Version | Supported Version
Distributed under the CScout Public License | Distributed under a commercial software license and a support contract
Shall only be used on open source software | Can be used on proprietary software
Unsupported | Includes 8 hours of email-based installation and configuration support and two years of free software updates
After a large workspace is processed, the workspace is registered for public browsing at CScout's Web site. Project metrics are sent to the Web site and recorded for statistical processing. | Web communication is only performed for validating the software's license. No details other than the host's name and IP address are communicated.
The entire Internet is allowed read-only access to the CScout server. | Access is regulated by a fully configurable access control list, defaulting to localhost-only access.
Only users on the local host are allowed read-write access to the server. | Read-write access through specified remote hosts is possible.
Not included | Includes C source obfuscation back-end.

Frequently asked questions

How do I handle conditional compilation?

You can either define macros that will cover all conditional cases, or process the same project multiple times using different macro definitions. See the discussion of conditional compilation in the Shortcomings section.

What details does CScout send when calling its home base?

Your name, address, and credit card number :-)

Seriously, the data sent to our server consists of

The authentication system is really a child's toy; even an idiot would be able to hack it, so don't bother.

How can I save an identifier or file query?

Simply bookmark the page that shows the query's results. You can even pass the URL around or print it on a T-shirt; the URL contains the whole query.

What do I do with automatically generated files?

Some projects use mini domain-specific languages, similar to yacc and lex, to express some of their elements. CScout can natively parse C and yacc source files, but no other language. Obviously, changes should be performed in the original domain-specific files, rather than in the generated C code. On the other hand, CScout cannot parse the original files, but can parse the generated code. To escape this situation, include the automatically generated file in your workspace definition, but define it as read-only. In this way CScout will not allow you to modify identifiers appearing in it.

The cscout command manual page


NAME

cscout - C code analyzer and refactoring browser

SYNOPSIS

cscout [-cErv] [-p port] [-m specification] file

DESCRIPTION

CScout is a source code analyzer and refactoring browser for collections of C programs. It can process workspaces of multiple projects (we define a project as a collection of C source files that are linked together) mapping the complexity introduced by the C preprocessor back into the original C source code files. CScout takes advantage of modern hardware advances (fast processors and large memory capacities) to analyze C source code beyond the level of detail and accuracy provided by current compilers and linkers. The analysis CScout performs takes into account the identifier scopes introduced by the C preprocessor and the C language proper scopes and namespaces.

CScout as a source code analyzer can:

annotate source code with hyperlinks to each identifier

list files that would be affected by changing a specific identifier

determine whether a given identifier belongs to the application or to an external library based on the accessibility and location of the header files that declare or define it

locate unused identifiers taking into account inter-project dependencies

perform queries for identifiers based on their namespace, scope, reachability, and regular expressions of their name and the filename(s) they are found in

perform queries for files, based on their metrics, or properties of the identifiers they contain

monitor and report superfluously included header files

provide accurate metrics on identifiers and files

More importantly, CScout helps you in refactoring code by identifying dead objects to remove, and can automatically perform accurate global rename identifier refactorings. CScout will automatically rename identifiers

taking into account the namespace of each identifier: a renaming of a structure tag, member, or a statement label will not affect variables with the same name

respecting the scope of the renamed identifier: a rename can affect multiple files, or variables within a single block, exactly matching the semantics the C compiler would enforce

across multiple projects when the same identifier is defined in common shared include files

occurring in macro bodies and parts of other identifiers, when these are created through the C preprocessor’s token concatenation feature

This manual page describes the CScout invocation and command-line options. Details about its web interface, setup, and configuration can be found in the online hypertext documentation and at the project’s home page http://www.spinellis.gr/cscout.

OPTIONS

-c

Exit immediately after processing the specified files. Useful when you simply want to check the source code for errors.

-E

Preprocess the specified file and send the result to the standard output. Note that for this option to work correctly, you also need to compile the workspace definition file with cswc -E.

-p port

The web server will listen for requests on the TCP port number specified. By default the CScout server will listen at port 8081. The port number must be in the range 1024-32767.

-m specification

Specify the type of identifiers that CScout will monitor. The identifier attribute specification is given using the syntax: Y|L|E|T[:attr1][:attr2]... The meaning of the first letter is:

Y:

Match any of the specified attributes

L:

Match all of the specified attributes

E:

Exclude identifiers that match the specified attributes

T:

Exact match of the specified attributes

Allowable attribute names and their corresponding meanings are:

unused:

Unused identifier

writable:

Writable identifier

ro:

Read-only identifier

tag:

Tag for a struct/union/enum

member:

Member of a struct/union

label:

Label

obj:

Ordinary identifier (note that enumeration constants and typedefs belong to the ordinary identifier namespace)

macro:

Preprocessor macro

umacro:

Undefined preprocessor macro

macroarg:

Preprocessor macro argument

fscope:

Identifier with file scope

pscope:

Identifier with project scope

typedef:

Typedef

enumconst:

Enumeration constant

The -m flag can provide enormous savings on the memory CScout uses (specify e.g. -m Y:pscope to only track project-global identifiers), but the processing CScout performs under this flag is unsound. The flag should therefore be used only if you are running short of memory. There are cases where the use of preprocessor macros can change the attributes of a given identifier shared between different files. Since the -m optimization is performed after each single file is processed, the locations where an identifier is found may be misrepresented.

-r

Report on the standard error output warnings about unused and wrongly scoped identifiers and unused included files. The error message format is compatible with gcc and can therefore be automatically processed by editors that recognize this format.

-v

Display the CScout version and copyright information and exit.
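One way to read the -m specification is as a bit mask of attributes combined with a matching mode. The sketch below encodes that reading. The enum values, the struct spec type, and the parse_spec and spec_matches functions are hypothetical; in particular, the interpretations of E (exclude identifiers having any listed attribute) and T (the identifier's attributes equal the listed set exactly) are assumptions, not a description of CScout's internals.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical attribute bits, one per -m attribute name. */
enum attr {
        A_UNUSED = 1 << 0, A_WRITABLE = 1 << 1, A_RO = 1 << 2,
        A_TAG = 1 << 3, A_MEMBER = 1 << 4, A_LABEL = 1 << 5,
        A_OBJ = 1 << 6, A_MACRO = 1 << 7, A_UMACRO = 1 << 8,
        A_MACROARG = 1 << 9, A_FSCOPE = 1 << 10, A_PSCOPE = 1 << 11,
        A_TYPEDEF = 1 << 12, A_ENUMCONST = 1 << 13
};

/* A parsed specification: the mode letter and the listed attributes. */
struct spec {
        char mode;      /* 'Y', 'L', 'E', or 'T' */
        unsigned mask;  /* attributes listed after the mode letter */
};

static const struct {
        const char *name;
        unsigned bit;
} attr_names[] = {
        {"unused", A_UNUSED}, {"writable", A_WRITABLE}, {"ro", A_RO},
        {"tag", A_TAG}, {"member", A_MEMBER}, {"label", A_LABEL},
        {"obj", A_OBJ}, {"macro", A_MACRO}, {"umacro", A_UMACRO},
        {"macroarg", A_MACROARG}, {"fscope", A_FSCOPE},
        {"pscope", A_PSCOPE}, {"typedef", A_TYPEDEF},
        {"enumconst", A_ENUMCONST},
};

/* Parse a specification such as "Y:pscope" or "L:writable:obj".
 * Returns false for an unknown mode letter or attribute name. */
bool parse_spec(const char *s, struct spec *sp)
{
        if (s[0] == '\0' || strchr("YLET", s[0]) == NULL)
                return false;
        sp->mode = s[0];
        sp->mask = 0;
        const char *p = s + 1;
        while (*p == ':') {
                const char *name = p + 1;
                size_t len = strcspn(name, ":");   /* length of this name */
                bool known = false;
                for (size_t i = 0; i < sizeof attr_names / sizeof attr_names[0]; i++) {
                        if (strlen(attr_names[i].name) == len &&
                            strncmp(attr_names[i].name, name, len) == 0) {
                                sp->mask |= attr_names[i].bit;
                                known = true;
                                break;
                        }
                }
                if (!known)
                        return false;
                p = name + len;
        }
        return *p == '\0';
}

/* Decide whether an identifier whose attributes are attrs is monitored. */
bool spec_matches(const struct spec *sp, unsigned attrs)
{
        switch (sp->mode) {
        case 'Y': return (attrs & sp->mask) != 0;         /* any listed */
        case 'L': return (attrs & sp->mask) == sp->mask;  /* all listed */
        case 'E': return (attrs & sp->mask) == 0;         /* none (assumed) */
        case 'T': return attrs == sp->mask;               /* exact (assumed) */
        default:  return false;
        }
}
```

Under this reading, -m Y:pscope monitors every identifier with project scope, whatever its other attributes.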

EXAMPLE

Assume you want to analyze three programs in /usr/src/bin. You first create the following workspace definition file, bin.prj.

# Some small tools from the src/bin directory
workspace bin {
        ro_prefix "/usr/include"
        cd "/usr/src/bin"
        project cp {
                cd "cp"
                file cp.c utils.c
        }
        project echo {
                cd "echo"
                file echo.c
        }
        project date {
                cd "date"
                file date.c
        }
}

Then you compile the workspace file bin.prj by running the CScout workspace compiler cswc on it, and finally you run cscout on the compiled workspace file. At that point you are ready to analyze your code and rename its identifiers through your web browser.

$ cswc bin.prj >bin.cs
$ cscout bin.cs
Processing workspace bin
Entering directory /usr/src/bin
Processing project cp
Entering directory cp
Processing file cp.c
Done processing file cp.c
Processing file utils.c
Done processing file utils.c
Exiting directory cp
Done processing project cp
Processing project echo
Entering directory echo
Processing file echo.c
Done processing file echo.c
Exiting directory echo
Done processing project echo
Processing project date
Entering directory date
Processing file date.c
Done processing file date.c
Exiting directory date
Done processing project date
Exiting directory /usr/src/bin
Done processing workspace bin
Post-processing /usr/home/dds/src/cscout/bin.c
[...]
Post-processing /vol/src/bin/cp/cp.c
Post-processing /vol/src/bin/cp/extern.h
Post-processing /vol/src/bin/cp/utils.c
Post-processing /vol/src/bin/date/date.c
Post-processing /vol/src/bin/date/extern.h
Post-processing /vol/src/bin/date/vary.h
Post-processing /vol/src/bin/echo/echo.c
Processing identifiers
100%
We are now ready to serve you at http://localhost:8081

SEE ALSO

cswc(1)

AUTHOR

(C) Copyright 2003 Diomidis Spinellis.


The cswc command manual page


NAME

cswc - CScout workspace compiler

SYNOPSIS

cswc [-vE] [-d directory] [file]

DESCRIPTION

cswc is a workspace compiler for the CScout C source code analyzer and refactoring browser. CScout integrates in a single process the functionality of a multi-project build engine, an ANSI C preprocessor, and the parts of a C compiler up to and including the semantic analysis based on types. The build engine functionality is required to allow the user to process multiple compilation and link units as a single batch. Only thus can CScout detect dependencies across different files and projects. Each compilation unit can reside in a different directory and can require processing using different macro definitions or a different include file path. In a normal build process these options are typically specified in a Makefile. The CScout operation is similarly guided by a declarative workspace definition file. To decouple the complexity of the CScout workspace processing specification from its actual operation, and to encourage experimentation with alternative (e.g. IDE-based) workspace specification mechanisms, CScout is guided by a very simple imperative script, typically generated from more sophisticated workspace definitions by cswc, the CScout workspace compiler.
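To make the compilation step concrete, the sketch below turns a small in-memory workspace into the kind of pragma script shown in the compiled workspace example earlier in this document. The struct project type and the compile_workspace function are hypothetical, and the output is simplified: unlike that example, it omits the echo pragmas, the cscout_defs.h and cscout_incs.h includes, and any macro definitions or include paths.

```c
#include <stdio.h>

/* Hypothetical in-memory form of one project in a workspace. */
struct project {
        const char *name;
        const char *dir;              /* relative to the workspace directory */
        const char *const *files;     /* NULL-terminated list of C files */
};

/* Emit a simplified imperative script: one project-level block per
 * project, and inside it one file-level block per file. */
void compile_workspace(FILE *out, const char *ro_prefix, const char *root,
                       const struct project *p, size_t nproj)
{
        fprintf(out, "#pragma ro_prefix \"%s\"\n", ro_prefix);
        fprintf(out, "#pragma pushd \"%s\"\n", root);
        for (size_t i = 0; i < nproj; i++) {
                fprintf(out, "#pragma project \"%s\"\n", p[i].name);
                fprintf(out, "#pragma block_enter\n");      /* project scope */
                fprintf(out, "#pragma pushd \"%s\"\n", p[i].dir);
                for (const char *const *f = p[i].files; *f != NULL; f++) {
                        fprintf(out, "#pragma block_enter\n");  /* file scope */
                        fprintf(out, "#pragma clear_defines\n");
                        fprintf(out, "#pragma clear_include\n");
                        fprintf(out, "#pragma process \"%s\"\n", *f);
                        fprintf(out, "#pragma block_exit\n");
                }
                fprintf(out, "#pragma popd\n");
                fprintf(out, "#pragma block_exit\n");
        }
        fprintf(out, "#pragma popd\n");
}
```

For a single project cp with the files cp.c and utils.c, this emits 18 lines, with each file-level block nested inside the project block as in the earlier example.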

This manual page describes the cswc invocation and command-line options. Details about its input and output formats, setup, and configuration can be found in the online hypertext documentation and at the project’s home page http://www.spinellis.gr/cscout.

OPTIONS

-E

Generate a modified CScout script that will be used by CScout to preprocess the specified file and send the result to the standard output. Note that for this option to work correctly, you need to also specify -E in the CScout invocation.

-d directory

Specify the directory to use for locating the CScout configuration files.

-v

Display the cswc version and copyright information and exit.

EXAMPLE

The following is a workspace definition file used for processing the Apache web server.

workspace apache {
    cd "/usr/local/src/apache/src"

    ro_prefix "/usr/local/src/apache/src/include/ap_config"

    # Global project definitions
    define HTTPD_ROOT "/usr/local/apache"
    define SUEXEC_BIN "/usr/local/apache/bin/suexec"
    define SHARED_CORE_DIR "/usr/local/apache/libexec"
    define DEFAULT_PIDLOG "logs/httpd.pid"
    define DEFAULT_SCOREBOARD "logs/httpd.scoreboard"
    define DEFAULT_LOCKFILE "logs/httpd.lock"
    define DEFAULT_XFERLOG "logs/access_log"
    define DEFAULT_ERRORLOG "logs/error_log"
    define TYPES_CONFIG_FILE "conf/mime.types"
    define SERVER_CONFIG_FILE "conf/httpd.conf"
    define ACCESS_CONFIG_FILE "conf/access.conf"
    define RESOURCE_CONFIG_FILE "conf/srm.conf"

    define AUX_CFLAGS
    define LINUX 22
    define USE_HSREGEX
    define NO_DL_NEEDED

    # Give project-specific directory and include path properties
    project gen_uri_delims {
        cd "main"
        ipath "../os/unix"
        ipath "../include"
        file gen_uri_delims.c
    }

    # Alternative formulation; specify per-file properties
    project gen_test_char {
        file gen_test_char.c {
            cd "main"
            ipath "../os/unix"
            ipath "../include"
        }
    }

    # httpd executable; specify directory-based properties
    project httpd {
        directory main {
            ipath "../os/unix"
            ipath "../include"
            file alloc.c buff.c http_config.c http_core.c
            file http_log.c http_main.c http_protocol.c
            file http_request.c http_vhost.c util.c util_date.c
            file util_script.c util_uri.c util_md5.c rfc1413.c
        }
        directory regex {
            ipath "."
            ipath "../os/unix"
            ipath "../include"
            define POSIX_MISTAKE
            file regcomp.c regexec.c regerror.c regfree.c
        }
        directory os/unix {
            ipath "../../os/unix"
            ipath "../../include"
            file os.c os-inline.c
        }
        directory ap {
            ipath "../os/unix"
            ipath "../include"
            file ap_cpystrn.c ap_execve.c ap_fnmatch.c ap_getpass.c
            file ap_md5c.c ap_signal.c ap_slack.c ap_snprintf.c
            file ap_sha1.c ap_checkpass.c ap_base64.c ap_ebcdic.c
        }
        directory modules/standard {
            ipath "../../os/unix"
            ipath "../../include"
            file mod_env.c mod_log_config.c mod_mime.c
            file mod_negotiation.c mod_status.c mod_include.c
            file mod_autoindex.c mod_dir.c mod_cgi.c mod_asis.c
            file mod_imap.c mod_actions.c mod_userdir.c
            file mod_alias.c mod_access.c mod_auth.c mod_setenvif.c
        }
        directory . {
            ipath "./os/unix"
            ipath "./include"
            file modules.c buildmark.c
        }
    }
}

SEE ALSO

cscout(1)

AUTHOR

(C) Copyright 2003 Diomidis Spinellis.


Change history

Version 1.16
Version 1.15
Version 1.14
Version 1.13
Version 1.12
Version 1.10
Version 1.9
Version 1.8
First public release

Last change: Wednesday, August 27, 2003 4:52 pm
(C) Copyright 2000-2003 Diomidis Spinellis. May be freely viewed using web browsers and similar programs. All other rights reserved.