3. <instr>/instr

<instr>/instr — offline class instrumentor.

3.1. Description

<instr>/instr is EMMA's offline class instrumentor. It adds bytecode instrumentation to all classes found in an instrumentation path that also pass through user-provided coverage filters. Additionally, it produces the class metadata file necessary for associating runtime coverage data with the original class definitions during coverage report generation.

Instrumentation path. The classes to be instrumented are taken from a path structure just like the classpath accepted by standard JDK tools and ANT tasks: a list of directories (containing .class files) and .jar/.zip archives, specified through any number of instrpath (-ip) options. Non-existent and duplicate entries in the instrumentation path are ignored during processing.
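A minimal Java sketch of the path normalization just described, assuming that "duplicate" means "same canonical pathname" (the definition used in the metadata merging rules later in this section). `InstrPathNormalizer` and `normalize` are hypothetical names, not part of EMMA's API.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: cleans up an instrumentation path as described above. */
final class InstrPathNormalizer {
    private InstrPathNormalizer() {}

    /**
     * Keeps entries in left-to-right order, drops entries that do not exist,
     * and drops any later entry whose canonical path has already been seen.
     */
    static List<File> normalize(List<String> rawEntries) throws IOException {
        Set<String> seen = new HashSet<>();
        List<File> result = new ArrayList<>();
        for (String raw : rawEntries) {
            File f = new File(raw);
            if (!f.exists()) {
                continue; // non-existent entries are ignored
            }
            String canonical = f.getCanonicalPath();
            if (seen.add(canonical)) { // add() returns false for a duplicate
                result.add(new File(canonical));
            }
        }
        return result;
    }
}
```

For example, `dir`, `dir/.` and a second `dir` collapse into a single entry, and a missing `dir/no-such.jar` is skipped.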

Class-Path manifest entries

Note that <instr>/instr also processes Class-Path entries in the manifests of the class archives it encounters. This is by design and is the desired behavior (especially in the overwrite and fullcopy output modes), but take care to avoid processing unintended implicit path segments.
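To see where such implicit path segments come from, the following sketch reads the Class-Path attribute of a JAR manifest with the standard java.util.jar API. `ManifestClassPath` is a hypothetical helper, and resolving each token as a plain relative file path is a simplification: the JAR specification defines Class-Path values as space-separated relative URLs.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: lists the implicit path entries a JAR manifest adds. */
final class ManifestClassPath {
    private ManifestClassPath() {}

    static List<File> implicitEntries(File jar) throws IOException {
        List<File> entries = new ArrayList<>();
        try (JarFile jf = new JarFile(jar)) {
            Manifest mf = jf.getManifest(); // null if the archive has no manifest
            if (mf == null) {
                return entries;
            }
            String cp = mf.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
            if (cp == null) {
                return entries;
            }
            // Tokens are relative to the directory containing the JAR itself.
            File base = jar.getAbsoluteFile().getParentFile();
            for (String token : cp.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    entries.add(new File(base, token));
                }
            }
        }
        return entries;
    }
}
```

Each returned entry would then be appended to the instrumentation path in discovery order, which is why an archive with a stray Class-Path attribute can pull unexpected directories or archives into processing.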

Output modes. To accommodate different build and testsuite designs, <instr>/instr has three modes for writing out instrumented classes:

copy

In this mode, all instrumented classes are written to a single destination directory, regardless of whether the source classes came from a directory or an archive. Furthermore, only the classes and archive entries that are in the instrumentation set are written out. The idea is to process just the necessary classes in as few disk I/O operations as possible.

For coverage-enabled application/testsuite runs, the destination directory must be placed in the classpath ahead of the original classes. If this is inconvenient (say, because you need to package classes in archives before you can run), overwrite mode might be a better option.

overwrite

This mode is similar to copy, except that it overwrites the original class and archive files. This makes it ideal as a pre-packaging step turned on only when coverage-enabled application/testsuite runs are needed. Its advantage over copy mode is that it can do jar-to-jar processing and eliminates the need to prepend a special output directory to the classpath. Its disadvantage is the extra CPU and disk I/O time needed to duplicate archive entries that are not being instrumented[3].

fullcopy

This mode is a hybrid between copy and overwrite. It offers the convenience of mixed individual class file and jar-to-jar processing without having to overwrite the original content. In this mode, the destination directory is split into two subdirectories, classes and lib, which accept instrumented class files and instrumented class archives, respectively.

Because in this mode <instr>/instr has to copy the most content (both files and archive entries that are not being instrumented), it can be the slowest of the three. The exact performance depends on the relative speeds of your CPU and I/O subsystems and on the mix of class files and class archives in the input.
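The three modes differ mainly in where an instrumented class ends up. The enum below is a hypothetical model of that mapping (the names `OutputMode` and `destination` are illustrative only, not EMMA's API), assuming each class is identified by its instrumentation path element plus its relative entry path.

```java
import java.nio.file.Path;

/** Hypothetical model of where each output mode writes an instrumented class. */
enum OutputMode {
    COPY, OVERWRITE, FULLCOPY;

    /**
     * @param source    the instrumentation path element: a class directory or an archive
     * @param isArchive true if {@code source} is a .jar/.zip archive
     * @param entry     the class's relative path, e.g. "com/acme/Foo.class"
     * @param outDir    the destination directory (ignored in OVERWRITE mode)
     * @return the class file written, or the output archive containing the entry
     */
    Path destination(Path source, boolean isArchive, String entry, Path outDir) {
        switch (this) {
            case COPY:
                // One flat destination directory, whether the class came from a directory or an archive.
                return outDir.resolve(entry);
            case OVERWRITE:
                // Class files are replaced in place; archives are rewritten in place.
                return isArchive ? source : source.resolve(entry);
            case FULLCOPY:
                // Class files go under classes/, whole archives under lib/.
                return isArchive
                        ? outDir.resolve("lib").resolve(source.getFileName())
                        : outDir.resolve("classes").resolve(entry);
            default:
                throw new AssertionError(this);
        }
    }
}
```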

By design, in all output modes that can do jar-to-jar processing, <instr>/instr does not compress the instrumented zip entries in the output archives. This saves the CPU time needed for compression, usually at an acceptable cost in extra disk space taken by the affected archive files.
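With java.util.zip, an uncompressed entry uses the STORED method, which requires the size, compressed size, and CRC-32 to be set before the entry is written. The helper below is an illustrative sketch of that technique, not EMMA's own code; `StoredEntryWriter` is a hypothetical name.

```java
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: writes one entry without compression. */
final class StoredEntryWriter {
    private StoredEntryWriter() {}

    static void putStored(ZipOutputStream zip, String name, byte[] bytes) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        // STORED entries must declare their sizes and CRC up front;
        // putNextEntry() rejects the entry otherwise.
        entry.setSize(bytes.length);
        entry.setCompressedSize(bytes.length);
        entry.setCrc(crc.getValue());

        zip.putNextEntry(entry);
        zip.write(bytes);
        zip.closeEntry();
    }
}
```

The CRC pass over the bytes is cheap compared with running the deflater, which is the trade-off the paragraph above describes.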

Class coverage metadata. An important byproduct of class instrumentation is class metadata. As described in more detail elsewhere, EMMA coverage is based on instrumenting basic bytecode blocks. Every instrumentation run outputs a compact representation of the data necessary to associate the coverage of an individual basic block with its parent method and class, as well as with the original Java source lines that map to that basic block (there is a metadata entry for every class in the instrumentation class set). Class metadata from each offline instrumentation run must be saved to a disk file, because it is required for coverage stats computation and coverage report generation.

Note that when <instr>/instr writes metadata into a file, it will by default merge incoming metadata into the existing data in the destination file (if it exists). This behavior is also necessary to support incremental instrumentation, as described shortly.

Class metadata merging. To avoid any ambiguities, it is necessary to completely specify how <instr>/instr resolves duplicate data during instrumentation path processing:

  1. During a given instrumentation run, all directory and archive entries in the instrumentation path are processed left-to-right. All duplicates (defined as entries with the same canonical file pathnames) are skipped. As noted above, valid Class-Path manifest entries are also processed, in the order they are discovered. This sequence is thus the same as it would be for classloading lookup if the instrumentation path were used as a classpath.
  2. It is still possible that during the same instrumentation run identical class names are encountered (e.g., if the same class name shows up in differently named archives). To stay consistent with classloading lookup rules (the first class definition in a classpath wins), <instr>/instr will instrument and emit metadata only for the first class definition it encounters.
  3. Finally, multiple metadata entries for identical class names can be brought together when metadata from independent instrumentation runs is merged. The rule here is that the last metadata entry wins. The last entry is defined as either the last one merged into a given metadata file or (in the case of multiple files) the one contained in the last file in a given input file set.
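Rules 2 and 3 can be modeled with an insertion-ordered map, reducing each metadata entry to a string for illustration. This is a hypothetical sketch (`MetadataRules` is not an EMMA class): `putIfAbsent` gives first-wins within a single run, and `putAll` gives last-wins across merged inputs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the duplicate-resolution rules; metadata is just a String here. */
final class MetadataRules {
    private MetadataRules() {}

    /** Rule 2: within one run, only the first definition of a class name is kept. */
    static Map<String, String> instrumentRun(List<Map.Entry<String, String>> classesInPathOrder) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> c : classesInPathOrder) {
            metadata.putIfAbsent(c.getKey(), c.getValue()); // later duplicates are ignored
        }
        return metadata;
    }

    /** Rule 3: across merged inputs, the last entry for a class name wins. */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... inputsInOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> input : inputsInOrder) {
            merged.putAll(input); // replaces any earlier entry with the same key
        }
        return merged;
    }
}
```

Calling `merge(metadataA, metadataB)` mirrors passing coverageA.em before coverageB.em in the commands below, so the MyClass entry from coverageB.em survives.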

The last point is best illustrated with an example. If both coverageA.em and coverageB.em contain metadata for class MyClass:

>java emma instr -ip ... -d coverageA.em ...
>java emma instr -ip ... -d coverageB.em ...

then the definition in coverageB.em wins in all these cases:

>java emma report -in coverageA.em -in coverageB.em ...
>java emma report -in coverageA.em,coverageB.em ...
>java emma merge -in coverageA.em -in coverageB.em ...

Similar rules apply to EMMA ANT tasks.

Incremental instrumentation and metadata merging. As is well known, when working with javac, either from the command line or via ANT's <javac> task, only the classes that were modified since the last compilation get recompiled. This is very convenient for an individual developer, as it makes a complex product build incremental: small changes to the source code result in quick incremental compiles. This is indispensable for the "code some, test some, repeat" approach to software development.

EMMA can be used such that it fully preserves the incremental nature of a build. The key to this is how class metadata is merged when it is output to the same file. Suppose a developer executes the following actions (EMMA command line tools are used here for compactness, but the same is possible with an EMMA-enhanced ANT build):

>javac -g -d classes src/my/java/sources/*.java
>java emma instr -ip classes ...
... edit some sources ...
>javac -g -d classes src/my/java/sources/*.java
... only the changed source files get re-compiled ...
>java emma instr -ip classes ...
... only the re-compiled class files get re-instrumented ...

In this case, <instr>/instr was in either copy or overwrite mode and implicitly used the same default coverage metadata repository file, coverage.em, for each instrumentation run. In copy mode, <instr>/instr instruments only the class files whose instrumented versions in the output directory are older than their javac-produced originals. In overwrite mode, <instr>/instr instruments (and overwrites) only the classes that are not already instrumented (because those must be the classes recently recompiled by javac). All later metadata entries written to coverage.em override any earlier definitions, and it all works out correctly (and very fast).
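The copy-mode freshness test amounts to a timestamp comparison. A minimal sketch, assuming file modification times are the only criterion and that a missing instrumented copy counts as out of date (`IncrementalCheck` is a hypothetical name, not EMMA's code):

```java
import java.io.File;

/** Hypothetical sketch of the copy-mode "is this class stale?" decision. */
final class IncrementalCheck {
    private IncrementalCheck() {}

    /**
     * Returns true if the class must be (re)instrumented: the instrumented copy
     * is missing, or it is older than the javac-produced original.
     */
    static boolean needsInstrumentation(File original, File instrumentedCopy) {
        return !instrumentedCopy.exists()
                || instrumentedCopy.lastModified() < original.lastModified();
    }
}
```

Because only stale classes are re-instrumented, only their metadata entries are rewritten into coverage.em, and the last-wins merge rule replaces the outdated definitions.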

Because the metadata is always up to date in this scenario, the developer can run tests and examine coverage stats at any time, without an expensive rebuild of the entire project.

Runtime coverage data merging

Note that the rules for merging runtime coverage data are different: the data from different coverage runs is assumed to correspond to the same class definitions (in most cases EMMA will abort with an error if it detects a mismatch). Basic block coverage is merged such that the final coverage profile is a union of all merged profiles.
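Assuming, for this sketch only, that the runtime coverage of a class is one boolean hit flag per basic block, grouped by method, a union merge with a shape check could look like the following. `CoverageMerge` is hypothetical, and a shape mismatch stands in for the class-definition mismatch that makes EMMA abort.

```java
/** Hypothetical sketch: union-merge of per-block hit flags for one class. */
final class CoverageMerge {
    private CoverageMerge() {}

    /**
     * A block counts as covered if it was hit in either run.
     * Differing shapes mean the runs used different class definitions.
     */
    static boolean[][] union(boolean[][] a, boolean[][] b) {
        if (a.length != b.length) {
            throw new IllegalStateException("method count mismatch");
        }
        boolean[][] out = new boolean[a.length][];
        for (int m = 0; m < a.length; m++) {
            if (a[m].length != b[m].length) {
                throw new IllegalStateException("block count mismatch in method " + m);
            }
            out[m] = new boolean[a[m].length];
            for (int blk = 0; blk < a[m].length; blk++) {
                out[m][blk] = a[m][blk] || b[m][blk];
            }
        }
        return out;
    }
}
```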

The following table summarizes the major differences between <instr>/instr output modes:

Table 2.2. <instr>/instr output mode summary

| Mode | Supports jar-to-jar processing | Supports incremental instrumentation | Output behavior |
|---|---|---|---|
| copy | No | Yes | All instrumented classes are written to a single destination directory (only instrumented entities are written out), regardless of whether they come from class files or class archives. |
| overwrite | Yes | Yes | Instrumented (and only instrumented) classes are overwritten in place. Instrumented (and only instrumented) archive entries are updated in their archives. |
| fullcopy | Yes | No | All class files (instrumented or not) are written to a classes subdirectory of the destination directory. All class archives (instrumented or not) are written to a lib subdirectory of the destination directory. |

Internal EMMA properties that affect class instrumentation. Several property settings affect <instr>/instr behavior:

Most of these should normally be left with their default values. instr.do_suid.compensation can be set to false to gain extra instrumentation processing speed when runtime execution does not involve class de-serialization from existing files or serialization across JVMs.
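Some background on what SUID compensation protects: a Serializable class that does not declare serialVersionUID gets one computed from its structure, and adding members during instrumentation (a static initializer, for instance) can change that computed value, breaking deserialization of data written by the uninstrumented class. The sketch below uses the standard java.io.ObjectStreamClass API to contrast a computed value with a declared one; the class names are hypothetical, and it does not show how EMMA itself performs the compensation.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** No declared serialVersionUID: the value is computed from the class structure. */
class ImplicitSuid implements Serializable {
    int x;
}

/** Declared serialVersionUID: structural changes no longer affect it. */
class PinnedSuid implements Serializable {
    private static final long serialVersionUID = 42L;
    int x;
}

final class SuidDemo {
    private SuidDemo() {}

    /** Returns the declared serialVersionUID, or the computed default if none is declared. */
    static long suidOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }
}
```

Here `suidOf(PinnedSuid.class)` is always 42, whereas `suidOf(ImplicitSuid.class)` would change if instrumentation altered the members that feed the default computation.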



[3] The ZIP file format does not allow incremental updates. For every class archive in the instrumentation path, to replace the selected entries with their instrumented versions, EMMA has to create a temporary archive that eventually replaces the original. This means that all zip entries not being instrumented must be copied from one archive file to the other.
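A minimal sketch of the temporary-archive rewrite this footnote describes, using java.util.zip and java.nio.file. `ArchiveRewriter`, its parameters, and the `instrument` callback are hypothetical; for simplicity this version re-reads and re-writes every entry rather than copying raw compressed data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: replace selected entries of an archive via a temporary copy. */
final class ArchiveRewriter {
    private ArchiveRewriter() {}

    static void rewrite(Path archive, Predicate<String> selected, UnaryOperator<byte[]> instrument)
            throws IOException {
        // Create the temporary archive next to the original so the final move stays on one file system.
        Path tmp = Files.createTempFile(archive.toAbsolutePath().getParent(), "rewrite", ".tmp");
        try {
            try (ZipFile in = new ZipFile(archive.toFile());
                 ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
                Enumeration<? extends ZipEntry> entries = in.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry e = entries.nextElement();
                    byte[] data;
                    try (InputStream is = in.getInputStream(e)) {
                        data = is.readAllBytes();
                    }
                    if (selected.test(e.getName())) {
                        data = instrument.apply(data); // only selected entries change...
                    }
                    // ...but every entry must be written to the new archive.
                    out.putNextEntry(new ZipEntry(e.getName()));
                    out.write(data);
                    out.closeEntry();
                }
            }
            Files.move(tmp, archive, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException | RuntimeException ex) {
            Files.deleteIfExists(tmp); // do not leave a half-written temporary archive behind
            throw ex;
        }
    }
}
```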