Shakyboi - Part 3: Building the class dependency graph

February 28, 2021

Shake it like a polaroid picture.

Shakyboi - Part 3: Building the class dependency graph

Last time we build the scafolding for Shakyboi, including reading the raw bytes of .class files from various locations, unit tests, and class file parsing.

In this installement, we'll implement a class dependency graph generator. It's task is it to find all classes referenced by a set of root classes recursively. Let's visualize this using our simple example app from the first article.

The root class of our app is the App class, highlighted in red below, which is fed as input to the class dependency graph generator.

Classes to process: [App]
Reachable classes:  [ ]

We start out with the given root class, which is added to the set of classes to be processed. We then apply the following algorithm.

While there are classes to process:

Remove the next class from the set of classes to be processed.
If it's not already been loaded, load and parse its .class file.
Check if it has already been processed, in which case we goto step 1.
Add the class to the set of reachable classes.
If this is an app class, find the set of classes it references and

Add these classes to the set of classes to be processed.
Add these classes to this class' set of classes it depends on.

This is a simple non-recursive version of graph traversal. Let's execute it manually, iteration by iteration. Here's the state after the first iteration.

Classes to process: [Bar]
Reachable classes:  [App]

We've removed App from the set of classes to be processed, added it to the set of reachable classes, found that it depends on Bar, and added Bar to the set of classes to be processed, and to App's set of classes it depends on. Since there are still classes to be processed, we iterate again:

Classes to process: [Zap, Zop]
Reachable classes:  [App, Bar]

We've removed Bar from the set of classes to be processed, added it to the set of reachable classes, and found two new classes, Zap and Zop, which we add to the set of classes to be processed, and to Bar's set of classes it depends on. Let's continue iterating:

Classes to process: [Zop]
Reachable classes:  [App, Bar, Zap]

We've removed Zap from the set of classes to be processed, and added it to the set of reachable classes. No additional classes have been found in Zap! Let's process the next class:

Classes to process: [ ]
Reachable classes:  [App, Bar, Zap, Zop]

We've removed Zop from the set of classes to be processed, and added it to the set of reachable classes. No additional classes have been found. And since the set of classes to be processed, we've completed our search for reachable classes, consisting of App, Bar, Zap, and Zop.

Classes Foo and Zip can not be reached from the other classes. They can thus be removed from the application.

Mission accomplished? Not quite. For brevity reasons, I've excluded one aspect: bootstrap classes. If you look at any of the classes we processed, you'll find that they also depend on Object, String, System and PrintStream. These classes are bootstrap classes, i.e. not part of our application. If we encounter a bootstrap class during our search, it is registered as a dependency, but we do not process it. A bootstrap class should only reference classes not defined by the app. As such, anything a bootstrap class references is irrelevant for our class tree shaker.

Let's have a look at how we implement the above in plain old Java code, using the scafolding we built in the last installment.

Representing the class dependency graph in code

Since we are building a graph, we need a representation for nodes in the graph. The class ClassNode does just that:

public static class ClassNode {
    /** The {@link ClassFile} this node represents */
    public final ClassFile classFile;
    /** Whether this class comes from the app class lookup or the bootstrap class lookup */
    public final boolean isAppClass;
    /** The list of classes this class depends on. Filled by  {@link ClassDependencyGraphGenerator#generate(ClassLookup, ClassLookup, String...)} */
    public final List<ClassNode> dependsOn = new ArrayList<>(16);
    /** Whether this class is a root class */
    public boolean isRootClass;
    /** Whether this class has been processed by {@link ClassDependencyGraphGenerator#generate(ClassLookup, ClassLookup, String...)} **/
    public boolean isProcessed;

    public ClassNode(ClassFile classFile, boolean isAppClass) {
        this.classFile = classFile;
        this.isAppClass = isAppClass;
    }
}

Note the dependsOn list. It contains the classes the class the node represents depends on. In graph speak, it's a list of outgoing edges to the classes this class depends on.

The other fields store some book keeping data, such as if the class is an app class or not, or the ClassFile containing its parsed .class file contents. Yes, encapsulation and getters/setters be damned, all fields are public final.

The graph itself is represented by ClassDependencyGraph:

public class ClassDependencyGraph {
    /** The root classes as given to {@link ClassDependencyGraphGenerator#generate(ClassLookup, ClassLookup, String...)} **/
    public final List<ClassNode> rootClasses;
    /** All classes reachable by the class dependency graph **/
    public final Map<String, ClassNode> reachableClasses;

    /**
     * Constructs a new dependency graph from the given root classes and reachable classes.
     * @param rootClasses the root classes
     * @param reachableClasses the reachable classes, including the root classes
     */
    public ClassDependencyGraph(List<ClassNode> rootClasses, Map<String, ClassNode> reachableClasses) {
        this.rootClasses = rootClasses;
        this.reachableClasses = reachableClasses;
    }
}

All reachable classes are stored in a map, using the class name as the key. Additionally, the root classes are also stored as a linear list, which may come in handy later on (or not).

Let's have a look at the code generating the graph.

Generating the class dependency graph

The ClassDependencyGraphGenerator is responsible for generating a ClassDependencyGraph given an app ClassLookup, a boostrap ClassLookup, and a list of root classes to start the reachability analysis from. The main entry point is this method:

    public static ClassDependencyGraph generate(ClassLookup appClassLookup, ClassLookup bootstrapClassLookup, String... rootClassNames) throws IOException {
    var rootClasses = new ArrayList<ClassDependencyGraph.ClassNode>(); // the root classes nodes
    var reachableClasses = new HashMap<String, ClassDependencyGraph.ClassNode>(); // all reachable classes, processed and unprocessed
    var classesToProcess = new ArrayList<ClassDependencyGraph.ClassNode>(); // classes that still need to be processed

    // Lookup all root classes and add them to to the list of classes to be processed.
    for (String className : rootClassNames) {
        var classNode = lookupClassNode(className, reachableClasses, bootstrapClassLookup, appClassLookup);
        classNode.isRootClass = true;
        rootClasses.add(classNode);
        classesToProcess.add(classNode);
    }

    // Process classes until there no more classes to process.
    while (classesToProcess.size() > 0) {
        var classNode = classesToProcess.remove(classesToProcess.size() - 1);

        // If this class has already been processed, continue with the next class.
        if (classNode.isProcessed)
            continue;

        // Mark the class as processed.
        classNode.isProcessed = true;

        // Don't collect dependencies of bootstrap classes
        if (!classNode.isAppClass)
            continue;

        // Collect the classes referenced by this class and add them to the list
        // of classes to be processed if they haven't been processed yet. Also
        // add the classes to this class' set of classes it depends on.
        Set collectedClassNames = collectClassNames(classNode);
        for (String className : collectedClassNames) {
            var otherClassNode = lookupClassNode(className, reachableClasses, bootstrapClassLookup, appClassLookup);
            // Don't depend on this class itself
            if (otherClassNode.classFile.getName().equals(classNode.classFile.getName()))
                continue;
            if (!otherClassNode.isProcessed) {
                classesToProcess.add(otherClassNode);
            }
            classNode.dependsOn.add(otherClassNode);
        }
    }
    return new ClassDependencyGraph(rootClasses, reachableClasses);
}

This is an almost one-to-one translation of the algorithm given above! The most interesting bit is the call to collectClassNames(), which implements our logic to extract referenced classes from a .class file as described in the last post:

private static Set<String> collectClassNames(ClassDependencyGraph.ClassNode classNode) {
    var className = classNode.classFile.getName();
    var collectedClassNames = new HashSet<String>();

    // Collect class names from the constant pool
    var constantPool = classNode.classFile.constantPool;
    for (int i = 0; i < constantPool.size(); i++) {
        var entry = constantPool.get(i);
        if (entry == null)
            continue; // There might be null entries in the constant pool, i.e. for long and double entries.

        if (entry instanceof ClassFile.ClassInfoEntry) {
            var otherClassName = ((ClassFile.ClassInfoEntry) entry).getName();
            // A class info entry can also be an array descriptor. In this case, we
            // fetch the array element type.
            if (otherClassName.charAt(0) == '[') {
                otherClassName = getClassFromFieldDescriptor(otherClassName);
                if (otherClassName == null)
                    continue;
            }
            collectedClassNames.add(otherClassName);
        } else if (entry instanceof ClassFile.NameAndTypeEntry) {
            var nameAndTypeEntry = (ClassFile.NameAndTypeEntry) entry;
            String descriptor = nameAndTypeEntry.getDescriptor();
            if (descriptor.charAt(0) == '(') {
                collectedClassNames.addAll(getClassesFromMethodDescriptor(descriptor));
            } else {
                var otherClassName = getClassFromFieldDescriptor(descriptor);
                if (otherClassName != null)
                    collectedClassNames.add(otherClassName);
            }
        } else if (entry instanceof ClassFile.MethodTypeEntry) {
            var methodTypeEntry = (ClassFile.MethodTypeEntry) entry;
            var descriptor = methodTypeEntry.getDescriptor();
            collectedClassNames.addAll(getClassesFromMethodDescriptor(descriptor));
        }
    }

    // Collect class names from fields
    var fields = classNode.classFile.fields;
    for (var field : fields) {
        var otherClassName = getClassFromFieldDescriptor(field.getDescriptor());
        if (otherClassName != null)
            collectedClassNames.add(otherClassName);
    }

    // Collect class names from methods
    var methods = classNode.classFile.methods;
    for (var method : methods) {
        collectedClassNames.addAll(getClassesFromMethodDescriptor(method.getDescriptor()));
    }

    return collectedClassNames;
}

As described earlier, we collect class names from a select few constant pool entry types, as well as the descriptors of fields and methods. Pretty straight forward!

I've also added a method called generateDotFile(), which given a ClassDependencyGraph spits out a DOT formatted string. The DOT file format is a standard format for describing graphs that can be visuazlized, e.g. via this online visualizer. Here's DOT file for the the full graph of reachable classes for our simple app, including bootstrap classes. Plug it into the visualizer above!

digraph classDependencies {
node [shape=box, fontsize=16]
"io/marioslab/shakyboi/tests/apps/simple/Zap" -> "java/lang/Object";
"io/marioslab/shakyboi/tests/apps/simple/Zap" -> "java/lang/String";
"io/marioslab/shakyboi/tests/apps/simple/Zap" -> "java/lang/System";
"io/marioslab/shakyboi/tests/apps/simple/Zap" -> "java/io/PrintStream";
"java/lang/Object" [color=#00ff00];
"java/lang/String" [color=#00ff00];
"io/marioslab/shakyboi/tests/apps/simple/Zop" -> "java/lang/Object";
"io/marioslab/shakyboi/tests/apps/simple/Zop" -> "java/lang/String";
"io/marioslab/shakyboi/tests/apps/simple/Zop" -> "java/lang/System";
"io/marioslab/shakyboi/tests/apps/simple/Zop" -> "java/io/PrintStream";
"io/marioslab/shakyboi/tests/apps/simple/App" [color=#ff0000];
"io/marioslab/shakyboi/tests/apps/simple/App" -> "java/lang/Object";
"io/marioslab/shakyboi/tests/apps/simple/App" -> "java/lang/String";
"io/marioslab/shakyboi/tests/apps/simple/App" -> "io/marioslab/shakyboi/tests/apps/simple/Bar";
"io/marioslab/shakyboi/tests/apps/simple/Bar" -> "io/marioslab/shakyboi/tests/apps/simple/Zap";
"io/marioslab/shakyboi/tests/apps/simple/Bar" -> "java/lang/Object";
"io/marioslab/shakyboi/tests/apps/simple/Bar" -> "java/lang/String";
"io/marioslab/shakyboi/tests/apps/simple/Bar" -> "io/marioslab/shakyboi/tests/apps/simple/Zop";
"io/marioslab/shakyboi/tests/apps/simple/Bar" -> "java/lang/System";
"io/marioslab/shakyboi/tests/apps/simple/Bar" -> "java/io/PrintStream";
"java/lang/System" [color=#00ff00];
"java/io/PrintStream" [color=#00ff00];
}

Having quick ways to visualize graphs comes in handy a lot when doing compiler-y stuff, from visualizing abstract syntax trees to control flow graphs. If you plan on going in that direction, I strongly recommend to take the time and build such simple visualization tools.

You can see the ClassDependencyGraphGenerator in action in the ClassDependencyGraphTest class. It works for this simple app :)

Up next

Next time, we'll create the driver, aka the command line interface, of Shakyboi. We'll have to deal with a few more book keeping things, and reflection. Exciting times ahead!

Discuss this post on Twitter.