Shakyboi - Part 4: Command line driver

February 28, 2021


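Last time we built the class dependency graph. Time to wrap it all up and create a command line driver for Shakyboi, i.e. the actual thing users can execute to tree shake their .jar files. The driver needs to be passed the location of the app class files, optionally the location of the bootstrap class files, one or more root classes, and the name of the output file.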

Here's the interface of the command line driver, telling us what we can do.

Usage: shakyboi <options>
    Options:
    
       --app <dir|jar>               A directory or .jar to lookup app class files in.
    
       --bootstrap <dir|jar|"jrt">   A directory, .jar, or "jrt" (Java runtime image)
                                     to lookup bootstrap class files in. "jrt" is the default.
    
       --root <class-name-pattern>   A root class name (pattern), e.g. my.package.App, **.Foo.
                                     You can specify multiple classes by using multiple --root.
                                     options.
    
       --output <jar-file>           The name of the output .jar file. Performs a dry-run if omitted.
    
       --html-report <html-file>     (Optional) The name of the .html file to write the report to.
                                     You can view it locally in a browser.
    
       --json-report <json-file>     (Optional) The name of the .json file to write the report to.

Refactoring ClassLookup

The ClassLookup class can currently only find .class files. Apps may however include resource files as well, which we have to copy to the output .jar file. If we want to report the classes we've removed from the app, we also have to have a way to enumerate all classes of an app. We thus refactor ClassLookup to become Lookup:

import java.util.List;

public interface Lookup {
    /**
     * Looks up the class with the given name and returns its
     * <code>.class</code> file content as a byte array.
     *
     * @param name the binary class name, e.g. "java/lang/Object". @see <a href="https://docs.oracle.com/javase/specs/jvms/se15/html/jvms-4.html#jvms-4.2.1">jvms-4.2.1</a>.
     * @return the classes bytes or null.
     * @throws RuntimeException in case an unrecoverable error happened.
     */
    byte[] findClass(String name);

    /**
     * Looks up the resource with the given name and returns its
     * content as a byte array.
     *
     * @param name the name of the resource, e.g. "java/lang/Object.class", or "images/bunny.png".
     * @return the contents or null.
     * @throws RuntimeException in case an unrecoverable error happened.
     */
    byte[] findResource(String name);

    /**
     * Lists all files contained in this lookup, both class and resource files.
     *
     * @return a list of all files in this lookup.
     */
    List<String> list();
}

We've added the method findResource(String), which returns the data for a file name from the lookup. The new list() method returns all files in the lookup. I've converted all ClassLookup implementations to conform to this interface and renamed them accordingly. ClassLoaderLookup is special and will throw an exception if its list() method is called. While it's theoretically possible to list all files accessible to a ClassLoader, it's not worth the headache, as we use ClassLoaderLookup only in a handful of unit tests that don't use Lookup.list().
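To make the contract concrete, here is a minimal in-memory sketch of a Lookup. MapLookup is a hypothetical name and not one of Shakyboi's implementations, and having findClass() delegate to findResource() is my assumption about a reasonable design rather than something the real lookups are known to do. The interface is repeated so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Mirrors the Lookup interface from the post so the sketch is self-contained.
interface Lookup {
    byte[] findClass(String name);
    byte[] findResource(String name);
    List<String> list();
}

// Hypothetical in-memory implementation, not part of Shakyboi.
final class MapLookup implements Lookup {
    // Insertion-ordered so list() returns files in the order they were added.
    private final Map<String, byte[]> files = new LinkedHashMap<>();

    MapLookup add(String fileName, byte[] content) {
        files.put(fileName, content);
        return this;
    }

    @Override
    public byte[] findClass(String name) {
        // Assumption: a class is simply a resource whose file name is the
        // binary class name plus ".class".
        return findResource(name + ".class");
    }

    @Override
    public byte[] findResource(String name) {
        // Returns null if the file is unknown, as the interface documents.
        return files.get(name);
    }

    @Override
    public List<String> list() {
        return new ArrayList<>(files.keySet());
    }
}
```

Something along these lines is also handy in unit tests, since list() works without touching the file system.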

Refactoring ClassNode and ClassDependencyGraphGenerator

The reports of reachable and removed app classes will also contain information on which classes a class depends on. That's already quite useful, but it would be even better to also know which classes referred to a specific class.

We thus refactor ClassNode to have a set of classes that refer to it, and fix up ClassDependencyGraphGenerator to add that information to all class nodes it encounters. You can find the single-line fixes for both changes here and here.
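The idea of tracking back-references can be sketched as follows. Node is a hypothetical, stripped-down stand-in for ClassNode, and the field and method names are assumptions made for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for Shakyboi's ClassNode.
final class Node {
    final String name;
    // Classes this class depends on (outgoing edges).
    final Set<Node> dependsOn = new LinkedHashSet<>();
    // Classes that refer to this class (incoming edges).
    final Set<Node> referredBy = new LinkedHashSet<>();

    Node(String name) {
        this.name = name;
    }

    // Records the edge in both directions, so a report can answer both
    // "what does this class need?" and "who needs this class?".
    void addDependency(Node target) {
        dependsOn.add(target);
        target.referredBy.add(this);
    }
}
```

Because both sides are sets, adding the same dependency twice leaves the graph unchanged.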

A class may reference other classes that can't be found in either the app lookup or the bootstrap lookup. This can happen if you've omitted some classes from the .jar file you want to tree shake because you know they will never be used. It can also be due to a broken app. In either case, it's probably good to warn the user of such problems. I've thus refactored ClassDependencyGraphGenerator.generate() to also receive a list it can store warnings in, which we later print in the driver.
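Here is a rough sketch of collecting warnings instead of failing. DependencyResolver and resolve() are hypothetical names, the lookups are modelled as plain functions to keep the example short, and trying the app lookup before the bootstrap lookup is an assumption. The message wording follows the warnings the driver prints later in this post.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Shakyboi.
final class DependencyResolver {
    // Tries the app lookup first, then the bootstrap lookup. If neither knows
    // the class, a warning is recorded and null is returned instead of throwing.
    static byte[] resolve(String from, String to,
                          Function<String, byte[]> appLookup,
                          Function<String, byte[]> bootstrapLookup,
                          List<String> warnings) {
        byte[] data = appLookup.apply(to);
        if (data == null) data = bootstrapLookup.apply(to);
        if (data == null) {
            // Report dotted names, which are easier to read than binary names.
            String source = from.replace('/', '.');
            String target = to.replace('/', '.');
            warnings.add("Class " + source + " depends on " + target
                    + ", but " + target + " could not be found in app or bootstrap classpath.");
        }
        return data;
    }
}
```

The caller decides what to do with the list, e.g. print each entry with a WARNING: prefix.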

Core logic and driver

Shakyboi should be usable both from the command line and programmatically, e.g. in build system plugins. We should thus separate the core logic from the command line driver.

Shakyboi's core is implemented in the class Shakyboi. The class contains a single method called shake(), which takes an instance of Settings:

/**
* Specifies app {@link Lookup}, bootstrap {@link Lookup}, root classes, and output file for {@link Shakyboi}.
* Optionally specify HTML and JSON report output files.
*/
public static class Settings {
    /** The {@link io.marioslab.shakyboi.lookup.Lookup} to find app files in **/
    public final Lookup appLookup;
    /** The {@link io.marioslab.shakyboi.lookup.Lookup} to find bootstrap files in **/
    public final Lookup bootstrapLookup;
    /** List of root classes given as {@link io.marioslab.shakyboi.util.Pattern} instances */
    public final List<Pattern> rootClasses;
    /** Output file **/
    public final File output;
    /** Optional HTML report file, may be null **/
    public final File htmlReport;
    /** Optional JSON report file, may be null **/
    public final File jsonReport;

    /**
    * Creates a new settings instance to be passed to {@link #shake(Settings)}.
    *
    * @param appLookup       the {@link Lookup} to find app files in.
    * @param bootstrapLookup the {@link Lookup} to find bootstrap files in.
    * @param rootClasses     the list of root classes given as {@link Pattern} instances.
    * @param output          the output .jar file. The parent directory must exist.
    * @param htmlReport      optional file to write the HTML report to. May be null.
    * @param jsonReport      optional file to write the JSON report to. May be null.
    */
    public Settings(Lookup appLookup, Lookup bootstrapLookup, List<Pattern> rootClasses, File output, File htmlReport, File jsonReport) {
        this.appLookup = appLookup;
        this.bootstrapLookup = bootstrapLookup;
        this.rootClasses = rootClasses;
        this.output = output;
        this.htmlReport = htmlReport;
        this.jsonReport = jsonReport;
    }
}

Pretty straightforward and hopefully self-explanatory. The only interesting bit is the way root class names are specified. They are given as Pattern instances instead of strings, so users can specify globs like my.app.package.* or my.app.**.Foo*. The class was kindly donated by Nate, Spine's benevolent dictator.
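To give a feel for how such globs can work, here is a minimal sketch that turns a glob into a regular expression. ClassGlob is a hypothetical class and is not the Pattern class used by Shakyboi. The matching rules are my assumptions: * matches within a single package segment, ** matches across segments, and everything else is taken literally.

```java
import java.util.regex.Pattern;

// Hypothetical glob matcher for dotted class names, not Shakyboi's Pattern.
final class ClassGlob {
    private final Pattern regex;

    ClassGlob(String glob) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                boolean doubleStar = i + 1 < glob.length() && glob.charAt(i + 1) == '*';
                if (doubleStar) {
                    sb.append(".*");     // "**" may span package separators
                    i++;                 // skip the second star
                } else {
                    sb.append("[^.]*");  // "*" stays within one segment
                }
            } else {
                // Quote everything else so that '.' only matches a literal dot.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        regex = Pattern.compile(sb.toString());
    }

    boolean matches(String className) {
        return regex.matcher(className).matches();
    }
}
```

With these rules, my.app.package.* matches my.app.package.Main but not my.app.package.sub.Main, while my.app.**.Foo* matches my.app.ui.widgets.FooBar. Note that in this simplified version ** followed by a dot requires at least one segment, so my.app.**.Foo* does not match my.app.Foo.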

The shake() method ties everything together:

/**
* Applies class tree shaking to the app classes given as a {@link Lookup} in the {@link Settings}. Generates
* an output <code>.jar</code> file containing all reachable classes from the app lookup, as well as any files
* found in the app lookup. Optionally generates a HTML and/or JSON report file. See {@link Settings}.
*
* @param settings the {@link Settings} specifying input and output parameters for the class tree shaking.
* @return {@link Statistics} generated during class tree shaking.
* @throws IOException in case a file couldn't be read from a lookup.
*/
public static Statistics shake(Settings settings) throws IOException {
    // Expand root classes
    long timeRootClassExpansion = System.nanoTime();
    var rootClassNames = new ArrayList<String>();
    var inputClassesAndFiles = settings.appLookup.list();
    var inputClasses = inputClassesAndFiles.stream().filter(f -> f.endsWith(".class")).collect(Collectors.toList());
    var inputFiles = inputClassesAndFiles.stream().filter(f -> !f.endsWith(".class")).collect(Collectors.toList());
    for (var file : inputClasses) {
        for (var rootPattern : settings.rootClasses) {
            if (rootPattern.matchesPath(file)) {
                // Strip only the trailing ".class" extension
                rootClassNames.add(file.substring(0, file.length() - ".class".length()));
                break;
            }
        }
    }
    if (rootClassNames.isEmpty()) throw new IOException("No root classes found in app lookup.");
    timeRootClassExpansion = System.nanoTime() - timeRootClassExpansion;

    // Generate the class dependency graph and gather all reachable app classes.
    long timeClassDependencyGraph = System.nanoTime();
    var warnings = new ArrayList<String>();
    var classDependencyGraph = ClassDependencyGraphGenerator.generate(settings.appLookup,
            settings.bootstrapLookup,
            warnings,
            rootClassNames.toArray(new String[0]));
    var reachableAppClasses = classDependencyGraph.reachableClasses.values().stream().filter(cl -> cl.isAppClass).collect(Collectors.toList());
    timeClassDependencyGraph = System.nanoTime() - timeClassDependencyGraph;

    // Write output .jar file
    long timeWriteJar = System.nanoTime();
    if (settings.output != null) {
        try (var writer = new JarFileWriter(settings.output)) {
            for (var file : inputFiles)
                writer.addFile(file, settings.appLookup.findResource(file));

            for (var clazz : reachableAppClasses)
                writer.addFile(clazz.classFile.getName() + ".class", clazz.classFile.originalData);
        }
    }
    timeWriteJar = System.nanoTime() - timeWriteJar;

    // Create report if requested
    long timeReport = System.nanoTime();
    if (settings.htmlReport != null) generateHtmlReport(settings, inputClasses, classDependencyGraph);
    if (settings.jsonReport != null) generateJsonReport(settings, inputClasses, classDependencyGraph);
    timeReport = System.nanoTime() - timeReport;

    return new Statistics(inputClasses.size(), reachableAppClasses.size(), warnings, timeRootClassExpansion / 1e9f, timeClassDependencyGraph / 1e9f, timeWriteJar / 1e9f, timeReport / 1e9f);
}

That's not a lot of code! We first find the fully qualified names of all root classes matching the root name patterns the user provided. We also split up the input files into .class files and resource files (anything not ending in .class).

We then generate the class dependency graph using ClassDependencyGraphGenerator.

If the user provided an output file name, we write the .class files of reachable classes as well as all resources to the output file.
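Writing the output boils down to adding one entry per file to a jar stream. The following is a minimal sketch using the JDK's java.util.jar classes. JarWriterSketch is a hypothetical name and this is not the JarFileWriter used above, whose internals aren't shown in this post.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical helper illustrating the idea, not Shakyboi's JarFileWriter.
final class JarWriterSketch {
    // Writes every file name and its content as one entry of the output .jar.
    static void write(File output, Map<String, byte[]> files) throws IOException {
        try (JarOutputStream jar = new JarOutputStream(new FileOutputStream(output))) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                jar.putNextEntry(new JarEntry(file.getKey()));
                jar.write(file.getValue());
                jar.closeEntry();
            }
        }
    }
}
```

In this sketch, resources and reachable classes are treated the same way: both are just a name plus a byte array.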

Finally, if the user requested to get reports, we generate those as well.

The method returns a Statistics instance storing a few numbers that help assess how well Shakyboi performed.

The command line driver is then merely a wrapper around Shakyboi that parses the command line arguments and passes them to the Shakyboi.shake() method. You can find the implementation of that in ShakyboiCLI.java.
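The argument parsing part of such a wrapper could look roughly like this. ArgsSketch is a hypothetical class and is not taken from ShakyboiCLI.java. It assumes every option is followed by exactly one value and that options such as --root may be repeated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical argument parser, not the code in ShakyboiCLI.java.
final class ArgsSketch {
    // Collects "--option value" pairs. Options may repeat (e.g. --root), so
    // every option maps to the list of values it was given.
    static Map<String, List<String>> parse(String[] args) {
        Map<String, List<String>> options = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String option = args[i];
            if (!option.startsWith("--"))
                throw new IllegalArgumentException("Unknown argument: " + option);
            if (i + 1 >= args.length)
                throw new IllegalArgumentException("Missing value for " + option);
            options.computeIfAbsent(option, k -> new ArrayList<>()).add(args[i + 1]);
        }
        return options;
    }
}
```

The driver would then turn the collected values into lookups, patterns, and files, build a Settings instance, and call Shakyboi.shake().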

Executing Shakyboi

Time to take Shakyboi for a ride! I'll be using the libGDX Pax Britannica demo, the .jar file of which can be found at http://libgdx.badlogicgames.com/demos/paxbritannica/paxbritannica.jar. You can run it via java -jar paxbritannica.jar on Windows, Linux, and macOS.

After packaging Shakyboi via mvn package, we can have it shake Pax Britannica as follows:

java -jar shakyboi.jar \
    --app paxbritannica.jar \
    --root de.swagner.paxbritannica.desktop.DesktopLauncher \
    --html-report report.html \
    --output paxbritannica-shaky.jar

This tells Shakyboi to find app classes in paxbritannica.jar, to start tracing classes in de.swagner.paxbritannica.desktop.DesktopLauncher, to write an HTML report to report.html and to write the resulting output to paxbritannica-shaky.jar. Executing the above command yields:

WARNING: No bootstrap classes specified, defaulting to JRT image.
WARNING: Class org.lwjgl.opengl.WindowsDisplay depends on org.lwjgl.opengles.PixelFormat, but org.lwjgl.opengles.PixelFormat could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.WindowsDisplayPeerInfo depends on org.lwjgl.opengles.GLContext, but org.lwjgl.opengles.GLContext could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.LinuxDisplay depends on org.lwjgl.opengles.GLContext, but org.lwjgl.opengles.GLContext could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.LinuxDisplay depends on org.lwjgl.opengles.PixelFormat, but org.lwjgl.opengles.PixelFormat could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.LinuxDisplayPeerInfo depends on org.lwjgl.opengles.GLContext, but org.lwjgl.opengles.GLContext could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.GLES20, but org.lwjgl.opengles.GLES20 could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.Util, but org.lwjgl.opengles.Util could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.EGL, but org.lwjgl.opengles.EGL could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.PowerManagementEventException, but org.lwjgl.opengles.PowerManagementEventException could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.EGLConfig, but org.lwjgl.opengles.EGLConfig could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.PixelFormat, but org.lwjgl.opengles.PixelFormat could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.EGLContext, but org.lwjgl.opengles.EGLContext could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.EGLSurface, but org.lwjgl.opengles.EGLSurface could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.DrawableGLES depends on org.lwjgl.opengles.EGLDisplay, but org.lwjgl.opengles.EGLDisplay could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.GLES20, but org.lwjgl.opengles.GLES20 could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.EGL, but org.lwjgl.opengles.EGL could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.PowerManagementEventException, but org.lwjgl.opengles.PowerManagementEventException could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.GLContext, but org.lwjgl.opengles.GLContext could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.ContextCapabilities, but org.lwjgl.opengles.ContextCapabilities could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.EGLConfig, but org.lwjgl.opengles.EGLConfig could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.EGLContext, but org.lwjgl.opengles.EGLContext could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.EGLDisplay, but org.lwjgl.opengles.EGLDisplay could not be found in app or bootstrap classpath.
WARNING: Class org.lwjgl.opengl.ContextGLES depends on org.lwjgl.opengles.EGLSurface, but org.lwjgl.opengles.EGLSurface could not be found in app or bootstrap classpath.
Root class expansion:    0.06704833 secs
Class dependency graph:  0.06704833 secs
Write jar:               1.0075673 secs
Write report:            0.15579033 secs
Took:                    1.35360575 secs
Output:                  /Users/badlogic/workspaces/shakyboi/paxbritannica-shaky.jar
HTML report:             /Users/badlogic/workspaces/shakyboi/report.html
Total app classes:       2129
Reachable app classes:   973
Reduction:               54%

Nice! Shakyboi removed 54% of the app's classes from the output. In terms of file size, that's a reduction of about 1MB.

-rw-r--r--   1 badlogic  staff    14M Mar  7 10:26 paxbritannica-shaky.jar
-rw-r--r--   1 badlogic  staff    15M May  5  2018 paxbritannica.jar

Most of the .jar file is made up of graphics files and native shared libraries, which Shakyboi just copies verbatim to the output.
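As a quick sanity check, the 54% reported above follows directly from the two class counts. The helper below is a hypothetical sketch. How the driver actually rounds the number is an assumption on my part.

```java
// Hypothetical helper, not part of Shakyboi.
final class Reduction {
    // Percentage of app classes that were removed, rounded to the nearest integer.
    static long percent(int totalClasses, int reachableClasses) {
        return Math.round(100.0 * (totalClasses - reachableClasses) / totalClasses);
    }
}
```

For the run above, Reduction.percent(2129, 973) evaluates to 54, since 1156 of the 2129 classes were dropped.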

How does it compare to ProGuard? Well. I tried to write a configuration that would generate a runnable jar. But I gave up after 30 minutes, as the damn thing would just keep on crashing due to missing classes and methods. The final .jar size ProGuard spits out is 1MB smaller than the one Shakyboi generates. But since the app doesn't run, I can't say if that's comparable. ProGuard also removes methods, which further reduces the size of .class files considerably, so it definitely has an edge.

Shakyboi also generated an HTML report of classes it kept and removed. I've embedded it below in a fancy old iframe.

You can also get the raw data by passing --json-report as an argument.

What about reflection?

For Pax Britannica, I was lucky that all classes, even those loaded via reflection, are also referenced directly by code somewhere. In other circumstances, that may not be the case. However, with the --root argument, we can solve any issues stemming from reflection by force-including classes (possibly expressed as patterns) in the final output. Easy.
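To see why reflection is a problem in the first place, consider this small sketch. ReflectionSketch is a hypothetical class made up for illustration. The class name is assembled at runtime, so the compiled .class file contains no class reference to the target that a dependency graph built from class references could follow.

```java
// Hypothetical example of a reflective class load.
final class ReflectionSketch {
    // The target class is only known as string data at runtime. Nothing in
    // this class's constant pool refers to it as a class.
    static Object create(String packageName, String simpleName) throws ReflectiveOperationException {
        Class<?> type = Class.forName(packageName + "." + simpleName);
        return type.getDeclaredConstructor().newInstance();
    }
}
```

If an app instantiated, say, its plugins this way, passing something like --root my.app.plugins.* (a made-up package name) would keep those classes, and everything they depend on, in the output.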

In conclusion (and maybe up next)

Well, that was a fun ride. I consider Shakyboi to be complete at this point, at least for my meager purposes. There's definitely room for improvement. We could further reduce the .class file size by tracing which methods are actually used. From that information, we may be able to also remove more classes. But for now, that is left as an exercise to the reader :)
