Shakyboi - Part 1: Of tree shaking and class files
I don't hate ProGuard. It's a fine piece of software. However, its documentation is a bit lacking, and some of its behaviour is weird. So what better way to spend a few weekends than to learn all the hard lessons Eric, creator of ProGuard, had learned over many years! Thus I've started building Shakyboi, my personal replacement for ProGuard. I'll probably reinvent everything ProGuard does badly. But at least I'll have some fun doing so.
What's tree shaking?
ProGuard takes as input Java .class
files and "shrinks", optimizes, and obfuscates them. The "shrinking" part is more commonly known as tree shaking, at least in the JavaScript space.
Tree shaking is a form of dead code removal. Removing dead code generally results in smaller binary sizes, a desirable thing for faster deployments, smaller and thus faster downloads, and potentially faster app start-up times.
In case of Java, we can apply tree shaking on multiple levels. For now, we'll concern ourselves with tree shaking at the class level. The goal of class level tree shaking is to remove any classes that are not used by the app.
Have a look at this contrived example Java app.
// Zipp.java
public class Zipp {
public void doZipp() {
System.out.println("I'm a Zipp.");
}
}
// Zapp.java
public class Zapp {
public void doZapp() {
System.out.println("I'm a Zapp.");
}
}
// Zopp.java
public class Zopp {
public void doZopp() {
System.out.println("I'm a Zopp.");
}
}
// Foo.java
public class Foo {
public void doFoo() {
System.out.println("I'm a Foo.");
new Zipp().doZipp();
}
}
// Bar.java
public class Bar {
public void doBar() {
System.out.println("I'm a Bar.");
new Zapp().doZapp();
new Zopp().doZopp();
}
}
// App.java
public class App {
public static void main(String[] argv) {
new Bar().doBar();
}
}
Let's assume we have compiled the 6 classes via javac
and packaged them into a file called app.jar
using the build system of our choice. The app.jar
file (which is really just a ZIP file) would look like this.
App.class
Bar.class
Foo.class
Zapp.class
Zipp.class
Zopp.class
Running the app on the CLI looks like this:
❯ java -jar app.jar
I'm a Bar.
I'm a Zapp.
I'm a Zopp.
The command starts the Java Virtual Machine (JVM) and tells it to look up the class App
in the app.jar
file and call its main(String[])
method.
With our mushy brains, we can easily see from both the output and the source code that class App
depends on class Bar
(because Bar
is referenced in App.main(String[])
), and that Bar
depends on Zapp
and Zopp
(because they are referenced in Bar.doBar()
). Class Foo
is not referenced in App
or Bar
, nor is Zipp
. Congratulations, you just manually shook the class dependency tree (well, graph really), which looks like this:
In graph theory terms, we've created a directed graph that encodes the class dependencies, which has 2 connected components. The connected component containing the App
class contains all classes necessary for the successful execution of our app. All classes in the other connected components can be removed from our app.
Having performed our dependency analysis manually, we can remove the files Foo.class
and Zipp.class
from our app.jar
file and the app will still work:
❯ zip -d app.jar Foo.class Zipp.class
deleting: Zipp.class
deleting: Foo.class
❯ java -jar app.jar
I'm a Bar.
I'm a Zapp.
I'm a Zopp.
Here's the before-and-after in terms of file size:
❯ ls -lah app.jar
-rw-r--r-- 1 badlogic staff 4.1K Feb 20 14:16 app.jar
❯ zip -d app.jar Foo.class Zipp.class
deleting: Zipp.class
deleting: Foo.class
❯ ls -lah app.jar
-rw-r--r-- 1 badlogic staff 3.3K Feb 20 14:17 app.jar
We saved about 0.8K! 40 years ago that would have been marvelous.
Constructing the class dependency graph from .class
files
Obviously, doing manual dependency analysis and removal of "dead" classes is an itsy bit too complex for real-world applications. Being the computer loving folks we are, we can surely devise a way to automate the class dependency graph creation.
For Java, we could do it in multiple different ways. One approach would be to parse Java source files, but that's just nasty, so we ignore that option.
A possibly less complex way to do it is by parsing .class
files and extracting the classes a class depends on. For this, we have to understand the Java class file format. It's a mostly trivial and well thought out format, so it should be easy to parse and extract the information we need.
Instead of diving into the format specification, let's first have a look at the output of javap
when fed one of our classes. javap
comes with your JDK and is able to disassemble .class
files. Here's what Main.class
contains:
❯ javap -c -v -cp app.jar App
Classfile jar:file:///Users/badlogic/workspaces/marioslab/shakyboi/testapps/foo/target/app.jar!/App.class
Last modified 20 Feb 2021; size 398 bytes
SHA-256 checksum ea9637fce3dfdead99a625a3047ff56547e428c5a8bc81549c537eea8663af13
Compiled from "App.java"
public class App
minor version: 0
major version: 59
flags: (0x0021) ACC_PUBLIC, ACC_SUPER
this_class: #13 // App
super_class: #2 // java/lang/Object
interfaces: 0, fields: 0, methods: 2, attributes: 1
Constant pool:
#1 = Methodref #2.#3 // java/lang/Object."":()V
#2 = Class #4 // java/lang/Object
#3 = NameAndType #5:#6 // "":()V
#4 = Utf8 java/lang/Object
#5 = Utf8
#6 = Utf8 ()V
#7 = Class #8 // Bar
#8 = Utf8 Bar
#9 = Methodref #7.#3 // Bar."":()V
#10 = Methodref #7.#11 // Bar.doBar:()V
#11 = NameAndType #12:#6 // doBar:()V
#12 = Utf8 doBar
#13 = Class #14 // App
#14 = Utf8 App
#15 = Utf8 Code
#16 = Utf8 LineNumberTable
#17 = Utf8 LocalVariableTable
#18 = Utf8 this
#19 = Utf8 LApp;
#20 = Utf8 main
#21 = Utf8 ([Ljava/lang/String;)V
#22 = Utf8 argv
#23 = Utf8 [Ljava/lang/String;
#24 = Utf8 SourceFile
#25 = Utf8 App.java
{
public App();
descriptor: ()V
flags: (0x0001) ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."":()V
4: return
LineNumberTable:
line 1: 0
LocalVariableTable:
Start Length Slot Name Signature
0 5 0 this LApp;
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: new #7 // class Bar
3: dup
4: invokespecial #9 // Method Bar."":()V
7: invokevirtual #10 // Method Bar.doBar:()V
10: return
LineNumberTable:
line 3: 0
line 4: 10
LocalVariableTable:
Start Length Slot Name Signature
0 11 0 argv [Ljava/lang/String;
}
SourceFile: "App.java"
Looks intimidating at first sight, but it's not so bad! If you CTRL + F
for Bar
, you can see all the places App
references Bar
in the .class
file.
We see a bunch of Bar
in the constant pool section of the javap
output. The constant pool is a lookup table referenced by other parts of the class. It contains symbolic references to other classes, fields, methods, constants and so on. Each constant pool entry has an index and a type. A constant pool entry may also reference another constant pool entry. We can see Bar
in a handful of constant pool entries. Let's look at the first two.
#7 = Class #8 // Bar
#8 = Utf8 Bar
The entry at index 7 is of type Class, and encodes, well, a class. It references another entry at index 8, which is of type Utf8, a UTF-8 string encoding the class' name. Great! In general, we can assume that almost all classes a class depends on can be found in the constant pool in form of Class entries.
The actual byte code making up a class' method will reference the constant pool, e.g. the App.main(String[])
method looks like this:
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: new #7 // class Bar
3: dup
4: invokespecial #9 // Method Bar."":()V
7: invokevirtual #10 // Method Bar.doBar:()V
10: return
LineNumberTable:
line 3: 0
line 4: 10
LocalVariableTable:
Start Length Slot Name Signature
0 11 0 argv [Ljava/lang/String;
The instruction at byte code index 0 (new #7
) references the constant pool entry at constant pool index 7, which is the Class entry discussed above. The instruction tells the JVM to construct a new instance of class Bar
, as helpfully pointed out by javap
via the comment // class Bar
appended to the instruction.
Let's ignore the dup
instruction.
The next instruction at byte code index 4 (invokespecial #9
) calls a special method, the constructor of Bar
. The constructor is specified as a constant pool entry at index 9. The constant pool entry is of type MethodRef, which encodes a class (Bar
in this case), and the method of that class we want to invoke (the constructor of Bar
):
#9 = Methodref #7.#3 // Bar."":()V
The MethodRef references constant pool entries 7 and 3 (javap
has again helpfully resolved them for us in the comment):
#3 = NameAndType #5:#6 // "":()V
#7 = Class #8 // Bar
Ah, we already know constant pool entry 7, it's the class! Constant pool entry 3 is of type NameAndType, which describes the name and type of either a field or a method. The type is encoded as a descriptor string. You may have seen those before, they may look a bit wonky. Here's the descriptor of App.main(String[])
([Ljava/lang/String;)V
As it turns out, classes directly referenced by a class can all be found in the constant pool, either in form of a Class constant pool entry, or as a descriptor string. The latter is a bit more complex to handle, as we have to go through all constant pool entry types that reference such a descriptor. We can figure this out from the class file format specification. Additionally, we have to consider the super class hierarchy of the class, as well as the interfaces it implements, information that's also part of the .class
file.
And if we can do it for one class, we can do it for all other classes it depends on, recursively, which gives us our class dependency graph. Easy!
Up next
I've already created a repository and filled in some class file parsing code, see Shakyboi on GitHub. We won't be using any of the fancy byte code manipulation libraries like ASM, but instead use some duct tape and spit and write our own. In the next installment I'll discuss the implementation of the basic class level tree shaker.
Discuss this post on Twitter.