Is Your Programming Language Data-Oriented?

For many years, there was a strict hierarchy in my mind: Static languages like Java and C++ are cumbersome and verbose, and require me to write lots of redundant type names and boilerplate code, while dynamic languages such as Python are cool and awesome and productive because defining a new integer variable is as easy as writing “x = 1”.

However, I’ve come to realize that Python’s faster development cycle is actually due to being data-oriented, and that a static language like Kotlin can rival Python in terms of concise code and development speed. The reason some languages are painful to work with is that they are not designed to be a data-oriented. In this article I will demonstrate that using Java.

Java is not data-first

What does the following Java code print?

It prints nothing helpful:

How do you define the map {"a": "b", "c": "d"} in Java? Before Java 9, your best option was:

Java 9 allows you to use Map.of(…), but this is done hackily with function overloads so it only works for up to 5 entries. As of 2021, there still is no great generic way to do this in Java. Witness the full travesty here: https://stackoverflow.com/a/6802502/2111778

Java is designed with an object-oriented mindset, so most of your data will actually be hidden behind one or multiple indirect references.

Lists and Maps in Java

Java’s list type is List<T>, which define a list which only accepts objects of a type T of your choice.* However, you cannot directly instantiate it, only implementations of it. List<T> is an interface for different list types which support basic list operations (adding to the back, indexing at any position, etc.). The idea is that you can keep this “List contract” and swap out implementations at any time.

Similarly for Map<K, V> and Set<E>:

This seems logical, so what is the issue here? Any working programmer can tell you that 99% of the time, you want to use an ArrayList<T> and a HashMap<K, V>/HashSet<E>. In fact, that is exactly what Kotlin’s and Python’s list and map types are. Java’s way makes no sense: For new programmers, it increases the risk of choosing the wrong implementation. For experienced programmers, it is adding unnecessary line noise. Just make the right thing be the default.

Conclusion: Java hides useful classes deep in its class library under obscure names. Even the basic print command is buried under System.out.println. In contrast, Python libraries usually put commonly-used functions at top-level rather than hiding them inside a huge module hierarchy. (The only exception I can think of is xml.etree.ElementTree, but is that really any wonder considering it’s XML? ;))

* Little footnote: Remember T? The type T has to derive from Object, so it isn’t possible to use primitive types, so you cannot have a List<int> for example, only lists of objects. This introduces an additional layer of indirection and data hiding. (There is an ugly workaround called integer auto-(un)boxing, but it only works half the time.)

Testing

Writing tests in Java is a colossal pain.

In a past job I worked on a Java microservice which had 5k lines of business code, and 10k lines of test code. Most tests weren’t complicated, often just one positive and one negative case, but Java’s lack of true data classes, refusal to overload operators, and other data hiding tendencies made setting up even simple business objects incredibly cumbersome.

Private fields

In the Java community, it is considered good practice to declare the fields of a class as private and access them using getters/setters (or deny access at all) rather than directly. The idea is that it allows hiding the internal representation of the field (rarely useful) and allows an unchanging public class API. This may be useful when writing a library, but makes no sense when writing an application, where you would want a new improved name everywhere, and IDEs are plenty capable of changing it everywhere at once.

At a previous job, I had to work with an XML parsing library that contained a bug: After 10 parsing errors, it would stop outputting any more errors, to prevent denial-of-service attacks. This was intended to be per-call, but a bug was causing this limit to be for the program lifetime. Many expensive team hours went into locating this issue, but since the field was private static we were unable to account for this bug by manually resetting the count between parses.

(By the way, XML is another excellent case study of data hiding by drowning data among line noise and wasting programmer’s brain cycles: https://blog.codinghcodinghorrororror.com/xml-the-angle-bracket-tax/)

Do not hide your data inside of private fields. In Python, all fields and functions are public. Fields and functions not intended for outside use start with a single underscore (_) (read more here). Having “emergency access” to private fields is useful when debugging and in the rare case where there is no better workaround.

Conclusion

Java actively hides data from you. There is a saving grace: Among Java developers, debugger use is very common, which helps mitigate this issue a bit. You can set a breakpoint and rather than printing out data, you can click-click-click on the active variables to somehow get to your data. I’d much rather keep coding in a data-oriented language though.

23-year-old programmer from Germany!