Is Your Programming Language Data-Oriented?

For many years, there was a strict hierarchy in my mind: Static languages like Java and C++ are cumbersome and verbose, and require me to write lots of redundant type names and boilerplate code, while dynamic languages such as Python are cool and awesome and productive because defining a new integer variable is as easy as writing “x = 1”.

However, I’ve come to realize that Python’s faster development cycle is actually due to being data-oriented, and that a static language like Kotlin can rival Python in terms of concise code and development speed. The reason some languages are painful to work with is that they are not designed to be a data-oriented. In this article I will demonstrate that using Java.

Java is not data-first

int[] a = {1, 2, 3};
System.out.println(a);

It prints nothing helpful:

[I@2ff4acd0

How do you define the map {"a": "b", "c": "d"} in Java? Before Java 9, your best option was:

Map<String, String> myMap = new HashMap<String, String>();
myMap.put("a", "b");
myMap.put("c", "d");

Java 9 allows you to use Map.of(…), but this is done hackily with function overloads so it only works for up to 5 entries. As of 2021, there still is no great generic way to do this in Java. Witness the full travesty here: https://stackoverflow.com/a/6802502/2111778

Java is designed with an object-oriented mindset, so most of your data will actually be hidden behind one or multiple indirect references.

Lists and Maps in Java

               List<T>
/ \
ArrayList<T> LinkedList<T>

Similarly for Map<K, V> and Set<E>:

                      Map<K, V>
/ | \
HashMap<K, V> TreeMap<K, V> LinkedHashMap<K, V>

This seems logical, so what is the issue here? Any working programmer can tell you that 99% of the time, you want to use an ArrayList<T> and a HashMap<K, V>/HashSet<E>. In fact, that is exactly what Kotlin’s and Python’s list and map types are. Java’s way makes no sense: For new programmers, it increases the risk of choosing the wrong implementation. For experienced programmers, it is adding unnecessary line noise. Just make the right thing be the default.

Conclusion: Java hides useful classes deep in its class library under obscure names. Even the basic print command is buried under System.out.println. In contrast, Python libraries usually put commonly-used functions at top-level rather than hiding them inside a huge module hierarchy. (The only exception I can think of is xml.etree.ElementTree, but is that really any wonder considering it’s XML? ;))

* Little footnote: Remember T? The type T has to derive from Object, so it isn’t possible to use primitive types, so you cannot have a List<int> for example, only lists of objects. This introduces an additional layer of indirection and data hiding. (There is an ugly workaround called integer auto-(un)boxing, but it only works half the time.)

Testing

In a past job I worked on a Java microservice which had 5k lines of business code, and 10k lines of test code. Most tests weren’t complicated, often just one positive and one negative case, but Java’s lack of true data classes, refusal to overload operators, and other data hiding tendencies made setting up even simple business objects incredibly cumbersome.

Private fields

At a previous job, I had to work with an XML parsing library that contained a bug: After 10 parsing errors, it would stop outputting any more errors, to prevent denial-of-service attacks. This was intended to be per-call, but a bug was causing this limit to be for the program lifetime. Many expensive team hours went into locating this issue, but since the field was private static we were unable to account for this bug by manually resetting the count between parses.

(By the way, XML is another excellent case study of data hiding by drowning data among line noise and wasting programmer’s brain cycles: https://blog.codinghcodinghorrororror.com/xml-the-angle-bracket-tax/)

Do not hide your data inside of private fields. In Python, all fields and functions are public. Fields and functions not intended for outside use start with a single underscore (_) (read more here). Having “emergency access” to private fields is useful when debugging and in the rare case where there is no better workaround.

Conclusion

23-year-old programmer from Germany!