Google Code Jam — How To Prepare

Study guide for succeeding at competitive programming

16 min readApr 26, 2021

Google Code Jam is a worldwide online programming competition which draws 30,000 to 60,000 contestants each year. Contestants solve a set of programming problems within a usually 2.5-hour-timeframe.

I love the excitement that each round brings, and that feeling of submitting a correct solution just minutes from the deadline. I also feel like I learn a lot of clever tricks each time. And while I don’t think I’m good enough to make top 25 and participate in the World Finals, I have placed 326th before, which is still in the top 1% of contestants.

In this article I will present common classes of problems from my 8 years of Code Jam experience. If you are a skilled programmer and have a mathematical brain, then using this guide should easily get you into round 2 (the top 4,500 contestants) and allow you a good chance to get into round 3 (the top 1,000 contestants who all win t-shirts). All past problems are open to practice submissions, allowing you to study them at will.

Introduction

Code Jam started out as a recruitment tool for Google to identify talented engineers, but unfortunately I have not met a single German company who cared about my performance at Code Jam. As usual, German companies tend to care about years-of-experience and manager impressions and not about actual programming skill.

The now-defunct Go-Hero site reveals that about 85% of contestants use one of three languages: C++ (45%), Python (20%), and Java (20%). Funnily enough, the percentage of C++ only further rises in more advanced rounds (to 80%+), implying that it might be the best tool for the job.

Looking at a competitive C++ submission is an absolute nightmare. At first, dozen of lines of preprocessor directives to save a couple keystrokes (for example, defining “long long” as “ll” and a “FOR(i,n)” macro for loops). Variable names will be 1 or 2 letters to further save keystrokes. And of course a single include of the whole stdlib and no comments.

I’m a contrarian here, I only use Python. While the speed disadvantage has hurt me one or two times, I believe in writing solutions that I’m still able to read after the contest. I also detail my thought process, which puts me at a time disadvantage, but I think it helps me come up with a solution. I’ll also use Python in this article for instruction.

The Code Jam online judge offers Python bundled with numpy and scipy, which are great tools to have beautiful and fast code. One caveat for the remaining article: round(x.5) rounds to the closest even number, so round(1.5) is 2 and round(2.5) is 2 as well. So, before proceeding, add the following redefinition of the round function to your toolbelt:

round = lambda x: int(x + .5)

Greedy & Sort-By-Goodness

A lot of early round problems (Q, 1A-1C) can be solved by a greedy algorithm. A greedy algorithm simply picks the best option at each step instantly, without having to consider the complete input.

There is nothing to study here, it’s just a reminder to try a greedy approach first. I usually fall into the trap that I want to model the dependencies somehow, or use some clever data structure, when in reality you just need to go over a list once and pick the best option at each step.

Sort-by-goodness strategies are closely related to greedy strategies. Each list element gets assigned a score or goodness, and then we sort and greedily pick the best elements first.

Example Problems:
- The Last Word
- Senate Evacuation
- Alphabet Cake
- Rather Perplexing Showdown
- Rounding Error: A bit hard. Also needs the round() fix.

Example Solutions:
- Rounding Error

Triangle Numbers

The triangle number of n is simply defined as the sum of all numbers from 1 to n, which looks like a triangle if you draw them as little circles:

So tri(1) = 1, tri(2) = 3, tri(3) = 6, tri(4) = 10, and so on. The general formula in Python is:

tri = lambda n: n * (n + 1) // 2

Triangle numbers have many applications. For example, tri(n-1) gives the number of pairs one can make for n people. (This is also the reason big software teams fail, because as n grows, the number of communication channels between people (the triangle number) grows quadratically in n.)

Sometimes it is useful to know the inverse, mapping triangle numbers to their original numbers. For a known triangle number t we get the equation

.5 n**2 + .5 n = t
.5 n**2 + .5 n + (-1) t = 0

Solving with the abc-formula leads to:

invtri = lambda t: int( (2 * t + .25)**.5 - .5 )

The int() “rounds down”, mapping [3, 4, 5] to 2 and [6, 7, 8, 9] to 3.

We still have a problem: We are using floats. This actually bit me on the pancake problem and led to me losing out on a prize. The limitations of floating point precision is something every programmer should be able to analyze, and Code Jam deliberately chooses limits in such a way that floats can become an issue, hence I added another section about them below.

Luckily, integer square roots are a well-studied problem and have been in Python since 3.8. Since the judge software uses 3.7, you will have to do with your own implementation for now. We just need to multiply the entire expression by 2 to get rid of the fractions:

invtri = lambda t: (math.isqrt(8 * t + 1) - 1) // 2

Now you will be well-armed for any triangle number problems!

Example Problems:
- Graceful Chainsaw Jugglers: Combined with Dynamic Programming. HARD.
- Incremental House of Pancakes: Inverted triangle numbers. HARD.

Binary Search

Binary Search problems are common, especially in earlier rounds (Q, 1A-1C). Binary search allows you to index a value inside of a sorted list by halving the search interval at each step, leading to a very fast algorithm. Since these are extensively covered in university and job interviews I do not expect anyone to have difficulty with those.

It’s nonetheless useful to have a canned version of this algorithm to avoid off-by-one errors or limit errors which can be tricky. This is the function I use:

# Search on the interval [lo,hi)                                vvvv
# Return first index where test_func is True  -> [False..False, True..True]
# This will fail when the list is all False
def bin_search(lo, hi, test_func):
    print('Calling bin_search with ', lo, hi)
    if lo == hi - 1:
        return lo
    mid = (hi+lo-1)//2
    if test_func(mid):
        return bin_search(lo, mid + 1, test_func)
    else:
        return bin_search(mid + 1, hi, test_func)

Example usage (find index of 1000):

a = [10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000]
assert bin_search(0, len(a), lambda i: a[i] >= 1000) == 2

numpy also provides numpy.searchsorted.

Sometimes Code Jam throws a spin on the problem by not having an explicit list/array, but requiring a dynamically calculated value, since generating the entire list would be too time-consuming (essentially linear-time itself, at which point you have done a linear search, not a binary search). This is why I designed my helper to take a function instead of an array, it’s more generic.

Binary search is so powerful, it can even be used to invert any monotonic function, such as the triangle number function discussed above. Simply do a binary search over the x values, and check when you match the desired y value. This is very fast for both integer and float domains (x values), and an example of a dynamically calculated value as just mentioned. So inverting tri(4) = 10:

tri = lambda n: n * (n + 1) // 2
assert bin_search(0, 20, lambda i: tri(i) >= 10) == 4

Note that this rounds up instead of down, so it only works with perfect triangle numbers. A fully-features version with rounding-down would be:

invtri = lambda t: bin_search(0, 10**10, lambda i: tri(i) > t) - 1

More generally:

def invert_monotonic_int_function(fct):
    return lambda y: bin_search(0, 10**20, lambda x: fct(x) > y) - 1invtri = invert_monotonic_int_function(tri)

Example Problems:
- Blindfolded Bullseye: Straightforward once you handle the interactive part.
- Haircut
- Bit Party
- Cubic UFO: Used to invert a function. A direct linear algebra solution is also possible.

Example Solutions:
- Blindfolded Bullseye

Dynamic Programming

Dynamic Programming (DP) is such a staple of Code Jam, they have even joked about it:

Sometimes our pangrams contain confidential information — for example, CJ QUIZ: KNOW BEVY OF DP FLUX ALGORITHMS — so we need to keep them secure.

A short explanation is that Dynamic Programming can be applied to recursive problems with overlapping subproblems. The trick is to solve the problem “forward” (from base case to full problem) instead of “backward”.

This is typically done with a 2D table. But it works just as well on a 3D table (see example problems) or a 1D table. I will use Fibonacci numbers as an example to explain DP, which actually just require a 1D table. Fibonacci numbers are defined as:

def fib(n: int):
    if n <= 2: return 1
    return fib(n - 1) + fib(n - 2)

This means the n-th Fibonacci number is the sum of the previous two Fibonacci numbers, the 1st and 2nd Fibonacci number are 1. This recursive definition causes a lot of unnecessary work however:

units_of_work = 0
def fib(n: int):
    global units_of_work
    if n <= 2: return 1
    units_of_work += 1
    return fib(n - 1) + fib(n - 2)
fib(30)
assert units_of_work == 832039

This is because subproblems have to be computed many times. For example, f(30) = f(29) + f(28) = f(28) + f(28) + f(27), so we have to compute f(28) twice instead of once, leading to a huge computation tree. We do almost a million units of work, in fact the amount of work we do is as large as the 30th Fibonacci number itself (832040).

One fix is to cache calls using lru_cache:

from functools import lru_cache
units_of_work = 0
@lru_cache()
def fib(n: int):
    global units_of_work
    if n <= 2: return 1
    units_of_work += 1
    return fib(n - 1) + fib(n - 2)
fib(30)
assert units_of_work == 28

More explicit is using a DP table and doing the calculations “forward” instead of “backward”:

units_of_work = 0
fib = [1] * 31
for n in range(3, 31):
    units_of_work += 1
    fib[n] = fib[n-1] + fib[n-2]
assert units_of_work == 28

Things become slightly more tricky in 2D and when using Python.

In a typical DP problem, each cell takes some sort of maximum of two ancestors (for example, the cell directly to the top and directly to the top-left). Then we iterate over all the rows and apply this maximum for each cell. Usually you will need to initialize the first row separately then, and also make sure that the ancestor is not out of bounds (tricky in Python as negative numbers are legal indexes!).

Python is usually too slow for this task. Here is where numpy comes in. If we use a numpy array instead of a list of lists, and program smartly, we can achieve a big speedup. What we will do is consider each row as a numpy array and apply some sort of numpy operation on it, like shifting by 1, or even just adding a constant. This essentially moves our for-loops from Python to numpy-internal C code which is leaps and bounds faster. However, we must be careful to avoid accidentally using the slow Python iteration (except for the outermost loop) and if-statements.

I know this is a lot of information, but let us look at a problem to understand it in depth.

Dynamic Programming (Example: Ant Stack)

In Ant Stack, we are given the following parameters:
- Input is a sequence of ants, given as a simple list of weights
- Find a stack where each ant carries no more than 6 times its weight
- The output stack must be a subsequence of the input
- Ants weight up to 10**9 milligrams

At first we can do some limit analysis (discussed below) to figure out that we can never have more than 139 ants in a stack, because even the heaviest ant wouldn’t be able to carry any more. The exact reason is not important, but this way we can set up a table with dimensions MAXANTS x NUMANTS = 139 x 10**5. For simplicity (and because I think I got a memory limit error) we will only be storing the most recent row of size 139 (for technical reasons 141), in many DP problems you can just store the recent row instead of the table.

We will get a table with axes like this:

    0   1   2   3  ... 140 (ants on stack)
__
W1
W2
W3
…
W10000
(input ant weights)

A cell (j, Wi) represents the weight of the best/lowest-weight ant stack of (ants 1 to i) and (size j) so far. Knowing which variables to put on which axes is a bit of an art, but let us just roll with it. How can we recursively obtain the next row from the previous row? We want to add the ant if it’s possible and skip it otherwise:

DP[i,j] = DP[i-1,j-1] + Wj if 6 * DP[i-1,j-1] <= Wj else DP[i-1,j]

Just like discussed earlier, the recursion looks at the direct top and topleft ancestors.

You might argue that this allows us to pick a suboptimal ant. Let’s say we are in column j and pick an ant Wi but W_i+1 is better because it weighs slightly less but can still lift everyone above. When we process W_i+1 later we will overwrite that value for j since we recurse into the j-1 case — this is why it’s important to keep all the columns with the partial problems around.

For the input [31, 30, 6, 6] the ideal solution is to skip the ant with weight-31 because our final weight-6 ant can only carry up to weight-36, yielding a weight-42 stack consisting of 3 ants. This solution bubbles up nicely:

         0   1   2   3   4   5 (ants on stack)_____    0 inf inf inf inf inf
W1=31    0  31 inf inf inf inf
W2=30    0  30  61 inf inf inf
W3= 6    0   6  36 inf inf inf
W4= 6    0   6  12  42 inf inf
(input ant weights)

Here is the full code. Note that I defined a large int as pseudo-infinity to avoid floating point numbers, and am using this value in the minimum calculation to avoid if-statements and use the numpy fast loops. At the end I return the size of the largest possible (so non-infinity) stack.

def run(ants):
    n = len(ants)
    MAXANSWER = 139
    INFTY = MAXANSWER * 10**9 + 1
    num_cols = min(MAXANSWER+2, n+2)
    DP = numpy.full(num_cols, INFTY, dtype=int)
    DP[0] = 0    for i in range(n):
        add_this_ant = (DP + ants[i]) + INFTY * (DP > 6 * ants[i])
        add_this_ant = numpy.concatenate((add_this_ant[-1:], add_this_ant[:-1]))
        DP = numpy.minimum(DP, add_this_ant)   # next row    return numpy.argmax(DP >= INFTY) - 1   # finds first 'True'

A lot of this is counter-intuitive but after a bit of practice and reminding yourself, you will start seeing DP answers to problems.

Be familiar with:
- The Knapsack Problem: The archetypical DP problem

Example Problems:
- Ant Stack: 2D DP table. Problem discussed above, the other ones are harder
- Edgy Baking
- Red Tape Committee: 3D DP table
- Graceful Chainsaw Jugglers: 3D DP table

Modular Arithmetic & Primes

Modular Arithmetic should be covered by any good college CS course. Key concepts to review and be familiar with are:

- How to check if a number n is prime (in short: try to divide by all numbers between 2 and the square root of n, using modulo)
- GCDs and the Euclidean algorithm. Know that two numbers are coprime if their GCD is 1.
- Euler’s totient (prime-counting) function
- The Fundamental Theorem of Arithmetic
- Chinese Remainder Theorem

Let us analyze the recent Broken Clock problem as an example. We found an unlabeled clock in the basement, meaning we don’t know which hand is the seconds/minutes/hours hand, and in the larger test cases we don’t know what direction 12 o’clock is, and we’re asked to find the time shown. Because the problem guarantees there is always one solution, we can actually solve the first part by just trying all 6 different assignments of seconds/minutes/hours to the hands and checking which one is valid.

The problem asks us to deal with millisecond time, so the problem defines a “tick” as the smallest amount any hand will move in a millisecond, specifically the hour hand, which means it takes T = 12 * 60 * 60 * 10**9 = 43200000000000 ticks to go full circle around the clock. Every millisecond advances the hour hand by 1 tick, the minutes hand by 12 ticks, and the seconds hand by 12*60 = 720 ticks. The hours hand will always be on its first rotation since the actual time is between noon and midnight, but the other hands could have crossed 12 o’clock multiple times, leading to the following equations:

    t             =  hours_hand
    (12 t)  %  T  =  minutes_hand
    (720 t) %  T  =  seconds_hand

Let us rewrite this in modular arithmetic for reasons that will become clear later. The following just means that both sides are equal once taken mod T:

    t        ===T  hours_hand
    (12 t)   ===T  minutes_hand
    (720 t)  ===T  seconds_hand

So, for the small test case, we know t (the actual number of milliseconds since midnight) immediately, and can just check if all equations match, and return the actual time since midnight (can be trivially calculated from the milliseconds since midnight).

For the large case, the clock can be rotated any which way, leading to each hand being off by some number of ticks r:

    r + t        ===T  hours_hand
    r + (12 t)   ===T  minutes_hand
    r + (720 t)  ===T  seconds_hand

If we subtract the first two equations we get

    (11 t)   ===T  minutes_hand - hours_hand

Notice how the r disappeared! We can figure out t directly from the minutes and hours hands.

The only task left is to “divide by 11” on both sides. In modular arithmetic, this can be done by multiplying with the modular inverse. The modular inverse of a number M1 is a number M2 such that M1 M2 = 1. The naive algorithm to find a modular inverse is to simply try all numbers from 0 to T-1 and check which one results in 1 after multiplication and modulo. A more efficient way is using the extended Euclidean algorithm, which I recommend studying.

For Python 3.8+, finding the modular inverse is actually baked straight into the language:

>>> pow(11, -1, 43200000000000)
15709090909091

Thus we simply have

    t   ===T  15709090909091 (minutes_hand - hours_hand)
    r   ===T  hours_hand  -  t

Sadly the judge uses Python 3.7 so I will hardcode that constant for now. Full solution:

T = 12 * 60 * 60 * 10**9  # 43200000000000 ticks on the clock
modular_inverse_of_eleven = 15709090909091def try_assignment(seconds, minutes, hours):
    t = ((minutes - hours) * modular_inverse_of_eleven) % T
    r = (hours - t) % T    if not (
            (t + r) % T == hours and
            (12 * t + r) % T == minutes and
            (720 * t + r) % T == seconds
    ):
        return    return (
        f'{t // (60 * 60 * 10**9)} '
        f'{(t // (60 * 10**9)) % 60} '
        f'{(t // 10**9) % 60} '
        f'{t % 10**9}'
    )def run(data):
    a, b, c = data
    return (
            try_assignment(a, b, c) or
            try_assignment(a, c, b) or
            try_assignment(b, a, c) or
            try_assignment(b, c, a) or
            try_assignment(c, a, b) or
            try_assignment(c, b, a) or
            "no result found, this should never happen"
    )

Example problems:

- Part Elf: Relatively straightforward application of GCD
- Broken Clock: The problem from the previous section (modular arithmetic)
- Prime Time: This problem is also about analyzing limits/boundaries
- Golf Gophers

Limits / Boundaries

This is sort of a meta-category. Many problems actually don’t require a very smart or fast algorithm, and can be solved with a standard algorithm by analyzing what maximum values occur. Essentially you are analyzing what NOT to compute, so the parts you can skip.

Example Problems:

- Prime Time: Actually a Primes problem
- Ant Stack: Actually a DP problem.

Floating Point Precision

This is another meta-category. Remember this: A float has 7 digits of precision, a double has 15 digits of precision. For example, in Incremental House of Pancakes the inputs range up to 10**18, this is already a red flag because that number has 18 digits which is in fact more than 15. You can even play around with this in Python and see that 10**18 and 10**18 + 64 are all mapped onto the same float:

>>> float(10**18)
1e+18
>>> float(10**18 + 64)
1e+18
>>> float(10**18 + 65)
1.0000000000000001e+18

Try using ints and int operations whenever possible. In Python, this means using // instead of / when appropriate. In that particular problem, using the integer square root instead of the normal floating point square root was the key.

Catastrophic Cancellation is another floating point issue that can occur. When subtracting two floats close to each other, you will be left with a lot less significant digits, since the leading digits that were close cancel each other out. With above example:

>>> float(10**18 + 65) - float(10**18 + 64)
128.0

One possible fix here is to avoid subtracting values when removing an item from your current total, but instead re-calculating it by summing the original value of all remaining items. To be extra safe, always sum from low to high, because once your sum grows large, adding small values will cause a larger absolute error. This only matters when summing a huge number of small values however. Multiplication and division of float numbers is numerically well-behaved and should be okay.

Example Problems:

- Kiddie Pool: Catastrophic Cancellation
- Incremental House of Pancakes

Interactive Problems

Interactive problems don’t require you to read an input file and produce an output file, but instead require you to communicate with another problem by reading and writing one line at a time. In my opinion, IP tend to have a very low number of attempts and correct answers compared to their difficulty. Most IP are actually easy, but I think people just cannot figure out how to use the interactive runner or debug the program. Or maybe they are afraid of the probabilistic nature of them (you get random input thrown at you and need to be better than some threshold, which is hard to prove). My advice: Never miss out on them! They will put you ahead of the competition for relatively little effort.

Example Problems:

- Lollipop Shop: Easiest
- Golf Gophers
- Pottery Lottery
- Digit Blocks

Graphs / Trees

Graph algorithms is a large field on its own. What you should focus on is knowing depth-first search (DFS) and trees, although most graph and tree algorithms I had to use in Code Jam were pretty intuitive.

The one non-obvious algorithm I’ve encountered multiple times is the Hungarian Algorithm. Note that the Python judge DOES include numpy and scipy so you can solve these types of problems using scipy.optimize.linear_sum_assignment.

DFS:
- BFFs

Tries:
- Alien Rhyme

Maximum-Bipartite-Matching / Hungarian Algorithm:
- Technobabble: Very straightforward
- Costume Change
- Energy Stones

Maximum-Flow:
- Bilingual

Google Code Jam — How To Prepare

Study guide for succeeding at competitive programming

Introduction

Greedy & Sort-By-Goodness

Triangle Numbers

Binary Search

Dynamic Programming

Dynamic Programming (Example: Ant Stack)

Modular Arithmetic & Primes

Limits / Boundaries

Floating Point Precision

Interactive Problems

Graphs / Trees

Written by Konopka Kodes Blog