Using Kotlin `fold` into collections

4 min readNov 12, 2022

In this article I’m going to show the significant performance downsides associated with using Kotlin’s fold into immutable collections such as list/set.

Ignoring the fact this is the equivalent to collection.toList() for a moment. What is wrong with this?:

collection.fold(emptyList()) { list, item -> list + item }

Does it look OK? What about the big O for this? What if I told you it was O(n²) where n is the size of the collection?

Just to explain if you don’t know of need a refresher on big O. What I’m claiming is that when the input size (n) doubles, the execution time squares. So for example suppose 100 items took 50 milliseconds, then 200 items will take 50x50 = 2,500 milliseconds, or 2.5 seconds!

This can be problem for developers as you often test with small datasets, and your users can often see larger sets in the wild.

Cause

Typically O(n²) performance means that somewhere there’s the equivalent to a nested loop in some way, like:

collection.forEach { outerItem -> 
  collection.forEach { innerItem ->
  }
}

We can easily see that the outer for loop is the fold as it can be written as so:

var result = emptyList()
collection.forEach { item ->
    result = result + item
}

There’s only one method called now, the +, the add method. And so this must be the cause of the other loop. Due to the immutable nature of the list result, a new list must be created with as many items as it had before and one additional. A reference to each existing item and the new one must be copied over to the new list.

For example, suppose our collection is 5 items long:

On the first loop, 1 item copied over in to a new list
Second loop, 2 items copied over
Third loop, 3 items copied over
Forth loop, 4 items copied over
Fifth loop, 5 items copied over
Total copy operations 1 + 2 + 3 + 4 + 5 = 15

So we can come up with a formula for how many copy operations we expect, it’s the sum of the series [1..n]:

f(n) = ((1 + n)*n)/2
     = (n + n²)/2
     = n/2 + n²/2

From this we can calculate the Big O.

O(n/2 + n²/2)
= O(n²/2) because we drop smaller terms and n < n²
= O(n²)   because we drop constant multipliers

Proof

We can write a quick example that doubles the size of the input and sees the effect, I’ve also for comparison added the toList .

fun main() {
    var n = 1
    // warm up
    test(n)
    println("n, fold duration, toList duration")
    (0..20).forEach { _ ->
        println(test(n))
        // double n
        n *= 2
    }
}

fun test(n: Int): String {
    // make a list of size n
    val collection = mutableListOf<String>()
    // fill it
    (1..n).forEach { i -> collection.add(i.toString()) }
    val foldDuration = time {
        val result: List<String> = collection.fold(emptyList()) { list, item -> list + item }
    }
    val toListDuration = time {
        val result: List<String> = collection.toList()
    }

    return "$n, $foldDuration, $toListDuration"
}

private fun time(block: () -> Unit): Long {
    val startTime = System.currentTimeMillis()
    block()
    return System.currentTimeMillis() - startTime
}

n, fold duration, toList duration
1,      0,        0
2,      0,        0
4,      0,        0
8,      0,        0
16,     0,        0
32,     0,        0
64,     0,        0
128,    0,        0
256,    1,        0
512,    1,        0
1024,   1,        0
2048,   5,        0
4096,   14,       0
8192,   43,       0
16384,  204,      0
32768,  528,      0
65536,  1784,     0
131072, 7695,     1
262144, 66206,    1

Charting the result we see a clear exponential growth in the time taken. toList is too fast to measure on the millisecond scale, but at least does not show the exponential properties.

Chart of fold duration against size of input

At the start I gave an example of an input of size 100 and 50 ms and it squaring when you double the input size. You may notice that is not happening in the results here. We double the input but the duration is less than the square of the previous. This is because of those lower terms, remember how we started with O(n/2 + n²/2). There’s also a hidden multiplier on the actual time a copy operation takes. However, the larger n gets, the less significant these terms and constant multipliers become, and so big O is really an indication of the duration as the sizes tend to infinity.

StringBuilder

This actual problem should be familiar to most Java/Kotlin developers although you might not realize the reason was as above.

// we don't do this:
var result = ""
(0..100).forEach { i -> 
   result += i
)

Instead of appending to a string in a loop, we’ve all been taught to use a StringBuilder and it is for the exact same reason, the String is immutable and each addition to it requires a copy of all the chars that are currently in the string.

// we do this
val result = StringBuilder()
(0..100).forEach { i ->
    result.append(i)
}

Conclusion

Even seemingly insignificant operations like + can hide large performance costs.

While you’re unlikely to have lists large enough to cause significant time issues, remember there are additional costs here as well such as the subsequent garbage collection of all the intermediary fold results that were allocated.

So while there’s no lint right now for this, it’s worth at least grepping your code base for instances of fold(empty . If you’re unsure what to replace them with, leave a comment.

Using Kotlin fold into collections

Cause

Proof

StringBuilder

Conclusion

Written by Alan Evans

Using Kotlin `fold` into collections