Here comes the Loom, concurrency will never be the same again

I have been following Project Loom almost since it first appeared, and I am very excited about how good a job the committers have done. It wasn’t easy to deliver, and there is still a lot to do, but we can already enjoy a great solution.
JDK 21 has arrived, so we can say Project Loom is production-ready, or at least part of it is.

The eternal problem

Many applications are written in JVM languages, and most of them are concurrent: servers, brokers, databases, or any application that needs concurrency because the business logic requires it. The requirement is simple: serve many requests. Requests arrive concurrently and compete for computational resources, and serving them all well is not easy. Applications very often struggle with this, and it is always the same problem: how to achieve the best performance with the resources we currently have. We can add resources, but sometimes the sky is not the limit and the real limit is much lower.

A few years ago we heard that:

There are only two hard things in Computer Science: cache invalidation and naming things.


Phil Karlton

But when we look at our applications, we should also remember this one:

The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software


Herb Sutter

Personally, I agree, and I think the third and maybe the most important hard thing is concurrency in our applications.

Okay, but let’s cut the bullshit. What was the issue?

  • Resource problems
  • Scalability
  • Efficiency

And last but not least: money. Money is the main problem, because with enough of it some of these issues can quickly be swept under the rug.

Let’s start from the beginning

You need to know that, traditionally, every thread in the JVM is just a thin wrapper around an OS thread. The OS reserves a large amount of memory (~2MB) for each one to store the thread context, Java call stacks, etc. Our problem is that we want to write concurrent programs in a simple way and create a new thread for every task.

The way to write concurrent programs is to create a new thread for every task

Thread per task model
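
To make the thread-per-task model concrete, here is a minimal sketch of it with plain platform threads. The class name `ThreadPerTask` and the `handle` method are hypothetical stand-ins for real work, not something taken from any framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPerTask {

    // Hypothetical stand-in for real work, e.g. handling one request.
    static void handle(int taskId, AtomicInteger completed) {
        completed.incrementAndGet();
    }

    // Starts one platform thread per task and waits for all of them to finish.
    static int runAll(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            final int taskId = i;
            Thread thread = new Thread(() -> handle(taskId, completed));
            thread.start();
            threads.add(thread);
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed tasks: " + runAll(100));
    }
}
```

The code is easy to read, which is the appeal of the model. The cost is that every task pays for a full OS thread.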

If you look at your JVM technology stack, I am pretty sure you use at least one project like Netty, Tomcat, Quarkus, Micronaut, Spring, Elasticsearch, Hibernate, etc. These projects do I/O, which is one of the root causes of why our applications suffer and can’t achieve better performance. I am not saying it is impossible, but it is hard, and sometimes nobody in the organisation knows how to achieve it.

Concurrency for I/O

When I think about Project Loom and virtual threads, I consider them concurrency for I/O; the solution is perfect for that.
We also have to understand the difference between concurrency and parallelism.

Schedule multiple largely independent tasks onto a set of computational resources, but not necessarily simultaneously – throughput (tasks/time unit)

Concurrency

Speed up a task by splitting it into sub-tasks and exploiting multiple processing units – latency (time unit)

Parallelism
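
The two definitions can be sketched side by side. This is only an illustration under my own assumptions: the class and method names are hypothetical, the "tasks" are trivial computations, and a small fixed pool stands in for "a set of computational resources".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

public class ConcurrencyVsParallelism {

    // Concurrency: many independent tasks share a few threads.
    // What we care about is how many tasks finish per time unit.
    static int runIndependentTasks(int taskCount, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                pending.add(executor.submit(() -> id * 2)); // each task is independent
            }
            int finished = 0;
            for (Future<Integer> result : pending) {
                result.get();
                finished++;
            }
            return finished;
        } catch (InterruptedException | ExecutionException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            throw new IllegalStateException("Task failed", e);
        } finally {
            executor.shutdown();
        }
    }

    // Parallelism: one task (a sum) is split into sub-tasks across cores.
    // What we care about is how long this single task takes.
    static long sumOneToN(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(runIndependentTasks(100, 4) + " independent tasks finished");
        System.out.println("Sum: " + sumOneToN(1_000));
    }
}
```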

Reactive approach

The problem with resources and thread handling was supposedly resolved by the reactive approach. Reactive programming models address these limitations by releasing the thread when an operation would block, for example on network or file I/O. Once the blocking call has completed, processing of the request continues, using a thread again. This model uses threads much more efficiently for I/O workloads. Unfortunately, there is a price: programmers struggle with the style, and the code is hard to debug, observe, or even maintain after a couple of months. Project Loom addresses exactly that issue.
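
To show the difference in style, here is a small sketch of the same logic written twice. I use `CompletableFuture` from the JDK as a stand-in for a reactive library, and `fetchUser` and `fetchOrders` are hypothetical I/O calls simulated with constant results.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncVersusBlocking {

    // Hypothetical I/O calls, simulated here so the example is self-contained.
    static String fetchUser(int id) {
        return "user-" + id;
    }

    static String fetchOrders(String user) {
        return user + ":orders";
    }

    // Asynchronous style: each step is a callback chained on a future,
    // and error handling becomes one more stage in the chain.
    static CompletableFuture<String> loadAsync(int id) {
        return CompletableFuture
                .supplyAsync(() -> fetchUser(id))
                .thenApply(AsyncVersusBlocking::fetchOrders)
                .exceptionally(error -> "fallback");
    }

    // Blocking style: plain sequential code with ordinary try/catch
    // and a normal stack trace when something goes wrong.
    static String loadBlocking(int id) {
        try {
            String user = fetchUser(id);
            return fetchOrders(user);
        } catch (RuntimeException error) {
            return "fallback";
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAsync(7).join());
        System.out.println(loadBlocking(7));
    }
}
```

Both methods produce the same result. The second one is the style that virtual threads let us keep without paying for an OS thread per request.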

Threads

You need to understand that a thread is just an abstraction, and we use it as a black box. A thread is a unit of work in which we can execute our logic separately from the current thread of execution.
The problem is that there are always too few of them, because they require a lot of resources. We try to use thread pools, but they are sometimes hard to manage, and programmers often don’t know how to use or tune them.

Sometimes it is not that simple. In the thread-per-request model, the calculation is simple…

Imagine that a single HTTP request takes 1 second because it waits for I/O, a database, or anything else; the reason doesn’t matter, the thread is blocked for 1 second… You can see that throughput is always bounded by the number of threads we have.
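
As a worked example of that calculation: if every request holds one thread for its whole duration, the upper bound on throughput is the number of threads divided by the time one request takes. The pool size of 200 below is my own hypothetical number, not a value from any particular server.

```java
public class ThreadPerRequestMath {

    // Upper bound on throughput when each request occupies one thread
    // for its whole duration: threads / seconds per request.
    static double maxRequestsPerSecond(int threads, double secondsPerRequest) {
        return threads / secondsPerRequest;
    }

    public static void main(String[] args) {
        int poolSize = 200;            // hypothetical thread pool size
        double blockedSeconds = 1.0;   // each request blocks for 1 second, as in the text
        System.out.println(maxRequestsPerSecond(poolSize, blockedSeconds) + " requests/second at most");
    }
}
```

With these numbers the server can never exceed 200 requests per second, no matter how idle the CPU is while the threads wait.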

Thread types

With Project Loom we distinguish three types of threads:

  • Platform thread
  • Carrier thread
  • Virtual thread

Platform thread

A platform thread is the standard thread that is created every time you call `new Thread`.
A platform thread needs ~1 ms to schedule and consumes ~2MB of stack, so you should think of it as a fast but expensive thread managed by the OS. You should also be aware that task switching requires a switch into the kernel and takes ~100µs (depending on the OS). The OS scheduler is a compromise that has to suit all usages, and cache locality is poor.

Carrier thread

The number of carrier threads depends on how many cores the machine has. A carrier thread is the same as a platform thread, but carrier threads are hidden from the programmer. They work in a ForkJoinPool that schedules virtual threads.

Virtual thread

A virtual thread is a lightweight user-mode thread which consumes much less memory than a platform thread does. It is not tied to one particular platform thread: it occupies a carrier thread only while it runs and releases it when it blocks, so blocking code becomes cheap. You have to be aware that CPU cache misses are possible when you use virtual threads, but I will describe that in more detail later. A lot of engineers were afraid of the GC impact, but a virtual thread is not a GC root. A virtual thread is cheap to create, cheap to destroy and cheap to block, which is a game changer.

Virtual threads are mostly intended for writing I/O-bound applications such as servers and message brokers. We can achieve higher concurrency only if the system has the additional resources that concurrency needs, e.g. available connections in the connection pool and sufficient memory to serve the increased load. Virtual threads also increase efficiency for short-cycle tasks. They aren’t for everything; in particular, they are not suited to CPU-bound or long-running tasks.

Engineers who care about performance have to be aware that non-realtime kernels primarily employ time-sharing when the CPU is at 100%.
Non-realtime kernels schedule relatively fast, but their priority implementation isn’t that good.
Real-time kernels do prioritise well, but their scheduling is slow. There is always a trade-off between predictability and speed.

Virtual threads are not an execution resource, but a business logic object like a string.
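
To get a feeling for "cheap to create, cheap to destroy and cheap to block", here is a minimal sketch that runs many blocking tasks, each on its own virtual thread. It assumes JDK 21 or newer; the class name and the use of `Thread.sleep` as a stand-in for blocking I/O are my own choices for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyVirtualThreads {

    // Runs taskCount blocking tasks, each on its own virtual thread.
    static int runBlockingTasks(int taskCount, Duration blockTime) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    Thread.sleep(blockTime); // stands in for blocking I/O
                    return completed.incrementAndGet();
                });
            }
        } // close() waits until all submitted tasks have finished
        return completed.get();
    }

    public static void main(String[] args) {
        int done = runBlockingTasks(10_000, Duration.ofMillis(100));
        System.out.println(done + " tasks completed");
    }
}
```

Nothing here pools or reuses threads. A virtual thread is created for each task and thrown away afterwards, just like any other short-lived object.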

I don’t want to discuss how to use virtual threads and platform threads yet, but for those who want to see how quickly we can create virtual threads, simple code is below.

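final Thread thread1 = Thread
        .ofPlatform()
        .unstarted(() -> System.out.println("Hello from " + Thread.currentThread()));
final Thread thread2 = Thread
        .ofVirtual()
        .unstarted(() -> System.out.println("Hello from " + Thread.currentThread()));
// unstarted() only builds the threads, so start them explicitly
thread1.start();
thread2.start();
// wait for both; join() throws InterruptedException, and the virtual thread is a daemon
thread1.join();
thread2.join();

Output (thread numbers may differ):

Hello from Thread[#22,Thread-0,5,main]
Hello from VirtualThread[#23]/runnable@ForkJoinPool-1-worker-1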

Fast forward to today

A virtual thread is a user-mode thread and an instance of java.lang.Thread.
A virtual thread is scheduled by the JVM, not the OS.
A platform thread is also an instance of java.lang.Thread, but implemented in the “traditional way”: a thin wrapper around an OS thread.
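
Because both kinds are plain java.lang.Thread instances, the same code can handle either and ask which one it got. A minimal sketch, assuming JDK 21 or newer (the class name `ThreadKinds` and the `describe` helper are hypothetical):

```java
public class ThreadKinds {

    // Works for any Thread; isVirtual() tells the two kinds apart.
    static String describe(Thread thread) {
        return thread.isVirtual() ? "virtual" : "platform";
    }

    public static void main(String[] args) {
        Runnable noop = () -> { };
        Thread platform = Thread.ofPlatform().unstarted(noop);
        Thread virtual = Thread.ofVirtual().unstarted(noop);

        System.out.println(describe(platform)); // prints "platform"
        System.out.println(describe(virtual));  // prints "virtual"
        System.out.println(virtual.isDaemon()); // prints "true": virtual threads are always daemon threads
    }
}
```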