An educated guess (Timur Fanshteyn): Multithreading will only take you so far

Working with Oracle Coherence, I do a lot of thinking about distributed architecture, parallel processing, and multithreading. Making use of all this technology is a great way to solve many problems. It can often seem, that as long as you can split your problem into small enough pieces, you’ll be able to process data instantaneously.

When thinking about distributed system, we often forget that making a system distributed, we still have a limited number resources to distribute the workload. The application is deployed on a specific number of machines, each with a specific number of CPU Cores. Each CPU core can process one instruction at a time (not completely true, but for simplicity’s sake). For example: In a distributed system where each workload takes 1 second to process, and we have a total of 10 workloads that need to be processed, will take at least 3 seconds to process all of them

Adding more threads will not work, only adding more CPU cores will. That is critically important when you consider that for many complex operations, a result of individual workload is not enough to provide a meaningful result to the end user. Results from all requested workloads have to be aggregated to create a final result. This makes it relatively simple to figure out how much a calculation will take:

Total Time = Number of Work Loads / Number of Cores * Time Take by Work Load

Another very import point to remember, is that during performance testing, distributed systems behave differently. Requests from multiple clients will interfere a lot more with each other then they do in a straight processing system. In a prior example, a request that takes 3 seconds when ran from 1 client, can take 6 seconds with 2 clients, if the last work item for the first client, is started after all work items of the second client.