Benchmarking a PHP application with php-fpm vs swoole/openswoole

How realistic is this benchmark?

Before people start pulling pitchforks out of their basements (or sheds, whatever): I do realize this is NOT a particularly realistic test scenario. In real life, your API is unlikely to just serve static JSON responses without any I/O calls. Even a single I/O call (MySQL/Postgres/Elasticsearch/Mongo) will slow things down dramatically and largely even out these numbers. That holds at least until we throw in async I/O and the parallelization of I/O calls that swoole/openswoole makes possible.

However, even without I/O calls, this benchmark of otherwise identical applications demonstrates that re-bootstrapping the application on every request comes with costs that are hard to ignore.

Now with this disclaimer out of the way – let us proceed.

What application is being tested?

I’ve explained how to set up the mezzio skeleton application with either swoole/openswoole or php-fpm in two recent posts, see here:

This was obviously done in preparation for something, and that something is this benchmark (and other posts I hope will follow).

You may ask: why the mezzio PHP framework? Because it’s a nice, minimal and modern little framework that supports swoole/openswoole as well as php-fpm out of the box, which makes it easy to set up near feature-identical applications for a head-to-head benchmark. And since it’s a real micro application that includes several middlewares, logging, and PSR-7 marshalling/unmarshalling for both swoole and php-fpm, such a benchmark is more realistic than the plain “echo OK” we see in some other places.

Test system specs

I’m going to run this benchmark on my local home micro server running Ubuntu 22.04 LTS, with an i7-6700T CPU (4 cores / 8 threads) and 32GB of DDR4 RAM.

linuxdev@hs-dev1: /tmp: neofetch
            .-/+oossssoo+/-.               linuxdev@hs-dev1 
        `:+ssssssssssssssssss+:`           ---------------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 22.04.1 LTS x86_64 
    .ossssssssssssssssssdMMMNysssso.       Host: HP EliteDesk 800 G2 DM 35W 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 5.15.0-47-generic 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 5 days, 45 mins 
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 1057 (dpkg), 6 (snap) 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: zsh 5.8.1 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Terminal: /dev/pts/21 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   CPU: Intel i7-6700T (8) @ 3.600GHz 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   GPU: Intel HD Graphics 530 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Memory: 2246MiB / 31984MiB 

We are going to base this benchmark on the 2 dockerized stacks referenced above; you can find the source code in the following git repos:

Using wrk to create load

For this test I will be using wrk, a modern HTTP benchmarking tool. From its description:

wrk is a modern HTTP benchmarking tool capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with scalable event notification systems such as epoll and kqueue.

Since the CPU for this test has 8 threads, we will set the wrk thread count to the same number (-t8) to better utilize the CPU. Each run is limited to 10 seconds (-d10s), and the number of open connections (-c) is stepped from 100 to 1000 in increments of 100.
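
The sweep described above (fixed -t8 and -d10s, -c from 100 to 1000) can be scripted instead of typed by hand. Below is a minimal PHP sketch that only builds and runs the wrk command lines. The script name sweep.php and the URL in the usage comment are placeholders, not the endpoint used in these runs, and the helper name wrkSweep is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: build wrk command lines for a connection sweep
 * (-c100 .. -c1000 by default), keeping threads (-t8) and duration (-d10s)
 * fixed as in this benchmark.
 */
function wrkSweep(
    string $url,
    int $threads = 8,
    string $duration = '10s',
    int $from = 100,
    int $to = 1000,
    int $step = 100
): array {
    $commands = [];
    foreach (range($from, $to, $step) as $connections) {
        $commands[] = sprintf(
            'wrk -t%d -c%d -d%s %s',
            $threads,
            $connections,
            $duration,
            escapeshellarg($url) // quote the URL for the shell
        );
    }

    return $commands;
}

// Usage (placeholder URL): php sweep.php http://127.0.0.1:8080/api/ping
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    foreach (wrkSweep($argv[1]) as $command) {
        echo $command, PHP_EOL;
        passthru($command); // streams wrk's report straight to stdout
    }
}
```

Running the same sweep against each stack keeps the only variable the server side.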

About swoole vs php-fpm processing differences

Swoole has two working modes: SWOOLE_BASE and SWOOLE_PROCESS. The naming can be a little confusing. BASE does not necessarily mean basic, and it doesn’t mean the server can’t have multiple worker processes (yes, it can, and you can even use addProcess to add sidecar processes). Likewise, PROCESS mode isn’t the only mode capable of multiprocessing (through worker_num or task_worker_num) or of running additional processes via addProcess ¯\_(ツ)_/¯.

Below I’ve listed the high-level differences (to the best of my abilities, so please correct me if you disagree with something).

php-fpm working model: the master process starts a worker pool, which maintains multiple worker processes (how many depends on the configuration). Each worker process accepts connections and uses the FastCGI protocol to talk to a web server in front of it, such as nginx or apache. A worker blocks until PHP has prepared the response, and it can only process one request at a time. That is not great for concurrency when requests spend most of their time waiting on I/O.

SWOOLE_BASE working model: it uses the same processing model as node.js or nginx (as in single-process and single-threaded). However, you can still set the worker_num option to spin off multiple worker processes that handle requests. In that case it is more like node.js with the pm2 process manager (in my opinion).

You cannot use dispatch_mode to choose a more specific load-balancing strategy across workers when running the server in SWOOLE_BASE mode.

If a worker process terminates for any reason, it is restarted automatically.

And yes, you can still attach extra processes to the server using addProcess (at least to me this makes the swoole mode names a little confusing, but it’s probably just me 🙂).

SWOOLE_PROCESS working model: SWOOLE_PROCESS runs swoole as a multi-threaded reactor that accepts connections and passes them on to multiple worker processes. By default you get one reactor thread per CPU core. A lot of this is configurable, such as the number of reactor threads and the number of worker processes.
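
To make the mode differences concrete, here is a minimal sketch of a bare swoole HTTP server that answers like a ping endpoint. It is not the mezzio-swoole setup used in this benchmark. It assumes the swoole extension (openswoole 22+ uses different namespaces and helpers) and a hypothetical port 8080. The settings are illustrative: reactor_num and dispatch_mode are only set in SWOOLE_PROCESS mode, because dispatch_mode is not available in SWOOLE_BASE. The helper swooleSettings is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: illustrative settings for the chosen mode.
 */
function swooleSettings(bool $processMode, int $cpuNum): array
{
    // Worker processes exist in both modes.
    $settings = ['worker_num' => $cpuNum];

    if ($processMode) {
        // Reactor threads and dispatch_mode only apply to SWOOLE_PROCESS.
        $settings['reactor_num'] = $cpuNum;
        $settings['dispatch_mode'] = 1; // 1 = round-robin across workers
    }

    return $settings;
}

// Opt-in start: php server.php base|process (requires the swoole extension).
$mode = $argv[1] ?? null;
if (PHP_SAPI === 'cli' && extension_loaded('swoole') && in_array($mode, ['base', 'process'], true)) {
    $processMode = $mode === 'process';
    $server = new Swoole\Http\Server('0.0.0.0', 8080, $processMode ? SWOOLE_PROCESS : SWOOLE_BASE);
    $server->set(swooleSettings($processMode, swoole_cpu_num()));

    // The application stays in memory; this callback runs once per request.
    $server->on('request', function (Swoole\Http\Request $request, Swoole\Http\Response $response): void {
        $response->header('Content-Type', 'application/json');
        $response->end(json_encode(['ack' => time()], JSON_THROW_ON_ERROR));
    });

    $server->start();
}
```

Switching modes here only changes the constructor's mode argument and which settings are passed to set(). The request callback stays the same.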

Choosing between SWOOLE_BASE vs SWOOLE_PROCESS for your application

When choosing which mode to use for your application (SWOOLE_BASE vs SWOOLE_PROCESS), I would probably say SWOOLE_BASE is the better choice for less complex use cases. It is an especially good fit for Kubernetes (k8s) or other container orchestration platforms such as ECS or docker, where it has long been the practice to favor one process per container. With containerized swoole, containers are typically already scaled and restarted by the container orchestrator, so we don’t need the swoole manager to do it for us. It could therefore be a very good idea to use SWOOLE_BASE with worker_num=1 (and make sure not to use task workers or sidecar processes). That turns your swoole application into a single process that is also single-threaded, like node.js or nginx (but still with an event loop and coroutines), and really very similar to their processing model.

SWOOLE_BASE comes with less IPC overhead due to its simpler networking model and is the default mode since swoole v5 (see release notes). Various swoole frameworks may choose their own defaults, though: the hyperf framework uses SWOOLE_PROCESS by default, while mezzio uses SWOOLE_BASE by default.
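
For the containerized case, a mezzio-swoole configuration along these lines should give a single-worker SWOOLE_BASE server. This sketch assumes mezzio-swoole’s mezzio-swoole / swoole-http-server config keys, with options being passed through to swoole’s server settings. The file name and the host/port values are illustrative, so check the mezzio-swoole documentation for your version.

```php
<?php

declare(strict_types=1);

// config/autoload/swoole.local.php (illustrative file name)
return [
    'mezzio-swoole' => [
        'swoole-http-server' => [
            'host' => '0.0.0.0',   // listen on all interfaces inside the container
            'port' => 8080,        // illustrative port
            'mode' => SWOOLE_BASE, // BASE mode, as discussed above
            'options' => [
                // One worker per container; scale by running more containers.
                'worker_num' => 1,
                // No task workers, keeping the container down to a single worker.
                'task_worker_num' => 0,
            ],
        ],
    ],
];
```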

Testing setup

If you are using modern multicore cpu – you should be using the SWOOLE_PROCESS mode and you should be setting worker_num accordingly to your workload – in order to fully utilize the processing power.

Swoole gives us certain guidelines on how to choose the worker_num value:

The number of worker processes to start. By default this is set to the number of CPU cores you have.
If your server is running code that is asynchronous and non-blocking, set the worker_num to the value from one to four times of CPU cores. For example, you could use swoole_cpu_num() * 2.
If your server is running code that is synchronous and blocking, set the worker_num to an average of how long a single request takes, so 100ms to 500ms would be a worker_num of somewhere between that range.
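
Applied mechanically, the guideline above boils down to a small calculation. The helper below is hypothetical and not part of swoole. It reads the blocking-code rule literally, as stated in the quote, and clamps the non-blocking multiplier to the recommended 1x to 4x range.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper applying the quoted worker_num guidelines.
 *
 * - non-blocking code: 1x..4x the CPU core count (default 2x, like swoole_cpu_num() * 2)
 * - blocking code: read literally, worker_num ~ average request time in ms
 */
function suggestWorkerNum(int $cpuNum, bool $nonBlocking, float $avgRequestMs = 0.0, int $multiplier = 2): int
{
    if ($nonBlocking) {
        // Clamp the multiplier to the 1..4 range recommended by the guideline.
        return $cpuNum * max(1, min(4, $multiplier));
    }

    // e.g. a 100ms..500ms average request time -> 100..500 workers; never below 1.
    return max(1, (int) round($avgRequestMs));
}
```

With swoole loaded, this could feed a setting such as 'worker_num' => suggestWorkerNum(swoole_cpu_num(), true).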

We will be testing a very simple JSON API endpoint that returns the current timestamp:

<?php

declare(strict_types=1);

namespace App\Handler;

use Laminas\Diactoros\Response\JsonResponse;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\ServerRequestInterface;
use Psr\Http\Server\RequestHandlerInterface;

use function time;

// Responds to every request with {"ack": <current unix timestamp>}.
class PingHandler implements RequestHandlerInterface
{
    public function handle(ServerRequestInterface $request): ResponseInterface
    {
        return new JsonResponse(['ack' => time()]);
    }
}
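
For context, the stock mezzio skeleton wires this handler up in config/routes.php roughly as shown below. This assumes the benchmarked applications kept the skeleton’s default /api/ping route.

```php
<?php

declare(strict_types=1);

use Mezzio\Application;
use Mezzio\MiddlewareFactory;
use Psr\Container\ContainerInterface;

// Route configuration callback invoked by the skeleton at application bootstrap.
return static function (Application $app, MiddlewareFactory $factory, ContainerInterface $container): void {
    // GET /api/ping -> PingHandler, route name "api.ping"
    $app->get('/api/ping', App\Handler\PingHandler::class, 'api.ping');
};
```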

MEZZIO with PHP-FPM testing


linuxdev@hs-tester:~$ wrk -t8 -c100 -d10s
Running 10s test @
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    27.60ms    3.68ms  57.91ms   87.91%
    Req/Sec   435.95     58.63   484.00     84.38%
  34740 requests in 10.01s, 6.10MB read
Requests/sec:   3471.59
Transfer/sec:    623.80KB


linuxdev@hs-tester:~$ wrk -t8 -c200 -d10s
Running 10s test @
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    56.86ms    5.74ms 127.13ms   88.72%
    Req/Sec   440.20     56.91   530.00     73.75%
  35076 requests in 10.01s, 6.15MB read
Requests/sec:   3504.54
Transfer/sec:    629.72KB


linuxdev@hs-tester:~$ wrk -t8 -c300 -d10s
Running 10s test @
  8 threads and 300 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    84.37ms    8.89ms 155.49ms   87.91%
    Req/Sec   438.92     64.92   610.00     64.88%
  34966 requests in 10.03s, 6.14MB read
Requests/sec:   3486.73
Transfer/sec:    626.52KB


linuxdev@hs-tester:~$ wrk -t8 -c400 -d10s
Running 10s test @
  8 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   113.18ms   10.72ms 216.88ms   89.14%
    Req/Sec   441.99     57.91   820.00     74.06%
  35154 requests in 10.03s, 6.17MB read
Requests/sec:   3505.87
Transfer/sec:    629.96KB


Note that socket read errors start to appear from the next run (500 connections) onwards.

linuxdev@hs-tester:~$ wrk -t8 -c500 -d10s
Running 10s test @
  8 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   156.28ms   15.28ms 407.92ms   88.41%
    Req/Sec   387.09     84.78   616.00     72.02%
  30755 requests in 10.03s, 5.40MB read
  Socket errors: connect 0, read 73793, write 0, timeout 0
Requests/sec:   3065.08
Transfer/sec:    550.76KB


linuxdev@hs-tester:~$ wrk -t8 -c600 -d10s
Running 10s test @
  8 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   164.47ms   13.91ms 228.76ms   82.84%
    Req/Sec   378.57     57.90   808.00     73.53%
  30088 requests in 10.05s, 5.28MB read
  Socket errors: connect 0, read 172293, write 0, timeout 0
Requests/sec:   2992.78
Transfer/sec:    537.76KB


linuxdev@hs-tester:~$ wrk -t8 -c700 -d10s
Running 10s test @
  8 threads and 700 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   171.80ms   15.72ms 425.10ms   90.11%
    Req/Sec   374.78     62.82   790.00     76.49%
  29716 requests in 10.05s, 5.21MB read
  Socket errors: connect 0, read 176652, write 0, timeout 0
Requests/sec:   2957.43
Transfer/sec:    531.41KB


linuxdev@hs-tester:~$ wrk -t8 -c800 -d10s
Running 10s test @
  8 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   176.60ms   14.46ms 278.12ms   91.18%
    Req/Sec   378.72     61.97   700.00     76.26%
  30035 requests in 10.04s, 5.27MB read
  Socket errors: connect 0, read 165457, write 0, timeout 0
Requests/sec:   2991.38
Transfer/sec:    537.51KB


linuxdev@hs-tester:~$ wrk -t8 -c900 -d10s
Running 10s test @
  8 threads and 900 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   180.88ms   13.52ms 413.15ms   87.89%
    Req/Sec   380.71     61.47   560.00     76.98%
  30185 requests in 10.03s, 5.30MB read
  Socket errors: connect 0, read 167945, write 0, timeout 0
Requests/sec:   3008.29
Transfer/sec:    540.55KB


linuxdev@hs-tester:~$ wrk -t8 -c1000 -d10s
Running 10s test @
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   191.65ms   25.73ms 401.45ms   92.11%
    Req/Sec   388.00     59.92   545.00     74.75%
  30790 requests in 10.05s, 5.40MB read
  Socket errors: connect 0, read 145079, write 0, timeout 0
Requests/sec:   3065.10
Transfer/sec:    550.76KB

MEZZIO with swoole (in SWOOLE_BASE mode) testing


linuxdev@hs-tester:~$ wrk -t8 -c100 -d10s
Running 10s test @
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.13ms   10.55ms 115.28ms   81.30%
    Req/Sec     1.15k   215.27     3.56k    78.95%
  91806 requests in 10.10s, 14.45MB read
Requests/sec:   9089.84
Transfer/sec:      1.43MB


linuxdev@hs-tester:~$ wrk -t8 -c200 -d10s
Running 10s test @
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    22.52ms   10.67ms 122.63ms   70.33%
    Req/Sec     1.13k    85.41     1.37k    70.12%
  89758 requests in 10.00s, 14.12MB read
Requests/sec:   8971.64
Transfer/sec:      1.41MB


linuxdev@hs-tester:~$ wrk -t8 -c300 -d10s
Running 10s test @
  8 threads and 300 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    39.79ms   40.14ms 278.61ms   85.38%
    Req/Sec     1.17k   190.47     2.27k    71.38%
  93461 requests in 10.04s, 14.71MB read
Requests/sec:   9311.12
Transfer/sec:      1.47MB


linuxdev@hs-tester:~$ wrk -t8 -c400 -d10s
Running 10s test @
  8 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    51.43ms   44.86ms 351.79ms   87.48%
    Req/Sec     1.13k   113.49     1.58k    69.50%
  89868 requests in 10.02s, 14.14MB read
Requests/sec:   8968.50
Transfer/sec:      1.41MB


linuxdev@hs-tester:~$ wrk -t8 -c500 -d10s
Running 10s test @
  8 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    57.17ms   42.92ms 323.81ms   72.51%
    Req/Sec     1.16k   135.03     1.69k    73.50%
  92716 requests in 10.04s, 14.59MB read
Requests/sec:   9232.17
Transfer/sec:      1.45MB


linuxdev@hs-tester:~$ wrk -t8 -c600 -d10s
Running 10s test @
  8 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    61.57ms   31.08ms 395.86ms   72.91%
    Req/Sec     1.25k    97.78     1.64k    74.75%
  99243 requests in 10.04s, 15.62MB read
Requests/sec:   9884.07
Transfer/sec:      1.56MB


linuxdev@hs-tester:~$ wrk -t8 -c700 -d10s
Running 10s test @
  8 threads and 700 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    77.55ms   48.28ms 551.10ms   65.66%
    Req/Sec     1.16k   146.58     2.69k    81.12%
  92681 requests in 10.06s, 14.58MB read
Requests/sec:   9209.04
Transfer/sec:      1.45MB


linuxdev@hs-tester:~$ wrk -t8 -c800 -d10s
Running 10s test @
  8 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    80.64ms   37.94ms 460.35ms   78.60%
    Req/Sec     1.26k   160.25     1.94k    78.88%
  100306 requests in 10.06s, 15.78MB read
Requests/sec:   9968.80
Transfer/sec:      1.57MB


linuxdev@hs-tester:~$ wrk -t8 -c900 -d10s
Running 10s test @
  8 threads and 900 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   101.36ms   51.43ms 444.76ms   76.51%
    Req/Sec     1.12k   121.53     2.42k    86.88%
  89114 requests in 10.05s, 14.02MB read
Requests/sec:   8863.87
Transfer/sec:      1.39MB


linuxdev@hs-tester:~$ wrk -t8 -c1000 -d10s
Running 10s test @
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   106.08ms   38.58ms 427.55ms   75.24%
    Req/Sec     1.16k   132.92     1.82k    85.12%
  92675 requests in 10.07s, 14.58MB read
Requests/sec:   9204.73
Transfer/sec:      1.45MB

MEZZIO with swoole (in SWOOLE_PROCESS mode) testing


linuxdev@hs-tester:~$ wrk -t8 -c100 -d10s
Running 10s test @
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.97ms    6.37ms  94.56ms   91.85%
    Req/Sec     1.05k    94.77     1.27k    74.75%
  83528 requests in 10.01s, 13.14MB read
Requests/sec:   8346.30
Transfer/sec:      1.31MB


linuxdev@hs-tester:~$ wrk -t8 -c200 -d10s
Running 10s test @
  8 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.42ms   10.78ms 112.90ms   89.16%
    Req/Sec     1.06k   140.97     3.61k    93.77%
  84328 requests in 10.10s, 13.27MB read
Requests/sec:   8349.62
Transfer/sec:      1.31MB


linuxdev@hs-tester:~$ wrk -t8 -c300 -d10s
Running 10s test @
  8 threads and 300 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    37.22ms   19.65ms 253.90ms   90.74%
    Req/Sec     1.04k    68.68     1.24k    72.25%
  83017 requests in 10.03s, 13.06MB read
Requests/sec:   8274.97
Transfer/sec:      1.30MB


linuxdev@hs-tester:~$ wrk -t8 -c400 -d10s
Running 10s test @
  8 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    50.44ms   23.69ms 243.32ms   90.96%
    Req/Sec     1.03k    66.24     1.20k    72.62%
  81956 requests in 10.02s, 12.90MB read
Requests/sec:   8179.92
Transfer/sec:      1.29MB


linuxdev@hs-tester:~$ wrk -t8 -c500 -d10s
Running 10s test @
  8 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    61.52ms   24.43ms 263.80ms   87.23%
    Req/Sec     1.02k    75.11     2.23k    84.12%
  81517 requests in 10.03s, 12.83MB read
Requests/sec:   8124.63
Transfer/sec:      1.28MB


linuxdev@hs-tester:~$ wrk -t8 -c600 -d10s
Running 10s test @
  8 threads and 600 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    74.17ms   30.64ms 299.97ms   87.90%
    Req/Sec     1.03k    63.14     1.44k    75.75%
  82150 requests in 10.07s, 12.93MB read
Requests/sec:   8159.72
Transfer/sec:      1.28MB


linuxdev@hs-tester:~$ wrk -t8 -c700 -d10s
Running 10s test @
  8 threads and 700 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    89.36ms   52.10ms 503.27ms   91.56%
    Req/Sec     1.04k    75.73     1.55k    73.88%
  82786 requests in 10.05s, 13.03MB read
Requests/sec:   8240.75
Transfer/sec:      1.30MB


linuxdev@hs-tester:~$ wrk -t8 -c800 -d10s
Running 10s test @
  8 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    99.97ms   44.06ms 450.42ms   88.86%
    Req/Sec     1.02k    88.68     1.74k    82.38%
  81155 requests in 10.04s, 12.77MB read
Requests/sec:   8082.41
Transfer/sec:      1.27MB


linuxdev@hs-tester:~$ wrk -t8 -c900 -d10s
Running 10s test @
  8 threads and 900 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   111.37ms   52.05ms 448.06ms   87.88%
    Req/Sec     1.02k   107.46     1.87k    87.12%
  81442 requests in 10.08s, 12.82MB read
Requests/sec:   8082.36
Transfer/sec:      1.27MB


linuxdev@hs-tester:~$ wrk -t8 -c1000 -d10s
Running 10s test @
  8 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   123.43ms   55.08ms   1.04s    88.32%
    Req/Sec     1.02k    81.17     1.50k    80.88%
  81501 requests in 10.06s, 12.83MB read
Requests/sec:   8103.60
Transfer/sec:      1.28MB

Let’s see some graphs

Requests per second with different concurrencies (higher is better).

Average Latency, ms (lower is better)

Read errors (lower is better).

Normally there should not be any read errors at all. As we can see, only php-fpm produced read errors, starting at 500 connections.
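
To put a number on the gap, the Requests/sec values from the wrk runs above can be compared per connection count. This small PHP sketch just divides the figures copied from those outputs; nothing is measured here, and the function name speedups is hypothetical.

```php
<?php

declare(strict_types=1);

// Requests/sec reported by wrk in the runs above, for -c100 .. -c1000.
$connections = range(100, 1000, 100);
$phpFpm = array_combine($connections, [
    3471.59, 3504.54, 3486.73, 3505.87, 3065.08, 2992.78, 2957.43, 2991.38, 3008.29, 3065.10,
]);
$swooleBase = array_combine($connections, [
    9089.84, 8971.64, 9311.12, 8968.50, 9232.17, 9884.07, 9209.04, 9968.80, 8863.87, 9204.73,
]);
$swooleProcess = array_combine($connections, [
    8346.30, 8349.62, 8274.97, 8179.92, 8124.63, 8159.72, 8240.75, 8082.41, 8082.36, 8103.60,
]);

/** Throughput ratio candidate/baseline per connection count, rounded to 2 decimals. */
function speedups(array $baseline, array $candidate): array
{
    $ratios = [];
    foreach ($baseline as $c => $rps) {
        $ratios[$c] = round($candidate[$c] / $rps, 2);
    }

    return $ratios;
}

$baseVsFpm = speedups($phpFpm, $swooleBase);
$processVsFpm = speedups($phpFpm, $swooleProcess);

printf("SWOOLE_BASE vs php-fpm: %.2fx .. %.2fx\n", min($baseVsFpm), max($baseVsFpm));
printf("SWOOLE_PROCESS vs php-fpm: %.2fx .. %.2fx\n", min($processVsFpm), max($processVsFpm));
```

Running it prints a 2.56x to 3.33x range for SWOOLE_BASE and 2.33x to 2.79x for SWOOLE_PROCESS, relative to php-fpm.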


Swoole is the clear winner in this benchmark (not much to my surprise). The main difference between swoole and php-fpm is that swoole only needs to bootstrap the application once, when the server starts, whereas php-fpm has to do it on every request, and this shows. Demonstrably, even simple framework bootstrap work comes with a performance price: swoole in SWOOLE_BASE mode outperforms the otherwise identical php-fpm application by roughly 2.5x to 3.3x in requests per second.
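
The bootstrap argument can be illustrated with a toy model. The ToyApp class and both serve functions below are hypothetical stand-ins, not mezzio or swoole code. The constructor plays the role of the framework bootstrap (config, DI container, middleware pipeline, routes), and we simply count how often it runs.

```php
<?php

declare(strict_types=1);

/** Hypothetical stand-in for an application; not mezzio code. */
final class ToyApp
{
    public static int $bootstraps = 0;

    public function __construct()
    {
        // Stand-in for the real bootstrap work: loading config, building the
        // DI container, the middleware pipeline and the routes.
        self::$bootstraps++;
    }

    public function handle(int $requestId): string
    {
        return json_encode(['ack' => $requestId], JSON_THROW_ON_ERROR);
    }
}

// php-fpm style: each request runs the front controller from scratch.
function servePerRequest(int $requests): void
{
    for ($i = 0; $i < $requests; $i++) {
        (new ToyApp())->handle($i);
    }
}

// swoole style: bootstrap once when the server starts, reuse for every request.
function serveLongRunning(int $requests): void
{
    $app = new ToyApp();
    for ($i = 0; $i < $requests; $i++) {
        $app->handle($i);
    }
}

ToyApp::$bootstraps = 0;
servePerRequest(1000);
echo 'per-request model: ', ToyApp::$bootstraps, " bootstraps\n";

ToyApp::$bootstraps = 0;
serveLongRunning(1000);
echo 'long-running model: ', ToyApp::$bootstraps, " bootstraps\n";
```

OPcache spares php-fpm from recompiling scripts, but the object graph is still rebuilt on every request. The flip side of the long-running model is that anything stored on long-lived objects survives between requests, so request-specific state must not leak into it.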

See more swoole performance benchmarks here:

