Finding and Fixing bugs in PHP Source Code

All software development projects are built on top of existing solutions, from databases, frameworks and libraries; to compilers, interpreters, and operating systems. When things go wrong, it is often the user of the solution that is causing the bug, rather than the solution itself, especially when the solution is mature and has existed for quite some time.

Sometimes, however, the problem does lie in the underlying solution, and it usually happens on niche features that only a few use, as bugs are easier to overlook on those kinds of seldomly used features. In this article, we will discuss one such case. Below, we will detail how we came to realise that what we initially thought was a configuration problem on our end was actually a bug in PHP-FPM.

This post outlines the problem identification and debugging process we did.

Background on PHP-FPM

If you know what PHP-FPM is, feel free to skip to the next section.

PHP is a programming language powering a lot of websites, one being WordPress. PHP usually can’t host a web service by itself, and has to rely on host web servers like Apache, Nginx, and Lighttpd to serve HTTP requests, communicating with them via Server APIs (SAPI). The traditional way PHP and servers interact is via CGI: for every incoming request by a client, the host web server (Apache, Nginx, Lighttpd, etc.) will invoke the PHP binary (via fork and exec), pass the request into standard input (HTTP headers are passed via environment variables), and read standard output to form the response to the client.

While CGI is simple, it is inefficient since it has to spawn a process for every request (the PHP application serving the request shuts down as soon as it finishes processing the request). Due to this inefficiency, an alternative protocol called “FastCGI” was created which makes things faster by recycling CGI processes instead of shutting them down right after processing a request.

Due to the nature of FastCGI applications requiring the maintenance of long-running processes, complexity is introduced. For one, a way of managing the processes is needed, introducing a “manager process” that controls the process pool. This introduces new configuration variables that is not present in plain CGI, such as the maximum number of processes allowed, and the number of idle processes to maintain on standby. This makes FastCGI applications servers themselves, rather than just mere one-off processes.

For PHP, serving requests over FastCGI is done using PHP-FPM (PHP FastCGI Process Manager). This is the process manager that is responsible for interacting with host web servers. Its job is to maintain a pool of PHP processes and hand off HTTP requests to them. It is also important to note that since a process manager is not responsible for executing PHP code, PHP-FPM is a different application than the PHP interpreter binary.

Discovery of the Issue

In order to aid system administrators in detecting performance issues, PHP-FPM can expose metrics like the number of active and idle processes, as well as the queue (incoming connections waiting to be processed). The endpoint uses a different listen socket than can also be used to check if PHP-FPM is able to respond in the first place, which can be use useful for identifying performance-related incidents.

Output of the status PHP-FPM endpoint.
/status endpoint data served over the status.html web interface.

This feature is disabled by default and can be enabled by setting pm.status_path and pm.status_listen in the PHP-FPM conf file (not to be mixed with the php.ini file). The comments in the configuration defaults this to /status and recommends to also serve the status.html file in the /usr/share/php/8.2/fpm directory.

Related to this feature is the ping endpoint (configured using ping.path) which just responds with the string stated in ping.response. This can be used to check if PHP-FPM is still alive without exposing server details, which is pretty handy for connecting it to uptime monitors.

However, when we tried to get the ping endpoint to work while working on one of our projects, we kept on getting a 404 error instead of the intended ping response (we were using PHP 8.2.7). These were the probable issues we listed:

  1. Our DNS was improperly configured, either assigning the wrong hostname to the IP address or the other way around.
  2. The reverse-proxy we were using, Nginx, was not configured properly: either not being able to connect to the PHP-FPM process, or connecting but not sending the FastCGI parameters properly.
  3. We configured PHP-FPM improperly, causing ping.path to not be recognized.

To identify the source of the issue, we tried connecting to PHP-FPM directly. Since PHP-FPM uses the FastGCI instead of the standard HTTP, we couldn’t use curl or wget, but instead used the cgi-fcgi command-line tool (it is actually a protocol converter from CGI to FastCGI, included in the libfcgi-bin package in Debian) to connect to PHP-FPM without having to go through Nginx.

Configuring pm.status_listen to 127.0.0.1:9123, we could then test connecting directly to PHP-FPM:

SCRIPT_FILENAME=/ping SCRIPT_NAME=/ping REQUEST_METHOD=GET cgi-fcgi -bind -connect 127.0.0.1:9123

But we got the following output:

Primary script unknownStatus: 404 Not Found
Content-type: text/html; charset=UTF-8

File not found.

We got a 404, which means that we were not hitting the ping endpoint. However, PHP-FPM still managed to send a response (as opposed to getting a “could not connect”) so we managed to connect to PHP-FPM properly. To test further, we tried connecting to the status endpoint:

$ SCRIPT_FILENAME=/status SCRIPT_NAME=/status REQUEST_METHOD=GET cgi-fcgi -bind -connect 127.0.0.1:9123
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate, max-age=0
Content-type: text/plain;charset=UTF-8

pool:                 www
process manager:      dynamic
start time:           29/Apr/2024:01:08:28 +0800
start since:          68649
accepted conn:        0
listen queue:         0
max listen queue:     0
listen queue len:     0
idle processes:       2
active processes:     0
total processes:      2
max active processes: 0
max children reached: 0
slow requests:        0

This means the ping endpoint was configured wrongly or something was wrong with how PHP-FPM understood that part of the configuration. To test if PHP-FPM was parsing the configuration properly, we did php-fpm -tt :

[29-Apr-2024 20:58:26] NOTICE: [global]
[29-Apr-2024 20:58:26] NOTICE:  pid = /run/php/php8.2-fpm.pid
[29-Apr-2024 20:58:26] NOTICE:  error_log = /var/log/php8.2-fpm.log
...
[29-Apr-2024 20:58:26] NOTICE:
[29-Apr-2024 20:58:26] NOTICE: [www]
[29-Apr-2024 20:58:26] NOTICE:  listen = /run/php/php8.2-fpm.sock
...
[29-Apr-2024 20:58:26] NOTICE:  pm.status_path = /status
[29-Apr-2024 20:58:26] NOTICE:  pm.status_listen = 127.0.0.1:9123
[29-Apr-2024 20:58:26] NOTICE:  ping.path = /ping
[29-Apr-2024 20:58:26] NOTICE:  ping.response = pong
[29-Apr-2024 20:58:26] NOTICE:  access.log = undefined
[29-Apr-2024 20:58:26] NOTICE:  access.format = undefined
[29-Apr-2024 20:58:26] NOTICE:  slowlog = undefined
...
[29-Apr-2024 20:58:26] NOTICE: configuration file /etc/php/8.2/fpm/php-fpm.conf test is successful

From this, we saw that the configuration was being parsed properly but, for some reason, it was not being recognized. At this point, the only way to dig deeper was to debug PHP-FPM directly.

Unfortunately, despite the large amount of people using it, no one provides debug symbols for PHP-FPM. This left us with no choice but to compile PHP-FPM ourselves, with debug symbols enabled.

Compiling PHP-FPM

Compiling PHP (and therefore PHP-FPM) was pretty straightforward thankfully. The build instructions can be followed from the PHP GitHub repository. The only thing different in this specific scenario was that we needed to configure PHP with debug and FPM enabled. We used the following configure parameters: ./configure --enable-fpm --enable-debug --enable-cgi --enable-shared --enable-json=shared CFLAGS="-DDEBUG_ZEND=2"

Building was pretty fast; it took less than 10 minutes.

Debugging

We needed two things to start debugging:

  1. A debugger (i.e. GDB), and
  2. Knowing where in the PHP code to start debugging.

GDB was pretty straightforward to install using the operating system’s package manager. For Debian, it was just apt install gdb

Searching through the ping.path configuration key in the PHP source code, we found that it was getting assigned to fpm_status_ping_uri , and searching usages of the variable, we arrived at this line, fpm_status.c:155:

This line checks if the request for ping should be handled, which was a good starting point for debugging. We then did the debugging:

  1. Made sure that the compiled php-fpm process was running, not the one provided by the OS.
  2. Noted the PID of the php-fpm process (using ps or other methods). For this example, let’s assume the PID was 1234
  3. Ran gdb and attached it to the php-fpm process attach 1234
  4. Set the breakpoint to our target line: b fpm_status.c:155
  5. Since PHP-FPM forks a child process when processing a request, we had to tell GDB to follow the child on fork by set follow-fork-mode child
  6. Used cgi-fcgi to send a ping request to PHP-FPM

Things went as intended and we halted right at our target line. We then tried printing the fpm_status_ping_uri and fpm_ping_status_response:

(gdb) p fpm_status_ping_uri
$1 = 0x0
(gdb) p fpm_status_ping_response
$2 = 0x0

So, both config values were set to null, despite being properly configured and parsed by PHP-FPM. What then would the config value of the status endpoint fpm_status_uri contain?

(gdb) p fpm_status_uri
$3 = 0x55e4a81390ab "/status"

Since the status and ping endpoints were (theoretically) parsed and handled by the same set of code, it was strange that status was being handled properly but ping was not. So we checked for how fpm_status_ping_uri was assigned and we encountered the aptly-named fpm_status_init_child function in line fpm_status:35:

Going through the same motions to set a breakpoint in that line, we tried printing the wp->config variables:

The result was still suspicious. The next step was then to track down instances of pm_status_path, especially those that had no counterpart instances of ping_path and ping_response. Eventually we encountered these lines in fpm_conf:740:

The lines could be understood as copying strings over using “FPM_WPC_STR_CP”; the macro was defined a few lines before. The fact that the macro’s name contains “STR_CP” suggests that it is meant for copying strings.

So, “FPM_WPC_STR_CP” copied config variables over to another variable, which was then passed to the child when forked. However, notably missing from the copy was ping_path and ping_response.

Adding the FPM_WPC_STR_CP lines for both ping_path and ping_response, and recompiling PHP-FPM with the changes, made the ping endpoint work again:

X-Powered-By: PHP/8.2.7-dev
Content-type: text/plain;charset=UTF-8
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store, must-revalidate, max-age=0

pong

Conclusion

By meticulously debugging the PHP source code, we identified the issue and proposed a fix. This shows the importance of a methodical debugging process, even for seemingly straightforward configuration problems.

This experience reminds us: bugs can lurk anywhere, even in established software like PHP-FPM. By sharing this, we aim to equip fellow developers with a keen eye and the willingness to explore all avenues during troubleshooting.

Don’t let hidden bugs like this derail your project. At Bonito Tech, our experienced team of IT industry professionals bring the same close attention to detail to every project, in digital solutions and software development. We provide tailored services for businesses to succeed in the digital landscape, creating thoughtful solutions done with integrity. Contact us today!