The way to use bazooka to spot a typo and have fun

Recently (yesterday) I updated my server to the last Debian (bookworm). It was an occasion to rework the Ansible playbook I have to manage my server and get rid of old roles and backwards compatibility.

Everything went fine until I try to access one of the project through the browser. I got a weird message "File not found". Of course I checked everything (vhost, user rights...) but nothing seems wrong. I spot into nginx log this error:

2024/01/03 10:07:37 [error] 28323#28323: *21 FastCGI sent in stderr: "Primary script unknown" 
   while reading response header from upstream, client: xxx.xxx.xxx.xxx, server: domain.tld, 
   request: "GET / HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/php8.3-fpm.sock:", 
   host: "domain.tld"

I don't know for you but for me this error means nothing. After a moment on my favourite search engine, I finally found the explication, this is a more or less standard error when nginx didn't received what it expect from PHP-FPM.

By default, PHP-FPM logs are disabled except for the running service. Let's create a new PHP-FPM pool to debug this error.
To ease debug and not crash the whole thing, I choose to work with a specific PHP-FPM pool.
So lets create a file /etc/php/8.3/fpm/pool.d/test.conf with this content:

<code class="hljs language-php">[test]
user = </code><code class="hljs language-php">www-data
</code><code class="hljs language-php">group = www-data
listen = /run/php/php8.3-fpm-test.sock
listen.owner = www-data
listen.group = www-data
access.log = /var/log/php-fpm/$pool.access.log
pm = dynamic
</code>

Let's create the log directory and restart PHP-FPM

mkdir <code class="hljs language-php">/var/log/php-fpm && systemctl restart php8.3-fpm</code>

The last thing I need is to edit vhost  to use the new fpm pool remplacing /run/php/php8.3-fpm.sock by /run/php/php8.3-fpm-test.sock and reload nginx.

Back to the browser, I still have the same error, the log in nginx is also the same. In the fpm pool log /var/log/php-fpm/test.access.log I got this error:

- -  03/Jan/2024:10:47:29 +0100 "GET /inde.php" 404

This error tells me that nginx is correctly passing the script to PHP-FPM so I'm sure nginx is not the problem. PHP-FPM can't access the file.
At this point the error is obvious, but after many hours of search I didn't seen it. And was still searching for a more explicit error.

My last idea was to trace the process. To simplify this, I changed the pool configuration to start only one child by adding in the /etc/php/8.3/fpm/pool.d/test.conf these lines:

<code class="hljs language-php">pm.max_children = <span class="hljs-number">1</span>
pm.start_servers = <span class="hljs-number">1</span>
pm.min_spare_servers = <span class="hljs-number">1</span>
pm.max_spare_servers = <span class="hljs-number">1</span></code>

Of course php-fpm whould be restarted.
Then ps command allow me to find the process I wanted to trace:

ps -aef | grep php
root       22044       1  0 14:17 ?        00:00:00 php-fpm: master process (/etc/php/8.3/fpm/php-fpm.conf)
www-data   22046   22044  0 14:17 ?        00:00:01 php-fpm: pool test
www-data   22047   22044  0 14:17 ?        00:00:00 php-fpm: pool www
www-data   22048   22044  0 14:17 ?        00:00:00 php-fpm: pool www
root       37577   20749  0 15:12 pts/0    00:00:00 grep --color=auto php
 

Now I can trace the process I wanted and then refresh the browser.

strace -p 22046
strace: Process 22046 attached
accept(12, {sa_family=AF_UNIX}, [112 => 2]) = 5
fcntl(5, F_GETFD)                       = 0
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
poll([{fd=5, events=POLLIN}], 1, 5000)  = 1 ([{fd=5, revents=POLLIN}])
times({tms_utime=117 /* 1.17 s */, tms_stime=23 /* 0.23 s */, tms_cutime=0, tms_cstime=0}) = 1718678172
read(5, "\1\1\0\1\0\10\0\0", 8)         = 8
read(5, "\0\1\0\0\0\0\0\0", 8)          = 8
read(5, "\1\4\0\1\4Z\6\0", 8)           = 8
read(5, "\f\0QUERY_STRING\16\3REQUEST_METHODGE"..., 1120) = 1120
read(5, "\1\4\0\1\0\0\0\0", 8)          = 8
newfstatat(AT_FDCWD, "/var/www/inde.php", 0x7ffd86b2c0c0, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (<code class="hljs language-php">No such file <span class="hljs-keyword">or</span> directory</code>)
...
 

The output is way more verbose, but hey I finally have an explicit error message:/var/www/inde.php doesn't exist on the disk. Yes of course I forget a x at the end of the word index.

5 hours later, I finally fix the typo on my ansible playbook, run it again and get a fully functional server to run my projects. I know the error was obvious and I am ashamed I didn't seen it faster, but hey I played with fpm pool, strace... and I learned some new stuff.

Add a comment