[email protected], Sep 8, 2009
What steps will reproduce the problem?
1. Nginx 6.71
2. Passenger 2.2.4
3. Random pages show 502 Gateway errors with this message:
[ pid=2398 file=ext/nginx/HelperServer.cpp:478 time=2009-09-08 14:49:41.259 ]:
Couldn't forward the HTTP response back to the HTTP client: It seems the user clicked on the 'Stop' button in his browser.
[ pid=2398 file=ext/nginx/HelperServer.cpp:478 time=2009-09-08 14:49:44.214 ]:
Couldn't forward the HTTP response back to the HTTP client: It seems the user clicked on the 'Stop' button in his browser.
*** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 16217):
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/rack/request_handler.rb:108:in `write'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/rack/request_handler.rb:108:in `process_request'
    from /opt/ree/lib/ruby/gems/1.8/gems/actionpack-2.3.2/lib/action_controller/response.rb:155:in `each_line'
    from /opt/ree/lib/ruby/gems/1.8/gems/actionpack-2.3.2/lib/action_controller/response.rb:155:in `each'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/rack/request_handler.rb:107:in `process_request'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_request_handler.rb:206:in `main_loop'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/railz/application_spawner.rb:376:in `start_request_handler'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/railz/application_spawner.rb:334:in `handle_spawn_application'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/utils.rb:182:in `safe_fork'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/railz/application_spawner.rb:332:in `handle_spawn_application'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:351:in `__send__'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:351:in `main_loop'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:195:in `start_synchronously'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:162:in `start'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/railz/application_spawner.rb:213:in `start'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:261:in `spawn_rails_application'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server_collection.rb:126:in `lookup_or_add'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:255:in `spawn_rails_application'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server_collection.rb:80:in `synchronize'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server_collection.rb:79:in `synchronize'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:254:in `spawn_rails_application'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:153:in `spawn_application'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/spawn_manager.rb:286:in `handle_spawn_application'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:351:in `__send__'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:351:in `main_loop'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/lib/phusion_passenger/abstract_server.rb:195:in `start_synchronously'
    from /opt/ree/lib/ruby/gems/1.8/gems/passenger-2.2.4/bin/passenger-spawn-server:61
What version of Phusion Passenger are you using? Which version of Rails? On
what operating system?
Passenger 2.2.4, Rails 2.3.2, Ruby Enterprise Edition 1.8.6-20090610, Debian
Comment 1 by [email protected], Sep 9, 2009
Same problem with passenger 2.2.5
[ pid=14425 file=ext/nginx/HelperServer.cpp:478 time=2009-09-08 23:12:01.981 ]:
Couldn't forward the HTTP response back to the HTTP client: It seems the user clicked on the 'Stop' button in his browser.
*** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 14590):
Comment 2 by [email protected], Sep 29, 2009
Confirmed on Ubuntu 9.04, Passenger 2.2.4, Rails 2.3.3, nginx/0.7.61
Comment 3 by [email protected], Oct 9, 2009
Same problem with Passenger 2.2.5 on Ubuntu 8.04, Rails 2.3.4, nginx/0.7.62
Comment 4 by [email protected], Oct 12, 2009
Same problem with passenger-2.2.5, Ruby Enterprise Edition 20090610, Ubuntu 9.04,
Rails 2.3.4, nginx/0.7.62. I'm using smart spawning.
System was running fine for 3 days. This morning every request started throwing a
502. This is my first time w/ a passenger deployment.
Comment 5 by [email protected], Oct 13, 2009
Same issue here. Passenger 2.2.4, Rails 2.3.3, Ubuntu 9.04, Apache 2.
This is taking our site down about once a day.
Comment 6 by [email protected], Oct 14, 2009
Possible resolution:
Turns out monit was killing my nginx worker processes due to memory consumption. Since
it was the workers and not the master process, we didn't get any monit notifications
about a PID change.
Silly wabbits.
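For reference, a rough sketch of the kind of monit rule that can do this (the path and threshold here are hypothetical, not our actual config):

# /etc/monit/monitrc (hypothetical sketch)
# The pidfile tracks only the master process, so monit only alerts on
# master PID changes; a memory rule like this can still restart/kill the
# process when memory use (including children) crosses the threshold,
# with no notification about worker PIDs changing.
check process nginx with pidfile /var/run/nginx.pid
  if totalmem > 200.0 MB for 3 cycles then restart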
Comment 7 by [email protected], Oct 19, 2009
Apparently the 502s on our site were caused by the Linux open file descriptor limit. We raised it, and so far no more 502s.
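For anyone wanting to do the same, a minimal sketch of what raising the limit can look like (assuming Linux with PAM limits; the user name and numbers are illustrative):

# Check the current open-file limit for the web server's user:
su - www-data -s /bin/sh -c 'ulimit -n'
# Raise it persistently by adding lines like these to
# /etc/security/limits.conf, then restarting the web server:
#   www-data  soft  nofile  16384
#   www-data  hard  nofile  16384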
Comment 9 by [email protected], Oct 29, 2009
Confirmed on nginx 0.7.59, Passenger 2.2.4.
This is killing us about twice a day. What is actually causing this, and are
there any workarounds?
Comment 10 by [email protected], Dec 13, 2009
I have a similar issue with Apache. In my case, it was caused by a crawler trying to
grab PDF files that Rails served via X-Sendfile. Normal user access doesn't generate
this kind of error in the error_log.
This is on Apache 2.2.14, Passenger 2.2.7, and Rails 2.3.5.
Comment 11 by [email protected], Jan 22, 2010
Confirmed on
FreeBSD 7.2-RELEASE-p4 #26
Apache/2.2.13
Passenger 2.2.4
Rails 2.2.2
It totally kills all apps running on Passenger, and has happened 4 times in two days.
Comment 12 by project member [email protected], Jan 22, 2010
Can anybody else confirm that raising the OS file descriptor limit helps?
Comment 13 by [email protected], Jan 26, 2010
Defect Confirmed on Ubuntu 8.10
Nginx 0.7.64
Passenger 2.2.9
Rails 2.3.5
Ruby 1.9.1p243
This is also killing one of our servers about twice a day, mostly during peak times. This didn't happen when
we ran Ruby 1.8.6 plus an older version of Passenger (maybe around 2.2.6 or so). We recently upgraded to 1.9
and Passenger along with it. Before the upgrade everything ran perfectly.
Does anyone have any news on this problem? It's been around since September. Unfortunately, we're starting
to look at alternatives to Passenger because of this.
Comment 14 by project member [email protected], Jan 26, 2010
steve.quinlan, have you tried increasing Apache's maximum file descriptor limit?
Comment 15 by [email protected], Jan 27, 2010
I'm on nginx. Is there a similar setting?
Comment 16 by project member [email protected], Jan 27, 2010
Yes. Please try increasing the max file descriptor limit.
Comment 17 by [email protected], Jan 27, 2010
I've increased the worker_rlimit_nofile setting in nginx.conf to 40000. Let me know if I should pick a better
number.
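For context, a minimal sketch of where that directive sits (it belongs in the top-level/main context of nginx.conf; the values are just the ones I picked):

# nginx.conf (main context, outside the http {} block)
worker_processes      2;
worker_rlimit_nofile  40000;   # raises the per-worker open-fd limit

events {
    worker_connections  8192;
}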
Not sure how relevant this is, but I have only one application running on this server, and notice that I
frequently get multiple ApplicationSpawners (my understanding is that you get one ApplicationSpawner per
code base).
Today, when I made the file descriptor limit change and ran /etc/init.d/nginx reload, I also got multiple
PassengerNginxHelperServers. I had to kill all the processes manually. This may or may not be relevant to
this defect. I saved the passenger-status output if you wish me to post it here or in another report.
In the meantime, I'll post any updates on the progress with the max file descriptor limit.
And thanks for your help!
Comment 18 by [email protected], Jan 27, 2010
@honglilai My app just died in the same fashion described above, 80 minutes after I made the file descriptor
limit change. So I don't think it made any difference. Here are the Passenger memory stats. (The machine is a
512 MB slice running Ubuntu.)
------- Apache processes --------
### Processes: 0
### Total private dirty RSS: 0.00 MB
---------- Nginx processes ----------
PID PPID VMSize Private Name
-------------------------------------
10038 1 34.6 MB 0.2 MB nginx: master process /opt/nginx/sbin/nginx
10039 10038 35.7 MB 1.2 MB nginx: worker process
10043 10038 35.7 MB 1.3 MB nginx: worker process
### Processes: 3
### Total private dirty RSS: 2.68 MB
----- Passenger processes ------
PID VMSize Private Name
--------------------------------
10020 24.1 MB 1.4 MB PassengerNginxHelperServer /opt/ruby-1.9.1/lib/ruby/gems/1.9.1/gems/passenger-2.2.9 /opt/ruby-1.9.1/bin/ruby 3 4 0 6 0 300 1 www-data 33 33 /tmp/passenger.10017
10035 46.6 MB 9.5 MB Passenger spawn server
10094 250.4 MB 103.4 MB Rails: /home/apps/public_html/myapp/current
10369 220.4 MB 0.2 MB Passenger ApplicationSpawner: /home/apps/public_html/myapp/current
10414 222.7 MB 45.9 MB Rails: /home/apps/public_html/myapp/current
10422 222.7 MB 45.9 MB Rails: /home/apps/public_html/myapp/current
10430 222.6 MB 44.3 MB Rails: /home/apps/public_html/myapp/current
10438 220.4 MB 0.2 MB Passenger ApplicationSpawner: /home/apps/public_html/myapp/current
### Processes: 8
### Total private dirty RSS: 250.79 MB
Comment 19 by project member [email protected], Jan 27, 2010
You mentioned that it is "killing your server". Does that mean that once the EPIPE
error occurs, it'll keep occurring for everybody until some manual action is taken?
Could you also check whether the Nginx PIDs change after the EPIPE error occurs? For
example I see that your Nginx PIDs are 10038, 10039 and 10043; do any of those three
change?
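A quick way to snapshot those PIDs, assuming a procps-style ps:

# Print PID, parent PID and command line of every nginx process:
ps -C nginx -o pid,ppid,cmd

Run it now and again right after the EPIPE error, and compare.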
Comment 20 by [email protected], Jan 27, 2010
@honglilai - It would be more accurate to say it is killing my app rather than my server, although there is only
one app running on the server. Once the defect kicks in, all HTTP requests to the app time out. But I'm still able
to ssh to the box and kill processes without lag, etc.
I'm not sure if the pids change, so I've taken a note of the nginx pids now, and will report back with the new
nginx pids if/when the problem happens again.
In the meantime, I also made one other change to my nginx.conf: I removed the line that says:
rails_framework_spawner_idle_time 0;
It's the only unusual thing about my nginx.conf - I use default settings for pretty much everything. I'll see if
that has an effect and report back.
Comment 21 by hello%[email protected], Jan 28, 2010
@honglilai: Nearly 24 hours later, the server feels a lot better since I removed
"rails_framework_spawner_idle_time 0;". There's been no hanging since, and there are a
lot fewer processes running. The log files show a few of the EPIPE errors happening,
but there's been no visible effect on the server.
Here's the stats:
---------- Nginx processes ----------
PID PPID VMSize Private Name
-------------------------------------
10786 1 37.9 MB ? nginx: master process /opt/nginx/sbin/nginx
12013 10786 37.9 MB 1.3 MB nginx: worker process
12015 10786 37.9 MB 1.1 MB nginx: worker process
### Processes: 3
### Total private dirty RSS: 2.40 MB (?)
----- Passenger processes -----
PID VMSize Private Name
-------------------------------
11995 88.1 MB 1.3 MB PassengerNginxHelperServer /opt/ruby-1.9.1/lib/ruby/gems/1.9.1/gems/passenger-2.2.9 /opt/ruby-1.9.1/bin/ruby 3 4 0 6 0 300 1 www-data 33 33 /tmp/passenger.10786
12010 46.6 MB 1.9 MB Passenger spawn server
15430 244.8 MB 97.8 MB Rails: /home/apps/public_html/myapp/current
### Processes: 3
### Total private dirty RSS: 101.04 MB
This looks a lot healthier than yesterday's output (see above). Is it possible to
confirm whether the "rails_framework_spawner_idle_time 0;" setting could be
problematic, or is this just a big coincidence?
Thanks for your help again.
Comment 22 by [email protected], Jan 28, 2010
Oops. The above comment should have been posted as steve.quinlan (I posted it from a
different computer than usual)
Steve
Comment 23 by project member [email protected], Jan 28, 2010
It would be very odd if that option is the cause of the problems. Please keep
monitoring the server and report back if there are any problems.
Comment 24 by [email protected], Jan 28, 2010
I don't use this option (rails_framework_spawner_idle_time), and I have the problem.
Comment 25 by [email protected], Jan 28, 2010
The only other change I made yesterday was the file descriptor limit setting, but that had no observable
improvement, as the app hung about an hour later. I'll keep monitoring.
Comment 26 by [email protected], Jan 29, 2010
Hi all,
After over 40 successful hours (the longest in a while), the defect occurred again. Here are the stats at the
time of the defect.
---------- Nginx processes ----------
PID PPID VMSize Private Name
-------------------------------------
10786 1 37.9 MB ? nginx: master process /opt/nginx/sbin/nginx
12013 10786 37.9 MB 1.3 MB nginx: worker process
12015 10786 37.9 MB 1.2 MB nginx: worker process
### Processes: 3
### Total private dirty RSS: 2.54 MB (?)
----- Passenger processes ------
PID VMSize Private Name
--------------------------------
11995 88.1 MB 1.4 MB PassengerNginxHelperServer /opt/ruby-1.9.1/lib/ruby/gems/1.9.1/gems/passenger-2.2.9 /opt/ruby-1.9.1/bin/ruby 3 4 0 6 0 300 1 www-data 33 33 /tmp/passenger.10786
12010 47.1 MB 3.9 MB Passenger spawn server
19978 247.9 MB 100.5 MB Rails: /home/apps/public_html/myapp/current
21326 221.6 MB 0.2 MB Passenger ApplicationSpawner: /home/apps/public_html/myapp/current
21356 223.6 MB 44.6 MB Rails: /home/apps/public_html/myapp/current
21364 223.6 MB 44.8 MB Rails: /home/apps/public_html/myapp/current
21372 223.6 MB 44.7 MB Rails: /home/apps/public_html/myapp/current
21380 221.6 MB 0.2 MB Passenger ApplicationSpawner: /home/apps/public_html/myapp/current
Note that the nginx workers did not change PIDs (you can compare with the stats from yesterday).
Any time the problem occurs, there are multiple ApplicationSpawner entries in the stats. When I ran
/etc/init.d/nginx stop, my stats looked like this:
-------- Nginx processes --------
### Processes: 0
### Total private dirty RSS: 0.00 MB
----- Passenger processes -----
PID VMSize Private Name
-------------------------------
12010 47.1 MB 3.9 MB Passenger spawn server
21326 221.6 MB 0.2 MB Passenger ApplicationSpawner: /home/apps/public_html/myapp/current
21380 221.6 MB 0.2 MB Passenger ApplicationSpawner: /home/apps/public_html/myapp/current
### Processes: 3
### Total private dirty RSS: 4.23 MB
Note that the ApplicationSpawners are not dying off. So I'm back to square one on this issue.
Until we get this fixed, can anyone suggest a workaround? Maybe a cron job to kill off excess spawners, etc.
Comment 27 by project member [email protected], Jan 29, 2010
You can use a cron job to kill off the spawn server once in a while. That should
automatically kill off the ApplicationSpawners too.
Comment 28 by [email protected], Jan 29, 2010
Problem confirmed here as well w/ Ubuntu 8.04, Passenger 2.2.5, Rails 2.1.2,
nginx/0.7.61.
I'll be applying the cron job fix tonight and will report back if the problem recurs.
This has brought our high-traffic site down many times, so hopefully the priority can
be bumped up a notch.
Thanks honglilai for your Passenger work - very much appreciated.
Comment 29 by project member [email protected], Jan 30, 2010
vitalaaron, are your symptoms the same? When the problem occurs, are there multiple
ApplicationSpawners for the same app?
Comment 30 by [email protected], Feb 2, 2010
My app has survived since Friday without a crash, since I implemented the cron job to kill the spawn server.
However, I'm still hesitant to say that the problem is fixed.
I thought I'd post my cron info in case anyone wants it:
# add this to root's crontab (runs at the top of every hour)
0 * * * * /opt/nginx/kill_spawn_server.sh

# kill_spawn_server.sh
#!/bin/sh
# Find the 'Passenger spawn server' process and kill it; Passenger
# starts a fresh spawn server the next time it needs one.
kill -9 `ps -ef | grep 'spawn server' | grep -v grep | awk '{print $2}'`
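A simpler variant of the same script, assuming pkill (from procps) is available; it matches against the full command line, so the grep/grep -v pipeline isn't needed:

#!/bin/sh
# Kill the Passenger spawn server by matching its full command line.
pkill -9 -f 'Passenger spawn server'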
Look forward to finding out more about the root cause of this defect,
Steve
Comment 31 by project member [email protected], Feb 2, 2010
I haven't been able to find the cause so far, but Phusion Passenger 3 will come
with a bunch of cleanup changes for the spawn server. Maybe those changes will make
your kill cron job obsolete.
Comment 32 by [email protected], Feb 2, 2010
Thanks for the help @honglilai, I look forward to Passenger 3
Comment 33 by project member [email protected], Feb 7, 2010
I suspect that this problem might have something to do with a long-standing Safari
bug: https://bugs.webkit.org/show_bug.cgi?id=5760
Could you disable keep-alive, temporarily disable the cron job, and check whether the
problem still occurs?
Comment 34 by [email protected], Feb 8, 2010
@honglilai - I've disabled keep-alive (nginx users should set keepalive_timeout 0;) and turned off the cron job.
I'll monitor the situation and report back. By the way, the cron job has been a very successful, but not quite
perfect, workaround.
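For other nginx users, a minimal sketch of where that directive goes:

# nginx.conf, inside the http {} (or a server {}) block:
http {
    keepalive_timeout 0;   # 0 disables HTTP keep-alive entirely
}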
Comment 35 by [email protected], Feb 9, 2010
Since I disabled keep-alive and the cron job, I haven't had any problems *yet*. However, looking at the nginx log
file, I see that the Errno::EPIPE exception is still being thrown. Can I presume that the problem isn't fixed then,
or does it still have a chance of being OK?
---
[ pid=1588 file=ext/nginx/HelperServer.cpp:478 time=2010-02-08 20:00:40.315 ]:
Couldn't forward the HTTP response back to the HTTP client: It seems the user clicked on the 'Stop' button in his browser.
*** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 3140):
---
Comment 36 by project member [email protected], Feb 9, 2010
Before, EPIPE errors were correlated with downtime, but now they aren't, right? If so,
then the EPIPE errors that you're getting now might be legit, i.e. people actually
clicked on Stop.
If you set the error log level to 'debug' then you can see whether it's legit or not,
like this:
error_log logs/error.log debug;
You should see something along the lines of this:
2010/02/02 12:09:27 [info] 56470#0: *23 kevent() reported that client closed prematurely connection, so upstream connection is closed too while sending request to upstream, client: 127.0.0.1, server: towertop.test, request: "GET /web_apps/L1VzZXJzL2hvbmdsaS9Qcm9qZWN0cy90b3dlcnRvcA/logs/tail HTTP/1.1", upstream: "passenger:unix:/passenger_helper_server:", host: "towertop.test:8000", referrer: "http://towertop.test:8000/web_apps/L1VzZXJzL2hvbmdsaS9Qcm9qZWN0cy90b3dlcnRvcA/logs"
Notice the "reported that client closed prematurely connection, so upstream
connection is closed too while sending request to upstream" part, this indicates that
Nginx has detect that the HTTP client closed the connection.
Comment 37 by project member [email protected], Feb 11, 2010
Merging this issue with #435.
Status: Duplicate
Mergedinto: 435
Comment 38 by [email protected], Feb 18, 2010
@honglilai
I just posted a reply to your question above in issue #435, since this issue was
merged with it. Sorry for the delay - I really thought I had voted for the issue, but
apparently I hadn't, as I wasn't getting notifications.