Hi!
I have a MongoDB system with 1 master (v1.8.2) and 1 replica (previously
v1.8.2, but I recently upgraded it to v2.0.6), both on Linux with 4GB RAM.
There is 1 database with 16 collections, but only one of them is big (442k
documents), and that one is getting me into trouble. I run 98% of the
queries on the replica.
Stats for the database and for the biggest collection (news): http://pastebin.com/qZu8cB1U
The problem:
I noticed that from time to time my replica hangs for some seconds (it
can be 10 seconds or 1 minute, but it is usually about 1 minute). During
that time the replica is effectively unavailable, which is obviously
undesirable.
To find out what was going on:
1- I had mongostat write its output to a file so I could review it later,
and after a couple of hours I was able to see these periods in the log
file. Example: http://pastebin.com/JBs0xwWx . The values in the faults
column get high, and at those moments iostat shows %util at 97-98%.
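Picking these periods out of a saved mongostat log can be scripted instead of eyeballed. A minimal Python sketch, assuming the header row contains a whitespace-separated "faults" token and that every column before it is a single token in the data rows (true for the v2.0-era layout, but check your own log). The function name and the threshold are hypothetical choices, not anything mongostat provides:

```python
from typing import Iterable, List, Tuple


def find_fault_spikes(lines: Iterable[str], threshold: int) -> List[Tuple[int, int]]:
    """Return (line_number, faults) for mongostat rows whose faults value exceeds threshold.

    Assumption: the header row has a 'faults' token, and all columns before it
    are single whitespace-separated tokens in the data rows.
    """
    faults_col = None
    spikes = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if "faults" in tokens:
            # Header row: (re)read the column position; a repeated header is handled the same way.
            faults_col = tokens.index("faults")
            continue
        if faults_col is None or len(tokens) <= faults_col:
            # Lines before the first header (e.g. "connected to: ...") or short lines.
            continue
        value = tokens[faults_col]
        if value.isdigit() and int(value) > threshold:
            spikes.append((lineno, int(value)))
    return spikes
```

Usage would be something like `find_fault_spikes(open("mongostat.log"), 100)`, where the file name and the threshold of 100 faults per second are placeholders to tune against your normal baseline.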
2- I enabled the database profiler for queries taking more than 3
seconds.
Here we can see that two heavy queries reached the server and the
queries that followed hung, waiting.
I could also see that those other queries almost all finished at the same
moment (within milliseconds of each other).
Profiling info: http://pastebin.com/0ZENXUDA
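The profiler step and the "heavy vs. trivial" reading of its output can be sketched in Python. This assumes a pymongo `Database` object for `db` and the v2.0-era `system.profile` fields `millis`, `nscanned` and `nreturned` (newer servers report `keysExamined`/`docsExamined` instead). Both helper names and the 100-document cutoff for "trivial" are hypothetical. The idea: a query that scanned few documents but was still slow most likely spent its time waiting rather than working.

```python
from typing import Dict, List


def enable_slow_query_profiler(db, slowms: int = 3000) -> dict:
    """Set profiling level 1 (log only operations slower than slowms milliseconds).

    `db` is assumed to be a pymongo Database; this sends {"profile": 1, "slowms": slowms}.
    """
    return db.command("profile", 1, slowms=slowms)


def classify_slow_ops(profile_docs: List[Dict], max_trivial_scan: int = 100) -> Dict[str, List[Dict]]:
    """Split slow profiler entries into 'heavy' (scanned many docs) and 'trivial' (scanned few).

    Entries are assumed to carry the v2.0-era 'nscanned' field; missing values count as 0.
    """
    result: Dict[str, List[Dict]] = {"heavy": [], "trivial": []}
    for doc in profile_docs:
        scanned = doc.get("nscanned", 0)
        bucket = "trivial" if scanned <= max_trivial_scan else "heavy"
        result[bucket].append(doc)
    return result
```

The entries could come from something like `list(db.system.profile.find({"millis": {"$gt": 3000}}))`. A long "trivial" bucket that lines up in time with a couple of "heavy" entries would match the pattern described above.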
3- I set up a script that watches mongostat, detects these problem
periods and saves the output of db.currentOp() to a file. Here is one of
the results: http://pastebin.com/6iR45mbi .
Here we can see that the queries are active, some running for more than 30
seconds, and they do not seem to be "waitingForLock". Yet, as the
profiling data shows, most of them look pretty trivial (nscanned: 62,
nreturned: 16).
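Filtering a saved currentOp snapshot for exactly this case (active, long-running, not waiting for a lock) is a few lines of Python. A sketch, assuming the snapshot has already been loaded as a dict with the shape `db.currentOp()` returns, i.e. an "inprog" list whose entries carry `active`, `secs_running` and `waitingForLock`. The function name and the 30-second default are hypothetical:

```python
from typing import Dict, List


def long_running_unblocked_ops(current_op: Dict, min_secs: int = 30) -> List[Dict]:
    """Return active ops running at least min_secs that are not waiting for a lock.

    Assumed shape (as printed by db.currentOp()):
    {"inprog": [{"opid": ..., "active": bool, "secs_running": int,
                 "waitingForLock": bool, "op": str, "ns": str, ...}, ...]}
    """
    return [
        op
        for op in current_op.get("inprog", [])
        if op.get("active")
        and op.get("secs_running", 0) >= min_secs
        and not op.get("waitingForLock", False)
    ]
```

Operations that survive this filter and also scanned very few documents are neither blocked on a lock nor doing much work themselves, which points toward time spent waiting on I/O.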
At this point I can't understand why the trivial queries take so long
to run after one or two heavy queries have run.
Does anybody have an idea of what is going on, and of how to prevent 1 or
2 queries from degrading the whole server?
Thank you.
Hi David,
I notice this question didn't get any follow-up. Have you been able to
work out the cause of your intermittent slowdown?
Based on the mongostat output and your high rate of page faults
immediately after the batch of updates, it looks like you are experiencing
memory pressure (i.e. your server is bogged down swapping between memory and
disk and becomes unresponsive for a period of time). You could confirm
this by watching the swap activity (the si and so columns) reported by `vmstat 1`.
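If you capture `vmstat 1` to a file alongside the mongostat log, the swap columns can be checked the same way. A sketch, assuming the Linux procps `vmstat` layout (a column-header line containing "si" and "so", followed by all-numeric rows). The first row vmstat prints is an average since boot, so it is skipped. The function name is hypothetical:

```python
from typing import Iterable, List, Tuple


def swap_activity(lines: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (sample_index, si, so) for vmstat samples with nonzero swap-in or swap-out.

    sample_index counts the since-boot summary row as sample 0; that row is excluded.
    """
    si_col = so_col = None
    samples = []
    for line in lines:
        tokens = line.split()
        if "si" in tokens and "so" in tokens:
            # Column-header line: remember where si/so are (also handles repeated headers).
            si_col, so_col = tokens.index("si"), tokens.index("so")
            continue
        if si_col is None or not tokens or not all(t.isdigit() for t in tokens):
            # Skip the "procs ---memory---" banner and anything else non-numeric.
            continue
        samples.append((int(tokens[si_col]), int(tokens[so_col])))
    # Drop the since-boot row and keep only samples that show swap traffic.
    return [(i, si, so) for i, (si, so) in enumerate(samples[1:], start=1) if si or so]
```

Sustained nonzero si/so during the same seconds that mongostat shows the faults spike would support the memory-pressure explanation; if swap stays at zero, the faults are more likely just data being paged in from disk.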
Is your replica set monitored in MMS (https://mms.10gen.com/help/install.html)?
It would also be helpful to install Munin for resource monitoring.