Server Running Status Report
Part I
This is a brief report on the running status of the USA server.
The monitoring graphs below show the system load. To make the curves easier to read, the plotted values are 100 times the actual system load.
System Load Weekly Graph
System load is the average number of processes that are running on or waiting for the CPU over a given period. As a rule of thumb, the system load should not exceed 3 on a single-CPU system, or 6 on a dual-CPU or dual-core system. System performance drops significantly once the system load exceeds this limit.
The server we use in the USA is a dual-core system, so the system load should not exceed 6. A system load of 6 appears as 600 in the two graphs above.
The System Load Daily Graph and the System Load Weekly Graph show that the system load exceeds 6 (plotted value above 600) from 08:00 to 15:00 USA time, with a peak of 14.65.
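The scaling and the limits described above can be written as a short check. This is only a minimal sketch in Python, assuming the rule of thumb used in this report (a limit of 3 per CPU core); the function and constant names are illustrative and do not come from the monitoring tool.

```python
# Rule of thumb from this report: load should stay at or below 3 per CPU core.
LOAD_LIMIT_PER_CORE = 3
# The graphs plot (system load) * 100.
GRAPH_SCALE = 100


def graph_value_to_load(graph_value):
    """Convert a value read from the graph back to the real system load."""
    return graph_value / GRAPH_SCALE


def load_limit(cores):
    """Return the load limit for a machine with the given number of cores."""
    return LOAD_LIMIT_PER_CORE * cores


def is_overloaded(graph_value, cores):
    """Return True if the plotted value corresponds to a load above the limit."""
    return graph_value_to_load(graph_value) > load_limit(cores)


if __name__ == "__main__":
    # Dual-core server: the limit is 6, which is 600 on the graph.
    print(load_limit(2))
    # A plotted value of exactly 600 is at the limit, not above it.
    print(is_overloaded(600, 2))
    # The reported peak of 14.65 is plotted as 1465.
    print(is_overloaded(1465, 2))
```

With these assumptions, a plotted value of 1465 on the dual-core server corresponds to a load of 14.65, well above the limit of 6.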
Let us compare the System Load Daily Graph with the Cpu Idle Percentage Graph over the same time period.
Cpu Idle Percentage Graph
The two graphs above show that from 10:00 to 13:00 (USA time) the system load is over 12 and the CPU idle percentage is 0, even though fewer than 15 people are online on our website at the same time. Part II analyzes the cause.
P.S.: The Established TCP Connections Graph for the same time period is shown below.
Established TCP Connections Graph
Note: The number of established TCP connections is not the number of people online, because a single IP address can hold several TCP connections. According to the graph above, no more than 15 people are on the website even at peak, and the peak is very short.
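The difference between connections and visitors can be illustrated by grouping connections by remote address. The following is a minimal sketch in Python; the input format (a list of remote IP and port pairs) and the function name are assumptions made for illustration, and the addresses are made-up documentation addresses, not data from the server. Distinct addresses are themselves only an approximation of the number of people online.

```python
from collections import Counter


def connections_per_ip(connections):
    """Count established connections per remote IP address.

    `connections` is a list of (remote_ip, remote_port) tuples.
    """
    return Counter(ip for ip, _port in connections)


if __name__ == "__main__":
    # Made-up sample: 5 established connections from 2 distinct addresses.
    sample = [
        ("192.0.2.10", 50312),
        ("192.0.2.10", 50313),
        ("192.0.2.10", 50314),
        ("198.51.100.7", 41200),
        ("198.51.100.7", 41201),
    ]
    per_ip = connections_per_ip(sample)
    print(len(sample), "connections from", len(per_ip), "addresses")
```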
Part II: The Cause of the Fault
Some tables in the MySQL database hold too many records and have not been optimized. The CPU spends a large amount of resources and time processing the SQL statements against them, and system performance goes down as a result. Two observations support this conclusion.
(1) The system load rises when the database receives more queries.
Here is a comparison of the trends in the System Load Daily Graph and the Mysql Query Per Second Graph over the same time period.
Mysql Query Per Second Graph
(2) Statistics of the SQL statements over a period.
Time: 2009.11.01 15:15:07 to 2009.11.02 18:52:23
Condition: Sort the SQL statements by average process time, from longest to shortest, and take the first 5.
SQL Statements Average Process Time Table
The process time of an SQL statement should not exceed 1 second; otherwise it affects system performance. The requirement is stricter on a busy system. The SQL statements in the table above, however, take a long time to process and consume too much CPU resource and time.
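The selection rule used for the table (sort by average process time, keep the first 5, compare against the 1-second limit) can be sketched as follows. This is an illustrative Python sketch, not the tool that produced the table; the function name, the input format, and the placeholder statement names in the example are all assumptions.

```python
def slowest_statements(samples, top_n=5, limit_seconds=1.0):
    """Rank statements by average process time, slowest first.

    `samples` maps a statement to a list of its process times in seconds.
    Returns a list of (statement, average_seconds, over_limit) tuples.
    """
    averages = [
        (statement, sum(times) / len(times))
        for statement, times in samples.items()
        if times  # skip statements without any measurement
    ]
    averages.sort(key=lambda item: item[1], reverse=True)
    return [
        (statement, avg, avg > limit_seconds)
        for statement, avg in averages[:top_n]
    ]


if __name__ == "__main__":
    # Placeholder names and made-up timings, for illustration only.
    example = {
        "query_a": [0.2, 0.4],
        "query_b": [2.0, 4.0],
        "query_c": [1.0, 1.0],
    }
    for statement, avg, over in slowest_statements(example):
        print(statement, round(avg, 3), "over limit" if over else "ok")
```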
Part III: Stress Test
We ran the stress test between 22:00 and 24:00 (USA time) because site traffic is lowest during this period.
Time: Wed Nov 4 23:45:04 PST 2009
Tool: ab
Mode: We made 400 connections to the anshex homepage, issued within 1 or 2 seconds, with 50 concurrent connections at a time, to simulate 50 users connecting to the website at the same time.
Result: Only the key parameters are shown.
Time taken for tests: 60.76300 seconds
Percentage of the requests served within a certain time (ms)
50% 5601 # 50% of the requests were served within about 5.601 seconds
66% 8321 # 66% of the requests were served within about 8.321 seconds
75% 10117
80% 11354
90% 15128
95% 19928
98% 23425
99% 25579
100% 27017 (longest request)
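To make the table above easier to interpret, the sketch below shows one common way to derive such percentile figures from a list of per-request times. It is a minimal Python illustration using the nearest-rank method; it is an assumption that this matches how ab computes its figures, and ab may differ in edge cases. The function names and the sample values are illustrative.

```python
import math


def percentile(times_ms, pct):
    """Nearest-rank percentile of request times in milliseconds.

    Returns the smallest recorded time such that at least `pct` percent
    of the requests finished within it.
    """
    if not times_ms:
        raise ValueError("no samples")
    ordered = sorted(times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def ms_to_seconds(value_ms):
    """Convert milliseconds to seconds."""
    return value_ms / 1000


if __name__ == "__main__":
    # Made-up request times, for illustration only.
    sample = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
    for pct in (50, 66, 90, 100):
        print(pct, percentile(sample, pct))
    # The 50% row of the table: 5601 ms is 5.601 seconds.
    print(ms_to_seconds(5601))
```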
Monitoring Map:
Analysis:
The connection response time is too long, and the system cannot accept more connections when the system load is over 12 and the CPU idle percentage is 0 at the test point.
The two graphs above show that during the test the system load is over 12 and the MySQL queries per second exceed 90. In other words, the system resources are being spent on processing SQL statements.
Postscript on the ab test: The ab test has its limits, and it did not exercise the SQL statements listed in the “SQL Statements Average Process Time Table”. If 50 concurrent connections were to run those SQL statements, the situation would be worse.