During a conversation on the TCPHP mailing list yesterday about frameworks and scalability, I wrote a general reply on performance tuning for larger sites. The focus of this post is not performance items specific to particular PHP frameworks, since many bottlenecks apply before the framework even runs and should certainly be solved up front. Instead, in this posting I look at simple items that can be deployed to produce a finer-tuned system.
I don't believe I need to state why performance is important; we all know it can make or break a site, an application or even a business. To state up front: I do not deploy every method listed here on every server, site and/or application, as it truly depends on what I am expecting in traffic levels, server resources and so on.
There are many methods available to help you performance-tune PHP. Some require little to no effort, while others take more time spent analyzing for potential opportunities.
The number one place to start tuning is to ensure that you are using an opcode cache. An opcode cache stores and optimizes PHP's intermediate code, which gives you a very large performance gain compared to plain PHP.
Your options here include APC, XCache, eAccelerator and Zend Platform. I recommend APC or XCache; you decide.
If you are utilizing APC, a great step during a release is to prime each file, which stores it in the bytecode cache while bypassing all of the filters. This is done by running each file through apc_compile_file(), letting you build your cache quickly and effectively instead of waiting for the first request to each file. Here is an example PHP script to run this (you have to set apc.enable_cli=1 if you want it to run from the CLI). The script below doesn't do much checking on your directory, so you may want to add that in if you'd like.
if (!function_exists('apc_compile_file')) {
    echo "ERROR: apc_compile_file does not exist!";
    exit(1);
}

/**
 * Compile files for APC.
 *
 * Recurses through each directory and compiles every
 * *.php file with apc_compile_file().
 *
 * @param string $dir start directory
 * @return void
 */
function compile_files($dir)
{
    $dirs = glob($dir . DIRECTORY_SEPARATOR . '*', GLOB_ONLYDIR);
    if (is_array($dirs)) {
        foreach ($dirs as $subdir) {
            compile_files($subdir);
        }
    }

    $files = glob($dir . DIRECTORY_SEPARATOR . '*.php');
    if (is_array($files)) {
        foreach ($files as $file) {
            apc_compile_file($file);
        }
    }
}

compile_files('/path/to/dir');
The number of files you include can certainly limit your performance. The cost of a single include is minimal, but when you include a large number of files it starts to add up; remember that every time your page runs it has to include those files again. If you include a large number of files through a bootstrap, merge them at deploy time into a single include file that the bootstrap uses instead, as in the sketch below. Please note that I am not suggesting you do this in development, as that could become a maintenance nightmare; build it into a release process that runs before deployment.
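A minimal sketch of such a deploy-time merge, assuming a hypothetical list of bootstrap includes (the file names here are made up for illustration):

// Deploy-time script: concatenate the bootstrap includes into one
// file so production only has to include a single file.
$includes = array(
    'lib/config.php',
    'lib/db.php',
    'lib/session.php',
);

$merged = "<?php\n";
foreach ($includes as $file) {
    $code = trim(file_get_contents($file));
    // strip the open/close tags so the files concatenate cleanly
    $code = preg_replace('/^<\?php\s*/', '', $code);
    $code = preg_replace('/\?>\s*$/', '', $code);
    $merged .= $code . "\n";
}
file_put_contents('bootstrap.merged.php', $merged);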
Loops can be expensive, especially the more operations you run inside them. For instance, take the following scenario where you need to retrieve a set of integer values from the browser, each corresponding to a particular record in the database, and then output the rows.
if (isset($_GET['ids'])) {
    foreach ($_GET['ids'] as $id) {
        $rs = mysql_query('SELECT * FROM my_table WHERE my_id = ' . (int) $id);
        $row = mysql_fetch_assoc($rs);
        print_r($row);
    }
}
if (isset($_GET['ids'])) {
    $ids = array_map('intval', $_GET['ids']);
    $ids = implode(',', $ids);
    $rs = mysql_query('SELECT * FROM my_table WHERE my_id IN (' . $ids . ')');
    while ($row = mysql_fetch_assoc($rs)) {
        print_r($row);
    }
}
The examples above simulate a common mistake you often see in code. The first example executes the same query once per ID, where the second uses only one query; this cuts the time down considerably. To generate the numbers below I passed 15 IDs against a single table holding only 40 records (6 of the IDs did not actually exist).
Number of IDs: 15
The Bad Way: 0.0044221878051758 seconds
The Good Way: 0.0011670589447021 seconds
Add quite a few more ID values and a few joins, and you have yourself a bottleneck building up.
memcached is a distributed memory object caching system, and PHP has a memcache extension for talking to it. You can utilize memcached to store data that is used often; typically you will store database results in memcached for quick access to ever-changing content. If you are thinking the query cache in your RDBMS already covers this, especially with MySQL 4.x, read the relevant section on memcached's front page.
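As a rough sketch using the Memcache extension (the key naming scheme and five-minute lifetime here are arbitrary choices, not requirements):

$memcache = new Memcache();
$memcache->connect('localhost', 11211); // default memcached port

$sql = 'SELECT * FROM my_table WHERE my_id IN (1, 2, 3)';
$key = 'my_table_' . md5($sql); // hypothetical cache key scheme

$rows = $memcache->get($key);
if ($rows === false) {
    // cache miss: hit the database and cache the result for 5 minutes
    $rows = array();
    $rs = mysql_query($sql);
    while ($row = mysql_fetch_assoc($rs)) {
        $rows[] = $row;
    }
    $memcache->set($key, $rows, 0, 300);
}
print_r($rows);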
Many applications connect to the database right at the top of the main file. You should not be doing this, as the initial connection to the database still takes time. It may seem minor, but if you serve quite a bit of content that never touches the database, you save your server precious resources and spare your database connections that do nothing. One way to implement this is to use or create an abstraction layer that only connects on the first command actually sent to the database.
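A minimal sketch of such a lazy-connecting layer (the class name and credentials are placeholders):

class LazyDb
{
    private $conn = null;

    /** Connect only when the first query is actually issued. */
    private function connect()
    {
        if ($this->conn === null) {
            $this->conn = mysql_connect('localhost', 'user', 'pass');
            mysql_select_db('my_db', $this->conn);
        }
        return $this->conn;
    }

    public function query($sql)
    {
        return mysql_query($sql, $this->connect());
    }
}

// No connection is made until the first query runs.
$db = new LazyDb();
$rs = $db->query('SELECT * FROM my_table WHERE my_id = 1');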
I am only going to offer a few pointers here, as each RDBMS implementation varies, but there is general guidance you should tend to follow.
Besides a terrible data model, the number one reason for poor database performance is poor indexing. Just about every query you run should be hitting an index. Check your explain plan, ensure the query is using the proper indexes, and if it is not, adjust the query or the indexes until the speed is adequate.
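With MySQL, for example, you can eyeball the explain plan for the earlier query straight from PHP:

// EXPLAIN the query from the earlier example; "key" shows the index
// used (or NULL), "rows" estimates how many rows will be examined.
$rs = mysql_query('EXPLAIN SELECT * FROM my_table WHERE my_id IN (1, 2, 3)');
while ($row = mysql_fetch_assoc($rs)) {
    print_r($row);
}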
Further, ensure that you have enough memory for the database to hold large indexes in memory, and more than one of them at a time.
The more joins you make, the harder the database has to work. Joins can take a query from extremely fast to buckling under load; you should not be joining 10 tables together on a page that serves a large share of your traffic.
Make sure to use your database's query cache; it will increase performance for queries that are run often.
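On MySQL, for instance, you can check whether the query cache is enabled and how large it is (the settings themselves live in my.cnf as query_cache_type and query_cache_size):

// Show the current query cache settings.
$rs = mysql_query("SHOW VARIABLES LIKE 'query_cache%'");
while ($row = mysql_fetch_assoc($rs)) {
    echo $row['Variable_name'] . ' = ' . $row['Value'] . "\n";
}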
Archiving old data is huge, especially in tables that are ever-growing and getting into the millions of records. Archiving data that is used less often speeds up your queries, since it shrinks the indexes and limits the number of rows you are potentially scanning through.
When archiving old data, many larger companies will produce aggregate statistics on it if they will be of use to the customer, client, etc.
As with RDBMSs, there are multiple web servers, so here I will simply give some brief tips covering a few different areas. If you are looking for specific tuning, such as for Apache, there are articles scattered throughout the web and information on the Apache website with more details.
By utilizing gzip to send your files to the browser you can effectively slim the file sizes down, cutting your bandwidth and the time the server stays connected to the user, while improving the user experience through faster load times.
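For dynamic pages, one easy route is PHP's own output compression; this is just one option (Apache's mod_deflate or the zlib.output_compression ini setting work as well):

// Compress output when the browser advertises gzip support;
// ob_gzhandler falls back to uncompressed output otherwise.
ob_start('ob_gzhandler');

echo 'This response is gzipped for clients that accept it.';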
To cut down on the number of requests your web server has to handle, combine your most-used CSS files and JavaScript files. Take all of the CSS files you use, combine them into one, and strip out the whitespace. This saves the user time when downloading the CSS as well as saving you bandwidth. Do the same with your JavaScript files, cutting down wasted server resources, bandwidth, etc.
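A crude deploy-time sketch of the CSS half (the file names are invented, and a real minifier will do better than these regexes):

// Combine stylesheets and strip comments/whitespace at deploy time.
$css = '';
foreach (array('css/layout.css', 'css/typography.css') as $file) {
    $css .= file_get_contents($file);
}
$css = preg_replace('!/\*.*?\*/!s', '', $css); // drop /* ... */ comments
$css = preg_replace('/\s+/', ' ', $css);       // collapse whitespace
file_put_contents('css/combined.css', trim($css));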
CSS sprites are the result of taking images you would otherwise have sliced up, or a set of icons, and combining them into a single image, then using CSS to show the correct portion. Take a look at the CSS Sprites article on A List Apart.
If your content has not changed, you should be sending not-modified headers and/or Last-Modified headers. You want the browser to cache content that is not changing so it does not waste your resources. The browser will only check the headers when the user already has a cached copy, and if the content has not been modified you save bandwidth, server resources and the user's time.
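A minimal sketch of the check in PHP, assuming you can derive a last-modified timestamp for your content (the file path here is a stand-in):

// Reply 304 when the browser's cached copy is still current.
$lastModified = filemtime('/path/to/content.html'); // stand-in source

$since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    : 0;

if ($since !== false && $since >= $lastModified) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
echo 'Fresh content goes here.';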
The more modules you load, the more memory and resources your web server takes up. If you do not use a module, disable or remove it. Another technique is to run two web servers, one for your dynamic content and one for static content, so that PHP is not loaded into the web server for requests that never need it.
Now, as I stated, this was a very simplistic overview of some performance tuning options and does not take things to the nth degree; there is always more tuning that can be done. There are also many tools to benchmark the impact each of these changes makes on your site, but that is out of the scope of this post. If you have any suggestions or items I did not cover, simply write a comment.