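A Deep Look at References in PHP (References in PHP) huangguisu
To understand PHP references in depth, I found this article written in English: http://derickrethans.nl/talks/phparch-php-variables-article
Much of the content is best read in the original English, since a translation sometimes fails to convey the meaning.
The Zend engine stores PHP variables as zvals: every variable in PHP has a corresponding zval. The zval struct is defined in Zend/zend.h, with the following layout: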
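typedef struct _zval_struct zval;

struct _zval_struct {
    /* Variable information */
    zvalue_value value;       /* The value stored in the variable */
    zend_uint refcount__gc;   /* Reference count */
    zend_uchar type;          /* Concrete type of the variable */
    zend_uchar is_ref__gc;    /* Reference flag: 1 if a reference, 0 otherwise */
};

The text below often mentions refcount, which is the refcount__gc field (the garbage collection mechanism was introduced in PHP 5.3).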
PHP’s handling of variables can be non-obvious at times. Have you ever wondered what happens at the engine level when a variable is copied to another? How about when a function returns a variable “by reference”? If so, read on. (PHP is a weakly typed language, and this part of its variable handling is not visible from the script.)
Every computer language needs some form of container to hold data: variables. In some languages, those variables have a specific type attached to them. They can be a string, a number, an array, an object or something else. Examples of such statically-typed languages are C and Pascal. Variables in PHP do not have this specific restraint. They can be a string in one line, but a number in the next line. Converting between types is also easy to do, and often even automatic. These loosely-typed variables are one of the properties that make PHP such an easy and powerful language, although they can sometimes also cause interesting problems.
Internally, in PHP, those variables are all stored in a similar container, called a zval container (also called “variable container”). This container keeps track of several things that are related to a specific value. The most important things that a variable container contains are the value of the “variable” and the type of the variable. Python is similar to PHP in this regard, as it also labels each variable with a type. The variable container contains a few more fields that the PHP engine uses to keep track of whether a value is a reference or not. It also keeps a reference count of its value.
Variables are stored in a symbol table, which is quite analogous to an associative array. This array has keys that represent the names of the variables, and those keys point to variable containers that contain the value (and type) of the variables. See Figure 1 for an example of this.
In short, variables are stored in a symbol table that resembles an associative array.
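The two ideas above (loosely-typed values, and a symbol table that behaves like an associative array) can be seen from a script. This is only a minimal sketch: the function name show_symbols() and the variable names are made up for illustration, and get_defined_vars() gives an array view of the current scope rather than access to the engine's own table.

```php
<?php
// Loose typing: the same variable can hold values of different types.
$value = "42";                 // a string
$typeBefore = gettype($value);
$value = $value + 1;           // automatic conversion: now an integer
$typeAfter = gettype($value);

// A function's local symbol table, viewed as an associative array
// (hypothetical helper, for illustration only).
function show_symbols()
{
    $name = "this is";
    $count = 3;
    return get_defined_vars(); // keys are variable names, values are their contents
}

$symbols = show_symbols();

echo $typeBefore, " -> ", $typeAfter, "\n";
print_r($symbols);
```

The keys of the returned array are the variable names, which is the same name-to-container mapping the paragraph describes.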
1. Reference Counting
PHP tries to be smart when it deals with copying variables, as in $a = $b. Using the = operator is also called an “assign-by-value” operation. While assigning by value, the PHP engine will not actually create a copy of the variable container; it will merely increase the refcount__gc field in the variable container. As you can imagine, this saves a lot of memory when you have a large string of text or a large array. Figure 2 shows how this “looks”.
In step 1, there is one variable, $a, which contains the text “this is” and has (by default) a reference count of 1.
In step 2, we assign variable $a to variables $b and $c. Here, no copy of the variable container is made; only the refcount value is increased by 1 for each variable that is assigned to the container. Because we assign two more variables here, the refcount goes to 2 and ends up being 3 after the two assignment statements.
In step 3, you might wonder what happens when the variable $c gets changed. Two things might happen, depending on the value of the refcount. If the value is 1, then the container simply gets updated with its new value (and possibly its type, too). If the refcount value is larger than 1, a new variable container gets created containing the new value (and type). You can see this in step 3 of Figure 2. The refcount value for the variable container that is linked to the variable $a is decreased by one, so that the variable container that belongs to variables $a and $b now has a refcount of 2, and the newly created container has a refcount of 1.
In step 4, when unset() is called on a variable, the refcount value of the variable container that is linked to that variable is decreased by one. This happens when we call unset($b) in step 4. If the refcount value drops below 1, the PHP engine will free the variable container.
In step 5, the variable container is then destroyed.
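The script behind Figure 2 is not reproduced in this text, so the following is a rough sketch of the steps as described. Two things are assumptions: the string is made large (about 1 MB) so that the effect is measurable with memory_get_usage(), and the last step is taken to be unset($a). The refcount field itself cannot be read without a debugging extension, so the sketch only reports memory growth, and the exact numbers depend on the PHP version.

```php
<?php
// Step 1: one name, one container (about 1 MB of text).
$a = str_repeat("this is ", 125000);
$base = memory_get_usage();

// Step 2: assign-by-value; the string data should not be copied yet.
$b = $a;
$c = $a;
$afterAssign = memory_get_usage() - $base;

// Step 3: writing to $c separates it from the container shared by $a and $b.
$c .= "!";
$afterWrite = memory_get_usage() - $base;

// Steps 4 and 5: remove the remaining names of the original container.
unset($b);
unset($a);   // assumption: the figure's final step removes $a

printf("growth after assignment: %d bytes\n", $afterAssign);
printf("growth after write:      %d bytes\n", $afterWrite);
```

If the assignments copied the data, the first number would already be in the megabyte range; instead the growth should only show up after the write to $c.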
2. Passing Variables to Functions
Besides the global symbol table that every script has, every call to a user defined function creates a symbol table where a function locally stores its variables. Every time a function is called, such a symbol table is created, and every time a function returns, this symbol table is destroyed. A function returns by either using the return statement, or by implicitly returning because the end of the function has been reached.
In Figure 3, I illustrate exactly how variables are passed to functions.
In step 1, we assign a value to the variable $a, again “this is”. We pass this variable to the do_something() function, where it is received in the variable $s.
In step 2, you can see that it is practically the same operation as assigning a variable to another one (like we did in the previous section with $b = $a), except that the variable is stored in a different symbol table (the one that belongs to the called function) and that the reference count is increased twice, instead of the normal once. The reason for this is that the function’s stack also contains a reference to the variable container.
In step 3, when we assign a new value to the variable $s, the refcount of the original variable container is decreased by one and a new variable container is created, containing the new value.
In step 4, we return the variable with the return statement. The returned variable gets an entry in the global symbol table and the refcount value is increased by 1. When the function ends, the function’s symbol table will be destroyed. During the destruction, the engine will go over all variables in the symbol table and decrease the refcount of each variable container. When the refcount of a variable container reaches 0, the variable container is destroyed.
As you see, the variable container is again not copied when returning it from the function, due to PHP’s reference counting mechanism.
If the variable $s had not been modified in step 3, then variables $a and $b would still point to the same variable container, which would have a refcount value of 2. In this situation, a copy of the variable container that was created with the statement $a = "this is" would not have been made.
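Figure 3 is not included in this text, so here is a small sketch of a script that matches the four steps described. The names do_something(), $a, $s and $b come from the article; the new value "was" assigned inside the function is an assumption made for illustration.

```php
<?php
function do_something($s)
{
    // Step 2: $s shares the caller's container; nothing has been copied.
    $s = "was";   // Step 3: a new value gives $s its own container (value is hypothetical)
    return $s;    // Step 4: the container is handed back without copying
}

$a = "this is";          // Step 1
$b = do_something($a);   // the return value gets the name $b in the global scope

var_dump($a, $b);
```

The caller's $a is untouched because the write in step 3 separated $s from it, which is the behaviour the text attributes to a refcount larger than 1.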
3. Introducing References
References are a method of having two names for the same variable. A more technical description would be: references are a method of having two keys in a symbol table pointing to the same zval container. References can be created with the reference assignment operator =&.
Figure 4 gives a schematic overview of how references work in combination with reference counting.
In step 1, we create a variable $a that contains the string “this is”.
In step 2, we create two references ($b and $c) to the same variable container. The refcount increases normally for each assignment, making the final refcount 3 after both assignments by reference ($b =& $a and $c =& $a), but because the reference assignment operator is used, the other value, is_ref, is now set to 1.
This value is important for two reasons. The second one I will divulge a little later in this article; the first is what happens when we assign a new value to one of the three variables that all point to the same variable container. If the is_ref value is set to 0 when a new value is set for a specific variable, the PHP engine will create a new variable container, as you could see in step 3 of Figure 2. But if the is_ref value is set to 1, then the PHP engine will not create a new variable container and will simply update the value to which the variable names point, as you can see in step 3 of Figure 4.
In step 3, the exact same result would be reached if the statement $a = 42 were used instead of $b = 42. After the variable container is modified, all three variables $a, $b and $c contain the value 42.
In step 4, we use the unset() language construct to remove a variable, in this case variable $c. Using unset() on a variable means that the refcount value of the variable container that the variable points to gets decreased by 1. This works exactly the same for referenced variables. There is one difference, though, that shows in step 5.
In step 5, when the reference count of a variable container reaches 1 and the is_ref value is set to 1, the is_ref value is reset to 0. The reason for this is that a variable container can only be marked as a referenced variable container when there is more than one variable pointing to it.
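As a rough illustration of the steps described for Figure 4 (the figure itself is missing here), the following sketch uses the same variable names. The final assignment of "changed" and the $seen array are additions for illustration, used only to show which names still share a container.

```php
<?php
$a = "this is";   // Step 1
$b =& $a;         // Step 2: $b and $c become extra names for $a's container
$c =& $a;

$b = 42;          // Step 3: the shared container is updated in place
$seen = array($a, $b, $c);   // snapshot of the three values (copied by value)

unset($c);        // Step 4: only the name $c is removed

$a = "changed";   // not in the article: shows that $b still follows $a

var_dump($seen, isset($c), $b);
```

Writing through any one of the names changes what all of them see, and unset() removes a name rather than the value.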
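4. Mixing Assign-by-Value and Assign-by-Reference
Mixing the two does not save memory; it uses more. The reason is that after the reference assignment, a separate copy has to be allocated for the referenced variable.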
Something interesting—and perhaps unexpected—happens if you mix an assign-by-value call and an assign-by-reference call. This shows in Figure 5.
In step 1, we create two variables, $a and $b, where $b is assigned by value from $a. This creates a situation where there is one variable container with is_ref set to 0 and refcount set to 2. This should be familiar by now.
In step 2, we proceed by assigning variable $c by reference to variable $b. Here, the PHP engine will create a copy of the variable container. The variable $a keeps pointing to the original variable container, but its refcount is, of course, decreased to 1, as there is only one variable pointing to this variable container now. The variables $b and $c point to the copied container, which now has a refcount of 2 and an is_ref value of 1.
You can see that in this case, using a reference does not save you any memory; it actually uses more memory, as it had to duplicate the original variable container.
The container had to be copied; otherwise the PHP engine would have no way of knowing how to deal with the reassignment of one of the three variables, as two of them ($b and $c) were references to the same container, while the other was not supposed to be a reference. If there is only one container with refcount set to 3 and is_ref set to 1, then it is impossible to figure that out. That is the reason why the PHP engine needs to create a copy of the container when you do an assignment-by-reference.
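A sketch of the Figure 5 situation, under stated assumptions: the string is made large so that a duplicated container would be visible in memory_get_usage(), and the value "changed" is made up. The separation of $a from $b and $c is observable on any PHP version; the memory growth at the =& line is what the article describes for the engine of its time, and later engine versions store references differently, so the printed number may not match.

```php
<?php
$a = str_repeat("this is ", 125000);   // about 1 MB
$b = $a;                               // Step 1: $a and $b share one container
$before = memory_get_usage();

$c =& $b;                              // Step 2: $b and $c must be separated from $a
$growth = memory_get_usage() - $before;

$c = "changed";                        // reaches $b through the reference, not $a

printf("growth at the =& line: %d bytes (engine dependent)\n", $growth);
var_dump($b, strlen($a));
```

Whatever the memory figure, $a must keep its original value, which is exactly why the engine cannot leave all three names on one container.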
If we switch the order of assignments—first we assign $a by reference to $b and then we assign $a by value to $c—then something similar happens. Figure 6 shows how this is handled.
In step 1, we assign the string “this is” to the variable $a and then proceed to assign $a by reference to variable $b. We now have one variable container where is_ref is 1 and refcount is 2.
In step 2, we assign variable $a by value to variable $c. Now a copy of the variable container is made so that the PHP engine can handle modifications to the variables correctly, for the same reasons as stated in the previous paragraph. But if you go back to step 2 of Figure 2, where we assign the variable $a to both $b and $c, you see that no copy is made there.
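The reversed order described for Figure 6 can be sketched the same way. The variable names are the article's; the value "changed" is an assumption used to show which names still belong together.

```php
<?php
$a = "this is";
$b =& $a;         // Step 1: $a and $b are references to one container
$c = $a;          // Step 2: $c gets the value, not the reference

$b = "changed";   // hypothetical write: reaches $a, leaves $c alone

var_dump($a, $b, $c);
```

$c keeps the old string, so it cannot have been sharing the referenced container at the time of the write.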
5. Passing References to Functions
Variables can also be passed-by-reference to functions. This is useful when a function needs to modify the value of a specific variable when it is called. The script in Figure 7 is a slightly modified version of the script that you have already seen in Figure 3.
The only difference is the ampersand (&) in front of the $s variable in the declaration of the function do_something(). This ampersand instructs the PHP engine that the variable to which the ampersand is applied is going to be passed by reference and not by value. A different name for a passed-by-reference variable is an “out variable”. When a variable is passed by reference to a function, the new variable in the function’s symbol table is pointed to the old container and the refcount value is increased by 2 (one for the symbol table, and one for the stack). Just as in a normal assignment-by-reference, the is_ref value inside the variable container is also set to 1, as you can see in step 2. From here on, the same things happen as with a normal reference, like in step 3, where no copy of the variable container is made if we assign a new value to the variable $s.
The return $s; statement is basically the same as the $c = $a statement in step 2 of Figure 6. The global variable $a and the local variable $s are both references to the same variable container, and the logic dictates that if is_ref is set to 1 for a specific container and this container is assigned to another variable by value, the container needs to be duplicated. This is exactly what happens here, except that the newly created variable is created in the global symbol table by the assignment of the return value of the function with the statement $b = do_something($a).
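Since Figure 7 is not shown, this is a sketch of the by-reference variant of the earlier script. The assigned values "was" and "again", and the helper variable $afterCall, are assumptions added for illustration.

```php
<?php
function do_something(&$s)
{
    $s = "was";   // no new container: the write goes through to the caller's $a
    return $s;    // returned by value, so the caller receives its own copy
}

$a = "this is";
$b = do_something($a);
$afterCall = $a;   // what $a looks like right after the call

$a = "again";      // hypothetical: shows that $b is not a reference to $a

var_dump($afterCall, $a, $b);
```

The function changes $a, but $b does not follow later changes to $a, which matches the description of the returned container being duplicated.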
6. Returning by Reference
Another feature in PHP is the ability to “return by reference”. This is useful, for example, if you want to select a variable for modification with a function, such as selecting an array element or a node in a tree structure. In Figure 8 we show how returning by reference works by means of an example.
In step 1, we define a $tree variable (which is actually not a tree, but a simple array) that contains three elements. The three elements have key values of 1, 2 and 3, and all of them point to a string with the English word that matches the key’s value (i.e. one, two and three).
In step 2, this array gets passed to the find_node() function by reference, along with the key of the element that the find_node() function should look for and return. We need to pass by reference here; otherwise we cannot return a reference to one of the elements, as we would be returning a reference to a copy of $tree. When $tree is passed to the function, it has a refcount of 3 and is_ref is set to 1. Nothing new here.
In step 3, the first statement in the function, $item =& $node[$key], causes a new variable to be created in the symbol table of the function, which points to the array element where the key is 3 (because the variable $key is set to 3). You can see that creating $item by assigning it by reference to the array element causes the refcount value of the variable container that belongs to the array element to be increased by 1. The is_ref value of that variable container is now 1, too, of course.
In step 4, the interesting things happen: we return $item (by reference) back to the calling scope and assign it (by reference) to $node. This causes the refcount of the variable container to which the third array key points to be set to 3. At this point the third element of $tree, $item (from the function’s scope) and $node (global scope) all point to this variable container.
In step 5, when the symbol table of the function is destroyed, the refcount value decreases from 3 to 2. $node is now a reference to the third element in the array. If the variable $node had not been assigned by reference to the return value of the find_node() function, but had instead been assigned by value, then $node would not have been a reference to $tree[3]. In this case, the refcount value of the variable container to which $tree[3] points is 1 after the function ends, but for some strange reason the is_ref value is not reset to 0 as you might expect. My tests did not find any problems with this, though, in this simple example. If the function find_node() had not been a “return-by-reference function”, then again the $node variable would not be a reference to $tree[3]. In this case, the is_ref value of the variable container ($tree) would have been reset to 0.
In step 6, finally, we modify the value in the variable container to which both $node and $tree[3] point.
Please do note that it is harmful not to accept a reference from a function that returns a reference. In some cases, PHP will get confused and cause memory corruptions which are very hard to find and debug. It is also not a good idea to return a static value as a reference, as the PHP engine has problems with that too. In PHP 4.3, both cases can lead to very hard-to-reproduce bugs and crashes of PHP and the web server. In PHP 5, this all works a little better: here you can expect a warning, and it will behave “properly”. Hopefully, a backported fix for this problem makes it into a new minor version of PHP 4, namely PHP 4.4.
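Figure 8 is not reproduced here, so the following is a sketch that follows the six steps as described. The names find_node(), $tree, $item, $node and $key are from the article; the parameter order and the new value "drei" are assumptions.

```php
<?php
// Return-by-reference function: note the & before the name and before $node.
function &find_node($key, &$node)
{
    $item =& $node[$key];   // Step 3: $item becomes another name for the element
    return $item;           // Step 4: handed back as a reference
}

$tree = array(1 => "one", 2 => "two", 3 => "three");   // Step 1
$node =& find_node(3, $tree);                           // Steps 2 to 5: accepted with =&

$node = "drei";   // Step 6: hypothetical new value, written through to $tree[3]

var_dump($tree[3], $tree[1]);
```

All three ampersands are needed here: on the function name, on the array parameter, and at the call site, in line with the warning about always accepting the reference.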
7. The Global Keyword
PHP has a feature that allows the use of a global variable inside a function: you can make this connection with the global keyword. This keyword will create a reference between the local variable and the global one. Figure 9 shows this in an example.
In steps 1 and 2, we create the variable $var and call the function update_var() with the string literal “one” as the sole parameter. At this point, we have two variable containers. The first one is pointed to from the global variable $var, and the second one is the $val variable in the called function. The latter variable container has a refcount value of 2, as both the variable on the stack and the local variable $val point to it.
In step 3, the global $var statement in the function creates a new variable in the local scope, which is created as a reference to the variable with the same name in the global scope. As you can see, this increases the refcount of the variable container from 1 to 2 and also sets the is_ref value to 1.
In step 4, we unset the variable $var. Contrary to some people’s expectation, the global variable $var does not get unset, as the unset() was done on a reference to the global variable $var and not on that variable itself.
In step 5, to reestablish the reference, we employ the global keyword again. As you can see, we have re-created the same situation as in step 3. Instead of using global $var we could just as well have used $var =& $GLOBALS['var'], as it would have created the exact same situation.
In step 6, we continue by assigning the function’s $val argument to the $var variable. This changes the value to which both the global variable $var and the local variable $var point; this is what you would expect from a referenced variable.
In step 7, when the function ends, the reference from the variable in the scope of the function disappears, and we end up with one variable container with a refcount of 1 and an is_ref value of 0.
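A sketch of the script the seven steps describe (Figure 9 itself is missing from this text). The names update_var(), $var and $val are the article's; the initial value 1, the return value, and the $stillSet check are assumptions added so that the effect of unset() can be observed.

```php
<?php
function update_var($val)
{
    global $var;   // Step 3: local $var becomes a reference to the global $var
    unset($var);   // Step 4: removes only the local name

    // Not in the article: check that the global variable survived the unset().
    $stillSet = isset($GLOBALS['var']);

    global $var;   // Step 5: re-establish the reference
    $var = $val;   // Step 6: the write reaches the global variable

    return $stillSet;
}

$var = 1;                          // Step 1 (initial value is hypothetical)
$stillSet = update_var("one");     // Step 2; step 7 happens when the call returns

var_dump($stillSet, $var);
```

The global $var is still set after the unset() inside the function, and it holds the new value once the function returns.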
8. Abusing References
In this section, I will give a few examples that show you how references should not be used. In some cases these examples might even create memory corruptions in PHP 4.3 and lower.
Example 1: “Returning static values by-reference”. In Figure 10, we have a very small script with a return-by-reference function called definition().
This function simply returns an array that contains some elements. Returning by reference makes no sense here, as the exact same things would happen internally if the variable container holding the array was returned by value, except that in the intermediate step (step 3) the is_ref value of the container would not be set to 1, of course. If the $def variable in the function’s scope had been referenced by another variable, something that might happen in a class method where you do $def = $this->def, then the return-by-reference properties of the function would have copied the array, because this creates a similar situation as in step 2 of Figure 5.
Example 2: “Accepting references from a function that doesn’t return references”. This is potentially dangerous; PHP 4.3 (and lower) does not handle this properly. In Listing 1, you see an example of something that is not going to work properly.
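<?php
// Listing 1: an anti-pattern, shown as an example of what not to do.
function &split_list($emails)
{
    // preg_split() does not return by reference, so =& gains nothing here.
    $emails =& preg_split("/[,;]/", $emails);
    return $emails;
}

$emails = split_list('[email protected];[email protected];[email protected]');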
This function was implemented with performance in mind, trying not to copy variable containers by using references. As you should know after reading this article, this is not going to buy you anything. There are a few reasons why it doesn’t work. The first reason is that the PHP internal function preg_split() does not return by reference; actually, no internal function in PHP can return anything by reference. So, assigning the return value by reference from a function that doesn’t return a reference is pointless. The second reason why there is no performance benefit here is the same one as in Example 1: you’re returning a static value, not a reference to a variable, so it does not make sense to make the split_list() function return by reference.
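For contrast, a plain by-value version of the same function might look like the sketch below. It is not taken from the article; it simply drops both ampersands, as the reasoning above suggests. The addresses are placeholder values invented for the example.

```php
<?php
// By-value version: no & on the function, no =& on the assignment.
function split_list($emails)
{
    return preg_split("/[,;]/", $emails);
}

// Placeholder addresses, for illustration only.
$emails = split_list('a@example.com;b@example.com,c@example.com');

print_r($emails);
```

The result is the same array, and the engine's reference counting already avoids the copies that the references were meant to prevent.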
9. Conclusion
After reading this article, I hope that you now fully understand how references, refcounting, and variables work in PHP. It should also have explained that assigning by reference does not always save you memory; it’s better to let the PHP engine handle this optimization. Do not try to outsmart PHP yourself here, and only use references when they are really needed. In PHP 4.3, there are still some problems with references, for which patches are in the works. These patches are backports from PHP 5-specific code, and although they work fine, they will break binary compatibility, meaning that compiled extensions no longer work after those patches are put into PHP. In my opinion, those hard-to-reproduce memory corruption errors should be fixed in PHP 4 too, though, so perhaps this creates the need for a PHP 4.4 release. If you’re having problems, you can try to use the patch located at http://files.derickrethans.nl/patches/ze1-return-refrence-20050429.diff.txt
The PHP Manual also has some information on references, although it does not explain the internals very well. The URL for the section in PHP’s Manual is http://php.net/language.references