Detecting recursive dependencies in PHP composite values

A great deal of complexity in the Zend_Json_Encoder class (like having a static method that instantiates the hosting class) is due to the implementation of a recursive dependency check.

Composite values can contain parts that contain the whole. In general it’s a good feature for data, but it allows a program to exhaust processing power, by entering infinite recursion. A developer MUST avoid infinite recursion, and every developer knows it. So the recursive dependency check is a requirement of any trustworthy PHP to JSON encoder, and the Developer of the Zend_Json_Encoder class put a lot of effort into giving a satisfactory solution to the problem.

Issue 1: False positives

The implementation of the recursive dependency check goes like this: if an object is found twice while visiting an object then it’s considered a recursive dependency. Unfortunately that’s a necessary but not sufficient condition. In fact, as a reported bug made clear, it’s very easy (and useful) to craft a composite value with the same object twice, and no recursive dependency involved.

The bug fix was quite wacky. Firstly the Developer added a switch for turning on the check, and made the switch off by default. Secondly they added another switch for turning off the throw of an exception so that when a recursive dependency is detected a standard string is returned instead of encoding the recurring object again. The problem is that this mechanism still suffer the same issue: it cuts recursive dependencies as well as simple repetitions (false positives).

Nobody should ever turn on the recursive dependency check in the Zend_Json_Encoder class because the only reason for doing so is when a developer wants the encoding to be performed no matter what. This can be accomplished by turning the check on and exceptions off. But the presence of false positives makes it a bad choice anyway.

Issue 2: Arrays recur too

The implementation of the recursive dependency check takes into account objects, but arrays can recur too, and no check for arrays is available. This is like generating false expectations: a developer that turned on the check would expect all recurring dependencies being detected, even if some could be false positives, but this is not the case. The check on means only recurring/repeating objects will be found.

Observations

We’ll make now some little experiments, with recurring arrays and objects, and see how they get printed, serialized and json_encoded by PHP itself.

function show(&$var, $name)
{
    print_r("nn-------------------- $name --------------------nn");
    print_r($var);
    print_r("nserialized -> ");
    print_r(serialize($var));
    print_r("njson_encoded -> ");
    print_r(json_encode($var));
    print_r("n---");
}

//arrays with recursive dependency
$array1 = array();
$array2 = array();
$array1[] = &$array2;
$array2[] = $array1;
show($array1, '$array1 with recursive dependency');

//objects with recursive dependency
$object1 = new stdClass();
$object2 = new stdClass();
$object1->member1 = $object2;
$object2->member2 = $object1;
show($object1, '$object1 with recursive dependency');

//arrays without recursive dependency
$array1 = array();
$array2 = array();
$array1[] = $array2;
$array2[] = $array1;
show($array1, '$array1 without recursive dependency');

//objects without recursive dependency
$object1 = new stdClass();
$object2 = new stdClass();
$object1->member1 = $object2;
$object1->member2 = $object2;
show($object1, '$object1 without recursive dependency');

-------------------- $array1 with recursive dependency --------------------

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => Array
 *RECURSION*
                )

        )

)

serialized -> a:1:{i:0;a:1:{i:0;a:1:{i:0;R:2;}}}
json_encoded -> 
Warning: json_encode(): recursion detected in /Users/aercolino/...
[[[null]]]
---

-------------------- $object1 with recursive dependency --------------------

stdClass Object
(
    [member1] => stdClass Object
        (
            [member2] => stdClass Object
 *RECURSION*
        )

)

serialized -> O:8:"stdClass":1:{s:7:"member1";O:8:"stdClass":1:{s:7:"member2";r:1;}}
json_encoded -> 
Warning: json_encode(): recursion detected in /Users/aercolino/...
{"member1":{"member2":{"member1":null}}}
---

-------------------- $array1 without recursive dependency --------------------

Array
(
    [0] => Array
        (
        )

)

serialized -> a:1:{i:0;a:0:{}}
json_encoded -> [[]]
---

-------------------- $object1 without recursive dependency --------------------

stdClass Object
(
    [member1] => stdClass Object
        (
        )

    [member2] => stdClass Object
        (
        )

)

serialized -> O:8:"stdClass":2:{s:7:"member1";O:8:"stdClass":0:{}s:7:"member2";r:2;}
json_encoded -> {"member1":{},"member2":{}}
---

Comparing the output in the cases with a recursive dependency, we see that PHP by itself detects recursion (we trust PHP, so we knew it MUST avoid recursion):

  • print_r() shows a *RECURSION* label
  • json_encode() shows a Warning
  • serialize() shows a metadata element labeled R: for arrays and r: for objects

Comparing the output in the cases without a recursive dependency, we also see that PHP doesn’t detect recursion. Hmm, almost.

  • print_r() doesn’t show any *RECURSION* label
  • json_encode() doesn’t show any Warning
  • serialize() doesn’t show any metadata element labeled R: for arrays… BUT does show r: for objects

In fact, the last case is a simpler version of the example given for reproducing the bug of the Zend_Json_Encoder class: a composite value with the same object twice.

It seems that the serialize function of PHP is affected by the same bug. Or should we assume that the R/r is for repetition instead of recursion? Anyway, the serialize function cannot be trusted.

Solution

We’re going to exploit the fact that print_r() emits a *RECURSION* label. If we tried to match the label in the string returned by print_r(), and no one existed, then it would mean that there is no recursion. But if we got a match, that one could be a false positive if user data contained the *RECURSION* substring.

So we need a means for getting rid of any of those user data substrings that could pollute our matches. Well, luckily serialize doesn’t use a *RECURSION* label in its metadata, so any that could occur would be user data.

function hasRecursiveDependency($value)
{
    //if PHP detects recursion in a $value, then a printed $value 
    //will contain at least one match for the pattern /*RECURSION*/
    $printed = print_r($value, true);
    $recursionMetaUser = preg_match_all('@*RECURSION*@', $printed, $matches);
    if ($recursionMetaUser == 0)
    {
        return false;
    }
    //if PHP detects recursion in a $value, then a serialized $value 
    //will contain matches for the pattern /*RECURSION*/ never because
    //of metadata of the serialized $value, but only because of user data
    $serialized = serialize($value);
    $recursionUser = preg_match_all('@*RECURSION*@', $serialized, $matches);
    //all the matches that are user data instead of metadata of the 
    //printed $value must be ignored
    $result = $recursionMetaUser > $recursionUser;
    return $result;
}

This solution has many advantages:

  1. is trustworthy (PHP does the job)
  2. works for objects as well as arrays
  3. is short and simple to understand
  4. can be applied before walking a value

And very few disadvantages:

  1. doesn’t allow to spot where the recursion occurs
  2. could be slow for complex structures
Posted in Criticism, Zend Framework
One comment on “Detecting recursive dependencies in PHP composite values
  1. Xunnamius says:

    Very nice solution. I was thinking of something along the same lines, but I really didn’t want to stop what I was doing to write the code for it.

    Truly ingenious, thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>