This is my second attempt at fixing preg_match bug number 50887. My first attempt, less than two months ago, turned out to be buggy too, so I unpublished it.

There were many scenarios that I hadn't taken into account, and almost all of them were related to How to count expected matches of a PHP regular expression. The rest came down to a single mistake: I thought that the preg_match bug affected only the last optional match. I later found that it affects all the matches at the end, any number of them, whether they are optional or not. Because of this misunderstanding I had to rewrite a large portion of the function.
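A minimal illustration of that behaviour may help. The pattern below is made up for this example and is not one of the patterns used in the tests further down; it only shows that every trailing unmatched group is dropped, while an unmatched group followed by a matched one is kept as an empty string.

```php
<?php
// Three capturing groups; on the subject 'a' only the first one participates.
$pattern = '/(a)(b)?(c)?/';

preg_match($pattern, 'a', $matches);

// Both trailing groups are missing, not just the last one.
var_dump(count($matches)); // int(2): only $matches[0] and $matches[1]

// An unmatched group that is NOT at the end is reported as an empty string.
preg_match($pattern, 'ac', $matches);
var_dump($matches[2]);     // string(0) ""
var_dump(count($matches)); // int(4)
```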

/**
 * Searches $subject for a match to the regular expression given in $pattern.
 * 
 * @see http://bugs.php.net/bug.php?id=50887 This is a fix for that bug
 * 
 * @param string  $pattern 
 *        The pattern to search for, as a string.
 * @param string  $subject 
 *        The input string.
 * @param array   $matches 
 *        If matches is provided, then it is filled with the results of search. 
 *        $matches[0] will contain the text that matched the full pattern, 
 *        $matches[1] will have the text that matched the first captured parenthesized 
 *        subpattern, and so on.
 * @param integer $flags   
 *        flags can be the following flag: PREG_OFFSET_CAPTURE
 *        If this flag is passed, for every occurring match the appendant string offset 
 *        will also be returned. Note that this changes the value of matches into an 
 *        array where every element is an array consisting of the matched string at 
 *        offset 0 and its string offset into subject at offset 1.
 * @param integer $offset
 *        Normally, the search starts from the beginning of the subject string. The 
 *        optional parameter offset can be used to specify the alternate place from 
 *        which to start the search (in bytes).
 * 
 * @return integer
 *         preg_match() returns the number of times pattern matches. That will be either 
 *         0 times (no match) or 1 time because preg_match() will stop searching after 
 *         the first match. preg_match_all() on the contrary will continue until it 
 *         reaches the end of subject. preg_match() returns FALSE if an error occurred.
 */
function ando_preg_match( $pattern, $subject, array &$matches = NULL, $flags = 0, $offset = 0 ) 
{ 
    $result = preg_match($pattern, $subject, $matches, $flags, $offset); 
    if (! $result) 
    {
        return $result; 
    }
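    // Placeholder for an unmatched group, in the same shape preg_match() uses.
    // The flag is tested bitwise so that it is still detected when combined with other flags.
    $missing = ($flags & PREG_OFFSET_CAPTURE) ? array('', -1) : '';
    $groups_count = ando_preg_count_groups($pattern, $named_groups, $numbered_groups);
    $matches_count = count($matches);
    $missing_matches = array();
    // Walk the groups from the end of the pattern backwards until the
    // expected number of entries is reached.
    while ($groups_count != $matches_count - 1 + count($missing_matches))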
    { 
        $last_group_offset['named'] = -1;
        $named_groups_count = count($named_groups);
        if ($named_groups_count) 
        { 
            $last_named_group = $named_groups[$named_groups_count - 1];
            $last_group_offset['named'] = $last_named_group[0][1];
            $last_group_name            = $last_named_group[1][0];
        } 
        $last_group_offset['numbered'] = -1; 
        $numbered_groups_count = count($numbered_groups);
        if ($numbered_groups_count) 
        { 
            $last_numbered_group = $numbered_groups[$numbered_groups_count - 1];
            $last_group_offset['numbered'] = $last_numbered_group[0][1]; 
        } 
        $last_group_type = array_search(max($last_group_offset), $last_group_offset);
        $missing_entries = array();
        if ('named' == $last_group_type) 
        { 
            if (! isset($matches[$last_group_name]))  //takes care of dupnames
            {
                $missing_entries[$last_group_name] = $missing;
            }
            $missing_entries[] = $missing;
            array_pop($named_groups);
        }
        else //numbered
        {
            $missing_entries[] = $missing;
            array_pop($numbered_groups);
        }
        $missing_matches = array_merge($missing_entries, $missing_matches);
    }
    $matches = array_merge($matches, $missing_matches);
    return $result; 
}

For the function ando_preg_count_groups(…) see the post How to count expected matches of a PHP regular expression.
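For patterns that contain only numbered groups, the padding idea reduces to a few lines. The sketch below is a simplification and not the function above: pad_trailing_groups() is a hypothetical helper, and the expected group count is passed in by hand instead of being computed by ando_preg_count_groups(). It does not handle named groups or duplicate names.

```php
<?php
/**
 * Hypothetical helper: appends placeholder entries for the trailing groups
 * that preg_match() left out. Numbered groups only.
 */
function pad_trailing_groups(array $matches, $groups_count, $offset_capture = false)
{
    // Same placeholder preg_match() uses for unmatched groups in the middle.
    $missing = $offset_capture ? array('', -1) : '';
    // $matches[0] is the whole match, so groups occupy indexes 1..$groups_count.
    for ($i = count($matches); $i <= $groups_count; $i++) {
        $matches[$i] = $missing;
    }
    return $matches;
}

// Made-up pattern with three groups; only the first one matches on 'a'.
preg_match('/(a)(b)?(c)?/', 'a', $matches);
$padded = pad_trailing_groups($matches, 3);
print_r($padded);
```

The group count has to come from the pattern itself, which is exactly the hard part that ando_preg_count_groups() takes care of.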

Minimal Tests

The following tests are based on $regex = '(?J:(?<DN>Mon|Fri|Sun)(?:day)?|(?<DN>Tue)(?:sday)?|(?<DN>Wed)(?:nesday)?|(?<DN>Thu)(?:rsday)?|(?<DN>Sat)(?:urday)?)';.

Test 1a

echo "\n\nThe next comparison shows identical results, as it should, because
the count of expected matches equals the count of returned matches.\n";

preg_match("@$regex@", 'This matches on Saturday', $matches);
print_r($matches);

ando_preg_match("@$regex@", 'This matches on Saturday', $matches);
print_r($matches);

The next comparison shows identical results, as it should, because
the count of expected matches equals the count of returned matches.
Array
(
    [0] => Saturday
    [DN] => Sat
    [1] => 
    [2] => 
    [3] => 
    [4] => 
    [5] => Sat
)
Array
(
    [0] => Saturday
    [DN] => Sat
    [1] => 
    [2] => 
    [3] => 
    [4] => 
    [5] => Sat
)

Test 1b

echo "\n\nIn the next comparison we see above that the bug applies to all the matches at the end,
and below that my fix correctly returns all expected matches, including the empty ones.\n";

preg_match("@$regex@", 'This matches on Tuesday', $matches);
print_r($matches);

ando_preg_match("@$regex@", 'This matches on Tuesday', $matches);
print_r($matches);

In the next comparison we see above that the bug applies to all the matches at the end,
and below that my fix correctly returns all expected matches, including the empty ones.
Array
(
    [0] => Tuesday
    [DN] => Tue
    [1] => 
    [2] => Tue
)
Array
(
    [0] => Tuesday
    [DN] => Tue
    [1] => 
    [2] => Tue
    [3] => 
    [4] => 
    [5] => 
)

Test 1c

echo "\n\nIn the next comparison we see above the bug at its extreme,
and below that my fix works as well as in the other cases.\n";

preg_match("@$regex?@", 'This doesn\'t match on any day but still matches.', $matches);
print_r($matches);

ando_preg_match("@$regex?@", 'This doesn\'t match on any day but still matches.', $matches);
print_r($matches);

In the next comparison we see above the bug at its extreme,
and below that my fix works as well as in the other cases.
Array
(
    [0] => 
)
Array
(
    [0] => 
    [DN] => 
    [1] => 
    [2] => 
    [3] => 
    [4] => 
    [5] => 
)

The following tests are based on $regex = '(?|Saturday|Sun(day)?)';.

Test 2a

preg_match("@$regex@", 'Sun', $matches);
print_r($matches);

ando_preg_match("@$regex@", 'Sun', $matches);
print_r($matches);

Array
(
    [0] => Sun
)
Array
(
    [0] => Sun
    [1] => 
)

Test 2b

preg_match("@$regex@", 'Sunday', $matches);
print_r($matches);

ando_preg_match("@$regex@", 'Sunday', $matches);
print_r($matches);

Array
(
    [0] => Sunday
    [1] => day
)
Array
(
    [0] => Sunday
    [1] => day
)

Test 2c

preg_match("@$regex@", 'Saturday', $matches);
print_r($matches);

ando_preg_match("@$regex@", 'Saturday', $matches);
print_r($matches);

Array
(
    [0] => Saturday
)
Array
(
    [0] => Saturday
    [1] => 
)

The following tests are based on $regex = '(?|(Sat)ur(day)|Sun(day)?)';.

Test 3a

preg_match("@$regex@", 'Sun', $matches);
print_r($matches);

ando_preg_match("@$regex@", 'Sun', $matches);
print_r($matches);

Array
(
    [0] => Sun
)
Array
(
    [0] => Sun
    [1] => 
    [2] => 
)

Test 3b

preg_match("@$regex@", 'Sunday', $matches);
print_r($matches);

ando_preg_match("@$regex@", 'Sunday', $matches);
print_r($matches);

Array
(
    [0] => Sunday
    [1] => day
)
Array
(
    [0] => Sunday
    [1] => day
    [2] => 
)

Test 3c

preg_match("@$regex@", 'Saturday', $matches);
print_r($matches);

ando_preg_match("@$regex@", 'Saturday', $matches);
print_r($matches);

Array
(
    [0] => Saturday
    [1] => Sat
    [2] => day
)
Array
(
    [0] => Saturday
    [1] => Sat
    [2] => day
)