Fix PHP re2c scanner state machine translation bugs #3

Merged
niden merged 8 commits into phalcon:master from niden-code:master
Apr 8, 2026



@niden niden commented Apr 8, 2026

Summary:

  • Fixed infinite loop in Scanner::scanForToken() — bounds check was only at function entry; added it inside the outer while(IMPOSSIBLE) loop so cursor overruns during a single call are also caught
  • Fixed whitespace recognition — case '\t':, '\n':, '\r': were single-quoted (literal backslash+letter in PHP); changed to double-quoted so tab/newline/CR characters are actually matched
  • Fixed 6 broken state transitions (states 45, 59, 67, 73, 135, 138): each of these states read $yych and then used a bare break 2, which in PHP exits both the enclosing switch and the while(true), discarding the value just read and resetting the state machine to state 0. Fixed by removing the explicit $yystate assignment and break 2 so that execution falls through to the successor case
  • Fixed bracket-quoted identifiers [...] — state 138 had the same bare break 2 bug, preventing state 139 from ever being reached; fixed with fall-through; corrected substr offset to capture the full [identifier] value including brackets
  • Fixed stale token values — the Token object is reused across scanForToken() calls; value, opcode, and len are now reset to null/0 at the start of each call
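
The whitespace fix in the list above can be reproduced in isolation. The following is a minimal sketch, not the scanner's actual code: the two helper functions are hypothetical and exist only to contrast single-quoted and double-quoted case labels.

```php
<?php
// Hypothetical helpers for illustration; they are not part of the PR's Scanner.

// Single-quoted labels: PHP does not process \t, \n, \r inside single quotes,
// so each label is a two-character string (backslash + letter) and can never
// equal a one-character input.
function isWhitespaceSingleQuoted(string $ch): bool
{
    switch ($ch) {
        case ' ':
        case '\t':
        case '\n':
        case '\r':
            return true;
        default:
            return false;
    }
}

// Double-quoted labels: the escapes are interpreted, so the labels are the
// real tab, newline, and carriage-return characters.
function isWhitespaceDoubleQuoted(string $ch): bool
{
    switch ($ch) {
        case ' ':
        case "\t":
        case "\n":
        case "\r":
            return true;
        default:
            return false;
    }
}

var_dump(strlen('\t'));                    // int(2)
var_dump(strlen("\t"));                    // int(1)
var_dump(isWhitespaceSingleQuoted("\t"));  // bool(false)
var_dump(isWhitespaceDoubleQuoted("\t"));  // bool(true)
```

Only the space character is matched by the single-quoted version, which is consistent with tabs and newlines going unrecognized before the fix.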

Root cause: In switch($yystate), a bare break 2 exits both the switch and the enclosing while(true). By contrast, break 2 from inside a nested switch($yych) exits only the two switch levels, so the while continues. States that performed a read-then-transition without a nested switch all shared this bug, silently resetting the state machine to state 0 and losing accumulated scan progress.
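
The difference in break 2 depth can be seen with a toy state machine. This is a sketch under stated assumptions: the state numbers, the trace array, and the function names are all hypothetical and much simpler than the generated scanner, but the nesting of while, switch($yystate), and switch($yych) mirrors the structure described above.

```php
<?php
// Toy state machine for illustration only; states 1-3 are hypothetical.

// Buggy shape: read, set the next state, then a bare break 2 directly inside
// switch($yystate). The break leaves the switch AND the while, so state 2
// never runs.
function runBuggy(string $input): array
{
    $trace = [];
    $cursor = 0;
    $yystate = 1;
    while (true) {
        switch ($yystate) {
            case 1:
                $yych = $input[$cursor++];
                $trace[] = "state 1 read {$yych}";
                $yystate = 2;
                break 2;
            case 2:
                $trace[] = 'state 2 reached';
                break 2;
        }
    }
    return $trace;
}

// Fixed shape: no assignment and no break, so execution falls through to the
// successor case.
function runFixed(string $input): array
{
    $trace = [];
    $cursor = 0;
    $yystate = 1;
    while (true) {
        switch ($yystate) {
            case 1:
                $yych = $input[$cursor++];
                $trace[] = "state 1 read {$yych}";
                // intentional fall-through
            case 2:
                $trace[] = 'state 2 reached';
                break 2;
        }
    }
    return $trace;
}

// Nested shape: break 2 inside switch($yych) leaves only the two switches,
// so the while loops again and dispatches on the new state.
function runNested(string $input): array
{
    $trace = [];
    $cursor = 0;
    $yystate = 1;
    while (true) {
        switch ($yystate) {
            case 1:
                $yych = $input[$cursor++];
                switch ($yych) {
                    case 'a':
                        $yystate = 2;
                        break 2;
                    default:
                        $yystate = 3;
                        break 2;
                }
            case 2:
                $trace[] = 'state 2 reached';
                break 2;
            case 3:
                $trace[] = 'state 3 reached';
                break 2;
        }
    }
    return $trace;
}

echo json_encode(runBuggy('a')), "\n";  // ["state 1 read a"]
echo json_encode(runFixed('a')), "\n";  // ["state 1 read a","state 2 reached"]
echo json_encode(runNested('a')), "\n"; // ["state 2 reached"]
```

In this sketch the buggy variant simply returns early; in the scanner described above, leaving the while(true) hands control back to the outer loop, which is where the reset to state 0 would occur.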

Test plan:

  • All 6 existing PHPUnit tests pass (vendor/bin/phpunit -c phpunit.xml)
  • php test_parser.php produces correct token trace and parser result for SELECT r.* FROM Robots r LIMIT 10
  • Bracket-quoted FQCN [Phalcon\Tests\Models\Invoices] parsed correctly
  • Bracket-quoted alias [First Name] parsed correctly

@niden niden requested a review from Jeckerson April 8, 2026 20:10
@niden niden self-assigned this Apr 8, 2026
@niden niden added the bug Something isn't working label Apr 8, 2026
@niden niden added this to Phalcon v6 Apr 8, 2026
@github-project-automation github-project-automation bot moved this to In progress in Phalcon v6 Apr 8, 2026
@niden niden merged commit b0ebec4 into phalcon:master Apr 8, 2026
0 of 6 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Implemented in Phalcon v6 Apr 8, 2026
