Skip to content

feat/refactor: 'expr.transform()'; fixes & refactors to 'replace()' and rule-application#301

Open
samueltlg wants to merge 15 commits intocortex-js:mainfrom
samueltlg:expr-transform
Open

feat/refactor: 'expr.transform()'; fixes & refactors to 'replace()' and rule-application#301
samueltlg wants to merge 15 commits intocortex-js:mainfrom
samueltlg:expr-transform

Conversation

@samueltlg
Copy link
Copy Markdown
Contributor

@samueltlg samueltlg commented Apr 11, 2026

(Code commentary incoming)

Sorry that this was delayed again... Upon thinking it was ready to go, encountered a new set of problems/niggles, requiring a fair amount of investigation.
The change here (fixes) are the sum of quite a lot of tedious debugging!

Most of these are fixes - and an update to 'replace()' - but you may wish to discard the head/feature change here (transform()). Personally - think it is very useful (not something have added to the personal/bug-fix fork up to this point, but is something will certainly use...)

(For reference, the failing 'compilation-performance' tests, were notably also failing before any of the changes made here))


Update:

Queries & points of contention (after scanning through changes):

  • 'transform()'
    • The naming of this method is at your discretion. I think 'transform' is on-point enough (designation here borrowed from other libraries).
    • TransformOptions option 'targets'
      • Having this as an alternative to 'match' may not be to your taste; but presently this offers a solution and matching route not offered by traditional pattern-matching (- referential-identity; predicate-based). An alternative altogether, or a merge with 'match', may be preferred.
      • !There still exists the problem of this not being able to match against 'cached' (common) engine-exprs (One,Zero,Infinity...): although, for the majority of use-cases, it is hard to imagine the requirement for applying transformations to these expressions anyway (perhaps some cases of 'replace').

Identified further bugs, for reference:

  • In a 'replace()' context...:

    • During expression matching, the non-canonical expr. is used... Sometimes, however, consequently there can be an unfortunate non-match where a match would otherwise take place. Example:
      • ce.expr(['Exp', 3]).match(['Exp', '_')]) will not match, due to the non-canonical pattern matching against the canonical variant of ['Exp', 3], which notably canonicalizes to a 'Power' FN expr. (['Power', 'ExponentialE', 3]).
      • BoxedString has a missing replace() method (such that replacing always to return 'null').
  • As highlighted in fix (workaround): ensure intended canonical - non-canonical - match pattern...:

    • Requesting the canonical variant of wildcard-containing expressions causes entries (BoxedValueDefinition) to be added to global scope for the captured wildcard symbol (e.g. _ or _c).
      • If this point were addressed, this workaround should be able to be retracted.
  • A couple of extant issues present with the 'structural' Expression form:

    • Requesting the structural form of canonical vs. non-canonical exprs. presently can produce differing output.
      • For the case of a Rational number (representation), for inputs as produced by either ce.parse('2/5', {form: ...}) or ce.expr(['Divide', 2, 5], {form: ...}):
        • In case of boxing/parsing with form 'structural' or 'raw' (and then 'expr.structural'), the representation is 'Divide' (same in both cases)
        • In case of first obtaining the canonical variant, and then requesting the structural, the result is instead a BoxedNumber (type = rational, with an ExactNumericValue value).
      • For the case of an expression with an associative/commutative function head:
        • If obtaining a structural form through requesting this from a canonical expression, then the result will be both sorted (the operands) and flattened.
        • If in contrast boxing/parsing directly as 'structural' - or requesting the 'structural' formfrom of a 'raw' expr. - then expression/operands are neither sorted, or expression flattened.
      • ^Personally here have identified the source of this to be the fact that the 'canonical' variant has access to the OperatorDefinition: whilst in other cases not.

Some possible future directions:

  • expr.subs() - and possibly one or two others (methods) - could also admit a 'form' specification over 'canonical'.
  • If sticking with keeping expr.transform():
    • As per above, TransformOptions 'targets' could be merged with 'match' - or some category of alternative - according to preference.
    • A verbose switch for this method: with secondary/meta data returned such as stats on replacement quantity, replacement-expression instances (or a map of replaced->replacements), transformation-specific data (e.g. 'steps' for simplify). Perhaps overall expression validity
    • An async variant commensurate to 'evaluateAsync()', and others.

…attern variant comparison (rule application)

(this change is marked as a *workaround* rather than a 'fix' per se, since the issue it tackles instead likely has its root in the current behaviour regarding the binding of 'wildcard' symbol variants (-outside the context of pattern matching / replacement.
(See the closing of this message for a further remark on this point.))

Explanation:
- Currently, because canonicalization of expresssions involves wildcard  binding as part of symbol binding (this may be undesirable / a bug), it is presently necessary that a new scope be created upon 'match' pattern canonicalization (performed for optimization purposes) in rule application:  with this otherwise running the risk of present wildcard symbols being bound to definitions in the current scope...

For example, before this change:
```
ce.expr(['Add', 'x', 3]).replace({ match: ['_', '__'], replace: 'y'});

// →Expectedly captures wildcards, and replaces
// top-level expression.
// *However*, the canonical-request of the 'match' expression within 'applyRule'
// results in symbol/wildcard '_' being attributed a function-definition
// (type='function'): within the scope in which this 'expr.replace()' was called

// Consequently, a subsequent replace call involving a universal wildcard
ce.expr(['Add', 'x', 3]).replace({ match: ['Add', 'x', '_')], replace: 5 });

// → Returns 'null'... (assuming this call made with an unmodified/same scope as the previous)
// Ultimately because the same point-of-canonicalization results in the Universal
// Wildcard in this instance uptaking the previous, 'function' definition, resulting
// in a MathJson-internal type-error for the canonicalized match-pattern variant
// in this instance, resulting in an absent wildcard in the canonical variant,
// and leading to an early exit from rule-application 'applyRule()'

//For illustration, the canonical-variant of the initial, non-canonically boxed
// `['Add', 'x', 3]` pattern results in the canonical-variant as:

[ "Add", "x", [ "Error", [ "ErrorCode", "'incompatible-type'", "number", "function", ], ], ]
```

*Note*:
- This 'fix' may no longer be necessary, if canonicalization of wildcard-containing expressions were to no longer perform name-binding on these (this may be unintentional?)
…ition'

Before: passing around a rule with '{ match: .., condition: 'non-delimited LatexString }' would fail to yield/parse a condition.

Now: cases such as the previous result in a parsed condition (yet, presence of display/inline mode delimiters *should* still be admissible)
…fyNonCommutativeFunction

(This inadvertently resulted in *full*-rule simplification of all operands for all functions
Correcting this has resulted in *no broken tests*)
…umeric operands of 'lazy' operator definitions

Explanation
------------
'evaluateNumericSubexpressions()' (function, which is notably called
within the 'simplify' process only for operator-definitions with flag
'lazy' set to 'true', and operator 'Divide', presently fully evaluate
all (more-or-less) numeric operands during simplification.
At least a couple of problems exist with this, mentionably that
'simplify' is, use-case depending, likely not called with intention to
*evaluate all numeric* operands/expressions. However, related to this,
and really a more primary issue here, is that proceeding with this
unconditionally is troublesome in terms of simplifying in the context of
any given rules: particularly when the ruleset is a custom one, or an
incomplete subset of the default ruleset.
For example, consider the simplification of an 'Add' expression with
evaluable 'Ln' or 'Log' arguments... If the accompanying simplify
rule-set does not with it include rules covering the simplification
(perhaps, 'evaluation') of this set or category of functions, then it is
dubious that these should be evaluated prior to be being summed...
(instead - given a ruleset consisting of a single rule which handles
'Add', an input such as 'Ln(e) + Ln(e)' should understandably reduce
this to '2 * Ln(e)' (instead of simply '2'); the case therefore of a
fuller simplification being dependent on presence of separate rules
handling Ln/Log.
(Another illustrative example, is that at time of commit, input '3!'
will remain unsimplified (unevaluated)... illustrating the need for a
requiring/wrapping operator (with a 'lazy' flag) for its evaluation)

\*Stemming from this consideration, it may be worth considering a
single, or set of 'evaluation'-rules: decoupled from respective
simplification rules (plain/eager evaluation of trig., vs.
identities/construction, for example)
… considering replacement expr. sameness, also consider expression 'form'

- A switch to 'form' allows specification of 'structural', and also an explicit specification of 'raw'.
- As per inline documentation, the same/previous behaviour can be replicated, by first ensuring the entire input bears the same form as any speified replacement form.

- When considering 'successful' rule application, any differences in expression form before-> after is now considered, in spite of these anyway bearing structural similiarity.

- The same change would likely benefit from being made on `expr.subs()`.
In disuse: but set for employment particularly in test-files
'transform()' is a wrapper around 'replace()' - always acting recursively - and offers an ergonomic means to match sub-expression targets and apply common fundamental operations/transformations in an expression 'tree', without the need for custom 'Rule' logic / workarounds.

Change includes tests.

- The typing of 'TransformOptions' has necessitated that 'SimplifyOptions' be shifted from its previous home of 'types-definitions(.ts)', to 'types-kernel-evaluation.ts'. It has also been made generic over Expr/SemiExpr/CE, in a similar way as for its sibling types (such as 'RuleReplaceFunction').

if (canonical && match) {
const awc = getWildcards(match);
pushSafeScope(ce);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See commit message - this change indeed may not be necessary, if the described present behaviour of the wildcard-symbol canonicalization (global-scope) is not as it should be).

* **Default** is: `'left-right'` (standard post-order)
*
*/
direction: 'left-right' | 'right-left';
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At your discretion! (this is anyway achievable in a workaround-way with custom rule-functions and what-not).

(I did also intend to update the 'once' rule such that it would be typed as something like 'one-rule' | 'one-replacement' | false: where 'one-rule' would assume the old behaviour of 'true', but 'one-replacement' would command an even stricter policy (currently 'true' applies at the rule level: i.e. a rule may apply multiple times; 'one-replacement' would further restrict this to one replacement/within one rule). This would also somewhat add a 'raison d'être' for this property ('direction'), too.
However, again, both of these can, with a bit of trouble, be achieved through RuleFunctions.)

// Normalize the condition to a function
let condFn: undefined | RuleConditionFunction;
if (typeof condition === 'string') {
const latex = asLatexString(condition);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking back on this did not appear to break anything. Concludes that it was one of either an outdated or mistaken check... the 'asLatexString' call appears to presently require that any string [input] be delimited by an '$' or '$$'

// canonical one. This allows subsequent .canonical calls to perform full
// canonicalization.
if (isFunction(expr) && expr.isCanonical) {
if (isFunction(expr) && forms.length > 0) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was not a test to this... but changing this appeared to correct some behaviours whilst outlining the tests for transform. Throughout the partial-canonicalization process, appears that sometimes the expression is 'intermediarily' not marked as canonical.

@@ -407,7 +407,7 @@ function simplifyNonCommutativeFunction(
let last = result.at(-1)!.value;
if (last.isSame(expr)) return steps;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small oversight... but was inadvertently applying the entire default rule-set to operands! (indeed; no test failures result)

type EvaluateOptions = KernelEvaluateOptions;
type Rule = KernelRule<Expression, ExpressionInput, ComputeEngine>;
type BoxedRule = KernelBoxedRule<Expression, ComputeEngine>;
type BoxedRuleSet = KernelBoxedRuleSet<Expression, ComputeEngine>;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(These re-constructions only previously used by SimplifyOptions (now re-located))


/**
*
* Process and transform this expression *recursively* by applying one of a set of predefined
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This doc.) an outline of justification for this method

* @category Boxed Expression
*/

export type SimplifyOptions<
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly not the perfect place for this (i.e. 'evaluate') - but seemed most sensible given the options

/** Available transformation types for `Expression.transform()`
*
* @category Boxed Expression */
export type Transformation =
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Others could be possible)

* ::Note
* The 'extended' matching routes available here are unique to *transform()* and facilitate
* convenient and more expressive matching in the context of recursive traversal.*/
targets?: Expr | Expr[] | MatchConditionFunction<Expr>;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible point of contention: distinct matching route from 'match'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant