Identifier case normalization causes failures (aliases + properties) #105

@prrao87

Summary

The Cypher parser/planner lowercases unquoted identifiers, but the underlying schema and projection layer remain case-sensitive. As a result, any camelCase identifier in a query (alias or property name) is implicitly lowercased during planning, and the subsequent lookup fails because the schema still has the original casing.

See the graph-benchmark repo for a real example.

  • Query writes numFollowers → planner looks for numfollowers → schema has numFollowers → error.
  • Query writes p.isMarried → planner looks for ismarried → schema has isMarried → error.
  • So the bug is an internal mismatch: identifier normalization is happening in some parts of the pipeline but not consistently applied to schema lookup / binding.
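
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not lance-graph's actual code: `normalize_identifier` and `lookup_field` are hypothetical stand-ins for the planner's normalization step and a case-sensitive schema lookup.

```python
# Hypothetical stand-ins; these are NOT lance-graph functions.

def normalize_identifier(name: str) -> str:
    """Mimic a planner that lowercases every unquoted identifier."""
    return name.lower()


def lookup_field(schema: dict[str, str], name: str) -> str:
    """Case-sensitive lookup, as a schema/projection layer might do it."""
    if name not in schema:
        raise LookupError(f"No field named {name}")
    return schema[name]


schema = {"id": "int64", "isMarried": "bool"}

# The query author wrote `p.isMarried`; after normalization the planner
# asks the schema for `ismarried`, which does not exist under that casing.
planned_name = normalize_identifier("isMarried")

try:
    lookup_field(schema, planned_name)
    outcome = "found"
except LookupError as exc:
    outcome = str(exc)

print(outcome)
```

Normalizing on only one side of the lookup is enough to trigger the failure; applying the same rule to both the query and the schema, or to neither, would avoid it.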

Expected (Cypher‑like):

  • Unquoted identifiers should be case‑insensitive OR
  • Quoted identifiers should be supported to preserve case OR
  • Ideally, a configuration option should allow normalization to be disabled.
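
To illustrate the quoted-identifier option: Cypher quotes identifiers with backticks, so a normalizer could lowercase only unquoted tokens and pass quoted ones through unchanged. This is a sketch under that assumption; `normalize_identifier` is a hypothetical helper, and whether lance-graph's parser currently accepts backtick-quoted identifiers is not stated in this issue.

```python
def normalize_identifier(token: str) -> str:
    """Lowercase unquoted identifiers; keep backtick-quoted ones verbatim.

    Hypothetical helper for illustration, not part of lance-graph.
    """
    if len(token) >= 2 and token.startswith("`") and token.endswith("`"):
        # Quoted: strip the surrounding backticks and preserve the casing.
        return token[1:-1]
    # Unquoted: fold to lowercase.
    return token.lower()


print(normalize_identifier("numFollowers"))
print(normalize_identifier("`numFollowers`"))
```

With a rule like this, a user who needs exact casing can opt in per identifier by quoting it.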

Example

Alias case

import pyarrow as pa
from lance_graph import GraphConfig, CypherQuery

people = pa.table({"id": [1, 2]})
follows = pa.table({"src": [1], "dst": [2]})
cfg = (
    GraphConfig.builder()
    .with_node_label("Person", "id")
    .with_relationship("FOLLOWS", "src", "dst")
    .build()
)
datasets = {"Person": people, "FOLLOWS": follows}

query = """
MATCH (a:Person)-[:FOLLOWS]->(b:Person)
RETURN count(a.id) AS numFollowers
ORDER BY numFollowers DESC
"""
res = CypherQuery(query).with_config(cfg).execute(datasets)
print(res.to_pylist())

Actual

Schema error: No field named numfollowers. ... Did you mean 'numFollowers'?

Property case

import pyarrow as pa
from lance_graph import GraphConfig, CypherQuery

people = pa.table({"id": [1], "isMarried": [True]})
cfg = GraphConfig.builder().with_node_label("Person", "id").build()
datasets = {"Person": people}

res = CypherQuery("MATCH (p:Person) RETURN p.isMarried").with_config(cfg).execute(datasets)
print(res.to_pylist())

Expected

Case-insensitive lookup for unquoted identifiers, or support for quoted identifiers / config to disable normalization.

Actual

Schema error: No field named ismarried. Did you mean 'person.isMarried'?

Proposal

Rather than requiring users to manually lowercase all column names and aliases to make queries work, would it make sense to treat column names as case-insensitive, as Postgres does for unquoted identifiers?
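
One way case-insensitive resolution could work is sketched below: prefer an exact match, fall back to a case-folded match, and refuse to guess when several fields differ only by case. `resolve_field` is a hypothetical helper written for this issue, not an existing lance-graph API.

```python
def resolve_field(field_names: list[str], requested: str) -> str:
    """Return the schema's own spelling of `requested`, ignoring case.

    Hypothetical helper for illustration, not part of lance-graph.
    """
    # An exact match always wins, so existing queries keep working.
    if requested in field_names:
        return requested

    # Otherwise compare case-folded names.
    folded = requested.casefold()
    matches = [name for name in field_names if name.casefold() == folded]

    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"No field named {requested}")
    # Several fields differ only by case: do not pick one silently.
    raise LookupError(f"Ambiguous field {requested}: matches {matches}")


print(resolve_field(["id", "isMarried"], "ismarried"))
```

Returning the schema's original spelling means the rest of the pipeline can stay case-sensitive; only the binding step needs to change.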

Environment

The following environment was used to test this:

lance-graph 0.4.0
Python 3.13
macOS Tahoe 26.2
