Add Oracle SQL/PGQ support to Awesome-Text2GQL by ayoubmoussaid · Pull Request #68 · ldbc/Text2GraphQuery-DataGen

ayoubmoussaid · 2026-06-22T16:24:03Z

Summary

This PR adds Oracle SQL/PGQ support to Awesome-Text2GQL and introduces tooling to translate, validate, compare, and export Oracle SQL/PGQ datasets from Text2GQL-Bench.

The main goals are:

Convert framework/TuGraph-style schemas into Oracle SQL/PGQ artifacts.
Translate supported Cypher/GQL-like benchmark queries into Oracle SQL/PGQ GRAPH_TABLE queries.
Validate translated queries against Oracle Database.
Compare Oracle SQL/PGQ results with Neo4j Cypher results for the same benchmark records.
Export only validated Oracle SQL/PGQ records where Oracle and Neo4j results match.
Document the Oracle SQL/PGQ generation workflow and dataset preparation workflow.
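
As a rough illustration of the translation goal, the sketch below renders one left-to-right, single-hop pattern as a `GRAPH_TABLE` query. It is not the translator in this PR, which works from Graph-IL and covers many more query shapes. `SingleHop`, `single_hop_to_graph_table`, and the graph name `social_graph` are hypothetical names used only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SingleHop:
    """Hypothetical, simplified stand-in for one parsed Cypher pattern."""

    src_var: str
    src_label: str
    edge_var: str
    edge_label: str
    dst_var: str
    dst_label: str
    where: str  # predicate text assumed to be valid in both dialects
    columns: tuple[tuple[str, str], ...]  # (expression, alias) pairs


def single_hop_to_graph_table(graph: str, hop: SingleHop) -> str:
    """Render a left-to-right single-hop pattern as a GRAPH_TABLE query."""
    pattern = (
        f"({hop.src_var} IS {hop.src_label})"
        f"-[{hop.edge_var} IS {hop.edge_label}]->"
        f"({hop.dst_var} IS {hop.dst_label})"
    )
    cols = ", ".join(f"{expr} AS {alias}" for expr, alias in hop.columns)
    return (
        f"SELECT * FROM GRAPH_TABLE ({graph}\n"
        f"  MATCH {pattern}\n"
        f"  WHERE {hop.where}\n"
        f"  COLUMNS ({cols})\n"
        f")"
    )


# Cypher input this models (illustrative only):
#   MATCH (p:person)-[k:knows]->(f:person)
#   WHERE p.name = 'Ann'
#   RETURN f.name AS friend_name
hop = SingleHop(
    "p", "person", "k", "knows", "f", "person",
    "p.name = 'Ann'",
    (("f.name", "friend_name"),),
)
sql = single_hop_to_graph_table("social_graph", hop)
print(sql)
```

The Cypher `RETURN` list maps onto the `COLUMNS` clause, and label tests use `IS` instead of `:`.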

A new dataset containing only the validated Oracle SQL/PGQ queries is available at Text2GQL-Dataset.

What Changed

Oracle SQL/PGQ implementation

Added Oracle SQL/PGQ support under app/impl/oracle_sqlpgq/, including:

Schema conversion from framework/TuGraph-style graph schemas to Oracle relational tables and property graph DDL.
Oracle SQL/PGQ query translation from Graph-IL.
Oracle SQL/PGQ AST visitor support.
Oracle DB client integration using python-oracledb.
Template-based Oracle SQL/PGQ corpus generation.
Query generalization and corpus combination utilities.
SQL/PGQ helper utilities.
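
To make the schema conversion step concrete, here is a minimal sketch that builds a `CREATE PROPERTY GRAPH` statement from plain dictionaries. The function name `property_graph_ddl`, the dictionary keys, and the `person`/`knows` tables are assumptions made for illustration. The schema parser in `app/impl/oracle_sqlpgq/` may structure its input and output differently.

```python
def _properties_clause(properties):
    """Oracle accepts either an explicit property list or NO PROPERTIES."""
    if properties:
        return f"PROPERTIES ({', '.join(properties)})"
    return "NO PROPERTIES"


def property_graph_ddl(graph, vertex_tables, edge_tables):
    """Build a CREATE PROPERTY GRAPH statement from dict descriptions."""
    vertex_lines = [
        f"    {v['table']} KEY ({v['key']}) "
        f"LABEL {v['label']} {_properties_clause(v['properties'])}"
        for v in vertex_tables
    ]
    edge_lines = [
        f"    {e['table']} KEY ({e['key']})\n"
        f"      SOURCE KEY ({e['source_key']}) "
        f"REFERENCES {e['source_table']} ({e['source_ref']})\n"
        f"      DESTINATION KEY ({e['dest_key']}) "
        f"REFERENCES {e['dest_table']} ({e['dest_ref']})\n"
        f"      LABEL {e['label']} {_properties_clause(e['properties'])}"
        for e in edge_tables
    ]
    lines = [
        f"CREATE PROPERTY GRAPH {graph}",
        "  VERTEX TABLES (",
        ",\n".join(vertex_lines),
        "  )",
        "  EDGE TABLES (",
        ",\n".join(edge_lines),
        "  )",
    ]
    return "\n".join(lines)


# Hypothetical one-label, one-edge schema.
ddl = property_graph_ddl(
    "social_graph",
    vertex_tables=[
        {"table": "person", "key": "id", "label": "person",
         "properties": ["id", "name"]},
    ],
    edge_tables=[
        {"table": "knows", "key": "id", "label": "knows",
         "source_key": "src_id", "source_table": "person", "source_ref": "id",
         "dest_key": "dst_id", "dest_table": "person", "dest_ref": "id",
         "properties": []},
    ],
)
print(ddl)
```

The edge table's `SOURCE KEY` and `DESTINATION KEY` clauses fix each relationship's direction in the property graph, which is what the translator later has to respect.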

Dataset preparation utilities

Added dataset_prep/ scripts for working with the published Text2GQL-Bench dataset:

Discover benchmark query files and graph import configs.
Translate Cypher/GQL-like records to Oracle SQL/PGQ.
Optionally validate translated SQL/PGQ against Oracle.
Analyze translation and runtime failures.
Compare Oracle SQL/PGQ results with Neo4j results.
Export validated Oracle SQL/PGQ dataset records.
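
The comparison step has to tolerate differences in how the two databases represent the same value. The sketch below shows one simple way to do that, assuming rows arrive as sequences of scalar values. `normalize_value`, `normalize_rows`, and `results_match` are hypothetical names, and rounding to a fixed number of digits is a simplification of whatever tolerance rule the comparison scripts actually apply.

```python
from collections import Counter
from decimal import Decimal


def normalize_value(value, digits=6):
    """Map numerically equal values from both drivers to one representation."""
    if isinstance(value, bool):
        return value  # bool is an int subclass; keep it distinct from 0/1
    if isinstance(value, (int, float, Decimal)):
        return round(float(value), digits)
    return value


def normalize_rows(rows, digits=6):
    """Normalize every cell and make each row hashable."""
    return [tuple(normalize_value(v, digits) for v in row) for row in rows]


def results_match(oracle_rows, neo4j_rows, ordered=False, digits=6):
    """Compare result sets as lists (ordered) or as multisets (unordered)."""
    left = normalize_rows(oracle_rows, digits)
    right = normalize_rows(neo4j_rows, digits)
    if ordered:
        return left == right
    return Counter(left) == Counter(right)


# A driver may return Decimal where the other returns int or float.
same_values = results_match([(Decimal("3"), "a")], [(3, "a")])
same_rows_any_order = results_match([(1,), (2,)], [(2,), (1,)])
same_rows_in_order = results_match([(1,), (2,)], [(2,), (1,)], ordered=True)
```

Comparing as multisets suits queries without `ORDER BY`, where row order is not guaranteed; ordered comparison only makes sense when the query fixes the order.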

Examples and documentation

Added examples for schema conversion, Oracle graph setup, Cypher-to-Oracle SQL/PGQ translation, template corpus generation, LLM-based corpus generation, and corpus combination.

Added detailed documentation for:

Oracle SQL/PGQ data generation workflows.
Dataset preparation workflows for Text2GQL-Bench.

The main README now points to those focused docs instead of duplicating long Oracle SQL/PGQ command sequences.

Why

This adds a practical Oracle SQL/PGQ path to the Text2GQL data generation workflow. It makes it possible to generate queries using LLMs, or take existing Text2GQL-Bench Cypher/GQL-like records, translate them into Oracle SQL/PGQ where supported, validate them against a live Oracle property graph, and compare results against Neo4j to identify records that are semantically safe to export.

Validation Notes

Final export validation stats:

Databases processed: 33
Total records selected: 22,407
Records considered for Oracle/Neo4j comparison: 20,653
Records exported: 19,633
Failed comparisons: 325
Skipped records: 2,449
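
For a quick read of these counters, the snippet below derives two export ratios from the numbers above. The choice of denominators is an assumption about which ratios are useful; the PR does not state how the counters relate to one another.

```python
# Counters copied from the validation stats above.
selected = 22_407
considered = 20_653
exported = 19_633

# Share of compared records that ended up in the export.
export_rate_of_considered = exported / considered

# Share of all selected records that ended up in the export.
export_rate_of_selected = exported / selected

print(f"exported / considered: {export_rate_of_considered:.1%}")
print(f"exported / selected:   {export_rate_of_selected:.1%}")
```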

Reviewer Notes: Source Dataset / Schema Issues Found

While validating Text2GQL-Bench records, the new tooling found several cases where the source Cypher/GQL-like query appears inconsistent with its own graph import schema. These are classified as unsupported instead of emitting potentially incorrect Oracle SQL/PGQ.

The following are representative examples for reviewers to inspect.

1. Relationship direction mismatch

Dataset: dev/Address
Record: bird_address_0

MATCH (t1:zip_data)<-[zip_code:ZIP_CODE]-(t2:country)
WHERE t2.county = 'ARECIBO'
RETURN sum(t1.households)

Why this matters:
The query uses the ZIP_CODE relationship in a direction that does not match the discovered import schema. The translator intentionally refuses to emit Oracle SQL/PGQ for this because reversing the relationship could change query semantics.
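
A direction check of this kind can be sketched as a lookup against the set of edges declared in the import schema. The function `classify_edge_use` is hypothetical, and the schema below assumes, purely for illustration, that `ZIP_CODE` is declared from `zip_data` to `country`; the actual import config should be consulted for the real direction.

```python
def classify_edge_use(schema_edges, source_label, edge_label, target_label):
    """Classify how a query uses an edge relative to the declared schema.

    schema_edges is a set of (source_label, edge_label, target_label) triples.
    """
    if (source_label, edge_label, target_label) in schema_edges:
        return "ok"
    if (target_label, edge_label, source_label) in schema_edges:
        return "reversed"
    return "unknown"


# Assumed schema direction, for illustration only.
schema_edges = {("zip_data", "ZIP_CODE", "country")}

# (t1:zip_data)<-[:ZIP_CODE]-(t2:country) has country as the source node.
status = classify_edge_use(schema_edges, "country", "ZIP_CODE", "zip_data")
print(status)
```

Reporting `reversed` as its own outcome lets the tooling mark the record as unsupported instead of silently flipping the arrow.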

2. Invalid or mismatched label

Dataset: dev/Address
Record: bird_address_7

MATCH (t1:state)<-[state:STATE]-(t2:country)
WHERE t1.name = 'Alabama'
RETURN count(t2.county)

Why this matters:
The query references a label or relationship shape that does not align cleanly with the graph schema discovered from the dataset import config.

3. Invalid or missing property

Dataset: dev/FInancial_Financial_Management
Record: at2gsynth_financialfinancialmanagement_73

MATCH (b:BUDGET)-[alloc:AllocatedTo]->(a:ACCOUNT)
WHERE b.currency <> a.currency
RETURN b.budget_id, b.category, a.account_number,
       b.currency AS budget_currency,
       a.currency AS account_currency

Why this matters:
The query references properties that are not present on the corresponding schema elements. The tooling treats this as a source/schema mismatch and does not emit SQL/PGQ, because generating SQL against absent properties would produce invalid or misleading output.
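
A strict property check can be approximated by scanning the query for `variable.property` references and looking each one up in the schema. The sketch below is a simplification: `missing_properties` is a hypothetical helper, the regex does not skip string literals, and the property sets for `BUDGET` and `ACCOUNT` are invented for the example.

```python
import re

# Matches references such as b.currency; does not skip string literals.
PROPERTY_REF = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def missing_properties(query, var_labels, schema_props):
    """Return deduplicated (variable, label, property) triples not in the schema."""
    issues = []
    for var, prop in PROPERTY_REF.findall(query):
        label = var_labels.get(var)
        if label is None:
            continue  # not a bound pattern variable
        if prop not in schema_props.get(label, set()):
            issue = (var, label, prop)
            if issue not in issues:
                issues.append(issue)
    return issues


query = """
MATCH (b:BUDGET)-[alloc:AllocatedTo]->(a:ACCOUNT)
WHERE b.currency <> a.currency
RETURN b.budget_id, b.category, a.account_number,
       b.currency AS budget_currency,
       a.currency AS account_currency
"""

# Hypothetical schema in which neither label defines `currency`.
schema_props = {
    "BUDGET": {"budget_id", "category", "amount"},
    "ACCOUNT": {"account_number", "balance"},
}
issues = missing_properties(query, {"b": "BUDGET", "a": "ACCOUNT"}, schema_props)
print(issues)
```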

- Updated pyproject.toml to include oracledb dependency.
- Introduced new test suite for translating Cypher queries to Oracle SQL/PGQ.
- Implemented dataset preparation tests for Oracle integration.
- Added live tests for OracleDB client functionality.
- Created query generalizer and template instantiator for Oracle SQL/PGQ.
- Enhanced corpus combiner to handle Oracle-specific queries and validation.
- Included schema parser for generating Oracle DDL statements.

…strict validation

- Added support for node and edge primary key mappings in OracleSqlPgqQueryTranslator.
- Introduced strict property validation to ensure properties are defined for variables.
- Updated methods to normalize label maps and handle aggregate functions in WITH clauses.
- Enhanced validation for translated queries, including handling of string predicates and label predicates.
- Improved error handling for missing properties when strict validation is enabled.
- Added new command-line arguments for validation timeout and fetch limit in dataset preparation.
- Updated tests to cover new features, including primary key mapping and strict validation scenarios.

- Enhance `test_detect_unsupported_oracle_sqlpgq_features` with additional assertions for various unsupported query patterns.
- Introduce `test_failure_analysis_groups_unsupported_query_shapes` to analyze failure signatures for unsupported queries.
- Implement `test_failure_analysis_uses_manifest_for_invalid_schema` to validate schema direction and property checks against a manifest.
- Add normalization tests in `test_compare_normalizes_temporal_strings_and_numeric_precision` and `test_compare_normalizes_oracle_and_neo4j_node_identity`.
- Create tests for path normalization in `test_compare_normalizes_single_neo4j_path_to_flat_element_sequence`.
- Include checks for nondeterministic limits in `test_compare_detects_nondeterministic_limit_without_order_by`.
- Expose file stem label aliases in `test_loader_exposes_file_stem_label_aliases`.
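
The nondeterministic-limit check matters because a `LIMIT` without `ORDER BY` lets each database return a different subset of rows, so a result mismatch would not indicate a translation error. Below is a minimal sketch of such a detector, assuming a purely textual check. `has_nondeterministic_limit` is a hypothetical name and does not reflect how the comparison code in this PR is implemented.

```python
import re

_SINGLE_QUOTED = re.compile(r"'(?:[^'\\]|\\.)*'")
_DOUBLE_QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')
_LIMIT = re.compile(r"\bLIMIT\b", re.IGNORECASE)
_ORDER_BY = re.compile(r"\bORDER\s+BY\b", re.IGNORECASE)


def has_nondeterministic_limit(query):
    """Return True when the query text uses LIMIT but never ORDER BY."""
    # Blank out string literals so keywords inside them are ignored.
    text = _SINGLE_QUOTED.sub("''", query)
    text = _DOUBLE_QUOTED.sub('""', text)
    return bool(_LIMIT.search(text)) and not _ORDER_BY.search(text)


unordered = has_nondeterministic_limit(
    "MATCH (n:Person) RETURN n.name LIMIT 5"
)
ordered = has_nondeterministic_limit(
    "MATCH (n:Person) RETURN n.name ORDER BY n.name LIMIT 5"
)
keyword_in_literal = has_nondeterministic_limit(
    "MATCH (n:Person) WHERE n.note = 'LIMIT' RETURN n.name"
)
```

This textual check ignores subqueries and multi-stage `WITH` clauses, where an `ORDER BY` in one stage does not make a `LIMIT` in another stage deterministic.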

…atches

- Introduced `is_supported_correlated_optional_match` to validate correlated optional matches in Cypher queries.
- Updated `detect_unsupported_features` to remove "optional_match" feature if correlated optional matches are supported.
- Removed redundant optional match translation logic from `cypher2oracle_sqlpgq`.
- Added comprehensive tests for various optional match scenarios, including correlated optional matches and their translations to SQL.
- Improved handling of optional match clauses in the dataset preparation and query translation processes.

…ement

- Implemented CypherSchema to manage and validate graph schema based on provided configuration.
- Added methods for detecting validation issues in Cypher queries, including node and edge label checks, property validation, and unsafe numeric conversions.
- Introduced utility functions for parsing Cypher variable labels, property references, and edge relationships.
- Included comprehensive handling of schema name aliases and property types.
- Ensured deduplication of validation issues for cleaner output.

…tion and numeric tolerance checks

- Introduced checks for unique schema ownership of properties in CypherSchema.
- Added detection for unsafe temporal arithmetic in aggregate queries.
- Improved handling of broad bounded variable length relationships in translation.
- Updated tests to cover new features and edge cases, including disambiguation of complex aggregate property aliases.
- Refactored unsupported feature detection to exclude expensive variable length paths.
- Enhanced query translation to preserve real ID properties over pseudo identities.
- Added stable tiebreakers for ordered queries with limits in comparison functions.


ayoubmoussaid added 11 commits May 11, 2026 10:08

feat: Add dataset preparation and Oracle vs Neo4j comparison utilities

56866af

feat: enhance Oracle SQL PGQ Translator with stage expression correlation and numeric tolerance checks

d1e8079

feat: enhance detection of unsupported features with new patterns and tests

8716fb4

feat: add exporter for validated Oracle SQL/PGQ dataset and enhance README

8fa68f1

feat: update README and dataset preparation documentation for Oracle SQL/PGQ support

ef70dee

ayoubmoussaid wants to merge 11 commits into ldbc:master from ayoubmoussaid:master
