Add Oracle SQL/PGQ support to Awesome-Text2GQL #68
Open
ayoubmoussaid wants to merge 11 commits into
- Updated pyproject.toml to include oracledb dependency.
- Introduced new test suite for translating Cypher queries to Oracle SQL/PGQ.
- Implemented dataset preparation tests for Oracle integration.
- Added live tests for OracleDB client functionality.
- Created query generalizer and template instantiator for Oracle SQL/PGQ.
- Enhanced corpus combiner to handle Oracle-specific queries and validation.
- Included schema parser for generating Oracle DDL statements.
…strict validation
- Added support for node and edge primary key mappings in OracleSqlPgqQueryTranslator.
- Introduced strict property validation to ensure properties are defined for variables.
- Updated methods to normalize label maps and handle aggregate functions in WITH clauses.
- Enhanced validation for translated queries, including handling of string predicates and label predicates.
- Improved error handling for missing properties when strict validation is enabled.
- Added new command-line arguments for validation timeout and fetch limit in dataset preparation.
- Updated tests to cover new features, including primary key mapping and strict validation scenarios.
- Enhance `test_detect_unsupported_oracle_sqlpgq_features` with additional assertions for various unsupported query patterns.
- Introduce `test_failure_analysis_groups_unsupported_query_shapes` to analyze failure signatures for unsupported queries.
- Implement `test_failure_analysis_uses_manifest_for_invalid_schema` to validate schema direction and property checks against a manifest.
- Add normalization tests in `test_compare_normalizes_temporal_strings_and_numeric_precision` and `test_compare_normalizes_oracle_and_neo4j_node_identity`.
- Create tests for path normalization in `test_compare_normalizes_single_neo4j_path_to_flat_element_sequence`.
- Include checks for nondeterministic limits in `test_compare_detects_nondeterministic_limit_without_order_by`.
- Expose file stem label aliases in `test_loader_exposes_file_stem_label_aliases`.
…atches
- Introduced `is_supported_correlated_optional_match` to validate correlated optional matches in Cypher queries.
- Updated `detect_unsupported_features` to remove "optional_match" feature if correlated optional matches are supported.
- Removed redundant optional match translation logic from `cypher2oracle_sqlpgq`.
- Added comprehensive tests for various optional match scenarios, including correlated optional matches and their translations to SQL.
- Improved handling of optional match clauses in the dataset preparation and query translation processes.
…ement
- Implemented CypherSchema to manage and validate graph schema based on provided configuration.
- Added methods for detecting validation issues in Cypher queries, including node and edge label checks, property validation, and unsafe numeric conversions.
- Introduced utility functions for parsing Cypher variable labels, property references, and edge relationships.
- Included comprehensive handling of schema name aliases and property types.
- Ensured deduplication of validation issues for cleaner output.
…tion and numeric tolerance checks
- Introduced checks for unique schema ownership of properties in CypherSchema.
- Added detection for unsafe temporal arithmetic in aggregate queries.
- Improved handling of broad bounded variable length relationships in translation.
- Updated tests to cover new features and edge cases, including disambiguation of complex aggregate property aliases.
- Refactored unsupported feature detection to exclude expensive variable length paths.
- Enhanced query translation to preserve real ID properties over pseudo identities.
- Added stable tiebreakers for ordered queries with limits in comparison functions.
Summary
This PR adds Oracle SQL/PGQ support to Awesome-Text2GQL and introduces tooling to translate, validate, compare, and export Oracle SQL/PGQ datasets from Text2GQL-Bench.
The main goals are:
GRAPH_TABLE queries.
A new dataset containing only valid Oracle SQL/PGQ queries is available at Text2GQL-Dataset.
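For reviewers less familiar with the target syntax, here is a minimal sketch of the shape a one-hop Cypher pattern takes as an Oracle `GRAPH_TABLE` query. The helper `single_hop_graph_table`, the graph name `social_graph`, and the labels are hypothetical and chosen for illustration only; this is not the translator added in this PR, and it does no identifier validation.

```python
def single_hop_graph_table(graph, src_label, edge_label, dst_label, columns):
    """Build a one-hop GRAPH_TABLE query string.

    Hypothetical helper for illustration: all identifiers are supplied by the
    caller and interpolated as-is, so it must not be used with untrusted input.
    """
    cols = ", ".join(f"{expr} AS {alias}" for expr, alias in columns)
    return (
        f"SELECT * FROM GRAPH_TABLE ({graph} "
        f"MATCH (a IS {src_label}) -[e IS {edge_label}]-> (b IS {dst_label}) "
        f"COLUMNS ({cols}))"
    )


# Roughly equivalent Cypher (illustrative):
#   MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS src, b.name AS dst
sql = single_hop_graph_table(
    "social_graph", "person", "knows", "person",
    [("a.name", "src"), ("b.name", "dst")],
)
print(sql)
```

The Cypher `RETURN` list maps onto the `COLUMNS` clause, and each label test becomes an `IS` predicate on the pattern variable.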
What Changed
Oracle SQL/PGQ implementation
Added Oracle SQL/PGQ support under app/impl/oracle_sqlpgq/, including:
python-oracledb.
Dataset preparation utilities
Added dataset_prep/ scripts for working with the published Text2GQL-Bench dataset:
Examples and documentation
Added examples for schema conversion, Oracle graph setup, Cypher-to-Oracle SQL/PGQ translation, template corpus generation, LLM-based corpus generation, and corpus combination.
Added detailed documentation for:
The main README now points to those focused docs instead of duplicating long Oracle SQL/PGQ command sequences.
Why
This adds a practical Oracle SQL/PGQ path to the Text2GQL data generation workflow. It makes it possible to generate queries using LLMs, or take existing Text2GQL-Bench Cypher/GQL-like records, translate them into Oracle SQL/PGQ where supported, validate them against a live Oracle property graph, and compare results against Neo4j to identify records that are semantically safe to export.
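The comparison step can be pictured with a small sketch. It assumes both engines return rows as sequences of scalar values, and that numeric results should be compared with a fixed rounding tolerance and, when the query has no ORDER BY, without regard to row order. The names `normalize_value` and `rows_match` and the default of six decimal places are assumptions made for this example, not the PR's actual comparison code.

```python
from collections import Counter
from decimal import Decimal


def normalize_value(value, places=6):
    """Coerce numeric values to rounded floats so 2.50 (Decimal) equals 2.5 (float)."""
    if isinstance(value, bool):
        return value  # keep booleans distinct from the integers 0 and 1
    if isinstance(value, (int, float, Decimal)):
        return round(float(value), places)
    return value


def rows_match(oracle_rows, neo4j_rows, ordered=False, places=6):
    """Compare two result sets after normalizing every cell.

    ordered=False compares the rows as multisets, which suits queries
    without ORDER BY; ordered=True requires identical row order.
    """
    left = [tuple(normalize_value(v, places) for v in row) for row in oracle_rows]
    right = [tuple(normalize_value(v, places) for v in row) for row in neo4j_rows]
    if ordered:
        return left == right
    return Counter(left) == Counter(right)


# Same rows, different numeric types and row order.
print(rows_match([(1, Decimal("2.50")), (2, Decimal("3.00"))],
                 [(2.0, 3.0), (1.0, 2.5)]))
```

A record would count as safe to export only when a check like this passes; a query with LIMIT but no ORDER BY can return different rows on each engine, so such records need separate handling.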
Validation Notes
Final export validation stats:
3322,40720,65319,6333252,449
Reviewer Notes: Source Dataset / Schema Issues Found
While validating Text2GQL-Bench records, the new tooling found several cases where the source Cypher/GQL-like query appears inconsistent with its own graph import schema. These are classified as unsupported instead of emitting potentially incorrect Oracle SQL/PGQ.
The following are representative examples for reviewers to inspect.
1. Relationship direction mismatch
Dataset: dev/Address
Record: bird_address_0
Why this matters:
The query uses the ZIP_CODE relationship in a direction that does not match the discovered import schema. The translator intentionally refuses to emit Oracle SQL/PGQ for this because reversing the relationship could change query semantics.
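A direction check of this kind can be sketched as follows. The schema is modeled as a set of (source label, relationship, target label) triples; `EdgeDef`, `check_direction`, and the endpoint labels `SourceLabel` and `TargetLabel` are placeholders for illustration, since the real endpoints of ZIP_CODE come from the dataset import config and are not reproduced here.

```python
from typing import NamedTuple


class EdgeDef(NamedTuple):
    """One directed relationship in the import schema (hypothetical model)."""
    source: str
    label: str
    target: str


def check_direction(schema_edges, source, label, target):
    """Classify a query edge as 'ok', 'reversed', or 'unknown' against the schema."""
    if EdgeDef(source, label, target) in schema_edges:
        return "ok"
    if EdgeDef(target, label, source) in schema_edges:
        # The relationship exists only in the opposite direction.
        # Flipping it silently could change what the query means.
        return "reversed"
    return "unknown"


# Placeholder endpoint labels; only the relationship name comes from the record.
schema = {EdgeDef("SourceLabel", "ZIP_CODE", "TargetLabel")}
print(check_direction(schema, "TargetLabel", "ZIP_CODE", "SourceLabel"))
```

Treating "reversed" as unsupported, rather than auto-correcting it, matches the conservative behavior described above.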
Dataset: dev/Address
Record: bird_address_7
Why this matters:
The query references a label or relationship shape that does not align cleanly with the graph schema discovered from the dataset import config.
Dataset: dev/FInancial_Financial_Management
Record: at2gsynth_financialfinancialmanagement_73
Why this matters:
The query references properties that are not present on the corresponding schema elements. The tooling treats this as a source/schema mismatch and does not emit SQL/PGQ, because generating SQL against absent properties would produce invalid or misleading output.
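As a rough illustration of such a property check, the sketch below scans a query for `variable.property` references and reports those whose property is not declared for the variable's label. The function `missing_properties`, the regular expression, and the `Account` example schema are assumptions made for this example; a real implementation would parse the query rather than use a regex, which can also match text inside string literals.

```python
import re

# Matches references such as a.balance; the first character must be a letter
# or underscore, so numeric literals like 1.5 are not matched.
PROPERTY_REF = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def missing_properties(query, var_labels, schema_props):
    """Return (variable, label, property) triples not declared in the schema.

    var_labels maps query variables to labels; schema_props maps labels to
    the set of declared property names. Unknown variables are skipped.
    """
    missing = []
    for var, prop in PROPERTY_REF.findall(query):
        label = var_labels.get(var)
        if label is None:
            continue
        if prop not in schema_props.get(label, set()):
            item = (var, label, prop)
            if item not in missing:  # deduplicate while keeping first-seen order
                missing.append(item)
    return missing


# Hypothetical query and schema for illustration.
query = "MATCH (a:Account) WHERE a.balance > 10 RETURN a.owner_name, a.owner_name"
issues = missing_properties(query, {"a": "Account"}, {"Account": {"id", "balance"}})
print(issues)
```

A non-empty result would mark the record as a source/schema mismatch instead of passing it on to translation.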