[KYUUBI #7379][2a/4] Data Agent Engine: tool system, data source, and prompt templates #7400
Open
wangzhigang1999 wants to merge 2 commits into apache:master
…urce, and prompt templates
Tool system with risk-based separation (RunSelectQueryTool / RunMutationQueryTool),
ToolRegistry with JSON schema generation, and SqlReadOnlyChecker keyword whitelist.
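As a rough illustration of the keyword-whitelist idea behind SqlReadOnlyChecker, the sketch below accepts a statement only when its first keyword is in an allowed set. The class name ReadOnlyCheckSketch, the keyword set, and the comment-stripping logic are assumptions made for illustration and are not taken from the PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ReadOnlyCheckSketch {
  // Hypothetical whitelist; the keyword set used by the PR may differ.
  private static final Set<String> READ_ONLY_KEYWORDS =
      new HashSet<>(Arrays.asList("SELECT", "WITH", "SHOW", "DESCRIBE", "DESC", "EXPLAIN"));

  // Returns true only when the first keyword of the statement is whitelisted.
  static boolean isReadOnly(String sql) {
    if (sql == null) {
      return false;
    }
    String s = stripLeadingComments(sql);
    int end = 0;
    while (end < s.length() && Character.isLetter(s.charAt(end))) {
      end++;
    }
    String first = s.substring(0, end).toUpperCase(Locale.ROOT);
    return READ_ONLY_KEYWORDS.contains(first);
  }

  // Drops leading whitespace, "--" line comments, and "/* */" block comments.
  private static String stripLeadingComments(String sql) {
    String s = sql.trim();
    while (true) {
      if (s.startsWith("--")) {
        int newline = s.indexOf('\n');
        if (newline < 0) {
          return "";
        }
        s = s.substring(newline + 1).trim();
      } else if (s.startsWith("/*")) {
        int close = s.indexOf("*/", 2);
        if (close < 0) {
          return "";
        }
        s = s.substring(close + 2).trim();
      } else {
        return s;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(isReadOnly("SELECT * FROM orders"));
    System.out.println(isReadOnly("DROP TABLE orders"));
  }
}
```

A first-keyword check like this is not a SQL parser. In some dialects a WITH clause can precede a data-modifying statement, so a whitelist of this kind works best as one layer of defense alongside the risk-based tool separation.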
Data source abstraction with JdbcDialect auto-detection (Spark/Trino/MySQL/SQLite),
GenericDialect fallback for unknown JDBC subprotocols, TableRef value object for
structured table references with Jackson deserialization support, and HikariCP-backed
DataSourceFactory with credential isolation.
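To make the auto-detection and quoting idea concrete, here is a minimal sketch that maps the JDBC subprotocol to a dialect and quotes identifiers per dialect. The enum name DialectSketch, the subprotocol-to-dialect mapping (including treating hive2 and kyuubi URLs as Spark), the quote characters, and the qualify helper are assumptions for illustration; the PR's JdbcDialect and TableRef may behave differently.

```java
import java.util.Locale;

enum DialectSketch {
  SPARK('`'), MYSQL('`'), SQLITE('"'), TRINO('"'), GENERIC('"');

  private final char quote;

  DialectSketch(char quote) {
    this.quote = quote;
  }

  // Picks a dialect from the subprotocol in "jdbc:<subprotocol>:...".
  static DialectSketch fromUrl(String jdbcUrl) {
    if (jdbcUrl == null || !jdbcUrl.regionMatches(true, 0, "jdbc:", 0, 5)) {
      throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
    }
    String rest = jdbcUrl.substring(5);
    int colon = rest.indexOf(':');
    String sub = (colon < 0 ? rest : rest.substring(0, colon)).toLowerCase(Locale.ROOT);
    switch (sub) {
      case "hive2":   // assumed mapping: Thrift-based URLs treated as Spark
      case "kyuubi":
      case "spark":
        return SPARK;
      case "mysql":
        return MYSQL;
      case "sqlite":
        return SQLITE;
      case "trino":
        return TRINO;
      default:
        return GENERIC; // fallback for unknown subprotocols
    }
  }

  // Wraps the name in the dialect's quote character, doubling embedded quotes.
  String quoteIdentifier(String name) {
    String q = String.valueOf(quote);
    return q + name.replace(q, q + q) + q;
  }

  // Joins the non-empty parts of a catalog/schema/table reference with dots.
  String qualify(String... parts) {
    StringBuilder sb = new StringBuilder();
    for (String part : parts) {
      if (part == null || part.isEmpty()) {
        continue;
      }
      if (sb.length() > 0) {
        sb.append('.');
      }
      sb.append(quoteIdentifier(part));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    DialectSketch d = fromUrl("jdbc:mysql://localhost:3306/sales");
    System.out.println(d + " " + d.qualify(null, "sales", "orders"));
  }
}
```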
Composable SystemPromptBuilder with per-dialect prompt templates (base.md +
datasource-{name}.md), SQL workflow guidance, and query risk classification.
… remove jdbcUrl shortcut
- SystemPromptBuilder.datasource() now replaces the previous datasource section instead of appending, matching the single-datasource-per-session model
- Remove jdbcUrl() convenience method; callers use JdbcDialect.fromUrl() directly
- Remove redundant tests (pool config defaults, toString, tool metadata, fast timeout)
- Clean up todo comment in JdbcDialect
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
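The replace-instead-of-append behavior described for datasource() can be sketched as a builder that keeps a single datasource slot next to an append-only list of free-form sections. PromptBuilderSketch and its base(), section(), and build() methods are hypothetical names for illustration and are not the PR's SystemPromptBuilder API.

```java
import java.util.ArrayList;
import java.util.List;

class PromptBuilderSketch {
  private String base = "";
  private String datasourceSection; // at most one per session
  private final List<String> sections = new ArrayList<>();

  PromptBuilderSketch base(String text) {
    this.base = text;
    return this;
  }

  // Replaces any previously set datasource section rather than appending.
  PromptBuilderSketch datasource(String text) {
    this.datasourceSection = text;
    return this;
  }

  // Free-form sections accumulate in insertion order.
  PromptBuilderSketch section(String text) {
    sections.add(text);
    return this;
  }

  // Assembles base, datasource, and free-form sections separated by blank lines.
  String build() {
    List<String> parts = new ArrayList<>();
    if (!base.isEmpty()) {
      parts.add(base);
    }
    if (datasourceSection != null) {
      parts.add(datasourceSection);
    }
    parts.addAll(sections);
    return String.join("\n\n", parts);
  }

  public static void main(String[] args) {
    String prompt = new PromptBuilderSketch()
        .base("# Base instructions")
        .datasource("## MySQL notes")
        .datasource("## Trino notes") // overwrites the MySQL section
        .section("## Extra guidance")
        .build();
    System.out.println(prompt);
  }
}
```

Holding the datasource section in a single field, rather than in the shared list, makes the one-datasource-per-session rule hold by construction.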
Hi @pan3793, could you help review this PR when you have time? This is Part 2a of the Data Agent Engine series: it adds the tool system, data source abstraction, and prompt builder. Thanks! 🙏
Why are the changes needed?
Part 2a of 4 for the Data Agent Engine (umbrella, KPIP-7373).
This PR adds the tool system, data source abstraction, and composable prompt builder — the infrastructure that the agent runtime (PR 2b) will use to execute SQL and interact with the LLM.
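One way to picture the tool-system part of that infrastructure is a registry that stores tools in a concurrent map and bounds each call with a timeout. This is a minimal sketch under stated assumptions: ToolRegistrySketch, its use of Function<String, String> as the tool type, and the error strings are invented for illustration and do not reflect the PR's ToolRegistry or AgentTool signatures.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

class ToolRegistrySketch {
  // ConcurrentHashMap makes registration and lookup safe across threads.
  private final Map<String, Function<String, String>> tools = new ConcurrentHashMap<>();
  // Daemon threads so a stuck tool cannot keep the JVM alive.
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "tool-worker");
    t.setDaemon(true);
    return t;
  });
  private final long timeoutMillis;

  ToolRegistrySketch(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  // putIfAbsent is atomic, so two racing registrations cannot both succeed.
  void register(String name, Function<String, String> tool) {
    if (tools.putIfAbsent(name, tool) != null) {
      throw new IllegalArgumentException("Duplicate tool: " + name);
    }
  }

  // Runs the tool on a worker thread and returns an error string on failure,
  // so the caller (for example an LLM loop) always receives a textual result.
  String execute(String name, String argsJson) {
    Function<String, String> tool = tools.get(name);
    if (tool == null) {
      return "Error: unknown tool '" + name + "'";
    }
    Callable<String> call = () -> tool.apply(argsJson);
    Future<String> future = pool.submit(call);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the worker
      return "Error: tool '" + name + "' timed out after " + timeoutMillis + " ms";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "Error: interrupted while waiting for tool '" + name + "'";
    } catch (ExecutionException e) {
      return "Error: " + String.valueOf(e.getCause());
    }
  }

  public static void main(String[] args) {
    ToolRegistrySketch registry = new ToolRegistrySketch(1000);
    registry.register("echo", input -> input);
    System.out.println(registry.execute("echo", "{\"sql\":\"SELECT 1\"}"));
  }
}
```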
Changes include:
- AgentTool interface with ToolRiskLevel, JSON schema generation for LLM function calling
- ToolRegistry: thread-safe tool registration, dispatch, timeout enforcement, and OpenAI-compatible tool definition export
- RunSelectQueryTool / RunMutationQueryTool: read-only vs. mutation SQL execution with maxRows enforcement, output truncation, and SqlReadOnlyChecker
- SqlExecutor: shared JDBC execution logic with statement timeout and result formatting
- DataSourceFactory: HikariCP connection pool creation with optional user/password
- JdbcDialect: auto-detection from JDBC URL with dialect-specific identifier quoting (Spark, MySQL, SQLite, Trino, generic fallback)
- TableRef: catalog/schema/table reference with JSON deserialization support
- SystemPromptBuilder: composable Markdown prompt assembly with date injection, per-dialect datasource sections, and free-form text sections
- Prompt templates: base.md, datasource-{mysql,spark,sqlite,trino}.md
- kyuubi.engine.data.agent.tool.* and kyuubi.engine.data.agent.datasource.* configuration entries
How was this patch tested?
- JdbcDialectTest, TableRefTest, DataSourceFactoryAuthTest, SqlReadOnlyCheckerTest, RunSelectQueryToolTest, RunMutationQueryToolTest, ToolTest, ToolRegistryThreadSafetyTest, ToolSchemaGeneratorTest, SystemPromptBuilderTest
- DataSourceFactoryTest, DialectTest, RunSelectQueryTest, RunMutationQueryTest, ToolExecutionTest: all run against a real MySQL container
Was this patch authored or co-authored using generative AI tooling?
Partially assisted by Claude Code (Claude Opus 4.6) for test generation, code review, and PR formatting. Core design and implementation are human-authored.