SQL DDL documentation #37539
Conversation
Summary of Changes (from Gemini Code Assist)
This pull request adds documentation to the Apache Beam website for SQL Data Definition Language (DDL) commands, covering how to use SQL to manage Catalogs, Databases, and Tables. It also refines how the Iceberg SQL connector handles table properties, including a new convention for Beam-specific configurations, and updates existing documentation to reflect these capabilities and new table types.
|
f2675e8 to 257cd5a
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
…erg_getting_started
|
Assigning reviewers: R: @kennknowles for label java. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
Reminder, please take a look at this pr: @kennknowles @shunping |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
|
|
Reminder, please take a look at this pr: @Abacn |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @chamikaramj for label java. Available commands:
|
|
Reminder, please take a look at this pr: @chamikaramj @kennknowles |
|
@ahmedabu98 are you still waiting for @talatuyarer's review, or do you want another reviewer to take a look? |
|
I could use another reviewer; it looks like @talatuyarer may have their hands full. |
|
Reminder, please take a look at this pr: @chamikaramj @kennknowles @Abacn |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @chamikaramj for label java. Available commands:
|
Abacn left a comment
For markdown, could you please find a tech writer reviewer? Thanks!
| if (prop.equals(TRIGGERING_FREQUENCY_FIELD)) {
|   this.triggeringFrequency = property.getValue().asInt();
| } else {
|   throw new IllegalArgumentException("Unknown Beam write property: " + name);
Would it be preferable to warn here instead (for forward and backward compatibility)?
I'd rather fail and avoid a situation where the transform is behaving unexpectedly.
I remember running into user issues with some IOs for the same reason, where incorrect configurations were logged but the pipeline kept running.
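To make the fail-fast behavior discussed above concrete, here is a minimal, self-contained Java sketch. It is not the code from this PR: the class name `BeamWritePropertiesSketch`, the method `parseTriggeringFrequency`, the value assigned to `TRIGGERING_FREQUENCY_FIELD`, and the use of a plain `Map<String, String>` are assumptions made for illustration (the PR reads values through `property.getValue().asInt()`). Only the `beam.write.` prefix, the `beam.write.triggering_frequency_seconds` property name, and the error message come from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not the PR's implementation. */
final class BeamWritePropertiesSketch {
  // Prefix mentioned in the docs under review for Beam-specific write properties.
  static final String BEAM_WRITE_PREFIX = "beam.write.";
  // Assumed value of the constant seen in the quoted diff.
  static final String TRIGGERING_FREQUENCY_FIELD = "triggering_frequency_seconds";

  /** Returns the triggering frequency in seconds, or null if the property is absent. */
  static Integer parseTriggeringFrequency(Map<String, String> tableProperties) {
    Integer triggeringFrequency = null;
    for (Map.Entry<String, String> property : tableProperties.entrySet()) {
      String name = property.getKey();
      if (!name.startsWith(BEAM_WRITE_PREFIX)) {
        continue; // not a Beam write property; left for the table format to interpret
      }
      String prop = name.substring(BEAM_WRITE_PREFIX.length());
      if (prop.equals(TRIGGERING_FREQUENCY_FIELD)) {
        triggeringFrequency = Integer.parseInt(property.getValue());
      } else {
        // Fail fast instead of logging a warning, so a typo cannot be silently ignored.
        throw new IllegalArgumentException("Unknown Beam write property: " + name);
      }
    }
    return triggeringFrequency;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("write.format.default", "parquet"); // illustrative value
    props.put("beam.write.triggering_frequency_seconds", "30"); // illustrative value
    System.out.println("triggering frequency: " + parseTriggeringFrequency(props));

    // A misspelled Beam property is rejected rather than ignored.
    props.put("beam.write.trigering_frequency_seconds", "30");
    try {
      parseTriggeringFrequency(props);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With a warn-only policy, the misspelled key in the second half of `main` would be logged and the pipeline would run with the default triggering behavior, which is the situation the author wants to avoid.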
|
Reminder, please take a look at this pr: @chamikaramj @shunping |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
|
|
waiting on author |
…erg_getting_started
Should there be a section/link added to this page for this new content?
| # Beam SQL DDL
| Beam SQL provides a standard three-level hierarchy to manage metadata across external data sources,
For SEO purposes and to orient the reader it's a good idea to introduce the notion of Beam SQL DDL here. Something like: "Beam SQL Data Definition Language (DDL) provides..." (or similar).
| # Beam SQL DDL
| Beam SQL provides a standard three-level hierarchy to manage metadata across external data sources,
| enabling structured discovery and cross-source interoperability.
"enabling structured discovery and cross-source interoperability": this reads a little bit like marketing speak to me and is a bit vague. Could you rephrase more in terms of what it allows them as the user to do?
Like: "...external data sources. This lets you discover data trends in a structured way and operate using multiple data sources" (or something).
| Beam SQL provides a standard three-level hierarchy to manage metadata across external data sources,
| enabling structured discovery and cross-source interoperability.
| 1. Catalog: The top-level container representing an external metadata provider. Examples include a Hive Metastore, AWS Glue, or a BigLake Catalog.
BigLake Catalog -> Lakehouse (formerly BigLake) Catalog
| Beam SQL provides a standard three-level hierarchy to manage metadata across external data sources,
| enabling structured discovery and cross-source interoperability.
| 1. Catalog: The top-level container representing an external metadata provider. Examples include a Hive Metastore, AWS Glue, or a BigLake Catalog.
| 2. Database: A logical grouping within a Catalog. This typically maps to a "Schema" in traditional RDBMS or a "Namespace" in systems like Apache Iceberg
add period at end to match other list items: "...systems like Apache Iceberg."
| [ TBLPROPERTIES 'properties_json_string' ];
| {{< /highlight >}}
| <ul>
| <li><strong>TYPE:</strong> the table type (e.g. <code>'iceberg'</code>, <code>'text'</code>, <code>'kafka'</code>).</li>
(e.g. 'iceberg', 'text', 'kafka') -> (for example, 'iceberg', 'text', 'kafka')
Side note: is this list exhaustive?
| <ul>
| <li><strong>TYPE:</strong> the table type (e.g. <code>'iceberg'</code>, <code>'text'</code>, <code>'kafka'</code>).</li>
| <li><strong>PARTITIONED BY:</strong> an ordered list of fields describing the partition spec.</li>
| <li><strong>LOCATION:</strong> explicitly sets the location of the table (overriding the inferred <code>catalog.db.table_name</code> location)</li>
of the table (overriding the inferred catalog.db.table_name location) -> of the table. This overrides the inferred catalog.db.table_name location.
| <ul>
| <li>This creates an Iceberg table named <code>orders</code> under the namespace <code>sales_data</code>, within the <code>prod_iceberg</code> catalog.</li>
| <li>The table is partitioned by <code>region_id</code>, then by the day value of <code>order_date</code> (using Iceberg's <a href="https://iceberg.apache.org/docs/latest/partitioning/#icebergs-hidden-partitioning">hidden partitioning</a>).</li>
| <li>The table is created with the appropriate properties <code>"write.format.default"</code> and <code>"read.split.target-size"</code>. The Beam property <code>"beam.write.triggering_frequency_seconds"</code></li>
Hanging thought: "The Beam property "beam.write.triggering_frequency_seconds""
| <li>This creates an Iceberg table named <code>orders</code> under the namespace <code>sales_data</code>, within the <code>prod_iceberg</code> catalog.</li>
| <li>The table is partitioned by <code>region_id</code>, then by the day value of <code>order_date</code> (using Iceberg's <a href="https://iceberg.apache.org/docs/latest/partitioning/#icebergs-hidden-partitioning">hidden partitioning</a>).</li>
| <li>The table is created with the appropriate properties <code>"write.format.default"</code> and <code>"read.split.target-size"</code>. The Beam property <code>"beam.write.triggering_frequency_seconds"</code></li>
| <li>Beam properties (prefixed with <code>"beam.write."</code> and <code>"beam.read."</code> are intended for the relevant IOs)</li>
|
| {{< /tab >}}
| {{< tab SHOW >}}
| <p>Lists tables under the currently active database, or a specified database.</p>
or a specified database. -> or a database you specify.
https://apache-beam-website-pull-requests.storage.googleapis.com/37539/documentation/dsls/sql/ddl/index.html
Adding some SQL DDL documentation to the Beam website.
Specifically the following commands for Catalogs, Databases, Tables: