Prototype BigQuery CDC -> Anomaly detection python flex template. by claudevdm · Pull Request #3479 · GoogleCloudPlatform/DataflowTemplates

claudevdm · 2026-03-11T17:40:41Z

No description provided.

gemini-code-assist · 2026-03-11T17:44:55Z

Warning

Gemini is experiencing higher than usual traffic and was unable to create the summary. Please try again in a few hours by commenting /gemini summary.

claudevdm · 2026-03-11T17:48:37Z

/gemini summary

codecov · 2026-03-11T17:52:15Z

Codecov Report

❌ Patch coverage is 12.90323% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.20%. Comparing base (7264b3a) to head (379947f).
⚠️ Report is 53 commits behind head on main.

Files with missing lines	Patch %	Lines
...loud/teleport/plugin/maven/TemplatesStageMojo.java	5.00%	19 Missing ⚠️
...gle/cloud/teleport/plugin/DockerfileGenerator.java	27.27%	8 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##               main    #3479       +/-   ##
=============================================
+ Coverage     33.37%   52.20%   +18.82%     
- Complexity      481     6050     +5569     
=============================================
  Files           215     1040      +825     
  Lines         12816    62977    +50161     
  Branches       1249     6899     +5650     
=============================================
+ Hits           4277    32874    +28597     
- Misses         8203    27876    +19673     
- Partials        336     2227     +1891

Components	Coverage Δ
spanner-templates	`72.19% <ø> (∅)`
spanner-import-export	`68.92% <ø> (∅)`
spanner-live-forward-migration	`80.45% <ø> (∅)`
spanner-live-reverse-replication	`77.85% <ø> (∅)`
spanner-bulk-migration	`89.17% <ø> (∅)`
gcs-spanner-dv	`85.30% <ø> (∅)`

Files with missing lines	Coverage Δ
...gle/cloud/teleport/plugin/DockerfileGenerator.java	`84.29% <27.27%> (-5.71%)`	⬇️
...loud/teleport/plugin/maven/TemplatesStageMojo.java	`17.31% <5.00%> (-0.59%)`	⬇️

... and 848 files with indirect coverage changes

gemini-code-assist · 2026-03-11T17:52:51Z

Warning

Gemini is experiencing higher than usual traffic and was unable to create the summary. Please try again in a few hours by commenting /gemini summary.

claudevdm · 2026-03-11T18:04:05Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new BigQuery Anomaly Detection template, which is a significant and valuable addition. The changes are well-structured, covering both Java and Python components, and include comprehensive unit and integration tests. The safe_eval module is particularly well-designed for secure expression evaluation, and the updates to Dockerfile generation enhance the flexibility of Python template deployment. The new README provides excellent documentation for users. Overall, this is a high-quality contribution.

claudevdm · 2026-03-12T17:27:56Z

@shunping can you please take a look at the anomaly detection parts?

tvalentyn · 2026-03-16T18:25:55Z

@claudevdm would it make sense to break up this contribution into several commits that can be reviewed one-by-one?

Abacn · 2026-03-16T18:46:10Z

    File dockerfile = new File(dockerfilePath);
    if (!dockerfile.exists()) {
-      List<String> filesToCopy = List.of(definition.getTemplateAnnotation().filesToCopy());
+      List<String> allFilesToCopy = List.of(definition.getTemplateAnnotation().filesToCopy());


On the idea of splitting the PR: changes to existing files and infra can go in first while review of the new templates is still in progress.

I see some fixes to stageFlexPythonTemplate (e.g. honoring directoriesToCopy). As you may have noticed, we currently do not build or release any Python templates. The existing ones have been commented out:

DataflowTemplates/python/src/main/java/com/google/cloud/teleport/templates/python/StreamingLLM.java

Line 26 in 395237d

type = TemplateType.PYTHON,

DataflowTemplates/python/src/main/java/com/google/cloud/teleport/templates/python/WordCountPython.java

Line 26 in 395237d

type = TemplateType.PYTHON,

As suggested in the comment, they might never have worked before. I wonder whether we can just get rid of this dead code (or revive it once staging is fixed, though that is not in scope for this PR).

claudevdm · 2026-03-16T18:57:24Z

I will think about how to split. The CDC/IO source is already reviewed in apache/beam#37724 so reviewers can ignore that part.

I forked it here until it rolls out in a Beam release.

…correct pip args

- DockerfileGenerator: add setSetupFile() for FLEX_TEMPLATE_PYTHON_SETUP_FILE env and pip install of setup.py packages
- Dockerfile-template-python: use ARG REQUIREMENTS_FILE instead of ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE to avoid launcher re-resolution; add directoriesToCopy and setupInstall placeholders; fix pip arg order
- TemplatesStageMojo: separate files from directories in filesToCopy, auto-detect setup.py, fix empty entryPoint default, use outputClassesDirectory for Dockerfile generation
- DockerfileGeneratorTest: update assertions for new pip command format
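
The file/directory separation and setup.py auto-detection mentioned for TemplatesStageMojo can be illustrated with a small sketch. This is not the Mojo's Java code; it is a Python illustration of the idea, and the function names `split_files_and_dirs` and `detect_setup_file` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def split_files_and_dirs(
    root: Path, entries: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Partition staging entries into (files, directories) relative to root.

    Assumption: an entry is treated as a directory only if it exists as one
    under root; everything else is treated as a plain file.
    """
    files: List[str] = []
    dirs: List[str] = []
    for entry in entries:
        if (root / entry).is_dir():
            dirs.append(entry)
        else:
            files.append(entry)
    return files, dirs


def detect_setup_file(root: Path) -> Optional[str]:
    """Return "setup.py" if one sits directly under root, else None."""
    return "setup.py" if (root / "setup.py").is_file() else None
```

Keeping the two lists apart matters because a Dockerfile typically needs a different COPY form for a directory than for a single file, which is presumably why the template gained a separate directoriesToCopy placeholder.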

pull-request-size Bot added the size/XXL label Mar 11, 2026

claudevdm force-pushed the bqmonitor branch 2 times, most recently from 7229c10 to 393de61 Compare March 11, 2026 17:44

gemini-code-assist Bot reviewed Mar 11, 2026

View reviewed changes

claudevdm changed the title ~~initial~~ Prototype BigQuery CDC -> Anomaly detection python flex template. Mar 11, 2026

initial

d0786b4

claudevdm force-pushed the bqmonitor branch from 393de61 to d0786b4 Compare March 12, 2026 16:27

claudevdm marked this pull request as ready for review March 12, 2026 17:27

claudevdm requested review from Abacn, shunping and tvalentyn March 12, 2026 17:27

Abacn reviewed Mar 16, 2026

View reviewed changes

claudevdm added 8 commits March 17, 2026 07:31

optimizations

382c783

fix metrics

dd5338f

remove custom sharding

10aa11d

add fanout strategies

52383ed

Write to BQ

a70b290

more

9fdcf62

Use STREAMING_INSERTS for BQ sink, remove write_method option

ee8c847

STORAGE_WRITE_API requires Java (xlang expansion service) which is not available in the Python Flex Template container. STREAMING_INSERTS is pure Python and sufficient for the low-volume aggregated results sink.
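
A minimal sketch of what pinning the sink to streaming inserts might look like in a Beam Python pipeline. The helper names `bq_sink_kwargs` and `build_sink`, and the table and schema values in the usage note, are hypothetical; the PR's actual sink code is not shown in this thread.

```python
from typing import Any, Dict


def bq_sink_kwargs(table: str, schema: str) -> Dict[str, Any]:
    """Keyword arguments for a BigQuery sink that stays in pure Python."""
    return {
        "table": table,
        "schema": schema,
        # Fixed on purpose: STORAGE_WRITE_API would need a Java expansion
        # service, which the commit message says the container lacks.
        "method": "STREAMING_INSERTS",
        "create_disposition": "CREATE_IF_NEEDED",
        "write_disposition": "WRITE_APPEND",
    }


def build_sink(table: str, schema: str):
    """Build the write transform; apache_beam is imported lazily so the
    helper above can be used and tested without Beam installed."""
    import apache_beam as beam

    return beam.io.WriteToBigQuery(**bq_sink_kwargs(table, schema))
```

A pipeline would then apply something like `results | build_sink("project:dataset.anomalies", "metric:STRING,score:FLOAT")`. Hard-coding the method rather than exposing it as an option matches the commit's removal of `write_method`.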

claudevdm force-pushed the bqmonitor branch from ad0b4d7 to ee8c847 Compare March 19, 2026 17:15

spotless

a872482

claudevdm added 7 commits March 19, 2026 14:05

Preflight: validate metric spec columns exist in source table

af34fea

Dry-run CDC query now SELECTs the columns referenced by the metric spec (via required_source_columns()) instead of SELECT 1. This catches missing/misspelled column names at launch time rather than at runtime.
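
The preflight idea can be sketched as follows, assuming the required column names are available as a list of strings. `build_preflight_query` is a hypothetical helper, and the sketch selects from the table directly; the shape of the PR's actual CDC query is not shown in this thread.

```python
import re
from typing import Iterable

# Assumption: metric specs reference simple, unqualified column names.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_preflight_query(table: str, columns: Iterable[str]) -> str:
    """Build a query that names every required column explicitly.

    Submitted as a dry run, such a query is validated without being
    executed, so a missing or misspelled column fails at launch time.
    """
    cols = sorted(set(columns))
    for col in cols:
        if not _IDENTIFIER.match(col):
            raise ValueError(f"unsupported column name: {col!r}")
    # With no required columns, fall back to the older SELECT 1 probe.
    select_list = ", ".join(f"`{c}`" for c in cols) if cols else "1"
    return f"SELECT {select_list} FROM `{table}` LIMIT 0"
```

With the google-cloud-bigquery client, the resulting string would typically be submitted with a job config that has `dry_run=True`, so validation happens without running the query.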

Remove check

757ba85

Rename requirements.txt to requirements_all.txt, fix metric.py import

1c073ff

Support configurable requirements filename (requirements_all.txt)

29feb09

fix python version restrict

7dce245

Merge pr1-python-staging-fixes into bqmonitor

89f3881

Add runner side precombine

379947f

claudevdm closed this Mar 23, 2026

claudevdm wants to merge 17 commits into GoogleCloudPlatform:main from claudevdm:bqmonitor
