Prototype BigQuery CDC -> Anomaly detection python flex template.#3479
Prototype BigQuery CDC -> Anomaly detection python flex template.#3479claudevdm wants to merge 17 commits intoGoogleCloudPlatform:mainfrom
Conversation
7229c10 to
393de61
Compare
|
Warning Gemini is experiencing higher than usual traffic and was unable to create the summary. Please try again in a few hours by commenting |
|
/gemini summary |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3479 +/- ##
=============================================
+ Coverage 33.37% 52.20% +18.82%
- Complexity 481 6050 +5569
=============================================
Files 215 1040 +825
Lines 12816 62977 +50161
Branches 1249 6899 +5650
=============================================
+ Hits 4277 32874 +28597
- Misses 8203 27876 +19673
- Partials 336 2227 +1891
🚀 New features to boost your workflow:
|
|
Warning Gemini is experiencing higher than usual traffic and was unable to create the summary. Please try again in a few hours by commenting |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new BigQuery Anomaly Detection template, which is a significant and valuable addition. The changes are well-structured, covering both Java and Python components, and include comprehensive unit and integration tests. The safe_eval module is particularly well-designed for secure expression evaluation, and the updates to Dockerfile generation enhance the flexibility of Python template deployment. The new README provides excellent documentation for users. Overall, this is a high-quality contribution.
|
@shunping can you please take a look at the anomaly detection parts? |
|
@claudevdm would it make sense to break up this contribution into several commits that can be reviewed one-by-one? |
| File dockerfile = new File(dockerfilePath); | ||
| if (!dockerfile.exists()) { | ||
| List<String> filesToCopy = List.of(definition.getTemplateAnnotation().filesToCopy()); | ||
| List<String> allFilesToCopy = List.of(definition.getTemplateAnnotation().filesToCopy()); |
There was a problem hiding this comment.
In terms of idea of splitting the PR, changes to existing file/infra can go in first while reviewing new templates is still in progress.
I see some fixes to stageFlexPythonTemplate (e.g. honor directoriesToCopy). As you may have noticed we currently do not build or release any Python templates. Existing ones have been commented out
as suggested in comment, they might never worked before. Wondering we can just get rid of these dead code (or revive them after staging being fixed, not in scope of this PR though)
|
I will think about how to split. The CDC/IO source is already reviewed in apache/beam#37724 so reviewers can ignore that part. I forked it here until it rolls out in a beam release |
…correct pip args - DockerfileGenerator: add setSetupFile() for FLEX_TEMPLATE_PYTHON_SETUP_FILE env and pip install of setup.py packages - Dockerfile-template-python: use ARG REQUIREMENTS_FILE instead of ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE to avoid launcher re-resolution; add directoriesToCopy and setupInstall placeholders; fix pip arg order - TemplatesStageMojo: separate files from directories in filesToCopy, auto-detect setup.py, fix empty entryPoint default, use outputClassesDirectory for Dockerfile generation - DockerfileGeneratorTest: update assertions for new pip command format
STORAGE_WRITE_API requires Java (xlang expansion service) which is not available in the Python Flex Template container. STREAMING_INSERTS is pure Python and sufficient for the low-volume aggregated results sink.
Dry-run CDC query now SELECTs the columns referenced by the metric spec (via required_source_columns()) instead of SELECT 1. This catches missing/misspelled column names at launch time rather than at runtime.
No description provided.