Fix GlueJobHook failing to update a Glue job that has tags #68711

Open
FrankYang0529 wants to merge 1 commit into apache:main from FrankYang0529:airflow-68676

@FrankYang0529

Member

When GlueJobOperator / GlueJobHook runs with update_config=True against a job that already exists and whose create_job_kwargs include Tags, the update fails.

AWS Glue's CreateJob accepts a top-level Tags parameter, but UpdateJob only accepts JobName and a JobUpdate structure. The hook passed the whole config (including Tags) straight into JobUpdate, so botocore rejected the call with a ParamValidationError. The result was an inconsistency: tagging a job worked on first creation but broke on every subsequent update.
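The mismatch between the two calls can be sketched as a small split of the CreateJob-style config. This is only an illustration under assumptions: `split_job_config` is a hypothetical helper name, not the hook's actual method, and the sample config values are made up.

```python
from typing import Any


def split_job_config(config: dict[str, Any]) -> tuple[str, dict[str, Any], dict[str, str]]:
    """Split a CreateJob-style config into (job_name, job_update, tags).

    Hypothetical helper for illustration; not the hook's real code.
    """
    job_update = dict(config)  # shallow copy so the caller's dict is untouched
    job_name = job_update.pop("Name")  # UpdateJob takes the name as JobName
    tags = job_update.pop("Tags", None) or {}  # Tags is not valid inside JobUpdate
    return job_name, job_update, tags


# Made-up sample config, shaped like create_job_kwargs with Tags.
config = {
    "Name": "nightly-etl",
    "Role": "arn:aws:iam::123456789012:role/glue-role",
    "Command": {"Name": "glueetl", "ScriptLocation": "s3://example-bucket/etl.py"},
    "Tags": {"team": "data", "env": "prod"},
}
job_name, job_update, tags = split_job_config(config)
```

With the split done, `job_update` could be passed as `JobUpdate` alongside `JobName=job_name`, while `tags` is handled separately.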

This PR strips Tags out of the JobUpdate payload and reconciles them through the dedicated tag APIs (TagResource / UntagResource), mirroring how GlueCrawlerHook already handles the same Glue limitation for crawlers. Tag additions, value changes, and removals are all applied. update_job now also reports that an update happened when only the tags change.
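A minimal sketch of that reconciliation, assuming the Glue client methods `get_tags`, `tag_resource`, and `untag_resource` with their boto3 keyword arguments. `FakeGlueClient` and `reconcile_tags` are hypothetical names used only so the example runs without AWS access; the ARN and tag values are made up.

```python
class FakeGlueClient:
    """In-memory stand-in for the boto3 Glue client (illustration only)."""

    def __init__(self, tags):
        self.tags = dict(tags)

    def get_tags(self, ResourceArn):
        return {"Tags": dict(self.tags)}

    def tag_resource(self, ResourceArn, TagsToAdd):
        self.tags.update(TagsToAdd)

    def untag_resource(self, ResourceArn, TagsToRemove):
        for key in TagsToRemove:
            self.tags.pop(key, None)


def reconcile_tags(client, job_arn, desired):
    """Make the job's tags match `desired`; return True if anything changed."""
    current = client.get_tags(ResourceArn=job_arn).get("Tags", {})
    # New keys and keys whose value differs are (re)written.
    to_add = {k: v for k, v in desired.items() if current.get(k) != v}
    # Keys present on the job but absent from the desired set are removed.
    to_remove = [k for k in current if k not in desired]
    if to_add:
        client.tag_resource(ResourceArn=job_arn, TagsToAdd=to_add)
    if to_remove:
        client.untag_resource(ResourceArn=job_arn, TagsToRemove=to_remove)
    return bool(to_add or to_remove)


client = FakeGlueClient({"team": "data", "env": "dev", "owner": "alice"})
arn = "arn:aws:glue:us-east-1:123456789012:job/nightly-etl"
changed = reconcile_tags(client, arn, {"team": "data", "env": "prod"})
changed_again = reconcile_tags(client, arn, {"team": "data", "env": "prod"})
```

Returning a boolean is one way to let the caller report an update when only the tags changed; a second call with the same desired tags makes no API writes.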

closes: #68676

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [Claude Code with Opus 4.8] following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

Signed-off-by: PoAn Yang <payang@apache.org>
Comment on lines +522 to +530
tags_to_add = {key: value for key, value in job_tags.items() if current_tags.get(key) != value}
tags_to_remove = [key for key in current_tags if key not in job_tags]

if tags_to_add:
    self.log.info("Updating job tags: %s", job_name)
    self.conn.tag_resource(ResourceArn=job_arn, TagsToAdd=tags_to_add)
if tags_to_remove:
    self.log.info("Removing job tags: %s", job_name)
    self.conn.untag_resource(ResourceArn=job_arn, TagsToRemove=tags_to_remove)

Member

Suggested change
- tags_to_add = {key: value for key, value in job_tags.items() if current_tags.get(key) != value}
- tags_to_remove = [key for key in current_tags if key not in job_tags]
- if tags_to_add:
-     self.log.info("Updating job tags: %s", job_name)
-     self.conn.tag_resource(ResourceArn=job_arn, TagsToAdd=tags_to_add)
- if tags_to_remove:
-     self.log.info("Removing job tags: %s", job_name)
-     self.conn.untag_resource(ResourceArn=job_arn, TagsToRemove=tags_to_remove)
+ if (tags_to_add := {key: value for key, value in job_tags.items() if current_tags.get(key) != value}):
+     self.log.info("Updating job tags: %s", job_name)
+     self.conn.tag_resource(ResourceArn=job_arn, TagsToAdd=tags_to_add)
+ if (tags_to_remove := [key for key in current_tags if key not in job_tags]):
+     self.log.info("Removing job tags: %s", job_name)
+     self.conn.untag_resource(ResourceArn=job_arn, TagsToRemove=tags_to_remove)

Labels

area:providers provider:amazon AWS/Amazon - related issues

Development

Successfully merging this pull request may close these issues.

Airflow GlueJobOperator support tags during creation of job however failed while updating the job with tags

2 participants