Refactor bio.tools sync: add protected fields validation, BioToolsClient, and improve update logic#51
Conversation
6e9d5c7 to
23d29f5
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors the gh2biotools sync script by introducing a BioToolsClient abstraction, adds retry/rate-limiting behavior, and introduces a shell validator to prevent modifications to protected fields in .biotools.json files (skipping checks for newly added files).
Changes:
- Added
scripts/runbiotools/validate-biotools-fields.shto detect protected-field changes betweenGITHUB_BEFORE_SHAand the current workspace. - Refactored
scripts/runbiotools/gh2biotools.pyinto aBioToolsClientwith improved logging, timeouts, and retry support. - Updated sync logic to categorize results and exit non-zero when failures occur.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/runbiotools/validate-biotools-fields.sh | Adds protected-field diff validation for existing .biotools.json files. |
| scripts/runbiotools/gh2biotools.py | Refactors bio.tools syncing into a client class with retries, rate limiting, and improved reporting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| elif response.status_code == 404: | ||
| return None | ||
| else: | ||
| logging.error( | ||
| f"Error retrieving tool {tool_id}: {response.status_code} {response.text}" | ||
| ) | ||
| return None |
There was a problem hiding this comment.
@anitamnd this is a good point. if I get the logic correctly, if return code is not 200 nor 404 it should raise requests.RequestException right?
| existing_tool_clean = remap(existing_tool, lambda p, k, v: bool(v)) | ||
| payload_dict_clean = remap(payload_dict, lambda p, k, v: bool(v)) | ||
|
|
||
| if existing_tool_clean == payload_dict_clean: | ||
| logging.info(f"Tool {tool_id} is unchanged") | ||
| return tool_id, "unchanged" | ||
|
|
||
| # Validate and update | ||
| if not self.validate_update(tool_id, payload_dict): | ||
| return tool_id, "failed_validation" | ||
|
|
||
| success = self.update_tool(payload_dict) |
| # Check if this is a new file or existing file | ||
| if ! git show "${GITHUB_BEFORE_SHA}:${file}" > old.json 2>/dev/null; then | ||
| # NEW FILE | ||
| echo "ℹ New file detected: $file" | ||
| echo "ℹ Skipping validation for protected fields..." | ||
| else |
| failed=false | ||
|
|
||
| for file in "$@"; do | ||
| echo "Validating $file..." | ||
|
|
||
| # Check if this is a new file or existing file | ||
| if ! git show "${GITHUB_BEFORE_SHA}:${file}" > old.json 2>/dev/null; then | ||
| # NEW FILE | ||
| echo "ℹ New file detected: $file" | ||
| echo "ℹ Skipping validation for protected fields..." | ||
| else | ||
| # EXISTING FILE | ||
| echo "ℹ Existing file: $file" | ||
| cat "$file" > new.json | ||
|
|
||
| for field in "${PROTECTED_FIELDS[@]}"; do | ||
| old_val=$(jq -r ".$field // empty" old.json 2>/dev/null) | ||
| new_val=$(jq -r ".$field // empty" new.json 2>/dev/null) | ||
| if [ "$old_val" != "$new_val" ]; then | ||
| echo "::error file=$file::Protected field '$field' was modified (old: '$old_val', new: '$new_val')" | ||
| failed=true | ||
| fi | ||
| done | ||
|
|
||
| rm -f old.json new.json |
| for field in "${PROTECTED_FIELDS[@]}"; do | ||
| old_val=$(jq -r ".$field // empty" old.json 2>/dev/null) | ||
| new_val=$(jq -r ".$field // empty" new.json 2>/dev/null) | ||
| if [ "$old_val" != "$new_val" ]; then | ||
| echo "::error file=$file::Protected field '$field' was modified (old: '$old_val', new: '$new_val')" | ||
| failed=true | ||
| fi |
| elif response.status_code == 404: | ||
| return None | ||
| else: | ||
| logging.error( | ||
| f"Error retrieving tool {tool_id}: {response.status_code} {response.text}" | ||
| ) | ||
| return None |
There was a problem hiding this comment.
@anitamnd this is a good point. if I get the logic correctly, if return code is not 200 nor 404 it should raise requests.RequestException right?
| existing_tool_clean = remap(existing_tool, lambda p, k, v: bool(v)) | ||
| payload_dict_clean = remap(payload_dict, lambda p, k, v: bool(v)) | ||
|
|
||
| if existing_tool_clean == payload_dict_clean: | ||
| logging.info(f"Tool {tool_id} is unchanged") | ||
| return tool_id, "unchanged" | ||
|
|
||
| # Validate and update | ||
| if not self.validate_update(tool_id, payload_dict): | ||
| return tool_id, "failed_validation" | ||
|
|
||
| success = self.update_tool(payload_dict) |
This PR refactors the gh2biotools sync by introducing a BioToolsClient class while preserving the existing logic and brings some improvements: