Skip to content

Refactor bio.tools sync: add protected fields validation, BioToolsClient, and improve update logic#51

Open
anitamnd wants to merge 4 commits into
research-software-ecosystem:mainfrom
anitamnd:fix/gh2biotools
Open

Refactor bio.tools sync: add protected fields validation, BioToolsClient, and improve update logic#51
anitamnd wants to merge 4 commits into
research-software-ecosystem:mainfrom
anitamnd:fix/gh2biotools

Conversation

@anitamnd
Copy link
Copy Markdown
Contributor

This PR refactors the gh2biotools sync by introducing a BioToolsClient class while preserving the existing logic and brings some improvements:

  • better error handling and logging for gh2biotools
  • adds a validation script to check for protected fields in .biotools.json files (moved from rsec/content repository)
  • skips protected fields validation for new/added .biotools.json files

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the gh2biotools sync script by introducing a BioToolsClient abstraction, adds retry/rate-limiting behavior, and introduces a shell validator to prevent modifications to protected fields in .biotools.json files (skipping checks for newly added files).

Changes:

  • Added scripts/runbiotools/validate-biotools-fields.sh to detect protected-field changes between GITHUB_BEFORE_SHA and the current workspace.
  • Refactored scripts/runbiotools/gh2biotools.py into a BioToolsClient with improved logging, timeouts, and retry support.
  • Updated sync logic to categorize results and exit non-zero when failures occur.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
scripts/runbiotools/validate-biotools-fields.sh Adds protected-field diff validation for existing .biotools.json files.
scripts/runbiotools/gh2biotools.py Refactors bio.tools syncing into a client class with retries, rate limiting, and improved reporting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +100 to +106
elif response.status_code == 404:
return None
else:
logging.error(
f"Error retrieving tool {tool_id}: {response.status_code} {response.text}"
)
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anitamnd this is a good point. if I get the logic correctly, if return code is not 200 nor 404 it should raise requests.RequestException right?

Comment on lines +268 to +279
existing_tool_clean = remap(existing_tool, lambda p, k, v: bool(v))
payload_dict_clean = remap(payload_dict, lambda p, k, v: bool(v))

if existing_tool_clean == payload_dict_clean:
logging.info(f"Tool {tool_id} is unchanged")
return tool_id, "unchanged"

# Validate and update
if not self.validate_update(tool_id, payload_dict):
return tool_id, "failed_validation"

success = self.update_tool(payload_dict)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be discussed @anitamnd ?

Comment on lines +30 to +35
# Check if this is a new file or existing file
if ! git show "${GITHUB_BEFORE_SHA}:${file}" > old.json 2>/dev/null; then
# NEW FILE
echo "ℹ New file detected: $file"
echo "ℹ Skipping validation for protected fields..."
else
Comment on lines +25 to +49
failed=false

for file in "$@"; do
echo "Validating $file..."

# Check if this is a new file or existing file
if ! git show "${GITHUB_BEFORE_SHA}:${file}" > old.json 2>/dev/null; then
# NEW FILE
echo "ℹ New file detected: $file"
echo "ℹ Skipping validation for protected fields..."
else
# EXISTING FILE
echo "ℹ Existing file: $file"
cat "$file" > new.json

for field in "${PROTECTED_FIELDS[@]}"; do
old_val=$(jq -r ".$field // empty" old.json 2>/dev/null)
new_val=$(jq -r ".$field // empty" new.json 2>/dev/null)
if [ "$old_val" != "$new_val" ]; then
echo "::error file=$file::Protected field '$field' was modified (old: '$old_val', new: '$new_val')"
failed=true
fi
done

rm -f old.json new.json
Comment on lines +40 to +46
for field in "${PROTECTED_FIELDS[@]}"; do
old_val=$(jq -r ".$field // empty" old.json 2>/dev/null)
new_val=$(jq -r ".$field // empty" new.json 2>/dev/null)
if [ "$old_val" != "$new_val" ]; then
echo "::error file=$file::Protected field '$field' was modified (old: '$old_val', new: '$new_val')"
failed=true
fi
Copy link
Copy Markdown
Contributor

@hmenager hmenager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @anitamnd !

Comment on lines +100 to +106
elif response.status_code == 404:
return None
else:
logging.error(
f"Error retrieving tool {tool_id}: {response.status_code} {response.text}"
)
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anitamnd this is a good point. if I get the logic correctly, if return code is not 200 nor 404 it should raise requests.RequestException right?

Comment on lines +268 to +279
existing_tool_clean = remap(existing_tool, lambda p, k, v: bool(v))
payload_dict_clean = remap(payload_dict, lambda p, k, v: bool(v))

if existing_tool_clean == payload_dict_clean:
logging.info(f"Tool {tool_id} is unchanged")
return tool_id, "unchanged"

# Validate and update
if not self.validate_update(tool_id, payload_dict):
return tool_id, "failed_validation"

success = self.update_tool(payload_dict)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be discussed @anitamnd ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

4 participants