Data Analysis of Potential Duplicate Authors #10438
Open
Labels: Lead: @RayBB, Needs: Help, Needs: Response, Type: Feature Request
Proposal
Here is a list of 100k+ authors who share exactly the same name and whose IDs differ by only one. These are likely duplicate authors created by a race condition. As of 2024, few new instances of this problem appear, but the old instances still need to be fixed.
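The matching rule described above (identical name, IDs one apart) can be sketched as follows. This is a minimal illustration, not existing Open Library code: it assumes author keys of the form `/authors/OL<number>A`, compares names by exact string equality, and the function names `author_number` and `adjacent_same_name_pairs` are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: author keys look like "/authors/OL123A".
AUTHOR_KEY_RE = re.compile(r"^/authors/OL(\d+)A$")


def author_number(key: str) -> Optional[int]:
    """Return the numeric part of an author key, or None if the key doesn't match."""
    match = AUTHOR_KEY_RE.match(key)
    return int(match.group(1)) if match else None


def adjacent_same_name_pairs(authors: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (key, next_key) pairs whose IDs differ by exactly one and whose names are identical."""
    by_number = {}
    for key, name in authors:
        number = author_number(key)
        if number is not None:
            by_number[number] = (key, name)

    pairs = []
    for number in sorted(by_number):
        current = by_number[number]
        following = by_number.get(number + 1)  # the author whose ID is exactly one higher
        if following is not None and following[1] == current[1]:
            pairs.append((current[0], following[0]))
    return pairs


sample = [
    ("/authors/OL100A", "Jane Doe"),
    ("/authors/OL101A", "Jane Doe"),
    ("/authors/OL102A", "John Roe"),
    ("/authors/OL200A", "Jane Doe"),
]
print(adjacent_same_name_pairs(sample))
# → [('/authors/OL100A', '/authors/OL101A')]
```

A run of three or more consecutive IDs with the same name yields one pair per adjacent step, so such clusters show up as chained pairs rather than a single group.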
Here's what someone should do:
From there, staff can decide whether to do an automated merge.
To work on this, please use the data dumps; do not call the API.
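Reading names from a data dump instead of the API might look like the sketch below. It assumes the gzipped, tab-separated dump layout of `type`, `key`, `revision`, `last_modified`, and a JSON record per line, with author records typed `/type/author`; the file name in the usage note and the function name `iter_author_names` are illustrative assumptions, not confirmed by this issue.

```python
import gzip
import json
from typing import Iterator, Tuple


def iter_author_names(dump_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (key, name) for each named author record in a gzipped dump.

    Assumed line layout: type <TAB> key <TAB> revision <TAB> last_modified <TAB> JSON.
    """
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:
            # maxsplit=4 keeps the JSON column intact even if the layout changes slightly
            fields = line.rstrip("\n").split("\t", 4)
            if len(fields) != 5 or fields[0] != "/type/author":
                continue  # skip malformed lines and non-author records (e.g. redirects)
            record = json.loads(fields[4])
            name = record.get("name")
            if name:  # authors without a name can't be compared by name
                yield fields[1], name
```

Usage, assuming a locally downloaded authors dump: `for key, name in iter_author_names("ol_dump_authors_latest.txt.gz"): ...`. Streaming line by line keeps memory proportional to what you choose to retain, which matters for a dump with millions of records.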
CSV of authors, in case you can't see the file on Slack:
ids_next_to_each_other.csv
Justification
No response
Breakdown
Requirements Checklist
Related files
Stakeholders
Instructions for Contributors
Please run these commands to ensure your repository is up to date before creating a new branch for this issue, and again each time after pushing code to GitHub, because the pre-commit bot may add commits to your PRs upstream.