This command extracts all unique image paths from the pretraining annotations CSV file, saves them to a separate file, and counts the total number of unique images.
    tail -n +2 pretraining_annotations.csv | cut -d',' -f1 | sort -u | tee unique_image_paths.txt | wc -l

What it does:
- `tail -n +2` - Skip the header row
- `cut -d',' -f1` - Extract the first column (image_path)
- `sort -u` - Sort and keep only unique values
- `tee unique_image_paths.txt` - Save the unique paths to a file while passing them through
- `wc -l` - Count the number of lines (unique images)
Expected output:
4193
This indicates there are 4,193 unique image paths in the dataset. The unique paths are saved to unique_image_paths.txt with one path per line.
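
To see how the stages fit together, the same pipeline can be tried on a small throwaway CSV. This is only a sketch: the sample rows, the `label` column, and the file name `sample_annotations.csv` are made up for illustration. It also assumes no field contains a quoted comma, since `cut` splits on every comma.

```shell
# Build a tiny sample CSV in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
cat > "$workdir/sample_annotations.csv" <<'EOF'
image_path,label
images/a.jpg,oak
images/b.jpg,pine
images/a.jpg,maple
images/c.jpg,oak
EOF

# Same pipeline as above, run against the sample file.
unique_count=$(tail -n +2 "$workdir/sample_annotations.csv" \
  | cut -d',' -f1 \
  | sort -u \
  | tee "$workdir/unique_image_paths.txt" \
  | wc -l)

# Some wc implementations pad the count with spaces; strip them.
unique_count=$(echo "$unique_count" | tr -d ' ')
echo "$unique_count"
```

The sample has four data rows but `images/a.jpg` appears twice, so the script prints 3 and writes three lines to `unique_image_paths.txt` in the temporary directory.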
When downloading many files from a remote server that requires 2FA, use rsync with --files-from to download all files in a single SSH session.
    rsync -avzh --progress --dry-run --files-from=neon_aop_urls.txt user@hostname:/remote/base/path/ ./local_destination/
    rsync -avzh --progress --files-from=neon_aop_urls.txt user@hostname:/remote/base/path/ ./local_destination/

The first command is a dry run that only lists what would be transferred; the second performs the actual download.

Command breakdown:
- `-a` = archive mode (preserves permissions, timestamps, links); note that with `--files-from`, `-a` does not imply recursion, so list individual files or add `-r`
- `-v` = verbose (shows files being processed)
- `-z` = compress during transfer
- `-h` = human-readable file sizes
- `--progress` = shows transfer progress for each file
- `--dry-run` = test run without actually downloading (remove for real download)
- `--files-from=neon_aop_urls.txt` = read file list from text file (one path per line, relative to the remote base path)
Replace these placeholders:
- `user@hostname` = your SSH username and server hostname
- `/remote/base/path/` = base directory on remote server where files are located
- `./local_destination/` = local directory where files should be saved
Benefits:
- Single 2FA authentication for all files
- Maintains directory structure from file list
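- Can be re-run after an interruption; files that were already transferred are skipped (add `--partial` to keep partially transferred files)
- Skips files that already exist locally with the same size and modification time
- Compression can reduce transfer time for compressible files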
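
After a transfer, it may be worth confirming that every path in the list actually arrived. The following is a minimal sketch, not part of rsync itself: the function name `check_missing` is hypothetical, and it assumes the list holds one path per line, relative to the base directory, with no comment lines.

```shell
# check_missing LIST DEST
# Prints each path from LIST that is not a regular file under DEST
# and returns non-zero if any are missing.
check_missing() {
  list_file=$1
  dest_dir=$2
  missing=0
  # The "|| [ -n ... ]" part handles a final line without a trailing newline.
  while IFS= read -r rel_path || [ -n "$rel_path" ]; do
    # Skip blank lines in the list.
    [ -z "$rel_path" ] && continue
    if [ ! -f "$dest_dir/$rel_path" ]; then
      echo "$rel_path"
      missing=1
    fi
  done < "$list_file"
  return "$missing"
}
```

For example, `check_missing neon_aop_urls.txt ./local_destination > missing.txt` would collect any paths that did not arrive, and `missing.txt` could then be passed to a second rsync run via `--files-from`.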