Useful Commands

Extract Unique Image Paths from CSV

This command extracts all unique image paths from the pretraining annotations CSV file, saves them to a separate file, and counts the total number of unique images.

tail -n +2 pretraining_annotations.csv | cut -d',' -f1 | sort -u | tee unique_image_paths.txt | wc -l

What it does:

  • tail -n +2 - Skip the header row
  • cut -d',' -f1 - Extract the first column (image_path); this assumes the paths contain no quoted commas, since cut splits on every comma
  • sort -u - Sort and keep only unique values
  • tee unique_image_paths.txt - Save the unique paths to a file while passing them through
  • wc -l - Count the number of lines (unique images)

Expected output:

4193

This indicates there are 4,193 unique image paths in the dataset. The unique paths are saved to unique_image_paths.txt with one path per line.
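
To see how the pipeline handles the header row and duplicate paths, it can be run on a tiny made-up CSV first. The following is a minimal sketch: the file names and rows are hypothetical sample data, not taken from the real annotations file, and it assumes the first column contains no quoted commas.

```shell
# Hypothetical sample data: three rows, two distinct image paths.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.csv" <<'EOF'
image_path,label
tiles/img_001.tif,tree
tiles/img_002.tif,tree
tiles/img_001.tif,shrub
EOF

# Same pipeline as above, pointed at the sample file.
# tr strips the padding some wc implementations add around the number.
count=$(tail -n +2 "$tmpdir/sample.csv" | cut -d',' -f1 | sort -u \
  | tee "$tmpdir/unique.txt" | wc -l | tr -d ' ')

echo "$count"             # number of unique paths
cat "$tmpdir/unique.txt"  # the deduplicated, sorted paths
```

On this sample the count is 2, because the header is skipped and the repeated tiles/img_001.tif row collapses to a single entry.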

Download NEON AOP Data with rsync

When downloading many files from a remote server that requires 2FA, use rsync with --files-from to download all files in a single SSH session.

Dry Run (Test First)

rsync -avzh --progress --dry-run --files-from=neon_aop_urls.txt user@hostname:/remote/base/path/ ./local_destination/

Actual Download

rsync -avzh --progress --files-from=neon_aop_urls.txt user@hostname:/remote/base/path/ ./local_destination/

Command breakdown:

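  • -a = archive mode (preserves permissions, timestamps, links); when combined with --files-from it does not imply recursion, so add -r if the list names directories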
  • -v = verbose (shows files being processed)
  • -z = compress during transfer
  • -h = human-readable file sizes
  • --progress = shows transfer progress for each file
  • --dry-run = test run without actually downloading (remove for real download)
  • --files-from=neon_aop_urls.txt = read the list of files to transfer from a text file (one path per line, relative to /remote/base/path/)

Replace these placeholders:

  • user@hostname = your SSH username and server hostname
  • /remote/base/path/ = base directory on remote server where files are located
  • ./local_destination/ = local directory where files should be saved
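
Because rsync expects plain paths in the file list rather than URLs, a quick look at the list before the dry run can catch mistakes early. The sketch below is one possible preflight check, not part of the original workflow: check_file_list is a hypothetical helper name, and the demo entries are made up.

```shell
# Hypothetical helper: summarize a file list before handing it to rsync.
check_file_list() {
  list="$1"
  # Non-blank lines, i.e. the entries that name something to transfer.
  entries=$(grep -c -v '^[[:space:]]*$' "$list" || true)
  # Lines that look like URLs (scheme://...) rather than relative paths.
  url_like=$(grep -c -E '^[A-Za-z][A-Za-z0-9+.-]*://' "$list" || true)
  echo "entries=$entries url_like=$url_like"
}

# Demo on made-up data; point it at neon_aop_urls.txt in practice.
demo_list=$(mktemp)
printf '%s\n' 'site_a/tile_1.tif' 'https://example.org/tile_2.tif' '' \
  'site_b/tile_3.tif' > "$demo_list"
summary=$(check_file_list "$demo_list")
echo "$summary"
```

A non-zero url_like count suggests the list still holds URLs that need to be converted to paths relative to the remote base directory.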

Benefits:

  • Single 2FA authentication for all files
  • Maintains directory structure from file list
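  • Re-running the same command after an interruption skips files that already finished (add --partial to keep partially transferred files)
  • Skips files that already exist locally with the same size and modification time
  • Compression (-z) can reduce transfer time for data that is not already compressed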
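
If the connection tends to drop, the download can be wrapped in a small retry loop. This is a sketch under assumptions: retry and RETRY_DELAY are hypothetical names introduced here, and each new attempt opens a new SSH session, so the server will most likely ask for 2FA again.

```shell
# Hypothetical helper: run a command until it succeeds or the
# attempt limit is reached. Usage: retry MAX_ATTEMPTS command [args...]
retry() {
  max_attempts="$1"
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Pause between attempts; defaults to 5 seconds.
    sleep "${RETRY_DELAY:-5}"
  done
}

# Example (placeholders as above; --partial keeps partially transferred files):
# retry 3 rsync -avzh --partial --progress --files-from=neon_aop_urls.txt \
#   user@hostname:/remote/base/path/ ./local_destination/
```

Files that completed in an earlier attempt are skipped on the next one, so only the remaining files are transferred.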