Add a REPL to:
- manage the main API server and let the user discover and handle worker nodes
- manage local model storage and the download of restricted models from Hugging Face (HF)
- trace the call stack and get performance information at different levels (including interpreter level, with sys.setprofile)
- trace individual subsystems (prefill, compute, prefetch) and specific components (tokenizer, attention, etc.)
- isolate and benchmark individual subsystems and components with dummy data
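The sketch below shows one possible shape for the tracing and benchmarking items in the list above. It assumes a Python codebase, since the list mentions sys.setprofile, and uses only the standard-library cmd, sys and timeit modules. DevShell, CallProfiler, the component registry and the dummy workloads (dummy_tokenize, dummy_attention) are hypothetical placeholders and are not part of the project. The profiler records inclusive wall-clock time per Python function and ignores C-level calls.

```python
import cmd
import sys
import time
import timeit
from collections import defaultdict


class CallProfiler:
    """Hypothetical helper: inclusive wall-clock time per Python function via sys.setprofile."""

    def __init__(self):
        self.totals = defaultdict(float)  # (filename, function name) -> seconds, inclusive
        self.calls = defaultdict(int)     # (filename, function name) -> number of calls
        self._stack = []                  # start times of the Python frames currently active

    def _hook(self, frame, event, arg):
        # Only Python-level events are used; "c_call"/"c_return" events are ignored.
        if event == "call":
            self._stack.append(time.perf_counter())
        elif event == "return" and self._stack:
            start = self._stack.pop()
            key = (frame.f_code.co_filename, frame.f_code.co_name)
            self.totals[key] += time.perf_counter() - start
            self.calls[key] += 1

    def run(self, fn, *args, **kwargs):
        """Call fn with the profile hook installed, then always remove the hook."""
        sys.setprofile(self._hook)
        try:
            return fn(*args, **kwargs)
        finally:
            sys.setprofile(None)


# Hypothetical stand-ins for real components, fed with dummy data.
def dummy_tokenize(text):
    return text.split()


def dummy_attention(tokens):
    n = len(tokens)
    return [[float(i == j) for j in range(n)] for i in range(n)]


class DevShell(cmd.Cmd):
    intro = "Type help or ? to list commands."
    prompt = "(repl) "

    # Hypothetical registry: component name -> zero-argument callable on dummy data.
    components = {
        "tokenizer": lambda: dummy_tokenize("the quick brown fox " * 50),
        "attention": lambda: dummy_attention(list(range(64))),
    }

    def do_list(self, arg):
        """list: show the components that can be profiled or benchmarked."""
        print(" ".join(sorted(self.components)))

    def do_profile(self, arg):
        """profile NAME: run one component under sys.setprofile and print per-function timings."""
        fn = self.components.get(arg.strip())
        if fn is None:
            print(f"unknown component: {arg!r}")
            return
        prof = CallProfiler()
        prof.run(fn)
        # Slowest functions first.
        for (path, name), secs in sorted(prof.totals.items(), key=lambda kv: -kv[1]):
            print(f"{name:<20} {prof.calls[(path, name)]:>6} calls {secs * 1e3:8.3f} ms")

    def do_bench(self, arg):
        """bench NAME [N]: time NAME in isolation over N runs (default 100) with timeit."""
        parts = arg.split()
        if not parts or parts[0] not in self.components:
            print(f"unknown component: {arg!r}")
            return
        try:
            n = int(parts[1]) if len(parts) > 1 else 100
        except ValueError:
            print(f"not an integer: {parts[1]!r}")
            return
        total = timeit.timeit(self.components[parts[0]], number=n)
        print(f"{parts[0]}: {total / n * 1e6:.1f} us/run over {n} runs")

    def do_quit(self, arg):
        """quit: exit the shell."""
        return True


if __name__ == "__main__":
    DevShell().cmdloop()
```

A session might run `list`, then `profile attention` for a per-function breakdown, then `bench tokenizer 1000` for an isolated timing. Because cmd.Cmd maps each `do_*` method to a command and uses its docstring as the `help` text, the server, worker-node and model-storage commands from the list could be added as further `do_*` methods in the same way.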