Add macOS and Windows support #1

@bnovik0v

Goal

Make ghst run on macOS and Windows, not just Linux/PipeWire.

Current state

Only one place in the code is hard-coded to Linux: startCapture / findDefaultSink / ensureMonitorVolumeFull in src/main/index.ts (lines ~66-110). It spawns pw-record and shells out to pactl. The platform guard at line 68 throws on non-Linux.

Everything else is already cross-platform:

  • setContentProtection(true) is already gated on darwin / win32 (src/main/index.ts:161).
  • Hotkeys use CommandOrControl+....
  • keyStore.ts uses Electron safeStorage — Keychain (macOS) / DPAPI (Windows) work out of the box.
  • The VAD / Whisper / copilot pipeline in src/core/* and the worker renderer is pure web APIs.

Missing:

  • mac and win targets in package.json#build.
  • Cross-platform audio capture strategy.
  • CI matrix / signing / notarization.
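For orientation, the missing targets might look roughly like the sketch below. It is the shape of the package.json "build" object written as a TypeScript literal, using the values proposed later in this issue. The entitlements path and the description strings are placeholders, not taken from the repo, and electron-builder's key for Info.plist additions is assumed to be mac.extendInfo.

```typescript
// Sketch of package.json#build as a TypeScript literal (assumed values).
// Paths and description strings are placeholders, not from the ghst repo.
const build = {
  mac: {
    // dmg + zip for both Apple Silicon and Intel
    target: [
      { target: "dmg", arch: ["arm64", "x64"] },
      { target: "zip", arch: ["arm64", "x64"] },
    ],
    hardenedRuntime: true,
    entitlements: "build/entitlements.mac.plist", // hypothetical path
    entitlementsInherit: "build/entitlements.mac.plist", // hypothetical path
    // Merged into Info.plist at package time
    extendInfo: {
      NSMicrophoneUsageDescription: "Placeholder: why audio input is needed.",
      NSScreenCaptureUsageDescription: "Placeholder: why screen capture is needed.",
    },
  },
  win: {
    target: [{ target: "nsis", arch: ["x64"] }],
  },
};
```

Notarization settings are left out on purpose, since their shape depends on the electron-builder version and on how credentials are supplied in CI.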

Proposed approach: electron-audio-loopback in the worker renderer

The README and CLAUDE.md already reference this as the planned cross-platform path. It exposes getDisplayMedia({ audio: true }) with system-audio loopback on macOS 13+ (via ScreenCaptureKit) and Windows 10+ (WASAPI loopback). The captured MediaStream plugs directly into the existing AudioWorklet → VAD → Whisper pipeline that already lives in the worker renderer.

Pros: pure JS, no native module to build/sign/notarize.
Cons: macOS requires Screen Recording permission; a system picker may appear on first capture per session.

Native-module alternatives (ScreenCaptureKit addon on macOS, WASAPI addon on Windows) are possible but add a large build/sign burden. Skip unless the loopback UX is unacceptable.
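A minimal sketch of the renderer-side acquisition step, assuming the loopback stream arrives through a getDisplayMedia call. All names here (acquireLoopbackAudio, keepAudioOnly, the *Like interfaces) are hypothetical, and the media source is injected so the logic does not depend on DOM typings. Requesting video alongside audio is an assumption: some Chromium versions reject audio-only display capture, so the video track is requested and then stopped.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface TrackLike {
  stop(): void;
}
interface StreamLike {
  getAudioTracks(): TrackLike[];
  getVideoTracks(): TrackLike[];
}
interface DisplayMediaSource {
  getDisplayMedia(constraints: { audio: boolean; video: boolean }): Promise<StreamLike>;
}

// Stops every video track and returns the first audio track.
// Throws if the stream carries no audio (e.g. permission denied for system audio).
function keepAudioOnly(stream: StreamLike): TrackLike {
  for (const videoTrack of stream.getVideoTracks()) {
    videoTrack.stop();
  }
  const audioTracks = stream.getAudioTracks();
  if (audioTracks.length === 0) {
    throw new Error("Loopback stream has no audio track");
  }
  return audioTracks[0];
}

// Requests a display-media stream and keeps only its audio track.
async function acquireLoopbackAudio(source: DisplayMediaSource): Promise<TrackLike> {
  const stream = await source.getDisplayMedia({ audio: true, video: true });
  return keepAudioOnly(stream);
}
```

In the worker renderer, navigator.mediaDevices would be passed as the source, and the resulting track could be wrapped in a MediaStream and connected to an AudioContext created with sampleRate: 16000 before it reaches the existing VAD pipeline.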

Plan

  1. Refactor startCapture into a CaptureStrategy interface under src/main/capture/, with linux-pipewire.ts (existing logic) and a new strategy that signals the worker renderer to start a getDisplayMedia capture. Pick strategy by process.platform at app.whenReady().
  2. On non-Linux, the worker renderer owns the audio source: it pipes the loopback MediaStreamTrack into an AudioContext at 16 kHz and feeds the existing VAD pipeline. Main just sends cmd:capture-start / cmd:capture-stop. Linux keeps the existing pw-record path in main.
  3. Extend package.json#build with:
    • mac: dmg + zip, arm64 and x64, hardened runtime, entitlements for audio input + screen recording, notarization config.
    • win: nsis, x64.
  4. Info.plist additions (electron-builder's mac.extendInfo): NSMicrophoneUsageDescription and NSScreenCaptureUsageDescription.
  5. Extend .github/workflows/release.yml matrix with macos-14 (arm64), macos-13 (x64), windows-latest. Add secrets for Apple ID notarization (and optionally Windows code-signing cert).
  6. README: drop the Linux-only badge, add per-OS install sections, document the macOS Screen Recording permission step.
  7. Tests: unit test for the strategy selector. Capture strategies themselves are integration-only.
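Steps 1, 2 and 7 could be sketched as follows. This is an illustration under assumptions, not the ghst implementation: CaptureStrategy is the interface name from step 1, but selectCaptureStrategy, CaptureDeps and the two factory functions are hypothetical, and the pw-record logic and IPC send are injected so the selector stays unit-testable without Electron.

```typescript
// Hypothetical sketch: names other than CaptureStrategy and the
// cmd:capture-* channels are illustrative, not taken from the ghst source.
type CaptureCommand = "cmd:capture-start" | "cmd:capture-stop";

interface CaptureStrategy {
  readonly name: string;
  start(): void;
  stop(): void;
}

interface CaptureDeps {
  // Starts the existing pw-record capture and returns a function that stops it.
  startPipewire: () => () => void;
  // Sends a command to the worker renderer (e.g. via webContents.send).
  sendToWorker: (command: CaptureCommand) => void;
}

// Linux: main owns the audio source, as it does today.
function createLinuxPipewireStrategy(deps: CaptureDeps): CaptureStrategy {
  let stopPipewire: (() => void) | null = null;
  return {
    name: "linux-pipewire",
    start() {
      if (stopPipewire === null) stopPipewire = deps.startPipewire();
    },
    stop() {
      if (stopPipewire !== null) {
        stopPipewire();
        stopPipewire = null;
      }
    },
  };
}

// macOS / Windows: the worker renderer owns the audio source; main only signals it.
function createRendererLoopbackStrategy(deps: CaptureDeps): CaptureStrategy {
  return {
    name: "renderer-loopback",
    start() {
      deps.sendToWorker("cmd:capture-start");
    },
    stop() {
      deps.sendToWorker("cmd:capture-stop");
    },
  };
}

// Picks a strategy from a process.platform value.
function selectCaptureStrategy(platform: string, deps: CaptureDeps): CaptureStrategy {
  switch (platform) {
    case "linux":
      return createLinuxPipewireStrategy(deps);
    case "darwin":
    case "win32":
      return createRendererLoopbackStrategy(deps);
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

Main would call selectCaptureStrategy(process.platform, deps) once the app is ready. Taking the platform as a plain string argument is what makes the selector testable on any CI runner.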

Open questions

  • Minimum macOS version: recommend 13+ (clean ScreenCaptureKit loopback). Drop 12?
  • Apple Developer ID for notarization ($99/yr) — required to ship without Gatekeeper warnings. Acceptable?
  • Windows code-signing cert ($200-400/yr EV) — optional but avoids SmartScreen warnings.

Out of scope for this issue

  • Linux/Wayland ghost-mode (still no per-window screencast exclude API).
  • Migrating away from Groq Whisper.
