feat: pluggable TTS engine interface by alichherawalla · Pull Request #247 · alichherawalla/off-grid-mobile-ai

alichherawalla · 2026-04-09T18:24:57Z

Summary

Introduces a pluggable TTS engine interface (TTSEngine) that decouples the app from any specific TTS implementation
Wraps Kokoro (ExecuTorch) and OuteTTS (llama.rn) as engine adapters behind a unified API
Rewrites the TTS store as a thin proxy that delegates to the active engine — no engine-specific branching
Adds engine picker UI to TTS Settings screen
Lays the foundation for a multimodal on-device engine SDK (OnDeviceEngine base generalizes to STT, Vision, LLM)

What changed

New files (engine layer):

src/engine/types.ts — OnDeviceEngine base + TTSEngine interface + all event/voice/asset types
src/engine/OnDeviceEngineEmitter.ts — zero-dep typed event emitter
src/engine/EngineRegistry.ts — generic registry (works for any modality)
src/engine/tts/engines/kokoro/ — KokoroEngine, KokoroTTSBridge, voices
src/engine/tts/engines/outetts/ — OuteTTSEngine, models
src/engine/tts/engines/qwen3/ — Qwen3TTSEngine stub (asset management ready, inference TODO)
src/components/EngineBridge.tsx — renders bridge for hook-based engines
docs/TTS_ENGINE_INTERFACE.md — full documentation

Refactored:

src/stores/ttsStore.ts — engine-agnostic, delegates to ttsRegistry.getActiveEngine()
App.tsx — <KokoroTTSManager /> replaced with <EngineBridge />
All UI consumers (TTSButton, TTSSection, TTSSettingsScreen, Popovers, ChatInput, etc.) now read engine-agnostic state from the store

Removed:

src/components/KokoroTTSManager.tsx — absorbed into KokoroEngine + KokoroTTSBridge

How engine swapping works

// In TTS Settings, user taps an engine:
await useTTSStore.getState().setEngine('outetts');
// That's it. Store syncs voices, assets, phase. UI updates automatically.

Test plan

Implements on-device text-to-speech using OuteTTS 0.3 (454 MB) + WavTokenizer (73 MB) via llama.rn, with react-native-audio-api for playback. Two interface modes (user-switchable from Settings): - Chat Mode: play/stop TTSButton on each assistant message bubble - Audio Mode: waveform bubbles with auto-TTS after streaming, transcript expand, speed cycling, and PCM audio persisted to disk per message for repeat playback New files: - src/constants/ttsModels.ts — model URLs, RAM thresholds, cache config - src/services/ttsService.ts — download, load, generate, persist, play - src/stores/ttsStore.ts — Zustand store with Chat + Audio Mode actions - src/hooks/useTTS.ts — convenience hook with RAM gate and weighted progress - src/components/TTSButton/index.tsx — Chat Mode play/stop per message - src/components/AudioMessageBubble/index.tsx — waveform bubble component - src/screens/TTSSettingsScreen/index.tsx — download, mode, speed, cache Modified: - Message type: audioPath, waveformData, audioDurationSeconds, isGeneratingAudio - ChatMessage: Audio Mode branch + TTSButton in meta row - SettingsScreen: Text to Speech nav row - Navigation: TTSSettings route - stores/index.ts, services/index.ts: exports Tests: 42 unit + integration tests covering service, store, and full flows Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Revert ChatMessage to main (avoids pre-existing complexity lint failure when the file enters the push-range diff) - Add Audio Mode + TTSButton to MessageRenderer instead — clean, under limit - Move audioPath/waveformData/audioDurationSeconds/isGeneratingAudio fields from types/index.ts to types/tts.ts via module augmentation (keeps index.ts under the 350-line max) - Add react-native-audio-api global mock to jest.setup.ts so all test suites that transitively import ttsService can resolve the native module Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

In finalizeStreamingMessage, after addMessage() saves the assistant reply, check if Audio Mode is active and model is loaded — if so, fire useTTSStore.generateAndSave() in the background so the waveform bubble auto-generates instead of spinning indefinitely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…, TTSButton placement Critical fixes for TTS Audio Mode: - Add updateMessageAudio() to chatStore — writes audioPath, waveformData, audioDurationSeconds, isGeneratingAudio back to the conversation message (without this, the waveform bubble spun forever after generation) - Wire auto-TTS trigger in useChatScreen via useEffect on isStreamingForThisConversation: detects streaming → stopped, checks Audio Mode + model loaded, calls triggerAudioModeGeneration() which sets isGeneratingAudio:true, fires generateAndSave, then writes audio fields or clears the flag on error - Fix isGenerating logic: show spinner only when isGeneratingAudio===true, not for every assistant message missing audioPath (which made all old messages spin forever in Audio Mode) - Fix TTSButton placement: add metaExtra prop to ChatMessage/MessageMetaRow so TTSButton renders inline in the timestamp row rather than below the bubble Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds a Voice row (volume icon + Chat/Audio/N/A badge) to the quick settings popover in the chat input. Tapping it: - Toggles between Chat and Audio mode when models are downloaded - Auto-loads/unloads the TTS model on switch - Navigates to TTSSettings when models are not yet downloaded This makes Audio Mode accessible without leaving the chat screen. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The ChatInput test mock for src/stores was missing useTTSStore, causing Popovers.tsx (which now uses useTTSStore) to throw on render. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1. checkDownloadStatus() never called on TTSSettingsScreen mount → store always showed models as not downloaded after fresh app start 2. speak() race condition: stop() during generation didn't prevent playback → set isSpeakingFlag=true before generate(), check it after, use finally 3. RNFS.stat() on directory reports block size (~0), not total file size → replaced with readDir() recursive sum of individual .pcm file sizes 4. Historical messages without audio showed broken play button in Audio Mode → AudioMessageBubble only rendered when msg.audioPath || msg.isGeneratingAudio Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replaced stat() mock with readDir() mocks matching the new recursive file-size summation approach. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nto feat/tts-implementation

Replaces slider controls with a [–] value [+] stepper row for precise numeric input in settings screens. Supports min/max/step, optional decimal formatting, and testID for E2E automation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Removes @react-native-community/slider from GenerationSettingsModal, ModelSettingsScreen, and TTSSettingsScreen. Every numeric control (temperature, top-p, GPU layers, speed, etc.) now uses the stepper for touch-friendly precise adjustment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- MediaAttachment gains audioFormat and audioDurationSeconds fields - audioRecorderService.stopRecording() now returns { path, durationSeconds } instead of just the path, enabling accurate audio bubble scrubbing - ChatInput/Attachments.addAudioAttachment stores the duration Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…send In Audio Mode, user voice recordings now appear as right-aligned audio bubbles instead of text messages, making both sides of the conversation audio-native. - Voice.ts: adds file-based transcription path (audioRecorderService + whisperService.transcribeFile) and onAutoSend callback for atomic send with audio attachment. Multimodal models skip transcription entirely. - ChatInput: passes onAutoSend in Audio Mode; builds MediaAttachment inline to avoid async state-update race; uses attachmentsRef for sync reads. - AudioMessageBubble: adds isUser prop for right-aligned primary-tinted style. - MessageRenderer: renders user audio attachments as AudioMessageBubble before the normal message path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The streaming-complete useEffect only listed isStreamingForThisConversation in its deps, so activeConversation was captured stale. When streaming ended, the last message was always the old value — TTS generation was never triggered. Fix: read conversation and last message directly from useChatStore.getState() inside the effect instead of relying on the closed-over activeConversation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When no Whisper model is installed and the user taps the mic, show a CustomAlert offering to download Whisper Small (466 MB) immediately, rather than navigating away to VoiceSettings. UnavailableButton also now shows a download icon + percentage while the model is being fetched, so feedback is in-place. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds a TEXT TO SPEECH section alongside IMAGE GENERATION and TEXT GENERATION in the chat settings modal. Shows mode toggle (chat/audio), enable switch, speed stepper, and auto-play toggle. Deep-links to TTSSettingsScreen for full configuration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

WHISPER_MODELS grows from 5 to 10 entries covering English-only and Multilingual variants for tiny/base/small/medium, plus Large v3 Turbo and Large v3. whisperService.downloadFromUrl(url, modelId) downloads any ggml .bin file from an arbitrary URL — enables installing community models from HuggingFace. whisperStore exposes it as downloadFromUrl action. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Rewrites the voice settings screen with three sections: - Active model card with inline download progress and remove action - Curated models grouped by English-only / Multilingual (all sizes, tiny → large-v3) - Live HuggingFace search bar (500 ms debounce) that queries ASR repos; tap a repo to expand and browse its ggml .bin files; tap a file to confirm and download via downloadFromUrl huggingFaceService gains searchWhisperRepos() and getWhisperFiles() to power the HF search without coupling to the LLM model browser. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

llmMessages builds an input_audio content block from audio attachments when the active model reports audio support, bypassing Whisper entirely. llm.ts exposes getMultimodalSupport() so the voice layer can detect this. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- ttsStore: adds interfaceMode, speed, autoPlay, enabled settings; generateAndSave flow for Audio Mode; updateMessageAudio - ttsService: OuteTTS generate+save path for AI audio bubbles - TTSButton: play/stop per-message with generation spinner - KokoroTTSManager + kokoroModels: scaffold for Tier 1 Kokoro TTS (not yet wired to react-native-executorch, marked not started) - App.tsx: mounts KokoroTTSManager near root - packages: react-native-executorch, background-downloader, dr.pogodin/react-native-fs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- ChatMessage: long-press action sheet gains Speak option (delegates to ttsStore) - ModelSettingsScreen: suppress pre-existing exhaustive-deps lint warning - Tests: update GenerationSettingsModal and ModelSettingsScreen tests for NumericStepper (gpu-layers-stepper-increment) replacing slider testIDs - TTS_IMPLEMENTATION_PLAN: rewritten to reflect Audio Mode bidirectional voice conversation, stale closure fix, and implementation status Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sages Two bugs causing broken Audio Mode: 1. AudioRecorder was recording at the system default rate (~44.1 kHz), producing WAV that Whisper interprets as static ('TV static' / [SOUND]). Fix: pass a preset with sampleRate:16000, BitDepth.Bit16 so the file is Whisper-compatible 16 kHz mono int16 PCM from the start. 2. buildOAIMessages was always including audio attachments as input_audio content blocks, even for models that don't support audio input (e.g. remote Qwen 3.5 2B / Gemma 42B). Those models replied 'I cannot hear audio'. Fix: buildOAIMessages now accepts supportsAudio flag (default false) and only emits input_audio parts when the model declares audio support. llm.ts passes multimodalSupport.audio when calling it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

playFromFile was treating WAV bytes as raw Float32 PCM — designed for OuteTTS output only. WAV files have a 44-byte RIFF header plus int16 samples; reinterpreting them as Float32 produces pure static. Fix: use AudioContext.decodeAudioData(filePath) which properly parses the WAV header and decodes samples. The file:// prefix is added if missing. MessageRenderer now wraps user and assistant audio bubbles in a container View with paddingHorizontal:16 and marginVertical:8, matching the ChatMessage container layout so bubbles align correctly with the chat edges instead of touching screen borders. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Audio type attachments were falling through to the FadeInImage branch, causing Image to try to load the WAV file path — resulting in a broken image placeholder that stretched the user bubble very wide (the 'super long' bubble issue). Audio attachments now render as a compact mic icon + 'Voice message' badge (matching the document badge style), keeping the bubble compact. In Audio Mode they never reach this code — they render as AudioMessageBubble. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add isAudioModeMessage to Message type and updateMessageAudio signature. Set flag in triggerAudioModeGeneration so mode switches don't reformat old text messages. MessageRenderer now checks msg.isAudioModeMessage instead of global ttsMode for assistant audio bubbles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Bug 2: handlePlayPause calls speak() for AI bubbles (empty audioPath) instead of playMessage with empty string. Remove isGenerating spinner. Bug 3: WaveformBars gets flex:1 + overflow:hidden, WAVEFORM_BARS 40→28, bubble overflow:hidden, maxWidth 80%→88%. Bug 4: user bubble flips play row order (speed+duration left, play right). Bug 5: voice cycling chip on AI bubbles reads/writes kokoroVoiceId. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix guard: was checking isModelLoaded (OuteTTS, always false) instead of kokoroReady — so isAudioModeMessage was never stamped and all AI messages rendered as text in audio mode - Add sentence-level streaming TTS: Kokoro now starts speaking each sentence as soon as LLM finishes generating it, instead of waiting for the full response - Fix waveform invisible in idle state: min bar height 3→6px and empty waveform now renders a sine-wave placeholder instead of nearly-invisible flat bars Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds memory-rag capability and conversationRagService spec so Jarvis can retrieve relevant context from past conversations and inject it into the system prompt — giving it cross-chat intelligence without requiring the user to repeat themselves. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Stamp isAudioModeMessage BEFORE checking TTS engine readiness — so AI messages always render as audio bubbles even when Kokoro hasn't downloaded yet - Add minWidth: 220 to audio bubble so flex:1 waveform container has space to expand (previously collapsed to 0 since bubble shrinks to content in flex-end alignment) - Audio mode input: hide text pill, show centered VoiceRecordButton with 'Hold to speak' / 'Release to send' hint — clearly communicates the interface mode - User voice recordings now render as AudioMessageBubble in BOTH chat and audio mode — tap play to hear your recording back regardless of which interface is active Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- MessageRenderer now renders ALL assistant messages as audio bubbles when interfaceMode=audio (not just isAudioModeMessage-stamped ones), fixing old messages showing as text after enabling audio mode - Removed voiceChip from play row; added dedicated voice row below controls with mic icon + voice name + chevron-right to cycle voices - AudioMessageBubble: streaming-only messages (no audioPath) correctly fall through to speak(transcript) for on-demand playback - ChatInput audio mode: added +/settings buttons back on left side so users can attach photos and configure tools while in audio mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace animated WaveformBars (VU-meter, wave bounce, 3 animation modes, Animated.Value refs) with simple static bars. Progress is now shown entirely by the native Slider component. Remove RMS amplitude calculation from KokoroTTSManager onNext callback. ~80 lines of animation code removed. No more JS thread contention from per-chunk amplitude updates. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…click play - Transcript shows karaoke-style word highlighting based on playback progress — spoken words in full color, upcoming words muted - Stop any TTS playback when user starts recording (mic + speaker shouldn't overlap) - Set isSpeaking + currentMessageId immediately before the 300ms Kokoro cleanup wait, so UI shows loading state right away when switching clips Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- KokoroTTSManager: 500ms cooldown after isSpeaking→false before applying voice config change, giving native ExecuTorch thread time to fully stop - Transcript highlight: only the currently spoken word is highlighted (primary color + subtle background), not all spoken words - Auto-scroll: ScrollView with maxHeight 120px, scrolls to keep the active word visible as playback progresses Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Remove word-level transcript highlighting — Kokoro doesn't provide word timestamps, so it was always off. Keep transcript as plain text in a scrollable container (max 120px) - Waveform bars now visually distinguish playing vs idle: playing bars are brighter (0.6–1.0 opacity), idle bars are dimmer (0.25–0.6) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Waveform bars now tint as the playhead passes: played bars are bright, unplayed bars are muted — like WhatsApp voice messages - Progress is shown directly on the bars, with the Slider below for drag-to-seek interaction - Increase voice change cooldown to 1500ms to prevent native crash Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Audio bubble uses fixed width: 88% (not maxWidth) so it doesn't resize when transcript opens - Thinking block wrapper matches at width: 88% (was maxWidth: 85%) - Both bubbles now render at exactly the same width Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Slider is now positioned on top of the waveform bars (centered vertically) instead of as a separate row below - Slider track is transparent — waveform bar coloring shows progress - Slider thumb (dot) sits on top of the waveform at the current position - Seekbar visible on both user and AI audio bubbles - Removed separate seekbar row — cleaner layout Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Thumb is transparent when progress=0 and not seeking. Only becomes visible (primary color) when audio is actively playing or user is dragging the slider. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Thumb always shows (primary color) so users know they can seek - Expand seekOverlay to left/right -16px to compensate for Android Slider's built-in ~16px internal padding — thumb now aligns with the waveform bar highlighting Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Play button + waveform in top row (waveform takes full remaining width) - Show transcript, duration, speed chip in a single meta row below - Matches WhatsApp voice message layout: play + waveform on top, info below Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Bars now distribute evenly across the entire container width instead of clustering together with fixed 2px gaps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Increase to 48 bars with 1.5px gaps — fills full width, looks denser - Bigger speed chip (more padding, larger border radius) — easier to tap - Voice change cooldown now uses actual stream end timestamp instead of isSpeaking state — waits 2 seconds from when the native stream actually stopped, not from when JS flag flipped - Both user and AI bubbles use same width: 88% Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Waveform bars now span edge-to-edge across the entire bubble width. Play button sits in the meta row below alongside show transcript, duration, and speed chip. No more asymmetric padding. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Reverted play button to left of waveform (standard layout). Reduced playRow gap from SPACING.sm to SPACING.xs so waveform extends further right. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Voice switch: key-based remount of KokoroTTSManager avoids native SIGSEGV when executorch re-initializes with a new voice config. Outer component manages cooldown, inner component holds the hook. Sets kokoroReady=false during switch so UI shows loader. - Seekbar progress: playMessage finally block now checks ownership (currentMessageId === messageId) before clearing state, preventing it from clobbering an in-flight speak() call's isSpeaking/isAudioPlaying. Added playSessionId counter + retry loop (up to 10x 200ms) when executorch reports "model is currently generating" (code 104). - Seekbar smoothness: timer interval 500ms→50ms, fractional seconds instead of Math.floor for continuous waveform bar progress. - Transcript layout: split TranscriptSection into TranscriptToggle (stays in metaRow with time/speed) and TranscriptContent (renders below), preventing text from squeezing against duration/speed chip. - Chat scroll: FlatList hidden (opacity:0) during initial layout, revealed after first scrollToEnd settles. Mode switch (chat↔audio) resets scroll via extraData + scrollToEnd. - Voice loader UI: track kokoroActiveVoiceId in store, derive isChangingVoice in UI components from settings vs active mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…nto feat/tts-implementation

…tional Kokoro - Audio mode now renders tool-call messages via ChatMessage (proper bubble + tool call UI) instead of dropping them as raw unstyled text. Plain assistant messages still render as AudioMessageBubble. - Transcript ScrollView uses react-native-gesture-handler for reliable nested scrolling inside FlatList on Android. Moved transcript outside the TouchableOpacity wrapper so it can capture scroll gestures. - Action menu (long-press + 3-dot) added to both user and assistant audio bubbles: Copy + Resend for user, Copy + Regenerate for assistant. - Kokoro TTS only loads in audio interface mode (App.tsx), saving RAM when in chat mode. - Post-stream ownership transfer: when all text was spoken by streaming chunks, transfers currentMessageId from 'streaming' to the real message ID so the AudioMessageBubble seekbar works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When retrying a message while TTS is speaking, the audio bubble disappears but Kokoro continues playing natively. Now calls ttsStore.stop() before deleting messages in the retry handler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Conditional mounting (audio mode only) caused Kokoro to not be ready during streaming — it takes ~10s to initialize, but fast models finish streaming before that. Streaming TTS chunks silently skipped because kokoroReady was false. Reverting to always-mounted so Kokoro is warm when streaming starts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Streaming TTS chunks couldn't keep up with fast cloud models — Kokoro speaks slower than tokens arrive, causing a growing backlog of unspoken chunks, word skipping at transitions, and unpredictable playback. Replaced with a simpler approach: text streams normally as a ChatMessage, then when streaming ends the full response is spoken as a single TTS call with the real message ID. Clean, predictable, no word skipping. Also includes: stop in-flight TTS when new streaming begins, TTS stop on retry/resend, and text offset fix for post-stream remaining calc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Introduce an engine abstraction layer that decouples the app from any specific TTS implementation. Engines register with a generic registry, the store delegates all operations through the active engine, and UI components read engine-agnostic state. - OnDeviceEngine base interface (lifecycle, assets, events, capabilities) designed to generalize to STT, Vision, and LLM modalities - TTSEngine extends base with voice management, speak/stop/pause/resume, generateAndSave, and streaming audio events - KokoroEngine wraps react-native-executorch hook via bridge component - OuteTTSEngine absorbs ttsService.ts into the engine interface - Qwen3TTSEngine stub with asset management ready, inference pipeline TODO - ttsStore rewritten as thin proxy — no engine-specific branching - Engine picker added to TTS Settings screen - Settings migration from old voiceId/kokoroVoiceId to voiceByEngine map - Race condition fixes via playSessionId ownership - 157 test suites, 5176 tests passing, 0 tsc errors, 0 lint errors Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

sonarqubecloud · 2026-04-09T18:26:51Z

Quality Gate failed

Failed conditions
6 Security Hotspots
5.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

gemini-code-assist

Code Review

This pull request introduces a pluggable text-to-speech (TTS) architecture supporting multiple on-device engines, specifically Kokoro for fast streaming and OuteTTS for high-quality audio generation. It includes new UI components for waveform audio bubbles, a dedicated TTS settings screen, and updates to the chat interface for an "Audio Mode" experience. Feedback highlights a bug in OuteTTSEngine where raw PCM data is incorrectly passed to decodeAudioData without a header, and suggests a more efficient Buffer-based implementation for base64 conversion of audio samples.

gemini-code-assist · 2026-04-09T18:42:28Z

+      const src = filePath.startsWith('file://') ? filePath : `file://${filePath}`;
+      const buffer = await this._audioCtx.decodeAudioData(src as unknown as ArrayBuffer);
+
+      // Abort if stop() was called during decode


The decodeAudioData method is typically used for encoded audio formats (like WAV or MP3). Since the engine writes raw Float32 PCM data to disk, decodeAudioData will likely fail to decode the .pcm file as it lacks a header. You should read the file as an ArrayBuffer and manually load it into an AudioBuffer using createBuffer and copyToChannel.

gemini-code-assist · 2026-04-09T18:42:28Z

+  private _float32ToBase64(samples: Float32Array): string {
+    const uint8 = new Uint8Array(samples.buffer);
+    let binary = '';
+    for (let i = 0; i < uint8.length; i++) {
+      binary += String.fromCharCode(uint8[i]);
+    }
+    return btoa(binary);
+  }
+}


The _float32ToBase64 implementation is inefficient and uses non-standard globals. Use Buffer for faster, safer base64 conversion.

private _float32ToBase64(samples: Float32Array): string { return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength).toString('base64'); }

alichherawalla and others added 30 commits April 7, 2026 16:41

test: add useTTSStore mock to ChatInput test suite

ee07ec2

The ChatInput test mock for src/stores was missing useTTSStore, causing Popovers.tsx (which now uses useTTSStore) to throw on render. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test: update getAudioCacheSizeMB test for readDir-based implementation

c602566

Replaced stat() mock with readDir() mocks matching the new recursive file-size summation approach. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' of github.com:alichherawalla/off-grid-mobile-ai i…

edd8043

…nto feat/tts-implementation

feat: add NumericStepper component

8ab6a50

Replaces slider controls with a [–] value [+] stepper row for precise numeric input in settings screens. Supports min/max/step, optional decimal formatting, and testID for E2E automation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

alichherawalla and others added 27 commits April 9, 2026 11:29

fix: hide seekbar thumb when not playing — no stray dot at position 0

b3d7077

Thumb is transparent when progress=0 and not seeking. Only becomes visible (primary color) when audio is actively playing or user is dragging the slider. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: waveform bars span full width using space-between

e5b4816

Bars now distribute evenly across the entire container width instead of clustering together with fixed 2px gaps. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: revert — play button back on left, reduce gap for wider waveform

e8bce31

Reverted play button to left of waveform (standard layout). Reduced playRow gap from SPACING.sm to SPACING.xs so waveform extends further right. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: add left margin to waveform for spacing from play button

33dd403

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: increase waveform left margin to SPACING.sm

e958dd4

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: waveform extends to bubble right edge, spacing from play button

64c6a2a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: bars flex to fill full waveform width — no right gap

ff738a0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: remove negative right margin — waveform stays within bubble

42e606c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: interim

ea27099

Merge branch 'main' of github.com:alichherawalla/off-grid-mobile-ai i…

0beba49

…nto feat/tts-implementation

greptile-apps Bot reviewed Apr 9, 2026

View reviewed changes

gemini-code-assist Bot reviewed Apr 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: pluggable TTS engine interface#247

feat: pluggable TTS engine interface#247
alichherawalla wants to merge 98 commits intomainfrom
feat/tts-engine-interface

alichherawalla commented Apr 9, 2026

Uh oh!

greptile-apps Bot left a comment

Uh oh!

sonarqubecloud Bot commented Apr 9, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 9, 2026

Uh oh!

gemini-code-assist Bot Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

alichherawalla commented Apr 9, 2026

Summary

What changed

How engine swapping works

Test plan

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Apr 9, 2026

Quality Gate failed

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant