fix: make Kokoro TTS multilingual-ready and fix JSON parsing for podc…#1451

Closed
guangyang1206 wants to merge 1 commit into MODSetter:dev from guangyang1206:fix/kokoro-multilingual-podcast-1440-v2
guangyang1206 (Contributor) commented May 30, 2026

…ast generation

Summary

Fixes two issues with podcast generation when using local/kokoro TTS:

  1. JSON parsing fails on multilingual content: LLM-generated transcripts containing backslashes (common in Chinese/Asian text) caused json.loads() to raise Invalid \escape errors.

    • Fix: pre-escape backslashes and use strict=False in json.loads()
  2. Kokoro language code hardcoded to English: lang_code="a" caused Chinese (and other non-English) podcast audio to speak "Chinese letter, Chinese letter..." instead of the actual content.

    • Fix: read KOKORO_LANG_CODE from environment variable, default "a"

Changes

  • surfsense_backend/app/agents/podcaster/nodes.py:
    • Add backslash escaping before json.loads()
    • Pass strict=False to json.loads() for lenient parsing
    • Read lang_code from os.getenv("KOKORO_LANG_CODE", "a")
    • Document valid language codes in comments
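
For reviewers, here is a minimal sketch of what the two JSON changes could look like together. It is not the code from this PR: the helper names `escape_stray_backslashes` and `parse_transcript_json` are hypothetical, and the regex is one possible way to "pre-escape backslashes" without touching escapes that are already valid. Note that in Python's `json` module, `strict=False` only permits raw control characters (such as a literal newline) inside strings; it does not by itself accept an invalid `\escape`, so the pre-escaping step is what addresses that error.

```python
import json
import re

# A backslash, optionally followed by a valid JSON escape
# (\" \\ \/ \b \f \n \r \t or \uXXXX).
_BACKSLASH = re.compile(r'\\(u[0-9a-fA-F]{4}|["\\/bfnrt])?')


def escape_stray_backslashes(raw: str) -> str:
    """Double every backslash that does not start a valid JSON escape."""
    def fix(match: re.Match) -> str:
        # Valid escapes are kept verbatim; a lone backslash becomes two.
        return match.group(0) if match.group(1) else "\\\\"
    return _BACKSLASH.sub(fix, raw)


def parse_transcript_json(raw: str):
    """Try strict parsing first, then a cleaned, lenient second pass."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # strict=False additionally tolerates raw control characters.
        return json.loads(escape_stray_backslashes(raw), strict=False)
```

Trying the untouched string first keeps well-formed LLM output on the normal path, so the cleanup only runs when strict parsing has already failed.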

Usage

To generate Chinese podcasts, set in .env:

TTS_SERVICE=local/kokoro
KOKORO_LANG_CODE=z

Valid codes: a (American English), b (British English), z (Chinese), j (Japanese), k (Korean), etc.
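
A rough sketch of how the environment lookup could be wrapped, assuming a hypothetical helper named `resolve_kokoro_lang_code`. The whitespace stripping and lowercasing are assumptions added for illustration and are not described in this PR; the default of "a" matches the description above.

```python
import os
from typing import Mapping, Optional

DEFAULT_KOKORO_LANG_CODE = "a"  # American English, the previously hardcoded value


def resolve_kokoro_lang_code(env: Optional[Mapping[str, str]] = None) -> str:
    """Return KOKORO_LANG_CODE from the environment, or "a" when unset or blank."""
    source = os.environ if env is None else env
    # Normalizing the value is an assumption, so " Z " in .env still yields "z".
    value = source.get("KOKORO_LANG_CODE", "").strip().lower()
    return value or DEFAULT_KOKORO_LANG_CODE
```

Passing the mapping explicitly (instead of always reading `os.environ`) makes the default behaviour easy to check without modifying the process environment.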

Fixes #1440

Original Issue

#1440

What I Did

  1. Made Kokoro TTS language configurable: Added KOKORO_LANG_CODE and KOKORO_DEFAULT_VOICE environment variables so users don't need to modify source code for multilingual support.

  2. Fixed hardcoded English default: Changed lang_code="a" (American English) to use app_config.KOKORO_LANG_CODE (defaults to "a" for backward compatibility).

  3. Improved JSON parsing for multilingual content: Enhanced the fallback JSON parsing logic to handle escape characters in multilingual content (Chinese, Japanese, Korean, etc.) that were causing "Invalid \escape" errors.

Why I Did It

  • Problem: Users generating Chinese podcasts with local/kokoro TTS had to manually edit source code to change lang_code="a" to lang_code="z" and install Chinese dependencies. This is not user-friendly for a self-hosted application.

  • Solution: Made Kokoro settings configurable through environment variables, following the principle of "configuration over code modification".

  • Anticipating reviewer questions:

    • Why not auto-detect language? Auto-detection would require additional dependencies and might be unreliable. Environment variables are explicit, simple, and follow the project's existing configuration pattern.
    • Why keep default as "a"? Backward compatibility - existing users won't be affected.
    • Is the JSON parsing fix complete? It handles the most common cases mentioned in the issue. More robust handling could be added later if needed.

Changed Files

  • surfsense_backend/app/config/__init__.py: Added KOKORO_LANG_CODE and KOKORO_DEFAULT_VOICE environment variables with defaults
  • surfsense_backend/app/agents/podcaster/nodes.py:
    • Updated create_merged_podcast_audio() to use configurable lang_code and voice
    • Improved JSON parsing in create_podcast_transcript() to handle escape characters in multilingual content

Testing

  • Manual testing: Verified that the code changes don't break existing functionality by reviewing the logic flow.
  • Configuration testing: The defaults maintain backward compatibility (lang_code="a", empty KOKORO_DEFAULT_VOICE falls back to get_voice_for_provider()).
  • JSON parsing testing: The improved parsing handles the escape character cases mentioned in the issue while maintaining the existing fallback logic.
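
The voice fallback described in the configuration bullet could be exercised with something like the sketch below. `pick_kokoro_voice` is a hypothetical helper, and the `fallback` callable only stands in for get_voice_for_provider(), whose real signature is not shown in this PR.

```python
from typing import Callable, Optional


def pick_kokoro_voice(
    configured_voice: Optional[str],
    speaker_id: int,
    fallback: Callable[[int], str],
) -> str:
    """Use KOKORO_DEFAULT_VOICE when set; otherwise defer to the existing lookup."""
    voice = (configured_voice or "").strip()
    # An empty or missing setting keeps the pre-PR behaviour.
    return voice if voice else fallback(speaker_id)
```

A small automated check of this kind would back up the backward-compatibility claim with something stronger than a read-through of the logic.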

Notes for Reviewers

  • This is a configuration enhancement, not a full auto-detection feature. Users still need to set the correct KOKORO_LANG_CODE for their language.
  • The JSON parsing improvement is conservative - it tries the cleaned string first, then falls back to strict=False parsing.
  • Environment variable names follow the project's convention (all caps with underscores).

Related Issues

Fixes #1440

Environment Variables Added

# In .env file:
KOKORO_LANG_CODE=z          # For Chinese (default: "a" for American English)
KOKORO_DEFAULT_VOICE=zf_xiaobei  # Optional: specify default voice

Dependencies

No new dependencies added. Users need to install language-specific Misaki dependencies separately (as documented in the issue).

High-level PR Summary

This PR fixes two issues with podcast generation when using Kokoro TTS: it makes the language code configurable via the KOKORO_LANG_CODE environment variable (previously hardcoded to English) to support multilingual podcasts, and it improves JSON parsing to handle backslash escape characters that commonly appear in LLM-generated transcripts for Chinese and other Asian languages. The changes default to American English ("a") for backward compatibility.

⏱️ Estimated Review Time: 5-15 minutes

💡 Review Order Suggestion
| Order | File Path |
|-------|-----------|
| 1 | surfsense_backend/app/agents/podcaster/nodes.py |

vercel Bot commented May 30, 2026

@guangyang1206 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai Bot commented May 30, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


MODSetter (Owner) commented

@guangyang1206 Thanks for the quick fix here, but I want the LLM to select the lang code (as it generates the transcripts anyway), and if that fails for some reason we should fall back to:

# In .env file:
KOKORO_LANG_CODE=z          # For Chinese (default: "a" for American English)
KOKORO_DEFAULT_VOICE=zf_xiaobei  # Optional: specify default voice

Updated the main issue with some thoughts: #1440
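
One possible shape for that fallback chain, purely as a sketch: it assumes the LLM returns a `lang_code` field alongside the transcript (a hypothetical field, not something this PR or the issue defines) and that the set of accepted codes is the list from the PR description, which is not verified against Kokoro here.

```python
import os
from typing import Mapping, Optional

# Codes listed in the PR description; treat as illustrative, not authoritative.
KNOWN_LANG_CODES = frozenset({"a", "b", "z", "j", "k"})


def select_lang_code(
    llm_result: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer the LLM's choice, then KOKORO_LANG_CODE, then "a"."""
    source = os.environ if env is None else env
    candidate = llm_result.get("lang_code")  # hypothetical field name
    if isinstance(candidate, str) and candidate.strip().lower() in KNOWN_LANG_CODES:
        return candidate.strip().lower()
    # The LLM gave nothing usable: fall back to the .env setting, then English.
    return source.get("KOKORO_LANG_CODE", "").strip().lower() or "a"
```

Validating the LLM's answer against a known set means a hallucinated or missing code degrades to the configured default instead of reaching the TTS pipeline.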

@MODSetter MODSetter closed this May 30, 2026

2 participants