Claude/add cultural filters b vmpl #46

Open
kleer001 wants to merge 15 commits into agwells:master from kleer001:claude/add-cultural-filters-bVMPL

Conversation

@kleer001 kleer001 commented Dec 15, 2025

Sorry, please delete

claude and others added 15 commits December 14, 2025 16:23
This commit introduces a comprehensive Python library for creating
text transformation filters, based on analysis of the existing
TypeScript filters (nyc.ts, klaus.ts, newspeak.ts).

Key components:

1. text_transformer.py - Core library following SOLID/DRY/YAGNI principles
   - Defines Transformer protocol for interface segregation
   - Provides RegexTransformer base class to eliminate duplication
   - Implements 10 transformation patterns extracted from TS filters
   - Consolidated WordSubstitution and PhraseConsolidation (YAGNI)

2. disco_filter.py - Complete working example (1970s disco enthusiast)
   - Demonstrates recommended pattern for creating custom filters
   - Shows proper transformation ordering
   - Data-driven design using JSON dictionary

3. disco_slang.json - Example slang dictionary
   - Structured format for maintaining vocabularies
   - Includes words, phrases, affixes, and sentence fillers

4. TRANSFORMATION_PATTERNS.md - High-level pattern analysis
   - Documents 10 core transformation patterns identified
   - Explains pattern application order and rationale
   - Provides examples from each analyzed filter

5. DEVELOPER_GUIDE.md - Comprehensive developer documentation
   - Step-by-step instructions for creating custom filters
   - Best practices for DRY, YAGNI, and SOLID
   - Real-world examples (pirate, corporate, Elizabethan)
   - Testing and troubleshooting guide

6. README.md - Quick start guide and API reference

The library extracts common patterns from the TypeScript filters:
- Word/phrase substitution with case preservation
- Character pair replacement for accents
- Suffix/prefix morphology transformations
- Sentence augmentation at punctuation boundaries
- Context-aware boundary detection

Design follows SOLID principles:
- Single Responsibility: Each transformer has one purpose
- Open/Closed: Extend via Transformer protocol
- Liskov Substitution: All transformers interchangeable
- Interface Segregation: Simple transform() protocol
- Dependency Inversion: TextFilter depends on abstractions

Tested and working. Ready for creating new language filters.

Major DRY/YAGNI improvement eliminating duplicated filter-building logic.

PROBLEM:
The original design required a Python class for each filter (disco_filter.py,
etc.) that all duplicated the same filter-building logic. This violated DRY
by repeating boilerplate code in every filter.

SOLUTION:
Separate data from logic:
- Logic: Universal FilterFactory (write once, in filter_factory.py)
- Data: JSON files (one per filter, no code duplication)

ARCHITECTURE CHANGE:

Before (duplicated logic):
  disco_filter.py   ─┐
  pirate_filter.py  ├─ Each contains filter-building logic
  valley_filter.py  ┘

After (DRY):
  filter_factory.py ──► Universal builder (write once)
          │
          ├─► disco.json    (just data)
          ├─► pirate.json   (just data)
          └─► german.json   (just data)

BENEFITS:

1. DRY: Filter-building logic in ONE place
2. YAGNI: No custom Python class per filter
3. Separation of Concerns: Data (vocabulary) vs Logic (transformations)
4. Accessibility: Non-programmers can create filters
5. Maintainability: Edit JSON data files, not code
6. Version Control: Cleaner diffs on vocabulary changes

NEW FILES:

- filter_factory.py: Universal filter builder from JSON
  - Reads JSON configuration
  - Automatically constructs appropriate transformer pipeline
  - Command-line interface for any JSON filter

- FILTER_SCHEMA.md: Complete JSON schema documentation
  - All supported fields and formats
  - Examples and best practices
  - Transformation ordering explanation

- disco.json: Refactored disco filter (merged disco_slang.json + metadata)
- pirate.json: New pirate speak filter example
- german.json: New German accent filter example

DELETED FILES:

- disco_filter.py: Replaced by disco.json + filter_factory.py
- disco_slang.json: Merged into disco.json

USAGE:

Old way (80 lines of Python):
  python disco_filter.py "Hello friend"

New way (just JSON data):
  python filter_factory.py disco.json "Hello friend"

Creating new filters:
Old: Write Python class with boilerplate
New: Copy JSON file, edit vocabulary

JSON SCHEMA:

Filters defined with simple structure:
{
  "name": "Filter Name",
  "substitutions": { "word": "replacement" },
  "characters": { "th": "d" },
  "suffixes": { "ing": "in'" },
  "sentence_augmentation": [...]
}

FilterFactory automatically:
- Applies transformations in correct order
- Handles case preservation
- Manages word boundaries
- Builds complete transformation pipeline

This design follows the principle: "Most text transformations are data,
not logic." The vocabulary is the variable part; the transformation
patterns are constant.
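
The idea can be sketched as a single function that reads a config dict and applies whatever fields it finds. The field names `substitutions` and `suffixes` are taken from the schema above; the function names `match_case` and `apply_filter`, the ordering (substitutions before suffixes), and the case-preservation rules are assumptions, not the real FilterFactory behavior.

```python
import json
import re


def match_case(original: str, replacement: str) -> str:
    """Copy the capitalization style of the matched word onto the replacement."""
    if len(original) > 1 and original.isupper():
        return replacement.upper()
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def apply_filter(config: dict, text: str) -> str:
    """Apply word substitutions, then suffix rewrites, driven purely by data."""
    subs = {key.lower(): value for key, value in config.get("substitutions", {}).items()}
    if subs:
        # Longest keys first so multi-word phrases win over single words.
        keys = sorted(subs, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b",
            re.IGNORECASE,
        )
        text = pattern.sub(
            lambda m: match_case(m.group(0), subs[m.group(0).lower()]), text
        )
    for suffix, replacement in config.get("suffixes", {}).items():
        # \b restricts the rewrite to the end of a word.
        text = re.sub(re.escape(suffix) + r"\b", lambda m, r=replacement: r, text)
    return text


CONFIG = json.loads("""
{
  "name": "Demo",
  "substitutions": {"hello": "ahoy", "friend": "matey"},
  "suffixes": {"ing": "in'"}
}
""")

print(apply_filter(CONFIG, "Hello friend, HELLO sailing"))  # Ahoy matey, AHOY sailin'
```

Adding a new filter under this scheme means writing a new JSON document; `apply_filter` itself does not change.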

Tested and working with all three example filters.

ENHANCEMENTS:

1. Executable + Syntactic Sugar (filter_factory.py)
   - Made filter_factory.py executable (chmod +x)
   - Added find_filter_file() to search for filters intelligently
   - Accepts filter names with or without .json extension
   - Searches in current dir and script dir automatically

   Before: python filter_factory.py disco.json "text"
   After:  ./filter_factory.py disco "text"

   Much cleaner UX!

2. Comprehensive Filter Analysis (FILTER_ANALYSIS.md)
   - Analyzed ALL 25 TypeScript filters in src/
   - Categorized by transformation patterns used
   - Identified what's supported vs what needs custom code

   Key findings:
   - ~70% of filters can be fully implemented in JSON
   - ~30% need algorithmic transformations (custom Python)
   - ALL generalizable patterns already supported!

   Conclusion: Library is feature-complete. No new capabilities needed.

3. New Example Filters
   - chef.json - Swedish Chef (Bork Bork Bork!)
   - fudd.json - Elmer Fudd (Be vewy vewy quiet)

   Demonstrates that simple accent filters work perfectly in JSON.

4. Updated Documentation
   - README.md: New usage examples with syntactic sugar
   - README.md: Added coverage section citing 70% JSON support
   - README.md: Updated testing section
   - FILTER_ANALYSIS.md: Detailed analysis of all 25 filters
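
The file lookup described in item 1 could look roughly like the sketch below. The name `find_filter_file` is from the commit message; the signature, the explicit `search_dirs` argument, and the error raised are assumptions.

```python
from pathlib import Path
from typing import Iterable


def find_filter_file(name: str, search_dirs: Iterable[Path]) -> Path:
    """Resolve 'disco' or 'disco.json' to the first matching file found."""
    filename = name if name.endswith(".json") else name + ".json"
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no filter named {filename!r} in the search path")
```

To mirror the behavior described above, a caller inside the script would presumably pass `[Path.cwd(), Path(__file__).resolve().parent]` so the current directory is tried before the script's own directory.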

ANALYSIS RESULTS:

Patterns ALREADY SUPPORTED:
✅ Word/phrase substitution
✅ Character replacement
✅ Suffix/prefix transformation
✅ Sentence augmentation
✅ Case preservation
✅ Word boundaries
✅ Large dictionaries
✅ Regex patterns (lookahead/lookbehind)

Patterns NOT NEEDED (algorithm-specific):
❌ Letter scrambling (scramble.ts only)
❌ Position-based alternation (studly.ts only)
❌ String reversal (upsidedown.ts only)
❌ Filter composition (jibberish.ts meta-feature)

These are correctly NOT in the library - they're too specific to generalize.

DESIGN VALIDATION:

The analysis confirms our architecture is optimal:
- Data (vocabularies) in JSON
- Logic (patterns) in Python library
- Algorithms (special cases) in custom Python classes

No over-engineering needed. Library is complete! 🎉

Files changed:
- python/filter_factory.py: Executable + smart file finding
- python/README.md: Updated examples and coverage info
- python/FILTER_ANALYSIS.md: NEW - Comprehensive analysis
- python/chef.json: NEW - Swedish Chef filter
- python/fudd.json: NEW - Elmer Fudd filter

NEW FILTERS:

1. Computer Glitch Effects (4 variations):
   - glitch-100.json - 100% corruption (full Unicode blocks)
   - glitch-50.json - 50% corruption
   - glitch-25.json - 25% corruption
   - glitch-10.json - 10% light corruption

   Algorithmic transformer that replaces characters with Unicode
   block/shape characters at configurable percentages.

2. Subculture Slang Dictionaries (3 filters):
   - club_kids_1980s.json - 1980s club/rave culture
     "Hello friend! This party is amazing!"
     → "Yo homie! This rave is phenomenal! No doubt!"

   - greasers_1950s.json - 1950s greasers and hot rods
     "Hello friend, this is really cool!"
     → "Hey there cat, this is real hip!"

   - punk_rockers_1970s.json - 1970s punk rock scene
     "Hello friend! This party is great!"
     → "Oi mate! This gig is ace!, mate!"

IMPLEMENTATION:

Added GlitchTransformer class (text_transformer.py):
- Replaces alphanumeric characters with Unicode blocks/shapes
- Configurable percentage (0-100)
- Seeded random for reproducible results
- Applied last in pipeline to corrupt final output

Extended FilterFactory (filter_factory.py):
- Added support for "glitch" field in JSON
- Simple format: "glitch": 50 (percentage)
- Advanced format: {"percentage": 25, "seed": 12345}
- Automatically instantiates GlitchTransformer
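
A rough sketch of the behavior described above, written as plain functions rather than the actual GlitchTransformer class. The replacement character set, the function names `glitch` and `parse_glitch_field`, and the per-character sampling strategy are assumptions; only the percentage/seed options and the two JSON formats come from the commit message.

```python
import random
from typing import Any, Optional, Tuple

# Assumed replacement set; the real filter may draw from other Unicode blocks.
BLOCKS = "▐●▯■▌▓○◅"


def glitch(text: str, percentage: float, seed: Optional[int] = None) -> str:
    """Replace roughly `percentage` percent of alphanumeric characters."""
    rng = random.Random(seed)  # a private generator keeps results reproducible
    threshold = percentage / 100
    return "".join(
        rng.choice(BLOCKS) if ch.isalnum() and rng.random() < threshold else ch
        for ch in text
    )


def parse_glitch_field(value: Any) -> Tuple[float, Optional[int]]:
    """Accept both the simple form (50) and the advanced form (a dict)."""
    if isinstance(value, dict):
        return float(value["percentage"]), value.get("seed")
    return float(value), None


percentage, seed = parse_glitch_field({"percentage": 25, "seed": 12345})
print(glitch("Hello world", percentage, seed))
```

With the same seed the output is identical from run to run, which makes glitch filters testable even though they are random.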

Subculture filters:
- Extensive slang dictionaries (100+ terms each)
- Period-appropriate vocabulary
- Sentence augmentation for authenticity
- Pure JSON (no custom code needed)

DOCUMENTATION UPDATES:

README.md:
- Reorganized examples into categories (Accents, Subcultures, Effects)
- Added examples of new filters in Quick Start
- Updated testing section

FILTER_SCHEMA.md:
- Documented new "glitch" field
- Examples and usage patterns
- Explained transformation order (glitch applies last)

TESTING:

All filters tested and working:
✅ glitch-100: "Hello" → "▐●▯■▌"
✅ glitch-50: "Hello world" → "H▓l●■ w○r◅d"
✅ club_kids_1980s: Transforms to 80s rave slang
✅ greasers_1950s: Transforms to 50s greaser talk
✅ punk_rockers_1970s: Transforms to 70s punk speak

DESIGN NOTES:

Glitch filter demonstrates extensibility:
- Algorithmic transformations work alongside JSON config
- Added as new capability without breaking existing filters
- Follows SOLID principles (open/closed)

Subculture filters demonstrate power of data-driven approach:
- Each filter is 100+ lines of pure vocabulary
- No code duplication
- Easy to create and maintain
- Perfect use case for JSON configuration

Total: 11 filters now available (4 existing + 7 new)

Files changed:
- python/text_transformer.py: Added GlitchTransformer class
- python/filter_factory.py: Added glitch support in from_dict
- python/README.md: Updated with new examples
- python/FILTER_SCHEMA.md: Documented glitch field
- python/glitch-{10,25,50,100}.json: NEW glitch filters
- python/club_kids_1980s.json: NEW 80s club culture
- python/greasers_1950s.json: NEW 50s greaser culture
- python/punk_rockers_1970s.json: NEW 70s punk culture

Transform repository into final form with comprehensive documentation:

Documentation Changes:
- Create new public-facing README.md with full attribution chain
  (Joey Hess → Aaron Wells → Claude/kleer001)
- Add src/README.md as TypeScript filter development guide
  with decision matrix, common patterns, and utility docs
- Move original TypeScript README to docs/README-typescript.md
  for preservation

Repository Cleanup:
- Remove __pycache__/ directory (build artifacts)
- Update .gitignore to exclude *.pyc and *.pyo files

All Python filters tested and working:
- Vocabulary filters: disco, greasers_1950s, club_kids_1980s, punk_rockers_1970s
- Algorithmic filters: glitch-10, glitch-25, glitch-50, glitch-100

Repository now features:
- Data-driven architecture (JSON vocabularies + Python logic)
- 15 total filters (11 JSON-based, 4 algorithmic)
- Comprehensive developer guides
- Full attribution and licensing information

Clean break from TypeScript - this is now a pure Python library.

Removed:
- All TypeScript source files (src/*.ts)
- TypeScript configuration (tsconfig.json, babel.config.js)
- Node.js dependencies (package.json, package-lock.json)
- Demo React app (demo/)
- TypeScript tests (tests/)
- IDE configuration (ide-config/)
- TypeScript documentation (docs/README-typescript.md, src/README.md)

Updated:
- README.md - Pure Python focus, no TypeScript references
- .gitignore - Python-only entries

Repository now contains:
- python/ - Complete data-driven text transformation library
- original/ - Historical C implementations for reference
- CHANGELOG.md - Project history

This is a complete redesign with data-driven architecture:
- Data (vocabularies) in JSON files
- Logic (transformations) in reusable Python library
- 15 filters available, ~70% are pure JSON

Convert TypeScript filters to JSON format:
- scottish.json - Scottish/Dwarven accent with 50+ substitutions
- nyc.json - Brooklyn/NYC English with sentence augmentation
- jethro.json - Hillbilly dialect with 80+ vocabulary terms

All filters tested and working. Repository now has 15 JSON filters:
- 11 vocabulary-based (pure JSON)
- 4 algorithmic (glitch effects)

Updated README to list new accent and subculture filters.

Major refactoring to cleanly separate data from algorithms:

Architecture Changes:
- Rename python/ to src/ (more conventional)
- Implement "type": "python" filter pattern
- Each custom filter = JSON + Python module pair
- PythonModuleTransformer dynamically loads modules
- Clean separation: JSON for config, Python for logic

New Custom Filters (Python modules):
- duck.py/duck.json - Replaces words with "quack" variations
- studly.py/studly.json - Random capitalization (StUdLy CaPs)
- lolcat.py/lolcat.json - Internet cat speak with random caps
- glitch.py/glitch.json - Refactored from built-in transformer

Benefits of New Architecture:
✅ Zero mixing of concerns
✅ Each filter is explicit and self-contained
✅ Python devs have clear pattern: def transform(text, **kwargs)
✅ JSON users aren't exposed to Python code
✅ Easy to add new custom filters
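
One way the JSON + Python module pairing could be wired up is sketched below. The `"type": "python"` marker and the `transform(text, **kwargs)` convention are from the commit message; the `"module"` and `"options"` keys and both function names are hypothetical, and the real PythonModuleTransformer may load modules differently.

```python
import importlib
from typing import Callable


def load_transform(module_name: str) -> Callable[..., str]:
    """Import a filter module by name and return its transform() function."""
    module = importlib.import_module(module_name)
    transform = getattr(module, "transform", None)
    if not callable(transform):
        raise AttributeError(
            f"module {module_name!r} does not define a callable transform()"
        )
    return transform


def apply_python_filter(config: dict, text: str) -> str:
    """Run a custom filter described by a config with "type": "python"."""
    if config.get("type") != "python":
        raise ValueError("config is not a Python-module filter")
    transform = load_transform(config["module"])  # "module" key is assumed
    # Remaining settings are forwarded as keyword arguments ("options" is assumed).
    return transform(text, **config.get("options", {}))
```

Under these assumptions a module such as duck.py only needs to expose `def transform(text, **kwargs)`, and its JSON file would name the module and supply any options.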

File Organization:
src/
  ├── *.json          (filter configurations)
  ├── *.py            (custom transformers)
  └── filter_factory.py (universal builder)

Updated:
- README.md - New architecture diagram, src/ paths, 18 filters
- filter_factory.py - Python module loading support
- All glitch-*.json - Use new Python module pattern

Statistics:
- 18 total filters (11 pure JSON, 7 with Python modules)
- 6 Python transformer modules
- Clean separation of data and algorithms

Claude/analyze text transformations e1 k8x

Created comprehensive cultural filters for:
- 1920s: Flappers
- 1940s: Zoot Suiters
- 1950s: Beatniks, IBM Engineers, Teddy Boys
- 1960s: Hippies, Mods, Mid-Century Modern, Outlaw Bikers, Surfers, Skinheads
- 1970s: Metalheads, Rastafarians
- 1980s: Goths, Hip Hop Breakers, New Romantic Goths, Ravers, Yuppies
- 1990s: Grunge Musicians, Hackers, Riot Grrrl, Slackers

Each filter includes:
- Era-appropriate slang and vocabulary
- Cultural-specific replacements and phrases
- Sentence augmentation with period swears for aggressive cultures
- Exaggerated cultural lens for effect

Also compiled master wordlist (737 unique words/phrases) from all 20th
century culture filters for game development use.

Cleaned all JSON filter files by removing substitutions where words
mapped to themselves (e.g., "pills": "pills"). This streamlines the
filters and reduces unnecessary processing.
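
The cleanup described here amounts to a one-line dictionary filter. This is a sketch with a hypothetical function name, not the script used in the commit.

```python
def remove_self_mappings(substitutions: dict[str, str]) -> dict[str, str]:
    """Return a copy without entries whose key equals its replacement."""
    return {word: repl for word, repl in substitutions.items() if word != repl}


cleaned = remove_self_mappings({"pills": "pills", "money": "bread"})
print(cleaned)  # {'money': 'bread'}
```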

Updated master wordlist from 737 to 590 unique words/phrases.

Enhanced cultural filters with period-specific vocabulary based on research
from multiple authentic sources:

Sources researched:
- 1920s: Ella Hartung's Flapper Dictionary (1922), Charleston Daily Mail
- 1940s: Cab Calloway's Hepster Dictionary, jive talk glossaries
- 1950s: "Straight From the Fridge, Dad", "How to Speak Hip" (1959)
- 1960s: "The Hippie Dictionary" (6,000 entries), surf culture sources
- 1970s: British punk archives, heavy metal subculture sources
- 1980s: Hip hop/b-boy terminology, acid house/rave culture sources
- 1990s: Riot grrrl zines, Gen X slang (note: grunge lexicon was a hoax)

Added terms include:
- Flappers: 29 terms (baloney, cheaters, gams, flat tire, etc.)
- Beatniks: 25 terms (alligator, cube, tea, lamps, etc.)
- Hippies: 19 terms (grok, be-in, crash pad, mind-bending, etc.)
- Hip Hop: 13 terms (cypher, throwdown, headspin, bite, etc.)
- Surfers: 11 terms (hodad, log, gremmie, soul surfer, etc.)
- And 10+ other subcultures

Master wordlist updated: 590 → 761 unique words and phrases

All terms are historically documented slang from the public record,
synthesized from multiple scholarly and archival sources.

Found and resolved duplicate keys in JSON files where the same English
word appeared multiple times with different replacements. Python's JSON
parser silently keeps only the last occurrence, so these duplicates were
causing data loss.
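
Since `json.loads` gives no warning about repeated keys, detecting them requires seeing the key/value pairs before they are collapsed into a dict. A minimal sketch using the standard `object_pairs_hook` parameter follows; the function name is hypothetical and this is not necessarily how check_redundancies.py works.

```python
import json
from collections import Counter


def find_duplicate_keys(json_text: str) -> list[str]:
    """Return every key that appears more than once within a single JSON object."""
    duplicates: list[str] = []

    def hook(pairs):
        # `pairs` holds all (key, value) tuples of one object, duplicates included.
        counts = Counter(key for key, _ in pairs)
        duplicates.extend(key for key, n in counts.items() if n > 1)
        return dict(pairs)  # same result the default parser would produce

    json.loads(json_text, object_pairs_hook=hook)
    return duplicates


text = '{"substitutions": {"girl": "doll", "money": "bread", "girl": "chick"}}'
print(find_duplicate_keys(text))                   # ['girl']
print(json.loads(text)["substitutions"]["girl"])   # chick
```

The second print shows the data loss in question: the earlier value `"doll"` is gone after an ordinary parse.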

Files cleaned:
- All 25 cultural filter JSON files had duplicates removed
- Most common duplicates: 'girl', 'drunk', 'money', 'nice', 'stylish'

Added diagnostic scripts:
- check_redundancies.py: Detect self-mappings and duplicate keys
- fix_duplicates.py: Auto-fix duplicate keys by keeping last value

Result: All filters now validated clean with no redundancies.

Removed temporary development scripts:
- enhance_filters.py (one-time vocabulary migration tool)

Kept useful maintenance utilities:
- check_redundancies.py: Validates filters for self-mappings and duplicates
- compile_master_list.py: Generates master wordlist from all filters
- fix_duplicates.py: Fixes duplicate keys in JSON files
- remove_redundant_mappings.py: Removes self-mapping entries

Repository now clean and public-ready with 40 JSON filter files and
4 maintenance utilities.

2 participants