EmbeddingGenerator.set_text_model does not update the model; method remains "fastembed" #160
Hi! We noticed that when using `TextEmbedder` directly, everything works as expected:

```python
from semantica.embeddings import TextEmbedder

generator = TextEmbedder(method="sentence_transformers", model_name="dangvantuan/sentence-camembert-base")
print(f"Current method: {generator.get_method()}")

# Prepare texts for embedding generation
print("Preparing texts for embedding...")
texts = [
    "Un avion est en train de décoller.",
    "Un homme joue d'une grande flûte.",
    "Un homme étale du fromage râpé sur une pizza.",
    "Une personne jette un chat au plafond.",
    "Une personne est en train de plier un morceau de papier.",
]

try:
    print(f"methods info : {generator.get_model_info()}")
    # Generate embeddings using the configured embedding generator
    print("Generating embeddings...")
    embeddings = generator.embed_batch(texts, show_progress_bar=True)
    print("Embeddings generated successfully:")
    print(f" - Total embeddings: {len(embeddings)}")
    print(f" - Embedding dimension: {embeddings.shape[1] if len(embeddings) > 0 else 0}")
    print(embeddings)
except ImportError:
    print("Error occurred")
```

Output:

Current method: sentence_transformers
Preparing texts for embedding...
methods info : {'method': 'sentence_transformers', 'model_name': 'dangvantuan/sentence-camembert-base', 'model_loaded': True, 'dimension': 768, 'normalize': True, 'device': 'cpu'}
Generating embeddings...
Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.46it/s]
Embeddings generated successfully:
- Total embeddings: 5
 - Embedding dimension: 768

However, when going through the `Semantica` core with a config dict, the configured model is not applied:

```python
config_dict = {
    "project_name": "Semantica_Test_Project",
    # Embedding configuration
    "embedding": {
        "provider": "sentence_transformers",
        "model": "dangvantuan/sentence-camembert-base",  # 768-dimensional embeddings
    },
    # Extraction configuration (for NER and relation extraction)
    "extraction": {
        "provider": "groq",
        "model": "llama-3.1-8b-instant",
        "temperature": 0.0,  # Deterministic extraction
    },
    # Inference configuration (for answer generation)
    "inference": {
        "provider": "groq",
        "model": "llama-3.3-70b-versatile",
    },
    # Vector store configuration
    "vector_store": {
        "provider": "faiss",
        "dimension": 768,  # Must match embedding dimension
    },
    # Knowledge graph configuration
    "knowledge_graph": {
        "backend": "networkx",
        "merge_entities": True,  # Automatically merge duplicate entities
    },
}

from semantica.core import Semantica, ConfigManager

# Load configuration and initialize Semantica core
config = ConfigManager().load_from_dict(config_dict)
core = Semantica(config=config)
print(f"Config directory : {config_dict}\n")

print("Preparing texts for embedding...")
texts = [
    "Un avion est en train de décoller.",
    "Un homme joue d'une grande flûte.",
    "Un homme étale du fromage râpé sur une pizza.",
    "Une personne jette un chat au plafond.",
    "Une personne est en train de plier un morceau de papier.",
]

core.embedding_generator.set_text_model(method="sentence_transformers", model_name="dangvantuan/sentence-camembert-base")
print(f" - Total chunks to embed: {len(texts)}")
print(f" - Text method : {core.embedding_generator.get_text_method()}")
print(f" - Methods info: {core.embedding_generator.get_methods_info()}")
print(f" - Expected dimension: 768")
print("\n")

embeddings = core.embedding_generator.generate_embeddings(texts, data_type="text")
print("Embeddings generated successfully:")
print(f" - Total embeddings: {len(embeddings)}")
print(f" - Embedding dimension: {embeddings.shape[1] if len(embeddings) > 0 else 0}")
print("\n")
print(embeddings)
```

Output:

Config directory : {'project_name': 'Semantica_Test_Project', 'embedding': {'provider': 'sentence_transformers', 'model': 'dangvantuan/sentence-camembert-base'}, 'extraction': {'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'temperature': 0.0}, 'inference': {'provider': 'groq', 'model': 'llama-3.3-70b-versatile'}, 'vector_store': {'provider': 'faiss', 'dimension': 768}, 'knowledge_graph': {'backend': 'networkx', 'merge_entities': True}}
Preparing texts for embedding...
- Total chunks to embed: 5
- Text method : fastembed
- Methods info: {'text': {'method': 'fastembed', 'model_name': 'dangvantuan/sentence-camembert-base', 'model_loaded': True, 'dimension': 384, 'normalize': True}}
- Expected dimension: 768
Embeddings generated successfully:
- Total embeddings: 5
- Embedding dimension: 384
Even though the expected embedding dimension is 768, the generator still produces embeddings of dimension 384, and the method remains "fastembed" despite the call to `set_text_model`. We looked inside the code to try to understand the cause.
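For anyone hitting the same symptom, a small defensive check makes the mismatch fail fast instead of silently indexing 384-dimensional vectors into a 768-dimensional FAISS store. This is a plain-Python sketch; `check_dimension` is a hypothetical helper, not part of semantica:

```python
def check_dimension(embeddings, expected_dim):
    """Raise early if produced vectors do not match the configured dimension."""
    actual = len(embeddings[0])
    if actual != expected_dim:
        raise ValueError(
            f"Embedding dimension mismatch: got {actual}, expected {expected_dim}. "
            "The embedding backend may not have picked up the configured model."
        )
    return True

# Simulate what the buggy config path produced: 384-dim fastembed vectors
vectors = [[0.1] * 384 for _ in range(5)]
try:
    check_dimension(vectors, 768)
except ValueError as e:
    print(e)
```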
Replies: 1 comment 2 replies
Hi @MoktarEls! Thank you so much for the detailed bug report; your investigation was incredibly helpful in identifying the root cause!

**The Fix**

We've resolved the issue in the `embeddings` branch (associated with PR #160). The `TextEmbedder` now correctly resets its internal state and dynamically detects embedding dimensions when switching models.

**Example**

```python
# Switch to sentence_transformers
core.embedding_generator.set_text_model(
    method="sentence_transformers",
    model_name="dangvantuan/sentence-camembert-base"
)

# It now correctly reflects the change:
info = core.embedding_generator.get_methods_info()
print(f"Method: {info['text']['method']}")        # sentence_transformers
print(f"Dimension: {info['text']['dimension']}")  # 768
```

**How to Verify**

You can install the latest changes directly with pip:

```shell
pip install git+https://github.com/Hawksight-AI/semantica.git@main
```

Alternatively, you can clone the repository and switch to the branch:

```shell
git clone https://github.com/Hawksight-AI/semantica.git
git checkout embeddings
```

We've also included a new test suite covering this behavior. All these fixes will be officially released in the next version of `semantica`. Thanks again for helping us improve the project!
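For readers curious what "resets its internal state" means in practice, here is a minimal sketch of the pattern (the `TextEmbedderSketch` class and its model names are hypothetical illustrations, not the actual semantica code): switching models must replace every piece of cached state, including the loaded backend, so later queries reflect the new configuration rather than the one set at construction time.

```python
class TextEmbedderSketch:
    """Illustrative only: shows the state-reset pattern, not semantica's code."""

    def __init__(self, method, model_name, dimension):
        self._configure(method, model_name, dimension)

    def _configure(self, method, model_name, dimension):
        self.method = method
        self.model_name = model_name
        # In the real fix the dimension is detected dynamically from the
        # loaded model; here it is passed in to keep the sketch self-contained.
        self.dimension = dimension
        self._model = None  # drop any previously loaded backend

    def set_text_model(self, method, model_name, dimension):
        # The reported bug pattern: storing only the model_name while keeping
        # the old method and cached backend. The fix: reset ALL cached state.
        self._configure(method, model_name, dimension)

    def get_methods_info(self):
        return {"text": {"method": self.method,
                         "model_name": self.model_name,
                         "dimension": self.dimension}}

emb = TextEmbedderSketch("fastembed", "default-fastembed-model", 384)
emb.set_text_model("sentence_transformers", "dangvantuan/sentence-camembert-base", 768)
info = emb.get_methods_info()
print(info["text"]["method"])     # sentence_transformers
print(info["text"]["dimension"])  # 768
```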