The Embedding Service transforms text content into vector embeddings that enable semantic search and similarity comparisons in your RAG implementation.

Overview

Embeddings are numerical representations of text that capture semantic meaning. The application uses these vectors to power search functionality and document comparisons. By default, a local embedding model is used, but you can configure the system to use OpenAI’s embedding API or custom ONNX models.
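Semantic search over these vectors usually reduces to a nearest-neighbor comparison. As a standalone illustration (not part of the service itself), cosine similarity between two embedding vectors can be computed like this:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for real 384-dimensional embeddings.
query = np.array([0.1, 0.3, 0.5, 0.1])
doc_a = np.array([0.1, 0.29, 0.52, 0.09])  # semantically close to the query
doc_b = np.array([0.9, 0.05, 0.01, 0.04])  # semantically distant

print(cosine_similarity(query, doc_a))  # close to 1.0
print(cosine_similarity(query, doc_b))  # much lower
```

Texts with similar meaning map to nearby vectors, so ranking documents by this score surfaces semantically relevant results even when no keywords overlap.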

Quick Setup

For most users, the default embedding configuration works out of the box. You can easily customize it using environment variables in your deployment.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE | all-minilm-l6-v2 | The type of embedding model to use (all-minilm-l6-v2 or onnx) |
| OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED | false | Whether to use the OpenAI API (true) or a local model (false) |
| OPEN_RESPONSES_EMBEDDINGS_API_KEY | (none) | Your OpenAI API key (required when HTTP_ENABLED is true) |
| OPEN_RESPONSES_EMBEDDINGS_MODEL | (none) | The OpenAI model to use (when HTTP_ENABLED is true) |
| OPEN_RESPONSES_EMBEDDINGS_URL | (none) | The base URL for the OpenAI API (when HTTP_ENABLED is true) |
| OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH | (none) | Path to a custom ONNX model file (when MODEL_TYPE is onnx) |
| OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH | (none) | Path to a custom tokenizer JSON file (when MODEL_TYPE is onnx) |
| OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE | mean | Pooling mode for ONNX models: mean, cls, or max |
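The conditional requirements above can be summarized in a small validation sketch. The function and its messages are hypothetical, not part of the service; only the rules themselves come from the table:

```python
def validate_embeddings_config(env: dict) -> list:
    """Return a list of configuration problems, mirroring the rules in the table above."""
    problems = []
    http_enabled = env.get("OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED", "false") == "true"
    model_type = env.get("OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE", "all-minilm-l6-v2")

    if http_enabled and not env.get("OPEN_RESPONSES_EMBEDDINGS_API_KEY"):
        problems.append("API_KEY is required when HTTP_ENABLED is true")
    if model_type == "onnx":
        if not env.get("OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH"):
            problems.append("ONNX_MODEL_PATH is required when MODEL_TYPE is onnx")
        if not env.get("OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH"):
            problems.append("TOKENIZER_PATH is required when MODEL_TYPE is onnx")
    return problems

# Default configuration: no variables set, nothing to complain about.
print(validate_embeddings_config({}))  # []
```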

Embedding Configuration Options

Supported Models

Default Local Model

By default, the application uses the AllMiniLmL6V2 model, which offers:
  • Fast, efficient embedding generation
  • 384-dimensional vectors
  • Good balance of performance and quality
  • No external API dependencies
Example docker-compose setup for the default model:
services:
  app:
    image: masaicai/open-responses:latest
    # No specific embedding environment variables needed for default setup
Or using Docker run command:
docker run -p 8080:8080 masaicai/open-responses:latest

OpenAI Models

For higher quality embeddings, you can use OpenAI’s embedding models:
services:
  app:
    image: masaicai/open-responses:latest
    environment:
      - OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true
      - OPEN_RESPONSES_EMBEDDINGS_API_KEY=your-openai-api-key
      - OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small
Or using Docker run command:
docker run -p 8080:8080 \
  -e OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true \
  -e OPEN_RESPONSES_EMBEDDINGS_API_KEY=your-openai-api-key \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small \
  masaicai/open-responses:latest
Benefits of OpenAI models:
  • Higher quality embeddings
  • More dimensions (1536 for text-embedding-3-small)
  • Better semantic understanding
Trade-offs:
  • Requires internet connectivity
  • Incurs API usage costs
  • Adds network latency

Custom ONNX Models

For advanced users, custom ONNX models can be used:
services:
  app:
    image: masaicai/open-responses:latest
    environment:
      - OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx
      - OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/your-model.onnx
      - OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH=/models/your-tokenizer.json
      - OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE=mean
    volumes:
      - ./models:/models
Or using Docker run command:
docker run -p 8080:8080 \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/your-model.onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH=/models/your-tokenizer.json \
  -e OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE=mean \
  -v $(pwd)/models:/models \
  masaicai/open-responses:latest
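The POOLING_MODE setting controls how per-token model outputs are collapsed into a single sentence vector. A rough sketch of the three modes, using random data in place of real ONNX model outputs:

```python
import numpy as np

# Fake ONNX output: 6 token embeddings of dimension 8 (real models emit e.g. 384 dims).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 8))

mean_pooled = token_embeddings.mean(axis=0)  # average over all tokens
cls_pooled = token_embeddings[0]             # first ([CLS]) token only
max_pooled = token_embeddings.max(axis=0)    # element-wise maximum over tokens

# Each mode yields one fixed-size sentence vector regardless of input length.
assert mean_pooled.shape == cls_pooled.shape == max_pooled.shape == (8,)
```

mean pooling is the most common choice for sentence-similarity models; use cls or max only if your model was trained with that pooling strategy.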

Performance Considerations

OpenAI Models

Higher quality, but they add network latency and API cost

Local Models

Faster and able to work offline, but may produce lower-quality embeddings

Custom ONNX

Flexible and configurable for specific use cases

Embedding generation happens when documents are uploaded and indexed. Vector similarity search performance depends on the vector database implementation in use.
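For small corpora, that similarity search can be as simple as a brute-force scan; dedicated vector databases exist to make the same lookup fast at scale. A minimal sketch of the brute-force case:

```python
import numpy as np

def top_k(query: np.ndarray, doc_vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k documents most similar to the query, by cosine similarity."""
    doc_norm = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = doc_norm @ q_norm          # one cosine score per document
    return np.argsort(scores)[::-1][:k]  # highest scores first

rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 384))               # 100 docs, 384 dims (default model size)
query = docs[42] + 0.01 * rng.normal(size=384)   # near-duplicate of document 42

print(top_k(query, docs, k=3))  # document 42 ranks first
```

This scan is O(number of documents) per query; vector databases trade exactness or memory for sub-linear lookup.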

Troubleshooting

Common Issues

OpenAI API errors:
  • Check that your OPEN_RESPONSES_EMBEDDINGS_API_KEY is correct
  • Verify network connectivity to OPEN_RESPONSES_EMBEDDINGS_URL
  • Confirm your OpenAI account has available quota

Out-of-memory errors with the local model:
  • The default model requires approximately 150MB of RAM
  • Ensure your container has sufficient memory allocated

Custom ONNX model failures:
  • Verify file paths are correct and the files are accessible
  • Ensure your model is compatible with the application
  • Check logs for specific error messages
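When OpenAI mode fails, it can help to test the key outside the container first. For example, a direct call to OpenAI's embeddings endpoint (substitute your real key for the environment variable):

```shell
# Call the OpenAI embeddings API directly to confirm the key and quota work.
curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "connectivity test"}'
```

A JSON response containing an "embedding" array means the key and quota are fine, and the problem lies in the container's configuration or networking.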

Further Resources