The Embedding Service transforms text content into vector embeddings that enable semantic search and similarity comparisons in your RAG implementation.

Overview

Embeddings are numerical representations of text that capture semantic meaning. The application uses these vectors to power search functionality and document comparisons. By default, a local embedding model is used, but you can configure the system to use OpenAI’s embedding API or custom ONNX models.
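Semantic search over these vectors usually reduces to a nearest-neighbor comparison. As a standalone illustration (not part of the service itself), cosine similarity between two embedding vectors can be computed like this:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional vectors standing in for real 384-dimensional embeddings.
query = np.array([0.1, 0.3, 0.5, 0.1])
doc_a = np.array([0.1, 0.29, 0.52, 0.09])  # semantically close to the query
doc_b = np.array([0.9, 0.05, 0.01, 0.04])  # semantically distant

print(cosine_similarity(query, doc_a))  # close to 1.0
print(cosine_similarity(query, doc_b))  # much lower
```

Texts with similar meaning map to nearby vectors, so ranking documents by this score surfaces semantically relevant results even when no keywords overlap.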

Quick Setup

For most users, the default embedding configuration works out of the box. You can easily customize it using environment variables in your deployment.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE | all-minilm-l6-v2 | The type of embedding model to use (all-minilm-l6-v2 or onnx) |
| OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED | false | Whether to use the OpenAI API (true) or a local model (false) |
| OPEN_RESPONSES_EMBEDDINGS_API_KEY | (none) | Your OpenAI API key (required when HTTP_ENABLED is true) |
| OPEN_RESPONSES_EMBEDDINGS_MODEL | (none) | The OpenAI model to use (when HTTP_ENABLED is true) |
| OPEN_RESPONSES_EMBEDDINGS_URL | (none) | The base URL for the OpenAI API (when HTTP_ENABLED is true) |
| OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH | (none) | Path to a custom ONNX model file (when MODEL_TYPE is onnx) |
| OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH | (none) | Path to a custom tokenizer JSON file (when MODEL_TYPE is onnx) |
| OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE | mean | Pooling mode for ONNX models: mean, cls, or max |
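The conditional requirements above can be summarized in a small validation sketch. The function and its messages are hypothetical, not part of the service; only the rules themselves come from the table:

```python
def validate_embeddings_config(env: dict) -> list:
    """Return a list of configuration problems, mirroring the rules in the table above."""
    problems = []
    http_enabled = env.get("OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED", "false") == "true"
    model_type = env.get("OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE", "all-minilm-l6-v2")

    if http_enabled and not env.get("OPEN_RESPONSES_EMBEDDINGS_API_KEY"):
        problems.append("API_KEY is required when HTTP_ENABLED is true")
    if model_type == "onnx":
        if not env.get("OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH"):
            problems.append("ONNX_MODEL_PATH is required when MODEL_TYPE is onnx")
        if not env.get("OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH"):
            problems.append("TOKENIZER_PATH is required when MODEL_TYPE is onnx")
    return problems

# Default configuration: no variables set, nothing to complain about.
print(validate_embeddings_config({}))  # []
```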

Embedding Configuration Options

Supported Models

Default Local Model

By default, the application uses the AllMiniLmL6V2 model, which offers:
  • Fast, efficient embedding generation
  • 384-dimensional vectors
  • Good balance of performance and quality
  • No external API dependencies
Example docker-compose setup for the default model:
services:
  app:
    image: masaicai/open-responses:latest
    # No specific embedding environment variables needed for default setup
Or using Docker run command:
docker run -p 8080:8080 masaicai/open-responses:latest

OpenAI Models

For higher quality embeddings, you can use OpenAI’s embedding models:
services:
  app:
    image: masaicai/open-responses:latest
    environment:
      - OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true
      - OPEN_RESPONSES_EMBEDDINGS_API_KEY=your-openai-api-key
      - OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small
Or using Docker run command:
docker run -p 8080:8080 \
  -e OPEN_RESPONSES_EMBEDDINGS_HTTP_ENABLED=true \
  -e OPEN_RESPONSES_EMBEDDINGS_API_KEY=your-openai-api-key \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL=text-embedding-3-small \
  masaicai/open-responses:latest
Benefits of OpenAI models:
  • Higher quality embeddings
  • More dimensions (1536 for text-embedding-3-small)
  • Better semantic understanding
Trade-offs:
  • Requires internet connectivity
  • Incurs API usage costs
  • Adds network latency

Custom ONNX Models

For advanced users, custom ONNX models can be used:
services:
  app:
    image: masaicai/open-responses:latest
    environment:
      - OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx
      - OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/your-model.onnx
      - OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH=/models/your-tokenizer.json
      - OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE=mean
    volumes:
      - ./models:/models
Or using Docker run command:
docker run -p 8080:8080 \
  -e OPEN_RESPONSES_EMBEDDINGS_MODEL_TYPE=onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_ONNX_MODEL_PATH=/models/your-model.onnx \
  -e OPEN_RESPONSES_EMBEDDINGS_TOKENIZER_PATH=/models/your-tokenizer.json \
  -e OPEN_RESPONSES_EMBEDDINGS_POOLING_MODE=mean \
  -v $(pwd)/models:/models \
  masaicai/open-responses:latest
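The POOLING_MODE setting controls how per-token model outputs are collapsed into a single sentence vector. A rough sketch of the three modes, using random data in place of real ONNX model outputs:

```python
import numpy as np

# Fake ONNX output: 6 token embeddings of dimension 8 (real models emit e.g. 384 dims).
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 8))

mean_pooled = token_embeddings.mean(axis=0)  # average over all tokens
cls_pooled = token_embeddings[0]             # first ([CLS]) token only
max_pooled = token_embeddings.max(axis=0)    # element-wise maximum over tokens

# Each mode yields one fixed-size sentence vector regardless of input length.
assert mean_pooled.shape == cls_pooled.shape == max_pooled.shape == (8,)
```

mean pooling is the most common choice for sentence-similarity models; use cls or max only if your model was trained with that pooling strategy.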

Performance Considerations

OpenAI Models

Higher quality, but they add network latency and API cost

Local Models

Faster and able to work offline, but may produce lower-quality embeddings

Custom ONNX

Flexible and configurable for specific use cases

Embedding generation happens when documents are uploaded and indexed. Vector similarity search performance depends on the vector database implementation in use.
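For small corpora, that similarity search can be as simple as a brute-force scan; dedicated vector databases exist to make the same lookup fast at scale. A minimal sketch of the brute-force case:

```python
import numpy as np

def top_k(query: np.ndarray, doc_vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k documents most similar to the query, by cosine similarity."""
    doc_norm = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    scores = doc_norm @ q_norm          # one cosine score per document
    return np.argsort(scores)[::-1][:k]  # highest scores first

rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 384))               # 100 docs, 384 dims (default model size)
query = docs[42] + 0.01 * rng.normal(size=384)   # near-duplicate of document 42

print(top_k(query, docs, k=3))  # document 42 ranks first
```

This scan is O(number of documents) per query; vector databases trade exactness or memory for sub-linear lookup.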

Troubleshooting

Common Issues

OpenAI API errors:
  • Check that your OPEN_RESPONSES_EMBEDDINGS_API_KEY is correct
  • Verify network connectivity to OPEN_RESPONSES_EMBEDDINGS_URL
  • Confirm your OpenAI account has available quota

Out-of-memory errors with the local model:
  • The default model requires approximately 150MB of RAM
  • Ensure your container has sufficient memory allocated

Custom ONNX model failures:
  • Verify file paths are correct and the files are accessible
  • Ensure your model is compatible with the application
  • Check logs for specific error messages
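When OpenAI mode fails, it can help to test the key outside the container first. For example, a direct call to OpenAI's embeddings endpoint (substitute your real key for the environment variable):

```shell
# Call the OpenAI embeddings API directly to confirm the key and quota work.
curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "connectivity test"}'
```

A JSON response containing an "embedding" array means the key and quota are fine, and the problem lies in the container's configuration or networking.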

Further Resources