Custom Models

Learn how to use custom models with MIRIX through vLLM or other OpenAI-compatible API endpoints. This guide covers both the end-user (Desktop App) and developer (Backend API) approaches.

Overview

MIRIX supports custom models served through vLLM or any other OpenAI-compatible API endpoint. This allows you to:

  • Use your own fine-tuned models
  • Deploy models on your own infrastructure
  • Integrate with alternative model providers
  • Optimize for specific use cases and hardware

Choose your approach:

  • End-Users - Use the Desktop App's graphical interface
  • Developers - Use configuration files and Python API

For End-Users (Desktop App)

If you're using the MIRIX Desktop App, you can easily add custom models through the graphical interface without any configuration files.

Step 1: Set Up the vLLM Model Server

First, serve your model using vLLM. This step is the same for both end-users and developers:

# Example: Serving Qwen3-32B model with multi-GPU setup
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python3 -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-32B \
  --served-model-name qwen3-32b \
  --port 8001 \
  --tensor-parallel-size 4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9

vLLM Parameters Explained:

| Parameter | Description | Example Value |
|---|---|---|
| `CUDA_VISIBLE_DEVICES` | Specifies which GPUs to use | `0,1,2,3` |
| `--model` | HuggingFace model path or local model path | `Qwen/Qwen3-32B` |
| `--served-model-name` | Custom name for the served model | `qwen3-32b` |
| `--port` | Port to serve the API on | `8001` |
| `--tensor-parallel-size` | Number of GPUs for tensor parallelism | `4` |
| `--max-model-len` | Maximum sequence length in tokens | `32768` |
| `--gpu-memory-utilization` | GPU memory usage ratio (0.0-1.0) | `0.9` |
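
Before adding the model in MIRIX, it helps to confirm the server is actually reachable. The snippet below is a minimal sketch that queries the standard OpenAI-compatible /v1/models endpoint using only the Python standard library; the host and port match the example command above.

# Sanity check: list the models the vLLM server is serving.
# Standard library only; adjust host/port if you changed them above.
import json
from urllib.request import urlopen

with urlopen("http://localhost:8001/v1/models") as resp:
    data = json.load(resp)

# The id should match --served-model-name (e.g., "qwen3-32b").
print([m["id"] for m in data.get("data", [])])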

Step 2: Add Model in Desktop App

Once your vLLM server is running, add the model through the MIRIX Desktop App interface:

[Screenshot: "Add Local Model" dialog]

  1. Open Settings - Navigate to Settings → Model Configuration
  2. Add Local Model - Click the "Add" button next to either Chat Agent Model or Memory Manager Model
  3. Fill in Model Details in the "Add Local Model" dialog:

| Field | Description | Example |
|---|---|---|
| Model Name | The name identifier for your deployed model | `qwen3-32b` |
| Model Endpoint | The API endpoint URL for your vLLM server | `http://localhost:8001/v1` |
| Temperature | Controls randomness in responses (0.0-2.0) | `0.7` |
| Max Tokens | Maximum number of tokens to generate | `4096` |
| Maximum Length | Maximum context length supported | `32768` |

  4. Add Model - Click "Add Model" to save the configuration
  5. Select Model - Choose your newly added model from the dropdown
  6. Start Chatting - Your custom model is now ready to use!

UI Configuration Tips

Model Endpoint Format

Make sure your endpoint URL includes the /v1 suffix:

  • ✅ Correct: http://localhost:8001/v1
  • ❌ Incorrect: http://localhost:8001

Parameter Guidelines

  • Temperature: Start with 0.7 for balanced responses
  • Max Tokens: Match your typical response length needs
  • Maximum Length: Should match your vLLM --max-model-len setting
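
To see how these values behave before committing them in the UI, you can send a one-off request directly to the server. This is a minimal sketch against the standard OpenAI-compatible /v1/chat/completions endpoint; the model name and generation parameters mirror the examples above.

# One-off chat completion to sanity-check temperature/token settings.
# Standard library only; values mirror the UI examples above.
import json
from urllib.request import Request, urlopen

payload = {
    "model": "qwen3-32b",  # must match --served-model-name
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "temperature": 0.7,
    "max_tokens": 128,
}
req = Request(
    "http://localhost:8001/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])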

For Developers (Backend API)

If you're using MIRIX programmatically through Python, you can configure custom models using YAML configuration files.

Step 1: Set Up the vLLM Model Server

Use the same vLLM server setup as shown in the End-Users section above.

Step 2: Create Custom Model Configuration

Create a custom configuration file in your configs/ directory (e.g., configs/mirix_custom_model.yaml):

agent_name: mirix
model_name: qwen3-32b
model_endpoint: http://localhost:8001/v1
generation_config:
  temperature: 0.6
  max_tokens: 4096
  context_window: 32768

Step 3: Configuration Parameters

| Parameter | Description | Example |
|---|---|---|
| `agent_name` | Name identifier for your agent | `mirix` |
| `model_name` | Model name as served by vLLM | `qwen3-32b` |
| `model_endpoint` | API endpoint URL | `http://localhost:8001/v1` |
| `temperature` | Controls response randomness (0.0-1.0) | `0.6` |
| `max_tokens` | Maximum tokens in response | `4096` |
| `context_window` | Maximum context length | `32768` |
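
A typo in the YAML often surfaces only when the agent first calls the model, so it can be worth validating the file up front. The sketch below assumes PyYAML is installed; the required-key list mirrors the table above and is an illustration, not MIRIX's official schema.

# Illustrative pre-flight check for a model config file.
# Assumes PyYAML (pip install pyyaml); the key list is an assumption
# based on the table above, not MIRIX's official schema.
import yaml

with open("./configs/mirix_custom_model.yaml") as f:
    cfg = yaml.safe_load(f)

for key in ("agent_name", "model_name", "model_endpoint", "generation_config"):
    assert key in cfg, f"missing top-level key: {key}"

gen = cfg["generation_config"]
assert 0.0 <= gen["temperature"] <= 1.0, "temperature out of range"
assert gen["max_tokens"] <= gen["context_window"], "max_tokens exceeds context window"
print("Config looks sane:", cfg["model_name"], "->", cfg["model_endpoint"])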

Step 4: Initialize Agent with Custom Model

from mirix.agent import AgentWrapper

# Initialize agent with custom model configuration
agent = AgentWrapper("./configs/mirix_custom_model.yaml")

Supported Model Types

MIRIX supports any model that provides an OpenAI-compatible API, including:

  • Local Models - Models served locally with vLLM, Ollama, or other servers
  • Self-hosted Models - Models deployed on your own infrastructure
  • Custom Endpoints - Any OpenAI-compatible API endpoint
  • Different Providers - Alternative model providers with OpenAI-compatible APIs
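
As a concrete illustration, the same YAML shape from Step 2 can point at any of these servers. The config below is a hypothetical sketch for a local Ollama instance on its default port (Ollama exposes an OpenAI-compatible API under /v1); the model name is whatever you have pulled locally.

# Hypothetical: pointing MIRIX at a local Ollama server instead of vLLM.
agent_name: mirix_ollama
model_name: llama3  # any model pulled with `ollama pull`
model_endpoint: http://localhost:11434/v1
generation_config:
  temperature: 0.7
  max_tokens: 4096
  context_window: 8192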

Complete Example: Developer Workflow

Here's a complete end-to-end example for developers using the Python API:

# 1. Start vllm server (in terminal)
# CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m vllm.entrypoints.openai.api_server \
#   --model Qwen/Qwen3-32B --served-model-name qwen3-32b --port 8001 \
#   --tensor-parallel-size 4 --max-model-len 32768 --gpu-memory-utilization 0.9

# 2. Create configuration file
custom_config = """
agent_name: mirix_qwen
model_name: qwen3-32b
model_endpoint: http://localhost:8001/v1
generation_config:
  temperature: 0.7
  max_tokens: 8192
  context_window: 32768
"""

# Save it to configs/mirix_custom_model.yaml
import os
os.makedirs("./configs", exist_ok=True)
with open("./configs/mirix_custom_model.yaml", "w") as f:
    f.write(custom_config)

# 3. Initialize and use
from mirix.agent import AgentWrapper

agent = AgentWrapper("./configs/mirix_custom_model.yaml")
response = agent.send_message("Hello! What model are you running on?")
print("Custom Model Response:", response)

Advanced Configuration

Multiple Model Configurations

You can create different configuration files for different models:

configs/
├── mirix_qwen_32b.yaml      # Large model for complex tasks
├── mirix_qwen_7b.yaml       # Smaller model for quick responses
└── mirix_custom_local.yaml  # Local fine-tuned model
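
With several configs on disk, a small dispatch layer keeps call sites clean. This is an illustrative sketch: the file names are the hypothetical ones from the tree above, and AgentWrapper / send_message are the same entry points shown earlier.

# Illustrative: route tasks to different model configurations.
# File names match the hypothetical tree above.
from mirix.agent import AgentWrapper

CONFIGS = {
    "complex": "./configs/mirix_qwen_32b.yaml",  # large model for hard tasks
    "quick": "./configs/mirix_qwen_7b.yaml",     # smaller model for fast replies
    "local": "./configs/mirix_custom_local.yaml",
}

def get_agent(task_profile: str) -> AgentWrapper:
    """Build an agent backed by the model suited to this task profile."""
    return AgentWrapper(CONFIGS[task_profile])

agent = get_agent("quick")
print(agent.send_message("Summarize MIRIX custom model support in one line."))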

Model-Specific Optimizations

Different models may require different configurations:

# For code generation models
agent_name: mirix_code
model_name: codellama-34b
model_endpoint: http://localhost:8002/v1
generation_config:
  temperature: 0.1  # Lower temperature for code
  max_tokens: 8192
  context_window: 16384

---
# For creative writing models
agent_name: mirix_creative
model_name: llama2-70b
model_endpoint: http://localhost:8003/v1
generation_config:
  temperature: 0.9  # Higher temperature for creativity
  max_tokens: 4096
  context_window: 32768

Troubleshooting

Common Issues

Connection refused errors

Check if your vLLM server is running:

# Check if the port is in use
lsof -i :8001

# Check vllm server logs
tail -f /path/to/vllm/logs

Out of memory errors

Adjust GPU memory utilization:

# Reduce memory utilization
--gpu-memory-utilization 0.8

# Use smaller tensor parallel size
--tensor-parallel-size 2

Model loading errors

Verify model path and permissions:

# Check if model exists locally
ls -la /path/to/local/model/

# Check HuggingFace cache
ls ~/.cache/huggingface/hub/
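
If you are unsure whether the weights ever finished downloading, huggingface_hub can inspect the cache programmatically. A small sketch, assuming huggingface_hub is installed (vLLM depends on it):

# List models present in the local Hugging Face cache and their sizes.
from huggingface_hub import scan_cache_dir

cache = scan_cache_dir()
for repo in cache.repos:
    print(repo.repo_id, f"{repo.size_on_disk / 1e9:.1f} GB")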

Performance Optimization

GPU Configuration

For optimal performance:

  • Use all available GPUs with tensor parallelism
  • Set --gpu-memory-utilization to 0.85-0.95
  • Match --max-model-len to your actual needs

Memory Management

Monitor system resources:

# Monitor GPU usage
nvidia-smi -l 1

# Monitor system memory
htop

Important Considerations

Model Compatibility

Ensure your custom model is compatible with the tasks you want to perform. Some models may work better for chat, while others excel at specific tasks like code generation or analysis.

Performance Considerations

Local model performance depends on your hardware. Consider factors like:

  • GPU memory requirements
  • Model size vs. inference speed
  • Context window vs. memory usage
  • Network latency for remote endpoints

Security

When using custom endpoints:

  • Ensure secure connections (HTTPS) for remote servers
  • Validate API authentication if required
  • Monitor for unusual usage patterns
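
If your server requires a key (for example, vLLM started with its --api-key option), you can verify the credential works with a direct request before configuring MIRIX. A minimal sketch, assuming a Bearer token, a hypothetical remote URL, and an environment variable named VLLM_API_KEY:

# Minimal sketch: verify an authenticated OpenAI-compatible endpoint.
# Assumes the server checks a Bearer token; the URL below is hypothetical.
import json
import os
from urllib.request import Request, urlopen

req = Request(
    "https://your-server.example.com/v1/models",
    headers={"Authorization": f"Bearer {os.environ['VLLM_API_KEY']}"},
)
with urlopen(req) as resp:  # urllib raises HTTPError 401/403 if the key is rejected
    print(json.load(resp))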

Quick Reference

Which Approach Should I Use?

| Use Case | Recommended Approach | Why |
|---|---|---|
| Regular MIRIX Desktop App user | End-Users (Desktop App) | Easy graphical interface, no configuration files needed |
| Python developer building integrations | Developers (Backend API) | Programmatic control, configuration files, automation |
| Both desktop and programmatic use | Start with the Desktop App, then use the Backend API | Best of both worlds |

Essential Steps (Both Approaches)

  1. Start the vLLM server with your model
  2. Configure in MIRIX (UI dialog OR YAML file)
  3. Start using your custom model

What's Next?

Ready to dive deeper into MIRIX development?

Performance →