Search Capabilities
MIRIX provides three search methods for retrieving information from its five memory components (Procedural, Knowledge Vault, Episodic, Semantic, Resource). PostgreSQL-native full-text search is the primary implementation, chosen for performance and scalability.
Search Methods Overview
MIRIX supports three distinct search methods, each optimized for different use cases:
Method | Description | Best For | Performance |
---|---|---|---|
bm25 | RECOMMENDED - PostgreSQL native full-text search | Most queries, production use | Excellent |
embedding | Vector similarity search using embeddings | Semantic similarity, conceptual queries | Good |
string_match | Simple string containment search | Exact text matching | Fast |
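To make the contrast between string_match and embedding concrete, here is a minimal Python sketch of the two underlying ideas: substring containment and cosine similarity between vectors. The function names are hypothetical, case-insensitive matching is an assumption, and real embeddings would come from an embedding model rather than hand-written lists. This is an illustration, not MIRIX's implementation.

```python
import math


def string_match(query: str, text: str) -> bool:
    """Substring containment (case-insensitive here, which is an assumption)."""
    return query.lower() in text.lower()


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)
```

string_match only finds text that literally contains the query, while a similarity score can rank conceptually related items that share no words with it.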
PostgreSQL Native BM25 Implementation
Architecture
All 5 memory managers (Procedural, Knowledge Vault, Episodic, Semantic, Resource) use PostgreSQL's native ts_rank_cd function for BM25-like scoring directly in the database.
graph TB
A[Search Query] --> B[Text Cleaning & Tokenization]
B --> C[TSQuery Generation with Prefix Matching]
C --> D[AND Query Attempt]
D --> E{Sufficient Results?}
E -->|Yes| F[Return AND Results]
E -->|No| G[OR Query Fallback]
G --> H[GIN Index Lookup]
H --> I[ts_rank_cd Scoring]
I --> J[Memory-Type-Specific Field Processing]
J --> K[Ranked Results]
Key Technical Features
Smart Query Preprocessing
- Text Cleaning: Removes punctuation, normalizes whitespace, converts to lowercase
- Tokenization: Splits into meaningful tokens, filters out very short words
- TSQuery Creation: Generates both exact and prefix matching queries for better recall
- Special Character Escaping: Safely handles PostgreSQL tsquery special characters
# Example preprocessing flow
"machine-learning AI!" → "machine learning ai" → ["machine", "learning", "ai"]
→ "(machine | machine:*) & (learning | learning:*) & (ai | ai:*)"
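The preprocessing flow above can be sketched in Python as follows. The function names and the minimum token length of 2 are assumptions made for illustration (the example keeps "ai", so the real threshold cannot be higher than 2); the actual MIRIX code may differ.

```python
import re


def clean_and_tokenize(text: str, min_length: int = 2) -> list[str]:
    """Lowercase, replace punctuation with spaces, and drop very short tokens."""
    # Anything that is not a word character or whitespace (plus "_") becomes a space.
    cleaned = re.sub(r"[^\w\s]|_", " ", text.lower())
    return [token for token in cleaned.split() if len(token) >= min_length]


def build_tsquery(tokens: list[str], operator: str = "&") -> str:
    """Combine tokens into a tsquery string with exact and prefix alternatives."""
    parts = [f"({token} | {token}:*)" for token in tokens]
    return f" {operator} ".join(parts)


tokens = clean_and_tokenize("machine-learning AI!")
and_query = build_tsquery(tokens, "&")
or_query = build_tsquery(tokens, "|")
```

Because punctuation is removed before the query string is assembled, tsquery operators typed by the user never reach to_tsquery in this sketch.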
Advanced Query Logic with Fallback
-- Step 1: Try precise AND query for high precision
SELECT *, ts_rank_cd(tsvector_field, to_tsquery('english', :and_query), 32) as rank
FROM memory_table
WHERE tsvector_field @@ to_tsquery('english', :and_query)
ORDER BY rank DESC, created_at DESC
LIMIT 50;
-- Step 2: If insufficient results, fallback to OR query for better recall
SELECT *, ts_rank_cd(tsvector_field, to_tsquery('english', :or_query), 32) as rank
FROM memory_table
WHERE tsvector_field @@ to_tsquery('english', :or_query)
ORDER BY rank DESC, created_at DESC
LIMIT 50;
-- Step 3: Final fallback to a case-insensitive substring match if the tsquery fails
SELECT * FROM memory_table
WHERE lower(field) LIKE '%' || lower(:query) || '%'
ORDER BY created_at DESC
LIMIT 50;
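The control flow around these three queries might look like the sketch below. It assumes each query is wrapped in a zero-argument callable and that "sufficient" means at least min_results rows; both the function name and that threshold are hypothetical, since the text does not say how many results count as sufficient.

```python
from typing import Any, Callable, Sequence

Rows = Sequence[Any]


def search_with_fallback(
    run_and: Callable[[], Rows],
    run_or: Callable[[], Rows],
    run_like: Callable[[], Rows],
    min_results: int = 1,
) -> tuple[str, list[Any]]:
    """Try the AND query, then the OR query, then the substring match."""
    try:
        rows = list(run_and())
        if len(rows) >= min_results:
            return "and", rows
        # Too few precise matches: widen the search for better recall.
        return "or", list(run_or())
    except Exception:
        # Assumption: any error means the tsquery failed. Real code would
        # catch the database driver's specific exception class instead.
        return "like", list(run_like())
```

Returning the name of the strategy alongside the rows makes it easy to log how often each fallback is actually used.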
Memory-Type-Specific Field Implementation
Each memory type has different searchable fields and special handling:
Procedural Memory Fields:
- summary: Process description - to_tsvector('english', coalesce(summary, ''))
- steps: JSON array with special processing - to_tsvector('english', coalesce(regexp_replace(steps::text, '[\"\\[\\],]', ' ', 'g'), ''))
- entry_type: Type of procedure - to_tsvector('english', coalesce(entry_type, ''))
Knowledge Vault Fields:
- caption: Description of stored item - to_tsvector('english', coalesce(caption, ''))
- secret_value: Actual stored value - to_tsvector('english', coalesce(secret_value, ''))
- Additional: entry_type, source, sensitivity (with filtering support)
Episodic Memory Fields:
- summary: Brief event description - to_tsvector('english', coalesce(summary, ''))
- details: Comprehensive event information - to_tsvector('english', coalesce(details, ''))
- actor: Who performed the action - to_tsvector('english', coalesce(actor, ''))
- event_type: Category of event - to_tsvector('english', coalesce(event_type, ''))
Semantic Memory Fields:
- name: Concept or object name - to_tsvector('english', coalesce(name, ''))
- summary: Concise explanation - to_tsvector('english', coalesce(summary, ''))
- details: Extended description - to_tsvector('english', coalesce(details, ''))
- source: Knowledge origin - to_tsvector('english', coalesce(source, ''))
Resource Memory Fields:
- title: Resource name - to_tsvector('english', coalesce(title, ''))
- summary: Brief description - to_tsvector('english', coalesce(summary, ''))
- content: Full document content - to_tsvector('english', coalesce(content, ''))
- resource_type: File format type - to_tsvector('english', coalesce(resource_type, ''))
Special JSON Array Handling (Procedural Memory Only):
-- Transforms: ["step 1", "step 2", "step 3"]
-- Into: "step 1 step 2 step 3" for full-text search
regexp_replace(steps::text, '[\"\\[\\],]', ' ', 'g')
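For readers more comfortable with Python than SQL, the same transformation can be sketched with the re module. The helper name flatten_steps is hypothetical, and the final whitespace collapse is added only for readability; to_tsvector ignores extra whitespace anyway.

```python
import json
import re


def flatten_steps(steps: list[str]) -> str:
    """Turn a JSON array of steps into plain text for full-text search."""
    steps_text = json.dumps(steps, ensure_ascii=False)
    # Replace JSON punctuation (quotes, brackets, commas) with spaces.
    flattened = re.sub(r'["\[\],]', " ", steps_text)
    # Collapse runs of whitespace left behind by the replacement.
    return " ".join(flattened.split())


print(flatten_steps(["step 1", "step 2", "step 3"]))  # step 1 step 2 step 3
```

Note that commas inside a step's own text are also replaced, which is harmless for search because punctuation is not indexed.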
Field Weighting System (Multi-Field Search Only)
When no specific search_field is provided, each memory type uses different field weighting:
Procedural Memory:
setweight(to_tsvector('english', coalesce(summary, '')), 'A') ||
setweight(to_tsvector('english', coalesce(regexp_replace(steps::text, '[\"\\[\\],]', ' ', 'g'), '')), 'B') ||
setweight(to_tsvector('english', coalesce(entry_type, '')), 'C')
Knowledge Vault (2-tier weighting):
setweight(to_tsvector('english', coalesce(caption, '')), 'A') ||
setweight(to_tsvector('english', coalesce(secret_value, '')), 'B')
Episodic Memory:
setweight(to_tsvector('english', coalesce(summary, '')), 'A') ||
setweight(to_tsvector('english', coalesce(details, '')), 'B') ||
setweight(to_tsvector('english', coalesce(actor, '')), 'C') ||
setweight(to_tsvector('english', coalesce(event_type, '')), 'D')
Semantic Memory:
setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
setweight(to_tsvector('english', coalesce(summary, '')), 'B') ||
setweight(to_tsvector('english', coalesce(details, '')), 'C') ||
setweight(to_tsvector('english', coalesce(source, '')), 'D')
Resource Memory:
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(summary, '')), 'B') ||
setweight(to_tsvector('english', coalesce(content, '')), 'C') ||
setweight(to_tsvector('english', coalesce(resource_type, '')), 'D')
Note: Field weighting is only applied during multi-field searches. Single-field searches directly query the specified field without weighting.
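The weighted expressions above all follow one pattern, so they could be generated from a list of (column expression, weight) pairs. The sketch below is a hypothetical helper, not part of MIRIX. For context, PostgreSQL's default ranking weights for labels D, C, B, and A are 0.1, 0.2, 0.4, and 1.0, so a match in an 'A' field counts ten times as much as one in a 'D' field.

```python
VALID_WEIGHTS = ("A", "B", "C", "D")


def weighted_tsvector(fields: list[tuple[str, str]], config: str = "english") -> str:
    """Build a weighted tsvector SQL expression from (expression, weight) pairs.

    Expressions are interpolated directly into SQL, so they must come from a
    fixed allowlist in code, never from user input.
    """
    parts = []
    for expression, weight in fields:
        if weight not in VALID_WEIGHTS:
            raise ValueError(f"invalid weight label: {weight!r}")
        parts.append(
            f"setweight(to_tsvector('{config}', coalesce({expression}, '')), '{weight}')"
        )
    return " ||\n".join(parts)


knowledge_vault_vector = weighted_tsvector([("caption", "A"), ("secret_value", "B")])
```

Keeping the field-to-weight pairs as data makes the per-memory-type differences easy to review in one place.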
Special Features
Knowledge Vault Sensitivity Filtering:
Knowledge Vault has an additional sensitivity parameter for security-based filtering:
# Filter by sensitivity levels
results = knowledge_vault_manager.list_knowledge(
    query="password",
    sensitivity=["low", "medium"]  # Exclude "high" sensitivity items
)
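Conceptually, the sensitivity argument acts as an allowlist over each item's sensitivity level. The in-memory sketch below illustrates that behavior with hypothetical records; in MIRIX the filtering is presumably applied in the database query rather than in Python.

```python
def filter_by_sensitivity(items: list[dict], allowed: list[str]) -> list[dict]:
    """Keep only items whose sensitivity level is in the allowed list."""
    allowed_levels = set(allowed)
    return [item for item in items if item["sensitivity"] in allowed_levels]


# Hypothetical records, for illustration only.
items = [
    {"caption": "staging db password", "sensitivity": "medium"},
    {"caption": "production root password", "sensitivity": "high"},
    {"caption": "wiki bookmark", "sensitivity": "low"},
]
visible = filter_by_sensitivity(items, ["low", "medium"])
```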
Rank Normalization
Uses ts_rank_cd with normalization flag 32, which rescales each score into the range from 0 to 1:
ts_rank_cd(search_vector, query, 32)
-- 32 = divide the rank by itself + 1, i.e. rank / (rank + 1)
-- Document-length normalization uses other flags (1, 2, 8, 16), which can be combined with 32 using |, e.g. 2|32
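As a reference point, the PostgreSQL documentation defines the normalization argument as a bit mask, and flag 32 divides the rank by itself plus one. The sketch below lists the documented flags and shows the effect of flag 32 on a raw score; the constant and function names are hypothetical labels chosen for this example.

```python
# Normalization flags accepted by ts_rank and ts_rank_cd (combine with |).
NORM_LOG_LENGTH = 1          # divide by 1 + log(document length)
NORM_LENGTH = 2              # divide by document length
NORM_HARMONIC_DISTANCE = 4   # divide by mean harmonic distance between extents (ts_rank_cd only)
NORM_UNIQUE_WORDS = 8        # divide by number of unique words
NORM_LOG_UNIQUE_WORDS = 16   # divide by 1 + log(number of unique words)
NORM_RANK_SCALE = 32         # divide the rank by itself + 1


def scale_rank(rank: float) -> float:
    """Effect of flag 32: map a non-negative rank into the range [0, 1)."""
    return rank / (rank + 1.0)


# Length normalization plus scaling would be requested as 2 | 32.
length_and_scale = NORM_LENGTH | NORM_RANK_SCALE
```

The scaling is monotonic, so it changes the displayed scores without changing the order of results within a single query.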
Performance Optimizations
Memory Efficiency
- Zero In-Memory Loading: Eliminates the need to load all documents into Python memory
- Database-Level Processing: All ranking and filtering done at the PostgreSQL level
- Scalable Architecture: Performance scales with your PostgreSQL setup
Search Field Specifications
Each of the 5 memory types supports field-specific searches:
Procedural Memory (Workflows & Procedures)
- summary: Process description
- steps: Detailed instructions (JSON array with special processing)
- entry_type: Type of procedure (workflow/guide/script)
Knowledge Vault (Sensitive Information)
- caption: Description of stored item
- secret_value: Actual stored value (credentials, tokens, etc.)
- entry_type: Type of data (credential/bookmark/contact)
- source: Data origin
- sensitivity: Security classification (low/medium/high) - supports filtering
Episodic Memory (Events & Activities)
- summary: Brief event description
- details: Comprehensive event information
- actor: Who performed the action (user/assistant)
- event_type: Category of event
Semantic Memory (Concepts & Knowledge)
- name: Concept or object name
- summary: Concise explanation
- details: Extended description
- source: Knowledge origin
Resource Memory (Documents & Files)
- title: Resource name
- summary: Brief description with context
- content: Full document content
- resource_type: File format type (doc/markdown/pdf/etc.)
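The field lists above can be captured as a lookup table so that a search_field value is checked before a query is built. This is a sketch under assumptions: the dictionary keys and the helper name are hypothetical, and MIRIX may validate fields differently or not at all.

```python
SEARCHABLE_FIELDS = {
    "procedural": ("summary", "steps", "entry_type"),
    "knowledge_vault": ("caption", "secret_value", "entry_type", "source", "sensitivity"),
    "episodic": ("summary", "details", "actor", "event_type"),
    "semantic": ("name", "summary", "details", "source"),
    "resource": ("title", "summary", "content", "resource_type"),
}


def validate_search_field(memory_type: str, search_field: str) -> str:
    """Return search_field if it is listed for memory_type, else raise ValueError."""
    try:
        allowed = SEARCHABLE_FIELDS[memory_type]
    except KeyError:
        raise ValueError(f"unknown memory type: {memory_type!r}") from None
    if search_field not in allowed:
        raise ValueError(
            f"{search_field!r} is not searchable for {memory_type}; "
            f"choose one of {', '.join(allowed)}"
        )
    return search_field
```

For example, validate_search_field("episodic", "details") returns "details", while asking for "title" on episodic memory raises a ValueError that names the valid choices.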
Usage Examples
Basic Search
from mirix.agent import AgentWrapper
agent = AgentWrapper("./configs/mirix.yaml")
# Basic search across all memory types
results = agent.search_memory("machine learning algorithms")
Memory-Type-Specific Search Examples
# Procedural Memory - Find workflows
results = procedural_memory_manager.list_procedures(
    agent_state=agent_state,
    query="deployment process",
    search_method="bm25",
    search_field="summary",
    limit=20
)

# Knowledge Vault - Find credentials with sensitivity filtering
results = knowledge_vault_manager.list_knowledge(
    agent_state=agent_state,
    query="database password",
    search_method="bm25",
    sensitivity=["low", "medium"],  # Exclude high sensitivity
    limit=10
)

# Episodic Memory - Find recent activities
results = episodic_memory_manager.list_episodic_memory(
    agent_state=agent_state,
    query="code review",
    search_method="bm25",
    search_field="details",
    limit=15
)

# Semantic Memory - Find concepts
results = semantic_memory_manager.list_semantic_items(
    agent_state=agent_state,
    query="machine learning",
    search_method="embedding",
    search_field="name",
    limit=25
)

# Resource Memory - Find documents
results = resource_memory_manager.list_resources(
    agent_state=agent_state,
    query="API documentation",
    search_method="bm25",
    search_field="content",
    limit=30
)
Advanced Features
Special Memory-Type Features
Knowledge Vault Sensitivity Filtering:
# Filter by sensitivity levels for security
results = knowledge_vault_manager.list_knowledge(
    query="database credentials",
    sensitivity=["low", "medium"]  # Exclude high sensitivity items
)
Episodic Memory Time-Based Search:
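from datetime import datetime, timedelta

# Search within specific time ranges
# (assumes start_time and end_time accept datetime objects)
today = datetime.now()
yesterday = today - timedelta(days=1)
results = episodic_memory_manager.list_episodic_memory_around_timestamp(
    agent_state=agent_state,
    start_time=yesterday,
    end_time=today
)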
Procedural Memory JSON Array Processing:
# Search within step-by-step instructions
results = procedural_memory_manager.list_procedures(
    query="git deployment",
    search_field="steps",  # Searches within JSON array of steps
    search_method="bm25"
)