# Architecture
This document provides a comprehensive overview of the EDAM MCP Server architecture, including system design, component interactions, and data flow.
## 🏗️ System Overview
The EDAM MCP Server is built as a modular, async-first system that provides semantic matching and concept suggestion capabilities for the EDAM ontology.
```mermaid
graph TB
    subgraph "MCP Layer"
        A[MCP Client]
        B[FastMCP Server]
    end

    subgraph "Tool Layer"
        C[Mapping Tool]
        D[Suggestion Tool]
    end

    subgraph "Core Layer"
        E[Ontology Loader]
        F[Concept Matcher]
        G[Concept Suggester]
    end

    subgraph "Data Layer"
        H[EDAM Ontology]
        I[Sentence Transformers]
        J[Text Processing]
    end

    A --> B
    B --> C
    B --> D
    C --> E
    C --> F
    D --> G
    E --> H
    F --> I
    G --> J
```
## 📦 Core Components

### 1. FastMCP Server (`edam_mcp.main`)
The main server component that handles MCP protocol communication and tool registration.
Key Responsibilities:

- MCP protocol implementation
- Tool registration and routing
- Request/response handling
- Server lifecycle management
Key Classes:

- `FastMCP`: Main server instance
- `Context`: MCP context for logging and progress
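To make the wiring concrete, here is a minimal sketch of a server entry point, assuming the `fastmcp` package; it is illustrative rather than the actual contents of `edam_mcp.main`:

```python
from fastmcp import FastMCP

# Single server instance; tools decorated with @mcp.tool elsewhere in the
# package are registered against this object and exposed over MCP.
mcp = FastMCP("edam-mcp")


def main() -> None:
    # run() starts the stdio transport by default, which is what most
    # MCP clients expect when launching the server as a subprocess.
    mcp.run()


if __name__ == "__main__":
    main()
```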
### 2. Ontology Layer (`edam_mcp.ontology`)
Handles loading, parsing, and querying the EDAM ontology.
#### Ontology Loader (`edam_mcp.ontology.loader`)
Responsibilities:

- Download and parse EDAM OWL files
- Extract concept metadata (labels, definitions, synonyms)
- Build concept hierarchy
- Cache ontology data
Key Methods:
```python
class OntologyLoader:
    def load_ontology(self) -> bool
    def get_concept(self, uri: str) -> Optional[Dict]
    def search_concepts(self, query: str) -> List[Dict]
    def get_concept_hierarchy(self, concept_uri: str) -> List[str]
```
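A hedged usage sketch built on the signatures above; the example URI and the exact shape of the returned dictionaries are assumptions:

```python
loader = OntologyLoader()

# Downloads and parses the EDAM OWL file on first use; later calls
# are served from the cache.
if loader.load_ontology():
    # Look up a single concept by its EDAM URI.
    concept = loader.get_concept("http://edamontology.org/topic_0080")

    # Search labels, definitions, and synonyms for a free-text query.
    hits = loader.search_concepts("sequence alignment")

    # Walk from a concept up through its parents.
    hierarchy = loader.get_concept_hierarchy("http://edamontology.org/topic_0080")
```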
#### Concept Matcher (`edam_mcp.ontology.matcher`)
Responsibilities:

- Semantic similarity calculation
- Embedding generation and caching
- Exact and fuzzy matching
- Confidence scoring
Key Methods:
```python
class ConceptMatcher:
    def match_concepts(self, description: str, ...) -> List[ConceptMatch]
    def find_exact_matches(self, description: str) -> List[ConceptMatch]
    def get_concept_neighbors(self, concept_uri: str) -> List[ConceptMatch]
```
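A hedged usage sketch; the constructor arguments and the `ConceptMatch` attribute names are assumptions:

```python
matcher = ConceptMatcher(loader)

# Semantic matching: embeds the description and ranks concepts by similarity.
matches = matcher.match_concepts("pairwise alignment of protein sequences")
for match in matches:
    print(match.label, match.confidence)

# Cheap exact/fuzzy label pass, useful before falling back to embeddings.
exact = matcher.find_exact_matches("Sequence alignment")
```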
#### Concept Suggester (`edam_mcp.ontology.suggester`)
Responsibilities:

- Generate new concept suggestions
- Infer concept types
- Suggest hierarchical placement
- Calculate suggestion confidence
Key Methods:
```python
class ConceptSuggester:
    def suggest_concepts(self, description: str, ...) -> List[SuggestedConcept]
    def _infer_concept_type(self, description: str) -> str
    def _generate_label_variations(self, text: str) -> List[str]
```
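The type-inference step maps a description onto one of the EDAM branches (topic, operation, data, format). A hedged sketch of how `_infer_concept_type` might work; the keyword lists are illustrative, not the actual rules:

```python
def _infer_concept_type(self, description: str) -> str:
    """Guess the EDAM branch for a description (illustrative heuristic)."""
    text = description.lower()
    if any(term in text for term in ("file format", "format", "syntax")):
        return "format"
    if any(term in text for term in ("algorithm", "prediction", "alignment", "analysis")):
        return "operation"
    if any(term in text for term in ("sequence", "structure", "matrix", "identifier")):
        return "data"
    return "topic"  # fall back to the broadest branch
```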
### 3. Tool Layer (`edam_mcp.tools`)
MCP tool implementations that expose functionality to clients.
#### Mapping Tool (`edam_mcp.tools.mapping`)
Responsibilities:

- Handle mapping requests
- Coordinate ontology loading and matching
- Return structured responses
- Error handling and logging
API:
```python
@mcp.tool
async def map_to_edam_concept(
    request: MappingRequest,
    context: Context
) -> MappingResponse
```
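Behind this signature the tool coordinates the loader and matcher from the core layer. A hedged sketch of the body; `get_ontology_loader()` and the response field names are assumptions, not the real implementation:

```python
@mcp.tool
async def map_to_edam_concept(request: MappingRequest, context: Context) -> MappingResponse:
    await context.info("Loading EDAM ontology")   # progress/logging via the MCP context
    loader = get_ontology_loader()                 # hypothetical cached accessor
    loader.load_ontology()

    matcher = ConceptMatcher(loader)
    matches = matcher.match_concepts(request.description)

    return MappingResponse(query=request.description, matches=matches)
```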
#### Suggestion Tool (`edam_mcp.tools.suggestion`)
Responsibilities:

- Handle suggestion requests
- Attempt mapping to an existing concept first, then suggest
- Generate suggestions using multiple approaches
- Return hierarchical suggestions
API:
```python
@mcp.tool
async def suggest_new_concept(
    request: SuggestionRequest,
    context: Context
) -> SuggestionResponse
```
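The mapping-first fallback can be sketched as follows; the module-level `matcher`, `suggester`, and `settings` objects, the threshold lookup, and the response field names are all assumptions:

```python
@mcp.tool
async def suggest_new_concept(request: SuggestionRequest, context: Context) -> SuggestionResponse:
    # Try mapping to an existing concept first.
    matches = matcher.match_concepts(request.description)
    if matches and matches[0].confidence >= settings.similarity_threshold:
        await context.info("Existing concept found; skipping suggestion")
        return SuggestionResponse(existing_matches=matches, suggestions=[])

    # Nothing cleared the threshold, so generate new concept suggestions.
    suggestions = suggester.suggest_concepts(request.description)
    return SuggestionResponse(existing_matches=matches, suggestions=suggestions)
```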
### 4. Models Layer (`edam_mcp.models`)
Pydantic models for request/response validation and serialization.
#### Request Models (`edam_mcp.models.requests`)
- `MappingRequest`: Input for concept mapping
- `SuggestionRequest`: Input for concept suggestion
#### Response Models (`edam_mcp.models.responses`)
- `ConceptMatch`: Individual concept match with confidence
- `MappingResponse`: Complete mapping results
- `SuggestedConcept`: New concept suggestion
- `SuggestionResponse`: Complete suggestion results
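A hedged sketch of what these models could look like; the field names, types, and defaults are illustrative, so consult `edam_mcp.models` for the real definitions:

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class MappingRequest(BaseModel):
    description: str = Field(..., description="Free-text description to map")
    max_results: int = Field(5, ge=1, le=50)


class ConceptMatch(BaseModel):
    uri: str
    label: str
    confidence: float = Field(..., ge=0.0, le=1.0)


class MappingResponse(BaseModel):
    query: Optional[str] = None
    matches: List[ConceptMatch] = []
```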
### 5. Utilities (`edam_mcp.utils`)
Helper functions for text processing and similarity calculation.
#### Text Processing (`edam_mcp.utils.text_processing`)
Functions:

- `preprocess_text()`: Clean and normalize text
- `extract_keywords()`: Extract key terms
- `tokenize_text()`: Split text into tokens
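A hedged usage sketch; the exact outputs depend on the real implementations:

```python
from edam_mcp.utils.text_processing import (
    extract_keywords,
    preprocess_text,
    tokenize_text,
)

raw = "Multiple  Sequence Alignment (MSA) of protein sequences."
clean = preprocess_text(raw)        # e.g. lower-cased, extra whitespace/punctuation removed
keywords = extract_keywords(clean)  # e.g. ["sequence", "alignment", "protein", "msa"]
tokens = tokenize_text(clean)       # word-level tokens
```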
#### Similarity (`edam_mcp.utils.similarity`)
Functions:

- `calculate_cosine_similarity()`: Vector similarity
- `calculate_jaccard_similarity()`: Set similarity
- `calculate_string_similarity()`: String similarity
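For reference, cosine similarity between two embedding vectors reduces to a few lines of NumPy. This is a generic sketch, not necessarily how `calculate_cosine_similarity()` is implemented:

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.dot(a, b)) / denom if denom else 0.0
```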
## 🔄 Data Flow

### 1. Concept Mapping Flow
```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Loader
    participant Matcher
    participant Ontology

    Client->>Server: MappingRequest
    Server->>Loader: load_ontology()
    Loader->>Ontology: Download OWL file
    Ontology-->>Loader: Ontology data
    Loader-->>Server: Loaded concepts
    Server->>Matcher: match_concepts()
    Matcher->>Matcher: Generate embeddings
    Matcher->>Matcher: Calculate similarities
    Matcher-->>Server: Concept matches
    Server-->>Client: MappingResponse
```
### 2. Concept Suggestion Flow
```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Suggester
    participant Matcher
    participant Loader

    Client->>Server: SuggestionRequest
    Server->>Matcher: Try mapping first
    Matcher-->>Server: Low confidence matches
    Server->>Suggester: suggest_concepts()
    Suggester->>Suggester: Infer concept type
    Suggester->>Suggester: Generate labels
    Suggester->>Suggester: Find parent concepts
    Suggester-->>Server: Suggested concepts
    Server-->>Client: SuggestionResponse
```
## 🎯 Design Principles

### 1. Modularity
- Clear separation of concerns
- Loose coupling between components
- Easy to extend and modify
### 2. Async-First
- All I/O operations are async
- Non-blocking ontology loading
- Concurrent request handling
### 3. Lazy Loading
- Heavy dependencies loaded on demand
- Ontology cached after first load
- ML models loaded only when needed
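A hedged sketch of the lazy-loading pattern for the embedding model; the model name shown is only an example, with the real default coming from `Settings.embedding_model`:

```python
from functools import cached_property

from sentence_transformers import SentenceTransformer


class ConceptMatcher:
    @cached_property
    def model(self) -> SentenceTransformer:
        # Loaded only on the first semantic match, then reused for the
        # lifetime of the process; keeps startup fast and memory low
        # until embeddings are actually needed.
        return SentenceTransformer("all-MiniLM-L6-v2")
```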
### 4. Type Safety
- Complete type hints throughout
- Pydantic validation for all data
- Runtime type checking
### 5. Error Handling
- Graceful degradation
- Detailed error messages
- Proper logging at all levels
## 🔧 Configuration
The system is configured through environment variables and the `Settings` class:
```python
class Settings(BaseSettings):
    edam_ontology_url: str
    similarity_threshold: float
    max_suggestions: int
    embedding_model: str
    cache_ttl: int
    log_level: str
```
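With `BaseSettings`, each field can be overridden through an environment variable of the same name (matched case-insensitively). A hedged example, assuming the remaining fields have defaults or are also supplied via the environment:

```python
import os

# Override selected settings before the server reads its configuration.
os.environ["SIMILARITY_THRESHOLD"] = "0.65"
os.environ["MAX_SUGGESTIONS"] = "10"
os.environ["LOG_LEVEL"] = "DEBUG"

settings = Settings()
print(settings.similarity_threshold, settings.max_suggestions, settings.log_level)
```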
## 📊 Performance Characteristics

### Memory Usage
- Base: ~50MB (Python + dependencies)
- Ontology: ~100MB (3,515 concepts)
- ML Models: ~350MB (sentence transformers)
- Total: ~500MB
### Response Times
- First Run: ~5 seconds (model download)
- Subsequent Runs: <1 second
- Ontology Loading: ~2 seconds
- Semantic Matching: ~0.5 seconds
### Scalability
- Concurrent Requests: Full async support
- Memory Scaling: Linear with ontology size
- CPU Scaling: Parallel embedding generation
## 🔮 Future Enhancements

### Planned Improvements
- Caching Layer: Redis for distributed caching
- Database Backend: PostgreSQL for ontology storage
- Batch Processing: Bulk concept mapping
- Custom Models: Fine-tuned embedding models
- API Versioning: Backward compatibility
### Extension Points
- New Tool Types: Additional MCP tools
- Alternative Matchers: Different similarity algorithms
- External APIs: Integration with other ontologies
- Plugin System: Third-party extensions