# Architecture
This document provides a comprehensive overview of the EDAM MCP Server architecture, including system design, component interactions, and data flow.
## 🏗️ System Overview
The EDAM MCP Server is built as a modular, async-first system that provides semantic matching and concept suggestion capabilities for the EDAM ontology.
```mermaid
graph TB
    subgraph "MCP Layer"
        A[MCP Client]
        B[FastMCP Server]
    end
    subgraph "Tool Layer"
        C[Mapping Tool]
        D[Suggestion Tool]
    end
    subgraph "Core Layer"
        E[Ontology Loader]
        F[Concept Matcher]
        G[Concept Suggester]
    end
    subgraph "Data Layer"
        H[EDAM Ontology]
        I[Sentence Transformers]
        J[Text Processing]
    end
    A --> B
    B --> C
    B --> D
    C --> E
    C --> F
    D --> G
    E --> H
    F --> I
    G --> J
```
## 📦 Core Components

### 1. FastMCP Server (`edam_mcp.main`)
The main server component that handles MCP protocol communication and tool registration.
Key Responsibilities:
- MCP protocol implementation
- Tool registration and routing
- Request/response handling
- Server lifecycle management
Key Classes:
- `FastMCP` - Main server instance
- `Context` - MCP context for logging and progress
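The registration-and-routing pattern can be sketched without FastMCP itself. The `ToyServer` class below is an invented stand-in, not the FastMCP API: tools register under their function name and incoming requests are dispatched by a registry lookup.

```python
# Illustrative sketch only: ToyServer mimics how an MCP-style server
# registers tools and routes requests to them; it is not FastMCP.
from typing import Any, Callable, Dict


class ToyServer:
    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def tool(self, func: Callable[..., Any]) -> Callable[..., Any]:
        # Register the function under its own name, mirroring @mcp.tool.
        self._tools[func.__name__] = func
        return func

    def handle(self, name: str, **kwargs: Any) -> Any:
        # Route an incoming request to the registered tool.
        return self._tools[name](**kwargs)


server = ToyServer()


@server.tool
def echo(text: str) -> str:
    return text.upper()
```

The same decorator-based registration keeps tool implementations decoupled from the server's protocol handling.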
### 2. Ontology Layer (`edam_mcp.ontology`)
Handles loading, parsing, and querying the EDAM ontology.
#### Ontology Loader (`edam_mcp.ontology.loader`)
Responsibilities:
- Download and parse EDAM OWL files
- Extract concept metadata (labels, definitions, synonyms)
- Build concept hierarchy
- Cache ontology data
Key Methods:
```python
class OntologyLoader:
    def load_ontology(self) -> bool
    def get_concept(self, uri: str) -> Optional[Dict]
    def search_concepts(self, query: str) -> List[Dict]
    def get_concept_hierarchy(self, concept_uri: str) -> List[str]
```
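As an illustration of the hierarchy method, here is a stdlib-only sketch that walks parent links in an in-memory concept cache. The dict layout and URIs are invented for the example, not the loader's actual internal format.

```python
# Hypothetical cached concept data: each entry records a label and a
# single parent URI (None at the root). Real EDAM concepts can have
# multiple parents; one is used here to keep the walk simple.
from typing import Dict, List, Optional

concepts: Dict[str, Dict] = {
    "http://edamontology.org/topic_3168": {
        "label": "Sequencing",
        "parent": "http://edamontology.org/topic_3361",
    },
    "http://edamontology.org/topic_3361": {
        "label": "Laboratory techniques",
        "parent": None,
    },
}


def get_concept_hierarchy(concept_uri: str) -> List[str]:
    """Return the chain of ancestor URIs, nearest parent first."""
    hierarchy: List[str] = []
    parent: Optional[str] = concepts[concept_uri]["parent"]
    while parent is not None:
        hierarchy.append(parent)
        parent = concepts[parent]["parent"]
    return hierarchy
```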
#### Concept Matcher (`edam_mcp.ontology.matcher`)
Responsibilities:
- Semantic similarity calculation
- Embedding generation and caching
- Exact and fuzzy matching
- Confidence scoring
Key Methods:
```python
class ConceptMatcher:
    def match_concepts(self, description: str, ...) -> List[ConceptMatch]
    def find_exact_matches(self, description: str) -> List[ConceptMatch]
    def get_concept_neighbors(self, concept_uri: str) -> List[ConceptMatch]
```
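To illustrate the exact-then-fuzzy strategy, the sketch below uses `difflib` string ratios in place of the embedding-based similarity the real matcher computes. The labels and threshold are example values.

```python
# Illustrative matching pipeline: an exact label match scores 1.0;
# otherwise a fuzzy string ratio stands in for semantic similarity.
from difflib import SequenceMatcher
from typing import List, Tuple

LABELS = ["Sequence alignment", "Sequence assembly", "Phylogenetics"]


def match_concepts(description: str, threshold: float = 0.5) -> List[Tuple[str, float]]:
    matches = []
    for label in LABELS:
        if label.lower() == description.lower():
            score = 1.0  # exact match short-circuits fuzzy scoring
        else:
            score = SequenceMatcher(
                None, description.lower(), label.lower()
            ).ratio()
        if score >= threshold:
            matches.append((label, round(score, 2)))
    # Highest-confidence matches first, as the real matcher returns them.
    return sorted(matches, key=lambda m: m[1], reverse=True)
```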
#### Concept Suggester (`edam_mcp.ontology.suggester`)
Responsibilities:
- Generate new concept suggestions
- Infer concept types
- Suggest hierarchical placement
- Calculate suggestion confidence
Key Methods:
```python
class ConceptSuggester:
    def suggest_concepts(self, description: str, ...) -> List[SuggestedConcept]
    def _infer_concept_type(self, description: str) -> str
    def _generate_label_variations(self, text: str) -> List[str]
```
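`_generate_label_variations` is internal to the suggester, so the following is only a plausible shape for it: deriving a handful of candidate labels from one description. The specific transformations are invented for illustration.

```python
# Hypothetical label-variation helper: produce a few casing and
# identifier-style variants of an input phrase.
from typing import List


def generate_label_variations(text: str) -> List[str]:
    base = text.strip()
    variations = {
        base.capitalize(),                # sentence case
        base.title(),                     # title case
        base.lower().replace(" ", "_"),   # identifier-style form
    }
    return sorted(variations)
```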
### 3. Tool Layer (`edam_mcp.tools`)
MCP tool implementations that expose functionality to clients.
#### Mapping Tool (`edam_mcp.tools.mapping`)
Responsibilities:
- Handle mapping requests
- Coordinate ontology loading and matching
- Return structured responses
- Error handling and logging
API:
```python
@mcp.tool
async def map_to_edam_concept(
    request: MappingRequest,
    context: Context
) -> MappingResponse
```
#### Suggestion Tool (`edam_mcp.tools.suggestion`)
Responsibilities:
- Handle suggestion requests
- Attempt mapping first, then suggest
- Generate multiple suggestion approaches
- Return hierarchical suggestions
API:
```python
@mcp.tool
async def suggest_new_concept(
    request: SuggestionRequest,
    context: Context
) -> SuggestionResponse
```
### 4. Models Layer (`edam_mcp.models`)
Pydantic models for request/response validation and serialization.
#### Request Models (`edam_mcp.models.requests`)

- `MappingRequest` - Input for concept mapping
- `SuggestionRequest` - Input for concept suggestion
#### Response Models (`edam_mcp.models.responses`)

- `ConceptMatch` - Individual concept match with confidence
- `MappingResponse` - Complete mapping results
- `SuggestedConcept` - New concept suggestion
- `SuggestionResponse` - Complete suggestion results
### 5. Utilities (`edam_mcp.utils`)
Helper functions for text processing and similarity calculation.
#### Text Processing (`edam_mcp.utils.text_processing`)

Functions:

- `preprocess_text()` - Clean and normalize text
- `extract_keywords()` - Extract key terms
- `tokenize_text()` - Split text into tokens
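Minimal sketches of the three helpers; the real functions in `edam_mcp.utils.text_processing` may differ in detail (stop-word list, tokenizer), so the set below is a hand-picked example.

```python
# Stdlib-only sketches of the text-processing helpers; the tiny
# stop-word set is illustrative, not the package's actual list.
import re
from typing import List

STOPWORDS = {"a", "an", "the", "of", "for", "and", "to"}


def preprocess_text(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def tokenize_text(text: str) -> List[str]:
    """Split cleaned text into word tokens."""
    return preprocess_text(text).split()


def extract_keywords(text: str) -> List[str]:
    """Keep only tokens that are not stop words."""
    return [t for t in tokenize_text(text) if t not in STOPWORDS]
```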
#### Similarity (`edam_mcp.utils.similarity`)

Functions:

- `calculate_cosine_similarity()` - Vector similarity
- `calculate_jaccard_similarity()` - Set similarity
- `calculate_string_similarity()` - String similarity
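Stdlib-only versions of the three similarity measures. The package's own implementations may be vectorized, but the underlying math is the same idea.

```python
# Reference implementations of the three similarity helpers.
import math
from difflib import SequenceMatcher
from typing import List, Set


def calculate_cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def calculate_jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Intersection over union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def calculate_string_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between two strings."""
    return SequenceMatcher(None, a, b).ratio()
```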
## 🔄 Data Flow

### 1. Concept Mapping Flow
```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Loader
    participant Matcher
    participant Ontology
    Client->>Server: MappingRequest
    Server->>Loader: load_ontology()
    Loader->>Ontology: Download OWL file
    Ontology-->>Loader: Ontology data
    Loader-->>Server: Loaded concepts
    Server->>Matcher: match_concepts()
    Matcher->>Matcher: Generate embeddings
    Matcher->>Matcher: Calculate similarities
    Matcher-->>Server: Concept matches
    Server-->>Client: MappingResponse
```
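The mapping sequence can be condensed into an async orchestration sketch. `StubLoader` and `StubMatcher` are placeholders with invented data and signatures, not the real `edam_mcp` classes.

```python
# Async orchestration sketch of the mapping flow: load concepts, then
# match the description against them. All data here is invented.
import asyncio
from typing import Dict, List


class StubLoader:
    async def load_ontology(self) -> List[Dict]:
        await asyncio.sleep(0)  # stands in for downloading/parsing the OWL file
        return [{"label": "Sequence alignment", "uri": "topic_0188"}]


class StubMatcher:
    async def match_concepts(self, description: str, concepts: List[Dict]) -> List[Dict]:
        await asyncio.sleep(0)  # stands in for embedding + similarity scoring
        return [c for c in concepts if description.lower() in c["label"].lower()]


async def handle_mapping(description: str) -> List[Dict]:
    concepts = await StubLoader().load_ontology()
    return await StubMatcher().match_concepts(description, concepts)


matches = asyncio.run(handle_mapping("alignment"))
```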
### 2. Concept Suggestion Flow
```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Suggester
    participant Matcher
    participant Loader
    Client->>Server: SuggestionRequest
    Server->>Matcher: Try mapping first
    Matcher-->>Server: Low confidence matches
    Server->>Suggester: suggest_concepts()
    Suggester->>Suggester: Infer concept type
    Suggester->>Suggester: Generate labels
    Suggester->>Suggester: Find parent concepts
    Suggester-->>Server: Suggested concepts
    Server-->>Client: SuggestionResponse
```
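The "try mapping first, then suggest" fallback can be sketched as plain functions. The confidence threshold and return shapes are invented for illustration.

```python
# Fallback sketch: reuse an existing concept when the matcher is
# confident enough, otherwise propose a new one. Values are invented.
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.7  # illustrative cut-off, not the real setting


def try_mapping(description: str) -> List[Dict]:
    # Pretend the matcher only found a weak candidate.
    return [{"label": "Data handling", "confidence": 0.3}]


def suggest_concepts(description: str) -> List[Dict]:
    return [{"label": description.title(), "status": "suggested"}]


def handle_suggestion(description: str) -> List[Dict]:
    matches = try_mapping(description)
    if matches and max(m["confidence"] for m in matches) >= CONFIDENCE_THRESHOLD:
        return matches  # confident enough: reuse an existing concept
    return suggest_concepts(description)  # otherwise fall back to suggesting
```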
## 🎯 Design Principles

### 1. Modularity
- Clear separation of concerns
- Loose coupling between components
- Easy to extend and modify
### 2. Async-First
- All I/O operations are async
- Non-blocking ontology loading
- Concurrent request handling
### 3. Lazy Loading
- Heavy dependencies loaded on demand
- Ontology cached after first load
- ML models loaded only when needed
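The lazy-loading principle in miniature, using `functools.lru_cache`: the expensive resource is built on first access and cached afterwards, much as the server defers its model and ontology loads.

```python
# Lazy loading via a cached accessor: the body runs once, on first use.
from functools import lru_cache

load_count = 0


@lru_cache(maxsize=1)
def get_model() -> str:
    global load_count
    load_count += 1  # real code would load heavy ML weights here
    return "model-ready"


get_model()
get_model()  # second call hits the cache; the load runs only once
```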
### 4. Type Safety
- Complete type hints throughout
- Pydantic validation for all data
- Runtime type checking
### 5. Error Handling
- Graceful degradation
- Detailed error messages
- Proper logging at all levels
## 🔧 Configuration

The system is configured through environment variables and the `Settings` class:
```python
class Settings(BaseSettings):
    ontology_url: str
    similarity_threshold: float
    max_suggestions: int
    embedding_model: str
    cache_ttl: int
    log_level: str
```
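A stdlib stand-in for the `BaseSettings` behavior: each field reads an environment variable and falls back to a default. Field names mirror a subset of the class above; the variable names and defaults are invented for the sketch.

```python
# Sketch of env-driven settings using only the stdlib; pydantic's
# BaseSettings does this (and validation) for the real Settings class.
import os
from dataclasses import dataclass, field


@dataclass
class SettingsSketch:
    similarity_threshold: float = field(
        default_factory=lambda: float(os.environ.get("SIMILARITY_THRESHOLD", "0.5"))
    )
    max_suggestions: int = field(
        default_factory=lambda: int(os.environ.get("MAX_SUGGESTIONS", "5"))
    )
    log_level: str = field(
        default_factory=lambda: os.environ.get("LOG_LEVEL", "INFO")
    )


os.environ["MAX_SUGGESTIONS"] = "10"  # override one value via the environment
settings = SettingsSketch()
```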
## 📊 Performance Characteristics

### Memory Usage
- Base: ~50MB (Python + dependencies)
- Ontology: ~100MB (3,515 concepts)
- ML Models: ~350MB (sentence transformers)
- Total: ~500MB
### Response Times
- First Run: ~5 seconds (model download)
- Subsequent Runs: <1 second
- Ontology Loading: ~2 seconds
- Semantic Matching: ~0.5 seconds
### Scalability
- Concurrent Requests: Full async support
- Memory Scaling: Linear with ontology size
- CPU Scaling: Parallel embedding generation
## 🔮 Future Enhancements

### Planned Improvements
- Caching Layer: Redis for distributed caching
- Database Backend: PostgreSQL for ontology storage
- Batch Processing: Bulk concept mapping
- Custom Models: Fine-tuned embedding models
- API Versioning: Backward compatibility
### Extension Points
- New Tool Types: Additional MCP tools
- Alternative Matchers: Different similarity algorithms
- External APIs: Integration with other ontologies
- Plugin System: Third-party extensions