Models

Models represent configured instances of AI language models that generate text responses. Each model is backed by a model type that defines how to communicate with the underlying AI service.

Model entity

A model is defined by the following properties:
class Model:
    id: UUID                # Unique identifier
    tenant_id: UUID         # Tenant isolation
    name: str               # Unique name per tenant
    dtype: str              # Model type (e.g., "openai", "vllm_local")
    configuration: dict     # Type-specific config (API keys, URLs)
    summary: str            # Brief description
    tags: str               # Comma-separated tags
    created_at: datetime
    updated_at: datetime
Location: backend/syft_space/components/models/entities.py:15
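
For illustration, constructing a model record might look like the sketch below. Keyword construction is an assumption (the real entity may be an ORM or pydantic model); the import path is inferred from the Location note above.

from datetime import datetime, timezone
from uuid import uuid4

from syft_space.components.models.entities import Model

# Keyword construction is an assumption; field names mirror the entity above.
model = Model(
    id=uuid4(),
    tenant_id=uuid4(),
    name="gpt-4-assistant",
    dtype="openai",
    configuration={"api_key": "sk-...", "model": "gpt-4"},
    summary="General-purpose assistant",
    tags="assistant,production",
    created_at=datetime.now(timezone.utc),
    updated_at=datetime.now(timezone.utc),
)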

Model types

Model types implement the BaseModelType protocol and provide two capabilities, described below: a configuration schema and a chat interface.

Configuration schema

Each type defines required fields:
@classmethod
def configuration_schema(cls) -> dict[str, Any]:
    """Return configuration schema for this model type."""
    return {
        "api_key": {"type": "string", "required": True, "secret": True},
        "model": {"type": "string", "required": True},
        "base_url": {"type": "string", "required": False}
    }
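
This schema format lends itself to a simple required-field check. The sketch below is illustrative only; validate_configuration is a hypothetical helper, not the library's actual validator.

from typing import Any

def validate_configuration(schema: dict[str, Any], config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: check a config dict against a schema like the one above."""
    errors = []
    for field, spec in schema.items():
        if spec.get("required") and field not in config:
            errors.append(f"missing required field: {field}")
    # Flag unknown keys too; the real validator may be more lenient.
    errors.extend(f"unknown field: {key}" for key in config if key not in schema)
    return errors

# Example: a vLLM-style config missing its model name
errors = validate_configuration(
    {"base_url": {"type": "string", "required": True},
     "model": {"type": "string", "required": True}},
    {"base_url": "http://localhost:8000"},
)
assert errors == ["missing required field: model"]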

Chat interface

All model types implement chat functionality:
async def chat(
    self,
    ctx: ChatContext,
    messages: list[ChatMessage],
    params: ChatParameters | None = None
) -> ChatResult:
    """Generate a response from the model."""
Location: backend/syft_space/components/model_types/interfaces.py:129
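
A call might look like the following sketch, where model_instance is any object implementing the protocol and ctx comes from the request's auth context. Keyword construction of the data classes is assumed; the import path is inferred from the Location notes.

from syft_space.components.model_types.interfaces import (
    ChatContext,
    ChatMessage,
    ChatParameters,
    ChatResult,
)

async def ask(model_instance, ctx: ChatContext) -> ChatResult:
    # model_instance is any object implementing the chat interface above.
    messages = [
        ChatMessage(role="system", content="You are a concise assistant."),
        ChatMessage(role="user", content="Summarize RAG in one sentence."),
    ]
    params = ChatParameters(temperature=0.2, max_tokens=150)
    return await model_instance.chat(ctx, messages, params)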

Chat data models

ChatMessage

Input messages to the model:
class ChatMessage:
    role: str       # "user", "assistant", or "system"
    content: str    # Message text
Location: backend/syft_space/components/model_types/interfaces.py:18

ChatParameters

Control generation behavior:
class ChatParameters:
    temperature: float = 0.7           # Randomness (0.0-2.0)
    max_tokens: int = 100              # Maximum response length
    stop_sequences: list[str] = []     # Stop generation at these strings
    presence_penalty: float = 0.0      # Penalize repeated topics (-2.0 to 2.0)
    frequency_penalty: float = 0.0     # Penalize repeated tokens (-2.0 to 2.0)
    top_p: float = 1.0                 # Nucleus sampling (0.0-1.0)
    extra_options: dict = {}           # Type-specific options
Location: backend/syft_space/components/model_types/interfaces.py:26
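
For example, extraction-style prompts might pin generation down like this (keyword construction is assumed; the extra_options key shown is illustrative):

params = ChatParameters(
    temperature=0.0,          # deterministic-leaning output
    max_tokens=256,
    stop_sequences=["\n\n"],  # stop at the first blank line
    extra_options={"seed": 42},  # passed through to the model type; key is illustrative
)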

ChatResult

Model response:
class ChatResult:
    id: str                         # Unique completion ID
    model: str                      # Model name used
    messages: list[ChatMessageResult]  # Generated messages
    finish_reason: str              # "stop", "length", "error", etc.
    usage: TokenUsage               # Token consumption details
    metadata: dict                  # Additional info

class ChatMessageResult:
    role: str       # Message role
    content: str    # Generated text
    tokens: int     # Tokens in this message

class TokenUsage:
    prompt_tokens: int      # Tokens in input
    completion_tokens: int  # Tokens in output
    total_tokens: int       # Sum of both
Location: backend/syft_space/components/model_types/interfaces.py:68
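
A sketch of consuming a result, assuming attribute access as declared above (summarize_usage is a hypothetical helper):

from syft_space.components.model_types.interfaces import ChatResult

def summarize_usage(result: ChatResult) -> str:
    """Hypothetical helper: flag truncated output and report token spend."""
    text = result.messages[0].content if result.messages else ""
    truncated = " (truncated)" if result.finish_reason == "length" else ""
    return f"{result.model}: {result.usage.total_tokens} tokens{truncated} -> {text[:60]}"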

Available model types

OpenAI

Type name: openai

Connect to OpenAI’s API (GPT-4, GPT-3.5, etc.).

Configuration:
{
  "api_key": "sk-...",
  "model": "gpt-4",
  "base_url": "https://api.openai.com/v1"  // optional
}
Use cases:
  • Production-grade chat completions
  • Function calling
  • Advanced reasoning tasks

vLLM (local)

Type name: vllm_local

Connect to a locally-hosted vLLM inference server.

Configuration:
{
  "base_url": "http://localhost:8000",
  "model": "meta-llama/Llama-2-7b-chat-hf"
}
Use cases:
  • Privacy-preserving inference (data never leaves your infrastructure)
  • Custom fine-tuned models
  • Cost optimization for high-volume use

Model operations

Create model

async def create_model(
    request: CreateModelRequest,
    tenant: Tenant
) -> ModelResponse:
    """
    1. Validates model type exists
    2. Validates configuration against schema
    3. Creates model entity
    """
Location: backend/syft_space/components/models/handlers.py:86

Request schema:
class CreateModelRequest:
    name: str               # Unique name per tenant
    dtype: str              # Model type name
    configuration: dict     # Type-specific config
    summary: str = ""       # Optional description
    tags: str = ""          # Comma-separated tags

Update model

Partial updates (only name, summary, tags):
async def update_model(
    name: str,
    request: UpdateModelRequest,
    tenant: Tenant
) -> ModelResponse:
    """
    Updates metadata fields.
    Configuration cannot be updated (delete + recreate instead).
    """
Model configuration (API keys, URLs) cannot be updated. To change configuration, delete and recreate the model.
Location: backend/syft_space/components/models/handlers.py:162
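
A hedged HTTP sketch; the PATCH verb and per-name route are assumptions:

import httpx

# PATCH verb and route are assumptions; configuration is intentionally
# absent, since it cannot be updated in place.
httpx.patch(
    "http://localhost:8080/api/v1/models/gpt-4-assistant",
    headers={"Authorization": "Bearer <token>"},
    json={"summary": "Customer-facing assistant", "tags": "assistant,support"},
).raise_for_status()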

Delete model

async def delete_model(name: str, tenant: Tenant) -> dict:
    """Deletes model and cascades to connected endpoints."""
Location: backend/syft_space/components/models/handlers.py:197
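
A hedged HTTP sketch; the DELETE verb and per-name route are assumptions:

import httpx

# DELETE verb and route are assumptions; remember that deletion
# cascades to any endpoints using the model.
httpx.delete(
    "http://localhost:8080/api/v1/models/gpt-4-assistant",
    headers={"Authorization": "Bearer <token>"},
).raise_for_status()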

Healthcheck

Verify model connectivity:
async def healthcheck(name: str, tenant: Tenant) -> HealthcheckResponse:
    """
    Returns:
    - status: HEALTHY or UNHEALTHY
    - message: Details about connection state
    """
Location: backend/syft_space/components/models/handlers.py:217
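
Using the healthcheck route from the workflow below, a model can be gated before it is wired into endpoints (base URL, auth, and the JSON casing of status are assumptions):

import httpx

resp = httpx.get(
    "http://localhost:8080/api/v1/models/gpt-4-assistant/healthcheck",
    headers={"Authorization": "Bearer <token>"},
)
resp.raise_for_status()
body = resp.json()
if body.get("status") != "HEALTHY":  # JSON casing of the enum is assumed
    raise RuntimeError(f"model unhealthy: {body.get('message')}")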

RAG integration

When a model is used in an endpoint with a dataset (response type “both”), search results are automatically injected as context:
# Endpoint handler combines dataset + model
if references and references.documents:
    # Build context from top 3 search results
    context_content = "\n\n".join([
        f"[{doc.document_id}] {doc.content}"
        for doc in references.documents[:3]
    ])
    
    # Inject as system message
    context_message = ChatMessage(
        role="system",
        content=f"Use the following context to answer:\n{context_content}"
    )
    messages.insert(0, context_message)

# Chat with model
chat_result = await model_instance.chat(ctx, messages, params)
Location: backend/syft_space/components/endpoints/handlers.py:481

This implements the retrieval-augmented generation pattern:
  1. Query searches dataset for relevant documents
  2. Top results are formatted as context
  3. Context + user message sent to model
  4. Model generates answer grounded in retrieved documents

Response format

When querying an endpoint, model responses follow this structure:
class SummaryResponse:
    id: str                     # Completion ID
    model: str                  # Model name used
    message: MessageResponse    # Generated message
    finish_reason: str          # Completion reason
    usage: TokenUsage           # Token consumption
    cost: float                 # Generation cost
    provider_info: ProviderInfo # API version, response time

class MessageResponse:
    role: str       # "assistant"
    content: str    # Generated text
    tokens: int     # Token count
Location: backend/syft_space/components/endpoints/schemas.py:336
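
For the cost-tracking practice noted under best practices, here is a minimal sketch of reading these fields from a query response (log_completion is hypothetical; JSON field names are assumed to mirror the schema):

def log_completion(payload: dict) -> None:
    """Hypothetical helper: report usage and cost from a SummaryResponse payload."""
    usage = payload["usage"]
    print(
        f"{payload['model']} finish={payload['finish_reason']} "
        f"tokens={usage['total_tokens']} cost=${payload['cost']:.4f}"
    )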

Relationships

  • Tenant: Each model belongs to one tenant
  • Endpoints: One model can be used by multiple endpoints

Context injection

The ChatContext object tracks model usage:
class ChatContext(Context):
    sender: str     # Email of user making request (from auth token)
    model_id: UUID  # Model being used
This enables:
  • Audit logging (who used which model when)
  • Usage tracking per sender
  • Policy enforcement based on sender identity
Location: backend/syft_space/components/model_types/interfaces.py:11
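
A sketch of the audit-logging idea (audit_chat is a hypothetical hook; attribute access follows the class above):

import logging

from syft_space.components.model_types.interfaces import ChatContext

logger = logging.getLogger("model_audit")

def audit_chat(ctx: ChatContext) -> None:
    # Hypothetical audit hook; real enforcement would live in policies.
    logger.info("chat request: sender=%s model_id=%s", ctx.sender, ctx.model_id)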

Example workflow

1. Create OpenAI model

POST /api/v1/models with OpenAI credentials:
{
  "name": "gpt-4-assistant",
  "dtype": "openai",
  "configuration": {
    "api_key": "sk-...",
    "model": "gpt-4"
  }
}
2. Test healthcheck

GET /api/v1/models/gpt-4-assistant/healthcheck

Verifies the API key and connectivity.
3. Create endpoint

Link the model to an endpoint (with or without a dataset):
{
  "slug": "qa-bot",
  "model_id": "<model-uuid>",
  "response_type": "summary"
}
4. Query endpoint

POST /api/v1/endpoints/qa-bot/query
{
  "messages": [{"role": "user", "content": "What is RAG?"}],
  "temperature": 0.7,
  "max_tokens": 150
}
Returns the generated response.
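
The four steps combined into a single hedged httpx script (base URL, auth scheme, endpoint-creation route, and response shapes are assumptions; the other routes are the ones listed above):

import httpx

BASE = "http://localhost:8080/api/v1"          # assumed base URL
HEADERS = {"Authorization": "Bearer <token>"}  # assumed auth scheme

with httpx.Client(base_url=BASE, headers=HEADERS) as client:
    # 1. Create the model
    client.post("/models", json={
        "name": "gpt-4-assistant",
        "dtype": "openai",
        "configuration": {"api_key": "sk-...", "model": "gpt-4"},
    }).raise_for_status()

    # 2. Verify connectivity before use
    health = client.get("/models/gpt-4-assistant/healthcheck").json()
    assert health.get("status") == "HEALTHY", health.get("message")

    # 3. Link the model to an endpoint (endpoint-creation route is an assumption)
    client.post("/endpoints", json={
        "slug": "qa-bot",
        "model_id": "<model-uuid>",
        "response_type": "summary",
    }).raise_for_status()

    # 4. Query the endpoint
    answer = client.post("/endpoints/qa-bot/query", json={
        "messages": [{"role": "user", "content": "What is RAG?"}],
        "temperature": 0.7,
        "max_tokens": 150,
    }).json()
    print(answer["message"]["content"])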

Best practices

  • Name models by their purpose: customer-support-gpt4, legal-qa-llama2.
  • Store API keys in configuration, never hardcoded; they are encrypted in the database.
  • Always run a healthcheck after creating a model to verify connectivity before using it in endpoints.
  • Track usage.total_tokens in responses to understand costs and optimize prompts.

Next steps

  • Endpoints: Combine models with datasets to create RAG endpoints.
  • Policies: Apply rate limiting and access controls to model usage.