Models

Models represent configured instances of AI language models that generate text responses. Each model is backed by a model type that defines how to communicate with the underlying AI service.

Model entity

A model is defined by the following properties:
class Model:
    id: UUID                # Unique identifier
    tenant_id: UUID         # Tenant isolation
    name: str               # Unique name per tenant
    dtype: str              # Model type (e.g., "openai", "vllm_local")
    configuration: dict     # Type-specific config (API keys, URLs)
    summary: str            # Brief description
    tags: str               # Comma-separated tags
    created_at: datetime
    updated_at: datetime
Location: backend/syft_space/components/models/entities.py:15
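
For illustration, constructing a model record might look like the sketch below. Keyword construction is an assumption (the real entity may be an ORM or pydantic model); the import path is inferred from the Location note above.

from datetime import datetime, timezone
from uuid import uuid4

from syft_space.components.models.entities import Model

# Keyword construction is an assumption; field names mirror the entity above.
model = Model(
    id=uuid4(),
    tenant_id=uuid4(),
    name="gpt-4-assistant",
    dtype="openai",
    configuration={"api_key": "sk-...", "model": "gpt-4"},
    summary="General-purpose assistant",
    tags="assistant,production",
    created_at=datetime.now(timezone.utc),
    updated_at=datetime.now(timezone.utc),
)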

Model types

Model types implement the BaseModelType protocol and provide two capabilities, described below: a configuration schema and a chat interface.

Configuration schema

Each type defines required fields:
@classmethod
def configuration_schema(cls) -> dict[str, Any]:
    """Return configuration schema for this model type."""
    return {
        "api_key": {"type": "string", "required": True, "secret": True},
        "model": {"type": "string", "required": True},
        "base_url": {"type": "string", "required": False}
    }
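
This schema format lends itself to a simple required-field check. The sketch below is illustrative only; validate_configuration is a hypothetical helper, not the library's actual validator.

from typing import Any

def validate_configuration(schema: dict[str, Any], config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: check a config dict against a schema like the one above."""
    errors = []
    for field, spec in schema.items():
        if spec.get("required") and field not in config:
            errors.append(f"missing required field: {field}")
    # Flag unknown keys too; the real validator may be more lenient.
    errors.extend(f"unknown field: {key}" for key in config if key not in schema)
    return errors

# Example: a vLLM-style config missing its model name
errors = validate_configuration(
    {"base_url": {"type": "string", "required": True},
     "model": {"type": "string", "required": True}},
    {"base_url": "http://localhost:8000"},
)
assert errors == ["missing required field: model"]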

Chat interface

All model types implement chat functionality:
async def chat(
    self,
    ctx: ChatContext,
    messages: list[ChatMessage],
    params: ChatParameters | None = None
) -> ChatResult:
    """Generate a response from the model."""
Location: backend/syft_space/components/model_types/interfaces.py:129
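
A call might look like the following sketch, where model_instance is any object implementing the protocol and ctx comes from the request's auth context. Keyword construction of the data classes is assumed; the import path is inferred from the Location notes.

from syft_space.components.model_types.interfaces import (
    ChatContext,
    ChatMessage,
    ChatParameters,
    ChatResult,
)

async def ask(model_instance, ctx: ChatContext) -> ChatResult:
    # model_instance is any object implementing the chat interface above.
    messages = [
        ChatMessage(role="system", content="You are a concise assistant."),
        ChatMessage(role="user", content="Summarize RAG in one sentence."),
    ]
    params = ChatParameters(temperature=0.2, max_tokens=150)
    return await model_instance.chat(ctx, messages, params)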

Chat data models

ChatMessage

Input messages to the model:
class ChatMessage:
    role: str       # "user", "assistant", or "system"
    content: str    # Message text
Location: backend/syft_space/components/model_types/interfaces.py:18

ChatParameters

Control generation behavior:
class ChatParameters:
    temperature: float = 0.7           # Randomness (0.0-2.0)
    max_tokens: int = 100              # Maximum response length
    stop_sequences: list[str] = []     # Stop generation at these strings
    presence_penalty: float = 0.0      # Penalize repeated topics (-2.0 to 2.0)
    frequency_penalty: float = 0.0     # Penalize repeated tokens (-2.0 to 2.0)
    top_p: float = 1.0                 # Nucleus sampling (0.0-1.0)
    extra_options: dict = {}           # Type-specific options
Location: backend/syft_space/components/model_types/interfaces.py:26
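
For example, extraction-style prompts might pin generation down like this (keyword construction is assumed; the extra_options key shown is illustrative):

params = ChatParameters(
    temperature=0.0,          # deterministic-leaning output
    max_tokens=256,
    stop_sequences=["\n\n"],  # stop at the first blank line
    extra_options={"seed": 42},  # passed through to the model type; key is illustrative
)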

ChatResult

Model response:
class ChatResult:
    id: str                         # Unique completion ID
    model: str                      # Model name used
    messages: list[ChatMessageResult]  # Generated messages
    finish_reason: str              # "stop", "length", "error", etc.
    usage: TokenUsage               # Token consumption details
    metadata: dict                  # Additional info

class ChatMessageResult:
    role: str       # Message role
    content: str    # Generated text
    tokens: int     # Tokens in this message

class TokenUsage:
    prompt_tokens: int      # Tokens in input
    completion_tokens: int  # Tokens in output
    total_tokens: int       # Sum of both
Location: backend/syft_space/components/model_types/interfaces.py:68
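
A sketch of consuming a result, assuming attribute access as declared above (summarize_usage is a hypothetical helper):

from syft_space.components.model_types.interfaces import ChatResult

def summarize_usage(result: ChatResult) -> str:
    """Hypothetical helper: flag truncated output and report token spend."""
    text = result.messages[0].content if result.messages else ""
    truncated = " (truncated)" if result.finish_reason == "length" else ""
    return f"{result.model}: {result.usage.total_tokens} tokens{truncated} -> {text[:60]}"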

Available model types

OpenAI

Type name: openai

Connect to OpenAI’s API (GPT-4, GPT-3.5, etc.).

Configuration:
{
  "api_key": "sk-...",
  "model": "gpt-4",
  "base_url": "https://api.openai.com/v1"  // optional
}
Use cases:
  • Production-grade chat completions
  • Function calling
  • Advanced reasoning tasks

vLLM (local)

Type name: vllm_local

Connect to a locally-hosted vLLM inference server.

Configuration:
{
  "base_url": "http://localhost:8000",
  "model": "meta-llama/Llama-2-7b-chat-hf"
}
Use cases:
  • Privacy-preserving inference (data never leaves your infrastructure)
  • Custom fine-tuned models
  • Cost optimization for high-volume use

Model operations

Create model

async def create_model(
    request: CreateModelRequest,
    tenant: Tenant
) -> ModelResponse:
    """
    1. Validates model type exists
    2. Validates configuration against schema
    3. Creates model entity
    """
Location: backend/syft_space/components/models/handlers.py:86

Request schema:
class CreateModelRequest:
    name: str               # Unique name per tenant
    dtype: str              # Model type name
    configuration: dict     # Type-specific config
    summary: str = ""       # Optional description
    tags: str = ""          # Comma-separated tags

Update model

Partial updates (only name, summary, tags):
async def update_model(
    name: str,
    request: UpdateModelRequest,
    tenant: Tenant
) -> ModelResponse:
    """
    Updates metadata fields.
    Configuration cannot be updated (delete + recreate instead).
    """
Model configuration (API keys, URLs) cannot be updated. To change configuration, delete and recreate the model.
Location: backend/syft_space/components/models/handlers.py:162
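
A hedged HTTP sketch; the PATCH verb and per-name route are assumptions:

import httpx

# PATCH verb and route are assumptions; configuration is intentionally
# absent, since it cannot be updated in place.
httpx.patch(
    "http://localhost:8080/api/v1/models/gpt-4-assistant",
    headers={"Authorization": "Bearer <token>"},
    json={"summary": "Customer-facing assistant", "tags": "assistant,support"},
).raise_for_status()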

Delete model

async def delete_model(name: str, tenant: Tenant) -> dict:
    """Deletes model and cascades to connected endpoints."""
Location: backend/syft_space/components/models/handlers.py:197
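
A hedged HTTP sketch; the DELETE verb and per-name route are assumptions:

import httpx

# DELETE verb and route are assumptions; remember that deletion
# cascades to any endpoints using the model.
httpx.delete(
    "http://localhost:8080/api/v1/models/gpt-4-assistant",
    headers={"Authorization": "Bearer <token>"},
).raise_for_status()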

Healthcheck

Verify model connectivity:
async def healthcheck(name: str, tenant: Tenant) -> HealthcheckResponse:
    """
    Returns:
    - status: HEALTHY or UNHEALTHY
    - message: Details about connection state
    """
Location: backend/syft_space/components/models/handlers.py:217
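
Using the healthcheck route from the workflow below, a model can be gated before it is wired into endpoints (base URL, auth, and the JSON casing of status are assumptions):

import httpx

resp = httpx.get(
    "http://localhost:8080/api/v1/models/gpt-4-assistant/healthcheck",
    headers={"Authorization": "Bearer <token>"},
)
resp.raise_for_status()
body = resp.json()
if body.get("status") != "HEALTHY":  # JSON casing of the enum is assumed
    raise RuntimeError(f"model unhealthy: {body.get('message')}")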

RAG integration

When a model is used in an endpoint with a dataset (response type “both”), search results are automatically injected as context:
# Endpoint handler combines dataset + model
if references and references.documents:
    # Build context from top 3 search results
    context_content = "\n\n".join([
        f"[{doc.document_id}] {doc.content}"
        for doc in references.documents[:3]
    ])
    
    # Inject as system message
    context_message = ChatMessage(
        role="system",
        content=f"Use the following context to answer:\n{context_content}"
    )
    messages.insert(0, context_message)

# Chat with model
chat_result = await model_instance.chat(ctx, messages, params)
Location: backend/syft_space/components/endpoints/handlers.py:481

This implements the retrieval-augmented generation pattern:
  1. Query searches dataset for relevant documents
  2. Top results are formatted as context
  3. Context + user message sent to model
  4. Model generates answer grounded in retrieved documents

Response format

When querying an endpoint, model responses follow this structure:
class SummaryResponse:
    id: str                     # Completion ID
    model: str                  # Model name used
    message: MessageResponse    # Generated message
    finish_reason: str          # Completion reason
    usage: TokenUsage           # Token consumption
    cost: float                 # Generation cost
    provider_info: ProviderInfo # API version, response time

class MessageResponse:
    role: str       # "assistant"
    content: str    # Generated text
    tokens: int     # Token count
Location: backend/syft_space/components/endpoints/schemas.py:336
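
For the cost-tracking practice noted under best practices, here is a minimal sketch of reading these fields from a query response (log_completion is hypothetical; JSON field names are assumed to mirror the schema):

def log_completion(payload: dict) -> None:
    """Hypothetical helper: report usage and cost from a SummaryResponse payload."""
    usage = payload["usage"]
    print(
        f"{payload['model']} finish={payload['finish_reason']} "
        f"tokens={usage['total_tokens']} cost=${payload['cost']:.4f}"
    )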

Relationships

  • Tenant: Each model belongs to one tenant
  • Endpoints: One model can be used by multiple endpoints

Context injection

The ChatContext object tracks model usage:
class ChatContext(Context):
    sender: str     # Email of user making request (from auth token)
    model_id: UUID  # Model being used
This enables:
  • Audit logging (who used which model when)
  • Usage tracking per sender
  • Policy enforcement based on sender identity
Location: backend/syft_space/components/model_types/interfaces.py:11
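
A sketch of the audit-logging idea (audit_chat is a hypothetical hook; attribute access follows the class above):

import logging

from syft_space.components.model_types.interfaces import ChatContext

logger = logging.getLogger("model_audit")

def audit_chat(ctx: ChatContext) -> None:
    # Hypothetical audit hook; real enforcement would live in policies.
    logger.info("chat request: sender=%s model_id=%s", ctx.sender, ctx.model_id)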

Example workflow

1. Create OpenAI model

POST /api/v1/models with OpenAI credentials:
{
  "name": "gpt-4-assistant",
  "dtype": "openai",
  "configuration": {
    "api_key": "sk-...",
    "model": "gpt-4"
  }
}
2. Test healthcheck

GET /api/v1/models/gpt-4-assistant/healthcheck

Verifies the API key and connectivity.
3. Create endpoint

Link the model to an endpoint (with or without a dataset):
{
  "slug": "qa-bot",
  "model_id": "<model-uuid>",
  "response_type": "summary"
}
4. Query endpoint

POST /api/v1/endpoints/qa-bot/query
{
  "messages": [{"role": "user", "content": "What is RAG?"}],
  "temperature": 0.7,
  "max_tokens": 150
}
Returns the generated response.
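
The four steps combined into a single hedged httpx script (base URL, auth scheme, endpoint-creation route, and response shapes are assumptions; the other routes are the ones listed above):

import httpx

BASE = "http://localhost:8080/api/v1"          # assumed base URL
HEADERS = {"Authorization": "Bearer <token>"}  # assumed auth scheme

with httpx.Client(base_url=BASE, headers=HEADERS) as client:
    # 1. Create the model
    client.post("/models", json={
        "name": "gpt-4-assistant",
        "dtype": "openai",
        "configuration": {"api_key": "sk-...", "model": "gpt-4"},
    }).raise_for_status()

    # 2. Verify connectivity before use
    health = client.get("/models/gpt-4-assistant/healthcheck").json()
    assert health.get("status") == "HEALTHY", health.get("message")

    # 3. Link the model to an endpoint (endpoint-creation route is an assumption)
    client.post("/endpoints", json={
        "slug": "qa-bot",
        "model_id": "<model-uuid>",
        "response_type": "summary",
    }).raise_for_status()

    # 4. Query the endpoint
    answer = client.post("/endpoints/qa-bot/query", json={
        "messages": [{"role": "user", "content": "What is RAG?"}],
        "temperature": 0.7,
        "max_tokens": 150,
    }).json()
    print(answer["message"]["content"])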

Best practices

  • Name models by their purpose: customer-support-gpt4, legal-qa-llama2.
  • Store API keys in configuration, never hardcoded; they are encrypted in the database.
  • Always run a healthcheck after creating a model to verify connectivity before using it in endpoints.
  • Track usage.total_tokens in responses to understand costs and optimize prompts.

Next steps

  • Endpoints: Combine models with datasets to create RAG endpoints.
  • Policies: Apply rate limiting and access controls to model usage.