Ingestion API Schema for Contextual AI Assistant

This document defines the schema and processing logic for the /ingest endpoint of the Contextual AI Assistant. The endpoint receives and processes screen data from client applications.

1. Endpoint Definition

POST /ingest

Client applications send screen data to the ingestion endpoint every 5 seconds. The endpoint processes each payload to extract entities and relationships, then stores the structured information in the Mem0 memory system.
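From the client's side, the 5-second cadence could be sketched roughly as follows. This is only an illustration using the Python standard library: the base URL, the function names build_ingest_request and run_capture_loop, the capture callback, and the 10-second timeout are all assumptions, not part of this specification.

```python
import json
import time
import urllib.request
from typing import Callable

INGEST_INTERVAL_SECONDS = 5  # capture cadence stated in this section


def build_ingest_request(base_url: str, headers: dict, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /ingest request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ingest",
        data=body,
        headers=all_headers,
        method="POST",
    )


def run_capture_loop(base_url: str, headers: dict,
                     capture: Callable[[], dict], max_captures: int) -> None:
    """Send one capture per interval. `capture` is a hypothetical callback
    that returns the request body for the current screen."""
    for _ in range(max_captures):
        started = time.monotonic()
        request = build_ingest_request(base_url, headers, capture())
        # Assumed timeout; the specification does not define one.
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
        # Sleep for whatever remains of the 5-second interval.
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, INGEST_INTERVAL_SECONDS - elapsed))
```

Keeping request construction separate from the sending loop makes the request shape easy to inspect without a running server.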

2. Request Schema

2.1 Headers

Header        Description                             Required  Example
Content-Type  Media type of the request body          Yes       application/json
X-User-ID     Unique identifier for the user          Yes       user_123
X-Client-ID   Identifier for the client application   Yes       desktop_app_v1.2
X-Session-ID  Unique identifier for the user session  No        session_456
X-Timestamp   Client-side timestamp (ISO 8601)        No        2025-04-17T00:30:45Z
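A server could check these headers along the following lines. This is a minimal sketch, not the service's actual validation logic: the function name validate_headers and the wording of the error messages are hypothetical, and only the required/optional split and the ISO 8601 format come from the table above.

```python
from datetime import datetime

# Headers marked "Yes" in the Required column.
REQUIRED_HEADERS = ("Content-Type", "X-User-ID", "X-Client-ID")


def validate_headers(headers: dict) -> list:
    """Return a list of problems; an empty list means the headers look valid."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    problems = [
        f"missing required header: {name}"
        for name in REQUIRED_HEADERS
        if not normalized.get(name.lower())
    ]

    # X-Timestamp is optional, but if present it should parse as ISO 8601.
    timestamp = normalized.get("x-timestamp")
    if timestamp:
        try:
            # Older Python versions reject a trailing "Z", so map it to an offset.
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append("X-Timestamp is not a valid ISO 8601 timestamp")

    return problems
```

Returning every problem at once, rather than raising on the first, lets a client fix all header issues in a single round trip.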

2.2 Request Body

{
  "content": {
    "text": "string",          // Raw text content from the screen
    "html": "string",          // Optional HTML content for rich formatting
    "structure": {},           // Optional structured representation of the content
    "metadata": {}             // Optional additional metadata
  },
  "context": {
    "app": {
      "name": "string",        // Application name (e.g., "WhatsApp", "GitHub")
      "type": "string",        // Application type (e.g., "messaging", "development")
      "version": "string",     // Application version
      "window_title": "string" // Window title
    },
    "user": {
      "active": true,          // Whether the user is actively engaging with the app
      "focus_duration_ms": 0,  // How long the user has been focused on this window
      "last_input_ms": 0       // Time since last user input
    },
    "device": {
      "type": "string",        // Device type (e.g., "desktop", "mobile")
      "os": "string",          // Operating system
      "screen_resolution": {   // Screen resolution
        "width": 0,
        "height": 0
      }
    },
    "timestamp": "string",     // ISO 8601 timestamp
    "timezone": "string"       // User's timezone
  },
  "capture": {
    "type": "full" | "diff" | "event", // Type of screen capture
    "sequence_id": 0,                 // Sequence number for ordering captures
    "diff_base_id": "string",         // Reference to previous capture (for diffs)
    "image": "string"                 // Optional: Base64-encoded screenshot
  }
}
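
As an illustration of the schema above, a client might assemble a minimal request body like this. The sketch assumes that fields not marked optional are the ones worth sending, and that a "diff" capture must carry a diff_base_id; neither rule is stated explicitly in this document. The helper name build_payload and its parameters are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# The three capture types listed in the schema.
CAPTURE_TYPES = {"full", "diff", "event"}


def build_payload(text: str, app_name: str, app_type: str,
                  capture_type: str, sequence_id: int,
                  diff_base_id: Optional[str] = None,
                  now: Optional[datetime] = None) -> dict:
    """Assemble a minimal /ingest request body as a plain dict."""
    if capture_type not in CAPTURE_TYPES:
        raise ValueError(f"unknown capture type: {capture_type!r}")
    # Assumption: a diff is only meaningful relative to a previous capture.
    if capture_type == "diff" and not diff_base_id:
        raise ValueError("diff captures need a diff_base_id")

    now = now or datetime.now(timezone.utc)
    capture = {"type": capture_type, "sequence_id": sequence_id}
    if diff_base_id:
        capture["diff_base_id"] = diff_base_id

    return {
        "content": {"text": text},
        "context": {
            "app": {"name": app_name, "type": app_type},
            "timestamp": now.isoformat(),  # ISO 8601
            "timezone": "UTC",
        },
        "capture": capture,
    }


if __name__ == "__main__":
    body = build_payload("Hello", "WhatsApp", "messaging", "full", 1)
    print(json.dumps(body, indent=2))
```

Optional fields such as html, structure, metadata, and image are simply omitted here; a fuller client would add them when available.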

2.2.1 Content Object

The content object contains the actual screen data captured by the client:

2.2.2 Context Object

The context object provides information about the environment in which the screen data was captured: