This document defines the schema and processing logic for the /ingest endpoint of the Contextual AI Assistant, which is responsible for receiving and processing screen data from client applications.
POST /ingest
The ingestion endpoint receives screen data every 5 seconds from client applications, processes it to extract entities and relationships, and stores the structured information in the Mem0 memory system.
| Header | Description | Required | Example |
|---|---|---|---|
| Content-Type | Media type of the request body | Yes | application/json |
| X-User-ID | Unique identifier for the user | Yes | user_123 |
| X-Client-ID | Identifier for the client application | Yes | desktop_app_v1.2 |
| X-Session-ID | Unique identifier for the user session | No | session_456 |
| X-Timestamp | Client-side timestamp (ISO 8601) | No | 2025-04-17T00:30:45Z |
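
The header requirements in the table can be checked before a request is sent or accepted. The following is a minimal sketch, not the service's actual validation logic: the function name `validate_headers` and the error messages are hypothetical, and the check that `Content-Type` must be `application/json` is an assumption based on the example value in the table.

```python
from datetime import datetime

# Header names taken from the table above.
REQUIRED_HEADERS = ("Content-Type", "X-User-ID", "X-Client-ID")


def validate_headers(headers: dict) -> list:
    """Return a list of problems; an empty list means the headers look valid."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {name.lower(): value for name, value in headers.items()}
    problems = []

    for name in REQUIRED_HEADERS:
        if not normalized.get(name.lower()):
            problems.append(f"missing required header: {name}")

    # Assumption: only JSON bodies are accepted (parameters such as charset are ignored).
    content_type = normalized.get("content-type", "")
    if content_type and content_type.split(";")[0].strip().lower() != "application/json":
        problems.append("Content-Type must be application/json")

    # X-Timestamp is optional, but if present it should parse as ISO 8601.
    timestamp = normalized.get("x-timestamp")
    if timestamp:
        try:
            # Older Python versions do not accept a trailing "Z", so map it to an offset.
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append("X-Timestamp is not a valid ISO 8601 timestamp")

    return problems
```

Returning a list of problems rather than raising on the first one lets a client report every header issue in a single pass.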

    {
      "content": {
        "text": "string",                    // Raw text content from the screen
        "html": "string",                    // Optional HTML content for rich formatting
        "structure": {},                     // Optional structured representation of the content
        "metadata": {}                       // Optional additional metadata
      },
      "context": {
        "app": {
          "name": "string",                  // Application name (e.g., "WhatsApp", "GitHub")
          "type": "string",                  // Application type (e.g., "messaging", "development")
          "version": "string",               // Application version
          "window_title": "string"           // Window title
        },
        "user": {
          "active": true,                    // Whether the user is actively engaging with the app
          "focus_duration_ms": 0,            // How long the user has been focused on this window
          "last_input_ms": 0                 // Time since last user input
        },
        "device": {
          "type": "string",                  // Device type (e.g., "desktop", "mobile")
          "os": "string",                    // Operating system
          "screen_resolution": {             // Screen resolution
            "width": 0,
            "height": 0
          }
        },
        "timestamp": "string",               // ISO 8601 timestamp
        "timezone": "string"                 // User's timezone
      },
      "capture": {
        "type": "full" | "diff" | "event",   // Type of screen capture
        "sequence_id": 0,                    // Sequence number for ordering captures
        "diff_base_id": "string",            // Reference to previous capture (for diffs)
        "image": "string"                    // Optional: Base64-encoded screenshot
      }
    }

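
To make the request shape concrete, the sketch below assembles a minimal `full` capture and wraps it in a `POST /ingest` request using only the Python standard library. It is an illustration under stated assumptions, not a reference client: the helper names `build_payload` and `build_ingest_request` and the base URL are hypothetical, and because the schema does not say which body fields are mandatory, the example simply omits the ones marked optional along with the `device` block.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_payload(text: str, sequence_id: int, app_name: str, app_type: str) -> dict:
    """Assemble a minimal "full" capture payload that follows the schema."""
    return {
        "content": {"text": text},
        "context": {
            "app": {"name": app_name, "type": app_type},
            "user": {"active": True, "focus_duration_ms": 0, "last_input_ms": 0},
            # ISO 8601 timestamp in UTC; the timezone value here is an assumption.
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "timezone": "UTC",
        },
        "capture": {"type": "full", "sequence_id": sequence_id},
    }


def build_ingest_request(base_url: str, user_id: str, client_id: str,
                         payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /ingest request with the required headers."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ingest",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-User-ID": user_id,
            "X-Client-ID": client_id,
        },
    )


# Hypothetical base URL; replace it with the real service address.
request = build_ingest_request(
    "http://localhost:8000", "user_123", "desktop_app_v1.2",
    build_payload("Hello from the screen", 1, "GitHub", "development"),
)
```

Passing `request` to `urllib.request.urlopen` would send it. A client following the 5-second cadence would increment `sequence_id` on each capture so the server can order them.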
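The content object contains the actual screen data captured by the client: the raw text (`text`), optional HTML for rich formatting (`html`), an optional structured representation (`structure`), and optional additional metadata (`metadata`).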
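
A type check for the `content` object might look like the sketch below. The function name `validate_content` is hypothetical, and treating `text` as required is an assumption drawn from the schema comments, which mark only `html`, `structure`, and `metadata` as optional.

```python
def validate_content(content: dict) -> list:
    """Return a list of problems found in a `content` object (empty if none)."""
    problems = []

    # Assumption: `text` is the only required field of `content`.
    if not isinstance(content.get("text"), str):
        problems.append("content.text must be a string")

    # `html` is optional, but must be a string when supplied.
    if "html" in content and not isinstance(content["html"], str):
        problems.append("content.html must be a string when present")

    # `structure` and `metadata` are optional JSON objects.
    for key in ("structure", "metadata"):
        if key in content and not isinstance(content[key], dict):
            problems.append(f"content.{key} must be an object when present")

    return problems
```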
The context object provides information about the environment in which the screen data was captured: