Screen Context AI System

Problem Overview

You're tasked with designing a system that maintains awareness of a user's screen content to answer contextual questions that traditional RAG systems cannot handle.

Core Components:

Data Ingestion API (POST /ingest)
- Called every 5 seconds with the current screen content
- Schema to be designed by you

Query API (POST /chat_completion)
- Handles user questions about their digital context
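
Since the ingest schema is left open, the following is only one possible starting point, not a prescribed design. It sketches a hypothetical payload for POST /ingest and a small buffer that skips snapshots whose content has not changed since the previous call, which is likely to be common at a 5-second interval. Every name here (ScreenSnapshot, IngestBuffer, and all field names) is an assumption made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ScreenSnapshot:
    """Hypothetical payload for POST /ingest; every field name is an assumption."""
    user_id: str
    captured_at: float           # Unix timestamp set by the client
    app_name: str                # foreground application, e.g. "WhatsApp"
    window_title: str
    text: str                    # text the client extracted from the screen
    url: Optional[str] = None    # only set when the foreground app is a browser

    def content_hash(self) -> str:
        """Hash of the visible content, ignoring the capture time."""
        payload = asdict(self)
        payload.pop("captured_at")
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()


class IngestBuffer:
    """Stores a snapshot only if it differs from the user's previous one."""

    def __init__(self) -> None:
        self._last_hash: dict[str, str] = {}
        self.stored: list[ScreenSnapshot] = []

    def ingest(self, snapshot: ScreenSnapshot) -> bool:
        """Return True if the snapshot was stored, False if it was a repeat."""
        digest = snapshot.content_hash()
        if self._last_hash.get(snapshot.user_id) == digest:
            return False  # screen unchanged since the last call
        self._last_hash[snapshot.user_id] = digest
        self.stored.append(snapshot)
        return True
```

In a real service the buffer would sit behind the HTTP handler and write to a database rather than a list; the in-memory version only shows the deduplication idea.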

Example Use Cases:

"Which people are in my team?"
"What WhatsApp messages need replies? Draft responses for me."
"What PRs need my reviews?"
[Additional contextual questions]
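
To make the query side concrete, here is a minimal sketch of how /chat_completion might assemble context for the LLM from stored snapshots. It uses plain word overlap as a stand-in for retrieval; an embedding model or a knowledge graph would likely replace it in a real design. StoredSnapshot and build_prompt are hypothetical names, and the prompt layout is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class StoredSnapshot:
    captured_at: float   # Unix timestamp
    app_name: str
    text: str


def _tokens(value: str) -> set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def build_prompt(question: str, snapshots: list[StoredSnapshot],
                 max_snapshots: int = 3) -> str:
    """Select snapshots that share words with the question and format a prompt."""
    question_words = _tokens(question)

    def overlap(s: StoredSnapshot) -> int:
        return len(question_words & _tokens(f"{s.app_name} {s.text}"))

    # Keep only snapshots with at least one shared word,
    # ranked by overlap first and recency second.
    relevant = [s for s in snapshots if overlap(s) > 0]
    relevant.sort(key=lambda s: (overlap(s), s.captured_at), reverse=True)

    context = "\n".join(
        f"[{s.app_name}] {s.text}" for s in relevant[:max_snapshots]
    )
    return f"Screen context:\n{context}\n\nQuestion: {question}"
```

The returned string would then be passed to the LLM. Word overlap is deliberately naive: common words such as "what" can match unrelated snapshots, which is one reason the brief points toward richer memory structures.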

Technical Parameters:

You control the client-side implementation and parsing
You'll have databases of your choice, an embedding model (if needed), and an LLM
Solution should be generic but can include app-specific optimizations
Recommended research areas: knowledge graphs, memory systems (Graphiti, mem0, GraphRAG)
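
As a rough illustration of the knowledge-graph direction, the sketch below stores facts as subject/relation/object triples with a validity interval, so a question like "Which people are in my team?" can be answered from what is currently believed true. This is a hand-rolled toy and does not reflect the API of any of the libraries named above; Fact, FactStore, and the relation name "member_of" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: float
    valid_to: Optional[float] = None  # None means still believed true


class FactStore:
    """In-memory store of time-scoped triples (illustrative only)."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def _current(self, subject: str, relation: str, obj: str) -> Optional[Fact]:
        for f in self.facts:
            if (f.subject, f.relation, f.obj) == (subject, relation, obj) \
                    and f.valid_to is None:
                return f
        return None

    def assert_fact(self, subject: str, relation: str, obj: str, at: float) -> None:
        """Record a fact unless an identical one is already current."""
        if self._current(subject, relation, obj) is None:
            self.facts.append(Fact(subject, relation, obj, valid_from=at))

    def retract_fact(self, subject: str, relation: str, obj: str, at: float) -> None:
        """Close the validity interval instead of deleting, preserving history."""
        fact = self._current(subject, relation, obj)
        if fact is not None:
            fact.valid_to = at

    def subjects(self, relation: str, obj: str) -> list[str]:
        """Subjects for which (relation, obj) currently holds, sorted by name."""
        return sorted(
            f.subject for f in self.facts
            if f.relation == relation and f.obj == obj and f.valid_to is None
        )
```

Keeping retracted facts with an end time, rather than deleting them, lets the system answer both "who is on my team now?" and "who was on my team last month?" from the same store.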

Deliverables:

A comprehensive system design proposal
A proof-of-concept implementation (format of your choice)
Research documentation