Semantic Connections, Discovered Automatically
The on-premise LLM automatically discovers semantic relationships between your personal notes and the company knowledge graph — running nightly without any manual intervention.
§1 — Nightly pipeline (03:00 UTC)
The personal_autolink_orchestrator Airflow DAG runs nightly at 03:00 UTC and processes every active user in five sequential steps.
Discover active users
A SPARQL query against Jena finds all named graphs under http://arcaq.com/personal/, so any collaborator with at least one personal note is automatically included in the run.
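As a rough illustration of this discovery step, the sketch below builds a query that lists non-empty named graphs under the personal prefix and reads user identifiers out of a standard SPARQL JSON result. The function names, and the assumption that the user identifier is the path segment following the prefix, are hypothetical and not taken from the actual DAG.

```python
PERSONAL_PREFIX = "http://arcaq.com/personal/"


def build_user_graph_query(prefix: str = PERSONAL_PREFIX) -> str:
    """Return a SPARQL query listing named graphs under `prefix` that hold at least one triple."""
    return (
        "SELECT DISTINCT ?g WHERE {\n"
        "  GRAPH ?g { ?s ?p ?o }\n"
        f'  FILTER(STRSTARTS(STR(?g), "{prefix}"))\n'
        "}"
    )


def user_ids_from_results(results: dict, prefix: str = PERSONAL_PREFIX) -> list:
    """Extract user identifiers from a SPARQL JSON result document.

    Assumption: the identifier is whatever follows `prefix` in the graph URI.
    """
    ids = []
    for binding in results["results"]["bindings"]:
        graph_uri = binding["g"]["value"]
        if not graph_uri.startswith(prefix):
            continue
        user_id = graph_uri[len(prefix):].strip("/")
        # Keep discovery order and drop duplicates.
        if user_id and user_id not in ids:
            ids.append(user_id)
    return ids
```

The query string would be sent to the Jena SPARQL endpoint by whatever HTTP client the DAG uses; that part is left out because the endpoint path is not specified here.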
Fetch recent personal notes
For each user, up to AUTOLINK_BATCH_SIZE (default 50) personal notes are retrieved from that user's private graph, each with its label, content, and note type.
Fetch company entity sample
Up to 500 company entities (with labels and types) are fetched from the shared knowledge graph as comparison targets — excluding personal and blank nodes.
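One way such a sample query could look is sketched below. It assumes that company entities live in named graphs outside the personal prefix and carry an rdfs:label and an rdf:type. Both are assumptions about the data model, and the function name is hypothetical.

```python
PERSONAL_PREFIX = "http://arcaq.com/personal/"


def build_company_entity_query(limit: int = 500) -> str:
    """Return a SPARQL query sampling labelled, typed entities outside personal graphs."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?entity ?label ?type WHERE {{
  GRAPH ?g {{
    ?entity rdfs:label ?label ;
            a ?type .
  }}
  FILTER(!isBlank(?entity))
  FILTER(!STRSTARTS(STR(?g), "{PERSONAL_PREFIX}"))
}}
LIMIT {limit}"""
```

Note that an entity with several types or labels yields several rows under this query, so the number of distinct entities returned can be lower than the limit.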
LLM semantic analysis
The on-premise LLM (OLLAMA_MODEL, default llama3.2:3b) scores each pair of personal note and company entity for semantic similarity. Only pairs scoring above AUTOLINK_CONFIDENCE (default 0.7) become links.
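The thresholding described here can be sketched as follows. The sketch assumes the model replies with a bare number between 0.0 and 1.0, which is an assumption about the prompt design. ScoredPair, parse_score, and select_links are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Optional


class ScoredPair(NamedTuple):
    note_uri: str
    entity_uri: str
    score: float


def parse_score(raw: str) -> Optional[float]:
    """Parse a model reply as a score in [0.0, 1.0]; return None if it is unusable."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None
    # NaN and out-of-range values fail this comparison and are rejected.
    return value if 0.0 <= value <= 1.0 else None


def select_links(pairs: Iterable[ScoredPair], threshold: float = 0.7) -> List[ScoredPair]:
    """Keep only the pairs whose score is strictly above the threshold."""
    return [pair for pair in pairs if pair.score > threshold]
```

With the default threshold, a pair scored exactly 0.7 is dropped in this sketch because the comparison is strict; switching to `>=` would treat the threshold as an inclusive minimum.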
Insert auto-links into personal graph
Discovered links are written as typed arcaq:AutoLink triples into the user's private named graph — idempotent, timestamped, and carrying the confidence score.
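A minimal sketch of how an idempotent write might be arranged: the link IRI is derived deterministically from the note and entity, so a later run addresses the same resource instead of minting a new one. The namespace http://arcaq.com/ontology#, the property names (arcaq:source, arcaq:target, arcaq:confidence, arcaq:createdAt), and the IRI layout are all hypothetical. Only the arcaq:AutoLink type comes from the description above.

```python
import hashlib
from datetime import datetime, timezone

ARCAQ_NS = "http://arcaq.com/ontology#"  # hypothetical namespace


def link_iri(user_graph: str, note_uri: str, entity_uri: str) -> str:
    """Derive a stable IRI for the (note, entity) pair inside the user's graph."""
    digest = hashlib.sha256(f"{note_uri}|{entity_uri}".encode("utf-8")).hexdigest()[:16]
    return f"{user_graph.rstrip('/')}/autolink/{digest}"


def build_insert(user_graph: str, note_uri: str, entity_uri: str,
                 confidence: float, created_at: datetime) -> str:
    """Return a SPARQL INSERT DATA update for one auto-link.

    `created_at` is expected to be timezone-aware.
    """
    link = link_iri(user_graph, note_uri, entity_uri)
    ts = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"""PREFIX arcaq: <{ARCAQ_NS}>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
INSERT DATA {{
  GRAPH <{user_graph}> {{
    <{link}> a arcaq:AutoLink ;
      arcaq:source <{note_uri}> ;
      arcaq:target <{entity_uri}> ;
      arcaq:confidence "{confidence:.2f}"^^xsd:decimal ;
      arcaq:createdAt "{ts}"^^xsd:dateTime .
  }}
}}"""
```

Re-inserting identical triples is a no-op in RDF, but a later run with a different score or timestamp would add a second value to the same link. A full implementation would therefore either skip pairs whose IRI already exists or delete the old values first.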
§2 — RDF auto-link model
Every discovered link is a first-class RDF resource — queryable, deletable, and traceable to the LLM model that created it.
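To illustrate what "queryable, deletable, and traceable" could mean in practice, the sketch below builds one query that lists the links produced by a given model and one update that removes a single link. The namespace and the properties arcaq:generatedBy, arcaq:source, arcaq:target, and arcaq:confidence are hypothetical placeholders, as the actual vocabulary is not shown here.

```python
ARCAQ_NS = "http://arcaq.com/ontology#"  # hypothetical namespace


def select_links_by_model(user_graph: str, model: str) -> str:
    """Return a SPARQL query listing the auto-links attributed to `model`."""
    # Escape backslashes and quotes so the model name is a valid string literal.
    escaped = model.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX arcaq: <{ARCAQ_NS}>
SELECT ?link ?source ?target ?confidence WHERE {{
  GRAPH <{user_graph}> {{
    ?link a arcaq:AutoLink ;
          arcaq:generatedBy "{escaped}" ;
          arcaq:source ?source ;
          arcaq:target ?target ;
          arcaq:confidence ?confidence .
  }}
}}"""


def delete_link(user_graph: str, link_iri: str) -> str:
    """Return a SPARQL update removing every triple whose subject is `link_iri`."""
    return f"""DELETE WHERE {{
  GRAPH <{user_graph}> {{ <{link_iri}> ?p ?o }}
}}"""
```

Because each link is its own resource, deleting it removes the confidence, timestamp, and model attribution together, leaving the note and the company entity untouched.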
§3 — Configuration
All thresholds and model choices are managed via environment variables — no code change required to tune the linker.
| Variable | Default | Description |
|---|---|---|
| AUTOLINK_CONFIDENCE | 0.7 | Minimum LLM similarity score to create a link (0.0–1.0). Raise for precision, lower for recall. |
| AUTOLINK_BATCH_SIZE | 50 | Max personal notes processed per user per DAG run. Raise for thoroughness, lower for speed. |
| OLLAMA_MODEL | llama3.2:3b | Ollama model used for semantic comparison. Any model installed in your cluster can be used. |
| OLLAMA_BASE_URL | http://ollama:11434 | Internal URL of the Ollama service. Must be reachable from Airflow workers. |
| ARCAQ_API_URL | http://arcaq-api:8000 | ArcaQ API base URL. Used when users are discovered through the API instead of SPARQL. |
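A small sketch of how these variables might be read, with the defaults from the table above. AutolinkConfig and load_config are hypothetical names, and the range checks are an added safeguard, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AutolinkConfig:
    confidence: float = 0.7
    batch_size: int = 50
    ollama_model: str = "llama3.2:3b"
    ollama_base_url: str = "http://ollama:11434"
    arcaq_api_url: str = "http://arcaq-api:8000"


def load_config(env: Mapping[str, str] = os.environ) -> AutolinkConfig:
    """Build the linker configuration from environment variables, falling back to defaults."""
    confidence = float(env.get("AUTOLINK_CONFIDENCE", "0.7"))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("AUTOLINK_CONFIDENCE must be between 0.0 and 1.0")
    batch_size = int(env.get("AUTOLINK_BATCH_SIZE", "50"))
    if batch_size < 1:
        raise ValueError("AUTOLINK_BATCH_SIZE must be at least 1")
    return AutolinkConfig(
        confidence=confidence,
        batch_size=batch_size,
        ollama_model=env.get("OLLAMA_MODEL", "llama3.2:3b"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://ollama:11434"),
        arcaq_api_url=env.get("ARCAQ_API_URL", "http://arcaq-api:8000"),
    )
```

Passing a plain dictionary instead of os.environ, for example `load_config({"AUTOLINK_CONFIDENCE": "0.85"})`, makes it easy to try a stricter threshold without touching the deployment.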
Let the LLM do the connecting
Stop manually tagging relationships. The auto-linker surfaces unexpected connections across domains — entirely on-premise, with full auditability.