A comprehensive technical overview of the Projekt Blueprint OSINT platform—from its agentic RAG system and entity extraction pipeline to its real-time streaming architecture and enterprise scalability patterns. Built for intelligence at scale.
Hot-swappable LLM backends. Google Gemini, OpenAI GPT-4, Anthropic Claude, or local Ollama/LM Studio. Provider abstraction via unified interface.
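One way to picture a unified provider interface is the minimal sketch below. It is an illustration under stated assumptions, not Blueprint's actual code: `LLMProvider`, `EchoProvider`, and `LLMRouter` are hypothetical names, and the stand-in backend only echoes its input instead of calling a vendor SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical unified interface; every backend exposes the same method."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Stand-in backend for illustration; a real one would call a vendor SDK."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class LLMRouter:
    """Holds the registered backends and lets callers swap the active one at runtime."""

    def __init__(self, providers: dict[str, LLMProvider], active: str) -> None:
        self._providers = providers
        self.use(active)

    def use(self, name: str) -> None:
        if name not in self._providers:
            raise KeyError(f"unknown provider: {name}")
        self._active = name

    def complete(self, prompt: str) -> str:
        return self._providers[self._active].complete(prompt)


router = LLMRouter(
    {"gemini": EchoProvider("gemini"), "ollama": EchoProvider("ollama")},
    active="gemini",
)
first = router.complete("ping")
router.use("ollama")  # swap backends without touching calling code
second = router.complete("ping")
```

Because callers depend only on `complete`, switching from a hosted model to a local one is a single `use` call in this sketch.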
Structured output via native function-calling APIs. Gemini function calling drives entity extraction and returns reliable, typed data.
Document and entity embeddings for semantic similarity search. pgvector with HNSW indexing for O(log n) approximate nearest neighbor queries.
The intelligence engine at Blueprint's core. A ReAct-based autonomous agent that reasons through complex queries using a suite of 24 specialized tools. Not just retrieval—true multi-step reasoning with tool orchestration.
The agent operates in a reasoning loop: it thinks about the query, selects and executes tools, observes the results, and iterates until a complete answer emerges. The loop runs for up to 100 iterations within a 32K-token context window.
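The think, act, observe cycle described above can be sketched roughly as follows. This is a simplified illustration, not the platform's agent: the `decide` callback stands in for the LLM, and `scripted_decide` and the `entity_search` lambda are hypothetical placeholders used only to exercise the loop.

```python
from typing import Callable

Tool = Callable[[str], str]


def react_loop(query, decide, tools, max_iterations=100):
    """Run think -> act -> observe until the policy returns a final answer.

    `decide` stands in for the LLM: given the query and the observations so
    far, it returns ("final", answer) or ("tool", tool_name, tool_input).
    """
    observations: list[str] = []
    for _ in range(max_iterations):
        step = decide(query, observations)
        if step[0] == "final":
            return step[1]
        _, name, tool_input = step
        tool = tools.get(name)
        # Unknown tools become an observation so the policy can recover.
        result = tool(tool_input) if tool else f"error: unknown tool {name}"
        observations.append(f"{name}({tool_input}) -> {result}")
    return "stopped: iteration limit reached"


# Hypothetical scripted policy and tool, used only to exercise the loop.
def scripted_decide(query, observations):
    if not observations:
        return ("tool", "entity_search", query)
    return ("final", f"answer based on {len(observations)} observation(s)")


tools: dict[str, Tool] = {"entity_search": lambda q: f"1 match for {q!r}"}
answer = react_loop("ACME Corp", scripted_decide, tools)
```

The iteration cap is what keeps a policy that never produces a final answer from looping forever.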
Entity search, semantic search, event search, document search, RSS feeds, relations, alerts, workspace stats
Create entity, update entity, add relations, create investigation, create alert
Batch entity lookup, batch relations, find common connections, execute workflow
Generate report, list playbooks, get playbook, OSINT lookups (GreyNoise, VirusTotal), render canvas
The agent can render visualizations directly into a split-view canvas area. Seven canvas types support different analytical outputs. SSE streaming enables real-time canvas updates as the agent processes queries.
Force-directed networks
Structured data grids
Temporal sequences
Geospatial plots
KPI dashboards
Formatted reports
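To illustrate how canvas updates might travel over SSE, the sketch below serializes partial results in the standard Server-Sent Events wire format (an `event:` line, a `data:` line, then a blank line). The event names `canvas_update` and `canvas_done`, the `table` canvas identifier, and the payload shape are assumptions made for the example, not Blueprint's documented protocol.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events message: event line, data line, blank line."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def canvas_updates(rows):
    """Yield one SSE message per partial result, then a completion marker."""
    for index, row in enumerate(rows):
        # Hypothetical payload shape: which canvas, which row, and the row itself.
        yield format_sse("canvas_update", {"canvas": "table", "index": index, "row": row})
    yield format_sse("canvas_done", {"canvas": "table"})


messages = list(canvas_updates([{"name": "ACME Corp"}]))
```

A browser client would typically consume such a stream with `EventSource` and redraw the canvas on each `canvas_update` event.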
Automated intelligence extraction from any data source. AI-powered entity recognition with fuzzy deduplication, alias resolution, and relationship inference. Production-grade queue architecture for high-throughput processing.
Crash-resilient job processing
All extraction jobs (RSS items, documents, scrapes) flow through a unified BullMQ queue backed by Redis. 20 concurrent workers process jobs in parallel, rate-limited to 1800 requests per minute (30 per second) to stay within LLM API quotas.
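The 1800 RPM limit works out to 30 requests per second. The sketch below shows that arithmetic with a simple token bucket. It is not how BullMQ implements its limiter; the `TokenBucket` class, the burst capacity of 30, and the explicit `now` argument (used instead of a real clock so the behavior is deterministic) are all assumptions for illustration.

```python
class TokenBucket:
    """Token bucket admitting at most `rate_per_minute` jobs per minute on average."""

    def __init__(self, rate_per_minute: int, capacity: int, now: float = 0.0) -> None:
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_minute=1800, capacity=30)
burst = sum(bucket.try_acquire(now=0.0) for _ in range(40))  # 40 attempts at t=0
later = sum(bucket.try_acquire(now=1.0) for _ in range(40))  # 40 attempts at t=1s
```

In this sketch only 30 of the 40 attempts succeed in each one-second window; the remaining jobs would wait in the queue.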
Per-source failure handling
Separate circuit breakers for RSS, documents, scrape, and manual sources prevent cascade failures. If one source type fails repeatedly, only that source pauses; other sources continue processing normally.
5 failures
30 seconds
2 test requests
On success
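Reading the four figures above as circuit-breaker parameters, a per-source breaker could look like the sketch below. The interpretation is an assumption: the breaker opens after 5 failures, stays open for 30 seconds, then admits 2 test requests, and closes again on success. The `CircuitBreaker` class and its field names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    failure_threshold: int = 5    # failures before the breaker opens (assumed meaning)
    reset_timeout: float = 30.0   # seconds to stay open before probing (assumed meaning)
    half_open_max: int = 2        # test requests admitted while half-open (assumed meaning)
    state: str = "closed"
    failures: int = 0
    opened_at: float = 0.0
    probes: int = 0

    def allow(self, now: float) -> bool:
        # After the timeout, move from open to half-open and start counting probes.
        if self.state == "open" and now - self.opened_at >= self.reset_timeout:
            self.state, self.probes = "half_open", 0
        if self.state == "closed":
            return True
        if self.state == "half_open" and self.probes < self.half_open_max:
            self.probes += 1
            return True
        return False

    def record_success(self) -> None:
        self.state, self.failures = "closed", 0

    def record_failure(self, now: float) -> None:
        self.failures += 1
        # A failed probe, or reaching the threshold, (re)opens the breaker.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "open", now


# One breaker per source type, so an RSS outage does not pause documents.
breakers = {s: CircuitBreaker() for s in ("rss", "documents", "scrape", "manual")}
for _ in range(5):
    breakers["rss"].record_failure(now=0.0)
```

After the five recorded failures, only the `rss` breaker rejects work; the other three source types keep accepting jobs.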
Fuzzy matching + cross-type detection
Prevents duplicate entities using Levenshtein distance scoring. Cross-type deduplication finds same-name entities classified differently. Three outcomes: LINK (matched), REVIEW (uncertain), CREATE (new entity).
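The three-outcome decision can be sketched with a normalized Levenshtein score. This is an illustrative sketch only: the thresholds 0.92 and 0.75, the case-folding normalization, and the function names are assumptions, and cross-type detection is left out for brevity.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]: 1.0 means identical after case folding and trimming."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def resolve(name, known, link_at=0.92, review_at=0.75):
    """Return (outcome, best_match); the thresholds are illustrative assumptions."""
    best = max(known, key=lambda k: similarity(name, k), default=None)
    score = similarity(name, best) if best is not None else 0.0
    if score >= link_at:
        return ("LINK", best)
    if score >= review_at:
        return ("REVIEW", best)
    return ("CREATE", None)
```

For example, under these assumed thresholds `resolve("Acme Corp.", ["Acme Corp"])` lands in the REVIEW band, because one edit over ten characters gives a score of 0.9.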
Multi-layer safety controls for autonomous AI operation. Input validation, output filtering, human-in-the-loop approval, and comprehensive tracing. Responsible AI by design.
PostgreSQL at the core with specialized indexing strategies for different query patterns. Full-text search, vector similarity, and graph traversal—all in a single database.
tsvector-indexed text search across entity names, descriptions, and document content. Sub-50ms queries on 1M+ rows with proper GIN indexing.
Semantic similarity search via vector embeddings. HNSW indexing provides O(log n) approximate nearest neighbor queries for fast semantic retrieval.
Entity relationships are stored in a dedicated table with typed relations. Recursive CTEs enable multi-hop graph traversal of up to 100 hops for connection discovery.
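A depth-limited recursive CTE of the kind described above might look like the following sketch. It runs against an in-memory SQLite database purely so the example is self-contained; PostgreSQL accepts the same `WITH RECURSIVE` form, though parameter syntax depends on the driver. The `entity_relations` table, its columns, and the sample rows are hypothetical, not Blueprint's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity_relations (
        source_id TEXT NOT NULL,
        target_id TEXT NOT NULL,
        relation_type TEXT NOT NULL
    );
    INSERT INTO entity_relations VALUES
        ('alice', 'acme', 'employed_by'),
        ('acme', 'globex', 'subsidiary_of'),
        ('globex', 'bob', 'owned_by'),
        ('bob', 'alice', 'associate_of');
""")

# Walk outgoing relations from a start entity; the depth cap guarantees
# termination even though the sample graph contains a cycle.
REACHABLE = """
WITH RECURSIVE hops(entity_id, depth) AS (
    SELECT :start, 0
    UNION
    SELECT r.target_id, h.depth + 1
    FROM hops AS h
    JOIN entity_relations AS r ON r.source_id = h.entity_id
    WHERE h.depth < :max_hops
)
SELECT entity_id, MIN(depth) AS depth
FROM hops
GROUP BY entity_id
ORDER BY depth, entity_id;
"""

rows = conn.execute(REACHABLE, {"start": "alice", "max_hops": 100}).fetchall()
```

Taking `MIN(depth)` per entity reports the shortest hop count at which each connection is first reached.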
Blueprint is built for enterprise deployment. On-premise installation, private cloud options, and dedicated support available for qualified organizations.
For partnership inquiries: nick@gruppeprojekt.com