Workloads, not requests
Chat is one prompt, one response. Agents are long-running workloads — repeated model calls, tool use, retries, state, memory, deadlines. We build for inference as execution, not API calls.
[00] — THESIS
AI is moving from chat to work. Agents don't make one model call — they research, code, test, extract, review, retry, and operate in the background. Valar is the execution layer for agentic inference: one system to run production workloads across GPUs, accelerators, clouds, and owned capacity. Every job runs on the cheapest compute that meets its latency, QoS, and data requirements.
Frontier-like performance. Fraction of the cost.
[01] — PRINCIPLES
Chat is one prompt, one response. Agents are long-running workloads — repeated model calls, tool use, retries, state, memory, deadlines. We build for inference as execution, not API calls.
NVIDIA, AMD, TPUs, Trainium, Inferentia — reserved, spot, private, sovereign, idle. Valar turns fragmented compute into one execution layer with scheduling, placement, QoS, and cost control.
Tokens are not the outcome. A merged PR. A processed document. A resolved ticket. We optimize the cost, speed, and reliability of completed work.
The most valuable workloads can't leave the customer's environment. Managed and private deployments with full data residency — and the simplicity of a managed API.
[02] — WHY NOW
AI is becoming labor. One chat request creates a few tokens. One agentic task can create millions. As agents enter production, inference becomes a cost, reliability, and scheduling problem. The chat stack was built for responses. The agent stack must be built for work.
Long-running execution · heterogeneous compute · private deployment · cost per successful task.
WRITE TO US →[03] — JOIN US
We're building the inference engine for autonomous work. Schedulers, kernels, compilers, distributed systems, model serving, GPU orchestration, runtime optimization. Small team. High ownership. Hard problems.
[04] — WAITLIST
We're onboarding a handful of design partners running real agentic workloads.
No newsletters. No spam. One email when there's something real to share.