VALAR LABS

[00] — THESIS

Most efficient, lowest-cost inference for agentic workloads.

AI is moving from chat to work. Agents don't make one model call — they research, code, test, extract, review, retry, and operate in the background. Valar is the execution layer for agentic inference: one system to run production workloads across GPUs, accelerators, clouds, and owned capacity. Every job runs on the cheapest compute that meets its latency, QoS, and data requirements.

Frontier-like performance. Fraction of the cost.
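
The placement rule above (every job on the cheapest compute that meets its latency, QoS, and data requirements) can be read as "filter by constraints, then minimize cost". A minimal sketch of that reading follows. It is not Valar's scheduler: the `Pool` and `Job` types, their fields, the pool names, and all prices are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Pool:
    """A hypothetical pool of compute; every field is an illustrative assumption."""
    name: str
    cost_per_hour: float    # USD per accelerator-hour (made-up figures)
    p95_latency_ms: float   # latency the pool is assumed to sustain
    qos_tier: int           # higher means stronger guarantees
    region: str             # where the data would be processed


@dataclass(frozen=True)
class Job:
    """A hypothetical job with latency, QoS, and data requirements."""
    max_latency_ms: float
    min_qos_tier: int
    allowed_regions: frozenset


def is_feasible(pool: Pool, job: Job) -> bool:
    """True when the pool meets all three of the job's requirements."""
    return (
        pool.p95_latency_ms <= job.max_latency_ms
        and pool.qos_tier >= job.min_qos_tier
        and pool.region in job.allowed_regions
    )


def place(job: Job, pools: Sequence[Pool]) -> Optional[Pool]:
    """Return the cheapest feasible pool, or None if no pool qualifies."""
    feasible = [p for p in pools if is_feasible(p, job)]
    return min(feasible, key=lambda p: p.cost_per_hour, default=None)


POOLS = [
    Pool("spot-a", 1.10, 900.0, 1, "us"),
    Pool("reserved-b", 2.40, 200.0, 3, "eu"),
    Pool("private-c", 3.00, 150.0, 3, "eu"),
]

# A background job with loose requirements can take the cheapest pool.
background_job = Job(2000.0, 1, frozenset({"us", "eu"}))

# A latency-sensitive job whose data must stay in the EU cannot.
resident_job = Job(250.0, 2, frozenset({"eu"}))

if __name__ == "__main__":
    print(place(background_job, POOLS).name)
    print(place(resident_job, POOLS).name)
```

In this sketch the constraints are hard filters and cost is the only objective. A production scheduler would presumably also weigh capacity, preemption risk, and queueing, none of which are modeled here.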

[01] — PRINCIPLES

I.

Workloads, not requests

Chat is one prompt, one response. Agents are long-running workloads — repeated model calls, tool use, retries, state, memory, deadlines. We build for inference as execution, not API calls.

II.

Any accelerator, one layer

NVIDIA, AMD, TPUs, Trainium, Inferentia — reserved, spot, private, sovereign, idle. Valar turns fragmented compute into one execution layer with scheduling, placement, QoS, and cost control.

III.

Cost per completed task

Tokens are not the outcome. A merged PR. A processed document. A resolved ticket. We optimize the cost, speed, and reliability of completed work.
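
One way to make "cost per completed task" concrete: if a task is retried until it succeeds and attempts are assumed independent, the expected cost of a completed task is the cost of one attempt divided by the success rate. The sketch below works through that arithmetic. The function name and all figures are hypothetical, and the independence assumption is a simplification.

```python
def cost_per_completed_task(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost of one completed task under retry-until-success.

    Assumes independent attempts, so the expected number of attempts
    is 1 / success_rate.
    """
    if cost_per_attempt < 0:
        raise ValueError("cost_per_attempt must be non-negative")
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Made-up figures: option A is cheaper per attempt but fails more often.
option_a = cost_per_completed_task(cost_per_attempt=0.40, success_rate=0.50)
option_b = cost_per_completed_task(cost_per_attempt=0.60, success_rate=0.90)

if __name__ == "__main__":
    print(f"A: ${option_a:.2f} per completed task")
    print(f"B: ${option_b:.2f} per completed task")
```

With these illustrative numbers, the option that is cheaper per attempt is more expensive per completed task, which is the gap between counting tokens and counting finished work.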

IV.

Private by design

The most valuable workloads can't leave the customer's environment. Managed and private deployments with full data residency — and the simplicity of a managed API.

[02] — WHY NOW

AI is becoming labor. One chat request generates a few hundred tokens. One agentic task can generate millions. As agents enter production, inference becomes a cost, reliability, and scheduling problem. The chat stack was built for responses. The agent stack must be built for work.

Long-running execution · heterogeneous compute · private deployment · cost per successful task.

[03] — JOIN US

We're building the inference engine for autonomous work. Schedulers, kernels, compilers, distributed systems, model serving, GPU orchestration, runtime optimization. Small team. High ownership. Hard problems.

[04] — WAITLIST

We're onboarding a handful of design partners running real agentic workloads.

No newsletters. No spam. One email when there's something real to share.