News · June 19, 2026

Hackathon Recap: A Day of Building

Real agentic app traffic is bursty and uneven, and the infrastructure serving it is often over-provisioned. OpenInfer gives you the visibility and control to right-size inference infrastructure.

Ten teams gathered for a single-day hackathon to build, test, and demo — every request traced through the Weave hub to the GPU cluster. Here is the report from the floor, with photos, demos, and full telemetry.

10: Teams; Team 01 – Team 10
711: Requests served; 648 succeeded · 200 OK
864.9k: Tokens served; 528k prompt · 337k completion
91.1%: Success rate; 6 server errors · 0.8%

Why this matters

Agentic traffic isn't chat traffic

Agentic AI traffic does not look like traditional chat traffic. It arrives in bursts, mixes long and short contexts, and creates idle capacity between spikes. Static provisioning forces teams to overbuild for peaks, even when average utilization is low.

OpenInfer's goal is to make this visible and controllable: schedule requests intelligently, understand real demand, and right-size infrastructure without sacrificing reliability.

Report

How the day went

The hackathon ran for a single day. Teams arrived in the morning, picked up badges, and got straight to scoping their projects. By midday, most groups had a working prototype taking shape.

The afternoon was for building and short check-ins with mentors. Teams ran quick demos in the main room before wrap-up, showing what they had put together in the time they had.

The photos below are from across the day — setup, work sessions, and the demos that closed it out.

Gallery

Photos from the floor

15 photos

01 Teams settling in at the start of the day.

02 Working through the first round of ideas.

05 OpenInfer engineers mentoring the participants.

Demos

What teams built

PureConstruct AI demo

01 · At a glance

The numbers

10: Teams / API keys; Team 01 – Team 10
711: Total requests; 648 succeeded · 200 OK
864.9k: Tokens served; 528k prompt · 337k completion
91.1%: Success rate; 6 server errors (0.8%) · 7.9% client aborts
3.6s: Avg TTFT (time to first token); p95 6.8s · p99 23.6s
23.3s: Avg TTLT (time to last token); p95 62s · p99 239s
12: Peak concurrency; avg ≈ 1 · highly bursty
6 → 2: Nodes run → needed; ≈ 3× over-provisioned
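
As a cross-check, the success, server-error, and client-abort percentages above can be reproduced from the status counts reported in this post (648 responses with 200, 56 with 499, and the seven errors listed in section 04). The sketch below is illustrative only: the function names and bucket labels are hypothetical, and it assumes 499 is the only status treated as a client abort.

```python
from collections import Counter
from typing import Dict


def classify(status: int) -> str:
    """Bucket an HTTP status code into an outcome label (labels are illustrative)."""
    if 200 <= status < 300:
        return "success"
    if status == 499:  # assumption: 499 means the client cancelled the request
        return "client_abort"
    if 500 <= status < 600:
        return "server_error"
    if 400 <= status < 500:
        return "client_error"
    return "other"


def outcome_percentages(status_counts: Dict[int, int]) -> Dict[str, float]:
    """Return each outcome bucket as a percentage of all requests, to one decimal."""
    buckets: Counter = Counter()
    for status, count in status_counts.items():
        buckets[classify(status)] += count
    total = sum(buckets.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in buckets.items()}


# Status counts taken from the figures reported in this post.
REPORTED = {200: 648, 499: 56, 500: 2, 504: 4, 400: 1}
print(outcome_percentages(REPORTED))
```

With the reported counts this gives 91.1% success, 7.9% client aborts, and 0.8% server errors; the single 400 rounds to 0.1%.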

02 · Load over time

When the traffic landed

Requests per hour · stacked outcome, peak concurrency overlay (PDT)

03 · Team breakdown

Who used the cluster

Requests by team

Tokens by team · prompt + completion

Per-team detail

Team / key	Requests	Total tokens	Prompt	Completion
Team 01	133	179,882	87,576	92,306
Team 02	145	77,614	69,429	8,185
Team 03	1	356	166	190
Team 04	22	113,962	101,332	12,630
Team 05	304	121,918	24,884	97,034
Team 06	38	63,975	42,844	21,131
Team 07	31	42,600	19,230	23,370
Team 08	13	11,231	5,142	6,089
Team 09	9	124,963	83,718	41,245
Team 10	15	128,448	93,652	34,796
Total — 10 keys	711	864,949	527,973	336,976

Team 02 hit a usage-accounting gap: 144 of its non-streaming /v1/chat/completions calls were metered as 0 tokens, so live counters initially showed only 343 tokens. Usage recovered from the stored response bodies came to 77,614 tokens; that figure has since been backfilled into the live data, and the metering bug is fixed in the proxy (PR #158). The per-node token sums below were captured before the correction: they total 787,678, which undercounts the corrected 864,949 by 77,271 tokens (about 9%).
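
A minimal sketch of how usage could be recovered from stored response bodies. It assumes each stored body is a JSON string in the OpenAI-compatible chat-completions shape, with a `usage` object carrying `prompt_tokens` and `completion_tokens`. The function name `backfill_usage` and the result keys are hypothetical; this is not the actual fix in the proxy.

```python
import json
from typing import Dict, Iterable


def backfill_usage(stored_bodies: Iterable[str]) -> Dict[str, int]:
    """Sum token usage across stored response bodies (JSON strings).

    Bodies that are not valid JSON, or that carry no `usage` object,
    are counted under "missing" instead of being treated as zero tokens.
    """
    totals = {"prompt": 0, "completion": 0, "total": 0, "recovered": 0, "missing": 0}
    for raw in stored_bodies:
        try:
            body = json.loads(raw)
        except json.JSONDecodeError:
            totals["missing"] += 1
            continue
        usage = body.get("usage") if isinstance(body, dict) else None
        if not isinstance(usage, dict):
            totals["missing"] += 1
            continue
        totals["prompt"] += int(usage.get("prompt_tokens", 0))
        totals["completion"] += int(usage.get("completion_tokens", 0))
        totals["recovered"] += 1
    totals["total"] = totals["prompt"] + totals["completion"]
    return totals
```

Tracking unrecoverable bodies separately keeps a metering gap visible rather than letting it silently read as zero usage.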

04 · Reliability & latency

How it felt to use

Latency percentiles · seconds, successful requests

Outcome by HTTP status

Every error, and what caused it · 7 of 711 requests · times PDT

Time	Team	Status	Node	Latency	Cause
09:45	Team 08	500	d35407f1	1.1 s	Bad image. Vision request whose image URL was truncated/invalid — the engine couldn't decode it.
10:14	Team 05	504	e2cb00b2	120 s	Headers timeout. Engine never produced a first token within the 120 s cap.
13:54	Team 09	400	615b4fd7	0.07 s	Client bad request. Malformed/invalid payload, rejected at validation in ~70 ms.
14:17	Team 10	504	6e6434cc	62 s	Chunk timeout. Stream began (first token at 2.1 s) then stalled mid-generation and was cut.
14:22	Team 10	504	e2cb00b2	62 s	Chunk timeout. Stream began (first token at 2.2 s) then stalled mid-generation and was cut.
14:24	Team 09	500	ddfdb954	0.1 s	Session collision. Multi-turn app reused a previous_response_id; engine refused — session already exists.
15:28	Team 05	504	6e6434cc	120 s	Headers timeout. Engine never produced a first token within the 120 s cap.

Excludes 56 client-aborted requests (499) — users cancelling, not failures. No 503 no-node-available errors occurred inside event hours.
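
The two 504 causes in the table, a headers timeout and a chunk timeout, correspond to two separate deadlines on a streamed response: one for the first chunk and one for the gap between later chunks. The sketch below shows one way to enforce both with `asyncio.wait_for`. It is illustrative only and not the proxy's implementation: `guarded_stream` and the exception names are hypothetical, and while the 120 s headers cap comes from the table, the chunk cap is not stated in this report, so both values are left as parameters.

```python
import asyncio
from typing import AsyncIterator


class HeadersTimeout(Exception):
    """No first chunk arrived within the first-token deadline."""


class ChunkTimeout(Exception):
    """The stream started, then stalled between chunks."""


async def guarded_stream(
    source: AsyncIterator[str],
    headers_timeout_s: float,
    chunk_timeout_s: float,
) -> AsyncIterator[str]:
    """Re-yield chunks from `source`, enforcing two separate deadlines."""
    iterator = source.__aiter__()
    seen_first_chunk = False
    while True:
        # The first chunk gets the headers deadline; later chunks get the chunk deadline.
        limit = chunk_timeout_s if seen_first_chunk else headers_timeout_s
        try:
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=limit)
        except StopAsyncIteration:
            return  # upstream finished normally
        except asyncio.TimeoutError:
            if seen_first_chunk:
                raise ChunkTimeout(f"no chunk for {limit} s") from None
            raise HeadersTimeout(f"no first chunk within {limit} s") from None
        seen_first_chunk = True
        yield chunk
```

A caller would wrap the engine's stream, for example `guarded_stream(engine_stream, headers_timeout_s=120, chunk_timeout_s=...)`, and map either exception to a 504.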

05 · Node load

How work spread across silicon

Requests & tokens per node

Node detail

Node	Reqs	Tokens	Peak
ddfdb954	127	136,507	3
765d8a20	124	98,322	2
d35407f1	123	105,222	2
6e6434cc	117	176,197	3
615b4fd7	115	146,321	2
e2cb00b2	105	125,109	2

Peak = max simultaneous in-flight requests on that node. Requests are spread evenly (105 to 127 per node); the lopsided routing seen under synthetic traffic disappears for real teams.
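
The Peak column is the maximum number of simultaneous in-flight requests. A minimal sketch of how that can be computed from request start and end times, using a sweep over sorted events. The function name and the (start, end) input format are assumptions for illustration, not the telemetry pipeline's actual schema.

```python
from typing import Iterable, Tuple


def peak_concurrency(intervals: Iterable[Tuple[float, float]]) -> int:
    """Return the maximum number of requests in flight at the same moment.

    Each interval is (start_time, end_time) for one request. A request that
    ends at time t is treated as finished before another that starts at t.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # request enters flight
        events.append((end, -1))    # request leaves flight
    # Tuples sort by time, then by delta, so -1 (end) precedes +1 (start) on ties.
    events.sort()
    in_flight = 0
    peak = 0
    for _, delta in events:
        in_flight += delta
        peak = max(peak, in_flight)
    return peak
```

For example, `peak_concurrency([(0, 10), (2, 5), (4, 6), (10, 12)])` returns 3, because three requests overlap between t=4 and t=5.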

06 · Right-sizing

How many nodes we actually needed

Two nodes would have comfortably served the entire hackathon.

Real participant load peaked at 12 concurrent requests, during a single hour (1 PM PDT), and averaged roughly one in-flight request. The workload was extremely bursty, with brief spikes separated by long idle stretches. No single node ever handled more than 2–3 requests at once, and a single node was independently shown to sustain 16.

So the peak of 12 fits inside two nodes with headroom to spare, and would even stay under the demonstrated 16-concurrent ceiling of a single node. The six provisioned nodes were roughly 3× over-provisioned for the traffic the event actually produced.

1 node survives (12 peak < 16 ceiling), but with no failover
2 nodes recommended: peak ~6 each, plus redundancy
6 nodes run: ≈ 3× over-provisioned
Fix tail latency, not node count
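
The right-sizing arithmetic can be sketched as follows. The sizing rule used here, the minimum node count that covers the peak plus one spare node for failover, is an assumption chosen because it reproduces the two-node recommendation; the function names are hypothetical. The inputs (peak of 12, per-node ceiling of 16, six nodes run) come from this report.

```python
import math


def nodes_needed(peak_concurrency: int, per_node_ceiling: int, spare_nodes: int = 1) -> int:
    """Nodes required to cover the peak, plus spare nodes for failover (assumed rule)."""
    if per_node_ceiling <= 0:
        raise ValueError("per_node_ceiling must be positive")
    return math.ceil(peak_concurrency / per_node_ceiling) + spare_nodes


def overprovision_factor(nodes_run: int, nodes_required: int) -> float:
    """How many times more nodes were run than required."""
    return nodes_run / nodes_required


PEAK = 12       # peak concurrent requests across the cluster
CEILING = 16    # concurrency one node was shown to sustain
NODES_RUN = 6

minimum = nodes_needed(PEAK, CEILING, spare_nodes=0)   # bare minimum, no failover
recommended = nodes_needed(PEAK, CEILING)              # minimum plus one spare
print(minimum, recommended, overprovision_factor(NODES_RUN, recommended), PEAK / recommended)
```

This prints `1 2 3.0 6.0`: one node covers the peak, two are recommended, six nodes is 3× that, and the peak works out to about 6 requests per node across two nodes.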

Generated from live prod telemetry · OpenInfer Studio · all timestamps PDT (UTC−7) · excludes synthetic keys (Stress Test · Sam · Chongyi Test)

