
News · June 19, 2026

Hackathon Recap: A Day of Building

Real agentic app traffic is bursty and uneven, and the infrastructure serving it is often over-provisioned. OpenInfer gives you the visibility and control to right-size inference infrastructure.

Ten teams gathered for a single-day hackathon to build, test, and demo, with every request traced through the Weave hub to the GPU cluster. Here is the report from the floor, with photos, demos, and full telemetry.

Teams: 10 · Team 01 – Team 10
Requests served: 711 · 648 succeeded (200 OK)
Tokens served: 864.9k · 528k prompt, 337k completion
Success rate: 91.1% · 6 server errors (0.8%)

Why this matters

Agentic traffic isn't chat traffic

Agentic AI traffic does not look like traditional chat traffic. It arrives in bursts, mixes long and short contexts, and creates idle capacity between spikes. Static provisioning forces teams to overbuild for peaks, even when average utilization is low.

OpenInfer's goal is to make this visible and controllable: schedule requests intelligently, understand real demand, and right-size infrastructure without sacrificing reliability.

Report

How the day went

The hackathon ran for a single day. Teams arrived in the morning, picked up badges, and got straight to scoping their projects. By midday, most groups had a working prototype taking shape.

The afternoon was for building and short check-ins with mentors. Teams ran quick demos in the main room before wrap-up, showing what they had put together in the time they had.

The photos below are from across the day — setup, work sessions, and the demos that closed it out.

Gallery

Photos from the floor

01 Teams settling in at the start of the day.
02 Working through the first round of ideas.
03 Whiteboarding the architecture.
04 CEO introducing OpenInfer.
05 OpenInfer engineers mentoring the participants.
06 Developing together.
07 Demonstrating their product.
08 Live demo to a small audience.
09 Side conversations between sessions.
10 Reviewing progress with a mentor.
11 Showing the product.
12 Notes and laptops, mid-afternoon.
13 A team comparing results.
14 Wrap-up before final presentations.
15 Closing the day.

Demos

What teams built

PureConstruct AI
BrandOps
Autopsy
On the Fly

01 · At a glance

The numbers

Teams / API keys: 10 · Team 01 – Team 10
Total requests: 711 · 648 succeeded (200 OK)
Tokens served: 864.9k · 528k prompt, 337k completion
Success rate: 91.1% · 6 server errors (0.8%), 7.9% client aborts
Avg TTFT (time to first token): 3.6s · p95 6.8s, p99 23.6s
Avg TTLT (time to last token): 23.3s · p95 62s, p99 239s
Peak concurrency: 12 · avg ≈ 1, highly bursty
Nodes run → needed: 6 → 2 · ≈ 3× over-provisioned

02 · Load over time

When the traffic landed

Requests per hour · stacked outcome, peak concurrency overlay (PDT)

03 · Team breakdown

Who used the cluster

Requests by team

Tokens by team · prompt + completion

Per-team detail

Team / key Requests Total tokens Prompt Completion
Team 01 133 179,882 87,576 92,306
Team 02 145 77,614 69,429 8,185
Team 03 1 356 166 190
Team 04 22 113,962 101,332 12,630
Team 05 304 121,918 24,884 97,034
Team 06 38 63,975 42,844 21,131
Team 07 31 42,600 19,230 23,370
Team 08 13 11,231 5,142 6,089
Team 09 9 124,963 83,718 41,245
Team 10 15 128,448 93,652 34,796
Total (10 keys) 711 864,949 527,973 336,976

Team 02 hit a usage-accounting gap: 144 of its non-streaming /v1/chat/completions calls were metered as 0 tokens, so live counters first showed only 343 tokens. Recovered from stored response bodies, its real usage was 77,614 tokens; this has since been backfilled into the live data, and the metering bug is fixed in the proxy (PR #158). The per-node token sums below were captured before the correction, so they undercount: they total 787,678, which is 77,271 short of the corrected 864,949 and matches the 77,614 − 343 backfilled for Team 02.
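A recovery like the one described above can be sketched in a few lines. This is a minimal illustration, not the actual backfill job or the PR #158 fix: it assumes each stored response body is a JSON string carrying an OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`, `total_tokens`), and the function name `recover_usage` and the sample bodies are hypothetical.

```python
import json

def recover_usage(stored_bodies):
    """Sum token usage from stored chat-completion response bodies.

    Assumption: each body is a JSON string with an OpenAI-style `usage`
    object. Bodies that fail to parse or lack `usage` are counted as
    unrecoverable instead of being silently treated as zero tokens.
    """
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    unrecoverable = 0
    for body in stored_bodies:
        try:
            usage = json.loads(body).get("usage")
        except (json.JSONDecodeError, AttributeError):
            usage = None
        if not isinstance(usage, dict):
            unrecoverable += 1
            continue
        for key in totals:
            totals[key] += int(usage.get(key) or 0)
    return totals, unrecoverable

# Made-up bodies for illustration only; these are not hackathon data.
sample = [
    '{"id": "a", "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}}',
    '{"id": "b", "usage": {"prompt_tokens": 7, "completion_tokens": 3, "total_tokens": 10}}',
    '{"id": "c"}',
    'not json',
]
totals, unrecoverable = recover_usage(sample)
print(totals, unrecoverable)
```

Counting unrecoverable bodies separately keeps a metering gap visible: a request that could not be recovered shows up as a number to investigate, not as zero usage.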

04 · Reliability & latency

How it felt to use

Latency percentiles · seconds, successful requests

Outcome by HTTP status

Every error, and what caused it · 7 of 711 requests · times PDT

Time Team Status Node Latency Cause
09:45 Team 08 500 d35407f1 1.1 s Bad image. Vision request whose image URL was truncated/invalid — the engine couldn't decode it.
10:14 Team 05 504 e2cb00b2 120 s Headers timeout. Engine never produced a first token within the 120 s cap.
13:54 Team 09 400 615b4fd7 0.07 s Client bad request. Malformed/invalid payload, rejected at validation in ~70 ms.
14:17 Team 10 504 6e6434cc 62 s Chunk timeout. Stream began (first token at 2.1 s) then stalled mid-generation and was cut.
14:22 Team 10 504 e2cb00b2 62 s Chunk timeout. Stream began (first token at 2.2 s) then stalled mid-generation and was cut.
14:24 Team 09 500 ddfdb954 0.1 s Session collision. Multi-turn app reused a previous_response_id; engine refused — session already exists.
15:28 Team 05 504 6e6434cc 120 s Headers timeout. Engine never produced a first token within the 120 s cap.

Excludes 56 client-aborted requests (499), which were users cancelling rather than failures. No 503 (no node available) errors occurred during event hours.
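The headline rates follow directly from the status counts in this report. The sketch below reproduces them, treating 499 as a client abort and not as a failure; the bucket names and the `classify` helper are illustrative assumptions, not OpenInfer's own taxonomy.

```python
from collections import Counter

# Status counts from this report: 648 OK, 56 client aborts (499),
# one 400, two 500s, and four 504s.
status_counts = Counter({200: 648, 499: 56, 400: 1, 500: 2, 504: 4})

def classify(status):
    """Bucket an HTTP status code (bucket names are assumptions)."""
    if 200 <= status < 300:
        return "success"
    if status == 499:
        return "client_abort"   # user cancelled; not counted as a failure
    if 400 <= status < 500:
        return "client_error"
    return "server_error"

buckets = Counter()
for status, count in status_counts.items():
    buckets[classify(status)] += count

total = sum(buckets.values())
rates = {name: round(100 * n / total, 1) for name, n in buckets.items()}
print(total, dict(buckets), rates)
```

With these inputs the total is 711 requests, and the rates come out to 91.1% success, 7.9% client aborts, 0.8% server errors, and 0.1% client errors.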

05 · Node load

How work spread across silicon

Requests & tokens per node

Node detail

Node Reqs Tokens Peak
ddfdb954 127 136,507 3
765d8a20 124 98,322 2
d35407f1 123 105,222 2
6e6434cc 117 176,197 3
615b4fd7 115 146,321 2
e2cb00b2 105 125,109 2

Peak = max simultaneous in-flight requests on that node. Requests are evenly spread across nodes (105 to 127 each); the lopsided routing seen under synthetic traffic disappears for real teams.
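The node table can be cross-checked against the per-team totals with a little arithmetic. The sketch below copies the figures from the two tables (variable names are illustrative). It checks that the per-node request counts add up to all 711 requests, measures how far the pre-correction node token sums fall short of the corrected total, and compares the busiest node with the least busy one.

```python
# (requests, tokens) per node, copied from the node table.
nodes = {
    "ddfdb954": (127, 136_507),
    "765d8a20": (124, 98_322),
    "d35407f1": (123, 105_222),
    "6e6434cc": (117, 176_197),
    "615b4fd7": (115, 146_321),
    "e2cb00b2": (105, 125_109),
}
team_total_tokens = 864_949               # corrected per-team total
team02_corrected, team02_live = 77_614, 343

node_requests = sum(r for r, _ in nodes.values())
node_tokens = sum(t for _, t in nodes.values())
token_gap = team_total_tokens - node_tokens

# Spread of request counts: busiest node relative to the least busy one.
request_counts = [r for r, _ in nodes.values()]
spread = max(request_counts) / min(request_counts)

print(node_requests, node_tokens, token_gap, round(spread, 2))
# prints: 711 787678 77271 1.21
```

The 77,271-token gap equals Team 02's backfill (77,614 − 343), so the node sums are consistent with the team table once the metering correction is accounted for. The busiest node handled about 1.21× the requests of the least busy one.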

06 · Right-sizing

How many nodes we actually needed

Two nodes would have comfortably served the entire hackathon.

Real participant load peaked at 12 concurrent requests, during a single hour (1 PM PDT), and averaged roughly one in-flight request. The workload was extremely bursty, with brief spikes separated by long idle stretches. None of the six nodes ever handled more than 2 to 3 concurrent requests, and a single node was independently shown to sustain 16.

So the peak of 12 fits inside two nodes with headroom to spare, and even one node alone could absorb it, since 12 is below the demonstrated 16-concurrent ceiling. The six provisioned nodes were roughly 3× what the traffic the event actually produced required.
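The sizing argument reduces to a short calculation. The sketch below uses the figures from this section; the `nodes_needed` helper and the one-spare-node redundancy policy are assumptions made for illustration, not a description of how OpenInfer schedules capacity.

```python
import math

def nodes_needed(peak_concurrency, per_node_ceiling, redundancy=1):
    """Nodes required to cover the peak, plus spare nodes for failover.

    The default of one spare node is an assumed N+1 policy.
    """
    return math.ceil(peak_concurrency / per_node_ceiling) + redundancy

peak = 12         # peak concurrent requests during the event
ceiling = 16      # concurrency a single node was shown to sustain
provisioned = 6   # nodes actually run

minimum = nodes_needed(peak, ceiling, redundancy=0)
recommended = nodes_needed(peak, ceiling, redundancy=1)
over_provisioning = provisioned / recommended

print(minimum, recommended, over_provisioning)
# prints: 1 2 3.0
```

One node covers the peak with no failover, two nodes add redundancy, and six nodes come to three times the recommended count.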

1 node: survives (12 peak < 16 ceiling), no failover
2 nodes: recommended, peak ~6 each plus redundancy
6 nodes (as run): ≈ 3× over capacity
Fix tail latency, not node count.

Generated from live prod telemetry · OpenInfer Studio · all timestamps PDT (UTC−7) · excludes synthetic keys (Stress Test · Sam · Chongyi Test)
