Introducing OpenInfer API: The Zero-Rewrite Inference Engine That Integrates Effortlessly Into Your Stack
Partnerships Team
At OpenInfer, our primary goal is to make integration effortless. We’ve designed our inference engine to be a drop-in replacement—switching your endpoints is as simple as updating a URL. And here's the best part: your existing agents and frameworks, like LangChain, won't even know the difference. We want to be your performance partner—delivering seamless integration today and working relentlessly behind the scenes to enhance your system's performance tomorrow. Once you experience OpenInfer, you’ll wonder how you ever worked without it.
Seamless Integration: Update Your Endpoints in a Snap
Switching over to OpenInfer is designed to be hassle-free. If you're currently using another local inference solution, you can update your code with just one simple change. For example:
For Existing Local Endpoints (e.g., Ollama):
Change this: http://localhost:11434
To this: http://localhost:1337/ollama
In your LangChain code for Ollama:
import { ChatOllama } from "@langchain/ollama";

const model = new ChatOllama({
  baseUrl: "http://localhost:1337/ollama", // Point at the OpenInfer endpoint (add this if you were relying on Ollama's default).
  model: "@openinfer/llama-3.1-8b-q6-k:r1-distill", // Update to use the OpenInfer model.
});

// Messages are [role, content] tuples, so wrap the pair in an outer array.
const result = await model.invoke([["human", "Hello, how are you?"]]);
With this minimal change—simply updating the endpoint URL—OpenInfer slots right into your existing workflow, ensuring your current agents and frameworks continue to operate seamlessly.
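If you stream responses, the same swap applies with no further changes. Here's a minimal sketch that reuses the model object from the snippet above via LangChain's standard stream() interface; the prompt is just an illustrative placeholder.

// Stream tokens from the same ChatOllama instance configured above.
const stream = await model.stream("Write one sentence about local inference.");
for await (const chunk of stream) {
  process.stdout.write(String(chunk.content)); // Print each token as it arrives.
}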
What You’ll See in Our Demo Video
Our demo video walks you through key operations available in this initial phase via our local HTTP API:
- Pulling a Model: Watch as we effortlessly pull a model from our repository with a single API call. The model downloads and sets up locally, ready for immediate use.
- Listing Installed Models: A quick API request shows you all the models currently installed, giving you full visibility of your local resources.
- Chatting with the Model: Experience a live, interactive session as we demonstrate real-time interaction with a model, showing how easy it is to integrate conversational AI capabilities into your application.
Each step is designed to highlight the simplicity of swapping in OpenInfer, so you can experiment and integrate new functionality without any hassle. If you'd rather follow along in code, a short sketch of all three calls appears right after this list.
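Here's a minimal sketch of those three operations as plain HTTP calls, for reference while you watch. It assumes the local /ollama path mirrors Ollama's REST routes (/api/pull, /api/tags, and /api/chat) and reuses the base URL and model name from the LangChain example above; treat it as a starting point under those assumptions rather than the definitive API surface.

// Sketch of the three demo operations against the local HTTP API (assumptions noted above).
const base = "http://localhost:1337/ollama";
const modelName = "@openinfer/llama-3.1-8b-q6-k:r1-distill";

// 1. Pull a model so it is downloaded and set up locally.
await fetch(`${base}/api/pull`, {
  method: "POST",
  body: JSON.stringify({ model: modelName, stream: false }),
});

// 2. List the models currently installed.
const tags = await (await fetch(`${base}/api/tags`)).json();
console.log(tags.models);

// 3. Chat with the model (non-streaming, for simplicity).
const chatResponse = await fetch(`${base}/api/chat`, {
  method: "POST",
  body: JSON.stringify({
    model: modelName,
    messages: [{ role: "user", content: "Hello, how are you?" }],
    stream: false,
  }),
});
const chat = await chatResponse.json();
console.log(chat.message.content);

Any HTTP client works here; fetch is used only because the LangChain example above is already JavaScript/TypeScript.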
A Glimpse into the Future
We're committed to being your trusted performance partner. In upcoming phases, expect further optimizations that bring even faster responses and a smoother experience as you scale your AI capabilities.
Ready to Get Started?
OpenInfer is now available! Sign up today to gain access and experience these performance gains for yourself. Together, let’s redefine what’s possible with AI inference.