knowmatic hobby lab

Infraless AI at scale

Transformer models you embed in your app to make smarter routing decisions for your agentic experience. Classify prompts, estimate reasoning effort, and detect code languages with models small enough to run anywhere with negligible overhead.

This is not a routing service. These are tiny models you drop into your own stack so your agent can decide which LLM to call, how much thinking budget to set, and what context to inject, all before the API request.

  • 4 tiny AI models that run inside your app
  • Runs in the browser or locally in Node.js, no extra servers needed
  • Automatically picks the cheapest AI that can handle each prompt
  • Detects 30 programming languages

Live Demo

Try the models

These models are running live in your browser via ONNX Runtime Web. Nothing is sent to a server.

Prompt Router + Reasoning Effort

Enter a prompt to classify its difficulty and predict reasoning effort. The router recommends which model tier to use.

Code Classifier

Paste a code snippet. The model detects the programming language from 30 supported languages.

Autocomplete (SFT)

Start typing and the SFT model generates suggestions in real time. Press Tab to accept one word at a time.


Type in the box above to see autocomplete suggestions. The next-token confidence threshold is set to 25%. A higher threshold produces more predictable, production-ready suggestions by surfacing only tokens the model is confident about.
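The thresholding described above can be sketched as a simple filter over next-token candidates. This is an illustrative sketch, not the demo's actual code; the `Candidate` shape and function name are assumptions:

```typescript
// A next-token candidate: the token text and the model's probability for it.
interface Candidate {
  token: string;
  prob: number;
}

// Keep only candidates whose probability clears the threshold.
// 0.25 mirrors the demo's default; raise it for fewer, safer suggestions.
function confidentTokens(cands: Candidate[], threshold = 0.25): Candidate[] {
  return cands.filter((c) => c.prob >= threshold);
}
```

For example, given candidates `sum` at 41% and `list` at 12%, only `sum` survives a 25% threshold, so only it would be shown as ghost text.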


Models

Four models. Zero servers.

Prompt Router

Difficulty Classifier

Classifies prompts as Easy, Medium, or Hard and maps them to a model tier: Haiku ($), Sonnet ($$), or Opus ($$$). Stop sending simple questions to the most expensive model.

3 classes · 16K vocab · 512 max tokens
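Acting on the classifier's label can be a plain lookup in your own code. A minimal sketch, where the tier names mirror the tiers above and the function name is illustrative:

```typescript
// The three difficulty classes the Prompt Router emits.
type Difficulty = "Easy" | "Medium" | "Hard";

// Map each difficulty to a model tier. Tier names mirror the pricing
// tiers above; substitute your provider's actual model identifiers.
const TIER_FOR_DIFFICULTY: Record<Difficulty, string> = {
  Easy: "haiku", // cheapest tier handles simple prompts
  Medium: "sonnet",
  Hard: "opus", // reserve the expensive tier for hard prompts
};

function pickTier(difficulty: Difficulty): string {
  return TIER_FOR_DIFFICULTY[difficulty];
}
```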

Reasoning Effort

Thinking Budget Predictor

Predicts whether a prompt needs Low, Medium, or High reasoning effort. Set the thinking budget before the LLM starts generating.

3 classes · 16K vocab · 512 max tokens
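Translating the three effort labels into a concrete thinking budget is left to your app. A minimal sketch, assuming you set the budget in tokens; the numbers below are illustrative defaults, not values the model outputs:

```typescript
// The three effort classes the Reasoning Effort model emits.
type Effort = "Low" | "Medium" | "High";

// Illustrative token budgets — tune these for your provider and workload.
const BUDGET_FOR_EFFORT: Record<Effort, number> = {
  Low: 0, // skip extended thinking entirely
  Medium: 4096,
  High: 16384,
};

function thinkingBudget(effort: Effort): number {
  return BUDGET_FOR_EFFORT[effort];
}
```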

Code Classifier

30 Programming Languages

Detects code in prompts and classifies the language: Python, TypeScript, Rust, Go, Java, C++, Ruby, SQL, Haskell, FORTRAN, and 20 more. Enables language-specific context injection before the API call.

Python · TypeScript · Rust · Go · Java · C++ · Ruby · SQL · Haskell · C# · +20 more
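Context injection on top of the detected language can be a small lookup before the API call. A sketch with placeholder guidance strings; tailor both the map and the prompt format to your own agent:

```typescript
// Language-specific guidance to prepend before the API call.
// These strings are placeholders, not shipped with the models.
const CONTEXT_FOR_LANGUAGE: Record<string, string> = {
  python: "Follow PEP 8. Prefer type hints and f-strings.",
  typescript: "Use strict types. Avoid `any`.",
  rust: "Prefer iterators and Result-based error handling.",
};

// Append guidance when the classifier recognized the language;
// otherwise pass the prompt through unchanged.
function injectContext(basePrompt: string, language: string): string {
  const extra = CONTEXT_FOR_LANGUAGE[language.toLowerCase()];
  return extra ? `${basePrompt}\n\n[${language} guidance] ${extra}` : basePrompt;
}
```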

Autocomplete

SFT Tiny GPT

A small transformer fine-tuned on 788 real-world code examples from production applications. The base model learned core language patterns; this supervised fine-tuned version generates inline prompt suggestions as you type. Real-time ghost text; accept with Tab.

Write a function that calculates the sum of... [Tab]

How it works

Lightweight, on-device inference

01

Load once

Quantized ONNX models load into your browser or app on first use. They're small enough that load time is negligible.

02

Classify locally

ONNX Runtime runs inference locally, via WebAssembly in the browser or natively in Node.js. Either way, classification adds negligible latency.

03

Route smart

Results determine which model tier to call, how much reasoning to budget, and what language context to inject. All before the first API call.
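In your own app, step two amounts to post-processing the classifier's raw output into a label. A minimal sketch, assuming the router emits three logits ordered [Easy, Medium, Hard]; check the label order of your own export before relying on it:

```typescript
// Label order is an assumption — match it to the exported model's classes.
const LABELS = ["Easy", "Medium", "Hard"] as const;

// Numerically stable softmax over raw logits.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick the argmax class and report its probability as a confidence score.
function classify(logits: number[]): { label: string; confidence: number } {
  const probs = softmax(logits);
  const i = probs.indexOf(Math.max(...probs));
  return { label: LABELS[i], confidence: probs[i] };
}
```

The returned label then feeds the routing decision (model tier, thinking budget, context), all before the first API call.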

Runtime: ONNX Runtime
Execution: Browser or Node.js
Overhead: Negligible
Privacy: 100% local

Economics

Stop overpaying for AI

Most prompts are simple. Routing them to the right model tier saves real money.

Prompt type | Without routing | With routing | Savings
Easy        | Opus ($$$)      | Haiku ($)    | ~80%
Medium      | Opus ($$$)      | Sonnet ($$)  | ~40%
Hard        | Opus ($$$)      | Opus ($$$)   | 0%

Pricing is based on Anthropic Claude models and is for illustrative purposes only.
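The blended effect of routing can be sketched with simple arithmetic. The relative costs and the traffic mix below are illustrative assumptions, not provider pricing:

```typescript
// Illustrative relative cost units per call — not real provider prices.
const COST = { haiku: 1, sonnet: 3, opus: 15 };

// Hypothetical traffic mix: most prompts are easy.
const MIX = { Easy: 0.6, Medium: 0.3, Hard: 0.1 };

// Without routing, every prompt hits the top tier.
const withoutRouting = COST.opus;

// With routing, each class pays only for the tier it needs.
const withRouting =
  MIX.Easy * COST.haiku + MIX.Medium * COST.sonnet + MIX.Hard * COST.opus;

// savings ≈ 0.8 under this mix — roughly 80% blended savings.
const savings = 1 - withRouting / withoutRouting;
```

With this mix the blended cost drops from 15 units to 3 (0.6·1 + 0.3·3 + 0.1·15), an ~80% saving; the actual figure depends entirely on your own traffic.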


About

A hobby lab

knowmatic hobby lab is a side project. I work on this in my free time because I think the future of AI routing belongs on the client, not the server. You can find more of my work on GitHub.

Every major AI platform sends your data to a server for decisions that a tiny, task-specific model could make locally in milliseconds. For prompt routing, language detection, and effort estimation, that round trip is unnecessary.

These models are small on purpose. They run in the browser or locally in your app, load instantly, and add negligible time to your pipeline, making every API call smarter, cheaper, and faster before it even happens.

knowmatic hobby lab partners with Thundraa, led by Mohammad, on applied AI projects. We built the agentic coding evals for the opencode.ai team, helping them measure their agent's coding abilities. Now we're working together on a custom routing model: a tiny GPT trained on real use cases from the opencode CLI that automatically matches each prompt to the best model for the task. The training data reflects how developers actually use the opencode agent, so the router learns from genuine agentic workflows rather than synthetic benchmarks.


Roadmap

What's next

Python library. pip install knowmatic, drop the models into any Python application. FastAPI middleware, Django integration, standalone CLI.

TypeScript library. npm install knowmatic, use the same models in Node.js, Deno, or any frontend framework. Works with ONNX Runtime Node for server-side or ONNX Runtime Web for the browser.

One line of code. The goal is to make prompt routing, reasoning effort estimation, and code classification available to any existing project with a single import.


Contact

Get in touch

Looking for investors, partners, and early customers who believe decision-making should live where the user works.

Models are provided for use within this application only and may not be redistributed. Terms of Use