knowmatic hobby lab

Infraless AI at scale

Transformer models you embed in your app to make smarter routing decisions for your agent. Classify prompts, estimate reasoning effort, and detect code languages, entirely client-side.

This is not a routing service. These are tiny models you drop into your own stack so your agent can decide which LLM to call, how much thinking budget to set, and what context to inject, all before the API request.

  • 4 tiny AI models that run inside your app
  • Runs in the browser, no servers needed
  • Picks the cheapest model tier that can handle each prompt
  • Detects 30 programming languages

Live Demo

Try the models

These models are running live in your browser via ONNX Runtime Web. Nothing is sent to a server.

Prompt Router + Reasoning Effort

Enter a prompt to classify its difficulty and predict reasoning effort. The router recommends which model tier to use.

Code Classifier

Paste a code snippet. The model detects the programming language from 30 supported languages.

Autocomplete (SFT)

Start typing and the SFT model generates suggestions in real time. Press Tab to accept one word at a time.


Type in the box above to see autocomplete suggestions. The next-token confidence threshold is set to 25%. A higher threshold produces more predictable, production-ready suggestions by surfacing only tokens the model is confident about.
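The threshold acts as a simple filter over the model's next-token probabilities. A minimal sketch of the idea (the `Candidate` shape and the probability values are illustrative, not the app's actual internals):

```typescript
// Hypothetical shape of one autocomplete candidate: the token text
// plus the model's probability for it.
interface Candidate {
  token: string;
  confidence: number; // 0..1
}

// Keep only candidates at or above the confidence threshold.
// With a 25% threshold, low-confidence guesses are dropped and the
// suggestion stops at the first token the model is unsure about.
function filterSuggestions(candidates: Candidate[], threshold = 0.25): string[] {
  return candidates
    .filter((c) => c.confidence >= threshold)
    .map((c) => c.token);
}

const demo: Candidate[] = [
  { token: "function", confidence: 0.62 },
  { token: "that", confidence: 0.41 },
  { token: "frobnicates", confidence: 0.08 }, // below threshold: dropped
];
console.log(filterSuggestions(demo)); // ["function", "that"]
```

Raising the threshold trades suggestion length for reliability: fewer tokens survive, but the ones that do are the ones the model is sure about.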


Models

Four models. Zero servers.

Prompt Router

Difficulty Classifier

Classifies prompts as Easy, Medium, or Hard and maps them to a model tier: Haiku ($), Sonnet ($$), or Opus ($$$). Stop sending simple questions to the most expensive model.

3 classes·16K vocab·512 max tokens
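Downstream, the classifier's label maps directly to a model choice. A minimal routing sketch, assuming you wire the classifier output into your own stack (the model identifiers below are placeholders, not real API model ids):

```typescript
type Difficulty = "easy" | "medium" | "hard";

// Map the classifier's three labels to a model tier.
// Substitute your provider's real model ids for these placeholders.
const TIER: Record<Difficulty, { model: string; cost: string }> = {
  easy: { model: "haiku-tier", cost: "$" },
  medium: { model: "sonnet-tier", cost: "$$" },
  hard: { model: "opus-tier", cost: "$$$" },
};

function routeTier(difficulty: Difficulty): { model: string; cost: string } {
  return TIER[difficulty];
}

console.log(routeTier("easy")); // the cheapest tier: { model: "haiku-tier", cost: "$" }
```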

Reasoning Effort

Thinking Budget Predictor

Predicts whether a prompt needs Low, Medium, or High reasoning effort. Set the thinking budget before the LLM starts generating.

3 classes·16K vocab·512 max tokens
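The predicted effort level can be translated into a concrete thinking budget before the request goes out. A sketch under assumed budgets (the token numbers are illustrative, not recommendations; tune them for your provider and workload):

```typescript
type Effort = "low" | "medium" | "high";

// Illustrative thinking-token budgets per effort level.
const THINKING_BUDGET: Record<Effort, number> = {
  low: 0, // no extended thinking at all
  medium: 4096,
  high: 16384,
};

function thinkingBudget(effort: Effort): number {
  return THINKING_BUDGET[effort];
}

console.log(thinkingBudget("medium")); // 4096
```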

Code Classifier

30 Programming Languages

Detects code in prompts and classifies the language: Python, TypeScript, Rust, Go, Java, C++, Ruby, SQL, Haskell, FORTRAN, and 20 more. Enables language-specific context injection before the API call.

Python · TypeScript · Rust · Go · Java · C++ · Ruby · SQL · Haskell · C# · +20 more
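"Language-specific context injection" can be as simple as prepending a per-language snippet to the system prompt once the language is detected. A sketch with hypothetical context strings (the snippets and the lookup table are assumptions, not part of the models):

```typescript
// Hypothetical per-language context to prepend to the system prompt.
const LANGUAGE_CONTEXT: Record<string, string> = {
  python: "Follow PEP 8 and prefer type hints.",
  rust: "Prefer iterators over index loops; avoid unwrap in library code.",
  sql: "Target ANSI SQL unless a dialect is specified.",
};

// Inject context for a detected language; unknown languages pass through.
function withLanguageContext(systemPrompt: string, language: string): string {
  const extra = LANGUAGE_CONTEXT[language.toLowerCase()];
  return extra ? `${systemPrompt}\n\n[${language}] ${extra}` : systemPrompt;
}

console.log(withLanguageContext("You are a coding assistant.", "Python"));
```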

Autocomplete

SFT Tiny GPT

A small transformer fine-tuned on 788 real-world code examples from production applications. The base model learned core language patterns; this supervised fine-tuned version generates inline prompt suggestions as you type. Real-time ghost text, accept with Tab.

Write a function that calculates the sum of... [Tab]
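"Accept one word at a time" means each Tab press moves the first word of the remaining ghost text into the committed input. A minimal sketch of that interaction (the state shape is an assumption, not the demo's actual implementation):

```typescript
// Each Tab press commits the first word of the ghost text to the input
// and leaves the rest as ghost text.
function acceptWord(input: string, ghost: string): { input: string; ghost: string } {
  const trimmed = ghost.trimStart();
  const space = trimmed.indexOf(" ");
  if (space === -1) return { input: input + trimmed, ghost: "" }; // last word
  return {
    input: input + trimmed.slice(0, space + 1),
    ghost: trimmed.slice(space + 1),
  };
}

let state = { input: "Write a function that ", ghost: "calculates the sum of" };
state = acceptWord(state.input, state.ghost);
console.log(state.input); // "Write a function that calculates "
console.log(state.ghost); // "the sum of"
```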

How it works

Browser-native inference

01

Load once

Quantized ONNX models load into the browser on first visit. They're cached, so subsequent loads are instant.

02

Classify locally

ONNX Runtime Web runs inference via WebAssembly in a dedicated Web Worker. The main thread stays responsive.

03

Route smart

Results determine which model tier to call, how much reasoning to budget, and what language context to inject. All before the first API call.
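The three steps above converge on a single routing decision. A sketch of how the classifier outputs might be combined into the parameters for the eventual LLM call (every field name here is illustrative, not a real SDK):

```typescript
// Combine the three local classifications into the parameters for the
// eventual LLM call. Field names and values are illustrative only.
interface RouteDecision {
  model: string;
  thinkingBudget: number;
  systemContext: string[];
}

function decideRoute(
  difficulty: "easy" | "medium" | "hard",
  effort: "low" | "medium" | "high",
  language: string | null,
): RouteDecision {
  const model = { easy: "haiku-tier", medium: "sonnet-tier", hard: "opus-tier" }[difficulty];
  const thinkingBudget = { low: 0, medium: 4096, high: 16384 }[effort];
  const systemContext = language ? [`The user is working in ${language}.`] : [];
  return { model, thinkingBudget, systemContext };
}

console.log(decideRoute("easy", "low", "Python"));
```

All of this runs locally, so the decision adds milliseconds, not a network round trip, to the request.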

  • ONNX Runtime Web (runtime)
  • WebAssembly (execution)
  • Web Workers (threading)
  • 100% local (privacy)

Economics

Stop overpaying for AI

Most prompts are simple. Routing them to the right model tier saves real money.

Prompt type | Without routing | With routing | Savings
Easy        | Opus ($$$)      | Haiku ($)    | ~80%
Medium      | Opus ($$$)      | Sonnet ($$)  | ~40%
Hard        | Opus ($$$)      | Opus ($$$)   | 0%

Pricing is based on Anthropic Claude models and is for illustrative purposes only.
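The per-tier savings in the table can be combined into a blended figure for a given traffic mix. A back-of-the-envelope sketch (the 70/20/10 mix is an assumption for illustration, not measured data):

```typescript
// Per-prompt savings by difficulty, from the table above.
const SAVINGS: Record<string, number> = { easy: 0.8, medium: 0.4, hard: 0.0 };

// Blended savings for a traffic mix (fractions must sum to 1).
function blendedSavings(mix: Record<string, number>): number {
  return Object.entries(mix).reduce(
    (total, [tier, share]) => total + share * (SAVINGS[tier] ?? 0),
    0,
  );
}

// Example: 70% easy, 20% medium, 10% hard:
// 0.7 * 0.8 + 0.2 * 0.4 + 0.1 * 0 = 0.64, roughly 64% off the bill.
console.log(blendedSavings({ easy: 0.7, medium: 0.2, hard: 0.1 }));
```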


About

A hobby lab

knowmatic hobby lab is a side project. I work on this in my free time because I think the future of AI routing belongs on the client, not the server. You can find more of my work on GitHub.

Every major AI platform sends your data to a server for decisions that a tiny, task-specific model could make locally in milliseconds. For prompt routing, language detection, and effort estimation, that round trip is unnecessary.

These models are small on purpose. They run in WebAssembly, load in seconds, and make every subsequent API call smarter, cheaper, and faster before it even happens.

knowmatic hobby lab partners with Thundraa, led by Mohammad, on applied AI projects. Together we built the agentic coding evals for the opencode.ai team, helping them measure and understand their agent's coding abilities.


Roadmap

What's next

Python library. pip install knowmatic, drop the models into any Python application. FastAPI middleware, Django integration, standalone CLI.

TypeScript library. npm install knowmatic, use the same models in Node.js, Deno, or any frontend framework. Works with ONNX Runtime Node for server-side or ONNX Runtime Web for the browser.

One line of code. The goal is to make prompt routing, reasoning effort estimation, and code classification available to any existing project with a single import.


Contact

Get in touch

Looking for investors, partners, and early customers who believe decision-making should live where the user works.

Models are provided for use within this application only and may not be redistributed. Terms of Use