Enterprise Playground

Train local LLMs on real enterprise UIs, then let product managers vibe-code production interfaces. Zero API costs. One GPU. Future-proof for AI agents.

Python TypeScript FastAPI Next.js Ollama ChromaDB UMAP Plotly QLoRA Playwright RTX 4090

The Vision

Step 1

Scrape → Train → Own

Capture real enterprise UIs — banking dashboards, workflow forms, transaction screens — with Playwright. Feed them into a local 14B model and fine-tune it with QLoRA. Now your team has an AI assistant that understands your design system, and it never leaves your network.

Step 2

10x the Output, Same Team

Instead of one mockup per review cycle, generate 20 UI variations in the time it takes to write one ticket. PMs describe what they need — “a transaction history table with filters and export” — and explore options instantly. Designers focus on the hard UX problems, not pixel-pushing.

Step 3

Collapse the Feedback Loop

The idea-to-visual gap goes from days to seconds. Stakeholders see working HTML prototypes instead of static mockups. Iterate in real-time during the meeting, not after. The team spends less time waiting and more time making decisions that matter.

Step 4

Free Humans for Evolved Work

When AI handles the repetitive UI scaffolding, your designers focus on UX strategy, accessibility, and complex interactions. Your developers build business logic, not boilerplate. The goal isn’t fewer people — it’s higher-leverage work from the same team.

Where This Is Heading

Today: PMs generate UI drafts in seconds and iterate with the team in real time
Next: Agents propose variations from the roadmap; humans pick the best direction
Soon: Agents handle scaffolding end-to-end; humans focus on strategy and polish
| Without | With Enterprise Playground |
| --- | --- |
| 1 mockup per review cycle | 20+ variations before the meeting ends |
| Idea → visual: days of handoff | Idea → working HTML: seconds |
| Designers pixel-push boilerplate | Designers focus on UX strategy & hard problems |
| Developers write UI scaffolding | Developers build business logic & integrations |
| Per-token cloud API fees | Unlimited local inference at $0/token |
| Feedback loops span multiple sprints | Iterate live, ship the same day |

Features

Dual-Model Inference

14B code generator + 3B router running simultaneously on one GPU (10.5 GB VRAM)

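One way to keep both models resident on a single GPU is Ollama's `keep_alive` option on `/api/generate`. A minimal sketch of the request bodies only (no network call; the prompts and endpoint usage are illustrative):

```python
def build_payload(model: str, prompt: str, num_ctx: int) -> dict:
    """Build an Ollama /api/generate request body that keeps the model
    resident in VRAM, so the 14B coder and 3B router serve side by side."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": True,
        "keep_alive": -1,                # never unload between requests
        "options": {"num_ctx": num_ctx},
    }

# Context sizes match the model table below: 8192 for the coder, 2048 for the router.
coder_req = build_payload("qwen2.5-coder:14b", "Generate a login form", num_ctx=8192)
router_req = build_payload("qwen2.5:3b", "Classify this request: code or text?", num_ctx=2048)
```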

SSE Streaming

Real-time HTML generation streamed token-by-token to the browser


RAG Pipeline

ChromaDB + nomic-embed-text embeddings (CPU-only, zero VRAM) enrich prompts with domain context

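The core of the enrichment step is a similarity search over stored chunks, with the winners prepended to the prompt. A self-contained sketch with a toy in-memory store and hand-made vectors standing in for ChromaDB and nomic-embed-text:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def enrich_prompt(prompt, query_vec, store, k=2):
    """Prepend the top-k most similar chunks to the prompt as domain context.
    `store` maps chunk text -> embedding vector; in the real pipeline,
    ChromaDB + nomic-embed-text produce these."""
    ranked = sorted(store, key=lambda c: cosine(store[c], query_vec), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nTask: {prompt}"

# Toy 2-D vectors: the first axis loosely means "tables", the second "auth".
store = {
    "Transactions table uses zebra striping": [1.0, 0.0],
    "Login form uses two-step verification": [0.0, 1.0],
    "Export button sits in the table toolbar": [0.9, 0.1],
}
enriched = enrich_prompt("a transaction history table", [1.0, 0.0], store)
```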

Semantic Caching

SequenceMatcher-based dedup — near-duplicate prompts return the cached result instantly, spending zero generation tokens on a hit

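A sketch of the dedup idea using the stdlib's `difflib.SequenceMatcher`; the threshold value and entry names are illustrative, not the project's actual settings:

```python
from difflib import SequenceMatcher

class SemanticCache:
    """Near-duplicate prompt cache: sufficiently similar prompts reuse the stored result."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = {}  # prompt -> generated HTML

    def get(self, prompt):
        for cached, html in self.entries.items():
            ratio = SequenceMatcher(None, prompt.lower(), cached.lower()).ratio()
            if ratio >= self.threshold:
                return html  # cache hit: no tokens spent on generation
        return None

    def put(self, prompt, html):
        self.entries[prompt] = html

cache = SemanticCache()
cache.put("a transaction table with filters", "<table>...</table>")
hit = cache.get("A transaction table with filters")   # case-insensitive near-match
miss = cache.get("a pricing page hero section")
```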

Smart Routing

Keyword + LLM classifier routes requests to the optimal model (3B for text, 14B for code)

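The two-stage routing can be sketched as a cheap keyword pass with an LLM fallback for ambiguous prompts. The keyword set below is a hypothetical example, and `llm_classify` stands in for the 3B classifier call:

```python
CODE_KEYWORDS = {"html", "css", "component", "form", "table", "dashboard", "ui", "page"}

def route(prompt, llm_classify=None):
    """Return which model should serve the request: keyword pass first (free),
    then an optional LLM classifier for prompts the keywords can't decide."""
    words = set(prompt.lower().split())
    if words & CODE_KEYWORDS:
        return "qwen2.5-coder:14b"    # code generation -> big model
    if llm_classify is not None:
        return llm_classify(prompt)   # ambiguous -> ask the 3B router model
    return "qwen2.5:3b"               # plain text/chat -> small model
```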

Embedding Visualizer

Interactive 2D/3D UMAP projections of RAG embeddings via Plotly.js with click-to-inspect


ChromaDB Inspector

Chunk browser, type distributions, per-workflow analytics, similarity search


QLoRA Fine-Tuning

LoRA r=32 training pipeline with dataset preparation, training, and Ollama deployment

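For a sense of scale, adapter settings for an r=32 run might look like the fragment below. Only `r=32` comes from this pipeline; the alpha, dropout, and target modules are common conventions for Qwen-style attention layers, not confirmed project values:

```python
# Hypothetical QLoRA adapter settings (only r=32 is from the pipeline above).
lora_settings = {
    "r": 32,                 # adapter rank
    "lora_alpha": 64,        # scaling factor, conventionally 2x r
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}
# In a real run these kwargs would feed peft.LoraConfig(**lora_settings).
```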

Web Scraper

Playwright-based capture of banking workflow UIs with full-page screenshots


Architecture

User Prompt
     |
     v
+--------------+
| Smart Router | <-- qwen2.5:3b (keyword + LLM classification)
|   (3B)       |
+------+-------+
       |
  +----+----+
  |         |
Code      Text
  |         |
  v         v
+------+  +------+
|Cache |  | 3B   |
|Check |  |Direct|
+--+---+  +------+
   |
HIT| MISS
   |    |
   |    v
   | +----------+
   | |RAG Query | <-- ChromaDB + nomic-embed-text (CPU)
   | +----+-----+
   |      v
   | +----------+
   | |3B Compress| <-- Saves 30-50% input tokens
   | +----+-----+
   |      v
   | +----------+
   | |14B Gen   | <-- qwen2.5-coder:14b (SSE streaming)
   | +----+-----+
   |      v
   | +----------+
   | |Cache Store|
   +>|+ Save HTML|
     +----------+
| Model | Role | VRAM | Context |
| --- | --- | --- | --- |
| qwen2.5-coder:14b | HTML/CSS/JS generation | ~8.5 GB | 8192 tokens |
| qwen2.5:3b | Routing, chat, compression | ~2.0 GB | 2048 tokens |
| nomic-embed-text | RAG embeddings | 0 GB (CPU) | n/a |
| Total | | ~10.5 GB | Leaves 5.5 GB for KV cache |

8-Tab Dashboard

1. Generate: SSE streaming, prompt input, style selector, RAG context panel
2. Gallery: live iframe previews, search/filter/sort, CACHE and RAG badges
3. Pipeline: 7-phase ML pipeline (Scrape > Map > Store > Route > Generate > Cache > Train)
4. Data & RAG: RAG ingest/clear/query tester, workflow browser, dataset prep
5. ML Metrics: model comparison (14B vs 3B), VRAM gauge, cache rate, activity log
6. Observatory: RAG chunking, training lifecycle, adapter registry, pipeline diagram
7. Agent: trace timeline, model distribution, router methods, token economy
8. Embeddings: UMAP 2D/3D scatter (Plotly.js), ChromaDB inspector, storage map

Results

Router accuracy: 95.8%
Training loss reduction: 85%
VRAM budget: 10.5 GB (65.6%)
Cache hit rate: ~35%
Generation speed: 2-4 s
API endpoints: 30+

Tech Stack

Backend

Python 3.11+ / FastAPI / Ollama / ChromaDB / UMAP / Plotly.js / Playwright / PyTorch + PEFT / SQLite

Frontend

Next.js 15 / TypeScript strict / Tailwind CSS + shadcn/ui / Zustand / TanStack Query / Zod

Testing

Vitest for unit tests / Playwright for E2E / Storybook for component development