Enterprise Playground

Train local LLMs on real enterprise UIs, then let product managers vibe-code production interfaces. Zero API costs. One GPU. Future-proof for AI agents.

Python TypeScript FastAPI Next.js Ollama ChromaDB UMAP Plotly QLoRA Playwright RTX 4090

The Vision

Step 1

Scrape → Train → Own

Capture real enterprise UIs — banking dashboards, workflow forms, transaction screens — with Playwright. Feed them into a local 14B model and fine-tune it with QLoRA. Now your team has an AI assistant that understands your design system, and it never leaves your network.

Step 2

10x the Output, Same Team

Instead of one mockup per review cycle, generate 20 UI variations in the time it takes to write one ticket. PMs describe what they need — “a transaction history table with filters and export” — and explore options instantly. Designers focus on the hard UX problems, not pixel-pushing.

Step 3

Collapse the Feedback Loop

The idea-to-visual gap goes from days to seconds. Stakeholders see working HTML prototypes instead of static mockups. Iterate in real-time during the meeting, not after. The team spends less time waiting and more time making decisions that matter.

Step 4

Free Humans for Evolved Work

When AI handles the repetitive UI scaffolding, your designers focus on UX strategy, accessibility, and complex interactions. Your developers build business logic, not boilerplate. The goal isn’t fewer people — it’s higher-leverage work from the same team.

Where This Is Heading

Today: PMs generate UI drafts in seconds and iterate with the team in real time
Next: Agents propose variations from the roadmap; humans pick the best direction
Soon: Agents handle scaffolding end-to-end; humans focus on strategy and polish
| Without | With Enterprise Playground |
| --- | --- |
| 1 mockup per review cycle | 20+ variations before the meeting ends |
| Idea → visual: days of handoff | Idea → working HTML: seconds |
| Designers pixel-push boilerplate | Designers focus on UX strategy & hard problems |
| Developers write UI scaffolding | Developers build business logic & integrations |
| Per-token cloud API fees | Unlimited local inference at $0/token |
| Feedback loops span multiple sprints | Iterate live, ship the same day |

Features

Dual-Model Inference

14B code generator + 3B router running simultaneously on one GPU (10.5 GB VRAM)

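One way to keep both models resident on a single GPU is Ollama's `keep_alive` option on `/api/generate`. A minimal sketch of the request bodies only (no network call; the prompts and endpoint usage are illustrative):

```python
def build_payload(model: str, prompt: str, num_ctx: int) -> dict:
    """Build an Ollama /api/generate request body that keeps the model
    resident in VRAM, so the 14B coder and 3B router serve side by side."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": True,
        "keep_alive": -1,                # never unload between requests
        "options": {"num_ctx": num_ctx},
    }

# Context sizes match the model table below: 8192 for the coder, 2048 for the router.
coder_req = build_payload("qwen2.5-coder:14b", "Generate a login form", num_ctx=8192)
router_req = build_payload("qwen2.5:3b", "Classify this request: code or text?", num_ctx=2048)
```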

SSE Streaming

Real-time HTML generation streamed token-by-token to the browser


RAG Pipeline

ChromaDB + nomic-embed-text embeddings (CPU-only, zero VRAM) enrich prompts with domain context

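The core of the enrichment step is a similarity search over stored chunks, with the winners prepended to the prompt. A self-contained sketch with a toy in-memory store and hand-made vectors standing in for ChromaDB and nomic-embed-text:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def enrich_prompt(prompt, query_vec, store, k=2):
    """Prepend the top-k most similar chunks to the prompt as domain context.
    `store` maps chunk text -> embedding vector; in the real pipeline,
    ChromaDB + nomic-embed-text produce these."""
    ranked = sorted(store, key=lambda c: cosine(store[c], query_vec), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nTask: {prompt}"

# Toy 2-D vectors: the first axis loosely means "tables", the second "auth".
store = {
    "Transactions table uses zebra striping": [1.0, 0.0],
    "Login form uses two-step verification": [0.0, 1.0],
    "Export button sits in the table toolbar": [0.9, 0.1],
}
enriched = enrich_prompt("a transaction history table", [1.0, 0.0], store)
```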

Semantic Caching

SequenceMatcher-based dedup — near-duplicate prompts return the cached result instantly, spending zero generation tokens on a hit

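A sketch of the dedup idea using the stdlib's `difflib.SequenceMatcher`; the threshold value and entry names are illustrative, not the project's actual settings:

```python
from difflib import SequenceMatcher

class SemanticCache:
    """Near-duplicate prompt cache: sufficiently similar prompts reuse the stored result."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = {}  # prompt -> generated HTML

    def get(self, prompt):
        for cached, html in self.entries.items():
            ratio = SequenceMatcher(None, prompt.lower(), cached.lower()).ratio()
            if ratio >= self.threshold:
                return html  # cache hit: no tokens spent on generation
        return None

    def put(self, prompt, html):
        self.entries[prompt] = html

cache = SemanticCache()
cache.put("a transaction table with filters", "<table>...</table>")
hit = cache.get("A transaction table with filters")   # case-insensitive near-match
miss = cache.get("a pricing page hero section")
```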

Smart Routing

Keyword + LLM classifier routes requests to the optimal model (3B for text, 14B for code)

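The two-stage routing can be sketched as a cheap keyword pass with an LLM fallback for ambiguous prompts. The keyword set below is a hypothetical example, and `llm_classify` stands in for the 3B classifier call:

```python
CODE_KEYWORDS = {"html", "css", "component", "form", "table", "dashboard", "ui", "page"}

def route(prompt, llm_classify=None):
    """Return which model should serve the request: keyword pass first (free),
    then an optional LLM classifier for prompts the keywords can't decide."""
    words = set(prompt.lower().split())
    if words & CODE_KEYWORDS:
        return "qwen2.5-coder:14b"    # code generation -> big model
    if llm_classify is not None:
        return llm_classify(prompt)   # ambiguous -> ask the 3B router model
    return "qwen2.5:3b"               # plain text/chat -> small model
```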

Embedding Visualizer

Interactive 2D/3D UMAP projections of RAG embeddings via Plotly.js with click-to-inspect


ChromaDB Inspector

Chunk browser, type distributions, per-workflow analytics, similarity search


QLoRA Fine-Tuning

LoRA r=32 training pipeline with dataset preparation, training, and Ollama deployment

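For a sense of scale, adapter settings for an r=32 run might look like the fragment below. Only `r=32` comes from this pipeline; the alpha, dropout, and target modules are common conventions for Qwen-style attention layers, not confirmed project values:

```python
# Hypothetical QLoRA adapter settings (only r=32 is from the pipeline above).
lora_settings = {
    "r": 32,                 # adapter rank
    "lora_alpha": 64,        # scaling factor, conventionally 2x r
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}
# In a real run these kwargs would feed peft.LoraConfig(**lora_settings).
```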

Web Scraper

Playwright-based capture of banking workflow UIs with full-page screenshots


Architecture

User Prompt
     |
     v
+--------------+
| Smart Router | <-- qwen2.5:3b (keyword + LLM classification)
|   (3B)       |
+------+-------+
       |
  +----+----+
  |         |
Code      Text
  |         |
  v         v
+------+  +------+
|Cache |  | 3B   |
|Check |  |Direct|
+--+---+  +------+
   |
HIT| MISS
   |    |
   |    v
   | +----------+
   | |RAG Query | <-- ChromaDB + nomic-embed-text (CPU)
   | +----+-----+
   |      v
   | +----------+
   | |3B Compress| <-- Saves 30-50% input tokens
   | +----+-----+
   |      v
   | +----------+
   | |14B Gen   | <-- qwen2.5-coder:14b (SSE streaming)
   | +----+-----+
   |      v
   | +----------+
   | |Cache Store|
   +>|+ Save HTML|
     +----------+
| Model | Role | VRAM | Context |
| --- | --- | --- | --- |
| qwen2.5-coder:14b | HTML/CSS/JS generation | ~8.5 GB | 8192 tokens |
| qwen2.5:3b | Routing, chat, compression | ~2.0 GB | 2048 tokens |
| nomic-embed-text | RAG embeddings | 0 GB (CPU) | n/a |
| Total | | ~10.5 GB | Leaves 5.5 GB for KV cache |

8-Tab Dashboard

1. Generate: SSE streaming, prompt input, style selector, RAG context panel
2. Gallery: live iframe previews, search/filter/sort, CACHE and RAG badges
3. Pipeline: 7-phase ML pipeline (Scrape > Map > Store > Route > Generate > Cache > Train)
4. Data & RAG: RAG ingest/clear/query tester, workflow browser, dataset prep
5. ML Metrics: model comparison (14B vs 3B), VRAM gauge, cache rate, activity log
6. Observatory: RAG chunking, training lifecycle, adapter registry, pipeline diagram
7. Agent: trace timeline, model distribution, router methods, token economy
8. Embeddings: UMAP 2D/3D scatter (Plotly.js), ChromaDB inspector, storage map

Results

Router accuracy: 95.8%
Training loss reduction: 85%
VRAM budget: 10.5 GB (65.6%)
Cache hit rate: ~35%
Generation speed: 2-4 s
API endpoints: 30+

Tech Stack

Backend

Python 3.11+ / FastAPI / Ollama / ChromaDB / UMAP / Plotly.js / Playwright / PyTorch + PEFT / SQLite

Frontend

Next.js 15 / TypeScript strict / Tailwind CSS + shadcn/ui / Zustand / TanStack Query / Zod

Testing

Vitest for unit tests / Playwright for E2E / Storybook for component development