MODELS
Open-weight LLMs
- · Llama 3.1 70B Instruct
- · Qwen 2.5 14B Instruct (fast lane)
- · Mistral 7B (legacy support)
- · LoRA fine-tunes per workflow
Weights live on your hardware. Inference via vLLM or llama.cpp.
// Architecture
Most "AI for construction" pages stop at the marketing layer. This one doesn't. Here is the exact stack we ship — the models, the vector store, the API, the UI, the infra. All of it open-source where it matters. All of it handed over on day 14 with the source code, weights, and runbooks.
// The full pipeline
LAYER 01 · INGEST
Document ingest
PDFs, drawings, RFIs, email, meeting minutes → OCR → chunked text + page-region metadata
LAYER 02 · MODELS
Local LLM layer
Llama 3.1 70B (general), Qwen 2.5 14B (fast), domain fine-tunes (e.g. rfi_classifier)
LAYER 03 · MEMORY
Embeddings + vector store
BAAI/bge-m3 embeddings → ChromaDB or Qdrant → Postgres for structured project metadata
LAYER 04 · APP
Application layer
FastAPI services for RFI triage, daily log synth, change-order risk scoring, deficiency consolidation
LAYER 05 · UI
Web UI
Next.js + Tailwind. Hosted on your domain or behind your VPN. SSO via your existing identity provider.
// Where it runs
OPTION 01
A single MacBook Pro M3 Max or a workstation with a 24GB GPU. Good for one PE / PM running RFI triage and daily logs.
~$5k hardware · 1 user · No IT lift.
OPTION 02 · COMMON
A single tower or 2U rack server in your office, dual-RTX or A6000-class GPU. Runs the full stack for a 10-50 person GC.
~$15-25k hardware · Whole office · One IT day to install.
OPTION 03
Your AWS / Azure / GCP tenant, your VPC, your IAM. We deploy via Terraform. Documents never leave your cloud account.
Hourly GPU billing · Multi-site · Your existing cloud governance applies.
// Maintenance & upgrades
Self-serve by default
Every runbook is in plain Markdown. Backups, model updates, retraining the RFI classifier on new project data — all documented step by step. A competent IT generalist can run it.
Optional retainer
If you want us on call, we offer a flat monthly retainer for upgrades, model swaps as new open-weight releases drop, and incident support. No required subscription.
Open-weight upgrades
When Llama 4 or the next Qwen ships and outperforms what you have, you can swap it. Your fine-tunes and pipelines are model-agnostic on purpose.
No phone-home
The system never calls back to us. No telemetry, no usage pings. If your network is air-gapped, the system runs anyway.
// Want to read the actual code?
Book a 30-minute call and we'll screen-share an anonymized client repo: the API, the runbooks, the model configs. No NDA needed for the walkthrough.