Architecture – What You Actually Own

// The full pipeline

FIVE LAYERS, NO BLACK BOXES.

LAYER 01 · INGEST

Document ingest

PDFs, drawings, RFIs, email, meeting minutes → OCR → chunked text + page-region metadata

↓

LAYER 02 · MODELS

Local LLM layer

Llama 3.1 70B (general), Qwen 2.5 14B (fast), domain fine-tunes (e.g. rfi_classifier)

↓

LAYER 03 · MEMORY

Embeddings + vector store

BAAI/bge-m3 embeddings → ChromaDB or Qdrant → Postgres for structured project metadata

↓

LAYER 04 · APP

Application layer

FastAPI services for RFI triage, daily log synthesis, change-order risk scoring, deficiency consolidation

↓

LAYER 05 · UI

Web UI

Next.js + Tailwind. Hosted on your domain or behind your VPN. SSO via your existing identity provider.
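
To make the Layer 01 output ("chunked text + page-region metadata") concrete, here is a minimal sketch of what one chunk record could look like. The `Chunk` fields, the `chunk_page` helper, and the word-window sizes are assumptions for illustration, not the shipped ingest code.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Chunk:
    doc_id: str                               # hypothetical source-document identifier
    page: int                                 # 1-based page the text was read from
    bbox: tuple[float, float, float, float]   # (x0, y0, x1, y1) region on that page
    text: str


def chunk_page(doc_id: str, page: int, bbox: tuple[float, float, float, float],
               text: str, max_words: int = 200, overlap: int = 20) -> Iterator[Chunk]:
    """Split OCR text from one page region into overlapping word windows."""
    step = max_words - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    for start in range(0, len(words), step):
        yield Chunk(doc_id, page, bbox, " ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text


# Usage: 450 words from one region of page 1 of a hypothetical drawing.
sample_text = " ".join(str(i) for i in range(450))
chunks = list(chunk_page("drawing-A101", 1, (0.0, 0.0, 612.0, 792.0), sample_text))
```

Keeping the page number and bounding box on every chunk is what lets a later answer point back to the exact region of the source document.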

// Layer breakdown

EVERY COMPONENT, NAMED.

MODELS

Open-weight LLMs

· Llama 3.1 70B Instruct
· Qwen 2.5 14B Instruct (fast lane)
· Mistral 7B (legacy support)
· LoRA fine-tunes per workflow

Weights live on your hardware. Inference via vLLM or llama.cpp.

VECTOR STORE

Embeddings + retrieval

· ChromaDB (default) or Qdrant
· BAAI/bge-m3 embeddings
· Hybrid search (BM25 + dense)
· Per-project namespace isolation

All vectors stay on your disk. No external embedding APIs.
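
As an illustration of the hybrid search bullet, the sketch below merges a BM25 ranking and a dense ranking with reciprocal rank fusion (RRF). RRF is one common fusion method and is an assumption here; the page does not say which method the stack uses. The document IDs and the constant `k = 60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists; each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the two retrievers.
bm25_hits = ["rfi-12", "rfi-07", "rfi-33"]
dense_hits = ["rfi-07", "rfi-40", "rfi-12"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Documents that appear near the top of both lists rise to the top of the fused list, which is the practical benefit of combining keyword and embedding retrieval.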

API LAYER

FastAPI services

· FastAPI + Pydantic
· Celery + Redis for async jobs
· OAuth2 / SSO ready
· OpenAPI spec generated

Stateless services. Easy to scale or replace one module without touching the others.

UI

Web interface

· Next.js 14 (app router)
· Tailwind CSS
· React Query for server state
· Mobile-first for site supers

Yours to brand. Yours to extend. Source ships in the handover.

INFRA

Runtime

· Docker Compose (single-host)
· Postgres 16 for metadata
· MinIO or local FS for blobs
· Caddy or nginx as reverse proxy

No Kubernetes unless you ask. Boring tech is the point.

OBSERVABILITY

What broke and why

· Structured logs (JSON)
· Prometheus metrics
· Grafana dashboards (optional)
· Per-request audit trail

Your IT team can answer "what happened" without calling us.
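
A minimal sketch of how structured JSON logs and a per-request audit trail can fit together, using only the Python standard library. The field names (`event`, `request_id`, `user`) and the logger name are assumptions for illustration, not the shipped log schema.

```python
import io
import json
import logging
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
            # Values passed through `extra=` become attributes on the record.
            "request_id": getattr(record, "request_id", None),
            "user": getattr(record, "user", None),
        }
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: log one audited action and read it back as structured data.
buffer = io.StringIO()
audit_log = build_logger("audit", buffer)
audit_log.info("rfi.triaged", extra={"request_id": str(uuid.uuid4()), "user": "pm@example.com"})
entry = json.loads(buffer.getvalue())
```

Because every line carries a request ID, answering "what happened" becomes a filter on one field rather than a search through free text.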

// Where it runs

THREE DEPLOYMENT OPTIONS.

OPTION 01

Workstation

A single MacBook Pro M3 Max or a workstation with a 24GB GPU. Good for one PE / PM running RFI triage and daily logs.

~$5k hardware · 1 user · No IT lift.

OPTION 02 · COMMON

On-prem server

A single tower or 2U rack server in your office with dual RTX-class GPUs or one A6000-class GPU. Runs the full stack for a 10 to 50 person GC.

~$15-25k hardware · Whole office · One IT day to install.

OPTION 03

Your private cloud

Your AWS / Azure / GCP tenant, your VPC, your IAM. We deploy via Terraform. Documents never leave your cloud account.

Hourly GPU billing · Multi-site · Your existing cloud governance applies.
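
When sizing any of these options, a rough first check is whether the model weights fit in GPU memory. The sketch below is a back-of-envelope estimate only: it counts weights at a nominal bits-per-weight figure and ignores the KV cache, activations, and quantization-format overhead. The 80% headroom factor is an assumption, not a vendor figure.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal GB: parameters x bits / 8 bytes."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights use no more than `headroom` of the available VRAM."""
    return weight_memory_gb(params_billions, bits_per_weight) <= vram_gb * headroom


# Weights only, at nominal precision:
llama_70b_fp16 = weight_memory_gb(70, 16)   # 140.0 GB
llama_70b_4bit = weight_memory_gb(70, 4)    # 35.0 GB
qwen_14b_4bit = weight_memory_gb(14, 4)     # 7.0 GB
```

By this estimate a 4-bit 70B model needs roughly 48 GB of total VRAM to leave working room, while a 4-bit 14B model fits comfortably on a single 24 GB card.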

// What you get on day 14

THE HANDOVER FOLDER.

A real, opinionated folder structure ships with every build. Everything is versioned, documented, and reproducible. Here's what lands in your repo:

your-marapone-build/
├── README.md
├── LICENSE
├── docker-compose.yml
├── .env.example
├── models/
│   ├── llama-3.1-70b-instruct.gguf
│   ├── qwen2.5-14b-instruct.gguf
│   └── lora/
│       ├── rfi_classifier/
│       └── change_order_risk/
├── data/
│   ├── ingested/
│   │   ├── drawings/
│   │   └── rfis/
│   ├── embeddings/
│   │   └── blueprint_pages.parquet
│   └── chroma/
├── src/
│   ├── api/
│   │   ├── main.py
│   │   ├── routers/
│   │   │   ├── rfi.py
│   │   │   ├── daily_log.py
│   │   │   └── change_order.py
│   │   └── services/
│   ├── ingest/
│   │   ├── ocr_pipeline.py
│   │   └── chunkers/
│   └── ui/                      # Next.js app
├── runbooks/
│   ├── 01_install.md
│   ├── 02_ingest_new_project.md
│   ├── 03_retrain_classifier.md
│   ├── 04_backup_restore.md
│   └── 05_incident_playbook.md
├── tests/
└── infra/
    ├── terraform/               # if cloud deploy
    └── scripts/

Every file is yours under your license. Want to change the model? Edit one config. Want to fork the UI? It's already a clean Next.js app. Want to swap ChromaDB for Qdrant? One docker-compose line.
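
A sketch of what "edit one config" can look like in practice: settings read from environment variables (for example from a file based on `.env.example`) into one typed object. The variable names `MODEL_PATH` and `VECTOR_BACKEND` and the defaults are hypothetical, chosen to match the folder tree above, and are not the actual keys in the handover repo.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ALLOWED_BACKENDS = {"chroma", "qdrant"}


@dataclass(frozen=True)
class Settings:
    model_path: str       # which weights file the inference server loads
    vector_backend: str   # which vector store the retrieval code talks to


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    backend = env.get("VECTOR_BACKEND", "chroma").lower()
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"VECTOR_BACKEND must be one of {sorted(ALLOWED_BACKENDS)}")
    return Settings(
        model_path=env.get("MODEL_PATH", "models/llama-3.1-70b-instruct.gguf"),
        vector_backend=backend,
    )


# Usage: defaults, then an override that points at the fast-lane model and Qdrant.
default_settings = load_settings({})
fast_settings = load_settings({
    "MODEL_PATH": "models/qwen2.5-14b-instruct.gguf",
    "VECTOR_BACKEND": "Qdrant",
})
```

Validating the backend name at startup means a typo in the config fails immediately instead of surfacing later as a retrieval error.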

// Maintenance & upgrades

YOU OWN IT. WE'RE STILL THERE IF YOU WANT US.

Self-serve by default

Every runbook is in plain Markdown. Backups, model updates, retraining the RFI classifier on new project data — all documented step by step. A competent IT generalist can run it.

Optional retainer

If you want us on call, we offer a flat monthly retainer for upgrades, model swaps as new open-weight releases drop, and incident support. No required subscription.

Open-weight upgrades

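When Llama 4 or the next Qwen ships and outperforms what you have, you can swap it. Your pipelines and training data are model-agnostic on purpose; LoRA adapters are tied to their base model, so they are retrained after a swap using the retraining runbook.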
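
One way to keep pipeline code independent of the model behind it is to depend on a small interface rather than on a specific backend. The sketch below is an assumption about how that could be structured: `TextGenerator`, `CannedModel`, and `classify_rfi` are hypothetical names, and the stand-in model returns a fixed string instead of calling vLLM or llama.cpp.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Anything that turns a prompt into text can sit behind the pipeline."""

    def generate(self, prompt: str) -> str: ...


class CannedModel:
    """Stand-in backend for illustration; a real one would call an inference server."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    def generate(self, prompt: str) -> str:
        return self.reply


def classify_rfi(model: TextGenerator, rfi_text: str, labels: list[str]) -> str:
    """Ask the model for one label; fall back if the reply is not a known label."""
    prompt = (
        "Classify this RFI into one of: " + ", ".join(labels) + "\n\n"
        + rfi_text + "\n\nAnswer with the label only."
    )
    answer = model.generate(prompt).strip().lower()
    return answer if answer in labels else "unclassified"


# Usage: the calling code stays the same whichever backend is plugged in.
labels = ["structural", "mep", "architectural"]
good = classify_rfi(CannedModel(" Structural \n"), "Beam B12 conflicts with duct run.", labels)
bad = classify_rfi(CannedModel("not sure"), "Beam B12 conflicts with duct run.", labels)
```

Swapping models then means providing another class with a `generate` method; the classification logic and its tests do not change.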

No phone-home

The system never calls back to us. No telemetry, no usage pings. If your network is air-gapped, the system runs anyway.

// Want to read the actual code?

WE'LL WALK YOU THROUGH A REAL HANDOVER REPO.

Book a 30-minute call and we'll screen-share an anonymized client repo: the API, the runbooks, the model configs. No NDA needed for the walkthrough.

Request the Walkthrough →
See Security Posture

UNDER THE HOOD: WHAT YOU ACTUALLY OWN.