# Quickstart — Deploy FowyldAI
Get FowyldAI running and execute your first sovereign inference.
## Prerequisites
- Python 3.11+ (enforced at startup)
- PyTorch 2.4+ with CUDA support (optional — falls back to CPU)
- 16 GB RAM minimum (32 GB recommended)
- GPU recommended: 8 GB VRAM minimum (e.g., RTX 2000 Ada). 16 GB for full model suite.
- ~35 GB disk for model weights
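Quick sanity checks for the items above:

```bash
# Confirm interpreter version and GPU visibility before installing.
python --version   # should report 3.11 or newer
nvidia-smi         # shows driver version and free VRAM (GPU machines only)
```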
## Step 1: Clone and Install
git clone https://github.com/melhousen-solutions-dev/fowyldai.git
cd fowyldai
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
pip install -e .
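To confirm the editable install succeeded (assuming the package imports under the same name as the repo):

```bash
# pip show prints the package metadata; the import should exit without errors.
pip show fowyldai
python -c "import fowyldai"
```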
## Step 2: Configure
Copy the example environment file and edit as needed:
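Assuming the template follows the usual `.env.example` naming convention:

```bash
# Template name is an assumption -- adjust if the repo names it differently.
cp .env.example .env
```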
Key settings in .env:
FOWYLDAI_PORT=8400
FOWYLDAI_MODEL_ROOT=D:\models
FOWYLD_EDITION=crown # crown or ranger
FOWYLDAI_PRODUCT_PACK=melhousen
Configuration files live in the config/ directory:
| File | Purpose |
|---|---|
| config/dev.yaml | App config (host, port, logging, inference, safety) |
| config/prod.yaml | Production overrides |
| config/models.yaml | Model registry (HF repos, VRAM, quantization) |
| config/security.yaml | Rate limiting, CORS, alerting, encryption |
| config/warm_pool.yaml | Which models stay preloaded in GPU VRAM |
## Step 3: Download Models
Download the core model suite from HuggingFace (~35 GB total). The models are listed below, with an example download command after the table:
| Model | Role | VRAM |
|---|---|---|
| qwen25-1b | Classification (Scout) | 3 GB |
| phi3-mini | Light reasoning | 7.6 GB |
| mistral-7b (GPTQ) | Deep reasoning | 4.5 GB |
| openhermes-7b (GPTQ) | Heavy reasoning | 4.5 GB |
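As a sketch, each model can be pulled with the standard Hugging Face CLI; the placeholder repo IDs below must be replaced with the entries registered in config/models.yaml:

```bash
# <hf-repo-id> and <model-name> are placeholders -- the real values live in
# config/models.yaml under the model registry.
huggingface-cli download <hf-repo-id> --local-dir "$FOWYLDAI_MODEL_ROOT/<model-name>"
```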
## Step 4: Start the Crown Engine
Option A — Direct uvicorn:
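A minimal invocation, assuming the ASGI app is exported as fowyldai.main:app (check the package for the actual module path):

```bash
# Module path is an assumption; point uvicorn at the real app object.
uvicorn fowyldai.main:app --host 127.0.0.1 --port 8400
```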
Option B — Using the CLI entry point (after pip install -e .):
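The command shape below is a guess; the real console script is defined under [project.scripts] in pyproject.toml:

```bash
# Hypothetical entry point -- verify with `pip show -f fowyldai`.
fowyldai serve --port 8400
```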
Option C — Using Make:
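Target name is a guess; check the Makefile for the real targets:

```bash
# Hypothetical target.
make run
```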
Option D — Using the startup script (Windows):
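The script name is hypothetical; check the repository root for the actual Windows launcher:

```bash
# Hypothetical path -- substitute the real startup script.
scripts\start.bat
```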
This script verifies model integrity (SHA-256 checksums) before starting.
Option E — Docker:
docker build -t fowyldai:crown --build-arg FOWYLD_EDITION=crown .
docker compose -f docker-compose.prod.yml up -d
## Step 5: Verify It's Running
curl http://127.0.0.1:8400/ping
# {"status": "ok"}
curl http://127.0.0.1:8400/health
# Returns version + loaded models list
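For scripts or CI, poll /ping until the engine answers:

```bash
# -s silences progress output, -f makes curl fail on HTTP errors.
until curl -sf http://127.0.0.1:8400/ping >/dev/null; do
  sleep 2
done
echo "FowyldAI is up"
```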
## Step 6: Run Your First Inference
Auto-route (let the Sovereign Brain pick the best model):
curl -X POST http://127.0.0.1:8400/auto \
-H "Content-Type: application/json" \
-d '{"prompt": "Summarize the key benefits of sovereign AI deployment"}'
OpenAI-compatible endpoint:
curl -X POST http://127.0.0.1:8400/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "phi3-mini",
"messages": [{"role": "user", "content": "What is sovereign AI?"}]
}'
Sovereign Brain reasoning:
curl -X POST http://127.0.0.1:8400/brain/reason \
-H "Content-Type: application/json" \
-d '{"prompt": "Analyze the security implications of cloud-hosted LLMs"}'