Self-Host WorldOfTaxonomy in 2 Minutes
TL;DR: MIT-licensed, fully open source. Clone,
docker compose up, and you have the complete platform - API, web app, MCP server, and all 1,000 systems - on your own infrastructure. No vendor lock-in, no per-query pricing.
Architecture overview
graph TB
subgraph Docker["Docker Compose"]
PG["PostgreSQL\n(data layer)"]
API["FastAPI Backend\n:8000"]
FE["Next.js Frontend\n:3000"]
MCP["MCP Server\n(stdio)"]
end
API --> PG
MCP --> PG
FE -->|proxy /api/*| API
subgraph Sources["Authoritative Sources"]
CENSUS["Census Bureau"]
UN["UN Statistics"]
WHO["WHO"]
EUROSTAT["Eurostat"]
end
API -->|ingest| CENSUS
API -->|ingest| UN
API -->|ingest| WHO
API -->|ingest| EUROSTAT
Quick start (Docker)
git clone https://github.com/colaberry/WorldOfTaxonomy.git
cd WorldOfTaxonomy
docker compose up
That is it. Web app at http://localhost:3000. API at http://localhost:8000.
Ingest systems
graph LR
EMPTY["Empty DB"] -->|ingest core| CORE["NAICS + ISIC\n+ crosswalk\n~3 min"]
CORE -->|ingest all| FULL["All 1,000 systems\n~30-45 min"]
The database starts empty. Ingest what you need:
# Core systems (~3 minutes)
docker compose exec backend python3 -m world_of_taxonomy ingest naics
docker compose exec backend python3 -m world_of_taxonomy ingest isic
docker compose exec backend python3 -m world_of_taxonomy ingest crosswalk
# All 1,000 systems (~30-45 minutes)
docker compose exec backend python3 -m world_of_taxonomy ingest all
Each ingester downloads data directly from its authoritative source (Census Bureau, UN, Eurostat, WHO) and loads into your local PostgreSQL. No pre-built data dumps. No third-party intermediaries.
Python only (bring your own PostgreSQL)
pip install -e .
cp .env.example .env
# Edit .env: set DATABASE_URL and JWT_SECRET
python3 -m world_of_taxonomy init
python3 -m world_of_taxonomy ingest naics
python3 -m uvicorn world_of_taxonomy.api.app:create_app --factory --port 8000
Run the MCP server
python3 -m world_of_taxonomy mcp
Point your AI client (Claude Desktop, Cursor, VS Code) at it using the MCP configuration.
Run the frontend
cd frontend
npm install
npm run dev
Frontend at http://localhost:3000, proxies API calls to :8000 via Next.js rewrites.
Environment variables
| Variable | Required | Default | Description |
|---|---|---|---|
DATABASE_URL |
Yes | - | PostgreSQL connection string |
JWT_SECRET |
For auth | - | JWT signing secret (min 32 chars) |
BACKEND_URL |
For frontend | http://localhost:8000 |
API URL for Next.js proxy |
Selective ingestion
You do not need all 1,000 systems. Ingest only what your use case requires:
| Use Case | Commands |
|---|---|
| Industry classification | ingest naics, ingest isic, ingest nace, ingest crosswalk |
| Medical coding | ingest icd10cm, ingest icd11, ingest loinc |
| Trade classification | ingest hs, ingest unspsc |
| Occupations | ingest soc, ingest isco, ingest esco |
Each ingester is independent. Run them in any order. Re-run to update - they use upsert logic, so re-ingestion is idempotent.
Production deployment
graph TB
subgraph Prod["Production Architecture"]
LB["Load Balancer\nnginx / Caddy / cloud"]
API1["FastAPI Instance 1"]
API2["FastAPI Instance 2"]
APIX["FastAPI Instance N"]
PG2["Managed PostgreSQL\nRDS / Cloud SQL / Neon"]
FE2["Next.js\nVercel / Cloudflare"]
end
LB --> API1
LB --> API2
LB --> APIX
API1 --> PG2
API2 --> PG2
APIX --> PG2
FE2 -->|BACKEND_URL| LB
| Component | Recommendation |
|---|---|
| PostgreSQL | Any managed service - AWS RDS, Cloud SQL, Neon, Supabase |
| Backend | FastAPI behind reverse proxy. Stateless - scale horizontally. |
| Frontend | Vercel, Cloudflare Pages, or any Node.js host |
| MCP | Local process connecting directly to PostgreSQL |
Why self-host?
| Benefit | Detail |
|---|---|
| No rate limits | Hosted API has limits for reliability. Self-hosted has none. |
| Data sovereignty | Keep the entire database within your infrastructure. |
| Custom ingestion | Add proprietary classification systems alongside public ones. |
| Airgapped environments | Full platform without internet after initial ingestion. |
| Cost control | PostgreSQL + small server. No per-query pricing. |
The hosted API is great for getting started. Self-hosting is great for production workloads at scale.