# WorldOfTaxonomy - Full Reference Guide

> Unified Global Classification Knowledge Graph
> 1,000+ systems, 1.2M+ nodes, 321K+ crosswalk edges.
> Open source (MIT). Data is informational only - use at your own risk.

========================================================================
# Getting Started with WorldOfTaxonomy
========================================================================

## Getting Started with WorldOfTaxonomy

> **TL;DR:** Three ways to query 1,000+ classification systems, 1.2M+ codes, and 321K+ crosswalk edges - REST API, MCP server for AI agents, and a web app. All open source, all free to start.

---

## Three access points, one knowledge graph

```mermaid
graph LR
    subgraph Graph["Knowledge Graph"]
        SYS["1,000 Systems"]
        NODES["1.2M+ Nodes"]
        EDGES["321K+ Edges"]
    end
    subgraph Surfaces["Access Points"]
        API["REST API\n/api/v1/*"]
        MCP["MCP Server\nstdio transport"]
        WEB["Web App\nlocalhost:3000"]
    end
    Graph --> API
    Graph --> MCP
    Graph --> WEB
```

Pick whichever fits your workflow. The API is for application integrations and scripts. The MCP server gives AI agents direct tool access. The web app is for visual exploration.

## Quick start - REST API

Base URL: `https://worldoftaxonomy.com/api/v1`

### List all classification systems

```bash
curl https://worldoftaxonomy.com/api/v1/systems
```

Returns an array of all systems with their ID, name, region, node count, and provenance metadata.

### Search across all systems

```bash
curl "https://worldoftaxonomy.com/api/v1/search?q=physician"
```

Full-text search across all 1.2M+ nodes. A search for "physician" returns matches from SOC, ISCO, ESCO, NAICS, ICD-10-CM, and dozens more systems in a single call. Add `&grouped=true` to group results by system, or `&context=true` to include ancestor paths and children for each match.
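
The search flags above can also be assembled in code rather than by hand. The following is a minimal Python sketch, not part of the project: `build_search_url` is a hypothetical helper, and only the base URL and the `q`, `grouped`, and `context` parameters are taken from this guide.

```python
from urllib.parse import urlencode

# Base URL as given in this guide.
BASE_URL = "https://worldoftaxonomy.com/api/v1"


def build_search_url(query: str, grouped: bool = False, context: bool = False) -> str:
    """Build a /search URL (hypothetical helper, not a project API)."""
    params = {"q": query}
    if grouped:
        params["grouped"] = "true"  # group results by system
    if context:
        params["context"] = "true"  # include ancestor paths and children
    # urlencode escapes spaces and special characters in the query text.
    return f"{BASE_URL}/search?{urlencode(params)}"


print(build_search_url("physician", grouped=True))
```

Building the query string with `urlencode` keeps multi-word searches such as "truck freight" correctly escaped, which is easy to get wrong when concatenating strings manually.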
### Look up a specific code

```bash
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/6211
```

Returns the node with its title, description, level, parent code, and whether it is a leaf node.

### Browse children

```bash
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/62/children
```

Returns all direct child codes under a given node. This is how you drill down through a hierarchy.

### Get cross-system equivalences

```bash
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/6211/equivalences
```

Returns crosswalk mappings to other systems. NAICS 6211 ("Offices of Physicians") maps to ISIC 8620, NACE 86.2, NIC 8620, and others.

### Translate to all systems at once

```bash
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/6211/translations
```

Returns equivalences across all connected systems in a single call. One request, every known translation.

## Quick start - MCP server

The MCP (Model Context Protocol) server lets AI agents query the knowledge graph directly.

### Setup

```bash
pip install world-of-taxonomy
python -m world_of_taxonomy mcp
```

Transport: stdio. The server exposes 25 tools and wiki-based resources. It works with Claude, Cursor, VS Code, Windsurf, and any MCP-compatible client.

### Key MCP tools

| Tool | Purpose | Example |
|------|---------|---------|
| `list_classification_systems` | List all 1,000+ systems | "What systems cover Germany?" |
| `search_classifications` | Full-text search across all nodes | "Find codes for diabetes" |
| `get_industry` | Look up a specific code | "What is NAICS 5415?" |
| `browse_children` | Get child codes | "Show subcategories of HS chapter 01" |
| `get_equivalences` | Get crosswalk mappings | "What does ICD-10-CM E11 map to?" |
| `translate_code` | Translate a code to another system | "Convert SOC 29-1211 to ISCO" |
| `translate_across_all_systems` | Translate to all connected systems | "All equivalents for NAICS 4841" |
| `classify_business` | Classify free text into taxonomy codes (returns `domain_matches` + `standard_matches`) | "Classify: mobile app for pet sitting" |
| `get_audit_report` | Data provenance and quality audit | "Show provenance breakdown" |
| `get_country_taxonomy_profile` | Systems applicable to a country | "What systems apply in Brazil?" |

### MCP resources

The server also provides resources that AI agents can read for deeper context:

- `taxonomy://systems` - JSON list of all classification systems
- `taxonomy://stats` - Knowledge graph statistics
- `taxonomy://wiki/{slug}` - Individual guide pages as markdown

## Authentication

### Registration

```bash
curl -X POST https://worldoftaxonomy.com/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "your-password"}'
```

### API keys

After registration, create an API key:

```bash
curl -X POST https://worldoftaxonomy.com/api/v1/auth/keys \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"name": "My App"}'
```

API keys use the format `wot_` followed by 32 hex characters.
Pass them in the Authorization header:

```
Authorization: Bearer wot_your_key_here
```

## Rate limits

| Tier | Requests/Minute | Daily Limit | Best for |
|------|-----------------|-------------|----------|
| Anonymous | 30 | Unlimited | Quick exploration |
| Free (authenticated) | 1,000 | Unlimited | Development and prototyping |
| Pro | 5,000 | 100,000 | Production applications |
| Enterprise | 50,000 | Unlimited | High-volume integrations |

## API request flow

```mermaid
sequenceDiagram
    participant C as Your App
    participant RL as Rate Limiter
    participant AUTH as Auth Layer
    participant Q as Query Layer
    participant DB as PostgreSQL
    C->>RL: GET /api/v1/search?q=physician
    RL->>RL: Check tier limit
    RL->>AUTH: Forward
    AUTH->>AUTH: Validate JWT or API key
    AUTH->>Q: Authenticated request
    Q->>DB: Full-text search
    DB-->>Q: Matching nodes
    Q-->>C: JSON response
```

## API endpoints reference

### Systems

| Endpoint | Description |
|----------|-------------|
| `GET /systems` | List all classification systems |
| `GET /systems/{id}` | System detail with root codes |
| `GET /systems/stats` | Leaf and total node counts per system |
| `GET /systems?group_by=region` | Systems grouped by region |
| `GET /systems?country={code}` | Systems applicable to a country |

### Nodes

| Endpoint | Description |
|----------|-------------|
| `GET /systems/{id}/nodes/{code}` | Look up a specific code |
| `GET /systems/{id}/nodes/{code}/children` | Direct children |
| `GET /systems/{id}/nodes/{code}/ancestors` | Parent chain to root |
| `GET /systems/{id}/nodes/{code}/siblings` | Sibling codes |
| `GET /systems/{id}/nodes/{code}/subtree` | Subtree summary stats |

### Search

| Endpoint | Description |
|----------|-------------|
| `GET /search?q={query}` | Full-text search |
| `GET /search?q={query}&grouped=true` | Results grouped by system |
| `GET /search?q={query}&context=true` | Results with ancestor/child context |

### Crosswalks

| Endpoint | Description |
|----------|-------------|
| `GET /systems/{id}/nodes/{code}/equivalences` | Cross-system mappings |
| `GET /systems/{id}/nodes/{code}/translations` | Translate to all systems |
| `GET /equivalences/stats` | Crosswalk statistics |
| `GET /compare?a={sys}&b={sys}` | Side-by-side sector comparison |
| `GET /diff?a={sys}&b={sys}` | Codes with no mapping |

### Classification

| Endpoint | Description |
|----------|-------------|
| `POST /classify` | Classify free text; returns `domain_matches` + `standard_matches` (see [domain-vs-standard](/guide/domain-vs-standard)) |

### Countries

| Endpoint | Description |
|----------|-------------|
| `GET /countries/stats` | Per-country taxonomy coverage |
| `GET /countries/{code}` | Full taxonomy profile for a country |

## Data disclaimer

All classification data in WorldOfTaxonomy is provided for informational purposes only. It should not be used as a substitute for official government or standards body publications. Always verify codes against the authoritative source for regulatory, legal, or compliance purposes.

========================================================================
# Systems Catalog - All 1,000+ Classification Systems
========================================================================

## Systems Catalog - All 1,000+ Classification Systems

> **TL;DR:** Complete catalog of 1,000+ classification systems organized by category. Industry (150+), Life Sciences (100+), Product/Trade (20+), Occupation (15+), Regulatory (100+), and 300+ domain vocabularies - all connected by 321K+ crosswalk edges.

---

```mermaid
graph LR
    subgraph Top5["Largest Systems by Node Count"]
        NCI["NCI Thesaurus\n211,072"]
        NDC["NDC\n112,077"]
        LOINC["LOINC\n102,751"]
        ICD10CM["ICD-10-CM\n97,606"]
        ICD10PCS["ICD-10-PCS\n79,987"]
    end
```

WorldOfTaxonomy connects over 1,000 classification systems as equal peers in a unified knowledge graph.
Systems span industry classification, product and trade codes, occupation standards, health and clinical coding, education frameworks, financial and environmental standards, regulatory compliance, and hundreds of domain-specific vocabularies.

## Industry classification standards

These are the foundational systems for classifying economic activity by industry sector.

### Global and Multi-National

| System | Region | Codes | Authority |
|--------|--------|-------|-----------|
| ISIC Rev 4 | Global (UN) | 766 | United Nations Statistics Division |
| ISIC Rev 3.1 | Global (historical) | ~400 | United Nations |
| GICS Bridge | Global (MSCI/S&P) | 11 | MSCI and S&P Dow Jones |
| ICB | Global (FTSE Russell) | 32 | FTSE Russell |

### North America

| System | Region | Codes | Authority |
|--------|--------|-------|-----------|
| NAICS 2022 | North America | 2,125 | U.S. Census Bureau |
| NAICS 2017 (Historical) | North America | ~2,000 | U.S. Census Bureau |
| NAICS 2012 (Historical) | North America | ~2,000 | U.S. Census Bureau |
| SIC 1987 | USA/UK | 1,176 | U.S. OMB |

### European Union (NACE Rev 2 Family)

NACE Rev 2 is the EU standard. Each member state publishes a national adaptation with the same structure (996 codes): NACE Rev 2 (EU), ATECO 2007 (Italy), NAF Rev 2 (France), WZ 2008 (Germany), ONACE 2008 (Austria), NOGA 2008 (Switzerland), PKD 2007 (Poland), SBI 2008 (Netherlands), SNI 2007 (Sweden), DB07 (Denmark), TOL 2008 (Finland), CNAE 2009 (Spain), NACE-BEL 2008 (Belgium), CAE Rev 3 (Portugal), CZ-NACE (Czech Republic), TEAOR 2008 (Hungary), CAEN Rev 2 (Romania), and 20+ more national variants.
### Asia-Pacific

| System | Region | Codes | Authority |
|--------|--------|-------|-----------|
| NIC 2008 | India | 2,070 | Ministry of Statistics |
| JSIC 2013 | Japan | 20 | Ministry of Internal Affairs |
| ANZSIC 2006 | Australia/NZ | 825 | ABS/Stats NZ |
| GB/T 4754-2017 | China | 118 | National Bureau of Statistics |
| KSIC 2017 | South Korea | 108 | KOSTAT |
| SSIC 2020 | Singapore | 21 | Dept of Statistics |

### Latin America (ISIC-based)

CIIU Rev 4 adaptations: Colombia, Argentina, Chile, Peru, Ecuador, Bolivia, Venezuela, Costa Rica, Guatemala, Panama, Paraguay, Uruguay, Dominican Republic - each with 766 codes based on ISIC Rev 4.

### Additional National Systems

Over 80 country-specific ISIC Rev 4 adaptations covering Africa, Middle East, Central Asia, Southeast Asia, Caribbean, and Pacific Island nations.

## Product and Trade Classification

| System | Region | Codes | Purpose |
|--------|--------|-------|---------|
| HS 2022 | Global (WCO) | 6,960 | International trade (customs tariffs) |
| CPC v2.1 | Global (UN) | 4,596 | Product classification (statistical) |
| UNSPSC v24 | Global (GS1 US) | 77,337 | Procurement and spend analysis |
| SITC Rev 4 | Global (UN) | 77 | Trade statistics |
| BEC Rev 5 | Global (UN) | 29 | Broad economic categories |
| HTS (US) | United States | 120 | US customs tariff |
| CN 2024 | European Union | 118 | EU Combined Nomenclature |

## Occupation and Skills Classification

| System | Region | Codes | Purpose |
|--------|--------|-------|---------|
| ISCO-08 | Global (ILO) | 619 | International occupation standard |
| SOC 2018 | United States | 1,447 | US occupation classification |
| O*NET-SOC | United States | 867 | Occupation database with skills data |
| ESCO Occupations | Europe (EU) | 3,045 | European occupation taxonomy |
| ESCO Skills | Europe (EU) | 14,247 | Skills and competences |
| NOC 2021 | Canada | 51 | Canadian occupations |
| UK SOC 2020 | United Kingdom | 43 | UK occupations |
| ANZSCO 2022 | Australia/NZ | 1,590 | AU/NZ occupations |

## Life Sciences

| System | Region | Codes | Purpose |
|--------|--------|-------|---------|
| ICD-11 MMS | Global (WHO) | 37,052 | Disease classification (latest) |
| ICD-10-CM | United States | 97,606 | US clinical modification |
| ICD-10-PCS | United States | 79,987 | US procedure coding |
| LOINC | Global | 102,751 | Laboratory and clinical observations |
| MeSH | Global (NLM) | 31,124 | Medical subject headings |
| ATC WHO 2021 | Global (WHO) | 6,440 | Anatomical therapeutic chemical |
| NCI Thesaurus | Global (NCI) | 211,072 | Cancer research terminology |
| NDC | United States | 112,077 | National drug codes |

## Education Classification

| System | Region | Codes | Purpose |
|--------|--------|-------|---------|
| ISCED 2011 | Global (UNESCO) | 20 | Education levels |
| ISCED-F 2013 | Global (UNESCO) | 122 | Fields of education |
| CIP 2020 | United States | 2,848 | Instructional programs |

## Geographic Classification

| System | Region | Codes | Purpose |
|--------|--------|-------|---------|
| ISO 3166-1 | Global | 271 | Country codes |
| ISO 3166-2 | Global | 5,246 | Subdivision codes |
| UN M.49 | Global | 272 | Geographic regions |
| EU NUTS 2021 | European Union | 124 | Statistical regions |
| US FIPS | United States | 86 | Federal information processing |

## Financial, Environmental, and Governance

| System | Region | Codes | Purpose |
|--------|--------|-------|---------|
| COFOG | Global (UN) | 188 | Government functions |
| GHG Protocol | Global (WRI) | 20 | Greenhouse gas accounting |
| SASB SICS | Global | 86 | Sustainability accounting |
| EU Taxonomy | European Union | 60 | Sustainable finance |
| SFDR | European Union | 30 | Financial disclosure regulation |
| SDG 2030 | Global (UN) | 82 | Sustainable development goals |

## Regulatory and Compliance

Over 100 regulatory frameworks including HIPAA, SOX, GDPR, OSHA standards, FDA regulations, SEC rules, PCI DSS, NIST frameworks, ISO management system standards, and international agreements (Basel, FATF, ILO Conventions).

## Domain-Specific Vocabularies

Over 300 domain taxonomies covering specialized sectors:

- **Transportation**: truck freight types, vehicle classes, cargo classification, carrier operations, pricing, regulatory compliance
- **Agriculture**: crop types, livestock, farming methods, commodity grades, equipment, input supply, land classification
- **Mining**: mineral types, extraction methods, reserve classification, equipment, safety
- **Construction**: trade types, building types, project delivery, materials, sustainability
- **Manufacturing**: process types, quality, operations models, industry verticals
- **Healthcare deep-dives**: hospital departments, nursing specialties, lab categories, surgical specialties, pharmacy types
- **Finance deep-dives**: insurance products, credit ratings, derivatives, private equity stages
- **Technology**: API architectures, database types, programming paradigms, DevOps, MLOps, cybersecurity
- **Energy**: oil grades, natural gas, solar, wind, battery, smart grid, carbon credits

## Patent Classification

| System | Region | Codes | Purpose |
|--------|--------|-------|---------|
| Patent CPC | Global (EPO/USPTO) | 254,249 | Cooperative Patent Classification |

## Academic and Research

| System | Region | Codes | Purpose |
|--------|--------|-------|---------|
| arXiv Taxonomy | Global | 110 | Preprint subject areas |
| MSC 2020 | Global | 92 | Mathematics subject classification |
| PACS | Global | 70 | Physics and astronomy |
| LCC | Global | 111 | Library of Congress classification |
| JEL Codes | Global | 98 | Economics literature |
| ACM CCS 2012 | Global | 67 | Computing classification |

## How to Explore Systems

Use these API calls to explore the catalog programmatically:

```bash
# List all systems
curl https://worldoftaxonomy.com/api/v1/systems

# Group by region
curl "https://worldoftaxonomy.com/api/v1/systems?group_by=region"

# Filter by country
curl "https://worldoftaxonomy.com/api/v1/systems?country=DE"

# System detail with root codes
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022
```

========================================================================
# Crosswalk Map - How Classification Systems Connect
========================================================================

## Crosswalk Map - How Classification Systems Connect

> **TL;DR:** 326,000+ crosswalk edges link 1,000 classification systems through hub-and-spoke topology. ISIC is the industry hub, CPC bridges trade to industry, SOC/ISCO connect occupations, and every one of the 434 domain taxonomies is bridged to NAICS/ISIC/NACE via sector anchors. This guide maps the full topology and shows how to navigate translation paths.

---

## What is a crosswalk?

A crosswalk (or concordance) is a mapping between codes in two different classification systems. For example, NAICS 6211 ("Offices of Physicians") maps to ISIC 8620 ("Medical and dental practice activities").

Crosswalks have a match type that tells you how precise the mapping is:

| Type | Meaning | Example |
|------|---------|---------|
| `exact` | Identical scope and definition | NAICS 111110 "Soybean Farming" = ISIC 0111 |
| `partial` | Overlapping but not identical scope | NAICS 6211 partially overlaps ISIC 8620 |
| `broader` | Target has wider scope | A 6-digit NAICS to a 2-digit ISIC |
| `narrower` | Target has narrower scope | A section-level ISIC to a detailed NAICS |
| `related` | Conceptually related but structurally different | Domain taxonomy to parent NAICS sector |

## Core crosswalk topology

The knowledge graph has five major hubs. Each hub connects clusters of related systems.
```mermaid
graph TB
    subgraph Industry["Industry Hub"]
        ISIC["ISIC Rev 4\n766 codes"]
        NAICS["NAICS 2022\n2,125 codes"]
        NACE["NACE Rev 2\n996 codes"]
        NIC["NIC 2008\n2,070 codes"]
        ANZSIC["ANZSIC 2006\n825 codes"]
        SIC["SIC 1987\n1,176 codes"]
        GBT["GB/T 4754\n118 codes"]
        NAT80["80+ National\nISIC variants"]
    end
    subgraph Trade["Trade Hub"]
        CPC["CPC v2.1\n4,596 codes"]
        HS["HS 2022\n6,960 codes"]
        UNSPSC["UNSPSC v24\n77,337 codes"]
        HTS["HTS / CN / SITC"]
    end
    subgraph Occupation["Occupation Hub"]
        SOC["SOC 2018\n1,447 codes"]
        ISCO["ISCO-08\n619 codes"]
        ESCO["ESCO\n3,045 + 14,247"]
        ONET["O*NET-SOC\n867 codes"]
        CIP["CIP 2020\n2,848 codes"]
    end
    NAICS <-->|3,418 edges| ISIC
    ISIC <-->|1:1| NACE
    ISIC -.->|derived| NIC
    ISIC -.->|derived| ANZSIC
    ISIC -.->|derived| GBT
    ISIC -.->|derived| NAT80
    NAICS <-.->|legacy| SIC
    ISIC <-->|5,430 edges| CPC
    CPC <-->|11,686 edges| HS
    CPC -.-> UNSPSC
    HS -.-> HTS
    SOC <-->|992 edges| ISCO
    ISCO <-->|6,048 edges| ESCO
    SOC <-->|1,734 edges| ONET
    CIP -->|5,903 edges| SOC
    ISCO <-->|44 edges| ISIC
```

## Industry classification hub

ISIC Rev 4 is the central node for industry classification. Every major national system connects through it.

```mermaid
graph LR
    NAICS["NAICS 2022"] <-->|3,418| ISIC["ISIC Rev 4"]
    ISIC <-->|1:1| NACE["NACE Rev 2"]
    NACE -->|1:1| WZ["WZ 2008\nGermany"]
    NACE -->|1:1| NAF["NAF Rev 2\nFrance"]
    NACE -->|1:1| ATECO["ATECO 2007\nItaly"]
    NACE -->|1:1| MORE["30+ more\nEU variants"]
    ISIC -->|derived| NIC["NIC 2008\nIndia"]
    ISIC -->|derived| ANZSIC["ANZSIC 2006\nAU/NZ"]
    ISIC -->|derived| GBT["GB/T 4754\nChina"]
    ISIC -->|adapted| NAT80["80+ national\nadaptations"]
```

NACE national variants (WZ, NAF, ATECO, PKD, SBI, SNI, etc.) share the identical 996-code structure. Each has a 1:1 mapping to NACE Rev 2 and transitively to ISIC Rev 4.

## Product and trade hub

CPC v2.1 is the bridge between trade codes and industry codes.
```mermaid
graph LR
    HS["HS 2022\n6,960 codes"] <-->|11,686 edges| CPC["CPC v2.1\n4,596 codes"]
    CPC <-->|5,430 edges| ISIC["ISIC Rev 4"]
    HS -->|extended| HTS["HTS (US)"]
    HS -->|extended| CN["CN 2024 (EU)"]
    HS -->|extended| AHTN["ASEAN Tariff"]
    HS -->|extended| NCM["MERCOSUR Tariff"]
    HS -.->|aggregated| SITC["SITC Rev 4\n77 codes"]
    HS -.->|aggregated| BEC["BEC Rev 5\n29 codes"]
    CPC -.-> UNSPSC["UNSPSC v24\n77,337 codes"]
```

This means you can trace a trade code (HS) to its product category (CPC) to the industry that produces it (ISIC/NAICS).

## Occupation and education hub

SOC 2018 and ISCO-08 are the twin hubs for occupation data.

```mermaid
graph LR
    CIP["CIP 2020\n2,848 programs"] -->|5,903 edges| SOC["SOC 2018\n1,447 occupations"]
    CIP -->|1,615 edges| ISCEDF["ISCED-F 2013\n122 fields"]
    SOC <-->|992 edges| ISCO["ISCO-08\n619 occupations"]
    ISCO <-->|6,048 edges| ESCO["ESCO Occupations\n3,045"]
    SOC <-->|1,734 edges| ONET["O*NET-SOC\n867"]
    ISCO -->|44 edges| ISIC["ISIC Rev 4"]
    SOC -.-> NAICS["NAICS 2022"]
```

CIP 2020 (educational programs) connects to SOC (occupations) with 5,903 edges - the education-to-career pipeline.

## Geographic and domain hubs

```mermaid
graph TB
    subgraph Geo["Geographic"]
        ISO1["ISO 3166-1\n271 countries"]
        ISO2["ISO 3166-2\n5,246 subdivisions"]
        UNM["UN M.49\n272 regions"]
    end
    subgraph Domain["Domain Crosswalks"]
        N484["NAICS 484\nTruck Transportation"]
        N11["NAICS 11\nAgriculture"]
        N21["NAICS 21\nMining"]
        N22["NAICS 22\nUtilities"]
        N23["NAICS 23\nConstruction"]
    end
    ISO1 <--> ISO2
    ISO1 <--> UNM
    N484 -->|~200 edges| TRUCK["Truck domain\n7 vocabularies"]
    N11 -->|~48 edges| AG["Agriculture domain\n11 vocabularies"]
    N21 -->|~31 edges| MINE["Mining domain\n6 vocabularies"]
    N22 -->|~20 edges| UTIL["Utility domain\n6 vocabularies"]
    N23 -->|~27 edges| CONST["Construction domain\n6 vocabularies"]
```

Each domain taxonomy links back to its parent NAICS sector, creating drill-down paths from broad industry codes to specialized vocabularies.
As of the sector-anchor pass, all 434 domain taxonomies (up from the 15 original pilots shown above) carry at least one bridge edge to NAICS 2022, plus parallel fan-out edges into ISIC Rev 4 and NACE Rev 2 where the NAICS anchor has an existing international crosswalk. Generated edges are stamped `match_type='broad'` and one of two provenance values:

| Provenance | What it means |
|------------|---------------|
| `derived:sector_anchor:v1` | Direct NAICS<->domain bridge written by `crosswalk_domain_anchors.py` |
| `derived:sector_anchor:v1:fanout` | ISIC<->domain or NACE<->domain edge derived via a NAICS<->ISIC (or NACE) self-join |

Filter `?match_type=exact` if you want to exclude every generated bridge and see only authoritative exact statistical concordances.

## The four edge kinds

Every equivalence response now carries an `edge_kind` computed from the categories of both endpoints. See [domain-vs-standard](domain-vs-standard.md) for the full pattern. Quick reference:

| `edge_kind` | Description |
|---------------------|-------------|
| `standard_standard` | Pre-existing statistical crosswalks (NAICS<->ISIC, ISIC<->NACE, HS<->CPC, SOC<->ISCO, ...) |
| `standard_domain` | Bridge from an official code to a curated domain taxonomy |
| `domain_standard` | Bridge from a domain taxonomy back to an official code |
| `domain_domain` | Reserved for future cross-domain edges; none generated yet |

Use the filter on any equivalence or translation endpoint:

```
GET /api/v1/systems/naics_2022/nodes/6211/equivalences?edge_kind=standard_standard
GET /api/v1/systems/naics_2022/nodes/6211/equivalences?edge_kind=standard_domain,domain_standard
```

Stats grouped by edge kind:

```bash
curl "https://worldoftaxonomy.com/api/v1/equivalences/stats?group_by=edge_kind"
```

## Translation paths

Not all systems have direct crosswalks. You translate between systems by following a path through intermediate hubs.
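
Finding such a path amounts to a shortest-path search over the crosswalk topology. The sketch below is illustrative only and is not how WorldOfTaxonomy itself resolves translations: it copies a handful of system-to-system links from the diagrams in this guide, treats every link as undirected (an assumption; the guide draws CIP to SOC as one-directional), and uses a breadth-first search. The names `CROSSWALKS` and `translation_path` are hypothetical.

```python
from collections import deque

# System-level links copied from the diagrams in this guide.
# Assumption: every link is treated as undirected for path finding.
CROSSWALKS = [
    ("WZ 2008", "NACE Rev 2"),
    ("NACE Rev 2", "ISIC Rev 4"),
    ("ISIC Rev 4", "NAICS 2022"),
    ("ISIC Rev 4", "CPC v2.1"),
    ("CPC v2.1", "HS 2022"),
    ("ISIC Rev 4", "ISCO-08"),
    ("ISCO-08", "SOC 2018"),
    ("ISCO-08", "ESCO Occupations"),
    ("SOC 2018", "O*NET-SOC"),
    ("CIP 2020", "SOC 2018"),
]


def build_graph(edges):
    """Turn a list of (a, b) pairs into an adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def translation_path(source, target, edges=CROSSWALKS):
    """Return the shortest chain of systems from source to target, or None."""
    graph = build_graph(edges)
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # Sorted for deterministic results when several neighbors exist.
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(" -> ".join(translation_path("WZ 2008", "SOC 2018")))
# WZ 2008 -> NACE Rev 2 -> ISIC Rev 4 -> ISCO-08 -> SOC 2018
```

Breadth-first search returns a path with the fewest hops, which is a reasonable default because each extra crosswalk step can lose precision. It says nothing about match quality along the way; a real implementation would likely weigh match types as well.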
### Example: German industry code to US occupation

```mermaid
graph LR
    WZ["WZ 2008\nGerman industry"] -->|1:1| NACE["NACE Rev 2"]
    NACE -->|1:1| ISIC["ISIC Rev 4"]
    ISIC -->|44 edges| ISCO["ISCO-08"]
    ISCO -->|992 edges| SOC["SOC 2018\nUS occupation"]
```

### Example: HS trade code to NAICS industry

```mermaid
graph LR
    HS["HS 2022\ntrade code"] -->|11,686| CPC["CPC v2.1"]
    CPC -->|5,430| ISIC["ISIC Rev 4"]
    ISIC -->|3,418| NAICS["NAICS 2022"]
```

## API for crosswalk navigation

### Direct equivalences

```bash
# Get all systems that NAICS 6211 maps to
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/6211/equivalences

# Translate to all connected systems at once
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/6211/translations
```

### Crosswalk statistics

```bash
# Overall crosswalk stats
curl https://worldoftaxonomy.com/api/v1/equivalences/stats

# Stats for a specific system
curl "https://worldoftaxonomy.com/api/v1/equivalences/stats?system_id=naics_2022"
```

### Compare systems

```bash
# Side-by-side top-level comparison
curl "https://worldoftaxonomy.com/api/v1/compare?a=naics_2022&b=isic_rev4"

# Codes in system A with no mapping to B
curl "https://worldoftaxonomy.com/api/v1/diff?a=naics_2022&b=isic_rev4"
```

## MCP tools for crosswalks

| Tool | Purpose |
|------|---------|
| `get_equivalences` | Direct crosswalk mappings for a code |
| `translate_code` | Translate a code to a specific target system |
| `translate_across_all_systems` | Translate to all connected systems |
| `get_crosswalk_coverage` | Coverage statistics for a crosswalk pair |
| `get_system_diff` | Codes with no mapping between two systems |
| `compare_sector` | Side-by-side sector comparison |
| `describe_match_types` | Explain the match type categories |
| `list_crosswalks_by_kind` | Counts + samples for a specific `edge_kind` (standard_standard, standard_domain, domain_standard, domain_domain); optionally narrow to a single system |
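
The four `edge_kind` values used by `list_crosswalks_by_kind` and the equivalence endpoints are described in this guide as being computed from the categories of both endpoints. A plausible reading, sketched below in Python, is that the value is the source category and the target category joined by an underscore. This is an assumption, as are the field names `source_category` and `target_category`; the actual response schema is not shown in this guide.

```python
# Categories assumed from the four edge kinds listed in this guide.
VALID_CATEGORIES = {"standard", "domain"}


def edge_kind(source_category: str, target_category: str) -> str:
    """Derive an edge kind label from the two endpoint categories."""
    for category in (source_category, target_category):
        if category not in VALID_CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
    return f"{source_category}_{target_category}"


def filter_by_kind(edges, kinds):
    """Keep only edges whose derived kind is in `kinds`.

    `edges` is a list of dicts with hypothetical keys
    'source_category' and 'target_category'.
    """
    wanted = set(kinds)
    return [
        edge for edge in edges
        if edge_kind(edge["source_category"], edge["target_category"]) in wanted
    ]


sample = [
    {"source_category": "standard", "target_category": "standard"},
    {"source_category": "standard", "target_category": "domain"},
    {"source_category": "domain", "target_category": "standard"},
]
bridges = filter_by_kind(sample, ["standard_domain", "domain_standard"])
print(len(bridges))
```

Filtering client-side like this mirrors what the `edge_kind` query parameter does on the server, and can be handy when one response needs to be split into official concordances and domain bridges.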
========================================================================
# Industry Classification Guide - Which System to Use
========================================================================

## Industry Classification Guide - Which System to Use

> **TL;DR:** Your country and purpose determine which industry classification system to use. NAICS for North America, NACE for the EU, ISIC for global. This guide provides a decision tree, country reference, and side-by-side comparisons.

---

## Decision tree

```mermaid
graph TD
    START["What do you need to classify?"] --> GEO{"Geographic scope?"}
    GEO -->|Single country| NATIONAL["Use national system\nsee table below"]
    GEO -->|Multi-country / Global| ISIC["ISIC Rev 4\n766 codes, UN standard"]
    GEO -->|North America| NAICS["NAICS 2022\n2,125 codes"]
    GEO -->|European Union| NACE["NACE Rev 2\n996 codes"]
    NAICS --> DETAIL{"Need SEC filing?"}
    DETAIL -->|Yes| SIC["SIC 1987\nstill required by SEC"]
    DETAIL -->|No| NAICS_DONE["Use NAICS 2022"]
```

### Step 1: What is your geographic scope?

**Single country** - Use the national system for that country (see table below).

**Multi-country or global** - Use ISIC Rev 4 as your common denominator, then translate to national systems as needed.

**North America (US, Canada, Mexico)** - Use NAICS 2022.

**European Union** - Use NACE Rev 2 (or your country's national variant).

### Step 2: What level of detail do you need?

| Granularity | Typical Use | Recommended System |
|-------------|-------------|-------------------|
| Broad sectors (10-20 categories) | Executive dashboards, market sizing | ISIC sections (A-U) or NAICS 2-digit |
| Divisions (~100 categories) | Industry reports, portfolio analysis | ISIC 2-digit or NAICS 3-digit |
| Groups (~300 categories) | Detailed market analysis | ISIC 3-digit or NAICS 4-digit |
| Classes (~500+ categories) | Regulatory filings, detailed reporting | ISIC 4-digit or NAICS 5-6 digit |

### Step 3: Is this for regulatory compliance?
If you are filing with a government agency, use the system they require:

| Agency / Purpose | Required System |
|------------------|----------------|
| US Census Bureau / BLS | NAICS 2022 |
| US SEC filings | SIC 1987 |
| Eurostat / EU statistical reporting | NACE Rev 2 |
| UN statistical reporting | ISIC Rev 4 |
| Australian Bureau of Statistics | ANZSIC 2006 |
| Indian Ministry of Statistics | NIC 2008 |
| World Bank projects | ISIC Rev 4 |

## Country-to-system quick reference

### Major economies

| Country | Primary System | Codes | Notes |
|---------|---------------|-------|-------|
| United States | NAICS 2022 | 2,125 | Also SIC 1987 for SEC filings |
| Canada | NAICS 2022 | 2,125 | Shared with US and Mexico |
| United Kingdom | SIC 1987 / UK SOC | 1,176 | Companies House uses SIC |
| Germany | WZ 2008 | 996 | National NACE variant |
| France | NAF Rev 2 | 996 | National NACE variant |
| India | NIC 2008 | 2,070 | Based on ISIC Rev 4 |
| China | GB/T 4754-2017 | 118 | National standard |
| Japan | JSIC 2013 | 20 | Statistical survey use |
| Australia | ANZSIC 2006 | 825 | Shared with New Zealand |
| South Korea | KSIC 2017 | 108 | KOSTAT standard |

### Latin America

All countries use CIIU Rev 4 (the Spanish translation of ISIC Rev 4) with 766 codes: Colombia, Argentina, Chile, Peru, Ecuador, Bolivia, Venezuela, Costa Rica, Guatemala, Panama, Paraguay, Uruguay, Dominican Republic.

### European Union (27 members + EEA)

All EU member states use NACE Rev 2 with national naming: ATECO (Italy), NAF (France), WZ (Germany), CNAE (Spain), PKD (Poland), SBI (Netherlands), SNI (Sweden), and others. The structure is identical - 996 codes with 1:1 mapping.
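
The geographic step of the decision tree and the country table above can be condensed into a small lookup. The Python sketch below is a simplification under stated assumptions: it covers only a subset of countries from the "Major economies" table plus Mexico, it ignores regulatory requirements such as SEC filings, and it falls back to ISIC Rev 4 for any country it does not know. `recommend_system` is a hypothetical helper, not part of the project.

```python
# Subset of the country table in this guide (Mexico added per the NAICS note).
NATIONAL_SYSTEMS = {
    "United States": "NAICS 2022",
    "Canada": "NAICS 2022",
    "Mexico": "NAICS 2022",
    "Germany": "WZ 2008",
    "France": "NAF Rev 2",
    "India": "NIC 2008",
    "China": "GB/T 4754-2017",
    "Japan": "JSIC 2013",
    "Australia": "ANZSIC 2006",
    "South Korea": "KSIC 2017",
}

# National NACE variants present in the table above.
NACE_VARIANTS = {"WZ 2008", "NAF Rev 2"}


def recommend_system(countries):
    """Suggest a system for the given countries (geographic scope only)."""
    if not countries:
        raise ValueError("at least one country is required")
    systems = {NATIONAL_SYSTEMS.get(country) for country in countries}
    if None in systems:
        # Assumption: unknown countries fall back to the global standard.
        return "ISIC Rev 4"
    if len(systems) == 1:
        # One country, or several that share a system (e.g. NAICS).
        return next(iter(systems))
    if systems <= NACE_VARIANTS:
        # Several EU countries: use the shared EU standard.
        return "NACE Rev 2"
    # Mixed regions: use ISIC as the common denominator.
    return "ISIC Rev 4"


print(recommend_system(["Germany", "France"]))
```

A regulatory override (for example, SIC 1987 for SEC filings) would sit on top of this lookup, since the required system in that case depends on the agency rather than the geography.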
## Comparing the major systems

```mermaid
graph LR
    subgraph North_America["North America"]
        NAICS["NAICS 2022\n2,125 codes\n6 levels"]
    end
    subgraph EU["European Union"]
        NACE["NACE Rev 2\n996 codes\n4 levels"]
    end
    subgraph Global["Global"]
        ISIC["ISIC Rev 4\n766 codes\n4 levels"]
    end
    NAICS <-->|3,418 edges| ISIC
    ISIC <-->|1:1 structure| NACE
```

### NAICS 2022 vs ISIC Rev 4

| Feature | NAICS 2022 | ISIC Rev 4 |
|---------|-----------|-----------|
| Codes | 2,125 | 766 |
| Levels | 6 (2-6 digit) | 4 (section, division, group, class) |
| Region | North America | Global |
| Detail | Very granular | Moderate |
| Crosswalk | 3,418 edges to ISIC | 3,418 edges to NAICS |
| Best for | US regulatory, detailed analysis | International comparison |

### NAICS 2022 vs NACE Rev 2

| Feature | NAICS 2022 | NACE Rev 2 |
|---------|-----------|-----------|
| Codes | 2,125 | 996 |
| Levels | 6 | 4 |
| Region | North America | European Union |
| Detail | Very granular | Moderate |
| Best for | US/Canada/Mexico | EU regulatory, Eurostat |

### NAICS 2022 vs SIC 1987

| Feature | NAICS 2022 | SIC 1987 |
|---------|-----------|---------|
| Codes | 2,125 | 1,176 |
| Status | Current | Legacy (but still used) |
| Region | North America | USA/UK |
| Best for | Current analysis | SEC filings, historical data |

## How to translate between systems

```bash
# Translate NAICS 6211 to all equivalent systems
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/6211/translations

# Direct equivalences with match types
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/6211/equivalences

# Find NAICS codes with no NACE equivalent
curl "https://worldoftaxonomy.com/api/v1/diff?a=naics_2022&b=nace_rev2"
```

For systems without direct crosswalks, follow the translation path through hub systems (see the [Crosswalk Map](crosswalk-map) guide).
## Domain-specific extensions

When a standard industry code is too broad for your use case, WorldOfTaxonomy provides domain-specific vocabularies:

| NAICS Sector | Domain Vocabularies | Example Codes |
|-------------|---------------------|---------------|
| 484 Truck Transportation | Freight types, vehicle classes, cargo, carrier operations | 44 + 23 + 46 + 27 |
| 11 Agriculture | Crop types, livestock, farming methods, commodity grades | 46 + 27 + 28 + 30 |
| 21 Mining | Mineral types, extraction methods, reserve classification | 25 + 20 + 12 |
| 22 Utilities | Energy sources, grid regions, tariff structures | 17 + 15 + 26 |
| 23 Construction | Trade types, building types, project delivery | 20 + 17 + 22 |

These domain taxonomies are crosswalked back to their parent NAICS/ISIC sector codes, so you can drill down from a broad industry classification to specialized detail.

========================================================================
# Medical and Health Classification Systems Compared
========================================================================

## Medical and Health Classification Systems Compared

> **TL;DR:** ICD-10-CM for US billing, ICD-11 for global reporting, LOINC for lab tests, ATC for drugs, MeSH for research. WorldOfTaxonomy connects all of these (and more) - 568K+ health codes across 100+ systems with crosswalk edges between them.
---

## System overview

| System | Codes | Purpose | Authority |
|--------|-------|---------|-----------|
| ICD-11 MMS | 37,052 | Disease classification (latest WHO standard) | WHO |
| ICD-10-CM | 97,606 | US clinical modification for diagnoses | CMS/NCHS |
| ICD-10-PCS | 79,987 | US procedure coding system | CMS |
| LOINC | 102,751 | Laboratory and clinical observations | Regenstrief Institute |
| MeSH | 31,124 | Medical literature subject headings | NLM |
| ATC WHO 2021 | 6,440 | Drug classification by therapeutic use | WHO |
| NCI Thesaurus | 211,072 | Cancer research terminology | National Cancer Institute |
| NDC | 112,077 | National drug codes (US) | FDA |
| SNOMED CT | ~20 (skeleton) | Clinical terminology reference | SNOMED International |
| CPT | ~18 (skeleton) | Medical procedure codes (US) | AMA |

> SNOMED CT and CPT are included as structural placeholders. Full datasets require licenses from SNOMED International and the AMA respectively.

## How health systems connect

```mermaid
graph TB
  subgraph Diagnoses["Diagnosis Systems"]
    ICD10CM["ICD-10-CM\n97,606 codes"]
    ICD11["ICD-11 MMS\n37,052 codes"]
    ICD10PCS["ICD-10-PCS\n79,987 codes"]
  end
  subgraph Drugs["Drug Systems"]
    ATC["ATC WHO 2021\n6,440 codes"]
    NDC["NDC\n112,077 codes"]
    RXNORM["RxNorm (skeleton)"]
  end
  subgraph Research["Research & Lab"]
    MESH["MeSH\n31,124 descriptors"]
    LOINC["LOINC\n102,751 observations"]
    NCI["NCI Thesaurus\n211,072 terms"]
  end
  subgraph Clinical["Clinical"]
    SNOMED["SNOMED CT\n(skeleton)"]
    CPT["CPT (skeleton)"]
  end
  ICD10CM <-.-> ICD11
  ICD10CM <-.-> MESH
  ATC <-.-> ICD10CM
  LOINC <-.-> ICD10CM
  SNOMED <-.-> ICD10CM
  NDC <-.-> ATC
  NCI <-.-> MESH
  CPT <-.-> ICD10PCS
```

## ICD-10-CM vs ICD-11: Which to use?

### ICD-10-CM (United States)

ICD-10-CM is the US clinical modification of the WHO's ICD-10. It is required for US healthcare billing and reporting.
- **97,606 codes** - the most granular diagnosis system in the graph
- **Structure**: 3-7 character alphanumeric codes (e.g., E11.65 - Type 2 diabetes with hyperglycemia)
- **Required by**: CMS, US health insurers, HIPAA transactions
- **Updated**: annually (October 1 each year)

### ICD-11 MMS (Global)

ICD-11 is the latest WHO revision, adopted by the World Health Assembly in 2019.

- **37,052 codes** with extension codes for additional detail
- **Structure**: Alphanumeric with cluster and post-coordination
- **Status**: Official WHO standard since January 2022

### When to use which

| Scenario | System | Why |
|----------|--------|-----|
| US hospital billing | ICD-10-CM | Required by CMS |
| US procedure coding | ICD-10-PCS | Required for inpatient procedures |
| WHO mortality/morbidity reporting | ICD-11 | Current WHO standard |
| New health IT system (non-US) | ICD-11 | Forward-looking adoption |
| International health research | ICD-11 | Global comparability |
| Legacy system integration | ICD-10-CM | Existing infrastructure |

## LOINC - Laboratory and clinical observations

LOINC (Logical Observation Identifiers Names and Codes) is the universal standard for identifying health measurements, observations, and documents.

- **102,751 codes** - the largest observation vocabulary
- **Use cases**: lab test orders and results, clinical documents, patient surveys
- **Structure**: 5-7 digit numeric codes with check digit
- **Required by**: US federal health agencies, HL7 FHIR implementations

> LOINC does not classify diseases (that is ICD's role). It classifies what was measured or observed. A LOINC code identifies the test; an ICD code identifies the condition.

## MeSH - Medical subject headings

MeSH is the controlled vocabulary used for indexing biomedical literature in PubMed/MEDLINE.
- **31,124 descriptors** organized in a hierarchical tree
- **Use cases**: literature search, research categorization, knowledge organization
- **Structure**: 16 top-level categories branching into specific terms
- **Maintained by**: US National Library of Medicine

## ATC - Drug classification

The Anatomical Therapeutic Chemical (ATC) classification organizes drugs by the organ system they target and their therapeutic properties.

- **6,440 codes** across 5 hierarchical levels
- **Structure**: 7-character codes (e.g., A10BA02 = metformin)
- **Levels**: Anatomical group, Therapeutic subgroup, Pharmacological subgroup, Chemical subgroup, Chemical substance
- **Maintained by**: WHO Collaborating Centre for Drug Statistics

```mermaid
graph TD
  A["A - Alimentary Tract\nand Metabolism"] --> A10["A10 - Drugs Used\nin Diabetes"]
  A10 --> A10B["A10B - Blood Glucose\nLowering Drugs"]
  A10B --> A10BA["A10BA - Biguanides"]
  A10BA --> A10BA02["A10BA02\nMetformin"]
```

## Domain-specific health vocabularies

WorldOfTaxonomy includes domain taxonomies for healthcare specialization:

| Domain | Codes | Coverage |
|--------|-------|----------|
| Hospital Department Types | 18 | Department classification |
| Nursing Specialty Types | 17 | Nursing specializations |
| Lab Test Category Types | 17 | Laboratory categories |
| Surgical Specialty Types | 17 | Surgical specializations |
| Pharmacy Practice Types | 16 | Pharmacy settings |
| Health Care Settings | 23 | Care delivery settings |
| Health Care Payer Types | 18 | Insurance/payer categories |
| Health Care Delivery Models | 18 | Payment and delivery models |
| Mental Health Service Types | 22 | Behavioral health |
| Dental Service Types | 18 | Oral health |

## API examples

```bash
# Search for a medical term across all systems
curl "https://worldoftaxonomy.com/api/v1/search?q=diabetes&grouped=true"

# Browse ICD-10-CM hierarchy
curl https://worldoftaxonomy.com/api/v1/systems/icd10_cm/nodes/E11/children

# Get ICD-10-CM code detail
curl https://worldoftaxonomy.com/api/v1/systems/icd10_cm/nodes/E11.65

# Browse ATC hierarchy from top level
curl https://worldoftaxonomy.com/api/v1/systems/atc_who_2021/nodes/A10/children

# LOINC system overview
curl https://worldoftaxonomy.com/api/v1/systems/loinc

# Cross-system equivalences for a diagnosis code
curl https://worldoftaxonomy.com/api/v1/systems/icd10_cm/nodes/E11/equivalences
```

## Use cases

| Who | What | Systems |
|-----|------|---------|
| Hospital IT teams | Map diagnoses to billing codes | ICD-10-CM, ICD-10-PCS, CPT |
| Pharma researchers | Link drugs to indications | ATC, ICD-10-CM, MeSH |
| Public health agencies | Compare disease burden globally | ICD-11, ICD-10-CM |
| Lab information systems | Standardize test identifiers | LOINC |
| Clinical NLP pipelines | Normalize extracted terms | SNOMED CT, ICD-10-CM, MeSH |
| Health AI agents | Navigate the full health taxonomy | All of the above via MCP |

========================================================================
# Trade and Product Classification Guide
========================================================================

## Trade and Product Classification Guide

> **TL;DR:** HS for customs, CPC to bridge trade and industry, UNSPSC for procurement (77K codes). This guide shows how the trade classification systems relate, which one to use, and how to navigate between them with 11,686+ crosswalk edges.
---

## System comparison

| System | Codes | Purpose | Maintained By |
|--------|-------|---------|---------------|
| HS 2022 | 6,960 | International customs tariffs | World Customs Organization |
| CPC v2.1 | 4,596 | Statistical product classification | United Nations |
| UNSPSC v24 | 77,337 | Procurement and spend analysis | GS1 US |
| SITC Rev 4 | 77 | Trade statistics (aggregated) | United Nations |
| BEC Rev 5 | 29 | Broad economic categories | United Nations |
| HTS (US) | 120 | US-specific tariff schedule | US International Trade Commission |
| CN 2024 | 118 | EU Combined Nomenclature | European Commission |

## How these systems relate

### The HS family tree

The Harmonized System (HS) is the foundation of international trade classification. Other systems build on it.

```mermaid
graph TD
  HS["HS 2022 (WCO)\n6,960 codes\nGlobal foundation"] --> HTS["HTS (US)\nUS-specific subheadings"]
  HS --> CN["CN 2024 (EU)\nEU-specific subheadings"]
  HS --> AHTN["ASEAN Tariff (AHTN)\nSoutheast Asia"]
  HS --> NCM["MERCOSUR Tariff (NCM)\nSouth America"]
  HS --> AFCFTA["AfCFTA Tariff\nAfrica"]
  HS --> GCC["GCC Common Tariff\nGulf States"]
```

Every country that trades internationally uses HS at the 6-digit level. National extensions add more digits for country-specific detail.

### The statistical bridge

CPC v2.1 bridges product classification and industry classification. This is where trade meets production.

```mermaid
graph LR
  HS["HS 2022\n6,960 trade codes"] <-->|11,686 edges| CPC["CPC v2.1\n4,596 product codes"]
  CPC <-->|5,430 edges| ISIC["ISIC Rev 4\n766 industry codes"]
  CPC -.-> UNSPSC["UNSPSC v24\n77,337 procurement codes"]
```

This means you can trace: a **trade code** (HS) to its **product category** (CPC) to the **industry that produces it** (ISIC/NAICS).
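
The trade-to-industry trace can be pictured as composing two crosswalks at the code level. The sketch below assumes the HS-to-CPC and CPC-to-ISIC mappings have already been fetched into plain dictionaries; the sample pairs and the `trace` helper are hypothetical illustrations, not data or functions from the graph.

```python
# Hypothetical sample mappings for illustration only; real edges come from
# the /equivalences endpoint and may differ.
HS_TO_CPC = {"1001": ["0111"], "1005": ["0112"]}
CPC_TO_ISIC = {"0111": ["0111"], "0112": ["0111"]}


def trace(hs_code, hs_to_cpc=HS_TO_CPC, cpc_to_isic=CPC_TO_ISIC):
    """Follow HS -> CPC -> ISIC and return every (hs, cpc, isic) chain found."""
    chains = []
    for cpc_code in hs_to_cpc.get(hs_code, []):
        for isic_code in cpc_to_isic.get(cpc_code, []):
            chains.append((hs_code, cpc_code, isic_code))
    return chains


print(trace("1001"))
# [('1001', '0111', '0111')]
```

Because both crosswalks can be one-to-many, a single HS code may fan out into several chains; keeping the intermediate CPC code in each tuple makes it possible to audit which bridge produced a given industry match.
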
### Aggregation for statistics

SITC and BEC aggregate trade data at higher levels for economic analysis:

```mermaid
graph TD
  HS["HS 2022\n6,960 detailed codes"] --> SITC["SITC Rev 4\n77 codes\nTrade flow analysis"]
  HS --> BEC["BEC Rev 5\n29 codes\nEconomic category analysis"]
  BEC --> SNA["Maps to SNA\ncategories"]
```

## Which system to use

| Purpose | Recommended System | Why |
|---------|-------------------|-----|
| Customs declarations | HS 2022 (or national variant) | Legally required for trade |
| US import/export filings | HTS (US) | Required by US Customs |
| EU trade compliance | CN 2024 | Required by EU customs |
| Procurement/spend analysis | UNSPSC v24 | Most granular (77K codes) |
| International trade statistics | SITC Rev 4 | Designed for aggregate analysis |
| Economic modeling | BEC Rev 5 | Maps to SNA categories |
| Product-to-industry mapping | CPC v2.1 | Bridges HS to ISIC |

## HS code structure

HS codes use a hierarchical 6-digit structure:

| Level | Digits | Example | Description |
|-------|--------|---------|-------------|
| Chapter | 2 | 01 | Live animals |
| Heading | 4 | 0101 | Horses, asses, mules |
| Subheading | 6 | 010121 | Pure-bred horses |

National extensions add further digits. HTS (US) goes up to 10 digits. CN (EU) uses 8 digits.

## CPC code structure

CPC v2.1 uses a 5-level hierarchy:

| Level | Example | Description |
|-------|---------|-------------|
| Section | 0 | Agriculture, forestry and fishery products |
| Division | 01 | Products of agriculture, horticulture |
| Group | 011 | Cereals |
| Class | 0111 | Wheat |
| Subclass | 01110 | Wheat, unmilled |

## UNSPSC structure

UNSPSC uses an 8-digit hierarchy across 4 levels:

| Level | Example | Description |
|-------|---------|-------------|
| Segment | 10 | Live Plant and Animal Material |
| Family | 1010 | Live animals |
| Class | 101015 | Livestock |
| Commodity | 10101502 | Dogs |

With 77,337 codes, UNSPSC is the most detailed product classification in the graph.
It is widely used in procurement platforms and spend analytics.

## Crosswalk navigation

### Translate an HS code to an industry

```bash
# Get CPC equivalences for an HS code
curl https://worldoftaxonomy.com/api/v1/systems/hs_2022/nodes/0101/equivalences

# Translate HS code to all connected systems
curl https://worldoftaxonomy.com/api/v1/systems/hs_2022/nodes/0101/translations
```

### Find trade codes for an industry

```bash
# Start from a NAICS code, get all translations including HS/CPC
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022/nodes/1111/translations

# Or use the search to find trade codes by product name
curl "https://worldoftaxonomy.com/api/v1/search?q=wheat&grouped=true"
```

### Find gaps

```bash
# HS codes with no CPC equivalent
curl "https://worldoftaxonomy.com/api/v1/diff?a=hs_2022&b=cpc_v21"
```

## Use cases

| Who | What | Systems |
|-----|------|---------|
| Customs brokers | Classify goods for import/export | HS 2022, HTS, CN 2024 |
| Procurement teams | Categorize spend across suppliers | UNSPSC v24 |
| Trade economists | Analyze bilateral trade flows | SITC Rev 4, BEC Rev 5 |
| Supply chain analysts | Map products to producing industries | CPC v2.1, ISIC Rev 4 |
| Compliance officers | Verify tariff classification | HS 2022 + national variants |
| AI trade agents | Automate classification via MCP | All of the above |

## MCP tools for trade classification

| Tool | Purpose |
|------|---------|
| `search_classifications` | Find trade codes by product name |
| `get_equivalences` | Get crosswalk to other systems |
| `translate_code` | Direct translation between systems |
| `browse_children` | Explore HS/CPC/UNSPSC hierarchy |
| `get_crosswalk_coverage` | Check crosswalk completeness |

========================================================================
# Occupation Classification Systems Compared
========================================================================

## Occupation Classification Systems Compared

> **TL;DR:** SOC for US labor data, ISCO for global comparison, ESCO for European skills matching, O*NET for detailed occupation attributes. Connected by 10,000+ crosswalk edges with education-to-career pathways through CIP.

---

## System overview

| System | Codes | Region | Purpose | Authority |
|--------|-------|--------|---------|-----------|
| ISCO-08 | 619 | Global (ILO) | International occupation standard | International Labour Organization |
| SOC 2018 | 1,447 | United States | US federal occupation classification | Bureau of Labor Statistics |
| O*NET-SOC | 867 | United States | Detailed occupation database with skills | Department of Labor |
| ESCO Occupations | 3,045 | Europe (EU) | European occupation taxonomy | European Commission |
| ESCO Skills | 14,247 | Europe (EU) | Skills and competences taxonomy | European Commission |
| ANZSCO 2022 | 1,590 | Australia/NZ | AU/NZ occupation standard | ABS/Stats NZ |
| NOC 2021 | 51 | Canada | Canadian occupation classification | Statistics Canada |
| UK SOC 2020 | 43 | United Kingdom | UK occupation standard | ONS |
| KldB 2010 | 54 | Germany | German occupation classification | Federal Employment Agency |
| ROME v4 | 93 | France | French job/occupation repertoire | Pôle emploi |

## How occupation systems connect

```mermaid
graph TB
  subgraph Education["Education"]
    CIP["CIP 2020\n2,848 programs"]
    ISCEDF["ISCED-F 2013\n122 fields"]
  end
  subgraph US_Occ["United States"]
    SOC["SOC 2018\n1,447 occupations"]
    ONET["O*NET-SOC\n867 occupations\n+ skills, abilities, interests"]
  end
  subgraph Global_Occ["Global"]
    ISCO["ISCO-08\n619 occupations"]
  end
  subgraph EU_Occ["Europe"]
    ESCO_O["ESCO Occupations\n3,045"]
    ESCO_S["ESCO Skills\n14,247"]
  end
  subgraph Industry["Industry"]
    NAICS["NAICS 2022"]
    ISIC["ISIC Rev 4"]
  end
  CIP -->|5,903 edges| SOC
  CIP -->|1,615 edges| ISCEDF
  SOC <-->|992 edges| ISCO
  SOC <-->|1,734 edges| ONET
  ISCO <-->|6,048 edges| ESCO_O
  ESCO_O --- ESCO_S
  ISCO -->|44 edges| ISIC
  SOC -.-> NAICS
```

## SOC vs ISCO: The two major frameworks

### SOC 2018 (Standard Occupational Classification)

- **1,447 codes** across 4 levels (23 major groups, 98 minor groups, 459 broad occupations, 867 detailed occupations)
- **Structure**: 2-digit major groups down to 6-digit detailed occupations
- **Used for**: US government statistics, labor market data, visa classifications (H-1B), wage surveys
- **Updated**: approximately every 10 years

### ISCO-08 (International Standard Classification of Occupations)

- **619 codes** across 4 levels (10 major groups, 43 sub-major groups, 130 minor groups, 436 unit groups)
- **Structure**: 1-digit major groups down to 4-digit unit groups
- **Used for**: International labor statistics, ILO reporting, basis for national systems
- **Key difference**: Broader categories than SOC; designed for international comparison

### Crosswalk between SOC and ISCO

SOC 2018 and ISCO-08 are connected by **992 crosswalk edges**. The mapping is many-to-many because SOC is more granular than ISCO.

```bash
# Translate a SOC code to ISCO
curl https://worldoftaxonomy.com/api/v1/systems/soc_2018/nodes/29-1211/equivalences
```

## ESCO - European skills and occupations

ESCO is the EU's multilingual classification connecting occupations to skills:

- **3,045 occupations** mapped to ISCO-08 (6,048 crosswalk edges)
- **14,247 skills and competences** linked to occupations
- **Key advantage**: Skills-based matching across EU labor markets
- **Use cases**: Job portals, skills gap analysis, career guidance, Europass

```mermaid
graph LR
  ESCO_O["ESCO Occupations\n3,045"] <-->|6,048 edges| ISCO["ISCO-08\n619"]
  ESCO_O --- ESCO_S["ESCO Skills\n14,247"]
  ESCO_S -.->|linked to| ESCO_O
```

> ESCO is the only system in the graph that connects occupations directly to skills. This makes it essential for AI-powered job matching and workforce analytics.
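
One simple way to picture skills-based matching is to score the overlap between a candidate's skills and the skills linked to each occupation. The sketch below uses Jaccard similarity, which is a technique chosen here for illustration and not something ESCO or WorldOfTaxonomy prescribes; the occupation-to-skill links and both helper functions are hypothetical.

```python
def skill_overlap(candidate_skills, occupation_skills):
    """Jaccard similarity between two skill sets (0.0 when both are empty)."""
    candidate, required = set(candidate_skills), set(occupation_skills)
    union = candidate | required
    return len(candidate & required) / len(union) if union else 0.0


def rank_occupations(candidate_skills, occupations):
    """Return (occupation, score) pairs, best match first, ties by name."""
    scored = [
        (name, skill_overlap(candidate_skills, skills))
        for name, skills in occupations.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical occupation-to-skill links; not real ESCO data.
OCCUPATIONS = {
    "data analyst": {"sql", "statistics", "data visualisation"},
    "software developer": {"python", "sql", "version control"},
}

ranking = rank_occupations({"python", "sql"}, OCCUPATIONS)
print(ranking[0][0])  # software developer
```

In practice the skill sets would come from the ESCO Occupations and ESCO Skills systems, and a production matcher would likely weight essential skills above optional ones rather than treat every skill equally.
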
## O*NET - Occupation information network

O*NET extends SOC with rich attribute data:

- **867 occupations** mapped to SOC 2018 (1,734 crosswalk edges)
- **Includes**: Knowledge areas, abilities, work activities, work context, interests (RIASEC), work values, work styles
- **Key advantage**: Most detailed occupation attribute data available
- **Use cases**: Career exploration, job analysis, workforce development

| O*NET Component | Items | What It Measures |
|-----------------|-------|-----------------|
| Knowledge Areas | 14 | Subject domains required |
| Abilities | 17 | Cognitive, physical, sensory capabilities |
| Work Activities | 16 | General types of job behaviors |
| Work Context | 15 | Physical and social work environment |
| Interests (RIASEC) | 13 | Holland occupational interest types |
| Work Values | 14 | What workers find important |
| Work Styles | 17 | Personal characteristics for performance |

## Education-to-occupation pathways

The crosswalk topology connects education to occupations:

```mermaid
graph LR
  CIP["CIP 2020\n2,848 instructional\nprograms"] -->|5,903 edges| SOC["SOC 2018\n1,447 US\noccupations"]
  CIP -->|1,615 edges| ISCEDF["ISCED-F 2013\n122 fields\nof education"]
  ISCED["ISCED 2011\n20 education\nlevels"] -->|25 edges| ISCO["ISCO-08\n619 global\noccupations"]
```

This lets you answer questions like "What occupations do graduates of CIP 51.0912 (Physician Assistant) work in?"
```bash
curl https://worldoftaxonomy.com/api/v1/systems/cip_2020/nodes/51.0912/equivalences
```

## Occupation-to-industry mapping

Occupations connect to industries through two paths:

| Link | Edges | Use Case |
|------|-------|----------|
| SOC 2018 to NAICS 2022 | 55 | US workforce-to-industry analysis |
| ISCO-08 to ISIC Rev 4 | 44 | Global occupation-industry mapping |

## Which system to use

| Purpose | Recommended System | Why |
|---------|-------------------|-----|
| US labor statistics | SOC 2018 | Required by BLS/Census |
| International comparison | ISCO-08 | ILO standard |
| European job matching | ESCO | EU multilingual, skills-linked |
| Career exploration | O*NET-SOC | Rich attribute data |
| Australian/NZ workforce | ANZSCO 2022 | National standard |
| Canadian workforce | NOC 2021 | National standard |
| Skills gap analysis | ESCO Skills | 14K skills taxonomy |
| Education-to-career mapping | CIP 2020 + SOC | 5,903 crosswalk edges |

## Use cases

| Who | What | Systems |
|-----|------|---------|
| HR analytics teams | Map job postings to standard codes | SOC 2018, ISCO-08 |
| Career counselors | Match education to occupations | CIP 2020, SOC 2018, O*NET |
| EU job portals | Skills-based matching across borders | ESCO Occupations + Skills |
| Immigration lawyers | Classify occupations for visa applications | SOC 2018 (H-1B) |
| Workforce planners | Identify skills gaps by region | ESCO Skills, O*NET |
| AI recruitment agents | Automate classification via MCP | All of the above |

## MCP tools for occupation data

| Tool | Purpose |
|------|---------|
| `search_classifications` | Find occupations by job title |
| `get_equivalences` | Cross-system occupation mapping |
| `translate_code` | Translate between SOC, ISCO, ESCO |
| `browse_children` | Navigate occupation hierarchy |
| `get_country_taxonomy_profile` | What occupation systems apply to a country |

========================================================================
# Categories and Sectors - How Systems Are Organized
========================================================================

## Categories and Sectors - How Systems Are Organized

> **TL;DR:** 1,000+ classification systems are organized into 16 categories spanning industry, trade, health, occupation, regulation, and domain-specific vocabularies. This guide explains the category structure and how to navigate it.

---

## The 16 categories

```mermaid
graph TD
  WOT["WorldOfTaxonomy\n1,000+ systems"] --> IND["Industry\n~150 systems"]
  WOT --> TRADE["Product/Trade\n~20 systems"]
  WOT --> OCC["Occupation\n~15 systems"]
  WOT --> EDU["Education\n~10 systems"]
  WOT --> LIFE["Life Sciences\n~100+ systems"]
  WOT --> GEO["Geographic\n~10 systems"]
  WOT --> FIN["Financial/Environmental\n~20 systems"]
  WOT --> REG["Regulatory\n~100+ systems"]
  WOT --> ISO["ISO Standards\n~25 systems"]
  WOT --> INTL["Intl Agreements\n~25 systems"]
  WOT --> ACAD["Academic\n~15 systems"]
  WOT --> PAT["Patent\n1 system, 254K codes"]
  WOT --> DTECH["Domain: Technology\n~50 systems"]
  WOT --> DFIN["Domain: Finance\n~30 systems"]
  WOT --> DSEC["Domain: Sector-Specific\n~200+ systems"]
  WOT --> DREG["Domain: Regulatory Detail\n~50+ systems"]
```

| Category | Systems | Description |
|----------|---------|-------------|
| Industry | ~150+ | Economic activity classification (NAICS, ISIC, NACE, SIC, national variants) |
| Product/Trade | ~20+ | Goods and services classification (HS, CPC, UNSPSC, SITC) |
| Occupation | ~15+ | Job and skills classification (SOC, ISCO, ESCO, O*NET) |
| Education | ~10+ | Educational programs and levels (ISCED, CIP) |
| Life Sciences | ~100+ | Pharmaceuticals, clinical coding, diagnostics, devices, biotech, health informatics |
| Geographic | ~10+ | Country, region, and subdivision codes (ISO 3166, NUTS, FIPS) |
| Financial/Environmental | ~20+ | Sustainability, accounting, and governance (SASB, EU Taxonomy, GHG, COFOG) |
| Regulatory | ~100+ | Laws, standards, and compliance frameworks (HIPAA, GDPR, OSHA, FDA, SEC) |
| ISO Standards | ~25+ | Management system standards (ISO 9001, 14001, 27001, 45001) |
| International Agreements | ~25+ | Treaties and global frameworks (Basel, FATF, Paris Agreement, ILO) |
| Academic/Research | ~15+ | Subject classification for scholarly work (arXiv, MSC, JEL, ACM CCS) |
| Patent | 1 | Patent classification (CPC - 254K codes) |
| Domain: Technology | ~50+ | Software, AI, cybersecurity, cloud, data taxonomies |
| Domain: Finance | ~30+ | Insurance, banking, investment, payment taxonomies |
| Domain: Sector-Specific | ~200+ | Transportation, agriculture, mining, construction, energy, and other sector vocabularies |

## Category counts in the knowledge graph

```mermaid
graph LR
  subgraph By_Nodes["Distribution by Node Count"]
    NCI_L["Life Sciences\n568K+ nodes"]
    PAT_L["Patent CPC\n254K nodes"]
    TRADE_L["Product/Trade\n100K+ nodes"]
    IND_L["Industry\n50K+ nodes"]
    OCC_L["Occupation/Skills\n40K+ nodes"]
    DOM_L["Domain Vocabularies\n10K+ nodes"]
  end
```

| Category | Systems | Nodes | What drives the count |
|----------|---------|-------|----------------------|
| Life Sciences | ~100+ | 568K+ | ICD-10-CM (97K), NCI Thesaurus (211K), NDC (112K), LOINC (102K) |
| Patent | 1 | 254K | Patent CPC is a single massive hierarchy |
| Product/Trade | ~20 | 100K+ | UNSPSC dominates with 77K codes |
| Industry | ~150+ | 50K+ | Many national NACE/ISIC variants at ~1K codes each |
| Occupation/Skills | ~15 | 40K+ | ESCO Skills at 14K, ESCO Occupations at 3K |
| Domain vocabularies | ~300+ | 10K+ | Typically 15-30 codes each |
| Regulatory/Compliance | ~100+ | 5K+ | Frameworks range from 15-50 articles each |
| Everything else | ~300 | 15K+ | Geographic, academic, financial, ISO |

## How categories map to API queries

### Browse by category

```bash
# Get all systems (includes category metadata)
curl https://worldoftaxonomy.com/api/v1/systems

# Group by region
curl "https://worldoftaxonomy.com/api/v1/systems?group_by=region"

# Filter by country to find relevant systems
curl "https://worldoftaxonomy.com/api/v1/systems?country=US"
```

### Search within a category

The search endpoint searches across all systems. Use keywords to focus on specific domains:

```bash
# Find health-related codes
curl "https://worldoftaxonomy.com/api/v1/search?q=diabetes&grouped=true"

# Find trade codes
curl "https://worldoftaxonomy.com/api/v1/search?q=cotton&grouped=true"

# Find occupation codes
curl "https://worldoftaxonomy.com/api/v1/search?q=software+engineer&grouped=true"
```

## Domain-specific vocabularies

Domain taxonomies extend the standard classification systems with specialized vocabularies. They are organized by NAICS 2-digit sector.

### Sector-specific domains

| NAICS Sector | Domain Vocabularies | Total Codes |
|-------------|---------------------|-------------|
| 11 Agriculture | Crop types, livestock, farming methods, commodity grades, equipment, input supply, land classification, post-harvest | 300+ |
| 21 Mining | Mineral types, extraction methods, reserve classification, equipment, project lifecycle, safety | 130+ |
| 22 Utilities | Energy sources, grid regions, tariff structures, infrastructure assets, regulatory ownership | 130+ |
| 23 Construction | Trade types, building types, project delivery, material systems, sustainability | 130+ |
| 31-33 Manufacturing | Process types, quality, operations models, industry verticals, supply chain, facility config | 120+ |
| 44-45 Retail | Channel types, merchandise categories, fulfillment, pricing strategies, store formats | 100+ |
| 52 Finance | Instrument types, market structure, regulatory frameworks, client segments | 100+ |
| 484 Truck Transportation | Freight types, vehicle classes, cargo, carrier operations, pricing, compliance | 200+ |

### Emerging sector domains

| Domain | Focus | Systems |
|--------|-------|---------|
| AI and Data | Model types, deployment, ethics, governance | 4 |
| Cybersecurity | Threats, frameworks, zero trust, SIEM | 10+ |
| Space and Satellite | Orbital classification, regulatory, licensing | 4 |
| Climate Technology | Finance instruments, policy mechanisms | 4 |
| Quantum Computing | Application domains, commercialization stages | 4 |
| Digital Assets/Web3 | Regulatory frameworks, infrastructure layers | 4 |
| Autonomous Systems | Application domains, sensing technology | 4 |
| Synthetic Biology | Application sectors, biosafety levels | 4 |

## Life Sciences sub-sectors

The Life Sciences category (~100+ systems, ~568K nodes) is the largest by node count. It is organized into 13 sub-sectors:

| Sub-Sector | Key Systems |
|------------|-------------|
| Diagnoses and Classification | ICD-10-CM, ICD-11, ICD-10-PCS, DSM-5, SNOMED CT, ICPC-2 |
| Pharmaceuticals | ATC, NDC, RxNorm, EDQM, WHO Essential Medicines |
| Diagnostics and Lab | LOINC, lab test types, imaging modalities, biomarkers |
| Procedures and Billing | CPT, HCPCS, MS-DRG, G-DRG, NUCC |
| Oncology and Research | NCI Thesaurus, MeSH, OMIM, Orphanet, CTCAE |
| Medical Devices | GMDN, implant types, surgical instruments, sterilization |
| Biotechnology | Biotech types, biosimilars, gene therapy, cell therapy |
| Synthetic Biology | Synbio types, application sectors, biosafety levels |
| Health Informatics | FHIR, DICOM, telemedicine, clinical decision support |
| Nursing and Allied Health | ICN, NIC, NANDA, nursing specialties, allied health |
| Payment and Delivery | HEDIS, CMS Star, care settings, payer types, value-based care |
| Health Regulation | HIPAA, FDA 21 CFR, DEA, CLIA, MDR, IVDR |
| Dental, Mental, and Veterinary | Dental, mental health, and veterinary service types |

## Navigating categories

Use the web app at [worldoftaxonomy.com](https://worldoftaxonomy.com) for visual exploration. The home page Industry Map shows all 16 categories. Click any category to search for systems in that domain.
Use the API for programmatic access:

```bash
# Get all systems with metadata
curl https://worldoftaxonomy.com/api/v1/systems

# Get country-specific systems (e.g., what applies in Germany)
curl "https://worldoftaxonomy.com/api/v1/systems?country=DE"

# Get crosswalk statistics to see which systems are most connected
curl https://worldoftaxonomy.com/api/v1/equivalences/stats
```

========================================================================
# Domain Taxonomies vs Official Standards
========================================================================

# Domain Taxonomies vs Official Standards

WorldOfTaxonomy ships two complementary kinds of classification system, and every public surface (web app, REST API, MCP server) now labels them explicitly so downstream consumers can treat them differently.

## The two categories

| Category | `category` value | System ID pattern | Examples | Role |
|----------|------------------|-------------------|----------|------|
| Domain taxonomy | `domain` | IDs start with `domain_` | `domain_truck_freight`, `domain_ai_deployment`, `domain_fintech_service` | Plain-language on-ramps curated by WorldOfTaxonomy. Shorter (15-50 nodes), written in working-industry vocabulary, and crosswalked into the relevant official standard. |
| Official standard | `standard` | Everything else | `naics_2022`, `isic_rev4`, `nace_rev2`, `soc_2018`, `icd10_cm`, `hs_2022` | Published by a government, intergovernmental body, or standards authority. These are the codes auditors, statistical agencies, and regulators require. |

The split is a pure function of `system_id`: if the ID starts with `domain_`, it is a domain taxonomy; otherwise it is an official standard. The Python helper `world_of_taxonomy.category.get_category()` and the TypeScript helper `frontend/src/lib/category.ts` are the two sources of truth and stay in sync.
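
The rule is small enough to restate in a few lines. The sketch below is a standalone re-implementation of the prefix check as described above; it borrows the name `get_category` for readability, but the packaged helper's actual signature and behavior are not shown in this guide and may differ.

```python
DOMAIN_PREFIX = "domain_"


def get_category(system_id: str) -> str:
    """Classify a system as 'domain' or 'standard' from its ID alone."""
    # Assumption: the prefix check is the whole rule, as the guide states.
    return "domain" if system_id.startswith(DOMAIN_PREFIX) else "standard"


for system_id in ("domain_truck_freight", "naics_2022", "icd10_cm"):
    print(system_id, "->", get_category(system_id))
```

Because the category depends only on the ID, clients can label systems and nodes locally without an extra API round trip.
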
## Why the split exists

Users describing a business in plain language ("telemedicine platform", "frozen-goods logistics", "AI inference startup") rarely know the NAICS code by heart. They read domain-taxonomy labels like "Telemedicine Modality Types" or "Cold Chain Types" much faster than six-digit NAICS numbers.

Domain taxonomies are therefore the front door: surface them first, let the user recognize their own business, then fan out through crosswalk edges into the matching NAICS, ISIC, NACE, SIC, or SOC codes that an accountant or statistical agency will accept.

## How each surface reflects the split

### Web app

- `/classify` shows two sections in order: "Start here: Domain taxonomies" followed by "Official standard codes". If only one category has matches, the heading is dropped and cards render as a flat list.
- `/system/{id}` shows a badge next to the system name: "Domain taxonomy" or "Official standard".
- `/system/{id}/node/{code}` splits cross-system equivalences into "Domain taxonomies" and "Official standards" sub-sections when both are present.

### REST API

- `GET /api/v1/systems` and `GET /api/v1/systems/{id}` return a `category` field (`"domain"` or `"standard"`).
- `GET /api/v1/systems?category=domain` (or `?category=standard`) filters the list.
- `POST /api/v1/classify/demo` returns `domain_matches` and `standard_matches` arrays instead of a single flat `matches` array. Each match carries its own `category` field. For compound inputs, each atom also has `domain_matches` and `standard_matches`.
- Every node returned by the API carries a `category` field derived from its parent system.

### MCP server

- The `classify_business` tool returns `domain_matches` and `standard_matches` (plus the same two arrays per atom for compound inputs).
- `list_classification_systems`, `search_classifications`, and `get_industry` stamp each node/system with a `category` field.

## Consuming the split

If you are building on top of WorldOfTaxonomy:

1. **Route users through domain taxonomies first** when the input is free text. They are written for humans.
2. **Fall back to official standards** when the user asks for a statistical code, needs to report to a government agency, or wants cross-country comparability.
3. **Use crosswalks** (`GET /api/v1/systems/{id}/nodes/{code}/equivalences`) to hop from a domain match to the official standard code. The domain taxonomies are pre-wired with equivalence edges into NAICS, ISIC, or other relevant standards.
4. **Never mix the two in a single ranked list** without signaling the category - users cannot tell at a glance that `domain_truck_freight` and `naics_2022` play different roles.

## Example

A request for "last-mile delivery for frozen groceries" returns:

- Domain matches: `domain_last_mile_delivery`, `domain_cold_chain`, `domain_freight_class`
- Standard matches: `naics_2022: 492110` (Couriers and Express Delivery Services), `isic_rev4: 5320` (Postal and courier activities)

The domain matches are recognizable instantly. The standard matches are what the user needs to give to their accountant.

## How the bridge works

Every one of the 434 domain taxonomies is wired to at least one NAICS 2022 anchor code. This means there is no such thing as a domain island: if a user's query surfaces a `domain_*` match, there is always a bridge edge to a standard reporting code right next to it.

The bridges are built in two passes:

1. **Sector-anchor generator.** A single mapping table (`world_of_taxonomy/ingest/domain_anchors.json`) maps every `domain_*` system to one to three NAICS 2022 sector anchors. The generator emits a bidirectional `equivalence` edge between each anchor and each level-1 code of the domain taxonomy, stamped with `match_type='broad'` and provenance `derived:sector_anchor:v1`.
2. **ISIC / NACE fan-out.** For every new NAICS->domain edge, a single self-join against the existing NAICS<->ISIC and NAICS<->NACE crosswalks produces parallel edges so European and UN users reach the same domain taxonomies through their native standards. These carry provenance `derived:sector_anchor:v1:fanout`.

Because `match_type` on every generated edge is `broad`, consumers can filter them out if they need exact-match statistical crosswalks only (the pre-existing NAICS<->ISIC / NAICS<->NACE exact edges are untouched).

## The four edge kinds

Every equivalence response now carries an `edge_kind` computed from whether each endpoint is a domain taxonomy (`system_id` starts with `domain_`) or an official standard:

| `edge_kind`         | Source   | Target   | Meaning |
|---------------------|----------|----------|---------|
| `standard_standard` | standard | standard | Classic statistical crosswalk (e.g. NAICS 6211 <-> ISIC 8620) |
| `standard_domain`   | standard | domain   | Official code bridging into a curated domain taxonomy (e.g. NAICS 6212 -> `domain_dental`) |
| `domain_standard`   | domain   | standard | The reverse: domain taxonomy bridging back to an official code |
| `domain_domain`     | domain   | domain   | Reserved for future cross-domain edges; not yet generated |

Filter on `edge_kind` to scope the graph exactly:

```
GET /api/v1/systems/naics_2022/nodes/6211/equivalences?edge_kind=standard_standard
GET /api/v1/systems/naics_2022/nodes/6211/equivalences?edge_kind=standard_domain,domain_standard
```

The MCP tool `list_crosswalks_by_kind` wraps the same filter for agents:

```
list_crosswalks_by_kind(edge_kind="standard_domain", system_id="naics_2022")
```

`source_category` and `target_category` are also returned alongside `edge_kind` so lazy consumers can filter without parsing the composite label.
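
As a rough illustration of how the composite label relates to its two endpoints, the sketch below derives `edge_kind` on the client side from the `domain_` prefix rule and keeps only classic statistical crosswalks. The helper names (`category_of`, `edge_kind_of`, `exact_statistical_edges`), the sample edges, the `D1` target code, and the `exact` match type on the first edge are all assumptions made for the example. The dictionary keys follow the `equivalence` table columns, and the actual API response shape may differ.

```python
from typing import Dict, List


def category_of(system_id: str) -> str:
    """Return 'domain' for domain_* systems, 'standard' otherwise."""
    return "domain" if system_id.startswith("domain_") else "standard"


def edge_kind_of(source_system: str, target_system: str) -> str:
    """Build the composite label, e.g. 'standard_domain'."""
    return f"{category_of(source_system)}_{category_of(target_system)}"


def exact_statistical_edges(edges: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep standard<->standard edges and drop generated 'broad' bridges."""
    return [
        edge
        for edge in edges
        if edge_kind_of(edge["source_system"], edge["target_system"]) == "standard_standard"
        and edge["match_type"] != "broad"
    ]


# Hypothetical sample edges; codes and match types are illustrative only.
sample_edges = [
    {"source_system": "naics_2022", "source_code": "6211",
     "target_system": "isic_rev4", "target_code": "8620", "match_type": "exact"},
    {"source_system": "naics_2022", "source_code": "6212",
     "target_system": "domain_dental", "target_code": "D1", "match_type": "broad"},
]

if __name__ == "__main__":
    for edge in exact_statistical_edges(sample_edges):
        print(edge["source_code"], "->", edge["target_system"], edge["target_code"])
```

When the server-side `edge_kind` query parameter is available, it is the simpler route. A local filter like this one is mainly useful for responses that have already been fetched or cached.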
========================================================================
# Data Quality and Provenance
========================================================================

## Data Quality and Provenance

> **TL;DR:** Every system is tagged with one of four provenance tiers - from official government downloads (Tier 1) to expert-curated domain vocabularies (Tier 4). SHA-256 hashes, source URLs, and dates are stored for audit. This guide explains the framework and how to verify data.

---

## Four-tier provenance framework

```mermaid
graph TD
    subgraph Tier1["Tier 1: Official Download"]
        T1["Source file from standards body\nSHA-256 hash stored\nExamples: NAICS, ISIC, LOINC, HS"]
    end
    subgraph Tier2["Tier 2: Structural Derivation"]
        T2["Derived from official system\n1:1 structural mapping verified\nExamples: WZ, NAF, ATECO (NACE variants)"]
    end
    subgraph Tier3["Tier 3: Manual Transcription"]
        T3["Transcribed from official publications\nSource URL and date recorded\nExamples: SIC 1987, some Asian/African systems"]
    end
    subgraph Tier4["Tier 4: Expert Curated"]
        T4["Domain expert knowledge\nPeer-reviewed structure\nExamples: All domain_* taxonomies"]
    end
    T1 --> T2
    T2 --> T3
    T3 --> T4
```

| Tier | Label | Description | Verification |
|------|-------|-------------|-------------|
| 1 | `official_download` | Data downloaded directly from the authoritative source | File hash stored for audit |
| 2 | `structural_derivation` | Derived from an official system (e.g., NACE national variants) | 1:1 structural mapping verified |
| 3 | `manual_transcription` | Transcribed from official publications (PDF, HTML, print) | Cross-checked against source |
| 4 | `expert_curated` | Curated by domain experts based on industry knowledge | Peer-reviewed structure |

### Tier 1: Official download

The gold standard. Data files (CSV, Excel, XML) are downloaded directly from the standards body's website. A SHA-256 hash of the source file is stored in the `source_file_hash` column for reproducibility.

**Examples**: NAICS 2022 (Census Bureau CSV), HS 2022 (WCO), ISIC Rev 4 (UN CSV), LOINC (Regenstrief download), ICD-10-CM (CMS), NCI Thesaurus (NCI)

### Tier 2: Structural derivation

Systems that reuse the structure of an official system with localized naming. For example, all EU NACE Rev 2 national variants (WZ 2008, NAF Rev 2, ATECO 2007, etc.) share the identical code structure.

**Examples**: WZ 2008 (Germany), ONACE 2008 (Austria), NOGA 2008 (Switzerland), all EU NACE national variants, all ISIC Rev 4 national adaptations

### Tier 3: Manual transcription

Data transcribed from official documents that do not provide machine-readable downloads. The original source URL and date are recorded for audit.

**Examples**: SIC 1987 (transcribed from OSHA HTML), some Asian and African national classifications

### Tier 4: Expert curated

Domain-specific vocabularies created by subject matter experts. These fill gaps where no official standard exists.

**Examples**: All `domain_*` taxonomies (truck freight, agriculture, mining, construction, cybersecurity, AI, etc.)

## Provenance metadata fields

Each classification system carries these audit fields:

| Field | Description | Example |
|-------|-------------|---------|
| `data_provenance` | Provenance tier | `official_download` |
| `source_url` | URL of the authoritative data source | `https://www.census.gov/naics/` |
| `source_date` | Date the source data was accessed/published | `2024-01-15` |
| `license` | License terms for the data | `Public Domain` |
| `source_file_hash` | SHA-256 hash of the original file (Tier 1 only) | `a3f2b7c...` |

## Querying provenance via API

### Get provenance for a system

```bash
curl https://worldoftaxonomy.com/api/v1/systems/naics_2022
```

Response includes `data_provenance`, `source_url`, `source_date`, `license`, and `source_file_hash`.
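
Once a Tier 1 source file has been downloaded from `source_url`, its hash can be compared with the `source_file_hash` value from this response. The sketch below is one way to do that with the Python standard library. It assumes the stored hash is a hexadecimal SHA-256 digest (the guide does not state the encoding), and the function names `sha256_of_file` and `matches_stored_hash` are hypothetical helpers, not part of the project.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_stored_hash(path: Path, stored_hash: str) -> bool:
    """Compare a local file against a stored hex digest.

    Assumption: the stored value is hex-encoded; case and surrounding
    whitespace are normalized before comparing.
    """
    return sha256_of_file(path) == stored_hash.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify_hash.py <downloaded-file> <source_file_hash>
    file_path, stored = Path(sys.argv[1]), sys.argv[2]
    print("match" if matches_stored_hash(file_path, stored) else "MISMATCH")
```

Reading in chunks keeps memory use flat for large source files. A mismatch does not by itself mean the data is wrong, since the standards body may have republished the file after `source_date`.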

### Audit report

```bash
# Full provenance audit across all systems
curl https://worldoftaxonomy.com/api/v1/audit
```

The audit report shows:

- Breakdown by provenance tier (system count, node count per tier)
- Tier 1 systems missing a file hash
- Tier 2 structural derivation count and node coverage
- Skeleton systems (placeholder entries awaiting full data)

### MCP audit tool

```bash
# Via MCP tools/call
get_audit_report {}
```

Returns the same audit data in a format suitable for AI agent consumption.

## Data verification practices

### Hash verification (Tier 1)

For official download systems, the `source_file_hash` lets you verify data integrity:

1. Download the original file from `source_url`
2. Compute its SHA-256 hash
3. Compare against the stored `source_file_hash`
4. If they match, your download is byte-for-byte the same file that WorldOfTaxonomy ingested

```mermaid
sequenceDiagram
    participant You
    participant WOT as WorldOfTaxonomy API
    participant Source as Standards Body
    You->>WOT: GET /systems/naics_2022
    WOT-->>You: source_url, source_file_hash
    You->>Source: Download original CSV
    Source-->>You: naics_2022.csv
    You->>You: sha256sum naics_2022.csv
    Note over You: Compare hash with source_file_hash
```

### Structural verification (Tier 2)

For structural derivation systems, you can verify:

1. The code structure matches the parent system exactly
2. Crosswalk edges are 1:1 (every code in the derived system maps to exactly one code in the parent)

### Cross-reference verification

For any system, you can cross-reference node counts and structure against the authoritative publication.

## Skeleton systems

Some systems are included as structural placeholders where the full dataset is not freely available (e.g., SNOMED CT, CPT). These are marked with low node counts and are included to preserve the crosswalk topology. Full data requires a license from the respective standards body.

| Skeleton System | Reason | License Holder |
|----------------|--------|----------------|
| SNOMED CT | Proprietary license | SNOMED International |
| CPT | Copyright protected | American Medical Association |
| RxNorm | Partial skeleton | NLM (some data freely available) |
| DSM-5 | Copyright protected | American Psychiatric Association |

## Reporting data quality issues

If you find incorrect data, missing codes, or wrong crosswalk mappings:

1. **GitHub**: File an issue on the project repository with system ID, code, expected vs actual value, and a link to the authoritative source
2. **API**: Include the `report_issue_url` from any API response for direct reporting

> All classification data in WorldOfTaxonomy is provided for informational purposes only. It should not be used as a substitute for official government or standards body publications. For regulatory, legal, or compliance purposes, always verify codes against the authoritative source.

========================================================================
# System Architecture and Data Flows
========================================================================

## System Architecture and Data Flows

> **TL;DR:** Three consumer interfaces (web app, REST API, MCP server) backed by PostgreSQL and a wiki knowledge layer. Data flows from 1,000 official sources through an ingestion pipeline into three core tables. Wiki content serves four channels from one source of truth.

---

## System architecture overview

The platform serves three consumer interfaces - a web application, a REST API, and an MCP server - all backed by a shared PostgreSQL database and wiki knowledge layer.

```mermaid
graph TB
    subgraph Data["Data Layer"]
        PG[(PostgreSQL)]
        WIKI["wiki/*.md files"]
    end
    subgraph Backend["Python Backend"]
        INGEST["Ingesters - 1,000 systems"]
        API["FastAPI REST API - /api/v1/*"]
        MCP["MCP Server - stdio transport"]
        WIKILOADER["Wiki Loader - wiki.py"]
    end
    subgraph Frontend["Next.js Frontend"]
        NEXT["Next.js 15 App Router"]
        GUIDE["/guide/* pages"]
    end
    subgraph Consumers
        BROWSER["Web Browsers"]
        AIAGENT["AI Agents - Claude, GPT, etc."]
        CRAWLER["AI Crawlers - Perplexity, etc."]
        DEV["Developer Applications"]
    end
    INGEST -->|ingest| PG
    API -->|query| PG
    MCP -->|query| PG
    WIKILOADER -->|read| WIKI
    MCP -->|instructions| WIKILOADER
    NEXT -->|proxy /api/*| API
    NEXT -->|read| WIKI
    GUIDE -->|render| WIKI
    BROWSER --> NEXT
    BROWSER --> GUIDE
    AIAGENT --> MCP
    CRAWLER -->|/llms-full.txt| NEXT
    DEV --> API
```

## Four-channel wiki data flow

The wiki system follows the "write once, serve four ways" pattern. A single set of curated markdown files feeds all distribution channels.

```mermaid
graph LR
    MD["wiki/*.md - Source of Truth"] --> CH1["Channel 1: Next.js /guide/slug - SEO Web Pages"]
    MD --> CH2["Channel 2: MCP instructions - AI Agent Context"]
    MD --> CH3["Channel 3: llms-full.txt - AI Crawler Discovery"]
    MD --> CH4["Channel 4: GET /api/v1/wiki - Developer API"]
    CH1 --> GOOGLE["Search Engines"]
    CH1 --> HUMANS["Human Readers"]
    CH2 --> AGENTS["AI Agents"]
    CH3 --> CRAWLERS["AI Crawlers"]
    CH4 --> DEVS["Developer Apps"]
```

| Channel | Format | Refresh | Audience |
|---------|--------|---------|----------|
| Web pages at /guide/ | Server-rendered HTML with SEO metadata | Static generation at build time | Human readers, search engines |
| MCP instructions | Plain text injected at session start | Loaded on MCP initialize | AI agents (Claude, GPT, Gemini) |
| llms-full.txt | Concatenated plain text | Regenerated on build | AI crawlers (Perplexity, Google AI) |
| Wiki API | JSON with raw markdown | On-demand from disk | Developer applications, RAG pipelines |

## Classification data ingestion pipeline

Raw data from official sources flows through the ingestion pipeline into three database tables.

```mermaid
graph TD
    subgraph Sources["Official Sources"]
        CSV["CSV files - NAICS, ISIC"]
        XLSX["Excel files - NACE, ANZSIC"]
        HTML["HTML/PDF - SIC, NIC"]
        CURATED["Expert-Curated - Domain taxonomies"]
    end
    subgraph Pipeline["Ingestion Pipeline"]
        PARSE["Parse and Validate"]
        UPSERT["Upsert Nodes into classification_node"]
        XWALK["Build Crosswalks into equivalence"]
        PROV["Set Provenance - 4-tier audit"]
    end
    subgraph DB["Database Tables"]
        SYS["classification_system - 1,000+ systems"]
        NODE["classification_node - 1.2M+ nodes"]
        EQUIV["equivalence - 321K+ edges"]
    end
    CSV --> PARSE
    XLSX --> PARSE
    HTML --> PARSE
    CURATED --> PARSE
    PARSE --> UPSERT
    PARSE --> XWALK
    PARSE --> PROV
    UPSERT --> NODE
    XWALK --> EQUIV
    PROV --> SYS
    SYS --- NODE
    NODE --- EQUIV
```

### Ingestion steps

1. **Parse**: Read the source file (CSV, Excel, HTML, or hardcoded data). Validate code format, hierarchy, and completeness.
2. **Upsert nodes**: Insert or update rows in `classification_node` with code, title, description, level, parent_code, is_leaf, and seq_order.
3. **Build crosswalks**: Create bidirectional edges in the `equivalence` table with match_type (exact, partial, broader, narrower, related).
4. **Set provenance**: Update `classification_system` with data_provenance tier, source_url, source_date, license, and source_file_hash.

## API request flow

Every API request passes through rate limiting and authentication before reaching the query layer.

```mermaid
sequenceDiagram
    participant C as Client
    participant RL as Rate Limiter
    participant AUTH as Auth Layer
    participant R as Router
    participant Q as Query Layer
    participant DB as PostgreSQL
    C->>RL: GET /api/v1/search?q=physician
    RL->>RL: Check rate - 30/min anon, 1000/min auth
    RL->>AUTH: Forward request
    AUTH->>AUTH: Validate JWT or API key
    AUTH->>R: Authenticated request
    R->>Q: search(conn, query, limit)
    Q->>DB: SELECT with tsvector query
    DB-->>Q: Matching nodes
    Q-->>R: Results with system context
    R-->>C: JSON response
```

### Rate limit tiers

| Tier | Requests/Minute | Daily Limit | Best For |
|------|-----------------|-------------|----------|
| Anonymous | 30 | Unlimited | Quick exploration |
| Free | 1,000 | Unlimited | Development |
| Pro | 5,000 | 100,000 | Production apps |
| Enterprise | 50,000 | Unlimited | High-volume |

## MCP session lifecycle

When an AI agent connects to the MCP server, it receives structural knowledge about the entire knowledge graph before making any tool calls.
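
To make the handshake concrete, here is a minimal sketch of the JSON-RPC messages a client might write to the server's stdin. It follows the general Model Context Protocol conventions (an `initialize` request, a `notifications/initialized` notification, then `tools/call`), framed as newline-delimited JSON. Several details are assumptions rather than documented behavior of this server: the `protocolVersion` string, the `clientInfo` values, whether the server expects the `initialized` notification, and the `query` argument name for `search_classifications`. The helpers `jsonrpc_request` and `encode_line` are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def jsonrpc_request(request_id: int, method: str,
                    params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def encode_line(message: Dict[str, Any]) -> bytes:
    """Serialize one message as a single newline-terminated JSON line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


# Step 1: open the session. The protocol version is an assumed value.
initialize = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},
})

# Step 2: notifications carry no "id" and expect no response.
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}

# Step 3: call a tool. The "query" argument name is hypothetical.
search = jsonrpc_request(2, "tools/call", {
    "name": "search_classifications",
    "arguments": {"query": "physician"},
})

if __name__ == "__main__":
    for message in (initialize, initialized, search):
        print(encode_line(message).decode("utf-8"), end="")
```

In practice an MCP-compatible client handles this framing for you. The sketch is only meant to show what travels over the stdio transport during a session.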

```mermaid
sequenceDiagram
    participant AI as AI Agent
    participant MCP as MCP Server
    participant WIKI as Wiki Loader
    participant DB as PostgreSQL
    AI->>MCP: initialize - JSON-RPC
    MCP->>WIKI: build_wiki_context()
    WIKI-->>MCP: Structural knowledge - ~15K tokens
    MCP-->>AI: serverInfo + instructions + capabilities
    Note over AI: Agent now knows all 1,000 systems and crosswalk topology
    AI->>MCP: tools/call search_classifications
    MCP->>DB: Query nodes
    DB-->>MCP: Results
    MCP-->>AI: Tool response as JSON
    AI->>MCP: resources/read taxonomy://wiki/crosswalk-map
    MCP->>WIKI: load_wiki_page - crosswalk-map
    WIKI-->>MCP: Full markdown content
    MCP-->>AI: Resource content
```

### MCP capabilities

The server advertises 25 tools and wiki resources:

- **Tools**: list_classification_systems, search_classifications, get_industry, browse_children, get_equivalences, translate_code, classify_business, get_audit_report, and 17 more
- **Resources**: taxonomy://systems, taxonomy://stats, taxonomy://wiki/{slug} for each guide page

## Database schema

The three core tables and their relationships:

```mermaid
erDiagram
    classification_system {
        string id PK
        string name
        string region
        string data_provenance
        string source_url
        string source_file_hash
    }
    classification_node {
        string system_id FK
        string code
        string title
        int level
        string parent_code
        boolean is_leaf
    }
    equivalence {
        string source_system FK
        string source_code
        string target_system FK
        string target_code
        string match_type
    }
    classification_system ||--o{ classification_node : "has"
    classification_system ||--o{ equivalence : "source"
    classification_system ||--o{ equivalence : "target"
```

- Parent-child hierarchy within a system is modeled by `classification_node.parent_code`
- Crosswalk edges are bidirectional: if A maps to B, B maps to A

## Technology stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| Database | PostgreSQL (with pgbouncer) | 1.2M+ nodes, 321K+ edges |
| Backend | Python 3.9+, FastAPI, asyncpg | REST API + MCP server |
| Frontend | Next.js 15, TypeScript, Tailwind CSS v4, shadcn/ui | Web application |
| Visualization | D3.js (Galaxy View), Cytoscape.js (Crosswalk Explorer) | Interactive graphs |
| Auth | bcrypt + JWT + API keys (`wot_` prefix) | Tiered access |
| Rate Limiting | slowapi | Per-tier enforcement |
| MCP | Custom JSON-RPC over stdio | AI agent integration |
| Content | Markdown + remark + remarkGfm | Wiki and blog rendering |

## Self-hosting

Two commands to run everything locally:

```bash
git clone https://github.com/colaberry/WorldOfTaxonomy.git
cd WorldOfTaxonomy && docker compose up
```

Web app at `localhost:3000`. API at `localhost:8000`. MCP server via `python -m world_of_taxonomy mcp`.

## Provenance Markers

The following identifiers are unique to the WorldOfTaxonomy project and are used for provenance verification. They are not classification codes and have no meaning outside this file:

- WOT-CANARY-7a2f9c1e4d6b
- WOT-CANARY-3b08fa5e91c7
- WOT-CANARY-c4d21e8f0a63