# Human Language Project
A sophisticated web application for transforming natural language into Wikidata entity and property sequences, enabling semantic understanding and knowledge representation.
## 🎯 Vision
The Human Language project aims to create a universal meta-language that bridges all human languages by leveraging Wikidata’s semantic knowledge graph. By converting natural language into sequences of entities (Q) and properties (P), we enable:
- Cross-linguistic understanding: Unified representation across all languages
- Semantic precision: Disambiguation of concepts using Wikidata’s rich ontology
- Knowledge integration: Direct connection to the world’s largest open knowledge base
- IPA support: Universal phonetic representation for true language unification
### Long-term Impact
This project will fundamentally transform how we store, access, and verify human knowledge:
- Universal Encyclopedia: Merge all Wikipedia content into a single, instantly translatable knowledge base
- Language of Meaning: Create formal definitions for every concept, enabling perfect translation without AI
- Fact-Checking Foundation: Build the world’s largest facts database for verifying AI outputs
- Knowledge Preservation: Unify all article versions to preserve the best of human knowledge
- AI Training Dataset: Provide structured knowledge for next-generation neural networks
- Zero-Cost Translation: Enable LLMs to answer once and translate infinitely through semantic representation
## 🚀 Key Features

### 1. Text-to-Q/P Transformation
- N-gram support: Recognizes multi-word phrases as single entities
- Configurable matching: Adjust n-gram size (1-5) for optimal results
- Priority-based search: Longer matches take precedence (see the sketch below)
- Real-time transformation: Interactive web demo at `transformation/index.html`
- Learn more →
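To make the idea concrete, here is a minimal sketch of n-gram candidate generation. The `generateNgrams` helper is hypothetical, not the exact code in `transformation/text-to-qp-transformer.js`; the point is that longer n-grams are produced first, so multi-word matches win over single words.

```js
// Illustrative n-gram generation (hypothetical helper, not the exact
// logic in transformation/text-to-qp-transformer.js). Longer n-grams
// come first so multi-word matches take precedence over single words.
function generateNgrams(text, maxN = 5) {
  const words = text.trim().split(/\s+/);
  const ngrams = [];
  for (let n = Math.min(maxN, words.length); n >= 1; n--) {
    for (let i = 0; i + n <= words.length; i++) {
      ngrams.push({ phrase: words.slice(i, i + n).join(' '), start: i, size: n });
    }
  }
  return ngrams;
}

// generateNgrams('United States president', 3)[0].phrase
// → 'United States president' (tried as one entity before single words)
```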
### 2. Entity & Property Viewer
- Beautiful UI: Modern, responsive interface with dark/light themes
- Multi-language support: Automatic language detection and switching
- Rich statements display: View all properties and relationships
- Direct Wikidata links: Seamlessly navigate to source data
- View entities at `entities.html` and properties at `properties.html` (a fetch sketch follows)
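For context, the statement data a viewer like this needs can be fetched with Wikidata's standard `wbgetentities` action. The helper below is an illustrative sketch, not the project's actual client code:

```js
// Sketch: fetch an entity's labels, descriptions, and statements via
// the standard wbgetentities action (helper name is illustrative).
async function getEntity(id, language = 'en') {
  const url = new URL('https://www.wikidata.org/w/api.php');
  url.search = new URLSearchParams({
    action: 'wbgetentities',
    ids: id,                              // e.g. 'Q42' (Douglas Adams)
    languages: language,
    props: 'labels|descriptions|claims',
    format: 'json',
    origin: '*',                          // CORS for browser use
  });
  const { entities } = await (await fetch(url)).json();
  return entities[id];
}
```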
### 3. Advanced Search & Disambiguation
- Exact & fuzzy matching: Find entities even with typos
- Context-aware ranking: Domain and type preferences
- Batch searching: Efficient parallel searches (see the sketch after this list)
- Multi-language search: Search in any supported language
- API Documentation →
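As a rough sketch of how such searches can be issued against Wikidata's public `wbsearchentities` action (the helper name and its defaults are assumptions, not the project's code):

```js
// Sketch of an entity search against Wikidata's public wbsearchentities
// action. Parameter choices and the helper name are illustrative.
async function searchEntities(term, language = 'en', type = 'item') {
  const url = new URL('https://www.wikidata.org/w/api.php');
  url.search = new URLSearchParams({
    action: 'wbsearchentities',
    search: term,
    language,
    type,              // 'item' for Q-ids, 'property' for P-ids
    format: 'json',
    origin: '*',       // CORS for browser use
  });
  const { search } = await (await fetch(url)).json();
  return search;       // [{ id, label, description, … }]
}

// Batch searching: run several lookups in parallel.
const results = await Promise.all(
  ['Berlin', 'capital of', 'population'].map((t) => searchEntities(t))
);
```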
### 4. Intelligent Caching System
- Multi-tier caching: File system (Node.js) and IndexedDB (browser); see the factory sketch below
- Automatic fallback: Seamless switching between cache types
- Performance optimized: Reduces API calls and improves response times
- Cross-platform: Works in both Node.js and browser environments
- Persistent storage: Cached data survives across sessions
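A minimal sketch of the factory idea, written as a Node-side module with hypothetical class names and a shared async get/set interface (the real logic lives in `unified-cache.js`, and its browser tier uses IndexedDB rather than the in-memory fallback shown here):

```js
import { promises as fs } from 'node:fs';
import path from 'node:path';

// Hypothetical cache backends sharing one async get/set interface.
class FileSystemCache {
  constructor(dir = '.cache') { this.dir = dir; }
  #file(key) { return path.join(this.dir, encodeURIComponent(key) + '.json'); }
  async get(key) {
    try { return JSON.parse(await fs.readFile(this.#file(key), 'utf8')); }
    catch { return undefined; }                     // cache miss
  }
  async set(key, value) {
    await fs.mkdir(this.dir, { recursive: true });
    await fs.writeFile(this.#file(key), JSON.stringify(value));
  }
}

class InMemoryCache {
  #store = new Map();
  async get(key) { return this.#store.get(key); }
  async set(key, value) { this.#store.set(key, value); }
}

// Factory: file system cache under Node.js, in-memory fallback
// elsewhere (the project's browser tier uses IndexedDB instead).
export function createCache() {
  const isNode = typeof process !== 'undefined' && !!process.versions?.node;
  return isNode ? new FileSystemCache() : new InMemoryCache();
}
```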
### 5. Comprehensive Language Support
- 100+ languages: Full support for all major Wikidata languages
- Locale-specific quotes: Proper quotation marks for each language
- Flag emojis: Visual language indicators for better UX
- Language persistence: Settings saved in localStorage (see the sketch below)
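Persistence itself can be as small as the following sketch; the storage key is hypothetical, not the project's actual key:

```js
// Sketch of language persistence in the browser; the storage key
// name here is hypothetical, not the project's actual key.
const LANG_KEY = 'hl-language';

function saveLanguage(code) {
  localStorage.setItem(LANG_KEY, code);
}

function loadLanguage() {
  // Fall back to the browser locale, then to English.
  return localStorage.getItem(LANG_KEY)
    ?? navigator.language?.split('-')[0]
    ?? 'en';
}
```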
## 📋 Roadmap

Based on our GitHub issues, here’s our development roadmap:

- Phase 1: Core Infrastructure Enhancement
- Phase 2: Enhanced Language Support
- Phase 3: Advanced Features
- Phase 4: External Integration
- Phase 5: Advanced Knowledge Representation
- Phase 6: Universal Encyclopedia Project
- Phase 7: Fact-Checking Infrastructure
## 🏗️ Architecture Overview

### Core Components
- Wikidata API Client (`wikidata-api.js`)
  - Handles all Wikidata API interactions
  - Configurable caching strategies
  - Batch request optimization
- Text Transformer (`transformation/text-to-qp-transformer.js`)
  - N-gram generation and matching
  - Parallel search execution
  - Priority-based result merging
- Search Utilities (`wikidata-api.js`)
  - Exact and fuzzy search algorithms
  - Context-aware ranking system
  - Multi-language support
- Caching System (`unified-cache.js`)
  - Factory pattern for cache creation
  - File system cache for Node.js
  - IndexedDB cache for browsers
- UI Components (`statements.jsx`, `loading.jsx`)
  - React 19 components with JSX
  - No build step required (Babel in-browser)
  - Responsive and theme-aware design
### Data Flow

```
User Input → Text Transformer → N-gram Generator → Parallel Search
                                                          ↓
                                                    Wikidata API
                                                          ↓
                                                     Cache Layer
                                                          ↓
                                                    Result Merger
                                                          ↓
                                                      UI Display
```
## 🛠️ Technical Details

### Dependencies
- React 19: Latest features via ESM.sh CDN
- Babel Standalone: In-browser JSX transformation
- No build step: Direct browser execution
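As a sketch of what that no-build setup looks like in practice (the version pins and element id are illustrative):

```js
// Inside a <script type="module"> tag (or type="text/babel" with
// data-type="module" when JSX needs Babel Standalone), React loads
// straight from the esm.sh CDN; no bundler or build step required.
import React from 'https://esm.sh/react@19';
import { createRoot } from 'https://esm.sh/react-dom@19/client';

createRoot(document.getElementById('root')).render(
  React.createElement('h1', null, 'Hello, Wikidata!')
);
```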
### Browser Support
- Modern browsers with ES6+ support
- IndexedDB for caching
- Fetch API for network requests
### Node.js Support
- Version 18+ recommended
- File system caching
- Native fetch support
## 🚦 Getting Started

### Quick Start

1. Clone the repository
2. Open `entities.html` in a web browser
3. Start exploring Wikidata entities!
### For Developers

```bash
# Run tests
bun run-tests.mjs

# Test n-gram features
bun transformation/test-ngram-demo.mjs

# Run comprehensive tests
bun comprehensive-test.mjs

# Run E2E tests
bun e2e-test.mjs

# Check limitations
bun limitation-test.mjs
```
### Interactive Demos

- Entity Viewer: Open `entities.html`
- Property Viewer: Open `properties.html`
- Text Transformer: Open `transformation/index.html`
- Search Demo: Open `search-demo.html`
- Browser Tests: Open `run-tests.html`
## ⚠️ Known Limitations
The text transformation system currently has some limitations:
- Negation handling: Phrases with “not” aren’t properly processed
- Question parsing: Direct questions (who, what, when) aren’t supported
- Verb tenses: Past/future tenses may not be accurately captured
- Pronoun resolution: Cannot resolve pronouns like “he”, “she”, “it”
- Complex sentences: Struggles with subordinate clauses
See `limitations-found.json` for detailed test results.
## 📚 Documentation
The project includes comprehensive test suites with excellent results:
- API Pattern Tests: 100% success rate (8/8 tests passing)
- N-gram Matching: Correctly identifies multi-word entities
- Disambiguation: Handles ambiguous terms with multiple alternatives
- Caching Efficiency: Significant performance improvements with persistent cache
Test results are stored in `api-patterns.json`, showing real-world transformation examples.
## 🤝 Contributing
We welcome contributions! Check our issues for areas where you can help.
## 📄 License
This project is released into the public domain under The Unlicense.
This means you are free to:
- Copy, modify, publish, use, compile, sell, or distribute this software
- Use it for any purpose, commercial or non-commercial
- Do so without any restrictions or attribution requirements
For more information, see [The Unlicense](https://unlicense.org).
*Building bridges between human languages through semantic understanding.*