# Human Language Project
A sophisticated web application for transforming natural language into Wikidata entity and property sequences, enabling semantic understanding and knowledge representation.
## 🎯 Vision
The Human Language project aims to create a universal meta-language that bridges all human languages by leveraging Wikidata’s semantic knowledge graph. By converting natural language into sequences of entities (Q) and properties (P), we enable:
- Cross-linguistic understanding: Unified representation across all languages
- Semantic precision: Disambiguation of concepts using Wikidata’s rich ontology
- Knowledge integration: Direct connection to the world’s largest open knowledge base
- IPA support: Universal phonetic representation for true language unification
### Long-term Impact
This project will fundamentally transform how we store, access, and verify human knowledge:
- Universal Encyclopedia: Merge all Wikipedia content into a single, instantly translatable knowledge base
- Language of Meaning: Create formal definitions for every concept, enabling perfect translation without AI
- Fact-Checking Foundation: Build the world’s largest facts database for verifying AI outputs
- Knowledge Preservation: Unify all article versions to preserve the best of human knowledge
- AI Training Dataset: Provide structured knowledge for next-generation neural networks
- Zero-Cost Translation: Enable LLMs to answer once and translate infinitely through semantic representation
## 🚀 Key Features

### 1. Text-to-Q/P Transformation
- N-gram support: Recognizes multi-word phrases as single entities
- Configurable matching: Adjust n-gram size (1-5) for optimal results
- Priority-based search: Longer matches take precedence (see the sketch below)
- Real-time transformation: Interactive web demo at `transformation/index.html`
- Learn more →
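To make the idea concrete, here is a minimal sketch of n-gram candidate generation. The `generateNgrams` helper is hypothetical, not the exact code in `transformation/text-to-qp-transformer.js`; the point is that longer n-grams are produced first, so multi-word matches win over single words.

```js
// Illustrative n-gram generation (hypothetical helper, not the exact
// logic in transformation/text-to-qp-transformer.js). Longer n-grams
// come first so multi-word matches take precedence over single words.
function generateNgrams(text, maxN = 5) {
  const words = text.trim().split(/\s+/);
  const ngrams = [];
  for (let n = Math.min(maxN, words.length); n >= 1; n--) {
    for (let i = 0; i + n <= words.length; i++) {
      ngrams.push({ phrase: words.slice(i, i + n).join(' '), start: i, size: n });
    }
  }
  return ngrams;
}

// generateNgrams('United States president', 3)[0].phrase
// → 'United States president' (tried as one entity before single words)
```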
### 2. Entity & Property Viewer
- Beautiful UI: Modern, responsive interface with dark/light themes
- Multi-language support: Automatic language detection and switching
- Rich statements display: View all properties and relationships
- Direct Wikidata links: Seamlessly navigate to source data
- View entities at `entities.html` and properties at `properties.html` (a fetch sketch follows)
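For context, the statement data a viewer like this needs can be fetched with Wikidata's standard `wbgetentities` action. The helper below is an illustrative sketch, not the project's actual client code:

```js
// Sketch: fetch an entity's labels, descriptions, and statements via
// the standard wbgetentities action (helper name is illustrative).
async function getEntity(id, language = 'en') {
  const url = new URL('https://www.wikidata.org/w/api.php');
  url.search = new URLSearchParams({
    action: 'wbgetentities',
    ids: id,                              // e.g. 'Q42' (Douglas Adams)
    languages: language,
    props: 'labels|descriptions|claims',
    format: 'json',
    origin: '*',                          // CORS for browser use
  });
  const { entities } = await (await fetch(url)).json();
  return entities[id];
}
```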
### 3. Advanced Search & Disambiguation
- Exact & fuzzy matching: Find entities even with typos
- Context-aware ranking: Domain and type preferences
- Batch searching: Efficient parallel searches (see the sketch after this list)
- Multi-language search: Search in any supported language
- API Documentation →
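As a rough sketch of how such searches can be issued against Wikidata's public `wbsearchentities` action (the helper name and its defaults are assumptions, not the project's code):

```js
// Sketch of an entity search against Wikidata's public wbsearchentities
// action. Parameter choices and the helper name are illustrative.
async function searchEntities(term, language = 'en', type = 'item') {
  const url = new URL('https://www.wikidata.org/w/api.php');
  url.search = new URLSearchParams({
    action: 'wbsearchentities',
    search: term,
    language,
    type,              // 'item' for Q-ids, 'property' for P-ids
    format: 'json',
    origin: '*',       // CORS for browser use
  });
  const { search } = await (await fetch(url)).json();
  return search;       // [{ id, label, description, … }]
}

// Batch searching: run several lookups in parallel.
const results = await Promise.all(
  ['Berlin', 'capital of', 'population'].map((t) => searchEntities(t))
);
```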
### 4. Intelligent Caching System
- Multi-tier caching: File system (Node.js) and IndexedDB (browser); see the factory sketch below
- Automatic fallback: Seamless switching between cache types
- Performance optimized: Reduces API calls and improves response times
- Cross-platform: Works in both Node.js and browser environments
- Persistent storage: Cached data survives across sessions
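A minimal sketch of the factory idea, written as a Node-side module with hypothetical class names and a shared async get/set interface (the real logic lives in `unified-cache.js`, and its browser tier uses IndexedDB rather than the in-memory fallback shown here):

```js
import { promises as fs } from 'node:fs';
import path from 'node:path';

// Hypothetical cache backends sharing one async get/set interface.
class FileSystemCache {
  constructor(dir = '.cache') { this.dir = dir; }
  #file(key) { return path.join(this.dir, encodeURIComponent(key) + '.json'); }
  async get(key) {
    try { return JSON.parse(await fs.readFile(this.#file(key), 'utf8')); }
    catch { return undefined; }                     // cache miss
  }
  async set(key, value) {
    await fs.mkdir(this.dir, { recursive: true });
    await fs.writeFile(this.#file(key), JSON.stringify(value));
  }
}

class InMemoryCache {
  #store = new Map();
  async get(key) { return this.#store.get(key); }
  async set(key, value) { this.#store.set(key, value); }
}

// Factory: file system cache under Node.js, in-memory fallback
// elsewhere (the project's browser tier uses IndexedDB instead).
export function createCache() {
  const isNode = typeof process !== 'undefined' && !!process.versions?.node;
  return isNode ? new FileSystemCache() : new InMemoryCache();
}
```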
### 5. Comprehensive Language Support
- 100+ languages: Full support for all major Wikidata languages
- Locale-specific quotes: Proper quotation marks for each language
- Flag emojis: Visual language indicators for better UX
- Language persistence: Settings saved in localStorage (see the sketch below)
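Persistence itself can be as small as the following sketch; the storage key is hypothetical, not the project's actual key:

```js
// Sketch of language persistence in the browser; the storage key
// name here is hypothetical, not the project's actual key.
const LANG_KEY = 'hl-language';

function saveLanguage(code) {
  localStorage.setItem(LANG_KEY, code);
}

function loadLanguage() {
  // Fall back to the browser locale, then to English.
  return localStorage.getItem(LANG_KEY)
    ?? navigator.language?.split('-')[0]
    ?? 'en';
}
```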
## 📋 Roadmap

Based on our GitHub issues, here’s our development roadmap:

- Phase 1: Core Infrastructure Enhancement
- Phase 2: Enhanced Language Support
- Phase 3: Advanced Features
- Phase 4: External Integration
- Phase 5: Advanced Knowledge Representation
- Phase 6: Universal Encyclopedia Project
- Phase 7: Fact-Checking Infrastructure
## 🏗️ Architecture Overview

### Core Components
- Wikidata API Client (`wikidata-api.js`)
  - Handles all Wikidata API interactions
  - Configurable caching strategies
  - Batch request optimization
- Text Transformer (`transformation/text-to-qp-transformer.js`)
  - N-gram generation and matching
  - Parallel search execution
  - Priority-based result merging
- Search Utilities (`wikidata-api.js`)
  - Exact and fuzzy search algorithms
  - Context-aware ranking system
  - Multi-language support
- Caching System (`unified-cache.js`)
  - Factory pattern for cache creation
  - File system cache for Node.js
  - IndexedDB cache for browsers
- UI Components (`statements.jsx`, `loading.jsx`)
  - React 19 components with JSX
  - No build step required (Babel in-browser)
  - Responsive and theme-aware design
### Data Flow

```
User Input → Text Transformer → N-gram Generator → Parallel Search
                                                          ↓
                                                    Wikidata API
                                                          ↓
                                                     Cache Layer
                                                          ↓
                                                    Result Merger
                                                          ↓
                                                      UI Display
```
## 🛠️ Technical Details

### Dependencies
- React 19: Latest features via ESM.sh CDN
- Babel Standalone: In-browser JSX transformation
- No build step: Direct browser execution
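As a sketch of what that no-build setup looks like in practice (the version pins and element id are illustrative):

```js
// Inside a <script type="module"> tag (or type="text/babel" with
// data-type="module" when JSX needs Babel Standalone), React loads
// straight from the esm.sh CDN; no bundler or build step required.
import React from 'https://esm.sh/react@19';
import { createRoot } from 'https://esm.sh/react-dom@19/client';

createRoot(document.getElementById('root')).render(
  React.createElement('h1', null, 'Hello, Wikidata!')
);
```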
### Browser Support
- Modern browsers with ES6+ support
- IndexedDB for caching
- Fetch API for network requests
### Node.js Support
- Version 18+ recommended
- File system caching
- Native fetch support
## 🚦 Getting Started

### Quick Start

1. Clone the repository
2. Open `entities.html` in a web browser
3. Start exploring Wikidata entities!
### For Developers

```bash
# Run tests
bun run-tests.mjs

# Test n-gram features
bun transformation/test-ngram-demo.mjs

# Run comprehensive tests
bun comprehensive-test.mjs

# Run E2E tests
bun e2e-test.mjs

# Check limitations
bun limitation-test.mjs
```
### Interactive Demos

- Entity Viewer: Open `entities.html`
- Property Viewer: Open `properties.html`
- Text Transformer: Open `transformation/index.html`
- Search Demo: Open `search-demo.html`
- Browser Tests: Open `run-tests.html`
## ⚠️ Known Limitations
The text transformation system currently has some limitations:
- Negation handling: Phrases with “not” aren’t properly processed
- Question parsing: Direct questions (who, what, when) aren’t supported
- Verb tenses: Past/future tenses may not be accurately captured
- Pronoun resolution: Cannot resolve pronouns like “he”, “she”, “it”
- Complex sentences: Struggles with subordinate clauses
See `limitations-found.json` for detailed test results.
## 📚 Documentation
The project includes comprehensive test suites with excellent results:
- API Pattern Tests: 100% success rate (8/8 tests passing)
- N-gram Matching: Correctly identifies multi-word entities
- Disambiguation: Handles ambiguous terms with multiple alternatives
- Caching Efficiency: Significant performance improvements with persistent cache
Test results are stored in `api-patterns.json`, showing real-world transformation examples.
## 🤝 Contributing
We welcome contributions! Check our issues for areas where you can help.
## 📄 License
This project is released into the public domain under The Unlicense.
This means you are free to:
- Copy, modify, publish, use, compile, sell, or distribute this software
- Use it for any purpose, commercial or non-commercial
- Do so without any restrictions or attribution requirements
For more information, see [The Unlicense](https://unlicense.org).
*Building bridges between human languages through semantic understanding.*