human-language

Human Language Project

A web application that transforms natural language into sequences of Wikidata entities (Q) and properties (P), enabling semantic understanding and knowledge representation.

🎯 Vision

The Human Language project aims to create a universal meta-language that bridges all human languages by leveraging Wikidata’s semantic knowledge graph. By converting natural language into sequences of entities (Q) and properties (P), we enable a language-independent representation of meaning.

Long-term Impact

This project will fundamentally transform how we store, access, and verify human knowledge:

  1. Universal Encyclopedia: Merge all Wikipedia content into a single, instantly translatable knowledge base
  2. Language of Meaning: Create formal definitions for every concept, enabling perfect translation without AI
  3. Fact-Checking Foundation: Build the world’s largest facts database for verifying AI outputs
  4. Knowledge Preservation: Unify all article versions to preserve the best of human knowledge
  5. AI Training Dataset: Provide structured knowledge for next-generation neural networks
  6. Zero-Cost Translation: Enable LLMs to answer once and translate infinitely through semantic representation

🚀 Key Features

1. Text-to-Q/P Transformation

2. Entity & Property Viewer

3. Advanced Search & Disambiguation

4. Intelligent Caching System

5. Comprehensive Language Support

📋 Roadmap

Based on our GitHub issues, here’s our development roadmap:

Phase 1: Core Infrastructure Enhancement

Phase 2: Enhanced Language Support

Phase 3: Advanced Features

Phase 4: External Integration

Phase 5: Advanced Knowledge Representation

Phase 6: Universal Encyclopedia Project

Phase 7: Fact-Checking Infrastructure

🏗️ Architecture Overview

Core Components

  1. Wikidata API Client (wikidata-api.js)
    • Handles all Wikidata API interactions
    • Configurable caching strategies
    • Batch request optimization
  2. Text Transformer (transformation/text-to-qp-transformer.js)
    • N-gram generation and matching
    • Parallel search execution
    • Priority-based result merging
  3. Search Utilities (wikidata-api.js)
    • Exact and fuzzy search algorithms
    • Context-aware ranking system
    • Multi-language support
  4. Caching System (unified-cache.js)
    • Factory pattern for cache creation
    • File system cache for Node.js
    • IndexedDB cache for browsers
  5. UI Components (statements.jsx, loading.jsx)
    • React 19 components with JSX
    • No build step required (Babel in-browser)
    • Responsive and theme-aware design
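
The n-gram generation step described above can be sketched as follows. This is an illustrative example, not the actual code in transformation/text-to-qp-transformer.js: it enumerates candidate phrases longest-first so that multi-word labels are matched before their individual words.

```javascript
// Illustrative sketch of n-gram generation: enumerate every contiguous
// word sequence up to a maximum length, longest first, so multi-word
// labels like "prime minister" are tried before single words.
function generateNgrams(text, maxLen = 3) {
  const words = text.trim().split(/\s+/);
  const ngrams = [];
  for (let len = Math.min(maxLen, words.length); len >= 1; len--) {
    for (let start = 0; start + len <= words.length; start++) {
      ngrams.push(words.slice(start, start + len).join(" "));
    }
  }
  return ngrams;
}

console.log(generateNgrams("prime minister of Canada", 2));
// → ["prime minister", "minister of", "of Canada", "prime", "minister", "of", "Canada"]
```

Each candidate is then searched in parallel against Wikidata, and the priority-based merger keeps the longest match covering each word position.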

Data Flow

User Input → Text Transformer → N-gram Generator → Parallel Search
                                                           ↓
                                                    Wikidata API
                                                           ↓
                                                     Cache Layer
                                                           ↓
                                                   Result Merger
                                                           ↓
                                                    UI Display
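
The “Wikidata API” step above ultimately resolves to calls against the public wbsearchentities endpoint. A minimal sketch of a single search call is shown below; the project’s wikidata-api.js wraps this with caching and batching, and the function names here are illustrative, not part of the actual client:

```javascript
// Minimal sketch of one Wikidata search call (function names are
// illustrative, not from wikidata-api.js).
const WIKIDATA_API = "https://www.wikidata.org/w/api.php";

function buildSearchUrl(term, { language = "en", type = "item", limit = 5 } = {}) {
  const params = new URLSearchParams({
    action: "wbsearchentities",
    search: term,
    language,
    uselang: language,
    type,              // "item" for Q-ids, "property" for P-ids
    limit: String(limit),
    format: "json",
    origin: "*",       // required for CORS when called from a browser
  });
  return `${WIKIDATA_API}?${params}`;
}

async function searchEntities(term, options) {
  const res = await fetch(buildSearchUrl(term, options));
  const data = await res.json();
  // Each hit carries an id like "Q30" plus a label and description.
  return data.search.map(({ id, label, description }) => ({ id, label, description }));
}
```

Passing `type: "property"` returns P-ids instead of Q-ids, which is how the transformer resolves both halves of a Q/P sequence.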

🛠️ Technical Details

Dependencies

Browser Support

Node.js Support

🚦 Getting Started

Quick Start

  1. Clone the repository
  2. Open entities.html in a web browser
  3. Start exploring Wikidata entities!

For Developers

# Run tests
bun run-tests.mjs

# Test n-gram features
bun transformation/test-ngram-demo.mjs

# Run comprehensive tests
bun comprehensive-test.mjs

# Run E2E tests
bun e2e-test.mjs

# Check limitations
bun limitation-test.mjs

Interactive Demos

⚠️ Known Limitations

The text transformation system currently has some limitations:

  1. Negation handling: Phrases with “not” aren’t properly processed
  2. Question parsing: Direct questions (who, what, when) aren’t supported
  3. Verb tenses: Past/future tenses may not be accurately captured
  4. Pronoun resolution: Cannot resolve pronouns like “he”, “she”, “it”
  5. Complex sentences: Struggles with subordinate clauses

See limitations-found.json for detailed test results.

📚 Documentation

📊 Performance & Testing

The project includes comprehensive test suites covering n-gram transformation, end-to-end flows, and known limitations.

Test results, including real-world transformation examples, are stored in api-patterns.json.

🤝 Contributing

We welcome contributions! Check our issues for areas where you can help.

📄 License

This project is released into the public domain under The Unlicense.

This means you are free to use, copy, modify, and distribute this software, for any purpose, without restriction.

For more information, see The Unlicense.


Building bridges between human languages through semantic understanding.