This module provides advanced search and disambiguation capabilities for Wikidata entities and properties. It supports exact word sequence matching, fuzzy search, and context-aware disambiguation.
- Exact Match Search - Find entities and properties that match the exact word sequence
- Fuzzy Search - Find matches with partial or misspelled queries
- Disambiguation Search - Combines exact and fuzzy matching with intelligent ranking
- Context-Aware Search - Uses domain and type preferences for better ranking
- Exact vs Fuzzy Matching - Prioritizes exact matches over fuzzy ones
- Type Filtering - Search for entities, properties, or both
- Language Support - Multi-language search capabilities
- Context Ranking - Domain-aware and type-aware result ranking
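As a rough illustration of the "exact before fuzzy" prioritization listed above, the sketch below merges two result lists so that exact matches come first and each ID appears only once. The helper name mergeMatches and the de-duplication by id are assumptions made for illustration, not the module's actual implementation; the returned shape mirrors the one documented for disambiguateSearch.

```javascript
// Hypothetical helper (not part of the module): merge exact and fuzzy
// result lists, keeping exact matches ahead of fuzzy ones.
function mergeMatches(exactResults, fuzzyResults) {
  const seen = new Set();

  // Tag each result with its match type and drop IDs that were already seen.
  const tag = (results, matchType) =>
    results
      .filter(result => {
        if (seen.has(result.id)) return false;
        seen.add(result.id);
        return true;
      })
      .map(result => ({ ...result, matchType }));

  const exact = tag(exactResults, 'exact');
  const fuzzy = tag(fuzzyResults, 'fuzzy'); // IDs already returned as exact are skipped
  const combined = [...exact, ...fuzzy];
  return { exact, fuzzy, combined, total: combined.length };
}

const merged = mergeMatches(
  [{ id: 'Q937', label: 'Albert Einstein' }],
  [
    { id: 'Q937', label: 'Albert Einstein' },
    { id: 'Q12345', label: 'Einstein Institute' } // placeholder ID
  ]
);
console.log(merged.combined.map(r => `${r.id}:${r.matchType}`));
// → [ 'Q937:exact', 'Q12345:fuzzy' ]
```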
WikidataSearchUtility is the main search utility class; it provides all search functionality.
const searchUtility = new WikidataSearchUtility(apiClient, cacheManager, dataProcessor);
searchExactMatch(query, languages, limit, type): Search for entities and properties with exact word sequence matching.
Parameters:
- query (string) - Exact word sequence to search for
- languages (string) - Languages to search in (default: 'en')
- limit (number) - Maximum number of results (default: 50)
- type (string) - Type to search for: 'item', 'property', or 'both' (default: 'both')
Returns: a Promise that resolves to an object of the form:
{
entities: Array,
properties: Array,
total: number
}
searchFuzzy(query, languages, limit, type): Search for entities and properties with fuzzy matching.
Parameters: Same as searchExactMatch
Returns: Same structure as searchExactMatch
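The source does not say which fuzzy-matching algorithm the module uses. As one common way to tolerate misspellings such as 'Einstien', the sketch below computes the Levenshtein edit distance and keeps labels within a small distance of the query. The names levenshtein and fuzzyFilter and the threshold of 2 are assumptions chosen for illustration.

```javascript
// Levenshtein distance via dynamic programming, keeping only two rows.
// Counts single-character insertions, deletions, and substitutions.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical filter: keep labels within maxDistance edits of the query,
// comparing case-insensitively.
function fuzzyFilter(query, labels, maxDistance = 2) {
  const q = query.toLowerCase();
  return labels.filter(label => levenshtein(q, label.toLowerCase()) <= maxDistance);
}

console.log(levenshtein('Einstien', 'Einstein')); // → 2 (the swapped "ie" costs two edits)
console.log(fuzzyFilter('Einstien', ['Einstein', 'Paris']));
```

A plain edit distance treats a transposition as two edits, so a threshold below 2 would miss this typo.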
disambiguateSearch(query, languages, limit, type): Enhanced disambiguation search that combines exact and fuzzy matching.
Parameters: Same as searchExactMatch
Returns: a Promise that resolves to an object of the form:
{
exact: Array, // Exact matches
fuzzy: Array, // Fuzzy matches
combined: Array, // All results combined
total: number // Total number of results
}
searchWithContext(query, context, languages, limit, type): Context-aware search with domain and type preferences.
Parameters:
- query (string) - Query string to search for
- context (Object) - Context information for ranking
  - domain (string) - Domain context for ranking
  - preferredTypes (Array) - Array of preferred types
- languages (string) - Languages to search in (default: 'en')
- limit (number) - Maximum number of results (default: 50)
- type (string) - Type to search for (default: 'both')
Returns: Same structure as disambiguateSearch
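How the context object influences ranking is not spelled out above. The sketch below shows one plausible approach: results whose type is in preferredTypes get a boost, and results whose description mentions the domain get a smaller one. The function name rankWithContext, the weights, and the description-based domain check are all assumptions for illustration. The type is derived from the ID prefix (Wikidata item IDs start with Q, property IDs with P).

```javascript
// Hypothetical ranking helper; weights and the domain heuristic are
// illustrative assumptions, not the module's actual logic.
function rankWithContext(results, context = {}) {
  const { domain, preferredTypes = [] } = context;
  const typeOf = id => (id.startsWith('P') ? 'property' : 'item');

  const score = result => {
    let points = 0;
    if (preferredTypes.includes(typeOf(result.id))) points += 2;
    const description = (result.description || '').toLowerCase();
    if (domain && description.includes(domain.toLowerCase())) points += 1;
    return points;
  };

  // Copy before sorting; Array.prototype.sort is stable, so ties keep
  // their original order.
  return [...results].sort((a, b) => score(b) - score(a));
}

const sample = [
  { id: 'Q937', label: 'Albert Einstein', description: 'German-born theoretical physicist' },
  { id: 'Q12345', label: 'Einstein Institute', description: 'Research institute' }, // placeholder ID
  { id: 'P31', label: 'instance of', description: 'that class of which this subject is a particular example and member' }
];

const byProperty = rankWithContext(sample, { preferredTypes: ['property'] });
console.log(byProperty.map(r => r.id)); // → [ 'P31', 'Q937', 'Q12345' ]

const byResearch = rankWithContext(sample, { domain: 'research', preferredTypes: ['item'] });
console.log(byResearch.map(r => r.id)); // → [ 'Q12345', 'Q937', 'P31' ]
```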
import { searchUtility } from './wikidata-api.js';
// Exact match search
const results = await searchUtility.searchExactMatch('Albert Einstein', 'en', 10);
console.log(`Found ${results.total} results`);
// Fuzzy search
const fuzzyResults = await searchUtility.searchFuzzy('Einstien', 'en', 10);
console.log(`Found ${fuzzyResults.total} results`);
// Disambiguation search for ambiguous terms
const parisResults = await searchUtility.disambiguateSearch('Paris', 'en', 20);
console.log(`Found ${parisResults.total} results`);
console.log(`Exact matches: ${parisResults.exact.length}`);
console.log(`Fuzzy matches: ${parisResults.fuzzy.length}`);
// Access combined results with match types
parisResults.combined.forEach(result => {
console.log(`${result.label} (${result.id}) - ${result.matchType} match`);
});
// Search with domain context
const physicsContext = {
domain: 'physics',
preferredTypes: ['item']
};
const physicsResults = await searchUtility.searchWithContext('Einstein', physicsContext, 'en', 10);
console.log(`Found ${physicsResults.total} physics-related results`);
// Search with property preference
const propertyContext = {
preferredTypes: ['property']
};
const propertyResults = await searchUtility.searchWithContext('instance', propertyContext, 'en', 10);
console.log(`Found ${propertyResults.total} property results`);
// Search only for entities
const entityResults = await searchUtility.searchExactMatch('Einstein', 'en', 10, 'item');
console.log(`Found ${entityResults.entities.length} entities`);
// Search only for properties
const propertyOnlyResults = await searchUtility.searchExactMatch('instance of', 'en', 10, 'property');
console.log(`Found ${propertyOnlyResults.properties.length} properties`);
// Search in different languages
const enResults = await searchUtility.searchExactMatch('Einstein', 'en', 5);
const deResults = await searchUtility.searchExactMatch('Einstein', 'de', 5);
const frResults = await searchUtility.searchExactMatch('Einstein', 'fr', 5);
console.log(`English: ${enResults.total} results`);
console.log(`German: ${deResults.total} results`);
console.log(`French: ${frResults.total} results`);
Example result from searchExactMatch or searchFuzzy:
{
entities: [
{
id: "Q937",
label: "Albert Einstein",
description: "German-born theoretical physicist",
url: "https://www.wikidata.org/wiki/Q937"
}
],
properties: [
{
id: "P31",
label: "instance of",
description: "that class of which this subject is a particular example and member",
url: "https://www.wikidata.org/wiki/P31"
}
],
total: 2
}
Example result from disambiguateSearch or searchWithContext:
{
exact: [
{
id: "Q937",
label: "Albert Einstein",
description: "German-born theoretical physicist",
matchType: "exact"
}
],
fuzzy: [
{
id: "Q12345",
label: "Einstein Institute",
description: "Research institute",
matchType: "fuzzy"
}
],
combined: [...], // All results combined
total: 2
}
import { WikidataSearchTest } from './search-test.js';
const testSuite = new WikidataSearchTest();
await testSuite.runAllTests();
import { demonstrateSearchFeatures } from './search-test.js';
await demonstrateSearchFeatures();
Open search-demo.html in a web browser to interact with the search functionality through a user-friendly interface.
The search functions throw errors for various failure scenarios:
try {
const results = await searchUtility.searchExactMatch('query');
} catch (error) {
console.error('Search failed:', error.message);
}
Common error scenarios:
- Network connectivity issues
- Invalid query parameters
- API rate limiting
- Malformed responses
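Transient failures from the list above, such as network issues or rate limiting, are often handled by retrying with exponential backoff. The sketch below is one way a caller might wrap a search call. The name withRetry, its options, and the delays are assumptions for illustration; the module itself may already retry internally.

```javascript
// Hypothetical wrapper: retry an async operation with exponential backoff.
// `sleep` is injectable so delays can be stubbed out in tests.
async function withRetry(fn, options = {}) {
  const {
    retries = 3,
    baseDelayMs = 500,
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms))
  } = options;

  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Wait 500 ms, 1000 ms, 2000 ms, ... before the next attempt.
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError; // all attempts failed
}

// Usage (assumes `searchUtility` from the examples above):
// const results = await withRetry(() => searchUtility.searchExactMatch('Einstein', 'en', 10));
```

Retrying only helps with transient errors. An invalid query parameter fails the same way every time, so a production version would probably inspect the error before retrying.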
The search functionality leverages the existing caching system for:
- Entity and property data
- Label information
- Search results
When calling the Wikidata API, the search functionality:
- Respects Wikidata API rate limits
- Implements batch processing for large queries
- Uses efficient pagination
For best performance:
- Use specific types when possible (e.g., 'item' instead of 'both')
- Limit result counts to reasonable numbers
- Use context-aware search for better relevance
- Cache frequently searched terms
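To make "cache frequently searched terms" concrete, the sketch below builds a cache key from the search arguments and stores results in an in-memory Map with a time-to-live. It is only an illustration: makeCacheKey, createSearchCache, and the five-minute TTL are assumptions, and the module's own cacheManager (which the compatibility notes suggest is backed by IndexedDB) may work quite differently.

```javascript
// Hypothetical cache key: one entry per distinct combination of arguments.
// Defaults mirror the documented parameter defaults.
function makeCacheKey(query, languages = 'en', limit = 50, type = 'both') {
  // Trim so that ' Einstein ' and 'Einstein' share one entry.
  return [query.trim(), languages, limit, type].join('|');
}

// Hypothetical in-memory cache with a time-to-live per entry.
// `now` is injectable so expiry can be tested without waiting.
function createSearchCache(ttlMs = 5 * 60 * 1000, now = () => Date.now()) {
  const entries = new Map();
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() - entry.storedAt > ttlMs) {
        entries.delete(key); // expired
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, storedAt: now() });
    }
  };
}

console.log(makeCacheKey('Einstein', 'en', 10, 'item')); // → Einstein|en|10|item
```

A caller would check the cache with the key before invoking a search method and store the resolved result afterwards.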
The search functionality works in all modern browsers that support:
- ES6 modules
- Fetch API
- IndexedDB (for caching)
- Async/await
For Node.js environments, you may need to:
- Use a fetch polyfill (Node.js < 18)
- Use an IndexedDB polyfill or disable caching
- Handle CORS differently
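Before initializing the search utility, it can help to check which of these features the current runtime provides. The sketch below does a simple feature detection. The helper name detectSearchSupport is an assumption for illustration, and it only checks the fetch and IndexedDB globals.

```javascript
// Hypothetical helper: report which required globals exist in the given
// environment (defaults to the current global object).
function detectSearchSupport(env = globalThis) {
  return {
    fetch: typeof env.fetch === 'function',
    indexedDB: typeof env.indexedDB !== 'undefined'
  };
}

const support = detectSearchSupport();
if (!support.fetch) {
  console.warn('No global fetch found: a fetch polyfill is needed (Node.js < 18).');
}
if (!support.indexedDB) {
  console.warn('No IndexedDB found: use a polyfill or disable caching.');
}
```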
When adding new search features:
- Follow the existing code structure
- Add comprehensive tests
- Update documentation
- Consider performance implications
- Test with various query types and languages