The Web-based Systems Group at the University of Mannheim conducts research on methods for integrating data from large numbers of data sources in the context of the open Web and in corporate data lakes. Our research includes areas such as entity matching, schema matching, table annotation, information extraction, and data discovery. Our current work focuses on utilizing large language models and LLM-based agents for data integration tasks. We apply the developed methods to integrate product data from large numbers of e-shops and to construct knowledge graphs such as DBpedia. The empirical research of the group includes monitoring the adoption of schema.org annotations on the public Web by regularly extracting structured data from large Web corpora.
Web-based Systems Group @ University of Mannheim
Repositories
- PyDI Public
The PyDI framework provides methods for end-to-end data integration. The framework covers all steps of the integration process, including schema matching, data translation, entity matching, and data fusion. The framework offers traditional string-based methods as well as modern LLM- and embedding-based techniques for these tasks.
- winter Public Forked from olehmberg/winter
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
- WebMall Public
This repository contains the code and data of the WebMall benchmark for evaluating the capability of Web agents to find and compare product offers from multiple e-shops.
- WebMall-Interfaces Public
Modern LLM agents interact with the web through various architectures, ranging from traditional browser automation to API-based approaches. This project provides implementation and evaluation code to systematically compare their effectiveness across 91 realistic e-commerce scenarios.
- AgentLab Public Forked from ServiceNow/AgentLab
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and reproducibility.
- SubsetCreatorJupyterNBs Public
Jupyter notebooks used to create the schema.org subsets from the MD and JSON-LD corpus for the WDC 2020 structured data extraction.
- wdc-page Public
This repository contains the source files of the Web Data Commons website and is used to maintain the site. The Web Data Commons project extracts structured data from the Common Crawl.
- BrowserGym Public Forked from ServiceNow/BrowserGym
🌎💪 BrowserGym, a Gym environment for web task automation
- TailorMatch Public
This repository contains code and comprehensive examples to replicate and build upon the experiments presented in our paper “Fine-tuning Large Language Models for Entity Matching”. It provides resources for applying fine-tuning techniques to large language models for entity matching tasks.