An R-based tool for batch searching Spanish public telephone directories by surname. It scrapes surname lists, generates gender-aware variants, constructs search URLs for multiple directory services, and writes them to batch files for systematic querying.
Originally written during the early days of COVID-19 (2020) and released publicly in 2025.
- Multi-service support -- queries both Guiatel/Infobel and ABCtelefonos directory services.
- Gender-aware surname variants -- automatically generates feminine forms for surnames following Slavic naming conventions (e.g., appending `-a` to surnames ending in `-v`).
- Batch file generation -- splits search URLs into numbered batch files to facilitate manual querying and avoid rate limiting.
- Multi-region coverage -- targets six Spanish provinces/regions in a single run.
- Web scraping pipeline -- extracts surname lists and directory indexes directly from web sources using `rvest`.
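The gender-aware variant rule described above can be sketched in a few lines of base R. This is a minimal illustration, not the code from `search.R`: the function name `add_feminine_variants` and the sample surnames are assumptions, and only the "append `a` to surnames ending in `v`" rule comes from the feature list.

```r
# Hypothetical helper (not taken from search.R): for every surname ending in
# "v", add a feminine form by appending "a", then drop duplicates.
add_feminine_variants <- function(surnames) {
  ends_in_v <- grepl("v$", surnames)            # TRUE for e.g. "Ivanov"
  feminine  <- paste0(surnames[ends_in_v], "a")
  unique(c(surnames, feminine))                 # originals first, then new forms
}

# Sample input chosen for illustration only.
variants <- add_feminine_variants(c("Ivanov", "Petrov", "Kostova", "Dimitrov"))
print(variants)
```

Surnames that do not end in `v` (such as `Kostova` here) pass through unchanged, so the result always contains the full original list plus any generated forms.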
| Dependency | Version | Purpose |
|---|---|---|
| R | >= 3.6 | Runtime |
| tidyverse | latest | Data manipulation and functional utilities |
| rvest | latest | HTML parsing and web scraping |
| stringi | latest | String processing and substring operations |
Install all dependencies in R:
`install.packages(c("tidyverse", "rvest", "stringi"))`

- Clone the repository:

  `git clone https://github.com/GeiserX/search-by-surname.git`

  `cd search-by-surname`

- Open `search.R` and adjust the output paths in the `write_lines()` calls to match your local directory structure. By default, the script writes to a Windows path under Google Drive.
- Run the script in R or RStudio:

  `source("search.R")`

The script will:
- Scrape surname lists from the configured web source.
- Generate gender-aware surname variants.
- Build search URLs for each surname and region.
- Write numbered batch files to the specified output directories.
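The last step, writing numbered batch files, might look roughly like the sketch below. It assumes a batch size, a file-name pattern, and a helper named `write_batches`, none of which are taken from `search.R`; base `writeLines()` stands in for the `write_lines()` calls the script uses so that the example runs without extra packages.

```r
# Hypothetical helper: split a character vector of URLs into batches of
# `batch_size` and write each batch to a numbered text file in `out_dir`.
write_batches <- function(urls, out_dir, batch_size = 50, prefix = "batch_") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # ceiling(index / batch_size) yields batch ids 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(urls) / batch_size)
  batches  <- split(urls, batch_id)
  paths <- file.path(
    out_dir,
    sprintf("%s%03d.txt", prefix, as.integer(names(batches)))
  )
  for (i in seq_along(batches)) {
    writeLines(batches[[i]], paths[i])
  }
  invisible(paths)
}

# Placeholder URLs; the real search URLs are built by the script.
urls  <- sprintf("https://example.com/search?name=%d", 1:7)
paths <- write_batches(urls, file.path(tempdir(), "batches"), batch_size = 3)
print(basename(paths))
```

With seven URLs and a batch size of three, the call produces three files, the last holding the single leftover URL.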
| Region | Guiatel/Infobel | ABCtelefonos |
|---|---|---|
| Zaragoza | Yes | Yes |
| Murcia | Yes | Yes |
| Granada | Yes | Yes |
| Asturias | Yes | Yes |
| Almeria | Yes | Yes |
| Albacete | Yes | Yes |
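Combining every surname with every region in the table is a cross product. The sketch below shows one way to express that in base R. The URL template is a placeholder on `example.com`: the actual query formats of Guiatel/Infobel and ABCtelefonos are not documented in this README, and `build_urls` is a hypothetical name.

```r
# Regions from the table above.
regions <- c("Zaragoza", "Murcia", "Granada", "Asturias", "Almeria", "Albacete")

# Hypothetical helper: the template is a stand-in, not a real directory URL.
build_urls <- function(surnames, regions,
                       template = "https://example.com/search?surname=%s&region=%s") {
  # expand.grid varies the first column fastest: all surnames for region 1,
  # then all surnames for region 2, and so on.
  combos <- expand.grid(surname = surnames, region = regions,
                        stringsAsFactors = FALSE)
  # Encode each value individually so the code does not rely on URLencode
  # being vectorized in the installed R version.
  encode <- function(x) {
    vapply(x, utils::URLencode, character(1), reserved = TRUE, USE.NAMES = FALSE)
  }
  sprintf(template, encode(combos$surname), encode(combos$region))
}

urls <- build_urls(c("Ivanov", "Ivanova"), regions)
print(head(urls, 3))
```

Two surnames across six regions give twelve URLs, grouped by region.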
- Surname lists -- scraped from publicly available wiki pages with surname databases.
- Guiatel / Infobel -- Spanish white pages telephone directory (`blancas.paginasamarillas.es`).
- ABCtelefonos -- independent Spanish telephone directory (`abctelefonos.com`).
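As a rough illustration of how a surname list can be pulled from an HTML page with `rvest`, the sketch below parses an inline HTML fragment instead of a live page. The markup and the CSS selector `ul.surnames li a` are assumptions made for the example; the structure of the real wiki pages may differ, and the functions used (`minimal_html()`, `html_elements()`, `html_text2()`) require rvest 1.0 or later.

```r
library(rvest)

# Inline HTML standing in for a wiki page (assumed structure). For a live
# page, read_html("<page URL>") would replace minimal_html(...).
page <- minimal_html('
  <ul class="surnames">
    <li><a href="/wiki/Ivanov">Ivanov</a></li>
    <li><a href="/wiki/Petrov">Petrov</a></li>
    <li><a href="/wiki/Ivanov">Ivanov</a></li>
  </ul>')

# Select the link nodes, take their visible text, and drop duplicates.
links    <- html_elements(page, "ul.surnames li a")
surnames <- unique(trimws(html_text2(links)))
print(surnames)
```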
- Output paths are hardcoded and must be manually adjusted before running.
- The Guiatel/Infobel URL construction is commented out in the source; it requires uncommenting and may need updating if the service has changed its URL structure since 2020.
- No built-in rate limiting or request throttling -- batch files are intended for manual use.
- Surname source is specific to Bulgarian surnames; adapting to other origins requires changing the scraping source.
- The targeted directory services may have changed their structure, imposed CAPTCHAs, or shut down since the script was originally written.
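One possible way around the hardcoded output paths is to resolve them from a single configurable root. The sketch below is an assumption, not part of `search.R`: the helper name `resolve_output_dir`, the environment variable `SEARCH_OUTPUT_DIR`, and the fallback folder `output` are all hypothetical.

```r
# Hypothetical helper: build an output directory under a configurable root
# and create it if it does not exist yet. The root comes from the
# SEARCH_OUTPUT_DIR environment variable (an assumed name), falling back to
# an "output" folder in the working directory.
resolve_output_dir <- function(subdir,
                               root = Sys.getenv("SEARCH_OUTPUT_DIR",
                                                 unset = "output")) {
  out <- file.path(root, subdir)
  dir.create(out, recursive = TRUE, showWarnings = FALSE)
  out
}

# Example: keep one folder per directory service under a temporary root.
out_dir <- resolve_output_dir("abctelefonos", root = tempdir())
print(basename(out_dir))
```

Because `file.path()` joins components with forward slashes, which R accepts on Windows as well, the same call works without editing paths per machine.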
This tool queries publicly available telephone directory services. It is provided strictly for educational and research purposes. Users are responsible for complying with all applicable laws and the terms of service of the queried platforms. The author assumes no liability for misuse.
Automated scraping of directory services may violate their terms of service. Use responsibly and at your own risk.
This project is licensed under the GNU General Public License v3.0.