Ensure the following are installed:
- Python 3.x
- Required Python libraries:
  - pandas
  - beautifulsoup4
  - selenium
  - openpyxl
Install the libraries using:
pip install pandas beautifulsoup4 selenium openpyxl
- Chrome WebDriver: Download the Chrome WebDriver that matches your Chrome version and place it in a known directory.
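Before running the script, it can help to confirm that the libraries are importable. The snippet below is a minimal sketch and not part of the script itself; `missing_modules` and `REQUIRED_MODULES` are hypothetical names introduced here for illustration.

```python
import importlib.util

# Import names for the required packages. Note that the pip package
# "beautifulsoup4" is imported under the name "bs4".
REQUIRED_MODULES = ["pandas", "bs4", "selenium", "openpyxl"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If anything is reported as missing, rerun the pip command above.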
- input_file: The script reads tracking numbers from an Excel file named input_unknown.xlsx. Ensure this file is in the same directory as the script and contains a column named ref_number.
- chrome_driver_path: Update the path to your Chrome WebDriver executable:
chrome_driver_path = r'D:\chromedriver-win64\chromedriver.exe'
- proxy_address: If you need to use a proxy, set the proxy address. The default is socks5://127.0.0.1:8443.
proxy_address='socks5://127.0.0.1:8443'
- The script automatically generates an output file named tracking_resultsN.xlsx, where N is a sequential number starting from 1.
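The output file numbering could work along the following lines. This is only a sketch: how the script actually chooses N is not documented here, so the example assumes it picks the lowest number whose file does not exist yet. `next_output_path` is a hypothetical helper, not a function from the script.

```python
from pathlib import Path


def next_output_path(directory=".", prefix="tracking_results", suffix=".xlsx"):
    """Return the first path <prefix>N<suffix> (N = 1, 2, ...) that does not exist.

    Assumption: the lowest unused N is chosen.
    """
    n = 1
    while True:
        candidate = Path(directory) / f"{prefix}{n}{suffix}"
        if not candidate.exists():
            return candidate
        n += 1
```

Under that assumption, an empty directory yields tracking_results1.xlsx, and each later run that finds existing result files moves on to the next free number.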
After configuring the parameters, run the script. It will:
- Read tracking numbers from the input Excel file.
- Check for already processed tracking numbers.
- Fetch tracking information for each number using Selenium.
- Write the results to the output Excel file.
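The first two steps above can be sketched as follows. The way the script decides that a tracking number is "already processed" is not described here, so this example assumes it means the number already appears in an earlier results file that also has a ref_number column. `load_refs` and `pending_refs` are hypothetical helpers.

```python
def load_refs(path, column="ref_number"):
    """Read one column of reference numbers from an Excel file as strings."""
    import pandas as pd  # third-party; imported here so pending_refs has no dependencies

    df = pd.read_excel(path, dtype=str)  # dtype=str keeps leading zeros intact
    if column not in df.columns:
        raise KeyError(f"{path} has no column named {column!r}")
    return df[column].dropna().tolist()


def pending_refs(all_refs, processed_refs):
    """Return refs not yet processed, in input order, without duplicates."""
    done = {str(ref) for ref in processed_refs}
    pending = []
    for ref in map(str, all_refs):
        if ref not in done:
            done.add(ref)  # also skips repeats within the input
            pending.append(ref)
    return pending
```

Reading the column as strings is a deliberate choice: tracking numbers with leading zeros would otherwise be parsed as integers and lose them.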
Run the script using:
python your_script_name.py
The script logs its progress and results to the console, allowing you to monitor any errors during execution.
- The script is designed to handle multiple tracking numbers concurrently.
- Ensure a stable network connection.
- Adjust the max_workers parameter in the ThreadPoolExecutor to optimize performance based on your system capabilities.
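To illustrate what max_workers controls, here is a minimal sketch of running lookups on a thread pool. The script's own fetching function is not shown in this document, so `fetch_one` is a hypothetical stand-in for it, `fetch_all` is a name made up for this example, and the default of 4 workers is purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(ref_numbers, fetch_one, max_workers=4):
    """Call fetch_one(ref) for every ref on a thread pool; return {ref: result}.

    A failed lookup is recorded as its exception instead of stopping the run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each submitted future back to the reference number it belongs to.
        futures = {executor.submit(fetch_one, ref): ref for ref in ref_numbers}
        for future in as_completed(futures):
            ref = futures[future]
            try:
                results[ref] = future.result()
            except Exception as exc:  # keep going if one lookup fails
                results[ref] = exc
    return results
```

At most max_workers lookups run at the same time. If each worker drives its own browser instance, a higher value also means more memory and CPU use, so it is usually worth increasing it gradually.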