Description
I have a simple UI in Tkinter which fixes several issues WITHOUT changing the core library. If you are interested, it shows some interesting things you can do with icrawler. Yes, it might seem like a mess, but if you are already using icrawler it should be clear. I can write Python and I am learning Tkinter, so suggestions are welcome on my Issues list. Most things work, and I want to add more.
I forked the whole project in case I needed to make fixes, but the UI is all in /examples/:
- FileTypes.py
- FilenameDownloader.py
- GoogleLanguageOptions.py
- iCrawlerTK.py
- iCrawlerTK.yaml
- logging.conf
https://github.com/Patty-OFurniture/icrawler
#98 - the keep_file() override in FilenameDownloader checks the file type; you can return False if extension != "jpg"
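The extension check can be sketched in plain Python. This is a hypothetical helper (`keep_by_extension` and `ALLOWED_EXTENSIONS` are illustrative names, not part of icrawler or the fork). It only looks at the URL, whereas the real override may inspect the downloaded file instead.

```python
import os
from urllib.parse import urlparse

# Extensions to keep; "jpg"/"jpeg" mirrors the example in the text.
ALLOWED_EXTENSIONS = {"jpg", "jpeg"}


def keep_by_extension(file_url, allowed=ALLOWED_EXTENSIONS):
    """Return True if the URL path ends in an allowed extension (case-insensitive).

    Hypothetical helper: a keep_file() override could return this value,
    where False means "discard this file".
    """
    path = urlparse(file_url).path  # ignores ?query and #fragment
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    return ext in allowed
```

Many image URLs have no extension at all (for example `/download?id=3`), so a URL-only check like this rejects them. That is one reason to check the content type or the file's leading bytes instead.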
#111 - an example of how to override set_logger() for full control (commented out in my setup)
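Here is one way an overridden set_logger() might take full control, sketched with only the standard `logging` module. `DemoCrawler` and the logger name are hypothetical, and the sketch assumes the override keeps a `log_level` parameter. A named logger with `propagate = False` leaves the root logger, and therefore the rest of the UI, alone.

```python
import logging
import sys


class DemoCrawler:
    """Hypothetical stand-in for a crawler class; it does not subclass icrawler."""

    def set_logger(self, log_level=logging.INFO):
        # A dedicated named logger, so the root logger (and the UI) is left alone.
        self.logger = logging.getLogger("icrawler_ui_demo")
        self.logger.setLevel(log_level)
        # Only attach a handler once, even if set_logger() is called repeatedly.
        if not self.logger.handlers:
            handler = logging.StreamHandler(sys.stderr)
            handler.setFormatter(
                logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
            self.logger.addHandler(handler)
        # Don't also pass records up to the root logger (avoids duplicate lines).
        self.logger.propagate = False
```

Because the fork ships a logging.conf, another option is to load it with `logging.config.fileConfig("logging.conf")` and have set_logger() just fetch the configured logger.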
#108 - get the file name (from Content-Disposition or the URL)
#108 - also log (INFO) the image number, filename, and URL. You can change the formatting, log to a file, or whatever else you want
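The file-name lookup can be sketched with the standard library alone. `filename_from_response` is a hypothetical name, and the sketch assumes `headers` is a plain dict or a requests `response.headers` object. The Content-Disposition value is parsed with `email.message.Message.get_filename()`, which handles quoting and RFC 2231 encoding.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlparse


def filename_from_response(headers, file_url, default="image"):
    """Pick a file name: Content-Disposition first, then the URL path, then default."""
    disposition = headers.get("Content-Disposition")
    if disposition:
        # Let the stdlib email parser handle quoting and RFC 2231 encoding.
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # basename() drops any directory part a server might send.
            return os.path.basename(name)
    # Fall back to the last path segment of the URL, percent-decoded.
    name = os.path.basename(unquote(urlparse(file_url).path))
    return name or default
```

Once the name is known, the INFO line could be as simple as `logger.info("image #%d: %s %s", index, name, url)`.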
#117 and #107 - log (DEBUG) the Google page content if no images are found, to help track down the cause if it is still a problem
#110 - a similar log could be added for Bing. Not implemented, but easily copied from google.py
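The "log the page when nothing was found" idea can be sketched as a small helper that is not tied to one search engine. `log_if_no_images` and the logger name are hypothetical. The `engine` argument is what would let the same call be reused for Bing.

```python
import logging

logger = logging.getLogger("parser_debug")


def log_if_no_images(image_urls, page_text, engine="google", limit=2000):
    """If a results page yielded no image URLs, log the start of the page at DEBUG.

    The engine name makes the same helper reusable for other parsers (e.g. Bing).
    """
    if not image_urls:
        # Truncate so a huge page doesn't flood the log.
        logger.debug("%s: no images found; page starts with:\n%s",
                     engine, page_text[:limit])
    return image_urls
```

Logging at DEBUG keeps normal runs quiet. A blocked or consent page usually shows up clearly in the first couple of thousand characters.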
#106 - a keyword separator option, so you can enter, for example, "beans|rice" and search first for "beans" and then for "rice", separately
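Splitting the search box on the separator is a one-liner. `split_keywords` is a hypothetical name. It trims spaces and drops empty entries, so a stray `||` does not trigger an empty search.

```python
def split_keywords(text, separator="|"):
    """Split "beans|rice" into ["beans", "rice"], trimming spaces and dropping empties."""
    return [part.strip() for part in text.split(separator) if part.strip()]
```

Each resulting keyword would then get its own crawl, for example one `crawler.crawl(keyword=kw, max_num=...)` call per entry, in the order typed.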
#103 - the Google language selection fix should also help with Baidu, since it adds headers that make requests look more like a web browser and avoid getting flagged.
#104 - Google language selection should help. Common languages are listed in GoogleLanguageOptions.py; add to it if you need to.
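Browser-like headers for a chosen language can be sketched as follows. The `LANGUAGES` table is a hypothetical subset; the fork's GoogleLanguageOptions.py may be structured differently. The User-Agent string is only an example of a desktop browser's.

```python
# Hypothetical subset of a language table; the fork's GoogleLanguageOptions.py
# may be structured differently.
LANGUAGES = {"English": "en", "German": "de", "French": "fr", "Japanese": "ja"}

# An example desktop browser User-Agent string (any current one would do).
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/120.0 Safari/537.36")


def browser_headers(language_name):
    """Build browser-like request headers for the chosen language."""
    code = LANGUAGES.get(language_name, "en")  # unknown names fall back to English
    # Prefer the chosen language, with English as a lower-weighted fallback.
    accept = code if code == "en" else f"{code},en;q=0.8"
    return {"User-Agent": USER_AGENT, "Accept-Language": accept}
```

With requests, such a dict can be merged into a session via `session.headers.update(browser_headers("German"))`.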
#61 - partly fixed: it creates a directory for each keyword, so "rice" goes in storage/rice/ and "beans" in storage/beans/. Hopefully it is a good example.
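Mapping a keyword to its own directory needs a little sanitizing, because keywords can contain characters that are illegal in file names. `keyword_dir` is a hypothetical helper. It replaces the characters Windows rejects and falls back to `_` if nothing usable is left.

```python
import re
from pathlib import Path


def keyword_dir(root, keyword):
    """Return root/<keyword>, replacing characters that are invalid in Windows file names."""
    # Collapse runs of \ / : * ? " < > | into a single underscore.
    safe = re.sub(r'[\\/:*?"<>|]+', "_", keyword).strip(" .") or "_"
    return Path(root) / safe
```

icrawler's built-in crawlers take the output directory at construction time, for example `GoogleImageCrawler(storage={"root_dir": str(keyword_dir("storage", kw))})`, so this approach means one crawler per keyword.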
#121 - a better (but not perfect) check for disk space errors, in the core library
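A disk-space check usually has two parts: an optional pre-check of free space, and recognizing the "disk full" error when a write fails anyway. A pre-check alone can never be perfect, because other programs can fill the disk in between. The names below are hypothetical. `EDQUOT` (quota exceeded) is only added where the platform defines it.

```python
import errno
import shutil

# errno values that mean "out of disk space"; EDQUOT is not defined on every platform.
DISK_FULL_ERRNOS = {errno.ENOSPC}
if hasattr(errno, "EDQUOT"):
    DISK_FULL_ERRNOS.add(errno.EDQUOT)


def is_disk_full_error(exc):
    """True if an exception raised while writing a file means the disk is full."""
    return isinstance(exc, OSError) and exc.errno in DISK_FULL_ERRNOS


def has_free_space(path, needed_bytes, margin=10 * 1024 * 1024):
    """Pre-check: is there room for needed_bytes plus a safety margin (default 10 MiB)?"""
    return shutil.disk_usage(path).free >= needed_bytes + margin
```

In a download loop, `is_disk_full_error()` would be used in an `except OSError` around the write, to stop the crawl cleanly instead of retrying every remaining image.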
Also, image type detection for #108, to find the correct file extension
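Image type detection can be sketched as a check of the file's leading "magic" bytes. This swaps in a plain signature check; the fork's detection may work differently (opening the data with Pillow and reading `.format` is another option). The stdlib `imghdr` module used to do this, but it was deprecated in Python 3.11 and removed in 3.13. `sniff_extension` is a hypothetical name.

```python
# Leading "magic" bytes of common image formats, mapped to a file extension.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def sniff_extension(data, default="jpg"):
    """Guess an image file extension from the first bytes of the content."""
    for signature, ext in SIGNATURES:
        if data.startswith(signature):
            return ext
    # WebP: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return default
```

Only the first 12 bytes are needed, so this works on a partially read response as well as on a saved file.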
Thanks to hellock for the library; I'm just making it easier for me to use!
Have fun!
Patty