
Sample Document URL Schemes

Andy Jackson edited this page Aug 10, 2015 · 2 revisions

The “Document URL Scheme” serves as a filter that keeps unwanted PDFs out of the Watched Target crawl. To define a Document URL Scheme, start with the URL of a Document of interest, remove the leading `http://` or `https://`, and then trim the resulting string to the part that you expect to be common to all related PDFs.

Several examples of this process are given below:

| Property | URL |
| --- | --- |
| Document URL | http://www.ofsted.gov.uk/sites/default/files/documents/surveys-and-good-practice/a/Are%20you%20ready%20Good%20practice%20in%20school%20readiness.pdf |
| Document URL Scheme | www.ofsted.gov.uk/sites/default/files |
| Document URL | http://www.ifs.org.uk/uploads/publications/comms/r96.pdf |
| Document URL Scheme | www.ifs.org.uk/uploads/publications/comms |
| Document URL | https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/325491/family-resources-survey-statistics-2012-2013.pdf |
| Document URL Scheme | www.gov.uk/government/uploads/system/uploads/attachment_data/file |
| Document URL | http://bmdynamics.com/issue_pdf/bmd110501-%2001-14.pdf |
| Document URL Scheme | bmdynamics.com/issue_pdf |
| Document URL | http://www.cipd.co.uk/binaries/labour-market-outlook_2014-summer.pdf |
| Document URL Scheme | www.cipd.co.uk/binaries |
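
The rule described above can be sketched in Python as a rough aid for checking a candidate scheme before entering it. This is a minimal sketch, not the crawler's own code: the function names `strip_protocol`, `matches_scheme` and `common_scheme` are hypothetical, and it assumes that a Document URL Scheme is applied as a simple prefix match against the URL with its protocol removed, which this page does not state explicitly.

```python
def strip_protocol(url: str) -> str:
    """Remove a leading http:// or https:// from a URL, if present."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def matches_scheme(url: str, scheme: str) -> bool:
    """Check a URL against a scheme.

    Assumption: the scheme acts as a plain prefix filter on the
    protocol-less URL. The real crawler may match differently.
    """
    return strip_protocol(url).startswith(scheme)


def common_scheme(urls: list[str]) -> str:
    """Suggest a scheme: the longest directory path shared by all URLs."""
    # Drop the protocol, split into path segments, and discard the file name.
    parts = [strip_protocol(u).split("/")[:-1] for u in urls]
    common = []
    for segments in zip(*parts):
        if all(s == segments[0] for s in segments):
            common.append(segments[0])
        else:
            break
    return "/".join(common)


# A Document URL and its Document URL Scheme taken from the table above.
doc_url = "http://www.ifs.org.uk/uploads/publications/comms/r96.pdf"
scheme = "www.ifs.org.uk/uploads/publications/comms"
print(matches_scheme(doc_url, scheme))
```

`common_scheme` only suggests a starting point from a handful of known Document URLs. The result may still need trimming by hand, for example when a site files related PDFs under several sibling directories, as in the Ofsted example where the scheme stops at `sites/default/files`.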
