Question: What’s the difference between the “URLs Crawled” and the “Pages in Sitemap” values shown in Sitemap Creator (and why are they not the same)?
Answer: By default, Sitemap Creator only includes HTML and PDF content in your XML sitemap file. The “URLs Crawled” field shows the total number of files crawled on your website, including images, CSS, and other content that is not added to the sitemap by default. “Pages in Sitemap” counts only the HTML and PDF files that were actually included, so it is usually the smaller of the two numbers.
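To make the two counts concrete, here is a minimal Python sketch. It is not Sitemap Creator's code: the URL list, the content types, and the count_urls helper are all hypothetical, and the only thing taken from the answer above is the default rule that HTML and PDF content is included.

```python
# Hypothetical crawl results as (URL, content type) pairs.
# This is illustrative data, not real Sitemap Creator output.
crawled = [
    ("http://www.example.com/", "text/html"),
    ("http://www.example.com/about.html", "text/html"),
    ("http://www.example.com/guide.pdf", "application/pdf"),
    ("http://www.example.com/logo.png", "image/png"),
    ("http://www.example.com/style.css", "text/css"),
]

# Assumed default: only HTML and PDF content goes into the sitemap.
SITEMAP_TYPES = {"text/html", "application/pdf"}


def count_urls(results, include_all=False):
    """Return (urls_crawled, pages_in_sitemap) for a list of crawl results."""
    in_sitemap = [
        url for url, content_type in results
        if include_all or content_type in SITEMAP_TYPES
    ]
    return len(results), len(in_sitemap)


print(count_urls(crawled))                    # (5, 3)
print(count_urls(crawled, include_all=True))  # (5, 5)
```

In this sketch the image and the stylesheet are crawled but left out of the sitemap, which is why the first count is 5 and the second is 3. Passing include_all=True mimics an "include everything" setting, and the two counts then match.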
Sitemap Creator can be configured to include all files (regardless of their content type) in your sitemap. To do this, open the Advanced Project Settings window by clicking the ‘Wrench’ button. Select the ‘Other’ tab and check the box marked ‘Include All Content Types’. (If you do not see this option, you may need to upgrade to the latest version of Sitemap Creator.)
If you wish to exclude certain files after enabling ‘Include All Content Types’, add rules to the Exclusion List. For example:
*.xls (to exclude all files ending with .xls)
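As a rough illustration of how a wildcard rule like *.xls behaves, the Python sketch below uses the standard fnmatch module. This is an assumption for illustration only: Sitemap Creator's actual matching rules (for example, whether matching is case-sensitive) are not described here, and the is_excluded helper and the example URLs are hypothetical.

```python
from fnmatch import fnmatchcase

# Example rule taken from the text above.
EXCLUSION_RULES = ["*.xls"]


def is_excluded(url, rules=EXCLUSION_RULES):
    """Return True if the URL matches any wildcard rule (case-sensitive here)."""
    return any(fnmatchcase(url, rule) for rule in rules)


urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/files/report.xls",
    "http://www.example.com/files/report.xlsx",
]

# Keep only the URLs that no rule excludes.
kept = [url for url in urls if not is_excluded(url)]
print(kept)
```

Under these assumptions, only report.xls is dropped. The rule matches URLs that end with .xls, so report.xlsx is kept; a separate *.xlsx rule would be needed to exclude it as well.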