Task: configure the SharePoint search service so that it shows only pages of the following types: documents (.doc/.docx/.pdf), spreadsheets (.xls, .xlsx), and ASPX pages (.aspx).
Instruction:
- First of all, you need to compose a regex for your task. In my case, the regex would be:
http://<host>/.*(.aspx|.doc(x)?|.xls(x)?|.pdf)
I can advise you to use the very helpful site https://regex101.com/ for composing and testing your regular expressions.
- Copy your regex and navigate to SharePoint Admin Center -> Services, find the search service and go to Manage -> Crawl Rules. Add a new crawl rule of the include type (ATTENTION: remove the backslashes from the regex that you got in step 1). Enable the checkbox “Follow complex URLs also“!
- Click Save. On the page where you added the new rule, you can also test some links to see whether a page is covered by this rule.
- Also, you have to add a global exclude rule for all content sources, placed after the include rules in priority (ATTENTION: add the include and exclude rules for all content sources). In my case the exclude rule regex will be:
https://host/.*
- Save all rules and run a full crawl. Check the crawled pages in the crawl log.
- PROFIT!!!
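
Before running a full crawl, it can help to sanity-check the rule order offline. Below is a minimal Python sketch of the idea "include rule first, global exclude rule after it, first matching rule wins". It is only an approximation: the host name `intranet.example.com`, the `RULES` list and the `decide` helper are hypothetical names made up for this illustration, the dots are escaped because this is Python's regex dialect rather than SharePoint's crawl rule syntax, and matching the whole URL with `fullmatch` is an assumption, not a statement about how SharePoint evaluates rules internally.

```python
import re

# Hypothetical host, used only for illustration.
HOST = "intranet.example.com"

# Ordered rules: the include rule comes first, the global exclude rule second.
# Dots are escaped because this is Python regex syntax (assumption: this only
# approximates the crawl rule patterns from the steps above).
RULES = [
    ("include", re.compile(
        rf"http://{re.escape(HOST)}/.*(\.aspx|\.docx?|\.xlsx?|\.pdf)",
        re.IGNORECASE)),
    ("exclude", re.compile(
        rf"http://{re.escape(HOST)}/.*",
        re.IGNORECASE)),
]


def decide(url: str) -> str:
    """Return the action of the first rule whose pattern matches the whole URL."""
    for action, pattern in RULES:
        if pattern.fullmatch(url):
            return action
    return "no rule"


if __name__ == "__main__":
    samples = [
        f"http://{HOST}/sites/hr/Policy.docx",
        f"http://{HOST}/Pages/Home.aspx",
        f"http://{HOST}/sites/hr/logo.png",
        "http://other.example.com/a.pdf",
    ]
    for url in samples:
        print(decide(url), url)
```

If the two rules are swapped in `RULES`, every URL on the host is excluded, which is why the exclude rule has to come after the include rules.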