Topic says it all.
Does anyone have a good robots.txt and .htaccess wordfile to share?
I quickly wrote a syntax highlighting wordfile robots.uew for files with the name robots.txt, using information from the Wikipedia article Robots exclusion standard and from the Web Robots Pages About /robots.txt and A Standard for Robot Exclusion.
The characters * and / are added to the set of delimiters so that they are highlighted even in the middle of a user agent string or URL. Removing them from the delimiters highlights those two characters only when a space is on the left side and a space, line termination, or # is on the right side.
The function string is defined in the old style with UltraEdit regular expression syntax for maximum compatibility with older versions of UltraEdit. It lists all user agents referenced in the file in the function list.
Code:
/L20"robots.txt" Nocase Noquote Line Comment = # File Names = robots.txt
/Delimiters = #*/
/Function String = "%user-agent: ++^([~^p #]+^)"
/C1"Standard fields"
Disallow:
User-agent:
/C2"Nonstandard fields"
Allow:
Crawl-delay:
Host:
Sitemap:
/C3"Asterisk/slash"
*
// /
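For illustration, here is a small robots.txt in the style this definition expects, with a space after each colon; the robot name, paths, and sitemap URL are just made-up examples. The field names, the # line comment, and the * and / characters get highlighted, and the function string should list the referenced user agents (here Googlebot and *) in the function list.

Code:
# Example robots.txt with a space after each colon
User-agent: Googlebot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Allow: /public/
Disallow: /
Sitemap: https://www.example.com/sitemap.xml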
The keyword Nocase can be removed to make the syntax highlighting case-sensitive. Some robots interpret the few field names case-sensitively as defined in the wordfile, while most interpret them case-insensitively as defined in the standard for robot exclusion. The URLs are always interpreted case-sensitively by the robots because the file systems on Unix/Linux web servers are case-sensitive, too.
The space character between the colon after a field name and the value string is optional, but in my opinion highly recommended for better readability. The syntax highlighting definition above requires at least one space on each line between the colon and the value string.
This variant of the wordfile robots.uew allows no space character between the colon and the value string and is therefore 100% conformant to the standard for robot exclusion.
Code:
/L20"robots.txt" Nocase Noquote Line Comment = # File Names = robots.txt
/Delimiters = #*/:
/Function String = "%user-agent: ++^([~^p #]+^)"
/C1"Standard fields"
Disallow
User-agent
/C2"Nonstandard fields"
Allow
Crawl-delay
Host
Sitemap
/C3"Asterisk/slash"
*
// /
/C4"Colon"
:
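The same rules can also be written without a space after the colons, as the standard permits, and this variant should still highlight the field names, the colons, and the * and / characters; the paths below are again just made-up examples.

Code:
# Example robots.txt without a space after the colons
User-agent:*
Disallow:/cgi-bin/
Allow:/cgi-bin/public/
Crawl-delay:5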
Best regards from a UC/UE/UES for Windows user from Austria