Generate a sitemap.xml file for your Neocities site using my simple Sitemap Generator script.

A sitemap.xml file allows you to list and prioritise all your pages so search engines can easily find them. Although most crawlers index all your files automatically, they don't grant you precise control over the ranking of your pages. Plus, an automated listing of pages comes in handy for other purposes, like my random page functionality.

Requires Python 3.5+

Usage

Run as a Python script:

python3 sitemap.py

Unlike most sitemap generators, mine operates using a local site directory. It outputs a sitemap.xml file in said directory, ready to upload to Neocities or a similar service. It is not designed for operation with more advanced setups; there are plenty of great programs that can already do that.
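To make the local-directory approach concrete, here is a minimal sketch of the general idea. It is not the code from sitemap.py: the function name build_sitemap, the restriction to .html files, and the fixed priority of 0.5 are all assumptions made for illustration.

```python
import os
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(site_path, site_url, output_path):
    """Walk a local site folder and write one <url> entry per .html file.

    Hypothetical helper: listing only .html files and giving every page
    a priority of 0.5 are simplifying assumptions.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for folder, dirs, files in os.walk(site_path):
        dirs.sort()  # visit sub-folders in a stable order
        for name in sorted(files):
            if not name.endswith(".html"):
                continue
            full = os.path.join(folder, name)
            # Convert the local path into a forward-slash path under the site root.
            rel = os.path.relpath(full, site_path).replace(os.sep, "/")
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = site_url.rstrip("/") + "/" + rel
            ET.SubElement(url, "priority").text = "0.5"
    ET.ElementTree(urlset).write(output_path, encoding="utf-8", xml_declaration=True)
```

Calling build_sitemap with your local site folder, your public URL, and a destination filename would write a file that can be uploaded alongside the rest of the site.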

Note that a sitemap.xml file is distinct from a sitemap HTML page. Users cannot interact with an XML file; it is provided exclusively for use by other computers.

It is recommended that you use a robots.txt file to point crawlers towards the sitemap; a simple example is provided below:

User-agent: *
Allow: /

Sitemap: http://novov.neocities.org/sitemap.xml
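If you want to confirm that a robots.txt file really advertises the sitemap, something like the sketch below may help. It uses the standard library's urllib.robotparser; note that the site_maps() method needs Python 3.8 or later, which is newer than the script itself requires. The helper name sitemaps_in is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, inlined so the snippet is self-contained.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: http://novov.neocities.org/sitemap.xml
"""


def sitemaps_in(robots_text):
    """Return the list of Sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemaps_in(ROBOTS_TXT))
```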

Configuration

Configuration is read from a sitemap.json file in the same directory as the script. If you don't know how to use JSON, the Wikipedia page is a great starting resource.

All of the entries below are mandatory; there are no default values.

priorities
A dictionary in which each key is a file path beginning from the site root (/) and each value is a decimal number from 0 to 1. Each value indicates a priority for its respective page or folder. Priorities under 0.1 will not be included in the output, and unspecified files will have a value of 0.5.
automate_priorities
A boolean (true/false) value. If true, files in the root directory and index.html files will have their default priority incremented by 0.1 (the homepage being increased by 0.2).
site_path
The root folder of the site on your computer, i.e. where you store the files that you upload. This is an absolute path, beginning from the topmost directory on your computer.
site_url
The URI/URL that visitors use to access your site's content.
output_path
A path (including filename) indicating the output location of sitemap.xml. Crawlers expect it in the root directory; therefore a value of merely sitemap.xml should suffice for almost all uses.
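The rules above can be sketched roughly as follows. This is an illustration rather than the script's actual code: the names load_config and resolve_priority are hypothetical, and two details are assumptions on my part, namely that the longest matching file or folder entry wins, and that automate_priorities only adjusts pages with no explicit entry.

```python
import json

# The five entries described above; all are mandatory.
REQUIRED_KEYS = ("priorities", "automate_priorities", "site_path",
                 "site_url", "output_path")


def load_config(path="sitemap.json"):
    """Load the configuration and fail early if an entry is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("sitemap.json is missing: " + ", ".join(missing))
    return config


def resolve_priority(page, priorities, automate):
    """Return the priority for a path such as "/gallery/wca.html",
    or None if the page should be left out of the sitemap."""
    # Assumption: the longest matching file or folder entry wins.
    best = None
    for entry in priorities:
        prefix = entry.rstrip("/")
        if page == prefix or page.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best.rstrip("/")):
                best = entry
    if best is not None:
        value = priorities[best]
    else:
        value = 0.5  # default for unspecified files
        if automate:
            if page == "/index.html":
                value += 0.2  # the homepage
            elif page.count("/") == 1 or page.endswith("/index.html"):
                value += 0.1  # root-level files and folder index pages
    value = round(value, 1)
    # Priorities under 0.1 are excluded from the output.
    return value if value >= 0.1 else None
```

Under these assumptions, a page inside a folder listed with a priority of 0 is dropped, the homepage ends up at 0.7, and an ordinary page in a sub-folder stays at 0.5.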

Example

{
	"site_path": "/Users/Novov/Sites/Neocities/",
	"site_url": "https://novov.neocities.org",
	"output_path": "sitemap.xml",
	"priorities": {
		"/resources": 0,
		"/images": 0,
		"/fonts": 0,
		"/styles": 0,
		"/not_found.html": 0,
		"/random.html": 0,
		"/gallery/noaccessa.html": 0.1,
		"/gallery/noaccessb.html": 0.1,
		"/gallery/noaccessc.html": 0.1,
		"/gallery/wca.html": 0.1,
		"/gallery/wcb.html": 0.1,
		"/gallery/infodesk.html": 0.1
	},
	"automate_priorities": true
}

This configuration file produces the results at https://novov.neocities.org/sitemap.xml.