Generate a sitemap.xml file for your Neocities site using my simple Sitemap Generator script.
A sitemap.xml file allows you to list and prioritise all your pages so that search engines can easily find them. Although most crawlers index all your files automatically, they don't grant you precise control over the ranking of your pages. Plus, an automated listing of pages comes in handy for other purposes, like my random page functionality.
Requires Python 3.5+
Usage
Run as a Python script:
python3 sitemap.py
Unlike most sitemap generators, mine operates on a local site directory. It outputs a sitemap.xml file in said directory, ready to upload to Neocities or a similar service. It is not designed for more advanced setups; there are plenty of great programs that can already do that.
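To make the local-directory approach concrete, here is a minimal sketch of how such a generator might walk a folder and emit sitemap XML. It is not the actual sitemap.py; the function name build_sitemap, the extensions parameter, and the example URL are assumptions made for illustration, and priorities are left out.

```python
import os
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_sitemap(site_path, site_url, extensions=(".html",)):
    """Return sitemap XML text for every matching file under site_path.

    Hypothetical helper: only lists pages, without priorities.
    """
    urls = []
    for folder, _dirs, files in os.walk(site_path):
        for name in files:
            if not name.endswith(extensions):
                continue
            # Path relative to the site root, using forward slashes for URLs.
            rel = os.path.relpath(os.path.join(folder, name), site_path)
            rel = rel.replace(os.sep, "/")
            # Percent-encode unusual characters; "/" is kept as-is.
            urls.append(site_url.rstrip("/") + "/" + quote(rel))

    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in sorted(urls):
        lines.append("  <url><loc>{}</loc></url>".format(escape(url)))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Calling build_sitemap("/path/to/site", "https://example.neocities.org") returns the XML as a string, which could then be written to sitemap.xml in the site folder.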
Note that a sitemap.xml file is distinct from a sitemap HTML page. Users cannot interact with an XML file; it is provided exclusively for use by other computers.
It is recommended that you use a robots.txt file to point crawlers towards the sitemap; a simple example is provided below:
User-agent: *
Allow: /
Sitemap: http://novov.neocities.org/sitemap.xml
Configuration
Configuration is done through a sitemap.json file in the same directory as the script. If you don't know how to use JSON, the Wikipedia page is a great starting resource.
All of the entries below are mandatory; there are no default values.
priorities - A dictionary in which each key is a file path beginning from the site root (/) and each value is a decimal number from 0 to 1. Each value indicates a priority for its respective page or folder. Entries with a priority under 0.1 will not be included in the output, and unspecified files will have a value of 0.5.
automate_priorities - A boolean (true/false) value. If true, files in the root directory and index.html files will have their default priority incremented by 0.1 (the homepage being increased by 0.2).
site_path - The root folder of the site on your computer, i.e. where you store the files that you upload. This is an absolute path, beginning from the topmost directory on your computer.
site_url - The URI/URL that visitors use to access your site's content.
output_path - A path (including filename) indicating the output location of sitemap.xml. Crawlers expect it in the root directory; therefore a value of merely sitemap.xml should suffice for almost all uses.
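The priority rules described above can be sketched roughly as follows. This is one possible reading, not the script's actual code: the function name resolve_priority is hypothetical, and it assumes that a folder entry applies to everything inside it, that the most specific entry wins, and that the automatic increments only apply to files with no explicit entry.

```python
def resolve_priority(path, priorities, automate_priorities):
    """Return the priority for a site-relative path, or None to omit it.

    path is expected to start with "/", e.g. "/gallery/wca.html".
    """
    # Look for an explicit entry: the file itself, then each parent folder.
    candidate = path
    while candidate:
        if candidate in priorities:
            value = priorities[candidate]
            break
        candidate = candidate.rpartition("/")[0]
    else:
        # No explicit entry found: start from the default of 0.5.
        value = 0.5
        if automate_priorities:
            if path == "/index.html":
                value += 0.2  # homepage
            elif path.count("/") == 1 or path.endswith("/index.html"):
                value += 0.1  # root-level file or a folder's index page

    value = round(value, 1)
    # Anything under 0.1 is left out of the sitemap.
    return None if value < 0.1 else value
```

With an empty priorities dictionary and automation enabled, this gives 0.7 for /index.html, 0.6 for /about.html and /blog/index.html, and 0.5 for /blog/post.html. A folder entry such as "/images": 0 causes every file under /images to be omitted.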
Example
{
  "site_path": "/Users/Novov/Sites/Neocities/",
  "site_url": "https://novov.neocities.org",
  "output_path": "sitemap.xml",
  "priorities": {
    "/resources": 0,
    "/images": 0,
    "/fonts": 0,
    "/styles": 0,
    "/not_found.html": 0,
    "/random.html": 0,
    "/gallery/noaccessa.html": 0.1,
    "/gallery/noaccessb.html": 0.1,
    "/gallery/noaccessc.html": 0.1,
    "/gallery/wca.html": 0.1,
    "/gallery/wcb.html": 0.1,
    "/gallery/infodesk.html": 0.1
  },
  "automate_priorities": true
}
This configuration file produces the results at https://novov.neocities.org/sitemap.xml.
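Because every entry is mandatory, it can help to check a configuration before running the script. The following is a small sketch of such a check, assuming the entries and types described in the Configuration section; check_config and REQUIRED_KEYS are hypothetical names and are not part of sitemap.py.

```python
import json

REQUIRED_KEYS = ("priorities", "automate_priorities",
                 "site_path", "site_url", "output_path")


def check_config(config):
    """Return a list of problems found in a parsed sitemap.json dictionary."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append("missing entry: " + key)

    # JSON true/false becomes a Python bool; a quoted "true" is a string.
    if not isinstance(config.get("automate_priorities", False), bool):
        problems.append("automate_priorities should be true or false")

    priorities = config.get("priorities", {})
    if not isinstance(priorities, dict):
        problems.append("priorities should be a dictionary")
        priorities = {}
    for path, value in priorities.items():
        if not path.startswith("/"):
            problems.append("priority path should start with /: " + path)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            problems.append("priority should be between 0 and 1: " + path)
    return problems
```

For example, check_config(json.loads(text)), where text holds the contents of sitemap.json, returns an empty list when nothing looks wrong.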