From 94de9053e6cc67dc394049d3a4ca510474c89d2c Mon Sep 17 00:00:00 2001
From: inference
Date: Fri, 2 Feb 2024 19:19:55 +0000
Subject: [PATCH] Add file "robots.txt"

"robots.txt" is a file which allows the website owner to disallow bots,
crawlers, scrapers, and other potentially malicious or unwanted automated
access to their website.
---
 robots.txt | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 robots.txt

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..9fe47ef
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,43 @@
+# Inferencium - Website - robots.txt
+# Version: 1.0.0-beta.1
+
+# Copyright 2024 Jake Winters
+# SPDX-License-Identifier: BSD-3-Clause
+
+
+# ChatGPT
+User-agent: ChatGPT-User
+Disallow: /
+
+User-agent: GPTBot
+Disallow: /
+
+
+# Google Bard
+User-agent: Google-Extended
+Disallow: /
+
+
+# iThenticate (http://www.slysearch.com/)
+## A tool which crawls the internet in search of copyright and intellectual property violations
+## which may be of interest to clients. These tools have no right to scan my website for such
+## purposes.
+User-agent: SlySearch
+Disallow: /
+
+
+# NameProtect (http://www.nameprotect.com/botinfo.html)
+## A tool which crawls the internet in search of brand and intellectual property violations which
+## may be of interest to clients. These tools have no right to scan my website for such purposes.
+User-agent: NPBot
+Disallow: /
+
+
+# Turnitinbot (http://www.turnitin.com/robot/crawlerinfo.html)
+## A tool which scans the internet to allow educational institutions to compare content against
+## students' work in order to prevent plagiarism. These tools set a bad precedent for open-source
+## content, as it may be marked as copyrighted/plagiarised when it is actually legally available
+## for use under the copyright holder's license. I allow complete usage of my content for
+## educational purposes, without exception.
+User-agent: Turnitinbot
+Disallow: /
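
As a quick illustration of how a compliant crawler interprets the "Disallow: /" groups added
above, here is a minimal sketch using Python's standard-library robots.txt parser. The reduced
rule set mirrors two of the groups from the patch; the example URL and file name are illustrative
assumptions, not content from this repository.

# robots_check.py - illustrative sketch only, not part of the patch.
from urllib.robotparser import RobotFileParser

# A reduced copy of two groups from the robots.txt added above.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: Turnitinbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "Disallow: /" blocks every path for the named agent.
print(parser.can_fetch("GPTBot", "https://example.org/index.html"))        # False

# Agents with no matching group (and no "User-agent: *" fallback) remain allowed.
print(parser.can_fetch("SomeOtherBot", "https://example.org/index.html"))  # True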