New AI classifier for indicating AI-written text

OpenAI has introduced a classifier trained to distinguish between AI-generated text and human-written text.

OpenAI has trained a classifier to distinguish between text written by humans and text generated by AI systems from a variety of providers. While it is not possible to detect all AI-generated text with complete accuracy, we believe that good classifiers can help address false claims that AI-generated text was written by a human, for example in automated misinformation campaigns, academic dishonesty, and presenting an AI chatbot as a human.

It’s important to note that our classifier is not infallible. In evaluations conducted on an English text “challenge set,” our classifier accurately identified 26% of AI-generated text as “likely AI-written” (true positives), but also misclassified human-written text as AI-generated 9% of the time (false positives). The reliability of our classifier generally improves as the length of the input text increases. Compared to our previous classifier, this new version demonstrates significantly greater reliability when processing text from more recent AI systems.
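
To make the reported rates concrete, the sketch below shows how true positive and false positive rates of this kind are computed. The helper function and toy labels are purely illustrative and are not drawn from OpenAI's evaluation data.

    # Minimal sketch of how rates like those above are computed.
    # The toy labels and predictions below are illustrative only; they are
    # not drawn from OpenAI's evaluation data.
    def classification_rates(labels, predictions):
        """labels/predictions use 1 for AI-written and 0 for human-written."""
        tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
        ai_total = sum(1 for y in labels if y == 1)
        human_total = sum(1 for y in labels if y == 0)
        tpr = tp / ai_total if ai_total else 0.0        # share of AI text flagged as AI
        fpr = fp / human_total if human_total else 0.0  # share of human text flagged as AI
        return tpr, fpr

    # Toy example: 4 AI-written and 4 human-written samples.
    labels      = [1, 1, 1, 1, 0, 0, 0, 0]
    predictions = [1, 0, 0, 0, 1, 0, 0, 0]
    tpr, fpr = classification_rates(labels, predictions)
    print(f"true positive rate: {tpr:.0%}, false positive rate: {fpr:.0%}")

In this framing, the 26% figure is the share of AI-written samples the classifier flags, and the 9% figure is the share of human-written samples it flags by mistake.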

To gather feedback on the usefulness of imperfect tools like this one, we are making this classifier available to the public. We continue to work on improving the detection of AI-generated text and aim to share enhanced methods in the future.

Limitations

The OpenAI classifier has a number of important limitations. It should not be used as a primary decision-making tool, but instead as a complement to other methods of determining the source of a piece of text.

  1. The classifier is very unreliable on short texts (below 1,000 characters); a conservative length pre-check in this spirit is sketched just after this list. Even longer texts are sometimes incorrectly labeled by the classifier.
  2. Sometimes human-written text will be incorrectly but confidently labeled as AI-written by our classifier.
  3. We recommend using the classifier only for English text. It performs significantly worse in other languages and it is unreliable on code.
  4. Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.
  5. AI-written text can be edited to evade the classifier. Classifiers like ours can be updated and retrained based on successful attacks, but it is unclear whether detection has an advantage in the long-term.
  6. Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.
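
As a rough illustration of the first limitation, one conservative way to apply a classifier like this is to decline to classify short inputs at all. In the sketch below, `classify_text` is a hypothetical stand-in for the classifier, and the wrapper and its return strings are not part of OpenAI's tooling.

    # Rough illustration of limitation 1: refuse to classify short inputs.
    # `classify_text` is a hypothetical stand-in for an AI-text classifier;
    # the 1,000-character cutoff mirrors the guidance above.
    MIN_CHARS = 1000

    def guarded_classification(text, classify_text):
        if len(text) < MIN_CHARS:
            return "too short to classify reliably"
        # Even above the cutoff, treat the label as one signal among several,
        # not as a primary decision-making tool.
        return classify_text(text)

    # Dummy classifier used only to show the guard in action.
    print(guarded_classification("Short snippet.", lambda t: "likely AI-written"))
    # -> too short to classify reliably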

Training the classifier

OpenAI’s classifier is a language model fine-tuned on a dataset of pairs of human-written and AI-written text on the same topic. The dataset was collected from a variety of sources that we believe were written by humans, such as pretraining data and human demonstrations written for prompts submitted to InstructGPT. Each text was divided into a prompt and a response, and responses were generated from those prompts using a range of language models trained by OpenAI and other organizations. For our web app, we adjust the confidence threshold to keep the false positive rate low: text is labeled as likely AI-written only when the classifier is highly confident.
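
The confidence thresholding described above can be pictured as a simple post-processing step on the classifier's probability output. The sketch below is an assumption-laden illustration: the 0.90 cutoff, the `ai_probability` callable, and the label names are hypothetical, not OpenAI's actual thresholds or labels.

    # Assumption-laden sketch of the thresholding step described above.
    # The 0.90 cutoff, the `ai_probability` callable, and the label names are
    # hypothetical; they are not OpenAI's actual thresholds or labels.
    LIKELY_AI_THRESHOLD = 0.90

    def label_text(text, ai_probability):
        p = ai_probability(text)  # estimated probability that the text is AI-written
        if p >= LIKELY_AI_THRESHOLD:
            return "likely AI-written"  # only reported at high confidence
        if p >= 0.5:
            return "unclear if it is AI-written"
        return "unlikely AI-written"

    # Dummy probability function used only to exercise the labels.
    print(label_text("example passage", lambda t: 0.95))  # -> likely AI-written

Raising the threshold trades recall for precision: fewer AI-written texts are flagged, but human-written text is less likely to be mislabeled.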

Impact on educators and call for input

OpenAI recognizes the importance of identifying AI-written text in educational settings, as well as the broader impact of AI-generated text classifiers in classrooms. As part of our commitment to responsible AI deployment, we have created a preliminary resource for educators that outlines the potential applications, limitations, and considerations of using ChatGPT in an educational context.

While our focus is initially on engaging with educators, we understand that our classifier and associated tools will also have implications for journalists, researchers studying misinformation and disinformation, and other relevant groups.

OpenAI is actively engaging with educators in the United States to gather insights into their experiences and to foster discussions regarding ChatGPT’s capabilities and limitations. We are committed to expanding our outreach efforts as we continue to learn and develop our understanding of the challenges at hand. These conversations are crucial as we strive to deploy large language models in a safe manner, while maintaining direct engagement with affected communities.

If you are directly impacted by these issues, including but not limited to teachers, administrators, parents, students, and education service providers, we encourage you to share your feedback through the provided form. We greatly appreciate direct feedback on the preliminary resource, and we also welcome any additional resources that educators have developed or found useful. This may include course guidelines, updates to honor codes and policies, interactive tools, or AI literacy programs. Your contributions will help us refine our approaches and ensure that our AI technologies align with the needs and values of the education community.

 
