We’re introducing a new and improved content moderation tool: the Moderation endpoint improves upon our previous content filter, and is available for free today to OpenAI API developers.
To help developers protect their applications against possible misuse, we’re introducing the faster and more accurate Moderation endpoint. This endpoint provides OpenAI API developers with free access to GPT-based classifiers that detect undesired content, an instance of using AI systems to assist with human supervision of these systems. We have also released both a technical paper describing our methodology and the dataset used for evaluation.
When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, violent, or promotes self-harm, all of which is prohibited by our content policy. The endpoint has been trained to be quick, accurate, and to perform robustly across a range of applications. Importantly, this reduces the chances of products “saying” the wrong thing, even when deployed to users at scale. As a consequence, AI can unlock benefits in sensitive settings, like education, where it could not otherwise be used with confidence.
[Figure: the Moderation endpoint and its four content categories: violence, self-harm, hate, and sexual.]
The Moderation endpoint helps developers benefit from our infrastructure investments. Rather than build and maintain their own classifiers (an extensive process, as we document in our paper), they can instead access accurate classifiers through a single API call.
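As a rough illustration of what that single call might look like, here is a minimal Python sketch using only the standard library. The URL and the `results` / `flagged` / `categories` field names follow the public API documentation as we understand it and may change; the helper names (`build_request`, `flagged_categories`, `moderate`) and the sample response are hypothetical and written by hand for this example, not real model output.

```python
import json
import urllib.request

# Assumption: endpoint URL as given in the public API documentation.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_request(text, api_key):
    """Build (but do not send) the POST request for one text input."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def flagged_categories(response):
    """Return the sorted names of categories marked true in the first result."""
    result = response["results"][0]
    return sorted(name for name, hit in result["categories"].items() if hit)


def moderate(text, api_key):
    """Send the request and decode the JSON reply (needs network and a valid key)."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return json.load(resp)


# Hand-written sample in the documented response shape, for illustration only.
# The category list is limited to the four named in this post.
SAMPLE_RESPONSE = {
    "results": [
        {
            "flagged": True,
            "categories": {
                "hate": False,
                "self-harm": False,
                "sexual": False,
                "violence": True,
            },
        }
    ]
}
```

With a valid key, `flagged_categories(moderate("some text", api_key))` would list whichever categories the classifier marked for that text; the sample above can be passed to `flagged_categories` directly to try the parsing step offline.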
As part of OpenAI’s commitment to making the AI ecosystem safer, we are providing this endpoint to allow free moderation of all OpenAI API-generated content. For instance, Inworld, an OpenAI API customer, uses the Moderation endpoint to help their AI-based virtual characters remain appropriate for their audiences. By leveraging OpenAI’s technology, Inworld can focus on their core product: creating memorable characters. We currently do not support monitoring of third-party traffic.
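One way an application might use this for generated content is to check each model output before showing it to a user. The sketch below is an assumption about how such a gate could be wired, not a description of Inworld's or any other customer's integration. The names `release_if_clean` and `keyword_stub` are hypothetical, and the stub is a trivial stand-in for a real call to the Moderation endpoint so that the example runs offline.

```python
from typing import Callable


def release_if_clean(
    generated: str,
    is_flagged: Callable[[str], bool],
    fallback: str = "[response withheld]",
) -> str:
    """Return the generated text unless the moderation check flags it."""
    return fallback if is_flagged(generated) else generated


def keyword_stub(text: str) -> bool:
    """Offline stand-in for a moderation call; NOT a real classifier."""
    return "attack" in text.lower()
```

In a real deployment, `is_flagged` would wrap a request to the Moderation endpoint and read the `flagged` field of its response; passing the check in as a parameter keeps the gating logic easy to test without network access.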
Get started with the Moderation endpoint by checking out the documentation. More details of the training process and model performance are available in our paper. We have also released an evaluation dataset, featuring Common Crawl data labeled within these categories, which we hope will spur further research in this area.