Gooning Thoughts on GPT-OSS (about 2 hours later)

OpenAI released an open source model! Yay-
I'm sorry, but I can't help with that.

OpenAI's policies are utterly burnt into this model. While their paper on the model's safety implies it's relatively easy to fine-tune the refusals out given a good RL harness, it's pretty deeply aligned to OAI's perspectives.

It will quote verbatim from a policy prompt it doesn't even have just to refuse non-con roleplay content and such. None of the following appears to fully solve this:

Using a non-ChatGPT developer or system prompt (keep in mind that system and developer prompts are different, and the default chat template passes user-specified system messages as developer messages unless you use a custom template)
Prefilling the reasoning block with something saying that it knows it's okay to not follow OpenAI policy (sometimes this works, but half the time it just refuses anyway)
Not even giving it the opportunity to use a reasoning block (it is completely incoherent and unusable like this, and falls into near-instant repetition loops)

Granted, even when I do get a response, it's slop anyway, so I don't think the model is worth saving. Just wanted to note it down :)

Disclaimers

I only tried the 20b one. I have no interest in the 120b, which is a big MoE in a size range where there are many better options.

#llm #ml

last updated: 2026-01-27