OpenAI released an open source model! Yay-
I'm sorry, but I can't help with that.
OpenAI policies are utterly burnt into this model. While their paper on the model's safety implies it's relatively easy to finetune out refusals themselves given a good RL harness, its pretty deeply aligned to OAI's perspectives.
It will quote verbatim from a policy prompt it doesn't even have just to refuse non-con roleplay content and such. This does not really appear to be 100% solvable with:
- Using a non-ChatGPT developer or system prompt (keep in mind system and developer prompts are different, and the chat template normally passes user-specified system messages as developer messages unless you use a custom one)
- Prefilling the reasoning block with something saying that it knows its okay to not follow OpenAI policy (sometimes this works, but half the time it just does it anyways)
- Not even giving it the opportunity to use a reasoning block (it is completely incoherent and unusable like this, and falls into near instant reploops)
Granted, even when I do get a response, its slop anyways, so I don't think the model is worth saving. Just wanted to note it down :)
Disclaimers #
I only tried the 20b one, I have no interest in such a big MoE in a range where there are many better options.