GTPO AND GRPO-S: TOKEN AND SE | Pangram Labs