RM-R1: Reward Modeling as Rea | Pangram Labs