TRANSITIVE RL: Value Learning | Pangram Labs