CASEGen: A Benchmark for Mult | Pangram Labs