$\tau^{2}$-BENCH: EVALUATING | Pangram Labs