Evals will break (wanglun1996.github.io)
19 points by rajveerb 2 hours ago | 1 comment
rajveerb 2 hours ago [-]
I read through this blog post, and it's timely given how close the models are to maxing out the benchmarks/evals.

One thing that was not addressed, but would be interesting to discuss, is benchmarks/evals that conflict with each other.

Are there desirable emergent behaviors that might not be optimized for because the evals penalize them?
