HumanEval benchmark - testing LLMs on coding
MATH benchmark - testing LLMs on math problems
GPQA benchmark - testing LLMs on graduate-level questions
MMLU benchmark - testing LLMs' multi-task capabilities
MT-Bench: Evaluating LLMs
Chatbot Arena: A Grassroots LLM Evaluation