10 Models × 13 Benchmarks × 5 Frameworks: complete evaluation results from the EffGen paper
From the NeurIPS 2025 paper
EffGen consistently outperforms LangChain, AutoGen, and smolagents across all 10 models and 13 benchmarks, with the largest gains on smaller models, where optimization matters most.