
GPT-4 With Reflexion Has a Superior Coding Score
A slightly improved Reflexion-based GPT-4 agent achieves a state-of-the-art pass@1 score of 88% on HumanEval, outperforming GPT-4 (67.0%) and CodeT: Code Generation with Generated Tests (65.8%), the previous state-of-the-art results.

Relaxing Success Evaluation

By using Reflexion to iteratively refine the current implementation, researchers are shifting the “accuracy bottleneck” from correct syntactic and semantic code generation …
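
To make the idea of iterative refinement concrete, here is a minimal sketch of a generate, test, reflect loop in Python. It is not the agent described in the post: the names `run_tests`, `refine_until_passing`, and `toy_generate` are hypothetical, the test cases are hand-written for the illustration, and the language model is replaced by a stub that returns a buggy candidate first and a corrected one once it has seen feedback.

```python
from typing import Callable, List, Optional, Tuple

Case = Tuple[tuple, object]  # (positional arguments, expected result)


def run_tests(source: str, func_name: str, cases: List[Case]) -> List[str]:
    """Load candidate source and return failure messages (empty list = all passed)."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only; never exec untrusted code unsandboxed
    except Exception as exc:
        return [f"code did not load: {exc!r}"]
    func = namespace.get(func_name)
    if not callable(func):
        return [f"{func_name} is not defined"]
    failures = []
    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def refine_until_passing(
    generate: Callable[[List[str]], str],
    func_name: str,
    cases: List[Case],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for a candidate, test it, and feed failures back until tests pass."""
    reflections: List[str] = []  # textual feedback accumulated across rounds
    for round_number in range(1, max_rounds + 1):
        candidate = generate(reflections)
        failures = run_tests(candidate, func_name, cases)
        if not failures:
            return candidate, round_number
        reflections.append("; ".join(failures))
    return None, max_rounds


def toy_generate(reflections: List[str]) -> str:
    """Hypothetical stand-in for a model call: wrong at first, fixed after feedback."""
    if not reflections:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


cases = [((1, 2), 3), ((0, 0), 0)]
solution, rounds = refine_until_passing(toy_generate, "add", cases)
print(f"passed after {rounds} round(s)")
```

In this sketch the loop can only be as reliable as the tests it is given, since a candidate is accepted as soon as every supplied case passes.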