LLM Agents

[SICA] A Self-Improving Coding Agent

Arxiv: https://arxiv.org/abs/2504.15228 14 Apr 2025

The Self-Improving Coding Agent (SICA) is a framework where an agent autonomously edits its own Python codebase to improve performance on benchmarks. Similar to AlphaDev's search for optimal primitives, SICA uses empirical evaluation to evolve its tools and reasoning capabilities without human intervention or gradient-based learning.

The core loop involves benchmarking the current agent (e.g., on SWE-Bench), archiving results, and selecting the best-performing version to act as a meta-agent.

graph TD
    Bench[Benchmark Agent] --> Archive[(Archive Results)]
    Archive --> Select[Select Best as Meta-Agent]
    Select --> Analyze{Analyze Traces}
    Analyze --> Identify[Identify Failures & Tools]
    Identify --> SubAgents[Invoke Sub-Agents]
    SubAgents --> Refactor[Edit agent_code/]
    Refactor --> Overseer{Overseer Safety}
    Overseer --> |Pass| Updated[Updated Agent]
    Overseer --> |Fail| SubAgents
    Updated --> Bench
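The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `AgentVersion`, `self_improvement_loop`, and the three callables (`benchmark`, `propose_edit`, `passes_overseer`) are hypothetical names, and the retry limit of three is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentVersion:
    """One archived iteration of the agent (hypothetical structure)."""
    version: int
    code: str            # stand-in for the contents of agent_code/
    score: float = 0.0   # benchmark score recorded for this version


def self_improvement_loop(
    initial_code: str,
    benchmark: Callable[[str], float],
    propose_edit: Callable[[AgentVersion, List[AgentVersion]], str],
    passes_overseer: Callable[[str], bool],
    iterations: int,
) -> List[AgentVersion]:
    """Benchmark, archive, select the best version as meta-agent, edit, repeat."""
    archive: List[AgentVersion] = []
    candidate = initial_code
    for i in range(iterations):
        # 1. Benchmark the current candidate and archive the result.
        archive.append(AgentVersion(i, candidate, benchmark(candidate)))
        # 2. The best-scoring archived version acts as the meta-agent.
        meta_agent = max(archive, key=lambda v: v.score)
        # 3. The meta-agent proposes an edit; retry (assumed limit: 3) until
        #    the overseer check passes.
        for _ in range(3):
            candidate = propose_edit(meta_agent, archive)
            if passes_overseer(candidate):
                break
        else:
            # Every proposal was rejected: fall back to the best known code.
            candidate = meta_agent.code
    return archive
```

In a real system `benchmark` would run a suite such as SWE-Bench and `propose_edit` would involve LLM calls; here they are plain callables so the control flow stays visible.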

How Improvement Happens: After running benchmarks, the meta-agent analyzes the execution traces, identifying specific successes and failures. It doesn't just "guess" updates; it:

Analyzes Traces: Investigates where the agent got stuck or failed.
Identifies Improvements: Pinpoints needed tools, like smarter AST-based symbol locators or diff minimizers.
Invokes Sub-Agents: Calls specialized agents (e.g., software developer, archive explorer) to propose, implement, test, and verify code changes in its own agent_code/ directory.
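As an illustration of the kind of tool mentioned above, here is a minimal sketch of an AST-based symbol locator built on Python's standard `ast` module. The function name `locate_symbol` and its return format are assumptions made for this example; the notes do not show the tool SICA actually wrote for itself.

```python
import ast
from typing import List, Tuple


def locate_symbol(source: str, name: str) -> List[Tuple[str, int]]:
    """Return (kind, line number) for every definition of `name` in `source`."""
    hits: List[Tuple[str, int]] = []
    # ast.walk visits every node, so nested methods and classes are found too.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            hits.append(("function", node.lineno))
        elif isinstance(node, ast.ClassDef) and node.name == name:
            hits.append(("class", node.lineno))
    # Sort by line number so results follow file order.
    return sorted(hits, key=lambda hit: hit[1])
```

Compared with a plain text search, parsing the file avoids false matches in comments, strings, and call sites, which is one plausible reason an agent would prefer this kind of tool.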

Safety and Continuity: To prevent the agent from getting stuck in infinite loops or unproductive paths, SICA employs an overseer component. The overseer monitors the agent as it iteratively refactors its own logic, keeping it on track and within safety boundaries.
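The notes do not describe how the overseer is implemented, so the following is only a rule-based stand-in that shows the idea of loop and budget detection. The class name `Overseer`, its thresholds, and the `"continue"`/`"cancel"` verdicts are all assumptions for illustration.

```python
from collections import Counter
from typing import List


class Overseer:
    """Hypothetical watchdog: flags runs that loop or exceed a step budget."""

    def __init__(self, max_steps: int = 50, max_repeats: int = 3) -> None:
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.history: List[str] = []

    def check(self, action: str) -> str:
        """Record an action and return 'continue' or 'cancel'."""
        self.history.append(action)
        if len(self.history) > self.max_steps:
            return "cancel"  # step budget exhausted
        if Counter(self.history)[action] > self.max_repeats:
            return "cancel"  # same action repeated too often: likely a loop
        return "continue"
```

The agent loop would call `check` before executing each action and abort the run on `"cancel"`.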

The paper reports performance rising from 17% to 53% on a subset of SWE-Bench Verified, which suggests that autonomous, reflection-driven code updates can yield substantial gains.

[AutoGen] Enabling Next-Gen LLM Applications via Multi-Agent Chat

Arxiv: https://arxiv.org/abs/2308.08155 3 Oct 2023 Microsoft

AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. The framework enables:

Multi-agent conversations where agents can collaborate to solve complex tasks
Customizable agent behaviors and capabilities
Flexible agent interaction patterns
Integration with various LLM backends
Support for both synchronous and asynchronous communication

Key Features:

Conversable Agents: Agents that can engage in natural language conversations
Task-Oriented Dialogues: Structured conversations aimed at completing specific tasks
Dynamic Agent Teams: Ability to form and modify agent teams based on task requirements
Extensible Architecture: Easy integration of new agent types and capabilities
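The conversation pattern described above can be illustrated without any framework. The sketch below deliberately does not use AutoGen's API: `ToyAgent`, `run_chat`, and the `TERMINATE` stop word are hypothetical names chosen for this example, and each agent's reply function stands in for an LLM call.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (sender name, message text)


class ToyAgent:
    """Toy agent that wraps a reply function (an LLM call in a real system)."""

    def __init__(self, name: str, reply_fn: Callable[[List[Message]], str]) -> None:
        self.name = name
        self.reply_fn = reply_fn

    def reply(self, history: List[Message]) -> str:
        return self.reply_fn(history)


def run_chat(
    initiator: ToyAgent,
    responder: ToyAgent,
    task: str,
    max_turns: int = 6,
    stop_word: str = "TERMINATE",
) -> List[Message]:
    """Alternate turns between two agents until one emits the stop word."""
    history: List[Message] = [(initiator.name, task)]
    speaker, listener = responder, initiator
    for _ in range(max_turns):
        text = speaker.reply(history)
        history.append((speaker.name, text))
        if stop_word in text:
            break
        # Hand the turn to the other agent.
        speaker, listener = listener, speaker
    return history


coder = ToyAgent("coder", lambda history: "def add(a, b): return a + b")
reviewer = ToyAgent("reviewer", lambda history: "Looks correct. TERMINATE")
transcript = run_chat(reviewer, coder, "Write add(a, b).")
```

Here the reviewer opens with the task, the coder answers, and the reviewer ends the conversation; `max_turns` bounds the exchange if no agent ever emits the stop word.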

[RetroFormer] Retrospective Large Language Agents with Policy Gradient Optimization

Arxiv: https://arxiv.org/abs/2308.02151 4 Aug 2023 Salesforce

This paper introduces Retroformer, a principled framework for reinforcing language agents by learning a plug-in retrospective model, which automatically refines the language agent's prompts from environment feedback through policy optimization. The proposed architecture can learn from arbitrary reward information across multiple environments and tasks. It iteratively fine-tunes a pre-trained language model (the retrospective model), which refines the agent's prompts by reflecting on failed attempts and assigning credit to the agent's actions for future rewards.

graph LR
    Env[Environment] -- Feedback/Reward --> Retro[Retrospective Model]
    Retro -- Refined Prompt --> Agent[LLM Agent]
    Agent -- Action --> Env

    subgraph "Policy Optimization"
        Retro
    end

Key Components:

Retrospective Model: Learns from environment feedback to improve agent performance
Policy Optimization: Refines agent prompts based on reward signals
Credit Assignment: Analyzes the impact of actions on future rewards
Multi-Environment Learning: Adapts to various tasks and environments
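To make the policy-optimization step concrete, here is a toy REINFORCE sketch. It replaces the retrospective language model with a softmax policy over a fixed set of candidate reflections, and it assumes the reward for a reflection is the change in episode return between consecutive attempts. Both simplifications, along with the name `train_retrospective_policy` and its parameters, are assumptions for illustration rather than the paper's training setup.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_retrospective_policy(
    returns_after_reflection: List[float],
    return_before: float = 0.0,
    steps: int = 500,
    lr: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """Learn which candidate reflection most improves the next attempt."""
    rng = random.Random(seed)
    logits = [0.0] * len(returns_after_reflection)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample the reflection that is added to the agent's next prompt.
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        # Assumed reward: improvement in episode return after the reflection.
        reward = returns_after_reflection[choice] - return_before
        # REINFORCE: d log pi(choice) / d logit_k = 1[k == choice] - p_k.
        for k in range(len(logits)):
            indicator = 1.0 if k == choice else 0.0
            logits[k] += lr * reward * (indicator - probs[k])
    return softmax(logits)


# Three hypothetical reflections; the third leads to the best next attempt.
final_probs = train_retrospective_policy([0.0, 0.2, 1.0])
```

After training, most of the probability mass should sit on the reflection with the largest improvement, which mirrors how reward feedback steers the retrospective model toward more useful prompt refinements.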

Figure: RetroFormer Table

Figure: RetroFormer Agent