Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
🚧 Code coming soon. Stay tuned!
We present Agent Banana, an agentic planner–executor framework for high-fidelity, object-aware image editing driven by agentic thinking and tool use. Agent Banana offers these key features:
- 🔥 Framework: Agent Banana couples high-level reasoning with tool-use capabilities to decompose complex requests into atomic sub-edits. It employs "Photoshop-style" layer isolation and masking to ensure precise modifications while preserving non-target content.
- 🔥 Context Folding: To enable stable long-horizon control, we introduce Context Folding, which compresses long interaction histories into structured memory. This allows the system to track state changes effectively and support rollback/replanning across multi-turn interactions.
- 🔥 Image Layer Decomposition: We propose Image Layer Decomposition to perform edits on isolated high-resolution layers. This approach prevents drift across iterations and ensures that edits are applied at native resolution without downsampling artifacts.
- 🔥 HDD-Bench: We release HDD-Bench, a high-definition, dialogue-based benchmark featuring verifiable stepwise targets and native 4K images. Unlike prior single-turn benchmarks, it supports rigorous diagnosis of long-horizon failures and professional workflow simulation.
- 🔥 Performance: On HDD-Bench, Agent Banana achieves state-of-the-art results in multi-turn consistency and background fidelity (e.g., IC 0.871, SSIM 0.84) while remaining competitive on instruction following.
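
Until the official code is released, the two ideas above that are easiest to picture in code are region-isolated editing and a compact per-turn memory that supports rollback. The toy sketch below illustrates both under loud assumptions: it is not the Agent Banana implementation, and every name in it (`edit_region`, `FoldedMemory`, `record`, `rollback`) is hypothetical. Images are plain nested lists of grayscale values, a rectangular box stands in for a mask, and a one-line string per turn stands in for whatever structured memory the real system builds.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale pixels, row-major (stand-in for a real image array)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom and right are exclusive


def edit_region(image: Image, box: Box, op: Callable[[int], int]) -> Image:
    """Apply `op` only inside `box`; every pixel outside the box is copied unchanged."""
    top, left, bottom, right = box
    out = [row[:] for row in image]  # copy, so earlier states stay intact for rollback
    for r in range(top, bottom):
        for c in range(left, right):
            out[r][c] = op(image[r][c])
    return out


@dataclass
class FoldedMemory:
    """Hypothetical memory: one short summary per turn instead of a full transcript."""
    states: List[Image]                                  # states[i] = image after i edits
    summaries: List[str] = field(default_factory=list)   # summaries[i] describes edit i + 1

    def record(self, summary: str, image: Image) -> None:
        self.summaries.append(summary)
        self.states.append(image)

    def rollback(self, turn: int) -> Image:
        """Discard every edit made after `turn` and return the image at that turn."""
        del self.states[turn + 1:]
        del self.summaries[turn:]
        return self.states[-1]


original = [[10] * 4 for _ in range(4)]
memory = FoldedMemory(states=[original])

# Turn 1: brighten only the top-left 2x2 region.
step1 = edit_region(original, (0, 0, 2, 2), lambda p: p + 100)
memory.record("turn 1: brighten top-left object", step1)

# Turn 2: blank only the bottom-right 2x2 region.
step2 = edit_region(step1, (2, 2, 4, 4), lambda p: 0)
memory.record("turn 2: remove bottom-right object", step2)

# The user changes their mind: return to the state after turn 1.
restored = memory.rollback(1)
```

Because each edit touches only its own region and works on a copy at the image's full size, pixels outside the box are identical before and after the edit, and any earlier turn can be restored exactly. A real system would presumably use segmentation masks, full-resolution image arrays, and model-generated summaries rather than these stand-ins.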
@article{ye2026agentbananahighfidelityimage,
title={Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling},
author={Ruijie Ye and Jiayi Zhang and Zhuoxin Liu and Zihao Zhu and Siyuan Yang and Li Li and Tianfu Fu and Franck Dernoncourt and Yue Zhao and Jiacheng Zhu and Ryan Rossi and Wenhao Chai and Zhengzhong Tu},
year={2026},
eprint={2602.09084},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.09084},
}


