MultiCrafter
High-Fidelity Multi-Subject Generation via
Disentangled Attention and Identity-Aware Preference Alignment
Tao Wu¹, Yibo Jiang¹, Yehao Lu¹, Zhizhong Wang², Zeyi Huang², Zequn Qin¹, Xi Li¹†
¹Zhejiang University ²Huawei

arXiv Code (Coming soon)


Abstract

Multi-subject image generation aims to synthesize user-provided subjects in a single image while preserving subject fidelity, ensuring prompt consistency, and aligning with human aesthetic preferences. Existing methods based on in-context learning are limited by their highly coupled training paradigm: they attempt to achieve both high subject fidelity and multi-dimensional human preference alignment within a single training stage, relying on a single, indirect reconstruction loss that cannot easily satisfy both goals at once. To address this, we propose MultiCrafter, a framework that decouples the task into two distinct training stages. First, in a pre-training stage, we introduce an explicit positional supervision mechanism that effectively resolves attention bleeding and drastically enhances subject fidelity. Second, in a post-training stage, we propose Identity-Preserving Preference Optimization, a novel online reinforcement learning framework. It features a scoring mechanism based on the Hungarian matching algorithm that accurately assesses multi-subject fidelity, allowing the model to optimize for aesthetics and prompt alignment while preserving the subject fidelity achieved in the first stage. Experiments validate that our decoupled framework significantly improves subject fidelity while aligning better with human preferences.
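To make the matching-based fidelity score more concrete, here is a minimal sketch of the underlying idea, not the authors' implementation. It assumes each reference subject and each subject detected in the generated image is already represented by an identity embedding (plain lists of floats here), and it uses cosine similarity as the pairwise score. The names `cosine` and `multi_id_score` are hypothetical. For brevity, the sketch finds the optimal one-to-one assignment by brute force over permutations, which returns the same optimum the Hungarian algorithm would for the handful of subjects typical of this task; a real system would use a Hungarian solver instead.

```python
from itertools import permutations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def multi_id_score(ref_embs, gen_embs):
    """Score multi-subject fidelity via an optimal one-to-one matching.

    Returns (mean similarity of the best matching, assignment), where
    assignment[i] is the index of the generated subject matched to
    reference subject i.

    Assumption (not from the source): if fewer subjects are detected than
    were requested, the score is 0.0 and the assignment is empty.
    """
    n = len(ref_embs)
    if n == 0 or len(gen_embs) < n:
        return 0.0, ()

    # Pairwise similarity matrix: rows are references, columns are detections.
    sim = [[cosine(r, g) for g in gen_embs] for r in ref_embs]

    best_total, best_assign = float("-inf"), ()
    # Brute-force stand-in for the Hungarian algorithm: try every way of
    # assigning n distinct detections to the n references.
    for assign in permutations(range(len(gen_embs)), n):
        total = sum(sim[i][j] for i, j in enumerate(assign))
        if total > best_total:
            best_total, best_assign = total, assign

    return best_total / n, best_assign


if __name__ == "__main__":
    refs = [[1.0, 0.0], [0.0, 1.0]]
    gens = [[0.0, 1.0], [1.0, 0.0]]  # same identities, detected in swapped order
    print(multi_id_score(refs, gens))
```

Matching before scoring means the score does not depend on the order in which subjects are detected, and a generated image cannot earn credit by matching one well-rendered subject to several references.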

How does it work?

Overall pipeline of MultiCrafter. Our framework is built on two core innovations: (Top) Identity-Disentangled Attention Regularization uses positional supervision to prevent attribute leakage, together with a MoE-LoRA architecture that boosts model capacity for diverse scenarios; and (Bottom) the Identity-Preserving Preference Alignment framework employs a novel online reinforcement learning strategy, combining a Multi-ID Alignment Reward with the stable GSPO algorithm, to align the model with human preferences.
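As a rough illustration of the GSPO-style update mentioned in the caption, the sketch below computes a clipped surrogate objective over a group of samples generated for the same prompt. It follows the general shape of Group Sequence Policy Optimization (group-normalized advantages and a length-normalized, sequence-level importance ratio) and is not the authors' implementation. How MultiCrafter adapts GSPO to image generation is not described on this page, so the function names, the `clip` default, and the interpretation of `lengths` (for example, the number of sampling steps per trajectory) are all assumptions.

```python
from math import exp
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def gspo_objective(logp_new, logp_old, lengths, rewards, clip=0.2):
    """Clipped sequence-level surrogate objective for one group of samples.

    logp_new / logp_old: total log-probability of each sample under the
        current and the sampling-time policy.
    lengths: per-sample length used to normalize the ratio (assumed here to
        be the number of generation steps).
    rewards: scalar reward per sample, e.g. a weighted mix of identity,
        aesthetic, and prompt-alignment scores (the mix is hypothetical).
    """
    advantages = group_advantages(rewards)
    terms = []
    for lp_new, lp_old, n, adv in zip(logp_new, logp_old, lengths, advantages):
        # Sequence-level ratio, normalized by length so long and short
        # trajectories are on a comparable scale.
        ratio = exp((lp_new - lp_old) / n)
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        # Pessimistic choice between the raw and the clipped term.
        terms.append(min(ratio * adv, clipped * adv))
    return mean(terms)


if __name__ == "__main__":
    # Two samples for one prompt; the second has the higher reward and has
    # become more likely under the current policy.
    print(gspo_objective([0.0, 1.0], [0.0, 0.0], [1, 1], [0.0, 1.0]))
```

Because advantages are computed relative to the other samples in the group, no separate value network is needed, and clipping the ratio per sample rather than per step limits how far a single update can move the policy.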

Results of Multi-Human Personalization.

Results of Multi-Object Personalization.

Results of Single-Subject Personalization.