"Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting" was released on arXiv!