1. Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection (ICLR 2023)



Authors: Kaifeng Gao, Long Chen, Hanwang Zhang, Jun Xiao, Qianru Sun







Prompt tuning with large-scale pretrained vision-language models empowers open-vocabulary predictions trained on limited base categories, e.g., object classification and detection. In this paper, we propose compositional prompt tuning with motion cues: an extended prompt tuning paradigm for compositional predictions of video data. In particular, we present Relation Prompt (RePro) for Open-vocabulary Video Visual Relation Detection (Open-VidVRD), where conventional prompt tuning is easily biased to certain subject-object combinations and motion patterns. To this end, RePro addresses the two technical challenges of Open-VidVRD: 1) the prompt tokens should respect the two different semantic roles of subject and object, and 2) the tuning should account for the diverse spatio-temporal motion patterns of the subject-object compositions. Without bells and whistles, our RePro achieves a new state-of-the-art performance on two VidVRD benchmarks of not only the base training object and predicate categories, but also the unseen ones. Extensive ablations also demonstrate the effectiveness of the proposed compositional and multi-mode design of prompts. Code is available at this https URL.

2. Stable Target Field for Reduced Variance Score Estimation in Diffusion Models (ICLR 2023)



Authors: Yilun Xu, Shangyuan Tong, Tommi Jaakkola






Diffusion models generate samples by reversing a fixed forward diffusion process. Despite already providing impressive empirical results, these diffusion model algorithms can be further improved by reducing the variance of the training targets in their denoising score-matching objective. We argue that the source of such variance lies in the handling of intermediate noise-variance scales, where multiple modes in the data affect the direction of reverse paths. We propose to remedy the problem by incorporating a reference batch which we use to calculate weighted conditional scores as more stable training targets. We show that the procedure indeed helps in the challenging intermediate regime by reducing (the trace of) the covariance of training targets. The new stable targets can be seen as trading bias for reduced variance, where the bias vanishes with increasing reference batch size. Empirically, we show that the new objective improves the image quality, stability, and training speed of various popular diffusion models across datasets with both general ODE and SDE solvers. When used in combination with EDM, our method yields a current SOTA FID of 1.90 with 35 network evaluations on the unconditional CIFAR-10 generation task. The code is available at this https URL.
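To make the "weighted conditional scores over a reference batch" idea concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes a Gaussian perturbation kernel p(x_t | x_i) = N(x_i, sigma^2 I), which the abstract does not state, and the names stable_target, ref_batch, and sigma are hypothetical. Under that assumption, each reference sample contributes the conditional score (x_i - x_t) / sigma^2, and the weights are the self-normalized kernel densities.

```python
import numpy as np


def stable_target(x_t, ref_batch, sigma):
    """Weighted average of conditional scores over a reference batch.

    Assumption (not from the abstract): p(x_t | x_i) = N(x_i, sigma^2 I).
    x_t:       noisy sample, shape (d,)
    ref_batch: clean reference samples, shape (n, d)
    Returns (target, weights) with shapes (d,) and (n,).
    """
    diffs = ref_batch - x_t                              # (n, d)
    # Log of the Gaussian kernel, up to a constant shared by all samples.
    log_w = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    log_w -= log_w.max()                                 # numerical stability
    weights = np.exp(log_w)
    weights /= weights.sum()                             # self-normalize
    cond_scores = diffs / sigma ** 2                     # (n, d)
    return weights @ cond_scores, weights


# Tiny usage example: two well-separated modes and one noisy sample.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-2.0, 0.1, size=(4, 2)),
                  rng.normal(+2.0, 0.1, size=(4, 2))])
sigma = 1.5
x_t = data[0] + sigma * rng.standard_normal(2)
target, weights = stable_target(x_t, data, sigma)
```

With a reference batch of a single sample, the weight is 1 and the target reduces to the usual denoising score-matching target (x_0 - x_t) / sigma^2. Larger batches average over nearby modes, which is where the variance reduction described in the abstract would come from.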

3. For the Underrepresented in Gender Bias Research: Chinese Name Gender Prediction with Heterogeneous Graph Attention Network



Authors: Zihao Pan, Kai Peng, Shuai Ling, Haipeng Zhang







Achieving gender equality is an important pillar for humankind's sustainable future. Pioneering data-driven gender bias research is based on large-scale public records such as scientific papers, patents, and company registrations, covering female researchers, inventors, entrepreneurs, and so on. Since gender information is often missing in relevant datasets, studies rely on tools to infer genders from names. However, available open-source Chinese gender-guessing tools are not yet suitable for scientific purposes, which may be partially responsible for Chinese women being underrepresented in mainstream gender bias research and may affect the universality of such research. Specifically, these tools focus on character-level information while overlooking the fact that the combinations of Chinese characters in multi-character names, as well as the components and pronunciations of characters, convey important messages. As a first effort, we design a Chinese Heterogeneous Graph Attention (CHGAT) model to capture the heterogeneity in component relationships and incorporate the pronunciations of characters. Our model largely surpasses current tools and also outperforms the state-of-the-art algorithm. Last but not least, the most popular Chinese name-gender dataset is single-character based with far less female coverage from an unreliable source, naturally hindering relevant studies. We open-source a more balanced multi-character dataset from an official source together with our code, hoping to help future research promoting gender equality.


