TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Basic Information
Paper link: https://arxiv.org/abs/2410.23168
Authors: Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, Jan Eric Lenssen, Liwei Wang, Federico Tombari, Bernt Schiele
Keywords: Progressive Scaling, Attention mechanism
Category: Machine Learning