Frequently used websites. General sites: https://paperswithcode.com/ https://scholar.google.com/ https://www.ka...
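ViLT (Vision-and-Language Transformer) is a multimodal AI model that can produce a text description of an image. I have recently been reading and studying the related paper (https:...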