ViTPose: A simple yet powerful transformer baseline for Human Pose Estimation
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

Although no specific domain knowledge is considered in the design, plain vision transformers have shown excellent performance in visual recognition tasks. However, little effort has been made to reveal the potential of such simple structures for pose estimation tasks. In this paper,