We introduce a novel approach to learn a common phase manifold from motion datasets across different characters, such as human and dog, using vector quantized periodic autoencoders. This manifold clusters semantically similar motions into the same connected component and aligns them temporally without supervision. Our method enables effective motion matching and supports applications in motion retrieval, transfer, and stylization.