Abstract: Although recent image captioning models have achieved substantial progress, they still encounter limitations in capturing abstract semantics, resulting in insufficient semantic depth and ...