🖼️ Image Captioning — ViT + GPT-2

Upload any image and get an AI-generated caption. Model: nlpconnect/vit-gpt2-image-captioning

How it works

  • ViT encodes the image into patch embeddings
  • GPT-2 decodes those embeddings into a natural-language caption, one token at a time
  • Beam search picks the best caption from multiple candidates
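
The three steps above can be sketched with the Hugging Face transformers library. This is a minimal illustration, not the app's actual code: the function names `generation_kwargs` and `caption_image` and the default values are assumptions made for this example. Only the model ID comes from this page.

```python
from typing import List


def generation_kwargs(max_length: int = 16, num_beams: int = 4,
                      num_captions: int = 1) -> dict:
    """Build keyword arguments for `generate` (hypothetical helper).

    Beam search can return at most `num_beams` sequences, so the number
    of requested captions is clamped to the beam width.
    """
    num_beams = max(1, num_beams)
    num_captions = max(1, min(num_captions, num_beams))
    return {
        "max_length": max_length,
        "num_beams": num_beams,
        "num_return_sequences": num_captions,
    }


def caption_image(path: str, **kwargs) -> List[str]:
    """Caption one image file (sketch; downloads the model on first use)."""
    # Heavy imports stay local so the helper above works without them.
    from PIL import Image
    from transformers import (AutoTokenizer, ViTImageProcessor,
                              VisionEncoderDecoderModel)

    name = "nlpconnect/vit-gpt2-image-captioning"
    model = VisionEncoderDecoderModel.from_pretrained(name)
    processor = ViTImageProcessor.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)

    # ViT side: resize, normalize, and turn the image into a pixel tensor.
    image = Image.open(path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # GPT-2 side: beam search over candidate token sequences.
    output_ids = model.generate(pixel_values, **generation_kwargs(**kwargs))
    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return [caption.strip() for caption in captions]
```

Calling `caption_image("photo.jpg", num_beams=4, num_captions=2)` would return a list of two caption strings, where `photo.jpg` stands in for any local image file. A real app would load the model once at startup rather than on every call.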

Tips

  • Clear, well-lit photos work best
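  • Increase beam width to explore more candidate captions; this often improves the result but makes generation slower
  • Requesting multiple captions reveals model uncertainty: the more they disagree, the less sure the model is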
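
The effect of beam width and of asking for several captions can be illustrated with a toy beam search. The sketch below does not use the real model: `TOY_MODEL` is a hypothetical lookup table of next-phrase probabilities, invented only to show how a wider beam can find a more probable caption than a greedy search.

```python
import math

# Hypothetical toy "model": maps a prefix to next-phrase probabilities.
TOY_MODEL = {
    (): {"a dog": 0.6, "a cat": 0.4},
    ("a dog",): {"running": 0.55, "sitting": 0.45},
    ("a cat",): {"sleeping": 0.9, "jumping": 0.1},
}


def beam_search(model, num_beams, steps, num_return=1):
    """Return up to `num_return` (caption, probability) pairs, best first."""
    beams = [((), 0.0)]  # each beam is (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in model.get(prefix, {}).items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        if not candidates:
            break
        # Keep only the `num_beams` highest-scoring partial captions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return [(" ".join(prefix), math.exp(score))
            for prefix, score in beams[:num_return]]


greedy = beam_search(TOY_MODEL, num_beams=1, steps=2)
wide = beam_search(TOY_MODEL, num_beams=2, steps=2, num_return=2)
print(greedy)  # the single caption found with beam width 1
print(wide)    # the two best captions found with beam width 2
```

With a beam width of 1 the search commits to "a dog" early and ends at "a dog running" (probability about 0.33). With a width of 2 it also keeps "a cat" alive and finds "a cat sleeping" (about 0.36). The two returned captions have similar probabilities, which is the kind of disagreement that signals uncertainty.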

Part of the AI Engineer Portfolio