🖼️ Image Captioning — ViT + GPT-2
Upload any image and get an AI-generated caption. Model: nlpconnect/vit-gpt2-image-captioning
How it works
- ViT encodes the image into patch embeddings
- GPT-2 decodes embeddings into natural language
- Beam search picks the best caption from multiple candidates
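The beam search step above can be illustrated with a small, self-contained sketch. It is not the demo's actual decoder: `TOY_MODEL` is a hypothetical stand-in for GPT-2 that looks only at the last token, and the probabilities are invented for illustration. A real captioner conditions on the image embeddings and the whole prefix, and libraries usually add length normalization, which is omitted here.

```python
import math

# Hypothetical toy "decoder": maps the last token to next-token probabilities.
# Invented numbers, used only to show how beam search ranks candidates.
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.3, "</s>": 0.2},
    "the": {"dog": 0.9, "</s>": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def beam_search(model, beam_width, max_len=5):
    """Return finished captions as (text, log_probability), best first."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        # Extend every live beam by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the `beam_width` highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for candidate in candidates[:beam_width]:
            if candidate[1][-1] == "</s>":
                finished.append(candidate)
            else:
                beams.append(candidate)
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    # Drop the start and end markers when building the caption text.
    return [(" ".join(tokens[1:-1]), score) for score, tokens in finished]
```

With `beam_width=1` (greedy decoding) this toy model returns "a dog" with probability 0.30, because "a" looks best at the first step. With `beam_width=2` the search also keeps "the" alive and finds "the dog" with probability 0.36, which is why a wider beam can surface a better caption.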
Tips
- Clear, well-lit photos work best
- Increase beam width to explore more candidate captions; this often improves quality but slows generation
- Multiple captions reveal model uncertainty
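For readers who want to try these tips outside the demo, here is a minimal sketch of how beam width and the number of captions might be passed to the model through the Hugging Face `transformers` library. The helper names `generation_kwargs` and `caption_image` and the default values are assumptions for illustration, not the demo's actual code. The check that the caption count does not exceed the beam width reflects a constraint that `generate` enforces for beam search.

```python
def generation_kwargs(max_length, num_beams, num_captions):
    """Build keyword arguments for a Hugging Face `generate` call."""
    if num_captions > num_beams:
        # Beam search can return at most one sequence per beam.
        raise ValueError("num_captions must not exceed num_beams")
    return {
        "max_length": max_length,
        "num_beams": num_beams,
        "num_return_sequences": num_captions,
    }


def caption_image(image_path, max_length=16, num_beams=4, num_captions=1):
    """Caption one image; the default values are illustrative assumptions."""
    # Imported lazily so the helper above works without these packages.
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        VisionEncoderDecoderModel,
        ViTImageProcessor,
    )

    name = "nlpconnect/vit-gpt2-image-captioning"
    model = VisionEncoderDecoderModel.from_pretrained(name)
    processor = ViTImageProcessor.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)

    # ViT expects RGB input; the processor resizes and normalizes the image.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    output_ids = model.generate(
        pixel_values, **generation_kwargs(max_length, num_beams, num_captions)
    )
    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return [caption.strip() for caption in captions]
```

Calling `caption_image("photo.jpg", num_beams=8, num_captions=3)` (with a hypothetical file name) would return up to three captions ranked by the beam search; captions that differ noticeably from one another suggest the model is less sure about the image.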
Part of the AI Engineer Portfolio