Box-QAymo is a framework for generating, processing, and evaluating visual question answering (VQA) tasks on the Waymo dataset. It supports several question types, evaluation metrics, and answer formats, listed below.
Core Components
- Data Processing: Waymo dataset extraction and preprocessing
- Question Generation: Hierarchical prompt generators for different question types
- Model Evaluation: Support for multiple VLMs and evaluation metrics
- Answer Processing: Parsing of multiple-choice, text, and bounding-box answers
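To illustrate the answer-processing component, here is a minimal sketch of how a raw model response might be parsed into a multiple-choice letter or a bounding box. The function names (`parse_choice`, `parse_box`), the accepted response patterns, and the `[x1, y1, x2, y2]` box format are assumptions made for this example; they are not taken from the Box-QAymo code base, so check the repository for the actual parsing rules.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers for illustration only; not part of the Box-QAymo API.

_NUM = r"(-?\d+(?:\.\d+)?)"
# Assumed box format: four comma-separated numbers in square brackets.
_BOX_RE = re.compile(r"\[\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*\]")


def parse_choice(response: str, options: str = "ABCD") -> Optional[str]:
    """Extract a leading option letter such as 'B', '(B)', 'B.' or 'Answer: B'.

    The letter must be followed by ')', '.', ':' or the end of the string,
    so a sentence like 'A car is on the left' is not mistaken for option A.
    """
    pattern = rf"^\s*(?:answer\s*:?\s*)?\(?([{options}])(?:[).:]|\s*$)"
    match = re.search(pattern, response, flags=re.IGNORECASE)
    return match.group(1).upper() if match else None


def parse_box(response: str) -> Optional[Tuple[float, float, float, float]]:
    """Extract the first '[x1, y1, x2, y2]' box found in the response."""
    match = _BOX_RE.search(response)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(value) for value in match.groups())
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    print(parse_choice("(B) Yes, the vehicle is moving"))
    print(parse_box("The pedestrian is at [10, 20.5, 110, 220]"))
```

Returning `None` for unparseable responses, rather than guessing, lets the evaluation step count such cases explicitly as invalid answers.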
Supported Models, Metrics, and Question Types
- VLMs: LLaVA, Qwen-VL, SENNA
- Evaluation Metrics: F1, Precision, Recall
- Question Types: Binary, attribute, and motion reasoning
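The three metrics listed above can be sketched for the binary-question case as follows. This is a minimal example that assumes answers have already been normalized to strings and that one label (`"yes"` here) is treated as the positive class; the function name `precision_recall_f1` is hypothetical, and the averaging scheme the repository uses across question types is not specified in this section.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    predictions: Sequence[str],
    references: Sequence[str],
    positive: str = "yes",
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for a single positive label.

    Illustrative helper only; undefined ratios (zero denominators)
    are reported as 0.0 by assumption.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")

    pairs = list(zip(predictions, references))
    tp = sum(p == positive and r == positive for p, r in pairs)  # true positives
    fp = sum(p == positive and r != positive for p, r in pairs)  # false positives
    fn = sum(p != positive and r == positive for p, r in pairs)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    preds = ["yes", "yes", "no", "no", "yes"]
    refs = ["yes", "no", "no", "yes", "yes"]
    # Here tp=2, fp=1, fn=1, so precision, recall, and F1 are all 2/3.
    print(precision_recall_f1(preds, refs))
```

For multi-class question types, the same counts can be computed per label and then averaged, but which averaging is appropriate depends on the evaluation protocol.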
Quick Start
For detailed setup instructions, including Waymo dataset preprocessing, crowd-sourced metadata download, and model evaluation scripts, please visit our GitHub repository. The repository includes:
- Complete installation and setup guide
- Waymo dataset extraction scripts
- VQA dataset generation pipeline
- Model evaluation and comparison tools
- Pre-trained model integration