This project demonstrates how Vision Language Models (VLMs) can enhance robotic systems through visual understanding and natural language interaction. It implements a function-calling interface that connects VLMs to robotic perception and control modules for two core tasks: Room-to-Room Navigation (R2R) and Embodied Question Answering (EQA). The system is evaluated in the AI2-THOR simulator to show how foundation models can support robotic perception and planning.
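As a rough illustration of the function-calling idea, the sketch below maps a model-issued call (a tool name plus arguments) to a Python handler. The tool names `move_forward` and `describe_scene`, their parameters, and the call format are hypothetical and are not taken from this repository; a real VLM response would first be parsed into a similar structure.

```python
from typing import Any, Callable, Dict


# Hypothetical robot-side handlers; names and parameters are illustrative only.
def move_forward(distance_m: float) -> str:
    return f"moved {distance_m:.2f} m"


def describe_scene() -> str:
    return "scene description placeholder"


# Registry mapping tool names (as a model might emit them) to handlers.
TOOLS: Dict[str, Callable[..., str]] = {
    "move_forward": move_forward,
    "describe_scene": describe_scene,
}


def dispatch(call: Dict[str, Any]) -> str:
    """Run the handler named in `call`, passing its arguments as keywords."""
    name = call.get("name")
    if name not in TOOLS:
        # Report unknown tools as text so the model can recover.
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**call.get("args", {}))


if __name__ == "__main__":
    print(dispatch({"name": "move_forward", "args": {"distance_m": 0.5}}))
```

Returning errors as plain strings, rather than raising, lets the result be fed back to the model as the tool's output, which is one common way to let it retry with a valid call.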
- Python 3.9+ (Python 3.11.5 was used for development).
Clone the repository:
git clone https://github.com/tommasoTubaldo/Application_of_VLMs_in_Robotics.git
cd Application_of_VLMs_in_Robotics
Install dependencies:
pip install -r requirements.txt
pip install -U google-genai
Note: Use a virtual environment (python -m venv venv and then source venv/bin/activate) to avoid conflicts.
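To confirm that the virtual environment is active before installing dependencies, a small check such as the following can help. It is a minimal sketch that relies only on the standard library; the function name `in_virtualenv` is illustrative and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # The project lists Python 3.9+ as a prerequisite.
    print("Python version is 3.9 or newer:", sys.version_info >= (3, 9))
```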
Choose either Vertex AI (recommended) or Gemini API:
Vertex AI (recommended):
Google Cloud provides $300 of credit, usable with Vertex AI services, and a 90-day free trial.
- Set up a Google Cloud project:
  - Go to Google Cloud Console.
  - Click "Create a new project", name it, and confirm.
  - Enable billing:
    - Follow this guide to link a billing account.
  - Enable the Vertex AI API:
    - Visit the API enablement page and select your project.
- Install the Google Cloud CLI:
  - Install the Google Cloud SDK.
  - Authenticate and log in:
    gcloud auth application-default login
- Configure environment variables:
  - Run these commands in your project directory:
    export PROJECT_ID="<your_project_id>"
    export LOCATION="<your_location>"
    export API_MODE="vertex"
  - You can find the project ID in the Google Cloud Console by following these instructions.
  - Choose the location by referring to Vertex AI regions.
Gemini API:
- Get an API key:
  - Go to Google AI Studio.
  - Click "Create API key" and copy the key.
- Configure the environment:
  - In your project directory:
    cd ~/your_project
    export GEMINI_API_KEY="<your_api_key>"
    export API_MODE="gemini"
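The variables exported in the two configurations above could be consumed along the following lines. This is a sketch under assumptions, not the repository's actual code: the helper names `client_kwargs` and `make_client` are hypothetical, and `main.py` may read the variables differently. The keyword arguments passed to `genai.Client` (`vertexai`, `project`, `location`, `api_key`) come from the google-genai SDK.

```python
import os
from typing import Dict, Mapping, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Translate the exported variables into keyword arguments for genai.Client."""
    env = os.environ if env is None else env
    mode = env.get("API_MODE", "").lower()
    if mode == "vertex":
        missing = [k for k in ("PROJECT_ID", "LOCATION") if not env.get(k)]
        if missing:
            raise ValueError(f"missing environment variables: {', '.join(missing)}")
        return {
            "vertexai": True,
            "project": env["PROJECT_ID"],
            "location": env["LOCATION"],
        }
    if mode == "gemini":
        if not env.get("GEMINI_API_KEY"):
            raise ValueError("missing environment variable: GEMINI_API_KEY")
        return {"api_key": env["GEMINI_API_KEY"]}
    raise ValueError('API_MODE must be "vertex" or "gemini"')


def make_client(env: Optional[Mapping[str, str]] = None):
    """Build a google-genai client (hypothetical helper, not from this repo)."""
    # Imported lazily so the validation above works without the SDK installed.
    from google import genai

    return genai.Client(**client_kwargs(env))
```

Keeping validation separate from client creation makes a misconfigured shell fail early with a readable message, before any network call is attempted.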
To start the project, run:
python3 main.py