---
title: AquaSAR-Env
emoji: 🌊
colorFrom: blue
colorTo: gray
sdk: docker
pinned: false
app_port: 7860
---
(Run `python run_app.py --test-rl` to generate and view this simulation GIF locally!)
| Role | Name | Email |
|---|---|---|
| Team Lead | Vaghela Parthavsinh Ruchita | parthavsinh@gmail.com |
| Member | Krisha Patel | krishakpatel19@gmail.com |
| Member | Mansi Vora | mansivora279@gmail.com |
AquaSAR-Env is a multi-agent reinforcement learning environment engineered by Team MetaMavericks. Fully ported to the OpenEnv standard, it simulates a maritime search and rescue (SAR) operation in which one aerial drone (UAV) and two surface boats (USVs) coordinate to locate and rescue a drifting survivor under stochastic ocean conditions.
Our solution focuses on autonomous coordination, dynamic pathfinding around hazards, and robust response to unpredictable environmental factors.
The reinforcement learning model is trained with multi-agent reward shaping to converge reliably on the rescue target.
- The Drone (UAV): Leverages its superior speed (15 m/s) to fly directly toward the lost survivor, acting as the primary scout and first responder.
- Boat 1 (USV): Follows the drone closely to secure the survivor for extraction.
- Boat 2 (USV): Keeps a close watch on the survivor and surrounding environment, remaining in formation to monitor for any secondary subjects or dynamic hazards during the rescue.
Team MetaMavericks designed AquaSAR-Env to push the boundaries of autonomous maritime coordination:
- Full OpenEnv Compatibility: A native asynchronous WebSocket client (`metamavericks_env.client`) and a FastAPI server support LLM inference and web dashboards.
- Hugging Face RL Ecosystem Native: A built-in Gymnasium wrapper (`metamavericks_env.gym_wrapper.MetamavericksGymWrapper`) automatically scales actions and observations (mapping the 48D physical bounds to `[-1, 1]`), making the environment directly usable with Stable-Baselines3, RLlib, and TRL.
- Advanced Multi-Agent Reward Shaping: A dense linear distance penalty, combined across all three agents, provides a constant gradient toward the target, discouraging random wandering and reward farming and promoting stable multi-agent convergence.
- Heterogeneous Multi-Agent System: Coordinates agents with different mobility constraints (UAVs at 15 m/s vs USVs at 8 m/s).
- Stochastic Ocean Currents: Implements realistic Ornstein-Uhlenbeck style random walk currents that affect the survivor's drift trajectory.
- Dynamic Hazard Avoidance: Incorporates active penalty zones (Rocky Shoals, -2.0 reward penalty) requiring agents to perform real-time pathfinding to avoid catastrophic failure.
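The dense linear distance penalty described above can be sketched as follows. This is a minimal illustration, not the environment's actual code: the function name and the distance coefficient are assumptions, while the -2.0 hazard penalty matches the value stated above.

```python
import math

HAZARD_PENALTY = -2.0   # Rocky Shoals penalty, as stated in the feature list
DISTANCE_COEFF = 0.01   # assumed scaling coefficient (hypothetical)

def step_reward(agent_positions, target, in_hazard_flags):
    """Dense per-step reward: a linear distance penalty summed over all
    agents, plus a fixed penalty for any agent inside a hazard zone."""
    reward = 0.0
    for (x, y), in_hazard in zip(agent_positions, in_hazard_flags):
        dist = math.hypot(x - target[0], y - target[1])
        reward -= DISTANCE_COEFF * dist   # constant gradient toward the target
        if in_hazard:
            reward += HAZARD_PENALTY      # active penalty zone
    return reward
```

Because every step pays a penalty proportional to the remaining distance, any policy that closes the gap is rewarded immediately, which is what suppresses wandering behavior.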
A 48-dimensional flat feature vector (16 features per agent), automatically scaled to roughly [-1.0, 1.0] by the wrapper:
- Relative position to target (x, y)
- Relative velocity to target (x, y)
- Own velocity (x, y)
- Ocean current vector (x, y)
- Nearest hazard vector (x, y)
- Scalar distance to nearest hazard
- Scalar distance to target
- Normalized relative position to target (x, y)
- Agent type (0.0 for drone, 1.0 for boat)
- Status code (0.0 for active, 1.0 for penalty)
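The 16-features-per-agent layout above can be unpacked like this. The field order follows the list above, but the exact indices and the helper name are assumptions, not the environment's real API.

```python
# Assumed per-agent feature layout (16 floats), following the list above.
FIELDS = {
    "rel_pos_to_target": slice(0, 2),
    "rel_vel_to_target": slice(2, 4),
    "own_velocity": slice(4, 6),
    "ocean_current": slice(6, 8),
    "nearest_hazard_vec": slice(8, 10),
    "dist_to_hazard": 10,
    "dist_to_target": 11,
    "norm_rel_pos_to_target": slice(12, 14),
    "agent_type": 14,   # 0.0 for drone, 1.0 for boat
    "status": 15,       # 0.0 for active, 1.0 for penalty
}

def agent_features(obs_48d, agent_idx):
    """Slice one agent's 16-feature block out of the flat 48D vector."""
    base = agent_idx * 16
    block = obs_48d[base:base + 16]
    return {name: block[idx] for name, idx in FIELDS.items()}
```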
A continuous 6-dimensional vector (a 2D velocity command per agent). The wrapper accepts normalized actions in `[-1.0, 1.0]` and maps them to physical speeds (15.0 m/s for the drone, 8.0 m/s for the boats).
We've provided a unified runner script, `run_app.py`, to handle all OpenEnv tasks.

```bash
python run_app.py --server
```

(Runs on port 8000. Leave this running in a separate terminal!)
Run this in a new terminal while the server is active.
```bash
# Set your Hugging Face API Token first
export HF_TOKEN="your_hugging_face_token_here"
# Or on Windows CMD: set HF_TOKEN="your_hugging_face_token_here"
# Or on PowerShell: $env:HF_TOKEN="your_hugging_face_token_here"

python run_app.py --inference
```

(If you need help setting your token, run `python run_app.py --help-token`)
Trains the agent using Stable-Baselines3 (PPO) via the Gymnasium wrapper.
```bash
python run_app.py --train
```

Evaluates the trained agent and saves a visual simulation to `eval_episode_test.gif`.

```bash
python run_app.py --test-rl
```

Engineered with precision by MetaMavericks.