- Apple <- Zhejiang University, CAD&CG
- Beijing
- person.zjulearning.org.cn/guodongxu/
Stars
The simplest, fastest repository for training/finetuning small-sized VLMs.
LLM2CLIP significantly improves already state-of-the-art CLIP models.
Official repository of the CVPR 2024 paper "RMem: Restricted Memory Banks Improve Video Object Segmentation"
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Strong and Open Vision Language Assistant for Mobile Devices
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Code for the paper "Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models"
[NeurIPS 2023] PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance
[IJCV2024] Exploiting Diffusion Prior for Real-World Image Super-Resolution
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
[ICLR 2024] Controlling Vision-Language Models for Universal Image Restoration. 5th place in the NTIRE 2024 Restore Any Image Model in the Wild Challenge.
(TPAMI 2024) A Survey on Open Vocabulary Learning
Official Code for DragGAN (SIGGRAPH 2023)
Official Implementation of paper "A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence"
Hackable and optimized Transformers building blocks, supporting a composable construction.
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Code release for our CVPR 2023 paper "Detecting Everything in the Open World: Towards Universal Object Detection".
A command-line productivity tool powered by AI large language models like GPT-5 that helps you accomplish your tasks faster and more efficiently.
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
DenseTeacher: Dense Pseudo-Label for Semi-supervised Object Detection
A Trimap-Free Portrait Matting Solution in Real Time [AAAI 2022]
A programmer's guide to living longer
[CVPR 2022] End-to-End Semi-Supervised Learning for Video Action Detection


