LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. It was released as part of "Long-Form Speech Generation with Spoken Language Models" (arXiv 2024).

94 4 Updated Dec 28, 2024

apple / ml-tarflow

Python 334 34 Updated Dec 17, 2024

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,166 258 Updated Feb 23, 2026

naver-ai / usdm

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 95 4 Updated Dec 3, 2024

chomeyama / wavehax

Official repository of Wavehax vocoder

Python 71 7 Updated Dec 20, 2025

LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,908 122 Updated Feb 20, 2026

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 10,151 950 Updated May 5, 2026

Aria-K-Alethia / BigCodec

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 217 18 Updated Sep 19, 2024

feizc / FluxMusic

Text-to-Music Generation with Rectified Flow Transformers

Python 1,712 128 Updated Dec 10, 2024

jishengpeng / WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,291 111 Updated Mar 2, 2025

sh-lee-prml / PeriodWave

The official implementation of PeriodWave and PeriodWave-Turbo

Python 221 17 Updated Apr 14, 2025

SakanaAI / AI-Scientist

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 13,514 1,940 Updated Dec 19, 2025

Sang-gil Lee L0SG

Stars

adobe-research / openflam

NVIDIA / personaplex

Eps-Acoustic-Revolution-Lab / EAR_VAE

NVIDIA / audio-intelligence

12kimih / HiCUPID

ace-step / ACE-Step

MoonshotAI / Kimi-Audio

luotianze666 / WaveFM

chaehunshin / DiptychPrompting

KyungsuKim42 / tokensynth

lucadellalib / focalcodec

facebookresearch / audiobox-aesthetics

facebookresearch / coconut

Stability-AI / stable-codec

NVIDIA / Cosmos-Tokenizer

NVIDIA / Cosmos

yuhanghe01 / RiTTA

declare-lab / TangoFlux

google-deepmind / librispeech-long