OpenFLAM: Framewise Language Audio Model

Python 103 6 Updated Jan 14, 2026

PersonaPlex code.

Python 9,785 1,368 Updated Mar 2, 2026

Official implementation of the εar-VAE model, including the inference and evaluation code. More details coming soon...

Python 71 6 Updated Feb 13, 2026

Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.

Python 116 10 Updated Mar 3, 2026

[ACL 2025] Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis

Python 14 Updated Apr 1, 2026

ACE-Step: A Step Towards Music Generation Foundation Model

Python 4,437 561 Updated Feb 15, 2026

Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation

Python 4,597 351 Updated Jun 21, 2025

[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

Python 126 12 Updated Apr 8, 2026
Python 59 3 Updated Mar 22, 2025

The official implementation of TokenSynth (ICASSP 2025)

Python 87 4 Updated Oct 27, 2025

A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation

Jupyter Notebook 164 16 Updated Nov 30, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 711 51 Updated Jun 5, 2025

Training Large Language Model to Reason in a Continuous Latent Space

Python 1,602 176 Updated Apr 8, 2026

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

Python 429 29 Updated Feb 12, 2026

A suite of image and video neural tokenizers

Jupyter Notebook 1,723 88 Updated Feb 11, 2025

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

8,097 511 Updated Jan 6, 2026

Event Relation in Text-to-Audio (TTA) Generation

Python 21 Updated Feb 26, 2025

[ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

Jupyter Notebook 862 79 Updated Jan 28, 2026

LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation with Spoken Language Models" (arXiv 2024).

94 4 Updated Dec 28, 2024
Python 334 34 Updated Dec 17, 2024

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,166 258 Updated Feb 23, 2026

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 95 4 Updated Dec 3, 2024

Official repository of Wavehax vocoder

Python 71 7 Updated Dec 20, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,908 122 Updated Feb 20, 2026

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 10,151 950 Updated May 5, 2026

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 217 18 Updated Sep 19, 2024

Text-to-Music Generation with Rectified Flow Transformers

Python 1,712 128 Updated Dec 10, 2024

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,291 111 Updated Mar 2, 2025

The official implementation of PeriodWave and PeriodWave-Turbo

Python 221 17 Updated Apr 14, 2025

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 13,514 1,940 Updated Dec 19, 2025