
mat-vis

PBR texture data factory for MorePET/mat.

Curates ~3 000 PBR materials from four open sources, bakes them to flat PNGs, and hosts the output as a per-file Hugging Face dataset (huggingface.co/datasets/gerchowl/mat-vis, ADR-0012). Consumers fetch individual textures with one plain HTTP GET — no rowmap, no range read, no pyarrow, no binary deps.

pip install mat-vis-client
from mat_vis_client import MatVisClient

client = MatVisClient()                                      # auto-discovers latest release
png = client.fetch_texture("ambientcg", "Rock064", "color")  # 1k PNG bytes, one HTTP GET
results = client.search(category="wood")                      # filter by category

Architecture

┌─────────────────────────────────────────────────────────────────┐
│  IN GIT (this repo, ~40 MB, reviewable)                        │
│                                                                 │
│  index/*.json        — material metadata per source             │
│  mtlx/<source>/*.mtlx — MaterialX XML (gpuopen originals)      │
│  src/mat_vis_baker/  — fetch → bake → pack pipeline             │
│  clients/            — Python, JS, Rust, Shell reference clients│
│  .dagger/            — Dagger CI pipeline                       │
└─────────────────────────────────────────────────────────────────┘
        │
        ▼  Dagger bake pipeline (per-file substrate, ADR-0012)
┌─────────────────────────────────────────────────────────────────┐
│  ON HUGGING FACE DATASETS (calver tag: v2026.04.0)             │
│                                                                 │
│  <source>/<tier>/<material_id>/<channel>.png  — texture file    │
│  <source>/<tier>/.tier_complete               — tier sentinel   │
│  <source>.json                                — material catalog│
│  physicallybased.json                         — scalar props    │
└─────────────────────────────────────────────────────────────────┘
        │
        ▼  one plain HTTP GET per texture (stdlib urllib, zero deps)
┌─────────────────────────────────────────────────────────────────┐
│  CONSUMER                                                      │
│                                                                 │
│  pip install mat-vis-client      (PyPI, zero deps)             │
│  — or —                                                        │
│  <script src="mat-vis-client.mjs">  (browser/Node)             │
│  — or —                                                        │
│  curl + jq (mat-vis.sh)                                        │
└─────────────────────────────────────────────────────────────────┘
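Because every texture is a single addressed file, its URL is fully predictable from the layout in the diagram. A minimal sketch, assuming the Hugging Face `resolve` download endpoint and the `main` revision (the exact revision name to pin is an assumption — releases use calver tags):

```python
from urllib.parse import quote

# Assumed URL scheme: HF "resolve" endpoint + the per-file path layout
# shown above: <source>/<tier>/<material_id>/<channel>.png
HF_REPO = "gerchowl/mat-vis"

def texture_url(source: str, tier: str, material_id: str,
                channel: str, revision: str = "main") -> str:
    """Direct-download URL for one texture channel."""
    path = f"{source}/{tier}/{material_id}/{channel}.png"
    return (f"https://huggingface.co/datasets/{HF_REPO}/resolve/"
            f"{quote(revision)}/{path}")

url = texture_url("ambientcg", "1k", "Rock064", "color")
# Fetching is then a single stdlib GET, with no third-party deps:
#   import urllib.request
#   png_bytes = urllib.request.urlopen(url).read()
```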

Sources

| Source | Materials | License | Content |
|---|---|---|---|
| ambientcg | ~1 965 | CC0-1.0 | PNG textures |
| polyhaven | ~752 | CC0-1.0 | PNG textures |
| gpuopen | ~300 | per-material | MaterialX + PNG textures |
| physicallybased.info | ~86 | CC0-1.0 | scalar only (IOR, roughness, color) |

Resolution tiers

All tiers share the per-file substrate and client API. Each tier is the upstream's native resolution — sub-1k tiers below the natively served set are out of scope for v0.6.0 (the tar-era resize derive pipeline was retired in #189; per-file derive is future work).

| Tier | Per material | Status |
|---|---|---|
| 128 | ~10 KB | released (ambientcg, polyhaven) |
| 256 | ~40 KB | released (ambientcg, polyhaven) |
| 512 | ~150 KB | released (ambientcg, polyhaven) |
| 1k | ~2 MB | released (ambientcg, polyhaven, gpuopen) |
| 2k | ~10 MB | released (polyhaven); ambientcg + gpuopen baking |
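The per-material sizes above make bulk-download budgets easy to estimate. A rough sketch using the table's approximate figures (all numbers are order-of-magnitude averages, not exact):

```python
# Approximate per-material payload by tier, taken from the table above.
TIER_BYTES = {"128": 10_000, "256": 40_000, "512": 150_000,
              "1k": 2_000_000, "2k": 10_000_000}

def estimate_download_mb(n_materials: int, tier: str) -> float:
    """Rough total size in MB for prefetching n materials at one tier."""
    return n_materials * TIER_BYTES[tier] / 1e6

# e.g. prefetching all ~1 965 ambientcg materials at the 512 tier:
print(round(estimate_download_mb(1965, "512")))  # ≈ 295 (MB)
```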

Client usage

Feature matrix

The Python client is the reference implementation and the only full-featured one. The JS and Rust clients are minimal per-file fetchers, and the shell and SQL bindings are a few lines of curl and DuckDB. Pick based on runtime needs.

| Feature | Python | JS | Rust | Shell | SQL |
|---|---|---|---|---|---|
| fetch_texture (per-file PNG GET) | ✅ | ✅ | ✅ | ✅ | |
| Catalog discovery | ✅ | | | | ✅ |
| Per-material materials list | ✅ | | | | |
| Local file cache (~/.cache/mat-vis/) | ✅ | | | | |
| Cache soft-cap + MAT_VIS_CACHE_MAX_SIZE | ✅ | | | | |
| Per-file size cap (MAT_VIS_MAX_FETCH_SIZE) | ✅ | | | | |
| Rate-limit auto-retry (429/503/403) | ✅ | | | | |
| Redirect / signed-URL cache | ✅ | | | | |
| search by category + scalar ranges | ✅ | | | | |
| prefetch bulk download | ✅ | | | | |
| MaterialX export (synthesized) | ✅ | | | | |
| MaterialX original (gpuopen) | ✅ | | | | |
| Format adapters (three.js, glTF) | ✅ | | | | |
| Typed RateLimitError / MatVisError | ✅ | | | | |
| CLI | | ✅ (Node) | | | |

If you need search, prefetch, MaterialX, or format adapters, use Python. For drop-in per-file fetches in a browser or lightweight Rust binary, the smaller clients have what you need.

Python

from mat_vis_client import MatVisClient

client = MatVisClient()

# fetch a single texture channel
png = client.fetch_texture("ambientcg", "Rock064", "color", tier="1k")
with open("rock064_color.png", "wb") as f:
    f.write(png)

# list available materials
for mat_id in client.materials("ambientcg", "1k"):
    print(mat_id)

# search across all sources (kwargs — category + optional scalar ranges)
results = client.search(category="stone", roughness_range=(0.4, 0.9))

# MaterialX export — dotted API
# Synthesized (always works: UsdPreviewSurface wrapper over our PNGs)
mtlx_path = client.mtlx("ambientcg", "Rock064", tier="1k").export("./out")

# Original upstream document (gpuopen today; None elsewhere)
orig = client.mtlx("gpuopen", "<material-uuid>").original
if orig is not None:
    xml = orig.xml                      # raw upstream XML
    orig.export("./out")                # PNGs + upstream mtlx with local paths

# Low-level adapters (generic: scalars dict + textures dict)
from mat_vis_client.adapters import to_threejs, to_gltf, export_mtlx

JavaScript (browser or Node)

import { MatVisClient } from './mat-vis-client.mjs';
const client = new MatVisClient();
const png = await client.fetchTexture('polyhaven', 'castle_brick_02_red', 'color', '1k');

Shell (curl + jq)

source mat-vis.sh
mat_vis_fetch ambientcg Rock064 color 1k > rock064.png

SQL (DuckDB / pyarrow)

SELECT id, source, category
FROM 'https://github.com/MorePET/mat-vis/releases/download/v2026.04.0/mat-vis-ambientcg-1k-ceramic.parquet'
WHERE category = 'ceramic';

Development

Prerequisites

  • Python 3.12+, uv
  • Dagger (CI pipeline)
  • Nix + direnv (optional, provides full devShell)

Local bake

uv sync
source .venv/bin/activate

# bake a single source + tier
mat-vis-baker all ambientcg 1k ./output --release-tag v2026.04.0

# derive smaller tiers from a release
mat-vis-baker derive-from-release v2026.04.0 512 ./output-512

# generate catalog from release
mat-vis-baker catalog-from-release v2026.04.0 --output-dir .

Operator's guide: orphan LFS cleanup

Under the per-file substrate (ADR-0012), HF Hub uploads each LFS blob before finalizing the commit. A mid-batch crash can therefore leave orphan blobs on the object store — uploaded, but not referenced by any committed file. Because Xet dedup makes a future re-upload of the same bytes free, the only practical cost is storage accounting; cleanup is optional housekeeping.

# Dry-run audit against the scratch repo (default behaviour).
mat-vis-baker audit-orphans --repo gerchowl/mat-vis-tst

# Pin to a specific revision.
mat-vis-baker audit-orphans --repo gerchowl/mat-vis-tst --revision v2026.05.0

# Delete orphans (interactive: type DELETE to confirm).
mat-vis-baker audit-orphans --repo gerchowl/mat-vis-tst --delete

# Auditing the canonical prod repo requires --allow-prod.
mat-vis-baker audit-orphans --repo gerchowl/mat-vis --allow-prod

# Bypass the interactive prompt (e.g. inside a CI job):
MAT_VIS_AUDIT_FORCE=1 mat-vis-baker audit-orphans \
  --repo gerchowl/mat-vis-tst --delete
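Conceptually, the audit reduces to a set difference between blobs present on the object store and blobs referenced by committed LFS pointers. A sketch of that idea (not the baker's actual implementation):

```python
def find_orphans(stored_blobs: set[str], referenced_blobs: set[str]) -> set[str]:
    """Blobs uploaded to the store but not referenced by any committed
    file are orphans. Deleting them is safe: Xet dedup means a future
    re-upload of identical bytes costs nothing."""
    return stored_blobs - referenced_blobs

stored = {"sha256:aa", "sha256:bb", "sha256:cc"}       # on the object store
referenced = {"sha256:aa", "sha256:cc"}                 # pointers in the tree
print(find_orphans(stored, referenced))                 # {'sha256:bb'}
```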

Dagger CI

# smoke test
dagger call -m .dagger smoke --src=.

# full bake + release upload
dagger call -m .dagger bake-and-release \
  --src=. --source=ambientcg --tier=1k \
  --release-tag=v2026.04.0 --registry-pass=env:GITHUB_TOKEN

Versioning

  • Data releases: calver (v2026.04.0) — tied to upstream source updates
  • Code/client releases: semver (v0.1.0) — API changes
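The two tag families can be told apart mechanically. A small sketch whose regex patterns are assumptions matching the examples above (calver tags lead with a four-digit year):

```python
import re

CALVER = re.compile(r"^v(\d{4})\.(\d{2})\.(\d+)$")   # e.g. v2026.04.0 — data
SEMVER = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")       # e.g. v0.1.0    — code

def tag_kind(tag: str) -> str:
    """Classify a release tag as a data (calver) or code (semver) release."""
    if CALVER.match(tag) and int(tag[1:5]) >= 2000:
        return "data (calver)"
    if SEMVER.match(tag):
        return "code (semver)"
    return "unknown"

print(tag_kind("v2026.04.0"))  # data (calver)
print(tag_kind("v0.1.0"))      # code (semver)
```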

Key design decisions

Architecture is captured in docs/decisions/:

  1. ADR-0001 — Three-layer storage: JSON indexes + .mtlx in git, Parquet bundles as Release assets, rowmap for byte-level access.
  2. ADR-0002 — GitHub Releases hosting (free, CDN-backed); weekly watch for upstream change detection.
  3. ADR-0003 — Per (source x tier) Parquet files; category partitioning with dynamic size splitting to stay under GitHub's 2 GB limit.
  4. ADR-0004 — Lazy local cache at ~/.cache/mat-vis/ as default; prefetch and no-cache modes opt-in.
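The "dynamic size splitting" in ADR-0003 can be illustrated with a greedy split (a sketch of the idea, not the baker's code): files for one category are packed into parts in order, starting a new part whenever the next file would cross the size cap.

```python
def split_by_size(sizes: list[int], cap: int) -> list[list[int]]:
    """Greedily split a sequence of file sizes into parts that each stay
    under `cap` bytes (GitHub release assets must stay under 2 GB)."""
    parts, current, total = [], [], 0
    for s in sizes:
        if current and total + s > cap:   # next file would overflow: flush
            parts.append(current)
            current, total = [], 0
        current.append(s)
        total += s
    if current:
        parts.append(current)
    return parts

print(split_by_size([700, 900, 800, 600, 500], 1700))
# [[700, 900], [800, 600], [500]] — each part sums to <= 1700
```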

Newer ADRs (0007–0011) reshape the substrate around the Hugging Face dataset + sharded Dagger pipeline + the two-layer index record. See the ADR index for the full ordering.

Upstream metadata vocabulary

The baker normalizes four upstream vocabularies (ambientcg, polyhaven, gpuopen, physicallybased) onto 10 canonical categories. The captured vocabulary — every category title and the top 100 tags per source, with counts — is committed as docs/sources/metadata-vocabulary.md (with a machine-readable sidecar, metadata-vocabulary.json). Regenerate with uv run python scripts/probe-metadata-vocab.py when an upstream schema shifts.
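The normalization step can be pictured as a case-insensitive lookup from upstream tags onto the canonical set. The mapping entries below are illustrative assumptions; the real table lives in docs/sources/metadata-vocabulary.md:

```python
# Hypothetical fragment of the upstream-tag -> canonical-category mapping.
CANONICAL = {
    "bricks": "brick", "brick wall": "brick",
    "planks": "wood", "bark": "wood",
    "cobblestone": "stone", "rocks": "stone",
}

def normalize(tag: str, fallback: str = "misc") -> str:
    """Map one upstream tag onto a canonical category; unmapped tags
    fall back to a catch-all bucket."""
    return CANONICAL.get(tag.strip().lower(), fallback)

print(normalize("Planks"))     # wood
print(normalize("lava flow"))  # misc (unmapped -> fallback)
```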

Relationship to mat

mat-vis is the data factory. MorePET/mat is the user-facing library.

| | mat | mat-vis (this repo) |
|---|---|---|
| What | Python API + material data | Data pipeline + hosting |
| Source data | TOML (physical properties) | .mtlx + JSON (appearance) |
| Artifact | PyPI wheel (~2 MB) | Parquet on GH Releases (GB) |
| Versioning | semver (API-driven) | calver (upstream-driven) |
| User installs? | yes (pip install mat) | pip install mat-vis-client |

License

  • Code (build scripts, workflows, clients): MIT — see LICENSE.
  • Data: license inherits from each upstream source. Three of four are CC0 1.0 (public domain). gpuopen license per-material.
