A pure Go library for parsing and validating GEDCOM (GEnealogical Data COMmunication) files.
- Multi-version Support: Parse and write GEDCOM 5.5, 5.5.1, and 7.0 with automatic version detection
- Version Conversion: Bidirectional conversion between versions with transformation tracking
- Historical Calendar Support: Parse dates in Julian, Hebrew, and French Republican calendars with conversion
- Streaming APIs: Memory-efficient parsing and encoding for very large files (1M+ records)
- Comprehensive Validation: Date logic, orphaned references, duplicates, and quality reports
- Vendor Extensions: Parse Ancestry.com and FamilySearch custom tags
- Zero Dependencies: Uses only the Go standard library
- Well-tested: 93-100% per-package test coverage with multi-platform CI
See FEATURES.md for the complete feature list including all supported record types, events, attributes, and encoding details.
Support status for common genealogy software:
| Software | Status |
|---|---|
| RootsMagic | |
| Legacy Family Tree | |
| Family Tree Maker | |
| Gramps | 🧪 Synthetic test only |
| Ancestry | 🧪 Synthetic test only |
Full compatibility matrix: docs/COMPATIBILITY.md
GEDCOM Specification: Full support for 5.5, 5.5.1, and 7.0
```sh
go get github.com/cacack/gedcom-go/v2
```

- Go 1.25 or later
This library tracks Go's release policy, supporting the two most recent major versions. When a Go version reaches end-of-life and no longer receives security patches, we bump our minimum accordingly.
The library provides a simple, single-import API for common operations. Import with an alias for cleaner code:
```go
import gedcomgo "github.com/cacack/gedcom-go/v2"
```

```go
package main

import (
	"fmt"
	"log"
	"os"

	gedcomgo "github.com/cacack/gedcom-go/v2"
)

func main() {
	f, err := os.Open("family.ged")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	doc, err := gedcomgo.Decode(f)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("GEDCOM Version: %s\n", doc.Header.Version)
	fmt.Printf("Individuals: %d\n", len(doc.Individuals()))
	fmt.Printf("Families: %d\n", len(doc.Families()))
}
```

```go
// Basic validation (returns []error)
errors := gedcomgo.Validate(doc)

// Comprehensive validation with severity levels (returns []Issue)
issues := gedcomgo.ValidateAll(doc)
for _, issue := range issues {
	fmt.Printf("[%s] %s\n", issue.Severity, issue.Message)
}
```

```go
f, err := os.Create("output.ged")
if err != nil {
	log.Fatal(err)
}
defer f.Close()

err = gedcomgo.Encode(f, doc)
```

For files too large to materialize in memory (10k+ individuals, exports from major platforms), use the streaming APIs. The parser yields records one at a time; the encoder writes them one at a time. The full Document is never constructed, so the heap retained after the operation completes is a small constant rather than proportional to file size.
```go
import (
	"os"

	"github.com/cacack/gedcom-go/v2/charset"
	"github.com/cacack/gedcom-go/v2/encoder"
	"github.com/cacack/gedcom-go/v2/gedcom"
	"github.com/cacack/gedcom-go/v2/parser"
)

// Streaming parse — iterate level-0 records without building a Document.
func countRecords(path string) (map[string]int, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	counts := make(map[string]int)
	for rec, err := range parser.Records(charset.NewReader(f)) {
		if err != nil {
			return nil, err // rec is nil here; always check err first
		}
		counts[rec.Type]++
	}
	return counts, nil
}

// Streaming encode — call sequence is WriteHeader → WriteRecord* → WriteTrailer → Close.
// Capture Close's error so ErrTrailerNotWritten and flush failures aren't dropped.
func writeStreamed(path string, records []*gedcom.Record) (err error) {
	out, err := os.Create(path)
	if err != nil {
		return err
	}
	defer func() {
		if cerr := out.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()

	enc := encoder.NewStreamEncoder(out)
	defer func() {
		if cerr := enc.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()

	if err := enc.WriteHeader(&gedcom.Header{Version: "5.5", Encoding: "UTF-8"}); err != nil {
		return err
	}
	for _, rec := range records {
		if err := enc.WriteRecord(rec); err != nil {
			return err
		}
	}
	return enc.WriteTrailer()
}
```

On a 1.1MB, 2,322-individual file, streaming parse holds ~17% of the heap that batch decode retains after the call returns (and ~54% of the cumulative allocations). See examples/stream for the full pattern and docs/PERFORMANCE.md for benchmark details.
```go
// Convert to GEDCOM 7.0
converted, report, err := gedcomgo.Convert(doc, gedcomgo.Version70)
if err != nil {
	log.Fatal(err)
}
if report.HasDataLoss() {
	for _, item := range report.DataLoss {
		fmt.Printf("Lost: %s - %s\n", item.Feature, item.Reason)
	}
}
```

```go
// Find and display individuals
for _, individual := range doc.Individuals() {
	if len(individual.Names) > 0 {
		fmt.Printf("Name: %s\n", individual.Names[0].Full)
	}
	// Access events
	for _, event := range individual.Events {
		fmt.Printf("  %s: %s\n", event.Tag, event.Date)
	}
}

// O(1) lookup by cross-reference ID
person := doc.GetIndividual("@I1@")
if person != nil {
	fmt.Printf("Found: %s\n", person.Names[0].Full)
}

// Navigate family relationships
family := doc.GetFamily("@F1@")
if family != nil {
	husband := doc.GetIndividual(family.Husband)
	wife := doc.GetIndividual(family.Wife)
}
```

Process GEDCOM files with errors while extracting as much valid data as possible. Lenient mode (the default for DecodeWithDiagnostics) recovers from common real-world quirks — empty lines, invalid level numbers, unknown tags, and malformed indentation jumps (e.g., `1 BIRT` directly followed by `4 DATE`, as seen in some Ancestry/MyHeritage exports). Recovered issues are reported as Diagnostics with codes like BAD_LEVEL_JUMP, UNKNOWN_TAG, and EMPTY_LINE rather than failing the parse.
```go
result, err := gedcomgo.DecodeWithDiagnostics(f)
if err != nil {
	log.Fatal(err) // Fatal I/O error
}

// Check for parse issues
if result.Diagnostics.HasErrors() {
	fmt.Printf("Found %d errors\n", len(result.Diagnostics.Errors()))
	for _, d := range result.Diagnostics {
		fmt.Printf("  Line %d: [%s] %s\n", d.Line, d.Code, d.Message)
	}
}

// Use the partial document
doc := result.Document
fmt.Printf("Parsed %d individuals\n", len(doc.Individuals()))
```

To opt into strict parsing (fail on the first syntax error, no diagnostics collected), pass `&decoder.DecodeOptions{StrictMode: true}` — see Custom Decode Options.
- Usage Guide: USAGE.md - Comprehensive guide covering basic concepts, examples, and best practices
- Examples: See the examples/ directory (README):
  - examples/parse - Basic parsing and information display
  - examples/encode - Creating GEDCOM files programmatically
  - examples/query - Navigating and querying genealogy data
  - examples/validate - Validating GEDCOM files
  - examples/stream - Streaming parse and encode for very large files
- API Documentation: pkg.go.dev/github.com/cacack/gedcom-go/v2
- Contributing: CONTRIBUTING.md
The gedcomgo facade also exposes *WithOptions variants for the common operations:

```go
doc, err := gedcomgo.DecodeWithOptions(r, opts)
err = gedcomgo.EncodeWithOptions(w, doc, opts)
errs := gedcomgo.ValidateWithOptions(doc, opts)
issues := gedcomgo.ValidateAllWithOptions(doc, opts)
```

Option types — DecodeOptions, EncodeOptions, ValidateOptions — are re-exported from the facade. For the full surface area (streaming, diagnostics, converter), import the underlying packages directly:

```go
import (
	"github.com/cacack/gedcom-go/v2/converter"
	"github.com/cacack/gedcom-go/v2/decoder"
	"github.com/cacack/gedcom-go/v2/encoder"
	"github.com/cacack/gedcom-go/v2/validator"
)
```

```go
// Decode with progress reporting and timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

opts := &gedcomgo.DecodeOptions{
	Context:   ctx,
	TotalSize: fileInfo.Size(),
	OnProgress: func(bytesRead, totalBytes int64) {
		fmt.Printf("\rProgress: %d%%", bytesRead*100/totalBytes)
	},
}
doc, err := gedcomgo.DecodeWithOptions(reader, opts)
```

```go
// Configure validation strictness and duplicate detection
opts := &gedcomgo.ValidateOptions{
	Strictness: validator.StrictnessStrict,
	MaxErrors:  100,
	SkipRules:  []string{"W001"},
	Duplicates: &validator.DuplicateConfig{
		RequireExactSurname: true,
		MinNameSimilarity:   0.8,
	},
}
issues := gedcomgo.ValidateAllWithOptions(doc, opts)
```

```go
// Encode with custom line endings and line length
opts := &gedcomgo.EncodeOptions{
	LineEnding:    "\r\n", // CRLF
	MaxLineLength: 248,
}
err := gedcomgo.EncodeWithOptions(writer, doc, opts)
```

```go
// Convert with strict data loss checking
opts := &converter.ConvertOptions{
	Validate:       true,
	StrictDataLoss: true, // Fail on any data loss
}
converted, report, err := converter.ConvertWithOptions(doc, gedcom.Version55, opts)
```

For fine-grained control, these packages are available:
- `charset` - Character encoding utilities with UTF-8 validation
- `converter` - Version conversion with transformation tracking
- `decoder` - High-level GEDCOM decoding with automatic version detection
- `encoder` - GEDCOM document writing with configurable line endings
- `gedcom` - Core data types (Document, Individual, Family, Source, etc.)
- `parser` - Low-level line parsing with detailed error reporting
- `validator` - Document validation with error categorization
- `version` - GEDCOM version detection (header and heuristic-based)
This library follows Semantic Versioning. We do not break exported types in v1+ without a major version bump.
| Package | Key APIs |
|---|---|
| `gedcom` | `Document`, `Individual`, `Family`, `Event`, `Date` |
| `decoder` | `Decode()`, `DecodeWithOptions()` |
| `encoder` | `Encode()`, `EncodeWithOptions()`, `NewStreamEncoder()`, `NewStreamEncoderWithOptions()`, `EncodeStreaming()`, `EncodeStreamingWithOptions()` |
| `converter` | `Convert()`, `ConvertWithOptions()` |
| `parser` | `Parse()`, `ParseLine()`, `NewRecordIterator()`, `NewRecordIteratorWithOffset()`, `Records()`, `RecordsWithOffset()`, `NewLazyParser()` |
| `validator` | `Validate()`, `ValidateAll()`, `NewStreamingValidator()` |
| `charset` | `NewReader()` |
| `version` | `Detect()` |
- Experimental features (duplicate detection algorithms, quality report format) may evolve in minor versions.
- As GEDCOM 7.x evolves, support is added additively: new tags and structures are added without breaking existing code.
- Vendor extensions (Ancestry, FamilySearch) are best-effort and not covered by stability guarantees.
For the complete policy including deprecation process, see docs/API_STABILITY.md.
The project includes a Makefile for common development tasks:
```sh
# Show all available commands
make help

# Run all checks and build
make all

# Run tests
make test

# Run tests with coverage (93-100% per-package coverage)
make test-coverage

# Generate HTML coverage report
make coverage-html

# Run benchmarks
make bench

# Format code
make fmt

# Run linters
make vet
make lint

# Run pre-commit checks
make pre-commit

# Clean build artifacts
make clean
```

You can also use Go commands directly:
```sh
# Run all tests
go test ./...

# Run tests with coverage
go test -cover ./...

# Run benchmarks
go test -bench=. ./...

# Download dependencies
go mod download

# Build all packages
go build ./...

# Format code
go fmt ./...

# Run static analysis
go vet ./...
```

The library is designed for high performance with efficient memory usage:
- Parser: 66ns/op for simple lines, ~700μs for 1000 individuals
- Decoder: 13ms for 1000 individuals with full document structure
- Encoder: 1.15ms for 1000 individuals
- Validator: 5.91μs for 1000 individuals, zero allocations for valid documents
```sh
# Run all benchmarks
make bench

# Run specific package benchmarks
make bench-parse
make bench-decode
make bench-encode

# Save baseline for comparison
make bench-save

# Compare current performance with baseline
make bench-compare
```

Automated regression detection with a 10% threshold:

```sh
# Run regression tests
make perf-regression
```

For detailed performance metrics, profiling guides, and optimization opportunities, see docs/PERFORMANCE.md.
MIT License - see LICENSE file for details.
Contributions are welcome! Please ensure:
- All tests pass (`go test ./...`)
- Code coverage remains ≥85%
- Code is formatted (`go fmt ./...`)
- No linter warnings (`go vet ./...`)
See CONTRIBUTING.md for detailed guidelines.