
Create CSV Generation Utility #154

@undead2146

Description

Create a utility tool to scan Command & Conquer Generals and Zero Hour installations and generate authoritative CSV registries with file hashes and metadata.

Requirements

• Scan game installation directories for all files
• Calculate MD5 and SHA256 hashes for each file
• Detect game type (Generals vs Zero Hour) from file patterns (see #143 for detailed patterns)
• Categorize files by language: use the --language parameter to determine which files are language-specific and which are shared
  • Files shared across all languages (game.dat, Generals.exe, maps.big, etc.) get language = "All"
  • Language-specific files (English.big, German.big, etc.) get language = "<CODE>" based on the --language parameter
  • Use file path patterns to identify language-specific files (Data\english, Data\german, etc.)
• Generate CSV with schema: relativePath,size,md5,sha256,gameType,language,isRequired,metadata
• Language codes written to CSV must be uppercase ("EN", "DE", ..., "ZH-CN", "ZH-TW"), and the shared-file marker is written as "All"; map input like "--language de" → "DE"


• Store all file paths as relative to installation root (never absolute paths)
• Follow constants.md patterns for configuration
• Use CsvCatalogEntry model from GenHub.Core.Models.Content so schema is consistent with downstream code
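
The relative-path requirement above can be sketched as follows. This is a minimal illustration in Python, not the GenHub implementation; the function name iter_relative_files is hypothetical, and the choice of forward slashes is an assumption taken from the example CSV output further down.

```python
import os
from pathlib import Path


def iter_relative_files(install_dir):
    """Yield (relative_path, size_in_bytes) for every file under install_dir.

    Paths are relative to the installation root and use forward slashes
    (an assumption based on the example CSV rows in this issue).
    """
    root = Path(install_dir)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = Path(dirpath) / name
            # relative_to() guarantees no absolute path leaks into the CSV.
            relative = full_path.relative_to(root).as_posix()
            yield relative, full_path.stat().st_size
```

Because the function is a generator, entries are produced one at a time instead of being collected in memory first.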

CLI Usage

Command Line Interface:

GenHub.Tools.CsvGenerator.exe --installDir <path> --gameType <Generals|ZeroHour> --version <string> --output <path> [--language <code>]

Required Arguments:

  • --installDir <path>: Path to the game installation directory to scan
  • --gameType <Generals|ZeroHour>: Type of game installation
  • --version <string>: Game version (e.g., "1.08" for Generals, "1.04" for Zero Hour)
  • --output <path>: Output path for the generated CSV file

Optional Arguments:

  • --language <code>: Language code for the installation being scanned (default: "en", supported: en, de, fr, es, it, ko, pl, pt-br, zh-cn, zh-tw)
    • Purpose: Tells the generator which language variant is being scanned so it can properly categorize files
    • Effect: Files detected as language-specific will be marked with this language code in the CSV
    • Example: --language de means German-specific files will get language = "DE" in the CSV
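
The argument contract above could be modeled roughly like this. It is a Python sketch for illustration only (the actual tool is a .exe and is not expected to use argparse); build_parser and csv_language_code are hypothetical names, and accepting mixed-case input such as "DE" is an assumption of the sketch.

```python
import argparse

# Language codes listed as supported in this issue.
SUPPORTED_LANGUAGES = ["en", "de", "fr", "es", "it", "ko", "pl", "pt-br", "zh-cn", "zh-tw"]


def build_parser():
    """Build a parser mirroring the required and optional arguments."""
    parser = argparse.ArgumentParser(prog="GenHub.Tools.CsvGenerator")
    parser.add_argument("--installDir", required=True, help="Game installation directory to scan")
    parser.add_argument("--gameType", required=True, choices=["Generals", "ZeroHour"])
    parser.add_argument("--version", required=True, help='Game version, e.g. "1.08"')
    parser.add_argument("--output", required=True, help="Path of the CSV file to write")
    # Assumption: input is lowercased before validation so "DE" and "de" both work.
    parser.add_argument("--language", default="en", type=str.lower, choices=SUPPORTED_LANGUAGES)
    return parser


def csv_language_code(cli_value):
    """Map a CLI language value such as "de" to the uppercase CSV code "DE"."""
    return cli_value.upper()
```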

Example Usage:

# Generate CSV for Generals 1.08 English installation
# Result: Shared files get language="All", English-specific files get language="EN"
GenHub.Tools.CsvGenerator.exe --installDir "C:\Games\Command & Conquer Generals" --gameType Generals --version 1.08 --output Generals-1.08.csv --language en

# Generate CSV for Zero Hour 1.04 German installation
# Result: Shared files get language="All", German-specific files get language="DE"
GenHub.Tools.CsvGenerator.exe --installDir "C:\Games\Command & Conquer Generals Zero Hour" --gameType ZeroHour --version 1.04 --output ZeroHour-1.04.csv --language de

Acceptance Criteria

  • CSV generation utility scans game installations completely
  • All file hashes calculated correctly (MD5 + SHA256)
  • Game type detection works for Generals and Zero Hour patterns (matches the requirements of #143, "Implement Game Type and Language Filtering")
  • Language detection supports all 10 supported languages (matches the requirements of #144, "Implement Unified Language Support in CSV Pipeline")
  • CSV output matches required schema with relative paths only
  • Utility handles large installations (80k+ files) efficiently
  • Progress reporting for long-running scans
  • Error handling for inaccessible files/directories
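
One way the last two criteria (progress reporting and error handling) might fit together is sketched below in Python. The function process_files, its parameters, and the message format are all hypothetical; the issue does not prescribe how progress or failures should be surfaced.

```python
def process_files(paths, handler, report_every=1000, report=print):
    """Run handler on each path, recording failures instead of aborting.

    Returns (processed_count, failures), where failures is a list of
    (path, error_message) pairs for files that could not be read.
    """
    processed = 0
    failures = []
    for path in paths:
        try:
            handler(path)
        except OSError as error:
            # PermissionError and FileNotFoundError are subclasses of OSError.
            failures.append((path, str(error)))
        processed += 1
        if processed % report_every == 0:
            report(f"{processed} files processed, {len(failures)} failed")
    return processed, failures
```

Collecting failures and continuing means one locked file does not invalidate a long scan; whether an incomplete scan should still produce a CSV is a decision left open here.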

Technical Details

  • Location: GenHub.Tools/CsvGenerator/
  • Output: CSV files in docs/GameInstallationFilesRegistry/
  • Dependencies: System.IO, System.Security.Cryptography
  • Performance: Streaming processing for large datasets
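
As an illustration of streaming processing, the sketch below computes both hashes in a single pass over each file, reading fixed-size chunks so that large .big archives are never loaded into memory at once. It uses Python's hashlib for brevity; the utility itself lists System.Security.Cryptography as its dependency, and the name hash_file and the 1 MiB chunk size are assumptions.

```python
import hashlib


def hash_file(path, chunk_size=1024 * 1024):
    """Return (md5_hex, sha256_hex) for the file at path, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Feeding each chunk to both hash objects means every file is read from disk once rather than twice.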

Language Categorization Logic

The --language parameter controls how files are categorized in the CSV's language column:

How Language Categorization Works

  1. Shared Files → language = "All"

    • Core game files present in all language variants
    • Examples: game.dat, Generals.exe, maps.big, INI.big
    • These files are identical across all language installations
  2. Language-Specific Files → language = "<CODE>" (based on --language parameter)

    • Files that differ between language variants
    • Examples: English.big, German.big
    • The <CODE> comes from the --language parameter (e.g., --language de → language = "DE")

File Pattern Detection

The generator uses file path patterns to identify language-specific files:

  • English: Data\english\, English.big, AudioEnglish.big
  • German: Data\german\, German.big, AudioGerman.big
  • And so on for all supported languages...
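
The categorization rule and the patterns above might be combined as in the following Python sketch. It covers only the English and German patterns listed in this issue; the remaining languages are deliberately left out because their patterns are not spelled out here. The names LANGUAGE_MARKERS, is_language_specific, and categorize_language are hypothetical, and matching case-insensitively on either path separator is an assumption.

```python
# Only the markers explicitly listed in this issue; others would be added
# once their patterns are confirmed.
LANGUAGE_MARKERS = ("english", "german")


def is_language_specific(relative_path):
    """Return True if the path matches a known language-specific pattern."""
    normalized = relative_path.replace("\\", "/").lower()
    parts = normalized.split("/")
    directories, filename = parts[:-1], parts[-1]
    for marker in LANGUAGE_MARKERS:
        if marker in directories:  # e.g. Data\english\...
            return True
        if filename in (f"{marker}.big", f"audio{marker}.big"):
            return True
    return False


def categorize_language(relative_path, language):
    """Return the CSV language value: "All" for shared files, else the uppercase code."""
    return language.upper() if is_language_specific(relative_path) else "All"
```

Note that the returned code comes from the --language value, not from the marker that matched, which mirrors the rule stated in this section.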

Example CSV Output

When scanning an English installation with --language en:

relativePath,size,md5,sha256,gameType,language,isRequired,metadata
Data/INI/GameData.ini,12345,abc...,def...,Generals,All,true,"{""category"":""config""}"
Data/Lang/English/game.str,67890,f45...,a67...,Generals,EN,true,"{""category"":""language""}"
Data/English.big,112233,1a2...,2b3...,Generals,EN,true,"{""category"":""language""}"

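The same installation scanned with --language de would produce the following, because the language column is taken from the --language parameter rather than inferred from the file names: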

relativePath,size,md5,sha256,gameType,language,isRequired,metadata
Data/INI/GameData.ini,12345,abc...,def...,Generals,All,true,"{""category"":""config""}"
Data/Lang/English/game.str,67890,f45...,a67...,Generals,DE,true,"{""category"":""language""}"
Data/English.big,112233,1a2...,2b3...,Generals,DE,true,"{""category"":""language""}"

Note that the language codes in the CSV output are uppercase (EN, DE, etc.) as per the requirement.
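
Rows in the format shown above, including the doubled quotes around the JSON metadata, can be produced with a standard CSV writer. The Python sketch below is illustrative only: write_rows and the dictionary-based row shape are hypothetical stand-ins for the CsvCatalogEntry model, and writing isRequired as lowercase true/false is inferred from the example rows.

```python
import csv
import io
import json

CSV_COLUMNS = ["relativePath", "size", "md5", "sha256",
               "gameType", "language", "isRequired", "metadata"]


def write_rows(rows, stream):
    """Write a header plus one CSV line per row dictionary to stream."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(CSV_COLUMNS)
    for row in rows:
        writer.writerow([
            row["relativePath"],
            row["size"],
            row["md5"],
            row["sha256"],
            row["gameType"],
            row["language"],
            "true" if row["isRequired"] else "false",
            # Compact JSON; the csv module quotes the field and doubles inner quotes.
            json.dumps(row["metadata"], separators=(",", ":")),
        ])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_rows([{
        "relativePath": "Data/INI/GameData.ini", "size": 12345,
        "md5": "abc", "sha256": "def", "gameType": "Generals",
        "language": "All", "isRequired": True,
        "metadata": {"category": "config"},
    }], buffer)
    print(buffer.getvalue(), end="")
```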

Cross-Cutting Sub-Issues (EPIC #108)

Metadata

Labels: Enhancement (New feature or request)
