Add -f codepage flag for input/output encoding#627
Add -f codepage flag for input/output encoding#627dlevy-msft-sql wants to merge 8 commits intomainfrom
Conversation
- Add -f/--code-page flag with ODBC-compatible format parsing - Support 50+ codepages: Unicode, Windows, OEM/DOS, ISO-8859, CJK, EBCDIC, Macintosh - Apply input codepage in IncludeFile() for :r command - Apply output codepage in outCommand() for :OUT file writes - Add --list-codepages flag to display all supported codepages - Add comprehensive unit tests for parsing and encoding lookup
There was a problem hiding this comment.
Pull request overview
Adds ODBC sqlcmd-compatible -f/--code-page support to control input/output file encodings (plus --list-codepages) for improved interoperability.
Changes:
- Introduces codepage parsing/validation and a mapping from Windows codepages to Go encodings.
- Applies configured codepages when reading
-i/:Rfiles and when writing-o/:OUT/:ERRORoutputs. - Adds CLI argument tests, unit tests for codepage parsing/lookup, and README documentation.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
cmd/sqlcmd/sqlcmd.go |
Adds -f/--code-page, --list-codepages, validation, and wiring into runtime Sqlcmd configuration. |
cmd/sqlcmd/sqlcmd_test.go |
Adds CLI parsing/validation test cases for the new flags. |
pkg/sqlcmd/codepage.go |
Implements ParseCodePage, GetEncoding, and SupportedCodePages. |
pkg/sqlcmd/codepage_test.go |
Adds unit tests for parsing and encoding lookup. |
pkg/sqlcmd/commands.go |
Applies output codepage transforms in :OUT and :ERROR. |
pkg/sqlcmd/sqlcmd.go |
Applies input codepage transforms (or BOM-based auto-detect fallback) when including files. |
README.md |
Documents -f usage and --list-codepages. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Fix nil encoding panic in errorCommand when OutputCodePage is 65001 (UTF-8) - Close file handle in outCommand when GetEncoding returns an error - Handle close error properly in errorCommand - Apply UTF-8 BOM stripping when input codepage is 65001 - Fix test subtest names to use strconv.Itoa instead of string(rune)
- Use localizer.Errorf for all user-facing error messages - Fix UTF-16 BOM handling using ExpectBOM for input decoding - Add transformWriteCloser to properly close underlying file handles - Use transformWriteCloser in outCommand and errorCommand for both UnicodeOutputFile and CodePage transforms to prevent file handle leaks - Add integration tests for output/error codepage encoding
Use strconv.Itoa instead of %d to avoid locale-specific thousands separators in error message.
- Create codepageRegistry map as single source of truth for codepages - GetEncoding() now uses the registry instead of switch statement - SupportedCodePages() now generates list from registry - Removes duplicate codepage definitions between the two functions - Sort SupportedCodePages result by codepage number for consistency
- Verify all returned codepages are valid in GetEncoding - Ensure results are sorted by codepage number - Check well-known codepages are present
| settings.OutputCodePage = cp | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
ParseCodePage currently accepts inputs that contain no actual codepage value (e.g. "," or only whitespace) and returns a non-nil CodePageSettings with both InputCodePage/OutputCodePage left as 0. That silently disables codepage handling even though the user supplied -f. Consider detecting this case (arg != "" but neither codepage parsed) and returning an error (and add a unit test for it).
| // If a non-empty argument was provided but no codepage was parsed, | |
| // treat this as an error rather than silently disabling codepage handling. | |
| if arg != "" && settings.InputCodePage == 0 && settings.OutputCodePage == 0 { | |
| return nil, localizer.Errorf("invalid codepage: %s", arg) | |
| } |
| // UTF-8 codepage: still apply BOM stripping | ||
| utf8bom := unicode.BOMOverride(unicode.UTF8.NewDecoder()) | ||
| reader = transform.NewReader(f, utf8bom) | ||
| } | ||
| } else { | ||
| // Default: auto-detect BOM for UTF-16, fallback to UTF-8 |
There was a problem hiding this comment.
The comment "UTF-8 codepage: still apply BOM stripping" is a bit misleading: unicode.BOMOverride will also switch decoders when it sees a UTF-16 BOM (not just strip a UTF-8 BOM). If the intent is broader BOM auto-detection, consider updating the comment; if the intent is only stripping a UTF-8 BOM, consider using a transformer that only removes the UTF-8 BOM.
| // UTF-8 codepage: still apply BOM stripping | |
| utf8bom := unicode.BOMOverride(unicode.UTF8.NewDecoder()) | |
| reader = transform.NewReader(f, utf8bom) | |
| } | |
| } else { | |
| // Default: auto-detect BOM for UTF-16, fallback to UTF-8 | |
| // UTF-8 codepage: use BOMOverride to strip UTF-8 BOM and auto-detect UTF-16 BOMs, defaulting to UTF-8 otherwise | |
| utf8bom := unicode.BOMOverride(unicode.UTF8.NewDecoder()) | |
| reader = transform.NewReader(f, utf8bom) | |
| } | |
| } else { | |
| // Default: auto-detect BOMs (UTF-8/UTF-16) and decode accordingly, falling back to UTF-8 when no BOM is present |
| name string | ||
| codepage int | ||
| expectedBytes []byte | ||
| inputText string | ||
| skipOnEncError bool |
There was a problem hiding this comment.
The skipOnEncError field in this test table is never used. Please remove it or use it (e.g., to skip assertions when a target encoding can’t represent the input text), to avoid dead/misleading test code.
| name string | |
| codepage int | |
| expectedBytes []byte | |
| inputText string | |
| skipOnEncError bool | |
| name string | |
| codepage int | |
| expectedBytes []byte | |
| inputText string |
|
PR replaced by #638 |
Summary
Implements the
-fflag from ODBC sqlcmd for specifying input and output file code pages.Changes
CodePageandListCodePagesfields,-f/--code-pageflag, validation, and runtime application:Rcommand to respect codepage settings when reading filesCodePagefield to Sqlcmd structUsage
Supported Codepages
Testing
Improves ODBC sqlcmd compatibility.