A family of fixed-width, bit-packed string types optimized for:
- Deterministic memory layout
- Zero allocation
- O(1) comparisons
- Fast hashing
- Compact identifier storage
The core idea: encode restricted alphabets into 6-bit symbols and store them inside native integer words.
PackedString is a fixed-capacity, bit-packed string representation optimized for:
- Predictable performance
- Constant-time metadata access
- Extremely fast equality
- Cache efficiency
- VM / compiler / tokenizer usage
- Identifier-heavy workloads
Instead of storing 8-bit ASCII characters, it uses:
- 6-bit encoding
- 64-character alphabet
- Fixed-width storage
- Embedded metadata
This makes the representation:
- 100% stack-allocatable
- No heap allocation
- No dynamic memory
- No pointer chasing
- Deterministic behavior
Each character is encoded in 6 bits.
Alphabet (64 symbols):
0-9
a-z
A-Z
_ $
Total: 10 + 26 + 26 + 2 = 64.
This allows exact packing inside fixed-size integer blocks.
All APIs operate on sixbit values internally.
Alphabet (64 symbols):
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_$
Encoding rule:
- 6 bits per character
- ASCII only touched during pack/unpack/debug
- Public API operates on sixbit values
Example:
u8 h = ps_char('H'); // encode ASCII → sixbit
ps_find_six(ps, h); // searchYou deliberately expose encoding — this is C, not a managed abstraction.
The PackedString family is designed to scale by capacity:
| Variant | Max Chars | Storage Size | Use Case |
|---|---|---|---|
| packed8 | 10 | 64-bit | small identifiers |
| packed16 | 20 | 128-bit | general identifiers |
| packed24 | 30 | 192-bit | long tokens |
| packed32 | 40 | 256-bit | extended cases |
Each variant:
- Uses the same 6-bit encoding
- Uses embedded metadata
- Uses fixed layout
- Avoids dynamic allocation
The current stable implementation is packed16.
- Portable C (no
__uint128_t) - No platform-specific intrinsics
- Two
u64for 128-bit representation - Constant-time equality
- O(1) metadata access
- Branch-minimal critical paths
- VM-friendly
| Operation | Complexity | Notes |
|---|---|---|
| length | O(1) | metadata bits |
| flags | O(1) | metadata bits |
| equal | O(1) | 2× u64 compare |
| packed_compare | O(1) | 120-bit compare |
| hash | O(1) | murmur finalizer |
| at | O(1) | direct bit extraction |
| substring | O(1) | bit shifting |
| concat | O(1) | bounded 20 chars |
| case conversion | O(N ≤ 20) | max 20 iterations |
| contains | O(N) | bounded small N |
| contains_six | O(1) | jump table |
All operations are bounded by 20 characters maximum.
This gives predictable latency.
Length field is 5 bits → 0–31.
Valid lengths: 0–20
21–30: reserved
31: INVALID state
Special states:
PSC_INVALIDPSC_NULLPSC_EMPTY
This allows:
- Optional-like semantics
- Error propagation without extra memory
- State tagging without changing structure size
This is not a general string library.
Use when:
- You need fixed-size identifiers
- You want value-type strings
- You want deterministic hashing
- You want very fast comparisons
- You are building a compiler, VM, DSL, DB, token system
Avoid when:
- Arbitrary UTF-8 required
- Large dynamic text required