Skip to content

Pretty fast and limited kind of string used for table keys, compiler keywords, unique map keys etc.

Notifications You must be signed in to change notification settings

dreamProgrammerh/Packed-String

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Packed String Family

A family of fixed-width, bit-packed string types optimized for:

  • Deterministic memory layout
  • Zero allocation
  • O(1) comparisons
  • Fast hashing
  • Compact identifier storage

The core idea: encode restricted alphabets into 6-bit symbols and store them inside native integer words.

Overview

PackedString is a fixed-capacity, bit-packed string representation optimized for:

  • Predictable performance
  • Constant-time metadata access
  • Extremely fast equality
  • Cache efficiency
  • VM / compiler / tokenizer usage
  • Identifier-heavy workloads

Instead of storing 8-bit ASCII characters, it uses:

  • 6-bit encoding
  • 64-character alphabet
  • Fixed-width storage
  • Embedded metadata

This makes the representation:

  • 100% stack-allocatable
  • No heap allocation
  • No dynamic memory
  • No pointer chasing
  • Deterministic behavior

Encoding Model

Each character is encoded in 6 bits.

Alphabet (64 symbols):

0-9
a-z
A-Z
_ $

Total: 10 + 26 + 26 + 2 = 64.

This allows exact packing inside fixed-size integer blocks.


Alphabet Model

All APIs operate on sixbit values internally.

Alphabet (64 symbols):

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_$

Encoding rule:

  • 6 bits per character
  • ASCII only touched during pack/unpack/debug
  • Public API operates on sixbit values

Example:

u8 h = ps_char('H');     // encode ASCII → sixbit
ps_find_six(ps, h);      // search

You deliberately expose encoding — this is C, not a managed abstraction.


Packed Family Variants

The PackedString family is designed to scale by capacity:

Variant Max Chars Storage Size Use Case
packed8 10 64-bit small identifiers
packed16 20 128-bit general identifiers
packed24 30 192-bit long tokens
packed32 40 256-bit extended cases

Each variant:

  • Uses the same 6-bit encoding
  • Uses embedded metadata
  • Uses fixed layout
  • Avoids dynamic allocation

The current stable implementation is packed16.


Design Goals

  1. Portable C (no __uint128_t)
  2. No platform-specific intrinsics
  3. Two u64 for 128-bit representation
  4. Constant-time equality
  5. O(1) metadata access
  6. Branch-minimal critical paths
  7. VM-friendly

Packed16

Performance Characteristics

Operation Complexity Notes
length O(1) metadata bits
flags O(1) metadata bits
equal O(1) 2× u64 compare
packed_compare O(1) 120-bit compare
hash O(1) murmur finalizer
at O(1) direct bit extraction
substring O(1) bit shifting
concat O(1) bounded 20 chars
case conversion O(N ≤ 20) max 20 iterations
contains O(N) bounded small N
contains_six O(1) jump table

All operations are bounded by 20 characters maximum.

This gives predictable latency.


Error Model

Length field is 5 bits → 0–31.

Valid lengths: 0–20
21–30: reserved
31: INVALID state

Special states:

  • PSC_INVALID
  • PSC_NULL
  • PSC_EMPTY

This allows:

  • Optional-like semantics
  • Error propagation without extra memory
  • State tagging without changing structure size

Intended Usage

This is not a general string library.

Use when:

  • You need fixed-size identifiers
  • You want value-type strings
  • You want deterministic hashing
  • You want very fast comparisons
  • You are building a compiler, VM, DSL, DB, token system

Avoid when:

  • Arbitrary UTF-8 required
  • Large dynamic text required

About

Pretty fast and limited kind of string used for table keys, compiler keywords, unique map keys etc.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages