Algorithm to Map Strings to Short Replacements


I'm looking at ways to deterministically replace unique strings with unique, optimally short replacements. I have a finite set of strings, and the best compression I have achieved so far is through an enumeration algorithm, where I order the input set and then replace the strings with an enumeration of char strings over an extended alphabet (a..z, A..Z, aa..zz, aA..zZ, a0..z9, Aa.., aaa..zaa, aaA..zaA, ...).
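To make the scheme concrete, here is a minimal sketch of sorted enumeration. The names `short_codes` and `enumerate_sorted` are made up for illustration, and the exact ordering of the codes is an assumption: it yields letters first, then two-character codes, and so on, with digits allowed only after the first character, which may differ in detail from the sequence listed above.

```python
from itertools import count, product
import string

# Assumed alphabet: a code starts with a letter, later positions may also be digits.
LETTERS = string.ascii_lowercase + string.ascii_uppercase
ALNUM = LETTERS + string.digits


def short_codes():
    """Yield codes in order of increasing length: a..z, A..Z, aa, ab, ..."""
    for length in count(1):
        for head in LETTERS:
            for tail in product(ALNUM, repeat=length - 1):
                yield head + "".join(tail)


def enumerate_sorted(strings):
    """Map each distinct string to a code by its position in sorted order."""
    return dict(zip(sorted(set(strings)), short_codes()))


before = enumerate_sorted(["pear", "apple"])
after = enumerate_sorted(["pear", "apple", "banana"])
# "pear" gets a different code once "banana" joins the input set.
print(before["pear"], after["pear"])
```

The last lines show the dependence on the whole input set: adding one string shifts the codes of every string that sorts after it.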

This works wonderfully as far as compression is concerned, but it has the severe drawback that it is not atomic on a given input string. Rather, its result depends on knowing all input strings right from the start, and on the ordering of the input set.

Does anybody know of an algorithm that has similar compression but doesn't require knowing the input strings upfront? Hashing, for example, would not work for me: depending on the size of the input set, I'd need a hash length of 8-12 for the hashes to be unique, and that is too long for replacements (currently, the replacement strings are 1-3 chars long for my use cases (<10,000 input strings)). Also, if the theoreticians among you know that this is a wasted effort, I'd be interested to hear :-) .
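The hash-length concern can be checked with a rough birthday-bound estimate. This is only a sketch under assumptions that are not in the question: the hash is treated as uniformly random, and the alphabet sizes (16 for hex digits, 62 for letters plus digits) are illustrative choices. `collision_probability` is a hypothetical helper name.

```python
import math


def collision_probability(n_items, alphabet_size, length):
    """Approximate chance that at least two of n_items uniformly random
    codes of the given length collide (birthday approximation)."""
    space = alphabet_size ** length
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


# 10,000 strings, 8 hex characters: still a noticeable collision risk.
print(collision_probability(10_000, 16, 8))
# 10,000 strings, 3 alphanumeric characters: a collision is all but certain.
print(collision_probability(10_000, 62, 3))
```

Under these assumptions, 8 hex characters still leave about a 1% chance of a collision among 10,000 strings, while a 3-character hash collides almost surely, even though an enumeration fits the same strings into 1-3 characters.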

You could use your enumeration scheme, but ordered by when you first encounter each input string rather than by sorted order.

For example, the first string you ever process can be mapped to "a". The next distinct string would be mapped to "b", etc.

Every time you process a string, you'd need to check whether it has already been mapped.
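A minimal sketch of this first-encounter scheme, assuming a dictionary is an acceptable way to remember earlier mappings. `FirstSeenEncoder` and `code_for` are hypothetical names, and the alphabet (a letter first, then letters or digits) is an assumption modeled on the question.

```python
import string

# Assumed alphabet: a code starts with a letter, later positions may also be digits.
LETTERS = string.ascii_lowercase + string.ascii_uppercase
ALNUM = LETTERS + string.digits


def code_for(index):
    """Return the index-th code (0-based) in order of increasing length."""
    length = 1
    block = len(LETTERS)  # number of codes with the current length
    while index >= block:
        index -= block
        block *= len(ALNUM)
        length += 1
    # index is now an offset within the block of codes of this length.
    tail = []
    for _ in range(length - 1):
        index, digit = divmod(index, len(ALNUM))
        tail.append(ALNUM[digit])
    return LETTERS[index] + "".join(reversed(tail))


class FirstSeenEncoder:
    """Assign the next unused code the first time a string is seen."""

    def __init__(self):
        self._codes = {}

    def encode(self, text):
        code = self._codes.get(text)
        if code is None:
            # New string: its code is determined by how many came before it.
            code = code_for(len(self._codes))
            self._codes[text] = code
        return code


encoder = FirstSeenEncoder()
print(encoder.encode("pear"), encoder.encode("apple"), encoder.encode("pear"))
```

With this alphabet, codes of up to three characters cover far more than 10,000 strings. The mapping still depends on encounter order and on keeping the table around, so two independent runs only agree if they see the strings in the same order or share the table.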

