String similarity in PHP: levenshtein-like function for long strings


The levenshtein function in PHP only works on strings with a maximum length of 255 characters. What are good alternatives for computing a similarity score of sentences in PHP?

Basically, I have a database of sentences, and I want to find approximate duplicates. The similar_text function is not giving me the results I expect. What is the easiest way for me to detect similar sentences like the ones below?

    $ss = "Jack is a very nice boy, isn't he?";
    $pp = "jack is a very nice boy is he";

    $ss = strtolower($ss);  // convert to lower case as we don't care about case
    $pp = strtolower($pp);

    $score = similar_text($ss, $pp);
    echo "$score %\n";  // outputs 29 %

    $score = levenshtein($ss, $pp);
    echo "$score\n";  // outputs '5', which indicates they are very similar. But it does not work for more than 255 chars :(

The Levenshtein algorithm has a time complexity of O(n*m), where n and m are the lengths of the two input strings. This is pretty expensive, and computing such a distance for long strings will take a long time.
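If you still want a Levenshtein distance for strings longer than 255 characters, a userland version is easy to write. The following is only a minimal sketch: `long_levenshtein` is a hypothetical name, not a PHP built-in, it compares bytes rather than multibyte characters, and it keeps just two rows of the table so memory stays at O(m) while the running time remains O(n*m). Note also that on PHP 8.0 and later the built-in levenshtein() no longer rejects strings above 255 characters.

```php
<?php
// Hypothetical helper (not part of PHP): byte-wise Levenshtein distance
// that keeps only two rows of the dynamic-programming table in memory.
function long_levenshtein(string $a, string $b): int
{
    $n = strlen($a);
    $m = strlen($b);
    if ($n === 0) {
        return $m;
    }
    if ($m === 0) {
        return $n;
    }

    // $prev[$j] = distance between the first $i - 1 bytes of $a
    // and the first $j bytes of $b.
    $prev = range(0, $m);

    for ($i = 1; $i <= $n; $i++) {
        $curr = [$i];
        for ($j = 1; $j <= $m; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // deletion
                $curr[$j - 1] + 1,     // insertion
                $prev[$j - 1] + $cost  // substitution or match
            );
        }
        $prev = $curr;
    }

    return $prev[$m];
}

echo long_levenshtein("kitten", "sitting"), "\n"; // 3
```

For very long inputs this will be much slower than the C implementation behind the built-in function, so it is best reserved for the cases the built-in cannot handle.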

For whole sentences, you might want to use a diff algorithm instead; see for example: Highlight the difference between two strings in PHP
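To give an idea of what a diff on whole sentences can look like, here is a rough sketch of a word-level diff built on a longest-common-subsequence table. It is not the code from the linked question; `word_diff` and its `=`/`-`/`+` markers are names made up for this illustration. It is still quadratic, but in the number of words rather than the number of characters, which is far smaller for sentences.

```php
<?php
// Hypothetical helper: word-level diff based on a
// longest-common-subsequence (LCS) table.
function word_diff(string $old, string $new): array
{
    $a = preg_split('/\s+/', trim($old), -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\s+/', trim($new), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] = LCS length of the words $a[$i..] and $b[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($a[$i] === $b[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table and emit kept ("="), removed ("-") and added ("+") words.
    $ops = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $ops[] = ['=', $a[$i]];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $ops[] = ['-', $a[$i]];
            $i++;
        } else {
            $ops[] = ['+', $b[$j]];
            $j++;
        }
    }
    while ($i < $n) {
        $ops[] = ['-', $a[$i++]];
    }
    while ($j < $m) {
        $ops[] = ['+', $b[$j++]];
    }

    return $ops;
}
```

Counting the `=` entries and dividing by the word count of the longer sentence would give one possible similarity score; that choice of score is an assumption for illustration, not something the diff itself prescribes.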

Having said this, PHP also provides the similar_text function, which has an even worse complexity (O(max(n,m)**3)) but seems to work on longer strings.
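One thing worth knowing when using similar_text: its return value is the number of matching characters, not a percentage. The percentage is written to the optional third argument, which is passed by reference. A small sketch, using two sample sentences chosen for illustration:

```php
<?php
$ss = strtolower("Jack is a very nice boy, isn't he?");
$pp = strtolower("jack is a very nice boy is he");

// similar_text() returns the count of matching characters;
// the optional third argument receives the similarity as a percentage.
$matching = similar_text($ss, $pp, $percent);

printf("%d matching characters, %.1f%% similar\n", $matching, $percent);
```

The percentage is computed from the matching count relative to the combined length of both strings, so it may be a more useful score for spotting near-duplicates than the raw count.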

