String similarity in PHP: a levenshtein-like function for long strings
The levenshtein function in PHP only works on strings with a maximum length of 255 characters. What are good alternatives for computing a similarity score between sentences in PHP?
Basically, I have a database of sentences, and I want to find approximate duplicates. The similar_text function is not giving me the expected results. What is the easiest way for me to detect similar sentences such as those below?
    $ss = "Jack is a very nice boy, isn't he?";
    $pp = "jack is a very nice boy is he";

    $ss = strtolower($ss); // convert to lower case, as we don't care about case
    $pp = strtolower($pp);

    $score = similar_text($ss, $pp);
    echo "$score %\n"; // outputs 29 %

    $score = levenshtein($ss, $pp);
    echo "$score\n"; // outputs '5', which indicates they are similar. But it does not work for more than 255 chars :(
The levenshtein algorithm has a time complexity of O(n*m), where n and m are the lengths of the two input strings. This is pretty expensive, and computing such a distance for long strings will take a long time.
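To make the O(n*m) cost concrete, here is a minimal sketch of the classic dynamic-programming version that keeps only two rows of the table, so memory stays at O(m) while the time remains O(n*m). The name long_levenshtein is hypothetical (it is not a PHP built-in), and the function compares bytes, so it assumes single-byte text. As far as I know, the 255-character limit of the built-in levenshtein applies to PHP versions before 8.0, so a helper like this is mainly useful on older versions.

```php
<?php
// Hypothetical helper, not part of PHP: Levenshtein distance with no length limit.
// Works byte by byte, so multibyte (e.g. UTF-8) characters count as several units.
function long_levenshtein(string $a, string $b): int
{
    $n = strlen($a);
    $m = strlen($b);
    if ($n === 0) {
        return $m;
    }
    if ($m === 0) {
        return $n;
    }

    // $prev[$j] = distance between the first $i - 1 bytes of $a and the first $j bytes of $b
    $prev = range(0, $m);
    for ($i = 1; $i <= $n; $i++) {
        $curr = [$i]; // turning $i bytes into an empty string costs $i deletions
        for ($j = 1; $j <= $m; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,        // deletion
                $curr[$j - 1] + 1,    // insertion
                $prev[$j - 1] + $cost // substitution (free if the bytes match)
            );
        }
        $prev = $curr;
    }
    return $prev[$m];
}

echo long_levenshtein("kitten", "sitting"), "\n"; // 3
```

The inner loop runs n*m times, so two 10,000-character strings already need around 100 million steps in plain PHP. That is why this only makes sense for moderately long inputs.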
For whole sentences, you might want to use a diff algorithm instead; see for example: Highlight the difference between two strings in PHP
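Diff tools are typically built on a longest common subsequence (LCS), so one way to turn that idea into a score is to compare sentences word by word. The sketch below is only an illustration under some assumptions: the name word_lcs_similarity is hypothetical, the tokenizer assumes plain ASCII English text, and the score 2 * LCS / (n + m) is just one possible normalisation.

```php
<?php
// Hypothetical helper: word-level similarity in the range 0.0 to 1.0,
// based on the longest common subsequence (LCS) of the two word lists.
function word_lcs_similarity(string $a, string $b): float
{
    // Assumption: ASCII text. Split on anything that is not a letter, digit or apostrophe.
    $x = preg_split("/[^a-z0-9']+/", strtolower($a), -1, PREG_SPLIT_NO_EMPTY);
    $y = preg_split("/[^a-z0-9']+/", strtolower($b), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($x);
    $m = count($y);
    if ($n === 0 && $m === 0) {
        return 1.0; // two empty sentences are treated as identical
    }

    // Two-row LCS table: $prev[$j] = LCS length of the first $i - 1 words of $x
    // and the first $j words of $y.
    $prev = array_fill(0, $m + 1, 0);
    for ($i = 1; $i <= $n; $i++) {
        $curr = [0];
        for ($j = 1; $j <= $m; $j++) {
            $curr[$j] = ($x[$i - 1] === $y[$j - 1])
                ? $prev[$j - 1] + 1
                : max($prev[$j], $curr[$j - 1]);
        }
        $prev = $curr;
    }
    return 2 * $prev[$m] / ($n + $m);
}

printf("%.2f\n", word_lcs_similarity("The cat sat", "the dog sat")); // 0.67
```

Because n and m now count words rather than characters, the table is much smaller than for a character-level distance on the same sentences.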
Having said this, PHP also provides the similar_text function, which has an even worse complexity (O(max(n,m)**3)) but seems to work on longer strings.
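One thing to watch out for when using similar_text: its return value is the number of matching characters, not a percentage. The percentage is written to an optional third argument that is passed by reference. A small sketch (the variable names are mine):

```php
<?php
$a = strtolower("World");
$b = strtolower("Word");

// The return value is the count of matching characters;
// $percent is filled in by reference with the similarity in percent.
$matching = similar_text($a, $b, $percent);

echo "$matching matching characters\n";           // 4 matching characters
echo round($percent, 2), " % similar\n";          // 88.89 % similar
```

So if you want a score that is comparable across sentences of different lengths, read the third argument rather than the return value.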