Performance ideas (in-memory C# hashset and contains too slow) -

i have following code

private void loadintomemory() {     //init large hashset     hashset<document> hsalldocuments = new hashset<document>();      //get first rows database     list<document> docslist = document.getallabovedocid(0, 500000);      //load objects dictionary     foreach (document d in docslist)     {         hsalldocuments.add(d);     }      application["dicalldocuments"] = hsalldocuments; }  private hashset<document> documenthits(hashset<document> hsrawhit, hashset<document> hsalldocuments, string query, string[] queryarray) {     int counter = 0;     const int maxcount = 1000;      foreach (document d in hsalldocuments)     {         //headline         if (d.headline.contains(query))         {             if (counter >= maxcount)                 break;             hsrawhit.add(d);             counter++;         }          //description         if (d.description.contains(query))         {             if (counter >= maxcount)                 break;             hsrawhit.add(d);             counter++;         }          //splitted query word word         //string[] queryarray = query.split(' ');         if (queryarray.count() > 1)         {             foreach (string q in queryarray)             {                 if (d.headline.contains(q))                 {                     if (counter >= maxcount)                         break;                     hsrawhit.add(d);                     counter++;                 }                  //description                 if (d.description.contains(q))                 {                     if (counter >= maxcount)                         break;                     hsrawhit.add(d);                     counter++;                 }             }         }      }      return hsrawhit; }

first load data hashset (via application later use) - runs fine - totally ok slow i'm doing.

will running 4.0 framework in c# (can't update new upgrade 4.0 async stuff).

the documenthits method runs slow on current setup - considering it's in memory. can speed method?

examples awesome - thanks.

i see using hashset, not using of it's advantages, should use list instead.

what's taking time looping through documents , looking strings in other strings, should try elliminate as possible of that.

one possibility set indexes of documents contains character pairs. if string query contains hello, looking in documents contains he, el, ll , lo.

you set dictionary<string, list<int>> dictionary key character combinations , list contains indexes documents in document list. setting dictionary take time, of course, can focus on character combinations less common. if character combination exists in 80% of documents, it's pretty useless elliminating documents, if character combination exists in 2% of documents has elliminated 98% of work.

if loop through documents in list , add occurances lists in dictionary, lists of indexes sorted, easy join lists later on. when add indexes list, can throw away lists when large , see not useful elliminating documents. way keeping shorter lists , not consume memory.

edit:

it put small example:

public class indexelliminator<t> {    private list<t> _items;   private dictionary<string, list<int>> _index;   private func<t, string> _getcontent;    private static hashset<string> getpairs(string value) {     hashset<string> pairs = new hashset<string>();     (int = 1; < value.length; i++) {       pairs.add(value.substring(i - 1, 2));     }     return pairs;   }    public indexelliminator(list<t> items, func<t, string> getcontent, int maxindexsize) {     _items = items;     _getcontent = getcontent;     _index = new dictionary<string, list<int>>();     (int index = 0;index<_items.count;index++) {       t item = _items[index];       foreach (string pair in getpairs(_getcontent(item))) {         list<int> list;         if (_index.trygetvalue(pair, out list)) {           if (list != null) {             if (list.count == maxindexsize) {               _index[pair] = null;             } else {               list.add(index);             }           }         } else {           list = new list<int>();           list.add(index);           _index.add(pair, list);         }       }     }   }    private static list<int> joinlists(list<int> list1, list<int> list2) {     list<int> result = new list<int>();     int i1 = 0, i2 = 0;     while (i1 < list1.count && i2 < list2.count) {       switch (math.sign(list1[i1].compareto(list2[i2]))) {         case 0: result.add(list1[i1]); i1++; i2++; break;         case -1: i1++; break;         case 1: i2++; break;       }     }     return result;   }    public list<t> find(string query) {     hashset<string> pairs = getpairs(query);     list<list<int>> indexes = new list<list<int>>();     bool found = false;     foreach (string pair in pairs) {       list<int> list;       if (_index.trygetvalue(pair, out list)) {         found = true;         if (list != null) {           indexes.add(list);         }       }     }     list<t> result = new list<t>();     if (found && indexes.count == 0) {       indexes.add(enumerable.range(0, _items.count).tolist());     }     if (indexes.count > 0) {       while (indexes.count > 1) {         indexes[indexes.count - 2] = joinlists(indexes[indexes.count - 2], indexes[indexes.count - 1]);         indexes.removeat(indexes.count - 1);       }       foreach (int index in indexes[0]) {         if (_getcontent(_items[index]).contains(query)) {           result.add(_items[index]);         }       }     }     return result;   }  }

test:

list<string> items = new list<string> {   "hello world",   "how you",   "what this",   "can true",   "some phrases",   "words upon words",   "what do",   "where go",   "when this",   "how can be",   "well above margin",   "close center" }; indexelliminator<string> index = new indexelliminator<string>(items, s => s, items.count / 2);  list<string> found = index.find("this"); foreach (string s in found) console.writeline(s);

output:

what can true when how can

Search This Blog

Return

Performance ideas (in-memory C# hashset and contains too slow) -

edit:

Comments

Post a Comment

Popular posts from this blog

Show multiple (2,3,4,…) images in the same window in OpenCV -

c# - Is it possible to remove an existing registration from Autofac container builder? -

asp.net - RadAsyncUpload in code behind, how to? -