numpy - plotting a 2D matrix in Python: code and most useful visualization
I have a large matrix (10x55678) in numpy matrix format. The rows of the matrix correspond to "topics" and the columns correspond to words (the unique words of a text corpus). Each entry i,j in the matrix is a probability, meaning that word j belongs to topic i with probability x. Since I am using ids rather than the real words, and since the dimension of my matrix is really large, I need to visualize it in some way. Which visualization do you suggest? A simple plot, or a more sophisticated and informative one? (I am asking this because I am ignorant about the useful types of visualization.) If possible, can you give me an example using a numpy matrix? Thanks!
The reason I asked this question is that I want to have a general view of the word-topic distributions in my corpus. Other methods are welcome.
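You could use matplotlib's imshow or pcolor method to display the data, but as the comments have mentioned, it might be hard to interpret without zooming in on subsets of your data. For example (here the rows are words and the columns are topics, i.e. the transpose of your matrix):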
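    import matplotlib.pyplot as plt
    import numpy as np

    a = np.random.normal(0.0, 0.5, size=(5000, 10))**2  # 5000 words x 10 topics, non-negative
    a = a / np.sum(a, axis=1)[:, None]  # normalize so each word's row sums to 1
    plt.pcolor(a)
    plt.show()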
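As a rough sketch of the imshow route on a matrix shaped like yours (10 topics x 55678 words), assuming random stand-in data since the real matrix isn't available: `aspect="auto"` keeps the 10 rows from being squashed into a thin strip, and for a matrix this size imshow is much faster than pcolor, which builds one polygon per cell. The helper name `plot_topic_word_matrix` is made up for illustration, and if your data is an `np.matrix`, `np.asarray(m)` turns it into a plain array first.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs headless; drop it to use plt.show()
import matplotlib.pyplot as plt
import numpy as np


def plot_topic_word_matrix(m):
    """Draw a (topics x words) probability matrix as an image (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(12, 3))
    # aspect="auto" stretches the 10 rows to fill the axes instead of forcing
    # square cells; interpolation="nearest" avoids blending neighbouring cells
    im = ax.imshow(m, aspect="auto", interpolation="nearest", cmap="viridis")
    ax.set_xlabel("word id")
    ax.set_ylabel("topic")
    fig.colorbar(im, ax=ax, label="probability")
    return fig, im


# Random stand-in data: 10 topics x 55678 words, each column normalized so a
# word's topic probabilities sum to 1 (an assumption about how yours is built)
rng = np.random.default_rng(0)
m = rng.random((10, 55678)) ** 4
m /= m.sum(axis=0, keepdims=True)

fig, im = plot_topic_word_matrix(m)
fig.savefig("topic_word_matrix.png", dpi=100)
```

At full width each screen pixel spans dozens of words, so zooming into a range, e.g. with `ax.set_xlim(0, 500)`, is still needed to see individual words.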
You could also sort the words by the topic they most probably belong to:
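    maxvi = np.argsort(a, axis=1)  # per word: topic indices ordered by increasing probability
    ii = np.argsort(maxvi[:, -1])  # order the words by their most probable topic
    plt.pcolor(a[ii, :])
    plt.show()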
Here the word index on the y-axis no longer matches the original ordering, since the rows have been sorted.
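To make the sorting step concrete, here is a small sketch on a hand-made 4-word x 3-topic matrix. `np.argmax(a, axis=1)` picks the same column as the `np.argsort(a, axis=1)[:, -1]` trick above (barring tied maxima). The optional second ordering, which also ranks words by confidence inside each topic group using `np.lexsort`, is an extra variation rather than part of the answer, and `order_words_by_topic` is a hypothetical name.

```python
import numpy as np


def order_words_by_topic(a, within_topic_by_prob=False):
    """Row order for `a` (words x topics) that groups words by their most probable topic."""
    dominant = np.argmax(a, axis=1)  # most probable topic for each word
    if not within_topic_by_prob:
        # a stable sort keeps the original word order inside each topic group
        return np.argsort(dominant, kind="stable")
    # np.lexsort sorts by the LAST key first: group by dominant topic,
    # then put the most confident words of each group first
    return np.lexsort((-a.max(axis=1), dominant))


a = np.array([[0.1, 0.7, 0.2],
              [0.6, 0.3, 0.1],
              [0.2, 0.2, 0.6],
              [0.8, 0.1, 0.1]])
print(order_words_by_topic(a))        # [1 3 0 2]
print(order_words_by_topic(a, True))  # [3 1 0 2]
```

The returned order is used the same way as `ii` above, e.g. `plt.pcolor(a[order_words_by_topic(a), :])`.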
Another possibility is to use the networkx package to plot the word clusters for each category, where the words with the highest probability are represented as nodes that are either larger or closer to the center of the graph, and words that have no membership in a category are ignored. This might be easier to read, since you have a large number of words and a small number of categories.
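A minimal sketch of the networkx idea, assuming an array `m` of shape (n_topics, n_words) like the one in the question: each topic becomes a hub node linked to its `top_k` most probable words, edge weights are the probabilities, and word nodes are drawn larger the higher their probability. The helpers `build_topic_graph` and `draw_topic_graph` and the `top_k`/`min_prob` parameters are hypothetical choices for illustration. Because `nx.spring_layout` treats heavier edges as stronger springs, high-probability words tend to end up nearer their topic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove it to open a window with plt.show()
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np


def build_topic_graph(m, top_k=20, min_prob=0.0):
    """One hub node per topic, linked to its top_k words (hypothetical helper).

    Words with probability <= min_prob are skipped, so a word with no
    membership in a topic never gets an edge to it.
    """
    g = nx.Graph()
    for t in range(m.shape[0]):
        topic = f"topic {t}"
        g.add_node(topic, kind="topic")
        # indices of the top_k largest entries in row t, largest first
        for j in np.argsort(m[t])[::-1][:top_k]:
            p = float(m[t, j])
            if p <= min_prob:
                continue
            word = f"word {j}"
            g.add_node(word, kind="word")
            g.add_edge(topic, word, weight=p)
    return g


def draw_topic_graph(g):
    # heavier edges act as stronger springs, pulling likely words inward
    pos = nx.spring_layout(g, weight="weight", seed=0)
    sizes = []
    for node, data in g.nodes(data=True):
        if data["kind"] == "topic":
            sizes.append(800)
        else:
            # word size grows with its strongest topic probability
            best = max(d["weight"] for _, _, d in g.edges(node, data=True))
            sizes.append(50 + 2000 * best)
    fig, ax = plt.subplots(figsize=(8, 8))
    nx.draw_networkx(g, pos, ax=ax, node_size=sizes, font_size=6)
    ax.set_axis_off()
    return fig


# Random stand-in data with the question's shape (10 topics x 55678 words)
rng = np.random.default_rng(0)
m = rng.random((10, 55678)) ** 8
m /= m.sum(axis=1, keepdims=True)
fig = draw_topic_graph(build_topic_graph(m, top_k=10))
fig.savefig("topic_graph.png", dpi=150)
```

Keeping `top_k` to a few dozen words per topic keeps the graph readable; drawing all 55678 words would not.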
Hopefully one of these suggestions is useful.