numpy - Plotting a 2D matrix in Python: code and most useful visualization


I have a large matrix (10x55678) in NumPy matrix format. The rows of the matrix correspond to "topics" and the columns correspond to words (the unique words in a text corpus). Each entry i,j in the matrix is a probability, meaning that word j belongs to topic i with probability x. Since I am using ids rather than the real words, and since the dimension of the matrix is so large, I need to visualize it in some way. Which visualization do you suggest? A simple plot, or a more sophisticated and informative one? (I am asking because I am ignorant of the useful types of visualization.) If possible, can you give me an example using a NumPy matrix? Thanks.

The reason I asked this question is that I want to have a general view of the word-topic distributions in my corpus. Any other methods are welcome.

You could use matplotlib's imshow or pcolor method to display the data but, as the comments have mentioned, it might be hard to interpret without zooming in on subsets of the data.

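    import numpy as np
    import matplotlib.pyplot as plt

    # random example data: 5000 words (rows) x 10 topics (columns)
    a = np.random.normal(0.0, 0.5, size=(5000, 10))**2
    a = a / np.sum(a, axis=1)[:, None]  # normalize so each row sums to 1
    plt.pcolor(a)
    plt.show()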

(Figure: unsorted random example)

You could then sort the words by the probability that they belong to a cluster:

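    # continues the example above; needs numpy as np and matplotlib.pyplot as plt
    maxvi = np.argsort(a, axis=1)   # per word: topic indices in ascending order of probability
    ii = np.argsort(maxvi[:, -1])   # order the words by their most probable topic
    plt.pcolor(a[ii, :])
    plt.show()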


Here the word index on the y-axis no longer equals the original ordering, since things have been sorted.
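Since the full matrix is hard to read without zooming in, one option is to plot only a subset of it. The following is a minimal sketch, not part of the original answer: it keeps only the words that are among the k most probable for at least one topic and shows them with imshow. The helper names top_words_per_topic and plot_top_words, and the choice k=20, are assumptions made for illustration. The data layout (words as rows, topics as columns) follows the random example above.

```python
import numpy as np
import matplotlib.pyplot as plt


def top_words_per_topic(a, k=20):
    """Return the indices of the k most probable words for each topic.

    a has shape (n_words, n_topics). The result has shape (n_topics, k),
    with the most probable word first. (Hypothetical helper.)
    """
    order = np.argsort(a, axis=0)[::-1]  # descending along the word axis
    return order[:k].T


def plot_top_words(a, k=20):
    """Show only the rows of a that are a top-k word for some topic."""
    top = top_words_per_topic(a, k)
    rows = np.unique(top)  # union of the selected word indices, sorted
    fig, ax = plt.subplots()
    # aspect="auto" stretches the tall, narrow matrix to fill the axes
    im = ax.imshow(a[rows, :], aspect="auto", interpolation="nearest")
    ax.set_xlabel("topic")
    ax.set_ylabel("selected word (row of the subset)")
    fig.colorbar(im, ax=ax, label="probability")
    return fig, rows


# same kind of random data as in the answer, with a fixed seed
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.5, size=(5000, 10)) ** 2
a = a / a.sum(axis=1)[:, None]  # each word's topic probabilities sum to 1
fig, rows = plot_top_words(a, k=20)
```

Calling plt.show() afterwards displays the figure. The returned rows array maps each row of the subset back to the original word id, so the ids can be looked up in the vocabulary.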

Another possibility is to use the networkx package to plot word clusters for each category, where the words with the highest probability are represented by nodes that are either larger or closer to the center of the graph, and to ignore those words that have no membership in the category. This might be easier since you have a large number of words and a small number of categories.
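The networkx idea could be sketched roughly as follows. This is only one possible reading of the suggestion, not the answerer's own code: each topic becomes a star graph whose center is the topic and whose leaves are its k most probable words. Node size grows with probability, and the edge weight makes the spring layout pull more probable words closer to the center. The function names, the "w123"-style labels, the size scaling, and the cutoff k=15 are all assumptions made for illustration.

```python
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt


def topic_graph(a, topic, k=15, min_prob=0.0):
    """Build a star graph linking one topic to its k most probable words.

    a has shape (n_words, n_topics). Words with probability <= min_prob
    are left out. Labels such as "w123" are placeholders for real words.
    """
    G = nx.Graph()
    center = f"topic {topic}"
    G.add_node(center, prob=1.0)
    top = np.argsort(a[:, topic])[::-1][:k]  # most probable words first
    for w in top:
        p = float(a[w, topic])
        if p > min_prob:
            G.add_node(f"w{w}", prob=p)
            # spring_layout attracts nodes joined by heavier edges more strongly
            G.add_edge(center, f"w{w}", weight=p)
    return G


def draw_topic_graph(G, ax=None):
    """Draw the graph with node size scaled by probability."""
    if ax is None:
        _, ax = plt.subplots()
    pos = nx.spring_layout(G, weight="weight", seed=0)
    sizes = [300 + 3000 * G.nodes[n]["prob"] for n in G.nodes]
    nx.draw_networkx(G, pos=pos, ax=ax, node_size=sizes, font_size=8)
    ax.set_axis_off()
    return ax


# same kind of random data as in the answer, with a fixed seed
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.5, size=(5000, 10)) ** 2
a = a / a.sum(axis=1)[:, None]
G = topic_graph(a, topic=0, k=15)
ax = draw_topic_graph(G)
```

With only ten topics, drawing one such graph per topic in a grid of subplots keeps each picture small enough to read.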

Hopefully one of these suggestions is useful.

