Python: constructing a Zipf distribution plot with matplotlib, with a fitted line
I have a list of paragraphs, and I want to run a Zipf distribution on their combination.
My code is below:

    from collections import Counter

    import matplotlib.pyplot as plt
    import numpy as np

    # targeted_paragraphs: the list of paragraph strings to analyse
    combined_text = " ".join(targeted_paragraphs)
    # Count every token in the combined text once (looping over the joined
    # string would iterate over single characters instead of paragraphs)
    frequency = Counter(combined_text.split())
    tokens = list(frequency.keys())
    counts = np.array(list(frequency.values()))

    ranks = np.arange(1, len(counts) + 1)
    indices = np.argsort(-counts)  # most frequent token first
    frequencies = counts[indices]

    plt.loglog(ranks, frequencies, marker=".")
    plt.title("Zipf plot for combined article paragraphs")
    plt.xlabel("Frequency rank of token")
    plt.ylabel("Absolute frequency of token")
    plt.grid(True)
    # Label about 20 tokens spread evenly along the log-rank axis
    for n in np.logspace(-0.5, np.log10(len(counts) - 1), 20).astype(int):
        plt.text(ranks[n], frequencies[n], " " + tokens[indices[n]],
                 verticalalignment="bottom", horizontalalignment="left")
    plt.show()

My purpose is to draw a "fitted line" in this graph and assign its value to a variable. I do not know how to add that. Any help with both of these issues would be appreciated.
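One common way to get both a fitted line and a value to keep in a variable (not taken from the original post) is an ordinary least-squares fit of log10(frequency) against log10(rank) using `np.polyfit`. The sketch below wraps this in a hypothetical helper `zipf_fit`; the slope of the fitted line estimates minus the Zipf exponent, and the sample sentences are made up for illustration.

```python
from collections import Counter

import numpy as np


def zipf_fit(paragraphs):
    """Hypothetical helper: fit log10(freq) = slope * log10(rank) + intercept.

    Returns the ranks, the frequencies sorted in descending order,
    and the slope and intercept of the fitted line.
    """
    counts = Counter(" ".join(paragraphs).split())
    # most_common() lists tokens from most to least frequent
    frequencies = np.array([c for _, c in counts.most_common()], dtype=float)
    ranks = np.arange(1, len(frequencies) + 1)
    # Degree-1 least-squares fit in log-log space; polyfit returns [slope, intercept]
    slope, intercept = np.polyfit(np.log10(ranks), np.log10(frequencies), 1)
    return ranks, frequencies, slope, intercept


if __name__ == "__main__":
    # Made-up sample text, only to show the call
    sample = ["the cat sat on the mat", "the dog sat on the log", "a cat and a dog"]
    ranks, freqs, slope, intercept = zipf_fit(sample)
    print("estimated Zipf exponent:", -slope)
```

To overlay the line on the existing plot, something like `plt.loglog(ranks, 10**intercept * ranks**slope, "r-")` should work. Keep in mind that a plain log-log least-squares fit is only a rough estimator: the many low-frequency tokens in the tail carry a lot of weight in it.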
I know it's been a while since this question was asked. However, I came across a possible solution to this problem on the SciPy site.
I thought I'd post it here in case anyone else needs it.
I didn't have the paragraph info, so here I whipped up a dict called `frequency` that has paragraph occurrences as its values. We take the values and convert them into a numpy array. Then we define the Zipf distribution parameter `a`, which has to be > 1. Finally, we display the histogram of the samples along with the probability density function.
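For reference, the Zipf probability mass function with exponent `a > 1` is `p(k) = k**(-a) / zeta(a)`, where `zeta` is the Riemann zeta function; `a > 1` is exactly the condition that makes `zeta(a)`, the sum of `k**(-a)` over all k >= 1, finite. The sketch below checks a hand-written version (the name `zipf_pmf` is hypothetical) against `scipy.stats.zipf`. Note that `scipy.special.zetac(a)` returns `zeta(a) - 1`, so it is `special.zeta(a)` that gives the normalizing constant.

```python
import numpy as np
from scipy import special, stats


def zipf_pmf(k, a):
    """Hypothetical helper: Zipf pmf k**(-a) / zeta(a), defined only for a > 1."""
    if a <= 1:
        raise ValueError("the Zipf exponent a must be > 1")
    k = np.asarray(k, dtype=float)
    return k ** (-a) / special.zeta(a)


a = 2.0
k = np.arange(1, 50)
manual = zipf_pmf(k, a)
# scipy's built-in Zipf distribution uses the same parameterisation
builtin = stats.zipf.pmf(k, a)
print("matches scipy.stats.zipf:", np.allclose(manual, builtin))
# zetac(a) is zeta(a) - 1, so adding 1 recovers zeta(a)
print("zeta(a):", special.zeta(a), " zetac(a) + 1:", special.zetac(a) + 1)
```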
Working code:

    import random

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import special

    # Generate a sample dict of random values to simulate the paragraph data
    frequency = {}
    for i in range(50):
        frequency[i] = random.randint(1, 50)
    counts = list(frequency.values())
    tokens = list(frequency.keys())

    # Convert the counts into a numpy array
    s = np.array(counts)

    # Define the Zipf distribution parameter; it has to be > 1
    a = 2.

    # Display the histogram of the samples,
    # along with the probability density function
    count, bins, ignored = plt.hist(s, 50, density=True)
    plt.title("Zipf plot for combined article paragraphs")
    plt.xlabel("Frequency rank of token")
    plt.ylabel("Absolute frequency of token")
    x = np.arange(1., 50.)
    # Zipf pmf k**(-a) / zeta(a), rescaled so its peak is 1 for display
    y = x**(-a) / special.zeta(a)
    plt.plot(x, y / max(y), linewidth=2, color='r')
    plt.show()

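The simulated counts in this answer come from `random.randint`, which draws from a uniform distribution, so the histogram is not expected to follow the red Zipf curve. For a comparison where it should, one can draw genuine Zipf samples, in the spirit of the NumPy documentation example for `zipf`. The sketch below assumes synthetic data is fine for illustration and uses NumPy's `Generator.zipf` with an arbitrary fixed seed.

```python
import numpy as np
from scipy import special

a = 2.0
rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible
samples = rng.zipf(a, size=100_000)   # integer samples k >= 1

k = np.arange(1, 10)
# Empirical probability of each value k among the samples
empirical = np.array([(samples == v).mean() for v in k])
# Theoretical Zipf pmf k**(-a) / zeta(a)
expected = k ** (-a) / special.zeta(a)

for v, e, t in zip(k, empirical, expected):
    print(f"k={v}: empirical={e:.4f} expected={t:.4f}")
```

To see it graphically, `plt.bar(k, empirical)` followed by `plt.plot(k, expected, "r-")` overlays the two.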