Python - Code for counting the number of sentences, words and characters in an input file
I have written the following code to count the number of sentences, words and characters in the input file sample.txt, which contains a paragraph of text. It works fine for the number of sentences and words, but it does not give the precise, correct number of characters (without whitespaces and punctuation marks).
    import sys
    from nltk.tokenize import TreebankWordTokenizer

    lines, blanklines, sentences, words = 0, 0, 0, 0
    num_chars = 0

    print('-' * 50)

    filename = 'sample.txt'
    try:
        textf = open(filename, 'r')
    except IOError:
        print('cannot open file %s for reading' % filename)
        sys.exit(1)  # non-zero exit status signals the failure

    for line in textf:
        print(line)
        lines += 1
        if line.startswith('\n'):
            blanklines += 1
        else:
            # assume every sentence ends with '.', '!' or '?'
            sentences += line.count('.') + line.count('!') + line.count('?')
            tempwords = line.split(None)
            print(tempwords)
            words += len(tempwords)

    textf.close()

    print('-' * 50)
    print("lines:", lines)
    print("blank lines:", blanklines)
    print("sentences:", sentences)
    print("words:", words)

    # total length of the file, including spaces, newlines and punctuation
    with open(filename, 'r') as f:
        for line in f:
            num_chars += len(line)

    # try to remove the whitespace between words
    num_chars = num_chars - (words + 1)

    # count punctuation tokens and remove them as well
    pcount = 0
    tokenizer = TreebankWordTokenizer()
    with open(filename, 'r') as f1:
        for line in f1:
            # tokenised_words = nltk.tokenize.word_tokenize(line)
            tokenised_words = tokenizer.tokenize(line)
            for w in tokenised_words:
                if w in ('.', ';', '!', '?'):
                    pcount += 1
    print("pcount:", pcount)
    num_chars = num_chars - pcount
    print("chars:", num_chars)
pcount is the number of punctuation marks. Can anyone suggest the changes I need to make in order to find the exact number of characters, without spaces and punctuation marks?
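To see why subtracting words + 1 and pcount cannot be exact in general, here is a small sketch that replays that arithmetic on a made-up sample string (the string is hypothetical). As an assumption, the NLTK step is replaced by a plain count of every '.', ';', '!' and '?' character; the real code counts fewer, since according to its documentation TreebankWordTokenizer only splits off a period at the end of a line. The functions subtraction_estimate and direct_count are illustrative names, not part of any library.

```python
def subtraction_estimate(text):
    """Replay the question's arithmetic: total length - (words + 1) - punctuation count."""
    total = len(text)          # every character, including spaces and '\n'
    words = len(text.split())  # same word count as summing line.split(None) over lines
    # Hypothetical stand-in for the TreebankWordTokenizer step:
    # count every '.', ';', '!' and '?' character directly.
    pcount = sum(text.count(p) for p in '.;!?')
    return total - (words + 1) - pcount


def direct_count(text):
    """Count letters and digits only, i.e. characters without whitespace or punctuation."""
    return sum(1 for ch in text if ch.isalnum())


# Made-up sample: one comma, one blank line, two sentence-ending marks.
sample = "Hello, world.\n\nHow are you?\n"
print(subtraction_estimate(sample), direct_count(sample))  # prints: 20 19
```

In this sample the whitespace (three spaces, three newlines) happens to equal words + 1 only because of the blank line, while the comma is never subtracted, so the estimate is one too high. Without the blank line the whitespace falls one short of words + 1 and the two errors cancel, which is why the formula can look correct on some inputs.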
You can use a regex to replace all non-alphanumeric characters, and then count the number of characters that remain in each line.
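A minimal sketch of that suggestion in Python 3, assuming that "characters" means letters and digits. The pattern [\W_]+ matches runs of anything that is not a letter or digit: \W is "not a word character", and the underscore is added because \w includes it. re.sub deletes those runs and the lengths of what remains are summed. The function name count_chars is hypothetical; it takes any iterable of lines, so an open file works as well as a list.

```python
import re

# Runs of characters that are not Unicode letters or digits
# (\W = non-word character; "_" added because \w counts it as a word character).
NON_ALNUM = re.compile(r'[\W_]+')


def count_chars(lines):
    """Count characters other than whitespace and punctuation in an iterable of lines."""
    total = 0
    for line in lines:
        # Delete every non-alphanumeric run, then count what is left.
        total += len(NON_ALNUM.sub('', line))
    return total


# With the file from the question:
#     with open('sample.txt', 'r') as f:
#         print("chars:", count_chars(f))
print(count_chars(["Hello, world.\n", "\n", "How are you?\n"]))  # prints: 19
```

Because this counts what is kept rather than subtracting estimates of what should be removed, blank lines, repeated spaces and punctuation such as commas or quotes no longer skew the result.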