Python - Code for counting the number of sentences, words and characters in an input file


I have written the following code to count the number of sentences, words, and characters in the input file sample.txt, which contains a paragraph of text. It works fine for the number of sentences and words, but it does not give the precise, correct number of characters (without whitespace and punctuation marks).

    lines, blanklines, sentences, words = 0, 0, 0, 0
    num_chars = 0

    print '-'*50

    try:
        filename = 'sample.txt'
        textf = open(filename, 'r')
    except IOError:
        print 'cannot open file %s for reading' % filename
        import sys
        sys.exit(0)

    # First pass: count lines, blank lines, sentences and words.
    for line in textf:
        print line
        lines += 1
        if line.startswith('\n'):
            blanklines += 1
        else:
            sentences += line.count('.') + line.count('!') + line.count('?')
            tempwords = line.split(None)
            print tempwords
            words += len(tempwords)

    textf.close()

    print '-'*50
    print "lines:", lines
    print "blank lines:", blanklines
    print "sentences:", sentences
    print "words:", words

    import nltk
    import nltk.data
    import nltk.tokenize

    # Second pass: total length of every line, newlines included.
    with open('sample.txt', 'r') as f:
        for line in f:
            num_chars += len(line)

    # Subtract an estimate of the whitespace between words.
    num_chars = num_chars - (words + 1)

    # Third pass: count punctuation tokens and subtract them as well.
    pcount = 0
    from nltk.tokenize import TreebankWordTokenizer
    with open('sample.txt', 'r') as f1:
        for line in f1:
            #tokenised_words = nltk.tokenize.word_tokenize(line)
            tokenizer = TreebankWordTokenizer()
            tokenised_words = tokenizer.tokenize(line)
            for w in tokenised_words:
                if ((w=='.')|(w==';')|(w=='!')|(w=='?')):
                    pcount = pcount + 1
    print "pcount:", pcount
    num_chars = num_chars - pcount
    print "chars:", num_chars

pcount is the number of punctuation marks. Can anyone suggest the changes I need to make in order to find the exact number of characters without spaces and punctuation marks?

You can use a regex to replace all non-alphanumeric characters with an empty string, and then count the number of characters that remain in each line.
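A minimal sketch of that idea, written for Python 3 (the code in the question uses Python 2 print statements). The names `NON_ALNUM`, `count_alnum_chars` and `count_alnum_chars_in_file` are made up for this illustration, and the pattern assumes that only ASCII letters and digits should be counted:

```python
import re

# Matches every character that is NOT an ASCII letter or digit
# (assumption: spaces, newlines, punctuation and underscores are all excluded).
NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def count_alnum_chars(lines):
    """Return the number of letters and digits in an iterable of text lines."""
    total = 0
    for line in lines:
        # Strip everything that is not a letter or digit, then measure what is left.
        total += len(NON_ALNUM.sub("", line))
    return total


def count_alnum_chars_in_file(path):
    """Apply count_alnum_chars to a text file, reading it line by line."""
    with open(path, "r") as f:
        return count_alnum_chars(f)


# Small in-memory example, so the sketch runs without sample.txt.
sample = ["Hello, world!\n", "\n", "It's 9 a.m.; time to go?\n"]
print("chars:", count_alnum_chars(sample))  # prints "chars: 24"
```

For the file in the question, `count_alnum_chars_in_file('sample.txt')` would replace both the `num_chars - (words + 1)` adjustment and the nltk punctuation pass. If the text may contain accented or other non-ASCII letters, the pattern `[\W_]` is a possible alternative, since `\w` in Python 3 also matches Unicode letters and digits.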

