Number of Words in a String



Hello everyone,

Is this the simplest way to find the number of words in a string?   
Seems a little complicated, and I can't seem to turn it into a  
function because when I replace the string with the argument  
placeholder myString_ I get an error message saying that a string is  
expected in that spot.

	Length[ReadList[StringToStream["The cat in the hat."], Word]]

	Returns 5.

Gregory

Posted by gregory.lypny, 8/11/2009 7:58:29 AM

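wordCount[str_String] :=
 Module[{s = StringReplace[str, Whitespace -> " "]}, (* collapse each whitespace run to one space *)
  StringCount[s, " "] + 1] (* words = spaces + 1; assumes no leading/trailing whitespace *)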

str = "The  cat in      the hat.";

wordCount[str]

5
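Counting separators has two edge cases worth knowing about. Leading or trailing whitespace adds a phantom word (wordCount[" The cat in the hat. "] gives 7), and the empty string still counts as one word. Here is a minimal sketch of a guarded variant, assuming you want 0 for blank input. The name wordCountTrimmed is just for illustration:

```mathematica
(* Hypothetical variant of wordCount: trim the ends first so that
   leading/trailing whitespace cannot add an extra separator,
   and return 0 for a blank string *)
wordCountTrimmed[str_String] :=
 Module[{s = StringTrim[str]},
  If[s === "",
   0,
   (* Whitespace matches a whole run, so "a   b" has one separator *)
   StringCount[s, Whitespace] + 1]]

wordCountTrimmed[" The cat in the hat. "]  (* 5 *)
wordCountTrimmed["   "]                    (* 0 *)
```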


Bob Hanlon

Posted by hanlonr, 8/12/2009 8:32:25 AM


Length[StringSplit["The cat in the hat."]]

--David

Posted by dbreiss, 8/12/2009 8:32:36 AM

On Aug 11, 3:58 am, Gregory Lypny <gregory.ly...@videotron.ca> wrote:

> Is this the simplest way to find the number of words in a string?  
[...]
>         Length[ReadList[StringToStream["The cat in the hat."], Word]]

I would use StringCount:

StringCount["Now is the time for all good men to come to the aid of their country.", LetterCharacter ..]

will return 16. You can tweak the pattern if you want to, say, treat
strings of digits as words as well.
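As an illustration of that tweak, adding DigitCharacter to the pattern makes runs of digits count as words. The sample sentence is made up:

```mathematica
sentence = "Chapter 11 has 42 pages.";

(* letters only: "11" and "42" are not counted *)
StringCount[sentence, LetterCharacter ..]                     (* 3 *)

(* letters or digits: numbers are counted as words too *)
StringCount[sentence, (LetterCharacter | DigitCharacter) ..]  (* 5 *)
```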

Cheers,
Pillsy

Posted by pillsbury, 8/12/2009 8:34:25 AM

Hi,

directly from the documentation:

StringCount["The cat in the hat.", WordCharacter..]

Funnily enough, for the U.S. Constitution example, vim, Word and
OpenOffice report 7620 words, whereas Mathematica reports 7632. So
WordCharacter.. does not seem to be completely equivalent to other counting methods.
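One plausible source of such a difference (not checked against the Constitution text itself) is that WordCharacter matches only letters and digits. Apostrophes and hyphens therefore split a single whitespace-delimited word into several matches, which would push Mathematica's count up. A quick check on a made-up phrase:

```mathematica
phrase = "don't use a well-known stop-gap";

(* WordCharacter runs: don | t | use | a | well | known | stop | gap *)
StringCount[phrase, WordCharacter ..]   (* 8 *)

(* whitespace-separated tokens: apostrophes and hyphens stay inside words *)
Length[StringSplit[phrase]]             (* 5 *)
```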

Regards,
Yves

Posted by yves.klett, 8/12/2009 8:35:53 AM

hi,

Try this:

CountWords[str_String]:=Length[StringSplit[str]]

It's not perfect; I'm still working on it. I suspect we will only get
what we want by using a RegEx.
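One way StringSplit alone overcounts is stand-alone punctuation: a spaced dash or an ellipsis becomes a token of its own. A sketch that may avoid the need for a regex keeps only the tokens that contain at least one letter. The name countWordsLetters is hypothetical, and whether digit-only tokens should count is left as a design choice:

```mathematica
(* split on whitespace, then keep only tokens containing a letter *)
countWordsLetters[str_String] :=
 Count[StringSplit[str], tok_ /; ! StringFreeQ[tok, LetterCharacter]]

Length[StringSplit["The cat - in the hat ..."]]   (* 7: "-" and "..." included *)
countWordsLetters["The cat - in the hat ..."]     (* 5 *)
```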

andrew

Posted by meitnik, 8/12/2009 8:36:46 AM

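Were you using Set (a = b) rather than SetDelayed (a := b) in your
function definition? With Set, the right-hand side is evaluated
immediately, while the argument is still a symbol rather than a string,
which would explain the "string expected" message. The following works for me: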

In[12]:= f[str_] := Length[ReadList[StringToStream[str], Word]]
f["hey there"]
Out[13]= 2

However, it's much quicker to use the StringSplit function:

In[14]:= Length@StringSplit["hey there"]
Out[14]= 2
In[19]:= Do[f["hey there"],{5000}] // AbsoluteTiming
Out[19]= {5.1405921,Null}
In[21]:= Do[Length@StringSplit["hey there"],{5000}] // AbsoluteTiming
Out[21]= {0.0156249,Null}

Cheers,
Peter.

Posted by pfalloon, 8/12/2009 8:37:08 AM

Hi,

> 
> Is this the simplest way to find the number of words in a string?   

I don't know if you would consider this simpler; it is straight from the
documentation for StringCount:

StringCount["The cat in the hat.", WordCharacter ..]

> Seems a little complicated, and I can't seem to turn it into a  
> function because when I replace the string with the argument  
> placeholder myString_ I get an error message saying that a string is  
> expected in that spot.
> 
> 	Length[ReadList[StringToStream["The cat in the hat."], Word]]
> 
> 	Returns 5.

I don't know what you did when turning it into a function, but this:

numwords[s_String] := Length[ReadList[StringToStream[s], Word]]

numwords["The cat in the hat."]

seems to work alright....


hth,

albert

Posted by awnl4187, 8/12/2009 8:37:29 AM

StringCount["The cat in the hat.", Whitespace] + 1

Probably a better way is to use WordBoundary:

StringCount["The cat in the hat.", WordBoundary]/2

Mike

Posted by mike.honeychurch, 8/12/2009 8:38:02 AM

Hi Albert,
 
> I don't know what you did when turning it into a function, but this:
> 
> numwords[s_String] := Length[ReadList[StringToStream[s], Word]]
> 
> numwords["The cat in the hat."]
> 
> seems to work alright....

It does for a few calls, but many calls to this (or a similar) function
leave many open streams, which slows your machine down. To see this, try the
following sequence (with numwords defined as above):

opn = Streams[]
numwords@"the cat in the hat" & /@ Range[15];
opn = Streams[]

The streams can be closed using

Close[#] & /@ Select[opn, SameQ[Head@#, InputStream] &];

There are at least three remedies to this.

One is to remember to periodically close all opened streams.

Another is to modify Albert's function to something like

numwordsb[s_String] := Block[{opn, lng},
  lng = Length[ReadList[opn = StringToStream[s], Word]]; (* read the words, keeping a handle on the stream *)
  Close[opn]; (* release the stream before returning *)
  lng]
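To confirm that a stream-closing version really leaves nothing open, you can compare the number of open streams before and after a batch of calls. A sketch along those lines. The name countClosing is hypothetical, and using Module instead of Block is only a stylistic choice:

```mathematica
(* same idea as numwordsb: read the words, then close the stream *)
countClosing[s_String] := Module[{strm = StringToStream[s], n},
  n = Length[ReadList[strm, Word]];
  Close[strm];
  n]

before = Length[Streams[]];
Table[countClosing["the cat in the hat"], {15}];
Length[Streams[]] - before   (* 0: no streams left open *)
```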

But the most effective remedy is probably to avoid StringToStream as much as you
can. For me, this means using Import[, "Table"] rather than the hand-rolled
code I wrote before version 6.0.

YMMV.

Regards,

Dave.


Posted by davidannetts1, 8/13/2009 7:21:01 AM

Comparing regular expressions with Mathematica string patterns, the former
take roughly 20-30% less time in the tests below:

In[1]:= NN = 1000000;

For word splitting:

In[2]:= StringSplit["The cat in a hat, (not on the mat)??.", RegularExpression["[^A-Za-z]+"]]
Out[2]= {The,cat,in,a,hat,not,on,the,mat}

In[3]:= Timing[Do[StringSplit["The cat in a hat, (not on the mat)??.", RegularExpression["[^A-Za-z]+"]];, {NN}]]
Out[3]= {9.999,Null}

In[4]:= StringSplit["The cat in a hat, (not on the mat)??.", Except[WordCharacter] ..]
Out[4]= {The,cat,in,a,hat,not,on,the,mat}

In[5]:= Timing[Do[StringSplit["The cat in a hat, (not on the mat)??.", Except[WordCharacter] ..];, {NN}]]
Out[5]= {12.808,Null}

So the regex version takes about 22% less time (9.999 s vs. 12.808 s).


For word counting:

In[6]:= StringCount["The cat in a hat, (not on the mat)??.", RegularExpression["[A-Za-z]+"]]
Out[6]= 9

In[7]:= Timing[Do[StringCount["The cat in a hat, (not on the mat)??.", RegularExpression["[A-Za-z]+"]];, {NN}]]
Out[7]= {6.396,Null}

In[8]:= StringCount["The cat in a hat, (not on the mat)??.", WordCharacter ..]
Out[8]= 9

In[9]:= Timing[Do[StringCount["The cat in a hat, (not on the mat)??.", WordCharacter ..];, {NN}]]
Out[9]= {9.438,Null}

So the regex version takes about 32% less time (6.396 s vs. 9.438 s).
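Note that the two patterns timed here are not interchangeable. [A-Za-z]+ covers only unaccented ASCII letters, while WordCharacter also matches digits and accented letters, so the two give the same count only on plain ASCII text without numbers. A quick check on a made-up string:

```mathematica
t = "caf\[EAcute] 42";

(* ASCII-only regex: matches just "caf" *)
StringCount[t, RegularExpression["[A-Za-z]+"]]   (* 1 *)

(* WordCharacter: matches the whole accented word and "42" *)
StringCount[t, WordCharacter ..]                 (* 2 *)
```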


ADL

Posted by alberto.dilullo, 8/14/2009 9:59:13 AM