Hello everyone,
Is this the simplest way to find the number of words in a string?
Seems a little complicated, and I can't seem to turn it into a
function because when I replace the string with the argument
placeholder myString_ I get an error message saying that a string is
expected in that spot.
Length[ReadList[StringToStream["The cat in the hat."], Word]]
Returns 5.
Gregory

gregory.lypny (231)
8/11/2009 7:58:29 AM

wordCount[str_String] :=
 Module[{s = StringReplace[str, Whitespace -> " "]},
  StringCount[s, " "] + 1]
str = "The cat in the hat.";
wordCount[str]
5
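A sketch of one refinement (not part of Bob's reply): counting separators and adding 1 miscounts when the string has leading or trailing whitespace or is empty. For example, wordCount["  The cat  "] gives 4, because collapsing the whitespace leaves " The cat ", which still contains three spaces. Trimming the ends first avoids this. The name wordCountTrimmed is hypothetical.

```mathematica
(* Hypothetical variant of wordCount: trim the ends first so that
   leading/trailing whitespace does not add a phantom word, and treat
   an empty (or all-whitespace) string as containing zero words. *)
wordCountTrimmed[str_String] :=
 With[{t = StringTrim[str]},
  If[t === "", 0, StringCount[t, Whitespace] + 1]]

wordCountTrimmed["The cat in the hat."]  (* 5 *)
wordCountTrimmed["  The cat  "]          (* 2 *)
wordCountTrimmed[""]                     (* 0 *)
```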
Bob Hanlon

hanlonr (2281)
8/12/2009 8:32:25 AM

Length[StringSplit["The cat in the hat."]]
--David

dbreiss (537)
8/12/2009 8:32:36 AM

On Aug 11, 3:58 am, Gregory Lypny <gregory.ly...@videotron.ca> wrote:
> Is this the simplest way to find the number of words in a string?
[...]
> Length[ReadList[StringToStream["The cat in the hat."], Word]]
I would use StringCount:
StringCount["Now is the time for all good men to come to the aid of their country.",
 LetterCharacter..]
will return 16. You can tweak the pattern if you want to, say, treat
strings of digits as words as well.
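As a sketch of the tweak Pillsy mentions, one option is to treat a word as a run of letters or digits (which is essentially what WordCharacter matches). The sample sentence is invented for illustration.

```mathematica
With[{s = "Add 42 and 7 to get 49"},
 {
  (* letters only: "42", "7" and "49" are skipped *)
  StringCount[s, LetterCharacter ..],
  (* letters or digits: digit runs now count as words too *)
  StringCount[s, (LetterCharacter | DigitCharacter) ..]
 }]
(* {4, 7} *)
```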
Cheers,
Pillsy

pillsbury (453)
8/12/2009 8:34:25 AM

Hi,
Directly from the documentation:
StringCount["The cat in the hat.", WordCharacter..]
Funnily enough, for the U.S. Constitution example, vim, Word and
OpenOffice report 7620 words, whereas Mathematica reports 7632. So
WordCharacter.. is apparently not exactly equivalent to other word-counting methods.
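One plausible contributor to a discrepancy like this (not checked against the Constitution text itself) is that WordCharacter.. stops at any character that is not a letter or digit, so contractions and hyphenated words count as several words, whereas whitespace-based counters count them once. A small illustration with an invented sentence:

```mathematica
With[{s = "It's a well-known fact"},
 {
  (* runs of letters/digits: "It", "s", "a", "well", "known", "fact" *)
  StringCount[s, WordCharacter ..],
  (* whitespace-separated tokens: "It's", "a", "well-known", "fact" *)
  Length[StringSplit[s]]
 }]
(* {6, 4} *)
```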
Regards,
Yves

yves.klett (109)
8/12/2009 8:35:53 AM

Hi,
Try this:
CountWords[str_String] := Length[StringSplit[str]]
It's not perfect; I'm still working on it. I suspect only a regular expression
will get us exactly what we want.
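A sketch of the kind of imperfection andrew mentions: StringSplit with no second argument splits only on whitespace, so punctuation stays attached to words and a free-standing punctuation mark counts as a word. Counting runs of letters avoids that. The sample sentence is invented.

```mathematica
(* andrew's definition, repeated so the snippet runs on its own *)
CountWords[str_String] := Length[StringSplit[str]]

With[{s = "The cat - in the hat."},
 {
  CountWords[s],                     (* the lone "-" is counted as a word *)
  StringCount[s, LetterCharacter ..] (* only runs of letters are counted *)
 }]
(* {6, 5} *)
```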
andrew

meitnik (88)
8/12/2009 8:36:46 AM

Were you using Set (a = b) rather than SetDelayed (a := b) in your
function definition? The following works for me:
In[12]:= f[str_] := Length[ReadList[StringToStream[str], Word]]
f["hey there"]
Out[13]= 2
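Peter's question points at a plausible cause of Gregory's "string expected" error. The sketch below (function names hypothetical; StringSplit is used instead of StringToStream so that no streams are left open) shows how Set and SetDelayed differ when the pattern variable appears on the right-hand side. It assumes text has no global value, hence the ClearAll.

```mathematica
ClearAll[wordsDelayed, wordsImmediate, text];

(* SetDelayed (:=): the right-hand side is evaluated at call time,
   after text has been replaced by the actual string argument. *)
wordsDelayed[text_String] := Length[StringSplit[text]]

(* Set (=): the right-hand side is evaluated once, right now, while text
   is still an unbound symbol. StringSplit[text] issues a message saying
   a string is expected and stays unevaluated, so Length[...] returns 1
   (the number of arguments of that unevaluated expression), and the
   constant 1 becomes the stored definition. *)
wordsImmediate[text_String] = Length[StringSplit[text]];

{wordsDelayed["hey there"], wordsImmediate["hey there"]}
(* {2, 1}, after the message from the Set definition *)
```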
However, it's much quicker to use the StringSplit function:
In[14]:= Length@StringSplit["hey there"]
Out[14]= 2
In[19]:= Do[f["hey there"],{5000}] // AbsoluteTiming
Out[19]= {5.1405921,Null}
In[21]:= Do[Length@StringSplit["hey there"],{5000}] // AbsoluteTiming
Out[21]= {0.0156249,Null}
Cheers,
Peter.

pfalloon (72)
8/12/2009 8:37:08 AM

Hi,
>
> Is this the simplest way to find the number of words in a string?
I don't know whether you would consider this simpler; it comes straight from the
documentation for StringCount:
StringCount["The cat in the hat.", WordCharacter ..]
> Seems a little complicated, and I can't seem to turn it into a
> function because when I replace the string with the argument
> placeholder myString_ I get an error message saying that a string is
> expected in that spot.
>
> Length[ReadList[StringToStream["The cat in the hat."], Word]]
>
> Returns 5.
I don't know what you did when turning it into a function, but this:
numwords[s_String] := Length[ReadList[StringToStream[s], Word]]
numwords["The cat in the hat."]
seems to work all right.
hth,
albert

awnl4187 (501)
8/12/2009 8:37:29 AM

StringCount["The cat in the hat.", Whitespace] + 1
Probably a better way is to use WordBoundary:
StringCount["The cat in the hat.", WordBoundary]/2
Mike

mike.honeychurch (210)
8/12/2009 8:38:02 AM

Hi Albert,
> I don't know what you did when turning it into a function, but this:
>
> numwords[s_String] := Length[ReadList[StringToStream[s], Word]]
>
> numwords["The cat in the hat."]
>
> seems to work alright....
It does for a few calls. But many calls to this (or a similar) function
leave many open streams, which slows your machine down. To see this, try the
following sequence (with numwords defined as above):
opn = Streams[]
numwords@"the cat in the hat" & /@ Range[15];
opn = Streams[]
The streams can be closed using
Close[#] & /@ Select[opn, SameQ[Head@#, InputStream] &];
There are at least three remedies to this.
One is to remember to periodically close all opened streams.
Another is to modify Albert's function to something like
numwordsb[s_String] := Block[{opn, lng},
  lng = Length[ReadList[opn = StringToStream[s], Word]];
  Close[opn];
  lng]
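A sketch for checking Dave's point in your own session (the names leaky, closing and openedBy are hypothetical): it counts how many entries Streams[] gains over 15 calls of a word counter that leaves its stream open, versus one that closes it. The closing version follows the same idea as numwordsb, written with Module.

```mathematica
ClearAll[leaky, closing, openedBy];

(* Like Albert's numwords: opens a string stream and never closes it. *)
leaky[s_String] := Length[ReadList[StringToStream[s], Word]]

(* Reads the words, closes the stream, then returns the count. *)
closing[s_String] := Module[{strm = StringToStream[s], n},
  n = Length[ReadList[strm, Word]];
  Close[strm];
  n]

(* Net number of streams left open after 15 calls of fn. *)
openedBy[fn_] := Module[{before = Length[Streams[]]},
  Do[fn["the cat in the hat"], {15}];
  Length[Streams[]] - before]

{openedBy[leaky], openedBy[closing]}
```

If string streams show up in Streams[] in your version, as Dave describes, the first entry should be 15 and the second 0.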
But the most effective is probably to avoid StringToStream as much as you
can. For me, this means using Import[, "Table"] rather than my hand-rolled
code from before version 6.0.
YMMV.
Regards,
Dave.

davidannetts1 (270)
8/13/2009 7:21:01 AM

On the choice between regular expressions and Mathematica string
patterns, note that the former are roughly 20-30% faster in these tests:
In[1]:= NN = 1000000;
For word splitting:
In[2]:= StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]]
Out[2]= {The,cat,in,a,hat,not,on,the,mat}
In[3]:= Timing[
Do[StringSplit["The cat in a hat, (not on the mat)??.",
RegularExpression["[^A-Za-z]+"]];, {NN}]]
Out[3]= {9.999,Null}
In[4]:= StringSplit["The cat in a hat, (not on the mat)??.", Except[WordCharacter] ..]
Out[4]= {The,cat,in,a,hat,not,on,the,mat}
In[5]:= Timing[
Do[StringSplit["The cat in a hat, (not on the mat)??.", Except[WordCharacter] ..];, {NN}]]
Out[5]= {12.808,Null}
so: 22% faster with regex.
For word counting:
In[6]:= StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]]
Out[6]= 9
In[7]:= Timing[
Do[StringCount["The cat in a hat, (not on the mat)??.",
RegularExpression["[A-Za-z]+"]];, {NN}]]
Out[7]= {6.396,Null}
In[8]:= StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..]
Out[8]= 9
In[9]:= Timing[
Do[StringCount["The cat in a hat, (not on the mat)??.",
WordCharacter ..];, {NN}]]
Out[9]= {9.438,Null}
so, 32% faster with regex.
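Besides speed, it may be worth noting that the two forms do not match exactly the same things: [A-Za-z]+ covers ASCII letters only, while WordCharacter also matches digits and accented letters. A small check with invented strings (\[IDoubleDot] and \[EAcute] are the named characters for i with diaeresis and e with acute):

```mathematica
With[{re = RegularExpression["[A-Za-z]+"],
  accented = "na\[IDoubleDot]ve caf\[EAcute]",
  digits = "born in 1984"},
 {
  (* regex: "na", "ve", "caf" -> 3; WordCharacter: the two whole words -> 2 *)
  {StringCount[accented, re], StringCount[accented, WordCharacter ..]},
  (* regex skips "1984" -> 2; WordCharacter counts it -> 3 *)
  {StringCount[digits, re], StringCount[digits, WordCharacter ..]}
 }]
(* {{3, 2}, {2, 3}} *)
```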
ADL

alberto.dilullo (79)
8/14/2009 9:59:13 AM

10 Replies