Hi!
I've been searching APIs for a while, in desperate need of an easy
way to parse a string and retrieve data (forget about Regexp or scanf),
so that any non-Rubyist I work with could describe, with a single
string, an FTP directory in which some files are saved. Moreover, I
need some metadata so that I can effectively sort and work with the
data I retrieve from this FTP server.
For example, I would not know which file to retrieve from:
'ftp://ftp.org/DATA/mike'
but
'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' would do
just fine, so that I could, for example, get this hash:
{:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
for this filename:
'ftp://ftp.org/DATA/mike/2005/10-15.txt'
I don't know if such a method is already available for Ruby, so I
decided to implement it on my own. Here it is:
###### Source ###########################################
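class String
  # Matches self against +description+ and returns a hash of variable
  # names to values, or false if self does not fit the description.
  def parse_for_variables(description, begin_var_name = "{", end_var_name = "}")
    split_reg_exp = Regexp.new(Regexp.quote(begin_var_name) + "(.+?)" + Regexp.quote(end_var_name))
    # Local variables, so that no state is left behind on the receiver.
    variables = []
    is_a_variable_name = true
    # Splitting on a regexp with a capture group alternates literal text
    # and variable names, starting with (possibly empty) literal text.
    pattern = description.split(split_reg_exp).collect { |str|
      is_a_variable_name = !is_a_variable_name
      if is_a_variable_name
        variables << str.sub(/:(\d+)$/, '').intern
        # 'name:3' captures exactly 3 characters, 'name' at least one.
        str =~ /:(\d+)$/ ? "(.{#{$1}})" : "(.+)"
      else
        Regexp.quote(str)
      end
    }.join
    searching_reg_exp = Regexp.new("^" + pattern + "$")
    # nil.to_a is [], and [][1..-1] is nil, so values is nil on no match.
    values = searching_reg_exp.match(self).to_a[1..-1]
    !values.nil? &&
      variables.length == values.length &&
      Hash.check_for_consistency_and_create_from_arrays(variables, values)
  end
end

class Hash
  # Builds a hash from parallel arrays of keys and values.
  def self.create_from_arrays(keys, values)
    self[*keys.zip(values).flatten]
  end

  # Same, but returns false if a key appears twice with different values.
  def self.check_for_consistency_and_create_from_arrays(keys, values)
    result = {}
    keys.each_with_index { |k, i|
      raise ArgumentError if result.has_key?(k) and result[k] != values[i]
      result[k] = values[i]
    }
    result
  rescue ArgumentError
    false
  end
end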
############################################################
#### Examples ###############################################
irb(main):026:0> 'foobar'.parse_for_variables('foo{name}')
=> {:name=>"bar"}
# You can specify the length of a value by adding :n (a character count)
# to the end of a variable name
irb(main):027:0> 'foobar'.parse_for_variables('foo{name:3}')
=> {:name=>"bar"}
irb(main):028:0> 'foobar'.parse_for_variables('foo{name:2}')
=> false
irb(main):029:0> 'foobar'.parse_for_variables('foo{name}')
=> {:name=>"bar"}
# By default, variable names are written between {}, but this can be
# overridden with optional arguments
irb(main):030:0> 'foo(bar){|x| x+2}'.parse_for_variables('foo(<<arg>>){|<<var>>| <<expression>>}','<<','>>')
=> {:arg=>"bar", :var=>"x", :expression=>"x+2"}
irb(main):031:0> 'C:\Windows\system32\vbrun700.dll'.parse_for_variables('{disk}:\{path}\{filename}.{extension}')
=> {:disk=>"C", :extension=>"dll", :filename=>"vbrun700", :path=>"Windows\\system32"}
irb(main):032:0> '2006-12-09.csv'.parse_for_variables('{year}-{month}-{day}.csv')
=> {:year=>"2006", :day=>"09", :month=>"12"}
irb(main):033:0> '2005 12 15'.parse_for_variables('{year} {month} {day}')
=> {:year=>"2005", :day=>"15", :month=>"12"}
irb(main):034:0> '20061209.txt'.parse_for_variables('{year:4}{month:2}{day:2}.txt')
=> {:year=>"2006", :day=>"09", :month=>"12"}
irb(main):035:0> '20061209.txt'.parse_for_variables('{year:2}{month:2}{day:2}.txt')
=> false
# You can use a variable name twice:
irb(main):036:0> 'DATA/2007/2007-12-09.csv'.parse_for_variables('DATA/{year}/{year}-{month}-{day}.csv')
=> {:year=>"2007", :day=>"09", :month=>"12"}
# as long as values are consistent:
irb(main):037:0> 'DATA/2007/2006-12-09.csv'.parse_for_variables('DATA/{year}/{year}-{month}-{day}.csv')
=> false
irb(main):038:0> 'whateverTooLong'.parse_for_variables('whatever{name:4}')
=> false
irb(main):039:0> 'whateverAsLongAsIWant'.parse_for_variables('whateverKsome_variableK','K','K')
=> {:some_variable=>"AsLongAsIWant"}
irb(main):040:0> 'whatevertoolong.csv'.parse_for_variables('whatever$name:4$.csv','$','$')
=> false
############################################################
Have you ever used such a method?
Is it possible to implement it in a more elegant way?
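One possible direction, offered only as a sketch: let the regexp engine do the bookkeeping by compiling the description into named groups, and use a backreference when a variable name is repeated so that inconsistent values simply fail to match. The names template_to_regexp and parse_template are hypothetical, and unlike the code above this version anchors with \A and \z rather than ^ and $.

```ruby
# Sketch only: template_to_regexp and parse_template are hypothetical names.
# Variable names must be valid regexp group names (letters, digits, underscore).
def template_to_regexp(description, open = "{", close = "}")
  token = /#{Regexp.quote(open)}(.+?)#{Regexp.quote(close)}/
  seen = {}
  # split with a capture group yields literal text at even indexes
  # and variable tokens at odd indexes.
  source = description.split(token).each_with_index.map { |part, i|
    next Regexp.quote(part) if i.even?
    # 'year:4' gives name "year" and length "4"; 'year' gives length nil.
    name, length = part.match(/\A(.+?)(?::(\d+))?\z/).captures
    if seen[name]
      "\\k<#{name}>"   # repeated variable: must match the same text again
    else
      seen[name] = true
      length ? "(?<#{name}>.{#{length}})" : "(?<#{name}>.+)"
    end
  }.join
  Regexp.new("\\A#{source}\\z")
end

# Returns a hash of symbol keys to matched strings, or false on no match.
def parse_template(string, description, open = "{", close = "}")
  match = template_to_regexp(description, open, close).match(string)
  return false unless match
  match.names.each_with_object({}) { |name, hash| hash[name.to_sym] = match[name] }
end

p parse_template('DATA/2007/2007-12-09.csv', 'DATA/{year}/{year}-{month}-{day}.csv')
p parse_template('DATA/2007/2006-12-09.csv', 'DATA/{year}/{year}-{month}-{day}.csv')
```

The trade-off is that variable names are restricted to what the regexp engine accepts as group names, whereas the original accepts any text between the delimiters.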
Thanks for reading, and please feel free to use my code if you ever need it,
Eric Duminil
Posted by eric.duminil, 7/18/2007 7:50:35 AM
Eric DUMINIL wrote:
> '20061209.txt'.parse_for_variables('{year:4}{month:2}{day:2}.txt')
> => {:year=>"2006", :day=>"09", :month=>"12"}
I like this. It's sort of like a cut-down regex for non-programmers. You
should write this up with a definition and put it in a library. I bet
people would use it.
Don't forget to come up with a cool name.
best,
Dan
--
Posted via http://www.ruby-forum.com/.
Posted by dan9365, 7/18/2007 7:57:47 AM
From: Eric DUMINIL [mailto:eric.duminil@gmail.com]
# For example, I would not know which file I should retrieve on:
# 'ftp://ftp.org/DATA/mike'
# but
# 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' would do
# just fine, so that I could, for example, get this hash:
# {:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
# for this filename:
# 'ftp://ftp.org/DATA/mike/2005/10-15.txt'
very nice.
But would it be more practical to delimit a variable the way we do in
Ruby string interpolation, i.e. use #{var} instead of just {var}?
This would be handy if, for example, I want to rename or move all folders
under /mike/2005/ to /mike/2007/: the retrieval and assignment strings
would just stay the same...
kind regards -botp
Posted by botp, 7/18/2007 8:08:58 AM
Hi
Thanks for the appreciation!
Your suggestion is interesting, but I'm not sure it would work, because:
'foobar'.parse_for_variables('foo#{name}','#{')
=> {:name=>"bar"}
works, but when you use it with a double-quoted string:
'foobar'.parse_for_variables("foo#{name}",'#{')
NameError: undefined local variable or method `name' for main:Object
Ruby already tries to evaluate "name" inside the string...
so you get either retrieval or assignment right, but not both :(
Anyway, assignment is not that big a deal:
(irb) h={:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
=> {:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
(irb) 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt'.gsub(/\{(.+?)\}/){h[$1.intern]}
=> "ftp://ftp.org/DATA/mike/2005/10-15.txt"
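The substitution above could be wrapped in a small helper. This is only a sketch: fill_variables is a hypothetical name, and using Hash#fetch (so that a missing variable raises instead of silently inserting an empty string) is an assumption about the desired behaviour, not something stated in the thread.

```ruby
# Sketch only: fill_variables is a hypothetical helper name.
# Replaces each {name} in the description with values[:name].
def fill_variables(description, values)
  # fetch raises if the hash has no entry for a variable.
  description.gsub(/\{(.+?)\}/) { values.fetch($1.intern).to_s }
end

h = { :year => "2005", :user_name => "mike", :day => "15", :month => "10" }
description = 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt'

puts fill_variables(description, h)
# prints ftp://ftp.org/DATA/mike/2005/10-15.txt

# Same description, different year: the "move 2005 to 2007" case.
puts fill_variables(description, h.merge(:year => "2007"))
# prints ftp://ftp.org/DATA/mike/2007/10-15.txt
```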
Best regards,
Eric
Posted by eric.duminil, 7/18/2007 8:33:46 AM
From: Eric DUMINIL [mailto:eric.duminil@gmail.com]
# 'foobar'.parse_for_variables("foo#{name}",'#{')
# NameError: undefined local variable or method `name' for main:Object
Oops, I totally overlooked that; I was thinking about lazy evaluation..
I think your current interface is good; it would be easy to insert the
"#" later...
kind regards -botp
Posted by botp, 7/18/2007 9:14:37 AM
There is an option for regexen to evaluate lazily. So you could
represent a regex-free string with a regex like that and then, whenever
you need it, evaluate the regex, convert it to a string and use it
:).
OR you could store the string 'stuff \#{name}' and later #eval() it, or
do something similar and less dangerous, when you need its evaluation.
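As one illustration of a "less dangerous" deferred substitution (not something proposed in the thread itself), modern Ruby versions let Kernel#format fill named references of the form %{name} from a hash. Nothing in the template is executed as code; the template text and the :name key below are made up for the example.

```ruby
# Sketch only: deferred substitution without eval, via Kernel#format.
# Single quotes keep the template inert until format is called.
template = 'stuff %{name}'

# Later, when the value is known:
result = format(template, :name => "bar")
puts result
# prints stuff bar
```

A literal percent sign in such a template has to be written as %%, and a name missing from the hash raises an error rather than being left blank.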
Aur
Posted by sonoflilit, 7/18/2007 9:24:30 AM
I think that what you describe is exactly what I implemented as
searching_reg_exp.
For example searching_reg_exp corresponding to
'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' is:
/^ftp:\/\/ftp\.org\/DATA\/(.+)\/(.+)\/(.+)\-(.+)\.txt$/
if you want it to be non-greedy, it would be:
/^ftp:\/\/ftp\.org\/DATA\/(.+?)\/(.+?)\/(.+?)\-(.+?)\.txt$/
Or did I get you wrong?
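The two patterns above only behave differently when the text is ambiguous. A minimal sketch of the translation step, with a switch between greedy and non-greedy captures, makes that visible. description_to_pattern is a hypothetical helper; it assumes the default {} delimiters and ignores the :n length suffix.

```ruby
# Sketch only: description_to_pattern is a hypothetical helper name.
# Literal text is quoted; each {variable} becomes one capture group.
def description_to_pattern(description, lazy = false)
  wildcard = lazy ? "(.+?)" : "(.+)"
  # Odd indexes of the split result are the variable names.
  source = description.split(/\{(.+?)\}/).each_with_index.map { |part, i|
    i.odd? ? wildcard : Regexp.quote(part)
  }.join
  Regexp.new("^" + source + "$")
end

greedy = description_to_pattern('{first}-{rest}')
lazy   = description_to_pattern('{first}-{rest}', true)

p greedy.match('a-b-c').captures   # ["a-b", "c"]
p lazy.match('a-b-c').captures     # ["a", "b-c"]
```

For the FTP description both variants capture the same four values, because the slashes and the single hyphen leave only one way to split the path.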
I wouldn't choose the eval() path, for the security reasons you mentioned...
'foo{system("rm -rf ~/")}' would be pretty bad!
Which method were you thinking of when you wrote "something similar
and less dangerous"?
Bye,
Eric
Posted by eric.duminil, 7/18/2007 9:37:33 AM