I am trying to get some information from a snippet of HTML
(http://pastebin.com/iTXyxQ0j), and I'm using doc.inner_text to get the
important parts, but when I do so I get an odd amount of spacing
(http://pastebin.com/6HWDs5dm). Is there a way to get rid of
all that extra spacing so I can just print the output and it looks
clean? Possibly something like

pino
0.2.11-ubuntu0~lucid
troorl
(2010-07-04)

pino
0.2.10-ubuntu0~karmic
troorl
(2010-05-27)

that? Or can I get each piece of text and add it to an array? If I do
that while it still has all that odd spacing, is the spacing part of the
variable, or is it just the text?

Thanks guys!
--
Posted via http://www.ruby-forum.com/.
wrinkliez (3) | 8/4/2010 4:29:48 AM

On Wed, Aug 4, 2010 at 6:29 AM, David Ainley <wrinkliez@gmail.com> wrote:
> So I am trying to get some information from a snippet of html
> (http://pastebin.com/iTXyxQ0j), and im using doc.inner_text to get the
> important parts, but when I do so I get an odd amount of spacing
> (http://pastebin.com/6HWDs5dm). is there a way where I can get rid of
> all that extra spacing so I can just print the output and it looks
> clean? possibly something like
>
> pino
> 0.2.11-ubuntu0~lucid
> troorl
> (2010-07-04)
>
> pino
> 0.2.10-ubuntu0~karmic
> troorl
> (2010-05-27)
>
> that? or can i get each piece of text and add it to an array? if i do
> that while its got all that odd spacing, is that spacing a piece of the
> variable? or is it juts the text?

You can remove 2 or more consecutive "\n" like this:
irb(main):001:0> s = <<EOS
irb(main):002:0" test
irb(main):003:0"
irb(main):004:0" test2
irb(main):005:0" sdfsdf
irb(main):006:0" werwer
irb(main):007:0"
irb(main):008:0"
irb(main):009:0"
irb(main):010:0"
irb(main):011:0" sdfsdfsd
irb(main):012:0" sdfer234
irb(main):013:0" EOS
=> "test\n\ntest2\nsdfsdf\nwerwer\n\n\n\n\nsdfsdfsd\nsdfer234\n"
irb(main):019:0> s.gsub /\n\n+/, "\n"
=> "test\ntest2\nsdfsdf\nwerwer\nsdfsdfsd\nsdfer234\n"
or
irb(main):020:0> s.gsub /\n{2,}/, "\n"
=> "test\ntest2\nsdfsdf\nwerwer\nsdfsdfsd\nsdfer234\n"
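
The two patterns above only match newlines that sit directly next to each other, so lines that look empty but contain spaces or tabs are left alone. Also, gsub returns a new string and does not change the original, so the result has to be assigned or printed. A minimal sketch, assuming the text is already held in a string; the helper name collapse_blank_lines and the sample data are made up for illustration:

```ruby
# Hypothetical helper: collapse runs of empty or whitespace-only lines.
def collapse_blank_lines(text)
  # A newline, any amount of whitespace, then another newline
  # becomes a single newline.
  text.gsub(/\n\s*\n/, "\n")
end

s = "test\n  \n\t\ntest2\n\n\n\nsdfsdf\n"
cleaned = collapse_blank_lines(s)  # s itself is left unchanged
puts cleaned
```

Indentation at the start of a non-empty line is not touched by this pattern; strip would be needed for that.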
Hope this helps,
Jesus.
ISO | 8/4/2010 6:28:10 AM

Use the String methods s.strip!, s.gsub! and s.squeeze!, as in
the following snippet:

# no-white.rb - remove empty lines and sequences of blanks
# from a text file
fh = File.open('6HWDs5dm.txt')
until fh.eof?
  line = fh.readline.chomp
  # remove leading and trailing blanks
  line.strip!
  # skip empty lines
  next if line == ''
  # convert tab chars to blanks
  line.gsub!(/\t/, ' ')
  # substitute a single blank for a sequence of blanks
  line.squeeze!(' ')
  # add code to process line if needed
  puts line
end
fh.close
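
The same cleanup can be applied to text that is already in a string rather than in a file, which also covers the question about collecting the pieces in an array. The whitespace is part of the string's contents, so it has to be removed explicitly. A rough sketch; the helper name clean_lines and the sample text are assumptions for illustration only:

```ruby
# Hypothetical helper: turn messy text into an array of tidy lines.
def clean_lines(text)
  text.each_line
      # tabs to blanks, trim both ends, squeeze runs of blanks
      .map { |line| line.gsub(/\t/, ' ').strip.squeeze(' ') }
      # drop lines that ended up empty
      .reject(&:empty?)
end

raw = "  pino \n\n\t0.2.11-ubuntu0~lucid\n   \n troorl   (2010-07-04)\n"
pieces = clean_lines(raw)
puts pieces
```

Here pieces is an ordinary Array of strings, so individual entries can be picked out with pieces[0], pieces[1] and so on.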
HTH gfb
GianFranco | 8/4/2010 8:33:34 AM

Hey guys, thanks for the responses. Jesus, the gsubs don't do anything
:/, the output still looks the same.
And Gianfranco, every time I try to use readline, it gives me the error
"private method `readline' called for #<String:0xb71c3fd8>
(NoMethodError)".
David | 8/4/2010 2:18:28 PM

On Wed, Aug 4, 2010 at 4:18 PM, David Ainley <wrinkliez@gmail.com> wrote:
> Hey guys, thanks for the responses. Jesus, the gsubs don't do anything
> :/, the output still looks the same.
> And Gianfranco, everytime I try to use readline, it gives me an error
> "private method `readline' called for #<String:0xb71c3fd8>
> (NoMethodError)"
Can you show your code?
Jesus.
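
Without the code this is only a guess, but the error message suggests readline was called on the String itself rather than on a File object. String has no public readline; the call lands on the private Kernel#readline, hence the NoMethodError. One possible workaround, sketched here with made-up sample text standing in for the inner_text result, is to wrap the string in a StringIO so the file-style loop can be reused:

```ruby
require 'stringio'

# Stand-in for whatever doc.inner_text returned (assumed sample data).
text = "  pino \n\n  troorl \n"

# StringIO gives a String the same reading interface as a File.
io = StringIO.new(text)
lines = []
until io.eof?
  line = io.readline.strip
  lines << line unless line.empty?
end
puts lines
```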
ISO | 8/4/2010 7:28:22 PM