Re: suggested option for wget/curl.

Michael Black wrote:-
>    Based on context from the original poster's previous posts, I think he
>    wants to selectively download pages, or parts of pages.  So it would seem
>    he wants a "smart" browser, that will look at the context of the page and
>    retrieve only what he wants.  Sed won't work, since it requires the file
>    to be transferred to his computer before he can start filtering it, and it
>    would seem (based on previous context) that he is trying to avoid
>    downloading the bloat in the first place.

OP often wonders why he can't write 'to be understood'.
But if one person CAN understand, it's OK.
Why should "to avoid
downloading the bloat in the first place." be based on previous context,
rather than the optimum bandwidth saver, independent of context?

> I think the poster is heading down the wrong path, complicating things for
> the wrong reasons.  I find that if I keep frequent pages like google and
> wikipedia local to my computer, it avoids some of the most common wasted
> transfers, the pages held locally act just like they were remote in terms
> of the browser, so it bumps out an intermediate step for common searches.

Sure, you can always argue "your time & convenience is more valuable
than physical resources" which leads to the N.American V8 mentality;
which may be right -- for some.

Sylvain Robitaille wrote:-
>    Examine the manual pages for wget and curl, with special attention to
>    the "continue fetch" options to which you're referring.  I think you'll
>    find in both cases that all it does is skip forward some number of
>    bytes (which with curl you can specify as an argument to that option),
>    before starting to save the new download.  They don't have any form
>    of intelligence to help them determine whether any of the *content*
>    already exists in the file being appended to.  They're simply skipping
>    forward some number of bytes before appending to the target file.
> 
>    If you can figure out a way to calculate how many bytes to skip from
>    the start, (and can live with any duplicated material at the end of each
>    download), I suspect that "curl" will be the better choice.

So finally the 4th respondent knows what I'm talking about.
Since I OP-ed I've discovered that wget is just a background browser.
There are 2 classes of resources to be saved:
1. physical: bandwidth, file-space;
2. human effort from clutter/noise.

Often you want to get a series of articles where every page carries
the same header (about 60% of the content), only about 10% is the
<article contents>, and the remaining 30% is a mostly redundant trailer.

Currently, I fetch and append all articles to a file [the book]
via `lynx -dump URL`, with the URL & a separator line for each
part/web-page of the book.
Then I delete the repeated parts while reading it.
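
A minimal sketch of automating that manual deletion, assuming each page has already been dumped to text and split into lines. The function names are hypothetical, and it only removes lines that are identical across every page, so it saves reading effort rather than bandwidth:

```python
def common_prefix_len(pages):
    """Count the leading lines that are identical in every page."""
    n = 0
    for lines in zip(*pages):
        if len(set(lines)) != 1:
            break
        n += 1
    return n


def strip_shared(pages):
    """Drop the header and trailer lines shared by all pages.

    pages is a list of pages, each page a list of text lines.
    With fewer than two pages nothing can be compared, so the
    input is returned unchanged.
    """
    if len(pages) < 2:
        return [list(p) for p in pages]
    head = common_prefix_len(pages)
    trimmed = [p[head:] for p in pages]
    # The shared trailer is the shared prefix of the reversed pages.
    tail = common_prefix_len([p[::-1] for p in trimmed])
    return [p[:len(p) - tail] for p in trimmed]
```

Pages whose header varies slightly (a date, a counter) would defeat the exact comparison, so a real run would probably need a looser match.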

Sure `curl` can <fetch the rest of the URL from byte N>
provided the server has the facility.
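
For illustration, fetching from byte N comes down to an HTTP Range request. The sketch below uses Python's standard library; the function names and the 30-second timeout are assumptions, not anything from the thread:

```python
import urllib.request


def build_range_request(url, start):
    """Build a request for everything from byte offset `start` onward."""
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-"})


def fetch_from(url, start, timeout=30):
    """Return (body, partial).

    partial is True only when the server answered 206 Partial Content.
    A plain 200 means the server ignored the Range header and sent the
    whole page, so nothing was saved.
    """
    request = build_range_request(url, start)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = resp.read()
        partial = resp.status == 206
    return body, partial
```

Checking for the 206 status is how a script can tell whether the server "has the facility".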
But I'm just checking to see if others have optimising methods which
I don't yet use.

I've just realised that it won't work.
Oh, I'm wrong again, it can work: 
when I look at the lynx-rendered text in my editor, I can see the
<byte count>. But that's not the byte-count of the <source-cut>.

OTOH the mapping from a convenient 'transition' in the lynx-rendered
text to the byte-count of the corresponding source[html] should be
easy to find automatically? You appreciate the problem of validly
concatenating 2 html-sources?
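
One rough way to do that mapping, as a sketch only: pick a short distinctive phrase at the transition in the rendered text and search for it in the raw HTML bytes, tolerating the whitespace that lynx reflows. It assumes the phrase appears in the source without tags or character entities inside it; the function name is hypothetical.

```python
import re


def source_offset(html_bytes, phrase, encoding="utf-8"):
    """Return the byte offset in html_bytes where `phrase` starts, or -1.

    Runs of whitespace in the phrase match any run of whitespace in the
    source, because the renderer re-wraps lines.
    """
    words = [re.escape(w.encode(encoding)) for w in phrase.split()]
    if not words:
        return -1
    match = re.search(rb"\s+".join(words), html_bytes)
    return match.start() if match else -1
```

The offset found this way lands in the middle of the document's markup, so the fetched tail is not valid HTML on its own; it is only useful as text to be rendered or filtered afterwards.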

BTW these optimisations may start as novel engineering quirks, but
you can't tolerate using the 'consumer' methods once you get used
to optimising.

Here's a typical on-line-auto-shopping-list-script:----
g1277 Dec14L Dec14D 
cd /mnt/p11/Econ/Krugman g1277 Tmplt Dec14
cd /mnt/p11/Inet/USEnet nhd12  GrpLst  GrpHdrsLst
-----------> which 3 fetches mean:-
1. use lynx set to line-len < 77, with 2 args: 
      File of URLs, 
      File to append fetches to, with URL header & separator-line.
2. goto the appropriate dir and get the latest blog & save it as Dec14
  This is an example of where the header is always repeated, but
  not so annoying, because you don't save multiple 'pages'
  in a 'book' with the same garbage.
3. goto the appropriate dir and get the list of Newsgroups from
    GrpLst to use lynx to use google to get the list of headers
    [for each group] appended to file: GrpHdrsLst, with suitable
    URL-header & separator-line for each fetch. 
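
The g1277 script itself is not shown, so the following is only a guess at the shape of step 1: read a file of URLs, dump each one with lynx at a width under 77, and append it to the book file under a URL header and separator line. The separator text and function names are assumptions; `-dump` and `-width` are standard lynx options.

```python
import subprocess

SEPARATOR = "-" * 70  # hypothetical separator line


def lynx_dump(url, width=76):
    """Render one URL to plain text with lynx (lynx must be installed)."""
    result = subprocess.run(
        ["lynx", "-dump", f"-width={width}", url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def append_fetches(url_file, book_file, fetch=lynx_dump):
    """Append every URL listed in url_file to book_file.

    Each fetch is preceded by a separator line and a URL header.
    Blank lines in url_file are skipped.
    """
    with open(url_file) as urls, open(book_file, "a") as book:
        for line in urls:
            url = line.strip()
            if not url:
                continue
            book.write(f"{SEPARATOR}\nURL: {url}\n{SEPARATOR}\n")
            book.write(fetch(url))
            book.write("\n")
```

The `fetch` argument exists so a different renderer, or a stub for testing, can be swapped in without touching the append logic.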

What I really need is to be able to get gmail without a graphical
browser. Can anyone help.

== Chris Glur.

Reply no.top.post (346) 1/9/2012 4:45:54 PM

On 2012-01-09, no.top.post@gmail.com <no.top.post@gmail.com> wrote:
> What I really need is to be able to get gmail without a graphical
> browser. Can anyone help.

Would a GUI mail client work?  There are lots of those, and
Slackware comes with Thunderbird and KMail.

Would a CLI mail client work?  If so see the following article:

http://lcorg.blogspot.com/search/label/All-Text%20Linux%20Workstation?updated-max=2009-03-11T17%3A37%3A00-04%3A00&max-results=20

I believe it has instructions for configuring mutt for GMail, although
it has been a long time since I've looked at it.  There's also alpine.
-- 
                                 Chick Tower

For e-mail:  aols2 DOT sent DOT towerboy AT xoxy DOT net
Reply c.tower (28) 1/10/2012 5:42:03 PM


no.top.post@gmail.com wrote:

> What I really need is to be able to get gmail without a graphical
> browser. Can anyone help.

Configure your gmail account to allow POP access then use a POP client
such as fetchmail to retrieve the mail.
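
As a sketch of what a POP client does under the hood, assuming POP access is already enabled on the account: Python's poplib can list subject lines using TOP, which fetches only the headers and so avoids downloading message bodies. The function names are hypothetical; pop.gmail.com on port 995 is Gmail's POP-over-SSL endpoint.

```python
import email
import poplib


def connect(user, password, host="pop.gmail.com", port=995):
    """Open an authenticated POP3-over-SSL connection."""
    conn = poplib.POP3_SSL(host, port)
    conn.user(user)
    conn.pass_(password)
    return conn


def list_subjects(conn):
    """Return the Subject of every message waiting on the server.

    conn is an authenticated poplib.POP3 (or compatible) connection.
    TOP with 0 body lines transfers the headers only.
    """
    count, _size = conn.stat()
    subjects = []
    for i in range(1, count + 1):
        _resp, lines, _octets = conn.top(i, 0)
        header = email.message_from_bytes(b"\r\n".join(lines))
        subjects.append(header.get("Subject", ""))
    return subjects

# Typical use (needs network access and real credentials):
#   conn = connect("someone@gmail.com", "password")
#   print(list_subjects(conn))
#   conn.quit()
```

For regular use a dedicated client such as fetchmail is the more practical route; this only shows that no browser is involved.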

Enabling POP:
  <http://support.google.com/mail/bin/answer.py?hl=en&answer=13273>

Configuring other mail clients:
  <http://support.google.com/mail/bin/answer.py?answer=13287>

[ Followup-To: set to comp.os.linux.misc ]
Reply news002 (121) 1/11/2012 12:26:03 PM

Dave Gibson wrote:
> no.top.post@gmail.com wrote:
> 
>> What I really need is to be able to get gmail without a graphical
>> browser. Can anyone help.
> 
> Configure your gmail account to allow POP access then use a POP client
> such as fetchmail to retrieve the mail.
> 
> Enabling POP:
>   <http://support.google.com/mail/bin/answer.py?hl=en&answer=13273>
> 
> Configuring other mail clients:
>   <http://support.google.com/mail/bin/answer.py?answer=13287>
> 
> [ Followup-To: set to comp.os.linux.misc ]
+1
Reply tnp (2273) 1/11/2012 12:44:49 PM

On 2012-01-11, Dave Gibson <dave.gma+news002@googlemail.com.invalid> wrote:
> no.top.post@gmail.com wrote:
>
>> What I really need is to be able to get gmail without a graphical
>> browser. Can anyone help.
>
> Configure your gmail account to allow POP access then use a POP client
> such as fetchmail to retrieve the mail.

Or just point any IMAP client at GMail's IMAP server.  I use mutt, but
Thunderbird et al work fine as well.  I leave all my mail on the
server and that way I can still access it from anywhere via a browser.
The "tags" all show up as IMAP folders and everything pretty much
"just works".

-- 
Grant Edwards               grant.b.edwards        Yow! I'm continually AMAZED
                                  at               at th'breathtaking effects
                              gmail.com            of WIND EROSION!!
Reply invalid171 (6610) 1/11/2012 2:49:24 PM

On 01/11/2012 07:26 AM, Dave Gibson wrote:
> no.top.post@gmail.com wrote:
>
>> What I really need is to be able to get gmail without a graphical
>> browser. Can anyone help.
>
> Configure your gmail account to allow POP access then use a POP client
> such as fetchmail to retrieve the mail.
>
> Enabling POP:
>    <http://support.google.com/mail/bin/answer.py?hl=en&answer=13273>
>
> Configuring other mail clients:
>    <http://support.google.com/mail/bin/answer.py?answer=13287>
>
> [ Followup-To: set to comp.os.linux.misc ]

Been doing that with Thunderbird for years. Works great.

TJ
Reply TJ70 (53) 1/11/2012 5:24:59 PM
