HTML and PDF for reading large documents (bookmarks)

  When HTML was new, I thought that I should prefer it over
  PS/PDF for my own internal usage, because HTML was "open" and
  text-file based, while PS/PDF was "closed" or at least it felt
  this way because it was binary-file based and came from a
  single company.

  But today I observe that I convert more and more larger
  documents into PDF for my own convenience!

  Why?

  A minor reason was that I observed that one browser was
  very slow when it had to display large HTML pages.

  But the major point is, when I read large documents,
  I want to take a note of some bookmark within the document
  to continue to read there after an interruption.

  When a document is split into pages, this is easy. I just
  write down the /page number/. (I do not want to rely on 
  in-program bookmarking tools, because I might continue to
  read the same file with a different reader later.)

  But when I have a large HTML file, it is impossible for 
  me to find any reference that I could write down as an
  indication of where exactly in this large document I left
  off so that I then can continue to read it at the same 
  position later (possibly even with another browser).

  Actually this is not a deficiency of HTML, but of browsers.
  A browser simply could display that the first character
  fully displayed on the screen is at "position 23.3111841427 %"
  of the document (the percent indication should refer to the
  HTML source code so as to be independent of display settings
  and CSS). But browsers usually do /not/ do this.
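A rough sketch of the proposed position indicator. It assumes one already knows the character offset, in the HTML source, of the first fully visible character. No browser API is implied, and the function names are hypothetical. Because the percentage is taken over the source text, it does not depend on font size, window width or CSS.

```python
def source_percent(html_source: str, offset: int) -> float:
    """Return a source offset as a percentage of the whole HTML source."""
    if not html_source:
        raise ValueError("empty document")
    if not 0 <= offset <= len(html_source):
        raise ValueError("offset outside the document")
    return 100.0 * offset / len(html_source)


def offset_from_percent(html_source: str, percent: float) -> int:
    """Map a noted percentage back to the nearest offset in the same source."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return round(len(html_source) * percent / 100.0)


# Hypothetical usage: the text at the top of the screen is "Chapter 7".
page = "<html><body>" + "<p>filler</p>" * 100 + "<p>Chapter 7</p></body></html>"
mark = source_percent(page, page.find("Chapter 7"))
print(f"{mark:.4f} %")
```

The ten decimal places in "23.3111841427 %" are more than needed. For a source of 1,000,000 characters, 0.0001 % already corresponds to a single character.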

  Actually HTML makes /more/ sense when one is reading on a
  screen, because an artificial pagination has no use on a
  screen, where one is scrolling the document. But a position
  indicator would still help.

  So it seems that browser manufacturers do not think of 
  people who want to use an HTML browser to read a large
  file in more than one session and thus need a way to 
  later find where they left off in the previous session!

Newsgroups: comp.infosystems.www.authoring.html,comp.infosystems.www.browsers.misc

ram
5/12/2016 10:02:40 AM

In article <HTML-20160512103507@ram.dialup.fu-berlin.de>,
 ram@zedat.fu-berlin.de (Stefan Ram) wrote:
....

> So it seems that browser manufacturers do not think of 
>   people who want to use an HTML browser to read a large
>   file in more than one session and thus need a way to 
>   later find where they left off in the previous session!
 
I am reminded of the convenience of a Kindle, where one is always 
returned after an interruption reading a book to where one left off. 
Even when changing a device on which you are reading (like an iPad 
with a Kindle app), if you are on wifi, you will be invited to go to 
where you last read. All this is coordinated at remote servers.

Browsers are usually not used in the way you use them. They might 
acquire the capabilities if it was more needed. Certainly an *author*, 
independently of a browser manufacturer, could put in facilities to 
make this easier for a reader, in all sorts of ways.

In the meantime, how about simply copying the phrase or sentence you
are up to? Next time you return to the *very long* document, you
search for that phrase or sentence. This is where you, not the browser
or the author, are in control.
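The phrase-search approach works best when the copied phrase occurs only once. The sketch below assumes the document's text is available as a string, and the function names are hypothetical. It lists every place a saved phrase occurs, so one can check that the phrase is unique before relying on it.

```python
def phrase_positions(text: str, phrase: str) -> list[int]:
    """Return the start offset of every occurrence of phrase in text."""
    if not phrase:
        raise ValueError("phrase must not be empty")
    positions = []
    start = text.find(phrase)
    while start != -1:
        positions.append(start)
        # Searching from start + 1 also reports overlapping occurrences.
        start = text.find(phrase, start + 1)
    return positions


def is_good_bookmark(text: str, phrase: str) -> bool:
    """A phrase is a reliable bookmark only if it occurs exactly once."""
    return len(phrase_positions(text, phrase)) == 1
```

If the check fails, copying a few more words usually makes the phrase unique.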

-- 
dorayme
dorayme
5/12/2016 10:58:32 AM
On 5/12/2016 3:02 AM, Stefan Ram wrote:
>   So it seems that browser manufacturers do not think of 
>   people who want to use an HTML browser to read a large
>   file in more than one session and thus need a way to 
>   later find where they left off in the previous session!

I often read very large Web pages.  If I need to interrupt, I open a new
plain-text file (.txt on Windows).  I copy the URI from my browser and
paste it into the file.  Then I copy part of a line of text and paste
that into the file.  I save the file, using the title of the Web page as
the file name.  Then I can terminate my browser and shut down my PC.
The next day, I can return directly to where I left off.
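The note-file routine just described is easy to script. This is a minimal sketch that assumes the URI, the copied text and the page title are supplied by hand, and the helper names are hypothetical. Characters that Windows does not allow in file names are replaced by underscores.

```python
import re
import tempfile
from pathlib import Path

# Characters not permitted in Windows file names.
FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def save_reading_note(folder, page_title, uri, snippet):
    """Write the page URI and a copied line of text to '<title>.txt'."""
    name = FORBIDDEN.sub("_", page_title).strip() or "untitled"
    path = Path(folder) / f"{name}.txt"
    path.write_text(f"{uri}\n{snippet}\n", encoding="utf-8")
    return path


def load_reading_note(path):
    """Return (uri, snippet) from a note written by save_reading_note."""
    uri, snippet = Path(path).read_text(encoding="utf-8").splitlines()[:2]
    return uri, snippet


# Usage: write a note into a temporary folder and read it back.
with tempfile.TemporaryDirectory() as folder:
    note = save_reading_note(folder, "HTML and PDF: bookmarks",
                             "http://example.com/long.html",
                             "But the major point is")
    print(note.name, load_reading_note(note))
```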

-- 
David E. Ross
<http://www.rossde.com/>.

Donald Trump claims everyone likes him.  Does that
include his ex-wives?  How about the students who
discovered that their education at Trump University
was worthless?
David
5/12/2016 2:39:26 PM
12.5.2016, 13:02, Stefan Ram wrote:

>   When HTML was new, I thought that I should prefer it over
>   PS/PDF for my own internal usage, because HTML was "open" and
>   text-file based, while PS/PDF was "closed"

If you are referring to documents for your private use, or for internal 
use in a company, then the topic is, strictly speaking, off-topic in all 
comp.infosystems.www groups. However, over the years, non-WWW use of WWW 
technologies has been discussed in these groups; but care should be 
taken to distinguish WWW use from non-WWW use when relevant.

>   But today I observe that I convert more and more larger
>   documents into PDF for my own convenience!

Convert what type(s) of documents?

>   A minor reason was that I observed that one browser was
>   very slow when it had to display large HTML pages.

It depends. If you use very old-style table layout for a large document, 
then you have that problem. For a simple-structure HTML document, there 
is no reason a browser cannot start rendering its content fast. Setting 
dimensions for images may speed things up.

>   But the major point is, when I read large documents,
>   I want to take a note of some bookmark within the document
>   to continue to read there after an interruption.

This is a user agent issue. There is nothing in HTML that requires or 
forbids such behavior.

>   Actually this is not a deficiency of HTML, but of browsers.

Indeed.

On the practical side, you might consider making your document (if you 
convert it anyway, from some format) an e-book in the EPUB format, which 
is really just a zipped file containing an XHTML document and associated 
files. This is often very easy when using the free Calibre software. 
E-book readers typically have features like "remembering" your location, 
or at least marking a location.
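The sketch below illustrates that an EPUB is a zip archive. It uses Python's zipfile module to pack files into the EPUB container layout: a `mimetype` entry first and uncompressed, then `META-INF/container.xml` pointing at the package file. It also lists the (X)HTML files inside such an archive.

The path `OEBPS/content.opf` and the helper names are assumptions. The output is not a complete EPUB. A valid book also needs a proper OPF package document (and, for EPUB 3, a navigation document), which tools like Calibre generate.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"

# Tells e-book readers where the package document is; the path is an assumption.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_container(files):
    """Zip {archive_name: text} into the EPUB container layout; returns bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must be first and stored without compression.
        zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        for name, text in files.items():
            zf.writestr(name, text)
    return buffer.getvalue()


def epub_summary(data):
    """Return (mimetype_ok, html_members) for the bytes of an EPUB file."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        first = infos[0] if infos else None
        mimetype_ok = (
            first is not None
            and first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and zf.read(first) == EPUB_MIMETYPE
        )
        pages = [i.filename for i in infos
                 if i.filename.endswith((".xhtml", ".html"))]
    return mimetype_ok, pages
```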

I am currently working on a document that might eventually become an 
e-book distributed commercially, or a free e-book, or just a web page. 
Using Calibre and the EPUB format keeps all options open. (Although 
there is no benefit from using XHTML syntax for web pages, there are no 
real drawbacks either. For the EPUB format, XHTML syntax must be used, 
but that's almost a triviality when using suitable software.)

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Jukka
5/12/2016 7:27:30 PM
"Jukka K. Korpela" <jkorpela@cs.tut.fi> writes:
>>But today I observe that I convert more and more larger
>>documents into PDF for my own convenience!
>Convert what type(s) of documents?

  For example, someone has published 10 HTML documents,
  each of which would have about 10 pages when printed,
  and I want to read those with a handheld device that
  I use like an ebook reader. So, I manually join the
  10 pages to a single file and then convert this to PDF
  and copy this single PDF file to the "ebook reader" device.
  Not only can I now write down which page I should read
  next, but it is also easier to handle as one large file
  compared with 10 small files.
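The manual joining step could be automated along these lines. This rough sketch assumes each input is a complete HTML document with a single body element.

It uses a regular expression rather than a real HTML parser, so it will not cope with every page. Relative links, images and duplicate id values across the joined pages are not handled. The function names are hypothetical. The resulting file would still have to be converted to PDF with whatever tool one normally uses.

```python
import html
import re

# Naive match of the body content; not a substitute for a real HTML parser.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def body_of(page_source):
    """Return the content of the <body> element, or the whole text if none."""
    match = BODY_RE.search(page_source)
    return match.group(1) if match else page_source


def join_pages(title, pages):
    """Concatenate the bodies of several HTML pages into one document."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
    ]
    for page in pages:
        # One <section> per original document, in the given order.
        parts.append(f"<section>{body_of(page)}</section>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Reading the ten files from disk in the right order, and writing the result, are left out here.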

>On the practical side, you might consider making your document (if you 
>convert it anyway, from some format) an e-book in the EPUB format, which 
>is really just a zipped file containing an XHTML document and associated 
>files. This is often very easy when using the free Calibre software. 
>E-book readers typically have features like "remembering" your location, 
>or at least marking a location.

  I tried this, but I did not like it, because the 
  pagination of an EPUB is not fixed. Changing the
  font size or even just the device orientation or
  the reader program might give some point in the
  document a new page number. And last time I tried,
  it was even more difficult to convert an EPUB to
  PDF than to convert a plain (single large) HTML
  document.

  Another problem with using HTML for the kind of 
  documents that I write is that browser manufacturers
  - after a phase of increasing support for MathML - 
  now again seem to be decreasing MathML support.
  This and the whole XHTML/XHTML2/HTML5/HTML5.1 hassle
  seems to indicate that "HTML" is not really stable today.
  When I publish EPUB with MathML today, do I know how
  many readers will be able to display this today or 
  tomorrow?

>I am currently working on a document that might eventually become an 
>e-book distributed commercially, or a free e-book, or just a web page. 

  I am offering my course notes on web pages. A course
  page has links to dozens of lesson pages. But that's it.
  No deeper nesting is involved. There are no sublesson pages.

  The participants of my courses often are annoyed by this,
  slightly annoyed, that is. They would prefer to have it all
  within a single large file and sometimes manually copy each 
  lesson into a Word document. I can understand them. I would love
  to offer the course notes as a single file myself, but I did not
  yet have the time to prepare this, because - of course - I do
  not want to do this /once/, but I need to establish an /automatic
  process/ that can generate all versions of a document (multi-page
  and single-page) from a single source, so that changes to the
  single source propagate to all document versions without manual
  interaction.

  Since this post still is crossposted into the (nearly empty)
  browser newsgroup:

  The web browser "Amaya" has a menu entry "make book" (or some such)
  that reportedly can combine several HTML pages into a single
  document. I tried it and even changed all my links to the kind
  of links that were prescribed by Amaya, but it did not work.

  I can still find this today on my web pages:

<a class="hrl" rel="subdocument" 
href="http://www.purl.org/stefan_ram/pub/java_methodendeklaration_de">

  ; I inserted the rel="subdocument" above specifically for
  the "make book" function of Amaya, but it did not work!
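For anyone attempting the same thing without Amaya, here is a sketch using Python's standard html.parser. It collects, in document order, the targets of links marked rel="subdocument", as in the example above. The class and function names are hypothetical. The sketch only finds the links; fetching and joining the pages is a separate step.

```python
from html.parser import HTMLParser


class SubdocumentLinks(HTMLParser):
    """Collect href values of <a> elements whose rel includes 'subdocument'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; a bare attribute is None.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "subdocument" in rel_values and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def subdocument_links(page_source):
    """Return the subdocument link targets of a course page, in order."""
    parser = SubdocumentLinks()
    parser.feed(page_source)
    parser.close()
    return parser.hrefs
```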

Newsgroups: comp.infosystems.www.authoring.html,comp.infosystems.www.browsers.misc
Followup-To: comp.infosystems.www.authoring.html

ram
5/12/2016 8:08:00 PM
On 12 May 2016 10:02:40 GMT, Stefan Ram wrote:
> But when I have a large HTML file, it is impossible for 
> me to find any reference that I could write down as an
> indication of where exactly in this large document I left
> off so that I then can continue to read it at the same 
> position later (possibly even with another browser).
> 
> Actually this is not a deficiency of HTML, but of browsers.

You're straining to find a problem, and overlooking at least two 
solutions (often three) already available to you.

1. Use a word or phrase that's unique or nearly so, then when you 
return to the document press Ctrl+F to find the word or phrase. 

2. Or just notice about what percentage of the document you've 
scrolled down.

3. Authors should give id= attributes on all section headers, for 
this purpose among others. In Firefox, you can highlight an element 
and then right-click and Inspect Element to see it. I assume most 
other browsers have something similar. This isn't guaranteed, but 
it's the easiest when the author has done his or her job, because you 
can include the value of the id attribute right in your bookmark.
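Inspect Element is the manual route to point 3. For a saved copy of a page, a script can list the same anchors. This is a sketch using Python's standard library. It assumes the author has put id attributes on the h1 to h6 headings, and the names are hypothetical. It turns each heading id into a "page#id" URL that can be kept as a bookmark.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urldefrag

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingIdCollector(HTMLParser):
    """Collect the id attributes of h1-h6 elements in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag in HEADING_TAGS:
            value = dict(attrs).get("id")
            if value:
                self.ids.append(value)


def heading_bookmarks(page_url, page_source):
    """Return one 'page#id' URL per heading that carries an id."""
    collector = HeadingIdCollector()
    collector.feed(page_source)
    collector.close()
    # Drop any fragment already present in the page URL.
    base, _old_fragment = urldefrag(page_url)
    return [f"{base}#{quote(value, safe='')}" for value in collector.ids]
```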

-- 
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
                                       http://BrownMath.com/
                                  http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator:      http://validator.w3.org/
CSS 2.1 spec:   http://www.w3.org/TR/CSS21/
validator:      http://jigsaw.w3.org/css-validator/
Why We Won't Help You: http://preview.tinyurl.com/WhyWont
Stan
5/13/2016 10:25:24 AM
Stefan Ram wrote:

>   I am offering my course notes on web pages. A course
>   page has links to dozens of lesson pages. But that's it.
>   No deeper nesting is involved. There are no sublesson pages.
> 
>   The participants of my courses often are annoyed by this,
>   slightly annoyed, that is. They would prefer to have it all
>   within a single large file and sometimes manually copy each
>   lesson into a Word document. I can understand them. I would love
>   to offer the course notes as a single file myself, but I did not
>   yet have the time to prepare this, because - of course - I do
>   not want to do this /once/, but I need to establish an /automatic
>   process/ that can generate all versions of a document (multi-page
>   and single-page) from a single source, so that changes to the
>   single source propagate to all document versions without manual
>   interaction.

You can use XML for the content, and XSLT to extract from it and present 
that in any format that you like.
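The suggestion above is XSLT. As a rough illustration of the same single-source idea, here is a sketch that uses Python's xml.etree.ElementTree instead of an XSLT processor. The source vocabulary (a course element containing lesson elements with id and title attributes) is an invented assumption, and every lesson is assumed to have an id. The output is deliberately minimal: one file per lesson plus an index page, or one file with everything.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical single source for a course.
EXAMPLE_SOURCE = """<course title="Java">
<lesson id="methods" title="Methods"><p>A method.</p></lesson>
<lesson id="fields" title="Fields"><p>A field.</p></lesson>
</course>"""


def page(title, body):
    """Wrap an HTML fragment in a minimal HTML document."""
    t = html.escape(title)
    return (f'<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>{t}</title>'
            f"</head><body>\n{body}\n</body></html>")


def render_lesson(lesson):
    """Serialize one <lesson> element as an HTML <section>."""
    inner = html.escape(lesson.text or "", quote=False)
    inner += "".join(ET.tostring(child, encoding="unicode") for child in lesson)
    lesson_id = html.escape(lesson.get("id", ""))
    heading = html.escape(lesson.get("title", ""))
    return f'<section id="{lesson_id}"><h2>{heading}</h2>{inner}</section>'


def single_page(course):
    """All lessons in one document."""
    sections = "\n".join(render_lesson(l) for l in course.findall("lesson"))
    return page(course.get("title", ""), sections)


def multi_page(course):
    """One document per lesson plus an index; returns {filename: html}."""
    pages, items = {}, []
    for lesson in course.findall("lesson"):
        name = lesson.get("id") + ".html"
        title = lesson.get("title", "")
        pages[name] = page(title, render_lesson(lesson))
        items.append(f'<li><a href="{html.escape(name)}">{html.escape(title)}</a></li>')
    pages["index.html"] = page(course.get("title", ""),
                               "<ul>\n" + "\n".join(items) + "\n</ul>")
    return pages


course = ET.fromstring(EXAMPLE_SOURCE)
print(sorted(multi_page(course)))
```

Because both versions are generated from the same tree, a change to the source propagates to both on the next run.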

F'up2 <news:comp.infosystems.www.authoring.misc>

-- 
When all you know is jQuery, every problem looks $(olvable).

Thomas
5/13/2016 10:46:06 PM
12.5.2016, 23:08, Stefan Ram wrote:

>   For example, someone has published 10 HTML documents,
>   each of which would have about 10 pages when printed,
>   and I want to read those with a handheld device that
>   I use like an ebook reader. So, I manually join the
>   10 pages to a single file and then convert this to PDF
>   and copy this single PDF file to the "ebook reader" device.

Wouldn't it be more natural to create an EPUB ebook from them and use 
any normal ebook reader on any device?

>> On the practical side, you might consider making your document (if you
>> convert it anyway, from some format) an e-book in the EPUB format, which
>> is really just a zipped file containing an XHTML document and associated
>> files. This is often very easy when using the free Calibre software.
>> E-book readers typically have features like ''remembering'' your location,
>> or at least marking a location.
>
>   I tried this, but I did not like it, because the
>   pagination of an EPUB is not fixed.

Just like pagination of an HTML document is not fixed. This should be 
regarded as a useful property, not a problem.

>   Changing the
>   font size or even just the device orientation or
>   the reader program might give some point in the
>   document a new page number.

So? What do you need page numbers for? Do you also need line numbers and 
numbers of characters on a line?

>   Another problem with using HTML for the kind of
>   documents that I write is that browser manufacturers
>   - after a phase of increasing support for MathML -
>   now again seem to be decreasing MathML support.

I'm not sure I see what you are talking about. First you mentioned that 
you want to read some material that someone else has produced as a set 
of HTML documents. Now you seem to be discussing something completely 
different, namely authoring documents with mathematical content. The 
answers to those complicated questions depend, among other things, on the 
intended use of such documents. For such documents, EPUB format (as 
currently defined and supported) is inadequate; it can reasonably handle 
only rather simple math expressions. HTML with MathML is currently also 
limited in practice, though I cannot see what you mean by decreasing 
support. But tools like MathJax can produce very satisfactory results 
for online use. For offline use, PDF appears to be the only feasible 
solution, unless the users can be expected to have Microsoft Word 2007 
or newer.

>   I am offering my course notes on web pages.

Well, maybe you should have started by saying this and illustrated it 
with a URL. It seems that your real problem is how to produce some 
material in different formats for different types of use, automatically 
generating them from some base format. This is a very broad question and 
can hardly be discussed in a useful way without knowing much more about 
the type of content, intended uses, etc. (Well, it *could* be discussed 
at a general level, but that would mean something like writing a 
voluminous book.)

>   Since this post still is crossposted into the (nearly empty)
>   browser newsgroup:

I don't see why it was included initially, but thought you might have a 
reason.

>   The web browser "Amaya" has a menu entry "make book" (or some such)
>   that reportedly can combine several HTML pages into a single
>   document. I tried it and even changed all my links to the kind
>   of links that were prescribed by Amaya, but it did not work.

Last time I seriously tried Amaya (was it ten years ago?), it looked like 
experimental software created for testing something or proving some 
point, rather than software for production work. I have not heard any 
news that would give a reason to give it another chance.

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Jukka
5/16/2016 4:58:50 AM
"Jukka K. Korpela" <jkorpela@cs.tut.fi> writes:
>So? What do you need page numbers for? Do you also need line numbers and 
>numbers of characters on a line?

  When I have read only a part of a document, I would like to
  write down the number of the page where I should continue to
  read it later. Line numbers are not needed for this.

>I'm not sure I see what you are talking about. First you
>mentioned that you want to read some material that someone
>else has produced as a set of HTML documents.

  The topic intended was to compare HTML with PDF.
  This comparison then might include the viewpoint of a
  consumer as well as the point-of-view of a producer.
  When I need to choose a format as a producer, I might
  ask myself, "which format would I prefer as a consumer?".

>HTML with MathML is currently also limited in practice,
>though I cannot see what you mean by decreasing support  

  I remember having read something about Chrome removing
  support for MathML to make the browser faster.

ram
5/16/2016 5:52:43 PM
16.5.2016, 20:52, Stefan Ram wrote:

> "Jukka K. Korpela" <jkorpela@cs.tut.fi> writes:
>> So? What do you need page numbers for? Do you also need line numbers and
>> numbers of characters on a line?
>
>   When I have read only a part of a document, I would like to
>   write down the number of the page where I should continue to
>   read it later. Line numbers are not needed for this.

What you are really talking about is setting up a location in a document 
so that you continue from it. Page numbers are a blunt tool here. Page 
*and* line numbers would be much more specific, but unnecessarily technical.

This is not about data formats (HTML vs. PDF), but about software used 
to display data. E-book readers generally let you mark the current 
location (or do it automatically); web browsers don’t. Using EPUB, which 
is really just XHTML bundled with associated image, style, and other 
files, you make your data accessible on e-book readers.

>   The topic intended was to compare HTML with PDF.

Was it? Are data formats the issue? It sounds like your issue is with 
software.

>   I remember having read something about Chrome removing
>   support for MathML to make the browser faster.

I don’t. And I think it would be odd indeed. Code that is not executed 
can hardly affect the speed. Code for processing MathML is relevant only 
when the document contains MathML. Removing support for MathML would 
admittedly affect the *size* of browser code; this might have been 
important in the early 1990s. ☺

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Jukka
5/16/2016 6:13:45 PM