The Nature of Hypertext
Peter Murray-Rust
p.murray-rust at mail.cryst.bbk.ac.uk
Wed Dec 6 09:05:01 GMT 1995
On Tue, 5 Dec 1995 bruno at chemcrys.cam.ac.uk wrote:
[...]
> possible to conceive an automatic method of generating hypertext
> documents from 'straight' text. I have also had the dubious honour of
> attempting to convert a Masters dissertation into hypertext (as part of
> one of the aforementioned studies). It isn't always easy especially if
> you are not allowed to rearrange the text in any way.
There are two distinct issues here: markup and hypertext. I interpret
hypertext to mean a document whose structure is enhanced (hopefully) by
the addition of links from one part of the document to another. These
links are additional to the normal structuring tools we learn when we
read conventional books - so that an index is not regarded as hypertext,
though logically it is.
I agree with Henry that Bush is normally credited with hypertext, though
I think that the work of Diderot and the encyclopaedists is one of the
major epics in the globalisation of knowledge. The recent use of the term has
stressed this globalisation - i.e. that the hyperdocument can be
distributed over many physical entities.
If HTML only contained links (<A HREF=to>, <A NAME=from>) it would be
pure hypertext, but it contains some minimal markup as well. Apart from
the formatting, HTML 2.0 marks up:
  - data containers (UL and OL)
  - document structuring (H1...H6). They didn't get this right!
  - TITLE and ADDRESS
  - IMG
This markup defines these elements (sic) as having a particular role in
the document, and it is legitimate to use them for searching, indexing,
restructuring, etc - though this is rarely possible with the present
diversity of authoring tools.
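
As a rough illustration of using such markup for indexing, here is a
minimal sketch in Python. It assumes the standard html.parser module; the
class name OutlineParser, the function extract_outline and the sample
document are hypothetical and are not taken from any existing tool.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect TITLE and H1..H6 text so a document can be indexed by role."""

    STRUCTURAL = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []     # (tag, text) pairs in document order
        self._current = None  # structural tag currently open, if any
        self._buffer = []     # text collected inside that tag

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag in self.STRUCTURAL:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.outline.append((tag, "".join(self._buffer).strip()))
            self._current = None


def extract_outline(markup):
    """Return the structural elements of an HTML string as (tag, text) pairs."""
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return parser.outline


# Hypothetical sample document.
SAMPLE = "<TITLE>Aspirin</TITLE><H1>Structure</H1><P>Some text<H2>Bond lengths</H2>"

for tag, text in extract_outline(SAMPLE):
    print(tag, text)
```

This only works if authors use H1...H6 for structure rather than for font
size, which is the caveat about authoring tools made above.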
> One of the ideas of hypertext is that the concept of a page is done away
> with. I still tend to view hypertext documents in terms of pages;
> I wonder if others do.
The terminology is common and IMO quite useful. However, there is a big
contrast between the supporters of what I call CONTENT and FORM. Form is
(at present) the most highly desired: can X send Y a 'page' that looks
exactly how X wants it? This is where CENTER, BLINK, etc raise such
passions. Almost all discussion on the HTML-WG is about form.
It is content that concerns me more. In chemistry I believe it matters
critically that information is transmitted accurately, and that its
display is (relatively) less important. HTML is very forgiving about
variations in syntax, so, for example:
Please send <CURRENCY COUNTRY=CANADA>10 dollars
will be rendered (without comment) by all browsers simply by omitting the
tag:
Please send 10 dollars.
This will not do for chemistry!
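
The difference between the two behaviours can be sketched in Python. This
is only an illustration under stated assumptions: it uses the standard
html.parser module, KNOWN_TAGS is an illustrative subset rather than the
full HTML 2.0 element list, and the names ForgivingRenderer, StrictChecker,
render_forgiving and find_unknown_tags are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative subset only, NOT the complete HTML 2.0 element list.
KNOWN_TAGS = {"a", "p", "ul", "ol", "li", "title", "address", "img",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class ForgivingRenderer(HTMLParser):
    """Mimic a lenient browser: keep the text, silently ignore every tag."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)


class StrictChecker(HTMLParser):
    """Record every start tag that is not in KNOWN_TAGS, with its attributes."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            self.unknown.append((tag, dict(attrs)))


def render_forgiving(markup):
    """Return the text a tag-dropping renderer would display."""
    renderer = ForgivingRenderer()
    renderer.feed(markup)
    renderer.close()  # flush any text still buffered by the parser
    return "".join(renderer.text)


def find_unknown_tags(markup):
    """Return the unrecognised tags so the caller can refuse the document."""
    checker = StrictChecker()
    checker.feed(markup)
    checker.close()
    return checker.unknown


MESSAGE = "Please send <CURRENCY COUNTRY=CANADA>10 dollars"

print(render_forgiving(MESSAGE))
print(find_unknown_tags(MESSAGE))
```

The forgiving path loses the fact that the dollars are Canadian; the strict
path at least lets the receiving program notice that it did not understand
something.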
I have addressed this in Chemical Markup Language (CML) which is now at:
http://www.dl.ac.uk/CBMT/cml/
CML concentrates on information structure and content and very little on
form. I shall be adding more discussion at that site of the flavour of
this posting.
If you are converting *.txt to *.html you need to ask yourself WHY? If
it's simply for formatting so that it's nicer to look at when downloaded,
then a trivial tool will do. If, however, you want an INDEX or other
markup and the author hasn't included that, it's an expensive operation
and there are not many shortcuts. If you have the source (e.g. LaTeX or
Word), there are tools to convert it to semantically void HTML.
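
For the formatting-only case, a "trivial tool" might look like the
following Python sketch. It assumes paragraphs are separated by blank
lines; the function name text_to_html and its title parameter are
hypothetical. Note that it adds no index or semantic markup at all, which
is exactly the expensive part.

```python
import html


def text_to_html(text, title="Untitled"):
    """Wrap blank-line-separated paragraphs of plain text in <P> elements."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(
        # Collapse internal line breaks and escape <, > and & so the text
        # cannot be mistaken for markup.
        "<P>{}</P>".format(html.escape(" ".join(p.split())))
        for p in paragraphs
    )
    return "<TITLE>{}</TITLE>\n{}".format(html.escape(title), body)


print(text_to_html("a < b\n\nsecond\nline", title="Example"))
```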
>
> I often find something on the WWW that I want to print off and read
> away from the computer. Sometimes this isn't easy, particularly if a document
> is spread across more than one HTML file (as sometimes happens with papers
> presented at electronic conferences).
I agree. When I want people to download something (as for Chemical
Markup Language) I include a *.tar.gz for the appropriate part of the
distribution.
>
> These issues possibly apply more to resources such as journals than they do
> to some other applications, but:
They apply across the board! It's a culture change that we have to
make. It will take at least half a generation.
>
> How 'ready' is the scientific community to change the way it approaches
> 'written' texts?
This depends (IMO) on our education. Books (as opposed to scrolls, clay
tablets) have been in common use for ca. 500 years and a large part of
our education is given to teaching people how to use them. Teenagers are
now much more familiar with electronic metaphors (through keyboards,
screens, etc). Mine read much less paper than we used to. They love the
WWW.
>
> How prepared is the scientific community to glean information from
> a computer screen and not worry about having a hard copy?
>
It depends on what they want to do with it. There are still several
things we can't do on screen (annotation is one, reading in the bath
another). But who uses the CSD printed books for searches if they have
(free) access to an on line version?
> To what extent are these barriers to the promotion of WWW resources?
There are many things that may/will happen outside our community (better
screens, new metaphors), but WE must concentrate urgently on getting our
discipline-specific information in order. This will take 10-20 years.
>
> Are there issues relating to the design of HTML documents that we need to
> consider in relation to this, be it in the conversion of existing
> 'straight' texts or in the design of HTML documents from scratch?
Yes. At present badly thought out hypertext is a nightmare. IMO it
works best when hypertext maps well onto conventional paper structures. I
have tried to come up with some archetypes and have generalised this to
four:
- serial book (e.g. detective novel). Read from page 1 to page 200
- dictionary (phone book, CSD, Swissprot). Locate a precise
  chunk of information by (alphabetical) index
- tree (technical manual e.g. brakes, engine, lights can all be
  read independently)
- anthology (literature, or journal, where items are distinct but
  have a common theme)
The electronic era has added the 'grep' - i.e. searching unstructured
documents.
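
To make the 'grep' archetype concrete, here is a minimal Python sketch of
searching unstructured text line by line. The function name grep, its
arguments and the sample documents are hypothetical; it uses only the
standard re module and makes no use of any markup in the documents.

```python
import re


def grep(pattern, documents):
    """Return (name, line number, line) for every line matching pattern.

    documents is a mapping from a document name to its plain text.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, text in documents.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, lineno, line))
    return hits


# Hypothetical sample documents.
DOCS = {
    "notes.txt": "Aspirin is an ester.\nIt is also an acid.",
    "memo.txt": "Order more aspirin.",
}

for hit in grep("aspirin", DOCS):
    print(hit)
```

Unlike the dictionary archetype, nothing here depends on the author having
supplied an index; the price is that the search knows nothing about the
role of the text it finds.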
I'd be very interested to know whether other people have additions
to this list. Are there any new ones which have arisen over the last
year or two?
I've just thought of a fifth: the map.
P.
BTW Chemical Markup Language is now in a reasonable state to look at.
There is a browser which can be compiled under UNIX (and we are trying
for PC and Mac). There are many examples, including molecular data files
and I am writing extensive documentation. Feedback will be most valuable.
Peter Murray-Rust, Glaxo Research & Dev. (pmr1716 at ggr.co.uk); (BioMOO: PeterMR)
Birkbeck College, ubcg09q at cryst.bbk.ac.uk, CBMT/Daresbury mbglx at seqnet.dl.ac.uk
http://www.cryst.bbk.ac.uk/PPS/index.html, http://www.dl.ac.uk/CBMT/HOME.html
-----
chemweb: A list for Chemical Applications of the Internet.
To unsubscribe, send to listserver at ic.ac.uk the following message;
unsubscribe chemweb
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)