The Nature of Hypertext

Peter Murray-Rust p.murray-rust at mail.cryst.bbk.ac.uk
Wed Dec 6 09:05:01 GMT 1995


On Tue, 5 Dec 1995 bruno at chemcrys.cam.ac.uk wrote:

[...]
> possible to conceive an automatic method of generating hypertext
> documents from 'straight' text. I have also had the dubious honour of 
> attempting to convert a Masters dissertation into hypertext (as part of 
> one of the aforementioned studies). It isn't always easy especially if
> you are not allowed to rearrange the text in any way. 

There are two distinct issues here: markup and hypertext.  I interpret 
hypertext to mean a document whose structure is enhanced (hopefully) by 
the addition of links from one part of the document to another.  These 
links are additional to the normal structuring tools we learn when we 
read conventional books - so that an index is not regarded as hypertext, 
though logically it is.  

Like Henry, I agree that Bush is normally credited with hypertext, though 
I think that Diderot and the encyclopaedists mark one of the major epochs 
in the globalisation of knowledge.  The recent use of the term has 
stressed this globalisation - i.e. that the hyperdocument can be 
distributed over many physical entities.

If HTML only contained links (<A HREF=to>, <A NAME=from>) it would be 
pure hypertext, but it contains some minimal markup as well.  Apart from 
the formatting, HTML 2.0 marks up:
	data containers (UL and OL)
	document structuring (H1...H6). They didn't get this right!
	TITLE and ADDRESS
	IMG
This markup defines these elements (sic) as having a particular role in 
the document, and it is legitimate to use them for searching, indexing, 
restructuring, etc - though this is rarely possible with the present 
diversity of authoring tools.
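
As a rough illustration of using this structural markup for indexing, here 
is a minimal Python sketch (not part of HTML 2.0, CML or any existing 
tool) that collects the H1...H6 headings of a document into an outline. 
The names `outline` and `OutlineParser` and the sample text are 
hypothetical; it assumes headings are not nested inside one another.

```python
from html.parser import HTMLParser

# Heading element names mapped to their outline level.
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class OutlineParser(HTMLParser):
    """Collect the text of H1...H6 elements, in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []   # list of (level, heading text)
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <H2> arrives as "h2".
        if tag in HEADING_LEVELS:
            self._level, self._text = HEADING_LEVELS[tag], []

    def handle_data(self, data):
        # Only text inside an open heading is kept; body text is ignored.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self._level is not None:
            self.entries.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html_text):
    """Return [(level, text), ...] for every heading in html_text."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.entries


if __name__ == "__main__":
    sample = ("<H1>Introduction</H1><P>Some text.\n"
              "<H2>Markup</H2><P>More text.\n"
              "<H2><A NAME=links>Hypertext</A></H2>")
    for level, text in outline(sample):
        print("  " * (level - 1) + text)
```

This only works as well as the author's use of H1...H6: if headings are 
chosen for their font size rather than their role, the outline is wrong.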
 
> > One of the ideas of hypertext is that the concept of a page is done away
> with. I still tend to view hypertext documents in terms of pages; 
> I wonder if others do. 

The terminology is common and IMO quite useful.  However there is a big 
contrast between the supporters of what I call CONTENT and FORM.  Form is 
(at present) the most highly desired: can X send Y a 'page' that looks 
exactly as X wants it?  This is where CENTER, BLINK, etc. raise such 
passions.  Almost all discussion on the HTML-WG is about form.

It is content that concerns me more.  In chemistry I believe it matters 
critically that information is transmitted accurately, and that its 
display is (relatively) less important.  HTML is very forgiving about 
variations in syntax, so, for example:
	Please send <CURRENCY COUNTRY=CANADA>10 dollars
will be rendered (without comment) by all browsers simply by omitting the 
tag:
	Please send 10 dollars.
This will not do for chemistry!
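
To make the point concrete, here is a minimal Python sketch of a lenient 
"browser" that keeps the text and drops every tag without comment, while 
recording which unknown elements were thrown away. The KNOWN_TAGS set is 
an assumed, abbreviated subset of HTML 2.0 (not the full DTD), and 
`render` is a hypothetical name; this is not how CML itself checks 
documents.

```python
from html.parser import HTMLParser

# Assumed, abbreviated subset of HTML 2.0 element names (not the full DTD).
KNOWN_TAGS = {"html", "head", "title", "body", "address", "h1", "h2", "h3",
              "h4", "h5", "h6", "p", "a", "ul", "ol", "li", "img", "em",
              "strong", "b", "i", "pre", "br", "hr"}


class ForgivingRenderer(HTMLParser):
    """Mimic a lenient browser: keep the text, drop every tag silently."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.dropped = []  # unknown elements a browser would silently ignore

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            # attribute names are lower-cased by html.parser; values are not
            self.dropped.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)


def render(html_text):
    """Return (displayed text, list of unknown elements that were discarded)."""
    renderer = ForgivingRenderer()
    renderer.feed(html_text)
    renderer.close()
    return "".join(renderer.text), renderer.dropped


if __name__ == "__main__":
    shown, lost = render("Please send <CURRENCY COUNTRY=CANADA>10 dollars")
    print(shown)  # Please send 10 dollars
    print(lost)   # [('currency', {'country': 'CANADA'})]
```

The reader sees only the first line; the second is exactly the 
information (which currency) that has been lost without warning.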

I have addressed this in Chemical Markup Language (CML) which is now at:
http://www.dl.ac.uk/CBMT/cml/
CML concentrates on information structure and content and very little on 
form.  I shall be adding more discussion in the spirit of this posting 
at that site.

If you are converting *.txt to *.html you need to ask yourself WHY? If 
it's simply for formatting so that it's nicer to look at when downloaded, 
then a trivial tool will do.  If, however, you want an INDEX or other 
markup and the author hasn't included that, it's an expensive operation 
and there are not many shortcuts.  If you have the source (e.g. LaTeX or 
Word), there are tools to convert it to HTML, though semantically void HTML.
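
For the formatting-only case, a minimal Python sketch of such a "trivial 
tool" might look like this. The function name `txt_to_html` and the 
convention that paragraphs are separated by blank lines are assumptions 
for illustration; note that it infers no headings, index entries or other 
semantic markup, which is precisely the expensive part.

```python
import html
import re


def txt_to_html(text, title="Untitled"):
    """Wrap blank-line-separated paragraphs of plain text in <P> elements.

    Only presentation is added: no headings, index entries or other
    semantic markup are inferred from the text.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Escape <, > and & so the text cannot be mistaken for markup.
    body = "\n".join("<P>" + html.escape(p, quote=False) for p in paragraphs)
    return ("<HTML><HEAD><TITLE>" + html.escape(title, quote=False)
            + "</TITLE></HEAD>\n<BODY>\n" + body + "\n</BODY></HTML>\n")


if __name__ == "__main__":
    print(txt_to_html("First paragraph.\n\nSecond one, with <angle> & ampersand.",
                      title="Dissertation"))
```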

 > 
> I often find something on the WWW that I want to print off and read
> away from the computer. Sometimes this isn't easy, particularly if a document 
> is spread across more than one HTML file (as sometimes happens with papers 
> presented at electronic conferences).

I agree.  When I want people to download something (as for Chemical 
Markup Language) I include a *.tar.gz for the appropriate part of the 
distribution.

> 
> These issues possibly apply more to resources such as journals than they do
> to some other applications, but:

They apply across the board!  It's a culture change that we have to 
make.  It will take at least half a generation.

> 
>    How 'ready' is the scientific community to change the way it approaches
>    'written' texts?

This depends (IMO) on our education.  Books (as opposed to scrolls, clay 
tablets) have been in common use for ca. 500 years and a large part of 
our education is given to teaching people how to use them.  Teenagers are 
now much more familiar with electronic metaphors (through keyboards, 
screens, etc).  Mine read much less on paper than we did.  They love the
WWW.

 > 
>    How prepared is the scientific community to glean information from
>    a computer screen and not worry about having a hard copy?
> 
It depends on what they want to do with it.  There are still several 
things we can't do on screen (annotation is one, reading in the bath 
another).  But who uses the CSD printed books for searches if they have 
(free) access to an online version?

>    To what extent are these barriers to the promotion of WWW resources?

There are many things that may/will happen outside our community (better 
screens, new metaphors), but WE must concentrate urgently on getting our 
discipline-specific information in order.  This will take 10-20 years.

 > 
>    Are there issues relating to the design of HTML documents that we need to
>    consider in relation to this, be it in the conversion of existing
>    'straight' texts or in the design of HTML documents from scratch?

Yes.  At present badly thought out hypertext is a nightmare.  IMO it 
works best when hypertext maps well onto conventional paper structures.  I 
have tried to come up with some archetypes and have generalised this to 
four:
	- serial book (e.g. detective novel).  Read from page 1 to page 200
	- dictionary (phone book, CSD, Swissprot).  Locate a precise
		chunk of information by (alphabetical) index
	- tree (technical manual e.g. brakes, engine, lights can all be 
		read independently)
	- anthology (literature, or journal, where items are distinct but 
		have a common theme)
The electronic era has added the 'grep' - i.e. searching unstructured 
documents.
	I'd be very interested to know whether other people have additions 
to this list.  Are there any new ones which have arisen over the last 
year or two?
	I've just thought of a fifth: the map.

	P.

BTW Chemical Markup Language is now in a reasonable state to look at.  
There is a browser which can be compiled under UNIX (and we are trying 
for PC and Mac).  There are many examples, including molecular data files 
and I am writing extensive documentation.  Feedback will be most valuable.


Peter Murray-Rust, Glaxo Research & Dev. (pmr1716 at ggr.co.uk); (BioMOO: PeterMR)
Birkbeck College, ubcg09q at cryst.bbk.ac.uk, CBMT/Daresbury mbglx at seqnet.dl.ac.uk
http://www.cryst.bbk.ac.uk/PPS/index.html, http://www.dl.ac.uk/CBMT/HOME.html


-----
chemweb: A list for Chemical Applications of the Internet.
To unsubscribe, send to listserver at ic.ac.uk the following message;
unsubscribe chemweb
List coordinator, Henry Rzepa (rzepa at ic.ac.uk)


