Standard Generalized Markup Language
Standard Generalized Markup Language | |
---|---|
File extension: | none |
MIME type: | application/sgml, text/sgml |
Uniform Type Identifier: | public.xml |
Type of format: | metalanguage |
Extended from: | GML |
Extended to: | HTML, XML |
Standard(s): | ISO 8879 |
The Standard Generalized Markup Language (SGML) is a metalanguage in which one can define markup languages for documents. SGML is a descendant of IBM's Generalized Markup Language (GML), developed in the 1960s by Charles Goldfarb, Edward Mosher and Raymond Lorie (whose surname initials were used by Goldfarb to make up the term GML[1]). GML in this sense should not be confused with the Game Maker scripting language, or with the Geography Markup Language developed by the Open GIS Consortium, both of which share the abbreviation.
SGML provides a variety of markup syntaxes that can be used for many applications. By changing the SGML Declaration, one need not even use "angle brackets", although they are the norm as part of the reference concrete syntax defined in the standard. GML, for instance, used a colon to introduce a tag, a period to end it, and 'e' to indicate an end tag (:xmp.thus:exmp.), and SGML is flexible enough to accept that grammar too.
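The correspondence between the GML tag grammar and the angle-bracket syntax can be illustrated with a minimal Python sketch. It is not an SGML parser and does not read an SGML Declaration; the function name gml_to_sgml and the idea of passing in the known element names are assumptions made for this illustration, and GML tag attributes are ignored.

```python
import re

def gml_to_sgml(text, element_names):
    """Rewrite GML-style tags (:name. and :ename.) as angle-bracket tags.

    element_names lists the known elements; it is needed because ':exmp.'
    could otherwise be read either as a start tag 'exmp' or an end tag 'xmp'.
    """
    names = {n.lower() for n in element_names}

    def replace(match):
        word = match.group(1)
        key = word.lower()
        if key in names:                              # start tag, e.g. :xmp.
            return "<" + word + ">"
        if key.startswith("e") and key[1:] in names:  # end tag, e.g. :exmp.
            return "</" + word[1:] + ">"
        return match.group(0)                         # unknown: leave untouched

    # A colon, a name, and a closing period.
    return re.sub(r":([A-Za-z][A-Za-z0-9]*)\.", replace, text)

print(gml_to_sgml(":xmp.thus:exmp.", ["xmp"]))  # prints <xmp>thus</xmp>
```

Text that merely resembles a tag but names no known element is passed through unchanged.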
Original uses
SGML was originally designed to enable the sharing of machine-readable documents in large projects in government, law, and industry, where documents have to remain readable for several decades, a very long time in information technology. It has also been used extensively in the printing and publishing industries, but its complexity has prevented its widespread application for small-scale general-purpose use.
SGML was primarily intended for text and database publishing, and one of its first major applications was the second edition of the Oxford English Dictionary (OED), which was and remains wholly marked up in SGML.
Syntax
SGML allows most aspects of a markup language's syntax to be customized.
The default syntax appears similar to this example:
<QUOTE TYPE="example"> typically something like <ITALICS>this</ITALICS> </QUOTE>
HTML uses this SGML default syntax.
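The concrete syntax itself is customized in the SGML Declaration, while the elements and attributes of a markup language, and the ways they may be combined, are specified by a Document Type Definition, or DTD.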
Letter case is not distinguished in tag names, so the three tags <quote>, <QUOTE>, and <quOtE> are equivalent. Whether a tag must be paired, like the <QUOTE></QUOTE> pair above, or may occur singly, like an HTML <HR>, is defined in the DTD for the markup language being defined. (In this case the XML counterpart would be the specific empty tag <hr/>, which has no equivalent in SGML, even though one was proposed during SGML development.)
SGML markup languages do not require attribute values that lack whitespace and special characters (such as '>' in the default syntax) to be surrounded by quote marks (" or '), so the above markup could be written:
<QUOTE TYPE=example> typically something like <ITALICS>this</ITALICS> </QUOTE>
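The quoting rule can be sketched in Python as a helper that writes an attribute without quotes only when its value is made up entirely of name characters. The sketch assumes the name characters of the reference concrete syntax (letters, digits, period and hyphen); a different SGML Declaration may allow others. The function name format_attribute is hypothetical.

```python
import re

# Assumed name characters of the reference concrete syntax:
# letters, digits, period and hyphen.
NAME_CHARS = re.compile(r"[A-Za-z0-9.\-]+")

def format_attribute(name, value):
    """Return NAME=value, adding quote marks only when they are needed."""
    if NAME_CHARS.fullmatch(value):
        return f"{name}={value}"          # no whitespace or special characters
    if '"' in value and "'" in value:
        # A real document would use an entity reference here; out of scope.
        raise ValueError("value contains both kinds of quote mark")
    delimiter = "'" if '"' in value else '"'
    return f"{name}={delimiter}{value}{delimiter}"

print(format_attribute("TYPE", "example"))    # prints TYPE=example
print(format_attribute("TYPE", "two words"))  # prints TYPE="two words"
```

An empty value is always quoted, since an unquoted value needs at least one character.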
One feature of SGML markup languages is the NET (Null End Tag) construction: <ITALICS/this/ which is structurally equivalent to <ITALICS>this</ITALICS>. Another is "presumptuous empty tagging", in which the empty end tag </> in <ITALICS>this</> "inherits" its value from the nearest previous nonempty tag, here <ITALICS> (in other words, it closes the most recently opened element). The expression is thus another, more concise equivalent of <ITALICS>this</ITALICS>. A third is the 'text on the same line' feature, which allows an element to be ended by a line end (especially useful for headings and the like).
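The first two shorthand forms can be expanded mechanically by tracking which elements are open. The following Python sketch is a simplified illustration, not a conforming SGML parser: it assumes tags without attributes, treats "/" as a null end tag only when the innermost open element was opened with the <NAME/ form, and does not handle omitted end tags or the line-end feature. The name expand_short_tags is hypothetical.

```python
import re

START = re.compile(r"<([A-Za-z][A-Za-z0-9.\-]*)([/>])")  # <NAME> or <NAME/
END = re.compile(r"</([A-Za-z][A-Za-z0-9.\-]*)>")        # </NAME>

def expand_short_tags(text):
    """Rewrite <NAME/text/ and </> into fully spelled-out start and end tags."""
    out = []
    open_elements = []  # stack of (name, opened_with_net) pairs
    i = 0
    while i < len(text):
        if text.startswith("</>", i):
            # Empty end tag: closes the most recently opened element.
            if not open_elements:
                raise ValueError("empty end tag with no open element")
            name, _ = open_elements.pop()
            out.append(f"</{name}>")
            i += 3
            continue
        m = END.match(text, i)
        if m:
            # Full end tag: must match the innermost element (case-insensitive).
            if not open_elements or open_elements[-1][0].upper() != m.group(1).upper():
                raise ValueError(f"unexpected end tag {m.group(0)}")
            open_elements.pop()
            out.append(m.group(0))
            i = m.end()
            continue
        m = START.match(text, i)
        if m:
            # Remember whether the start tag ended in "/" (NET-enabling).
            open_elements.append((m.group(1), m.group(2) == "/"))
            out.append(f"<{m.group(1)}>")
            i = m.end()
            continue
        if text[i] == "/" and open_elements and open_elements[-1][1]:
            # Null end tag for an element opened as <NAME/
            name, _ = open_elements.pop()
            out.append(f"</{name}>")
            i += 1
            continue
        out.append(text[i])  # ordinary character data
        i += 1
    return "".join(out)

print(expand_short_tags("<ITALICS/this/"))    # prints <ITALICS>this</ITALICS>
print(expand_short_tags("<ITALICS>this</>"))  # prints <ITALICS>this</ITALICS>
```

Both shorthand inputs expand to the same fully tagged form, which is the structural equivalence described above. A "/" inside an element opened with an ordinary start tag is left as character data.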
SGML is an ISO standard: "ISO 8879:1986 Information processing—Text and office systems—Standard Generalized Markup Language (SGML)" which was accepted in October of 1986.
Derivatives
HTML
HTML was originally designed based on SGML tagging but without SGML's emphasis on rigorous markup. It was later reformulated (at version 2.0) to be an application of SGML, although there is some debate over whether it ever actually became one. The charter for the recently revived World Wide Web Consortium HTML Working Group goes so far as to say, "the Group will not assume that an SGML parser is used for 'classic HTML'".[2]
XML
XML is a simplified rework of SGML, designed to make an XML parser much easier to implement than an SGML parser. One consequence is that an XML parser is much less forgiving of erroneous markup. Another consequence of the ease of implementation is that XML has replaced SGML almost completely, helped by the fact that few SGML-aware programs existed when XML was created. XML applications are now numerous. XML also has more lightweight internationalization. XML is used for general-purpose applications, such as the Semantic Web, XHTML, SVG, RSS, Atom, XML-RPC and SOAP.
DocBook
Another markup language originally created as an application of SGML is DocBook, designed for authoring technical documentation. DocBook is now also available as an XML application.
Other
There are also a number of languages that are related in part to SGML and XML, but, because they cannot be parsed or validated or otherwise processed using standard SGML and XML tools, cannot be considered to be applications of SGML or XML. One example is the Z Format, a language designed for typesetting and documentation.
References
1. Charles F. Goldfarb (1996). The Roots of SGML - A Personal Recollection. Retrieved on 2007-07-07.
2. HTML Working Group Charter. Retrieved on 2007-04-19.