Tuesday 2 June 2009

W3C’s Steven Pemberton on XHTML2

Please note that the XHTML2 document was sent in error. The correct document has been forwarded along and Steven’s response to my query is now published as The Real “Why XHTML” Discussion.

With all the fuss about HTML5 at Google I/O last week, the question of “what about XHTML2?” keeps coming up in conversation. In an effort to better understand the answer to that question, I asked Steven Pemberton, W3C Chair of the HTML and Forms Working Groups, who graciously took the time to chat with me about it and who then provided this overview to answer the question for the Web designer and developer public.

The following information is kindly provided courtesy of Steven Pemberton, CWI, Amsterdam, and W3C.


Based on the experience we have with HTML, XHTML 2 is an attempt to fix many of the extant problems.

The areas that are being addressed include:

Make it as generic XML as possible


  • All the ones that you can imagine, because XML is a Good Thing (tools,
    interoperability, etc).
  • If XHTML 2 gets accepted it will draw the web community further into
    the XML world.
  • Much of XHTML 2 works on most existing browsers already (as an example

Less presentation, more structure

Make documents more semantically meaningful; make CSS responsible for the presentation, not HTML.

Author advantages:

  • Easier to write your documents
  • Easier to change your documents
  • Easy to change the look of your documents
  • Access to professional designs
  • CSS gives more presentational possibilities than plain HTML
  • Supports single-authoring: write your document once, supply different
    stylesheets for different devices or purposes
  • Your documents are smaller
  • Visible on more devices
  • Visible to more people

Webmaster advantages:

  • Separation of concerns: authors write the text, graphic designers
    design the look
  • Simpler HTML, less training
  • Cheaper to produce, easier to manage
  • Easy to change house style, without changing your documents
  • More control over the look of your site
  • Reach more people
  • Search engines find your stuff easier
  • Visible on more devices

Reader (Surfer) advantages:

  • Faster download (one of the top 4 reasons for liking a site)
  • Easier to find information
  • You can actually read the information if you are sight-impaired
  • Information more accessible
  • You can use more devices

More accessibility

The design should be as inclusive as possible. This includes finding a replacement for the unsuccessful longdesc and making forms more accessible. Device independence and increased structure help here too.

Better internationalization

It is a World Wide Web.

More device independence

New devices becoming available, such as telephones, PDAs, tablets, printers, televisions and so on mean that it is imperative to have a design that allows you to author once and render in different ways on different devices, rather than authoring new versions of the document for each type of device, or limiting your design to a single type of device. This includes creating a more flexible event handling system to allow for new sorts of events that new devices might generate.

More usability

Try to make the language easy to write, and make the resulting documents easy to use. According to research, usability is the second most important property of a website (after good content), so it is important that the technology supports this. This includes:

  • observing how people currently write HTML documents, and designing content-models around these needs
  • finding a better approach to frames than the current one. Usability experts advise authors not to use frames; yet frames clearly have useful functionality. Problems of frames include:
    • The [back] button works unintuitively in many cases.
    • You cannot bookmark a collection of documents in a frameset.
    • If you do a [reload], the result may be different to what you had.
    • [page up] and [page down] are often hard to do.
    • You can get trapped in a frameset.
    • Search engines find HTML pages, not Framed pages, so search results usually give you pages without the navigation context that they were intended to be in.
    • Since you can’t content-negotiate, <noframes> markup is necessary for user agents that don’t support frames. Search engines are ‘user agents’ that don’t support frames! But despite that, almost no one produces <noframes> content, and so it ruins web searches (and makes builders of such sites look stupid!)
    • There are security problems caused by the fact that it is not visible to the user when different frames come from different sources

More flexibility, future-proofing

As new technologies emerge, it is desirable not to bind documents to one particular technology but to allow flexibility in what can be accepted. For instance:

  • HTML binds the document to the scripting language used, so that it is hard or impossible to write a document that works with different scripting languages. Technologies used by XHTML 2, such as XML Events, allow the separation of document content and scripting, so that documents can be made that work on different user agents.
  • Fallback mechanisms allow a document to offer several equivalent versions of a resource and let the user agent decide the most appropriate to use, with a final fallback being to markup in the document. This makes documents more fault-tolerant — since if a resource is not available the document is still meaningful — and more accessible.
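The fallback idea in the last point can be sketched roughly as follows. This is an illustration only, not part of Steven's text: the markup, the `src` and `srctype` attribute names (loosely modelled on XHTML 2 drafts), and the `choose_rendering` helper are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: nested alternatives, with plain text as the final fallback.
# The attribute names are illustrative assumptions, not a normative vocabulary.
DOC = (
    '<p src="holiday.svg" srctype="image/svg+xml">'
    '<span src="holiday.png" srctype="image/png">A photo of us on holiday.</span>'
    '</p>'
)


def choose_rendering(element, supported_types):
    """Pick the outermost alternative the user agent supports.

    Returns ("resource", url) if some nested alternative has a supported
    type, otherwise ("text", fallback) using the markup's own text content.
    """
    node = element
    while node is not None:
        src = node.get("src")
        if src is not None and node.get("srctype") in supported_types:
            return ("resource", src)
        # Move inward to the next nested alternative, if there is one.
        node = next((child for child in node if child.get("src") is not None), None)
    return ("text", "".join(element.itertext()).strip())


root = ET.fromstring(DOC)
print(choose_rendering(root, {"image/svg+xml", "image/png"}))  # agent supports both types
print(choose_rendering(root, {"image/png"}))                   # agent supports PNG only
print(choose_rendering(root, set()))                           # text-only agent
```

The point of the sketch is that the document stays meaningful in all three cases: the user agent makes the choice, and the author never has to script it.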

Less scripting

Achieving functionality through scripting is difficult for the author, restricts the type of user agent you can use to view the document, and impairs interoperability. We have tried to identify current typical usage, such as navigation lists, and collapsing tree structures, and include those usages in markup.

Better forms

HTML Forms were the foundation of e-commerce. Improving forms covers many of the points above: return XML, more accessible, more usable (such as client-side checking), more device independent, less scripting.
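To make “client-side checking” and “return XML” a little more concrete, here is a minimal sketch of declarative form constraints that are checked without any per-form script, with the accepted values serialised as XML. It does not use XForms syntax; the `CONSTRAINTS` table, the field names, and the `order` element are hypothetical and exist only for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declarative constraints: field name -> (required, pattern).
CONSTRAINTS = {
    "email": (True, r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "quantity": (True, r"[1-9][0-9]*"),
    "note": (False, r".*"),
}


def validate(values):
    """Return the names of fields that violate the declared constraints."""
    errors = []
    for name, (required, pattern) in CONSTRAINTS.items():
        value = values.get(name, "")
        if value == "":
            if required:
                errors.append(name)
        elif re.fullmatch(pattern, value) is None:
            errors.append(name)
    return errors


def to_xml(values):
    """Serialise the submitted values as a small XML instance document."""
    root = ET.Element("order")
    for name in CONSTRAINTS:
        ET.SubElement(root, name).text = values.get(name, "")
    return ET.tostring(root, encoding="unicode")


good = {"email": "reader@example.org", "quantity": "2"}
bad = {"email": "not-an-address", "quantity": "0"}
print(validate(good))  # no violations expected
print(validate(bad))   # both required fields fail their patterns
print(to_xml(good))
```

The design choice being illustrated is that the rules live in data rather than in script, so the same description could be checked by any user agent on any device.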

Filed under:   general
Posted by:   Molly | 13:28 | Comments (8)

Comments (8)

  1. I’m sorry, but almost none of the advantages he mentions are out of reach using today’s HTML 4.01 standard. In fact, most of them are commonly accepted best practices: accessibility, separation of presentation and structure, internationalisation. These are things professional developers have been achieving for years without any academia-led intervention on behalf of the W3. Also, the quote:

    > If XHTML 2 gets accepted it will draw the web community further into
    > the XML world.

    smacks of an overt and unsettling bias towards XML as “a good thing”. XML isn’t something that all developers would necessarily see as an advantage these days.

  2. I’m not sure this is much of an improvement since most of us (at least the developers I know) have been working with best practices for years. I guess we’ll just have to wait and see 🙂

  3. I like Mr. Pemberton’s take on XHTML2. HTML is far too open-ended and thus often abused. It started and should have remained a document format; now it lacks much meaning. HTML5, imho, is making this much worse. I’m happy to have the option of using something that will be much easier to process in a simple, standard way. (Yes, I know we can do that with HTML now, but it hurts b/c of all the possible extras that may be tied into it.)

    The arguments in favor of HTML remind me (in sort of an opposite way) of the arguments against JavaScript. People complain that JavaScript is an abominable language, but if you read Crockford’s JavaScript: The Good Parts, you realize that JavaScript is really a very nice language extended with a lot of not-so-good features. Yet JavaScript is not something you want to parse; you already have the interpreter in your browser. HTML, otoh, you may want to parse and work with. Currently, that’s not so fun. A simple document standard like XHTML2 brings the fun back.

  4. Most content providers don’t know or care about well-formedness. Many web coders don’t either. The ability to deal with broken documents is html’s greatest complication, and strength at the same time.

    Until high quality user-space tools exist to create good xml out of grungy user data, html will continue to be the path of necessity for web content.

  5. I was looking forward to XHTML2 for a number of years. It seemed like it struck a nice balance between DocBook and HTML plus more useful elements.

    But since then, XHTML2 has slid further and further into obscurity and essentially irrelevance. Microformats have shown that HTML 4.01 is good enough. The only major hiccup is bringing RDF in – and hopefully, with people like Shelley Powers involved, this is closer to happening in HTML5.

    HTML 5 scores heavily for having added in sections, headers, footers and navigation elements – exactly the things we’ve been waiting on XHTML2 to deliver.

    The above rationale I find bizarre, almost out of touch with reality. Parsing HTML with common tools is now almost trivial (I love the PHP Simple Dom Parser, but Tidy is equally a developer tool for treating web pages like a data source).

    The “less presentation, more structure” points try to make it sound like these options are solely the remit of XHTML2, yet we’ve been building websites with HTML 4.01 and CSS 2.1 for a number of years now. I don’t see where XHTML2 makes things better in a way that’s not possible with HTML5 (or CSS3).

    Fixing longdesc is a step in the right direction. There’s a divergence with HTML5 who just want to get rid of it because it’s so misused and presents little benefit.

    The usability regarding frames, I’m under the impression that modern browsers have solved a large number of these issues. It’s no longer a problem as far as I can see. I think the only problem I’ve seen with frames this year is Digg’s toolbar which interferes with the browser title bar not reflecting the actual non-digg page being viewed. It would be interesting to see how XHTML2 solves that problem, but I don’t see it would be compelling enough to adopt XHTML2 over HTML5 or HTML 4.01

    I gave up on XML derivations of HTML a while ago when we got to the point that XML’s draconian error handling penalises users more than site owners. Mark Pilgrim’s mini-rant of a few years ago that XHTML isn’t fit for use on the human-consumption part of the web is so painfully true.

    I’m tired of the cost of dealing with XHTML, tired of the elitist snobbery of ‘well, you should control every single piece of content on the page, all the time, every time to ensure it all remains well-formed’. XHTML2 might have been a good idea in Web 1.0, but with widgets and third party data sources, XHTML is just utterly unworkable unless developers become snobs themselves. And I can’t work like that.

    JavaScript is the other reason I walked away from XML based HTML. The correct way of using the DOM is with namespace-aware methods, which just clutters up the JavaScript and makes it harder for developers to write decent code.

    JSON is the XML killer. It works with today’s web. RDF is gaining traction through its non-XML serialisations. The W3C needs to be careful with XHTML2 because the signs are there that the web development community is routing right around it, looking for the pragmatic and practical solutions.

    I find the plus points of XHTML2 as expressed above just utterly unconvincing, and I’m disappointed that – if this is representative of the XHTML2 Working Group – XHTML2 seems to be the typical W3C effort of being totally disconnected from the Web we work and live in.

    The tradeoffs for using XHTML over HTML are too high, and the benefits not enough.

    (Disclaimer. I own I did have plans for it when XHTML2 became something concrete and usable. Five years later…)


  6. darn, mike/isofarro has pretty much said it all in my view. apart from the two points in “more flexibility”, almost nothing that steve writes seems to be exclusive to XHTML2 at all. if steve can’t sell the unique selling points of XHTML2, why should developers even care?

  7. Pingback: » The Real “Why XHTML” Discussion

  8. Comments are closed in this discussion now as a result of a mishap that occurred regarding the files Steven and I exchanged. Besides, the other post is MUCH more interesting and personal. Please refer to The Real Why XHTML Discussion for further commenting.

    Thank you and sorry for the confusion!
    Molly 🙂
