User agents defining class names

A horrible new trend that abandons good design to help wrong practices

A new trend is disturbing me, and I'd like to nip it in the bud. There are various projects and activities which advocate giving meaning to specific class names (as they appear in the class attribute) so that clever user agents can extract extra meaning from the marked-up text, beyond any meaning that HTML can express.

  • Microformats attempt to express small chunks of information (like details of persons or events) in HTML in a machine-readable way. I think this is a wonderful idea!

    However, they do it by defining class names like vevent and vcard and expect these not to clash with author-defined classes. They even say that authors should move their own classes out of the way of microformats to avoid clashes!

  • Google now allows authors to control which parts of their pages Google should not translate. Another great idea!

    But they've also arbitrarily picked a class name, which authors now have to avoid if they don't mean it.

Note that, in both cases, the author has control of when to use these special class names, but he no longer defines their meaning! User agents that process his documents now define the meaning, and he has no control over those agents once he places his documents on the Web. Until these ideas came along, the author chose which stylesheets to link in, and those stylesheets gave meaning to classes by styling the associated content. Now he has to watch his step, and I see this as an intrusion on his namespace.

This is an impractical situation because the author has to keep track of an ever-growing set of definitions made by others, some authority has to keep track of them so they don't clash with each other, and authors may have to make changes to their sites retrospectively to accommodate new definitions. I say that none of this is necessary to implement the good ideas above.

First, here are some arguments offered in mitigation of user agents defining class names, with my responses:

  • Names won't clash very often — This isn't good enough for me. Name clashes simply weren't a problem before, because the meaning was expressed in distinct stylesheets, totally under control of the author. And when a clash happens now, the author has to move out of the way, which he never had to do before.

  • §7.5.2 of the HTML4.01 specification says that stylesheets are not the only way to interpret class names — Indeed, it implies that user agents can infer meaning any way they choose, but I don't think the intention was to do this without agreement of the author. He has no control over the user agents that process his documents on the Web, so he cannot choose their interpretations. He relies on user-agent writers following the Recommendations of the W3C to ensure that both he and they will make the same interpretations.

  • A Microformat class should only be interpreted when the document has included a profile for it — This is better, as the author now has some control. The inclusion of a URI at a certain place in the document now indicates that he is using a certain kind of microformat, but it still falls short in that he cannot choose the class names and map them to the meanings defined by the microformat. If he wants to start using a new format, he has to go back and check that his current classes don't clash with those of the new format.

I assert that all of these problems can be avoided, simply by adapting some existing mechanisms.

What we are really trying to do in these cases is to add properties to certain sections of content so that they will be processed differently or specially in certain contexts or applications. What options do we have to do this?

  • Add custom attributes to HTML — This could end up filling a page with material useless for most user agents, and makes validation messy. Though both of these are relatively minor problems, can we find a better way? After all, we still have to avoid clashes between lots of arbitrary attribute names.

  • Add custom attributes to XHTML — We can now use XML namespaces to avoid the clashes, but some say that the Web is not yet ready for XHTML.

  • Add namespace prefixes to class names — If the author can define, for a given format, a prefix to be used locally within the page, avoiding name clashes would be completely under his control. How would such a prefix be defined?

  • Associate external attributes to elements (like CSS does) — But what language could we use to express these attributes, how would we express the associations to HTML elements, and how would we link a set of associations to a page?

I've already suggested why I'd rule out the first two options, so I'll describe and evaluate the last two options in detail below.

Namespace prefixes for class names

This technique should give just enough flexibility for the author to avoid name clashes. The author would introduce his own prefix (e.g. google) for a microformat's namespace (identified by URI) like this:

<link rel="schema.google" href="http://google.com/ns/robots">

A user agent that understands this microformat will adjust itself to look only for class names beginning with google-, for example:

<span class="google-notranslate">some text</span>
<span class="google-noindex">some text</span>

This should be very simple to implement, and requires no central authority beyond DNS. User agents could behave as follows to allow old pages to continue to use profiles while new ones can use prefixes:

  • Look for a prefix being defined for the namespace of your microformat, i.e. in the <head> element, find <link rel="schema.pref" href="your_namespace_URI">, and extract pref. Your prefix is then pref-.

  • Otherwise, search for the profile URI of your microformat, i.e. <head profile="... your_profile_URI ...">. If you find that, use an empty string, or some other documented default, as your prefix.

  • Otherwise, do not recognise any special class names for your microformat.
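
The three steps above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not a reference implementation: the names HeadScanner, resolve_prefix and special_names are invented here, the profile URI used by a real microformat is whatever that format documents, and the parsing relies on Python's standard html.parser rather than a full HTML parser.

```python
from html.parser import HTMLParser


class HeadScanner(HTMLParser):
    """Collects schema.* links and profile URIs found in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.prefix_by_namespace = {}  # namespace URI -> author-chosen prefix
        self.profiles = []             # URIs listed in <head profile="...">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
            self.profiles = (attrs.get("profile") or "").split()
        elif tag == "link" and self.in_head:
            rel = (attrs.get("rel") or "").strip()
            href = attrs.get("href")
            if rel.lower().startswith("schema.") and href:
                # Assumption: the first declaration wins if a namespace is linked twice.
                self.prefix_by_namespace.setdefault(href, rel[len("schema."):])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def resolve_prefix(html, namespace_uri, profile_uri, default_prefix=""):
    """Return the class-name prefix to look for, or None to ignore the page."""
    scanner = HeadScanner()
    scanner.feed(html)
    # Step 1: an explicit prefix declared for our namespace.
    if namespace_uri in scanner.prefix_by_namespace:
        return scanner.prefix_by_namespace[namespace_uri] + "-"
    # Step 2: fall back to the profile URI and the documented default prefix.
    if profile_uri in scanner.profiles:
        return default_prefix
    # Step 3: recognise no special class names.
    return None


def special_names(class_attr, prefix):
    """Strip the prefix from the class names that carry it; drop the rest."""
    if prefix is None:
        return []
    return [c[len(prefix):] for c in class_attr.split() if c.startswith(prefix)]
```

Given a page whose head contains the <link rel="schema.google"> element shown earlier, resolve_prefix would return google-, and special_names("google-notranslate sidebar", "google-") would keep only notranslate, leaving the author's own sidebar class untouched.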

Using CSS for non-style properties

I think that the last option has the greatest flexibility because the choice of class names remains totally in the hands of the author (as it was before). It even allows authors to select affected content using expressions other than class names. However, it's much more complex to achieve, with significant implications.

To implement this approach, we could simply re-use the syntax and the linking methods of CSS. All we then need to do is define a set of CSS properties for each application (translation, and each microformat), and allow the specialised user agents to detect them.

This raises more issues, but I think they are easily solved:

  • Who will define all these new properties and prevent clashes? — The application developers will. Each property will belong to a namespace owned by its definer, and authors wishing to use them will define a prefix for each namespace. There will be no need for a central authority beyond the one for DNS.

    Namespace prefixes will be defined using the existing CSS syntax, and referenced as if they were vendor prefixes:

    @namespace google "http://google.com/css";
    
    .notranslate {
      -google-translate: disable;
    }
    

    …or preferably they would use the prefix|name syntax that CSS3 namespaces define for type selectors:

    @namespace google "http://google.com/css";
    
    .notranslate {
      google|translate: disable;
    }
    
  • Won't regular browsers be downloading lots of useless CSS rules? — CSS rules for different applications could be stored in separate files, and loaded using media queries that only specialised user agents will recognise. These queries will be expressed using media types and features with prefixes chosen by the author to refer to application-specific namespaces:

    <link rel   = "schema.google"
          href  = "http://google.com/css">
    <link rel   = "stylesheet"
          media = "-google-translator"
          href  = "translation-styles.css">
    <!-- or -->
    <link rel   = "stylesheet"
          media = "google|translator"
          href  = "translation-styles.css">
    

    Again, there's no need for a central authority.

  • CSS is only for style; won't this break the separation between content and style that is considered so valuable? — CSS was designed for style, but is it really specific to it? The selector syntax is largely independent of style, and, thanks to vendor prefixes, the media types and features you might use to link in a stylesheet form an extensible set, so you can still keep style separated from other aspects. And the fact that one writes <link rel="STYLESHEET"> is irrelevant: STYLESHEET is just a mnemonic.

    Really, style is just one way of interpreting content, and in general, you're linking in an ‘interpretation sheet’, not a stylesheet.

The result is a method of designing completely new ways of interpreting content, not merely rendering it, without having to wait for an authority to standardize it, and while leaving choice of class name entirely to the author, where it should be.

As an example, another mode of interpretation could be the application of an indexer for a search engine. A property could be defined to tell search engines not to index or not to follow certain content within a page:

@namespace search "http://www.w3c.org/2009/search-engines";

.noindex {
  search|index: disable;
}

.nofollow {
  search|follow: disable;
}

In this case, a consortium of search engines could agree on these properties, without having to affect other assignments within W3C.
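
To make the idea concrete, here is a rough sketch of how an indexer might read such a sheet and resolve each prefixed property back to its namespace URI. It assumes the prefix|property declaration syntax proposed in this article (which is not standard CSS), uses regular expressions instead of a real CSS parser, and ignores comments, nested rules and at-rules other than @namespace. The function name interpret is made up for this example.

```python
import re

# @namespace prefix "uri";
NAMESPACE_RULE = re.compile(r'@namespace\s+([A-Za-z_][\w-]*)\s+"([^"]*)"\s*;')
# selector { declarations }  (flat rules only)
RULE = re.compile(r'([^{}]+)\{([^{}]*)\}')
# prefix|property: value
DECLARATION = re.compile(r'([A-Za-z_][\w-]*)\|([A-Za-z_][\w-]*)\s*:\s*(.+)', re.S)


def interpret(css):
    """Return (selector, namespace URI, property, value) for each prefixed declaration."""
    prefixes = dict(NAMESPACE_RULE.findall(css))
    results = []
    # Remove the @namespace rules so they are not mistaken for part of a selector.
    for selector, block in RULE.findall(NAMESPACE_RULE.sub("", css)):
        for declaration in block.split(";"):
            match = DECLARATION.fullmatch(declaration.strip())
            # Declarations with an undeclared prefix are skipped, like unknown CSS.
            if match and match.group(1) in prefixes:
                results.append((
                    selector.strip(),
                    prefixes[match.group(1)],
                    match.group(2),
                    match.group(3).strip(),
                ))
    return results
```

Because the result carries the namespace URI rather than the local prefix, an indexer that only cares about http://www.w3c.org/2009/search-engines would behave the same whether the author called the prefix search or anything else.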

These styles are only relevant to search engines, so we should use an appropriate media query to prevent regular browsers from loading them:

<link rel="schema.search"
     href="http://www.w3c.org/2009/search-engines">
<link rel="stylesheet"
     href="search-styles.css"
    media="search|robot">
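
From the search engine's side, choosing which sheets to fetch might look roughly like the sketch below. It assumes the prefix|name form of media value proposed above (not something current browsers or the CSS media query grammar define), matches the media attribute as a single literal value instead of parsing a full media query list, and uses invented names (LinkCollector, stylesheets_for).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records the attributes of every <link> element in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append({name: (value or "") for name, value in attrs})


def stylesheets_for(html, namespace_uri, media_name):
    """Return hrefs of sheets whose media is '<prefix>|<media_name>' for our namespace."""
    collector = LinkCollector()
    collector.feed(html)
    # Every local prefix the author has bound to our namespace URI.
    prefixes = {
        link["rel"].strip()[len("schema."):]
        for link in collector.links
        if link.get("rel", "").strip().lower().startswith("schema.")
        and link.get("href") == namespace_uri
    }
    wanted = {f"{prefix}|{media_name}" for prefix in prefixes}
    return [
        link["href"]
        for link in collector.links
        if link.get("rel", "").strip().lower() == "stylesheet"
        and link.get("media", "").strip() in wanted
        and "href" in link
    ]
```

For the markup above, stylesheets_for(page, "http://www.w3c.org/2009/search-engines", "robot") would select search-styles.css, while an agent looking for a different namespace would select nothing.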

What do the Microformats people think about namespacing?

“Namespaced content on the Web has failed.”

Namespacing isn't a goal that can fail or succeed according to popularity or how much it is exploited. It is a technical means to solve a problem, and so would only fail if it technically could not solve that problem. Does it solve it? Yes! Has it failed? No!

“[…] in practice people write scrapers that look for namespace prefixes as if they are part of the element name, or perform literal string matches on common namespace prefix uses […], not as mere shorthands for namespace URIs.”

People use perfectly good tools the wrong way all the time. There's nothing wrong with the tool, so the solution isn't to throw it away!

“Namespaces are actually *not* well supported in sufficient modern browsers[…]”

Only the plugins and extensions need to handle namespaces for class names (as described above). Look through the page's <link> elements for your URI and determine the local prefix.

“Namespaces encourage people to seclude themselves in their own namespace and invent their own schema rather than reusing existing elements in existing formats. This hurts interoperability because a dozen different namespaces can all have their own slightly different semantics for the same element.”

You can prevent that by having a community that looks at how microformats overlap and share structure. Oh, you seem to have one already, which you need anyway to stop name clashes between formats. Format inventors will want to be part of that community in order to get widespread support.

“If you want to carry on a theoretical discussion of namespaces, please do so elsewhere, for in practice, discussing them is a waste of time, and off-topic for microformats lists.”

Oh, how very open-minded!

“[…]but it does mean that we ask less of [publishers] than most other standards efforts, which ask publishers to learn new languages, create new files, namespaces etc.”

Publishers won't be creating new namespaces, just using them.

“[…]humans first, machines second. One aspect of being more human-centric in design is about making it easier for humans in general to publish information in microformats, rather than just making it easier for machines (programs) to parse microformats. This seems like an obvious trade-off in that many fewer humans develop/write parsers than publish content, and thus making publishing easier benefits more people.”

Instances of microformats are going to be read (by machines) thousands of times more often than humans (and often machines) will write them. And defining microformat classes without namespace prefixes (or at least some sort of author-controlled switch) makes it more difficult for an author to be sure he doesn't stumble on one.

