A horrible new trend that abandons good design to help wrong practices
A new trend is disturbing me, and I'd like to nip it in the bud.
There are various projects and activities which advocate giving
meaning to specific class names (as they appear in the class attribute) so that clever user agents can
extract extra meaning from the marked-up text, beyond any meaning
that HTML can express.
Microformats attempt to express small chunks of information (like details of persons or events) in HTML in a machine-readable way. I think this is a wonderful idea!
However, they do it by defining class names like vevent and vcard, and by expecting
these not to clash with author-defined classes. They even say that
authors should move their own classes out of the way of
microformats to avoid clashes!
Google now allow authors control over which parts of their pages should not be translated by Google. Another great idea!
But they've also arbitrarily picked a class name, which authors now have to avoid if they don't mean it.
Note that, in both cases, the author controls when to use these special class names, but he no longer defines their meaning! User agents that process his documents now define the meaning, and he has no control over those agents once he places his documents on the Web. Until these ideas came along, the author chose to link in stylesheets, which gave meaning to classes by styling up the associated content. Now he has to watch his step, and I see this as an intrusion on his namespace.
This is an impractical situation for three reasons: the author has to keep track of an ever-growing set of definitions made by others, some authority has to keep track of them so they don't clash with each other, and authors may have to make changes to their sites retrospectively to accommodate new definitions. I say that none of this is necessary to implement the good ideas above.
Microformats Would Benefit from a Pseudo-Namespace – Jens Meiert — A discussion of problems caused by microformats' class namespace intrusion
First, here are some mitigations of the approach of user agents defining class names:
Names won't clash very often — This isn't good enough for me. Name clashes simply weren't a problem before, because the meaning was expressed in distinct stylesheets, totally under control of the author. And when a clash happens now, the author has to move out of the way, which he never had to do before.
§7.5.2 of the HTML4.01 specification says that stylesheets are not the only way to interpret class names — Indeed, it implies that user agents can infer meaning any way they choose, but I don't think the intention was to do this without agreement of the author. He has no control over the user agents that process his documents on the Web, so he cannot choose their interpretations. He relies on user-agent writers following the Recommendations of the W3C to ensure that both he and they will make the same interpretations.
A Microformat class should only be interpreted when the document has included a profile for it — This is better, as the author now has some control. The inclusion of a URI at a certain place in the document now indicates that he is using a certain kind of microformat, but it still falls short in that he cannot choose the class names and map them to the meanings defined by the microformat. If he wants to start using a new format, he has to go back and check that his current classes don't clash with those of the new format.
I assert that all of these problems can be avoided, simply by adapting some existing mechanisms.
What we are really trying to do in these cases is to add properties to certain sections of content so that they will be processed differently or specially in certain contexts or applications. What options do we have to do this?
Add custom attributes to HTML — This could end up filling a page with material useless for most user agents, and makes validation messy. Though both of these are relatively minor problems, can we find a better way? After all, we still have to avoid clashes between lots of arbitrary attribute names.
Add custom attributes to XHTML — We can now use XML namespaces to avoid the clashes, but some say that the Web is not yet ready for XHTML.
Add namespace prefixes to class names — If the author can define, for a given format, a prefix to be used locally within the page, avoiding name clashes would be completely under his control. How would such a prefix be defined?
Associate external attributes to elements (like CSS does) — But what language could we use to express these attributes, how would we express the associations to HTML elements, and how would we link a set of associations to a page?
I've already suggested why I'd rule out the first two options, so I'll describe and evaluate the last two options in detail below.
Adding namespace prefixes to class names should give just enough flexibility for the author to avoid name clashes. The author would introduce his own prefix (e.g. google) for a microformat's namespace (identified by URI) like this:
<link rel="schema.google" href="http://google.com/ns/robots">
A user agent that understands this microformat will adjust itself to look only for class names beginning with google, for example:
<span class="google-notranslate">some text</span> <span class="google-noindex">some text</span>
This should be very simple to implement, and requires no central authority beyond DNS. User agents could behave as follows to allow old pages to continue to use profiles while new ones can use prefixes:
Look for a prefix being defined for the namespace of your microformat, i.e. in the <head> element, find <link rel="schema.pref" href="your_namespace_URI">, and extract pref. Your prefix is then pref-.
Otherwise, search for the profile URI of your microformat, i.e. <head profile="... your_profile_URI ...">. If you find that, use an empty string, or some other documented default, as your prefix.
Otherwise, do not recognise any special class names for your microformat.
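The three steps above could be sketched roughly as follows. This is only an illustration, assuming Python's standard html.parser module; the names resolve_prefix and _HeadScanner are hypothetical, and for brevity the scan does not check that the <link> elements really sit inside <head>.

```python
from html.parser import HTMLParser
from typing import Optional


class _HeadScanner(HTMLParser):
    """Collects schema.* links and the profile URIs of <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.schemas = {}   # namespace URI -> author-chosen prefix
        self.profiles = []  # URIs listed in <head profile="...">

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            # The profile attribute holds a space-separated list of URIs.
            self.profiles = (a.get("profile") or "").split()
        elif tag == "link":
            rel = (a.get("rel") or "").strip()
            href = a.get("href")
            if rel.lower().startswith("schema.") and href:
                self.schemas[href] = rel[len("schema."):]


def resolve_prefix(html: str, namespace_uri: str,
                   profile_uri: str) -> Optional[str]:
    """Return the class-name prefix to look for, or None to ignore the page."""
    scanner = _HeadScanner()
    scanner.feed(html)
    # Step 1: the author declared a local prefix for our namespace.
    if namespace_uri in scanner.schemas:
        return scanner.schemas[namespace_uri] + "-"
    # Step 2: an old-style page that only lists our profile URI.
    if profile_uri in scanner.profiles:
        return ""
    # Step 3: recognise no special class names at all.
    return None
```

A user agent would then treat a class as its own only if it starts with the returned prefix, so a page declaring rel="schema.g" for the namespace would mark untranslatable text with class="g-notranslate".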
I think that the last option, associating external attributes with elements as CSS does, has the greatest flexibility because the choice of class names remains totally in the hands of the author (as it was before). It even allows authors to select affected content using expressions other than class names. However, it's much more complex to achieve, with significant implications.
To implement this approach, we could simply re-use the syntax and the linking methods of CSS. All we then need to do is define a set of CSS properties for each application (translation, and each microformat), and allow the specialised user agents to detect them.
This raises more issues, but I think they are easily solved:
Who will define all these new properties and prevent clashes? — The application developers will. Each property will belong to a namespace owned by its definer, and authors wishing to use them will define a prefix for each namespace. There will be no need for a central authority beyond the one for DNS.
Namespace prefixes will be defined using the existing CSS syntax, and referenced as if they were vendor prefixes:
@namespace google "http://google.com/css";
.notranslate {
-google-translate: disable;
}
…or preferably they would use CSS3 namespace syntax for type selectors:
@namespace google "http://google.com/css";
.notranslate {
google|translate: disable;
}
Won't regular browsers be downloading lots of useless CSS rules? — CSS rules for different applications could be stored in separate files, and loaded using media queries that only specialised user agents will recognise. These queries will be expressed using media types and features with prefixes chosen by the author to refer to application-specific namespaces:
<link rel = "schema.google"
href = "http://google.com/css">
<link rel = "stylesheet"
media = "-google-translator"
href = "translation-styles.css">
<!-- or -->
<link rel = "stylesheet"
media = "google|translator"
href = "translation-styles.css">
Again, there's no need for a central authority.
CSS is only for style; won't this break the separation
between content and style that is considered so valuable? —
CSS was designed for style, but is it really specific to it? The
syntax for selectors is largely independent of style and, with
vendor prefixes, the media types and features you might use
to link a stylesheet in form an extensible set, so you can
still keep style separated from other aspects. And the fact that
one writes <link rel="STYLESHEET">
is irrelevant — STYLESHEET is just a mnemonic.
Really, style is just one way of interpreting content, and in general, you're linking in an ‘interpretation sheet’, not a stylesheet.
The result is a method of designing completely new ways of interpreting content, not merely rendering it, without having to wait for an authority to standardize it, and while leaving choice of class name entirely to the author, where it should be.
As an example, another mode of interpretation could be the application of an indexer for a search engine. A property could be defined to tell search engines not to index or not to follow certain content within a page:
@namespace search "http://www.w3c.org/2009/search-engines";
.noindex {
search|index: disable;
}
.nofollow {
search|follow: disable;
}
In this case, a consortium of search engines could agree on these properties, without having to affect other assignments within W3C.
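To give a feel for how a specialised user agent might detect such properties, here is a toy sketch in Python. It is not a real CSS parser: it uses regular expressions, ignores comments, nested rules and quoting subtleties, and the function name namespaced_declarations is hypothetical. It only shows that the author's prefix is resolved through the @namespace URI rather than matched literally.

```python
import re


def namespaced_declarations(css, namespace_uri):
    """Return (selector, property, value) for declarations in our namespace."""
    # Prefixes the author has bound to our namespace URI.
    prefixes = {
        m.group(1)
        for m in re.finditer(r'@namespace\s+([\w-]+)\s+"([^"]*)"\s*;', css)
        if m.group(2) == namespace_uri
    }
    found = []
    # Very rough rule matcher: "selector { declarations }".
    for rule in re.finditer(r'([^{}@;]+)\{([^{}]*)\}', css):
        selector = rule.group(1).strip()
        for decl in rule.group(2).split(";"):
            name, colon, value = decl.partition(":")
            prefix, bar, prop = name.strip().partition("|")
            # Keep only "prefix|property: value" with one of our prefixes.
            if colon and bar and prefix in prefixes:
                found.append((selector, prop.strip(), value.strip()))
    return found
```

Because the lookup goes through the URI, an author who writes @namespace s "http://www.w3c.org/2009/search-engines"; and then s|index: disable; would be understood just as well as one who chose the prefix search.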
These styles are only relevant to search engines, so we should prevent regular browsers from loading them in with an appropriate media query:
<link rel="schema.search"
href="http://www.w3c.org/2009/search-engines">
<link rel="stylesheet"
href="search-styles.css"
media="search|robot">
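On the user-agent side, choosing which stylesheets to load could look something like the sketch below. It assumes Python's standard html.parser module; stylesheets_for and _LinkCollector are hypothetical names, and the media attribute is treated as a simple comma-separated list of names rather than as a full media query.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Records the attributes of every <link> element in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            self.links.append({k: (v or "") for k, v in attrs})


def stylesheets_for(html, namespace_uri, media_name):
    """Return hrefs of stylesheets addressed to our application."""
    collector = _LinkCollector()
    collector.feed(html)
    # Prefixes the author bound to our namespace via rel="schema.<prefix>".
    prefixes = {
        link["rel"][len("schema."):]
        for link in collector.links
        if link.get("rel", "").startswith("schema.")
        and link.get("href") == namespace_uri
    }
    wanted = {f"{p}|{media_name}" for p in prefixes}
    return [
        link["href"]
        for link in collector.links
        if link.get("rel", "").lower() == "stylesheet"
        and link.get("href")
        and any(m.strip() in wanted for m in link.get("media", "").split(","))
    ]
```

A regular browser never needs any of this: it sees a media value it does not recognise and is expected to skip the stylesheet.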
namespaces considered harmful — From the Microformats Wiki
Namespaced content on the Web has failed.
Namespacing isn't a goal that can fail or succeed according to popularity or how much it is exploited. It is a technical means to solve a problem, and so would only fail if it technically could not solve that problem. Does it solve it? Yes! Has it failed? No!
[…] in practice people write scrapers that look for namespace prefixes as if they are part of the element name, or perform literal string matches on common namespace prefix uses […], not as mere shorthands for namespace URIs.
People use perfectly good tools the wrong way all the time. There's nothing wrong with the tool, so the solution isn't to throw it away!
Namespaces are actually *not* well supported in sufficient modern browsers[…]
Only the plugins and extensions need to handle namespaces for
class names (as described above). Look through the page's
<link> elements for your URI
and determine the local prefix.
Namespaces encourage people to seclude themselves in their own namespace and invent their own schema rather than reusing existing elements in existing formats. This hurts interoperability because a dozen different namespaces can all have their own slightly different semantics for the same element.
You can prevent that by having a community that looks at how microformats overlap and share structure. Oh, you seem to have one already, which you need anyway to stop name clashes between formats. Format inventors will want to be part of that community in order to get widespread support.
If you want to carry on a theoretical discussion of namespaces, please do so elsewhere, for in practice, discussing them is a waste of time, and off-topic for microformats lists.
Oh, how very open-minded!
microformats principles — Lowering barriers for publishers — From the Microformats Wiki
[…]but it does mean that we ask less of [publishers] than most other standards efforts, which ask publishers to learn new languages, create new files, namespaces etc.
Publishers won't be creating new namespaces, just using them.
[…]humans first, machines second. One aspect of being more human-centric in design is about making it easier for humans in general to publish information in microformats, rather than just making it easier for machines (programs) to parse microformats. This seems like an obvious trade-off in that many fewer humans develop/write parsers than publish content, and thus making publishing easier benefits more people.
Instances of microformats are going to be read (by machines) thousands of times more often than they are written (by humans, and often by machines). And the definition of microformat classes without namespace prefixes (or at least, some sort of author-controlled switch) makes it more difficult for an author to be sure he doesn't stumble on one.