Preface
The following is a story of the Second Law of Thermodynamics* as it applies to human efforts to represent knowledge in ontologies or classification systems...But with a twist...Because knowledge rarely exists in a closed system, because all things are never equal, the best laid plans for orderly arrangements of information disintegrate into disorder. The very structures we build to make Sense of our data turn into Nonsense when, as The Dude (aka Jeffrey Lebowski) would say, "New sh#$ has come to light! Man." And in our unending quest for more information and more knowledge, "New sh#$ is always coming to light...Man" faster than our stiff-legged classification systems can handle. As a result, the systems ultimately reject the new data and what we're left with is a regurgitated mess of bits and pieces strewn across the landscape of your variegated and uncoordinated information gathering and storage devices: email clients, web mail clients, documents, PDAs, paper calendars, sticky notes, notepads, envelopes, napkins, the back of your hand and, last but not least, your brain.
*The Second Law of Thermodynamics states that in a closed system, entropy, a measure of disorder, never decreases. In other words, the end is nigh, so stop filing your email!
What IS there to like about hierarchies?
There's been a lot of hierarchy bashing in the last year and a lot of buzz for more free-form, organic, ground-up, natural a-organizational, self-organizing environments, ecosystems, auras, campfires...primarily in the form of tags or labels (ie. gmail, delicious, flickr), folksonomies and faceted classifications.
Clay Shirky on how Ontologies are Overrated
David Weinberger wrote Taxonomies to Tags: from Trees to Piles of Leaves for Release 1.0
Former Microsoft Development Lead on the original Exchange team writes about how they did it all wrong with hierarchies
We've joined in all the hoopla as well with BrowserDesign
But, just to be contrary, we now find ourselves asking the question, "Well what is there to like about hierarchies?" (and this time, we mean it!) Or, said another way, let's not throw the baby out with the bathwater. Hierarchies can't please all the people, all of the time, but we think they're actually great for a lot of things. They're a powerful tool that needs to be wielded selectively and with great care. The rest of this paper is devoted to:
- Understanding the true nature of hierarchies
- Figuring out how we might employ them such that we exploit their strengths and render their weaknesses irrelevant
Yet another way to make the same point is, "Flatland, faceted classifications, tagsonomies and piles of leaves aren't all that either." They have their flaws and they are numerous and fatal.
...And one last thing, it just so happens that Flatland and Hierarchies make for a great marriage of complementary traits. Where one fails, the other succeeds and vice versa.
2 simple concepts to get before we proceed:
- Items are Discrete units of information
- Containers are Groupings of items, aka Folders, Collections
Definition of Hierarchy
The idea of Hierarchy can be expressed in the following rule and its corollaries:
Rule: A classification structure comprised of fixed parent-child relationships.
- Corollary 1: Items in the hierarchy can only exist in 1 location in the hierarchy
- Corollary 2: There are exactly 2 types of relationships in a hierarchy: Parent-Child and Sibling.
- Corollary 3: Items and Containers in the hierarchy are either wholly within or wholly without another Container in the hierarchy
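The rule and its corollaries can be read as constraints on a data structure. The following is only a minimal sketch under that reading; the class names `Container` and `Hierarchy` and the methods `place`, `path` and `siblings` are hypothetical and are not taken from Chandler or any other application.

```python
class Container:
    """A grouping of items that sits inside at most one parent container."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # the fixed parent-child relationship (the Rule)
        self.children = []
        self.items = []
        if parent is not None:
            parent.children.append(self)

    def siblings(self):
        """Other containers that share this container's parent (Corollary 2)."""
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]


class Hierarchy:
    def __init__(self, root):
        self.root = root
        self._location = {}       # item -> the one container that holds it

    def place(self, item, container):
        """Corollary 1: refuse to file an item in a second location."""
        if item in self._location:
            current = self._location[item].name
            raise ValueError(f"{item!r} already lives in {current!r}")
        container.items.append(item)
        self._location[item] = container

    def path(self, item):
        """Containers from the root down to the item's single location.

        Because every container has one parent, the item is wholly within
        each container on this path and wholly outside all others
        (Corollary 3).
        """
        node, names = self._location[item], []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


closet = Container("Closet")
work = Container("@Work", parent=closet)
gym = Container("@Gym", parent=closet)
h = Hierarchy(closet)
h.place("blazer", work)
print(h.path("blazer"))
```

Trying to file the same item a second time, for example `h.place("blazer", gym)`, raises a `ValueError` in this sketch, which is the rigidity the rest of the paper weighs against flat tagging.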
Example of a well-groomed hierarchy
Imagine that you are an incredibly anal dresser. You fold your clothing with the aid of a straight edge. (I've actually personally witnessed such a thing.) When you launder, you not only separate whites from darks, but as everyone knows, not all darks are created equal and red/oranges, green/blues and yellow/tan/greys are all washed separately as well.
Your organization of clean clothing is even more astounding. The Container Store is your temple and Elfa is the altar at which you worship. Your Closet is a strict 4-level, semantically encoded hierarchy: Occasion>>Mood>>Anatomy>>Layer
- Notice the (small) size of the vertical scrollbar
- Notice the total # of containers in the window...
- Closet_hierarchy.png:
In order to evaluate hierarchies as classification systems for real human beings, we must first lay down what people are actually trying to accomplish when they employ hierarchies in their day-to-day lives.
Organization is essentially the act of gathering stuff into containers OR the chunking of too much stuff into a manageable amount of stuff...
...People sometimes need to continue gathering their containers of stuff into a hierarchy of containers because they have so many containers that they need containers of containers.
Some user goals for organizing (culled from sample sidebars, user interviews and academic user research):
- Provide a high level narrative of the scope and shape of their stuff
- Provide a guided navigation system to explore a particular topic
- Provide means for targeted search and retrieval of individual items
- Provide a way to explore content
- Gather stuff into containers for the sake of gathering the stuff in a container (ie. Playlist)
- Provide easy access to the stuff you need to get at often (ie. Favorites)
- Attach semantics to data
The reason we want to be clear about all of the various reasons people try to organize is to be clear that the reasons are numerous and very different. Oftentimes, when we talk about organizational schemes, the assumption is that one size fits all. That people organize only for one purpose. And that the ideal organizational scheme should optimize for that single purpose.+
The proposition is that Hierarchies are great at reasons #1 and 2. However, because narrative and guided navigation experience are more often than not neglected and unexplored motivations for organization, Hierarchies have been generally underappreciated and unduly derided.
But really, when you think about it, organization, like so many other human activities, is simply our struggle to impose patterns and meaning on an otherwise meaningless pool of data. This search for narrative and coherence amidst disorder and randomness is fundamental to everything we do.
Yet, in today's world of PIM applications, people have virtually given up on trying to get a handle on their information (except for a few staunch email filing holdouts). And when we talk about "getting a handle on things", we don't mean making sure you reply to all your email or are able to find a specific piece of information. We mean understanding what direction things are headed in, where in your life a fire is about to break out, what's getting neglected that you'll regret later, how overloaded you are, where you could be more efficient, and where your priorities are.
+Granted, the Clay Shirky article does talk about the situations in which Hierarchies are useful (ie. Small, fixed data set that is professionally designed). But what he doesn't go into is why people seem to insist on constructing hierarchies, even in non-ideal situations (ie. Large, ever-changing data sets). Trying to figure out the why is the central preoccupation of this paper.
Hierarchies are great at #1: story-telling, precisely because
- they are so limited and therefore unambiguous and
- the nature of hierarchies is to group potentially lots of things into an organizational scheme where only a few things are revealed to you at a time.
Case study #1: Closet hierarchy: The makings of a great hierarchy
Hierarchies are great at #1 IF they are semantically pure, IF each level of the hierarchy has consistent meaning.
Another way to put it: if Hierarchies are organizations of Containers (aka Categories, or groupings of items centered around some user-defined concept; in Chandler-speak, an Attribute value), then encoding semantics into the Hierarchy is the equivalent of assigning a Category type (in Chandler-speak, an Attribute).
Let's go back to our Fashion Nazi's Closet hierarchy: Occasion>>Mood scheme>>Anatomy>>Layer
The 4 levels of hierarchy are really just a means of encoding semantics into the organizational structure. The neatness of the hierarchy further chunks down the 2,000+ containers of containers of containers of containers of stuff into 4 possible types of containers.
Hierarchies are great at #1 IF, furthermore, at each level of the hierarchy, you are presented with the full range of options for that semantic level.
- Occasion: Sleepwear, @Home, @Gym, Casual, @Work, Formal, Inaugural
- Mood: Martha Stewart pastels, Teletubby brights, J-Lo bling, Boardroom darks, Church lady dours
- Anatomy: Tops, Bottoms, Head, Shoulders, Arms, Wrists, Hands, Torso, Hips, Legs, Ankles, Feet
- Layer: Underthings, Indoor layer, Overlayer, Outdoor layer.
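As a rough check on the "2,000+ containers" figure, the four levels listed above can be multiplied out. This is a sketch that assumes the tree is fully populated, i.e. every Occasion contains every Mood, every Mood every Anatomy container, and so on; the names `LEVELS` and `containers_per_level` are illustrative only.

```python
from math import prod

# Level names and options copied from the list above.
LEVELS = {
    "Occasion": ["Sleepwear", "@Home", "@Gym", "Casual", "@Work", "Formal", "Inaugural"],
    "Mood": ["Martha Stewart pastels", "Teletubby brights", "J-Lo bling",
             "Boardroom darks", "Church lady dours"],
    "Anatomy": ["Tops", "Bottoms", "Head", "Shoulders", "Arms", "Wrists",
                "Hands", "Torso", "Hips", "Legs", "Ankles", "Feet"],
    "Layer": ["Underthings", "Indoor layer", "Overlayer", "Outdoor layer"],
}


def containers_per_level(levels):
    """Number of containers at each depth of a fully populated tree."""
    sizes = [len(options) for options in levels.values()]
    return [prod(sizes[: depth + 1]) for depth in range(len(sizes))]


per_level = containers_per_level(LEVELS)
total = sum(per_level)
print(per_level, total)  # [7, 35, 420, 1680] 2142
```

Under that assumption there are 2,142 containers in all, yet only 4 container types (one per level) for the reader to keep in mind.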
Below is a graphical representation of the Anatomy level of the Closet Hierarchy. Again, the ideal is to create a level that is exactly 1-dimensional, filled out from end to end with no overlap.
- Filled out end-to-end means that there is never any worry that you are missing out on some cache of misfiled stuff (ie. I see email organized by Monday, Tuesday, Thursday, Friday, Saturday and Sunday, where's Wednesday's stuff?)
- People oftentimes deduce the right answer by eliminating all the obviously wrong answers down to the one that sounds the least wrong (ie. SATs). As a result, not being able to see all of the answers can be paralyzing.
- This is why Apple menus don't change depending on context. All of the options are there for you to ponder, whether or not they apply to the view, so that you can "deduce" the right answer. For example, in the Mailbox menu below from Apple Mail, the user might not know that the correct menu item for Stop Syncing is Go Offline without the aid of the greyed-out Go Online, which helps them remember how they "Started Syncing" in the first place.
- Mailboxes_menu.png:
- No overlap means that there is never any doubt as to which container you should look in to find a particular item
- color_spectrum.png:
However, as you can see from the graph, not even the Closet Hierarchy achieves this goal, as there is both clothing designed exclusively for shoulders (ie. mink stole) and wrists (ie. wrist bands) as well as clothing that covers multiple sections of the body at once (ie. shirt, pants).
- 1-D_graph.png:
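The two tests for a level, "filled out end-to-end" and "no overlap", can be sketched as a coverage count along the level's single dimension. The function `check_level` and the way the Tops and Bottoms spans are drawn here are assumptions made for illustration; the source only says that shirts and pants cover several body sections at once.

```python
from collections import Counter


def check_level(dimension, categories):
    """Return (gaps, overlaps) for categories laid over a 1-D dimension.

    dimension:  ordered list of points along the dimension
    categories: mapping of category name -> list of points it covers
    """
    coverage = Counter(p for span in categories.values() for p in span)
    gaps = [p for p in dimension if coverage[p] == 0]
    overlaps = [p for p in dimension if coverage[p] > 1]
    return gaps, overlaps


BODY = ["Head", "Shoulders", "Arms", "Wrists", "Hands",
        "Torso", "Hips", "Legs", "Ankles", "Feet"]

# One container per body section: a clean, end-to-end level.
clean = {name: [name] for name in BODY}

# Adding Tops and Bottoms (assumed spans) creates overlap.
messy = dict(clean,
             Tops=["Shoulders", "Arms", "Torso"],
             Bottoms=["Hips", "Legs"])

print(check_level(BODY, clean))   # ([], [])
print(check_level(BODY, messy))   # ([], ['Shoulders', 'Arms', 'Torso', 'Hips', 'Legs'])
```

The same check reports a gap when a container is missing: a week of email folders without Wednesday would return `['Wednesday']` as the first element.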
Hierarchies are great at #1 IF, furthermore, the categories are arrayed in some meaningful order. This enhances the "complete spectrum" effect, so that you really feel secure that all of the bases have been covered (ie. Top to Bottom, Sunday-Saturday as opposed to just alphabetical order).
Here is an alphabetical arrangement of the days of the week:
- Friday
- Monday
- Saturday
- Sunday
- Thursday
- Tuesday
- Wednesday
versus a time-based arrangement of the days of the week:
- Sunday
- Monday
- Tuesday
- Wednesday
- Thursday
- Friday
- Saturday
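The difference between the two arrangements comes down to the sort key. A small sketch, with the hypothetical helper `missing_days` and the Wednesday-less folder list from the email example earlier:

```python
WEEK = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]

# Alphabetical order scatters neighbouring days.
alphabetical = sorted(WEEK)

# The email folders from the earlier example, with Wednesday missing.
folders = ["Monday", "Tuesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Sorting by position in the week restores the meaningful order.
by_time = sorted(folders, key=WEEK.index)


def missing_days(present):
    """Days of the week that have no folder."""
    return [day for day in WEEK if day not in present]


print(alphabetical)
print(by_time)
print(missing_days(folders))  # ['Wednesday']
```

In the time-based listing the jump from Tuesday to Thursday is visible at a glance, whereas in the alphabetical listing the missing Wednesday would simply shorten the end of the list.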
The result of all of this chunking and containering is that you end up with an eagle-eye view of the contents of your Closet, which in turn provides you with
- a pretty coherent narrative of the contents of your Closet
- a guided navigation system for targeted retrieval of stuff: Where did I put that diamond-encrusted evening gown?
Case study #2: Dewey Decimal System
The top level of the Dewey Decimal Classification (DDC) system is yet another example of how chunking information along a single dimension can be an effective way to communicate a coherent narrative about data.
In the graph below, I've laid out the DDC along the "Foof factor" dimension where Foof is a cross between Froofy and Poofy. The distribution of topic areas tells us the following story about the contents of Libraries:
- The bulk of writing lies in the middle of the curve in the soft sciences and humanities
- There is considerably less writing at the ends of the spectrum: hard sciences and the arts
- This makes sense since you could say that the primary by-product of the soft sciences and the humanities is expository (explanatory) writing whereas the hard sciences and the arts are more concerned with creating "things" as opposed to writing about things: ie. theorems, technologies or works of art.
- 03_DDC_Bell_Curve.png:
Hierarchies are great at #2: Providing a guided navigation experience precisely because they are so great at #1
- Items are chunked into containers, containers are chunked into container types. All of this chunking provides you with a narrative of your data.
- The semantically pure levels of the hierarchy provide you with a roadmap of the series of decisions you will need to make at each level of the hierarchy
- The semantically consistent levels mean that the series of decisions is always the same no matter which branch of the tree you explore
- Each level of the hierarchy is completely filled out, so you're never afraid that you might be missing out on something
Hierarchies are great at #2 because they help you get the job done in the fastest, most efficient and most effective way.
As it turns out, our Fashion Nazi has a special gift for constructing hierarchies. The Closet hierarchy is exceptionally designed for the following reasons:
In addition to the semantic purity, the Fashion Nazi has gone one step further to ensure that the semantics themselves are nested in the optimal order
- where optimal is defined as the ability to eliminate as much stuff as possible at each decision point in the hierarchy
- where decision point is defined as the point at which Fashion Nazi must choose a particular Closet container in order to travel to the next level of the Closet hierarchy
- where the Fashion Nazi is optimizing for getting dressed in the morning
At every level of the hierarchy, Fashion Nazi (FN) wants to encounter only viable options of dress. Therefore, it makes most sense to define the top levels of the hierarchy with semantics such that FN can quickly eliminate most of the containers and only feel obligated to delve into 1 or 2. Anatomy and Layer would be bad examples of what to put at the top of the hierarchy. In order to get fully dressed, FN must dip into every Anatomy container and, depending on the season (Summer, Spring/Fall or Winter), nearly every Layer container.
Right off the bat, FN cannot eliminate any branches of the tree. What's more, as he dips into each Anatomy container and then each Layer container, he is confronted with the very thing he expressly did not want from the outset: a whole array of unviable options of dress. For each Body part and Layer, he must repeatedly select from an array of apparel for every Occasion and then for every Mood. OR worse, in the monotony of deciding upon Occasion and Mood over and over again, FN accidentally makes inconsistent Occasion and Mood selections and ends up at the gym with J-Lo bling bunny slippers, Teletubby red track pants and a formal Boardroom blazer.
A different way to put it is that Fashion Nazi never wants to make the same decision twice. With Anatomy and Layer at the top of the tree, you're presented with the Occasion and Mood options over and over again: 48 times to be exact. With Occasion and Mood at the top of the tree, you only make those decisions once.
In contrast, Occasion and Mood are great examples of what to put at the top of the hierarchy. Fashion Nazi always knows what occasion he's dressing for and it's usually only 1 occasion. Fashion Nazi is not always sure about his mood, but he can usually narrow it down to 1 or 2. These days, it's been more often J-Lo bling and less often Martha Stewart pastels.
Right off the bat, 86% of the tree vanishes (6 of the 7 Occasion branches) and once FN has figured out what kind of mood he's in, he is only ever presented with viable options of dress: Clothing for every part of the body and every layer of dress that is appropriate for the occasion he is attending and the mood he is in.
In other words, not only do hierarchies chunk data down into containers and then chunk the containers down into container types, they proceed to prioritize the ordering of those container types by way of the fixed parent-child structure to help guide you towards your goal.
- In the Closet hierarchy, you only need to make the Occasion and Mood decisions once and then you're on your way...
- Closet_tree.png:
- In the Inverted hierarchy where Anatomy and Layer are at the top of the tree, you must make the Occasion and Mood decision 12x4=48 times!
- Inverted_tree.png:
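The 86% and 48x figures can be reproduced with a little arithmetic. The sketch below simplifies the scenario: it assumes FN picks exactly one Occasion and one Mood but must open every Anatomy and Layer container, and that the tree is fully populated. The names `SIZES`, `VISIT_ALL`, `decisions_faced` and `pruned_at_top` are hypothetical.

```python
# Number of options at each level, counted from the lists earlier in the paper.
SIZES = {"Occasion": 7, "Mood": 5, "Anatomy": 12, "Layer": 4}

# Levels where FN must open every container to get fully dressed (assumption).
VISIT_ALL = {"Anatomy", "Layer"}

CLOSET = ["Occasion", "Mood", "Anatomy", "Layer"]
INVERTED = ["Anatomy", "Layer", "Occasion", "Mood"]


def decisions_faced(order, visit_all):
    """How many times FN is confronted with each level's set of options."""
    faced, paths = {}, 1
    for level in order:
        faced[level] = paths
        if level in visit_all:
            paths *= SIZES[level]   # every option is opened, so paths multiply
        # otherwise one option is chosen and the other branches are pruned
    return faced


def pruned_at_top(order):
    """Fraction of the tree eliminated by the very first decision."""
    top = order[0]
    return 0.0 if top in VISIT_ALL else 1 - 1 / SIZES[top]


print(decisions_faced(CLOSET, VISIT_ALL))
print(decisions_faced(INVERTED, VISIT_ALL))
print(round(pruned_at_top(CLOSET) * 100))   # 86
```

With the Closet ordering, the Occasion and Mood choices are each faced once; with the inverted ordering they are each faced 12 x 4 = 48 times, and nothing is pruned by the first decision.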
The end result is an efficient and secure browsing experience
- The hierarchical organization guides FN towards a coherent outfit, where each piece of clothing is appropriate for the occasion he is attending and conveys a consistent mood.
- As mentioned above, the filled out spectrum of possibility at each level of the hierarchy gives Fashion Nazi the confidence he needs to know that he hasn't overlooked some crucial cache of clothing (ie. underwear).
That's why hierarchies are good at Guided Navigation, because they help you find the right things in the right way.
Continue to Part 3 of 3:
HierarchyVersusFacetsVersusTags
Comments
four thoughts:
(1) on the definition of hierarchy:
Corollary 2: There are exactly 2 types of relationships in a hierarchy: Parent-Child and Sibling.
also: if it's a hierarchy you're viewing in an outline-ish form, another sneaky information-carrying relation can be wedged in: an ordering relationship among siblings:
- things to buy
- food!!
- tickets
- silly entertaining things
Order here encodes for importance -- most important first. This isn't a pure hierarchy, but that's how we often see them. (Automatic alphabetical order therefore takes away such information)
just as the ordering of items through the Parent-Child relation lets people understand that things on top have certain properties lower ones don't (e.g., are more general, encompass children), the ordering among siblings can stand in for certain properties. The only such sibling-order-encoded properties I can think of at the moment: importance, precedence, the order you created them
(2) "hierarchies are good at narratives/story-telling" -- that could also mean more simply, "explanation/understanding", right? [sorry, i'm overly suspicious of metaphorical uses of the term "story", please forgive]
(3) I think a strength of exclusive membership you're getting at is this: if an item lives in one place, you then know it doesn't live in all the other places.
If there's 5 bins: {A, B, C, D, E} and you know an item I lives in D, then by exclusive membership you know it does NOT live in the other 4. In fact, you know it isn't related to the other four categories.
But under a tag-soup system (tagsonomy), all you know is that it's in there. you have the knowledge Related(D, I)
But with exclusive membership, you have 4 more pieces of information:
- ~Related(A, I)
- ~Related(B, I)
- ~Related(C, I)
- ~Related(E, I)
So exclusive membership is powerful, but might be too rigid. If you introduce a hierarchy of bins, this allows one very specific loophole out of exclusive membership: an item belongs/is-related to not only the bin it's directly in, but also all of that bin's superbins. To allow this there's a very strict constraint: ALL items in a sub-bin must have that dual membership in the parent bin. (a subset relationship: forall I: [Related(subbin, I) => Related(parentbin, I)])
A: Work
|
|--B: Project1
| |
| |-C: contacts
| |
| |-D: calendar items
|
|-E
so if I is in C, by the usual exclusive reasoning we know it's not in D or E. But hierarchy loopholes us out of exclusive membership, and both A and B are also related to I. [this system is useful even with semantically mixed categories..]
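A rough sketch of that inference, using the A to E tree drawn above. The `parent` mapping and the function names `related` and `not_related` are made up for illustration.

```python
# Each bin's parent, read off the tree above (A is the root).
parent = {"A": None, "B": "A", "C": "B", "D": "B", "E": "A"}


def related(bin_name):
    """The bin an item lives in, plus all of that bin's superbins."""
    chain = []
    while bin_name is not None:
        chain.append(bin_name)
        bin_name = parent[bin_name]
    return chain


def not_related(bin_name):
    """Every other bin: exclusive membership rules these out."""
    return sorted(set(parent) - set(related(bin_name)))


print(related("C"))      # ['C', 'B', 'A']
print(not_related("C"))  # ['D', 'E']
```

Knowing only that I is in C yields three positive facts and two negative ones, where a flat tag would yield one positive fact and nothing else.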
This is a whole lot of implications from a hierarchy. If your data fits it, you're golden. If not, bad news...
(4) another (more obvious) strength of hierarchies: they describe aspects of not just items, but also the categories themselves.
work
|
|--project1
|--project2
Personal
|-family
|-friends
project1 is related to work in a way family and friends are not. The existence of this outline is a tiny bit useful, even without thinking of emails or whatever being contained inside any of the categories.
--
BrendanOConnor - 17 Jul 2005
Point #2: The use of the word story: I actually mean to use it for the specific reason that stories are linear. They dictate an ordered experience of something, whereas explanations could be multi-dimensional.
Point #4 is very interesting. In a faceted system, you might say that just as you would want to describe tags or attribute values with some kind of metatag (facet) or attribute value type (attribute)...you might want to go one step further and describe the metatag itself.
So Family is perhaps a sphere of life along with Friends and Work. But the Family and Friends spheres share the characteristic of being Personal.
In other words, hierarchical facets...which is what the presentation on Tuesday is going to be about :o)
--
MimiYin 18 Jul 2005
it's annoying to have a joint email/wiki discussion, so here's a copy-and-paste:
From: Brendan O'Connor
To: Mimi Yin
Cc: OSAF Development
Subject: [Dev] Re: Wiki pages for Virtuality presentation on Tuesday
Date: Sun, 17 Jul 2005 15:47:18 -0700
Mimi, I was confused about what a faceted system actually is. After
reading what you wrote, my interpretation is that it's when
(1) your data model supports "key: value" pairs attached to items
(2) your UI does sorts and queries based on these key:value pairs.
By that understanding, iTunes is a faceted system with (1) Artist:,
Title:, Album: keys, and (2) a UI that builds queries via navigation with
these keys.
Then "tagsonomy" or "tag soup" would mean
(1) your data model supports "value" tags attached to items. [or, it
supports key:value, but you always use the same key]
(2) your UI does sorts and queries based on these "value" tags.
By this understanding, GMail labels, or any flat non-exclusive-membership
category system, are these key-less "value" tags.
Are these useful definitions? Since I don't think I understand what you
mean, I just want clear definitions so I know I'm not misunderstanding
things.
Brendan
--
BrendanOConnor - 18 Jul 2005