So here we are at last. What happens next with this blog, I wonder? I’ve been giving it some thought, but I’ll leave that for another post.
For the moment, we’re discussing text visualization. There is a meta-visualization element inherent to text visualization, since text is, after all, already a symbolic representation. If you go back far enough in human history, to pictograms, hieroglyphics and their ilk, the difference between writing and visualization disappears entirely. There is something philosophical to be said here about semantics and symbolic meaning, but I’ll save that for a late-night conversation over a bottle of wine (or two).
As it is, text visualizations face the double-edged sword of visualizing data that already carries its own meaning. When your glyph is a sphere, you have a lot of flexibility in how you use it, because it is already so abstract. When your glyph is the word ‘methamphetamine,’ however, you face the challenge of using a symbol that is not only unwieldy but also carries more information and associations than any ordinary glyph. I think this is the main reason that both the best and the worst visualizations I’ve ever seen were, at least partially, text visualizations. Deciding what is important in the data also becomes quite difficult with subjective data such as a large body of text. Word counts can be revealing, but they are only an indirect way of getting at the variable of real concern, which is semantic meaning. Both of the visualizations we looked at this week handled these issues admirably, and I was impressed overall. That such intelligent and complex visualizations were still so flawed speaks more to the difficulty of the task than to specific problems with the visualizations themselves.
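To make the word-count point concrete, here is a minimal Python sketch (my own illustration, not taken from either paper) that tallies word frequencies with the standard library’s `collections.Counter`. The function name `word_counts` and the tokenizing regex are assumptions for the example. Two sentences with opposite meanings produce identical counts, which is exactly the gap between frequency and semantic meaning.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text, pull out runs of letters/apostrophes, and tally them."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


a = word_counts("The dog bit the man.")
b = word_counts("The man bit the dog.")

print(a == b)            # True: same counts, opposite meanings
print(a.most_common(1))  # [('the', 2)]: the "top" word says nothing about the content
```

Any visualization built purely on these tallies (a word cloud, say) would draw both sentences identically.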
DocBlocks, a tool for visualizing the content of legislation, primarily focused on this problem of semantic meaning. Using a learning program, it attempted to classify sections of legislation by subject area regardless of the overall subject of the bill (for instance, locating a gun rights provision in a credit card bill). Though the authors assert that their categorization program works ‘well,’ they also point out that it often assigns sections to multiple categories or fails to give any clear category at all. Since the authors intend to make the tool publicly available, this makes me wonder why they don’t adopt a wiki model: use the program to flag uncharacteristic passages within a bill (which it can already do) and let users categorize those sections themselves.
In terms of the actual visualization of the data, the stacked lines were serviceable, though nothing spectacular. Multiple categorizations could become cluttered quickly, but the combination of a search bar and an intuitive zoom tool would be incredibly useful. If this is going to be a public tool, it would also help to make other users’ comments on specific sections accessible through the visualization, perhaps as a tooltip.
Parallel text clouds is a really impressive project. The number of different (and original) visualizations linked together in a highly interactive way is prodigious. It takes nearly eight minutes in this video just to gloss over all of the viz’s capabilities. All of this was very shiny, but by the end I couldn’t help feeling that I had just used a supercomputer to solve a sudoku. The tools were all useful, but the amount of information in the word cloud itself, while interesting, was relatively sparse. It strikes me that cases could be categorized by subject based on the relevant case law they cite, and the same data could then be presented more legibly as a series of simple bar graphs.
Words already carry so much meaning that visualizing them without reducing them to abstractions or numbers is more likely to create a mess than a masterpiece. Text clouds are a perfect example. Though they are the most common form of text visualization, they are notoriously useless for actually extracting relevant information. The text visualizations I’ve seen succeed are ones like Ben Fry’s On the Origin of Species, which charts the changes and additions to the text across different editions. The major difference here is that the semantic meaning of the words is irrelevant to the visualization itself. The text is, of course, available directly from the viz, so insights gained from the visualization can easily be applied back to the text. Visualizing the structure of a text is simply much more tractable than visualizing its meaning. We’ve spent thousands of years creating systems of writing that can convey information too complex to put into pictures. In text visualization, to a certain extent, we’re trying to reinvent the wheel.
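As a rough illustration of visualizing structure rather than meaning, the sketch below compares two editions of a passage word by word using Python’s standard `difflib.SequenceMatcher` and lists which spans were inserted, deleted, or replaced. This is not Ben Fry’s implementation; the function name `edition_changes` is hypothetical, and the two “editions” are invented strings, not quotes from Darwin. A real viz would map these spans onto color or position instead of printing them.

```python
import difflib


def edition_changes(old: str, new: str) -> list[tuple[str, str, str]]:
    """Compare two editions word by word and list what changed.

    Returns (operation, old_words, new_words) tuples for every span that
    differs; identical ('equal') spans are skipped. The comparison only
    looks at word identity and order, never at what the words mean.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append(
                (op, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


# Hypothetical editions, invented for illustration only.
first = "species are produced by natural selection"
second = "species are produced chiefly by natural selection"

for change in edition_changes(first, second):
    print(change)  # ('insert', '', 'chiefly')
```

The design choice mirrors the argument above: because the diff treats words as opaque tokens, the output is unambiguous, and the reader is left to supply the meaning from the text itself.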