07 February 2011

Hunting the Spotted Hiybbprqag Mountweazel: The Google / Bing Thing

Last week, Google claimed Bing has been stealing its search results, sparking a rather public tussle over legitimate uses of clickstream monitoring. While the debate itself has been interesting, I'm more struck by the way in which Google discovered the supposed theft:
We created about 100 "synthetic queries"—queries that you would never expect a user to type, such as [hiybbprqag]. As a one-time experiment, for each synthetic query we inserted as Google’s top result a unique (real) webpage which had nothing to do with the query. [...]

To be clear, the synthetic query had no relationship with the inserted result we chose—the query didn’t appear on the webpage, and there were no links to the webpage with that query phrase. In other words, there was absolutely no reason for any search engine to return that webpage for that synthetic query. You can think of the synthetic queries with inserted results as the search engine equivalent of marked bills in a bank.

We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar.

We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted. We were surprised that within a couple weeks of starting this experiment, our inserted results started appearing in Bing. Below is an example: a search for [hiybbprqag] on Bing returned a page about seating at a theater in Los Angeles.
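For the programmatically inclined, the logic of the trap Google describes can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Google's actual method: the function names, the example.com URLs, and the dictionary standing in for another engine's results are all hypothetical.

```python
import random
import string


def make_synthetic_query(rng, length=10):
    """Return a random lowercase string that a real user is unlikely to type."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def plant_traps(rng, pages, n):
    """Pair n unique synthetic queries with unrelated (real) pages."""
    traps = {}
    while len(traps) < n:
        # A repeated query simply overwrites its entry, so the loop
        # keeps going until n distinct queries exist.
        traps[make_synthetic_query(rng)] = rng.choice(pages)
    return traps


def find_leaks(traps, other_engine_results):
    """List the trap queries for which another engine returns the planted page."""
    return sorted(
        query
        for query, planted in traps.items()
        if planted in other_engine_results.get(query, [])
    )


# Hypothetical usage: plant 100 traps, then compare against results
# observed elsewhere (here an empty stand-in, so nothing has leaked).
rng = random.Random(2011)
pages = ["https://example.com/theater-seating", "https://example.com/recipes"]
traps = plant_traps(rng, pages, 100)
leaks = find_leaks(traps, other_engine_results={})
```

Because no query has any natural connection to its planted page, every entry in `leaks` is a marked bill turning up somewhere it should not be.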

Why's this so interesting? Producers of dictionaries and maps used more or less the same technique to catch copyright infringers in the 19th and 20th centuries. Often called Mountweazels, after the fictitious entry for Lillian Virginia Mountweazel in the New Columbia Encyclopedia, these traps allowed publishers to identify violations of their copyright.

It's easy to see why dictionaries and maps might need Mountweazels. Reference works intend to be informative, with short, functional prose; and maps of course should all show the same streets in the same location, if not the same information about those streets. Since dictionaries and maps -- if they're any good -- should accord with all other dictionaries and maps, it's tempting for publishers to filch each other's work.

In other words, publishers construct, then disseminate fictions -- fictions which often take on a life of their own -- in order to protect the actual and real content of a work, as well as its reputation as purveying "true" information. Mountweazels are thus a beautiful confluence of medium and message, form and genre, literature and copyright law; they show how world and word weave together across time and through different platforms. Google's "hiybbprqag" doesn't quite have the same charm as, say, esquivalience, defined in the 2001 edition of The New Oxford American Dictionary as "the willful avoidance of one's official responsibilities," or Guglielmo Baldini, my favorite fake Italian composer -- but with time, it may become as notorious, taking on a meaning of its own. Whether Google intended it or not, it has fabricated an entity that acts in and on the world.

A few Mountweazels, trap streets, and mistaken entries:
  • Dord, a word erroneously added to the 1934 edition of Webster's New International Dictionary after a slip reading "D or d, cont./density" was misread as "Dord," meaning "density."
  • From the 1975 New Columbia Encyclopedia: "Mountweazel, Lillian Virginia, 1942-1973, American photographer, b. Bangs, Ohio. Turning from fountain design to photography in 1963, Mountweazel produced her celebrated portraits of the South Sierra Miwok in 1964. She was awarded government grants to make a series of photo-essays of unusual subject matter, including New York City buses, the cemeteries of Paris and rural American mailboxes. The last group was exhibited extensively abroad and published as Flags Up! (1972) Mountweazel died at 31 in an explosion while on assignment for Combustibles magazine."
  • Guglielmo Baldini appeared in the 1980 New Grove Dictionary of Music and Musicians, along with Dag Henrik Esrum-Hellerup. Baldini had made his fake debut a century earlier in the work of German musicologist Hugo Riemann.
  • Beatosu and Goblu, Ohio were both added to the official state of Michigan map in 1978-9. Take a closer look at the names: they refer to the University of Michigan ("Go Blue!") and its rivals from Ohio State University ("Beat OSU!"). The chairman of the State Highway Commission, a U of M alumnus, inserted the fake towns -- mostly for fun, but possibly with trap street intentions. More on copyright traps in maps here.


Sarah Werner said...

I love this! I suspect that hiybbprqag might have a nice future, if only because it is such a horrible, impossible mouthful. In fact, I suspect it's a verb longing to be set free: perhaps we should hiybbprqag this post?

I also like this quote from Erin McKean in the New Yorker piece about esquivalience: "As for “esquivalience”’s excesses, McKean made no apologies. “Its inherent fakeitude is fairly obvious,” she said. “We wanted something highly improbable. We were trying to make a word that could not arise in nature.” Indeed, “esquivalience,” like Lillian Virginia Mountweazel, is something of a maverick. “There shouldn’t be an ‘l’ in there. It should be esquivarience,” McKean conceded. “But that sounds like it would mean ‘slight differences between racehorses.’”"


That use of "fakeitude" is just fabulous.

Whitney said...

Great quote!

I agree hiybbprqag will have a nice future -- in fact, it's the perfect web-born Mountweazel, since it's all but unpronounceable in English, and so much online communication is through text rather than voice. (I've chatted with you lots, but never heard you speak!)

John McVey said...

Thought of you and Mountweazel when the Google/Bing contretemps surfaced. Might we think of a Mountweazel as a species of literary genre? It's certainly creative lexicography!

I should be searching for Mountweazels in 19th-century telegraphic codes...

Whitney said...

A telegraphic code mountweazel would be *amazing* -- it's just the kind of book that would need one!

Although if different companies produced different books, I suppose the whole content of any given codebook could be considered a kind of mountweazel. Definitely a genre that needs more investigation.

Mediterranean kiwi said...

A most interesting post - I should try writing a Mountweazel myself on my own blog. I write about the cuisine of the Mediterranean island of Crete, and often find myself using a local Greek word that I have transliterated into English (as my blog is only in English). Searching for these novel words through Google, I often find that my own blog posts come up more often than those of other writers, and I'm always wondering whether I am creating a new English word, or disseminating an incorrect idea about a transliterated Greek word...

Terrence Lambard said...

I read that these are lapses within the semantic search facet of Google's supposed algorithm. In the SERPs, Google normally offers word or spelling recommendations.
