Semantics: Hot Again?
January 11, 2011
We hear this each year: Semantics will be hot in [fill in the year].
Kazeon’s “The Future eDiscovery Arms Race: It is all about the Semantics” investigates where the eDiscovery market must go in order to handle the growing volume of electronically stored information (ESI) while improving the efficiency, accuracy and reliability of the search result process. After posing this question, where does Kazeon end up? At the doorstep of Semantics Future Institute.
Those who only speak English may not realize what a complicated language we really have. Words used regularly often carry several meanings, at times even four or more definitions to a single word. This is known as polysemy, and it throws a huge wrench in the search method.
We are finally beginning to see some workarounds to this issue. One of the major players is Latent Semantic Analysis, which plainly “searches documents for themes within the language usage and extracts the concepts, which are common to the documents,” helping to alleviate false positive results. Other aids include Word Sense Disambiguation, which focuses on word meaning rather than merely matching character strings, and Local Co-Occurrence Statistics, which counts how often sets of terms appear together within a predetermined period.
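To make the co-occurrence idea concrete, here is a minimal sketch in Python. It is not Kazeon's method. It assumes the "predetermined period" can be read as a fixed window of tokens, and the function name, the window size, and the sample sentence are all made up for illustration.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=3):
    """Count unordered term pairs that fall inside the same token window.

    Assumption: the window is measured in tokens; a real system might use
    sentences, paragraphs, or some other span instead.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical sample text showing two senses of "bank".
text = "the bank approved the loan while the river bank flooded"
counts = cooccurrence_counts(text.split(), window=3)

# Nearby neighbors hint at which sense of "bank" is in play.
print(counts[("bank", "river")])
print(counts[("bank", "loan")])
```

With a window this small, "bank" pairs with "river" but not with "loan", which is the sort of signal a disambiguation step could build on.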
These tools will no doubt be helpful in refining search techniques, but our main question is, where was Kazeon five years ago when the semantic buzz began?
Sarah Rogers, January 11, 2011
Freebie
Mastering the Android OS Universe
January 10, 2011
Google’s quest for world domination of the smartphone operating system universe may be thwarted, says the ZDNet article “Android’s Biggest Worry: Fragmentation.” Despite growing market share and popular applications, Android is fragmented by many complex issues, including OEM software, carrier applications, and multiple operating system versions.
“Android is growing, but it’s also growing complexity at the same time. Device fragmentation is not the issue, but rather the fragmentation of the ecosystem. So many different shops, so many different models. The carriers messing with the experience again. Open but not really open, a very Google-centric ecosystem,” says Peter Vesterbacka, one of Rovio’s founders and an Angry Birds developer, in a Tech N’ Marketing interview.
My money’s on Google. They’ve managed to conquer (or at least become a top contender in) the vast and complex information world, making search easy and effortless. Perhaps they can do the same for Android? Is Google confident that Android fragmentation is a trivial problem?
Christina Sheley, January 10, 2011
Freebie
Nuggets: Real or Fake Gold?
January 10, 2011
Xoogler Daniel Tunkelang wrote a short item back linking to his earlier write up about information nuggets. You may want to take a look at “Exploring Nuggetize”. The illustration shows how the “nugget” method converts Noisy Channel articles into what are digital Post It notes with the key points extracted from the source. In the “Exploring Nuggetize” article there are references to facets, snippets, and search.
The key point in “Exploring Nuggetize” in my opinion was:
The nuggets are full sentences, and thus feel quite different from conventional search-engine snippets. Conventional snippets serve primarily to provide information scent, helping users quickly determine the utility of a search result without the cost of clicking through to it and reading it. In contrast the nuggets are document fragments that are sufficiently self-contained to communicate a coherent thought. The experience suggests passage retrieval rather than document retrieval.
Overall I am okay with the notion of nuggets and the highlighting of Dhiti and its Dive service. You can learn more about both at http://dhiti.com.
What caught my attention was the response by Dhiti in the comments section to the follow on write up “Enabling Exploratory Search with Dhiti”. The question Dhiti answered was related to the user’s behavior when the Dhiti “nuggetizing” widget is implemented on a blog.
Here’s the comment. Please, check the original here because I have trimmed the remarks for this post. Emphasis added by Beyond Search as well:
We [Dhiti] observe the following patterns…:
1) The widget does contribute to increased engagement. We see about 5-10% of readers “interact” with the widget, either to click through on an article… About 60% of the interactions are clicks on articles.
2) We notice that there’s a higher probability of readers reading the articles fully than normal…
3) We observe search referrals interact a lot more with the widget…. So there is more likelihood for exploration.
4) When a search query brings traffic to a page, Users … want to explore the site more for the same query!
5) Through the pivots, the publisher gets to know what their readers [are] … interested to explore around….
6) The pivots also provide cues to the publisher to create reference pages (like Wikipedia) …
Several observations:
First, “nuggets” is probably the wrong metaphor for this type of “informed extraction.”
Second, the approach offers some useful opportunities for metrics about a blog reader’s behavior. My reaction was, “Ah, something more useful than AdSense clicks or traditional log files.”
Third, the company has a good idea, is small with “three co-founders,” and is based in Bangalore. Good idea, and I have a hunch some of the big outfits in the world of search may be thinking about this function.
Stephen E Arnold, January 10, 2011
Freebie
Expedia and Filtered Search Results
January 10, 2011
Expedia must have learned how to throw temper tantrums from infamous JetBlue flight attendant Steven Slater. After American Airlines pulled out of online travel booking site Orbitz, competitor Expedia began hiding the airline’s results in its own searches.
Techdirt reports in “Expedia Against ‘Search Discrimination’… Unless It Gets to Do the Discriminating” that this response is rich considering Expedia is a leading member of FairSearch, a group dedicated to fighting search discrimination (mainly that posed by Google’s acquisition of ITA Software).
Here’s a snippet: “So, just as Expedia, in an attempt to complain about Google, claims it’s against search providers discriminating by manipulating results to promote or punish certain players, it’s doing so in a way that’s significantly more noticeable than anything Google is doing…To complain about this exact form of discrimination, while doing it in a way that’s much more noticeable than the one you’re complaining about? That’s pure, unadulterated hypocrisy.”
Is this the end of objective search results as we know them? Maybe, but when people run a search on one of today’s big systems, data not in the result set may not be missed. Will consumers know to run a separate query to locate the “missing” information? Whom does filtering “help”?
Christina Sheley, January 10, 2011
Freebie
A Theory of Android Stickiness
January 8, 2011
Can mathematics, specifically Metcalfe’s Law, be used to explain the assumption that mobile phone users will stick with a particular platform like Android for a lifetime? The recent Asymco.com blog post “How Sticky is the Android” makes this attempt.
After a lengthy explanation, author Horace Dediu surmises that mathematically, stickiness can be derived from the equation “value of a platform is K n log(n), where K is the stickiness of sunk costs.” He goes on to say that “in the end it’s not just about how big the user base n is (which is the only thing that is measurable), it’s how contiguous n is and how compelling the content,” making these important factors in keeping individuals engaged with a particular platform.
A must read for those interested in a more theoretical explanation of consumer behavior. There have been anecdotes about the “value” of the iPhone to AT&T. These rumors pivot on the data consumption of an iPhone user compared to a user of a BlackBerry or other mobile device. Stickiness may be partly defined by data consumption. Will a fast and efficient search service reduce stickiness or increase it? There’s more work to be done on the subject of stickiness.
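For readers who want to poke at the quoted expression, here is a rough sketch in Python. It only evaluates K n log(n) as quoted. The function name, the choice of the natural logarithm, and the sample values of n and K are assumptions for illustration, not figures from the Asymco post.

```python
import math


def platform_value(n, k=1.0):
    """Evaluate K * n * log(n) for a user base of size n.

    Assumptions: natural log (the quote does not give a base) and a
    placeholder stickiness constant k, since K is not measured in the post.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return k * n * math.log(n)


# Hypothetical user-base sizes, chosen only to show how the value scales.
for users in (100, 1_000, 10_000):
    print(users, round(platform_value(users), 1))
```

The takeaway from the arithmetic: the value grows faster than the user base itself, so ten times the users yields more than ten times the value, and a larger K scales the whole curve.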
Christina Sheley, January 8, 2011
Freebie
Will DuckDuckGo Ruffle Feathers?
January 8, 2011
Search engine DuckDuckGo’s new marketing campaign, summarized in Search Engine Journal’s “DuckDuckGo Pitches Private Search,” says that what differentiates them from Google is privacy: they don’t store personal Internet data or associate it with a user account.
The heavy-handed marketing maneuver is being touted by DuckDuckGo founder and sole employee Gabriel Weinberg in a Search Engine Land report as an educational tactic. “I am trying to make the privacy aspects of search engines understandable to the average person who doesn’t have a lot of background knowledge on the more technical aspects.”
We are interested to see if Weinberg’s approach ruffles the feathers of the average searcher. Will they sit up and take notice of the privacy issue or does the attempt fly south?
Christina Sheley, January 8, 2011
Freebie
Is Google Chasing Dessert and Ignoring the Main Course?
January 7, 2011
We love the Google in Harrod’s Creek? The Street View picture of our office is now a bush. Our listing is in “review” and has been for months. The goose finds these actions amusing.
“Google’s Decreasingly Useful, Spam-Filled Web Search” keeps an earlier write up’s points alive despite the gingerbread. (You can read the source of the Marco.org information at this link.) Among the points, the subject of “spam” is the most interesting in our opinion.
One person’s spam may be another person’s dinner on a cruise ship. Our view is that a Google query is a useful adjunct to other research actions.
Is Google increasingly becoming an outsider for certain types of online research?
For example, yesterday we had to quickly dig up some information from our Overflight archive about a “relaxed SQL” search vendor. Here’s what we did to locate the items of information:
First, we ran the general query on Exalead’s search at www.exalead.com/search. This index is not distorted by advertisements and contains more than 10 billion pages. We also use the Exalead engine for Overflight. We then did the query on Blekko.com (www.blekko.com) and plucked specific results before navigating to Web sites. Yep, old fashioned pre-retrieval vetting. Still works at ArnoldIT.
Second, we ran queries for the company’s founder, who is in indexes under several spelling variants. We think spelling variants are quite interesting, particularly when the vendor is involved in licensing technology to what seem to be “dating” or “meeting people” services. The systems we used were:
- Cluuz.com at www.cluuz.com. This appears to be a Yahoo BOSS service implemented on content in the Bing.com/Yahoo.com index
- The Google News Archive at http://news.google.com/archivesearch, using the advanced search functions to get the string variants
- Icerocket at http://www.icerocket.com/
- Collecta at http://www.collecta.com
Third, we did our patent searching using my favorite site, the USPTO at www.uspto.gov.
Notice that we did not use the general Google Web index. There were four reasons:
- Relevancy, unless the advanced search features are used for the query, is focused on the person looking for Lady Gaga, not “relaxed SQL”
- The date of documents is important to us, and we find that figuring out the date of an item and the freshness of the Google index is a bit of a challenge and frankly not worth the effort
- The automatic truncation and spelling correction functions override what’s stipulated in certain situations. When looking for proper name variants, I don’t want automatic anything. I want to see what I typed in the search query string
- The 32 billion Web pages, the ads, and the other stuff jammed into a Google results display are mental clutter for me. I now avoid trying to figure out what’s what by using other services.
How did we do? We learned from the outfit asking us to perform the research that we surfaced information that directly supported what the company developing “relaxed SQL” was saying in briefings.
Mission accomplished using Google as one component in a secondary process. That’s quite a change from our original dependence on Google in 2002.
My hunch is that Google is nearly perfect and the change in our Web search method is a result of mental degradation here in Harrod’s Creek. If you are dependent on Google, good for you.
Stephen E Arnold, January 7, 2011
Freebie
Naming Search Systems: The False Hit Challenge
January 5, 2011
Run a query for Thunderstone, the pioneer in search appliances and highly configurable systems. You get links to a rock and roll band. Our Brainware feed reader returns stories that hit on crazy videos and not too much information about the trigram technology that distinguishes Brainware from other search vendors.
The name of a search system is important. Get the name wrong and it becomes difficult to locate specific information about a search system. From the number of inquiries I get about search vendors, I think the names are becoming a more significant part of the company’s presentation of its capabilities.
A recent example is Mindbreeze, the search arm of Fabasoft in Austria. Now the Mindbreeze search product must contend with the disambiguation challenge: the picture.
For anyone considering a name for a search technology, checking for overlaps is a useful step. Then once a name is in hand, that name has to be managed to ensure that a person looking for the company can locate relevant information. In the new world of non-objective search, the name is the thing. For search, marketing, not technology, is the differentiator in 2011 in our opinion.
Whitney Grace, January 5, 2011
Freebie
OpenText Cares
January 5, 2011
Vendors claim that their clients are the most important parts of their companies. They back up this claim with automated customer service hotlines and technical support that creates more questions than answers. OpenText/Nstein has released a press statement to assure its clients that they will be treated with respect and care: “OpenText/Nstein’s ongoing commitment to our WCM’s customers.” Here’s a snippet:
“When you contact your Customer Care Center, we readily check for improvements that may have been made for your version of WCM and often customize existing patches for your needs. These patches and fixes are all based on improvements built in the newer versions of WCM; they were ultimately inspired by your ideas and suggestions to the Customer Care Centre and our long experience in content management.”
We find that statements about ongoing commitments are thought-provoking. Why make them if the commitment is evident to licensees? Just a question to consider.
Whitney Grace, January 5, 2011
Freebie
Amazon and Its Fast Moving Cloud
January 3, 2011
Several years ago, I noted that Google’s technical papers described features and functions that were evident in Amazon’s actual services. At that moment, I realized the Google had lost its chance for a cloud utility play. Now the GOOG may come roaring back, but with the legal friction increasing, Amazon has some clean air through which to float its big, fast cumulus cloud. Sure, Rackspace is a competitor to Amazon, and every vendor is yammering about the cloud. But right now, the Amazon has a big PR push underway. Now, to be fair, the Amazon cloud generated a nasty storm with its hardware crash the other day. Not good.
That’s why the PR guns are firing. You can see two examples of “good news.” Navigate first to the “I love Amazon” sky writing from Netflix: “Why We Use and Contribute to Open Source Software” and “Netflix Touts Open Source, Ignores Linux.” Netflix, of course, is flying in the Amazon clouds. The other PR example is a bit of a downer for library types who expect books to be available. Point your browser thing at “Amazon Erases Certain Books on Kindle Due to Content.”
But despite the good and bad PR, Amazon managed to pull off an interesting and useful technical coup. “Announcing VM Import for Amazon EC2” said:
VM Import enables you to easily import virtual machine images from your existing environment to Amazon EC2 instances.
Useful for many applications. Crash recovery? I think so.
Net net: The others in the cloud race need to kick into a different gear. Google? A question, “Can you get that airplane aloft?” Storm clouds rushing in.
Stephen E Arnold, January 3, 2011
Freebie