Natural Language Processing Not Suited for Web Searches
August 9, 2011
There’s a new cowboy in town and he’s shaking up the search engine industry. The article “Real Language Q&A: The Next Generation of Search?” on Search Engine Journal explores the practicality of Oren Etzioni’s recommendations for search engines in his new paper, “Search Needs a Shake Up,” published in Nature.
According to Etzioni, current search engines have not kept up with the times. Their reliance on old algorithms, with results displayed as a list that can run into the millions, is no longer practical. As the article explains,
“In Etzioni’s view, the next generation of search would abandon the “blue link” structure in favor of directly answering the questions of users. “Moving up the information food chain requires a search engine that can interpret a user’s question, extract facts from all the information on the web, and select an appropriate answer,” he states. The tricky part, though, is in finding the answer. With so many ambiguities, it’s difficult to see how most questions could be answered by a search site.”
Conveniently, Etzioni offers his own Reverb program, developed at the University of Washington, as a step in the right direction. Reverb relies on natural language processing (NLP), which is an interesting direction for search engines but depends entirely on the reliability of the user’s question.
In the world of Etzioni’s search engine, the functionally illiterate would never receive an accurate search result because the search engine would never recognize “who be prez bamas baby mama?”
While it would be a lovely world to live in if life were as well-spoken as Jeopardy and Watson could answer our every question quickly and precisely, that is not the case and never will be. NLP works well with voice searches and should stay there. Though Etzioni poses some interesting questions and points out the white elephant in the search engine room, the answer is not as simple as NLP. At least not yet.
Catherine Lamsfuss, August 9, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
AIIM White Paper on SharePoint Deployment
August 9, 2011
The Association for Information and Image Management is a leading non-profit organization that provides education, research, and best practice methods to help organizations control and optimize their information. On its web site is a free white paper entitled “SharePoint-Strategies and Experiences.” It focuses on how SharePoint deployment is happening so quickly that organizations aren’t thinking about the information management implications for documents stored on web sites and libraries. There is also confusion about how SharePoint can be used alongside existing document management platforms and ECM (enterprise content management) systems. We learned:
SharePoint 2007 has been a massively popular release, and our respondents generally consider that it does a good job in most areas. Many organizations have deployments spanning hundreds of locations with thousands of team sites. A significant proportion has already rolled out access to 100% of their employees, making SharePoint perhaps the first true enterprise-wide content management system. Return on investment has generally been as expected or better, despite the fact that an astonishing 50% of installations went ahead without any formal business case being required.
For this white paper, AIIM measured users’ experiences and extrapolated what to expect for SharePoint and ECM in the future. ECM vendors are adapting their products to work with SharePoint because of its growing popularity with businesses. Other major topics in the paper are strategies for integration, use of third-party applications, use of SharePoint for traditional ECM applications, and implementing SharePoint based on user feedback.
To read the white paper you will need to join AIIM, but joining is free, and the research and findings will help you understand the current state of SharePoint 2010. What the report should have included is that SurfRay Ontolica is the best third-party application for improving your SharePoint search.
Stephen E Arnold, August 9, 2011
Sponsored by SurfRay, developers of Ontolica for SharePoint
Google Tightens YouTube Thumb Ties
August 8, 2011
Google must have been inspired by the Kenton Knepper modern thumb tie “trick”.
Lady Gaga is one of the most popular artists today and has millions of fans who follow her every move. However, according to the NPR article “Lady Gaga’s YouTube Account is Suspended,” no one is above the rules, not even a mega superstar. According to the article, Lady Gaga’s account on Google Inc.-owned YouTube was suspended due to “multiple or severe violations of YouTube’s copyright policy.”
It is company policy to remove accounts after three copyright violations. The singer’s account was restored later the same day, but the whole situation makes one wonder just how far Google and others are willing to go. The site in question is run by Lady Gaga’s officials, so it is hard to see how she could infringe on her own copyright. If Google suspends the account of such a popular personality, then the rest of the public had better beware.
You may sign into your YouTube account only to find that you have been accused and found guilty of YouTube violations. It will be interesting to see how Google and others target alleged “wrongdoers” and lay down the law.
With the Kenton gizmos, struggling is not a good idea. Even magic acts can go wrong.
April Holmes, August 8, 2011
ACIS Shoulders into the SharePoint Search Market
August 8, 2011
According to the PRWeb article “ACIS Extends Enterprise Search Services into the Cloud — Opportunities for Cutting Costs While Increasing Quality of Search,” ACIS (ACIS Consulting Inc.) recently announced the introduction of its Cloud-based Enterprise Search Service (ESaaS). ACIS asserts that it is “one of the most experienced Microsoft Fast Enterprise Search technology developers and systems integrators.”
The new ESaaS system, according to our source:
…is a complete solution plus services package that includes a cloud-based IT infrastructure, a highly advanced enterprise search platform, professional consulting and management services that tie it all together.
Clients can look forward to reduced costs as well as a noticeable improvement in search experience, which in turn improves their overall customer satisfaction. As more and more enterprises realize the importance of the cloud, ACIS believes that the cloud working side by side with its technology can better meet the needs of customers. Efrem Habteselassie, principal at ACIS Consulting, made the following statement:
This is why we are launching this ESaaS service offering – to combine the benefits associated with cloud computing with our world-class experts to offer the first complete Enterprise Search as a Service package.
Seems like the beginning of a beautiful and lucrative relationship. The challenge, of course, will be differentiating the ACIS solution from Microsoft’s own solutions, industrial strength solution providers like Exalead, and the dozens of Certified Gold firms offering similar services. ACIS does have some interesting acronyms.
April Holmes, August 8, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Inteltrax: Top Stories, August 1 to August 5
August 8, 2011
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically about the careers that have either sprouted up or drastically changed during data analytics’ rise.
The first such story, “Big Data Architects in Demand,” involved the rising importance of digital architects in the expanding data warehouse field.
Following the trend of data warehouses evolving with job responsibilities, our story “Warehouse Database Administrator Roles Changing” showed how this important role, too, is rapidly altering as newer and more niche-oriented technology becomes available.
Stepping away from the warehouse and into the halls of Congress, the article “Congressman Issa Weaves Government and Analytics Tighter” shows how the roles of politicians are changing and becoming more efficient thanks to analytic tools.
While the economy is sputtering in many areas, we’ve seen nothing but growth in business intelligence and data analytics since launching our site over a year ago. Routinely, analytics firms post record earnings, which leads to more job opportunities. We expect to see this employment market grow and evolve as more companies learn how analytics can help them.
Follow the Inteltrax news stream by visiting www.inteltrax.com
Patrick Roland, Editor, Inteltrax, August 8, 2011
Sponsored by Digital Reasoning, developers of the next generation content analytics system Synthsys.
FCC Closed Captioning: Unintended Side Effects?
August 8, 2011
In July, the Federal Communications Commission inadvertently handed a gift to indexers of Web content. Broadcast Engineering reported, “FCC sets six-month deadline for Internet closed captioning.”
The goal, of course, is improved accessibility to video content for the hearing impaired. The new rules continue the spirit of the 1996 Telecommunications Act, which required closed captioning for most TV shows. Now that Web programming has become central to modern life, the requirements bring this accessibility up to date. Writer Michael Grotticelli states,
“Next January, captioning for live and near-live programming must be online. By next July, all prerecorded programming ‘substantially edited’ for the Internet must be captioned. The report recommends performance objectives, technical standards and regulations. No information can be lost in the transcoding process, including spelling, positioning, timing and presentation.”
We would like to point out the side effect this will have on the search industry: this development will make it much easier for content to be indexed. Speech to text is not so hot, so putting the burden on the video maker shifts costs.
It will also create a legal gotcha for those who violate the guideline, so watch out. See here for the text of the report.
Cynthia Murrell, August 8, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
Boost Your SharePoint Search Maturity
August 8, 2011
All SharePoint developers want to advance their SharePoint deployments and make the program work at maximum efficiency. To do so, developers must plan and take appropriate steps. The Nothing But SharePoint web site recently posted an interesting article entitled “Operational Steps to Boosting Your SharePoint Search Maturity Level,” and it is loaded with useful information. Instead of being a “how-to” article with step-by-step instructions, it offers explanations of how to improve a deployment and collects articles that describe the upgrades. The article concentrates on SharePoint search and how to improve it.
The advice is divided into three key areas:
- Leverage content types and properties: lets users navigate and refine searches.
- Customize results to specific needs: adapts web pages to the needs of users and of different audiences.
- Make results actionable: lets users take information to the next logical step.
The article tells us:
As we have seen, there are three key areas where specific improvements will improve the maturity of a SharePoint implementation. These can be achieved within the limitations of the standard SharePoint functionality through a combination of configuring webparts, editing XML and XSLTs and extending web-parts.
SharePoint is currently using Fast search technology to power users’ search queries. These key areas are meant to guide your deployment, but only once you start implementing them will you be able to see where you need to concentrate. After reading this article, I can’t help but wonder, “What is the future of Fast and the various flavors of SharePoint search?” We’ll keep you posted on that, but in the meantime SurfRay Ontolica is one tool that will help you improve your SharePoint search.
Whitney Grace, August 8, 2011
Sponsored by SurfRay, developers of Ontolica for SharePoint
Google: Now in the Barrel
August 7, 2011
What happens to a rodeo clown in a barrel? Most bulls eye the barrel and they continue to do unpredictable things. But when the rodeo clown is in the barrel, the clown knows a hard knock may be coming. In the Silicon Valley ranch land, there’s a bull riding contest underway. Google, it seems, is now in the barrel. The bulls are charging around and it seems as though some of the bigger ones are looking to put some horn on the clown in the barrel. Every company gets a turn in the barrel. Silicon Valley is mostly family with a handful of Hatfield and McCoy schisms. Siblings and cube mates may leave one company to go to another, but the closeness and shared experiences persist. When a company finds itself in the barrel, then there is a moment of fear, then horror, and then shock. How can those folks who shared a cube or a dorm room or a significant other find themselves in a fight that can become bloody, even fatal?
Easy.
Power is like those nifty sticks that some of the southern Europeans jab into tomorrow’s dinner. The grace and elegance are superficial. The real business of irritating the bull is the thrill of the kill and the utter, total domination of the bull. In the US, we don’t do outright public bull killing. We use the word sport and focus on the clown thing. It is a bit like sumo wrestling with combatants wearing plump, air filled fat suits.
I enjoyed “With Google There Will Be Bad Blood.” The rundown of the bulls, matadors, and clowns is interesting. I found this passage notable:
Speaking of failed Google acquisitions, after Google tried and failed to buy Yelp and Groupon, they moved forward on products that competed directly with them. In the process, Yelp has felt Google was actively screwing them in search results. Bad blood galore now. On the smaller startup side of things, both Color and Path turned down massive acquisition offers from Google. Part of it was because the startups wanted to remain independent, but a large part was also that neither groups of employees wanted to work for Google. Naturally, Google has since been working on products that compete with both — not only Google+, but also mobile apps created through Google’s Slide division.
This passage resonated with me for three reasons.
First, if the statement is accurate, it caused me to understand that Google’s approach to innovation is to learn, try to acquire, and then do the “me too” thing. With innovation pushed to the product level at Google, I wonder how much “me too” is operating in Google’s core?
Second, Google seems adept at identifying what’s hot. My question is, “Why doesn’t Google convert a trend into a product or service itself?” I find this an interesting question to which I have no answer.
Third, whatever Google is doing, some of the companies it covets are not ready to journey to the Googleplex to get a Google mouse pad. In 2004, I think companies Google wanted to buy were swooning to be certifiably Googley. Perceptions seem to be different.
So what?
General Web search users are not likely to make a shift any time soon. Furthermore, the competitive fights over patents are likely to have an impact over a longer time horizon. Neither the media nor most azure chip “real” consultants have the mental chops to shift from the “if it bleeds, it leads” approach to “real” journalism. So today’s dust-up is tomorrow’s forgotten dust bunny.
The rodeo clown analogy provides me with an anchor point. Other Silicon Valley cowboys had to do time in the barrel. Some survived. Google strikes me as an amiable clown and one likely to emerge with a few bumps and bruises but intact. No wonder some children find clowns scary. Which company goes in the barrel next?
Stephen E Arnold, August 7, 2011
Sponsored by Pandia.com, publisher of The New Landscape of Enterprise Search.
Metadata Formally Recognized by Courts
August 7, 2011
Metacognition, meaning thinking about thinking, is a term psychologists love to throw around to discuss intelligence and the capacity to learn. Now, it seems the legal community is going to jump aboard the thinking-ship with its own term: metadata, or more precisely, data about data. The article “Technology: Recent Cases Help Evolve Guidelines for Producing Metadata: Keeping ESI Load Files in a Forensically Sound Manner that Preserves Metadata is Key” on Inside Counsel examines the nature of metadata and tries to pin down a practical use for it.
The first part of the problem (what is metadata?) is universally agreed upon nowadays. Metadata is any non-visible data, such as author, word count, title (including changes), and time/date stamps, connected to documents or other Electronically Stored Information (ESI). Lawyers can use this valuable information to nail down timelines, prove who monkeyed with a document, and establish which custodians did what to ESI in general.
As the legal community catches up with technology, more and more judges are ruling that metadata is not hearsay, but rather falls under the protection of ESI. Most recently, a judge set some practical guidelines for metadata:
“Earlier this year, in National Day Laborer Organizing Network v. United States Immigration and Customs Enforcement Agency, 2011 WL 381625 (S.D.N.Y. Feb. 7, 2011) (opinion withdrawn upon agreement of the parties), Judge Shira Scheindlin emphasized that metadata is an integral part of an electronic record. Although it is not legal precedent, her list is a reasonable set of guidelines for in-house counsel responding to ESI requests, as follows. The metadata that should accompany the production of any text-based ESI includes: File Name…Custodian… Source Device…Source Path…Production Path…Modified Date…Modified Time…Time Offset Value…Identifier.”
Now that metadata is being recognized as a legitimate resource for information, indexing becomes more vital than ever.
Catherine Lamsfuss, August 7, 2011
Sponsored by Quasar CA, your source for informed financial advisory services
Delightful Irony: Human Crashes Google Car
August 7, 2011
This morning my Overflight information service overflowed with Google related information. There were coveys of quales [Latin and not a misspelling, gentle reader] about Google and patents. There was another Googley shutdown story. The idea is that you should just Google a word. Who cares about a “real” dictionary entry? I find the reference appropriate because who cares about a “real” anything, including an azure chip consulting company with a penchant for becoming authorities in ANSI standard controlled term lists. I found a tardy response to the feline centric “How Do I Hate Google? Let Me Count the Ways,” which had precious little of the Elizabeth Barrett Browning gentleness from her pain and suffering.
Consider this EBB passage:
First time he kissed me, he but only kissed
The fingers of this hand wherewith I write;
And, ever since, it grew more clean and white.
Now evaluate the budding wordsmith Brian S. Hall’s passage:
David Drummond, you are [lame]. Larry, Sergey, you are [lame]. And I know why you’re [lame]. I know why you have monopoly profits in one business, use them to *destroy* other businesses, dominate the newest business (smartphones) and still whine.
Now who should be the focus for legions of soon to be unemployed English majors?
But what caught my attention was this item: “Google Blames a Human for its Robo-Car Crash.” My take: Algorithm good. Human bad.
Now what happens if Google’s next big product initiative such as a relaunch of the fascinating Google TV product line or a fully integrated, graphically consistent interface to the Android mobile devices flops?
Maybe algorithm good, human bad? Amusing to me because humans, not algorithms, are actually making decisions at the Googleplex. So a failure at Google boils down to “Human bad.” Seems logical.
Stephen E Arnold, August 7, 2011
Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

