Linear Content Analysis: Big Problems, Big Costs
January 16, 2011
Another sacred cow is now headed toward the intellectual McDonald’s beef supplier. Most organizations producing content for commercial databases use a “relaxed linear method”. The idea is that certain steps work like Henry Ford’s assembly line and other processes are more in touch with the multi-modal, do-it-when-you-can approach.
There’s an interesting write up by Clearwell Systems about the utility of relaxed methods of document processing. Heck, forget relaxed. The information in the blog post suggests that the scrum and online game method may be the optimal way to make sense out of content.
The write up is “Reinventing Review in Electronic Discovery,” and it is tailored to the legal eagle and litigation support world. But the implications of the information may have considerable value to those working in other types of research. There are some references to the close-to-the-vest US government reports about content processing as well as some academic research.
The passage that I noted was:
Of course, fundamentally changing linear review with specific technologies that radically changes the review workflow is an approach worth considering. While offering such aids, it must be remembered that human judgment is still needed and the process must incorporate both increasing their knowledge as well as their ability to apply judgment.
My take is that multi-path work methods, technology, and regular humans are needed to make sense out of certain types of content. Humans alone can’t do the job particularly well or economically. Lawyers like the billable work too.
Stephen E Arnold, January 16, 2011
Freebie, unlike law firms and eDiscovery system vendors
Revolutionary, My Dear Watson
January 16, 2011
One can only imagine what heights AI will reach in, say, five years. The exciting news is that there is a glimpse of the future today with the first tangible steps taken by IBM. Per an article on InfoWorld.com, Watson, IBM’s new supercomputer, has been introduced on the game show circuit of all places.
During an exhibition round of “Jeopardy”, Watson fared well against the top reigning human champs, winning by a handy $1100. While literally a trivial endeavor, Watson’s game performance proves some noteworthy achievements. Watson possesses “sufficient artificial intelligence to ably play a game like “Jeopardy” that not only tests the amount of raw data a machine (or human) has stored away, but also an ability to analyze natural language — “Jeopardy” categories and answers contain puns, for example — so as to understand what sort of information is really being requested and to present that information clearly, concisely, and quickly.”
The extensions of Watson’s abilities are far-reaching, including value-adding services to nearly any industry, bringing to the table the power to evaluate huge collections of data and find correlated relationships humans would often miss. The potential for gaining momentum on the cure for cancer has even been tossed around.
So great job, IBM! You have achieved what has to date been the unachievable. Now, if only those pesky kinks in the search on your public-facing Web site could be ironed out. Maybe Watson could be put on the case? Does Watson use OmniFind?
Sarah Rogers, January 16, 2011
Freebie
Big Bandits? Apple, Google, or Microsoft
January 16, 2011
I feel enervated when an azure chip consultant forces me to decide which company is not just a bandit but the biggest bandit. Navigate to “WebM vs. H.264: Google Bets Big on Itself.” The write up does a good job of explaining how these large monopolies are trying to become even larger and exert more control over things digital. I am not a video hound, but the goslings and many other folks in Harrod’s Creek cannot get through a day without large doses of kick back, couch potato action. The addled goose? Not so much.
The story quotes a poobah from the azure chip consulting firm, Yankee Group. Here’s the quote I liked:
“The question is, who are you going to trust to not hold standards licensees to ransom? Little MPEG or giant Google?” Howe [Yankee Group expert] asked. “At this point, I don’t think we know who is the bigger bandit.”
Wow. Apple, Google, and Microsoft described as alleged bandits. Exciting.
Stephen E Arnold, January 17, 2011
Freebie, unlike the advice of azure chip consultants commenting about banditry.
Search Blind Spot: WordPress Free Themes
January 15, 2011
We don’t do much searching for words and phrases designed to get traffic for this blog. We use jargon, some Greek and Latin, a little Hebrew, and quite a bit of insider jargon. My favorites are “azure chip consultant” (folks who couldn’t get a job at one of the top management consulting firms) and “real journalists” (people who were nuked from publishing companies when revenues went the way of the iPad News Corp. newspaper).
But “Why You Should Never Search for Free WordPress Themes in Google or Anywhere Else” alerted me to another example of “objective” search results gone wrong, very, very wrong. Here’s the key passage:
Someone who has come to WordPress on the first time is more than likely to type “free WordPress themes” into Google to find a site that gives them what they want. Unfortunately they’re more than likely to end up with spammy links, at best, on their site. Of course, the WordPress Theme Directory can be frustrating in its lack of themes that work with WordPress 3.0. Many of the themes look a little out of date and lots look very bloggy.
Why? Spammers have found ways to generate traffic to Web sites that may not contain what the user needs, wants, or expects. If you want to include some of the spammy tricks, the write up includes numerous code examples.
Why aren’t search engines filtering out the baloney? Do you think money is a motivator?
Stephen E Arnold, January 15, 2011
Freebie
Content Freedom Threatened
January 15, 2011
I am not sure if I agree that Apple’s App Store is a threat to Internet freedom. “Wikipedia’s Jimmy Wales: App Stores a Clear and Present Danger” voices this concern. I know that walled gardens are in the future. Money flows when folks control the customer. Loss of control of the customer spells trouble at some point in a company’s future. I think it is easier to make money when one has a customer list and keeps it secret.
Content is a magnet, so in order to increase the “pull” of content, walls are a useful architectural consideration. In fact, walls are going up everywhere. Facebook is a big walled garden. Oracle’s database is a big walled garden. Even open source friendly companies are going to need a walled garden. Without walls, a smart 20-something can nuke a business intentionally or inadvertently. Facebook, for example, wants to reinvent or support the reinvention of enterprise applications as social apps and services within its walled garden. How do you think that will work as more Facebook users enter the workforce? Even companies “on top of Facebook” are likely to have some heart stopping moments.
Here’s a snippet I liked from the article about Wikipedia’s Jimmy Wales:
The app store model is a more immediate threat to internet freedom than breaches of net neutrality. That’s the opinion of Wikipedia chief Jimmy Wales. According to Wales — who was quick to stress he was speaking in a purely personal capacity — set-ups such as the iTunes App Store can act as a “chokepoint that is very dangerous.” He said such it was time to ask if the model was “a threat to a diverse and open ecosystem” and made the argument that “we own [a] device, and we should control it.”
Walled gardens are not new. As the competitive arena heats up in the warm financial gusts flowing across certain areas of the online ecosystem, old style information silos are going to be built inside these walls. The challenge will be choosing which garden is the one to make home.
Stephen E Arnold, January 15, 2011
Freebie
Google and Multi-Community Content
January 15, 2011
I found Google’s patent application US20110010384, “Multi-Community Content Sharing in Online Social Networks,” interesting for two reasons. First, the inventors are scattered across three Google offices. The participation of the Beijing engineer was interesting. Microsoft also seems to be tapping China for some of its ideas. Second, the invention points to social content federation. Here’s the abstract:
An online social networking system (100) can be used to distribute content within an online social network. The product comprises code for carrying out a method that begins with receiving content to be posted to a host community. Labels (420) are also provided to associate with the content. The labels (420) are used to identify communities in the online social network to which to post the content. Code is generated that, when executed, displays the content on a webpage of the host community, and displays the content on a webpage of each of the identified communities. The content may comprise one or more events, images, forum and topics.
Facebook does; Google describes. Which company has social momentum?
Stephen E Arnold, January 15, 2011
Freebie
Yandex Injects Semantics
January 14, 2011
The Russian search engine Yandex is trying to pull a Google. Yandex’s new search engine has been dubbed “Spectrum” for its ability to “read” your mind. Spectrum uses a machine learning program that draws on your previous searches as well as historic search patterns to infer what you’d like to see in your search results.
“As users’ interests and intents tend to change, the system performs query analysis several times a week”, says Yandex. This amounts to Spectrum analyzing about five billion search queries.
For example, if most people who search “gone with the wind” are searching for the movie Gone With the Wind, then the majority of results will be about the movie instead of the book. If they can pull it off, Russia will have a search engine to rival even the best in the world.
Yandex may pose a threat to Google in certain markets. There is considerable chatter about the Google Microsoft Web search face off. We think the real action is among Google and Baidu and Yandex. The Russian and Chinese markets are big, but it may be easier for Google’s Russian and Chinese competitors to attack Google on its home turf than for Google to put pressure on Baidu and Yandex in their back yards.
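The “gone with the wind” example can be sketched in a few lines. This is not Yandex’s method, which the write up does not describe; it is a minimal illustration that assumes a hypothetical log of which intent past searchers picked for a query, and it splits a results page across intents in proportion to that history. The function name, the intent labels, and the ten-slot page are all assumptions made for the example.

```python
from collections import Counter

def allocate_slots(intent_log, slots=10):
    """Split result slots across intents in proportion to past choices."""
    counts = Counter(intent_log)
    total = sum(counts.values())
    # Exact (fractional) share of the page each intent has earned.
    quotas = {intent: slots * n / total for intent, n in counts.items()}
    # Give every intent the whole-number part of its share first.
    allocation = {intent: int(q) for intent, q in quotas.items()}
    # Hand leftover slots to the intents with the largest remainders,
    # so the allocation always adds up to `slots`.
    leftover = slots - sum(allocation.values())
    by_remainder = sorted(
        quotas, key=lambda i: quotas[i] - allocation[i], reverse=True
    )
    for intent in by_remainder[:leftover]:
        allocation[intent] += 1
    return allocation

# Hypothetical history for the query "gone with the wind".
log = ["movie"] * 70 + ["book"] * 20 + ["soundtrack"] * 10
print(allocate_slots(log))
```

Rerunning the allocation whenever the log is refreshed is one way to picture the “several times a week” analysis: when interest shifts from the movie to the book, the page shifts with it.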
Stephen E Arnold, January 14, 2011
Freebie
X1 Rolls out Search for iPhone
January 14, 2011
X1 Technologies, Inc., has released the new X1 Mobile Search for iPhone on iTunes. The application requires X1 Professional Desktop Search to run, but once installed it allows users to access a home PC from a mobile device, regardless of physical location.
“We know that information workers snatch stray minutes to fit in work when they are on the move. But they often find that their mobile devices lack the files, contacts, or data they need. X1 Mobile Search remedies that problem.”
X1 Mobile Search also offers fast-as-you-type searches, offline document storage, search inside email, and the ability to edit documents from a mobile device.
Leslie Radcliff, January 14, 2011
Freebie
Taxonomy and Efficiency
January 14, 2011
One of the leaders in the data and content management field, Access Innovations, Inc., has compiled a list in “10 Reasons to Resolve to Create a Taxonomy for Your Business in 2011.” A taxonomy creates an organized system of classification.
Here are three of the reasons with which we resonated:
Every person or department uses a different term, even though they’re all talking about the same thing. Your coworkers can’t find the company policy for the Fourth (or Fifth, or Sixth) of July, because it’s tagged as Independence Day? An enterprise taxonomy can get all of you searching the same language, if not talking it.
A coworker just spent 45 minutes trying to locate a document, but didn’t know what search term to use. Taxonomy browsing should work for him or her. And with synonyms, he/she can look for eye doctors or even “optimalogists” and find ophthalmologists.
Everything for HR gets called “HR” – all 10,000 documents. Get your indexers, taggers, and searchers browsing down to the more specific terms that a taxonomy can show them. You have HR documents on free pizza as a fringe benefit? Add Fringe benefits as a narrower term, and add Free pizza under Fringe benefits, so people can save some dough.
Access Innovations asserts that its approach to taxonomy can help eliminate irrelevant and unwanted search returns and will let a searcher use related terms for the same search regardless of punctuation, spelling, or terminology.
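The synonym and narrower-term ideas in the three reasons above can be sketched with two small lookup tables. This is not Access Innovations’ software; it is a minimal illustration, and the table names, the function names, and the handful of entries (borrowed from the examples in the quoted reasons) are assumptions made for the example.

```python
# Variant spellings and synonyms, stored lowercase, mapped to one preferred term.
SYNONYMS = {
    "ophthalmologists": "Ophthalmologists",
    "eye doctors": "Ophthalmologists",
    "optimalogists": "Ophthalmologists",
    "independence day": "Independence Day",
    "fourth of july": "Independence Day",
}

# Each broader term lists its narrower terms.
NARROWER = {
    "HR": ["Fringe benefits"],
    "Fringe benefits": ["Free pizza"],
}

def preferred_term(query):
    """Map whatever the searcher typed to the taxonomy's preferred term."""
    cleaned = query.strip()
    # Unknown terms pass through unchanged rather than failing.
    return SYNONYMS.get(cleaned.lower(), cleaned)

def expand(term):
    """Return the term plus every narrower term beneath it, depth first."""
    terms = [term]
    for child in NARROWER.get(term, []):
        terms.extend(expand(child))
    return terms

print(preferred_term("Eye Doctors"))
print(expand("HR"))
```

A search front end could run `preferred_term` on the query before it hits the index, and an indexer could use `expand` to show the narrower choices under “HR” instead of dumping all 10,000 documents in one bucket.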
“The bottom line is that a good taxonomy can save your staff time, and your organization time and money.”
If this system can save companies both time and money, why not give it a try? Let’s face it, in today’s economy where every penny counts, waste and inefficiency can lead to the failure of a business venture while efficiency and ease tend to lend themselves to success.
Leslie Radcliff, January 14, 2011
Freebie
Social Media, Niches, and Search
January 14, 2011
MySpace seems to be struggling. “Struggling” may be the wrong word, particularly if you were one of the hundreds of employees nuked in the riffing a few days ago. You can get the “real” news from Silicon Republic’s “MySpace Confirms Layoffs of 500 Staff Members.” I never was a MySpacer.
Image Source: http://www.webguild.org/20110104/myspace-to-layoff-50-of-employees. Good write up too.
The last time I watched a demo, the page on display flickered and brayed noise. What I do associate with MySpace are:
- iPad publisher and financial expert Rupert Murdoch paid $580 million for the ur-Facebook.
- Mr. Murdoch’s comment: “The world is changing very fast. Big will not beat small anymore. It will be the fast beating the slow.” Source: Woopidoo.com. I think of this quote when I read about Facebook’s Goldman Sachs deal and the MySpace “challenge”.
But what is interesting is that social media content is moving into a walled garden. Facebook’s content has value partly because of its walled garden. Even Google’s shift in support for content on YouTube reminds me of an exclusionary move. The chatter about curation, filtering, and controls translates in my addled goose brain to a shift from open to closed.
This has several implications for search:
First, going to one place to access content is getting more and more difficult. I think the hurdles posed by registration processes and other methods of capturing “value” are building blocks of a new type of digital real estate: the private park, the walled garden, and a snooty country club. Who will be able to access which service for information?
Second, search is going to require a user to run the same query in different systems and then aggregate the meaningful results. Federated search is going to be increasingly important. Few users will tolerate manual hunting for a content collection, registering and maybe paying for access, and then figuring out what to do with results from different collections.
Third, because of the difficulty in accessing content, users—particularly in North America—will create an interesting new market for snippets, digests, and nuggets. Research for some people will become variations on “Farmville for Dummies” and “How to Lose 10 Pounds in 10 Days.”
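The federated search idea in the second point can be sketched as follows. This is a minimal illustration, not any vendor’s system: each walled garden is stood in for by a hypothetical in-memory collection, the URLs and titles are made up, and the scores are assumed to be already normalized to a common 0 to 1 scale, which real collections rarely offer.

```python
def make_source(documents):
    """Build a toy search function over one collection (title substring match)."""
    def search(query):
        return [d for d in documents if query.lower() in d["title"].lower()]
    return search

def federated_search(query, sources):
    """Run one query against every source and merge the hits into one list."""
    merged = {}
    for name, search in sources.items():
        for hit in search(query):
            best = merged.get(hit["url"])
            # The same item may live in several gardens; keep the best-scored copy.
            if best is None or hit["score"] > best["score"]:
                merged[hit["url"]] = {**hit, "source": name}
    # Assumes scores are comparable across sources (see the note above).
    return sorted(merged.values(), key=lambda h: h["score"], reverse=True)

# Two hypothetical collections standing in for separate walled gardens.
sources = {
    "news": make_source([
        {"url": "a.example/taxonomy-tips", "title": "Taxonomy tips", "score": 0.9},
        {"url": "shared.example/taxonomy-faq", "title": "Taxonomy FAQ", "score": 0.4},
    ]),
    "library": make_source([
        {"url": "shared.example/taxonomy-faq", "title": "Taxonomy FAQ", "score": 0.7},
        {"url": "b.example/online-1980", "title": "Online search in 1980", "score": 0.8},
    ]),
}

for hit in federated_search("taxonomy", sources):
    print(hit["source"], hit["url"], hit["score"])
```

The hard parts are exactly the ones the sketch waves away: logging in to each garden, paying for access, and making relevance scores from different systems mean the same thing.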
What I find really interesting is that “the Internet” seems to be shaping into a variant of the original online industry. Content islands have to be visited by a researcher on a digital cruise ship. The pricing, access methods, and restrictions will vary. I thought research the pre-Internet way was gone for good.
Nope.
It’s 1980 all over again.
Stephen E Arnold, January 14, 2011
Freebie