Recommind and Predictive Coding

June 15, 2011

The different winners of the Kentucky Derby, Preakness, and Belmont horse races cast some doubt on predictive analytics. But search and content processing is not a horse race. The results are going to be more reliable and accurate, or that is the assumption. One thing is 100 percent certain: a battle is brewing over the phrase “predictive coding” as a marketing label for math that appears in quite a few textbooks.

First, you will want to read US 7,933,859, “Systems and Methods for Predictive Coding.” You can get your copy via the outstanding online service at USPTO.gov. The patent was a zippy one, filed on May 25, 2010, and granted on April 26, 2011.

There were quite a few write ups about the patent. We noted “Recommind Patents Predictive Coding” from Recommind’s Web site. The company has a Web site focused on predictive coding with the tag line “Out predict. Out perform.” A quote from a lawyer at WilmerHale announces, “This is a game changer in eDiscovery.”

Why a game changer? The answer, according to the news release, is:

Recommind’s Predictive Coding™ technology and workflow have transformed the legal industry by accelerating the most expensive phase of eDiscovery, document review. Traditional eDiscovery software relies on linear review, a tedious, expensive and error-prone process . . . . Predictive Coding uses machine learning to categorize and prioritize any document set faster, more accurately and more defensibly than contract attorneys, no matter how much data is involved.
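Set aside the trademark, and what the release describes boils down to supervised text classification: train a model on a sample of documents attorneys have already marked responsive or not, then score the remaining documents so reviewers look at the likely hits first. The sketch below is a generic illustration of that idea using scikit-learn; the sample documents, labels, and code are my own assumptions and have nothing to do with Recommind’s patented system or workflow.

```python
# Generic predictive-coding-style triage: train on attorney-reviewed
# documents, then rank the unreviewed set by estimated responsiveness.
# Illustrative only; not Recommind's patented method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reviewed_docs = [
    "Q3 pricing agreement with Acme, see attached term sheet",
    "Lunch order for the team offsite",
    "Draft settlement terms for the Acme dispute",
    "Fantasy football league reminder",
]
labels = [1, 0, 1, 0]  # 1 = responsive, 0 = not responsive (attorney calls)

unreviewed_docs = [
    "Forwarding the revised Acme settlement draft",
    "Parking garage closed on Friday",
]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(reviewed_docs)
model = LogisticRegression().fit(X_train, labels)

# Score unreviewed documents; reviewers start with the highest scores.
scores = model.predict_proba(vectorizer.transform(unreviewed_docs))[:, 1]
for doc, score in sorted(zip(unreviewed_docs, scores), key=lambda t: -t[1]):
    print(f"{score:.2f}  {doc}")
```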

Some push back was evident in “Predictive Coding War Breaks Out in US eDiscovery Sector.” The point in this write up is that other vendors have been offering predictive functions in the legal market.

Our recollection is that a number of other outfits dabble in this technological farm yard as well. You can read the interviews with Google-funded Recorded Future and with Digital Reasoning in my Search Wizards Speak series. I have noted in my talks that there seems to be some similarity between Recommind’s systems and methods and those of Autonomy, a company that is arguably one of the progenitors of probabilistic methods in the commercial search sector. Predecessors to Autonomy’s Intelligent Data Operating Layer reach all the way back to the math-crazed church men of ye merrie old England, before steam engines really caught on. So, new? Well, that’s a matter for lawyers, I surmise.

With the legal dust-up between i2 Ltd. and Palantir, two laborers on the margins of the predictive farm yard, legal fires can consume forests of money in a flash. You can learn more about data fusion and predictive analytics in my Inteltrax information service. Navigate to www.inteltrax.com.

Stephen E Arnold, June 15, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

The SoLoMo Angle for Google

June 15, 2011

We wanted to highlight a great Google acronym. Google’s Executive Chairman, Eric Schmidt, was interviewed recently at the D9 conference and answered some tough questions, as reported in SEW’s “Eric Schmidt on Search Result Answers, Social Failures & Google Offers Launch.”

Opining on topics that “…ranged from a new approach to answering questions in search results, to Google’s social media failures, and the launch of Google Offers,” Schmidt openly admitted responsibility for his flubs, particularly in the social venue. He indicated that Google searches would be evolving from link-based answers to algorithmically-based answers. Mr. Schmidt allegedly said:

The future of Google is “SoLoMo — social, local, and mobile.”

This is a wonderful buzzword, and it meshes nicely with the assertion that Google counts itself among the top four brands of the post-PC era, excluding Microsoft.

Google is moving down the me-too lane at the innovation supermarket. The company rolled out Google Offers, an online deal service which reminds me of the Groupon and LivingSocial services, the Yellow Pages service, the Courier-Journal’s service, and a number of other online deal plays.

Yep, “SoLoMo”.

Stephen E Arnold, June 15, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

Google and Personalized Results

June 15, 2011

When I ask people to run a query, they tell me they get results different from mine. The reason is that free Web search systems and some enterprise search systems personalize results. The notion of personalization has been around since Sagemaker. Remember that service? It was built on Microsoft technology and then quietly disappeared years ago. (I wrote about the service in March 2008 in “Search: The Wheel Keeps on a Turnin’”.) There were even earlier forms of personalization available in commercial systems from the late 1970s and early 1980s. These were called “SDIs,” shorthand for Selective Dissemination of Information. Today, the “alert” is a close cousin of these filters. “Filter bubble” is a clever metaphor for something that dates from the era of “Hello, Goodbye” and “I Heard It Through the Grapevine.”

The idea is that a user or system administrator sets up a standing query or series of terms. When new content arrives, matches are forwarded to the user or to a special file. The original “dollar a day” Desktop Data filled in-boxes with filtered alerts, but who wants to revisit the history of search when today’s 20-somethings cheerfully reinvent the wheel between Foosball games and Facebook time?
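A bare-bones SDI-style filter takes only a few lines. The sketch below is a generic illustration, not any vendor’s product; the subscriber addresses, term lists, and alert-file naming are assumptions made up for the example. A standing query is just a saved set of terms, and each arriving document either matches and gets forwarded or falls through.

```python
# Minimal SDI-style standing query: match incoming items against saved
# term lists and append hits to each subscriber's alert file (in-box).
standing_queries = {
    "analyst@example.com": {"predictive", "coding", "ediscovery"},
    "librarian@example.com": {"semantic", "taxonomy"},
}

def route(document: str) -> None:
    words = set(document.lower().split())
    for subscriber, terms in standing_queries.items():
        if words & terms:  # any saved term appears in the new document
            with open(f"alerts_{subscriber}.txt", "a") as alert_file:
                alert_file.write(document + "\n")

route("Vendor announces predictive coding patent for eDiscovery review")
```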

Now personalization is rampant and it will become even more prevalent. Depending on one’s point of view, personalization is a great innovation or it is the bane of an informed professional.

How does one get around Google’s personalization functions? Most people do not even know that Google filters hits, predicts what a person wants, and then presents those guesses to the user as “information.”

For the time being, you can sidestep Google’s personalization by running queries on this URL: http://bit.ly/m3R3q7

If it goes dark, let me know via the comments section. We will have to hunt for another workaround. Beyond Search contributor Cynthia Murrell found a useful write up about this service. She said in an email to me:

Search Engine Journal explains “How to Get Standardized Search Results.” For those in the SEO field, it used to be easy to measure a site’s ranking: perform a search, and there you go. However, the customization factors that Google has added complicate the issue. Results will be different for each searcher, depending on search history (google.com cookies) and the region one is searching from. You can go through the tedious process of deleting the cookies, but what to do about the assignment of a data center based on your location? Writer Nick Oba has the solution: exploit a feature Google created to accommodate mobile phone queries.

Google released a nifty little app called Google Mobilizer. This strips bells and whistles away from web pages, making them easily viewable on mobile phones . . . . You can also view websites from your desktop browser using Google Mobilizer. Google Mobilizer is, in effect, a proxy which also happens to strip out irrelevant frills.  We use Google Mobilizer to view search results on Google Search. The results . . . are from the most authoritative data center there is. These are the same results you would get when searching from the Googleplex, not logged in, and with cookies freshly deleted.

There you have it—the solution to measuring clients’ standardized rankings. At least until the next wrinkles are introduced.
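If you want to automate Mr. Oba’s trick, it amounts to fetching the results page through the Mobilizer proxy instead of hitting google.com directly. Here is a rough sketch; the /gwt/x endpoint is my assumption about how Mobilizer was commonly invoked at the time, so verify the proxy URL before relying on it.

```python
# Fetch a Google results page through the Mobilizer transcoder so the
# response is not shaped by cookies or a personalized data center.
# The /gwt/x endpoint is an assumption; confirm it before using.
from urllib.parse import quote, urlencode
from urllib.request import urlopen

def mobilized_search(query: str) -> str:
    results_url = "http://www.google.com/search?" + urlencode({"q": query})
    proxy_url = "http://www.google.com/gwt/x?u=" + quote(results_url, safe="")
    with urlopen(proxy_url) as response:
        return response.read().decode("utf-8", errors="replace")

html = mobilized_search("enterprise search")
print(html[:200])
```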

A happy quack to Ms. Murrell for this added color. Imagine the potential of shaping results for the purpose of disinformation, advertising, or affecting how a particular entity is presented to a researcher, journalist, college student, or corporate president.

Nah, this would never happen. Bears in Kentucky eat at Kentucky Fried Chicken too.

Stephen E Arnold, June 15, 2011

For more about search, check out The New Landscape of Enterprise Search, now available from Pandia.com in Oslo, Norway.

Expert System Is on the Move

June 15, 2011

The way consumers and enterprises access information is changing. Not only must organizations access and manage information stored in traditional internal sources, they must also be able to capture intelligence from the streams of information coming in from every direction. Without semantic technology, traditional enterprise search cannot extract value from these streams, which means leaving a great deal of critical information behind. We learned from a recent Expert System news release:

“With the overwhelming amount of information available today, there is an unprecedented need to be able to cut through the noise and capture the information that is most important to you,” said Luca Scagliarini, VP of Strategy and Business Development at Expert System. “Semantic is the only technology that can really help companies take advantage of all the information available via the real-time web, and it’s the only technology that will be able to filter the noise for the conversations, the patterns and sentiment that is important to you.”

Expert System is positioning itself as a way to deliver enterprise search by intercepting the critical and the relevant from all the streams of information available. By combining the benefits of semantic tagging and semantic-based text comprehension, Cogito SEE allows the enterprise to leverage all the information organizations have access to and require to drive business strategies. New features include:

  • A point of access to structured and unstructured information including newsfeeds, social networks and other internet sources.
  • An interface that enables intuitive, visual navigation of tags, facets, as well as interaction with search results to discover new connections and data.
  • Semantic search capability for multilanguage content.
  • Automatic and customizable report generation to monitor and share evolving search details and results.
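“Semantic tagging” sounds exotic, but the core move is familiar: recognize entities and concepts in the text and attach them as structured metadata that a search system can facet on. The toy sketch below uses a hand-built gazetteer to make the idea concrete; it is a generic illustration with invented names and categories, not Expert System’s Cogito technology.

```python
# Toy semantic tagger: map surface strings to concept tags so documents
# can be faceted and filtered by meaning, not just by keyword match.
# Generic illustration only; not Expert System's Cogito engine.
GAZETTEER = {
    "acme corp": ("ORGANIZATION", "Acme Corp"),
    "berlin": ("LOCATION", "Berlin"),
    "data breach": ("TOPIC", "Security Incident"),
}

def tag(text: str) -> list[tuple[str, str]]:
    lowered = text.lower()
    return [label for phrase, label in GAZETTEER.items() if phrase in lowered]

doc = "Acme Corp confirmed a data breach at its Berlin office."
print(tag(doc))
# [('ORGANIZATION', 'Acme Corp'), ('LOCATION', 'Berlin'),
#  ('TOPIC', 'Security Incident')]
```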

For more information, visit www.expertsystem.net.

Derek Clark, June 15, 2011


How Do I Search Thee? Let Me Count the Ways

June 14, 2011

“Open Source Search Engines Every Developer Should Know About” provides snapshots of about a dozen different “open source systems.”

There are quite a few enterprise search and Web search systems available. Confusing the enterprise products with the Web products is an all too common error. But within each of these two types of search engines, some vendors provide free or hobbled systems to allow system managers to do some tire kicking.

Overviews like this one are useful because new finds do turn up. Unfortunately, overviews can mix up the different types of search systems and increase the confusion instead of clarifying the situation.

Paul Anthony, on the blog webdistortion.com, describes a number of search engines and names some sites that use the systems. He lists the individual search engines with a brief write up on each, along with a video showing how each engine operates; a minimal query sketch follows the list. Among those included are:

  • Constellio, www.constellio.com. The system is, according to Constellio, the “first complete open source enterprise content search solution.”
  • Search Blox, www.searchblox.com. The system is built around Lucene.
  • Sphinx, http://sphinxsearch.com. The system is “an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind.”
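To give a sense of how lightweight these engines can be to query, here is a sketch against Sphinx, which exposes a SQL-like dialect (SphinxQL) over the MySQL wire protocol, by default on port 9306. The index name and the assumption that a populated searchd instance is already running are mine, not the article’s.

```python
# Query a running Sphinx searchd through SphinxQL, Sphinx's SQL-like
# dialect served over the MySQL wire protocol (default port 9306).
# Assumes a searchd instance with an index named "documents" exists.
import pymysql

connection = pymysql.connect(host="127.0.0.1", port=9306, user="", password="")
try:
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT * FROM documents WHERE MATCH('open source search') LIMIT 10"
        )
        for row in cursor.fetchall():
            print(row)
finally:
    connection.close()
```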

The author is critical of how search is implemented on what seems to be a large number of Web sites. He writes:

typically search is one of the most poorly implemented pieces of technology on a site, with developers opting for the standard out of the box solution which comes with most modern content management systems – and in many cases doesn’t do justice to your content. I thought I’d take a look at what other enterprise level and open source search engines [are] out there to find and index the information on your site faster, and provide users with a deeper, more relevant result set.

The write up does include some red herrings; for example, Coveo. The company has emphasized customer service applications, not search, if I recall the PR person’s description of the “new direction” for Coveo. Also, I expect the investors to be somewhat surprised to see Coveo listed as an open source search system. You can download a version of Coveo that is limited to the number of documents on my iPad. To get the “real deal,” you have to pay a license fee for this proprietary system. There are also some unusual omissions. I don’t expect many of today’s analysts to be familiar with the Lucene-based Tesuji.eu system, but I do expect a reference to FLAX.

As with most write ups about search and retrieval, the attempts to explain, categorize, and clarify usually increase the confusion. This is, in my opinion, a highly desirable condition for the unemployed “real journalists,” the pundits, the failed CMS system administrators, and the majors in dance theory. Consultants in enterprise search have to come from somewhere other than computer science programs.

Stephen E Arnold, June 14, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

Facebook Face Play No Big Surprise

June 14, 2011

You might be living under a rock if you haven’t heard about Facebook’s newest addition to its social network: facial recognition software. That’s right, the beloved social network is building a database of its users’ faces and telling us it is all to make our lives easier. As discussed in “Facebook Quietly Switches on Facial Recognition Tech by Default,” the controversial feature allows users “to automatically provide tags for the photos uploaded” by recognizing the facial features of your friends from previously uploaded photos. Yet again, Facebook finds itself under fire for its laissez-faire attitude toward privacy.

This latest Facebook technology is being vilified. It has been called “creepy,” “disheartening,” and even “terrifying.” These are words usually reserved for the likes of Charles Manson or Darth Vader, not an online social network. The biggest backlash seems to come from the fact that the company didn’t “alert its international stalkerbase that its facial recognition software had been switched on by default within the social network.” This opt-out, instead of opt-in, approach is what is upsetting the masses. Graham Cluley, a UK-based security expert, says that “[y]et again, it feels like Facebook is eroding the online privacy of its users by stealth.”

To be fair, Facebook released a notice on The Facebook Blog in December 2010 that the company was unleashing its “tag suggestions” on United States users, and when you hear them describe the technology, it seems to be anything but Manson-esque. In fact, it invokes thoughts of Happy Days. They say that since people upload 100 million tagged photos every day, they are simply helping “you and your friends relive everything from that life-altering skydiving trip to a birthday dinner where the laughter never stopped.” They go as far as to say that photo tags are an “essential tool for sharing important moments” and facial recognition just makes that easier.

Google has also been working on facial recognition technology in the form of the smartphone app known as Google Goggles and its celebrity recognition feature. However, Google now claims to have halted the project because, as Google Chairman Eric Schmidt said, “[p]eople could use this stuff in a very, very bad way as well as in a good way.” See “Facebook’s Again in Spotlight on Privacy.”

So who’s right? Facebook by moving forward or Google by holding up its facial recognition technology?

It seems to me that Google is just delaying the inevitable. Let’s face it: as a Facebook user, my right to privacy may be compromised the second I sign up, in exchange for what Facebook offers.

Technology like facial recognition software is changing the social media landscape, and I suppose I should not be surprised when the company implements its newest creation even when it puts my privacy at risk.

Is it creepy?

Probably, and users should be given an opportunity to opt in, not out. Is it deplorable? No. It is our option to join, and Facebook is taking full advantage of it.

Jennifer Wensink, June 14, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

Open Search Server

June 14, 2011

TechWorld’s “Open Search Server Releases New Developer Preview” offers details on a preview of a new Apache Lucene-based search system.

Open Search Server is written in Java. The article reports that:

“Open Search Server can crawl file systems, databases and websites” and supports a wide variety of document formats. The preview includes “a new screenshot feature that captures screenshots of the Web pages being crawled, similar to the preview feature of a big name public search engine.”

The open source software offers companies independence in developing their information management strategies. In the “Cloud” era, how users search and retrieve documents will become a strategic concern.

If you are tracking open source search vendors, add this one to your list.

Stephen E Arnold, June 14, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

The Hoof Beats of Net Censorship

June 14, 2011

Iran Has Had Enough of the Internet,” writes TechEYE.net, while ComputerWorld reports “China Censors Web to Curb Inner Mongolia Protests.” Control seems inevitable in certain countries.

It’s tough to search when someone has stacked the deck. In China, the government has been blocking search terms and Facebook posts on subjects it wishes to keep quiet. Regarding the most recent actions, TechEYE.net explains:

The censorship comes after protests erupted in the region when an ethnic Mongolian shepherd was run over by an ethnic Han truck driver, according to human rights groups. Ethnic Mongolians in the region have taken to the streets, prompting authorities to declare martial law in some of the cities.

Meanwhile, Iran has decided to shut down the Internet altogether and replace it with its own system. Here’s a snippet we noted:

The plan is being drawn up by the country’s communications ministry. The idea seems to have the backing of Iran’s Supreme Leader Ayatollah Ali Khamenei and will help further encourage a certain flavor of Islamic moral values on Iran’s people. It will take two years for Iran’s government departments to be taken off the grid, but about 60 percent of the country’s homes and businesses will be connected to [the government’s] new net in much less time.

Will such censorship come to the US with a vengeance? I doubt it, but certain types of rational thinking yield some surprising outcomes. For an example, read Voltaire’s Bastards.

Cynthia Murrell, June 14, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

