
Will Facebook Go Steady with a Chinese Search Partner?

September 6, 2011

We wanted to document what we think is an important strategic rumor.

Rumors are circulating that Facebook is in talks to partner with Baidu, the Chinese leader in online search. Forbes reports on the latest in “Facebook and Baidu – Take that Google.” One has to wonder why Facebook is looking to a search company to gain ground in China. Not to be ignored is what Baidu hopes to get out of the deal.

The first and most important factor is that today Baidu commands 75+ percent of all keyword searches in China, and they are still growing. Baidu is expected to be bigger than Yahoo on a worldwide scale within one year, with a global footprint that will rival or exceed Google’s footprint in its key markets, North America and Europe. And everything that Baidu does is just fine with China. Imagine Facebook using Baidu’s compliance technology to put them immediately into the good graces of the Chinese government.

So perhaps Facebook has found a loophole around China’s strict government standards? Will Baidu find that Facebook’s momentum is too difficult to control, even on Chinese soil? We will see, but if the rumored alliance comes to fruition, this could be a very big deal – a very big, very profitable deal. Worth watching.

Emily Rae Aldridge, September 6, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Symantec and Clearwell Technology Push Forward

September 6, 2011

Social media participation is an increasingly valuable, and inescapable, tool. With social media widgets permeating the Internet, companies can only avoid participation by eschewing the Web altogether. That’s not an option for most.

However, tapping social media comes with a price, as the Brainyard reports in “Could Social Media Flub Cost You $4.3 Million?” That article examines a survey sponsored by Symantec which details the losses most companies experience using this medium.

A key component in these losses involves government regulations on business communications. Companies must retain such exchanges to comply with open records requests, industry regulations, and eDiscovery requests, explains The Var Guy in “Symantec Enterprise Vault 10 Handles Social Media Compliance.”

As the title suggests, writer Charlene O’Hanlon points to the latest edition of Symantec’s Enterprise Vault as a solution to the compliance problem:

Helping expand Enterprise Vault beyond its former boundaries is technology Symantec gained through its recent acquisition of legal discovery solutions provider Clearwell Systems. Clearwell’s eDiscovery Platform complements Enterprise Vault’s ability to capture information, tag specific records for future litigation and quickly search those records for relevant records by enabling customers to process, analyze and review those records for internal audits, legal eDiscovery and corporate governance.

We know that Clearwell’s technology is quite good, and can recommend it. The other technology we’re not so sure about, though. Shop around, but this may be your best bet to reduce compliance costs associated with social media.

Cynthia Murrell, September 6, 2011


Hadoop Gaining Ground on RDBMS Like a Smart Car Climbing Pike’s Peak

September 6, 2011

Open-source Apache Hadoop software is coexisting in the market with the more established relational database management systems (RDBMS), Computer World reports in “Hadoop Growing, Not Replacing RDBMS in Enterprises.” We learned:

Hadoop is designed to help companies manage and process petabytes of data. Much of the technology’s appeal lies in its ability to break up very large data sets into smaller data blocks that are then distributed across a cluster of commodity hardware for faster processing. Early adopters of the technology, including Facebook, Amazon, eBay and Yahoo, have been using Hadoop to store and analyze petabytes of unstructured data that conventional RDBMS setups couldn’t handle easily.
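
The split-and-distribute idea in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hadoop's API: the function names, the round-robin placement, and the use of threads to stand in for cluster nodes are all hypothetical choices made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Break a large list of records into smaller fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def assign_blocks(blocks, node_names):
    """Spread blocks across nodes round-robin (a hypothetical placement policy)."""
    placement = {name: [] for name in node_names}
    for index, block in enumerate(blocks):
        placement[node_names[index % len(node_names)]].append(block)
    return placement


def count_words(node_blocks):
    """Work done locally on one node: count the words in the blocks it holds."""
    counts = Counter()
    for block in node_blocks:
        for line in block:
            counts.update(line.lower().split())
    return counts


def run_job(records, node_names, block_size=2):
    """Split the data, process each node's blocks in parallel, merge the results."""
    blocks = split_into_blocks(records, block_size)
    placement = assign_blocks(blocks, node_names)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=len(node_names)) as pool:
        partials = list(pool.map(count_words, placement.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Calling `run_job(lines, ["node-a", "node-b"])` on a list of text lines returns one merged word count. The point of the sketch is that each node only ever touches its own blocks, so adding nodes shrinks the work each one has to do.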

Computer World’s review is not completely negative, but rather restrictive in our view. RDBMS has organizational inertia on its side, an obstacle any newcomer has to conquer. RDBMS is entrenched in the rigid world of transaction data, customer information, and call records. However, Hadoop is adept in creative sectors such as event data, search engine results, and text and multimedia content from social media sites. Security concerns are also cited, although as adoption becomes more widespread those concerns are sure to lessen.

Our view is that in the present financial environment, open source is likely to suffer severe pressures. Giant, for-profit companies will want to capitalize on open source goodness and then implement a fiercely commercial pricing model for services, training, consulting, engineering, and proprietary extensions. Big money will lure key developers, and the “community” may be subject to London, UK style dissension. Yikes!

Emily Rae Aldridge, September 6, 2011


Linguamatics Scores Big with Text Mining

September 6, 2011

Wouldn’t it be great if there were a way to sift through all the chatter on Twitter and other social media sites to get to the real meat and potatoes? What if companies could find the proverbial needle in the Twitter haystack? All this is being done by Cambridge-based Linguamatics, as reported in the Business Weekly article “Tweet Smell of Success.”

The small company (only 50 employees after expanding) caught the world’s attention due to their text-mining skills. Last year, using their search expertise, they were able to very accurately predict the outcome of an election based on the Tweets which occurred during a live, televised debate.

Their core technology was developed by the four original founding members. Three remain at the company. They have expanded rapidly in their ten years of business, and rely solely on income. They believe their success is due to their unique search approach.

David Milward, CTO and co-founder said: ‘We knew that language processing could get people relevant information much faster than traditional search methods. However, previous systems needed reprogramming for different questions: we wanted to give users the flexibility to extract any information they wanted.’

Linguamatics is just one of many emerging search management companies, each with its own niche. With business and technology constantly shifting to newer and faster methods of getting information, it is no surprise that businesses demand better search methods. More and more information is popping up within the internet, intranets, file-sharing and other data storage entities. Traditional brute force search looks less and less useful to the professionals in some of these hot new market sectors.

Catherine Lamsfuss, September 6, 2011


SharePoint Social Tools Get a Major Endorsement

September 6, 2011

Since we started focusing on SharePoint, we have reported that the collaborative content server was not designed for social networking. We have watched Microsoft listen to its users and add many social tools, but those tools still could not compare to the social networking giants. To our surprise, we dug this tidbit up from Infoworld.com: “If You Must Have In-House Social Tools, Go With SharePoint.”

The social tools include blogs, “I Like It” tags, notes, and profile pages. The article describes three things that can happen to these tools in a business: they can be abused, be ignored, or improve productivity. Any of these outcomes could occur, depending on the company’s culture and employees’ attitudes.

SharePoint administrators control a large portion of how the tools will be used, which can prevent them from being abused. One way to keep them from being ignored is to conduct training (the cure-all for many things).

Despite the positive endorsement, the article ends on this note which we think you may want to consider:

I’m not a fan of social networking tools at work. I believe it distracts people more than it provides value. Call me a dinosaur, but when I want to say something important to the entire company, I use this ancient system called email. Maybe I’m not a team player because I don’t like collaborating on documents; if I need your help on a document, I’ll email it to you and you can look it over.

Our opinion is that social networking has a time and a place, is beneficial, and should be taken in small quantities. This brings up the thought that SharePoint users need to search through large quantities of data, some of which will pose unusual challenges. We suggest that you explore SurfRay’s Ontolica to take the edge off the search.

Stephen E Arnold, September 6, 2011

SurfRay

Inteltrax: Top Stories, Aug 29 to Sept 2, 2011

September 5, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, pulling these stories from across a wide spectrum of analytic topics.

Our feature this week, “Definition of Big Data Evolving,” took an inside look at how customers, not designers, are sculpting what we will come to call “big data” in the future.

Another story, “JP Morgan Shows No Sign of Analytic Slowdown,” explains how JP Morgan cut its costs by investing in faster analytic tools.

Another interesting story, “Digital Reasoning Beefs up its Front Office,” showed how one of the business intelligence/data analytics world’s fastest risers is strengthening its leadership with an expert in healthcare. (Beyond Search will be running an interview with Dr. Ric Upton in a future issue.)

These stories and more made up our week as we follow the ever-evolving landscape of big data. Whether it’s executives changing titles or the changing terminology of the field, we’ve got our eyes on it all and will bring the latest scoop to readers.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax, September 5, 2011


Oracle Data Mining Update

September 5, 2011

The new Oracle Data Mining Update is generating buzz, including a piece by James Taylor titled “First Look – Oracle Data Mining Update.” Oracle Data Mining (ODM) is an in-database data mining and predictive analytics engine, which allows for the building of predictive models. Taylor highlights the features added in the latest version:

The fundamental architecture has not changed, of course. ODM remains a “database-out” solution surfaced through SQL and PL-SQL APIs and executing in the database. It has the 12 algorithms and 50+ statistical functions I discussed before and model building and scoring are both done in-database. Oracle Text functions are integrated to allow text mining algorithms to take advantage of them. Additionally, because ODM mines star schema data it can handle an unlimited number of input attributes, transactional data and unstructured data such as CLOBs, tables or views.

The ability of ODM to build and execute analytic models completely in-database is a real plus in the market. The software would be a good candidate for anyone interested in using predictive analytics to take advantage of their operational data.

Emily Rae Aldridge, September 5, 2011


Statisticians Weigh In on Big Data

September 5, 2011

The Joint Statistical Meetings, the largest assembly of data scientists in North America, provided fertile ground this summer for a survey by Revolution Analytics on the state of Big Data technologies. Revolution Analytics presents the results in “97 Percent of Data Scientists Say ‘Big Data’ Technology Solutions Need Improvement.”

As the headline suggests, the vast majority of these experts crave improvement in the field:

The survey revealed nearly 97 percent of data scientists believe big data technology solutions need improvement and the top three obstacles data scientists foresee when running analytics on Big Data are: complexity of big data solutions; difficulty of applying valid statistical models to the data; and having limited insight into the meaning of the data.

Results also show a lack of consensus on the definition of “Big Data.” Is the threshold a terabyte? Petabyte? Or does it vary by the job? No accepted standard exists.

Survey-takers were asked about their future use of existing analytics platforms: SPSS, SAS, R, S+, and MATLAB. Most respondents expected to increase use of only one of these, the open source R project (a.k.a. GNU S).

Revolution Analytics bases their data management software and services on the R project. The company also sponsors Inside-R.org, a resource for the R project community. I’d have to see the survey to know whether the emphasis they found on R was skewed, but let’s give them the benefit of the doubt for now.

Cynthia Murrell, September 5, 2011


Is the Open Source Community Getting More Fractious?

September 5, 2011

Do we sense some edginess in the “community” for open source search? TheServerSide.com declares, “Lucene Should Just Shut Up about Java 7.” This rude headline is a response to those who have written on Lucene’s side, such as The H Open’s “Java 7 Paralyses Lucene and Solr.”

ServerSide writer Richard Mayhew defends Oracle and its release of the open source Java 7. He admits there were problems, as there usually are with revised software, but says that they are being addressed. He feels Lucene should have tested Java 7 early on, and is overreacting to the problems:

It’s not like Java 7 was sneaking up on anyone, Oracle’s been doing webinars and presentations and press releases a lot lately to get the word out to whoever’s living under a rock and didn’t know. So Lucene should have said hey this thing’s coming, maybe we should try it. And when these geniuses did, it was too late to turn back without giving java 7 a huge black eye, which nobody needs.

We decline to weigh in on this particular debate. However, we do see it as a sign of a deeper, more critical problem in the open source fellowship. We think it warrants close observation.

Cynthia Murrell, September 5, 2011


Web Search Industry Challenged to Innovate

September 5, 2011

Google is the ultimate search solution, right? But have you noticed a curious lack of new ideas in the world of Web search? If so, you’re not alone. The criticism of Google won’t die.

Network World reports, “Computer scientist calls for Web search shake-up.” It seems that Oren Etzioni, who teaches computer science at the University of Washington, feels creative juices are in short supply in the Web search field. His commentary in Nature is only available to subscribers or those willing to pay per article, but writer Bob Brown provides a glimpse:

The main obstacle to progress ‘seems to be a curious lack of ambition and imagination,’ Etzioni writes in the piece.

Brown continues: “[Dr. Oren] Etzioni, who directs the University of Washington’s multidisciplinary Turing Center, calls on search engineers and others to ‘think outside the keyword search box.’” The search critic also works in the field of search and retrieval himself.

We learned:

[Dr. Etzioni] envisions more voice-based search that relies on increasingly intelligent computers like IBM’s Jeopardy-winning Watson and technologies like the Turing Center’s ReVerb software that can figure out how online information relates to each other.

Dr. Etzioni asserts that some changes will continue to be prompted by the move to smaller screens. Despite his criticism, he sees a bright future for the industry. He points to intelligent search developments in shopping search, like Decide.com, as an example of where we’re headed.

Eventually.

Cynthia Murrell, September 5, 2011

