Inteltrax: Top Stories, July 11 to July 15

July 18, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, particularly the explosion of social media-oriented business intelligence.

Our week jumped off with a lengthy feature article, “Facebook Becoming Data Mining Powerhouse,” which draws a surprising correlation between the recent Supreme Court data mining ruling and how Facebook’s advertising arm might use it to tighten its impressive lead on the rest of the advertising world.

Facebook’s bite-sized rival, Twitter, also got a lot of column space this week. First, in the article “Twitter Joins the Analytic Race,” which explored the micro-blog site’s recent purchase of the analytics house BackType and asked, “Why?”

Like an entertaining Twitter feed, the company popped up frequently over the week, next in the article, “Mining Twitter Mountain.”  This story focused on the numerous analytic apps and programs designed to pluck sentiment and cohesive data from millions of 140-character chunks of info.

It’s no secret that social media is producing more savory data for advertisers, investors and trend-spotters than ever thought possible. We were excited to see the social media companies themselves taking a role, but also intrigued about the analytic companies springing up around them, not unlike mining camps around an Old West gold strike. We’ll be watching these claims and others, rest assured.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax.

Sponsored by Pandia.com, publishers of Stephen E Arnold’s new monograph, The New Landscape of Enterprise Search

MarkLogic, FAST, Categorical Affirmatives, and a Direction Change

July 5, 2011

I awakened this morning (July 4, 2011) to a marketing Fourth of July boom. I received one of those ever-present LinkedIn updates putting a comment from the Enterprise Search Engine Professionals Group in front of me.


The MarkLogic positioning exploded on my awareness like a Fourth of July skyrocket’s burst.

Most of the comments on the LinkedIn group are ho-hum. One hot topic has been Microsoft’s failure to put much effort into its blogs about Fast Search & Transfer’s technology. Snore. Microsoft put down $1.2 billion for Fast, made some marketing noises, and had a fellow named Mr. Treo-something talk to me about the “new” Fast Search system. Then search turned out to be more like a snap-in, but without the simplicity of a Web part. Microsoft moved on, and search is there, but like Google’s shift to Android, search is not where the action is. I am not sure who “runs” the enterprise search unit at Microsoft. Lots of revolving-door action is my impression of Microsoft’s management approach in the last year.

The noise died down and Fast has become another component in the sprawling Shanghai of code known as SharePoint 2010. Making Fast “fast” and tuning it to return results that don’t vary with each update has created a significant amount of business for Microsoft partners “certified” to work on Fast Search. Licensees of the Linux/Unix version of ESP are now like birds pushed from the nest by an impatient mother.

New MarkLogic Market Positioning?

Set Microsoft aside for a moment and look at this post from a MarkLogic professional who once worked at Fast Search and subsequently at Microsoft. I am not sure how to hyperlink to LinkedIn posts without generating a flood of blue and white screens begging for log in, sign up, and money. I will include a link, but you are on your own.

Here’s the alleged MarkLogic professional’s comment:

Many organizations are replacing FAST with MarkLogic. MarkLogic offers a scalable enterprise search engine with all the features of FAST plus more…

Wow.

An XML engine with wrappers is now capable of “all” the Fast features. In my new monograph “The New Landscape of Enterprise Search”, I took some care to review information presented by Fast at CERN, the wizard lair in Europe, about Fast Search’s effort to rewrite Fast ESP, which was originally a Web search engine. The core was wrapped to convert Web search into enterprise search. This was neither quick nor particularly successful. Fast Search & Transfer ran into some tough financial waters, ended up the focus of a government investigation, and was quickly sold for a price that surprised me and the goslings in Harrod’s Creek.

You can get the details of the focus of the planned reinvention of the Fast system and the link to the source document at CERN, which I reference in my Landscape study. A rewrite indicates that, in 2007 and 2008, some functions were not performing in a manner acceptable to someone in Fast Search’s management. Then the acquisition took place. The Linux/Unix support was nuked. Fast under Microsoft’s wing has become a utility in the incredible assemblage of components that comprises SharePoint 2010. I track the SharePoint ecosystem in my information service SharePointSemantics.com. If you haven’t seen the content, you might want to check it out.


Big Data Inhabits New Space in the Virtual Market

June 20, 2011

We noticed this press release, “Big Data Mall Opens on the Informatica Marketplace” which was picked up by GlobeNewswire.

Big data is the buzzword du jour to describe large amounts of structured and unstructured information. The idea is that there is so much data to process that traditional methods fall short of delivering useful results at a reasonable cost in the time available for a 30-something decision maker to fill his or her role as a “decider.”

Companies like Informatica have made tackling this contemporary challenge a priority and continue to lead the way in data management solutions. Concurrent with the release of the Informatica 9.1 Platform, customers now have access to the newest addition to the Informatica Marketplace, the Big Data Mall.

The Marketplace allows customers and vendors to collaborate on the data management goals of modern commerce. The solutions they offer are referred to within the Marketplace as blocks, and related blocks are collected into sections known as malls. The release explains the new section:

“The Big Data Mall is a focal point for the industry in addressing the challenges and opportunities in Big Data,” said Tony Young, chief information officer, Informatica. “The new Mall debuts with 40 Blocks from Informatica and other leading vendors that map to the three major technology trends that define Big Data – Big Transaction Data, Big Interaction Data and Big Data Processing. New Blocks will be added going forward, as more and more innovative solutions emerge from the industry around Big Data.”

Will big data become the next frontier for findability or will predictive analytics become the next big thing?

Micheal Cory, June 20, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

Inteltrax: Top Stories, June 10 to June 16 2011

June 20, 2011

For readers of Beyond Search who have an interest in data fusion and analytics, the editor of Inteltrax.com, our Web log tracking this market, provided us with a rundown of last week’s top stories.—Stephen E Arnold

Inteltrax, the data fusion and business intelligence information service, captured four key stories germane to search this week.

First, “Analytics for Cities” points out the many ways companies like IBM are strengthening search to help city governments run more smoothly using business intelligence and analytics.

Second, “Don’t Forget India When Pushing Analytic Chips Toward China” takes an in-depth look at the burgeoning Chinese analytic and search market. However, those betting heavily on China are doing themselves a disservice by overlooking India.

Third, “South Africa Ready to Join Analytics Boom” shows how some are declaring South Africa dead when it comes to analytic search; however, a recent economic boom suggests otherwise.

Fourth, “The Rising Tide of Unstructured Data” http://inteltrax.com/2011/06/the-rising-tide-of-unstructured-data warns that unstructured data is a growing threat to the analytics and search communities alike.

Clearly, search professionals are being transformed by developments in predictive analytics, whether it is as far away as Africa or China, in their own city, or even in their own business’ mounting pile of info. These are subjects that affect our global business world on almost every level and deserve our attention.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax June 20, 2011

Thanks to Digital Reasoning, a sponsor of Beyond Search

Will Technology Actually Revolutionize News Gathering?

June 18, 2011

One of my for-fee columns for July 2011 focuses on AOL Patch.com. One could make the case that Patch.com is one of the efforts underway to revolutionize news.

Information, particularly news, is in one of those “best of times, worst of times” moments. Shocking event follows shocking story the way a ballpark wiener requires a white bread roll.

Some major formats, channels, and companies are failing. The content is not hot or not relevant. The price is too high for the perceived value or the hassle is too great for the payback.

We found Ushahidi.com’s “’What Really Happened?’: Using SwiftRiver to Help Confirm Newstips” thought-provoking. The story discusses the current failings of news outlets and the increasing efforts to use technological innovations to usher in a new era of reporting. The piece highlights the use of SwiftRiver, described on its site as:

“… a free and open-source suite of tools for managing excessive amounts of real-time data. Our architecture allows users to mashup real-time data from disparate media channels (Twitter, Email, SMS, JSON, XML or RSS/Atom), structures it, then offers methods for using the output.”
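Stripped to its essentials, that is a normalization pipeline: pull items from heterogeneous channels, map each one onto a common record, then hand the records off for filtering and verification. Here is a minimal sketch of that first step in Python, using only the standard library; the feed URL and field names are illustrative and not SwiftRiver’s actual schema or API.

```python
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Item:
    """A normalized record, regardless of the channel it came from."""
    source: str
    title: str
    link: str
    published: str


def fetch_rss(url: str, source: str) -> list[Item]:
    """Pull an RSS 2.0 feed and map each <item> onto the common record."""
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    items = []
    for node in root.iter("item"):
        items.append(Item(
            source=source,
            title=node.findtext("title", default=""),
            link=node.findtext("link", default=""),
            published=node.findtext("pubDate", default=""),
        ))
    return items


if __name__ == "__main__":
    # Hypothetical feed URL; swap in any RSS source you actually track.
    records = fetch_rss("http://example.com/news/rss.xml", source="example-rss")
    for r in records[:5]:
        print(r.source, "|", r.published, "|", r.title)
```

The same record shape would be filled in by separate adapters for Twitter, email, or SMS; once everything looks alike, the downstream verification and ranking steps do not care where an item came from.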

As someone who can easily lose hours poring over articles and posts from a host of media outlets, most of which originate beyond our borders, I was struck by the drive the author describes to rise above the idiocy of modern news media. I found this passage somewhat encouraging:

“Can we get a ‘people’s newswire’ based on eyewitness reports of newsworthy events? I believe we can – if we combine the automation of systems like Swiftriver, the data visualization possibilities of tools like Ushahidi, and the insight of trained reporters who can follow up on potential leads.”

We remain open minded. However, will technology replace the traditional approach to identifying a story, researching it, and then putting the write up through a process of nit-picking by colleagues and bosses? We don’t think so, but will it matter to those raised with smartphones and persistent distraction?

Stephen E Arnold, June 18, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

Attensity Marketing Themes Revealed

June 16, 2011

In my email was the Attensity newsletter, dated June 15, 2011. In addition to unaudited assertions like “the company’s most successful quarter to date,” with the word “successful” left undefined, there were some interesting hints about the company’s strategy.

First, the letter from the CEO (Ian Bonner) notes that the company has rolled out a Customer Command Center. The militaristic suggestion is fascinating. A number of search and content processing vendors offer dashboards, but the command center may be a telling new view of what text processing software is supposed to do.

Second, the company continues to emphasize the new release of the firm’s flagship, Attensity 6.0. You can get additional information about the system from a Web page with the title “BI Guys Watch Out, It’s Never Been This Easy.”

Third, Attensity continues to use webinars to drum up awareness and business. In what I find an interesting move, the webinar about “accuracy” now includes a companion white paper. You can read that document at this link. Registration appears to be simple once you provide the all-important contact information. The one-two punch of a webinar and a more traditional white paper may be one indication that hot new marketing methods require multiple payload delivery vehicles. I wanted to pick up on the “command center” metaphor.

Finally, a battle of assertions about sentiment appears to be escalating. I elected not to report about the misfires of one well-known vendor of sentiment solutions. It seems that Attensity has picked up some vibrations and responded with “When Does Sentiment NOT Matter?” The idea is that sentiment is not an all-purpose solution. I agree with Attensity. Perhaps some blogger or sentiment vendor will step up and rip the scrim from the reality of sentiment analysis.

Net net: the “command center” analogy strikes me as marking a step up in the marketing warfare for text analytics. One indicator will be the diffusion of the “command center” metaphor. Which competitor will be the first to embrace this Attensity-ism?

Stephen E Arnold, June 16, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

Recommind and Predictive Coding

June 15, 2011

The different winners of the Kentucky Derby, Preakness, and Belmont horse races cast some doubt on predictive analytics. But search and content processing is not a horse race. The results are going to be more reliable and accurate, or that is the assumption. One thing is 100 percent certain: a battle is brewing over the phrase “predictive coding” as a marketing label for math found in quite a few textbooks.

First, you will want to read US 7,933,859, “Systems and Methods for Predictive Coding.” You can get your copy via the outstanding online service at USPTO.gov. The patent was a zippy one, filed on May 25, 2010, and granted on April 26, 2011.

There were quite a few write-ups about the patent. We noted “Recommind Patents Predictive Coding” from Recommind’s Web site. The company has a Web site focused on predictive coding with the tag line “Out predict. Out perform.” A quote from a lawyer at WilmerHale announces, “This is a game changer in eDiscovery.”

Why a game changer? The answer, according to the news release, is:

Recommind’s Predictive Coding™ technology and workflow have transformed the legal industry by accelerating the most expensive phase of eDiscovery, document review. Traditional eDiscovery software relies on linear review, a tedious, expensive and error-prone process . . . . Predictive Coding uses machine learning to categorize and prioritize any document set faster, more accurately and more defensibly than contract attorneys, no matter how much data is involved.
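Set the trademark aside and the core mechanism is supervised text classification: reviewers code a seed set of documents, a model learns from those judgments, and the remaining documents are ranked by predicted relevance so the likeliest hits surface first. The sketch below is a generic illustration of that idea with scikit-learn, not Recommind’s patented workflow; the sample documents and labels are invented.

```python
# Generic relevance-ranking sketch; not Recommind's patented method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Seed set: documents a human reviewer has already coded (1 = responsive).
seed_docs = [
    "Q3 revenue forecast attached, please keep confidential",
    "lunch on friday?",
    "draft licensing agreement for review",
    "office fantasy football standings",
]
seed_labels = [1, 0, 1, 0]

# Unreviewed documents the model will prioritize.
unreviewed = [
    "revised licensing terms and revenue split",
    "parking garage closed next week",
]

# Learn term weights from the coded seed set.
vectorizer = TfidfVectorizer()
X_seed = vectorizer.fit_transform(seed_docs)

model = LogisticRegression()
model.fit(X_seed, seed_labels)

# Rank unreviewed documents by predicted probability of being responsive.
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```

The legal and commercial arguments are about workflow, defensibility, and the trademark, not about this arithmetic, which is standard machine learning fare.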

Some pushback was evident in “Predictive Coding War Breaks Out in US eDiscovery Sector.” The point in this write-up is that other vendors have been offering predictive functions in the legal market.

Our recollection is that a number of other outfits dabble in this technological farm yard as well. You can read the interviews with Google-funded Recorded Future and Digital Reasoning in my Search Wizards Speak series. I have noted in my talks that there seems to be some similarity between Recommind’s systems and methods and Autonomy’s, a company that is arguably one of the progenitors of probabilistic methods in the commercial search sector. Predecessors to Autonomy’s Intelligent Data Operating Layer exist all the way back to math-crazed church men in ye merrie old England before steam engines really caught on. So, new? Well, that’s a matter for lawyers, I surmise.

With the legal dust-up between i2 Ltd. and Palantir, two laborers on the margins of the predictive farm yard, legal fires can consume forests of money in a flash. You can learn more about data fusion and predictive analytics in my Inteltrax information service. Navigate to www.inteltrax.com.

Stephen E Arnold, June 15, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

Statistics for the Statistically Inclined

June 10, 2011

Due to my strong bias against everyone’s favorite search engine, it is difficult for me to become excited over new Google developments. However, having endured a number of statistics classes, I will certainly give credit where credit is due.

I was recently directed to Google Correlate and spent a solid twenty-five minutes entertaining myself testing statistical relationships. The offering compares an uploaded data set against real query data, courtesy of the search mogul. Google returns the queries whose Pearson correlation coefficient (r) with the target is nearest to 1.0, giving the user the most positively correlated queries. One can customize the results in a number of ways: for negative relationships, against a time series or regional location, for a normalized sine function or a scatter plot, and so on.
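The arithmetic behind the ranking is straightforward: compute Pearson’s r between the target series and each candidate query series, then sort. Here is a toy sketch, assuming made-up weekly series and Python 3.10’s statistics.correlation; Google, of course, matches against its own query logs rather than a hand-built dictionary like this one.

```python
from statistics import correlation  # Pearson's r, available in Python 3.10+

# Target series a user might upload (e.g., weekly case counts).
target = [3, 7, 12, 18, 25, 22, 15, 9]

# Toy stand-ins for query-frequency series over the same eight weeks.
query_series = {
    "flu symptoms": [4, 8, 13, 17, 24, 21, 14, 10],
    "sunscreen":    [25, 20, 14, 9, 4, 6, 12, 18],
    "tax deadline": [5, 5, 30, 6, 5, 5, 5, 5],
}

# Rank candidate queries by r, most positively correlated first.
ranked = sorted(
    ((correlation(target, series), query) for query, series in query_series.items()),
    reverse=True,
)
for r, query in ranked:
    print(f"r = {r:+.3f}  {query}")
```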

For any glazed-over eyes out there, the Web site sums up the intent this way:

“Google Correlate is like Google Trends in reverse. With Google Trends, you type in a query and get back a series of its frequency (over time, or in each US state). With Google Correlate, you enter a data series (the target) and get back queries whose frequency follows a similar pattern.”

Don’t worry, there is a tutorial.

It should also be noted that this service is tagged as “experimental.” I fear that, due to lack of popularity, it may dissolve in its very own time series in sad, monthly increments.

I imagine this tool is providing certain students some relief, but what of regular users?  In the words of the head gander, how many Google mobile users know what correlate means?  Without crunching the data, I think our r may be approaching -1.0.

Sarah Rogers, June 10, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion

ProQuest: A Typo or Marketing?

June 10, 2011

I was poking around with the bound phrase “deep indexing.” I had a briefing from a start up called Correlation Concepts. The conversation focused on the firm’s method of figuring out relationships among concepts within text documents. If you want to know more about Correlation Concepts, you can get more information from the firm’s Web site at http://goo.gl/gnBz6.

I mentioned to Correlation Concepts the work of Dr. Zbigniew Michalewicz in mereology and genetic algorithms and also referenced the deep extraction methods developed by Dr. David Bean at Attensity. I also commented on some of the methods disclosed in Google’s open source content. But Google has become less interesting to me as new approaches have come to my attention. Deep extraction requires focus, and I find it difficult to reconcile focus with the paint gun approach Google is now taking in disciplines far removed from my narrow area of interest.


A typo is a typo. An intentional mistake may be a joke or maybe disinformation. Source: http://thiiran-muru-arul.blogspot.com/2010/11/dealing-with-mistakes.html

After the interesting demo given to me by Correlation Concepts, I did some patent surfing. I use a number of tools to find, crunch, and figure out which crazily worded filing relates to other, equally crazily worded documents. I don’t think the patent system is much more than an exotic work of fiction and fancy similar to Spenser’s The Faerie Queene.

Deep indexing is important. Keyword indexing does not, in some cases, capture the “aboutness” of a document. As metadata becomes more important, indexing outfits have to cut costs. Human indexers are like tall grass in an upscale subdivision. Someone is going to trim that surplus. In indexing, humans get pushed out for fancy automated systems. Initially more expensive than humans, the automated systems don’t require retirement, health care, or much management. The problem is that humans still index certain content better than automated systems. Toss out high-quality indexing and insert algorithmic methods, and you get search results that can vary from indexing update to indexing update.
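The gap is easy to see in miniature: a literal keyword index finds only documents that contain the query term, while concept-level (“aboutness”) indexing can assign a document to a subject it never names. A toy sketch follows, with the concept vocabulary invented purely for illustration.

```python
# Toy contrast between literal keyword indexing and concept ("aboutness") indexing.
docs = {
    1: "the fed raised rates a quarter point to cool inflation",
    2: "our puppy chewed the couch again",
}

# Keyword index: a document either contains the literal term or it does not.
keyword_index = {}
for doc_id, text in docs.items():
    for term in set(text.split()):
        keyword_index.setdefault(term, set()).add(doc_id)

# Concept index: a human indexer (or a deeper automated method) maps surface
# terms to broader subjects. This tiny vocabulary is invented for illustration.
concept_vocab = {"monetary policy": {"fed", "rates", "inflation"}}
concept_index = {}
for doc_id, text in docs.items():
    terms = set(text.split())
    for concept, cues in concept_vocab.items():
        if terms & cues:
            concept_index.setdefault(concept, set()).add(doc_id)

print(keyword_index.get("monetary policy", set()))  # set(): no literal match
print(concept_index.get("monetary policy", set()))  # {1}: captured by aboutness
```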


Digital Reasoning Adds Chinese Support to Synthesys

June 9, 2011

“Digital Reasoning Introduces Chinese Language Support for Big Data Analytics,” announces the company’s press release. This latest advance from the natural language wizards acknowledges the growing prevalence of Chinese on the Web. The support augments their premier product, Synthesys:

“Synthesys can now analyze the unstructured data from a variety of sources in both English and Chinese to uncover potential threats, fraud, and political unrest. By automating this process, intelligence analysts can gain actionable intelligence in context quickly and without translation.”

This key development is the sort of thing that makes us view Digital Reasoning as a breakout company in content processing. Their math-based approach to natural language analytics puts them ahead of the curve in this increasingly important field. Synthesys has become an essential tool for government agencies and businesses alike.

This support for Chinese is just the beginning. Rob Metcalf, President and COO, knows that “the next generation of Big Data solutions for unstructured data will need to natively support the world’s most widely spoken languages.”

We’re delighted to see Digital Reasoning continue to excel.

Cynthia Murrell, June 8, 2011

Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion
