Google and Semantic Search

March 15, 2012

The Wall Street Journal certainly has a scoop if one has been ignoring Google’s actions over the last five or six years. For a traditional “real” news publication owned by News Corp., the newspaper knows how to generate what I call “faux excitement.” The for-fee version of the Wall Street Journal story is at http://goo.gl/DnRrP, although the link may go dead in a New York minute.

You will want to snag a copy of the dead tree edition of the March 15, 2012, newspaper. Turn to Section B1 and read “Google Gives Search a Refresh.” If you don’t have an online subscription to Mr. Murdoch’s favorite newspaper, click here.

I found the write up bittersweet. An era has ended at the Google. Google is moving into the choppy waters of “smart” search. Others have been in the kayaks trying to navigate meaning for a long time. Perhaps the best-known player is Autonomy, which is now the “baby tiger” at Hewlett-Packard. Google wants to skip the baby tiger metaphor and jump to the semantic shark.

My research suggests that Google has been grinding away at semantic search for at least a decade. Even then there were signals that Google wanted to get beyond its “clever” linking method, among them its interest in the semantic techniques of Oingo (Applied Semantics). (Notice the word “semantics” in the company name?)

Then Google took a couple of steps forward when it landed the Transformics technologies and hired Dr. Ramanathan Guha. You can get the rundown on Dr. Guha’s semantic leanings when you work through the hits for this query on Google: Ramanathan Guha semantic Web. No quotes required. Dr. Guha is the wizard behind the Programmable Search Engine, which I described at some length in Google Version 2.0: The Calculating Predator, published by the UK outfit Infonortics five years ago. The monograph may still be in print, and if you can snag a copy, you will see how Google’s wizard explains a system and method to populate “fact tables” and perform other feats of semantic legerdemain. The Wall Street Journal focuses on Google’s acquisition of Metaweb Technologies, which is more along the lines of a complementary content or fact generating system. Google has a tendency to “glue” technologies together, not toss the shark technologies out with the bathwater.

The write up is one of those fear-uncertainty-doubt maneuvers which technology companies enjoy. “Real” journalists are too savvy to fall for the shiny lures. The persistent reader will learn that there is no release date for the new Google search. This surprised me because I was sure I read, and later heard, that Google version 2.0 was Google Plus, not plain old search with some WolframAlpha.com-like touches and Blekko nuances stirred in for enhanced flavor. I must admit I was confused by a news story written in the present tense that is really about search advances which will arrive at an indeterminate time in the future, maybe tomorrow, maybe in September when the leaves turn.

The story suggests that Google is making changes because of Microsoft Bing, Apple’s voice search, or Facebook, which has no search service of much consequence. My hunch is that Google is making changes to search for one reason: ad revenue via traditional browser based search is softening. This is bad news for anyone dependent on online advertising revenue to pay for airplanes, Davos visits, and massive television and print advertising. Forget the competitors; Google has to do something that works to pump up margins and generate massive revenue. After more than a decade of trying to diversify its revenue, Google is under the gun. If Google’s magic touch were actually working, the company would be rolling in dough from multiple revenue streams. Where is the payoff from appliances, enterprise sales, and me-too services, which have had essentially zero impact on companies like Apple, Facebook, and Microsoft?

Google’s PR thrust to focus attention on how it will improve search comes too quickly after Google got “real” journalists to believe that Google 2.0 was its “social” services. Well, how has that worked out for Google? I wrote about James Whittaker’s explanation of “Why I Left Google.” If you haven’t read the Whittaker write up, click here. The passage I noted was:

I couldn’t even get my own teenage daughter to look at Google+ twice, “social isn’t a product,” she told me after I gave her a demo, “social is people and the people are on Facebook.” Google was the rich kid who, after having discovered he wasn’t invited to the party, built his own party in retaliation. The fact that no one came to Google’s party became the elephant in the room.

Net net: Google has been in the semantic game a long time. Semantic technology is already in operation at Google, but only as plumbing. Now Google wants to expose the pipes and drains.

The reason?

Google hopes semantics will give it more hooks on which to hang advertising messages. Without something new, revenue growth at Google may degrade at a time when Apple, Facebook, and Microsoft continue to grow. The unthinkable? Nope, the reality.

Stephen E Arnold, March 15, 2012

Sponsored by Pandia.com

Business Intelligence in Tweets?

March 15, 2012

Six-year-old Twitter has a lot of credibility when it comes to knowing what people want. The social-networking site is serving up around 350 million tweets a day, and that number is constantly growing. For businesses, that number means real-time analytics and activity from potential and current customers.

Mike Brown, current director of corporate development at Twitter, recently spoke at the CITE Conference and commented that the company is “the ultimate business intelligence tool.” This is because of Twitter’s ability to give a peek into what customers and competitors are saying, and the company plans to get more innovative on tracking activity and providing data analytics, with plans for advanced GPS sensors and targeted proactive advertisements for users.

A recent Computerworld article, “Twitter Exec Calls Tweets the ‘Ultimate Business Intelligence Tool,’” provides more insight from Brown on the importance of using Twitter in the business environment. The article states:

‘One of my favorite Twitter accounts, …because he just joined recently, is Rupert Murdoch,’ Brown said. ‘Whether you subscribe to his politics or not, the guy tells it like he thinks it and you really get that sense when you read his tweets.’ ‘I think whether you’re a brand or a marketer or a small business owner, [you need] to talk with an authentic voice that feels like your own, [one] your customers know,’ he said. ‘Your customer’s BS meter is pretty good. Don’t hand off your Twitter to your PR agency or even an intern who’s going to be with your business for a short while.’

Interesting advice. So, is the “ultimate business intelligence tool” statement by Brown self-serving? Nah, we think not. The assumption is simple, fair, and par for the course. Note that raw tweet data can now be bought from companies like DataSift and Gnip. Everyone is starting to recognize the impact of tweets.

Andrea Hayden, March 15, 2012


Linguamatics Releases New Cloud Based Text Mining Solution

March 15, 2012

Search appears to be a transparent technology, but in reality it is not. With massive amounts of unstructured information being released into cyberspace, there is a growing need for solutions to sort it. Enter text mining. Text mining allows users to extract value from vast amounts of unstructured textual data.

Business Wire recently reported on the release of a new text mining platform by Linguamatics in the news release “Linguamatics Puts Big Data Mining on the Cloud.”

According to the release, in response to the industry trend of moving software applications onto the cloud, Linguamatics has launched the first NLP-based, scalable text mining platform on the cloud.

The article states:

The new service builds on the successful launch by Linguamatics last year of I2E OnDemand, the Software-as-a-Service version of Linguamatics’ I2E text mining software. I2E OnDemand proved to be so popular with both small and large organizations, that I2E is now fully available as a managed services offering, with the same flexibility in choice of data resources as with the in-house, Enterprise version of I2E. Customers are thus able to benefit from best-of-breed text mining with minimum setup and maintenance costs.

We are very excited about the possibilities of text mining on the cloud as well as Linguamatics’ ability to get its software up and running quickly.

Our view is that Linguamatics is an outfit worth monitoring.

Jasmine Ashton, March 15, 2012


Attensity Election Forecasts

March 14, 2012

Is the prediction half right or half wrong? Sci-Tech Today seems to opt for optimism with “Twitter Analysis Gets Elections Half Right.” Attensity attempted to demonstrate its social analytics chops by forecasting Super Tuesday Republican primary results using tweets. Its predictions were about 50% accurate; isn’t that about what you’d get flipping coins?

A lack of location data seems to be the reason Attensity’s predictions were less precise than hoped. Writer Scott Martin reveals:

Part of the problem lies in a lack of location-based data about Twitter users’ tweets. Such information is ‘scarce’ on Twitter, says Michael Wu, principal scientist of analytics for Lithium, a social-analytics firm. That’s because Twitter users would have to turn on the ‘location’ feature in their mobile devices. A vast pool of location-based tweets would enable analytics experts to better connect tweets to where they come from across the nation. In the case of Super Tuesday, that would mean more localized information on tweets about candidates.

Another roadblock to accurate prediction lies in identifying when multiple tweets come from the same enthusiastic tweeter, or are spam-like robo-tweets. Furthermore, there is no ready way to correlate the expression of opinions with actions, like actually voting. It seems that this analytic process has a long way to go. It also seems that half right is close enough to spin marketing horseshoes.
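The repeat-tweeter roadblock is easy to see in a toy tally. The Python sketch below is a hypothetical illustration, not Attensity’s method: the handles, candidates, and counts are invented, and it assumes each tweet has already been tagged with the candidate it mentions. Counting raw tweets lets one prolific account swing the result; counting each author once per candidate does not.

```python
from collections import Counter

# Hypothetical sample of (author_handle, candidate_mentioned) pairs.
# Handles, candidates, and counts are invented for illustration only.
tweets = [
    ("@fan1", "Candidate A"),
    ("@fan1", "Candidate A"),
    ("@fan1", "Candidate A"),
    ("@fan1", "Candidate A"),
    ("@voter2", "Candidate B"),
    ("@voter3", "Candidate B"),
    ("@voter4", "Candidate B"),
]

def raw_counts(pairs):
    """Count every tweet, so one enthusiastic tweeter can dominate."""
    return Counter(candidate for _, candidate in pairs)

def unique_author_counts(pairs):
    """Count each (author, candidate) pair only once."""
    return Counter(candidate for _, candidate in set(pairs))

print(raw_counts(tweets).most_common(1))            # [('Candidate A', 4)]
print(unique_author_counts(tweets).most_common(1))  # [('Candidate B', 3)]
```

Deduplicating by author handles only the repeat-tweeter problem. It does nothing about robo-tweets spread across many accounts, and, as the paragraph above notes, it still cannot connect an expressed opinion to an actual vote.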

Serving several big-name clients, Attensity provides enterprise-class social analytics as well as industry solutions for vertical markets. The company prides itself on the accuracy and ease of use of its tools. My thought is that I will pick horses the old-fashioned way.

Cynthia Murrell, March 14, 2012


Working Smart: What If One Is Unsmart?

March 10, 2012

Short honk: I read “Work Smart, Not Hard – An Introduction To Google Analytics Dashboards.” Three times this week I have heard “work smart.” Maybe this is a meme designed to make those who can use an ATM perceive themselves as power users of advanced computing systems? Here’s the passage I noted:

[Google Analytics] saves me time, helps me look like I’m 100% on top of things when a client calls, and helps me add hours back into days that were previously spent hunting and pecking for information.

The idea is that a vendor creates a system which generates reports based on data. I do not trust data from sources which do not tell me who managed those data, what was done to them, when, and how. When I buy carrots, I check out what’s on offer. I cannot apply the same scrutiny to data from Google or other sources.

But the key point in the write up is this phrase:

helps me look like I’m 100% on top of things when a client calls

I think appearances are important, but the notion of helping a person appear to know something when that person may not know whereof he or she speaks is troubling to me.

As vendors put training wheels on software, systems, and services which “process” data and spit out answers, why are folks so eager to “look” smart? Why are those who are supposed to be data wizards so eager to give up the “hands in the dirt” approach which puts one in touch with the raw material?

I wonder if the clients know those who advise them are “working smart” by using systems that put up an appearance of insight when the reality may be quite different. ATM users may not qualify as data analysts. If those individuals had the requisite skills to make sense of unverifiable data, would there be a shortage of analytics professionals?

Stephen E Arnold, March 10, 2012


More Allegations about Fast Search Impropriety

March 8, 2012

With legions of Microsoft Certified Resellers singing the praises of the FS4SP (formerly the Fast Search & Transfer search and retrieval system), sour notes are not easily heard. I don’t think many users of FS4SP know or care about the history of the company, its university-infused technology, or the machinations of the company’s senior management and Board of Directors. Ancient history.

I learned quite a bit in my close encounters with the Fast ESP technology. No, ESP does not mean extrasensory perception. ESP allegedly meant the enterprise search platform. Fast Search, before its purchase by Microsoft, was a platform, not a search engine. The idea was that the collection of components would be used to build applications in which search was an enabler. The idea was a good one, but search based applications required more than a PowerPoint to become a reality. The 64-bit Exalead system, developed long before Dassault acquired Exalead, was one of the first next-generation, post-Google systems to have a shot at delivering a viable search based application. (The race for SBAs, in my opinion, is not yet over, and some search vendors, like PolySpot, are pushing in interesting new directions.) Fast Search was using marketing to pump up license deals. In fact, the marketing arm was more athletic than the firm’s engineering units. That, in my view, was the “issue” with Fast Search. Talk and demos were good. Implementation was a different platter of herring five ways.


Fast Search block diagram circa 2005. The diagram shows semantic and ontological components, an asserted information-on-demand capability, and content publishing functions, all in addition to search and retrieval. Similar systems are marketed today, but hybrid content manipulation systems are often still a work in progress in 2012. © Fast Search & Transfer

I once ended up with an interesting challenge resulting from a relatively large-scale, high-profile search implementation. Now you may have larger jobs than I typically get, but I was struggling with the shift from Inktomi to the AT&T Fast search system in order to index the public-facing content of the US federal government.

Inktomi worked reasonably well, but the US government decided in its infinite wisdom to run a “free and open competition.” The usual suspects responded to the request for proposal and statement of work. I recall that “smarter than everyone else” Google ignored the US government’s requirements.


This image is from a presentation by Dr. Lervik about Digital Libraries, no date. The slide highlights the six key functions of the Fast Search search engine. These are extremely sophisticated functions. In 2012, only a few vendors can implement a single system with these operations running in the core platform. In fact, the wording could be used by search vendor marketers today. Fast Search knew where search was heading, but the future still has not arrived because writing about a function is different from delivering that function in a time and resource window which licensees can accommodate. © Fast Search & Transfer

Fast Search, with the guidance of savvy AT&T capture professionals, snagged the contract. That was a fateful procurement. Fast Search yielded to a team from Vivisimo and Microsoft. Then Microsoft bought Fast Search, and the US government began its shift to open source search. Another consequence is that Google, as you may know, never caught on in the US Federal government in the manner that I and others assumed the company would. I often wonder what would have happened if Google’s capture team had responded to the statement of work instead of pointing out that the requirements were not interesting.


Big Data Excitement at the 2012 Strata Conference

March 8, 2012

Don’t get hit by a stray bullet at the big data corral. IT World examines “The Wild West of Big Data.” Fresh from this year’s Strata Conference in Santa Clara, journalist Brian Proffitt describes how the current hubbub around big data mirrors the open source environment of a decade ago (the sense of urgency around a rising technology) and how it doesn’t (the lack of a grass-roots community feel).

Excitement is understandable in this burgeoning field, and Proffitt felt the anticipation of profit as a “zing” in the air. However, he seems to long for the atmosphere of yore, when excited hackers fueled the advance of innovation for innovation’s sake, rather than the current domination of cloud advances by corporate types looking to make a buck. While he admits that companies acknowledge the open source contributions to their products, they usually do so by pointing out their own efforts to give back.

The article observes:

“Big data’s community is purely commercial and without the threat of a big competitor to stand in its way. In this sense, it is more of a gold rush than Linux ever was, because without the checks of the internal and external pressures that the early Linux community endured, there seems to be nothing that can get in big data’s way.

“Which may be why we are seeing, even now, signs from experts that are warning potential customers and the vendors willing to take their money to slow down and start really thinking about what they want to do.”

Excellent advice for any gold rush, we’d say. Proffitt feels the same, but observes that such voices of caution were in the minority among the conference’s speakers. No surprise there; who has time for the voice of reason during a stampede?

Cynthia Murrell, March 8, 2012


Big Data Bridges Diverse Fields

March 3, 2012

Solving the big data problem, and deciding what to do with the immense amount of unstructured information just sitting in cyberspace, is at the forefront of many minds.

The New York Times technology blog Bits recently reported on this issue in the article “IBM: Big Data, Bigger Patterns.”

According to the article, the recent explosion of information available on the Internet, paired with inexpensive computer hardware, has made it possible for enterprises to store huge amounts of unstructured data. Now that we know it is possible, the goal is to do it cost-effectively.

In order to remain cost sensitive, many companies are trying to find overlapping interests and commonalities between different fields.

The article states:

“The trend of looking for commonalities and overlapping interests is emerging in many parts of both academia and business. At the ultra small nanoscale examination of a cell, researchers say, the disciplines of biology, chemistry and physics begin to collapse in on each other. In a broader search for patterns, students of the statistical computing language known as R have used methods of counting algae blooms to prove patterns of genocide against native peoples in Central America.”

While the cross-pollination of various business interests is very exciting, we’re interested to see whether it leads to complications down the road.

Jasmine Ashton, March 3, 2012


Protected: Ikanow: Creating Pathways through Information

March 1, 2012


Hadoop Technology: Calling All Mathematicians!

February 26, 2012

Scalability and big data solutions are not simply buzzwords thrown around the search industry. Both are key items in assessing the value of platforms, and both are key reasons users are drawn to Hadoop technology.

However, the fact that Hadoop is picking up steam poses a major problem for those attempting to find talent to work with the technology. People experienced in Hadoop are hard to come by. Cloudera, IBM, Hortonworks, and MapR are all investing in Hadoop training programs, choosing to invest in internal candidates rather than trying to hire new talent. A related article on CIO.in, “Hadoop Wins Over Enterprise IT, Spurs Talent Crunch,” asserts on the topic:

‘We originally thought we needed to find a hardcore Java developer,’ Return Path’s Sautins says. But in reality, the talent that’s best suited for working with Hadoop isn’t necessarily a Java engineer. ‘It’s somebody who can understand what’s going on in the cluster, is interested in picking up some of these tools and figuring out how they work together, and can deal with the fact that pretty much everything in the Hadoop ecosystem is not even a 1.0 release yet,’ Sautins says. ‘That’s a real skill set.’

The problem of finding talent could eventually limit the continued adoption of Hadoop technology. Search analytics is now opening doors for those with deep math skills and backgrounds in statistics and science. People with these basic skills can be taught how to use these tools and will be very valuable to the many companies adopting this technology.

Andrea Hayden, February 26, 2012

