Google Gets Sentimental

August 10, 2009

I got a briefing from a company called Lexalytics. The firm, as I recall, was explaining its sentiment based content processing technology. I thought it was interesting. I subsequently learned that Lexalytics’ system would be part of the Financial Times’s online service, but my recollection is fuzzy. I thought of this company when I learned about the Google patent application US20090193328, “Aspect Based Sentiment Summarization”. You can find this document at the ever so powerful USPTO via its patent search engine. The abstract for the patent application, which some wizards believe are little more than the equivalent of my mother’s making Christmas tree ornaments for her friends stated:

Reviews express sentiment about one or more entities. Phrases in the reviews that express sentiment about a particular aspect are identified. Reviewable aspects of the entity are also identified. The reviewable aspects include static aspects that are specific to particular types of entities and dynamic aspects that are extracted from the reviews of a specific entity instance. The sentiment phrases are associated with the reviewable aspects to which the phrases pertain. The sentiment expressed by the phrases associated with each aspect is summarized, thereby producing a summary of sentiment associated with each reviewable aspect of the entity. The summarized sentiment and associated phrases can be stored and displayed to a user as a summary description of the entity.

Now Lexalytics and other companies with sentiment sniffers are only part of what this document sparked in my mind. The other low voltage arc was in the Endeca “Guided Navigation” department of my addled goose brain. As I read the exciting patent document and its droll legalese, I realized that the Google is claiming that its performs the same magic that Orange Julius does when it mixes fruits in fruit shake.

Will Lexalytics and Endeca shiver their timbers? Nope. My hunch is that both companies will see their technology as light years ahead of the Google’s. I also assert that both companies will not see Google’s claims as having much impact on their enterprise and ecommerce content processing applications.

In my opinion, this type of “Google does not have what we have” thinking is going to lead to unfortunate circumstances and quickly.

Stephen Arnold, August 11, 2009

Written by Stephen E. Arnold · Filed Under EDiscovery, Enterprise, Google, News, Technology, Text analytics, Text processing | 2 Comments

Google Relationship Map

August 3, 2009

A happy quack to the reader who sent me a link to Muckety.com and its relationship map of Google. Same Googlers and former Googler whom I track appear on the map; for example, Anna Patterson (University of Illinois Ph.D., developer of Xift, Google inventor, one of the founders of Cuil.com) and the Digg-hyped Marissa Mayer(keeper of the user interface and authority on Internet anonymity).

But there are some omissions. You can click around as I did, and you may be able to nail down Steve Lawrence or Sanjay Ghemawat. Perfect? Nope. Useful. I think it is suggestive in light of IBM’s alleged “invention” of relationship maps discovered by processing data.

For the purposes of comparison, here’s the Cluuz.com map of Ms. Mayer:

I assume IBM’s relationship maps put these two free systems to shame.

Stephen Arnold, August 3, 2009

Written by Stephen E. Arnold · Filed Under News, Online (general), Privacy, Search, Text analytics, Text processing | 1 Comment

IBM Snags SPSS, May Be Bad Timing

July 29, 2009

IBM bought SPSS. Most third and fourth year statistics majors learn to love either SPSS or arch-rival SAS. MicrostAT just does not paddle fast enough for the serious stats whiz. You can read about the deal on the IBM Web site or on TechCrunch.

I liked the “Monster Merger” story. The guts of the deal are presented. For me the most interesting comment was:

IBM says it will continue to support and enhance SPSS technologies while allowing customers to take advantage of its own product portfolio. SPSS will become part of the Information Management division within the Software Group business unit, led by Ambuj Goyal, General Manager, IBM Information Management.

Right.

What I have not seen is a discussion of the SPSS text processing functions. IBM has its OmniFind and a legion of partners to deliver text processing functions. Then there is the Web Fountain system. You do remember Web Fountain, don’t you. The brainiacs at Almaden continue to labor away in text processing.

Now IBM gets PASW which counts, categorizes, and performs other content processing operations. SPSS bought Lexiquest and has added functionality since that deal in 2002.

The plumbing for SPSS text processing has these components:

SPSS, like IBM, requires a commitment from a licensee. IBM may be joining the party a bit late. The shift to lighter weight analytic tools is underway. Newcomers like Clarabridge have been holding their own. SAS’s purchase of Teragram and its open sourcing some of Teragram’s software makes it clear that the good old days may be receding in the rear view mirror. SPSS can be a real resource hog. That should make IBM happy. IBM loves to sell consulting but a close second is selling hardware and engineering support. SPSS has not made the leap to Web services.

In short, I think the text processing components of SPSS may get lost and quickly within the massive IBM organization. Furthermore, this deal may have been made at the right time for SPSS and maybe the wrong time for IBM. Just my opinion.

Stephen Arnold, July 29, 2009

Written by Stephen E. Arnold · Filed Under Business strategy, Enterprise, News, Text analytics, Text processing | 1 Comment

Kapow Technologies

July 17, 2009

With the rise of free real time search systems such as Scoopler, Connecta, and ITPints, established players may find themselves in shadows. Most of the industrial strength real time content processing companies like Connotate and Relegence prefer to be out of the spotlight. The reason is that their customers are often publicity shy. When you are monitoring information to make a billion on Wall Street or to snag some bad guys before those folks can create a disruption, you want to be far from the Twitters.

A news release came to me about an outfit called Kapow Technologies. The company described itself this way:

Kapow Technologies provides Fortune 1000 companies with industry-leading technology for accessing, enriching, and serving real-time enterprise and public Web data. The company’s flagship Kapow Web Data Server powers solutions in Web and business intelligence, portal generation, SOA/WOA enablement, and CMS content migration. The visual programming and integrated development environment (IDE) technology enables business and technical decision-makers to create innovative business applications with no coding required. Kapow currently has more than 300 customers, including AT&T, Wells Fargo, Intel, DHL, Vodafone and Audi. The company is headquartered in Palo Alto, Calif. with additional offices in Denmark, Germany and the U.K

I navigated to the company’s Web site out of curiosity and learned several interesting factoids:

First, the company is a “market leader” in open source intelligence. It has technology to create Web crawling “robots”. The technology can, according to the company, “deliver new Web data sources from inside and outside the agency that can’t be reached with traditional BI and ETL tools.” More information is here. Kapow’s system can perform screen scraping; that is, extracting information from a Web page via software robots.

Second, the company offers what it calls a “portal generation” product. The idea is to build new portals or portlets without coding. The company said:

With Kapow’s technology, IT developers [can]: Avoid the burden of managing different security domains; eliminate the need to code new transaction; and bypass the need to create or access SOA interfaces, event-based bus architectures or proprietary application APIs.

Third, provide a system that handles content migration and transformation. With transformation an expensive line item in the information technology budget, managing these costs becomes more important each month in today’s economic environment. Kapow says here:

The module [shown below] acts much as an ETL tool, but performs the entire data extraction and transformation at the web GUI level. Kapow can load content directly into a destination application or into standard XML files for import by standard content importing tools. Therefore, any content can be migrated and synchronized to and between any web based CMS, CRM, Project Management or ERP system.

Kapow offers connections for a number of widely used content management systems, including Interwoven, Documentum, Vignette, and Oracle Stellent, among others.

Kapow includes a search function along with application programming interfaces, and a range of tools and utilities, including RoboSuite (a block diagram appears below):

Source: http://abss2.fiit.stuba.sk/TeamProject/2006/team05/doc/KapowTech.ppt

Written by Stephen E. Arnold · Filed Under Business strategy, EDiscovery, Feature, Online (general), Technology, Text analytics, Text processing | 5 Comments

Big Data, Big Implications for Microsoft

July 17, 2009

In March 2009, my Overflight service picked up a brief post in the Google Research Web log called “The Unreasonable Effectiveness of Data.” The item mentioned that three Google wizards wrote an article in the IEEE Intelligent Systems journal called “The Unreasonable Effectiveness of Data.” You may be able to download a copy from this link.

On the surface this is a rehash of Google’s big data argument. The idea is that when you process large amounts of data with a zippy system using statistical and other mathematical methods, you get pretty good information. In a very simple way, you know what the odds are that something is in bounds or out of bounds, right or wrong, even good or bad. Murky human methods like judgment are useful, but with big data, you can get close to human judgment and be “right” most of the time.

When you read the IEEE write up, you will want to pay attention to the names of the three authors. These are not just smart guys, these are individuals who are having an impact on Google’s leapfrog technologies. There’s lots of talk about Bing.com and its semantic technology. These three Googlers are into semantics and quite a bit more. The names:

Alon Halevy, former Bell Labs researcher and the thinker answering to some degree the question, “What’s after relational databases”?”
Peter Norvig, the fellow who wrote the standard textbook on computational intelligence and smart software
Fernando Pereira, the former chair of Penn’s computer science department and the Andrew and Debra Rachleff Professor.

So what do these three Googlers offer in their five page “expert opinion” essay?

First, large data makes smart software smart. This is a reference to the Google approach to computational intelligence.

Second, big data can learn from rare events. Small data and human rules are not going to deliver the precision that one gets from algorithms and big data flows. In short, costs for getting software and systems smarter will not spiral out of control.

Third, the Semantic Web is a non starter so another method – semantic interpretation – may hold promise. By implication, if semantic interpretation works, Google gets better search results plus other benefits for users.

Conclusion: dataspaces.

See Google is up front and clear when explaining what its researchers are doing to improve search and other knowledge centric operations. What are the implications for Microsoft? Simple. The big data approach is not used in the Powerset method applied to Bing in my opinion. Therefore, Microsoft has a cost control issue to resolve with its present approach to Web search. Just my opinion. Your mileage may vary.

Stephen Arnold, July 17, 2009

Written by Stephen E. Arnold · Filed Under Business strategy, Google, Microsoft, News, Online (general), Search, Semantic, Text analytics, Text processing | Comments Off on Big Data, Big Implications for Microsoft

Software Robots Determine Content Quality

July 15, 2009

ZDNet ran an interesting article by Tom Steinert-Threlkeld about software taking over human editorial judgment. “Quality Scores for Web Content: How Numbers Will Create a Beautiful Cycle of Greatness for Us All” is worth tucking into one’s folder for future reference.

Some background. Mr. Steinert-Threlkeld notes that the hook for his story is a fellow named Patrick Keane, who worked at the Google for several years. What’s not included in Mr. Steinert-Threlkeld’s write up is that Google has been working on “quality scores” for many years. You can get references to specific patent and technical documents in my Google monographs. I just wanted to point out that the notion of letting software methods do the work that arbiters of taste have been doing is not a new idea.

The core of the ZDNet story was:

Keane is at work on figuring out what will constitute a Quality Score, for every article, podcast, Webcast or other piece of output generated by an Associated Content contributor. If his 21st Century content production and distribution network can figure out how to put a useful rank on what it puts out on the Web then it can raise it up, notch by notch. This scoring comes right back to the Page Rank process that is at the heart of Google’s success as a search engine. “The great thing about Page Rank in Google ‘ s algorithm is … seeing the Web as a big popularity contest,’’ said Keane, in Associated Content’s offices on Ninth Avenue in Manhattan.

Mr. Steinert-Threlkeld does a good job of explaining how the method at Mr. Keane’s company (Associated Content) will approach the scoring issue.

My thoughts, before I forget them, are:

Digging into what Google has disclosed about its scoring systems and methods is probably a useful exercise for those covering Google and the businesses in which former Googlers find themselves. The key point is that the Google is leaning more heavily on smart software and less on humans. The implication of this decision is that as content flows go up, Google’s costs will rise less quickly than those of outfits such as Associated Content. Costs are the name of the game in my opinion.
Former Googlers are going to find themselves playing in interesting jungle gyms. The insights about information will create what I cool “Cuil situations”; that is, how far from the Googzilla nest with a Xoogler stray? My hunch is that Associated Content may find itself surfing on Google because Associated Content will not have the plumbing that the Google possesses.
Dependent services, by definition, will be subordinate to the core provider. Xooglers may be capping the uplift of their new employers who will find themselves looking at short term benefits, not the long term implications of certain methods.

I think Associated Content will be an interesting company to watch.

Stephen Arnold, July 15

Written by Stephen E. Arnold · Filed Under Business strategy, Google, News, Online (general), Publishing, Text analytics, Text processing | 2 Comments

The Gilbane Lecture: Google Wave as One Environmental Factor

July 14, 2009

Author’s note: In early June 2009, I gave a talk to about 50 attendees of the Gilbane content management systems conference in San Francisco. When I tried to locate the room in which I was to speak, the sign in team could not find me on the program. After a bit of 30 something “we’re sure we’re right” outputs, the organizer of the session located me and got me to the room about five minutes late. No worries because the Microsoft speaker was revved and ready.

When my turn came, I fired through my briefing in 20 minutes and plopped down, expecting no response from the audience. Whenever I talk about the Google, I am greeted with either blank stares or gentle snores. I was surprised because I did get several questions. I may have to start arriving late and recycling more old content. Seems to be a winner formula.

This post is a summary of my comments. I will hit the highlights. If you want more information about this topic, you can get it by searching this Web log for the word “Wave”, buying the IDC report No. 213562 Sue Feldman and I did last September, or buying a copy of Google: The Digital Gutenberg. If you want to grouse about my lack of detail, spare me. This is a free Web log that serves a specific purpose for me. If you are not familiar with my editorial policy, take a moment to get up to speed. Keep in mind I am not a journalist, don’t pretend to be one, and don’t want to be included in the occupational category.

Here’s we go with my original manuscript written in UltraEdit from which I gave my talk on June 5, 2009, in San Francisco:

For the last two years, I have been concluding my Google briefings with a picture of a big wave. I showed the wave smashing a skin cancer victim, throwing surfer dude and surf board high into the air. I showed the surfer dude riding inside the “tube”. I showed pictures of waves smashing stuff. I quite like the pictures of tsunami waves crushing fancy resorts, sending people in sherbert colored shirts and beach wear running for their lives.

Yep, wave.

Now Google has made public why I use the wave images to explain one of the important capabilities Google is developing. Today, I want to review some features of what makes the wave possible. Keep in mind that the wave is a consequence of deeper geophysical forces. Google operates at this deeper level, and most people find themselves dealing with the visible manifestations of the company’s technical physics.

Source: http://www.toocharger.com/fiches/graphique/surf/38525.htm

This is important for enterprise search for three reasons. First, search is a commodity and no one, not even I, find key word queries useful. More sophisticated information retrieval methods are needed on the “surface” and in the deeper physics of the information factory. Second, Google is good at glacial movement. People see incremental actions that are separated in time and conceptual space. Then these coalesce and the competitors say, “Wow, where did that come from?” Google Wave, the present media darling, is a superficial development that combines a number of Google technologies. It is not the deep geophysical force, however. Third, Google has a Stalin-era type of planning horizon. Think in terms of five years, then you have the timeline on which to plot Google developments. Wave, in fact, is more than three years old if you start when Google bought a company called Transformics, older if you dig into the background of the Transformics technology and some other components Google snagged in the last five years. Keep that time thing in mind.

First, key word search is at a dead end. I have been one of the most vocal critics of key word search and variants of that approach. When someone says, “Key word search is what we need,” I reply, “Search is dead.” In my mind, I add, “So is your future in this organization.” I keep my parenthetical comment to myself.

Users need information access, not a puzzle to solve in order to open the information lock box. In fact, we have now entered the era of “data anticipation”, a phrase I borrowed from SAS, the statistics outfit. We have to view search in terms of social analytics because human interactions provide important metadata not otherwise obtainable by search, semantic, or linguistic technology. I will give you an example of this to make this type of metadata crystal clear.

You work at Enron. You get an email about creating a false transaction. You don’t take action but you forward the email to your boss and then ignore the issue. When Enron collapsed, the “fact” that you knew and did nothing when you first knew and subsequently is used to make a case that you abetted fraud. You say, “I sent the email to my boss.” From your prison cell, you keep telling your attorney the same thing. Doesn’t matter. The metadata about what you did to that piece of information through time put your tail feather in a cell with a biker convicted of third degree murder and a prior for aggravated assault.

Got it?

Written by Stephen E. Arnold · Filed Under Conferences, Database, Feature, Google, Online (general), Real time search, Search, Technology, Text analytics, Text processing | Comments Off on The Gilbane Lecture: Google Wave as One Environmental Factor

Overflight for Attensity

July 8, 2009

Short honk: ArnoldIT.com has added Attensity to its Overflight profile service. You can see the auto generated page here. We will be adding additional search and content processing companies to the service. No charge, and this is a version of the service I use when those who hire the addled goose to prepare competitive profiles. I have a list of about 350 search and content processing vendors. I will peck away at this list until my enthusiasm wanes. If you want a for fee analysis of one of these companies, read the About section of this Web log before contacting me. Yep, I charge money for “real” analysis. Some folks expect me to survive on my good looks and charming personality. LOL.

Stephen Arnold, June 8, 2009

Written by Stephen E. Arnold · Filed Under News, Online (general), Technology, Text analytics, Text processing | Comments Off on Overflight for Attensity

Google Gestation Period

July 7, 2009

I went through my notes about the Guha patent documents. These were published in February 2007. BearStearns published my analysis of these documents in May 2007. I am not sure these are available to the public, but I did describe the Programmable Search Engine invention in my Google Version 2.0 study which came out in September 2007. The Google Squared service and its query “digital camera” replicates the exemplary item in the Guha patent document. Several observations:

My 2005 assertion that the Google gestation period is about four years. There is a two year ramp period inside the firm during which time the technology is shaped and then, if deemed patentable, submitted to the USPTO and other patent bodies.
After the patent document is published like the Guha February 2007 PSE patents a two year maturing and deployment process begins.

The appearance of the Google Squared service as a beta marks the Darwinian field testing. The age of semantics is now officially underway. You can read about Google’s methods in my trilogy The Google Legacy (2005), Google Version 2.0 (2007), and Google: The Digital Gutenberg (2009). The 2007 and 2009 studies provide some research data germane to those who want to surf on Google. Yep, that the source of my “wave” analogies and the injunction at the end of my Google talks to “surf on Google”.

What’s next? Wait for my newest monograph on time in search and content. I find it easier to let research and content analysis illuminate the would and could of the GOOG.

Stephen Arnold, June 7, 2009

Written by Stephen E. Arnold · Filed Under Business strategy, Google, News, Semantic, Technology, Text analytics, Text processing | Comments Off on Google Gestation Period

Google and Scientific Tagging

June 28, 2009

In my talk on June 26, 2007 for NFAIS, a question came from one of the participants in the Webcast of my presentation. A person wanted to know if Google Scholar tagged documents with scientific and other types of more formal language. The example was “heart attack” or “myocardial infarction”. I pointed the questioner to Big Google and this query: backpain. Now scroll to the bottom of the page, and you will see these added features:

This is a component of “universal search” so you see videos, categorized results, and the more precise medical term “fibromyalgia”. My point was the Google has the capability of providing these types of added value tags to the content in Google Scholar and to Google Books, for that matter. So far for public access, more sophisticated content processing outputs are not part of these two services; that is, Google Scholar or Google Books. If you know that Google is adding more sophisticated features to these services, please, use the comments section of this Web log to alert me. As Google grows larger and changes, I have a tough time keeping track of Mother Google’s knitting. People do seem to be resonating with the notion of surfing on Google. I have accepted an invitation to give a talk at the Magazine Publishers Association shindig in New York this fall. The topic? Surfing on Google. It’s not nice to fool, Mother Google.

Stephen Arnold, June 28, 2009

Written by Stephen E. Arnold · Filed Under Google, News, Text analytics, Text processing | 1 Comment

« Previous Page — Next Page »

Search the site
Subscribe to Beyond Search
Feature archive
News archive

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.

Categories
- 3D-Printing
- Acquisition
- Advertising
- Aggregation
- AI
- Alexa
- algorithms
- Amazon
- Amazonia
- Analytics
- Appliance
- Applications
- Audio
- Augmented Reality
- Big data
- Bing
- Bitcoin
- Bitext
- Book review
- Business intelligence
- Business process
- Business strategy
- Censorship
- Cloud computing
- Company Profile
- Conferences
- Connectors
- Consulting
- Consumer
- Content processing
- Copyright
- Corporate Concerns
- Cost
- Crawl
- Crowdfunding
- cryptocurrency
- Customer support
- Cyber OSINT
- cybercrime
- cybersecurity
- Dark Web
- DarkCyber
- Data
- Data mining
- Database
- Deepfakes
- Digital Assistant
- Digital Library
- E2EE
- ECommerce
- EDiscovery
- Editorial opinion
- Education
- Emoticons
- Enterprise
- Enterprise search
- Entity extraction
- Ethics
- Facebook
- Faceted search
- Factualities
- Feature
- Federated search
- Financial
- Fogint
- Google
- Governance
- Government
- Hackers
- healthcare
- IBM Watson
- Image search
- Indexing
- Infrastructure
- Innovation
- Integration
- intelware
- Interface
- Internet
- Interview
- Investment
- law enforcement
- Legal matters
- Library automation
- Management
- Marketing
- Mathematics
- Metadata
- Microsoft
- Mobile
- Natural language processing
- News
- NGIA
- Online (general)
- Open Access
- Open source
- OSINT
- Osint Radar
- Overflight
- Palantir
- Patents
- Personnel
- Podcast
- Policeware
- Portals
- Predictive coding
- Privacy
- Profile
- Publishing
- Quotation
- Real time search
- Reference tool
- Rich media
- Robot Writer
- Search
- Search enabled applications
- search engine
- Search quality
- Security
- Semantic
- Sentiment analysis
- SEO
- SharePoint
- Short Honks
- Smart Technology
- Social
- Social Media
- software
- Statistics
- Taxonomy
- Technology
- Text analytics
- Text processing
- Tools
- Tor
- Training
- Translation
- Twitter
- Uncategorized
- Unstructured Data
- User experience
- User Interface
- Vertical search
- Video
- visualization
- Voice search
- Voice technology
- Web 3
- Web Services
- Webinar
- Windows
- Work flow
- XML
- Yahoo

Beyond Search

Google Gets Sentimental

Google Relationship Map

IBM Snags SPSS, May Be Bad Timing

Kapow Technologies

Big Data, Big Implications for Microsoft

Software Robots Determine Content Quality

The Gilbane Lecture: Google Wave as One Environmental Factor

Overflight for Attensity

Google Gestation Period

Google and Scientific Tagging

Search the site

Categories

Archives

Recent Posts

Meta

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Search the site

Categories

Archives

Recent Posts

Meta