
Amazon, Google, and Android: The Stakes Rise

December 12, 2011

Amazon has an added incentive to compete more aggressively with Google. The kids in Mountain View have decided to emulate Amazon’s Prime free shipping model. Google is also collecting different features to build what looks to me like a similar, Amazon-type service. Hey, that’s innovation today. Live with it.

Is Google being exploited? It’s hard to imagine, but Datamation makes the case in “How Amazon Is Making a Sucker Out of Google.” Writer Mike Elgan insists that Google’s anything-goes Android policy, in which any company can use Android for any purpose, gives Amazon the leverage it needs to seriously wound the search giant.

Amazon’s Kindle Fire, which runs a version of Android, is positioned to undercut its competitors. Amazon is literally selling the Fire at a loss, intending to make up the money through easy tablet-based ordering from Amazon.com. This clever bit of manipulation will deal a blow to Android tablet makers and, by extension, to Google. It will also place Kindle Fires in many, many hands, which is where the real trouble starts, according to Elgan. The article asserts:

Amazon sells Kindles in order to sell products and services on the Amazon.com web site. And nearly all these products and services directly compete with Google’s. . . . “The Kindle Fire is the cloudiest of cloud tablets. To use the device is to become a user of Amazon’s cloud services. Cloud storage is free and unlimited for Kindle Fire users, which means there’s no reason to bother with Google’s cloud services.”

Perhaps Elgan is right, and Google should play some defense and change its licensing rules before it’s too late. Our view is that Chinese four-SIM phone manufacturers will be doing their own thing with Android too. The cat is out of the bag and eating tuna in Seattle.

Cynthia Murrell, December 12, 2011

Sponsored by Pandia.com

Taxonomy: More Marketing Craziness in Play?

December 12, 2011

For whatever reason, I have been picking up rumors, factoids, and complaints about the sales and marketing tactics of various search and content processing vendors. With holidays just around the corner, one would think that in the run-up to Kwanzaa, Christmas, Hanukkah, and Boxing Day folks would chill.

Ah, Agility!

The first dust up concerns tag lines. At issue is the word “agile”, which is becoming one of the more popular terms. I was in a meeting at which a heated discussion broke out about whose search and content processing system is agile. Endeca claims agility. I am not going to argue that a 13- or 14-year-old system cannot be agile, but in Internet years, some flexibility may have been lost. Run a query for “agile” and “search” and you get a hit to a recruitment firm, a marketing outfit, and something called the Tamilan Search Engine. I also spotted PolySpot, a French infrastructure, solutions, and applications company. The problem is that words are slippery. What are the synonyms for “agile”? I expect to see some of these turning up in 2012. How about gazelle search or spry search?

In tough economic times, financial pressures can distort business methods.

Circular Partnerships: Snakes Eating Their Tails

The second dust up concerns partnerships. I have been looking through the list of partners identified by such companies as Microsoft, WAND, and others. What I have discovered is that most of the partners are either household names like IBM or companies I have never heard of. Furthermore, when I dig into the partners’ names unfamiliar to me, I discover companies which are consulting firms or resellers who offer a roster of “stuff.” I understand the importance of amplifying a sales force. A partnership plan is little more than a way to reduce the cost of getting a lead and making a sales call. One of the experts in this game is the struggling giant Thomson Reuters. The company signs up partners when sales flag. In the taxonomy game, the partnerships have another twist. The linkages are circular. Antidot or Mondeca points to partners, and partners point to other search and content processing vendors, which point back to the original company. I find this confusing because “partner plays” are gaining momentum among specialist firms. I think the “partner” card is an indication that a search and content processing firm may be beating the bushes to get revenue. Just my opinion, of course.

Today, everything is for sale. Be wary if a pitch sounds too good to be true. Image source: http://asksistermarymartha.blogspot.com/2009_10_01_archive.html

Pitching Automation No Matter the Consequences

The third dust up involves taxonomies and is related to the circular nature of partnerships and financial pressures. There is now considerable contention in the market with regard to taxonomies. The word “taxonomy” itself is a shuttlecock with software badminton players swinging with abandon. The idea is simple: a hierarchical word list. But the simple idea now comes wrapped in hot new spins like ontology (not to be confused with the branch of metaphysics that deals with the nature of being), metatagging, and categorization.

On one side of the dictionary are those who want the software to discover the concepts, terms, and bound phrases. These terms are then automatically assigned to content processed by the system. If this sounds like the Bayesian magic associated with Hewlett Packard Autonomy or Recommind, you are on the money. There are alternative approaches which have considerable payoff. A good example is the work done by Tim Estes and his team at Digital Reasoning, a firm which received financial goodness from Silver Lake Sumeru. The idea is that humans play either a modest role or no role at all. Because of the volume of data flowing through a system, human-intermediated systems struggle to keep pace with the fluidity of human discourse. On one side, therefore, is automation. For simplicity’s sake, let’s call this the Google approach.
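To make the automation camp’s idea concrete, here is a minimal sketch of one way software can “discover” bound phrases with no human in the loop and then assign them to content. It scores adjacent word pairs by pointwise mutual information (PMI), a swapped-in textbook technique, not the Bayesian methods Autonomy or Recommind use. The function names, the `min_count` threshold, and the `top_n` cutoff are hypothetical choices for illustration.

```python
import math
from collections import Counter


def discover_bound_phrases(docs, min_count=2, top_n=5):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Pairs that co-occur far more often than their individual frequencies
    predict score highest; no human supplies the phrase list.
    """
    unigrams, bigrams = Counter(), Counter()
    for doc in docs:
        tokens = doc.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scored = []
    for (w1, w2), n in bigrams.items():
        if n < min_count:
            continue  # PMI is unreliable on pairs seen only once
        p_pair = n / total_bi
        p_w1 = unigrams[w1] / total_uni
        p_w2 = unigrams[w2] / total_uni
        scored.append((math.log2(p_pair / (p_w1 * p_w2)), f"{w1} {w2}"))
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top_n]]


def auto_tag(doc, phrases):
    """Tag a document with every discovered phrase that literally appears in it."""
    text = doc.lower()
    return [p for p in phrases if p in text]
```

Feed it a few reactor reports and it surfaces pairs such as “core cooling.” The catch: `auto_tag` only fires on words actually present in the document, so a write-up that says “spare radiator” gets nothing.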

On the other side of the dictionary are those who see humans with subject matter expertise playing an important role. The idea, which seems quaint to many of the self appointed experts and azure chip consultants, is that human beings can set up a conceptual scheme and populate it with words, terms, and bound phrases. Armed with a controlled term list, a system can then use those terms to index or tag content. The idea has merit because the American National Standards Institute has spelled out guidelines for controlled term lists (ANSI/NISO Z39.19).

Here’s how the battle shapes up. On one side is the “we don’t need any humans” crowd. In my opinion, some enthusiasts for this no-humans position are TEMIS, Google, and in some cases Autonomy. Many of the automated indexing and tagging systems work quite well when the corpus of content is tightly bounded. What do I mean by “tightly bounded?” Pick up a hard copy of a medical journal about cancer or a journal about nuclear engineering. The vocabulary does not vary too much from article to article within each topic area. In fact, once you learn about 2,000 nuclear terms, you can figure out the basic idea of most nuclear power write ups.

Are some search and content processing vendors taking notice of sales methods associated with used car sales professionals? Even Google is advertising on the “vast wasteland”. Image source: http://www.townhillautosales.com/?24

What happens when you process unbounded content? Well, real life language use is more tricky. Non-experts simplify complex ideas, often substituting non-specialist terms for arcane jargon. Do you know what an ECCS is? Probably not. A “real” journalist or consultant will convert the notion of an emergency core cooling system to something along the lines of a “spare radiator.” Not exactly on the money, but indicative of how precise language is softened. In these situations, it is useful to have a term list of the specialist words, terms, and bound phrases. Subcategories under Cooling Systems can contain the ECCS entry and others. The idea is that content can be assigned certain terms no matter what words and phrases appear in the source document.
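A controlled term list of the kind described above can be sketched in a few lines. Z39.19-style vocabularies organize preferred terms, broader terms, and non-preferred entry terms; the `Term` structure, the `VOCAB` fragment, and the `index_document` helper below are hypothetical illustrations, not Access Innovations’ or any other vendor’s format. Matching is naive substring search, an assumption made to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Term:
    preferred: str                       # the authorized index term, e.g. "ECCS"
    broader: Optional[str] = None        # parent term in the hierarchy
    entry_terms: List[str] = field(default_factory=list)  # synonyms and lay phrasings


# Hypothetical fragment of a nuclear-engineering term list.
VOCAB = [
    Term("Cooling Systems"),
    Term("ECCS", broader="Cooling Systems",
         entry_terms=["emergency core cooling system", "spare radiator"]),
]


def index_document(text: str, vocab: List[Term]) -> Set[str]:
    """Assign a preferred term (and every broader term above it) whenever the
    document uses the preferred term or any of its entry terms."""
    lowered = text.lower()
    by_name = {t.preferred: t for t in vocab}
    assigned: Set[str] = set()
    for term in vocab:
        variants = [term.preferred] + term.entry_terms
        if any(v.lower() in lowered for v in variants):
            node: Optional[Term] = term
            # Walk up the hierarchy so a query on "Cooling Systems" also finds it.
            while node is not None:
                assigned.add(node.preferred)
                node = by_name.get(node.broader) if node.broader else None
    return assigned
```

A story that only says “spare radiator” still gets ECCS and, through the hierarchy, Cooling Systems. Pure discovery from the words on the page cannot make that leap unless a human supplies the mapping.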

Some companies like TEMIS, Google, and Yandex are not too keen on human involvement. The reasons range from the cost of getting humans to do indexing and taxonomy development to an arrogance about how software performs. Wizards see the world in terms of their wizardry, which is okay with me. I think it is silly to assume software can handle language with the facility of humans, but I have some experience with what happens when “good enough” is not.

Other companies like Access Innovations (a former client from days of yore) and (believe it or not) Dow Jones (a component of the exciting Murdoch organization) believe that humans are important. The humans can develop the lists, set up guidelines or rules for the indexing system to consult, and use interfaces that allow subject matter experts to adjust the term list and tune the indexing system. The benefit is that the accuracy of the indexing, based on my real-life experience, is much better. There is language drift, but there are methods to intervene and correct that drift.

Without a method to correct for what software is too stupid to see, the indexing “drifts.” The impact is not good. You run a query for a particular snake bite treatment, and you cannot locate the content. The term you use was not assigned by the system, and it does not appear in the source document. So what? Well, what if your child dies? Maybe this is an unpleasant thought, but the consequences of lousy indexing and concept assignment are often more serious than not finding a pizza joint in San Jose.

Here’s what one indexing professional told me. I have to mask the name and company to avoid a hassle, but you will get the idea from this comment I captured:

Some companies, such as a certain Paris-based company, sell expensive software to clients and then leave. People don’t know what to do with it. So they have an expensive, difficult-to-implement natural language processing system which could work but is left hanging. The package from us is the whole thing: we are big on total service, follow-up training, and getting people implemented and using it without our help, but we are there, just a phone call or email away, to help and support them. The Paris-based company says companies like Access Innovations are not a natural language processing system, and although we do have natural language processing, we don’t make people pay for it separately. With most systems, rules are often needed to achieve more than “good enough” tagging. Access Innovations, a specialist able to generate ANSI compliant term lists, delivers 85 to 90 percent accuracy. The Paris-based outfit delivers far lower accuracy. Clients don’t understand the issues with low accuracy tagging, findability, and long term system usability.

So What?

What we have, gentle reader, is an example of the automation crowd glossing over the need for human-intermediation solutions. What disturbs me is the source of much of the chatter about taxonomy: boot camps, companies coming from left field, and self appointed experts. Still, that chatter is putting the spotlight on indexing and classifying content.

That’s a plus.

The downside is that when the indexing goes off the rails, the user may not be able to find the needed information. That’s why companies like Digital Reasoning and Access Innovations have the ability to deliver automation plus human-intermediated interactions. The licensee suffers when automation goes wrong. The users suffer. The search system vendor may be blamed. Beware the taxonomy vendor spouting glittering generalities about smart software. Usually the “spout” dispenses tainted outputs.

Bottom line: I avoid vendors who present to me the “one true way.” This approach may work when preparing foie gras. For some taxonomy vendors hungry for cash, the traditional, labor-intensive methods get in the way of making a quick sale. Unfortunately for them, because humans create language, more traditional methods are often completely appropriate for mission-critical indexing tasks. Honk!

Stephen E Arnold, December 12, 2011

Sponsored by Pandia.com

OpenText Social Framework: The Auto-Classification Bump Up

December 12, 2011

Is this perfect timing or has the train already left the station? OpenText has posted the press release, “OpenText Introduces First Auto-Classification Solution with Built-in Transparency and Defensibility.” OpenText’s offering is unique, the company insists, because its statistical sampling and quality assurance ensure innate transparency and defensibility.

Why is automated classification increasingly essential for businesses? The article asserts:

[Managers] are being asked to manage massive volumes of ‘transitory’ or low value social content and emails due to their cost and potential risk. Classification of content is critical because it lets the business know what content to keep and what can be thrown away. Historically, end-users have been asked to classify content, but adoption and accuracy rates have been low, often leaving the organization exposed to expensive eDiscovery requests and penalties.

Automation takes classification out of the hands of mistake-prone humans, and into the OpenText Content Analytics engine, which is based on the work of linguistics experts and promises dramatically improved accuracy.
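OpenText’s release leans on “statistical sampling and quality assurance” for defensibility but does not spell out the method. What follows is a generic sketch of sampling-based QA, not OpenText’s implementation: draw a random sample of machine-classified items, have a human reviewer check each one, and report the estimated accuracy with a normal-approximation confidence interval. The function name, the 400-item default sample, and the `reviewer` callable are assumptions.

```python
import math
import random


def estimate_accuracy(labels, reviewer, sample_size=400, z=1.96, seed=0):
    """Estimate auto-classification accuracy from a random audit sample.

    labels:   list of (doc_id, machine_label) pairs from the classifier
    reviewer: callable returning the correct label for a doc_id (the human check)
    Returns (estimate, low, high): a normal-approximation interval, roughly
    95% for z=1.96, ignoring the finite-population correction.
    """
    if not labels:
        raise ValueError("nothing to audit")
    rng = random.Random(seed)  # fixed seed makes the audit sample reproducible
    sample = rng.sample(labels, min(sample_size, len(labels)))
    correct = sum(1 for doc_id, label in sample if reviewer(doc_id) == label)
    n = len(sample)
    p = correct / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)
```

The fixed seed matters for defensibility: an auditor can regenerate the exact sample a reviewer checked.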

OpenText has been working in the field of enterprise content management since 1991, and now has clients worldwide. We just hope it isn’t arriving too late to automated classification, an already crowded niche. We will check back on the platform in the near future because the company is busy repositioning itself into a social framework. Just think. A few years ago it was an SGML database, then a document management vendor, a collaboration platform, and now a social framework. Beneath the marketing fog, OpenText remains a $1 billion outfit focusing on selling services and husbanding its integration and research and development expertise. Does the heart of Autonomy beat within RedDot? Is OpenText the new Autonomy? Eerie parallels with one difference: No $10 billion buyout.

Cynthia Murrell, December 12, 2011

Sponsored by Pandia.com

BA Insight Interview

December 11, 2011

Short honk: We overlooked a new interview with Guy Mounier of BA Insight. If you track the vendors who provide components to extend and enhance Microsoft SharePoint, you may find the interview with BA Insight interesting.


You can find the interview at this link. The interview carries the date of September 27, 2011. Our error. At age 67, I lose my pen several times a day.

Stephen E Arnold, December 11, 2011

Obviousness Redux: Google Is the New Microsoft

December 11, 2011

I think our beloved publisher (Stephen E Arnold) discussed the Microsoftization of Google in his 2004 or 2005 monograph, The Google Legacy. But, in today’s world, reinventing the past is perfectly okay. Recycling information is what we do. We do it in this blog. “Steve Jobs Was Right: Google IS Turning Into Microsoft,” declares Business Insider. Writer Matt Rosoff recalls a conversation last spring between Google’s Larry Page and Apple’s beloved Steve Jobs. We learned from the write up:

“Jobs later recounted the conversation to his biographer Walter Isaacson. ‘Figure out what Google wants to be when it grows up. It’s now all over the map. What are the five products you want to focus on? Get rid of the rest because they’re dragging you down. They’re turning you into Microsoft.’”

We agree with Jobs: three or more tries to get a product right; then, if there’s no traction, kill the project. Page has indeed culled a number of underperforming assets; he also deserves props for improving employee retention.

However, Rosoff provides a list of ways in which Google still mirrors Microsoft. For example, both Google Search and Windows are “800-pound gorillas,” dominating their markets as well as providing most of their companies’ profits. There are other direct product comparisons: Google+ to Bing; Google Music to Zune; and Android to Xbox to name a few. See the piece for how these pairs relate, and for more similarities.

The trend is not necessarily bad. Rosoff points out that, though Microsoft isn’t on top the way it once was, it’s still a solid and highly profitable company. Google could do worse than to emulate Microsoft. Old news is new news. Oh, and the obvious is as it was seven years ago: obvious. How about this observation? Google+ (Google Plus) is the new search, and it doesn’t work for me as well as the “old” Google. News? Probably to Google, which is really an online ad agency in the me-too business.

Cynthia Murrell, December 11, 2011

Sponsored by Pandia.com

Attensity: Friday Night Spam Fest

December 10, 2011

Short honk: Here’s an opinion for my one or two readers. I am confused about email marketing from search and content processing companies. Is spam a best practice? Is spam a signal of marketing need or sales desperation? Is spam better than relying on satisfied customers to generate referrals?

ArnoldIT does not do “spam” via email. Whenever I give a talk, I am a veritable Iowa-inspired food production factory of low grade calories. But at age 67, what do you want from a semi retired goose in rural Kentucky?

Here’s the story:

I was at dinner on December 9, 2011, and my wretched mobile device buzzed. I thought I had the gizmo on silent.

Wrong.

I took a quick look, fork paused with a chunk of fried spam halfway to my goosely bill, and what did I see? In my opinion, spamming me in Harrod’s Creek during dinner time is the email equivalent of an 800 call from a telemarketer pitching a roof repair deal.

Digital spam. Friday night. Dinner time. Brilliant I suppose.

Here’s what I received, ruining my appetite for the “real” stuff I was nibbling at the time: “Attensity to Deliver Real Time Audience Analytics on Republican Debate.” Who mailed this missive? sender@attensity.com. Okay, spam mavens, get that email address: sender@attensity.com.


What does Attensity promise me as my real spam cools?

Well, the goose is energized. Here’s the low calorie pitch:

The reports will be driven by Attensity’s real-time social analytics solution, which gives organizations the ability to monitor and analyze over 75 million online and social media sources, as well as internal sources such as emails, surveys and communities, and extract business insights from those conversations. The solution is part of Attensity’s award-winning suite of multi-channel customer analytics and response applications.

Believe it or not, Attensity, one of the “leaders” in understanding the “voice of the customer” or sentiment analysis, found a way to evoke sentiment from me. I don’t think about Attensity as a customer support outfit. Nope. Nope. Nope. I think about Attensity’s roots and its more fascinating line of business. Navigate to LinkedIn and learn this:

Welcome to Attensity Government Systems — the broadest suite of semantic applications and engines to help you realize your agency objectives through the power of text. Attensity’s products and solutions, our dedicated government field engineering team, along with our network of defense, consulting, and solutions integrators are delivering results every day to key government agencies in intelligence, law enforcement, civilian service, and defense. By selecting and implementing Attensity’s solutions, these organizations are better understanding and responding to citizen needs, and connecting the dots to prevent terror and crime. Source: http://www.linkedin.com/company/attensity-government-systems

To put this snippet in context, you may find these links helpful.

Who funded Attensity? Lots of folks, including an important government agency? Here’s a link which may be of interest.

Now what’s fascinating is that Attensity is into the voice of the customer thing.

In my own algorithmic method, the Attensity marketing effort gets an “unsatisfactory” for email marketing effectiveness. I assume an azure chip consultant, a former middle school teacher, or a failed search engine optimization expert cooked up this campaign.

Here’s a thought.

Check out the non-spamming alternatives to Attensity. You can get sentiment methods from folks like Expert System, ClearCI.com, the French outfit PolySpot, and the Infonic Lexalytics operation, among others.

One nagging question for me: Why is Attensity spamming me on a Friday night at 8:33 pm?

Brilliance, desperation, a Hail Mary from a football university in Utah?

Stephen E Arnold, December 10, 2011

Freebie, gentle reader, freebie. The goose’s feathers are ruffled.

Fat Apps Microsoftize Mobile Apps

December 10, 2011

If it seems like a step backward, that’s because it is: Network Computing declares, “Fat Apps Are Where It’s At.” At least for now.

Writer Mike Fratto makes the case that, in the shift from desktop to mobile, we’re getting ahead of ourselves. Cloud-based applications that run only the user interface on mobile devices are a great way to save space, if you can guarantee constant wireless access to the Web. That’s not happening yet. Wi-Fi is unreliable, and wireless data plans with their data caps can become very expensive very quickly.

Besides, says Fratto, services that aim to place the familiar desktop environment onto mobile devices, like Citrix XenApp or VMware ThinApp, are barking up the wrong tree. The article asserts:

There isn’t the screen real estate available on mobile devices–certainly not on phones–to populate menus and pull downs. . . . But that is how desktop apps are designed. Lots of features displayed for quick access because you have the room to do it while still providing enough screen space to write a document or work on a spreadsheet. Try using Excel as a thin app on your phone or tablet. See how long it takes for you to get frustrated.

So, Fratto proposes “fat apps” as the temporary alternative, applications designed for mobile use with local storage that let you continue to work without a connection. Bloatware is back, at least until we get affordable, universal wireless access worked out. At a conference last week, one firm told me, “Our mobile app for an equipment manufacturer is only two or three gigabytes.” Svelte? Just like Word.
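Fratto’s “fat app” boils down to an offline-first design: write to local storage, queue the changes, and sync when a connection shows up. Here is a minimal sketch, assuming SQLite as the local store and a hypothetical `upload` callable standing in for the server. The `OfflineNotes` class is an illustration, not something drawn from Fratto’s article or any particular product.

```python
import json
import sqlite3


class OfflineNotes:
    """Keep data in a local SQLite store and queue edits until a connection returns."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS notes (id TEXT PRIMARY KEY, body TEXT)")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
        )

    def save(self, note_id, body):
        # Write locally first so the user keeps working with no network.
        with self.db:  # commits both statements together, or rolls back on error
            self.db.execute("INSERT OR REPLACE INTO notes VALUES (?, ?)", (note_id, body))
            self.db.execute(
                "INSERT INTO outbox (payload) VALUES (?)",
                (json.dumps({"id": note_id, "body": body}),),
            )

    def read(self, note_id):
        row = self.db.execute("SELECT body FROM notes WHERE id = ?", (note_id,)).fetchone()
        return row[0] if row else None

    def sync(self, is_online, upload):
        """Flush queued edits in order when online; return how many were sent."""
        if not is_online():
            return 0
        rows = self.db.execute("SELECT seq, payload FROM outbox ORDER BY seq").fetchall()
        for seq, payload in rows:
            upload(json.loads(payload))  # hypothetical server call; may raise
            with self.db:
                self.db.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
        return len(rows)
```

Each queued edit is deleted only after its upload returns, so a dropped connection mid-sync leaves the remaining edits queued for the next attempt.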

Cynthia Murrell, December 10, 2011

Sponsored by Pandia.com

New People Search: Just What You Wanted

December 10, 2011

Killer Startups reported on a new search engine designed specifically to find people in the post “Zopeo.com Online People Search.”

Zopeo seeks to be a global white pages and, according to its website, has a mission to provide the most comprehensive people search on the web and to empower and enable people everywhere to search for their family and friends on the Web. The new system lets users search by entering the name of the person they are looking for, along with the place where they think that person may live.

The article states:

The search will take just a couple of seconds, and if it’s indeed successful then you’ll be learning not just that person’s whereabouts and contact information but also a detailed 20 year history. So, catching up with any friends that had vanished from your life is a piece of cake.

In addition to getting the current address and phone number of long lost friends or family members, you can also use Zopeo to run background checks on potentially shady characters. The background checks are interesting, and if you have not probed an individual, you may want to dive in and check out the butcher, the baker, and the candlestick maker. Zopeo delivers known aliases / maiden names, relatives, current and past roommates, property ownership, nationwide criminal records, bankruptcies, tax liens, civil judgments, assets, Web site ownership, and more. Put on your tin foil hat and give Zopeo a go.

Jasmine Ashton, December 10, 2011

Sponsored by Pandia.com

DARPA’s Open Source Initiatives Require Product Data Management Reevaluation

December 9, 2011

In an effort to curtail out-of-control spending on defense projects, DARPA is looking to the software industry for a model of how to manage design. The Ars Technica article “DARPA’s factory of the future looks like open source development” explains the changes coming to DARPA design teams and manufacturing facilities. By modeling the design process on successful engineering ventures, DARPA hopes to save time and money. The goal is for the new directives to shrink the product cycle from ten years to two.

The article explains that the changes are based on engineering models:

“Correct by construction is an engineering approach used in software engineering and integrated circuit design that uses mathematical models to check the impact each component of a system has on the whole, ensuring that the design falls within certain constraints. DARPA is funding the development of engineering ‘meta-tools’ that would allow engineers to contribute components to a design that would be checked against a set of models, checking for potential unintended integration issues.”

Given the success engineering industries have had with open source methods and advanced computer imaging, it will be no surprise to see DARPA succeed as well. That success creates a new problem, however: what to do with all the new data coming from a variety of locations and sources. Product data management needs to be considered before it becomes an issue, so that engineers involved in DARPA projects can find, reuse, and share data quickly and efficiently.
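The “correct by construction” idea in the Ars Technica quote can be illustrated with a toy check: each contributed component is tested against system-level constraints before it is accepted into the design. Real correct-by-construction tooling relies on formal mathematical models; the `Component` fields, the mass and power limits, and the `Design` class here are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    mass_kg: float
    power_w: float


# Hypothetical system-level constraints ("models") every design must satisfy.
CONSTRAINTS = {
    "total mass": lambda parts: sum(p.mass_kg for p in parts) <= 1000.0,
    "total power": lambda parts: sum(p.power_w for p in parts) <= 5000.0,
}


class Design:
    def __init__(self, constraints):
        self.constraints = constraints
        self.parts = []

    def propose(self, component):
        """Accept a contributed component only if the whole design still
        satisfies every constraint; otherwise return the names it breaks."""
        candidate = self.parts + [component]
        violated = [name for name, ok in self.constraints.items() if not ok(candidate)]
        if not violated:
            self.parts = candidate
        return violated
```

The check runs against the whole design, not the part in isolation: a radio that is fine on its own can still be rejected because it pushes total power past the limit, which is the “unintended integration issue” the quote describes.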

Catherine Lamsfuss, December 9, 2011
