Search, Not Just Sentiment Analysis, Needs Customization

July 11, 2014

One of the most widespread misperceptions in enterprise search and content processing is “install and search.” Anyone who has tried to get a desktop search system like X1 or dtSearch to do what the user wants with his or her files and network shares knows that fiddling is part of the desktop search game. Even a basic system like SowSoft’s Effective File Search requires configuring the targets to query for every search on multi-drive systems. The workarounds are not for the casual user. Just try making a Google Search Appliance walk, talk, and roll over without the ministrations of an expert like Adhere Solutions. Don’t take my word for it. Get your hands dirty with information processing’s moving parts.

Does it not make sense that a search system destined for serving a Fortune 1000 company requires some additional effort? How much more time and money will an enterprise class information retrieval and content processing system require than a desktop system or a plug-and-play appliance?

How much effort is required for these tasks? There is work to get the access controls working as the ever alert security manager expects. Then there is the work needed to get the system to access, normalize, and process content for the basic index. Then there is work to get the system to recognize, acquire, index, and allow a user to access the old, new, and changed content. Then one has to figure out what to tell management about rich media, content for which additional connectors are required, and the method for locating versions of PowerPoints, Excels, and Word files. Then one has to deal with latencies, flawed indexes, and dependencies among the various subsystems that a search and content processing system includes. There are other tasks as well, like interfaces, workflow for alerts, yadda yadda. You get the idea of the almost unending stream of dependent, serial “thens.”

When I read “Why Sentiment Analysis Engines need Customization”, I felt sad for licensees fooled by marketers of search and content processing systems. Yep, sad as in sorrow.

Is it not obvious that enterprise search and content processing is primarily about customization?

Many of the so-called experts, advisors, and vendors illustrate these common search blind spots:

ITEM: Consulting firms that sell my information under another person’s name, ensuring that clients get a wild and woolly view of reality. Example: Check out IDC’s $3,500 version of information based on my team’s work. Here’s the link for those who find that big outfits help themselves to expertise and then identify a person with a fascinating employment and educational history as the AUTHOR.

image

See http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=idc%20attivio

In this example from http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=idc%20attivio, notice that my work is priced at seven times that of a former IDC professional. Presumably Mr. Schubmehl recognized that my value was greater than that of an IDC sole author and priced my work accordingly. Fascinating because I do not have a signed agreement giving IDC, Mr. Schubmehl, or IDC’s parent company the right to sell my work on Amazon.

This screen shot makes it clear that my work is identified as that of a former IDC professional, a fellow from upstate New York, an MLS on my team, and a Ph.D. on my team.

image

See http://amzn.to/1ner8mG.

I assume that IDC’s expertise embraces the level of expertise evident in the TechRadar article. Should I trust a company that sells my content without a formal contract? Oh, maybe I should ask this question: “Should you trust a high-profile consulting firm that vends another person’s work as its own?” Keep that $3,500 price in mind, please.

ITEM: The TechRadar article is written by a vendor of sentiment analysis software. His employer is Lexalytics / Semantria (once a unit of Infonics). He writes:

High quality NLP engines will let you customize your sentiment analysis settings. “Nasty” is negative by default. If you’re processing slang where “nasty” is considered a positive term, you would access your engine’s sentiment customization function, and assign a positive score to the word. The better NLP engines out there will make this entire process a piece of cake. Without this kind of customization, the machine could very well be useless in your work. When you choose a sentiment analysis engine, make sure it allows for customization. Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

When a vendor describes “natural language processing” with the phrase “high quality” I laugh. NLP is a work in progress. But the stunning statement in this quoted passage is:

Otherwise, you’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

Amazing, a vendor wrote this sentence. Unless a licensee of a “high quality” NLP system invests in customizing, the system will “never get accurate results.” I quite like that categorical never.
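For what it is worth, the customization the vendor describes boils down to a lexicon override. Here is a minimal sketch of that idea; the words, scores, and the scoring function are invented for illustration and do not depict any vendor’s actual API:

```python
# Toy lexicon-based sentiment scorer with a customization hook.
# All words and scores are illustrative, not from any vendor's product.

DEFAULT_LEXICON = {"great": 1.0, "nasty": -0.8, "broken": -1.0}

def score(text, overrides=None):
    """Average the sentiment scores of known words; overrides win."""
    lexicon = dict(DEFAULT_LEXICON, **(overrides or {}))
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

# Out of the box, "nasty" drags the score down...
print(score("that was one nasty guitar solo"))
# ...until someone pays to customize the lexicon for slang.
print(score("that was one nasty guitar solo", overrides={"nasty": 0.9}))
```

The point of the sketch: the “customization” being marketed is a dictionary edit, and someone has to do that edit for every domain, every slang register, every language.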

ITEM: Sentiment analysis is a single, usually complex component of a search or content processing system. A person on the LinkedIn enterprise search group asked the few hundred “experts” in the discussion group for examples of successful enterprise search systems. If you are a member in good standing of LinkedIn, you can view the original query at this link. [If the link won’t work, talk to LinkedIn. I have no idea how to make references to my content on the system work consistently over time.] I pointed out that enterprise search success stories are harder to find than reports of failures. Whether the flop is at the scale of the HP/Autonomy acquisition or a more modest termination like Overstock’s dumping of a big name system, the “customizing” issue is often present. Enterprise search and content processing is usually:

  • A box of puzzle pieces that requires time, expertise, and money to assemble in a way that attracts and satisfies users and the CFO
  • A work in progress that must be made to work so users are happy, and in a manner that does not force another search procurement cycle, the firing of the person responsible for the search and content processing system, and the legal fees related to the invoices submitted by the vendor whose system does not work. (Slow or no payment of license and consulting fees to a search vendor can be fatal to the search firm’s health.)
  • A source of friction among those contending for infrastructure resources. What I am driving at is that a misconfigured search system makes some computing work S-L-O-W. Note: the performance issue must be addressed for appliance-based, cloud, or on-premises enterprise search.
  • Money. Don’t forget money, please. Remember the CFO’s birthday. Take her to lunch. Be really nice. The cost overruns that plague enterprise search and content processing deployments and operations will need all the goodwill you can generate.

If sentiment analysis requires customizing and money, take out your pencil and estimate how much it will cost to make NLP and sentiment analysis work. Now do the same calculation for relevancy tuning, index tuning, optimizing indexing and query processing, etc.
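A back-of-the-envelope version of that pencil exercise, with every line item and number invented for illustration (plug in your own figures):

```python
# Hypothetical customization line items for an enterprise search deployment.
# All effort estimates and the rate are invented; they are placeholders only.

days_of_effort = {
    "sentiment lexicon tuning": 15,
    "relevancy tuning": 30,
    "index tuning": 20,
    "query processing optimization": 25,
    "connectors for rich media": 40,
}
daily_rate = 1500  # USD, assumed blended consultant day rate

total = sum(days_of_effort.values()) * daily_rate
print(f"Estimated customization cost: ${total:,}")  # $195,000 with these inputs
```

Even with modest made-up numbers, the customization bill lands in six figures before anyone has typed a query.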

The point is that folks who get a basic keyword search and retrieval system working pile on the features and functions. Vendors whip up some wrapper code that makes it possible to do a demo of customer support search, eCommerce search, voice search, and predictive search. Once the licensee inks the deal, the fun begins. The reason one major Norwegian search vendor crashed and burned is that licensees balked at paying bills for a next generation system that was not what the PowerPoint slides described. Why has IBM embraced open source search? Is one reason to trim the cost of keeping the basic plumbing working reasonably well? Why are search vendors embracing every buzzword that comes along? I think that search as an enterprise function has become a very difficult thing to sell, make work, and turn into an evergreen revenue stream.

The TechRadar article underscores the danger for licensees of over hyped systems. The consultants often surf on the expertise of others. The vendors dance around the costs and complexities of their systems. The buzzwords obfuscate.

What makes this article by the Lexalytics’ professional almost as painful as IDC’s unauthorized sale of my search content is this statement:

You’ll be stuck with a machine that interprets everything literally, and you’ll never get accurate results.

I agree with this statement.

Stephen E Arnold, July 11, 2014

Searching News: More and More Difficult

July 10, 2014

An outfit called the Washington Examiner printed “Censorship: 38 Journalism Groups Slam Obama’s Politically-Driven Suppression of News.” Stories that talk about censorship are difficult to peg on the white board of online information. True, I have noticed that certain documents once easily findable via www.usa.gov have become increasingly difficult to locate. My touchstone example is information about the US government’s RAC, MIC, and ZPIC programs to combat alleged Medicare noncompliance. I have stumbled across other examples when querying the Department of Energy’s Web site with routine queries I used when DOE was a cheerleader for the Autonomy IDOL search system.

The “Politically Driven” article is somewhat different. The angle is that “real journalists”—presumably not the type of professionals working at entities like IDC—are not able to get information. The terms “media coverage” and “limiting access to top officials” make it clear that “real” journalists have some gripes; namely:

  • Officials blocking reporters’ requests to talk to specific staff people.
  • Excessive delays in answering interview requests that stretch past reporters’ deadlines.
  • Officials conveying information “on background” — refusing to give reporters what should be public information unless they agree not to say who is speaking.
  • Federal agencies blackballing reporters who write critically of them.

The article points to a “survey” in which “40 percent of public affairs officers admitted they blocked certain reporters because they did not like what they wrote.” Yep, a survey, similar to those cited by some consultancies to “prove” that something is really, really true.

The article concludes with a rousing call to action:

SPJ’s Cuillier told Secrets, “I feel this excessive message management and information control are caused by the professionalization of PR in the bureaucracy — in all levels of government.” And, he added, “It is up to journalists — and citizens — to push back against this force. Hard!”

I find this an interesting statement. What does “push back” mean? If I put on my semantic analysis hat, I can list possible meanings for “push back.”

The point is that news is shaped, sometimes gently, sometimes firmly. In order to determine what is accurate, one must work quite hard. The notion that an individual can ferret out specifics of a particular event by gaining easy access, walking halls, or just showing up flies in the face of my experience.

I have learned that misinformation, disinformation, and reformation are the common currency of professionals today. Forget the problem with US government bureaucracies. These operations survive changes in administration, budget shifts, and policy changes.

Focus instead on individuals who take information, put their name on it, reshape it, and use it to further a narrow agenda. I emphasize in my lectures for the intelligence community that figuring out what is “accurate” is getting more and more difficult.

We are in the grip of a cultural shift in information. Recent examples that make clear the magnitude of the “accuracy” challenge may be found in these items:

ITEM: A Google executive dies and is described as a family man as a factoid in an article about a heroin overdose, a person of alleged ill repute, and a yacht. See “Did She Kill Before?”

ITEM: A fellow with a fascinating work history puts his name on work done by the ArnoldIT team, sells it for $3,500 a whack on Amazon, and ignores my requests for payment. The person appears to be David Schubmehl, employed by the consulting and publishing firm IDC. Here’s the Amazon listing for my work with my name and that of two of my researchers. Seems just fine, right? I find this shaping of my information interesting because I have not given permission for this material to be sold on Amazon. But who cares about a 70 year old getting trampled by the “real” professionals?

ITEM: WN.com search results for the query “Brazil Riots 2014.” A lack of information about the events after Brazil’s loss in Rio flies in the face of the alleged robberies and police actions. See http://wn.com/brazil_riots_2014. Where’s the information, WN.com?

Net net: Anyone who wants accurate information has to work the old-fashioned way: interviews, research, reading, and compilation of factoids from various sources. I am not sure a fuzzy “push back” will have much impact in our present information environment.

For shortcuts, one can ask a reporter on the US government beat, the editor at WN.com, or the very, very happy David Schubmehl, research director, who analyzes the future and surfs on my team’s research.

Exciting times when “real” pros want easy access, a hop over the negative, and a free ride to expertise.

Stephen E Arnold, July 10, 2014

Amazon May Be Disintermediating Publishers: Maybe Good News for Authors?

July 9, 2014

Update: A person asked me who the IDC “expert” is. The answer is David Schubmehl. His picture on LinkedIn shows him as a very, very happy individual. My photograph shows a quite annoyed 70 year old individual. Whenever I think about this unauthorized reuse of my content now being sold on Amazon, my heart races, and I fear the IDC matter is pushing me closer to the “narrow house.” Did William Cullen Bryant use another’s work in “Thanatopsis”? Stephen E Arnold, July 9, 2014 at 4:53 pm

I read “Amazon Angles to Attract Hachette’s Authors to Its Side.” The main point is that Amazon is pro content and anti at least one publisher. Here’s the passage I noted with considerable interest:

Amazon has proposed giving Hachette’s authors all the revenue from their e-book sales on Amazon as the parties continue to negotiate a new contract. Hachette’s response on Tuesday was to suggest that the retailer was trying to make it commit suicide.

Why am I pro Amazon? Well, two UK publishers stiffed me for books I wrote and they published. One annoying outfit is out of business. No loss, believe me. The other is still promoting the book and presumably selling the scintillating monograph called Successful Enterprise Search Management. More recently I reported that IDC, one of the numerous McKinsey / Bain / Boston Consulting chasers, published my content under another person’s name. The “expert” whose knowledge derived in part from the work of me and my associates is marketed on Amazon at this link as of July 9, 2014. Notice that the IDC “original work” carries the hefty price tag of $3,500. (Goodness, I was offered a job at IDC when I worked at Ziff Communications in New York. I passed. I was uncomfortable from the get-go with this company.)

image

Verified, July 9, 2014 at Amazon.com. Search for Schubmehl Attivio or IDC Attivio.

I hope Amazon disintermediates any publisher, consulting firm, or knowledge outfit that does not issue contracts, does not honor copyright, and puts individuals like me in the unenviable position of having my expertise inflate that of another; specifically, an alleged expert named Dave Schubmehl, formerly of a vendor of multiple software products written by third parties. I assume that’s what “ramp quickly” means. For more on the shuffle of my work under an IDC consultant’s name, see http://arnoldit.com/wordpress/wp-admin/post.php?post=40033&action=edit.

Publishers and trust, respect, and appropriate professional behavior in my experience do not go together like peanut butter and jelly.

Go, Amazon. Disintermediate these outfits. And I will gladly split any money from my work 50-50 with you. Amazon has earned my trust. The publishers who have treated me poorly have lost my trust.

Ronald Reagan was correct: “Trust, but verify.”

Stephen E Arnold, July 9, 2014

Swimming in a Hadoop Data Lake

July 8, 2014

I read an interview conducted by the consulting firm PWC. The interview appeared with the title “Making Hadoop Suitable for Enterprise Data Science.” The interview struck me as important for two reasons. First, the questioner and the interview subject introduce a number of buzzwords and business generalizations that will be bandied about in the near future. Second, the interview provides a glimpse of the fish with sharp teeth that swim in what seems to be a halcyon data lake. With Hadoop goodness replenishing the “data pond,” Big Data is a life sustaining force. That’s the theory.

The interview subject is Mike Lang, the CEO of Revelytix. (I am not familiar with Revelytix, and I don’t know how to pronounce the company’s name.) The interviewer is one of those tag teams that high-end consulting firms deploy to generate “real” information. Big time consulting firms publish magazines, emulating the McKinsey Quarterly. The idea is that Big Ideas need to be explained so that MBAs can convert information into anxiety among prospects. The purpose of these bespoke business magazines is to close deals and highlight technologies that may be recommended to a consulting firm’s customers. Some quasi consulting firms borrow other people’s work. For an example of this shortcut approach, see the IDC Schubmehl write up.

Several key buzzwords appear in the interview:

  • Nimble. Once data are in Hadoop, the Big Data software system has to be quick and light in movement or action. Sounds very good, especially for folks dealing with Big Data. So with Hadoop one has to use “nimble analytics.” Also sounds good. I am not sure what a “nimble analytic” is, but, hey, do not slow down generality machines with details, please.
  • Data lakes. These are “pools” of data from different sources. Once data is in a Hadoop “data lake”, every water or data molecule is the same. It’s just like chemistry sort of…maybe.
  • A dump. It seems that PWC wants me to put my heterogeneous data, which is now like water molecules, in a “dump.” A mixed metaphor, is it not? Again, a mere detail. A data lake has dumps, or a dump has data lakes. I am not sure which has what. Trivial and irrelevant, of course.
  • Data schema. To make data fit a schema with an old fashioned system like Oracle, it takes time. With a data lake and a dump, someone smashes up data and shapes it. Here’s the magic: “They might choose one table and spend quite a bit of time understanding and cleaning up that table and getting the data into a shape that can be used in their tool. They might do that across three different files in HDFS [Hadoop Distributed File System]. But, they clean it as they’re developing their model, they shape it, and at the very end both the model and the schema come together to produce the analytics.” Yep, magic.
  • Predictive analytics, not just old boring statistics. The idea is that with a “large scale data lake”, someone can make predictions. Here’s some color on predictive analytics: “This new generation of processing platforms focuses on analytics. That problem right there is an analytical problem, and it’s predictive in its nature. The tools to help with that are just now emerging. They will get much better about helping data scientists and other users. Metadata management capabilities in these highly distributed big data platforms will become crucial—not nice-to-have capabilities, but I-can’t-do-my-work-without-them capabilities. There’s a sea of data.”

My take is that PWC is going to bang the drum for Hadoop. Never mind that Hadoop may not be the Swiss Army knife that some folks want it to be. I don’t want to rain on the parade, but Hadoop requires some specialized skills. Fancy math requires more specialized skills. Interpretation of the outputs from data lakes and predictive systems requires even more specialized skills.

No problem as long as the money lake is sufficiently deep, broad, and full.

The search for a silver bullet continues. That’s what makes search and content processing so easy. Unfortunately the buzzwords may not deliver the type of results that inform decisions. Fill that money lake because it feeds the dump.

Stephen E Arnold, July 7, 2014

Trying To Make A Search More Relevant

June 20, 2014

Here is a thought that does not make much sense when considered in the bigger picture. PRLog explains the conundrum in “BA Insight To Discuss How To Make Enterprise Search Relevant Through Unified Information Access.” BA Insight’s CTO Jeff Fried and David Schubmehl, a research director at IDC, will host a webinar that shares the same name as the above article. The webinar will discuss how enterprise search technology is lagging:

“Due to the vast explosion of structured and unstructured data, users are experiencing increasing challenges locating and accessing the critical information and expertise needed to excel in their roles. Even the enterprise search technology that has been implemented to resolve these issues is failing to locate relevant information while providing a sub-par user experience. This can have negative consequences, such as the inability to effectively respond to customer queries, widespread duplication of effort, and decreased employee productivity.”

Fried and Schubmehl will focus on how enterprise search is changing, how organizations are driving demand, and how to make enterprise search a killer application. The bigger question is whether BA Insight is using this to make its own products more relevant. Has enterprise search really lost its relevancy, or is that just one observation? The “unified information access” tag is one used by other companies like Sinequa and Attivio. These companies appear to be cut from the same cloth when touting their talents.

Whitney Grace, June 20, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Attivio is Synonymous with Partnership

December 21, 2013

If you need a business intelligence solution, apparently Attivio is the one-stop shop. Attivio has formed two strategic partnerships. The Providence Journal announced that “Actian And Attivio OEM Agreement Accelerates Big Data Business Value By Integrating Big Content.” Actian, a big data analytics company, has an OEM agreement with Attivio to use its Active Intelligence Engine (AIE) to ramp up its data analytics solution. AIE supports Actian’s goal to deliver analytics on all types of data, from social media to surveys to research documents.

The article states:

” ‘Big Content has become a vital piece in the Big Data puzzle,’ said David Schubmehl, Research Director, IDC. ‘The majority of enterprise information created today is human-generated, but legacy systems have traditionally required processing structured data and unstructured content separately. The addition of Attivio AIE to Actian ParAccel provides an extremely cost-effective option that delivers impressive performance and value.’ “

Panorama announced on its official Web site that, “Panorama And Attivio Announce BI Technology Alliance Partnership.” The AIE will be combined with Panorama’s software to improve the business value of content and big data. Panorama’s BI solution will use the AIE to streamline enterprise decision-making processes by eliminating the need to switch between applications to access data. This will speed up business productivity and improve data access.

The article explains:

“ ‘One of the goals of collaborative BI is to connect data, insights and people within the organization,’ said Sid Probstein, CTO at Attivio. ‘The partnership with Panorama achieves this because it gives customers seamless and intuitive discovery of information from sources as varied as corporate BI to semi-structured data and unstructured content.’”

Attivio is a tool used to improve big data projects and enhance the use of data. The company’s strategy of serving as a base on which other solutions are built is similar to what Fulcrum Technologies did in 1985.

Whitney Grace, December 21, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Attivio Teams up with Capax Global

September 4, 2013

Attivio has signed up another partner, this time a leader in search. PR Newswire reveals, “Capax Global and Attivio Announce Strategic Reseller Partnership.” The move will help Capax Global’s customers smoothly shift from conventional enterprise search to the more comprehensive unified information access (UIA) approach. The press release quotes Capax Global CEO and managing director John Baiocco:

“We have seen a natural shift towards UIA as our enterprise search customers contend with massive volumes of information, coming from multiple sources, in different formats. Traditional approaches are no longer adequate in dealing with the scale and complexity of enterprise information. Attivio leads the industry in addressing the demands of big data volume, variety, and velocity that our customers face.”

David Schubmehl, research director at analysis firm IDC, also weighs in on the importance of UIA:

“Unified information access is the next logical progression beyond enterprise search as companies face unprecedented volumes of disparate information, of which 85 percent or more is unstructured. Because UIA platforms can integrate large volumes of information across disconnected silos, technologies like AIE have become a key enabler for big data analytics and decision support.”

Founded in 2007 and headquartered in Massachusetts, Attivio also has offices in other U.S. states, the U.K., Germany, and Israel. The company’s award-winning Active Intelligence Engine integrates structured and unstructured data, making it easier to translate information assets into useful business insights.

Capax Global celebrates its 20th birthday this year, making it a veteran in the search field. The privately-held company, based in New York, offers consulting services, custom implementations, and cloud-hosting services. An emphasis on its clients’ unique business objectives is no doubt part of its appeal for its many customers, which include Fortune 500 companies and major organizations around the world.

Cynthia Murrell, September 04, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

HP, Autonomy, and a Context Free Expert Output about Search: The Bet on a Horse Approach to Market Analysis

May 4, 2013

I don’t think too much about:

  1. Azure chip consultants. You know, these are the firms which make a living from rah rahs, buzzwording, and pontification to sell reports. (I know. I labored at a non-azure chip outfit for what seems like decades. Experience is a good instructor. Oh, if you are a consultant, please, complain about my opinion using the comments section of this free blog.)
  2. Hewlett Packard. I recall that the company used to make lab equipment which was cool. Now I think the firm is in some other businesses but as quickly as I latch on to one like the Treo and mobile, HP exits the business. The venerable firm confuses my 69 year old mind.
  3. Autonomy. I think I did some work for the outfit but I cannot recall. Age and the lifestyle in rural Kentucky take a toll on the memory, I admit.

Nevertheless, I read “HP’s Autonomy Could Face Uphill Battle In Data Market.” There were some gems in the write up which I found amusing and illustrative of the problems which azure chip consulting firms and their experts have when tackling certain business issues.

The main idea of the write up for “investors” is that HP faces “challenges.” Okay. That’s a blinding insight. As you may recall, HP bought Autonomy for $11 billion and then a few months later roiled the “investors” by writing off billions on the deal. That was the mobile phone model, wasn’t it?

The write up then pointed out:

HP wanted Autonomy to jump-start its move into software and cloud-based computing. Autonomy is the No. 1 provider of search and retrieval software that companies use to find and share files and other information on their websites and document management systems.

Okay. But that too seems obvious.

Now here comes the kicker. The expert outfit providing inputs to the reporter keeping a bulldog grip on this worn out bone is quoted as saying:

“Software license revenue (in this market) isn’t growing at the same rate as before, and we are beginning to see the rise of some new technologies, specifically content analytics and unified information access,” Schubmehl said. These new types of software can be used with types of business analytics software, business intelligence software and other software to help enterprises do a better job of locating specific information, he says, which is the job of search retrieval software.

I don’t know much about IDC but what strikes me from this passage is that there are some assertions in this snippet which may warrant a tiny bit of evaluation.

image

Will context free analyses deliver a winner? Will there be a Gamblers Anonymous for those who bet on what journalists and mid tier (second string) consultancies promulgate? For more about Gamblers Anonymous, navigate to http://www.gamblersanonymous.org/ga/.

Here goes:

Read more

Google Search Appliance Updates for the Enterprise

November 2, 2012

The shiny new 7.0 version of Google Search Appliance has been updated for the enterprise, now allowing administrators to add information to the cloud, various social media outlets, and other online storage sites. According to the article “Enterprise Tools Added to Google Search Appliance” on PC Advisor, the upgrade includes a new Entity Recognition feature with auto suggestions for searches as well as a document preview feature.

The article tells us why the need for such an update is necessary for the enterprise:

“IDC analyst David Schubmehl said users would like enterprise searches to be as easy as Web searches, noting that slow searches can hurt productivity. A 2009 IDC study found that the time spent searching for data averaged 8.8 hours per week per employee, at a cost of $14,209 per worker per year.”
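The quoted figures can be roughly reconciled if one treats the per-worker cost as hours spent searching times a loaded labor rate. The 48-week working year and the derived rate below are my assumptions, not IDC’s:

```python
# Sanity check of the cited IDC productivity figures.
hours_per_week = 8.8     # from the cited 2009 IDC study
working_weeks = 48       # assumed working weeks per year
cost_per_worker = 14209  # from the cited study, USD per worker per year

annual_hours = hours_per_week * working_weeks  # 422.4 hours/year
implied_rate = cost_per_worker / annual_hours
print(f"implied loaded rate: ${implied_rate:.2f}/hour")
```

The implied rate works out to the low-to-mid $30s per hour, a plausible loaded labor cost, so the two figures are at least internally consistent under these assumptions.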

We believe Google Enterprise offers some great features, including the option for employees to add their own search results to existing results. However, if secure search and access is an enterprise priority for your corporation, then we would recommend a careful examination before opting for Google Enterprise. A company such as Intrafind offers a secure option for searching structured and unstructured enterprise data.

Andrea Hayden, November 2, 2012

Sponsored by ArnoldIT.com, developer of Augmentext
