CyberOSINT banner

Can Online Systems Discern Truth and Beauty or All That One Needs to Know?

October 14, 2015

Last week I fielded a question about online systems’ ability to discern loaded or untruthful statements in a plain text document. I responded that software is not yet very good at figuring out whether a specific statement is accurate, factual, right, or correct. Google pokes at the problem in a number of ways; for example, assigning a credibility score to a known person. The higher the score, the more likely the person is to be “correct.” I am simplifying, but you get the idea: recycling a variant of PageRank and the CLEVER method associated with Jon Kleinberg.

There are other approaches as well, and some of them—dare I suggest, most of them—use word lists. The idea is pretty simple. Create a list of words which have positive or negative connotations. To get fancy, you can work a variation on the brute force Ask Jeeves’ method; that is, cook up answers or statements of fact “known” to be spot on. The idea is to match the input text with the information in these word lists. If you want to get fancy, call these lists and compilations “knowledgebases.” I prefer lists. Humans have to help create the lists. Humans have to maintain the lists. Get the lists wrong, and the scoring system will be off base.

There is quite a bit of academic chatter about ways to make software smart. A recent example is “Sentiment Diffusion of Public Opinions about Hot Events: Based on Complex Network.” In the conclusion to the paper, which includes lots of fancy math, I noticed that the researchers identified the foundation of their approach:

This paper studied the sentiment diffusion of online public opinions about hot events. We adopted the dictionary-based sentiment analysis approach to obtain the sentiment orientation of posts. Based on HowNet and semantic similarity, we calculated each post’s sentiment value and classified those posts into five types of sentiment orientations.

There you go. Word lists.

My point is that it is pretty easy to spot a hostile customer support letter. Just write a script that looks for words appearing on the “nasty list”; for example, consumer protection violation, fraud, sue, etc. There are other signals as well; for example, capital letters, exclamation points, underlined words, etc.
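A minimal sketch of the “nasty list” approach makes the point concrete. The word list, the weights, and the threshold below are invented for illustration; a real system would rely on much larger, human-curated lists:

```python
# Toy "nasty list" detector for hostile customer support letters.
# Word list, weights, and threshold are illustrative inventions.
NASTY_LIST = {"fraud", "sue", "lawsuit", "violation", "scam"}

def looks_hostile(letter: str) -> bool:
    # Normalize tokens: strip punctuation, lowercase
    words = {w.strip(".,!?:;\"'").lower() for w in letter.split()}
    hits = words & NASTY_LIST
    # Secondary signals: shouting (all-caps words) and exclamation points
    shouting = sum(1 for w in letter.split() if w.isupper() and len(w) > 2)
    exclamations = letter.count("!")
    score = len(hits) * 2 + shouting + exclamations
    return score >= 3

print(looks_hostile("I will SUE you for this FRAUD!"))  # True
print(looks_hostile("Thanks for the quick refund."))    # False
```

Note what the script cannot do: it flags tone, not truth. A calm, well-punctuated letter full of fabrications sails right through, which is the larger point of this essay.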

The point is that distorted, shaped, weaponized, and just plain bonkers information can be generated. This information can be gussied up in a news release, posted on a Facebook page, or sent out via Twitter before the outfit reinvents itself.

The researcher, the “real” journalist, or the hapless seventh grader writing a report will be none the wiser unless big time research is embraced. For now, what can be indexed is presented as if the information were spot on.

How do you feel about that? That’s a sentiment question, gentle reader.

Stephen E Arnold, October 14, 2015

Artificial Intelligence: A Jargon Mandala to Understand the Universe of Search

October 12, 2015

I read “Lux: Useful Sankey Diagram on AI.” According to the Sankey Diagrams site, a “Sankey diagram says more than 1,000 pie charts.” The assumption is, of course, that a pie chart presents meaningful data. In the energy sector you can visualize flows in complex systems. It helps to have numbers when one is working toward a Sankey map, but if real data are not close at hand, one can fudge up some data.

Here’s the Sankey diagram in the write up:


You can see an almost legible version at this link.

What the diagram suggests is that certain information access and content processing functions flow into data mining, machine learning, and statistics. If you are a fan of multidimensionality, the arrow of time may flow in the reverse direction; that is from data mining, machine learning, and statistics to affective computing, cognitive computing, computational discovery, image and video analytics, language translation, navigation, recommender systems, and speech recognition.

The intermediary state, tinted a US currency green, provides intermediating operations or conditions; for example, anomaly detection, collaborative filtering, computer eavesdropping, computer vision, pattern recognition, NLP, path planning, clustering, deep learning, dimensionality reduction, network graphical models, online reinforcement learning, pattern similarity, probabilistic modeling, regression, and, my favorite, search algorithms.

The diagram, like the wild and crazy chemical imagery for Watson, seems to be a way to:

  1. Collect a number of discrete operations
  2. Arrange the operations into some orderly framework
  3. Allow the viewer to perceive relationships or the potential for relationships among the operations.

In short, skip the wild and crazy presentations by search and content processing vendors about how search enables broader and, hence, more valuable activities. Search is relegated to an entry in the intermediating column of the Sankey diagram.

My thought is that some folks will definitely love the idea that the many different specialties of content processing can be presented in a mandala which invites contemplation and consideration.

The diagram makes clear that when a company wants to know what one can do with the different and often clever operations one can perform with content, the answer may be, “Make a poster and hang it on the wall.”

In terms of applications, the chart makes quite explicit that some clever team will have to put the parts in order. Does this remind you of building a Star Wars character from Lego blocks?

The construct is the value, not the individual enabling blocks.

Stephen E Arnold, October 12, 2015

Another Categorical Affirmative: Nobody Wants to Invest in Search

October 8, 2015

Gentle readers, I read “Autonomy Poisoned the Well for Businesses Seeking VC Cash.” Keep in mind that I am capturing information which appeared in a UK publication. I find this type of essay interesting and entertaining. Will you? Beats me. One thing is certain. This topic will not be fodder for the LinkedIn discussion groups, the marketers hawking search and retrieval at conferences to several dozen fellow travelers, or the consultant reports promoting the almost unknown laborers in the information access vineyards.

Why not?

The problem with search reaches back a few years, but I will add a bit of historical commentary after I highlight what strikes me as the main point of the write up:

Nobody wants to invest in enterprise search, says startup head Patrick White of Synata.

Many enterprise search systems are a bit like the USS United States, once the slickest ocean liner in the world. The ship still looks like a ship, but the effort involved in making it seaworthy is going to be a project with a hefty price tag. Implementing an enterprise search solution is a similar type of ocean-going effort.

There you go. “Nobody.” A categorical in the “category” of logic like “All men are mortal.” Remarkable because outfits like Attivio, Coveo, and Digital Reasoning, among others have received hefty injections of venture capital in recent memory.

The write up makes this interesting point:

“I think Autonomy really messed up [the space]”, and when investors hear ‘enterprise search for the cloud’ it “scares the crap out of them”, he added. “Autonomy has poisoned the well for search companies.” However, White added that Autonomy was just the most high profile example of cases that have scared off investors. “It is unfair just to blame Autonomy. Most VCs have at least one enterprise search in their portfolio. So VCs tend to be skittish about it,” he added.

I am not sure I agree. Before there was Autonomy, there was Fulcrum Technologies. The company’s marketing literature is as fresh today as it was in the 1990s. The company was up, down, bought, and merged. The story of Fulcrum, at least up to 2009 or so, is available at this link.

The hot and cold nature of search and content processing may be traced through the adventures of Convera (formerly Excalibur Technologies) and its relationships with Intel and the NBA, Delphes (a Canadian flame out), Entopia (a we can do it all), and, of course, Fast Search & Transfer.

Now Fast Search, like most old school search technology, is very much with us. For a dose of excitement one can have Search Technologies (founded by some Convera wizards) implement Fast Search (now owned by Microsoft).

Where Are the Former Big Six Enterprise Search Vendors: 2004 and 2015

Autonomy, now owned by HP and mired in litigation over allegations of financial fraud

Convera, which, after struggles with the Intel and NBA engagements, sold off portions of the company. Essentially out of business. Alums are consultants.

Endeca, owned by Oracle and sold as an eCommerce and business intelligence service. Oracle gives away its own enterprise search system.

Exalead, owned by Dassault Systèmes and now marketed as a product component system. No visibility in the US.

Fast Search, owned by Microsoft and still available as a utility for SharePoint. The technology dates from the late 1990s. Brand is essentially low profiled at this time.

Verity, purchased by Autonomy, which used the Verity customer list for upsales and folded the K2 technology into the sprawling IDOL suite.

Fast Search reported revenues which after an investigation and court procedure were found to be a bit enthusiastic. The founder of Fast Search was the subject of the Norwegian authorities’ attention. You can check out the news reports about the prohibition on work and the sentence handed down for the issues the authorities concluded warranted a slap on the wrist and a tap on the head.

The story of enterprise search has been efforts—sometimes Herculean—to sell information access companies. When a company like Vivisimo sells for about one year’s revenues, or an estimated $20 million, there is a sense of getting that mythic task accomplished. IBM, like most of the other acquirers of search technology, tries valiantly to convert a utility into something with revenue lift. As I watch the evolution of the lucky exits, my overall impression is that the purchasers realize that search is a utility function. Search can generate consulting and engineering fees, but the customers want more.

That realization leads to the wild and crazy hyper marketing for products like Hewlett Packard’s cloud version of Autonomy’s IDOL and DRE technology or IBM’s embrace of open source search and the wisdom of wrapping that core with functions.

Enterprise search, therefore, is alive and well within applications or solutions that are more directly related to something that speaks to senior managers; namely, making sales and reducing costs.

What’s the cost of making sure the controls for an enterprise search system are working and doing the job the licensee wants done?

The problem is the credit card debt load which Googlers explained quite clearly. Technology outfits, particularly information access players, need more money than it is possible for most firms to generate. This contributes to the crazy flips from search to police analysis; from looking up an entry in a database to an assertion that customer support is enabled; from hunting for an article in this blog to real time, active business intelligence; from indexing by proper noun, like White House, to natural language understanding of unstructured text.

Investments are flowing to firms which could be easily positioned as old school search and retrieval operations. Consider Lexmark, a former unit of IBM, and an employer of note not far from my pond filled with mine runoff in Kentucky. The company, like Hewlett Packard, wants to find a way to replace its traditional business, which was not working as planned as a unit of IBM. Lexmark bought Brainware, a company with patents on trigram methods and a good business for processing content related to legal matters. Lexmark is doing its best to make that into a Trump scale back office content processing business. Lexmark then bought a technology dating from the 1980s (ISYS Search Software, once officed in Crows Nest, I believe) and has made search a cornerstone of the Lexmark next generation health care money spinning machine. Oracle has a number of search properties. Most of these are unknown to Oracle DBAs; for example, Artificial Linguistics, TripleHop, InQuira’s shotgun NLP technology, etc. The point is that the “brands” have not had enough magnetism to pull revenues on a standalone basis.

Success measured in investment dollars is not revenue. Palantir is, in effect, a search and retrieval outfit packaged as a super stealthy smart intelligence system. Recorded Future, funded by Google and In-Q-Tel, is doing a bang up job with specialized content processing. These are, remember, search and retrieval companies.

The money in search appears to be made in these plays:

  • The Fast Search model: taking short cuts until an investigator puts a stop to the activities
  • Creating a company and then selling it to a larger firm with a firm conviction that it can turn search into a big time money machine
  • Buying a search vendor to get its customers and opportunities to sell other enterprise software to those customers
  • Creating a super technology play and going after venture funding until a convenient time arrives to cash out
  • Pursuing a dream for intelligent software and surviving on research grants.

This list does not exhaust what is possible. There are me-too plays. There are mobile niche plays. There are apps which are thinly disguised selective dissemination of information services.

The point is that Autonomy is a member of the search and retrieval club. The company’s revenues came from two principal sources:

  1. Autonomy bought companies like Verity and video indexing and management vendor Virage and then sold other products to these firms’ clients and incorporated some of the acquired technology into products and services which allowed Autonomy to enter a new market. Remember Autonomy and enhanced video ads?
  2. Autonomy managed well. If one takes the time to speak with former Autonomy sales professionals, the message is that life was demanding. Sales professionals including partners had to produce revenue or some face time with the delightful Dr. Michael Lynch or other senior Autonomy executives was arranged.

That’s it. Upselling and intense management for revenues. Hewlett Packard was surprised at the simplicity of the Autonomy model and apparently uncomfortable with the management policies and procedures that Autonomy had been using in highly visible activities for more than a decade as a publicly traded company.

Perhaps some sources of funding will disagree with my view of Autonomy. That is definitely okay. I am retired. My house is paid for. I have no charming children in a private school or university.

The focus should be on the method for generating revenue. The technology is of secondary importance. When IBM uses “good enough” open source search, there is a message there, gentle reader. Why reinvent the wheel?

The trick is to ask the right questions. If one does not ask the right questions, the person doing the querying is likely to draw incorrect conclusions and make mistakes. Where does the responsibility rest when one makes a bad decision?

The other point of interest should be making sales. Stated in different terms, the key question for a search vendor, regardless of camouflage, is: What problem are you solving? Then ask, “Will people pay money for this solution?”

If the search vendor cannot or will not answer these questions and provide data which can be verified, the questioner runs the risk of taking the USS United States for a cruise after paying to refurb the ship, make it seaworthy, and hire a crew.

The enterprise search sector is guilty of making a utility function appear to be a solution to business uncertainty. Why? To make sales. Caveat emptor.

Stephen E Arnold, October 8, 2015

IBM Defines Information Access the Madison Avenue Way

October 7, 2015

Yesterday (October 6, 2015) I wrote a little dialogue about the positioning of IBM as the cognitive computing company. I had a lively discussion at lunch after the story appeared about my suggesting that IBM was making a grand stand play influenced by Madison Avenue thinking, not nuts and bolts realities of making sales and generating revenue.

Well, let’s let IBM rejiggle the line items in its financial statements. That should allow the critics of the company to see how much Watson (which is the new IBM) accounts for in IBM revenues. I am okay with that, but for me, the important numbers are the top line revenue and profit. Hey, call me old fashioned.

In the midst of the Gartner talk about IBM, the CNBC exclusive with IBM’s Big Blue dog (maybe just like the Gartner talk and thus not really “exclusive”?), and the wall paper scale ads in the New York Times and Wall Street Journal, there was something important. I don’t think IBM recognizes what it has done for the drifting, financially challenged, and incredibly fragmented search and content processing market. Even the LinkedIn enterprise search discussion group which bristles when I quote Latin phrases to the members of the group will be revivified.


Indexing and grouping are useful functions. When applied with judgment, an earthworm of unrelated words and phrases may communicate more effectively.

To wit, this is IBM’s definition of Watson, which is search based on Lucene, home brew code, and software from IBM acquisitions:

Author extraction—Lots of “extraction” functions
Concept expansion
Concept insights—I am not sure I understand the concept functions
Concept tagging—Another concept function
Dialog—Part of NLP maybe
Entity extraction—Extraction
Face detection with the charming acronym F****d—Were the Mad Ave folks having a bit of fun?
Feed detection—Aha, image related
Image Link extraction—Aha, keeping track of urls
Image tagging—Aha, image indexing. I wonder if this is recognition or using information in the file or a caption
Keyword extraction
Language detection
Language translation
Message resonance—No clue here in Harrod’s Creek
Natural language classifier—NLP again
Personality insights—Maybe figuring out what the personality of the author of a processed file means?
Question and answer (I think this is natural language processing which incorporates many other functions in this list)—More NLP
Relationship extraction—IBM has technology from its purchase of i2 which performs this function. How does this work on disparate streams of unstructured content? I have some thoughts
Review and rank—Does this mean relevance ranking?
Sentiment analysis—Yes, is a document with the word F****d in it positive or negative
Speech to text—Seems similar to text to speech
Taxonomy—Ah, ha. A system to generate a list of controlled terms. No humans needed? Nah, humans can be billable and it is an IBM function
Text extraction—Another extraction function
Text to speech
Tone analyzer—So what is the tone of a document containing the string F****d?
Tradeoff analytics—Hmm. Now Watson is doing a type of analytics presumably performed on text? What are the thresholds in the numerical recipe? Do the outputs make sense to a normal human?
Visual recognition—Baffler
Watson news—Is this news about Watson or news presented in Watson via a feed-type mechanism. Phrase does not even sound cool to me.

Now that’s a heck of a list. Notice that the word “search” does not appear in the list. I did not spot the word “semantics” either. Perhaps I was asleep at the switch.

When I was in freshman biology class in 1962, Dr. Daphne Swartz, a very traditional cut ‘em up and study ‘em scientist, lectured for 90 minutes about classification. I remember learning about Aristotle and his dividing organisms into two groups: plants and animals. I know this is rocket science, but bear with me. There was the charmingly named Carolus Linnaeus, a fan of herring I believe, who cooked up the kingdom, genus, species thing. Then there was, much later, the wild and crazy library crowd which spawned Dewey or, as I named him, Mr. Decimal.

Why is this germane?

It seems to me that IBM’s list of Watson functions needs a bit of organization. In fact, some of the items appear to belong to other items; for example, language detection and language translation. More egregious is the broad concept of natural language processing. One could, if one were motivated, argue that entity extraction, text extraction, and keyword extraction might look similar to a non-Watsonian intellect. Dr. Swartz would probably have some constructive criticism to offer.
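As a toy illustration of the sort of organization Dr. Swartz might demand, one could sort the advertised functions into a handful of families. The grouping below is my own invention, not IBM’s, and is certainly debatable:

```python
# One possible (and debatable) grouping of IBM's flat Watson list.
# The families and assignments are the author's invention, not IBM's.
watson_taxonomy = {
    "extraction": ["author extraction", "entity extraction",
                   "keyword extraction", "text extraction",
                   "relationship extraction"],
    "language":   ["language detection", "language translation",
                   "natural language classifier", "dialog",
                   "question and answer"],
    "concepts":   ["concept expansion", "concept insights",
                   "concept tagging"],
    "affect":     ["sentiment analysis", "tone analyzer",
                   "personality insights", "message resonance"],
    "vision":     ["face detection", "image tagging",
                   "visual recognition"],
    "speech":     ["speech to text", "text to speech"],
}

# Flatten to see how many of the advertised functions this grouping covers
flat = [fn for group in watson_taxonomy.values() for fn in group]
print(len(flat))  # 22 functions across six families
```

Even this crude exercise shows the point: six families, not a 25-item earthworm, and the overlaps (three extraction variants, five language variants) stand out immediately.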

What’s the purpose of this earthworm list?

Beats me. Makes IBM Watson seem more than Lucene with add ons?

Stephen E Arnold, October 7, 2015

Full Text Search Gets Explained

October 6, 2015

Full text search is one of the primary functions of most search platforms. If a search platform cannot get full text search right, then it is useless and should be tossed in the recycle bin. Full text search is such a basic function these days that most people do not know how to explain what it is. So what is full text search?

The Xojo article “Full Text Search With SQLite” provides a thorough definition:

“What is full text searching? It is a fast way to look for specific words in text columns of a database table. Without full text searching, you would typically search a text column using the LIKE command. For example, you might use this command to find all books that have “cat” in the description…But this select actually finds row that has the letters “cat” in it, even if it is in another word, such as “cater”. Also, using LIKE does not make use of any indexing on the table. The table has to be scanned row by row to see if it contains the value, which can be slow for large tables.”

After the definition, the article turns into an advertising piece for SQLite and how it improves the quality of full text search. It offers some more basic explanations, which will not be understood by someone unless they have a coding background. It is very brief with some detailed information, but it could explain more about what SQLite is and how it improves full text search.
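The LIKE-versus-full-text distinction in the quoted passage is easy to demonstrate with Python’s built-in sqlite3 module, assuming the underlying SQLite build includes the FTS5 extension (most current builds do):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, description TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?)", [
    ("Feline Friends", "A book about a cat and its habits"),
    ("Party Planning", "How to cater a large event"),
])

# LIKE matches substrings, so 'cater' is a false hit for 'cat'
like_hits = conn.execute(
    "SELECT title FROM books WHERE description LIKE '%cat%'"
).fetchall()

# FTS5 tokenizes into words, so only whole-word matches return
conn.execute("CREATE VIRTUAL TABLE books_fts USING fts5(title, description)")
conn.execute("INSERT INTO books_fts SELECT title, description FROM books")
fts_hits = conn.execute(
    "SELECT title FROM books_fts WHERE books_fts MATCH 'cat'"
).fetchall()

print(len(like_hits))  # 2 -- both 'cat' and 'cater' match the substring
print(len(fts_hits))   # 1 -- only the whole word 'cat' matches
```

The FTS virtual table also uses an index rather than a row-by-row scan, which is the speed advantage the article alludes to.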

Whitney Grace, October 6, 2015
Sponsored by, publisher of the CyberOSINT monograph

The Cricket Cognitive Analysis

September 4, 2015

While Americans scratch their heads at the sport of cricket, it has a huge fanbase, and not only that, there are mounds of data that can now be fully analyzed, says First Post in the article, “The Intersection Of Analytics, Social Media, And Cricket In The Cognitive Era Of Computing.”

According to the article, cricket fans absorb every little bit of information about their favorite players and teams. Technology advances have allowed cricket players to improve their game with better equipment and ways to analyze their playing; in turn, the fans have a deeper personal connection with the game as this information is released. For the upcoming Cricket World Cup, Wisden India will provide all the data points for the game and feed them into IBM’s Analytics Engine to improve the game for spectators and players.

Social media is a huge part of the cricket experience, and the article details examples of how platforms like Twitter are processed through sentiment analysis and IBM Text Analytics.

“What is most interesting to businesses however is that observing these campaigns help in understanding the consumer sentiment to drive sales initiatives. With right business insights in the nick of time, in line with social trends, several brands have come up with lucrative offers one can’t refuse. In earlier days, this kind of marketing required pumping in of a lot of money and waiting for several weeks before one could analyze and approve the commercial success of a business idea. With tools like IBM Analytics at hand, one can not only grab the data needed, assess it so it makes a business sense, but also anticipate the market response.”

While cricket might be what the article concentrates on, imagine how data analytics are being applied to other popular sports such as American football, soccer, baseball, golf, and the variety of racing popular around the world.

Whitney Grace, September 4, 2015
Sponsored by, publisher of the CyberOSINT monograph

Suggestions for Developers to Improve Functionality for Search

September 2, 2015

The article on SiteCrafting titled “MaxxCAT Pro Tips” lays out some guidelines for improved functionality when it comes to deep search. Limiting your crawls is the first suggestion. Since all links are not created equal, it is wise to avoid runaway crawls on links where there will always be a “Next” button. The article suggests hand-selecting the links you want to use. The second tip is to specify your snippets. The article explains,

“When MaxxCAT returns search results, each result comes with four pieces of information: url, title, meta, and snippet (a preview of some of the text found at the link). By default, MaxxCAT formulates a snippet by parsing the document, extracting content, and assembling a snippet out of that content. This works well for binary documents… but for webpages you wanted to trim out the content that is repeated on every page (e.g. navigation…) so search results are as accurate as possible.”

The third suggestion is to implement meta-tag filtering. Each suggestion is followed up with step-by-step instructions. These handy tips come from a partnership between SiteCrafting, a web design company founded in 1995 by Brian Forth, and MaxxCAT, a company acknowledged for its achievements in high performance search since 2007.
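MaxxCAT’s snippet configuration is product-specific, but the reason for trimming repeated page furniture before snippeting is easy to show with a generic sketch. The regex-based tag stripping below is illustrative only, not production-grade HTML parsing:

```python
import re

def make_snippet(html: str, query: str, width: int = 60) -> str:
    """Illustrative snippet builder: drop nav blocks, then excerpt text."""
    # Remove navigation blocks that repeat on every page; without this,
    # snippets for many pages would all show the same menu text
    html = re.sub(r"<nav\b.*?</nav>", " ", html, flags=re.S | re.I)
    # Crude tag stripping to reach the plain text
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip()
    # Center the snippet on the first occurrence of the query term
    pos = text.lower().find(query.lower())
    if pos < 0:
        return text[:width]
    start = max(0, pos - width // 2)
    return text[start:start + width]

page = "<nav>Home | About | Contact</nav><p>Enterprise search tuning tips.</p>"
print(make_snippet(page, "search"))  # navigation text never appears
```

The same logic explains the article’s advice: repeated navigation is noise to a snippet generator, so excluding it makes the per-page preview actually describe the page.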

Chelsea Kerwin, September 2, 2015

Sponsored by, publisher of the CyberOSINT monograph

Thetus Savanna Updated

September 1, 2015

I read “Savanna 4.4 Introduces Application Wide Enhancements for Improved All Source Analysis.” The title of the news release reveals that Thetus is a provider of technology to law enforcement and intelligence entities. The notion of “all source” implies that disparate information can be processed and important signals extracted.

According to the write up:

Savanna 4.4 features include: [1] Geospatial Occurrence visualization: By combining Occurrences (people, organizations, things, places and events), with Map, users are now able to view geospatial data from one or more Occurrences on a Map to visually compare events and places to find trends. [2] Customizable styles with Linknet: In Linknet, a tool to convey networks of people, places, events, and things, customizable nodes allow users to change the look and style of a Linknet to easily pinpoint specific nodes or information. Make your point clear and bring life to your link analysis with this customized styling. [3] Connect and filter events with Timeline: In Timeline, a temporal visualization of Occurrence events, users can filter Timeline data by date and display the connection between events that are common to multiple Occurrences in order to compare and connect events.

For more information, you will need to contact the company. The firm’s Web site provides some suggestions.

Stephen E Arnold, August 30, 2015

Do Search and CMS Deliver a Revenue Winner?

August 21, 2015

I spotted a write up called “Look for Enterprise Search, Analytics and These ECM Leaders for Your Transactional Content.” I found the article darned amazing even for public relations about a mid tier consulting firm and one of its analyses.

The main point of the article is that analysts have analyzed enterprise software and identified vendors who provide “ECM Transactional Content Services.” Fabricating collections of objects and slapping a jargon-laden label on the batch is okay with me.


Empty calories await you, gentle reader.

What struck me as interesting was this statement:

Forrester Vice President and Principal Analyst Craig Le Clair points to key advancements and opportunities by the leading ECM providers to help enterprises realize greater value in these systems:

  • Ramping analytics to drive insight and reduce administrative burden
  • Accelerating their move to cloud
  • Improved search and content sharing
  • Using stronger and more open application program interfaces (APIs) that spur innovation
  • Moving quickly to fill gaps in their mobile road maps.

Notice the “ECM”. The acronym refers to software which provides editing, access, and publishing functions to its users. The idea, it seems, is that an employee will write a memo and the ECM will keep track of the document. In practice, based on my experience, the ECM recipe usually fails to satisfy my hunger.

ECM and its close cousins in acronym land are similar to the approach articulated by my kindergarten teacher more than half century ago. She said, according to my mother, “Keep your mittens and lunch in your cubby.” The spirit of the kindergarten teacher lives on in enterprise content management systems.

Unfortunately those who have work to do often create content using tools suited for a specific task. For an engineer, that tool might be Solidworks. Bench chemists are often confused when an ECM is described as the tool for their work. One chemist said to me after an enthusiastic presentation by an information technology person, “I work with chemical structures. What’s this person talking about?” Lawyers in the midst of big risk litigation want to use their own and often flawed document systems. Even the marketer who cheers for ECM for Web content parks some high value data in that wonderful Adobe creative cloud with some back up data on iCloud. I have spotted a renegade analyst with an off the books workstation equipped with an Australian text processing and search system. The ECM is notable for what is not available in it, because executive brand entities roll their own content solutions.

I was able to review a copy of the consultant report upon which the article was based. Wowza. The write up assembled a grab bag of widely disparate companies, added three cups of buzzwords, and mixed in one kilo of MBAisms.

To be fair, the report identified “challenges.” These items baffled me. For example, “Deep experience in key transactional applications.” This is a challenge, really?

But the vendors in the report are able to “address emerging opportunities.” Okay, so these are not opportunities. The opportunities are emerging. Hmmm. Here’s an example: “Ramping analytics to drive insight and reduce administrative burden.” Yikes. Ramping analytics. Driving analytics. Reducing administrative burden. Very active stuff this ECM. Gerund alert. Gerund alert.

What companies are into this suite of challenges and emerging opportunities? Here’s the list of the mid tier touted stallions from the ECM stable:

  1. EMC, a company which is considering having a subsidiary of itself purchase the parent company. Folks, when a company does this type of recursive stuff, the core business might be a little bit uncertain.
  2. HP. Yep, an outfit which has lost its way, suffered five consecutive quarters of declining revenue, and bought a company for $11 billion and then wrote off most of that expense because the sellers of the company fooled HP, its consultants, accountants, and lawyers. Okay. A winner for the legal eagles maybe.
  3. IBM. Heaven help me. IBM has suffered declining revenues for 13 consecutive quarters, annoyed me with a blizzard of Watson silliness, and spent lots of time getting rid of businesses. I have a difficult time believing that IBM can manage enterprise content. But, hey, that’s just my rural Kentucky ignorance, right?
  4. Laserfiche. The company offers a “flexible, proven enterprise content management system.” I believe this statement. The company was founded in 1987 and sure seems to have its roots in well seasoned technology. The company has lots of customers and lots of awards. The only hitch in the git along is that I never ran across this outfit in my work. Bad luck I guess.
  5. Lexmark. Folks, let us recall the rumor that Lexmark and its content businesses are not money makers. I heard that the content cluster achieved an astounding $70 to $80 million shortfall. Who knows if this rumor is accurate. I do know that Lexmark is cutting staff, and one does not take this drastic step unless one needs to reduce costs pronto.
  6. M Files. I never heard of this outfit. I did a quick check of my files and learned that the company “helps enterprises find, share, and secure documents and information. Even in highly regulated industries.” The company is also “passionate about productivity.” The outfit relies on dtSearch for information access. This is okay because dtSearch can process most of the content within a Microsoft-centric environment. But M Files strikes me as a different type of outfit from HP or IBM. As I flipped through the information I had collected, the company struck me as a collection of components. Assembly required.
  7. Newgen Software. Another newbie for me. The company was in my Overflight archive. The firm provides BPM (business process management), ECM (enterprise content management), DMS (I have no idea what this acronym means), CCM (I have no idea what this acronym means), and workflow (I thought this was the same as BPM). The company operates from New Delhi. My thought? Another collection of components with assembly in someone’s future.
  8. Hyland OnBase. This is the third outfit on the list about which I have a modest amount of information. The company says that it is a “leader in ECM.” I believe it. The firm’s url is the same as its flagship product. The company was founded in 1991 and created OnBase, which is a plus. After 25 years, the darned thing should work better than a Rube Goldberg solution assembled from a box of components.
  9. OpenText. Okay, OpenText is a company which has more search engines and content processing systems than most Canadian firms. The challenge at OpenText is having enough cash to invest in keeping the diverse assortment of systems current. Which of these systems is the one referenced in the mid tier firm’s report? SGML search, BASIS, BRS, Nstein, the Autonomy stub in RedDot, Fulcrum, or some other approach? Details can be important.
  10. Unisys. Okay, finally a company that is essentially an integrator which still supports Burroughs mainframes. Unisys can implement systems because it is an integrator. For government work, Unisys matches the statement of work to available software. Although some might question this statement, Unisys can implement almost any kind of system eventually.

Several observations:

First, enterprise content management is a big and fuzzy concept. The evidence of this is the number of acronyms some of the companies use to explain what they do. I assume that it is my ignorance which prevents me from understanding exactly how scanning, indexing, retrieval, repurposing, workflow, and administrative functions work in a cost constrained, teleworker, mobile gizmo world.

Second, open source is knocking on the door of this sector. At some point, organizations will tire of the cost and complexity of collections of loosely federated and integrated software subsystems and look for an alternative. Toss in the word Big Data, and there will be a stampede of New Age consultants ready to step forward and reinvent these outfits. Disruption is probably less of a challenge than the challenge of keeping existing revenues from doing the HP, IBM, and Lexmark drift down.

Third, the search function seems to be a utility or an afterthought. The only problem is that search does not work particularly well in an enterprise where the workers log in from Starbucks and try to interact with enterprise software from a BlackBerry.

Fourth, what an odd collection of outfits! HP, IBM, and Lexmark along with 30 year old imaging firms plus some small outfits. Maybe the selection of firms makes sense to you, gentle reader. For me, the report makes evident the struggles of some experts in ECM, BPM, and the acronyms I know zero about.

In short, this mid tier report strikes me as a Russische Punschtorte (a Russian punch torte). On the surface, the darned thing looks good, maybe mouth watering. After a chomp or two, I want a Paprikahenderl (paprika chicken).

This ECM thing is a confection, not a meaty chicken. Mixing in search does nothing for the recipe.

Stephen E Arnold, August 22, 2015

Quality and Text Processing: An Old Couple Still at the Altar

August 6, 2015

I read “Why Quality Management Needs Text Analytics.” I learned:

To analyze customer quality complaints to find the most common complaints and steer the production or service process accordingly can be a very tedious job. It takes time and resources.

This idea is similar to the one expressed by Ronen Feldman in a presentation he gave in the early 2000s. My notes of the event record that he reviewed the application of ClearForest technology to reports from automobile service professionals which presented customer comments and data about repairs. ClearForest’s system was able to pinpoint that a particular mechanical issue was emerging. The client responded to the signals from the ClearForest system and took remediating action. The point was that sometime in the early 2000s, ClearForest had built and deployed a text analytics system with a quality-centric capability.
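The dictionary-driven approach described above can be sketched in a few lines: count mentions of known issue terms in free-text service reports, then flag terms whose frequency jumps between reporting periods. This is a minimal illustration of the word-list technique, not ClearForest’s actual system; the issue terms and the doubling threshold are hypothetical.

```python
from collections import Counter

# Hypothetical issue dictionary; a real deployment would use a maintained,
# domain-specific word list built and curated by humans.
ISSUE_TERMS = ["brake", "transmission", "alternator", "coolant leak"]

def count_issues(reports):
    """Count how many reports mention each known issue term."""
    counts = Counter()
    for text in reports:
        lowered = text.lower()
        for term in ISSUE_TERMS:
            if term in lowered:
                counts[term] += 1
    return counts

def emerging_issues(old_reports, new_reports, ratio=2.0):
    """Flag terms whose mention count at least doubles between periods."""
    old, new = count_issues(old_reports), count_issues(new_reports)
    return [t for t in ISSUE_TERMS if new[t] >= ratio * max(old[t], 1)]

old = ["routine oil change", "brake pads worn"]
new = ["brake noise reported",
       "customer says brakes grind",
       "brake fluid low"]
print(emerging_issues(old, new))  # a jump in "brake" mentions is flagged
```

The point of the sketch is the same as the anecdote: no fancy math is required to surface an emerging mechanical issue, but the word list has to be right, and humans have to maintain it.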

I mention this point because many companies are recycling ideas and concepts which are in some cases long beards. ClearForest was acquired by the estimable Thomson Reuters. Some of the technology is available via the Open Calais service.

In search and content processing, the case examples, the lingo, and even the technology have entered what I call a “recycling” phase.

I learned about several new search systems this week. I looked at each. One was a portal, another a metasearch system, and a third a privacy centric system with a somewhat modest index. Each was presented as new, revolutionary, and innovative. The reality is that today’s information highways are manufactured from recycled plastic bottles.

Stephen E Arnold, August 6, 2015
