The Cricket Cognitive Analysis

September 4, 2015

While Americans scratch their heads at the sport of cricket, it has a huge fanbase. Not only that, there are mounds of data that can now be fully analyzed, says First Post in the article, “The Intersection Of Analytics, Social Media, And Cricket In The Cognitive Era Of Computing.”

According to the article, cricket fans absorb every little bit of information about their favorite players and teams. Technology advances have allowed cricket players to improve their game with better equipment and ways to analyze their play. In turn, the fans form a deeper personal connection with the game as this information is released. For the upcoming Cricket World Cup, Wisden India will provide all the data points for the game and feed them into IBM’s Analytics Engine to improve the game for spectators and players.

Social media is a huge part of the cricket experience, and the article details examples of how posts on platforms like Twitter are processed through sentiment analysis and IBM Text Analytics.

“What is most interesting to businesses however is that observing these campaigns help in understanding the consumer sentiment to drive sales initiatives. With right business insights in the nick of time, in line with social trends, several brands have come up with lucrative offers one can’t refuse. In earlier days, this kind of marketing required pumping in of a lot of money and waiting for several weeks before one could analyze and approve the commercial success of a business idea. With tools like IBM Analytics at hand, one can not only grab the data needed, assess it so it makes a business sense, but also anticipate the market response.”

While cricket might be what the article concentrates on, imagine how data analytics is being applied to other popular sports such as American football, soccer, baseball, golf, and the varieties of racing popular around the world.

Whitney Grace, September 4, 2015
Sponsored by, publisher of the CyberOSINT monograph

Suggestions for Developers to Improve Functionality for Search

September 2, 2015

The article on SiteCrafting titled Maxxcat Pro Tips lays out some guidelines for improved functionality when it comes to deep search. Limiting Your Crawls is the first suggestion. Since not all links are created equal, it is wise to avoid runaway crawls on links where there will always be a “Next” button. The article suggests hand-selecting the links you want to use. The second tip is Specify Your Snippets. The article explains,

“When MaxxCAT returns search results, each result comes with four pieces of information: url, title, meta, and snippet (a preview of some of the text found at the link). By default, MaxxCAT formulates a snippet by parsing the document, extracting content, and assembling a snippet out of that content. This works well for binary documents… but for webpages you wanted to trim out the content that is repeated on every page (e.g. navigation…) so search results are as accurate as possible.”

The third suggestion is to Implement Meta-Tag Filtering. Each suggestion is followed up with step-by-step instructions. These handy tips come from a partnership between SiteCrafting and MaxxCAT. SiteCrafting is a web design company founded in 1995 by Brian Forth. MaxxCAT is a company acknowledged for its achievements in high performance search since 2007.
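
The snippet advice is easy to picture with a small sketch. The code below is not MaxxCAT's API or configuration. It is a hypothetical Python illustration, using only the standard library, of the general idea: skip content that repeats on every page before assembling a snippet. The assumption that boilerplate lives in nav, header, and footer elements, the names SnippetExtractor and make_snippet, and the 160 character limit are all invented for this example.

```python
from html.parser import HTMLParser

# Elements assumed to hold content repeated on every page (an assumption
# for this sketch; a real site may mark its boilerplate differently).
SKIPPED_TAGS = {"nav", "header", "footer", "script", "style"}


class SnippetExtractor(HTMLParser):
    """Collects visible text, ignoring anything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def make_snippet(html, max_chars=160):
    """Return a short preview built only from non-boilerplate text."""
    parser = SnippetExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]


page = (
    "<html><body><nav>Home | About | Contact</nav>"
    "<p>MaxxCAT tips for trimming repeated content.</p>"
    "<footer>Copyright notice</footer></body></html>"
)
print(make_snippet(page))
```

Run against the sample page, the navigation and footer text never reach the snippet, so the preview contains only the paragraph a searcher would care about.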

Chelsea Kerwin, September 2, 2015


Thetus Savanna Updated

September 1, 2015

I read “Savanna 4.4 Introduces Application Wide Enhancements for Improved All Source Analysis.” The title of the news release reveals that Thetus is a provider of technology to law enforcement and intelligence entities. The notion of “all source” implies that disparate information can be processed and important signals extracted.

According to the write up:

Savanna 4.4 features include: [1] Geospatial Occurrence visualization: By combining Occurrences (people, organizations, things, places and events), with Map, users are now able to view geospatial data from one or more Occurrences on a Map to visually compare events and places to find trends. [2] Customizable styles with Linknet: In Linknet, a tool to convey networks of people, places, events, and things, customizable nodes allow users to change the look and style of a Linknet to easily pinpoint specific nodes or information. Make your point clear and bring life to your link analysis with this customized styling. [3] Connect and filter events with Timeline: In Timeline, a temporal visualization of Occurrence events, users can filter Timeline data by date and display the connection between events that are common to multiple Occurrences in order to compare and connect events.

For more information, you will need to contact the company. The firm’s Web site provides some suggestions.

Stephen E Arnold, August 30, 2015

Do Search and CMS Deliver a Revenue Winner?

August 21, 2015

I spotted a write up called “Look for Enterprise Search, Analytics and These ECM Leaders for Your Transactional Content.” I found the article darned amazing even for public relations about a mid tier consulting firm and one of its analyses.

The main point of the article is that analysts have analyzed enterprise software and identified vendors who provide “ECM Transactional Content Services.” Fabricating collections of objects and slapping a jargon laden label on the batch is okay with me.


Empty calories await you, gentle reader.

What struck me as interesting was this statement:

Forrester Vice President and Principal Analyst Craig Le Clair points to key advancements and opportunities by the leading ECM providers to help enterprises realize greater value in these systems:

  • Ramping analytics to drive insight and reduce administrative burden
  • Accelerating their move to cloud
  • Improved search and content sharing
  • Using stronger and more open application program interfaces (APIs) that spur innovation
  • Moving quickly to fill gaps in their mobile road maps.

Notice the “ECM”. The acronym refers to software which provides editing, access, and publishing functions to its users. The idea, it seems, is that an employee will write a memo and the ECM will keep track of the document. In practice, based on my experience, the ECM recipe usually fails to satisfy my hunger.

ECM and its close cousins in acronym land are similar to the approach articulated by my kindergarten teacher more than half a century ago. She said, according to my mother, “Keep your mittens and lunch in your cubby.” The spirit of the kindergarten teacher lives on in enterprise content management systems.

Unfortunately those who have work to do often create content using tools suited for a specific task. For an engineer, that tool might be Solidworks. Bench chemists are often confused when an ECM is described as the tool for their work. One chemist said to me after an enthusiastic presentation by an information technology person, “I work with chemical structures. What’s this person talking about?” Lawyers in the midst of big risk litigation want to use their own and often flawed document systems.  Even the marketer who cheers for ECM for Web content parks some high value data in that wonderful Adobe creative cloud with some back up data on iCloud. I have spotted a renegade analyst with an off the books workstation equipped with an Australian text processing and search system. is notable for what is not available because executive brand entities roll their own content solutions.

I was able to review a copy of the consultant report upon which the article was based. Wowza. The write up assembled a grab bag of widely disparate companies, added three cups of buzzwords, and mixed in one kilo of MBAisms.

To be fair, the report identified “challenges.” These items baffled me. For example, “Deep experience in key transactional applications.” This is a challenge, really?

But the vendors in the report are able to “address emerging opportunities.” Okay, so these are not opportunities. The opportunities are emerging. Hmmm. Here’s an example: “Ramping analytics to drive insight and reduce administrative burden.” Yikes. Ramping analytics. Driving analytics. Reducing administrative burden. Very active stuff this ECM. Gerund alert. Gerund alert.

What companies are into this suite of challenges and emerging opportunities? Here’s the list of the mid tier touted stallions from the ECM stable:

  1. EMC, a company which is considering having a subsidiary of itself purchase the parent company. Folks, when a company does this type of recursive stuff, the core business might be a little bit uncertain.
  2. HP. Yep, an outfit which has lost its way, suffered five consecutive quarters of declining revenue, and bought a company for $11 billion and then wrote off most of that expense because the sellers of the company fooled HP, its consultants, accountants, and lawyers. Okay. A winner for the legal eagles maybe.
  3. IBM. Heaven help me. IBM has suffered declining revenues for 13 consecutive quarters, annoyed me with a blizzard of Watson silliness, and spent lots of time getting rid of businesses. I have a difficult time believing that IBM can manage enterprise content. But, hey, that’s just my rural Kentucky ignorance, right?
  4. Laserfiche. The company offers a “flexible, proven enterprise content management system.” I believe this statement. The company was founded in 1987 and sure seems to have its roots in well seasoned technology. The company has lots of customers and lots of awards. The only hitch in the git along is that I never ran across this outfit in my work. Bad luck I guess.
  5. Lexmark. Folks, let us recall the rumor that Lexmark and its content businesses are not money makers. I heard that the content cluster achieved an astounding $70 to $80 million shortfall. Who knows if this rumor is accurate. I do know that Lexmark is cutting staff, and one does not take this drastic step unless one needs to reduce costs pronto.
  6. M Files. I never heard of this outfit. I did a quick check of my files and learned that the company “helps enterprises find, share, and secure documents and information. Even in highly regulated industries.” The company is also “passionate about productivity.” The outfit relies on dtSearch for information access. This is okay because dtSearch can process most of the content within a Microsoft-centric environment. But M Files strikes me as a different type of outfit from HP or IBM. As I flipped through the information I had collected, the company struck me as a collection of components. Assembly required.
  7. Newgen Software. Another newbie for me. The company was in my Overflight archive. The firm provides BPM (business process management), ECM (enterprise content management), DMS (I have no idea what this acronym means), CCM (I have no idea what this acronym means), and workflow (I thought this was the same as BPM). The company operated from New Delhi. My thought? Another collection of components with assembly in someone’s future.
  8. Hyland OnBase. This is the third outfit on the list about which I have a modest amount of information. The company says that it is a “leader in ECM.” I believe it. The firm’s url is the same as its flagship product. The company was founded in 1991 and created OnBase, which is a plus. After 25 years, the darned thing should work better than a Rube Goldberg solution assembled from a box of components.
  9. OpenText. Okay, OpenText is a company which has more search engines and content processing systems than most Canadian firms. The challenge at OpenText is having enough cash to invest in keeping the diverse assortment of systems current. Which of these systems is the one referenced in the mid tier firm’s report? SGML search, BASIS, BRS, Nstein, the Autonomy stub in RedDot, Fulcrum, or some other approach? Details can be important.
  10. Unisys. Okay, finally a company that is essentially an integrator which still supports Burroughs mainframes. Unisys can implement systems because it is an integrator. For government work, Unisys matches the statement of work to available software. Although some might question this statement, Unisys can implement almost any kind of system eventually.

Several observations:

First, enterprise content management is a big and fuzzy concept. The evidence of this is the number of acronyms some of the companies use to explain what they do. I assume that it is my ignorance which prevents me from understanding exactly how scanning, indexing, retrieval, repurposing, workflow, and administrative functions work in a cost constrained, teleworker, mobile gizmo world.

Second, open source is knocking on the door of this sector. At some point, organizations will tire of the cost and complexity of collections of loosely federated and integrated software subsystems and look for an alternative. Toss in the word Big Data, and there will be a stampede of New Age consultants ready to step forward and reinvent these outfits. Disruption is probably less of a challenge than the challenge of keeping existing revenues from doing the HP, IBM, and Lexmark drift down.

Third, the search function seems to be a utility or an afterthought. The only problem is that search does not work particularly well in an enterprise where the workers log in from Starbucks and try to interact with enterprise software from a Blackberry.

Fourth, what an odd collection of outfits! HP, IBM, and Lexmark along with 30 year old imaging firms plus some small outfits. Maybe the selection of firms makes sense to you, gentle reader. For me, the report makes evident the struggles of some experts in ECM, BPM, and the acronyms I know zero about.

In short, this mid tier report strikes me as a russische punschtorte. On the surface, the darned thing looks good, maybe mouth watering. After a chomp or two, I want a paprikahenderl.

This ECM thing is a confection, not a meaty chicken. Mixing in search does nothing for the recipe.

Stephen E Arnold, August 22, 2015

Quality and Text Processing: An Old Couple Still at the Altar

August 6, 2015

I read “Why Quality Management Needs Text Analytics.” I learned:

To analyze customer quality complaints to find the most common complaints and steer the production or service process accordingly can be a very tedious job. It takes time and resources.

This idea is similar to the one expressed by Ronen Feldman in a presentation he gave in the early 2000s. My notes of the event record that he reviewed the application of ClearForest technology to reports from automobile service professionals which presented customer comments and data about repairs. ClearForest’s system was able to pinpoint that a particular mechanical issue was emerging. The client responded to the signals from the ClearForest system and took remediating action. The point was that sometime in the early 2000s, ClearForest had built and deployed a text analytics system with a quality-centric capability.

I mention this point because many companies are recycling ideas and concepts which are in some cases long beards. ClearForest was acquired by the estimable Thomson Reuters. Some of the technology is available as open source at Calais.

In search and content processing, the case examples, the lingo, and even the technology have entered what I call the “recycling” phase.

I learned about several new search systems this week. I looked at each. One was a portal, another a metasearch system, and a third a privacy centric system with a somewhat modest index. Each was presented as new, revolutionary, and innovative. The reality is that today’s information highways are manufactured from recycled plastic bottles.

Stephen E Arnold, August 6, 2015

Dr. Watson: Concerned about Flabbiness and Sugar

August 5, 2015

The IBM PR attack continues. Today’s installment pits Watson (IBM’s Jeopardy winning, post production blind, Lucene based smart software) against flab and short chain soluble carbohydrates. Think diabetes or, worse, a visit to the dentist.

Navigate to “Dr. Watson: IBM Plans to Use big Data to Manage Diabetes and Obesity.” The story is not new. Once again IBM is reporting a “team up” deal. I wish the stories about Watson would talk about landing very large contracts with major government entities or Fortune 100 firms. I cannot get excited about old fashioned data mining applications. Sorry. Call me jaded.

The write up states without one whit of skepticism:

This new partnership marks a substantial leap into the healthcare sector for IBM, with CVS joining the likes of Apple and Medtronic as partners of IBM’s growing data service, Watson Health. By partnering up with CVS, Watson will be able to analyze and learn from “an unprecedented mix of health information sources”, including medical records, medical insurance claims and data from smart fitness devices.

I found the notion of the UK’s National Health Service hooking up with IBM an interesting one. Does the NHS have a functioning computer infrastructure? Has the promise of taxonomies delivered something useful to its intended users?

IBM might be able to help with systems. Will Watson remediate the NHS findability challenges? What will NHS pay to get Dr. Watson on the job? Has anyone involved in the Alphr (a former PC oriented outfit?) used Watson?

I don’t think much more happens with these Watson stories than recycling what Watson’s team generates with rather amazing regularity.

Where are the billion dollar plus revenues? That is important to me.

Stephen E Arnold, August 5, 2015

Poor IBM i2: 15 Year Old Company Makes Headlines in Fraud Detection and Big Blue Is Not Mentioned

August 3, 2015

Before IBM purchased i2 Ltd from an investment outfit, I did some work for Mike Hunter, one of the founders of i2 Ltd. i2 is not a household name. The fault lies not with i2’s technology; the fault lies at the feet of IBM.

A bit of history. Back in the 1990s, Hunter was working on an advanced degree in physics at Cambridge University. His undergraduate degree was from Manchester University. At about the same time, Michael Lynch, founder of Autonomy and DarkTrace, was a graduate of Cambridge and an early proponent of guided machine learning implemented in the Digital Reasoning Engine or DRE, an influential invention from Lynch’s pre Autonomy student research. Interesting product name: Digital Reasoning Engine. Lynch’s work was influential and triggered some me too approaches in the world of information access and content processing. Examples can be found in the original Fast Search & Transfer enterprise systems and in Recommind’s probabilistic approach, among others.

By 2001, i2 had placed its content processing and analytics systems in most of the NATO alliance countries. There were enough i2 Analyst Workbenches in Washington, DC to cause the Cambridge-based i2 to open an office in Arlington, Virginia.

In the mid 1990s, i2 delivered tools which allowed an analyst to identify people of interest, display relationships among these individuals, and drill down into underlying data to examine surveillance footage or look at text from documents (public and privileged).

IBM has i2 technology, and it also owns the Cybertap technology. The combination allows IBM to deploy for financial institutions a remarkable range of field proven, powerful tools. These tools are mature.

Due to the marketing expertise of IBM, a number of firms looked at what Hunter “invented” and concluded that there were whizzier ways to deliver certain functions. Palantir, for example, focused on Hollywood style visualization, Digital Reasoning emphasized entity extraction, and Haystax stressed insider threat functions. Today there are more than two dozen companies involved in what I call the Hunter-i2 market space.

Some of these have pushed in important new directions. Three examples of important innovators are: Diffeo, Recorded Future, and Terbium Labs. There are others which I can name, but I will not. You will have to wait until my new Dark Web study becomes available. (If you want to reserve a copy, send an email to benkent2020 at yahoo dot com. The book will run about 250 pages and cost about $100 when available as a PDF.)

The reason I mention i2 is because a recent Wall Street Journal article called “Spy Tools Come to Wall Street” (print edition for August 3, 2015) and “Spy Software Gets a Second Life on Wall Street” did not. That’s not a surprise because the Murdoch property defines “news” in an interesting way.

The write up profiles a company called Digital Reasoning, which was founded in 2000 by a clever lad from the University of Virginia. I am confident of the academic excellence of the university because my son graduated from this fine institution too.

Digital Reasoning is one of the firms engaged in cognitive computing. I am not sure what this means, but I know IBM is pushing the concept for its fascinating Watson technology, which can create recipes and cure cancer. I am not sure about generating a profit, but that’s another issue associated with the cognitive computing “revolution.”

I learned:

In pitching prospective clients, Digital Reasoning often shows a demonstration of how its system responded when it was fed 500,000 emails related to the Enron scandal made available by the Federal Energy Regulatory Commission. After being “taught” some key concepts about compliance, the Synthesys program identified dozens of suspicious emails in which participants were using language that suggested attempts to conceal or destroy information.

Interesting. I would suggest that the Digital Reasoning approach is 15 years old; that is, only marginally newer than the i2 system. Digital Reasoning lacks the functionality of Cybertap. Furthermore, companies like Diffeo, Recorded Future, and Terbium incorporate sophisticated predictive methods which operate in an environment of real time information flows. The idea is that looking at an archive is interesting and useful to an attorney or investigator looking backwards. However, the focus for many financial firms is on what is happening “now.”

The Wall Street Journal story reminds me of the third party descriptions of Autonomy’s mid 1990s technology. Those who fail to understand the quantity of content preparation and manual, subject matter expert effort required to obtain high value outputs are watching smoke, not investigating the fire.

For organizations looking for next generation technology which is and has been working for several years, one must push beyond the Palantir valuation and look to the value of innovative systems and methods.

For a starter, check out Diffeo, Recorded Future, and Terbium Labs. Please, push IBM to exert some effort to explain the i2-Cybertap capabilities. I tip my hat to the PR firm which may have synthesized some information for a story that is likely to make the investors’ hearts race this fine day.

Stephen E Arnold, August 3, 2015

Endeca: Facets of Novelty

August 1, 2015

I am no specialist in the arcane art of legal eagle spotting. I did notice some references to a dust up between an outfit called Speedtrack and licensees of Endeca’s ageing search technology.

The Speedtrack outfit seems to have rights to an invention called “Method for Accessing Computer Files and Data, Using Linked Categories Assigned to Each Data File Record on Entry of the Data File Record.” This is explained brilliantly in US5544360, filed in February 1995.

Here’s a diagram showing how the user can click on categories to locate information. No typing required.


Compare this to Endeca’s invention, “Hierarchical Data Driven Navigation System and Method for Information Retrieval.” This is US7062483, filed in 2001. You may also find US7035864 and US7325201 interesting as well.
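
Neither patent’s claims are reproduced here, but the general idea of locating records by clicking categories instead of typing can be sketched in a few lines. This hypothetical Python example assumes each record was tagged with categories when it was entered. The record ids, the categories, and the functions narrow and remaining_categories are invented for illustration and do not represent Speedtrack’s or Endeca’s implementation.

```python
# Hypothetical records, each tagged with categories at entry time.
RECORDS = {
    "memo-001": {"legal", "2015", "contracts"},
    "memo-002": {"legal", "2014", "patents"},
    "report-003": {"engineering", "2015", "patents"},
}


def narrow(records, selected):
    """Return the ids of records tagged with every selected category."""
    return sorted(
        rec_id for rec_id, tags in records.items() if selected <= tags
    )


def remaining_categories(records, selected):
    """Categories a user could click next without emptying the result."""
    options = set()
    for rec_id in narrow(records, selected):
        options |= records[rec_id]
    return sorted(options - selected)


print(narrow(RECORDS, {"patents"}))
print(remaining_categories(RECORDS, {"patents"}))
```

Each click adds a category to the selected set, the result list shrinks, and the system offers only those categories that still lead somewhere. No typing required.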


“Federal Circuit Reaffirms Kessler Doctrine As A Patent Infringement Defense For Customers” explains that the Speedtrack infringement case pivots on the Kessler doctrine. Here’s the explanation from the article:

First, unlike res judicata, which is a defense that is personal to the parties in a prior litigation, the Kessler Doctrine “attaches to the [accused] product itself” and precludes a patentee from reasserting the same patent against the same (or “essentially the same”) product in a subsequent action.

The article then noted:

Second, the Federal Circuit ruled that the Kessler doctrine may be raised by customers as well as the product manufacturer or supplier.

What I found fascinating was this infringement related statement attributed to the presiding legal eagle:

Third, the Federal Circuit held that the Kessler doctrine applied to Speedtrack’s claim even though the Endeca software allegedly infringed only when combined with the customer’s own computer hardware.

I recall that Endeca’s faceted navigation burst upon the scene in the late 1990s. Who knew that Jerzy Lewak (co founder of Speedtrack), Slawek Grzechnik, and Jon Matousek seemed to be trying to figure out a way around the problem of keyword search before Endeca?

I wonder if Oracle were surprised too. I have a hunch Speedtrack was.

Stephen E Arnold, August 1, 2015

Palantir Sucks in More Dinero

July 24, 2015

I am all for keeping the companies involved with law enforcement and intelligence entities out of the public eye. The hoo hah about Hacking Team is a grim reminder of what happened to Gamma Group and FinFisher when information about their services and products hit the “real” journalists’ radar.

I want to point you to “Confirmed. Palantir Raise a Huge $450 Million Investment.” The write up points out:

This [more cash investments] confirms a report last month that the company was raising up to $500 million at a valuation of $20 billion – making it the third most valuable “startup” on the Valley scene. (If you can call a 16-year-old company that reportedly generates millions in revenue a “startup.”)

Palantir is a unicorn wearing an invisibility saddle, tack, and saddle blanket. That’s okay with me. My observation is that Palantir has technology which is intended to prevent untoward acts. Are these untoward acts being prevented? I will let you answer that question.

I have no comment on whether the Palantir technology works. Even court documents related to Palantir’s dust up with i2 Group Ltd (a former client of mine) are not public. Why would i2, the pioneer in Palantir’s software segment, get involved with legal eagles?

Perhaps someone will have an answer some day. For now, I will ignore the partially invisible unicorn. The company has plenty of stakeholders who are trying to figure out Palantir so my efforts are redundant.

Stephen E Arnold, July 24, 2015

Real Journalists and Presstitution

July 24, 2015

I read and enjoyed an article for one word: “presstitute.” You can see the word in context in “Are Media Companies One Native Ad Away from Becoming Presstitutes.” Perhaps the word “native” is not clear? Inclusions, inserts, or paid advertorials will make the meaning of native clear.

The idea is that “real” journalists were objective before the eye opening days of yellow journalism. Messrs. Pulitzer and Hearst were, more than a century ago, what Mark Zuckerberg and Larry Page are today.

Flash forward to the present and the “real” journalists are struggling to make their well honed business model work in a world of iPhones and Instagram.

Read the original essay. You get some dancing around the May pole, but the article is significant because of the word “presstitute” in my opinion. That’s a business model with legs. No comment about whether the legs are comely, hirsute, appropriate, or inappropriate from me, however.

Stephen E Arnold, July 24, 2015
