
What Watson Can Do For Your Department

July 6, 2015

The story of Justin Chen, a Finance Manager, is one of many “Stories by Role” now displayed on IBM’s website. Each character has a different job, such as Liza Hay from Marketing, Donny Cruz from IT, and Anisa Mirza from HR. Each job comes with a problem for which Watson, IBM’s supercomputer, has just the solution. Justin, the article relates, is having trouble deciding which payments to follow up on. Watson provides solutions:

“With IBM® Watson™ Analytics, Justin can ask which customers are least likely to pay, who is most likely to pay and why. He can analyze this information… [and] collect more payments more efficiently… With Watson Analytics, Justin can ask which customers are likely to leave and which are likely to stay and why. He can use the answers for analysis of customer attrition and retention, predict the effect on revenue and determine which customer investments will lead to more profitable growth.”

It seems that the now world-famous Watson has been converted from a search system into a basket containing any number of IBM software solutions. It isn’t stated in the article, but we can probably assume that the revenue from each solution counts toward Watson’s soon-to-be-reported billions in revenue.

Chelsea Kerwin, July 6, 2015

Sponsored by, publisher of the CyberOSINT monograph


Need Semantic Search: Lucidworks Asserts It Is the Answer by Golly

July 3, 2015

If you read this blog, you know that I comment on semantic technology every month or so. In June I pointed to an article which had been tweeted as “new stuff.” Wrong. Navigate to “Semantic Search Hoohah: Hakia”; you will learn that Hakia is a quiet outfit. Quiet as in no longer on the Web. Maybe gone?

There are other write ups in my free and for fee columns about semantic search. The theme has been consistent. My view is that semantic technology is one component in a modern cybernized system. (To learn about my use of the term cyber, navigate to

I find the promotion of search engine optimization as “semantic” amusing. I find the search service firms’ promotion of their semantic expertise amusing. I find the notion of open source outfits deep in hock to venture capitalists asserting their semantic wizardry amusing.

I don’t know if you are quite as amused as I am. Here’s an easy way to determine your semantic humor score. Navigate to this SlideShare link and cruise through the 34-slide presentation made by one of Lucidworks’ search mavens. Lucidworks is a company I have followed since it fired up its jets with Marc Krellenstein on board. Dr. Krellenstein ejected in short order, and the company has consumed many venture dollars with management shifts, repositionings, and the Big Data thing.

We now have Lucidworks in the semantic search sector.

Here’s what I learned from the deck:

  1. The company has a new logo. I think this is the third or fourth.
  2. Search is about technology and language. Without Google’s predictive and personalized routines, words are indeed necessary.
  3. Buzzwords and jargon do not make semantic methods simple. Consider this statement from the deck, “Tokenization plus vector mathematics (TF/IDF) or one of its cousins—“bag of words” – Algorithmic tweaks – enhanced bag of words.” Got that, gentle reader? If not, check out “sausagization.”
  4. Lucidworks offers a “field cache.” Okay, I am not unfamiliar with caching in order to goose performance, which can be an issue with some open source search systems. But Searchdaimon, an open source search system developed in Norway, runs circles around Lucidworks. (My team did the benchmark test of major open source systems. Searchdaimon was the speed champ and had other sector-leading characteristics as well.)
  5. Lucidworks does the ontology thing as well. The tie up of “category nodes” and “evidence nodes” may be one reason the performance goblin noses into the story.
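For readers who want to see what “tokenization plus vector mathematics (TF/IDF)” and “bag of words” boil down to, here is a minimal Python sketch. It is not Lucidworks’ code or any particular engine’s scoring formula; it assumes a naive lowercase tokenizer that splits on whitespace and trims punctuation, term frequency divided by document length, and idf computed as log(N / df). Word order is thrown away, which is the whole point of a “bag.”

```python
import math
import string
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption): lowercase, split on whitespace, trim punctuation."""
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip(string.punctuation)
        if tok:
            tokens.append(tok)
    return tokens


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    bags = [Counter(tokenize(d)) for d in docs]  # bag of words: counts only, no order
    n_docs = len(docs)
    df = Counter(term for bag in bags for term in bag)  # documents containing each term
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append(
            {term: (count / total) * math.log(n_docs / df[term]) for term, count in bag.items()}
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "Semantic search with open source",
    "Open source search as a utility",
    "Music videos on Bing",
]
vectors = tf_idf(docs)
print(round(cosine(vectors[0], vectors[1]), 3))  # shared terms -> positive score
print(cosine(vectors[0], vectors[2]))  # no shared terms -> 0.0
```

A term that appears in every document gets an idf of log(1) = 0 and drops out, which is one reason engines layer on tweaks such as stemming, stop word lists, or BM25-style weighting. Note what is missing: nothing here knows what a word means.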

The problem I encountered is that the write up for the slide deck emphasized Fusion as a key component. I have been poking around the “fusion” notion as we put our new study of the Dark Web together. Fusion is a tricky problem and the US government has made fusion a priority. Keep in mind that content is more than text. There are images, videos, geocodes, cryptic tweets in Farsi, and quite a few challenging issues with making content available to a researcher or analyst.

It seems that Lucidworks has cracked a problem which continues to trouble some reasonably sophisticated folks in the content analysis business. Here’s the “evidence” that Lucidworks can do what others cannot:


This diagram shows that after a connector is available, then “pipelines proliferate.” Well, okay.

I thought the goal was to process content objects with low latency, easily, and with semantic value adds. “Lots of stages” and “index pipelines: one way query pipelines: round trip” does not compute for this addled goose.
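To make the “index pipelines: one way, query pipelines: round trip” phrasing concrete, here is a toy Python sketch. Everything in it is hypothetical: the stage functions, the in-memory list standing in for an index, and the exact-token match are illustrations, not the Fusion API. It shows only the structural point: each stage is another function call on the path, once on the way in for indexing and on both legs (out and back) for a query.

```python
from typing import Callable

Doc = dict[str, str]
DocStage = Callable[[Doc], Doc]
QueryStage = Callable[[str], str]
ResultStage = Callable[[list[Doc]], list[Doc]]


def lowercase_body(doc: Doc) -> Doc:
    """Hypothetical index stage: normalize case."""
    return {**doc, "body": doc["body"].lower()}


def add_word_count(doc: Doc) -> Doc:
    """Hypothetical index stage: enrich the document with a derived field."""
    return {**doc, "words": str(len(doc["body"].split()))}


def index_pipeline(doc: Doc, stages: list[DocStage], index: list[Doc]) -> None:
    """One way: the document flows through every stage into the index; nothing is returned."""
    for stage in stages:
        doc = stage(doc)
    index.append(doc)


def query_pipeline(
    query: str,
    query_stages: list[QueryStage],
    result_stages: list[ResultStage],
    index: list[Doc],
) -> list[Doc]:
    """Round trip: the query goes out through stages, hits the index, and results come back through more stages."""
    for stage in query_stages:
        query = stage(query)
    hits = [doc for doc in index if query in doc["body"].split()]  # exact token match only
    for stage in result_stages:
        hits = stage(hits)
    return hits


index: list[Doc] = []
for d in [{"id": "1", "body": "Pipelines Proliferate"}, {"id": "2", "body": "Semantic search"}]:
    index_pipeline(d, [lowercase_body, add_word_count], index)

hits = query_pipeline("  PIPELINES ", [str.strip, str.lower], [lambda hs: hs[:10]], index)
print([h["id"] for h in hits])  # ['1']
```

Add stages and the work per document and per query goes up accordingly, which is the latency concern raised above.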

If the Lucidworks approach makes sense to you, go for it. My team and I will stick to here-and-now tools and open source technology which works without the semantic jargon, which is pretty much incidental to the matter. We need to process more than text. CyberOSINT vendors deliver, and most use open source search as a utility function. Yep, utility. Not the main event. The failure of semantic search vendors suggests that the buzzword is not the solution to marketing woes. Pop. (That’s a pre-Fourth of July celebratory ladyfinger.)

Stephen E Arnold, July 3, 2015

Attivio Reaches Top 100 Status

June 29, 2015

Database Trends and Applications (DBTA) has announced its brand new DBTA 100, and according to Yahoo Finance, Attivio, The Data Dexterity Company, is now on the list: “Attivio Named By Database Trends and Applications To Its Prestigious Top 100 List.”

“We are pleased to be recognized by Database Trends and Applications as one of the most important firms in the data space; it further validates the type of feedback that our customers provide on a daily basis,” said Stephen Baker, CEO of Attivio. “As firms continue to be more reliant on maximizing their data to drive business-critical insights, we expect to play a critical role in driving this type of business innovation.”

Attivio joins the ranks of other companies that have made huge innovations in the data industry, including EMC, Amazon, and IBM. Attivio is an industry leader in enterprise systems with its intelligence search platform. Attivio’s search platform enables users to gain immediate insights through data visibility. Attivio’s well-known clients include National Instruments, Nexen, GE, UBS, and Qualcomm. The company believes that there are many innovations to be made from all types of data, not just the type that is easily found in a database. Attivio uses its search platform to uncover insights in unstructured data that would otherwise be missed by other enterprise search platforms.

We have been following Attivio for many years, and having its name added to the DBTA 100 proves it can perform well and deliver useful results. Enterprise search continues to be an important factor for enterprise systems, though people often forget that today. Attivio’s addition to the DBTA 100 stresses that not everyone has forgotten.

Whitney Grace, June 29, 2015


Oracle Data Integrator Extension

June 29, 2015

The Market Realist article titled “Oracle Launches ODI in April with the Aim to Revolutionize Big Data” makes it clear that Oracle sees big money in NoSQL. Oracle Data Integrator, or ODI, enables developers and analysts to simplify their work and their training. It removes the requirement that they learn multiple programming languages and allows them to use Hadoop and the like without much coding expertise. The article states,

“According to a report from PCWorld, Jeff Pollock, Oracle vice president of product management, said, ‘The Oracle Data Integrator for Big Data makes a non-Hadoop developer instantly productive on Hadoop…’ Databases like Hadoop and Spark are targeted towards programmers who have the coding knowledge expertise required to manipulate these databases with knowledge of the coding needed to manage them. On the other hand, analysts usually use software for data analytics.”

The article also relates some of Oracle’s claims about itself, including that it has larger revenue than IBM, Microsoft, SAP AG, and Teradata combined. Those four are also Oracle’s major competitors. With the release of ODI, Oracle intends to filter data arriving from a myriad of different places. Clustering data into groups related by their format or framework is part of this process. The end result is a more streamlined process that makes no assumptions about the level of coding knowledge held by an analyst.

Chelsea Kerwin, June 29, 2015


How to Succeed in China: Maybe Follow the Rules?

June 27, 2015

I love articles which explain how to do something to anticipated readers who have zero chance to build a business in China. Navigate to “A New Wave of US Internet Companies Is Succeeding in China—By Giving the Government What It Wants.”

I am not sure a degree in business is required to understand this concept. In my experience, when one is in another country, common sense suggests that the government officials expect outsiders to play by the rules. Ever wonder why West Point cadets look so darned polished? Well, consider the downside associated with wearing an Iron Maiden T-shirt, soccer trunks, and flip flops.

The write up points out:

“If you want to develop an internet business in Chinese now, you have to be willing to work with the Chinese government, even if that means censoring content or sharing access to your data,” Ben Cavender, principal at the China Market Research Group, told Quartz.

Outfits who have learned this simple lesson, according to the write up, are LinkedIn, Uber, and Evernote. Outfits who have not figured out the calculus of the West Point approach to order include Facebook, Google, and Twitter. Hey, Facebook is trying. I saw a news item revealing that the Facebook top Facebooker learned sort of Chinese. Yippy.

So which companies have “better” managers? Those in the big market or those looking at the big market?

How does this relate to search and content processing? I don’t know of too many information access companies dominating the Chinese market. When it comes to cyberOSINT, there is Knowlesys, which sort of operates in Hong Kong and does have an office in China.

Class dismissed. Oh, you with the flip flops, may I have a word with you?

Stephen E Arnold, June 27, 2015

How the Cloud Might Limit SharePoint Functionality

June 25, 2015

In the highly anticipated SharePoint Server 2016, on-premises, cloud, and hybrid functionality are all emphasized. However, some are beginning to wonder whether functionality suffers depending on the type of deployment chosen. Read all the details in the Search Content Management article, “How Does the Cloud Limit SharePoint Search and Integration?”

The article begins:

“All searches are not created equal, and tradeoffs remain for companies mulling deployment of the cloud, on-premises and hybrid versions of Microsoft’s collaboration platform, SharePoint. SharePoint on-premises has evolved over the years with a focus on customization and integration with other internal systems. That is not yet the case in the cloud with SharePoint Online, and there are still unique challenges for those who look to combine the two products with a hybrid approach.”

The article goes on to say that there are certain restrictions, especially on search customization, in the SharePoint Online deployment. Furthermore, a good amount of configuration is required to maximize search in the hybrid version. To keep up to date on how this might affect your organization, and on the required workarounds, stay tuned. Stephen E. Arnold is a longtime search professional, and his work on SharePoint is conveniently collected in a dedicated feed to maximize efficiency.

Emily Rae Aldridge, June 25, 2015

Publishers Want to Dejuice Apple, Squash It

June 22, 2015

I read “Publishers Slam Apple over Presumptuous News App Conditions.” Publishers presumptuous? I know of one publisher who used my research and marketed it on Amazon without my permission. Was that presumptuous of IDC and its wizard Dave Schubmehl?

According to the write up:

Publishers are up in arms following an email from Apple about inclusion in the firm’s upcoming News application and the kind of conditions that will be imposed. The email said that participants are presumed to have accepted Apple’s terms unless they explicitly opt out. It’s the old opt-out over opt-in thing.

Yes, up in arms. I can see the publishers at the New York Athletic Club wielding their squash rackets with malice. My goodness, what a chilling thought. What if those white clad clubsters were to descend on the Apple store in Manhattan and threaten the geniuses?

My fears subsided when I read:

The service will draw content from publicly available RSS feeds, and it is possible that Apple will be challenged, according to one expert, but not in any really meaningful way.

My concern for a Squash Assault receded. Publishers may have to retire to the Yacht Club to find another option.

Stephen E Arnold, June 22, 2015

Bing Search: Pump Up the Music

June 22, 2015

My approach to online research is to look for information. I know that I have looked up Mozart in the past. In the aggregate, I look for high technology companies, people involved in high technology, and products that embody technology. Music videos are not what makes my intellectual engines sing along.

I read “Bing Wants to Become the Search Engine of Choice for Music Videos.” Good for Bing. There are many Web pages which exist to be indexed. There are search challenges to resolve. There is the problem of Microsoft index silos. Have you done a query in Bing and wished that there were relevant links to content in Microsoft’s academic index? Well, too bad.

According to the write up:

The update adds larger thumbnails to Bing’s video section, which now displays additional information about each clip, like channel, upload date and view count.  Users also have the option to watch a preview of each clip within their search results, and explore related queries more easily.

The new Microsoft is definitely innovating in search aimed at those who are hungry for music videos. What’s the next innovation? Video games? Online horoscopes? Nail polish colors?

Stephen E Arnold, June 22, 2015


Amazon, Pages, and Research

June 21, 2015

I read “What If Authors Were Paid Every Time Someone Turned a Page.” As you may know, I have complained directly and through my attorney because IDC and its wizard Dave Schubmehl sold a report containing my information on Amazon. The mid-tier consulting firm pegged a $3,500 price tag on an eight-page report based on my work. Well, as Jack Benny used to say, “Well.”

The publisher / consultant behavior annoyed me, but I do not sell my content via Amazon. I would rather give away a report than get tangled in the Bezos buzz saw. Sure, I buy talcum powder from the Zon, but that’s because the grocery in Harrod’s Creek does not sell any talcum powder. The Zon gets the product to me in a few days. Sometimes.

My thoughts about Amazon ramped up a notch when I read this passage in the article from The Atlantic:

Soon, the maker of the Kindle is going to flip the formula used for reimbursing some of the authors who depend on it for sales. Instead of paying these authors by the book, Amazon will soon start paying authors based on how many pages are read—not how many pages are downloaded, but how many pages are displayed on the screen long enough to be parsed. So much for the old publishing-industry cliché that it doesn’t matter how many people read your book, only how many buy it. For the many authors who publish directly through Amazon, the new model could warp the priorities of writing: A system with per-page payouts is a system that rewards cliffhangers and mysteries across all genres. It rewards anything that keeps people hooked, even if that means putting less of an emphasis on nuance and complexity.

Several observations:

  1. I often buy digital and hard copy books because I need access to a specific passage. I recently ordered a book about law enforcement and the Web. I was interested in two chapters and the bibliographies for these chapters. The notion of paying the author, a police professional, for only those pages I examined rubs me the wrong way. I have the book, and I may need to access other chapters at a different point in time. But I want the author to be paid for this very good work. If I understand the write up, Amazon wants to move in a different direction.
  2. When I got a book via Amazon for my Kindle, I assumed I could use the book as long as I had the device. Well. (There’s the Benny word again.) I have experienced disappearing content. My wife asked me where a title was. I said, “In the archive.” Nope. The title was disappeared. Nifty. I contacted Amazon via a form and heard nothing back. Who got paid? Amazon, but I no longer have the digital book. Nifty, but I probably made a mistake, or at least that’s what Time Warner-type companies tell me. My fault.
  3. Amazon, like the Google, faces cost projections that are likely to give accountants headaches and sleepless nights. Amazon, a digital Wal-Mart type operation, is going to squeeze out revenue any way possible. Someone has to pay for the Amazon phone and other Amazon adventures. Same-day groceries, anyone?

Net net: No wonder the secondhand book stores in Louisville, Kentucky, are crowded. Physical books work the way they have for centuries, thank you. You will be able to buy my new study from the electronic store we have set up. The book will even be available in hard copy if a person wants a tangible instance. Maybe I will sell fewer copies. That’s okay. I prefer to avoid being clever and to make my work available to anyone who wants to access it. None of that IDC-like behavior either. $3,500 for eight pages. Crazy, right?

I often purchase fiction books, read a few pages, and then decide the book is not in my wheelhouse. I want the author to get paid whether I read every page or not. I think the author wants to get paid as well. The only outfit that doesn’t want to pay may be the Zon.

Stephen E Arnold, June 21, 2015

Trash Organizational Silos? Not So Fast

June 18, 2015

I read “Three things You Need to Break Down Those Company Silos.” Enterprise search vendors have harped on the impossible dream: Federate an organization’s information and data. Make the content available to authorized users.

The reality is a bit different from the cute PowerPoint slides with photos of farm silos and placid bovines.

The article comes at silos in a different, almost Lord of the Rings fantasy way. The write up states:

The title of this feature makes it pretty clear that we think a company operating in a silo mentality is a Bad Thing and that the structure needs to be sorted out…Take the information security function of your business. Can you let individual departments look after their security? Of course not, because they don’t know how to and they won’t do it anyway – particularly the sales function because given the choice of employing a security specialist (who costs money) or a salesperson (who does quite the opposite) the decision is a no-brainer.

Yep, security. Let’s reflect a moment. The popular press is full of stories about the security rupture at the Office of Personnel Management, the US government agency that kept track of contractors like me, employees, and Snowden types. Then there is the Anthem health care thing, the Target thing, the hacking of the US Army’s Web site, and, gee, lots of other examples.

Broken silos, like this one, kill folks. One giant silo, if it breaks, may kill lots of farmers. The CBS news crew wisely observed from a helicopter. And silos can burn or explode. Boom. Stay back may be good advice.

What’s this tell us? Talk about security sells consulting work. But the mechanisms within many organizations ignore security. So, silos and security? Yep, these work pretty well in the pharma industry. Some helpful folks in marketing are just not allowed to know who is working on what in which lab, and for good reason. Silos are best implemented by stakeholders. No perceived stake, no security.

Let’s move on.

The article, in a somewhat parental fashion, tells me what I need (and you too, of course). The suggestions are MBA baloney. A person not in top management is not going to get through to the top dogs. Maybe Bain, BCG, Booz, Allen, and McKinsey consultants can communicate at this carpet land level. But my hunch is that most others are going to get a smile and not much else.

I want to take a moment and consider these suggestions. Let’s assume that I am a 25-year-old working on a project and I have some “matrix” responsibility for technical quality assurance for a software product.

The article wants me to help the senior managers understand the big picture. When I was a 20-something, my concept of a big picture was the 20-inch TV at Sears. When I was 25, I had zero idea (and I am speaking from the experience of my 50-year work history) what the big picture of the company employing me was. I worked at Halliburton Nuclear and Booz, Allen for years. I then moved into senior management at other big outfits. No reasonable senior officer expected me, no matter how clever I was supposed to be, to know what the organization’s big picture was. The two or three men and women at the top, in my experience, struggled with figuring out where on the wall the picture was located. Big was quarterly numbers. Inputs from below are like pellets fired at a military aircraft cruising at 30,000 feet.

The second thing I need, according to the article, is identifying tasks that belong elsewhere. Okay, let’s think about this. I visited a company 10 days ago. The firm had a headquarters which contained computers, products, and people. The company had dozens of offices. As an outsider with decades of business experience, I could (note the word “could”) have told the firm to move to lower-cost real estate, migrate the computer systems to Amazon, get rid of full-time staff and shift to contract workers whom the company would call when there were tasks to perform, shift suppliers from vendors in Europe to vendors in Cambodia, etc. What I did was focus on a handful of suggestions that were within the resource capacity of the company. What is the point in telling the three senior managers to do things which the company cannot afford, which do not match the firm’s business processes, or which exceed the technical capabilities of the staff? If I were 25 and slogging through some fun stuff related to nuclear fuel, I would be unable to identify meaningful actions for our designated Halliburton leader, an impressive fellow named Thomas H. Cruikshank, who, when I knew him, had not yet become the chairman and CEO of Halliburton Energy Services. I watched, I learned, and I kept my mouth shut. My job was to process nuclear data and do whatever the top dogs told me to do. I found this approach to be quite beneficial to me and my career. When asked, I would formulate a response. Tell top dogs what belonged elsewhere? Nope, not for me.

The third thing I need to do, according to the article, is to do my job well. Okay, easy to say, but for me and the majority of the hundreds of employees I have hired, trained, and managed over the last 50 years, the key is to help people succeed. The “well” stuff is subjective and irrelevant. A script or program works or it does not. Let’s do the works part and tackle the well later or maybe never. The more reliable, objective approach is to define tasks so that a specific employee can perform that task, learn along the way, and complete the work so that his or her output is useful to a coworker, a customer, or a friend of the senior vice president’s spouse. Screw-ups occur with broad generalizations. “Well” is not the same as completion, feedback, and improvement. Excellence results from doing tasks, making errors, adapting, and producing outputs that others can use. If I were a 20-something and my boss told me to do something well, my reaction would be, “Why did you hire me if you did not think I was [a] bright, [b] a hard-working, task-oriented individual, and [c] committed to doing what I had to do to win the respect of my coworkers and clients?” The question would cause me to lose confidence in that manager.

Let me circle back to enterprise search. For decades vendors took the Fast Search & Transfer approach (other vendors used this method as well). The vendor would say, “We can index all of your organization’s information.” Then the vendor would suggest, “Search will unlock the value of the knowledge in your organization.”


The vendors who took and continue to take this approach are unaware that their customers will quickly learn that the emperor has no clothes. No one wants “all” information available. Do you want your personal health records online and searchable? What about the drafts of the contract for the sale of the unit in Princeton, New Jersey, to a Chinese investment firm? What about the data on your laptop for the golf scramble you run as a favor for a pal in the Kiwanis Club of Topeka, Kansas?
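Enterprise search people call the answer to the “all information” problem security trimming: each indexed item carries access control data, and results are filtered against the searcher’s permissions before anything is displayed. Here is a minimal Python sketch, assuming a hypothetical in-memory store where each document has a set of allowed groups; real systems pull access lists from the source repositories and must keep them in sync.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical ACL captured at index time


def trimmed_search(docs: list[Document], term: str, user_groups: set[str]) -> list[str]:
    """Return titles of documents that match the term AND that the user may see."""
    term = term.lower()
    return [
        d.title
        for d in docs
        if term in d.text.lower()  # naive substring match
        and d.allowed_groups & user_groups  # security trimming: any shared group grants access
    ]


docs = [
    Document("Health record", "patient health notes", frozenset({"hr"})),
    Document("Princeton sale draft", "contract draft for the Princeton unit", frozenset({"legal", "exec"})),
    Document("Golf scramble", "Kiwanis golf scramble scores", frozenset({"everyone"})),
]

print(trimmed_search(docs, "draft", {"everyone"}))  # []
print(trimmed_search(docs, "draft", {"legal"}))  # ['Princeton sale draft']
```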

Silos are not going away. Silos of information are central to many work processes. Individuals who yap about removing information silos, work silos, or any other kind of silo are trapped within a large, somewhat oily MBA sausage on

Management precepts, like Fast Search-type assertions, do little to solve some very real, very important business problems. Focus and appropriate control are more helpful than business school saucisson.

Stephen E Arnold, June 18, 2015
