
The Dichotomy of SharePoint Migration

May 7, 2015

SharePoint Online gets good reviews, but only from critics and those who are utilizing SharePoint for the first time. Those who are sitting on huge on-premises installations are dreading the move and biding their time. It is definitely an issue stemming from trying to be all things to all people. Search Content Management covers the issue in their article, “Migrating to SharePoint Online is a Tale of Two Realities.”

The article begins:

“Microsoft is paving the way for a future that is all about cloud computing and mobility, but it may have to drag some SharePoint users there kicking and screaming. SharePoint enables document sharing, editing, version control and other collaboration features by creating a central location in which to share and save files. But SharePoint users aren’t ready — or enthused about — migrating to . . . SharePoint Online. According to a Radicati Group survey, only 23% of respondents have deployed SharePoint Online, compared with 77% that have on-premises SharePoint 2013.”

If you need to keep up with how SharePoint Online may affect your organization’s installation, or the best ways to adapt, keep an eye on Stephen E. Arnold’s dedicated SharePoint feed. Arnold is a longtime leader in search, and he distills the latest tips, tricks, and news there. SharePoint Online is definitely the future of SharePoint, but it cannot afford to get there at the cost of its past users.

Emily Rae Aldridge, May 7, 2015

Sponsored by, publisher of the CyberOSINT monograph


Visual Data Mapper Quid Raises $39M

April 14, 2015

The article on TechCrunch titled Quid Raises $39M More to Visualize Complex Ideas explains the current direction of Quid. Quid, the business analytics company interested in the work of processing vast amounts of data to build visual maps as well as branding and search, has been developing new paths to funding. The article states,

“When we wrote about the company back in 2010, it was focused on tracking emerging technologies, but it seems to have broadened its scope since then. Quid now says it has signed up 80 clients since launching the current platform at the beginning of last year. The new funding was led by Liberty Interactive Corporation, with participation from ARTIS Ventures, Buchanan Investments, Subtraction Capital, Tiger Partners, Thomas H. Lee Limited Family Partnership II, Quid board member Michael Patsalos-Fox…”

Quid also works with such brands as Hyundai, Samsung and Microsoft, and is considered to be unique in its approach to the big picture of tech trends. The article does not provide much information as to what the money is to be used for, unless it is to do with the changes to the website, which was once called the most pretentious of startup websites for its detailed explanation of its primary and secondary typefaces and array of titular allusions.

Chelsea Kerwin, April 14, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Set Data Free from PDF Tables

April 13, 2015

The PDF file is a wonderful thing. It takes up less space than alternatives, and everyone with a computer should be able to open one. However, it is not so easy to pull data from a table within a PDF document. Now, Computerworld informs us about a “Free Tool to Extract Data from PDFs: Tabula.” Created by journalists with assistance from organizations like Knight-Mozilla OpenNews, the New York Times and La Nación DATA, Tabula plucks data from tables within these files. Reporter Sharon Machlis writes:

“To use, download the software from the project website. It runs locally in your browser and requires a Java Runtime Environment compatible with Java 6 or 7. Import a PDF and then select the area of a table you want to turn into usable data. You’ll have the option of downloading as a comma- or tab-separated file as well as copying it to your clipboard.

“You’ll also be able to look at the data it captures before you save it, which I’d highly recommend. It can be easy to miss a column and especially a row when making a selection.”

See the write-up for a video of Tabula at work on a Windows system. A couple of caveats: the tool will not work with scanned images. Also, the creators caution that, as of yet, Tabula works best with simple table formats. Any developers who wish to get in on the project should navigate to its GitHub page.
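Machlis’s advice to look over the captured data before saving it can also be applied after export. The snippet below is a minimal sketch, not part of Tabula: the function name `find_ragged_rows` and the sample data are hypothetical, and it assumes the table was saved as a comma- or tab-separated file with a header row. It flags rows whose column count differs from the header, which is one symptom of a missed column.

```python
import csv
import io


def find_ragged_rows(csv_text, delimiter=","):
    """Return (row_number, column_count) for rows whose width differs from the header.

    Row numbers are 1-based, with the header counted as row 1.
    """
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])  # the header row sets the expected width
    return [
        (number, len(row))
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != expected
    ]


# Hypothetical export: the last row is missing its third column.
sample = "city,year,total\nLouisville,2014,120\nLexington,2015\n"
print(find_ragged_rows(sample))
```

A check like this only catches rows with the wrong number of columns. A row that was left out of the selection entirely still has to be spotted by comparing the export against the PDF.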

Cynthia Murrell, April 13, 2015


Vilocity 2.0 Released by Nuwave

March 17, 2015

The article on Virtual Strategy Magazine titled NuWave Enhances their Vilocity Analytic Framework with Release of Vilocity 2.0 Update promotes the upgraded framework as a mixture of Oracle Business Intelligence Enterprise Edition and Oracle Endeca Information Discovery. The ability to interface across both of these tools as well as include components from both in a single dashboard makes this a very useful program, with capabilities such as exporting to Microsoft to create slideshows, pre-filter and the ability to choose sections of a page and print across both frameworks. The article explains,

“The voices of our Vilocity customers were vital in the Vilocity 2.0 release and we value their input,” says Rob Castle, NuWave’s Chief Technology Officer… The most notable Vilocity deployment NuWave has done is for the U.S. Army EMDS Program. From deployment and through continuous support NuWave has worked closely with this client to communicate issues and identify tools that could improve Vilocity. The Vilocity 2.0 release is a culmination of NuWave’s desire for their clients to be successful.”

It looks like they have found a way to make Endeca useful. Users of the Vilocity Analytic framework will be able to find answers to the right questions as well as make new discoveries. The consistent look and feel of both systems should aid users in getting used to them, and making the most of their new platform.

Chelsea Kerwin, March 17, 2015


EMC: Another Information Sideshow in the Spotlight

January 31, 2015

An information sideshow is enterprise software that presents itself as the motor, transmission, and differential for the organization. Get real. The main enterprise applications are accounting, database management systems, sales management, and systems that manage real stuff (ERP, PLM, etc.).

Applications that purport to manage Web content or organize enterprise wide information and data are important but the functions concern overhead positions except in publishing companies and similar firms.

Since the Web became everyone’s passport to becoming an expert online professional, Web content management systems blossomed and flamed out. Anyone using Broadvision or Sagemaker?

Documentum is a content management system. It is mandated or was mandated as the way to provide information to support the antics of the Food and Drug Administration and some other regulated sectors. The money from FDA’s blessing does not mean that Documentum is in step with today’s digital demands. In fact, for some applications, systems like Documentum are good for the resellers and integrators. Users often have a different point of view. Do you love OpenText, MarkLogic, and other proprietary content management systems? Remember XyVision?

Several years ago, I had a fly over of a large EMC Documentum project. When I was asked to take a look, a US government entity had been struggling for three years to get a Documentum system up and running. I think one of the resellers and consultants was my old pal IBM, which sells its own content management systems, by the way. At the time I was working with the Capitol Police (yep, another one of those LE entities that few people know much about). Think investigation.

I poked around the system, reviewed some US government style documentation, and concluded that the in-process system would require more investment and time to get up and toddling, not walking, mind you, just toddling. I bailed and worked on projects that sort of really worked mostly in other governmental entities.

After that experience, I realized that “content management” was a bit of a charade, not too different from Web servers and enterprise search. The frenzy for Web stuff made it easy for vendors of proprietary systems to convince organizations to buy bespoke, proprietary content management systems. Wow.

The outfits that are in the business of creating content know about editorial policies. Licensees of content management systems often do not. But publishing expertise is irrelevant to many 20 somethings, failed webmasters, self appointed experts, and confused people looking for a source of money.

The world is chock a block with content management systems. But there is a difference today, and the shift from proprietary systems to open source systems puts vendors of proprietary systems in a world of sales pain. For some outfits, CMS means SharePoint (heaven help me).

For other companies CMS means open source CMS systems. No license fees. No restrictions on changes. But CMS still requires expensive ministrations from CMS experts. Just like enterprise search.

I read “EMC Reports Mixed Results, Fingers Axe: Reduction in Force Planned.” For me this passage jumped out of the article:

The Unified Backup and Recovery segment includes mid-range VNX arrays and it had a storming quarter too, with 2,000 new VNX customers. VCE also added a record number of new customers. RSA grew at a pedestrian rate in the quarter, four per cent year-on-year with the Information Intelligence Group (Documentum, etc) declining eight per cent; this product set has never shone.

So, an eight percent decline. Not good. Like enterprise search, this proprietary content management product has a long sales cycle and after six months of effort, the client may decide to use an open source solution. Joomla anyone? My hunch is that the product set will emit as many sparklies as the soot in my fireplace chimney.

CMS is another category of software for which the cyber OSINT method points the way to the future. Automated systems capture what humans do and operate on that content automatically. Allowing humans to index, tag, copy, date, and perform other acts of content violence leads to findability chaos.

In short, EMC Documentum is going to face some tough months. Drupal anyone?

Stephen E Arnold, January 31, 2015

Attivio Highlights Content Intake Issues

November 4, 2014

I read “Digesting Ingestion.” The write up is important because it illustrates how vendors with roots in traditional information retrieval like Attivio are responding to changing market demands.

The article talks about the software required to hook a source like a Web page or a dynamic information source to a content processing and search system. Most vendors provide a number of software widgets to handle frequently encountered file types; for example, Microsoft Word content, HTML Web pages, and Adobe PDF documents. However, when less frequently encountered content types are required, a specialized software widget may be required.

Attivio states:

There are a number of multiplicative factors to consider from the perspective of trying to provide a high-quality connector that works across all versions of a source:

· The source software version, including patches, optional modules, and configuration
· Embedded or required 3rd party software (such as a relational database), including version, patches, optional modules and configuration
· Hardware and operating system version, including patches, optional modules, and configuration
· Throughput/capacity of the repository APIs
· Throughput/capacity and ability to operate in parallel.

This is useful information. In a real world example, Attivio reports that a number of other factors can come into play. These range from a lack of appropriate computing resources to corrupt data that connectors send to the exception folder and, my favorite, Big Data.

Attivio is to be credited for identifying these issues. Search-centric vendors have to provide solutions to these challenges. I would point out that there are a number of companies that have leapfrogged search-centric approaches to high volume content intake.

These new players, not the well known companies providing search solutions, are the next generation in information access solutions. Watch for more information about automated collection and analysis of Internet accessible information and the firms redefining information access.

Stephen E Arnold, November 4, 2014

ArnoldIT Search Requirements Video

October 26, 2014

The goslings continue to experiment with short videos. The most recent one is about enterprise search requirements. The four minute YouTube program hits some highlights about the perilous process of licensing an enterprise search system. The video is located at

Donald C Anderson, October 26, 2014

Even Content Marketers React to Pay to Play Allegation

August 18, 2014

I find CMS Wire quite interesting. A number of the articles are by consultants and some seem quite vendor centric. In general, it is a useful way to keep track of what’s hot and what’s not in the world of content management. Like knowledge management or anything with the word “management” in its moniker, I am not sure what these disciplines embrace. Like the equally fuzzy notion of predictive analytics, I find that the aura of meaning is often at odds with reality. Whether it is the failure of certain professionals to “predict” problems with the caliphate or whether it focuses on predicting which start up will be the next big thing, the here and now are often slippery, surprising, and, at times, baffling.

Not in “How Vendors learn to Play the Gartner Game.” This is a darned good write up and it introduces a bound phrase I find intellectually satisfying: “the Gartner Game.” I understand Scrabble and checkers. More sophisticated games are beyond my ken. I am not able to play the Gartner Game, but I can enjoy certain aspects of it.

The article explains the game clearly:

Now, in fairness, just because someone gives you a wad of cash — even in the form of extra business — it’s no guarantee you’ll write something favorable. Trust me on this: Back when news was still reported in daily papers and reporters were wooed with more insincerity than a contestant on The Bachelor, it was customary for sources to send gifts.

My own brush with Gartner-like firms was a bit different. I did not expect to see a report with my name on it sold on Amazon from late 2012 to July 2014. Why? I provided content/research to IDC, a Gartner competitor. IDC took the information, created reports, and sold those reports. I received no contract. No sales reports. When one of the documents turned up on Amazon, I realized that an IDC expert named Schubmehl was surfing on my work.

I wrote a short commentary about the apparent erosion of certain business practices. In that article, I found a thread connecting the HP problem with the post office, the Google executive’s brush with heroin and a female not involved in Kolmogorov analyses, and IDC’s Schubmehl. In each case, executives made decisions that probably seemed really good at the time. Over time, the decisions proved to be startling. I mean the post office and postage. Horrific. I mean the Google wizard who ended up dead on a yacht while his wife took care of the kids. Professionally clumsy. I mean an “expert” who writes reports taking another person’s information and using it to close information centric deals.

I don’t know much about the world of mid tier consulting firms. I worked for a number of years at a pretty good outfit, Booz, Allen & Hamilton. I did some work for other consulting firms as well. I do not recall a single instance of a failure to pay postage, a colleague flat lining from heroin, or a professional on our team using another’s work or name to make professional hay.

None of these actions surprise me. I am getting older and I suppose I am able to cruise forward in Harrod’s Creek without worrying about the situational decisions that produce some interesting business situations. Exciting stuff this world of mid tier consulting and the unbounded scope of action some executives enjoy. Wow. Postage, heroin, and using another’s name to look informed. Amazing.

I will expand on this notion of “loose governance” in one of my columns. This notion of “governance” is an intriguing topic in knowledge management.

As Einstein said:

“Two things are infinite: the universe and human stupidity; and I’m not sure about the universe.”

Stephen E Arnold, August 15, 2014

Are HP, Google and IDC “Out of Square”?

August 2, 2014

Editor’s note: These three companies are involved in search and content processing. The opinion piece considers the question, “Is management unable to ensure standard business processes working in some businesses today?” Links have been inserted to open source information that puts some of the author’s comments in context. Comments about this essay may be posted using the Comments function for this blog.

Forgetting to Put Postage on Lots of Letters

I read “HP to Pay $32.5 Million to Settle Claims of Overbilling USPS.” (Keep in mind you may have to pony up some cash to access this article. Mr. Murdoch needs cash to buy more media properties. Do your part!)

The main point of the story, told by “real” journalists, is that the company failed “to comply with pricing terms.” The “real” news story asserts:

The DOJ also alleged H-P made misrepresentations during the negotiation of the contract with the USPS regarding its pricing and its plans to ensure it would provide the required most favored customer pricing.

I suppose any company can overlook putting postage on an envelope. When that happened to me in my day of snail mail activity, my local postmistress Claudette would give me a call and I would go to the Harrod’s Creek post office and buy a stamp.

I am no big time manager, but I understood that snail mail required a stamp. If you are a member of the House or Senate, the rules are different, but even the savvy Congressperson makes sure the proper markings appear on the absolutely essential missives.

My mind, which I admit is not as agile as it was when I worked at Halliburton Nuclear Utility Services, drew a dotted line between this seemingly trivial matter of goofing on an administrative procedure and the fantastic events still swirling around Hewlett Packard’s purchase of Autonomy, a vendor of search and content processing software.

A number of questions flapped slowly across my mind:

  1. Is HP management becoming careless with trivial matters like paying $11 billion for a company generating about $800 million in revenue and forgetting to pay the US post office?
  2. Is the thread weaving together such HP events as the mobile operating system affair, the HP tablet, the fumbling of the Alta Vista opportunity, and the apparent administrative goofs like the Autonomy purchase and this alleged postage stamp licking flawed administrative processes?
  3. What does the stamp sticking, Autonomy litigating, and alleged eavesdropping say about the company’s “git ‘er done” approach?

Larry the Cable Guy for President!!!

The attitude may apply to confident senior managers with incentives to produce revenue.

I don’t think too much about Hewlett Packard. I do wonder if HP is an isolated actor or if companies with search interests are focusing on priorities that seem to be orthogonal to what I understand to be appropriate corporate behavior. One isolated event is highly suggestive.

But what do similar events suggest? In this short essai, I want to summarize two events. Both of these are interesting. For me, I see a common theme connecting the HP stamp licking and the two macro events. The glue fixing these in my mind is what seems to be a failure of management to pay attention to details.

But first, let’s go back in time for a modest effort penned by Edmund Spenser.

Changing of the Guard at Attensity

April 13, 2014

Attensity provides social analytics and engagement applications for customer relationship management. Their former CEO, J. Kirsten Bay, has left to take the helm of ISC8, a provider of intelligent cyber solution technologies. Street Insider gives the details in their story, “ISC8 Announces CEO Succession.”

The article begins:

“Succeeding Mr. Joll as President and CEO is J. Kirsten Bay, who has joined ISC8 effective March 19, 2014. Kirsten was most recently President and CEO of Attensity Group, a Big Data analytics enterprise software and services company specializing in customer experience management and corporate intelligence, where she restructured the company improving both operating revenue and margin.”

Will this be a moment of growth or turmoil for Attensity? The newest Attensity CEO is Howard Lau, who has connections to SAP and some investment firms. As a venture capitalist, Mr. Lau brings a different emphasis and expertise to Attensity, which might signal a shift.

Emily Rae Aldridge, April 13, 2014

Sponsored by, developer of Augmentext
