
Advice for Smart SEO Choices

August 11, 2015

We’ve come across a well-penned article by The SEO Guy about the intersection of language and search engine optimization. Self-proclaimed word aficionado Ben Kemp helps website writers use their words wisely in “Language, Linguistics, Semantics, & Search.” He begins by discrediting the practice of keyword stuffing, noting that search-ranking algorithms are more sophisticated than some give them credit for. He writes:

“Search engine algorithms assess all the words within the site. These algorithms may be bereft of direct human interpretation but are based on mathematics, knowledge, experience and intelligence. They deliver very accurate relevance analysis. In the context of using related words or variations within your website, it is one good way of reinforcing the primary keyword phrase you wish to rank for, without over-use of exact-match keywords and phrases. By using synonyms, and a range of relevant nouns, verbs and adjectives, you may eliminate excessive repetition and more accurately describe your topic or theme and at the same time, increase the range of word associations your website will rank for.”

Kemp goes on to lament the dumbing down of English-language education around the world, blaming the trend for a dearth of deft wordsmiths online. Besides recommending that his readers open a thesaurus now and then, he also advises them to make sure they spell words correctly, not because algorithms can’t figure out what they meant to say (they can), but because misspelled words look unprofessional. He even supplies a handy list of the most often misspelled words.

It seems that increasingly refined search algorithms give websites an opportunity to craft better copy. See the article for more of Kemp’s guidance on language and SEO.

Cynthia Murrell, August 11, 2015

Sponsored by, publisher of the CyberOSINT monograph


Chrome Restricts Extensions amid Security Threats

June 22, 2015

Despite efforts to maintain an open Internet, malware seems to be pushing online explorers into walled gardens, akin to the old AOL setup. The trend is illustrated by a story at PandoDaily, “Security Trumps Ideology as Google Closes Off its Chrome Platform.” Beginning this July, Chrome users will only be able to download extensions for that browser from the official Chrome Web Store. This change comes on the heels of one made in March: apps submitted to Google’s Play Store must now pass a review. Extreme measures to combat an extreme problem with malicious software.

The company tried a middle-ground approach last year, when it imposed the our-store-only policy on all users except those using Chrome’s development build. The makers of malware, though, are adaptable creatures; they found a way to force users into the development channel and then slip in their pernicious extensions. Writer Nathaniel Mott welcomes the changes, given the realities:

“It’s hard to convince people that they should use open platforms that leave them vulnerable to attack. There are good reasons to support those platforms—like limiting the influence tech companies have on the world’s information and avoiding government backdoors—but those pale in comparison to everyday security concerns. Google seems to have realized this. The chaos of openness has been replaced by the order of closed-off systems, not because the company has abandoned its ideals, but because protecting consumers is more important than ideology.”

Better safe than sorry? Perhaps.

Cynthia Murrell, June 22, 2015


Free Book from OpenText on Business in the Digital Age

May 27, 2015

This is interesting. OpenText advertises its free, downloadable book in a post titled “Transform Your Business for a Digital-First World.” Our question is whether OpenText can transform its own business; it seems the company’s financial results have been flat and generally drifting down of late. I suppose this is a do-as-we-say-not-as-we-do situation.

The book may be worth looking into, though, especially since it passes along words of wisdom from leaders within multiple organizations. The description states:

“Digital technology is changing the rules of business with the promise of increased opportunity and innovation. The very nature of business is more fluid, social, global, accelerated, risky, and competitive. By 2020, profitable organizations will use digital channels to discover new customers, enter new markets and tap new streams of revenue. Those that don’t make the shift could fall to the wayside. In Digital: Disrupt or Die, a multi-year blueprint for success in 2020, OpenText CEO Mark Barrenechea and Chairman of the Board Tom Jenkins explore the relationship between products, services and Enterprise Information Management (EIM).”

Launched in 1991, OpenText offers tools for enterprise information management, business process management, and customer experience management. Based in Waterloo, Ontario, the company maintains offices around the world.

Cynthia Murrell, May 27, 2015


The Dichotomy of SharePoint Migration

May 7, 2015

SharePoint Online gets good reviews, but only from critics and from those using SharePoint for the first time. Organizations sitting on huge on-premises installations are dreading the move and biding their time. The split stems from Microsoft trying to make one product be all things to all people. Search Content Management covers the issue in its article, “Migrating to SharePoint Online is a Tale of Two Realities.”

The article begins:

“Microsoft is paving the way for a future that is all about cloud computing and mobility, but it may have to drag some SharePoint users there kicking and screaming. SharePoint enables document sharing, editing, version control and other collaboration features by creating a central location in which to share and save files. But SharePoint users aren’t ready — or enthused about — migrating to . . . SharePoint Online. According to a Radicati Group survey, only 23% of respondents have deployed SharePoint Online, compared with 77% that have on-premises SharePoint 2013.”

If you need to keep up with how SharePoint Online may affect your organization’s installation, or with the best ways to adapt, keep an eye on Stephen E. Arnold’s dedicated SharePoint feed. Arnold is a longtime leader in search and distills the latest tips, tricks, and news there. SharePoint Online is definitely the future of SharePoint, but Microsoft cannot afford to get there at the cost of its past users.

Emily Rae Aldridge, May 7, 2015



Visual Data Mapper Quid Raises $39M

April 14, 2015

The TechCrunch article “Quid Raises $39M More to Visualize Complex Ideas” explains the current direction of Quid. Quid, a business analytics company that processes vast amounts of data to build visual maps and also works on branding and search, has been lining up new funding. The article states:

“When we wrote about the company back in 2010, it was focused on tracking emerging technologies, but it seems to have broadened its scope since then. Quid now says it has signed up 80 clients since launching the current platform at the beginning of last year. The new funding was led by Liberty Interactive Corporation, with participation from ARTIS Ventures, Buchanan Investments, Subtraction Capital, Tiger Partners, Thomas H. Lee Limited Family Partnership II, Quid board member Michael Patsalos-Fox…”

Quid also works with such brands as Hyundai, Samsung, and Microsoft, and is considered unique in its approach to the big picture of tech trends. The article says little about what the money will be used for, other than perhaps changes to the website, which was once called the most pretentious of startup websites for its detailed explanation of its primary and secondary typefaces and its array of titular allusions.

Chelsea Kerwin, April 14, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Set Data Free from PDF Tables

April 13, 2015

The PDF file is a wonderful thing. It takes up less space than alternatives, and everyone with a computer should be able to open one. However, it is not so easy to pull data from a table within a PDF document. Now, Computerworld informs us about a “Free Tool to Extract Data from PDFs: Tabula.” Created by journalists with assistance from organizations like Knight-Mozilla OpenNews, the New York Times and La Nación DATA, Tabula plucks data from tables within these files. Reporter Sharon Machlis writes:

“To use, download the software from the project website. It runs locally in your browser and requires a Java Runtime Environment compatible with Java 6 or 7. Import a PDF and then select the area of a table you want to turn into usable data. You’ll have the option of downloading as a comma- or tab-separated file as well as copying it to your clipboard.

“You’ll also be able to look at the data it captures before you save it, which I’d highly recommend. It can be easy to miss a column and especially a row when making a selection.”

See the write-up for a video of Tabula at work on a Windows system. A couple of caveats: the tool will not work with scanned images, and the creators caution that, as of yet, Tabula works best with simple table formats. Developers who wish to get in on the project should head to its GitHub page.
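Machlis’ advice to inspect the captured data before saving is easy to extend once the CSV or TSV file is on disk. Here is a minimal sketch of a post-export check; it is a hypothetical helper, not part of Tabula. It assumes the export is plain comma- or tab-separated text, treats the most common row width as the table’s column count, and flags rows that do not match. It only catches uneven selections: a column dropped from every row, or a row missed entirely, still has to be spotted by comparing against the PDF.

```python
import csv
import io
from collections import Counter


def check_table(text: str, delimiter: str = ",") -> dict:
    """Summarize a CSV/TSV export and flag rows whose width differs from the norm.

    Hypothetical helper for illustration; Tabula does not ship this check.
    """
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return {"rows": 0, "columns": 0, "ragged": []}
    # Treat the most common row width as the table's "true" column count.
    widths = Counter(len(row) for row in rows)
    expected = widths.most_common(1)[0][0]
    # Collect 1-based row numbers whose width does not match.
    ragged = [i for i, row in enumerate(rows, start=1) if len(row) != expected]
    return {"rows": len(rows), "columns": expected, "ragged": ragged}


if __name__ == "__main__":
    # Made-up sample: the 2014 row has lost its Profit value.
    sample = "Year,Revenue,Profit\n2013,100,10\n2014,120\n2015,130,15\n"
    print(check_table(sample))
```

On the made-up sample, the check reports four rows and three columns and flags row 3, the line that lost its Profit value. For a tab-separated download, pass `delimiter="\t"`.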

Cynthia Murrell, April 13, 2015


Vilocity 2.0 Released by NuWave

March 17, 2015

The Virtual Strategy Magazine article titled “NuWave Enhances their Vilocity Analytic Framework with Release of Vilocity 2.0 Update” promotes the upgraded framework as a blend of Oracle Business Intelligence Enterprise Edition and Oracle Endeca Information Discovery. The ability to work across both tools, and to combine components from each in a single dashboard, makes this a very useful program. Capabilities include exporting to Microsoft software to create slideshows, pre-filtering, and selecting sections of a page to print across both frameworks. The article explains:

“‘The voices of our Vilocity customers were vital in the Vilocity 2.0 release and we value their input,’ says Rob Castle, NuWave’s Chief Technology Officer… The most notable Vilocity deployment NuWave has done is for the U.S. Army EMDS Program. From deployment and through continuous support NuWave has worked closely with this client to communicate issues and identify tools that could improve Vilocity. The Vilocity 2.0 release is a culmination of NuWave’s desire for their clients to be successful.”

It looks like NuWave has found a way to make Endeca useful. Users of the Vilocity Analytic Framework will be able to find answers to the right questions as well as make new discoveries. The consistent look and feel across both systems should help users get used to them and make the most of their new platform.

Chelsea Kerwin, March 17, 2015


EMC: Another Information Sideshow in the Spotlight

January 31, 2015

An information sideshow is enterprise software that presents itself as the motor, transmission, and differential for the organization. Get real. The main enterprise applications are accounting, database management systems, sales management, and systems that manage real stuff (ERP, PLM, etc.).

Applications that purport to manage Web content or organize enterprise-wide information and data are important, but their functions concern overhead positions, except in publishing companies and similar firms.

Since the Web became everyone’s passport to becoming an expert online professional, Web content management systems have blossomed and flamed out. Anyone using BroadVision or Sagemaker?

Documentum is a content management system. It is, or was, mandated as the way to provide information to support the antics of the Food and Drug Administration and some other regulated sectors. The money that flows from the FDA’s blessing does not mean that Documentum is in step with today’s digital demands. In fact, for some applications, systems like Documentum are good for the resellers and integrators. Users often have a different point of view. Do you love OpenText, MarkLogic, and other proprietary content management systems? Remember XyVision?

Several years ago, I did a flyover of a large EMC Documentum project. When I was asked to take a look, a US government entity had been struggling for three years to get a Documentum system up and running. I think one of the resellers and consultants was my old pal IBM, which sells its own content management systems, by the way. At the time, I was working with the Capitol Police (yep, another one of those law enforcement entities that few people know much about). Think investigation.

I poked around the system, reviewed some US government-style documentation, and concluded that the in-process system would require more investment and time to get up and toddling (not walking, mind you, just toddling). I bailed and worked on projects, mostly in other government entities, that sort of really worked.

After that experience, I realized that “content management” was a bit of a charade, not too different from Web servers and enterprise search. The frenzy for Web stuff made it easy for vendors of proprietary systems to convince organizations to buy bespoke, proprietary content management systems. Wow.

The outfits that are in the business of creating content know about editorial policies. Licensees of content management systems often do not. But publishing expertise is irrelevant to many 20-somethings, failed webmasters, self-appointed experts, and confused people looking for a source of money.

The world is chock-a-block with content management systems. But there is a difference today: the shift from proprietary to open source systems puts vendors of proprietary systems in a world of sales pain. For some outfits, CMS means SharePoint (heaven help me).

For other companies, CMS means open source systems. No license fees. No restrictions on changes. But open source CMS still requires expensive ministrations from CMS experts. Just like enterprise search.

I read “EMC Reports Mixed Results, Fingers Axe: Reduction in Force Planned.” For me this passage jumped out of the article:

The Unified Backup and Recovery segment includes mid-range VNX arrays and it had a storming quarter too, with 2,000 new VNX customers. VCE also added a record number of new customers. RSA grew at a pedestrian rate in the quarter, four per cent year-on-year with the Information Intelligence Group (Documentum, etc) declining eight per cent; this product set has never shone.

So, an eight percent decline. Not good. Like enterprise search, this proprietary content management product has a long sales cycle, and after six months of effort, the client may decide to use an open source solution. Joomla anyone? My hunch is that the product set will emit as many sparklies as the soot in my fireplace chimney.

CMS is another category of software for which cyber OSINT methods point the way to the future. Automated systems capture what humans do and operate on that content automatically. Allowing humans to index, tag, copy, date, and perform other acts of content violence leads to findability chaos.

In short, EMC Documentum is going to face some tough months. Drupal anyone?

Stephen E Arnold, January 31, 2015

Attivio Highlights Content Intake Issues

November 4, 2014

I read “Digesting Ingestion.” The write-up is important because it illustrates how vendors with roots in traditional information retrieval, like Attivio, are responding to changing market demands.

The article talks about the software required to hook a source, like a Web page or a dynamic information source, to a content processing and search system. Most vendors provide a number of software widgets to handle frequently encountered file types; for example, Microsoft Word content, HTML Web pages, and Adobe PDF documents. However, when less common content types must be processed, a specialized software widget may be needed.

Attivio states:

There are a number of multiplicative factors to consider from the perspective of trying to provide a high-quality connector that works across all versions of a source:

· The source software version, including patches, optional modules, and configuration

· Embedded or required 3rd party software (such as a relational database), including version, patches, optional modules and configuration

· Hardware and operating system version, including patches, optional modules, and configuration

· Throughput/capacity of the repository APIs

· Throughput/capacity and ability to operate in parallel.

This is useful information. In a real-world example, Attivio reports that a number of other factors can come into play. These range from inadequate computing resources to corrupt data that connectors send to the exception folder and, my favorite, Big Data.
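Attivio’s word “multiplicative” is the crux. Each factor multiplies, rather than adds to, the number of configurations a connector must survive. Here is a minimal sketch that makes this concrete using Python’s `itertools.product`. The factor names and option values are hypothetical placeholders, not data from Attivio.

```python
from itertools import product
from math import prod

# Hypothetical option lists for illustration only; none of these values come from Attivio.
FACTORS = {
    "source_version": ["v1", "v2", "v3"],   # source software release, with patches
    "database": ["db-a", "db-b"],           # embedded or required third-party software
    "operating_system": ["os-a", "os-b"],   # hardware and operating system combination
    "api_throughput": ["low", "high"],      # capacity of the repository APIs
}


def configuration_matrix(factors: dict) -> list:
    """Return one dict per combination of factor values a connector might meet."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


if __name__ == "__main__":
    matrix = configuration_matrix(FACTORS)
    options = sum(len(values) for values in FACTORS.values())
    # The number of configurations is the product of the option counts, not their sum.
    assert len(matrix) == prod(len(values) for values in FACTORS.values())
    print(f"{options} options -> {len(matrix)} configurations")
```

With these placeholder lists, nine individual options already yield 24 distinct configurations to test. Every option added to any factor multiplies the total again, which helps explain why vendors struggle to offer a connector that works “across all versions of a source.”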

Attivio is to be credited for identifying these issues. Search-centric vendors have to provide solutions to these challenges. I would point out that a number of companies have leapfrogged search-centric approaches to high-volume content intake.

These new players, not the well known companies providing search solutions, are the next generation in information access solutions. Watch for more information about automated collection and analysis of Internet accessible information and the firms redefining information access.

Stephen E Arnold, November 4, 2014

ArnoldIT Search Requirements Video

October 26, 2014

The goslings continue to experiment with short videos. The most recent one is about enterprise search requirements. The four-minute YouTube program hits some highlights of the perilous process of licensing an enterprise search system. The video is located at

Donald C Anderson, October 26, 2014
