
No Microfiche Required

November 16, 2015

Longstanding publications are breathing new life into their archives by republishing key stories online, we learn from NiemanLab’s article, “Esquire Has a Cold: How the Magazine is Mining its Archives with the Launch of Esquire Classics.” Esquire has been posting older articles on its Esquire Classics website, timed to coincide with related current events. For example, on the anniversary of Martin Luther King Jr.’s death last April, the site republished a 1968 article about his assassination.

Other venerable publications are similarly tapping into their archives. Writer Joseph Lichterman notes:

“Esquire, of course, isn’t the only legacy publication that’s taking advantage of archival material once accessible only via bound volumes or microfiche. Earlier this month, the Associated Press republished its original coverage of Abraham Lincoln’s assassination 150 years ago…. Gawker Media’s Deadspin has The Stacks, which republishes classic sports journalism originally published elsewhere. For its 125th anniversary last year, The Wall Street Journal published more than 300 archival articles. The New York Times runs a Twitter account, NYT Archives, that resurfaces archival content from the Times. It also runs First Glimpses, a series that examines the first time famous people or concepts appeared in the paper.”

This is one way to adapt to the altered reality of publication. Perhaps with more innovative thinking, the institutions that have kept us informed for decades (or centuries) will survive to deliver news to our great-grandchildren. But will it be beamed directly into their brains? That is another subject entirely.


Cynthia Murrell, November 16, 2015

Sponsored by, publisher of the CyberOSINT monograph

Amazon Punches Business Intelligence

November 11, 2015

Amazon already gave technology a punch when it launched AWS, and now it is releasing a business intelligence application that will change the face of business operations, or so Amazon hopes. ZDNet describes Amazon’s newest endeavor in “AWS QuickSight Will Disrupt Business Intelligence, Analytics Markets.” The market is already saturated with business intelligence technology vendors, but Amazon’s new AWS QuickSight will cause another market upheaval.

“This month is no exception: Amazon crashed the party by announcing QuickSight, a new BI and analytics data management platform. BI pros will need to pay close attention, because this new platform is inexpensive, highly scalable, and has the potential to disrupt the BI vendor landscape. QuickSight is based on AWS’ cloud infrastructure, so it shares AWS characteristics like elasticity, abstracted complexity, and a pay-per-use consumption model.”

Another monkey wrench for business intelligence vendors is that AWS QuickSight’s prices are not only reasonable, but borderline scandalous: $9 per user per month for the standard edition or $18 per user per month for the enterprise edition.
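To put those per-user prices in perspective, here is a minimal sketch of the yearly arithmetic. The two monthly prices come from the figures quoted above; the function name `annual_cost` and the team size of 25 are hypothetical, chosen only for illustration, and the sketch ignores any usage-based charges the service may add.

```python
# Per-user monthly prices in USD, as quoted in the post.
PRICES = {"standard": 9, "enterprise": 18}


def annual_cost(users: int, edition: str = "standard") -> int:
    """Return the yearly cost in USD for a team of the given size.

    Hypothetical helper: it multiplies users by the quoted monthly
    per-user price and by 12 months, and nothing else.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    return users * PRICES[edition] * 12


if __name__ == "__main__":
    # Assumed team size of 25, purely for illustration.
    for edition in PRICES:
        print(edition, annual_cost(25, edition))
```

For an assumed team of 25, that works out to $2,700 a year for the standard edition and $5,400 for the enterprise edition.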

Keep in mind, however, that AWS QuickSight is the newest shiny object on the business intelligence market, so it will have out-of-the-box problems, its long-term ramifications are unknown, and it relies on database models and schemas. Do not forget that most business intelligence solutions do not resolve all issues, including ease of use and comprehensiveness. It might be better to wait until all the bugs are worked out of the system, unless you do not mind being a guinea pig.

Whitney Grace, November 11, 2015


Big Data, Like Enterprise Search, Kicks the ROI Can Down the Road

November 8, 2015

I read “Experiment with Big Data Now, and Worry about ROI Later, Advises Pentaho ‘Guru’.” That’s the good thing about gurus. As long as the guru gets a donation, the ROI of advice is irrelevant.

I am okay with the notion of analyzing data, testing models, and generating scenarios based on probabilities. Good, useful work.

The bit that annoys me is the refusal to accept that certain types of information work are an investment. The idea that fiddling with zeros and ones has a return on investment is, may I be frank, stupid.

Here’s a passage I noted as a statement from a wizard from Pentaho, a decent outfit:

“There are a couple of business cases you can make for data laking. One is warm storage [data accessed less often than “hot”, but more often than “cold”] – it’s much faster and cheaper to run than a high-end data warehouse. On the other hand, that’s not where the real value is – the real value is in exploring, so that’s why you do at least need to have a data scientist, to do some real research and development.”

The buzzwords, the silliness of “real value,” and “real” research devalue work essential to modern business.

Enterprise search vendors were the past champions of baloney. Now the analytics firms are trapped in the fear of valueless activity.

That’s not good for ROI, is it?

Stephen E Arnold, November 8, 2015

Data Analytics Is More Than Simple Emotion

November 6, 2015

The Hopes and Fears article “Are You Happy Now? The Uncertain Future Of Emotion Analytics” discusses the possible implications of technology capable of reading emotions. The article opens with a scenario from David Collingridge explaining that a technology’s impact can be truly gauged only once it has become so ingrained in society that it would be hard to change. Many computing labs are designing software capable of reading emotions using an array of different sensors.

The biggest problem ahead is not how to integrate emotion-reading technology into our lives, but what ethical concerns come with it.

Emotion-reading technology is also known as affective computing, and the ethical concerns are more likely to arise in corporation-to-consumer relationships than in consumer-to-consumer relationships. Companies are already able to track a consumer’s spending habits by reading their Internet data and credit card activity, then sending targeted ads.

Consumers should be given the option to have their emotions read:

“Affective computing has the potential to intimately affect the inner workings of society and shape individual lives. Access, an international digital rights organization, emphasizes the need for informed consent, and the right for users to choose not to have their data collected. ‘All users should be fully informed about what information a company seeks to collect,’ says Drew Mitnick, Policy Counsel with Access, ‘The invasive nature of emotion analysis means that users should have as much information as possible before being asked to subject [themselves] to it.’”

While the article’s topic touches on fear, it ends on a high note that we should not be afraid of the future of technology.  It is important to discuss ethical issues right now, so groundwork will already be in place to handle affective computing.

Whitney Grace, November 6, 2015

TemaTres Open Source Vocabulary Server

November 3, 2015

The latest version of the TemaTres vocabulary server is now available, we learn from the project’s blog post, “TemaTres 2.0 Released.” Released under the GNU General Public License version 2.0, the web application helps manage taxonomies, thesauri, and multilingual vocabularies. It can be downloaded at SourceForge. Here’s what has changed since the last release:

*Export to Moodle your vocabulary: now you can export to Moodle Glossary XML format

*Metadata summary about each term and about your vocabulary (data about terms, relations, notes and total descendants terms, deep levels, etc)

*New report: reports about terms with mapping relations, terms by status, preferred terms, etc.

*New report: reports about terms without notes or specific type of notes

*Import the notes type defined by user (custom notes) using tagged file format

*Select massively free terms to assign to other term

*Improve utilities to take terminological recommendations from other vocabularies (more than 300:

*Update Zthes schema to Zthes 1.0 (Thanks to Wilbert Kraan)

*Export the whole vocabulary to Metadata Authority Description Schema (MADS)

*Fixed bugs and improved several functional aspects.

*Uses Bootstrap v3.3.4

See the server’s SourceForge page, above, for the full list of features. Though as of this writing only 21 users had rated the product, all seemed very pleased with the results. The TemaTres website notes that running the server requires some other open source tools: PHP, MySQL, and an HTTP Web server. It also specifies that, to update from version 1.82, you should keep the db.tematres.php file but replace the code. To update from TemaTres 1.6 or earlier, first go in as an administrator and update to version 1.7 through Menu -> Administration -> Database Maintenance.

Cynthia Murrell, November 3, 2015


RAVN Pipeline Coupled with ElasticSearch to Improve Indexing Capabilities

October 28, 2015

The article on PR Newswire titled “RAVN Systems Releases its Enterprise Search Indexing Platform, RAVN Pipeline, to Ingest Enterprise Content Into ElasticSearch” unpacks the decision to improve the ElasticSearch platform by supplying the indexing platform of the RAVN Pipeline. RAVN Systems, a UK company founded by consultants and developers, has expertise in processing unstructured data. Their stated goal is to discover new lands in the world of information technology. The article states,

“RAVN Pipeline delivers a platform approach to all your Extraction, Transformation and Load (ETL) needs. A wide variety of source repositories including, but not limited to, File systems, e-mail systems, DMS platforms, CRM systems and hosted platforms can be connected while maintaining document level security when indexing the content into Elasticsearch. Also, compressed archives and other complex data types are supported out of the box, with the ability to retain nested hierarchical structures.”

The added indexing ability is very important, especially for users trying to index from or into cloud-based repositories. Even a single instance of any type of data can be indexed with the Pipeline, which also enriches data during indexing with auto-tagging and classifications. The article also promises that non-specialists (by which I assume they mean people) will be able to use the new systems due to their being GUI driven and intuitive.

Chelsea Kerwin, October 28, 2015



Braiding Big Data

October 26, 2015

An apt metaphor to explain big data is the act of braiding. Braiding requires a person to take three or more locks of hair and weave them together in alternation. The end result is a clean, pretty hairstyle that keeps a person’s hair in place and off the face. Big data is like braiding, because specially tailored software takes an unruly mess of data, including the combed and uncombed strands, and organizes it into a legible format. Perhaps this is why TopQuadrant named its popular big data software TopBraid. Read more about its software upgrade in “TopQuadrant Launches TopBraid 5.0.”

TopBraid Suite is an enterprise Web-based solution set that simplifies the development and management of standards-based, model driven solutions focused on taxonomy, ontology, metadata management, reference data governance, and data virtualization.  The newest upgrade for TopBraid builds on the current enterprise information management solutions and adds new options:

“‘It continues to be our goal to improve ways for users to harness the full potential of their data,’ said Irene Polikoff, CEO and co-founder of TopQuadrant. ‘This latest release of 5.0 includes an exciting new feature, AutoClassifier. While our TopBraid Enterprise Vocabulary Net (EVN) Tagger has let users manually tag content with concepts from their vocabularies for several years, AutoClassifier completely automates that process.’”


The AutoClassifier makes it easier to add and edit tags before making them a part of the production tag set. Other new features are for TopBraid Enterprise Vocabulary Net (TopBraid EVN), TopBraid Reference Data Manager (RDM), TopBraid Insight, and the TopBraid platform, including improvements in internationalization and a new component for increasing system availability in enterprise environments, TopBraid DataCache.

TopBraid might be the solution an enterprise system needs to braid its data into style.

Whitney Grace, October 26, 2015


University Partners up with Leidos to Investigate How to Cut Costs in Healthcare with Big Data Usage

October 22, 2015

The article on News360 titled “Gulu Gambhir: Leidos Virginia Tech to Research Big Data Usage for Healthcare Field” explains the partnership, which will research how big data might reduce healthcare costs. Obviously, healthcare costs in this country have gotten out of control, and perhaps that is clearer to students who grew up watching the cost of a single pain pill grow larger and larger without regulation. The article doesn’t go into detail on how the application of big data from electronic health records might ease costs, but Leidos CTO Gulu Gambhir sounds optimistic.

“The company said Thursday the team will utilize technical data from healthcare providers to develop methods that address the sector’s challenges in terms of cost and the quality of care. Gulu Gambhir, chief technology officer and a senior vice president at Leidos, said the company entered the partnership to gain knowledge for its commercial and federal healthcare business.”

The partnership also affords excellent opportunities for Virginia Tech students to gain real-world, hands-on knowledge of data research, hopefully while innovating the healthcare industry. Leidos has supplied funding to the university’s Center for Business Intelligence and Analytics as well as a fellowship program for grad students studying advanced information systems related to healthcare research.
Chelsea Kerwin, October 22, 2015


Reclaiming Academic Publishing

October 21, 2015

Researchers and writers are at the mercy of academic publishers who control the venues that print their work, select the content of their work, and often control the funds behind their research. Even worse, academic research is locked behind database walls that require a subscription well beyond the price range of a researcher not associated with a university or research institute. One researcher was fed up enough with academic publishers that he decided to return the publishing and distribution of work to the common people, says Nature in “Leading Mathematician Launches arXiv ‘Overlay’ Journal.”

The new mathematics journal Discrete Analysis peer reviews and publishes papers free of charge on the preprint server arXiv.  Timothy Gowers started the journal to avoid the commercial pressures that often distort scientific literature.

“ ‘Part of the motivation for starting the journal is, of course, to challenge existing models of academic publishing and to contribute in a small way to creating an alternative and much cheaper system,’ he explained in a 10 September blog post announcing the journal. ‘If you trust authors to do their own typesetting and copy-editing to a satisfactory standard, with the help of suggestions from referees, then the cost of running a mathematics journal can be at least two orders of magnitude lower than the cost incurred by traditional publishers.’ ”

Some funds are required to keep Discrete Analysis running: it costs ten dollars per submitted paper to pay for software that manages peer review and the journal’s Web site, and arXiv requires an additional ten dollars a month to keep running.

Gowers hopes to extend the journal model to other scientific fields and he believes it will work, especially for fields that only require text.  The biggest problem is persuading other academics to adopt the model, but things move slowly in academia so it will probably be years before it becomes widespread.

Whitney Grace, October 21, 2015

SAS: Predictive Analytics for Every One. Yes, Every One

October 19, 2015

Forget your university statistics course. Ignore the thrill of secondary school calculus. A new world has arrived. The terraformer is SAS, the statistics outfit everyone knows and loves.

I read “SAP Predictive Analytics Software Overview,” and was delighted to learn that I can now do the following on my desktop (sorry, mobile device users):

  • Perform data analyses, including time series forecasting, outlier detection, trend analysis, classification analysis, segmentation analysis and affinity analysis.
  • Create visualizations and analyze data through the use of scatter matrix charts, parallel coordinates, cluster charts and decision trees.
  • Use the R open source language for statistical analysis.
  • Perform in-memory data mining for large-volume data analysis.

What, you may ask, is a user to do if the underpinnings of these operations are not understood?

My hunch is that, for all the ease of use and point-and-click functions of tried and true SAS plus KXEN technology, you may find yourself in need of a specialist.

Knowledge of SAS conventions, R, and possibly third party libraries or Hadoop may come in handy.

I am delighted that SAS, founded in 1976, is delivering innovations. Unfortunately, making predictive analytics deliver fresh bread in an optimized way will require a grasp of statistical procedures, the ability to validate input data sets, and the ability to manipulate the options presented.

In short, statistics and math skills coupled with the fundamentals of data analysis should do nicely to help you get the most from this new bundle from SAS. No word on pricing.

Stephen E Arnold, October 19, 2015
