
Inferences: Check Before You Assume the Outputs Are Accurate

November 23, 2015

Predictive software works really well as long as it does not have to deal with horse races, the stock market, or the actions of a single person and his closest pals.

“Inferences from Backtest Results Are False Until Proven True” offers a useful reminder to those who want to depend on algorithms someone else set up. The notion is especially helpful when the data processed are unchecked, unfamiliar, or just assumed to be spot on.

The write up says:

the primary task of quantitative traders should be to prove specific backtest results worthless, rather than proving them useful.

What throws backtests off track? The write up provides a useful list of reminders (a small illustrative sketch follows the list):

  1. Data-mining and data snooping bias
  2. Use of non-tradable instruments
  3. Unrealistic accounting of frictional effects
  4. Use of the market close to enter positions instead of the more realistic open
  5. Use of dubious risk and money management methods
  6. Lack of effect on actual prices
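
Here is a minimal, made-up Python sketch, not the article's strategy or data, showing how two of the items above, ignoring frictional costs (item 3) and pretending one can enter at the very close that produced the signal (item 4), can flatter a backtest. The prices and the trading rule are fabricated for illustration only.

```python
# Toy illustration of backtest pitfalls 3 and 4; all data below is fabricated.
import random

random.seed(7)

# Fabricated daily bars: (open, close) for each day.
bars = []
price = 100.0
for _ in range(250):
    open_px = price * (1 + random.gauss(0, 0.01))
    close_px = open_px * (1 + random.gauss(0, 0.01))
    bars.append((open_px, close_px))
    price = close_px

def backtest(enter_at_signal_close: bool, cost: float) -> float:
    """Toy rule: after an up day, hold for one day. The optimistic version enters
    at the same close that produced the signal; the realistic version waits for
    the next open. `cost` is a per-trade frictional charge (fraction of notional)."""
    total = 0.0
    for t in range(1, len(bars)):
        prev_open, prev_close = bars[t - 1]
        if prev_close <= prev_open:          # signal fires only after an up day
            continue
        entry = prev_close if enter_at_signal_close else bars[t][0]
        exit_px = bars[t][1]
        total += (exit_px - entry) / entry - cost
    return total

print("optimistic (signal-close entry, no costs):", round(backtest(True, 0.0), 4))
print("realistic  (next-open entry, 10 bps/trade):", round(backtest(False, 0.001), 4))
```

The frictional charge alone guarantees the realistic run reports a lower total than the optimistic one, which is the article's point: the flattering version of the test is the one to distrust.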

The author is concerned about financial applications, but the advice may be helpful to those who just want to click a link, output a visualization, and assume the big spikes really matter to the decision they will influence in an hour.

One point I highlighted was:

Widely used strategies lose any edge they might have had in the past.

Degradation occurs, just as statistical drift does in Bayesian-based systems. Exciting if you make decisions on outputs known to be flawed. How are those automatic indexing, business intelligence, and predictive analytics systems working out?

Stephen E Arnold, November 23, 2015

No Mole, Just Data

November 23, 2015

It all comes down to putting together the pieces, we learn from Salon’s article, “How to Explain the KGB’s Amazing Success Identifying CIA Agents in the Field?” For years, the CIA was convinced there was a Soviet mole in its midst; how else to explain the KGB’s uncanny knack for identifying CIA agents? Now we know it was due to the brilliance of one data-savvy KGB agent, Yuri Totrov, who analyzed the U.S. government’s personnel data to separate the spies from the rest of our workers overseas. The technique was very effective, and all without the benefit of today’s analytics engines.

Totrov began by searching the KGB’s own data, and that of allies like Cuba, for patterns in known CIA agent postings. He also gleaned a lot of info from publicly available U.S. literature and from local police. Totrov was able to derive 26 “unchanging indicators” that would pinpoint a CIA agent, as well as many other markers that were less universal but still useful, such as CIA agents driving the same car and renting the same apartment as their immediate predecessors. Apparently, logistics personnel back at Langley did not foresee that such consistency, though cost-effective, could be used against us.

Reporter Jonathan Haslam elaborates:

“Thus one productive line of inquiry quickly yielded evidence: the differences in the way agency officers undercover as diplomats were treated from genuine foreign service officers (FSOs). The pay scale at entry was much higher for a CIA officer; after three to four years abroad a genuine FSO could return home, whereas an agency employee could not; real FSOs had to be recruited between the ages of 21 and 31, whereas this did not apply to an agency officer; only real FSOs had to attend the Institute of Foreign Service for three months before entering the service; naturalized Americans could not become FSOs for at least nine years but they could become agency employees; when agency officers returned home, they did not normally appear in State Department listings; should they appear they were classified as research and planning, research and intelligence, consular or chancery for security affairs; unlike FSOs, agency officers could change their place of work for no apparent reason; their published biographies contained obvious gaps; agency officers could be relocated within the country to which they were posted, FSOs were not; agency officers usually had more than one working foreign language; their cover was usually as a ‘political’ or ‘consular’ official (often vice-consul); internal embassy reorganizations usually left agency personnel untouched, whether their rank, their office space or their telephones; their offices were located in restricted zones within the embassy; they would appear on the streets during the working day using public telephone boxes; they would arrange meetings for the evening, out of town, usually around 7.30 p.m. or 8.00 p.m.; and whereas FSOs had to observe strict rules about attending dinner, agency officers could come and go as they pleased.”

In the era of Big Data, it seems like common sense to expect such deviations to be noticed and correlated, but it was not always so obvious. Nevertheless, Totrov’s methods did cause embarrassment for the agency when they were revealed. Surely, the CIA has changed its logistical ways dramatically since then to avoid such discernible patterns. Right?
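
For the curious, here is a toy Python sketch, emphatically not Totrov's actual method, of how a handful of indicator checks like those quoted above could be combined into a crude score. Every field name and record here is invented for illustration.

```python
# Toy indicator scoring over fabricated personnel records; nothing here is real data.
from dataclasses import dataclass

@dataclass
class Posting:
    name: str
    entry_pay_grade: int            # unusually high entry pay was one quoted indicator
    appears_in_state_listings: bool # agency officers often missing from listings
    languages: int                  # more than one working foreign language
    recruited_age: int              # FSOs were recruited between 21 and 31

def indicator_score(p: Posting) -> int:
    """Count how many of a few illustrative indicators a record trips."""
    score = 0
    score += p.entry_pay_grade >= 12            # high entry pay
    score += not p.appears_in_state_listings    # absent from State Department lists
    score += p.languages > 1                    # multiple working languages
    score += not (21 <= p.recruited_age <= 31)  # outside the FSO recruiting window
    return score

records = [
    Posting("A. Example", 13, False, 2, 35),
    Posting("B. Example", 9, True, 1, 24),
]
for r in sorted(records, key=indicator_score, reverse=True):
    print(r.name, indicator_score(r))
```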

Cynthia Murrell, November 23, 2015

Sponsored by, publisher of the CyberOSINT monograph


How to Speak to Executives

November 19, 2015

If you need help communicating with the higher-ups, see “Sales Pitch: How to Sell Your IT Strategy to the Board” at SmartDataCollective. Writer Simon Mitchell points out that, when trying to convince the board to loosen the purse strings, IT pros are unlikely to succeed if their audience doesn’t understand what they’re talking about. He advises:

“Step out of your technological mindset. Long presentations on subjects outside your audience’s core competence are a waste of everyone’s time. Don’t bore the board with too much detail about how the technology actually works. Focus on the business case for your strategy.”

The write-up goes on to recommend a three-point framework for such presentations: focus on the problem (or opportunity), deliver the strategy, and present costs and benefits. See the post for more on each of these points. It is also smart to have the technical details on hand, in case anyone asks. We’re left with four take-aways:

“*Before you present your next big IT initiative to the board, put yourself in their shoes. What do they need to hear?

*Review how you can make tech talk accessible and appealing to non-technical colleagues.

*Keep your presentations short and sweet.

*Focus on the business case for your IT strategy.”

Mitchell also wisely recommends The Economist’s Style Guide for more pointers. But what if the board does not put you on the agenda or, when you make your pitch, no one cares? Well, that’s a different problem.

Cynthia Murrell, November 19, 2015

Sponsored by, publisher of the CyberOSINT monograph


More Bad News for Traditional TV

November 17, 2015

Traditional TV is in a slow decline toward obsolescence. With streaming services offering more enticing viewing options, lower out-of-pocket costs, and no contracts, why would a person sign up for cable or satellite packages with notoriously bad customer service, commercials, and insane prices? Digital Trends has the most recent information from Nielsen about TV viewing habits, “New Nielsen Study On Streaming Points To More Bad News For Traditional TV.”

Pay-for-TV services have been on the decline for years, and the numbers in the latest Nielsen Total Audience report are striking:

“According to the data, broadband-only homes are up by 52 percent to 3.3 million from 2.2 million year over year. Meanwhile, pay-TV subscriptions are down 1.2 percent to 100.4 million, from 101.6 million at this time last year. And while 1.2 percent may not seem like much, that million plus decline has caused all sorts of havoc on the stock market, with big media companies like Viacom, Nickelodeon, Disney, and many others seeing tumbling stock prices in recent weeks.”

While one might suggest that pay-for-TV services should start the bankruptcy paperwork, there has also been a 45 percent rise in video-on-demand viewing. Nielsen does not tabulate streaming services, viewership on mobile devices, or whether people are watching more TV because of all the options.

While Nielsen is a trusted source of TV data, some of its information is still collected via paper submission forms. Nielsen is like traditional TV: it needs to update its offerings to remain relevant.

Whitney Grace, November 17, 2015

Sponsored by, publisher of the CyberOSINT monograph

Icann Is an I Won’t

November 16, 2015

Have you ever heard of Icann? Like many people in the United States, you probably have not heard of the non-profit private company. What does Icann do? Icann is responsible for Internet protocol (IP) addresses and for coordinating domain names, so the company is responsible for a huge portion of the Internet. According to The Guardian in “The Internet Is Run By An Unaccountable Private Company. This Is A Problem,” the US supposedly runs Icann, but its role is mostly clerical, and by September 30, 2015 it was supposed to hand the reins over to someone else.

The “else” is the biggest question. The Icann community spent hours trying to figure out who should manage the company, but it ran into a brick wall. The core issue is that the volunteers want Icann to have more accountability, which does not seem feasible: Icann’s directors cannot be fired, except by each other. Finances are another problem, with possible governance risks and corruption.

A proposed solution is to create a membership organization, a common model for non-profits that would give power to the community. Icann’s directors are not too happy about it, though they have been allowed to add their own opinions. Decisions are not being made at Icann, and with the upcoming presidential election the entire handover could be derailed. And that is not the worst that could happen:

“But there’s much more at stake. Icann’s board – as ultimate authority in this little company running global internet resources, and answerable (in fact, and in law) to no one – does have the power to reject the community’s proposals. But not everything that can be done, should be done. If the board blunders on, it will alienate those volunteers who are the beating heart of multi-stakeholder governance. It will also perfectly illustrate why change is required.”

The board has all the power, and no one can hold it accountable. Icann’s directors just have to stall long enough to keep things the same, and they will be able to give themselves more raises.

Whitney Grace, November 16, 2015
Sponsored by, publisher of the CyberOSINT monograph

No Microfiche Required

November 16, 2015

Longstanding publications are breathing new life into their archives by re-publishing key stories online, we learn from NiemanLab’s article, “Esquire Has a Cold: How the Magazine is Mining its Archives with the Launch of Esquire Classics.” We learn that Esquire has been posting older articles on their Esquire Classics website, timed to coincide with related current events. For example, on the anniversary of Martin Luther King Jr.’s death last April, the site republished a 1968 article about his assassination.

Other venerable publications are similarly tapping into their archives. Writer Joseph Lichterman notes:

“Esquire, of course, isn’t the only legacy publication that’s taking advantage of archival material once accessible only via bound volumes or microfiche. Earlier this month, the Associated Press republished its original coverage of Abraham Lincoln’s assassination 150 years ago…. Gawker Media’s Deadspin has The Stacks, which republishes classic sports journalism originally published elsewhere. For its 125th anniversary last year, The Wall Street Journal published more than 300 archival articles. The New York Times runs a Twitter account, NYT Archives, that resurfaces archival content from the Times. It also runs First Glimpses, a series that examines the first time famous people or concepts appeared in the paper.”

This is one way to adapt to the altered reality of publication. Perhaps with more innovative thinking, the institutions that have kept us informed for decades (or centuries) will survive to deliver news to our great-grandchildren. But will it be beamed directly into their brains? That is another subject entirely.


Cynthia Murrell, November 16, 2015

Sponsored by, publisher of the CyberOSINT monograph

Amazon Punches Business Intelligence

November 11, 2015

Amazon already gave technology a punch when it launched AWS, and now it is releasing a business intelligence application that will change the face of business operations, or so Amazon hopes. ZDNet describes Amazon’s newest endeavor in “AWS QuickSight Will Disrupt Business Intelligence, Analytics Markets.” The market is already saturated with business intelligence vendors, but Amazon’s new AWS QuickSight could cause another upheaval.

“This month is no exception: Amazon crashed the party by announcing QuickSight, a new BI and analytics data management platform. BI pros will need to pay close attention, because this new platform is inexpensive, highly scalable, and has the potential to disrupt the BI vendor landscape. QuickSight is based on AWS’ cloud infrastructure, so it shares AWS characteristics like elasticity, abstracted complexity, and a pay-per-use consumption model.”

Another monkey wrench for business intelligence vendors is that AWS QuickSight’s prices are not just reasonable, they are borderline scandalous: the standard edition runs $9 per user per month and the enterprise edition $18 per user per month.

Keep in mind, however, that AWS QuickSight is the newest shiny object on the business intelligence market, so it will have out-of-the-box problems, its long-term ramifications are unknown, and it still relies on underlying database models and schemas. Do not forget that most business intelligence solutions do not resolve every issue, including ease of use and comprehensiveness. It might be better to wait until the bugs are worked out of the system, unless you do not mind being a guinea pig.

Whitney Grace, November 11, 2015
Sponsored by, publisher of the CyberOSINT monograph


Big Data, Like Enterprise Search, Kicks the ROI Can Down the Road

November 8, 2015

I read “Experiment with Big Data Now, and Worry about ROI Later, Advises Pentaho ‘Guru’.” That’s the good thing about gurus. As long as the guru gets a donation, the ROI of advice is irrelevant.

I am okay with the notion of analyzing data, testing models, and generating scenarios based on probabilities. Good, useful work.

The bit that annoys me is the refusal to accept that certain types of information work are an investment. The idea that fiddling with zeros and ones has a return on investment is—may I be frank?—stupid.

Here’s a passage I noted as a statement from a wizard from Pentaho, a decent outfit:

“There are a couple of business cases you can make for data laking. One is warm storage [data accessed less often than “hot”, but more often than “cold”] – it’s much faster and cheaper to run than a high-end data warehouse. On the other hand, that’s not where the real value is – the real value is in exploring, so that’s why you do at least need to have a data scientist, to do some real research and development.”

The buzzwords, the silliness of “real value,” and the talk of “real” research devalue work that is essential to modern business.

Enterprise search vendors were the past champions of baloney. Now the analytics firms are trapped in the fear of valueless activity.

That’s not good for ROI, is it?

Stephen E Arnold, November 8, 2015

Data Analytics Is More Than Simple Emotion

November 6, 2015

Hopes and Fears posted the article “Are You Happy Now? The Uncertain Future Of Emotion Analytics,” which discusses the possible implications of technology capable of reading emotions. The article opens with David Collingridge’s observation that the only way to truly gauge a technology’s impact is when it has become so ingrained in society that it would be hard to change. Many computing labs are designing software capable of reading emotions using an array of different sensors.

The biggest problem ahead is not how to integrate emotion-reading technology into our lives, but how to address the ethical concerns that come with it.

Emotion-reading technology is also known as affective computing, and the ethical concerns are more likely to arise in corporation-to-consumer relationships than in consumer-to-consumer relationships. Companies are already able to track a consumer’s spending habits by reading Internet data and credit card records, then sending targeted ads.

Consumers should be given the option to have their emotions read:

“Affective computing has the potential to intimately affect the inner workings of society and shape individual lives. Access, an international digital rights organization, emphasizes the need for informed consent, and the right for users to choose not to have their data collected. ‘All users should be fully informed about what information a company seeks to collect,’ says Drew Mitnick, Policy Counsel with Access, ‘The invasive nature of emotion analysis means that users should have as much information as possible before being asked to subject [themselves] to it.’”

While the article’s topic touches on fear, it ends on a high note: we should not be afraid of the future of technology. It is important to discuss the ethical issues right now, so the groundwork will already be in place to handle affective computing.

Whitney Grace, November 6, 2015

TemaTres Open Source Vocabulary Server

November 3, 2015

The latest version of the TemaTres vocabulary server is now available, we learn from the company’s blog post, “TemaTres 2.0 Released.” Released under the GNU General Public License version 2.0, the web application helps manage taxonomies, thesauri, and multilingual vocabularies. It can be downloaded at SourceForge. Here’s what has changed since the last release:

*Export your vocabulary to Moodle: you can now export to Moodle Glossary XML format

*Metadata summary about each term and about your vocabulary (data about terms, relations, notes, total descendant terms, depth levels, etc.)

*New report: reports about terms with mapping relations, terms by status, preferred terms, etc.

*New report: reports about terms without notes or specific type of notes

*Import the notes type defined by user (custom notes) using tagged file format

*Select massively free terms to assign to other term

*Improve utilities to take terminological recommendations from other vocabularies (more than 300)

*Update Zthes schema to Zthes 1.0 (Thanks to Wilbert Kraan)

*Export the whole vocabulary to Metadata Authority Description Schema (MADS)

*Fixed bugs and improved several functional aspects.

*Uses Bootstrap v3.3.4

See the server’s SourceForge page, above, for the full list of features. Though as of this writing only 21 users had rated the product, all seemed very pleased with the results. The TemaTres website notes that running the server requires some other open source tools: PHP, MySQL, and an HTTP Web server. It also specifies that, to update from version 1.82, keep the db.tematres.php file but replace the rest of the code (a rough sketch of that step appears below). To update from TemaTres 1.6 or earlier, first log in as an administrator and update to version 1.7 through Menu -> Administration -> Database Maintenance.
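
For those who like to see the moving parts, here is a rough, hypothetical Python sketch of the “keep db.tematres.php, replace the code” step described above. The directory layout and paths are assumptions for illustration only; back up your installation and consult the TemaTres documentation before touching a real server.

```python
# Hypothetical sketch of the documented 1.82 -> 2.0 upgrade step; paths are assumptions.
import shutil
from pathlib import Path

OLD_INSTALL = Path("/var/www/tematres")   # assumed location of the existing 1.82 install
NEW_RELEASE = Path("/tmp/tematres-2.0")   # assumed unpacked 2.0 release
CONFIG_NAME = "db.tematres.php"           # the file the release notes say to keep

def upgrade(old_install: Path, new_release: Path) -> None:
    backup = old_install.with_name(old_install.name + ".bak")
    shutil.copytree(old_install, backup)                    # 1. back up the old install
    saved_config = (backup / CONFIG_NAME).read_bytes()      # 2. keep db.tematres.php
    shutil.rmtree(old_install)                              # 3. remove the old code
    shutil.copytree(new_release, old_install)               # 4. drop in the new code
    (old_install / CONFIG_NAME).write_bytes(saved_config)   # 5. restore the kept config

if __name__ == "__main__":
    upgrade(OLD_INSTALL, NEW_RELEASE)
```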

Cynthia Murrell, November 3, 2015

Sponsored by, publisher of the CyberOSINT monograph
