
Algorithmic Art Historians

July 14, 2015

Apparently, creativity itself is no longer subjective. MIT Technology Review announces, “Machine Vision Algorithm Chooses the Most Creative Paintings in History.” Traditionally, art historians judge how creative a work is based on its novelty and its influence on subsequent artists. The article notes that this is a challenging task, requiring an encyclopedic knowledge of art history and the judgement to decide what is novel and what has been influential. Now, a team at Rutgers University has developed an algorithm they say is qualified for the job.

Researchers Ahmed Elgammal and Babak Saleh credit several developments with bringing AI to this point. First, we’ve recently seen several breakthroughs in machine understanding of visual concepts, called classemes, which range from colors to specific objects. Another important factor: there now exist well-populated online artwork databases that the algorithms can, um, study. The article continues:

“The problem is to work out which paintings are the most novel compared to others that have gone before and then determine how many paintings in the future use similar features to work out their influence. Elgammal and Saleh approach this as a problem of network science. Their idea is to treat the history of art as a network in which each painting links to similar paintings in the future and is linked to by similar paintings from the past. The problem of determining the most creative is then one of working out when certain patterns of classemes first appear and how these patterns are adopted in the future. …

“The problem of finding the most creative paintings is similar to the problem of finding the most influential person on a social network, or the most important station in a city’s metro system or super spreaders of disease. These have become standard problems in network theory in recent years, and now Elgammal and Saleh apply it to creativity networks for the first time.”
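The network framing above can be sketched in a few lines. This is a deliberately simplified toy, not Elgammal and Saleh’s published formulation: it assumes each painting is reduced to a made-up vector of classeme scores, uses cosine similarity as the link weight, scores novelty as distance from the most similar earlier painting, and scores influence as total similarity to later paintings.

```python
import math

# Hypothetical toy data: (year, feature vector). The vectors stand in for
# classeme scores such as color or object detectors; they are invented.
paintings = {
    "A": (1500, [1.0, 0.0, 0.0]),
    "B": (1550, [0.9, 0.1, 0.0]),
    "C": (1600, [0.0, 1.0, 0.0]),
    "D": (1650, [0.1, 0.9, 0.1]),
    "E": (1700, [0.0, 0.8, 0.2]),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def creativity_scores(paintings):
    scores = {}
    for name, (year, feats) in paintings.items():
        # Links backward in time (to earlier works) and forward (to later works).
        past = [cosine(feats, f) for y, f in paintings.values() if y < year]
        future = [cosine(feats, f) for y, f in paintings.values() if y > year]
        # Novelty: how unlike its closest predecessor the painting is.
        novelty = 1.0 - max(past) if past else 1.0
        # Influence: total similarity carried forward to later paintings.
        influence = sum(future)
        scores[name] = novelty * influence
    return scores

scores = creativity_scores(paintings)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

On this toy data, painting C ranks first: it introduces a feature no earlier work has, and both later paintings adopt it. Painting A comes next because it has no predecessors and B follows it closely.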

Just what we needed. I have to admit the technology is quite intriguing, but I wonder: Will all creative human endeavors eventually have their algorithmic counterparts and, if so, how will that affect human expression?

Cynthia Murrell, July 14, 2015

Sponsored by the publisher of the CyberOSINT monograph

SAS Explains Big Data. Includes Cartoon, Excludes Information about Cost

July 13, 2015

I know that it is easy to say Big Data. It is easy to say Hadoop. It is easy to make statements in marketing collateral, in speeches, and in blogs written by addled geese. Honk!


I wish to point out that any use of these terms in the same sentence requires an important catalyst: Money. Money that has been, in the words of the government procurement officer, “Allocated, not just budgeted.”

Here are the words:

  1. Big Data
  2. Hadoop
  3. Unstructured data

Point your monitored browser at “Marketers Ask: What Can Hadoop Do That My Data Warehouse Can’t?” The write up originates with SAS. When a company is anchored in statistics, I expect some familiarity with numbers. (Yep, just like the class you have blocked from your mind. The midterm? What midterm?)

The write up points out that unstructured data comes in many flavors. This chart, complete with cartoon, identifies 15 content types. I was amazed. Just 15? What about the data in that home brew content management system or tucked in the index of the no longer supported DEC 20 TIPS system? Yes, that data.


How does Hadoop deal with the orange and blue? Pretty well, but you and the curious marketer must attend to three steps. Count ‘em off, please:

  1. Identify the business issue. I think this means know what problem one is trying to solve. This is a good idea, but I think most marketing problems boil down to generating revenue and proving it to senior management. Marketing looks for silver bullets when the sales are not dropping from the sky like packages for the believers in the Cargo Cult.
  2. Get top management support. Yep, this is a good idea because the catalyst—money—has to be available to clean, acquire, and load the goodies in the blue boxes and the wonky stuff from the home brew CMS.
  3. Develop a multi-play plan. I think this means that the marketer has zero clue how complicated the Hadoop magic is. The excitement of extract, transform, and load. The thrill of batch processing awaits. Then the joy of looking at outputs which baffle the marketer, who is more comfortable selecting colors and looking at AdWords reports than at Hadoop data.

My thought is that SAS understands data, statistical methods, and the reality of a revolution which is taking place without the strictures of SAS approaches.

I do like the cartoon. I do not like the omission of the money part of the task. Doing the orange and blue thing for marketers is expensive. Do the marketers know this?


Stephen E Arnold, July 13, 2015

Watson Based Tradeoff Analytics Weighs Options

July 13, 2015

IBM’s Watson now lends its considerable intellect to helping users make sound decisions. In “IBM Watson Tradeoff Analytics—General Availability,” the Watson Developer Community announces that the GA release of this new tool can be obtained through the Watson Developer Cloud platform. The release follows an apparently successful Beta run that began last February. The write-up explains that the tool:

“… Allows you to compare and explore many options against multiple criteria at the same time. This ultimately contributes to a more balanced decision with optimal payoff.

“Clients expect to be educated and empowered: ‘don’t just tell me what to do,’ but ‘educate me, and let me choose.’ Tradeoff Analytics achieves this by providing reasoning and insights that enable judgment through assessment of the alternatives and the consequent results of each choice. The tool identifies alternatives that represent interesting tradeoff considerations. In other words: Tradeoff Analytics highlights areas where you may compromise a little to gain a lot. For example, in a scenario where you want to buy a phone, you can learn that if you pay just a little more for one phone, you will gain a better camera and a better battery life, which can give you greater satisfaction than the slightly lower price.”
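The “compromise a little to gain a lot” idea is close to the textbook notion of Pareto dominance. The write-up does not describe Watson’s internals, so the following is only a sketch of that general technique, with hypothetical phones and scores: an option is dropped if some other option is at least as good on every criterion and strictly better on at least one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phone:
    name: str
    price: float          # lower is better
    camera: float         # higher is better (hypothetical 0-10 score)
    battery_hours: float  # higher is better

def dominates(a, b):
    """True if a is at least as good as b on every criterion and strictly better on one."""
    at_least = (a.price <= b.price and a.camera >= b.camera
                and a.battery_hours >= b.battery_hours)
    strictly = (a.price < b.price or a.camera > b.camera
                or a.battery_hours > b.battery_hours)
    return at_least and strictly

def pareto_front(options):
    # Keep only options that no other option dominates.
    return [o for o in options
            if not any(dominates(p, o) for p in options if p is not o)]

phones = [
    Phone("Basic", 199, 5.0, 10),
    Phone("Plus", 219, 8.0, 14),
    Phone("Old Plus", 249, 7.0, 12),  # worse than Plus on every criterion
    Phone("Pro", 499, 9.5, 15),
]
front = pareto_front(phones)
print([p.name for p in front])
```

Old Plus drops out because Plus is cheaper and better on both other criteria. Among the survivors, moving from Basic to Plus costs $20 more and buys 3 camera points and 4 battery hours, which is the kind of tradeoff the tool is said to highlight.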

For those interested in the technical details behind this Watson iteration, the article points you to the Tradeoff Analytics documentation. Those wishing to glimpse the visualization capabilities can navigate to this demo. The write-up also lists post-beta updates and explains pricing, so check it out for more information.

Cynthia Murrell, July 13, 2015


Dealing with Company and Product Identity: Terbium Labs Nails It

July 11, 2015

Navigate to the Terbium Labs website and read about the company.


Nifty name. Very nifty name indeed. Now, a bit of branding commentary.

I used to work at Halliburton Nuclear. Ah, the good old days of nuclear engineers poking fun at civil engineers and mathematicians not understanding any joke made by the computer engineers.

The problem of naming companies in high technology disciplines is a very big one. Before Halliburton gobbled up the Nuclear Utility Services outfit, the company with more than 400 nuclear engineers on staff struggled with its name. Nuclear Utility Services was abbreviated to NUS. A pretty sharp copywriter named Richard Harrington of the dearly loved Ketchum, McLeod and Gove ad agency came up with this catchy line:

After the EPA, call NUS.

The important point is that Mr. Harrington, a whiz person, wanted to have people read each letter: E-P-A, not say eepa and say N-U-S not say noose. In Japanese, the sound “nus” has a negative meaning usually applied to pressurized body odor emissions. Not good.

Search and content processing vendors struggle with names. I have written about outfits which have fumbled the branding ball. Examples range from Thunderstone, whose name has been usurped by a gaming company, to Brainware, whose name has been snagged and used for interesting videos, to Smartlogic, whose name has been appropriated by a smaller outfit doing marketing and design stuff. Then there are names which are impossible to find; for example, i2, AMI, and ChaCha, to name a few among many.

I want to call attention to a quite useful product name which I learned about recently. Navigate to the Terbium Labs website. Consider the word Terbium. Look for the word “Matchlight.”

I find Terbium a darned good word because terbium is an element, which my old (and I mean old) chemistry professor pronounced “ter-beem.” The element has a number of useful applications. Think solid state devices, a magic ingredient in some rocket fuels, and (okay, okay) some explosives.

But as good as “terbium” is for a company name, I absolutely delight in this product name: Matchlight.


Now what’s Matchlight, and why should anyone care? My hunch is that the technology, which allows a next generation approach to content identification and other functions, works to

  • light a match in the wilderness
  • illuminate a dark space
  • start a campfire so I can cook a goose

You can and should learn more about Terbium Labs and its technology. The names will help you remember.

Important company; important technology. Great name Matchlight. (Hear that search and content processing vendors with dud names?)

Stephen E Arnold, July 11, 2015

Business Intelligence: The Grunt Work? Time for a Latte

July 10, 2015

I read “One Third of BI Pros Spend Up to 90% of Time Cleaning Data.” Well, well, well. Good old and frail eWeek has reported what those involved in data work have known for what? Decades, maybe centuries? The write up states with typical feather duster verbiage:

A recent survey commissioned by data integration platform provider Xplenty indicates that nearly one-third of business intelligence (BI) professionals are little more than “data janitors,” as they spend a majority of their time cleaning raw data for analytics.

What this means is that the grunt work in analytics still has to be done. This is difficult and tedious work even with normalization tools and nifty handcrafted scripts. Who wants to do this work? Not the MBAs who need slick charts to nail their bonus. Not the frantic marketer who has to add some juice to the pale and wan vice president’s talk at the Rotary Club. Not anyone, except those who understand the importance of scrutinizing data.

The write up points out that extract, transform, and load functions (ETL in the jargon of Sillycon Valley) are work. Guess what? The eWeek story uses these words to explain what the grunt work entails:

  • Integrating data from different platforms
  • Transforming data
  • Cleansing data
  • Formatting data
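The four chores in the list can be illustrated with a tiny pipeline. The sources, field names, and rules here are hypothetical, invented only to show where each step lands: two exports with different schemas and date formats are integrated, transformed, cleansed (duplicates dropped, incomplete rows set aside for a human), and formatted.

```python
import csv
import io
from datetime import datetime

# Hypothetical exports from two platforms with different conventions.
CRM_CSV = """customer,signup,spend
 Acme Corp ,2015-07-01,1200
acme corp,2015-07-01,1200
Globex,07/03/2015,
"""
WEB_ROWS = [{"name": "Initech", "joined": "2015/07/05", "revenue_usd": "350.5"}]

def parse_date(text):
    # Transform: accept each date layout seen in these (made-up) sources.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None

def integrate():
    # Integrate: map both sources onto one shared schema.
    rows = [{"name": r["customer"], "date": r["signup"], "spend": r["spend"]}
            for r in csv.DictReader(io.StringIO(CRM_CSV))]
    rows += [{"name": r["name"], "date": r["joined"], "spend": r["revenue_usd"]}
             for r in WEB_ROWS]
    return rows

def clean(rows):
    seen, out, rejected = set(), [], []
    for r in rows:
        name = " ".join(r["name"].split()).title()  # normalize spacing and case
        date = parse_date(r["date"])
        spend = float(r["spend"]) if r["spend"].strip() else None
        if date is None or spend is None:
            rejected.append(r)  # Cleanse: incomplete rows go to a human
            continue
        key = (name.lower(), date)
        if key in seen:
            continue  # Cleanse: drop duplicates exposed by normalization
        seen.add(key)
        # Format: ISO dates and two-decimal amounts for the analytics tool.
        out.append({"name": name, "date": date.isoformat(), "spend": f"{spend:.2f}"})
    return out, rejected

out, rejected = clean(integrate())
print(out)
print(rejected)
```

Even in this toy, one of the four input rows is a duplicate and another cannot be used without human attention.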

But here’s the most important item in the article: If the report on which the article is based is correct, 21 percent of the data require special care and feeding. How’s that grab you for a task when you are pumping a terabyte of social media or intercept data a day? Right. Time for a bit of Facebook and a trip to Starbucks.
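For scale, assume a steady feed of exactly 1 TB a day (decimal units) and take the 21 percent figure at face value:

$$
0.21 \times 1\ \text{TB/day} = 0.21 \times 10^{12}\ \text{bytes/day} = 2.1 \times 10^{11}\ \text{bytes/day} = 210\ \text{GB/day}
$$

$$
210\ \text{GB/day} \times 365\ \text{days/yr} = 76{,}650\ \text{GB/yr} \approx 76.7\ \text{TB/yr}
$$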

What happens if the data are not ship shape? Well, think about the fine decisions flowing from organizations which are dependent on data analytics. Why not chase down good old United Airlines and ask the outfit if anyone processed log files for the network which effectively grounded all flights? Know anyone at the Office of Personnel Management? You might ask the same question.

Ignoring data or looking at outputs without going through the grunt work is little better than guessing. No, wait. Guessing would probably return better outcomes. Time for some Foosball.

Stephen E Arnold, July 10, 2015

SAS Text Miner Promises Unstructured Insight

July 10, 2015

Big data tools help organizations analyze more than their old, legacy data. While legacy data does help an organization study how its processes have changed, the data is old and does not reflect immediate, real-time trends. SAS offers a product that bridges old data with new, as well as unstructured data with structured data.

The SAS Text Miner is built on Teragram technology. Its features include document theme discovery, a function that finds relations between document collections; automatic Boolean rule generation; high-performance text mining that quickly evaluates large document collections; term profiling and trending, which evaluates how relevant terms are in a collection and how they are used; multiple language support; visual interrogation of results; easy text import; flexible entity options; and a user-friendly interface.
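Term relevance within a collection is classically measured with tf-idf weighting. SAS does not say what Text Miner uses internally, so treat this as a generic sketch on made-up comment snippets: a term scores high when it is frequent in one document and rare across the collection.

```python
import math
from collections import Counter

# Hypothetical snippets standing in for comment fields or notes.
docs = [
    "late payment fee dispute",
    "payment received thank you",
    "router outage caused late delivery",
]

def tfidf(docs):
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency (share of the document) times inverse document frequency.
        weights.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

weights = tfidf(docs)
for w in weights:
    print(max(w, key=w.get))
```

In the first document, “fee” and “dispute” outrank “late” and “payment” because the latter two also appear elsewhere in the collection.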

The SAS Text Miner is specifically designed to discover relationships in data, automate activities, and determine keywords and phrases. The software uses predictive models to analyze data and discover new insights:

“Predictive models use situational knowledge to describe future scenarios. Yet important circumstances and events described in comment fields, notes, reports, inquiries, web commentaries, etc., aren’t captured in structured fields that can be analyzed easily. Now you can add insights gleaned from text-based sources to your predictive models for more powerful predictions.”
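The idea of adding text-derived insight to a predictive model can be shown with a toy logistic regression. Everything here is hypothetical: four made-up customer records, a single keyword flag standing in for real text mining, and plain batch gradient descent rather than any SAS procedure. Tenure alone cannot separate these rows, because each tenure value has one customer who left and one who stayed; the comment field can.

```python
import math

# Hypothetical records: one structured field plus a free-text comment.
records = [
    {"tenure_years": 5, "comment": "Happy with the service", "churned": 0},
    {"tenure_years": 5, "comment": "Want to cancel the contract", "churned": 1},
    {"tenure_years": 1, "comment": "Billing question", "churned": 0},
    {"tenure_years": 1, "comment": "Please cancel now", "churned": 1},
]

def features(rec):
    # Structured part: bias and tenure. Text part: a keyword flag pulled from
    # the comment (a stand-in for richer text-mining output).
    mentions_cancel = 1.0 if "cancel" in rec["comment"].lower().split() else 0.0
    return [1.0, float(rec["tenure_years"]), mentions_cancel]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=5000):
    # Batch gradient descent on the mean logistic loss.
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

X = [features(r) for r in records]
y = [r["churned"] for r in records]
weights = train_logistic(X, y)
preds = [int(sigmoid(sum(wi * xi for wi, xi in zip(weights, x))) > 0.5) for x in X]
print(weights, preds)
```

After training, the weight on the keyword flag is positive and all four rows are classified correctly; without the text feature, the two five-year customers have identical inputs and no model can tell them apart.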

Text mining software reveals insights between old and new data, making it one of the basic components of big data.

Whitney Grace, July 10, 2015


Coveo Partners with Etherios on Salesforce Services

July 7, 2015

Professional services firm Etherios is teaming up with Coveo in a joint mission to add even more value to customers’ Salesforce platforms, we learn from “Etherios and Coveo Announce Strategic Alliance” at Yahoo Finance. Etherios is a proud Salesforce Platinum Partner. The press release tells us:

“Coveo connects information from across a company’s IT ecosystem of record and delivers the knowledge that matters to customers and agents in context. Coveo for Salesforce – Communities Edition helps customers solve their own cases by proactively offering case-resolving knowledge suggestions, and Coveo for Salesforce – Service Cloud Edition allows customer support agents to upskill as they engage customers by injecting case-resolving content and experts into the Salesforce UI as they work.

“Etherios provides customers with consulting and implementation services in the areas of Sales, Customer Service, Field Service and IoT [Internet of Things]. … Etherios capabilities span operational strategy, business process, technical design and implementation expertise.”

Founded in 2005, Coveo leverages search technology to boost users’ skills, knowledge, and proficiency while supplying tools for collaboration and self-service. The company maintains offices in the U.S. (San Mateo, CA), the Netherlands, and Quebec.

A division of Digi International, Etherios launched in 2008 specifically to supply cloud-based tools for Salesforce users. They prefer to inhabit the cutting edge and operate out of Chicago, Dallas, and San Francisco.

Cynthia Murrell, July 7, 2015


Misinformation and Truth: An Issue in Play

July 6, 2015

Navigate to “Italian Newspaper Creates Fake Restaurant to Prove TripAdvisor Sucks.” The write up tells the story of a real journalistic operation which created a nonexistent restaurant. Then the real journalists contributed reviews of the vaporous eatery. TripAdvisor’s algorithms sucked in the content and, according to the write up,

declared La Scaletta the best restaurant in the town, beating out another highly-regarded restaurant with over 300 reviews (most of them positive).

Ah, real journalism, truth, and the manipulation of socially-anchored systems.
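The write up does not explain how TripAdvisor ranks restaurants, so here is an assumption for illustration only: if a site ranked by plain average rating, a handful of planted five-star reviews would beat an established place with hundreds of mostly positive ones. A Bayesian average, which shrinks sparse ratings toward a site-wide prior, is one common mitigation. The review counts and prior below are made up.

```python
def mean_rating(ratings):
    return sum(ratings) / len(ratings)

def bayesian_average(ratings, prior_mean, prior_weight):
    # Acts as if every restaurant starts with prior_weight reviews at prior_mean,
    # so a few extreme ratings cannot dominate the score.
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))

planted = [5] * 10                                        # hypothetical fake reviews
established = [5] * 240 + [4] * 50 + [3] * 10 + [1] * 5   # hypothetical 305 reviews

PRIOR_MEAN, PRIOR_WEIGHT = 3.5, 20  # assumed site-wide values
for label, r in (("planted", planted), ("established", established)):
    print(label, round(mean_rating(r), 2),
          round(bayesian_average(r, PRIOR_MEAN, PRIOR_WEIGHT), 2))
```

With the prior, the planted restaurant falls to 4.0 and the established one stays at about 4.63. Shrinkage only raises the cost of the trick; a newspaper willing to post enough fake reviews can still overwhelm it.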

Now direct your attention to “Fact Verification As Easy as Spellcheck?” The point of this article is that figuring out what’s accurate and inaccurate is nontrivial. The write up reports:

Researchers at Indiana University decided to try a different approach to the problem. Instead of trying to build complex logic into a program, researchers proposed something simpler. Why not try to measure the likelihood of a statement being true by analyzing the proximity of its terms and the specificity of its connectors?

The procedure involves a knowledge graph. Is this the same, much loved graph approach built with the most frequently used mathematical methods? No information to answer that question is in my files, gentle reader.
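The description (proximity of terms, specificity of connectors) suggests a path-based score on a knowledge graph. The sketch below is an assumption about how that might look, not the Indiana team’s code: a statement is scored by the cheapest path between its subject and object, where passing through an intermediate node costs the logarithm of its degree, so generic hubs count for less than specific links. The graph is a made-up miniature.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical miniature knowledge graph; each pair is an undirected link.
EDGES = [
    ("Colosseum", "Rome"), ("Rome", "Italy"), ("Rome", "Lazio"),
    ("Lazio", "Italy"), ("Italy", "Europe"), ("France", "Europe"),
    ("Paris", "France"), ("Germany", "Europe"), ("Spain", "Europe"),
]

def build_graph(edges):
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def truth_score(graph, subject, obj):
    """Return 1 / (1 + cheapest path cost); each intermediate node costs log(degree)."""
    if subject not in graph or obj not in graph:
        return 0.0
    best = {subject: 0.0}
    heap = [(0.0, subject)]
    while heap:  # Dijkstra's algorithm with node (not edge) costs
        cost, node = heapq.heappop(heap)
        if node == obj:
            return 1.0 / (1.0 + cost)
        if cost > best[node]:
            continue  # stale heap entry
        for nxt in graph[node]:
            # The endpoint is free; passing through a well-connected hub is expensive.
            step = 0.0 if nxt == obj else math.log(len(graph[nxt]))
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0  # no path: no support for the statement

graph = build_graph(EDGES)
print(round(truth_score(graph, "Colosseum", "Italy"), 2))
print(round(truth_score(graph, "Colosseum", "France"), 2))
```

“Colosseum located in Italy” passes through one specific node and scores about 0.48; “Colosseum located in France” must pass through the generic hub Europe and scores about 0.22.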

My radar is directed at Bloomington, Indiana. Perhaps more information will become available on software’s ability to figure out if the Italian restaurant is real or the confection of real journalists. Note: The GOOG seems to be laboring in this vineyard as well. See this Bezos story.

What if, just hypothetically of course, the “truth” methods can be spoofed by procedures more sophisticated than cooking up some half-cooked tortellini? Those common numerical methods are pliable, based on my team’s research. Really flexible when it comes to what’s “truth.”

Stephen E Arnold, July 6, 2015

Short Honk: Renormalization Group Equation

July 6, 2015

A short honk for the math lovers: I recommend “Why Deep Learning Works 2: The Renormalization Group.” The post presents several important examples of renormalization, which is Fancy Math for figuring out variances and then figuring out what the inputs are “about.” (Math nerds, I am trying to express important concepts in a very, very simple way.) The big ideas explained pretty well are manifolds and renormalization. The methods are applicable to certain types of machine learning. A happy quack to Dr. Charles Martin for this article.
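For readers who want something concrete, the basic move in renormalization is coarse-graining. This toy is not from Martin’s post: it applies Kadanoff-style majority-rule block spins to a random 1-D chain of +1/-1 values, so each pass replaces three spins with one summary spin and discards the fine detail.

```python
import random

def block_spin(spins, block=3):
    """Coarse-grain a 1-D chain of +1/-1 spins by majority vote over blocks."""
    if block % 2 == 0:
        raise ValueError("use an odd block size so majority votes cannot tie")
    usable = len(spins) - len(spins) % block  # ignore a ragged tail
    return [1 if sum(spins[i:i + block]) > 0 else -1
            for i in range(0, usable, block)]

random.seed(0)
chain = [random.choice([-1, 1]) for _ in range(81)]
levels = [chain]
while len(levels[-1]) >= 3:  # repeat the coarse-graining step
    levels.append(block_spin(levels[-1]))
for lvl in levels:
    print(len(lvl), sum(lvl) / len(lvl))
```

The chain shrinks from 81 spins to 27, 9, 3, and finally 1. The post’s title points to the analogy it explores: layers in a deep network as repeated summarizing steps of this kind.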

Stephen E Arnold, July 6, 2015

What Watson Can Do For Your Department

July 6, 2015

The story of Justin Chen, a Finance Manager, is one of many “Stories by Role” now displayed by IBM. Each character has a different job, such as Liza Hay from Marketing, Donny Cruz from IT and Anisa Mirza from HR. Each job comes with a problem for which Watson, IBM’s supercomputer, has just the solution. Justin, the article relates, is having trouble deciding which payments to follow up on. Watson provides solutions:

“With IBM® Watson™ Analytics, Justin can ask which customers are least likely to pay, who is most likely to pay and why. He can analyze this information… [and] collect more payments more efficiently… With Watson Analytics, Justin can ask which customers are likely to leave and which are likely to stay and why. He can use the answers for analysis of customer attrition and retention, predict the effect on revenue and determine which customer investments will lead to more profitable growth.”

It seems that the now world-famous Watson has been converted from search to a basket containing any number of IBM software solutions. It isn’t stated in the article, but we can probably assume that the revenue from each solution counts toward Watson’s soon to be reported billions in revenue.

Chelsea Kerwin, July 6, 2015


