
Algorithmic Art Historians

July 14, 2015

Apparently, creativity itself is no longer subjective. MIT Technology Review announces, “Machine Vision Algorithm Chooses the Most Creative Paintings in History.” Traditionally, art historians judge how creative a work is based on its novelty and its influence on subsequent artists. The article notes that this is a challenging task, requiring an encyclopedic knowledge of art history and the judgement to decide what is novel and what has been influential. Now, a team at Rutgers University has developed an algorithm they say is qualified for the job.

Researchers Ahmed Elgammal and Babak Saleh credit several developments with bringing AI to this point. First, we’ve recently seen several breakthroughs in machine understanding of visual concepts, called classemes, that include recognition of factors from colors to specific objects. Another important factor: there now exist well-populated online artwork databases that the algorithms can, um, study. The article continues:

“The problem is to work out which paintings are the most novel compared to others that have gone before and then determine how many paintings in the future have uses similar features to work out their influence. Elgammal and Saleh approach this as a problem of network science. Their idea is to treat the history of art as a network in which each painting links to similar paintings in the future and is linked to by similar paintings from the past. The problem of determining the most creative is then one of working out when certain patterns of classemes first appear and how these patterns are adopted in the future. …

“The problem of finding the most creative paintings is similar to the problem of finding the most influential person on a social network, or the most important station in a city’s metro system or super spreaders of disease. These have become standard problems in network theory in recent years, and now Elgammal and Saleh apply it to creativity networks for the first time.”
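To make the network idea concrete, here is a toy sketch in Python. It is not the Rutgers team's algorithm, which the article does not spell out. It assumes each painting has been reduced to a short feature vector (a stand-in for classemes) and scores a painting as novelty (dissimilarity to earlier works) multiplied by influence (similarity to later works). The function names, the sample paintings, and the scoring formula are all illustrative assumptions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_similarity(feats, others):
    """Average similarity of one painting to a list of other paintings."""
    return sum(cosine(feats, f) for f in others) / len(others)


def creativity_scores(paintings):
    """paintings: list of (title, year, feature_vector) tuples.

    Hypothetical score: novelty * influence. A painting with no
    predecessors is treated as fully novel; one with no successors
    has no measurable influence yet.
    """
    scores = {}
    for title, year, feats in paintings:
        earlier = [f for _, y, f in paintings if y < year]
        later = [f for _, y, f in paintings if y > year]
        novelty = 1.0 - mean_similarity(feats, earlier) if earlier else 1.0
        influence = mean_similarity(feats, later) if later else 0.0
        scores[title] = novelty * influence
    return scores


# Made-up data: C introduces a new pattern that D later adopts.
sample = [
    ("A", 1800, [1, 0, 0]),
    ("B", 1850, [1, 0, 0]),
    ("C", 1900, [0, 1, 0]),
    ("D", 1950, [0, 1, 0]),
]
scores = creativity_scores(sample)
print(max(scores, key=scores.get))
```

In this made-up data set, C wins because it breaks with A and B and is then imitated by D. A real system would also need to handle the edge effects visible here: the oldest work looks novel by default, and the newest work cannot show influence.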

Just what we needed. I have to admit the technology is quite intriguing, but I wonder: Will all creative human endeavors eventually have their algorithmic counterparts and, if so, how will that affect human expression?

Cynthia Murrell, July 14, 2015

Sponsored by, publisher of the CyberOSINT monograph

Watson Based Tradeoff Analytics Weighs Options

July 13, 2015

IBM’s Watson now lends its considerable intellect to helping users make sound decisions. In “IBM Watson Tradeoff Analytics—General Availability,” the Watson Developer Community announces that the GA release of this new tool can be obtained through the Watson Developer Cloud platform. The release follows an apparently successful Beta run that began last February. The write-up explains that the tool:

“… Allows you to compare and explore many options against multiple criteria at the same time. This ultimately contributes to a more balanced decision with optimal payoff.

“Clients expect to be educated and empowered: ‘don’t just tell me what to do,’ but ‘educate me, and let me choose.’ Tradeoff Analytics achieves this by providing reasoning and insights that enable judgment through assessment of the alternatives and the consequent results of each choice. The tool identifies alternatives that represent interesting tradeoff considerations. In other words: Tradeoff Analytics highlights areas where you may compromise a little to gain a lot. For example, in a scenario where you want to buy a phone, you can learn that if you pay just a little more for one phone, you will gain a better camera and a better battery life, which can give you greater satisfaction than the slightly lower price.”
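The phone example in the quote can be sketched with a standard multi-criteria technique, Pareto filtering: discard any option that another option matches or beats on every criterion. The write-up does not say how Tradeoff Analytics works internally, so treat this as an illustration of the general idea rather than IBM's method. The `Phone` fields and the sample numbers are invented for the example.

```python
from typing import NamedTuple


class Phone(NamedTuple):
    name: str
    price: float          # lower is better
    camera_mp: float      # higher is better
    battery_hours: float  # higher is better


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    no_worse = (a.price <= b.price
                and a.camera_mp >= b.camera_mp
                and a.battery_hours >= b.battery_hours)
    strictly_better = (a.price < b.price
                       or a.camera_mp > b.camera_mp
                       or a.battery_hours > b.battery_hours)
    return no_worse and strictly_better


def pareto_front(options):
    """Keep only options that no other option dominates."""
    return [o for o in options
            if not any(dominates(other, o) for other in options)]


phones = [
    Phone("Budget", 200, 8, 10),
    Phone("Mid", 220, 16, 14),     # a little more money, much better specs
    Phone("Dud", 250, 8, 9),       # worse than Budget on every criterion
    Phone("Flagship", 600, 20, 15),
]
for phone in pareto_front(phones):
    print(phone.name, phone.price, phone.camera_mp, phone.battery_hours)
```

The surviving options are the ones where a real tradeoff exists. Comparing neighbors on that list (Budget versus Mid, for instance) is where "compromise a little to gain a lot" shows up.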

For those interested in the technical details behind this Watson iteration, the article points you to Tradeoff Analytics documentation. Those wishing to glimpse the visualization capabilities can navigate to this demo. The write-up also lists post-beta updates and explains pricing, so check it out for more information.

Cynthia Murrell, July 13, 2015


Semantic Search: How Far Will This Baloney Tube Stretch?

July 12, 2015

I have seen a number of tweets, messages, and comments about “Semantic Search: the Future of Search Marketing?”

For those looking for traffic, it seems that using the phrase “semantic search” in conjunction with “search marketing” is Grade A click bait. Go for it.

My view is a bit different. I think that the baloney manufactured from semantic search (more correctly the various methods that can be grouped under the word semantic) is low grade baloney.

Search marketing is on a par with the institutional pizza pumped out for freshmen in a dorm in DeKalb, Illinois. Yum, tasty. What is it? Oh, I know it is something that is supposed to be nutritious and tasty. The reality is that the pizza isn’t. That’s search marketing. The relevant result may not be. Relevance is jiggling results so that a message is displayed whether the user wants that message or not. Not pizza.

Here’s a passage in the write up I highlighted in pale yellow, the color in my marker set closest to the dorm pizza:

Semantic search is the technology the search engines employ to better understand the context of a search.

Contrast this definition with this one from “Breakthrough Analysis: Two + Nine Types of Semantic Search” published in 2010, five years before the crazy SEO adoption of the buzzword, if not the understanding of what “semantic” embraces:

Semantics (in an IT setting) is meaningful computing: the application of natural language processing (NLP) to support information retrieval, analytics, and data-integration that compass both numerical and “unstructured” information.

The article then trots out these semantic search options:

  1. Related searches and queries
  2. Reference results (dictionary look up)
  3. Annotated results
  4. Similarity search
  5. Syntactic annotations
  6. Concept search
  7. Ontology based search
  8. Semantic Web search
  9. Faceted search
  10. Clustered search
  11. Natural language search

Now there are many, many issues with this list. How about differentiating faceted, concept, and clustered search? Give up yet?

The point is that semantic search is not one thing. If one accepts this list as the touchstone, the functions referenced are going to contain other content processing operations.

The problem is that these functions on their own or used in some magical, affordable combination are not likely to deliver what the user wants.

The user wants relevant results which pertain directly to her specific information need.

The search engine optimization and marketing crowd want the results to be what they want to present to a user.

The objectives are different and may not be congruent or even similar.

In short, the notion of taking crazy, generalized concepts and slapping them on marketing is likely to produce howlers like this write up and the equally wonky list from 2010.

The point is that semantic baloney has been in the supermarket for a long time.

Obviously this baloney has a long shelf life.

In the meantime, how is ad supported Web search working for you? Oh, how is that in house information access system working for you?

If you want traffic, buy Adwords. Please, do not deliver to me the six pack of baloney.

Stephen E Arnold, July 12, 2015

What Is Watt? It Is the Innovation That Counts.

July 11, 2015

Years ago I worked with a polymath named Fred Czufin. Czufin was an author, writer, consultant, and former Office of Strategic Services cartographic specialist. Today Czufin would be buried in geocoding.

Why am I mentioning a fellow who died in 2009?

Czufin introduced me to James Watt. I knew the steam engine thing, but Czufin was bonkers over James Watt’s innovative streak.

I thought of Czufin, my ignorance of an important scientist, and our reasonably fun times when we collaborated on some interesting projects.

I read “A Twelve Year Flash of Genius.” The write up sparked anew my effort to chip away at my ignorance of this 18th century inventor. Watt struggled with the engineering problems of early Newcomen pumps. Mostly these puppies exploded.

Watt went for a walk and cooked up the idea of a condenser. Eureka. Steam engines mostly worked. Even my server room air conditioner contains a version of Watt’s invention.

I am not going to take sides in the flash of genius approach to innovation. One can argue that the antecedents for Watt’s thinking littered the laboratories of his predecessors, tinkerers, and fellow scientists.

My hunch is that there was no single epiphany. The result of sifting through many facts, fiddling around, and then trying to figure out if and then why something worked made him a bright person.

As I think about James Watt, I wonder when a similar thinker will come up with a breakthrough in information access. Most of the search systems with which I am familiar are in their pre-condenser stage. They blow up, fizzle, disappoint, hiss, and produce more angst than smiley faces.

My hunch is that Czufin would be as impatient as I about the opportunity a modern day James Watt can deliver. Search has more in common with Newcomen’s pump than a solution to a very important information problem.

Stephen E Arnold, July 11, 2015

Researchers Glean Audio from Video

July 10, 2015

Now, this is fascinating. Scary, but fascinating. MIT News explains how a team of researchers from MIT, Microsoft, and Adobe are “Extracting Audio from Visual Information.” The article includes a video in which one can clearly hear the poem “Mary Had a Little Lamb” as extrapolated from video of a potato chip bag’s vibrations filmed through soundproof glass, among other amazing feats. I highly recommend you take four-and-a-half minutes to watch the video.

Writer Larry Hardesty lists some other surfaces from which the team was able to reproduce audio by filming vibrations: aluminum foil, water, and plant leaves. The researchers plan to present a paper on their results at this year’s Siggraph computer graphics conference. See the article for some details on the research, including camera specs and algorithm development.

So, will this tech have any non-spying related applications? Hardesty cites MIT grad student, and first author on the team’s paper, Abe Davis as he writes:

“The researchers’ technique has obvious applications in law enforcement and forensics, but Davis is more enthusiastic about the possibility of what he describes as a ‘new kind of imaging.’

“‘We’re recovering sounds from objects,’ he says. ‘That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.’ In ongoing work, the researchers have begun trying to determine material and structural properties of objects from their visible response to short bursts of sound.”

That’s one idea. Researchers are confident other uses will emerge, ones no one has thought of yet. This is a technology to keep tabs on, and not just to decide when to start holding all private conversations in windowless rooms.

Cynthia Murrell, July 10, 2015


SAS Text Miner Promises Unstructured Insight

July 10, 2015

Big data tools help organizations analyze more than their old, legacy data. While legacy data does help an organization study how its processes have changed, the data is old and does not reflect the immediate, real time trends. SAS offers a product that bridges old data with the new as well as unstructured and structured data.

The SAS Text Miner is built from Teragram technology. It features document theme discovery, a function that finds relations between document collections; automatic Boolean rule generation; high performance text mining that quickly evaluates large document collections; term profiling and trending, which evaluates term relevance in a collection and how terms are used; multiple language support; visual interrogation of results; easy text import; flexible entity options; and a user friendly interface.
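For readers wondering what "evaluating term relevance in a collection" can look like in practice, one textbook approach is TF-IDF weighting: a term scores high in a document when it is frequent there but rare across the collection. SAS does not describe its method in this write-up, so the sketch below is a generic stand-in, not the Text Miner algorithm. The function name and the sample comment-field texts are made up.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Invented comment-field snippets of the kind the SAS quote mentions.
comments = [
    "refund late refund",
    "late delivery",
    "late invoice",
]
weights = tf_idf(comments)
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True))
```

Here "late" appears in every comment, so it carries no weight, while "refund" stands out in the first one. Scores like these are one way text can be turned into numeric inputs for a predictive model.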

The SAS Text Miner is specifically programmed to discover data relationships, automate activities, and determine keywords and phrases. The software uses predictive models to analyze data and discover new insights:

“Predictive models use situational knowledge to describe future scenarios. Yet important circumstances and events described in comment fields, notes, reports, inquiries, web commentaries, etc., aren’t captured in structured fields that can be analyzed easily. Now you can add insights gleaned from text-based sources to your predictive models for more powerful predictions.”

Text mining software reveals insights between old and new data, making it one of the basic components of big data.

Whitney Grace, July 10, 2015


Cloud is Featured in SharePoint 2016

July 9, 2015

Users are eager to learn all they can about the upcoming release of SharePoint Server 2016. Mark Kashman recently gave a presentation with additional information, which is covered in the Redmond Channel Partner article, “Microsoft: Cloud Will Play Prominent Role in SharePoint 2016.”

The article begins:

“Microsoft recently detailed its vision for SharePoint Server 2016, which appears to be very cloud-centric. Microsoft is planning a beta release of the new SharePoint Server 2016 by the end of this year, with final product release planned for Q2 2016. Mark Kashman, a senior product manager at Microsoft on the SharePoint team, gave more details about Microsoft’s plans for the server during a June 17 presentation at the SPBiz Conference titled ‘SharePoint Vision and Roadmap.’”

Users are still waiting to hear how this “cloud-centric” approach affects the overall usability of the product. As more details become available, stay tuned for the highlights. Stephen E. Arnold is a longtime leader in search, and his distillation of SharePoint news, tips, and tricks on his dedicated SharePoint feed is a way for users to stay on top of the changes without a huge investment in time.

Emily Rae Aldridge, July 9, 2015



The Bing Listicle: Bing Search Strategy

July 8, 2015

I noted a slide show designed to pump up page views for eWeek. Navigate to “What the Bing Search Engine Brings to Microsoft’s Web Strategy.” Prepare to be patient because the code used to display the content makes life interesting.

Strategy means the big picture. Tactics means changing the color of an item in the picture. Bing has been an interesting search engine. The team has had a bit of a revolving door. The spin of the door has sucked in Australian and Chinese search wizards. The Bing thing sold its map “business.” The Bing thing cut a deal with AOL to provide search and ads, a sure fire combination for improved relevance in search results.

The listicle hits a number of strategic points. I want to comment on three. Visit the original listicle for the remaining strategic gems.

Strategic Move 1: Apple and Microsoft have a search partnership. Now Apple is rumored to be poking around in the Web search space. The listicle asserts that “Apple, Microsoft Form Search Partnership.” I find this interesting. It may be tactical for Apple and strategic for Microsoft. If Apple creates a semi workable search system, will Apple continue to embrace the besieged Microsoft? My money is on Apple for a deal that helps out Apple until the deal no longer helps out Apple.

Strategic Move 2: Bing offers a rewards program. This is pay to play. If lots of people use rewards, will Microsoft find the offer untenable? My hunch is that this Rewards thing is like the annoying and now-dead Scroogle: A desperate tactic, not a strategic move.

Strategic Move 3: Bing is “handy on Microsoft hardware.” Okay, but I use Apple computers. The notion that Bing is baked into Windows 10 and Windows hardware seems to make sense. But I turn off the crazy Microsoft search functions and rely on third party tools. The strategic move is great for Microsoft internal pitches. The tactic is one that may annoy some folks who use Windows hardware and is essentially another tactic to make Bing zing. If Bing is so wonderful, what’s Microsoft doing with Fast Search technology and the Delve search? I would conclude there is no search strategy at Microsoft.

Stephen E Arnold, July 8, 2015

Want To Know What A Semantic Ecosystem Is

July 8, 2015

Do you want to know what a semantic ecosystem is? The answer is available from TopQuadrant in its article, “Semantic Ecosystem-What’s That About?” According to the article, a semantic ecosystem enables patterns to be discovered, shows the relationships between and within data sources, adds meaning to raw data artifacts, and dynamically brings information together.

In short, it shows how data and its sources connect with each other and extracts relationships from it.

What follows the brief explanation about what a semantic ecosystem can do is a paragraph about the importance of data, how it takes many forms, etc., etc. Trust me, you have heard it before. It then makes a comparison with a natural ecosystem, i.e., the ones found in nature.

The article continues with this piece:

“As in natural ecosystems, we believe that success in business is based on capability – and the ability to adapt and evolve new capabilities. Semantic ecosystems transform existing diverse information into valuable semantic assets. Key characteristics of a semantic ecosystem are that it is adaptable and evolvable. You can start small – with one or more key business solutions and a few data sources – and the semantic foundation can grow and evolve with you.”

It turns out a semantic ecosystem is just another name for information management.  TopQuadrant coined the term to associate with their products and services.  Talk about fancy business jargon, but TopQuadrant makes a point about having an information system work so well that it seems natural.  When a system works naturally, it is able to intuit needs, interpret patterns, and make educated correlations between data.

Whitney Grace, July 8, 2015



Semantic Search and Challenging Patent Document Content Domains

July 7, 2015

Over the years, I have bumped into some challenging content domains. One of the most difficult was the collection of mathematical papers organized with the Dienst architecture. Another was a collection of blog posts from African bulletin board systems in a number of different languages, peppered with insider jargon. I also recall my jousts with patent documents for some pretty savvy outfits.

The processing of each of these corpuses and making them searchable by a regular human being remains an unsolved problem. Progress has been slow, and the focus of many innovators has been on workarounds. The challenge of each corpus remains a high hurdle, and in my opinion, no search sprinter is able to make it over the race course without catching a toe and plunging head first into the Multi-layer SB Resin covered surface.

I read “Why Is Semantic Search So Important for Patent Searching?” My answer was and remains, “Because vendors will grab at any buzzy concept in the hopes of capturing a share of the patent research market?”

The write up takes a different approach, an approach which I find interesting and somewhat misleading.

The write up states that there are two ways to search for information: Navigational search sort of like Endeca I assume and research search, which is the old fashioned Boolean logic which I really like.

The article points out that keyword search sucks if the person looking for information does not know the exact term. That’s why I used the reference to Dienst. I wanted to provide an example which requires precise knowledge of terminology. That’s a challenge and it requires specialized knowledge from a person who recognizes that he or she may not know the exact terminology required to locate the needed information. Try the Dienst query. Navigate to a whizzy new search engine like and plug away. How is that working out for you, but don’t cheat. You can’t use the term Dienst.

If you run the query on a point and click Web search system like, you cannot locate the term without running a keyword search.

The problems in patents, whether indexed with value added metadata, humans laboring in a warehouse, or with semantic methods are:

  1. Patent documents exist in versions and each document drags along assorted forms which may or may not be findable. Trips to the USPTO with hat in hand and a note from a senator often do not work. Fancy Dan patent attorneys fall back on the good old method of hunting using intermediaries. Not pretty, not easy, not cheap, and not foolproof. The versions and assorted attachments are often unfindable. (There are sometimes interesting reasons for this kettle of fish and the fish within it.) I don’t have a solution to the chains of documents and the versions of patent documents. Sigh.
  2. Patents include art. Usually the novice reacts negatively to lousy screenshots, clunky drawings, and equations which make it tough to figure out what a superscript character is. Keywords and pointing and clicking, metaphors, razzle dazzle search systems, and buzzword charged solutions from outfits like Thomson Reuters and Lexis are just tools, stone tools chiseled by some folks who want to get paid. I don’t have a good solution to the arts and crafts aspect of patent documents. Sigh sigh.
  3. Patent documents are written at a level of generalization, with jargon, Latinate constructs, and assertions that usually give me a headache. Who signed up to read lots of really bad poetry? Working through the Old Norse version of Heimskringla is a walk in the park compared to figuring out what some patents “mean.” I spent a number of years indexing 15th century Latin sermons. At least in that corpus, the common knowledge base was social and political events and assorted religious material. Patents can be all over the known knowledge universe. I don’t know of a patent processing system which can make this weird prose-poetry understandable if there is litigation or findable if there is a need to figure out if someone cooked up the same system and method before the document in question was crafted. Sigh sigh sigh.
  4. None of the systems I have used over the past 40 years does a bang up job of identifying prior art in scientific, technical or medical journal articles, blog posts, trade publications, or Facebook posts by a socially aware astrophysicist working for a social media company. Finding antecedents is a great deal of work. Has been and will be in my opinion. Sigh sigh sigh sigh. But the patent attorneys cry, “Hooray. We get to bill time.”

The write up presents some of those top brass magnets: Snappy visualizations. The idea is that a nifty diagram will address the four problems I identified in the preceding paragraphs. Visualizations may be able to provide some useful way to conceptualize where a particular patent document falls in a cluster of correctly processed patent documents. But an image does not deliver the mental equivalent of a NOW Foods Whey Protein Isolate.

Net net: Pitching semantic search as a solution to the challenges of patent information access is a ball. Strikes in patent searching are not easily obtained unless you pay expert patent attorneys and their human assets to do the job. Just bring your checkbook.

Stephen E Arnold, July 7, 2015
