
ClearStory Is On the Move

July 1, 2015

The article on Virtual-Strategy Magazine titled ClearStory Data Appoints Dr. Timothy Howes as Chief Technology Officer; Former Vice President of Yahoo, CTO of HP Software, Opsware, and Netscape discusses Howes’s reputation as an innovative thinker who helped invent LDAP. His company Rockmelt Inc. was acquired by Yahoo, and he also co-founded Loudcloud, which is now known as Opsware, with the founders of VC firm Andreessen Horowitz, who are current backers of ClearStory Data. Needless to say, obtaining his services is quite a coup for ClearStory. Howes discusses his excitement about joining the team in the article:

“There’s a major technology shift happening in the data market right now as businesses want to see and explore more data faster. ClearStory is at the forefront of delivering the next-generation data analysis platform that brings Spark-powered, fast-cycle analysis to the front lines of business in a beautiful, innovative user experience that companies are in dire need of today,” said Howes. “The ClearStory architectural choices made early on, coupled with the focus on an elegant, collaborative user model is impressive.”

The article also mentions that Ali Tore, formerly of Model N, has been named the new Chief Product Officer. Soumitro Tagore of the startup Clari will become the VP of Engineering and Development Operations. ClearStory Data is intent on the acceleration of the movement of data for businesses. Their Intelligent Data Harmonization platform allows data from different sources to be quickly and insightfully explored.

Chelsea Kerwin, July 1, 2015

Sponsored by, publisher of the CyberOSINT monograph

Keyword Search Is Not Productive. Who Says?

June 30, 2015

I noticed a flurry of tweets pointing to a diagram which maps out the Future of Search. Direct your attention to this assertion:

As amount of data grows, keyword search is becoming less productive.

Now look at what will replace keyword search:

  • Social tagging
  • Automatic semantic tagging
  • Natural language search
  • Intelligent agents
  • Web scale reasoning.

The idea is that we will experience a progression through these “operations” or “functions.” The end point is “The Intelligent Web” and the Web scale reasoning approach to information access.

Interesting. But I am not completely comfortable with this analysis.

Let me highlight four observations and then leave you to your own sense of what the Web will become as the amount of data increases.

First, keyword search is a utility function, and it will become ubiquitous. It will not go away or be forgotten. Keyword search will just appear in more and more human machine interactions. Telling your automobile to call John is keyword search. Finding an email is often a matter of plugging a couple of words into the Gmail search box.

Second, more data does translate to programmers lacing together algorithms to deliver information to users. The idea is that a mobile device user will just “get” information. This is a practical response to the form factor, methods to reduce computational loads imposed by routine query processing, and the human desire for good enough information. Good enough will work for most people. But do you want your child’s doctor to rely on automatic outputs if your child has cancer?

Third, for certain types of information access, the focus is shifting, as it should, from huge flows of data to chopping flows down into useful chunks. Governments archive intercepts because the computational demands of processing information in real time for large numbers of users who need real time access are an issue. As data volume grows, computing horsepower is laboring to keep pace. Short cuts are, therefore, important. But most of the short cuts depend on having a question to answer. Guess what? Those short cuts are often keyword queries. The human may not be doing keyword searching, but the algorithms are.

Fourth, some types of information require both old fashioned Boolean keyword search and retrieval AND the manual, time consuming work of human specialists. In my experience, algorithms are useful, but there are subjects which require the old fashioned methods of querying, reading, researching, analyzing, and discussing. Most of these functions are keyword centric.
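To make the “old fashioned Boolean keyword search” concrete, here is a minimal sketch of an inverted index with AND and OR queries. It is an illustration only, not a description of any product mentioned here; the names `build_index`, `boolean_and`, `boolean_or`, and `SAMPLE_DOCS` are made up for the example, and tokenizing by whitespace is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by a document id.
SAMPLE_DOCS = {
    1: "Call John about the invoice",
    2: "Email from John",
    3: "Invoice overdue.",
}


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            term = token.strip(".,?!\"'")  # crude punctuation trimming
            if term:
                index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return ids of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Return ids of documents that contain at least one query term."""
    result = set()
    for t in terms:
        result |= index.get(t.lower(), set())
    return result


if __name__ == "__main__":
    idx = build_index(SAMPLE_DOCS)
    print(boolean_and(idx, "john", "invoice"))
    print(boolean_or(idx, "john", "invoice"))
```

Telling a car to “call John” or typing two words into a mail search box reduces to roughly this kind of lookup, whatever interface sits on top of it.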

In short, keyword queries can be dismissed or dressed up in fancy jargon. I don’t think the method is going away too quickly. Charts and subjective curves are one thing. Real world information interaction is another.

Stephen E Arnold, June 30, 2015

Webinar from BrightFunnel Ties Marketing to Revenue

June 30, 2015

The webinar on BrightFunnel Blog titled Campaign Attribution: Start Measuring True Marketing Impact (How-To Video) adds value to marketing efforts. BrightFunnel defines itself as a platform for marketing analytics that works to tie marketing more closely to revenue. The webinar focuses on the attribution application. The video poses three major questions that the application can answer about how pipeline and revenue are affected by marketing channels and specific campaigns, as well as how to gain better insight into the customer. The article summarizes the webinar:

“Marketers care. We care a lot about what happens to all those leads we generate for sales. It can be hard to get a complete view of marketing impact when you’re limited to trusting that the right contacts, if any, are being added to opportunities! In this recording from our recent webinar, see how BrightFunnel solves key attribution problems by providing seamless visibility into multi-touch campaign attribution so you can accurately measure the impact you have on pipeline and revenue.”

BrightFunnel believes in an intuitive approach, claiming that three to four weeks has been plenty of time for their users to get set up and get to work with their product. They host a series of webinars that allows interested parties to ask direct questions and be answered live.

Chelsea Kerwin, June 30, 2015



Alternative Search Engines: The Gray Lady Way

June 29, 2015

I read “Alternative Search Engines.” (Note: If you have to pay to read the article, visit a library and look for the story in the New York Times Magazine.) The process was painful. Distinctions which I find important were not part of the write up. Take the notion that some outfits actually index Web sites, and other outfits use Bing and Google search results without telling the user or the New York Times about this cost cutting half measure. Well, who cares? I don’t.

The write up asserts:

I was investigating the more practical, or just more traditional, alternatives to Google: Bing (owned by Microsoft), Yahoo (operated by Google back then and by Bing now), (an aggregator of Yahoo/Bing, Google and others) and newer sites like DuckDuckGo and IxQuick (which don’t track your search history), Gibiru and Unbubble (which don’t censor results) and Wolfram Alpha (which curates results). They were all too organized, too logical — the results were all the same, with only slight differences in the order of their presentation. It seemed to me that the Search Engine of Tomorrow couldn’t be concerned with the best way to find what users were searching for, but with the best way to find what users didn’t even know they were searching for.

In case the Gray Lady has not figured out the real world, tomorrow means mobile devices. Mobile devices deliver filtered, personalized, swizzled for advertisers results. If you expect to run key word queries on the next iPhone or Android device, give that a whirl and let me know how that works out for you.

The crisis in search is that content is not available. Obtaining primary and certain secondary information is time consuming, difficult, and tedious. The reality of alternative search engines is that these are few and far between.

Do you trust or Do you know what the size of the Exalead search index is? What’s included and what’s omitted from Qwant, the search engine based on Pertimm (who?) which allegedly causes Eric Schmidt to suffer Qwant induced insomnia?

Nah. In Beyond Search, our view has been that the old fashioned, library type of research is a gone goose. The even older fashioned “talk to humans” and “do original research which conforms to the minimal guidelines reviewed in Statistics 101 classes” is just too Baby Boomerish.

With the Gray Lady explaining search, precision and recall, relevancy, editorial policies for inclusion in an index, and latency between information being available and inclusion in an index are history.

Stephen E Arnold, June 29, 2015

Oracle Data Integrator Extension

June 29, 2015

The article titled Oracle Launches ODI in April with the Aim to Revolutionize Big Data on Market Realist makes it clear that Oracle sees big money in NoSQL. Oracle Data Integrator, or ODI, is meant to simplify work and training for developers and analysts. It removes the need to learn multiple programming languages and lets them use Hadoop and the like without much coding expertise. The article states,

“According to a report from PCWorld, Jeff Pollock, Oracle vice president of product management, said, “The Oracle Data Integrator for Big Data makes a non-Hadoop developer instantly productive on Hadoop…” Databases like Hadoop and Spark are targeted towards programmers who have the coding knowledge expertise required to manipulate these databases with knowledge of the coding needed to manage them. On the other hand, analysts usually use software for data analytics.”

The article also relates some of Oracle’s claims about itself, including that it earns more revenue than IBM, Microsoft, SAP AG, and Teradata combined. Those are also Oracle’s four major competitors. With the release of ODI, Oracle intends to filter data arriving from a myriad of different places. Clustering data into groups related by their format or framework is part of this process. The end result is a more streamlined version without assumptions about the level of coding knowledge held by an analyst.

Chelsea Kerwin, June 29, 2015


A Xoogler Fixes Yahoo Mobile Search

June 27, 2015

If you have not explored Yahoo Search, give it a whirl. Try to find information about these topics:

The query “Yahoo Search” displays this result:


Note that the second hit is to Tumblr. There you go. The other hits point to the very same page I used to launch my search for “Yahoo Search.” Helpful?

Try this query: “price diapers”. On the left side of the results page, Yahoo displayed:


On the right side of the results page, Yahoo displayed:


These are prices from advertisers. Oh, there is a link to something called Yahoo Shopping. Okay, that is one way to generate revenue and create an extra click. Annoying to me. To Yahoo, fulfillment and joy.

Also, try this query: “Dark Web paste sites”.

Here’s the results page:


Ads and two links to Dot ONION addresses. For the Yahoo user, I am not sure if the user will know what to make of this result:


I suppose I can find some positives in these results pages. On the other hand, the impact for me was inconsistency.

Navigate now to “Yahoo Search Becomes More Like Google on Mobile Devices.” The headline tells the story. Yahoo is lost in search space, so the Xoogler running the Yahoo comedy hour is imitating Google.

So much for innovation. One hopes the approach works because when Yahoo is left to its own devices, the information access thing is a bit like a rice cake and water to a Big O tire changer taking a break from three hours of roadside work in the blazing sun.

Stephen E Arnold, June 27, 2015

Matchlight Lights Up Stolen Data

June 26, 2015

It is a common gimmick on crime shows for the computer expert to be able to locate information, often stolen data, by using a few clever hacking tricks. In reality it is not that easy and quick to find stolen data, but eWeek posted an article about a new intelligence platform that might be able to do the trick: “Terbium Labs Launches Matchlight Data Intelligence Platform.” Terbium Labs’ Matchlight is able to recover stolen data as soon as it is released on the Dark Web.

How it works is simply remarkable. Matchlight attaches digital fingerprints to a company’s files, down to the smallest byte. Data recovered on the Dark Web can then be matched against Terbium Labs’ database. Matchlight is available under a SaaS model. Another option for clients is a one-way fingerprinting feature that keeps a company’s data private from Terbium Labs, which would only have access to the digital fingerprints in order to track the data. Matchlight can also be integrated into existing SharePoint or other document management systems. The entire approach takes a proactive stance towards data, rather than a defensive one.

“We see the market shifting toward a risk management approach to information security,” [Danny Rogers, CEO and co-founder of Terbium] said. “Previously, information security was focused on IT and defensive technologies. These days, the most innovative companies are no longer asking if a data breach is going to happen, but when. In fact, the most innovative companies are asking what has already happened that they might not know about. This is where Matchlight provides a unique solution.”
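The article does not say how Matchlight computes its fingerprints, so the following is only a rough sketch of the general idea behind one-way fingerprinting, under stated assumptions: the client hashes short overlapping runs of words with SHA-256 and shares only the hashes, and a monitoring service hashes text it finds and looks for overlap. The function names `fingerprints` and `match_ratio`, the window size, and the sample strings are all hypothetical.

```python
import hashlib


def fingerprints(text, size=4):
    """Hash every run of `size` consecutive words.

    Only these hex digests would leave the client; the text itself stays put.
    """
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < size:
        windows = [words]
    else:
        windows = [words[i:i + size] for i in range(len(words) - size + 1)]
    return {
        hashlib.sha256(" ".join(w).encode("utf-8")).hexdigest()
        for w in windows
    }


def match_ratio(client_prints, found_text, size=4):
    """Fraction of the found text's fingerprints that the client registered."""
    found = fingerprints(found_text, size)
    if not found:
        return 0.0
    return len(found & client_prints) / len(found)


if __name__ == "__main__":
    # Hypothetical sensitive record, fingerprinted on the client side.
    registered = fingerprints("account 4417 belongs to jane doe of springfield")
    # Hypothetical text spotted on a paste site.
    paste = "dump: account 4417 belongs to jane doe of springfield enjoy"
    print(match_ratio(registered, paste))
```

One caveat with this sketch: plain hashes of short, guessable strings can be brute forced, so a real system would need something stronger than what is shown here to keep the underlying data private.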

Across the board, data breaches are becoming common and Matchlight offers an automated way to proactively protect data. While the digital fingerprinting helps track down stolen data, does Terbium Labs have a way to prevent it from being stolen at all?

Whitney Grace, June 26, 2015


Digital Reasoning a Self-Described Cognitive Computing Company

June 26, 2015

The article titled Spy Tools Come to the Cloud on Enterprise Tech shows how Amazon’s work with analytics companies on behalf of the government has produced platforms like “GovCloud”, with increased security. The presumed reason for such platforms is intelligence gathering and threat analysis at big data scale. The article explains,

“The Digital Reasoning cognitive computing tool is designed to generate “knowledge graphs of connected objects” gleaned from structured and unstructured data. These “nodes” (profiles of persons or things of interest) and “edges” (the relationships between them) are graphed, “and then being able to take this and put it into time and space,” explained Bill DiPietro, vice president of product management at Digital Reasoning. The partners noted that the elastic computing capability… is allowing customers to bring together much larger datasets.”

For former CIA staff officer DiPietro it logically follows that bigger questions can be answered by the data with tools like the AWS GovCloud and subsequent Hadoop ecosystems. He cites the ability to quickly spotlight and identify someone on a watch list out of the haystack of people as the challenge to overcome. They call the process that allows them to manage and bring together data “cluster on demand.”
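The quoted description of “nodes” and “edges” placed “into time and space” can be pictured with a small data structure. What follows is a toy sketch, not Digital Reasoning’s implementation; the `Edge` and `KnowledgeGraph` classes, their fields, and the sample names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A relationship between two nodes, tagged with a time and a place."""
    source: str
    target: str
    relation: str
    when: str   # ISO date string, e.g. "2015-06-01"
    where: str


class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}                 # node name -> kind ("person", "org", ...)
        self.edges = defaultdict(list)  # node name -> edges touching that node

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, edge):
        for name in (edge.source, edge.target):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name}")
        self.edges[edge.source].append(edge)
        self.edges[edge.target].append(edge)

    def neighbors(self, name, place=None):
        """Names connected to `name`, optionally limited to one place."""
        found = set()
        for e in self.edges.get(name, []):
            if place is None or e.where == place:
                found.add(e.target if e.source == name else e.source)
        return found


if __name__ == "__main__":
    g = KnowledgeGraph()
    g.add_node("Person A", "person")
    g.add_node("Person B", "person")
    g.add_node("Acme Ltd", "org")
    g.add_edge(Edge("Person A", "Person B", "met", "2015-06-01", "Berlin"))
    g.add_edge(Edge("Person A", "Acme Ltd", "paid", "2015-06-03", "Zurich"))
    print(g.neighbors("Person A"))
    print(g.neighbors("Person A", place="Berlin"))
```

Picking one name out of the haystack then becomes a lookup plus a filter on time or place, which is where elastic compute helps once the graph grows large.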

Chelsea Kerwin, June 26, 2015


Story Telling and Search: Smartlogic Fiction

June 25, 2015

One of my two or three readers sent me a link to an article appearing in the Smartlogic Web log. I found the write up unusual. You may want to check it out: Surviving without Content Intelligence? There’s an Elephant in the Room. The first chapter is here.

The approach is to tell a story which explains the value of Smartlogic’s content intelligence approach. I circled this passage in pale blue:

The OLAP cube and MDM solution he’s spent the first half of the year implementing [you can read about it here] is not going to help him with the emails, call records and file system data that he is being asked to include. He’d always known that 80% of an organization’s data was unstructured – he had hoped that they could get away with the 20% that was structured and easily managed. Now he’s got four times more data to work with, and he can’t just shovel it into the CRM system and hope they can deal with it.

The “read about it here” does not link to anything.

If the story resonates with you, Smartlogic may be exactly what you require.

The subhead “Next Week” includes this passage:

The Smartlogic Semaphore Search Application Framework is a tool for rapidly developing search applications that uniquely combine a Semantic Model with commodity tools such as SOLR and the Google Search Appliance, so users are not restricted to keywords, but can search by meaning as well which dramatically improves the user experience. Last, but not least, the Semaphore Classification Server would have allowed Archie to reliably link structured data and unstructured content without being dependent on existing structures and metadata; but that’s a story for next week.

I found one word fascinating, “commodity.” I think of the Google Search Appliance as an expensive way to process large volumes of content. The GSA no longer takes a one size fits all approach, but it is expensive to set up with fail over and customized functions. Solr is an open source solution perched on top of Lucene. A number of companies offer implementations of these open source products. The current stallion winning races is Elastic, but that is not a commodity like diapers.

The “story” is not complete. Part three will become available soon. Stay tuned.

Stephen E Arnold, June 25, 2015

How the Cloud Might Limit SharePoint Functionality

June 25, 2015

In the highly anticipated SharePoint Server 2016, on-premises, cloud, and hybrid functionality are all emphasized. However, some are beginning to wonder if functionality can suffer depending on the type of deployment chosen. Read all the details in the Search Content Management article, “How Does the Cloud Limit SharePoint Search and Integration?”

The article begins:

“All searches are not created equal, and tradeoffs remain for companies mulling deployment of the cloud, on-premises and hybrid versions of Microsoft’s collaboration platform, SharePoint. SharePoint on-premises has evolved over the years with a focus on customization and integration with other internal systems. That is not yet the case in the cloud with SharePoint Online, and there are still unique challenges for those who look to combine the two products with a hybrid approach.”

The article goes on to say that there are certain restrictions, especially with search customization, for the SharePoint Online deployment. Furthermore, a good amount of configuration is required to maximize search for the hybrid version. To keep up to date on how this might affect your organization, and the required workarounds, stay tuned. Stephen E. Arnold is a longtime search professional, and his work on SharePoint is conveniently collocated in a dedicated feed to maximize efficiency.

Emily Rae Aldridge, June 25, 2015
