ttwick Deal Search

April 18, 2015

At lunch on Friday, one of the 20 somethings who gnaw at me like locusts in an Illinois corn field told me about a “revolutionary,” “Google killing” super search system. I listened to the champion’s explanation of semantic search, next generation architectures, yadda yadda.

I navigated to the Web site and learned that there is a demo and an application of the search technology to deals. I allowed the service to “know” my location, which is of modest assistance because we are testing virtual private network vendors, but ttwick seemed happy enough to know I was someplace.

I ran a query, but when I clicked on the “search box,” which displayed my location, this is what the system displayed to me:


The dark vertical panel on the left was difficult for me to read. I am 70 years old, wear trifocals, and have some difficulty discerning pale blue text against a black background. One of the sharp eyed 20 somethings pointed out that the black vertical panel allowed me to click and narrow the results list to entertainment, food, health, and baby along with the catchall Miscellaneous.

I saw a service called Deal Chicken. This strikes me as somewhat similar with the addition of hotlinks to winnow results. I will add the ttwick engine to my list. I do want my abdomen to look just like the one in the Body Allure ad.

Stephen E Arnold, April 18, 2015

The Law of Moore: Is Information Retrieval an Exception?

April 17, 2015

I read “Moore’s Law Is Dead, Long Live Moore’s Law.” The “law” cooked up by a chip company suggests that in technology stuff gets better, faster, and cheaper. With electronic brains getting better, faster, and cheaper, it follows that phones are more wonderful every few weeks. The logic applies to laptops, intelligence in automobiles, and airline related functions.

The article focuses on the Intel-like world of computer parts. The write up makes this point which I highlighted:

From 2005 through 2014, Moore’s Law continued — but the emphasis was on improving cost by driving down the expense of each additional transistor. Those transistors might not run more quickly than their predecessors, but they were often more power-efficient and less expensive to build.

Yep, the cheaper point is significant. The article then tracks to a point that warranted a yellow highlight:

After 50 years, Moore’s Law has become cultural shorthand for innovation itself. When Intel, or Nvidia, or Samsung refer to Moore’s Law in this context, they’re referring to the continuous application of decades of knowledge and ingenuity across hundreds of products. It’s a way of acknowledging the tremendous collaboration that continues to occur from the fab line to the living room, the result of painstaking research aimed to bring a platform’s capabilities a little more in line with what users want. Is that marketing? You bet. But it’s not just marketing.

These two points sparked my thinking about the discipline of enterprise information access. Enterprise search relies on a wide range of computing operations. If these operations are indeed getting better, faster, and cheaper, does it make sense to assume that information retrieval is also getting better, faster, and cheaper?

What is happening from my point of view is that the basic design of enterprise information access systems has not changed significantly in the last decade, maybe longer. There is the content acquisition module, the normalization or transformation module, the indexing module, the query processing module, the administrative module, and other bits and pieces.

The outputs from today’s information access systems do not vary much from the outputs available from systems on offer a decade ago. Endeca generated visual reports by 2003. Relationship maps were available from Inxight and Semio (remember that outfit) even earlier. Smart software like the long forgotten Inference system winnowed results on what the user sought in his or her query. Linguistic functions were the heart and soul of Delphes. Statistical procedures were the backbone of PLS, based on Cornell wizardry.

Search and retrieval has benefited from faster hardware. But the computational burdens piled on available resources have made it possible to layer on function after function. The ability to make layers of content processing and filtering work has done little to ameliorate the grousing about many enterprise search systems.

The fix has not been to deliver a solution significantly different from what Autonomy and Fast Search offered in 2001. The fix has been to shift from what users need to deal with business questions to:

  • Business intelligence
  • Semantics
  • Natural language processing
  • Cognitive computing
  • Metadata
  • Visualization
  • Text analytics.

I know I am missing some of the chestnuts. The point is that information access may be lagging behind certain other sectors; for example, voice search via a mobile device. When I review a “new” search solution, I often find myself with the same sense of wonder I had when I first walked through the Smithsonian Museum: Interesting but mostly old stuff.

Just a thought that enterprise search is delivering less, not “Moore.”

Stephen E Arnold, April 17, 2015

Improving the Preservica Preservation Process

April 17, 2015

Preservica is a leading program for digital preservation, consulting, and research, and now it is compatible with Microsoft SharePoint. ECM Connection has the scoop in “New Version Of Preservica Aligns Records Management And Digital Preservation.” The upgrade to Preservica will allow SharePoint managers to preserve content from SharePoint as well as Microsoft Outlook, a necessary task as most companies these days rely on the Internet for business and need to archive transactions.

Preservica wants to become a bigger part of enterprise system strategies such as enterprise content management and information governance. One of its big selling points is that Preservica will archive information and keep it in a usable format, since obsolescence becomes a bigger problem as technology advances.

“Jon Tilbury, CEO Preservica adds: ‘The growing volume and diversity of digital content and records along with rapid technology and IT refresh rates is fuelling the need for Records and Compliance managers to properly safe-guard their long-term and permanent digital records by incorporating Digital Preservation into their overall information governance lifecycle. The developing consensus is that organizations should consider digital preservation from the outset – especially if they hold important digital records for more than 10 years or already have records that are older than 10 years. Our vision is to make this a pluggable technology so it can be quickly and seamlessly integrated into the corporate information landscape.’ ”

Digital preservation in a compliant format is one of the most overlooked problems companies deal with. They may have stored their records on a storage device, but if they do not retain the technology to access them, then the records are useless. Keeping files in a readable format not only keeps them useful, but also makes life easier for the employee who has to recall them.

Whitney Grace, April 17, 2015
Stephen E Arnold, Publisher of CyberOSINT at

Exorbyte Pivots and Slows Twitter Stream

April 16, 2015

I was doing a routine check of search vendor Web sites. I noticed that Exorbyte, a search vendor recognized as a Deloitte Technology Fast 50 company in 2010, has pivoted from eCommerce to identity resolution. What I find interesting is that there are some similarities with WCC Group’s strategy. That company focuses on the human resource and government approach to human information.

Here’s the new look for the Exorbyte Web site:


Exorbyte, like other search vendors, is responding to market signals for security related functions. Coincident with this shift, Exorbyte slowed its stream of Twitter posts. There is considerable chatter about smart software like IBM Watson (Thomas or Sherlock version?). Exorbyte is another example of a vendor with search as a core function and with a positioning that does not evoke the associations of European enterprise search vendors which have been a source of some consternation.

Stephen E Arnold, April 16, 2015

Mobile Office 365 Usage on the Rise

April 16, 2015

A recent study has found that mobile Office 365 usage is growing quickly. Mobile is a huge consideration for all software companies, and now the data is proving that mobile is the go-to for even heavy-hitting work and enterprise applications. Read more in the AppsTechNews article, “The state of mobile Office 365 usage in the workplace – and what it means for SharePoint.”

The article begins with the research:

“24% of mobile users are now using mobile Office 365 in the cloud, compared to 18% six months ago. Not surprisingly, the most popular activity conducted by business users on mobile devices was online and offline document access, according to 81% of the vote. 7% most frequently use their mobile devices to add a SharePoint site, while 4% prefer to favourite documents for later offline access.”

Retrieval is still the most common mobile function, as devices are still not designed well for efficient input. To keep up with future developments regarding mobile use in the enterprise, stay tuned. Stephen E. Arnold has made a career out of following all things search, and his SharePoint feed is an accessible place to stay tuned in to the latest SharePoint developments.

Emily Rae Aldridge, April 16, 2015

Yahoo: A Portion of Its Fantastical Search History

April 15, 2015

I have a view of Yahoo. Sure, it was formed when I was part of the team that developed The Point (Top 5% of the Internet). Yahoo had a directory. We had a content processing system. We spoke with Yahoo’s David Filo. Yahoo had a vision, he said. We said, No problem.

The Point became part of Lycos, embracing Fuzzy and his round ball chair. Yahoo, well, Yahoo just got bigger and generally went the way of general purpose portals. CEOs came and went. Stakeholders howled and then sulked.

I read or rather looked at “Yahoo. Semantic Search From Document Retrieval to Virtual Assistants.” You can find the PowerPoint “essay” or “revisionist report” on SlideShare. The deck was assembled by the director of research at Yahoo Labs. I don’t think this outfit is into balloons, self driving automobiles, and dealing with complainers at the European Commission. Here’s the link. Keep in mind you may have to sign up with the LinkedIn service in order to do anything nifty with the content.

The premise of the slide deck is that Yahoo is into semantic search. After some stumbles, semantic search started to become a big deal with Google and rich snippets, Bing and its tiles, and Facebook with its Like button and the magical Open Graph Protocol. The OGP has some fascinating uses. My book CyberOSINT can illuminate some of these uses.

And where is Yahoo in the 2008 to 2010 interval when semantic search was abloom? Patience, grasshopper.

Yahoo was chugging along with its Knowledge Graph. If this does not ring a bell, here’s the illustration used in the deck:


The date is 2013, so Yahoo has been busy since Facebook, Google, and Microsoft were semanticizing their worlds. Yahoo has a process in place. Again from the slide deck:


I was reminded of the diagrams created by other search vendors. These particular diagrams echo the descriptions of the now defunct Siderean Software server’s set up. But most content processing systems are more alike than different.

The Evolution of SharePoint Online Collaboration

April 14, 2015

SharePoint Online is quickly playing catch up to the on-premises version, but the fact that they weren’t identical from the start is still perplexing. Tech Target explores the topic further in their article, “Following the SharePoint Online Collaboration Evolution.”

The article sums up the current situation:

“To an outsider, it would appear that SharePoint would have been the perfect one-to-one on-premises and cloud server option, considering it’s a Web-based option. However, it’s more complex than a move in data center location that’s local to Microsoft. And in terms of development, much of the effort has gone into the option that will drive the migration to Office 365 and the revenue from such a move, which is Exchange Online.”

Hybrid enablement is one area that SharePoint 2016 watchers are keeping a close eye on, as part of an overall focus on bringing more Office 365 experiences to on-premises customers. On the other side of the coin, certain online features are being strengthened by their reliance on SharePoint on-site under the hood. Look for Delve, Office 365, and OneDrive for Business among others. Overall, the future of SharePoint is exciting but still coming into focus. Keep an eye on the Web service run by longtime search expert Stephen E. Arnold. His SharePoint feed will make additional SharePoint news accessible as it becomes available.

Emily Rae Aldridge, April 14, 2015

Apple and App Search: Maybe a New Approach Will Work

April 13, 2015

I remember looking for a teleprompter app via my iPad. I used the Apple store and punched in the query “teleprompter.” I got some hits, but the information returned forced me to download apps, test them, and then do some poking around on message boards.

The finding part of the Apple app search worked okay. It did nothing to reassure me that I was not overlooking an app presented with different terms used to describe what I needed: A way to display a script on an iPad. The most important feature I needed was simply not findable via the Apple search system. Run this query: “Support for Wi Drive.” Let me know how that works out for you.

I read “Report: Apple Acquired Startup Ottocat for Its App Store Search Technology.” The important point is that Apple is now taking a look at its existing technology and reaching what I perceive as a pragmatic decision: Buy something that maybe sort of works.

According to the write up:

Ottocat’s technology allows the app shopper to use increasingly specific search terms to zero in on the right app. The technology also adds some metadata around the app listing — things like star ratings and percentile rankings. Ottocat also created tools for app developers to get their apps in front of just the right kind of user.

Will it work? Who knows but I hope so. The iPad’s been around with its many apps for five years. Speed is relative but not precision and recall.

Stephen E Arnold, April 13, 2015


Medical Search: A Long Road to Travel

April 13, 2015

Do you want a way to search medical information without false drops, without the need to learn specialized vocabularies, and without Boolean? Apparently the purveyors of medical search systems have left users scratching without an antihistamine within reach.

Navigate to Slideshare (yep, LinkedIn) and flip through “Current Advances to Bridge the Usability Expressivity Gap in biomedical Semantic Search.” Before reading the 51 slide deck, you may want to refresh yourself with Quertle, PubMed, MedNar, or one of the other splendiferous medical information resources for researchers.

The slide deck identifies the problems with the existing search approaches. I can relate to these points. For example, those who tout question answering systems ignore the difficulty of passing a question from medicine to a domain consisting of math content. With math the plumbing in many advanced medical processes, the weakness is a bit of a problem and has been for decades.

The “fix” is semantic search. Well, that’s the theory. I interpreted the slide deck as communicating how a medical search system called ReVeaLD would crack this somewhat difficult nut. As an aside: I don’t like the wonky spelling that some researchers and marketers are foisting on the unsuspecting.

I admit that I am skeptical about many NGIA or next generation information access systems. One reason medical research works as well as it does is its body of generally standardized controlled terms. Learn MeSH and you have a fighting chance of figuring out if the drug the doctor prescribed is going to kill off your liver as it remediates your indigestion. Controlled vocabularies in scientific, technology, engineering, and medical domains address the annoying ambiguity problems encountered when one mixes colloquial words with quasi consultant speak. A technical buzzword is part of a technical education. It works, maybe not too well, but it works better than some of the wild and crazy systems which I have explored over the years.

You will have to dig through old jargon and new jargon such as entity reconciliation. In the law enforcement and intelligence fields, an entity from one language has to be “reconciled” with versions of the “entity” in other languages and from other domains. The technology is easier to market than make work. The ReVeaLD system is making progress as I understand the information in the slide deck.

Like other advanced information access systems, ReVeaLD has a fair number of moving parts. Here’s the diagram from Slide 27 in the deck:


There is also a video available at this link. The video explains that Granatum Project uses a constrained domain specific language. So much for cross domain queries, gentle reader. What is interesting to me is the similarity between the ReVeaLD system and some of the cyber OSINT next generation information access systems profiled in my new monograph. There is a visual query builder, a browser for structured data, visualization, and a number of other bells and whistles.

Several observations:

  • Finding relevant technical information requires effort. NGIA systems also require the user to exert effort. Finding the specific information required to solve a time critical problem remains a hurdle for broader deployment of some systems and methods.
  • The computational load for sophisticated content processing is significant. The ReVeaLD system is likely to suck up its share of machine resources.
  • Maintaining a system with many moving parts when deployed outside of a research demonstration presents another series of technical challenges.

I am encouraged, but I want to make certain that my one or two readers understand this point: Demos and marketing are much easier to roll out than a hardened, commercial system. Like the EC’s Promise program, ReVeaLD may have to communicate its achievements to the outside world. A long road must be followed before this particular NGIA system becomes available in Harrod’s Creek, Kentucky.

Stephen E Arnold, April 13, 2015

Spelling Suggestions via the Bisect Module

April 13, 2015

I know that those who want to implement their own search and retrieval systems learn that some features are tricky to implement. I read “Typos in Search Queries at Khan Academy.”

The author states:

The idea is simple. Store a hash of each word in a sorted array and then do binary search on that array. The hashes are small and can be tightly packed in less than 2 MB. Binary search is fast and allows the spell checking algorithm to service any query.
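For readers who want to see the shape of the idea, here is a minimal sketch in Python using the standard library's bisect module. It is not the Khan Academy code. The function names, the truncated MD5 hash, and the generation of candidate spellings one edit away are all my assumptions for illustration; the quoted passage specifies only hashes in a sorted array and binary search.

```python
import bisect
import hashlib


def word_hash(word):
    """Hypothetical helper: a 32-bit hash of the lowercased word.

    Truncated MD5 is an assumption; any stable hash would do.
    """
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_index(words):
    """Store only the hashes, sorted, so binary search can be used."""
    return sorted({word_hash(w) for w in words})


def is_known(index, word):
    """Binary search for the word's hash in the sorted index."""
    h = word_hash(word)
    i = bisect.bisect_left(index, h)
    return i < len(index) and index[i] == h


def edits1(word):
    """All strings one edit away (assumed candidate generator)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(index, word):
    """Return the word if known, else known candidates one edit away."""
    if is_known(index, word):
        return [word]
    return sorted(c for c in edits1(word) if is_known(index, c))


index = build_index(["algebra", "geometry", "calculus"])
print(suggest(index, "algebr"))
```

Because only hashes are stored, a rare collision can make a misspelling look like a known word. That is the trade made for the small memory footprint. A production version might pack the hashes into a typed array rather than a Python list.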

What is not included in the write up is detail about the time required and the frustration experienced to implement what some senior managers assume is trivial. Yep, search is not too tough when the alleged “expert” has never implemented a system.

With education struggling to teach the three Rs, the need for software that caulks the leaks in users’ ability to spell is a must have.

Stephen E Arnold, April 13, 2015
