App Enables Improved Search on Twitter

February 4, 2015

The review on KillerStartups titled “Finally! An Effective Way to Filter Twitter!” discusses the app and its algorithm for sorting through the noise on Twitter. Unlike Facebook, the article notes, Twitter has avoided filters, opting for the chaos of every tweet for itself. Beyond following specific conversations or searching by hashtag, there are few effective methods for organizing and finding relevant tweets. The app offers a solution:

“[The app] not only presents the most timely topics front and center on both their mobile-optimized site and app but also lets you search for topics that interest you, again presenting the most relevant tweets before the general jibber-jabber. It’s a great solution for anyone who wants to keep up on the conversations around current events but for whom even the thought of opening Twitter’s main feed makes them sigh with frustration.”

This would improve on the hashtag search function, which still presents a mess of tweets. The app’s search algorithm promises to bring the more relevant tweets to the forefront. Another plus for many Twitter users: the app is not limited to the United States. It allows the user to pick among the US, the UK, the Netherlands, Spain, and Mexico. The article surmises that this list will grow as the app becomes more popular.

Chelsea Kerwin, February 04, 2015

Sponsored by, developer of Augmentext

Apache Solr Search NoSQL Search Shines Solo

February 3, 2015

Apache Solr is an open source enterprise search engine that is used with relational databases and Hadoop. ZDNet’s article, “Why Apache Solr Search Is On The Rise And Why It’s Going Solo,” explores why its lesser-known use as a NoSQL store might explode in 2015.

At the beginning of 2014, most Solr deployments were using it in the old-fashioned way, but 2015 shows that fifty percent of the pipeline is now using it as a first-class data store. Companies are upgrading their old file intranets for the enterprise cloud. They want the upgraded system to be searchable, and they are relying on Solr to get the job done.

Search is more complex than basic NoSQL and needs something more robust to handle the new data streams. Solr adds the extra performance level, so users have access to their data and nothing is missing.

“‘So when we talk about Solr, it’s all your data, all the time at scale. It’s not just a guess that we think is likely the right answer. ‘We’re going to go ahead and push this one forward’. We guarantee the quality of those results. In financial services and other areas where guarantees are important, that makes Solr attractive,’ [CEO Will Hayes of LucidWorks, Apache Solr’s commercial sponsor] said.”

It looks like anything is possible for LucidWorks in the coming year.

Whitney Grace, February 03, 2015

A Microsoft Azure How to PHP Search

February 2, 2015

Microsoft Azure is a cloud computing platform and infrastructure that has a variety of functions. If you want to hook up Microsoft Azure Search to your PHP Web site and are at a loss about what to do, then you need to check out this MSDN blog by Nick J Trogh. On the simply titled Nick’s Blog, Trogh writes about “all things technical about the Microsoft platform.” He recently posted a guide about how to integrate the Azure Search service into a PHP Web site and take advantage of advanced search techniques.

Trogh does not complicate the installation process and includes screenshots for easy reference. He ends with two last pieces of advice:

“In this article we’ve gone through adding search as a service using Azure Search to your PHP website.  In a matter of minutes you can get started and provide your users with a complex search functionality. And as your site gets more traffic, you can easily scale out your search service. Make sure to get started with the Azure Search service and also try out the other application, data and infrastructure services in the Microsoft Azure platform. You can get started for free on Azure or activate your MSDN Azure benefits.”

Azure is turning out to be a decent cloud service and much more favored than Windows 8. It is rare to see that Microsoft fans are justified in their praise for Windows.

Whitney Grace, February 02, 2015

IDC and BA Insight: Cartoons and Keyword Search

January 31, 2015

I kid you not. I received a spam mail from an outfit called BA Insight. The spam was a newsletter published every three months. You know that regular flows of news are what ring Google’s PageRank chimes, right?

Here’s the missive:


The lead item is an invitation to:

Unstructured content – email, video, instant messages, documents and other formats accounts for 90% of all digital information.

View the IDC Infographic:
Unlock the Hidden Value of Information

With my fully protected computer, I boldly clicked on the link. I don’t worry too much about keyword search vendors’ malware, but prudence is a habit my now deceased grandma drummed into me.

Here’s what greeted me:


Yep, a giant infographic cartoon stuffed with assertions and a meaningless chunk of jargon: knowledge quotient. Give me cyber OSINT any day.

The concept presented in this fascinating marketing play is that unstructured content has value waiting to be delivered. I learned:

This content is locked in variety locations [sic] and applications made up of separate repositories that don’t talk to each other—e.g., EMC Documentum,, Google Drive, SharePoint, et al.

Now it looks to me as if the word “of” has been omitted between “variety locations”. I also think that EMC Documentum has a new name. Oh, well. Let’s move on.

The key point in the cartoon is that “some organizations can and do unlock information’s hidden value. Organizations with a high knowledge quotient.”

I thought I addressed this silly phrase in this write up.

Let me be clear. IDC is the outfit that sold my information on Amazon without my permission. More embarrassing to me was the fact that the work was attributed to a fellow named Dave Schubmehl, who is one of the, if not the premier, IDC search expert. Scary I believe. Frightful.

What’s the point?

The world of information access has leapfrogged outfits like BA Insight and “experts” like IDC’s pride of pontificators.

The future of information access is automated collection, analysis, and reporting. You can learn about this new world in CyberOSINT: Next Generation Information Access. No cartoons but plenty of screenshots that show what the outputs of NGIA systems deliver to users who need to reduce risk and make decisions of considerable importance and time sensitivity.

In the meantime, if you want cartoons, flip through the New Yorker. More intelligent fare I would suggest.

How do you become a knowledge quotient leader? In my opinion, not by licensing a keyword search system or buying information from an outfit that surfs on my research. Just a thought.

Stephen E Arnold, January 31, 2015

IBM Flogs Watson as a Lawyer and a Doctor

January 30, 2015

After the disappointing and somewhat negative IBM financial reports, the Watson PR machine has lurched into action. Watson, as you may know, is the next big thing in content processing. Lucene plus home brew code converts search into an artificial intelligence powerhouse. Well, that’s what the Watson cheerleaders want me to believe. I wonder if cheerleading correlates with making sales of more than $1 billion in the next quarter or two or three or four or five.

I read two news items. One is indicative of the use of Watson on a bounded content set, not the big, wide, wonderful world of real time data flows. The other is somewhat troubling but not particularly surprising.

To business.

IBM Watson is now a lawyer. Navigate to “Meet Ross, the IBM Watson-Powered Lawyer.” The idea is that systems from LexisNexis and Thomson Reuters are not what lawyers or the thrifty legal searcher wants. Nope, Watson converts to a lawyer more easily than a graduate of a third tier law school chases accident victims. According to the write up:

University of Toronto team launches a cognitive computing application that helps lawyers conduct world-class case research.

If I understand the write up, Watson is a search system equipped with the magical powers that allowed the machine and software to win a TV game show. Is post production allowed in the court room? I know that post plays a part in prime time TV. Just asking.

A couple of thoughts. The current line up of legal research systems is struggling to keep revenues and make profits. The reason for the squeeze is that law firms are having some difficulty returning to the salad days of the LingTemcoVought era. Lawyers are getting fired. Lawyers are suing law schools with allegations of false advertising about the employment picture for the newly minted JDs. Lawyers are becoming human resource, public relations, and school counselors. Others are just quitting. I know one Duke Law lawyer who has worked at several of the world’s most highly regarded law firms. Know what the Duke Law degree is doing for money? Running a health club. Interesting development for those embarking on a law degree.

Will Watson generate significant revenue and a profit from its legal research prowess? The answer, in my opinion, is, “No.” What is going to happen is that efficacy of Watson’s usefulness on a bounded set of legal content can be compared to the outputs from the smart system offered by Thomson Reuters and the decidedly less smart system from LexisNexis. For an academic, this comparison will be an endless source of reputational zoomitude. For the person needing legal advice, hire an attorney. These folks advertise on TV now and offer 24×7 hotlines and toll free numbers.

The second item casts a shadow over my skeptical and extremely tiny intellectual capability. Navigate to “This Medical Supercomputer Isn’t a Pacemaker, IBM Tells Congress.” Excluding classified and closed hearings about next generation intelligence systems, this may be the first time a Lucene recycler is pitching Congress about search and retrieval. The write up says:

The effort to protect decision support tools like Watson from Food and Drug Administration regulation is part of a proposal by the Republican chairman of the House Energy and Commerce Committee, Michigan’s Fred Upton. Called the 21st Century Cures initiative, it’s a major overhaul in the pharmaceutical and medical-device world, and the possibility of its passage is boosted by Republican control of both chambers of Congress. Upton’s bill would give the FDA two years to come up with a verification process for what it calls “medical software.” Such programs wouldn’t require the strict approval process faced by makers of medical devices like heart stents. Another set of products defined as “health software” wouldn’t require FDA oversight at all.

I think an infusion of US government money will provide some revenue to the game show winner. Go for it. Remember I used to work at Halliburton Nuclear and Booz, Allen & Hamilton. But in terms of utility I think that if the Golden Fleece Award were still around, Watson might get a quick look by the 20 somethings filtering the government funding of interesting projects.

Net net: Watson is going to have to vie with HP Autonomy for the billions in revenue from their content processing technologies. Perhaps IBM should take a closer look at i2 and Cybertap? Those IBM owned content processing systems may deliver more value than the keyword centric, super smart Watson system. Just a suggestion from rural Kentucky.

The gray side of the cloud is that IBM may actually get government money. Will Watson bond with Mr. Obama’s health programs? That is an exciting notion.

Stephen E Arnold, January 30, 2015

AI Everywhere: Inevitable, Ubiquitous Just Like the Internet

January 28, 2015

I enjoy Sillycon Valley conflation. We have the wonderful world of artificial intelligence. Consider Wired magazine’s “From Science Fiction to Reality: The Evolution of Artificial Intelligence.”

I learned:

There is so much potential for AI development that it’s getting harder to imagine a future without it. We’re already seeing an increase in workplace productivity thanks to AI advancements. By the end of the decade, AI will become commonplace in everyday life, whether it’s self-driving cars, more accurate weather predictions, or space exploration. We will even see machine-learning algorithms used to prevent cyber terrorism and payment fraud, albeit with increasing public debate over privacy implications. AI will also have a strong impact in healthcare advancements due to its ability to analyze massive amounts of genomic data, leading to more accurate prevention and treatment of medical conditions on a personalized level.

Yep, technology potential. When I worked at Halliburton Nuclear Utility Services in the early 1970s, that nuclear power thing had potential and still does. Think thorium today. Online information access had potential when SDC Orbit and Dialog made it easy to find citations to journal articles. Ah, potential. Think about a Bing and Google query and how much value the results list delivers.

I am okay with search results that generate ad revenue based on filtered and mostly subjective methods for determining relevance. I am okay with smart phones, smart anything really.

What is interesting to me is the assertion by Google that the Internet will just be part of the environment and invisible. If you can’t see it, and you can use information informed by smart software, life will be wonderful.

The thought I have is, “Are the Sillycon Valley wizards conflating smart software and ubiquity without explaining the implications of this happy union?”

My hunch is that what is obvious to them is that control defaults to the “owner” of the ubiquitous systems chewing through routines informed by smart software. The user may be at a disadvantage.

Just run a query and let me know if you can identify what’s missing, what’s incorrect, and what’s an ad? Do you care? I do.

Stephen E Arnold, January 28, 2015

Smartlogic Brand Has Semantic Nibblers Munching Away

January 25, 2015

I flipped through my Overflight about Smartlogic and noticed that the company has dropped off the radar in terms of the information services I monitor. A bit of investigation revealed the type of challenge that “Brainware” and “Thunderstone” faced; namely, other companies pick up the “name” and apply it to other services. Brainware found itself sucked into a vortex of unsavory links on YouTube, and Thunderstone has become enmeshed in game references. With truncation and soundex routines, near matches are included in results lists.
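To make the soundex point concrete, here is a minimal sketch of the classic American Soundex coding in Python. It is an illustration only: which phonetic or truncation routines any particular search service actually runs is not stated here, and the function name `soundex` and the sample names are my own choices. The idea is that names which sound alike collapse to the same four-character key, so a query for one can pull in the other.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex key for a name.

    Illustrative sketch; real search engines may use other phonetic
    schemes or tuned variants.
    """
    # Consonant groups that share a digit in classic Soundex.
    groups = (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    key = letters[0]                  # the first letter is kept as-is
    prev = codes.get(letters[0], "")  # code of the last significant letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                  # H and W do not break a run of equal codes
        code = codes.get(ch, "")      # vowels and Y have no code
        if code and code != prev:
            key += code
        prev = code                   # a vowel resets the run
    return (key + "000")[:4]          # pad or truncate to four characters


if __name__ == "__main__":
    for word in ("Thunderstone", "Thunderstorm", "Smartlogic", "Smart Logic"):
        print(word, soundex(word))
```

Under this coding, "Thunderstone" and "Thunderstorm" share a key, which is the kind of near match that can bury a vendor's name under unrelated results.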

Smartlogic, an indexing software company, finds its “name” used by:

Smartlogic has a blog called “Life with Semaphore,” but it too can be difficult to find. The dates of the last four posts are January 19, 2015, November 28, 2014, November 19, 2014, and October 6, 2014. The frequency suggests to some indexing services that frequent spidering is not required.
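The posting frequency can be put in numbers with a few lines of Python. This is a back-of-the-envelope sketch using only the four dates listed above; the helper name `gaps_in_days` is hypothetical, and how any given indexing service turns such gaps into a crawl schedule is an assumption I am not making here.

```python
from datetime import date

# The four most recent post dates cited in the text.
post_dates = [
    date(2015, 1, 19),
    date(2014, 11, 28),
    date(2014, 11, 19),
    date(2014, 10, 6),
]


def gaps_in_days(dates):
    """Return the number of days between consecutive posts, oldest first."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


gaps = gaps_in_days(post_dates)
average_gap = sum(gaps) / len(gaps)

print("Days between posts:", gaps)
print("Average gap in days:", average_gap)
```

With gaps measured in weeks rather than days, a spider that adapts its revisit rate to observed change would have little reason to come back often.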

From a practical point of view, how does a potential customer looking for an indexing system get to the “right” Smartlogic? Without effort, a “name” can be eroded. Depending on the company’s circumstances, this can be a good thing (Brainware is now part of Lexmark, a printer company) or a not so good thing (Thunderstone’s John Turnbull is posting search related information on LinkedIn fora).

Smartlogic’s name erosion is an indication that content processing vendors can lose control of a “name” without care and feeding of the digital indexing systems. Are there fixes to brand erosion? Yep, use Augmentext techniques and keep messaging on point with appropriate brand cultivation.

Stephen E Arnold, January 25, 2015

Exalead Chases Customer Support

January 16, 2015

On Exalead’s blog, in the post “Build Customer Interaction For Tomorrow,” the company examines how startups, such as Airbnb, Uber, online banks, and others dedicated to services, have found success. The reason is that they have made customer service a priority through the Internet and through applications that make customer service an easy experience. This allowed the startups to enter the oversaturated market and become viable competition.

They have been able to make customer service a priority, because they have eliminated the barriers that come between clients and the companies.

“First of all, they have to communicate with agility inside the company. When you have numerous colleagues, all specialized in a particular function, the silos have to break down. Nothing can be accomplished without efficient cooperation between teams. The aim: transform internal processes and then boost customer interaction.

Next, external communication, headed by the customer. Each firm has to know its clients in order to respond to their needs. The first step was to develop Big Data technologies. Today we have to go further: create a real 360° view of the customer by enriching data. It’s the only way to answer customer challenges, especially in the multi-channel era.”

The startups have changed the tired, old business model that has been used since the 1980s. The 1980s was solid for the shoulder pads and Aqua Net along with the arguably prosperous economy, but technology and customer relations have changed. Customers want to feel like they are not just another piece of information. They want to connect with a real person and have their problems resolved. New ways to organize information and harness data provide many solutions for customer service, but there are still industries that are forgetting to make the customer the priority.

Whitney Grace, January 16, 2015

Personalizing Search: A Good Thing?

January 13, 2015

Here’s a passage I noted from “Computers Know You Better Than Your Spouse or Siblings”:

“Big Data and machine-learning provide accuracy that the human mind has a hard time achieving, as humans tend to give too much weight to one or two examples, or lapse into non-rational ways of thinking,” he said. Nevertheless, the authors concede that detection of some traits might be best left to human abilities, those without digital footprints or dependent on subtle cognition.

That pesky human characteristic of behavior shifts to match social context is just so annoying.

Search personalization is better than human-directed search. Right? Think about your answer before it is filtered.

Stephen E Arnold, January 14, 2015

Did You Know Oracle and WCC Go Beyond Search?

January 10, 2015

I love the phrase “beyond search.” Microsoft uses it, working overtime to become the go-to resource for next generation search. I learned that Oracle also finds the phrase ideal for describing the lash up of traditional database technology, the decades old Endeca technology, and the Dutch matching system from WCC Group.

You can read about this beyond search tie up in “Beyond Search in Policing: How Oracle Redefines Real time Policing and Investigation—Complementary Capabilities of Oracle’s Endeca Information Discovery and WCC’s ELISE.”

The white paper explains in 15 pages how intelligence led policing works. I am okay with the assertions, but I wonder if Endeca’s computationally intensive approach is suitable for certain content processing tasks. The meshing of matching with Endeca’s outputs results in an “integrated policing platform.”

The Oracle marketing piece explains ELISE in terms of “Intelligent Fusion.” Fusion is quite important in next generation information access. The diagram explaining ELISE is interesting:


Endeca’s indexing makes use of the MDex storage engine, which works quite well for certain types of applications; for example, bounded content and point-and-click access. Oracle shows this in terms of Endeca’s geographic output as a mash up:


For me, the most interesting part of the marketing piece was this diagram. It shows how two “search” systems integrate to meet the needs of modern police work:


It seems that WCC’s technology, also used for matching candidates with jobs, looks for matches and then Endeca adds an interface component once the Endeca system has worked through its computational processes.

For Oracle, ELISE and Endeca provide two legs of Oracle’s integrated police case management system.

Next generation information access systems move “beyond search” by integrating automated collection, analytics, and reporting functions. In my new monograph for law enforcement and intelligence professionals, I profile 21 vendors who provide NGIA. Oracle may go “beyond search,” but the company has not yet penetrated NGIA, next generation information access. More streamlined methods are required to cope with the type of data flows available to law enforcement and intelligence professionals.

For more information about NGIA, navigate to

Stephen E Arnold, January 10, 2015
