
SRCH2: Security and Speed

October 12, 2014

Oracle’s Secure Enterprise Search offered advanced security. Perfect Search stressed its speed. SES has been marginalized. That particular security pitch did not work. Perfect Search also has faded from the scene.

Perhaps pitching both security and speed will yield more together than as separate features.

SRCH2 asserts that it is four times faster than open source search engines. None of the open source search engines is a speed demon. Speed boosts require additional work on the specific subsystem introducing the latency for a particular deployment.

SRCH2’s “Real Time Computer Requires Faster Search” makes a case for the optimization built into SRCH2’s system. The article states:

SRCH2 offers the world’s fastest search engine. Why is speed so important? After all, the human eye can’t detect the difference between a 10-millisecond and 50-millisecond response time.

Some data backing this assertion would be helpful. In a direct comparison of Lucid Works’ technology with ElasticSearch’s technology, the ArnoldIT team found that one was faster in indexing and the other was faster in query processing. Both could be improved with focused optimization. Perhaps SRCH2 will share some of the data which back up the “four times faster” claim? (I am not at liberty to release the performance data a client requested my team compile from live tests on my test corpus.)

SRCH2 makes its security case in “SRCH2 Introduces Access Control Lists to Improve Search Security.” The article states:

SRCH2 took the approach of providing native support of access control to set restrictions on search results. With SRCH2’s ACL feature, developers can restrict user permissions to access either certain records in an index, or specific attributes within a record or set of records.

The approach is useful. However, it is less robust than the Oracle approach, which implemented a wider range of features provided by specialized Oracle subsystems.
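For readers who want to picture record-level and attribute-level restrictions, here is a minimal sketch in Python. It is not SRCH2’s API. The `Record` class, the role names, and the `search` function are all hypothetical, and the substring match stands in for a real query engine. The one design point worth noting is that the access check runs before matching, so a query cannot confirm the contents of an attribute the user is not allowed to see.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One indexed record. All names are illustrative, not SRCH2's API."""
    record_id: str
    attributes: dict
    allowed_roles: set = field(default_factory=set)      # record-level ACL
    attribute_roles: dict = field(default_factory=dict)  # attribute-level ACL


def visible_view(record, user_roles):
    """Return the attributes a user may see, or None if the record is hidden."""
    if not (record.allowed_roles & user_roles):
        return None
    view = {}
    for name, value in record.attributes.items():
        required = record.attribute_roles.get(name)
        # An attribute without its own ACL follows the record-level decision.
        if required is None or (required & user_roles):
            view[name] = value
    return view


def search(index, term, user_roles):
    """Apply the ACL first, then match only against attributes the user can see."""
    needle = term.lower()
    results = []
    for record in index:
        view = visible_view(record, user_roles)
        if view is None:
            continue
        if any(needle in str(value).lower() for value in view.values()):
            results.append((record.record_id, view))
    return results


if __name__ == "__main__":
    index = [
        Record("r1", {"title": "Budget memo", "salary": "90000"},
               allowed_roles={"staff", "hr"},
               attribute_roles={"salary": {"hr"}}),
        Record("r2", {"title": "Budget forecast"}, allowed_roles={"finance"}),
    ]
    print(search(index, "budget", {"staff"}))
    print(search(index, "90000", {"hr"}))
```

A production engine would push these checks into the index itself rather than filtering hits one at a time, which is presumably where the “native support” in the SRCH2 quote matters.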

Will the combination of security and speed pay off for SRCH2? Good question. I do not have an answer.

Stephen E Arnold, October 11, 2014

Predictions For Real Time Technology

March 10, 2014

Phil Leggetter is a real time software and developer evangelist and on his blog he wrote a post entitled, “10 Real Time Web Technology Predictions For 2014.” He says in the post that he based his 2014 predictions on trends in 2013 and what has happened so far in 2014.

He notes that nearly all applications have a real time sync in their code for relevancy and that real time is becoming a common commodity. This means that real time fixtures will be included in frameworks, but it will not diminish their importance. One can expect to see more real time APIs, increasing API offerings and adding to their values, and WebHooks will gain more prominence.

Leggetter mentions that open source needs a data sync solution, which comes as a surprise because there is an open source program for nearly everything. Why has this not been made yet?

Video and audio communication are getting even bigger. Real time video and data communication is going to be even more important for applications, and it might be time to check out peer-to-peer data sharing. Even better, real time developer tools are on the horizon. Leggetter writes:

“The next 10 months of 2014 is going to be very exciting for real time web technology, real time solution providers, real time hosted services, and more importantly for us developers. I expect some serious advancements in existing solutions and some new players to come along. Real time web technology is going to become even easier to integrate into existing applications and we’re going to have a much wider range of choice when building real time apps from the ground up.”

Will real time technology be the buzzword trend this year? Again, these are only predictions.

Whitney Grace, March 10, 2014
Sponsored by, developer of Augmentext

Dr. Jerry Lucas: Exclusive Interview with TeleStrategies ISS Founder

January 14, 2013

Dr. Jerry Lucas, founder of TeleStrategies, is an expert in digital information and founder of the ISS World series of conferences. “ISS” is shorthand for “intelligence support systems.” The scope of Dr. Lucas’ interests ranges from the technical innards of modern communications systems to the exploding sectors for real time content processing. Analytics, fancy math, and online underpin Dr. Lucas’ expertise and form the backbone of the company’s training and conference activities.

What makes Dr. Lucas’ viewpoint of particular value is his deep experience in “lawful interception, criminal investigations, and intelligence gathering.” The perspective of an individual with Dr. Lucas’ professional career offers an important and refreshing alternative to the baloney promulgated by many of the consulting firms explaining online systems.

Dr. Lucas offered a more “internationalized” view of the Big Data trend which is exercising many US marketers’ and sales professionals’ activities. He said:

“Big Data” is an eye catching buzzword that works  in the US. But as you go east across the globe, “Big Data” as a buzzword doesn’t get traction in the Middle East, Africa and Asia Pacific Regions if you remove Russia and China. One interesting note is that Russian and Chinese government agencies only buy from vendors based in their countries. The US Intelligence Community (IC) has big data problems because of the obvious massive amount of data gathered that’s now being measured in zettabytes.  The data gathered and stored by the US Intelligence Community is growing beyond what typical database software products can handle as well as the tools to capture, store, manage and analyze the data. For the US, Western Europe, Russia and China, “Big Data” is a real problem and not a hyped up buzzword.

Western vendors have been caught in the boundaries between different countries’ requirements. Dr. Lucas observed:

A number of western vendors made a decision because of the negative press attention to abandon the global intelligence gathering market. In the US Congress Representative Chris Smith (R, NJ) sponsored a bill that went nowhere to ban the export of intelligence gathering products period. In France a Bull Group subsidiary, Amesys legally sold intelligence gathering systems to Libya but received a lot of bad press during Arab Spring. Since Amesys represented only a few percent of Bull Group’s annual revenues, they just sold the division. Amesys is now a UAE company, Advanced Middle East Systems (Ames). My take away here is governments particularly in the Middle East, Africa and Asia have concerns about the long term regional presence of western intelligence gathering vendors who desire to keep a low public profile. For example, choosing not to exhibit at ISS World Programs. The next step by these vendors could be abandoning the regional marketplace and product support.

The desire for federated information access, based on the vendors’ marketing efforts, is high. Dr. Lucas made this comment about the existence of information silos:

Consider the US where you have 16 federal organizations collecting intelligence data plus the oversight of the Office of Director of National Intelligence (ODNI). In addition there are nearly 30,000 local and state police organizations collecting intelligence data as well. Data sharing has been a well identified problem since 9/11.  Congress established the ODNI in 2004 and funded the Department of Homeland Security to set up State and Local Data Fusion Centers.  To date Congress has not been impressed.  DNI James Clapper has come under intelligence gathering fire over Benghazi and the DHS has been criticized in an October Senate report that the $1 Billion spent by DHS on 70 state and local data fusion centers has been an alleged waste of money. The information silo or the information stovepipe problem will not go away quickly in the US for many reasons.  Data cannot be shared because one agency doesn’t have the proper security clearances, job security which means “as long as I control access the data I have a job,” and privacy issues, among others.

The full text of the 2011 interview with Dr. Lucas is at this link. Stephen E Arnold interviewed Dr. Lucas on January 10, 2013. The full text of the exclusive interview is available on the subsite “Search Wizards Speak.”

Worth reading.

Donald Anderson, January 14, 2013

Real Life Alerts Show There is More to Search than Key Words

July 12, 2012

AtHoc joined forces with Intel and received a $5.6 million investment to improve their technology. Since they are the leader in enterprise-class, network-based mass notification systems for the security, life safety and defense sectors of the United States, one would have to agree that was a wise investment.

Contrary to some beliefs, there is more to search than key words. The recent press release on AtHoc’s page, “Intel Invests in AtHoc; Chairman of RSA Security Joins AtHoc’s Board,” is a reminder that the growing number of devices demands better handling of critical situational awareness data. Organizations must be able to swiftly analyze and address anomalies because lives may depend on it.

AtHoc does just that with real life, real time alerts as stated:

“AtHoc helps organizations become fully prepared to provide emergency mass communication to all of its constituents. It allows users to provide additional data and responders to remediate the issues at hand, based on the information they receive. AtHoc improves the safety and security of our citizens, first responders, and armed forces personnel around the world.”

Just imagine attempting to get a real time response on the average search engine during an emergency. The repercussions of scanning pages of possible aid would almost assuredly be life threatening. When considering the outcome from that perspective, real life, real time alerts show there is more to search than key words.

Jennifer Shockley, July 12, 2012

Sponsored by Polyspot

Twitter: A Long Road to Travel in Search

June 19, 2012

Twitter has been a success in San Francisco, Silicon Valley, and in the cheerful world of those steeped in real time information. The reality is that Twitter generates a great deal of information in relatively context free outputs. I think of those outputs as an opportunity, but the reality is that the volume of information and the challenge of finding a gem amidst the gravel is a big one.

Twitter seems to be making a step forward. Online Media Daily’s “Twitter Hires LinkedIn Pro to Improve Real Time Search, Ads” informed us at Beyond Search that John Wang, a search and open source wizard, is joining the tweeters.


What is missing? The information displayed via Twitter search is useful. Ads in context to the context free messages are not evident. Is this the gap which Twitter will move to fill? We think that for Twitter at this time, advertising revenue is more important than recall and precision.

The search challenge is not one which can be resolved overnight. The fix for the context data is not going to be easy. Did I mention the brevity of the tweets and the volume? If not, both will require thought and money to resolve. When content flows in high volume, the red ink is like the water behind one of those soil dams in the Netherlands. Vigilance and creativity are needed along with luck, money, and an infrastructure which can adapt to avoid a cost problem.

Our view is that better search (whatever that means) is a nice to have. The must have at Twitter will be advertising. However, our hope is that search is defined more in terms of making Twitter information useful. We will watch the evolution of the result pages and the ads on them.

Stephen E Arnold, June 19, 2012

Sponsored by Ikanow

DataSift Architecture

March 1, 2012

So you want to do “big data”? This is for the SEO, PR, and marketing consultants who assert that “big data” is part of their firms’ standard fanny pack. You can view a large version of this DataSift architecture image online. DataSift, as you may know, processes the Twitter tweet stream. Yep, big data. The IT folks at the new age Madison Avenue firms have this type of technology with their Starbucks latte:


The DataSift Architecture: A Bird’s Eye View.

Trivial for the SEO experts and former middle school English teachers.

Stephen E Arnold, March 1, 2012



Exogenous Complexity 3: Being Clever

February 24, 2012

I just submitted my March 2012 column to Enterprise Technology Management, published in London by IMI Publishing. In that column I explored the impact of Google’s privacy stance on the firm’s enterprise software business. I am not letting any tiny cat out of a big bag when I suggest that the blow back might be a thorn in Googzilla’s extra large foot.

In this essay, I want to consider exogenous complexity in the context of the consumerization of information technology and, by extension, on information access in an organization. The spark for my thinking was the write up “Google, Safari and Our Final Privacy Wake-Up Call.”

Here’s a clever action. MIT students put a red truck on top of the dome.

If you do not have an iPad or an iPhone or an Android device, you will want to stop reading. Consumerization of information technology boils down to employees and contract workers who show up with mobile devices (yes, including laptops) at work. In the brave new world, the nanny instincts of traditional information technology managers are little more than annoying nags from a corporate mom.

The reality is that when consumer devices enter the workplace, three externalities arise in my experience.

First, security is mostly ineffective. Clever folks then exploit vulnerable systems. I think this is why clever people say that the customer is to blame. So clever exploits cluelessness. Clever is exogenous for the non clever. There are some actions an employer can take; for example, confiscating personal devices before the employee enters the work area. This works in certain law enforcement, intelligence, and a handful of other environments; for example, fabrication facilities in electronics or pharmaceuticals. Mobile devices have cameras and can “do” video. “Secret” processes can become un-secret in a nonce. In the free flowing, disorganized craziness of most organizations, personal devices are ignored or overlooked. In short, even in a monitored financial trading environment, a professional can send messages outside the firm while the bank’s security and monitoring systems remain happily ignorant. Dropping a truly secure box around a work place is expensive and beyond the core competency of most information technology professionals.

Second, employees blur information which is “for work” with information which is “for friends, lovers, or acquaintances.” The exogenous factor is political. To fix the problem, rules are framed. The more rules applied to a flawed system, the greater the likelihood is that clever people will exploit systems which ignore the rules. Clever actions, therefore, increase. In short, this is a variation of the Facebook phenomenon in which a posting can reach many people quickly or lie dormant until the data load explodes like a long forgotten Fourth of July fire cracker. As people chase the fire, clever folks exploit the fire. Information time bombs are not thought about by most senior managers, but they are on the radar of those involved in a legal matter and in the minds of some disgruntled programmers. The half life of information is less well understood by most professionals than the difference between a uranium based reactor and a thorium based reactor. Work and life information are blended, and in my opinion, the compound is a dangerous one.

Third, vendors focusing on consumerizing information technology spur adoption of devices and practices which cannot be easily controlled. The data-Hoovering processes, therefore, can suck up information which is proprietary, of high value, and potentially damaging to the information owner. Information is not “like sand grains.” Some information is valueless; other information commands a high price. In fact, modern content processing and data analytic systems can take fragments of information and “fuse” them. To most people these amalgams are of little interest. But to someone with specialized knowledge, the fused data are not gold nuggets; the fused data are a chunky rosy diamond, maybe a Pink Panther. As a result, an exogenous factor increases the flow of high value data through uncontrolled channels.


A happy quack to Gunaxin. You can see how clever, computer situations, and real life blend in this “pranking” poster. I would have described the wrapping of equipment in plastic as “clever.” But I am the fume hood guy, Woodruff High School, 1958 to 1962.

Now, let’s think about being clever. When I was in high school, I was one of a group of 25 students who were placed in an “advanced” program. Part of the program included attending universities for additional course work. I ended up at the University of Illinois at age 15. I went back to regular high school, did some other Fancy Dan learning programs, and eventually graduated. My specialty was tricking students in “regular” chemistry into modifying their experiments to produce interesting results. One of these suggestions resulted in a fume hood catching fire. Another dispersed carbon strands through the school’s ventilation system. I thought I was clever, but eventually Mr. Shepherd, the chemistry teacher, found out that I was the “clever” one. I sat in the hall for the balance of the semester. I adapted quickly, got an A, and became semi-famous. I was already sitting in the hall for writing essays filled with double entendres. Sigh. Clever has its burdens. Some clever folks just retreat into a private world. The Internet is ideal for providing an environment in which isolated clever people can find a “friend.” Once a couple of clever folks hook up, the result is lots of clever activity. Most of the clever activity is not appreciated by the non clever. There is the social angle and the understanding angle. In order to explain a clever action, one has to be somewhat clever. The non clever have no clue what has been done, why, when, or how. There is a general annoyance factor associated with any clever action. So, clever usually gets masked or shrouded in something along the lines of, “Gee, I am sorry” or “Goodness gracious, I did not think you would be annoyed.” Apologies usually work because the non clever believe the person saying “I’m sorry” really means it. Nah. I never meant it. I did not pay for the fume hood or the air filter replacement. Clever, right?

What happens when folks from the type of academic experience I had go to work in big companies? Well, it is sink or swim. I have been fortunate because my “real” work experiences began at Halliburton Nuclear Services and continued at Booz, Allen & Hamilton when it was a solid blue chip firm, not the azure chip outfit it is today. At Halliburton I was surrounded by nuclear engineers whose idea of socializing was arguing about Monte Carlo code and nuclear fuel degradation at the local exercise club. At Booz, Allen the environment was not as erudite as the nuclear outfit, but there were lots of bright people who were actually able to conduct a normal conversation. Nevertheless, the Type As made life interesting for one another, senior managers, clients, and family. Ooops. At the Booz, Allen I knew, one’s family was one’s colleagues. Most spouses had no idea about the odd ball world of big time consulting. There were exceptions. Some folks married a secretary or colleague. That way the spouse knew what work was like. Others just married the firm, converting “quality time” into two days with the dependents at a posh resort.

So clever usually causes one to seek out other clever people or find a circle of friends who appreciate the heat generated by aluminum powder in an oxygen rich environment. When a company employs clever people, it is possible to generalize:

Clever people do clever things.

What’s this mean in search and information access? You probably already know that clever people often have a healthy sense of self worth. There is also arrogance, a most charming quality among other clever people. The non-clever find the arrogance “thing” less appealing.

Let’s talk about information access.

Let’s assume that a clever person wants to know where a particular group of users navigate via a mobile device or a traditional browser. Clever folks know about persistent cookies, workarounds for default privacy settings, spoofing built in browser functions, or installation of rogue code which resets certain user selected settings on a heartbeat or restart. Now those in my advanced class would get a kick out of these types of actions. Clever people appreciate the work of clever people. When the work leaves the “non advanced” in a clueless state, the fun curve does the hockey stick schtick. So clever enthuses those who are clever. The unclever are, by definition, clueless and not impressed. For really nifty clever actions, the unclever get annoyed, maybe mad. I was threatened by one student when the Friday afternoon fume hood event took place. Fortunately my debate coach intervened. Hey, I was winning and a broken nose would have imperiled my chances at the tournament on Saturday.

Now more exogenous complexity. Those who are clever often ignore unintended consequences. I could have been expelled, but I figured my getting into big trouble would have created problems with far reaching implications. I won a State Championship in the year of the fume hood. I won some silly scholarship. I published a story in the St Louis Post Dispatch called “Burger Boat Drive In.” I had a poem in a national anthology. So, I concluded that a little sport in regular chemistry class would not have any significant impact. I was correct.

However, when clever people do clever things in a larger arena, then the assumptions have to be recalibrated. Clever people may not look beyond their cube or outside their computer’s display. That’s when the exogenous complexity thing kicks in.

So Google’s clever folks allegedly did some work arounds. But the work around allowed Microsoft to launch an attack on Google. Then the media picked up on the work around and the Microsoft push back. The event allowed me to raise the question, “So workers bring their own consumerized device to work. What’s being tracked? Do you know? Answer: Nope.” What’s Google do? Apologize. Hey, this worked for me with the fume hood event, but on a global stage when organizations are pretty much lost in space when it comes to control of information, effective security, and managing crazed 20 somethings—wow.

In short, the datasphere encourages and rewards exogenous behavior by clever people. Those who are unclever take actions which set off a flood of actions which benefit the clever.

Clever. Good sometimes. Other times. Not so good. But it is better to be clever than unclever. Exogenous factors reward the clever and brutalize the unclever.

Stephen E Arnold, February 24, 2012


Al Jazeera and Its US Reach

January 24, 2012

We were surprised, then resigned. Has the US slipped lower on yet another yardstick of achievement?

Al Jazeera English, an international 24 hour English-Language news and current affairs TV channel headquartered in Doha, Qatar, has now reached 250 million homes — 5 million of those being in the U.S.

The Los Angeles Times reported on this startling milestone in the article “Al Jazeera English Now Reaches 250 Million Households.”

We learned:

Five years after its launch, there are 130 countries that carry Al Jazeera English, but in the U.S., the channel has limited availability; it can be found on cable systems in Washington, D.C.; New York; Burlington, Vt.; Toledo, Ohio; and, recently, Chicago and in Los Angeles on KCET. And while the U.S. makes up a fraction of the quarter-billion households, it is a major source of AJE’s Web traffic, totaling 40 percent, according to the network.

The fact that Al Jazeera English has such a large web following in the United States despite its limited availability leads me to think that a significant shift has taken place.

Jasmine Ashton, January 24, 2012


Google Does Real Time Again

October 28, 2011

Google+ Rolls Out Real-Time Search and Hashtag Support

On October 12, Google Plus rolled out two new features; both allow users to create custom news streams based around topics being shared and build upon the search functionality of the network. The first feature, a real-time search, finds results from Google+ posts that are related to the search term a user enters. As new posts are created centering around the search topic, the user is notified and a real-time stream of posts is begun. ZDNet’s article, “Google+ Real-Time Search: The Social News ‘Ticker’,” tells us more about the changes:

… Google engineer Vic Gundotra – who posted the news from his Google Plus feed – notes that it’s a great way to keep up with real-time news events, such as a speech, a court trial or a sporting event. Basically, it’s a real-time news ticker for niche topics. The second feature – hashtag support – essentially turns any hashtag in a post into a searchable term that can be used as another way to create feeds and real-time streams.

This is a catchy notion. I’m interested to see if Google+ will begin integrating all social networking posts into their search results. What they’re doing right now isn’t groundbreaking; Twitter already offers the exact same feature. However, it would be groundbreaking to be able to follow trending topics on all the major social networking sites as they correlate to breaking news.

But Google did real time before. What’s “real time”? Whatever Google wants it to be I suppose from a marketing viewpoint.

Stephen E Arnold, October 28, 2011


Lucid Imagination: Open Source Search Reaches for Big Data

September 30, 2011

We are wrapping up a report about the challenges “big data” pose to organizations. Perhaps the most interesting outcome of our research is that there are very few search and content processing systems which can cope with the digital information required by some organizations. Three examples merit listing before I comment on open source search and “big data”.

The first example is the challenge of filtering information required by organizations and produced within the organization by the organization’s staff, contractors, and advisors. We learned in the course of our investigation that the promises of processing updates to Web pages, price lists, contracts, sales and marketing collateral, and other routine information are largely unmet. One of the problems is that the disparate content types have different update and change cycles. The most widely used content management system based on our research results is SharePoint, and SharePoint is not able to deliver a comprehensive listing of content without significant latency. Fixes are available but these are engineering tasks which consume resources. Cloud solutions do not fare much better, once again due to latency. The bottom line is that for information produced within an organization employees are mostly unable to locate information without a manual double check. Latency is the problem. We did identify one system which delivered documented latency across disparate content types of 10 to 15 minutes. The solution is available from Exalead, but the other vendors’ systems were not able to match this performance in putting fresh, timely information produced within an organization in front of system users. Shocked? We were.


Reducing latency in search and content processing systems is a major challenge. Vendors often lack the resources required to solve a “hard problem” so “easy problems” are positioned as the key to improving information access. Is latency a popular topic? A few vendors do address the issue; for example, Digital Reasoning and Exalead.
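One way to make the latency discussion concrete is to measure freshness per content type: the lag between when an item changed and when it first became findable. The sketch below is a hedged illustration only. The `(content_type, modified_at, searchable_at)` event format is hypothetical, and how those timestamps are obtained depends entirely on the system being tested.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def freshness_by_type(events):
    """Summarize indexing lag, in minutes, for each content type.

    Each event is (content_type, modified_at, searchable_at). This is a
    hypothetical log format, not the output of any particular product.
    """
    lags = defaultdict(list)
    for content_type, modified_at, searchable_at in events:
        lag_minutes = (searchable_at - modified_at).total_seconds() / 60.0
        lags[content_type].append(lag_minutes)
    return {
        content_type: {"median": median(values), "worst": max(values)}
        for content_type, values in lags.items()
    }


if __name__ == "__main__":
    events = [
        ("price_list", datetime(2011, 9, 1, 9, 0), datetime(2011, 9, 1, 9, 12)),
        ("price_list", datetime(2011, 9, 1, 10, 0), datetime(2011, 9, 1, 10, 14)),
        ("contract", datetime(2011, 9, 1, 9, 0), datetime(2011, 9, 1, 13, 0)),
    ]
    for content_type, stats in freshness_by_type(events).items():
        print(content_type, stats)
```

Reporting the worst case alongside the median matters here because a single stale price list or contract is the kind of miss that forces the manual double check.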

Second, when organizations tap into content produced by third parties, the latency problem becomes more severe. There is the issue of the inefficiency and scaling of frequent index updates. But the larger problem is that once an organization “goes outside” for information, additional variables are introduced. In order to process the broad range of content available from publicly accessible Web sites or the specialized file types used by certain third party content producers, connectors become a factor. Most search vendors obtain connectors from third parties. These work pretty much as advertised for common file types such as Lotus Notes. However, when one of the targeted Web sites such as a commercial news service or a third-party research firm makes a change, the content acquisition system cannot acquire content until the connectors are “fixed”. No problem as long as the company needing the information is prepared to wait. In my experience, broken connectors mean another variable. Again, no problem unless critical information needed to close a deal is overlooked.
