Oracle is Rocking COLLABORATE

April 15, 2015

News is already sprouting about COLLABORATE 15: Technology and Applications Forum for the Oracle Community, Oracle’s biggest conference of the year. In “Oracle Applications Users Group Announces Oracle’s Key Role at COLLABORATE 15,” BusinessWire tells us that Oracle CEO Mark Hurd and Chief Information Officer and Senior VP Mark Sunday will be keynote speakers.

Hurd and Sunday will be delivering key insights into Oracle and the industry at their scheduled talks:

“On Tuesday, Sunday discusses the need to keep a leadership edge in digital transformation, with a special focus on IT leadership in the cloud. Sunday will build upon his keynote from two years ago, giving attendees better insight into adopting a sound cloud strategy in order to ensure greater success.  On Wednesday, Hurd shares his insights on how Oracle continues to drive innovation and protect customer investments with applications and technology. Oracle remains the leading organization in the cloud, and Hurd’s discussion focuses on how to modernize businesses in order to thrive in this space.”

Oracle is really amping up the offerings at this year’s conference.  They will host the Oracle User Experience Usability Lab, Oracle Proactive Support Sessions, Oracle Product Roadmap Session, and more to give attendees the chance to have direct talks with Oracle experts to learn about strategies, functionality, products, and new resources to improve their experience and usage.  Attendees will also be able to take accreditation tests for key product areas.

COLLABORATE, like many conferences, offers attendees the chance to network with Oracle experts, get professional feedback, and meet others in their field.  Oracle is very involved in this conference and is dedicated to putting its staff and products at the service of its users.

Whitney Grace, April 15, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Apache Sparking Big Data

April 3, 2015

Apache Spark is an open source cluster computing framework that rivals MapReduce. VentureBeat says that people did not pay much attention to Apache Spark when it was first developed at the University of California’s AMPLab in 2011. The article, “How An Early Bet On Apache Spark Paid Off Big,” reports that big data open source supporters are adopting Apache Spark because of its superior capabilities.

People with big data plans want systems that process real-time information at a fast pace, and they want a whole lot of it done at once. MapReduce can do this, but it was not designed for it. It is all right for batch processing, but it is slow and much too complex to be a viable solution.

“When we saw Spark in action at the AMPLab, it was architecturally everything we hoped it would be: distributed, in-memory data processing speed at scale. We recognized we’d have to fill in holes and make it commercially viable for mainstream analytics use cases that demand fast time-to-insight on hordes of data. By partnering with AMPLab, we dug in, prototyped the solution, and added the second pillar needed for next-generation data analytics, a simple to use front-end application.”

ClearStory Data was built using Apache Spark to access data quickly, deliver key insights, and make the UI very user friendly. People who use Apache Spark want information from a variety of sources immediately so they can put it to profitable use. Apache Spark might ignite the fire for the next wave of data analytics for big data.

Whitney Grace, April 3, 2015

GitHub: More Than Code

April 1, 2015

Short honk: Google killed off its open source software thing. GitHub seems to be the go-to repository. However, GitHub is more than code. Navigate to “Le Code civil français, sous Git.” Is it important that a code repository is growing its content pool? Nah, just a blip. There is that denial of service attack. But that is probably unrelated to GitHub’s activities.

Stephen E Arnold, March 31, 2015

Relaxing a Query: PostgreSQL Style

March 22, 2015

If you are a user of PostgreSQL and want to implement fuzzy, relaxed, or “show ‘em something sort of close to the user’s query” search, you will want to read “Super Fuzzy Searching on PostgreSQL.” Fuzzy search makes it possible to show useful results to a user who is not quite sure how terms appear in an index. Fuzzy is not exactly like “close” in horseshoes. More algorithmic magic is at play in information retrieval systems.

The article explains PostgreSQL fuzzy capabilities and launches into the notion of trigrams. Keep in mind that Manning & Napier (creators of DR LINK) possess some n-gram patents. The old Brainware (which may have once been SER) also possesses some n-gram type patents. I recall hearing years ago that Brainware developed a trigram search system which worked reasonably well when looking for similar patent claims. Brainware is now part of a printer company, and I have lost track of the search technology. I suppose I could investigate the Brainware/Lexmark status, but I have other tasks beckoning my attention.

The write up explains how to implement trigrams for PostgreSQL. The code examples are useful and the tips for dealing with large datasets are quite helpful. The author does not mention the n-gram related patents. I assume that the author assumes that the patent holders assume no one is infringing. That is a triple assumption set. int ere sti ngt rig ram coi nci den ce_
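
For readers who have not bumped into trigrams before, here is a minimal Python sketch of the idea. It is not the code from the article, and it is not PostgreSQL's implementation. It assumes the padding convention described in the pg_trgm documentation (each word gets two spaces in front and one behind) and scores two strings as shared trigrams divided by all distinct trigrams. The function names `trigrams` and `similarity` are made up for illustration.

```python
import re


def trigrams(text):
    """Return the set of 3-character windows for each word in text.

    Assumption: words are runs of ASCII letters and digits, lowercased,
    padded with two leading spaces and one trailing space.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))
    # "word" and "two words" share 4 of 11 distinct trigrams
    print(round(similarity("word", "two words"), 3))
```

A misspelled query still shares most of its trigrams with the intended term, which is why the score works as a "close enough" test. The cutoff for what counts as a match is a tuning decision.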

Stephen E Arnold, March 22, 2015

DuckDuckGo: Boosting Search via Open Source Cash Donations

March 21, 2015

There are many ways for commercial enterprises to gain traction via open source. Some companies, like IBM, cheerlead for Eclipse and Lucene, among other open source projects. Other companies hold conferences to tout an open source solution and then pitch extra cost add ons like consulting and training so the unfamiliar can become familiar with the “free” software. A few firms slip open source hints into their commercial messages. One company which sells a government- and academic-based search system used “open source” on a Web page. When I pointed out that this commercial outfit was hinting that its for-fee, proprietary product was open source, the reference disappeared after a frisky email exchange. It seems that some company presidents do not look at their own firm’s Web sites.

I read “2015 Open Source Donations.” The write up was straightforward, listing various donations from DuckDuckGo to worthy causes. One of these is the Amnesic Incognito Live System or Tails.

I am okay with this support for open source via cash. Many firms have followed the path. I find it interesting that DuckDuckGo, which I understand is essentially a metasearch engine, is following this route.

Other commercial outfits will become more open about their support of open source. After all, why use a commercial, proprietary product when you can use a perfectly good open source product? All one needs is know how. That, of course, is what the open source services firms sell.

DuckDuckGo wants to keep the communities in which it has an interest watered, fed, and loved. Good deal.

Stephen E Arnold, March 21, 2015

IBM Hadoop

March 18, 2015

For anyone who sees setting up an instance of Hadoop as a huge challenge, Open Source Insider points to IBM’s efforts to help in, “Has IBM Made (Hard) Hadoop Easier?” Why do some folks consider Hadoop so difficult? Blogger Adrian Bridgwater elaborates:

“More specifically, it has been said that the Hadoop framework for distributed processing of large data sets across clusters of computers using simple programming models is tough to get to grips with because:

Hadoop is not a database

Hadoop is not an analytics environment

Hadoop is not a visualisation tool

Hadoop is not known for clusters that meet enterprise-grade security requirements

Foundation fixation

This is because Hadoop is a ‘foundational’ technology in many senses, so its route to ‘business usefulness’ is neither direct or clear cut in many cases.”

Hmm. So, perhaps one should understand what Hadoop is and what it does before trying to implement it. Still, the folks at IBM would prefer companies just pay them to handle it. The article cites a survey of “big-data developers” (commissioned by IBM) that shows about a quarter of the respondents use IBM’s Hadoop. Bridgwater also mentions:

“IBM also recently conducted an independently audited benchmark, which was reviewed by third-party Infosizing, of three popular SQL-on-Hadoop implementations, and the results showed that IBM’s Big SQL was the only Hadoop solution tested that was able to run all 99 Hadoop-DS queries…. Smith says that this new report and benchmark are proof that customers can ask more complex questions of IBM when it comes to Hadoop implementation.”

I’m not sure that’s what those factors prove, but it is clear that many companies do turn to the tech giant for help with Hadoop. But is their assistance worth the cost? Unfortunately, this article includes no word on IBM’s Hadoop pricing.

Cynthia Murrell, March 18, 2015

Lookeen Desktop Search: Exclusive Interview Reveals Lucene as a Personal Search Solution

March 17, 2015

Axonic’s enterprise-centric search products eliminate most, if not all, of the problems a Windows user encounters when trying to locate related information produced by different applications on a desktop computer. Email and other types of information are findable with a few keystrokes.

When I was in Germany in June 2014, I learned about Lookeen, a desktop search product that was built on Lucene. The idea was to tap the power of Lucene to put content on a user’s computer at one’s fingertips. Imagine working in Outlook, reading a message, and seeing a reference to a PowerPoint on the user’s external storage device. Lookeen allows access to the content from within Outlook. Now the company is releasing a commercial version of its desktop search product that promises to be a game changer on the desktop and in the enterprise. The company offers robust functionality at a very attractive price point.

The role of Lucene and other technical innovations in the high-performance software appears in an exclusive interview with Lookeen’s chief operating officer. You can find the interview at

Lookeen Search Results

The Lookeen interface is intuitive. No training is required to install the Lucene-based system nor to use it for simple or complex information retrieval tasks. Image used with the permission of Axonic GmbH.

Lookeen is a product developed by Axonic, a software and services firm located in Karlsruhe, Germany, in the Rhine Valley, a short distance from Stuttgart. Axonic is one of the leading software development and services firms for Outlook and Exchange Server search technologies in Europe. The company specializes in enterprise applications and has a core competency in Microsoft technologies.

I wanted more detail about Lookeen’s approach to desktop search. In an exclusive interview, Peter Oehler, COO, revealed the company’s breakthrough approach to desktop search. The company’s Lookeen software gives Windows users industry-leading search technology tuned for the Microsoft environment. Outlook email, PowerPoint decks, Word documents and other common file types are instantly findable.

Peter Oehler said:

We’ve utilized Lucene’s extensive query syntax to enable users to use familiar Google-like Boolean search, as well as wildcard, proximity, and keyword matching.  The introduction of more search strings and filter features enable users to narrow down searches in an easy and intuitive way, and more proficient searchers can access the best of Lucene’s query syntax.

He added:

Lucene is a very good, widely used open source search system. Many of the innovations we’ve developed on top of the Lucene engine stem directly from our extensive experience with Outlook. For example, the Lookeen context menu allows a user to open, reply to, forward, move and summarize emails and topics, all from within Lookeen.

What sets Lookeen apart from proprietary, freeware, and shareware products is that Axonic has engineered its system to provide real-time access to information on the user’s computer. The system can handle terabytes of user content, returning results almost instantaneously.

Axonic has deep experience with Microsoft technology. Oehler told me:

Lucene is a beast within the Microsoft environment. Microsoft doesn’t make it easy to work with Outlook without causing problems or affecting performance. Outlook is the lifeblood of most professionals – the most important tool. If it stops working, you stop working. The art of our product is how we tackle the complex code hiding under the surface of Outlook and combine it with Lucene to create a deceptively smooth and simple search solution.

Beyond Search ran tests on Lookeen and compared the results with outputs from a number of test systems. Lookeen’s response times were among the fastest. When indexing and searching email, including archived collections of emails, Lookeen was the top performer. Our test systems include Copernic, dtSearch, Effective File Search, Gaviri, ISYS Desktop Search, and X1.

Lookeen requires no special training or complex set up. Lookeen allows a user to search external shared content directly from the Lookeen app. The interface is clear and logical. A busy professional can access needed documents, view and interact with them without launching an external application.

A 14 day free trial is available. The license fee is $58 for a single user version. The company offers a business edition (at $83), which adds group policy functions, and an enterprise edition, which begins at about $116 per user. Volume discounts are available.

To read the complete exclusive interview with Peter Oehler, navigate to the Search Wizards Speak service at this link on ArnoldIT. More information about the company is available at

Stephen E Arnold, March 17, 2015

Elasticsearch Stretches and Competitors Could Bounce Off Its Elastic Surface

March 14, 2015

I know that my comments about the dead end nature of enterprise search have caught the attention of some vendors. Let’s face it. Search is a utility, a tool to be used when performing other work. Search is not magic, as some failed middle school teachers and English majors dressed up in Merlin the Magician outfits promulgate.

Elasticsearch has shifted gears and rebranded itself as Elastic. The company provides some information about the shift at its new Web site. The company says:

Elastic believes getting immediate, actionable insight from data matters. As the company behind the three open source projects — Elasticsearch, Logstash, and Kibana — designed to take data from any source and search, analyze, and visualize it in real time, Elastic is helping people make sense of data. From stock quotes to Twitter streams, Apache logs to WordPress blogs, our products are extending what’s possible with data, delivering on the promise that good things come from connecting the dots.

I think this repositioning is likely to put a tight elastic band around the throat of a number of competitors. I don’t think Elastic is sufficiently tight to kill these outfits. The positioning grip is definitely going to make their breathing more difficult.

Search is not dead at Elastic. The company is responding to the market’s need for a solution that delivers a tangible benefit, not a laundry list of jargon, buzzwords, and assertions that history has made clear are mostly baloney.

One question crossed my mind, “What will LucidWorks do to respond?” My thought is that LucidWorks is probably trying to craft a counter move. Millions are at stake, and I think the financial backers of the former Lucid Imagination will want more than ideas.

Stephen E Arnold, March 14, 2015

Open Source ElasticSearch Added to Google Cloud Platform

March 12, 2015

ElasticSearch is a popular open source search engine that has been downloaded over 10 million times since its release in 2010. Amazon recently announced that it plans to add an ElasticSearch management service to EC2 to relieve workloads for developers. Rival Google announced on the Google Cloud Platform Blog that it will add ElasticSearch compatibility to its own cloud computing platform: “Deploy ElasticSearch On Google Compute Engine.”

The Google Compute Engine team is ecstatic that ElasticSearch will be deployed on the platform and is actively encouraging end users to download it. The team even made a list about why people need to start using ElasticSearch:

1 “Based on Lucene: Elasticsearch is an open source document-oriented search server based on Lucene. Lucene is a time tested open source library that is capable of reading everything from HTML to PDFs.

2 Designed for cloud: Elasticsearch was designed first for the cloud with its capabilities around simple cluster configuration and discovery and high-availability by default. This means you can expand your Elasticsearch deployment simply by adding new nodes. This expansion of your cluster — or in the case of a hardware failure, reduction — results in automatic reconfiguration of your document indices across the cluster.

3 Native use of JSON over HTTP: Extending the platform is simple for developers. The schema doesn’t need to be defined up front and your cluster can be extended with a variety of libraries in your languages of choice, even using the command line.”
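
To make the “JSON over HTTP” point concrete, here is a small Python sketch that builds (but does not send) a search request. It is an illustration under assumptions, not code from the Google post: the address `http://localhost:9200`, the index name `logs`, the field name `message`, and the helper `build_search_request` are all hypothetical. The request body uses the query DSL’s `match` query.

```python
import json
from urllib.request import Request

# Assumption: a node listening locally on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_search_request(index, field, text):
    """Build an HTTP request carrying a JSON 'match' query.

    Nothing is sent over the network; the caller decides when to
    pass the request to urllib.request.urlopen().
    """
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("logs", "message", "timeout error")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The whole exchange is plain text on both sides, which is why any language with an HTTP client, or a command line tool, can talk to the cluster.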

ElasticSearch can be deployed with a few easy clicks, and once it is working you can immediately use it for log processing and analysis with Logstash, keyword text search, and data visualization with Kibana.

Deployment on the Google Compute Engine means ElasticSearch will reach an entirely new customer base. Other open source search engines will be pressured to up the ante with new features and services that ElasticSearch does not have. LucidWorks and other open source based search companies are feeling the pressure.

Whitney Grace, March 12, 2015

What Is New in Elasticsearch

March 10, 2015

A slide deck providing a round up of the new features of Elasticsearch as of June 2014 is available as of March 9, 2015 via Speakerdeck. Snag the deck. Some Elasticsearch presentations disappear themselves quickly.

Stephen E Arnold, March 9, 2015
