Graphic Pits dtSearch against Lucene

February 6, 2015

An oddball TechWars graphic suggests that Lucene is making life difficult for vendors of proprietary search systems. In the site’s head-to-head “dtSearch vs Lucene” comparison, the open source solution seems to handily trounce dtSearch. Of course, for us, Lucene means Elasticsearch. For those unfamiliar with TechWars, here’s the site’s description of what it does:

Data-driven: TechWars shows objective data gathered from the web to help you make the right decision when choosing technology for your projects.

Up-to-date: TechWars scans the web to catch the latest trends, so you can sit back and relax while we keep you updated.

Professional: TechWars is built for professionals, by professionals. Let’s build the best tech comparison tool together!

Community: TechWars serves the developer community by opening case studies for discussion. We are always open to requests and feedback via Facebook and Twitter.

The graphic compares dtSearch and Lucene in several areas. We’re told that 196 TechWars users use Lucene, versus just 15 who use dtSearch. Under the “which companies use it?” heading, sixteen companies (several high-profile) are listed for Lucene, but “no companies found” for dtSearch. Um, it seems like a pretty shallow dataset they’re tapping into there. The site does use Google data for one comparison: a graph that shows how very many more folks have searched for information on Lucene than on dtSearch. At a glance, Lucene would seem to be coming out ahead.

Cynthia Murrell, February 06, 2015

Sponsored by, developer of Augmentext

IBM Watson Offers Demos

February 6, 2015

One of Vivisimo’s founders, Jerome Pesenti, seems to be the voice of IBM Watson. Vivisimo was a metasearch system with hit clustering. The company went through several management arabesques and was sold to IBM in 2012. Vivisimo pitched its system as a federated search engine. The configuration method, as I recall, required Jerome-level input. In one installation, I learned that the Vivisimo system hit a wall when 250,000 documents were processed. There were workarounds, but these too required humans who knew the ins and outs of Vivisimo.

I recall that prior to the sale of Vivisimo to IBM, Vivisimo shifted to a government consulting services focus. Many search vendors in the heyday of the buy outs followed this path. License fees were not generating the cash the spreadsheet jockeys funding outfits like Endeca, Exalead, and Vivisimo envisioned. No problem. Some organizations wanted proprietary content processing systems and figured that it was time to sell out. The Big Dog of sell outs was Hewlett Packard’s $11 billion purchase of Autonomy. Vivisimo fetched about $20 million, or one year’s projected revenue, a stockholder familiar with the deal suggested.

Fast forward two or three years and Vivisimo is now Watson. Oh, Vivisimo is also a Big Data solution, not a metasearch engine. I assume the index limits have been addressed. I am thinking about IBM Watson for two reasons:

  1. IBM is going through a staff reduction. I assume this action was determined by querying the super smart Watson system.
  2. I read “Five New Services Expand IBM Watson Capabilities to Images, Speech, and More,” an IBM in house marketing article.

To my surprise there was a significant shift in Watson marketing; to wit, there are now links to demos of IBM’s text to speech service, image recognition service, relationship analysis service, and something called tradeoff analytics. Now demos are helpful. So is the Watson “great video” about concept insights.

I ran the suggested query for “quantum physics.” Remember I used to work at Halliburton Nuclear Services. Here’s what I saw:


I noticed that each of the experts in the human resources database uses the word “quantum” to describe his or her background.

I then ran a query for “tamarind,” one of the ingredients in a barbeque sauce created by Watson during its recipe phase. Here’s what I saw:


There is no recipe, nor is there an IBM person listing the barbeque recipe as his or her work. I was surprised. No tamarind wizard in the data set.

I asked myself, “Can’t I do this with Elasticsearch?” The answer my mind generated was, “No. No. No. You silly oaf. Watson uses Lucene but it is much, much more.”

How confident are the Watson workers who have dodged IBM layoffs?

What happens if Watson with Vivisimo, iPhrase, WebFountain, and assorted Almaden semantic goodies is aced by Hewlett Packard Autonomy or, heaven forbid, Amazon?

Will Dr. Pesenti be able to build a business that is orders of magnitude larger than Vivisimo’s revenue?

Interesting stuff. Not CyberOSINT level work, but interesting. I wonder why the i2 and related technologies are not pushed more aggressively. i2 works. (Note: I was a consultant to i2 prior to IBM’s purchase of the company.)

Stephen E Arnold, February 6, 2015

IBM and Layoffs: Watson, Watson, Where Are You?

February 4, 2015

For months I have been commenting about the increasingly weird marketing pitches for IBM Watson. This is the Lucene and home grown script system positioned as the next big thing in information retrieval. The financial goals for this system were crazy. My recollection is that IBM wanted to generate a billion in revenue from open source search and bits and pieces of the IBM technology lumber.

Impossible. Having a system ingest bounded content and then answer “questions” about that content is neither new, remarkable, nor particularly interesting to me. When the system is presented as a way to solve the problem of cancer and generate barbeque sauce with tamarind, the silliness points to desperation.

IBM marketers were trying everything to make open source search into a billion dollar baby and pull off the stunt quickly. Keep in mind that Autonomy required 15 years and a number of pretty savvy acquisitions to nose into the $700 million range.

IBM, in its confused state, believed that it could do the trick in a fraction of the time. IBM apparently was unaware of the erratic thinking at Hewlett Packard that spent $11 billion for Autonomy and wanted to generate billions from that system at the same time IBM was going to collect a billion or more from the same market.

Both of these companies, dazed by a long term struggle with spreadsheet fever, were ignoring or simply did not understand the doldrums of the enterprise information access market. Big companies were quite happy to give open source solutions a try. Vendors of proprietary systems were pitching their keyword systems as everything from customer support “solutions” to business intelligence systems that would “predict” what the company should know.

Yep, right.

I read with some sadness the posts at Alliance@IBM. The viewpoint is not that of IBM management, which is now firing or “resource allocating” its people. I am not sure how many folks are going to be terminated, but the posts in this series of IBM employee comments suggest that the staff are unhappy. Some may not go gentle into that good night.

The point is that the underlying problems at IBM were evident in the silly Watson marketing. An organization that can with a straight face suggest that a next generation information access system can discover a new recipe provides a glimpse into an organization’s disconnect at a fundamental level.

Too bad. The stock buybacks, the sale of manufacturing assets, and the assertions that a mainframe is a mobile platform tell me that IBM stockholders may want to reevaluate those holdings.

If IBM asked Watson, I question the outputs.

Stephen E Arnold, February 4, 2015

Apache Solr Search NoSQL Search Shines Solo

February 3, 2015

Apache Solr is an open source enterprise search engine that is used for relational databases and Hadoop. ZDNet’s article, “Why Apache Solr Search Is On The Rise And Why It’s Going Solo” explores why its lesser-known use as a NoSQL store might explode in 2015.

At the beginning of 2014, most Solr deployments were using it in the old-fashioned way, but 2015 shows that fifty percent of the pipeline is now using it as a first class data store. Companies are upgrading their old file intranets for the enterprise cloud. They want the upgraded system to be searchable and they are relying on Solr to get the job done.

Search is more complex than basic NoSQL and needs something more robust to handle the new data streams. Solr adds the extra performance level, so users have access to their data and nothing is missing.

“‘So when we talk about Solr, it’s all your data, all the time at scale. It’s not just a guess that we think is likely the right answer. “We’re going to go ahead and push this one forward”. We guarantee the quality of those results. In financial services and other areas where guarantees are important, that makes Solr attractive,’ [CEO Will Hayes of LucidWorks, Apache Solr’s commercial sponsor] said.”

It looks like anything is possible for LucidWorks in the coming year.

Whitney Grace, February 03, 2015

Solcara Founder Speaks to the Benefits of Open Systems

January 28, 2015

The article titled How to Force Giants to “Stop & Listen”- The Legal Tech Entrepeneur Prising Open “Closed Systems” on The Legal Review examines the rewards available for those firms and entrepreneurs willing to take risks. It tells the story of Solcara, the “federated search” technology company that started up just after the .com bubble burst in 2001. The article explains,

“Using Solcara, firms would be able to search legal content buried deep within the likes of Lexis Nexis, Westlaw UK and Practical Law from Thomson Reuters… using a single search interface. Unsurprisingly, legal publishers who were used to a ‘closed system’, where they could print and sell entire libraries of bound books to clients, were initially uncomfortable with Solcara’s cherry-picking innovation… The only way Solcara was able to successfully achieve [federated search] was working directly with the law firms… such as Norton Rose.”

Eventually Thomson Reuters acquired Solcara as well as Practical Law, leading Solcara’s co-founder Rob Martin to suggest that something similar needs to happen soon in law firms or clients will force a change on their own. Martin firmly believes that taking risks on innovation and being prepared to change direction is the only way to thrive in a market that fluctuates so easily.

Chelsea Kerwin, January 28, 2015

Open Source DeepDive Now Available

January 14, 2015

IBM’s Watson has some open-source competition. As EE Times reports in “DARPA Offers Free Watson-Like Artificial Intelligence,” DARPA’s DeepDive is now a freely available alternative to the famous machine-learning AI. Both systems have their roots in the same DARPA-funded project. According to DeepDive’s primary programmer, Christopher Re, while Watson is built to answer questions, DeepDive’s focus is on extracting a wealth of structured data from unstructured sources. Writer R. Colin Johnson informs us:

DeepDive incorporates probability-based learning algorithms as well as open-source tools such as MADlib, Impala (from Oracle), and low-level techniques, such as Hogwild, some of which have also been included in Microsoft’s Adam. To build DeepDive into your application, you should be familiar with SQL and Python.

“Underneath the covers, DeepDive is based on a probability model; this is a very principled, academic approach to build these systems, but the question for us was, ‘Could it actually scale in practice?’ Our biggest innovations in Deep Dive have to do with giving it this ability to scale,” Re told us.

For the future, DeepDive aims to be proven in other domains. “We hope to have similar results in those domains soon, but it’s too early to be very specific about our plans here,” Re told us. “We use a RISC processor right now, we’re trying to make a compiler, and we think machine learning will let us make it much easier to program in the next generation of DeepDive. We also plan to get more data types into DeepDive.”

It sounds like the developers are just getting started. Click here to download DeepDive and for installation instructions.

Cynthia Murrell, January 14, 2015

Love Open Source? Good News and News

January 13, 2015

I read “Top 10 FOSS Legal Developments of 2014.” A legal eagle generated the listicle. Despite my skepticism for birds of this feather, the list has some good news and—well, to put it positively—news for the open source movement.

The good news is that folks from courts to government agencies are paying attention to free and open source software. The “news” news is that use of open source “by commercial companies expands.” The write up states:

We have discussed in the past how many large companies are using FOSS as an explicit strategy to build their software. Jim Zemlin, Executive Director of the Linux Foundation, has described this strategic use of FOSS as external “research and development.” His conclusions are supported by Gartner who noted that “the top tech companies are still spending tens of billions of dollars on software research and development, the smart ones are leveraging open source for 80 percent of the code and spending their money on the remaining 20 percent, which represents their program’s ‘special sauce.’” The scope of this trend was emphasized by Microsoft’s announcement that it was “open sourcing” the .NET software framework (this software is used by millions of developers to build and operate websites and other large online applications).

The other item of “news” news is that the dust up with regard to Google and Java for Android continues. Who wants to risk a similar patent action? The answer to that question will help inform your assessment of the “news”.

I interpreted the information to suggest that open source is increasingly commercial. Good news or just news?

Stephen E Arnold, January 14, 2015

FOSS Supporters: Sharing Development Costs the 21st Century Way

January 11, 2015

In the good old days, proprietary software was funded by the company owning the technology, shareholders/investors, and “partners” sucked into the “pay to work with us” model perfected by outfits like IBM.

I read “Big Names Like Google Dominate Open Source Funding.” One of the points that I gleaned from the write up is that a handful of larger commercial firms support certain open source projects. The data are based on various records and incomplete data sets. Also the data presumably do not include statements from Eastern European open source contributors who are polishing their résumé or college professors working to create their own bit of financial heaven.

The article includes a graphic that identifies some of the big supporters of open source. There are some names I did not recognize like Credativ and 10gen. But there were a few that jumped out at me; namely, Google, IBM, and that bastion of management excellence, Hewlett Packard.

I formulated three thoughts after working through the admittedly flawed analysis included in Network World, a publication which I view with healthy skepticism.

First, with large companies funding open source projects, the cost of R&D has been pushed down and shared. This is good for big outfits who can get out of the business of supporting software that is essentially a utility.

Second, open source is less about community and more about getting folks jobs and opportunities to set up an “open source consulting services company.” When you take a look at LucidWorks (Really?), you see folks trying to emulate pure consulting firms. But open source search is just one example of this model of using “free stuff” to sell expensive engineering. This works on a small scale, but when you try to pump up a “free software” company to the size of Red Hat, that taxes the management capabilities of some whizzy Silicon Valley types.

Third, open source does not always result in free and open source software. Consider IBM’s approach. By repackaging Lucene and attributing serious juju to search, the company hopes to build a $10 billion business in 48 to 60 months. Not gonna happen. IBM faces many challenges, but those infected with spreadsheet fever twiddle the numbers to create a fictional world. Is Google really free and open source? What about Google Earth? Hewlett Packard, bless its management heart, is not quite the model of open source goodness shareholders want.

No surprises in the write up, but the change in what once seemed like a good idea does not trouble Network World. Open source sounds great and offers a way out of massive, continuing investments in maintaining certain types of software. That money can be better used to create proprietary extensions that customers have to pay for.

Stephen E Arnold, January 11, 2015

Shades of Ray Kurzweil: Watson to Crack Ageing

January 11, 2015

I am not too keen on immortality. My view is that stuff dies. Age appropriate behavior means accepting the lot of mortal man.

But some folks want to extend their lives; others hope to live forever like the nano-stuff creatures in Alastair Reynolds’ novels.

I associate the live longer and collect stock options approach with Ray Kurzweil, the Google big thinker and music inventor. Well, I learned something in “IBM Watson’s Lab to Tackle Aging Issues.” Now Watson with its chugging heart of Lucene has lifetimes of revenue to generate before some activist investors put a bit in this pony’s mouth.

The write up says:

IBM Korea will build a cognitive computing center in Seoul to help tackle an aging society with technology. “IBM submitted a letter of intent to the Seoul Metropolitan Government last month to set up a Watson lab to study smart-aging technology,” IBM Korea said…

I found this statement remarkable because IBM has not turned Lucene and home-grown scripts into a multi billion dollar revenue stream. On the other hand, it has helped the delis close to the IBM Watson facility in Manhattan prosper.

IBM has taken major steps to develop Watson as a new business line for future success. Watson has made achievements in diagnostic medicine and cancer treatment.

The approach involves the phone and microwave company Samsung and various universities, start ups, and public relations professionals in South Korea.

I assume more details will be revealed in Technology Review, a publication that covers Watson’s twists and turns in exquisite, marketing detail.

If you want to get on the anti-ageing train, board in South Korea. Like the projected $10 billion in revenue from a Lucene based system, let me know how those crow’s feet fly.

Stephen E Arnold, January 11, 2015

Ranking Countries Data Openness

January 5, 2015

One of the largest bastions for the open source community recently published an article that ranks countries around the world by how much of their data is open for public access: “The Global ‘Open’ Pulse From The 2014 Open Data Index.” The information is pulled from Open Knowledge’s 2014 Open Data Index. According to the numbers, governments are not being as open as they should be: the level is down to 11% from 15%.

“The OKF defines ‘open’ in the context of this report as a data set which adheres to the open definition standard as open. The current definition of ‘open’ can be summarized as ‘open data and content can be freely used, modified, and shared by anyone for any purpose.’”

There was progress in 2014, however. The United Kingdom is the most open. France and India rose on the list of openness. The number of countries that are open went from sixty to ninety-seven. The United States is 70% open, dropping to 8th place from second in 2013. Africa, Asia, and the Middle East are improving their numbers.

Open Knowledge’s entire goal is to increase the amount of information about government activities, so people can exercise their rights. What is disappointing is that while many more countries are showing up on the list, they are not living up to the definition of “open.”

Whitney Grace, January 05, 2015