Oracle and Open Source

October 21, 2013

I will be giving my last public talk in 2013 at the upcoming Search Summit. I am revealing some data about the trajectory of commercial search versus free and open source search. My focus is not just on costs. I will address the elephant in the room that most of the sleek search poobahs elect to ignore: management.

As part of my preparation, I read an interesting public relations and positioning white paper from Oracle. The essay is “The Department of Defense (DoD) and Open Source Software.” You should be able to locate a copy at the Oracle Middleware Web page. But maybe not. Well, take that up with Oracle, Google, and whoever indexes public Web pages.

The argument in the white paper is that open source is useful within the context of commercial software. The premise is that a commercial company develops robust products like Oracle’s database and then rigorously engineers that product to meet the tough standards imposed by the US government. Then, canny engineers will integrate some open source software into that commercial solution. The client—in this case the Microsoft loving Department of Defense—will be able to get the support it needs to handle the demands of global war fighting.

There are three fascinating rhetorical flourishes in the white paper. These are directly germane to the direction some of the discussions of commercial and proprietary versus free and open source software have been moving. I will give a couple of case examples in my talk in early November 2013, and I assume that the slide deck for my talk will find its way into one or more indexing services. I won’t plow that ground again. Below are some new thoughts.

First, the notion that commercial and proprietary software is better than open source software is amusing. I think that any enterprise software is rife with bugs and problems that can never be fixed because there is neither time, money, nor appetite to ameliorate the problems. I was at a meeting at the world’s largest software company when one executive said, “There are a couple thousand bugs in Word. Numbering is one issue. We will maybe get around to fixing the problem.” That was six years ago. Guess what? Numbering is still an interesting challenge in a long document. Is Oracle like the world’s largest software company? Does Oracle have some interesting features in its products? Check out this sample page. Make your own decision. Software has been, is, and will be complicated stuff. The fact that people correlate clicking a hot link with “simple” just adds impetus to the “this is easy” view of modern systems. No software is better. Some works within specific parameters. Push outside the parameters and you find darned exciting things.

Second, the idea that a large bureaucracy can make decisions based on cost benefits is crazy. Worldwide bean counters and lawyers work to nail down assumptions and statements of work that are designed to minimize costs and deliver specific functionality. How is that working out? If I read one more after-the-fact analysis of the flawed health insurance Web site, I may unplug my computer and revert to paper and printed books. I did a major study of a government site in 2007. Guess what? The system did not work and still does not work. Are there analyses, reports, and Web pages explaining the issue? Sure. What’s the fix? People either go to a government office and talk to a human or make a phone call in the hope that the human on the other end of the line can address the issue. The computer system? Unchanged. My report? Probably still in a drawer somewhere.

Third, the idea that a publicly traded company cares about open source is amusing. Open source is simply a vehicle to reduce costs to the publicly traded company and generate consulting revenue. The fact is that most of the folks who embrace open source need some help from firms specializing in that open source product. I can name two companies, each with more than $30 million in venture funding, that have a business model built on selling proprietary software, consulting, and engineering services. Open source sure looks like a Trojan horse to me. Why does IBM embrace Lucene yet sell branded products and services? Maybe to eliminate some software acquisition costs and sell consulting.

A happy quack to http://goo.gl/lxKb6I

On one hand, Oracle is correct in pointing out that free and open source software looks cheaper than commercial and proprietary software in terms of licensing fees. Oracle is also correct that the major cost of software has little to do with the license fee.

On the other hand, Oracle adds some mist to the fog surrounding open source. When open source vendors have to generate revenue to pay back investors or build out their commercial business, the costs are likely to be high.

Open source software begins as a public spirited effort, a way to demonstrate programming skills, and a marketing effort. There are other reasons as well. But in today’s world, software is the weak link in most businesses. Systems are getting less reliable, despite the long string of nines that some companies use to prove their systems are wonderful. But like the optical character recognition program that is 99 percent accurate, the more content pushed through these systems, the more the errors mount. Xerox continues to struggle with error rates in a technology that was supposed to be a slam dunk.
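The point about the 99 percent accurate OCR program is easy to check with back-of-the-envelope arithmetic. Here is a minimal sketch. The 99 percent figure comes from the paragraph above; the character counts are made-up round numbers, and the assumption that each character is right or wrong independently of its neighbors is mine, not something any OCR vendor has stated.

```python
# Illustrative arithmetic only. The accuracy figure is from the post;
# the volumes are hypothetical round numbers, and errors are assumed independent.

def expected_errors(items: int, accuracy: float) -> float:
    """Expected number of wrong items when each item is right with probability `accuracy`."""
    return items * (1.0 - accuracy)


def chance_all_correct(items: int, accuracy: float) -> float:
    """Probability that every one of `items` independent items comes out right."""
    return accuracy ** items


for chars in (100, 2_000, 10_000):
    errors = expected_errors(chars, 0.99)
    clean = chance_all_correct(chars, 0.99)
    print(f"{chars:>6} characters: about {errors:.0f} errors expected, "
          f"chance of a perfect result {clean:.3g}")
```

Under these assumptions the expected error count grows in step with volume, while the chance of an error-free result falls off quickly. That is the sense in which the errors mount.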

Net net: Read the Oracle white paper. Then when you work out a budget, focus less on the sizzle of open source and more on the basic management skills it takes to make something work on time and on budget. Remember. Publicly traded companies and open source companies that have taken money from venture capitalists have to generate a profit or they disappear.

The basics are important. The Oracle white paper skips over some of these in its effort to put open source in perspective. Any software project requires attention to detail, pragmatism, technical expertise, and money.

Stephen E Arnold, October 21, 2013

Blippex for a Different Kind of Search

October 17, 2013

Since Google came to dominate the internet search landscape, many rivals have launched. Some have found varying degrees of success, but none have come close to overtaking the master. Now, blogger Christopher Mims believes he may have found a contender in Blippex; “This Is the First Interesting Search Engine Since Google,” Quartz declares. We also found Blippex interesting.

Mims notes that, unlike most competitors, Blippex is not trying to reinvent the Googly wheel. Its approach is different. Instead of indexing the web in general, Blippex looks only at pages its users have visited. The article explains:

“Blippex’s algorithm, called DwellRank, decides relevance based on how long users spend on a site and how many times Blippex users have visited it. Researchers at the University of Massachusetts Amherst have, independently of the Blippex team, established that the amount of time someone spends on a web page or document is, not surprisingly, a pretty good measure of how important and relevant it is (pdf). Blippex gets this information by having you download a plugin for your web browser. This plugin measures how long you spend on each site and sends the information to Blippex, anonymized—that is, stripped of any information that could identify you.”
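The quoted description names two signals: time spent on a page and number of visits. A minimal sketch of how such signals could be folded into one score follows. This is not Blippex's DwellRank formula, which the article does not give; the scoring rule (average dwell time damped by the log of the visit count), the function names, and the sample URLs are all hypothetical.

```python
import math
from collections import defaultdict


def dwell_scores(events):
    """Score pages from (url, seconds_on_page) pairs that carry no user identifiers."""
    total_seconds = defaultdict(float)
    visits = defaultdict(int)
    for url, seconds in events:
        total_seconds[url] += seconds
        visits[url] += 1
    # Hypothetical rule: average dwell time, weighted by log(1 + visits) so that
    # raw visit counts alone cannot swamp the time signal.
    return {
        url: (total_seconds[url] / visits[url]) * math.log1p(visits[url])
        for url in visits
    }


def rank(events):
    """Return URLs ordered from highest to lowest score."""
    scores = dwell_scores(events)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up sample data: a page people read versus a page people bounce from.
sample = [
    ("example.org/tutorial", 240),
    ("example.org/tutorial", 300),
    ("example.org/landing", 5),
    ("example.org/landing", 8),
    ("example.org/landing", 4),
]
print(rank(sample))
```

In this toy data the tutorial page outranks the landing page despite fewer visits, because visitors linger on it. A production system would also need to handle idle tabs and spam, which the sketch ignores.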

Isn’t this approach a bit limiting? For now, yes, but the makers of Blippex liken the young site to Wikipedia, which became much more effective as users contributed information. Currently, says Mims, the site’s user base is mostly geeky early adopters, so it is a good place to go for programming questions. It is also adequate for recent events, he writes, but is not the place for more obscure searches. With the limitations, why bother? Well, Blippex’s “fanatical” commitment to privacy is one reason; like DuckDuckGo, the site does not track its users. They even made their browser plugin open source, so folks can verify that it is not collecting private information. And, of course, the results will get better as more people install that plugin.

There remains one question—how will Blippex make money on this ad-free site? If co-founders Max Kossatz and Gerald Bäck have figured that out yet, they don’t seem to be sharing the answer. The company, based in Austria, launched last July.

Cynthia Murrell, October 17, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Content Spoofing: A Question of Relevance

October 16, 2013

I heard an AAAS podcast about fake academic papers in open access publications. I did not catch much information from the 20 second sound bite. I navigated to Google and keyed this query:

aaas open access journals

The hit I sought was number eight on the search results page. What is interesting is that the current “hot” item ranked below older information. In one case, the hit was irrelevant to my intent filtered by Google’s behind-the-scenes personalization methods; for example, www.sciencemag.org. Another hit pointed to a couple of outdated studies dating in one case from 2005.

And Bing? Same query. No relevant hit on the first page of the Bing results list. What about that Bing off stuff? Maybe baloney?

And Yandex? Same query. No relevant hits on the first page of results.

And DuckDuckGo, the metasearch engine causing some to swoon? No relevant hit.

Thoughts:

  1. Timeliness is not a priority in the free Web indexing systems.
  2. Rich media containing relevant information for a user’s query is NOT indexed. For all practical purposes, the podcasts are invisible without prior knowledge.
  3. Junk results are not filtered by any of the systems.

No big deal for me. Just another example of how the simplest query can return some darned interesting results.

By the way, the Google results page includes two ads, both from “traditional publishers.” One of the advertisers publishes commercial databases. My recollection is that some of the content in these information services could be viewed as incorrect. In fact, one of the Google advertisers accepted the bogus paper.

What’s my point?

The task of finding relevant, on point information is getting more difficult, not easier. Furthermore, as folks shift to “hectic” modes of work, the idea that most people will double check information before accepting it as gospel may be outmoded.

Stephen E Arnold, October 16, 2013

Xenky Search Vendor Profile: Entopia

October 15, 2013

I have posted a profile of the now offline enterprise search vendor Entopia. You can access the write up at www.xenky.com/vendor-profiles.

Entopia is an interesting case. The company, like Endeca and Fast Search & Transfer, had embraced the idea that information access was the DNA of an organization. With access to information and metadata, a manager could make better decisions. The marketers jumped on the bandwagon and rolled out some fancy buzzwords to surround the incredibly complex Entopia system.

The Entopia approach is, in my opinion, one that took the SAP R/3 massive reengineering of work processes and applied the notion to information. Entopia included Tacit type tracking to identify people who were centers of influence in a company, search, concepts, automatic indexing, semantics, etc.

The only problem was that the cost of implementing the system once a client had been found was high. In 2006, the company wound down. The firm is still offline, but its very ambitious explanations of what information could do inspired many other vendors.

Like Convera, Entopia described a wonderful world of information access. The problem was and still is delivering in a way that meets users’ expectations and delivers a visible, easily documented payoff to the organization buying the dream and the software.

The profiles will not be updated or maintained. I am providing the information because some students may find the explanations, diagrams, and comments of interest. The information is provided on an “as is” basis. If you want to use this for commercial purposes, please, contact me at seaky2000 at yahoo dot com.

Remember. I am almost 70 years old and some of the final versions of these profiles commanded hefty fees. A reader reminded me that some big outfits have taken my work and reused it, sometimes with permission and sometimes not. Well, these are for your personal use.

Stephen E Arnold, October 15, 2013

SAIL LABS Updates Speech Technology

October 9, 2013

Good news from SAIL LABS Technology! The company has made an exciting press announcement, “SAIL LABS Announces New Release Of Media Mining Indexer 6.2.” Known for its speech software, SAIL LABS has updated its top product for real-time, multi-lingual, multimedia indexing. This update comes at the perfect time, when companies are searching for ways to capture the next wave of data from mass media. The update harnesses the power of SAIL LABS’s speech technology and provides a suite for multimedia processing, taking audio and video data and indexing it into a searchable format.

SAIL LABS is excited about the update:

“ ‘We are proud to introduce the latest version of the Media Mining Indexer. This newest version is the result of our ongoing investments in innovation and represents a quantum leap forward in terms of system flexibility and stability, overall performance and processing capabilities to optimally respond to client needs. Our customers will be pleased with the progress and we invite all others to join us on the way into the future,’ says Gerhard Backfried, Head of Research at SAIL LABS.”

Image driven indexes are predicted to gain more prominence in the next few months. SAIL LABS has caught on quickly to a market need and will likely increase its customer base. Will others follow suit?

Whitney Grace, October 09, 2013


Ex Endeca Execs: Giving New Life to Route 128?

October 6, 2013

I read “Cambridge Firm Is Fertile Ground for Entrepreneurs.” The Massachusetts in crowd should be thrilled with the Boston Globe’s story. In addition to a graphic which puts Endeca at the center of a universe of start ups, the story draws an interesting parallel for me:

Like its much bigger predecessors, Digital Equipment Corp. and Lotus Development Corp., two seminal Boston companies acquired by competitors, Endeca is emerging as a fount of new business activity, churning out the next generation of entrepreneurs and helping to expand the region’s technology economy.

The write up then references the influence tendrils of what I assume is “fertile ground” to Xerox, Digital Equipment, and Lotus.

The article included this passage as well:

But the $1 billion paid by Oracle made some Endeca employees wealthy, which certainly made it easier for them to decide to start companies. And more may follow. Venture capitalists report they are in contact with other Endecans who are contemplating leaving Oracle. Oracle declined to comment for this story. And the Diaspora might have been bigger had Endeca been on the West Coast, where the cycle of people leaving companies for start-ups happens much faster than in Massachusetts. One reason is that many large Boston companies have employees sign noncompete agreements, which can limit their ability to spin off a start-up. Noncompete agreements are not enforced in California. Endeca employees signed noncompetes, but so far those who have started companies are not direct competitors. The new businesses range from social media to medical records companies.

Then this quote to note: “We did a good job of training people how to be entrepreneurs,” said Papa, so that they are not all trying to just “build the next Endeca.” Steve Papa was one of the founders of Endeca.

My thoughts turned to other search companies that sold out. Has there been a similar surge of innovation from:

  • Autonomy founders
  • Exalead founders
  • ISYS Search Software founders
  • Verity founders
  • Vivisimo founders

I don’t recall a similar explosion of innovation from any of these firms nor a glittering write up in a major, “real” newspaper. There are, I believe, some questions which beg to be answered:

  1. What makes Endeca different?
  2. Why haven’t other search vendors’ founders gone the start up route?
  3. What is the survivability of start ups created by founders of iPhrase (acquired by IBM), Inxight (acquired by Business Objects), and other long-ago winners in the buy out game?

I don’t have any answers, and I am personally delighted that there will not be another Endeca coming down the pike. The notion of blending a Yahoo style directory with key word indexing and then layering on eCommerce, publishing, business intelligence, and other functions is a path well worn by Convera, Delphes, Entopia, and some of IBM’s search efforts.

Endeca, based on my notes, was heavy on MBA think and less into Google-style technology. The list of Endeca spawned start ups includes Salsify, Thank Media, and Toast among others. Each has a hefty dose of “management.” Perhaps MBAs are the answer to market traction?

Stephen E Arnold, October 6, 2013

Open Source Vocabulary Server Updates Software

October 5, 2013

Open source most likely has a solution for all of your software needs, including a vocabulary server to manage controlled taxonomies, thesauruses, and, of course, vocabularies. The great news is that one exists and it is called TemaTres. Some open source software has the misfortune of never being updated by its developers, but TemaTres was recently updated: “TemaTres 1.7 Released: Now With Meta-Terms And SPARQL Endpoint.”

Here is what you can expect in the newest release:

· Now you can have a SPARQL Endpoint for your TemaTres vocabulary. Many thanks to Enayat Rajabi!!!

· Capability to create and manage meta-terms. Meta-term is a term to describe others terms (Ej: Guide terms, Facets, Categories, etc.). Can’t be use in indexing process.

· New standard reports: all the terms with his UF terms and all the terms with his RT terms.

· Capability to define custom fields in alphabetical export

· New capabilities for TemaTres API: suggest & suggestDetails,

· Fixed bugs and improved several functional aspects.
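For readers who have not used a SPARQL endpoint, here is a minimal sketch of what a request to one might look like. The endpoint address is a placeholder, not a real TemaTres URL, and the assumption that the vocabulary is exposed with SKOS labels is mine; check your own installation. The only thing taken as given is the standard SPARQL protocol convention of passing the query text in a `query` parameter. The code builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint address; substitute the one your own install exposes.
ENDPOINT = "http://example.org/vocab/sparql"

# Assumes terms are published with SKOS preferred labels (an assumption, not confirmed here).
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label
WHERE { ?concept skos:prefLabel ?label }
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the SPARQL text in the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Paste the printed URL into a browser or an HTTP client pointed at a real endpoint to see the first ten terms and their labels, if the vocabulary is exposed as assumed.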

Most of the changes came from the dedicated TemaTres community, whose members helped diagnose what needed to be fixed and offered ideas for improvement. If only the rest of the open source community could follow TemaTres’s example.

Whitney Grace, October 05, 2013


Exorbyte: From eCommerce to Next Level Info Management

October 4, 2013

I read “Next Level Information Management mit Exorbyte und InovoOlution.” My German is as lousy as my English. I think the main point of the write up is that the eCommerce technology company Exorbyte is now in the “next level information management business” via a partnering deal with InovoOlution.

The first thing that I noticed in the news release was the name “InovoOlution.” The word did not dance trippingly on the tongue. Some folks in Italy did not like the German language in an opera, but then there was Mozart, right?

The second point that caught my attention was this statement:

The partnership delivers integration of the intelligent and fault tolerant search and the NOVO information platform Matchmaker system to improve key processes, in particular the classification, extraction, indexing and classification and search.

My recollection is that Exorbyte’s SearchCube system was once called Matchmaker. Maybe I am wrong. I am not sure if the tie up delivers “next level information management.” The partnership seems to be a combination of a search vendor with a services company. You can get more information about Exorbyte at www.exorbyte.com.

InovoOlution is a company that offers software and services to help licensees optimize and automate the processing of mail, email, and faxes. The firm asserts that it creates “human information technology.” InovoOlution’s Web page is at http://inovoolution.com/en/

Three observations:

  1. Search vendors are pursuing partnerships. This is a good idea because “selling” a standalone search system is getting more difficult.
  2. Company names are becoming quite interesting. I suppose the good names like “Verity” and “Convera” are difficult to match as everyone chases a snazzy domain to help findability.
  3. The notion of making information technology human is interesting. The assumption is that organizations will have the money to jump to the “next level of information management.”

One hopes the European Union’s economy remains at its present level. The next level might be a reach. Watch out for the double “O” in InovoOlution when you search for the company via a Web search system.

Stephen E Arnold, October 4, 2013

California Inventors’ Patent Application for Natural Language Search Interface Now Available Online

September 27, 2013

The article titled “Multimodal Natural Language Interface for Faceted Search” In Patent Application Approval Process on Hispanic Business reveals that inventors in California have applied for a patent on their natural language interface. The inventors are quoted in the article as claiming that the problem of users implementing a “successful query” revolves around an issue of transparency in the criteria of the search being run. The inventors, Farzad Ehsani and Silke Maren Witt-Ehsani, filed their patent application in February of 2013, and the application was made available online early in September of 2013. The article states:

“Solving this problem requires an interface that is natural for the user while producing validly formatted search queries that are sensitive to the structure of the data, and that gives the user an easy and natural method for identifying and modifying search criteria. Ideally, such a system should select an appropriate search engine and tailor its queries based upon the indexing system used by the search engine. Possessing this ability would allow more efficient, accurate and seamless retrieval of appropriate information.”

The inventors go on to say that current methods fall short of users’ expectations: they neither select the best search engine and data repository nor formulate the search query in the appropriate manner.

Chelsea Kerwin, September 27, 2013


LinkedIn: Search Less and Less Relevant?

September 24, 2013

I read “Today I Deleted My LinkedIn Account”. One of the goslings handles my LinkedIn account. If a comment is required, the gosling alerts me. One of the researchers snags relevant material and crafts either questions or a statement based on my previously published writings. One research librarian filters requests to be my “friend.” The policy is to ask, “When and where did we meet?” For the most part, the system works, but the information flowing through LinkedIn is not directly relevant to the work we do. Like any social media service, the process helps prevent abuses. We did experience a script kiddy who routed our tweets of articles in this blog and our other online publications through Miley Cyrus’ account. When I looked at what the clever teen had done, I learned a great deal about Ms. Cyrus. Great parenting at work I suppose.

The main point of the “Today I Deleted” write up is that LinkedIn is annoying to the author of the write up. I sympathize with folks who are annoyed at online information services today. The good old days of paying to access File 15 on Dialog are long gone. The hassles were mostly the cost of information and the silliness of the dial up terminal with bunny rabbit ears. I bet you don’t know what bunny rabbit ears are, do you?

The numbers the author presents are astounding. LinkedIn, according to the write up, has 225 million “members.” I am not sure how many are like me, operating through research professionals who are paid to ride herd on social interactions. I am not sure how many are human resource professionals looking to make a buck by referring a person whom the HR professional does not know to a company about which the HR professional knows only a bit more.

I surmise that the majority of the 225 million are people looking for:

  • Work
  • Human contact albeit digitally intermediated
  • Information about something that will yield money, power, or prestige
  • A way to kill time whilst “looking at potentially high value content”
  • Horn tooting.

The write up focuses on what LinkedIn does to a particular user. For example, LinkedIn emails are annoying. A more interesting aspect of LinkedIn surfaces in this statement:

For the quarter ending June 2013, Facebook reported 1,155,000,000 monthly active users. Calling their original registration numbers ~1,300,000,000 (which is generous), that means that 88% of Facebook’s users actually use the site regularly.

Compare that to LinkedIn, which claims that 170,000,000 of its 218,000,000 users logged in during the quarter ending March 2013, for a total of closer to 77%. That number actually understates the disparity, because it just measures unique visitors.

While LinkedIn users spend an average of 8 minutes on the site daily, Facebook users hang round for over 33 minutes, or OVER HALF AN HOUR each. In fact, LinkedIn puts this problem much better than I can:

“The number of our registered members is higher than the number of actual members and a substantial majority of our page views are generated by a minority of our members. Our business may be adversely impacted if we are unable to attract and retain additional members who actively use our services.” (source)

(traffic stats: Facebook, LinkedIn; SEC data: LinkedIn, Facebook).

You should read the original post.
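The percentages in the quoted passage are simple divisions, so they are easy to rerun. The sketch below uses only the figures quoted above; the 1.3 billion Facebook registration count is the quoted author's own estimate, not a reported number, and the function name is mine.

```python
def active_share(active: int, registered: int) -> float:
    """Active users as a percentage of registered users."""
    return 100.0 * active / registered


# Figures as quoted in the post; the Facebook registration total is the author's estimate.
facebook = active_share(1_155_000_000, 1_300_000_000)
linkedin = active_share(170_000_000, 218_000_000)

print(f"Facebook: {facebook:.1f}%")
print(f"LinkedIn: {linkedin:.1f}%")
print(f"Daily minutes, Facebook vs LinkedIn: {33 / 8:.1f}x")
```

Run as written, the LinkedIn share works out to about 78 percent, a shade above the “closer to 77%” in the quoted post, and the Facebook share lands just under 89 percent. The gap between the two services is the same either way; the time-on-site ratio of roughly four to one is the more striking number.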

What struck me is that search or finding information within LinkedIn is not mentioned as an issue. LinkedIn hired a Googler to supplement their open source search team. I find that looking for content using the LinkedIn search box is a very interesting process. A direct query leads to the request to log in. (I call the gosling to find out what my user name and password are.) Once logged in, I am asked to upgrade to a paying service. I ignore that and go to third party search systems.

I can access some interesting LinkedIn information using the services which I highlight in my ISS World lecture this week. It appears that some LinkedIn information is indexed by third party services. A click on a link from some of these third party services displays the person’s profile. In some cases, I can view the people in some way “related” to the person about whom I seek information. I find this interesting because I have not been able to answer these questions:

  • What services index LinkedIn content?
  • How much information is available to third party services either via the LinkedIn tools, deals, or just clever spidering?
  • What are the constraints on the use of the LinkedIn data within the third party indexes?

It makes sense to me that LinkedIn would want some of its content in various third party indexes. Because LinkedIn’s search function is unsatisfactory for my purposes, I find the third party approach more helpful to me.

What annoys me about LinkedIn is not its play to make lots of money. I don’t care too much about spam which is easily filtered. I don’t care a whit about the ego centric nature of the system.

I care about search, and I sure hope that LinkedIn improves its search system and I hope it makes explicit what services index LinkedIn content with or without explicit permission.

But saying, doing, and appearing are very different things in today’s challenging business environment. I may get a gosling to look into third party indexing of LinkedIn. For now, I boil down much of an online system’s value to search. For me, that’s the key function. LinkedIn, like other social media systems, wants to focus on other features. Too bad. I think that part of the value of LinkedIn is its content, however flawed. Access would urge me to pay more attention to a service fueled by financial need/desperation, professional branding/visibility, and sales/marketing.

Stephen E Arnold, September 24, 2013
