Faceted Search: From the 1990s to Forever and Ever

January 4, 2015

Keyword retrieval is useful. But it is not good for some tasks. In the mid 1990s, Endeca’s founders “invented” a better way. The name that will get you a letter from a lawyer is “guided navigation.” The patents make clear the computational procedure required to make facets work.

The more general name of the feature is “faceted navigation.” For those struggling with indexing, faceted navigation “exposes” the users to content options. This works well if the domain is reasonably stable, the corpus small, and the user knows generally what he or she needs.
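The core computation behind faceted navigation fits in a few lines. First, filter the result set by the facet values the user has selected. Then count the remaining values for each facet, so the interface only offers choices that still lead somewhere. The Python below is a minimal sketch under stated assumptions. The toy catalog, its field names, and the helper functions are hypothetical. They are not Endeca's patented procedure or any vendor's API.

```python
from collections import Counter
from typing import Dict, Iterable, List

Doc = Dict[str, str]

# Hypothetical toy catalog; fields and values are illustrative only.
CATALOG: List[Doc] = [
    {"title": "Trail shoe", "brand": "Acme", "color": "red", "size": "M"},
    {"title": "Road shoe", "brand": "Acme", "color": "blue", "size": "L"},
    {"title": "Rain jacket", "brand": "Zenith", "color": "red", "size": "L"},
    {"title": "Wool hat", "brand": "Zenith", "color": "grey", "size": "M"},
]


def apply_filters(docs: Iterable[Doc], filters: Dict[str, str]) -> List[Doc]:
    """Keep documents that match every selected facet value (AND across facets)."""
    return [d for d in docs if all(d.get(f) == v for f, v in filters.items())]


def facet_counts(docs: Iterable[Doc], facets: Iterable[str]) -> Dict[str, Counter]:
    """Count each facet's values over the current (already filtered) result set."""
    docs = list(docs)  # materialize so each facet can make its own pass
    return {f: Counter(d[f] for d in docs if f in d) for f in facets}


if __name__ == "__main__":
    # The user clicks "color: red"; the remaining brand and size choices are recounted.
    results = apply_filters(CATALOG, {"color": "red"})
    print([d["title"] for d in results])
    print(facet_counts(results, ["brand", "size"]))
```

The counts are recomputed over the filtered results, so a facet value with zero matches is never offered. This is one way faceted navigation avoids the dead-end clicks that keyword guessing produces. A production system would derive these counts from an index rather than scanning every record. That scanning cost is part of why the approach suits a small, stable corpus.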

To get a useful review of this approach to findability, check out “Faceted Navigation.” Now five years old, the write up will add logs to the fires of taxonomy. However, faceted search is not next generation information access. Faceted navigation is like a flintlock rifle used by Lewis and Clark. Just don’t try to kill any Big Data bears with the method. And Twitter outputs? Look elsewhere.

Stephen E Arnold, January 4, 2015

Bloomreach: Googlers, MBAs, and $41 Million in Funding

January 3, 2015

Founded in 2009, Bloomreach is now popping up in my Overflight system. The company is buying Google ads and publishing a blog written by Bloomreach’s storyteller. The company is a “personalized discovery platform.” The angle seems to be ecommerce search, which will probably make EasyAsk, Endeca, and SLI Systems long for the day when MBAs ignored search for more glamorous endeavors.

The company offers an interesting mix of marketing oriented search services. There is hosted search and consulting. I noted a bit of search engine optimization as well. And, not surprisingly, there is some “Big Data marketing” lingo too.

Information about the company is available at this link.

Stephen E Arnold, January 4, 2015

Enterprise Search: Parkour for Venture Funded Enterprise Search Vendors

January 3, 2015

Parkour refers to the sport of jumping and climbing on man-made structures. Note that most of these "obstacles" have doors, staircases, and maybe elevators.

There are some terms that make this seemingly crazy activity sound really cool. For example, whilst on vacation, I learned about the KONG. This is a saut de chat and involves "diving forward over an obstacle so that the body becomes horizontal, pushing off with the hands and tucking the legs such that the body is brought back to a vertical position, ready to land." See Parkour Terminology.

I also found this maneuver fascinating:

Kash vault: This vault is a combination of two vaults; the cat pass and the dash vault. After pushing off with the hands in a cat pass, the body continues past vertical over the object until the feet are leading the body. The kash vault is then finished by pushing off the object at the end, as in a dash vault.

Here’s an image of a parkour expert doing parkour, of course:

image

Image source: http://parkourfreerunningblog.com/wp-content/uploads/2011/10/parkour.jpg

Now this looks like something a crazy person does: jumping off a large concrete structure. Just my opinion, of course.

And, from my point of view, parkour is very similar to selling proprietary enterprise search and content processing solutions to commercial enterprises. The danger comes from having to pay back stakeholders for the cash borrowed to keep the enterprise search company afloat. The thrill comes from the knife edge underfoot: one error and some serious pain results. I suppose this focuses the mind.

As 2015 gets underway, enterprise search "experts" and vendors are gearing up to make sales. Some of the antics are beneficial to the mid tier consulting firms and publications that list the "visionaries," the "companies that matter," and the "leaders." There are individual experts who conflate search with mastering Big Data or delivering the fuzzy wuzzy notion of information governance. Then there are the search vendors who wrap keyword search and classification in Dollar General wrapping paper. The idea is that keyword search is customer relationship management, analytics, and business intelligence.

For me, this is search vendor parkour, and it is okay for the tiny percentage of the population who want to jump off man-made structures. But for a person with a bit of information retrieval perspective, there are some other ways to get some exercise, remain whole, and not look absolutely crazy to an outside observer.

Here are some enterprise search realities to ponder this weekend:

First, if IBM and HP actually hit their magical billion dollar goals for Watson and IDOL, how much money will be left for the hundreds and hundreds of smaller search system vendors? The answer is, "Generating billions from search is not possible, and the money available tends to be a tiny fraction of these behemoths' projections."

Second, why would a company pay for a commercial keyword search system when there are perfectly functional open source solutions like Elasticsearch, FLAX, and SphinxSearch?

Third, how can keyword search enriched with some clustering deliver actionable intelligence? There are companies specializing in delivering actionable intelligence. Such firms as BAE and Leidos have robust platforms that collect, analyze, and report automatically. Guessing which words unlock the treasures of an index seems somewhat old fashioned to me.

Fourth, how will the companies pouring millions upon millions into Attivio, BA Insight, Coveo, and a dozen other keyword search companies get their money back? I suppose there is the hope that Google, Microsoft, or Oracle will buy one of these firms. But that looks like a long shot. My view is that paying back the investors is going to be difficult, if not impossible.

Now these statements are sobering. One can immerse oneself in that baloney generated by the mid tier consultants (one of which Dave Schubmehls my research), the silliness generated by content management blogs about findability, and the wonkery of search engine optimization wizards.

The year 2015 will witness some significant shifts in the enterprise search landscape. In my forthcoming CyberOSINT: Next Generation Information Access, I explain the type of systems that are underpinning intelligence systems in the US and EC nations. I point out the specific functionalities of these next generation systems that make search a utility. Think of Mac OSX and its inclusion of Spotlight. Nice to have, for sure, but search is not OSX. My research team and I also identify some important lessons the NGIA vendors are teaching their customers. We also look ahead and identify some research areas that are likely to capture investors’ attention and yield measurable results.

Search is a utility. The fact that some brave people convert it to parkour does not change the fact that the activity itself is risky, entertaining, and useless. If I were an athlete, which I am not, I would focus on sports that generate the big bucks. Hoops. Football. Soccer. Parkour? That looks nuts from my vantage point in Harrod’s Creek.

Why not sell something the customer can see solves a problem? Crazy jumps just call attention to the last gasps of a software sector that needs life support.

Stephen E Arnold, January 3, 2015

Qwant, Not Quant

January 3, 2015

Remember Pertimm? No problem. I scanned Techmeme yesterday and noted a link to a story about Qwant, another Google killer from France and the publishing wizards at Axel Springer. You might have some trouble locating the service because Techmeme spelled Qwant "correctly" if you live in Silicon Valley:

image

I covered Qwant in one of my for fee Information Today columns. I won’t recycle that analysis here.

The "news" is that Qwant is going to roll out a child friendly version of its search system. Here's the interface for Qwant. I wonder how many children can figure out what's what.

image

Notice that the blank column contains news about my query “Qwant child friendly.” What do you think about a service that doesn’t present news about itself?

Fascinating. Will French children be thwarted by Qwant's effort to protect them from adult content? LA schools found out that blocked iPads were a no-brainer to convert from school stuff to more thrilling content.

Stephen E Arnold, January 3, 2015

List of Cyber Security Companies

January 3, 2015

Short honk: Cyber is hot. Cyber security is even hotter. Some, well, most, of the cyber outfits are not household names. A blue chip consulting firm has produced a list of 100 of these cyber security outfits. If you want the list, navigate to New United's article "Top 100 Cyber Security Companies: Ones to Watch in 2016." Keep in mind that this list probably includes some of the prospects that the consulting firm wants to convert into paying customers. Nevertheless, the list is interesting if incomplete.

Stephen E Arnold, January 3, 2016

Yahoo: The Ghosts of Christmas Past

January 2, 2015

I read “The Day Marissa Mayer’s Honeymoon at Yahoo Ended.” The write up did not mention Ms. Mayer’s penchant for arriving late. That’s a plus. The article states:

Why was Mayer throwing away all the goodwill she had earned with a series of policies that were, at best, poorly rolled out and badly explained to employees or, at worst, plain mistakes. They wondered, more seriously than at any time since she joined, if Mayer was actually up for the job of saving Yahoo.

What Ms. Mayer did, however, as many in attendance will recall, was read a children’s book. The article points out:

No one understood what Mayer was trying to say.

The article walks through a number of interesting managerial actions, including the variation on Neutron Jack’s winnowing of the troops in GE’s business units. Yep, he actually yelled in the meeting I had the thrill of attending. He also turned red. I know that fear was part of the method. Did not work for me, however.

The article provides a useful list of Googley actions that used to work at the GOOG. At Yahoo, the shadow of Semel created a different ethos. Resignation? Indifference? I am not sure.

If you want more about missteps, you will be interested in the book the article promotes. Why not advertise on Yahoo?

In my opinion, Yahoo is wending its way to the same fate that befell Lycos. Is there a Marley amongst the Yahooligans?

Stephen E Arnold, January 2, 2015

Watson Goes Open Source…Not Really

January 2, 2015

IBM's Watson is becoming a new natural language processing analytical tool. It is doubtful that IBM will ever expose Watson's guts to the open source community, but parts of its internal software organs were designed around existing open source work. Also, do not doubt the open source community's resourcefulness. The community is already building its own Watson-like entities. InfoWorld lists these open source projects in "Watson Wannabes: 4 Open Source Projects For Machine Intelligence."

DARPA DeepDive is an automated system for classifying unstructured data that emulates Watson's decision-making process with human guidance. Christopher Re of the University of Wisconsin developed it.

Apache Unstructured Information Management Architecture (UIMA) is a framework that was actually used to build Watson. It is a standard for performing analysis on textual content. IBM's UIMA framework is available via the open source Apache Software Foundation. It is not a complete machine learning system and only offers the minimum code to build on.

OpenCog’s goal is to build a platform for developers to build and share artificial intelligence programs. OpenCog wants to help create intelligent systems that have humanlike world understanding rather than being focused on one specific area. OpenCog is already using NLP, making it a practical solution similar to Watson.

The Open Advancement of Question Answering Systems (OAQA) is more akin to Watson than the other three. It offers an advanced question answering system using NLP. IBM and Carnegie Mellon University started it. OAQA is only a toolkit, not a downloadable solution.

“The one major drawback to each project, as you can guess, is that they’re not offered in nearly as refined or polished a package as Watson. Whereas Watson is designed to be used immediately in a business context, these are raw toolkits that require heavy lifting. Plus, Watson’s services have already been pre-trained with a curated body of real-world data. With these systems, you’ll have to supply the data sources, which may prove to be a far bigger project than the programming itself.”

All too true.

Whitney Grace, January 02, 2015
Sponsored by ArnoldIT.com, developer of Augmentext

A SASsy Hadoop Data Connection

January 2, 2015

It has been a while since we posted an article that highlights Hadoop’s capabilities and benefits. The SAS Data Management blog talks about how data sources are increasing and Hadoop can help companies organize and use their data: “The Snap, Crackle, And Pop Of Data Management On Hadoop.”

SAS is a leading provider of data management solutions, including an entire line based on the open source Hadoop software. They offer several ways to control data, including the FROM, WITH, and IN options. While the names are simple, each sums up its process in one word.

SAS FROM allows users to connect to the Hadoop cluster. It connects to Hadoop using a SAS/ACCESS engine, which collects metadata built in Hadoop and makes it available in the data flows. This allows the software to make performance decisions without user intervention.

SAS WITH is more complicated because of its give-and-take function:

“The SAS WITH story provides transformation capabilities not yet available in Hadoop. UPDATE and DELETE are standard SQL transformations used in a variety of data processing programs. Hive does not yet support these functions, but you can utilize PROC IMSTAT (part of the WITH story) to lift a table or partition into memory and perform these functions in parallel. The table or partition could then be reincorporated into the Hive table, alleviating the need to truncate and reload from an RDBMS data source.”
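The workaround described in the quote can be illustrated without SAS. It has three steps: lift a partition into memory, apply UPDATE and DELETE there, and put the rewritten partition back. The Python below is a hypothetical sketch. The partitioned table, the helper names, and the thread pool standing in for a cluster's parallel workers are all assumptions. None of it reflects PROC IMSTAT's actual behavior or syntax.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical partitioned table: partition name -> list of rows.
TABLE: Dict[str, List[Row]] = {
    "2014-12": [{"id": 1, "status": "open"}, {"id": 2, "status": "void"}],
    "2015-01": [{"id": 3, "status": "open"}, {"id": 4, "status": "open"}],
}


def rewrite_partition(rows: List[Row],
                      update: Callable[[Row], Row],
                      delete_if: Callable[[Row], bool]) -> List[Row]:
    """Apply a DELETE, then an UPDATE, to an in-memory copy of one partition."""
    kept = [row for row in rows if not delete_if(row)]  # like DELETE ... WHERE
    return [update(dict(row)) for row in kept]          # like UPDATE, on copies


def update_delete_table(table: Dict[str, List[Row]],
                        update: Callable[[Row], Row],
                        delete_if: Callable[[Row], bool],
                        workers: int = 4) -> Dict[str, List[Row]]:
    """Rewrite every partition concurrently and return the replacement partitions."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(rewrite_partition, rows, update, delete_if)
                   for name, rows in table.items()}
        # Swapping these lists in plays the role of reincorporating partitions.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    new_table = update_delete_table(
        TABLE,
        update=lambda r: {**r, "status": "closed"} if r["id"] == 3 else r,
        delete_if=lambda r: r["status"] == "void",
    )
    print(new_table)
```

The original `TABLE` is left untouched, which mirrors the point of the quote: there is no need to truncate and reload the source. In CPython, threads run these small tasks concurrently rather than truly in parallel. A real deployment would spread the partitions across cluster nodes.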

SAS IN has the most advanced coding capabilities for data management. It allows users to run a program that can execute eight functions in parallel against Hadoop data tables. They can also use the DS2 language to perform difficult transformations of a table in parallel.

SAS's three new Hadoop interactions allow for better streamlining of data from multiple sources and provide more insight into industry applications.

Whitney Grace, January 02, 2015
Sponsored by ArnoldIT.com, developer of Augmentext

Need Some Emails?

January 1, 2015

I read "Why Deleting Sensitive Information from Github Doesn't Save You." The write up is intended for developers. The information in the article makes it easy to suck up Github content and extract several million live emails. Here's an example from the write up:

GHTorrent advertises itself as an “offline mirror of data”. In a nutshell, it keeps track of a ton of data that flows through Github’s Events API stream, and recursively resolves dependencies to relate, say, a commit object to an event object. Currently, they suggest they have accumulated the data from 2012-2014. This database has incredible potential for researchers, but also allows for hackers to pull previously deleted or changed data en masse. Granted, from what I can tell they don’t store the actual file content (so your accidentally committed password won’t be stored), but that doesn’t mean that there isn’t sensitive data to be had.

Want to know how? Just navigate to the original story.

Stephen E Arnold, January 1, 2015

A Big NGIA Year Ahead

January 1, 2015

The New Year is upon us. We will be posting a page on Xenky where you can request a copy of CyberOSINT: Next Generation Information Access, a link to the seminar, which is limited to law enforcement and intelligence professionals, and some supplementary information that will allow my Beyond Search blog to shift from dead-end enterprise search to the hottest topics in information access.

If you want information about CyberOSINT: Next Generation Information Access, you can send an email to benkent2020 at yahoo dot com. We will send you a one pager about the study. To purchase the book, you must be an active member of the armed forces, a working law enforcement professional, or an individual working for one of the recognized intelligence agencies we support; for example, a NATO member’s intelligence operation.

Stephen E Arnold, January 1, 2015
