No Wonder Search Is Broken. Software Does Not Work.

September 17, 2012

Several years ago, I ran across a Microsoft-centric podcast hosted by an affable American, Scott Hanselman. At the time, he worked for a company developing software for the enterprise. Then I think he started working at Microsoft, and I lost track of him.

I read “Everything’s Broken and Nobody’s Upset.” The author was Scott Hanselman, who is “a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee.”

The article is a list of bullet points. Each bullet point identifies a range of software problems. Some of these were familiar; for example, iPhoto’s choking on large numbers of pictures on my wife’s new Mac laptop. Others were unknown to me; for example, the lousy performance of Gmail. Hopefully Eric Brewer, founder of Inktomi, can help improve the performance of some Google services.

[Image: Answer to the Google query “Why are Americans…”]

The problems Mr. Hanselman identifies can be fixed. He writes:

Here we are in 2012 in a world of open standards on an open network, with angle brackets and curly braces flying at gigabit speeds and it’s all a mess. Everyone sucks, equally and completely.

  • Is this a speed problem? Are we feeling we have to develop too fast and loose?
  • Is it a quality issue? Have we forgotten the art and science of Software QA?
  • Is it a people problem? Are folks just not passionate about their software enough to fix it?

I think it’s all of the above. We need to care and we need the collective will to fix it.

My reaction was surprise. I know search, content processing, and Fancy Dan analytics do not work as advertised, as expected, or, in some cases, very well despite the best efforts of rocket scientists.

The notion that the broad world of software is broken was interesting. Last week, I struggled with a client who could not explain what its new technology actually delivered to a user. The reason was that the words the person used did not match what the new software widget actually did. Maybe the rush to come up with clever marketing catchphrases is more important than solving a problem for a user?

In the three disciplines we monitor—search, content processing, and analytics—I do not have a broad method for remediating “broken” software. My team and I have found that the approach outlined by Martin White and me in Successful Enterprise Search Management is simply ignored by those implementing search. I can’t speak for Martin, but in my experience, the people who want to implement a search, content processing, or analytics system share certain characteristics. Not every item applies to every organization, but I have gathered the most frequent actions and statements over the last year. The reasons for lousy search-related systems:

  • Short cuts only, please. One consultant explained that buying third-party components was cheaper, quicker, and easier than looking at the existing search-related system.
  • Something for nothing. The idea is that a free system is going to save the day.
  • New is better. The perception that a new system from a different vendor will solve the findability problem simply because it is different.
  • We are too busy. The belief that talking to the users of a system was a waste of time. The typical statement about this can be summarized, “Users don’t know what they want or need.”
  • No appetite for grunt work. This is an entitlement problem: figuring out metrics like content volume, processing issues for content normalization, and reviewing candidate term lists is seen as someone else’s job or as too hard.
  • No knowledge. This is a weird problem caused in part by point-and-click interfaces and predictive systems like Google’s. Those who should know about search-related issues do not. Therefore, education is needed. Like recalcitrant 6th graders, those involved will not put in the effort required to learn.
  • Looking for greener pastures. Many of those working on search-related projects are looking to jump to a different, higher paying job in the organization or to leave the company to do a startup. As a result, search-related projects are irrelevant to them.

The problem in search, therefore, is not the technology. Most of the systems are essentially the same as those which have been available for decades. Yes, decades. Precision and recall remain in the 80 percent range. Predictive systems chop down data sets to more usable chunks but prediction is a hit and miss game. Automated indexing requires a human to keep the system on track.
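For those who want the arithmetic behind that 80 percent claim, here is a minimal sketch in Python. The relevance judgments are invented for illustration; real evaluations use human-assessed test collections.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: set of document IDs the engine returned
    relevant:  set of document IDs a human judged relevant
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved)  # how much of what came back is useful
    recall = len(hits) / len(relevant)      # how much of the useful material came back
    return precision, recall

# Invented judgments: the engine returns 10 documents, 8 of which are
# relevant, out of 10 relevant documents in the collection.
retrieved = {f"doc{i}" for i in range(10)}
relevant = {f"doc{i}" for i in range(2, 12)}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=80% recall=80%
```

The rub is that tuning a system to push one number up usually pushes the other down, which is why the pair has been stuck in the same neighborhood for decades.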

The problem is anchored in humans: Their knowledge, their ability to prioritize search related tasks, their willingness to learn. Net net: Software is not getting much better, but it is prettier than a blinking dot on a VAX terminal. Better? Nah. Upset? Nope, there are distractions and Facebook pals to provide assurances that everything is A-OK.

Stephen E Arnold, September 17, 2012

Sponsored by Augmentext

IBM and Its Predictive Analytics Push

September 12, 2012

I prefer to examine the plumbing of search and content processing systems. What is becoming increasingly obvious to me is that many of the “new” business intelligence and eDiscovery vendors are licensing technology and putting a different user interface on what is a collection of components.

Slap on visualization and some game-like controls and you have “big data analytics.” Swizzle around the decades-old technology from Oracle, and you still find the Oracle database system. Probe the Hadoop vendors, and you find fancy dancing away from the batch orientation of the NoSQL data management framework. Check out the indexing subsystems, and you find third parties with a handful of customers licensing their technology to a “wrapper company.”

The phrase “wrapper company” and the product approach of “wrapper bundles” are now dressed up in clever marketing lingo. The notions of federation, real time, and distributed data are woven into systems which predict, permit discovery, and allow users to find answers to questions they did not know to ask.

Everything sounds so “beyond search.” I think what sells many licensees and prospects is the visualizations in the demos and the promise that a business professional can use these systems without knowing about the underlying data, programming, or statistical methods. Who wants to pay for a person to babysit a system and write custom reports? Chop that headcount because the modern systems are “smart.”

Next generation analytics systems are, like enterprise search, composed of many moving parts. For most professionals, the “moving parts” are of little interest and even less frequently scrutinized. Users want answers or information without having to do much more than glance at a visual display. The ideal system says, “Hello, Dave, here’s what you need to know right now.”

The IBM Ad

I noted an advertisement in the Wall Street Journal on September 10, 2012, page A20. The advertiser was IBM. The full page ad featured the headline, “We Used to Schedule Repairs.” The idea is that smart software monitors complex systems and proactively finds, repairs, and notifies before a system fails.

Sounds fantastic.

The ad asserts:

Fixing what will break next, first. Managing [the client’s] infrastructure proactively rather than reactively has helped the utility reduce its customer calls by 36 percent.

The argument concludes:

Replacing intuition with analytics. No one knows your organization’s millions of moving parts better than you. But now with IBM predictive maintenance, you can spend less time and fewer resources repairing things either too early or too late, and more time focusing your attention on what happens next.

The ad points me to an IBM page.


Snappy visualizations, the phrase “smarter analytics,” and a video round out the supplemental information.

Observations

Three observations:

  1. IBM has the resources to launch a major promotion of its predictive analytics capabilities. The footprint of IBM in this concept space may boost interest in analytics. However, smaller firms will have to differentiate themselves and offer the type of benefits and customer references IBM showcases.
  2. The approach of the copy in the ad is to make predictive analytics synonymous with smart management and cost effective systems. Many of the analytics companies struggle to articulate a clear value proposition like this.
  3. The notion of making a smarter information technology department fits into IBM’s broader message of a smarter planet, city, government, etc. Big ideas like this are certainly easier to grasp than the nitty gritty, weaknesses, and costs of computationally canned methods.

For smaller analytics vendors, it is game on.

Stephen E Arnold, September 12, 2012

Sponsored by Augmentext

More on Marketing Confusion in Big Data Analytics

September 11, 2012

Search vendors are like squirrels dodging traffic. Some make it across the road safely. Others? Well, there is a squirrel heaven, I assume. Which search vendors will survive the speeding tractor trailers carrying big data, analytics, and visualization to customers famished for systems which make sense of information? I don’t know. No one really knows.

Do squirrels understand high-speed, high-volume traffic? A happy quack to http://surykatki.blox.pl/html/1310721,262146,14,15.html?7,2007 for a fierce squirrel image.

What is fascinating is to watch the Darwinian process at work among vendors of search and content processing. TextRadar’s “Content Intelligence: An Unexpected Collision Is Coming” makes clear that there are quite a few companies not widely known in the financial and health care markets. Some of these companies have opportunities to make the leap from government contract work to commercial work for Fortune 1000 companies.

But what about more traditional search vendors?

I received in the snail mail a copy of Oracle Magazine, September/October 2012. The article which caught my attention was “New Questions, Fast Answers.” The information was in the form of an interview between Rich Schwerin, an Oracle Magazine writer, and Paul Sonderegger, senior director of analytics at Oracle. Mr. Sonderegger was the chief strategist at Endeca, which is now part of the Oracle family of companies.

I have followed Endeca since I first learned about the company in 1999, 13 years ago. Like many traditional search vendors, the underlying technical concepts of Endeca date from the salad days of key word search. Endeca’s innovation was to use concepts, either human-assigned or generated by software, to group related information. The idea was that a user could run a query and then click on concepts to “discover” information not in the explicit key word match. Endeca dubbed the function “guided navigation” and applied the approach to eCommerce as well as search across the type of information found in a company. The core of the technology was the “Endeca MDEX” engine. At the time of Endeca’s market entrance, there were only a handful of companies competing for enterprise search and eCommerce. In the last two decades the field has narrowed in one sense, with the big name companies acquired by larger firms, and broadened in another. There are hundreds of vendors offering search, but the majority of these companies use different words to describe indexing and search.
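Guided navigation is simple enough to sketch. Here is my own toy illustration in Python, with invented documents and facet names; Endeca’s MDEX engine is, of course, far more elaborate:

```python
from collections import Counter

# Invented sample catalog. In a real deployment the facet values would be
# human-assigned or generated by classification software.
docs = [
    {"title": "Red running shoe", "brand": "Acme", "category": "footwear"},
    {"title": "Trail running shoe", "brand": "Zenith", "category": "footwear"},
    {"title": "Running jacket", "brand": "Acme", "category": "outerwear"},
]

def search(keyword, **facet_filters):
    """Keyword match first, then narrow by whatever facet values were clicked."""
    hits = [d for d in docs if keyword in d["title"].lower()]
    for facet, value in facet_filters.items():
        hits = [d for d in hits if d[facet] == value]
    return hits

def facet_counts(hits, facet):
    """The counts displayed beside each facet value to guide the next click."""
    return Counter(d[facet] for d in hits)

hits = search("running")
print(facet_counts(hits, "brand"))      # Counter({'Acme': 2, 'Zenith': 1})
print(search("running", brand="Acme"))  # narrowed without typing a new query
```

The point is that the user never composes a second query; the system proposes the narrowing steps.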

One Endeca executive (Peter Bell) told me in 2005 that the company had been growing at 100 percent each year since 2002. At the time of the Oracle buy out, I estimated that Endeca had hit about $150 million in revenues. Oracle paid about $1.1 billion for the company or what, if I am accurate, amounts to roughly seven times annual revenues. Endeca was a relative bargain compared to Hewlett Packard’s purchase of Autonomy for $10 billion. Autonomy, founded a few years before Endeca, had reached about $850 million in annual revenues, so the multiple on revenues was greater than the Endeca deal. The point is that both of these search giants ranked one and two in enterprise search revenues. Both companies emphasized their technologies’ ability to handle structured and unstructured information. Both Autonomy and Endeca offered business intelligence solutions. In short, both companies had capabilities which some of the newcomers mentioned in the Text Radar article are now touting as fresh and innovative. One key point: It took Endeca 13 years to hit $150 million, and now Oracle has to generate more revenue from the aging Endeca technology. HP has the same challenge with Autonomy, of course. Revenue generation, in my opinion, has been time consuming and difficult. Of the hundreds of vendors past and present, only two have broken the $150 million revenue barrier. Google and Microsoft would be quick to point out that their search systems are far larger, but these are special cases because it is difficult to unwrap search revenues from other revenue streams.

What does Mr. Sonderegger say in this Oracle Magazine interview? Let me highlight three points and urge you to read the full text of his remarks.

Easy Access

First, business users do not know how to write queries, so “guided navigation” services are needed. Mr. Sonderegger noted:

There has to be some easy way to explore, some way to search and navigate as easily as you do on an e-commerce site.

Most of the current vendors of analytics and findability systems seem to have made the leap from point-and-click to snazzy visualizations. The Endeca angle is that users want to discover and navigate. The companies referenced in the Text Radar story want to make the experience visual, almost video-game like.

Read more

More Content Processing Brand Confusion

September 7, 2012

On a call with a so-so investment outfit once spun out of JP Morgan’s empire, the whiz kids asked me to name some interesting companies I was monitoring. I spit out two or three. One name created a pause. The spiffy young MBA asked me, “Are you tracking a pump company?”

I realized that when one names search and content processing firms, the name of the company and its brand are important. I was referring to an outfit called “Centrifuge,” a firm pursuing, along with dozens if not hundreds of others, the big data rainbow. The company has an interesting product, and you can read about the firm at www.centrifugesystems.com.

Now the confusion. Google thinks Centrifuge business intelligence is the same as centrifuge coolant sludge systems. Interesting.


There is a pump and valve outfit, Centrisys, at www.centrisys.us. This outfit, it turns out, has a heck of a marketing program. On YouTube, a search for “centrifuge systems” returns a raft of information timber about viscosity, manganese phosphate, and lead dust slurry.

I have commented on the “findability” problem in the search, analytics, and content processing sector in my various writings and in my few and far between public speaking engagements. My 68 years weigh heavily on me when a 20-something pitches a talk in some place far from Harrod’s Creek, Kentucky.

The semantic difference between analytics and lead dust slurry is obvious to me. To the indexing methods in use at Baidu, Bing, Exalead, Google, Jike, and Yandex—not so much.

How big of a problem is this? You can see that Brainware, Sinequa, Thunderstone, and dozens of other content-centric outfits are conflated with questionable videos, electronic games, and Latin phrases. When looking for these companies and their brands via mobile devices, the findability challenge gets harder, not easier. The constant stream of traditional news releases, isolated blog posts, white papers which are much loved by graduate students in India, and Web collateral miss their intended audiences. I prefer “miss” to the blunt reality of “unread content.”

I am going to start a file in which to track brand confusion and company name erosion. Search, analytics, and content processing vendors should know that preserving the semantic “magnetism” of a word or phrase is important. It surprises me that I can run a query and get links to visual network analytics alongside high performance centrifuges. Some watching robots pay close attention to the “centrifuge” concept, I assume.

Brand management is important.

Stephen E Arnold, September 7, 2012

Sponsored by Augmentext

Can Brainware and ISYS Search Get Lexmark Back on Track?

September 5, 2012

I was surprised to learn that Lexmark in Lexington, Kentucky, was getting into the search and retrieval, content processing, and indexing business. I had a meeting at a Lexmark facility a couple of years ago, and I was struck by the absence of activity in what was and probably still is a very large building. The meeting was held in the “library” for one of the firm’s units. Quiet. Search was a challenge. I left the meeting wondering how the employees found repair data, training manuals, proposals, and technical reference information.

Then I learned that in a short span of time in early 2012, Lexmark purchased Brainware. You may know that Brainware was originally a search vendor. The technology which worked the firm’s retrieval magic was based on trigrams, or three letter sequences. The query terms were parsed into three letter groups. Documents containing the query’s three letter groups were identified and ranked by trigram match. There are numerous technical details associated with the patented technology. The point is that Brainware got into back office processing and took off. The search and retrieval business supported the paper-to-digital-to-index business. Brainware landed some juicy accounts. I assumed that Oracle would acquire the company, but I was wide of the mark. Heck, I wasn’t even in the same county. You can get some details about the deal in the Brainware news release, “Lexmark Acquires Brainware.” To beef up Brainware’s back office capabilities, Lexmark also bought Nolij.
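The trigram idea is easy to sketch. The scoring below is my own simplification, not Brainware’s patented method, and the documents are invented:

```python
def trigrams(text):
    """Break a string into overlapping three letter sequences."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def score(query, document):
    """Rank by the fraction of query trigrams the document contains."""
    q = trigrams(query)
    return len(q & trigrams(document)) / len(q)

docs = ["invoice processing manual", "printer repair guide"]
query = "invoice procesing"  # note the typo
ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
print(ranked[0])  # invoice processing manual
```

The charm of the approach is typo tolerance: a misspelled query still shares most of its three letter groups with the correctly spelled document.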

A few days later, Lexmark purchased the ISYS Search Software company. Taking a page from IBM’s magical repositioning of Vivisimo, Lexmark described ISYS as being more than search. Okay. According to the news release about the deal, ISYS’s technology dates from 1988. That works out to almost a quarter century. The ISYS technology will complement Lexmark’s Perceptive Software business. The idea is that Perceptive will be better able to compete in process and content management solutions.

With the closing of the ink jet business, Lexmark is going to have to find a way to generate significant revenues from its search-enabled applications and its search-based businesses.

The question becomes, “Will Lexmark be able to generate significant revenue from search?”

In the annual report for 2005, Lexmark said:

Lexmark makes it easier for businesses and consumers to move information between the digital and paper worlds. Since its inception in 1991, Lexmark has become a leading developer, manufacturer and supplier of printing and imaging solutions for offices and homes. Lexmark’s products include laser printers, inkjet printers, multifunction devices, associated supplies, services and solutions.  Lexmark develops and owns most of the technology for its laser and inkjet products and associated supplies, and that differentiates the company from many of its major competitors, including Hewlett-Packard, which purchases its laser engines and cartridges from third-party suppliers. Lexmark also sells dot matrix printers for printing single and multi-part forms by business users and develops, manufactures and markets a broad line of other office imaging products. The company operates in the office products industry. The company is primarily managed along business and consumer market segments.

With this shift, Lexmark is going in a different direction; that is, buying technology instead of developing it. The announcement that Lexmark was terminating more than 1,000 employees, about half located less than an hour from my goose pond in Harrod’s Creek, Kentucky, was bad news in a state with plenty of bad news already.

How will that work out?

My view is that Lexmark is likely to experience some unwelcome surprises. As you may recall, Hewlett Packard was shocked at Autonomy’s performance once the company was on board. With the departure of a number of key Autonomy executives, including Mike Lynch, Hewlett Packard has become quiet about Autonomy. I assume that the massive write off of the EDS business is occupying the senior managers. Lexmark may be headed for some cost surprises; for example:

  • Brainware incurs some labor costs with its back office sales. Oracle and other companies want to get into this “old fashioned” business, so the marketing costs are likely to go up. How much of a spike will be determined by the appetite of hospitals and other paper-centric operations in a lousy economy and the uncontrollable actions of companies like Oracle.
  • ISYS costs are likely to be a shock as well. ISYS is similar to Fast Search & Transfer, just older. As a result, the costs to keep the system current are likely to grow over time. The fancy new features like text mining are easy to talk about. To build out systems which can compete with services from Digital Reasoning and Quid is another level of investment entirely.
  • Support costs in the search-enabled applications sector are tough to control. A major company may not tolerate a filtering call handled in India and then a wait for an engineer to get involved. Perhaps Lexmark will use ISYS for customer support?

But what could Lexmark do?

Printing is environmentally unacceptable to many people. In addition, a PDF file can be emailed more quickly and cheaply than a paper document can be sent via FedEx. With iPads in the hands of executives, a digital version of a document is good enough.

Like HP, Lexmark is going to have to work some marketing, cost control, and management miracles to get back on the growth path with generous margins. Is it too late for Lexmark to return to revenue glory in the Bluegrass State? Well, I am not willing to go out on a limb. Let’s just watch.

Stephen E Arnold, September 5, 2012

Sponsored by Augmentext

Ohloh Code Enhances Koders.com Search Technology

September 4, 2012

Big news from Black Duck brought to you by the goose pond: Ohloh has enhanced Koders.com.

Ohloh Code, a publicly available, free code search site, has made it possible for users to immediately browse the code of projects, search for particular methods, and see Ohloh commit and LOC information all in the same place. This is an improvement upon Koders.com, which lacked an automated way to let users add and update projects or view sources and parent projects. An announcement post made on Ohloh, “Ohloh + Code = Ohloh Code,” informs us of the changes:

“The Ohloh Code search database is populated and updated from a new, automated integration with Ohloh’s project list. We’ve rebuilt the code search engine (also available for private code search: Code Sight) as an upgrade from Koders.com.  We’ve migrated the entire code base from .NET to Java (our team’s language of choice).[…]

To sum up…

Ohloh (ohloh.net) + Ohloh Code (code.ohloh.net) = our vision to create the most comprehensive and free resource for developers to find and explore open source projects and code.”

We think code searchers will be pleased with Ohloh Code’s results and the enhanced technology. Kudos to the team for integrating more languages, filtering and faceting search results, including preservation of the underscore in search results, and enhancing scalability for indexing and searching. Head on over to Ohloh to learn about more of the changes.

Andrea Hayden, September 4, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Google Updates the Portal from 1996: Info on Multiple Devices

August 30, 2012

The portal never really died. AOL and Yahoo have kept the 1990s “next big thing” alive despite the financial penalties the approach has imposed on stakeholders. There are other portals which are newer versions of the device which slices, dices, and chops. Examples I have looked at include:

  • NewsIsFree, which delivers headlines, alerts, and allows me to find “sources”
  • WebMD, which is a consumer “treat thyself” and aggregation information portal
  • AutoTrader, which provides a vehicle research, loan, and purchasing portal.

Google, when it rolled out 13 years ago, took advantage of search systems’ desire to go “beyond search.” The reasons were easy to identify. Over the years, I have enumerated them many times. Google’s surge was due to then-search giants looking for a way to generate enough revenue to pay for the cost of indexing content. Now there are some crazy ideas floating around “real” consultant cubicles that search is cheap and easy.

Next Generation Portals: Gate to Revenue Hell?

Fear not, lads and lasses.

Search is brutally expensive and—guess what?—the costs keep on rising. The principal reasons are that systems need constant mothering and money. No, let’s not forget the costs which MBAs often trivialize. These include people who can make the system work, run faster, remain online, and keep pace with technology. Telecommunications, power, hardware, and a number of other “trivial” items gobble money like a 12-year-old soccer player chowing down on junk food after practice.

Next Generation Portals: Gate to Revenue Heaven?

Portals promised to be “sticky”, work like magnets and pull more users, and provide a platform for advertising. Portals were supposed to make money when search generated modest amounts of money. First Overture, then Yahoo, and finally Google realized that the pursuit of objectivity was detrimental to selling traffic. Thus, online pay-to-play programs took off. The portals with a lead like Yahoo fumbled the ball. The more clever Googlers grabbed the melon and kept going back to the farmer’s garden for more. Google had, it appeared, figured out how to remain a search system and make lots of money.

No more.

Do we now witness the portalization of Google? Is the new twist that the Google portal will require the user to have multiple devices? Will each device connect to Google to show more portal advertising goodness?

There is a popular impression among some MBAs on Wall Street and “real” consultants that Google is riding the same old money rocket it did in 2004 to 2006. My view is different.

Read more

Search: A Persistent Disconnect between Reality and Innovation

August 17, 2012

Two years ago I wrote The New Landscape of Search. Originally published by Pandia in Norway, the book is now available without charge when you sign up for our new “no holds barred” search newsletter Honk!. In the discussion of Microsoft’s acquisition of Fast Search & Transfer SA in 2008, I cite documents which describe the version of Fast Search which the company hoped to release in 2009 or 2010. After the deal closed, the new version of Fast seemed to drop from view. What became available was “old” Fast.

I read the InfoWorld story “Bring Better Search to SharePoint.” Set aside the PR-iness of the write up. The main point is that SharePoint has a lousy search system. Think of the $1.2 billion Microsoft paid for what seems to be, according to the write up, a mongrel dog. My analysis of Fast Search focused on the system’s age (the code dates from the late 1990s) and its use of proprietary, third party, and open source components. Complexity and the 32 bit architecture were in need of attention beyond refactoring.

The InfoWorld passage which caught my attention was:

Longitude Search’s AptivRank technology monitors users as they search, then promotes or demotes content’s relevance rankings based on the actions the user takes with that content. In a nutshell, it takes Microsoft’s search-ranking algorithm and makes it more intelligent…
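The general pattern is straightforward to sketch. What follows is my guess at a click-feedback mechanism, not Longitude Search’s actual AptivRank code; the names and weights are invented:

```python
from collections import defaultdict

# Running tallies of user behavior per document (invented for illustration).
clicks = defaultdict(int)   # user opened the document from a result list
bounces = defaultdict(int)  # user opened it and immediately backed out

def record(doc_id, bounced=False):
    clicks[doc_id] += 1
    if bounced:
        bounces[doc_id] += 1

def adjusted_score(doc_id, base_score, weight=0.1):
    """Promote documents users act on; demote the ones they abandon."""
    useful_actions = clicks[doc_id] - 2 * bounces[doc_id]
    return base_score * (1 + weight * useful_actions)

record("policy.docx")
record("old-draft.docx", bounced=True)
print(adjusted_score("policy.docx", 1.0))     # 1.1, promoted
print(adjusted_score("old-draft.docx", 1.0))  # 0.9, demoted
```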

The solution to SharePoint’s woes amounts to tweaking. In my experience, there are many vendors offering similar functionality and almost identical claims regarding fixing up SharePoint. You can chase down more at www.arnoldit.com/overflight.

The efforts are focused on a product with a large market footprint. In today’s dicey economic casino, it makes sense to trumpet solutions to long standing information retrieval challenges in a product like SharePoint. Heck, if I had to pick a market to pump up my revenue, SharePoint is a better bet than some others.

Contrast the InfoWorld’s “overcome SharePoint weaknesses” with the search assertions in “Search Technology That Can Gauge Opinion and Predict the Future.” We are jumping from the reality of a Microsoft product which has an allegedly flawed search system into the exciting world of what everyone really, really wants—serious magic. Fixing SharePoint is pretty much hobby store magic. Predicting the future: That is big time, hide the Statue of Liberty magic.

Here’s the passage which caught my attention:

A team of EU-funded researchers have developed a new kind of internet search that takes into account factors such as opinion, bias, context, time and location. The new technology, which could soon be in use commercially, can display trends in public opinion about a topic, company or person over time — and it can even be used to predict the future…Future Predictor application is able to make searches based on questions such as ‘What will oil prices be in 2050?’ or ‘How much will global temperatures rise over the next 100 years?’ and find relevant information and forecasts from today’s web. For example, a search for the year 2034 turns up ‘space travel’ as the most relevant topic indexed in today’s news.

Yep, rich indexing, facets, and understanding text are in use.
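How might such a future predictor work? My guess, and it is only a guess, is that the system indexes documents by the future years they mention and ranks the topics which co-occur with each year. A toy sketch with invented snippets:

```python
import re
from collections import Counter, defaultdict

# Invented snippets standing in for a crawled news corpus.
corpus = [
    "Analysts expect commercial space travel to be routine by 2034.",
    "By 2034 space travel bookings may outpace the orbital hotel trade.",
    "Oil prices in 2050 may hinge on synthetic fuel adoption.",
]
TOPICS = ["space travel", "orbital hotel", "oil prices", "synthetic fuel"]

# Index each future year against the topics mentioned alongside it.
year_topics = defaultdict(list)
for text in corpus:
    for year in re.findall(r"\b20[2-9]\d\b", text):
        year_topics[int(year)] += [t for t in TOPICS if t in text.lower()]

# "Searching" the year 2034 surfaces its most frequently co-mentioned topic.
print(Counter(year_topics[2034]).most_common(1))  # [('space travel', 2)]
```

Useful? Perhaps. But counting co-occurrences is a long way from predicting oil prices.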

What these two examples make clear, in my opinion, is that:

  1. Search is broken. If an established product delivers inadequate findability, why hasn’t Microsoft just solved the problem? If off the shelf solutions are available from numerous vendors, why hasn’t Microsoft bought the ones which fix up SharePoint and called it a day? The answer is that none of the existing solutions deliver what users want. Sure, search gets a little better, but the SharePoint search problem has been around for a decade; if search were such an easy problem to solve, Microsoft, which has the money to do the job, would have solved it. Still a problem? Well, that’s a clue that search is a tough nut to crack in my book. Marketers don’t have to make a system meet user needs. Columnists don’t even have to use the systems about which they write. Pity the users.

  2. Writing about whiz bang new systems funded by government agencies is more fun than figuring out how to get these systems to work in the real world. If SharePoint search does not work, what effort and investment will be required to predict the future via a search query? I am not holding my breath, but the pundits can zoom forward.

The search and retrieval sector is in turmoil, and it will stay that way. The big news in search is that free and open source options are available which work as well as Autonomy- and Endeca-like systems. The proprietary and science fiction solutions illustrate, on one hand, the problems basic search has in meeting user needs and, on the other hand, the lengths to which researchers will go to convince their funding sources and regular people that search is going to get better real soon now.

Net net: Search is a problem and it is going to stay that way. Quick fixes, big data, and predictive whatevers are not going to perform serious magic quickly, economically, or reliably without significant investment. InfoWorld seems to see chipper descriptions and assertions as evidence of better search. The Science Daily write up mingles sci-fi excitement with a government funded program to point the way to the future.

Sorry. Search is tough and will remain a chunk of elk hide until the next round of magic is spooned by public relations professionals into the coffee mugs of the mavens and real journalists.

Stephen E Arnold, August 17, 2012

Sponsored by Augmentext

 

Perfecting Web Site Semantics

August 6, 2012

Web site search is most often frustrating and, at its worst, a detriment to customers and commerce. Fabasoft Mindbreeze, a company heralded for its advances in enterprise search, is bringing its semantic specialization to the world of Web site search with Fabasoft Mindbreeze InSite. Daniel Fallmann, Fabasoft Mindbreeze CEO, highlights the features of the new product in his blog entry, “4 Points for Perfect Website Semantics.”

Fallmann lays out the problem:

The problem: Standard search machines, in particular the one provided by CMS, are unproductive and don’t consider the website’s sophisticated structure. The best example: enter the search term ‘product’ and the search delivers no results, even though product is its own category on the site. Even if the search produces a result for another term, there’s nothing more than a ‘relatively un-motivating list of links,’ not really much help to a website visitor.

Using semantics in the search means that the Web site is being understood, not just keyword searched. Automatic indexing preserves the existing site structure while providing hassle-free search for the customer. In addition, InSite benefits the Web site developer, who can see how users are navigating the site and which elements are most often searched.
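The blog entry does not show code, but the preserve-the-site-structure idea is easy to sketch: treat each page’s URL path as a ready-made category so a query like “product” can land on the product section itself. The pages and URLs below are invented:

```python
from urllib.parse import urlparse

# Invented pages; a real system would crawl these from the site.
pages = [
    {"url": "https://example.com/products/widget-x", "text": "Widget X data sheet"},
    {"url": "https://example.com/products/widget-y", "text": "Widget Y data sheet"},
    {"url": "https://example.com/support/faq", "text": "Frequently asked questions"},
]

def index(pages):
    """Attach each page's top level path segment as a category facet."""
    for page in pages:
        segments = urlparse(page["url"]).path.strip("/").split("/")
        page["category"] = segments[0] or "home"
    return pages

def search(pages, term):
    """Match the page text or the site section, so 'product' is not a dead end."""
    term = term.lower()
    return [p for p in pages
            if term in p["text"].lower() or term in p["category"].lower()]

for hit in search(index(pages), "product"):
    print(hit["category"], hit["url"])
# products https://example.com/products/widget-x
# products https://example.com/products/widget-y
```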

The attractive “behind-the-scenes” functioning of Fabasoft Mindbreeze InSite means that customers benefit from the intuitive, semantic search without the distraction of a clunky search layer.  Satisfy your customers and your developers by exploring InSite today.

Emily Rae Aldridge, August 6, 2012

Sponsored by ArnoldIT.com, developer of Augmentext.

Short Honk: High Value Podcast about Solr

August 4, 2012

If you are interested in Lucene/Solr and have a long commute, you will want to check out Episode 187 of the IEEE Software Engineering Radio podcast. You can find the podcast on iTunes. Grant Ingersoll, one of Lucid Imagination’s experts in open source search and a committer on the Apache Lucene/Solr project, reviews the origins of Lucene, explains the features of Solr, and covers a range of important, hard-to-get search information. According to IEEE, the podcast offers a:

dive into the architecture of the Solr search engine. The architecture portion of the interview covers the Lucene full-text index, including the text ingestion process, how indexes are built, and how the search engine ranks search results.  Grant also explains some of the key differences between a search engine and a relational database, and why both have a place within modern application architectures.

One of the highlights of the podcast is Mr. Ingersoll’s explanation of vector space indexing. Even a high school brush with trigonometry is sufficient to make this important subject fascinating. Highly recommended.
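For those whose trigonometry is rusty: the idea is cosine similarity. Documents and queries become vectors of term counts, and the smaller the angle between them, the better the match. A minimal sketch with an invented two-document corpus; Lucene’s production scoring layers tf-idf weighting and much more on top of this:

```python
import math
from collections import Counter

def vectorize(text):
    """Term frequency vector for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two term vectors; 1.0 means same direction."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = ["open source search engine", "closed source database engine"]
query = vectorize("search engine")
for doc in docs:
    print(f"{cosine(query, vectorize(doc)):.2f}", doc)
# 0.71 open source search engine
# 0.35 closed source database engine
```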

Stephen E Arnold, August 4, 2012

Sponsored by Augmentext
