Apptus Theca

June 24, 2010

Quite a few search and content processing companies describe themselves as “leading”. The headline “Europe’s Leading Search Technology Company Apptus, Offers a Safe Passage for All Users of Fast on Linux and Unix to Apptus’ Search Platform Theca” caught my attention. The dateline was Stockholm, which is definitely a technology center. Microsoft has a large presence. The content management company EPiServer and the smart content processing vendor Silobreaker have roots in the Stockholms skärgård. Apptus lit up my radar on one of my visits to Scandinavia. The company was positioned to me as an eCommerce integrator focusing on directory implementations and retail.

The news item, reported on Yahoo so it may become a 404 in a heartbeat, made several interesting points.

First, the news release informed me that Apptus is “Europe’s leading search technology company.” A bit deeper in the news release was this qualifier, “Europe’s leading developer of search and content enrichment services for online directories.” This statement matched my understanding of the company’s focus. Too bad the headline said one thing, which I did not believe, and the first paragraph said another, which seemed to match what I knew about the company. Ah, 20-somethings. Such a delight are they.

Second, Microsoft’s dumping the Linux/Unix Fast Search & Transfer ESP has spawned a competitor. Although the news release does not tell me, I heard that Apptus is using open source search technology and going after the orphaned Fast ESP Linux/Unix users. This makes sense, and the idea of an outfit with expertise in search implementation, tuning, and integration picking up these users is a good one in my opinion.

Third, Apptus is one of the higher profile outfits taking advantage of Microsoft’s decision in order to expand its own business and give open source search a boost. Keep in mind that Apptus has customers in 18 countries and counts among its clients Yell.com and World Color Press (formerly Quebecor), among others.

In my opinion, what I see happening is a fracturing of an already mixed up and fluid segment of the software industry. I assume that my two or three readers will disagree, but here’s my working hypothesis:

  1. Microsoft’s dumping of Fast Linux/Unix is giving additional impetus to Lucene/Solr. Vendors of proprietary search and content processing solutions may find that Microsoft has unwittingly created an unexpected consequence. It is too soon to tell if Microsoft knows about what I can call the “Apptus effect”. I will have to sit back and watch.
  2. SharePoint centric search vendors may find the open source search providers capturing more customers. SharePoint centric vendors, therefore, may face some tough choices; for example, put resources into fighting the Apptus-style plays, focus only on SharePoint and abandon the Linux/Unix market, or go all in and support Microsoft and Linux/Unix.
  3. The search and content processing vendors who want to offer platforms will have to step up their marketing. Microsoft and Google are platform companies, and it will be increasingly difficult to get attention for very good, but less well known options.
  4. Specialty search vendors will be forced to focus even more sharply on point solutions. This means that, crazy marketing lingo aside, some companies will have to pick a sector like customer support and, in the words of Project Runway’s Tim Gunn, “make it work”. Morphing among business intelligence, semantics, eDiscovery, and appliances may meet with greater skepticism. Customers with problems will want a best of breed solution and the Heinz 57 varieties creature may be a turn off.
  5. Cloud search solutions may become more desirable. I had a conversation yesterday and pointed out that SAS Teragram offered a cloud solution before the cloud had become the buzzword du jour. Companies like Blossom.com have proved to me that hosted search works like a champ and shaves money and time off search and retrieval.

To sum up, the Apptus announcement strikes me as a big deal. Aside from my stumbling over the Apptus news release headline, there’s a message in the Apptus news item. Who is listening? Search vendors facing financial pressure may want to perk up their ears.

Stephen E Arnold, June 24, 2010

Freebie

Is Google Attacking Endeca with Killer Prices?

June 23, 2010

I am not sure if this eWeek story is on the money, but I want to capture the alleged Google pricing for its eCommerce service. Judge for yourself by navigating to “Google Commerce Search 2.0 Gets Refinements, $25K Price Point.” The big dogs in eCommerce include / have included Dieselpoint, EasyAsk, Endeca, SLI Systems, Omniture Mercado, and a handful of other outfits. Price points range from $25,000 right on up to millions, depending on what the customers’ specifications are perceived to be. Keep in mind that scaling and tuning may add significantly to the cost of an ecommerce system.

For me the key paragraph was:

The search engine also added a new price point for Commerce Search. The original entry level price was $50,000 per year for an indexing of 100,000 items and up to 10 million queries. Google has cut that virtually in half to appeal to smaller businesses, or businesses with smaller needs. Businesses may now license Commerce Search for $25,000 per year, which is good for 50,000 products and 3 million queries. Customers will pay more as they scale.

So what? This price point is a bargain until one considers the sentence “Customers will pay more as they scale.” Budget that, grasshopper.
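
To put numbers on that caution, here is a minimal sketch in Python that turns the two price points quoted above into unit costs. The dollar amounts, item counts, and query counts come from the eWeek passage; the `TIERS` table and the `unit_costs` helper are my own illustrative names, and the comparison is back-of-the-envelope arithmetic, not Google's pricing formula.

```python
# Figures as quoted in the eWeek passage; the per-unit math is illustrative only.
TIERS = {
    "original": {"price_usd": 50_000, "items": 100_000, "queries": 10_000_000},
    "new": {"price_usd": 25_000, "items": 50_000, "queries": 3_000_000},
}


def unit_costs(tier):
    """Return (dollars per indexed item, dollars per 1,000 queries) for a tier."""
    per_item = tier["price_usd"] / tier["items"]
    per_thousand_queries = tier["price_usd"] / (tier["queries"] / 1_000)
    return per_item, per_thousand_queries


for name, tier in TIERS.items():
    per_item, per_kq = unit_costs(tier)
    print(f"{name}: ${per_item:.2f} per item, ${per_kq:.2f} per 1,000 queries")
```

Under these figures the cost per indexed item stays at $0.50, while the cost per 1,000 queries rises from $5.00 to about $8.33. The entry price was halved, but the query allowance fell by 70 percent.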

Stephen E Arnold, June 21, 2010

Freebie

PDF Search

June 23, 2010

You can pinpoint PDF files in Google via its advanced search option or by keying this string after your query: filetype:pdf. Too much work? Navigate to http://www.pdfpick.com/. The service limits the query to the wonderful PDF files. My acquaintance with PDFs began at Ziff in the late 1980s. I think I had to kick the tires of what was then called “Trapeze”. Over the last 20 years I have watched the file format become the sleek, well formed, round, firm, and fully packed wonder that it is. Bound phrases? Forget it. Snappy rendering? Forget it. Malware safe? Forget it. Tools for limiting file validity by time or number of opens? Forget it. Universally searchable? Forget it. Autoscaling on mobile devices? Forget it. Users who know what a tiff wrapper is? Forget it. Nevertheless, PDFs are part of the landscape. If you want to limit your query to this file type, give PDFPick a try.
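
If you prefer to script the Google route, the snippet below is one way to build the query. It is a sketch, assuming the conventional `https://www.google.com/search` address with a `q` parameter, which the post itself does not spell out; `pdf_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def pdf_query_url(terms):
    """Build a Google search URL limited to PDF files via the filetype: operator."""
    query = f"{terms} filetype:pdf"
    # urlencode escapes spaces as '+' and the colon as '%3A'
    return "https://www.google.com/search?" + urlencode({"q": query})


print(pdf_query_url("enterprise search"))
# prints https://www.google.com/search?q=enterprise+search+filetype%3Apdf
```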

Stephen E Arnold, June 23, 2010

Freebie

Real Time Search Systems, Part 3

June 23, 2010

Editor’s note: This is the draft taxonomy of real time systems that I discussed in my June 15 and 17 lectures. It may or may not make sense, but I wanted to make clear that the broad use of the phrase “real time” does not convey much meaning to me. The partial fix, short of incarcerating the marketers who slap “real time” on their brochures, is to come up with “types” of real time information. The type helps make clear the cost and other characteristic features of a system sporting the label “real time”.

Stop and think about the difference in user expectations between an investment firm and a middle school child processing information. The greed mongers want to get the freshest information possible to make the maximum return on each bet or investment. The middle school kid wants to make fun of a teacher.

The greed mongers spend millions for Fancy Systems from Thomson Reuters, Exegy, or a similar specialist. The reason is that if the Morgan Stanley Type As get bond information a few milliseconds after the God loving folks at Goldman, lots of dough can slip through the clutching paws of the person responsible for a trade. With a great deal at stake, real time means in milliseconds.

The middle school wit is happy with whatever happens as long as the teacher remains blissfully ignorant of the message. If the recipient lets out a hoot, then there may be consequences, but the downside is less painful than what happens to the crafty Wall Street wonder.

The figure below presents the draft taxonomy. If you find it silly, no problem. If you rip it off, a back link would be a nice gesture, but I don’t have any illusions about how stateless users conduct themselves.

[Figure: draft taxonomy of real time information types]

Where does the latency originate? The diagram below provides the tech sleuth with some places to investigate. The lack of detail is intentional. Free blog, remember?

Read more

How to Download Google Books

June 23, 2010

Short honk: The goose prefers tree-killing books. You may want digital books, specifically Google digital books. If so, you will want to read “Download Google Books.” We have not tried the method. Post the results of your tests in the comments section of this blog.

Stephen E Arnold, June 23, 2010

Freebie

Need an Ontology Editor?

June 23, 2010

Some vendors offer ontology management systems. Most of these are okay, and one or two are exceptional. In this elite category are the products from Access Innovations in Albuquerque, New Mexico. We have worked with the founders for – what is it now – 30 years?

Some interesting open source software for managing word lists and ontologies is becoming available. We learned about Protégé. The software implements an extensible, platform-independent environment for creating and editing ontologies. Handy folks can apply the system to knowledge bases as well.

Developed by Stanford’s Center for Biomedical Informatics Research, Protégé is free and can be downloaded here. Now does it work? Let me know.

Stephen E Arnold, June 20, 2010

Freebie

Microsoft Headed for a Collapse

June 22, 2010

Business Insider ran a story that will definitely get clicks. The headline is “The Odds Are Increasing that Microsoft’s Business Will Collapse.” I don’t buy the argument, but you may. The idea is that the company has “lots of different businesses” and there is trouble aplenty. For me, one of the most interesting passages was:

Right now, the investors are concluding that Microsoft will gradually become the equivalent of a technology utility–a boring but necessary provider of the software that runs the world’s business community.  A smaller, more optimistic crowd is still arguing that, one day, Microsoft will be able to turn its fortunes around, and fight its way back into an industry leadership position. What almost no one is talking about is a third possibility, one that becomes more likely by the day: The possibility that, a couple of years down the road, Microsoft’s business may just completely collapse.

Now the problem as viewed from the goose pond in Harrod’s Creek is simple.

  1. Wall Street wizards know that IBM pulled back from the brink, became a consulting firm, and became a good news story. Won’t Microsoft follow a similar trajectory?
  2. Maybe not, because the Wall Street wizards could break up the company, make big fees, and leave the smaller entities to sink or swim. Did I mention really big fees?
  3. With the mantra of “too big to fail” echoing in the hollow adjacent the goose pond, maybe the lobbyists will focus their efforts on one of those weekend deals like the one that “saved” Bear Stearns.
  4. And what if the giant $65 billion machine just keeps rolling along? Even with adversaries like Apple and Google, Microsoft has lots of customers, certified professionals, gold resellers, and rank and file Word users who may resist change.

I like the “Will Collapse” angle. I just don’t think “collapse” is the right word. Great click catnip, however.

Stephen E Arnold, June 22, 2010

Freebie

Elsevier Buys Collexis

June 22, 2010

Elsevier continues to add to its search and content processing arsenal. With the cost of human indexing gushing like the BP oil spill, Elsevier is looking for magic to use for publishing scientific, technical, and medical information products and services. Elsevier is the giant company behind journals like The Lancet and the encyclopedia of Mosby reference books. In terms of indexing, sci-tech is easier to machine index than chatty Twitter tweets. To bolster the firm’s multiple methods, Elsevier acquired Collexis Holdings, a semantic technology and software developer. The plan is that the Collexis technology will give Elsevier the ability to help researchers and institutions take advantage of more avenues for finding data and publishing results, creating a better ROI. Is it a good plan? Yahoo has been a practitioner of this approach for years. Perhaps Elsevier can craft a success from this Yahoo-style approach. Now those Collexis assets have to be fine tuned and installed before the company or its clients will start seeing benefits. But kudos to Elsevier for making a positive step.

Jessica West Bratcher, June 22, 2010

Freebie

IBM Back in the NLP Game?

June 22, 2010

IBM has some interesting technology. As Xerox PARC showed, good ideas do not necessarily become market-dominant products or services. Remember Stairs III? If you answer, “No,” there you go. I am never sure whether IBM has come up with a great innovation or if it has fired up its public relations machine. With $100 billion in revenue and a motto like “Think”, how can you go wrong betting on IBM?

From my lonely perch in Madrid, I read a Slashdot item here that pointed me to an IBM Web site which timed out and to the fee starved New York Times story “The Watson Trivia Challenge.” The NYT link may be dead when you read my blog post. Be prepared to go hunting in a Vanderbilt-inspired “go-ahead” way. If you are lucky, you will be able to read “Designing a computer that can process and understand natural language.” Here’s a snippet from the pokey IBM Web site:

Known as a Question Answering (QA) system among computer scientists, Watson has been under development for more than three years. According to Dr. David Ferrucci, leader of the project team, “The confidence processing ability is key to winning at Jeopardy! and is critical to implementing useful business applications of Question Answering.” Watson will also incorporate massively parallel analytical capabilities and, just like human competitors, Watson will not be connected to the Internet, or have any other outside assistance.

The idea is that IBM’s technology can play a popular game show better than I can. No contest. I don’t know what the show is nor do I excel at answering questions. For example, I am baffled at such questions as:

  • Why does the IBM.com Web site time out?
  • Why can’t I locate information via the search box on IBM.com?
  • Why is IBM technology focused on search engine optimization, consulting, and beating game show contestants chosen because each can jump up and down, make good television, and give the host an easy target for sly humor?
  • Isn’t this “older” news recycled in what seems to be a World Cup week?

Call me a silly goose, but tracking the IBM innovations which seem to have no significant impact on my information seeking life is confusing. A final question, Watson, “Why is this the case?”

Bring up the theme music. Buzzz. Time’s up. Next week’s contestant? Ask.com. See you then.

Stephen E Arnold, June 22, 2010

Freebie

Real Time Search Systems, Part 2

June 22, 2010

Editor’s note: This post tiptoes through the tulips. In this instance, tulips is a synonym for industrial strength content processing systems that can be licensed by commercial entities, governmental organizations, or individuals who want to become a baby Fuld or Kroll. Achieving this type of azure chip transcendence means that you will be a hit at the local bingo parlor when you share your insights with your table mates.

Industrial Strength Tools

The free services don’t provide the user with much in the way of post processing horsepower. Another weakness of free services is that the average user deals with what each system spits out in response to a click or a query. The industrial strength systems provide such functions as:

A system or method for “plugging” in different streams of content. Examples range from electronic mail in the wonderful Microsoft Exchange Server to proprietary content stuffed into a clunky content management system. These connectors are a big deal because without different inputs of content, a real time search engine does not have the wood to burn in the fire box.

Each system provides or supports some type of software circuit board. The idea is that the content moves from the connectors over the circuits on the circuit board to its destination. Acquired content must be processed, so its first destination is a system or systems which extract data, generate metadata, and, in the case of Google, figure out the context of the message. The result is an index that contains index terms, metadata, and often such extras as a representation of the source message, precalculated values, and new information constructs.

Applications or “hooks” that make it possible for another software program to tap into the generated values and processed content to create an output. Now the outputs can vary widely. Another software system may just look up an item. Another software application might glue together different items from the index and content representation. The user sees a report, a display on a mobile phone, or maybe a mashup which allows the human to “recognize” or “spot” what’s needed. No searching required.
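
The three functions above can be sketched in a few lines of Python. This is a toy illustration, not how any vendor builds its product: the connector functions, the sample messages, and the names `process`, `build_index`, and `lookup` are all hypothetical, and the "processing" is nothing more than splitting text into lowercase terms.

```python
from collections import defaultdict


# Hypothetical connectors: each yields raw documents from one content stream.
def mail_connector():
    yield {"source": "mail", "text": "Quarterly bond prices fell"}


def cms_connector():
    yield {"source": "cms", "text": "Bond desk policy update"}


def process(doc):
    """Toy processing stage: derive index terms and keep minimal metadata."""
    terms = sorted(set(doc["text"].lower().split()))
    return {"source": doc["source"], "terms": terms, "text": doc["text"]}


def build_index(connectors):
    """Move content from every connector through processing into an inverted index."""
    index = defaultdict(list)
    for connector in connectors:
        for doc in connector():
            record = process(doc)
            for term in record["terms"]:
                index[term].append(record)
    return index


def lookup(index, term):
    """A 'hook' another program could call: return (source, text) for each hit."""
    return [(r["source"], r["text"]) for r in index.get(term.lower(), [])]


index = build_index([mail_connector, cms_connector])
print(lookup(index, "bond"))
```

A report generator or a mobile display would sit on top of `lookup` or a richer version of it. The point of the sketch is only that the index, not the raw stream, is what downstream applications tap.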

The Vendors

In my lectures I mentioned some different outfits in each of my two talks. I have rolled up the vendors in the list below. My suggestion is to do some research about each of these companies. I provide “additional color” on the technologies each vendor licenses, but that information is not going to find its way into a free blog posting. Problem? Read the About information available from the tab at the top of this page.

  • Exalead http://www.exalead.com Robust system which handles structured and unstructured data. Outputs may be piped to other enterprise software, a report, or a peripatetic worker with a mobile phone in Starbucks.
  • Fetch Technologies http://fetch.com Developed initially for certain interesting government information needs, you can customize Fetch using its graphical programming method and perform some quite useful analyses.
  • JackBe http://www.jackbe.com Developed initially for certain interesting government information needs, you can license JackBe and process a wide range of content.
  • Silobreaker http://www.silobreaker.com Developed initially for certain interesting government information needs, you can output reports that are as good as the roll ups crafted by a trained intelligence professional.

What do these systems do in “real time?” Each of them, when properly resourced, can ingest flows of data and unstructured content, assign metadata, and output alerts, reports, or Google-style search results within minutes of the content becoming known to the system.

Read more
