Exegy: Pushing Deeper in Financial Markets

September 28, 2008

Exegy is not a company that comes up when 20-something search experts kick back and trade stories about water pistol fights in the dorm. The company’s technology processes large volumes of data in near real time. This is not the near real time of the uninformed. Exegy crunches North American equity data feeds and options exchange data so that it can display the highest one-second peak occurring every 60 seconds. Ivy Schmerken’s “Exegy Launches Marketdatapeaks.com along with Xasax and Financial Information Forum” highlights Exegy’s content processing technology. You can read the write up here. Processing large content streams in near real time is a non-trivial task. Most search vendors dance around the issue of machine infrastructure to perform the nifty tricks shown in Flash demos. Not Exegy. The company installs its proprietary appliance. Working with Xasax, the new service provides financial services firms with useful data that are otherwise difficult if not impossible to obtain.
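The arithmetic behind the peak metric is simple to sketch, even if doing it at exchange-feed volumes is not. The plain-Python illustration below is my own hypothetical rendering of the calculation only: count messages per second, then report the highest one-second count inside each 60-second window. Exegy performs this kind of work in purpose-built hardware at wire speed, which is the whole point of the appliance.

```python
from collections import Counter

def peak_rates(timestamps, window=60):
    """For each 60-second window, return (window_start, highest one-second
    message count). `timestamps` are message arrival times in epoch seconds.
    A toy sketch of the metric, not Exegy's implementation."""
    per_second = Counter(int(t) for t in timestamps)  # messages per whole second
    if not per_second:
        return []
    start, end = min(per_second), max(per_second)
    peaks = []
    for w in range(start, end + 1, window):
        # peak one-second rate within this window
        counts = [per_second.get(s, 0) for s in range(w, w + window)]
        peaks.append((w, max(counts)))
    return peaks
```

A general-purpose server doing this over millions of messages per second is exactly where commodity boxes fall over and the appliance argument begins.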

I profiled Exegy in my April 2008 study for the Gilbane Group. You can learn more here. The reason I included the company was to highlight the importance of matching hardware to the content processing task. Clearwell Systems, Google, Thunderstone, and Index Engines have taken a somewhat similar approach. The 20 somethings who are true search mavens are confident that a couple of Dell or HP servers can whip almost any content processing job. The kiddies are wrong. To learn more about the Exegy engineering behind its throughput, click here.

A final thought to the victors of the water pistol wars for search: infrastructure matters. Infrastructure makes or breaks many enterprise search systems.

Search Monopoly: The Users Are Guilty

September 28, 2008

After 21 days of travel, I enjoyed flicking through the digital fish my newsreader snags for me. One article in Seeking Alpha caught my eye. The title is “We Can’t Afford a Search Monopoly, Even If It Kills Yahoo”. You can read the article here. I think the article is by Michael Arrington, and it originally appeared on TechCrunch, but I can’t be sure. The pivot on which the article turns is Google’s deal with Yahoo. In the background are the data showing that most Internet users in North America turn to Google for search. I am supportive of Google, not because I love the 20-somethings who come up to me after my lectures and expect me to explain to them why I am describing a Google they don’t recognize. The reason is that Googlers are so involved with a tiny circle of other Googlers that these bright folks can’t see the Googzilla against the background of “do no evil”.

I side with Google because every day users vote with their mouse clicks and key taps to make Google number one. I don’t for a minute think advertisers care who gets them business. As long as the ad money delivers sales, advertisers are happy. I also don’t think that users think too much about how Google generates useful results for a query or a click on a Google canned search within Chrome. If users and advertisers were deeply dissatisfied, the GOOG would find itself just another vendor fighting for survival.

I think the reason folks are agitated about Google is that after a decade of indifference and casual dismissal of the company as a search engine that sells ads, some people are waking up to the reality of Google’s application infrastructure. Why is the infrastructure important? At this time, none of the competitors have what Google has. Microsoft is rushing to catch up, but it is tough to close a 60 percent market share gap and an infrastructure gap quickly. In fact, if Microsoft catches up, it will be in the difficult position of finding Google farther ahead. Microsoft or any other competitor has to leapfrog Google, not catch up.

Furthermore, trying to legislate away Google’s success may be tough too. The legal process can be long and drawn out. Google has enough money to keep its lawyers digging away for decades. Google is morphing into other business sectors, and it is not clear to me how a decision about Google can transfer from one sector to another or from one country to another. Toss in the fact that most people use Google of their own free will, and the problem of Google’s alleged search monopoly becomes more challenging.

Google now finds itself in a combination position. In some ways it is the 21st century version of the pre break up AT&T. In other ways, it is the child of the thumb typing generation. Google’s been chugging away for a decade, and it is going to be difficult to alter the company’s trajectory or bleed off its momentum with Web log postings, complaints from newspaper publishers, and objections from Microsoft and Yahoo that Google is not playing fair and square.

The problem with Google is that it is a service for users and advertisers. To kill Google, someone is going to have to get those users to stop using Google. That’s a big job and one that may be difficult without the aforementioned technical leapfrog.

Stephen Arnold, September 28, 2008

Mercado: Healthy, Wealthy, Wise. Pick One.

September 27, 2008

The Marker IT Computerworld (Israel) ran an interesting story. It’s tough from my hollow in Kentucky to know if this is 100 percent accurate. I want to alert you that my source is the Hebrew language Web page here. My Hebrew is not much better than my Dutch, but I wanted to pass along the gist of this story by Guy Greeamelnde, “Mercado Israel Fired about 30 Workers”. If the Marker IT story is accurate, this is about 25 percent of the firm’s work force. According to the story, Mercado itself may be reeling from the economic downturn that is gaining momentum in the fragmented, fiercely competitive search and content processing sector.

Under the leadership of CEO Kari Leiebu, the company has been growing. Mercado generates somewhere in the $18.0 to $20.0 million range per year. Despite the growth over the last two years, customers pay monthly because the firm’s business model is software as a service. Despite the bookings, cash remains a precious commodity. To conserve cash, employees have to go. The company has received infusions of cash bringing the total raised to as much as $70 million.

Information about Mercado Software is here. You can get Mercado white papers here. As of September 26, 2008, no information about the economic problems afflicting Mercado appears on the company’s Web site. The firm’s investors include the Challenge Fund, Consensus Business Group, Eucalyptus Ventures, Pitango Venture Capital, Star Ventures, and Valley Venture Capital. I will keep my eyes open for confirmation of this story in Marker IT.

For now, Mercado seems to be healthy, wealthy, and wise. Tomorrow. Who knows?

Stephen Arnold, September 27, 2008

Taxonomy: Silver Bullet or Shallow Puddle

September 27, 2008

Taxonomy is hot. One of my few readers sent me a link to Fumsi, a Web log that contains a two part discussion of taxonomy. I urge you to read this post by James Kelway, whom I don’t know. You can find the article here. The write up is far better than most of the Webby discussions of taxonomies. After a quick pass at nodes and navigation, he covers information architecture in fewer than 125 words. The often unreliable Wikipedia discussion of taxonomy here chews up more than 6,000. Brevity is the soul of wit, and whoever contributed to the Wikipedia article must be SWD; that is, severely wit deprived.

Take a look at the Google Trends’ chart I generated at 8 pm on Friday, September 26, 2008. Taxonomy is generating more Google traffic than the now mud-crawling term “enterprise search”, though it is not as popular as “CMS”, the shorthand for content management system. But “taxonomy” is a specialist concept that seems to be moving into the mainstream. At the just concluded Information Today trifecta conference featuring search, knowledge management (whatever that is), and streaming media, taxonomy was a hot topic. At the Wednesday roof top cocktail party, where I worked on my tan in the 90 degree ambient air temperature, I was asked four times about taxonomies. I worked on commercial taxonomies and controlled vocabularies for databases, and I learned from those years of experience that taxonomies are really tough, demanding, time consuming intellectual undertakings. I thought I was pretty good at making logical, coherent lists. Then I met the late Betty Eddison and the very active Marje Hlava. These two pros taught me a thing or 50.

google trends taxonomy

In the dumper is the red line which maps “enterprise search” popularity. The blue line is the up and coming taxonomy popularity. The top line is the really popular, yet hugely disappointing, content management term traffic.

I heard people who have been responsible for failed search systems and non functional content management systems asking, “Will a taxonomy improve our content processing?” The answer is, “Sure, if you get an appropriate taxonomy.” I then excuse myself and head to the barman for a Diet 7 Up. The kicker, of course, is “appropriate”. Figuring out what’s appropriate and then creating a taxonomy that users will actually exploit directly or indirectly is tough work. But today, you can learn how to do a taxonomy in a 40 minute presentation or, if you are really studious, a full eight hour seminar.

I remember talking with Betty Eddison and Marje Hlava about their learning how to craft appropriate taxonomies. Marje just laughed and turned to her business partner who also burst out laughing. Betty smiled and in her deep, pleasant voice said, “A life time, kiddo.” She called me “kiddo”, and I don’t think anyone else ever did. Marje Hlava chimed in and added, “Well, Jay [her business partner] and I have been at it for two life times.” I figured out pretty quickly that building “appropriate” taxonomies required more than persistence and blissfully ignorant confidence.

Why are taxonomies perceived as the silver bullet that will kill the vampire search or CMS system? A vampire system is one that will suck those working on it into endless nights and weekends and then gobble available budget dollars. In my opinion, here are the top five reasons:

  1. The notion of a taxonomy as a quick fix is easy to understand. Most people think of a taxonomy as the equivalent of the Dewey Decimal system or the Library of Congress subject headings and think, “How tough can this taxonomy stuff be?” After a couple of runs at the problem, the notion of a quick fix withers and dies.
  2. Vendors of lousy enterprise search systems wriggle off the hook by asserting, “You just need a taxonomy and then our indexing system will be able to generate an assisted navigation interface.” This is the search equivalent of “The check is in the mail.”
  3. CMS vendors, mired in sluggish performance, lost information, and users who can’t find their writings, can suggest, “A taxonomy and classification module makes it much easier to pinpoint the marketing collateral. If you search for a common term, our system displays those documents with that common term. Yes, a taxonomy will do the trick.” This is the same as “Let’s do lunch” repeated every week to a person whom you know but with whom you don’t want to talk for more than 30 seconds on a street corner in mid town Manhattan.
  4. A shill at a user group meeting–now called a “summit”–praises the usefulness of the taxonomy in making it easier for users to find information. Vendors work hard to get a system that works and win over the project manager. Put on center stage and pampered by the vendor’s PR crafts people, the star customer presents a Kodachrome version of the value of taxonomies. Those in the audience often swallow the tale the way my dog Tess goes after a hot dog that falls from the grill. There’s not much thinking in Tess’s actions either.
  5. Vendors of “automated” taxonomy systems demonstrate how their software chops a tough problem down to size in a matter of hours or days. Stuff in some sample content and the smart algorithms do the work of Betty Eddison and Marje Hlava in a nonce. Not on your life, kiddo. The automated systems really are not 100 percent automatic. The training corpus is tough to build. The tuning is a manual task. The smart software needs dummies like me to fiddle. Even more startling to licensees of automatic taxonomy systems is that you may have to buy a third party tool from Access Innovations, Marje Hlava’s company, to get the job done. That old phrase “If ignorance is bliss, hello, happy” comes to mind when I hear vendors pitch the “automated taxonomy” tale.
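A toy example makes the point about “automated” systems concrete. In the sketch below, the taxonomy nodes and their term lists are invented for illustration; in a real deployment those lists are exactly the hand-built, hand-tuned artifacts the pros spend lifetimes learning to craft:

```python
# Hypothetical mini taxonomy: node name -> controlled terms (all hand-built).
TAXONOMY = {
    "Finance": {"equities", "bond", "derivative", "exchange"},
    "Legal": {"discovery", "deposition", "compliance"},
    "Marketing": {"campaign", "collateral", "brand"},
}

def classify(text):
    """Assign a document to the node whose term list it overlaps most."""
    words = set(text.lower().split())
    scores = {node: len(words & terms) for node, terms in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unclassified"
```

Every misclassification gets fixed by a human editing the term lists, not by the software getting smarter on its own, which is where the “automated” label quietly breaks down.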

I assume that some readers may violently disagree with my view of 21st century taxonomy work. That’s okay. Use the comments section to teach this 65 year old dog some new tricks. I promise I will try to learn from those who bring hard data. If you make assertions, you won’t get too far with me.

Stephen Arnold, September 27, 2008

IBM: Another New Search System from Big Blue

September 27, 2008

IBM announced its eDiscovery Analyzer. You can read the IBM news release on the MarketWatch news release aggregation page here. Alternatively you can put up with the sluggish response of IBM.com and read more details here. You won’t be able to locate this page using IBM.com’s search function. The eDiscovery Analyzer had not been indexed when I ran the query at 7:30 pm on September 27, 2008. I * was * able to locate the page using Google.com. If I were the IBM person running site search, I would shift to Google, which works.

The eDiscovery Analyzer, according to Big Blue:

… provides conceptual search and analysis of cases created by IBM eDiscovery Manager.

Translating: eDiscovery Manager assists with legal discovery, a formal investigation governed by court rules and conducted before trial, and internal investigations on possible violations of company policies, by enabling users to search e-mail documents that were archived from multiple mailboxes or Mail Journaling databases into a central repository. You license eDiscovery Manager and the bits and pieces needed to make it go, and then you license the brand new eDiscovery Analyzer component.

ibm ediscovery interface

I believe that this is the current interface for the “new” IBM eDiscovery Analyzer. Source: IBM’s Information Management Software IBM eDiscovery Analyzer 2.1 marketing collateral.

You will need FileNet, IBM’s aging content management system. The phrase I liked best in the IBM write up was, “[eDiscovery Analyzer] is easy to deploy and use, Web 2.0 based interface requires minimal user training.” I’m not sure about the easy to deploy assertion. And the system has to be easy to use because the intended users are attorneys. In my experience, which is limited, legal eagles are not too excited about complicated technology unless it boosts their billable hours. You can run your FileNet add in on AIX (think IBM servers) or Windows (think lots of servers).

You can read about IBM’s search and discovery technology here. You can tap into such “easy to deploy” systems as classification, content analysis, OmniFind search, and, if you are truly fortunate, DB2, IBM’s user friendly enterprise database management system. You might want to have a certified database administrator, an expert in SQL, and an IBM-trained optimization engineer on hand in case you run into problems with these user friendly systems. If these systems leave you with an appetite for more sophisticated functions, click here to learn about other IBM search and discovery products. You can, for example, read about four different versions of OmniFind and learn how to buy these products.

Remember: look for IBM products by searching Google. IBM.com’s search system won’t do the job. Of course, IBM’s enterprise eDiscovery Analyzer is a different animal, and I assume it works. By the way, when you try to download the user guide, you get to answer a question about the usefulness of the information * before * you have received the file. I conclude that IBM prefers users who are able to read documents without actually having the document.

Stephen Arnold, September 27, 2008

Linguamatics Sells Bayer CropScience

September 27, 2008

My newsreader snagged this item, which I found interesting. The little-known Linguamatics (a content processing company based in the UK) retained its deal with the warm and friendly Bayer CropScience. The Linguamatics’ technology is called I2E, and Bayer has been using the I2E system since the summer of 2007. In September, Bayer CropScience decided to renew its license and process patent documents, scientific and technical information, and perform knowledge discovery. (I must admit I am not sure how one discovers knowledge, but I will believe the article that you can find here.)

For me, this small news item was interesting for several reasons. First, for many years a relatively small number of companies had been granted access to the inner circle of European pharma. I find it refreshing that after two centuries, upstarts like Linguamatics are able to follow in the footsteps of Temis and other firms who have worked to make sales in these somewhat conservative companies. “Conservative” might not be the correct word. Computational chemists are a fun-loving group. One computational chemist told me last October in Barcelona that computational chemists were pharma’s equivalent to Brazilian football fans. On the off chance that a clinical trial goes off the rails, some pharma players prefer keeping “knowledge” quite undiscovered until an “issue” can be resolved.

lingua_searchresults

A representative I2E results display. © Linguamatics, 2008.

Second, Linguamatics–a company I profiled after significant bother and effort–is profiled in my April 2008 study Beyond Search, published by the Gilbane Group. You can learn more about this study here because ferreting out information about I2E is not the walk in the park that I expected from a content processing company with a somewhat low profile. Linguamatics has some interesting technology, and I surmise that the uses of the system are somewhat more sophisticated and useful to Bayer CropScience than “discovering knowledge”.

Finally, Bayer CropScience is a subsidiary of the influential Bayer AG, an outfit with an annual turnover of about US$8.0 billion, give or take a billion because of the sad state of the dollar on the international market. My hunch is that if the CropScience deal feels good, other units of this chemical and pharmaceutical giant will learn to love the I2E system.

Stephen Arnold, September 27, 2008

BBC: Search Is a Backwater

September 27, 2008

I just read a quite remarkable essay by a gentleman named Richard Titus, Controller, User Experience & Design for BBC Future Media & Technology. (I like the word controller.) I am still confused by the time zone zipping I have experienced in the past seven days. At this moment in time, I don’t recall if I have met Mr. Titus or if I have read other writings by him. What struck me is that he was a keynote at a BBC Future Media & Technology Conference. My first reaction is that to learn the future a prestigious organization like the BBC might have turned toward the non-BBC world. The Beeb disagreed and looked for its staff to illuminate the gloomy passages of Christmas Yet to Come. You can read this essay “Search and Content Discovery” here. In fact, you must read it.

With enthusiasm I read the essay. Several points flew from the page directly into the dead letter office of my addled goose brain. There these hot little nuggets sat until I could approach them in safety. Here are the points that cooked my thinking:

  1. Key word search is brute force search.
  2. Yahoo BOSS is a way to embrace and extend search.
  3. The Xoogler Cuil.com system looked promising but possibly disappoints.
  4. Viewdle facial recognition software is prescient. (This is an outfit hooked up with Thomson Reuters, known for innovation by chasing markets before the revenue base crumbles away. I don’t associate professional publishers with innovation, however.)
  5. Naver from Korea is a super electronic game portal.
  6. Mahalo is a human-mediated system and also interesting, and the BBC has a topics page which also looks okay.
  7. SearchMe, also built by Xooglers, uses a flash-based interface.

searchmeresults

Xooglers are inspired by Apple’s cover flow. Now how many hits did my query “beyond search” get? Can your father figure out how to view the next hit or make this one large enough to read? A brute force way to get information, of course.

These points were followed by this statement:

When you marry solid data and indexing (everyone forgets that Google’s code base is almost ten years old), useful new data points (facial recognition, behavioral targeting, historical precedent, trust, etc) with a compelling and useful user experience, we may see some changes in the market leadership of search.

I would like to comment on each of these points:

Read more

An Exceptional Rumor: MSFT to Buy Yahoo AOL Combo

September 26, 2008

I saw this post on Venture Beat here. Then I saw a follow on story in Peter Kafka’s write up for Silicon Alley Insider here. I am delighted to point out that these write ups do not a done deal make. I find the notion fascinating, and I hope it comes to pass. Google will probably buy another dinosaur skeleton, reinstate day care, and design more lavish housing for the NASA Moffett Field Google Housing Units to celebrate. Please, read these two posts. The plan, as I understand this speculation, is that Yahoo gobbles up the wheezing AOL. I presume Yahoo will be able to work its technical magic on AOL’s infrastructure just as it did Delicious.com’s. Yahoo took two years to rewrite Delicious.com’s code, thus allowing other social sites and bookmarking services to flourish. Once the dust settles from that MBA fueled explosion, the Bain consultants will shape the package so that Microsoft can swoop in, snap up two hot properties, solve its search and portal problems, and catch up with Googzilla and chop off its tail.

When I worked at Booz, Allen & Hamilton, we called the Bain consultants Bainies. I can’t recall if we used this as a term of affection or derision. I like Bain and the work it did for Guinness just about 20 years ago. You can refresh your memory of that project here.

Let’s walk through the search and content processing implications of this hypothetical deal. I promise that I will not comment about SharePoint search, Live.com’s search, Outlook search, SQL Server search, Powerset search, or Fast Search & Transfer search.

  1. AOL has search plus some special sauce. At one time Fast Search & Transfer was laboring in the AOL vineyards. Teragram, prior to its acquisition by SAS, was also a vendor. Two vendors are enough for Yahoo to rationalize. Heck, Yahoo is relying on Fast Search technology for its AllTheWeb.com service last I heard. The Teragram technology might be a stretch, but the Yahoo technical team will be up to the challenge. The notion of becoming part of Microsoft will put a fire in the engineers’ bellies.
  2. AOL has its portal services. Granted these overlap with Yahoo’s. There’s the issue of AOL mail, AOL messenger, and AOL’s ad deals with various third parties. Google may still have a claw in the AOL operation as well. I haven’t followed Google’s tie up with AOL since word came to me that Google thought it made a bad decision when it pumped a billion into the company.
  3. AOL has a cracker jack customer service operation. Yahoo has a pretty interesting customer service operation as well. I am not sure how one might merge the two units and bring both of them under the Yahoo natural language search system that doesn’t seem to know how to provide guidance to me when I want to cancel one of my very few Yahoo for fee services. Give this a try on your own and let me know how you navigate the system.

I am delighted that I don’t have to figure out how to mesh Yahoo and AOL and then integrate the Yahoo AOL entity with Microsoft. Overlapping services are trivial for these three firms’ engineers. No big deal. If the fix is to operate each much as they now are, I anticipate some cost control problems. Economies of scale are tough to achieve operating three separate systems and their overlapping features.

I think that when I read the stories in my newsreader on Monday, September 29, 2008, I will know more about this rumor. I am still struggling with how disparate systems and the number of search systems can be made to work better, faster, and cheaper. Maybe the owner of the Yahoo AOL property will outsource search to Google. Google is relatively homogeneous, and it works pretty well for quite a few Web users, Web advertisers, and Web watchers. Watch this Web log for clarification of this rumor. For now, the word that comes to mind is a Vista “wow”.

Stephen Arnold, September 26, 2008

TeezIR BV: Coquette or Quitter

September 26, 2008

For my first visit to Utrecht, once a bastion of Catholicism and now a Rabobank stronghold, I wanted to speak with interesting companies engaged in search and content processing. After a little sleuthing, I spotted TeezIR, a company founded in November 2007. When I tried to track down one of the principals–Victor Van Tol, Arthus Van Bunningen, and Thijs Westerveld–I was stonewalled. I snagged a taxi and visited the firm’s address (according to trusty Google Maps) at Kanaalweg 17L-E, Building A6. I made my way to the second floor but was unable to rouse the TeezIR team. I am hesitant to say, “No one was there”. My ability to peer through walls after a nine hour flight is limited.

I asked myself, “Is TeezIR playing the role of a coquette or has the aforementioned team quit the search and content processing business?” I still don’t know. At the Hartmann conference, no one had heard of the company. One person asked me, “How did you find out about the company?” I just smiled my crafty goose grin and quacked in an evasive manner.

The trick was that one of my two or three readers of this Web log sent me a snippet of text and asked me if I knew of the company:

Proprietary, state-of-the-art technology is information retrieval and search technology. Technology is built up in “standardized building blocks” around search technology.

So, let’s assume TeezIR is still in business. I hope this is true because search, content processing, and the enterprise systems dependent on these functions are in a sorry state. Cloud computing is racing toward traditional on premises installations the way hurricanes line up to smash the American south east. There’s a reason cloud computing is gaining steam–on premises installations are too expensive, too complicated, and too much of a drag on a struggling business. I wanted to know if TeezIR was the next big thing.

My research revealed that TeezIR had some ties to the University of Twente. One person at the Hartmann conference told me that he thought he heard that a company in Ede had been looking for graduate students to do some work in information retrieval. Beyond that tantalizing comment, I was able to find some references to Antal van den Bosch, who has expertise in entity extraction. I found a single mention of Luuk Kornelius, who may have been an interim officer at TeezIR and at one time a laborer in the venture capital field with Arengo (no valid link found on September 16, 2009). Other interesting connections emerged from TeezIR to Arjen P. de Vries (University of Twente), Thomas Roelleke (once hooked up with Fredhopper), and Guido van’t Noordende (security specialist). Adding these names to the management team here, TeezIR looked like a promising start up.

Since I was drawing a blank on getting people affiliated with TeezIR to speak with me, I turned to my own list of international search engines here, and I began the thrilling task of hunting for needles in haystacks. I tell people that research for me is a matter of running smart software. But for TeezIR, the work was the old-fashioned variety.

Overview

Here’s what I learned:

First, the company seemed to focus on the problem of locating experts. I grudgingly must call this a knowledge problem. In a large organization, it can be hard to find a colleague who, in theory, knows an answer to another employee’s question. Here’s a depiction of the areas in which TeezIR is (was?) working:

image

Second, TeezIR’s approach is (was?) to make search an implicit function. Like me, the TeezIR team realized that by itself search is a commodity, maybe a non starter in the revenue department. Here’s how TeezIR relates content processing to the problem of finding experts:

image

Read more

Sisense Update

September 26, 2008

Back in August, Beyond Search wrote about Sisense, a business intelligence start up dealing in software solutions. They’ve been working on software that taps Google spreadsheets holding customer data, runs the information through pre-defined intelligence schemes, and crunches away. You can review the basics here.

That software, called Prism, is now on the market. Licensing starts at $50 per month for a workgroup and $10-20 a month for additional users. There’s also a free (hey!) version and a personal option ($100) out there too.

According to a Sisense press release, Prism is a desktop-based product with strong analytics, reporting, and graphing capabilities. They also promote that it’s the only business intelligence software that doesn’t need IT support. Not a bad plan for users on the road like advertising or distribution personnel.
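For readers wondering what a “pre-defined intelligence scheme” run over spreadsheet rows might look like, here is a minimal sketch. The scheme names, row layout, and function names below are my invention for illustration; Sisense has not published Prism’s internals in this detail:

```python
def _group_sum(rows, key, value):
    """Sum `value` per distinct `key` -- a classic BI rollup."""
    out = {}
    for row in rows:
        out[row[key]] = out.get(row[key], 0) + row[value]
    return out

# Hypothetical canned schemes a non-IT user would pick from a menu.
SCHEMES = {
    "revenue_by_region": lambda rows: _group_sum(rows, "region", "revenue"),
    "deal_count": lambda rows: {"deals": len(rows)},
}

def crunch(rows, scheme):
    """Apply a pre-defined scheme to spreadsheet rows (list of dicts)."""
    return SCHEMES[scheme](rows)
```

The appeal of the canned-scheme approach is exactly the no-IT-support pitch: the user picks a scheme, points it at a spreadsheet, and never writes a query.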

You can get a trial version at https://www.sisense.com/register.aspx.

Jessica Bratcher, September 26, 2008
