eBook Search Engines
March 22, 2010
A happy quack to the reader who sent me a link to “The 5 Best Ebook Search Engines”. My eyes and eBook readers are not like peanut butter and chocolate. I know that a large number of people are getting on the eBook reader bandwagon. The search system that Work Up seems to favor is MegaPDF. I did some testing and got mixed results. For example, a search for Silesian Station returned some unusual results, including a one page PDF from the Book Thief. You are on your own. I prefer to buy hard copies.
Stephen E Arnold, March 22, 2010
A free write up. I suppose I can report “free” to the Free Public Library here in Louisville, Kentucky.
ArnoldIT Expands Overflight
March 22, 2010
If you want one-click access to what’s new from leading vendors of search and content processing, navigate to ArnoldIT’s free Overflight service. Pick a company name, select a Google topic area, or run a query on Google’s own 70 plus Web logs. We have added three vendors to the watch service:
- Comperio, one of the Microsoft Fast support entities which has former FAST Search engineers on staff.
- Exorbyte, a vendor with a system that matches other eCommerce and databased content systems feature for feature.
- Funnelback, the Australian open source search system offered by Squiz, an open source content management company.
You will also find a list of three social network service providers: Facebook, Twitter, and LinkedIn. What’s interesting is to click through each of the autogenerated pages for the search and content processing vendors. You may be able to tell who is marketing with some savvy and who is clueless.
Stephen E Arnold, March 22, 2010
A shameless promotion of an ArnoldIT.com service. You now are reminded that Beyond Search is a marketing blog devoted to ArnoldIT.com and Stephen E Arnold.
Coveo and GEICO Host Webinar on March 23, 2010
March 21, 2010
Fierce Media has asked Beyond Search to facilitate a discussion about “how GEICO thinks about leveraging its data-rich enterprise systems to generate real-time business value and intelligence.” The participants are GEICO and Coveo as well as Stephen E Arnold.
Topics include how the Coveo system can:
- Enable improved business intelligence and decision making through dynamic dashboards and information mashups that provide actionable business information
- Access structured and unstructured data from across enterprise systems and repositories without complex integration or data migration, improving efficiency and cost effectiveness through a unified indexing layer
- Lower the cost of legacy system integrations and upgrades, and reduce time-consuming data migration
- Optimize social networks and incorporate the value of collaboration and just-in-time information exchange into the knowledge ecosystem
The audio program will be on Tuesday, March 23, 2010 beginning at 11:00am Eastern/8:00am Pacific. More information about Coveo may be found at http://www.coveo.com. You can register here.
Ben Kent, March 21, 2010, Beyond Search
This is a sponsored post.
Google China: Pundits and Mavens Rev Their Engines
March 20, 2010
Google has to decide what to do about China. I think I heard this on one of the TV news shows that the goslings run when NCAA games are not on the boob tube nailed to a tree on the shore of the goose pond. I think the person making this statement displayed a snapshot of Mrs. Clinton, but maybe it was another luminary.
I read “Opinion: Why Google should stay in China”, which explained what Google should do this way:
Google’s actions will only hurt Google, its shareholders, and those that depend on the Web 2.0 ecosystems Google has been nurturing. By closing the development offices, Google will lose a lifeline into a vibrant economy and culture, one that it desperately needs to understand and leverage in order to continue its historic growth in the years ahead. This lack of understanding was plain in the way Google made its decision – unilaterally and without even consulting its experts inside China. You need those people, Google, and so do we. So please swallow your own pride and reconsider before abandoning them.
Sounds good. The only hitch in the git along is that the author is not calling the shots for the Google.
I scanned an azure chip consultant’s analysis of the China market. I think the numbers in “Gartner says China will be World’s Fastest Growing Enterprise Software Market Through 2013” are probably fuzzy, but whatever those numbers are, China is a big market.
If the Google bails, my hunch is that some Type A MBA money managers will want to know:
- Why is the Google NOT maximizing shareholder value? China is not Albania.
- What is going to be done to pump up Google’s share price without a really big, juicy market to penetrate?
- Who will be the candidates for the new Google management team if shareholders revolt?
I don’t have any answers, and I don’t think Google’s chess game with China is unfolding with the inevitability Google anticipated.
Stephen E Arnold, March 19, 2010
Free blog and a free article. What could be better? I will report my working without pay to the Department of Labor, where hard work is the norm and that work is not performed by workers for free.
Microsoft Fast Customer Support
March 20, 2010
Short honk: Got your Microsoft Fast installation up and running but have a wee question? You will want to keep this information handy:
- FAST standalone technical support assistance, navigate to http://support.microsoft.com/oas
- FAST telephone support: +1 866-922-5260 (8:00 AM – 8:00 PM Eastern Time)
Enjoy!
Stephen E Arnold, March 19, 2010
A freebie pure and simple.
Are You Ready for Enterprise Search? Nope
March 19, 2010
A reader sent me a link to a white paper from Silicon.com. I clicked the link and was presented with a download request form. I apparently filled a similar form out years ago because I was asked to update my information. I did so. I was then given another page from which to click a link to download a white paper from MobilVox, Inc.
The title? “Are We Ready for Enterprise Search.” The subtitle? “Text analytics and intelligent agents cannot be overlooked.” No problem with the title but the text of the white paper was two pages. This is more of a flier or a fact sheet. A white paper is in my opinion somewhat more substantive. The last one I wrote was about 12 pages long, had diagrams, and included some hard metrics about the performance of a search system.
The white paper pointed me to www.irissearch.net, which threw me.
The point of the white paper by MobilVox is to boil down what took me 300 pages to explain in three editions of my Enterprise Search Report for a publisher who, like a chameleon, changed its appearance, and what Martin White and I needed 125 pages to cover in Successful Enterprise Search Management, published by Galatea in 2009.
I don’t disagree with the information in the two page write up, but it is a bit short on detail. Here’s phase II of a search implementation:
Strategically select information repositories most critically important to the organization. Deploy the enterprise search solution with these core repositories. Scale up initial roll-out by adding more repositories and connectors to other legacy systems.
Martin and I explained the steps and some of the constituent nuances in 16 pages, and we chopped quite a bit of detail to meet the stipulations of our publisher in the UK.
If you want a white paper that gives you enterprise search on two sheets of paper, have at it. After you end up in a bit of a technical, managerial, and budget bind, drop me an email. seaky2000 at yahoo dot com. I won’t be able to help, but I like to keep track of potentially interesting case examples.
Stephen E Arnold, March 19, 2010
No one paid me to write about search challenges. I will report this sad state of affairs to the Department of Energy, an outfit with deep experience in search systems that are often interesting challenges to senior managers.
InQuira Embraces the Cloud
March 19, 2010
I read “InQuira Puts It Knowledge Solutions in the Cloud” and learned that the approach “is in no way a light weight version.” On premises search systems can be tough to install, tune, and maintain. Blossom has been, in my opinion, one of the trail blazers for hosted search, and it offers a robust, powerful, and customizable solution. InQuira is moving in that direction as well.
According to the write up which quotes an InQuira officer:
InQuira has existing partnerships with Oracle CRM On Demand, Oracle’s Siebel offering, and Genesys Telecommunications Laboratories. The newest on-demand offering will extend the company’s reach… “[InQuira] has a really established reputation as the best-of-breed intelligent search vendor that quickly and easily integrates with everyone,” says John Ragsdale, vice president of technology research for the Technology Services Industry Association (TSIA).
One feature of the approach is that storage is provided in an “on demand” model.
You can get more information from www.inquira.com.
Stephen E Arnold, March 19, 2010
Freebie. No one paid me to write this. I will report non payment to the Bureau of Labor Statistics, an outfit that tracks work for no compensation each day, every day.
Globalbrain Version 5
March 17, 2010
My feedreader sent me a story via “Your Story” tagged “Globalbrain v5 Provides More Customizable and Flexible Search Functionality.” According to the write up,
Globalbrain is highly scalable and enables access to vast amounts of structured and unstructured data (including e-mails, e-mail attachments, OCR-ed images, PDF files, word processing documents, spreadsheets and hundreds of other formats). Organizations and their users can simply search for information by using queries comprised of phrases, sentences, paragraphs or even entire documents of text rather than complex Boolean logic or complicated taxonomies.
One of the highlights of the new version, according to the write up, is:
At query time, the engine can decide when to use an inverted index approach for common keyword type queries or the fuzzy contextual based search. The engine can even use a combination of these search approaches. This approach, along with faceted groupings, allows users to further drill down and fine-tune results.
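For readers who want to picture the query-time decision described in the write up, here is a minimal sketch. It is not Brainware's method. The sample documents, the function names, the three-token cutoff, and the use of Python's `difflib` as a stand-in for "fuzzy contextual" matching are all my assumptions for illustration.

```python
import difflib
from collections import defaultdict

# Hypothetical mini corpus, used only for illustration.
DOCS = {
    1: "purchase order from Smith for laser printers",
    2: "invoice dispute with vendor about late shipment",
    3: "printer maintenance contract renewal",
}

def build_inverted_index(docs):
    """Map each lowercased token to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return documents that contain every token in the query."""
    postings = [index.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*postings) if postings else set()

def fuzzy_search(docs, query, threshold=0.3):
    """Rank documents by character-level similarity to the query.

    difflib is a crude stand-in (an assumption), not contextual search.
    """
    scored = []
    for doc_id, text in docs.items():
        score = difflib.SequenceMatcher(None, query.lower(), text.lower()).ratio()
        if score >= threshold:
            scored.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]

def route(index, docs, query, max_keyword_tokens=3):
    """Send short, fully indexed queries to the inverted index;
    send everything else to the fuzzy matcher."""
    tokens = query.lower().split()
    if len(tokens) <= max_keyword_tokens and all(t in index for t in tokens):
        return "inverted", sorted(keyword_search(index, query))
    return "fuzzy", fuzzy_search(docs, query)
```

A short query such as "Smith purchase order" goes to the inverted index because every token is known. A pasted sentence or paragraph falls through to the fuzzy path. A production engine would presumably use richer signals than token count to make the call.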
More information is available at www.brainware.com.
Stephen E Arnold, March 17, 2010
A freebie. No one paid me to write this item. I will report non payment to the Dulles Toll Road authority. Brainware can be reached by that route if you wish to visit the firm.
Newsosaurs
March 15, 2010
I read “It’s Hard To Watch The Newsosaurs Turn A Blind Eye To Their Own Extinction” right after I flipped through the New York Times’s Sunday magazine clone from the Wall Street Journal outfit. Let me comment on each information MIRV and offer a couple of observations from my search vantage point.
First, TechCrunch’s write up has a killer comment:
Everyone wants to wall off the Web and keep grazing on declining ad revenues.
I agree. This is a combination of fear, anger, and ostracism. I enjoy pointing out that in the information economy, the traditional giants no longer own the country club. Each day, the former owners find their future will be as caddies to the new information elite. This is, I suppose, a bitter pill to swallow. The TechCrunch article includes the much quoted “burn the boats” admonition from one of the early superstars of the zippy-doo Web that is now the cat’s pajamas. Like Google’s advice to struggling industry, the listeners think that their outfits have already burned the boats, embraced technology, and reinvented themselves. This mismatch between advice and its perception is characteristic of the domain collision that is now taking place. The passage that caught my attention in the TechCrunch write up was:
The longer media companies wait, the bigger disadvantage they will have when they cross over to the other side and find a whole new host of competitors who never had any print legacy businesses to protect. Those competitors right now are blogs and online news hubs who are still furry little rodents in the underbrush, but who won’t stay little forever. The sooner print media companies cross over, the sooner they can be on pure offense. Their online strategies and business models won’t be crippled by any allegiance, or need to protect, to the old print business. If they wait until their online revenues become 25 or 50 percent before they fully commit, it will be too late.
I don’t disagree with the thought. I disagree with the “will be too late.” It is too late.
The example to which I refer is the oversized, glossy, 80 plus page WSJ Magazine filled with “reading.” Well, that’s interesting. I just counted about 32 pages of ads plus a number of features that are tough for me to determine if these are placed for consideration or are actual editorial. The stories focused on cars and fashion with a profile tossed in for good measure.
I remember being told by my Financial Times’s delivery agent before I dropped my print subscription that he tossed the magazine insert because it was too much of a hassle. I wonder if my delivery person for my Saturday WSJ will follow the same path.
Did I read any of the stories? The answer is, “No.” None of them appealed to me. I have a person who works for me who drives a Mini Cooper and it seems to have constant tire problems. I am tired of with-it executives who overcame hardship. Who hasn’t? Fashion? Not interested. I wear black Travel Smith jackets, black never wrinkle pants, and black shoes that do not set off any alarms anywhere I travel. Spare me the trendy. Was there any financial info, business intelligence, or juicy insights into making money grow? Nope. The WSJ added sports and now it is adding a New York Times’s magazine type publication every couple of months.
What’s my take?
- WSJ is going after the NYT advertisers. That’s okay but the effectiveness of print ads has to be demonstrable. That might be tough unless the editorial product provides some content consideration. The boundary between an auto story and an advertiser might be getting a few molecules narrower, might it not?
- The problem with traditional media is not content; the problem is finance and business models. Offering me 30 pages of ads in 80 pages of paper is somewhat 17th century in today’s world.
- The Financial Times’s last home delivery offer to me was $50 a year. Will the Wall Street Journal face the same subscription challenge as readers discover that blending sports, Details magazine editorial, and business profiles might be out of step with what subscribers like me do on a Saturday?
Now search? How will I be able to locate the Gucci suit on the WSJ Web site? Answer: Not until the WSJ figures out image indexing and some other search tricks. I bet that when the iPad version of the WSJ Magazine comes out I will be able to click on a suit and see a map of locations where I can buy a suit that will fit most 20 year old soccer players. Maybe for some folks. Not for me.
Stephen E Arnold, March 14, 2010
No one paid me to write this article. I will report a failure to charge for my writing to the editor of the Army Times, an outfit focused on information in the modern world.
Indexing Craziness
March 15, 2010
I read “Folksonomy and Taxonomy – do you have to choose?,” which takes the position that a SharePoint administrator can use a formal controlled term list or just let the users slap their own terms into an index field. The buzzword for allowing users to index documents is part of a larger 20 something invention—folksonomy. The key segment for me in the SharePoint centric Jopx blog was:
The way that SharePoint 2010 supports the notion of promoting free tags into a managed taxonomy demonstrates that a folksonomy can be used as a source to define a taxonomy as well.
Let me try and save you a lot of grief. Indexing must be normalized. The idea is to use certain terms to retrieve documents with reasonable reliability. Humans who are not trained indexers do a lousy job of applying terms. Even professional indexers working in production settings fall into some well known ruts. For example, unless care is exercised in management and making the term list available, humans will work from memory. The result is indexing that is wrong about 15 percent of the time. Machine indexing when properly tuned can hit that rate. The problem is that the person looking for information assumes that indexing is 100 percent accurate. It is not.
The idea behind controlled term lists is that these are logically consistent. When changes are made such as the addition of a term such as “webinar” as a related term to “seminar”, a method exists to keep the terms consistent and a system is in place to update the index terms for the corpus.
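Here is a small sketch of what I mean by normalizing against a controlled term list. The vocabulary, the related-term links, and the function names are hypothetical. A real term list runs to thousands of entries and needs an editorial process behind it.

```python
from collections import defaultdict

# Hypothetical controlled term list: variant spelling -> preferred term.
PREFERRED = {
    "webinar": "webinar",
    "web seminar": "webinar",
    "webcast": "webinar",
    "seminar": "seminar",
    "seminars": "seminar",
}

# Hypothetical related-term links, kept symmetric by hand.
RELATED = {"seminar": {"webinar"}, "webinar": {"seminar"}}

def normalize(raw_terms):
    """Split raw terms into (preferred terms, terms not on the list)."""
    accepted, rejected = set(), set()
    for term in raw_terms:
        key = " ".join(term.lower().split())  # fold case and extra spaces
        if key in PREFERRED:
            accepted.add(PREFERRED[key])
        else:
            rejected.add(key)
    return accepted, rejected

def build_index(docs):
    """Index each document under preferred terms only."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        accepted, _ = normalize(terms)
        for term in accepted:
            index[term].add(doc_id)
    return index

def search(index, term, include_related=False):
    """Look up a term after normalizing it; optionally add related terms."""
    accepted, _ = normalize([term])
    hits = set()
    for preferred in accepted:
        hits |= index.get(preferred, set())
        if include_related:
            for related in RELATED.get(preferred, ()):
                hits |= index.get(related, set())
    return hits
```

The point of the design is that "Web Seminar", "WEBCAST", and "webinar" all land under one preferred term, and anything off the list is set aside for review instead of going into the index as is. When the list changes, rerunning `build_index` over the corpus is the crude version of the update system I mentioned.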
When there is a mix of indexing methods, the likelihood of having a mess is pretty high. The way around this problem is to throw an array of “related” links in front of the user and invite the user to click around. This approach to discovery entertains the clueless but leads to the potential for rat holes and wasted time.
Most organizations don’t have the appetite to create a controlled term list and keep it current. The result is the approach that is something I encounter frequently. I see a mix of these methods:
- A controlled term list from someplace (old Oracle or Convera term list, a version of the ABI/INFORM or some other commercial database controlled vocabulary, or something from a specialty vendor)
- User assigned terms; that is, uncontrolled terms. (This approach works when you have big data like Google but it is not so good when there are little data, which is how I would characterize most SharePoint installations.)
- Indexes based on parsing the content.
A user may enter a term such as “Smith purchase order” and get a bunch of extra work. Users are not too good at searching, and this patchwork of indexing terms ensures that some users will have to do the Easter egg drill; that is, look for the specific information needed. When it is located, some users like me make a note card and keep it handy. No more Easter egg hunts for that item for me.
What about third party SharePoint metadata generators? These generate metadata but they don’t solve the problem of normalizing index terms.
SharePoint and its touting of metadata as the solution to search woes are interesting. In my opinion, the approach implemented within SharePoint will make it more difficult for some users to find data, not easier. And, in my opinion, the resulting index term list will be a mess. What happens when a search engine uses these flawed index terms? The search results force the user to look for information the old fashioned way.
Stephen E Arnold, March 15, 2010
A free write up. No one paid me to write this article. I will report non payment to the SharePoint fans at the Department of Defense. Metadata works first time every time at the DoD I assume.


