Charging Backward by Charging for Content
October 25, 2009
In my years in the commercial online business, I learned a few things the hard way. When I got rolling in the commercial database business, there were few examples of proven money-making methods for digital content. As a result, one had to do some thinking, devise tests, and then go forward. Today there are some models to examine. For example, the idea of the third-party payer is a useful one. The model originated (I think) with GoTo.com; Google looked at it and made it the lifeblood of its revenue approach. Another model is what's called "must have" information. A lawyer engaged in patent litigation knows the USPTO system is less than perfect. Commercial services such as those available from Derwent and Questel provide an alternative, but an expensive one. In fact, these services are sufficiently costly to keep most online users in the dark about what the services contain and how they work.
This is an image of Stephen E Arnold when he worked at a traditional publishing company. The environment and the business processes created a case study in the nature vs nurture debate.
One lesson I learned was that an online company has to find a mix of business methods that produce revenue. There is no one size that fits all. Even the Google is pursuing subscriptions, license fees, and partner upfront payments as ways to keep the money pumping.
This is a picture of Stephen E. Arnold when he focused exclusively on electronic publishing. His Neanderthal characteristics have become less evident.
It is, therefore, not surprising that publishing companies want to charge for content. What I find interesting is that some of the publishers are taking a somewhat warlike stance toward what is little more than a business problem. For example, read "WSJ Editor: Those Who Believe Content Should Be Free Are Neanderthals". The idea that I took away from this article is that ad hominem arguments are in vogue. I am not sure that I am upset with the possibility that I am prehistoric.
The question I had when reading the article was, “Why are publishers so late to the pay for content party?”
I know that publishers have been trying to crack the online revenue code for a long time. I was a beta tester of the original Dow Jones desktop software. The idea was that I could use the software to search for content on the fledgling Dow Jones service in the 1990s. The Wall Street Journal is still trying. In fact, a person with whom I spoke last week told me, “Dow Jones is the only publisher making money from its online service.” That’s not true. One of the most successful online publishing companies is Thomson Reuters. Others include Reed Elsevier and Consumer Reports.
Reflections on SharePoint and Search
October 19, 2009
I had an interesting conversation with a librarian at the international library conference in Hammersmith on October 15, 2009. This professional from a central European country asked my views about SharePoint. As I understood her comments, she was in a SharePoint-centric environment and found that making small changes to improve information access was difficult, if not impossible.
One reason was her organization's bureaucratic set up. Her unit was in the same building as the information technology group, but IT reported to one senior manager and her library to another. Another reason was the reluctance of the IT team to make too many changes to SharePoint and its indifference to her legacy library systems. Her challenge was even more difficult because there were multiple legacy systems in the information center. One of these systems offered search, and she wanted SharePoint to make use of the content in that legacy system's repository. She was not sure which vendor's search system was inside the legacy system, but she thought it was from the "old" Fast Search & Transfer outfit.
The motto of IT and another unit’s management.
Okay, I thought. Fast Search & Transfer, the company Microsoft bought in 2008 and had spent the last 18 months converting into the world-class search system for SharePoint. Fast ESP would break through the 50 million document ceiling in SharePoint search and add a better user experience plus a nifty range of functions.
To make a long story short, she wanted to know, "Will Microsoft and SharePoint support the 'old' versions of Fast ESP?" I told her that I recalled reading that Microsoft would stand behind the Fast ESP technology for a decade. She replied, "Really?" Frankly, I do not know if that guarantee extends to OEM instances of the "old" Fast ESP. My hunch is that the problem will fall upon the vendor licensing the Fast ESP engine. But I have learned, and suffered from, the fact that it is very easy to write marketing collateral and somewhat more difficult to support a system, any system, for a decade. I know of mainframe systems that have been around for 30 years, possibly more. But the Linux-centric search systems built to index Web content are a different kettle of Norwegian herring.
My understanding of the trajectory of Fast ESP is that the company took the core of its high-speed Web indexing system and reworked it to handle enterprise content. The "old" Fast ESP abandoned the Web search market when the company's top brass realized that Google's technology was powering past the Fast technology. Enterprise search became the focus of the company. Over the years, the core of Fast's Web search system has been "wrapped" and "tweaked" to handle the rigors and quite different requirements of enterprise search. The problems with this approach, as I pointed out in the three editions of the Enterprise Search Report I wrote between 2003 and 2006, were:
- Lots of moving parts. My research revealed that a change in a Fast script "here" could produce an unexpected result elsewhere in the system "there". Not even my wizard son could deal with these discomforting technical killer bees. Chasing down these behaviors took time. With time converted to money, I concluded that lots of moving parts was not a net positive in my enterprise search engagements. Once a system is working, I share the attitude of the librarian's IT department: just leave a working system alone.
- Performance. In the late 1990s, Web search required a certain type of spidering. Today, indexing has become trickier. Updates are often larger than in the past, so the content processing subsystem has to do more work. Once the index has been updated, other processes can take longer because indexes are bigger or broken into chunks. Speeding up a system is not a simple matter of throwing hardware at the problem. In fact, adding more hardware may not improve performance because the bottleneck may be a consequence of poor engineering decisions made a long time ago, or because the hardware was added to the wrong subsystem; for example, the production server, not the indexing subsystem.
- Ignorance. Old systems were built by individuals who may no longer be at the company. Today, the loss of the engineers with implicit knowledge of a subsystem makes it very difficult, maybe even impossible, to resolve certain problems such as inefficient use of scratch space for index updates. I recall one job we did at a major telco several years ago. The legacy search system was what it was. My team froze it in place and worked with data the legacy system wrote to a file on a more modern server on a fixed schedule; a minimal sketch of that pattern appears after this list. Not perfect, but it was economical and it worked. No one at the company knew how the legacy search system worked. The client did not want to pay my team to figure out the mystery. And I did not want to make assumptions about how long-gone engineers cooked their code. Companies that buy some vendors' search systems may discover that there is a knowledge problem. Engineers often document code in ways that another engineer cannot figure out. So the new owner has to work through the code line by line to find out what is going on. Executives who buy companies make some interesting but often uninformed assumptions; for example, the software we just bought works, is documented, and is assembled in a textbook manner. Reality is less tidy than the fantasies of a spreadsheet jockey.
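Here is that sketch, assuming Python and a comma-delimited export file. The path, polling interval, and record layout are hypothetical stand-ins, not the actual telco configuration, and index_record is a placeholder for whatever ingest call the modern search system actually exposes.

```python
# Sketch: freeze the legacy system, consume the file it already writes on a schedule.
# Path, interval, and field names below are illustrative assumptions.
import csv
import time
from pathlib import Path

EXPORT_FILE = Path("/data/legacy_export/records.csv")  # file the legacy system writes on its own schedule
POLL_SECONDS = 3600  # check once an hour; the legacy side dictates the real cadence

def index_record(record: dict) -> None:
    # Placeholder for the modern search system's ingest call.
    print(f"indexing {record.get('id')}")

def run_once(last_seen_mtime: float) -> float:
    if not EXPORT_FILE.exists():
        return last_seen_mtime
    mtime = EXPORT_FILE.stat().st_mtime
    if mtime <= last_seen_mtime:
        return last_seen_mtime  # nothing new from the legacy system yet
    with EXPORT_FILE.open(newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):
            index_record(record)
    return mtime

if __name__ == "__main__":
    seen = 0.0
    while True:
        seen = run_once(seen)
        time.sleep(POLL_SECONDS)
```

The virtue of the approach is that nobody touches the legacy code. The bridge only reads what the old system already produces, which is why it was economical and why it worked.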
Image Search Headache Removal Service
October 18, 2009
Searching for images giving you a headache? We just might have found a golden goose.
Along with the data explosion on our hands comes a related problem: image explosion. There are just as many graphics out there as there are files, I bet, and searching through them is just as difficult a prospect, if not more so. Graphics like .jpg, .tiff, .gif and others don't necessarily have embedded data, so there's nothing to catch in a search, and graphics that do carry some form of metadata end up at the top of the search pile.
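To make the problem concrete, here is a small sketch, assuming Python and the Pillow imaging library; the file name is a hypothetical example. When the embedded metadata comes back empty, there is literally nothing textual for an indexing system to catch.

```python
# Sketch: check a JPEG for embedded metadata with Pillow (pip install Pillow).
# If the EXIF block is empty, a text-oriented indexer has nothing to work with.
from PIL import Image
from PIL.ExifTags import TAGS

def embedded_metadata(path: str) -> dict:
    with Image.open(path) as img:
        exif = img.getexif()
        # Map numeric EXIF tag ids to readable names; this dict is often empty.
        return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

if __name__ == "__main__":
    tags = embedded_metadata("example.jpg")  # hypothetical file
    if not tags:
        print("No embedded metadata: nothing for a search engine to catch.")
    else:
        for name, value in tags.items():
            print(name, value)
```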
Beyond Search was working on a recent project and needed a picture. Our top dog's comment: "I just had an Easter egg hunt for an image. What a mess." He's referring to the mess of figuring out which pictures can or cannot be used without securing rights. The Internet is just as visual as it is textual, and more images than you think actually belong to someone. But how do you know? And on the owner's part, how do you know if your image is being ripped off?
As I learned in my years in journalism, you can't just pick a pretty picture, slap it on a page (paper or pixel), and publish it. There are such things as digital rights. Yes, someone owns that picture. In a few cases, it may be a free-use image. In some cases, it may be a free-use image that you can use if you give the owner credit. But in many cases, it is a copyrighted image, and you need permission or need to pay to use it. So when you enter your search term "blooming moonflower" into the browser box, beware of that list of 1,443,873,899 images that comes back.
We were recently contacted by a representative of PicScout, http://www.picscout.com, a company that deals in image copyright solutions. She sent us a press release about several worldwide companies joining the effort to help safeguard digital rights. Her comment: “Until now, image ownership online was disregarded and disrespected, to say the least. Now that every image has the potential to be visibly associated and directly connected with its licensor or owner, value is restored.”
And it occurred to me that search is intrinsically involved in that effort. From a PicScout press release: “In 2008 an online survey from KRC Research, undertaken for iStockphoto, revealed that 33 percent of Americans use downloaded digital content, but nearly 30 percent are unaware that permission is required for its use.” In general, PicScout’s products produce a list of “tens of millions of images” that display online with the universal information symbol if a surfer is using the ImageExchange Firefox add-on (you can apply to beta test it at http://imagex.picscout.com). And ta-da! You can see right away which images you may or may not use without checking first. How simple is that?
So here's the scenario: Say ordinary Joe Schmo goes to Google Images and does a search for "pretty pony" to use in a home-grown PR campaign. The GOOG does its thing and returns a billion search results. Joe chooses one and goes on his merry little way… until ImageTracker catches up. Joe fell into the digital rights trap. Any search engine will return results, and unless you have some serious search savvy (and if it's possible at all), there's likely no way you can refine your search results to keep that from happening. Well, PicScout's got that fixed for you.
Now, keep in mind, I'm a lowly duckling dealing with this technology, so this is how I understand it to work. PicScout already had a program called ImageTracker, which searches for and finds images based on the algorithms it employs for image identification, as part of the Image IRC platform. The IRC is what associates the images with metadata (licensing information, owner, etc.) so that ImageTracker can work to police infringement. Those two products are connected by the ImageExchange add-on, which lets you, a person searching for an image, connect with the owner.
It’s a nifty little idea, quite simple, and it can go a lonnnnnnng way to solving your image search nightmares. So if you deal with online images a lot, you might check PicScout out.
Jessica Bratcher, October 18, 2009
The PR lady did not even thank me for this item. Sigh.
HP Analysis Urges Mainframe Rip and Replace
October 1, 2009
I found "Staying on Legacy Systems Ends Up Costing IT More" absolutely fascinating. The article appeared on the Ziff Davis Web site. There is a link to a podcast (latency and audio made this tough for me given my age, lousy hearing, and general impatience with serial info streams) and a series of excerpts from the "Briefings Direct" podcast discussion. The sponsor of the podcast, according to the Web site, was Hewlett Packard. HP is on my radar because the company just merged its personal computer and printer businesses. I suppose that will make it untenable for me to describe HP as "the printer cartridge company". I really liked that description, but now HP is a consulting firm and a PC company. Much better, I suppose.
I abandoned the audio show and jumped to the transcript which you can obtain by clicking http://interarborsolutions.books.officelive.com/Documents/DoingNothing901.pdf.
The premise of the podcast, in my interpretation, is that smart companies will want to dump legacy hardware and systems for the hot, new hardware and systems available from HP. I understand this type of message. I use it myself. The idea sounds good. The notion of progress is based on the idea that what's new is better than what came before. I won't drag out the Jacques Ellul argument that technology creates more technology and more unexpected problems. I will also ignore the studies of progress such as Gregg Easterbrook's The Progress Paradox: How Life Gets Better While People Feel Worse, originally published in December 2003, five years before the economic dominos started falling in April 2008. I won't point out that "legacy" is not defined in a way that helped me understand the premise of the discussion. And I won't beat too forcefully on the fuzziness of the word "cost" as the industry experts use the term. But costs are the core of the podcast, so I will have to make a quick dash through the thicket of accounting methods, but not yet.
HP red ink as metaphor for the cost problems of a mainframe to next generation platform solution.
The first idea that snagged me was "cost hasn't changed". What changed was the amount of cash available to organizations. I don't buy this. First, it is not clear what is included in the data to support the generalization. Without an indication of direct and indirect costs, capital, services, and any other costs associated with a legacy system, I can't let the generalization into the argument. Without this premise in place, the rest of the assertions are on thin ice, at least for me.
Second, consider this assertion by one of the HP “transformation” experts:
What’s still there, and is changing today, is the ability to look at a legacy source code application. We have the tools now to look at the code and visualize it in ways that are very compelling. That’s typically one of the biggest obstacles. If you look at a legacy application and the number of lines of code and number of people that are maintaining it, it’s usually obvious that large portions of the application haven’t really changed much. There’s a lot of library code and that sort of thing
My view is that "obvious" is a word that can be used to create a cloud of unknowing. Mainframe apps that are stable and doing a good-enough job may be useful precisely because the application has not changed. As one of my neighbors here in Harrods Creek said, "If it ain't broke, don't fix it." In my experience, that applies to mainframe apps that are working. If a mainframe app is broken, then an analysis is required to track down direct and indirect costs, opportunity costs, and, fuzzy to be sure but important, going-forward costs. Not much is obvious once one gets rolling down the path of the rip-and-replace approach. In my experience, the reason mainframe apps continue to chug along in insurance companies, certain travel sectors, and some manufacturing firms is that they are predictable, known, and stable. Jumping into a whizzy new world may be fun, but such a step may not be prudent within the context of the business. But HP and its wizards aren't known for their own rock-solid business decisions. I am thinking of the ball drop with AltaVista.com and the most recent mash-up of the printer and PC businesses. Ink revenue will make HP's PC revenues soar, but it won't change the nature of that low-margin business.
What If Google Books Goes Away?
September 21, 2009
I had a talk with one of my partners this morning. The article in TechRadar "Google Books Smacked Down by US Government" was the trigger. This Web log post captures the consequences portion of our discussion. I am not sure Google, authors, or any other pundit embroiled in the dust-up over Google Books will agree with these points. That's okay. I am capturing highlights for myself. If you have forgotten this function of the Beyond Search Web log, quit reading or look at the editorial policy for this marketing / diary publication.
Let's jump into the discussion in medias res. The battle is joined, and at this time Google is on the defensive. Keep in mind that Google has been plugging away at this Google Books "project" since 2000 or 2001, when it made a key hire from Caere (now folded into Nuance) to add a turbocharge to the Books project.
Who is David? Who is Goliath?
With nine years of effort under its belt, Google will get a broken snout if the Google Books project stops. Now, let’s assume that the courts stop Google. What might happen?
First, Google could just keep on scanning. Google lawyers will do lawyer-type things. The wheels of justice will grind forward. With enough money and lawyers, Google can buy time. Let’s face it. Publishers could run out of enthusiasm or cash. If the Google keeps on scanning, discourse will deteriorate, but the acquisition of data for the Google knowledge base and for Google repurposing keeps on keeping on.
Second, Google might comply: shut up shop and go directly to authors with an offer to buy rights to their work. I have four or five publishers right now. I would toss them overboard for a chance to publish my next monograph on the Google system, let Google monetize it any way it sees fit, and give me a percentage of the revenue. Heck, if I get a couple of hundred a month from the Google, I am ahead of the game. Note this: none of my publishers are selling very many expensive studies right now. The for-fee columns I write produce a pittance as well. One publisher cut my pay by 30 percent as part of a shift to a four-day week and a trimmed publishing schedule. Heck, I love my publishers, but I love an outfit that pays money more. I think quite a few authors would find publishing on the Google Press most interesting. If that happens, the Google Books project has a gap, but going forward Google has the info, and the publishers and non-participating authors have a different type of competitive problem.
Third, Google cuts a new deal, adjusts the terms, and keeps on scanning books. Google’s management throws enough bird feed to the flock. Google is secure in its knowledge that the future belongs to a trans-national digital information platform stuffed with digital information of various types. No publisher or group of publishers has a comparable platform. Microsoft and Yahoo were in the book game and bailed out. Perhaps their platforms can at some point in the future match Google’s. But my hunch is that the critics of Google’s book project are not looking at the value of the information to Google’s knowledge base, Google’s repurposing technologies, and Google’s next generation dataspace applications. Because these are dark corners, the bright light of protest is illuminating the dust and mice only.
One theme runs through these three possibilities. Google gets information. In this game, the publishers have lost but have not recognized it. Without a better idea and without an alternative to the irreversible erosion of libraries, Google is not the miserable little worm that so many want the company to be. Just my opinion.
Stephen Arnold, September 21, 2009
European Search Vendor Round Up
September 16, 2009
Updated at 8:29 am, September 17, 2009, to 23 vendors
I received a call from a very energetic, quite important investment wizard from a “big” financial firm yesterday. Based in Europe, the caller was having a bad hair day, and he seemed pushy, almost angry. I couldn’t figure out why he was out of sorts and why he was calling me. I asked him. He said, “I read your Web log and you annoy me with your poor coverage of European search vendors.”
I had to admit that I was baffled. I mentioned the companies that I tracked. But he wanted me to do more. I pointed out that the Web log is a marketing vehicle and he can pay me to cover his favorite investment in search. That really set him off. He wanted me to be a journalist (whatever that meant) and provide more detailed information about European vendors. And for free.
Right.
After the call, I took a moment and went through my files to see which European vendors I have mentioned and the general impression I have of each of these companies. The table below summarizes the companies I have either profiled in my for-fee studies or mentioned in this diary / marketing Web log. You may disagree with my opinions. I know that the azure chip consultants at Gartner, Ovum, Forrester, and others certainly do. But that's understandable. The addled geese here in Harrods Creek actually install systems and test them, a step that most of the azure chip crowd just don't have time for because of their exciting work to generate enough revenue to keep the lights on, advise clients, and conduct social network marketing events. Just my opinion, folks. I am entitled to those despite the widespread belief that I should be in the Happy Geese Retirement Home.
Vendor | Function | Opinion
Autonomy | Search and eDiscovery | One of the key players in content processing; good marketing
Bitext | Semantic components | Impressive technology
Brox | Open source semantic tools | Energetic, marketing-centric open source play
Empolis GmbH | Information management and business intelligence | No-cash tie-up with Attensity
Exalead | Next-generation application platform | The leader in search and content processing technology
Expert System | Semantic toolkit | Works; can be tricky to get working the way the goslings want
Fast ESP | Enterprise search, business intelligence, and everything else | Legacy of a police investigation hangs over the core technology
InfoFinder | Full-featured enterprise search system | My contact in Europe reports that this is a European technology; listed customers are mostly in Norway
Interse Scan Jour | SharePoint enterprise search alternative | Based in Copenhagen, the Interse system adds useful access functions to SharePoint; sold in December 2008
Intellisearch | Enterprise search; closed US office | Basic search positioned as a one-size-fits-all system
Lumur Consulting | Flax, a robust enterprise search system | I have written positively about this system; continues to improve with each release of the open source engine
Lexalytics | Sentiment analysis tools | A no-cash merger with a US company and UK-based Infonics
Linguamatics | Content processing focused on pharma | Insists that it does not have a price list
Living-e AG | Information management | No-cash tie-up with Attensity
Mindbreeze | Another SharePoint snap-in for search | Trying hard; interface confusing to some goslings
Neofonie | Vertical search | Founded in the late 1990s; created Fireball.de
Ontoprise GmbH | Semantic search | The firm's semantic Web infrastructure product, OntoBroker, is at Version 5.3
Pertimm | Enterprise search | Now positioned as information management
PolySpot | Enterprise search with workflow | Now at Version 4.8; search, workflow, and faceted navigation
SAP Trex | Search tool in NetWeaver; works with R/3 content | Works; getting long in the tooth
Sinequa | Enterprise search with workflow | Now at Version 7; the system includes linguistic tools
Sowsoft | High-speed desktop search | Excellent, lightweight desktop search
SurfRay | Now focused on SharePoint | Uncertain; emerging from some business uncertainties
Temis | Content processing and discovery | Original code and integrated components
Tesuji | Lucene enterprise search | Highly usable and speedy; recommended for open source installations
Updated at 8:29 am Eastern, September 17, 2009
Beyond the Database: Implications for Organizations
September 9, 2009
The challenge in information technology in general, and information management in particular, is that we face a "bridge" challenge. On one side are individuals using a wide range of devices. These include the Microsoft Zune HD, Google Android phones, and netbooks like the one I am using. Millions of the young-at-heart have full-scale computers like the Apple iPod Touch.
On the other side are large businesses with entrenched information technology infrastructures. Change is expensive, time-consuming, and often fiercely resisted by employees. Change means relearning methods that work.
When someone searches for information, a variety of sources is available. For example, if a NOAA professional gets a weather alert, he or she can pull from many sources. The problem is that the "answer" is not evident.
What about a search for “Florida severe weather”? Bing and Google return laundry lists of results. My research suggests that users do not want laundry lists. Users do want answers or a result that gets them closer to an answer and farther from the almost useless laundry list of results.
In this talk (converted to an essay), I will comment about some of Google’s new technology, but I want to point out that Microsoft is working in this field as well. Most of the major players in search, content processing, and business intelligence know that laundry lists are a dead end, of low value, and a commodity.
Google's corporate strategy looks unorganized. The Sunday, January 28, 2007, New York Times article about Steve Ballmer included a reference to Google's dependence on search advertising. The implication was that Google is a one-trick pony and therefore vulnerable. Google is in a tough spot because if advertising goes south, the company has to have a way to monetize its infrastructure. Google has spent billions building a global datasphere, a subject to which I will return at the end of this talk / essay.
Stand on the edge of a slice in the land near Antrim, Northern Ireland. You see a gap which you can cross using a rope bridge. Someday, a modern steel structure may be put in place. But for now, the Northern Ireland residents need a “good enough” solution.
That's the problem the Federal government and many organizations face. Instead of a gap in the terrain, there are many legacy systems inside the organization and new systems outside the organization. The systems gap creates major problems in information access, security, and efficiency. Given today's economic climate and the new Administration's commitment to serving citizens, a digital bridge is needed, sooner rather than later.
The opportunity is to bridge these two different sides of the river of technology that flows through our society. Similar gaps can be identified between structured and unstructured information, between legacy systems and Web-service-enabled systems, between Microsoft and Google, between archived data and real-time data, between semantic and statistical methods, and others.
The question is, “How can we get the bridge built?” and “How can we deal with these gaps?”
These are important issues, and the good news is that tools and approaches are now becoming available. I will highlight some of Google’s innovations and mention one company that has a product available that provides a functional “bridge” between existing IT infrastructure and Google’s services. Many tools are surprisingly affordable, so progress—in my opinion—will be picking up steam in the next six to 12 months.
Because I have limited time, I will focus on Google and make do with side references to other vendors working to build bridges between organizations’ internal systems and the fast-moving, sometimes more innovative world external to the organization.
I have written three monographs about Google technology: The Google Legacy in 2005, Google Version 2.0 in 2006, and Google: The Digital Gutenberg this year. Most of the information I am going to mention comes from my research for these monographs which are available from Infonortics, Ltd. (http://www.infonortics.com). The information in my monographs comes from open source intelligence.
In the Q&A session, I will take questions about IBM’s, Microsoft’s, and other companies’ part in this information drama, but in this talk most of my examples will be drawn from Google. I don’t work for Google and Google probably prefers that I retire, stop writing my monographs and blog posts about the company, and find a different research interest.
Let’s start with a query for an airplane flight.
Real Time Search: Point of View Important
September 3, 2009
Author’s Note: I wrote a version of this essay for Incisive Media, the company that operates an international online meeting. This version of the write up includes some additional information.
Real-time search is shaping up like a series of hurricanes forming off the coast of Florida. As soon as one crashes ashore, scattering Floridians like dry leaves, another hurricane revs up. Real-time search shares some similarities with individual hurricanes and the larger weather systems that create the conditions for hurricanes.
This is a local-global or micro-macro phenomenon. What real time search is and is becoming depends on where one observes the hurricane.
Look at the two pictures below. One shows you a local weather station. Most people check their local weather forecast and make important decisions on the data captured. I don’t walk my dogs when there is a local thunderstorm. Tyson, my former show ring boxer, is afraid of thunder.
Caption: The Local Weather: Easy to Monitor, Good for a Picnic
Image source: http://www.usa.gov
The other picture, taken from Earth orbit, shows a very different view of a weather system. Most people don't pay much attention to global weather systems unless they disrupt life with hurricanes or blizzards.
Local weather may be okay for walking a dog. Global weather may suggest that I need to prepare for a larger, more significant weather event.
The Weather from the International Space Station
Image source: http://www.usa.gov
I want to identify these two storms and put each in the context of a larger shift in the information ecosystem perturbed by real-time search. The first change in online is the momentum within the struggling traditional newspaper business to charge for content. Two traditional media oligopolies appear to be trying to sail out of the horse latitudes of declining revenue, shrinking profit, and technology change. Rupert Murdoch's News Corporation wants to charge for quality journalism, which is expensive. I am paraphrasing his views, which have been widely reported.
The Financial Times, confident from its experiments using information processing technology from Endeca (www.endeca.com) and Lexalytics (www.lexalytics.com), continues to move forward with its "pay for content" approach to its information. The fact that the Financial Times has been struggling to find a winning formula for online almost as long as the Wall Street Journal has not diminished the newspaper's appetite for online success. The notion of paying for content is gaining momentum among organizations that have to find a way to produce money to cover their baseline costs. Charging me for information seems, to these companies, to be the logical solution.
With these two international giants making a commitment to charge customers to access online content, this local storm system is easy to chart. I think it will be interesting to see how this shift in a newspaper’s traditional business model transfers to online. In a broader context, the challenge extends to book, magazine, and specialist publishers. No traditional print-on-paper company is exempt from inclement financial weather.
One cannot step into the same river twice, so I am reluctant to point out that both News Corporation and the Pearson company have struggled with online in various incarnations. News Corporation has watched as Facebook.com reached 350 million users while MySpace.com has shriveled. Not even the advertising tie-up with Google has been sufficient to give MySpace.com a turbo boost. The Wall Street Journal has embraced marketing with a vengeance. I have documented in my Web log (www.arnoldit.com/wordpress) how the Wall Street Journal spams paying subscribers to buy additional subscriptions. You may have noticed the innovation section of the Wall Street Journal that featured some information and quite a bit of marketing for a seminar series sponsored by a prestigious US university. I was not sure where "quality journalism" began and where the Madison Avenue slickness ended.
Ideal, Simple, and Good Enough
September 1, 2009
I just read this Web page headline: “SAP NetWeaver Enterprise Search: Simple and Secure Access to Information”. Wow, simple and search pushed together like peanut butter and jelly, ham and eggs, and hammer and nail. The problem is the word “simple”. Who does not want simplicity? Life today is too complicated. Make it simple.
The three meta issues swirling around simple search and content processing have their roots in the fecund soil of user annoyance. Most users have zero clue about the more sophisticated features in any desktop or Web application. The evidence is not far to seek. Look at these three questions. How many can you answer without recourse to Google, your friendly power user, or digging through books in the ever-smaller computer section of Barnes & Noble or Borders?
- How do you limit Google results to only those for US government and state information?
- How do you create a single, presentation quality graphic from Excel 2007?
- How do you delete unwanted colors in Framemaker 7.2 when you import a graphic format other than jpg?
The answer to the Google question is to navigate to Google.com, click on Advanced, scroll to the bottom of the page, and click on the Uncle Sam option.
The answer to the second question is to use a third party application from an outfit in France called GlobFX.
The answer to the Framemaker question is to open a version of the document with the correct color information. Go to File Import and select the option for importing a template. Make sure only the color information option is selected. Make the source the file with the “correct” color information and the target the file with the unwanted color information.
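For the first question, there is also a rough query-box shortcut, assuming Google's standard site: operator; the Uncle Sam option may apply additional filtering behind the scenes, so treat these as approximations rather than an exact substitute:

```
florida severe weather site:gov
prescription drug recalls site:fda.gov
```

The Advanced page and the Uncle Sam option essentially build this kind of domain restriction for you.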
An even better example can be found in the usage of the advanced search functions for Web search systems. In general, users enter 2.3 words on average per query and fewer than five percent of search users access the advanced search functions.
Who cares?
I care a little bit, but not enough to give a talk about the way in which those creating systems make life almost unbearable for users. I am sufficiently motivated to define three terms and offer some comments.
Ideal
I have sat in meetings in which requirements emerge from a group discussion. The focus jumps from a micro problem ("I can't find my most recent version of this document") to science fiction ("I need to see information from many sources on one screen so I don't have to hunt or scan for the information I need"). Unless there is a method for capturing these requirements and assigning some meaningful tag for difficulty or cost to each, the exercise is interesting but often not super productive.
In my experience, folks like to talk about ideal features and functions. The chatter is similar to that I recall from my freshman class in Philosophy in 1962.
The problem is that when a vendor or a developer charts a course for the ideal, the journey may be more expensive and time-consuming than Odysseus's return home from Troy.
When my team encounters a cost overrun and a system that is never completed, I think, “Ideal”.
Silobreaker Update
August 25, 2009
I was exploring usage patterns via Alexa. I wanted to see how Silobreaker, a service developed by some savvy Scandinavians, was performing against the brand name business intelligence companies. Silobreaker is one of the next generation information services that processes a range of content, automatically indexing and filtering the stream, and making the information available in “dossiers”. A number of companies have attempted to deliver usable “at a glance” services. Silobreaker has been one of the systems I have relied upon for a number of client engagements.
I compared the daily reach of LexisNexis (a unit of the Anglo-Dutch outfit Reed Elsevier), Factiva (originally a Reuters and Dow Jones "joint" effort in content and value-added indexing, now rolled back into the Dow Jones mothership), Ebsco (the online arm of the E.B. Stephens Co. subscription agency), and Dialog (a unit of the privately held database roll-up company Cambridge Scientific Abstracts / ProQuest and some investors). Keep in mind that Silobreaker is a next-generation system, and I was comparing it to the online equivalent of the Smithsonian's computer exhibit with the Univac and IBM keypunch machine sitting side by side:
Silobreaker is the blue line, which is chugging right along despite the challenging financial climate. I ran the same query on Compete.com, and that data showed LexisNexis with a growth uptick and more traffic in June 2009. Your mileage may vary. These types of traffic estimates are indicative, not definitive. But Silobreaker is performing and growing. One could ask, "Why aren't the big names showing stronger buzz?"
A better question may be, “Why haven’t the museum pieces performed?” I think there are three reasons. First, the commercial online services have not been able to bridge the gap between their older technical roots and the new technologies. When I poked under the hood in Silobreaker’s UK facility, I was impressed with the company’s use of next generation Web services technology. I challenged the R&D team regarding performance, and I was shown a clever architecture that delivers better performance than the museum piece services against which Silobreaker competes. I am quick to admit that performance and scaling remain problems for most online content processing companies, but I came away convinced that Silobreaker’s engineering was among the best I had examined in the real time content sector.
Second, I think the museum pieces – I could mention any of the services against which I compared Silobreaker – have yet to figure out how to deal with the gap between the old business model for online and the newer business models that exist. My hunch is that the museum pieces are reluctant to move quickly to embrace some new approaches because of the fear of [a] cannibalization of their for fee revenues from a handful of deep pocket customers like law firms and government agencies and [b] looking silly when their next generation efforts are compared to newer, slicker services from Yfrog.com, Collecta.com, Surchur.com, and, of course, Silobreaker.com.
Third, I think the established content processing companies are not in step with what users want. For example, when I visit the Dialog Web site here, I don't have a way to get a relationship map. I like nifty methods of providing me with an overview of information. Who has the time or patience to handcraft a Boolean query and then pay money whether or not the dataset contains useful information? I just won't play that "pay us to learn there is a null set" game any more. Here's the Dialog splash page. Not too useful to me because it is brochureware, almost a 1998 approach to an online service. The search function only returns hits from the site itself. There is no compelling reason for me to dig deeper into this service. I don't want a dialog; I want answers. What's a ProQuest? Even the name leaves me puzzled.
I wanted to make sure that I was not too harsh on the established “players” in the commercial content processing sector. I tracked down Mats Bjore, one of the founders of Silobreaker. I interviewed him as part of my Search Wizards Speak series in 2008, and you may find that information helpful in understanding the new concepts in the Silobreaker service.
What are some of the changes that have taken place since we spoke in June 2008?
Mats Bjore: There are several new things and plenty more in the pipeline. The layout and design of Silobreaker.com have been redesigned to improve usability; we have added an Energy section to provide a more vertically focused service around both fossil fuels and alternative energy; we have released Widgets and an API that enable anyone to embed Silobreaker functionality in their own web sites; and we have improved our enterprise software to offer corporate and government customers "local" customizable Silobreaker installations, as well as a technical platform for publishers who'd like to "silobreak" their existing or new offerings with our technology. Industry-wise, the recent statements by media moguls like Rupert Murdoch make it clear that the big guys want to monetize their information. The problem is that charging for information does not solve the problem of a professional already drowning in information. This is like trying to charge a man who has fallen overboard for water instead of offering a life jacket. Wrong solution. The marginal loss of losing a few news sources is really minimal for the reader, as there are thousands to choose from anyway, so unless you are a "must-have" publication, I think you'll find out very quickly that reader loyalty can be fickle or short-lived or both. Add to that that news reporting itself has changed dramatically. Blogs and other types of social media are already favoured over many newspapers, and we saw Twitter's role during the election demonstrations in Iran. Citizen journalism of that kind, immediate, straight from the action, and free, is extremely powerful. But whether old or new media, Silobreaker remains focused on providing sense-making tools.
What is it going to be, free information or for fee information?
Mats Bjore: I think there will be free, for-fee, and blended information, just like Starbucks coffee. The differentiators will be "smart software" like Silobreaker and some of the Google technology I have heard you describe. However, the future is not just lots of results. The services that generate value for the user will have multiple ways to make money. License fees, customization, and special processing services, to name just three, will differentiate what I can find on your Web log and what I can get from a Silobreaker "report".
What can the museum pieces like Dialog and Ebsco do to get out of their present financial swamp?
Mats Bjore: That is a tough question. I also run a management consultancy, so let me put on my consultant hat for a moment. If I were Reed Elsevier, Dow Jones/Factiva, Dialog, Ebsco, or a large publishing house, I would have to realize that I have to think out of the box. It is clear that these organizations define technology in a way that is different from many of the hot new information companies. Big information companies still define technology in terms of printing, publishing, or other traditional processes. The newer companies define technology in terms of solving a user's problem. The quick fix, therefore, ought to be to start working with new technology firms and see how they can add value for these big dragons today, not tomorrow.
What does Silobreaker offer a museum piece company?
Mats Bjore: The Silobreaker platform delivers access and answers without traditional searching. Users can spot what is hot and relevant. I would seriously look at solutions such as Silobreaker as a front end to reach new customers, capture revenue from the ad-sponsored free service, and give a wider audience a click path to premium content. (Most of us are unaware of the premium content that is out there, since the legacy contracts only reach big companies and organizations.) I am surprised that Google, Microsoft, and Yahoo have not moved more aggressively to deliver more than a laundry list of results with some pictures.
Is the US intelligence community moving more purposefully with access and analysis?
Mats Bjore: The interest in open source is rising. However, there is quite a bit of inertia when it comes to having one set of smart software pull information from multiple sources. I think there is a significant opportunity to improve the use of information with smart software like Silobreaker's.
Stephen Arnold, August 25, 2009