Kreeo: A New Search System
January 20, 2010
A happy quack to the reader who sent me a link to the new India-based search system Kreeo. I have only modest information about the new Web and enterprise product. Based on the story at New Kerala, Kreeo combines a number of technologies to deliver:
- Collective intelligence operations
- An alternative to the “traditional model of Web search and indexing”
- Social computing.
According to the news story,
Kreeo’s enterprise offering is an enterprise 2.0 product with a bundle of related services. It’s a new age technology platform that combines best of Web 2.0 and social computing features to bring a new paradigm to managing knowledge and harnessing collective intelligence across people, organizations, customers and other stakeholders with minimal cost and resources. With custom Kreeo solutions, customers can apply the Kreeo framework and technology to solve complex learning and knowledge related challenges and create competitive advantage.
The company says, “The name ‘Kreeo’ is a combination of the Latin ‘Creo’ and the Sanskrit ‘Kriya’, meaning ‘to create’ and ‘activity’ respectively. Kreeo, thus, stands for creating knowledge through the activity of networking and collaboration.”
You can get more information at http://www.kreeo.com/#start. We will update you about this new entrant in search as we learn more.
Stephen E Arnold, January 20, 2010
Nope. No one paid me to write this short item. Not one papadum. I will report this unfortunate situation to the Department of Agriculture, which tracks lentils and the flour milled from them.
Hakia: No Brand Search Taste Test
January 20, 2010
If you want to run a query and see how Google’s results and Hakia’s results stack up, navigate to NoBrandSearch.com. Google uses its PageRank method plus the accretions added to the core voting system since 1998. Hakia is “a general purpose “semantic” search engine, dedicated to quality search experience. To achieve this goal, our team aspires to establish a new standard of focus, clarity and credibility in Web search.” If you are not familiar with Hakia, you can get basic information about the company’s software and services on the firm’s Web site. For a slightly deeper look at the company’s approach, you can read the interview with Riza C. Berkan in the Search Wizards Speak series on ArnoldIT.com. A running score shows which search system’s results are perceived as “better”. It is interesting to run a few queries and watch the tally.
Stephen E Arnold, January 19, 2010
A freebie. I did visit the Hakia offices, and I think I got a bottle of water. Otherwise, this is a post done to point out this service. I will report the lack of payment to the Rural Business-Cooperative Service.
Google and Microsoft – Addicted to Health
January 20, 2010
The market is big. Health care is exciting except for those with tons of dough. The White House wants to resuscitate an industry. Oh, did I mention that health care has lots of money sloshing around?
ComputerWorld’s “Microsoft’s Secret Weapon against Google: Health Search” surprised me. Microsoft is pushing into health via deals with big outfits. Google seems content to baby step along with what I called its “meh strategy”. You can read more about “meh” in my KMWorld column, which I submitted a week ago. For our purposes in this uncompensated blog post, “meh” means “what the heck”. Google likes to dissemble via feigned indifference, doing a little of this and then a little of that. Microsoft is a different animal. The company props up a target and then goes right at it. Google wanders; Microsoft charges. Same in health.
The article points out that a query for “high blood pressure” in Bing.com generates lots of useful info and suggested links. When ComputerWorld ran the query on Google.com, the GOOG spit out a list of results. Well, that’s not exactly true. The Google has some nifty health search features, but they are only available for certain terms.
Run the ComputerWorld Bing.com query for this phrase: lung cancer. Click around. There are some useful “discovery engine” features. Now run the query on Google. Hit the End key and look about halfway down the results list until you see:
Click on the “Google Health Link” and you should see this:
What’s this “standard results list” assertion? The reality is that Google has some nifty health related features, but these have not been widely publicized. Google assumes that bright users will connect the dots themselves. The ComputerWorld article does not connect the dots, which is all too common. People assume they know how to search (false assumption) and that they know how to use Google’s system (false assumption). Google does not do much to correct these types of analyses of its system and services.
Now let’s think about this comment:
Neither Microsoft nor Google does a particular good job with their consumer medical records sites yet. Microsoft HealthVault is slightly superior to Google Health because it appears to have more sites from which you can automatically grab health information, such as pharmacies. But at this point, neither is particularly useful.
I don’t agree. Both companies have made improvements in their medical records functions. The difference is that there are other companies in the consumer medical record business. Microsoft wants to get its platform into the game. Google is doing its “meh” stuff. What’s going to happen is that cost and transformation will become larger and larger factors.
The Google has the right stuff when it comes to content transformation and the tricks its numerical recipes can perform. Microsoft approaches transformation by trying to get everything on Microsoft technologies.
Right now, Microsoft’s push is easier for pundits to see and describe. It is a familiar approach, as the ComputerWorld article demonstrates. Google does its “meh” thing and chug chugs along, moving so slowly that its incremental advances are tough to discern, just like the lung cancer search example. If you don’t know where to look, you may miss something that is sitting in front of hundreds of millions of Google users, completely unseen. Of course, real journalists don’t miss the obvious. Or do they?
Stephen E Arnold, January 20, 2010
A freebie. Due to the medical nature of this write up, I will send an email to the administrator of Walter Reed Hospital, a fine facility.
Criticism of Google and the MFA (Made for AdSense) Site
January 20, 2010
Quite an acronym, MFA. Even more interesting is that Google is now aiding and abetting the killer of content quality—Demand Media. The search engine optimization crowd is running out of tricks. The adaptation is for consultants and entrepreneurs to get into the content game. Read “The Death of the Professional, Brought to You by Google” and you can decide for yourself if Google is making it easy for content factories to control search results. I did a report for a client about Demand Media, and I think there are some interesting upsides and downsides to the company’s approach. I cannot reveal the details of our research in a free, marketing oriented blog, but I can highlight a passage in the Search Engine Guide article and offer several comments. For me, here’s the passage I noted:
If your content is good the visitors have less reason to click the AdSense ads. For the most part the visitor has found what they want and are satisfied. That’s not to say that only bad content gets ad clicks, it’s not, but the more lackluster the content, the AdSense ads provide a way to something more promising. And who get’s paid every time an AdSense ad is clicked? The site running the ads gets a percentage, but the lion’s share go to Google. Because Google controls 70+% of the natural search results and the ads on the side, they essentially have a vested interest in helping companies like Demand Media succeed. The more MFA content that comes up in the search results, the more Google gets paid.
The SEO crowd profited from its collective ability to spoof Google. Google has cracked the SEO problem to some degree and created a content play. The Google will chip away at this problem and then find itself faced with another one. The reason is that folks unable to create magnetic sites and compelling content want an easy way out.
Think back to your English 101 class. How much time did most of the students spend writing their obligatory essays? How many of those were any good? I had to grade and comment on these papers when I was desperate for money in grad school. My recollection is that the crap on the Internet is about the same as the crap in those freshman essays.
Why not blame Google for that situation as well? Better yet, why not improve writing instruction so that the crap generated for outfits like Demand Media gets better? The last time I checked, most people find writing difficult and time consuming. Neither a powerhouse like Google nor a Web play like Demand Media can do much about this aspect of reality and the authoring process, in my opinion.
Stephen E Arnold, January 20, 2010
A no-one-paid-me article. Because it is about writing, I will report this painful fact to the House of Representatives, where writing great prose is job one.
Quote to Note: Multicore Chips for Microsoft Devs
January 20, 2010
In “Windows & .NET Watch: Five Predictions for the Next Decade,” Larry O’Brien inked a quote to note. He said:
You cannot develop software for manycore using today’s mainstream concurrency models. I know I sound like a broken record on this, but too many people have stuck their heads in the sand and are willfully ignoring an enormous problem. Writing manycore programs is going to be the hardest technical challenge in your career: harder than understanding object-oriented or functional programming, harder than browser incompatibilities, harder than tracking down memory leaks in a C program.
Azure chip poobahs notwithstanding. Nailed and well stated, Mr. O’Brien.
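Mr. O’Brien’s warning is easier to feel than to read about. As a minimal sketch (in Java rather than the .NET world he writes about, with every class name and number invented for the example), here is the shared-state trap he is pointing at: the mainstream approach of hammering on one mutable variable silently loses updates once several cores pile in, while partitioning the work so each task owns its own data stays correct.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only. Counting with a shared, unsynchronized field
// loses updates under contention; giving each task its own partial sum does not.
public class ManycoreSketch {

    static long racyTotal = 0; // shared mutable state: the mainstream trap

    public static void main(String[] args) throws Exception {
        final int tasks = Runtime.getRuntime().availableProcessors();
        final int perTask = 1_000_000;
        ExecutorService pool = Executors.newFixedThreadPool(tasks);

        // "Mainstream" version: every task increments the same field.
        List<Future<?>> racy = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            racy.add(pool.submit(() -> {
                for (int i = 0; i < perTask; i++) racyTotal++; // not atomic: lost updates likely
            }));
        }
        for (Future<?> f : racy) f.get();
        System.out.println("Shared counter: " + racyTotal
                + " (expected " + (long) tasks * perTask + ")");

        // Partitioned version: no sharing; each task returns its own result.
        List<Future<Long>> partials = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            partials.add(pool.submit(() -> {
                long local = 0;
                for (int i = 0; i < perTask; i++) local++;
                return local;
            }));
        }
        long total = 0;
        for (Future<Long> f : partials) total += f.get();
        System.out.println("Partitioned   : " + total);

        pool.shutdown();
    }
}
```

The fix here is trivial because the example is trivial. Scale the same issue up to a real application with real data structures and you have the career-hardest problem Mr. O’Brien describes.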
Stephen E Arnold, January 20, 2010
A freebie. I must report this to the Bureau of Reclamation.
Magnetism and Online
January 19, 2010
I found the write up “Denver Post Owner Plans Bankruptcy Filing” somewhat sad. With middle tier newspapers struggling, I don’t think there will be an easy or quick fix to the collapse of [a] readership, [b] original news reporting, and [c] ad revenue. For me, the most interesting segment of the Washington Times’s story was:
It would be at least the 13th bankruptcy filing by a U.S. newspaper publisher in the past 13 months. The owners of dozens of newspapers have been pushed into bankruptcy protection as the recession and competition from the Internet have sapped advertising revenue. Affiliated Media’s Chapter 11 bankruptcy filing illustrates the uncertainty facing major newspapers publishers as their main source of income — print advertising — has plunged during the past four years. Since 2005, the industry’s annual ad sales have dropped by more than $20 billion, a decline of about 40 percent, based on figures from the Newspaper Association of America.
I also scanned the pride of stories about a big lion (New York Times) getting ready to charge for its content. You can read a summary of this likely step in New York Magazine’s “New York Times Ready to Charge Online Readers”. The write up suggests that NYT execs were not sure about this bold step into charging for access. The passage that struck me as interesting was this one:
The Times has considered three types of pay strategies. One option was a more traditional pay wall along the lines of The Wall Street Journal, in which some parts of the site are free and some subscription-only. For example, editors and business-side executives discussed a premium version of Andrew Ross Sorkin’s DealBook section. Another option was the metered system. The third choice, an NPR-style membership model, was abandoned last fall, two sources explained. The thinking was that it would be too expensive and cumbersome to maintain because subscribers would have to receive privileges (think WNYC tote bags and travel mugs, access to Times events and seminars).
I can see that the thinking about options follows well worn ruts in the online world. The problem is that none of these models will work for a first tier newspaper.
I have written extensively about online for this blog in my Mysteries of Online series. I won’t repeat that information here. Instead I will make a couple of fresh observations and move today’s Sunday New York Times to the recycle bin. I have looked at it already, but I did not find too much to engage me.
Here’s why:
- Take a look at the Financial Times online site at www.ft.com. Now jump to Newssift, the Financial Times’ next generation site. Now navigate to Compete.com and get the usage data for each site. The FT.com site is going nowhere, and it has great content. The Newssift.com site has flat lined. It is not going anywhere ever. Run a query on Newssift and you will see why. I want news, not a logic puzzle. The NYT will be in the same pickle.
- A single title has a very tough time attracting enough traffic to pay the bills. The reason is that mass is needed. The online “mass” has two components. You need lots of content that people really want, and you need eyeballs. I don’t think the NYT can deliver on the “mass”.
- News is one of those information types that has one meaning in the print world and another in the online world. Traditional publishers are not in the real time business. Users of Facebook and Twitter, among other services, are. Governments generate news and don’t charge for it. Some individuals generate news and don’t charge for it. Some trade associations generate news and don’t charge for it. You get the idea. Why pay for the NYT value add which consists of opinions, less than real time turnaround, and the notion that an “editor” knows better than an algorithm? In today’s fast paced world, I trust the algorithm, not traditional newspaper methods.
The NYT demonstrated that it did not understand online when it broke its exclusive with LexisNexis decades ago. The NYT proved it could not sustain a for-fee online business when it started one, run by my friend and former NYT big wig Jeffrey Pemberton. It proved it could not think like a Web company with its dithering about how to leverage the NYT content over the last decade. Now you want me to believe the NYT has it figured out? The goose is addled, not stupid.
What happens when there is no magnetism? Nothing sticks. That is, quite to my dismay, what is happening to newspapers in general and what will happen to the NYT in particular.
Stephen E Arnold, January 18, 2010
Darn, a freebie. I will report this to the Government Printing Office, an outfit familiar with publishing.
Location Aware Search via Lucene / Solr
January 19, 2010
I located an interesting and helpful post, “Location Aware Search with Apache Lucene and Solr,” on IBM’s developerWorks Web site. If you are not familiar with developerWorks, you can get additional information by clicking this link. This is IBM’s resource for developers and IT professionals. If you want to find the article about “location aware Lucene”, you can get a direct link to “Location Aware Search with Apache Lucene and Solr” from the search box at www.ibm.com. That’s a definite plus because the IBM Web site can be tough to navigate.
The write up is quite useful. As with some of the other content on the developerWorks Web site, the author is not an IBM employee. The Lucene / Solr write up is by a member of the technical staff at Lucid Imagination, a company that offers open source builds of Lucene and Solr as well as professional services. (Lucid is interesting because it resells commercial content connectors developed by the Australian company ISYS Search Software.)
The write up is timely and provides quite a bit of detail in its 6,000 words. You get a discussion of key Lucene concepts, geospatial search concepts, information about representing spatial data, a discussion of combining spatial data with text in search, examples, sample code, a how-to for indexing spatial information in Lucene, a review of how to search by location, and a compilation of links to relevant information in other technical documents, interviews with experts, and code, among other pointers.
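The Lucene-specific code is in the article itself, so I won’t reproduce it here. Purely as an illustration of the underlying geospatial idea the write up covers (the standard two step pattern of a cheap bounding box prefilter followed by an exact great circle distance check), here is a small, library free Java sketch. Everything in it, from the class names to the sample places and the 25 kilometer radius, is invented for the example; Lucene and Solr wrap this kind of logic in index structures, filters, and function queries.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: bounding box prefilter plus Haversine distance check.
public class GeoFilterSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    record Place(String name, double lat, double lon) {}

    // Haversine great-circle distance in kilometers.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.asin(Math.sqrt(a));
    }

    static List<Place> withinRadius(List<Place> candidates,
                                    double lat, double lon, double radiusKm) {
        // Step 1: bounding box. One degree of latitude is roughly 111 km; a degree
        // of longitude shrinks toward the poles, hence the cos(latitude) divisor.
        double latDelta = radiusKm / 111.0;
        double lonDelta = radiusKm / (111.0 * Math.cos(Math.toRadians(lat)));
        List<Place> hits = new ArrayList<>();
        for (Place p : candidates) {
            if (Math.abs(p.lat() - lat) > latDelta) continue; // outside the box
            if (Math.abs(p.lon() - lon) > lonDelta) continue; // outside the box
            // Step 2: exact distance, computed only for the survivors.
            if (distanceKm(lat, lon, p.lat(), p.lon()) <= radiusKm) hits.add(p);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Place> places = List.of(
            new Place("Louisville Slugger Museum", 38.2566, -85.7647),
            new Place("Churchill Downs", 38.2030, -85.7711),
            new Place("Mammoth Cave", 37.1870, -86.1010));
        // Everything within 25 km of downtown Louisville.
        for (Place p : withinRadius(places, 38.2527, -85.7585, 25.0)) {
            System.out.println(p.name());
        }
    }
}
```

The bounding box step is the cheap trick: it avoids running the trigonometry for documents that cannot possibly qualify, which is essentially what the index time schemes described in the article are set up to make fast at Lucene scale.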
Several observations:
- The effort that went into this 6,000 word write up is considerable. The work is quite good, and it strikes me as catnip for some IBM-centric developers. IBM is a Lucene user, and I think that IBM and Lucid want to get as many of these developers as possible to use Lucene / Solr. This is a marketing approach comparable to Google’s push to get Android into everything from set top boxes to netbooks.
- The information serves as a teaser for a longer work that will be published under the title of Taming Text. That book should find a ready audience. Based on the data I have seen, many organizations—even those with modest technical resources—are looking at Lucene as a way to get a search system in place without the hassles associated with a full scale search procurement for a commercial system.
- The ecumenical approach taken in the write up is a plus as well. However, in the back of my mind is the constant chant, “Sell consulting, sell consulting, sell consulting”. That’s okay with me because the phrase runs through my addled goose brain every day of the week. But the write up makes clear that there is some heavy lifting required to implement a function such as location aware search using open source software.
The complexity is not unexpected. It does contrast sharply with the approach taken by MarkLogic, an information infrastructure vendor that is making location-type search part of the basic framework. Google, on the other hand, takes a slightly different approach. The company allows a developer to use its APIs to perform a large number of geospatial tricks with little fancy dancing. Microsoft is on the ease of use trail as well.
Some folks who are new to Lucene may find the code a piece of cake. Others might take a look and conclude that Lucene is a recipe that requires Julia Child in the kitchen.
Stephen E Arnold, January 19, 2010
A freebie. An IBM person once gave me an hors d’oeuvre, and a Lucid professional bought me a flavored tea. Other than these high value inducements, I wrote this without filthy lucre’s involvement. I will report this to the National Institutes of Health.
YouTube Terms of Service Updated
January 19, 2010
I have been thinking about Google and rich media. Rich media means multimedia. Multimedia means YouTube. These terms are important because Google uses a wide range of words and phrases to describe its rich media services and capabilities.
On January 14, 2010, Google posted “YouTube’s APIs and Refresher on our Terms of Service”. The write up does a good job of highlighting the major changes. My view of the changes is that they nudge the YouTube service forward to commercial payoff land.
For example, the point “Videos belong to their owners” is a gentle reminder that Google’s innovation of giving content owners a control panel on which to input settings is an important function. The more rules content owners input for a particular content object, the more useful the Google control panel or content owner dashboard becomes in the upload process.
The focus on the YouTube video player is a reminder that consistency is a positive for Google. Google is pointing out that certain actions are not making the Google happy; for example, enabling videos for download.
The third point is that the Google wants ads left alone. Period. Stripping ads is a no no. The person who wants to monetize a video can read the API monetization guide. If you have not looked at this API, it is worth a quick look. You can find the API monetization guide with some helpful links on the Google Code page in the write up “Using the YouTube APIs to Build Monetizable Applications.” We geese at Beyond Search think this is a pretty important chunk of info, by the way.
Finally, Google wants those who do charge for a video to make clear that Google is not charging. My hunch is that Google gets email complaining about fees for some YouTube videos and Google doesn’t have time to handle that type of email. Heck, Google has a tough time handling email for the Nexus One phone. It doesn’t need more email about an issue a content provider causes. Just my opinion, gentle reader.
You may want to add the YouTube API blog to your newsreader if you are into rich media, multimedia, video, or related content types.
Stephen E Arnold, January 19, 2010
A short article I wrote without anyone, including a TV or motion picture company, paying me for the effort. Is the Oscar committee in charge of this type of write up and disclosure? I will report to them to be sure.
ChartSearch: Natural Language Querying for Structured Data
January 19, 2010
On Friday, January 15, 2010, the goslings and I were discussing natural language processing for structured information. Quite a few business intelligence outfits are announcing support for interfaces that eliminate the need for the user to formulate queries. SQL jockeys pay for their hybrid autos because most of the business professionals with whom they work don’t know SELECT from picking a pair of socks out of the drawer. We have looked closely at a number of systems, and each of them offers some nifty features. We heard a rumor about some hot, new Exalead functionality. Our information is fuzzy, so we wish not to speculate.
One of the goslings recalled that a former Web analytics whiz named Chris Modzelewski had developed an NLP interface for structured data. You can check out his approach in the patent documents he has filed. These are available from the crackerjack search system provided by the USPTO. His company, ChartSearch, provides software and services to clients who want to find a way to give a plain vanilla business professional access to data locked in structured data tables and guarded by a business intelligence guru flanked by two Oracle DBAs.
ChartSearch uses a variant of XML and a rules based approach to locating and extracting the needed data. Once the system has been set up, anyone with a knowledge of Google can fire off a query to the system. The output is not a laundry list of results or a table of numbers. The method generates a report. His patent applications describe the chart generator, the search query parser, the indexing methods, the user interface, the data search markup language, and a couple of broader disclosures. If you are not a whiz with patent searching, you can start with US20090144318 and then chase the fence posts down the IP trail.
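For readers who want a feel for what a rules based natural language layer over structured data can look like, here is a toy Java sketch. To be clear, this is not ChartSearch’s method (that is in the patent filings); every rule, table name, and column name below is invented for the example. The point is only the general pattern: a small dictionary maps business phrases to columns and aggregates, and the system assembles the query the user never has to see.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only, not ChartSearch's method. A handful of rules map a
// plain-English question onto a SQL-ish query over a known table. Every rule,
// table name, and column name below is invented for the example.
public class NlpToSqlSketch {
    // Rule table: a phrase the business user might type -> the column it means.
    static final Map<String, String> COLUMN_RULES = new LinkedHashMap<>();
    static {
        COLUMN_RULES.put("revenue", "net_revenue");
        COLUMN_RULES.put("sales", "net_revenue");
        COLUMN_RULES.put("headcount", "employee_count");
    }
    static final Map<String, String> GROUP_RULES = Map.of(
        "by region", "region", "by product", "product_line", "by month", "order_month");

    static String toSql(String question) {
        String q = question.toLowerCase();

        String column = COLUMN_RULES.entrySet().stream()
            .filter(e -> q.contains(e.getKey()))
            .map(Map.Entry::getValue).findFirst().orElse("net_revenue");

        String agg = q.contains("average") || q.contains("mean") ? "AVG" : "SUM";

        String groupBy = GROUP_RULES.entrySet().stream()
            .filter(e -> q.contains(e.getKey()))
            .map(Map.Entry::getValue).findFirst().orElse(null);

        // Pull out a four-digit year if one is mentioned.
        Matcher year = Pattern.compile("\\b(19|20)\\d{2}\\b").matcher(q);
        String where = year.find() ? " WHERE fiscal_year = " + year.group() : "";

        String select = groupBy == null
            ? agg + "(" + column + ")"
            : groupBy + ", " + agg + "(" + column + ")";
        String tail = groupBy == null ? "" : " GROUP BY " + groupBy;
        return "SELECT " + select + " FROM sales_fact" + where + tail + ";";
    }

    public static void main(String[] args) {
        System.out.println(toSql("average revenue by region in 2009"));
        // -> SELECT region, AVG(net_revenue) FROM sales_fact WHERE fiscal_year = 2009 GROUP BY region;
        System.out.println(toSql("total headcount by month"));
        // -> SELECT order_month, SUM(employee_count) FROM sales_fact GROUP BY order_month;
    }
}
```

A real system obviously goes far beyond keyword spotting. The patent documents describe a data search markup language, a query parser, and a chart generator that turns the result set into a report rather than a table of numbers.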
What makes this interesting is that the method has been verticalized; that is, versions of ChartSearch make it easy to handle consumer and survey data, special enterprise requirements, and the needs of companies that “sell” data but lack a user friendly reporting and analytics tool.
The founder is a whiz kid who skipped college and then dived into data analytics. If you are looking for a natural language interface to structured data, ChartSearch might be worth a look.
Stephen E Arnold, January 19, 2010
Nope, a freebie. I don’t even visit New York very often, so I can’t call on ChartSearch and demand a bottle of water. Sigh. I will report this to the New York City Department of Environmental Protection. Water is important.
Exclusive Interview with Ana Athayde of Spotter
January 19, 2010
Search solutions have the attention of some executives who want actionable information, not laundry lists of results. I learned about an information retrieval company that I knew nothing about from Ana Athayde. Ms. Athayde developed Spotter as a consequence of her work in business intelligence for a large international organization. She told me, “Laundry lists are not often helpful to a business person.” I agree.
Spotter is what I would describe as a next-generation content processing company. The firm’s technology combines content acquisition, content processing, and output generation in a form tailored to a business professional. Spotter’s chief technology officer (Olivier Massiot) previously worked at the pioneering content processing company, Datops SA.
In an exclusive interview on January 18, Ana Athayde, the founder of Spotter (based in Paris with offices in several European cities and the US), provides insight into her vision for next-generation information retrieval. She described the approach her firm takes for customers with an information problem this way:
Our clients ask for strategic input on a brand or market; they require more than a general alert and subject monitoring as provided by the services of popular search engines. Spotter clients expect to know more about their customers and what motivates them, learn about their company’s reputation, and about the current risk pervasive in their environment; not simply obtain an internet search-result report. Our clients need deep dive analysis for decision-making, not just a simple dashboard tool and quantitative graphic displays. They want to be able to interpret what it all means and not just receive a simple data-dump. Spotter provides content analysis and leading edge solutions that meet our customers’ analytical needs such as the ability to map and analyze information pertinent to their business environment, so as to gain a strategic business advantage and make new discoveries. Our solutions solve complex problems and deploy these results throughout the enterprise in a form that makes the information easy to use.
A number of companies are providing knowledge management and business intelligence services that output reports. I asked Ms. Athayde, “What’s the Spotter difference?” She said:
I think the key point we try to make clear is our “bundle”; that is, we deliver a solution, not a collection of puzzle pieces. Our ability to capture, monitor and analyze decisions and their impact requires rich, higher order meta data constructs. Many companies such as Autonomy, Microsoft, and Oracle also promise similar services. But once this has been done, the process of information toward decision is not complete. The main competitive advantage of Spotter is to be able to provide to its clients a full decision-making solution which includes, as I mentioned, analytics and our decision management system… Our solution is engineered to link efficiency and quality control throughout the content processing “chain.”
You can read the full interview with Ms. Athayde in the Search Wizards Speak feature on ArnoldIT.com. For more information about Spotter, visit the firm’s Web site at www.spotter.com. Search Wizards Speak provides one of the most comprehensive sets of interviews with search and content processing vendors available. There are now more than 44 full text interviews. The information in these interviews provides a different slant than the third party “translators” who attempt to “interpret” how a search system works and “explain” a particular vendor’s positioning or approach. The Search Wizards Speak series is a free service from ArnoldIT.com that lets you read the full text of interviews with key players in the search and content processing sector. Primary source material is the first place to look if you want facts, not fluff.
Stephen E Arnold, January 19, 2010
Full disclosure. Spotter’s sales manager tried to give me a mouse pad. I refused. As a result, no one paid me anything to chase down Ms. Athayde, interview her, and go through the hoops needed to understand the Spotter system. Because the Spotter team seemed quite Euro-centric, I will report my sad state of non compensated work to the US Department of State, an organization sensitive to the needs, wants, and desires of non US people and entities.