Bing and Censorship

July 20, 2009

Short honk: A reader alerted me to the Bing.com filter that chops out certain content and creates a mini vertical search engine for segmented content. The filter is now applied to X-rated content. You can read about the filter in Network World’s story “Bing Gets Porn Domain to Filter Out Explicit Images and Videos”. There are a number of complicated issues in play. The present solution creates an interesting revenue generating opportunity for Bing.com. Will Microsoft exploit it? I wonder how different this type of filtering is from Amazon’s filtering of certain content.

Stephen Arnold, July 20, 2009

Wall Street Journal Suggests Internet Is Dead

July 19, 2009

The addled goose is not certain if the story “The Internet Is Dead (As An Investment)” will be online without a charge when you click the link. Newspapers fascinate me. Some of their information is free; some is transient; and some is available only for hard cash.

What I find useful to follow are stories that make it clear that certain business sectors are “dead”. At Heathrow on Friday, July 17, 2009, I received a free Daily Telegraph when I bought a nut and granola bar. I did not want a newspaper because my Boingo connection was alive. Even though the Daily Telegraph was a svelte bundle of paper, the news was old. Free “yesterday” was not compelling. The argument in James Altucher’s wealth column is that utilities like electricity and the Internet are linked in this way:

Electricity greatly improved our quality of life. But I’m not going to get excited about buying a basket of utility companies. Same for the Internet. Can’t live without it, but can’t live with it (in my portfolio).

I recall reading a business monograph, The Mind of the Strategist: The Art of Japanese Business by Kenichi Ohmae. Now more than two decades old, the book contains a case analysis of the bulk chemical business that stuck with me. I wonder if that discussion of an uninteresting, commodity business holds some truths for Mr. Altucher and for newspapers thinking along the same lines as the Wall Street Journal. The Daily Telegraph may benefit as well. There were many discarded Telegraphs in the lounge at Heathrow. Online economics requires a recalibration of some business yardsticks. Is Internet investment dead like the company that hit the jackpot with bulk chemicals? Glittering generalities are useful, but they may reveal more about the beliefs of a newspaper’s editorial team than about the opportunities utilities and commodities represent.

Stephen Arnold, July 19, 2009

Digital Revision and Online

July 18, 2009

Amazon has whipped up a cloud computing thunderstorm. You can tackle this story by entering the word “Kindle” in almost any news search system. One interesting post is MG Siegler’s article for TechCrunch, “Amazon, Why Don’t You Come in Our Houses and Burn Our Books Too?” For me, the key passage was:

This remote deletion issue is an increasingly interesting one. Last year, Apple CEO Steve Jobs confirmed that the company has a remote “kill switch” to remove apps from your device if it thinks that is necessary. To the best of my knowledge, they have yet to use such functionality, and would only do so if there was a malicious app out there that was actually causing harm to iPhones. They have not even used it to kill some poor taste apps that were quickly removed from the App Store, like Baby Shaker.

The addled goose wants to remain above the fray. The core issues from his perspective are different. For instance, as online services roll up via buyouts and the failure of weaker competitive services, a “natural monopoly” emerges. One can see this in the 1980s in the growth of Dialog Information Services and LexisNexis as the big dogs in online search. Over time, options emerged and now there are a handful of “go to” services. As these big dogs respond to challenges and issues, the Amazon deletion event becomes more visible. In my opinion, what’s at work is an organization that makes a situational decision and then discovers that its “natural monopoly position” creates unanticipated problems. The ability of some online services to make informed decisions increases after an event such as deleting information. The deletion may be nothing more than the removal of a pointer to an object. Metadata and its persistence are more important in some cases than the digital content itself.
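To make the pointer idea concrete, here is a minimal sketch. It is entirely my own illustration, not a description of Amazon’s system, and every identifier in it is invented. “Deleting” a listing can mean removing only the catalog pointer; the stored object, and the metadata recording the act, persist:

    # Hypothetical sketch (mine, not Amazon's): "deleting" a title removes
    # the catalog pointer while the stored object and the metadata about
    # the deletion live on.

    catalog = {"B000EXAMPLE": {"title": "Example Title", "blob_id": "obj-4451"}}
    blob_store = {"obj-4451": b"...full text of the work..."}
    audit_log = []

    def remove_listing(asin):
        """Drop the customer-facing pointer; the object itself is untouched."""
        entry = catalog.pop(asin, None)
        if entry is not None:
            audit_log.append(("removed", asin, entry["blob_id"]))
        return entry

    remove_listing("B000EXAMPLE")
    print(catalog)                    # {} -- the pointer is gone
    print("obj-4451" in blob_store)   # True -- the content still exists
    print(audit_log)                  # metadata about the act persists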

The second issue is the increasing awareness users and customers have about utility type services. The customer sees the utility as benign, maybe making decisions in favor of the individual user. The Kindle deletion scenario makes clear that paying customers are not the concern of the organization. I know that the ripples from the deletion of content will not subside quickly. A single firm’s decision becomes a policy issue that is broader than the company’s business decision.

Now shift gears from digital objects that one can find on such sites as Project Gutenberg in Australia to other content. When online services consolidate, digital revisionism seems likely to become more widespread. Policy decisions in commercial entities pivot on money. The policy, therefore, does not consider the individual user.

I know that most government agencies don’t worry about me, paddling around my duck pond. The impact of a decision taken by an online organization seems to send shock waves that may not be on the radar of corporate executives.

The issue, in my opinion, is the blurring of a commercial entity’s decision made for its benefit with broader public policy issues. What happens when an online service becomes a virtual monopoly? Who will regulate the entity? Legal eagles will flock to this issue, but digital revisionism is not new. It gains importance as more people rely on a commercial entity to deliver a utility service.

Stephen Arnold, July 18, 2009

InQuira IBM Knowledge Assessment

July 18, 2009

A happy quack to the ArnoldIT.com goose who forwarded the InQuira Knowledge Assessment Tool link from one of my test email addresses. InQuira, a company formed from two other firms in the content processing space, has morphed into a knowledge company. The firm’s natural language processing technology is under the hood, but the packaging has shifted to customer support and other sectors where search is an enabler, not the electromagnet.

The survey is designed to obtain information about my knowledge quotient. The URL for the survey is http://myknowledgeiq.com; you can try the survey assessment here. The only hitch in the git-along is that the service seems to be timing out. The system came back to life after a two-minute delay. My impressions as I worked through this Knowledge IQ test appear below:

Impressions as I Take the Test

InQuira uses some interesting nomenclature. For example, the company asks about customer service and a “centralized knowledge repository”. The choices include this filtering response:

Yes, individuals have personal knowledge repositories (e.g., email threads, folders, network shared drives), but there isn’t a shared repository.

I clicked this choice because distributed content seems to be the norm in my experience. Another interesting question concerns industry best practices. The implicit assumption is that a best practice exists. The survey probes for an indication of who creates content and who maintains the content once created. My hunch at this point in the Knowledge IQ test is that most respondents won’t have much of a system in place.

I think I see that I will have a low Knowledge IQ because I am selecting what appear to me to be reasonable responses, no extremes or categoricals like “none” or “all”. I note that some questions have default selections already checked. Ideal for the curious survey taker who wants to get to the “final” report.

About midway through I get a question about the effectiveness of the test taker’s Web site. In my experience, most organizations offer so-so Web sites. I will go with a middle-of-the-road assessment. I am now getting tired of the Knowledge IQ test. I just answered questions about customer feedback opportunities. My experience suggests that most companies “say” feedback is desirable. Acting on the feedback is often a tertiary concern, maybe of even lower priority.

My Report

The system is now generating my report. Here’s what I learned: my answers appear to put me in the middle of the radar chart. I have a blue diagram which gives me a personal Knowledge IQ.

[Image: InQuira Knowledge IQ report]


Kapow Technologies

July 17, 2009

With the rise of free real time search systems such as Scoopler, Connecta, and ITPints, established players may find themselves in the shadows. Most of the industrial strength real time content processing companies like Connotate and Relegence prefer to stay out of the spotlight. The reason is that their customers are often publicity shy. When you are monitoring information to make a billion on Wall Street or to snag some bad guys before those folks can create a disruption, you want to be far from the Twitters.

A news release came to me about an outfit called Kapow Technologies. The company described itself this way:

Kapow Technologies provides Fortune 1000 companies with industry-leading technology for accessing, enriching, and serving real-time enterprise and public Web data. The company’s flagship Kapow Web Data Server powers solutions in Web and business intelligence, portal generation, SOA/WOA enablement, and CMS content migration. The visual programming and integrated development environment (IDE) technology enables business and technical decision-makers to create innovative business applications with no coding required. Kapow currently has more than 300 customers, including AT&T, Wells Fargo, Intel, DHL, Vodafone and Audi. The company is headquartered in Palo Alto, Calif. with additional offices in Denmark, Germany and the U.K.

I navigated to the company’s Web site out of curiosity and learned several interesting factoids:

First, the company is a “market leader” in open source intelligence. It has technology to create Web crawling “robots”. The technology can, according to the company, “deliver new Web data sources from inside and outside the agency that can’t be reached with traditional BI and ETL tools.” More information is here. Kapow’s system can perform screen scraping; that is, extracting information from a Web page via software robots.
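Kapow packages this as a visual, no-coding product, so the sketch below is only my own toy illustration of the underlying screen-scraping move, not Kapow’s technology. The URL and the extraction pattern are placeholders:

    # A bare-bones screen scraper: fetch a page, then pull data out of the
    # raw HTML. Kapow's "robots" do this visually and at industrial scale;
    # this only sketches the basic idea.

    import re
    import urllib.request

    def scrape_links(url):
        html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        # Naive extraction: every anchor's href and link text.
        return re.findall(r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>', html, re.S)

    for href, text in scrape_links("http://example.com/"):
        print(href, "->", text.strip())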

Second, the company offers what it calls a “portal generation” product. The idea is to build new portals or portlets without coding. The company said:

With Kapow’s technology, IT developers [can]: Avoid the burden of managing different security domains; eliminate the need to code new transaction; and bypass the need to create or access SOA interfaces, event-based bus architectures or proprietary application APIs.

Third, the company provides a system that handles content migration and transformation. With transformation an expensive line item in the information technology budget, managing these costs becomes more important each month in today’s economic environment. Kapow says here:

The module [shown below] acts much as an ETL tool, but performs the entire data extraction and transformation at the web GUI level. Kapow can load content directly into a destination application or into standard XML files for import by standard content importing tools. Therefore, any content can be migrated and synchronized to and between any web based CMS, CRM, Project Management or ERP system.

[Diagram: Kapow content migration module]
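The extract-and-serialize pattern the quote describes can be sketched in a few lines. This is my own illustration, with invented field and tag names, of emitting standard XML that a downstream CMS importer could consume; it is not Kapow’s code:

    # Sketch of the migration pattern described above: serialize extracted
    # records as standard XML for a target CMS importer. Field and tag
    # names are illustrative only.

    import xml.etree.ElementTree as ET

    def records_to_xml(records):
        root = ET.Element("content")
        for rec in records:
            item = ET.SubElement(root, "item")
            for field, value in rec.items():
                ET.SubElement(item, field).text = str(value)
        return ET.tostring(root, encoding="unicode")

    extracted = [{"title": "Press release", "body": "Widget 2.0 ships",
                  "date": "2009-07-01"}]
    print(records_to_xml(extracted))
    # <content><item><title>Press release</title>...</item></content>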

Kapow offers connections for a number of widely used content management systems, including Interwoven, Documentum, Vignette, and Oracle Stellent, among others.

Kapow includes a search function along with application programming interfaces, and a range of tools and utilities, including RoboSuite (a block diagram appears below):

[Block diagram: Kapow RoboSuite]

Source: http://abss2.fiit.stuba.sk/TeamProject/2006/team05/doc/KapowTech.ppt


Big Data, Big Implications for Microsoft

July 17, 2009

In March 2009, my Overflight service picked up a brief post on the Google Research Web log called “The Unreasonable Effectiveness of Data.” The item mentioned that three Google wizards wrote an article of the same title in the IEEE Intelligent Systems journal. You may be able to download a copy from this link.

On the surface this is a rehash of Google’s big data argument. The idea is that when you process large amounts of data with a zippy system using statistical and other mathematical methods, you get pretty good information. In a very simple way, you know what the odds are that something is in bounds or out of bounds, right or wrong, even good or bad. Murky human methods like judgment are useful, but with big data, you can get close to human judgment and be “right” most of the time.
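A toy example makes the argument concrete. This sketch is mine, not the authors’: corpus frequency, not a grammar rule, decides which candidate is “right.” Replace the one-line corpus with Web-scale data and the same trick underpins spelling correction and statistical translation:

    # Toy version of the big data argument: with enough observed text, raw
    # counts stand in for judgment. The corpus here is absurdly small on
    # purpose; the method, not the data, is the point.

    from collections import Counter

    corpus = ("the effectiveness of data is unreasonable "
              "and the data keeps winning").split()
    counts = Counter(corpus)

    def pick(candidates):
        """Return the candidate the corpus has seen most often."""
        return max(candidates, key=lambda word: counts[word])

    print(pick(["data", "dada"]))   # 'data' -- two sightings beat zero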

When you read the IEEE write up, you will want to pay attention to the names of the three authors. These are not just smart guys, these are individuals who are having an impact on Google’s leapfrog technologies. There’s lots of talk about Bing.com and its semantic technology. These three Googlers are into semantics and quite a bit more. The names:

  • Alon Halevy, a former Bell Labs researcher and the thinker answering, to some degree, the question “What’s after relational databases?”
  • Peter Norvig, the fellow who co-wrote the standard textbook on computational intelligence and smart software
  • Fernando Pereira, the former chair of Penn’s computer science department and the Andrew and Debra Rachleff Professor.

So what do these three Googlers offer in their five-page “expert opinion” essay?

First, large data makes smart software smart. This is a reference to the Google approach to computational intelligence.

Second, big data can learn from rare events. Small data and human rules are not going to deliver the precision that one gets from algorithms and big data flows. In short, costs for getting software and systems smarter will not spiral out of control.

Third, the Semantic Web is a non-starter, so another method – semantic interpretation – may hold promise. By implication, if semantic interpretation works, Google gets better search results plus other benefits for users.

Conclusion: dataspaces.

So Google is up front and clear when explaining what its researchers are doing to improve search and other knowledge centric operations. What are the implications for Microsoft? Simple. The big data approach is not used in the Powerset method applied to Bing, in my opinion. Therefore, Microsoft has a cost control issue to resolve with its present approach to Web search. Just my opinion. Your mileage may vary.

Stephen Arnold, July 17, 2009

Bozeman’s Hot Idea

July 16, 2009

I have had several conversations with individuals who have had, in the course of their working lives, some connection with law enforcement or military intelligence. What I learned was that the Bozeman idea has traction. The “Bozeman idea” is the requirement that applicants for city jobs provide their social networking details, including log-in credentials for social networking services, as part of the job application process.

According to the Montana News Station’s “Bozeman City Job Requirement Raises Privacy Concerns”,

The requirement is included on a waiver statement applicants must sign, giving the City permission to conduct an investigation into the person’s “background, references, character, past employment, education, credit history, criminal or police records.” “Please list any and all, current personal or business websites, web pages or memberships on any Internet-based chat rooms, social clubs or forums, to include, but not limited to: Facebook, Google, Yahoo, YouTube.com, MySpace, etc.,” the City form states. There are then three lines where applicants can list the Web sites, their user names and log-in information and their passwords.

What I have now learned is that a number of European entities are discussing the Bozeman idea. Early word – unofficial, of course – is that Bozeman has had a Eureka! moment. Monitoring is much easier if one can log in and configure the system to push information to the interested party.

I am on the fence with regard to this matter. Interesting issue.

Stephen Arnold, July 16, 2009

Software Robots Determine Content Quality

July 15, 2009

ZDNet ran an interesting article by Tom Steinert-Threlkeld about software taking over human editorial judgment. “Quality Scores for Web Content: How Numbers Will Create a Beautiful Cycle of Greatness for Us All” is worth tucking into one’s folder for future reference.

Some background. Mr. Steinert-Threlkeld notes that the hook for his story is a fellow named Patrick Keane, who worked at the Google for several years. What’s not included in Mr. Steinert-Threlkeld’s write up is that Google has been working on “quality scores” for many years. You can get references to specific patent and technical documents in my Google monographs. I just wanted to point out that the notion of letting software methods do the work that arbiters of taste have been doing is not a new idea.

The core of the ZDNet story was:

Keane is at work on figuring out what will constitute a Quality Score, for every article, podcast, Webcast or other piece of output generated by an Associated Content contributor. If his 21st Century content production and distribution network can figure out how to put a useful rank on what it puts out on the Web then it can raise it up, notch by notch. This scoring comes right back to the Page Rank process that is at the heart of Google’s success as a search engine. “The great thing about Page Rank in Google’s algorithm is … seeing the Web as a big popularity contest,” said Keane, in Associated Content’s offices on Ninth Avenue in Manhattan.

Mr. Steinert-Threlkeld does a good job of explaining how the method at Mr. Keane’s company (Associated Content) will approach the scoring issue.
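PageRank itself is public (Brin and Page described it in 1998), so the “popularity contest” mechanics can be sketched as a short power iteration. The code below is a toy version of that published algorithm, not Associated Content’s undisclosed quality score:

    # Minimal PageRank power iteration -- the "popularity contest" Mr.
    # Keane alludes to. Each page's score is redistributed along its
    # outlinks, plus a small uniform teleport term.

    def pagerank(links, damping=0.85, iters=50):
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iters):
            new = {p: (1 - damping) / len(pages) for p in pages}
            for page, outlinks in links.items():
                for target in outlinks:
                    new[target] += damping * rank[page] / len(outlinks)
            rank = new
        return rank

    web = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
    print(pagerank(web))   # 'a' collects the most link "votes"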

My thoughts, before I forget them, are:

  • Digging into what Google has disclosed about its scoring systems and methods is probably a useful exercise for those covering Google and the businesses in which former Googlers find themselves. The key point is that the Google is leaning more heavily on smart software and less on humans. The implication of this decision is that as content flows go up, Google’s costs will rise less quickly than those of outfits such as Associated Content. Costs are the name of the game in my opinion.
  • Former Googlers are going to find themselves playing in interesting jungle gyms. The insights about information will create what I call “Cuil situations”; that is, how far from the Googzilla nest will a Xoogler stray? My hunch is that Associated Content may find itself surfing on Google because Associated Content will not have the plumbing that the Google possesses.
  • Dependent services, by definition, will be subordinate to the core provider. Xooglers may be capping the uplift of their new employers who will find themselves looking at short term benefits, not the long term implications of certain methods.

I think Associated Content will be an interesting company to watch.

Stephen Arnold, July 15, 2009

Semantic Search Revealed

July 14, 2009

I read “Semantic Search Round Table at the Semantic Technology Conference” on the ZDNet Web logs. Paul Miller, the author of the write up, did a good job, including snippets from the participants in the round table. In order to get a sense of the companies, the topics covered, and the nuances of the session, please read the original. I want to highlight three points that jumped out at me:

First, I saw that there was a lot of talk about semantics, but I did not come away from the participants’ comments with a sense that a single definition was in play. Two quick examples:

  • One participant said, ‘It means different things’. Okay, but once again we have “wizards” talking about search in general and semantic search in particular, and I am forced to deal with ambiguity. “Different things” means absolutely zero to me. True, I am an addled goose, but my warning lights started flashing.
  • The Googler (artificial intelligence guru Dr. Peter Norvig) put my feathers back in place. He is quoted as saying, ‘Different types of answers are appropriate for different types of questions…’. That’s okay, but I think that definition should have been the operating foundation for the entire session.

Second, the wrap up of the article focused on Bing.com. Now Bing incorporates Powerset, according to what I have read. But Bing.com is a variation on the types of search results that have been available from such companies as Endeca for a while and from newcomers like Kosmix. The point I wanted to have addressed is what specific use is being made of semantics in each of the search and content processing systems represented in the round table discussion. Unreasonable? Sure, but facts are better than generalities and faux politesse.

Finally, I did not learn much about search. Nothing unusual in that. Innovation was not what the participants were conveying in their comments.

Bottom line: great write up, disappointing information.

Stephen Arnold, July 14, 2009

Oracle, Publishing, and XSQL

July 14, 2009

I am a big fan of the MarkLogic technology. A reader told me that I should not be such a fan boy. That’s a fair point, but the reader has not worked on the same engagements I have. As a result, the reader has zero clue about how the MarkLogic technology can resolve some of the fundamental information management, access, and repurposing issues that some organizations face. I am all for making parental type suggestions. I give them to my dog. They don’t work because the dog does not share my context.

The same reader who wanted me to be less supportive of MarkLogic urged me to dig into Oracle’s capabilities in Oracle XSQL, which I know something about because XSQL has been around longer than MarkLogic has.

Now Oracle is a lot like IBM. The company is under pressure because its core business lights up the radar of its licensees’ chief financial officer every time an invoice arrives. Oracle is in the software, consulting, open source, and hardware business. Sure, Oracle may not want to make SPARC chips, but until those units of Sun Micro are dumped, Oracle is a hardware outfit. Like I said, “Like IBM.”

MarkLogic has been growing rapidly. The last time I talked with MarkLogic’s tech team, it was clear to me that the company was thriving. New hires, new clients, and new technologies—these added to the buzz about the company. Then MarkLogic nailed another round of financing to fuel its growth. Positive signs.

Oracle cannot sit on its hands and watch a company that is just up Highway 101 expand into a data management sector right under Oracle’s nose. Enter Oracle XSQL, which is Oracle’s answer to MarkLogic Server.

The first document I examined was “XSQL Pages Publishing Framework” from the Oracle9i XML Developer’s Kits Guide. I printed out my copy, but you can locate an online instance on the Oracle West download site. I am not sure if you will have to register. Parts of Oracle recognize me; other parts want me to set up a new account. Go figure. Also, Oracle has published a book about XSQL, and you can learn more about that from eBooksLab.com. You can also snag a Wiley book on the subject: Oracle XSQL: Combining SQL, Oracle Text, XSLT, and Java to Publish Dynamic Web Content (2003). A Google preview is available as well. (I find this possibly ironic because I think Wiley is a MarkLogic licensee, but I might be wrong about that.)
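For readers who have not seen it, an XSQL page is an XML template whose query actions the framework expands into an XML result set, typically styled with XSLT. Here is a rough Python analogue of that SQL-to-XML step; sqlite3 stands in for Oracle purely for the sketch, and the table and data are invented:

    # Rough analogue of what an XSQL page does: run a query and emit the
    # result set as XML (Oracle's canonical ROWSET/ROW shape) for XSLT to
    # style downstream.

    import sqlite3
    import xml.etree.ElementTree as ET

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO articles VALUES (1, 'XSQL and Publishing')")

    rowset = ET.Element("ROWSET")
    for row_id, title in conn.execute("SELECT id, title FROM articles"):
        row = ET.SubElement(rowset, "ROW")
        ET.SubElement(row, "ID").text = str(row_id)
        ET.SubElement(row, "TITLE").text = title

    print(ET.tostring(rowset, encoding="unicode"))
    # An XSLT stylesheet would then turn this XML into the finished page.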

Oracle has an Oracle BI Publisher Web log that provides information about the use of XSQL. The most recent post I located was a June 11, 2009, write up but the link pointed to “Crystal Fallout” dated May 22, 2009. Scroll to the bottom of this page because the results are listed in chronological order, with the most recent write up at the bottom of the stack. The first article, dated May 3, 2006, is interesting. “It’s Here: XML Publisher Enterprise Is Released” by Tim Dexter provides a run down of the features of this XSQL product. A download link is provided, but it points to a registration process. I terminated the process because I wasn’t that interested in having an Oracle rep call me.

I found “BI Publisher Enterprise 10.1.3.2. Comes Out of Hiding” interesting as well. The notion that an Oracle product cannot be found underscores another aspect of Oracle’s messaging. From surprising chronological order to hiding a key product, Oracle XSQL seems to be on the sidelines in my opinion.

An August 31, 2007 post “A Brief History of BIP” surprised me. The enterprise publishing project was not a main development effort. It evolved out of frustration with circa 2007 Oracle tools. Mr. Dexter wrote:

Three years later and the tool has come a long way … we still have a long way to go of course. But you’ll find it in EBS, PeopleSoft, JDE, BIEE as a standalone product, integrated with APEX and maybe even bundled with the database one day – its a fun ride, exhausting but fun.

This statement, if accurate, pegs one part of XSQL in 2004. (I apologize that the links point to the long list of postings, but Oracle’s system apparently cannot link to a single Web log post on a separate Web page. Annoying, I know. MarkLogic’s system provides such fine grain control with a mouse click, gentle reader.)

When we hit 2009, posts begin to taper off. A new release—10.1.3.3.3—was announced in May 2008. The interesting posts described the method of tapping into External Data Engines (Part 1, May 13, 2008, and Part 2, May 15, 2008).

[Diagram: BI Publisher external data engine flow]

The flow seems somewhat non-intuitive to me, even after reading two detailed Web log posts.

An iPhone version of Publisher became available on July 17, 2008.

In August 2008, Version 10.1.3.4 was released. The principal features, as I understand them, were:

  • Integration with Oracle Enterprise Performance Management Workspace
  • Integration with Oracle “Smart Space”
  • Support for multidimensional data sources, including Hyperion Essbase, SQL Server, and SAP Business Information Warehouse (!)
  • Usability and operation enhancements which seem to eliminate the need to write scripts for routine functions
  • Support for triggers
  • Enhanced Web services support
  • A Word template builder
  • Support for BEA WebLogic, JBoss, and Mac OS X.

Another release came out in April 2009. This one was 10.1.3.4.1 and focused on enhancements. When I scanned the list of changes, most of these modifications looked like bug fixes to me. In April 2009, Tim Dexter explained a migration gotcha. I read this as a pretty big glitch in one Oracle service integrating with another Oracle service.

Stepping back, I am left with the impression that XSQL and this product are not the mainstream interest of “big” Oracle. In fact, if I had to decide between Oracle’s XSQL and MarkLogic Server, I would not hesitate in selecting MarkLogic’s solution for these reasons:

  1. MarkLogic has one mission: facilitate content and information management. The company is not running an XQuery side show. The company runs an XQuery main event.
  2. The MarkLogic server generates pages that make it easy to produce crunchy content. The Oracle system produces big chunks of content that are difficult to access and print out. Manual copying and pasting is necessary to extract information from the referenced Web log.
  3. The search function in MarkLogic works. Search in Oracle is slow and returns unpredictable results. I encountered this problem when trying to figure out whether “search” means “Ultra Search” or “SES”.

So, I appreciate the feedback about my enthusiasm for MarkLogic. I think my judgment is sound. Go with an outfit that does something well, not something that is a sideline.

Stephen Arnold, July 14, 2009
