Google and Content Processing

October 12, 2009

I find the buzz about Google’s upgrades to its existing services and the chatter about Google Books interesting but not substantive. My interest is hooked when Google provides a glimpse of what its researchers are investigating. I had a conversation last week that pivoted on the question, “Why would anyone care what researchers or graduate students working with Google do?” The question is a good one, and it illustrates how angle of view determines what is or is not important. The media find Google Books fascinating. The Web log authors focus on incremental jumps in Google’s publicly accessible functions. I look for deeper, tectonic clues about this trans-national, next generation company. I sometimes get lonely out on my frontier of research and analysis, but, as I said, perspective is important.

That’s why I want to highlight a dense, turgid, and opaque patent application with the fetching title “Method and System for Processing Published Content on the Internet”. The document was published on October 8, 2009, by the ever efficient USPTO. The application was filed on June 9, 2009, but its technology drags like an earthworm through a number of previous Google filings from 2004 and more recent disclosures such as the control panel for a content owner’s administration of distribution and charge back for content. As an isolated invention, the application is little more than another run at the well understood world of RSS feeds. The problem Google’s application resolves is inserting ads into RSS content without creating “unintended alerts”. When one puts the invention in a broader context, the system and method of the invention is more flexible and has a number of interesting applications. These are revealed in the claims section of the patent application.

Keep in mind that I am not a legal eagle. I am an addled goose. Nevertheless, what I found suggestive is that the system and method hooks into my analysis of Google’s semantic functions, its data management systems, and, of course, the guts of the Google computational platform itself for scale, performance, and access to other Google services. In short, this is a nifty little invention. The component that caught my attention is the controls made available to publishers. The idea is that a person with a Web log can “steer” or “control” some of the Google functions. The notion of an “augmented” feed in the context of advertising speaks to me of Google’s willingness to allow a content producer to use the Google system like a giant information facility. Everything is under one roof and the content producer can derive revenue by using this facility like a combination production, distribution, and monetization facility. In short, the invention builds out the “digital Gutenberg” aspect of the Google platform.

Here’s how Google explains this invention:

The invention is a method for processing content published on-line so as to identify each item in a unique manner. The invention includes software that receives and reads an RSS feed from a publisher. The software then identifies each item of content in the feed and creates a unique identifier for each item. Each item then has third party content or advertisements associated with the item based on the unique identifier. The entire feed is then stored and, when appropriate, updated. The publisher then receives the augmented feed which contains permanent associations between the third party advertising content and the items in the feed so that as the feed is modified or extended, the permanent relationships between the third party content and previously existing feed items are retained and readers of the publisher’s feed do not receive a false indication of new content each time the third party advertising content is rotated on an item.
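The core mechanism is simple enough to sketch. Below is a minimal illustration of the idea as I read it, using an invented, simplified feed model (the field names and helper functions are my own, not Google’s): derive each item’s identifier from its canonical fields only, so rotating the attached advertisement never changes the identifier, and feed readers are not falsely alerted to “new” content.

```python
import hashlib

def item_id(title: str, link: str) -> str:
    """Stable identifier derived from the item's canonical fields.

    Because the hash ignores the advertising payload, rotating the ad
    never changes the identifier, so readers do not see a false "new
    item" signal each time the third party content changes.
    """
    return hashlib.sha256(f"{title}\n{link}".encode("utf-8")).hexdigest()

def augment_feed(items, ads_for):
    """Attach third party content to each item, keyed to its permanent ID."""
    augmented = []
    for item in items:
        uid = item_id(item["title"], item["link"])
        augmented.append({**item, "guid": uid, "ad": ads_for(uid)})
    return augmented

# Rotating the ad leaves the identifier untouched.
feed = [{"title": "Post 1", "link": "http://example.com/1"}]
first = augment_feed(feed, lambda uid: "ad-A")
second = augment_feed(feed, lambda uid: "ad-B")
assert first[0]["guid"] == second[0]["guid"]
assert first[0]["ad"] != second[0]["ad"]
```

The interesting part, of course, is not the hashing; it is doing this at Google scale with permanent associations stored and updated centrally.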

The claims wander into the notion of a unique identifier for content objects, item augmentation, and other administrative operations that have considerable utility when applied at scale within the context of other Google services such as the programmable search engine. This is a lot more interesting than a tweak to an existing Google service. Plumbing is a foundation, but it is important in my opinion.

Stephen Arnold, October 12, 2009

History of Social Media

October 9, 2009

I find the social media “revolution” a combination of old and new. The guts of the technology have been exposed for years. Some of the newest applications take advantage of mash up methods and bandwidth to create quite interesting online services. In my next Information World Review column I write about an innovation from Georgia Tech. The system displays real time data from devices such as traffic cameras. Writing the column forced me to do a quick review of the history of social media. I located a useful article that some readers may want to read. “Major Advances in Social Networking” provides a helpful summary of important milestones in this sector of content creation and processing. I found the selection of examples and the categories useful; for example, Lifestreaming. I did not agree with everything in the article, but I found it helpful in looking at the sweep of the social media innovation machinery.

Stephen Arnold, October 9, 2009

Comments about Google and Content Preservation

October 8, 2009

The search pundits are chasing the Google press conference. The addled goose flapped right over the media event and spent time with “Google’s Abandoned Library of 700 Million Titles”. The article, which appeared in Wired Magazine, tackles the history of the Deja.com usenet archive. The article is interesting for three reasons:

  1. The content is “ancient ruins”; that is, not in good shape
  2. Access is problematic because the search function is, according to Wired, “extremely poor”
  3. There has not been much attention focused on this content collection.

I access Google Groups content occasionally. My personal experience is that I can find useful information on certain topics. For me, the most interesting comment in the Wired article was:

In the end, then, the rusting shell of Google Groups is a reminder that Google is an advertising company — not a modern-day Library of Alexandria.

Not as affirming as the news flowing from the Google media event, but I found the Wired article suggestive.

Stephen Arnold, October 8, 2009

Intelligenx Profiled in CIO

October 2, 2009

A happy quack to the Intelligenx team.  The write up in the Spanish language CIO was a PR coup for this Washington, DC area company. You can read the story “La Base de Datos no es el Futuro de los Datos” in Spanish here or in English via Google Translate. Intelligenx delivers blistering performance. The profile said:

A very important Latin American bank called us because it had a latent security threat: indexing a full day of its logs took 11 hours, using a server with 4 processors and 4 GB of RAM. We took the data, put it on a notebook with 2 GB of RAM, and indexed everything in 20 minutes. You can imagine that it is not possible to secure a system when it takes 11 hours to find out what is happening in the logs. A similar case involved a telecommunications company that needed to keep call records for 30 days; those records totaled 30 billion entries. When the company received a court order to locate a specific item in its database, the search took more than 24 hours, and it received more than 30 such court orders a month… Another interesting case, in which search capability converges with our product’s interoperability features, occurred at Brazil’s Ministry of Justice: with five regions and hundreds of courts running different platforms and systems, consulting case law was an impossible task. With our product we built an interoperability layer that adapts to each and every court’s platform and makes any document available in no more than 150 milliseconds.

A flap of the wings to Zubair and Iqbal Talib.

Stephen Arnold, October 2, 2009

XML May Get Marginalized

September 29, 2009

I found the write up by Jack Vaughan interesting and thought provoking. XML (a wonderful acronym for Extensible Markup Language), a child of CALS and SGML, two fine parents in my opinion, may have its salad days behind it. You can read “XML on the Wane? Say It Isn’t So, Jack” and make up your own mind. Let’s assume that XML is a bum and no longer the lunch guest of big name executives. What happens? First, the Google methods are what I would call “quasi XML”; that is, XML in but Googley once processed by the firm’s proprietary recipes. My view is that Google gets an advantage because its internal data management methods, disclosed to some extent in its open source technical documents, remains above the fray. Second, if XML goes the way of the dodo, then the outfit with the optimal transformation tools can act like one of those infomercial slicers and dicers—for a fee, of course. Finally, the publishers who have invested in XML face yet another expense. More costs will probably thin the herd. In a quest for more revenue, XML junkies may be forced to boost their prices which will further narrow their customer base. In short, if XML gets the bum’s rush, Google may get a boost and others get a dent in the pocketbook.
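The “slicer and dicer” transformation play is easy to picture. Here is a toy illustration, a few lines of standard library Python that flatten a small XML record into JSON; this shows the genre of tool I have in mind, not any vendor’s actual product, and the sample record is invented.

```python
import json
import xml.etree.ElementTree as ET

# Toy transformation: flatten a small XML record into a JSON object.
xml_doc = """
<article id="a1">
  <title>XML on the Wane?</title>
  <author>Jack Vaughan</author>
</article>
"""

root = ET.fromstring(xml_doc)
record = {"id": root.get("id")}
# Each child element becomes a key/value pair in the flat record.
record.update({child.tag: child.text for child in root})

print(json.dumps(record))
# {"id": "a1", "title": "XML on the Wane?", "author": "Jack Vaughan"}
```

The toy version is trivial; the money is in doing this reliably across millions of inconsistent documents, which is exactly where the transformation vendors would charge their fee.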

Stephen Arnold, September 29, 2009

Yebol Web Search: Semantics, Facets, and More

September 28, 2009

“Do We Really Need Another Search Engine?” is an article about Yebol. Yebol is another search engine. The write up included this description of the new system:

According to its developers, “Yebol utilizes a combination of patented algorithms paired with human knowledge to build a Web directory for each query and each user.  Instead of the common ‘listing’ of Web search queries, Yebol automatically clusters and categorizes search terms, Web sites, pages and contents.” What this actually means is that Yebol uses a combination of methods – web crawlers and algorithms combined with human intelligence – to produce a “homepage” for each and every search query. For example, search Bell Canada in Yebol and, instead of a Google-style listing of results, you’re presented with a “homepage” that provides details about Bell’s various enterprises, executives, competitors as well as a host of other information including recent Tweets that mention Bell.

The site at http://www.yebol.com includes the phrase “knowledge based smart search.” I ran a query for Google and received a wealth of information: links, facets, hot links to Google Maps, etc.

[screenshot: Yebol results for the query “Google”]

My search for dataspace, on the other hand, was not particularly useful. I anticipate that the service will become more robust in the months ahead.

The PC World write up about Yebol said:

At launch, Yebol can provide categorized results for more than 10 million search terms. According to the company it intends to provide results for ‘every conceivable search term’ in the next three to six months.

The founder, Hongfeng Yin, was a senior researcher on the Yahoo! Data Mining Research team, where he built the core behavioral targeting technologies and products that generate revenue in the hundreds of millions of dollars. Prior to Yahoo, he was a software manager and senior staff software engineer with KLA-Tencor. He worked for several years on noetic sciences and theories of human thought with professors Dai Ruwei and Tsien Hsue-shen (Qian Xuesen) at the Chinese Academy of Sciences. He has a PhD in Computer Science from Concordia University, Canada, and a Master’s degree from Huazhong University of Science and Technology, China. Hongfeng holds multiple patents on search engines, behavioral targeting, and contextual targeting.

The Yebol launch news release is here. The challenge will be to deliver a useful service without running out of cash. The use of patented algorithms is a positive. Combining these recipes with human knowledge can be tricky and potentially expensive.

Stephen Arnold, September 28, 2009

Consultant Temp Omits Context for ATT and Google FCC Dust Up

September 28, 2009

I thought ATT was miffed because Google Voice can block calls ATT cannot. With Google’s method Google gets an edge over ATT. Big surprise, right? The Google can block calls to places like Harrod’s Creek. ATT can charge more for this type of connection. I know. ATT is my phone company.

Then, I read “AT&T Calling Google a Noisome Trumpeter to FCC”. Gerson Lehrman Group is a rental agency for consultants. The idea is a good one. Save the big fees imposed by McKinsey, Booz, and Boston Consulting Group and get solid advice. I think it works reasonably well in this belt tightening market. The analysis of the ATT and Google dust up over Google Voice does what most MBA-inspired analyses do: describes what’s in the newspapers. One comment caught my attention:

AT&T points out the FCC’s fourth principle of the Internet Policy Statement to be about competition among network providers, application and service providers, and content providers. The FCC issue will be if customers with IP connections are favored in making calls with lower costs and more UC capabilities. The goal for the U.S. market has to be that competition improves communications connectivity regardless of the type of provider.

My view of the squabble is that ATT now realizes that Google is a next generation telecommunications company. In fact, Google’s engineers have pushed into technical fields that the “old” Baby Bells converted into Wal*Marts and Costcos. Like farmers angered by new uses for their land, the Baby Bells want to go back to the halcyon days of the past.

Google has marginalized the past, particularly with regard to telecommunications in four ways. None of these is referenced in the consulting firm’s analysis:

  1. Google has built a global infrastructure that provides digital or bit-centric services unencumbered by the methods and systems that US telcos in particular provide their customers. The platform approach means that telco is one business thrust, not THE business thrust.
  2. The technology in play at Google is in some cases based upon a Bell Labs-style of investment; that is, bright people working on big problems. When a breakthrough emerges, Google makes an effort to allow various Google units to “do something” with the invention. I would direct the GLG MBA to consider how Google has learned from a patent application that has now migrated to Alcatel Lucent. ATT had access to the same invention, missed its significance, and now faces a significant challenge in data management. Just one example from the dozens I have gathered, gentle reader. ATT’s research arm, while impressive, is not like Google’s. I think Google has some refugees from the “old” Bell Labs too.
  3. ATT, like other US telcos, continue to resist what seems to be an obvious tactic—exploiting Google. In the US, companies like ATT prefer to block, chastise, and criticize aspects of Google that are little more than manifestations of its applications platform. Google Voice is an application, and it is not a particularly smart one as Google apps go, based on my research. Instead of asking the question “How can we exploit this Google service?”, the response from publishers, media companies, telcos, and some government agencies is to put Google in a box and keep it there. As I argued in 2004 in The Google Legacy, the river of change has broken through a dam. The river cannot be “put back.”
  4. Analyses that convert a long document into a summary are useful. I do this myself, but when that summary leaves out context, the points without proper definitions float like a firefly’s disembodied glow. What else is Google probing in the telco space? That’s an important question because ATT is dealing with a probe, not an assault. Is ATT missing a larger strategic challenge? Can an Apple ATT tie up win in a game that Apple and ATT do not fully understand?

To wrap up, the addled goose gets very nervous when he meets an agency rental sporting an MBA name tag. By the way, what does this mean: “The letter to the FCC is from AT&T’s Federal Regulatory and deduces from the hearsay about blocked rural calls that Google saves on the higher termination costs imposed by rural telcos.” Too much MBA sophistication for me.

The tag on the bottom of the article speaks volumes, “Request a Consultation.” This addled goose is quite happy, however, to see the article labeled as a marketing item just like this Web log.

Stephen Arnold, September 28, 2009

Microsoft Fast ESP with the Microsoft Bing Translator

September 27, 2009

A happy quack to the reader who sent me a link to a write up and a screenshot of the integrated translation utility in the new Fast ESP. The idea is to run a query and get results from documents in different languages. Click on an interesting document and get the translation. To my eye the layout of the screen looked a little Googley, but that’s because I look at the world through the two oohs in the Google logo. The write up is “Enterprise Search and Bing Services – Part 1: The Bing Translator” and you should read the story. Here’s the screenshot that caught my attention:

[screenshot: Fast ESP results with the Bing Translator]

The article said:

In this example, not only is the user’s query translated and expanded to include other languages (French, German, and Chinese), but the user has the ability to translate the teasers or the entire document using the Bing Translator. The search results also include query highlighting for each of the multiple translations of the query. Finally, the user can use the slider bar (or the visual navigator) to favor documents written in certain languages. Any slider action causes the result set to update automatically. The relevance control behind this slider widget is actually a feature of FAST ESP, but it shows another way of surfacing cross-lingual search.

No information was provided about the computational burden the system adds to a Fast ESP system. Interesting, however. I prefer to see a translated version of the document’s title and snippet in the results list with an option to view the hit in its original language. The “old” Fast Search & Transfer operation had some linguistic professionals working in Germany. I wonder if that group is now marginalized or if it has been shifted to other projects. Info about that linguistic group would be helpful. Use the comments section of this Web log to share if you are able.
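The slider behavior the article describes can be imagined as a simple re-ranking step: blend each document’s base relevance with a user-set, per-language preference weight. The sketch below is my guess at the mechanism, not FAST ESP’s actual relevance model; the function, weights, and sample hits are all invented.

```python
def rerank(results, lang_weight):
    """Blend base relevance with a per-language preference.

    `results` is a list of (doc_id, language, base_score) tuples;
    `lang_weight` maps a language code to a slider value in [0, 1].
    Unlisted languages get a neutral 0.5 weight.
    """
    scored = [
        (doc, lang, base * lang_weight.get(lang, 0.5))
        for doc, lang, base in results
    ]
    # Higher blended score first.
    return sorted(scored, key=lambda r: r[2], reverse=True)

hits = [("d1", "en", 0.9), ("d2", "fr", 0.8), ("d3", "de", 0.85)]
# Sliding French up and German down reorders the result set.
top = rerank(hits, {"en": 0.5, "fr": 1.0, "de": 0.2})
assert [doc for doc, _, _ in top] == ["d2", "d1", "d3"]
```

Any slider action would simply re-run this blend and refresh the result list, which matches the “update automatically” behavior the article describes.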

Stephen Arnold, September 27, 2009

Google Points Out that Canada Is Lost Amidst the Maple Leaves

September 26, 2009

I liked the power play that turned the piggy Internet Explorer into sleek Chrome. Microsoft can deal with marginalization. But I was not too happy to read the story “Google Exec Says Canada Missing Web’s Potential.” Assume the story is accurate. I don’t perceive Canada as missing much in technology. I was on the Board of the Sports Information Research Center, which was Webby and one of the first government supported entities to generate a profit and then sell a chunk of its business to a big American publishing company. Tim Bray figured out how to do a nifty SGML database and found time to help with Web standards. I pay attention to Web developments from PEI to Vancouver. I even did a job for the Canadian government to use the Internet to get Métis children educational materials where distance and weather disrupt routine educational access. What interests me is why Google executives, who are obviously bright, find it necessary to make political statements that I interpret as stupid. I recall the Googler Cyrus from Google’s LA office, who told me that I had photoshopped a diagram from a Google patent application. Stupid, stupid AND uninformed. May I suggest that Google focus its brilliance on issues that add some spice to my technical life, like challenging Oracle in the data management sector, or keep mum when lists of Google acquisitions conveniently omit one of Google’s most important acquisitions in its history. I want to wrap up with this statement from the article cited above. The Googler is talking about online advertising, but I won’t cut this gleaming wizard any slack:

“It’s not as competitive a business market, which basically suggests that there’s not as many businesses online because they’re not competing for more share amongst each other or there are not enough businesses competing in certain areas,” said Nikesh Arora, Google’s president of global sales operations and business development…”

Yikes. I can see Mr. Arora’s Googley grin as he displays data that shows Canadian businesses’ scores that qualify them for the short bus. In my opinion, this type of comment qualifies him to swim with me in the pond filled with mine drainage.

Stephen Arnold, September 26, 2009

Mobile News Aggregation

September 23, 2009

I wrote an essay about the impending implosion of CNN. The problem with traditional media boils down to cost control. Technology alone won’t keep these water logged outfits afloat. With demographics working against those 45 years of age and above, the shift from desktop computers to portable devices creates opportunities for some and the specter of greater marginalization for others. I saw a glimpse of the future when I looked at Broadersheet’s iPhone application. You can read about the service in “Broadersheet Launching “Intelligent News Aggregator” iPhone App”. The app combines real time content with more “traditional” RSS content. The operative words for me are “intelligent” and “iPhone”. More information is available on the Broadersheet Web site. Software that learns and delivers information germane to my interests on a mobile device is not completely new, of course. The Broadersheet approach adds “time” options and a function that lets me add comments to stories. This is not convergence; the application makes clear the more genetic approach of blending DNA from related software functions.

Stephen Arnold, September 23, 2009
