Useful: How to Prevent Scraping

September 4, 2011

It is becoming more and more difficult to retain credit for digital passages. Have you ever thoughtfully posted to your site only to find you’ve been outranked on your own content? “Fighting Scrapers when Google Won’t: A Simple Guide” provides some easily implemented steps toward prevention of content theft.

The advice fits neatly under the following banners:

  1. Make regular updates
  2. Link back to your site
  3. Add “Read More” URL inclusions
  4. Truncate your RSS

These are some useful, common sense suggestions. Basically, treat your online work as you would your lunch in an office: write your name all over it. Those relying on screen scraping technology for content are, in my opinion, lazy. By creating original content or providing a service by highlighting significant articles, as I am doing in this short write up, the screen scrapers would reduce clutter on the Internet. Instead, many scrapers are taking content short cuts. Please, heed the advice in the “Fighting Scrapers” article: add author tags, link to your page, clip a passage to dangle the meat with a “read more,” and so on.
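For readers who want to try the “truncate and link back” advice, here is a minimal sketch in Python. It is an illustration only, not code from the “Fighting Scrapers” article; the function name `truncate_for_feed`, the 40-word default, and the example URL are assumptions made up for this example.

```python
import html


def truncate_for_feed(body_text, post_url, max_words=40):
    """Build a short HTML excerpt for a feed item (hypothetical helper).

    Keeps at most max_words words of the post and appends a
    "Read more" link that points back to the original page.
    """
    words = body_text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        # Signal that the feed carries only part of the post.
        excerpt += "..."
    # Escape both the text and the URL so the excerpt is safe to embed.
    link = '<a href="{}">Read more</a>'.format(html.escape(post_url, quote=True))
    return "<p>{}</p><p>{}</p>".format(html.escape(excerpt), link)


if __name__ == "__main__":
    # example.com is a placeholder, not a real post address.
    print(truncate_for_feed("one two three four five", "http://example.com/post", max_words=3))
```

A scraper that republishes the feed then carries only the excerpt, along with a link that credits the original page.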

Sarah Rogers, September 4, 2011

Sponsored by Pandia.com

Google DoubleClick Usage: A Warning Signal?

September 2, 2011

According to a Google report, Facebook was the leader in page views in June 2011. Google determines this ranking by analyzing its own data, and ad companies use the ranking in determining where to place ads. The article, Google DoubleClick Stats May Report Inflated Social Media Numbers, on Media Post News, questions the reliability of the Google report.

The information Google acquires through its giant network of data is influential to companies seeking the best place in which to sink billions of dollars in advertising. As the article explains,

Google gives marketers a guideline by allowing them to click on the link for each of the companies listed in the 1,000 sites to discover demographics of site visitors, such as household income and age. For each site on the list, marketers can see the site category, unique visitors, page views and whether the site has ads. The data provides ad placement information, specifications and keywords to find the site.

ComScore, another company specializing in website visit analysis, disagrees with Google’s numbers, claiming Facebook received half the number of clicks Google claims. The discrepancy lies in how the data is gathered. ComScore sells its information, as opposed to Google, which shares its data for free, so ComScore most likely does have more accurate results, experts explain.

It doesn’t take a silly goose like Beyond Search’s owner, Stephen E Arnold, to see that some companies might be tempted to make some tweaks to keep the revenue and traffic looking buff. With possibly questionable activities already popular among Web mavens and Web masters in the area of search engine optimization, is it likely that new methods will emerge to increase clicks to pages, ads, and links? Hopefully, no.

Catherine Lamsfuss, September 2, 2011

Google Study Finds Web Banners Ineffective

August 31, 2011

On Saturday, one reader sent us a link to this story: “Is Google’s Search for Quality Content a Ruse for a Massive Diversion of Cash to Its Own Sites?” We are not sure if the points in the write up are spot on, but the theme of the article connected to another story we noticed.

According to a 2010 survey by Google, the average click through rate for banner ads this past year was 0.09 percent, which is down from 0.1 percent in 2009. This decrease leads me to believe that attempts to make banner ads more inviting to potential customers are failing miserably. However, the article Google: Click-Through Rates Fell in 2010 [Study] states:

[The study] found that the format of a display ad can make a difference. A 250×250 pixel ad using Flash got the highest CTR of any format — 0.26%. The worst performers were vertical 120×240 banners with Flash and a full (468×60) banner with Flash, which both got rates of 0.05%.
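To put the percentages quoted above in concrete terms, here is a small arithmetic sketch in Python. The rates come from the figures cited in this post; the helper names and the 100,000-impression baseline are assumptions chosen only for illustration.

```python
def clicks_for(ctr_percent, impressions=100_000):
    """Expected clicks for a click-through rate given in percent."""
    return ctr_percent / 100 * impressions


def relative_change(old, new):
    """Fractional change from old to new (negative means a decline)."""
    return (new - old) / old


if __name__ == "__main__":
    # Average banner CTR: 0.1 percent in 2009, 0.09 percent in 2010.
    print("2009 clicks per 100,000 impressions:", round(clicks_for(0.1)))
    print("2010 clicks per 100,000 impressions:", round(clicks_for(0.09)))
    print("relative change: {:.0%}".format(relative_change(0.1, 0.09)))
    # Best and worst formats quoted from the study.
    print("250x250 Flash ad:", round(clicks_for(0.26)))
    print("worst formats:", round(clicks_for(0.05)))
```

On these numbers, the year-over-year decline is about one tenth of the 2009 rate, while the best format draws roughly five times the clicks of the worst ones.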

As with television ads, it’s difficult to determine the effectiveness of digital advertising by only looking at click-through. It is important that we recognize that banner ads are not created in a vacuum, but are rather one small part of a larger, complex advertising strategy. Needless to say, if studies continue to come out showing any aspect of this strategy to be failing, it could have major implications for Google.

At lunch on Sunday, I discussed these two items with two people immersed in Web advertising. Three observations stuck in my mind:

First, if there is a softening in click through or online ad revenue, Google will have little choice but to find ways to pump up its revenue.

Second, the notion of social media fatigue seems germane. People may be tired of online ads. The result is to shift to a more low profile “pay to play” model. Overt ads may be on the down side after a long run up.

Third, the urgency for organizations like Google and Flipbook to find a way to inject rich media is an indication that the ad revenues flowing to television advertisers are the next Klondike.

I am not sure what to think, but this notion that online ad revenue may need some exoskeletal supports is fascinating. There are significant implications for objective search results as well.

Jasmine Ashton, August 31, 2011

Calibre Aces Ebook Conversion and Management

August 30, 2011

Anyone who uses an eBook knows how challenging managing all the books can be. To solve this annoying problem a new program has entered the market: Calibre, an eBook management tool. With so many different types of files and equally different types of eReaders available, it’s nice to finally have a central command to sort through it all.

The concept was born from an avid eBook enthusiast and reader who was unhappy with the software available for eBook management and file conversion. Calibre, as it is today, is a work in progress that aims to meet the demands of busy eReading folk. As the website explains,

Today Calibre is a vibrant open-source community with half a dozen developers and many, many testers and bug reporters. It is used in over 200 countries and has been translated into a dozen different languages by volunteers. Calibre has become a comprehensive tool for the management of digital texts, allowing you to do whatever you could possibly imagine with your e-book library.

Perhaps the best feature of Calibre is its ability to convert all types of files, making it possible for one to download an eBook of any type and then miraculously send it to the eReader of choice. Voila! As one Calibre fan wrote in the article, Best Ebook Library Manager: Calibre, on Book Sprung, “Calibre’s secret weapon is that it’s got crazy ninja formatting skills, and can convert all sorts of files into all sorts of other files. For Kindle owners, this means you can convert unusable file formats into the .mobi format that Kindle likes.”

We look forward to seeing what else Calibre can pull out of its hat, and more importantly, if the eBook providers of the world will play nice with the newest teacher’s pet.

Catherine Lamsfuss, August 30, 2011

The Internet Means Search and Email

August 24, 2011

We were a bit underwhelmed. Though social media is gaining ground, one survey found that it has a long way to go to overtake the number one use of the internet, which is searching for information. As discussed in “Who Uses Search Engines? 92% of Adult U.S. Internet Users [Study]”, the research center Pew Internet found that searching is the single most popular use of the internet, with email coming in second.

The survey found that the number of people searching on a regular basis has grown over the past 10 years in every demographic. Now 92 percent of internet users utilize search engines, with 59 percent of them doing it on a regular basis. Email has similar numbers. The younger, the wealthier, and the more educated are the most likely to search and use email on a daily basis.

This leaves people to wonder: what is happening with social media?

It’s certainly true that social sites are growing rapidly. Since 2004, when Pew Internet started looking at social media usage among those surveyed, social sites have risen from 11 percent usage to 65 percent usage. The growth started slowing in 2009, but is continuing a gradual climb.

 

I think it is safe to say that social media popularity will continue to grow, but will never have the numbers associated with searching and email. The likes of Facebook and Twitter just are not alternatives to a search engine. People are always going to need and seek out information, which will safely secure the top spot for companies like Google and Yahoo.

Jennifer Wensink, August 24, 2011

Aggregation: A Brave New World?

August 24, 2011

As I’m typing this article on my computer, I must confess, I love pen and paper, the smell of a new book, the sound a newspaper makes when its pages are turned. Unfortunately, these physical things are slowly becoming extinct thanks to the internet. Though I stubbornly resist the allure of Kindle, I can see the writing on the wall, or the tablet.

The article How the Internet Has All But Destroyed the Market for Films, Music and Newspapers, from the UK’s The Guardian, argues that the impending death of physical newspapers, among other media outlets, is due to the lack of law governing, and enforced on, the internet. According to it, as long as information can be easily pirated and transmitted to others for free, those footing the bill for creating the movies, music and news will continue to see sharp declines in profits.

Image source: http://www.sreweb.com/weekend_emails/sept_10_2010/

To understand how the internet is killing the newspaper star, one must first understand why newspapers have worked so well for so long. It’s all about aggregation and curation. Aggregation is simply the gathering of ‘stuff’; in a newspaper’s case, that stuff is news stories, sports scores, horoscopes, classified ads, etc. Curation is the culling out of unnecessary ‘stuff’.

Newspapers have created brands for themselves because of their unique aggregating and curating. For hundreds of years if someone liked a column in a specific newspaper, they were forced to buy the entire paper to read the one column of interest. The newspaper hoped that the reader would also find the other articles interesting, but it didn’t really matter because the price of the newspaper was the same whether a reader liked one article or all of them.

Read more

Google Plus Demographics

August 14, 2011

Here at the Beyond Search goose pond, we pay more attention to the less zippy aspects of search. The notion of asking someone and getting an answer is a method we learned at our orientation class at Halliburton NUS 40 years ago. The training went something like this.

When you need to know where the diagrams for the ECCS are, you need to ask the duty officer?

Not too fancy, but the method worked despite government and plant operator bureaucratic “efficiency.” Moving questions to another communication medium seems pretty understandable to us. Searching the digital artifacts is an obvious step. We can even get our tiny minds around the notion of knowing who asked whom, what, and when.

When we think about Google Plus, we see a new service which is changing. We think that the changes are coming less quickly than we anticipated. Google seems to be putting considerable effort into the new service. Once a person provides the who, what, why and when for routine communications one has a very interesting commercialization opportunity.

“Study Google+ Winning over Suburban Parents, Losing College Kids and Cafe Dwellers” caught our attention on August 13, 2011. The write up provides some early data about the demographics of the 20 million plus Google Plus users. (Am I the only one who eschews using the plus sign because of its role as an operator in some search systems?)

Here’s the passage we noted:

Google+ seems to be falling out of favor among the “colleges and cafes” crowd, generally younger people without children. However, it’s seeing an increase in interest from the “kids and cabernet” segment — defined as “prosperous, middle-aged married couples living child-focused lives in affluent suburbs.” That’s a group that hasn’t embraced Facebook as much as the rest of the population, according to the Experian Hitwise data.

My hunch is that Google is going to want hundreds of millions of users of all demographic stripes and hues. The inclusion of games is a first obvious step of what is a consumerizing move. The video stuff also points down market to me, but I am 67 and not too keen on the boob tube whether implemented on a big screen TV, a mobile device, or some intermediate gizmo like an iPad. A wasteland is a wasteland to me.

The more consumerized a service, the less utility that service has to me. Facebook is the ultimate consumer “space”, and I don’t spend much time in that service. (A couple of the goslings are working on a Facebook implementation for Augmentext.com, but I just watch and learn. I don’t “do.”) Google Plus seems more appropriate to me, but if it goes down-market, then I will drift away. LinkedIn has already become a crazy “hire me” and “I am an expert” place, and I am not too keen on that digital watering hole either. I am willing to be semi flexible, but since I can’t touch my toes, I don’t know how far I can go in this down-market type environment.

Stephen E Arnold, August 14, 2011

ReVerb: The Whole Language Movement

August 12, 2011

ReVerb, a new search method, presents an optimistic future for search engines and intelligence levels. Projecting what Web search engines will look like in ten years, ReVerb should hope that the whole language movement doesn’t make a comeback in schools. Requiring users to input an “argument” and a “predicate,” this program automatically identifies and extracts binary relationships from English sentences, and it requires users to know the basic parts of a sentence.

ReVerb was created by the University of Washington’s Turing Center as a part of the KnowItAll project, and there are currently 15 million ReVerb extractions available for academic use. This program has blown similar ones out of the water.

The paper entitled “Identifying Relations for Open Information Extraction” asserts the following:

“[ReVerb] more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE-pos. More than 30% of ReVerb’s extractions are at precision 0.8 or higher— compared to virtually none for earlier systems.”

The creators are confident that ReVerb will be useful for queries where target relations cannot be specified in advance and speed is important. Currently, there is a demo available.
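To make the “argument” and “predicate” idea concrete, here is a toy Python sketch of what searching over binary relationships might look like. It is not ReVerb’s code or its extraction algorithm; the `Extraction` type, the `search` function, and the three sample triples are all hypothetical, written by hand for illustration.

```python
from typing import List, NamedTuple, Optional


class Extraction(NamedTuple):
    """One binary relationship: (argument, relation phrase, argument)."""
    arg1: str
    relation: str
    arg2: str


# Hand-written sample triples; a real system would extract these from text.
SAMPLE = [
    Extraction("Edison", "invented", "the phonograph"),
    Extraction("Edison", "was born in", "Ohio"),
    Extraction("Seattle", "is located in", "Washington"),
]


def search(extractions: List[Extraction],
           argument: Optional[str] = None,
           predicate: Optional[str] = None) -> List[Extraction]:
    """Return triples matching an argument and/or a predicate substring."""
    results = []
    for ex in extractions:
        if argument and argument.lower() not in (ex.arg1.lower(), ex.arg2.lower()):
            continue  # neither side of the relation is the requested argument
        if predicate and predicate.lower() not in ex.relation.lower():
            continue  # the relation phrase does not contain the predicate
        results.append(ex)
    return results


if __name__ == "__main__":
    for hit in search(SAMPLE, argument="Edison", predicate="invented"):
        print(hit.arg1, "|", hit.relation, "|", hit.arg2)
```

The point of the sketch is only the shape of the query: the user supplies the pieces of a sentence, and the system returns stored relationships that fit them.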

Is this the next big thing in search or another public relations push? Will this generate sympathetic vibrations within the Google?

Megan Feil, August 11, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Is Thomson Reuters Chasing after LegalZoom?

August 9, 2011

Here’s another “me too!” development. Taume reports, “Thomson Reuters Launches Westlaw Form Builder.” LegalZoom offers a client the forms required to create a limited liability corporation for less than $100. A lawyer may charge quite a bit more.

Completing an unending stream of forms is a time-consuming aspect of any legal office, and Thomson Reuters hopes its online tool will spell efficiency for its clients. The press release explains what the company hopes will distinguish its product from the competition:

Attorneys can access more than 20,000 official and lawyer-tested forms anytime and anywhere they have an Internet connection. Westlaw editors continually update the forms to ensure they are current, eliminating the need to download upgrades or verify citations. Unlike static forms, Westlaw Form Builder allows users to customize forms, making them specific to a given client and case. And every Westlaw Form Builder plan includes links to any cited authority or commentary on WestlawNext without incurring additional charges, helping users understand the legal context surrounding a particular form.

Completed forms are downloadable, and client data is stored, saving time on re-entry.

Thomson Reuters provides information management tools to clients around the globe in fields from financial and legal to science and health care. And, of course, the company is a respected source of world news coverage. But Thomson is targeting attorneys who are increasingly cost sensitive. Maybe attorneys are using LegalZoom too? The search system works. Oh, LegalZoom looks like a pretty good bargain. Buying legal information from an outfit like Thomson Reuters? Well, it can be more expensive in my experience.

Cynthia Murrell, August 9, 2011

Delightful Irony: Human Crashes Google Car

August 7, 2011

This morning my Overflight information service overflowed with Google related information. There were coveys of quales [Latin and not a misspelling, gentle reader] about Google and patents. There was another Googley shutdown story. The idea is that you should just Google a word. Who cares about a “real” dictionary entry. I find the reference appropriate because who cares about a “real” anything, including an azure chip consulting company with a penchant for becoming authorities in ANSI standard controlled term lists. I found a tardy response to the feline centric “How Do I Hate Google? Let Me Count the Ways”, which had precious little of the Elizabeth Barrett Browning gentleness from her pain and suffering.

Consider this EBB passage:

First time he kissed me, he but only kissed The fingers of this hand wherewith I write; And, ever since, it grew more clean and white.

Now evaluate the budding wordsmith Brian S. Hall’s passage:

David Drummond, you are [lame]. Larry, Sergey, you are [lame]. And I know why you’re [lame]. I know why you have monopoly profits in one business, use them to *destroy* other businesses, dominate the newest business (smartphones) and still whine.

Now who should be the focus for legions of soon to be unemployed English majors?

But what caught my attention was this item: “Google Blames a Human for its Robo-Car Crash.” My take: Algorithm good. Human bad.

Now what happens if Google’s next big product initiative such as a relaunch of the fascinating Google TV product line or a fully integrated, graphically consistent interface to the Android mobile devices flops?

Maybe algorithm good, human bad? Amusing to me because humans, not algorithms, are actually making decisions at the Googleplex. So a failure at Google boils down to “Human bad.” Seems logical.

Stephen E Arnold, August 7, 2011
