Useful: How to Prevent Scraping
September 4, 2011
It is becoming more and more difficult to retain credit for digital content. Have you ever thoughtfully posted to your site only to find you’ve been outranked on your own content? “Fighting Scrapers when Google Won’t: A Simple Guide” offers some easily implemented steps for preventing content theft.
The advice fits neatly under the following banners:
- Make regular updates
- Link back to your site
- Add “Read More” links
- Truncate your RSS feed
These are some useful, common sense suggestions. Basically, treat your online work as you would your lunch in an office: write your name all over it. Those relying on screen scraping technology for content are, in my opinion, lazy. If screen scrapers created original content, or provided a service by highlighting significant articles as I am doing in this short write up, they would reduce clutter on the Internet. Instead, many scrapers are taking content short cuts. Please, heed the advice in the “Fighting Scrapers” article: add author tags, link back to your page, and clip each passage to dangle the meat behind a “read more.”
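For the RSS advice in particular, here is a minimal sketch in Python of how one might truncate feed items and stamp a name on them before publishing. The word limit, markup, and helper names are illustrative assumptions, not code from the article.

    # Minimal sketch: clip a post and append an author tag plus a link back.
    # The word limit and markup are illustrative assumptions.
    import re

    WORD_LIMIT = 75  # hypothetical excerpt length

    def strip_tags(html):
        """Crudely remove HTML tags so the excerpt is plain text."""
        return re.sub(r"<[^>]+>", "", html)

    def truncate_item(body_html, permalink, author):
        """Clip a post to WORD_LIMIT words and dangle a 'read more' link."""
        words = strip_tags(body_html).split()
        excerpt = " ".join(words[:WORD_LIMIT])
        if len(words) > WORD_LIMIT:
            excerpt += "..."
        # Write your name all over your lunch: author tag plus link back.
        return f'{excerpt} By {author}. <a href="{permalink}">Read more</a>'

    print(truncate_item("<p>Full post text ...</p>",
                        "http://example.com/post", "Sarah Rogers"))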
Sarah Rogers, September 4, 2011
Sponsored by Pandia.com
IxReveal Closes Deal with CFIX
September 4, 2011
CFIX is the Central Florida Information Exchange, a regional fusion center for certain government entities and professionals. IxReveal describes itself as “an innovative analytic software company focused on giving end users the ability to fuse and extract knowledge and insight from large amounts of electronic data.”
What makes the company interesting is that the firm’s technology can harmonize electronic data in almost any form. With data transformation and normalization costs skyrocketing, solutions which can help minimize the expense of converting data from one format to another are of increasing importance.
IxReveal positions its software as a “search and analysis” product. The firm’s system identifies concepts automatically. Furthermore, the system provides automatic analytics which allow a user to sidestep the “you don’t know what you don’t know” issue. IxReveal discerns trends, patterns, anomalies, and relationships in the electronic information processed. In addition, the system provides tools for fusing, managing, and analyzing processed information.
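The write-up offers no implementation detail, but the basic idea of harmonizing data arriving in different formats can be sketched. The field names and record formats below are assumptions for illustration, not IxReveal’s actual schema.

    # Illustrative sketch of normalization: map records that arrive in
    # different formats onto one common schema. Field names are hypothetical.
    import csv, io, json

    COMMON_FIELDS = ("name", "date", "text")

    def from_json(raw):
        rec = json.loads(raw)
        return {"name": rec["author"], "date": rec["created"], "text": rec["body"]}

    def from_csv(raw):
        row = next(csv.DictReader(io.StringIO(raw)))
        return {"name": row["Name"], "date": row["Date"], "text": row["Comment"]}

    records = [
        from_json('{"author": "J. Doe", "created": "2011-09-01", "body": "..."}'),
        from_csv("Name,Date,Comment\nR. Roe,2011-09-02,..."),
    ]
    # Every record now exposes the same fields, whatever its source format.
    assert all(tuple(r) == COMMON_FIELDS for r in records)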
For more information about the company, point your browser toward www.ixreveal.com.
Stephen E Arnold, September 4, 2011
Sponsored by Pandia.com
Facebook: Not Necessarily the Root of All Teen Evils
September 3, 2011
Facebook has changed the landscape of teenage socializing; no one would disagree. While it allows people from across the world to keep in touch and share exciting personal news, it also exposes highly susceptible teens to illegal activities. According to the article “U.S. Teens on Facebook More Likely to Use Drugs” on CBC News, although a new report shows that teens who use Facebook are much more likely to engage in drug, alcohol, and tobacco abuse, there may be more factors involved.
The article reports on the study and its results:
The National Center on Addiction and Substance Abuse at Columbia University conducted the back-to-school survey of 1,006 teens who answered questions about their use of social media, TV viewing habits and substance abuse. The findings suggested that compared with those aged 12 to 17 who spend no time on social networking sites in a typical day, teens who do were: five times more likely to use tobacco, three times more likely to use alcohol, and twice as likely to use marijuana.
While these factoids might be accurate, one must ask, “What other factors contribute to the results?” The study compared one extreme against everyone else: teens who had no presence on Facebook versus teens who spent any time at all on Facebook. The study also had just over 1,000 participants, and no information was given on the socioeconomic, age, race, or cultural breakdown of the group.
These data give one pause, but without more information one should not condemn Facebook. The article makes an excellent point in mentioning the role of parents and extracurricular activities as crucial components in teen abuse of drugs and alcohol. To be fair to Facebook, more information is needed, as is a cause-and-effect study of teens, Facebook, and alcohol/drug abuse.
What’s the relevance to search? With general purpose research shifting to accommodate social content, we want to understand the “content outputters” before we accept the “inputs” without understanding motivations, provenance, behavior, etc.
Catherine Lamsfuss, September 3, 2011
Sponsored by Pandia.com
Google Names: How a Mechanical Engineer Sees the Issue
September 3, 2011
I imagine you’ve all been acquainted with the latest social media fuss, as it is becoming old news by now. The corporate policy for Google+ requires members of its networking site to operate under real names. Users signed up under pseudonyms are finding their profiles suddenly deleted. That’s it. Now, enter public outrage stage left.
Maybe it’s too early, or maybe it’s because I’ve worked forty hours in three days, but despite being drawn into “Google+ Punts on Kafkaesque Name Policy” (thanks to the literary reference that ultimately falls flat), I find flaws on each side.
So, to Google. I realize you want your newest product to be as much like Facebook as possible, stopping only at donning a Mark Zuckerberg wig. You also must contend with an insatiable desire to squirrel away as much personal data as possible for future monetary gains, be it by ownership and sales or expanded search capacity. But come on, this is ridiculous. The given argument for civility in online discourse is not exclusively yours to make. Not only are you alienating your burgeoning clientele, but I don’t believe you can legally force people to use their real names in a non-government application. It is, after all, the internet.
And to would-be Google+ participants: I agree with you, I really do. But if I could draw a diagram to illustrate my point, it would include a fist-sized sphere representing corporations, and twelve miles away would be a single dot representing your interests. Why is this policy a surprise? Or an outrage, considering your beloved Facebook’s policy is identical? Couldn’t you use any name in that friendship graveyard known as MySpace.com, which the masses abandoned in favor of a more tightly controlled environment?
Coming from an individual with little to no online presence, I respect anonymity as much as the next mechanical engineer, if not more so. But I can’t even get behind bumper stickers. So I do genuinely understand the frustration of the prospective Google+ user. But I would like to gently remind readers that true personal ambiguity was ushered out with the twentieth century. Google already knows who you are and will continue to build tools that glean even more data from a largely willing public.
What I find intriguing is an important point that largely goes ignored by both sides of the argument:
The biggest problem with Google’s identity policy has always been that it’s essentially unenforceable. You can’t police millions of users with algorithms looking for nonstandard characters in names or reviewing user-flagged profiles with enough sensitivity to handle edge cases without devoting an absurd number of employee hours to review every violation. By all accounts, Google hasn’t assigned such resources.
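To see why that point holds, consider a toy version of such a filter. The “standard name” rule below is hypothetical, but it shows how a nonstandard-characters check misfires on perfectly real names.

    # Toy illustration of why algorithmic name policing is hard: a naive
    # "standard characters only" rule flags plenty of legitimate names.
    import re

    # Hypothetical rule: a name is exactly two capitalized ASCII words.
    NAIVE_RULE = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")

    for name in ["John Smith",       # passes
                 "Björk",            # real mononym with a non-ASCII letter
                 "Conan O'Brien",    # the apostrophe trips the rule
                 "xXDarkLord99Xx"]:  # the actual target
        verdict = "ok" if NAIVE_RULE.match(name) else "flagged"
        print(f"{name!r}: {verdict}")
    # Handling such edge cases at scale is exactly the employee-hours
    # problem the quoted passage describes.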
It is for this reason that perhaps the ‘activists’ feverishly working to overturn Google’s chosen identity policy should turn to more worldly causes?
Sarah Rogers, September 3, 2011
Sponsored by Pandia.com
Nstein Expands Capabilities
September 2, 2011
“What’s Next For OpenText As It Continues Integration of Nstein’s Technologies?” gives a sneak preview of new features from the company formerly known for semantic technology, now part of OpenText. In the year since the acquisition, Nstein has already begun adapting its knack for accurate search across masses of Internet pages to local intranets and emails as part of OpenText’s Web content management (WCM) offering. But there will be more to come. We learned:
Some highlights moving forward are taking entity extraction and normalization to the space of collecting, analyzing, and finding business trends that emerge across enterprise’s vast collections of documents, sources and repositories — and also going beyond extracting and categorizing named entities and sentiment from text documents to apply semantics to other media, such as photos, videos and other unstructured information.
Working past challenges, including wading through typical enterprise content such as huge MS Office files, is only a small hiccup. The companies are developing banks of industry-specific terms to combat that issue.
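Here is a rough sketch of what such a term bank buys you, assuming a simple dictionary-driven extractor that also normalizes variant spellings to one canonical entity. The terms and aliases are invented for illustration, not Nstein’s actual vocabulary.

    # Minimal sketch of dictionary-based entity extraction with
    # normalization: variant surface forms map to one canonical name.
    # The term bank is invented for illustration.
    TERM_BANK = {
        "opentext": "OpenText",
        "open text": "OpenText",
        "nstein": "Nstein",
        "hewlett packard": "Hewlett Packard",
        "hp": "Hewlett Packard",
    }

    def extract_entities(text):
        """Return canonical names of term-bank entries found in the text."""
        lowered = text.lower()
        return {canon for term, canon in TERM_BANK.items() if term in lowered}

    doc = "Open Text continues integrating NSTEIN's technology; HP looms."
    print(extract_entities(doc))  # prints the three canonical names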
Other features we can expect in the future include a “listening” platform application and automation of certain business functions. We will have to wait and see if the Nstein/OpenText entity can meet or exceed these high aspirations; it seems they’ve barely started. Based on progress to date, will OpenText be giving its competitors a run for their money? OpenText has a large number of search and content processing technologies, brands, and systems. We learned that RedDot uses the Autonomy search system. Will integration of the firm’s content processing technologies be a priority in 2012? With the acquisition of Autonomy by Hewlett Packard, providing services for Autonomy installations may become an issue to watch in 2012.
Sarah Rogers, September 2, 2011
Sponsored by Pandia.com
NLP, Just What the Doctor Ordered
September 2, 2011
The article “Natural Language Processing Best for EMR Data” on Nurse.com explains how a new study utilizing natural language processing (NLP) improved the identification of patient safety concerns.
The information reviewed was vast (almost 3,000 patients) and covered 20 measures of “potential adverse effects during hospitalization,” including renal failure and pneumonia. By using NLP, the hospital system was able to get fast and accurate information. The article quotes the study’s authors on the benefits of NLP:
The development of automated approaches, such as natural language processing, that extract specific medical concepts from textual medical documents that do not rely on discharge codes offers a powerful alternative to either unreliable administrative data or labor-intensive, expensive manual chart reviews.
The project was a success, with NLP offering much more reliable data in all categories of patient safety. The article speculates about the technology’s potential to identify “at risk” patients upon entering the hospital.
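For the curious, the flavor of this kind of concept extraction can be sketched in a few lines. The trigger phrases and the sample note are invented, and real clinical NLP handles negation and context far more carefully than this toy does.

    # Toy sketch of concept extraction from a free-text clinical note.
    # Trigger phrases are invented; real systems handle negation,
    # abbreviations, and context with far more sophistication.
    ADVERSE_EVENTS = {
        "renal failure": ["renal failure", "kidney failure", "acute kidney injury"],
        "pneumonia": ["pneumonia", "lung infection"],
    }

    def flag_events(note):
        """Return adverse-event concepts whose trigger phrases appear."""
        lowered = note.lower()
        return [concept for concept, phrases in ADVERSE_EVENTS.items()
                if any(p in lowered for p in phrases)]

    note = "Pt developed hospital-acquired pneumonia; no sign of renal failure."
    print(flag_events(note))  # naively flags 'renal failure' despite negation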
This report is impressive and there is no doubt that NLP can help hospitals sort through their mounds of data, but that doesn’t mean that NLP is the answer to all data problems. Hospitals are unique in many ways and their data tends to be very factual. For analysis of that nature, NLP can be very helpful. But to assume it can help all industries is naïve.
Catherine Lamsfuss, September 2, 2011
Sponsored by Pandia.com
Google DoubleClick Usage: A Warning Signal?
September 2, 2011
According to a Google report, Facebook was the leader in page views in June 2011. Google determines this rank by analyzing Google data, and ad companies use the rankings in determining where to place ads. The article “Google DoubleClick Stats May Report Inflated Social Media Numbers” on Media Post News questions the reliability of the Google report.
The information Google acquires through its giant network of data is influential to companies seeking the best place in which to sink billions of dollars in advertising. As the article explains,
Google gives marketers a guideline by allowing them to click on the link for each of the companies listed in the 1,000 sites to discover demographics of site visitors, such as household income and age. For each site on the list, marketers can see the site category, unique visitors, page views and whether the site has ads. The data provides ad placement information, specifications and keywords to find the site.
ComScore, another company specializing in website visit analysis, disagrees with Google’s numbers, claiming Facebook received half the number of clicks Google claims. The discrepancy lies in how the data is gathered. ComScore sells its information, as opposed to Google, which shares its data for free, so ComScore most likely does have more accurate results, experts explain.
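The gap is easier to see with a toy model of the two approaches: a census that counts every logged view versus an estimate extrapolated from a metered panel. All the numbers below are made up for illustration.

    # Toy model of why census-style and panel-style measurements diverge.
    # All numbers are made up for illustration.
    import random

    random.seed(42)
    POPULATION = 100_000   # hypothetical universe of users
    PANEL_SIZE = 1_000     # recruited, metered subset

    # Each user's monthly page views on some site.
    views = [random.randint(0, 20) for _ in range(POPULATION)]

    # Census approach: sum everything the servers logged.
    census_count = sum(views)

    # Panel approach: meter a sample, then scale it up.
    panel = random.sample(range(POPULATION), PANEL_SIZE)
    panel_estimate = sum(views[i] for i in panel) * (POPULATION / PANEL_SIZE)

    print(f"census: {census_count:,}  panel estimate: {panel_estimate:,.0f}")
    # Sampling error alone separates the two figures; bots, cookie churn,
    # and panel skew widen the gap in practice.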
It doesn’t take a silly goose like Beyond Search’s owner, Stephen E Arnold, to see that some companies might be tempted to make some tweaks to keep the revenue and traffic looking buff. With possibly questionable activities already popular among Web mavens and Web masters in the area of search engine optimization, is it likely that new methods will emerge to increase clicks to pages, ads, and links? Hopefully not.
Catherine Lamsfuss, September 2, 2011
Sponsored by Pandia.com
Oracle Text Release Notes Jackpot
September 2, 2011
Short honk: OraChat has posted a healthy list of useful release notes worth referencing under “Getting Hold with Oracle Database 11gR2 RAC.”
A few highlights:
- Complete Checklist for Manual Upgrades to 11gR2 (Doc ID: 837570.1)
- How to Download and Run Oracle’s Database Pre-Upgrade Utility (Doc ID: 884522.1)
- Release Schedule of Current Database Releases (Doc ID: 742060.1)
These are just the tip of the Oracle information iceberg; there are dozens of documents listed. When you find you need help installing, running, building, adding on, or doing any number of Oracle activities you may never have considered, I recommend perusing this list. It is a great and comprehensive collection.
Go ahead and bookmark it. Oracle and its search solutions may not light up the webinar, blog, and social conference circuit. But the various Oracle search solutions have purchase in most major organizations.
Sarah Rogers, September 2, 2011
Sponsored by Pandia.com
Google Makes an App Engine Pricing Move
September 2, 2011
Short honk: The price change may be nothing, but I think the shift should be documented. I read “Google Ups Pricing as App Engine Leaves Preview: Bait and Switch?” and noted this factoid:
While it’s undoubtedly true that there’s more than CPU to consider, the change in pricing seems to be leaving GAE users with sticker shock. Comparing pricing from one PaaS or cloud service provider to another’s has never been easy. But comparing Google’s old and new pricing is no easy matter either. Bandwidth prices have remained the same, but the switch from CPU time to instances makes it difficult to do the conversion. One response over on Hacker News indicates that the expected bill will go from $9 a month to $270 a month.
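A back-of-the-envelope sketch makes the conversion problem concrete. The rates and workload figures below are hypothetical, not Google’s actual prices.

    # Back-of-the-envelope comparison of CPU-hour vs. instance-hour billing.
    # All rates and workload figures are hypothetical.
    CPU_HOUR_RATE = 0.10       # old model: pay for CPU time actually burned
    INSTANCE_HOUR_RATE = 0.08  # new model: pay for hours an instance is up

    cpu_hours_per_day = 3   # the app is busy about 3 CPU-hours a day
    instance_hours = 24     # but one instance sits online around the clock
    extra_instances = 2     # plus scheduler-spawned instances at peak

    old_bill = cpu_hours_per_day * CPU_HOUR_RATE * 30
    new_bill = instance_hours * (1 + extra_instances) * INSTANCE_HOUR_RATE * 30

    print(f"old: ${old_bill:.2f}/mo  new: ${new_bill:.2f}/mo")
    # old: $9.00/mo  new: $172.80/mo -- idle instance time, not CPU burned,
    # now drives the bill, which is how $9 a month balloons toward $270.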
Is Google now taking steps to make its enterprise initiative generate more revenue? Is this pricing change an indication that other Google revenue sources are softening? Worth noting.
Stephen E Arnold, September 2, 2011
Sponsored by Pandia.com