Sophia and Reputable Content
March 17, 2011
Killer Startups reviews promising new internet companies, and one of their recent selections caught our eye: “Sophia.org- Online Information You Can Trust.”
Sophia is different because it provides “lesson packets” which bring together text, images, and video from myriad Web sources. They insist that the information is trustworthy because many of those assembling these packages are experts. They also have in place a system for rewarding quality content. As KillerStartups puts it:
To me, the real strength of Sophia lies in how the site enables people to gather together the richest content found on sites that are often plagued by unreliable information such as YouTube, and have it all gathered in easy-to-assimilate packages. The amount of time students could save through a service like this can make a true difference in their ever-hectic lives.
Trusted content is good. Give it a whirl at Sophia.org.
Cynthia Murrell, March 17, 2011
Freebie
Baloney Content
March 17, 2011
But wait, there’s more! (Baloney, that is)
Want a great example of baloney content? You may spot some at Blogmoneyprofit.com.
We were amazed by this Web site, which, at first glance, looks like it might provide some useful information for would-be bloggers on how to make money online. Spend just a few minutes on the site, however, and you’ll discover it appears to be little more than hard-sell ads and empty promises, far more likely to take your money than help you make it.
Is it possible that Blogmoneyprofit.com is a bit like a bad relationship? Once you discover it just isn’t what you thought it was, it is still hard to leave. Something keeps calling you back. Like this message:
WAIT BEFORE YOU GO!
CLICK THE CANCEL BUTTON RIGHT NOW
TO STAY ON THE CURRENT PAGE.
I HAVE SOMETHING VERY SPECIAL FOR YOU!
Click “Cancel” and a box might appear asking you to “Sign up for a FREE keyword niche report and get a chance to win a FREE 7-day blog profit e-course.”
We ran a query for “BlogMoneyProfit” and it came up number one on Google. Is the Google algorithm change aware of a value in this site we cannot discern? We thought the site was a synonym for “Can you spell spam-o-rama?”
When I tried to leave the Web site, I saw another offer. I thought that parting message was worse than a bad relationship. I thought the ad was similar to a late night infomercial. I was unaware of a product called Autopilot Profits, which is also in the Google index. Google’s algorithm change may be a great improvement, but it seems to keep these interesting entries in its index. The product is described as a “Plug and play, in-a-box money system.” The ad never explains what this is, exactly, but promises you don’t have to sell a product or even have a Web site. Just turn it on, and “It can’t stop sending you cash … even if you want it to!” I surmise the promise of money appeals on many levels.
How much does this cost? Not $1,997, not $797, not even $297. Just $27! But this introductory price won’t last long, so you’d better act now!
Uh … right.
Now, to be fair, blogmoneyprofit.com does contain some useful information, but nothing you can’t find from other sources – like Google – without the get-rich-quick gimmicks. (In fact, blogmoneyprofit.com basically uses a list of search engines. We found a possible similarity between the lists on our ArnoldIT.com site and this blog money profit thing. Interesting.)
Robin Broyles, March 17, 2011
Freebie
Digital Reasoning Garners Patent for Groundbreaking Invention
March 16, 2011
There are outfits in the patent fence business. Google, Hitachi, and IBM come to mind. The patent applications are interesting because they provide a window through which one can gaze at some of the thinking of the firm’s legal, engineering, and management professionals.
Then there are outfits who come up with useful and novel systems and methods. The Digital Reasoning patent US7882055, “Knowledge Discovery Agent System and Method”, granted on February 1, 2011, falls into this category. The patent application was filed in July 2007, so it took the ever-efficient USPTO about 48 months to figure out what struck me when I first read the application. But the USPTO makes its living with dogged thoroughness. I supplement my retirement income by tracking and following really smart people like Tim Estes. I make my judgments about search and content processing based on my experience, knowledge of what other outfits have claimed as a unique system and method, and talking with the inventor. You can read two of my conversations with Tim Estes in the ArnoldIT.com Search Wizards Speak series. The links to my 2010 interview and my 2011 interview are at www.arnoldit.com/search-wizards-speak. (I did an interview with a remarkable engineer, Abe Music, at Digital Reasoning here.) Keep in mind that I was able to convert my dogging of this company to a small project this year. Hooray!
The guts of the invention are:
A system and method for processing information in unstructured or structured form, comprising a computer running in a distributed network with one or more data agents. Associations of natural language artifacts may be learned from natural language artifacts in unstructured data sources and semantic and syntactic relationship may be learned in structured data sources, using grouping based on a criteria of shared features that are dynamically determined without the use of a priori classifications, by employing conditional probability constraints.
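The claim language is abstract, so here is a minimal, purely illustrative sketch of the general idea it gestures at: grouping terms by shared features (here, simple document co-occurrence) under a conditional probability constraint, with no a priori categories. The toy corpus, threshold, and function names are my own assumptions, not anything taken from the patent or from Digital Reasoning’s code.

```python
from collections import Counter
from itertools import combinations

# Toy corpus: each "document" is a bag of terms. No a priori classes are used.
docs = [
    ["entity", "synonym", "association", "context"],
    ["digital", "reasoning", "entity", "synthesys"],
    ["entity", "synonym", "association", "text"],
    ["digital", "reasoning", "entity", "agent"],
]

term_counts = Counter()
pair_counts = Counter()
for doc in docs:
    terms = set(doc)
    term_counts.update(terms)
    pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))

def associations(min_cond_prob=0.75):
    """Keep term pairs where P(b|a) and P(a|b) both clear a threshold --
    a crude stand-in for a 'conditional probability constraint'."""
    found = []
    for pair, joint in pair_counts.items():
        a, b = tuple(pair)
        if joint / term_counts[a] >= min_cond_prob and joint / term_counts[b] >= min_cond_prob:
            found.append((a, b, joint))
    return sorted(found, key=lambda t: -t[2])

for a, b, n in associations():
    print(f"{a} <-> {b}: co-occur in {n} of {len(docs)} documents")
```

On this toy corpus only the digital/reasoning and synonym/association pairs clear the 0.75 bar; the patented system obviously does far more (software agents, alignment with structured data, operation at scale), but the flavor of grouping by shared evidence is similar.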
I learned from my contacts at Digital Reasoning:
The pioneering invention entails intelligent software agents that extract meaning from text as humans do – by analyzing concepts and entities in context. The software learns as it runs, continually comparing new text to existing knowledge. Associated entities and synonym relationships are automatically discovered and relevant documents are identified from across extremely large corpora.
The patent specifically covers the mechanism of measurement and the applications of algorithms to develop machine-understandable structures from patterns of symbol usage. In addition, it covers the semantic alignment of those learned structures from unstructured data with pre-existing structured data – a necessary step in creating enterprise-class entity-oriented systems. The technology as implemented in Synthesys (TM) provides a unique and now protected means of bringing automated understanding to end users in the enterprise and beyond.
So what’s this mean?
[Comparison table: The Traditional Method vs. The Digital Reasoning Method]
In financial analysis, health information, and intelligence applications, which do you want you and your colleagues to use? I go for the Veyron. The 1998 Mustang is great as a backup or knockabout. The Veyron means business in my opinion.
Three points:
- This is a true “beyond text” system and method. Keyword search and 1998-type methods cannot deliver Synthesys 3.0 (TM) functionality
- Users don’t want laundry lists. The invention delivers actionable information. The value of the method is proven each day in certain very important applications which involve the top concerns of Maslow’s hierarchy
- The system can make use of human inputs but can operate in automatic mode. Many systems include automatic functions, but the method invented by Mr. Estes is a new one. Think of the difference in performance between a 1998 Mustang and the new Bugatti Veyron. Both are automobiles, but there is a difference between the state of the art a long time ago and the state of the art now.
If you want more information about Digital Reasoning, the company’s Web site is www.digitalreasoning.com.
Stephen E Arnold, March 15, 2011
Freebie but I want a T shirt from Music Row in Nashville
Metadata Are Important. Good to Know.
March 16, 2011
I read “When it Comes to Securing and Managing Data, It’s all about the Metadata.” The goslings and I have no disagreement about the importance of metadata. We do prefer words and phrases like controlled term lists, controlled vocabularies, classification systems, indexing, and geotagging. But metadata is hot, so metadata the term shall be.
There is a phrase that is useful when talking about indexing and the sorts of things in our preferred terms list. That phrase is “editorial policy.” Today’s pundits, former English majors, and unemployed Webmasters like the word “governance.” I find the word disconcerting because “governance” is unfamiliar to me. The word is fuzzy and, therefore, ideal for the poobahs who advise organizations unable to find content about the reasons for the lousy performance of one or more enterprise search systems.
The article gallops through these concepts. I learned about the growing issue of managing and securing structured and semi-structured data within the enterprise. (Isn’t this part of security?) I learned that collaborative content technologies are on the increase, which is an echo of locking a file that several people edit in an authoring system.
I did notice this factoid:
IDC forecasts that the total digital universe volume will increase by a factor of 44 in 2020. According to the report, unstructured data and metadata have an average annual growth rate of 62 percent. More importantly, high-value information is also skyrocketing. In 2008, IDC found that 22 to 33 percent of the digital universe was high-value information (data and content that are governed by security, compliance and preservation obligations). Today, IDC forecasts that high-value information will comprise close to 50 percent of the digital universe by the end of 2020.
There you go. According to the article, metadata framework technology is a large part of the answer to this problem: collecting user and group information, permissions information, access activity, and sensitive content indicators.
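As a quick back-of-the-envelope check on the quoted 44x figure, assuming (my assumption, not a baseline stated in the quote) that it spans the eleven years from 2009 to 2020:

```python
# If the digital universe grows 44x over the 11 years from 2009 to 2020,
# the implied overall compound annual growth rate is:
factor, years = 44, 11
cagr = factor ** (1 / years) - 1
print(f"Implied overall CAGR: {cagr:.0%}")  # roughly 41 percent per year

# The 62 percent annual growth quoted for unstructured data and metadata is
# therefore a faster-than-average slice of that universe, not the whole of it.
```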
My view is to implement an editorial policy for content. Skip the flowery and made-up language. Get back to basics. That would be what I call indexing, a component addressed in an editorial policy. Leave the governance to the government. The government is so darn good at everything it undertakes.
Stephen E Arnold, March 16, 2011
Freebie
Publishers Fix Up Metadata
January 28, 2011
Publishers Weekly has posted an article by Nick Ruffilo of BookSwim.com which caught our attention. The write up describes the advantages of good metadata. Ruffilo also outlines a comparatively painless way to improve the metadata of all your publications.
Though far from new, metadata has recently received more attention in the publishing industry. Better late than never, we suppose. Now more companies understand that better control over your books’ metadata means more guidance over your publication’s path once it’s out of your hands.
Companies are understandably hesitant, however, to commit to a complete metadata-handling overhaul. Ruffilo insists that you can improve sales by making some gradual changes:
“It’s a strategy I like to call the five degrees rule: make your changes five degrees at a time. Focus on very basic metadata practices to start. Encourage both marketing and editorial department input on metadata. Most importantly, get the data right for all your books, and you’ll find that small steps can lead to big gains.”
Following his suggestions now may help generate more money in the future.
Cynthia Murrell, January 28, 2011
Wikileaks and Metadata
January 7, 2011
ITReseller’s “Working to Prevent Being the Next Wikileak? Don’t Forget the Metadata.” is worth a look. The write up calls attention to indexing as part of buttoning up an organization’s document access procedures.
ITReseller says this about metadata:
A key part of the solution is metadata – data about data (or information about information) – and the technology needed to leverage it. When it comes to identifying sensitive data and protecting access to it, a number of types of metadata are relevant: user and group information, permissions information, access activity, and sensitive content indicators. A key benefit to leveraging metadata for preventing data loss is that it can be used to focus and accelerate the data classification process. In many instances the ability to leverage metadata can speed up the process by up to 90 percent, providing a shortlist of where an organisation’s most sensitive data is, where it is most at risk, who has access to it and who shouldn’t. Each file and folder, and user or group, has many metadata elements associated with it at any given point in time – permissions, timestamps, location in the file system, etc. – and the constantly changing files and folders generate streams of metadata, especially when combined with access activity. These combined metadata streams become a torrent of critical metadata. To capture, analyze, store and understand so much metadata requires metadata framework technology specifically designed for this purpose.
Some good points here, but what raised our eyebrows was the thought that organizations have not yet figured out how to “index”. Automation is a wonderful thing; however, the uses of metadata are often anchored in humans. One can argue that humans need play no part in indexing or metadata.
We don’t agree. Maybe organizations will take a fresh look at adding trained staff to tackle metadata. By closing in house libraries, many organizations lost the expertise needed to deal with some of the indexing issues touched upon in the article.
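To make the “leverage the metadata” idea concrete, here is a minimal sketch of how permissions, access activity, and sensitive content indicators might be combined to shortlist at-risk files. The field names and the scoring rule are my own illustration, not the vendor’s actual framework:

```python
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    sensitive: bool          # content indicator (e.g., flagged by a classifier or a human indexer)
    open_to_everyone: bool   # permissions metadata
    accesses_last_30d: int   # access activity metadata

def shortlist(records):
    """Return sensitive files that are broadly exposed, most-touched first."""
    risky = [r for r in records if r.sensitive and r.open_to_everyone]
    return sorted(risky, key=lambda r: r.accesses_last_30d, reverse=True)

records = [
    FileRecord("/finance/payroll.xlsx", True, True, 120),
    FileRecord("/marketing/logo.png", False, True, 300),
    FileRecord("/legal/contract.docx", True, False, 15),
]

for r in shortlist(records):
    print(r.path, r.accesses_last_30d)
```

Note that the “sensitive” flag still has to come from somewhere, which is exactly where trained indexing staff earn their keep.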
Stephen E Arnold, January 7, 2011
Freebie
Forrester Tips on Metadata. Whoa, Nellie!
January 4, 2011
The growth of the Internet over the past couple of decades has created a massive amount of regulated and unregulated data that can be a headache for enterprises to sort through.
According to the ZDNet article “Metadata Virtualization and Orchestration Seen as Critical new Technology to Improve Enterprise Data Integration,” business leaders are now recognizing that managing and exploiting information is a core business competency that will increasingly determine their overall success. Therefore, they need to invest in technology and often third party solutions that will manage their data for them.
Noel Yuhanna, Principal Analyst at Forrester Research, and Todd Brinegar, Senior Vice President for Sales and Marketing at Stone Bond Technologies, help readers better understand these issues in a panel discussion.
The article asserts:
This discussion then examines how metadata-driven data virtualization and improved orchestration can help provide the inclusion and scale to accomplish far better data management. Such access then leads to improved integration of all information into an approachable resource for actionable business activities.
Behind all of the unbelievable double talk and jargon, there is a good point to this article. However, one must winnow.
Jasmine Ashton, January 4, 2011
Sponsored by Pandia.com
OCLC-SkyRiver Dust Up
December 16, 2010
In the excitement of the i2 Ltd. legal action against Palantir, I put the OCLC – SkyRiver legal hassle aside. I was reminded of the library wrestling match when I read “SkyRiver Challenges OCLC as Newest LC Authority Records Node.” I don’t do too much in libraries at this time. But OCLC is a familiar name to me; SkyRiver not so much. The original article about the legal issue appeared in Library Journal on July 29, 2010, “SkyRiver and Innovative Interfaces File Major Antitrust Lawsuit against OCLC.” Libraries are mostly about information access. Search would not have become the core function it is if it had not been for libraries’ early adoption of online services and their making online access available to patrons. In the days before the wild and wooly Web, libraries were harbingers of the revolution in research.
Legal battles are not unknown in the staid world of research, library services, and traditional indexing and content processing activities. But a fight between a household name like OCLC and a company with which I have only modest familiarity is news.
Here’s the key passage from the Library Journal write up:
Bibliographic services company SkyRiver Technology Solutions recently announced that it had become an official node of the Name Authority Cooperative Program (NACO), part of the Library of Congress’s (LC) Program for Cooperative Cataloging. It’s the first private company to provide this service, which was already provided by the nonprofit OCLC—SkyRiver’s much larger competitor in the bibliographic services field—and the British Library. Previously, many institutions have submitted their name authority records via OCLC. But SkyRiver’s new status as a NACO node allows it to provide the service, once exclusive to OCLC in the United States, to its users directly.
For me, this is a poke in the eye for OCLC, an outfit that used me on a couple of projects when General K. Wayne Smith was running a very tight operation. I don’t know how management works at OCLC, but I think any action by the Library of Congress is going to trigger some meetings.
SkyRiver sees OCLC as acting in an anti-competitive way. Now the Library of Congress has blown a kiss at SkyRiver. Looks like the library landscape, already ravaged by budget bulldozers, may be undergoing another change. I think the outline of the mountain range where the work is underway appears to spell out the word “Monopoly.” Nah, probably my imagination.
Stephen E Arnold, December 16, 2010
Freebie
Indexing and Content Superficialities
November 27, 2010
“Understanding Content Collection and Indexing” provides a collection of definitions and generalizations which makes clear why so many indexing efforts by eager twenty-somethings with degrees in Home Economics and Eighteenth Century Literature go off the rails: it takes more than learning a list of definitions to create a truly useful indexing system. In our opinion, the process should be about solving problems. As the article states:
The ability to find information is important for myriad reasons. Spending too much time looking for information means we’re unable to spend time on other tasks. An inability to find information might force us to make an uninformed or incorrect decision. In worse scenarios, inability to locate can cause regulatory problems, or, in a hospital, lead to a fatal mistake.
This list is a place to start. It does describe the very basics of content collection, indexing, language processing, classification, metasearch, and document warehousing. We have to ask, though: is this analysis inspired by Associated Content or Demand Media?
For the real deal on indexing, navigate to www.taxodiary.com.
Cynthia Murrell, November 27, 2010
Freebie
Which Is Better? Abstract or Full Text Search?
November 26, 2010
Please bear with us while we present a short lesson in the obvious: “Users searching full text are more likely to find relevant articles than searching only abstracts.” A recent BMC Bioinformatics research article written by Jimmy Lin titled “Is Searching Full Text More Effective than Searching Abstracts?” explores exactly that.
So maybe we opened with the conclusion, but here is some background information. Since it is no longer an anomaly to view a full-text article online, the author set out to determine if it would be more effective to search full-text versus only the short but direct text of an abstract. The results:
“Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraphs-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that highest overall effectiveness may be achieved by combining evidence from spans and full articles.”
Yep, at the end of the day, searching a bank of more words will in fact increase your likelihood of a hit. The implication is that the future must bring with it some solutions. Due to the longer length of full-text articles and the growing digital archive waiting to be tamed, Lin predicts that multiple machines in a cluster as well as distributed text retrieval algorithms will be necessary to effectively handle the search requirements. Wonder who will be first in line to provide these services…
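For readers who want the flavor of the span approach, here is a minimal sketch that combines the best paragraph-sized span with whole-article evidence. The naive term-overlap scoring and the 0.7 weight are my own simplifications; the paper’s actual ranking models are considerably more sophisticated:

```python
def score(text, query_terms):
    """Naive relevance score: count of query terms present in the text."""
    words = set(text.lower().split())
    return sum(1 for t in query_terms if t in words)

def rank_article(article_paragraphs, query, span_weight=0.7):
    """Combine the best paragraph-sized span with whole-article evidence."""
    query_terms = query.lower().split()
    full_text = " ".join(article_paragraphs)
    best_span = max(score(p, query_terms) for p in article_paragraphs)
    return span_weight * best_span + (1 - span_weight) * score(full_text, query_terms)

article = [
    "Abstract: we study retrieval effectiveness.",
    "Full-text spans let relevant passages stand out in long articles.",
    "Methods: we index paragraph-sized segments separately.",
]
print(rank_article(article, "full-text spans retrieval"))
```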
Sarah Rogers, November 26, 2010
Freebie