Wikipedia Looks Ahead To Web 3.0
July 15, 2010
As far as the Wikimedia Foundation is concerned, one of the cornerstones for moving the global resource to the next level and Web 3.0 will be making the data in the site’s 15 million articles decipherable to computers as well as to the humans pushing their buttons.
Last month’s 2010 Semantic Technology conference in San Francisco saw developers showcasing how the needed semantic structure might be added to Wikipedia. It’s a big idea for a big database. Still, there is a question as to the real value of the move.
The people attending the conference from Wikipedia were also actively recruiting help to make the base of the website more accessible to both computers and software.
One of the questions is how to determine the benefits when the service is implemented.
Rob Starr, July 15, 2010
Freebie
Facebook Users Mean More Business For Us: Google
July 14, 2010
Google Inc Chief Executive Eric Schmidt isn’t intimidated by the likes of new technology or social media at all, according to recent statements. Google co-founder Sergey Brin had an even more optimistic spin when he told reporters on Thursday that the more Facebook users there were, the more of them actually wound up doing searches on Google.
There was even a hint of a long-running tension between Apple and Google over smart phone technology at the Allen & Co conference in Sun Valley. It all sounds like a lot of rhetoric, posturing and the like.
Apple and Google were one-time allies before they faced off in lucrative markets like mobile advertising and smart phones. However, it does seem like there’s quite a bit of validity there when Google points out that more Facebook users mean more business for them.
Is Google just smarter than everyone else, perceiving things that others cannot? Or is Google looking at the world from its own reality distortion field?
Rob Starr, July 14, 2010
Does a Broken Chain of Trust Lead to Web Site Traffic?
July 14, 2010
Old is new again on the Web, or so it seems. A Reader’s Write on p2pnet.net has brought up an old argument about the need to try something different and take the Net back from the point where some users feel the chain of trust has broken down and made the Internet a place where the software industry develops only that which has a dollar sign attached.
The argument here is that while the end-to-end architecture, which draws its power from the leaves of the network, is good, it has been manipulated to the point where the new topologies being used are inefficient.
According to the article, the chain of trust breaks down because the algorithms that collect the information about us as we surf are more than likely distorting it somehow as it’s being reused. In the end the project is all about developing a social web search, but nothing about it is new. It was first proposed in 2006.
But of course the radical, and some would say naïve, opinion doesn’t stop there. Seeks is the answer, at least partially, because it connects people using collaborative functionalities on top of existing search engines, according to the report.
The result?
People who search the web with similar queries are connected. Seeks will also afford users a self-publishing mechanism. Anyone who has access to the Internet will be able to join in. Finally, an index of information will be created that will gradually recapture the information held by big corporations.
Some heady ideas here. Still, there are a few more practical things that the people who are espousing this need to keep in mind. Namely, vendors and businesses like to have capabilities like those of Exalead. This is just one of the concessions that needs to be baked in to make this whole concept palatable to everyone who would need to use it.
The ideas are good, but the verdict is still out on whether the web’s broken chain of trust leads to Seeks. Seeks that become clicks are of value.
Rob Starr, July 14, 2010
Cambridge Semantics, Simplifying Information Exchange for Business
July 14, 2010
Semantic technology has often been viewed as something better left to the IT professionals with the ability and know-how to track important business information and sort through what’s important to businesses’ everyday operations.
Now a Boston company, Cambridge Semantics, is looking to change that with semantic middleware that will benefit the end user and allow them to use semantic technology without the technical expertise. The hope is that it will help to make sense of some of the information that is stored within Excel spreadsheets.
At first glance, this looks like an interesting prospect but this Anzo software could be a stretch, especially where numbers are concerned. Still, this attempt by Cambridge Semantics at simplifying the exchange of information for business is well conceived.
Rob Starr, July 14, 2010
Facebook Now a Springboard
July 14, 2010
Those of us in Internet marketing aren’t surprised. We saw it coming, and we will all now stand in a line and scream ‘I told you so’ at all those who thought that the social media frenzy might have just been a fad.
According to an article on ReadWriteWeb.com, Gigya, a company that provides social optimization platforms for firms that want to take advantage of these new tools, reports that Facebook is the most common jumping-off point for people logging in to other sites from social media. The gap that was widening last January is getting bigger too. Presently Facebook accounts for 46 percent of logins from social media.
Strange how the real competition from Google is coming from social media and not Bing. Maybe it’s time real innovators start targeting that site for some competition since Facebook is the preferred starting point for surfing when it comes to social media.
Rob Starr, July 14, 2010
Freebie
China Keeps An Eye On Facebook
July 14, 2010
No one could have realistically expected that Google was going to be the only electronic medium that the Chinese wanted to keep an eye on. The uprisings in Iran taught all the despots that information is power and control. It’s precisely that kind of control the Chinese won’t relinquish easily—not to Google and certainly not to Facebook.
So it’s not a shock that a Chinese government-backed think tank has accused social media sites like Facebook of being agents of Western governments and called for stepped-up scrutiny of them.
What doesn’t help is U.S. Defense Secretary Robert Gates saying that these new media are a “huge strategic asset.” That kind of talk gives China all the ammunition it needs to keep an eye on Facebook and other social networking sites.
U.S. companies, although powerful within their national borders, are not on a par with countries, we assert.
Rob Starr, July 14, 2010
Freebie
Lucene Revolution Preview: Otis Gospodnetic, Sematext
July 13, 2010
The Lucene Revolution Conference is shaping up. Among the presenters are open source developers representing a wide range of organizations. One of the speakers is Otis Gospodnetic, Sematext’s founder. Mr. Gospodnetic is also the author of Lucene in Action with co-authors Erik Hatcher and Michael McCandless. His firm implements open source search, natural language processing, and text analytics technology in the enterprise. His team focuses on the design and development of scalable, high-performance search and solutions.
I spoke with Mr. Gospodnetic earlier this week. Here are the highlights of our conversation:
Why are you interested in Lucene/Solr?
I’ve always been interested in information gathering, information extraction, search, and related areas. I think that’s because I feel that information gathering, extraction, and searching are precursors for gaining knowledge, and knowledge has always been a hobby of mine. If I look back at all my professional experience, everything I ever built had a strong search component. This is why I was happy when I stumbled upon Lucene around 2000 and why I immediately joined the project, even before it was an Apache project, and why I’ve been using Lucene ever since.
What is your take on the community aspect of Lucene/Solr?
Community around Lucene and Solr is as real and as alive and active as it can be. It’s very knowledgeable and quick to help. I’ve been a part of it for around 10 years now, and have witnessed the community grow, as well as its knowledge breadth and depth increase.
When it comes to Lucene/Solr community, the quote I like to give comes from the former Netflix search guy:
I posted, went to get a sandwich, and came back to see two answers. The change works, and I can get the fix into production today. This list is magic.
Both user and development communities are so strong and active that it’s becoming really hard for people to keep up with the volume of output these communities produce. Earlier this year we started publishing monthly Lucene and Solr Digest blog posts. These posts are for people who want to keep up with (or keep an eye on) Lucene and Solr, but don’t have the time to read some 60+ non-trivial-to-read email messages these communities produce every day. See http://blog.sematext.com/ or http://twitter.com/sematext . I hope we are not going through the trouble of getting this published every month just because of some mythical community!
Commercial companies are playing what I call the “open source card.” Won’t that confuse people?
Judging from the demand, I’d say this is not confusing to people. On the contrary, I get the feeling they like the open-source/commercial blend. Plus, there is precedent: commercial support for open-source software has been around for many years now. MySQL and Red Hat have been doing this for years. Not only is this not confusing, it is welcomed. Some people and organizations love and can rely on the community support. Others prefer paid support. At Sematext we do both. Some of us participate on Lucene/Solr mailing lists, helping as much as we can via that channel. We also publish the already mentioned monthly Lucene and Solr Digest that summarizes the new and interesting developments from those two projects, and we offer paid tech support and other types of services for Lucene, Solr, Hadoop, and other related technologies.
What are the primary benefits of using Lucene/Solr?
Let me highlight the points my work has driven home as pivotal.
First, there is the notion of TCO or total cost of ownership. TCO is *much* lower. There are no license fees, no limitations about the index size, query rates, number of servers, etc.
Second, Lucene/Solr offer flexibility. If you don’t like how something works in Lucene/Solr, you can change it today and deploy it tomorrow. If your use case is good, the community will adopt it and you won’t have to maintain your customized, forked Lucene/Solr version.
Third, quality. Lucene and Solr are mature. They’ve been worked on by many smart people 24/7 around the world for more than 10 years. These people work on Lucene/Solr because that is their passion, not because they are paid to do so, except for the lucky few who also get paid to work on what they love. Lucene and Solr can do a lot – they have lots of features, they are reliable, they are still being worked on and are improved on a daily basis.
And, finally, agility: You need search? You can have something working today. You don’t have to go through budget approvals, through long sales and negotiation cycles, you don’t have to go through wine and dine dates that just create delays that ultimately increase your costs.
When someone asks you why you don’t use a commercial search solution, what do you tell them?
I tell them to wake up. It’s 2010. There are alternatives. Cheaper. Faster. Better. I tell them to read the answers to the previous questions. When I see how much some (all?) of the commercial search solutions cost and I compare that to what we at Sematext can do for a customer for that sort of money… I recently happened to see a quote from one well-known commercial search vendor and my jaw dropped. Well, not really, because I know they charge an arm and a leg for their software, but when you think about how many kids you can put through college for that kind of money.
Let me also quote something that came up recently in a thread titled “Arguments in Favor of Lucene over Commercial Competition”.
In my initial foray into Lucene several years ago, by the time I’d sent a support request to the vendor of a commercial product and received an answer telling me that I hadn’t included the correct license info and I’d have to provide it before they could talk to me, I’d found Lucene, downloaded it, indexed some of our data and run searches against it. Not to mention that rather than waiting for days to get a response from the commercial vendor, my questions on the Lucene user’s list were answered within a very few hours. With grace and tolerance for my ignorance.
How do people reach you?
Sematext is at http://sematext.com/ and that is the best way to reach the professional me. Our blog and the Digest posts mentioned earlier are at http://blog.sematext.com/ . We are also at http://twitter.com/sematext if you prefer us in 140 char bites.
Will you elaborate on these points in your Lucene Revolution lecture?
Absolutely. Looking forward to the conference and hearing the great speakers. I understand Cisco is giving a talk too.
Stephen E Arnold, July 13, 2010
Post sponsored by Lucid Imagination and the Lucene Revolution Conference.
Government Scrapping Sites: Are Traditional Web Methods Dead?
July 13, 2010
What more proof do the average consumer and business person need that the traditional website is antiquated and doesn’t generate the needed leads and traffic than the recent story on computing.co.uk about the government scrapping websites?
The UK government plans to do away with 75% of its more than 800 websites. The problems uncovered seem to fall into three areas:
- Cost
- Usage
- Resource sharing
It’s clear that a better way is needed: a more cost-efficient way to reach the people you want to reach. The Arnold IT way. Find everything that you need with social networking and the Beyond Search Team. Build your brand, generate leads and/or create a community.
When the UK government starts to worry that websites aren’t working, something is wrong. When the government starts scrapping Web sites, is this a signal that new methods of communicating are needed? Is the Google era winding down?
Rob Starr, July 13, 2010
Freebie
A Factoid from Dell Computer
July 13, 2010
“Dell: 90% of Data Is Never Read Again” appeared on PC Pro, a UK Web site. The article presented data from Dell Computer that asserted “90% of company data is written once and never read again.” The write up contains some azure chip stuff; for example:
It’s an odd statistic. How is that data measured? 90% of all documents? 90% of stored bytes? When they said “ever again” did they mean explicitly retrieved by name, or should we include free text searches in that statistic? How long an interval needs to pass before some piece of data is clearly identified as belonging to the 90%, so that steps can be taken to reflect its reduced importance?
Anyone hear about offline storage, near line storage, and online storage? Certainly not at Dell, an outfit trying to boost its storage revenues and its knowledge of what companies do with their data.
One of the challenges of enterprise search is to index information and deliver relevant results. Popularity based systems—like the method used in the original Google Search Appliance—don’t work in organizations. Google figured this out and adapted its system. Specialized vendors, including Index Engines, built their business around the fact that once data are archived no one knows what’s there or how to find it.
Modern search and content processing systems are tough to configure for many reasons. One of them is the fact that most information tucked on an organization’s computers is lost. Only a handful of systems deliver what an employee needs to make a business decision. That information is usually relatively recent data. The write up descends into the weeds of which storage systems are going to ring the journalists’ and consultants’ chimes.
The topic I wanted to see addressed was ignored: search, indexing cycles, relevance, and other trivial questions. Buying hardware is more important I suppose.
Stephen E Arnold, July 13, 2010
Freebie
Oui Oui to Dok Dok
July 13, 2010
It’s no surprise that email is the primary way businesses share documents and personal users share their information, and that the attachment is the modern envelope. The paradoxical problem with this method has been sheer numbers and categorizing, and there have always been many people working on streamlining this part of the Web experience.
As far as the flow of a typical business day is concerned, the Holy Grail of embedded findability for attachments has centered on three areas:
- Verifying the most recent attachment, because (at least where business is concerned) there can be multiple ones from the same source
- Tracking the changes, which makes sense where business is concerned
- Sharing changes with others
Those were the goals. And all had to be accomplished without interrupting the flow of a typical business day. This is a lucrative proposition if done right, and the Canadians couldn’t ignore the possibilities. Their answer is DokDok, which the Quebec firm says is an automatic way for its users to locate, update and share the most recent version of any email attachment.
The Montreal-based startup was created in 2009, and it promises that DokDok is not anything like the file sharing applications that are trying to replace email. Of course, one of the big questions that any prospect will have here is about security.
People want to know if DokDok will be reading their emails or at least have the ability to do so. To the firm’s credit, the answer is no. DokDok only indexes metadata, none of the real content in the actual emails.
You don’t even need to give them your Google Apps password if security is your big issue. Still, nothing’s perfect and there are a few drawbacks to DokDok. It’s important to remember here that the new system works only with Gmail’s web interface and does not work with the standard Gmail account.
However, when they’re out of beta, the firm promises big changes.
It’s good to see that Google understands its services can use improving when a good idea comes along that helps to streamline a business day and increase productivity. That’s why it’s Oui Oui to DokDok.
Rob Starr, July 13, 2010
Freebie