December 3, 2012
SearchHub.org is the latest open source community resource offered by LucidWorks in support of Lucene and Solr developers specifically. More than a blog or a forum, SearchHub is an interactive community to exchange ideas. One new item of interest is a session video, “Solr 4: The SolrCloud Architecture.” Read this description to see if this video might be helpful for you or your organization:
“In this talk, Lucene/Solr committer Mark Miller will discuss the low level architecture and design decisions around SolrCloud and distributed indexing. Come learn about the latest work on Solr’s new scaling and fault tolerance solution – how it works and how we built it.”
In addition to this session video, there are screencasts, other conference videos, and many how-to instructional pieces. Also, there is a wonderful compilation of resources on the Reference Materials page. Documentation, comparisons, white papers, and tutorials are all included.
SearchHub.org is another way for LucidWorks to give back to the open source community, supporting Apache Lucene and Solr. However, some users may benefit even more from the utilization of LucidWorks products including LucidWorks Search and LucidWorks Big Data. These products are ready to go out-of-the-box and are supported by the industry-vetted power of LucidWorks.
Emily Rae Aldridge, December 03, 2012
August 31, 2012
We think studies about oneself are fascinating. TechEYE.net shares our enthusiasm in “Wikipedia is Accurate Says, er, Wikipedia Study.” Last autumn the Wikimedia Foundation tapped Epic, an “e-learning” company, and researchers at Oxford University to perform an assessment of Wikipedia’s accuracy. The results of the reflectively funded study? Wikipedia was found to be more accurate than Encyclopaedia Britannica. What an upset! Writer Nick Ferrell notes:
“For the record, if you wrote a page on Wikipedia about yourself, you would find that one of its teams of editors had deleted it for being advertising. However when Wikipedia commissions a study into itself and reports that it is wonderful, this is apparently ok.”
Apparently. Incidentally, a 2005 external peer review showed an average of four mistakes per article, as compared to Britannica’s three. The free encyclopedia has improved markedly, it seems. The new report also found Wikipedia articles tend to be more up-to-date. No surprise there; I’ll give them that one, at least.
“What makes us smell a rat is that the report said that there were little differences between the two on style and overall quality score. We were not aware that the Encyclopaedia Britannica articles were penned by a person with a crayon, like some of the Wikipedia articles appear to have been. Nor does the Encyclopaedia Britannica employ people with faked doctorates.”
Good point. I think I’ll wait on an objective study before I draw any conclusions.
Cynthia Murrell, August 31, 2012
August 4, 2012
Makeuseof presents a handy collection of vertical search sites in “Can’t Find a User Manual for Your Gear? Search These Specialist Websites.” Writer Saikat Basu observes that, in the excitement of a new purchase, most of us stuff our user manuals into some corner and forget about them—until we need them! He comments:
“User manuals – those thick (or thin) soft covered sheaf’s of paper with multi-lingual instructions and weird hieroglyphics that we don’t bother to read. . . . We all have rummaged through the house looking for the user manual we ‘misplaced’. No luck.
“Here’s where a bit of smarts comes in. The meticulous guy with foresight will either scan it and keep a softcopy in his computer, or look for a softcopy that’s usually available as PDF on the manufacturer’s site.
“There’s a third option – a bunch of specialist websites which does the hard work for us lazybones, and stockpiles user manuals for us to search and download.”
So, instead of combing through the filing cabinet or, worse, those paper-piles every office seems to collect, turn to this list of sites that can put the desired information at your fingertips at the speed of, well, of your Internet connection. Basu details six sites, describing the purpose behind each, how it works, and what he values most about each one. For example, he likes the forums on Safe Manuals, and appreciates the teardown diagrams at iFixit.
The other four sites that made the list include Retrevo, Manuals Online, eSpares, and Free Manuals (aka TheManuals.com). I recommend tucking the article away for your next manual-related urgency. At the end of the article, Basu puts out the call for reader recommendations, so check the comments section for similar sites.
Cynthia Murrell, August 4, 2012
July 24, 2012
Dictionaries become part of our lives shortly after we start to read, and many of us remember the classic textbook copy of Webster. The old texts seem to gather dust, and the addition of a crowd-sourced dictionary will not increase their popularity.
A new dictionary is in the works according to Stylist Magazine’s article “The World’s First Crowd-Sourced Dictionary.” Dictionary publishers Collins are inviting the general public to contribute to their online dictionary, and become involved in the evolution of the English language.
This new online reference will contain not only words, but some of the phrases from slang between friends to abbreviations, jargon or made-up buzzwords, all input by the users.
Anyone can take part in the process, and submitting content is simple:
“Users just need to log and submit their phrase of choice, which will go through an editorial evaluation and if accepted appear on the definition page, with your name forever imprinted as the creator of that word.”
“If there’s a word you use with your friends that you think is absolute genius, now’s your chance to let the world know. Collins will also giving away prizes to a person who submits a word every day until the 31st August 2012.”
What makes Collins stand out from other user-content sources like Wikipedia is the moderation and approval step. A crowd-sourced dictionary is not only an interesting concept, but may bring the occasional chuckle as we watch trendy buzzwords come and go.
Jennifer Shockley, July 24, 2012
June 26, 2012
We have stumbled upon an interesting site. Prochronism.com is the project of Princeton History grad student and Harvard Cultural Observatory fellow Ben Schmidt. It tracks lingual anachronisms (words or phrases that are not in their correct historical or chronological time) heard in period TV shows. Schmidt creates word clouds and charts that graphically represent the usages of such language. He also offers commentary. For example:
“The worst phrase, at 30x more common, is ‘status meeting.’ It’s a very rare term in either period, which means that we might be able safely to ignore it: but there are a lot reasons not to. It falls pretty readily into the category I discussed in my Atlantic piece of Mad Men dropping 70s and 80s corporate speech in the 1960s recklessly; the very few places it is used in the 1960s seem to slant towards the government/engineering end of the spectrum, making it out of place at a creative small startup; and the Ngram curve veers pretty sharply up around the Carter/Reagan great divide.”
Picky? Perhaps, but we language folks can get that way. What’s interesting to us, though, is the juxtaposition of text mining and the boob tube. What does such a focus say about America’s intellectual bifurcation?
The sun may not rise. TV writers drag themselves out of bed late in the day anyway and may miss the news about their egregious disregard of TV lingoing.
Cynthia Murrell, June 26, 2012
Sponsored by PolySpot
June 25, 2012
The Center Square Journal recently published “Meet Julie Lynch, Sulzer Library’s Historical Search Engine,” an article that introduces readers to the librarian who oversees the archive of manuscripts, maps and photographs donated by residents of Chicago’s neighborhoods north of North Avenue.
According to the article, the Northside Neighborhood History Collection encompasses more than 30 collections that document the history of schools, religious institutions, neighborhoods, homeowners’ associations, local businesses, community leaders, parks, the Chicago River, and the streets and transportation in communities located north of North Avenue to the city limits on the east, west and north sides of Chicago.
Due to the nature of her work, Lynch is the human equivalent of a search engine. However, she differs in one key aspect:
“Unlike Google, Lynch delivers more than search results, she provides context. That sepia-tinged photograph of the woman in funny-looking clothes on a funny-looking bicycle actually offers a window into the impact bicycles had on women’s independence. An advertisement touting “can build frame houses” demonstrates construction restrictions following the Great Chicago Fire. Surprisingly, high school yearbooks — the collection features past editions from Lane Tech, Amundsen and Lake View High Schools — serve as more than a cautionary tale in the evolution of hairstyles.”
Despite the increase in technology that makes searching information as easy as tapping a touch screen, this article reiterates the importance of having real people to contextualize these documents.
Jasmine Ashton, June 25, 2012
June 18, 2012
There is a new status code in the online world, and though it honors Ray Bradbury by reference, it might cause problems for others that do too. The article “451: Web Censorship Status Code” explains the proposed code, which is meant for cases when a server is subject to legal restrictions that prevent it from servicing a request. One has to wonder: is this error code bad news for a consulting firm with the same name, 451?
It is quite possible that issues will arise. The article explains:
“A Google Android developer and co-creator of one of the first Web search engines, Open Text and XML, proposed to the Internet Engineering Task Force that code 451 be used for, “when resource access is denied for legal reasons.”
“Now, I haven’t made an error when making this request. Furthermore: The server understood the request, but is refusing to fulfill it. In this case, the server did not even see the request. It was intercepted by my ISP and rejected by them on legal grounds. Therefore, none of the existing “error” codes fitted.”
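To make the proposal concrete, here is a minimal sketch of what a 451 response might look like on the wire. The blocked path, the helper name `build_response`, and the body text are all hypothetical, and the reason phrase "Unavailable For Legal Reasons" is an assumption on our part; the quoted passage describes only the purpose of the code.

```python
# Hypothetical set of paths that may not be served for legal reasons.
BLOCKED_PATHS = {"/banned-book"}


def build_response(path):
    """Build a raw HTTP/1.1 response string for the requested path."""
    if path in BLOCKED_PATHS:
        # 451 signals that access is denied for legal reasons.
        # The reason phrase is an assumption, not taken from the article.
        status, reason = 451, "Unavailable For Legal Reasons"
        body = "Access to this resource is denied for legal reasons.\n"
    else:
        status, reason = 200, "OK"
        body = "Hello.\n"
    # The body is plain ASCII, so len(body) equals its length in bytes.
    return (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


if __name__ == "__main__":
    print(build_response("/banned-book"))
```

The point of a dedicated code is that the client can tell a legal block apart from a plain "forbidden" or "not found" answer.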
Ironically, the name of this code and the names of the companies below both come from Ray Bradbury’s classic science-fiction novel, Fahrenheit 451. Now 451 Research, a leading global analyst and data company, and 451, which has been in business since 1995 developing interactive design, might be afflicted with issues.
Internet censorship was commonplace in Middle Eastern countries and former dictatorships like China, Iran and Egypt. Now, to the disbelief of millions, it has come to the United States. This code could cause havoc for innocent companies of the same name.
Jennifer Shockley, June 20, 2012
June 16, 2012
Want to dive into analytics as a data scientist? Get started with Stonehill College’s “How to Read Mathematics.” The well-structured article by Shai Simonson and Fernando Gouvea details the reading protocol that will allow anyone to get the most out of reading mathematical explanations as opposed to, say, reading poetry or fiction. The authors explain:
“Students need to learn how to read mathematics, in the same way they learn how to read a novel or a poem, listen to music, or view a painting. . . . Mathematical ideas are by nature precise and well defined, so that a precise description is possible in a very short space. Both a mathematics article and a novel are telling a story and developing complex ideas, but a math article does the job with a tiny fraction of the words and symbols of those used in a novel.”
The article goes on to explain common mistakes math readers make, such as missing the big picture for the details, reading passively, and reading too fast. A wealth of tips for understanding math texts follows, including examples. Much of this is information I knew, but had trouble articulating when my son was in pre-calc. How I wish I had had this piece then!
For anyone looking at a math-heavy field like data analytics, this article is a must-read.
Cynthia Murrell, June 16, 2012
June 12, 2012
Interesting. TNR Global has made a couple of whitepapers available at their Web site; one is on FAST ESP to Solr migration and the other on Elasticsearch evaluations. To see the papers, navigate to the TNR Global home page at http://www.tnrglobal.com/ and click on their titles on the right of the screen. The documents are free, but registration is required. The description of the FAST to Solr paper reads:
“Is your company using Microsoft FAST ESP on a Linux platform? Many companies have enjoyed the power of Microsoft FAST ESP as their search platform. Unfortunately, Microsoft announced in 2010 they will cease technical support for FAST ESP 5.3 after its 5 year lifecycle for anyone using Linux as their operation system. Migration to another search platform will be a priority, and business leaders and technology professionals are looking closely at Apache Solr as a solution.
“In our White Paper, we compare the two engines, and review different tools to ease the transition from FAST ESP to Apache Solr.”
The folks at TNR Global have more than a decade’s worth of experience in enterprise search and cloud computing solutions. They specialize in FAST ESP and Lucene Solr search solutions for: news sites; publishing; Web directories; information portals; Web catalogs; education; manufacturing and distribution; customer service; and life science professionals. Founded in 2004, the company is located in Hadley, MA.
Cynthia Murrell, June 12, 2012
June 4, 2012
How refreshing it was to see a free book available on linked data. Linked Data: Evolving the Web into a Global Data Space is being offered at no cost as an HTML document. PDF and hard copy versions are also available for a fee.
Linked Data involves using the Web to connect related data that wasn’t previously linked. Sometimes it is also about using the Web to eliminate the roadblocks to linking data that was previously linked using other methods.
The first chapter of the book discusses how structure can either enable or prohibit sophisticated processing. As a general rule of thumb, the more well-defined the structure of the data the easier it is for people to process it.
The book continues this discussion:
“While most Web sites have some degree of structure, the language in which they are created, HTML, is oriented towards structuring textual documents rather than data. As data is intermingled into the surrounding text, it is hard for software applications to extract snippets of structured data from HTML pages. To address this issue, a variety of microformats have been invented.”
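To illustrate the problem the passage describes, here is a rough sketch of pulling structured data out of an HTML snippet marked up with a microformat. The sample markup, the class `MicroformatParser`, and the helper `extract` are our own assumptions for illustration; `fn` and `tel` are property names from the hCard microformat. It is a simplification, not a full microformat parser.

```python
from html.parser import HTMLParser


class MicroformatParser(HTMLParser):
    """Collect the text of elements whose class list names a wanted property.

    Simplification: assumes a property element holds only text,
    with no nested tags inside it.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # property being read right now, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self.current = name
                self.found[name] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the property currently being read.
        self.current = None


# Hypothetical page fragment: contact data mixed into ordinary prose.
SAMPLE = (
    '<p>Call <span class="vcard"><span class="fn">Jane Doe</span> at '
    '<span class="tel">555-0100</span></span> today.</p>'
)


def extract(html, wanted=("fn", "tel")):
    """Return a dict mapping each wanted property to its text."""
    parser = MicroformatParser(wanted)
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    print(extract(SAMPLE))
```

Without the class names, a program would have to guess which words in the sentence are the name and which are the phone number; the markup makes that explicit.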
Those interested in best practices for exposing, sharing, and connecting pieces of data should definitely check out this free information.
Megan Feil, June 4, 2012