A Roundup Of NLP

March 6, 2014

If your research on natural language processing software has stalled for lack of resources, take a look at Connexor’s “NLP Library.” Connexor is a company that develops text analysis software components, solutions, and services. They are experts in their line of work and are keen to help people utilize their data to its full extent. Connexor explains that:

“Connexor components have turned out to be necessary in many types of software products and solutions that need linguistic intelligence in text analytics tasks. We work with software houses, service providers, system integrators, resellers and research labs, in the fields of education, health, security, business and administration. We have customers and partners in over 30 countries.”

The company’s NLP Library includes bibliographic citations for articles. We can assume that Connexor employees wrote these articles. They cover a variety of subjects dealing with natural language processing and text evaluation, and they even touch on emotion extraction from text. These articles are a handy resource, especially if you need up-to-date research. There is only one article for 2014, but the year is still young and more are probably on the way.

Whitney Grace, March 06, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

AI from Carnegie Mellon Specializes in Images

March 6, 2014

Here’s yet another personified AI attempting to mimic the human brain, and this one focuses on processing pictures. TechRadar invites us to “Meet NEIL, the Computer that Thinks Like You Do.” A team from Carnegie Mellon has developed NEIL (Never Ending Image Learner) specifically to interpret images and make connections between them. Writer Dean Evans reports:

According to Xinlei Chen, a PhD student who works with NEIL, the software “uses a semi-supervised learning algorithm that jointly discovers common sense relationships – e.g. ‘Corolla is a kind of/looks similar to Car’, ‘Wheel is part of Car’ – and labels instances of the given visual categories… The input is a large collection of images and the desired output is extracting significant or interesting patterns in visual data – e.g. car is detected frequently in raceways. These patterns help us to extract common sense relationships.”

As the ‘never ending’ part of its name suggests, NEIL is being run continuously, and it works by plundering Google Image Search data to amass a library of objects, scenes and attributes. The current array of information includes everything from aircraft carriers to zebras, basilicas to hospitals, speckled textures to distinctive tartan patterns.

Of course, NEIL is not perfect; it has incorrectly linked windmills with helicopters and radiators with accordions, for example. Still, its success rate was pegged at 79 percent in a random sample. See the article for more information on how the system works.

NEIL might be considered the little brother to NELL, the Never Ending Language Learner, also built by researchers at Carnegie Mellon. NELL’s specialty is “to ‘read the web’ and to extract a set of true, structured facts from the pages that it analyses.” NELL has been at it since 2010, and has come to over two million conclusions. Will the University continue adding to the family?

Cynthia Murrell, March 06, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Endeca Experience Manager Upgrade

March 6, 2014

The folks at Thanks Media offer a helping hand with their post, “How to Migrate to Endeca Experience Manager / Key Upgrade Factors.” Writer Maracel Munoz introduces his advice for those looking to upgrade from Page Builder:

“I’ll be straight, the upgrade from Page Builder to Experience Manager isn’t the kind of migration you can just install a software patch, run a few tests in the development and staging environments, and roll it out to production in a day or two. It’s going to require some significant changes and the level of effort isn’t trivial, but the functionality and productivity enhancements will make it well worth the effort.

“Experience Manager is full of great features that will benefit your business processes and help you provide an even better customer experience which I’ll talk about in another post, the focus of this article is to provide an overview on a few things you should be aware of before you dive into the upgrade documentation and start your migration.”

Yes, there are several reasons to allow a generous amount of time for this project. For example, the new code base means the creation of new custom cartridges and templates. Also, component communication paths have been changed, and rule groups must be reorganized. It was point number four, though, that really caught our attention: there’s no automated way to migrate content—you’ll have to move content from the old environment to the new one manually. Munoz frames this factor positively by pointing out that it is “a great opportunity to clean house and improve the integrity of your data.” That’s one way to look at it. See the article for more if you see this upgrade in your future.

The basic documentation for Endeca Experience Manager can be found here [pdf]. Founded in 1999 and based in Cambridge, Massachusetts, Endeca was acquired by Oracle in 2011. Endeca has long been at the forefront of faceted search technology, particularly for large e-commerce and online library systems.

Cynthia Murrell, March 06, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Planning a SharePoint Deployment

March 6, 2014

Most people know, at least in the back of their minds, that installing SharePoint is a complicated process, but few are willing or able to prepare for it appropriately. Search Windows Server offers some good reasons for proper planning in their article, “How Simple Planning Can Prevent Common SharePoint Deployment Snags.”

The article elaborates on the problem of a lack of planning for installation:

“It’s possible to install most applications and immediately begin using them, but this isn’t really the case with SharePoint. Even though SharePoint is technically an application, it’s better described as an application framework. SharePoint isn’t something you should simply install and then allow users to immediately begin using — you’ll most likely have problems if you do.”

Stephen E. Arnold has heard this same advice many times and has conveyed it himself through his Web site ArnoldIT.com. His coverage of SharePoint stems from a longtime interest in all things search. His reporting also supports the benefits of planning and proper implementation, as user experience improves when the proper planning is done.

Emily Rae Aldridge, March 6, 2014

New MaxxCAT Search Appliance Supports the Rackless

March 5, 2014

Small businesses that employ desktop servers may want to check out MaxxCAT’s latest offering. Virtual-Strategy Magazine announces that “MaxxCAT Brings Search Appliance to Convenient Desktop Form Factor.” They say the idea came from a customer with a desktop server and no rack space; it is nice when companies respond to customer feedback. The press release elaborates:

“The new line of desktop search appliances features a case that is suited for customers needing high performance search but lacking rack space. It is particularly suited to businesses employing tower servers. The new desktop series retains the price and performance MaxxCAT is known for, starting at $2,995 for the 250GB SB-250d capable of handling 2,500 executed Queries Per Minute (QPM). For larger collections or greater performance requirements, the SB-350d is available for $3,995 and features a 500GB index storage size and is capable of handling 5,000 executed Queries Per Minute. Both appliances will come with MaxxCAT’s standard one year of email support and software updates as well as one-year hardware warranty.”

The company is wisely integrating that higher-capacity version, the SB-350d, into its existing education and non-profit programs. Based in Pittsburgh, Pennsylvania, MaxxCAT launched in 2007. Though its focus is on specialized, high-performance enterprise search appliances, the company also provides integration services and managed hosting. MaxxCAT also prides itself on providing quick and painless deployments, which is particularly important for small businesses with limited resources.

Cynthia Murrell, March 05, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Free Online Commerce Advice from SLI Systems

March 5, 2014

Here’s a free book anyone managing e-commerce should look into. SLI Systems’ blog, Site Search Today, suggests we “Create a Better Shopping Experience with Refinements.” Writer Kemberly Gong advises:

“If users encounter a large set of search results for a particular keyword, they can also feel overwhelmed. As an e-commerce merchant, you can help them tremendously by offering refinement options to narrow numerous product results to a manageable set. While site search is considered one of the most essential elements to an e-commerce site, refinements are just as crucial in guiding shoppers to the right product, and to a sale.

“Our new Big Book of Site Search Tips for 2014, available for free download, is filled with ideas for improving this vital part of your site search solution. Here are just a few of the suggestions from the Big Book – and you’ll also find more than 100 tips on everything from search box placement to merchandising in the Big Book.”

The tips Gong cites include: “make refinements intuitive”; “pick the right place for refinements”; “refinements for ratings and reviews”; and “allow users to navigate between refinements.” See the article for more on each of these points. Better yet, see here to download the free book.

More than 500 e-commerce sites use SLI Systems’ services. The company, founded in 2001, prides itself on its customer service. SLI Systems maintains offices in San Jose, California; London; Melbourne; and Christchurch, New Zealand.

Cynthia Murrell, March 05, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Stanford Offers Free Machine Learning Tool

March 5, 2014

A team from Stanford is bringing machine learning to the masses, for free. Is this bad news for the for-a-fee text analytics vendors? Stanford Engineering announces, “Stanford Scientists Put Free Text-Analysis Tool on the Web.” Writers Andrew Meyers and Tom Abate explain:

“The etcML website is based on machine-learning techniques that were developed to analyze the meaning embodied in text, then gauge its overall positive or negative sentiment. To access this computational engine, users drag and drop text files into a dialog box. ‘We wanted to make standard machine learning techniques available to people and researchers who may not be able to program,’ said Richard Socher, a doctoral candidate in computer science at Stanford and lead developer of etcML. Socher said the new site gives researchers and citizen activists in fields ranging from political science to linguistics an easy way to analyze news articles, social media posts, closed-caption transcripts of television newscasts and other texts of possible interest. ‘All users have to do is copy and paste, or drop their text datasets into their browser and click,’ Socher said.”

Several Stanford-affiliated folks have already leveraged the beta version of etcML. Rebecca Weiss, who studies political polarization and media coverage in her doctoral work, uses the tool to classify words and phrases and to tease patterns from millions of articles and transcripts. Meanwhile, computational linguistics researcher Rob Voight has employed etcML to determine what factors make a Kickstarter pitch most successful. Computer science doctoral student Chinmay Kulkarni has also put the solution to good use; it helps him make short(er) work of test-grading for a free online course with about 2,000 students.

So, what will the general public make of this “free and powerful” drag-and-drop tool? I played around with it a bit, and the results are interesting. I think the team may still have some tweaking to do: I ran a Twitter sentiment query on Elizabeth Warren (I know, my politics are showing), and it counted a tweet that read “Education is really important! More money for colleges! #Vote4Warren” as “negative.” Perhaps the for-profit machine learning vendors are safe for now. Check etcML out for yourself here and see what you think.

Cynthia Murrell, March 05, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Cleaning up SharePoint with Records Management

March 5, 2014

Few people would claim that cleanup of any variety is their favorite task, but it seems all the worse when dealing with old SharePoint material. CMS Wire meets the challenge head-on in their recent article, “Clean Up SharePoint Legacy Content.”

The article begins:

“The idea of cleaning up legacy SharePoint content is daunting. Organizations often place cleanup under the ‘Nice to Do’ column as opposed to the ‘Must Do’ column. Why not leverage in-house resources? Legacy SharePoint cleanup is a perfect task for the Records Management (RM) department. Reviewing data and applying retention to it are two of our key responsibilities.”

While devoting resources to cleanup can seem impossible, the fact remains that old or badly organized material that lingers on a SharePoint infrastructure is damaging to workflow and efficiency. Stephen E. Arnold is a longtime leader in search and often covers the intricacies of SharePoint on his Web site, ArnoldIT.com. He finds that a clean, lean infrastructure improves user experience, so a spring-cleaning may be just the right thing for your organization.

Emily Rae Aldridge, March 5, 2014

Still Explaining Bayes

March 4, 2014

Bayes’s Theorem is a foundation of predictive analytics. A Gigaom article explains how the theorem is used in predictive analytics by way of a familiar puzzle: “How the Solution To the Monty Hall Problem Is Also The Key To Predictive Analytics.”

The Monty Hall Problem is named after the Let’s Make a Deal host. Here is how it works:

“The show used what came to be known as the Monty Hall Problem, a probability puzzle named after the original host. It works like this: You choose between three doors. Behind one is a car and the other two are Zonks. You pick a door – say, door number one – and the host, who knows where the prize is, opens another door – say, door number three – which has a goat. He then asks if you want to switch doors. Most contestants assume that since they have two equivalent options, they have a 50/50 shot of winning, and it doesn’t matter whether or not they switch doors. Makes sense, right?”

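If a data scientist had been on the show, he would have used Bayes’s Theorem to win the prize. The solution is to switch doors: the first pick is right only one time in three, so once the host has revealed a losing door, the remaining door hides the car two times in three.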
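For readers who want to check the arithmetic, here is a minimal sketch of the Bayes calculation for the scenario in the quote (you pick door one, the host opens door three). It is not taken from the Gigaom article. The function name is our own, and it assumes the standard rules of the puzzle: the host never opens your door or the prize door, and picks at random when two doors are available to him.

```python
from fractions import Fraction


def posterior_after_host_opens(pick=1, opened=3, doors=(1, 2, 3)):
    """Return P(car is behind each door | you chose `pick`, host opened `opened`).

    Assumption (standard puzzle rules, not stated in full in the article):
    the host never opens the contestant's door or the prize door, and
    chooses uniformly at random when more than one door is allowed.
    """
    # Prior: the car is equally likely to be behind any door.
    prior = {d: Fraction(1, len(doors)) for d in doors}

    # Likelihood: P(host opens `opened` | car is behind door `car`).
    likelihood = {}
    for car in doors:
        allowed = [d for d in doors if d != pick and d != car]
        if opened in allowed:
            likelihood[car] = Fraction(1, len(allowed))
        else:
            likelihood[car] = Fraction(0)

    # Bayes's Theorem: posterior = prior * likelihood / evidence.
    evidence = sum(prior[d] * likelihood[d] for d in doors)
    return {d: prior[d] * likelihood[d] / evidence for d in doors}


posterior = posterior_after_host_opens()
for door in sorted(posterior):
    print(f"door {door}: {posterior[door]}")
```

Under those assumptions, staying with door one wins with probability 1/3 and switching to door two wins with probability 2/3. If the host instead opened doors without knowing where the prize is, the likelihoods would change and the 50/50 intuition would hold.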

The Monty Hall Problem is used in business, but Bayes’s Theorem is becoming more widespread. It is used to link big data and cloud computing, which also powers predictive analytics. The article goes on to explain the theorem’s importance and impact on business, which is not new. It ends by encouraging people to rely on Bayes over Monty Hall.

What will the next metaphor comparison be?

Whitney Grace, March 04, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

People Are Too Busy To Read

March 4, 2014

The Internet is good for many things, especially generating tera-quads of content. News, social media content, videos, etc. pop up every second, and people simply do not have the time to read it all. The Verge posted the tongue-in-cheek article “You’re Not Going To Read This,” which talks about the skyrocketing amount of content. Chartbeat CEO Tony Haile dropped a bomb on companies that specialize in content: “We’ve found effectively no correlation between social shares and people actually reading [an article].”

What a smack in the face!

People wear tweet and share counts like Girl Scout badges. If these have no value, what is the point of having a social media specialist? It’s not that generating content is bad, but people do not have the time to read every article. They usually skim the headlines and tweet without reading what they send. It really is a data overload.

Upworthy, one of the data companies, found different results. They discovered that people who read 25% of an article are likely to tweet it. Companies are actually changing their approach to marketing content: rather than relying on page views, they are focusing on how engaged users are. Engagement is measured by the new metric “attention minutes,” which tracks how actively people pay attention to a Web site and how much attention they actually pay. Confused yet? It makes sense after reading more of the article.

Do not worry that quality content will go away, though:

“Upworthy’s critics say it maximizes for social media shares, “sending a (false) message to Facebook that those headlines are the stories its users really want to read,” as Reuters columnist Felix Salmon put it. But the company’s new emphasis on the time spent on a story contradicts that claim, suggesting that Upworthy is playing a longer game. While the number of times a story is shared may not be a perfect signal of quality, it’s reassuring to know that stories that hold a reader’s attention all the way to the end are also rewarded by the Twitter sphere.”

Should we roll our eyes or start changing our social media approach? Is this a surprise or just the rise of content marketing and busy MBAs?

Whitney Grace, March 04, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
