April 28, 2015
Email is still relatively young in the grander scheme of technology; although it dates back to the 1970s, it only became a mainstream tool in the 1990s. As with any human activity, people want to learn more about the trends and habits surrounding it. Popular Science has an article with the self-explanatory title “Here’s What Scientists Learned In The Largest Systematic Study Of Email Habits.” Even though email has been mainstream for over twenty years, no one is quite sure how people use it.
So someone decided to study email usage:
“…researchers from Yahoo Labs looked at emails of two million participants who sent more than 16 billion messages over the course of several months–by far the largest email study ever conducted. They tracked the identities of the senders and the recipients, the subject lines, when the emails were sent, the lengths of the emails, and the number of attachments. They also looked at the ages of the participants and the devices from which the emails were sent or checked.”
The results were said to be so predictable that an algorithm could have generated them. Usage correlates strongly with age and gender: younger users write short, quick replies, and men tend to be brief in their emails as well. People also respond more quickly during work hours, and the more email they receive, the less likely they are to reply at all. These trends may already be familiar to most people, but the data is brand new to data scientists. The article predicts that developers will use it to design better email platforms.
How about creating an email platform that merges a to-do list with email, so people no longer have to build their schedules and task lists out of the inbox?
April 28, 2015
Ah, more publisher excitement. Neuroskeptic, a blogger at Discover, weighs in on a spat between scientific journals in, “Academic Journals in Glass Houses….” The write-up begins by printing a charge lobbed at Frontiers in Psychology by the Journal of Nervous and Mental Disease (JNMD), in which the latter accuses the former of essentially bribing peer reviewers. It goes on to explain the back story, and why the blogger feels the claim against Frontiers is baseless. See the article for those details, if you’re curious.
Here’s the part that struck me: Neuroskeptic supplies the example hinted at in his or her headline:
“For the JNMD to question the standards of Frontiers peer review process is a bit of a ‘in glass houses / throwing stones’ moment. Neuroskeptic readers may remember that it was JNMD who one year ago published a paper about a mysterious device called the ‘quantum resonance spectrometer’ (QRS). This paper claimed that QRS can detect a ‘special biological wave… released by the brain’ and thus accurately diagnose schizophrenia and other mental disorders – via a sensor held in the patient’s hand. The article provided virtually no details of what the ‘QRS’ device is, or how it works, or what the ‘special wave’ it is supposed to measure is. Since then, I’ve done some more research and as far as I can establish, ‘QRS’ is an entirely bogus technology. If JNMD are going to level accusations at another journal, they ought to make sure that their own house is in order first.”
This is more support for the conclusion that many of today’s “academic” journals cannot be trusted. Perhaps the profit-driven situation will be overhauled someday, but in the meantime, let the reader beware.
Cynthia Murrell, April 28, 2015
April 27, 2015
The article on PCWorld titled “For Attensity’s BI Parsing Tool, Emoticons Are No Problem” describes recent attempts to fine-tune the monitoring and relaying of conversations about a particular organization or enterprise. The amount of data to wade through is massive, and it is littered with non-traditional grammar, language, and symbols. Attensity is not alone here; Luminoso is another company aiming to help, with its Compass tool. The article says,
“Attensity’s Semantic Annotation natural-language processing tool… Rather than relying on traditional keyword-based approaches to assessing sentiment and deriving meaning… takes a more flexible natural-language approach. By combining and analyzing the linguistic structure of words and the relationship between a sentence’s subject, action and object, it’s designed to decipher and surface the sentiment and themes underlying many kinds of common language—even when there are variations in grammatical or linguistic expression, emoticons, synonyms and polysemies.”
The article does not explain exactly how Attensity’s product works, only that it can somehow “understand” emoticons. “Understand” seems like an odd term, though; more likely the tool looks each emoticon up in a list rather than actually “reading” it. At any rate, Attensity promises that its tool will save hundreds of hours of human work.
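If “understanding” emoticons really does boil down to a lookup, the approach can be sketched in a few lines. The table and scores below are illustrative guesses, not Attensity’s actual data:

```python
# Minimal sketch of lookup-based emoticon sentiment scoring.
# The emoticon table and scores are illustrative only.
EMOTICON_SENTIMENT = {
    ":)": 1, ":-)": 1, ":D": 2, ";)": 1,
    ":(": -1, ":-(": -1, ":'(": -2,
}

def emoticon_score(text: str) -> int:
    """Sum the sentiment scores of all known emoticons in the text."""
    return sum(score for emo, score in EMOTICON_SENTIMENT.items()
               if emo in text)

def label(text: str) -> str:
    """Bucket a text into positive / negative / neutral by emoticon score."""
    score = emoticon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A real system would, of course, combine this signal with the linguistic analysis the article describes rather than rely on emoticons alone.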
Chelsea Kerwin, April 27, 2015
February 12, 2015
MakeUseOf offers us a list of graphic-making options in its “4 Data Visualization Tools for Captivating Data Journalism.” Writer Brad Jones describes four options, ranging from the quick and easy to more complex solutions. The first entry, Tableau Public, may be the best place for new users to start. The write-up tells us:
“Data visualization can be a very complex process, and as such the programs and tools used to achieve good results can be similarly complex. Tableau Public, at first glance, is not — it’s a very accommodating, intuitive piece of software to start using. Simply import your data as a text file, an Excel spreadsheet or an Access database, and you’re up and running.
“You can create a chart simply by dragging and dropping various dimensions and measures into your workspace. Figuring out exactly how to produce the sort of visualizations you’re looking for might take some experimentation, but there’s no great challenge in creating simple charts and graphs.
“That said, if you’re looking to go further, Tableau Public can cater to you. It’ll take some time on your part to really understand the breadth of what’s on offer, but it’s a matter of learning a skill rather than the program itself being difficult to use.”
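For readers who prefer code to drag-and-drop, the dimension-versus-measure idea can be approximated with a toy text chart. This sketch has nothing to do with Tableau’s internals, and the CSV columns are hypothetical:

```python
import csv
import io

def bar_chart(csv_text: str, label_col: str, value_col: str, width: int = 20):
    """Render a simple text bar chart from CSV data: a Tableau-style
    'dimension' (label) plotted against a 'measure' (value)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    peak = max(float(r[value_col]) for r in rows)
    return [
        f"{r[label_col]:<10} {'#' * round(float(r[value_col]) / peak * width)}"
        for r in rows
    ]
```

The point of tools like Tableau Public is precisely that users never have to write this sort of thing themselves.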
Cynthia Murrell, February 12, 2015
January 19, 2015
Fujitsu has joined many other companies in building its own software on top of Hadoop to leverage big data. IT Web Open Source’s article, “Fujitsu Makes It Easy For Customers To Reap The Benefits Of Big Data With PRIMEFLEX For Hadoop,” divulges the details about the new software.
The new Hadoop application is part of Fujitsu’s PRIMEFLEX line of workload-specific integrated systems. Its purpose is similar to that of many other big data products on the market: harness big data and generate actionable analytics. Fujitsu describes it as a wonder software:
“Fujitsu has developed PRIMEFLEX for Hadoop to simplify and tame big data. The powerful, dedicated all-in-one hardware cluster is designed to integrate with existing hardware infrastructures, introducing distributed parallel processing based on Cloudera Enterprise Hadoop. This is an open-source software framework which gathers, processes and analyses data from various sources, then puts together and presents the big picture on how to act on the information gathered.”
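The “distributed parallel processing” at the heart of Hadoop is the MapReduce pattern. A single-machine Python sketch of the idea (not Cloudera’s or Fujitsu’s actual API), counting words across documents:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map step: emit (word, 1) pairs, as a Hadoop mapper would."""
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce step: sum the counts per key, as a Hadoop reducer would."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def word_count(documents):
    """Run map over each document, then reduce the combined output.
    Hadoop runs the same two phases across a cluster instead of one process."""
    return reduce_phase(chain.from_iterable(map_phase(d) for d in documents))
```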
Fujitsu is a recognized and respected brand, but the big data market is saturated with companies offering comparable software. Many of them also started with a Hadoop-based application in their line-ups, and Fujitsu is entering the Hadoop analytics market a little late.
January 16, 2015
Organizing uploaded content is a pain in the rear. To catalog content, users either have to add tags manually or use an automated system that requires filling out several tedious fields. CMS Wire explains the difficulties of document organization in “Stop Pulling Teeth: A Better Way To Classify Documents.” Manual tagging is the longer of the two processes, and if no one has created a set of tagging standards, tags will rain down from the cloud in a content mess. Automated fields are not bad to work with for one or two documents, but with many files to process, users are more prone to enter the wrong information just to finish the job.
Apparently there is a happy medium:
“Encourage users to work with documents the way they normally do and use a third party tool such as an auto classification tool to extract text based content, products, subjects and terms out of the document. This will create good, standardized metadata to use for search refinement. It can even be used to flag sensitive information or report content detected with code names, personally identifiable information such as credit card numbers, social security numbers or phone numbers.”
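The flagging step described in the quote is often implemented with pattern matching. A minimal sketch using simplified, illustrative US-format patterns (real detectors add validation such as Luhn checks for card numbers):

```python
import re

# Simplified, illustrative US-format patterns -- real PII detection
# needs validation and many more formats than these.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[.-]\d{3}[.-]\d{4}\b"),
}

def flag_sensitive(text: str) -> set:
    """Return the set of PII categories detected in the document text."""
    return {name for name, pattern in PII_PATTERNS.items()
            if pattern.search(text)}
```

An auto-classification tool would run checks like these (plus term and subject extraction) as documents are uploaded, so users never touch a tagging form.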
While the suggestion is sound, we thought auto-classification tools were normally built into collaborative content platforms like SharePoint. Apparently not. Third-party software to improve enterprise platforms once more saves the day for the digital paper pusher.
January 8, 2015
The article titled “15 Website Personalization and Recommendation Software Tools” on Smart Insights contains a roundup of personalization software. Think of Amazon.com: groups of customers see vastly different suggestions from the store, based on what they have bought or viewed in the past and what other people with similar histories also considered. In the last few years, though, personalization software has become even more tailored to specific pursuits. The article names the winning brands in one category, B2B and publisher personalization tools:
“Evergage is mentioned as tool that fits best in this category. WP Greet Box is a personalisation plug-in used by WordPress blogging users, including me once, to deliver a welcome message to first time users depending on their referrers. It’s amazing this approach isn’t used more on commercial sites. WP Marketing Suite is another WordPress plugin that has been featured in the comments.”
The article also explores the best of the commerce management systems category. It states that “both Sitecore and Kentico have built in tools to personalize content based on various rules, such as geo-location, search terms…” in addition to the more widely understood personalization based on user behavior. The idea behind all of these companies is to improve search for consumers.
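WP Greet Box’s referrer-based greeting is simple to approximate. The referrer-to-message table below is entirely hypothetical:

```python
from urllib.parse import urlparse

# Hypothetical referrer-to-message table in the spirit of WP Greet Box.
GREETINGS = {
    "news.ycombinator.com": "Welcome, Hacker News reader!",
    "twitter.com": "Thanks for clicking through from Twitter!",
    "google.com": "Found us via search? Start with the FAQ.",
}
DEFAULT_GREETING = "Welcome to the site!"

def greet(referrer_url: str) -> str:
    """Pick a welcome message based on the HTTP referrer's host."""
    host = urlparse(referrer_url).netloc.removeprefix("www.")
    return GREETINGS.get(host, DEFAULT_GREETING)
```

Rule-based personalization like Sitecore’s geo-location targeting works on the same principle, just with more signals than the referrer alone.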
Chelsea Kerwin, January 08, 2015
December 30, 2014
Despite budget cuts hitting print materials in academic research, higher education is clamoring for more digital content. You do not need Google Translate to understand that this means more revenue for companies in the industry. Virtual Strategy writes that someone wants in on the money: “With Luxid Content Enrichment Platform, Cairn.info Automates The Extraction Of Bibliographic References And The Linking To Corresponding Article.”
Temis, an industry leader in semantic content enrichment solutions for the enterprise, has signed a license and service agreement with Cairn.info, a publishing portal for the social sciences and humanities that provides students with access to the usual research fare.
Taking note of the changes in academic research, CAIRN.info wants to upgrade its digital records for a more seamless user experience:
“To make its collection easier to navigate, and ahead of the introduction of an additional 20.000 books which will consolidate its role of reference SSH portal, Cairn.info decided to enhance the interconnectedness of SSH publications with semantic enrichment. Indeed, the body of SSH articles often features embedded bibliographic references that don’t include actual links to the target document. Cairn.info therefore chose to exploit the Luxid® Content Enrichment Platform, driven by a customized annotator (Skill Cartridge®), to automatically identify, extract, and normalize these bibliographic references and to link articles to the documents they refer to.”
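The identify–extract–normalize pipeline described in the quote can be sketched for one simple citation style. The pattern below is illustrative only and far cruder than Luxid’s Skill Cartridge annotators:

```python
import re

# Illustrative pattern for "Author (Year). Title." style references;
# a production annotator handles many citation styles and edge cases.
REF_PATTERN = re.compile(
    r"(?P<author>[A-Z][a-zÀ-ÿ]+) \((?P<year>\d{4})\)\. (?P<title>[^.]+)\."
)

def extract_references(text):
    """Extract embedded references and normalize them into dicts,
    ready to be matched against a catalog for linking."""
    return [
        {"author": m["author"], "year": int(m["year"]),
         "title": m["title"].strip()}
        for m in REF_PATTERN.finditer(text)
    ]
```

The normalized records are what make the linking step possible: once a reference is structured data, matching it to a target document in the portal becomes a lookup rather than a text search.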
A round of applause for Cairn.info for realizing that making research easier will encourage more students to use its services. If only academic databases would take ease of use into consideration and upgrade their UI dashboards.
December 26, 2014
The interesting tool called WikiSummarizer presents summaries of Wikipedia articles, which is particularly useful for students and consultants. Rather than reading the full text of a Wikipedia article (which is, yes, already a condensed text), you can now search for a summarized article to get the headlines of a given subject. The FAQs for WikiSummarizer explain,
“WikiSummarizer automatically summarizes the Wikipedia articles. The program identifies the most important keywords and ranks them by relevancy. For each keyword the most significant sentences in the original text are presented to the reader. You instantly get the headlines with the most important sentences and keywords. The blending of visualization with summarization, knowledge browsing, mind mapping provides you with a wide range of means to explore relevant content. At a glance, without much reading, you immediately spot the key information chunks.”
Perhaps someday soon, we will be able to read nothing at all and know… the “chunks.” For example, when you search the keyword Hamlet (the play), what Wikipedia decides to promote as the most relevant information is when Shakespeare wrote it and what the story was based on. This is followed by several blurbs summarizing the play itself and then a brief description of the critical reception among the Romantics, providing what reads as a SparkNotes of a SparkNotes. WikiSummarizer also offers visual summary maps, visual trees, and word clouds connected to the Wikipedia knowledge base.
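The keyword-ranking-plus-sentence-selection approach the FAQ describes is classic extractive summarization. A minimal frequency-based sketch (not WikiSummarizer’s actual algorithm):

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real summarizers use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "was"}

def summarize(text: str, n_sentences: int = 2):
    """Score each sentence by the frequency of its keywords across the
    whole text, then keep the top n sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)
    scored = sorted(sentences, key=lambda s: -sum(
        freq[w] for w in re.findall(r"[a-z']+", s.lower())))
    top = set(scored[:n_sentences])
    return [s for s in sentences if s in top]
```

Sentences dense in the article’s most frequent keywords float to the top, which is roughly why a Hamlet summary leads with authorship and sources: those facts share vocabulary with the rest of the article.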
Chelsea Kerwin, December 26, 2014
December 11, 2014
We’ve learned that Sail Labs has put out the next iteration of its Media Mining Indexer from the company’s post, “Sail Labs Announces Availability of Release Version 2014-2 and Media Mining Indexer 6.3.” The refreshingly straightforward press release offers bulleted lists of new features and major changes to be found throughout the new version. For the indexer, it lists:
- Support for sentiment analysis, i.e. classification of text segments into positive, negative, neutral, or mixed sentiment
- Currently supported languages: US and International English, German, and Russian
- Support for continuous intermittent result output, without final XML result, which increases performance in cases where collective results are not required
- Support for licensing using a central license manager/server (LiMa), intended for cloud-based use cases
- Script-based building of language models using lmtscript
For those not already familiar with Media Mining Indexer, it processes speech from multiple sources into XML, which can then be uploaded into a range of digital-asset-management systems for subsequent search and retrieval. The software boasts automatic speech recognition, speaker ID, speaker change detection, story detection, and topic classification.
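Downstream systems would then parse that XML for search and retrieval. Since the press release does not publish Sail Labs’ schema, the element and attribute names in this sketch are purely hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical result snippet -- Sail Labs' actual XML schema is not
# given in the press release, so this structure is illustrative only.
SAMPLE = """
<results>
  <segment speaker="A" topic="economy" sentiment="negative">
    Markets fell sharply today.
  </segment>
  <segment speaker="B" topic="sports" sentiment="positive">
    The home team won again.
  </segment>
</results>
"""

def segments_by_sentiment(xml_text: str, sentiment: str):
    """Return the topics of all speech segments carrying the given
    sentiment label -- the kind of query a DAM system might run."""
    root = ET.fromstring(xml_text)
    return [seg.get("topic") for seg in root.iter("segment")
            if seg.get("sentiment") == sentiment]
```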
Sail Labs specializes in high-end software for speech and multimedia analysis for vertical markets. Its name derives from “Speech Artificial Intelligence Language Laboratories.” Sail Labs is located in Vienna, Austria, and was founded in 1999.
Cynthia Murrell, December 11, 2014