July 30, 2015
For anyone using open-source Unix to work with data, IT World has a few tips for you in “The Best Tools and Techniques for Finding Data on Unix Systems.” In her regular column, “Unix as a Second Language,” writer Sandra Henry-Stocker explains:
“Sometimes looking for information on a Unix system is like looking for needles in haystacks. Even important messages can be difficult to notice when they’re buried in huge piles of text. And so many of us are dealing with ‘big data’ these days — log files that are multiple gigabytes in size and huge record collections in any form that might be mined for business intelligence. Fortunately, there are only two times when you need to dig through piles of data to get your job done — when you know what you’re looking for and when you don’t. 😉 The best tools and techniques will depend on which of these two situations you’re facing.”
When you know just what to search for, Henry-Stocker suggests the “grep” command. She supplies a few variations, complete with a poetic example. Sometimes, such as when tracking down errors, you are not sure what you will find but do know where to look. In those cases, she suggests using the “sed” command. For both approaches, Henry-Stocker supplies example code and troubleshooting tips. See the article for the juicy details.
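Her two scenarios map neatly to the two commands. Here is a minimal sketch; the sample log file and its contents are invented for illustration, and Henry-Stocker’s own examples in the article differ:

```shell
# Sample log invented for illustration.
cat > sample.log <<'EOF'
2015-07-30 10:01:02 INFO  service started
2015-07-30 10:01:05 ERROR disk quota exceeded
2015-07-30 10:02:10 WARN  retrying connection
2015-07-30 10:03:33 ERROR timeout talking to db
EOF

# When you know what you're looking for: grep.
# -i ignores case, -n shows line numbers, -c just counts the matches.
grep -in 'error' sample.log
grep -c 'ERROR' sample.log      # prints 2

# When you only know where to look: sed can print just a slice of the
# file, here everything from an ERROR line through the next WARN line.
sed -n '/ERROR/,/WARN/p' sample.log
```

The `-c` count is often the quickest sanity check on a multi-gigabyte log before you start pulling out individual lines.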
Cynthia Murrell, July 30, 2015
July 16, 2015
Peer-reviewed journals are supposed to carry an extra degree of authority, because a team of experts has read and critiqued each academic work. Science 2.0 points out in the article, “Peer Review Is Subjective And The Quality Is Highly Variable,” that peer-reviewed journals might not be worth their weight in opinions.
Peer reviews are supposed to be objective critiques of a work, but personal beliefs and political views have been working their way into the process for some time. This should not come as a surprise, given that academia has been plagued by the problem for decades. The problem has been discussed before, but peer review’s failings keep getting brushed under the rug. In true academic fashion, someone is conducting a test to determine how reliable peer review comments are:
“A new paper on peer review discusses the weaknesses we all see – it is easy to hijack peer review when it is a volunteer effort that can drive out anyone who does not meet the political or cultural litmus test. Wikipedia is dominated by angry white men and climate science is dominated by different angry white men, but in both cases they were caught conspiring to block out anyone who dissented from their beliefs. Then there is the fluctuating nature of guidelines. Some peer review is lax if you are a member, like at the National Academy of Sciences, while the most prominent open access journal is really editorial review, where they check off four boxes and it may never go to peer review or require any data, especially if it matches the aesthetic self-identification of the editor or they don’t want to be yelled at on Twitter.”
The peer review problem is getting worse in the digital landscape. There are suggested solutions, such as banning all fees associated with academic journals and databases or standardizing review criteria across fields, but even these would leave the problems far from corrected. Reviewers are paid to review works, which likely involves kickbacks of some kind. And getting different academic journals, much less different fields, to standardize anything would take a huge amount of effort, if they could come to any agreement at all.
Fixing the review system will not happen quickly, and anytime money is involved, the process slows even further. In short, academic journals are far from objective, which is why it pays to do your own research and take everything with a grain of salt.
July 15, 2015
Writer and web psychologist Liraz Margalit at the Next Web has some important advice for websites in “The Psychology Behind Web Browsing.” Apparently, paying attention to human behavioral tendencies can help webmasters avoid certain pitfalls that could damage their brands. Imagine that!
The article cites a problem an unspecified news site encountered when it tried to build interest in its videos by making them play automatically when a user navigated to its homepage. I suspect I know who they’re talking about, and I recall thinking at the time, “How rude!” I thought it was just because I didn’t want to be chastised by people near me for suddenly blaring a news video. According to Margalit, though, my problem goes much deeper: it’s an issue of control rooted in prehistory. She writes:
“The first humans had to be constantly on alert for changes in their environment, because unexpected sounds or sights meant only one thing: danger. When we click on a website hoping to read an article and instead are confronted with a loud, bright video, the automatic response is not so different from that of our prehistoric ancestors, walking in the forest and stumbling upon a bear or a saber-toothed hyena.”
This need for safety has morphed into a need for control; we do not like to be startled or lost. When browsing the Web, we want to encounter what we expect to encounter (perhaps not in terms of content, but certainly in terms of format). The name for this is the “expectation factor,” and an abrupt assault on the senses is not the only pitfall to avoid. Getting lost in an endless scroll can also be disturbing; that’s why those floating menus that follow you as you move down the page were invented. Margalit notes:
“Visitors like to think they are in charge of their actions. When a video plays without visitors initiating any interaction, they feel the opposite. If a visitor feels that a website is trying to ‘sell’ them something, or push them into viewing certain content without permission, they will resist by trying to take back the interaction and intentionally avoid that content.”
And that, of course, is the opposite of what websites want, so giving users the control they expect is a smart business move. Besides, it’s only polite to ask before engaging a visitor’s Adobe Flash or, especially, speakers.
Cynthia Murrell, July 15, 2015
July 10, 2015
Now, this is fascinating. Scary, but fascinating. MIT News explains how a team of researchers from MIT, Microsoft, and Adobe are “Extracting Audio from Visual Information.” The article includes a video in which one can clearly hear the poem “Mary Had a Little Lamb” as extrapolated from video of a potato chip bag’s vibrations filmed through soundproof glass, among other amazing feats. I highly recommend you take four-and-a-half minutes to watch the video.
Writer Larry Hardesty lists some other surfaces from which the team was able to reproduce audio by filming vibrations: aluminum foil, water, and plant leaves. The researchers plan to present a paper on their results at this year’s Siggraph computer graphics conference. See the article for some details on the research, including camera specs and algorithm development.
So, will this tech have any non-spying-related applications? Hardesty writes of MIT grad student Abe Davis, first author on the team’s paper:
“The researchers’ technique has obvious applications in law enforcement and forensics, but Davis is more enthusiastic about the possibility of what he describes as a ‘new kind of imaging.’
“‘We’re recovering sounds from objects,’ he says. ‘That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.’ In ongoing work, the researchers have begun trying to determine material and structural properties of objects from their visible response to short bursts of sound.”
That’s one idea. Researchers are confident other uses will emerge, ones no one has thought of yet. This is a technology to keep tabs on, and not just to decide when to start holding all private conversations in windowless rooms.
Cynthia Murrell, July 10, 2015
July 10, 2015
Big data tools help organizations analyze more than their old, legacy data. While legacy data does help an organization study how its processes have changed, that data is old and does not reflect immediate, real-time trends. SAS offers a product that bridges old data with new, and unstructured data with structured.
The SAS Text Miner is built on Teragram technology. Its features include document theme discovery, a function that finds relations between document collections; automatic Boolean rule generation; high-performance text mining that quickly evaluates large document collections; term profiling and trending, which evaluates how relevant terms are to a collection and how they are used; multiple-language support; visual interrogation of results; easy text import; flexible entity options; and a user-friendly interface.
The SAS Text Miner is specifically programmed to discover data relationships, automate activities, and determine keywords and phrases. The software uses predictive models to analyze data and discover new insights:
“Predictive models use situational knowledge to describe future scenarios. Yet important circumstances and events described in comment fields, notes, reports, inquiries, web commentaries, etc., aren’t captured in structured fields that can be analyzed easily. Now you can add insights gleaned from text-based sources to your predictive models for more powerful predictions.”
Text mining software reveals insights between old and new data, making it one of the basic components of big data.
Whitney Grace, July 10, 2015
June 21, 2015
Short honk: Looking for an item on Craigslist? The main Craigslist.org site wants you to look in your area and then manually grind through listings for other areas region by region. I read “How to Search All Craigslist at Once.” The article does a good job of explaining how to use Google and Ad Huntr. The write-up lists some other Craigslist search tools as well. A happy quack for Karar Halder, who assembled the article.
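If you would rather skip the third-party tools, you can brute-force the region-by-region grind from a shell. A rough sketch: the region names below are a tiny invented sample, and the /search/sss?query= URL pattern is my assumption about how Craigslist regional search URLs are structured, not something from the article:

```shell
# Tiny illustrative sample of Craigslist regional subdomains.
regions="seattle portland newyork chicago"
term="road+bike"   # spaces become '+' in the query string

# Print one search URL per region; pipe the list to curl, a browser
# launcher, or whatever fetcher you prefer.
for r in $regions; do
  printf 'https://%s.craigslist.org/search/sss?query=%s\n' "$r" "$term"
done
```

This is essentially what the Google `site:craigslist.org` trick does for you in one shot, minus the manual region list.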
Stephen E Arnold, June 21, 2015
June 20, 2015
Think back. Vivisimo asserted that it deduplicated and presented federated search results. There are folks at Oracle who have pointed to Outside In and other file conversion products available from the database company as a way to deal with different types of data. There are specialist vendors, which I will not name, who are today touting their software’s ability to turn a basket of data types into well-behaved rows and columns complete with metatags.
Well, not so fast.
Unifying structured and unstructured information is a time-consuming, expensive process. That is the reason for the obese exception files, where objects that cannot be processed go to live out their short, brutish lives.
I read “Tamr Snaps Up $25.2 Million to Unify Enterprise Data.” The stakeholders know, as do I, that unifying disparate types of data is an elephant in any indexing or content analytics conference room. Only the naive believe that software whips heterogeneous data into Napoleonic War parade formations. Today’s software processing tools cannot get undercover police officers to look shipshape for the mayor.
Ergo, an outfit with an aversion to the vowel “e” plans to capture the flag on top of the money pile available for data normalization and information polishing. The write up states:
“Tamr can create a central catalogue of all these data sources (and spreadsheets and logs) spread out across the company and give greater visibility into what exactly a company has. This has value on so many levels, but especially on a security level in light of all the recent high-profile breaches. If you do lose something, at least you have a sense of what you lost (unlike with so many breaches).”
Tamr is correct. Organizations don’t know what data they have. I could mention a US government agency which does not know what data reside on the server next to another server managed by the same system administrator. But I shall not. The problem is common and it is not confined to bureaucratic blenders in government entities.
Tamr, despite the oddball spelling, has Michael Stonebraker, a true wizard, on the task. The write-up mentions as a customer an outfit that might politely be described as a “database challenge.” If Thomson Reuters cannot figure out data after decades of effort and millions upon millions in investment, believe me when I point out that Tamr may be on to something.
Stephen E Arnold, June 20, 2015
June 18, 2015
For SharePoint managers and users, continued education and training is essential. There are lots of opportunities for virtual and face-to-face instruction. Benzinga gives some attention to one training option, the upcoming SharePoint Fest Seattle, in their recent article, “Chris McNulty to Lead 2 Sessions and a Workshop at SharePoint Fest Seattle.”
The article begins:
“Chris McNulty will preside over a full day workshop at SharePoint Fest Seattle on August 18th, 2015, as well as conduct two technical training sessions on the 19th and 20th. Both the workshops and sessions are to be held at the Washington State Convention Center in downtown Seattle.”
In addition to all of the great training opportunities at conferences and other face-to-face sessions, staying on top of the latest SharePoint news and online training opportunities is also essential. For a one-stop shop of all the latest SharePoint news, stay tuned to Stephen E. Arnold’s Web site, ArnoldIT.com, and his dedicated SharePoint feed. He has turned his longtime career in search into a helpful Web service for those who need to stay on top of the latest SharePoint happenings.
Emily Rae Aldridge, June 18, 2015
April 28, 2015
Email is still a relatively new concept in the grander scheme of technology, having only been in widespread use since the 1990s. As with any human activity, people want to learn more about the trends and habits people have with email. Popular Science has an article with the self-explanatory title “Here’s What Scientists Learned In The Largest Systematic Study Of Email Habits.” Even though email has been around for over twenty years, no one is quite sure how people use it.
So someone decided to study email usage:
“…researchers from Yahoo Labs looked at emails of two million participants who sent more than 16 billion messages over the course of several months–by far the largest email study ever conducted. They tracked the identities of the senders and the recipients, the subject lines, when the emails were sent, the lengths of the emails, and the number of attachments. They also looked at the ages of the participants and the devices from which the emails were sent or checked.”
The results were said to be so predictable that an algorithm could have predicted them. Usage correlates strongly with age group and gender: younger users write short, quick replies, and men also tend to be brief in their emails. People respond more quickly during work hours, and the more emails they receive, the less likely they are to write a reply. People may already be familiar with these trends, but the data is brand new to data scientists. The article predicts that developers will take the data and design better email platforms.
How about creating an email platform that merges a to-do list with email, so people don’t have to run their schedules and tasks out of the inbox?
April 28, 2015
Ah, more publisher excitement. Neuroskeptic, a blogger at Discover, weighs in on a spat between scientific journals in “Academic Journals in Glass Houses….” The write-up begins by printing a charge lobbed at Frontiers in Psychology by the Journal of Nervous and Mental Disease (JNMD), in which the latter accuses the former of essentially bribing peer reviewers. It goes on to explain the back story, and why the blogger feels the claim against Frontiers is baseless. See the article for those details, if you’re curious.
Here’s the part that struck me: Neuroskeptic supplies the example hinted at in his or her headline:
“For the JNMD to question the standards of Frontiers peer review process is a bit of a ‘in glass houses / throwing stones’ moment. Neuroskeptic readers may remember that it was JNMD who one year ago published a paper about a mysterious device called the ‘quantum resonance spectrometer’ (QRS). This paper claimed that QRS can detect a ‘special biological wave… released by the brain’ and thus accurately diagnose schizophrenia and other mental disorders – via a sensor held in the patient’s hand. The article provided virtually no details of what the ‘QRS’ device is, or how it works, or what the ‘special wave’ it is supposed to measure is. Since then, I’ve done some more research and as far as I can establish, ‘QRS’ is an entirely bogus technology. If JNMD are going to level accusations at another journal, they ought to make sure that their own house is in order first.”
This is more support for the conclusion that many of today’s “academic” journals cannot be trusted. Perhaps the profit-driven situation will be overhauled someday, but in the meantime, let the reader beware.
Cynthia Murrell, April 28, 2015