Tidy Text: The Best Way to Utilize Analytics

August 10, 2017

Even though text mining is nothing new, natural language processing seems to be the hot new analytics craze. In an effort to understand the value of each, the difference between them, and (most importantly) how to use either efficiently, O’Reilly interviewed text miners Julia Silge and David Robinson to learn about their approach.

When asked what advice they would give those drowning in data, they replied,

…our advice is that adopting tidy data principles is an effective strategy to approach text mining problems. The tidy text format keeps one token (typically a word) in each row, and keeps each variable (such as a document or chapter) in a column. When your data is tidy, you can use a common set of tools for exploring and visualizing them. This frees you from struggling to get your data into the right format for each task and instead lets you focus on the questions you want to ask.

The duo admits that text mining and natural language processing overlap in many areas, but says both are useful tools for different problems. They associate text mining with statistical analysis, and natural language processing with the relationship between computers and language. The difference may seem minute, but with data volumes exploding and companies drowning in data, such advice is crucial.
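To make the "one token per row" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: Silge and Robinson work in R with their tidytext package, while the function names below (`to_tidy_rows`, `count_words`) and the simple lowercase tokenizer are hypothetical stand-ins, not their code. Each row holds one word plus the document it came from, so a single generic counting step works on any collection of documents.

```python
import re
from collections import Counter

def to_tidy_rows(docs):
    """Turn {document_id: text} into tidy rows: one (document, word) pair per row.

    Hypothetical helper; assumes a simple tokenizer that lowercases the text
    and keeps runs of letters and apostrophes as words.
    """
    rows = []
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            rows.append({"document": doc_id, "word": word})
    return rows

def count_words(rows):
    """Count how often each word appears in each document."""
    return Counter((row["document"], row["word"]) for row in rows)

docs = {
    "chapter_1": "The data is tidy. Tidy data is easy to explore.",
    "chapter_2": "Messy data is hard to explore.",
}

rows = to_tidy_rows(docs)
counts = count_words(rows)
print(rows[:3])
print(counts[("chapter_1", "tidy")])
```

Because every document ends up in the same row shape, filtering, counting, or plotting code written once applies to all of them, which is the point the interviewees make about not reformatting data for each task.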

Catherine Lamsfuss, August 10, 2017

Big Data Visualization the Open Source Way

August 10, 2017

Though Big Data was hailed in a big way, it has yet to gain full steam because of a shortage of talent. Companies working in this domain are taking another swipe at the problem by offering visualization tools for free.

The Customize Windows says in an article titled List of Open Source Big Data Visualization Tools:

There are some growing number of websites which write about Big Data, cloud computing and spread wrong information to sell some others paid things.

Many industries have tried the freemium route to attract talent and promote the industry. For instance, Linux OS maker Penguin Computing offered its product for free to users. This move sparked interest among users who wanted to try something other than Windows and Mac.

The move created a huge user base of Linux users and also attracted talent to promote research and development.

Big Data players, it seems, are following the same strategy by offering data visualization tools for free, which they will monetize later. All that is needed now is patience.

Vishal Ingole, August 10, 2017

Wield Buzzwords with Precision

July 10, 2017

It is difficult to communicate clearly when folks don’t agree on what certain words mean. Nature attempts to clear up confusion around certain popular terms in, “Big Science Has a Buzzword Problem.” We here at Beyond Search like to call jargon words “cacaphones,” but the more traditional “buzzwords” works, too. Writer Megan Scudellari explains:

‘Moonshot’, ‘road map’, ‘initiative’ and other science-planning buzzwords have meaning, yet even some of the people who choose these terms have trouble defining them precisely. The terms might seem interchangeable, but close examination reveals a subtle hierarchy in their intentions and goals. Moonshots, for example, focus on achievable, but lofty, engineering problems. Road maps and decadal surveys (see ‘Alternate aliases’) lay out milestones and timelines or set priorities for a field. That said, many planning projects masquerade as one title while acting as another.

Strategic plans that bear these lofty names often tout big price tags and encourage collaborative undertakings…. The value of such projects is continually debated. On one hand, many argue that the coalescence of resources, organization and long-term goals that comes with large programmes is crucial to science advancement in an era of increasing data and complexity. … Big thinking and big actions have often led to success. But critics argue that buzzword projects add unnecessary layers of bureaucracy and overhead costs to doing science, reduce creativity and funding stability and often lack the basic science necessary to succeed.

In order to help planners use such terms accurately, Scudellari supplies definitions, backgrounds, and usage guidance for several common buzzwords: “moonshot,” “roadmap,” “initiative,” and “framework.” There’s even a tool to help one decide which term best applies to any given project. See the article to explore these distinctions.

Cynthia Murrell, July 10, 2017

Wall Street Can Learn from Google

May 30, 2017

Ruth Porat, CFO of Alphabet, told the Economic Club of New York that Wall Street should adopt an open culture like Google’s, which has helped the company keep profit levels high and investors happy.

CNBC in its news piece titled Ruth Porat Suggests Financial Crisis Could’ve Been Avoided If Wall Street Acted More Like Google said:

Ruth Porat, the former veteran Morgan Stanley executive who’s now chief financial officer of Alphabet, suggested Monday that the financial crisis could have been prevented — or at least made less severe — if Wall Street had operated with the same transparency as Google’s parent company.

Google currently has no employee stock options. According to Porat, this eliminates the possibility of employees rigging the financial numbers or engaging in financial engineering. For Google, the greatest threat is the pace of innovation.

The company holds a weekly meeting, TGIF, in which employees can ask executives tough questions about any aspect of the company. Porat feels this forum has helped Alphabet maintain transparency and that Wall Street has something to learn from it.

Vishal Ingole, May 30, 2017

Malware Infected USB Sticks on the Loose

May 18, 2017

Oops. We learn from TechRepublic that “IBM Admits it Sent Malware-Infected USB Sticks to Customers.”

The article cites the company’s support advisory post announcing the problem, a resource anyone who has received a USB drive with an IBM Storwize V3500, V3700 or V5000 system should check for the affected models and serial numbers. The recommended fix: destroy the drive and, if you had already inserted it, perform a malware purge on your computer.

Writer Conner Forrest explains:

So, what does the infected drive actually do to a system? ‘When the initialization tool is launched from the USB flash drive, the tool copies itself to a temporary folder on the hard drive of the desktop or laptop during normal operation,’ the IBM post said. Then, a malicious file is copied to a temporary folder called %TMP%\initTool on Windows or /tmp/initTool on Linux or Mac. It is important to note that, while the file is copied onto a machine, it isn’t actually executed during the initialization process, the post also said. As reported by ZDNet’s Danny Palmer, the malware was listed by Kaspersky lab as a member of the Reconyc Trojan malware family, which is primarily used in Russia and India.

It might be understandable if this were the first time this had happened, but IBM also unwittingly distributed infected USB drives back in 2010, at the AusCERT conference in Australia. Let us hope there is not a third time; customers rightly expect more vigilance from such a prominent company.
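For readers who did insert one of the drives, a quick way to look for the folder named in the IBM advisory (%TMP%\initTool on Windows, /tmp/initTool on Linux or Mac) is sketched below in Python. This is an illustrative assumption, not an IBM tool: the function names are hypothetical, it only checks whether the folder exists, and finding nothing does not prove a machine is clean. IBM's advisory and a full malware scan remain the actual fix.

```python
import os
import tempfile

def candidate_dirs():
    """Return the 'initTool' paths the IBM advisory mentions for this OS.

    Hypothetical helper: checks Python's temp directory, %TMP% when it is set,
    and /tmp on non-Windows systems (macOS's default temp dir is not /tmp).
    """
    bases = {tempfile.gettempdir()}
    if os.environ.get("TMP"):
        bases.add(os.environ["TMP"])
    if os.name != "nt":
        bases.add("/tmp")
    return [os.path.join(base, "initTool") for base in sorted(bases)]

def find_init_tool_dirs():
    """Return every candidate 'initTool' folder that actually exists."""
    return [path for path in candidate_dirs() if os.path.isdir(path)]

found = find_init_tool_dirs()
if found:
    for path in found:
        print(f"Found {path}: follow IBM's advisory and run a full malware scan.")
else:
    print("No initTool folder found in:", ", ".join(candidate_dirs()))
```

The check deliberately stops at reporting; deleting files by hand is no substitute for the malware purge the advisory recommends.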

Cynthia Murrell, May 18, 2017

Some Web Hosting Firms Overwhelmed by Scam Domains

January 27, 2017

An article at Softpedia should be a wake-up call to anyone who takes the issue of online security lightly: “One Crook Running Over 120 Tech Support Scam Domains on GoDaddy.” Writer Catalin Cimpanu explains:

A crook running several tech support scam operations has managed to register 135 domains, most of which are used in his criminal activities, without anybody preventing him from doing so, which shows the sad state of Web domain registrations today. His name and email address are tied to 135 domains, as MalwareHunterTeam told Softpedia. Over 120 of these domains are registered and hosted via GoDaddy and have been gradually registered across time.

The full list is available at the end of this article (text version here), but most of the domains look shady just based on their names. Really, how safe do you feel navigating to ‘security-update-needed-sys-filescorrupted-trojan-detected[.]info’? How about ‘personal-identity-theft-system-info-compromised[.]info’?

Those are ridiculously obvious, but it seems GoDaddy’s abuse department is too swamped to flag and block even these flagrant examples. At least that hosting firm does have an abuse department; many, it seems, can only be reached through national CERT teams. Other hosting companies, though, respond with the proper urgency when abuse is reported; Cimpanu holds up Bluehost and PlanetHoster as examples. That is something to consider for anyone who thinks the choice of hosting firm is unimportant.

We are reminded that educating ourselves is the best protection. The article links to a valuable tech support scam guide provided by veteran Internet security firm Malwarebytes, and suggests studying the wikis or support pages of other security vendors.

Cynthia Murrell, January 27, 2017

Google Needs a Time-Out for Censorship, But Who Will Enforce Regulations?

January 26, 2017

The article on U.S. News and World Report titled The New Censorship offers a list of the ways in which Google censors content, and builds a compelling argument for increased regulation of Google. Certain items on the list, such as pro-life music videos being removed from YouTube, might have you rolling your eyes, but the larger point is that Google simply has too much power over what people see, hear, and know. The most obvious problem is Google’s ability to squash a business simply by changing its search algorithm, but the myriad ways in which it has censored content are really shocking. The article states,

No one company, which is accountable to its shareholders but not to the general public, should have the power to instantly put another company out of business or block access to any website in the world. How frequently Google acts irresponsibly is beside the point; it has the ability to do so, which means that in a matter of seconds any of Google’s 37,000 employees with the right passwords or skills could laser a business or political candidate into oblivion…

At times the article sounds like a sad conservative annoyed that the most influential company in the world tends toward liberal viewpoints. Hearing white male conservatives complain about discrimination is always a little off-putting, especially when you have politicians like Rand Paul still defending the right of businesses to refuse service based on skin color. But from a liberal standpoint, just because Google often supports left-wing causes like gun control or the pro-choice movement doesn’t mean that it deserves a free ticket to decide what people are exposed to. Additionally, the article points out that the supposed “moral stands” made by Google are often revealed to be moneymaking or anticompetitive schemes. Absolute power corrupts no matter who wields it, and companies must be scrutinized to protect the interests of the people.

Chelsea Kerwin, January 26, 2017

Searchy Automates Your Search Parameters

January 25, 2017

The article on FileForum Beta News titled Searchy for Windows 0.5.1 describes an app that promises users more control over their search parameters and less time wasted on redundant searches. By using search scopes, categories, and search templates, Searchy claims to simplify and organize search. The service targets users who search for similar items all day, and makes it easier for them to find what they need without all that extra typing. The article goes into more detail,

Your daily routine consists of lots repetitive searches? With Searchy you can automate that. Just write a template for similar search queries and stop typing the same things over and over… Search using Google’s and Bing’s web, image, video and news search engines. Often performing searches on same websites? Spending much time on advanced search filters in Google or Bing? Searchy will simplify that too. Just add scopes for the websites and search filters, and use them like a boss.

Searchy was developed by freelance developer Alex Kaul, who found that entering the same phrase over and over in Google was annoying. By automating the search phrase, Searchy enables users to skip a step. It may be a small step, but as we all know, a small task when completed one hundred times a day becomes a very large and tiresome one.
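The template idea described above can be sketched in a few lines of Python. This is not Searchy's code or template format; the template names and the `build_search_url` helper are hypothetical. The sketch only assumes Google's standard `q` query parameter and its long-standing `site:` and `filetype:` operators: a saved template fixes the repetitive part of a query, and the user supplies only the term that changes.

```python
from string import Template
from urllib.parse import urlencode

# Hypothetical saved templates; "$term" marks the only part typed each time.
TEMPLATES = {
    "docs": Template("$term site:docs.python.org"),  # scope search to one site
    "pdf": Template("$term filetype:pdf"),            # restrict results to PDFs
}

def build_search_url(template_name, term):
    """Expand a saved template and return a Google web-search URL for it."""
    query = TEMPLATES[template_name].substitute(term=term)
    return "https://www.google.com/search?" + urlencode({"q": query})

print(build_search_url("docs", "list comprehension"))
print(build_search_url("pdf", "tidy data"))
```

`urlencode` percent-encodes the operators (for example, `:` becomes `%3A`) and turns spaces into `+`, so the expanded query can be opened directly in a browser.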

Chelsea Kerwin, January 25, 2017

You Too, Can Learn Linear Algebra

January 24, 2017

Algebra was invented in Persia more than one thousand years ago. It is one of the fundamental branches of mathematics, and its theories are applied in many industries. Algebra ranges from solving for x to complex formulas that leave one scratching one’s head. If you are interested in learning linear algebra, you should visit Sheldon Axler’s Web site. Axler, who has an apparent love for his pet cat, is a professor of mathematics at San Francisco State University.

On his Web site, Axler lists the various mathematics books he has written and contributed to. It is an impressive bibliography, and his newest book is titled Linear Algebra Abridged. He describes the book as follows:

Linear Algebra Abridged is generated from Linear Algebra Done Right (third edition) by excluding all proofs, examples, and exercises, along with most comments. Learning linear algebra without proofs, examples, and exercises is probably impossible. Thus this abridged version should not substitute for the full book. However, this abridged version may be useful to students seeking to review the statements of the main results of linear algebra.

Algebra can be difficult, and as Axler wrote above, learning linear algebra without proofs, examples, and exercises is probably impossible. However, if you already have a solid understanding of algebra and are simply looking to brush up on or study linear principles without spending a sizable sum on the textbook, then this is a great asset. The book is free to download from Axler’s Web site, along with information on how to access the regular textbook.

Whitney Grace, January 24, 2017

Hacks to Make Your Google Dependence Even More Rewarding

January 24, 2017

The article on MakeUseOf titled This Cool Website Will Teach You Hundreds of Google Search Tips refers to SearchyApp, a collection of tricks, tips, and shortcuts to navigate Google search more easily. The lengthy list is divided into sections to be less daunting to readers. The article explains,

What makes this site so cool is that the tips are divided into sections, so it’s easy to find what you want. Here are the categories: Facts (e.g. find the elevation of a place, get customer service number,…) Math (e.g. solve a circle, use a calculator, etc.), Operators (search within number range, exclude a keyword from results, find related websites, etc.), Utilities (metronome, stopwatch, tip calculator, etc.), Easter Eggs (42, listen to animal sounds, once in a blue moon, etc.).

The Easter Eggs may be old news, but if you haven’t looked into them before, they are a great indicator of Google’s idea of a hoot. The Utilities section is chock full of useful little tools, from a dice roller to a distance calculator to unit conversion and language translation. Also useful are the Operators, the codes and shortcuts that tell Google what you want, sometimes functioning as search restrictions or advanced search settings. The Operators are worth checking out for those of us who have forgotten what our librarians taught us about online search.

Chelsea Kerwin, January 24, 2017
