Big Data Too Is Prone to Human Bug

August 2, 2017

Conventional wisdom says Big Data, being a realm of machines, is immune to human behavioral traits like discrimination. Data scientists, however, say otherwise.

An article published by PHYS.ORG, titled “Discrimination, Lack of Diversity, and Societal Risks of Data Mining Highlighted in Big Data,” states:

Despite the dramatic growth in big data affecting many areas of research, industry, and society, there are risks associated with the design and use of data-driven systems. Among these are issues of discrimination, diversity, and bias.

The crux of the problem is the way data is mined and processed and the way decisions are made. At every step, humans must tell machines how each of these processes is to be executed. If the person guiding the system is biased, these biases are bound to seep into the subsequent processes in some way.

Apart from decisions like granting credit, human resources, which is also being automated, may have diversity issues. The fundamental problem remains the same in this case too.

Big Data was touted as the next big thing and may turn out to be so, but most companies have yet to figure out how to utilize it. Streamlining the processes and making them efficient would be the next step.

Vishal Ingole, August 2, 2017

Google Invests in Robot Reporters

July 27, 2017

People fear that robots will replace them in the workforce, but until now reporters have not had to deal with this worry. Machines have lacked the capability to write cohesive news pieces, but robots are getting smarter. Google might become the bane of news reporters, because Business Insider shares that “Google Is Giving The Press Association £622,000 To Create An Army Of Robot Reporters.” Google granted the Press Association £622,000 ($810,000) to develop robots that can write 30,000 stories per day for news outlets.

The funds come from Google’s Digital News Initiative, which will dole out the $810,000 over three years to “stimulate and support innovation in digital journalism across Europe’s news industry.” The Press Association dubbed the project “Reporters and Data and Robots” (RADAR) and will run it in tandem with the news startup Urbs Media. The article explains how the robots will produce stories:

The robot reporters will draw on open data sets on the internet and use Natural Language Generation (NLG) software to produce their copy, PA said.

The data sets — to be identified and recorded by a new team of five human journalists — will come from government departments, local authorities, NHS Trusts and more, PA said, adding that they will provide detailed story templates across a range of topics including crime, health, and employment.
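To make the idea of a "story template" concrete, here is a minimal sketch of template-based text generation from one row of an open data set. It is an illustration only and not RADAR's actual software: the template wording, the field names (area, offence, period, count, previous_count), and the render_story function are all assumptions made up for this example.

```python
from string import Template

# Hypothetical story template; wording and field names are illustrative only.
CRIME_TEMPLATE = Template(
    "Police recorded $count $offence offences in $area in $period, "
    "$direction $change_pct% from the previous year."
)


def render_story(row: dict) -> str:
    """Fill the template from one row of an (assumed) open data set."""
    previous = row["previous_count"]
    if previous == 0:
        # A percentage change is undefined without a non-zero baseline.
        raise ValueError("previous_count must be non-zero")
    change = (row["count"] - previous) / previous * 100
    direction = "up" if change >= 0 else "down"
    return CRIME_TEMPLATE.substitute(
        count=row["count"],
        offence=row["offence"],
        area=row["area"],
        period=row["period"],
        direction=direction,
        change_pct=f"{abs(change):.1f}",
    )


if __name__ == "__main__":
    sample = {
        "area": "Exampleshire",
        "offence": "burglary",
        "period": "2016",
        "count": 110,
        "previous_count": 100,
    }
    print(render_story(sample))
```

Running the same template over every row of a data set is what would let a small team of human journalists turn one template into many localized stories.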

The head of the Press Association says that RADAR will ease pressures on news outlets in a cost-effective way while providing local stories. While this might work, naysayers state that human reporters are still needed to cover local news, because it requires investigation and personal relationships. All we can say is that both arguments are correct.

Whitney Grace, July 27, 2017

Drugmaker Merck Partners with Palantir on Data Analysis

July 21, 2017

Pharmaceutical company Merck is working with data-analysis firm Palantir on a project to inform future research, we learn from the piece, “Merck Forges Cancer-Focused Big Data Alliance with Palantir” at pharmaceutical news site PMLive. The project is an effort to remove the bottleneck that currently exists between growing silos of medical data and practical applications of that information. Writer Phil Taylor specifies:

Merck will work with Palantir on cancer therapies in the first instance, with the aim of developing a collaborative data and analytics platform for the drug development processes that will give researchers new understanding of how new medicines work. Palantir contends that many scientists in pharma companies struggle with unstructured data and information silos that ‘reduce creativity and limit researchers’ corrective analyses’. The data analytics and sharing platform will help Merck researchers analyse real-world and bioinformatics data so they can ‘understand the patients who may benefit most’ from a treatment.

The alliance also has a patient-centric component, and according to Merck will improve the experience of patients using its products, improve adherence as well as provide feedback on real-world efficacy.

Finally, the two companies will collaborate on a platform that will allow improved global supply chain forecasting and help to get medicines to patients who need them around the world as quickly as possible. Neither company has disclosed any financial details on the deal.

This move is no surprise for the 125-year-old Merck, which has been embracing digital technology in part by funding projects around the world. Known as MSD everywhere but the U.S. and Canada, the company started with a small pharmacy in Germany but now has its headquarters in New Jersey.

Palantir has recently stirred up some controversy. The company’s massive-scale data platforms allow even the largest organizations to integrate, manage, and secure all sorts of data. Its founding members include PayPal alumni and Stanford computer-science grads. The company is based in Palo Alto, California, and has offices around the world.

Cynthia Murrell, July 21, 2017

Women in Tech Want Your Opinion on Feminism and Other Falsehoods Programmers Believe

July 14, 2017

The collection of articles on GitHub titled Awesome Falsehood dives into some of the strange myths and errors believed by tech gnomes and the issues that they can create. For starters, there are falsehoods about names. Perhaps you have encountered the tragic story of Mr. Null, who runs into trouble whenever he enters his last name in a web form, because the form often rejects it or even crashes.

The article explains,

This has all gotten to the point where I’ve developed a number of workarounds for times when this happens. Turning my last name into a combination of my middle name and last name, or middle initial and last name, sometimes works, but only if the website doesn’t choke on multi-word last names. My usual trick is to simply add a period to my name: “Null.” This not only gets around many “null” error blocks, it also adds a sense of finality to my birthright.
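One plausible cause of the "Null" problem is a form check that confuses the text "null" with the absence of a value. The sketch below contrasts such a check with a safer one; both function names are hypothetical and are not taken from any of the systems the article describes.

```python
from typing import Optional


def naive_is_missing(value: Optional[str]) -> bool:
    """Buggy check: treats the literal text 'null' as if no value were given."""
    return value is None or value.strip().lower() in {"", "null", "none"}


def is_missing(value: Optional[str]) -> bool:
    """Safer check: only a real None or a blank string counts as missing."""
    return value is None or value.strip() == ""


if __name__ == "__main__":
    for surname in ["Null", "Smith", "", None]:
        print(repr(surname), naive_is_missing(surname), is_missing(surname))
```

With the naive check, the surname "Null" is rejected as missing; with the second check it is accepted like any other name.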

Another list expands on the falsehoods about names that programmers seem to buy into. These include culturally clueless assumptions that everyone has a first name and a last name, that names never change, and that no two people share one. Along those lines, one awesome female programmer wrote a list of falsehoods about women in tech, such as their existence revolving around a desire for a boyfriend or to complete web design tasks. (Also, mansplaining is their absolute favorite, did you know?) Another article explores falsehoods about geography, such as the mistaken notion that all places only have one official name, or even one official name per language, or one official address. While the lists may reinforce some negative stereotypes we have about programmers, they also expose the core issues that programmers must resolve to be successful and effective in their jobs.

Chelsea Kerwin, July 14, 2017

Wield Buzzwords with Precision

July 10, 2017

It is difficult to communicate clearly when folks don’t agree on what certain words mean. Nature attempts to clear up confusion around certain popular terms in, “Big Science Has a Buzzword Problem.” We here at Beyond Search like to call jargon words “cacaphones,” but the more traditional “buzzwords” works, too. Writer Megan Scudellari explains:

‘Moonshot’, ‘road map’, ‘initiative’ and other science-planning buzzwords have meaning, yet even some of the people who choose these terms have trouble defining them precisely. The terms might seem interchangeable, but close examination reveals a subtle hierarchy in their intentions and goals. Moonshots, for example, focus on achievable, but lofty, engineering problems. Road maps and decadal surveys (see ‘Alternate aliases’) lay out milestones and timelines or set priorities for a field. That said, many planning projects masquerade as one title while acting as another.

Strategic plans that bear these lofty names often tout big price tags and encourage collaborative undertakings…. The value of such projects is continually debated. On one hand, many argue that the coalescence of resources, organization and long-term goals that comes with large programmes is crucial to science advancement in an era of increasing data and complexity. … Big thinking and big actions have often led to success. But critics argue that buzzword projects add unnecessary layers of bureaucracy and overhead costs to doing science, reduce creativity and funding stability and often lack the basic science necessary to succeed.

In order to help planners use such terms accurately, Scudellari supplies definitions, backgrounds, and usage guidance for several common buzzwords: “moonshot,” “roadmap,” “initiative,” and “framework.” There’s even a tool to help one decide which term best applies to any given project. See the article to explore these distinctions.

Cynthia Murrell, July 10, 2017

Mistakes to Avoid to Implement Hadoop Successfully

July 7, 2017

Hadoop has been at the forefront of Big Data implementation methodologies. The journey so far has been filled with more failures than successes. One expert has therefore compiled a list of common mistakes to avoid when implementing Hadoop.

Wael Elrifai, in a post titled “How to Avoid Seven Common Hadoop Mistakes” on IT Pro Portal, says:

Business needs, specialized skills, data integration, and budget all need to factor into planning and implementation. Even when this happens, a large percentage of Hadoop implementations fail.

For instance, the author says that one of the most common mistakes consultants make is treating Hadoop like any other database management system. The trick is to treat the data lake like a box of Legos and build the model one brick at a time. Other common mistakes include not migrating the data before implementation and not thinking about security issues at the outset. Read the entire article for the full list.

Vishal Ingole, July 7, 2017

Chan Zuckerberg Initiative to Wield Meta Search for Good

July 6, 2017

Mark Zuckerberg’s and Priscilla Chan’s philanthropic project, aptly named the Chan Zuckerberg Initiative (CZI), is beginning its mission with a compelling step—it has acquired Meta, a search engine built specifically for scientific research. TechCrunch examines the acquisition in, “Chan Zuckerberg Initiative Acquires, and Will Free Up, Science Search Engine Meta.”

Researchers face a uniquely mind-boggling amount of data in their work. The article notes, for example, that between 2,000 and 4,000 scientific papers are published daily in the field of biomedicine alone. The article includes a helpful one-and-a-half minute video explaining the platform’s capabilities. Reporter Josh Constine emphasizes:

What’s special about Meta is that its AI recognizes authors and citations between papers so it can surface the most important research instead of just what has the best SEO. It also provides free full-text access to 18,000 journals and literature sources. …

Meta, formerly known as Sciencescape, indexes entire repositories of papers like PubMed and crawls the web, identifying and building profiles for the authors while analyzing who cites or links to what. It’s effectively Google PageRank for science, making it simple to discover relevant papers and prioritize which to read. It even adapts to provide feeds of updates on newly published research related to your previous searches.
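The "PageRank for science" comparison can be illustrated with a minimal sketch of the textbook PageRank power iteration applied to a citation graph. This is not Meta's actual ranking algorithm, which the article does not describe; the paper identifiers, the damping value of 0.85, and the iteration count are assumptions chosen for the example.

```python
from typing import Dict, List


def pagerank(citations: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a mapping of paper -> papers it cites."""
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for paper in papers:
            cited = citations.get(paper, [])
            if cited:
                # A paper passes its rank on to the papers it cites.
                share = damping * rank[paper] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A paper that cites nothing spreads its rank evenly.
                for target in papers:
                    new_rank[target] += damping * rank[paper] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical papers: A and B both cite C, and C cites nothing.
    graph = {"A": ["C"], "B": ["C"], "C": []}
    for paper, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(paper, round(score, 3))
```

In this toy graph the most-cited paper, C, ends up with the highest score, which is the sense in which citation analysis can surface important research regardless of how well a page is optimized for search engines.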

The price CZI paid for the startup was not disclosed. Though Meta has charged some users in the past (for subscriptions or customizations), CEO Sam Molyneux promises the platform will be available for free once the transition is complete; he assures us:

Going forward, our intent is not to profit from Meta’s data and capabilities; instead we aim to ensure they get to those who need them most, across sectors and as quickly as possible, for the benefit of the world.

Molyneux posted a heartfelt letter detailing his company’s history and his hopes for the future, so the curious should take a gander. He and his sister Amy founded Meta in Toronto in 2010. Not surprisingly, they are currently hiring.

Cynthia Murrell, July 6, 2017

Deleting Yourself from the Internet Too Good to Be True

July 4, 2017

Most people find themselves saddled with online accounts going back decades and would gladly delete them if they could. Some people even wish they could delete all their accounts and cease to exist online. A new service, Deseat, promises just that. According to The Next Web,

Every account it finds gets paired with an easy delete link pointing to the unsubscribe page for that service. Within a few clicks you’re freed from it, and depending on how long you need to work through the entire list, you can be unwanted-account-free within the hour.

Theoretically, one could completely erase all trace of oneself from the all-knowing cyber web in the sky. But can it really be this easy?

Yes, eliminating outdated and unused accounts is a much-needed step in cleaning up one’s cyber identity, but we must question the validity of total elimination of one’s cyber identity in just a few clicks. Despite the website’s claim to “wipe your entire existence off the internet in a few clicks,” ridding the internet of one’s cyber footprints is probably not that easy.

Catherine Lamsfuss, July 4, 2017

DARPA Progresses on Refining Data Analysis

June 12, 2017

The ideal data analysis platform for global intelligence would take all the data in the world and rapidly make connections, alerting law enforcement or the military about potential events before they happen. It would also make it downright impossible for bad actors to hide their tracks. Our government seems to be moving toward that goal with AIDA, or Active Interpretation of Disparate Alternatives. DARPA discusses the project in its post, “DARPA Wades into Murky Multimedia Information Streams to Catch Big Meaning.” The agency states:

The goal of AIDA is to develop a multi-hypothesis ‘semantic engine’ that generates explicit alternative interpretations or meaning of real-world events, situations, and trends based on data obtained from an expansive range of outlets. The program aims to create technology capable of aggregating and mapping pieces of information automatically derived from multiple media sources into a common representation or storyline, and then generating and exploring multiple hypotheses about the true nature and implications of events, situations, and trends of interest.

‘It is a challenge for those who strive to achieve and maintain an understanding of world affairs that information from each medium is often analyzed independently, without the context provided by information from other media,’ said Boyan Onyshkevych, program manager in DARPA’s Information Innovation Office (I2O). ‘Often, each independent analysis results in only one interpretation, with alternate interpretations eliminated due to lack of evidence even in the absence of evidence that would contradict those alternatives. When these independent, impoverished analyses are combined, generally late in the analysis process, the result can be a single apparent consensus view that does not reflect a true consensus.’

AIDA’s goal of presenting an accurate picture of overall context early on will help avoid that problem. The platform is to assign a confidence level to each piece of information it processes and each hypothesis it generates. It will also, they hope, be able to correct for journalistic spin by examining variables and probabilities. Is the intelligence community about to gain an analysis platform capable of chilling accuracy?

Cynthia Murrell, June 12, 2017

Bibliophiles Have 25 Million Reasons to Smile

June 6, 2017

The US Library of Congress has released 25 million records from its collection online, and anyone with Internet access is free to use them.

According to a Science Alert article titled “The US Library of Congress Just Put 25 Million Records Online, Free of Charge”:

The bibliographic data sets, like digital library cards, cover music, books, maps, manuscripts, and more, and their publication online marks the biggest release of digital records in the Library’s history.

The Library of Congress has been on a digitization spree for some time, and users can expect more records to be made available online in the near future. The challenge, however, is retrieving the books or information that the user needs. The web interface is still complicated and not user-friendly. In short, the enterprise search function is a mess. What the Library of Congress really needs is a user-friendly and efficient way for bibliophiles to access its vast collection of knowledge.

Vishal Ingole, June 6, 2017
