December 9, 2016
I love Google. You love Google. Everyone loves Google so much that it has become a verb in practically every language. Google does present many problems, however, especially the inclusion of paid ads in search results, and Google searches are not academically credible. Researchers love Google’s ease of use, but no search engine exists that answers a simple question based on a few keywords, NLP, and citations (those are extremely important).
It is possible that a search engine designed for academia could exist, especially if it were subject specific and allowed full-text access to all results. The biggest barrier in the way of a complete academic search engine is that scholarly research is protected by copyright and most research is behind paywalls belonging to academic publishers, like Elsevier.
Elsevier is a notorious academic publisher: it produces great publications, but its digital subscriptions are expensive. The Mendeley Blog shares that Elsevier has answered the academic search engine cry: “Introducing Elsevier DataSearch.” The Elsevier DataSearch promises to search through reputable information repositories and help researchers accelerate their work.
DataSearch is still in its infancy and there is an open call for beta testers:
“DataSearch offers a new and innovative approach. Most search engines don’t actively involve their users in making them better; we invite you, the user, to join our User Panel and advise how we can improve the results. We are looking for users in a variety of fields, no technical expertise is required (though welcomed). In order to join us, visit https://datasearch.elsevier.com and click on the button marked ‘Join Our User Panel’.”
This is the right step forward for any academic publisher! One thing worries me, though: how much is the DataSearch engine going to cost users? I respect copyright and the need to make a profit, but I wish there were one all-encompassing academic database that was free or had a low-cost subscription plan.
Whitney Grace, December 9, 2016
December 9, 2016
Digital Reasoning has released the latest iteration of its Synthesys platform, we learn from Datanami’s piece, “Cognitive Platform Sharpens Focus on Untructured Data.” Readers may recall that Digital Reasoning provides tools to the controversial US Army intelligence system known as DCGS. The write-up specifies:
Version 4 of the Digital Reasoning platform released on Tuesday (June 21) is based on proprietary analytics tools that apply deep learning neural network techniques across text, audio and images. Synthesys 4 also incorporates behavioral analytics based on anomaly detection techniques.
The upgrade also reflects the company’s push into user and ‘entity’ behavior analytics, a technique used to leverage machine learning in security applications such as tracking suspicious activity on enterprise networks and detecting ransomware attacks. ‘We are especially excited to expand into the area of entity behavior analytics, combining the analysis of structured and unstructured data into a person-centric, prioritized profile that can be used to predict employees at risk for insider threats,’ Bill DiPietro, Digital Reasoning’s vice president of product management noted in a statement.
The platform has added Spanish and Chinese to its supported languages, which come with syntactic parsing. There is also now support for Elasticsearch, included in the pursuit of leveraging unstructured data in real time. The company emphasizes the software’s ability to learn from context, as well as enhanced tools for working with reports.
Digital Reasoning was founded in 2000, and makes its primary home in Nashville, Tennessee, with offices in Washington, DC, and London. The booming company is also hiring, especially in the Nashville area.
Cynthia Murrell, December 9, 2016
December 9, 2016
The article on DW titled Germany’s Highest Court Rejects Yahoo Content Payment Case reports that Yahoo’s fight against paying publishers for displaying their content has been sent back to the lower courts. Yahoo claims that the new copyright laws limit access to information. The article explains,
The court, in the western city of Karlsruhe, said on Wednesday that Yahoo hadn’t exhausted its legal possibilities in lower courts and should turn to them first. The decision suggests Yahoo could now take its case to the civil law courts. The judges didn’t rule on the issue itself, which also affects rival search engine companies…. Germany revised its copyright laws in August 2013 allowing media companies to request payment from search engines that use more than snippets of their content.
The article points out that the new law fails to define “snippet.” Does it mean a few sentences or a few paragraphs? The article doesn’t go into much detail on how this major oversight was possible. The outcome of the case will certainly affect Google as well as Yahoo. Since Yahoo’s summer sale of its principal online asset to Verizon, a new direction has emerged for the company. Verizon aims to forge a Yahoo brand that can compete in online advertising with the likes of Google and Facebook.
Chelsea Kerwin, December 9, 2016
December 7, 2016
The impact of Google on our lives is clear from the company’s name being used colloquially as a verb. However, Quantum Run reminds us that this impact is quantifiable in its piece called All hail Google. Google owns 80% of the smartphone market with over a billion Android devices. Gmail tallies 420 million users and Chrome has 800 million. Also, YouTube, which Google owns, has one billion users. An interesting factoid the article pairs with these stats is that 94% of students equate Google with research. The article notes:
The American Medical Association voices their concerns over relying on search engines, saying, “Our concern is the accuracy and trustworthiness of content that ranks well in Google and other search engines. Only 40 percent of teachers say their students are good at assessing the quality and accuracy of information they find via online research. And as for the teachers themselves, only five percent say ‘all/almost all’ of the information they find via search engines is trustworthy — far less than the 28 percent of all adults who say the same.
Apparently, cyberchondria is a thing. The article correctly points to the content housed on the deep web and the Dark Web as untouched by Google. The major question sparked by this article is whether we can trust all the fancy numbers Quantum Run has reported.
Megan Feil, December 7, 2016
December 7, 2016
Google search results are supposed to be objective and accurate. The key word in that sentence is objective, but studies have shown that algorithms can be just as biased as the humans who design them. One would think that Google, one of the most popular search engines in the world, would have discovered how to program objective algorithms, but according to the International Business Times, “Google Search Results Tend To Have Liberal Bias That Could Influence Public Opinion.”
Did you ever hear Uncle Ben’s advice to Spider-Man, “With great power comes great responsibility”? This advice rings true for big corporations, such as Google, that influence public opinion. CanIRank.com conducted a study that discovered searches using political terms displayed more pages with a liberal view than a conservative one. What does Google have to say about it?
The Alphabet-owned company has denied any bias and told the Wall Street Journal: ‘From the beginning, our approach to search has been to provide the most relevant answers and results to our users, and it would undermine people’s trust in our results, and our company, if we were to change course.’ The company maintains that its search results are based on algorithms using hundreds of factors which reflect the content and information available on the Internet. Google has never made its algorithm for determining search results completely public even though over the years researchers have tried to put their reasoning to it.
This is not the first time Google has been accused of a liberal bias in its search results. The consensus is that the liberal leanings are unintentional and reflect the amount of liberal content on the Web.
What is the truth? Only the Google gods know.
Whitney Grace, December 7, 2016
December 7, 2016
A Canadian journalist, Tom Spears, managed to publish a heavily plagiarized paper in science journals by paying some cash. Getting published in a scientific or medical journal helps advance a career.
In an article published by SlashDot titled Science Journals Caught Publishing Fake Research For Cash, the author says:
In 2014, journalist Tom Spears intentionally wrote “the world’s worst science research paper…a mess of plagiarism and meaningless garble” — then got it accepted by eight different journals. He did it to expose journals which follow the publish-for-a-fee model, “a fast-growing business that sucks money out of research, undermines genuine scientific knowledge, and provides fake credentials for the desperate.”
This is akin to students enlisting the services of hackers over the Dark Web to manipulate their grades and attendance records. However, in this case, there is no need for the Dark Web or the Tor browser. Paying some cash is sufficient.
The root of the problem can be traced to OMICS International, an India-based publishing firm that is buying the publishers of these medical journals and publishing whatever is sent to them for cash. In standard practice, a paper needs to be peer-reviewed and checked for plagiarism before it is published. As written earlier, the line separating the Dark and Open Web seems to be thinning and one day will disappear altogether.
Vishal Ingole, December 7, 2016
December 6, 2016
“Associative semantic” sounds like a new mental diagnosis for the DSM-V (Diagnostic and Statistical Manual of Mental Disorders), but it is actually the name of a search technology that sounds like it amplifies basic semantic search. Aistemos has the rundown on the new search technology in the article, “Associative Semantic Search Technology: Omnity And IP.” Omnity is the purveyor of the “associative semantic search” and it makes the standard big data promise:
…the discovery of otherwise hidden, high-value patterns of interconnection within and between fields of knowledge as diverse as science, medicine, engineering, law and finance.
All of the companies centered on big data have this same focus or something similar, so what does Omnity offer that makes it stand out? It proposes to find connections between documents that do not directly correlate or cite one another. Omnity uses the word “accelerate” to explain how it will discover hidden patterns and expand knowledge. The implication is that semantic search would once again be augmented and made more accurate.
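To make the idea of linking documents that do not cite one another a little more concrete, here is a minimal Python sketch. It is not Omnity’s method, which the article does not describe; it simply assumes a plain bag-of-words cosine similarity, and the toy corpus, the `cites` field, the `hidden_links` helper, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; "cites" lists explicit citations to other documents.
DOCS = {
    "patent_a": {"text": "Graphene membrane for water desalination and filtration",
                 "cites": []},
    "paper_b": {"text": "Desalination of sea water using a graphene oxide membrane",
                "cites": []},
    "memo_c": {"text": "Quarterly finance report on bond yields",
               "cites": ["patent_a"]},
}

def term_vector(text):
    """Count lowercase alphabetic tokens in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hidden_links(docs, threshold=0.3):
    """Return pairs that share vocabulary but have no citation between them."""
    vectors = {name: term_vector(doc["text"]) for name, doc in docs.items()}
    links = []
    for x, y in combinations(sorted(docs), 2):
        cited = y in docs[x]["cites"] or x in docs[y]["cites"]
        score = cosine(vectors[x], vectors[y])
        if not cited and score >= threshold:
            links.append((x, y, round(score, 2)))
    return links

print(hidden_links(DOCS))
```

In this toy corpus the patent and the paper never cite each other, yet their shared vocabulary surfaces them as a candidate connection, while the memo, which does cite the patent, is skipped. A real system would presumably need far richer signals than raw word overlap.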
Any industry that relies on detailed documents would benefit:
Such a facility would presumably enable someone to find references to relevant patents, technologies and prior art on a far wider scale than has hitherto been the case. The legal, strategic and commercial implications of being able to do this, for litigation, negotiation, due diligence, investment and forward planning are sufficiently obvious for us not to need to list them here.
The article suggests those who would be most interested in Omnity are intellectual property businesses. I can imagine that academics would not mind getting their hands on associative semantic search to power their research, and law enforcement could use it to fight crime.
Whitney Grace, December 6, 2016
December 5, 2016
There’s text search and image search, but soon, searching may be done via hand-drawn sketches. Digital Trends released a story, Forget keywords — this new system lets you search with rudimentary sketches, which covers an emerging technology. Two researchers at Queen Mary University of London’s (QMUL) School of Electronic Engineering and Computer Science taught a deep learning neural network to recognize queries in the form of sketches and then return matches in the form of products. Sketch search may have an advantage over both text and image search:
“Both of those search modalities have problems,” he says. “Text-based search means that you have to try and describe the item you are looking for. This is especially difficult when you want to describe something at length, because retrieval becomes less accurate the more text you type. Photo-based search, on the other hand, lets you take a picture of an item and then find that particular product. It’s very direct, but it is also overly constrained, allowing you to find just one specific product instead of offering other similar items you may also be interested in.”
This search technology is positioning itself for online retail commerce, and perhaps only for users with the ability to sketch? Yes, why read? Drawing pictures works really well for everyone. We think this might present monetization opportunities for Pinterest.
Megan Feil, December 5, 2016
December 5, 2016
In the United States, Google dominates the Internet search market. Bing has gained some traction, but the results are still muddy. In Russia, Yandex chases Google around in circles, but what about the enterprise search market? The enterprise search market has more competition than one would think. We recently received an email from SearchBlox, a cognitive platform developed to help organizations embed information in applications using artificial intelligence and deep learning models. SearchBlox is also a player in the enterprise software market, as well as a text analytics and sentiment analysis tool.
Their email offered “3 Reasons To Choose SearchBlox Cognitive Platform,” and here they are:
1. EPISTEMOLOGY-BASED. Go beyond just question and answers. SearchBlox uses artificial intelligence (AI) and deep learning models to learn and distill knowledge that is unique to your data. These models encapsulate knowledge far more accurately than any rules based model can create.
2. SMART OPERATION. Building a model is half the challenge. Deploying a model to process big data can be even more challenging. SearchBlox is built on open source technologies like Elasticsearch and Apache Storm and is designed to use its custom models for processing high volumes of data.
3. SIMPLIFIED INTEGRATION. SearchBlox is bundled with over 75 data connectors supporting over 40 file formats. This dramatically reduces the time required to get your data into SearchBlox. The REST API and the security capabilities allow external applications to easily embed the cognitive processing.
To us, this sounds like what enterprise search has been offering since before big data and artificial intelligence became buzzwords. Not to mention, SearchBlox’s competitors have said the same thing. What makes SearchBlox different? The company claims to be less expensive and has won several accolades. SearchBlox is built on open source technology, which allows it to lower the price. Elasticsearch is the most popular open source search software, but what is funny is that SearchBlox is like a repackaged version of said Elasticsearch. Mind you, you are paying for a program that is already developed, but SearchBlox is trying to compete with other outfits like Yippy.
Whitney Grace, December 5, 2016
December 5, 2016
An analytics company that collects crime-related data from local law enforcement agencies plans to help reduce crime rates by using Big Data.
CrimeReports.com, in its FAQs, says:
The data on CrimeReports is sent on an hourly, daily, or weekly basis from more than 1000 participating agencies to the CrimeReports map. Each agency controls their data flow to CrimeReports, including how often they send data, which incidents are included.
Very little is known about the service provider. WhoIs Lookup indicates that though the domain was registered way back in 1999, it was updated a few days ago, on November 25, 2016, and is valid until November 2, 2017.
CrimeReports is linked to local law enforcement agencies that selectively share their crime data with the analytics firm. After some number crunching, the service provider then sends the data to its subscribers via email. According to the firm:
Although no formal, third-party study has been commissioned, there is anecdotal evidence to suggest that public-facing crime mapping—by keeping citizens informed about crime in their area—helps them be more vigilant and implement crime prevention efforts in their homes, workplaces, and communities. In addition, there is anecdotal evidence to suggest that public-facing crime mapping fosters more trust in local law enforcement by members of the community.
To maintain data integrity, the data is collected only through official channels. The crime details are not comprehensive; rather, they are redacted to protect the privacy of victims and criminals. As of now, CrimeReports gets paid by law enforcement agencies. Certainly, this is something new and probably never tried before.
Vishal Ingole, December 5, 2016