The Information Not Accuracy Age

December 7, 2016

Google’s impact on our lives is evident in the colloquial use of the company’s name as a verb. Quantum Run, however, quantifies that impact in its piece called All hail Google. Google owns 80% of the smartphone market, with over a billion Android devices. Gmail has 420 million users and Chrome has 800 million. YouTube, which Google owns, has one billion users. An interesting factoid the article pairs with these stats is that 94% of students equate Google with research. The article notes:

The American Medical Association voices their concerns over relying on search engines, saying, “Our concern is the accuracy and trustworthiness of content that ranks well in Google and other search engines.” Only 40 percent of teachers say their students are good at assessing the quality and accuracy of information they find via online research. And as for the teachers themselves, only five percent say ‘all/almost all’ of the information they find via search engines is trustworthy — far less than the 28 percent of all adults who say the same.

Apparently, cyberchondria is a thing. The article correctly points out that content housed on the deep web and the Dark Web is untouched by Google. The major question this article sparks is whether the fancy numbers Quantum Run has reported are themselves valid.

Megan Feil, December 7, 2016

Google Search Results Are Politically Biased

December 7, 2016

Google search results are supposed to be objective and accurate.  The key word in the last sentence was objective, but studies have proven that algorithms can be just as biased as the humans who design them.  One would think that Google, one of the most popular search engines in the world, would have discovered how to program objective algorithms, but according to the International Business Times, “Google Search Results Tend To Have Liberal Bias That Could Influence Public Opinion.”

Did you ever hear Uncle Ben’s advice to Spider-Man, “With great power comes great responsibility”?  This advice rings true for big corporations, such as Google, that influence public opinion.  CanIRank.com conducted a study that discovered searches using political terms displayed more pages with a liberal view than a conservative one. What does Google have to say about it?

The Alphabet-owned company has denied any bias and told the Wall Street Journal: ‘From the beginning, our approach to search has been to provide the most relevant answers and results to our users, and it would undermine people’s trust in our results, and our company, if we were to change course.’  The company maintains that its search results are based on algorithms using hundreds of factors which reflect the content and information available on the Internet. Google has never made its algorithm for determining search results completely public even though over the years researchers have tried to put their reasoning to it.

This is not the first time Google has been accused of a liberal bias in its search results.  The consensus is that the liberal leanings are unintentional and are an actual reflection of the amount of liberal content on the Web.

What is the truth?  Only the Google gods know.

Whitney Grace, December 7, 2016

Physiognomy for the Modern Age

December 6, 2016

Years ago, when I first learned about the Victorian-age pseudosciences of physiognomy and phrenology, I remember thinking how glad I was that society had evolved past such nonsense. It appears I was mistaken; the basic concept was just waiting for technology to evolve before popping back up, we learn from NakedSecurity’s article, “‘Faception’ Software Claims It Can Spot Terrorists, Pedophiles, Great Poker Players.”  Based in Israel, Faception calls its technique “facial personality profiling.” Writer Lisa Vaas reports:

The Israeli startup says it can take one look at you and recognize facial traits undetectable to the human eye: traits that help to identify whether you’ve got the face of an expert poker player, a genius, an academic, a pedophile or a terrorist. The startup sees great potential in machine learning to detect the bad guys, claiming that it’s built 15 classifiers to evaluate certain traits with 80% accuracy. … Faception has reportedly signed a contract with a homeland security agency in the US to help identify terrorists.

The article emphasizes how problematic it can be to rely on AI systems to draw conclusions, citing University of Washington professor and “Master Algorithm” author Pedro Domingos:

As he told The Washington Post, a colleague of his had trained a computer system to tell the difference between dogs and wolves. It did great. It achieved nearly 100% accuracy. But as it turned out, the computer wasn’t sussing out barely perceptible canine distinctions. It was just looking for snow. All of the wolf photos featured snow in the background, whereas none of the dog pictures did. A system, in other words, might come to the right conclusions, for all the wrong reasons.

Indeed. Faception suggests that, for this reason, their software would be but one factor among many in any collection of evidence. And, perhaps it would—for most cases, most of the time. We join Vaas in her hope that government agencies will ultimately refuse to buy into this modern twist on Victorian-age pseudoscience.

Cynthia Murrell, December 6, 2016

 

Search Competition Is Fiercer Than You Expect

December 5, 2016

In the United States, Google dominates the Internet search market.  Bing has gained some traction, but the results are still muddy.  In Russia, Yandex chases Google around in circles, but what about the enterprise search market?  The enterprise search market has more competition than one would think.  We recently received an email from SearchBlox, a cognitive platform developed to help organizations embed information in applications using artificial intelligence and deep learning models.  SearchBlox is also a player in the enterprise software market, as well as a text analytics and sentiment analysis tool.

The email offered “3 Reasons To Choose SearchBlox Cognitive Platform,” and here they are:

1. EPISTEMOLOGY-BASED. Go beyond just question and answers. SearchBlox uses artificial intelligence (AI) and deep learning models to learn and distill knowledge that is unique to your data. These models encapsulate knowledge far more accurately than any rules based model can create.

2. SMART OPERATION. Building a model is half the challenge. Deploying a model to process big data can be even more challenging. SearchBlox is built on open source technologies like Elasticsearch and Apache Storm and is designed to use its custom models for processing high volumes of data.

3. SIMPLIFIED INTEGRATION. SearchBlox is bundled with over 75 data connectors supporting over 40 file formats. This dramatically reduces the time required to get your data into SearchBlox. The REST API and the security capabilities allow external applications to easily embed the cognitive processing.

To us, this sounds like what enterprise search has been offering since before big data and artificial intelligence became buzzwords.  Not to mention, SearchBlox’s competitors have said the same thing.  What makes SearchBlox different?  The company claims to be less expensive, and it has won several accolades.  SearchBlox is built on open source technology, which allows it to lower the price.  Elasticsearch is the most popular open source search software, but what is funny is that SearchBlox is like a repackaged version of said Elasticsearch.  Mind you, you are paying for a program that is already developed, but SearchBlox is trying to compete with other outfits like Yippy.

Whitney Grace, December 5, 2016

Comprehensive Search System Atlas Recall Enters Open Beta

December 1, 2016

We learn about a new way to search nearly everything one has encountered digitally from TechCrunch’s article, “Atlas Recall, a Search Engine for Your Entire Digital Life, Gets an Open Beta and $20M in Backing.” The platform is the idea of Atlas Informatics CEO, and Napster co-founder, Jordan Ritter, a man after our own hearts. When given funding and his pick of projects, Ritter says, he “immediately” chose to improve the search experience.

The approach the Atlas team has devised may not be for everyone. It keeps track of everything users bring up on their computers and mobile devices (except things they specifically tell it not to track). It brings together data from disparate places like one’s Facebook, Outlook, Spotlight, and Spotify accounts and makes the data available from one cloud-based dashboard.

This does sound extremely convenient, and I don’t doubt the company’s claim that it can save workers hours every week. However, imagine how much damage a bad actor could do if, hypothetically, they were able to get in and search for, say, “account number” or “eyes only.” Make no mistake, security is a top priority for Atlas, and sensible privacy measures are in place. Besides, the company vows, they will not sell tailored (or any) advertising, and are very clear that each user owns their data. Furthermore, Atlas maintains they will have access to metadata, not the actual contents of users’ files.

Perhaps for those who already trust the cloud with much of their data, this arrangement is an acceptable risk. For those potential users, contributor Devin Coldewey describes Atlas Recall:

Not only does it keep track of all those items [which you have viewed] and their contents, but it knows the context surrounding them. It knows when you looked at them, what order you did so in, what other windows and apps you had open at the same time, where you were when you accessed it, who it was shared with before, and tons of other metadata.

The result is that a vague search, say ‘Seahawks game,’ will instantly produce all the data related to it, regardless of what silo it happens to be in, and presented with the most relevant stuff first. In that case maybe it would be the tickets you were emailed, then nearby, the plans you made over email with friends to get there, the Facebook invite you made, the articles you were reading about the team, your fantasy football page. Click on any of them and it takes you straight there. …

When you see it in action, it’s easy to imagine how quickly it could become essential. I happen to have a pretty poor memory, but even if I didn’t, who wants to scrub through four different web apps at work trying to find that one PDF? Wouldn’t it be nice to just type in a project name and have everything related to it — from you and from coworkers — pop up instantly, regardless of where it ‘lives’?

The main Atlas interface can be integrated with other search engines like Google and Spotlight, so users can see aggregated results when they use those, too. Interested readers may want to navigate to the article and view the embedded sales video, shorter than two minutes, which illustrates the platform. If you’re interested in the beta, you can sign up here (scroll down to “When can I start using Atlas?”). Founded in 2015, Atlas Informatics is based in Seattle. As of this writing, they are also hiring developers and engineers.

Cynthia Murrell, December 01, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Iran-Russia Ink Pact for Search Engine Services

November 28, 2016

Owing to geopolitical differences, countries like Iran are turning toward like-minded nations like Russia for technological developments. A Russian diplomat posted in Iran recently announced that home-grown search engine service provider Yandex will offer its services to the people of Iran.

Financial Tribune, in a news report titled “Yandex to Arrive Soon,” said:

Last October, Russian and Iranian communications ministers Nikolay Nikiforov and Mahmoud Vaezi respectively signed a deal to expand bilateral technological collaborations. During the meeting, Russian Ambassador Vaezi said, We are familiar with the powerful Russian search engine Yandex. We agreed that Yandex would open an office in Iran. The system will be adapted for the Iranian people and will be in Persian.

Iran has traditionally been an extremist nation at the center of numerous international controversies, which indirectly bar American corporations from conducting business in this hostile territory. Russia, on the other hand, which is seen as a foe of the US, stands to gain from these sour relations.

As of now, .com and .com.tr domains owned by Yandex are banned in Iran, but with the MoU signed, that will change soon. There is another interesting point to be observed in this news piece:

Looking at Yandex.ir, an official reportedly working for IRIB purchased the website, according to a domain registration search.  DomainTools, a portal that lists the owners of websites, says Mohammad Taqi Mozouni registered the domain address back in July.

Technically, and by widely accepted international convention, no individual or organization can own, without the necessary permissions, a domain name under any extension for a company that has already carved out a niche for itself online. It is thus worth pondering what prompted a Russian search engine giant to let a foreign governmental agency acquire its domain name.

Vishal Ingole, November 28, 2016

Word Embedding Captures Semantic Relationships

November 10, 2016

The article on O’Reilly titled “Capturing Semantic Meanings Using Deep Learning” explores word embedding in natural language processing. NLP systems typically encode word strings, but word embedding offers a more complex approach that emphasizes relationships and similarities between words by treating them as vectors. The article posits,

For example, let’s take the words woman, man, queen, and king. We can get their vector representations and use basic algebraic operations to find semantic similarities. Measuring similarity between vectors is possible using measures such as cosine similarity. So, when we subtract the vector of the word man from the vector of the word woman, then its cosine distance would be close to the distance between the word queen minus the word king (see Figure 1).

The article surveys neural network models that reduce the expense of working with large data. Word2Vec, with its CBOW and continuous skip-gram architectures, is touted, and the article goes into great technical detail about the entire process. The final result is that the vectors capture the semantic relationships between the words in the example. Why does this approach to NLP matter? A few applications include predicting future business applications, sentiment analysis, and semantic image searches.
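For readers who want to see the vector arithmetic in miniature, here is a rough Python sketch. It is not taken from the O’Reilly article: the three-dimensional vectors below are hand-made toy values (loosely "royalty," "male," "female") chosen for illustration, whereas real embeddings are learned from text and have hundreds of dimensions. The helper names are our own.

```python
import math

# Toy, hand-made "embeddings" (an assumption for illustration only).
# Dimensions, loosely: royalty, male, female.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def subtract(a, b):
    """Element-wise difference of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_word(target, exclude=()):
    """Return the vocabulary word whose vector is most similar to target."""
    candidates = [w for w in VECTORS if w not in exclude]
    return max(candidates, key=lambda w: cosine_similarity(VECTORS[w], target))

# The "gender direction" should look alike for both word pairs.
woman_minus_man = subtract(VECTORS["woman"], VECTORS["man"])
queen_minus_king = subtract(VECTORS["queen"], VECTORS["king"])
direction_similarity = cosine_similarity(woman_minus_man, queen_minus_king)

# The classic analogy: king - man + woman should land near queen.
analogy = add(subtract(VECTORS["king"], VECTORS["man"]), VECTORS["woman"])
analogy_word = nearest_word(analogy, exclude=("king", "man", "woman"))

print(direction_similarity, analogy_word)
```

With these toy values the two difference vectors point in the same direction, which is the relationship the quoted passage describes. With real learned embeddings the match is approximate rather than exact.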

Chelsea Kerwin,  November 10, 2016

Google Introduces Fact Checking Tool

October 26, 2016

If it works as advertised, a new Google feature will be welcomed by many users. World News Report tells us, “Google Introduced Fact Checking Feature Intended to Help Readers See Whether News Is Actually True—Just in Time for US Elections.” The move is part of a trend for websites, which seem to have recognized that savvy readers don’t just believe everything they read. Writer Peter Woodford reports:

Through an algorithmic process from schema.org known as ClaimReview, live stories will be linked to fact checking articles and websites. This will allow readers to quickly validate or debunk stories they read online. Related fact-checking stories will appear onscreen underneath the main headline. The example Google uses shows a headline over passport checks for pregnant women, with a link to Full Fact’s analysis of the issue. Readers will be able to see if stories are fake or if claims in the headline are false or being exaggerated. Fact check will initially be available in the UK and US through the Google News site as well as the News & Weather apps for both Android and iOS. Publishers who wish to become part of the new service can apply to have their sites included.
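As a side note, ClaimReview is a schema.org type that fact-checking publishers can embed in their pages as structured data. A minimal Python sketch of what such markup might look like follows. The property names come from the schema.org vocabulary, but the helper function, the claim text, the URL, the organization name, and the rating values are all hypothetical placeholders, and the sketch makes no claim about how Google actually consumes the markup.

```python
import json

def build_claim_review(claim, review_url, reviewer, verdict, rating,
                       best=5, worst=1):
    """Build a schema.org ClaimReview object as a plain dict.

    Hypothetical helper for illustration; only the property names are
    taken from the schema.org vocabulary.
    """
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": review_url,               # where the fact check is published
        "claimReviewed": claim,          # the claim being checked
        "author": {"@type": "Organization", "name": reviewer},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating,       # numeric score on the reviewer's scale
            "bestRating": best,
            "worstRating": worst,
            "alternateName": verdict,    # human-readable verdict
        },
    }

# All values below are placeholders, not real fact checks.
markup = build_claim_review(
    claim="Example claim text from a news headline",
    review_url="https://factchecker.example.com/reviews/example-claim",
    reviewer="Example Fact Checker",
    verdict="Mostly false",
    rating=2,
)

# Serialized as JSON-LD, this is what a publisher would embed in a page.
json_ld = json.dumps(markup, indent=2)
print(json_ld)
```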

Woodford points to Facebook’s recent trouble with the truth within its Trending Topics feature and observes that many people are concerned about the lack of honesty on display this particular election cycle. Google, wisely, did not mention any candidates, but Woodford notes that Politifact rates 71% of Trump’s statements as false (and, I would add, 27% of Secretary Clinton’s statements as false. Everything is relative.)  If the trend continues, it will be prudent for all citizens to rely on (unbiased) fact-checking tools on a regular basis.

Cynthia Murrell, October 26, 2016

National Geographic Quad View

October 13, 2016

Google Maps and other map tools each have their unique features, but some are better than others at helping you find your way.  However, most of these online map tools have the same basic function and information.  While they can help you if you are lost, they are not that useful for topography.  National Geographic comes to our rescue with free topographic PDFs.  Check them out at PDF Quads.

Here are the details straight from the famous nature magazine:

National Geographic has built an easy to use web interface that allows anyone to quickly find any quad in the country for downloading and printing. Each quad has been pre-processed to print on a standard home, letter size printer. These are the same quads that were printed by USGS for decades on giant bus-sized presses but are now available in multi-page PDFs that can be printed just about anywhere. They are pre-packaged using the standard 7.5 minute, 1:24,000 base but with some twists.

How can there be twists in a topographic map?  They are not really that surprising, just explanations about how the images are printed out.  Page one is an overview map, pages two through five are standard topographic maps sized to print on regular paper, and hill shading is added to provide the maps with more detail.

Not everyone uses topographic maps, but a precise tool is invaluable to those who do.

Whitney Grace, October 13, 2016

Lexmark Upgrades Its Enterprise Search

September 30, 2016

Enterprise search has taken a back seat to search news regarding Google’s next endeavor and what the next big thing is in big data.  Enterprise search may have taken a back seat in my news feed, but it is still a major component in enterprise systems.  You can even speculate that without a search function, enterprise systems are useless.

Lexmark, one of the largest suppliers of printers and business solutions in the country, understands the importance of enterprise search.  This is why it recently updated the description of Perceptive Enterprise Search in the system’s technical specifications:

Perceptive Enterprise Search is a suite of enterprise applications that offer a choice of options for high performance search and mobile information access. The technical specifications in this document are specific to Perceptive Enterprise Search version 10.6…

A required amount of memory and disk space is provided. You must meet these requirements to support your Perceptive Enterprise Search system. These requirements specifically list the needs of Perceptive Enterprise Search and do not include any amount of memory or disk space you require for the operating system, environment, or other software that runs on the same machine.

Some technical specifications also provide recommendations. While requirements define the minimum system required to run Perceptive Enterprise Search, the recommended specifications serve as suggestions to improve the performance of your system. For maximum performance, review your specific environment, network, and platform capabilities and analyze your planned business usage of the system. Your specific system may require additional resources above these recommendations.

It is pretty standard fare when it comes to technical specifications. In other words, it is not that interesting, but it is necessary to make the enterprise system work correctly.

Whitney Grace, September 30, 2016
