Physiognomy for the Modern Age

December 6, 2016

Years ago, when I first learned about the Victorian-age pseudosciences of physiognomy and phrenology, I remember thinking how glad I was that society had evolved past such nonsense. It appears I was mistaken; the basic concept was just waiting for technology to evolve before popping back up, we learn from NakedSecurity’s article, “’Faception’ Software Claims It Can Spot Terrorists, Pedophiles, Great Poker Players.” Based in Israel, Faception calls its technique “facial personality profiling.” Writer Lisa Vaas reports:

The Israeli startup says it can take one look at you and recognize facial traits undetectable to the human eye: traits that help to identify whether you’ve got the face of an expert poker player, a genius, an academic, a pedophile or a terrorist. The startup sees great potential in machine learning to detect the bad guys, claiming that it’s built 15 classifiers to evaluate certain traits with 80% accuracy. … Faception has reportedly signed a contract with a homeland security agency in the US to help identify terrorists.

The article emphasizes how problematic it can be to rely on AI systems to draw conclusions, citing University of Washington professor and “Master Algorithm” author Pedro Domingos:

As he told The Washington Post, a colleague of his had trained a computer system to tell the difference between dogs and wolves. It did great. It achieved nearly 100% accuracy. But as it turned out, the computer wasn’t sussing out barely perceptible canine distinctions. It was just looking for snow. All of the wolf photos featured snow in the background, whereas none of the dog pictures did. A system, in other words, might come to the right conclusions, for all the wrong reasons.

Indeed. Faception suggests that, for this reason, its software would be but one factor among many in any collection of evidence. And perhaps it would be—for most cases, most of the time. We join Vaas in her hope that government agencies will ultimately refuse to buy into this modern twist on Victorian-age pseudoscience.
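The snow-shortcut failure mode Domingos describes is easy to reproduce. In this toy sketch (all feature names and data are invented for illustration), a one-feature “classifier” that simply picks the most predictive attribute latches onto the background, not the animal, and gets a wolf wrong the moment the snow is gone:

```python
# Toy dataset: every training wolf happens to be photographed in snow,
# and no dog is. The labels correlate perfectly with the background.
train = [
    ({"snow": 1, "pointed_ears": 1}, "wolf"),
    ({"snow": 1, "pointed_ears": 0}, "wolf"),
    ({"snow": 0, "pointed_ears": 1}, "dog"),
    ({"snow": 0, "pointed_ears": 0}, "dog"),
]

def best_stump(data):
    """Pick the single feature that best separates the training labels."""
    features = data[0][0].keys()
    def accuracy(feat):
        return sum((x[feat] == 1) == (y == "wolf") for x, y in data) / len(data)
    return max(features, key=accuracy)

feature = best_stump(train)
print(feature)  # 'snow' -- 100% training accuracy, for all the wrong reasons

# A wolf photographed on grass fools the snow-based rule:
wolf_on_grass = {"snow": 0, "pointed_ears": 1}
prediction = "wolf" if wolf_on_grass[feature] == 1 else "dog"
print(prediction)  # 'dog'
```

The stump scores 100% on the biased training set, which is exactly why accuracy alone is a misleading yardstick for systems like Faception’s classifiers.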

Cynthia Murrell, December 6, 2016

 

Search Competition Is Fiercer Than You Expect

December 5, 2016

In the United States, Google dominates the Internet search market. Bing has gained some traction, but the results are still muddy. In Russia, Yandex chases Google around in circles. But what about the enterprise search market? It has more competition than one would think. We recently received an email from SearchBlox, a cognitive platform developed to help organizations embed information in applications using artificial intelligence and deep learning models. SearchBlox is also a player in the enterprise software market, as well as a text analytics and sentiment analysis tool.

Their email explained, “3 Reasons To Choose SearchBlox Cognitive Platform” and here they are:

1. EPISTEMOLOGY-BASED. Go beyond just questions and answers. SearchBlox uses artificial intelligence (AI) and deep learning models to learn and distill knowledge that is unique to your data. These models encapsulate knowledge far more accurately than any rules-based model can.

2. SMART OPERATION. Building a model is half the challenge. Deploying a model to process big data can be even more challenging. SearchBlox is built on open source technologies like Elasticsearch and Apache Storm and is designed to use its custom models for processing high volumes of data.

3. SIMPLIFIED INTEGRATION. SearchBlox is bundled with over 75 data connectors supporting over 40 file formats. This dramatically reduces the time required to get your data into SearchBlox. The REST API and the security capabilities allow external applications to easily embed the cognitive processing.

To us, this sounds like what enterprise search was offering even before big data and artificial intelligence became buzzwords. Not to mention, SearchBlox’s competitors have said the same thing. What makes SearchBlox different? The company claims to be less expensive, and it has won several accolades. SearchBlox is built on open source technology, which allows it to keep the price down. Elasticsearch is the most popular open source search software, but, amusingly, SearchBlox looks rather like a repackaged version of that very Elasticsearch. Mind you, you are paying for a program that is already developed, but SearchBlox is trying to compete with other outfits like Yippy.

Whitney Grace, December 5, 2016

Comprehensive Search System Atlas Recall Enters Open Beta

December 1, 2016

We learn about a new way to search nearly everything one has encountered digitally from TechCrunch’s article, “Atlas Recall, a Search Engine for Your Entire Digital Life, Gets an Open Beta and $20M in Backing.” The platform is the idea of Atlas Informatics CEO, and Napster co-founder, Jordan Ritter, a man after our own hearts. When given funding and his pick of projects, Ritter says, he “immediately” chose to improve the search experience.

The approach the Atlas team has devised may not be for everyone. It keeps track of everything users bring up on their computers and mobile devices (except things they specifically tell it not to). It brings together data from disparate places like one’s Facebook, Outlook, Spotlight, and Spotify accounts and makes the data available from one cloud-based dashboard.

This does sound extremely convenient, and I don’t doubt the company’s claim that it can save workers hours every week. However, imagine how much damage a bad actor could do if, hypothetically, they were able to get in and search for, say, “account number” or “eyes only.” Make no mistake, security is a top priority for Atlas, and sensible privacy measures are in place. The company also vows it will not sell tailored (or any) advertising and is very clear that each user owns their data. Furthermore, Atlas maintains it will have access only to metadata, not the actual contents of users’ files.

Perhaps for those who already trust the cloud with much of their data, this arrangement is an acceptable risk. For those potential users, contributor Devin Coldewey describes Atlas Recall:

Not only does it keep track of all those items [which you have viewed] and their contents, but it knows the context surrounding them. It knows when you looked at them, what order you did so in, what other windows and apps you had open at the same time, where you were when you accessed it, who it was shared with before, and tons of other metadata.

The result is that a vague search, say ‘Seahawks game,’ will instantly produce all the data related to it, regardless of what silo it happens to be in, and presented with the most relevant stuff first. In that case maybe it would be the tickets you were emailed, then nearby, the plans you made over email with friends to get there, the Facebook invite you made, the articles you were reading about the team, your fantasy football page. Click on any of them and it takes you straight there. …

When you see it in action, it’s easy to imagine how quickly it could become essential. I happen to have a pretty poor memory, but even if I didn’t, who wants to scrub through four different web apps at work trying to find that one PDF? Wouldn’t it be nice to just type in a project name and have everything related to it — from you and from coworkers — pop up instantly, regardless of where it ‘lives’?

The main Atlas interface can be integrated with other search engines like Google and Spotlight, so users can see aggregated results when they use those, too. Interested readers may want to navigate to the article and view the embedded sales video, shorter than two minutes, which illustrates the platform. If you’re interested in the beta, you can sign up here (scroll down to “When can I start using Atlas?”). Founded in 2015, Atlas Informatics is based in Seattle. As of this writing, they are also hiring developers and engineers.

Cynthia Murrell, December 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Iran-Russia Ink Pact for Search Engine Services

November 28, 2016

Owing to geopolitical differences, countries like Iran are turning toward like-minded nations like Russia for technological development. The Russian diplomat posted in Iran recently announced that home-grown search engine service provider Yandex will offer its services to the people of Iran.

Financial Tribune, in a news report titled “Yandex to Arrive Soon,” said:

Last October, Russian and Iranian communications ministers Nikolay Nikiforov and Mahmoud Vaezi respectively signed a deal to expand bilateral technological collaboration. During the meeting, Vaezi said, “We are familiar with the powerful Russian search engine Yandex. We agreed that Yandex would open an office in Iran. The system will be adapted for the Iranian people and will be in Persian.”

Iran traditionally has been an extremist nation at the center of numerous international controversies, a situation that indirectly bars American corporations from conducting business in this hostile territory. On the other hand, Russia, which is seen as a foe of the US, stands to gain from these sour relations.

As of now, the .com and .com.tr domains owned by Yandex are banned in Iran, but with the MoU signed, that will soon change. There is another interesting point in this news piece:

Looking at Yandex.ir, an official reportedly working for IRIB purchased the website, according to a domain registration search.  DomainTools, a portal that lists the owners of websites, says Mohammad Taqi Mozouni registered the domain address back in July.

Technically, and by accepted international convention, no individual or organization can register a domain name of a company (without the necessary permissions) that has already carved out a niche for itself online. It is thus worth pondering what prompted a Russian search engine giant to let a foreign governmental agency acquire its domain name.

Vishal Ingole, November 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Word Embedding Captures Semantic Relationships

November 10, 2016

The article on O’Reilly titled “Capturing Semantic Meanings Using Deep Learning” explores word embedding in natural language processing. NLP systems typically encode word strings, but word embedding offers a more complex approach that emphasizes relationships and similarities between words by treating them as vectors. The article posits,

For example, let’s take the words woman, man, queen, and king. We can get their vector representations and use basic algebraic operations to find semantic similarities. Measuring similarity between vectors is possible using measures such as cosine similarity. So, when we subtract the vector of the word man from the vector of the word woman, then its cosine distance would be close to the distance between the word queen minus the word king (see Figure 1).

The article investigates the various neural network models that avoid the expense of working with large datasets. Word2Vec and its CBOW and continuous skip-gram architectures are touted as models, and the article goes into great technical detail about the entire process. The final result is that the vectors capture the semantic relationships between the words in the example. Why does this approach to NLP matter? A few applications include business forecasting, sentiment analysis, and semantic image search.
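The vector arithmetic in the quoted passage can be sketched with toy, hand-made vectors (the two dimensions and their values are invented for illustration; real embeddings have hundreds of learned dimensions):

```python
import math

# Hypothetical 2-D word vectors, hand-crafted for illustration:
# dimension 0 ~ "royalty", dimension 1 ~ "gender"
vectors = {
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
}

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# woman - man should point in (nearly) the same direction as queen - king
diff_gender = sub(vectors["woman"], vectors["man"])
diff_royal  = sub(vectors["queen"], vectors["king"])
print(cosine_similarity(diff_gender, diff_royal))  # 1.0 for these toy vectors
```

With learned embeddings the two difference vectors are only approximately parallel, so the cosine similarity is close to, rather than exactly, 1.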

Chelsea Kerwin, November 10, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Google Introduces Fact Checking Tool

October 26, 2016

If it works as advertised, a new Google feature will be welcomed by many users—World News Report tells us, “Google Introduced Fact Checking Feature Intended to Help Readers See Whether News Is Actually True—Just in Time for US Elections.” The move is part of a trend among websites, which seem to have recognized that savvy readers don’t just believe everything they read. Writer Peter Woodford reports:

Through an algorithmic process from schema.org known as ClaimReview, live stories will be linked to fact checking articles and websites. This will allow readers to quickly validate or debunk stories they read online. Related fact-checking stories will appear onscreen underneath the main headline. The example Google uses shows a headline over passport checks for pregnant women, with a link to Full Fact’s analysis of the issue. Readers will be able to see if stories are fake or if claims in the headline are false or being exaggerated. Fact check will initially be available in the UK and US through the Google News site as well as the News & Weather apps for both Android and iOS. Publishers who wish to become part of the new service can apply to have their sites included.

Woodford points to Facebook’s recent trouble with the truth within its Trending Topics feature and observes that many people are concerned about the lack of honesty on display this particular election cycle. Google, wisely, did not mention any candidates, but Woodford notes that Politifact rates 71% of Trump’s statements as false (and, I would add, 27% of Secretary Clinton’s statements as false; everything is relative). If the trend continues, it will be prudent for all citizens to rely on (unbiased) fact-checking tools on a regular basis.
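The ClaimReview mechanism the quote mentions is a schema.org markup type that publishers embed in fact-check pages as JSON-LD. A minimal sketch of such markup, serialized with Python’s standard library (the claim, verdict, publisher, and URL below are invented placeholders, not a real fact check):

```python
import json

# Minimal ClaimReview markup (schema.org). All values here are
# hypothetical placeholders for illustration only.
claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "claimReviewed": "Pregnant women face extra passport checks",  # claim being checked
    "reviewRating": {
        "@type": "Rating",
        "alternateName": "False",  # the fact-checker's verdict
    },
    "author": {
        "@type": "Organization",
        "name": "Example Fact Check",
    },
    "url": "https://example.com/fact-checks/passport-checks",
}

# Publishers embed this as a <script type="application/ld+json"> block,
# which is how Google links a live story to its fact-checking article.
print(json.dumps(claim_review, indent=2))
```

Google then surfaces the `claimReviewed` text and the rating’s verdict beneath the related headline in search results.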

Cynthia Murrell, October 26, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

National Geographic Quad View

October 13, 2016

Google Maps and other map tools each have their unique features, but some are better than others at helping you find your way. However, most of these online map tools have the same basic functions and information. While they can help you if you are lost, they are not that useful for topography. National Geographic comes to our rescue with free topographic PDFs. Check them out at PDF Quads.

Here are the details straight from the famous nature magazine:

National Geographic has built an easy to use web interface that allows anyone to quickly find any quad in the country for downloading and printing. Each quad has been pre-processed to print on a standard home, letter size printer. These are the same quads that were printed by USGS for decades on giant, bus-sized presses but are now available in multi-page PDFs that can be printed just about anywhere. They are pre-packaged using the standard 7.5 minute, 1:24,000 base but with some twists.

How can there be twists in a topographic map? They are not really that surprising, just explanations of how the images print out: page one is an overview map, pages two through five are standard topographic maps sized to print on regular paper, and hill shading is added to give the maps more detail.

Not everyone uses topographic maps, but a precise tool is invaluable to those who do.

Whitney Grace, October 13, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Lexmark Upgrades Its Enterprise Search

September 30, 2016

Enterprise search has taken a back seat to news about Google’s next endeavor and the next big thing in big data. It may have slipped in my news feed, but it is still a major component of enterprise systems. One could even argue that without a search function, an enterprise system is useless.

Lexmark, one of the largest suppliers of printers and business solutions in the country, understands the importance of enterprise search. This is why it recently updated the description of its Perceptive Enterprise Search in the system’s technical specifications:

Perceptive Enterprise Search is a suite of enterprise applications that offer a choice of options for high performance search and mobile information access. The technical specifications in this document are specific to Perceptive Enterprise Search version 10.6…

A required amount of memory and disk space is provided. You must meet these requirements to support your Perceptive Enterprise Search system. These requirements specifically list the needs of Perceptive Enterprise Search and do not include any amount of memory or disk space you require for the operating system, environment, or other software that runs on the same machine.

Some technical specifications also provide recommendations. While requirements define the minimum system required to run Perceptive Enterprise Search, the recommended specifications serve as suggestions to improve the performance of your system. For maximum performance, review your specific environment, network, and platform capabilities and analyze your planned business usage of the system. Your specific system may require additional resources above these recommendations.

It is pretty standard fare when it comes to technical specifications, in other words, not that interesting but necessary to make the enterprise system work correctly.

Whitney Grace, September 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

True or False: Google Fakes Results for Social Engineering

September 13, 2016

Here in Harrod’s Creek, we love the Alphabet Google thing. When we read anti-Google articles, we are baffled. Why don’t these articles love and respect the GOOG as we do? A case in point is “How Google’s Search Engines Use Faked Results for Social Engineering.” The loaded words “faked results” and “social engineering” put us on our guard.

What is the angle the write up pursues? Let’s look.

I highlighted this passage as a way to get my intellectual toe in the murky water:

Google published an “overview” of how SEO works, but in a nutshell, Google searches for the freshest, most authoritative, easiest-to-display (desktop/laptop and mobile) content to serve its search engine users. It crawls, caches (grabs) content, calculates the speed of download, looks at textual content, counts words to find relevance, and compares how it looks on different sized devices. It not only analyzes what other sites link to it, but counts the number of these links and then determines their quality, meaning the degree to which the links in those sites are considered authoritative. Further, there are algorithms in place that block the listing of “spammy” sites, although, spam would not be relevant here. And recently, they have claimed to boost sites using HTTPS to promote security and privacy (fox henhouse?).

I am not sure about the “fox hen house” reference because fox is a popular burgoo addition. As a result the critters are few and far between. Too bad. They are tasty and their tails make nifty additions to cold weather parkas.

The author of the write up is not happy with how Google responds to a query for “Jihad.” I learned:

Google’s search results give pride of place to IslamicSupremeCouncil.org. The problem, according to the write up, is that this site is not a big hitter in the Jihad content space.

The article points out that Google does not return the search results the person running the test queries expected. The article points out:

When someone in the US, perhaps wanting to educate themselves on the subject, searches for “Jihad” and sees the Islamic Supreme Council as the top-ranked site, the perception is that this is the global, unbiased and authoritative view. If they click on that first, seemingly most popular link, their perception of Jihad will be skewed by the beliefs and doctrine of this peaceful group of people. These people who merely dabble on the edge of Islamic doctrine. These people who are themselves repeatedly targeted for their beliefs that are contrary to those of the majority of Muslims. These people who do not even come close to being any sort of credible or realistic representation of the larger and more prevalent subscribers (nay soldiers) of the “Lesser Jihad” (again, the violent kind).

My thought is that the results I expect from any ad supported, publicly accessible search system are rarely what I expect. The more I know about a particular subject—how legacy search system marketing distorts what the systems can actually do—the more disappointed I am with the search results.

I don’t think Google is intentionally distorting search results. Certain topics just don’t match up to the Google algorithms. Google is pretty good at sports, pizza, and the Housewives of Beverly Hills. Google is not particularly good with fine grained distinctions in certain topic spaces.

If the information presented by, for instance, the Railroad Retirement Board is not searched, the Google system does its best to find a way to sell an ad against a topic or word. In short, Google does better with certain popular subjects which generate ad revenue.

Legacy enterprise search systems like STAIRS III are not going to be easy to search. Nailing down the names of the programmers in Germany who worked on the system and how the STAIRS III system influenced BRS Search is a tough slog with the really keen Google system.

If I attribute Google’s indifference to information about STAIRS III to a master scheme put in place by Messrs. Brin and Page, I would be giving them a heck of a lot of credit for micro managing how content is indexed.

The social engineering angle is more difficult for me to understand. I don’t think Google is biased against mainframe search systems which are 50 years old. The content, the traffic, and the ad focus pretty much guarantee that STAIRS III is presented in a good enough way.

The problem, therefore, is that Google’s whiz kid technology is increasingly good enough. That means average or maybe a D plus. The yardstick is neither precision nor recall. At Google, revenue counts.

Baidu, Bing, Silobreaker, Qwant, and Yandex, among other search systems, have similar challenges. But each system is tending to the “good enough” norm. Presenting any subject in a way which makes a subject matter expert happy is not what these systems are tuned to do.

Here in Harrod’s Creek, we recognize that multiple queries across multiple systems are a good first step in research. Then there is the task of identifying individuals with particular expertise and trying to speak with them or at least read what they have written. Finally, there is the slog through the dead tree world.

Expecting Google or any free search engine to perform sophisticated knowledge centric research is okay. We prefer the old fashioned approach to research. That’s why Beyond Search documents some of the more interesting approaches revealed in the world of online analysis.

I like the notion of social engineering, particularly the Augmentext approach. But Google is more interested in money and itself than in many search topics which are not represented in a way which I would like. Does Google hate me? Nah, Google doesn’t know I exist. Does Google discriminate against STAIRS III? Nah, of Google’s 65,000 employees probably fewer than 50 know what STAIRS III is. Do Googlers understand revenue? Yep, pretty much.

Stephen E Arnold, September 13, 2016

Ads Appear Here, There, and Everywhere Across Google Landscape

September 12, 2016

The article on CNN Money titled “Google Is Going to Start Showing You More Ads” discusses the surge in ads that users can expect to barely notice over the coming weeks and months. In efforts to ramp up mobile ad revenue to match the increasing emphasis on mobile search, Google is making mobile ads bigger, more numerous, and just more. The article explains,

Google will be simplifying the work flow for businesses to create display ads with images. The company says advertisers need to “simply provide headlines, a description, an image, and a URL,” and Google will automatically design ads for the business. Location-based ads will start showing up on Google too. If you search for “shoe store” or “car repair near me,” ads for local businesses will populate the search results… The changes come as Google is trying to stay ahead of customers’ changing demands.

Google claims in the article that the increase is already showing strong results for advertisers, with click-through rates (CTR) up 20%. But that is hard to believe. As ads flood the space between articles, search results, and even Google Maps directions, they seem no more significant than an increase in white noise. If Google really wants to revolutionize marketing, it is going to need to dig deeper than just squeezing more ads in between the lines.

Chelsea Kerwin, September 12, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/

 
