
The Duck Quacks 12 Million Queries

January 14, 2016

DuckDuckGo keeps waddling through its search queries and quacking that it will not track its users’ information.  DuckDuckGo has remained a small search engine, but its privacy services are chipping away at the user base of Google and other search engines.  TechViral reports in “DuckDuckGo The Anti-Google Search Engine Just Reached A New Milestone” that the service has reached twelve million search queries in one day!

In 2015, DuckDuckGo received 3.25 billion search queries, a 74 percent increase over 2014.  While DuckDuckGo is a private oasis in a sea of tracking cookies, it still uses targeted ads.  However, unlike Google, DuckDuckGo bases its ads only on the keywords in the immediate search query and does not store user information.  It wipes the slate clean with each search.
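The figures above imply a couple of numbers the post does not state. Here is a quick back-of-the-envelope check in Python, assuming “74 percent increase” means the 2015 total equals the 2014 total times 1.74, and using a 365-day year:

```python
# Figures quoted in the post.
total_2015 = 3.25e9   # search queries received in 2015
growth = 0.74         # 74 percent increase over 2014 (assumed: 2015 = 2014 * 1.74)

# Implied 2014 total and the 2015 average per day (365-day year assumed).
total_2014 = total_2015 / (1 + growth)
daily_avg_2015 = total_2015 / 365

print(f"Implied 2014 total: {total_2014 / 1e9:.2f} billion")
print(f"2015 daily average: {daily_avg_2015 / 1e6:.1f} million")
```

This puts the 2014 total at roughly 1.87 billion queries and the 2015 average at roughly 8.9 million queries per day, so the twelve-million-query day sits well above the 2015 daily average.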

DuckDuckGo’s growing number of visitors has attracted partnerships with Mozilla and Apple.  The private search engine is a for-profit business, but its goals differ from Google’s.

“Otherwise, it should be noted that although he refuses to have the same practices as Google, DuckDuckGo already making profits, yes that’s true. And the company’s CEO, Gabriel Weinberg, stop to think it is necessary to collect information about users to monetize a search engine: ‘You type car and you see an advertisement for a car, Google follows you on all these sites because it operates huge advertising networks and other properties. So they need these data for search engines to follow you.’ ”

DuckDuckGo offers a great privacy service, but while it is gaining users, it does not offer the plethora of services Google does.  DuckDuckGo, why not try private email, free office programs, and online data storage?  Would you still be the same if you offered these services?

Whitney Grace, January 14, 2016
Sponsored by, publisher of the CyberOSINT monograph

Desktop Web Searches Began Permanent Decline in 2013

December 28, 2015

The Quartz article titled “The Product that Made Google Has Peaked for Good” presents the startling claim that desktop web search is expected to remain in permanent decline. Desktop search, the main source of Google’s prestige and growth, peaked in 2013, the article suggests, and has declined in 20 of the last 21 months. The article reports:

“Google doesn’t regularly disclose the number of search queries that its users conduct. (It has been “more than 100 billion” per month for a while.)… And while a nice chunk of Google’s revenue growth is coming from YouTube, its overall “Google Websites” business—mostly search ads, but also YouTube, Google Maps, etc.—grew sales 14%, 13%, and 16% year-over-year during the first three quarters of 2015. The mobile era hasn’t resulted in any sort of collapse of Google’s ad business.”

The article also notes that mobile searches now account for more than half of all global search queries. Yes, overall Google is still a healthy company, but this decline in desktop searches will certainly force some fancy dancing from Alphabet Google. The article does not offer any possible reasons for the decline. Between this decline and the uncertain future of Internet ads, the company’s foundations might seem a little less stable.

Chelsea Kerwin, December 28, 2015



Machine Learning Used to Decipher Lute Tablature

December 23, 2015

The Oxford journal Early Music reveals a very specialized use of machine learning in “Bring ‘Musicque into the Tableture’: Machine-Learning Models for Polyphonic Transcription of 16th-Century Lute Tablature” by music researchers Reinier de Valk and Tillman Weyde. Only the article’s abstract is freely available; to read the full piece, you’ll have to subscribe to the site. The abstract summarizes:

“A large corpus of music written in lute tablature, spanning some three-and-a-half centuries, has survived. This music has so far escaped systematic musicological research because of its notational format. Being a practical instruction for the player, tablature reveals very little of the polyphonic structure of the music it encodes—and is therefore relatively inaccessible to non-specialists. Automatic polyphonic transcription into modern music notation can help unlock the corpus to a larger audience and thus facilitate musicological research.

“In this study we present four variants of a machine-learning model for voice separation and duration reconstruction in 16th-century lute tablature. These models are intended to form the heart of an interactive system for automatic polyphonic transcription that can assist users in making editions tailored to their own preferences. Additionally, such models can provide new methods for analysing different aspects of polyphonic structure.”

The full article lays out the researchers’ modelling approaches and the advantages of each. They report that their best model achieves accuracy rates of 80 to 90 percent, so modelers might find the full article worth its $39 price. We just think it’s nice to see machine learning used for such a unique and culturally valuable project.


Cynthia Murrell, December 23, 2015


Use the Sentiment Analysis, Luke

December 22, 2015

The newest Star Wars film is out in theaters, and any credible Star Wars geek has probably seen it at least twice.  One theme that remains prevalent in the franchise is the Force, a mystical, galactic power.  The Force gives the Jedi special abilities, such as reading a person’s mind.  Computer Weekly suggests that data can do the same thing in “Sentiment Analysis With Hadoop: 5 Steps To Becoming A Mind Reader.”

While the title reads more like a how-to kit for a fake psychic, sentiment analysis can help predict people’s actions, especially their shopping habits.  Sentiment analysis is a huge market for companies that want to reach shoppers on a more intimate level, predict trends before they happen, and connect with shoppers in real time.  Apache Hadoop is one tool for harnessing data to turn anyone with the right knowledge into a mind reader, and Twitter is one of the data sources used.

The five steps are: first, collect the data; second, label the data to create a data dictionary with positive or negative annotations; third, run the analytics; fourth, go through a beta phase; and fifth, get the insights. While it sounds easy, the fourth step is going to be the biggest hassle:

“Remember that analytic tools that just look for positive or negative words can be entirely misleading if they miss important context. Typos, intentional misspellings, emoticons and jargon are just few additional obstacles in the task.

Computers also don’t understand sarcasm and irony and as a general rule are yet to develop a sense of humor. Too many of these and you will lose accuracy. It is probably best to address this point by fine-tuning your model.”
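To make steps one through three concrete, here is a minimal single-machine sketch of dictionary-based sentiment scoring in Python. It does not use Hadoop or Twitter; the dictionary words, their weights, and the sample posts are hypothetical stand-ins for a real labeled data dictionary and real collected data.

```python
import re

# Step 2 (hypothetical): a tiny hand-labeled data dictionary.
# A real project would use a much larger, domain-specific lexicon.
SENTIMENT_DICTIONARY = {
    "love": 1, "great": 1, "awesome": 1, "happy": 1,
    "hate": -1, "terrible": -1, "broken": -1, "slow": -1,
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score(text: str) -> int:
    """Step 3: sum the dictionary polarity of every known word (unknown words count 0)."""
    return sum(SENTIMENT_DICTIONARY.get(tok, 0) for tok in tokenize(text))

def label(text: str) -> str:
    """Turn the summed score into a positive/negative/neutral label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"

# Step 1 (hypothetical): a handful of collected posts.
posts = [
    "I love this phone, the camera is great",
    "Checkout was slow and the app is broken",
    "Oh great, my order is late again",  # sarcastic
]

for post in posts:
    print(f"{label(post):8} | {post}")
```

The third post is labeled positive because “great” is in the dictionary, even though the comment is sarcastic. That is exactly the kind of error the quoted passage warns about, and the sort of thing a beta phase (step four) should catch before anyone trusts the insights.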

The purpose of sentiment analysis is to teach software how to “think” like a human and understand all our illogical ways.  (Hmm…that was a Star Trek reference, whoops!)  Apache Hadoop might not have lightsabers or help you find droids, but it can help you understand consumers’ spending habits.  So how about, “These are the greenbacks you have been looking for.”

Whitney Grace, December 22, 2015

Big Data Gets Emotional

December 15, 2015

Christmas is the biggest shopping season of the year, and retailers spend months studying consumer data.  They want to understand consumer buying habits, popular trends in clothing, toys, and other products, physical versus online retail, and especially what the competition will do with sales to entice more customers to buy more.  Smart Data Collective recently wrote about the science of shopping in “Using Big Data To Track And Measure Emotion.”

Customer experience professionals study three things related to customer spending habits: ease, effectiveness, and emotion.  Emotion is the most important of the three and the biggest driver of customer loyalty.  If data specialists could figure out the perfect way to measure emotion, shopping and science as we know them would change.

“While it is impossible to ask customers how do they feel at every stage of their journey, there is a largely untapped source of data that can provide a hefty chunk of that information. Every day, enterprise servers store thousands of minutes of phone calls, during which customers are voicing their opinions, wishes and complaints about the brand, product or service, and sharing their feelings in their purest form.”

The article describes several ways emotional data is gathered: phone recordings, surveys, and, most importantly, analysis of vocal speech layers.  Analytics platforms that examine these speech layers measure the relationships between words and phrases to gauge sentiment.  Emotions are rated on a five-point scale, from positive to negative, to discover patterns that trigger reactions.
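The article does not say how raw sentiment is turned into a five-point rating. One simple possibility, sketched here purely as an assumption, is to take a score between -1.0 and 1.0 from an upstream analytics platform and split that range into five equal bins. The scale labels, bin edges, and call scores below are all hypothetical.

```python
from collections import Counter

# Hypothetical labels for a five-point scale, from most negative to most positive.
SCALE = ["very negative", "negative", "neutral", "positive", "very positive"]

def to_five_point(score: float) -> int:
    """Map a sentiment score in [-1.0, 1.0] to a 1-5 rating using equal-width bins."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    # Shift the range to [0, 2] and split it into five bins of width 0.4;
    # min() keeps a score of exactly 1.0 in the top bin.
    return min(int((score + 1.0) / 0.4), 4) + 1

# Hypothetical per-call scores produced by a speech analytics platform.
call_scores = {"call-001": 0.8, "call-002": -0.7, "call-003": 0.1, "call-004": -0.95}

ratings = {call: to_five_point(s) for call, s in call_scores.items()}
distribution = Counter(SCALE[r - 1] for r in ratings.values())

print(ratings)
print(distribution)
```

Equal-width bins are the simplest choice; a real platform might instead calibrate the cut points against calls rated by humans.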

Customer experience input is a data analyst’s dream as well as a nightmare, given all the data constantly coming in.

Whitney Grace, December 15, 2015

Computers Pose Barriers to Scientific Reproducibility

December 9, 2015

These days, it is hard to imagine performing scientific research without the help of computers. A thorough article, “How Computers Broke Science—And What We Can Do to Fix It,” details the problem this poses. Many of us learned in school that reliable scientific conclusions rest on a foundation of reproducibility. That is, if other scientists following the same steps can reproduce an experiment’s results, the results can be trusted. However, many of those steps are now hidden on researchers’ hard drives, making the test of reproducibility difficult or impossible to apply. Writer Ben Marwick points out:

“Stanford statisticians Jonathan Buckheit and David Donoho described this issue as early as 1995, when the personal computer was still a fairly new idea.

‘An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.’

“They make a radical claim. It means all those private files on our personal computers, and the private analysis tasks we do as we work toward preparing for publication should be made public along with the journal article.

This would be a huge change in the way scientists work. We’d need to prepare from the start for everything we do on the computer to eventually be made available for others to see. For many researchers, that’s an overwhelming thought. Victoria Stodden has found the biggest objection to sharing files is the time it takes to prepare them by writing documentation and cleaning them up. The second biggest concern is the risk of not receiving credit for the files if someone else uses them.”

So, do we give up on the test of reproducibility, or do we find a way to address those concerns? Well, this is the scientific community we’re talking about. Researchers in several fields are already devising solutions. Fittingly, those solutions tend to be software-based. For example, some are turning to executable scripts instead of harder-to-record series of mouse clicks. There are also suggestions for standardized file formats and organizational structures. See the article for more details on these efforts.
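As a rough illustration of the “executable scripts instead of mouse clicks” idea, here is a minimal Python sketch of an analysis that can be rerun exactly. The data are synthetic, and the structure (a fixed random seed, explicit parameters, and a recorded environment) is one common convention, not a standard prescribed by the article.

```python
import json
import platform
import random
import statistics
import sys

def run_analysis(seed: int = 42, n: int = 1000) -> dict:
    """Run a toy analysis end to end with a fixed seed so anyone can rerun it."""
    rng = random.Random(seed)  # private generator: no hidden global random state
    # Synthetic "measurements"; a real script would load a versioned data file here.
    sample = [rng.gauss(10.0, 2.0) for _ in range(n)]
    return {
        "parameters": {"seed": seed, "n": n},
        "results": {
            "mean": round(statistics.mean(sample), 4),
            "stdev": round(statistics.stdev(sample), 4),
        },
        # Record the environment so others can match it when reproducing the run.
        "environment": {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
        },
    }

if __name__ == "__main__":
    report = run_analysis()
    print(json.dumps(report, indent=2))
```

Running the script twice on the same setup prints the same parameters and results, and the environment block tells a reader which Python version and platform produced them. A real project would also pin package versions and keep the raw data under version control.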

A final caveat: Marwick notes that computers are not the only problem with reproducibility today. He also cites “poor experimental design, inappropriate statistical methods, a highly competitive research environment and the high value placed on novelty and publication in high-profile journals” as contributing factors. Now we know at least one issue is being addressed.

Cynthia Murrell, December 9, 2015


Yandex Takes on Google over Anticompetitive Business Practices

November 30, 2015

Google is the dominant search engine in North America, South America, and Europe.  When it comes to Asia, however, Google faces stiff competition from Yahoo in Japan and Yandex in Russia.  Yandex has held a firm market share and remains stiff competition for Google.  Reuters’ “Russia’s Yandex Says Complained To EU Over Google’s Android” points to how Yandex might be able to one-up its rival.

According to the article, Yandex has petitioned the European Commission to investigate Google’s practices related to the Android mobile OS.  Yandex has long tried to counter Google’s attempts to gain a stronger market share in Europe and Asia.

“The new complaint could strengthen the case against Google, possibly giving enough ammunition to EU antitrust regulators to eventually charge the company with anti-competitive business practices, on top of accusations related to its Google Shopping service. The formal request was filed in April 2015 and largely mirrors the Russian company’s claims against the U.S. company in a Russian anti-monopoly case that Yandex won.”

Russia’s competition watchdog found that Google was seeking an unfair advantage in the search market.  Yandex is one of the few companies voicing its dislike of Google, along with Disconnect, Aptoide, and the FairSearch lobbying group.  Yandex wants the European Commission to restore balance to the market so that fair competition can return.  Yandex especially wants mobile device users to be able to select their search engine of choice, rather than having one preprogrammed into the OS.

It is interesting to see how competitive business practices play out overseas.  In the United States, whoever has the deepest pockets usually achieves market dominance, but the European Union is proving more willing to uphold a fair race for search dominance.  Even more interesting, Google counters that Yandex is using these complaints to maintain its own dominance.

Whitney Grace, November 30, 2015


Alphabet Google Misspells Relevance, Yikes, Yelp?

November 25, 2015

I read “Google Says Local Search Result That Buried Rivals Yelp, Trip Advisor Is Just a Bug.” I thought the relevance, precision, and objectivity issues had been zipped into a mummy-style sleeping bag and put in the deep freeze.

According to the write up:

executives from public Internet companies Yelp and TripAdvisor noted a disturbing trend: Google searches on smartphones for their businesses had suddenly buried their results beneath Google’s own. It looked like a flagrant reversal of Google’s stated position on search, and a move to edge out rivals.

The article contains this statement attributed to the big dog at Yelp:

Far from a glitch, this is a pattern of behavior by Google.

I don’t have a dog in this fight, nor am I looking for a dog-friendly hotel or a really great restaurant in Rooster Run, Kentucky.

My own experience running queries on Google is okay. Of course, I have the goslings, several of whom are real live expert searchers with library degrees, and one of whom has a couple of well-received books to her credit. Oh, I forgot. We also have a pipeline to a couple of high-profile library schools, and I have a Rolodex with the names and numbers of research professionals who have pretty good search skills.

So maybe my experience with Google is different from the folks who are not able to work around what the Yelp top dog calls, according to the article, “Google’s monopoly.”

My thought is that those looking for free search results need to understand how oddities like relevance, precision, and objectivity are defined at the Alphabet Google thing.

Google even published a chunky manual to help Webmasters, who may have been failed middle school teachers in a previous job, do things the Alphabet Google way. That manual lays out the rules of the Google information highway.

The Google relevance, precision, and objectivity thing has many moving parts. Glitches are possible. Do Googlers make errors? In my experience, not too many. Well, maybe solving death, Glass, and finding like-minded folks in the European Union regulators’ office.

My suggestion? Think about other ways to obtain information. When a former Gannett sci-tech reporter could not find the Cuba Libre restaurant in DC on his Apple phone, there was an option. I took him there even though the eatery was not in the Google mobile search results. Cuba Libre is not too far from the Alphabet Google DC office. No problem.

Stephen E Arnold, November 25, 2015

Latest Global Internet Report Available

October 30, 2015

The Internet Society has released its “Global Internet Report 2015,” just the second in its series. Worldwide champions of a free and open Internet, the society examines mobile Internet usage patterns around the globe in this edition. The report’s Introduction explains:

“We focus this year’s report on the mobile Internet for two reasons. First, as with mobile telephony, the mobile Internet does not just liberate us from the constraints of a wired connection, but it offers hundreds of millions around the world their only, or primary, means of accessing the Internet. Second, the mobile Internet does not just extend the reach of the Internet as used on fixed connections, but it offers new functionality in combination with new portable access devices.”

It continues with this important warning:

“The nature of the Internet should remain collaborative and inclusive, regardless of changing means of access. In particular, the mobile Internet should remain open, to enable the permission-less innovation that has driven the continuous growth and evolution of the Internet to date, including the emergence of the mobile Internet itself.”

Through the report’s landing page, you can navigate to the Introduction cited above, the report’s Executive Summary, and Section 2: Trends and Growth. There is even an interactive mobile Internet timeline. Scroll to the bottom to download the full report in PDF, Kindle, or ePub format. The download is free, but those interested can donate to the organization.

Cynthia Murrell, October 30, 2015


CSI Search Informatics Are Actually Real

October 29, 2015

CSI might stand for a popular TV franchise, but it also stands for “compound structured identification,” as explained in “Bioinformaticians Make The Most Efficient Search Engine For Molecular Structures Available Online.” Sebastian Böcker and his team at Friedrich Schiller University Jena are researching metabolites, the chemical compounds involved in an organism’s metabolism.  Metabolites provide information about the condition of living cells.

While this is amazing science, there are some drawbacks:

“This process is highly complex and seldom leads to conclusive results. However, the work of scientists all over the world who are engaged in this kind of fundamental research has now been made much easier: The bioinformatics team led by Prof. Böcker in Jena, together with their collaborators from the Aalto-University in Espoo, Finland, have developed a search engine that significantly simplifies the identification of molecular structures of metabolites.”

The new tool works like a regular search engine, but instead of matching keywords, it searches molecular structure databases containing information about metabolites and their structural formulae.  It should cut the time needed to identify compound structures, saving both time and money.  The hope is that the new search will advance research into metabolites and help researchers spend more time working on possible breakthroughs.
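The source does not describe the engine’s internals. As a swapped-in illustration of how a structure database can be searched without keywords, here is a minimal Python sketch of fingerprint similarity search using the Tanimoto coefficient, a widely used measure in cheminformatics. The compound IDs and 8-bit patterns are made up; real molecular fingerprints are far longer.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as bit sets in ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union

# Toy database: hypothetical compound IDs mapped to made-up 8-bit fingerprints.
DATABASE = {
    "compound-A": 0b10110010,
    "compound-B": 0b10110011,
    "compound-C": 0b01001100,
}

def search(query: int, top_k: int = 2) -> list[tuple[str, float]]:
    """Rank database entries by Tanimoto similarity to the query fingerprint."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in DATABASE.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

print(search(0b10110010))
```

Querying with compound-A’s own fingerprint ranks compound-A first (similarity 1.0) and compound-B second (0.8), while compound-C shares no set bits with the query and scores 0.0.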

Whitney Grace, October 29, 2015

