Visualizing a Web of Sites
February 6, 2017
While the World Wide Web is clearly a web, it has not traditionally been presented visually as such. Digital Trends published an article centered on a new visualization of Wikipedia, “Race through the Wikiverse for your next internet search.” This web-based interactive 3D visualization of the open source encyclopedia is at Wikiverse.io. It was created by Owen Cornec, a Harvard data visualization engineer. It pulls about 250,000 articles from Wikipedia and makes connections between articles based on overlapping content. The write-up tells us,
Of course it would be unreasonable to expect all of Wikipedia’s articles to be on Wikiverse, but Cornec made sure to include top categories, super-domains, and the top 25 articles of the week.
Upon a visit to the site, users are greeted with three options, each of course having different CPU and load-time implications for your computer: “Light,” with 50,000 articles, 1 percent of Wikipedia, “Medium,” 100,000 articles, 2 percent of Wikipedia, and “Complete,” 250,000 articles, 5 percent of Wikipedia.
Will this pave the way for web-visualized search? Or, as the article suggests, become an even more exciting playing field for The Wikipedia Game? Regardless, this advance makes clear the importance of semantic search. Oh, right: perhaps this would be a better link to locate semantic search (it made the 1 percent “Light” cut).
Megan Feil, February 6, 2017
Counter Measures to Money Laundering
January 30, 2017
Apparently, money laundering has become a very complicated endeavor, with tools like Bitcoin “washers” available via the Dark Web. Other methods include trading money for gaming or other virtual currencies and “carding.” ZDNet discusses law enforcement’s efforts to keep up in, “How Machine Learning Can Stop Terrorists from Money Laundering.”
It will not surprise our readers to learn authorities are turning to machine learning to cope with new money laundering methods. Reporter Charlie Osborne cites the CEO of cybersecurity firm ThetaRay, Mark Gazit, when she writes:
By taking advantage of Big Data, machine learning systems can process and analyze vast streams of information in a fraction of the time it would take human operators. When you have millions of financial transactions taking place every day, ML provides a means for automated pattern detection and potentially a higher chance of discovering suspicious activity and blocking it quickly. Gazit believes that through 2017 and beyond, we will begin to rely more on information and analytics technologies which utilize machine learning to monitor transactions and report crime in real time, which is increasingly important if criminals are going to earn less from fraud, and terrorism groups may also feel the pinch as ML cracks down on money laundering.
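To make "automated pattern detection" a little more concrete, here is a minimal sketch that flags transaction amounts sitting far from the typical value, using a median-based (modified z-score) outlier check. This is a deliberately simple stand-in, not the method ThetaRay or any bank uses; the function name, the 3.5 threshold, and the sample amounts are all assumptions made for illustration.

```python
from statistics import median

def flag_suspicious(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold.

    Illustrative only: the threshold and the scoring rule are assumptions,
    not taken from the ZDNet article.
    """
    med = median(amounts)
    # Median absolute deviation: a spread estimate that a few huge
    # transactions barely move, unlike the standard deviation.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # More than half the amounts equal the median; nothing to scale against.
        return []
    flagged = []
    for i, amount in enumerate(amounts):
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * abs(amount - med) / mad
        if score > threshold:
            flagged.append(i)
    return flagged

# Made-up amounts: six ordinary payments and one very large one.
transactions = [120.0, 95.5, 130.25, 110.0, 9800.0, 101.75, 125.0]
suspicious = flag_suspicious(transactions)
print(suspicious)
```

A production system would look at far more than amounts (counterparties, timing, geography), but the shape is the same: score every transaction automatically and surface only the unusual ones for a human to review.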
Of course, criminals will not stop improving their money-laundering game, and authorities will continue to develop tools to thwart them. Just one facet of the cybersecurity arms race.
Cynthia Murrell, January 30, 2017
Bing Gets Nostalgic
January 25, 2017
In my entire life, I have never seen so many people who were happy to welcome in a New Year. 2016 will be remembered for violence, political uproar, and other stuff that people wish to forget. Despite the negative associations with 2016, other stuff did happen, and looking back might offer a bit of nostalgia for the news and search trends of the past year. On MSFT runs down a list of what happened on Bing in 2016 in “Check Out The Top Search Trends On Bing This Past Year.”
Rather than focusing on a list of just top searches, Bing’s top 2016 searches are divided into categories: video games, Olympians, viral moments, tech trends, and feel good stories. More top searches are located over at the Bing page. However, on the top viral trends it is nice to see that cat videos have gone down in popularity:
Ryder Cup heckler
Villanova’s piccolo girl
Powerball
Aston Martin winner
Who’s the mom?
Evgenia Medvedeva
Harambe the gorilla
#DaysoftheWeek
Cats of the Internet
Pokemon Go
On a personal level, I am surprised that Harambe the gorilla outranked Pokemon Go. Some of these trends I do not even remember making the Internet circuit, and I was on YouTube and Reddit for all of 2016. I have been around enough years to recognize that things come and go. 2016 might have come off as a bad year for many, but in reality it was just another year. It also did not forecast doomsday. That was back in 2000, folks. Get with the times!
Whitney Grace, January 25, 2017
The Software Behind the Web Sites
January 17, 2017
Have you ever visited an awesome Web site or been curious how an organization manages its Web presence? While we know the answer is some type of software, we usually are not given a specific name. Venture Beat reports that it is possible to figure out the software in the article, “SimilarTech’s Profiler Tells You All Of The Technologies That Web Companies Are Using.”
SimilarTech is a tool designed to crawl the Internet to analyze what technologies, including software, Web site operators use. SimilarTech is also used to detect which online payment tools are the most popular. It does not come as a surprise that PayPal is the most widely used, with PayPal Subscribe and Alipay in second and third places.
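How a crawler might recognize the technologies behind a page can be sketched with simple signature matching. The example below is only an illustration of the general idea, not SimilarTech's actual method, which the article does not describe; the signature table, the function name, and the sample page are assumptions invented for this sketch.

```python
import re

# Hypothetical signature table: each label maps to a pattern that tends to
# appear in the HTML of pages using that technology. A real profiler would
# carry thousands of these and also inspect headers, cookies, and scripts.
SIGNATURES = {
    "WordPress": re.compile(r"/wp-content/|/wp-includes/", re.I),
    "jQuery": re.compile(r"jquery[.\-\w]*\.js", re.I),
    "PayPal": re.compile(r"paypal\.com|paypalobjects\.com", re.I),
    "Google Analytics": re.compile(r"google-analytics\.com|googletagmanager\.com", re.I),
}

def detect_technologies(html):
    """Return a sorted list of labels whose signature appears in the HTML."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(html))

# A made-up page standing in for whatever a crawler would download.
sample_page = """
<html><head>
<link rel="stylesheet" href="/wp-content/themes/demo/style.css">
<script src="https://code.jquery.com/jquery-3.1.1.min.js"></script>
</head><body>
<form action="https://www.paypal.com/cgi-bin/webscr" method="post"></form>
</body></html>
"""
found = detect_technologies(sample_page)
print(found)
```

Run across millions of pages, counts of each label would give the kind of "most widely used payment tool" ranking the article mentions.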
Tracking what technology and software companies utilize for the Web is a boon for salespeople, recruiters, and business development professionals who want a competitive edge. The article adds:
Overall, SimilarTech provides big data insights about technology adoption and usage analytics for the entire internet, providing access to data that simply wasn’t available before. The insights are used by marketing and sales professionals for website profiling, lead generation, competitive analysis, and business intelligence.
SimilarTech can also locate contact information for personnel responsible for Web operations, in other words new potential clients.
This tool is kind of like the mailing houses of the past. Mailing houses have data about people, places, organizations, etc. and can generate contact information lists of specific clientele for companies. SimilarTech offers the contact information, but it does one better by finding the technologies people use for Web site operation.
Whitney Grace, January 17, 2017
McKinsey and Analytics: Make Sure You Are a One Percenter
January 15, 2017
McKinsey & Co., the blue chip consulting firm, is doing its part to motivate students to ace their SATs. You can get a glimpse of the future for those who are not overachievers and able to get hired at an oligopoly in “The Age of Analytics: Competing in a Data Driven World.” If your firm is a customer of McKinsey, you can wrangle a briefing and get even more juicy insights. But for the folks who live in Harrod’s Creek, we have to make do with the free write up.
The main point is that organizations that embrace analytics can simply be more successful. More money, more influence, more, more, more. In today’s uncertain business climate, the starving cats are going to pay attention to this catnip.
The write up reveals:
Leading companies are using their capabilities not only to improve their core operations but also to launch entirely new business models. The network effects of digital platforms are creating a winner-take-most situation in some markets. The leading firms have remarkably deep analytical talent taking on various problems—and they are actively looking for ways to enter other industries. These companies can take advantage of their scale and data insights to add new business lines, and those expansions are increasingly blurring traditional sector boundaries.
Net net: hire McKinsey to help you take advantage of this opportunity. For those who are not working hard to be perceived as smart enough to work at a blue chip outfit like McKinsey, there may be universal basic income in your not so bright future.
Stephen E Arnold, January 15, 2017
More Poll Excitement: Information Overload
January 5, 2017
I read “Really? Most Americans Don’t Suffer Information Overload.” The main idea is that folks in the know, in the swim, and in the top one percent suffer from too much information. The rest of the ignorance-is-bliss crowd has a different perception.
The write up explains, reports, states:
A new report from the Pew Research Center says that most Americans do not suffer from information overload—even though many of us frequently say otherwise.
What’s up with that?
The write up points out:
Many people complain about the volume of information coming at us. But we want it. Adweek reported earlier this year that the average person consumes almost 11 hours of media per day. That’s everything from text messages to TV programs to reading a newspaper.
Well, the Pew outfit interviewed 1,520 people, which is a sample size approved by those who rely on the tables in the back of statistics 101 textbooks. I have no details about the demographics of the sample, the geographic location, or the reason these folks took time out from watching Netflix to answer the Pew questions, however.
The answer that lots of people don’t suffer from information overload seems wrong when viewed from the perspective of a millennial struggling to buy a house while working as a customer support rep until the automated system is installed.
But wait. The write up informs me:
the recent national election showed that “in a lot of ways people live in small information bubbles. They get information on social media that has been filtered for them. It is filtered by the network they belong to. In a lot of ways, there’s less information and much of it is less diverse than it was in an earlier era.” The public’s hunger for that information is reflected in a study conducted by Bank of America. The bank found that 71 percent of the people they surveyed sleep within arm’s reach of their smartphone. And 3 percent of those people hold their smartphone while they’re in dreamland.
Too much information for me.
Stephen E Arnold, January 5, 2017
First God, Then History, and Now Prediction: All Dead Like Marley
January 3, 2017
I read a chunk of what looks to me like content marketing called “The Death of Prediction.” Prediction seems like a soft target. There were the polls which made clear that Donald J. Trump was a loser. Well, how did that work out? For some technology titans, the predictions omitted a grim pilgrimage to Trump Tower to meet the deal maker in person. Happy faces? Not so many, judging from the snaps of the Sillycon Valley crowd and one sycophant from Armonk.
The write up points out that predictive analytics are history. The future is “explanatory analytics.” An outfit called Quantifind has figured out that explaining is better than predicting. My hunch is that explaining is a little less risky. Saying that the Donald would lose is tough to explain when the Donald allegedly “won.”
Explaining is, like, looser. The black-white, one-two, or yes-no thing is a bit less gelatinous.
So what’s the explainer explaining? The checklist is interesting:
- Alert me when it matters. The idea is that a system or smart software will proactively note when something important happens and send one of those mobile phone icon things to get a human to shift attention to the new thing. Nothing like distraction I say.
- Explore why on one’s own. Yep, this works really well for spelunkers who find themselves trapped. Exploration is okay, but it is helpful to [a] know where one is, [b] know where one is going, and [c] know the territory. Caves can be killers, not just dark and damp.
- Quantify impact in “real” dollars. The notion of quantifying strikes me as important. But doesn’t one quantify to determine whether the prediction was on the money? I sniff a bit of flaming contradiction. The notion of knowing something in real time is good too. Now the problem becomes, “What’s real time?” I have tilled this field before, and saying “real time” is different from delivering what one expects, what the system can do, and what the outfit can afford.
It’s not even 2017, and I have learned that “prediction” is dead. I hope someone tells the folks at Recorded Future and Palantir Technologies. Will they listen?
Buzzwording with cacaphones is definitely alive and kicking.
Stephen E Arnold, January 3, 2017
Internet Watch Foundation Teams with Blockchain Forensics Startup
December 29, 2016
A British charity is teaming up with an online intelligence startup specializing in Bitcoin. The Register reports on this in its piece, “Bitcoin child abuse image pervs will be hunted down by the IWF.” The Internet Watch Foundation, with the help of a UK blockchain forensics start-up, Elliptic, aims to identify individuals who use Bitcoin to purchase child abuse images online. The IWF will provide Elliptic with a database of Bitcoin addresses and Elliptic takes care of the rest. We learned,
The IWF has identified more than 68,000 URLs containing child sexual abuse images. UNICEF Malaysia estimates two million children across the globe are affected by sexual exploitation every year. Susie Hargreaves, IWF CEO, said, “Over the past few years, we have seen an increasing amount of Bitcoin activity connected to purchasing child sexual abuse material online. Our new partnership with Elliptic is imperative to helping us tackle this criminal use of Bitcoin.” The collaboration means Elliptic’s clients will be able to automatically monitor transactions they handle for any connection to proceeds of child sex abuse.
Machine learning and data analytics technologies are used by Elliptic to collect actionable evidence for law enforcement and intelligence agencies. The interesting piece of this technology, and others like it, is that it runs perhaps as surreptitiously in the background as those who use the Dark Web and Bitcoin for criminal activity believe they do.
Megan Feil, December 29, 2016
An Apologia for People. Big Data Are Just Peachy Keen
December 25, 2016
I read “Don’t Blame Big Data for Pollsters’ Failings.” The news about the polls predicting a victory for Hillary Clinton reached me in Harrod’s Creek five days after the election. Hey, Beyond Search is in rural Kentucky. It looks from the news reports and the New York Times’s odd letter about doing “real” journalism that the pundits predicted that the mare would win the US derby.
The write up explains that Big Data did not fail. The reason? The pollsters were not using Big Data. The sample sizes were about 1,000 people. Check your statistics book. In the back will be sample sizes for populations. If you have an older statistics book, you have to use a formula.
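The formula itself is not reproduced here, so the sketch below falls back on the standard textbook calculation for a proportion, which is an assumption about what was meant: margin of error e = z * sqrt(p(1 - p) / n), rearranged as n = z^2 * p(1 - p) / e^2. It assumes a simple random sample from a large population, a 95 percent confidence level (z = 1.96), and the worst case p = 0.5. The function names are made up for illustration.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a proportion estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(e, z=1.96, p=0.5):
    """Smallest sample size whose margin of error is at most e (large population assumed)."""
    return math.ceil(z * z * p * (1 - p) / (e * e))

# A poll of about 1,000 people, as described above.
moe_1000 = margin_of_error(1000)
# How many respondents a plus or minus 3 point poll needs under the same assumptions.
n_for_3pct = required_sample_size(0.03)

print(f"n=1000 -> margin of error about {moe_1000:.1%}")
print(f"e=3%   -> sample size {n_for_3pct}")
```

Under these assumptions a 1,000 person sample carries a margin of error of roughly 3.1 percentage points. The formula only covers sampling error, though. It says nothing about whether the people sampled resemble the people who actually vote.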
Big Data doesn’t fool around with formulas. Big Data just uses “big data.” Is the idea that the bigger the data, the better the output?
The write up states that the problem was the sample itself: The actual humans.
The write up quotes a mid tier consultant from an outfit called Ovum which reminds me of eggs. I circled this statement:
“When you have data sets that are large enough, you can find signals for just about anything,” says Tony Baer, a big data analyst at Ovum. “So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets.”
The write up tosses in social media. Facebook takes the position that its information had minimal effect on the election. Nifty assertion that.
The solution is, as I understand the write up, to use a more real time system, different types of data, and math. The conclusion is:
With significant economic consequences attached to political outcomes, it is clear that those companies with sufficient depth of real-time behavioral data will likely increase in value.
My view is that hope and other distinctly human behaviors certainly threw an egg at reality. It is great to know that there is a fix and that Big Data emerge as the path forward. More work ahead for the consultants who often determine sample sizes by looking at Web sites like SurveySystem and get their sample from lists of contributors, a 20 something’s mobile phone contact list, or lists available from friends.
If you use Big Data, tap into real time streams of information, and do the social media mining—you will be able to predict the future. Sounds logical? Now about that next Kentucky Derby winner? Happy or unhappy holiday?
Stephen E Arnold, December 25, 2016
Potential Tor Browser Vulnerability Reported
December 19, 2016
Over at Hacker Noon, blogger “movrcx” reveals a potential vulnerability chain that he says threatens the entire Tor Browser ecosystem in, “Tor Browser Exposed: Anti-Privacy Implantation at Mass Scale.” Movrcx says the potential avenue for a massive hack has existed for some time, but taking advantage of these vulnerabilities would require around $100,000. This could explain why movrcx’s predicted attack seems not to have taken place. Yet. The write-up summarizes the technique:
Anti-Privacy Implantation at Mass Scale: At a high-level the attack path can be described by the following:
* Attacker gains custody of an addons.mozilla.org TLS certificate (wildcard preferred)
* Attacker begins deployment of malicious exit nodes
* Attacker intercepts the NoScript extension update traffic for addons.mozilla.org
* Attacker returns a malicious update metadata file for NoScript to the requesting Tor Browser
* The malicious extension payload is downloaded and then silently installed without user interaction
* At this point remote code execution is gained
* The attacker may use an additional stage to further implant additional software on the machine or to cover any signs of exploitation
This attack can be demonstrated by using Burp Suite and a custom compiled version of the Tor Browser which includes a hardcoded root certificate authority for transparent man-in-the-middle attacks.
See the article for movrcx’s evidence, reasoning, and technical details. He emphasizes that he is revealing this information in the hope that measures will be taken to nullify the potential attack chain. Preferably before some state or criminal group decides to invest in leveraging it.
Cynthia Murrell, December 19, 2016