Analytics Leaders: No Google, No Voyager Labs

April 25, 2019

I read “Top 50 Organizations for Data Analytics to Be Honored.” Interesting idea: Identify outfits which are really, really good with analytics: Data mining, text mining, math, and numerical recipes which are yummy, yummy.

The names on the list were a bit of a surprise to me; for instance:

  • A football team, Philadelphia Eagles. The Eagles?
  • A not for profit with an interesting history, United Way
  • A company unable to build a 5G technology based product and unable to deliver certain silicon to some customers, Intel.

What’s up?

I expected that the Google would get a mention and a footnote to Recorded Future, partially funded by Google and In-Q-Tel, the investment arm of the Central Intelligence Agency. Where’s Voyager Labs, the developer of Voyager Analytics?

After reading the article, I am not sure how the list was developed, and I am not confident that the organizations cited for excellence in analytics would make my list of analytics leaders.

But the important thing is the PR.

Stephen E Arnold, April 25, 2019

AI: Another Crisis!

April 25, 2019

Science, in today’s post modern world, seems to face crisis after crisis. Religious zealots, political fanboyz, and talking head news are often the triggers, but lack of funding and support is a modern cause. Machine learning is creating a new crisis in science, says the BBC article “AAAS: Machine Learning ‘Causing Science Crisis.’”

According to Dr. Genevera Allen from Rice University, an increased reliance on machine learning in scientific studies has led to a crisis. She presented her research on this topic at the American Association for the Advancement of Science, where she warned scientists that if they did not improve their techniques they were wasting precious time and money. More scientific studies rely on machine learning to digest and gather results from data. The data sets are huge and are also expensive.

“But, according to Dr. Allen, the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world. ‘Often these studies are not found out to be inaccurate until there’s another real big dataset that someone applies these techniques to and says ‘oh my goodness, the results of these two studies don’t overlap’ she said. ‘There is general recognition of a reproducibility crisis in science right now. I would venture to argue that a huge part of that does come from the use of machine learning techniques in science.’”

The reproducibility crisis means that when an experiment is repeated, scientists cannot replicate the results. Being able to reproduce results is a core practice in the scientific method, a tried and true method that annoys school children but ensures accuracy. When the results cannot be reproduced, it suggests the first set of results may be wrong. It is possible that up to 85% of biomedical research done in the world is not accurate. Is machine learning making scientists lazy? If these results are applied in the real world, it could be worse than lack of funding and possibly religious zealots…possibly.
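To make Dr. Allen's point concrete, here is a minimal Python sketch of a pattern that exists only in one data set. Everything in it is an assumption made for illustration: the data are pure random noise, and the sample size, feature count, and seed are arbitrary. It is not drawn from her research or from the BBC article.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def noise_study(rng, n_samples, n_features):
    """Return (features, outcome); every value is independent noise."""
    features = [[rng.gauss(0, 1) for _ in range(n_samples)]
                for _ in range(n_features)]
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    return features, outcome

def best_feature(features, outcome):
    """Index and correlation of the feature most correlated with the outcome."""
    scores = [pearson(f, outcome) for f in features]
    best = max(range(len(scores)), key=lambda i: abs(scores[i]))
    return best, scores[best]

rng = random.Random(2019)  # fixed seed, arbitrary choice
feats_a, out_a = noise_study(rng, n_samples=20, n_features=500)
feats_b, out_b = noise_study(rng, n_samples=20, n_features=500)

# "Discover" the strongest pattern in study A, then re-test it in study B.
idx, r_first = best_feature(feats_a, out_a)
r_repeat = pearson(feats_b[idx], out_b)
print(f"feature {idx}: r = {r_first:+.2f} in study A, {r_repeat:+.2f} in study B")
```

With hundreds of candidate features and only twenty samples, some feature will look strongly related to the outcome by chance alone. The same feature will usually look far weaker in the second data set, which is the "results don't overlap" moment described in the quote.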

Whitney Grace, April 25, 2019

Latest GraphDB Edition Available

April 25, 2019

A new version of GraphDB is now available, we learn from the company’s News post, “Ontotext’s GraphDB 8.9 Boosts Semantic Similarity Search.” The semantic graph database offers a couple new features inspired by user feedback. We learn:

“The semantic similarity search is based on the Random Indexing algorithm. … The latest GraphDB release enables users to create hybrid similarity searches using pre-built text-based similarity vectors for the predication-based similarity index. The index combines the power of graph topology with the text similarity. The users can control the index accuracy by specifying the number of iterations required to refine the embeddings. Another improvement is that now GraphDB 8.9 allows users to boost the term weights when searching in text-based similarity indexes. It also simplifies the processes of abortion of running queries or updates from the SPARQL editor in the Workbench.”
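For readers unfamiliar with Random Indexing, here is a minimal Python sketch of the general technique. It is not Ontotext's implementation and uses no GraphDB API; the dimensionality, sparsity, seed, and sample documents are assumptions chosen for illustration. Each document gets a sparse random "index vector," each term accumulates the index vectors of the documents it appears in, and cosine similarity compares the resulting term vectors.

```python
import math
import random

DIM = 64      # dimensionality of the reduced space (illustrative)
NONZERO = 6   # number of +1/-1 entries per index vector (illustrative)

def index_vector(rng):
    """Sparse ternary vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = [0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        vec[pos] = rng.choice((-1, 1))
    return vec

def term_vectors(documents, seed=0):
    """Sum the index vector of every document in which a term occurs."""
    rng = random.Random(seed)
    vectors = {}
    for doc in documents:
        doc_vec = index_vector(rng)
        for term in set(doc.lower().split()):
            acc = vectors.setdefault(term, [0] * DIM)
            for i, value in enumerate(doc_vec):
                acc[i] += value
    return vectors

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = [
    "graph database stores triples",
    "graph database answers sparql queries",
    "football team wins game",
]
vecs = term_vectors(docs)
print(cosine(vecs["graph"], vecs["database"]))  # terms share every document
print(cosine(vecs["graph"], vecs["football"]))  # terms share no document
```

Terms that occur in the same documents end up with the same accumulated vector, so they score as highly similar. Terms with no shared documents are built from unrelated random vectors and typically score near zero.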

The database continues to be updated to the current RDF4J 2.4.6 public release. GraphDB comes in Free, Standard, and Enterprise editions. Begun in 2000, Ontotext is based in Sofia, Bulgaria, and maintains its North American office in New York City.

Cynthia Murrell, April 25, 2019

Nosing Beyond the Machine Learning from Human Curated Data Sets: Autonomy 1996 to Smart Software 2019

April 24, 2019

How does one teach a smart indexing system like Autonomy’s 1996 “neurodynamic” system?* Subject matter experts (SMEs) assembled a training collection of textual information. The articles and other content would replicate the characteristics of the content which the Autonomy system would process; that is, index and make searchable or analyzable. The work was important. Get the training data wrong and the indexing system would assign metadata or “index terms” and “category names” which could cause a query to generate results the user could perceive as incorrect.

How would a licensee adjust the Autonomy “black box”? (Think of my reference to Autonomy and search as a way of approaching “smart software” and “artificial intelligence.”)

The method was to perform re-training. The approach was practical and for most content domains, the re-training worked. It was an iterative process. Because the words in the corpus fed into the “black box” included new words, concepts, bound phrases, entities, and key sequences, there were several functions integrated into the basic Autonomy system as it matured. Examples ranged from support for term lists (controlled vocabularies) to dictionaries.

The combination of re-training and external content available to the system allowed Autonomy to deliver useful outputs.

The gap between the optimal results and the real world results usually boiled down to several factors, often working in concert. First, licensees did not want to pay for re-training. Second, maintenance of the external dictionaries was necessary because new entities arrive with reasonable frequency. Third, testing and organizing the freshened training sets and the editorial work required to keep dictionaries ship shape were too expensive, time consuming, and tedious.

Not surprisingly, some licensees grew unhappy with their Autonomy IDOL (integrated data operating layer) system. That, in my opinion, was not Autonomy’s fault. Autonomy explained in the presentations I heard what was required to get a system up and running and outputting results that could easily hit 80 percent or higher on precision and recall tests.
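For readers who want to see what an “80 percent on precision and recall” claim measures, here is a minimal Python sketch. The index terms are invented for illustration and have nothing to do with Autonomy's actual test procedure: the system's assigned terms are compared with the terms a subject matter expert says are correct.

```python
def precision_recall(assigned, relevant):
    """Precision: share of assigned terms that are correct.
    Recall: share of correct terms that were assigned."""
    assigned, relevant = set(assigned), set(relevant)
    hits = assigned & relevant
    precision = len(hits) / len(assigned) if assigned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical index terms for one document.
system_terms = {"merger", "acquisition", "earnings", "lawsuit", "patent"}
sme_terms = {"merger", "acquisition", "earnings", "patent", "dividend"}

p, r = precision_recall(system_terms, sme_terms)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=80% recall=80%
```

Here four of the five assigned terms are correct, and four of the five correct terms were found. A stale training set or dictionary tends to pull both numbers down, because the system assigns outdated terms and misses new ones.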

The Autonomy approach is widely used. In fact, wherever there is a Bayesian system in use, there is the training, re-training, external knowledge base demand. I just took a look at Haystax Constellation. It’s Bayesian and Haystax makes it clear that the “model” has to be trained. So what’s changed between 1996 and 2019 with regards to Bayesian methods?

Nothing. Zip. Zero.

Defriending Facebook? Harsh

April 24, 2019

Whether it was earnest advice or a public-relations ploy, we’re told Mark Zuckerberg’s recent call for regulation would not actually fix the problems with Facebook. Canada’s CBC News describes “The Case Against Facebook: a ‘Dataopoly’ with Too Much Market Power.” I was interested in reporter Ramona Pringle’s explanation of a “dataopoly;” she cites Carleton University professor Dwayne Winseck, who teaches about Internet governance:

“[Winseck] says with its behemoth scale and singular control over the data of its users, Facebook is a ‘dataopoly.’ A company with a monopoly in a traditional, non-digital industry is able to charge consumers higher prices for goods or services due to the lack of competition. In the case of a dataopoly, the results of that unrivalled power can be less privacy, degraded quality of service, and political and social consequences, writes Prof. Maurice Stucke, an antitrust expert at the University of Tennessee College of Law. With more than two billion users who have few, if any, alternatives to the massive social network and its various platforms — which also include Instagram and WhatsApp — there is little incentive for Facebook to change the way it does business. Winseck says this is clear in the company’s ‘take-it-or-leave-it terms of service.’ Even if a user is uncomfortable with some of the Facebook’s practices, if they want to use the social network, they have no choice but to grin and bear it.”

On top of that, we’re reminded, Facebook keeps a tight grip on everything that crosses its platform, like the nature of its services, how advertisers can target users, and what it really does with all that juicy user data. The only real solution, Pringle insists, is the breakup of Zuckerberg’s company. Like others, this article is skeptical of Zuckerberg’s motives, noting that, for various reasons, Facebook could use some good PR about now. If this was the goal, did it backfire?

Cynthia Murrell, April 24, 2019

Factualities for April 24, 2019

April 24, 2019

Ah, data, big and small, are everywhere. Believe ’em or not:

5. Number of US airports with facial recognition systems. Source: Quartz

2. Number of towns in Kansas which gave Facebook-infused educational program an F. Source: New York Times

12,000. Number of factoids the UK government added to Alexa. Source: The Inquirer

10 percent. Percentage of Americans who do not use the Internet. Source: Pew Research Center

$2.7 billion. FBI’s calculation of the losses to cyber crime in 2018. Source: DarkReading

$30 million. Amount Apple spends for Amazon services. Source: Apple Insider

1 million. Number of robotaxis Elon Musk promises in 2020. Source: Engadget

48 percent. Percentage of Canadians who would be broke if they had to come up with more than $200. Source: BNN Bloomberg

90 minutes. Length of time it took The Weather Channel to recover from a ransomware attack. Source: ZDNet

33 percent. Percentage of companies using open source to reduce costs. Source: Enterprisers Project

23 million. Number of people in the US using 123456 as a password. Source: Slashdot

40 million. Number of cyber attacks on Ecuador since forcing Wikileaks’ founder out of Ecuador’s UK embassy. Source: The Inquirer

28 percent. Percentage of US drivers who ignore the road due to mobile phone use. Source: CNet

50 percent. Amount of alcohol 10 percent of Australian drinkers imbibe. Source: Online Library Wiley

Stephen E Arnold, April 24, 2019

DarkCyber for April 23, 2019, Now Available

April 23, 2019

DarkCyber for April 23, 2019, is now available at www.arnoldit.com/wordpress and on Vimeo at https://vimeo.com/331645696.

The program is a production of Stephen E Arnold. It is the only weekly video news show focusing on the Dark Web, cybercrime, and lesser known Internet services.

This week’s story line up includes: Candiru, a vendor of cyber software; ways to obtain open source content for free; a shotgun equipped drone; and a look at the conclusions from the audit of the LAPD data driven policing effort.

This week’s feature looks at the conclusions reported in the audit of the Los Angeles Police Department’s data-driven policing programs. In the final part of this three-part series we look at the major weakness identified by the Inspector General’s team. The challenge will be to introduce workflows which reduce the errors in data provided to the analytic systems. Stephen E Arnold, producer of DarkCyber, said: “Investigators have work procedures in place for tangible evidence. Information streaming from GPS systems or automatic devices may vary from the after action reports filed by law enforcement professionals. With conflicting data, the analytic systems can produce outputs which are less accurate. Training can help, but specialists who review data may play a more important role as data-driven policing increases.” The audit reveals that the software used by LAPD helps reduce criminal activity. Data quality requires attention.

Other stories in the DarkCyber video include:

A low-profile cyber intelligence firm called Candiru develops tools for law enforcement and government agencies. The company markets in the Middle East and in some Asian countries. Candiru is just one of more than 100 firms providing cyber services from Tel Aviv. The company’s name evokes a powerful image of how the firm’s technology works.

Russia’s large defense contractor funded a program to develop weaponized drones. One of the more interesting engineering solutions involved a vertical takeoff and landing drone equipped with a shotgun. The drone flies near a target and a ground operator discharges the shotgun in order to disable the target. The drone makes it clear that autonomous or semi-autonomous technology combined with weapons can yield a potent force multiplier.

Social media content is available from commercial vendors, often at costs that range from $5,000 a month and up. DarkCyber reveals that there are low cost or no cost options available to investigators with technical expertise. There are more than a dozen application programming interfaces available. Each can deliver a stream of near-real time data for analysis in an IBM Analyst’s Notebook- or Palantir Technologies-type system.

Kenny Toth, April 23, 2019

Machine Learning and Data Quality

April 23, 2019

We’re updating our data quality files as part of the run up to my lecture at the TechnoSecurity & Digital Forensics Conference. A paper by Sanau.co, “Dear AI Startups: Your ML Models Are Dying Quietly,” is worth reading if you are thinking about how to solve some issues with the accuracy of the outputs of some machine learning systems. The slow deterioration of certain Bayesian methods has been a subject I have addressed for years. The Sanau write up called to my attention another source of data deterioration or data rot; that is, seemingly logical changes made to field names and the insidious downstream consequences of these changes. The article provides useful explanations and a concrete example drawn from ecommerce. The article has a much broader application. Worth reading.
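Here is a minimal Python sketch of how a renamed field can rot a model's inputs without raising an error. This is not the Sanau example; the field names, default values, and functions are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical schema the model was trained against.
EXPECTED_FIELDS = {"user_id", "price", "category"}

def extract_features(record):
    """Lenient extraction: a missing field silently becomes a default value."""
    return {
        "price": record.get("price", 0.0),
        "category": record.get("category", "unknown"),
    }

def check_schema(record, expected=EXPECTED_FIELDS):
    """Return (missing, unexpected) field names so drift becomes visible."""
    keys = set(record)
    return sorted(expected - keys), sorted(keys - expected)

old_record = {"user_id": 7, "price": 19.99, "category": "books"}
# Upstream team renames "price" to "unit_price"; nothing crashes downstream.
new_record = {"user_id": 7, "unit_price": 19.99, "category": "books"}

print(extract_features(new_record))  # {'price': 0.0, 'category': 'books'}
print(check_schema(new_record))      # (['price'], ['unit_price'])
```

The lenient extractor keeps running and feeds the model a price of zero for every record, so accuracy decays quietly. A schema check run on incoming records is one inexpensive way to turn a silent change into a visible one.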

Stephen E Arnold, April 23, 2019

Facebook: A Bubbling Cauldron of PR Opportunity

April 23, 2019

I read “Facebook’s New Chief Lawyer Helped Write the Patriot Act.” Then I read “Facebook Taps Former Vulcan and Gates Ventures Exec John Pinette to Run Global Communications.” From these two real news stories, I concluded that the Facebook senior management team is circling its wagons, cleaning up the dorm room, and involving some individuals who may have been excluded from the high school science club party last year. Vulcan Ventures and the Patriot Act. Times are changing at the company which seems to struggle with privacy, legislative wrath, and trust.

Not a moment too soon.

The Guardian, an outfit eager to identify the possible frailties of humanoids in Silicon Valley, published “My TED Talk: How I Took on the Tech Titans in Their Lair” and reported via a contributor who gave a TED talk:

In the theatre, senior executives of Facebook had been “warned” beforehand. And within minutes of stepping off stage, I was told that its press team had already lodged an official complaint. In fairness, what multi-billion dollar corporation with armies of PRs, lawyers and crisis teams, not to mention, embarrassingly, our former deputy prime minister, Nick Clegg, wouldn’t want to push back on the charge that it has broken democracy? Facebook’s difficulty is that it had no grounds to challenge my statement. No counter-evidence. If it was innocent of all charges, why hasn’t Mark Zuckerberg come to Britain and answered parliament’s questions? Though a member of the TED team told me, before the session had even ended, that Facebook had raised a serious challenge to the talk to claim “factual inaccuracies” and she warned me that they had been obliged to send them my script. What factual inaccuracies, we both wondered. “Let’s see what they come back with in the morning,” she said. Spoiler: they never did.

I am not sure when the Patriot Act and Vulcan hires start work, but the Guardian write up may spin up some work for the new, fresh, clear-eyed Facebookers. Not a moment too soon. Wait. Maybe it is too late?

Stephen E Arnold, April 23, 2019

Amazonia for April 22, 2019

April 22, 2019

Amazon continues to grind forward.

Amazon Fails Where Google Struggled: China

China is a big market. China is a country. Armies, police, regulators, and a history of following its leaders. Amazon learned that it, like Google, could not change China. This is a surprise? “Amazon Plans to Shut Down China Marketplace in Rare Retreat” reports:

In a rare retreat for Amazon.com Inc., the e-commerce giant plans to shut down its Chinese marketplace business in July as it shifts its focus to offering mainland consumers overseas products rather than goods from local sellers.

But Amazon will not give up. Even Mark Zuckerberg learned to speak Chinese so he could continue to spread the word about Facebook goodness.

Amazon will keep running its other businesses in China, including Amazon Web Services, Kindle e-books, and cross-border operations that help ship goods from Chinese merchants to customers abroad. Starting on July 18, customers logging in to Amazon’s Chinese web portal, Amazon.cn, will only see a selection of goods from its global store, rather than products from third-party sellers.

Will Amazon triumph in China? That depends on what one means by identifying a victory. DarkCyber does not think the definition will include impinging on Alibaba and JD.com, among other China favorites.

Amazon and Google: Learning to Coexist

DarkCyber noted that the high school science club spat with the high school mathematics club has ended. Amazon’s FireTV will show YouTube videos. Peace in our Time reported:

In a mutual announcement, the two online giants have revealed that they’re collaborating on bringing their services to the other’s devices. “In the coming months,” the YouTube app will be coming back to the Fire TV (Amazon’s devices and Fire TV Edition smart TVs). It will be followed later this year by the YouTube TV and YouTube Kids apps as well. On Amazon’s side, the Prime Video app will add support for casting “in the coming months,” thus supporting Google’s first-party Chromecasts and other Chromecast built-in devices.

Ah, beautiful music to some ears. But wait. Music is not included. The two clubs are likely to meet up in the high school cafeteria to talk about tunes, DarkCyber opines.

Amazon: Staff Management: Energy and Green Edition

Several thousand Amazon staff want Amazon to do more for saving the planet. The issue is not resolved. What triggered the pushback from happy, content, sleek, and well benefited employees? DarkCyber suggests that the firm’s commitment to renewable energy farms half a world away from Seattle was insufficient. Amazon has some deals with Big Energy to help these oil and gas outfits extract carbon sources from Mother Earth. See “Amazon Employees to Execs: Do More on Climate Change” for some exhaust on the subject. Key point: Amazon management faces a management hot spot. First, a supermarket magazine dust up, then the China problem, and now people one pays to do the company’s honest, meaningful labor. Perhaps a Harvard Business School podcast will offer the online bookstore some advice?

Amazon Partners: Implementing the New, Improved IBM Approach to Sales Continues

Some partners of Amazon revealed some of the Amazon plans. Here are a few which caught our eye:

  • Antian offers its compliance services via Amazon. Source: Geekwire
  • AzCopy has improved its S3 data transfer service. Source: Redmond Magazine
  • Business Software, an income tax services firm, is now an AWS believer. Source: Virtual Strategy
  • Inplayer offers video monetizing services for Amazon. Source: OAOA, part of Aim Media in Texas
  • Instana introduced its cloud management services for Amazon. Source: Virtual Strategy
  • McAfee achieves Amazon certification. Plus, Amazon has identified McAfee as well architected. Source: Marketwatch
  • Phynd, a health care transformation specialist, expanded its Amazon-centric services. The name suggests search, but it seems more of a workflow and content management play. Source: PRNewswire
  • Perspectium provides customer support services via Amazon. The firm also distributes its news releases through the Press of Atlantic City, which wants to charge for marketing information. Annoying indeed. If the link goes dead, think bush league PR play.
  • Pyramid Systems is now an advanced consulting partner for Amazon. Sounds good, though. And, no, DarkCyber does not know what the different levels of partner mean. Source: PRNewswire
  • Tetrate offers Envoy to AWS Mesh users who want micro services for their Web-accessible applications. Source: Marketwatch
  • TigerGraph uses AWS for “pay as you go” graph analytics. Source: Globe News Wire

Amazon: Going the Right Direction Says Yahoo, Verizon, Oath, AOL or Whatever

Despite the mini-crises causing the Bezos bulldozer’s engine to rev, “Jeff Bezos Is Leading Amazon in the Right Direction.” And Verizon should know; it is a paragon of management excellence. According to the company which may have cribbed some ideas from Smarter Analyst:

The founder of Amazon has managed to keep an innovative culture going while they continue to disrupt e-commerce. Bezos anticipates that Amazon can continue to grow its e-commerce footprint in various markets outside the United States where there has been minimal market penetration of e-commerce in general.

Spot on, Yahoooooooo.

For Fans of Amazon’s Policeware

Amazon has added Arabic to the line up of languages which the Amazon Polly text-to-speech system can speak. CRN points out that the service is designed for consumer applications; for example:

The move allows developers to create applications that speak in Arabic and build speech-enabled products and services, including cars, internet of things devices, appliances, automated contact centers, language learning platforms, translation apps and newsreaders.
