Voyager Labs Expands into South America
October 14, 2021
Well this is an interesting development. Brazil’s ITForum reports, “Voyager Labs Appoints VP and Opens Operations in Latin America and the Caribbean.” (I read and quote from Google’s serviceable translation.)
Voyager Labs is an Israeli specialized services firm that keeps a very low profile. Their platform uses machine learning to find and analyze clues to fight cyber attacks, organized crime, fraud, corruption, drug trafficking, money laundering, and terrorism. Voyager Labs’ clients include private companies and assorted government agencies around the world.
The brief announcement reveals:
“Voyager Labs, an AI-based cybersecurity and research specialist, announced this week the arrival in Latin America and the Caribbean. To lead the operation, the company appointed Marcelo Comité as regional vice president. The executive, according to the company, has experience in the areas of investigation, security, and defense in Brazil and the region. Comité will have as mission to consolidate teams of experts to improve the services and support in technologies in the region, according to the needs and particularities of each country. ‘It is a great challenge to drive Voyager Labs’ expansion in Latin America and the Caribbean. Together with our network of partners in each country, we will strengthen ties with strategic clients in the areas of government, police, military sector and private companies’, says the executive.”
We are intrigued by the move to South America, since most of the Israeli firms are building operations in Singapore. What’s Voyager know that its competitors do not? Not familiar with Voyager Labs? Worth knowing the company perhaps?
Cynthia Murrell, October 14, 2021
Another Reason for Windows 11?
October 13, 2021
The team at Beyond Search talked yesterday about Windows 11. One individual installed the system on one of our test-only machines and reported, "Not too exciting." Another dismissed Windows 11 as a distraction from the still-lingering SolarWinds and Exchange Server security face plants. I took a look and said, "Run some tests to see what it does to the performance of our AMD 5950X machines."
Then I turned my attention to more interesting things. This morning my trusty Overflight system spit out this headline: “Microsoft: Here’s Why We Shrunk Windows 11 Update Sizes by 40%.” I noted this statement in the article:
…It was necessary to reduce the size of them, which in the past have been almost 5 GB in size. In a word, it’s about bandwidth, which millions of households in the US have a shortage of due to poor broadband in remote areas.
Maybe cost is a factor?
My hunch is that Microsoft has many employees who have opinions about the shift from the last Windows to a last-plus-n Windows.
Several observations from our underground computer lab in rural Kentucky:
- Updates create problems for Microsoft; for example, security issues lurk, and actors worldwide are enthusiastic about exploring "new" code from Microsoft. Vulnerabilities R'Us, it seems.
- Implementing procedures which produce stable code is more expensive than figuring out how to reduce code bloat in updates; hence the pitch touted in the write-up cited above.
- Microsoft has shifted from 10,000 sail boats going in the same general direction to 20,000 motor boats going someplace. Evidence? The changing explanation for the existence of Windows 11.
Net net: A big and changing operating system may add vulnerabilities, not just rounded corners and a distraction from deeper issues.
Stephen E Arnold, October 13, 2021
Facebook and Synthetic Data
October 13, 2021
What’s Facebook thinking about its data future?
A partial answer may be that the company is doing some contingency planning. When regulators figure out how to trim Facebook’s data hoovering, the company may have less primary data to mine, refine, and leverage.
The solution?
Synthetic data. The jargon means annotated data that computer simulations output. Run the model. Fiddle with the thresholds. Get good enough data.
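As an illustration only (this is not from Facebook, AI.Reverie, or the Venture Beat write up), here is a minimal Python sketch of the "run the model, fiddle with the thresholds, get annotated data" idea. The transaction simulator, the fraud_rate knob, and every number in it are invented for the example.

```python
import random

def simulate_transactions(n, fraud_rate=0.03, seed=42):
    """Toy simulator that emits annotated (labeled) records.

    fraud_rate is the threshold one fiddles with until the
    synthetic data looks "good enough" for a downstream model.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        is_fraud = rng.random() < fraud_rate
        # Fraudulent amounts drawn from a fatter-tailed distribution.
        amount = rng.lognormvariate(6, 1.5) if is_fraud else rng.lognormvariate(3, 1.0)
        records.append({
            "amount": round(amount, 2),
            "hour": rng.randint(0, 23),
            "label": "fraud" if is_fraud else "legit",  # the annotation comes free
        })
    return records

if __name__ == "__main__":
    data = simulate_transactions(1000)
    print(sum(r["label"] == "fraud" for r in data), "synthetic fraud examples")
```

Change fraud_rate and rerun, and the "data" changes with it. No users, and no regulators, required.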
How does one get a signal about Facebook’s interest in synthetic data?
Facebook, the responsible social media company, has acquired AI.Reverie, according to Venture Beat.
Was this a straightforward deal? Sure, just via a Facebook entity called Dolores Acquisition Sub, Inc. If the name sounds familiar, the social media leader may have lifted it from "Westworld."
The write up states:
AI.Reverie — which competed with startups like Tonic, Delphix, Mostly AI, Hazy, Gretel.ai, and Cvedia, among others — has a long history of military and defense contracts. In 2019, the company announced a strategic alliance with Booz Allen Hamilton with the introduction of Modzy at Nvidia’s GTC DC conference. Through Modzy — a platform for managing and deploying AI models — AI.Reverie launched a weapons detection model that ostensibly could spot ammunition, explosives, artillery, firearms, missiles, and blades from “multiple perspectives.”
Booz Allen may be kicking its weaker partners. Perhaps the wizards at the consulting firm should have purchased AI.Reverie. But Facebook aced out the century-old other people's business outfit. (Note: I used to labor in the BAH vineyards, and I feel sorry for the individuals who were not enthusiastic about acquiring AI.Reverie. Where did that bonus go?)
Several observations are warranted:
- Synthetic data is the ideal dating partner for Snorkel-type machine learning systems
- Some researchers believe that real data is better than synthetic data, but that is a fight like the spats between those who love Windows and those who love macOS
- The uptake of "good enough" data for smart statistical systems which aim for 60 percent or better "accuracy" appears to be a mini trend.
Worth watching?
Stephen E Arnold, October 13, 2021
Ampliganda: A Wonderful Word
October 13, 2021
Let’s try to create a meme. That sounds like fun. How about coining a word? The Atlantic has one to share. It’s ampliganda.
You can read about the word in “It’s Not Misinformation. It’s Amplified Propaganda.” The write up explains as only the Atlantic and the Stanford Internet Observatory can:
Perhaps the best word for this emergent bottom-up dynamic is one that doesn’t exist quite yet: ampliganda, the shaping of perception through amplification. It can originate from an online nobody or an onscreen celebrity. No single person or organization bears responsibility for its transmission. And it is having a profound effect on democracy and society.
Several observations:
- The Stanford Internet Observatory is definitely quick on the meme trigger. It has been a mere two decades since the search engine optimization crowd figured out how to erode relevance
- A number of the ampliganda outfits have roots at Stanford. Isn’t that something?
- “Voting” for popularity is a thrilling concept. It works for middle school class officer elections. Algorithms can emulate popularity feedback mechanisms.
Who would have known unless Stanford was on the job? Yep, ampliganda. A word for the ages. Like Google maybe?
Stephen E Arnold, October 13, 2021
Facebook and the UK: About to Get Exciting
October 13, 2021
Remember Cambridge Analytica? I think that some in the UK do. There’s been some suspicion that the Brexit thing may have been shaded by some Cambridge Analytica magic, and that may ignite the modern equivalent of the Protestant-Catholic excitement of the 16th century. Not religion this time. Social media, Facebook style.
The increasingly commercial BBC or Beeb published “Facebook Whistleblower to Appear before UK Parliament.” The write up states:
Frances Haugen, the Facebook whistleblower who accuses the technology giant of putting profit ahead of safety, will give evidence to the UK Parliament later this month. Ms Haugen will appear before the Online Safety Bill committee on 25 October. It is examining a law to impose obligations on social-media companies to protect users, especially children.
Kids are a big deal, but I think the Brexit thing will make some snorting sounds as well.
The write up states:
Damian Collins, who chairs the committee reviewing the draft legislation, said Ms Haugen’s information to date had “strengthened the case for an independent regulator with the power to audit and inspect the big tech companies”.
Will Facebook’s PR ace get a chance to explain Facebook? What about the Zuck?
Interesting because Ms. Haugen may be asked to do some sharing with EU regulators and the concerned officials in Australia, Canada, and New Zealand too.
Stephen E Arnold, October 13, 2021
Stanford Google AI Bond?
October 12, 2021
I read “Peter Norvig: Today’s Most Pressing Questions in AI Are Human-Centered.” It appears, based on the interview, that Mr. Norvig will work at Stanford’s Institute for Human Centered AI.
Here’s the quote I found interesting:
Now that we have a great set of algorithms and tools, the more pressing questions are human-centered: Exactly what do you want to optimize? Whose interests are you serving? Are you being fair to everyone? Is anyone being left out? Is the data you collected inclusive, or is it biased?
These are interesting questions, and ones to which I assume Dr. Timnit Gebru will offer answers.
Will Stanford’s approach to artificial intelligence advance its agenda and address such issues as bias in the Snorkel-type of approach to machine learning? Will Stanford and Google expand their efforts to provide the solutions which Mr. Norvig describes in this way?
You don’t get credit for choosing an especially clever or mathematically sophisticated model, you get credit for solving problems for your users.
Like ads, maybe? Like personnel problems? Like augmenting certain topics for teens? Maybe?
Stephen E Arnold, October 12, 2021
Facebook Engineering: Big Is Tricky
October 12, 2021
The unthinkable happened on October 4, 2021, when Facebook went offline. Despite all the bad press Facebook has recently gotten, the social media network remains an important communication and business tool. The Facebook Engineering blog explains what happened with the shutdown in the post: “More Details About The October 4 Outage.” The outage happened with the system that manages Facebook’s global backbone network capacity.
The backbone connects all of Facebook’s data centers through thousands of miles of fiber optic cable. The post runs down how the backbone essentially works:
“When you open one of our apps and load up your feed or messages, the app’s request for data travels from your device to the nearest facility, which then communicates directly over our backbone network to a larger data center. That’s where the information needed by your app gets retrieved and processed, and sent back over the network to your phone.
The data traffic between all these computing facilities is managed by routers, which figure out where to send all the incoming and outgoing data. And in the extensive day-to-day work of maintaining this infrastructure, our engineers often need to take part of the backbone offline for maintenance — perhaps repairing a fiber line, adding more capacity, or updating the software on the router itself.”
A routine maintenance job issued a command to assess the global backbone’s capacity. Unfortunately, the command contained a bug the audit system did not catch, and it terminated connections between Facebook’s data centers and the Internet. A second problem made things worse. The DNS servers were unreachable yet still operational. Facebook could not connect to its data centers through the normal means, and the loss of DNS broke the internal tools used to repair problems.
Facebook engineers had to physically visit the backbone facility, which is protected by high levels of security. The facility is hard to enter, and the systems are purposely designed to be difficult to modify. It took a while, but Facebook diagnosed and resolved the problems. Baby Boomers were overjoyed to resume posting photos of their grandchildren, and anti-vaxxers could read their misinformation feeds.
Perhaps this Facebook incident and the interesting Twitch data breach illustrate that big is tricky? Too big to fail becomes too big to keep working in a reliable way.
Whitney Grace, October 12, 2021
Glean: Another Enterprise Search Solution
October 12, 2021
Enterprise search is interesting, but users accept its problems, like unfindable content and sluggish indexing, as unavoidable facts of tech life. A former Google engineering director recognized the problem when he started his own company, and the Forbes article “Glean Emerges from Stealth With $55 Million To Bring Search To The Enterprise” tells the story.
Arvind Jain cofounded the cloud data management company Rubrik and always had problems locating information. Rubrik is now worth $3.7 billion, but Jain left and formed the new startup Glean with Google veterans Piyush Prahladka, Tony Gentilcore, and T.R. Vishwanath. The team has developed a robust enterprise search application that pulls content from multiple applications. Glean has raised $55 million in funding.
Other companies like Algolia and Elastic addressed the same enterprise search problem, but they focused on search boxes on consumer-facing Web sites instead of working for employees. With more enterprise systems shifting to the cloud and SaaS, Glean’s search product is an invaluable tool. Innovations with deep learning also make Glean’s search product more intuitive and customizable for each user:
“On the user side, Glean’s software analyzes the wording of a search query—for example, it understands that “quarterly goals” or “Q1 areas of focus” are asking the same thing—and shows all the results that correspond to it, whether they are located in Salesforce, Slack or another of the many applications that a company uses. The results are personalized based on the user’s job. Using deep learning, Glean can differentiate personas, such as a salesperson from an engineer, and tailor recommendations based on the colleagues that a user interacts with most frequently.”
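Glean has not published its internals, so the passage above is best read as generic semantic search. For readers who want the mechanics, here is a minimal sketch of how “quarterly goals” and “Q1 areas of focus” can be treated as the same intent, using an off-the-shelf sentence-embedding library rather than anything Glean actually ships; the model name and the sample documents are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

# Any general-purpose sentence-embedding model will do for the illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "quarterly goals"
documents = [
    "Q1 areas of focus for the sales team",
    "Lunch menu for the Austin office",
    "Engineering on-call rotation schedule",
]

# Embed the query and the candidate documents, then rank by cosine similarity.
query_vec = model.encode(query, convert_to_tensor=True)
doc_vecs = model.encode(documents, convert_to_tensor=True)
scores = util.cos_sim(query_vec, doc_vecs)[0]

for doc, score in sorted(zip(documents, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.2f}  {doc}")
```

The “Q1 areas of focus” document scores highest because the embeddings capture meaning, not keywords; personalization by role, as Forbes describes it, would sit as an additional layer on top of this ranking.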
Will Glean crack the enterprise search code? Interesting question to which the answer is not yet known.
Whitney Grace, October 12, 2021
AI: The Answer to Cyberthreats Existing Systems Cannot Perceive?
October 12, 2021
This article from The Next Web gives us reason to hope: “Computer Vision Can Help Spot Cyber Threats with Startling Accuracy.” Researchers at the University of Portsmouth and the University of Peloponnese have combined machine learning with binary visualization to identify malware and phishing websites. Both processes involve patterns of color.
Traditional methods of detecting malware involve searching files for known malicious signatures or looking for suspicious behavior during runtime, both of which have their flaws. More recently, several machine learning techniques have been tried but have run into their own problems. Writer Ben Dickson describes these researchers’ approach:
“Binary visualization can redefine malware detection by turning it into a computer vision problem. In this methodology, files are run through algorithms that transform binary and ASCII values to color codes. … When benign and malicious files were visualized using this method, new patterns emerge that separate malicious and safe files. These differences would have gone unnoticed using classic malware detection methods. According to the paper, ‘Malicious files have a tendency for often including ASCII characters of various categories, presenting a colorful image, while benign files have a cleaner picture and distribution of values.’”
See the article for an illustration of this striking difference. The team trained their neural network to recognize these disparities. It became especially good at spotting malware in .doc and .pdf files, both of which are preferred vectors for ransomware attacks.
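The researchers’ exact pipeline is not reproduced in the article, but the bytes-to-colors step quoted above can be sketched in a few lines. The category-to-color mapping below is a deliberately crude stand-in for the ASCII-category idea, chosen only to make the picture, not to match the paper.

```python
import math
import numpy as np
from PIL import Image

def byte_to_rgb(b):
    """Map a byte to a color by rough ASCII category (illustrative only)."""
    if b == 0x00:
        return (0, 0, 0)         # null padding: black
    if b == 0xFF:
        return (255, 255, 255)   # 0xFF runs: white
    if 0x20 <= b <= 0x7E:
        return (0, 180, 0)       # printable ASCII: green
    return (200, 0, 0)           # other control / binary values: red

def visualize_file(path, out_path="binary_view.png"):
    data = open(path, "rb").read()
    side = math.ceil(math.sqrt(len(data)))             # pad out to a square image
    pixels = np.zeros((side * side, 3), dtype=np.uint8)
    for i, b in enumerate(data):
        pixels[i] = byte_to_rgb(b)
    Image.fromarray(pixels.reshape(side, side, 3)).save(out_path)

if __name__ == "__main__":
    visualize_file("sample.doc")  # the image then feeds a neural network classifier
```

A “colorful” image versus a “cleaner” one, in the paper’s terms, is what the downstream neural network learns to tell apart.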
A phishing attack succeeds when a user is tricked into visiting a malicious website that poses as a legitimate service. Companies have used website blacklists and whitelists to combat the problem. However, blacklists can only be updated once someone has fallen victim to a particular site, and whitelists restrict productivity and are time-consuming to maintain. Then there are heuristics, an approach that is more accurate than blacklists but still misses many malicious sites. Here is how the binary visualization plus machine learning approach may save the day:
“The technique uses binary visualization libraries to transform website markup and source code into color values. As is the case with benign and malign application files, when visualizing websites, unique patterns emerge that separate safe and malicious websites. The researchers write, ‘The legitimate site has a more detailed RGB value because it would be constructed from additional characters sourced from licenses, hyperlinks, and detailed data entry forms. Whereas the phishing counterpart would generally contain a single or no CSS reference, multiple images rather than forms and a single login form with no security scripts. This would create a smaller data input string when scraped.’”
Again, the write-up shares an illustration of this difference; it would make for a lovely piece of abstract art. The researchers were able to train their neural network to identify phishing websites with an impressive 94% accuracy. Navigate to the article for more details on their methods. The paper’s co-author Stavros Shiaeles says the team is getting its technique ready for real-world applications as well as adapting it to detect malware traffic on the growing Internet of Things.
Cynthia Murrell, October 12, 2021
Progress: Marketing Triumphs, Innovating Becomes SEO
October 11, 2021
I read “Slowed Canonical Progress in Large Fields of Science.” My take on the write up is different from what the authors intended. The notion of “science” I bring ignores physics, medicine, mathematics, and computational chemistry.
The write up is about marketing, good old-fashioned salesmanship. Don’t take my comment as that of a person annoyed at academics or big thinkers. I believe that the authors have articulated an important idea. I simply view their insight as an example of a particular manifestation of generating buzz, closing a deal, making a sale, or believing the assertions so common in advertising.
The write up states:
Rather than causing faster turnover of field paradigms, a deluge of new publications entrenches top-cited papers, precluding new work from rising into the most-cited, commonly known canon of the field.
Isn’t this “more is better” approach similar to generating clicks to a Web page, whether the content of the page is germane to a topic or not? I think it is.
I call this the SEO-ization of knowledge. Dr. Gene Garfield, the father of citation analysis, did not anticipate search engine optimization becoming the objective of his approach to determining importance in a scientific field.
The write up makes clear that:
As fields get larger, the most-cited papers become durably dominant, entrenched atop the citation distribution. New papers, in contrast, suffer diminished probability of ever becoming very highly cited and cannot gradually accumulate attention over time. Published papers tend to develop existing ideas more than disrupt them, and rarely launch disruptive new streams of research.
The effect of this “entrenchment” is little more than finding a way to get attention in a setting which resists change.
I think that the data presented in the paper provide an insight useful for understanding everything from the vapidity of so-called corporate white papers to the interesting expressions of business ideas on LinkedIn, and much more.
Advertising and search engine optimization are the defining characteristics of the last 10 years. The fact that they permeate scientific and technical work is evidence that intellectual endeavors are little more than keyword stuffing.
Who “regulates” the behavior? A government agency? The reviewers of a technical paper? The publishers of journals dependent on commercial enterprises for survival? The young researcher who follows the well-worn path?
Search engine optimization-type thinking has been absorbed into the intellectual foundations of scientific and technical disciplines.
Now it is marketing, which is much easier than innovating and discovering. Even Google advertises in the Wall Street Journal. Google!
Stephen E Arnold, October 11, 2021