Microsoft Adds Semantic Search to Azure Cognitive Search: Is That Fast?
April 9, 2021
Microsoft is adding new capabilities to its cloud-based enterprise search platform Azure Cognitive Search, we learn from “Microsoft Debuts AI-Based Semantic Search on Azure” at Datanami. We’re told the service offers improved development tools. There is also a “semantic caption” function that identifies and displays a document’s most relevant section. Reporter George Leopold writes:
“The new semantic search framework builds on Microsoft’s AI at Scale effort that addresses machine learning models and the infrastructure required to develop new AI applications. Semantic search is among them. The cognitive search engine is based on the BM25 algorithm, (as in ‘best match’), an industry standard for information retrieval via full-text, keyword-based searches. This week, Microsoft released semantic search features in public preview, including semantic ranking. The approach replaces traditional keyword-based retrieval and ranking frameworks with a ranking algorithm using deep neural networks. The algorithm prioritizes search results based on how ‘meaningful’ they are based on query relevance. Semantics-based ranking ‘is applied on top of the results returned by the BM25-based ranker,’ Luis Cabrera-Cordon, group program manager for Azure Cognitive Search, explained in a blog post. The resulting ‘semantic answers’ are generated using an AI model that extracts key passages from the most relevant documents, then ranks them as the sought-after answer to a query. A passage deemed by the model to be the most likely to answer a question is promoted as a semantic answer, according to Cabrera-Cordon.”
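In other words, the deep model does not replace BM25 so much as re-score what BM25 returns. Here is a generic sketch of that two-stage pattern using the open-source rank_bm25 and sentence-transformers packages; it illustrates the idea only and is not Microsoft's implementation:

```python
# Two-stage retrieval: BM25 keyword matching first, then a neural
# cross-encoder re-ranks the candidates by semantic relevance.
# Illustrative only -- not Azure Cognitive Search's actual code.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

docs = [
    "Azure Cognitive Search layers semantic ranking on top of BM25.",
    "BM25 is an industry standard for keyword-based retrieval.",
    "Semantic answers extract key passages from relevant documents.",
]
query = "how does semantic ranking work"

# Stage 1: BM25 retrieves candidates via full-text keyword matching.
bm25 = BM25Okapi([doc.lower().split() for doc in docs])
candidates = bm25.get_top_n(query.lower().split(), docs, n=3)

# Stage 2: a deep model re-scores the candidates by meaning.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])  # the passage promoted as the top "semantic answer"
```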
By Microsoft’s reckoning, the semantic search feature represents hundreds of development years and millions of dollars in compute time by the Bing search team. We’re told recent developments in transformer-based language models have also played a role, and that this framework is among the first to apply the approach to semantic search. There is one caveat—right now the only language the platform supports is US English. We’re told that others will be added “soon.” Readers who are interested in the public preview of the semantic search engine can register here.
Cynthia Murrell, April 9, 2021
The Alphabet Google YouTube Thing Explains Good Old Outcome Centered Design
April 8, 2021
If you have tried to locate information on a Google Map, you know what good design is, right? What about trying to navigate the YouTube upload interface to add or delete a “channel”? Perfection, okay. What if you have discovered an AMP error email and tried to figure out how a static Web site generated by an AMP approved “partner” can be producing a single flawed Web page? Intuitive and helpful, don’t you think?
Truth is: Google Maps is almost impossible to use regardless of device. The YouTube interface is just weird, better suited to a 10-year-old video game player than to a person over 30. And the AMP messages? Just stupid.
I read “Waymo’s 7 Principles of Outcome-Centered Design Are What Your Product Needs” and thought I stumbled upon a listicle crafted by Stephen Colbert and Jo Koy in the O’Hare Airport’s Jazz Bar.
Waymo (so named because one gets way more with Alphabet Google YouTube, hereinafter AGYT) is managed by co-CEOs. It is semi-famous for hiring uber-engineer Anthony Levandowski. Plus, the company has been beavering away since 2009 to make driving down 101 semi-fun. The good news is that Waymo seems to be making more headway than the Google team trying to solve death. The Wikipedia entry for Waymo documents 12 collisions, but the exact number of smart errors by the Alphabet Google YouTube software is not known even to some Googlers. Need to know, you know.
What are the rules for outcome-centered design? Ads but no crashes, I presume. The write up presents seven. Here are three, and you can let your Chrome browser steer you to the full list. Don’t run into the Tesla Web site either, please.
Principle 2. Create focus by clarifying your purpose.
Okay, focus. Let’s see. When riding in a vehicle with no human in charge, the idea is to avoid a crash. What about filtering YouTube for okay content? Well, that only works some of the time. The Waymo crashes appear to underscore the fuzz in the statistical routines.
And Principle 4. Clue in to your customer’s context.
Yep, a vehicle which knows one’s browsing history and has access to nifty probability-laden profiles can just get going. Forget what the humanoid may want. Alphabet Google YouTube is ahead of the humanoid. Sometimes. The AGYT approach is to trim down what the humanoid wants to three options. Close enough for horseshoes. Waymo, like Alphabet Google YouTube, knows best. Just like a digital mistress. The humanoid, however, is going to a previously unvisited location. Another humanoid told the rider face to face about an emergency. The AGYT system cannot figure out context. Not to worry. Those AGYT interfaces will make everything really easy. One can talk to the Waymo-equipped smart vehicle. Just speak clearly, slowly, and in a language which Waymo parses in an acceptable manner. Bororo won’t work.
Finally, Principle 7: Edit, edit, edit.
I think this means revisions. Those are a great idea. Alphabet Google YouTube does an outstanding job with dots, hamburger menus, and breezy writing in low-contrast colors. Oh, content? If you don’t get it, you are not Googley. Speak up and you may get the Timnit treatment or the Congressional obfuscation rhetoric. I also like how the antics of senior managers get ignored.
Yep, outcome centered. Great stuff. Were Messrs. Colbert and Koy imbibing something other than Sprite at the airport when possibly conjuring this list of really good tips? What’s the outcome? How about ads displayed to passengers in Waymo infused vehicles? Context centered, relevant, and a feature one cannot turn off.
Stephen E Arnold, April 8, 2021
HPE Machine Learning: A Benefit of the Autonomy Tech?
April 8, 2021
This sounds like an optimal solution from HPE (formerly known as HP); too bad it was not available back when the company evaluated the purchase of Autonomy. Network World reports, “HPE Debuts New Opportunity Engine for Fast AI Insights.” The machine-learning platform, called the Software Defined Opportunity Engine, or SDOE, is cloud-based and will greatly reduce the time it takes to create custom sales proposals for HPE channel partners and customers. Citing a blog post from HPE’s Tom Black, writer Andy Patrizio explains:
“It takes a snapshot of the customer’s workloads, configuration, and usage patterns to generate a quote for the best solution for the customer in under a minute. The old method required multiple visits by resellers or HPE itself to take an inventory and gather usage data on the equipment before finally coming back with an offer. That meant weeks. SDOE uses HPE InfoSight, HPE’s database which collects system and use information from HPE’s customer installed base to automatically remediate infrastructure issues. InfoSight is primarily for technical support scenarios. Started in 2010, InfoSight has collected 1,250 trillion data points in a data lake that has been built up from HPE customers. Now HPE is using it to move beyond technical support to rapid sales prep.”
The write-up describes Black’s ah-ha moment when he realized that data could be used for this new purpose. The algorithm-drafted proposals are legally binding—HPE must have a lot of confidence in the system’s accuracy. Besides HPE’s existing database and servers, the process relies on the assessment tool recently acquired when the company snapped up CloudPhysics. We learn that the tool:
“… analyzes on-premises IT environments much in the same way as InfoSight but covers all of the competition as well. It then makes recommendations for cloud migrations, application modernization and infrastructure. The CloudPhysics data lake—which includes more than 200 trillion data samples from more than one million virtual machines—combined with HPE’s InfoSight can provide a fuller picture of their IT infrastructure and not just their HPE gear.”
As of now, SDOE is only for storage systems, but we are told that could change down the road. Black, however, was circumspect on the details.
Cynthia Murrell, April 8, 2021
GitHub: Amusing Security Management
April 8, 2021
I got a kick out of “GitHub Investigating Crypto-Mining Campaign Abusing Its Server Infrastructure.” I am not sure if the write up is spot on, but it is entertaining to think about Microsoft’s security systems struggling to identify an unwanted service running in GitHub. The write up asserts:
Code-hosting service GitHub is actively investigating a series of attacks against its cloud infrastructure that allowed cybercriminals to implant and abuse the company’s servers for illicit crypto-mining operations…
In the wake of the SolarWinds and Exchange Server “missteps,” Microsoft has been making noises about the tough time it has dealing with bad actors. I think one MSFT big dog said there were 1,000 hackers attacking the company.
The main idea is that attackers allegedly mine cryptocurrency on GitHub’s own servers.
This is after the SolarWinds and Exchange Server “missteps,” right?
What’s the problem with cyber security systems that monitor real-time threats and uncertified processes?
Oh, I forgot. These aggressively marketed cyber systems still don’t work, it seems.
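To be fair, “monitoring uncertified processes” is easy to say and hard to do at GitHub scale. A deliberately naive sketch of the idea, using the open-source psutil package (the allowlist is a made-up baseline, not anything GitHub or Microsoft actually runs):

```python
import psutil  # third-party: pip install psutil

# Toy detector: flag any running process not on a known-good list.
# Real fleet monitoring is vastly more involved than this.
ALLOWLIST = {"systemd", "sshd", "bash", "python3"}  # hypothetical baseline

def uncertified_processes():
    """Yield (pid, name) for processes missing from the allowlist."""
    for proc in psutil.process_iter(attrs=["pid", "name"]):
        name = (proc.info["name"] or "").lower()
        if name and name not in ALLOWLIST:
            yield proc.info["pid"], name

for pid, name in uncertified_processes():
    print(f"uncertified process: pid={pid} name={name}")
```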
Stephen E Arnold, April 8, 2021
Google and the Institutionalization of Me Too, Me Too
April 8, 2021
Never one to let a trend pass it by un-mimicked, Google has created a new YouTube feature. Ars Technica reports, “YouTube’s TikTok Clone, ‘YouTube Shorts,’ Is Live in the US.” The feature actually launched in India last September and has done well there, possibly because TikTok has been banned in that country since June. It has now made its way to our shores. Writer Ron Amadeo tells us:
“The YouTube Shorts section shows up on the mobile apps section of the YouTube home screen and for now has a ‘beta’ label. It works exactly like TikTok, launching a full-screen vertical video interface, and users can swipe vertically between videos. As you’d expect, you can like, dislike, comment on, and share a short. You can also tap on a user name from the Shorts interface to see all the shorts from that user. The YouTube twist is that shorts are also regular YouTube videos and show up on traditional channel pages and in subscription feeds, where they are indistinguishable from normal videos. They have the normal YouTube interface instead of the swipey TikTok interface. This appears to be the only way to view these videos on desktop. A big part of TikTok is the video editor, which allows users to make videos with tons of effects, music, filters, and variable playback speeds that contribute to the signature TikTok video style. The YouTube Shorts editor seems nearly featureless in comparison, offering only speed options and some music.”
Absent those signature features, it seems unlikely Shorts will successfully rival TikTok. Perhaps it will last about as long as Stadia, Orkut, or Web Accelerator. At least no one can say Google shies away from trying things that may not work out.
Cynthia Murrell, April 8, 2021
Facebook and Microsoft: Communing with the Spirit of Security
April 7, 2021
Two apparently unrelated actions by bad actors. Two paragons of user security. Two. Count ‘em.
The first incident is summarized in “Huge Facebook Leak That Contains Information about 500 Million People Came from Abuse of Contacts Tool, Company Says.” The main point is that flawed software and bad actors were responsible. But 500 million. Where is Alex Stamos when Facebook needs guru-grade security to zoom into a challenge?
The second incident is explained in “Half a Billion LinkedIn Users Have Scraped Data Sold Online.” Microsoft, the creator of the super useful Defender security system, owns LinkedIn. (How is that migration to Azure coming along?) Microsoft has been a very minor character in the great works of 2021. These are, of course, The Taming of SolarWinds and The Rape of Exchange Server.
Now what’s my point? I think when one adds 500 million and 500 million, the result is a lot of people. Assume 25 percent overlap. Well, that’s still a lot of people’s information which has taken wing.
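Back of the envelope, using that 25 percent overlap assumption (my arithmetic, not a reported figure):

```python
facebook = 500_000_000   # records in the Facebook leak
linkedin = 500_000_000   # records in the LinkedIn scrape
overlap = 0.25           # assumed share counted in both sets

# People appearing in both leaks get counted once.
unique_people = int((facebook + linkedin) * (1 - overlap))
print(f"{unique_people:,}")  # 750,000,000
```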
Indifference? Carelessness? Cluelessness? A lack of governance? I would suggest that a combination of charming personal characteristics makes those responsible individuals one can trust with sensitive information.
Yep, trust and credibility. Important.
Stephen E Arnold, April 7, 2021
Alphabet Google YouTube: We Are Doing Darned Good Work
April 7, 2021
I read a peculiar item of information about the mom-and-pop outfit Alphabet Google YouTube. You may have a different reaction to the allegedly accurate data. Just navigate to “YouTube Claims It’s Getting Better at Enforcing Its Own Moderation Rules.” The “real news” story reports:
In the final months of 2020, up to 18 out of every 10,000 views on YouTube were on videos that violate the company’s policies and should have been removed before anyone watched them. That’s down from 72 out of every 10,000 views in the fourth quarter of 2017, when YouTube started tracking the figure.
Apparently the mom-and-pop outfit calculates a “violative view rate.” This is a metric possible only if a free video service accepts, indexes, and makes available “videos that contain graphic violence, scams, or hate speech.”
The write up reports that:
YouTube’s team uses the figure internally to understand how well they’re doing at keeping users safe from troubling content. If it’s going up, YouTube can try to figure out what types of videos are slipping through and prioritize developing its machine learning to catch them.
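The arithmetic behind the metric is simple enough. A minimal sketch restating the write up’s figures; how YouTube samples views to get its counts is not disclosed:

```python
def violative_share_percent(views_per_10k: float) -> float:
    """Convert a per-10,000-views figure to a percentage of all views."""
    return views_per_10k / 10_000 * 100

# The figures quoted in the write up:
print(violative_share_percent(72))  # Q4 2017: 0.72 (percent of views)
print(violative_share_percent(18))  # Q4 2020: 0.18 (percent of views)
```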
A few questions spring to mind:
- What specifically is “violative” content? An interview I conducted with a former CIA operative was removed a year after the interview appeared as a segment in my 10-to-15-minute, twice-monthly video news program. An interview with a retired spy was deemed violative. I hope YouTube learned something from this takedown. I remain puzzled.
- How does content depicting graphic violence, scams, and hate speech get on the YouTube system? After I upload a video, a message appears to tell me if the video is okay or not okay. I think Google’s system is getting better from the mom-and-pop outfit’s point of view. From other points of view? I am not sure.
- Why trust metrics generated within the Alphabet Google YouTube outfit? By definition, the data collection methods, the sample, and the techniques used to identify what’s important are not revealed. FAANG-type outfits are not exactly the gold standard in ethical behavior for some people. I, of course, believe everything I read online, like transcripts of senior executives’ remarks to Congressional committees.
- Why release these data now? What’s the point? Apple is tossing cores at Facebook. Alphabet Google YouTube is reminding some that Microsoft’s security is interesting. Amazon wants to pay tax. Maybe these actions and the violative metric are PR.
The write up contains charts. Low contrast colors show just how much better Alphabet Google YouTube is getting in the violative content game. I love the violative view rate phrase. Delicious.
Stephen E Arnold, April 7, 2021
Australia Demands Fairness from Big Tech. Waves Expected Worldwide
April 7, 2021
After wrangling over the issue for weeks, Australian regulators and Facebook have come to an agreement. Regulators demanded the social media platform, as well as Google, start paying news publishers their fair share for content. Sounds reasonable, considering that out of every $100 spent on online advertising in that country, $53 goes to Google and $28 to Facebook. That is 81% going to just two companies.
Facebook responded by temporarily blocking all news to Australian users. (Google made a similar threat, but made deals with several Australian media groups instead.) Now that a compromise has been reached and the blackout ended, all that remains is for the adjusted media law to be passed. Yahoo News discusses “Why the World is Watching Australia’s Tussle with Big Tech.” Writer Andrew Beatty observes:
“Although the rules would only apply in Australia, regulators elsewhere are looking closely at whether the system works and can be applied in other countries. Microsoft — which could gain market share for its Bing search engine — has backed the proposals and explicitly called for other countries to follow Australia’s lead, arguing the tech sector needs to step up to revive independent journalism that ‘goes to the heart of our democratic freedoms’. European legislators have cited the Australian proposals favorably as they draft their own EU-wide digital market legislation. Facebook’s decision to roll back the news ban comes after it received widespread criticism for the initial blackout, which also impacted some emergency response pages used to alert the public to fires, floods and other disasters. The company quickly moved to amend that mistake, but the incident left questions about whether social media platforms should be able to unilaterally remove services that are part of crisis response and may even be considered critical infrastructure.”
Critical infrastructure—that is an interesting twist. Both Facebook and Google insist they don’t mind paying for content, something each has started to do in very limited ways. They just don’t want to be told how much to pay; Australian regulators would like independent arbiters to oversee deals to be sure they are fair. World Wide Web inventor Tim Berners-Lee warns the precedent of charging for links could “break the internet.” Are the extended consequences of holding these two companies to account really so dire?
Cynthia Murrell, April 7, 2021
India May Use AI to Remove Objectionable Online Content
April 7, 2021
India’s Information Technology Act, 2000 provides for the removal of certain unlawful content online, like child pornography, private images of others, or false information. Of course, it is difficult, if not impossible, to keep up with identifying and removing such content using human moderators alone. Now we learn from the Orissa Post’s “Govt Mulls Using AI to Tackle Social Media Misuse” what the government has in mind. The write-up states:
“This step was proposed after the government witnessed widespread public disorder because of the spread of rumours in mob lynching cases. The Ministry of Home Affairs has taken up the matter and is exploring ways to implement it. On the rise in sharing of fake news over social media platforms such as Facebook, Twitter and WhatsApp, Minister of Electronics and Information Technology Ravi Shankar Prasad had said in Lok Sabha that ‘With a borderless cyberspace coupled with the possibility of instant communication and anonymity, the potential for misuse of cyberspace and social media platforms for criminal activities is a global issue.’ Prasad explained that cyberspace is a complex environment of people, software, hardware and services on the internet. He said he is aware of the spread of misinformation. The Information Technology (IT) Act, 2000 has provisions for removal of objectionable content. Social media platforms are intermediaries as defined in the Act. Section 79 of the Act provides that intermediaries are required to disable/remove unlawful content on being notified by the appropriate government or its agency.”
The Ministry of Home Affairs has issued several advisories related to real-world consequences of online content since the Act passed, including one on the protection of cows, one on the prevention of cybercrime, and one on lynch mobs spurred on by false rumors of child kidnappings. The central government hopes the use of AI will help speed the removal of objectionable content and reduce its impact on its citizens. And cows.
Cynthia Murrell, April 7, 2021
Could the Google Cloud Secretly Hide Sensitive Information?
April 7, 2021
It is odd seeing Google interested in protecting user information, but Alphabet Inc. follows dollar signs. The high demand for digital security is practically flashing bright neon dollar signs, so it is not surprising Google is investing its talents in security development. Tech Radar explains how simpler secret handling could lead to better security in the article “Google Cloud Is Making It Easier For Developers To Smuggle ‘Secrets’ In Their Code.”
A big problem with application development is accidentally exposing sensitive information via the source code. Bad actors can hack applications’ code, then steal the sensitive information. Google Cloud fused its Secret Manager service (a secure method to store private information) with its Cloud Code IDE extensions that speed up cloud-based application development.
The benefits of the merged technologies are:
“The integration allows developers to replace hardcoded data with so-called Secrets, a type of global object available to applications at build or runtime. This way, cloud applications can make use of the sensitive data when needed, but without leaving it exposed in the codebase.
According to Google, the new integration will make it easier for developers to build secure applications, while also avoiding the complexities of securing sensitive data via alternative methods.”
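For the curious, replacing a hardcoded value with a Secret looks roughly like this in Python, using Google’s google-cloud-secret-manager client library; the project and secret names here are hypothetical:

```python
# Fetch a secret at runtime instead of hardcoding it in the source.
# Requires: pip install google-cloud-secret-manager
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
# Hypothetical project and secret names for illustration.
name = "projects/my-project/secrets/db-password/versions/latest"

response = client.access_secret_version(request={"name": name})
db_password = response.payload.data.decode("UTF-8")
# The codebase now holds only a reference, never the password itself.
```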
In the past, developers hardcoded sensitive information into their codebase. It made data easy to recall, but savvy bad actors could access it. Many development teams know that hardcoding sensitive information is a security risk, so they make users run the gauntlet of authentication services instead.
Secret Manager and Cloud Code IDE could eliminate the authentication hassle, while protecting sensitive information.
Whitney Grace, April 7, 2021