AI Tech Forces Governments To Upgrade Laws
June 4, 2021
We are living in a time of science fiction due to advances in AI technology. While we are still far from holographic interfaces and competent digital assistants that do not spy on users, today’s technology was yesteryear’s imaginings. Due to advancements in AI, such as facial recognition, world governments are forced to update laws to stay relevant, says Inc42 in “How The World Is Updating Legislation In The Face Of Persistent AI Advances.”
Thirteen US states banned facial recognition technology for police use. The ban is based on implicit biases hardwired into the technology from non-diverse datasets that favor light-skinned people. Meanwhile, the European Union continues to protect its citizens’ privacy by restricting technology. The newest addition to the EU’s General Data Protection Regulation (GDPR), article 22, protects individuals’ rights from automated decision making, including profiling. EU citizens are guaranteed human intervention when automation harms their rights and freedoms. China continues to use facial recognition to monitor its people, including minority populations.
India remains in limbo when it comes to AI technology laws. While India has one of the world’s fastest growing economies and technology industries, the country also remains one of the poorest and most underdeveloped. Since the world lacks standardized AI legislation, India does not have a reference for its own laws. As an Asian country, India does not want to mirror China, and it does not have the same level of development as Europe and the United States.
India’s government did create a Personal Data Protection Bill (PDPB), but it is stuck in parliament. The PDPB contains consumer protections:
“The Bill gives consumers the rights to access, correct and erase their data in its current form (Refer: Clause 19 of the PDPB “Right to data portability” under Chapter 5 “Rights of the data principal”). This is something that all organizations will have to comply within the timeline stipulated by the government. From a commercial perspective, data transference will be a major challenge, with its impact being harder on start-ups and SMEs. This does provide an avenue for new companies to provide services that help complying with the PDPB laws but will also impact start-ups and SMEs that rely on the consumer’s data and inferences of the data.”
The bill also includes data localization provisions that require data to be stored within India. Other than Indian citizens, technology companies and startups will be the most affected. Restrictions on AI could limit technology business development within the Indian economy, but if the PDPB is passed it would benefit the citizens. There is a fine balance between morality and profit, but the people should come first.
Whitney Grace, June 4, 2021
On the Complexity of Ad Algorithms
June 4, 2021
It seems like advertising engineer Harikesh Nair has an innate love of intricacy—perhaps to the point of making things more complicated than they need to be. Tech Xplore interviews the algorithm aficionado in its post, “Expert Discusses the ‘Horrendously Complex Science’ Behind Online Advertising Research.” Once inspired by algorithmic innovations out of Google, Facebook, and Amazon, Nair has since settled his work in Asia, where the advertising scene is beautifully complicated by a couple of factors. First, its online marketplace is accessed almost entirely through mobile devices. Nair explains:
“With mobile apps, there’s limited real estate: Nobody browses beyond the first three pages. If you have 8 listings per page, that’s 24 products. So you have to surface 24 items out of 300 million possible product options, and you have to personalize the rankings by finding a good match between the user’s needs and what’s available. All of that has to be done really fast, within 200 microseconds [0.0002 seconds], and it has to be executed flawlessly. It’s a horrendously complex science problem.”
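The arithmetic in that quote (8 listings times 3 pages equals 24 slots) is easy to picture in code. What follows is only a toy sketch of picking the best 24 items from a scored stream; it is not Nair's system. The scores, the `sku-` ids, and the function name are invented for illustration, and a real ranker would personalize the scores per user.

```python
import heapq
from typing import Iterable

LISTINGS_PER_PAGE = 8   # figure taken from the quote
PAGES_BROWSED = 3       # figure taken from the quote
SLOTS = LISTINGS_PER_PAGE * PAGES_BROWSED  # 24 visible products


def top_slots(scored_products: Iterable[tuple[str, float]], k: int = SLOTS) -> list[str]:
    """Return the ids of the k highest-scoring products, best first.

    heapq.nlargest keeps only k candidates in memory, so the full
    catalog can be streamed instead of sorted.
    """
    best = heapq.nlargest(k, scored_products, key=lambda pair: pair[1])
    return [product_id for product_id, _ in best]


# Hypothetical catalog: ids and scores are made up for the example.
catalog = ((f"sku-{i}", ((i * 37) % 101) / 100) for i in range(1000))
first_three_pages = top_slots(catalog)
```

The heap approach touches each product once, which is the easy part. Producing a good personalized score for 300 million products inside the quoted time budget is the hard part, and this sketch does not attempt it.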
So there is that. We are told a strong social aspect to consumer behavior further complicates Asia’s marketing scene:
“Their society is more communal than ours, so consumption is more embedded in their social networks. Rather than go to a site and search for something and then click and buy and leave, which is typical American consumption behavior, people in China share products on each other’s social feeds, and by doing so they can often get a price reduction.”
Nair points out that, unlike Google, Facebook, and Amazon, which each play different roles, China’s Tencent and Alibaba each blend search, social media, and sales under one virtual roof. That is an interesting difference.
The data scientist goes on to wax poetic about how difficult it is to prove certain ads actually lead to certain purchases. The process involves elaborate, high-stakes experiments run in the background of ad auctions involving hundreds of thousands of advertisers. Then the results must be dumbed down for “people who don’t really know that much about statistics.” Mm hmm. Are we sure all this is not simply justification for confusing everyone and getting paid to do it?
Cynthia Murrell, June 4, 2021
Professional Publishing and Professional Cheaters
June 4, 2021
“Collusion Rings Threaten the Integrity of Computer Science Research” is an amusing, if not hilarious, write up. The venerable Communications of the Association for Computing Machinery has discovered that there is a “growing problem.” No kidding. I noted this statement:
Collusion rings extend far beyond the field of computer architecture.
This is a nice academic way of saying that technical papers which are peer reviewed are subject to search engine optimization tricks, cheating, and you-scratch-my-back, I-will-scratch-yours behavior. This is a surprise?
The article explains how a collusion ring works. Among its characteristics are hidden agreements to praise certain papers and threats against individuals who don’t go along with the monkey thing. Monkeys can be quite violent. Check out chimpanzee wars here.
I think ethical behavior in business is a much discussed topic in some circles. I think those desperate for tenure evidence the type of behavior visible in other “professions”; for example, politicians and experts in medieval literature.
The article includes this statement:
The cheaters run the risk of destroying the very system they depend on for their professional success. It is time to take a close look at the peer-review process and to align the incentives so everyone is working toward sharing the best research work possible.
My hunch is that those engaged in self-promotion are likely to say, “Hey, not my problem.”
I think this is the defining viewpoint of the Age of Thumbtypers. It would be interesting to get Timnit Gebru’s take on collusion rings in the workplace.
Stephen E Arnold, May 31, 2021
Misunderstanding Censorship: It Is Not Just Words
June 3, 2021
Popular words now are take down (killing servers), block (filter users or items on a stop list), cancel (ignoring a person or terminating an API call), and a pride of synonyms like terminate with extreme prejudice. The idea is that censorship is the go-to method to cultivate a more pleasing digital garden. But who owns the garden? The answer is that “ownership” depends on one’s point of view. Big tech has one role to play. Those contributing content in different media have another. The person who reads, listens, or watches “information” gets in the act as well.
The popular words reflect an interesting development. Those “in charge” want to preserve their kingpin role. Those who have an audience want to remain popular and get even more popular if possible. Those users want to consume what they want and will use available tools to satisfy their wants and needs.
In short, censorship seems to be a way for someone in a position to be a gatekeeper to impose a particular view upon information, how something “works” in the datasphere, or what “content” can flow into, through, and out of a 2021 system.
The first example of this imposition of a view point is articulated in “PayPal Shuts Down Long-Time Tor Supporter with No Recourse.” The main point is that an individual who contributed to the Tor project has been “booted” or “terminated with extreme prejudice” from the quasi-bank financial services operation PayPal. The article asserts:
For years, EFF has been documenting instances of financial censorship, in which payment intermediaries and financial institutions shutter accounts and refuse to process payments for people and organizations that haven’t been charged with any crime. Brandt shared months of PayPal transactions with the EFF legal team, and we reviewed his transactions in depth. We found no evidence of wrongdoing that would warrant shutting down his account, and we communicated our concerns to PayPal. Given that the overwhelming majority of transactions on Brandt’s account were payments for servers running Tor nodes, EFF is deeply concerned that Brandt’s account was targeted for shut down specifically as a result of his activities supporting Tor.
Does PayPal the company have strong feelings about software which obfuscates certain online activities? Tor emerged years ago from a government commercial research project. Now it is one of the vehicles allowing some users to engage in cyber crime-like activities. The write up does not dig too deeply into the who, what, when, why, how, and circumstances of “financial persecution.” That’s not surprising because PayPal is a commercial enterprise and can mostly do what it wants. The main point for me is that this type of blocking action has nothing to do with words.
I also want to mention that Amazon Twitch has been wrestling with take downs too. A popular “content creator” named Amouranth was blocked. Also, a 21st century talk show host known as BadBunny was banned. Amouranth’s Twitch stream featured a kiddie pool, an interesting fashion statement in the form of a bathing suit, and lots of eye shadow. BadBunny’s “issue” was related to words. I am not sure what BadBunny is talking about, but apparently the Twitch “proctors” do. So she had to occupy herself with other content creation for two weeks until she was reinstated. At the same time, a content creator named ibabyrainbow (whom I featured in my April National Cyber Crime Conference talk) provides links to Twitch followers who want more intriguing videos of ibabyrainbow’s antics. Thus far, ibabyrainbow has not run afoul of Amazon’s “curators,” but Amazon may not know that ibabyrainbow provides other content on different services under the name of babyrainbow. Some of this content could be considered improper in certain countries.
Then I want to reference a remarkable essay about censorship called “How Censorship Became the New Crisis for Social Networks.” This write up states:
There are two strains of outrage related to censorship currently coursing through the platforms. The first are concerns related to governments enacting increasingly draconian measures to prevent their citizens from expressing dissent…. The second and perhaps more novel strain of outrage over censorship relates not to governments but to platforms themselves.
That’s tidy: A dichotomy, an either or, good evil, savage and civilized. Not exactly. I think the reality is messy and generating new complexities as each mouse click or finger swipe occurs.
People generally dislike change. If change is inevitable, some people prefer to experience the change at their own pace. Today the ease with which a threshold can be changed in an algorithm is disconcerting. Questions like “What happened to my Google photos?” or “Why can’t I access my iTunes account?” are part of everyday life. Where’s BadBunny, Mr. Twitch?
My view is that censorship, along with the synonyms used to polish up these actions designed to control information, has been standard operating procedure for many, many years. Book burning, anyone? The motivation is to ensure that power is retained, money flows, and particular views are promulgated.
The datasphere is magnifying the ease, effectiveness, and intention of managing words, images, and actions. I prefer to think of censorship as “proaction”; that is, taking the necessary steps to allow those with their hands on the knobs and wheels to further their own ends.
Instead of “terminated with extreme prejudice” employ “proactive measures.” Who is doing it? Maybe China, Iran, North Korea, Russia, and a number of other nation states? What commercial enterprises are practicing proaction? Maybe the FAANGs, the Bezos property Washington Post, the hip digital thing known as the New York Times, and anyone who can direct digital streams to benefit themselves.
Censorship — what I call proaction — is the new normal.
Adapt and avoid dichotomies. That type of thinking is for third graders.
Stephen E Arnold, June 3, 2021
Search: Still Struggling with Synonyms
June 3, 2021
I read “How AI Can Help Resolve Complex Fashion Taxonomies.” The write up states:
ecommerce retailers are struggling to find a system for managing the growing fashion taxonomy. For reference, fashion taxonomy is defined as the science of naming, describing, and classifying items into categories. And it affects every component of the customer experience, from search and discovery to product recommendations.
I agree. The problem has been a persistent one for decades. Statistical methods, manual methods, smart software methods: none works particularly well. Statistics drift as the language changes. What’s a slang word for sneakers; is it “kicks”? The idea is that an ecommerce site might not recognize this term unless a human entered it in a list of synonyms. Smart software might miss the nuances of pickle ball shoes that are wavy or nifty ice for a B.
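For the curious, the manual approach, a human entering slang into a list of synonyms, might look like the sketch below. The table is hypothetical: “kicks” comes from the example above, “trainers” is my own addition, and the function names are invented. No ecommerce vendor's actual method is implied.

```python
# Hypothetical hand-maintained synonym list: canonical term -> known variants.
SYNONYMS = {
    "sneakers": {"kicks", "trainers"},
}


def build_lookup(synonyms: dict[str, set[str]]) -> dict[str, str]:
    """Invert the synonym list so every variant points at its canonical term."""
    lookup: dict[str, str] = {}
    for canonical, variants in synonyms.items():
        lookup[canonical] = canonical
        for variant in variants:
            lookup[variant] = canonical
    return lookup


def normalize_query(query: str, lookup: dict[str, str]) -> list[str]:
    """Lowercase the query, split on whitespace, and map known slang to canonical terms."""
    return [lookup.get(token, token) for token in query.lower().split()]


LOOKUP = build_lookup(SYNONYMS)
tokens = normalize_query("Red Kicks", LOOKUP)  # "kicks" is rewritten to "sneakers"
```

The weakness is plain in the code: any term missing from the table passes through untouched, so the list is only as current as the last human who edited it.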
If a person cannot locate a product, will that person enter synonyms or just click away to another site? That’s bad.
The article asserts:
it’s also becoming increasingly difficult for customers to find what they’re looking for, regardless of search intent.
The phrase “increasingly difficult” does not quite capture what’s happening in online information access. Locating online information which is timely, accurate, and relevant is extraordinarily difficult.
The write up, however, has a possible fix:
Tackling complex fashion taxonomy is a heavy task, but with artificial intelligence, retailers now have different approaches to try. Through text-based and visual search tools, retailers have the power to change the way customers experience their products, leading to higher engagement rates and more conversions. The future of artificial intelligence as a remedy to complex fashion taxonomies is bright – and you can expect to see more products in the market in the future.
But the purpose of the write up is to explain that YesPlz is the way to deliver a “user initiated search experience, combined with artificial intelligence and visual search.”
Possibly, but I think the solutions which have rolled down the cash flow pipelines have not delivered. Language is a moving target and a shopper wants the system to “know” what he or she wants without having to speak, type, or do anything.
The big dog in ecommerce is Amazon. Bing and Google are working overtime to make their “shopping” search functions work better than the Bezos bulldozer’s. The problem is that, despite the tricks, the cohorts, the user fingerprints, and the rest of the methods, divining what a shopper wants and will buy is clumsy.
Marketing talk is a heck of a lot easier than solving what is becoming a problem too big to resolve. I don’t want a fashion item. I want a belt which does not look stupid. Woo woo.
Stephen E Arnold, June 4, 2021
Don Quixote Lives: Another Assault on Data Silos
June 3, 2021
Keep in mind that in some organizations data silos are necessary: Poaching colleagues (hello, big pharma), government security requirements (yep, the top Beltway bandits too), and common sense (lawyers heading to trial with a judge who has a certain reputation). Data silos are, like, everywhere. There were a couple of firms which billed themselves as “silo breakers.” How is that working out? The answer to the question resides in an analyst’s “data silo.” There you go.
Security is the biggest reason much-maligned data silos, also known as fragmented data, persist. Google now hopes to change that, we learn from “Google Cloud Launches New Services for a Unified Data Platform” at IT Brief. The company asserts its new solutions mean organizations can now forget about data silos and securely analyze their data in the cloud. We have yet to see detailed evidence for that claim, however. We will continue to keep our sensitive data separated, thank you very much.
Writer Ryan Morris-Reade describes the three new services upon which Google is pinning its cloudy unification hopes:
- “Datastream, a new serverless Change Data Capture and replication service. Datastream enables customers to replicate data streams in real-time, from Oracle and MySQL databases to Google Cloud services such as BigQuery, Cloud SQL, Google Cloud Storage, and Cloud Spanner. This solution allows businesses to power real-time analytics, database replication, and event-driven architectures.
- Analytics Hub, a new capability that allows companies to create, curate, and manage analytics exchanges securely and in real-time. With Analytics Hub, customers can share data and insights, including dynamic dashboards and machine learning models securely inside and outside their organization.
- Dataplex, an intelligent data fabric that provides an integrated analytics experience, bringing the best of Google Cloud and open-source together, to enable users to rapidly curate, secure, integrate, and analyze their data at scale. Automated data quality allows data scientists and analysts to address data consistency across the tools of their choice, to unify and manage data without data movement or duplication. With built-in data intelligence using Google’s best-in-class AI and Machine Learning capabilities, organizations spend less time with infrastructure complexities and more time using data to deliver business outcomes.”
We learn consulting firm Deloitte is helping Google implement these solutions. That company’s global chief commercial officer emphasizes the tools provide “enhanced data experiences” for companies with siloed data by simplifying implementation and management. We are also told that Equifax and Deutsche Bank trust Google Cloud with their data. I guess that is supposed to mean we should, too.
But Google is quite the fan of data silos. Remember “universal search.” Google has separate indexes for news, scholarly information, and other content types. Universal implies breaking down “data silos.” But it is easier to talk about solving the data silo problem than delivering.
And what about Deloitte? This firm was fined about $20 million US because it had data silos which partitioned some partners from the work of the professionals working for Autonomy.
Yep, data silos. Persistent and embarrassing when someone thinks of “universal search” and Deloitte’s internal oversight methods.
Cynthia Murrell, June 03, 2021
Alation Releases New Version of its Enterprise Search Platform
June 3, 2021
Alation announces the latest release of its platform in its post, “How 2021.2 Is Remaking the Future of Enterprise Search.” This version comes with some handy features, like its table view, metadata search, and lexicon pairing. The post contains helpful screenshots. It is the tool’s boosted search ranking system, though, that writer Linh Nguyen puts at the top of the list. The platform’s AI now considers user input in establishing each resource’s worth. She tells us:
“Search results ranking and relevance now takes clues from social indicators. Alation catalog users have always been able to endorse or deprecate a given asset or dataset, signaling to their peers, ‘this asset is trustworthy’ or, by contrast, ‘warning! Use at your own risk! This asset is deprecated.’ With this update, we’re leveraging that tribal knowledge to influence all search rankings, illuminating the best assets that people trust. Specifically, user-created endorsements will boost ranking scores while deprecations will penalize rankings scores. Admins also have the option to customize the score associated with these trust flags (endorsements & deprecations). This empowers admins to effectively ‘endorse the endorsements’, further influencing rankings to promote the best assets to the right people.”
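A rough sketch of how endorsements and deprecations could nudge a ranking score appears below. Alation does not publish its formula in the post, so the additive adjustment, the default weights, and every name here are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    relevance: float        # hypothetical base text-match score
    endorsements: int = 0   # user-created trust flags
    deprecations: int = 0   # user-created warning flags


def trust_adjusted_score(asset: Asset,
                         endorse_weight: float = 0.1,
                         deprecate_weight: float = 0.2) -> float:
    """Boost the base score per endorsement and penalize it per deprecation.

    The two weights stand in for the admin-customizable scores mentioned
    in the quote; the additive form is an assumption.
    """
    return (asset.relevance
            + endorse_weight * asset.endorsements
            - deprecate_weight * asset.deprecations)


def rank(assets: list[Asset], **weights: float) -> list[Asset]:
    """Return the assets ordered from highest to lowest adjusted score."""
    return sorted(assets,
                  key=lambda a: trust_adjusted_score(a, **weights),
                  reverse=True)


assets = [
    Asset("old_sales", relevance=0.9, deprecations=2),
    Asset("sales_v2", relevance=0.8, endorsements=3),
]
ranked = rank(assets)  # the endorsed asset overtakes the deprecated one
```

Setting both weights to zero restores the plain relevance order, which shows how much of the ranking now rides on who clicked the endorse button.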
This sounds helpful, but we wonder whether it means content that is difficult to index will become even more difficult to find. What about audio, video, and comments in Slack, Teams, or Zoom; chemical structure and engineering diagrams; legal information within secured repositories; the PowerPoint data on sales professionals’ laptops? Improved UI and other nice-to-haves are well and good, but in our view comprehensive enterprise search remains elusive. Even with the power of AI.
Cynthia Murrell, June 3, 2021
Need to Tame the Information Tsunamis in Databases? DbSurfer May Be Your Deviled Egg
June 2, 2021
An interesting article “DbSurfer: A Search and Navigation Tool for Relational Databases” describes a novel way to locate information in Codd databases. Nope, I won’t make a reference to codfish. The surfing metaphor is good enough today.
The write up states:
We present a new application for keyword search within relational databases, which uses a novel algorithm to solve the join discovery problem by finding Memex-like trails through the graph of foreign key dependencies. It differs from previous efforts in the algorithms used, in the presentation mechanism and in the use of primary-key only database queries at query-time to maintain a fast response for users.
The Memex reference is not to the mostly forgotten Australian search and retrieval system. The Memex in this paper is a nod to everyone’s information hero Vannevar Bush’s fanciful “memex device.” (No, Google is not a memex device.)
The method involves “joins” and “trails.” The result is a system that allows keyword search and navigation through relational databases.
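To make the idea of a trail through the graph of foreign key dependencies concrete, here is a minimal sketch. It uses a plain breadth-first search, not the paper's algorithm, and the schema, table names, and function names are hypothetical.

```python
from collections import deque

# Hypothetical schema: each table maps to the tables its foreign keys reference.
FOREIGN_KEYS = {
    "orders": ["customers", "products"],
    "customers": ["addresses"],
    "products": ["suppliers"],
}


def undirected(graph: dict[str, list[str]]) -> dict[str, set[str]]:
    """Treat each foreign key as a two-way link, since a join works in either direction."""
    adjacency: dict[str, set[str]] = {}
    for table, referenced in graph.items():
        adjacency.setdefault(table, set())
        for other in referenced:
            adjacency[table].add(other)
            adjacency.setdefault(other, set()).add(table)
    return adjacency


def join_trail(graph: dict[str, list[str]], start: str, goal: str):
    """Return the shortest chain of tables linking start to goal, or None if there is none."""
    adjacency = undirected(graph)
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted so the result is deterministic when several neighbors are unvisited.
        for neighbor in sorted(adjacency[path[-1]] - seen):
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None


trail = join_trail(FOREIGN_KEYS, "addresses", "suppliers")
```

Each consecutive pair of tables in the returned trail corresponds to one join, so a keyword hit in one table can be connected to a hit in another without the user writing the SQL.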
The paper includes a useful list of references. (Some recent computer science graduates who are billing themselves as search experts might find reading a few of the citations helpful. Just a friendly suggestion to the AI, NLP, and semantic whiz types.)
Is this a product? Nope, not yet. Interesting idea, however.
Stephen E Arnold, June 2, 2021
Google: The High School Science Club Management Method Cracks Location Privacy
June 2, 2021
How does one keep one’s location private? Good question. “Apple Is Eating Our Lunch: Google Employees Admit in Lawsuit That the Company Made It Nearly Impossible for Users to Keep Their Location Private” explains:
Google continued collecting location data even when users turned off various location-sharing settings, made popular privacy settings harder to find, and even pressured LG and other phone makers into hiding settings precisely because users liked them, according to the documents.
The fix? Enter random locations in order to baffle the high school science club whiz kids. The write up explains:
The unsealed versions of the documents paint an even more detailed picture of how Google obscured its data collection techniques, confusing not just its users but also its own employees. Google uses a variety of avenues to collect user location data, according to the documents, including WiFi and even third-party apps not affiliated with Google, forcing users to share their data in order to use those apps or, in some cases, even connect their phones to WiFi.
Interesting. The question is, “Why?”
My hunch is that geolocation is a darned useful item of data. Do a bit of sleuthing and check out the importance of geolocation and cross correlation on policeware and intelware solutions. Marketing finds the information useful as well. Does Google have a master plan? Sure, make money. The high school science club wants to keep the data flowing for three reasons:
First, ever increasing revenues are important. Without cash flow, Google’s tough-to-control costs could bring down the company. Geolocation data are valuable and provide a knitting needle to weave other items of information into a detailed just-for-you quilt.
Second, Amazon, Apple, and Facebook pose significant threats to the Google. Amazon is, well, doing its Bezos bulldozer thing. Apple is pushing its quasi privacy campaign to give “users” control. And Facebook is unpredictable and trying to out Google Google in advertising and user engagement. These outfits may be monopolies, but monopolies have to compete so high value data become the weaponized drones of these business wars.
Third, Google’s current high school management approach is mostly unaware of how the company gathers data. The systems and methods were institutionalized years ago. What persists are the modules of code which just sort of mostly do their thing. Newbies use the components and the data collection just functions. Why fix it if it isn’t broken? That assumes that someone knows how to fiddle with legacy Google.
Net net: Confusion. What high school science club admits to not having the answers? I can’t name one, including my high school science club in 1958. Some informed methods are wonderful and lesser beings should not meddle. I read the article and think, “If you don’t get it, get out.”
Stephen E Arnold, June 1, 2021
Reconciling Two Views of Cloud Computing
June 2, 2021
I think it would be helpful to read “The Cost of Cloud, a Trillion Dollar Paradox.” The write up is an MBA team effort, and it makes what I think is an interesting point. The cloud makes sense when a company is small and doing the “go fast, break things” stuff. But as the company becomes larger, the cloud becomes expensive and slaps handcuffs on the customer. The MBAs may not agree with my précis, but it works okay for me.
Then read “Atlassian Claims It’s a Step Closer to Achieving Nirvana with Its Data.” The main point of the essay is that centralizing cloud work is better, faster, and all around more wonderful than the multi-cloud thing. I winced at the use of the word “nirvana.” Amazon AWS and nirvana don’t fit together like peanut butter and chocolate or pinga and salt. Nirvana, I think, means, according to Google’s recycling of the Oxford “languages”:
transcendent state in which there is neither suffering, desire, nor sense of self, and the subject is released from the effects of karma and the cycle of death and rebirth.
That’s AWS for sure.
Both articles are marketing material. The a16z piece makes it clear that the firm’s analysts are on the ball. I think the message is, “We’re on the ball. We put money where it will really pay off.” The Ziff story is a marketing tchotchke, and it is designed to send a specific message about the freedom from suffering, desire, etc. associated with the use of AWS services.
What’s the nitty gritty?
Marketing, not analysis or personal experience, has become the payload of what appears to be technically relevant information. This is a good thing, right? Perfect for home economics and political science majors who wrangle jobs in or around technology.
Lock in and cost control are not difficult concepts in my opinion. Pick one. Nirvana is near.
Stephen E Arnold, June 2, 2021