Google, TikTok, and Seriousness

July 15, 2020

Short form video is in the news. TikTok captivates millions of eyeballs. Many of these eyeballs belong to Americans. Most of these Americans choose not to understand several nuances of “free” 30 second videos created, transmitted, viewed, and forwarded via a mobile device; to wit:

  1. Software for mobile phones can covertly or overtly suck up data and send those data to a control node.
  2. Those data can be cross correlated in order to yield useful insights about the activities, preferences, and information flowing into and out of a mobile device equipped with an application. Maybe TikTok does this too?
  3. Those digital data can be made available to third parties; for example, advertising analytics vendors and possibly, just maybe, a country’s intelligence services.

The Information published one of those “we can’t tell you where we got these data but by golly this stuff is rock solid” stories. This one is called “TikTok Agreed to Buy More Than $800 Million in Cloud Services From Google.” Let’s assume that this story about the Google TikTok deal is indeed accurate. We learn:

Last week, though, word surfaced of a buzzy new customer for Google Cloud—TikTok, the app for sharing short videos that is the year’s runaway social media hit. The deal is a lucrative one for Google Cloud, The Information has learned. In a three-year agreement signed in May 2019, TikTok committed to buying more than $800 million of cloud services from Google over that period…

What’s with the Google? Great or lousy business judgment? Does Google’s approach to a juicy deal include substantial discounts in order to get cash in the door? Is the deal another attempt by the Google to get at least some of the China market which it masterfully mishandled by advising the Chinese government to change its ways?

Nope. The new Google wants to grow by locking down multi year contracts. The belief is that these “big deals” will give the Google Cloud the protein shake muscles needed to deal with the Microsofties and the Bezos bulldozer.

New management, new thinking at the GOOG, and there will be more of the newness revealed with each tweak of a two decades old “system.”

At the same time as the Information “real” news story arrived in the DarkCyber news center, a pundit’s MBA type write up popped into our “real news” folder. This write up is “The TikTok War.”

Unlike the Information’s story, the Stratechery essay is MBA consultant speak, which is different from “real news.” The point of the 3,900 word consultant report is:

I believe it is time to take China seriously and literally…

There you go: An MBA consulting revelation. One should take China seriously and literally.

Okay. Insight. Timely. Incisive.

From this conclusion, TikTok’s service is no longer appropriate in the US. Banning is probably a super duper idea if I understand the TikTok War. (How does one fight a war by banning digital information? Oh, well, irrelevant question. What’s that truism about ostriches putting their heads in the sand? Also irrelevant.)

Let’s step back and put these two different TikTok articles in a larger context.

The Information wants everyone to know that a mysterious “source” has said that Google has a three year deal with TikTok. This is a surprise? Nope. Google is on the hunt for cash because after Google’s own missteps, it is faced with hard to control costs and some real live “just like Google” competitors; namely, Amazon, Apple, Facebook, and Netflix. There are also the mounting challenges of political and social annoyances to add some spice to the Googlers’ day.

The MBA consultant analysis points out that China has to be taken seriously. Prior to TikTok, China was not taken seriously? I suppose TikTok is the catalyst for seriousness. More likely, the TikTok thing evokes MBA consultant outputs to confirm what many people sort of intuit but have not been able to sum up with a “now is the time” utterance.

In my lecture yesterday for the National Cyber Crime Conference, I presented a diagram of how Chinese telecommunications and software systems can exfiltrate information with or without TikTok.

Banning an app is another one of those “Wow, the barn burned and Alibaba built a giant data center where the Milking Shorthorns once stood” moments.

Sourceless revelations about Google’s willingness to offer a deal to a China centric TikTok and MBA consultant revelations that one should take China seriously warrant one response: The ship sailed, returned, built a giant digital port, and has refueled for a return journey. Ban away.

Stephen E Arnold, July 15, 2020

Close Enough for Horse Shoes? Why Drifting Off Course Has Become a Standard Operating Procedure

July 14, 2020

One of the DarkCyber research team sent me a link to a post on Hacker News: “How Can I Quickly Trim My AWS Bill?” In the write up were some suggestions from a range of people, mostly anonymous. One suggestion caught my researcher’s attention and I too found it suggestive.

Here’s the statement the DarkCyber team member flagged for me:

If instead this is all about training / the volume of your input data: sample it, change your batch sizes, just don’t re-train, whatever you’ve gotta do.

Some context. Certain cloud functions are more “expensive” than others. Tips range from dumping GPUs for CPUs to “Buy some hardware and host it at home/office/etc.”

I kept coming back to the suggestion “don’t retrain.”

One of the magical things about certain smart software is that the little code devils learn from what goes through the system. The training gets the little devils or daemons to come out of bed and into the smart software gym.

However, in many smart processes, the content objects processed include signals not in the original training set. Off the shelf training sets are vulnerable just like those cooked up by three people working from home with zero interest in validating the “training data” from the “real world data.”

What happens?

The indexing or metadata assignments “drift.” This means that the smart software devils index a content object in a way that is different from what that content object should be tagged.

Examples range from this person matches that person to we indexed the food truck as a vehicle used in a robbery. Other examples are even more colorful or tragic depending on what smart software output one examines. Detroit facial recognition ring a bell?

Who cares?

I care. The person directly affected by shoddy thinking about training and retraining smart software, however, does not.

That’s what is troubling about this suggestion. Care and thought are mandatory for initial model training. Then as the model operates, informed humans have to monitor the smart software devils and retrain the system when the indexing goes off track.
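One way to picture the monitoring step is a small check that compares the mix of tags the system assigns today with the mix it produced right after training. What follows is a minimal sketch, not any vendor's method: the function names, the sample tags, and the 0.2 threshold are all hypothetical, and total variation distance is just one simple measure among several.

```python
from collections import Counter


def tag_shares(tags):
    """Return each tag's share of the total as a dict of fractions."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def drift_score(baseline_tags, recent_tags):
    """Total variation distance between two tag mixes (0 = same, 1 = disjoint)."""
    base = tag_shares(baseline_tags)
    recent = tag_shares(recent_tags)
    all_tags = set(base) | set(recent)
    return 0.5 * sum(abs(base.get(t, 0.0) - recent.get(t, 0.0)) for t in all_tags)


def needs_review(baseline_tags, recent_tags, threshold=0.2):
    """Flag the model for human review when drift exceeds a hypothetical threshold."""
    return drift_score(baseline_tags, recent_tags) > threshold


# Hypothetical tags assigned just after training and again months later.
baseline = ["vehicle"] * 50 + ["food truck"] * 50
recent = ["vehicle"] * 90 + ["food truck"] * 10
print(drift_score(baseline, recent))
print(needs_review(baseline, recent))
```

A check like this only says that the tag mix has shifted. It does not say whether any individual tag is right, so an informed human still has to inspect the outputs and decide whether retraining is warranted.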

The big or maybe I should type BIG problem today is that very few individuals want to do this even if an enlightened superior says, “Do the retraining right.”

Ho ho ho.

The enlightened boss is not going to do much checking and the outputs of a smart system just keep getting farther off track.

In some contexts like Google advertising, getting rid of inventory is more important than digging into the characteristics of Oingo (later Applied Semantics) methods. Getting rid of the inventory is job one.

For other model developers, shapers, and tweakers, the suggestion to skip retraining is “good enough.”

That’s the problem.

Good enough has become the way to refactor excellence into substandard work processes.

Stephen E Arnold, July 14, 2020

IHS Markit Data Lake “Catalog”

July 14, 2020

One of the DarkCyber research team spotted this product announcement from IHS, a diversified information company: “IHS Markit’s New Data Lake Delivers Over 1,000 Datasets in an Integrated Catalogued Platform.” The article states:

The cloud-based platform stores, catalogues, and governs access to structured and unstructured data. Data Lake solutions include access to over 1,000 proprietary data assets, which will be expanded over time, as well as a technology platform allowing clients to manage their own data. The IHS Markit Data Lake Catalogue offers robust search and exploration capabilities, accessed via a standardized taxonomy, across datasets from the financial services, transportation and energy sectors.

The idea is consistently organized information. Queries can run across the content to which the customer has access.

Similar services are available from other companies; for example, Oracle BlueKai.

One question which comes up is, “What exactly are the data on offer?” Another is, “How much does it cost to use the service?”

Let’s tackle the first question: Scope.

None of the aggregators make it easy to scan a list of datasets, click on an item, and get a useful synopsis of the content, content elements, number of items in the dataset, update frequency (annual, monthly, weekly, near real time), and the cost method applicable to a particular “standard” query.
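The synopsis described above can be pictured as a simple record with one field per question a customer would ask. This is only a sketch under stated assumptions: the class name, the field names, and the sample values are hypothetical and are not drawn from the IHS Markit catalog or any other vendor's product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetEntry:
    """One hypothetical catalog record; every field name here is an assumption."""
    name: str
    synopsis: str
    content_elements: List[str] = field(default_factory=list)
    item_count: Optional[int] = None        # None when the vendor does not disclose it
    update_frequency: Optional[str] = None  # e.g. "annual", "monthly", "weekly"
    cost_method: Optional[str] = None       # e.g. "per query", "flat subscription"


def undisclosed_fields(entry):
    """List the scope and cost fields left blank for this dataset."""
    missing = []
    if not entry.content_elements:
        missing.append("content_elements")
    for name in ("item_count", "update_frequency", "cost_method"):
        if getattr(entry, name) is None:
            missing.append(name)
    return missing


# Placeholder values for illustration only; not taken from any real catalog.
example = DatasetEntry(
    name="Example vehicle history dataset",
    synopsis="Placeholder synopsis",
    update_frequency="monthly",
)
print(undisclosed_fields(example))
```

Running a helper like `undisclosed_fields` across a whole catalog would show a prospective customer how many scope and cost questions remain unanswered before any contract is signed.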

A search of Bing and Google reveals the name of particular sets of data; for example, Carfax. However, getting answers to the scope question can require direct interaction with the company. Some aggregators operate in a similar manner.

The second question: Cost?

The answer to the cost question is a tricky one. The data aggregators have adopted a set or a cluster of pricing scenarios. It is up to the customer to look at the disclosed data and do some figuring. In DarkCyber’s experience, the data aggregators know much more about what content process, functions or operations generate the maximum profit for the vendor. The customer does not have this insight. Only through use of the system, analyzing the invoices, and paying them is it possible to get a grip on costs.

DarkCyber’s view is that data marketplaces are vulnerable to disruption. With a growing demand for a wide range of information, some potential customers want answers before signing a contract and outputting big bucks.

Aggregators are a participant in what DarkCyber calls “professional publishing.” The key to this sector is mystery and a reluctance to spell out exact answers to important questions.

What company is poised to disrupt the data aggregation business? Is it the small scale specialist like the firms pursued relentlessly by “real” journalists seeking a story about violations of privacy? Is it a giant company casting about for a new source of revenue, one that is easily overlooked? Aggregation is not exactly exciting for many people.

DarkCyber does not know. One thing seems highly likely: The professional publishing data aggregation sector is likely to face competitive pressure in the months ahead.

Some customers may be fed up with the secrecy and lack of clarity and entrepreneurs will spot the opportunity and move forward. Rich innovators will just buy the vendors and move in new directions.

Stephen E Arnold, July 14, 2020

Google and the Middle Kingdom

July 10, 2020

Remember when Google nosed into China and suggested that the country change how it approached life, business, and online? Few do. Suffice it to say that Google’s Silicon Valley inputs did produce one reaction: A small dish of day old sweet red bean dumplings. Yummy.

Flash forward to the present. “Google Shuts Down Cloud Project, Says No Plan to Offer Cloud Services in China” reports that Google

has shut down its cloud project named ‘Isolated Region’ and added that it was not weighing options to offer its cloud platform in China.

The article states:

The search engine giant, however, said that the project’s shutdown was not due to either of those two reasons and that it has not offered cloud platform services in China.

Perhaps Google became impatient waiting for China to modify its methods?

Stephen E Arnold, July 10, 2020

Do It Huiwei, Please

July 9, 2020

Believe it or not.

Huawei is a mobile device brand not well known in the United States, but it provides an Android based device to millions of consumers in the eastern hemisphere. Huawei devices are manufactured in China and in May the company held its seventeenth annual analyst summit. Ameyaw Debrah shares the story in the article, “Huawei Analyst Summit: Security And Privacy In A Seamless AI Life-Only You Control Your Personal Data.”

Eric Tan, Vice President of Consumer Cloud Services, delivered the keynote speech, “Rethink the Seamless AI Experience with the Global HMS Ecosystem,” which covered Huawei’s approach to privacy and security across the cloud, hardware, application development, and global certifications. Tan stated that Huawei abides by GDPR, GAPP, and local laws to guarantee privacy compliance.

Another speaker, Dr. Wang Chenglu, spoke about “Software-Powered, Seamless AI Experiences and Ecosystems.” He explained how distributed security builds trust between people, data, and devices to protect user privacy and data:

“He explained that firstly, ensure that users are using the correct devices to process data and Huawei has developed a comprehensive security and privacy management system that covers smart phone chips, kernels, EMUI, and applications. This allows devices to establish trusted connections and transfer data based on end-to-end encryption.

Secondly, ensure the right people are accessing data and operating services via the distributed security architecture which makes coordinated, multi-device authentication possible. An authentication capability resource pool is established by combining the hardware capabilities of different devices. The system provides the best security authentication measures based on authentication requests and security level requirements in different business scenarios.”

Huawei stressed that privacy and security are its MO, but can one believe that “only you control your private life” when a country-supported company is coding up a storm?

Whitney Grace, July 9, 2020

The Cost of Training Smart Software: Is It Rising or Falling?

July 6, 2020

I read “The Cost of AI Training is Improving at 50x the Speed of Moore’s Law: Why It’s Still Early Days for AI.” The article’s main point is that “training” — that is, the cost of making machine learning smart — is declining.

That seems to make sense. First, there are cloud services. Some of these are cheaper than others, but, in general, relying on cloud compute eliminates the capital costs and the “ramp up” costs for creating one’s own infrastructure to train machine learning systems.

Second, use of a machine learning “utility” like Amazon AWS Sagemaker or the similar services available from IBM and Google provides two economic benefits:

  1. Tools are available to reduce engineering lift off and launch time
  2. Components like Sagemaker’s off-the-shelf data bundles eliminate the often-tedious process of finding additional data to use for training.

Third, assumptions about smart software’s efficacy appear to support generalizations about the training, use, and deployment of smart software.

I want to note that there are some research groups who believe that software can learn by itself. If my memory is working this morning, I think the jazzy way to state this is “sui generis.” Turn the system on, let it operate, and it learns by processing. For smart software, the crude parallel is learning the way humans learn: What’s in the environment becomes the raw material for learning.

The article correctly points out that the number of training models has increased. That is indeed accurate. A model is a numerical recipe set up to produce an output that meets the modeler’s goal. Thus, training a model involves providing data to the numerical recipe, observing the outputs, and then making adjustments. These “tweaks” can be simple and easy; for example, changing a threshold governing a decision. More complex fixes include, but are not limited to, selecting a different sequence for the individual processes, concatenating models so that multiple outputs inform a decision, and substituting one mathematical component for another. To get a sense of the range of components available to a modeler, take a quick look at Algorithms. This collection is what I would call “ready to run.”
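The simplest tweak mentioned above, changing a threshold governing a decision, can be shown in a few lines. This is a toy sketch: the `classify` function, the scores, and both threshold values are hypothetical and stand in for whatever a real numerical recipe would produce.

```python
def classify(scores, threshold):
    """Label each score 'match' when it meets the threshold, else 'no match'."""
    return ["match" if s >= threshold else "no match" for s in scores]


# Hypothetical similarity scores produced by some upstream model.
scores = [0.35, 0.55, 0.62, 0.71, 0.90]

loose = classify(scores, threshold=0.5)
strict = classify(scores, threshold=0.7)

print(loose.count("match"))   # number of matches under the looser threshold
print(strict.count("match"))  # number of matches under the stricter threshold
```

The same five scores yield a different number of matches depending on one number the modeler picks, which is why even an “easy” tweak calls for someone to observe the outputs afterward.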

The article includes a number of charts. Each of these presents data supporting the argument that it is getting less costly to train smart software.

I am not certain I agree, although the charts seem to support the argument.

I want to point out that there are some additional costs to consider. A few of these can be “deal breakers” for financial and technical reasons.

Here’s my list of smart software costs. As far as I know, none of these has been the subject of an analyst’s examination and some may be unquantified because those in the business of smart software are not set up to capture them:

  1. Retraining. Anyone with experience with models knows that retraining is required. There are numerous reasons, but retraining is often more expensive than the first set of training activities.
  2. Gathering current or more on point training data. The assumption about training data is that it is useful. We live in the era of so called big data. Unfortunately, gathering on point data relevant to the retraining task is time consuming and can be a complicated task involving subject matter experts.
  3. Data normalization. There is a perception that if data are digital, those data can be provided “as is” to a content processing system. That is not entirely accurate. The normalization processes can easily consume as much as 60 percent of available subject matter expert and data analysts’ time.
  4. Data validation. The era of big data makes possible this generalization, “The volume of data will smooth out any anomalies.” Maybe, but in my experience, the “anomalies” — if not addressed — can easily skew one of the ingredients in the numerical recipe so that the outputs are not reliable. The output may “look” like it is accurate. In real life, the output is not what’s desired. I would refer the reader to the stories about Detroit’s facial recognition system which is incorrect 96 percent of the time. For reference, see this Ars Technica article.
  5. Downstream costs. Let’s use the Detroit police facial recognition system to illustrate this cost. Answer this question, please, “What are the fully loaded costs for the consequences of the misidentification of a US citizen?”
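To see how the items in the list above change the arithmetic, here is a back-of-the-envelope sketch. Every figure is a made-up placeholder in arbitrary cost units, and the function and parameter names are hypothetical; the point is only that the initial training cost is one term among several.

```python
def total_training_cost(initial, retrain_per_cycle, retrain_cycles,
                        data_gathering, normalization, validation, downstream):
    """Add the initial training cost to the follow-on cost categories."""
    return (initial
            + retrain_per_cycle * retrain_cycles
            + data_gathering
            + normalization
            + validation
            + downstream)


# All values are illustrative placeholders, not measurements.
cost = total_training_cost(
    initial=100,
    retrain_per_cycle=120,
    retrain_cycles=3,
    data_gathering=50,
    normalization=60,
    validation=40,
    downstream=200,
)
print(cost)        # total across all categories
print(100 / cost)  # share of the total represented by initial training alone
```

With these placeholder numbers the initial training is a small slice of the total. A chart that tracks only that slice can fall steeply while the full bill does not.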

In my view, taking a narrow look at the costs of training smart software is not in the interests of the analyst who benefits from handling investors’ money. Nor are the companies involved in smart software eager to monitor the direct and indirect costs associated with training the models. Finally, it is in no one’s interest to consider the downstream costs of a system which may generate inaccurate outputs.

Net net: In today’s economic environment, ignoring the broader cost picture is a distortion of what it takes to train and retrain smart software.

Stephen E Arnold, July 6, 2020

When You Were a Young Millionaire, Did You Write This Way?

June 29, 2020

I read “Mixer Co-Founder on Microsoft Pulling the Plug, Twitch’s Market Power, and His Startup Journey.” DarkCyber looks at the universes of live streaming services from our observation post in rural Kentucky.

Games are not an all-encompassing world. Consider the travails of Dr. Disrespect, the odd-ball world of ManyVids, or the individuals who haunt NoAgendaStream.com.

These services create an opportunity for bad actors, malefactors, and Dr. Jekylls to sell contraband, engage in questionable transactions, and pass messages mostly off the radar of the local county sheriff in Tennessee.

What caught our attention in the GeekWire article was this passage:

“Ultimately, the success of Partners and streamers on Mixer is dependent on our ability to scale the service for them as quickly and broadly as possible. It became clear that the time needed to grow our own livestreaming community to scale was out of measure with the vision and experiences we want to deliver to gamers now, so we’ve decided to close the operations side of Mixer and help the community transition to a new platform.”

The young millionaire and digital nabob may want to consider a job in public relations if he is snubbed by an interesting government agency.

Notable phrases:

  • Ultimately
  • success is dependent
  • vision and experiences
  • we’ve decided
  • operations side
  • help the community
  • transition
  • a new platform.

Yeah, typical 20 something blog speak.

The conclusions we have reached in the DarkCyber intelligence and forecasting center are:

First, Azure couldn’t deliver. If the Softie’s cloud thing can do JEDI, should Azure deliver streaming games? Sure, but it does not.

Second, Microsoft has been friends sort of with Facebook. Does Facebook have a more resilient, agile, responsive, and efficient video service? Facebook may aspire to be social YouTube, but it has a bit of distance to travel.

Third, Microsoft’s mix up with Mixer makes clear that the me too approach to innovation and the blenderized approach to management at Microsoft cannot tap a hot new sector any better than it can update Windows 10.

Net net: DarkCyber is thinking that on our list of soon-to-be-cold technical dinosaurs, Microsoft may find itself making big plans with Hewlett Packard, IBM, and Oracle, among others.

As for the young millionaire, after the election there may be a need for a person with wordsmithing skills, the vocabulary of a millennial lawyer, and the sentence structure of Cicero without the flair unfortunately.

Stephen E Arnold, June 29, 2020

JEDI Winner Continues to Excel in Software Updates

June 25, 2020

Will the US Department of Defense be happy with updates to a JEDI system that cause crashes? Probably slightly unhappy. “New Windows 10 Update Fail Breaks Some of Its Best Features” reports:

people have been complaining that after installing the Windows 10 May 2020 Update (also known as Windows 10 version 2004), they cannot access files synced to OneDrive – even if they can be seen in Windows 10.

The write up adds:

Even more embarrassingly for Microsoft, it seems this bug has been around for months in early versions of Windows 10 May 2020 Update, with Windows Insiders, who can try out versions of Windows 10 before other people in order to spot bugs like this, complaining that OneDrive no longer works.

Visualize this. You are in a fire zone. You need cloud data. Bad actors’ ranging rounds are getting closer.

Take a deep breath and follow this procedure:

  1. Press Windows Key + R
  2. Key this string: %localappdata%\Microsoft\OneDrive\onedrive.exe /reset
  3. Access needed data.

No problemo. Microsofties may ponder this when they grab a carry out lunch at Bai Tong’s.

Stephen E Arnold, June 25, 2020

Complaints and Protest: But the GOOG Has Been Googling for 20 Years

June 23, 2020

My goodness, we live in the Era of Complaining. The print version of the “flagship podcast” published “Google Employees Demand the Company End Police Contracts.” Let’s put this Google tie up with the US government in context.

Google was poking around the US government as early as 1999 when the chatter about indexing US government content surfaced. The company bid on the FirstGov.gov project and lost. (The US government selected the really interesting solution proposed and provided by AT&T.) Google acquired Keyhole which the CIA investment unit In-Q-Tel supported with cash. In 2005, In-Q-Tel sold its shares in Google. In 2008, Google and In-Q-Tel jointly invested in Recorded Future. Along the way, Google has performed “work” for a number of US government agencies. Despite the low profile of some of these activities, Google has been in the DC game for more than 20 years. I know because I received a snotty email about why Google should have been selected instead of the AT&T Fast Search solution.

The point is that Google employees are dazzled by their perceptual baloney. The company today is similar to the wonky outfit it was after Backrub took a break, venture money arrived, and in a moment of adulting thrashed about for a way to make money. The solution, as you and some Googlers may not care to know, was to “be influenced” by Yahoo’s Overture/GoTo online advertising concept. Google settled the Yahoo legal complaint about this “influence” prior to the firm’s IPO and may have coughed up about $1 billion to grease the skids for the IPO. Yahoo took the deal, and the Google morphed into the online ad outfit it is today.

But employees at Google, based on my limited exposure to these fine individuals, are generally unaware of the company’s interest in US government work, the fascinating way systems and methods arrive at the company, and the old fashioned idea that when you accept money for work you shut up or quit.

Not today.

The online word version of the “flagship podcast” states:

Employees are specifically calling out Google’s ongoing Cloud contract with the Clarkstown Police Department in New York, which was sued for allegedly conducting illegal surveillance on Black Lives Matter protestors in 2015. They’re also highlighting the company’s indirect support of a sheriff’s department in Arizona tracking people who cross the US-Mexico border.

Okay, Google is not the center of the universe when it comes to management sophistication. The company employs what I call “the high school science club management method.” The inability to keep information private and the hiring procedures which seem to favor those who want to decide what a publicly traded commercial enterprise does to earn money illustrate the challenges Google faces.

Mr. Brin’s showing up in senior elected officials’ offices wearing a T shirt and gym shoes with sparklies on them is trivial compared to the larger strategic recent issues at Google.

Not only are employees at Google complaining despite the money, the ping pong tables, and the benefits of working at home; the employees also want Google to extricate itself from, and no longer pursue, revenue producing activities.

Several observations:

  1. Google does and will continue to do government work despite caving to employee demands over Project Maven. Hey, good news for Anduril, right?
  2. Employees don’t know much if anything about the history of Google, the type of decisions its founders made, and efforts the company has made to obtain government work. Candidate vetting and employee training are working well at the GOOG, don’t you think?
  3. Google management cannot contain confidential information. But the larger question is, “Why is the hiring process failing to recruit individuals who do work and make time to complain about Google’s government work?” The contracts don’t just drop from the sky. Effort, sometimes years of effort, is necessary to land these projects. So quit tomorrow? Sure, good for the attorneys, not for the government customers.

Complain, complain, complain. There’s nothing like employees grousing. Why not do something other than send email? Here’s a suggestion: Quit.

What’s Google going to do about this quite embarrassing state of affairs?

Many years ago (I can’t provide details because I signed a document wittingly) a Google senior wizard told me:

Some day it will end. Until then, rock and roll.

And to what does this Gnostic phrase refer?

Google has been putting the pedal to the metal for 20 years. Now the company is operating, like a few others, without meaningful constraints, adult leadership, and much of a purpose other than making money, reducing costs, and dealing with backlashes. The push back against Google is manifesting itself in the government investigations, the talk about monopoly behavior, and the dwindling likelihood that a trip to Brussels or Strasbourg will be a holiday. It is possible that some Google attorneys will find discussing the fines and legal restraints fun, but that’s a sign of changing times.

Net net: The employee grousing reflects a lack of meaningful regulation, a failure of Google leadership, and remediating hiring processes which allow the printed version of the “flagship podcast” to explain that lots of Googlers want to tear the house down. Take direct action. Resign. I am old fashioned. Employees accept job offers. Before hooking up with a publicly traded company as an employee (look up the definition, gentle Googlers with protest on your mind) — learn about the company. That’s your obligation. After accepting a job, like it or leave. Easy. I, however, think these complainers will follow the thought processes I characterize as “Casey Newtonesque.”

Wonderful. Flagship podcast. Real news, yeah!

Stephen E Arnold, June 23, 2020

Mindbreeze: Big News from Austria

June 19, 2020

Moving enterprise search and data analysis to the cloud means security becomes an even greater concern, and one provider recently had an audit performed on its platform. Olean Times Herald reports, “Mindbreeze InSpire SaaS Receives SOC2 Type 1 Attestation.” A System and Organization Controls 2 audit assesses how well a system complies with certain standards on the handling of data. “Type 1” means the assessment reports on a snapshot of time, no longer than six months. Consulting company KPMG completed the audit report. The write-up tells us:

“In the context of the auditing process, KPMG examined whether the Trust Services Criteria (TSC) for security – issued by the American Institute of Certified Public Accountants (AICPA) – are observed. This involved inspecting and documenting the existing internal control mechanisms for the services offered, such as those relating to risk minimization, access controls, monitoring measures, and communication. The audit took the form of an ISAE 3000 Type 1 audit (testing the design and the implementation for a specific deadline) and was conducted over a period of roughly four weeks. Mindbreeze received the final test results as an ISAE 3000 SOC2 Type 1 Report.”

The report will provide information to Mindbreeze’s clients and auditors. Founder and CEO Daniel Fallmann emphasizes that tight security and adherence to operating standards are priorities for his firm. The company’s platforms rely on AI tech to produce business insights for its clients. Based in Chicago, Mindbreeze was founded in 2015.

Cynthia Murrell, June 19, 2020
