Orkut Version 6, Maybe 7?

July 15, 2020

Alphabet Inc. had a dismal failure with Google+, but the company is ready to try again with a new social media platform, says the Bandwidth Blog in: “With Its Experimental ‘Keen’ Service, Google Hopes To Tackle Pinterest.” Google’s new social media platform is called Keen, and it is impossible not to assume the company is playing me-too with this innovation. Can it even be called innovation?

Pinterest has its devoted users, who post everything from dream wedding albums to their favorite fanart. Google wants Keen to one-up Pinterest, but its description sounds innocuous:

“Launched under the auspices of Google’s in-house incubator, Area 120, Keen is a Pinterest-like social network that is designed to marry Google’s machine learning strengths, user data hoarded, and insights from its existing Google Alerts system to craft together a smorgasbord of content that evolves as its users interests take shape.”

Google has developed powerful AI that collects user data and successfully makes personalized recommendations. Google also sells user data to advertisers, so it is not a stretch for the company to use that data for another social media project.

Keen is already available on Android. Users’ Keen pinboards feature content based on recent Google searches. The content includes YouTube videos, purchase suggestions, and articles. The biggest thing is that when a user becomes interested in a topic, Keen will recommend more content for deeper dives. Users can pin their items to create their own ‘Keen’ interest, then share them with others.

The experience is similar to Pinterest, except the topics on Keen are generated by Google searches and other activity. It would be easy for Google to add Keen to its other free services, especially by making it an Apple app or a browser extension for Chrome. Unlike with Google+, Alphabet is not concentrating as much on Keen. It is hard to say if Keen will emerge beyond Android, but anything is possible.

Whitney Grace, July 15, 2020

Close Enough for Horse Shoes? Why Drifting Off Course Has Become a Standard Operating Procedure

July 14, 2020

One of the DarkCyber research team sent me a link to a post on Hacker News: “How Can I Quickly Trim My AWS Bill?” In the write up were some suggestions from a range of people, mostly anonymous. One suggestion caught my researcher’s attention and I too found it suggestive.

Here’s the statement the DarkCyber team member flagged for me:

If instead this is all about training / the volume of your input data: sample it, change your batch sizes, just don’t re-train, whatever you’ve gotta do.

Some context. Certain cloud functions are more “expensive” than others. Tips range from dumping GPUs for CPUs to “Buy some hardware and host it at home/office/etc.”

I kept coming back to the suggestion “don’t retrain.”

One of the magical things about certain smart software is that the little code devils learn from what goes through the system. The training gets the little devils or daemons out of bed and into the smart software gym.

However, in many smart processes, the content objects processed include signals not in the original training set. Off-the-shelf training sets are vulnerable, just like those cooked up by three people working from home with zero interest in validating the “training data” against the “real world data.”

What happens?

The indexing or metadata assignments “drift.” This means the smart software devils index a content object in a way that differs from how that content object should be tagged.

Examples range from “this person matches that person” to “we indexed the food truck as a vehicle used in a robbery.” Other examples are even more colorful or tragic, depending on which smart software output one examines. Does Detroit facial recognition ring a bell?

Who cares?

I care. The person directly affected by shoddy thinking about training and retraining smart software, however, does not.

That’s what is troubling about this suggestion. Care and thought are mandatory for initial model training. Then, as the model operates, informed humans have to monitor the smart software devils and retrain the system when the indexing goes off track.
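One way those informed humans could watch for drift is to compare the distribution of live model outputs (or input features) against the training-time distribution. A minimal sketch, using a population stability index and synthetic data (the thresholds and distributions here are illustrative assumptions, not any vendor's method):

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a training-time distribution ('expected') with the
    production distribution ('actual'). A rising PSI suggests the
    real-world data no longer looks like the training data."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor tiny proportions to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(42)
train_scores = rng.normal(0.0, 1.0, 10_000)  # distribution at training time
live_scores = rng.normal(0.6, 1.2, 10_000)   # shifted "real world" signals
psi = population_stability_index(train_scores, live_scores)
# A common rule of thumb: PSI above roughly 0.2 signals that
# retraining should at least be considered.
print(psi > 0.2)
```

The point of the sketch is that monitoring is cheap relative to the cost of drifting silently; the expensive part is acting on the alarm.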

The big or maybe I should type BIG problem today is that very few individuals want to do this even if an enlightened superior says, “Do the retraining right.”

Ho ho ho.

The enlightened boss is not going to do much checking and the outputs of a smart system just keep getting farther off track.

In some contexts like Google advertising, getting rid of inventory is more important than digging into the characteristics of Oingo (later Applied Semantics) methods. Get rid of the inventory is job one.

For other model developers, shapers, and tweakers, the suggestion to skip retraining is “good enough.”

That’s the problem.

Good enough has become the way to refactor excellence into substandard work processes.

Stephen E Arnold, July 14, 2020

Yes, Elegance in Language Explains Big Data in a More Satisfying Way for Some

July 14, 2020

I was surprised and then uncomfortable with the information in a tweet thread from Abebab. The tweet explained that “Big Dick Data” is a formal academic term. Apparently this evocative and polished turn of phrase emerged from a write up by “D’Ignazio and F. Klein”.

Here’s the definition:

a formal, academic term that D’Ignazio & F. Klein have coined to denote big data projects that are characterized by masculinist, totalizing fantasies of world domination as enacted through data capture and analysis.

To prove the veracity of the verbal innovation, an image from a publication is presented; herewith a copy:

[image: excerpt from the D’Ignazio and Klein publication]

When I came upon the tweet, the item had accrued 119 likes.

Observations:

  • Is the phrase a contribution to the discussion of Big Data, or is the phrase a political statement?
  • Will someone undertake a PhD dissertation on the subject, using the phrase as the title, or will a business publisher crank out an instant book?
  • What mid-tier consulting firm will offer an analysis of this Big Data niche and rank the participants, using appropriate categories to communicate each particular method?

Outstanding, tasteful, and one more — albeit quite small — attempt to make clear that discourse is being stretched.

Above all, classy or possibly a way to wrangle a job writing one liners for a comedian looking for Big Data chuckles.

Stephen E Arnold, July 14, 2020

IHS Markit Data Lake “Catalog”

July 14, 2020

One of the DarkCyber research team spotted this product announcement from IHS, a diversified information company: “IHS Markit’s New Data Lake Delivers Over 1,000 Datasets in an Integrated Catalogued Platform.” The article states:

The cloud-based platform stores, catalogues, and governs access to structured and unstructured data. Data Lake solutions include access to over 1,000 proprietary data assets, which will be expanded over time, as well as a technology platform allowing clients to manage their own data. The IHS Markit Data Lake Catalogue offers robust search and exploration capabilities, accessed via a standardized taxonomy, across datasets from the financial services, transportation and energy sectors.

The idea is consistently organized information. Queries can run across the content to which the customer has access.

Similar services are available from other companies; for example, Oracle BlueKai.

One question which comes up is, “What exactly are the data on offer?” Another is, “How much does it cost to use the service?”

Let’s tackle the first question: Scope.

None of the aggregators make it easy to scan a list of datasets, click on an item, and get a useful synopsis of the content, content elements, number of items in the dataset, update frequency (annual, monthly, weekly, near real time), and the cost method applicable to a particular “standard” query.

A search of Bing and Google reveals the name of particular sets of data; for example, Carfax. However, getting answers to the scope question can require direct interaction with the company. Some aggregators operate in a similar manner.
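What a useful disclosure could look like — the synopsis, content elements, item count, update frequency, and cost method mentioned above — can be sketched as a simple catalog record. The field names and the sample entry below are hypothetical illustrations, not IHS Markit's actual schema or figures:

```python
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    # The minimum a buyer needs before signing a contract.
    name: str
    synopsis: str
    content_elements: list       # the fields each item carries
    item_count: int              # number of items in the dataset
    update_frequency: str        # "annual" | "monthly" | "weekly" | "near real time"
    pricing_method: str          # how a "standard" query is billed

vehicle_history = DatasetRecord(
    name="Vehicle History (Carfax-style, hypothetical)",
    synopsis="Per-vehicle history records keyed on VIN.",
    content_elements=["VIN", "odometer", "title status", "accident history"],
    item_count=28_000_000,
    update_frequency="weekly",
    pricing_method="flat fee per standard query",
)
print(vehicle_history.update_frequency)
```

If aggregators published even this much per dataset, the scope question would answer itself from a catalog page instead of a sales call.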

The second question: Cost?

The answer to the cost question is tricky. The data aggregators have adopted a set or a cluster of pricing scenarios. It is up to the customer to look at the disclosed data and do some figuring. In DarkCyber’s experience, the data aggregators know much more about which content processes, functions, or operations generate the maximum profit for the vendor. The customer does not have this insight. Only by using the system, analyzing the invoices, and paying them is it possible to get a grip on costs.

DarkCyber’s view is that data marketplaces are vulnerable to disruption. With a growing demand for a wide range of information, some potential customers want answers before signing a contract and shelling out big bucks.

Aggregators are participants in what DarkCyber calls “professional publishing.” The key to this sector is mystery and a reluctance to spell out exact answers to important questions.

What company is poised to disrupt the data aggregation business? Is it a small-scale specialist like the firms pursued relentlessly by “real” journalists seeking a story about violations of privacy? Is it a giant company casting about for a new source of revenue and, therefore, easily overlooked? Aggregation is not exactly exciting for many people.

DarkCyber does not know. One thing seems highly likely: the professional publishing data aggregation sector will face competitive pressure in the months ahead.

Some customers may be fed up with the secrecy and lack of clarity, and entrepreneurs will spot the opportunity and move forward. Rich innovators will just buy the vendors and move in new directions.

Stephen E Arnold, July 14, 2020

The Covidization of Good Enough

July 14, 2020

I have written numerous times about the zippy young PhD with an attitude. After my talk about declining “findability,” Zippy (not his real name) approached me. He had one point and repeated it to me several times:

Search is good enough.

In my first lecture at the National Cyber Crime Conference on Monday, July 13, 2020, I made the point that locating information or getting software updates that work is an indication that “good enough” has become the target.

I read “A Moment of Clarity Regarding the Raison d’Etre for the App Store.” The article, which is critical of the giant weird technology company, includes this statement:

I worry that this sort of “Who cares, it’s better than nothing” attitude has seeped into Apple itself, and explains how we wound up with barely modified iPad apps shipping as system apps on the Mac. But more than anything I worry that this exemplifies where Apple has lost its way with the App Store.

What we have is a big American company allowing another big American company to deliver “good enough” products and services.

Couple this attitude with the challenges in education and pandemics. What do you get?

Hey, a good enough mind set. The problem is that good enough isn’t.

Stephen E Arnold, July 14, 2020

Sillycon Valley: When Molecules No Longer Collide

July 14, 2020

“How Remote Work Could Destroy Silicon Valley” presents a slightly dark scenario for those infused with technology. The write up explains that when lots of eager smart people no longer interact in real life, something magical is lost.

If you wonder about the magic of Silicon Valley, Philz Coffee, and the risk of techquakes, you may find the write up interesting.

Two of the DarkCyber team wondered if Brownian motion is a factor. Those hyper-educated over-achievers bask in the sun. The closeness and the “energy”: What could provide a better greenhouse for innovation? Money can grow on that code.

The spoiler is that WFHers will not benefit from the environment in a basement or a comparatively spacious two bedroom apartment in Salinas, Kansas.

When those folks go to work via Zoom, those sprouts of innovation, those vulnerable innovators suffer.

Thus, Sillycon Valley starts to look more like Salinas in the winter.

Stephen E Arnold, July 14, 2020

Divorced by Smart Software and Hopefully Outstandingly Objective Algorithms

July 13, 2020

Amica, someone’s pal. A divorce adjudicated by smart software.

We are not sure this is a good idea. Fossbytes reports, “Australian Governments Roll Out Amica AI for Settling Divorces.” Can an algorithm replace human arbitration in a heated divorce? Apparently, the Aussies in charge believe it can. Writer Nishit Raghuwanshi explains:

“The Australian government has rolled out an AI named Amica that will help the partners in dividing money and property. Moreover, the AI will also help in making appropriate parenting arrangements without hiring a lawyer. As reported in Gizmodo, Australian AG Christian Porter mentioned that the Australian government is trying its best to improvise the Australian family law system. The main priority of the government is to make the system a bit more fast and cheap. He concluded his statement by saying that the government is also working on making the divorce process less stressful for the partners and their children.

“As per the stats, most of the Australian couples were inclined towards dumping their partners owing to Coronavirus quarantine period. It is expected that just after a little relaxation from COVID-19, a large number of couples will appear in the court for separation cases.”

Apparently this dynamic means post-pandemic will be the perfect time to put the project into place. Australia’s family courts were already swamped, we’re told, and all this forced togetherness threatens to completely overwhelm them. Any time one partner refuses to accept the AI’s recommendations, however, a lawyer will still be required. So if the algorithm is not good at its job, the court system may not see much relief. Currently, the tool is free to Australian citizens, but a fee between $113 and $303 will be charged starting next year.

DarkCyber wonders if the system was developed by an objective humanoid, hopefully one unaffected by a parental dust-up. Revenge? Maybe?

Cynthia Murrell, July 13, 2020

Amazon: We Love the Cheery Smile, But Does It Have a Darker Meaning?

July 13, 2020

Who needs the Dark Web when one has Amazon? The Markup reveals, “Amazon’s Enforcement Failures Leave Open a Back Door to Banned Goods—Some Sold and Shipped by Amazon Itself.” Investigators at The Markup began combing the site for banned goods after a series of deaths and illnesses attributed to one counterfeit pill maker. The fake-Percocet maker, now in prison, revealed he’d bought his pill press right off Amazon. The journalists were dismayed to find nearly 100 dangerous and/or illegal items readily available on the site. All of these products are explicitly banned in Amazon’s third-party seller rules and prohibitions for the U.S. market. Reporters Annie Gilbertson and Jon Keegan write:

“The Markup filled a shopping cart with a bounty of banned items: marijuana bongs, ‘dab kits’ used to inhale cannabis concentrates, ‘crackers’ that can be used to get high on nitrous oxide, and compounds that reviews showed were used as injectable drugs. We found two pill presses and a die used to shape tablets into a Transformers logo, which is among the characters that have been found imprinted on club drugs such as ecstasy. We found listings for prohibited tools for picking locks and jimmying open car doors. And we found AR-15 gun parts and accessories that Amazon specifically bans. Almost three dozen listings for banned items were sold by third parties but available to ship from Amazon’s own warehouses. At least four were listed as ‘Amazon’s Choice.’ The phrase ‘ships from and sold by Amazon.com’ appeared beneath the buy button of five of the banned items we found, which two former employees confirmed means those products are, in fact, sold by Amazon. In addition, one of the sellers we were able to reach also confirmed it sold the items to Amazon.”

Of course, “Amazon’s choices” are often chosen by algorithm, which is part of the problem. The site does have a process for finding and removing banned products, but the human reviewers cannot keep up with the onslaught of third-party uploads. The journalists found several products that evaded detection by being listed as something they are not—like the AR-15 vise block masquerading as a desk accessory, complete with paperclips and pencil erasers in the image. Other items simply avoid telltale keywords, but are plain as day to anyone who views the listing. It is apparent even the algorithm has a clue because it frequently recommends items related to the product at hand. See the article for more examples.

What will Amazon do about this alarming issue? Well, if we take spokesperson Patrick Graham’s responses as a guide, the answer is it will downplay the problem. Seems about right.

Cynthia Murrell, July 13, 2020

Germany Is Getting Serious about Content

July 13, 2020

If accurate, Germany is moving ahead of the Five Eyes group in terms of access to online data. “New German Law Would Force ISPs to Allow Secret Service to Install Trojans on User Devices” reports:

A new law being proposed in Germany would see all 19 federal state intelligence agencies in Germany granted the power to spy on German citizens through the use of Trojans. The new law would force internet service providers (ISPs) to install government hardware at their data centers which would reroute data to law enforcement, and then on to its intended destination so the target is blissfully unaware that their communications and even software updates are being proxied.

If accurate, this is an important law. Germany’s experience with this type of legislation will put some oomph in the Five Eyes partners’ efforts as well as influence other European entities.

Stephen E Arnold, July 13, 2020

Possibly Harmful Smart Software: Heck, Release It

July 13, 2020

So, what changed? A brief write-up at Daijiworld reports that the “Elon Musk-Founded OpenAI Releases Text Tool it Once Called Dangerous.” This software can rapidly generate fake news that is so believable its makers once deemed it too dangerous to be released. Now, though, OpenAI seems to have had a change of heart. The API is being released in a private beta rather than to the world at large as a test run of sorts. Citing an OpenAI blog post, the write-up reveals:

“‘In releasing the API, we are working closely with our partners to see what challenges arise when AI systems are used in the real world,’ OpenAI said in a blog post last week. ‘This will help guide our efforts to understand how deploying future AI systems will go, and what we need to do to make sure they are safe and beneficial for everyone.’ The API that OpenAI finally decided to release provides a general-purpose ‘text in, text out’ interface, allowing users to try it on virtually any English language task. Interested buyers can integrate the API into their product and develop an entirely new application. ‘Given any text prompt, the API will return a text completion, attempting to match the pattern you gave it. You can “program” it by showing it just a few examples of what you’d like it to do; its success generally varies depending on how complex the task is,’ OpenAI said.”
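The “show it just a few examples” idea described in the quote amounts to few-shot prompting: the caller concatenates demonstration pairs ahead of the new input and asks the model to continue the pattern. A minimal sketch of how such a prompt could be assembled (the translation task and helper function are hypothetical illustrations; this builds the text only and does not call OpenAI's API):

```python
def build_few_shot_prompt(examples, new_input):
    """Assemble a 'text in, text out' prompt: a few demonstration
    pairs, then the new input with the completion left blank for
    the model to fill in."""
    lines = []
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"French: {target}")
    lines.append(f"English: {new_input}")
    lines.append("French:")  # the model continues from here
    return "\n".join(lines)

demos = [("Hello", "Bonjour"), ("Thank you", "Merci")]
prompt = build_few_shot_prompt(demos, "Good night")
print(prompt)
```

Sent as the text prompt, a string like this is how a caller “programs” the model without any training step; success, as the blog post concedes, varies with task complexity.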

Where will it go from here—will OpenAI decide a general release is worth the risk? We’re guessing it will. Evidently this software is just too juicy to keep under wraps.

Cynthia Murrell, June 25, 2020

