Amazon and Fake Reviews: Ah, Ha, Fake Reviews Exist

September 5, 2022

I read “Amazon’s Delay for the Rings of Power Reviews on Prime Video Part of New Initiative to Filter Out Trolls.” The write up makes reasonably official the factoid that Amazon reviews are, in many cases, more fanciful than the plot of Rings of Power.

The write up states:

The series appears to have been review bombed — when trolls flood intentionally negative reviews for a show or film — on other sites like Rotten Tomatoes, where it has an 84% rating from professional critics, but a 37% from user-submitted reviews. “The Rings of Power” has been fending off trolls for months, especially ones who take issue with the decision to cast actors of color as elves, dwarves, harfoots and other folk of Tolkien’s fictional Middle-earth.

Amazon wants to be a good shepherd for truth. The write up says:

Amazon’s new initiative to review its reviews, however, is designed to weed out ones that are posted in bad faith, deadening their impact. In the case of “A League of Their Own,” it appears to have worked: To date, the show has an average 4.3 out of 5 star rating on Prime Video, with 80% of users rating the show with five stars and 14% with one star.
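A quick sanity check on those figures, as a minimal sketch. It assumes the percentages are shares of all star ratings and that the unmentioned 6% fall somewhere between two and four stars. Five-star and one-star ratings alone contribute 0.80 × 5 + 0.14 × 1 = 4.14 stars to the average, so the remaining 6% would need to average roughly 2.67 stars for the overall figure to reach 4.3. The inputs are rounded, so treat the result as approximate. The helper name is ours, not Amazon's.

```python
def implied_remaining_mean(avg, share_five, share_one):
    """Return the mean rating the unlisted 2-, 3-, and 4-star reviews would
    need for the overall average to equal `avg` (shares are fractions)."""
    share_rest = 1.0 - share_five - share_one
    contribution_listed = 5 * share_five + 1 * share_one
    return (avg - contribution_listed) / share_rest

# Figures reported for "A League of Their Own" on Prime Video.
mean_rest = implied_remaining_mean(4.3, 0.80, 0.14)
print(f"Five- and one-star ratings contribute {5 * 0.80 + 1 * 0.14:.2f} stars")
print(f"The remaining 6% must average about {mean_rest:.2f} stars")
```

Because the implied mean lands between two and four stars, the reported numbers are at least internally consistent.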

Interesting. My view is that Amazon hand waves about fake reviews except for those which could endanger its own video product. Agree with me or not, Amazon is revealing that fake reviews are an issue. What about those reviews for Chinese shirts which appear to have been fabricated for folks in the seventh grade? SageMaker, what’s up?

Stephen E Arnold, September 12, 2022

Bots Are Hot

September 2, 2022

Developer Michael I Lewis had noble intentions when he launched his project in 2020. Because Google and other prominent search engines have become little more than SEO and advertising ambushes, he worked evenings and weekends to create a search engine free from both ads and search engine optimization. The site indexes only user-submitted personal and independent sites and leaves content curation up to its community. Naturally, the site also emphasizes privacy and is open source. To keep the lights on, Lewis charges a modest listing fee. Alas, even this principled platform has failed to escape the worst goblins of the SEO field. Lewis laments, “Almost All Searches on my Independent Search Engine Are Now from SEO Spam Bots.”

SEO spam lowers the usual SEO trickery into the realm of hacking. Its black-hat practitioners exploit weaknesses, like insecure passwords or out-of-date plugins, in any website they can penetrate and plant their own keywords, links, and other dubious content. That spam then rides its target site up the search rankings as long as it can, ripping off marks along the way. If the infiltration goes on for long, the reputation and ranking of the infected website will tank, leaving its owner wondering what went awry. The results can be devastating for affected businesses.

In spring 2022, Lewis detected a suspicious jump in non-human visitors to his site. He writes:

“I’ve always had some activity from bots, but it has been manageable. However, in mid-April 2022, bot activity started to increase dramatically. I didn’t notice at first because the web analytics only shows real users, and the unusual activity could only be seen by looking at the server logs. I initially suspected that it was another search engine scraping results and showing them on their results page, because the IP addresses, user agents and search queries were all different. I then started to wonder if it was a DDoS attack, as the scale of the problem and the impact it was having on the servers (and therefore running costs) started to become apparent. After some deeper investigation, I noticed that most of the search queries followed a similar pattern. … It turns out that these search patterns are ‘scraping footprints’. These are used by the SEO practitioners, when combined with their search terms, to search for URLs to target, implying that [the site] has been listed as a search engine in one or more SEO tools like ScrapeBox, GSA SEO or SEnuke. It is hard to imagine any legitimate white-hat SEO techniques requiring these search results, so I would have to imagine it is for black-hat SEO operations.”
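Lewis says the pattern showed up only in the server logs, not in the web analytics. As a minimal sketch of that kind of log check, the Python below tallies search requests whose query text contains a footprint fragment. It assumes combined-format access log lines, a search endpoint that takes its query in a `q` parameter, and a short hypothetical footprint list. Lewis does not publish his tooling, and these fragments are illustrative, not the lists shipped with ScrapeBox, GSA SEO, or SEnuke. The sample log lines are made up.

```python
import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical footprint fragments for illustration only.
FOOTPRINTS = [
    r'"powered by [^"]+"',
    r"\binurl:",
    r'"leave a comment"',
]
FOOTPRINT_RE = re.compile("|".join(FOOTPRINTS), re.IGNORECASE)

# Assumes combined log format: the request line appears as "GET <path> HTTP/x".
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+"')

def classify_queries(log_lines):
    """Count search queries that look like scraping footprints vs. the rest."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a GET request line we understand
        # parse_qs decodes %22 to a quote and + to a space.
        query = parse_qs(urlparse(match.group("path")).query).get("q", [""])[0]
        if not query:
            continue  # request without a search term
        counts["footprint" if FOOTPRINT_RE.search(query) else "other"] += 1
    return counts

sample = [
    '1.2.3.4 - - [15/Apr/2022:10:00:00 +0000] "GET /search/?q=%22powered+by+wordpress%22+gardening HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [15/Apr/2022:10:00:01 +0000] "GET /search/?q=inurl%3Ablog+recipes HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '9.9.9.9 - - [15/Apr/2022:10:00:02 +0000] "GET /search/?q=independent+websites HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(classify_queries(sample))
```

Matching on query text rather than on IP address or user agent reflects Lewis's observation that those fields varied while the query pattern did not.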

Meanwhile, Lewis’ site has seen very little traffic from actual humans. Though it might be tempting to accuse major search engines of deliberately downplaying the competition, he suspects the site is simply drowning in a sea of SEO spam. Are real people browsing the Web anymore, as opposed to lapping up whatever social media sites choose to dish out? A few, but they are increasingly difficult to detect within the crowd of bots looking to make a buck.

Cynthia Murrell, September 2, 2022

Scraping By: A Winner Business Model

May 23, 2022

Will Microsoft-owned LinkedIn try, try, try again? The platform’s latest attempt to protect its users’ data from being ransacked has been thwarted, TechCrunch reveals in “Web Scraping Is Legal, US Appeals Court Reaffirms.” The case reached the Supreme Court last year, but SCOTUS sent it back down to the Ninth Circuit Court of Appeals for a re-review. That court reaffirmed its original finding: scraping publicly accessible data is not a violation of the decades-old Computer Fraud and Abuse Act (CFAA). It is a decision to celebrate or to lament, depending on one’s perspective. Though the practice threatens the privacy of those who use social media and other online services, it is integral to many who preserve, analyze, and report information. Writer Zack Whittaker explains:

“The Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo. But there have been egregious cases of web scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting several tech giants to file lawsuits against the startup. Several companies, including Facebook, Instagram, Parler, Venmo and Clubhouse have all had users’ data scraped over the years. The case before the Ninth Circuit was originally brought by LinkedIn against Hiq Labs, a company that uses public data to analyze employee attrition. LinkedIn said Hiq’s mass web scraping of LinkedIn user profiles was against its terms of service, amounted to hacking and was therefore a violation of the CFAA.”

The Ninth Circuit disagreed. Twice. In the latest decision, the court pointed to last year’s Supreme Court ruling which narrowed the scope of the CFAA to those who “gain unauthorized access to a computer system,” as opposed to those who simply exceed their authorization. A LinkedIn spokesperson expressed disappointment, stating the platform will “continue to fight” for its users’ rights over their data. Stay tuned.

Cynthia Murrell, May 23, 2022

UK Bill Would Require Age Verification

February 25, 2022

It might seem like a no-brainer—require age verification to protect children from adult content wherever it may appear online. But The Register insists it is not so simple in, “ Threatens to Make Adults Give Credit Card Details for Access to Facebook or TikTok.” The UK’s upcoming Online Safety Bill will compel certain websites to ensure users are 18 or older, a process often done using credit card or other sensitive data. Though at first the government vowed this requirement would only apply to dedicated porn sites, a more recent statement from the Department for Digital, Culture, Media, and Sport indicates social media companies will be included. The statement notes research suggests such sites are common places for minors to access adult material.

Writer Gareth Corfield insists the bill will not even work because teenagers are perfectly capable of using a VPN to get around age verification measures. Meanwhile, adults following the rules will have to share sensitive data with third-party gatekeepers just to keep up with friends and family on social media. Then there is the threat to encryption, which would have to be discontinued to enable the bill’s provision for scanning social media posts. Civil liberties groups have expressed concern, just as they did the last time around. Corfield observes:

“Prior efforts for mandatory age verification controls were originally supposed to be inserted into Digital Economy Act but were abandoned in 2019 after more than one delay. At that time, the government had designated the British Board of Film Classification, rather than Ofcom, as the age verification regulator. In 2018, it estimated that legal challenges to implementing the age check rules could cost it up to £10m in the first year alone. As we pointed out at the time, despite what lawmakers would like to believe – it’s not a simple case of taking offline laws and applying them online. There are no end of technical and societal issues thrown up by asking people to submit personal details to third parties on the internet. … The newer effort, via the Online Safety Bill, will possibly fuel Britons’ use of VPNs and workarounds, which is arguably equally as risky: free VPNs come with a lot of risks and even paid products may not always work as advertised.”

So if this measure is not viable, what could be the solution to keeping kids away from harmful content? If only each child could be assigned one or more adults responsible for what their youngsters access online. We could call them “caregivers,” “guardians,” or “parents,” perhaps.

Cynthia Murrell, February 25, 2022

Coalesce: Tackling the Bottleneck Few Talk About

February 1, 2022

Coalesce went stealth, the fancier and more modern techno slang for “going dark,” to work on projects in secret. The company has returned to the light with a robust business plan and product, plus loads of funding, says Crowdfund Insider in “Coalesce Debuts From Stealth, Attracts $5.92M For Analytics Platform.”

Coalesce is run by a former Oracle employee, and it develops products and services similar to Oracle’s, but with a MarkLogic spin. That is one way to interpret how Coalesce announced its big return with its Coalesce Data Transformation platform, which offers modeling, cleansing, governance, and documentation of data with analytical efficiency and flexibility. Do not forget that 11.2 Capital and GreatPoint Ventures provided $5.92 million in seed funding for the new data platform. Coalesce plans to use the funding for engineering functions, developing marketing strategy, and expanding sales.

Coalesce noticed that there is a weak link between organizations’ cloud analytics and actively making use of data:

“‘The largest bottleneck in the data analytics supply chain today is transformations. As more companies move to the cloud, the weaknesses in their data transformation layer are becoming apparent,’ said Armon Petrossian, the co-founder and CEO of Coalesce. ‘Data teams are struggling to keep up with the demands from the business, and this problem has only continued to grow with the volumes and complexity of data combined with the shortage of skilled people. We are on a mission to radically improve the analytics landscape by making enterprise-scale data transformations as efficient and flexible as possible.’”

Coalesce might be duplicating Oracle and MarkLogic, but if it has discovered a niche market in cloud analytics, it is about to rocket out of stealth. Hopefully the company will solve the transformation problem instead of issuing marketing statements, as many other firms do.

Whitney Grace, February 1, 2022

Anonymized Location Data: an Oxymoron?

May 13, 2020

Location data. To many the term sounds innocuous, boring really. Perhaps that is why society has allowed apps to collect and sell it with no significant regulation. An engaging (and well-illustrated) piece from Norway’s NRK News, “Revealed by Mobile,” shares the minute details journalists were able to put together about one citizen from location data purchased on the open market. Graciously, this man allowed the findings to be published as a cautionary tale. We suggest you read the article for yourself to absorb the chilling reality. (The link we share above runs through Google Translate.)

Vendors of location data would have us believe the information is completely anonymized and cannot be tied to the individuals who generated it. It is only good for general uses like statistics and regional marketing, they assert. Intending to put that claim to the test, NRK purchased a batch of Norwegian location data from the British firm Tamoco. Their investigation shows anonymization is an empty promise. Though the data is stripped of directly identifying information, buyers are a few Internet searches away from correlating location patterns with individuals. Journalists Trude Furuly, Henrik Lied, and Martin Gundersen tell us:

“All modern mobile phones have a GPS receiver, which with the help of satellite can track the exact position of the phone with only a few meters distance. The position data NRK acquired consisted of a table with four hundred million map coordinates from mobiles in Norway. …

“All the coordinates were linked to a date, time, and specific mobile. Thus, the coordinates showed exactly where a mobile or tablet had been at a particular time. NRK coordinated the mobile positions with a map of Norway. Each position was marked on the map as an orange dot. If a mobile was in a location repeatedly and for a long time, the points formed larger clusters. Would it be possible for us to find the identity of a mobile owner by seeing where the phone had been, in combination with some simple web searches? We selected a random mobile from the dataset.

“NRK searched the address where the mobile had left many points about the nights. The search revealed that a man and a woman lived in the house. Then we searched their Facebook profiles. There were several pictures of the two smiling together. It seemed like they were boyfriend and girlfriend. The man’s Facebook profile stated that he worked in a logistics company. When we searched the company in question, we discovered that it was in the same place as the person used to drive in the morning. Thus, we had managed to trace the person who owned the cell phone, even though the data according to Tamoco should have been anonymized.”

The journalists went on to put together a detailed record of that man’s movements over several months. It turns out they knew more about his trip to the zoo, for example, than he recalled himself. When they revealed their findings to their subject, he was shocked and immediately began deleting non-essential apps from his phone. Read the article; you may find yourself doing the same.
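The NRK method boils down to asking where a phone spends its nights. Here is a minimal sketch in Python, assuming each record is a (device ID, local ISO timestamp, latitude, longitude) tuple, treating midnight to 6 a.m. as “night,” and rounding coordinates to three decimal places (roughly 100 meters of latitude) as a crude stand-in for the clusters NRK spotted on its map. The coordinates and device IDs are invented, and `likely_home` is a hypothetical helper, not anything Tamoco or NRK published.

```python
from collections import Counter
from datetime import datetime

def likely_home(pings, device_id, night_start=0, night_end=6, precision=3):
    """Return the rounded (lat, lon) cell where `device_id` most often appears
    between night_start and night_end (local hours), or None if never seen."""
    cells = Counter()
    for dev, ts, lat, lon in pings:
        if dev != device_id:
            continue
        hour = datetime.fromisoformat(ts).hour
        if night_start <= hour < night_end:
            # Rounding groups nearby fixes into one grid cell (a crude cluster).
            cells[(round(lat, precision), round(lon, precision))] += 1
    if not cells:
        return None
    return cells.most_common(1)[0][0]

# Invented sample records: (device ID, local timestamp, latitude, longitude).
pings = [
    ("a1", "2019-11-04T01:15:00", 59.91391, 10.75221),
    ("a1", "2019-11-04T03:40:00", 59.91388, 10.75219),
    ("a1", "2019-11-05T02:05:00", 59.91393, 10.75224),
    ("a1", "2019-11-04T08:30:00", 59.94512, 10.71003),  # daytime, ignored
    ("b2", "2019-11-04T02:00:00", 60.39299, 5.32412),   # a different device
]
print(likely_home(pings, "a1"))
```

The returned grid cell is the address a journalist would then feed into a web search. Calling the same function with a daytime window would approximate the workplace step in NRK’s account.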

Cynthia Murrell, May 12, 2020

Enterprise Document Management: A Remarkable Point of View

March 3, 2020

DarkCyber spotted “What Is an Enterprise Document Management (EDM) System? How to Implement Full Document Control.” The write up is lengthy, running about 4,000 words. There are pictures. One of them shows ECM, which is enterprise content management, and in the middle is Enterprise Document Management, which is abbreviated DMS, not EDM.

The idea is that documents have to be managed, and DarkCyber assumes that most organizations do not manage their content — regardless of its format — particularly well until the company is involved in a legal matter. Then document management becomes the responsibility of the lawyers.

In order to do any type of document or content management, employees have to follow the rules. The rules are the underlying foundation of the article. A company manufacturing interior panels for an automaker will have to have a product management system, a system to deal with drawings (paper and digital), supplier data, and other bits and pieces to make sure the “door cards” are produced.

The problem is that guidelines often do not translate into consistent employee behavior. One big reason is that the guidelines don’t fit into the workflows and the incentive schemes do not reward the time and effort required to make sure the information ends up in the “system.” Many professionals write something, text it, and move on. Enterprise systems typically do not track fine-grained information very well.

As with enterprise search, the “document management” folks try to make workers follow many information rules. Workers who may be concerned about becoming redundant, a sick child, an angry boss, or any other perturbation in the consultant’s checklist ignore those rules.

There is an association focused on records management. There are companies concerned with content management. There are vendors who focus on images, videos, audio, and tweets.

The myth that an EDM, ECM, or enterprise search system can create an affordable, non-invasive, legally compliant, and effective way to deal with the digital fruit cake in organizations is worth lots of money.

The problem is that these systems, methods, guidelines, data lakes, federation technologies, smart software, etc. etc. don’t work.

The article does a good job of explaining what a consultant recommends. The information it presents provides fodder for the marketing animals who are going to help sell systems, training, and consulting.

The reality is that humans generate information and use a range of systems to produce content. Tweets about a missed shipment from a personal mobile phone may be prohibited. Yeah, explain that to the person who got the order in the door and kept the commitment to the customer.

There are conferences, blogs, consulting firms, reports, and BrightPlanet videos about managing information.

The write up states:

There is no use documenting and managing poor workflows, processes, and documentation. To survive in business, you have to adapt, change and improve. That means continuously evaluating your business operations to identify shortfalls, areas for improvements, and strengths for continuous investment. Regular internal audits of your management systems will enable you to evaluate the effectiveness of your Enterprise Document Management solution.

Right. When these silver bullet, pie-in-the-sky solutions cost more than budgeted, employees quit using them, and triage costs threaten the survival of the company — call in the consultants.

Today’s systems do not work with the people actually doing information creation. As a result, most fail to deliver. Sound familiar? It should. You, gentle reader, will never follow the information rules unless you are specifically paid to follow them or given an ultimatum like “do this or get fired.”

Tweet that and let me know if you managed that information.

Stephen E Arnold, March 3, 2020

Blockchain: A Loser in 2020?

December 31, 2019

I recently completed a report about Amazon’s R&D work in blockchain. If you want a free summary of the report, write darkcyber333 at yandex dot com. If not, no problem. You will want to read “Please Blockchain, Prove Me Wrong.” The author likes to use words on some online services’ stop lists, but that’s okay. The writer is passionate about the perceived failings of blockchain.

Blockchain is, according to the write up:

a solution looking for a problem.

More proof needed, you gentle but skeptical reader? How about this?

According to Gartner’s Hype Cycle, blockchain is still “sliding into the trough of disillusionment,” meaning the technology is struggling to live up to the expectations created by the hype around it.

There you go. Proof from a marketing company.

DarkCyber’s view is that encryption is likely to continue to toddle forward. Also, the charm of the distributed database continues to woo some people.

There may be hope, and perhaps that is why Amazon has more than a dozen patents related to blockchain technology. We learn from the impassioned analysis:

Blockchain’s purported promise is such that everyone is willingly taking a multi-faceted approach, not giving much thought to the possibility that its potential may, in fact, be limited. Or maybe blockchain is just the first iteration of something far more powerful, a base we can build on to restore our faith in decentralized systems.

To sum up, for a dead duck, there are some feathers afloat. And what about those Amazon patents? Maybe Mr. Bezos is just off base and should stick to bulldozing mom and pop stores and outfits like FedEx?

Stephen E Arnold, December 31, 2019

Online Consumption of Data: A Mental Architecture Built on Inherent Addictive Patterns?

December 27, 2019

Two items caught my attention. The first explains that more than 80 percent of a sample group use a “second screen” when watching television. Yep, the boob tube and the vast wasteland. Marshall McLuhan, a controversial figure, explained that TV is a kick back and vegetate medium. Punching buttons and formulating a thought for a tweet is hot. The article “88% of Americans Use a Second Screen While Watching TV. Why?” references the factoid that humans are not very adept at multitasking. Interesting, because humans can walk and chew gum, breathe, and think about crossing the street at the same time. But whatever. Also, the write up ignores the McLuhanesque approach that each type of media has its own “construct” or “mental evocation.”

The answer to “Why?” may be as simple as, “Addiction. Just a TV and a computing device.” Can one get the monkey off one’s back? Not easily.

Who can assist another? Consider whether this item of information is correct: “70% Parents Cannot Control Their Own Online Activity.” This write up reports:

Around 70 per cent of parents admit that they themselves spend too much time online and 72 per cent feel that internet and mobile device usage in general is impeding family life…

Net net: No wonder information has to be crunchy. Easy to use is becoming a strategy for control. Interesting implications for 2020 and beyond if these two reports are mostly accurate.

Stephen E Arnold, December 27, 2019

Sentiment Analysis: Can a Monkey Do It?

June 27, 2019

Sentiment analysis is a machine learning tool companies are employing to understand how their customers feel about their services and products. It is mainly deployed on social media platforms, including Facebook, Instagram, and Twitter. The Monkey Learn blog details how sentiment analysis is specifically being used on Twitter in the post, “Sentiment Analysis Of Twitter.”

Using sentiment analysis is not a new phenomenon, but there are still individuals unaware of the possible power at their fingertips. Monkey Learn specializes in custom machine learning solutions that include intent, keywords, and, of course, sentiment analysis. The post is a guide on the basics of sentiment analysis: what it is, how it works, and real life examples. Monkey Learn defines sentiment analysis as:

Sentiment analysis (a.k.a opinion mining) is the automated process of identifying and extracting the subjective information that underlies a text. This can be either an opinion, a judgment, or a feeling about a particular topic or subject. The most common type of sentiment analysis is called ‘polarity detection’ and consists in classifying a statement as ‘positive’, ‘negative’ or ‘neutral’.

It also relies on natural language processing (NLP) to understand the information’s context.
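To make “polarity detection” concrete, here is a deliberately tiny lexicon-based classifier in Python. It is not Monkey Learn’s method: the post describes trained machine learning models, whereas this sketch swaps in hand-picked word lists (hypothetical and far too small for real use) plus a one-word negation rule, which is about the simplest way context can flip a word’s polarity.

```python
import re

# Tiny hypothetical lexicons; real systems learn these signals from data.
POSITIVE = {"love", "great", "excellent", "fast", "happy", "good"}
NEGATIVE = {"hate", "terrible", "slow", "broken", "bad", "late"}
NEGATORS = {"not", "never", "no"}

def polarity(text):
    """Classify text as 'positive', 'negative', or 'neutral' by summing word
    scores, flipping the sign of a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # "not good" counts as negative
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for tweet in [
    "I love the new app, support was fast!",
    "Delivery was late and the box was broken.",
    "The package arrived on Tuesday.",
    "The update is not good",
]:
    print(polarity(tweet), "|", tweet)
```

A lexicon approach like this misses sarcasm, emoji, and domain slang, which is the gap the NLP-backed models described in the article are meant to close.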

Monkey Learn explains that sentiment analysis is important because most of the world’s digital data is unstructured. Machine learning with NLP’s assistance can quickly sort large data sets and detect their polarity. Monkey Learn promises that its sentiment analysis brings customers scalability, consistent criteria, and real-time analysis. Many companies are using Twitter sentiment analysis for customer service, brand monitoring, market research, and political campaigns.

The article is basically a promotional piece for Monkey Learn, but it does work as a starting guide for sentiment analysis.

Whitney Grace, June 27, 2019
