A Legal Information Truth Inconvenient, Expensive, and Dangerous

December 5, 2022

The Wall Street Journal published “Justice Department Prosecutors Swamped with Data As Cases Leave Long Digital Trails.” The write up addressed a problematic reality without craziness. The basic idea is that prosecutors struggle with digital information. The consequences are higher costs and in some cases allowing potentially problematic individuals to go to Burger King or corporate practices to chug along with felicity.

The article states:

Federal prosecutors are swamped by data, as the way people communicate and engage in behavior scrutinized by investigators often leaves long and complicated digital trails that can outpace the Justice Department’s technology.

What’s the fix? This is a remarkable paragraph:

The Justice Department has been working on ways to address the problem, including by seeking additional funding for electronic-evidence technology and staffing for US attorney’s offices. It is also providing guidance in an annual training for prosecutors to at times collect less data.

Okay, more money which may or may not be spent in a way to address the big data issues, more lawyers (hopefully skilled in manipulating content processing systems functions), annual training, and gather less information germane to a legal matter. I want to mention that misinformation, reformation of data, and weaponized data are apparently not present in prosecutors’ data sets or not yet recognized as a problem by the Justice Department.

My response to this interesting article includes:

  1. This is news? The issue has been problematic for many years. The vendors of specialized systems to manage evidence, index and make searchable content from disparate sources, and output systems which generate a record of what lawyer accessed what and when are asserting their systems can handle this problem. Obviously either licensees discover the systems don’t work like the demos or cannot handle large flows of disparate content.
  2. The legal industry is not associated with groundbreaking information innovation. I may be biased, but I think of lawyers knowing more about billing for their time than making use of appropriate, reliable technology for managing evidence. Excel timesheets are one thing. Dark Web forum content, telephone intercepts, and context free email and chat messages are quite different. Annual training won’t change the situation. The problem has to be addressed by law schools and lawyer certification systems. Licensing a super duper search system won’t deal with the problem no matter what consultants, vendors, and law professors say.
  3. The issue of “big data” is real, particularly when there are some many content objects available to a legal team, its consultants, and the government professionals working on a case or a particular matter. It is just easier to gather and then try to make sense of the data. When the necessary information is not available, time or money runs out and everyone moves on. Big data becomes a process that derails some legal proceedings.

My view is that similar examples of “data failure” will surface. The meltdown of crypto? Yes, too much data. The downstream consequences of certain medical products? Yes, too much data and possibly the subtle challenge of data shaping by certain commercial firms? The interlocks among suppliers of electrical components? Yes, too much data and possibly information weaponization by parties to a legal matter?

When online meant keyword indexing and search, old school research skills and traditional data collection were abundant. Today, short cuts and techno magic are daily fare.

It is time to face reality. Some technology is useful, but human expertise and judgment remain essential. Perhaps that will be handled in annual training, possibly on a cruise ship with colleagues? A vendor conference offering continuing education credits might be a more workable solution than smart software with built in workflow.

Stephen E Arnold, December 5, 2022

Amazon and Fake Reviews: Ah, Ha, Fake Reviews Exist

September 5, 2022

I read “Amazon’s Delay for the Rings of Power Reviews on Prime Video Part of New Initiative to Filter Out Trolls.” The write up makes reasonably official the factoid that Amazon reviews are, in many cases, more fanciful than the plot of Rings of Power.

The write up states:

The series appears to have been review bombed — when trolls flood intentionally negative reviews for a show or film — on other sites like Rotten Tomatoes, where it has an 84% rating from professional critics, but a 37% from user-submitted reviews. “The Rings of Power” has been fending off trolls for months, especially ones who take issue with the decision to cast actors of color as elves, dwarves, hand waves and other folk of Tolkien’s fictional Middle-earth.

Amazon wants to be a good shepherd for truth. The write up says:

Amazon’s new initiative to review its reviews, however, is designed to weed out ones that are posted in bad faith, deadening their impact. In the case of “A League of Their Own,” it appears to have worked: To date, the show has an average 4.3 out of 5 star rating on Prime Video, with 80% of users rating the show with five stars and 14% with one star.

Interesting. My view is that Amazon hand waves about fake reviews but for those which could endanger its own video product. Agree with me or not, Amazon is revealing that fake reviews are an issue. What about those reviews for Chinese shirts which appear to have been fabricated for folks in the seventh grade? SageMaker, what’s up?

Stephen E Arnold, September 12, 2022

Bots Are Hot

September 2, 2022

Developer Michael I Lewis had noble intentions when he launched searchmysite.net in 2020. Because Google and other prominent search engines have become little more than SEO and advertising ambushes, he worked evenings and weekends to create a search engine free from both ads and search engine optimization. The site indexes only user-submitted personal and independent sites and leaves content curation up to its community. Naturally, the site also emphasizes privacy and is open source. To keep the lights on, Lewis charges a modest listing fee. Alas, even this principled platform has failed to escape the worst goblins of the SEO field. Lewis laments, “Almost All Searches on my Independent Search Engine Are Now from SEO Spam Bots.”

SEO spam lowers the usual SEO trickery into the realm of hacking. It’s black hat practitioners exploit weaknesses, like insecure passwords or out-of-data plugins, in any website they can penetrate and plant their own keywords, links, and other dubious content. That spam then rides its target site up the search rankings as long as it can, ripping off marks along the way. If the infiltration goes on for long, the reputation and ranking of the infected website will tank, leaving its owner wondering what went awry. The results can be devastating for affected businesses.

In spring of 2022, Lewis detected a suspicious jump in non-human visitors on searchmysite.net. He writes:

“I’ve always had some activity from bots, but it has been manageable. However, in mid-April 2022, bot activity started to increase dramatically. I didn’t notice at first because the web analytics only shows real users, and the unusual activity could only be seen by looking at the server logs. I initially suspected that it was another search engine scraping results and showing them on their results page, because the IP addresses, user agents and search queries were all different. I then started to wonder if it was a DDoS attack, as the scale of the problem and the impact it was having on the servers (and therefore running costs) started to become apparent. After some deeper investigation, I noticed that most of the search queries followed a similar pattern. … It turns out that these search patterns are ‘scraping footprints’. These are used by the SEO practitioners, when combined with their search terms, to search for URLs to target, implying that searchmysite.net has been listed as a search engine in one or more SEO tools like ScrapeBox, GSA SEO or SEnuke. It is hard to imagine any legitimate white-hat SEO techniques requiring these search results, so I would have to imagine it is for black-hat SEO operations.”

Meanwhile, Lewis’ site has seen very little traffic from actual humans. Though it might be tempting to accuse major search engines of deliberately downplaying the competition, he suspects the site is simply drowning in a sea of SEO spam. Are real people browsing the Web anymore, as opposed to lapping up whatever social media sites choose to dish out? A few, but they are increasingly difficult to detect within the crowd of bots looking to make a buck.

Cynthia Murrell, September 2, 2022

Scraping By: A Winner Business Model

May 23, 2022

Will Microsoft-owned LinkedIn try, try, try again? The platform’s latest attempt to protect its users’ data from being ransacked has been thwarted, TechCrunch reveals in, “Web Scraping Is Legal, US Appeals Court Reaffirms.” The case reached the Supreme Court last year, but SCOTUS sent it back down to the Ninth Circuit of Appeals for a re-review. That court reaffirmed its original finding: scraping publicly accessible data is not a violation of the decades-old Computer Fraud and Abuse Act (CFAA). It is a decision to celebrate or to lament, depending on one’s perspective. A threat to the privacy of those who use social media and other online services, the practice is integral to many who preserve, analyze, and report information. Writer Zack Whittaker explains:

“The Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo. But there have been egregious cases of web scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting several tech giants to file lawsuits against the startup. Several companies, including Facebook, Instagram, Parler, Venmo and Clubhouse have all had users’ data scraped over the years. The case before the Ninth Circuit was originally brought by LinkedIn against Hiq Labs, a company that uses public data to analyze employee attrition. LinkedIn said Hiq’s mass web scraping of LinkedIn user profiles was against its terms of service, amounted to hacking and was therefore a violation of the CFAA.”

The Ninth Circuit disagreed. Twice. In the latest decision, the court pointed to last year’s Supreme Court ruling which narrowed the scope of the CFAA to those who “gain unauthorized access to a computer system,” as opposed to those who simply exceed their authorization. A LinkedIn spokesperson expressed disappointment, stating the platform will “continue to fight” for its users’ rights over their data. Stay tuned.

Cynthia Murrell, May 23, 2022

UK Bill Would Require Age Verification

February 25, 2022

It might seem like a no-brainer—require age verification to protect children from adult content wherever it may appear online. But The Register insists it is not so simple in, “UK.gov Threatens to Make Adults Give Credit Card Details for Access to Facebook or TikTok.” The UK’s upcoming Online Safety Bill will compel certain websites to ensure users are 18 or older, a process often done using credit card or other sensitive data. Though at first the government vowed this requirement would only apply to dedicated porn sites, a more recent statement from the Department for Digital, Culture, Media, and Sport indicates social media companies will be included. The statement notes research suggests such sites are common places for minors to access adult material.

Writer Gareth Corfield insists the bill will not even work because teenagers are perfectly capable of using a VPN to get around age verification measures. Meanwhile, adults following the rules will have to share sensitive data with third-party gatekeepers just to keep up with friends and family on social media. Then there is the threat to encryption, which would have to be discontinued to enable the bills provision for scanning social media posts. Civil liberties groups have expressed concern, just as they did the last time around. Corfield observes:

“Prior efforts for mandatory age verification controls were originally supposed to be inserted into Digital Economy Act but were abandoned in 2019 after more than one delay. At that time, the government had designated the British Board of Film Classification, rather than Ofcom, as the age verification regulator. In 2018, it estimated that legal challenges to implementing the age check rules could cost it up to £10m in the first year alone. As we pointed out at the time, despite what lawmakers would like to believe – it’s not a simple case of taking offline laws and applying them online. There are no end of technical and societal issues thrown up by asking people to submit personal details to third parties on the internet. … The newer effort, via the Online Safety Bill, will possibly fuel Britons’ use of VPNs and workarounds, which is arguably equally as risky: free VPNs come with a lot of risks and even paid products may not always work as advertised.”

So if this measure is not viable, what could be the solution to keeping kids away from harmful content? If only each child could be assigned one or more adults responsible for what their youngsters access online. We could call them “caregivers,” “guardians,” or “parents,” perhaps.

Cynthia Murrell, February 25, 2022

Coalesce: Tackling the Bottleneck Few Talk About

February 1, 2022

Coalesce went stealth, the fancier and more modern techno slang for “going dark,” to work on projects in secret. The company has returned to the light, says Crowd Fund Insider with a robust business plan and product, plus loads of funding: “Coalesce Debuts From Stealth, Attracts $5.92M For Analytics Platform.”

Coalesce is run by a former Oracle employee and it develops products and services similar to Oracle, but with a Marklogic spin. That is one way to interpret how Coalesce announced its big return with its Coalesce Data Transformation platform that offers modeling, cleansing, governance, and documentation of data with analytical efficiency and flexibility. Do no forger that 11.2 Capital and GreatPoint Ventures raised $5.92 million in seed funding for the new data platform. Coalesce plans to use the funding for engineering functions, developing marketing strategy, and expanding sales.

Coalesce noticed that there is a weak link between organizations’ cloud analytics and actively making use of data:

“ ‘The largest bottleneck in the data analytics supply chain today is transformations. As more companies move to the cloud, the weaknesses in their data transformation layer are becoming apparent,’ said Armon Petrossian, the co-founder and CEO of Coalesce. “Data teams are struggling to keep up with the demands from the business, and this problem has only continued to grow with the volumes and complexity of data combined with the shortage of skilled people. We are on a mission to radically improve the analytics landscape by making enterprise-scale data transformations as efficient and flexible as possible.’”

Coalesce might be duplicating Oracle and MarkLogic, but if they have discovered a niche market in cloud analytics then they are about to rocket from their stealth. Hopefully the company will solve the transformation problem instead of issuing marketing statements as many other firms do.

Whitney Grace, February 1, 2022

Anonymized Location Data: an Oxymoron?

May 13, 2020

Location data. To many the term sounds innocuous, boring really. Perhaps that is why society has allowed apps to collect and sell it with no significant regulation. An engaging (and well-illustrated) piece from Norway’s NRK News, “Revealed by Mobile,” shares the minute details journalists were able to put together about one citizen from location data purchased on the open market. Graciously, this man allowed the findings to published as a cautionary tale. We suggest you read the article for yourself to absorb the chilling reality. (The link we share above runs through Google Translate.)

Vendors of location data would have us believe the information is completely anonymized and cannot be tied to the individuals who generated it. It is only good for general uses like statistics and regional marketing, they assert. Intending to put that claim to the test, NRK purchased a batch of Norwegian location data from the British firm Tamoco. Their investigation shows anonymization is an empty promise. Though the data is stripped of directly identifying information, buyers are a few Internet searches away from correlating location patterns with individuals. Journalists Trude Furuly, Henrik Lied, and Martin Gundersen tell us:

“All modern mobile phones have a GPS receiver, which with the help of satellite can track the exact position of the phone with only a few meters distance. The position data NRK acquired consisted of a table with four hundred million map coordinates from mobiles in Norway. …

“All the coordinates were linked to a date, time, and specific mobile. Thus, the coordinates showed exactly where a mobile or tablet had been at a particular time. NRK coordinated the mobile positions with a map of Norway. Each position was marked on the map as an orange dot. If a mobile was in a location repeatedly and for a long time, the points formed larger clusters. Would it be possible for us to find the identity of a mobile owner by seeing where the phone had been, in combination with some simple web searches? We selected a random mobile from the dataset.

“NRK searched the address where the mobile had left many points about the nights. The search revealed that a man and a woman lived in the house. Then we searched their Facebook profiles. There were several pictures of the two smiling together. It seemed like they were boyfriend and girlfriend. The man’s Facebook profile stated that he worked in a logistics company. When we searched the company in question, we discovered that it was in the same place as the person used to drive in the morning. Thus, we had managed to trace the person who owned the cell phone, even though the data according to Tamoco should have been anonymized.”

The journalists went on to put together a detailed record of that man’s movements over several months. It turns out they knew more about his trip to the zoo, for example, than he recalled himself. When they revealed their findings to their subject, he was shocked and immediately began deleting non-essential apps from his phone. Read the article; you may find yourself doing the same.

Cynthia Murrell, May 12, 2020

Enterprise Document Management: A Remarkable Point of View

March 3, 2020

DarkCyber spotted “What Is an Enterprise Document Management (EDM) System? How to Implement Full Document Control.” The write up is lengthy, running about 4,000 words. There are pictures like this one:

image

ECM is enterprise content management and in the middle is Enterprise Document Management which is abbreviated DMS, not EDM.

The idea is that documents have to be managed, and DarkCyber assumes that most organizations do not manage their content — regardless of its format — particularly well until the company is involved in a legal matter. Then document management becomes the responsibility of the lawyers.

In order to do any type of document or content management, employees have to follow the rules. The rules are the underlying foundation of the article. A company manufacturing interior panels for an automaker will have to have a product management system, an system to deal with drawings (paper and digital), supplier data, and other bits and pieces to make sure the “door cards” are produced.

The problem is that guidelines often do not translate into consistent employee behavior. One big reason is that the guidelines don’t fit into the work flows and the incentive schemes do not reward the time and effort required to make sure the information ends up in the “system.” Many professionals write something, text it, and move on. Enterprise systems typically do not track fine grained information very well.

Like enterprise search, the “document management” folks try to make workers who may be concerned about becoming redundant, a sick child, an angry boss, or any other perturbation in the consultant’s checklist ignore many information rules.

There is an association focused on records management. There are companies concerned with content management. There are vendors who focus on images, videos, audio, and tweets.

The myth that an EDM, ECM, or enterprise search system can create an affordable, non invasive, legally compliant, and effective way to deal with the digital fruit cake in organizations is worth lots of money.

The problem is that these systems, methods, guidelines, data lakes, federation technologies, smart software, etc. etc. don’t work.

The article does a good job of explaining what a consultant recommends. The information it presents provides fodder for the marketing animals who are going to help sell systems, training, and consulting.

The reality is that humans generate information and use a range of systems to produce content. Tweets about a missed shipment from a person mobile phone may be prohibited. Yeah, explain that to the person who got the order in the door and kept the commitment to the customer.

There are conferences, blogs, consulting firms, reports, and BrightPlanet videos about managing information.

The write up states:

There is no use documenting and managing poor workflows, processes, and documentation. To survive in business, you have to adapt, change and improve. That means continuously evaluating your business operations to identify shortfalls, areas for improvements, and strengths for continuous investment. Regular internal audits of your management systems will enable you to evaluate the effectiveness of your Enterprise Document Management solution.

Right. When these silver bullet, pie-in-the-sky solutions cost more than budgeted, employees quit using them, and triage costs threaten the survival of the company — call in the consultants.

Today’s systems do not work with the people actually doing information creation. As a result, most fail to deliver. Sound familiar? It should. You, gentle reader, will never follow the information rules unless you are specifically paid to follow them or given an ultimatum like “do this or get fired.”

Tweet that and let me know if you managed that information.

Stephen E Arnold, March 3, 2020

Blockchain: A Loser in 2020?

December 31, 2019

I recently completed a report about Amazon’s R&D work in blockchain. If you want a free summary of the report, write darkcyber333 at yandex dot com. If not, no problem. You will want to read “Please Blockchain, Prove Me Wrong.” The author likes to use words on some online services stop list, but that’s okay. The writer is passionate about the perceived failings of blockchain.

Blockchain is, according to the write up:

a solution looking for a problem.”

More proof needed, you gentle but skeptical reader? How about this?

According to Gartner’s Hype Cycle, blockchain is still “sliding into the trough of disillusionment,” meaning the technology is struggling to live up to the expectations created by the hype around it.

There you go. Proof from a marketing company.

DarkCyber’s view is that encryption is likely to continue to toddle forward. Also, the charm of the distributed database continues to woe some people’s attention.

There may be hope, and perhaps that is why Amazon has more than a dozen patents related to blockchain technology. We learn from the impassioned analysis:

Blockchain’s purported promise is such that everyone is willingly taking a multi-faceted approach, not giving much thought to the possibility that its potential may, in fact, be limited. Or maybe blockchain is just the first iteration of something far more powerful, a base we can build on to restore our faith in decentralized systems.

To sum up, for a dead duck, there are some feathers afloat. And there are those Amazon patents? Maybe Mr. Bezos is just off base and should stick to bulldozing outfits like mom and pop stores and outfits like FedEx?

Stephen E Arnold, December 31, 2019

Online Consumption of Data: A Mental Architecture Built on Inherent Addictive Patterns??

December 27, 2019

Two items caught my attention. The first explains that more than 80 percent of a sample group use a “second screen” when watching television. Yep, the boob tube and the vast wasteland. Marshall McLuan, a controversial figure, explained that TV is a kick back and vegetate medium. Punching buttons and formulating a thought for a tweet is hot. The article “88% of Americans Use a Second Screen While Watching TV. Why?” references the factoid that humans are not very adept at multi tasking. Interesting because humans can walk and chew gum, breathe, and think about crossing the street at the same time. But whatever. Also, the write up ignores the McLuhanesque approach that each type of media has its own “construct” or “mental evocation.”

The answer to “Why?” may be as simple as, “Addiction. Just a TV and a computing device.” Can one get the monkey off one’s back? Not easily.

Who can assist another? Consider if this item of information is correct: “70% Parents Cannot Control Their Own Online Activity.” This write up reports:

Around 70 per cent of parents admit that they themselves spend too much time online and 72 per cent feel that internet and mobile device usage in general is impeding family life…

Net net: No wonder information has to be crunchy. Easy to use is becoming a strategy for control. Interesting implications for 2020 and beyond if these two reports are mostly accurate.

Stephen E Arnold, December 27, 2019

Next Page »

  • Archives

  • Recent Posts

  • Meta