FOGINT: Security Tools Over Promise & Under Deliver

November 22, 2024

While the United States and the rest of the world have been obsessed with the fallout of the former’s presidential election, bad actors planned terrorist plots. I24 News reports that after a soccer/football match in Amsterdam, there was a preplanned attack on Israeli fans: “Evidence From WhatsApp, Telegram Groups Shows Amsterdam Pogrom Was Organized.”

The Daily Telegraph located screenshots from WhatsApp and Telegram that displayed messages calling for a “Jew Hunt” after the game. The message writers were identified as pro-Palestinian supporters. The bad actors also called Jews “cancer dogs,” a vile slur in Dutch, and told co-conspirators to bring fireworks to the planned attack. Dutch citizens and other observers were underwhelmed by the response of the Netherlands’ law enforcement. Even King Willem-Alexander noted that his country failed to protect the Jewish community when he spoke with Israeli President Isaac Herzog:

“Dutch king Willem-Alexander reportedly said to Israel’s President Isaac Herzog in a phone call on Friday morning that the ‘we failed the Jewish community of the Netherlands during World War II, and last night we failed again.’”

This is an unfortunate example of the failure of cyber security tools that monitor social media. If this was a preplanned attack and the Daily Telegraph located the messages, then a cyber security company should have as well. These police ware and intelware systems failed to alert authorities. Is this another confirmation that cyber security and threat intelligence tools over promise and under deliver? Well, T-Mobile is compromised again and there is that minor lapse in Israel in October 2023.

Whitney Grace, November 22, 2024

More Googley Human Resource Goodness

November 22, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The New York Post reported that a Googler has departed. “Google News Executive Shailesh Prakash Resigns As Tensions with Publishers Mount: Report” states:

Shailesh Prakash had served as a vice president and general manager for Google News. A source confirmed that he is no longer with the company… The circumstances behind Prakash’s resignation were not immediately clear. Google declined to comment.

Google tapped a professional who allegedly rode in the Bezos bulldozer when the world’s second or third richest man acquired the Washington Post. (How has that been going? Yeah.)

Thanks, MidJourney. Good enough.

Google has been cheerfully indexing content and selling advertising for decades. After a number of years of talking and allegedly providing some support to outfits collecting, massaging, and making “real” news available, the Google is facing some headwinds.

The article reports:

The Big Tech giant rankled online publishers last May after it introduced a feature called “AI Overviews” – which places an auto-generated summary at the top of its search results while burying links to other sites. News Media Alliance, a nonprofit that represents more than 2,200 publishers, including The Post, said the feature would be “catastrophic to our traffic” and has called on the feds to intervene.

News flash from rural Kentucky: The good old days of newspaper publishing are unlikely to make a comeback. What’s the evidence for this statement? Video and outfits like Telegram and WhatsApp deliver content to cohorts who don’t think too much about a print anything.

The article pointed out:

Last month, The Post exclusively reported on emails that revealed how Google leveraged its access to the Office of the US Trade Representative as it sought to undermine overseas regulations — including Canada’s Online News Act, which required Google to pay for the right to display news content.

You can read that report “Google Emails with US Trade Reps Reveal Cozy Ties As Tech Giant Pushed to Hijack Policy” if you have time.

Let’s think about why a member of Google leadership like Shailesh Prakash would bail out. Among the options are:

  1. He wanted to spend more time with his family
  2. Another outfit wanted to hire him to manage something in the world of publishing
  3. He failed in making publishers happy.

The larger question is, “Why would Google think that one fellow could make a multi-decade problem go away?” The fact that I can ask this question reveals how Google’s consulting infused leaders think about an entire business sector. It also provides some insight into the confidence of a professional like Mr. Prakash.

What flees sinking ships? Certainly not the lawyers that Google will throw at this “problem.” Google has money and that may be enough to buy time and perhaps prevail. If there aren’t any publishers grousing, the problem gets resolved. Efficient.

Stephen E Arnold, November 22, 2024

Point-and-Click Coding: An eGame Boom Booster

November 22, 2024

TheNextWeb explains “How AI Can Help You Make a Computer Game Without Knowing Anything About Coding.” That’s great—unless one is a coder who makes one’s living on computer games. Writer Daniel Zhou Hao begins with a story about one promising young fellow:

Take Kyo, an eight-year-old boy in Singapore who developed a simple platform game in just two hours, attracting over 500,000 players. Using nothing but simple instructions in English, Kyo brought his vision to life leveraging the coding app Cursor and also Claude, a general purpose AI. Although his dad is a coder, Kyo didn’t get any help from him to design the game and has no formal coding education himself. He went on to build another game, an animation app, a drawing app and a chatbot, taking about two hours for each. This shows how AI is dramatically lowering the barrier to software development, bridging the gap between creativity and technical skill. Among the range of apps and platforms dedicated to this purpose, others include Google’s AlphaCode 2 and Replit’s Ghostwriter.

The write-up does not completely leave experienced coders out of the discussion. Hao notes tools like Tabnine and GitHub Copilot act as auto-complete assistants, while Sourcery and DeepCode take the tedium out of code cleanup. For the 70-ish percent of companies that have adopted one or more of these tools, he tells us, the benefits include time savings and more reliable code. Does this mean developers will shift to “higher value tasks,” like creative collaboration and system design, as Hao insists? Or will it just mean firms will lighten their payrolls?

As for building one’s own game, the article lists seven steps. They are akin to basic advice for developing a product, but with an AI-specific twist. For those who want to know how to make one’s AI game addictive, contact benkent2020 at yahoo dot com.

Cynthia Murrell, November 22, 2024

China Smart, US Dumb: LLMs Bad, MoEs Good

November 21, 2024

Okay, an “MoE” is an alternative to LLMs. An “MoE” is a mixture of experts. An LLM is a one-trick pony starting to wheeze.

Google, Apple, Amazon, GitHub, OpenAI, Facebook, and other organizations are at the top of the list when people think about AI innovations. We forget about other countries and universities experimenting with the technology. Tencent is a China-based technology conglomerate located in Shenzhen, and it’s the world’s largest video game company when equity investments are considered. Tencent is also the developer of Hunyuan-Large, the world’s largest MoE.

According to Tencent, LLMs (large language models) are things of the past. LLMs served their purpose to advance AI technology, but Tencent realized that it was necessary to optimize resource consumption while simultaneously maintaining high performance. That’s when the company turned to the next evolution of LLMs: mixture of experts (MoE) models.

Cornell University’s open-access science archive posted this paper on the MoE: “Hunyuan-Large: An Open-Source MoE Model With 52 Billion Activated Parameters By Tencent” and the abstract explains it is a doozy of a model:

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activation parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large’s superior performance across various benchmarks including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits comparable performance when compared to the significantly larger LLama3.1-405B model. Key practice of Hunyuan-Large include large-scale synthetic data that is orders larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we also investigate the scaling laws and learning rate schedule of mixture of experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications.
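For readers wondering how 389 billion total parameters can shrink to 52 billion activated ones, here is a minimal sketch of generic top-k expert routing. It is not Tencent's "mixed expert routing strategy," which the abstract does not spell out. The toy experts, the gate scores, and the function names are all assumptions made up for illustration.

```python
import math

def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(gate_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    chosen_mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / chosen_mass) for i in chosen]

def moe_forward(x, experts, gate_scores, k):
    """Mix the outputs of only the k selected experts; the rest never run."""
    return sum(weight * experts[i](x) for i, weight in route_top_k(gate_scores, k))

# Four toy "experts" (hypothetical); a real expert is a feed-forward network.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
gate_scores = [0.1, 2.0, -1.0, 1.5]  # hypothetical gate output for one token

selected = route_top_k(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)

# Share of parameters active per token, using the figures in the abstract.
active_fraction = 52 / 389
```

With the abstract's numbers, roughly 13 percent of the model's parameters do work on any given token. That is where the resource savings come from.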

Tencent has released Hunyuan-Large as an open source project, so other AI developers can use the technology! The well-known companies will definitely be experimenting with Hunyuan-Large. Is there an ulterior motive? Sure. Money, prestige, and power are at stake in the AI global game.

Whitney Grace, November 21, 2024

Management Brilliance: Microsoft Suggests to Customers, “You Did It!”

November 21, 2024

No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney.

I read an amusing write up called “Microsoft Says Unexpected Windows Server 2025 Automatic Upgrades Were Due to Faulty Third-Party Tools.” I love a management action which points the fingers at “you” — Partners, customers, and anyone other than the raucous Redmond-ians.

Good enough, MidJourney. Good enough.

The write up says that Microsoft says:

“Some devices upgraded automatically to Windows Server 2025 (KB5044284). This was observed in environments that use third-party products to manage the update of clients and servers,” Microsoft explained. “Please verify whether third-party update software in your environment is configured not to deploy feature updates. This scenario has been mitigated.”

The article then provides a translation of Microsoftese:

In other words, it’s not Microsoft – it’s you. The company also added the update had the “DeploymentAction=OptionalInstallation” tag, which patch management tools should read as being an optional, rather than recommended update.
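What would "reading the tag" look like in a patch management tool? Below is a minimal sketch, assuming a made-up `Update` record and a made-up `split_updates` helper. Only the `DeploymentAction=OptionalInstallation` value and the KB5044284 identifier come from the article. The second catalog entry and its action value are placeholders, not real Microsoft metadata.

```python
from dataclasses import dataclass

OPTIONAL_TAG = "OptionalInstallation"  # value quoted in the article

@dataclass(frozen=True)
class Update:
    kb: str
    title: str
    deployment_action: str

def split_updates(updates, allow_optional=False):
    """Separate updates to deploy now from those held for an admin's approval."""
    deploy, held = [], []
    for update in updates:
        is_optional = update.deployment_action == OPTIONAL_TAG
        if is_optional and not allow_optional:
            held.append(update)
        else:
            deploy.append(update)
    return deploy, held

catalog = [
    Update("KB5044284", "Windows Server 2025", OPTIONAL_TAG),
    # Hypothetical entry; the KB number and action value are placeholders.
    Update("KB0000001", "Monthly security rollup", "Installation"),
]
deploy, held = split_updates(catalog)
```

In this sketch the optional update waits in `held` until a human opts in with `allow_optional=True`. A tool that skips the check deploys everything.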

Several observations:

  1. Pointing fingers works in some circumstances. Kindergarten type interactions feature the tactic.
  2. The problems of updates seem to be standard operating procedure.
  3. Bad actors love these types of reports because anecdotes about glitches and flaws say, “Come on in, folks.”

Is this a management strategy or an indicator of other issues?

Stephen E Arnold, November 21, 2024

Does Smart Software Forget?

November 21, 2024

A recent paper challenges the big dogs of AI, asking, “Does Your LLM Truly Unlearn? An Embarrassingly Simple Approach to Recover Unlearned Knowledge.” The study was performed by a team of researchers from Penn State, Harvard, and Amazon and published on research platform arXiv. True or false, it is a nifty poke in the eye for the likes of OpenAI, Google, Meta, and Microsoft, who may have overlooked the obvious. The abstract explains:

“Large language models (LLMs) have shown remarkable proficiency in generating text, benefiting from extensive training on vast textual corpora. However, LLMs may also acquire unwanted behaviors from the diverse and sensitive nature of their training data, which can include copyrighted and private content. Machine unlearning has been introduced as a viable solution to remove the influence of such problematic content without the need for costly and time-consuming retraining. This process aims to erase specific knowledge from LLMs while preserving as much model utility as possible.”

But AI firms may be fooling themselves about this method. We learn:

“Despite the effectiveness of current unlearning methods, little attention has been given to whether existing unlearning methods for LLMs truly achieve forgetting or merely hide the knowledge, which current unlearning benchmarks fail to detect. This paper reveals that applying quantization to models that have undergone unlearning can restore the ‘forgotten’ information.”

Oops. The team found as much as 83% of data thought forgotten was still there, lurking in the shadows. The paper offers an explanation for the problem and suggestions to mitigate it. The abstract concludes:

“Altogether, our study underscores a major failure in existing unlearning methods for LLMs, strongly advocating for more comprehensive and robust strategies to ensure authentic unlearning without compromising model utility.”
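How could quantization bring back "forgotten" material? One plausible intuition, assumed here and not taken from the quoted abstract: if unlearning only nudges weights slightly, rounding to a coarse grid can snap the nudged weights back to the same values as the originals. The toy weights, nudges, and grid step below are invented for illustration. Real quantization schemes are more involved.

```python
def quantize(weights, step):
    """Snap each weight to the nearest multiple of `step` (uniform rounding)."""
    return [round(w / step) * step for w in weights]

original = [0.52, -1.31, 0.08, 2.47]   # hypothetical weights before unlearning
nudges = [0.01, -0.02, 0.015, -0.01]   # hypothetical small unlearning updates
unlearned = [w + d for w, d in zip(original, nudges)]

step = 0.25  # coarse grid standing in for low-bit quantization
differs_at_full_precision = unlearned != original
same_after_quantization = quantize(unlearned, step) == quantize(original, step)
```

In this toy case the two sets of weights differ at full precision but become identical once rounded, so whatever the original weights "knew" is back.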

See the paper for all the technical details. Will the big tech firms take the researchers’ advice and improve their products? Or will they continue letting their investors and marketing departments lead them by the nose?

Cynthia Murrell, November 21, 2024

Short Snort: How to Find Undocumented APIs

November 20, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The essay / how to “All the Data Can Be Yours” does a very good job of providing a hacker road map. The information in the write up includes:

  1. Tips for finding undocumented APIs in GitHub
  2. Spotting “fetch” requests
  3. WordPress default APIs
  4. Information in robots.txt files
  5. Using the Google
  6. Examining JavaScripts
  7. Poking into mobile apps
  8. Some helpful resources and tools.

Each of these items includes details; for example, specific search strings and “how to make a taco” type of instructions. Assembling this write up took quite a bit of work.
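To give a flavor of the robots.txt item, here is a minimal sketch for auditing what a site one is responsible for reveals through its own file. The `disallowed_paths` helper and the sample file are assumptions for illustration. The sketch parses text only and makes no network requests.

```python
def disallowed_paths(robots_txt):
    """Return the paths listed on Disallow lines, in order, without duplicates."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths

# Hypothetical robots.txt content for a site you administer.
sample = """\
User-agent: *
Disallow: /api/internal/   # hypothetical path
Disallow: /wp-json/
Disallow:
Allow: /public/
"""
found = disallowed_paths(sample)
```

Anything in `found` is a path the site owner has announced to the world while asking crawlers to stay away. That is worth knowing before someone else reads the same file.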

Those engaged in cyber security (white, gray, and black hat types) will find the write up quite interesting.

I want to point out that I am not criticizing the information per se. I do want to remind those with a desire to share their expertise of three behaviors:

  1. Some computer science and programming classes in interesting countries use this type of information to provide students with what I would call hands on instruction
  2. Some governments, not necessarily aligned with US interests, provide the tips to the employees and contractors of certain government agencies to test and then extend the functionalities of the techniques presented in the write up
  3. Certain information might be more effectively distributed in other communication channels.

Stephen E Arnold, November 20, 2024

Europe Wants Its Own Search System: Filtering, Trees, and More

November 20, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I am not going to recount the history of search companies and government entities building an alternative to Google. One can toss in Bing, but Google is the Big Dog. Yandex is useful for Russian content. But there is a void even though Swisscows.com is providing anonymity (allegedly) and no tracking (allegedly).

Now a new European solution may become available. If you remember Pertimm, you probably know that Qwant absorbed some of that earlier search system’s goodness. And there is Ecosia, a search system which plants trees. The union of these two systems will be an alternative to Google. I think Exalead.com tried this before, but who remembers European search history in rural Kentucky?

“Two Upstart Search Engines Are Teaming Up to Take on Google” reports:

The for-profit joint venture, dubbed European Search Perspective and located in Paris, could allow the small companies and any others that decide to join up to reduce their reliance on Google and Bing and serve results that are better tailored to their companies’ missions and Europeans’ tastes.

A possible name or temporary handle for the new search system is EUSP or European Search Perspective. What’s interesting is that the plumbing will be provided by a service provider named OVH. Four years ago, OVHcloud became a strategic partner of … wait for it … Google. Apparently that deal does not prohibit OVH from providing services to a European alternative to Google.

Also, you may recall that Eric Schmidt, former adult in the room at Google, suggested that Qwant kept him awake at night. Yes, Qwant has been a threat to Google for 13 years. How has that worked out? The original Qwant was interesting with a novel way of showing results from different types of sources. Now Qwant is actually okay. The problem with any search system, including Bing, is that maintaining an index containing new content and refreshing or updating previously indexed content is a big, costly job. Toss in some AI goodness and cash burning furiously.

“Google” is now the word for search whether it works or does not. Perhaps regulatory actions will alter the fact that in Denmark, 99 percent of user queries flow to Google. Yep, Denmark. But one can’t go wrong with a ballpark figure like 95 percent of search queries outside of China and a handful of other countries are part of the Google market share.

How will the new team tackle the Google? I hope in a way that delivers more progress than Cogito. Remember that? Okay, no problem.

PS. Is a 13-year-old company an upstart? Sigh.

Stephen E Arnold, November 20, 2024

FOGINT: Kenya Throttles Telegram to Protect KCSE Exam Integrity

November 20, 2024

Secondary school students in Kenya need to do well on their all-encompassing final exam if they hope to go to college. Several Telegram services have emerged to assist students through this crucial juncture—by helping them cheat on the test. Authorities caught on to the practice and have restricted Telegram usage during this year’s November exams. As a result, reports Kenyans.co.ke, “NetBlocks Confirms Rising User Frustrations with Telegram Slowdown in Kenya.” Since Telegram is Kenya’s fifth most downloaded social-media platform, that is a lot of unhappy users. Writer Rene Otinga tells us:

“According to an internet observatory, NetBlocks, Telegram was restricted in Kenya with their data showing the app as being down across various internet providers. Users across the country have reported receiving several error messages while trying to interact with the app, including a ‘Connecting’ error when trying to access the Telegram desktop. However, a letter shared online from the Communications Authority of Kenya (CAK) also confirmed the temporary suspension of Telegram services to quell the perpetuation of criminal activities.”

Apparently, the restriction worked. We learn:

“On Friday, Education Principal Secretary Belio Kipsang said only 11 incidents of attempted sneaking of mobile phones were reported across the country. While monitoring examinations in Kiambu County, the PS said this was the fewest number of cheating cases the ministry had experienced in recent times.”

That is good news for honest students in Kenya. But for Telegram, this may be just the beginning of its regulatory challenges. Otinga notes:

“Governments are wary of the app, which they suspect is being used to spread disinformation, spread extremism, and in Kenya, promote examination cheating. European countries are particularly critical of the app, with the likes of Belarus, Russia, Ukraine, Germany, Norway, and Spain restricting or banning the messaging app altogether.”

Encryption can hide a multitude of sins. But when regulators are paying attention, it might not be enough to keep one out of hot water.

Cynthia Murrell, November 20, 2024

Content Conversion: Search and AI Vendors Downplay the Task

November 19, 2024

No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney.

Marketers and PR people often have degrees in political science, communications, or art history. This academic foundation means that some of these professionals can listen to a presentation and struggle to figure out what’s a horse, what’s horse feathers, and what’s horse output.

Consequently, many organizations engaged in “selling” enterprise search, smart software, and fusion-capable intelligence systems downplay or just fib about how darned easy it is to take “content” and shove it into the Fancy Dan smart software. The pitch goes something like this: “We have filters that can handle 90 percent of the organization’s content. Word, PowerPoint, Excel, Portable Document Format (PDF), HTML, XML, and data from any system that can export tab delimited content. Just import and let our system increase your ability to analyze vast amounts of content. Yada yada yada.”

Thanks, Midjourney. Good enough.

The problem is that real life content is often a problem. I am not going to trot out my list of content problem children. Instead I want to ask a question: If dealing with content is a slam dunk, why do companies like IBM and Oracle sustain specialized tools to convert Content Type A into Content Type B?

The answer is that content processing is an essential step because [a] structured and unstructured content can exist in different versions. Figuring out the one that is least wrong and most timely is tricky. [b] Humans love mobile devices, laptops, home computers, photos, videos, and audio. Furthermore, how does a content processing system get those types of content from a source not located in an organization’s office (assuming it has one) and into the content processing system? The answer is, “Money, time, persuasion, and knowledge of what employee has what.” Finding a unicorn at the Kentucky Derby is more likely. [c] Specialized systems employ lingo like “Export as” and provide some file types. Yeah. The problem is that the output may not contain everything that is in the specialized software program. Examples range from computational chemistry systems to those nifty AutoCAD type drawing systems to slick electronic trace routing solutions to DaVinci Resolve video systems which can happily pull “content” from numerous places on a proprietary network set up. Yeah, no problem.

Evidence of how big this content conversion issue is appears in the IBM write up “A New Tool to Unlock Data from Enterprise Documents for Generative AI.” If the content conversion work is trivial, why is IBM wasting time and brainpower figuring out something like making a PowerPoint file smart software friendly?

The reason is that as big outfits get “into” smart software, the people working on the project find that the exception folder gets filled up. Some documents and content types don’t convert. If a boss asks, “How do we know the data in the AI system are accurate?”, the hapless IT person looking at the exception folder either lies or says in a professional voice, “We don’t have a clue.”

IBM’s write up says:

IBM’s new open-source toolkit, Docling, allows developers to more easily convert PDFs, manuals, and slide decks into specialized data for customizing enterprise AI models and grounding them on trusted information.

But one piece of software cannot do the job. That’s why IBM reports:

The second model, TableFormer, is designed to transform image-based tables into machine-readable formats with rows and columns of cells. Tables are a rich source of information, but because many of them lie buried in paper reports, they’ve historically been difficult for machines to parse. TableFormer was developed for IBM’s earlier DeepSearch project to excavate this data. In internal tests, TableFormer outperformed leading table-recognition tools.

Why are these tools needed? Here’s IBM’s rationale:

Researchers plan to build out Docling’s capabilities so that it can handle more complex data types, including math equations, charts, and business forms. Their overall aim is to unlock the full potential of enterprise data for AI applications, from analyzing legal documents to grounding LLM responses on corporate policy documents to extracting insights from technical manuals.

Based on my experience, the paragraph translates as, “This document conversion stuff is a killer problem.”

When you hear a trendy enterprise search or enterprise AI vendor talk about the wonders of its system, be sure to ask about document conversion. Here are a few questions to put the spotlight on what often becomes a black hole of costs:

  • If I process 1,000 pages of PDFs, mostly text but with some charts and graphs, what’s the error rate?
  • If I process 1,000 engineering drawings with embedded product and vendor data, what percentage of the content is parsed for the search or AI system?
  • If I process 1,000 non text objects like videos and iPhone images, what is the time required and the metadata accuracy for the converted objects?
  • Where do unprocessable source objects go? An exception folder, the trash bin, or to my in box for me to fix up?
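The questions above boil down to two numbers a vendor should be able to produce: what landed in the exception folder and what fraction of the batch that represents. Here is a minimal sketch, assuming a made-up `convert_batch` function and a toy `plain_text` converter. It is not how Docling or any vendor's system works.

```python
def convert_batch(documents, converters):
    """Convert what we can; park everything else in an exception list.

    `documents` maps a file name to its raw content. `converters` maps a
    file extension to a function that returns text or raises ValueError.
    """
    converted, exceptions = {}, {}
    for name, content in documents.items():
        extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        converter = converters.get(extension)
        if converter is None:
            exceptions[name] = "no converter for this file type"
            continue
        try:
            converted[name] = converter(content)
        except ValueError as error:
            exceptions[name] = str(error)
    error_rate = len(exceptions) / len(documents) if documents else 0.0
    return converted, exceptions, error_rate

def plain_text(content):
    """Hypothetical converter: accept only content that decodes as UTF-8."""
    try:
        return content.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("content is not valid UTF-8 text") from None

# Hypothetical batch: one clean file, one garbled file, one unsupported type.
documents = {
    "policy.txt": b"Travel policy, revision 3",
    "scan.txt": b"\xff\xfe\x00garbled",
    "drawing.dwg": b"\x00\x01binary",
}
converted, exceptions, error_rate = convert_batch(documents, {"txt": plain_text})
```

In this toy batch two of three objects end up in `exceptions`. If a vendor cannot show you the equivalent of that dictionary for your own content, the black hole of costs is already open.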

Have fun asking questions.

Stephen E Arnold, November 19, 2024
