Featured

Entity Extraction: Not As Simple As Some Vendors Say

No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney.

Most systems incorporating entity extraction have been trained to recognize the names of simple entities, relying mostly on capitalization. An “entity” can be a person’s name, the name of an organization, or a location like Niagara Falls, near Buffalo, New York. The river name “Niagara,” when bound to “Falls,” denotes a geologic feature. The “Buffalo” is not a Bubalina; it is a delightful city with even more pleasing weather.
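To make the capitalization point concrete, here is a minimal sketch of that kind of heuristic. It is an illustration only, not any vendor's method; the function name `extract_capitalized` and the sample sentences are assumptions made up for this example.

```python
import re

# Runs of consecutive capitalized words, e.g. "Niagara Falls" or "New York".
# Assumption: plain ASCII capitalization is the only signal used.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_capitalized(text: str) -> list[str]:
    """Return runs of capitalized words as candidate entities."""
    return CAPITALIZED_RUN.findall(text)


if __name__ == "__main__":
    # Place names are found, but so is the sentence-initial "We".
    print(extract_capitalized("We drove to Niagara Falls, near Buffalo, New York."))
    # Lowercase slang and handles produce no candidates at all.
    print(extract_capitalized("the bumble bee crew met maestrolive"))
```

The first call returns the place names plus a false positive for the word that starts the sentence. The second returns nothing, which is the gap specialized systems have to close.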

The same entity extraction process has to work for specialized software used by law enforcement, intelligence agencies, and legal professionals. Compared to entity extraction for consumer-facing applications like Google’s Web search or Apple Maps, the specialized software vendors have to contend with:

  • Gang slang in English and other languages; for example, “bumble bee.” This is not an insect; it is a nickname for the Latin Kings.
  • Organizations operating in Lao PDR whose names are converted to English words, like Zhao Wei’s Kings Romans Casino. Mr. Zhao has allegedly been involved in gambling activities in a poorly regulated region in the Golden Triangle.
  • Individuals who use aliases like maestrolive, james44123, or ahmed2004. Either “real” people are behind the handles, or the handles are sock puppets (fake identities).

Why do these variations create a challenge? To locate a business, the content processing system has to identify the entity the user seeks. For an investigator, chopping through a thicket of language and idiosyncratic personas is the difference between making progress and hitting a dead end. Automated entity extraction systems can work using smart software, a carefully crafted and constantly updated controlled vocabulary list, or a hybrid system.
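A hybrid approach can be sketched roughly as follows: a controlled vocabulary is consulted first, and a capitalization heuristic supplies unverified candidates for whatever the list does not cover. Everything here is hypothetical, including the `CONTROLLED_VOCABULARY` entries (taken from the examples above), the type labels, and the function name `extract_hybrid`. A production list would need the constant updating described above.

```python
import re

# Hypothetical controlled vocabulary: surface form -> (canonical name, type).
# Entries are illustrative and would need ongoing curation in practice.
CONTROLLED_VOCABULARY = {
    "bumble bee": ("Latin Kings", "gang nickname"),
    "kings romans casino": ("Kings Romans Casino", "organization"),
    "maestrolive": ("maestrolive", "online handle"),
}

# Fallback heuristic: runs of consecutive capitalized words.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_hybrid(text: str) -> list[tuple[str, str]]:
    """Vocabulary matches first, then capitalized runs not already covered."""
    found = []    # (start offset, name, type)
    covered = []  # character spans claimed by vocabulary matches
    for surface, (canonical, kind) in CONTROLLED_VOCABULARY.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            found.append((match.start(), canonical, kind))
            covered.append((match.start(), match.end()))
    for match in CAPITALIZED_RUN.finditer(text):
        overlaps = any(
            match.start() < end and start < match.end()
            for start, end in covered
        )
        if not overlaps:
            found.append((match.start(), match.group(), "unverified candidate"))
    # Report entities in the order they appear in the text.
    return [(name, kind) for _, name, kind in sorted(found)]


if __name__ == "__main__":
    sample = (
        "A source said the bumble bee crew met maestrolive at "
        "Kings Romans Casino, then left for Buffalo."
    )
    for name, kind in extract_hybrid(sample):
        print(f"{name}: {kind}")
```

In this sketch the slang term resolves to the Latin Kings and the handle is flagged, while “Buffalo” comes through only as an unverified candidate for an analyst or a smarter component to check.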

Let’s take an example that confronts a person looking for information about the Ku Group. This is a financial services firm responsible for KuCoin. The Ku Group is interesting because it has been found guilty in the US of certain financial activities, both in the State of New York and by the US Securities & Exchange Commission.

Read more »

Interviews

DarkCyber, March 29, 2022: An Interview with Chris Westphal, DataWalk

Chris Westphal is the Chief Analytics Officer of DataWalk, a firm providing an investigative and analysis tool to commercial and government organizations. The 12-minute interview covers DataWalk’s unique capabilities, its data and information resources, and the firm’s workflow functionality. The video can be viewed on YouTube at this location.

Stephen E Arnold, March 29, 2022

Latest News

FOGINT: Security Tools Over Promise & Under Deliver

While the United States and the rest of the world have been obsessed with the fallout of the former’s presidential election, bad actors planned terrorist plots.... Read more »

November 22, 2024 | Comment

More Googley Human Resource Goodness

This essay is the work of a dumb dinobaby. No smart software required. The New York Post reported that a Googler has departed. “Google News Executive Shailesh... Read more »

November 22, 2024 | Comment

Point-and-Click Coding: An eGame Boom Booster

TheNextWeb explains “How AI Can Help You Make a Computer Game Without Knowing Anything About Coding.” That’s great—unless one is a coder who makes one’s... Read more »

November 22, 2024 | Comment

China Smart, US Dumb: LLMs Bad, MoEs Good

Okay, an “MoE” is an alternative to LLMs. An “MoE” is a mixture of experts. An LLM is a one-trick pony starting to wheeze. Google, Apple, Amazon, GitHub,... Read more »

November 21, 2024 | Comment

Management Brilliance: Microsoft Suggests to Customers, “You Did It!”

No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney. I read an amusing write up called “Microsoft Says Unexpected Windows Server 2025 Automatic... Read more »

November 21, 2024 | Comment

Does Smart Software Forget?

A recent paper challenges the big dogs of AI, asking, “Does Your LLM Truly Unlearn? An Embarrassingly Simple Approach to Recover Unlearned Knowledge.” The study... Read more »

November 21, 2024 | Comment

Short Snort: How to Find Undocumented APIs

This essay is the work of a dumb dinobaby. No smart software required. The essay / how to “All the Data Can Be Yours” does a very good job of providing a hacker... Read more »

November 20, 2024 | Comment

Europe Wants Its Own Search System: Filtering, Trees, and More

This essay is the work of a dumb dinobaby. No smart software required. I am not going to recount the history of search companies and government entities building... Read more »

November 20, 2024 | Comment

FOGINT: Kenya Throttles Telegram to Protect KCSE Exam Integrity

Secondary school students in Kenya need to do well on their all-encompassing final exam if they hope to go to college. Several Telegram services have emerged to... Read more »

November 20, 2024 | Comment

Content Conversion: Search and AI Vendors Downplay the Task

No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney. Marketers and PR people often have degrees in political science, communications, or art history.... Read more »

November 19, 2024 | Comment

