Featured
Entity Extraction: Not As Simple As Some Vendors Say
No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney.
Most systems that incorporate entity extraction have been trained to recognize the names of simple entities, relying mostly on capitalization. An “entity” can be a person’s name, the name of an organization, or a location like Niagara Falls, near Buffalo, New York. The river name “Niagara,” when bound to “Falls,” denotes a geologic feature. “Buffalo” here is not a member of the Bubalina; it is a delightful city with even more pleasing weather.
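To make the capitalization point concrete, here is a minimal sketch in Python. The regular expression and the function name naive_entities are illustrative assumptions, not any vendor's method. It treats each run of capitalized words as a candidate entity and finds nothing once the capitals are gone.

```python
import re

# Assumed heuristic: one or more consecutive capitalized words form a candidate entity.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

def naive_entities(text: str) -> list[str]:
    """Return runs of capitalized words in order of appearance."""
    return CAPITALIZED_RUN.findall(text)

# Well-formed text: the heuristic binds "Niagara" to "Falls".
print(naive_entities("we drove from Buffalo to Niagara Falls last week"))

# The same names typed in lowercase yield no candidates at all.
print(naive_entities("niagara falls near buffalo"))
```

The heuristic also has no way to tell a city from an animal; it only sees that a word starts with a capital letter.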
The same entity extraction process has to work for specialized software used by law enforcement, intelligence agencies, and legal professionals. Unlike vendors of consumer-facing applications like Google’s Web search or Apple Maps, the specialized software vendors have to contend with:
- Gang slang in English and other languages; for example, “bumble bee.” This is not an insect; it is a nickname for the Latin Kings.
- Organizations operating in Lao PDR whose names have been converted to English words, like Zhao Wei’s Kings Romans Casino. Mr. Wei has allegedly been involved in gambling activities in a poorly regulated region in the Golden Triangle.
- Individuals who use aliases like maestrolive, james44123, or ahmed2004. Either there are “real” people behind the handles, or the handles are sock puppets (fake identities).
Why do these variations create a challenge? In order to locate a business, the content processing system has to identify the entity the user seeks. For an investigator, chopping through a thicket of language and idiosyncratic personas is the difference between making progress and hitting a dead end. Automated entity extraction systems can work using smart software, a carefully crafted and constantly updated controlled vocabulary list, or a hybrid of the two.
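One way to picture a hybrid approach is the sketch below. Everything in it is an assumption for illustration: the VOCABULARY entries are taken from the examples in this post, the HANDLE pattern only catches aliases that end in digits (so it would miss maestrolive), and the capitalization fallback stands in for whatever “smart software” a real system would use.

```python
import re

# Hypothetical controlled vocabulary: surface form -> (canonical name, type).
# A real list would need the constant curation described above.
VOCABULARY = {
    "bumble bee": ("Latin Kings", "gang nickname"),
    "kings romans casino": ("Kings Romans Casino", "organization"),
    "niagara falls": ("Niagara Falls", "location"),
}

# Assumed alias shape: lowercase letters followed by digits, e.g. james44123.
HANDLE = re.compile(r"\b[a-z]+\d+\b")
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

def extract(text: str) -> list[tuple[str, str]]:
    """Vocabulary lookup first, then handles, then a capitalization fallback."""
    found = []
    covered = []  # character spans already explained by the vocabulary
    for phrase, (canonical, kind) in VOCABULARY.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            found.append((canonical, kind))
            covered.append(match.span())
    for match in HANDLE.finditer(text):
        found.append((match.group(), "handle"))
    for match in CAPITALIZED_RUN.finditer(text):
        overlaps = any(match.start() < end and start < match.end()
                       for start, end in covered)
        if not overlaps:
            # Capitalized but not in the vocabulary: flag for human review.
            found.append((match.group(), "unverified"))
    return found

for name, kind in extract("Bumble bee met james44123 near Niagara Falls and Zhao Wei"):
    print(f"{kind}: {name}")
```

The vocabulary resolves the slang term to the organization it stands for, which capitalization alone cannot do. Names outside the list are still surfaced, but only as unverified candidates.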
Let’s take an example that confronts a person looking for information about the Ku Group. This is a financial services firm responsible for Kucoin. The Ku Group is interesting because it has been found guilty in the US for certain financial activities in the State of New York and by the US Securities & Exchange Commission.
Interviews
DarkCyber, March 29, 2022: An Interview with Chris Westphal, DataWalk
Chris Westphal is the Chief Analytics Officer of DataWalk, a firm providing an investigative and analysis tool to commercial and government organizations. The 12-minute interview covers DataWalk’s unique capabilities, its data and information resources, and the firm’s workflow functionality. The video can be viewed on YouTube at this location.
Stephen E Arnold, March 29, 2022
Latest News
2025 Consulting Jive
Here you go. I have extracted a list of the jargon one needs to write reports, give talks, and mesmerize those with a desire to be the smartest people in the room: Agentic...
Modern Management Revealed and It Is Jaundiced with a Sickly Yellowish Cast
This blog post is the work of an authentic dinobaby. No smart software was used. I was zipping through the YCombinator list of “important” items and spotted...
MUT Bites: Security Perimeters May Not Work Very Well
This blog post is the work of an authentic dinobaby. No smart software was used. I spotted a summary of an item in Ars Technica which recycled a report from Checkmarx...
Juicing Up RAG: The RAG Bop Bop
Can improved information retrieval techniques lead to more relevant data for AI models? One startup is using a pair of existing technologies to attempt just that....
Does Apple Thinks Google Is Inept?
At a pre-holiday get together, I heard Wilson say, “Don’t ever think you’re completely useless. You can always be used as a bad example.”...
Anthropic Gifts a Feeling of Safety: Insecurity Blooms This Holiday Season
Written by a dinobaby, not an over-achieving, unexplainable AI system. TechCrunch published “Google Is Using Anthropic’s Claude to Improve Its Gemini AI.”...
FReE tHoSe smaRT SoFtWarEs!
No smart software involved. Just a dinobaby’s work. Do you have the list of stop words you use in your NLP prompts? (If not, click here.) You are not happy when...
McKinsey Takes One for the Team
This blog post is the work of an authentic dinobaby. No smart software was used. I read the “real” news in “McKinsey & Company to Pay $650 Million for...
The Future: State Control of Social Media Access, Some Hope
It’s great that parents are concerned for their children’s welfare, especially when there are clear and documented dangers. The Internet has been in concerned...
VoIP in Russia, Nyet. Telegram Voice, Nyet. Just Not Yet
Written by a dinobaby, not an over-achieving, unexplainable AI system. PCNews.ru in everyone’s favorite special operations center reported that Roskomnadzor (a...