Content Injection Can Have Unanticipated Consequences

February 24, 2025

The work of a real, live dinobaby. Sorry, no smart software involved. Whuff, whuff. That’s the sound of my swishing dino tail. Whuff.

Years ago I gave a lecture to a group of Swedish government specialists affiliated with the Forestry Unit. My topic was the procedure for causing certain common algorithms used for text processing to increase the noise in their procedures. The idea was to input certain types of text and numeric data in a specific way. (No, I will not disclose the methods in this free blog post, but if you have a certain profile, perhaps something can be arranged by writing benkent2020 at yahoo dot com. If not, well, that’s life.)

We focused on a handful of methods widely used in what now is called “artificial intelligence.” Keep in mind that most of the procedures are not new. There are some flips and fancy dancing introduced by individual teams, but the math is not invented by TikTok teens.

In my lecture, the forestry professionals wondered if these methods could be used to achieve specific objectives or “ends”. The answer was and remains, “Yes.” The idea is simple. Once methods are put in place, the algorithms chug along, some are brute force and others are probabilistic. Either way, content and data injections can be shaped, just like the gizmos required to make kinetic events occur.

The point of this forestry excursion is to make clear that a group of people, operating in a loosely coordinated manner can create data or content. Those data or content can be weaponized. When ingested by or injected into a content processing flow, the outputs of the larger system can be fiddled: More emphasis here, a little less accuracy there, and an erosion of whatever “accuracy” calculations are used to keep the system within the engineers’ and designers’ parameters. A plebian way to describe the goal: Disinformation or accuracy erosion.

I read “Meet the Journalists Training AI Models for Meta and OpenAI.” The write up explains that journalists without jobs or in search of extra income are creating “content” for smart software companies. The idea is that if one just does the Silicon Valley thing and sucks down any and all content, lawyers might come calling. Therefore, paying for “real” information is a better path.

Please, read the original article to get a sense of who is doing the writing, what baggage or mind set these people might bring to their work.

If the content is distorted — either intentionally or unintentionally — the impact of these content objects on the larger smart software system might have some interesting consequences. I just wanted to point out that weaponized information can have an impact. Those running smart software and buying content assuming it is just fine, might find some interesting consequences in the outputs.

Stephen E Arnold, February 24, 2025

Written by Stephen E. Arnold · Filed Under AI, Analytics, News, Text analytics, Text processing | 1 Comment

"Real" Entities or Sock Puppets? A New Solution Can Help Analysts and Investigators

January 28, 2025

Bitext’s NAMER (shorthand for "named entity recognition") can deliver precise entity tagging across dozens of languages.

Graphs — knowledge graphs and social graphs — have moved into the mainstream since Leonhard Euler formed the foundation for graph theory in the mid 18th century in Berlin.

With graphs, analysts can take advantage of smart software’s ability to make sense of Named Entity Recognition (NER), event extraction, and relationship mapping.

The problem is that humans change their names (handles, monikers, or aliases) for many reasons: Public embarrassment, a criminal record, a change in marital status, etc.

Bitext’s NER solution, NAMER, is specifically designed to meet the evolving needs of knowledge graph companies, offering exceptional features that tackle industry challenges.

Consider a person disgraced with involvement in a scheme to defraud investors in an artificial intelligence start up. The US Department of Justice published the name of a key actor in this scheme. (Source: https://www.justice.gov/usao-ndca/pr/founder-and-former-ceo-san-francisco-technology-company-and-attorney-indicted-years). The individual was identified by the court as Valerie Lau Beckman. The official court documents used the name "Lau" to reference her involvement in a multi-million dollar scam.

However, in order to correctly identify her in social media, subsequent news stories, and in possible public summaries of her training on a LinkedIn-type of smart software is not enough.

That’s the role of a specialized software solution. Here’s what NAMER delivers.

The system identifies and classifies entities (e.g., people, organizations, locations) in unstructured data. The system accurately links data across different sources of content. The NAMER technology can tag and link significant events (transactions, announcements) to maintain temporal relevance; for example, when Ms. Lau Beckman is discharged from the criminal process. NAMER can connect entities like Ms. Lau or Ms. Beckman to other individuals with whom she works or interacts and her "names" appearance in content streams.

The licensee specifies the languages NAMER is to process, either in a knowledge base or prior to content processing via a large language model.

Access to the proprietary NAMER technology is via a local SDK which is essential for certain types of entity analysis. NAMER can also be integrated into another system or provided as a "white label service" to enhance an intelligence system with NAMER’s unique functions. The developer provides for certain use cases direct access to the source code of the system.

For an organization or investigative team interested in keeping data about Lau Beckman at the highest level of precision, Bitext’s NAMER is an essential service.

Stephen E Arnold, January 28, 2025

Written by Stephen E. Arnold · Filed Under AI, intelware, News, Text analytics, Text processing | Comments Off on "Real" Entities or Sock Puppets? A New Solution Can Help Analysts and Investigators

More about NAMER, the Bitext Smart Entity Technology

January 14, 2025

A dinobaby product! We used some smart software to fix up the grammar. The system mostly worked. Surprised? We were.

We spotted more information about the Madrid, Spain based Bitext technology firm. The company posted “Integrating Bitext NAMER with LLMs” in late December 2024. At about the same time, government authorities arrested a person known as “Broken Tooth.” In 2021, an alert for this individual was posted. His “real” name is Wan Kuok-koi, and he has been in an out of trouble for a number of years. He is alleged to be part of a criminal organization and active in a number of illegal behaviors; for example, money laundering and human trafficking. The online service Irrawady reported that Broken Tooth is “the face of Chinese investment in Myanmar.”

Broken Tooth (né Wan Kuok-koi, born in Macau) is one example of the importance of identifying entity names and relating them to individuals and the organizations with which they are affiliated. A failure to identify entities correctly can mean the difference between resolving an alleged criminal activity and a get-out-of-jail-free card. This is the specific problem that Bitext’s NAMER system addresses. Bitext says that large language models are designed for for text generation, not entity classification. Furthermore, LLMs pose some cost and computational demands which can pose problems to some organizations working within tight budget constraints. Plus, processing certain data in a cloud increases privacy and security risks.

Bitext’s solution provides an alternative way to achieve fine-grained entity identification, extraction, and tagging. Bitext’s solution combines classical natural language processing solutions solutions with large language models. Classical NLP tools, often deployable locally, complement LLMs to enhance NER performance.

NAMER excels at:

Identifying generic names and classifying them as people, places, or organizations.
Resolving aliases and pseudonyms.
Differentiating similar names tied to unrelated entities.

Bitext supports over 20 languages, with additional options available on request. How does the hybrid approach function? There are two effective integration methods for Bitext NAMER with LLMs like GPT or Llama are. The first is pre-processing input. This means that entities are annotated before passing the text to the LLM, ideal for connecting entities to knowledge graphs in large systems. The second is to configure the LLM to call NAMER dynamically.

The output of the Bitext system can generate tagged entity lists and metadata for content libraries or dictionary applications. The NAMER output can integrate directly into existing controlled vocabularies, indexes, or knowledge graphs. Also, NAMER makes it possible to maintain separate files of entities for on-demand access by analysts, investigators, or other text analytics software.

By grouping name variants, Bitext NAMER streamlines search queries, enhancing document retrieval and linking entities to knowledge graphs. This creates a tailored “semantic layer” that enriches organizational systems with precision and efficiency.

For more information about the unique NAMER system, contact Bitext via the firm’s Web site at www.bitext.com.

Stephen E Arnold, January 14, 2025

Written by Stephen E. Arnold · Filed Under AI, cybercrime, intelware, law enforcement, News, Text processing | Comments Off on More about NAMER, the Bitext Smart Entity Technology

FOGINT: A Shocking Assertion about Israeli Intelligence Before the October 2023 Attack

January 13, 2025

A post from the FOGINT team.

One of my colleagues alerted me to a new story in the Jerusalem Post. The article is “IDF Could’ve Stopped Oct. 7 by Monitoring Hamas’s Telegram, Researchers Say.” The title makes clear that this is an “after action” analysis. Everyone knows that thinking about the whys and wherefores right of bang is a safe exercise. Nevertheless, let’s look at what the Jerusalem Post reported on January 5, 2025.

First, this statement:

“These [Telegram] channels were neither secret nor hidden — they were open and accessible to all.” — Lt.-Col. (res.) Jonathan Dahoah-Halevi

Telegram puts some “silent” barriers to prevent some third parties from downloading in real time active discussions. I know of one Israeli cyber security firm which asserts that it monitors Telegram public channel messages. (I won’t ask the question, “Why didn’t analysts at that firm raise an alarm or contact their former Israeli government employers with that information? Those are questions I will sidestep.)

Second, the article reports:

These channels [public Telegram channels like Military Tactics] were neither secret nor hidden — they were open and accessible to all. The “Military Tactics” Telegram channel even shared professional content showcasing the organization’s level of preparedness and operational capabilities. During the critical hours before the attack, beginning at 12:20 a.m. on October 7, the channel posted a series of detailed messages that should have raised red flags, including: “We say to the Zionist enemy, [the operation] coming your way has never been experienced by anyone,” “There are many, many, many surprises,” “We swear by Allah, we will humiliate you and utterly destroy you,” and “The pure rifles are loaded, and your heads are the target.”

Third, I circled this statement:

However, Dahoah-Halevi further asserted that the warning signs appeared much earlier. As early as September 17, a message from the Al-Qassam Brigades claimed, “Expect a major security event soon.” The following day, on September 18, a direct threat was issued to residents of the Gaza border communities, stating, “Before it’s too late, flee and leave […] nothing will help you except escape.”

The attack did occur, and it had terrible consequences for the young people killed and wounded and for the Israeli cyber security industry, which some believe is one of the best in the world. The attack suggested that marketing rather than effectiveness created an impression at odds with reality.

What are the lessons one can take from this report? The FOGINT team will leave that to you to answer.

Stephen E Arnold, January 13, 2025

Written by Stephen E. Arnold · Filed Under cybersecurity, Fogint, News, Text processing | Comments Off on FOGINT: A Shocking Assertion about Israeli Intelligence Before the October 2023 Attack

Juicing Up RAG: The RAG Bop Bop

December 26, 2024

Can improved information retrieval techniques lead to more relevant data for AI models? One startup is using a pair of existing technologies to attempt just that. MarkTechPost invites us to “Meet CircleMind: An AI Startup that is Transforming Retrieval Augmented Generation with Knowledge Graphs and PageRank.” Writer Shobha Kakkar begins by defining Retrieval Augmented Generation (RAG). For those unfamiliar, it basically combines information retrieval with language generation. Traditionally, these models use either keyword searches or dense vector embeddings. This means a lot of irrelevant and unauthoritative data get raked in with the juicy bits. The write-up explains how this new method refines the process:

“CircleMind’s approach revolves around two key technologies: Knowledge Graphs and the PageRank Algorithm. Knowledge graphs are structured networks of interconnected entities—think people, places, organizations—designed to represent the relationships between various concepts. They help machines not just identify words but understand their connections, thereby elevating how context is both interpreted and applied during the generation of responses. This richer representation of relationships helps CircleMind retrieve data that is more nuanced and contextually accurate. However, understanding relationships is only part of the solution. CircleMind also leverages the PageRank algorithm, a technique developed by Google’s founders in the late 1990s that measures the importance of nodes within a graph based on the quantity and quality of incoming links. Applied to a knowledge graph, PageRank can prioritize nodes that are more authoritative and well-connected. In CircleMind’s context, this ensures that the retrieved information is not only relevant but also carries a measure of authority and trustworthiness. By combining these two techniques, CircleMind enhances both the quality and reliability of the information retrieved, providing more contextually appropriate data for LLMs to generate responses.”

CircleMind notes its approach is still in its early stages, and expects it to take some time to iron out all the kinks. Scaling it up will require clearing hurdles of speed and computational costs. Meanwhile, a few early users are getting a taste of the beta version now. Based in San Francisco, the young startup was launched in 2024.

Cynthia Murrell, December 26, 2024

Written by Stephen E. Arnold · Filed Under AI, News, Text analytics | Comments Off on Juicing Up RAG: The RAG Bop Bop

Bitext NAMER: Simplifying Tracking of Translated Organizational Names

December 11, 2024

This blog post is the work of an authentic dinobaby. No smart software was used.

We wrote a short item about tracking Chinese names translated to English, French, or Spanish with widely varying spellings. Now Bitext’s entity extraction system can perform the same disambiguation for companies and non-governmental entities. Analysts may be looking for a casino which operates with a Chinese name. That gambling facility creates marketing collateral or gets news coverage which uses a different name or a spelling which is different from the operation’s actual name. As a result, missing a news item related to that operation is an on-going problem for some professionals.

Bitext has revealed that its proprietary technology can perform the same tagging and extraction process for organizational names in more than two dozen languages. In “Bitext NAMER Cracks Named Entity Recognition,” the company reports:

… issues arise with organizational names, such as “Sun City” (a place and enterprise) or aliases like “Yati New City” for “Shwe Koko”; and, in general, with any language that is written in non-Roman alphabet and needs transliteration. In fact, these issues affect to all languages that do not use Roman alphabet including Hindi, Malayalam or Vietnamese, since transliteration is not a one-to-one function but a one-to-many and, as a result, it generates ambiguity the hinders the work of analysts. With real-time data streaming into government software, resolving ambiguities in entity identification is crucial, particularly for investigations into activities like money laundering.

Unlike some other approaches — for instance, smart large language models — the Bitext NAMER technology:

Identifies correctly generic names
Performs type assignment; specifically, person, place, time, and organization
Tags AKA (also known as) and pseudonyms
Distinguishes simile names linked to unelated entitles; for example, Levo Chan.

The company says:

Our unique method enables accurate, multilingual entity detection and normalization for a variety of applications.

Bitext’s technology is used by three of the top five US companies listed on NASDAQ. The firm’s headquarters are in Madrid, Spain. For more information, contact the company via its Web site, www.bitext.com.

Stephen E Arnold, December 11, 2024

Written by Stephen E. Arnold · Filed Under AI, Indexing, News, Text processing | Comments Off on Bitext NAMER: Simplifying Tracking of Translated Organizational Names

Entity Extraction: Not As Simple As Some Vendors Say

November 19, 2024

No smart software. Just a dumb dinobaby. Oh, the art? Yeah, MidJourney.

Most of the systems incorporating entity extraction have been trained to recognize the names of simple entities and mostly based on the use of capitalization. An “entity” can be a person’s name, the name of an organization, or a location like Niagara Falls, near Buffalo, New York. The river “Niagara” when bound to “Falls” means a geologic feature. The “Buffalo” is not a Bubalina; it is a delightful city with even more pleasing weather.

The same entity extraction process has to work for specialized software used by law enforcement, intelligence agencies, and legal professionals. Compared to entity extraction for consumer-facing applications like Google’s Web search or Apple Maps, the specialized software vendors have to contend with:

Gang slang in English and other languages; for example, “bumble bee.” This is not an insect; it is a nickname for the Latin Kings.
Organizations operating in Lao PDR and converted to English words like Zhao Wei’s Kings Romans Casino. Mr. Wei has been allegedly involved in gambling activities in a poorly-regulated region in the Golden Triangle.
Individuals who use aliases like maestrolive, james44123, or ahmed2004. There are either “real” people behind the handles or they are sock puppets (fake identities).

Why do these variations create a challenge? In order to locate a business, the content processing system has to identify the entity the user seeks. For an investigator, chopping through a thicket of language and idiosyncratic personas is the difference between making progress or hitting a dead end. Automated entity extraction systems can work using smart software, carefully-crafted and constantly updated controlled vocabulary list, or a hybrid system.

Automated entity extraction systems can work using smart software, carefully-crafted and constantly updated controlled vocabulary list, or a hybrid system.

Let’s take an example which confronts a person looking for information about the Ku Group. This is a financial services firm responsible for the Kucoin. The Ku Group is interesting because it has been found guilty in the US for certain financial activities in the State of New York and by the US Securities & Exchange Commission.

Written by Stephen E. Arnold · Filed Under AI, Entity extraction, Feature, intelware, law enforcement, Text processing | 1 Comment

Another Reminder about the Importance of File Conversions That Work

October 18, 2024

Salesforce has revamped its business plan and is heavily investing in AI-related technology. The company is also acquiring AI companies located in Israel. CTech has the lowdown on Salesforce’s latest acquisition related to AI file conversion: “Salesforce Acquiring Zoomin For $450 Million.”

Zoomin is an Israeli data management provider for unstructured at and Salesforce purchased it for $450 million. This is way more than what Zoomin was appraised at in 2021, so investors are happy. Earlier in September, Salesforce also bought another Israeli company Own. Buying Zoomin is part of Salesforce’s long term plan to add AI into its business practices.

Since AI need data libraries to train and companies also possess a lot of unstructured data that needs organizing, Zoomin is a wise investment for Salesforce. Zoomin has a lot to offer Salesforce:

“Following the acquisition, Zoomin’s technology will be integrated into Salesforce’s Agentforce platform, allowing customers to easily connect their existing organizational data and utilize it within AI-based customer experiences. In the initial phase, Zoomin’s solution will be integrated into Salesforce’s Data Cloud and Service Cloud, with plans to expand its use across all Salesforce solutions in the future.”

Salesforce is taking steps that other businesses will eventually follow. Will Salesforce start selling the converted data to train AI? Also will Salesforce become a new Big Tech giant?

Whitney Grace, October 18, 2024

Written by Stephen E. Arnold · Filed Under News, Text processing | Comments Off on Another Reminder about the Importance of File Conversions That Work

Google Synthetic Content Scaffolding

September 3, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Google posted what I think is an important technical paper on the arXiv service. The write up is “Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions.” The paper has six authors and presumably has the grade of “A”, a mark not award to the stochastic parrot write up about Google-type smart software.

For several years, Google has been exploring ways to make software that would produce content suitable for different use cases. One of these has been an effort to use transformer and other technology to produce synthetic data. The idea is that a set of real data is mimicked by AI so that “real” data does not have to be acquired, intercepted, captured, or scraped from systems in the real-time, highly litigious real world. I am not going to slog through the history of smart software and the research and application of synthetic data. If you are curious, check out Snorkel and the work of the Stanford Artificial Intelligence Lab or SAIL.

The paper I referenced above illustrates that Google is “close” to having a system which can generate allegedly realistic and good enough outputs to simulate the interaction of actual human beings in an online discussion group. I urge you to read the paper, not just the abstract.

Consider this diagram (which I know is impossible to read in this blog format so you will need the PDF of the cited write up):

The important point is that the process for creating synthetic “human” online discussions requires a series of steps. Notice that the final step is “fine tuned.” Why is this important? Most smart software is “tuned” or “calibrated” so that the signals generated by a non-synthetic content set are made to be “close enough” to the synthetic content set. In simpler terms, smart software is steered or shaped to match signals. When the match is “good enough,” the smart software is good enough to be deployed either for a test, a research project, or some use case.

Most of the AI write ups employ steering, directing, massaging, or weaponizing (yes, weaponizing) outputs to achieve an objective. Many jobs will be replaced or supplemented with AI. But the jobs for specialists who can curve fit smart software components to produce “good enough” content to achieve a goal or objective will remain in demand for the foreseeable future.

The paper states in its conclusion:

While these results are promising, this work represents an initial attempt at synthetic discussion thread generation, and there remain numerous avenues for future research. This includes potentially identifying other ways to explicitly encode thread structure, which proved particularly valuable in our results, on top of determining optimal approaches for designing prompts and both the number and type of examples used.

The write up is a preliminary report. It takes months to get data and approvals for this type of public document. How far has Google come between the idea to write up results and this document becoming available on August 15, 2024? My hunch is that Google has come a long way.

What’s the use case for this project? I will let younger, more optimistic minds answer this question. I am a dinobaby, and I have been around long enough to know a potent tool when I encounter one.

Stephen E Arnold, September 3, 2024

Written by Stephen E. Arnold · Filed Under AI, Business strategy, News, Text processing | Comments Off on Google Synthetic Content Scaffolding

Suddenly: Worrying about Content Preservation

August 19, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Digital preservation may be becoming a hot topic for those who rarely think about finding today’s information tomorrow or even later today. Two write ups provide some hooks on which thoughts about finding information could be hung.

The young scholar faces some interesting knowledge hurdles. Traditional institutions are not much help. Thanks, MSFT Copilot. Is Outlook still crashing?

The first concerns PDFs. The essay and how to is “Classifying All of the PDFs on the Internet.” A happy quack to the individual who pursued this project, presented findings, and provided links to the data sets. Several items struck me as important in this project research report:

Tracking down PDF files on the “open” Web is not something that can be done with a general Web search engine. The takeaway for me is that PDFs, like PowerPoint files, are either skipped or not crawled. The author had to resort to other, programmatic methods to find these file types. If an item cannot be “found,” it ceases to exist. How about that for an assertion, archivists?
The distribution of document “source” across the author’s prediction classes splits out mathematics, engineering, science, and technology. Considering these separate categories as one makes clear that the PDF universe is about 25 percent of the content pool. Since technology is a big deal for innovators and money types, losing or not being able to access these data suggest a knowledge hurdle today and tomorrow in my opinion. An entity capturing these PDFs and making them available might have a knowledge advantage.
Entities like national libraries and individualized efforts like the Internet Archive are not capturing the full sweep of PDFs based on my experience.

My reading of the essay made me recognize that access to content on the open Web is perceived to be easy and comprehensive. It is not. Your mileage may vary, of course, but this write up illustrates a large, multi-terabyte problem.

The second story about knowledge comes from the Epstein-enthralled institution’s magazine. This article is “The Race to Save Our Online Lives from a Digital Dark Age.” To make the urgency of the issue more compelling and better for the Google crawling and indexing system, this subtitle adds some lemon zest to the dish of doom:

We’re making more data than ever. What can—and should—we save for future generations? And will they be able to understand it?

The write up states:

For many archivists, alarm bells are ringing. Across the world, they are scraping up defunct websites or at-risk data collections to save as much of our digital lives as possible. Others are working on ways to store that data in formats that will last hundreds, perhaps even thousands, of years.

The article notes:

Human knowledge doesn’t always disappear with a dramatic flourish like GeoCities; sometimes it is erased gradually. You don’t know something’s gone until you go back to check it. One example of this is “link rot,” where hyperlinks on the web no longer direct you to the right target, leaving you with broken pages and dead ends. A Pew Research Center study from May 2024 found that 23% of web pages that were around in 2013 are no longer accessible.

Well, the MIT story has a fix:

One way to mitigate this problem is to transfer important data to the latest medium on a regular basis, before the programs required to read it are lost forever. At the Internet Archive and other libraries, the way information is stored is refreshed every few years. But for data that is not being actively looked after, it may be only a few years before the hardware required to access it is no longer available. Think about once ubiquitous storage mediums like Zip drives or CompactFlash.

To recap, one individual made clear that PDF content is a slippery fish. The other write up says the digital content itself across the open Web is a lot of slippery fish.

The fix remains elusive. The hurdles are money, copyright litigation, and technical constraints like storage and indexing resources.

Net net: If you want to preserve an item of information, print it out on some of the fancy Japanese archival paper. An outfit can say it archives, but in reality the information on the shelves is a tiny fraction of what’s “out there”.

Stephen E Arnold, August 19, 2024

Written by Stephen E. Arnold · Filed Under Database, News, Technology, Text processing | Comments Off on Suddenly: Worrying about Content Preservation

Search the site
Subscribe to Beyond Search
Feature archive
News archive

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.

Categories
- 3D-Printing
- Acquisition
- Advertising
- Aggregation
- AI
- Alexa
- algorithms
- Amazon
- Amazonia
- Analytics
- Appliance
- Applications
- Audio
- Augmented Reality
- Big data
- Bing
- Bitcoin
- Bitext
- Book review
- Business intelligence
- Business process
- Business strategy
- Censorship
- Cloud computing
- Company Profile
- Conferences
- Connectors
- Consulting
- Consumer
- Content processing
- Copyright
- Corporate Concerns
- Cost
- Crawl
- Crowdfunding
- cryptocurrency
- Customer support
- Cyber OSINT
- cybercrime
- cybersecurity
- Dark Web
- DarkCyber
- Data
- Data mining
- Database
- Deepfakes
- Digital Assistant
- Digital Library
- E2EE
- ECommerce
- EDiscovery
- Editorial opinion
- Education
- Emoticons
- Enterprise
- Enterprise search
- Entity extraction
- Ethics
- Facebook
- Faceted search
- Factualities
- Feature
- Federated search
- Financial
- Fogint
- Google
- Governance
- Government
- Hackers
- healthcare
- IBM Watson
- Image search
- Indexing
- Infrastructure
- Innovation
- Integration
- intelware
- Interface
- Internet
- Interview
- Investment
- law enforcement
- Legal matters
- Library automation
- Management
- Marketing
- Mathematics
- Metadata
- Microsoft
- Mobile
- Natural language processing
- News
- NGIA
- Online (general)
- Open Access
- Open source
- OSINT
- Osint Radar
- Overflight
- Palantir
- Patents
- Personnel
- Podcast
- Policeware
- Portals
- Predictive coding
- Privacy
- Profile
- Publishing
- Quotation
- Real time search
- Reference tool
- Rich media
- Robot Writer
- Search
- Search enabled applications
- search engine
- Search quality
- Security
- Semantic
- Sentiment analysis
- SEO
- SharePoint
- Short Honks
- Smart Technology
- Social
- Social Media
- software
- Statistics
- Taxonomy
- Technology
- Text analytics
- Text processing
- Tools
- Tor
- Training
- Translation
- Twitter
- Uncategorized
- Unstructured Data
- User experience
- User Interface
- Vertical search
- Video
- visualization
- Voice search
- Voice technology
- Web 3
- Web Services
- Webinar
- Windows
- Work flow
- XML
- Yahoo

Beyond Search

Content Injection Can Have Unanticipated Consequences

"Real" Entities or Sock Puppets? A New Solution Can Help Analysts and Investigators

More about NAMER, the Bitext Smart Entity Technology

FOGINT: A Shocking Assertion about Israeli Intelligence Before the October 2023 Attack

Juicing Up RAG: The RAG Bop Bop

Bitext NAMER: Simplifying Tracking of Translated Organizational Names

Entity Extraction: Not As Simple As Some Vendors Say

Another Reminder about the Importance of File Conversions That Work

Google Synthetic Content Scaffolding

Suddenly: Worrying about Content Preservation

Search the site

Categories

Archives

Recent Posts

Meta

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Search the site

Categories

Archives

Recent Posts

Meta