Aylo: Another Branding Moment?

August 24, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid. Also, I have inserted an asterisk (*) instead of a vowel to sidestep some of the smart software which is making certain types of essays almost impossible to find. Isn’t the modern “mother knows best” approach to information just great?

I wish I had gone to MBA school. My mind struggles with what pod-famous people describe as one of the most significant marketing decisions. Okay? I guess. I worked for a short time at a company called “Bell+Howell” which had acquired a company called “University Microfilms” from Xerox. How’s that for a genetic analysis? I recall one of the incredibly dull drill bits boring me with tales of the “first” devices for a “personal office system.” Obviously this former Xerox manager, now with his machine oil slick hands on the controls of a high-technology company, was proud of what I considered a case study in pounding nails with one’s head, repeated over and over: “Mr. Arnold, we do not say to our assistants, ‘Xerox this for me, please.’ We say, ‘Photocopy it, please.’” Yeah, I still go to the instant print store and say, “I need 10 Xeroxes of this document, please.” The teen behind the counter grunts and asks, “One side or two?” I think the person understood my use of the word “Xerox.”


This is not an apple. An apple is the proprietary, trademarked, registered, and fiercely protected name of a company that makes a mobile phone. If you thought this was a fruit, you are not paying attention. MidJourney delivered this perspiring apple on the first try. Do not use a word on the stop word list in your prompt. Also, do not order a “coke” when you mean cola drink. Do not say “Xerox” when you want a photocopy. Do not say “p*rn” when you want Aylo (not the musician, thank you).

You get my angle of recollection: Xerox machine in the college library becomes Xerox a verb to make a copy. Hand me a Kleenex. The same. Also, I will have a coke. When I say this at the GenX restaurant near my office, I get this: “We have Pepsi products?” My response is, “Sure, whatever. Thank you.”

What happens in my lectures for law enforcement and intelligence professionals if I show edited images from P*rnhub service? My hunch is that the word P*rnhub does not mean Aylo. For some cyber crime investigators, one brand sticks. The name “Aylo” is going to be something one has to learn. Remember. I am 78 and I still say, “Xerox copy.”

“P*rnhub Parent MindGeek Changing Its Name As New Owners Seek Fresh Start” reports a story which adds to this year’s case studies about product and service branding. The article reports as allegedly actual factual:

MindGeek — which has faced scrutiny in recent years for allegedly hosting content involving revenge p*rn, child sex abuse, and victims of sex-trafficking — is rebranding to the name “Aylo” effectively immediately, the company said. The “Aylo” name is likely to lead to some head-scratching — but a company spokesperson said the word was chosen specifically because it doesn’t have a meaning and can’t be found in the dictionary.

I think that the female singer Aylo may find that running a query for her music may produce some unusual results for her teenaged fans. Obviously MindGeek / P*rnhub does not agree. I think I should say, “The new owners of P*rnhub do not agree.” I would wager a copy of my October 2, 2023, keynote speech for the Massachusetts/New York Association of Crime Analysts that the marketing wizards who “created” or possibly “borrowed” the word are uninterested in this performer:

Aylo is next. No one else right now combines a feel for the street with a sense for big pop so authentically. With only a handful of songs, the Berliner has become the hottest newcomer in German rap. Hundreds of thousands of fans on TikTok cannot be wrong: Aylo is real, and she is a real star. Aylo is on the grind every day. With tracks like Kein Limit, Wach, Feuer and Blender, she covers the complete spectrum that makes her so special: from love songs to call-outs, from drip to downbeat, from super-pop to street. And sometimes all at once.

The new owner of this well-known vendor of adult content is Ethical Capital Partners. I love the branding of the buyout firm. It pairs the ethos of modern business and the life blood of an MBA: Ethical and Capital. Perfect for adult content and what the cited news story positions as “hosting content involving revenge p*rn, child sex abuse, and victims of sex-trafficking.” I wonder if Aristotle, when writing or more accurately compiling the Nicomachean Ethics, thought of positioning his argument in terms of revenge p*rn, child sex abuse, and victims of sex-trafficking. Who knows? Greece had a different moral view in 350 BCE from an MBA working at a financial services firm, I would hazard.

I looked up the Canadian company and learned:

Ethical Capital Partners (ECP) is a private equity firm managed by a multi-disciplinary advisory team with legal, regulatory, law enforcement, public engagement, capital markets and investment banking experience. We seek out investment and advisory opportunities in industries that require principled ethical leadership. ECP invests in opportunities that focus on technology, have legal and regulatory complexity and that put a value on transparency and accountability. ECP’s philosophy is rooted in identifying properties amenable to our responsible investment approach and that have the potential to create attractive returns over a compelling time horizon.

Aristotle would have understood. What do you think?

This branding effort is likely to be as confusing as Twitter’s becoming the letter X. I want to point out that searching for certain letters and words can be a challenge. Smart search engines have smart word lists. If you are not familiar with these silent helpers, navigate to this list and get a sense of what may impair findability.

Those MBAs have a knack for making interesting decisions. I love that word pair “ethical capital.” Will it become a bound phrase like “White House” or “Wall Street”?

Stephen E Arnold, August 24, 2023

Laws, Rules, Regulations for Semantic AI (No, I Do Not Know What Semantic AI Means)

March 31, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I am not going to dispute the wisdom and insight in the Microsoft essay “Consider the Future of This Decidedly Semantic AI.” The author is Xoogler Sam Schillace, CVP or corporate vice president and now a bigly wizard at the world’s pre-eminent secure software firm. However, I am not sure to what the “this” refers. Let’s assume that it is the Bing thing and not the Google thing although some plumbing may be influenced by Googzilla’s open source contributions to “this.” How would you like to disambiguate that statement, Mr. Bing?

The essay sets forth some guidelines or bright, white lines in the lingo of the New Age search and retrieval fun house. The “Laws” number nine. I want to note some interesting word choice. The reason for my focus on these terms is that taken as a group, more is revealed than I first thought.

Here are the terms I circled in True Blue (a Microsoft color selected for the blue screen of death):

  • Intent. Rules 1 and 3. The user’s intent at first glance. However, what if the intent is the hard wiring of a certain direction in the work flow of the smart software? Intent in separate parts of a model can and will have a significant impact on how the model arrives at certain decisions. Isn’t that a thumb on the scale?
  • Leverage. Rule 2. Okay, some type of Archimedes’ truism about moving the world I think. Upon rereading the sentence in which the word is used, I think it means that old-school baloney like precision and recall are not going to move anything. The “this” world has no use for delivering on point information using outmoded methods like string matching or Boolean statements. Plus, the old-school methods are too expensive, slow, and dorky.
  • Right. Rule 3. Don’t you love it when an expert explains that a “right” way to solve a problem exists? Why then did I have to suffer through calculus classes in which expressions had to be solved different ways to get the “right” answer? Yeah, who is in charge here? Isn’t it wonderful to be a sophomore in high school again?
  • Brittle. Rule 4. Yep, peanut brittle or an old-school light bulb. Easily broken, cut fingers, and maybe blinded? Avoid brittleness by “not hard coding anything.” Is that why Microsoft software is so darned stable? How about those email vulnerabilities in the new smart Outlook?
  • Lack. Rule 5. Am I correct in interpreting the use of the word “lack” as a blanket statement that the “this” is just not very good? I do love the reference to GIGO; that is, garbage in, garbage out. What if that garbage is generated by Bard, the digital phantasm of ethical behavior?
  • Uncertainty. Rule 6. Hello, welcome to the wonderful world of statistical Fancy Dancing. Is that “answer” right? Sure, if it matches the “intent” of the developer and the smart software helping that individual. I love it when smart software is recursive and learns from errors, at least known errors.
  • Protocol. Rule 7. A protocol, according to the smart search system You.com, is:

In computer networking, a protocol refers to a set of rules and guidelines that define a standard way of communicating data over a network. It specifies the format and sequence of messages that are exchanged between the different devices on the network, as well as the actions that are taken when errors occur or when certain events happen.

Yep, more rules and a standard, something universal. I think I get what Microsoft’s agenda has as a starred item: The operating system for smart software in business, the government, and education.

  • Hard. Rule 8. Yes, Microsoft is doing intense, difficult work. The task is to live up to the marketing unleashed at the World Economic Forum. Whew. Time for a break.
  • Pareidolia. Rule 9. The word means something along the lines of this: some people see things that aren’t there. Hello, Blake Lemoine, please. Oh, he’s on a date with a smart avatar. Okay, please, tell him I called. Also, some people may see in the actions of their French bulldog a certain human quality.

If we step back and view these words in the context of the Microsoft view of semantic AI, can we see an unintentional glimpse into the inner workings of the company’s smart software? I think so. Do you see a shadowy figure eager to dominate while saying, “Ah, shucks, we’re working hard at an uncertain task. Our intent is to leverage what we can to make money.” I do.

Stephen E Arnold, March 31, 2023

A Semantic Search Use Case: But What about General Business Content with Words and Charts?

September 9, 2022

I am okay with semantic search. The idea is that a relatively standard suite of mathematical procedures delivers “close enough for horse shoes” matches germane to a user’s query. Elastic is now combining key word with some semantic goodness. The idea is that mixing methods delivers more useful results. Is this an accurate statement?

The answer is, “It depends on the use cases.”

“How Semantic Search Improves Search Accuracy” explains a use case that is anchored in a technical corpus. Now I don’t want to get crossways with a group of search experts. I would submit that, in general, the vocabulary for scientific, medical, and technical information is more constrained. One does not expect to find “cheugy” or OG* in a write up about octonitrocubane.

In my limited experience, what happens is that a constrained corpus allows the developer of a finding system to use precise taxonomies, and some dinobabies may employ controlled vocabularies like those kicking around old-school commercial databases.

However, what happens when the finding system ingests a range of content objects from tweets, online news services, and TikTok-type content?

The write up says:

One particular advantage of semantic search is the resolution of ambiguous terminology and that all specific subtypes (“children”) of a technical term will be found without the need to mention them in the query explicitly.

Sounds good, particularly for scientific and technical content. What about those pesky charts and graphs? These are often useful, but many times are chock full of fudged data. What about the query, “Octonitrocubane invalid data”? I want to have the search system present links to content which may be in an article. Why? I want to make sure the alleged data set squares with my limited knowledge of statistical principles. Yeah, sorry.

The write up asserts:

A lexical search will deliver back all documents in which “pesticides” is mentioned as the text string “pesticides” plus variants thereof. A semantic search will, in addition to all documents containing the text string “pesticides”, also return documents that contain specific pesticides like bixafen, boscalid, or imazamox.
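The difference the write up describes can be sketched in a few lines. This is a minimal illustration, not the vendor’s method: the documents are invented, the TAXONOMY dictionary is a hypothetical hand-built controlled vocabulary of the dinobaby variety, and real semantic systems typically lean on ontologies or embeddings, not a three-item lookup table. The function names are mine.

```python
# Toy documents; the text is invented for illustration only.
DOCS = {
    "doc1": "Residue limits for pesticides in apples were revised.",
    "doc2": "Boscalid was detected in the strawberry samples.",
    "doc3": "Imazamox tolerance in sunflower hybrids.",
    "doc4": "Octonitrocubane synthesis notes.",
}

# Hypothetical controlled vocabulary: parent term -> specific subtypes ("children").
TAXONOMY = {"pesticides": ["bixafen", "boscalid", "imazamox"]}


def lexical_search(query, docs):
    """Return ids of documents that contain the query string itself."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if q in text.lower())


def expanded_search(query, docs, taxonomy):
    """Return ids of documents that contain the query or any of its child terms."""
    terms = [query.lower()] + taxonomy.get(query.lower(), [])
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if any(term in text.lower() for term in terms)
    )


print(lexical_search("pesticides", DOCS))             # ['doc1']
print(expanded_search("pesticides", DOCS, TAXONOMY))  # ['doc1', 'doc2', 'doc3']
```

Note what the sketch depends on: someone had to build the list of children. With a constrained technical corpus that is doable. With tweets and TikTok-type content, good luck.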

What about a chemical structure search? I want a document with structure information. Few words, just nifty structures just like the stuff inorganic and organic chemists inhale each day. Sorry about that.

Net net: Writing about search is tough when the specific corpus, the content objects, and the presence of controlled terms in addition to strings in a content object are not spelled out. Without this information, the assertions are a bit fluffy.

And the video thing? The DoD, NIST, and other outfits are making videos. Things that go boom are based on chemistry. Can semantic search find the videos and the results of tests?

Yeah, sure. The PowerPoint deck probably says so. Hands on search experience may not. Search-enabled applications may work better than plain old search jazzed up with close enough for horse shoes methods.

Stephen E Arnold, September 9, 2022

[* OG means original gangster]

Not a Eulogy for the Semantic Web, Maybe an Elegy

August 22, 2022

If you are into the semantic Web, you will enjoy “The Semantic Web is Dead – Long Live the Semantic Web!” The article has examples, some explanation, and a prediction. Spoiler: The Semantic Web will rise like Lazarus, just wearing Amazon normcore clothing. You know: For everyone.

I noted one passage and circled it in blue:

The political economy of academia and its interaction with industry is the origin of our current lack of a functional Semantic Web. Academia is structured in a way that there is very little incentive for anyone to build useable software. Instead you are elevated for rapidly throwing together an idea, a tiny proof of concept, and to iterate on microscopic variations of this thing to produce as many papers as possible. In engineering the devil is in the detail. You really need to get into the weeds before you can know what the right thing to do is. This is simultaneously a devastating situation for industry and academia. Nobody is going to wait around for a team of engineers to finish building a system to write about it in Academia. You’ll be passed immediately by legions of paper pushers. And in industry, you can’t just be mucking about with a system that you might have to throw away. We have structured collaboration as the worst of both worlds. Academics drop in random ideas, and industry try them, find them useless, and move on.

I believe in the Tooth Fairy, not Jack Benny’s Blue Fairy. The write up, like me, is mostly optimistic. I learned:

The Future of the Semantic Web is there, the Semantic Web will rise, but it will not be the Semantic Web of the past. Humanities access to data is of ever increasing importance, and the ability to make resilient and distributed methods of curating, updating and utilizing this information is key. The ideas which drove the creation of the Semantic Web are nowhere near obsolete, even if the tool chain and technologies which have defined it up to day are fated to go the way of the dinosaur.

Now I am a dinobaby. What about Web 3-ized well-formed XML? Great idea, right?

Stephen E Arnold, August 22, 2022

Smart Software and Textualists: Are You a Textualist?

June 13, 2022

Many thought it was simply a massive bad decision from an inexperienced judge. But there was more to it—it was a massive bad decision from an inexperienced textualist judge with an overreliance on big data. The Verge discusses “The Linguistics Search Engine that Overturned the Federal Mask Mandate.” Search is useful, but it must be accompanied by good judgment. When a lawsuit challenging the federal mask mandate came across her bench, federal judge Kathryn Mizelle turned to the letter of the law. Literally. Reporter Nicole Wetsman tells us:

“Mizelle took a textualist approach to the question — looking specifically at the meaning of the words in the law. But along with consulting dictionaries, she consulted a database of language, called a corpus, built by a Brigham Young University linguistics professor for other linguists. Pulling every example of the word ‘sanitation’ from 1930 to 1944, she concluded that ‘sanitation’ was used to describe actively making something clean — not as a way to keep something clean. So, she decided, masks aren’t actually ‘sanitation.’”

That is some fine hair splitting. The high-profile decision illustrates a trend in US courts that has been growing since 2018—basing legal decisions on large collections of texts meant for academic exploration. The article explains:

“A corpus is a vast database of written language that can include things like books, articles, speeches, and other texts, amounting to hundreds of millions of lines of text or more. Linguists usually use corpora for scholarly projects to break down how language is used and what words are used for. Linguists are concerned that judges aren’t actually trained well enough to use the tools properly. ‘It really worries me that naive judges would be spending their lunch hour doing quick-and-dirty searches of corpora, and getting data that is going to inform their opinion,’ says Mark Davies, the now-retired Brigham Young University linguistics professor who built both the Corpus of Contemporary American English and the Corpus of Historical American English. These two corpora have become the tools most commonly used by judges who favor legal corpus linguistics.”

Here is an example of how a lack of careful consideration while using the corpora can lead to a bad decision: the most frequent usage of a particular word (like “sanitation”) is not always the most commonly understood usage. Linguists emphasize the proper use of these databases requires skilled interpretation, a finesse a growing number of justices either do not possess or choose not to use. Such textualists apply a strictly literal interpretation to the words that make up a law, ignoring both the intent of lawmakers and legislative history. This approach means judges can avoid having to think too deeply or give reasons on the merits for their interpretations. Why, one might ask, should we have justices at all when we could just ask a database? Perhaps we are headed that way. We suppose it would save a lot of tax dollars.
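What a quick-and-dirty corpus lookup amounts to can be sketched as follows. This is an assumption-laden toy: the four sentences are invented and come from no real corpus, and the concordance function is a hypothetical helper, not the interface of the Brigham Young University tools.

```python
# Invented toy corpus of (year, sentence) pairs. Not drawn from any real corpus.
CORPUS = [
    (1931, "The sanitation crew scrubbed the ward every morning."),
    (1938, "Proper sanitation keeps the water supply safe."),
    (1943, "Sanitation of the mess hall was completed before inspection."),
    (1951, "Sanitation standards were updated after the war."),
]


def concordance(word, start_year, end_year, corpus):
    """Return (year, sentence) pairs that mention the word within the year range."""
    w = word.lower()
    return [
        (year, sentence)
        for year, sentence in corpus
        if start_year <= year <= end_year and w in sentence.lower()
    ]


hits = concordance("sanitation", 1930, 1944, CORPUS)
print(len(hits))  # 3
for year, sentence in hits:
    print(year, sentence)
```

The code returns hits and a count. Deciding whether each hit means “making something clean” or “keeping something clean” is left to the person reading the list, which is exactly the skilled interpretation the linguists are talking about.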

See the article for more on legal corpora and how judges use them, textualism, and the problems with this simplified approach. If judges won’t respect the opinion of the very authors of the corpora on how they should and should not be used, where does that leave us?

Cynthia Murrell, June 13, 2022

Economical Semantics: Check Out GitHub

June 9, 2022

A person asked me at lunch this week, “How can we do a sentiment analysis search on the cheap?” My reaction was, “There are many options. Check out GitHub and let it rip.” After lunch, one of my trusty researchers reminded me that our files contained a copy of a 2021 article called “Semantic Search on the Cheap.” I re-read the article and noticed that I had circled this passage in October 2021:

Innovative models are being released at a blistering pace, with different architectures and better scores against the benchmarks. The models are almost always bigger networks, with billions of parameters, requiring more and more GPU power. These models are extremely expressive, dynamic and can be fine-tuned to solve a multitude of problems.

Despite the cratering of some tech juggernauts, the pace of marketing in the smart software sector continues to outpace innovation. The write up is interesting because it raised a number of questions on Thursday, June 2, 2022. In a post-lunch stupor, I asked myself these questions:

  1. How many organizations want to know the “sentiment” of a chunk of text? The early sentiment analysis systems operated on word lists. Some of the words and phrases in a customer email, for example, reveal the emotional payload of a customer’s message; for example, “sue you” or “terminate our agreement.” The semantic sentiment has launched a thousand PowerPoints, but what about the emotional payload of an employee complaining on TikTok?
  2. Is 85 percent accuracy the high water mark? If it is, the “accuracy” scores are in what I continue to call the “close enough for horse shoes” playing area. In 100 text passages, the best one can do is generate 15 misses. Lower “scores” mean more misses. This is okay for online advertising, but what about diagnosing a child’s medical condition? Hey, only 15 get worse and that is the best case. No sentiment score for the parents’ communications with a malpractice attorney is necessary.
  3. Is cheap the optimal way to get good “performance”? The answer is that it costs money to go fast. Plus, smart software has a nasty tendency to drift. As the content fed into the system reflects words and concepts not part of the system’s furniture, the camp chairs get mixed up with the love seats. For certain applications like customer service in companies that don’t want to hear from customers, this approach is perfect.
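For the record, here is a minimal sketch of the two ideas in the list above: the old-school word list approach and the arithmetic behind 85 percent accuracy. The RED_FLAGS list is hypothetical (the first two phrases come from my example; the third is made up), the sample email is invented, and the function names are mine. No claim is made that any commercial system works this way.

```python
# Hypothetical red-flag phrases; the first two come from the example in the list above.
RED_FLAGS = ["sue you", "terminate our agreement", "cancel my account"]


def flag_phrases(message, phrases=RED_FLAGS):
    """Return the red-flag phrases that appear in the message, in list order."""
    lowered = message.lower()
    return [p for p in phrases if p in lowered]


def expected_misses(n_passages, accuracy):
    """Average number of passages a system at the given accuracy gets wrong."""
    return round(n_passages * (1 - accuracy))


# Invented customer email for illustration.
email = "If this is not fixed by Friday we will terminate our agreement and sue you."
print(flag_phrases(email))         # ['sue you', 'terminate our agreement']
print(expected_misses(100, 0.85))  # 15
```

Cheap, fast, and blind to anything not on the list. The employee complaining on TikTok does not use the approved phrases.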

Google wants everyone to Snorkel. Meta or Zuckbook wants everyone to embrace the outputs of FAIR (Facebook Artificial Intelligence Research). Clever, eh? Amazon and Microsoft are players too. We must not forget IBM. Who could ever forget Watson and DataFountain?

Net net: Download stuff from GitHub or another open source repository and get coding. Reserve time for a zippy PowerPoint too.

Stephen E Arnold, June 9, 2022

France and French: The Language of Diplomacy Says “Non, Non” to Gamer Lingo

May 31, 2022

I like France. Years ago I shipped my son to Paris to learn French. He learned other things. So, as a good daddy, I shipped him off to a language immersion school in Poitiers. He learned other things. Logically, as a good shepherd of my only son, I shipped him to Jarnac to work for a cognac outfit. He learned other things. Finally, I shipped him to Montpellier. How was his French? Coming along, I think.

He knew many slang terms.

Most of these were unknown to my wife (a French teacher) and me (a dolt from central Illinois). We bought a book of French slang, and it was useless. The French language zips right along: Words and phrases from French speaking Swiss people (mon dieu). Words and phrases from North Africans (what’s the term for head butt?). Words and phrases from the Middle East popular among certain fringe groups.

Over the decades, French has become Franglish. But the rock of Gibraltar (which should be a French rock, according to some French historians) is the Académie française and its mission (a tiny snippet follows, but there is a lot more at this link).

The mission entrusted to the Académie is clear: “The principal function of the Académie will be to work, with all possible care and diligence, to give definite rules to our language and to render it pure, eloquent, and capable of treating the arts and sciences.”

Who cares? The French culture ministry (do we have one in the US other than Disneyland?)

“France Bans English Gaming Tech Jargon in Push to Preserve Language Purity” explains:

Among several terms to be given official French alternatives were “cloud gaming”, which becomes “jeu video en nuage”, and “eSports”, which will now be translated as “jeu video de competition”. The ministry said experts had searched video game websites and magazines to see if French terms already existed. The overall idea, said the ministry, was to allow the population to communicate more easily.

Will those French “joueur-animateur en direct” abandon the word “streamer”?

Sure, and France will once again dominate Europe, parts of Africa, and the beaver-rich lands in North America. And Gibraltar? Sure, why not?

Stephen E Arnold, May 30, 2022

Deepset: Following the Trail of DR LINK, Fast Search and Transfer, and Other Intrepid Enterprise Search Vendors

April 29, 2022

I noted a Yahooooo! news story called “Deepset Raises $14M to Help Companies Build NLP Apps.” To me the headline could mean:

Customization is our business and services revenue our monetization model

Precursor enterprise search vendors tried to get gullible prospects to believe a company could install software and employees could locate the information needed to answer a business question. STAIRS III, Personal Library Software / SMART, and the outfit with forward truncation (InQuire) among others were there to deliver.

Then reality happened. Autonomy and Verity upped the ante with assorted claims. The Golden Age of Enterprise Search was poking its rosy fingers through the cloud of darkness related to finding an answer.

Quite a ride: The buzzwords sawed through the doubt and outfits like Delphis, Entopia, Inference, and many others embraced variations on the smart software theme. Excursions into asking the system a question to get an answer gained steam. Remember the hand crafted AskJeeves or the mind boggling DR LINK; that is, document retrieval via linguistic knowledge?

Today there are many choices for enterprise search: Free Elastic, Algolia, Funnelback now the delightfully named Squiz, Fabasoft Mindbreeze, and, of course, many, many more.

Now we have Deepset, “the startup behind the open source NLP framework Haystack,” not to be confused with Matt Dunie’s memorable “haystack with needles” metaphor, the intelware company Haystack, or a basic pile of dead grass.

The article states:

CEO Milos Rusic co-founded Deepset with Malte Pietsch and Timo Möller in 2018. Pietsch and Möller — who have data science backgrounds — came from Plista, an adtech startup, where they worked on products including an AI-powered ad creation tool. Haystack lets developers build pipelines for NLP use cases. Originally created for search applications, the framework can power engines that answer specific questions (e.g., “Why are startups moving to Berlin?”) or sift through documents. Haystack can also field “knowledge-based” searches that look for granular information on websites with a lot of data or internal wikis.

What strikes me? Three things:

  1. This is essentially a consulting and services approach
  2. Enterprise becomes apps for a situation, department, or specific need
  3. The buzzwords are interesting: NLP, semantic search, BERT, and humor.

Humor is a necessary quality when trying to make decades old technology work for distributed, heterogeneous data, email on a sales professional’s mobile, videos, audio recordings, images, engineering diagrams along with the nifty datasets for the gizmos in the illustration, etc.

A question: Is $14 million enough?

Crickets.

Stephen E Arnold, April 29, 2022

Semantics Have Become an Architecture: Sounds Good but

December 17, 2021

Semantic Architecture Is A Big Data Cash Grab

A few years ago, big data was the hot topic term and in its wake a surge of techno babble followed. Many technology companies develop their own techno babble to peddle their wares, while some of the jargon does have legitimate reasons to exist. Epiexpress has the lowdown on one term that does have actual meaning: “What Is Semantic Architecture, And How To Build One?”

The semantic data layer is a system’s brain or hub, because most data can be found through a basic search. It overlays the more complex data in a system. Companies can leverage the semantic layer for business decisions and discover new insights. The semantic layer uses an ontology model and enterprise knowledge graph to organize data. Before building the architecture, one should consider the following:

“1. Defining and listing the organizational needs

When developing a semantic enterprise solution, properly-outlined use cases provide the critical questions that the semantic architecture will answer. It, in turn, gives a better knowledge of the stakeholders and users, defines the business value, and facilitates the definition of measurable success criteria.

2. Survey the relevant business data

Many enterprises possess a data architecture founded on data warehouses, relational databases, and an array of hybrid cloud systems and applications that aid analytics and data analysis abilities. In such enterprises, employing relevant unification processes and model mapping practices based on the enterprise’s use cases, staff skill-sets, and enterprise architecture capabilities will be an effective approach for data modeling and mapping from source systems.

3. Using semantic web standards for ensuring governance and interoperability

When implementing semantic architecture, it is important to use semantic technology such as graph management apps to be middleware. Middleware acts as organizational tools for proper metadata governance. Do not forget that users will need tools to interact with the data, such as enterprise search, chatbots, and data visualization tools.”

Semantic babble?

Whitney Grace, December 17, 2021

Semantics and the Web: A Snort of Pisco?

November 16, 2021

I read a transcript for the video called “Semantics and the Web: An Awkward History.” I have done a little work in the semantic space, including a stint as an advisor to a couple of outfits. I signed confidentiality agreements with the firms and even though both have entered the well-known Content Processing Cemetery, I won’t name these outfits. However, I thought of the ghosts of these companies as I worked my way through the transcript. I don’t think I will have nightmares, but my hunch is that investors in these failed outfits may have bad dreams. A couple may experience post traumatic stress. Hey, I am just suggesting people read the document, not go bonkers over its implications in our thumbtyping world.

I want to highlight a handful of gems I identified in the write up. If I get involved in another world-saving semantic project, I will want to have these in my treasure chest.

First, I noted this statement:

“Generic coding”, later known as markup, first emerged in the late 1960s, when William Tunnicliffe, Stanley Rice, and Norman Scharpf got the ideas going at the Graphics Communication Association, the GCA.  Goldfarb’s implementations at IBM, with his colleagues Edward Mosher and Raymond Lorie, the G, M, and L, made him the point person for these conversations.

What’s not mentioned is that some in the US government became quite enthusiastic. Imagine the benefit of putting tags in text and providing electronic copies of documents. Much better than loose-leaf notebooks. I wish I had a penny for every time I heard this statement. How does the government produce documents today? The only technology not in wide use is hot metal type. It’s been (what?) a half century?

Second, I circled this passage:

SGML included a sample vocabulary, built on a model from the earliest days of GML. The American Association of Publishers and others used it regularly.

Indeed wonderful. The phrase “slicing and dicing” captured the essence of SGML. Why have human editors? Use SGML. Extract chunks. Presto! A new book. That worked really well but for one drawback: The proliferating wild and crazy “books” were tough to sell. Experts in SGML were and remain a rare breed of cat. There were SGML ecosystems but adding smarts to content was and remains a work in progress. Yes, I am thinking of Snorkel too.

Third, I like this observation too:

Dumpsters are available in a variety of sizes and styles.  To be honest, though, these have always been available.  Demolition of old projects, waste, and disasters are common and frequent parts of computing.

The Web as well as social media are dumpsters. Let’s toss in TikTok type videos too. I think meta meta tags can burn in our cherry red garbage container. Why not?

What do these observations have to do with “semantics”?

  1. Move from SGML to XML. Much better. Allow XML to run some functions. Yes, great idea.
  2. Create a way to allow content objects to be anywhere. Just pull them together. Was this the precursor to micro services?
  3. One major consequence of tagging or the lack of it or just really lousy tagging, marking up, and relying on software allegedly doing the heavy lifting is an active demand for a way to “make sense” of content. The problem is that an increasing amount of content is non textual. Ooops.

What’s the fix? The semantic Web revivified? The use of pre-structured, by golly, correct mark up editors? A law that says students must learn how to mark up and tag? (Problem: Schools don’t teach math and logic anymore. Oh, well, there’s an online course for those who don’t understand consistency and rules.)

The write up makes clear there are numerous opportunities for innovation. And the non-textual information? Academics have some interesting ideas. Why not go SAILing or revisit the world of semantic search?

Stephen E Arnold, November 16, 2021
