Semantics and the Web: A Snort of Pisco?

November 16, 2021

I read a transcript for the video called “Semantics and the Web: An Awkward History.” I have done a little work in the semantic space, including a stint as an advisor to a couple of outfits. I signed confidentiality agreements with the firms and even though both have entered the well-known Content Processing Cemetery, I won’t name these outfits. However, I thought of the ghosts of these companies as I worked my way through the transcript. I don’t think I will have nightmares, but my hunch is that investors in these failed outfits may have bad dreams. A couple may experience post traumatic stress. Hey, I am just suggesting people read the document, not go bonkers over its implications in our thumbtyping world.

I want to highlight a handful of gems I identified in the write up. If I get involved in another world-saving semantic project, I will want to have these in my treasure chest.

First, I noted this statement:

“Generic coding”, later known as markup, first emerged in the late 1960s, when William Tunnicliffe, Stanley Rice, and Norman Scharpf got the ideas going at the Graphics Communication Association, the GCA.  Goldfarb’s implementations at IBM, with his colleagues Edward Mosher and Raymond Lorie, the G, M, and L, made him the point person for these conversations.

What’s not mentioned is that some in the US government became quite enthusiastic. Imagine the benefit of putting tags in text and providing electronic copies of documents. Much better than loose-leaf notebooks. I wish I have a penny for every time I heard this statement. How does the government produce documents today? The only technology not in wide use is hot metal type. It’s been — what? — a half century?

Second, I circled this passage:

SGML included a sample vocabulary, built on a model from the earliest days of GML. The American Association of Publishers and others used it regularly.

Indeed wonderful. The phrase “slicing and dicing” captured the essence of SGML. Why have human editors? Use SGML. Extract chunks. Presto! A new book. That worked really well but for one drawback: The proliferation of wild and crazy “books” were tough to sell. Experts in SGML were and remain a rare breed of cat. There were SGML ecosystems but adding smarts to content was and remains a work in progress. Yes, I am thinking of Snorkel too.

Third, I like this observation too:

Dumpsters are available in a variety of sizes and styles.  To be honest, though, these have always been available.  Demolition of old projects, waste, and disasters are common and frequent parts of computing.

The Web as well as social media are dumpsters. Let’s toss in TikTok type videos too. I think meta meta tags can burn in our cherry red garbage container. Why not?

What do these observations have to do with “semantics”?

  1. Move from SGML to XML. Much better. Allow XML to run some functions. Yes, great idea.
  2. Create a way to allow content objects to be anywhere. Just pull them together. Was this the precursor to micro services?
  3. One major consequence of tagging or the lack of it or just really lousy tagging, marking up, and relying of software allegedly doing the heavy lifting is an active demand for a way to “make sense” of content. The problem is that an increasing amount of content is non textual. Ooops.

What’s the fix? The semantic Web revivified? The use of pre-structured, by golly, correct mark up editors? A law that says students must learn how to mark up and tag? (Problem: Schools don’t teach math and logic anymore. Oh, well, there’s an online course for those who don’t understand consistency and rules.)

The write up makes clear there are numerous opportunities for innovation. And the non-textual information. Academics have some interesting ideas. Why not go SAILing or revisit the world of semantic search?

Stephen E Arnold, November 16, 2021


Comments are closed.

  • Archives

  • Recent Posts

  • Meta