
My Refrigerator Door Shuts Automatically or Content Processing Vendor Works Hard at Repositioning

August 3, 2015

This weekend I checked out the flow of news from several dozen search and content processing vendors. What I discovered was surprising. For the set of 36 vendors, there was zero substantive news about the companies’ information access technology. More disturbing were the hints of revenue difficulties; for example, New Zealand-based SLI Systems, a publicly traded company, continues to lose money. Search and content processing sales challenges are forcing vendors to reposition themselves or align themselves with business trends more likely to have traction with senior managers.


How does a semantic technology company adapt? The approach is surprising, and it involves the Internet of Things. This is the push to put a Nest in your home and an Internet node in your appliances. One benefit is energy efficiency. The other idea is increased opportunities to push advertising to the hapless consumer who just wants to nuke a burrito in a microwave (smart or dumb microwave may not matter to a hungry teen).

I am not sure about your refrigerator. My double door General Electric refrigerator (what my grandmother called an “ice box” and some folks call a “fridge”) has doors which shut automatically. The refrigerator has an odd energy efficiency sticker like the ones I remove from monitors which persist in going to sleep when my intelligence does not match the gizmo’s.

I understand that someday soon I will have a refrigerator with lots of intelligence. I am confident that with a few moments thought, I can kill that puppy’s brain.

In my narrow world, bounded by gun toting neighbors and dynamite crazed bridge builders, the Internet of Things or the somewhat odd acronym “IoT”, pronounced by my Spanish tutor “Eee ooooh tay”, will be a bit like Big Data, semantic search, natural language processing, artificial intelligence, and data lakes. The idea is that a search and content processing vendor can surf on a hot idea like fraud and pump some air into the sagging balloon labeled sales leads.

I am more convinced of this verbal magic each time I read about “new” technology from companies that are essentially vendors of look up functions applicable to information access.

The IoT is, in my opinion, more about getting information about a machine’s performance, the lessee’s adherence to maintenance schedules, and alerts about highly probable device failure.

One of my neighbors has a Mercedes which beeps, vibrates, and flashes when my neighbor strays across the white lines on the highway. Annoying but semi useful. The Mercedes also can phone home if my neighbor’s big expensive SUV experiences a malfunction. Useful. Maybe annoying if the malfunction occurs when the SUV is parked in front of the local Neiman Marcus or Goodwill store.

I read “Content Analysis and the Internet of Things: Never Leave the Fridge Door Open Again?” The main point of the write up is the question which I already answered. My refrigerator automatically shuts its door.

The article states:

The Internet of Things is the expanding network of physical objects that collect information, communicate and sense or interact with their internal states or the external environment according to Gartner, which reports that there will be nearly 26 billion devices on the Internet of Things by 2020.

Ah, yes, the mid tier firm Gartner, an excellent source of objective, unbiased, inclusion free information.

Here’s the article’s keeper passage I noted from a senior manager at a content processing company. Keep that phrase in mind: “content processing.”

With the common method of interaction, we will speak, devices will read, the design will be predicated upon our needs and less so upon the device. The trend seems so simple—for us to understand these devices, the devices must understand us. The difference is meaning. Data is an abstraction, understanding is communication, and to understand and communicate one must know meaning.

I am delighted that data have meaning. I just wonder how much of a stretch it is to apply text centric methods to outputs from an industrial machine connected to the Internet via an iGear service. My hunch is, “Not too much.”

To me the phrase “content processing” means words, not data output from my neighbor’s flashy Mercedes or an Internet enabled refrigerator.

As I said, my refrigerator door closes automatically. Do I want anyone to know that I let the hinges do the work?

Stephen E Arnold, August 3, 2015

Finnish Content Discovery Case Study

July 31, 2015

There are many services that offer companies the ability to improve their content discovery. One of these services is Leiki, which offers intelligent user profiling, context-based intelligence, and semantic SaaS solutions. Rather than having humans adapt their content to get to the top of search engine results, the machine is altered to fit a human’s needs. Leiki pushes relevant content to a user’s search query. Leiki recently released “Case Study: Leiki Smart Services Increase Customer Flow Significantly At Alma Media.”

Alma Media is one of the largest media companies in Finland, owning many well-known Finnish brands.  These include Finland’s most popular Web site, classified ads, and a tabloid newspaper.  Alma Media employed two of Leiki’s services to grow its traffic:

“Leiki’s Smart Services are adept at understanding textual content across various content types: articles, video, images, classifieds, etc. Each content item is analyzed with our semantic engine Leiki Focus to create a very detailed “fingerprint” or content profile of topics associated with the content.

SmartContext is the market leading service for contextual content recommendations. It’s uniquely able to recommend content across content types and sites and does this by finding related content using the meaning of content – not keyword frequency.

SmartPersonal stands for behavioral content recommendations. As it also uses Leiki’s unique analysis of the meaning in content, it can recommend content from any other site and content type based on usage of one site.”
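
The case study does not reveal Leiki’s algorithms, so what follows is only a hedged sketch of the general idea behind recommending by meaning rather than keyword frequency: give each content item a weighted topic profile, then surface the items whose profiles point in a similar direction. The topic labels, weights, and cosine measure are my illustration, not Leiki’s disclosed method.

```python
# Toy sketch of "recommend by topic profile, not keyword frequency."
# The profiles below are invented; Leiki's actual Focus engine and its
# weighting scheme are not described in the case study.
from math import sqrt

profiles = {
    "article_a": {"ice hockey": 0.9, "finland": 0.6},
    "article_b": {"ice hockey": 0.8, "sports betting": 0.4},
    "article_c": {"housing market": 0.9, "finland": 0.3},
}

def cosine(p, q):
    """Similarity of two topic profiles, independent of raw word counts."""
    shared = set(p) & set(q)
    dot = sum(p[t] * q[t] for t in shared)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def recommend(item, catalog, top_n=2):
    scores = {other: cosine(catalog[item], prof)
              for other, prof in catalog.items() if other != item}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("article_a", profiles))  # article_b outranks article_c
```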

The case study runs down how Leiki’s services improved traffic and encouraged more users to consume Alma Media’s content. Leiki’s main selling point in the case study is that it offers users personal recommendations based on content they clicked on across Alma Media Web sites. Leiki wants to be a part of developing Web 3.0, and the research shows that personalization is the way for it to go.

Whitney Grace, July 31, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The Semantic Web and JSON LD: Some Irritation Perhaps?

July 30, 2015

I read the Wikipedia article about JSON-LD, or JavaScript Object Notation for Linked Data, when I was pondering the fate of the XML centric start ups like MarkLogic. I highlighted one sentence in the Wikipedia write up, which is subject to the usual caveats about bias, incorrect information, etc. That sentence was:

JSON-LD is designed around the concept of a “context” to provide additional mappings from JSON to an RDF model.

Yes, the much loved RDF model.
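
For readers who have not poked at JSON-LD, here is a minimal sketch, in plain Python with no libraries, of what a “context” does: it maps short JSON keys to full IRIs, which is the hook back into an RDF model. The vocabulary terms and the sample record are my own illustration, not material from the Wikipedia article.

```python
# Minimal illustration of how a JSON-LD "@context" maps plain JSON keys
# to full IRIs. The sample record is invented for illustration.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": "http://schema.org/url",
    },
    "name": "Beyond Search",
    "homepage": "http://arnoldit.com/wordpress/",
}

def expand(document):
    """Replace each short key with the IRI its context assigns to it."""
    context = document.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in document.items()
        if key != "@context"
    }

print(expand(doc))
# {'http://schema.org/name': 'Beyond Search',
#  'http://schema.org/url': 'http://arnoldit.com/wordpress/'}
```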

When I read “JSON-LD and Why I Hate the Semantic Web,” I noticed a bit of friskiness in the word choice; for example, misguided souls, cryptic, complicated, market share, “kick RDF in the nuts,” and similar rhetorical arabesques. I do like the active verb “kick” however.

The passage I highlighted with my bright orange marker was this one:

The problem with getting a room full of smart people together is that the group’s world view gets skewed. There are many reasons that a working group filled with experts don’t consistently produce great results. For example, many of the participants can be humble about their knowledge so they tend to think that a good chunk of the people that will be using their technology will be just as enlightened. Bad feature ideas can be argued for months and rationalized because smart people, lacking any sort of compelling real world data, are great at debating and rationalizing bad decisions.

Seems normal to me.

In my opinion, this write up explains why some XML centric, Semantic Web cheerleaders have labored to generate organic growth. Just a thought. Talking to fellow travelers is reassuring and comfortable. Those not on the cruise ship may have a different point of view.

Stephen E Arnold, July 30, 2015

Italian Firm Delivers Semantic API to Wall Street

July 22, 2015

Short honk: There are quite a few high technology firms chasing the deep pockets on Wall Street and in the City. Some, like Digital Reasoning, have teamed with larger players to capture customers. Others, like Connotate, have relied on their stakeholders to open doors. Many companies attended financial technology showcases to demonstrate the power of their intelligent systems; for example, Digital Shadows. Some companies like Terbium Labs show up and demonstrate how their advanced technology reduces risk and improves financial performance.

Expert System is approaching the market with what it calls the “first semantic API”. The idea is that money folks can create cognitive computing systems. You can read about the system at this link.

Expert System is betting that this is true. The news release quotes Luca Scagliarini, CEO, as saying:

Intelligent solutions for strategic information management are absolutely critical in today’s big data world, and no where is this more critical than in the financial services industry where inaccurate or incomplete data can lead to fatal decisions. With Cogito API Finance, we are filling a big gap and tremendous need for customized knowledge management solutions in the financial industry.

Expert System is a publicly traded company (EXSY:MI) so the payoff from this cognitive push should be evident in the firm’s next financial report.


Today shares are trading at 2.12, up 0.02 or 0.76 percent. BAE Systems, whose NetReveal / Detica technologies are in use in a number of financial applications, is trading at 29.35. There is market headroom available.

Stephen E Arnold, July 22, 2015

On Embedding Valuable Outside Links

July 21, 2015

If media websites take this suggestion from an article at Monday Note, titled “How Linking to Knowledge Could Boost News Media,” there will be no need to search; we’ll just follow the yellow brick links. Writer Frederic Filloux laments the current state of affairs, wherein websites mostly link to internal content, and describes how embedded links could be much, much more valuable. He describes:

“Now picture this: A hypothetical big-issue story about GE’s strategic climate change thinking, published in the Wall Street Journal, the FT, or in The Atlantic, suddenly opens to a vast web of knowledge. The text (along with graphics, videos, etc.) provided by the news media staff, is amplified by access to three books on global warming, two Ted Talks, several databases containing references to places and people mentioned in the story, an academic paper from Knowledge@Wharton, a MOOC from Coursera, a survey from a Scandinavian research institute, a National Geographic documentary, etc. Since (supposedly), all of the above is semanticized and speaks the same lingua franca as the original journalistic content, the process is largely automatized.”

Filloux posits that such a trend would be valuable not only for today’s Web surfers, but also for future historians and researchers. He cites recent work by a couple of French scholars, Fabian Suchanek and Nicoleta Preda, who have been looking into what they call “Semantic Culturonomics,” defined as “a paradigm that uses semantic knowledge bases in order to give meaning to textual corpora such as news and social media.” Web media that keeps this paradigm in mind will wildly surpass newspapers in the role of contemporary historical documentation, because good outside links will greatly enrich the content.

Before this vision becomes reality, though, media websites must be convinced that linking to valuable content outside their site is worth the risk that users will wander away. The write-up insists that a reputation for providing valuable outside links will more than make up for any amount of such drifting visitors. We’ll see whether media sites agree.

Cynthia Murrell, July 21, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Want To Know What A Semantic Ecosystem Is?

July 8, 2015

Do you want to know what a semantic ecosystem is? The answer is available from TopQuadrant in its article, “Semantic Ecosystem-What’s That About?” According to the article, a semantic ecosystem makes it possible to discover patterns, show the relationships between and within data sources, add meaning to raw data artifacts, and dynamically bring information together.

In short, it shows how data and its sources connect with each other and extracts relationships from them.
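
TopQuadrant does not spell out the mechanics, but the familiar move behind “relationships between and within data sources” is to express facts from different systems as subject, predicate, object triples and query them as one graph. The sketch below is a hand rolled illustration under that assumption, not TopQuadrant’s actual technology.

```python
# Hand rolled sketch: facts from two invented sources expressed as
# (subject, predicate, object) triples, then queried together.
crm_triples = [
    ("acme_corp", "has_contact", "jane_doe"),
    ("jane_doe", "works_in", "helsinki"),
]
support_triples = [
    ("ticket_42", "opened_by", "jane_doe"),
    ("ticket_42", "concerns", "product_x"),
]

graph = crm_triples + support_triples

def related(entity):
    """Everything directly linked to an entity, regardless of source."""
    return [(s, p, o) for s, p, o in graph if entity in (s, o)]

print(related("jane_doe"))
# jane_doe turns up as a CRM contact, a Helsinki worker, and a ticket opener.
```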

What follows the brief explanation about what a semantic ecosystem can do is a paragraph about the importance of data, how it takes many forms, etc., etc. Trust me, you have heard it before. It then makes a comparison with a natural ecosystem, i.e., the ones found in nature.

The article continues with this piece:

“As in natural ecosystems, we believe that success in business is based on capability – and the ability to adapt and evolve new capabilities. Semantic ecosystems transform existing diverse information into valuable semantic assets. Key characteristics of a semantic ecosystem are that it is adaptable and evolvable. You can start small – with one or more key business solutions and a few data sources – and the semantic foundation can grow and evolve with you.”

It turns out a semantic ecosystem is just another name for information management. TopQuadrant coined the term to associate it with its products and services. Talk about fancy business jargon, but TopQuadrant makes a point about having an information system work so well that it seems natural. When a system works naturally, it is able to intuit needs, interpret patterns, and make educated correlations between data.

Whitney Grace, July 8, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Semantic Search and Challenging Patent Document Content Domains

July 7, 2015

Over the years, I have bumped into some challenging content domains. One of the most difficult was the collection of mathematical papers organized with the Dienst architecture. Another was a collection of blog posts from African bulletin board systems in a number of different languages, peppered with insider jargon. I also recall my jousts with patent documents for some pretty savvy outfits.

The processing of each of these corpuses and making them searchable by a regular human being remains an unsolved problem. Progress has been slow, and the focus of many innovators has been on workarounds. The challenge of each corpus remains a high hurdle, and in my opinion, no search sprinter is able to make it over the race course without catching a toe and plunging head first into the Multi-layer SB Resin covered surface.

I read “Why Is Semantic Search So Important for Patent Searching?” My answer was and remains, “Because vendors will grab at any buzzy concept in the hopes of capturing a share of the patent research market?”

The write up takes a different approach, an approach which I find interesting and somewhat misleading.

The write up states that there are two ways to search for information: navigational search (sort of like Endeca, I assume) and research search, which is the old fashioned Boolean logic I really like.

The article points out that keyword search sucks if the person looking for information does not know the exact term. That’s why I used the reference to Dienst. I wanted to provide an example which requires precise knowledge of terminology. That’s a challenge, and it requires a person to recognize that he or she may not know the exact terminology needed to locate the information. Try the Dienst query. Navigate to a whizzy new search engine like www.unbubble.eu and plug away. How is that working out for you? But don’t cheat: you can’t use the term Dienst.

If you run the query on a point and click Web search system like Qwant.com, you cannot locate the term without running a keyword search.
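
The Dienst exercise boils down to a property of keyword retrieval itself: a posting list matches only the exact tokens it was built from. Here is a minimal inverted index sketch, using invented documents rather than a real patent corpus, that makes the point.

```python
# Minimal inverted index: a Boolean query only matches documents that
# contain the exact token, which is why you must already know the word
# "dienst" to find material about it. The documents are invented.
from collections import defaultdict

docs = {
    1: "dienst architecture for distributed document servers",
    2: "protocol for networked digital library repositories",
    3: "boolean retrieval over patent claim text",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():
        index[token].add(doc_id)

def boolean_and(*terms):
    """Old fashioned Boolean AND: intersect the posting lists."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(boolean_and("dienst"))                  # {1} -- only if you know the term
print(boolean_and("digital", "library"))      # {2}
print(boolean_and("distributed", "library"))  # set() -- near miss, no match
```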

The problems in patents, whether indexed with value added metadata by humans laboring in a warehouse or with semantic methods, are:

  1. Patent documents exist in versions and each document drags along assorted forms which may or may not be findable. Trips to the USPTO with hat in hand and a note from a senator often do not work. Fancy Dan patent attorneys fall back on the good old method of hunting using intermediaries. Not pretty, not easy, not cheap, and not foolproof. The versions and assorted attachments are often unfindable. (There are sometimes interesting reasons for this kettle of fish and the fish within it.) I don’t have a solution to the chains of documents and the versions of patent documents. Sigh.
  2. Patents include art. Usually the novice reacts negatively to lousy screenshots, clunky drawings, and equations which make it tough to figure out what a superscript character is. Keywords and pointing and clicking, metaphors, razzle dazzle search systems, and buzzword charged solutions from outfits like Thomson Reuters and Lexis are just tools, stone tools chiseled by some folks who want to get paid. I don’t have a good solution to the arts and crafts aspect of patent documents. Sigh sigh.
  3. Patent documents are written at a level of generalization, with jargon, Latinate constructs, and assertions that usually give me a headache. Who signed up to read lots of really bad poetry? Working through the Old Norse version of Heimskringla is a walk in the park compared to figuring out what some patents “mean.” I spent a number of years indexing 15th century Latin sermons. At least in that corpus, the common knowledge base was social and political events and assorted religious material. Patents can be all over the known knowledge universe. I don’t know of a patent processing system which can make this weird prose-poetry understandable if there is litigation or findable if there is a need to figure out if someone cooked up the same system and method before the document in question was crafted. Sigh sigh sigh.
  4. None of the systems I have used over the past 40 years does a bang up job of identifying prior art in scientific, technical or medical journal articles, blog posts, trade publications, or Facebook posts by a socially aware astrophysicist working for a social media company. Finding antecedents is a great deal of work. Has been and will be in my opinion. Sigh sigh sigh sigh. But the patent attorneys cry, “Hooray. We get to bill time.”

The write up presents some of those top brass magnets: snappy visualizations. The idea is that a nifty diagram will address the four problems I identified in the preceding paragraphs. Visualizations may be able to provide some useful way to conceptualize where a particular patent document falls in a cluster of correctly processed patent documents. But an image does not deliver the mental equivalent of a NOW Foods Whey Protein Isolate.

Net net: Pitching semantic search as a solution to the challenges of patent information access is a ball. Strikes in patent searching are not easily obtained unless you pay expert patent attorneys and their human assets to do the job. Just bring your checkbook.

Stephen E Arnold, July 7, 2015

Need Semantic Search: Lucidworks Asserts It Is the Answer by Golly

July 3, 2015

If you read this blog, you know that I comment on semantic technology every month or so. In June I pointed to an article which had been tweeted as “new stuff.” Wrong. Navigate to “Semantic Search Hoohah: Hakia”; you will learn that Hakia is a quiet outfit. Quiet as in no longer on the Web. Maybe gone?

There are other write ups in my free and for fee columns about semantic search. The theme has been consistent. My view is that semantic technology is one component in a modern cybernized system. (To learn about my use of the term cyber, navigate to www.xenky.com/cyberosint.)

I find the promotion of search engine optimization as “semantic” amusing. I find the search service firms’ promotion of their semantic expertise amusing. I find the notion of open source outfits deep in hock to venture capitalists asserting their semantic wizardry amusing.

I don’t know if you are quite as amused as I am. Here’s an easy way to determine your semantic humor score. Navigate to this slideshare link and cruise through the 34 slide presentation made by one of Lucidworks’ search mavens. Lucidworks is a company I have followed since it fired up its jets with Marc Krellenstein on board. Dr. Krellenstein ejected in short order, and the company has consumed many venture dollars with management shifts, repositionings, and the Big Data thing.

We now have Lucidworks in the semantic search sector.

Here’s what I learned from the deck:

  1. The company has a new logo. I think this is the third or fourth.
  2. Search is about technology and language. Without Google’s predictive and personalized routines, words are indeed necessary.
  3. Buzzwords and jargon do not make semantic methods simple. Consider this statement from the deck: “Tokenization plus vector mathematics (TF/IDF) or one of its cousins—“bag of words” – Algorithmic tweaks – enhanced bag of words.” Got that, gentle reader? If not, check out “sausagization.” (A bare bones TF/IDF sketch appears after this list.)
  4. Lucidworks offers a “field cache.” Okay, I am not unfamiliar with caching in order to goose performance, which can be an issue with some open source search systems. But Searchdaimon, an open source search system developed in Norway, runs circles around Lucidworks. (My team did the benchmark test of major open source systems. Searchdaimon was the speed champ and had other sector leading characteristics as well.)
  5. Lucidworks does the ontology thing as well. The tie up of “category nodes” and “evidence nodes” may be one reason the performance goblin noses into the story.
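
Since the deck leans on TF/IDF and “bag of words” jargon, here is the bare bones version of what those phrases name: weight a term higher when it is frequent in a document and rare across the collection. The corpus is a toy and the smoothing is one textbook choice; nothing below is Lucidworks specific.

```python
# Bare bones TF/IDF "bag of words": term frequency in a document, scaled
# down when the term appears in many documents. Toy corpus, textbook
# smoothing; nothing here reflects Lucidworks' implementation.
from collections import Counter
from math import log

corpus = [
    "open source search engine",
    "semantic search for patents",
    "open source logging pipeline",
]

docs = [Counter(text.split()) for text in corpus]
n_docs = len(docs)

def tf_idf(term, doc):
    tf = doc[term] / sum(doc.values())
    df = sum(1 for d in docs if term in d)
    idf = log(n_docs / (1 + df)) + 1  # one common smoothing choice
    return tf * idf

for term in ("search", "semantic"):
    print(term, [round(tf_idf(term, d), 3) for d in docs])
# "search" appears in two documents, so it scores lower than the rarer "semantic".
```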

The problem I encountered is that the write up for the slide deck emphasized Fusion as a key component. I have been poking around the “fusion” notion as we put our new study of the Dark Web together. Fusion is a tricky problem and the US government has made fusion a priority. Keep in mind that content is more than text. There are images, videos, geocodes, cryptic tweets in Farsi, and quite a few challenging issues with making content available to a researcher or analyst.

It seems that Lucidworks has cracked a problem which continues to trouble some reasonably sophisticated folks in the content analysis business. Here’s the “evidence” that Lucidworks can do what others cannot:

[Diagram from the Lucidworks slide deck]

This diagram shows that after a connector is available, “pipelines proliferate.” Well, okay.

I thought the goal was to process content objects with low latency, easily, and with semantic value adds. “Lots of stages” and “index pipelines: one way query pipelines: round trip” does not compute for this addled goose.

If the Lucidworks approach makes sense to you, go for it. My team and I will stick to here and now tools and open source technology which works without the semantic jargon, which is pretty much incidental to the matter. We need to process more than text. CyberOSINT vendors deliver, and most use open source search as a utility function. Yep, utility. Not the main event. The failure of semantic search vendors suggests that the buzzword is not the solution to marketing woes. Pop. (That’s a pre Fourth of July celebratory ladyfinger.)

Stephen E Arnold, July 3, 2015

Old Wine: Semantic Search from the Enlightenment

June 24, 2015

I read a weird disclaimer. Here it is:

This is an archived version of Pandia’s original article “Top 5 Semantic Search Engines”, we made it available to the users mainly because it is still among the most sought articles from old site. You can also check kids, radio search, news, people finder and q-cards sections.

An article from the defunct search newsletter Pandia surfaced in a news aggregation list. Pandia published one of my books, but at the moment I cannot remember which of my studies it was.

The write up identifies “semantic search engines.” Here’s the list with my status update in bold face:

  • Hakia. Out of business
  • SenseBot. Out of business.
  • Powerset. Bought by Microsoft. Fate unknown in the new Delve/Bing world.
  • DeepDyve. Talk about semantics but the system is a variation of the Dialog/BRS for fee search model from the late 1970s.
  • Cognition (Cognition Technologies). May be a unit of Nuance?

What’s the score?

Two failures. Two sales to other companies. One survivor with an old school business model. My take? Zero significant impact on information retrieval.

Feel free to disagree, but the promise of semantic search seems to pivot on finding a buyer and surviving by selling online research. Why so much semantic cheerleading? Beats me. Semantic methods are useful in the plumbing as a component of a richer, more robust system. Most cyberOSINT systems follow this path. Users don’t care too much about plumbing in my experience.

Stephen E Arnold, June 24, 2015
