Mondeca: Another Semantic Search Option

April 9, 2018

Mondeca, based in France, has long been focused on indexing and taxonomy. Now they offer a search platform named, simply enough, Semantic Search. Here’s their description:

“Semantic search systems consider various points including context of search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching and natural language queries to provide relevant search results. Augment your SolR or ElasticSearch capabilities; understand the intent, contextualize search results; search using business terms instead of keywords.”

A few details from the product page caught my eye. Let’s begin with the Search functionality; the page succinctly describes:

“Navigational search – quickly locate specific content or resource. Informational search – learn more about a specific subject. Compound term processing, concept search, fuzzy search, simple but smart search, controlled terms, full text or metadata, relevancy scoring. Takes care of language, spelling, accents, case. Boolean expressions, auto complete, suggestions. Disambiguated queries, suggests alternatives to the original query. Relevance feedback: modify the original query with additional terms. Contextualize by user profile, location, search activity and more.”

The software includes a GUI for visualizing the semantic data, and features word-processing tools like auto complete and a thesaurus. Results are annotated, with key terms highlighted, and filters provide significant refinement, complete with suggestions. Results can also be clustered by either statistics or semantic tags. A personalized dashboard and several options for sharing and publishing round out my list. See the product page for more details.
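For readers wondering what “takes care of language, spelling, accents, case” might involve in practice, here is a minimal Python sketch. It is an illustration only and is not Mondeca’s implementation; the function names and the sample vocabulary are hypothetical, and the approach (accent stripping plus standard-library fuzzy matching against a list of controlled terms) is an assumption about one simple way to get this behavior.

```python
import difflib
import unicodedata


def normalize(term: str) -> str:
    """Strip accents and fold case so 'Thésaurus' and 'thesaurus' compare equal."""
    decomposed = unicodedata.normalize("NFKD", term)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


def fuzzy_lookup(query: str, vocabulary: list[str], cutoff: float = 0.8) -> list[str]:
    """Return controlled terms that closely match a possibly misspelled query."""
    # Map each normalized form back to the original controlled term.
    index = {normalize(term): term for term in vocabulary}
    matches = difflib.get_close_matches(
        normalize(query), list(index), n=3, cutoff=cutoff
    )
    return [index[m] for m in matches]


# Hypothetical controlled vocabulary, for illustration only.
terms = ["Taxonomy", "Ontology", "Thésaurus"]
print(fuzzy_lookup("taxonmy", terms))
print(fuzzy_lookup("THESAURUS", terms))
```

A production system would sit on top of the Solr or Elasticsearch analyzers rather than a Python list, but the idea of mapping messy user input onto business terms is the same.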

Established in 1999, Mondeca delivers pragmatic semantic solutions to clients in Europe and North America, and is proud to have developed their own, successful semantic methodology. The firm is based in Paris. Perhaps the next time our beloved leader, Stephen E Arnold, visits Paris, the company will make time to speak with him. Previous attempts to set up a meeting were for naught. Ah, France.

Cynthia Murrell, April 9, 2018

Video Search: Still a Challenge

April 6, 2018

As MIT Technology Review describes in its article, “The Next Big Step for AI? Understanding Video,” artificial intelligence still tends to have trouble correctly interpreting video. A recent slew of new jobs at YouTube (owned by Google) underscores this flaw—“YouTube is Hiring 10,000 People to Police Offensive Videos,” reports the New York Post. When it comes to objectionable content, algorithms just don’t get it. Yet. Meanwhile, the PR machine keeps running.

MIT Tech editor Will Knight discusses some promising solutions in the above article, beginning close to home with a collaboration between MIT and IBM. He writes:

“MIT and IBM this week released a vast data set of video clips painstakingly annotated with details of the action being carried out. The Moments in Time Dataset includes three-second snippets of everything from fishing to break-dancing. ‘A lot of things in the world change from one second to the next,’ says Aude Oliva, a principal research scientist at MIT and one of the people behind the project. ‘If you want to understand why something is happening, motion gives you lot of information that you cannot capture in a single frame.’” … “The MIT-IBM project is in fact just one of several video data sets designed to spur progress in training machines to understand actions in the physical world. Last year, for example, Google released a set of eight million tagged YouTube videos called YouTube-8M. Facebook is developing an annotated data set of video actions called the Scenes, Actions, and Objects set.”

Knight also mentions Twenty Billion Neurons, which, he notes:

“… Created a custom data set by paying crowdsourced workers to perform simple tasks. One of the company’s cofounders, Roland Memisevic, says it also uses a neural network designed specifically to process temporal vision information.”

So, we should not be surprised if, soon, AI can comprehend what it “sees.” Meanwhile, sites that host video content would do well to employ the judgment of humans.

Cynthia Murrell, April 6, 2018

Build an Alternative Google: How To Wanted

April 6, 2018

Hacker News presented an interesting question, “How would you build an internet scale web crawler?” We have been talking with companies which have developed Internet search systems that are not available for free Web search. Those conversations have produced some fascinating information. Some of the data will be included in my upcoming lecture for a government agency and then in my two presentations at the June 2018 Telestrategies ISS Conference in Prague.

What was interesting about this question was that few people responded. That is interesting because my team’s research for my new presentations on deanonymizing encrypted chat and deanonymizing digital currency transactions pivots on comprehensive Internet indexing. In fact, more companies are indexing Internet content than at any time in the last 10 years.

The second issue the post triggered was a realization that only a handful of people jumped on the topic. This low response to the question in itself is interesting. With more activity in indexing, why aren’t more people helping out JustinGarrison? That’s a question worth thinking about.

Third, one of the responses to the Hacker News question was a pointer to the YaCy.net open source project. We once included this technology in our Internet Research for Law Enforcement training program. My recollection of the system is fuzzy, so I will get one of my team to take a look.

The final thought the Hacker News’ story triggered was, “Have people just accepted Bing, Google, Qwant, and a handful of metasearch systems as too dominant to challenge?” My view is that an opportunity exists to create a public facing Internet search and retrieval system. The reason? Outstanding alternatives to Bing, Google, and Qwant are available for those who qualify as customers and who are willing to pay the license fees.

My hunch is that just as enterprise search has coalesced around the open source Lucene/Solr technologies, free Web search has become “game over” because the ad supported model has won.

The problem, of course, is that a person looking for information usually does not realize that free Web search results are neither comprehensive, timely, nor objective.

I hope individuals like JustinGarrison get the information needed to seize an opportunity in Internet search.

Stephen E Arnold, April 6, 2018

Google and Search: More Churn Turmoil

April 4, 2018

I read “John Giannandrea, Head of Google’s Cornerstone Web-Search Unit, Steps Down.” I found the phrase “steps down” amusing. I think the wizard went to the Apple orchard. Since Mr. Giannandrea ran search, Google search has become less useful to me. Now I have to use multiple search systems to locate what I think are slam dunk queries. Nope. I get some pretty off the wall Google search results.

Two points jumped out of this story for me.

First, Google is forced to go back to one of the early Googlers from the AltaVista.com team. (I did some work for an outfit called PersimmonIT, which was a provider to AltaVista.com.) What’s interesting is that Jeff Dean is one of the really old Google guard. I know he’s bright and capable but that begs this question: “Aren’t there younger, smarter, and as or more capable professionals to get the over hyped Google artificial intelligence operation underway?” I can suggest at least one candidate from the DeepMind team. But, hey, who really cares?

Second, search must be pretty broken. The job has fallen to another old timer at the GOOG. Same question: “Aren’t there younger, more with it technical wizards who can handle the massively complex, software wrapped, advertising centric systems?” (Yep, systems because there is “regular” search and “mobile” search. Two search systems are part of the index puzzle Google has built over the years.) Plus, do you remember Google’s “universal” search which, as a Bear Stearns legend has it, was cooked up over a weekend to deal with a PR problem triggered by an analyst’s report to which yours truly contributed? You know “universal.” One query gets you blog content, new Web sites, Google Books, Google Scholar, yada yada. That doesn’t exist and probably will never come to pass for some pretty good reasons. But saying something is just as good as delivering, I assume.

Net net: Google is now a mature company. The founders have distanced themselves from the legal troubles in which the company is mired. The company is caught in the Silicon Valley backlash. The Oracle Java thing is a Freddy Krueger thing for the GOOG. Management change is a companion to the craziness which seems to characterize some units of the company.

I wonder if a query launched from a desktop computer will return on point results in the near future. I sure hope so.

Stephen E Arnold, April 4, 2018

Hidden Webs May Be a Content Escape Hatch

March 28, 2018

Beyond Search and the Dark Cyber research team discussed a topic which raised some concern among the team. Censorship may be nudging some individuals to the hidden Webs; for example, the Dark Web, i2p, ZeroWeb, etc.

In the wake of several US school shootings, the outcry for more control over gun sales has grown louder. Many organizations have begun to distance themselves from firearms related topics; YouTube, for example, recently removed all of its firearms content. The response has created a strange subculture, as we discovered in this recent NPR story, “Restricted by YouTube, Gun Enthusiasts are Taking Their Videos to Pornhub.”

According to the story:

“InRangeTV, which has some 144,000 subscribers on its YouTube channel, has chosen to publish videos on an adult website called Pornhub…InRangeTV also recently wrote on Facebook that it is defending “Why are we seeing continuing restrictions and challenges towards content about something demonstrably legal yet not against that which is clearly illegal?” It then posted links to YouTube videos on synthesizing meth and other illicit acts.”

This is an odd place for a freedom of speech battle to take place, but not completely. It seems right in line with something Larry Flynt would have pursued. Conversely, as far right leaning content moves closer and closer to the dark web (pornography is not the dark web, but it feels like that’s the direction this is heading), the dark web is beginning to try to take down YouTube with rightwing trolling at an extreme level. What all this means for average citizens is that search is going to get more complicated, no matter what you are hunting for.

We also noted that a site dedicated to off color content has become the new home for those who are interested in weaponry. We think the shift may be gaining momentum. How does one “find” these types of content? Perhaps encrypted chat or old fashioned word of mouth messaging. Worth watching this possible shift.

Patrick Roland, March 28, 2018

Million Short: A Metasearch Option

March 22, 2018

An interview at Forbes delves into the story behind Million Short, an alternative to Google for Internet Search. As concerns grow about online privacy, information accuracy, and filter bubbles, options that grant the user more control appeal to many. Contributor Julian Mitchell interviews Million Short founder and CEO Sanjay Arora in his piece, “This Search Engine Startup Helps You Find What Google Is Missing.” Mitchell informs us:

“Founded in 2012, Million Short is an innovative search engine that takes a new and focused approach to organizing, accessing, and discovering data on the internet. The Toronto-based company aims to provide greater choices to users seeking information by magnifying the public’s access to data online. Cutting through the clutter of popular searches, most-viewed sites and sponsored suggestions, Million Short allows users to remove up to the top one million sites from the search set. Removing ‘an entire slice of the web’, the company hopes to balance the playing field for sites that may be new, suffer from poor SEO, have competitive keywords, or operate a small marketing budget. Million Short Founder and CEO Sanjay Arora shares the vision behind his company, overthrowing Google’s search engine monopoly, and his insight into the future of finding information online.”

The subsequent interview gets into details, like Arora’s original motivation for creating Million Short: search is too important to be dominated by just a few companies, he insists. The pair explores both the advantages and the challenges the company has seen, and takes a look at the future. See the article for more.
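The core idea Mitchell describes, removing the most popular sites from the result set, is easy to sketch. The Python below is an illustration under stated assumptions, not Million Short’s actual method: the `popularity_rank` table, the `remove_top_sites` name, and the example URLs are all hypothetical, and a real service would need a site ranking source of its own.

```python
from urllib.parse import urlparse


def remove_top_sites(results, popularity_rank, cutoff):
    """Drop results whose host ranks within the top `cutoff` sites.

    `popularity_rank` maps a hostname to its rank (1 = most popular).
    Hosts missing from the table are treated as unranked and are kept.
    """
    kept = []
    for url in results:
        host = urlparse(url).hostname or ""
        rank = popularity_rank.get(host)
        if rank is None or rank > cutoff:
            kept.append(url)
    return kept


# Hypothetical ranking data and result list, for illustration only.
ranks = {
    "bigsite.example": 1,
    "midsite.example": 500_000,
    "tiny.example": 2_500_000,
}
results = [
    "https://bigsite.example/a",
    "https://midsite.example/b",
    "https://tiny.example/c",
    "https://unranked.example/d",
]
print(remove_top_sites(results, ranks, cutoff=1_000_000))
```

Setting the cutoff lower, say to the top 100 sites, keeps the mid-sized sources and removes only the biggest names.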

Cynthia Murrell, March 22, 2018

Digital Antique Coca Cola Signs for Search

March 21, 2018

In a turn that is just about the most human thing we’ve ever heard, just as the world is on the cusp of an AI revolution, many are starting to look backward toward simpler times. We got a sideways glance at our fear of change from a PC Magazine story, “Download Your Entire Google Search History.”

The story is primarily about why on Earth anyone would want to see everything they have ever searched for. But it also touches on our desire for nostalgia in this lightning quick era:

“Users can now download their entire saved search history “to see a list of the terms you’ve searched for,” the company said. “This gives you access to your data when and where you want…For safety’s sake, don’t download past searches on a public computer—at the library, an Internet cafe, or even a friend’s house. Save the curiosity for home.”

This, oddly, isn’t the only place where nostalgia and AI are blending. Remember Nokia, the flip phone people? They are back and reintroducing a line of old school not-smart phones. On top of that, the company is dabbling in new tech like AI, which leads us to wonder where these two can possibly intersect. It’s an interesting move and one that will likely have antique hunters quivering.

Patrick Roland, March 21, 2018

Google: Search Civility

March 21, 2018

Among the many fake news battles organizations like Facebook and Google are fighting is one against far right racist organizations. More often than not, hate groups are more clever at exposing flaws in algorithms than most companies give them credit for. Big tech is still trying to find solutions to these issues, but the problems keep cropping up, as we learned in a recent Phys.org story, “Google Under Fire for Anti-Semitic Search Results in Sweden.”

According to the story:

“A search on Google for the Holocaust showed an anti-Semitic blog post high up containing information about Swedish Jews. With their names, pictures and occupations listed, dozens of them were described in a humiliating and threatening manner, according to local media.

“Searches for the neo-Nazi Nordic Resistance Movement’s propaganda website also appeared as news with ‘top stories from Nordfront.se.’”

This isn’t the only occasion on which algorithms have been infiltrated by offensive material. Take, for example, the story of Facebook users who typed in “Videos of…” and had their search bar autofill with live sex acts. We are clearly still a long way from social media and big search cleaning up their act, and once they do (if they do), we will then be in a controversial world of free speech violations.

What headaches will loom in the future?

Patrick Roland, March 21, 2018

Search History: Flipping That Digital Stone May Reveal Interesting Things

March 12, 2018

In a turn that is just about the most human thing we’ve ever heard, just as the world is on the cusp of an AI revolution, many are starting to look backward toward simpler times. We got a sideways glance at our fear of change from a PC Magazine story, “Download Your Entire Google Search History.”

The story is primarily about why on Earth anyone would want to see everything they have ever searched for. But it also touches on our desire for nostalgia in this lightning quick era:

“Users can now download their entire saved search history “to see a list of the terms you’ve searched for,” the company said. “This gives you access to your data when and where you want… For safety’s sake, don’t download past searches on a public computer—at the library, an Internet cafe, or even a friend’s house. Save the curiosity for home.”

A search history provides a useful pool of information about the user of Google search. Among the items of data which may be available are:

Time behavior signals; that is, when a person did searches and which topics were looked for in those time periods

Topic analysis; that is, what subjects the searcher sought and how frequently those topics were queried

Link analysis; that is, what other sites were searched when a particular site was queried
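The first two items in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor’s tooling: it assumes the history has already been converted into simple (ISO timestamp, query) pairs, which is a hypothetical format and not the layout of Google’s actual export. The function names, stopword list, and sample data are also made up for the example.

```python
from collections import Counter
from datetime import datetime

# Assumed input: (ISO 8601 timestamp, query text) pairs. This layout is
# hypothetical; a real export would need to be parsed into this form.
history = [
    ("2018-03-12T09:15:00", "cyberosint vendors"),
    ("2018-03-12T09:40:00", "dark web search"),
    ("2018-03-12T21:05:00", "cyberosint conference"),
]

STOPWORDS = frozenset({"the", "a", "of", "for", "and", "in"})


def searches_by_hour(records):
    """Time behavior: count how many searches fall in each hour of the day."""
    return Counter(datetime.fromisoformat(ts).hour for ts, _ in records)


def topic_counts(records):
    """Topic analysis: count how often each non-stopword term is queried."""
    counts = Counter()
    for _, query in records:
        counts.update(w for w in query.lower().split() if w not in STOPWORDS)
    return counts


print(searches_by_hour(history).most_common())
print(topic_counts(history).most_common(3))
```

Link analysis would require the sites visited after each query, which this simplified input does not carry, so it is left out of the sketch.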

Other useful pieces of information can be extracted from a search history. When an analyst reviews the search history of the computers used by a group of people such as those individuals working on our studies of CyberOSINT, it is possible to develop a reasonable “snapshot” or “picture” of the topics we are investigating and the particular companies whose products we are researching.

If you have not probed your search history, you might find that flipping over that digital rock may reveal some interesting insights.

Patrick Roland, March 12, 2018

A Step Forward but Museum Image Collections Remain a Search Challenge

March 8, 2018

For a few decades, art and history museums have been struggling with their online presences. The experience of seeing a JPEG of a painting or sculpture is not the same as seeing it in person. That’s true. But there is one area where museums are holding a lot of valuable data, and just now it’s starting to be searchable. We discovered this recently when we came across the Metropolitan Museum of Art’s database, “MetPublications.”

According to the page:

“MetPublications includes a description and table of contents for most titles, as well as information about the authors, reviews, awards, and links to related Met titles by author and by theme. Current book titles that are in-print may be previewed and fully searched online, with a link to purchase the book. The full contents of almost all other book titles may be read online, searched, or downloaded as a PDF.”

This includes over five hundred books about various exhibits that have spanned the last five decades. These slim volumes, usually released in conjunction with various exhibits, are fully searchable and a huge score for art lovers and historians. Previously, making them searchable was seen as too daunting and potentially impossible. As far back as 2002, Computer Weekly was bemoaning the fact that museums had missed the digital boat. Turns out museums like the Met didn’t miss the boat; it’s just that their ship sails a little more slowly than the white knuckle world of Silicon Valley. Better late than never, we say.

Patrick Roland, March 8, 2018
