
Watson Goes Blekko

March 28, 2015

I read “Goodbye Blekko: Search Engine Joins IBM’s Watson Team.” According to the write up, “Blekko’s home page says its team and technology are now part of IBM’s Watson technology.” I would not have known this. I do not use the service. I wrestled with the implementation of Blekko on a news service and then wondered whether Yandex was serious about the company. Bottom line: Blekko is not one of my go-to search systems, and I don’t cover it in my Alternatives to Google lectures for law enforcement and intelligence professionals.

The write up asserts:

Blekko came out of stealth in 2008 with Skrenta promising to create a search engine with “algorithmic editorial differentiation” compared to Google. Its public search engine finally opened in 2010, launching with what the site called “slashtags” — a personalization and filtering tool that gave users control over the sites they saw in Blekko’s search results.

Another search system becomes part of the puzzling Watson service. How many information access systems does IBM require to make Watson a billion-dollar revenue generator, or at least robust enough to pay the rent on the Union Square offices?

IBM “owns” the Clementine system, which arrived with the SPSS purchase. IBM owns Vivisimo (which morphed into a Big Data system in the acquisition news release), iPhrase, and the wonky search functions in DB2. Somewhere along the line, IBM snagged the Illustra system. From its own labs, IBM has Web Fountain. There is the decades-old STAIRS system, which may still be available as Service Master. And, of course, there is the Lucene system, which provides the dray animals for Watson. Whew. That is a wealth of information access technology, and I am not sure the list is comprehensive.

My point is that Blekko and its razzle dazzle assertions now have to provide something that delivers a payoff for IBM. On the other hand, maybe IBM Watson executives are buying technology in the hope that one of the people “acqui-hired” or the newly bought zeros and ones will generate massive cash flows.

Watson has morphed from a question answering game show winner into all manner of fantastic information processing capabilities. For me, Watson is an example of what happens when a lack of focus blends with money, executive compensation schemes, and a struggling $100 billion outfit.

Lots of smoke. Not much revenue fire. Stakeholders hope it will change. I am looking forward to a semantically enriched recipe for barbeque sauce that includes tamarind and other spices not available in Harrod’s Creek, Kentucky. Yummy. A tasty addition to the quarterly review menu: Blekko with revenue and a piquant profit sauce.

Perhaps IBM will next acquire Pertimm and the Qwant search system, which terrifies Eric Schmidt? Surprises ahead. I prefer profitable, sustainable revenues, however.

Stephen E Arnold, March 28, 2015

Semantic Search Becomes Search Engine Optimization: That Is Going to Improve Relevance

March 27, 2015

I read “The Rapid Evolution of Semantic Search.” It must be my age or the fact that it is cold in Harrod’s Creek, Kentucky, this morning. The write up purports to deliver “an overview of the history of semantic search and what this means for marketers moving forward.” I like that moving forward stuff. It reminds me of Project Runway’s “fashion forward.”

The write up includes a wonky graphic that equates, via an arrow, Big Data with metadata, volume, smart content, petabytes, data analysis, vast, structured, and framework. Big Data is a cloud with five little arrows pointing down. Does this mean Big Data is pouring from the sky like yesterday’s chilling rain?

The history of the Semantic Web begins in 1998. Let’s see: that is 17 years ago. The milestone, in the context of the article, is the report “Semantic Web Road Map.” I learned that Google was less than a month old at the time. I thought that Google was Backrub and that the work on what was named Google began a couple, maybe three, years earlier. Who cares?

The Big Idea is that the Web is an information space. That sounds good.

Well, in 2012, something Big happened. According to the write up, Google figured out that 20 percent of its searches were “new.” Aren’t those pesky humans annoying? The article reports:

long tail keywords made up approximately 70 percent of all searches. What this told Google was that users were becoming interested in using their search engine as a tool for answering questions and solving problems, not just looking up facts and finding individual websites. Instead of typing “Los Angeles weather,” people started searching “Los Angeles hourly weather for March 1.” While that’s an extremely simplified explanation, the fact is that Google, Bing, Facebook, and other internet leaders have been working on what Colin Jeavons calls “the silent semantic revolution” for years now. Bing launched Satori, a knowledge storehouse that’s capable of understanding complex relationships between people, things, and entities. Facebook built Knowledge Graph, which reveals additional information about things you search, based on Google’s complex semantic algorithm called Hummingbird.

Yep, a new age dawned. The message in the article is that marketers have a great new opportunity to push their message in front of users. In my book, this is one reason why running a query on any of the ad supported Web search engines returns so much irrelevant information. In my just-submitted Information Today column, I report how a query for the phrase “concept searching” returned results littered with a vendor’s marketing hoo-hah.

I did not want information about a vendor. I wanted information about a concept. But, alas, Google knows what I want. I don’t know what I want in the brave new world of search. The article ignores the lack of relevance in results, the dust binning of precision and recall, and the bogus information many search queries generate. Try to find current information about Dark Web onion sites and let me know how helpful the search systems are. In fact, name the top TOR search engines. See how far you get with Bing, Google, and Yandex. (DuckDuckGo and Ixquick seem to be aware of TOR content, by the way.)

So semantic in the context of this article boils down to four points:

  1. Think like an end user. I suppose one should not try to locate an explanation of “concept searching.” I guess Google knows I care about a company with a quite narrow set of technology focused on SharePoint.
  2. Invest in semantic markup. Okay, that will make sense to the content marketers. What if the system used to generate the content does not support the nifty features of the Semantic Web? OWL, who? RDF what? (See the sketch after this list.)
  3. Do social. Okay, that’s useful. Facebook and Twitter are the go-to systems for marketing products, I assume. Who on Facebook cares about cyber OSINT or GE’s cratering petrochemical business?
  4. And the keeper, “Don’t forget about standard techniques.” This means search engine optimization. That SEO stuff is designed to make relevance irrelevant. Great idea.
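
Point two deserves a concrete illustration. Here is a minimal sketch, assuming a page template that lets you inject a schema.org JSON-LD block; the headline, author, and topic values are hypothetical, and this is generic semantic markup, not a recipe endorsed by the article:

```python
import json

# Hypothetical article metadata; in a real deployment this would come
# from the content management system.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What Is Concept Searching?",          # made-up headline
    "author": {"@type": "Person", "name": "J. Doe"},   # made-up author
    "datePublished": "2015-03-27",
    "about": "concept searching",
}

# Emit the block a page template would embed in its <head> so search
# engines can read the page's semantics.
print('<script type="application/ld+json">')
print(json.dumps(article, indent=2))
print("</script>")
```

Search engines that honor schema.org markup can use fields like these to build richer result snippets; a content system with no OWL or RDF support can still emit JSON-LD this way.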

Net net: The write up underscores some of the issues associated with generating buzz for a small business like the ones INC Magazine tries to serve. With write ups like this one about Semantic Search, INC may be confusing its core constituency. Can confused executives close deals and make sense of INC articles? I assume so. I know I cannot.

Stephen E Arnold, March 27, 2015

An Incomplete History of the Semantic Web

March 3, 2015

The article on the blog Realizing Semantic Web titled “Semantic Web – Story So Far” explores where exactly credit is due for the current state of Semantic Web technology. The author notes that as of 2004, there were very few tools for developers interested in investing time and money. Between then and 2010, quite a leap forward took place, with major improvements in the standards and practices of Semantic Web technology. The article aims to acknowledge the people and companies that did the most important work. The list includes:

Tim Berners Lee for believing when we all thought Semantic web might not work and will be another AI failure. And of course for his work at the W3C. James Hendler – in addition to his continued work on Semantic Web, for coming up with gems such as the definition of Semantics/Linked Data Cloud that is most effective….DBPedia & Linked Data Cloud…OWL/RDF/SKOS…Google Refine and similar efforts…BBC & other case studies…”

This list does, however, still seem incomplete and somewhat partial. The author even suggests that more input might be needed, but he only allows for two or so more additions. Is this an accurate reflection of the development of the Semantic Web?
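
For readers who have not poked at the Linked Data Cloud the author credits, here is a minimal sketch of querying DBpedia’s public SPARQL endpoint with the SPARQLWrapper library; the endpoint and resource URIs are real, but treat this as an illustration, since public endpoints change and throttle:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Ask DBpedia, one of the Linked Data milestones named above, for the
# English abstract of its Semantic Web resource.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?abstract WHERE {
      <http://dbpedia.org/resource/Semantic_Web> dbo:abstract ?abstract .
      FILTER (lang(?abstract) = "en")
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["abstract"]["value"][:200])
```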

Chelsea Kerwin, March 03, 2015

Sponsored by ArnoldIT.com, developer of Augmentext

Whatever Happened to Social Search?

January 7, 2015

Social search was supposed to integrate social media and regular semantic search to create a seamless flow of information. It was one of the major selling points of search for a while, yet it has not come to fruition. So what happened? TechCrunch reports that it is “Good Riddance To Social Search,” and with good reason: the combination only cluttered up search results.

TechCrunch explains that Google tried Social Search back in 2009, using its regular search engine and, later, Google+. Now the search engine mogul is not putting forth much effort to promote social search. Bing tried something similar by adding more social media features, but they are not present in most of its search results today.

Why did this endeavor fail?

“I think one of the reasons social search failed is because our social media “friendships” don’t actually represent our real-life tastes all that well. Just because we follow people on Twitter or are friends with old high school classmates on Facebook doesn’t mean we like the same restaurants they do or share the politics they do. At the end of the day, I’m more likely to trust an overall score on Yelp, for example, than a single person’s recommendation.”

It makes sense considering how many people find their social media feeds filled with too much noise. Having search results free of that noise makes them more accurate and helpful to users.

Whitney Grace, January 07, 2015
Sponsored by ArnoldIT.com, developer of Augmentext

Patent Search Needs to Be Semantic

December 31, 2014

An article published on Innography called “Advanced Patent Search” brings to attention how default search software might miss important search results, especially if one is researching patents. It points out that some patents are purposefully phrased to hide their meaning and relevance and slip under the radar.

Deeper into the article, it transforms into a press release highlighting Innography’s semantic patent search. It highlights how the software searches through descriptive text such as product descriptions, keywords, and patent abstracts. That is not anything too exciting, but this is what makes the software more innovative:

“Innography provides fast and comprehensive metadata analysis as another method to find related patents. For example, there are several “one-click” analyses from a selected patent – classification analysis, citation mining, invalidation, and infringement – with a user-selected similarity threshold to refine the analyses as desired. The most powerful and complete analyses utilize all three methods – keyword search, semantic search, and metadata analysis – to ensure finding the most relevant patents and intellectual property to analyze further.”
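
To make the three-method idea concrete, here is a toy sketch that blends a keyword score, a crude semantic similarity score, and a metadata signal; the scoring functions, weights, and sample records are invented for illustration and are not Innography’s algorithms:

```python
from difflib import SequenceMatcher

def keyword_score(query: str, text: str) -> float:
    # Fraction of query terms that appear verbatim in the text.
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split())) / max(len(terms), 1)

def semantic_score(query: str, text: str) -> float:
    # Crude stand-in for real embedding or concept similarity.
    return SequenceMatcher(None, query.lower(), text.lower()).ratio()

def metadata_score(patent: dict, query_class: str) -> float:
    # Reward patents filed under the same classification code.
    return 1.0 if patent.get("classification") == query_class else 0.0

def blended_score(query: str, query_class: str, patent: dict) -> float:
    # Hypothetical weights; a production system would tune these.
    return (0.4 * keyword_score(query, patent["abstract"])
            + 0.4 * semantic_score(query, patent["abstract"])
            + 0.2 * metadata_score(patent, query_class))

patents = [
    {"abstract": "method for ranking search results", "classification": "G06F"},
    {"abstract": "apparatus for brewing coffee", "classification": "A47J"},
]
ranked = sorted(patents,
                key=lambda p: blended_score("search ranking", "G06F", p),
                reverse=True)
print(ranked[0]["abstract"])
```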

Innography’s patent search serves as an example of how search software needs to compete with comparable products. A simple search is not enough anymore, not in the world of big data. Users demand analytics, insights, infographics, ease of use, and accurate results.

Whitney Grace, December 31, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Drop Everything and Learn These New Tips for Semantic Search

December 31, 2014

IT developers are searching for new ways to manipulate semantic search, but according to Search Engine Journal in “12 Things You Need To Do For Semantic Search” they are all trying to figure out what the user wants. The article offers twelve tips to get back to basics and use semantic search as a tool to drive user adoption.

Some of the tips are quite obvious, such as think like a user, optimize for SEO, and harness social media and local resources. Making a Web site stand out requires taking the obvious tips a bit further. The article recommends that it is time to learn more about the Google Knowledge Graph and how it applies to your industry. Schema markup is also important, because search engines rely on it for richer results and it shapes how users see your site in a search engine.

Here is some advice on future-proofing your site:

“Work out how your site can answer questions and provide users with information that doesn’t just read like terms and conditions. Pick the topics, services and niches that apply to your site and start to optimize your site and your content in a way that will benefit users. Users will never stop searching using specific questions, but search engines are actively encouraging them to ask a question or solve a problem so get your services out there by meeting user needs.”

More tips include seeing how results are viewed on search engines other than Google, keeping up with trends, befriending a thesaurus, and being aware that semantic search requires A LOT of work.
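
The thesaurus tip, at least, is easy to act on programmatically. Here is a minimal sketch using WordNet via the NLTK library to expand a query term with synonyms; it is an illustration of the idea, not a technique from the article:

```python
from nltk.corpus import wordnet as wn  # run nltk.download("wordnet") once first

def synonyms(term: str) -> set:
    # Collect lemma names from every WordNet synset for the term.
    names = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            names.add(lemma.name().replace("_", " "))
    names.discard(term)
    return names

# Expand a query term before building a broader search query.
print(synonyms("search"))
```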

Whitney Grace, December 31, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Temis Attends University

December 30, 2014

Despite budget cuts for print materials in academic research, higher education is clamoring for more digital content. You do not need Google Translate to understand that means more revenue for companies in that industry. Virtual Strategy writes that someone wants in on the money: “With Luxid Content Enrichment Platform, Cairn.info Automates The Extraction Of Bibliographic References And The Linking To Corresponding Article.”

Temis, an industry leader in semantic content enrichment solutions for the enterprise, has signed a license and service agreement with Cairn.info. Cairn.info is a publishing portal for the social sciences and humanities, providing students with access to the usual research fare.

Taking note of the changes in academic research, Cairn.info wants to upgrade its digital records for a more seamless user experience:

“To make its collection easier to navigate, and ahead of the introduction of an additional 20.000 books which will consolidate its role of reference SSH portal, Cairn.info decided to enhance the interconnectedness of SSH publications with semantic enrichment. Indeed, the body of SSH articles often features embedded bibliographic references that don’t include actual links to the target document. Cairn.info therefore chose to exploit the Luxid® Content Enrichment Platform, driven by a customized annotator (Skill Cartridge®), to automatically identify, extract, and normalize these bibliographic references and to link articles to the documents they refer to.”
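
As a rough illustration of the identify-extract-normalize step described in the quote, here is a toy sketch; the regular expression and citation-key scheme are simplistic stand-ins, not the rules in Temis’s Skill Cartridge:

```python
import re

# Match simple author-year references such as "Bourdieu, P. (1984)".
REF_PATTERN = re.compile(
    r"(?P<author>[A-Z][a-z]+),\s+(?P<initial>[A-Z])\.\s+\((?P<year>\d{4})\)"
)

text = ("As argued by Bourdieu, P. (1984), taste is socially conditioned; "
        "see also Latour, B. (2005) for a related framework.")

for match in REF_PATTERN.finditer(text):
    # Normalize to a citation key that could be linked to a target record.
    key = f"{match.group('author').lower()}{match.group('year')}"
    print(key, "->", match.group(0))
```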

A round of applause for Cairn.info for realizing that making research easier will encourage more students to use its services. If only academic databases would take ease of use into consideration and upgrade their UI dashboards.

Whitney Grace, December 30, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Lexalytics Positions Semantria in Europe

December 12, 2014

Analytics outfit Lexalytics is going all-in on their European expansion. The write-up, “Lexalytics Expands International Presence: Launches Pain-Free Text Mining Customization” at Virtual-Strategy Magazine tells us that the company has boosted the language capacity of their recently acquired Semantria platform. The text-analytics and sentiment-analysis platform now includes Japanese, Arabic, Malay, and Russian in its supported-language list, which already included English, French, German, Chinese, Spanish, Portuguese, Italian, and Korean.
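
For readers unfamiliar with hosted text-analytics platforms, here is a minimal sketch of what calling such a service typically looks like; the endpoint URL, authentication header, and response fields are hypothetical placeholders, not Semantria’s documented API:

```python
import requests

API_URL = "https://api.example.com/v1/sentiment"  # hypothetical endpoint
API_KEY = "YOUR_KEY"                              # placeholder credential

# Documents in two of the supported languages.
docs = [
    {"id": "1", "text": "The new release is fast and stable.", "language": "en"},
    {"id": "2", "text": "Le support était décevant.", "language": "fr"},
]

response = requests.post(
    API_URL,
    json={"documents": docs},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
response.raise_for_status()

# Hypothetical response shape: one sentiment score per document.
for doc in response.json()["documents"]:
    print(doc["id"], doc["sentiment_score"])
```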

Lexalytics is also setting up servers in Europe. Because of upcoming changes to EU privacy law, we’re told companies will soon be prohibited from passing data into the U.S. Thanks to these new servers, European clients will be able to use Semantria’s cloud services without running afoul of the law.

Last summer, the company courted Europeans’ attention by becoming a sponsor of the 2014 Enterprise Hackathon in Prague. The press release tells us:

“All participants of the Hackathon were granted unlimited access and support to the Semantria API during the event. Nearly every team tried Semantria during the 36 hours they had to build a program that could crunch enough data to be used at the enterprise level. Redmore says, “We love innovative, quick development events, and are always looking for good events to support. Please contact us if you have a hackathon where you can use the power of our text mining solutions, and we’ll talk about hooking you up!”

Lexalytics is proud to have been the first to offer sentiment analysis, auto theme detection, and Wikipedia integration. Designed to integrate with third-party applications, their text analysis software chugs along in the background at many data-related organizations. Founded in 2003, Lexalytics is headquartered in Amherst, Massachusetts.

Cynthia Murrell, December 12, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

A Plan for Achieving ROI via Text Analytics

December 6, 2014

ROI is the end goal for many big data and enterprise-related projects, and it is refreshing to see information published on whether companies achieve it, as we recently saw in a Smart Data Collective article, “Text Analytics, Big Data and the Keys to ROI.” According to a study released last year (discussed further in “Text/Content Analytics 2011: User Perspectives on Solutions and Providers”), the reason many businesses do not get positive returns has to do with the planning phase. Many report that they did not start with a clear plan to get there.

The author shares with us an example from his full-time work in text analytics. One of his clients, focused on sifting through masses of social media data and data from government applications in search of suspicious activity, needed a solution for a text-heavy application. The author responded by suggesting a selective cross-lingual process, one which worked with the text in its native language, and only on the text relevant to the topic of interest.

The following happened after the author’s suggestion:

Although he seemed to appreciate the logic of my suggestions and the quality benefits of avoiding translation, he just didn’t want to deal with a new approach. He asked to just translate everything and analyze later – as many people do. But I felt strongly that he’d be spending more and getting weaker results. So, I gave him two quotes. One for translating everything first and analyzing later – his way, and one for the cross-lingual approach that I recommended. When he saw that his own plan was going to cost over a million dollars more, he quickly became very open minded about exploring a new approach.
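
The arithmetic behind the two quotes is easy to sketch. The numbers below are hypothetical placeholders, not figures from the article, but they show how filtering in the native language before translating can swing a bid by seven figures:

```python
# Back-of-the-envelope cost model for the two plans in the anecdote.
DOCS = 2_000_000                  # hypothetical corpus size
WORDS_PER_DOC = 300               # hypothetical average length
COST_PER_WORD = 0.002             # hypothetical translation cost, dollars
RELEVANT_FRACTION = 0.05          # native-language filter keeps 5%

translate_everything = DOCS * WORDS_PER_DOC * COST_PER_WORD
selective = DOCS * WORDS_PER_DOC * RELEVANT_FRACTION * COST_PER_WORD

print(f"Translate-first plan: ${translate_everything:,.0f}")
print(f"Selective cross-lingual plan: ${selective:,.0f}")
print(f"Difference: ${translate_everything - selective:,.0f}")
```

With these made-up inputs, translating everything costs $1,200,000 against $60,000 for the selective approach, which is the order of magnitude the anecdote describes.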

It sounds like the author could have suggested any number of similar semantic processing solutions. For example, the Cogito Intelligence API enhances the ability to decipher meaning and insights from a multitude of content sources, including social media and unstructured corporate data. The point is that ROI is out there, and innovative companies like Expert System and beyond are enabling it.

Megan Feil, December 6, 2014

OntoText Expands into North America with Strategic Hires

December 3, 2014

The article titled “Semantic Technology Provider Ontotext Announces Strategic Hires for Ontotext USA” on PRWeb discusses the expansion of Ontotext in North America. Tony Agresta, Brad Bogle, and Tom Endyke have joined Ontotext as Senior VP of Worldwide Sales, Director of Marketing, and Director of Solutions Architecture, respectively. Ontotext, the semantic search and text-mining leader, has laid out several main focuses for the near future, including the growth of worldwide marketing efforts and the development of relationships. The article quotes Tony Agresta on Ontotext’s product development:

“Our flagship product, GraphDB™ (formerly OWLIM) has been deployed across the globe and is widely known as a highly scalable enterprise RDF triplestore… But what makes Ontotext truly unique are three other essential elements: 1) a full complement of semantic enrichment, integration, curation and authoring tools that extend our platform approach, 2) a large critical mass of semantic engineers, professional services and support teams that represent the most experienced professionals in the world and 3) S4, the Self Service Semantic Suite.”
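
For the curious, the “RDF triplestore” jargon is less exotic than it sounds. Here is a minimal sketch using the rdflib library to build and serialize a few triples of the kind a store such as GraphDB holds; the namespace and statements are made up for illustration:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)

# Three subject-predicate-object statements, the atoms of any triplestore.
g.add((EX.GraphDB, RDF.type, EX.TripleStore))
g.add((EX.GraphDB, RDFS.label, Literal("GraphDB (formerly OWLIM)")))
g.add((EX.Ontotext, EX.develops, EX.GraphDB))

# Serialize as Turtle, the usual human-readable RDF syntax.
print(g.serialize(format="turtle"))
```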

Ontotext has provided semantic solutions for such companies as the BBC, AstraZeneca, John Wiley & Sons, and The British Museum. Its recent expansion efforts in North America are an attempt to reach more semantic technology users on this continent.

Chelsea Kerwin, December 03, 2014

Sponsored by ArnoldIT.com, developer of Augmentext
