
Watson Speaks Naturally

September 3, 2015

While many companies offer reasonably accurate natural language comprehension software, fully understanding the complexities of human language still eludes computers. IBM reports that it is close to overcoming the natural language barrier with IBM Watson Content Analytics, as described in the tutorial “Discover And Use Real-World Terminology With IBM Watson Content Analytics.”

The tutorial points out that any analytics program relying only on structured data loses about four fifths of the available information, a serious disadvantage in the big data era, especially when the insights are supposed to be hidden in the unstructured content. Watson Content Analytics is a search and analytics platform that uses rich text analysis to extract actionable insights from sources such as email, social media, Web content, and databases.

The Watson Content Analytics can be used in two ways:

  • “Immediately use WCA analytics views to derive quick insights from sizeable collections of contents. These views often operate on facets. Facets are significant aspects of the documents that are derived from either metadata that is already structured (for example, date, author, tags) or from concepts that are extracted from textual content.
  • Extracting entities or concepts, for use by WCA analytics view or other downstream solutions. Typical examples include mining physician or lab analysis reports to populate patient records, extracting named entities and relationships to feed investigation software, or defining a typology of sentiments that are expressed on social networks to improve statistical analysis of consumer behavior.”

The tutorial walks through a domain-specific terminology application built on Watson Content Analytics. The exercise gets quite involved, but it shows how Watson Content Analytics may reach beyond the typical big data application.
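To make the facet idea concrete, here is a minimal Python sketch of pulling facet values out of unstructured text. It is a generic illustration with an invented date pattern and drug dictionary, not the Watson Content Analytics API.

```python
# Minimal, generic sketch of deriving "facets" from unstructured text.
# NOT the Watson Content Analytics API; the facet categories and patterns
# below are invented purely for illustration.
import re
from collections import Counter

DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")        # hypothetical date facet
DRUG_TERMS = {"aspirin", "ibuprofen", "metformin"}          # hypothetical domain dictionary

def extract_facets(document: str) -> dict:
    """Pull simple facet values out of raw text."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return {
        "dates": DATE_PATTERN.findall(document),
        "drugs": [t for t in tokens if t in DRUG_TERMS],
    }

docs = [
    "Patient started metformin on 2015-03-02; previously on aspirin.",
    "Lab report 2015-04-11 notes no change to ibuprofen dosage.",
]

facet_counts = Counter()
for doc in docs:
    facet_counts.update(extract_facets(doc)["drugs"])

print(facet_counts)   # e.g. Counter({'metformin': 1, 'aspirin': 1, 'ibuprofen': 1})
```

Counting facet values across a collection is what powers the analytics views the tutorial describes; in a real deployment the dictionaries and patterns would come from the domain terminology work itself.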

Whitney Grace, September 3, 2015
Sponsored by, publisher of the CyberOSINT monograph

Suggestions for Developers to Improve Functionality for Search

September 2, 2015

The article on SiteCrafting titled Maxxcat Pro Tips lays out some guidelines for improved functionality when it comes to deep search. Limiting your Crawls is the first suggestion. Since not all links are created equal, it is wise to avoid runaway crawls on links where there will always be a “Next” button. The article suggests hand-selecting the links you want to use. The second tip is Specify Your Snippets. The article explains,

“When MaxxCAT returns search results, each result comes with four pieces of information: url, title, meta, and snippet (a preview of some of the text found at the link). By default, MaxxCAT formulates a snippet by parsing the document, extracting content, and assembling a snippet out of that content. This works well for binary documents… but for webpages you wanted to trim out the content that is repeated on every page (e.g. navigation…) so search results are as accurate as possible.”

The third suggestion is to Implement Meta-Tag Filtering. Each suggestion is followed up with step-by-step instructions. These handy tips come from a partnership between SiteCrafting, a web design company founded in 1995 by Brian Forth, and MaxxCAT, a company recognized for its achievements in high performance search since 2007.
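To picture what the snippet tip amounts to, here is a rough Python sketch of stripping repeated page furniture before building a preview. The tag list is an assumption for the example; this is not MaxxCAT's configuration mechanism.

```python
# Generic illustration of trimming boilerplate (e.g., navigation) from a web page
# before building a search-result snippet. Not MaxxCAT's actual mechanism.
from html.parser import HTMLParser

class SnippetExtractor(HTMLParser):
    SKIP_TAGS = {"nav", "header", "footer", "script", "style"}  # assumed boilerplate tags

    def __init__(self):
        super().__init__()
        self.depth_in_skipped = 0
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.depth_in_skipped += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.depth_in_skipped:
            self.depth_in_skipped -= 1

    def handle_data(self, data):
        # Keep text only when we are outside the boilerplate regions.
        if not self.depth_in_skipped and data.strip():
            self.text_parts.append(data.strip())

def make_snippet(html: str, length: int = 120) -> str:
    parser = SnippetExtractor()
    parser.feed(html)
    return " ".join(parser.text_parts)[:length]

page = "<nav>Home | About | Next</nav><p>Indexed content worth previewing.</p>"
print(make_snippet(page))   # -> "Indexed content worth previewing."
```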

Chelsea Kerwin, September 2, 2015

Sponsored by, publisher of the CyberOSINT monograph

Maverick Search and Match Platform from Exorbyte

August 31, 2015

The article titled Input Management: Exorbyte Automates the Determination of Identities, on Business On (a primarily German language website), promotes Full Page Entity Detect from Exorbyte. Exorbyte positions itself as a world leader in search and match for large volumes of data, with clients in government, insurance, input management, and ICT firms, really any business with identity resolution needs. The article stresses the importance of pulling information from masses of data in the modern office. They explain,

“With Full Page Entity Detect, Exorbyte provides a solution for inboxes that receive several million incoming documents. Identity data in the digitized correspondence (which can be used for correspondence definition) can be extracted with little effort from full-text documents such as letters and emails and efficiently compared with reference databases. The input management tool combines high fault tolerance with accuracy, speed, and flexibility. The software company from Konstanz was recently included in Gartner’s Magic Quadrant for Enterprise Search.”

The company promises that its Matchmaker technology is unrivaled at searching text without restrictions, even independent of language, allowing for more accurate search. Full Page Entity Detect is said to be particularly useful when information is missing or errors have been overlooked, since the search is so thorough.
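Identity resolution of this sort boils down to fault-tolerant matching of extracted names against a reference database. The Python sketch below, with an invented reference list and similarity threshold, illustrates the general idea; it is not Exorbyte's Matchmaker algorithm.

```python
# Generic fault-tolerant name matching against a reference list.
# Illustrative only; not Exorbyte's Matchmaker technology.
from difflib import SequenceMatcher

REFERENCE_RECORDS = [          # assumed reference database rows
    {"id": 1, "name": "Johann Schmidt"},
    {"id": 2, "name": "Maria Keller"},
]

def best_match(extracted_name: str, threshold: float = 0.8):
    """Return the reference record most similar to the extracted name, if any."""
    scored = [
        (SequenceMatcher(None, extracted_name.lower(), rec["name"].lower()).ratio(), rec)
        for rec in REFERENCE_RECORDS
    ]
    score, record = max(scored, key=lambda pair: pair[0])
    return record if score >= threshold else None

# A misspelled name pulled from a scanned letter still resolves to record 1.
print(best_match("Johan Schmitd"))
```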

Chelsea Kerwin, August 31, 2015

Sponsored by, publisher of the CyberOSINT monograph

Beyond Google, How to Work Your Search Engine

August 28, 2015

The article on Funnelback titled Five Ways to Improve Your Website Search offers tips that may seem obvious but can always stand to be reinforced. Sometimes the Google site:<url> operator is not enough. The first tip, for example, is simply to be helpful. That means recognizing synonyms and perhaps adding an autocomplete function in case your site users think in different terms than you do. The worst-case scenario in search is typing in a term and getting no results, especially when the problem is just language and the thing being searched for is actually present, just not found. The article goes into the importance of the personal touch as well,

“You can use more than just the user’s search term to inform the results your search engine delivers… For example, if you search for ‘open day’ on a university website, it might be more appropriate to promote and display an ‘International Open Day’ event result to prospective international students instead of your ‘Domestic Student Open Day’ counterpart event. This change in search behavior could be determined by the user’s location – even if it wasn’t part of their original search query.”

The article also suggests learning from the search engine. Obviously, analyzing what customers are most likely to search for on your website will tell you a lot about what sort of marketing is working, and what sort of customers you are attracting. Don’t underestimate search.
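The "be helpful" advice usually comes down to synonym expansion at query time. Here is a minimal Python sketch with an invented synonym table; it shows the general idea, not Funnelback's actual feature.

```python
# Minimal query-time synonym expansion. The synonym table is invented for
# illustration; a real site would curate it from its own query logs.
SYNONYMS = {
    "open day": {"open day", "information day", "campus visit"},
    "fees": {"fees", "tuition", "cost"},
}

def expand_query(query: str) -> set:
    """Return the original query plus any configured synonyms."""
    q = query.lower().strip()
    return SYNONYMS.get(q, {q})

def search(query: str, documents: list) -> list:
    """Match documents containing any expanded form of the query."""
    terms = expand_query(query)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]

docs = ["International Information Day for prospective students", "Library opening hours"]
print(search("open day", docs))   # finds the first document despite different wording
```

Mining the site's own query logs for zero-result searches is the natural way to grow a table like this, which is exactly the "learn from the search engine" point above.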

Chelsea Kerwin, August 28, 2015

Sponsored by, publisher of the CyberOSINT monograph

Lexmark: Signs of Trouble?

August 27, 2015

I read “Shares of Lexmark International Inc. Sees Large Outflow of Money.”

The main point of the write up in my opinion was:

The company shares have dropped 41.65% in the past 52 Weeks. On August 25, 2014 The shares registered one year high of $50.63 and one year low was seen on August 21, 2015 at $29.11.

Today as I write this (August 26, 2015), Lexmark is trading at $28.25.

Why do I care?

The company acquired several search and content processing systems in an effort to find a replacement for its traditional printer business. As you know, Lexmark is a former IBM unit that had an opportunity to find its future outside of IBM.

The company purchased three vendors which were among the companies I monitored:

  • Brainware, the trigram folks
  • ISYS Search Software, the old school search and retrieval system dating from 1988
  • Kapow (via Lexmark’s purchase of Kofax), the data normalization outfit.

Also, the company’s headquarters are about an hour from my cabin next to the pond filled with mine runoff. Cutbacks at Lexmark may spell more mobile homes in my neck of the woods.

Stephen E Arnold, August 27, 2015

Insights into the Cut and Paste Coding Crowd

August 26, 2015

I read “How Developers Search for Code.” Interesting. The write up points out what I have observed. Programmers search for existing — wait for it — code.

Why write something when there are wonderful snippets to recycle. Here’s the paragraph I highlighted:

We also learn that a search session is generally just one to two minutes in length and involves just one to two queries and one to two file clicks.

Yep, very researchy. Very detailed. Very shallow. Little wonder that most software rolls out in endless waves of fixes. Good enough is the six sigma way, it seems.

Encouraging. Now why did that air traffic control system crash happen? Where are the backups to the data in Google’s Belgium data center? Why does that wonderful Windows 10 suck down data to mobile devices with little regard for data caps? Why does malware surface in Android apps?

Good enough: the new approach to software QA/QC.

Stephen E Arnold, August 26, 2015

How to Search the Ashley-Madison Data and Discover If You Had an Affair Too

August 26, 2015

If you haven’t heard about the affair-promoting website Ashley Madison’s data breach, you might want to crawl out from under that rock and learn about the millions of email addresses exposed by hackers to be linked to the infidelity site. In spite of claims by parent company Avid Life Media that users’ discretion was secure, and that the servers were “kind of untouchable,” as many as 37 million customers have been exposed. Perhaps unsurprisingly, a huge number of government and military personnel have been found on the list. The article on Reuters titled Hacker’s Ashley Madison Data Dump Threatens Marriages, Reputations also mentions that the dump has divorce lawyers clicking their heels with glee at their good luck. As for the motivation of the hackers? The article explains,

“The hackers’ move to identify members of the marital cheating website appeared aimed at maximum damage to the company, which also runs websites such as, causing public embarrassment to its members, rather than financial gain. “Find yourself in here?,” said the group, which calls itself the Impact Team, in a statement alongside the data dump. “It was [Avid Life Media] that failed you and lied to you. Prosecute them and claim damages. Then move on with your life. Learn your lesson and make amends. Embarrassing now, but you’ll get over it.”

If you would like to “find yourself” or at least check to see if any of your email addresses are part of the data dump, you are able to do so. The original data was put on the dark web, which is not easily accessible for most people. But the website Trustify lets people search for themselves and their partners to see if they were part of the scandal. The website states,

“Many people will face embarrassment, professional problems, and even divorce when their private details were exposed. Enter your email address (or the email address of your spouse) to see if your sexual preferences and other information was exposed on Ashley Madison or Adult Friend Finder. Please note that an email will be sent to this address.”

It’s also important to keep in mind that many of the email accounts registered to Ashley Madison seem to be stolen. However, the ability to search the data has already yielded some embarrassment for public officials and, of course, “family values” activist Josh Duggar. The article on the Daily Mail titled Names of 37 Million Cheating Spouses Are Leaked Online: Hackers Dump Huge Data File Revealing Clients of Adultery Website Ashley Madison- Including Bankers, UN and Vatican Staff goes into great detail about the company, the owners (married couple Noel and Amanda Biderman) and how hackers took it upon themselves to be the moral police of the internet. But the article also mentions,

“Ashley Madison’s sign-up process does not require verification of an email address to set up an account. This means addresses might have been used by others, and doesn’t prove that person used the site themselves.”

Some people are already claiming that they had never heard of Ashley Madison in spite of their emails being included in the data dump. Meanwhile, the Errata Security Blog entry titled Notes on the Ashley-Madison Dump defends the cybersecurity of Ashley Madison. The article says,

“They tokenized credit card transactions and didn’t store full credit card numbers. They hashed passwords correctly with bcrypt. They stored email addresses and passwords in separate tables, to make grabbing them (slightly) harder. Thus, this hasn’t become a massive breach of passwords and credit-card numbers that other large breaches have lead to. They deserve praise for this.”

Praise for this, if for nothing else. The impact of this data breach is still only beginning, with millions of marriages and reputations in the most immediate trouble, and the public perception of the cloud and cybersecurity close behind.
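For readers curious about the password point, the hashing pattern Errata Security credits the site with looks roughly like this in Python, assuming the third-party bcrypt package is installed. It is a generic example, not Ashley Madison's code.

```python
# Hashing and verifying a password with bcrypt (requires `pip install bcrypt`).
# A generic example of the practice credited above, not the site's actual code.
import bcrypt

password = b"correct horse battery staple"

# gensalt() embeds a per-password salt and a configurable work factor.
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))
print(hashed)   # e.g. b'$2b$12$...'; only this hash is stored, never the password

# Verification re-hashes the candidate with the stored salt and compares.
print(bcrypt.checkpw(password, hashed))          # True
print(bcrypt.checkpw(b"wrong guess", hashed))    # False
```

Because bcrypt is deliberately slow and salted per password, a dump of hashes is far less useful to attackers than a dump of plaintext or unsalted hashes would have been.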


Chelsea Kerwin, August 26, 2015

Sponsored by, publisher of the CyberOSINT monograph

SLI Share Price: Headwinds for Search Evident

August 26, 2015

I read “SLI CEO Ryan Bemoans Low Share price, Says It Should Be $2-Plus.” This is a woulda, coulda, shoulda write up. Reality seems to ignore this somewhat lame mantra.

The write up says:

SLI Systems chief executive Shaun Ryan says the company’s share price is “significantly underpriced” and could be at least four times higher based on other public software-as-a-service valuations.

The write up included this bit of information:

The company today reported a loss of $7.1 million in the year ended June 30, widening from a loss of $5.7 million a year earlier. Operating revenue increased 27 percent to $28.1 million, in line with the $28 million guidance given in April, when it flagged that second-half sales would be lower than expected. Annualized recurring revenue (ARR), its preferred financial measure based on forward subscription revenue, rose 39 percent to $34.6 million.

SLI says its system

… helps you increase e-commerce revenue by connecting your online and mobile shoppers with the products they’re most likely to buy. SLI solutions include SaaS-based learning search, navigation, merchandising, mobile, recommendations and user-generated SEO.

Other publicly traded search vendors are struggling with their financial performance too. For example, Sprylogics, a Canadian vendor, sees its shares trading at $0.33. Lexmark shares are at $28 and change.

Search is a tough niche as Hewlett Packard and IBM are learning.

Stephen E Arnold, August 29, 2015

What Might be Left Out of SharePoint 2016

August 25, 2015

When a new version of any major software is released, users get nervous as to whether their favorite features will continue to be supported or will be phased out. Deprecation is the process of phasing out certain components, and users are warily eyeing SharePoint Server 2016. Read all the details in the Search Content Management article, “Where Can We Expect Deprecation in SharePoint 2016?”

The article begins:

“New versions of Microsoft products always include a variety of additional tools and capabilities, but the flip side of updating software is that familiar features are retired or deprecated. We can expect some changes with SharePoint 2016.”

While Microsoft has yet to officially release the list of what will make the cut and what will be deprecated, it has made it known that InfoPath is being let go. To stay on top of future developments as they happen, stay tuned. Stephen E. Arnold has made a lifetime career out of all things search, and he lends his expertise to SharePoint on a dedicated feed. It is a great resource for SharePoint tips and tricks at a glance.

Emily Rae Aldridge, August 25, 2015

Sponsored by, publisher of the CyberOSINT monograph

Insights Into SharePoint 2013 Search

August 25, 2015

It has been a while since we have discussed SharePoint 2013 and enterprise search. Upon reading “SharePoint 2013: Some Observations On Enterprise Search” from Steven Van de Craen’s blog, we noticed some new insights into how users can locate information on the collaborative content platform.

The first item he brings to our attention is the “content source,” an out-of-the-box managed property that can be used to create result sources which aggregate content from particular content sources, that is, different storehouses within SharePoint. A content source can also become a crawled property. Meta elements from Web pages built on SharePoint can be added as crawled properties and made searchable:

“After crawling this Web site with SharePoint 2013 Search it will create (if new) or use (if existing) a Crawled Property and store the content from the meta element. The Crawled Property can then be mapped to Managed Properties to return, filter or sort query results.”

Another useful option was made possible by a user request: adding query string parameters to crawled properties. This allows more information to be captured in the search index. Unfortunately, this option is not available out of the box; it has to be implemented with content enrichment.
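As a language-neutral illustration of that enrichment idea (not the actual SharePoint content enrichment API), the Python sketch below parses query string parameters from a crawled URL so they could be mapped to additional properties. In an actual deployment, this logic would live in the content enrichment callout Van de Craen describes.

```python
# Conceptual illustration of enriching a crawled item with values parsed from
# its URL query string. Not the SharePoint content enrichment API; the property
# names here are invented for the example.
from urllib.parse import urlparse, parse_qs

def enrich_item(item: dict) -> dict:
    """Add query string parameters from the item's URL as extra properties."""
    query = parse_qs(urlparse(item["url"]).query)
    enriched = dict(item)
    for key, values in query.items():
        enriched[f"qs_{key}"] = values[0]   # keep the first value per parameter
    return enriched

crawled = {"url": "https://intranet.example/pages/report.aspx?region=emea&year=2015",
           "title": "Quarterly report"}
print(enrich_item(crawled))
# {'url': ..., 'title': 'Quarterly report', 'qs_region': 'emea', 'qs_year': '2015'}
```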

Enterprise search on SharePoint 2013 still needs to be tweaked and fine-tuned, especially as users’ search demands become more complex. It makes us wonder when Microsoft will release the next SharePoint installment and whether the next upgrade will resolve some of these issues or unleash a brand new slew of problems. We cannot wait for that can of worms.

Whitney Grace, August 25, 2015
Sponsored by, publisher of the CyberOSINT monograph

