Content Matching Helps Police Bust Dark Web Sex Trafficking Ring

September 4, 2015

The Dark Web is used not only to buy and sell illegal drugs but also to perpetuate sex trafficking, especially of children. The work of law enforcement agencies to prevent the abuse of sex trafficking victims is detailed in a report by the Australian Broadcasting Corporation called “Secret ‘Dark Net’ Operation Saves Scores Of Children From Abuse; Ringleader Shannon McCoole Behind Bars After Police Take Over Child Porn Site.” For ten months, Argos, the Queensland police anti-pedophile taskforce, tracked usage on an Internet bulletin board with 45,000 members who viewed and uploaded child pornography.

The Dark Web is notorious for encrypting user information, and that is one of its main draws: users can conduct business or other illegal activities, such as viewing child pornography, without fear of retribution. Even the Dark Web, however, leaves a digital trail, and Argos was able to track down the Web site’s administrator. It turned out the administrator was an Australian childcare worker, who has since been sentenced to 35 years in jail for sexually abusing seven children in his care and sharing child pornography.

Argos was able to catch the perpetrator by noticing patterns in his language usage in posts he made to the bulletin board (he used the greeting “hiya”). Using advanced search techniques, the police sifted through results and narrowed them down to a Facebook page and a photograph.  From the Facebook page, they got the administrator’s name and made an arrest.

After arresting the ringleader, Argos took over the community and started to track down the rest of the users.

“‘Phase two was to take over the network, assume control of the network, try to identify as many of the key administrators as we could and remove them,’ Detective Inspector Jon Rouse said. ‘Ultimately, you had a child sex offender network that was being administered by police.’”

When they took over the network, the police were required to work in real-time to interact with the users and gather information to make arrests.

Even though the Queensland police were able to end one Dark Web child pornography ring and save many children from abuse, there are still many Dark Web sites centered on child sex trafficking.


Whitney Grace, September 4, 2015
Sponsored by, publisher of the CyberOSINT monograph




IBM and Its Federated Search Camelot

July 25, 2015

Short honk: I scanned my Twitter feed this morning. What did I see? An impossible assertion from the marketing crazed folks at IBM Watson. Let me tell you, IBM Watson and its minions output a hefty flow of tweets. A year or so ago, IBM relied on mid tier consulting firms’ experts like Dave Schubmehl (yep, the fellow who sold my research on Amazon without my permission). Now there are other voices.


But the message, not just the medium, is important. IBM’s assertion is that there will be no more “data silos in enterprise search.” You can learn about IBM’s “reality” in a webcast.

Now, I am not planning on sitting through a webcast. I would, however, like to enumerate several learnings from my decades of enterprise information access work. You can use this list as a jump start for your questions to the IBM wizards. Here goes:

  1. In an enterprise, what happens when an indexing system makes available in a federated search system information pertinent to a legal matter which is not supposed to be available to anyone except the attorneys involved in the matter?
  2. In an enterprise, what happens if information pertinent to a classified government project is made available in a federated search system which has not been audited for access control compliance?
  3. What happens when personnel information containing data about a medical issue is indexed and made available in an enterprise search system because email attachments are automatically indexed?
  4. How does the federated system deal with content in servers located in a research facility engaged in new product research?
  5. What happens when sales and pricing data shared among key account executives is indexed and made available to a contractor advising the company?
  6. What is the process for removing pointers to data which are not supposed to be in the enterprise search system?
  7. What security measures are in place to ensure that a lost or stolen mobile device does not have access to an enterprise search system?
  8. How much manual work is required before an organization turns on the Watson indexing system?

These will get you started on the cross silo issues.

Oh, the answer to these questions is that the person identified as responsible for making the data available may get to find a future elsewhere. Amazon warehouses are hiring in southern Indiana.

Alternatively one can saddle up a white stallion, snag a lance, and head for the nearest windmill.

Stephen E Arnold, July 25, 2015

Coveo Pivots to Federated Search

October 21, 2014

Through a post at their blog Coveo Insights, enterprise-search firm Coveo urges, “Power Your Customer Service with Unified Search Driven Knowledge.” The write-up gives a few reasons why such “omni-channel” (federated) search functionality is a wise choice for customer service. Writer and Coveo marketing director Tucker Hall explains:

“Customers … engage with companies across a growing number of channels — from self-service portals and contact centers, to social media and field service engagements. Today’s savvy customer expects (and deserves) a seamless and consistent service experience across all of these channels. Omni-channel customer service has now become essential for companies hoping to maximize customer engagement, satisfaction, and retention.

“Successful omni-channel customer service can prove difficult regardless of the specific technologies and systems an organization has in place. That’s because success demands that customers and support personnel alike have swift, intuitive access to the case-resolving knowledge and expertise they need, when and how they need it.”

Hall asserts that many companies are missing out because they “fail to appreciate” the reasons to choose federated search: data and expertise are located in many systems, crowd-sourcing is a thing, and analytics must be actionable. But you, dear reader, already knew those, didn’t you? More on these points can be found in Coveo’s solution brief on the subject (registration required).

It is interesting to note that, while Coveo and others focus on federated search, Microsoft is more into the search-without-searching method called Delve. Let many flowers bloom!

Coveo serves organizations large, medium, and small with solutions that aim to be agile and easy to use yet scalable, fast, and efficient. The company was founded in 2005 by members of the team which developed Copernic Desktop Search. Coveo maintains offices in the U.S., Netherlands, and Quebec.

Cynthia Murrell, October 21, 2014

Sponsored by, developer of Augmentext

Pipe Information Dreams Often Forget

September 14, 2013

Do we dare broach the subject of health care information and electronic media records? Yes, we do, and we take into account “Dr. Karl Kochendorfer: Bridging The Knowledge Gap In Health Care” from Federated Search Blog. Dr. Karl Kochendorfer wants there to be an official federated search for the national health care system. His idea is to connect health care professionals to authoritative information with an instantaneous return. He notes that doctors and nurses are relying on Wikipedia and Google searches rather than authorized databases, because it is faster. Notice the danger?

Dr. Kochendorfer mentions this fact in a TED talk he gave in April called “Seek And Ye Shall.” He presents the idea for a federated search in this discussion, along with more of these facts:

  1. “There are 3 billion terabytes of information out there.
  2. There are 700,000 articles added to the medical literature every year.
  3. Information overload was described 140 years ago by a German surgeon: “It has become increasingly difficult to keep abreast of the reports which accumulate day after day … one suffocates through exposure to the massive body of rapidly growing information.”
  4. With better search tools, 275 million improved decisions could be made.
  5. Clinicians spend 1/3 of their time looking for information.”

Dr. Kochendorfer’s idea is grand, but how many academic databases are lining up to offer their information for free or without a hefty subscription fee? Academia is already desperate for money, so asking it to share its wealth of knowledge without green will not go over well. Should there be a federated search with authoritative information and instantaneous results? Yes. Will it happen? Keep fixing the plumbing.

Whitney Grace, September 14, 2013

Sponsored by, developer of Beyond Search

A Call for Federated Search in Healthcare

September 12, 2013

The general search engines available on the web are simply not adequate for healthcare professionals looking for the latest pertinent information (let alone personalized data on their patients). The Federated Search Blog shares an important TEDx Talk in its piece, “Dr. Karl Kochendorfer: Bridging the Knowledge Gap in Health Care,” which advocates the adoption of federated search for the healthcare industry. I recommend the video not only for those in the healthcare or search fields, but for anyone interested in getting the best care for themselves and their families. The write-up tells us:

“As a family physician and leader in the effort to connect healthcare workers to the information they need, Dr. Kochendorfer acknowledges what those of us in the federated search world already know – Google and the surface web contain so little of the critical information your doctor and his staff need to support important medical decision-making.”

The write-up summarizes highlights from the talk, including the statistic that says a third of clinicians’ time is spent hunting down information. No wonder doctors are spending less time with patients! The article continues:

“And, the most compelling reason to get federated search into healthcare is the sobering thought by Dr. Kochendorfer that doctors are now starting to use Wikipedia to get answers to their questions instead of the best evidence-based sources out there just because Wikipedia is so easy for them to use. Scary.”

Yes, scary is a good word for it. It is true that data reservoirs that feed federated searches can contain errors—a point Kochendorfer does not address in this video. Still, I have to agree with the write-up: the doctor makes a compelling case on this important issue. The video concludes with a call for listeners to support the development of federated healthcare search tools like MedSocket and open standards like Infobuttons. Sounds like a good idea to me.

Cynthia Murrell, September 12, 2013

Sponsored by, developer of Augmentext

Big Questions about Federated and Universal Search Remain

June 16, 2013

Search Engine Watch recently re-posted an article that takes an aggressive stance toward Google: “Google Should Kill or Radically Change Universal Search Results.” The message comes from Foundem, a UK price comparison firm that has rejected Google’s proposed web search concessions.

These concessions follow the European Commission’s ongoing antitrust investigation into Google’s search business. Foundem believes that the proposed concessions will not lessen Google’s monopoly on web search.

The article tells us that the proposed concessions ignore Google’s monopoly on search:

“Instead, the concessions focus on minor alterations to Google’s “self-serving Universal Search inserts.” According to Foundem’s report, any concessions must address Google’s AdWords search capabilities. Foundem says AdWords will continue to give Google an unfair advantage until they are re-worked. The company says that the current proposal fails to correct Google searches relevance for showing its own services in results. Foundem believes that to truly slow Google’s search monopoly it would have to either eliminate universal search or drastically change it.”

The reported information suggests there is still a big question about federated search results, despite the fact that Google’s Universal Search initiative was announced back in 2007.

Megan Feil, June 16, 2013

Sponsored by, developer of Beyond Search

WordPress Plugin for SearchBlox Now Available

October 10, 2012

Web sites that wish to use WordPress to build their content and SearchBlox for federated search will soon have an easier time uniting the two. On their blog, SearchBlox announces, “WordPress Plugin Makes It Easy to Integrate SearchBlox.” The post by Timo Selvaraj reports:

“SearchBlox has released an updated WordPress plugin to search your WordPress site and integrate faceted search results into your site from the SearchBlox Server. Unlike the Solr Search Plugin, there are [no] fields to configure or schema to load. Simply install the plugin and follow the getting started guide to integrate search into your site. SearchBlox provides fast instant search results from the SearchBlox Server. You can also crawl and integrate external sites, feeds and file system based documents for searching within your WordPress site.”

There’s a demo of the plugin here. WordPress is an open source project licensed under the GPL. Begun as a blogging system in 2003, it has grown into a full content management system with thousands of plugins, widgets, and themes now available.

SearchBlox is built on top of Apache’s Lucene/Solr. SearchBlox was also founded in 2003, and is located in Richmond, Virginia. Their client roster now tops 300 organizations in 30 countries.

Cynthia Murrell, October 10, 2012

Sponsored by, developer of Augmentext

Federated Data Explained

June 12, 2012

Index Data co-founder Sebastian Hammer discusses the nuts and bolts of search systems in an interview with David Weinberger of the Harvard Library Innovation Lab in “Podcast: Sebastian Hammer on Federated Search.” Both the 23-minute podcast and the written transcript are available at the above link.

The interview begins by defining federated search (a single interface for multiple data sources) and explaining how it differs from search engines like Google (which gather information, then pull query results from a unified database).

Hammer acknowledges that, in some situations, the federated approach is the only choice. For example, you’ll need it if the data you’re after is subject to frequent change. However, federated searches can be terribly slow, and all the data might not be available at the same time. Also, merging federated results can be problematic. On the other hand, building an index by pulling in everything you might possibly want to search can strain practicality. Hammer’s solution: a hybrid approach. He explains:

“So my notion is that you want to be able to gather stuff together in an index when it is practical and possible, and you want to be able to federate for the stuff where it’s not practical or possible. And you want to try to do both of those things as well as you possibly can and you want to try to somehow get the results of both of those types of searches back to the user as a single nice friendly merged search results.”
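
To make the hybrid idea concrete, here is a minimal sketch in Python. It is not Index Data’s implementation; the sample documents, the function names, and the term-count scoring are all assumptions made up for illustration. Content that was practical to gather lives in a local index, live sources are queried at search time, and both sets of hits come back as one merged list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local index: content that was practical to gather ahead of time.
LOCAL_INDEX = {
    "doc-1": "open source library catalog software",
    "doc-2": "federated search for national libraries",
}


def search_local(query):
    """Score pre-indexed documents by how many query terms they contain."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in LOCAL_INDEX.items():
        words = text.split()
        score = sum(1 for term in terms if term in words)
        if score:
            hits.append({"id": doc_id, "score": score, "origin": "index"})
    return hits


def make_remote_source(name, records):
    """Build a stand-in for a live source that must be queried at search time."""
    def search(query):
        terms = query.lower().split()
        hits = []
        for record_id, text in records.items():
            words = text.split()
            score = sum(1 for term in terms if term in words)
            if score:
                hits.append({"id": f"{name}:{record_id}", "score": score, "origin": name})
        return hits
    return search


def hybrid_search(query, remote_sources):
    """Merge hits from the local index with hits federated from live sources."""
    results = list(search_local(query))
    with ThreadPoolExecutor(max_workers=max(1, len(remote_sources))) as pool:
        futures = [pool.submit(source, query) for source in remote_sources]
        for future in futures:
            try:
                results.extend(future.result())
            except Exception:
                # A failing source is skipped so it does not sink the whole query.
                continue
    # One merged list: best score first, ties broken by id for a stable order.
    results.sort(key=lambda hit: (-hit["score"], hit["id"]))
    return results
```

The merge step is where the trouble Hammer mentions shows up in real systems: scores from different sources are rarely comparable, so the shared term count used here is a simplification.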

Simple, right? The interview goes into much greater depth on federated search now and in the future, as well as ways Hammer’s company strives to make the hybrid approach nice and friendly. I recommend checking it out.

Index Data has been creating discovery solutions for over 17 years. Based in Berlin, the company serves national libraries and consortia, government agencies, and businesses. They are proud to contribute significantly to the open source community. The company is happiest when riding on the cutting edge of their field.

Cynthia Murrell, June 12, 2012

Sponsored by PolySpot

IBM Buys Vivisimo Allegedly for Its Big Data Prowess

April 25, 2012

Big data. Wow. That’s an angle only a public relations person with a degree in 20th century American literature could craft. Vivisimo is many things, but a big data system? News to me for sure.

IBM has been a strong consumer and integrator of open source search solutions. Watson, the game show winner, used Lucene with IBM wrapper software to keep the folks in Jeopardy post production on their toes.


A screen shot of the Vivisimo Velocity system displaying search results for the RAND organization. Notice the folders in the left hand panel. The interface reveals Vivisimo’s roots in traditional search and retrieval. The federating function operates behind the scenes. The newest versions of Velocity permit a user to annotate a search hit so the system will boost it in subsequent queries if the comment is positive. A negative rating on a result suppresses that result.

I learned that IBM allegedly purchased Vivisimo, a company which I have covered in my various monographs about search and content processing. Forbes ran a story which was at odds with my understanding of what the Vivisimo technology actually does. Here’s the Forbes’ title: “IBM To Buy Vivisimo; Expands Bet On Big Data Analytics.” Notice the phrase “big data analytics.”

Why do I point out the “big data” buzzword? The reasons include:

  • Vivisimo has a clustering method which takes search results and groups them, placing similar results identified by the method in “folders”
  • Vivisimo has a federating method which, like Bright Planet’s and Deep Web Technologies’, takes a user’s query and sends the query to two or more indexing systems, retrieves the results, and displays them to the user
  • Vivisimo has a clever de-duplication method which makes the results list present a duplicated item only once. This is important when one encounters a news story which appears on multiple Web sites.
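
For readers who want to see what these functions amount to, here is a rough Python sketch of de-duplicating a merged results list and sorting it into folders. Vivisimo’s actual methods are not described here, so everything in the sketch is an assumption: the title normalization, the keyword-based folder labels, and the sample results are hypothetical stand-ins.

```python
import re
from collections import defaultdict


def normalize(title):
    """Lowercase and strip punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def deduplicate(results):
    """Keep only the first result seen for each normalized title."""
    seen = set()
    unique = []
    for result in results:
        key = normalize(result["title"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


def group_into_folders(results, topics):
    """Put each result in the first topic folder whose keyword is in its title."""
    folders = defaultdict(list)
    for result in results:
        words = normalize(result["title"]).split()
        label = next((topic for topic in topics if topic in words), "other")
        folders[label].append(result["url"])
    return dict(folders)


# Hypothetical merged results, as if returned by two or more indexing systems.
merged = [
    {"title": "Cell Biology Basics", "url": "http://a.example/1"},
    {"title": "Cell biology basics!", "url": "http://b.example/9"},
    {"title": "Prison cell conditions", "url": "http://c.example/4"},
    {"title": "Battery cell recycling", "url": "http://d.example/2"},
]

unique = deduplicate(merged)
folders = group_into_folders(unique, ["biology", "battery", "prison"])
```

A production clustering method derives the folder labels from the results themselves rather than from a fixed keyword list; the fixed list only keeps the sketch short.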

According to the write up in Forbes, a “real” news outfit:

IBM this morning said it has agreed to acquire Vivisimo, a Pittsburgh-based provider of big data access and analysis tools.

Okay, but in Beyond Search we have documented that Vivisimo followed this trajectory in its sales and marketing efforts since the company opened for business in 2000. In fact, the Wikipedia write up about Vivisimo says this:

Vivisimo is a privately held enterprise search software company in Pittsburgh that develops and sells software products to improve search on the web and in enterprises. The focus of Vivisimo’s research thus far has been the concept of clustering search results based on topic: for example, dividing the results of a search for “cell” into groups like “biology,” “battery,” and “prison.” This process allows users to intuitively narrow their search results to a particular category or browse through related fields of information, and seeks to avoid the “overload” problem of sorting through too many results.


Access Control and Enterprise Search Capabilities

November 29, 2011

Nuances of enterprise search and the challenges some searchers face are discussed in “Why is Enterprise Search more complex than web or desktop search?”

“Access control to the data is a big difference between Enterprise search and the other 2 search types.  On the Web, everybody is allowed to see the data. On your desktop you are allowed to see all data, because you are the owner. Web and desktop search can index all the data without to take access control into account.”
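
A small Python sketch shows what taking access control into account means at query time. This is an illustration under stated assumptions, not how any particular product works: the documents, the group names, and the `search` function are hypothetical. Each indexed document carries the set of groups allowed to see it, and a hit is returned only when the searcher belongs to one of those groups.

```python
# Hypothetical index entries: each document records which groups may see it.
DOCUMENTS = [
    {"id": "hr-17", "text": "medical leave policy", "allowed": {"hr"}},
    {"id": "legal-3", "text": "litigation hold memo", "allowed": {"legal"}},
    {"id": "wiki-8", "text": "travel policy for all staff", "allowed": {"hr", "legal", "sales"}},
]


def search(query, user_groups):
    """Return ids of matching documents that the user is permitted to see."""
    terms = query.lower().split()
    groups = set(user_groups)
    hits = []
    for doc in DOCUMENTS:
        matches = any(term in doc["text"].split() for term in terms)
        permitted = bool(doc["allowed"] & groups)
        # A document must both match the query and pass the permission check.
        if matches and permitted:
            hits.append(doc["id"])
    return hits
```

With this sample data, a sales user searching for “policy” gets only `wiki-8`, while an HR user also gets `hr-17`. Web and desktop search can skip the permission check entirely, which is the difference the quote describes.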

In an enterprise, access control is very important. But we prefer to spend more time finding than searching. To get the results you want, you need the right solution and the right search structure and support.

Access control is not an obstacle for Mindbreeze. Their search technology maintains user rights while searching all company-relevant information within the enterprise and in the cloud.

Sara Wood, November 29, 2011

