About That Cloud Security?
May 21, 2011
Let’s assume the Bloomberg story “Amazon Server Said to Be Used in Sony Attack” is accurate. If one cloud-based service can be used to attack another cloud-based service, does the owner of the service used in the attack have an obligation to prevent the attack? Bloomberg reports that Sony is concerned. No kidding, but what about the customers? Bloomberg says:
…the breach at Amazon is likely to call attention to concerns some businesses have voiced over the security of computing services delivered via others’ remote servers, referred to as cloud computing. Cloud security is Amazon’s top priority, Chief Executive Officer Jeff Bezos said at an event sponsored by Consumer Reports magazine this week.
Will substantive, timely action be taken to address the issues associated with this type of alleged use of cloud services? I suppose that the companies involved will try to slap on a patch. When the dust settles, will there be significant change? My hunch is that the quest for revenues will come first. The costs associated with figuring out problems *before* they occur are just too high.
We’re still in react mode when it comes to online services. Learning to live with unknown risks just adds spice to the online stew.
Stephen E Arnold, May 21, 2011
Freebie
Booz, Allen: Sunset Limited Leaves Station
May 21, 2011
I know. I know. There are two Booz, Allen & Hamiltons. There is the MBA-infused Booz & Co. in New York. And there is the government-centric outfit in Virginia. I worked at the “old” Booz, Allen & Hamilton. May it rest in peace. The only reason I thought about fading blue chips was the chance juxtaposition of two news items about these two new entities.
First, the MBA outfit put out a study that identified the three most innovative companies in the world. Hold your breath. Exhale slowly. Now that you are calm, here’s the shocker. The most innovative firms are Apple, Google, and 3M. Bet you did not know that. You can get the MBA stimulating insight in “Apple, Google, 3M World’s Most Innovative: Booz & Co.”
Second, the government money-surfing part of the old Booz, Allen & Hamilton announced that the transportation unit, built from Landrum & Brown, Simpson & Curtain, and some other components assembled by John Dowdle and his colleagues decades ago, has been sold off. Read all about it in “Consulting Firm CH2M Hill Agrees to Buy Booz Allen Hamilton Unit.”
What’s my view? Stating the obvious is a good practice for MBAs. The real information surfaces elsewhere. PR is good. As for the dumping of the transportation unit, I had a question: “Aren’t the Booz, Allen managers able to make this unit pay?” Of course, the new Booz Allen has some debt and is competing, not on MBA insights, but on costs, so change is inevitable.
I see a blue chip fading to gray. But management consultants are supposed to be experts in management, correct?
Stephen E Arnold, May 21, 2011
Freebie, unlike consulting firms’ professional services.
Forensic Logic: Open Source Search and Law Enforcement
May 20, 2011
An exclusive interview with Ronald Mayer, chief technical officer of Forensic Logic, reveals how open source search is contributing to law enforcement activities. Mr. Mayer will be one of the featured speakers at the Lucene Revolution conference in San Francisco the week of May 24, 2011. In the interview, Mr. Mayer observed:
The flexibility of Lucene and Solr is what really attracted me to Solr. There are many factors that contribute to how relevant a search is to a law enforcement user. Obviously traditional text-search factors like keyword density and exact phrase matches matter. How long ago an incident occurred is important (a recent similar crime is more interesting than a long-ago similar crime). And location is important too.
When asked about law enforcement’s use of commercial proprietary search solutions, Mr. Mayer said:
Where appropriate, we also use commercial search solutions. For our analysis and reporting product that works mostly with structured data we use a commercial text search solution because it integrates well with the relational tables that also filter results for such reporting. The place where Solr/Lucene’s flexibility really shined for us is in our product that brings structured, semi-structured, and totally unstructured data together.
To learn more about Forensic Logic, navigate to www.forensiclogic.com. For more information about Lucene/Solr, register now to attend the Lucene Revolution.
Stephen E Arnold, May 20, 2011
Sponsored by Lucid Imagination
Interview: Forensic Logic CTO, Ronald Mayer
May 20, 2011
Introduction
Ronald Mayer has spent his career with technology start-ups in a number of fields ranging from medical devices to digital video to law enforcement software. Ron has also been involved in Open Source for decades, with code that has been incorporated into the LAME MP3 library, the PostgreSQL database, and the PostGIS geospatial extension. In his most recent speaking engagement, he gave a presentation on a broader aspect of this system to the SD Forum’s Emerging Tech SIG, titled “Fighting Crime: Information Choke Points & New Software Solutions.” His Lucene Revolution talk is at http://lucenerevolution.org/2011/sessions-day-2#highly-mayer.
Ronald Mayer, Forensic Logic
The Interview
When did you become interested in text and content processing?
I’ve been involved in crime analysis with Forensic Logic for the past eight years. It quickly became apparent that while a lot of law enforcement information is kept in structured database fields, often the richer information is in text narratives, Word documents on officers’ desktops, or internal email lists. Police officers are all too familiar with long structured search forms for looking stuff up in their systems that are built on top of relational databases. And there are adequate text-search utilities for searching the narratives in their various systems one at a time. And separate text-search utilities for searching their mailing lists. But what they really need is something as simple as Google that works well on all the information they’re interested in–both their structured and unstructured content–both their internal data documents and ones from other sources; so we set out to build one.
What is it about Lucene/Solr that most interests you, particularly as it relates to some of the unique complexity law enforcement search poses?
The flexibility of Lucene and Solr is what really attracted me to Solr. There are many factors that contribute to how relevant a search is to a law enforcement user. Obviously traditional text-search factors like keyword density and exact phrase matches matter. How long ago an incident occurred is important (a recent similar crime is more interesting than a long-ago similar crime). And location is important too. Most police officers are likely to be more interested in crimes that happen in their jurisdiction or neighboring ones. However, a state agent focused on alcoholic beverage licenses may want to search for incidents from anywhere in a state but may be most interested in ones that are at or near bars. The quality of the data makes things interesting too. Victims often have vague descriptions of offenders, and suspects lie. We try to program our system so that a search for “a tall thin teen male” will match an incident mentioning “a 6’3″ 150lb 17 year old boy.” There’s been a steady emergence of information technology in law enforcement, such as in New York City’s CompStat.
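[Editor’s note: the blend of keyword, recency, and location signals Mr. Mayer describes can be approximated with Solr’s edismax boost functions. The sketch below is ours, not Forensic Logic’s; the field names (“text”, “incident_date”, “location”), the weights, the Solr URL, and the coordinates are assumptions for illustration only.]

```python
# Minimal sketch of blended relevance with Solr's edismax handler.
# Assumed (hypothetical) schema fields: "text", "incident_date", "location".
import requests

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed default install

params = {
    "q": "tall thin teen male",
    "defType": "edismax",
    "qf": "text",  # keyword relevance against the narrative text
    # Additive recency boost: the standard reciprocal-date function,
    # so last week's incident outranks one from five years ago.
    "bf": "recip(ms(NOW,incident_date),3.16e-11,1,1)",
    # Multiplicative distance boost: favor incidents near the officer.
    "boost": "recip(geodist(),2,200,20)",
    "sfield": "location",       # lat,lon field used by geodist()
    "pt": "37.78,-122.42",      # the searching user's location (example)
    "wt": "json",
}

resp = requests.get(SOLR_SELECT, params=params)
for doc in resp.json()["response"]["docs"][:10]:
    print(doc.get("id"))
```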
What are the major issues in this realm, from an information retrieval processing perspective?
We’ve had meetings with the NYPD’s CompStat group, and they have inspired a number of features in our software including powering the CompStat reports for some of our customers. One of the biggest issues in law enforcement data today is bringing together data from different sources and making sense of it. These sources could be from different systems within a single agency like records management and CAD (Computer Aided Dispatch) systems and internal agency email lists – or groups of cities sharing data with each other – or federal agencies sharing data with state and local agencies.
Is this a matter of finding new information of interest in law enforcement and security? Or is it about integrating the information that’s already there? Put differently, is it about connecting the dots you already have, or finding new dots in new places?
Both. Much of the work we’re doing is connecting dots between data from two different agencies, or two different software systems within a single agency. But we’re also indexing a number of non-obvious sources as well. One interesting example is a person who was recently found in our software: one of the better documents describing a gang he’s potentially associated with was a Web page about one of his relatives in Wikipedia.
You’ve contributed to Lucene/Solr. How has the community aspect of open source helped you do your job better, and how do you think it has helped other people as well?
It’s a bit early to say I’ve contributed – while I posted my patch to their issue tracking Web site, last I checked it hadn’t been integrated yet. There are a couple of users who mentioned to me on the mailing lists that they are using it and would like to see it merged. The community help has been incredible. One example is when we started a project to make a minimal, simple user interface to let novice users find agency documents. We noticed the University of Virginia/Stanford/etc.’s Project Blacklight, which is a beautiful library search product built on Solr/Lucene. Our needs for one of our products weren’t too different – just an internal collection of documents with a few additional facets. With that as a starting point we had a working prototype in a few man-days of work, and a product in a few months.
What are some new or different uses you would like to see evolve within search?
It would be interesting if search phrases could be aware of which adjectives go with which nouns. For example, a phrase like
‘a tall white male with brown hair and blue eyes and
a short asian female with black hair and brown eyes’
should be a very close match to a document that says
‘blue eyed brown haired tall white male; brown eyed
black haired short asian female’
Solr’s edismax “pf2” and “pf3” parameters can do quite a good job at this by considering the distance between words, but note that in the latter document the “brown eyes” clause is nearer to the male than the female, so there’s some room for improvement. I’d like to see some improved spatial features as well. Right now we use a single location in a document to help sort how relevant it might be to a user (incidents close to a user’s agency are often more interesting than ones halfway across the country). But some documents may be highly relevant in multiple different locations, like a drug trafficking ring operating between Dallas and Oakland.
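[Editor’s note: for readers who have not met pf2 and pf3, here is a rough sketch of the proximity boosting Mr. Mayer mentions. The field name “text” and the boost weights are our assumptions, not Forensic Logic’s configuration.]

```python
# Sketch of edismax pair/triple phrase boosting (pf2/pf3) on an assumed
# "text" field. Documents whose query words appear close together score
# higher, so "blue eyed brown haired tall white male" matches well.
import requests

params = {
    "q": "tall white male with brown hair and blue eyes",
    "defType": "edismax",
    "qf": "text",
    "pf2": "text^5",   # boost when adjacent word pairs appear near each other
    "pf3": "text^10",  # stronger boost for word triples
    "ps": 2,           # phrase slop: how far the words may drift apart
    "wt": "json",
}

resp = requests.get("http://localhost:8983/solr/select", params=params)
print(resp.json()["response"]["numFound"], "matching documents")
```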
When someone asks you why you don’t use a commercial search solution, what do you tell them?
I tell them that where appropriate, we also use commercial search solutions. For our analysis and reporting product that works mostly with structured data we use a commercial text search solution because it integrates well with the relational tables that also filter results for such reporting. The place where Solr/Lucene’s flexibility really shined for us is in our product that brings structured, semi-structured, and totally unstructured data together.
What are the benefits to a commercial organization or a government agency when working with your firm? How does an engagement for Forensic Logic move through its life cycle?
Our software is used to power the Law Enforcement Analysis Portal (LEAP) project which is a software-as-a-services platform for law enforcement tools not unlike Salesforce.com is for sales software. The project started in Texas and has recently expanded to include agencies from other states and the federal government. Rather than engaging us directly, a government agency would engage with the LEAP Advisory Board, which is a group of chiefs of police, sheriffs, and state and federal law enforcement officials. We provide some of the domain-specific software, while other partners such as Sungard manage some operations and other software and hardware vendors provide their support. The benefits of government agencies working with us are similar to the benefits of an enterprise working with Salesforce.com – leading edge tools without having to buy expensive equipment and software and manage it internally.
One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content and the rate of change in existing content objects. What does your firm provide to customers to help them deal with the volume (scaling) challenge? What is the latency for index updates? Can law enforcement and public security agencies use this technology to deal with updates from high-throughput sources like Twitter? Or is the signal-to-noise ratio too weak to make it worth the effort?
In most cases, when a record is updated in an agency’s records management system, the change is pushed to our system within a few minutes. For some agencies – mostly those with older mainframe-based systems – the integration is a nightly batch job. We don’t yet handle high-throughput sources like Twitter. License plate readers on freeways are probably the highest-throughput data source we’re integrating today. But we strongly believe it is worth the effort to handle high-throughput sources like Twitter, and that it’s our software’s job to deal with the signal-to-noise challenges you mentioned and to try to present more signal than noise to the end user.
Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem for those in stressful operational situations. What’s your firm’s approach to presenting “outputs” for end user reuse or for mobile access? Is there native support in Lucid Imagination for results formats?
Visualization is very important to law enforcement, with crime mapping and reporting being very common needs. We have a number of visualization tools like interactive crime maps, heat maps, charts, time lines, and link diagrams built into our software, and we also expose XML Web services to let our customers integrate their own visualization tools. Some of our products were designed with mobile access in mind. Others have such complex user interfaces that you really want a keyboard.
There seems to be a popular perception that the world will be doing computing via iPad devices and mobile phones. My concern is that serious computing infrastructures are needed and that users are “cut off” from access to more robust systems. How do you see the computing world over the next 12 to 18 months?
I think the move to mobile devices is *especially* true in law enforcement. For decades most officers have “searched” their systems by using the radio they carry to verbally ask for information about people and property. It’s a natural transition for them to do this on a phone or iPad instead. Similarly, their data entry is often done first on paper in the field, and then re-entered into computers. One agency we work with will be getting iPads for each of their officers to replace both of those. We agree that serious computing infrastructures are needed, but our customers don’t want to manage those themselves. Better if a SaaS vendor manages a robust system, and what better devices than iPads and phones to access it. That said, for some kinds of analysis a powerful workstation is useful, so good SaaS vendors will provide Web services so customers can pull whatever data they need into their other applications.
Put on your wizard hat. What are the three most significant technologies that you see affecting your search business? How will your company respond?
Entity extraction from text documents is improving all the time, so soon we’ll be able to distinguish whether a paragraph mentioning “Tom Green” is talking about a person or the county in Texas. For certain types of data we integrate, XML standards for information sharing such as the National Information Exchange Model are finally gaining momentum. As more software vendors support it, it’ll become easier to inter-operate with other systems. Rich-media processing – like facial recognition, license plate reading, OCR, etc. – is making new media types searchable and analyzable as well.
I note that you’re speaking at the Lucene Revolution conference. What effect is open source search having in your space? I note that the term ‘open source intelligence’ doesn’t really overlap with ‘open source software’. What do you think the public sector can learn from the world of open source search applications, and vice versa?
Many of the better tools are open source tools. In addition to Lucene/Solr, I’d note that the PostGIS extension to the PostgreSQL database is leading the commercial implementations of geospatial tools in some ways. That said, there are excellent commercial tools too. We’re not fanatic either way. Open Source Intelligence is important as well; and we’re working with universities to bring some of the collected research that they do on organized crime and gangs into our system. Regarding learning experiences? I think the big lesson is that easy collaboration is a very powerful tool – whether it’s sharing source code or sharing documents and data.
Lucene/Solr seems to have matured significantly in recent years, achieving a following large and sophisticated enough to merit a national conference dedicated to the open source projects, Lucene Revolution. What advice do you have for people who are interested in adopting open source search, but don’t know where to begin?
If they’re interested, one of the easiest ways to begin is to just try it. On Linux you can probably install it with your OS’s standard package manager with a command like “apt-get install solr-jetty” or similar. If they have a particular need in mind, they might want to check whether someone has already built a Lucene/Solr powered application similar to what they need. For example, we wanted a searchable index for a set of publications and documents, and Project Blacklight gave us a huge head start.
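[Editor’s note: as a companion to that advice, here is a minimal sketch of adding a document to a freshly installed Solr instance and querying it back. It assumes the stock single-core install at the default port and the example schema’s “id” and “title” fields; none of this reflects Forensic Logic’s setup.]

```python
# Minimal "hello, Solr" sketch: index one document, then search for it.
# Assumes a stock Solr install at http://localhost:8983/solr with the
# example schema (which defines "id" and "title").
import requests

# Add a document via the XML update handler and commit immediately.
add_xml = (
    "<add><doc>"
    "<field name='id'>doc-1</field>"
    "<field name='title'>Fighting Crime: Information Choke Points</field>"
    "</doc></add>"
)
requests.post(
    "http://localhost:8983/solr/update",
    params={"commit": "true"},
    data=add_xml,
    headers={"Content-Type": "text/xml"},
)

# Query it back.
resp = requests.get(
    "http://localhost:8983/solr/select",
    params={"q": "title:crime", "wt": "json"},
)
print(resp.json()["response"]["numFound"])
```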
David Fishman, May 20, 2011
Post sponsored by Lucid Imagination. Posted by Stephen E Arnold
A Password Library
May 20, 2011
Some search systems require logins. We wanted to pass along what we call a “password library.” Jimmy Ruska gives us a password library update in “Most Common Passwords List from 3 Databases.” We submit this link for your reference.
Writes Ruska,
There has been three instances that I know of where a significant number of hacked account passwords have been publicly released. I have obtained the lists and made a thorough analysis of each of them, including the most common passwords and character frequencies.
Scanning the lists, it’s amazing how many easily hackable choices people use. For example, 123456, password, and letmein feature prominently. Why bother having a password if you don’t take it seriously?
Try a word or phrase that you can remember but that can’t be easily linked to you. Then throw in some numbers and special characters. One technique to help you remember is to replace letters with similar l00king choic3s. This does slow down hacking software considerably. So does using the maximum number of characters.
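A quick back-of-the-envelope calculation shows why length and a bigger character set matter; the guess rate below is an assumed figure purely for illustration.

```python
# Back-of-the-envelope: how alphabet size and length grow the search space
# a brute-force cracker has to cover. The guess rate is an assumption.
GUESSES_PER_SECOND = 1_000_000_000  # assumed attacker speed

def worst_case_years(alphabet_size: int, length: int) -> float:
    """Years needed to exhaust the whole keyspace at the assumed rate."""
    keyspace = alphabet_size ** length
    return keyspace / GUESSES_PER_SECOND / (60 * 60 * 24 * 365)

print(worst_case_years(26, 8))    # 8 lowercase letters: minutes of guessing
print(worst_case_years(94, 16))   # 16 mixed characters: over 10**15 years
```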
Remember—the lazy ones are the first victims.
Cynthia Murrell, May 20, 2011
Freebie
An Essential Guide for Information Professionals
May 20, 2011
Infonista has posted a review of a wonderful book entitled The Information and Knowledge Professional’s Career Handbook. Full disclosure: Ulla de Stricker is a friend of ours, and we just love her and her co-author, Jill Hurst-Wahl.
Though we admit to a little bias, we’re sure we’d be recommending this book in any case. The Infonista review summarizes what you have to look forward to:
“In fifteen chapters, the authors provide detailed, practical career advice that comes across as a cross between coaching, mentoring, and okay, (in the nicest possible way), a bit of nagging. But it’s clear that their goal is to help readers avoid career potholes if possible. . . .
“Reading The Information and Knowledge Professional’s Handbook is like hanging out with two really smart, experienced, and wise mentors who aren’t going to sugarcoat any of their advice – because they know you really need the real deal. The information they provide is practical, actionable, and from this professional’s experience, spot on.”
This praise is no surprise to us, of course. We know these ladies are at the top of their field.
Do yourself a favor and pick up a copy right away.
Cynthia Murrell, May 20, 2011
SharePoint: In a Tuxedo and Ready for the Big Time
May 20, 2011
We at Search Technologies read the short news article called “SharePoint Naked” in Beyond Search on May 18, 2011. We found the write up somewhat amusing, but we also think that the comments about SharePoint as a development platform were at odds with our experience.
First, please, point your browser to the MSDN Developer Team Blog and the story “SharePoint 2010 Development Platform Stack.” The diagram presents the major building blocks of the SharePoint system.
This type of diagram presents what my college psychology professor called the gestalt. These types of broad views serve the same purpose as a city map. One has to know where the major features are, what roadways lead into and out of the city, and a range of other high level information.
The Microsoft blog diagram serves this function for a professional working with SharePoint. In fact, I doubt that a busy financial officer would look at this road map. Financial people monitor other types of information. The CFO works in one city and the SharePoint developer in another.
Both use maps, just different ones. Second, we think this diagram is extremely useful. It identifies the relationship among key components of the SharePoint development stack.
I found the inclusion of Windows Server 2008 and SharePoint Server 2010 as “bookends” insightful. Between these digital bookends, the focus on SharePoint Foundation 2010 was useful, clear, and complete. Third, the number of components in an enterprise system does not automatically mean increased costs.
Microsoft is doing an outstanding job of providing “snap in” components, tools, and documentation. In our experience, Search Technologies’ engineers can move from concept to operational status in a short span of time.
The foregoing does not mean that SharePoint is easier or harder than any other enterprise software. SharePoint is a robust system, which when appropriately configured and provisioned, can deliver outstanding return on investment and an excellent user experience.
Encouragingly for us, we’re finding that SharePoint adoptees – especially the big ones – get the importance of great search functionality as a foundation of productivity across the application spectrum. Encouragingly for Microsoft, which paid $1.2 billion for a Norwegian search company a couple of years ago, Fast Search for SharePoint fits the bill very nicely. We currently have a dozen organizations using our Fast Search for SharePoint proof of concept service.
Iain Fletcher, May 20, 2011
Search Technologies
AtHoc Enhancements
May 20, 2011
IWCE’s Urgent Communications reveals another angle on findability in “AtHoc Introduces New Emergency-Notification Applications.” A purveyor of “network-centric emergency mass notification systems,” AtHoc is adding new mobile apps to its notification platform. These resources make the most of today’s mobile devices as well as broadband wireless networks. The write up said:
‘What we’ve done over the last several months is develop an extension to our technology that uses the data channels of mobile devices, like smartphones, to communicate and integrate into our system,’ [AtHoc CEO and President Guy] Miasnik said. ‘It provides a much more prominent notification to the end user, including certain alarming sounds and vibrations … that were not feasible up to now as a standalone application.’ In addition, the new IWSAlerts applications enable location tracking of users and allow them to respond to the message, confirming receipt, Miasnik said.
What a welcome tool!
It’s also an intriguing development to us here at Beyond Search. Search is often too slow; in fact, one has to know something before searching. This is push on steroids, where the information materializes to the people who need it.
What happens to the search vendors reinventing themselves as customer support solution providers? They should look at AtHoc-type methods as a possible complement. In my opinion, key words just don’t work when time is short. Bing, Google, you listenin’? Just askin’.
Cynthia Murrell, May 20, 2011
Freebie
Is SEO the Focus of the Google Inside Search Blog?
May 19, 2011
Well, no big surprise. Money-generating advertising now means “search” at Google. The evidence was plentiful before I learned about Inside Search. The story with the not-so-surprising news was reported in the SEO-centric Search Engine Land. Navigate to “Google’s Search Team (Finally) Gets Its Own Blog.”
What caught our eye in the official Google search blog was:
- Google has, after 13 years, decided to provide information about search. Well, not my kind of search. The flavor of the blog is more toward the SEO (search engine optimization) side of the world.
- The redefinition of search to knowledge is not reflected in the content, but Google says: “The thirst for knowledge is as old as humanity. It’s only in the past decade that the Internet has made knowledge ubiquitous, and we want to help you find the answers you’re looking for, whether it’s the best price on a new microwave, where to find a great bike ride—or even information about the Internet itself.”
- The decision to write about what was the firm’s core business is a surprising one. It begs the question, “Why is Google now trying to explain search?” Google is a verb, and that verb means “search”, er, now knowledge. I will have to rewire my 66-year-old brain quickly. No more “beyond search.” Now I will have to think “beyond knowledge”. Sounds grand but sort of fuzzy. That’s for Generation Y and the other alphabet cohorts, I surmise.
Why the uptick in content outputs about search, er, I mean, knowledge? I catch a soupçon of smoke. The next Google quarterly report may contain a glowing ember.
Stephen E Arnold, May 19, 2011
Freebie