Monday, October 3, 2016

Database Basics for Crime Analysts

I originally wrote this post for a software company’s blog in 2014. This company was bought out and recently their blog and website have been removed from the web permanently. I am reposting it here for posterity. 

One thing that is often daunting for a beginning crime analyst is understanding how databases work. Often times, just the thought of working directly with a database will cause a new analyst's eyes to glaze over. In reality though it's not that hard if you just grasp a few concepts.

So just what is a database?

A database is where data is stored for later retrieval. A recipe box full of index cards is a database. In this case, the data is instructions on how to prepare a dish such as a casserole or Granny's pecan pie.

See, nothing intimidating about this so far. Let's take this recipe box analogy a little farther.

Each recipe card is a "record". A record contains all the data needed to complete the recipe. If I write the title of the recipe on the top of each card this title is a "field". If I also have a place on the card for the number of servings, this also is a field. In this analogy we'd also have fields for ingredients, fields for the preparation instructions and field for oven temperature, etc.

Now in this example, our database is called a "flat file" database. What this means is that all the data needed for the recipe is stored in one record. Flat file databases have their use especially where the data store in each record is limited or simple. An address book application on your computer is one example.

However, if your database is going to be complex, like a police records management system (RMS) for example, it's useful to segregate data into separate tables. A table is just a group of records within a database. A relational database relates records in different tables to each other by a common field that is shared by records in different tables.

The reasons for a relational database is it is more flexible and efficient. For instance a police RMS could store information about vehicles in a vehicles table. That information could then be used by both a traffic accident module and an incident module. The data would only have to be stored once but could be used by more than one module.

Let's apply this to our recipe box analogy. If we created a separate table in our database for cooking technique, for example; grilling, frying, or poaching, then we would not need to rewrite these detailed instructions on each recipe that called for these techniques. Our recipe cards could just refer to the record in the techniques table that had those instructions.

In our recipe box analogy, a record in the recipe table could contain a field for technique and store the value of "grilling".  A record in the technique table would also have the value "grilling". These related fields are the "key". When you retrieved the record from the recipe table for grilled salmon, it would use the key field to also pull the appropriate record with the matching key from the technique table. The whole database only has to store the instructions for grilling once in the database even if a hundred different recipes called for that technique.

All this is well and good but why is it important to know how a database works?

Most crime analysts will access the information stored in their RMS though the RMS application. But there are real advantages to being able to access the data directly. For example, in the RMS I use at my department, there are several screens where various bits of data are captured and stored. However, the canned reports that come with the RMS don't display those bits when you run the report. The data is stored in the database but the report isn't configured to display it. If you can directly access the database, you can use other tools to get at this data.

When you start working with a complex database with data stored across multiple tables you need a way to know where this data is stored. Most RMS systems or other applications have a document called a "schema" that is a roadmap to your data. This schema is important if you are going to use outside tools to create reports or extract data.

At my department I have direct read only access to the database using a technology called OBDC. This allows me to access the data in our RMS using reporting tools such as Crystal Reports or to import the data into a tool like Microsoft Excel or ArcGIS for further analysis. These tools greatly expand what I can do with the information contained in my RMS database.

Now the mechanics of using OBDC to access a database directly is beyond the scope of this post. However, I can tell you it's not as difficult as you might think it is. Your database administrator can likely help teach you how to do this and walk you through some of these techniques. Once you learn how, it can really improve your ability to analyze the information contained in your RMS. It's well worth the time and effort to learn.

No comments:

Post a Comment

I reserve the right to remove defamatory, libelous, inappropriate or otherwise stupid comments. If you are a spammer or are link baiting in the comments, a pox be upon you. The same goes for people trying to sell stuff. Your comment will be deleted without mercy.