Login

Whether we like it or not, many if not most of us developers either regularly work with databases or may have to work with one someday. And considering the amount of misuse and abuse in the wild, and the volume of database-related questions that come up every day, it's fair to say that there are certain concepts that developers should know - even if they don't design or work with databases today.

What is one important concept that developers and other software professionals ought to know about databases?

Every developer should know that this is false: "Profiling a database operation is completely different from profiling code."

There is a clear Big-O in the traditional sense. When you do an `EXPLAIN PLAN` (or the equivalent) you're seeing the algorithm. Some algorithms involve nested loops and are **O**( *n* ^ 2 ). Other algorithms involve B-tree lookups and are **O**( *n* log *n* ).

This is very, very serious. It's central to understanding why indexes matter. It's central to understanding the speed-normalization-denormalization tradeoffs. It's central to understanding why a data warehouse uses a star-schema which is not normalized for transactional updates.

If you're unclear on the algorithm being used do the following. Stop. Explain the Query Execution plan. Adjust indexes accordingly.

Also, the corollary: More Indexes are Not Better.

Sometimes an index focused on one operation will slow other operations down. Depending on the ratio of the two operations, adding an index may have good effects, no overall impact, or be detrimental to overall performance.

First, developers need to understand that there is something to know about databases. They're not just magic devices where you put in the SQL and get out result sets, but rather very complicated pieces of software with their own logic and quirks.

Second, that there are different database setups for different purposes. You do not want a developer making historical reports off an on-line transactional database if there's a data warehouse available.

Third, developers need to understand basic SQL, including joins.

Past this, it depends on how closely the developers are involved. I've worked in jobs where I was developer and de facto DBA, where the DBAs were just down the aisle, and where the DBAs are off in their own area. (I dislike the third.) Assuming the developers are involved in database design:

They need to understand basic normalization, at least the first three normal forms. Anything beyond that, get a DBA. For those with any experience with US courtrooms (and random television shows count here), there's the mnemonic "Depend on the key, the whole key, and nothing but the key, so help you Codd."

They need to have a clue about indexes, by which I mean they should have some idea what indexes they need and how they're likely to affect performance. This means not having useless indices, but not being afraid to add them to assist queries. Anything further (like the balance) should be left for the DBA.

They need to understand the need for data integrity, and be able to point to where they're verifying the data and what they're doing if they find problems. This doesn't have to be in the database (where it will be difficult to issue a meaningful error message for the user), but has to be somewhere.

They should have the basic knowledge of how to get a plan, and how to read it in general (at least enough to tell whether the algorithms are efficient or not).

They should know vaguely what a trigger is, what a view is, and that it's possible to partition pieces of databases. They don't need any sort of details, but they need to know to ask the DBA about these things.

They should of course know not to meddle with production data, or production code, or anything like that, and they should know that all source code goes into a VCS.

I've doubtless forgotten something, but the average developer need not be a DBA, provided there is a real DBA at hand.

For some projects, and Object-Oriented model is better.

For other projects, a Relational model is better.

<h2>Normalization</h2>
It always depresses me to see somebody struggling to write an excessively complicated query that would have been completely straightforward with a normalized design ("Show me total sales per region.").

If you understand this at the outset and design accordingly, you'll save yourself a lot of pain later. It's easy to denormalize for performance after you've normalized; it's not so easy to normalize a database that wasn't designed that way from the start.

At the very least, you should know what 3NF is and how to get there. With most transactional databases, this is a very good balance between making queries easy to write and maintaining good performance.

<h2>Basic Indexing</h2>

I'm always shocked to see a table or an entire database with no indexes, or arbitrary/useless indexes. Even if you're not *designing* the database and just have to write some queries, it's still vital to understand, at a minimum:

* What's indexed in your database and what's not:
* The difference between types of scans, how they're chosen, and how the way you write a query can influence that choice;
* The concept of coverage (why you shouldn't just write `SELECT *`);
* The difference between a clustered and non-clustered index;
* Why more/bigger indexes are not necessarily better;
* Why you should try to avoid wrapping filter columns in functions.

Designers should also be aware of common index anti-patterns, for example:

* The Access anti-pattern (indexing every column, one by one)
* The Catch-All anti-pattern (one massive index on all or most columns, apparently created under the mistaken impression that it would speed up every conceivable query involving any of those columns).

The quality of a database's indexing - and whether or not you take advantage of it with the queries you write - accounts for *by far* the most significant chunk of performance. 9 out of 10 questions posted on SO and other forums complaining about poor performance invariably turn out to be due to poor indexing or a non-sargable expression.

I just want to point out an observation - that is that it seems that the majority of responses assume database is interchangeable with relational databases. There are also object databases, flat file databases. It is important to asses the needs of the of the software project at hand. From a programmer perspective the database decision can be delayed until later. Data modeling on the other hand can be achieved early on and lead to much success.

I think data modeling is a key component and is a relatively old concept yet it is one that has been forgotten by many in the software industry. Data modeling, especially conceptual modeling, can reveal the functional behavior of a system and can be relied on as a road map for development.

On the other hand, the type of database required can be determined based on many different factors to include environment, user volume, and available local hardware such as harddrive space.

Evolutionary Database Design.

[To see links please register here]

These agile methodologies make database change process manageable, predictable and testable.

Developers should know, what it takes to refactor a production database in terms of version control, continious integration and automated testing.

Evolutionary Database Design process has administrative aspects, for example a column is to be dropped after some life time period in all databases of this codebase.

At least know, that Database Refactoring concept and methodologies exist.

[To see links please register here]

Classification and process description makes it possible to implement tooling for these refactorings too.

Avoiding [SQL][1] [injection][2] and how to secure your database

[1]:

[To see links please register here]

[2]:

[To see links please register here]

I think every developer should understand that **databases require a different paradigm**.

When writing a query to get at your data, a set-based approach is needed. Many people with an interative background struggle with this. And yet, when they embrace it, they can achieve far better results, even though the solution may not be the one that first presented itself in their iterative-focussed minds.

allnight51

atrium532

foursomet

autophon712673

buckwheater843343

weavedi

Siraislynngr

Mrgarnetjdhszfqch

trumpetless318566

number555