Database 101 - The Art of Gathering Requirements
A client (if you're self-employed) or a Business Analyst (if you're employed) approaches you and demands "We have a new project that we need you to create a database for". In the conversations below, I will show you how I approach the process of requirements gathering. I am by no means an expert and I learn new things and correct my old mistakes daily, so if you have better suggestions or see mistakes, please feel free to reach out and let me know. I love to learn new things all the time. Lines in blue/italics are their statements, while lines in black regular are mine.
Part 1
We have to store geo political data for some of our states
By our states, do you mean the USA only?
Well, for now, yes.
So in future you do see this project growing to other countries?
hmmm, probably Canada and Mexico
How about Britain or France or Brazil or ..
You know what, let's assume this will cover the entire world at some point if it goes well
This type of conversation represents many that I've had with BAs and clients over my 15 years of dealing with various databases. You have to assume that they have a good idea, but haven't fully fleshed it out yet. You can deal with this scenario in two ways.
1 - tell them to come back with proper requirements or
2 - guide them so you become integral to the process and before you know it, you will be invaluable to the firm.
2 - guide them so you become integral to the process and before you know it, you will be invaluable to the firm.
Based on the above discussion, what have we learned? Well, we need to model it for all countries, even if they only start with the USA. So we will add a Countries table, and while we're at it, we'll also go one level up and add a Continents level. At some point, they will come to you and ask for a report of the number of something, broken down by Continent and if you have the info already, you can laugh at them as you type that 4 line query and amaze them.
Part 2
What kind of data are we going to store?
Like I said, geo-political, so past and present presidents, governors, sheriffs, etc
(Sigh), yes, but is it just text, or will you be storing small or large images, video clips, PDFs?
Ooohh, I hadn't thought about that, I think those are great ideas. Is that difficult to implement?
Not particularly, but if we want to, it's best to plan for that now
Then let's say yes to all of them. I can see how we store flag icons, photos of politicians, video clips
What did we just learn - we may need to store files - preferably somewhere in Azure BLOB storage or Amazon Simple Storage Service (S3) or some equivalent on-prem or (worst case) as Large Objects (LOB) in the database. The scope of the project just ballooned. You may ask - why did you even bring it up? If you'd kept your mouth shut, we wouldn't have this issue. Let me tell you why - if you don't ask now, 3 months after the project is complete and the product released, they will come to you and tell you exactly this. So I prefer to save the future pain by dealing with it now.
Part 3
You mentioned geo, what kind of geographic information will we store?
We're probably going to use publicly available datasets, maybe TIGER data for the US and anything similar we can find for other countries when we get there. You know, data of country and state boundaries.
Ok, so you mean proper geographic data
Yes
How fine will the data be?
What does that mean?
Will state be the finest resolution of data we will store?
Well, we will have county data for the US
And cities?
Just point data, since we just need to show city markers on our map-based reports
I try to remember to use the word we and not you, to build trust and you'll be surprised at how often this changes the tone of discussions. It's not a me vs you thing, it's a "we're doing this together", when you keep using "we". And in the above part of the convo, we learned that we will need Geographic Information System (GIS) support. This means different things in different databases, and we will deal with the details in a future article.
The conversation goes on till I am satisfied with the requirements. Then, I type the gist of it up, and send it to the client/BA, to confirm that I understood their requirements correctly. There will usually be a few more meetings or emails that will change things here and there. For the next part, let's assume these are our final requirements, agreed to, by client/BA.
Final Requirements
- Store geographic data, tabular data and files
- Entities will be Cities (points), Counties/States/Countries/Continents (polygons or multipolygons)
- Final list of properties of each entity to be stored in the database
Step 2 Create a diagram to represent our tables and ensure the structure fits our requirements
Note, avoid reserved words for table and column names to avoid headaches. Use this or other resources to find the reserved words for the database you are working on. Yes, you can quote them or put them in square brackets and such, but why bother?
This would be my draft 1 ERD, which then gets reviewed by a team member, who asks me question after question and if they are not satisfied, we discuss and refine the ERD into draft 2. Once the reviewers and I are happy with the result, we then discuss the features one final time with the client/BA, to ensure that requirements did not change in between the last call and now. Once everyone is on board, we write the Data Definition Language (DDL) for these tables, in whatever RDBMS we are working on. To see a larger version, click here.
In the next article, we'll work on the DDL for these tables in Oracle, SQL Server and PostgreSQL. Because, why learn just one thing when you can learn three in one go. They're all similar to some extent. Hope you enjoyed this 101 session. If you notice mistakes, or have better suggestions, please leave a comment so that I can learn as well. Thank you.
Do the red connectors in the ERD mean Cascade Delete = Yes?
ReplyDeleteYou are correct. And the diagram shows referential action at the end of each connector, on the child table (update restrict or update cascade / delete restrict or cascade)
Delete