How to approach your Business Intelligence project
If you are reading this then you most likely have some interest in creating or improving your business intelligence solution for your business - or your client's business. We are somewhat biased as a company in believing that any business with even moderate levels of data is missing out on a competitive edge by not employing some level of business intelligence technology. The internet is awash with all the reasons why this is true. This article is to set the scene for the fundamental thought processes on selecting and securing the data that should be brought into a business intelligence solution.
Beware Direct Connect
If you are in the market for a Business Intelligence or reporting tool for your business, you need to beware of the quick fix solution providers. A lot of vendors and solution providers will over simplify the architecture of a Business Intelligence solution and will often suggest that you simply connect your BI tool to your operational database. It is true that this is quick and inexpensive initially. But there is a great model we always remind clients of: Cheap, Fast, Good - pick any two because you cannot have all three!
How you connect the BI system to other databases and other systems is a key consideration. There is a huge awareness growing of BI tools within the general business user community. Microsoft Power BI for example offers anyone with Excel charting and reporting skills the opportunity to create their own reports for their specific KPIs. This is a great thing for your business, but do not take the easy path of 'just' connecting their desktop Power BI to the operational database. Always consider the worst case scenario when your business has a data breach. After informing the data protection authoritises and the data subjects impacted, the next priority will be identifying the source of the breach. the less systems that contain Personal Identifiable Information (PII) and the less people that have access to that PII, the better off you will be.
Connecting directly to a production source of data carries a lot of risk. The first is of course operational - analytical queries and platforms such as Power BI expect to work with Analytical (OLAP) style data structures rather than transactional (OLTP or RDBMS). When you create 'cubes', measures and calculated columns within tools like Power BI, behind the scenes a series of complex queries will be run. This is particularly dangerous with a direct connect scenario. It also requires more inbound firewall rules to be created on your production systems.
When you connect Power BI to a data store, you will use credentials - SSO, OAuth, ODBC etc. The first line of defence is insuring that the login used by Power BI - and by extension it's users - only has Read Only access to the data needed to fulfill the reporting. How you secure these credentials is of course critical and a solid process around how to store and share these should be documented and managed very carefully. A systems type login should be used rather than an individual's login in the same way that any systems integration credential would be managed.
We advise clients to always load the required data from the production, operational data stores into a centralised reporting database or data warehouse. From there you can connect reporting and BI tools such as Power BI.
Personal Identifiable Information (PII)
Never include Personal Identifiable Information within your Business Intelligence database unless it is absolutely necessary and it has been thought through in great detail. BI by it's nature is statistical - i.e. discover trends and anomalies in business performance by viewing the data in many different ways, segments, time spans, geographies etc. Why would you ever need a person's name in such analysis? Why do you need an email address? Why a credit card number? The easy solution of course is to load up everything 'just in case' but 'just in case' will not cut it as an excuse if your data warehouse is breached and all of your customer data is exposed for no reasons whatsoever.
There are valid scenarios when some PII is required in the reporting system. The system may be used for segmenting customers for marketing purposes. Ideally the pseudonymization of such data can occur which allows the reporting system to identify the profiles of each customer but does not store the actual contact information. Instead it would send the data exports back to the controlling system that would match the profile ID to the customer PII and send the communication from there. In the event that the marketing system and reporting system are one and the same, then the very least you should try to achieve is using different access accounts for BI purposes that does not have access to the tables that contain the PII.
Once you include any PII data in the flow of data to the reporting or BI system, that system is then in scope for all data protection (GDPR) regulations.
Top Down and Bottom Up
You should first identify what types of data are available from within your organisation and indeed external to your organisation. This bottom up analysis will help you list out what will most likely be a vast amount of data from multiple systems - sales pipeline, market intelligence, population demographics, accounting, HR, logistics, marketing billing etc. Then go Top Down and do some blue sky thinking on the kind of reports and key performance indicators that wuld add value to your business. You most likely have some reports being created in Excel and/or Powerpoint by your team. The goal of a BI solution should be to improve these reports dramatically and to automate their creation and distribution. Business Intelligence is fundamentally different to reports in that the system should allow the user to slice and dice data in an interactive way to discover new key insights and answer questions that were not originally asked.
Once you have gone through the above thought process you will be in a better position to discuss possible solutions with suppliers.