Data Analytics: Part 2 – Set Up the Model

Introduction

This blog is the second in a series of four on data analysis that I will publish once a month.

In the last blog I touched on my approach to data integrity. When touching data for the first time, it is critical that we review it until we feel confident in the source, the filters, and the export (table links, export settings, and the content). If it is a recurring export, then we can set up rules that look for specific errors while we get familiar with the data. This helps to reduce both the time spent on and the risk of data integrity issues, and it leads nicely into this next topic: how we set up the model.
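
For example (a minimal sketch of one such rule, assuming the export key sits in column A; the layout is my assumption, not from the original post), a duplicate check in a helper column could be as simple as:

  =IF(COUNTIF($A:$A, A2) > 1, "DUP", "")

Copied down the column, this flags any key that appears more than once, so repeats surface the moment a fresh export lands.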


Set Up the Model

For the purposes of this blog, I will assume the model is set up in Excel. I have used other software for larger data sets, but I find most work can be handled in Excel.

My goal in sharing my specific formatting and structure is to help you figure out your own preferences. Here is my methodology:

  • Tabs: I am a believer in color coding and organization. This is VERY important when viewing a model: there is so much information that it is critical to see things at a glance (not just by reading). Therefore, the tabs are colored and organized as follows:
    • Colors:
      1. Black tabs are imports, or tabs that do not get touched other than to provide data to the model. Because the color black signifies ‘do not touch’, I also use it for tabs I am pushing to the side for now and may later delete if they no longer add value to the model. Very important: these tabs hold the raw data imported from the database. We do not modify them, so you can always look back at the original data in case of erasure or accidental changes in the analysis tabs.
      2. Yellow tabs are the working tabs, where formulas, manual inputs, and all the analysis work are handled.
      3. Orange tabs are master file tabs used for changes to imports, for conversions, and for data validation. These tabs are not changed often; they are really there to help translate the raw import data in the black tabs into the data needed in the yellow tabs.
      4. Red tabs are for issues. This is where I copy/paste any data problem I find, or a subset of the data I want to pull up quickly. When I am done with the review, the red tabs are used to efficiently communicate issues to the organization for review, response, and correction.
      5. Green tabs are summaries, including the graphs and totals, as well as the data validation results (driven by formulas in the orange tabs).
    • Naming: names should be short and concise. I use abbreviations, a sort of code system of my own making, that should still be easy for most to understand. I keep names short because I like to select tabs without scrolling, and because formulas that reference other tabs stay shorter. Short names are also easier to identify without needing to read them.
    • Order: from left to right, the tabs are organized green, yellow, orange, red, then black. The least-needed tabs are furthest to the right.
  • Black Tabs: These are simply the original downloaded data. The only points I want to make on these are:
    • Keep some history: ending the tab name with a date (‘20231104’) and storing a few past loads in the workbook has been helpful for validation and trending. Seeing how the data changes helps to validate it and form conclusions (see the trend sketch after this list).
    • Do not touch: I have said it a few times, so here it is once more…do not touch these tabs once they are loaded with the fresh data.
  • Working Tabs: The worksheets in the yellow tabs are organized to use freeze panes, with the far-left columns holding the master information. This includes the key fields, like Employee ID or Product ID, along with enough context that when you scroll right you still understand what the entire row means.
    • Filters: I often use filters. They are an easy way to see a list of what is in a column, and I like to use them along with background cell colors to filter for issues.
    • Filtering Columns: There are times when I use narrow columns at the far left for calculated results I want to filter on and also use as reporting categories. These can be numbers (like an average or an index) or a color code (like ‘G’reen, ‘Y’ellow, ‘R’ed) for thermal-style reporting and filtering (see the formula sketch after this list). The headers of these columns can be key indexes such as inventory stock, development phase, risks, and statuses.
    • Single Row Header: I prefer a single-row header because it keeps sorts and filters simple, and with freeze panes on I want to see more data under the header.
    • Column Order: I try to organize the most relevant data from left to right, so summaries and the most important information are the first to be seen.
    • Data Refresh: When using exported data in the black tabs, there are a couple of ways I get the data into the working tabs:
      1. Copy/Paste: set up the working tab columns identical to the import (black tabs); then, after reviewing the data integrity, you can simply highlight the columns and copy/paste them over to the working tab. The benefit of this method is that when the row count changes, the paste shows where row formulas need to be retracted or extended.
      2. Formula: use formulas to pull the data from the black tabs and orange tabs (see the refresh sketch after this list). Note that when row counts change, you must make sure the formulas cover the new count. I like to add a validation for this in the summary, just to be sure.
    • Even if the data comes in ready to analyze, the yellow tabs are still used to hold a copy of the data being reviewed. You should always keep the original data untouched, so that if an issue arises you can always verify against it.
  • Summary Tabs:
    • Export: a special kind of summary tab is one where I use the result of my work to create an export into another system, usually for online reporting or forecast modeling. In those cases, I run the data, review it, then upload it into the other software for reporting. The data is kept plain and ready to upload.
    • Report: I use pivot tables, graphs, and manual tables to show the information. I color code where possible and use callouts to mark significant results. If the Excel workbook is producing the final summary, then I create a ‘dashboard’ that copies cleanly into a PowerPoint. The summary should tell a story to the reader, where the picture tells the story and the critical points are seen immediately.
    • Validation: this can live in the summary (outside the print range) or in its own tab. Sometimes I have an ‘Input’ tab for system-generated totals and data used for validation, and I place the validation there. Wherever you decide to put it, this is an area that uses colors to show whether certain validations come back true or false. Validation should check master data against transaction data, the current data against past loads, totals within the working sheets, and system totals against your workbook totals. Check totals, counts, trends, and ranges; if anything is calculated in more than one place, check that the results still match. This tab should show all ‘green’ lights when the data matches (see the validation sketch after this list).
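
As promised above, here is a minimal sketch of a trend check between two dated loads. The tab names follow the dating convention but are my own assumptions (‘EXP20231104’ as the current load, ‘EXP20231001’ as a prior one), with an amount in column F:

  =SUM(EXP20231104!$F:$F) - SUM(EXP20231001!$F:$F)

A large or unexpected swing in this delta is a prompt to dig into the fresh load before trusting it.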
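
For the thermal filtering column, a minimal sketch might look like this, assuming a score or completion percentage in column D (the 0.9 and 0.7 thresholds are placeholders, not a rule):

  =IF($D2 >= 0.9, "G", IF($D2 >= 0.7, "Y", "R"))

Copied down a narrow far-left column, this lets you filter straight to the ‘R’ rows, or count each status in a summary with COUNTIF.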
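
For the formula-based data refresh, a sketch could pull a value from a black tab by key, again assuming an import tab named ‘EXP20231104’ with keys in column A and the value to pull in column C:

  =INDEX(EXP20231104!$C:$C, MATCH($A2, EXP20231104!$A:$A, 0))

And a quick row-count check, assuming a working tab named ‘WRK’, to be sure the working tab covers the full import:

  =IF(COUNTA(EXP20231104!$A:$A) = COUNTA(WRK!$A:$A), "MATCH", "CHECK ROWS")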
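
Finally, a minimal sketch of a green-light validation, again assuming tabs named ‘WRK’ and ‘EXP20231104’ with an amount in column F:

  =IF(SUM(WRK!$F:$F) = SUM(EXP20231104!$F:$F), "GREEN", "RED")

Conditional formatting on this cell (green fill on "GREEN", red otherwise) gives the at-a-glance light; repeat the pattern for counts, past loads, and system totals.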

That is all I can think of on this topic. Next we get into how I analyze the data. If you have other methods for setting up a model, please share. Have a great day!

My approach is based on my experience; what is yours? I write these blogs in the hope that they may help others. I prefer to blog about what I have learned from experience, not just rewrite what I found on the internet. I also prefer collaboration over critique, so please comment only if you have something constructive to add.


As always, Contact Us to learn more!

About the Author

Travis King’s educational background is a dual degree in Accounting and Finance from the University of Arizona, along with an MBA from UC Irvine. His operational background is a mix of management roles in Accounting, Finance, Operations, Program Management, Design, and Corporate Strategy. He has worked in the beverage distribution, defense, automotive, software, and recreation industries. He has led projects in financial forecasting, market data analysis, large-scale software migrations to SAP, reporting, budgeting, MRP, and data integrity audits.