Join OfficeTipsAndMethods

Have some suggestions to share with the world?

Join OfficeTipsAndMethods and share your insight in a Comment. You must be a member to comment and all comments are moderated before being published. Membership is free and we would never share your personal information with anyone.

Become an OTM Member

Log In

YouTube Video Channel
Article Categories

Posts Tagged ‘Normalization’

Working with Joins

It is important that tables and queries that are the data source of a query be correctly joined in the query design. In some ways a query join is similar to a relationship. The most important distinction is that a query join is temporary. In fact, a query join can temporarily over-ride a permanent relationship.

In Access, there are three join types, affectionately known as Type 1, Type 2, and Type 3. In SQL these joins are more commonly referred to as Equi-or Inner Join , Left Outer Join, and Right Outer Join, respectively. Equi-joins are the default in the Access query designer; if you want a left, or right, outer join you have to modify the join type. In addition to these three join types, there is a fourth, the Cartesian join, which usually happens by mistake and yields results that, most often are far different from what you may be expecting.

Let’s put the theory in more understandable terms with some examples. In the sample database (download from here if you haven’t already done so), are two tables, tblRegions, and tjxLocations. In the database design there is already a relationship between these tables but we are going to override the relationship and set up temporary joins for this demonstration. In this particular example, we know the city and country of every location but we do not always know a city’s region.

When you first create a query using these two tables, the query diagram will include a join based on their relationship. To experiment with join types then, the first step is to right-click on the join line and edit the Join Properties.

Edit Join



Inner Joins

An Inner Join (Type 1) returns records only when there are matching records in both tables. The diagram looks like this:


The query returns four records.


Right Joins

A right join will return all records from the primary table, whether or not there is a matching record in the related (or foreign) table. So, in this example, the query returns one more record than the inner join did. The region for City 2 is not in the table.



Left Joins

A left join returns all records from the foreign table and only matching records from the primary table. In this case, the query result has the same number of records a the inner join example.



But what about the cities and countries, you might ask? We simply have to include two additional tables in the query source, tblCities and tblCountries. For both of these tables an inner join will be adequate because our data includes city and country data for every employee.

Including More Data in the Query

With a query open in design view additional tables and queries can be added to it by right-clicking the the field list area and clicking on Show Table. The required tables and queries can also be simply dragged for the database window or navigation pane depending on your Access version.

Here is an example that will return the city, region and country name for each employee:



Another approach is to create an intermediate query that gathers the city, country, and region names using a query like this (qselLocations_Extended)



Then this query is included in the employee locations query:


This is the result (which is exactly the same result you get when you include tjxLocations and the city, region, and country tables):


Cartesian Joins

A cartesian join is a special case where no explicit join is defined between a pair of tables. There are some special cases where a cartesian join is called for. However, most often the presence of a cartesian join in a query is simply an error. Depending on how many records are in the source tables, a cartesian join can return an extremely large number of records. Multiply the number of records in the first table by the number of records in the second table. The result will by the record count of the query output.

For example, there are two records in the Regions table and five records in the Locations table.



Creating a Multiple Table Select Query

In a properly normalized database, multiple table queries are the rule rather than the exception. A single table will seldom have all the information you need so it may be necessary to combine two, three, or more tables into a single query.

Changes to the Database

The demonstration database has been updated and is available for download from here.

The database now contains, among other things, a normalized employee table and additional tables that a normalized structure requires. All tables now have primary keys and relationships have been established between all the tables.

A Simple ExampletblEmployees_FieldList

Let’s start by looking at a simple query on the Employees Table. Here is the table’s field list. The table contains most of the essential information that we want to keep about each employee except the employee’s actual name and job title.

So, if we were to create a query just using the Employees Table, this is what it would look like when we view it in datasheet view.



It’s pretty difficult to know to which employee the Home Phone, Extension, and Notes refer. The two fields, PersonID and JobTitleID are referred to as foreign keys because they indicate the record in another table that has related information. We can get that information by including the required table to the query.


By including the People Table in the query design, we can now display the correct name for each employee record:


Because a relationship exists between People and Employees, the two tables are correctly connected (or joined) as soon as the People Table is added to the query design. If we wanted to know the employee’s job title in this example, we would also have to include the Job Titles table in the query design.

Incidentally, you can add tables and queries to a query design by dragging the required object from the Database Window (Access 2003 and earlier) or Navigation Pane (Access 2007/2010) or by right-clicking in the field list area in query design view and selecting Show Table to open the Show Table dialogue.

Create an Interleaved Report Part 3 – The Report

The database development process is straightforward but can onCartoonly be efficiently carried out if the developer follows a set sequence of steps. The order of the steps is important because each step depends on the step before it. This series of articles has illustrated this sequence. Part 1 described the results of the data analysis step where we need to gain a full understanding of the data available to meet the client’s requirement. Of course, the design process illustrated here itself depend on a complete and accurate understanding of the client’s needs.     

Even the best design in the world sometimes needs some revisions and tweaking when we start to develop queries and reports. Feedback from the Output phase of the process sometimes sends us back to revise and refine some aspect of the design.


Design Process Diagram


The query at the end of part 2 is a good example of this. As is, that query can only provide the id of a representative, not the representative’s actual name. So we need to add a another table, tblHumanResources  to the query.  This change enables the query to provide representative names for the report.

Final Query Diagray Rather than including the three representative name fields in the query grid, the first column of the query is defined by a calculation: Representative: [hrlname] & ", " & [HrFName] & " " & [hrmname].

All that remains now is to design the actual report.

Report Design Graphic

Building the detail section of the report is simply a matter of dragging and dropping fields form the field list in the correct position. The naming convention described in part 2 helps ensure that the fields are situated in correct product column and weekday Sales or Quotation  row. While the field names obscure the actual product name.

Because there are multiple controls for each product in two different categories, individual addition expressions must be created in order to display weekly totals for both Sales and Quotations. For example the expression for the weekly sales total for Widgets is:


Here again, the naming convention make it easy to create the expressions for each product without a lot of typing. After creating the first control (for product 1 – widgets) and typing the expression as its data source, simply make five copies. Edit the data source for each product total in turn and replace the ‘1’ with ‘2’, ‘3’, ‘4’, ‘5’, or ‘6’, depending on the product. This technique will also work for the Quotations totals.

After adding all the controls, adding appropriate labels then formatting and aligning them, and setting up report grouping on Representative, this is what the final report looks like:

Report Preview Graphic

You should keep in mind that, because queries are limited to a maximum of 255 fields, this approach will only work if 255 or fewer ‘cells’ required in the report details. This report has 60 ‘cell’ (6 products x 5 days x 2 categories) so it is will within the limit. 

Create an Interleaved Report Part 2 – The Data

In this article I will focus on the data structure behind the interleaved report. The data  illustrated is only a small subset of what you would find in a typical full-scale application. In order to produce the report described in  part 1, we need to know:

  • who made the sale or quotation CaveExec cartoon
  • what product was involved
  • was the action a sale or a quotation?
  • when did it happened

These descriptions suggest the need for certain tables:

  • salespeople
  • product lines
  • sales activity
  • sales activity types

Here’s how the tables are organized in the database:


Obviously this is a minimal structure designed to illustrate what’s needed for the reports. The Human Resource table has only three name fields whereas in a real application much more information about people would be required.

The data is normalized to the third normal form. The problem here is to create a report from a flattened version of the data. The client’s requirements in part one specified that the report require one column for each product group sales on one row followed by a similar row or product group quotations. Keeping in mind also that the desired report should have some vertical space between each pair or rows and that there be one pair of rows for each day of the week, it should be apparent that a simple datasheet or even a continuous report will not meet these requirements.

Before designing the actual report it is first necessary to flatten the data. Crosstab queries are specifically designed to do exactly that. The data we will need in the query includes the sale representative, the date, the type of action, and the number of times the action happened on each day. So the basic crosstab design will look like this:

Basic CrossTab

Notice that the query has included a field to restrict the results to a particular time frame (July 26 – 30, 2010). In this example the criterion has been hard-coded. In a working database the query would most likely refer to a user-specified time frame from a form or memory value.

What’s missing from this query, however, is any sort of specification of the activity type (sale or quotation) and the relevant day of the week. To further refine the specification, it is necessary to add two additional Where fields to the query.

WheresActually, in order to report on each product for each day of the week, a total of ten queries will be needed (5 days x 2 action types.) Before creating these queries, however, it will speed things up greatly if you specify column headings in the  queries properties. (Remember that cross-tab data has to have pre-defined column headings if the data is to be used in a report.

Crosstab PropertiesThis basic query is only needed as a model for the queries that will ultimately be used in the report. By setting the query up with as much generic detail as possible, you can quickly create the ten queries simply by modifying the product and day of week criteria and using save as repeatedly for each set of criteria.  Subsequent work with the queries will be much easier if you use a consistent naming style for these day/product queries.

The names used in the example database are:

Sales queries: Monday to Friday

  • qxtbActivity_SA_MO
  • qxtbActivity_SA_TU
  • qxtbActivity_SA_WE
  • qxtbActivity_SA_TH
  • qxtbActivity_SA_FR

Quotation Queries: Monday to Friday


  • qxtbActivity_QU_MO
  • qxtbActivity_QU_TU
  • qxtbActivity_QU_WE

  • qxtbActivity_QU_TH

  • qxtbActivity_QU_FR

Notice how these names follow a consistent pattern. The names all begin with qxtbActivity, indicating that the query is a crosstab of the Activity tab. The next two characters after the first underscore indicates whether the query is retrieving sales or quotation data. The final two indicators indicate the weekday that data represents. Using consistent naming patterns like this simplifies both the creation and identification of the queries. For example, you can use a text editor to search and replace the distinguishing characters to create the other queries in the set.

These queries will return the sales and quotation date for each sales rep for each day on which each rep, respectively, had activity. However, in the unlikely event that a rep had no sales or no quotation activity, that rep will be missing from the days sales or quotation data or both. What we really need is for the rep to be included in the query output with zeros for each product column.

What we need is a list of sales reps who were active in the reporting period. A simple query of representatives in the activity table grouped by rep with a criteria limiting the date range will do the trick. The SQL of the query is:

SELECT tblActivity.Rep
FROM tblActivity
WHERE (((tblActivity.ActDate) Between #7/26/2010# And #7/30/2010#))
GROUP BY tblActivity.Rep;


Now this query can be joined to each of the crosstabs to produce a set of ten selection queries. These queries all follow a similar pattern and use an outer join to ensure that all representatives will be included whether or not they have sales.

Select QueryIn these queries, the product names have been “aliased” so that they can be distinguished as to sales or quotations on a day by day. Again a consistent naming pattern helps speed up the creation process. For example, QUPr1_MO is the alias for Monday’s Widget quotations and SAPr1_MO is the alias for Monday’s Widget sales. The Pr1 simply indicates that this is the first field of products which is widget data. The queries themselves are names “qsel” followed by the part of the crosstab name after “qxtb”

For example, the query in the diagram above is named qselActivity_QU_MO, indicating that it will provide data for the Monday column of the report.

We need one more query to give us the data that the report will display. Think of this as an umbrella query that will combine the results of the ten queries just created into a single query:

Final Query

The order of the sub queries doesn’t matter much but it will be easier to work with query when designing the report if the field list are arranged left to right, Monday to Friday. Each table should be joined to the table on its right by the Rep field. In this diagram the field list are arranged in two rows simply to make it easier to display in this article. In the actual query, qselActivity_QU_MO is to the right of qselActivity_SA_FR.

The final query has 61 columns – rep plus one column for each of the SAPr… and QUPr… fields.

In the final part of this series we will see how to display this query data in a grid format grouped by representative.

Create an Inter-leaved Report From Normalized Data Part 1

The client’s requirements were quite specific. They wanted a report that would show data from two data sources on alternating lines. Each pair of lines had to be separated from the next pair by a small vertical space. The first row of each pair was to report product sales, the second row, product quotations, for each of six distinct product lines. Simple but time-consuming to do in a spreadsheet. Somewhat complicated to setup but simple to produce as an Access report. Here’s a mock-up of the report done in Excel. Report Prototype

Excel or Access?

This question comes up quite regularly. Which would be better for the job I have to do? Truth be known the question is actually miss-stated. You will get a more accurate answer if you ask, “Is this a spreadsheet or a data management task?” Regardless, the question deserves much more in depth treatment than is possible here but the short answer can be found by clearly identifying the job you have to do. Do you simply need to ‘crunch numbers,’ (analyze data) or do you need to collect and manage data on an ongoing basis? Data management in Excel, or any other spreadsheet application, for that matter, quickly becomes cumbersome and error prone. Excel is a tool for analyzing numerical data. Access, on the other hand is a Database Manager.


In the organization requesting the report, sales people generate quotations and sales for each of the company’s six main product lines. (In case you haven’t guessed, the product lines here have fictitious names.) The client wanted a weekly activity report that showed the number of quotations and sale for each salesperson for each product. What clinched the decision that this report needed a data management solution and not a spreadsheet solution was that client wanted other, similar reports, that summarized data over time. In order to produce such on-going reports, sales and quotation histories had to be maintained. Choosing the best data manager for the job is a whole other decision process. Suffice it to say that, in this case, Access was determined to be adequate for the client’s business volume. As you may or may not know, relational database management systems (RDMS) work best if the data is properly normalized. The mock-up above shows the data laid out in what some refer to as a flat-file format. The layout seems natural and easy to understand and that is how most people envision their data should be set up in a database. Sadly, therein lies a trap however. Flat file organization is very inefficient and error prone, in part because there is a lot of repetition of data values. One of the goals or normalized tables, on the other hand, is the elimination of data repetition. In short, there is a simple rule of thumb that applies here, To get the report right, first get the data right.

In part 2, I will illustrate the normalized structure for the data used in the report.