
Handling tables with the same schema in the same database

Former Member

We have a series of tables that are denormalized, or flattened, and used for reporting. Our goal is to normalize these tables into a dimensional model, i.e. split each table into fact and dimension tables. As an FYI, there are repeating groups within these tables.

The problem I am encountering is that we have dozens of tables that are identical (same DDL) but hold data for different clients; each client's data goes into one of these tables. What we would like to do is write the jobs/work flows/data flows once for a table and reuse them on every identical table. Each table has a different name, and all of them are located in the same database.

An example would be tbl_client1 and tbl_client2, both located in the same database. I'd like to write a job, e.g. job_split_clients_into_dimensional_model, once and run it many times, once against each table. I thought about using Profiles and System Configurations, but they seem to handle switching between databases, not switching between different table names within the same database.

How would you handle this situation?

Edited by: Richard Sherman on Sep 15, 2008 9:02 AM

Accepted Solutions (1)


werner_daehn
Active Contributor

I think you both had the right idea already...

You could use datastore configurations, but then either the database or the owner has to differ, and the table name has to stay the same. If, for each owner, you create a database view called "customer", defined as "create view customer as select * from customer_tbl_1", you end up with one table (well, a view, but that's the same thing here) that you can import and build the dataflows against, and datastore configurations can then do the switch between the owners.

That obviously leads to the next question: why not recreate the views via a script at job start? The job starts, a parameter tells it you are interested in "_tbl1", and hence the job recreates all the views to point to the _tbl1 source tables. Even better, encapsulate the create-view statements in a database stored procedure that you call from the DI script.
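A minimal sketch of the recreate-the-view idea, using Python with sqlite3 as a stand-in for SQL Server so it can be tried outside the real environment (the table and view names here are hypothetical; on SQL Server 2005 the equivalent DROP/CREATE VIEW statements would live in a stored procedure called from the DI script):

```python
import sqlite3

def point_view_at_client(conn, suffix):
    """Recreate the 'customer' view so it selects from customer_tbl_<suffix>.

    The dataflow always reads from the stable name 'customer'; only this
    view definition changes between runs. In real code, validate 'suffix'
    before interpolating it into SQL.
    """
    conn.execute("DROP VIEW IF EXISTS customer")
    conn.execute(f"CREATE VIEW customer AS SELECT * FROM customer_tbl_{suffix}")

# Demo: two "cloned" client tables sharing the same DDL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_tbl_1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE customer_tbl_2 (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer_tbl_1 VALUES (1, 'alice')")
conn.execute("INSERT INTO customer_tbl_2 VALUES (2, 'bob')")

point_view_at_client(conn, 1)
print(conn.execute("SELECT name FROM customer").fetchall())  # rows from table 1

point_view_at_client(conn, 2)
print(conn.execute("SELECT name FROM customer").fetchall())  # rows from table 2
```

The job's dataflows are built once against the single imported view; which clone feeds a given run is decided entirely by the parameter passed at job start.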

The other option is partitioning. Well, actually, your table is already partitioned; what you lack is a view over all the tables at once. So you create a new database view:

create view customer as
select *, 1 as partition_no from customer_tbl_1
union all
select *, 2 as partition_no from customer_tbl_2
union all
select *, 3 as partition_no from customer_tbl_3
union all
select *, 4 as partition_no from customer_tbl_4
union all
select *, ... as partition_no from customer_tbl_...

and read from there. If the rows themselves carry no indication of which client they belong to, the view can supply one via an extra column like the partition_no I added above.

UNION ALL (not plain UNION!) is an operation that costs essentially nothing. And depending on the optimizer's capabilities (for Oracle I know it does), a query like "select * from customer where partition_no=1" would be as fast as selecting from customer_tbl_1 directly.
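The UNION ALL view can be checked quickly outside SQL Server; here is a sketch in Python with sqlite3 (the table names are hypothetical stand-ins for the real cloned tables), showing the view with its added partition_no column and the per-client filtered query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for i in (1, 2):
    conn.execute(f"CREATE TABLE customer_tbl_{i} (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer_tbl_1 VALUES (1, 'alice')")
conn.execute("INSERT INTO customer_tbl_2 VALUES (2, 'bob')")

# One view over all the cloned tables; partition_no tags the source table,
# since the rows themselves carry no client indicator.
conn.execute("""
    CREATE VIEW customer AS
    SELECT *, 1 AS partition_no FROM customer_tbl_1
    UNION ALL
    SELECT *, 2 AS partition_no FROM customer_tbl_2
""")

# The dataflow reads the single view and filters on partition_no
# to process one client at a time.
rows = conn.execute(
    "SELECT id, name FROM customer WHERE partition_no = 2").fetchall()
print(rows)  # only rows from customer_tbl_2
```

Whether the WHERE clause actually prunes the untouched tables depends on the optimizer, as noted above; on engines that support partitioned views the filtered query should touch only the one underlying table.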

Former Member

Thanks, I'll try these approaches.

Answers (1)


Former Member

Which RDBMS are you using?

In an Oracle database I would use a single partitioned table and put all clients' data into it, with a partition per client to improve query performance. That way there is only the one table to query, which makes life easier wherever the data is required.

Former Member

Giles,

The database is SQL Server 2005.

Excellent idea, but we are not able to alter the systems of record (SOR) in this situation. We are working with a client who uses multiple tables, i.e. tbl_client1, tbl_client2, etc., as well as many other tables, files and applications, to provide services to its own clients. It would have been great if they had implemented their source systems with partitions as you suggest, rather than as "cloned" tables in the same database. My guess is that they were used to SQL Server 2000, which did not have table partitioning, so cloning tables was the technique they knew.

Anyway, our situation is: how do we build a DW using one set of jobs/work flows/data flows for a tbl_clientxx that we can run against tbl_client1, tbl_client2, etc. in the same database, without rewriting code?

Thanks,

Rick

Edited by: Richard Sherman on Sep 15, 2008 10:43 AM