Handling tables with the same schema in the same database
We have a series of tables that are denormalized, or flattened, and used for reporting. Our goal is to restructure these tables into a dimensional model, i.e. split each table into fact and dimension tables. As an FYI, there are repeating groups within these tables.
The problem I am encountering is that we have dozens of tables that are identical (same DDL) but hold data for different clients. Each client's data goes into one of these tables. What we would like to do is write the jobs/workflows/dataflows once for a table and use them many times, on every identical table. Each table has a different name, and all are located in the same database.
An example would be tbl_client1 and tbl_client2, both located in the same database. I'd like to write a job, e.g. job_split_clients_into_dimensional_model, once and run it many times, once against each table. I thought about using Profiles and System Configurations, but they seem to handle switching between databases, not using the same database with different table names.
How would you handle this situation?
Edited by: Richard Sherman on Sep 15, 2008 9:02 AM
Werner Daehn replied
I think you both had the right idea already...
You could use datastore configurations, but then either the database or the owner has to be different, and the table name has to remain the same. If, for each owner, you create a database view called "customer", defined as "create view customer as select * from customer_tbl_1", then you have one table (well, a view, but that's the same thing) that you can import and build the dataflows against, and via datastore configurations you can switch between the owners.
That obviously leads to the next question: why not recreate the views via a script at job start? The job starts, a parameter tells it you are interested in "_tbl1", and the job recreates all views to point to the _tbl1 source tables. Even better, encapsulate the create-view statements in a database stored procedure that you call from the DI script.
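A minimal sketch of that re-pointing idea, using Python with an in-memory SQLite database purely for illustration (the table names `customer_tbl_1`/`customer_tbl_2` and the `repoint_view` helper are my own; in Data Integrator you would call a stored procedure from the job script instead):

```python
import sqlite3

def repoint_view(conn, suffix):
    """Drop and recreate the 'customer' view so it reads from the
    client table identified by suffix (e.g. '_tbl_1')."""
    conn.execute("DROP VIEW IF EXISTS customer")
    # Object names cannot be bound as parameters, so build the DDL string.
    conn.execute(f"CREATE VIEW customer AS SELECT * FROM customer{suffix}")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_tbl_1 (id INTEGER, name TEXT);
    CREATE TABLE customer_tbl_2 (id INTEGER, name TEXT);
    INSERT INTO customer_tbl_1 VALUES (1, 'client-one row');
    INSERT INTO customer_tbl_2 VALUES (2, 'client-two row');
""")

# The dataflow always reads from 'customer'; only the view changes per run.
repoint_view(conn, "_tbl_1")
print(conn.execute("SELECT name FROM customer").fetchall())  # [('client-one row',)]

repoint_view(conn, "_tbl_2")
print(conn.execute("SELECT name FROM customer").fetchall())  # [('client-two row',)]
```

The point is that the dataflow's imported metadata never changes; only the one-line view definition does, which is cheap to recreate at every job start.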
The other option is to use partitioning. Actually, your table is partitioned already; what you lack is a view over all the tables at once. So you create a new database view:
create view customer as
select *, 1 as partition_no from customer_tbl_1
union all
select *, 2 as partition_no from customer_tbl_2
union all
select *, 3 as partition_no from customer_tbl_3
union all
select *, 4 as partition_no from customer_tbl_4
union all
select *, ... as partition_no from customer_tbl_...
and read from there. In case the rows themselves carry no indication of which client they belong to, the view can add one, like the partition_no column above.
The UNION ALL (note: not a plain UNION, which would have to deduplicate) is an operation that costs nothing. And depending on the optimizer's capabilities (for Oracle I know it would), a query like "select * from customer where partition_no = 1" would be as fast as selecting from customer_tbl_1 directly.
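The UNION ALL view can be sketched the same way (again Python plus in-memory SQLite as an illustration only; table names are hypothetical, and SQLite's planner will not necessarily prune the unused branches the way Oracle's can, though the results are the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_tbl_1 (id INTEGER, name TEXT);
    CREATE TABLE customer_tbl_2 (id INTEGER, name TEXT);
    INSERT INTO customer_tbl_1 VALUES (1, 'alpha');
    INSERT INTO customer_tbl_2 VALUES (2, 'beta');

    -- One view over all client tables; partition_no marks the source table.
    CREATE VIEW customer AS
        SELECT *, 1 AS partition_no FROM customer_tbl_1
        UNION ALL
        SELECT *, 2 AS partition_no FROM customer_tbl_2;
""")

# All clients at once:
rows = conn.execute(
    "SELECT id, name, partition_no FROM customer ORDER BY partition_no"
).fetchall()
print(rows)  # [(1, 'alpha', 1), (2, 'beta', 2)]

# One client, selected via the partition column:
one = conn.execute("SELECT name FROM customer WHERE partition_no = 2").fetchall()
print(one)  # [('beta',)]
```

With this layout the dataflow is written once against the `customer` view, and a WHERE clause on `partition_no` (or a job parameter feeding it) selects the client.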