on 04-26-2011 8:03 PM
Hi,
Is it OK to disable the log when MaxDB is used only for a data warehouse? Can this speed up ETL?
best regards
Clóvis
Hello Clóvis,
-> What version of the database?
-> What performance issues do you have on the system?
-> Do you mean deactivate database logging?
Please review "Log Area" document at
http://maxdb.sap.com/doc/7_7/19/d2235b234b4339b07627c8a1f2bfab/content.htm
Regards, Natalia Khlopina
Hello Natalia,
MaxDB version is 7.6.06.10
the performance issue is that generating the aggregate tables is too slow; we use SELECT ... SUM to pre-generate aggregate tables, which speeds up MDX queries from Mondrian.
yes, I mean disabling the database log by switching redo log management off
best regards
Clóvis
Hello Clovis,
are you sure that the logging is slowing down the loading of your aggregate tables?
From the very definition of aggregates I'd assume that these tables contain relatively dense data (that is, aggregated data),
which implies that there is probably not that much data to load.
Have you already checked how the statement for this loading is processed?
Do you use parallel joins or prefetching?
Based on the experience for those kinds of tables in SAP BW systems, I'd guess that your problem is not the logging, but the actual processing for these aggregate tables.
I also would check for constraints, triggers and indexes on the target table that might not be required during the loading process.
regards,
Lars
Hi Lars,
I'm not sure about anything; I just guessed that disabling the log could increase speed. In truth, I don't know what to speed up; I only know that it is a slow process.
About how that aggregate table is created, we use commands like this:
INSERT INTO "aggtable_Emplacamento" (
"CALENDARIO_ano",
"FABRICANTE_Grupo",
"MODELOVEICULO_Grupo",
"EMPLACAMENTO_emplacado",
"EMPLACAMENTO_autorizado",
"EMPLACAMENTO_invasao",
"EMPLACAMENTO_fact_count")
select
"CALENDARIO"."ANO" as "CALENDARIO_ano",
"FABRICANTE"."GRUPO" as "FABRICANTE_Grupo",
"MODELOVEICULO"."GRUPOFABRICANTE" as "MODELOVEICULO_Grupo",
sum("EMPLACAMENTO"."QTDE") as "EMPLACAMENTO_emplacado",
sum(DECODE (AUTORIZADO, 'S', 1, 0)) as "EMPLACAMENTO_autorizado",
sum(DECODE (AUTORIZADO, 'N', 1, 0)) as "EMPLACAMENTO_invasao",
count(*) as "EMPLACAMENTO_fact_count"
from
"EMPLACAMENTOS"."EMPLACAMENTO" as "EMPLACAMENTO",
"EMPLACAMENTOS"."CALENDARIO" as "CALENDARIO",
"EMPLACAMENTOS"."FABRICANTE" as "FABRICANTE",
"EMPLACAMENTOS"."MODELOVEICULO" as "MODELOVEICULO"
where
"EMPLACAMENTO"."DATA" = "CALENDARIO"."DATA" and
"EMPLACAMENTO"."MYFABRICANTE" = "FABRICANTE"."CODIGO" and
"EMPLACAMENTO"."MYMODELOVEICULO" = "MODELOVEICULO"."CODIGO"
group by
"CALENDARIO"."ANO",
"FABRICANTE"."GRUPO",
"MODELOVEICULO"."GRUPOFABRICANTE";
and here is the EXPLAIN output:
OWNER  TABLENAME      COLUMN_OR_INDEX  STRATEGY                      PAGECOUNT
       CALENDARIO     ICALENDARIO      INDEX SCAN                          123
                                       ONLY INDEX ACCESSED
       EMPLACAMENTO   DATA             JOIN VIA KEY RANGE               120921
       FABRICANTE     CODIGO           JOIN VIA KEY COLUMN                   1
                                       TABLE HASHED
       MODELOVEICULO  CODIGO           JOIN VIA KEY COLUMN                 129
                                       TABLE HASHED
                                       NO TEMPORARY RESULTS CREATED
       JDBC_CURSOR_7                   RESULT IS COPIED, COSTVALUE IS 121065
       JDBC_CURSOR_7                   QUERYREWRITE - APPLIED RULES:
                                       DistinctPullUp 1
I don't know what you mean by "Do you use parallel joins or prefetching?"
Could you explain that?
best regards.
Clóvis
Hi Clovis,
ok - as this is about a join, prefetching is not usable, since it's only available for plain table scans.
But a parallel join could be employed here.
To activate it, set the parameter OPTIMIZE_JOIN_PARALLEL_SERVER to a value larger than 1.
Be careful: you need MaxDB > 7.6.03 for that.
Also - and this might be an issue here as well - it only parallelizes the index access during a join transition.
That means it builds up the list of primary keys for the target table in parallel and then looks them up serially.
Since in your case the big EMPLACAMENTO table is joined via KEY (which is usually a good thing), this parallel join feature won't do much here.
Looking at your statement, I see that there is no restriction in it; that is, ALL data is going to be read.
If you're sure that the join will properly be driven via the CALENDARIO table, then you might consider clustering EMPLACAMENTO by the DATA column.
The syntax for that would be
ALTER TABLE EMPLACAMENTO CLUSTER ( DATA )
By clustering the table, its data will be stored sorted by DATA and the blocks will be saved in consecutive chunks on disk - both of which enable better page access times.
(also check the SDN MaxDB WIKI on that!)
regards,
Lars
Hi Lars
The database version is 7.6.06.10, so I will increase the parameter OPTIMIZE_JOIN_PARALLEL_SERVER to 5; and the table EMPLACAMENTO is already clustered on column DATA.
About the wiki, there is a problem: many of the points that are interesting or could help me link to internal SAP Notes, and I don't have access to those, so the wiki for me is almost "access denied".
But I issued some SQL that checks the cluster quality. I don't understand that formula(1); maybe you can explain it to me better. The results follow:
SCHEMANAME TABLENAME CLUSTERCOUNT
EMPLACAMENTOS EMPLACAMENTO 3232
TREELEAVESSIZE
975464
150760
204608
252392
218624
195512
576280
586072
142624
176696
SCHEMANAME TABLENAME INDEXNAME CLUSTERCOUNT
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO1 5898
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO2 17947
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO3 24521
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO4 1876
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO5 15479
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO6 10750
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO_TESTE 6426
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO8 2545
EMPLACAMENTOS EMPLACAMENTO IEMPLACAMENTO 5937
(1) TREELEAVESSIZE (in KB) divided by the page size (in KB) gives the number of pages; this, divided by CLUSTERCOUNT, determines the cluster quality.
best regards
Clóvis
Hi again
Ok, yes there are many links to SAP Notes in the WIKI (since this whole site is primarily aimed at SAP customers).
However, the vast majority of all information concerning MaxDB that you can find in the SAP notes can also be found in the documentation, the internals course slides and in the expert session materials all available on [http://maxdb.sap.com/training].
Given the super small LOG QUEUE SIZE - well maybe this really was the bottleneck, so check out the statement runtime with a decent value!
Concerning the cluster quality: the whole idea is to save chunks of pages together on disk.
So, to tell whether and how well this goal is reached, we take the total number of pages (which is size / 8 KB) and compare it with the number of chunks they are stored in.
The fewer chunks, the better.
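That rule of thumb can be turned into a quick calculation. A minimal sketch in Python, using the numbers posted above; it assumes (my interpretation) that the ten TREELEAVESSIZE values all belong to the EMPLACAMENTO table and are summed to get its total leaf size:

```python
# Cluster quality = average number of pages stored per on-disk chunk.
# Assumption: the ten TREELEAVESSIZE values posted earlier are summed
# to get the total leaf size of the table in KB.
PAGE_SIZE_KB = 8  # MaxDB default page size

tree_leaves_kb = [975464, 150760, 204608, 252392, 218624,
                  195512, 576280, 586072, 142624, 176696]
cluster_count = 3232  # number of chunks the table occupies on disk

total_pages = sum(tree_leaves_kb) / PAGE_SIZE_KB
pages_per_chunk = total_pages / cluster_count  # higher is better

print(f"total pages: {total_pages:.0f}, pages per chunk: {pages_per_chunk:.1f}")
```

Under that assumption the table averages roughly 135 pages per chunk, i.e. the clustering on DATA is working well; the indexes, with far higher CLUSTERCOUNT values for the same data, are much more scattered.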
regards,
Lars
Hi Clovis,
this is solely up to you. We recommend not switching off the log in productively used environments. It will indeed speed up long-running loads, but - as you surely know - if anything fails, you can lose all your data back to the last data backup. And we know of customers who did not even have backups.
Furthermore, no SAVEPOINT will be triggered any more if the log is switched off. This simply means that upon restarting the database, for whatever reason, it will have the state of the last SAVEPOINT, which might then be long back in time.
Regards,
Steffen
Hi Steffen,
as this database is only used for a data warehouse, it is OK to lose data in it, as all data can be regenerated outside and loaded again. About the SAVEPOINT: that loss occurs only if the database machine is hard-reset and no complete backup has been made, right? Does it also occur on a normal shutdown, for example with dbmcli db_offline?
best regards
Clóvis
> as this database is only used for a data warehouse, it is OK to lose data in it, as all data can be regenerated outside and loaded again
Maybe correct (as you probably won't have the historic data anymore), but it will take a rather long time to reload all the data.
> About the SAVEPOINT: that loss occurs only if the database machine is hard-reset and no complete backup has been made, right? Does it also occur on a normal shutdown, for example with dbmcli db_offline?
Yes, the data loss would only occur when no clean shutdown was performed.
You may also see my blog [Questions to SAP-Support (MaxDB): "Data is gone after reboot - where the heck is it?"|http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/15412] [original link is broken] on this.
Really - logging shouldn't be the big bottleneck in your case.
If it were, you'd see lots of log queue overflows and tasks waiting for LogIO...
Is that the case?
If it is: how large is your log_io_queue and how many queues have you set up?
regards,
Lars
Hi Lars,
no, for now I can't see that the log is the cause of the slowdown; I see just 2 queue overflows in about 15 days of activity.
the LOG_IO_QUEUE = 50
and the LOG_SEGMENT_SIZE = 43690
What I hoped to gain by disabling the log is mainly to speed up those kinds of INSERT statements, since they don't need to be recovered after a system crash, and to reduce I/O, as that machine has only one RAID 1 partition.
best regards.
Clóvis
Hi Clovis,
ok - this is a really tiny log_io_queue.
Most SAP customers go with sizes of 2000.
Try this and you'll cut down the number of log flushes (ok, yes, the average number of large log flushes...) to 1/40 of what you have now.
LOG_SEGMENT_SIZE is rather irrelevant in this case.
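The 1/40 figure follows directly from the two queue sizes, under the simplification that the number of full-queue log flushes is inversely proportional to the queue size:

```python
# Rough model (a simplification): the number of large, full-queue log
# flushes scales inversely with the size of the log I/O queue.
current_queue = 50     # LOG_IO_QUEUE pages, the current setting
proposed_queue = 2000  # size commonly used by SAP customers

reduction_factor = proposed_queue / current_queue
print(f"log flushes reduced to 1/{reduction_factor:.0f} of the current count")
```

This ignores small partial flushes forced by commits, which is why Lars qualifies it as the average number of large log flushes.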
regards,
Lars