
Is it OK to disable the log when using MaxDB for a data warehouse?

Former Member

Hi,

Is it OK to disable the log when MaxDB is used only for a data warehouse? Can this speed up ETL?

best regards

Clóvis

Accepted Solutions (1)

former_member229109
Active Contributor

Hello Clóvis,

-> Which version of the database are you using?

-> What performance issues do you have on the system?

-> Do you mean deactivating database logging?

Please review the "Log Area" document at

http://maxdb.sap.com/doc/7_7/19/d2235b234b4339b07627c8a1f2bfab/content.htm

Regards, Natalia Khlopina

Former Member

Hello Natalia,

The MaxDB version is 7.6.06.10.

The performance issue is the low speed of generating the aggregate tables: we use SELECT ... SUM statements to pre-generate the aggregate tables, which speed up the MDX queries from Mondrian.

Yes, I mean disabling the database log by switching redo log management off.
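
If I understand the documentation right, the switch would be done from dbmcli roughly like this (just a sketch; MYDB and the dbm user are placeholders, and the database has to be in the ADMIN state):

dbmcli -d MYDB -u dbm,dbm
db_admin
util_connect
util_execute SET LOG WRITER OFF
db_online

and util_execute SET LOG WRITER ON to switch it back on later, I suppose.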

best regards

Clóvis

lbreddemann
Active Contributor

Hello Clovis,

are you sure that the logging is slowing down the loading of your aggregate tables?

From the very definition of aggregates, I'd assume that these tables contain relatively dense (that is, aggregated) data, which implies that there is probably not that much data to load.

Have you already checked how the statement for this loading is processed?

Do you use parallel joins or prefetching?

Based on experience with those kinds of tables in SAP BW systems, I'd guess that your problem is not the logging but the actual processing for these aggregate tables.

I would also check for constraints, triggers, and indexes on the target table that might not be required during the loading process.

regards,

Lars

Former Member

Hi Lars,

I'm not sure about anything; I just guessed that disabling the log could increase the speed. In truth, I don't know what needs speeding up, I only know that it is a slow process.

About how that aggregate table is created: we use commands like this:

INSERT INTO "aggtable_Emplacamento" (
    "CALENDARIO_ano",
    "FABRICANTE_Grupo",
    "MODELOVEICULO_Grupo",
    "EMPLACAMENTO_emplacado",
    "EMPLACAMENTO_autorizado",
    "EMPLACAMENTO_invasao",
    "EMPLACAMENTO_fact_count")
select 
    "CALENDARIO"."ANO" as "CALENDARIO_ano", 
    "FABRICANTE"."GRUPO" as "FABRICANTE_Grupo", 
    "MODELOVEICULO"."GRUPOFABRICANTE" as "MODELOVEICULO_Grupo", 
    sum("EMPLACAMENTO"."QTDE") as "EMPLACAMENTO_emplacado", 
    sum(DECODE (AUTORIZADO, 'S', 1, 0)) as "EMPLACAMENTO_autorizado", 
    sum(DECODE (AUTORIZADO, 'N', 1, 0)) as "EMPLACAMENTO_invasao", 
    count(*) as "EMPLACAMENTO_fact_count"
from 
    "EMPLACAMENTOS"."EMPLACAMENTO" as "EMPLACAMENTO", 
    "EMPLACAMENTOS"."CALENDARIO" as "CALENDARIO", 
    "EMPLACAMENTOS"."FABRICANTE" as "FABRICANTE", 
    "EMPLACAMENTOS"."MODELOVEICULO" as "MODELOVEICULO"
where 
    "EMPLACAMENTO"."DATA" = "CALENDARIO"."DATA" and 
    "EMPLACAMENTO"."MYFABRICANTE" = "FABRICANTE"."CODIGO" and 
    "EMPLACAMENTO"."MYMODELOVEICULO" = "MODELOVEICULO"."CODIGO"
group by 
    "CALENDARIO"."ANO", 
    "FABRICANTE"."GRUPO", 
    "MODELOVEICULO"."GRUPOFABRICANTE";

and here is the EXPLAIN output:


OWNER   TABLENAME      COLUMN_OR_INDEX  STRATEGY                        PAGECOUNT
        CALENDARIO     ICALENDARIO      INDEX SCAN                            123
                                        ONLY INDEX ACCESSED
        EMPLACAMENTO   DATA             JOIN VIA KEY RANGE                 120921
        FABRICANTE     CODIGO           JOIN VIA KEY COLUMN                     1
                                        TABLE HASHED
        MODELOVEICULO  CODIGO           JOIN VIA KEY COLUMN                   129
                                        TABLE HASHED
                                        NO TEMPORARY RESULTS CREATED
        JDBC_CURSOR_7                   RESULT IS COPIED, COSTVALUE IS     121065
        JDBC_CURSOR_7                   QUERYREWRITE - APPLIED RULES:
        JDBC_CURSOR_7                   DistinctPullUp                          1

I don't know what you mean by "Do you use parallel joins or prefetching?"

Could you explain that?

best regards.

Clóvis

lbreddemann
Active Contributor

Hi Clovis,

ok - as this is about a join, prefetching is not usable since it's only available for bare table scans.

But a parallel join could be employed here.

To activate it, you have to set the parameter OPTIMIZE_JOIN_PARALLEL_SERVER to a value larger than 1.

Be careful: you need MaxDB > 7.6.03 for that.
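
A minimal dbmcli sketch of the parameter change (MYDB and the dbm user are placeholders; depending on the parameter class the new value may only take effect after a restart):

dbmcli -d MYDB -u dbm,dbm
param_startsession
param_put OPTIMIZE_JOIN_PARALLEL_SERVER 5
param_checkall
param_commitsession
db_restart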

Also - and this might be an issue here as well - it only parallelizes the index access during a join transition.

That means it builds up a list of primary keys for the target table in parallel and then looks them up serially.

Since in your case the big EMPLACAMENTO table is joined via KEY (which is usually a good thing), this parallel join feature won't help much here.

Looking at your statement, I see that there is no restriction in it, that is: ALL data is going to be read.

If you're pretty sure that the join will properly be driven via the CALENDARIO table, then you might consider clustering EMPLACAMENTO by the DATA column.

The syntax for that would be

ALTER TABLE EMPLACAMENTO CLUSTER ( DATA )

By clustering the table, the data will be stored sorted by DATA and the blocks will be saved in consecutive chunks on disk, both of which enable better page access times.

(also check the SDN MaxDB WIKI on that!)

regards,

Lars

Former Member

Hi Lars

The database version is 7.6.06.10, so I will increase the parameter OPTIMIZE_JOIN_PARALLEL_SERVER to 5. The table EMPLACAMENTO is already clustered on the DATA column.

About the wiki, there is a problem: many of the entries that are interesting or could help me point to internal SAP Notes, to which I don't have access, so for me the wiki is almost all "access denied".

But I ran some SQL that checks the cluster quality. I don't understand the formula(1); maybe you can explain it to me better. The results follow:


SCHEMANAME     TABLENAME     CLUSTERCOUNT
EMPLACAMENTOS  EMPLACAMENTO          3232


TREELEAVESSIZE
975464
150760
204608
252392
218624
195512
576280
586072
142624
176696


SCHEMANAME     TABLENAME     INDEXNAME            CLUSTERCOUNT
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO1               5898
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO2              17947
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO3              24521
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO4               1876
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO5              15479
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO6              10750
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO_TESTE          6426
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO8               2545
EMPLACAMENTOS  EMPLACAMENTO  IEMPLACAMENTO                5937

(1) TREELEAVESSIZE (in KB) divided by the page size (in KB) gives the number of pages, and this page count divided by CLUSTERCOUNT determines the cluster quality.

best regards

Clóvis

lbreddemann
Active Contributor
0 Kudos

Hi again

Ok, yes there are many links to SAP Notes in the WIKI (since this whole site is primarily aimed at SAP customers).

However, the vast majority of the information concerning MaxDB that you can find in the SAP Notes can also be found in the documentation, the internals course slides, and the expert session materials, all available at [http://maxdb.sap.com/training].

Given the super small LOG_IO_QUEUE - well, maybe this really was the bottleneck, so check the statement runtime again with a decent value!

Concerning the cluster quality: the whole idea is to save chunks of pages together on disk.

So to tell whether, and how well, this goal is reached, we take the total number of pages (which is size / 8 KB) and compare it with the number of chunks they are stored in.

The fewer chunks, the better.
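
As an example with the numbers you posted (assuming the 975464 KB row is the table itself): 975464 KB / 8 KB = 121933 pages, stored in 3232 chunks, which is roughly 38 pages per chunk on average. The more pages per chunk, the better the cluster quality.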

regards,

Lars

Answers (1)

steffen_schildberg
Active Participant

Hi Clovis,

this is solely up to you. We recommend not switching off the log in productive environments. It will indeed speed up long-running loads, but - as I think you know very well - if anything fails, you can lose all data changed since the last data backup. And we know of customers who did not even have backups.

Furthermore, no savepoint will be triggered any more if the log is switched off. This simply means that when the database is restarted, for whatever reason, it will have the state of the last savepoint, which might then be far back in time.

Regards,

Steffen

Former Member

Hi Steffen,

As this database is only used for a data warehouse, it's OK to lose the data in it, since all the data can be regenerated from outside and loaded again. About the savepoint: that loss only happens if the database machine suffers a hard reset and no complete backup has been made, right? Does it also happen when a normal shutdown is done, for example with dbmcli db_offline?

best regards

Clóvis

lbreddemann
Active Contributor

> As this database is only used for a data warehouse, it's OK to lose the data in it, since all the data can be regenerated from outside and loaded again.

Maybe correct (as you probably won't have the historic data anymore), but it will take a rather long time to reload all the data.

> About the savepoint: that loss only happens if the database machine suffers a hard reset and no complete backup has been made, right? Does it also happen when a normal shutdown is done, for example with dbmcli db_offline?

Yes, the data loss would only occur when no clean shutdown was performed.
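
For example, a normal shutdown with dbmcli counts as a clean shutdown, since it writes a final savepoint before going offline (a sketch; MYDB and the dbm user are placeholders):

dbmcli -d MYDB -u dbm,dbm db_offline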

You may also see my blog

[Questions to SAP-Support (MaxDB): "Data is gone after reboot - where the heck is it?"|http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/15412] on this.

Really - logging shouldn't be the big bottleneck in your case.

If it were the bottleneck, you'd see lots of log queue overflows and tasks waiting for log I/O...

Is this the case?

If it is: how large is your LOG_IO_QUEUE, and how many queues have you set up?

regards,

Lars

Former Member

Hi Lars,

No, for now I can't see that the log is the cause of the slowdown; I see just 2 queue overflows in about 15 days of activity.

LOG_IO_QUEUE = 50

and LOG_SEGMENT_SIZE = 43690

My idea in disabling the log is more to speed up those kinds of INSERT statements, since that data doesn't need to be recovered in case of a system crash, and to reduce I/O, as that machine has only one RAID 1 partition.

best regards.

Clóvis

lbreddemann
Active Contributor

Hi Clovis,

ok - this is really a tiny LOG_IO_QUEUE.

Most SAP customers go with sizes of 2000.

Try this and you'll cut the number of log flushes (ok, yes, the average number of the large log flushes...) down to 1/40 of what you have now.
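
The change itself is the same dbmcli procedure as for the join parameter above (a sketch; MYDB and the dbm user are placeholders, and the new value should take effect after the restart):

dbmcli -d MYDB -u dbm,dbm
param_startsession
param_put LOG_IO_QUEUE 2000
param_checkall
param_commitsession
db_restart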

LOG_SEGMENT_SIZE is rather irrelevant in this case.

regards,

Lars