cancel
Showing results for 
Search instead for 
Did you mean: 

Low performance for inner join on HANA

Former Member
0 Kudos

Hi all,

I have two tables that I want to join.

I have live streaming tweet data on TWITTER_ALL_DATA table which contains about 10 millions records right now (For this table, I have index on USERNAME column).

I have another table CUSTOMER_SOCIAL_MEDIA which has about 16K customer usernames (USERNAME column is already primary key on this table).

When I want to make an inner join on these two tables, it takes about 5 minutes to execute the following query:


SELECT COUNT(*)

FROM CUSTOMER_SOCIAL_MEDIA T2

INNER JOIN TWITTER_ALL_DATA T1

ON T1.USERNAME = T2.USERNAME

I don't know why, but sometimes same query performs in 190 ms. But generally, the execution time is over 5 minutes as I said above.

I wanna reduce the execution time surely. Here is the planviz screenshots, what can I do?

Thanks,

Inanc

Accepted Solutions (0)

Answers (2)

Answers (2)

Former Member
0 Kudos

Hi,

Use count(1) instead of count(*). This may not give you significant margin but relatively faster.

Regards

lbreddemann
Active Contributor
0 Kudos

Complete nonsense.

<PC mode>This is actually untrue. Internally SAP HANA, like most other DBMS nowadays, understands what the user wants to do and processes count(*) and count(1) in the very same way</pc mode>

Former Member
0 Kudos

Thanks Lars for the correction.

More at count(*) or count(id) ? | SCN

SergioG_TX
Active Contributor
0 Kudos

Inanc,

are your 10M rows distinct ? if not, try to get the distinct records, this should reduce the number of rows you are trying to join.

do you have any null values as username? if so, please exclude those.

are you connected to your HANA environment on the same network as the PC you are running this on or are you on VPN? network latency may also contribute to the performance of this query but should be minimal.

are there any other processes running? if so, how much memory is available while you are running this query?

just a few things to consider.

Former Member
0 Kudos

Hi Sergio,

Yes, all 10M records have distinct tweet IDs.

And I don't have null values as username.

I also considered the network issue. Yes I connect to the database over network, so I tried this query on the localhost of the database. Nothing changed, issue still persists.

No other process (queries) running at the same time. Here you can see the memory information

anindya_bose
Active Contributor
0 Kudos

Hi Inanc

Your Plan Viz shows 3 dominant operators with 45, 32 and 0.62 mili seconds and then overall run time is 333 seconds ( slightly more than 5 mins ) .. I am confused here.  Where the time being spent then ?

Not sure why you want "SELECT COUNT ( * )" ..  Can you try below theree options?

1. SELECT * from TWITTER_ALL_DATA  WHERE USERNAME IN  ( SELECT DISTINCT USERNAME FROM CUSTOMER_SOCIAL_MEDIA )

---- you can replace * with the fields required from TWITTER_ALL_DATA


2. SELECT  T1.Field1, T1. Field2 , T2.Field3, T3.Field4  from CUSTOMER_SOCIAL_MEDIA  INNER JOIN


TWITTER_ALL_DATA  T2  ON  T1. USERNAME = T2.USERNAME



3.

SELECT COUNT(*) FROM  ( SELECT  T1.Field1, T1. Field2 , T2.Field3, T3.Field4  from CUSTOMER_SOCIAL_MEDIA  INNER JOIN


TWITTER_ALL_DATA  T2  ON  T1. USERNAME = T2.USERNAME ) 



Regards

Anindya

lbreddemann
Active Contributor
0 Kudos

Anindya Bose wrote:

Your Plan Viz shows 3 dominant operators with 45, 32 and 0.62 mili seconds and then overall run time is 333 seconds ( slightly more than 5 mins ) .. I am confused here.  Where the time being spent then ?    

That's exactly the question that needs to be answered here.

It's not about the execution plan, the data distribution or the way SAP HANA computes joins.

By looking at the PlanViz overview statistics it's already clear, that the instrumented runtime accounting does not cover the long total response time.

If the system is not massively burning CPU during the 5 mins of wall clock time, then it must be waiting for something.

So, checking the state of the threads involved in the query execution (also looking at the stack traces) can be a good first step. If the threads are hanging in some os calls or waiting for low level locks (mutex, futex, semaphores is what you'd typically see in the SAP HANA stack traces), then the next step is to find who's holding those locks.

Another angle for approaching this problem can be to check the OS/HW setup. This a XEN virtual system, so things like CPU assignment/affinity, hyperthreading and transparent huge page setup needs to be checked on all involved levels of the system setup.

Former Member
0 Kudos

Hi Anindya,

Sorry for late answer. I just try your 3 queries, and all of them performed in the same way with my query.

Former Member
0 Kudos

Hi Lars,

Sorry for late answer, just become able to answer.

I just checked Threads, Load and Sessions tabs under Performance tab in HANA Studio as you can see in the pictures. To be honest, I don't know deep knowledge for interpret them nor where else should I look at. Do you have any suggestion or direction for me?

Thanks,

Inanc

lbreddemann
Active Contributor
0 Kudos

The snapshots of the thread states and running statements unfortunately don't show anything besides the internal activity of the statistics service (that's OK) and the statement that actually produces this overview.

So no insight here.

The load graph on the other side shows that there are a couple of active and waiting threads while barely any CPU is used. This leaves me with the conclusion that these "active" threads are only active from SAP HANA perspective (that is, they are not waiting for a a syscall, a lock etc) but that these threads simply don't get any CPU time.

And here we are back at the configuration of the virtual machine this SAP HANA instance is running in - this should be checked.

Former Member
0 Kudos

Thanks Lars, I will try to figure it out.

Former Member
0 Kudos

I am embarrassed to say this but I restarted the services with the commands below and it has been fixed Can't believe this could solve the issue. I think you were right Lars, there was a problem about the configuration of the virtual machine this SAP HANA instance is running in. And after I restarted the services, it has gone somehow.

Commands to restart as below:

/etc/init.d/sapinit stop

Followed by.

/etc/init.d/sapinit start


Give it some time and then run this command to check that the processes are all listening to the ports properly.  You should see one for 8000 and 30015.

lsof -i -n -P | grep LISTEN

lbreddemann
Active Contributor
0 Kudos

Good to hear that the problem vanished and very unsatisfying that we don't know why it had been there in the first place.

Anyhow, enjoy your now fast inner join