
Small-scale replication into HANA

Former Member
0 Kudos

Hi Folks,

In the near future I plan on building a small prototype system not too different from the typical "side-car" scenario, replicating data from a source system (MS SQL Server) into HANA in real-time (or near real-time).

As I'll be doing this on my own dime, I wanted to find out if anyone has experience building something similar. I've managed to provision free developer instances of HANA and SQL Server, but I haven't located a developer version of Sybase Replication Server (according to an SAP rep there are none) or Data Services (I haven't looked extensively). I also highly doubt there's something like a developer edition of SLT (and I believe this would require its own dedicated server?).

I currently have SQL Server installed on the same machine as my Studio (Remote Desktop to CloudShare). I intend to install the replication/ETL software there as well.

I have no experience with SQL Server but was able to get it up and running pretty easily. I found "native" replication abilities via publications/subscriptions, but it looks like non-SQL Server databases other than Oracle and DB2 are not well supported. They need to be set up with stored procedures, and I was running into obscure errors with these (which are of course less familiar to me than HANA's error codes).

I also searched for open source replication technologies and came across SymmetricDS. I'm not sure how well it'll work with HANA but it may be worth investing some time and energy into.

Before I go down too many rabbit holes, I thought I'd reach out to the community and see if anyone else has done similar things. What have your experiences been setting up replication (or ETL) to HANA cloud instances?

Any insights into developer editions of other replication/ETL software packages would be much appreciated! (For my prototype there is no strict real-time requirement, so a simple 1:1 ETL should be okay.)

Thanks!

Accepted Solutions (1)

justin_molenaur2
Contributor
0 Kudos

Jody, since this is a prototype setup and you don't require real-time replication, I would recommend checking out MS SSIS (SQL Server Integration Services). If you installed the developer edition of MS SQL BI (2012?), this should come installed as a default option.

Not sure if there are any hidden compatibility issues, but chances are that if you can connect with ODBC and set up the HANA client on the local machine, you should be able to load into HANA. The bonus here is that the developer edition is close to free - last time I installed it, it was around $50 - so you can get something up and running relatively cheaply and quickly. DataStage, Data Services, or any of the other replication tools offer nothing like this.
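For reference, once the HANA client is installed on the machine, the ODBC piece is just the HDBODBC driver that ships with it. A DSN or connection string would look roughly like this (host, port and credentials are placeholders, and the port depends on your instance number):

DRIVER={HDBODBC};SERVERNODE=myhanahost:30015;UID=MYUSER;PWD=MyPassword

SSIS's ODBC source/destination components can then be pointed at that DSN and the columns mapped as usual.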

SSIS is fairly straightforward, robust and has a decent GUI, probably better in 2012 - I last used 2010. There are communities around this tool as well to get some basic tutorials under your belt.

Regards,

Justin

Former Member
0 Kudos

@Krishna - thanks for the feedback. I'm curious how you handle delta loads? Do you have timestamps that you filter on or some other mechanism? I'd be interested in the details on this once you publish.

@Justin - thanks also for the idea. Right now I have SQL Server 2012; I'm not sure whether BI came as part of that installation, so I'll check. I was actually going to ask if you'd done something like this. I should've guessed that you'd be on top of the forums.

Thanks guys!

justin_molenaur2
Contributor
0 Kudos

Jody - you need to have SQL Server 2012 BI Developer edition. That is the one that comes with SSIS. Alternatively, you can download Visual Studio 2012 with the BI plugins and test without having the actual SSIS server up and running.

I tested on my side and was able to connect to a local SQL server and load data into a HANA DB. Additionally, executing some DDL to drop/create tables worked fine as well. This was all using the ODBC source and destination.

Interesting to know - folks who need to integrate with HANA and already have MS BI skills should be able to do so.

Happy HANA,

Justin

Former Member
0 Kudos

Great, thanks Justin! I look forward to trying that out.

Cheers,

Jody

former_member182302
Active Contributor
0 Kudos

Hi Jody,

Apologies for the long delay.

You may want to have a look at the blog below; maybe it will give you some insights, and I would love to have your comments too.

I'm preparing another blog, as I said, but it is still in progress.

Regards,

Krishna Tangudu

Former Member
0 Kudos

No worries, thanks Krishna.

It looks like deltas are essentially handled via UPSERT, which can be rather expensive when a simple append would do, and it also doesn't preserve history.
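For anyone following along, a minimal sketch of that kind of UPSERT (schema, table and column names here are made up for illustration):

-- every selected row is inserted, or overwrites the existing row with the same primary key
UPSERT "TGT"."SALES_REPL" ("ID", "MATERIAL", "PLANT", "QTY")
SELECT "ID", "MATERIAL", "PLANT", "QTY"
  FROM "STG"."SALES_STAGE";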

Nonetheless, it looks like a robust (if a bit over-engineered) solution for straightforward "replication-like" loads.

Good stuff!

Former Member
0 Kudos

By over-engineered, I'm referring to the use of arrays rather than cursors or selects against table variables.

I probably would've stuck with just loops against table variables for readability, since looping is required in any case, and because the number of columns of a table is quite small.

BUT - I'm certainly not the authority on best practice procedure design - AND - I learned a bit about using arrays in SQLScript - so thanks a lot for your contribution! I may well "steal" some of your code (with attribution of course) for future use cases with this need.

former_member182302
Active Contributor
0 Kudos

Hi Jody,

Yes, I agree with you that UPSERT is rather expensive, which is also what I see here.

Now we can handle that by:

DELETE FROM the target table (based on the primary keys of the incoming records)

INSERT INTO the target table (the complete incoming set)

This workaround actually mitigates the performance issue a bit.
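Roughly like this (table names are just placeholders, assuming a single-column primary key ID):

-- remove the target rows that are about to be re-delivered
DELETE FROM "TGT"."SALES_REPL"
 WHERE "ID" IN (SELECT "ID" FROM "STG"."SALES_STAGE");

-- then insert the complete incoming set
INSERT INTO "TGT"."SALES_REPL"
SELECT * FROM "STG"."SALES_STAGE";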

Yes, this solution was implemented quite carefully in the project I worked on; we even added exception handling to make sure the commits and rollbacks work correctly, to ensure data availability and consistency.

After all, the "client" demands it.

Regards,

Krishna tangudu

former_member182302
Active Contributor
0 Kudos

Hmm, OK, I got what you meant by "over-engineered".

Now for the actually interesting topic, since the advice I see more often is "avoid cursors!":

Which is better for performance - cursors, arrays, or loops over table variables? (I have some tables with relatively many columns.)

I agree with you, but I would like to know - maybe Lars Breddemann or another expert will pop up here and help us understand which approach is actually better?

Regards,

Krishna Tangudu

Former Member
0 Kudos

Hey Krishna -

Perhaps you can do what the other "experts" do to gain their knowledge - test it and discover for yourself.

My suspicion is that at a lower level, selecting values from a cursor is essentially the same thing as indexing an array, so I would expect comparable performance. (If you inspect the indexserver trace after activating your procedure (you may need a certain trace level), you can see the L code that's generated, which I suspect will have an array variable. Since L is a "safe subset" of C++, and most of core HANA is written in C++, the performance is likely comparable.)

(I'm sure there's also some overhead in building the array from the result set, so at the end of the day my vote is on using cursors. But I'd have to test to be sure.)

That being said, I believe the max number of columns in a table is 1000 - which is way too low a number to compare performance meaningfully. Hence my emphasis on readability over potential performance differences.
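Just to make the comparison concrete, here is a rough sketch of the two variants I have in mind, looping over the column names of a made-up table - first with a cursor, then with an array built via ARRAY_AGG:

CREATE PROCEDURE "DEMO_COLUMN_LOOP" LANGUAGE SQLSCRIPT AS
BEGIN
    DECLARE col_list NVARCHAR(5000) := '';
    DECLARE i INT;
    DECLARE names NVARCHAR(256) ARRAY;
    DECLARE CURSOR c_cols FOR
        SELECT column_name FROM table_columns
         WHERE schema_name = 'STG' AND table_name = 'SALES_STAGE'
         ORDER BY position;

    -- variant 1: cursor over the catalog view
    FOR r AS c_cols DO
        col_list := :col_list || r.column_name || ',';
    END FOR;

    -- variant 2: table variable turned into an array, then indexed
    cols = SELECT column_name FROM table_columns
            WHERE schema_name = 'STG' AND table_name = 'SALES_STAGE'
            ORDER BY position;
    names := ARRAY_AGG(:cols.column_name);

    col_list := '';
    FOR i IN 1 .. CARDINALITY(:names) DO
        col_list := :col_list || :names[:i] || ',';
    END FOR;
END;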

Cheers,

Jody

Former Member
0 Kudos

Well cool, sounds like a robust solution.

Regarding deltas though - I still don't think you need to touch the whole table (whether you use UPSERT or UPDATE/INSERT).

Why not add a timestamp on your "source" table (i.e. stamp it as it comes through SLT), and then query the source data WHERE SOURCE_TIMESTAMP > LAST_SUCCESSFUL_EXECUTION?

LAST_SUCCESSFUL_EXECUTION would be a timestamp you store from your procedure's execution (i.e. into a logging table).

That way you truly only handle deltas.
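Something like this is what I have in mind - a very rough sketch with hypothetical table and column names (a SOURCE_TIMESTAMP column on the replicated table, and a small log table holding the last successful run):

CREATE COLUMN TABLE "TGT"."LOAD_LOG" ("LOAD_END" TIMESTAMP);

CREATE PROCEDURE "TGT"."LOAD_DELTA" LANGUAGE SQLSCRIPT AS
BEGIN
    DECLARE last_run TIMESTAMP;

    -- last successful execution; fall back to a very old date on the first run
    SELECT IFNULL(MAX("LOAD_END"), TO_TIMESTAMP('1900-01-01', 'YYYY-MM-DD'))
      INTO last_run
      FROM "TGT"."LOAD_LOG";

    -- only touch the rows that changed since then
    UPSERT "TGT"."SALES_REPL"
    SELECT * FROM "SRC"."SALES_SOURCE"
     WHERE "SOURCE_TIMESTAMP" > :last_run;

    -- remember this run for the next delta
    INSERT INTO "TGT"."LOAD_LOG" VALUES (CURRENT_UTCTIMESTAMP);
END;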

Would that work in your scenario?

former_member182302
Active Contributor
0 Kudos

Thanks for your thoughts Jody.

Though I am not using SLT for replication here - we are using a framework we developed with hdbsql for this.

Yes, we are handling it in a similar way, but the difference is that we key on batch numbers of the load rather than on timestamps. Also, we do not actually delete the data; instead we do soft deletes, i.e. marking records as 'D'.
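As a rough illustration of the pattern (table and column names are placeholders only, and the staging and target tables are assumed to share the same structure):

CREATE PROCEDURE "TGT"."LOAD_BATCH" (IN in_batch INTEGER) LANGUAGE SQLSCRIPT AS
BEGIN
    -- pick up only the rows belonging to this load's batch number
    DELETE FROM "TGT"."SALES_REPL"
     WHERE "ID" IN (SELECT "ID" FROM "STG"."SALES_STAGE"
                     WHERE "BATCH_NO" = :in_batch);

    INSERT INTO "TGT"."SALES_REPL"
    SELECT * FROM "STG"."SALES_STAGE"
     WHERE "BATCH_NO" = :in_batch;

    -- deletions from the source are only flagged, never physically removed
    UPDATE "TGT"."SALES_REPL"
       SET "CHANGE_FLAG" = 'D'
     WHERE "ID" IN (SELECT "ID" FROM "STG"."DELETED_KEYS"
                     WHERE "BATCH_NO" = :in_batch);
END;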

Regards,

Krishna Tangudu

former_member182302
Active Contributor
0 Kudos

Yes, I tried that, but in this specific case I was not able to compare the performance. I am working on another use case which should give me a chance to compare.

In the meantime, I was actually asking whether these "experts" have already done something like this and may have results they can share. (That's the advantage of being part of this wonderful, Facebook-like technical community.)

Regards,

Krishna Tangudu

Former Member
0 Kudos

Ah, makes sense. Sounds like a solid approach!

justin_molenaur2
Contributor
0 Kudos

Jody, I agree with this, and after some more thought I think you can blend the approaches (which I believe you already indicated):

- Use timestamp/datestamp functionality to limit the amount of data selected from the source

- Still use UPSERT to simplify logic and not have to handle multiple cases in your code

Upsert may be slower, but as long as we are doing small selects, it will definitely be faster than touching the entire table.

This approach works in the OVERWRITE mode where the columns should be updated. In classic BW, there are some DSO cases where the updates should be additive, but I don't think that applies in HANA since we can aggregate very fast. DSO additive updates usually solved materialized aggregation issues.

Regards,

Justin

Former Member
0 Kudos

Yup we're on the same page.

As for "materialized aggregates", I have no idea what you're talking about.

justin_molenaur2
Contributor
0 Kudos

Should have read "physicalized".

In BW, a DSO object has the ability to take on different aggregation methods when it gets a changed record with the same key as an existing record. You can choose to overwrite all columns (including key figures) OR you can choose to overwrite the attributes but to ADD the key figures to the existing values for that key.

In some use cases, for example, the requirement is simply to have data aggregated with Material and Plant as the key and QTY as the key figure. Someone can set up a DSO based on that, and every new record with the same key will additively update the measure.

This was typically built for speed, and it becomes a non-issue in HANA, since the speed of this kind of aggregation is awesome.
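In other words, instead of maintaining an additively-updated DSO, you just keep the granular records and aggregate at query time - roughly like this, reusing the made-up table from earlier in the thread:

-- aggregate on the fly; no need to maintain an additive aggregate table
SELECT "MATERIAL", "PLANT", SUM("QTY") AS "QTY"
  FROM "TGT"."SALES_REPL"
 GROUP BY "MATERIAL", "PLANT";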

Regards,

Justin

Former Member
0 Kudos

Justin, thanks for the detailed explanation! However, I was being facetious about "materialized aggregates", since HANA stores granular data and the concept of storing aggregates in HANA is a nonstarter. Apologies for my opaque sense of humor!

Always good though for me to learn more about BW!

Answers (1)

former_member182302
Active Contributor
0 Kudos

Hi Jody,

Just to share a thought: we have implemented replication from a Teradata system into SAP HANA with a combination of PL/SQL procedures, Unix scripts that invoke these procedures, and a job scheduler that invokes the scripts.

This is one way of doing native replication. I am working on putting my thoughts into a document, which I will share once it is finished.

Regards,

Krishna Tangudu