on 08-02-2013 9:48 AM
Hi,
I am a bit confused about using SAP HANA in conjunction with Hadoop. Can anyone explain how HANA can complement an enterprise's existing investment in a Hadoop platform? Are there any real customers using Hadoop along with the in-memory computing of SAP HANA?
Another question: would it be right to say that Hadoop offers more freedom on the data storage side, with petabytes of data spread across multiple locations, nodes, etc., whereas SAP HANA's edge over Hadoop is mainly in performance and speed for real-time analytics and transactional applications?
Thanks,
Sameer
Hi Sameer,
Hadoop is much more powerful in Big Data scenarios, and HANA is not going to replace it. Instead, HANA can use Hadoop as a data source for faster analysis of the data.
Check these blogs for more info:
http://www.saphana.com/docs/DOC-2934
http://www.saphana.com/community/blogs/blog/2012/08/27/solving-big-data-with-sap-hana-and-hadoop
For HANA and Hadoop integration, check the two blogs below by Mahesh:
http://scn.sap.com/community/developer-center/hana/blog/2013/05/20/sap-hana--hadoop-integration-1
http://scn.sap.com/community/developer-center/hana/blog/2013/05/20/sap-hana--hadoop-integration-2
As of HANA SPS6 you can use Smart Data Access, which lets you access remote data as if it were in local HANA tables, without copying the data into SAP HANA.
To learn more, check the two notes below:
Note 1868209 - SAP HANA Smart Data Access: Central Note
Note 1879294 - SAP HANA smart data access SP1
You can use the Hive interface to connect to Hadoop through Smart Data Access.
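As a rough sketch of what that looks like in practice, the Python helpers below only build the two SQL statements involved: one registers Hive as a remote source, the other exposes a Hive table as a virtual table in HANA. The statement shapes and the "hiveodbc" adapter name are as I recall them from the HANA SQL documentation, so please verify them against the notes above for your revision. The DSN, schema, table, and user names are made up for illustration.

```python
# Sketch only: builds the SQL text for a Smart Data Access setup over Hive.
# Assumptions: adapter name "hiveodbc" and the statement syntax should be
# checked against the SAP notes; every name used below is hypothetical.


def remote_source_sql(source_name: str, dsn: str, user: str, password: str) -> str:
    """Return a CREATE REMOTE SOURCE statement pointing at a Hive ODBC DSN."""
    return (
        f'CREATE REMOTE SOURCE "{source_name}" ADAPTER "hiveodbc" '
        f"CONFIGURATION 'DSN={dsn}' "
        f"WITH CREDENTIAL TYPE 'PASSWORD' "
        f"USING 'user={user};password={password}'"
    )


def virtual_table_sql(
    schema: str,
    virtual_name: str,
    source_name: str,
    remote_db: str,
    remote_schema: str,
    remote_table: str,
) -> str:
    """Return a CREATE VIRTUAL TABLE statement mapped to a remote Hive table."""
    return (
        f'CREATE VIRTUAL TABLE "{schema}"."{virtual_name}" '
        f'AT "{source_name}"."{remote_db}"."{remote_schema}"."{remote_table}"'
    )


if __name__ == "__main__":
    # Hypothetical names: a Hive DSN "hive1" holding a "weblogs" table.
    print(remote_source_sql("HIVE1", "hive1", "hive_user", "hive_password"))
    print(virtual_table_sql("MYSCHEMA", "V_WEBLOGS", "HIVE1", "HIVE", "default", "weblogs"))
```

Once the virtual table exists you query it with a normal SELECT, and HANA forwards the work to Hive behind the scenes.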
To see more about Hadoop and Smart Data Access integration, check this:
http://www.sap.com/corporate-en/news.epx?PressID=20900
Regards,
Vivek
Vivek - Nice compilation of links, thanks.
HANA offers faster reporting and a better collection of query modelling and developer tools than Hadoop. Hadoop relies primarily on disk for reads and writes, whereas HANA relies primarily on memory. Disk is the slower, cheaper medium; memory is faster and more expensive.
The HADOOP ecosystem is primarily open source with many different tools at different levels of maturity.
If you have a high volume of low-value data, then Hadoop might be better for you. [e.g. web logs]
If you have a lower volume of high-value data, then HANA might be better for you. [e.g. current-year sales figures]
If you have a mix then perhaps integrating both might be a suitable solution.
I've recently put up an example use-case integrating HADOOP and HANA.
Smart Data Access running off Hive may not prove to be that successful.
Hive was not built with real-time reporting in mind.
Other new solutions on Hadoop, such as Impala, Stinger (Hive v2), and MapR, may ultimately turn out to be better at integrating with HANA, but it's early days for these solutions.
A HIVE query which takes 30 seconds may run in less than 3 seconds using Impala.
I know which one I'd prefer to use as a virtual table in HANA 😉