Hi Pap,
Yes, there are multiple database technologies. I am a bit biased here,
my favorite DW software is SQL Server 2019 in its "Data Lake" version.
However, the "right database" depends on what do you want to do with it.
If you want to lower the cost, I would recommend DB2 which can do
everything that Oracle can, at half the price. IBM is almost
proverbially inept at selling anything except hardware with one shining
exception being WebSphere application server. DB2 is capable of natively
executing PL/SQL, it supports all SQL Standards, it even has In-Memory
option (called "Blue Acceleration" which actually preceded Oracle for
about a year) which comes for free with the enterprise edition of DB2,
Niot to mention that DB2 also uses licensing per CPU core, with license
per CPU core costing around $5000. However, despite all that, IBM sales
is in dire need of a reorganization. They were not capable of creating a
friendly user and author community or forums like this one. DB2 does
work on Linux and Windows as well. I am rather scared when I think of
what will happen with Red Hat. CentOS, once one of the most popular
Linux distributions on the planet is already dead. One has to fear for
JBoss and GFS, both having competing products within IBM.
SQL Server is also a good alternative, especially for data warehouse
deployments. Just like DB2, it too operates on both Linux and Windows.
It has a proven track record, just like DB2, and a lot of people who
know how to use it and administer it. Unlike DB2, SQL Server cannot
execute PL/SQL. I believe that's something that Kellyn is working on.
However, SQL Server has great partitioning, bitmap indexes and also
in-memory option. Just like DB2, SQL Server is also much cheaper than
Oracle.
As for the open source databases, the only way to figure out which one
to use is a pilot project. Open source databases are missing many
features that commercial databases have. Whether an open source DB is
good enough for your project can only be decided by a pilot project. I
have had some successful implementations with PostgreSQL, mostly for
smaller HR applications or as a database for bugzilla. I wouldn't
necessarily trust benchmarks because cheating on benchmarks is a regular
occurrence. Google for TPC-C cheating or Dhrystone MIPS cheating and
you'll see why benchmarks are not to be trusted. Also, if you test a SAN
using bonnie++, you will never get the same number of IOPS as
advertised. Benchmarks are relevant to see who are the legitimate
players and who are upstarts.
So called No-SQL databases are not using SQL because SQL has certain
transaction consistency requirements (known as ACID) which are rather
hard to implement. That can sometimes be good enough for a document
store. Again, a pilot project is the only way to make a decision.
Regards
On 4/18/21 6:53 AM, Pap wrote:
Thank you mladen.
My apology. Actually I have zero experience on snowflake but came across that blog so thought of checking around that.with the experts if some have experience around that or similar stuff. I have only worked in Oracle databases throughout my career, so no handson around other database technologies.
Thank you for the details. I was not aware about TPC-H benchmark before but as i went through it seems a common standard measurement for DB performances and as you rightly pointed out it has zero information around snowflake which makes it questionable.
And out of curiosity , We are coming across multiple database technologies, So I was also trying to search for a common place which could provide us information around different database technologies and their best use cases or what they are best suited for. Is there any such? Or TPC-H is the right place to see basic comparisons?
And yes i agree that Oracle has 30 years of its invention fed into that product so it must not be that easy for anybody to make it within a short time. But few things I do see, that when we try to scale it at some point just adding additional storage cells is not helping much.
Regards
Pap