

The Elephant Behind the Excellence

Exist Software Labs

The preceding year, 2021, was an eventful one for EXIST Data Solutions: new team members joined, new technologies were learned, and new projects were implemented.

On the enterprise database front, PostgrEX was implemented in a prestigious 5-star hotel and casino, a state university, and a major security agency serving the biggest mall in the country.

YugabyteDB, the No. 1 cloud-native, distributed SQL database in the world, was implemented in 3 government projects, and premium EnterpriseDB support was rendered to the country’s primary energy market corporation.

On the data analytics front, Greenplum was also successfully implemented in 3 government projects, thereby enabling these entities to turn their data into actionable insights.

But what do all these business-transforming technologies have in common? In a word: Postgres.

Postgres is the database engine on which PostgrEX, YugabyteDB, EDB, and Greenplum are based. Most of them modify core Postgres to varying degrees to deliver a product that is still Postgres, but better!

As the article “Databases in 2021: A Year in Review” points out, the dominance of Postgres in 2021 was undeniable:

The conventional wisdom among developers has shifted: PostgreSQL has become the first choice in new applications. It is reliable. It has many features and keeps adding more. In 2010, the PostgreSQL development team switched to a more aggressive release schedule to put out a new major version once per year (H/T Tomas Vondra). And of course, PostgreSQL is open-source.

PostgreSQL compatibility is a distinguishing feature for a lot of systems now. Such compatibility is achieved by supporting PostgreSQL’s SQL dialect (DuckDB), wire protocol (QuestDB, HyPer), or the entire front-end (Amazon Aurora, YugaByte, Yellowbrick). The big players have jumped on board. Google announced in October that they added PostgreSQL compatibility in Cloud Spanner. Also in October, Amazon announced the Babelfish feature for converting SQL Server queries into Aurora PostgreSQL.

One measurement of the popularity of a database is the DB-Engine rankings. This ranking is not perfect and the score is somewhat subjective, but it’s a reasonable approximation for the top 10 systems. As of December 2021, the ranking shows that while PostgreSQL remains the fourth most popular database (after Oracle, MySQL, and MSSQL), it reduced the gap with MSSQL in the past year.

Another trend to consider is how often PostgreSQL is mentioned in online communities. This gives another signal for what people are talking about in databases.

What does all this mean for you and your business? It means you can entrust your most mission-critical applications to Postgres and its derivatives; it means you can break free of vendor lock-in and redirect cost savings to core business initiatives; it means your company can become a better, more profitable version of itself in 2022!

Contact us and find out how EXIST Data Solutions can meet all your database-related requirements.

Exist is your data solutions partner of choice!

Explore the next level of your digital transformation journey with big data and analytics. Let’s look at opportunities to better maximize your ROI by turning your data into actionable intelligence. Connect with us today, and we’ll proudly collaborate with you!


Befriending Your Data in 2021

Exist Software Labs

It’s the new year, and everybody is still living in the wake of the COVID-19 pandemic. We all need a friend in times of trouble, and business organizations are no different.

This year, 2021, the friend that your company needs more than ever, especially in these trying times, is data. Given the disruption this virus caused last year, enterprises need to start (if they haven’t already) befriending their own internal data, and perhaps external data as well, if they are to stay viable at the very least and, ideally, grow.

The following are some insights from respected data management leaders on how to make friends with your data this year:

  • “Data warehouses are not going to disappear. Data warehouses will continue to be an important legacy technology that organizations will use for mission-critical business applications well into the future. With the transition to the cloud, data warehouses got a fresh new look and offer some attractive modern capabilities, including self-service and serverless. With the rise of the cloud, data lakes are the new kid on the block. Data lakes are becoming a commodity, legacy technology in their own right. Their rapid emergence from the innovation stage means two things going forward.

    First, organizations will demand simpler, easier to manage, and more cost-effective means of extracting usable business intelligence from their data lakes, using as many data sources as possible. Second, those same organizations will want the above benefit to be delivered via tools that do not lock them into proprietary data management platforms. In short, 2021 will begin to see the rapid introduction and evolution of tools that allow users to keep their data lakes in one place and under their control while driving performance up and cost down.”

  • “Distributed analytical databases and affordable scalable storage are merging into a single new thing called either a unified analytics warehouse or a data lake house, depending on who you’re talking to. Data lake vendors are scrambling to add ACID capabilities, improve SQL performance, and add governance, resource management, security, lineage, all the things that data warehouse vendors have been perfecting for the last three or four decades. During the ten years that data lake software has been coalescing, analytical databases have seen their benefits and added them to their existing stacks: unlimited scale, support for widely varied data types, fast ingestion of streaming data, schema-on-read, and machine learning capabilities. Just as a lot of things used to claim to be cloudy before they really were, some vendors will claim to be a unified analytics warehouse when they’ve just jammed the two architectures together into a complicated mess, but everyone is racing to make it happen for real. I think the data warehouse vendors have an unbeatable head start, because building a solid, dependable analytical database like Vertica can take ten years or more alone. The data lake vendors have only been around for about ten years and are scrambling to play catch-up.”

  • “One single SQL query for all data workloads

    The way forward is based not only on automation, but also on how quickly and widely you can make your analytics accessible and shareable. Analytics gives you a clear direction of what your next steps should be to keep customers and employees happy, and even save lives. Managing your data is no longer a luxury but a necessity, and it determines how successful you or your company will be. If you can remove the complexity or cost of managing data, you’ll be very effective. Ultimately, the winner of the space will take the complexity and cost out of data management, and workloads will be unified so you can write one single SQL query to manage and access all workloads across multiple data residencies.”

  • “Expect more enterprises to declare the battle between data lakes and data warehouses over in 2021 – and focus on driving outcomes and modernizing.

    Data warehouses can continue to support reporting and business intelligence, while modern cloud data lakes support all analytics, AI and ML enablement far more flexibly, scalably, and inexpensively than ever – so enterprises can transform quickly.

    Cloud migrations and related cloud data lake implementations will get demonstrably faster and easier as DIY approaches are replaced by turnkey SaaS platforms. Such solutions will slash production cloud data lake deployment times from months to minutes, while controlling costs and providing the continuous operations, security and compliance, AI and ML enablement, and self-service access required for modern analytics initiatives. That means that migrations that used to take 9-12+ months are complete in a fraction of the time.”

  • “Co-locating analytics and operational data results in faster data processing to accelerate actionable insights and response times for time-sensitive applications such as dynamic pricing, hyper-personalized recommendations, real-time fraud and risk analysis, business process optimization, predictive maintenance, and more.

    To successfully deploy analytics and ML in production, a more efficient Data Architecture will be deployed, combining OLTP (CRM, ERP, billing, etc.) with OLAP (data lake, data warehouse, BI, etc.) systems with the ability to build the feature vector more quickly, and with more data for accurate, timely results.”

To summarize the various points made by these industry pundits:

1. SQL-driven data warehouses are here to stay and will continue to be the data analytics platform of choice for enterprises this year.

2. Data management platforms that integrate well with existing data lakes will dominate over platforms that focus on only one or the other.

3. Data management platforms with built-in AI/ML functionality will dominate as well, since this eliminates the cost and complexity of separate AI/ML analytics platforms.

4. Data management platforms that are cloud-ready will also have an edge over those that are not.

Is there a data management platform that possesses all these qualities and has a proven track record in Fortune 500 companies?

Yes, there is. It’s called Greenplum.


Why is Greenplum the Best Choice for a Cloud Data Warehouse?

Exist Software Labs


The Best MADP

Data is the drivetrain of digital transformation, and an enterprise that can tap into all possible data sources to gain actionable insights has a key advantage.

In order to gain this advantage, a Modern Analytics Data Platform (MADP) is required. What are the attributes of an MADP that make it the technological foundation of digital transformation?

[Figure: The 12 attributes of a Modern Analytics Data Platform (MADP)]

Greenplum ranks high in every one of these attributes, assuring the enterprise of continuous access to valuable insights.

In fact, Gartner ranked Greenplum the No. 1 open source Data Warehouse platform for 2019; only the far costlier Teradata and Oracle ranked above it:

[Figure: Gartner 2019 ranking, with Greenplum as the No. 1 open source data warehouse platform]

This combination of being a premier MADP and offering unmatched cost-effectiveness makes Greenplum the leading choice for most enterprises seeking data-driven digital transformation.


Moving the Data Warehouse to the Cloud

There are many benefits to moving your enterprise data warehouse to the cloud beyond the more commonly cited advantages of lower costs and simpler management, administration, and tuning.

The following are some of the more salient benefits:

1. Vertical and horizontal scalability – With the influx of ever-increasing volumes and varieties of data comes the need to add processing and storage capacity to your existing data warehouse infrastructure quickly and with agility. This also includes the ability to scale out and add more nodes as the number of users increases.

2. Drastically reduced start-up and operating costs – With the cloud’s pay-only-for-what-you-use-when-you-use-it model, you no longer risk investing millions of dollars in on-premises machines or appliances only to see them become outdated within a few years.

3. Agile feature enhancement – Advances in data analytics call for products that can adopt new capabilities quickly. Cloud infrastructure allows new functionality to be integrated seamlessly behind the scenes.

4. Top-notch support – Access to 24/7 support from a team of experts keeps downtime to a minimum, helping you fulfill demanding SLAs.

5. Security – Because the top cloud providers must meet strict security standards set by health, financial, and government entities, you can be confident that your data is kept safe, and it becomes easier to achieve certification or compliance under frameworks such as ISO 27001, SOC 2, GDPR, HIPAA, and PCI DSS. Authorization, authentication, logging, and auditing are standard on all of these platforms.


Greenplum in the Cloud 

Pivotal Greenplum is available on the three major cloud service providers: AWS, Azure, and GCP.

Greenplum on AWS

  • Same Pivotal Greenplum software as on-premises or cloud installation
  • Secure Deployment with Product Review from Amazon
  • GP Browser included (web-based SQL query tool)
  • Optional Installer makes installing additional components such as MADlib and Command Center easy!
  • Self Healing automates node recovery without administrative intervention
  • Snapshot Utility automates instant and non-blocking database backups
  • Optimized Deployment for Performance using Best Practices
  • Development to Production Deployments via AWS CloudFormation
  • PgBouncer Connection Pooler included and preconfigured
  • Upgrade Utility notifies and automates cluster upgrades
  • Disaster Recovery via copied Snapshots simplifies a DR solution and reduces its cost

Greenplum on Azure

  • Same Pivotal Greenplum software as on-premises or cloud installation
  • Secure Deployment with Product Review from Microsoft
  • GP Browser included (web-based SQL query tool)
  • Optional Installer makes installing additional components such as MADlib and Command Center easy!
  • Self Healing automates node recovery without administrative intervention
  • Optimized Deployment for Performance using Best Practices
  • Development to Production Deployments via Azure Resource Manager Deployment
  • PgBouncer Connection Pooler included and preconfigured
  • Upgrade Utility notifies and automates cluster upgrades
  • Snapshot Utility automates instant and non-blocking database backups

Greenplum on GCP

  • Same Pivotal Greenplum software as on-premises or cloud installation
  • Secure Deployment with Product Review from Google
  • GP Browser included (web-based SQL query tool)
  • Optional Installer makes installing additional components such as MADlib and Command Center easy!
  • Self Healing automates node recovery without administrative intervention
  • Optimized Deployment for Performance using Best Practices
  • Development to Production Deployments via Google Deployment Manager
  • PgBouncer Connection Pooler included and preconfigured
  • Upgrade Utility notifies and automates cluster upgrades

For a more detailed presentation on Greenplum on AWS, watch this:

https://tanzu.vmware.com/content/webinars/apr-2-the-enterprise-data-science-warehouse-greenplum-on-aws


Why the Data Warehouse Is Here to Stay

Exist Software Labs

The buzzword has been “digital transformation,” a phrase that continues to signal the importance of leveraging new technology as a catalyst for improvement in the enterprise. New ways of doing things have been introduced, and this is just as apparent in how data is now collected and used for business intelligence and analytics.

The advent of Big Data many years ago brought about huge excitement in these areas. The recognition that there is more data to be collected and used in the enterprise saw the emergence of technologies that facilitated the ingestion of all types of data, their storage in distributed file systems, the ability to scale out easily to accommodate more data, and the various means of getting at this data. But there was a problem.

While the ability to capture and store all types of data, including unstructured data, seemed to be the panacea, it became immediately apparent that:

  • Most business data is structured   
  • Everybody knows SQL
  • The relational model is popular
  • Dimensional modeling works

While it is true that the Big Data “data lake” can open up more insights because of the volume and variety of its data, real-world use cases have shown that actionable data almost always comes in the form of SQL-interfaced, relational data. This is why the Data Warehouse never really went away.
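The claim that business data is mostly structured, that SQL is widely known, and that dimensional modeling works can be made concrete with a tiny star schema: one fact table of measurements surrounded by dimension tables that describe them. The sketch below is illustrative only. It uses Python’s built-in sqlite3 module so it runs anywhere, and the tables (fact_sales, dim_product, dim_date) and figures are hypothetical. A warehouse such as Greenplum would run the same style of star-join query at far larger scale.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            amount     REAL
        );
        INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
        INSERT INTO dim_date VALUES (10, 2021, 1), (11, 2021, 2);
        INSERT INTO fact_sales VALUES
            (1, 10, 100.0), (1, 11, 150.0), (2, 10, 200.0), (2, 11, 50.0);
        """
    )
    return conn


def sales_by_category_and_quarter(conn: sqlite3.Connection):
    """Slice the fact table by two dimensions: the classic star-join query."""
    query = """
        SELECT p.category, d.quarter, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.quarter
        ORDER BY p.category, d.quarter
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in sales_by_category_and_quarter(build_star_schema()):
        print(row)  # e.g. ('Hardware', 1, 100.0)
```

Adding a new way to slice the data (by region, say) means adding one dimension table and one join, which is a large part of why dimensional models have stayed popular with SQL-literate analysts.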

But the modern data warehouse is a vastly different animal from the traditional data warehouse of years gone by. For a data warehousing platform to be called modern and a true agent of digital transformation, it must have the following attributes:

  • Support any data locality (local disk, Hadoop, private and public cloud data).
  • In-database advanced analytics.
  • Ability to handle native data types such as spatial, time-series and/or text.
  • Ability to run new analytical workloads including machine learning, geospatial, graph and text analytics.
  • Deployment agnostic including on-premises, private and public cloud.
  • Query optimization for big data.
  • Complex query formation.
  • Massively parallel processing based on the model, not just sharding.
  • Workload management.
  • Load balancing.
  • Scaling to thousands of simultaneous queries.
  • Full ANSI SQL and beyond.
  • MPP data warehouse able to run seamlessly on-premises, public or private clouds, with a much-expanded mission from previous designs.
  • Primarily based on open source projects with strong communities behind them.
  • Supporting both data science computation and preservation and publishing of data science models.
  • In-database analytics and data science libraries. The alternative is running machine learning algorithms against Hadoop or cloud repositories, but needing to move results to another platform for further analysis and presentation (visualization, dimensional models for scenario planning, etc.)
  • Able to support cost-based query optimizations on polymorphic data, while delaying analysis of the data structure until runtime.[1]
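Several of these attributes (massively parallel processing, scaling out, in-database analytics) rest on one idea: rows are spread across segments by a distribution key, each segment computes a partial result on its own slice, and a coordinator combines the partials. The sketch below models that idea in plain Python. It is a simplified illustration, not Greenplum’s actual executor; the segment count, the sample rows, the `customer` distribution key, and the use of `zlib.crc32` as the hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment by hashing the distribution key (deterministic CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Hash-distribute rows so that equal keys always land on the same segment."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def local_aggregate(segment_rows, group_field, value_field):
    """Each segment computes a partial SUM and COUNT over its own slice only."""
    partial = defaultdict(lambda: [0.0, 0])
    for row in segment_rows:
        acc = partial[row[group_field]]
        acc[0] += row[value_field]
        acc[1] += 1
    return dict(partial)


def coordinator_merge(partials):
    """The coordinator combines partial sums and counts into final averages."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            totals[group][0] += s
            totals[group][1] += c
    return {group: s / c for group, (s, c) in totals.items()}


def parallel_avg(rows, key_field, group_field, value_field, num_segments=NUM_SEGMENTS):
    """AVG(value) GROUP BY group, computed the MPP way: scatter, aggregate, gather."""
    segments = distribute(rows, key_field, num_segments)
    # In a real MPP system these per-segment steps run concurrently on separate hosts.
    partials = [local_aggregate(seg, group_field, value_field) for seg in segments]
    return coordinator_merge(partials)


# Hypothetical sample data, distributed by customer and grouped by region.
SAMPLE_ROWS = [
    {"customer": "c1", "region": "north", "amount": 10.0},
    {"customer": "c2", "region": "north", "amount": 30.0},
    {"customer": "c3", "region": "south", "amount": 5.0},
    {"customer": "c4", "region": "south", "amount": 15.0},
    {"customer": "c5", "region": "south", "amount": 25.0},
]

if __name__ == "__main__":
    print(parallel_avg(SAMPLE_ROWS, "customer", "region", "amount"))
```

Segments return sums and counts rather than averages because averaging per-segment averages would weight each segment equally and give the wrong answer whenever segments hold different numbers of rows. The same result comes back for any segment count, which is the property that lets such a system scale out by adding nodes.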

As you can see, a Hadoop Big Data implementation and the modern Data Warehouse, combined, can become the all-encompassing data platform and single source of truth of an enterprise.

With that said, the best open source-based, modern data warehousing platform in the digital landscape today is Pivotal Greenplum.

[Figure: Pivotal Greenplum architecture]

In a succeeding blog post, we will discuss the many features that make Pivotal Greenplum the best data platform for data-driven digital transformation.

 

Notes:
[1] Neil Raden, “The Data Warehouse in the Age of Digital Transformation”