Inline deduplication is a technique that removes redundant components of data before writing it to a storage device. Eliminating duplicate pieces reduces the storage space requirements without compromising the safety of the data.
Overview

As the amount of data continues to grow, so do the demands for storage space, data center capacity, cooling, network bandwidth, and more. This growth also adds operational complexity, administration time, and risk, which makes ensuring data security and compliance costly and challenging. According to IDC, duplicate copies of content account for about 75% of the data in storage today. Removing these redundancies can help organizations reduce their storage needs and costs, and this is where deduplication comes in.
Basically, deduplication (often shortened to dedupe) is the technique of eliminating duplicate components of data, whether before backing it up or on primary storage.
Currently, the two main types of deduplication are:
Inline deduplication: removes redundant data before writing it to disk.
Post-process deduplication: removes duplicates from data already on the backup device.
Although the outcome depends on the environment, inline deduplication is often more efficient and economical than the post-process technique. However, the savings it achieves depend on the type of files, the frequency of backups, the environment, and other variables. Typical solutions can cut storage needs by a factor of 10 to 30, which translates to lower drive-capacity and bandwidth requirements.
Generally, reducing the data footprint has benefits such as smaller data center space, and savings on hardware, software, bandwidth, and power.
How does inline deduplication work?

The technique compares new data with what is already on the storage device and writes only the unique parts of the content. If a piece already exists, it does not write the data again but instead adds a pointer to the existing copy on the storage media.
The deduplication software breaks data sets into smaller parts, whether at the file, block, or byte level, and uses hashing algorithms to assign an identifying fingerprint to each chunk. Smaller chunks generally deliver better reduction and storage efficiency.
When there is new data to write, the software first checks whether each chunk's hash identifier already exists in storage and writes only the unique parts. If there is a match, it does not write the data again but instead adds a pointer to the existing piece on the backup drive.
For example, if a file is 100% original, the system copies everything to the backup device. However, if a matching file already exists on the backup, the technology does not back it up again; instead, it writes a pointer entry to a hash table.
When restoring, the system uses the pointers in the hash table to retrieve and reassemble the deduplicated pieces of the content.
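The chunk-hash-pointer mechanism described above can be sketched in a few lines of Python. This is a toy illustration with fixed-size chunks and SHA-256 fingerprints, not any particular vendor's implementation; the class and method names are invented for the example:

```python
import hashlib


class DedupStore:
    """Toy inline-deduplicating store: fixed-size chunks, SHA-256 fingerprints."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # hash -> chunk bytes (the "storage device")
        self.files = {}    # filename -> list of chunk hashes (the "pointers")

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Inline check: only write the chunk if its hash is not already stored
            if digest not in self.chunks:
                self.chunks[digest] = chunk
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Restore by following the pointers back to the stored chunks
        return b"".join(self.chunks[h] for h in self.files[name])


store = DedupStore()
payload = b"A" * 8192                 # two identical 4 KiB chunks
store.write("backup1.bin", payload)
store.write("backup2.bin", payload)   # fully redundant second backup
print(len(store.chunks))              # → 1 (unique chunks actually stored)
print(store.read("backup2.bin") == payload)  # → True
```

Even though two 8 KiB backups were written, only one 4 KiB chunk landed in storage; everything else is pointers, which is the entire savings mechanism.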
The removal of duplicates happens before the system writes the data to the disk, and this may slow down the backup process. However, eliminating redundant content reduces the amount of data to write, and the overall delay may be insignificant.
Benefits of inline deduplication

The benefits specific to inline processing include:
Inline deduplication occurs before files are written to disk. As such, it minimizes I/O operations, which reduces drive wear.
It is easy and relatively inexpensive to implement and operate.
It does not require a large buffer or an extra temporary disk for the undeduplicated data, as is the case with post-process deduplication.
It leaves a smaller footprint for backup, replication, and the data center, and lowers the network bandwidth required for transfers.
You do not have to wait for the entire data set before replicating to a remote site. You can replicate even while the backup is running, which improves disaster recovery readiness.
Drawbacks of inline deduplication
Despite optimizing storage, restoring data from the backup is usually inefficient or slow.
Since the process happens between the server and the backup device (or the processor and the storage device), it may slow down write speeds. This is not an issue in most cases but can become a problem when large amounts of data must be copied quickly.
Restoring data from inline deduplicated backup is usually a compute-intensive and time-consuming process, but this may differ according to the hardware resources and environment.
Hardware vs. software inline deduplicating appliances

The choice between hardware and software deduplication depends on the environment as well as the current backup software and configuration. While older storage systems need additional software and configuration, modern hardware such as flash often comes with built-in inline deduplication. If you have a system without a built-in option, you can extend its capabilities by placing an inline deduplicating appliance in front of the existing legacy storage array.
Plug-and-play hardware appliances with built-in deduplicating capabilities provide faster processing and are easy to add. However, scalability is usually a challenge, and they sometimes require complex integration with existing infrastructure. On the other hand, powerful Intel processors are now enabling software-based solutions to deliver better performance without compromising on speed. The software approach, such as the Altaro backup solutions and others, has lower overhead, costs less, is more flexible, scales easily to the petabyte level, and is ideal for virtual and cloud environments.
When do you use inline deduplication?

Although inline deduplication is one of the major data-reduction techniques, it is not suitable for every application. For example, it delivers negligible savings for largely unique content such as engineering test data, music, video, and X-ray data.
The technology may not be the best fit for every environment; below are some areas where it works well:
When disk capacity is limited and expansion is very costly, or where there are physical space constraints, such as in HCI appliances.
When you regularly back up large amounts of redundant data.
When you want to minimize or optimize bandwidth to remote disaster recovery or replication sites.
Inline dedupe may be unsuitable for big data and for businesses with smaller volumes of data. However, it can significantly reduce the cost of backing up virtual machines.
As an example, imagine your organization has about 500 virtual machines running the same operating system. In that case, each instance of the OS comprises identical blocks. With inline deduplication, you only need to write each block once instead of 500 times. Another application where the technology delivers huge savings is email archiving: instead of storing a copy of an attachment for every user, the technology writes only one copy to the backup storage media.
Applications in hyperconverged infrastructure appliances and virtual desktop environments

Most HCI vendors prefer inline deduplication to optimize internal storage. Compared to the post-process technique, it performs better while reducing storage capacity requirements and drive wear. HCI appliances can usually accommodate only a limited number of physical disks, and removing duplicate data helps optimize the limited storage space.
Inline deduplication is also suitable for VDI storage, which has always been a challenge. Performance is usually the priority when deploying virtual desktop environments, so providers often use expensive, high-performance storage. By reducing the data footprint, you can use the limited capacity of those expensive but fast drives efficiently without spending more on extra drives.
Deduplicating inline for primary storage

Although most organizations use inline dedupe for backup or on secondary disks, it is also applicable to primary storage. This is especially useful when you want to take advantage of fast but expensive flash memory.
Unfortunately, the cost of flash storage is usually very high, and larger capacities may be hard to justify. Eliminating duplicate information lets you save money while enjoying the high speeds and a better return on investment. In some applications, inline deduplication can level the capacity playing field between low-cost traditional storage arrays and costly, high-performance all-flash arrays. For example, a 10:1 deduplication ratio means a 10-terabyte all-flash array can effectively hold as much data as a 100 TB conventional array.
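The effective-capacity arithmetic behind such comparisons is simple multiplication; a quick sketch, using the 10 TB array and the 10:1-to-30:1 ratios quoted earlier in the article (the function name is just for illustration):

```python
def effective_capacity_tb(physical_tb, dedup_ratio):
    """Logical data a deduplicating array can hold: physical capacity
    multiplied by the observed deduplication ratio (expressed as N:1)."""
    return physical_tb * dedup_ratio


print(effective_capacity_tb(10, 10))   # → 100 (the 10:1 example above)
print(effective_capacity_tb(10, 30))   # → 300 (at the 30:1 upper end)
```

The caveat, of course, is that the ratio is workload-dependent: highly redundant data (VM images, email archives) reaches the high end, while unique or pre-compressed content barely deduplicates at all.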
Data volumes continue to grow faster than the price of storage drops, so there is a constant need to reduce storage costs without sacrificing the security and quality of the data. One of the most effective techniques is inline deduplication, which removes duplicate pieces before writing data to the drive. Downstream operations such as backup, archiving, replication, and network transfers all benefit from the smaller data footprint.
Day 1 began with the general session, which was very different from the previous year, when the VMware executives laid out their vision for the partner community. This year's general session was focused squarely on the audience in attendance.
VMware's CTO of Global Field and Industry, Chris Wolf, began the general session. Chris is responsible for shaping VMware's long-term technology vision while ensuring that research and development priorities align with customer and industry needs. With this being a technical partner conference, I felt this was the right choice for leading the general session. Last year felt more like a sales pitch and less technical; I am not sure whether the change was due to feedback VMware received after last year's conference. In my opinion, it demonstrates that the conference is now correctly aligned with the audience attending the event.
VMware Empower 2019 is bigger, with much richer content than 2018: over 90 breakout sessions, first-time instructor-led VMware labs, VCDX experts on hand to talk with, and opportunities to take a certification exam.
Chris spoke about the nature of applications and how they are changing amid unprecedented growth. Applications are more diversified, and demand is increasing more than ever before. Application needs and requirements are driving IT initiatives within customers' businesses.
Chris continued the general session by talking about the hybrid and public cloud journey. VMware approaches cloud through consistency in infrastructure and operations plus a native developer experience, which allows workloads and the user experience to be consistent across both hybrid and public cloud offerings. By bringing consistency to infrastructure and operations, customers can more easily bring in service integration to manage cloud through business KPIs across the organization.
Automation allows customers to set guardrails by line of business and manage via policies. It also brings the ability to remediate, conform to standards, follow best practices, and adhere to industry standards. Governance and security allow for reporting on compliance and fixing misconfigurations; they also bring compliance by teams and proactive monitoring of security and compliance risks. The last pillar, cost and visibility, allows IT to accurately allocate costs, find unused resources, optimize costs and infrastructure, automate cost control, and continue cost optimization based on strategy.
VMware Cloud Foundation brings complete cloud integration with vSphere, vSAN, and NSX. Chris talked about how vSAN adoption is growing, with more than 38% of the market now, and how Cloud Foundation is the right choice for building a consistent cloud experience. Through this platform, IT can more easily manage and deploy automation, governance, and security, all while controlling costs.
VMware demonstrates these abilities through CloudHealth, as seen above. CloudHealth, acquired by VMware, is a crucial multi-cloud management platform that works across AWS, Microsoft Azure, and Google Cloud Platform, giving customers a way to manage cloud cost, usage, security, and performance from a single interface.
CloudHealth has over 80 billion workloads managed through the platform today and is the leader in multi-cloud management. CloudHealth offers perspectives on optimization for right-sizing with cost controls, downsizing, and reserved instances. Customers can build out policies to control things like low EC2 utilization.
Chris talked about traditional network challenges businesses face today and the need to bring in automation and intrinsic security into the security fabric of networks.
SD-WAN has seen large momentum, with 2,000+ customers in more than 70 countries. Chris demoed deploying SD-WAN into remote locations; it was very easily deployed and took only minutes to provision in front of a live audience.
Chris spoke about VMware Cloud (VMC) on AWS and the benefits of this platform. He covered use cases such as data center evacuation, disaster recovery, and applications integrating with AWS offerings like AWS Lambda, the event-driven, serverless computing service from Amazon Web Services that runs code in response to events and automatically manages the computing resources that code requires. VMC on AWS is now available in Singapore and Canada. You can see the full roadmap and further information on the VMware Cloud on AWS site from VMware.
The VMC on AWS provider partners are growing, as you can see from the slide above. It is important to note that these partners are VMware Cloud Verified, which means that when you see the VMware Cloud Verified logo, you know you can easily access the full set of capabilities of VMware's cloud infrastructure: the ultimate in cloud choice through flexible, interoperable infrastructure from the data center to the cloud.
Chris spoke about simplicity and choice for customers at the edge. He spoke about AWS Greengrass, Azure IoT, Data Analytics, and Hybrid Applications running on that consistent infrastructure of vSphere and spoke about Azure IoT Edge running on vSphere.
The day ended with a demonstration of edge devices. VMware demonstrated vSphere running on a Mac Mini and ESXi running on an Intel Compute Stick, and performed a vMotion across WiFi, which was amazing to witness.
Overall, this was one of the better events I have attended from VMware. The sessions, along with the general session, were very technical. I am excited for the next few days in Atlanta.
Have you ever thought about what it takes to skydive? If you are going to skydive, you want to work with a company that has demonstrated an excellent safety record and that delivers predictable, repeatable outcomes; after all, your life is on the line.
There are a lot of inherent dangers in skydiving. You need a company that can prepare you to deal with all aspects of this sport so that you can make it home safely.
There are several recommended areas to focus on to prepare for the experience of stepping out of an airplane in flight, including training and informing you of the potential dangers you may face. The same is true for working with a VMware Master Services Competency partner: you know you will be working with a highly skilled partner that can walk you through all the steps involved to meet your business needs and objectives. In 2018, VMware announced the new master services competencies for partners interested in deepening their practices and/or building additional practice areas. VMware master services competencies require achieving advanced technical certifications and proof of high-level service capability and expertise, as validated by the partner's customers.
These competencies allow partners to differentiate in four specific solution areas:
Cloud management and automation – designates expertise in delivery of VMware Cloud Management and Automation solutions and services. Achieving this competency validates deep understanding and execution of cloud management design principles and methodologies.
Data-center virtualization – designates expertise in delivery of VMware vSphere environments and digital infrastructure services. Achieving this competency validates deep understanding and execution of Data Center Virtualization design principles and methodologies.
Desktop and mobility – designates expertise designing, installing, and maintaining VMware Workspace ONE and Horizon solutions. Achieving this competency validates deep understanding and execution of desktop and mobility design principles and methodologies to deliver a scalable and reliable digital workspace.
Network virtualization – designates expertise in delivery of VMware NSX environments and services. Achieving this competency validates deployment and optimization of NSX environment capabilities.
The VMware master services competencies recognize outstanding service partners for expertise in specific VMware solution areas. Through attaining these competencies, partners like Sirius can demonstrate a proven success and expertise in a specialized area of business.
Russ Kaufmann, National Partner Manager for VMware, says, "As customers embrace digital transformation and shift towards adopting cloud technologies, it is important that partners are able to demonstrate expertise around delivering these type of solutions. This includes next-generation VMware solutions like digital work-spaces, software defined data centers, and cloud services like VMC on AWS. The Master Services Competencies (MSCs) provides a framework to enable partners like Sirius around these solutions but also recognizes their expertise having demonstrated successful delivery. As customers look to “reduce risk” and “accelerate time to value” when adopting VMware solutions identifying partners with MSC designation would be a great starting point."
Since 1980, Sirius has been helping organizations solve complex business challenges so they can meet their business objectives. To achieve its position as a top solution provider, Sirius has built a nationwide consulting, sales, and services organization that provides best-of-breed technologies from across the full spectrum of information technology, including hardware, software, storage, networking, security, cloud, and voice. Sirius has obtained multiple VMware master services competencies, in Data Center Virtualization, Network Virtualization, and Desktop & Mobility. This achievement demonstrates to customers that Sirius is dedicated, invested, and has validated expertise in advanced VMware technologies.
According to Deborah Bannworth, Senior Vice President of Strategic Alliances & Inside Sales, “Sirius is making significant investments to support our client's VMware solutions. The master services competencies only help to accelerate our capabilities and skills for our clients. It also recognizes Sirius for the investments we are making to support our field and clients.”
By leveraging these service capabilities, VMware customers can rest assured that master services competency partners have a record of repeatable, successful deployments of VMware business use cases that can meet their objectives across multiple solution areas.
VMworld 2018 kicked off with a packed general session, and this year's theme is all about you, with flashes across the main screen of "Choice Begins with You" and "Possible Begins with You."
VMware's CEO Pat Gelsinger took the stage and kicked things off by talking about the theme of this VMworld: "You." Pat said that you are the VMware community, you are the VMware team, you are the partners. Collectively, we are what makes VMware: its workers, the community of customers, and the partner community assisting customers through education, architecture, deployment, and further growth of their infrastructure.
This is VMware's 20th birthday and, as Pat put it, the company is almost old enough to go out for a drink.
Twenty years of innovating and disrupting the world of technology, and that has not changed. VMware demonstrated at this year's VMworld that it is just as committed to innovation within its own product lineup and to calculated partner decisions like AWS. Pat Gelsinger demonstrated his commitment to VMware, and to moving the company to the next level, through his newly minted VMware tattoo.
Pat Gelsinger spoke about VMware's unique role within the industry, bridging across silos through multiple iterations.
Act one, as he put it, was the "Server Wars," when VMware changed the industry with compute virtualization. Act two was the introduction of "Virtual Desktop" technology supporting BYOD. Act three was "Network Virtualization." Act four is "Cloud Migrations," for private and public clouds. And the next act? "Multi-Cloud," through partnerships with the superpowers of technology: Amazon, Microsoft, Google, and IBM, each a superpower on its own but stronger together. These partnerships make VMware the tech superpower for cloud, mobility, AI/ML, and edge/IoT.
Pat spoke about how, through cloud, you can rent cores by the hour, and how with a swipe of an AMEX the industry has been transformed. Through mobility, VMware has been able to reach over half the humans on earth, although the most impoverished have not yet been reached, and VMware is committed to reaching them too. That is a lofty commitment, but VMware wants to drive change not just within technology but throughout global communities, helping to elevate and change the lives of those in need. Through AI/ML, VMware is bridging healthcare and designer treatments, and through partnerships with Mercy Ships, which runs its infrastructure on VMware, it is bringing change to global communities that otherwise would not have access to some of the care we take for granted.
There were a lot of new announcements out of this first general session. Pat talked about how the partnership with Amazon continues to evolve, making VMware Cloud Foundation available to partners to build cloud infrastructures.
Andy Jassy, CEO of Amazon Web Services, joined Pat on stage to further discuss the relationship with VMware. He mentioned that the VMC on AWS offering is powerful in that it allows customers to utilize technology they are familiar with, along with all its benefits. The offering is growing at an astonishing rate, doubling every quarter.
The number one use case for the VMC on AWS offering is migrating on-prem applications to the cloud; he gave an example of how MIT migrated over 3,000 applications in a very short time. Disaster recovery, for companies like BRINKS, is the next use case.
The offering continues to expand into other regions across the globe, with Sydney, Australia being the latest announced. The number one request from customers is to have VMC on AWS available in all AWS regions and in GovCloud, and VMware is committed to accomplishing this goal by the end of 2019.
A three-node option for VMC on AWS, along with vSAN utilizing Amazon EBS storage, was also announced. This will reduce the initial cost of entry into the hybrid cloud and is in tech preview now. Another announcement, accompanied by a demonstration, was around bulk migrations.
Amazon also announced Amazon Relational Database Service (RDS) on VMware for on-prem deployments. VMware and Amazon are ramping up their joint offerings with NSX and Direct Connect, enterprise application and license migrations, Kubernetes, and much more.
VMware announced Project Dimension, which delivers VMware cloud simplicity to the data center and the edge. Project Dimension will extend VMware Cloud to deliver SDDC infrastructure and hardware as a service to on-premises locations. Because this will be a service, VMware can take care of managing the infrastructure, troubleshooting issues, and performing patching and maintenance. That in turn means customers can focus on differentiating their business by building innovative applications rather than spending time on day-to-day infrastructure management.
Another announcement out of VMworld was around the CloudHealth Technologies acquisition. CloudHealth provides VMware with a crucial multi-cloud management platform that works across AWS, Microsoft Azure and Google Cloud Platform, giving customers a way to manage cloud cost, usage, security and performance from a single interface.
This VMworld was packed full of great product enhancements, further partner integrations, and many more announcements, clearly demonstrating that VMware remains a disruptive innovator in the industry: enhancements in Workspace ONE, the announcement of Concord, an open-source blockchain project that promises a more efficient approach to processing smart contracts based on distributed ledgers, Dell provisioning services for Workspace ONE, and the integration of AppDefense with vSphere Platinum.
There is no shortage of exciting things for everyone to dive into at this VMworld, and I look forward to the rest of the days to come.
Back in October of 2016, VMware announced vSphere 6.5, which introduced a lot of changes to their flagship hypervisor; you can see an earlier blog I wrote about that here. Now it is that time again: the announcement of vSphere 6.7 came with a lot of new features, and I will go over each of them in this blog. Let's take a look at these new features:
vSphere Client (HTML-5) is about 95% feature complete
vCenter Appliance Improvements
Improved vCenter Backup Management
ESXi Single Reboot Upgrades
ESXi Quick Boot
4K Native Drive Support
Max Virtual Disks increase from 60 to 256
Max ESXi number of Devices from 512 to 1024
Max ESXi paths to Devices from 2048 to 4096
Support for RDMA
vSphere Persistent Memory
DRS initial placement improvements
Let's quickly discuss migration paths. The new version supports upgrades and migrations from vSphere 6.0 or 6.5 only and the current supported migration paths to version 6.7 are as follows:
vSphere 6.0 to 6.7
vSphere 6.5 to 6.7
vSphere 5.5 to 6.7: NOT supported, and as we know, support for 5.5 ends in September of 2018.
If your environment is running version 5.5, to successfully get to 6.7 you must first upgrade to version 6.0 and then to 6.7.
Before upgrading the vCenter Server in a mixed environment where vCenter Server 6.0 or 6.5 manages ESXi 5.5 hosts, you have to upgrade the hosts to at least version 6.0.
vSphere Client (HTML-5)
This is the long-awaited update that everyone has been waiting to be 100% complete; unfortunately, VMware is only about 95% feature complete. I have personally been using it in my home lab for the past 12 months and am very pleased with how it has turned out. Performance has improved, and it provides a more intuitive look and feel. The client now has the Platform Services Controller integrated for easier management. In vSphere 6.5, VMware published a list of the functionalities not yet supported in the vSphere Client; hopefully the company will do the same for vSphere 6.7.
vCenter Appliance Improvements
I like the new vSphere Appliance Management Interface (VAMI) a lot, and since it is functionally equivalent to the Windows-based vCenter Server, it would take a lot to convince me to use the Windows-based version instead.
The VAMI interface has been improved with new features and tabs focused on monitoring and troubleshooting. These changes in the monitoring tab are very useful along with the services tab. Now, on the monitoring tab you can see the disk partitions and available space so you can immediately see when a particular disk is running out of space and its utilization. You can also restart a particular service in the “Services” tab.
The update section has also been improved to provide more flexible patching and update options, allowing you to stage, or stage and install, a patch or update from the VAMI. The changes include more information about what is included in each patch or update, as well as its type, severity, and whether a reboot is required.
All of these new features bring better visibility to CPU, memory, network, database utilization, patching & updates, and are great improvements and resources for administrators.
Improved vCenter Backup Management
File-based backup, introduced back in vSphere 6.5, has been improved in vSphere 6.7 with a native scheduler included in the UI, along with a retention option. Scheduling was a notable gap when the feature was first introduced, leaving administrators to write scripts to make backups recurring.
Now, in the Appliance Management UI, you can simply create a backup schedule, and file-based restore comes with a browser that displays all your backups, simplifying the restore process.
ESXi Single Reboot Upgrades
vSphere upgrades can now be completed with a single reboot. With server reboots typically taking 10 to 15 minutes each, the lost time adds up. vSphere 6.7 also allows a "quick boot," which reloads ESXi without restarting the hardware because it only restarts the kernel. This feature is only available on platforms and drivers on the Quick Boot whitelist, which is currently quite limited.
ESXi Quick Boot
The Quick Boot feature allows a system to reboot in less than two minutes because it does not re-initialize the physical server BIOS, and it applies not just to reboots but to upgrades and updates too. You can create a second ESXi memory image and have it updated when rebooting by simply switching over. However, Quick Boot is only supported on certain systems and does not work on systems with ESXi Secure Boot enabled.
Note that by default, Quick Boot is enabled if the system supports it.
4K Native Drive Support
There is not a lot to write about here other than that vSphere, and vSAN, now support the larger 4K native drives if you want to use them. There is a nice FAQ on 512e and 4K native drives for VMware vSphere and vSAN (2091600) that I recommend taking a look at.
Configuration Maximums Changes

Persistent Memory - NVDIMM controllers per VM - 1
Persistent Memory - Non-volatile memory per virtual machine - 1024GB
Storage Virtual Adapters and Devices - Virtual SCSI targets per virtual SCSI adapter - 64
Storage Virtual Adapters and Devices - Virtual SCSI targets per virtual machine - 256
Networking Virtual Devices - Virtual RDMA Adapters per Virtual Machine - 1
Fault Tolerance maximums - Virtual CPUs per virtual machine - 8
Fault Tolerance maximums - RAM per FT VM - 128GB
Host CPU maximums - Logical CPUs per host - 768
ESXi Host Persistent Memory maximums - Maximum non-volatile memory per host - 1TB
ESXi Host Memory maximums - Maximum RAM per host - 16TB
Fibre Channel - Number of total paths on a server - 4096
Fibre Channel - LUNs per host - 1024
Common VMFS - Volumes per host - 1024
iSCSI Physical - LUNs per server - 1024
iSCSI Physical - Number of total paths on a server - 4096
Virtual Volumes - Number of PEs per host - 512