Cloud space optimization
Rapid change is the new normal as organizations embrace the cloud.
As enterprises migrate to the cloud, data management has emerged as a crucial consideration. With many options available, enterprises need to know exactly how they want to structure and store their data. Above all, establish your criteria first so that you can effectively evaluate which kind of storage to use. Before choosing a storage type, define a clear data strategy that identifies which workloads will be migrated to the cloud.
Many enterprises are turning to cloud storage because it is viewed as inexpensive, but you need to make sure that application performance is not compromised. Cloud storage can serve a wide variety of workloads, such as archiving and disaster recovery, and different storage tiers are offered depending on the workload.
The choice of storage type plays a key role in the decision process and is determined by the storage pattern involved, as follows:
- Data in motion: Queued data must be persisted for its time to live (TTL) and replicated for resilience.
- Data lake: Large volumes of data with staging, transformation, harmonization, and scale-out compute for specific workloads.
- Analytical data store: Storage geared toward analytical data access, combined with dedicated workloads to meet SLAs.
- Operational data store (ODS): Transactional data storage with frequent updates and deletions.
- NoSQL: Optimized storage for varied data types, such as XML documents and content stores.
- Scale-out storage: Storage that scales out for workloads with large data volumes over a limited timeframe, then scales back down to a steady state once the output is persisted.
- Perspective/business functions: Data partitioned and stored in different physical forms, each designed for an optimal access path.
- Archive: Data that has passed the retention period for its data topic within the system.
- In-memory compute: For near real-time access to data.
- Cache: For SLA-driven access to components/data.
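To make the pattern list concrete, here is a minimal rule-based sketch that maps a few workload attributes to one of the patterns above. The `Workload` fields, the `suggest_pattern` function, the rule order, and the "Data lake" default are all hypothetical choices for illustration; the sketch covers only a subset of the patterns and is not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical attributes chosen for illustration only.
    past_retention: bool = False    # older than its data topic's retention period
    queued: bool = False            # in transit between producers and consumers
    near_real_time: bool = False    # needs near real-time access
    frequent_updates: bool = False  # transactional, frequent updates/deletions
    varied_types: bool = False      # e.g. XML documents, content
    analytical: bool = False        # analytical access with SLAs


def suggest_pattern(w: Workload) -> str:
    """Return the first matching pattern; the rule order is an assumption."""
    if w.past_retention:
        return "Archive"
    if w.queued:
        return "Data in motion"
    if w.near_real_time:
        return "In-memory compute"
    if w.frequent_updates:
        return "Operational data store (ODS)"
    if w.varied_types:
        return "NoSQL"
    if w.analytical:
        return "Analytical data store"
    return "Data lake"  # assumed default for large, general-purpose volumes


print(suggest_pattern(Workload(frequent_updates=True)))  # Operational data store (ODS)
```

Checking retention first means an old transactional record is routed to the archive rather than the ODS; a real methodology would likely weigh several attributes at once instead of stopping at the first match.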
Cloud storage optimization: Five key considerations
A standard methodology needs to be created to associate storage with a particular processing type. Here are five key considerations to determine your optimal cloud storage:
- Separate out repeatable architecture/design patterns and types of processing that might need different storage handling (see Figure 1).
- Understand the client's key criteria for processing, data access, and application performance, such as availability, durability, and scalability.
- Rationalize the criteria and recommend a storage type for each pattern.
- Design for federated data access, as you will most likely end up accessing data across on-premises and cloud platforms for certain capabilities.
- Build a recommended matrix of storage types by cloud provider, mapped to the given criteria (see Figures 2 and 3).
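One common way to turn a criteria matrix into a recommendation is weighted scoring: weight each criterion by the client's priorities, score each candidate per criterion, and rank by the weighted sum. The sketch below assumes that technique; the weights, the 1-5 scores, and the generic `candidate_a`/`candidate_b`/`candidate_c` names are placeholders, not measurements of any real provider or storage service.

```python
# Hypothetical weights reflecting one client's priorities (they sum to 1.0).
WEIGHTS = {"availability": 0.4, "durability": 0.4, "scalability": 0.2}

# Placeholder 1-5 scores for generic candidates; not data about any provider.
CANDIDATES = {
    "candidate_a": {"availability": 4, "durability": 5, "scalability": 5},
    "candidate_b": {"availability": 4, "durability": 4, "scalability": 3},
    "candidate_c": {"availability": 4, "durability": 4, "scalability": 4},
}


def rank(candidates: dict[str, dict[str, int]],
         weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (candidate, weighted score) pairs, best first."""
    scores = {
        name: sum(weights[criterion] * marks[criterion] for criterion in weights)
        for name, marks in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for name, score in rank(CANDIDATES, WEIGHTS):
    print(f"{name}: {score:.2f}")
```

Changing `WEIGHTS` to reflect a different client (for example, favoring scalability for a scale-out workload) reorders the recommendation without touching the scores themselves.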
Data storage is no longer just a warehousing issue; the new world of data everywhere implies the ability to find, access, and use that data efficiently. Much of that data lives in the cloud, so you need to make sure data storage is optimized and doesn’t become a weak link in your cloud platform. Storage techniques and software tools can help you optimize data and databases, and manage virtualized data storage through the software layer. This article explores what storage optimization tools can do for you.
Given the vast amounts of data created daily throughout the world, it is not surprising that businesses are seeking more efficient and cost-effective ways to store data. When cloud computing emerged, the world needed far more storage at a low price point, and the cost of storage media declined. But two newer trends, big data analysis and the Internet of Things, threaten to offset the savings from cheaper storage media, simply because their massive data-handling requirements can be overwhelming.
Features of cloud-optimized storage
There are three major methods for optimizing storage for a cloud system: optimizing the data, optimizing the database, and implementing software-defined storage.
Data optimization is probably the most significant recent software innovation in storage because you can save more information in a smaller physical space. The three tools of data optimization are deduplication, compression, and thin provisioning.
With deduplication (also known as single-instance storage), you save space by eliminating duplicate copies of repeating data. In this process, you analyze data to identify and store unique chunks of data (byte patterns). As analysis continues, other chunks are compared to the stored copies; when a match occurs, the redundant chunk is replaced by a reference that points to the stored version. In typical data storage, the same byte pattern can occur thousands of times (depending on the chunk size). The smaller the storage block size, the greater the ability to dedupe data. Standard file-compression tools tend to identify short substrings inside a single file; deduplication looks for large sections (even entire files) across large volumes of data. Cloud file-sharing applications are generally excellent targets for deduplication. You can also apply deduplication to network data transfers to reduce the number of bytes you send.
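The chunk-and-reference process described above can be sketched in a few lines. This minimal version assumes fixed-size chunks and SHA-256 digests as chunk identifiers; production systems often use variable, content-defined chunking instead, and the `dedupe`/`rehydrate` names are hypothetical.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4096):
    """Store each unique fixed-size chunk once; return (store, refs)."""
    store: dict[str, bytes] = {}  # SHA-256 digest -> unique chunk bytes
    refs: list[str] = []          # ordered digests that rebuild the data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # first occurrence: keep the bytes
        refs.append(digest)        # every occurrence: keep only a reference
    return store, refs


def rehydrate(store: dict[str, bytes], refs: list[str]) -> bytes:
    """Follow the references to reconstruct the original bytes."""
    return b"".join(store[d] for d in refs)


# Usage: three identical 4 KiB chunks followed by one different chunk.
data = b"A" * 4096 * 3 + b"B" * 4096
store, refs = dedupe(data)
print(len(refs), len(store))  # 4 2
```

Four chunks are referenced but only two are stored. The sketch treats equal digests as equal chunks, which is a reasonable assumption for SHA-256; some systems additionally byte-compare on a digest match.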
Data compression (also known as intelligent compression) is a familiar term that simply means encoding information in fewer bits than the original representation. There are two types of compression: lossy and lossless. Lossy compression identifies and removes information deemed unnecessary, so the original cannot be reconstructed exactly. Lossless compression identifies and removes statistical redundancy, so the original data can be restored bit for bit. Most audio and image files are common examples of file-level compression.
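The sketch below illustrates lossless compression using Python's standard `zlib` module (DEFLATE). The helper name `compress_roundtrip` and the choice of compression level 6 are assumptions for illustration; the point is that decompression returns exactly the original bytes.

```python
import zlib


def compress_roundtrip(data: bytes, level: int = 6) -> tuple[int, int]:
    """Compress losslessly with zlib and verify the round trip."""
    packed = zlib.compress(data, level)
    if zlib.decompress(packed) != data:
        raise ValueError("lossless round trip failed")
    return len(data), len(packed)


# Usage: highly repetitive text carries a lot of statistical redundancy.
original_size, packed_size = compress_roundtrip(b"cloud storage " * 1000)
print(original_size, packed_size)
```

Repetitive input like this shrinks dramatically, while already-compressed data (such as most audio and image files) typically gains little from a second pass.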
Thin provisioning (also known as sparse volumes) is a core concept for virtualization, but it is not a data-reduction method; it is a model of resource provisioning. Thin provisioning gives the appearance that you have more physical resources than are actually available. Space can be easily allocated to servers on a just-enough and just-in-time basis. The mechanism applies to large-scale disk-storage systems, SANs, and storage virtualization systems.
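A small in-memory model shows how thin provisioning allocates space just in time. The `StoragePool` and `ThinVolume` classes and the 4096-byte block size are hypothetical; real implementations live in storage arrays, SANs, or hypervisors, not application code.

```python
BLOCK_SIZE = 4096  # bytes per block; illustrative value


class StoragePool:
    """Shared physical capacity, counted in blocks."""

    def __init__(self, physical_blocks: int):
        self.physical_blocks = physical_blocks
        self.used = 0

    def allocate(self, n: int = 1) -> None:
        # Fails only when writes actually exceed physical capacity.
        if self.used + n > self.physical_blocks:
            raise RuntimeError("storage pool exhausted")
        self.used += n


class ThinVolume:
    """Advertises a logical size but consumes pool space only on first write."""

    def __init__(self, logical_blocks: int, pool: StoragePool):
        self.logical_blocks = logical_blocks
        self.pool = pool
        self.blocks: dict[int, bytes] = {}  # only written blocks are stored

    def _check(self, index: int) -> None:
        if not 0 <= index < self.logical_blocks:
            raise IndexError("block index outside the logical volume")

    def write(self, index: int, data: bytes) -> None:
        self._check(index)
        if index not in self.blocks:
            self.pool.allocate(1)  # just-in-time physical allocation
        self.blocks[index] = data

    def read(self, index: int) -> bytes:
        self._check(index)
        # Unwritten blocks read back as zeros without using physical space.
        return self.blocks.get(index, bytes(BLOCK_SIZE))


# Usage: three 1,000-block volumes share a 100-block pool (30x overcommitted).
pool = StoragePool(physical_blocks=100)
volumes = [ThinVolume(logical_blocks=1000, pool=pool) for _ in range(3)]
for i in range(10):
    volumes[0].write(i, b"x" * BLOCK_SIZE)
print(sum(v.logical_blocks for v in volumes), pool.used)  # 3000 10
```

The servers see 3,000 blocks while only 10 are physically consumed; the trade-off is that the pool can run out if actual writes outgrow physical capacity, so thin-provisioned pools need monitoring.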
Although sharding might also be considered a data optimization option, it is probably better defined as a database optimization method. Sharding, a form of horizontal data scaling, is designed to meet the demands of data growth and access: it distributes data records across multiple machines once the database grows large enough that read/write throughput becomes unacceptable. To support further data growth, you add more machines.
Software-defined storage (SDS) is an approach in which storage software manages policy-based provisioning and other storage tasks independently of the underlying hardware. SDS typically includes some form of storage virtualization that separates storage software from the hardware. It is best defined by some of its more common characteristics:
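A common way to decide which machine holds a record is to hash its key. The minimal sketch below assumes hash-modulo routing, with Python dicts standing in for separate database servers; `ShardedStore` is a hypothetical name, not a real library class.

```python
import hashlib


class ShardedStore:
    """Routes each record to one of several shards by hashing its key."""

    def __init__(self, num_shards: int):
        # Each dict stands in for a separate database server.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_index(self, key: str) -> int:
        # hashlib gives a hash that is stable across processes; the built-in
        # hash() on str is randomized per process and would misroute keys.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self._shard_index(key)][key] = value

    def get(self, key: str):
        return self.shards[self._shard_index(key)].get(key)


# Usage: spread 1,000 user records across 4 shards.
store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", i)
print(store.get("user:42"), [len(s) for s in store.shards])
```

One caveat with simple modulo routing: changing the shard count remaps most keys, so "just adding machines" means moving data. Schemes such as consistent hashing are often used to limit that movement.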