Why Compellent proposes fewer disks

Dimitris,

I’m sorry to see blog posts like this one.

I like your blog and read it often, but this time I think your calculations are wrong and the result isn’t a fair one.

As you admit yourself, you probably don’t know Compellent’s technology very well, so let me help explain how it can achieve similar results with fewer disks.

I should say up front that I don’t know this particular deal or customer environment, but I’m a Compellent reseller and I configure these systems every day.

First of all, I size on the 99th percentile, not the 95th (Compellent recommends this). Second, I use 177 IOPS per 15K rpm disk, not 220. We use internal Compellent documents for sizing, to be sure the configurations are right. Compellent is very, very, very picky (sometimes too picky) about customer data collection (IOPS, space, latencies, number of servers involved, type of workload and so on), so what you are writing seems very strange to me.
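To make the 95th versus 99th percentile distinction concrete, here is a minimal Python sketch using the nearest-rank method. The sample values and the function name `percentile_nearest_rank` are illustrative assumptions; this is not Compellent's sizing tool and not real customer data.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile (integer pct) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct% of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical workload: 100 samples, mostly steady with a few short bursts.
iops_samples = [1200] * 96 + [2500] * 3 + [3000] * 1

p95 = percentile_nearest_rank(iops_samples, 95)
p99 = percentile_nearest_rank(iops_samples, 99)
print(p95, p99)
```

With these made-up samples the 95th percentile still sits in the steady load, while the 99th percentile lands in the bursts, which is why the choice of percentile changes the IOPS target a configuration has to meet.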

Unlike NetApp, Compellent has Fast Track, Data Progression and dynamic block cache. These features work together to deliver more performance with fewer resources than competitors, and sometimes awesome numbers!

Follow my calculations: 11 active disks (I assume a 12-disk tray means 1 spare) deliver nearly 2000 raw IOPS, but:

Fast Track

It is the ability to write data on the fastest portion of the disk (the outer tracks, about 20% of the disk space) to obtain more IOPS and lower latency (NetApp doesn’t have this feature).

If you can read and write data on the outer tracks you gain, in practice, something like 15-30% better performance and very low latencies from your disks (the advantage depends on the number of servers and the dispersion of writes).

The new total is 2000 + 20% = 2400 IOPS. ;-)
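The arithmetic above can be sketched in a few lines of Python. The 177 IOPS per disk and the 20% boost are the figures used in this post; the function name `estimated_iops` is hypothetical, and the boost is an example value, not a measurement.

```python
def estimated_iops(active_disks, iops_per_disk, boost_fraction=0.0):
    """Return (raw, boosted) IOPS for a group of identical disks."""
    raw = active_disks * iops_per_disk
    boosted = raw * (1 + boost_fraction)
    return raw, boosted

# 11 active 15K disks at 177 IOPS each, with the 20% example boost.
raw, boosted = estimated_iops(11, 177, 0.20)
print(raw, round(boosted, 1))

# The 15-30% range quoted for Fast Track, applied to the same raw figure.
low = estimated_iops(11, 177, 0.15)[1]
high = estimated_iops(11, 177, 0.30)[1]
print(round(low), round(high))
```

Without rounding the raw figure up to 2000, the same 20% gives roughly 2336 IOPS rather than 2400.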

DataProgression

It is automated tiered storage (it’s not important what you or your company think about it; if it is well implemented, it gives very useful advantages). Thanks to Fast Track, Data Progression also works on a single tier of disks!

So you can write data in RAID 10 on the fastest tracks, and it will then be migrated according to policies/profiles to RAID 5/6 on other portions of the disks.

With NetApp you need to choose the RAID type in advance (RAID 4 or DP to save space, or RAID 10 for better performance, but not all on the same volume!).

We don’t add more IOPS here, but there are no constraints and no performance loss tied to the use of a particular RAID level!

Dynamic Block Cache

Compellent’s cache is small (512 MB for mirrored writes and 3GB per controller for reads) but it is very flexible and auto-tuned! (BTW, I don’t know the NetApp proposal for the deal you describe, but if it is a FAS 2040 we are talking about similar amounts of cache.)

You can choose the cache behavior for each LUN, and the block size is dynamic from 2KB to 256KB, which means the controller allocates exactly the space needed rather than fixed-size blocks. In the real world this gives you more space and flexibility than fixed blocks for managing I/O peaks.

I hope I have made clear why Compellent proposes fewer disks than competitors in most cases.

Doubts, questions or comments? Let me know.

ciao,

Enrico


  • Keith Aasen

    Enrico,
    Great explanation of the technology, I learned lots. A few questions though. First Dimitris stated that the IO requirement was near 3000 IOPs and you said with Fast Track you would be getting 20% boost putting you near 2400 IOPs (you rounded up to 2000 raw I noticed). So the config still seems low for the customer requirement.

    Second, although I can see the benefit of using the outside of the disk how much data can you fit in that outer edge? If you get 20% boost on the first track that would have to reduce as you move in. Therefore, much like cache you would have to assume that the customers working set would fit into this “Fast Track” area. Is that correct?

    As I said, it is a great explanation and I learned lots but it still seems to be cutting things dangerously close.

    [Full disclosure - I am a NetApp Employee]

    Keith

  • http://www.recoverymonkey.org Dimitris Krekoukias

    Enrico, I posted my reply here: http://bit.ly/bcEHFh many questions for you :)

    D

  • Enrico Signoretti

    Sorry for the delay, but I saw that Fabio has already answered some of your doubts.

    My point was only about Compellent’s ability to propose fewer disks than competitors, thanks to features that optimize access to the disks. As I wrote in my post, I don’t know enough details about the customer, the environment, the prerequisites, etc., so I will only continue to explain Compellent’s capabilities as best I can.

    Fast Track and Data Progression are tightly integrated and achieve awesome results when coupled. I assumed a performance improvement of about 20% as an example, but it can be somewhat more depending on the environment (I personally saw 7000 sustained IOPS from an Oracle ERP server on 15*15K+15*10K disks!!!).

    BTW, I will try to clarify my points:

    Compellent uses 450GB SAS disks (418GB really).
    In your case we will have 418*11=4598GB of usable space and, of course, 919GB of Fast Track (20% of the total). Each block on Compellent has its own RAID level, so the real net space in the Fast Track area varies from 919/2 for RAID 1 to 817GB for RAID 5-9; this space is reserved and freed dynamically. I don’t think this customer has more than 800GB of active data in a system capable of delivering only 4TB!

    To recap in a very coarse way: each write is performed on Fast Track, then blocks are migrated to other RAID levels and/or portions of the disks in the background at low priority (this operation doesn’t impact front-end performance). It is all managed by system policies and volume profiles.

    Finally, it is hard but not impossible to achieve 3000 sustained IOPS on a well-configured Compellent with 11 active disks. ;-)

    ciao,
    Enrico

  • Andrew Miller

    Ok….I’m quite intrigued here and maybe stuck on the same point as Keith. If I follow the #’s in your post right…

    11 SAS disks * 177 IOPs/disk = 1947 IOPs * 20% boost for FastTrack = 2336.4 IOPs.

    That is still 600+ IOPs away from the 3000 sustained number that you cited….what am I missing here? Even if cache is in play, there still have to be enough steady IOPs to keep the cache flushed during a sustained 3000 IOPs workload.

  • Enrico Signoretti

    Sorry, I tried for a short answer but I failed. :-(

    Correct, 418GB is the raw usable space of a disk when converted to base 2. This system has 11 active disks, so you have 11*418 = 4598GB of usable space before RAID.
    Compellent RAID protection is quite different from other vendors’: you don’t need to define RAID groups; you create LUNs directly from the usable space!

    In the LUN creation wizard you associate a profile with the LUN; the profile defines the behavior of the LUN. There are standard (out of the box) and custom (user-created) profiles. The profile defines all the RAID levels and tiers for the LUN. You can also modify profiles and LUN properties on the fly.
    For example, you may define a DB LUN to be positioned on RAID 10 Fast, RAID 10 standard and RAID 5 standard, with snapshots on RAID 5 standard.
    So you write and access each new and hot block on “fast tracked” RAID 10 while old and less used blocks are automatically migrated to RAID 5 on less valuable tracks, saving space while keeping speed!!!
    To say whether the 11 disks can provide 3000 IOPS I need to know more about the applications, servers and data involved but, I repeat, it is hard but not impossible.

    Now, back to the usable net space after RAID: it is variable, ranging from half the raw space (4598/2=2299GB) to 4092GB for RAID 5 made with 9-disk stripes. The real space depends on how the profiles are organized and on data activity… but if we hypothesize something like 20% RAID 10 and the rest RAID 5, you obtain about 450GB net of RAID 10 and 3270GB net of RAID 5. The total net usable space in this case is 450+3270=3720GB.
    This can be changed with a couple of clicks; the system immediately starts working with the new profile, freeing or allocating space as needed! :-)
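The capacity estimate in this comment can be reproduced with a short Python sketch. It assumes RAID 10 keeps half of the raw space and RAID 5-9 keeps 8/9 of it; the function name `net_capacity_gb` and the fixed 20/80 split are illustrative assumptions, since the real split moves with profiles and data activity.

```python
def net_capacity_gb(raw_gb, raid10_share, data_blocks=8, parity_blocks=1):
    """Split raw space between RAID 10 and RAID 5 and return the net GB of each."""
    raid10_raw = raw_gb * raid10_share
    raid5_raw = raw_gb - raid10_raw
    raid10_net = raid10_raw / 2  # mirroring keeps half of the raw space
    raid5_net = raid5_raw * data_blocks / (data_blocks + parity_blocks)
    return raid10_net, raid5_net

raw_gb = 11 * 418  # 11 active disks of 418GB each
raid10_net, raid5_net = net_capacity_gb(raw_gb, 0.20)
print(round(raid10_net), round(raid5_net), round(raid10_net + raid5_net))
```

The results land within a few GB of the rounded figures quoted in the comment (about 450GB, 3270GB and 3720GB).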

    With this architecture you need to drastically change how you think about storage metrics, and it is very important to analyze the environment in depth.
    In a well-sized and well-configured Compellent system you write and read heavily accessed blocks on the faster portion of the disks (or on SSDs) and you don’t pay a write penalty for RAID parity calculations, which gives awesome performance and space savings.

    It’s not useful to ask how many spindles I use to write in RAID 5, because I don’t need to write in RAID 5 and pay a penalty. I write in RAID 10 and the system then moves the blocks to RAID 5.

    ciao,
    Enrico

  • http://www.recoverymonkey.org Dimitris Krekoukias

    Hi Enrico, more questions back at http://bit.ly/bcEHFh – I still don’t get it, like Andrew and Keith. Sorry :)

  • Enrico Signoretti

    Dimitris,
    I apologize for the delay. I’m very busy with customers these days and having some problems connecting, but I see that Fabio (my colleague at Cinetica) is helping you to better understand Compellent’s technology.

    He posted a comment on your post some hours ago.

    BTW, I would like to add that Fabio didn’t consider Fast Track in his comment. This data placement optimization feature can add a variable (sometimes fairly important) performance improvement to your disks.

    ciao,
    Enrico

  • Enrico Signoretti

    This comment is an answer to Dimitris’s latest questions; you can find them here: http://recoverymonkey.net/wordpress/2010/03/09/more-tales-from-the-field-sizing-best-practices-does-compellent-follow-them/

    Techmute, Dimitris,

    You don’t know the technology involved, and it is very difficult for me to talk theory and compare it with a real-world case when we (Fabio and I) don’t know the customer’s environment!
    I invite you to share with us all the information about this particular case, at least:
    - a complete sampling (28800 samples in 24h, one every 3 seconds, for all the servers involved)
    - a full picture of the SAN (servers, applications, data)
    - customer requests

    or stop confusing your readers!

    BTW, did you propose something like 23 disks for that customer???
    That is very far from 15, and we could go on for years discussing the undersized 12-disk Compellent configuration or the oversized 23-disk NetApp one!
    My first question would be: why are you proposing 23 disks when 16 (15+1 spare) are more than enough?

    Then, please, let me know whether you want to talk about the theoretical 3000 IOPS or the real-world solution comprising a 12-disk system and the related optimizations.

    Anyway, I suggest you spend some time on the Compellent site (http://www.compellent.com) looking at the documentation and videos to learn a little more about the architecture of the product ( http://www.compellent.com/Products/Architecture.aspx ). You will probably find some interesting reading about how the system works; it will widen your horizons.

    Well, back to the theory.
    Fabio wrote about 3000 IOPS with 15*15K disks for the I/O pattern you suggested, without any specific optimization, and I add that Compellent may do more with its data placement optimization feature (Fast Track). He never mentioned a 12K block size because it’s not important: we obtain similar results with 4K, 8K and 16K blocks.

    The main difference between Compellent and others is the fully virtualized concept of the LUN, thanks to block metadata: each LUN is spread across every disk in the system (SSD, FC, SAS, SATA) and across different RAID levels.
    There is no staging area (I apologize if I pushed my simplifications to the limit), and all data movements (RAID level and tier) are done when the system load allows.
    All of this makes sense only if the system is properly configured.

    ciao,
    Enrico

  • Fabio Rapposelli

    This is a reply to the thread at: http://bit.ly/bcEHFh

    Dimitris,

    I’m not sure you understand correctly how Compellent works, so let me explain the basics:

    First things first, the concepts:

    Tiers are organized like this:

    1st Tier – Fastest disks
    2nd Tier – Medium disks
    3rd Tier – Slowest disks

    They’re chosen dynamically based on the drives available in the system, and every tier is subdivided into different RAID levels and different tracks.

    To clear things up, let’s imagine that we have a system configured with:

    15 active drives FC 450GB 15K
    15 active drives SATA 1TB 7.2K

    With this kind of system we would have:

    Tier 1 – 15K drives
    - Raid 10 Fast Tracks
    - Raid 5-9 Fast Tracks
    - Raid 10 Standard Tracks
    - Raid 5-9 Standard Tracks

    Tier 3 – 7.2K drives
    - Raid 10 Fast Tracks
    - Raid 5-9 Fast Tracks
    - Raid 10 Standard Tracks
    - Raid 5-9 Standard Tracks

    I’m using an “old” example, since right now you can also have Raid 6 in the mix, but let’s leave that alone for now. Also, Raid 5-9 means a RAID stripe made of 8 data blocks and 1 parity block (you can also have Raid 5-5 if you want).

    So in this system my data can live on those 8 “tiers”. When I create a new Volume (LUN) I can choose where to put my active data and my snapshot data just by selecting a “Storage Profile”. For example, let’s use a best practice:

    The “Recommended (All Tiers)” default profile is the most used and it’s configured like this:

    Write data on Tier1:Raid 10 and Tier2:Raid 10
    Snapshot data on Tier1:Raid 5-9, Tier2:Raid 5-9, Tier3:Raid 5-9

    I usually create another custom Storage Profile called “Archival Data (R5-9)” that’s configured like this:

    Write data on Tier3:Raid 5-9
    Snapshot Data on Tier3:Raid 5-9

    To accommodate the need for low-impact stuff.
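The two profiles above can be modelled as plain data, which makes it easier to see where a write lands on a given system. The Python sketch below is purely illustrative: `StorageProfile`, `first_available` and the rule "use the first listed tier that exists in the system" are assumptions made for the example, not Compellent's actual implementation or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageProfile:
    """Hypothetical model of a storage profile: ordered (tier, RAID) targets."""
    name: str
    write_targets: tuple
    snapshot_targets: tuple

RECOMMENDED = StorageProfile(
    name="Recommended (All Tiers)",
    write_targets=((1, "RAID 10"), (2, "RAID 10")),
    snapshot_targets=((1, "RAID 5-9"), (2, "RAID 5-9"), (3, "RAID 5-9")),
)

ARCHIVAL = StorageProfile(
    name="Archival Data (R5-9)",
    write_targets=((3, "RAID 5-9"),),
    snapshot_targets=((3, "RAID 5-9"),),
)

def first_available(targets, tiers_present):
    """Assumed rule: return the first target whose tier exists in the system."""
    for tier, raid in targets:
        if tier in tiers_present:
            return tier, raid
    raise LookupError("no target tier is present in this system")

# The example system has 15K drives (Tier 1) and 7.2K drives (Tier 3) only.
tiers_present = {1, 3}
recommended_write = first_available(RECOMMENDED.write_targets, tiers_present)
archival_write = first_available(ARCHIVAL.write_targets, tiers_present)
print(recommended_write, archival_write)
```

Under these assumptions, new writes with the recommended profile go to Tier 1 in RAID 10, while the archival profile writes straight to Tier 3 in RAID 5-9.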

    With that in mind, let’s see the data flow for those two profiles:

    —- Profile “Recommended (All Tiers)”

    - Data flows from the server HBAs to the Compellent front-end ports.
    - The data is staged in write cache (512MB per controller) and replicated to the other controller.
    - The data is written to disk on Tier 1, Fast Tracks, in Raid 10.

    The data is now on stable storage.

    —- Profile “Archival Data (R5-9)”

    - Data flows from the server HBAs to the Compellent front-end ports.
    - The data is staged in write cache (512MB per controller) and replicated to the other controller.
    - The data is cached until a full stripe write is possible.
    - The data is written to disk on Tier 3, Fast Tracks, in Raid 5-9.

    The data is now on stable storage.

    And that’s the write data flow; there’s no such thing as continuous destaging from Raid 10 to Raid 5.

    After the data is on disk there are several ways for it to progress to the lower tiers. If you just leave the Volume (LUN) alone, the system keeps statistics for every “chunk” of data (512k, 2MB or 4MB) and progresses it slowly to the lower tiers (the algorithm is based on an access threshold over a 14-day window); that’s not a “quick & dirty” destage to Raid 5.

    Instead, if you take snapshots, either scheduled or manual, the data progresses more quickly. As an example, let’s imagine we have a 100GB volume (LUN):

    It’s 12.00 pm

    - We write 10GB of data (let’s call it Data Blob 1), it’s in Raid 10 on Tier 1, Fast Tracks, and it’s consuming 20GB of raw space (100% raid overhead due to Raid10)
    - take a snapshot of the volume

    It’s 3.00 pm

    - We write another 5GB of data (call it Data Blob 2), that’s also in Raid 10 on Tier 1, Fast Tracks, 10GB of Raw Space (grand total of 30GB)

    Usually at 7.00 pm (that’s the default, but it’s configurable) the Data Progression job kicks in. Because Compellent’s snapshots are pointer-based (just like NetApp’s), what we’ve called “Data Blob 1” becomes a read-only blob of data and progresses immediately to Raid 5-9 to get some free space back.
    So the next morning we find the system in this situation:

    Data Blob 1, which is now part of the Snapshot Data, is written in Raid 5-9 on Tier 1, Fast Tracks, consuming almost 12 GB of Raw Disk Space
    Data Blob 2, which is part of the Active Data, is still written in Raid 10 on Tier 1, Fast Tracks, still consuming 10 GB of Raw Disk Space

    We just got 8GB of Raw Disk Space back without sacrificing performance, because “Data Blob 1” is considered read-only: we don’t have to write to Raid 5, just read from it, which eliminates the RAID write penalty.
    If we write more data on that volume, we’ll still write to “Data Blob 2”, which is still active data, still in Raid 10.
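The raw-space bookkeeping in this example can be checked with a small Python sketch. It assumes a raw-to-net factor of 2 for Raid 10 and 9/8 for Raid 5-9 (8 data blocks plus 1 parity block); the helper `raw_gb` and the dictionary keys are hypothetical names used only here.

```python
# Raw space consumed per GB of data, by RAID layout (assumed factors).
RAW_FACTOR = {
    "raid10": 2.0,      # every block is mirrored
    "raid5-9": 9 / 8,   # 8 data blocks + 1 parity block per stripe
}

def raw_gb(net_gb, raid):
    """Raw disk space needed to store net_gb of data in the given layout."""
    return net_gb * RAW_FACTOR[raid]

# Before Data Progression: both blobs sit in Raid 10.
before = raw_gb(10, "raid10") + raw_gb(5, "raid10")

# After Data Progression: the snapshotted blob has moved to Raid 5-9.
after = raw_gb(10, "raid5-9") + raw_gb(5, "raid10")

reclaimed = before - after
print(before, after, reclaimed)
```

Under these assumptions Data Blob 1 occupies 11.25GB raw in Raid 5-9 (the "almost 12 GB" above), and the space reclaimed is 8.75GB, in line with the roughly 8GB quoted.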

    That said, I’m not implying that the configuration you got from Compellent was right for the customer’s workload. As I have stated many times, I trust only *MY* configurations, built from a known set of information that *I* analyze. But I still hope this helps you wrap your head around how Compellent works in detail, and why you simply cannot take the optimizations off the drawing board.

    HTH,
    Fabio

  • http://www.ubervu.com/conversations/www.cinetica.it/2010/03/10/why-compellent-proposes-fewer-disks/ uberVU – social comments

    Social comments and analytics for this post…

    This post was mentioned on Twitter by esignoretti: RT @dikrek: New blog: Sizing best practices and does Compellent follow them? http://is.gd/a5nUz // my comment here: http://bit.ly/d5f3GV...

  • http://siliconangle.com/emcworld2010/2010/05/05/odds-and-ends-tiering-and-performance-planning/ #EMCworld 2010 » Odds and Ends – Tiering, and Performance Planning

    [...] the 'basics' explanation of performance-sizing small arrays – it also has some good information on Compellant's architecture.  My comments regarding this vendor comparison are attached to that post.  As always, [...]

  • http://www.cinetica.it/2010/07/19/archeology-or-technology/ Archeology or Technology? « Cinetica

    [...] this. I wrote a lot on FT in the past and you can find articles in my blog by clicking here, here, here or searching “fast track” for a full list. You can also find a nice and [...]

  • John

    I realize I’m doing some serious necromancy on this old thread, but felt I had to put in a comment anyway.

    1. Thanks to everyone commenting in the thread, it’s all great information.
    2. It’s obvious that the original complaint, that the proposed 12 disk CML system was undersized for the customer’s workload, was a fair one.
    3. I had the following experience, personally. I put out a request for bids earlier this year, and received responses from several vendors, among them Compellent, and Netapp. We were looking to replace an aged EVA 4000 that was hosting an Oracle RAC database with an array that could handle both the DBs, as well as our VMware load. We were using a newer EVA4400 for the VMware, some other DBs, and mail servers at the time. The Compellent engineers did in fact request very granular (3 second interval data for approximately 6 weeks of total time) performance numbers. Because this was the most granular data requested by any of the vendors, I offered to provide it to all of them. One vendor in particular (not Netapp) said their tools wouldn’t accept data that granular, and requested 30 second interval instead, so I provided that to them. 

    Before I had even completed the capture of the performance data, I received a Netapp quote for a V-series array with 12 15k spindles and PAM2. They said that I could just use this to virtualize the existing EVA4400 for VMware, and use the spindles/PAM in the Netapp to host the Oracle RAC workload. While this was definitely an interesting proposal, and getting Netapp features on the EVA would have been nice, this wasn’t what I asked for to begin with. The data collection was completed and data provided to all interested vendors (the Netapp reseller stood by their original quote and said they didn’t need the perf data). From Compellent, and other vendors, I received a bid for 50-80 Tier 1 (15K) spindles, along with 10-30 Tier 3 (7.2K) spindles, along with well laid out documentation of the information gleaned from the perf data and what specific data points had driven their recommendations. I reiterated to the Netapp vendor that I was looking for a standalone array to host the required systems, and if they could reconfigure their bid to meet that requirement, the value add of being able to virtualize the EVA 4400 would be taken into account. They continued to stand by the assertion that 12 15K spindles and PAM would be as good as 50+ spindles from anyone else, and even had the gall to say, “If it’s not, you can just hang another disk shelf off the controller.” When this vendor was told we had selected someone else, they responded by trying to play the “First Hit is Free” game by knocking nearly 45% off their initial bid.

    The point is, bad recommendations sometimes get made regardless of whose storage is being sold. We ended up with a Compellent system with 63 (60 active, 3 spare) 15K spindles and 24 (22 active, 2 spare) 7.2K spindles. Everything works as advertised, and I’m getting outstanding performance. The system is easy to use and manage, and the service is top notch. Could I have been just as happy with a correctly architected NetApp system? I’m sure I could. It’s not a question of “Can you provide the performance I need?” with any of the vendors I worked with, and there were a lot of the features of the Netapp that I really liked. Unfortunately, sometimes the vendor/reseller gets in the way of providing a system that meets the customer’s requirements.

    Just because, I want to say that I am not employed by, nor am I receiving any compensation from, any vendor mentioned or not mentioned. I wrote this completely to provide my own experience. 

    HTH,
    John

  • Enrico Signoretti

    John,
    Thank you for your comment.
    I agree with you: every vendor can provide the performance you need if they make the effort to understand exactly what your needs are!
    Ciao,
    Enrico
