Oct 05, 2009

Improving on existing WPF data virtualization solutions

In a previous post, I compared two data virtualization techniques implemented by Paul McClean and Vincent Van Den Berghe for WPF. In this post, I describe a solution that combines some of the best features of both. I started with Paul’s solution, eliminated a few limitations, and incorporated some of Vincent’s ideas.

Selection

In Paul’s solution, a “collection reset” event is used to notify the UI each time a new page is loaded from the database. As a side effect, this notification unintentionally causes a ListBox to lose track of the selected item. This makes it impossible for a user to scroll through a long list using the down-arrow key; every time a new page is loaded, the ListBox selection jumps back to the beginning of the list. The troublesome code can be found in the following methods of AsyncVirtualizingCollection:

    private void LoadPageCompleted(object args)
    {
        int pageIndex = (int)((object[]) args)[0];
        IList<T> page = (IList<T>)((object[])args)[1];

        PopulatePage(pageIndex, page);
        IsLoading = false;
        FireCollectionReset();
    }

    private void FireCollectionReset()
    {
        NotifyCollectionChangedEventArgs e = new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset);
        OnCollectionChanged(e);
    }

One possible solution to this problem is to provide more fine-grained add and remove notifications for the new items, instead of a collection reset. Implementing this is not as straightforward as it seems, though, because of the combination of the following two behaviors: 1) When WPF receives a collection change notification for a newly added item, ListCollectionView accesses that item using the collection’s indexer, even if the item is not visible in the UI. 2) When an item is accessed, Paul’s caching heuristics load its page into memory, as well as the previous or next page depending on whether the item belongs to the first or second half of its page.

With this information, you can probably guess what happens when we provide fine-grained collection notifications. When a page is loaded, we notify WPF that a few items were added to the collection. ListCollectionView then accesses those items one by one, triggering a load of the subsequent page, which in turn notifies WPF that more items were added to the collection, which causes ListCollectionView to access each one of those, and so on. Eventually, the whole collection gets loaded, which is exactly what we’re trying to avoid.

We could use fine-grained notifications with either of two possible approaches: 1) change the caching heuristics so that neighboring pages are no longer loaded; or 2) implement our own view (as a replacement for ListCollectionView) that doesn’t call the collection indexer to access each newly added item. Either approach would fix the problem, but they would not fix another related problem. If we happen to select an item that is not yet loaded, selection would be lost when the item finishes loading. This would happen because selection is tracked based on the actual data item – not its index within the ListBox. If I press the down-arrow key until I select an item that hasn’t yet been loaded, when its data item changes at load time (from null to the actual data), the ListBox’s selected item is no longer referring to that same item.

This train of thought made it clear that Vincent’s technique of wrapping each data item could solve all these selection issues. When using data wrappers, the data items associated with each ListBoxItem don’t ever change – they’re the wrappers themselves. The data wrappers are not replaced when data loads, and therefore WPF doesn’t lose track of the selected item. What changes is the data within the wrapper, which means we can now raise property change notifications to update the UI, instead of collection change notifications. This is good news, since property change notifications are very fine-grained, and they work across threads.

My data wrapper class is called DataWrapper and among other properties it contains a reference to the actual data.

    public class DataWrapper<T> : INotifyPropertyChanged where T : class
    {
        private T data;
        …
        public T Data
        {
            get { return this.data; }
            internal set
            {
                this.data = value;
                this.OnPropertyChanged("Data");
                …
            }
        }
        …
    }
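To make the selection argument concrete, here is a minimal sketch of why a stable wrapper keeps a reference-based selection alive. It is not the code from the download: SimpleWrapper<T> and WrapperSelectionSketch are hypothetical names, and a plain List<T> stands in for the ListBox’s items.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical stand-in for DataWrapper<T>, reduced to the Data property.
public class SimpleWrapper<T> : INotifyPropertyChanged where T : class
{
    private T data;

    public event PropertyChangedEventHandler PropertyChanged;

    public T Data
    {
        get { return this.data; }
        set
        {
            this.data = value;
            PropertyChangedEventHandler handler = this.PropertyChanged;
            if (handler != null)
            {
                handler(this, new PropertyChangedEventArgs("Data"));
            }
        }
    }
}

public static class WrapperSelectionSketch
{
    // Simulates selecting an item that is not loaded yet, then loading it.
    // Returns true if the selected reference is still found at the same
    // index once the data has arrived.
    public static bool SelectionSurvivesLoad()
    {
        var items = new List<SimpleWrapper<string>>();
        for (int i = 0; i < 3; i++)
        {
            items.Add(new SimpleWrapper<string>());
        }

        SimpleWrapper<string> selected = items[1]; // selected while Data is still null
        items[1].Data = "Customer 1";              // the data arrives later

        return items.IndexOf(selected) == 1 && selected.Data == "Customer 1";
    }
}
```

Without wrappers, the list entry itself would change from null to the loaded object, so a selection that tracks the entry would no longer match anything in the list.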

Adding wrappers required some changes in the collection code base. In Paul’s code, requesting a page would add a new entry in the page dictionary with value null, and populating a page would set that value to the actual page:

    protected virtual void RequestPage(int pageIndex)
    {
        if (!_pages.ContainsKey(pageIndex))
        {
            _pages.Add(pageIndex, null);
            _pageTouchTimes.Add(pageIndex, DateTime.Now);
            LoadPage(pageIndex);
        }
        else
        {
            _pageTouchTimes[pageIndex] = DateTime.Now;
        }
    }

    protected virtual void PopulatePage(int pageIndex, IList<T> page)
    {
        if ( _pages.ContainsKey(pageIndex) )
            _pages[pageIndex] = page;
    }

To support data wrappers, I changed the code so that a request for a new page results in the immediate creation of a page full of empty data wrappers. This page is added to the dictionary right away. Later, when the actual data gets loaded, populating the page just fills in the data part of the wrappers.

    protected virtual void RequestPage(int pageIndex)
    {
        if (!_pages.ContainsKey(pageIndex))
        {
            int pageLength = Math.Min(this.PageSize, this.Count - pageIndex * this.PageSize);
            DataPage<T> page = new DataPage<T>(pageIndex * this.PageSize, pageLength);
            _pages.Add(pageIndex, page);
            LoadPage(pageIndex, pageLength);
        }
        else
        {
            _pages[pageIndex].TouchTime = DateTime.Now;
        }
    }

    protected virtual void PopulatePage(int pageIndex, IList<T> dataItems)
    {
        DataPage<T> page;
        if (_pages.TryGetValue(pageIndex, out page))
        {
            page.Populate(dataItems);
        }
    }
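The DataPage<T> class itself isn’t shown above. As a rough sketch of what such a page could look like, assuming it only needs to create its wrappers up front, remember when it was last touched, and fill the wrappers in later, consider the following. PageSketch<T> and WrapperStub<T> are hypothetical names, and the real DataPage<T> in the download may differ.

```csharp
using System;
using System.Collections.Generic;

// Minimal placeholder for DataWrapper<T>; only the parts this sketch needs.
public class WrapperStub<T> where T : class
{
    public WrapperStub(int index)
    {
        this.Index = index;
    }

    public int Index { get; private set; }
    public T Data { get; set; }
}

// Hypothetical shape of a page of wrappers.
public class PageSketch<T> where T : class
{
    public PageSketch(int firstIndex, int pageLength)
    {
        // Create empty wrappers right away, one per item in the page.
        this.Items = new List<WrapperStub<T>>(pageLength);
        for (int i = 0; i < pageLength; i++)
        {
            this.Items.Add(new WrapperStub<T>(firstIndex + i));
        }
        this.TouchTime = DateTime.Now;
    }

    public IList<WrapperStub<T>> Items { get; private set; }
    public DateTime TouchTime { get; set; }

    // Fills in the data of the existing wrappers; the wrappers are kept.
    public void Populate(IList<T> newItems)
    {
        int count = Math.Min(newItems.Count, this.Items.Count);
        for (int i = 0; i < count; i++)
        {
            this.Items[i].Data = newItems[i];
        }
    }
}
```

The point of this design is that Populate never replaces an entry in Items, so anything holding a reference to a wrapper (such as the ListBox selection) keeps pointing at a live item.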

Contains and IndexOf

In Paul’s data virtualization solution, VirtualizingCollection does not include an implementation for the Contains and IndexOf methods:

    public bool Contains(T item)
    {
        return false;
    }

    public int IndexOf(T item)
    {
        return -1;
    }

As a result, the CurrentItem property of WPF’s collection view doesn’t track the current item correctly, and therefore we can’t implement the Master-Detail scenario by simply binding both a ListBox and a ContentControl to the collection. There are other scenarios equally affected by this.

Providing an implementation for these methods was relatively straightforward:

    public bool Contains(DataWrapper<T> item)
    {
        foreach (DataPage<T> page in _pages.Values)
        {
            if (page.Items.Contains(item))
            {
                return true;
            }
        }
        return false;
    }

    public int IndexOf(DataWrapper<T> item)
    {
        foreach (KeyValuePair<int, DataPage<T>> keyValuePair in _pages)
        {
            int indexWithinPage = keyValuePair.Value.Items.IndexOf(item);
            if (indexWithinPage != -1)
            {
                return PageSize * keyValuePair.Key + indexWithinPage;
            }
        }
        return -1;
    }
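The index arithmetic used by IndexOf and RequestPage can be checked in isolation. The helper below is a hypothetical sketch (PageMath is not part of the project), and it assumes the indexer maps an item index to a page by integer division.

```csharp
using System;

// Hypothetical helper that isolates the paging arithmetic.
public static class PageMath
{
    // Page that holds the item at the given collection index.
    public static int PageIndexOf(int itemIndex, int pageSize)
    {
        return itemIndex / pageSize;
    }

    // Position of the item within its page.
    public static int OffsetInPage(int itemIndex, int pageSize)
    {
        return itemIndex % pageSize;
    }

    // Inverse mapping, matching the expression used in IndexOf.
    public static int ItemIndex(int pageIndex, int offset, int pageSize)
    {
        return pageSize * pageIndex + offset;
    }

    // Length of a page, matching RequestPage; the last page may be shorter.
    public static int PageLength(int pageIndex, int pageSize, int count)
    {
        return Math.Min(pageSize, count - pageIndex * pageSize);
    }
}
```

For example, with a page size of 100 and a count of 1050, item 1026 lives on page 10 at offset 26, and page 10 holds only 50 items.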

Currency

Providing an implementation for Contains and IndexOf enabled currency (CurrentItem), but there were still some corner cases that didn’t work correctly. For example, if I selected an item and then scrolled it off-screen, WPF knew not to virtualize the UI element for that item, but the data was still being virtualized. This also caused problems with currency.

I needed a way to prevent an item from virtualizing its data if its UI was still available. Adding data wrappers had the fortunate side effect of making the fix for this problem easier. I know that a data wrapper is being used if someone is listening to its property change event. So I was able to add an IsInUse property to the data wrapper with the following implementation:

    public class DataWrapper<T> : INotifyPropertyChanged where T : class
    {
        …
        public event PropertyChangedEventHandler PropertyChanged;
        public bool IsInUse
        {
            get { return this.PropertyChanged != null; }
        }
    }

Similarly, I added a property that determines whether a page has at least one item in use:

    public class DataPage<T> where T : class
    {
        …
        public bool IsInUse
        {
            get { return this.Items.Any(wrapper => wrapper.IsInUse); }
        }
    }

Then I used that property to avoid cleaning up pages that are still in use, within VirtualizingCollection:

    public void CleanUpPages()
    {
        int[] keys = _pages.Keys.ToArray();
        foreach (int key in keys)
        {
            // Page 0 is a special case, since WPF's ItemsControl accesses the first item frequently.
            if (key != 0 && (DateTime.Now - _pages[key].TouchTime).TotalMilliseconds > PageTimeout)
            {
                bool removePage = true;
                DataPage<T> page;
                if (_pages.TryGetValue(key, out page))
                {
                    removePage = !page.IsInUse;
                }

                if (removePage)
                {
                    _pages.Remove(key);
                }
            }
        }
    }

IsInitializing + IsLoading

Paul’s AsyncVirtualizingCollection has an “IsLoading” property that is set to true when the collection is either counting its items or fetching a page. This is useful so that we can provide visual feedback when we’re querying data from the database. On the other hand, it’s a bit limiting to have only one property indicating that work is in progress. We don’t want to prevent the user from interacting with other items in the ListBox just because scrolling causes a few items to start downloading. Ideally, we would get more fine-grained status information.

To solve this problem, I added an “IsInitializing” property that is true when we’re fetching the count, and changed “IsLoading” slightly to inform us when the collection is fetching a new page. The “IsInitializing” property is defined at the collection level, and the “IsLoading” property is defined in the data wrapper.
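As an illustration of the collection-level half of this split, here is a simplified, synchronous sketch of an IsInitializing flag that brackets the count fetch. CountStateSketch and the fetchCount delegate are hypothetical; the real AsyncVirtualizingCollection fetches the count asynchronously.

```csharp
using System;
using System.ComponentModel;

// Hypothetical, synchronous sketch of a collection-level IsInitializing flag.
public class CountStateSketch : INotifyPropertyChanged
{
    private bool isInitializing;
    private int count;

    public event PropertyChangedEventHandler PropertyChanged;

    public bool IsInitializing
    {
        get { return this.isInitializing; }
        private set
        {
            if (this.isInitializing != value)
            {
                this.isInitializing = value;
                this.OnPropertyChanged("IsInitializing");
            }
        }
    }

    public int Count
    {
        get { return this.count; }
    }

    // fetchCount stands in for the items provider's count query.
    public void LoadCount(Func<int> fetchCount)
    {
        this.IsInitializing = true;
        try
        {
            this.count = fetchCount();
            this.OnPropertyChanged("Count");
        }
        finally
        {
            // Reset the flag even if the query throws.
            this.IsInitializing = false;
        }
    }

    private void OnPropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = this.PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}
```

A DataTrigger bound to IsInitializing would then be active only while the count query is running, independently of any per-item loading state.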

When the collection count is being fetched (that is, when IsInitializing is true), I display a message in the middle of the empty ListBox and switch to the “Wait” cursor, making it obvious that it’s not yet ready for user interaction:

    <ControlTemplate TargetType="{x:Type ListView}">
        <Grid>
            <theme:ListBoxChrome Name="Bd" … >
                …
            </theme:ListBoxChrome>
            <Grid Background="White" Opacity="0.5" Name="InitializingGrid" Visibility="Collapsed">
                <TextBlock Text="Initializing…" HorizontalAlignment="Center" VerticalAlignment="Center"/>
            </Grid>
        </Grid>
        <ControlTemplate.Triggers>
            …
            <DataTrigger Binding="{Binding Path=IsInitializing}" Value="True">
                <Setter Property="Cursor" Value="Wait" TargetName="InitializingGrid"/>
                <Setter Property="Visibility" Value="Visible" TargetName="InitializingGrid"/>
            </DataTrigger>
        </ControlTemplate.Triggers>
    </ControlTemplate>

When an item is being fetched from the database (that is, when IsLoading is true), I display a message and “Wait” cursor just within the corresponding ListViewItem:

    <ControlTemplate TargetType="{x:Type ListViewItem}">
        …
        <Grid>
            …
            <GridViewRowPresenter … />
            <StackPanel Name="Loading" Orientation="Horizontal" Grid.RowSpan="2" Visibility="Collapsed">
                <TextBlock Text="Loading item " />
                <TextBlock Text="{Binding ItemNumber}" />
                <TextBlock Text="…" />
            </StackPanel>
        </Grid>
        …
        <ControlTemplate.Triggers>
            <DataTrigger Binding="{Binding IsLoading}" Value="True">
                <Setter TargetName="Loading" Property="Visibility" Value="Visible"/>
                <Setter Property="Cursor" Value="Wait" />
                …
            </DataTrigger>
        </ControlTemplate.Triggers>
    </ControlTemplate>

This is the point where I would normally hand the problem over to a visual designer or an interaction designer. Now that we can get fine-grained information about which data items are loading and which are available, a designer could come up with a variety of ways to display this information to the user.

Still missing…

Paul’s solution assumes the collection is read-only, and my code doesn’t really fix that limitation. Although my AsyncVirtualizingCollection will notice if its count has changed when fetching a new page of data, it won’t notice at any other time. If you’re successful at extending this solution to support dynamic collection changes, I’d love to hear from you!

You can download the source for this project.

26 Comments
  1. Per Bernhardsson

    I might have a solution to your missing parts. I simply changed the FetchCount method in the ItemsProvider into a Property called Count and added property changed notification which your VirtualizingCollection handles by calling the LoadCount method which in turn tells the listbox to update and show the new element as well as also sending the same notification. There’s still that Collection Reset you mentioned though, I haven’t looked into that yet.

    If you’d like the full source (it’s a bit of a problem to add it here) I’ll gladly mail it to you.

    • Bea

      Hi Per,

      Thanks for posting, and for sending me the code through email (if anyone else is interested and Per is ok with it, I’ll upload it to my server and add a link to it here).

      Your solution is nice if all changes to the database are driven from your UI. However, ideally I would like to detect when the database changes behind the scenes. I’m very close to having a solution for this – if I have a chance to polish it, I’ll write a blog post about it.

      Thank you so much for sharing your solution!
      Bea

      • Nour

        Great article about data virtualization. I’m playing with the code to bind the collection to a data grid.
        I would be very interested to have a copy of Per’s code changes to make the collection editable.

        Thanks

        • Bea

          Hi Nour,

          With Per’s permission, you can download his code here. Thanks Per!

          Bea

  2. Pierre-Luc Ledoux

    Your solution looks a lot like the virtualizing collection view I coded for Xceed’s DataGridControl for WPF last year.

    A really neat thing about Xceed’s is the built-in support of the fetching of data through the IQueryable interface. Grouping, sorting and filtering is therefore automatically supported without the need to handle any events, which is really cool when using LINQ to SQL.

    It also supports editing of data and is usable on its own outside of our DataGrid control.

    • Bea

      Hi Pierre – very cool! Thanks for posting.

  3. dan

    I pondered over how to do this for some time but could not get the result I wanted.. many thanks for this solution!

  4. AliRam

    Very nice solution. absolutely sugar! ;) Thank you!

  5. Arnoud

    I really love your series on data-virtualization!
    Hope at some point to see these WPF things in Silverlight!
    I’m going to try this approach for our large datasets…

    • Bea

      Hi,

      Yes, you could virtualize data using that solution too.
      One disadvantage of that solution is the fact that it does a query for each item in the collection, instead of doing a single query to fetch a chunk of items. If you’re using a SQL database, that will have a high impact on the speed of your application.

      Thanks for posting. I like seeing other solutions to the problem.

      Bea

  6. Smitha R Mangalore

    How would I add items to the collection manually after it has been loaded using this approach?

    Scenario: Assume I am populating mail messages into the ListView using the solution provided above by customizing the DemoCustomerProvider, and it works perfectly fine. Now, if the user gets a new email, I need to add this mail message to the page (only if the user is viewing the first page, as mails are ordered by timestamp). I have a mechanism to know when a new message has arrived, but I need to add it to the page manually. What would be the best way to achieve this?

    Hope I am clear in my explanation….

    Regards
    Smitha

    • Bea

      Hi Smitha,

      You may want to take a look at Per’s changes to my code. Look at the first comment in this post for a description of the problem it solves and a link to the source. I don’t yet have a solution that detects changes in the database and automatically displays them in the UI, but Per has a solution that should work if all database changes are driven from the UI.

      Thanks,
      Bea

  7. Smitha R Mangalore

    Thanks Bea,

    I had looked into Per’s solution. In that, when we add an item, it notifies all other components about the change and the count is updated accordingly. But in my situation, I want to add the customer item received through a WCF callback, so I don’t want to go back to my database to fetch the record again.

    That is, whenever there is a new item, I want to insert that item at the top of the list. Similarly, I should be able to select an item from the list and delete it. And I want to merge this functionality with the data virtualization solution you have posted, as it is simply superb.

    I am guessing that if public void Insert(int index, DataWrapper item) in VirtualizingCollection were implemented and used, I might find the solution, but I am not able to do so. Any clue?

    Regards,
    Smitha

    • Bea

      Hi Smitha,

      I’m not sure I fully understand your scenario. Are you saying that you want to add an item without updating count? Data virtualization relies on count heavily – scrolling wouldn’t function properly if count is not up to date.

      Bea

  8. Smitha

    Hi Bea,

    We have MailMessage box developed using Data Virtualization in WPF. Data actually comes from WCF service wrapped over LINQ to SQL. This works perfectly fine. Now when new message arrives, Client will be notified using WCF Callback function with NewMessage as its argument. So now this new MailMessage arrived should be added to the MailMessage (WPF ListView) which implements Data Virtualization. Now I am stuck at this point. Count can be incremented, but new message which arrived from different thread (WCF Callback thread) needs to be added to this MailMessage ListView. (like ListView.Items.Add( MailMessageItem msg))..something like this (New Message should be added on top of the MailMessage)… I am not able to achieve this.

    Similarly, It should be possible to select some item and remove it from the list.

    In Per’s solution, when Add Item is clicked the count is incremented, and Remove Item decrements the count. And it will fetch only that many records. But in my situation, when I click Add, I want to add an item to the ListBox manually at the top of the list, and I should be able to pick an item to delete.

    In the worst case, if this is not possible with data virtualization, I will need to fetch the records all over again.
    Hope I am clear in my explanation.

    Regards,
    Smitha


    • Bea

      Hi Smitha,

      If I understand correctly, it seems like you’re looking for changes in the database to be reflected in the UI. This is the exact scenario I mention in the “Still missing…” section of the blog post. I have some ideas, but haven’t yet had a chance to try them.

      I hope you’re able to find a solution!

      Bea

  9. Andreas Pircher

    Hi Bea,

    First I want to thank you for your great posts. They helped me more than once.

    I am using your solution in two projects, and today I noticed that they are suddenly loading the whole collection instead of only the first one or two pages. So I downloaded your code again to see what I did wrong, and surprise: your code also loaded the whole collection. To make the story a bit shorter: it seems to happen only on Windows 7. On Vista everything is OK, but on Windows 7 some code in PresentationFramework.dll accesses the enumerator, which accesses the indexer for every item, which leads to loading the whole collection. This full collection loading is also triggered by just moving the mouse inside the ListView.

    Has anyone else seen this problem on Windows 7?
    Any solutions/workarounds available?

    Andreas

    • Andreas Pircher

      A little clarification and follow-up:

      The problem occurs to me on my new Vaio Z Laptop with Win 7 Ultimate 64bit.
      Now I tested it on Win 7 RC (Build 7100) 32bit and Win Vista Ultimate 32bit and the problem did not occur on those systems.

      Here is an abbreviated part of the call stack:

      …VirtualizingCollection.RequestPage(int pageIndex = 8)
      …VirtualizingCollection.this[int].get(int index = 1126)
      …VirtualizingCollection.GetEnumerator()
      PresentationFramework.dll!…PlaceholderAwareEnumerator.MoveNext() + 0x73 Bytes
      PresentationFramework.dll!…ItemsControlAutomationPeer.GetChildrenCore() + 0x2f2 Bytes
      PresentationFramework.dll…ListViewAutomationPeer.GetChildrenCore() + 0x2e Bytes
      PresentationCore.dll!…AutomationPeer.EnsureChildren() + 0x1c Bytes
      PresentationCore.dll!…AutomationPeer.UpdateChildren() + 0x52 Bytes
      PresentationCore.dll!…AutomationPeer.UpdateSubtree() + 0x2d8 Bytes

      PresentationCore.dll!…ContextLayoutManager.fireAutomationEvents() + 0xb9 Bytes
      PresentationCore.dll!…ContextLayoutManager.UpdateLayout() + 0x812 Bytes

      I do not know much about UI Automation but it seems as if that could be the problem here…
      So it does not seem to be a general Windows 7 problem of this implementation.

      Andreas

      • Bea

        Hi Andreas,

        I’ve heard that issue from other people that have tested this and other data virtualization solutions on multiple platforms. No one has been able to narrow down a generic pattern of hardware, software or combination of hardware and software that makes it happen.

        WPF should work the same in all platforms. If this is not the case, I would assume that there’s some issue in the underlying WPF code of the callstack you show here.

        If you find out more about it, please post that information here, as it may help other people that have the same issue.

        Thank you,
        Bea

        • Ron

          Is there any update on this? I am having the same issue on Windows 7 64-bit – GetEnumerator is being called which in turn causes all items in the list to be accessed.

          • Bea

            Hi Ron,

            I haven’t heard any news about this issue. It’s really unfortunate that this limitation exists.
            If you hear anything about it, please post it here or let me know (I’ve heard about this issue from others, but I’ve never been able to repro it on any of my machines).

            Thanks,
            Bea

  10. Michael Rosen

    This is fantastic. Thanks so much for this.

    It may be worth mentioning that an “AsyncVirtualizingCollection” is NOT a collection of Customer but rather a collection of DataWrapper. If you try to bind to a property (say, ‘Id’) of Customer, you’ll get a binding failure:

    BindingExpression path error: ‘Id’ property not found on ‘object’ ”DataWrapper`1′ (HashCode=67090807)’. BindingExpression:Path=Id; DataItem=’DataWrapper`1′ (HashCode=67090807); target element is ‘TextBlock’ (Name=”); target property is ‘Text’ (type ‘String’)

    Instead you’ll need to bind to Data.Id. In the example code, this is buried down in the GridViewRowPresenter of the Style applied to ListViewItems. It is likely to be missed by the casual observer, especially since the GridViewColumns seem to suggest that we can Bind as DisplayMemberBinding=”{Binding Id}”

    • Bea

      Hi Michael,

      Thanks for your comment, which I’m sure will be useful to others. Yes, it’s definitely worth restating that we’re dealing with wrapped items, and that that adds an extra level of indirection when binding to the data.

      Thanks!
      Bea

  11. Tobi

    Hi
    is anyone of you able to provide a working example for a virtualizing treeview using virtualizing collections?
    I never got that working; all the subitems of one level are loaded at once (I am using the data virtualization method with pages, and all pages are loaded at once, unlike what happens when I use a ListView).

    Tobi

    • Bea

      Hi Tobi,

      Are you already doing the partial data virtualization in this post, where data for each level is loaded only when that level is expanded? This doesn’t help if you have a particular level with lots of data, it helps more if you have lots of levels with a bit of data.

      I haven’t tried to apply the data virtualization technique explained here to TreeView. I would assume that applying this solution to each TreeViewItem (which derives from ItemsControl) would virtualize the data for that level. It sounds simple, but it may be a bit tricky to implement.

      If someone has implemented a solution for that, please leave a note here!

      Bea

Comments are closed.