http://blog.chinaunix.net/uid-26859697-id-4999985.html

According to the git history, CMA (Contiguous Memory Allocator) was introduced in kernel 3.5, implemented by Samsung engineers, to improve the allocation of large contiguous memory blocks under the DMA mapping framework.

The implementation reserves memory at boot time, marks it with the MIGRATE_CMA migratetype, and then hands it back to the system. During normal allocation, only movable pages (for example page cache not currently used for DMA mapping) may be taken from CMA-managed memory. When a large contiguous block is requested via dma_alloc_from_contiguous(), those movable pages are migrated out of the CMA area to free up enough contiguous space for the request. As a result, a large contiguous allocation can succeed at any time, as long as the system has enough free memory overall.
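The idea can be illustrated with a toy userspace model (everything here is made up for illustration; the real kernel tracks this with per-pageblock migratetypes and the buddy allocator): movable users may borrow pages from the CMA region, and a contiguous request migrates them out before pinning the range.

```c
#include <assert.h>

/* Toy model: 8-page memory, the first CMA_SIZE pages are the "CMA
 * region". Movable users may borrow CMA pages; a contiguous request
 * migrates them out. A sketch of the idea only, not kernel code. */
enum owner { FREE, MOVABLE, PINNED };
#define NPAGES   8
#define CMA_SIZE 4
static enum owner mem[NPAGES];

/* Movable allocation: take the first free page, CMA region included,
 * so reserved memory does not sit idle. */
static int alloc_movable(void)
{
    for (int i = 0; i < NPAGES; i++)
        if (mem[i] == FREE) { mem[i] = MOVABLE; return i; }
    return -1;
}

/* Contiguous allocation from the CMA region: migrate movable pages
 * out to free pages elsewhere, then pin the whole range [0, count). */
static int alloc_contig(int count)
{
    if (count > CMA_SIZE)
        return -1;
    for (int i = 0; i < count; i++) {
        if (mem[i] == PINNED)
            return -1;              /* unmovable page: request fails */
        if (mem[i] == MOVABLE) {    /* migrate it out of the region */
            int j;
            for (j = CMA_SIZE; j < NPAGES && mem[j] != FREE; j++)
                ;
            if (j == NPAGES)
                return -1;          /* nowhere to migrate to */
            mem[j] = MOVABLE;
        }
        mem[i] = PINNED;
    }
    return 0;
}
```

So movable allocations keep the reserved region useful, yet the contiguous range can still be reclaimed on demand.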

Let's start the analysis with initialization. In /drivers/base/dma-contiguous.c we find the init function cma_init_reserved_areas(), which is registered with the boot sequence via core_initcall().

First, the implementation of cma_init_reserved_areas():


【file:/drivers/base/dma-contiguous.c】
static int __init cma_init_reserved_areas(void)
{
    int i;

    for (i = 0; i < cma_area_count; i++) {
        int ret = cma_activate_area(&cma_areas[i]);
        if (ret)
            return ret;
    }

    return 0;
}
It simply walks the CMA area descriptors in cma_areas and calls cma_activate_area() to initialize each one. The cma_areas information is filled in during DMA setup: start_kernel()->setup_arch()->dma_contiguous_reserve() parses the cmdline, then dma_contiguous_reserve_area() reserves the memory and records it in cma_areas. We won't dig into that path here.

Continuing with the implementation of cma_activate_area():


【file:/drivers/base/dma-contiguous.c】
static int __init cma_activate_area(struct cma *cma)
{
    int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
    unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
    unsigned i = cma->count >> pageblock_order;
    struct zone *zone;

    cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);

    if (!cma->bitmap)
        return -ENOMEM;

    WARN_ON_ONCE(!pfn_valid(pfn));
    zone = page_zone(pfn_to_page(pfn));

    do {
        unsigned j;
        base_pfn = pfn;
        for (j = pageblock_nr_pages; j; --j, pfn++) {
            WARN_ON_ONCE(!pfn_valid(pfn));
            if (page_zone(pfn_to_page(pfn)) != zone)
                return -EINVAL;
        }
        init_cma_reserved_pageblock(pfn_to_page(base_pfn));
    } while (--i);

    return 0;
}
This function initializes a CMA area. It first allocates the area's bitmap with kzalloc(), then validates the area's pages in units of pageblock_nr_pages (the number of pages in a pageblock of order pageblock_order), checking that every page in each unit is valid and that all of them lie in the same zone; if a pageblock-sized unit crosses a zone boundary, -EINVAL is returned.
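The sizing arithmetic at the top of cma_activate_area() can be checked in userspace; the pageblock_order of 10 below (4 MiB pageblocks with 4 KiB pages) is an assumption for illustration, the real value is architecture-dependent:

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_LONG      (8 * sizeof(long))
#define BITS_TO_LONGS(n)   (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define PAGEBLOCK_ORDER    10   /* assumed value for illustration */

/* One bitmap bit per page, rounded up to whole longs -- mirrors
 * "BITS_TO_LONGS(cma->count) * sizeof(long)" in cma_activate_area(). */
static size_t cma_bitmap_size(unsigned long count)
{
    return BITS_TO_LONGS(count) * sizeof(long);
}

/* Number of pageblock-sized units the activation do/while loop walks. */
static unsigned long cma_nr_pageblocks(unsigned long count)
{
    return count >> PAGEBLOCK_ORDER;
}
```

For a 64 MiB area (16384 pages of 4 KiB) the loop therefore runs 16 times, one trip per pageblock.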

The function that actually initializes the pages is init_cma_reserved_pageblock():


【file:/mm/page_alloc.c】
/* Free whole pageblock and set its migration type to MIGRATE_CMA. */
void __init init_cma_reserved_pageblock(struct page *page)
{
    unsigned i = pageblock_nr_pages;
    struct page *p = page;

    do {
        __ClearPageReserved(p);
        set_page_count(p, 0);
    } while (++p, --i);

    set_pageblock_migratetype(page, MIGRATE_CMA);

    if (pageblock_order >= MAX_ORDER) {
        i = pageblock_nr_pages;
        p = page;
        do {
            set_page_refcounted(p);
            __free_pages(p, MAX_ORDER - 1);
            p += MAX_ORDER_NR_PAGES;
        } while (i -= MAX_ORDER_NR_PAGES);
    } else {
        set_page_refcounted(page);
        __free_pages(page, pageblock_order);
    }

    adjust_managed_page_count(page, pageblock_nr_pages);
}
This function first initializes the page counts with set_page_count(), then marks the pageblock as MIGRATE_CMA with set_pageblock_migratetype(), then resets the reference counts with set_page_refcounted() and releases the memory into the buddy allocator via __free_pages(); the pages end up on zone->free_area[order].free_list[MIGRATE_CMA] (where order is pageblock_order or MAX_ORDER-1). Finally, adjust_managed_page_count() updates the managed-page accounting.
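In the pageblock_order >= MAX_ORDER branch, the pageblock is handed to the buddy allocator in MAX_ORDER-1 sized chunks; the pointer stepping can be sketched as below. The orders chosen are hypothetical (this branch is only taken on configurations, e.g. with huge pageblocks, where a pageblock exceeds the largest buddy block):

```c
#include <assert.h>

/* Hypothetical configuration where a pageblock is larger than the
 * biggest buddy block: pageblock_order = 13, MAX_ORDER = 11. */
#define PAGEBLOCK_NR_PAGES (1UL << 13)
#define MAX_ORDER          11
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/* Mirrors the do/while in init_cma_reserved_pageblock(): return how
 * many __free_pages(p, MAX_ORDER - 1) calls hand the block over. */
static unsigned long count_buddy_frees(void)
{
    unsigned long i = PAGEBLOCK_NR_PAGES;
    unsigned long p = 0;        /* stand-in for the struct page pointer */
    unsigned long frees = 0;

    do {
        frees++;                /* __free_pages(p, MAX_ORDER - 1) */
        p += MAX_ORDER_NR_PAGES;
    } while (i -= MAX_ORDER_NR_PAGES);

    return frees;
}
```

With these orders, one 2^13-page pageblock becomes eight 2^10-page buddy blocks.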

That covers initialization.

CMA memory is then allocated through dma_generic_alloc_coherent().


【file:/arch/x86/kernel/pci-dma.c】
void *dma_generic_alloc_coherent(struct device *dev, size_t size,
                 dma_addr_t *dma_addr, gfp_t flag,
                 struct dma_attrs *attrs)
{
    unsigned long dma_mask;
    struct page *page;
    unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
    dma_addr_t addr;

    dma_mask = dma_alloc_coherent_mask(dev, flag);

    flag |= __GFP_ZERO;
again:
    page = NULL;
    /* CMA can be used only in the context which permits sleeping */
    if (flag & __GFP_WAIT)
        page = dma_alloc_from_contiguous(dev, count, get_order(size));
    /* fallback */
    if (!page)
        page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
    if (!page)
        return NULL;

    addr = page_to_phys(page);
    if (addr + size > dma_mask) {
        __free_pages(page, get_order(size));

        if (dma_mask < DMA_BIT_MASK(32) && !(flag & GFP_DMA)) {
            flag = (flag & ~GFP_DMA32) | GFP_DMA;
            goto again;
        }

        return NULL;
    }

    *dma_addr = addr;
    return page_address(page);
}
To obtain memory from the CMA area, the allocation flags must permit sleeping (__GFP_WAIT); the memory is then obtained through dma_alloc_from_contiguous().

The implementation of dma_alloc_from_contiguous():


【file:/drivers/base/dma-contiguous.c】
/**
 * dma_alloc_from_contiguous() - allocate pages from contiguous area
 * @dev:   Pointer to device for which the allocation is performed.
 * @count: Requested number of pages.
 * @align: Requested alignment of pages (in PAGE_SIZE order).
 *
 * This function allocates memory buffer for specified device. It uses
 * device specific contiguous memory area if available or the default
 * global one. Requires architecture specific get_dev_cma_area() helper
 * function.
 */
struct page *dma_alloc_from_contiguous(struct device *dev, int count,
                       unsigned int align)
{
    unsigned long mask, pfn, pageno, start = 0;
    struct cma *cma = dev_get_cma_area(dev);
    struct page *page = NULL;
    int ret;

    if (!cma || !cma->count)
        return NULL;

    if (align > CONFIG_CMA_ALIGNMENT)
        align = CONFIG_CMA_ALIGNMENT;

    pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
         count, align);

    if (!count)
        return NULL;

    mask = (1 << align) - 1;

    mutex_lock(&cma_mutex);

    for (;;) {
        pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
                            start, count, mask);
        if (pageno >= cma->count)
            break;

        pfn = cma->base_pfn + pageno;
        ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
        if (ret == 0) {
            bitmap_set(cma->bitmap, pageno, count);
            page = pfn_to_page(pfn);
            break;
        } else if (ret != -EBUSY) {
            break;
        }
        pr_debug("%s(): memory range at %p is busy, retrying\n",
             __func__, pfn_to_page(pfn));
        /* try again with a bit different memory target */
        start = pageno + mask + 1;
    }

    mutex_unlock(&cma_mutex);
    pr_debug("%s(): returned %p\n", __func__, page);
    return page;
}
This function gets the device's CMA area via dev_get_cma_area(), uses bitmap_find_next_zero_area() to locate a suitably sized unallocated range in the area, then calls alloc_contig_range() to try to allocate that range. On success, bitmap_set() marks the range as used in the bitmap, and pfn_to_page() converts the page frame number into the first page's struct page, which is returned.
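The search-and-retry shape of the loop can be modelled with a toy byte-per-bit map; everything below is a userspace stand-in (busy_from/busy_to simulate an alloc_contig_range() failure with -EBUSY):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_PAGES 64
static unsigned char toy_bitmap[TOY_PAGES];   /* one byte per bitmap bit */
static int busy_from = -1, busy_to = -1;      /* simulated busy window */

/* Linear stand-in for bitmap_find_next_zero_area(). */
static unsigned long find_zero_area(unsigned long start, unsigned long nr,
                                    unsigned long mask)
{
    for (unsigned long i = (start + mask) & ~mask; i + nr <= TOY_PAGES;
         i = (i + 1 + mask) & ~mask) {
        unsigned long j;
        for (j = 0; j < nr && !toy_bitmap[i + j]; j++)
            ;
        if (j == nr)
            return i;
    }
    return TOY_PAGES;                         /* "not found" */
}

/* Stand-in for alloc_contig_range(): fail with -EBUSY if the range
 * overlaps the simulated busy window. */
static int toy_alloc_contig(unsigned long pfn, unsigned long nr)
{
    if ((int)pfn < busy_to && (int)(pfn + nr) > busy_from)
        return -EBUSY;
    return 0;
}

/* The dma_alloc_from_contiguous() loop shape: search, try, and on
 * -EBUSY resume the search just past the failed position. */
static long toy_cma_alloc(unsigned long count, unsigned long align)
{
    unsigned long mask = (1UL << align) - 1;
    unsigned long start = 0;

    for (;;) {
        unsigned long pageno = find_zero_area(start, count, mask);
        if (pageno >= TOY_PAGES)
            return -1;
        if (toy_alloc_contig(pageno, count) == 0) {
            memset(&toy_bitmap[pageno], 1, count);   /* bitmap_set() */
            return (long)pageno;
        }
        start = pageno + mask + 1;            /* retry a bit further on */
    }
}
```

Note how a busy range only advances the search cursor; the bitmap itself is touched only after alloc_contig_range() succeeds, exactly as in the kernel loop.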

The implementation of bitmap_find_next_zero_area():


【file:/lib/bitmap.c】
/**
 * bitmap_find_next_zero_area - find a contiguous aligned zero area
 * @map: The address to base the search on
 * @size: The bitmap size in bits
 * @start: The bitnumber to start searching at
 * @nr: The number of zeroed bits we're looking for
 * @align_mask: Alignment mask for zero area
 *
 * The @align_mask should be one less than a power of 2; the effect is that
 * the bit offset of all zero areas this function finds is multiples of that
 * power of 2. A @align_mask of 0 means no alignment is required.
 */
unsigned long bitmap_find_next_zero_area(unsigned long *map,
                     unsigned long size,
                     unsigned long start,
                     unsigned int nr,
                     unsigned long align_mask)
{
    unsigned long index, end, i;
again:
    index = find_next_zero_bit(map, size, start);

    /* Align allocation */
    index = __ALIGN_MASK(index, align_mask);

    end = index + nr;
    if (end > size)
        return end;
    i = find_next_bit(map, end, index);
    if (i < end) {
        start = i + 1;
        goto again;
    }
    return index;
}
This function bounces between find_next_zero_bit() and find_next_bit(), searching the gaps between set bits until it finds a zero run large enough, and suitably aligned, to satisfy the allocation.
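Its control flow can be reproduced in userspace over a byte-per-bit map; find_next_zero_bit()/find_next_bit() below are simplified linear stand-ins for the kernel's word-at-a-time versions:

```c
#include <assert.h>

#define MAP_BITS 64
#define ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask))  /* __ALIGN_MASK */

static unsigned long find_next_zero_bit(const unsigned char *map,
                                        unsigned long size,
                                        unsigned long start)
{
    while (start < size && map[start])
        start++;
    return start;
}

static unsigned long find_next_bit(const unsigned char *map,
                                   unsigned long size, unsigned long start)
{
    while (start < size && !map[start])
        start++;
    return start;
}

/* Same control flow as lib/bitmap.c: find a zero bit, align it up,
 * and verify the next 'nr' bits are all zero; if a set bit intrudes,
 * restart the search just past it. Returns a value >= size if no
 * suitable area exists. */
static unsigned long find_next_zero_area(const unsigned char *map,
                                         unsigned long size,
                                         unsigned long start,
                                         unsigned int nr,
                                         unsigned long align_mask)
{
    unsigned long index, end, i;
again:
    index = find_next_zero_bit(map, size, start);
    index = ALIGN_MASK(index, align_mask);
    end = index + nr;
    if (end > size)
        return end;
    i = find_next_bit(map, end, index);
    if (i < end) {
        start = i + 1;
        goto again;
    }
    return index;
}
```

For example, with bits 1 and 5 set and a 4-bit aligned request for 4 zero bits, the search is bumped past both set bits and lands on offset 8.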

The implementation of alloc_contig_range():


【file:/mm/page_alloc.c】
/**
 * alloc_contig_range() -- tries to allocate given range of pages
 * @start: start PFN to allocate
 * @end: one-past-the-last PFN to allocate
 * @migratetype: migratetype of the underlaying pageblocks (either
 * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
 * in range must have the same migratetype and it must
 * be either of the two.
 *
 * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
 * aligned, however it's the caller's responsibility to guarantee that
 * we are the only thread that changes migrate type of pageblocks the
 * pages fall in.
 *
 * The PFN range must belong to a single zone.
 *
 * Returns zero on success or negative error code. On success all
 * pages which PFN is in [start, end) are allocated for the caller and
 * need to be freed with free_contig_range().
 */
int alloc_contig_range(unsigned long start, unsigned long end,
               unsigned migratetype)
{
    unsigned long outer_start, outer_end;
    int ret = 0, order;

    struct compact_control cc = {
        .nr_migratepages = 0,
        .order = -1,
        .zone = page_zone(pfn_to_page(start)),
        .sync = true,
        .ignore_skip_hint = true,
    };
    INIT_LIST_HEAD(&cc.migratepages);

    /*
     * What we do here is we mark all pageblocks in range as
     * MIGRATE_ISOLATE. Because pageblock and max order pages may
     * have different sizes, and due to the way page allocator
     * work, we align the range to biggest of the two pages so
     * that page allocator won't try to merge buddies from
     * different pageblocks and change MIGRATE_ISOLATE to some
     * other migration type.
     *
     * Once the pageblocks are marked as MIGRATE_ISOLATE, we
     * migrate the pages from an unaligned range (ie. pages that
     * we are interested in). This will put all the pages in
     * range back to page allocator as MIGRATE_ISOLATE.
     *
     * When this is done, we take the pages in range from page
     * allocator removing them from the buddy system. This way
     * page allocator will never consider using them.
     *
     * This lets us mark the pageblocks back as
     * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
     * aligned range but not in the unaligned, original range are
     * put back to page allocator so that buddy can use them.
     */

    ret = start_isolate_page_range(pfn_max_align_down(start),
                       pfn_max_align_up(end), migratetype,
                       false);
    if (ret)
        return ret;

    ret = __alloc_contig_migrate_range(&cc, start, end);
    if (ret)
        goto done;

    /*
     * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
     * aligned blocks that are marked as MIGRATE_ISOLATE. What's
     * more, all pages in [start, end) are free in page allocator.
     * What we are going to do is to allocate all pages from
     * [start, end) (that is remove them from page allocator).
     *
     * The only problem is that pages at the beginning and at the
     * end of interesting range may be not aligned with pages that
     * page allocator holds, ie. they can be part of higher order
     * pages. Because of this, we reserve the bigger range and
     * once this is done free the pages we are not interested in.
     *
     * We don't have to hold zone->lock here because the pages are
     * isolated thus they won't get removed from buddy.
     */

    lru_add_drain_all();
    drain_all_pages();

    order = 0;
    outer_start = start;
    while (!PageBuddy(pfn_to_page(outer_start))) {
        if (++order >= MAX_ORDER) {
            ret = -EBUSY;
            goto done;
        }
        outer_start &= ~0UL << order;
    }

    /* Make sure the range is really isolated. */
    if (test_pages_isolated(outer_start, end, false)) {
        pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
               outer_start, end);
        ret = -EBUSY;
        goto done;
    }

    /* Grab isolated pages from freelists. */
    outer_end = isolate_freepages_range(&cc, outer_start, end);
    if (!outer_end) {
        ret = -EBUSY;
        goto done;
    }

    /* Free head and tail (if any) */
    if (start != outer_start)
        free_contig_range(outer_start, start - outer_start);
    if (end != outer_end)
        free_contig_range(end, outer_end - end);

done:
    undo_isolate_page_range(pfn_max_align_down(start),
                pfn_max_align_up(end), migratetype);
    return ret;
}
This function allocates the contiguous range of pages given by page frame numbers. The range need not be pageblock- or MAX_ORDER-aligned, but the caller must guarantee single-threaded operation on the affected pageblocks, which is why dma_alloc_from_contiguous() holds a mutex around the call; in addition, the allocated range must not cross zone boundaries.
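The outward widening seen above, where isolation works on the larger of a pageblock and a MAX_ORDER block, is done by pfn_max_align_down()/pfn_max_align_up(). Their effect can be checked in userspace; the orders below are typical x86 values and are assumptions here:

```c
#include <assert.h>

/* Typical x86 values, assumed for illustration: MAX_ORDER = 11
 * (1024-page max buddy block), pageblock_order = 9 (512 pages). */
#define MAX_ORDER_NR_PAGES (1UL << 10)
#define PAGEBLOCK_NR_PAGES (1UL << 9)
#define MAX_ALIGN ((MAX_ORDER_NR_PAGES > PAGEBLOCK_NR_PAGES) ? \
                   MAX_ORDER_NR_PAGES : PAGEBLOCK_NR_PAGES)

/* Mirror pfn_max_align_down()/pfn_max_align_up() from mm/page_alloc.c:
 * widen a PFN range to the larger of the two block sizes. */
static unsigned long pfn_max_align_down(unsigned long pfn)
{
    return pfn & ~(MAX_ALIGN - 1);
}

static unsigned long pfn_max_align_up(unsigned long pfn)
{
    return (pfn + MAX_ALIGN - 1) & ~(MAX_ALIGN - 1);
}
```

So a request for PFNs [1500, 1600) isolates the aligned range [1024, 2048); the head and tail pages outside the requested range are freed back afterwards.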

To understand what it does in depth, let's look at the functions it calls, starting with start_isolate_page_range():


【file:/mm/page_isolation.c】
/**
 * start_isolate_page_range() -- make page-allocation-type of range of pages
 * to be MIGRATE_ISOLATE.
 * @start_pfn: The lower PFN of the range to be isolated.
 * @end_pfn: The upper PFN of the range to be isolated.
 * @migratetype: migrate type to set in error recovery.
 *
 * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
 * the range will never be allocated. Any free pages and pages freed in the
 * future will not be allocated again.
 *
 * start_pfn/end_pfn must be aligned to pageblock_order.
 * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
 */
int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
                 unsigned migratetype, bool skip_hwpoisoned_pages)
{
    unsigned long pfn;
    unsigned long undo_pfn;
    struct page *page;

    BUG_ON((start_pfn) & (pageblock_nr_pages - 1));
    BUG_ON((end_pfn) & (pageblock_nr_pages - 1));

    for (pfn = start_pfn;
         pfn < end_pfn;
         pfn += pageblock_nr_pages) {
        page = __first_valid_page(pfn, pageblock_nr_pages);
        if (page &&
            set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
            undo_pfn = pfn;
            goto undo;
        }
    }
    return 0;
undo:
    for (pfn = start_pfn;
         pfn < undo_pfn;
         pfn += pageblock_nr_pages)
        unset_migratetype_isolate(pfn_to_page(pfn), migratetype);

    return -EBUSY;
}
Setting pages to MIGRATE_ISOLATE means that free pages in the range will not be allocated. Note that, just like the page migration discussed earlier, the migratetype change here operates on pageblock_nr_pages-sized units.
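The isolate-then-undo pattern can be modelled with a per-pageblock migratetype array (toy code only; fail_block simulates a set_migratetype_isolate() failure):

```c
#include <assert.h>
#include <errno.h>

enum mt { MT_MOVABLE, MT_CMA, MT_ISOLATE };
#define NR_BLOCKS 8
static enum mt blocks[NR_BLOCKS];
static int fail_block = -1;       /* simulated isolation failure */

static int toy_set_isolate(int blk)
{
    if (blk == fail_block)
        return -1;                /* e.g. unmovable pages found */
    blocks[blk] = MT_ISOLATE;
    return 0;
}

/* Same shape as start_isolate_page_range(): walk pageblock by
 * pageblock; on failure, roll the already-isolated blocks back to
 * the caller-supplied migratetype and report -EBUSY. */
static int toy_isolate_range(int start, int end, enum mt undo_type)
{
    int blk, undo;

    for (blk = start; blk < end; blk++)
        if (toy_set_isolate(blk)) {
            for (undo = start; undo < blk; undo++)
                blocks[undo] = undo_type;
            return -EBUSY;
        }
    return 0;
}
```

The rollback is what keeps a partial failure from leaving stray MIGRATE_ISOLATE pageblocks behind.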

And set_migratetype_isolate(), which it calls:


【file:/mm/page_isolation.c】
int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
{
    struct zone *zone;
    unsigned long flags, pfn;
    struct memory_isolate_notify arg;
    int notifier_ret;
    int ret = -EBUSY;

    zone = page_zone(page);

    spin_lock_irqsave(&zone->lock, flags);

    pfn = page_to_pfn(page);
    arg.start_pfn = pfn;
    arg.nr_pages = pageblock_nr_pages;
    arg.pages_found = 0;

    /*
     * It may be possible to isolate a pageblock even if the
     * migratetype is not MIGRATE_MOVABLE. The memory isolation
     * notifier chain is used by balloon drivers to return the
     * number of pages in a range that are held by the balloon
     * driver to shrink memory. If all the pages are accounted for
     * by balloons, are free, or on the LRU, isolation can continue.
     * Later, for example, when memory hotplug notifier runs, these
     * pages reported as "can be isolated" should be isolated(freed)
     * by the balloon driver through the memory notifier chain.
     */
    notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
    notifier_ret = notifier_to_errno(notifier_ret);
    if (notifier_ret)
        goto out;
    /*
     * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
     * We just check MOVABLE pages.
     */
    if (!has_unmovable_pages(zone, page, arg.pages_found,
                 skip_hwpoisoned_pages))
        ret = 0;

    /*
     * immobile means "not-on-lru" pages. If immobile is larger than
     * removable-by-driver pages reported by notifier, we'll fail.
     */

out:
    if (!ret) {
        unsigned long nr_pages;
        int migratetype = get_pageblock_migratetype(page);

        set_pageblock_migratetype(page, MIGRATE_ISOLATE);
        nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);

        __mod_zone_freepage_state(zone, -nr_pages, migratetype);
    }

    spin_unlock_irqrestore(&zone->lock, flags);
    if (!ret)
        drain_all_pages();
    return ret;
}
As this function shows, before a pageblock is marked MIGRATE_ISOLATE it is checked for unmovable pages. Once the migratetype is set, move_freepages_block() moves the block's free pages off their original free list and onto the MIGRATE_ISOLATE list; pages on the MIGRATE_ISOLATE list will never be handed out by the allocator.

If start_isolate_page_range() completes without error (returns 0), the next step is __alloc_contig_migrate_range():


【file:/mm/page_alloc.c】
/* [start, end) must belong to a single zone. */
static int __alloc_contig_migrate_range(struct compact_control *cc,
                    unsigned long start, unsigned long end)
{
    /* This function is based on compact_zone() from compaction.c. */
    unsigned long nr_reclaimed;
    unsigned long pfn = start;
    unsigned int tries = 0;
    int ret = 0;

    migrate_prep();

    while (pfn < end || !list_empty(&cc->migratepages)) {
        if (fatal_signal_pending(current)) {
            ret = -EINTR;
            break;
        }

        if (list_empty(&cc->migratepages)) {
            cc->nr_migratepages = 0;
            pfn = isolate_migratepages_range(cc->zone, cc,
                             pfn, end, true);
            if (!pfn) {
                ret = -EINTR;
                break;
            }
            tries = 0;
        } else if (++tries == 5) {
            ret = ret < 0 ? ret : -EBUSY;
            break;
        }

        nr_reclaimed = reclaim_clean_pages_from_list(cc->zone,
                            &cc->migratepages);
        cc->nr_migratepages -= nr_reclaimed;

        ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
                    0, MIGRATE_SYNC, MR_CMA);
    }
    if (ret < 0) {
        putback_movable_pages(&cc->migratepages);
        return ret;
    }
    return 0;
}
The migrate_prep() called here drains the per-CPU LRU pagevecs onto the LRU lists, so that the pages can be isolated more reliably.

The rest is a while loop that deals with the pages that are not free; the key functions involved are isolate_migratepages_range(), reclaim_clean_pages_from_list() and migrate_pages().

First, the implementation of isolate_migratepages_range():


1. 【file:/mm/compaction.c】
2. /
3.   isolate_migratepages_range() - isolate all migrate-able pages in range.
4.   @zone: Zone pages are in.
5.   @cc: Compaction control structure.
6.   @low_pfn: The first PFN of the range.
7.   @end_pfn: The one-past-the-last PFN of the range.
8.   @unevictable: true if it allows to isolate unevictable pages
9.  
10.   Isolate all pages that can be migrated from the range specified by
11.   [low_pfn, end_pfn). Returns zero if there is a fatal signal
12.   pending), otherwise PFN of the first page that was not scanned
13.   (which may be both less, equal to or more then end_pfn).
14.  
15.   Assumes that cc->migratepages is empty and cc->nr_migratepages is
16.   zero.
17.  
18.   Apart from cc->migratepages and cc->nr_migratetypes this function
19.   does not modify any cc's fields, in particular it does not modify
20.   (or read for that matter) cc->migrate_pfn.
21.  /
22. unsigned long
23. isolate_migratepages_range(struct zone zone, struct compact_control cc,
24.         unsigned long low_pfn, unsigned long end_pfn, bool unevictable)
25. {
26.     unsigned long last_pageblock_nr = 0, pageblock_nr;
27.     unsigned long nr_scanned = 0, nr_isolated = 0;
28.     struct list_head migratelist = &cc->migratepages;
29.     isolate_mode_t mode = 0;
30.     struct lruvec lruvec;
31.     unsigned long flags;
32.     bool locked = false;
33.     struct page page = NULL, valid_page = NULL;
34.     bool skipped_async_unsuitable = false;
35.  
36.     /
37.       Ensure that there are not too many pages isolated from the LRU
38.       list by either parallel reclaimers or compaction. If there are,
39.       delay for some time until fewer pages are isolated
40.      /
41.     while (unlikely(too_many_isolated(zone))) {
42.         / async migration should just abort /
43.         if (!cc->sync)
44.             return 0;
45.  
46.         congestion_wait(BLK_RW_ASYNC, HZ/10);
47.  
48.         if (fatal_signal_pending(current))
49.             return 0;
50.     }
51.  
52.     / Time to isolate some pages for migration /
53.     cond_resched();
54.     for (; low_pfn < end_pfn; low_pfn++) {
55.         / give a chance to irqs before checking need_resched() /
56.         if (locked && !((low_pfn+1) % SWAP_CLUSTER_MAX)) {
57.             if (should_release_lock(&zone->lru_lock)) {
58.                 spin_unlock_irqrestore(&zone->lru_lock, flags);
59.                 locked = false;
60.             }
61.         }
62.  
63.         /
64.           migrate_pfn does not necessarily start aligned to a
65.           pageblock. Ensure that pfn_valid is called when moving
66.           into a new MAX_ORDER_NR_PAGES range in case of large
67.           memory holes within the zone
68.          /
69.         if ((low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
70.             if (!pfn_valid(low_pfn)) {
71.                 low_pfn += MAX_ORDER_NR_PAGES - 1;
72.                 continue;
73.             }
74.         }
75.  
76.         if (!pfn_valid_within(low_pfn))
77.             continue;
78.         nr_scanned++;
79.  
80.         /
81.           Get the page and ensure the page is within the same zone.
82.           See the comment in isolate_freepages about overlapping
83.           nodes. It is deliberate that the new zone lock is not taken
84.           as memory compaction should not move pages between nodes.
85.          /
86.         page = pfn_to_page(low_pfn);
87.         if (page_zone(page) != zone)
88.             continue;
89.  
90.         if (!valid_page)
91.             valid_page = page;
92.  
93.         / If isolation recently failed, do not retry /
94.         pageblock_nr = low_pfn >> pageblock_order;
95.         if (!isolation_suitable(cc, page))
96.             goto next_pageblock;
97.  
98.         /
99.           Skip if free. page_order cannot be used without zone->lock
100.           as nothing prevents parallel allocations or buddy merging.
101.          /
102.         if (PageBuddy(page))
103.             continue;
104.  
105.         /
106.           For async migration, also only scan in MOVABLE blocks. Async
107.           migration is optimistic to see if the minimum amount of work
108.           satisfies the allocation
109.          /
110.         if (!cc->sync && last_pageblock_nr != pageblock_nr &&
111.             !migrate_async_suitable(get_pageblock_migratetype(page))) {
112.             cc->finished_update_migrate = true;
113.             skipped_async_unsuitable = true;
114.             goto next_pageblock;
115.         }
116.  
117.         /
118.           Check may be lockless but that's ok as we recheck later.
119.           It's possible to migrate LRU pages and balloon pages
120.           Skip any other type of page
121.          /
122.         if (!PageLRU(page)) {
123.             if (unlikely(balloon_page_movable(page))) {
124.                 if (locked && balloon_page_isolate(page)) {
125.                     / Successfully isolated /
126.                     cc->finished_update_migrate = true;
127.                     list_add(&page->lru, migratelist);
128.                     cc->nr_migratepages++;
129.                     nr_isolated++;
130.                     goto check_compact_cluster;
131.                 }
132.             }
133.             continue;
134.         }
135.  
136.         /
137.           PageLRU is set. lru_lock normally excludes isolation
138.           splitting and collapsing (collapsing has already happened
139.           if PageLRU is set) but the lock is not necessarily taken
140.           here and it is wasteful to take it just to check transhuge.
141.           Check TransHuge without lock and skip the whole pageblock if
142.           it's either a transhuge or hugetlbfs page, as calling
143.           compound_order() without preventing THP from splitting the
144.           page underneath us may return surprising results.
145.          /
146.         if (PageTransHuge(page)) {
147.             if (!locked)
148.                 goto next_pageblock;
149.             low_pfn += (1 << compound_order(page)) - 1;
150.             continue;
151.         }
152.  
153.         / Check if it is ok to still hold the lock /
154.         locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
155.                                 locked, cc);
156.         if (!locked || fatal_signal_pending(current))
157.             break;
158.  
159.         / Recheck PageLRU and PageTransHuge under lock /
160.         if (!PageLRU(page))
161.             continue;
162.         if (PageTransHuge(page)) {
163.             low_pfn += (1 << compound_order(page)) - 1;
164.             continue;
165.         }
166.  
167.         if (!cc->sync)
168.             mode |= ISOLATE_ASYNC_MIGRATE;
169.  
170.         if (unevictable)
171.             mode |= ISOLATE_UNEVICTABLE;
172.  
173.         lruvec = mem_cgroup_page_lruvec(page, zone);
174.  
175.         /* Try isolate the page */
176.         if (__isolate_lru_page(page, mode) != 0)
177.             continue;
178.  
179.         VM_BUG_ON_PAGE(PageTransCompound(page), page);
180.  
181.         /* Successfully isolated */
182.         cc->finished_update_migrate = true;
183.         del_page_from_lru_list(page, lruvec, page_lru(page));
184.         list_add(&page->lru, migratelist);
185.         cc->nr_migratepages++;
186.         nr_isolated++;
187.  
188. check_compact_cluster:
189.         /* Avoid isolating too much */
190.         if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
191.             ++low_pfn;
192.             break;
193.         }
194.  
195.         continue;
196.  
197. next_pageblock:
198.         low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
199.         last_pageblock_nr = pageblock_nr;
200.     }
201.  
202.     acct_isolated(zone, locked, cc);
203.  
204.     if (locked)
205.         spin_unlock_irqrestore(&zone->lru_lock, flags);
206.  
207.     /*
208.      * Update the pageblock-skip information and cached scanner pfn,
209.      * if the whole pageblock was scanned without isolating any page.
210.      * This is not done when pageblock was skipped due to being unsuitable
211.      * for async compaction, so that eventual sync compaction can try.
212.      */
213.     if (low_pfn == end_pfn && !skipped_async_unsuitable)
214.         update_pageblock_skip(cc, valid_page, nr_isolated, true);
215.  
216.     trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
217.  
218.     count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
219.     if (nr_isolated)
220.         count_compact_events(COMPACTISOLATED, nr_isolated);
221.  
222.     return low_pfn;
223. }
This function isolates the movable pages in the range [low_pfn, end_pfn) and hangs them on the cc-&gt;migratepages list, in preparation for the page migration that follows.
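The isolation bookkeeping can be illustrated with a minimal userland sketch. All names here (`toy_page`, `toy_isolate_range`, `CLUSTER_MAX`) are illustrative stand-ins, not kernel symbols; `CLUSTER_MAX` plays the role of COMPACT_CLUSTER_MAX, which caps how many pages one scan may isolate.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_MAX 4  /* stand-in for COMPACT_CLUSTER_MAX */

struct toy_page {
    int movable;   /* 1 if the page may be migrated */
    int isolated;  /* set once the page has been pulled off its list */
};

/* Scan pages[low..end) and isolate the movable ones, stopping once the
 * cluster cap is hit; returns the pfn where the scan stopped, mirroring
 * how isolate_migratepages_range() returns low_pfn. */
size_t toy_isolate_range(struct toy_page *pages, size_t low, size_t end,
                         size_t *nr_isolated)
{
    *nr_isolated = 0;
    for (; low < end; low++) {
        if (!pages[low].movable)
            continue;
        pages[low].isolated = 1;
        (*nr_isolated)++;
        if (*nr_isolated == CLUSTER_MAX) {
            ++low;  /* resume after this page on the next call */
            break;
        }
    }
    return low;
}
```

Returning the stop pfn rather than a count is what lets the caller resume the scan exactly where the previous batch ended.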

Next, look at reclaim_clean_pages_from_list():


1. 【file:/mm/vmscan.c】
2. unsigned long reclaim_clean_pages_from_list(struct zone *zone,
3.                         struct list_head *page_list)
4. {
5.     struct scan_control sc = {
6.         .gfp_mask = GFP_KERNEL,
7.         .priority = DEF_PRIORITY,
8.         .may_unmap = 1,
9.     };
10.     unsigned long ret, dummy1, dummy2, dummy3, dummy4, dummy5;
11.     struct page *page, *next;
12.     LIST_HEAD(clean_pages);
13.  
14.     list_for_each_entry_safe(page, next, page_list, lru) {
15.         if (page_is_file_cache(page) && !PageDirty(page) &&
16.             !isolated_balloon_page(page)) {
17.             ClearPageActive(page);
18.             list_move(&page->lru, &clean_pages);
19.         }
20.     }
21.  
22.     ret = shrink_page_list(&clean_pages, zone, &sc,
23.             TTU_UNMAP|TTU_IGNORE_ACCESS,
24.             &dummy1, &dummy2, &dummy3, &dummy4, &dummy5, true);
25.     list_splice(&clean_pages, page_list);
26.     __mod_zone_page_state(zone, NR_ISOLATED_FILE, -ret);
27.     return ret;
28. }
This function directly reclaims the clean file-cache pages in the list (those that are not dirty and are not isolated balloon pages): since their contents can be re-read from the backing file, they can simply be dropped instead of migrated.
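The filtering predicate can be sketched as a toy userland model. The struct fields and function names below are illustrative only; the point is the condition mirroring `page_is_file_cache(page) && !PageDirty(page)`.

```c
#include <assert.h>
#include <stddef.h>

struct toy_page2 {
    int file_cache;  /* backed by a file (not anonymous) */
    int dirty;       /* would need writeback before it could be dropped */
    int reclaimed;
};

/* Reclaim only pages that are clean file cache: these can be dropped
 * immediately because their contents can be re-read from disk. Dirty
 * or anonymous pages are left for the migration path instead. */
size_t toy_reclaim_clean(struct toy_page2 *pages, size_t n)
{
    size_t reclaimed = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i].file_cache && !pages[i].dirty) {
            pages[i].reclaimed = 1;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Reclaiming these pages up front is cheaper than migrating them: migration would copy their contents to a new page for no benefit.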

Then move on to migrate_pages():


1. 【file:/mm/migrate.c】
2. /*
3.  * migrate_pages - migrate the pages specified in a list, to the free pages
4.  *                 supplied as the target for the page migration
5.  *
6.  * @from:         The list of pages to be migrated.
7.  * @get_new_page: The function used to allocate free pages to be used
8.  *                as the target of the page migration.
9.  * @private:      Private data to be passed on to get_new_page()
10.  * @mode:         The migration mode that specifies the constraints for
11.  *                page migration, if any.
12.  * @reason:       The reason for page migration.
13.  *
14.  * The function returns after 10 attempts or if no pages are movable any more
15.  * because the list has become empty or no retryable pages exist any more.
16.  * The caller should call putback_lru_pages() to return pages to the LRU
17.  * or free list only if ret != 0.
18.  *
19.  * Returns the number of pages that were not migrated, or an error code.
20.  */
21. int migrate_pages(struct list_head *from, new_page_t get_new_page,
22.         unsigned long private, enum migrate_mode mode, int reason)
23. {
24.     int retry = 1;
25.     int nr_failed = 0;
26.     int nr_succeeded = 0;
27.     int pass = 0;
28.     struct page *page;
29.     struct page *page2;
30.     int swapwrite = current->flags & PF_SWAPWRITE;
31.     int rc;
32.  
33.     if (!swapwrite)
34.         current->flags |= PF_SWAPWRITE;
35.  
36.     for(pass = 0; pass < 10 && retry; pass++) {
37.         retry = 0;
38.  
39.         list_for_each_entry_safe(page, page2, from, lru) {
40.             cond_resched();
41.  
42.             if (PageHuge(page))
43.                 rc = unmap_and_move_huge_page(get_new_page,
44.                         private, page, pass > 2, mode);
45.             else
46.                 rc = unmap_and_move(get_new_page, private,
47.                         page, pass > 2, mode);
48.  
49.             switch(rc) {
50.             case -ENOMEM:
51.                 goto out;
52.             case -EAGAIN:
53.                 retry++;
54.                 break;
55.             case MIGRATEPAGE_SUCCESS:
56.                 nr_succeeded++;
57.                 break;
58.             default:
59.                 /*
60.                  * Permanent failure (-EBUSY, -ENOSYS, etc.):
61.                  * unlike -EAGAIN case, the failed page is
62.                  * removed from migration page list and not
63.                  * retried in the next outer loop.
64.                  */
65.                 nr_failed++;
66.                 break;
67.             }
68.         }
69.     }
70.     rc = nr_failed + retry;
71. out:
72.     if (nr_succeeded)
73.         count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
74.     if (nr_failed)
75.         count_vm_events(PGMIGRATE_FAIL, nr_failed);
76.     trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);
77.  
78.     if (!swapwrite)
79.         current->flags &= ~PF_SWAPWRITE;
80.  
81.     return rc;
82. }
This function performs the actual page migration. Its core helper is unmap_and_move(), which allocates a new page, moves the old page's contents over and remaps it, so that the old page can be reclaimed. In summary, the job of __alloc_contig_migrate_range() is to isolate the pages in the range and then migrate them away.
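The retry policy of migrate_pages() (up to 10 passes, retrying only -EAGAIN-style transient failures, dropping permanent failures from the list) can be modeled in userland. Everything named `toy_*` below is an illustrative stand-in; `attempts[i]` encodes how many tries item i needs, with 0 meaning permanent failure.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SUCCESS 0
#define TOY_EAGAIN  1  /* transient: retry on the next pass */
#define TOY_EPERM   2  /* permanent: never retried */

static int toy_try_migrate(int *attempts_left)
{
    if (*attempts_left == 0)
        return TOY_EPERM;
    if (--(*attempts_left) > 0)
        return TOY_EAGAIN;
    return TOY_SUCCESS;
}

/* Up to 10 passes over the list, retrying only transient failures,
 * mirroring the for(pass...) loop in migrate_pages(). Returns the
 * number of items not migrated (assumes n <= 64 for the done[] array). */
int toy_migrate_all(int *attempts, size_t n)
{
    int retry = 1, nr_failed = 0;
    int done[64] = {0};

    for (int pass = 0; pass < 10 && retry; pass++) {
        retry = 0;
        for (size_t i = 0; i < n; i++) {
            if (done[i])
                continue;
            int rc = toy_try_migrate(&attempts[i]);
            if (rc == TOY_EAGAIN) {
                retry++;            /* stays on the list for the next pass */
            } else {
                done[i] = 1;        /* success or permanent failure: drop it */
                if (rc == TOY_EPERM)
                    nr_failed++;
            }
        }
    }
    return nr_failed + retry;       /* same accounting as rc = nr_failed + retry */
}
```

Note how `retry` counts only pages still worth another pass, so the loop terminates early once everything has either succeeded or failed permanently.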

Finally, back in alloc_contig_range(): after __alloc_contig_migrate_range() returns, lru_add_drain_all() is called again, presumably to cover the case where pages were added to the LRU lists while __alloc_contig_migrate_range() slept. drain_all_pages() then flushes the per-CPU cached pages; these are freed back according to their pageblock's migratetype, i.e. onto the MIGRATE_ISOLATE free list here. Next, test_pages_isolated() checks that all pages in the range really are isolated; isolate_freepages_range() pulls the free pages of the specified range out of the free lists; and finally undo_isolate_page_range() marks all the pageblocks that had been set to isolated back to MIGRATE_CMA. At this point the requested contiguous pages have been obtained, so their migratetype no longer matters and is simply restored.
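The migratetype round trip driven by alloc_contig_range() (MIGRATE_CMA → MIGRATE_ISOLATE while the range is emptied, then back to MIGRATE_CMA once the caller owns the pages) can be sketched as a toy state machine. The enum and function names below are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

enum toy_mt { TOY_MIGRATE_CMA, TOY_MIGRATE_ISOLATE };

static void toy_set_range(enum toy_mt *mt, size_t lo, size_t hi, enum toy_mt v)
{
    for (size_t i = lo; i < hi; i++)
        mt[i] = v;
}

/* 1) isolate the blocks so the allocator stops handing pages out of them,
 * 2) the range is emptied (__alloc_contig_migrate_range() would run here),
 * 3) restore MIGRATE_CMA: the caller now owns the pages, so the
 *    migratetype on the free lists no longer matters for them. */
void toy_alloc_contig(enum toy_mt *mt, size_t lo, size_t hi)
{
    toy_set_range(mt, lo, hi, TOY_MIGRATE_ISOLATE);  /* start_isolate_page_range() */
    /* ... migrate/reclaim the movable pages in [lo, hi) ... */
    toy_set_range(mt, lo, hi, TOY_MIGRATE_CMA);      /* undo_isolate_page_range() */
}
```

Marking the blocks MIGRATE_ISOLATE during the window is what prevents freed pages from being handed out again while the range is being emptied.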

In addition, CMA-managed memory is released via:


1. 【file:/mm/page_alloc.c】
2. void free_contig_range(unsigned long pfn, unsigned nr_pages)
3. {
4.     unsigned int count = 0;
5.  
6.     for (; nr_pages--; pfn++) {
7.         struct page *page = pfn_to_page(pfn);
8.  
9.         count += page_count(page) != 1;
10.         __free_page(page);
11.     }
12.     WARN(count != 0, "%d pages are still in use!\n", count);
13. }
So freeing the memory comes back down to __free_page(), which will not be pursued further here.
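The `count += page_count(page) != 1` accounting in free_contig_range() is worth a small model: each page handed back is expected to carry exactly one reference (the allocation's own), and anything else means some other user still holds the page. The names below are illustrative stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Free a toy contiguous range: drop our reference on every page and
 * count how many pages someone else still holds. The kernel WARN()s
 * if this count is non-zero. */
size_t toy_free_contig(int *refcount, size_t nr_pages)
{
    size_t still_in_use = 0;
    for (size_t i = 0; i < nr_pages; i++) {
        still_in_use += (refcount[i] != 1);  /* same idiom as count += page_count(page) != 1 */
        refcount[i]--;                       /* drop our reference, as __free_page() would */
    }
    return still_in_use;
}
```

The `count += (condition)` idiom works because a C comparison evaluates to 0 or 1, so the loop adds exactly one per anomalous page.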

I had not originally intended to analyze CMA; a moment of curiosity led me down this path, and having come this far, I am writing it down. Some details remain unclear and deserve deeper study, since this touches so many areas; I will refine them after further analysis. If there are any mistakes in my understanding, corrections are welcome.