generic: sync 6.1 patches from official source

Closed: #11208
This commit is contained in:
coolsnowwolf 2023-05-25 20:19:18 +08:00 committed by AmadeusGhost
parent 3ef1f5ade3
commit 398eece1d6
233 changed files with 20771 additions and 3130 deletions

View File

@ -1,2 +1,2 @@
LINUX_VERSION-6.1 = .34
LINUX_KERNEL_HASH-6.1.34 = b26f7cbcbf8031efc49f11f236f372fc34a4fd5fc6ad3151b893d1aa038ed603
LINUX_VERSION-6.1 = .35
LINUX_KERNEL_HASH-6.1.35 = be368143bc5d0dc73dd3e8c6191630c1620520379baf6f47c16116b2c0bc26ac

View File

@ -0,0 +1,352 @@
From 8c20e2eb5f2a0175b774134685e4d7bd93e85ff8 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:18:59 -0700
Subject: [PATCH 01/19] UPSTREAM: mm: multi-gen LRU: rename lru_gen_struct to
lru_gen_folio
Patch series "mm: multi-gen LRU: memcg LRU", v3.
Overview
========
An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
since each node and memcg combination has an LRU of folios (see
mem_cgroup_lruvec()).
Its goal is to improve the scalability of global reclaim, which is
critical to system-wide memory overcommit in data centers. Note that
memcg reclaim is currently out of scope.
Its memory bloat is a pointer to each lruvec and negligible to each
pglist_data. In terms of traversing memcgs during global reclaim, it
improves the best-case complexity from O(n) to O(1) and does not affect
the worst-case complexity O(n). Therefore, on average, it has a sublinear
complexity in contrast to the current linear complexity.
The basic structure of an memcg LRU can be understood by an analogy to
the active/inactive LRU (of folios):
1. It has the young and the old (generations), i.e., the counterparts
to the active and the inactive;
2. The increment of max_seq triggers promotion, i.e., the counterpart
to activation;
3. Other events trigger similar operations, e.g., offlining an memcg
triggers demotion, i.e., the counterpart to deactivation.
In terms of global reclaim, it has two distinct features:
1. Sharding, which allows each thread to start at a random memcg (in
the old generation) and improves parallelism;
2. Eventual fairness, which allows direct reclaim to bail out at will
and reduces latency without affecting fairness over some time.
The commit message in patch 6 details the workflow:
https://lore.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com/
The following is a simple test to quickly verify its effectiveness.
Test design:
1. Create multiple memcgs.
2. Each memcg contains a job (fio).
3. All jobs access the same amount of memory randomly.
4. The system does not experience global memory pressure.
5. Periodically write to the root memory.reclaim.
Desired outcome:
1. All memcgs have similar pgsteal counts, i.e., stddev(pgsteal)
over mean(pgsteal) is close to 0%.
2. The total pgsteal is close to the total requested through
memory.reclaim, i.e., sum(pgsteal) over sum(requested) is close
to 100%.
Actual outcome [1]:
MGLRU off MGLRU on
stddev(pgsteal) / mean(pgsteal) 75% 20%
sum(pgsteal) / sum(requested) 425% 95%
####################################################################
MEMCGS=128
for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
mkdir /sys/fs/cgroup/memcg$memcg
done
start() {
echo $BASHPID > /sys/fs/cgroup/memcg$memcg/cgroup.procs
fio -name=memcg$memcg --numjobs=1 --ioengine=mmap \
--filename=/dev/zero --size=1920M --rw=randrw \
--rate=64m,64m --random_distribution=random \
--fadvise_hint=0 --time_based --runtime=10h \
--group_reporting --minimal
}
for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
start &
done
sleep 600
for ((i = 0; i < 600; i++)); do
echo 256m >/sys/fs/cgroup/memory.reclaim
sleep 6
done
for ((memcg = 0; memcg < $MEMCGS; memcg++)); do
grep "pgsteal " /sys/fs/cgroup/memcg$memcg/memory.stat
done
####################################################################
[1]: This was obtained from running the above script (touches less
than 256GB memory) on an EPYC 7B13 with 512GB DRAM for over an
hour.
This patch (of 8):
The new name lru_gen_folio will be more distinct from the coming
lru_gen_memcg.
Link: https://lkml.kernel.org/r/20221222041905.2431096-1-yuzhao@google.com
Link: https://lkml.kernel.org/r/20221222041905.2431096-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 391655fe08d1f942359a11148aa9aaf3f99d6d6f)
Change-Id: I7df67e0e2435ba28f10eaa57d28d98b61a9210a6
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
include/linux/mm_inline.h | 4 ++--
include/linux/mmzone.h | 6 +++---
mm/vmscan.c | 34 +++++++++++++++++-----------------
mm/workingset.c | 4 ++--
4 files changed, 24 insertions(+), 24 deletions(-)
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -178,7 +178,7 @@ static inline void lru_gen_update_size(s
int zone = folio_zonenum(folio);
int delta = folio_nr_pages(folio);
enum lru_list lru = type * LRU_INACTIVE_FILE;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE(old_gen != -1 && old_gen >= MAX_NR_GENS);
VM_WARN_ON_ONCE(new_gen != -1 && new_gen >= MAX_NR_GENS);
@@ -224,7 +224,7 @@ static inline bool lru_gen_add_folio(str
int gen = folio_lru_gen(folio);
int type = folio_is_file_lru(folio);
int zone = folio_zonenum(folio);
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE_FOLIO(gen != -1, folio);
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -404,7 +404,7 @@ enum {
* The number of pages in each generation is eventually consistent and therefore
* can be transiently negative when reset_batch_size() is pending.
*/
-struct lru_gen_struct {
+struct lru_gen_folio {
/* the aging increments the youngest generation number */
unsigned long max_seq;
/* the eviction increments the oldest generation numbers */
@@ -461,7 +461,7 @@ struct lru_gen_mm_state {
struct lru_gen_mm_walk {
/* the lruvec under reclaim */
struct lruvec *lruvec;
- /* unstable max_seq from lru_gen_struct */
+ /* unstable max_seq from lru_gen_folio */
unsigned long max_seq;
/* the next address within an mm to scan */
unsigned long next_addr;
@@ -524,7 +524,7 @@ struct lruvec {
unsigned long flags;
#ifdef CONFIG_LRU_GEN
/* evictable pages divided into generations */
- struct lru_gen_struct lrugen;
+ struct lru_gen_folio lrugen;
/* to concurrently iterate lru_gen_mm_list */
struct lru_gen_mm_state mm_state;
#endif
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3190,7 +3190,7 @@ static int get_nr_gens(struct lruvec *lr
static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
{
- /* see the comment on lru_gen_struct */
+ /* see the comment on lru_gen_folio */
return get_nr_gens(lruvec, LRU_GEN_FILE) >= MIN_NR_GENS &&
get_nr_gens(lruvec, LRU_GEN_FILE) <= get_nr_gens(lruvec, LRU_GEN_ANON) &&
get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS;
@@ -3596,7 +3596,7 @@ struct ctrl_pos {
static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int gain,
struct ctrl_pos *pos)
{
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
int hist = lru_hist_from_seq(lrugen->min_seq[type]);
pos->refaulted = lrugen->avg_refaulted[type][tier] +
@@ -3611,7 +3611,7 @@ static void read_ctrl_pos(struct lruvec
static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
{
int hist, tier;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
bool clear = carryover ? NR_HIST_GENS == 1 : NR_HIST_GENS > 1;
unsigned long seq = carryover ? lrugen->min_seq[type] : lrugen->max_seq + 1;
@@ -3688,7 +3688,7 @@ static int folio_update_gen(struct folio
static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
{
int type = folio_is_file_lru(folio);
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
@@ -3733,7 +3733,7 @@ static void update_batch_size(struct lru
static void reset_batch_size(struct lruvec *lruvec, struct lru_gen_mm_walk *walk)
{
int gen, type, zone;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
walk->batched = 0;
@@ -4250,7 +4250,7 @@ static bool inc_min_seq(struct lruvec *l
{
int zone;
int remaining = MAX_LRU_BATCH;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
if (type == LRU_GEN_ANON && !can_swap)
@@ -4286,7 +4286,7 @@ static bool try_to_inc_min_seq(struct lr
{
int gen, type, zone;
bool success = false;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
DEFINE_MIN_SEQ(lruvec);
VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4307,7 +4307,7 @@ next:
;
}
- /* see the comment on lru_gen_struct */
+ /* see the comment on lru_gen_folio */
if (can_swap) {
min_seq[LRU_GEN_ANON] = min(min_seq[LRU_GEN_ANON], min_seq[LRU_GEN_FILE]);
min_seq[LRU_GEN_FILE] = max(min_seq[LRU_GEN_ANON], lrugen->min_seq[LRU_GEN_FILE]);
@@ -4329,7 +4329,7 @@ static void inc_max_seq(struct lruvec *l
{
int prev, next;
int type, zone;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
spin_lock_irq(&lruvec->lru_lock);
@@ -4387,7 +4387,7 @@ static bool try_to_inc_max_seq(struct lr
bool success;
struct lru_gen_mm_walk *walk;
struct mm_struct *mm = NULL;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
@@ -4452,7 +4452,7 @@ static bool should_run_aging(struct lruv
unsigned long old = 0;
unsigned long young = 0;
unsigned long total = 0;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
for (type = !can_swap; type < ANON_AND_FILE; type++) {
@@ -4737,7 +4737,7 @@ static bool sort_folio(struct lruvec *lr
int delta = folio_nr_pages(folio);
int refs = folio_lru_refs(folio);
int tier = lru_tier_from_refs(refs);
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
@@ -4837,7 +4837,7 @@ static int scan_folios(struct lruvec *lr
int scanned = 0;
int isolated = 0;
int remaining = MAX_LRU_BATCH;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
VM_WARN_ON_ONCE(!list_empty(list));
@@ -5237,7 +5237,7 @@ done:
static bool __maybe_unused state_is_valid(struct lruvec *lruvec)
{
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
if (lrugen->enabled) {
enum lru_list lru;
@@ -5519,7 +5519,7 @@ static void lru_gen_seq_show_full(struct
int i;
int type, tier;
int hist = lru_hist_from_seq(seq);
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
for (tier = 0; tier < MAX_NR_TIERS; tier++) {
seq_printf(m, " %10d", tier);
@@ -5569,7 +5569,7 @@ static int lru_gen_seq_show(struct seq_f
unsigned long seq;
bool full = !debugfs_real_fops(m->file)->write;
struct lruvec *lruvec = v;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
int nid = lruvec_pgdat(lruvec)->node_id;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MAX_SEQ(lruvec);
@@ -5823,7 +5823,7 @@ void lru_gen_init_lruvec(struct lruvec *
{
int i;
int gen, type, zone;
- struct lru_gen_struct *lrugen = &lruvec->lrugen;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
lrugen->max_seq = MIN_NR_GENS + 1;
lrugen->enabled = lru_gen_enabled();
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -223,7 +223,7 @@ static void *lru_gen_eviction(struct fol
unsigned long token;
unsigned long min_seq;
struct lruvec *lruvec;
- struct lru_gen_struct *lrugen;
+ struct lru_gen_folio *lrugen;
int type = folio_is_file_lru(folio);
int delta = folio_nr_pages(folio);
int refs = folio_lru_refs(folio);
@@ -252,7 +252,7 @@ static void lru_gen_refault(struct folio
unsigned long token;
unsigned long min_seq;
struct lruvec *lruvec;
- struct lru_gen_struct *lrugen;
+ struct lru_gen_folio *lrugen;
struct mem_cgroup *memcg;
struct pglist_data *pgdat;
int type = folio_is_file_lru(folio);

View File

@ -0,0 +1,197 @@
From 656287d55d9cfc72a4bcd4d9bd098570f12ce409 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:00 -0700
Subject: [PATCH 02/19] UPSTREAM: mm: multi-gen LRU: rename lrugen->lists[] to
lrugen->folios[]
lru_gen_folio will be chained into per-node lists by the coming
lrugen->list.
Link: https://lkml.kernel.org/r/20221222041905.2431096-3-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 6df1b2212950aae2b2188c6645ea18e2a9e3fdd5)
Change-Id: I09f53e0fb2cd6b8b3adbb8a80b15dc5efbeae857
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 8 ++++----
include/linux/mm_inline.h | 4 ++--
include/linux/mmzone.h | 8 ++++----
mm/vmscan.c | 20 ++++++++++----------
4 files changed, 20 insertions(+), 20 deletions(-)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -89,15 +89,15 @@ variables are monotonically increasing.
Generation numbers are truncated into ``order_base_2(MAX_NR_GENS+1)``
bits in order to fit into the gen counter in ``folio->flags``. Each
-truncated generation number is an index to ``lrugen->lists[]``. The
+truncated generation number is an index to ``lrugen->folios[]``. The
sliding window technique is used to track at least ``MIN_NR_GENS`` and
at most ``MAX_NR_GENS`` generations. The gen counter stores a value
within ``[1, MAX_NR_GENS]`` while a page is on one of
-``lrugen->lists[]``; otherwise it stores zero.
+``lrugen->folios[]``; otherwise it stores zero.
Each generation is divided into multiple tiers. A page accessed ``N``
times through file descriptors is in tier ``order_base_2(N)``. Unlike
-generations, tiers do not have dedicated ``lrugen->lists[]``. In
+generations, tiers do not have dedicated ``lrugen->folios[]``. In
contrast to moving across generations, which requires the LRU lock,
moving across tiers only involves atomic operations on
``folio->flags`` and therefore has a negligible cost. A feedback loop
@@ -127,7 +127,7 @@ page mapped by this PTE to ``(max_seq%MA
Eviction
--------
The eviction consumes old generations. Given an ``lruvec``, it
-increments ``min_seq`` when ``lrugen->lists[]`` indexed by
+increments ``min_seq`` when ``lrugen->folios[]`` indexed by
``min_seq%MAX_NR_GENS`` becomes empty. To select a type and a tier to
evict from, it first compares ``min_seq[]`` to select the older type.
If both types are equally old, it selects the one whose first tier has
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -256,9 +256,9 @@ static inline bool lru_gen_add_folio(str
lru_gen_update_size(lruvec, folio, -1, gen);
/* for folio_rotate_reclaimable() */
if (reclaiming)
- list_add_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
+ list_add_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
else
- list_add(&folio->lru, &lrugen->lists[gen][type][zone]);
+ list_add(&folio->lru, &lrugen->folios[gen][type][zone]);
return true;
}
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -312,7 +312,7 @@ enum lruvec_flags {
* They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS]. An
* offset within MAX_NR_GENS, i.e., gen, indexes the LRU list of the
* corresponding generation. The gen counter in folio->flags stores gen+1 while
- * a page is on one of lrugen->lists[]. Otherwise it stores 0.
+ * a page is on one of lrugen->folios[]. Otherwise it stores 0.
*
* A page is added to the youngest generation on faulting. The aging needs to
* check the accessed bit at least twice before handing this page over to the
@@ -324,8 +324,8 @@ enum lruvec_flags {
* rest of generations, if they exist, are considered inactive. See
* lru_gen_is_active().
*
- * PG_active is always cleared while a page is on one of lrugen->lists[] so that
- * the aging needs not to worry about it. And it's set again when a page
+ * PG_active is always cleared while a page is on one of lrugen->folios[] so
+ * that the aging needs not to worry about it. And it's set again when a page
* considered active is isolated for non-reclaiming purposes, e.g., migration.
* See lru_gen_add_folio() and lru_gen_del_folio().
*
@@ -412,7 +412,7 @@ struct lru_gen_folio {
/* the birth time of each generation in jiffies */
unsigned long timestamps[MAX_NR_GENS];
/* the multi-gen LRU lists, lazily sorted on eviction */
- struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
+ struct list_head folios[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
/* the multi-gen LRU sizes, eventually consistent */
long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
/* the exponential moving average of refaulted */
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4258,7 +4258,7 @@ static bool inc_min_seq(struct lruvec *l
/* prevent cold/hot inversion if force_scan is true */
for (zone = 0; zone < MAX_NR_ZONES; zone++) {
- struct list_head *head = &lrugen->lists[old_gen][type][zone];
+ struct list_head *head = &lrugen->folios[old_gen][type][zone];
while (!list_empty(head)) {
struct folio *folio = lru_to_folio(head);
@@ -4269,7 +4269,7 @@ static bool inc_min_seq(struct lruvec *l
VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
new_gen = folio_inc_gen(lruvec, folio, false);
- list_move_tail(&folio->lru, &lrugen->lists[new_gen][type][zone]);
+ list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
if (!--remaining)
return false;
@@ -4297,7 +4297,7 @@ static bool try_to_inc_min_seq(struct lr
gen = lru_gen_from_seq(min_seq[type]);
for (zone = 0; zone < MAX_NR_ZONES; zone++) {
- if (!list_empty(&lrugen->lists[gen][type][zone]))
+ if (!list_empty(&lrugen->folios[gen][type][zone]))
goto next;
}
@@ -4762,7 +4762,7 @@ static bool sort_folio(struct lruvec *lr
/* promoted */
if (gen != lru_gen_from_seq(lrugen->min_seq[type])) {
- list_move(&folio->lru, &lrugen->lists[gen][type][zone]);
+ list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
return true;
}
@@ -4771,7 +4771,7 @@ static bool sort_folio(struct lruvec *lr
int hist = lru_hist_from_seq(lrugen->min_seq[type]);
gen = folio_inc_gen(lruvec, folio, false);
- list_move_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
+ list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
lrugen->protected[hist][type][tier - 1] + delta);
@@ -4783,7 +4783,7 @@ static bool sort_folio(struct lruvec *lr
if (folio_test_locked(folio) || folio_test_writeback(folio) ||
(type == LRU_GEN_FILE && folio_test_dirty(folio))) {
gen = folio_inc_gen(lruvec, folio, true);
- list_move(&folio->lru, &lrugen->lists[gen][type][zone]);
+ list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
return true;
}
@@ -4850,7 +4850,7 @@ static int scan_folios(struct lruvec *lr
for (zone = sc->reclaim_idx; zone >= 0; zone--) {
LIST_HEAD(moved);
int skipped = 0;
- struct list_head *head = &lrugen->lists[gen][type][zone];
+ struct list_head *head = &lrugen->folios[gen][type][zone];
while (!list_empty(head)) {
struct folio *folio = lru_to_folio(head);
@@ -5250,7 +5250,7 @@ static bool __maybe_unused state_is_vali
int gen, type, zone;
for_each_gen_type_zone(gen, type, zone) {
- if (!list_empty(&lrugen->lists[gen][type][zone]))
+ if (!list_empty(&lrugen->folios[gen][type][zone]))
return false;
}
}
@@ -5295,7 +5295,7 @@ static bool drain_evictable(struct lruve
int remaining = MAX_LRU_BATCH;
for_each_gen_type_zone(gen, type, zone) {
- struct list_head *head = &lruvec->lrugen.lists[gen][type][zone];
+ struct list_head *head = &lruvec->lrugen.folios[gen][type][zone];
while (!list_empty(head)) {
bool success;
@@ -5832,7 +5832,7 @@ void lru_gen_init_lruvec(struct lruvec *
lrugen->timestamps[i] = jiffies;
for_each_gen_type_zone(gen, type, zone)
- INIT_LIST_HEAD(&lrugen->lists[gen][type][zone]);
+ INIT_LIST_HEAD(&lrugen->folios[gen][type][zone]);
lruvec->mm_state.seq = MIN_NR_GENS;
init_waitqueue_head(&lruvec->mm_state.wait);

View File

@ -0,0 +1,192 @@
From 14f9a7a15f3d1af351f30e0438fd747b7ac253b0 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:01 -0700
Subject: [PATCH 03/19] UPSTREAM: mm: multi-gen LRU: remove eviction fairness
safeguard
Recall that the eviction consumes the oldest generation: first it
bucket-sorts folios whose gen counters were updated by the aging and
reclaims the rest; then it increments lrugen->min_seq.
The current eviction fairness safeguard for global reclaim has a
dilemma: when there are multiple eligible memcgs, should it continue
or stop upon meeting the reclaim goal? If it continues, it overshoots
and increases direct reclaim latency; if it stops, it loses fairness
between memcgs it has taken memory away from and those it has yet to.
With memcg LRU, the eviction, while ensuring eventual fairness, will
stop upon meeting its goal. Therefore the current eviction fairness
safeguard for global reclaim will not be needed.
Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the eviction will continue, even if it is overshooting. This becomes
unconditional due to code simplification.
Link: https://lkml.kernel.org/r/20221222041905.2431096-4-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit a579086c99ed70cc4bfc104348dbe3dd8f2787e6)
Change-Id: I08ac1b3c90e29cafd0566785aaa4bcdb5db7d22c
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 81 +++++++++++++++--------------------------------------
1 file changed, 23 insertions(+), 58 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -448,6 +448,11 @@ static bool cgroup_reclaim(struct scan_c
return sc->target_mem_cgroup;
}
+static bool global_reclaim(struct scan_control *sc)
+{
+ return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
+}
+
/**
* writeback_throttling_sane - is the usual dirty throttling mechanism available?
* @sc: scan_control in question
@@ -498,6 +503,11 @@ static bool cgroup_reclaim(struct scan_c
return false;
}
+static bool global_reclaim(struct scan_control *sc)
+{
+ return true;
+}
+
static bool writeback_throttling_sane(struct scan_control *sc)
{
return true;
@@ -4993,8 +5003,7 @@ static int isolate_folios(struct lruvec
return scanned;
}
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
- bool *need_swapping)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
{
int type;
int scanned;
@@ -5083,9 +5092,6 @@ retry:
goto retry;
}
- if (need_swapping && type == LRU_GEN_ANON)
- *need_swapping = true;
-
return scanned;
}
@@ -5124,67 +5130,26 @@ done:
return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
}
-static bool should_abort_scan(struct lruvec *lruvec, unsigned long seq,
- struct scan_control *sc, bool need_swapping)
+static unsigned long get_nr_to_reclaim(struct scan_control *sc)
{
- int i;
- DEFINE_MAX_SEQ(lruvec);
-
- if (!current_is_kswapd()) {
- /* age each memcg at most once to ensure fairness */
- if (max_seq - seq > 1)
- return true;
-
- /* over-swapping can increase allocation latency */
- if (sc->nr_reclaimed >= sc->nr_to_reclaim && need_swapping)
- return true;
-
- /* give this thread a chance to exit and free its memory */
- if (fatal_signal_pending(current)) {
- sc->nr_reclaimed += MIN_LRU_BATCH;
- return true;
- }
-
- if (cgroup_reclaim(sc))
- return false;
- } else if (sc->nr_reclaimed - sc->last_reclaimed < sc->nr_to_reclaim)
- return false;
-
- /* keep scanning at low priorities to ensure fairness */
- if (sc->priority > DEF_PRIORITY - 2)
- return false;
-
- /*
- * A minimum amount of work was done under global memory pressure. For
- * kswapd, it may be overshooting. For direct reclaim, the allocation
- * may succeed if all suitable zones are somewhat safe. In either case,
- * it's better to stop now, and restart later if necessary.
- */
- for (i = 0; i <= sc->reclaim_idx; i++) {
- unsigned long wmark;
- struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
-
- if (!managed_zone(zone))
- continue;
-
- wmark = current_is_kswapd() ? high_wmark_pages(zone) : low_wmark_pages(zone);
- if (wmark > zone_page_state(zone, NR_FREE_PAGES))
- return false;
- }
+ /* don't abort memcg reclaim to ensure fairness */
+ if (!global_reclaim(sc))
+ return -1;
- sc->nr_reclaimed += MIN_LRU_BATCH;
+ /* discount the previous progress for kswapd */
+ if (current_is_kswapd())
+ return sc->nr_to_reclaim + sc->last_reclaimed;
- return true;
+ return max(sc->nr_to_reclaim, compact_gap(sc->order));
}
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
struct blk_plug plug;
bool need_aging = false;
- bool need_swapping = false;
unsigned long scanned = 0;
unsigned long reclaimed = sc->nr_reclaimed;
- DEFINE_MAX_SEQ(lruvec);
+ unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
lru_add_drain();
@@ -5208,7 +5173,7 @@ static void lru_gen_shrink_lruvec(struct
if (!nr_to_scan)
goto done;
- delta = evict_folios(lruvec, sc, swappiness, &need_swapping);
+ delta = evict_folios(lruvec, sc, swappiness);
if (!delta)
goto done;
@@ -5216,7 +5181,7 @@ static void lru_gen_shrink_lruvec(struct
if (scanned >= nr_to_scan)
break;
- if (should_abort_scan(lruvec, max_seq, sc, need_swapping))
+ if (sc->nr_reclaimed >= nr_to_reclaim)
break;
cond_resched();
@@ -5666,7 +5631,7 @@ static int run_eviction(struct lruvec *l
if (sc->nr_reclaimed >= nr_to_reclaim)
return 0;
- if (!evict_folios(lruvec, sc, swappiness, NULL))
+ if (!evict_folios(lruvec, sc, swappiness))
return 0;
cond_resched();

View File

@ -0,0 +1,294 @@
From f3c93d2e37a3c56593d7ccf4f4bcf1b58426fdd8 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:02 -0700
Subject: [PATCH 04/19] BACKPORT: mm: multi-gen LRU: remove aging fairness
safeguard
Recall that the aging produces the youngest generation: first it scans
for accessed folios and updates their gen counters; then it increments
lrugen->max_seq.
The current aging fairness safeguard for kswapd uses two passes to
ensure the fairness to multiple eligible memcgs. On the first pass,
which is shared with the eviction, it checks whether all eligible
memcgs are low on cold folios. If so, it requires a second pass, on
which it ages all those memcgs at the same time.
With memcg LRU, the aging, while ensuring eventual fairness, will run
when necessary. Therefore the current aging fairness safeguard for
kswapd will not be needed.
Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the aging can be unfair to different memcgs, i.e., their
lrugen->max_seq can be incremented at different paces.
Link: https://lkml.kernel.org/r/20221222041905.2431096-5-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 7348cc91821b0cb24dfb00e578047f68299a50ab)
[TJ: Resolved conflicts with older function signatures for
min_cgroup_below_min / min_cgroup_below_low]
Change-Id: I6e36ecfbaaefbc0a56d9a9d5d7cbe404ed7f57a5
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 126 ++++++++++++++++++++++++----------------------------
1 file changed, 59 insertions(+), 67 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -136,7 +136,6 @@ struct scan_control {
#ifdef CONFIG_LRU_GEN
/* help kswapd make better choices among multiple memcgs */
- unsigned int memcgs_need_aging:1;
unsigned long last_reclaimed;
#endif
@@ -4455,7 +4454,7 @@ done:
return true;
}
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsigned long *min_seq,
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
{
int gen, type, zone;
@@ -4464,6 +4463,13 @@ static bool should_run_aging(struct lruv
unsigned long total = 0;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ DEFINE_MIN_SEQ(lruvec);
+
+ /* whether this lruvec is completely out of cold folios */
+ if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+ *nr_to_scan = 0;
+ return true;
+ }
for (type = !can_swap; type < ANON_AND_FILE; type++) {
unsigned long seq;
@@ -4492,8 +4498,6 @@ static bool should_run_aging(struct lruv
* stalls when the number of generations reaches MIN_NR_GENS. Hence, the
* ideal number of generations is MIN_NR_GENS+1.
*/
- if (min_seq[!can_swap] + MIN_NR_GENS > max_seq)
- return true;
if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
return false;
@@ -4512,40 +4516,54 @@ static bool should_run_aging(struct lruv
return false;
}
-static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned long min_ttl)
+static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
- bool need_aging;
- unsigned long nr_to_scan;
- int swappiness = get_swappiness(lruvec, sc);
+ int gen, type, zone;
+ unsigned long total = 0;
+ bool can_swap = get_swappiness(lruvec, sc);
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MAX_SEQ(lruvec);
DEFINE_MIN_SEQ(lruvec);
- VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
+ for (type = !can_swap; type < ANON_AND_FILE; type++) {
+ unsigned long seq;
- mem_cgroup_calculate_protection(NULL, memcg);
+ for (seq = min_seq[type]; seq <= max_seq; seq++) {
+ gen = lru_gen_from_seq(seq);
- if (mem_cgroup_below_min(memcg))
- return false;
+ for (zone = 0; zone < MAX_NR_ZONES; zone++)
+ total += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+ }
+ }
- need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, swappiness, &nr_to_scan);
+ /* whether the size is big enough to be helpful */
+ return mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+}
- if (min_ttl) {
- int gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
- unsigned long birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
+static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc,
+ unsigned long min_ttl)
+{
+ int gen;
+ unsigned long birth;
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ DEFINE_MIN_SEQ(lruvec);
- if (time_is_after_jiffies(birth + min_ttl))
- return false;
+ VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
- /* the size is likely too small to be helpful */
- if (!nr_to_scan && sc->priority != DEF_PRIORITY)
- return false;
- }
+ /* see the comment on lru_gen_folio */
+ gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
+ birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
- if (need_aging)
- try_to_inc_max_seq(lruvec, max_seq, sc, swappiness, false);
+ if (time_is_after_jiffies(birth + min_ttl))
+ return false;
- return true;
+ if (!lruvec_is_sizable(lruvec, sc))
+ return false;
+
+ mem_cgroup_calculate_protection(NULL, memcg);
+
+ return !mem_cgroup_below_min(memcg);
}
/* to protect the working set of the last N jiffies */
@@ -4554,46 +4572,32 @@ static unsigned long lru_gen_min_ttl __r
static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct mem_cgroup *memcg;
- bool success = false;
unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
VM_WARN_ON_ONCE(!current_is_kswapd());
sc->last_reclaimed = sc->nr_reclaimed;
- /*
- * To reduce the chance of going into the aging path, which can be
- * costly, optimistically skip it if the flag below was cleared in the
- * eviction path. This improves the overall performance when multiple
- * memcgs are available.
- */
- if (!sc->memcgs_need_aging) {
- sc->memcgs_need_aging = true;
+ /* check the order to exclude compaction-induced reclaim */
+ if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
return;
- }
-
- set_mm_walk(pgdat);
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- if (age_lruvec(lruvec, sc, min_ttl))
- success = true;
+ if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
+ mem_cgroup_iter_break(NULL, memcg);
+ return;
+ }
cond_resched();
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
- clear_mm_walk();
-
- /* check the order to exclude compaction-induced reclaim */
- if (success || !min_ttl || sc->order)
- return;
-
/*
* The main goal is to OOM kill if every generation from all memcgs is
* younger than min_ttl. However, another possibility is all memcgs are
- * either below min or empty.
+ * either too small or below min.
*/
if (mutex_trylock(&oom_lock)) {
struct oom_control oc = {
@@ -5101,33 +5105,27 @@ retry:
* reclaim.
*/
static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
- bool can_swap, bool *need_aging)
+ bool can_swap)
{
unsigned long nr_to_scan;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MAX_SEQ(lruvec);
- DEFINE_MIN_SEQ(lruvec);
if (mem_cgroup_below_min(memcg) ||
(mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim))
return 0;
- *need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, can_swap, &nr_to_scan);
- if (!*need_aging)
+ if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
return nr_to_scan;
/* skip the aging path at the default priority */
if (sc->priority == DEF_PRIORITY)
- goto done;
+ return nr_to_scan;
- /* leave the work to lru_gen_age_node() */
- if (current_is_kswapd())
- return 0;
+ try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
- if (try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false))
- return nr_to_scan;
-done:
- return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
+ /* skip this lruvec as it's low on cold folios */
+ return 0;
}
static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5146,9 +5144,7 @@ static unsigned long get_nr_to_reclaim(s
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
struct blk_plug plug;
- bool need_aging = false;
unsigned long scanned = 0;
- unsigned long reclaimed = sc->nr_reclaimed;
unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
lru_add_drain();
@@ -5169,13 +5165,13 @@ static void lru_gen_shrink_lruvec(struct
else
swappiness = 0;
- nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness, &need_aging);
+ nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
if (!nr_to_scan)
- goto done;
+ break;
delta = evict_folios(lruvec, sc, swappiness);
if (!delta)
- goto done;
+ break;
scanned += delta;
if (scanned >= nr_to_scan)
@@ -5187,10 +5183,6 @@ static void lru_gen_shrink_lruvec(struct
cond_resched();
}
- /* see the comment in lru_gen_age_node() */
- if (sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH && !need_aging)
- sc->memcgs_need_aging = false;
-done:
clear_mm_walk();
blk_finish_plug(&plug);

View File

@ -0,0 +1,166 @@
From eca3858631e0cbad2ca6e40f788892749428e4cb Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:03 -0700
Subject: [PATCH 05/19] UPSTREAM: mm: multi-gen LRU: shuffle should_run_aging()
Move should_run_aging() next to its only caller left.
Link: https://lkml.kernel.org/r/20221222041905.2431096-6-yuzhao@google.com
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 77d4459a4a1a472b7309e475f962dda87d950abd)
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Change-Id: I3b0383fe16b93a783b4d8c0b3a0b325160392576
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 124 ++++++++++++++++++++++++++--------------------------
1 file changed, 62 insertions(+), 62 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4454,68 +4454,6 @@ done:
return true;
}
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
- struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
-{
- int gen, type, zone;
- unsigned long old = 0;
- unsigned long young = 0;
- unsigned long total = 0;
- struct lru_gen_folio *lrugen = &lruvec->lrugen;
- struct mem_cgroup *memcg = lruvec_memcg(lruvec);
- DEFINE_MIN_SEQ(lruvec);
-
- /* whether this lruvec is completely out of cold folios */
- if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
- *nr_to_scan = 0;
- return true;
- }
-
- for (type = !can_swap; type < ANON_AND_FILE; type++) {
- unsigned long seq;
-
- for (seq = min_seq[type]; seq <= max_seq; seq++) {
- unsigned long size = 0;
-
- gen = lru_gen_from_seq(seq);
-
- for (zone = 0; zone < MAX_NR_ZONES; zone++)
- size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
-
- total += size;
- if (seq == max_seq)
- young += size;
- else if (seq + MIN_NR_GENS == max_seq)
- old += size;
- }
- }
-
- /* try to scrape all its memory if this memcg was deleted */
- *nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
-
- /*
- * The aging tries to be lazy to reduce the overhead, while the eviction
- * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
- * ideal number of generations is MIN_NR_GENS+1.
- */
- if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
- return false;
-
- /*
- * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
- * of the total number of pages for each generation. A reasonable range
- * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
- * aging cares about the upper bound of hot pages, while the eviction
- * cares about the lower bound of cold pages.
- */
- if (young * MIN_NR_GENS > total)
- return true;
- if (old * (MIN_NR_GENS + 2) < total)
- return true;
-
- return false;
-}
-
static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
int gen, type, zone;
@@ -5099,6 +5037,68 @@ retry:
return scanned;
}
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
+ struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
+{
+ int gen, type, zone;
+ unsigned long old = 0;
+ unsigned long young = 0;
+ unsigned long total = 0;
+ struct lru_gen_folio *lrugen = &lruvec->lrugen;
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ DEFINE_MIN_SEQ(lruvec);
+
+ /* whether this lruvec is completely out of cold folios */
+ if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+ *nr_to_scan = 0;
+ return true;
+ }
+
+ for (type = !can_swap; type < ANON_AND_FILE; type++) {
+ unsigned long seq;
+
+ for (seq = min_seq[type]; seq <= max_seq; seq++) {
+ unsigned long size = 0;
+
+ gen = lru_gen_from_seq(seq);
+
+ for (zone = 0; zone < MAX_NR_ZONES; zone++)
+ size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+
+ total += size;
+ if (seq == max_seq)
+ young += size;
+ else if (seq + MIN_NR_GENS == max_seq)
+ old += size;
+ }
+ }
+
+ /* try to scrape all its memory if this memcg was deleted */
+ *nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+
+ /*
+ * The aging tries to be lazy to reduce the overhead, while the eviction
+ * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
+ * ideal number of generations is MIN_NR_GENS+1.
+ */
+ if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
+ return false;
+
+ /*
+ * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
+ * of the total number of pages for each generation. A reasonable range
+ * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
+ * aging cares about the upper bound of hot pages, while the eviction
+ * cares about the lower bound of cold pages.
+ */
+ if (young * MIN_NR_GENS > total)
+ return true;
+ if (old * (MIN_NR_GENS + 2) < total)
+ return true;
+
+ return false;
+}
+
/*
* For future optimizations:
* 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg

View File

@ -0,0 +1,876 @@
From 8ee8571e47aa75221e5fbd4c9c7802fc4244c346 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:04 -0700
Subject: [PATCH 06/19] BACKPORT: mm: multi-gen LRU: per-node lru_gen_folio
lists
For each node, memcgs are divided into two generations: the old and
the young. For each generation, memcgs are randomly sharded into
multiple bins to improve scalability. For each bin, an RCU hlist_nulls
is virtually divided into three segments: the head, the tail and the
default.
An onlining memcg is added to the tail of a random bin in the old
generation. The eviction starts at the head of a random bin in the old
generation. The per-node memcg generation counter, whose reminder (mod
2) indexes the old generation, is incremented when all its bins become
empty.
There are four operations:
1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in
its current generation (old or young) and updates its "seg" to
"head";
2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in
its current generation (old or young) and updates its "seg" to
"tail";
3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in
the old generation, updates its "gen" to "old" and resets its "seg"
to "default";
4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin
in the young generation, updates its "gen" to "young" and resets
its "seg" to "default".
The events that trigger the above operations are:
1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
2. The first attempt to reclaim an memcg below low, which triggers
MEMCG_LRU_TAIL;
3. The first attempt to reclaim an memcg below reclaimable size
threshold, which triggers MEMCG_LRU_TAIL;
4. The second attempt to reclaim an memcg below reclaimable size
threshold, which triggers MEMCG_LRU_YOUNG;
5. Attempting to reclaim an memcg below min, which triggers
MEMCG_LRU_YOUNG;
6. Finishing the aging on the eviction path, which triggers
MEMCG_LRU_YOUNG;
7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
Note that memcg LRU only applies to global reclaim, and the
round-robin incrementing of their max_seq counters ensures the
eventual fairness to all eligible memcgs. For memcg reclaim, it still
relies on mem_cgroup_iter().
Link: https://lkml.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit e4dde56cd208674ce899b47589f263499e5b8cdc)
[TJ: Resolved conflicts with older function signatures for
min_cgroup_below_min / min_cgroup_below_low and includes]
Change-Id: Idc8a0f635e035d72dd911f807d1224cb47cbd655
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
include/linux/memcontrol.h | 10 +
include/linux/mm_inline.h | 17 ++
include/linux/mmzone.h | 117 +++++++++++-
mm/memcontrol.c | 16 ++
mm/page_alloc.c | 1 +
mm/vmscan.c | 374 +++++++++++++++++++++++++++++++++----
6 files changed, 500 insertions(+), 35 deletions(-)
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -790,6 +790,11 @@ static inline void obj_cgroup_put(struct
percpu_ref_put(&objcg->refcnt);
}
+static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+{
+ return !memcg || css_tryget(&memcg->css);
+}
+
static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
if (memcg)
@@ -1290,6 +1295,11 @@ static inline void obj_cgroup_put(struct
{
}
+static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+{
+ return true;
+}
+
static inline void mem_cgroup_put(struct mem_cgroup *memcg)
{
}
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -122,6 +122,18 @@ static inline bool lru_gen_in_fault(void
return current->in_lru_fault;
}
+#ifdef CONFIG_MEMCG
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return READ_ONCE(lruvec->lrugen.seg);
+}
+#else
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return 0;
+}
+#endif
+
static inline int lru_gen_from_seq(unsigned long seq)
{
return seq % MAX_NR_GENS;
@@ -297,6 +309,11 @@ static inline bool lru_gen_in_fault(void
return false;
}
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return 0;
+}
+
static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
{
return false;
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -7,6 +7,7 @@
#include <linux/spinlock.h>
#include <linux/list.h>
+#include <linux/list_nulls.h>
#include <linux/wait.h>
#include <linux/bitops.h>
#include <linux/cache.h>
@@ -367,6 +368,15 @@ struct page_vma_mapped_walk;
#define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
#define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
+/* see the comment on MEMCG_NR_GENS */
+enum {
+ MEMCG_LRU_NOP,
+ MEMCG_LRU_HEAD,
+ MEMCG_LRU_TAIL,
+ MEMCG_LRU_OLD,
+ MEMCG_LRU_YOUNG,
+};
+
#ifdef CONFIG_LRU_GEN
enum {
@@ -426,6 +436,14 @@ struct lru_gen_folio {
atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
/* whether the multi-gen LRU is enabled */
bool enabled;
+#ifdef CONFIG_MEMCG
+ /* the memcg generation this lru_gen_folio belongs to */
+ u8 gen;
+ /* the list segment this lru_gen_folio belongs to */
+ u8 seg;
+ /* per-node lru_gen_folio list for global reclaim */
+ struct hlist_nulls_node list;
+#endif
};
enum {
@@ -479,12 +497,87 @@ void lru_gen_init_lruvec(struct lruvec *
void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
#ifdef CONFIG_MEMCG
+
+/*
+ * For each node, memcgs are divided into two generations: the old and the
+ * young. For each generation, memcgs are randomly sharded into multiple bins
+ * to improve scalability. For each bin, the hlist_nulls is virtually divided
+ * into three segments: the head, the tail and the default.
+ *
+ * An onlining memcg is added to the tail of a random bin in the old generation.
+ * The eviction starts at the head of a random bin in the old generation. The
+ * per-node memcg generation counter, whose reminder (mod MEMCG_NR_GENS) indexes
+ * the old generation, is incremented when all its bins become empty.
+ *
+ * There are four operations:
+ * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its
+ * current generation (old or young) and updates its "seg" to "head";
+ * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its
+ * current generation (old or young) and updates its "seg" to "tail";
+ * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old
+ * generation, updates its "gen" to "old" and resets its "seg" to "default";
+ * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the
+ * young generation, updates its "gen" to "young" and resets its "seg" to
+ * "default".
+ *
+ * The events that trigger the above operations are:
+ * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
+ * 2. The first attempt to reclaim an memcg below low, which triggers
+ * MEMCG_LRU_TAIL;
+ * 3. The first attempt to reclaim an memcg below reclaimable size threshold,
+ * which triggers MEMCG_LRU_TAIL;
+ * 4. The second attempt to reclaim an memcg below reclaimable size threshold,
+ * which triggers MEMCG_LRU_YOUNG;
+ * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG;
+ * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
+ * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
+ *
+ * Note that memcg LRU only applies to global reclaim, and the round-robin
+ * incrementing of their max_seq counters ensures the eventual fairness to all
+ * eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
+ */
+#define MEMCG_NR_GENS 2
+#define MEMCG_NR_BINS 8
+
+struct lru_gen_memcg {
+ /* the per-node memcg generation counter */
+ unsigned long seq;
+ /* each memcg has one lru_gen_folio per node */
+ unsigned long nr_memcgs[MEMCG_NR_GENS];
+ /* per-node lru_gen_folio list for global reclaim */
+ struct hlist_nulls_head fifo[MEMCG_NR_GENS][MEMCG_NR_BINS];
+ /* protects the above */
+ spinlock_t lock;
+};
+
+void lru_gen_init_pgdat(struct pglist_data *pgdat);
+
void lru_gen_init_memcg(struct mem_cgroup *memcg);
void lru_gen_exit_memcg(struct mem_cgroup *memcg);
-#endif
+void lru_gen_online_memcg(struct mem_cgroup *memcg);
+void lru_gen_offline_memcg(struct mem_cgroup *memcg);
+void lru_gen_release_memcg(struct mem_cgroup *memcg);
+void lru_gen_rotate_memcg(struct lruvec *lruvec, int op);
+
+#else /* !CONFIG_MEMCG */
+
+#define MEMCG_NR_GENS 1
+
+struct lru_gen_memcg {
+};
+
+static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+}
+
+#endif /* CONFIG_MEMCG */
#else /* !CONFIG_LRU_GEN */
+static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+}
+
static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
{
}
@@ -494,6 +587,7 @@ static inline void lru_gen_look_around(s
}
#ifdef CONFIG_MEMCG
+
static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
{
}
@@ -501,7 +595,24 @@ static inline void lru_gen_init_memcg(st
static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg)
{
}
-#endif
+
+static inline void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+}
+
+#endif /* CONFIG_MEMCG */
#endif /* CONFIG_LRU_GEN */
@@ -1219,6 +1330,8 @@ typedef struct pglist_data {
#ifdef CONFIG_LRU_GEN
/* kswap mm walk data */
struct lru_gen_mm_walk mm_walk;
+ /* lru_gen_folio list */
+ struct lru_gen_memcg memcg_lru;
#endif
CACHELINE_PADDING(_pad2_);
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -477,6 +477,16 @@ static void mem_cgroup_update_tree(struc
struct mem_cgroup_per_node *mz;
struct mem_cgroup_tree_per_node *mctz;
+ if (lru_gen_enabled()) {
+ struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec;
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
+
+ return;
+ }
+
mctz = soft_limit_tree.rb_tree_per_node[nid];
if (!mctz)
return;
@@ -3522,6 +3532,9 @@ unsigned long mem_cgroup_soft_limit_recl
struct mem_cgroup_tree_per_node *mctz;
unsigned long excess;
+ if (lru_gen_enabled())
+ return 0;
+
if (order > 0)
return 0;
@@ -5382,6 +5395,7 @@ static int mem_cgroup_css_online(struct
if (unlikely(mem_cgroup_is_root(memcg)))
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
2UL*HZ);
+ lru_gen_online_memcg(memcg);
return 0;
offline_kmem:
memcg_offline_kmem(memcg);
@@ -5413,6 +5427,7 @@ static void mem_cgroup_css_offline(struc
memcg_offline_kmem(memcg);
reparent_shrinker_deferred(memcg);
wb_memcg_offline(memcg);
+ lru_gen_offline_memcg(memcg);
drain_all_stock(memcg);
@@ -5424,6 +5439,7 @@ static void mem_cgroup_css_released(stru
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
invalidate_reclaim_iterators(memcg);
+ lru_gen_release_memcg(memcg);
}
static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7957,6 +7957,7 @@ static void __init free_area_init_node(i
pgdat_set_deferred_range(pgdat);
free_area_init_core(pgdat);
+ lru_gen_init_pgdat(pgdat);
}
static void __init free_area_init_memoryless_node(int nid)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -54,6 +54,8 @@
#include <linux/shmem_fs.h>
#include <linux/ctype.h>
#include <linux/debugfs.h>
+#include <linux/rculist_nulls.h>
+#include <linux/random.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -134,11 +136,6 @@ struct scan_control {
/* Always discard instead of demoting to lower tier memory */
unsigned int no_demotion:1;
-#ifdef CONFIG_LRU_GEN
- /* help kswapd make better choices among multiple memcgs */
- unsigned long last_reclaimed;
-#endif
-
/* Allocation order */
s8 order;
@@ -3160,6 +3157,9 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(lru_gen_ca
for ((type) = 0; (type) < ANON_AND_FILE; (type)++) \
for ((zone) = 0; (zone) < MAX_NR_ZONES; (zone)++)
+#define get_memcg_gen(seq) ((seq) % MEMCG_NR_GENS)
+#define get_memcg_bin(bin) ((bin) % MEMCG_NR_BINS)
+
static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid)
{
struct pglist_data *pgdat = NODE_DATA(nid);
@@ -4440,8 +4440,7 @@ done:
if (sc->priority <= DEF_PRIORITY - 2)
wait_event_killable(lruvec->mm_state.wait,
max_seq < READ_ONCE(lrugen->max_seq));
-
- return max_seq < READ_ONCE(lrugen->max_seq);
+ return false;
}
VM_WARN_ON_ONCE(max_seq != READ_ONCE(lrugen->max_seq));
@@ -4514,8 +4513,6 @@ static void lru_gen_age_node(struct pgli
VM_WARN_ON_ONCE(!current_is_kswapd());
- sc->last_reclaimed = sc->nr_reclaimed;
-
/* check the order to exclude compaction-induced reclaim */
if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
return;
@@ -5104,8 +5101,7 @@ static bool should_run_aging(struct lruv
* 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
* reclaim.
*/
-static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
- bool can_swap)
+static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool can_swap)
{
unsigned long nr_to_scan;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
@@ -5122,10 +5118,8 @@ static unsigned long get_nr_to_scan(stru
if (sc->priority == DEF_PRIORITY)
return nr_to_scan;
- try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
-
/* skip this lruvec as it's low on cold folios */
- return 0;
+ return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0;
}
static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5134,29 +5128,18 @@ static unsigned long get_nr_to_reclaim(s
if (!global_reclaim(sc))
return -1;
- /* discount the previous progress for kswapd */
- if (current_is_kswapd())
- return sc->nr_to_reclaim + sc->last_reclaimed;
-
return max(sc->nr_to_reclaim, compact_gap(sc->order));
}
-static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
- struct blk_plug plug;
+ long nr_to_scan;
unsigned long scanned = 0;
unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
- lru_add_drain();
-
- blk_start_plug(&plug);
-
- set_mm_walk(lruvec_pgdat(lruvec));
-
while (true) {
int delta;
int swappiness;
- unsigned long nr_to_scan;
if (sc->may_swap)
swappiness = get_swappiness(lruvec, sc);
@@ -5166,7 +5149,7 @@ static void lru_gen_shrink_lruvec(struct
swappiness = 0;
nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
- if (!nr_to_scan)
+ if (nr_to_scan <= 0)
break;
delta = evict_folios(lruvec, sc, swappiness);
@@ -5183,10 +5166,251 @@ static void lru_gen_shrink_lruvec(struct
cond_resched();
}
+ /* whether try_to_inc_max_seq() was successful */
+ return nr_to_scan < 0;
+}
+
+static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
+{
+ bool success;
+ unsigned long scanned = sc->nr_scanned;
+ unsigned long reclaimed = sc->nr_reclaimed;
+ int seg = lru_gen_memcg_seg(lruvec);
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (!lruvec_is_sizable(lruvec, sc))
+ return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
+
+ mem_cgroup_calculate_protection(NULL, memcg);
+
+ if (mem_cgroup_below_min(memcg))
+ return MEMCG_LRU_YOUNG;
+
+ if (mem_cgroup_below_low(memcg)) {
+ /* see the comment on MEMCG_NR_GENS */
+ if (seg != MEMCG_LRU_TAIL)
+ return MEMCG_LRU_TAIL;
+
+ memcg_memory_event(memcg, MEMCG_LOW);
+ }
+
+ success = try_to_shrink_lruvec(lruvec, sc);
+
+ shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
+
+ if (!sc->proactive)
+ vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
+ sc->nr_reclaimed - reclaimed);
+
+ sc->nr_reclaimed += current->reclaim_state->reclaimed_slab;
+ current->reclaim_state->reclaimed_slab = 0;
+
+ return success ? MEMCG_LRU_YOUNG : 0;
+}
+
+#ifdef CONFIG_MEMCG
+
+static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ int gen;
+ int bin;
+ int first_bin;
+ struct lruvec *lruvec;
+ struct lru_gen_folio *lrugen;
+ const struct hlist_nulls_node *pos;
+ int op = 0;
+ struct mem_cgroup *memcg = NULL;
+ unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
+
+ bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
+restart:
+ gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
+
+ rcu_read_lock();
+
+ hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][bin], list) {
+ if (op)
+ lru_gen_rotate_memcg(lruvec, op);
+
+ mem_cgroup_put(memcg);
+
+ lruvec = container_of(lrugen, struct lruvec, lrugen);
+ memcg = lruvec_memcg(lruvec);
+
+ if (!mem_cgroup_tryget(memcg)) {
+ op = 0;
+ memcg = NULL;
+ continue;
+ }
+
+ rcu_read_unlock();
+
+ op = shrink_one(lruvec, sc);
+
+ if (sc->nr_reclaimed >= nr_to_reclaim)
+ goto success;
+
+ rcu_read_lock();
+ }
+
+ rcu_read_unlock();
+
+ /* restart if raced with lru_gen_rotate_memcg() */
+ if (gen != get_nulls_value(pos))
+ goto restart;
+
+ /* try the rest of the bins of the current generation */
+ bin = get_memcg_bin(bin + 1);
+ if (bin != first_bin)
+ goto restart;
+success:
+ if (op)
+ lru_gen_rotate_memcg(lruvec, op);
+
+ mem_cgroup_put(memcg);
+}
+
+static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+{
+ struct blk_plug plug;
+
+ VM_WARN_ON_ONCE(global_reclaim(sc));
+
+ lru_add_drain();
+
+ blk_start_plug(&plug);
+
+ set_mm_walk(lruvec_pgdat(lruvec));
+
+ if (try_to_shrink_lruvec(lruvec, sc))
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
+
+ clear_mm_walk();
+
+ blk_finish_plug(&plug);
+}
+
+#else /* !CONFIG_MEMCG */
+
+static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ BUILD_BUG();
+}
+
+static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+{
+ BUILD_BUG();
+}
+
+#endif
+
+static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ int priority;
+ unsigned long reclaimable;
+ struct lruvec *lruvec = mem_cgroup_lruvec(NULL, pgdat);
+
+ if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
+ return;
+ /*
+ * Determine the initial priority based on ((total / MEMCG_NR_GENS) >>
+ * priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, where the
+ * estimated reclaimed_to_scanned_ratio = inactive / total.
+ */
+ reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
+ if (get_swappiness(lruvec, sc))
+ reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+ reclaimable /= MEMCG_NR_GENS;
+
+ /* round down reclaimable and round up sc->nr_to_reclaim */
+ priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
+
+ sc->priority = clamp(priority, 0, DEF_PRIORITY);
+}
+
+static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ struct blk_plug plug;
+ unsigned long reclaimed = sc->nr_reclaimed;
+
+ VM_WARN_ON_ONCE(!global_reclaim(sc));
+
+ lru_add_drain();
+
+ blk_start_plug(&plug);
+
+ set_mm_walk(pgdat);
+
+ set_initial_priority(pgdat, sc);
+
+ if (current_is_kswapd())
+ sc->nr_reclaimed = 0;
+
+ if (mem_cgroup_disabled())
+ shrink_one(&pgdat->__lruvec, sc);
+ else
+ shrink_many(pgdat, sc);
+
+ if (current_is_kswapd())
+ sc->nr_reclaimed += reclaimed;
+
clear_mm_walk();
blk_finish_plug(&plug);
+
+ /* kswapd should never fail */
+ pgdat->kswapd_failures = 0;
+}
+
+#ifdef CONFIG_MEMCG
+void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+ int seg;
+ int old, new;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ seg = 0;
+ new = old = lruvec->lrugen.gen;
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (op == MEMCG_LRU_HEAD)
+ seg = MEMCG_LRU_HEAD;
+ else if (op == MEMCG_LRU_TAIL)
+ seg = MEMCG_LRU_TAIL;
+ else if (op == MEMCG_LRU_OLD)
+ new = get_memcg_gen(pgdat->memcg_lru.seq);
+ else if (op == MEMCG_LRU_YOUNG)
+ new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
+ else
+ VM_WARN_ON_ONCE(true);
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+ if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
+ hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+ else
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+
+ pgdat->memcg_lru.nr_memcgs[old]--;
+ pgdat->memcg_lru.nr_memcgs[new]++;
+
+ lruvec->lrugen.gen = new;
+ WRITE_ONCE(lruvec->lrugen.seg, seg);
+
+ if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
}
+#endif
/******************************************************************************
* state change
@@ -5644,11 +5868,11 @@ static int run_cmd(char cmd, int memcg_i
if (!mem_cgroup_disabled()) {
rcu_read_lock();
+
memcg = mem_cgroup_from_id(memcg_id);
-#ifdef CONFIG_MEMCG
- if (memcg && !css_tryget(&memcg->css))
+ if (!mem_cgroup_tryget(memcg))
memcg = NULL;
-#endif
+
rcu_read_unlock();
if (!memcg)
@@ -5796,6 +6020,19 @@ void lru_gen_init_lruvec(struct lruvec *
}
#ifdef CONFIG_MEMCG
+
+void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+ int i, j;
+
+ spin_lock_init(&pgdat->memcg_lru.lock);
+
+ for (i = 0; i < MEMCG_NR_GENS; i++) {
+ for (j = 0; j < MEMCG_NR_BINS; j++)
+ INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i);
+ }
+}
+
void lru_gen_init_memcg(struct mem_cgroup *memcg)
{
INIT_LIST_HEAD(&memcg->mm_list.fifo);
@@ -5819,7 +6056,69 @@ void lru_gen_exit_memcg(struct mem_cgrou
}
}
}
-#endif
+
+void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = get_memcg_gen(pgdat->memcg_lru.seq);
+
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+ pgdat->memcg_lru.nr_memcgs[gen]++;
+
+ lruvec->lrugen.gen = gen;
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+ int nid;
+
+ for_each_node(nid) {
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
+ }
+}
+
+void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = lruvec->lrugen.gen;
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+ pgdat->memcg_lru.nr_memcgs[gen]--;
+
+ if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+#endif /* CONFIG_MEMCG */
static int __init init_lru_gen(void)
{
@@ -5846,6 +6145,10 @@ static void lru_gen_shrink_lruvec(struct
{
}
+static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
+{
+}
+
#endif /* CONFIG_LRU_GEN */
static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5859,7 +6162,7 @@ static void shrink_lruvec(struct lruvec
bool proportional_reclaim;
struct blk_plug plug;
- if (lru_gen_enabled()) {
+ if (lru_gen_enabled() && !global_reclaim(sc)) {
lru_gen_shrink_lruvec(lruvec, sc);
return;
}
@@ -6102,6 +6405,11 @@ static void shrink_node(pg_data_t *pgdat
struct lruvec *target_lruvec;
bool reclaimable = false;
+ if (lru_gen_enabled() && global_reclaim(sc)) {
+ lru_gen_shrink_node(pgdat, sc);
+ return;
+ }
+
target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
again:

View File

@ -0,0 +1,202 @@
From 11b14ee8cbbbebd8204609076a9327a1171cd253 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:05 -0700
Subject: [PATCH 07/19] BACKPORT: mm: multi-gen LRU: clarify scan_control flags
Among the flags in scan_control:
1. sc->may_swap, which indicates swap constraint due to memsw.max, is
supported as usual.
2. sc->proactive, which indicates reclaim by memory.reclaim, may not
opportunistically skip the aging path, since it is considered less
latency sensitive.
3. !(sc->gfp_mask & __GFP_IO), which indicates IO constraint, lowers
swappiness to prioritize file LRU, since clean file folios are more
likely to exist.
4. sc->may_writepage and sc->may_unmap, which indicates opportunistic
reclaim, are rejected, since unmapped clean folios are already
prioritized. Scanning for more of them is likely futile and can
cause high reclaim latency when there is a large number of memcgs.
The rest are handled by the existing code.
Link: https://lkml.kernel.org/r/20221222041905.2431096-8-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit e9d4e1ee788097484606c32122f146d802a9c5fb)
[TJ: Resolved conflict with older function signature for min_cgroup_below_min, and over
cdded861182142ac4488a4d64c571107aeb77f53 ("ANDROID: MGLRU: Don't skip anon reclaim if swap low")]
Change-Id: Ic2e779eaf4e91a3921831b4e2fa10c740dc59d50
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 55 +++++++++++++++++++++++++++--------------------------
1 file changed, 28 insertions(+), 27 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3185,6 +3185,9 @@ static int get_swappiness(struct lruvec
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+ if (!sc->may_swap)
+ return 0;
+
if (!can_demote(pgdat->node_id, sc) &&
mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
return 0;
@@ -4223,7 +4226,7 @@ static void walk_mm(struct lruvec *lruve
} while (err == -EAGAIN);
}
-static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat)
+static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat, bool force_alloc)
{
struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
@@ -4231,7 +4234,7 @@ static struct lru_gen_mm_walk *set_mm_wa
VM_WARN_ON_ONCE(walk);
walk = &pgdat->mm_walk;
- } else if (!pgdat && !walk) {
+ } else if (!walk && force_alloc) {
VM_WARN_ON_ONCE(current_is_kswapd());
walk = kzalloc(sizeof(*walk), __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
@@ -4417,7 +4420,7 @@ static bool try_to_inc_max_seq(struct lr
goto done;
}
- walk = set_mm_walk(NULL);
+ walk = set_mm_walk(NULL, true);
if (!walk) {
success = iterate_mm_list_nowalk(lruvec, max_seq);
goto done;
@@ -4486,8 +4489,6 @@ static bool lruvec_is_reclaimable(struct
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MIN_SEQ(lruvec);
- VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
-
/* see the comment on lru_gen_folio */
gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
@@ -4743,12 +4744,8 @@ static bool isolate_folio(struct lruvec
{
bool success;
- /* unmapping inhibited */
- if (!sc->may_unmap && folio_mapped(folio))
- return false;
-
/* swapping inhibited */
- if (!(sc->may_writepage && (sc->gfp_mask & __GFP_IO)) &&
+ if (!(sc->gfp_mask & __GFP_IO) &&
(folio_test_dirty(folio) ||
(folio_test_anon(folio) && !folio_test_swapcache(folio))))
return false;
@@ -4845,9 +4842,8 @@ static int scan_folios(struct lruvec *lr
__count_vm_events(PGSCAN_ANON + type, isolated);
/*
- * There might not be eligible pages due to reclaim_idx, may_unmap and
- * may_writepage. Check the remaining to prevent livelock if it's not
- * making progress.
+ * There might not be eligible folios due to reclaim_idx. Check the
+ * remaining to prevent livelock if it's not making progress.
*/
return isolated || !remaining ? scanned : 0;
}
@@ -5107,8 +5103,7 @@ static long get_nr_to_scan(struct lruvec
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MAX_SEQ(lruvec);
- if (mem_cgroup_below_min(memcg) ||
- (mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim))
+ if (mem_cgroup_below_min(memcg))
return 0;
if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
@@ -5136,17 +5131,14 @@ static bool try_to_shrink_lruvec(struct
long nr_to_scan;
unsigned long scanned = 0;
unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
+ int swappiness = get_swappiness(lruvec, sc);
+
+ /* clean file folios are more likely to exist */
+ if (swappiness && !(sc->gfp_mask & __GFP_IO))
+ swappiness = 1;
while (true) {
int delta;
- int swappiness;
-
- if (sc->may_swap)
- swappiness = get_swappiness(lruvec, sc);
- else if (!cgroup_reclaim(sc) && get_swappiness(lruvec, sc))
- swappiness = 1;
- else
- swappiness = 0;
nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
if (nr_to_scan <= 0)
@@ -5277,12 +5269,13 @@ static void lru_gen_shrink_lruvec(struct
struct blk_plug plug;
VM_WARN_ON_ONCE(global_reclaim(sc));
+ VM_WARN_ON_ONCE(!sc->may_writepage || !sc->may_unmap);
lru_add_drain();
blk_start_plug(&plug);
- set_mm_walk(lruvec_pgdat(lruvec));
+ set_mm_walk(NULL, sc->proactive);
if (try_to_shrink_lruvec(lruvec, sc))
lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
@@ -5338,11 +5331,19 @@ static void lru_gen_shrink_node(struct p
VM_WARN_ON_ONCE(!global_reclaim(sc));
+ /*
+ * Unmapped clean folios are already prioritized. Scanning for more of
+ * them is likely futile and can cause high reclaim latency when there
+ * is a large number of memcgs.
+ */
+ if (!sc->may_writepage || !sc->may_unmap)
+ goto done;
+
lru_add_drain();
blk_start_plug(&plug);
- set_mm_walk(pgdat);
+ set_mm_walk(pgdat, sc->proactive);
set_initial_priority(pgdat, sc);
@@ -5360,7 +5361,7 @@ static void lru_gen_shrink_node(struct p
clear_mm_walk();
blk_finish_plug(&plug);
-
+done:
/* kswapd should never fail */
pgdat->kswapd_failures = 0;
}
@@ -5932,7 +5933,7 @@ static ssize_t lru_gen_seq_write(struct
set_task_reclaim_state(current, &sc.reclaim_state);
flags = memalloc_noreclaim_save();
blk_start_plug(&plug);
- if (!set_mm_walk(NULL)) {
+ if (!set_mm_walk(NULL, true)) {
err = -ENOMEM;
goto done;
}

View File

@ -0,0 +1,38 @@
From 25887d48dff860751a06caa4188bfaf6bfb6e4b2 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Wed, 21 Dec 2022 21:19:06 -0700
Subject: [PATCH 08/19] UPSTREAM: mm: multi-gen LRU: simplify
arch_has_hw_pte_young() check
Scanning page tables when hardware does not set the accessed bit has
no real use cases.
Link: https://lkml.kernel.org/r/20221222041905.2431096-9-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit f386e9314025ea99dae639ed2032560a92081430)
Change-Id: I84d97ab665b4e3bb862a9bc7d72f50dea7191a6b
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4415,7 +4415,7 @@ static bool try_to_inc_max_seq(struct lr
* handful of PTEs. Spreading the work out over a period of time usually
* is less efficient, but it avoids bursty page faults.
*/
- if (!force_scan && !(arch_has_hw_pte_young() && get_cap(LRU_GEN_MM_WALK))) {
+ if (!arch_has_hw_pte_young() || !get_cap(LRU_GEN_MM_WALK)) {
success = iterate_mm_list_nowalk(lruvec, max_seq);
goto done;
}

View File

@ -0,0 +1,92 @@
From 620b0ee94455e48d124414cd06d8a53f69fb6453 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Mon, 13 Feb 2023 00:53:22 -0700
Subject: [PATCH 09/19] UPSTREAM: mm: multi-gen LRU: avoid futile retries
Recall that the per-node memcg LRU has two generations and they alternate
when the last memcg (of a given node) is moved from one to the other.
Each generation is also sharded into multiple bins to improve scalability.
A reclaimer starts with a random bin (in the old generation) and, if it
fails, it will retry, i.e., to try the rest of the bins.
If a reclaimer fails with the last memcg, it should move this memcg to the
young generation first, which causes the generations to alternate, and
then retry. Otherwise, the retries will be futile because all other bins
are empty.
Link: https://lkml.kernel.org/r/20230213075322.1416966-1-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bug: 274865848
(cherry picked from commit 9f550d78b40da21b4da515db4c37d8d7b12aa1a6)
Change-Id: Ie92535676b005ec9e7987632b742fdde8d54436f
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5206,18 +5206,20 @@ static int shrink_one(struct lruvec *lru
static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
{
+ int op;
int gen;
int bin;
int first_bin;
struct lruvec *lruvec;
struct lru_gen_folio *lrugen;
+ struct mem_cgroup *memcg;
const struct hlist_nulls_node *pos;
- int op = 0;
- struct mem_cgroup *memcg = NULL;
unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
restart:
+ op = 0;
+ memcg = NULL;
gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
rcu_read_lock();
@@ -5241,14 +5243,22 @@ restart:
op = shrink_one(lruvec, sc);
- if (sc->nr_reclaimed >= nr_to_reclaim)
- goto success;
-
rcu_read_lock();
+
+ if (sc->nr_reclaimed >= nr_to_reclaim)
+ break;
}
rcu_read_unlock();
+ if (op)
+ lru_gen_rotate_memcg(lruvec, op);
+
+ mem_cgroup_put(memcg);
+
+ if (sc->nr_reclaimed >= nr_to_reclaim)
+ return;
+
/* restart if raced with lru_gen_rotate_memcg() */
if (gen != get_nulls_value(pos))
goto restart;
@@ -5257,11 +5267,6 @@ restart:
bin = get_memcg_bin(bin + 1);
if (bin != first_bin)
goto restart;
-success:
- if (op)
- lru_gen_rotate_memcg(lruvec, op);
-
- mem_cgroup_put(memcg);
}
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)

View File

@ -0,0 +1,191 @@
From 70d216c71ff5c5b17dd1da6294f97b91fb6aba7a Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Fri, 30 Dec 2022 14:52:51 -0700
Subject: [PATCH 10/19] UPSTREAM: mm: add vma_has_recency()
Add vma_has_recency() to indicate whether a VMA may exhibit temporal
locality that the LRU algorithm relies on.
This function returns false for VMAs marked by VM_SEQ_READ or
VM_RAND_READ. While the former flag indicates linear access, i.e., a
special case of spatial locality, both flags indicate a lack of temporal
locality, i.e., the reuse of an area within a relatively small duration.
"Recency" is chosen over "locality" to avoid confusion between temporal
and spatial localities.
Before this patch, the active/inactive LRU only ignored the accessed bit
from VMAs marked by VM_SEQ_READ. After this patch, the active/inactive
LRU and MGLRU share the same logic: they both ignore the accessed bit if
vma_has_recency() returns false.
For the active/inactive LRU, the following fio test showed a [6, 8]%
increase in IOPS when randomly accessing mapped files under memory
pressure.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fio --name=test --directory=/mnt/ --ioengine=mmap --numjobs=8 \
--size=8G --rw=randrw --time_based --runtime=10m \
--group_reporting
The discussion that led to this patch is here [1]. Additional test
results are available in that thread.
[1] https://lore.kernel.org/r/Y31s%2FK8T85jh05wH@google.com/
Link: https://lkml.kernel.org/r/20221230215252.2628425-1-yuzhao@google.com
Change-Id: I291dcb795197659e40e46539cd32b857677c34ad
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8788f6781486769d9598dcaedc3fe0eb12fc3e59)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
include/linux/mm_inline.h | 8 ++++++++
mm/memory.c | 7 +++----
mm/rmap.c | 42 +++++++++++++++++----------------------
mm/vmscan.c | 5 ++++-
4 files changed, 33 insertions(+), 29 deletions(-)
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -595,4 +595,12 @@ pte_install_uffd_wp_if_needed(struct vm_
#endif
}
+static inline bool vma_has_recency(struct vm_area_struct *vma)
+{
+ if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
+ return false;
+
+ return true;
+}
+
#endif
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1435,8 +1435,7 @@ again:
force_flush = 1;
set_page_dirty(page);
}
- if (pte_young(ptent) &&
- likely(!(vma->vm_flags & VM_SEQ_READ)))
+ if (pte_young(ptent) && likely(vma_has_recency(vma)))
mark_page_accessed(page);
}
rss[mm_counter(page)]--;
@@ -5170,8 +5169,8 @@ static inline void mm_account_fault(stru
#ifdef CONFIG_LRU_GEN
static void lru_gen_enter_fault(struct vm_area_struct *vma)
{
- /* the LRU algorithm doesn't apply to sequential or random reads */
- current->in_lru_fault = !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ));
+ /* the LRU algorithm only applies to accesses with recency */
+ current->in_lru_fault = vma_has_recency(vma);
}
static void lru_gen_exit_fault(void)
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -823,25 +823,14 @@ static bool folio_referenced_one(struct
}
if (pvmw.pte) {
- if (lru_gen_enabled() && pte_young(*pvmw.pte) &&
- !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))) {
+ if (lru_gen_enabled() && pte_young(*pvmw.pte)) {
lru_gen_look_around(&pvmw);
referenced++;
}
if (ptep_clear_flush_young_notify(vma, address,
- pvmw.pte)) {
- /*
- * Don't treat a reference through
- * a sequentially read mapping as such.
- * If the folio has been used in another mapping,
- * we will catch it; if this other mapping is
- * already gone, the unmap path will have set
- * the referenced flag or activated the folio.
- */
- if (likely(!(vma->vm_flags & VM_SEQ_READ)))
- referenced++;
- }
+ pvmw.pte))
+ referenced++;
} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
if (pmdp_clear_flush_young_notify(vma, address,
pvmw.pmd))
@@ -875,7 +864,20 @@ static bool invalid_folio_referenced_vma
struct folio_referenced_arg *pra = arg;
struct mem_cgroup *memcg = pra->memcg;
- if (!mm_match_cgroup(vma->vm_mm, memcg))
+ /*
+ * Ignore references from this mapping if it has no recency. If the
+ * folio has been used in another mapping, we will catch it; if this
+ * other mapping is already gone, the unmap path will have set the
+ * referenced flag or activated the folio in zap_pte_range().
+ */
+ if (!vma_has_recency(vma))
+ return true;
+
+ /*
+ * If we are reclaiming on behalf of a cgroup, skip counting on behalf
+ * of references from different cgroups.
+ */
+ if (memcg && !mm_match_cgroup(vma->vm_mm, memcg))
return true;
return false;
@@ -906,6 +908,7 @@ int folio_referenced(struct folio *folio
.arg = (void *)&pra,
.anon_lock = folio_lock_anon_vma_read,
.try_lock = true,
+ .invalid_vma = invalid_folio_referenced_vma,
};
*vm_flags = 0;
@@ -921,15 +924,6 @@ int folio_referenced(struct folio *folio
return 1;
}
- /*
- * If we are reclaiming on behalf of a cgroup, skip
- * counting on behalf of references from different
- * cgroups
- */
- if (memcg) {
- rwc.invalid_vma = invalid_folio_referenced_vma;
- }
-
rmap_walk(folio, &rwc);
*vm_flags = pra.vm_flags;
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3778,7 +3778,10 @@ static int should_skip_vma(unsigned long
if (is_vm_hugetlb_page(vma))
return true;
- if (vma->vm_flags & (VM_LOCKED | VM_SPECIAL | VM_SEQ_READ | VM_RAND_READ))
+ if (!vma_has_recency(vma))
+ return true;
+
+ if (vma->vm_flags & (VM_LOCKED | VM_SPECIAL))
return true;
if (vma == get_gate_vma(vma->vm_mm))

View File

@ -0,0 +1,129 @@
From 9ca4e437a24dfc4ec6c362f319eb9850b9eca497 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Fri, 30 Dec 2022 14:52:52 -0700
Subject: [PATCH 11/19] UPSTREAM: mm: support POSIX_FADV_NOREUSE
This patch adds POSIX_FADV_NOREUSE to vma_has_recency() so that the LRU
algorithm can ignore access to mapped files marked by this flag.
The advantages of POSIX_FADV_NOREUSE are:
1. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not alter the
default readahead behavior.
2. Unlike MADV_SEQUENTIAL and MADV_RANDOM, it does not split VMAs and
therefore does not take mmap_lock.
3. Unlike MADV_COLD, setting it has a negligible cost, regardless of
how many pages it affects.
Its limitations are:
1. Like POSIX_FADV_RANDOM and POSIX_FADV_SEQUENTIAL, it currently does
not support range. IOW, its scope is the entire file.
2. It currently does not ignore access through file descriptors.
Specifically, for the active/inactive LRU, given a file page shared
by two users and one of them having set POSIX_FADV_NOREUSE on the
file, this page will be activated upon the second user accessing
it. This corner case can be covered by checking POSIX_FADV_NOREUSE
before calling folio_mark_accessed() on the read path. But it is
considered not worth the effort.
There have been a few attempts to support POSIX_FADV_NOREUSE, e.g., [1].
This time the goal is to fill a niche: a few desktop applications, e.g.,
large file transferring and video encoding/decoding, want fast file
streaming with mmap() rather than direct IO. Among those applications, an
SVT-AV1 regression was reported when running with MGLRU [2]. The
following test can reproduce that regression.
kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
kb=$((kb - 8*1024*1024))
modprobe brd rd_nr=1 rd_size=$kb
dd if=/dev/zero of=/dev/ram0 bs=1M
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt/
swapoff -a
fallocate -l 8G /mnt/swapfile
mkswap /mnt/swapfile
swapon /mnt/swapfile
wget http://ultravideo.cs.tut.fi/video/Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
7z e -o/mnt/ Bosphorus_3840x2160_120fps_420_8bit_YUV_Y4M.7z
SvtAv1EncApp --preset 12 -w 3840 -h 2160 \
-i /mnt/Bosphorus_3840x2160.y4m
For MGLRU, the following change showed a [9-11]% increase in FPS,
which makes it on par with the active/inactive LRU.
patch Source/App/EncApp/EbAppMain.c <<EOF
31a32
> #include <fcntl.h>
35d35
< #include <fcntl.h> /* _O_BINARY */
117a118
> posix_fadvise(config->mmap.fd, 0, 0, POSIX_FADV_NOREUSE);
EOF
[1] https://lore.kernel.org/r/1308923350-7932-1-git-send-email-andrea@betterlinux.com/
[2] https://openbenchmarking.org/result/2209259-PTS-MGLRU8GB57
Link: https://lkml.kernel.org/r/20221230215252.2628425-2-yuzhao@google.com
Change-Id: I0b7f5f971d78014ea1ba44cee6a8ec902a4330d0
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 17e810229cb3068b692fa078bd9b3a6527e0866a)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
include/linux/fs.h | 2 ++
include/linux/mm_inline.h | 3 +++
mm/fadvise.c | 5 ++++-
3 files changed, 9 insertions(+), 1 deletion(-)
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -166,6 +166,8 @@ typedef int (dio_iodone_t)(struct kiocb
/* File supports DIRECT IO */
#define FMODE_CAN_ODIRECT ((__force fmode_t)0x400000)
+#define FMODE_NOREUSE ((__force fmode_t)0x800000)
+
/* File was opened by fanotify and shouldn't generate fanotify events */
#define FMODE_NONOTIFY ((__force fmode_t)0x4000000)
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -600,6 +600,9 @@ static inline bool vma_has_recency(struc
if (vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))
return false;
+ if (vma->vm_file && (vma->vm_file->f_mode & FMODE_NOREUSE))
+ return false;
+
return true;
}
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -80,7 +80,7 @@ int generic_fadvise(struct file *file, l
case POSIX_FADV_NORMAL:
file->f_ra.ra_pages = bdi->ra_pages;
spin_lock(&file->f_lock);
- file->f_mode &= ~FMODE_RANDOM;
+ file->f_mode &= ~(FMODE_RANDOM | FMODE_NOREUSE);
spin_unlock(&file->f_lock);
break;
case POSIX_FADV_RANDOM:
@@ -107,6 +107,9 @@ int generic_fadvise(struct file *file, l
force_page_cache_readahead(mapping, file, start_index, nrpages);
break;
case POSIX_FADV_NOREUSE:
+ spin_lock(&file->f_lock);
+ file->f_mode |= FMODE_NOREUSE;
+ spin_unlock(&file->f_lock);
break;
case POSIX_FADV_DONTNEED:
__filemap_fdatawrite_range(mapping, offset, endbyte,

View File

@ -0,0 +1,67 @@
From 1b5e4c317d80f4826eceb3781702d18d06b14394 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:21 +0000
Subject: [PATCH 12/19] UPSTREAM: mm: multi-gen LRU: section for working set
protection
Patch series "mm: multi-gen LRU: improve".
This patch series improves a few MGLRU functions, collects related
functions, and adds additional documentation.
This patch (of 7):
Add a section for working set protection in the code and the design doc.
The admin doc already contains its usage.
Link: https://lkml.kernel.org/r/20230118001827.1040870-1-talumbau@google.com
Link: https://lkml.kernel.org/r/20230118001827.1040870-2-talumbau@google.com
Change-Id: I65599075fd42951db7739a2ab7cee78516e157b3
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7b8144e63d84716f16a1b929e0c7e03ae5c4d5c1)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 15 +++++++++++++++
mm/vmscan.c | 4 ++++
2 files changed, 19 insertions(+)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -141,6 +141,21 @@ loop has detected outlying refaults from
this end, the feedback loop uses the first tier as the baseline, for
the reason stated earlier.
+Working set protection
+----------------------
+Each generation is timestamped at birth. If ``lru_gen_min_ttl`` is
+set, an ``lruvec`` is protected from the eviction when its oldest
+generation was born within ``lru_gen_min_ttl`` milliseconds. In other
+words, it prevents the working set of ``lru_gen_min_ttl`` milliseconds
+from getting evicted. The OOM killer is triggered if this working set
+cannot be kept in memory.
+
+This time-based approach has the following advantages:
+
+1. It is easier to configure because it is agnostic to applications
+ and memory sizes.
+2. It is more reliable because it is directly wired to the OOM killer.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4459,6 +4459,10 @@ done:
return true;
}
+/******************************************************************************
+ * working set protection
+ ******************************************************************************/
+
static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
int gen, type, zone;

View File

@ -0,0 +1,57 @@
From 5ddf9d53d375e42af49b744bd7c2f8247c6bce15 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:22 +0000
Subject: [PATCH 13/19] UPSTREAM: mm: multi-gen LRU: section for rmap/PT walk
feedback
Add a section for lru_gen_look_around() in the code and the design doc.
Link: https://lkml.kernel.org/r/20230118001827.1040870-3-talumbau@google.com
Change-Id: I5097af63f61b3b69ec2abee6cdbdc33c296df213
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit db19a43d9b3a8876552f00f656008206ef9a5efa)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 14 ++++++++++++++
mm/vmscan.c | 4 ++++
2 files changed, 18 insertions(+)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -156,6 +156,20 @@ This time-based approach has the followi
and memory sizes.
2. It is more reliable because it is directly wired to the OOM killer.
+Rmap/PT walk feedback
+---------------------
+Searching the rmap for PTEs mapping each page on an LRU list (to test
+and clear the accessed bit) can be expensive because pages from
+different VMAs (PA space) are not cache friendly to the rmap (VA
+space). For workloads mostly using mapped pages, searching the rmap
+can incur the highest CPU cost in the reclaim path.
+
+``lru_gen_look_around()`` exploits spatial locality to reduce the
+trips into the rmap. It scans the adjacent PTEs of a young PTE and
+promotes hot pages. If the scan was done cacheline efficiently, it
+adds the PMD entry pointing to the PTE table to the Bloom filter. This
+forms a feedback loop between the eviction and the aging.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4553,6 +4553,10 @@ static void lru_gen_age_node(struct pgli
}
}
+/******************************************************************************
+ * rmap/PT walk feedback
+ ******************************************************************************/
+
/*
* This function exploits spatial locality when shrink_folio_list() walks the
* rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If

View File

@ -0,0 +1,243 @@
From 397624e12244ec038f51cb1f178ccb7a2ec562e5 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:23 +0000
Subject: [PATCH 14/19] UPSTREAM: mm: multi-gen LRU: section for Bloom filters
Move Bloom filters code into a dedicated section. Improve the design doc
to explain Bloom filter usage and connection between aging and eviction in
their use.
Link: https://lkml.kernel.org/r/20230118001827.1040870-4-talumbau@google.com
Change-Id: I73e866f687c1ed9f5c8538086aa39408b79897db
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ccbbbb85945d8f0255aa9dbc1b617017e2294f2c)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 16 +++
mm/vmscan.c | 180 +++++++++++++++---------------
2 files changed, 108 insertions(+), 88 deletions(-)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -170,6 +170,22 @@ promotes hot pages. If the scan was done
adds the PMD entry pointing to the PTE table to the Bloom filter. This
forms a feedback loop between the eviction and the aging.
+Bloom Filters
+-------------
+Bloom filters are a space and memory efficient data structure for set
+membership test, i.e., test if an element is not in the set or may be
+in the set.
+
+In the eviction path, specifically, in ``lru_gen_look_around()``, if a
+PMD has a sufficient number of hot pages, its address is placed in the
+filter. In the aging path, set membership means that the PTE range
+will be scanned for young pages.
+
+Note that Bloom filters are probabilistic on set membership. If a test
+is false positive, the cost is an additional scan of a range of PTEs,
+which may yield hot pages anyway. Parameters of the filter itself can
+control the false positive rate in the limit.
+
Summary
-------
The multi-gen LRU can be disassembled into the following parts:
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3209,6 +3209,98 @@ static bool __maybe_unused seq_is_valid(
}
/******************************************************************************
+ * Bloom filters
+ ******************************************************************************/
+
+/*
+ * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when
+ * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of
+ * bits in a bitmap, k is the number of hash functions and n is the number of
+ * inserted items.
+ *
+ * Page table walkers use one of the two filters to reduce their search space.
+ * To get rid of non-leaf entries that no longer have enough leaf entries, the
+ * aging uses the double-buffering technique to flip to the other filter each
+ * time it produces a new generation. For non-leaf entries that have enough
+ * leaf entries, the aging carries them over to the next generation in
+ * walk_pmd_range(); the eviction also report them when walking the rmap
+ * in lru_gen_look_around().
+ *
+ * For future optimizations:
+ * 1. It's not necessary to keep both filters all the time. The spare one can be
+ * freed after the RCU grace period and reallocated if needed again.
+ * 2. And when reallocating, it's worth scaling its size according to the number
+ * of inserted entries in the other filter, to reduce the memory overhead on
+ * small systems and false positives on large systems.
+ * 3. Jenkins' hash function is an alternative to Knuth's.
+ */
+#define BLOOM_FILTER_SHIFT 15
+
+static inline int filter_gen_from_seq(unsigned long seq)
+{
+ return seq % NR_BLOOM_FILTERS;
+}
+
+static void get_item_key(void *item, int *key)
+{
+ u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2);
+
+ BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32));
+
+ key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1);
+ key[1] = hash >> BLOOM_FILTER_SHIFT;
+}
+
+static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+{
+ int key[2];
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+ if (!filter)
+ return true;
+
+ get_item_key(item, key);
+
+ return test_bit(key[0], filter) && test_bit(key[1], filter);
+}
+
+static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
+{
+ int key[2];
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = READ_ONCE(lruvec->mm_state.filters[gen]);
+ if (!filter)
+ return;
+
+ get_item_key(item, key);
+
+ if (!test_bit(key[0], filter))
+ set_bit(key[0], filter);
+ if (!test_bit(key[1], filter))
+ set_bit(key[1], filter);
+}
+
+static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
+{
+ unsigned long *filter;
+ int gen = filter_gen_from_seq(seq);
+
+ filter = lruvec->mm_state.filters[gen];
+ if (filter) {
+ bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
+ return;
+ }
+
+ filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
+ __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
+ WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
+}
+
+/******************************************************************************
* mm_struct list
******************************************************************************/
@@ -3333,94 +3425,6 @@ void lru_gen_migrate_mm(struct mm_struct
}
#endif
-/*
- * Bloom filters with m=1<<15, k=2 and the false positive rates of ~1/5 when
- * n=10,000 and ~1/2 when n=20,000, where, conventionally, m is the number of
- * bits in a bitmap, k is the number of hash functions and n is the number of
- * inserted items.
- *
- * Page table walkers use one of the two filters to reduce their search space.
- * To get rid of non-leaf entries that no longer have enough leaf entries, the
- * aging uses the double-buffering technique to flip to the other filter each
- * time it produces a new generation. For non-leaf entries that have enough
- * leaf entries, the aging carries them over to the next generation in
- * walk_pmd_range(); the eviction also report them when walking the rmap
- * in lru_gen_look_around().
- *
- * For future optimizations:
- * 1. It's not necessary to keep both filters all the time. The spare one can be
- * freed after the RCU grace period and reallocated if needed again.
- * 2. And when reallocating, it's worth scaling its size according to the number
- * of inserted entries in the other filter, to reduce the memory overhead on
- * small systems and false positives on large systems.
- * 3. Jenkins' hash function is an alternative to Knuth's.
- */
-#define BLOOM_FILTER_SHIFT 15
-
-static inline int filter_gen_from_seq(unsigned long seq)
-{
- return seq % NR_BLOOM_FILTERS;
-}
-
-static void get_item_key(void *item, int *key)
-{
- u32 hash = hash_ptr(item, BLOOM_FILTER_SHIFT * 2);
-
- BUILD_BUG_ON(BLOOM_FILTER_SHIFT * 2 > BITS_PER_TYPE(u32));
-
- key[0] = hash & (BIT(BLOOM_FILTER_SHIFT) - 1);
- key[1] = hash >> BLOOM_FILTER_SHIFT;
-}
-
-static void reset_bloom_filter(struct lruvec *lruvec, unsigned long seq)
-{
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = lruvec->mm_state.filters[gen];
- if (filter) {
- bitmap_clear(filter, 0, BIT(BLOOM_FILTER_SHIFT));
- return;
- }
-
- filter = bitmap_zalloc(BIT(BLOOM_FILTER_SHIFT),
- __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN);
- WRITE_ONCE(lruvec->mm_state.filters[gen], filter);
-}
-
-static void update_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
-{
- int key[2];
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = READ_ONCE(lruvec->mm_state.filters[gen]);
- if (!filter)
- return;
-
- get_item_key(item, key);
-
- if (!test_bit(key[0], filter))
- set_bit(key[0], filter);
- if (!test_bit(key[1], filter))
- set_bit(key[1], filter);
-}
-
-static bool test_bloom_filter(struct lruvec *lruvec, unsigned long seq, void *item)
-{
- int key[2];
- unsigned long *filter;
- int gen = filter_gen_from_seq(seq);
-
- filter = READ_ONCE(lruvec->mm_state.filters[gen]);
- if (!filter)
- return true;
-
- get_item_key(item, key);
-
- return test_bit(key[0], filter) && test_bit(key[1], filter);
-}
-
static void reset_mm_stats(struct lruvec *lruvec, struct lru_gen_mm_walk *walk, bool last)
{
int i;

View File

@ -0,0 +1,427 @@
From 48c916b812652f9453be5bd45a703728926d41ca Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:24 +0000
Subject: [PATCH 15/19] UPSTREAM: mm: multi-gen LRU: section for memcg LRU
Move memcg LRU code into a dedicated section. Improve the design doc to
outline its architecture.
Link: https://lkml.kernel.org/r/20230118001827.1040870-5-talumbau@google.com
Change-Id: Id252e420cff7a858acb098cf2b3642da5c40f602
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 36c7b4db7c942ae9e1b111f0c6b468c8b2e33842)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
Documentation/mm/multigen_lru.rst | 33 +++-
include/linux/mm_inline.h | 17 --
include/linux/mmzone.h | 13 +-
mm/memcontrol.c | 8 +-
mm/vmscan.c | 250 +++++++++++++++++-------------
5 files changed, 178 insertions(+), 143 deletions(-)
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -186,9 +186,40 @@ is false positive, the cost is an additi
which may yield hot pages anyway. Parameters of the filter itself can
control the false positive rate in the limit.
+Memcg LRU
+---------
+An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
+since each node and memcg combination has an LRU of folios (see
+``mem_cgroup_lruvec()``). Its goal is to improve the scalability of
+global reclaim, which is critical to system-wide memory overcommit in
+data centers. Note that memcg LRU only applies to global reclaim.
+
+The basic structure of an memcg LRU can be understood by an analogy to
+the active/inactive LRU (of folios):
+
+1. It has the young and the old (generations), i.e., the counterparts
+ to the active and the inactive;
+2. The increment of ``max_seq`` triggers promotion, i.e., the
+ counterpart to activation;
+3. Other events trigger similar operations, e.g., offlining an memcg
+ triggers demotion, i.e., the counterpart to deactivation.
+
+In terms of global reclaim, it has two distinct features:
+
+1. Sharding, which allows each thread to start at a random memcg (in
+ the old generation) and improves parallelism;
+2. Eventual fairness, which allows direct reclaim to bail out at will
+ and reduces latency without affecting fairness over some time.
+
+In terms of traversing memcgs during global reclaim, it improves the
+best-case complexity from O(n) to O(1) and does not affect the
+worst-case complexity O(n). Therefore, on average, it has a sublinear
+complexity.
+
Summary
-------
-The multi-gen LRU can be disassembled into the following parts:
+The multi-gen LRU (of folios) can be disassembled into the following
+parts:
* Generations
* Rmap walks
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -122,18 +122,6 @@ static inline bool lru_gen_in_fault(void
return current->in_lru_fault;
}
-#ifdef CONFIG_MEMCG
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return READ_ONCE(lruvec->lrugen.seg);
-}
-#else
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return 0;
-}
-#endif
-
static inline int lru_gen_from_seq(unsigned long seq)
{
return seq % MAX_NR_GENS;
@@ -309,11 +297,6 @@ static inline bool lru_gen_in_fault(void
return false;
}
-static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
-{
- return 0;
-}
-
static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
{
return false;
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -368,15 +368,6 @@ struct page_vma_mapped_walk;
#define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
#define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
-/* see the comment on MEMCG_NR_GENS */
-enum {
- MEMCG_LRU_NOP,
- MEMCG_LRU_HEAD,
- MEMCG_LRU_TAIL,
- MEMCG_LRU_OLD,
- MEMCG_LRU_YOUNG,
-};
-
#ifdef CONFIG_LRU_GEN
enum {
@@ -557,7 +548,7 @@ void lru_gen_exit_memcg(struct mem_cgrou
void lru_gen_online_memcg(struct mem_cgroup *memcg);
void lru_gen_offline_memcg(struct mem_cgroup *memcg);
void lru_gen_release_memcg(struct mem_cgroup *memcg);
-void lru_gen_rotate_memcg(struct lruvec *lruvec, int op);
+void lru_gen_soft_reclaim(struct lruvec *lruvec);
#else /* !CONFIG_MEMCG */
@@ -608,7 +599,7 @@ static inline void lru_gen_release_memcg
{
}
-static inline void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+static inline void lru_gen_soft_reclaim(struct lruvec *lruvec)
{
}
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -478,12 +478,8 @@ static void mem_cgroup_update_tree(struc
struct mem_cgroup_tree_per_node *mctz;
if (lru_gen_enabled()) {
- struct lruvec *lruvec = &memcg->nodeinfo[nid]->lruvec;
-
- /* see the comment on MEMCG_NR_GENS */
- if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
-
+ if (soft_limit_excess(memcg))
+ lru_gen_soft_reclaim(&memcg->nodeinfo[nid]->lruvec);
return;
}
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4690,6 +4690,148 @@ void lru_gen_look_around(struct page_vma
}
/******************************************************************************
+ * memcg LRU
+ ******************************************************************************/
+
+/* see the comment on MEMCG_NR_GENS */
+enum {
+ MEMCG_LRU_NOP,
+ MEMCG_LRU_HEAD,
+ MEMCG_LRU_TAIL,
+ MEMCG_LRU_OLD,
+ MEMCG_LRU_YOUNG,
+};
+
+#ifdef CONFIG_MEMCG
+
+static int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return READ_ONCE(lruvec->lrugen.seg);
+}
+
+static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+ int seg;
+ int old, new;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ seg = 0;
+ new = old = lruvec->lrugen.gen;
+
+ /* see the comment on MEMCG_NR_GENS */
+ if (op == MEMCG_LRU_HEAD)
+ seg = MEMCG_LRU_HEAD;
+ else if (op == MEMCG_LRU_TAIL)
+ seg = MEMCG_LRU_TAIL;
+ else if (op == MEMCG_LRU_OLD)
+ new = get_memcg_gen(pgdat->memcg_lru.seq);
+ else if (op == MEMCG_LRU_YOUNG)
+ new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
+ else
+ VM_WARN_ON_ONCE(true);
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+ if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
+ hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+ else
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
+
+ pgdat->memcg_lru.nr_memcgs[old]--;
+ pgdat->memcg_lru.nr_memcgs[new]++;
+
+ lruvec->lrugen.gen = new;
+ WRITE_ONCE(lruvec->lrugen.seg, seg);
+
+ if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+}
+
+void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+ int bin = get_random_u32_below(MEMCG_NR_BINS);
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = get_memcg_gen(pgdat->memcg_lru.seq);
+
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+ pgdat->memcg_lru.nr_memcgs[gen]++;
+
+ lruvec->lrugen.gen = gen;
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+ int nid;
+
+ for_each_node(nid) {
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
+ }
+}
+
+void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+ int gen;
+ int nid;
+
+ for_each_node(nid) {
+ struct pglist_data *pgdat = NODE_DATA(nid);
+ struct lruvec *lruvec = get_lruvec(memcg, nid);
+
+ spin_lock(&pgdat->memcg_lru.lock);
+
+ VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+ gen = lruvec->lrugen.gen;
+
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+ pgdat->memcg_lru.nr_memcgs[gen]--;
+
+ if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+ spin_unlock(&pgdat->memcg_lru.lock);
+ }
+}
+
+void lru_gen_soft_reclaim(struct lruvec *lruvec)
+{
+ /* see the comment on MEMCG_NR_GENS */
+ if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_HEAD)
+ lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
+}
+
+#else /* !CONFIG_MEMCG */
+
+static int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+ return 0;
+}
+
+#endif
+
+/******************************************************************************
* the eviction
******************************************************************************/
@@ -5386,53 +5528,6 @@ done:
pgdat->kswapd_failures = 0;
}
-#ifdef CONFIG_MEMCG
-void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
-{
- int seg;
- int old, new;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
- struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- seg = 0;
- new = old = lruvec->lrugen.gen;
-
- /* see the comment on MEMCG_NR_GENS */
- if (op == MEMCG_LRU_HEAD)
- seg = MEMCG_LRU_HEAD;
- else if (op == MEMCG_LRU_TAIL)
- seg = MEMCG_LRU_TAIL;
- else if (op == MEMCG_LRU_OLD)
- new = get_memcg_gen(pgdat->memcg_lru.seq);
- else if (op == MEMCG_LRU_YOUNG)
- new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
- else
- VM_WARN_ON_ONCE(true);
-
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
-
- if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
- hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
- else
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-
- pgdat->memcg_lru.nr_memcgs[old]--;
- pgdat->memcg_lru.nr_memcgs[new]++;
-
- lruvec->lrugen.gen = new;
- WRITE_ONCE(lruvec->lrugen.seg, seg);
-
- if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
- spin_unlock(&pgdat->memcg_lru.lock);
-}
-#endif
-
/******************************************************************************
* state change
******************************************************************************/
@@ -6078,67 +6173,6 @@ void lru_gen_exit_memcg(struct mem_cgrou
}
}
-void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- gen = get_memcg_gen(pgdat->memcg_lru.seq);
-
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
- pgdat->memcg_lru.nr_memcgs[gen]++;
-
- lruvec->lrugen.gen = gen;
-
- spin_unlock(&pgdat->memcg_lru.lock);
- }
-}
-
-void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
- int nid;
-
- for_each_node(nid) {
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
- }
-}
-
-void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- gen = lruvec->lrugen.gen;
-
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
- pgdat->memcg_lru.nr_memcgs[gen]--;
-
- if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
- spin_unlock(&pgdat->memcg_lru.lock);
- }
-}
-
#endif /* CONFIG_MEMCG */
static int __init init_lru_gen(void)

View File

@ -0,0 +1,40 @@
From bec433f29537652ed054148edfd7e2183ddcf7c3 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:25 +0000
Subject: [PATCH 16/19] UPSTREAM: mm: multi-gen LRU: improve
lru_gen_exit_memcg()
Add warnings and poison ->next.
Link: https://lkml.kernel.org/r/20230118001827.1040870-6-talumbau@google.com
Change-Id: I53de9e04c1ae941e122b33cd45d2bbb5f34aae0c
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 37cc99979d04cca677c0ad5c0acd1149ec165d1b)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6160,12 +6160,17 @@ void lru_gen_exit_memcg(struct mem_cgrou
int i;
int nid;
+ VM_WARN_ON_ONCE(!list_empty(&memcg->mm_list.fifo));
+
for_each_node(nid) {
struct lruvec *lruvec = get_lruvec(memcg, nid);
+ VM_WARN_ON_ONCE(lruvec->mm_state.nr_walkers);
VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
sizeof(lruvec->lrugen.nr_pages)));
+ lruvec->lrugen.list.next = LIST_POISON1;
+
for (i = 0; i < NR_BLOOM_FILTERS; i++) {
bitmap_free(lruvec->mm_state.filters[i]);
lruvec->mm_state.filters[i] = NULL;

View File

@ -0,0 +1,135 @@
From fc0e3b06e0f19917b7ecad7967a72f61d4743644 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:26 +0000
Subject: [PATCH 17/19] UPSTREAM: mm: multi-gen LRU: improve walk_pmd_range()
Improve readability of walk_pmd_range() and walk_pmd_range_locked().
Link: https://lkml.kernel.org/r/20230118001827.1040870-7-talumbau@google.com
Change-Id: Ia084fbf53fe989673b7804ca8ca520af12d7d52a
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b5ff4133617d0eced35b685da0bd0929dd9fabb7)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3980,8 +3980,8 @@ restart:
}
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
-static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area_struct *vma,
- struct mm_walk *args, unsigned long *bitmap, unsigned long *start)
+static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+ struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
{
int i;
pmd_t *pmd;
@@ -3994,18 +3994,19 @@ static void walk_pmd_range_locked(pud_t
VM_WARN_ON_ONCE(pud_leaf(*pud));
/* try to batch at most 1+MIN_LRU_BATCH+1 entries */
- if (*start == -1) {
- *start = next;
+ if (*first == -1) {
+ *first = addr;
+ bitmap_zero(bitmap, MIN_LRU_BATCH);
return;
}
- i = next == -1 ? 0 : pmd_index(next) - pmd_index(*start);
+ i = addr == -1 ? 0 : pmd_index(addr) - pmd_index(*first);
if (i && i <= MIN_LRU_BATCH) {
__set_bit(i - 1, bitmap);
return;
}
- pmd = pmd_offset(pud, *start);
+ pmd = pmd_offset(pud, *first);
ptl = pmd_lockptr(args->mm, pmd);
if (!spin_trylock(ptl))
@@ -4016,15 +4017,16 @@ static void walk_pmd_range_locked(pud_t
do {
unsigned long pfn;
struct folio *folio;
- unsigned long addr = i ? (*start & PMD_MASK) + i * PMD_SIZE : *start;
+
+ /* don't round down the first address */
+ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first;
pfn = get_pmd_pfn(pmd[i], vma, addr);
if (pfn == -1)
goto next;
if (!pmd_trans_huge(pmd[i])) {
- if (arch_has_hw_nonleaf_pmd_young() &&
- get_cap(LRU_GEN_NONLEAF_YOUNG))
+ if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG))
pmdp_test_and_clear_young(vma, addr, pmd + i);
goto next;
}
@@ -4053,12 +4055,11 @@ next:
arch_leave_lazy_mmu_mode();
spin_unlock(ptl);
done:
- *start = -1;
- bitmap_zero(bitmap, MIN_LRU_BATCH);
+ *first = -1;
}
#else
-static void walk_pmd_range_locked(pud_t *pud, unsigned long next, struct vm_area_struct *vma,
- struct mm_walk *args, unsigned long *bitmap, unsigned long *start)
+static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+ struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
{
}
#endif
@@ -4071,9 +4072,9 @@ static void walk_pmd_range(pud_t *pud, u
unsigned long next;
unsigned long addr;
struct vm_area_struct *vma;
- unsigned long pos = -1;
+ unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)];
+ unsigned long first = -1;
struct lru_gen_mm_walk *walk = args->private;
- unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
VM_WARN_ON_ONCE(pud_leaf(*pud));
@@ -4115,18 +4116,17 @@ restart:
if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
continue;
- walk_pmd_range_locked(pud, addr, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
continue;
}
#endif
walk->mm_stats[MM_NONLEAF_TOTAL]++;
- if (arch_has_hw_nonleaf_pmd_young() &&
- get_cap(LRU_GEN_NONLEAF_YOUNG)) {
+ if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG)) {
if (!pmd_young(val))
continue;
- walk_pmd_range_locked(pud, addr, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
}
if (!walk->force_scan && !test_bloom_filter(walk->lruvec, walk->max_seq, pmd + i))
@@ -4143,7 +4143,7 @@ restart:
update_bloom_filter(walk->lruvec, walk->max_seq + 1, pmd + i);
}
- walk_pmd_range_locked(pud, -1, vma, args, bitmap, &pos);
+ walk_pmd_range_locked(pud, -1, vma, args, bitmap, &first);
if (i < PTRS_PER_PMD && get_next_vma(PUD_MASK, PMD_SIZE, args, &start, &end))
goto restart;

View File

@ -0,0 +1,148 @@
From e604c3ccb4dfbdde2467fccef9bb36170a392695 Mon Sep 17 00:00:00 2001
From: "T.J. Alumbaugh" <talumbau@google.com>
Date: Wed, 18 Jan 2023 00:18:27 +0000
Subject: [PATCH 18/19] UPSTREAM: mm: multi-gen LRU: simplify
lru_gen_look_around()
Update the folio generation in place with or without
current->reclaim_state->mm_walk. The LRU lock is held for longer, if
mm_walk is NULL and the number of folios to update is more than
PAGEVEC_SIZE.
This causes a measurable regression from the LRU lock contention during a
microbencmark. But a tiny regression is not worth the complexity.
Link: https://lkml.kernel.org/r/20230118001827.1040870-8-talumbau@google.com
Change-Id: I9ce18b4f4062e6c1c13c98ece9422478eb8e1846
Signed-off-by: T.J. Alumbaugh <talumbau@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit abf086721a2f1e6897c57796f7268df1b194c750)
Bug: 274865848
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
mm/vmscan.c | 73 +++++++++++++++++------------------------------------
1 file changed, 23 insertions(+), 50 deletions(-)
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4571,13 +4571,12 @@ static void lru_gen_age_node(struct pgli
void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
{
int i;
- pte_t *pte;
unsigned long start;
unsigned long end;
- unsigned long addr;
struct lru_gen_mm_walk *walk;
int young = 0;
- unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
+ pte_t *pte = pvmw->pte;
+ unsigned long addr = pvmw->address;
struct folio *folio = pfn_folio(pvmw->pfn);
struct mem_cgroup *memcg = folio_memcg(folio);
struct pglist_data *pgdat = folio_pgdat(folio);
@@ -4594,25 +4593,28 @@ void lru_gen_look_around(struct page_vma
/* avoid taking the LRU lock under the PTL when possible */
walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
- start = max(pvmw->address & PMD_MASK, pvmw->vma->vm_start);
- end = min(pvmw->address | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
+ start = max(addr & PMD_MASK, pvmw->vma->vm_start);
+ end = min(addr | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
- if (pvmw->address - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
+ if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
end = start + MIN_LRU_BATCH * PAGE_SIZE;
- else if (end - pvmw->address < MIN_LRU_BATCH * PAGE_SIZE / 2)
+ else if (end - addr < MIN_LRU_BATCH * PAGE_SIZE / 2)
start = end - MIN_LRU_BATCH * PAGE_SIZE;
else {
- start = pvmw->address - MIN_LRU_BATCH * PAGE_SIZE / 2;
- end = pvmw->address + MIN_LRU_BATCH * PAGE_SIZE / 2;
+ start = addr - MIN_LRU_BATCH * PAGE_SIZE / 2;
+ end = addr + MIN_LRU_BATCH * PAGE_SIZE / 2;
}
}
- pte = pvmw->pte - (pvmw->address - start) / PAGE_SIZE;
+ /* folio_update_gen() requires stable folio_memcg() */
+ if (!mem_cgroup_trylock_pages(memcg))
+ return;
- rcu_read_lock();
arch_enter_lazy_mmu_mode();
+ pte -= (addr - start) / PAGE_SIZE;
+
for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
unsigned long pfn;
@@ -4637,56 +4639,27 @@ void lru_gen_look_around(struct page_vma
!folio_test_swapcache(folio)))
folio_mark_dirty(folio);
+ if (walk) {
+ old_gen = folio_update_gen(folio, new_gen);
+ if (old_gen >= 0 && old_gen != new_gen)
+ update_batch_size(walk, folio, old_gen, new_gen);
+
+ continue;
+ }
+
old_gen = folio_lru_gen(folio);
if (old_gen < 0)
folio_set_referenced(folio);
else if (old_gen != new_gen)
- __set_bit(i, bitmap);
+ folio_activate(folio);
}
arch_leave_lazy_mmu_mode();
- rcu_read_unlock();
+ mem_cgroup_unlock_pages();
/* feedback from rmap walkers to page table walkers */
if (suitable_to_scan(i, young))
update_bloom_filter(lruvec, max_seq, pvmw->pmd);
-
- if (!walk && bitmap_weight(bitmap, MIN_LRU_BATCH) < PAGEVEC_SIZE) {
- for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
- folio = pfn_folio(pte_pfn(pte[i]));
- folio_activate(folio);
- }
- return;
- }
-
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- return;
-
- if (!walk) {
- spin_lock_irq(&lruvec->lru_lock);
- new_gen = lru_gen_from_seq(lruvec->lrugen.max_seq);
- }
-
- for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
- folio = pfn_folio(pte_pfn(pte[i]));
- if (folio_memcg_rcu(folio) != memcg)
- continue;
-
- old_gen = folio_update_gen(folio, new_gen);
- if (old_gen < 0 || old_gen == new_gen)
- continue;
-
- if (walk)
- update_batch_size(walk, folio, old_gen, new_gen);
- else
- lru_gen_update_size(lruvec, folio, old_gen, new_gen);
- }
-
- if (!walk)
- spin_unlock_irq(&lruvec->lru_lock);
-
- mem_cgroup_unlock_pages();
}
/******************************************************************************

View File

@ -0,0 +1,273 @@
From 418038c22452df38cde519cc8c662bb15139764a Mon Sep 17 00:00:00 2001
From: Kalesh Singh <kaleshsingh@google.com>
Date: Thu, 13 Apr 2023 14:43:26 -0700
Subject: [PATCH 19/19] mm: Multi-gen LRU: remove wait_event_killable()
Android 14 and later default to MGLRU [1] and field telemetry showed
occasional long tail latency (>100ms) in the reclaim path.
Tracing revealed priority inversion in the reclaim path. In
try_to_inc_max_seq(), when high priority tasks were blocked on
wait_event_killable(), the preemption of the low priority task to call
wake_up_all() caused those high priority tasks to wait longer than
necessary. In general, this problem is not different from others of its
kind, e.g., one caused by mutex_lock(). However, it is specific to MGLRU
because it introduced the new wait queue lruvec->mm_state.wait.
The purpose of this new wait queue is to avoid the thundering herd
problem. If many direct reclaimers rush into try_to_inc_max_seq(), only
one can succeed, i.e., the one to wake up the rest, and the rest who
failed might cause premature OOM kills if they do not wait. So far there
is no evidence supporting this scenario, based on how often the wait has
been hit. And this begs the question how useful the wait queue is in
practice.
Based on Minchan's recommendation, which is in line with his commit
6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path") and the
rest of the MGLRU code which also uses trylock when possible, remove the
wait queue.
[1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf
Link: https://lkml.kernel.org/r/20230413214326.2147568-1-kaleshsingh@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Reported-by: Wei Wang <wvw@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mmzone.h | 8 +--
mm/vmscan.c | 112 +++++++++++++++--------------------------
2 files changed, 42 insertions(+), 78 deletions(-)
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -453,18 +453,14 @@ enum {
struct lru_gen_mm_state {
/* set to max_seq after each iteration */
unsigned long seq;
- /* where the current iteration continues (inclusive) */
+ /* where the current iteration continues after */
struct list_head *head;
- /* where the last iteration ended (exclusive) */
+ /* where the last iteration ended before */
struct list_head *tail;
- /* to wait for the last page table walker to finish */
- struct wait_queue_head wait;
/* Bloom filters flip after each iteration */
unsigned long *filters[NR_BLOOM_FILTERS];
/* the mm stats for debugging */
unsigned long stats[NR_HIST_GENS][NR_MM_STATS];
- /* the number of concurrent page table walkers */
- int nr_walkers;
};
struct lru_gen_mm_walk {
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3371,18 +3371,13 @@ void lru_gen_del_mm(struct mm_struct *mm
if (!lruvec)
continue;
- /* where the last iteration ended (exclusive) */
+ /* where the current iteration continues after */
+ if (lruvec->mm_state.head == &mm->lru_gen.list)
+ lruvec->mm_state.head = lruvec->mm_state.head->prev;
+
+ /* where the last iteration ended before */
if (lruvec->mm_state.tail == &mm->lru_gen.list)
lruvec->mm_state.tail = lruvec->mm_state.tail->next;
-
- /* where the current iteration continues (inclusive) */
- if (lruvec->mm_state.head != &mm->lru_gen.list)
- continue;
-
- lruvec->mm_state.head = lruvec->mm_state.head->next;
- /* the deletion ends the current iteration */
- if (lruvec->mm_state.head == &mm_list->fifo)
- WRITE_ONCE(lruvec->mm_state.seq, lruvec->mm_state.seq + 1);
}
list_del_init(&mm->lru_gen.list);
@@ -3478,68 +3473,54 @@ static bool iterate_mm_list(struct lruve
struct mm_struct **iter)
{
bool first = false;
- bool last = true;
+ bool last = false;
struct mm_struct *mm = NULL;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct lru_gen_mm_list *mm_list = get_mm_list(memcg);
struct lru_gen_mm_state *mm_state = &lruvec->mm_state;
/*
- * There are four interesting cases for this page table walker:
- * 1. It tries to start a new iteration of mm_list with a stale max_seq;
- * there is nothing left to do.
- * 2. It's the first of the current generation, and it needs to reset
- * the Bloom filter for the next generation.
- * 3. It reaches the end of mm_list, and it needs to increment
- * mm_state->seq; the iteration is done.
- * 4. It's the last of the current generation, and it needs to reset the
- * mm stats counters for the next generation.
+ * mm_state->seq is incremented after each iteration of mm_list. There
+ * are three interesting cases for this page table walker:
+ * 1. It tries to start a new iteration with a stale max_seq: there is
+ * nothing left to do.
+ * 2. It started the next iteration: it needs to reset the Bloom filter
+ * so that a fresh set of PTE tables can be recorded.
+ * 3. It ended the current iteration: it needs to reset the mm stats
+ * counters and tell its caller to increment max_seq.
*/
spin_lock(&mm_list->lock);
VM_WARN_ON_ONCE(mm_state->seq + 1 < walk->max_seq);
- VM_WARN_ON_ONCE(*iter && mm_state->seq > walk->max_seq);
- VM_WARN_ON_ONCE(*iter && !mm_state->nr_walkers);
- if (walk->max_seq <= mm_state->seq) {
- if (!*iter)
- last = false;
+ if (walk->max_seq <= mm_state->seq)
goto done;
- }
- if (!mm_state->nr_walkers) {
- VM_WARN_ON_ONCE(mm_state->head && mm_state->head != &mm_list->fifo);
+ if (!mm_state->head)
+ mm_state->head = &mm_list->fifo;
- mm_state->head = mm_list->fifo.next;
+ if (mm_state->head == &mm_list->fifo)
first = true;
- }
-
- while (!mm && mm_state->head != &mm_list->fifo) {
- mm = list_entry(mm_state->head, struct mm_struct, lru_gen.list);
+ do {
mm_state->head = mm_state->head->next;
+ if (mm_state->head == &mm_list->fifo) {
+ WRITE_ONCE(mm_state->seq, mm_state->seq + 1);
+ last = true;
+ break;
+ }
/* force scan for those added after the last iteration */
- if (!mm_state->tail || mm_state->tail == &mm->lru_gen.list) {
- mm_state->tail = mm_state->head;
+ if (!mm_state->tail || mm_state->tail == mm_state->head) {
+ mm_state->tail = mm_state->head->next;
walk->force_scan = true;
}
+ mm = list_entry(mm_state->head, struct mm_struct, lru_gen.list);
if (should_skip_mm(mm, walk))
mm = NULL;
- }
-
- if (mm_state->head == &mm_list->fifo)
- WRITE_ONCE(mm_state->seq, mm_state->seq + 1);
+ } while (!mm);
done:
- if (*iter && !mm)
- mm_state->nr_walkers--;
- if (!*iter && mm)
- mm_state->nr_walkers++;
-
- if (mm_state->nr_walkers)
- last = false;
-
if (*iter || last)
reset_mm_stats(lruvec, walk, last);
@@ -3567,9 +3548,9 @@ static bool iterate_mm_list_nowalk(struc
VM_WARN_ON_ONCE(mm_state->seq + 1 < max_seq);
- if (max_seq > mm_state->seq && !mm_state->nr_walkers) {
- VM_WARN_ON_ONCE(mm_state->head && mm_state->head != &mm_list->fifo);
-
+ if (max_seq > mm_state->seq) {
+ mm_state->head = NULL;
+ mm_state->tail = NULL;
WRITE_ONCE(mm_state->seq, mm_state->seq + 1);
reset_mm_stats(lruvec, NULL, true);
success = true;
@@ -4172,10 +4153,6 @@ restart:
walk_pmd_range(&val, addr, next, args);
- /* a racy check to curtail the waiting time */
- if (wq_has_sleeper(&walk->lruvec->mm_state.wait))
- return 1;
-
if (need_resched() || walk->batched >= MAX_LRU_BATCH) {
end = (addr | ~PUD_MASK) + 1;
goto done;
@@ -4208,8 +4185,14 @@ static void walk_mm(struct lruvec *lruve
walk->next_addr = FIRST_USER_ADDRESS;
do {
+ DEFINE_MAX_SEQ(lruvec);
+
err = -EBUSY;
+ /* another thread might have called inc_max_seq() */
+ if (walk->max_seq != max_seq)
+ break;
+
/* folio_update_gen() requires stable folio_memcg() */
if (!mem_cgroup_trylock_pages(memcg))
break;
@@ -4442,25 +4425,12 @@ static bool try_to_inc_max_seq(struct lr
success = iterate_mm_list(lruvec, walk, &mm);
if (mm)
walk_mm(lruvec, mm, walk);
-
- cond_resched();
} while (mm);
done:
- if (!success) {
- if (sc->priority <= DEF_PRIORITY - 2)
- wait_event_killable(lruvec->mm_state.wait,
- max_seq < READ_ONCE(lrugen->max_seq));
- return false;
- }
+ if (success)
+ inc_max_seq(lruvec, can_swap, force_scan);
- VM_WARN_ON_ONCE(max_seq != READ_ONCE(lrugen->max_seq));
-
- inc_max_seq(lruvec, can_swap, force_scan);
- /* either this sees any waiters or they will see updated max_seq */
- if (wq_has_sleeper(&lruvec->mm_state.wait))
- wake_up_all(&lruvec->mm_state.wait);
-
- return true;
+ return success;
}
/******************************************************************************
@@ -6105,7 +6075,6 @@ void lru_gen_init_lruvec(struct lruvec *
INIT_LIST_HEAD(&lrugen->folios[gen][type][zone]);
lruvec->mm_state.seq = MIN_NR_GENS;
- init_waitqueue_head(&lruvec->mm_state.wait);
}
#ifdef CONFIG_MEMCG
@@ -6138,7 +6107,6 @@ void lru_gen_exit_memcg(struct mem_cgrou
for_each_node(nid) {
struct lruvec *lruvec = get_lruvec(memcg, nid);
- VM_WARN_ON_ONCE(lruvec->mm_state.nr_walkers);
VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
sizeof(lruvec->lrugen.nr_pages)));

View File

@ -1,59 +0,0 @@
From e9aef3d90b4bd11fccbde3741f2396ea05a9f386 Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 30 Nov 2022 23:28:26 +0100
Subject: [PATCH] net: add netdev_sw_irq_coalesce_default_on()
Add a helper for drivers wanting to set SW IRQ coalescing
by default. The related sysfs attributes can be used to
override the default values.
Follow Jakub's suggestion and put this functionality into
net core so that drivers wanting to use software interrupt
coalescing per default don't have to open-code it.
Note that this function needs to be called before the
netdevice is registered.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/linux/netdevice.h | 1 +
net/core/dev.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -78,6 +78,7 @@ struct xdp_buff;
void synchronize_net(void);
void netdev_set_default_ethtool_ops(struct net_device *dev,
const struct ethtool_ops *ops);
+void netdev_sw_irq_coalesce_default_on(struct net_device *dev);
/* Backlog congestion levels */
#define NET_RX_SUCCESS 0 /* keep 'em coming, baby */
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10534,6 +10534,22 @@ void netdev_set_default_ethtool_ops(stru
}
EXPORT_SYMBOL_GPL(netdev_set_default_ethtool_ops);
+/**
+ * netdev_sw_irq_coalesce_default_on() - enable SW IRQ coalescing by default
+ * @dev: netdev to enable the IRQ coalescing on
+ *
+ * Sets a conservative default for SW IRQ coalescing. Users can use
+ * sysfs attributes to override the default values.
+ */
+void netdev_sw_irq_coalesce_default_on(struct net_device *dev)
+{
+ WARN_ON(dev->reg_state == NETREG_REGISTERED);
+
+ dev->gro_flush_timeout = 20000;
+ dev->napi_defer_hard_irqs = 1;
+}
+EXPORT_SYMBOL_GPL(netdev_sw_irq_coalesce_default_on);
+
void netdev_freemem(struct net_device *dev)
{
char *addr = (char *)dev - dev->padded;

View File

@ -1,56 +0,0 @@
From fd4f7a449938ffd21bf2f5a1708d811cc5f3daa5 Mon Sep 17 00:00:00 2001
From: Denis Kirjanov <dkirjanov@suse.de>
Date: Thu, 27 Oct 2022 21:45:02 +0300
Subject: [PATCH 2/4] drivers: net: convert to boolean for the mac_managed_pm
flag
Signed-off-by: Dennis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/freescale/fec_main.c | 2 +-
drivers/net/ethernet/realtek/r8169_main.c | 2 +-
drivers/net/usb/asix_devices.c | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2234,7 +2234,7 @@ static int fec_enet_mii_probe(struct net
fep->link = 0;
fep->full_duplex = 0;
- phy_dev->mac_managed_pm = 1;
+ phy_dev->mac_managed_pm = true;
phy_attached_info(phy_dev);
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5015,7 +5015,7 @@ static int r8169_mdio_register(struct rt
return -EUNATCH;
}
- tp->phydev->mac_managed_pm = 1;
+ tp->phydev->mac_managed_pm = true;
phy_support_asym_pause(tp->phydev);
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -714,7 +714,7 @@ static int ax88772_init_phy(struct usbne
}
phy_suspend(priv->phydev);
- priv->phydev->mac_managed_pm = 1;
+ priv->phydev->mac_managed_pm = true;
phy_attached_info(priv->phydev);
@@ -734,7 +734,7 @@ static int ax88772_init_phy(struct usbne
return -ENODEV;
}
- priv->phydev_int->mac_managed_pm = 1;
+ priv->phydev_int->mac_managed_pm = true;
phy_suspend(priv->phydev_int);
return 0;

View File

@ -1,38 +0,0 @@
From fd149c4ab09b01136c7e80db020eed59a3385d24 Mon Sep 17 00:00:00 2001
From: Juhee Kang <claudiajkang@gmail.com>
Date: Wed, 30 Nov 2022 01:12:44 +0900
Subject: [PATCH 3/4] r8169: use tp_to_dev instead of open code
The open code is defined as a helper function(tp_to_dev) on r8169_main.c,
which the open code is &tp->pci_dev->dev. The helper function was added
in commit 1e1205b7d3e9 ("r8169: add helper tp_to_dev"). And then later,
commit f1e911d5d0df ("r8169: add basic phylib support") added
r8169_phylink_handler function but it didn't use the helper function.
Thus, tp_to_dev() replaces the open code. This patch doesn't change logic.
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/20221129161244.5356-1-claudiajkang@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4556,12 +4556,13 @@ static int rtl8169_poll(struct napi_stru
static void r8169_phylink_handler(struct net_device *ndev)
{
struct rtl8169_private *tp = netdev_priv(ndev);
+ struct device *d = tp_to_dev(tp);
if (netif_carrier_ok(ndev)) {
rtl_link_chg_patch(tp);
- pm_request_resume(&tp->pci_dev->dev);
+ pm_request_resume(d);
} else {
- pm_runtime_idle(&tp->pci_dev->dev);
+ pm_runtime_idle(d);
}
phy_print_status(tp->phydev);

View File

@ -1,33 +0,0 @@
From 74ec605a11b7ecf68036c3f086f684bbe7381353 Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 30 Nov 2022 23:30:15 +0100
Subject: [PATCH 4/4] r8169: enable GRO software interrupt coalescing per
default
There are reports about r8169 not reaching full line speed on certain
systems (e.g. SBC's) with a 2.5Gbps link.
There was a time when hardware interrupt coalescing was enabled per
default, but this was changed due to ASPM-related issues on few systems.
So let's use software interrupt coalescing instead and enable it
using new function netdev_sw_irq_coalesce_default_on().
Even with these conservative settings interrupt load on my 1Gbps test
system reduced significantly.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/realtek/r8169_main.c | 2 ++
1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5280,6 +5280,8 @@ static int rtl_init_one(struct pci_dev *
dev->hw_features |= NETIF_F_RXALL;
dev->hw_features |= NETIF_F_RXFCS;
+ netdev_sw_irq_coalesce_default_on(dev);
+
/* configure chip for default features */
rtl8169_set_features(dev, dev->features);

View File

@ -0,0 +1,63 @@
From 579aee9fc594af94c242068c011b0233563d4bbf Mon Sep 17 00:00:00 2001
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 10 Oct 2022 16:57:21 +1100
Subject: [PATCH] powerpc: suppress some linker warnings in recent linker
versions
This is a follow on from commit
0d362be5b142 ("Makefile: link with -z noexecstack --no-warn-rwx-segments")
for arch/powerpc/boot to address wanrings like:
ld: warning: opal-calls.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: arch/powerpc/boot/zImage.epapr has a LOAD segment with RWX permissions
This fixes issue https://github.com/linuxppc/issues/issues/417
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221010165721.106267e6@canb.auug.org.au
---
arch/powerpc/boot/wrapper | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -215,6 +215,11 @@ ld_version()
}'
}
+ld_is_lld()
+{
+ ${CROSS}ld -V 2>&1 | grep -q LLD
+}
+
# Do not include PT_INTERP segment when linking pie. Non-pie linking
# just ignores this option.
LD_VERSION=$(${CROSS}ld --version | ld_version)
@@ -223,6 +228,14 @@ if [ "$LD_VERSION" -ge "$LD_NO_DL_MIN_VE
nodl="--no-dynamic-linker"
fi
+# suppress some warnings in recent ld versions
+nowarn="-z noexecstack"
+if ! ld_is_lld; then
+ if [ "$LD_VERSION" -ge "$(echo 2.39 | ld_version)" ]; then
+ nowarn="$nowarn --no-warn-rwx-segments"
+ fi
+fi
+
platformo=$object/"$platform".o
lds=$object/zImage.lds
ext=strip
@@ -504,7 +517,7 @@ if [ "$platform" != "miboot" ]; then
text_start="-Ttext $link_address"
fi
#link everything
- ${CROSS}ld -m $format -T $lds $text_start $pie $nodl $rodynamic $notext -o "$ofile" $map \
+ ${CROSS}ld -m $format -T $lds $text_start $pie $nodl $nowarn $rodynamic $notext -o "$ofile" $map \
$platformo $tmp $object/wrapper.a
rm $tmp
fi

View File

@ -0,0 +1,35 @@
From ebed787a0becb9354f0a23620a5130cccd6c730c Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Thu, 19 Jan 2023 03:45:43 +0000
Subject: [PATCH] mtd: spinand: macronix: use scratch buffer for DMA operation
The mx35lf1ge4ab_get_eccsr() function uses an SPI DMA operation to
read the eccsr, hence the buffer should not be on stack. Since commit
380583227c0c7f ("spi: spi-mem: Add extra sanity checks on the op param")
the kernel emmits a warning and blocks such operations.
Use the scratch buffer to get eccsr instead of trying to directly read
into a stack-allocated variable.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/Y8i85zM0u4XdM46z@makrotopia.org
---
drivers/mtd/nand/spi/macronix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/mtd/nand/spi/macronix.c
+++ b/drivers/mtd/nand/spi/macronix.c
@@ -83,9 +83,10 @@ static int mx35lf1ge4ab_ecc_get_status(s
* in order to avoid forcing the wear-leveling layer to move
* data around if it's not necessary.
*/
- if (mx35lf1ge4ab_get_eccsr(spinand, &eccsr))
+ if (mx35lf1ge4ab_get_eccsr(spinand, spinand->scratchbuf))
return nanddev_get_ecc_conf(nand)->strength;
+ eccsr = *spinand->scratchbuf;
if (WARN_ON(eccsr > nanddev_get_ecc_conf(nand)->strength ||
!eccsr))
return nanddev_get_ecc_conf(nand)->strength;

View File

@ -0,0 +1,47 @@
From 281f7a6c1a33fffcde32001bacbb4f672140fbf9 Mon Sep 17 00:00:00 2001
From: Michael Walle <michael@walle.cc>
Date: Wed, 8 Mar 2023 09:20:21 +0100
Subject: [PATCH] mtd: core: prepare mtd_otp_nvmem_add() to handle
-EPROBE_DEFER
NVMEM soon will get the ability for nvmem layouts and these might
not be ready when nvmem_register() is called and thus it might
return -EPROBE_DEFER. Don't print the error message in this case.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230308082021.870459-4-michael@walle.cc
---
drivers/mtd/mtdcore.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -953,8 +953,8 @@ static int mtd_otp_nvmem_add(struct mtd_
nvmem = mtd_otp_nvmem_register(mtd, "user-otp", size,
mtd_nvmem_user_otp_reg_read);
if (IS_ERR(nvmem)) {
- dev_err(dev, "Failed to register OTP NVMEM device\n");
- return PTR_ERR(nvmem);
+ err = PTR_ERR(nvmem);
+ goto err;
}
mtd->otp_user_nvmem = nvmem;
}
@@ -971,7 +971,6 @@ static int mtd_otp_nvmem_add(struct mtd_
nvmem = mtd_otp_nvmem_register(mtd, "factory-otp", size,
mtd_nvmem_fact_otp_reg_read);
if (IS_ERR(nvmem)) {
- dev_err(dev, "Failed to register OTP NVMEM device\n");
err = PTR_ERR(nvmem);
goto err;
}
@@ -983,7 +982,7 @@ static int mtd_otp_nvmem_add(struct mtd_
err:
nvmem_unregister(mtd->otp_user_nvmem);
- return err;
+ return dev_err_probe(dev, err, "Failed to register OTP NVMEM device\n");
}
/**

View File

@ -0,0 +1,41 @@
From 7390609b0121a1b982c5ecdfcd72dc328e5784ee Mon Sep 17 00:00:00 2001
From: Michael Walle <michael@walle.cc>
Date: Mon, 6 Feb 2023 13:43:42 +0000
Subject: [PATCH] net: add helper eth_addr_add()
Add a helper to add an offset to a ethernet address. This comes in handy
if you have a base ethernet address for multiple interfaces.
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230206134356.839737-9-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/etherdevice.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -508,6 +508,20 @@ static inline void eth_addr_inc(u8 *addr
}
/**
+ * eth_addr_add() - Add (or subtract) an offset to/from the given MAC address.
+ *
+ * @offset: Offset to add.
+ * @addr: Pointer to a six-byte array containing Ethernet address to increment.
+ */
+static inline void eth_addr_add(u8 *addr, long offset)
+{
+ u64 u = ether_addr_to_u64(addr);
+
+ u += offset;
+ u64_to_ether_addr(u, addr);
+}
+
+/**
* is_etherdev_addr - Tell if given Ethernet address belongs to the device.
* @dev: Pointer to a device structure
* @addr: Pointer to a six-byte array containing the Ethernet address

View File

@ -0,0 +1,48 @@
From 9c5a170677c3c8facc83e931a57f4c99c0511ae0 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:10:37 +0100
Subject: [PATCH] net: phylink: add phylink_get_link_timer_ns() helper
Add a helper to convert the PHY interface mode to the required link
timer setting as stated by the appropriate standard. Inappropriate
interface modes return an error.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
include/linux/phylink.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -614,6 +614,30 @@ int phylink_speed_up(struct phylink *pl)
void phylink_set_port_modes(unsigned long *bits);
+/**
+ * phylink_get_link_timer_ns - return the PCS link timer value
+ * @interface: link &typedef phy_interface_t mode
+ *
+ * Return the PCS link timer setting in nanoseconds for the PHY @interface
+ * mode, or -EINVAL if not appropriate.
+ */
+static inline int phylink_get_link_timer_ns(phy_interface_t interface)
+{
+ switch (interface) {
+ case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_QSGMII:
+ case PHY_INTERFACE_MODE_USXGMII:
+ return 1600000;
+
+ case PHY_INTERFACE_MODE_1000BASEX:
+ case PHY_INTERFACE_MODE_2500BASEX:
+ return 10000000;
+
+ default:
+ return -EINVAL;
+ }
+}
+
void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state,
u16 bmsr, u16 lpa);
void phylink_mii_c22_pcs_get_state(struct mdio_device *pcs,

View File

@ -0,0 +1,394 @@
From 4765a9722e09765866e131ec31f7b9cf4c1f4854 Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Sun, 19 Mar 2023 12:57:50 +0000
Subject: [PATCH] net: pcs: add driver for MediaTek SGMII PCS
The SGMII core found in several MediaTek SoCs is identical to what can
also be found in MediaTek's MT7531 Ethernet switch IC.
As this has not always been clear, both drivers developed different
implementations to deal with the PCS.
Recently Alexander Couzens pointed out this fact which lead to the
development of this shared driver.
Add a dedicated driver, mostly by copying the code now found in the
Ethernet driver. The now redundant code will be removed by a follow-up
commit.
Suggested-by: Alexander Couzens <lynxis@fe80.eu>
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
MAINTAINERS | 8 +
drivers/net/pcs/Kconfig | 7 +
drivers/net/pcs/Makefile | 1 +
drivers/net/pcs/pcs-mtk-lynxi.c | 305 ++++++++++++++++++++++++++++++
include/linux/pcs/pcs-mtk-lynxi.h | 13 ++
5 files changed, 334 insertions(+)
create mode 100644 drivers/net/pcs/pcs-mtk-lynxi.c
create mode 100644 include/linux/pcs/pcs-mtk-lynxi.h
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12926,6 +12926,14 @@ L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/ethernet/mediatek/
+MEDIATEK ETHERNET PCS DRIVER
+M: Alexander Couzens <lynxis@fe80.eu>
+M: Daniel Golle <daniel@makrotopia.org>
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/pcs/pcs-mtk-lynxi.c
+F: include/linux/pcs/pcs-mtk-lynxi.h
+
MEDIATEK I2C CONTROLLER DRIVER
M: Qii Wang <qii.wang@mediatek.com>
L: linux-i2c@vger.kernel.org
--- a/drivers/net/pcs/Kconfig
+++ b/drivers/net/pcs/Kconfig
@@ -32,4 +32,11 @@ config PCS_ALTERA_TSE
This module provides helper functions for the Altera Triple Speed
Ethernet SGMII PCS, that can be found on the Intel Socfpga family.
+config PCS_MTK_LYNXI
+ tristate
+ select REGMAP
+ help
+ This module provides helpers to phylink for managing the LynxI PCS
+ which is part of MediaTek's SoC and Ethernet switch ICs.
+
endmenu
--- a/drivers/net/pcs/Makefile
+++ b/drivers/net/pcs/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_PCS_XPCS) += pcs_xpcs.o
obj-$(CONFIG_PCS_LYNX) += pcs-lynx.o
obj-$(CONFIG_PCS_RZN1_MIIC) += pcs-rzn1-miic.o
obj-$(CONFIG_PCS_ALTERA_TSE) += pcs-altera-tse.o
+obj-$(CONFIG_PCS_MTK_LYNXI) += pcs-mtk-lynxi.o
--- /dev/null
+++ b/drivers/net/pcs/pcs-mtk-lynxi.c
@@ -0,0 +1,305 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018-2019 MediaTek Inc.
+/* A library for MediaTek SGMII circuit
+ *
+ * Author: Sean Wang <sean.wang@mediatek.com>
+ * Author: Alexander Couzens <lynxis@fe80.eu>
+ * Author: Daniel Golle <daniel@makrotopia.org>
+ *
+ */
+
+#include <linux/mdio.h>
+#include <linux/of.h>
+#include <linux/pcs/pcs-mtk-lynxi.h>
+#include <linux/phylink.h>
+#include <linux/regmap.h>
+
+/* SGMII subsystem config registers */
+/* BMCR (low 16) BMSR (high 16) */
+#define SGMSYS_PCS_CONTROL_1 0x0
+#define SGMII_BMCR GENMASK(15, 0)
+#define SGMII_BMSR GENMASK(31, 16)
+
+#define SGMSYS_PCS_DEVICE_ID 0x4
+#define SGMII_LYNXI_DEV_ID 0x4d544950
+
+#define SGMSYS_PCS_ADVERTISE 0x8
+#define SGMII_ADVERTISE GENMASK(15, 0)
+#define SGMII_LPA GENMASK(31, 16)
+
+#define SGMSYS_PCS_SCRATCH 0x14
+#define SGMII_DEV_VERSION GENMASK(31, 16)
+
+/* Register to programmable link timer, the unit in 2 * 8ns */
+#define SGMSYS_PCS_LINK_TIMER 0x18
+#define SGMII_LINK_TIMER_MASK GENMASK(19, 0)
+#define SGMII_LINK_TIMER_VAL(ns) FIELD_PREP(SGMII_LINK_TIMER_MASK, \
+ ((ns) / 2 / 8))
+
+/* Register to control remote fault */
+#define SGMSYS_SGMII_MODE 0x20
+#define SGMII_IF_MODE_SGMII BIT(0)
+#define SGMII_SPEED_DUPLEX_AN BIT(1)
+#define SGMII_SPEED_MASK GENMASK(3, 2)
+#define SGMII_SPEED_10 FIELD_PREP(SGMII_SPEED_MASK, 0)
+#define SGMII_SPEED_100 FIELD_PREP(SGMII_SPEED_MASK, 1)
+#define SGMII_SPEED_1000 FIELD_PREP(SGMII_SPEED_MASK, 2)
+#define SGMII_DUPLEX_HALF BIT(4)
+#define SGMII_REMOTE_FAULT_DIS BIT(8)
+
+/* Register to reset SGMII design */
+#define SGMSYS_RESERVED_0 0x34
+#define SGMII_SW_RESET BIT(0)
+
+/* Register to set SGMII speed, ANA RG_ Control Signals III */
+#define SGMII_PHY_SPEED_MASK GENMASK(3, 2)
+#define SGMII_PHY_SPEED_1_25G FIELD_PREP(SGMII_PHY_SPEED_MASK, 0)
+#define SGMII_PHY_SPEED_3_125G FIELD_PREP(SGMII_PHY_SPEED_MASK, 1)
+
+/* Register to power up QPHY */
+#define SGMSYS_QPHY_PWR_STATE_CTRL 0xe8
+#define SGMII_PHYA_PWD BIT(4)
+
+/* Register to QPHY wrapper control */
+#define SGMSYS_QPHY_WRAP_CTRL 0xec
+#define SGMII_PN_SWAP_MASK GENMASK(1, 0)
+#define SGMII_PN_SWAP_TX_RX (BIT(0) | BIT(1))
+
+/* struct mtk_pcs_lynxi - This structure holds each sgmii regmap andassociated
+ * data
+ * @regmap: The register map pointing at the range used to setup
+ * SGMII modes
+ * @dev: Pointer to device owning the PCS
+ * @ana_rgc3: The offset of register ANA_RGC3 relative to regmap
+ * @interface: Currently configured interface mode
+ * @pcs: Phylink PCS structure
+ * @flags: Flags indicating hardware properties
+ */
+struct mtk_pcs_lynxi {
+ struct regmap *regmap;
+ u32 ana_rgc3;
+ phy_interface_t interface;
+ struct phylink_pcs pcs;
+ u32 flags;
+};
+
+static struct mtk_pcs_lynxi *pcs_to_mtk_pcs_lynxi(struct phylink_pcs *pcs)
+{
+ return container_of(pcs, struct mtk_pcs_lynxi, pcs);
+}
+
+static void mtk_pcs_lynxi_get_state(struct phylink_pcs *pcs,
+ struct phylink_link_state *state)
+{
+ struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
+ unsigned int bm, adv;
+
+ /* Read the BMSR and LPA */
+ regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &bm);
+ regmap_read(mpcs->regmap, SGMSYS_PCS_ADVERTISE, &adv);
+
+ phylink_mii_c22_pcs_decode_state(state, FIELD_GET(SGMII_BMSR, bm),
+ FIELD_GET(SGMII_LPA, adv));
+}
+
+static int mtk_pcs_lynxi_config(struct phylink_pcs *pcs, unsigned int mode,
+ phy_interface_t interface,
+ const unsigned long *advertising,
+ bool permit_pause_to_mac)
+{
+ struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
+ bool mode_changed = false, changed, use_an;
+ unsigned int rgc3, sgm_mode, bmcr;
+ int advertise, link_timer;
+
+ advertise = phylink_mii_c22_pcs_encode_advertisement(interface,
+ advertising);
+ if (advertise < 0)
+ return advertise;
+
+ /* Clearing IF_MODE_BIT0 switches the PCS to BASE-X mode, and
+ * we assume that fixes it's speed at bitrate = line rate (in
+ * other words, 1000Mbps or 2500Mbps).
+ */
+ if (interface == PHY_INTERFACE_MODE_SGMII) {
+ sgm_mode = SGMII_IF_MODE_SGMII;
+ if (phylink_autoneg_inband(mode)) {
+ sgm_mode |= SGMII_REMOTE_FAULT_DIS |
+ SGMII_SPEED_DUPLEX_AN;
+ use_an = true;
+ } else {
+ use_an = false;
+ }
+ } else if (phylink_autoneg_inband(mode)) {
+ /* 1000base-X or 2500base-X autoneg */
+ sgm_mode = SGMII_REMOTE_FAULT_DIS;
+ use_an = linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ advertising);
+ } else {
+ /* 1000base-X or 2500base-X without autoneg */
+ sgm_mode = 0;
+ use_an = false;
+ }
+
+ if (use_an)
+ bmcr = BMCR_ANENABLE;
+ else
+ bmcr = 0;
+
+ if (mpcs->interface != interface) {
+ link_timer = phylink_get_link_timer_ns(interface);
+ if (link_timer < 0)
+ return link_timer;
+
+ /* PHYA power down */
+ regmap_set_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
+ SGMII_PHYA_PWD);
+
+ /* Reset SGMII PCS state */
+ regmap_set_bits(mpcs->regmap, SGMSYS_RESERVED_0,
+ SGMII_SW_RESET);
+
+ if (mpcs->flags & MTK_SGMII_FLAG_PN_SWAP)
+ regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_WRAP_CTRL,
+ SGMII_PN_SWAP_MASK,
+ SGMII_PN_SWAP_TX_RX);
+
+ if (interface == PHY_INTERFACE_MODE_2500BASEX)
+ rgc3 = SGMII_PHY_SPEED_3_125G;
+ else
+ rgc3 = SGMII_PHY_SPEED_1_25G;
+
+ /* Configure the underlying interface speed */
+ regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
+ SGMII_PHY_SPEED_MASK, rgc3);
+
+ /* Setup the link timer */
+ regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER,
+ SGMII_LINK_TIMER_VAL(link_timer));
+
+ mpcs->interface = interface;
+ mode_changed = true;
+ }
+
+ /* Update the advertisement, noting whether it has changed */
+ regmap_update_bits_check(mpcs->regmap, SGMSYS_PCS_ADVERTISE,
+ SGMII_ADVERTISE, advertise, &changed);
+
+ /* Update the sgmsys mode register */
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_REMOTE_FAULT_DIS | SGMII_SPEED_DUPLEX_AN |
+ SGMII_IF_MODE_SGMII, sgm_mode);
+
+ /* Update the BMCR */
+ regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
+ BMCR_ANENABLE, bmcr);
+
+ /* Release PHYA power down state
+ * Only removing bit SGMII_PHYA_PWD isn't enough.
+ * There are cases when the SGMII_PHYA_PWD register contains 0x9 which
+ * prevents SGMII from working. The SGMII still shows link but no traffic
+ * can flow. Writing 0x0 to the PHYA_PWD register fix the issue. 0x0 was
+ * taken from a good working state of the SGMII interface.
+ * Unknown how much the QPHY needs but it is racy without a sleep.
+ * Tested on mt7622 & mt7986.
+ */
+ usleep_range(50, 100);
+ regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, 0);
+
+ return changed || mode_changed;
+}
+
+static void mtk_pcs_lynxi_restart_an(struct phylink_pcs *pcs)
+{
+ struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
+
+ regmap_set_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1, BMCR_ANRESTART);
+}
+
+static void mtk_pcs_lynxi_link_up(struct phylink_pcs *pcs, unsigned int mode,
+ phy_interface_t interface, int speed,
+ int duplex)
+{
+ struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
+ unsigned int sgm_mode;
+
+ if (!phylink_autoneg_inband(mode)) {
+ /* Force the speed and duplex setting */
+ if (speed == SPEED_10)
+ sgm_mode = SGMII_SPEED_10;
+ else if (speed == SPEED_100)
+ sgm_mode = SGMII_SPEED_100;
+ else
+ sgm_mode = SGMII_SPEED_1000;
+
+ if (duplex != DUPLEX_FULL)
+ sgm_mode |= SGMII_DUPLEX_HALF;
+
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_DUPLEX_HALF | SGMII_SPEED_MASK,
+ sgm_mode);
+ }
+}
+
+static const struct phylink_pcs_ops mtk_pcs_lynxi_ops = {
+ .pcs_get_state = mtk_pcs_lynxi_get_state,
+ .pcs_config = mtk_pcs_lynxi_config,
+ .pcs_an_restart = mtk_pcs_lynxi_restart_an,
+ .pcs_link_up = mtk_pcs_lynxi_link_up,
+};
+
+struct phylink_pcs *mtk_pcs_lynxi_create(struct device *dev,
+ struct regmap *regmap, u32 ana_rgc3,
+ u32 flags)
+{
+ struct mtk_pcs_lynxi *mpcs;
+ u32 id, ver;
+ int ret;
+
+ ret = regmap_read(regmap, SGMSYS_PCS_DEVICE_ID, &id);
+ if (ret < 0)
+ return NULL;
+
+ if (id != SGMII_LYNXI_DEV_ID) {
+ dev_err(dev, "unknown PCS device id %08x\n", id);
+ return NULL;
+ }
+
+ ret = regmap_read(regmap, SGMSYS_PCS_SCRATCH, &ver);
+ if (ret < 0)
+ return NULL;
+
+ ver = FIELD_GET(SGMII_DEV_VERSION, ver);
+ if (ver != 0x1) {
+ dev_err(dev, "unknown PCS device version %04x\n", ver);
+ return NULL;
+ }
+
+ dev_dbg(dev, "MediaTek LynxI SGMII PCS (id 0x%08x, ver 0x%04x)\n", id,
+ ver);
+
+ mpcs = kzalloc(sizeof(*mpcs), GFP_KERNEL);
+ if (!mpcs)
+ return NULL;
+
+ mpcs->ana_rgc3 = ana_rgc3;
+ mpcs->regmap = regmap;
+ mpcs->flags = flags;
+ mpcs->pcs.ops = &mtk_pcs_lynxi_ops;
+ mpcs->pcs.poll = true;
+ mpcs->interface = PHY_INTERFACE_MODE_NA;
+
+ return &mpcs->pcs;
+}
+EXPORT_SYMBOL(mtk_pcs_lynxi_create);
+
+void mtk_pcs_lynxi_destroy(struct phylink_pcs *pcs)
+{
+ if (!pcs)
+ return;
+
+ kfree(pcs_to_mtk_pcs_lynxi(pcs));
+}
+EXPORT_SYMBOL(mtk_pcs_lynxi_destroy);
+
+MODULE_LICENSE("GPL");
--- /dev/null
+++ b/include/linux/pcs/pcs-mtk-lynxi.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_PCS_MTK_LYNXI_H
+#define __LINUX_PCS_MTK_LYNXI_H
+
+#include <linux/phylink.h>
+#include <linux/regmap.h>
+
+#define MTK_SGMII_FLAG_PN_SWAP BIT(0)
+struct phylink_pcs *mtk_pcs_lynxi_create(struct device *dev,
+ struct regmap *regmap,
+ u32 ana_rgc3, u32 flags);
+void mtk_pcs_lynxi_destroy(struct phylink_pcs *pcs);
+#endif

View File

@ -0,0 +1,36 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 17 Nov 2022 00:58:46 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: remove cpu_relax in
mtk_pending_work
Get rid of cpu_relax in mtk_pending_work routine since MTK_RESETTING is
set only in mtk_pending_work() and it runs holding rtnl lock
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3495,11 +3495,8 @@ static void mtk_pending_work(struct work
rtnl_lock();
dev_dbg(eth->dev, "[%s][%d] reset\n", __func__, __LINE__);
+ set_bit(MTK_RESETTING, &eth->state);
- while (test_and_set_bit_lock(MTK_RESETTING, &eth->state))
- cpu_relax();
-
- dev_dbg(eth->dev, "[%s][%d] mtk_stop starts\n", __func__, __LINE__);
/* stop all devices to make sure that dma is properly shut down */
for (i = 0; i < MTK_MAC_COUNT; i++) {
if (!eth->netdev[i])
@@ -3533,7 +3530,7 @@ static void mtk_pending_work(struct work
dev_dbg(eth->dev, "[%s][%d] reset done\n", __func__, __LINE__);
- clear_bit_unlock(MTK_RESETTING, &eth->state);
+ clear_bit(MTK_RESETTING, &eth->state);
rtnl_unlock();
}

View File

@ -0,0 +1,80 @@
From: Sujuan Chen <sujuan.chen@mediatek.com>
Date: Thu, 24 Nov 2022 11:18:14 +0800
Subject: [PATCH] net: ethernet: mtk_wed: add wcid overwritten support for wed
v1
All wed versions should enable the wcid overwritten feature,
since the wcid size is controlled by the wlan driver.
Tested-by: Sujuan Chen <sujuan.chen@mediatek.com>
Co-developed-by: Bo Jiao <bo.jiao@mediatek.com>
Signed-off-by: Bo Jiao <bo.jiao@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -526,9 +526,9 @@ mtk_wed_dma_disable(struct mtk_wed_devic
MTK_WED_WPDMA_RX_D_RX_DRV_EN);
wed_clr(dev, MTK_WED_WDMA_GLO_CFG,
MTK_WED_WDMA_GLO_CFG_TX_DDONE_CHK);
-
- mtk_wed_set_512_support(dev, false);
}
+
+ mtk_wed_set_512_support(dev, false);
}
static void
@@ -1290,9 +1290,10 @@ mtk_wed_start(struct mtk_wed_device *dev
if (mtk_wed_rro_cfg(dev))
return;
- mtk_wed_set_512_support(dev, dev->wlan.wcid_512);
}
+ mtk_wed_set_512_support(dev, dev->wlan.wcid_512);
+
mtk_wed_dma_enable(dev);
dev->running = true;
}
@@ -1358,11 +1359,13 @@ mtk_wed_attach(struct mtk_wed_device *de
}
mtk_wed_hw_init_early(dev);
- if (hw->version == 1)
+ if (hw->version == 1) {
regmap_update_bits(hw->hifsys, HIFSYS_DMA_AG_MAP,
BIT(hw->index), 0);
- else
+ } else {
+ dev->rev_id = wed_r32(dev, MTK_WED_REV_ID);
ret = mtk_wed_wo_init(hw);
+ }
out:
if (ret)
mtk_wed_detach(dev);
--- a/drivers/net/ethernet/mediatek/mtk_wed_regs.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_regs.h
@@ -20,6 +20,8 @@ struct mtk_wdma_desc {
__le32 info;
} __packed __aligned(4);
+#define MTK_WED_REV_ID 0x004
+
#define MTK_WED_RESET 0x008
#define MTK_WED_RESET_TX_BM BIT(0)
#define MTK_WED_RESET_TX_FREE_AGENT BIT(4)
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -85,6 +85,9 @@ struct mtk_wed_device {
int irq;
u8 version;
+ /* used by wlan driver */
+ u32 rev_id;
+
struct mtk_wed_ring tx_ring[MTK_WED_TX_QUEUES];
struct mtk_wed_ring rx_ring[MTK_WED_RX_QUEUES];
struct mtk_wed_ring txfree_ring;

View File

@ -0,0 +1,85 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:51 +0100
Subject: [PATCH] net: ethernet: mtk_wed: return status value in
mtk_wdma_rx_reset
Move MTK_WDMA_RESET_IDX configuration in mtk_wdma_rx_reset routine.
Increase poll timeout to 10ms in order to be aligned with vendor sdk.
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -101,17 +101,21 @@ mtk_wdma_read_reset(struct mtk_wed_devic
return wdma_r32(dev, MTK_WDMA_GLO_CFG);
}
-static void
+static int
mtk_wdma_rx_reset(struct mtk_wed_device *dev)
{
u32 status, mask = MTK_WDMA_GLO_CFG_RX_DMA_BUSY;
- int i;
+ int i, ret;
wdma_clr(dev, MTK_WDMA_GLO_CFG, MTK_WDMA_GLO_CFG_RX_DMA_EN);
- if (readx_poll_timeout(mtk_wdma_read_reset, dev, status,
- !(status & mask), 0, 1000))
+ ret = readx_poll_timeout(mtk_wdma_read_reset, dev, status,
+ !(status & mask), 0, 10000);
+ if (ret)
dev_err(dev->hw->dev, "rx reset failed\n");
+ wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_RX);
+ wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
+
for (i = 0; i < ARRAY_SIZE(dev->rx_wdma); i++) {
if (dev->rx_wdma[i].desc)
continue;
@@ -119,6 +123,8 @@ mtk_wdma_rx_reset(struct mtk_wed_device
wdma_w32(dev,
MTK_WDMA_RING_RX(i) + MTK_WED_RING_OFS_CPU_IDX, 0);
}
+
+ return ret;
}
static void
@@ -565,9 +571,7 @@ mtk_wed_detach(struct mtk_wed_device *de
mtk_wed_stop(dev);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_RX);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
-
+ mtk_wdma_rx_reset(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
if (mtk_wed_get_rx_capa(dev)) {
wdma_clr(dev, MTK_WDMA_GLO_CFG, MTK_WDMA_GLO_CFG_TX_DMA_EN);
@@ -582,7 +586,6 @@ mtk_wed_detach(struct mtk_wed_device *de
mtk_wed_wo_reset(dev);
mtk_wed_free_rx_rings(dev);
mtk_wed_wo_deinit(hw);
- mtk_wdma_rx_reset(dev);
}
if (dev->wlan.bus_type == MTK_WED_BUS_PCIE) {
@@ -999,11 +1002,7 @@ mtk_wed_reset_dma(struct mtk_wed_device
wed_w32(dev, MTK_WED_RESET_IDX, 0);
}
- wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_RX);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
-
- if (mtk_wed_get_rx_capa(dev))
- mtk_wdma_rx_reset(dev);
+ mtk_wdma_rx_reset(dev);
if (busy) {
mtk_wed_reset(dev, MTK_WED_RESET_WDMA_INT_AGENT);

View File

@ -0,0 +1,52 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:52 +0100
Subject: [PATCH] net: ethernet: mtk_wed: move MTK_WDMA_RESET_IDX_TX
configuration in mtk_wdma_tx_reset
Remove duplicated code. Increase poll timeout to 10ms in order to be
aligned with vendor sdk.
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -135,16 +135,15 @@ mtk_wdma_tx_reset(struct mtk_wed_device
wdma_clr(dev, MTK_WDMA_GLO_CFG, MTK_WDMA_GLO_CFG_TX_DMA_EN);
if (readx_poll_timeout(mtk_wdma_read_reset, dev, status,
- !(status & mask), 0, 1000))
+ !(status & mask), 0, 10000))
dev_err(dev->hw->dev, "tx reset failed\n");
- for (i = 0; i < ARRAY_SIZE(dev->tx_wdma); i++) {
- if (dev->tx_wdma[i].desc)
- continue;
+ wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_TX);
+ wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
+ for (i = 0; i < ARRAY_SIZE(dev->tx_wdma); i++)
wdma_w32(dev,
MTK_WDMA_RING_TX(i) + MTK_WED_RING_OFS_CPU_IDX, 0);
- }
}
static void
@@ -573,12 +572,6 @@ mtk_wed_detach(struct mtk_wed_device *de
mtk_wdma_rx_reset(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
- if (mtk_wed_get_rx_capa(dev)) {
- wdma_clr(dev, MTK_WDMA_GLO_CFG, MTK_WDMA_GLO_CFG_TX_DMA_EN);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, MTK_WDMA_RESET_IDX_TX);
- wdma_w32(dev, MTK_WDMA_RESET_IDX, 0);
- }
-
mtk_wed_free_tx_buffer(dev);
mtk_wed_free_tx_rings(dev);

View File

@ -0,0 +1,98 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:53 +0100
Subject: [PATCH] net: ethernet: mtk_wed: update mtk_wed_stop
Update mtk_wed_stop routine and rename old mtk_wed_stop() to
mtk_wed_deinit(). This is a preliminary patch to add Wireless Ethernet
Dispatcher reset support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -539,14 +539,8 @@ mtk_wed_dma_disable(struct mtk_wed_devic
static void
mtk_wed_stop(struct mtk_wed_device *dev)
{
- mtk_wed_dma_disable(dev);
mtk_wed_set_ext_int(dev, false);
- wed_clr(dev, MTK_WED_CTRL,
- MTK_WED_CTRL_WDMA_INT_AGENT_EN |
- MTK_WED_CTRL_WPDMA_INT_AGENT_EN |
- MTK_WED_CTRL_WED_TX_BM_EN |
- MTK_WED_CTRL_WED_TX_FREE_AGENT_EN);
wed_w32(dev, MTK_WED_WPDMA_INT_TRIGGER, 0);
wed_w32(dev, MTK_WED_WDMA_INT_TRIGGER, 0);
wdma_w32(dev, MTK_WDMA_INT_MASK, 0);
@@ -558,7 +552,27 @@ mtk_wed_stop(struct mtk_wed_device *dev)
wed_w32(dev, MTK_WED_EXT_INT_MASK1, 0);
wed_w32(dev, MTK_WED_EXT_INT_MASK2, 0);
- wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_WED_RX_BM_EN);
+}
+
+static void
+mtk_wed_deinit(struct mtk_wed_device *dev)
+{
+ mtk_wed_stop(dev);
+ mtk_wed_dma_disable(dev);
+
+ wed_clr(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_WDMA_INT_AGENT_EN |
+ MTK_WED_CTRL_WPDMA_INT_AGENT_EN |
+ MTK_WED_CTRL_WED_TX_BM_EN |
+ MTK_WED_CTRL_WED_TX_FREE_AGENT_EN);
+
+ if (dev->hw->version == 1)
+ return;
+
+ wed_clr(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_RX_ROUTE_QM_EN |
+ MTK_WED_CTRL_WED_RX_BM_EN |
+ MTK_WED_CTRL_RX_RRO_QM_EN);
}
static void
@@ -568,7 +582,7 @@ mtk_wed_detach(struct mtk_wed_device *de
mutex_lock(&hw_lock);
- mtk_wed_stop(dev);
+ mtk_wed_deinit(dev);
mtk_wdma_rx_reset(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
@@ -670,7 +684,7 @@ mtk_wed_hw_init_early(struct mtk_wed_dev
{
u32 mask, set;
- mtk_wed_stop(dev);
+ mtk_wed_deinit(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
mtk_wed_set_wpdma(dev);
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -234,6 +234,8 @@ mtk_wed_get_rx_capa(struct mtk_wed_devic
(_dev)->ops->ppe_check(_dev, _skb, _reason, _hash)
#define mtk_wed_device_update_msg(_dev, _id, _msg, _len) \
(_dev)->ops->msg_update(_dev, _id, _msg, _len)
+#define mtk_wed_device_stop(_dev) (_dev)->ops->stop(_dev)
+#define mtk_wed_device_dma_reset(_dev) (_dev)->ops->reset_dma(_dev)
#else
static inline bool mtk_wed_device_active(struct mtk_wed_device *dev)
{
@@ -250,6 +252,8 @@ static inline bool mtk_wed_device_active
#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs) -ENODEV
#define mtk_wed_device_ppe_check(_dev, _skb, _reason, _hash) do {} while (0)
#define mtk_wed_device_update_msg(_dev, _id, _msg, _len) -ENODEV
+#define mtk_wed_device_stop(_dev) do {} while (0)
+#define mtk_wed_device_dma_reset(_dev) do {} while (0)
#endif
#endif

View File

@ -0,0 +1,309 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:54 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add mtk_wed_rx_reset routine
Introduce mtk_wed_rx_reset routine in order to reset rx DMA for Wireless
Ethernet Dispatcher available on MT7986 SoC.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -944,42 +944,130 @@ mtk_wed_ring_reset(struct mtk_wed_ring *
}
static u32
-mtk_wed_check_busy(struct mtk_wed_device *dev)
+mtk_wed_check_busy(struct mtk_wed_device *dev, u32 reg, u32 mask)
{
- if (wed_r32(dev, MTK_WED_GLO_CFG) & MTK_WED_GLO_CFG_TX_DMA_BUSY)
- return true;
-
- if (wed_r32(dev, MTK_WED_WPDMA_GLO_CFG) &
- MTK_WED_WPDMA_GLO_CFG_TX_DRV_BUSY)
- return true;
-
- if (wed_r32(dev, MTK_WED_CTRL) & MTK_WED_CTRL_WDMA_INT_AGENT_BUSY)
- return true;
-
- if (wed_r32(dev, MTK_WED_WDMA_GLO_CFG) &
- MTK_WED_WDMA_GLO_CFG_RX_DRV_BUSY)
- return true;
-
- if (wdma_r32(dev, MTK_WDMA_GLO_CFG) &
- MTK_WED_WDMA_GLO_CFG_RX_DRV_BUSY)
- return true;
-
- if (wed_r32(dev, MTK_WED_CTRL) &
- (MTK_WED_CTRL_WED_TX_BM_BUSY | MTK_WED_CTRL_WED_TX_FREE_AGENT_BUSY))
- return true;
-
- return false;
+ return !!(wed_r32(dev, reg) & mask);
}
static int
-mtk_wed_poll_busy(struct mtk_wed_device *dev)
+mtk_wed_poll_busy(struct mtk_wed_device *dev, u32 reg, u32 mask)
{
int sleep = 15000;
int timeout = 100 * sleep;
u32 val;
return read_poll_timeout(mtk_wed_check_busy, val, !val, sleep,
- timeout, false, dev);
+ timeout, false, dev, reg, mask);
+}
+
+static int
+mtk_wed_rx_reset(struct mtk_wed_device *dev)
+{
+ struct mtk_wed_wo *wo = dev->hw->wed_wo;
+ u8 val = MTK_WED_WO_STATE_SER_RESET;
+ int i, ret;
+
+ ret = mtk_wed_mcu_send_msg(wo, MTK_WED_MODULE_ID_WO,
+ MTK_WED_WO_CMD_CHANGE_STATE, &val,
+ sizeof(val), true);
+ if (ret)
+ return ret;
+
+ wed_clr(dev, MTK_WED_WPDMA_RX_D_GLO_CFG, MTK_WED_WPDMA_RX_D_RX_DRV_EN);
+ ret = mtk_wed_poll_busy(dev, MTK_WED_WPDMA_RX_D_GLO_CFG,
+ MTK_WED_WPDMA_RX_D_RX_DRV_BUSY);
+ if (ret) {
+ mtk_wed_reset(dev, MTK_WED_RESET_WPDMA_INT_AGENT);
+ mtk_wed_reset(dev, MTK_WED_RESET_WPDMA_RX_D_DRV);
+ } else {
+ wed_w32(dev, MTK_WED_WPDMA_RX_D_RST_IDX,
+ MTK_WED_WPDMA_RX_D_RST_CRX_IDX |
+ MTK_WED_WPDMA_RX_D_RST_DRV_IDX);
+
+ wed_set(dev, MTK_WED_WPDMA_RX_D_GLO_CFG,
+ MTK_WED_WPDMA_RX_D_RST_INIT_COMPLETE |
+ MTK_WED_WPDMA_RX_D_FSM_RETURN_IDLE);
+ wed_clr(dev, MTK_WED_WPDMA_RX_D_GLO_CFG,
+ MTK_WED_WPDMA_RX_D_RST_INIT_COMPLETE |
+ MTK_WED_WPDMA_RX_D_FSM_RETURN_IDLE);
+
+ wed_w32(dev, MTK_WED_WPDMA_RX_D_RST_IDX, 0);
+ }
+
+ /* reset rro qm */
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_RX_RRO_QM_EN);
+ ret = mtk_wed_poll_busy(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_RX_RRO_QM_BUSY);
+ if (ret) {
+ mtk_wed_reset(dev, MTK_WED_RESET_RX_RRO_QM);
+ } else {
+ wed_set(dev, MTK_WED_RROQM_RST_IDX,
+ MTK_WED_RROQM_RST_IDX_MIOD |
+ MTK_WED_RROQM_RST_IDX_FDBK);
+ wed_w32(dev, MTK_WED_RROQM_RST_IDX, 0);
+ }
+
+ /* reset route qm */
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_RX_ROUTE_QM_EN);
+ ret = mtk_wed_poll_busy(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_RX_ROUTE_QM_BUSY);
+ if (ret)
+ mtk_wed_reset(dev, MTK_WED_RESET_RX_ROUTE_QM);
+ else
+ wed_set(dev, MTK_WED_RTQM_GLO_CFG,
+ MTK_WED_RTQM_Q_RST);
+
+ /* reset tx wdma */
+ mtk_wdma_tx_reset(dev);
+
+ /* reset tx wdma drv */
+ wed_clr(dev, MTK_WED_WDMA_GLO_CFG, MTK_WED_WDMA_GLO_CFG_TX_DRV_EN);
+ mtk_wed_poll_busy(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_WDMA_INT_AGENT_BUSY);
+ mtk_wed_reset(dev, MTK_WED_RESET_WDMA_TX_DRV);
+
+ /* reset wed rx dma */
+ ret = mtk_wed_poll_busy(dev, MTK_WED_GLO_CFG,
+ MTK_WED_GLO_CFG_RX_DMA_BUSY);
+ wed_clr(dev, MTK_WED_GLO_CFG, MTK_WED_GLO_CFG_RX_DMA_EN);
+ if (ret) {
+ mtk_wed_reset(dev, MTK_WED_RESET_WED_RX_DMA);
+ } else {
+ struct mtk_eth *eth = dev->hw->eth;
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ wed_set(dev, MTK_WED_RESET_IDX,
+ MTK_WED_RESET_IDX_RX_V2);
+ else
+ wed_set(dev, MTK_WED_RESET_IDX, MTK_WED_RESET_IDX_RX);
+ wed_w32(dev, MTK_WED_RESET_IDX, 0);
+ }
+
+ /* reset rx bm */
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_WED_RX_BM_EN);
+ mtk_wed_poll_busy(dev, MTK_WED_CTRL,
+ MTK_WED_CTRL_WED_RX_BM_BUSY);
+ mtk_wed_reset(dev, MTK_WED_RESET_RX_BM);
+
+ /* wo change to enable state */
+ val = MTK_WED_WO_STATE_ENABLE;
+ ret = mtk_wed_mcu_send_msg(wo, MTK_WED_MODULE_ID_WO,
+ MTK_WED_WO_CMD_CHANGE_STATE, &val,
+ sizeof(val), true);
+ if (ret)
+ return ret;
+
+ /* wed_rx_ring_reset */
+ for (i = 0; i < ARRAY_SIZE(dev->rx_ring); i++) {
+ if (!dev->rx_ring[i].desc)
+ continue;
+
+ mtk_wed_ring_reset(&dev->rx_ring[i], MTK_WED_RX_RING_SIZE,
+ false);
+ }
+ mtk_wed_free_rx_buffer(dev);
+
+ return 0;
}
static void
@@ -997,19 +1085,23 @@ mtk_wed_reset_dma(struct mtk_wed_device
true);
}
- if (mtk_wed_poll_busy(dev))
- busy = mtk_wed_check_busy(dev);
-
+ /* 1. reset WED tx DMA */
+ wed_clr(dev, MTK_WED_GLO_CFG, MTK_WED_GLO_CFG_TX_DMA_EN);
+ busy = mtk_wed_poll_busy(dev, MTK_WED_GLO_CFG,
+ MTK_WED_GLO_CFG_TX_DMA_BUSY);
if (busy) {
mtk_wed_reset(dev, MTK_WED_RESET_WED_TX_DMA);
} else {
- wed_w32(dev, MTK_WED_RESET_IDX,
- MTK_WED_RESET_IDX_TX |
- MTK_WED_RESET_IDX_RX);
+ wed_w32(dev, MTK_WED_RESET_IDX, MTK_WED_RESET_IDX_TX);
wed_w32(dev, MTK_WED_RESET_IDX, 0);
}
- mtk_wdma_rx_reset(dev);
+ /* 2. reset WDMA rx DMA */
+ busy = !!mtk_wdma_rx_reset(dev);
+ wed_clr(dev, MTK_WED_WDMA_GLO_CFG, MTK_WED_WDMA_GLO_CFG_RX_DRV_EN);
+ if (!busy)
+ busy = mtk_wed_poll_busy(dev, MTK_WED_WDMA_GLO_CFG,
+ MTK_WED_WDMA_GLO_CFG_RX_DRV_BUSY);
if (busy) {
mtk_wed_reset(dev, MTK_WED_RESET_WDMA_INT_AGENT);
@@ -1026,6 +1118,9 @@ mtk_wed_reset_dma(struct mtk_wed_device
MTK_WED_WDMA_GLO_CFG_RST_INIT_COMPLETE);
}
+ /* 3. reset WED WPDMA tx */
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_WED_TX_FREE_AGENT_EN);
+
for (i = 0; i < 100; i++) {
val = wed_r32(dev, MTK_WED_TX_BM_INTF);
if (FIELD_GET(MTK_WED_TX_BM_INTF_TKFIFO_FDEP, val) == 0x40)
@@ -1033,8 +1128,19 @@ mtk_wed_reset_dma(struct mtk_wed_device
}
mtk_wed_reset(dev, MTK_WED_RESET_TX_FREE_AGENT);
+ wed_clr(dev, MTK_WED_CTRL, MTK_WED_CTRL_WED_TX_BM_EN);
mtk_wed_reset(dev, MTK_WED_RESET_TX_BM);
+ /* 4. reset WED WPDMA tx */
+ busy = mtk_wed_poll_busy(dev, MTK_WED_WPDMA_GLO_CFG,
+ MTK_WED_WPDMA_GLO_CFG_TX_DRV_BUSY);
+ wed_clr(dev, MTK_WED_WPDMA_GLO_CFG,
+ MTK_WED_WPDMA_GLO_CFG_TX_DRV_EN |
+ MTK_WED_WPDMA_GLO_CFG_RX_DRV_EN);
+ if (!busy)
+ busy = mtk_wed_poll_busy(dev, MTK_WED_WPDMA_GLO_CFG,
+ MTK_WED_WPDMA_GLO_CFG_RX_DRV_BUSY);
+
if (busy) {
mtk_wed_reset(dev, MTK_WED_RESET_WPDMA_INT_AGENT);
mtk_wed_reset(dev, MTK_WED_RESET_WPDMA_TX_DRV);
@@ -1045,6 +1151,17 @@ mtk_wed_reset_dma(struct mtk_wed_device
MTK_WED_WPDMA_RESET_IDX_RX);
wed_w32(dev, MTK_WED_WPDMA_RESET_IDX, 0);
}
+
+ dev->init_done = false;
+ if (dev->hw->version == 1)
+ return;
+
+ if (!busy) {
+ wed_w32(dev, MTK_WED_RESET_IDX, MTK_WED_RESET_WPDMA_IDX_RX);
+ wed_w32(dev, MTK_WED_RESET_IDX, 0);
+ }
+
+ mtk_wed_rx_reset(dev);
}
static int
@@ -1267,6 +1384,9 @@ mtk_wed_start(struct mtk_wed_device *dev
{
int i;
+ if (mtk_wed_get_rx_capa(dev) && mtk_wed_rx_buffer_alloc(dev))
+ return;
+
for (i = 0; i < ARRAY_SIZE(dev->rx_wdma); i++)
if (!dev->rx_wdma[i].desc)
mtk_wed_wdma_rx_ring_setup(dev, i, 16);
@@ -1355,10 +1475,6 @@ mtk_wed_attach(struct mtk_wed_device *de
goto out;
if (mtk_wed_get_rx_capa(dev)) {
- ret = mtk_wed_rx_buffer_alloc(dev);
- if (ret)
- goto out;
-
ret = mtk_wed_rro_alloc(dev);
if (ret)
goto out;
--- a/drivers/net/ethernet/mediatek/mtk_wed_regs.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_regs.h
@@ -24,11 +24,15 @@ struct mtk_wdma_desc {
#define MTK_WED_RESET 0x008
#define MTK_WED_RESET_TX_BM BIT(0)
+#define MTK_WED_RESET_RX_BM BIT(1)
#define MTK_WED_RESET_TX_FREE_AGENT BIT(4)
#define MTK_WED_RESET_WPDMA_TX_DRV BIT(8)
#define MTK_WED_RESET_WPDMA_RX_DRV BIT(9)
+#define MTK_WED_RESET_WPDMA_RX_D_DRV BIT(10)
#define MTK_WED_RESET_WPDMA_INT_AGENT BIT(11)
#define MTK_WED_RESET_WED_TX_DMA BIT(12)
+#define MTK_WED_RESET_WED_RX_DMA BIT(13)
+#define MTK_WED_RESET_WDMA_TX_DRV BIT(16)
#define MTK_WED_RESET_WDMA_RX_DRV BIT(17)
#define MTK_WED_RESET_WDMA_INT_AGENT BIT(19)
#define MTK_WED_RESET_RX_RRO_QM BIT(20)
@@ -158,6 +162,8 @@ struct mtk_wdma_desc {
#define MTK_WED_RESET_IDX 0x20c
#define MTK_WED_RESET_IDX_TX GENMASK(3, 0)
#define MTK_WED_RESET_IDX_RX GENMASK(17, 16)
+#define MTK_WED_RESET_IDX_RX_V2 GENMASK(7, 6)
+#define MTK_WED_RESET_WPDMA_IDX_RX GENMASK(31, 30)
#define MTK_WED_TX_MIB(_n) (0x2a0 + (_n) * 4)
#define MTK_WED_RX_MIB(_n) (0x2e0 + (_n) * 4)
@@ -267,6 +273,9 @@ struct mtk_wdma_desc {
#define MTK_WED_WPDMA_RX_D_GLO_CFG 0x75c
#define MTK_WED_WPDMA_RX_D_RX_DRV_EN BIT(0)
+#define MTK_WED_WPDMA_RX_D_RX_DRV_BUSY BIT(1)
+#define MTK_WED_WPDMA_RX_D_FSM_RETURN_IDLE BIT(3)
+#define MTK_WED_WPDMA_RX_D_RST_INIT_COMPLETE BIT(4)
#define MTK_WED_WPDMA_RX_D_INIT_PHASE_RXEN_SEL GENMASK(11, 7)
#define MTK_WED_WPDMA_RX_D_RXD_READ_LEN GENMASK(31, 24)

View File

@ -0,0 +1,103 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 24 Nov 2022 16:22:55 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add reset to tx_ring_setup callback
Introduce reset parameter to mtk_wed_tx_ring_setup signature.
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -1181,7 +1181,8 @@ mtk_wed_ring_alloc(struct mtk_wed_device
}
static int
-mtk_wed_wdma_rx_ring_setup(struct mtk_wed_device *dev, int idx, int size)
+mtk_wed_wdma_rx_ring_setup(struct mtk_wed_device *dev, int idx, int size,
+ bool reset)
{
u32 desc_size = sizeof(struct mtk_wdma_desc) * dev->hw->version;
struct mtk_wed_ring *wdma;
@@ -1190,8 +1191,8 @@ mtk_wed_wdma_rx_ring_setup(struct mtk_we
return -EINVAL;
wdma = &dev->rx_wdma[idx];
- if (mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE, desc_size,
- true))
+ if (!reset && mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE,
+ desc_size, true))
return -ENOMEM;
wdma_w32(dev, MTK_WDMA_RING_RX(idx) + MTK_WED_RING_OFS_BASE,
@@ -1389,7 +1390,7 @@ mtk_wed_start(struct mtk_wed_device *dev
for (i = 0; i < ARRAY_SIZE(dev->rx_wdma); i++)
if (!dev->rx_wdma[i].desc)
- mtk_wed_wdma_rx_ring_setup(dev, i, 16);
+ mtk_wed_wdma_rx_ring_setup(dev, i, 16, false);
mtk_wed_hw_init(dev);
mtk_wed_configure_irq(dev, irq_mask);
@@ -1498,7 +1499,8 @@ unlock:
}
static int
-mtk_wed_tx_ring_setup(struct mtk_wed_device *dev, int idx, void __iomem *regs)
+mtk_wed_tx_ring_setup(struct mtk_wed_device *dev, int idx, void __iomem *regs,
+ bool reset)
{
struct mtk_wed_ring *ring = &dev->tx_ring[idx];
@@ -1517,11 +1519,12 @@ mtk_wed_tx_ring_setup(struct mtk_wed_dev
if (WARN_ON(idx >= ARRAY_SIZE(dev->tx_ring)))
return -EINVAL;
- if (mtk_wed_ring_alloc(dev, ring, MTK_WED_TX_RING_SIZE,
- sizeof(*ring->desc), true))
+ if (!reset && mtk_wed_ring_alloc(dev, ring, MTK_WED_TX_RING_SIZE,
+ sizeof(*ring->desc), true))
return -ENOMEM;
- if (mtk_wed_wdma_rx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE))
+ if (mtk_wed_wdma_rx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE,
+ reset))
return -ENOMEM;
ring->reg_base = MTK_WED_RING_TX(idx);
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -158,7 +158,7 @@ struct mtk_wed_device {
struct mtk_wed_ops {
int (*attach)(struct mtk_wed_device *dev);
int (*tx_ring_setup)(struct mtk_wed_device *dev, int ring,
- void __iomem *regs);
+ void __iomem *regs, bool reset);
int (*rx_ring_setup)(struct mtk_wed_device *dev, int ring,
void __iomem *regs);
int (*txfree_ring_setup)(struct mtk_wed_device *dev,
@@ -216,8 +216,8 @@ mtk_wed_get_rx_capa(struct mtk_wed_devic
#define mtk_wed_device_active(_dev) !!(_dev)->ops
#define mtk_wed_device_detach(_dev) (_dev)->ops->detach(_dev)
#define mtk_wed_device_start(_dev, _mask) (_dev)->ops->start(_dev, _mask)
-#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs) \
- (_dev)->ops->tx_ring_setup(_dev, _ring, _regs)
+#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs, _reset) \
+ (_dev)->ops->tx_ring_setup(_dev, _ring, _regs, _reset)
#define mtk_wed_device_txfree_ring_setup(_dev, _regs) \
(_dev)->ops->txfree_ring_setup(_dev, _regs)
#define mtk_wed_device_reg_read(_dev, _reg) \
@@ -243,7 +243,7 @@ static inline bool mtk_wed_device_active
}
#define mtk_wed_device_detach(_dev) do {} while (0)
#define mtk_wed_device_start(_dev, _mask) do {} while (0)
-#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs) -ENODEV
+#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs, _reset) -ENODEV
#define mtk_wed_device_txfree_ring_setup(_dev, _ring, _regs) -ENODEV
#define mtk_wed_device_reg_read(_dev, _reg) 0
#define mtk_wed_device_reg_write(_dev, _reg, _val) do {} while (0)

View File

@ -0,0 +1,103 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 1 Dec 2022 16:26:53 +0100
Subject: [PATCH] net: ethernet: mtk_wed: fix sleep while atomic in
mtk_wed_wo_queue_refill
In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.
[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c
Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -133,17 +133,18 @@ mtk_wed_wo_dequeue(struct mtk_wed_wo *wo
static int
mtk_wed_wo_queue_refill(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q,
- gfp_t gfp, bool rx)
+ bool rx)
{
enum dma_data_direction dir = rx ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
int n_buf = 0;
spin_lock_bh(&q->lock);
while (q->queued < q->n_desc) {
- void *buf = page_frag_alloc(&q->cache, q->buf_size, gfp);
struct mtk_wed_wo_queue_entry *entry;
dma_addr_t addr;
+ void *buf;
+ buf = page_frag_alloc(&q->cache, q->buf_size, GFP_ATOMIC);
if (!buf)
break;
@@ -215,7 +216,7 @@ mtk_wed_wo_rx_run_queue(struct mtk_wed_w
mtk_wed_mcu_rx_unsolicited_event(wo, skb);
}
- if (mtk_wed_wo_queue_refill(wo, q, GFP_ATOMIC, true)) {
+ if (mtk_wed_wo_queue_refill(wo, q, true)) {
u32 index = (q->head - 1) % q->n_desc;
mtk_wed_wo_queue_kick(wo, q, index);
@@ -432,7 +433,7 @@ mtk_wed_wo_hardware_init(struct mtk_wed_
if (ret)
goto error;
- mtk_wed_wo_queue_refill(wo, &wo->q_tx, GFP_KERNEL, false);
+ mtk_wed_wo_queue_refill(wo, &wo->q_tx, false);
mtk_wed_wo_queue_reset(wo, &wo->q_tx);
regs.desc_base = MTK_WED_WO_CCIF_DUMMY5;
@@ -446,7 +447,7 @@ mtk_wed_wo_hardware_init(struct mtk_wed_
if (ret)
goto error;
- mtk_wed_wo_queue_refill(wo, &wo->q_rx, GFP_KERNEL, true);
+ mtk_wed_wo_queue_refill(wo, &wo->q_rx, true);
mtk_wed_wo_queue_reset(wo, &wo->q_rx);
/* rx queue irqmask */

View File

@ -0,0 +1,52 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Tue, 10 Jan 2023 10:31:26 +0100
Subject: [PATCH] net: ethernet: mtk_wed: get rid of queue lock for rx queue
Queue spinlock is currently held in mtk_wed_wo_queue_rx_clean and
mtk_wed_wo_queue_refill routines for MTK Wireless Ethernet Dispatcher
MCU rx queue. mtk_wed_wo_queue_refill() is running during initialization
and in rx tasklet while mtk_wed_wo_queue_rx_clean() is running in
mtk_wed_wo_hw_deinit() during hw de-init phase after rx tasklet has been
disabled. Since mtk_wed_wo_queue_rx_clean and mtk_wed_wo_queue_refill
routines can't run concurrently get rid of spinlock for mcu rx queue.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/36ec3b729542ea60898471d890796f745479ba32.1673342990.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -138,7 +138,6 @@ mtk_wed_wo_queue_refill(struct mtk_wed_w
enum dma_data_direction dir = rx ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
int n_buf = 0;
- spin_lock_bh(&q->lock);
while (q->queued < q->n_desc) {
struct mtk_wed_wo_queue_entry *entry;
dma_addr_t addr;
@@ -172,7 +171,6 @@ mtk_wed_wo_queue_refill(struct mtk_wed_w
q->queued++;
n_buf++;
}
- spin_unlock_bh(&q->lock);
return n_buf;
}
@@ -316,7 +314,6 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed
{
struct page *page;
- spin_lock_bh(&q->lock);
for (;;) {
void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true);
@@ -325,7 +322,6 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed
skb_free_frag(buf);
}
- spin_unlock_bh(&q->lock);
if (!q->cache.va)
return;

View File

@ -0,0 +1,75 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Thu, 12 Jan 2023 10:21:29 +0100
Subject: [PATCH] net: ethernet: mtk_wed: get rid of queue lock for tx queue
Similar to MTK Wireless Ethernet Dispatcher (WED) MCU rx queue,
we do not need to protect WED MCU tx queue with a spin lock since
the tx queue is accessed in the two following routines:
- mtk_wed_wo_queue_tx_skb():
it is run at initialization and during mt7915 normal operation.
Moreover MCU messages are serialized through MCU mutex.
- mtk_wed_wo_queue_tx_clean():
it runs just at mt7915 driver module unload when no more messages
are sent to the MCU.
Remove tx queue spinlock.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/7bd0337b2a13ab1a63673b7c03fd35206b3b284e.1673515140.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -258,7 +258,6 @@ mtk_wed_wo_queue_alloc(struct mtk_wed_wo
int n_desc, int buf_size, int index,
struct mtk_wed_wo_queue_regs *regs)
{
- spin_lock_init(&q->lock);
q->regs = *regs;
q->n_desc = n_desc;
q->buf_size = buf_size;
@@ -290,7 +289,6 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed
struct page *page;
int i;
- spin_lock_bh(&q->lock);
for (i = 0; i < q->n_desc; i++) {
struct mtk_wed_wo_queue_entry *entry = &q->entry[i];
@@ -299,7 +297,6 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed
skb_free_frag(entry->buf);
entry->buf = NULL;
}
- spin_unlock_bh(&q->lock);
if (!q->cache.va)
return;
@@ -347,8 +344,6 @@ int mtk_wed_wo_queue_tx_skb(struct mtk_w
int ret = 0, index;
u32 ctrl;
- spin_lock_bh(&q->lock);
-
q->tail = mtk_wed_mmio_r32(wo, q->regs.dma_idx);
index = (q->head + 1) % q->n_desc;
if (q->tail == index) {
@@ -379,8 +374,6 @@ int mtk_wed_wo_queue_tx_skb(struct mtk_w
mtk_wed_wo_queue_kick(wo, q, q->head);
mtk_wed_wo_kickout(wo);
out:
- spin_unlock_bh(&q->lock);
-
dev_kfree_skb(skb);
return ret;
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h
@@ -211,7 +211,6 @@ struct mtk_wed_wo_queue {
struct mtk_wed_wo_queue_regs regs;
struct page_frag_cache cache;
- spinlock_t lock;
struct mtk_wed_wo_queue_desc *desc;
dma_addr_t desc_dma;

View File

@ -0,0 +1,70 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:28 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: introduce mtk_hw_reset utility
routine
This is a preliminary patch to add Wireless Ethernet Dispatcher reset
support.
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3254,6 +3254,27 @@ static void mtk_set_mcr_max_rx(struct mt
mtk_w32(mac->hw, mcr_new, MTK_MAC_MCR(mac->id));
}
+static void mtk_hw_reset(struct mtk_eth *eth)
+{
+ u32 val;
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
+ regmap_write(eth->ethsys, ETHSYS_FE_RST_CHK_IDLE_EN, 0);
+ val = RSTCTRL_PPE0_V2;
+ } else {
+ val = RSTCTRL_PPE0;
+ }
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
+ val |= RSTCTRL_PPE1;
+
+ ethsys_reset(eth, RSTCTRL_ETH | RSTCTRL_FE | val);
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ regmap_write(eth->ethsys, ETHSYS_FE_RST_CHK_IDLE_EN,
+ 0x3ffffff);
+}
+
static int mtk_hw_init(struct mtk_eth *eth)
{
u32 dma_mask = ETHSYS_DMA_AG_MAP_PDMA | ETHSYS_DMA_AG_MAP_QDMA |
@@ -3293,22 +3314,9 @@ static int mtk_hw_init(struct mtk_eth *e
return 0;
}
- if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
- regmap_write(eth->ethsys, ETHSYS_FE_RST_CHK_IDLE_EN, 0);
- val = RSTCTRL_PPE0_V2;
- } else {
- val = RSTCTRL_PPE0;
- }
-
- if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
- val |= RSTCTRL_PPE1;
-
- ethsys_reset(eth, RSTCTRL_ETH | RSTCTRL_FE | val);
+ mtk_hw_reset(eth);
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
- regmap_write(eth->ethsys, ETHSYS_FE_RST_CHK_IDLE_EN,
- 0x3ffffff);
-
/* Set FE to PDMAv2 if necessary */
val = mtk_r32(eth, MTK_FE_GLO_MISC);
mtk_w32(eth, val | BIT(4), MTK_FE_GLO_MISC);

View File

@ -0,0 +1,107 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:29 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: introduce mtk_hw_warm_reset
support
Introduce mtk_hw_warm_reset utility routine. This is a preliminary patch
to align reset procedure to vendor sdk and avoid to power down the chip
during hw reset.
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3275,7 +3275,54 @@ static void mtk_hw_reset(struct mtk_eth
0x3ffffff);
}
-static int mtk_hw_init(struct mtk_eth *eth)
+static u32 mtk_hw_reset_read(struct mtk_eth *eth)
+{
+ u32 val;
+
+ regmap_read(eth->ethsys, ETHSYS_RSTCTRL, &val);
+ return val;
+}
+
+static void mtk_hw_warm_reset(struct mtk_eth *eth)
+{
+ u32 rst_mask, val;
+
+ regmap_update_bits(eth->ethsys, ETHSYS_RSTCTRL, RSTCTRL_FE,
+ RSTCTRL_FE);
+ if (readx_poll_timeout_atomic(mtk_hw_reset_read, eth, val,
+ val & RSTCTRL_FE, 1, 1000)) {
+ dev_err(eth->dev, "warm reset failed\n");
+ mtk_hw_reset(eth);
+ return;
+ }
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
+ rst_mask = RSTCTRL_ETH | RSTCTRL_PPE0_V2;
+ else
+ rst_mask = RSTCTRL_ETH | RSTCTRL_PPE0;
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
+ rst_mask |= RSTCTRL_PPE1;
+
+ regmap_update_bits(eth->ethsys, ETHSYS_RSTCTRL, rst_mask, rst_mask);
+
+ udelay(1);
+ val = mtk_hw_reset_read(eth);
+ if (!(val & rst_mask))
+ dev_err(eth->dev, "warm reset stage0 failed %08x (%08x)\n",
+ val, rst_mask);
+
+ rst_mask |= RSTCTRL_FE;
+ regmap_update_bits(eth->ethsys, ETHSYS_RSTCTRL, rst_mask, ~rst_mask);
+
+ udelay(1);
+ val = mtk_hw_reset_read(eth);
+ if (val & rst_mask)
+ dev_err(eth->dev, "warm reset stage1 failed %08x (%08x)\n",
+ val, rst_mask);
+}
+
+static int mtk_hw_init(struct mtk_eth *eth, bool reset)
{
u32 dma_mask = ETHSYS_DMA_AG_MAP_PDMA | ETHSYS_DMA_AG_MAP_QDMA |
ETHSYS_DMA_AG_MAP_PPE;
@@ -3314,7 +3361,12 @@ static int mtk_hw_init(struct mtk_eth *e
return 0;
}
- mtk_hw_reset(eth);
+ msleep(100);
+
+ if (reset)
+ mtk_hw_warm_reset(eth);
+ else
+ mtk_hw_reset(eth);
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
/* Set FE to PDMAv2 if necessary */
@@ -3522,7 +3574,7 @@ static void mtk_pending_work(struct work
if (eth->dev->pins)
pinctrl_select_state(eth->dev->pins->p,
eth->dev->pins->default_state);
- mtk_hw_init(eth);
+ mtk_hw_init(eth, true);
/* restart DMA and enable IRQs */
for (i = 0; i < MTK_MAC_COUNT; i++) {
@@ -4114,7 +4166,7 @@ static int mtk_probe(struct platform_dev
eth->msg_enable = netif_msg_init(mtk_msg_level, MTK_DEFAULT_MSG_ENABLE);
INIT_WORK(&eth->pending_work, mtk_pending_work);
- err = mtk_hw_init(eth);
+ err = mtk_hw_init(eth, false);
if (err)
goto err_wed_exit;

View File

@ -0,0 +1,262 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:30 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: align reset procedure to vendor
sdk
Avoid to power-down the ethernet chip during hw reset and align reset
procedure to vendor sdk.
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2842,14 +2842,29 @@ static void mtk_dma_free(struct mtk_eth
kfree(eth->scratch_head);
}
+static bool mtk_hw_reset_check(struct mtk_eth *eth)
+{
+ u32 val = mtk_r32(eth, MTK_INT_STATUS2);
+
+ return (val & MTK_FE_INT_FQ_EMPTY) || (val & MTK_FE_INT_RFIFO_UF) ||
+ (val & MTK_FE_INT_RFIFO_OV) || (val & MTK_FE_INT_TSO_FAIL) ||
+ (val & MTK_FE_INT_TSO_ALIGN) || (val & MTK_FE_INT_TSO_ILLEGAL);
+}
+
static void mtk_tx_timeout(struct net_device *dev, unsigned int txqueue)
{
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
+ if (test_bit(MTK_RESETTING, &eth->state))
+ return;
+
+ if (!mtk_hw_reset_check(eth))
+ return;
+
eth->netdev[mac->id]->stats.tx_errors++;
- netif_err(eth, tx_err, dev,
- "transmit timed out\n");
+ netif_err(eth, tx_err, dev, "transmit timed out\n");
+
schedule_work(&eth->pending_work);
}
@@ -3329,15 +3344,17 @@ static int mtk_hw_init(struct mtk_eth *e
const struct mtk_reg_map *reg_map = eth->soc->reg_map;
int i, val, ret;
- if (test_and_set_bit(MTK_HW_INIT, &eth->state))
+ if (!reset && test_and_set_bit(MTK_HW_INIT, &eth->state))
return 0;
- pm_runtime_enable(eth->dev);
- pm_runtime_get_sync(eth->dev);
+ if (!reset) {
+ pm_runtime_enable(eth->dev);
+ pm_runtime_get_sync(eth->dev);
- ret = mtk_clk_enable(eth);
- if (ret)
- goto err_disable_pm;
+ ret = mtk_clk_enable(eth);
+ if (ret)
+ goto err_disable_pm;
+ }
if (eth->ethsys)
regmap_update_bits(eth->ethsys, ETHSYS_DMA_AG_MAP, dma_mask,
@@ -3466,8 +3483,10 @@ static int mtk_hw_init(struct mtk_eth *e
return 0;
err_disable_pm:
- pm_runtime_put_sync(eth->dev);
- pm_runtime_disable(eth->dev);
+ if (!reset) {
+ pm_runtime_put_sync(eth->dev);
+ pm_runtime_disable(eth->dev);
+ }
return ret;
}
@@ -3546,30 +3565,53 @@ static int mtk_do_ioctl(struct net_devic
return -EOPNOTSUPP;
}
+static void mtk_prepare_for_reset(struct mtk_eth *eth)
+{
+ u32 val;
+ int i;
+
+ /* disabe FE P3 and P4 */
+ val = mtk_r32(eth, MTK_FE_GLO_CFG) | MTK_FE_LINK_DOWN_P3;
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
+ val |= MTK_FE_LINK_DOWN_P4;
+ mtk_w32(eth, val, MTK_FE_GLO_CFG);
+
+ /* adjust PPE configurations to prepare for reset */
+ for (i = 0; i < ARRAY_SIZE(eth->ppe); i++)
+ mtk_ppe_prepare_reset(eth->ppe[i]);
+
+ /* disable NETSYS interrupts */
+ mtk_w32(eth, 0, MTK_FE_INT_ENABLE);
+
+ /* force link down GMAC */
+ for (i = 0; i < 2; i++) {
+ val = mtk_r32(eth, MTK_MAC_MCR(i)) & ~MAC_MCR_FORCE_LINK;
+ mtk_w32(eth, val, MTK_MAC_MCR(i));
+ }
+}
+
static void mtk_pending_work(struct work_struct *work)
{
struct mtk_eth *eth = container_of(work, struct mtk_eth, pending_work);
- int err, i;
unsigned long restart = 0;
+ u32 val;
+ int i;
rtnl_lock();
-
- dev_dbg(eth->dev, "[%s][%d] reset\n", __func__, __LINE__);
set_bit(MTK_RESETTING, &eth->state);
+ mtk_prepare_for_reset(eth);
+
/* stop all devices to make sure that dma is properly shut down */
for (i = 0; i < MTK_MAC_COUNT; i++) {
- if (!eth->netdev[i])
+ if (!eth->netdev[i] || !netif_running(eth->netdev[i]))
continue;
+
mtk_stop(eth->netdev[i]);
__set_bit(i, &restart);
}
- dev_dbg(eth->dev, "[%s][%d] mtk_stop ends\n", __func__, __LINE__);
- /* restart underlying hardware such as power, clock, pin mux
- * and the connected phy
- */
- mtk_hw_deinit(eth);
+ usleep_range(15000, 16000);
if (eth->dev->pins)
pinctrl_select_state(eth->dev->pins->p,
@@ -3580,15 +3622,19 @@ static void mtk_pending_work(struct work
for (i = 0; i < MTK_MAC_COUNT; i++) {
if (!test_bit(i, &restart))
continue;
- err = mtk_open(eth->netdev[i]);
- if (err) {
+
+ if (mtk_open(eth->netdev[i])) {
netif_alert(eth, ifup, eth->netdev[i],
- "Driver up/down cycle failed, closing device.\n");
+ "Driver up/down cycle failed\n");
dev_close(eth->netdev[i]);
}
}
- dev_dbg(eth->dev, "[%s][%d] reset done\n", __func__, __LINE__);
+ /* enabe FE P3 and P4 */
+ val = mtk_r32(eth, MTK_FE_GLO_CFG) & ~MTK_FE_LINK_DOWN_P3;
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_RSTCTRL_PPE1))
+ val &= ~MTK_FE_LINK_DOWN_P4;
+ mtk_w32(eth, val, MTK_FE_GLO_CFG);
clear_bit(MTK_RESETTING, &eth->state);
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -72,12 +72,24 @@
#define MTK_HW_LRO_REPLACE_DELTA 1000
#define MTK_HW_LRO_SDL_REMAIN_ROOM 1522
+/* Frame Engine Global Configuration */
+#define MTK_FE_GLO_CFG 0x00
+#define MTK_FE_LINK_DOWN_P3 BIT(11)
+#define MTK_FE_LINK_DOWN_P4 BIT(12)
+
/* Frame Engine Global Reset Register */
#define MTK_RST_GL 0x04
#define RST_GL_PSE BIT(0)
/* Frame Engine Interrupt Status Register */
#define MTK_INT_STATUS2 0x08
+#define MTK_FE_INT_ENABLE 0x0c
+#define MTK_FE_INT_FQ_EMPTY BIT(8)
+#define MTK_FE_INT_TSO_FAIL BIT(12)
+#define MTK_FE_INT_TSO_ILLEGAL BIT(13)
+#define MTK_FE_INT_TSO_ALIGN BIT(14)
+#define MTK_FE_INT_RFIFO_OV BIT(18)
+#define MTK_FE_INT_RFIFO_UF BIT(19)
#define MTK_GDM1_AF BIT(28)
#define MTK_GDM2_AF BIT(29)
--- a/drivers/net/ethernet/mediatek/mtk_ppe.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.c
@@ -710,6 +710,33 @@ int mtk_foe_entry_idle_time(struct mtk_p
return __mtk_foe_entry_idle_time(ppe, entry->data.ib1);
}
+int mtk_ppe_prepare_reset(struct mtk_ppe *ppe)
+{
+ if (!ppe)
+ return -EINVAL;
+
+ /* disable KA */
+ ppe_clear(ppe, MTK_PPE_TB_CFG, MTK_PPE_TB_CFG_KEEPALIVE);
+ ppe_clear(ppe, MTK_PPE_BIND_LMT1, MTK_PPE_NTU_KEEPALIVE);
+ ppe_w32(ppe, MTK_PPE_KEEPALIVE, 0);
+ usleep_range(10000, 11000);
+
+ /* set KA timer to maximum */
+ ppe_set(ppe, MTK_PPE_BIND_LMT1, MTK_PPE_NTU_KEEPALIVE);
+ ppe_w32(ppe, MTK_PPE_KEEPALIVE, 0xffffffff);
+
+ /* set KA tick select */
+ ppe_set(ppe, MTK_PPE_TB_CFG, MTK_PPE_TB_TICK_SEL);
+ ppe_set(ppe, MTK_PPE_TB_CFG, MTK_PPE_TB_CFG_KEEPALIVE);
+ usleep_range(10000, 11000);
+
+ /* disable scan mode */
+ ppe_clear(ppe, MTK_PPE_TB_CFG, MTK_PPE_TB_CFG_SCAN_MODE);
+ usleep_range(10000, 11000);
+
+ return mtk_ppe_wait_busy(ppe);
+}
+
struct mtk_ppe *mtk_ppe_init(struct mtk_eth *eth, void __iomem *base,
int version, int index)
{
--- a/drivers/net/ethernet/mediatek/mtk_ppe.h
+++ b/drivers/net/ethernet/mediatek/mtk_ppe.h
@@ -306,6 +306,7 @@ struct mtk_ppe *mtk_ppe_init(struct mtk_
void mtk_ppe_deinit(struct mtk_eth *eth);
void mtk_ppe_start(struct mtk_ppe *ppe);
int mtk_ppe_stop(struct mtk_ppe *ppe);
+int mtk_ppe_prepare_reset(struct mtk_ppe *ppe);
void __mtk_ppe_check_skb(struct mtk_ppe *ppe, struct sk_buff *skb, u16 hash);
--- a/drivers/net/ethernet/mediatek/mtk_ppe_regs.h
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_regs.h
@@ -58,6 +58,12 @@
#define MTK_PPE_TB_CFG_SCAN_MODE GENMASK(17, 16)
#define MTK_PPE_TB_CFG_HASH_DEBUG GENMASK(19, 18)
#define MTK_PPE_TB_CFG_INFO_SEL BIT(20)
+#define MTK_PPE_TB_TICK_SEL BIT(24)
+
+#define MTK_PPE_BIND_LMT1 0x230
+#define MTK_PPE_NTU_KEEPALIVE GENMASK(23, 16)
+
+#define MTK_PPE_KEEPALIVE 0x234
enum {
MTK_PPE_SCAN_MODE_DISABLED,

View File

@ -0,0 +1,249 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:31 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: add dma checks to
mtk_hw_reset_check
Introduce mtk_hw_check_dma_hang routine to monitor possible dma hangs.
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -50,6 +50,7 @@ static const struct mtk_reg_map mtk_reg_
.delay_irq = 0x0a0c,
.irq_status = 0x0a20,
.irq_mask = 0x0a28,
+ .adma_rx_dbg0 = 0x0a38,
.int_grp = 0x0a50,
},
.qdma = {
@@ -79,6 +80,8 @@ static const struct mtk_reg_map mtk_reg_
[0] = 0x2800,
[1] = 0x2c00,
},
+ .pse_iq_sta = 0x0110,
+ .pse_oq_sta = 0x0118,
};
static const struct mtk_reg_map mt7628_reg_map = {
@@ -109,6 +112,7 @@ static const struct mtk_reg_map mt7986_r
.delay_irq = 0x620c,
.irq_status = 0x6220,
.irq_mask = 0x6228,
+ .adma_rx_dbg0 = 0x6238,
.int_grp = 0x6250,
},
.qdma = {
@@ -138,6 +142,8 @@ static const struct mtk_reg_map mt7986_r
[0] = 0x4800,
[1] = 0x4c00,
},
+ .pse_iq_sta = 0x0180,
+ .pse_oq_sta = 0x01a0,
};
/* strings used by ethtool */
@@ -3337,6 +3343,102 @@ static void mtk_hw_warm_reset(struct mtk
val, rst_mask);
}
+static bool mtk_hw_check_dma_hang(struct mtk_eth *eth)
+{
+ const struct mtk_reg_map *reg_map = eth->soc->reg_map;
+ bool gmac1_tx, gmac2_tx, gdm1_tx, gdm2_tx;
+ bool oq_hang, cdm1_busy, adma_busy;
+ bool wtx_busy, cdm_full, oq_free;
+ u32 wdidx, val, gdm1_fc, gdm2_fc;
+ bool qfsm_hang, qfwd_hang;
+ bool ret = false;
+
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628))
+ return false;
+
+ /* WDMA sanity checks */
+ wdidx = mtk_r32(eth, reg_map->wdma_base[0] + 0xc);
+
+ val = mtk_r32(eth, reg_map->wdma_base[0] + 0x204);
+ wtx_busy = FIELD_GET(MTK_TX_DMA_BUSY, val);
+
+ val = mtk_r32(eth, reg_map->wdma_base[0] + 0x230);
+ cdm_full = !FIELD_GET(MTK_CDM_TXFIFO_RDY, val);
+
+ oq_free = (!(mtk_r32(eth, reg_map->pse_oq_sta) & GENMASK(24, 16)) &&
+ !(mtk_r32(eth, reg_map->pse_oq_sta + 0x4) & GENMASK(8, 0)) &&
+ !(mtk_r32(eth, reg_map->pse_oq_sta + 0x10) & GENMASK(24, 16)));
+
+ if (wdidx == eth->reset.wdidx && wtx_busy && cdm_full && oq_free) {
+ if (++eth->reset.wdma_hang_count > 2) {
+ eth->reset.wdma_hang_count = 0;
+ ret = true;
+ }
+ goto out;
+ }
+
+ /* QDMA sanity checks */
+ qfsm_hang = !!mtk_r32(eth, reg_map->qdma.qtx_cfg + 0x234);
+ qfwd_hang = !mtk_r32(eth, reg_map->qdma.qtx_cfg + 0x308);
+
+ gdm1_tx = FIELD_GET(GENMASK(31, 16), mtk_r32(eth, MTK_FE_GDM1_FSM)) > 0;
+ gdm2_tx = FIELD_GET(GENMASK(31, 16), mtk_r32(eth, MTK_FE_GDM2_FSM)) > 0;
+ gmac1_tx = FIELD_GET(GENMASK(31, 24), mtk_r32(eth, MTK_MAC_FSM(0))) != 1;
+ gmac2_tx = FIELD_GET(GENMASK(31, 24), mtk_r32(eth, MTK_MAC_FSM(1))) != 1;
+ gdm1_fc = mtk_r32(eth, reg_map->gdm1_cnt + 0x24);
+ gdm2_fc = mtk_r32(eth, reg_map->gdm1_cnt + 0x64);
+
+ if (qfsm_hang && qfwd_hang &&
+ ((gdm1_tx && gmac1_tx && gdm1_fc < 1) ||
+ (gdm2_tx && gmac2_tx && gdm2_fc < 1))) {
+ if (++eth->reset.qdma_hang_count > 2) {
+ eth->reset.qdma_hang_count = 0;
+ ret = true;
+ }
+ goto out;
+ }
+
+ /* ADMA sanity checks */
+ oq_hang = !!(mtk_r32(eth, reg_map->pse_oq_sta) & GENMASK(8, 0));
+ cdm1_busy = !!(mtk_r32(eth, MTK_FE_CDM1_FSM) & GENMASK(31, 16));
+ adma_busy = !(mtk_r32(eth, reg_map->pdma.adma_rx_dbg0) & GENMASK(4, 0)) &&
+ !(mtk_r32(eth, reg_map->pdma.adma_rx_dbg0) & BIT(6));
+
+ if (oq_hang && cdm1_busy && adma_busy) {
+ if (++eth->reset.adma_hang_count > 2) {
+ eth->reset.adma_hang_count = 0;
+ ret = true;
+ }
+ goto out;
+ }
+
+ eth->reset.wdma_hang_count = 0;
+ eth->reset.qdma_hang_count = 0;
+ eth->reset.adma_hang_count = 0;
+out:
+ eth->reset.wdidx = wdidx;
+
+ return ret;
+}
+
+static void mtk_hw_reset_monitor_work(struct work_struct *work)
+{
+ struct delayed_work *del_work = to_delayed_work(work);
+ struct mtk_eth *eth = container_of(del_work, struct mtk_eth,
+ reset.monitor_work);
+
+ if (test_bit(MTK_RESETTING, &eth->state))
+ goto out;
+
+ /* DMA stuck checks */
+ if (mtk_hw_check_dma_hang(eth))
+ schedule_work(&eth->pending_work);
+
+out:
+ schedule_delayed_work(&eth->reset.monitor_work,
+ MTK_DMA_MONITOR_TIMEOUT);
+}
+
static int mtk_hw_init(struct mtk_eth *eth, bool reset)
{
u32 dma_mask = ETHSYS_DMA_AG_MAP_PDMA | ETHSYS_DMA_AG_MAP_QDMA |
@@ -3672,6 +3774,7 @@ static int mtk_cleanup(struct mtk_eth *e
mtk_unreg_dev(eth);
mtk_free_dev(eth);
cancel_work_sync(&eth->pending_work);
+ cancel_delayed_work_sync(&eth->reset.monitor_work);
return 0;
}
@@ -4099,6 +4202,7 @@ static int mtk_probe(struct platform_dev
eth->rx_dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
INIT_WORK(&eth->rx_dim.work, mtk_dim_rx);
+ INIT_DELAYED_WORK(&eth->reset.monitor_work, mtk_hw_reset_monitor_work);
eth->tx_dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
INIT_WORK(&eth->tx_dim.work, mtk_dim_tx);
@@ -4301,6 +4405,8 @@ static int mtk_probe(struct platform_dev
netif_napi_add(&eth->dummy_dev, &eth->rx_napi, mtk_napi_rx);
platform_set_drvdata(pdev, eth);
+ schedule_delayed_work(&eth->reset.monitor_work,
+ MTK_DMA_MONITOR_TIMEOUT);
return 0;
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -257,6 +257,8 @@
#define MTK_RX_DONE_INT_V2 BIT(14)
+#define MTK_CDM_TXFIFO_RDY BIT(7)
+
/* QDMA Interrupt grouping registers */
#define MTK_RLS_DONE_INT BIT(0)
@@ -542,6 +544,17 @@
#define MT7628_SDM_RBCNT (MT7628_SDM_OFFSET + 0x10c)
#define MT7628_SDM_CS_ERR (MT7628_SDM_OFFSET + 0x110)
+#define MTK_FE_CDM1_FSM 0x220
+#define MTK_FE_CDM2_FSM 0x224
+#define MTK_FE_CDM3_FSM 0x238
+#define MTK_FE_CDM4_FSM 0x298
+#define MTK_FE_CDM5_FSM 0x318
+#define MTK_FE_CDM6_FSM 0x328
+#define MTK_FE_GDM1_FSM 0x228
+#define MTK_FE_GDM2_FSM 0x22C
+
+#define MTK_MAC_FSM(x) (0x1010C + ((x) * 0x100))
+
struct mtk_rx_dma {
unsigned int rxd1;
unsigned int rxd2;
@@ -938,6 +951,7 @@ struct mtk_reg_map {
u32 delay_irq; /* delay interrupt */
u32 irq_status; /* interrupt status */
u32 irq_mask; /* interrupt mask */
+ u32 adma_rx_dbg0;
u32 int_grp;
} pdma;
struct {
@@ -964,6 +978,8 @@ struct mtk_reg_map {
u32 gdma_to_ppe;
u32 ppe_base;
u32 wdma_base[2];
+ u32 pse_iq_sta;
+ u32 pse_oq_sta;
};
/* struct mtk_eth_data - This is the structure holding all differences
@@ -1006,6 +1022,8 @@ struct mtk_soc_data {
} txrx;
};
+#define MTK_DMA_MONITOR_TIMEOUT msecs_to_jiffies(1000)
+
/* currently no SoC has more than 2 macs */
#define MTK_MAX_DEVS 2
@@ -1128,6 +1146,14 @@ struct mtk_eth {
struct rhashtable flow_table;
struct bpf_prog __rcu *prog;
+
+ struct {
+ struct delayed_work monitor_work;
+ u32 wdidx;
+ u8 wdma_hang_count;
+ u8 qdma_hang_count;
+ u8 adma_hang_count;
+ } reset;
};
/* struct mtk_mac - the structure that holds the info about the MACs of the

View File

@ -0,0 +1,124 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Sat, 14 Jan 2023 18:01:32 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add reset/reset_complete callbacks
Introduce reset and reset_complete wlan callback to schedule WLAN driver
reset when ethernet/wed driver is resetting.
Tested-by: Daniel Golle <daniel@makrotopia.org>
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3703,6 +3703,11 @@ static void mtk_pending_work(struct work
set_bit(MTK_RESETTING, &eth->state);
mtk_prepare_for_reset(eth);
+ mtk_wed_fe_reset();
+ /* Run again reset preliminary configuration in order to avoid any
+ * possible race during FE reset since it can run releasing RTNL lock.
+ */
+ mtk_prepare_for_reset(eth);
/* stop all devices to make sure that dma is properly shut down */
for (i = 0; i < MTK_MAC_COUNT; i++) {
@@ -3740,6 +3745,8 @@ static void mtk_pending_work(struct work
clear_bit(MTK_RESETTING, &eth->state);
+ mtk_wed_fe_reset_complete();
+
rtnl_unlock();
}
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -205,6 +205,48 @@ mtk_wed_wo_reset(struct mtk_wed_device *
iounmap(reg);
}
+void mtk_wed_fe_reset(void)
+{
+ int i;
+
+ mutex_lock(&hw_lock);
+
+ for (i = 0; i < ARRAY_SIZE(hw_list); i++) {
+ struct mtk_wed_hw *hw = hw_list[i];
+ struct mtk_wed_device *dev = hw->wed_dev;
+ int err;
+
+ if (!dev || !dev->wlan.reset)
+ continue;
+
+ /* reset callback blocks until WLAN reset is completed */
+ err = dev->wlan.reset(dev);
+ if (err)
+ dev_err(dev->dev, "wlan reset failed: %d\n", err);
+ }
+
+ mutex_unlock(&hw_lock);
+}
+
+void mtk_wed_fe_reset_complete(void)
+{
+ int i;
+
+ mutex_lock(&hw_lock);
+
+ for (i = 0; i < ARRAY_SIZE(hw_list); i++) {
+ struct mtk_wed_hw *hw = hw_list[i];
+ struct mtk_wed_device *dev = hw->wed_dev;
+
+ if (!dev || !dev->wlan.reset_complete)
+ continue;
+
+ dev->wlan.reset_complete(dev);
+ }
+
+ mutex_unlock(&hw_lock);
+}
+
static struct mtk_wed_hw *
mtk_wed_assign(struct mtk_wed_device *dev)
{
--- a/drivers/net/ethernet/mediatek/mtk_wed.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed.h
@@ -128,6 +128,8 @@ void mtk_wed_add_hw(struct device_node *
void mtk_wed_exit(void);
int mtk_wed_flow_add(int index);
void mtk_wed_flow_remove(int index);
+void mtk_wed_fe_reset(void);
+void mtk_wed_fe_reset_complete(void);
#else
static inline void
mtk_wed_add_hw(struct device_node *np, struct mtk_eth *eth,
@@ -147,6 +149,13 @@ static inline void mtk_wed_flow_remove(i
{
}
+static inline void mtk_wed_fe_reset(void)
+{
+}
+
+static inline void mtk_wed_fe_reset_complete(void)
+{
+}
#endif
#ifdef CONFIG_DEBUG_FS
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -151,6 +151,8 @@ struct mtk_wed_device {
void (*release_rx_buf)(struct mtk_wed_device *wed);
void (*update_wo_rx_stats)(struct mtk_wed_device *wed,
struct mtk_wed_wo_rx_stats *stats);
+ int (*reset)(struct mtk_wed_device *wed);
+ void (*reset_complete)(struct mtk_wed_device *wed);
} wlan;
#endif
};

View File

@ -0,0 +1,106 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Mon, 5 Dec 2022 12:34:42 +0100
Subject: [PATCH] net: ethernet: mtk_wed: add reset to rx_ring_setup callback
This patch adds reset parameter to mtk_wed_rx_ring_setup signature
in order to align rx_ring_setup callback to tx_ring_setup one introduced
in 'commit 23dca7a90017 ("net: ethernet: mtk_wed: add reset to
tx_ring_setup callback")'
Co-developed-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Sujuan Chen <sujuan.chen@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/29c6e7a5469e784406cf3e2920351d1207713d05.1670239984.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -1252,7 +1252,8 @@ mtk_wed_wdma_rx_ring_setup(struct mtk_we
}
static int
-mtk_wed_wdma_tx_ring_setup(struct mtk_wed_device *dev, int idx, int size)
+mtk_wed_wdma_tx_ring_setup(struct mtk_wed_device *dev, int idx, int size,
+ bool reset)
{
u32 desc_size = sizeof(struct mtk_wdma_desc) * dev->hw->version;
struct mtk_wed_ring *wdma;
@@ -1261,8 +1262,8 @@ mtk_wed_wdma_tx_ring_setup(struct mtk_we
return -EINVAL;
wdma = &dev->tx_wdma[idx];
- if (mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE, desc_size,
- true))
+ if (!reset && mtk_wed_ring_alloc(dev, wdma, MTK_WED_WDMA_RING_SIZE,
+ desc_size, true))
return -ENOMEM;
wdma_w32(dev, MTK_WDMA_RING_TX(idx) + MTK_WED_RING_OFS_BASE,
@@ -1272,6 +1273,9 @@ mtk_wed_wdma_tx_ring_setup(struct mtk_we
wdma_w32(dev, MTK_WDMA_RING_TX(idx) + MTK_WED_RING_OFS_CPU_IDX, 0);
wdma_w32(dev, MTK_WDMA_RING_TX(idx) + MTK_WED_RING_OFS_DMA_IDX, 0);
+ if (reset)
+ mtk_wed_ring_reset(wdma, MTK_WED_WDMA_RING_SIZE, true);
+
if (!idx) {
wed_w32(dev, MTK_WED_WDMA_RING_TX + MTK_WED_RING_OFS_BASE,
wdma->desc_phys);
@@ -1611,18 +1615,20 @@ mtk_wed_txfree_ring_setup(struct mtk_wed
}
static int
-mtk_wed_rx_ring_setup(struct mtk_wed_device *dev, int idx, void __iomem *regs)
+mtk_wed_rx_ring_setup(struct mtk_wed_device *dev, int idx, void __iomem *regs,
+ bool reset)
{
struct mtk_wed_ring *ring = &dev->rx_ring[idx];
if (WARN_ON(idx >= ARRAY_SIZE(dev->rx_ring)))
return -EINVAL;
- if (mtk_wed_ring_alloc(dev, ring, MTK_WED_RX_RING_SIZE,
- sizeof(*ring->desc), false))
+ if (!reset && mtk_wed_ring_alloc(dev, ring, MTK_WED_RX_RING_SIZE,
+ sizeof(*ring->desc), false))
return -ENOMEM;
- if (mtk_wed_wdma_tx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE))
+ if (mtk_wed_wdma_tx_ring_setup(dev, idx, MTK_WED_WDMA_RING_SIZE,
+ reset))
return -ENOMEM;
ring->reg_base = MTK_WED_RING_RX_DATA(idx);
--- a/include/linux/soc/mediatek/mtk_wed.h
+++ b/include/linux/soc/mediatek/mtk_wed.h
@@ -162,7 +162,7 @@ struct mtk_wed_ops {
int (*tx_ring_setup)(struct mtk_wed_device *dev, int ring,
void __iomem *regs, bool reset);
int (*rx_ring_setup)(struct mtk_wed_device *dev, int ring,
- void __iomem *regs);
+ void __iomem *regs, bool reset);
int (*txfree_ring_setup)(struct mtk_wed_device *dev,
void __iomem *regs);
int (*msg_update)(struct mtk_wed_device *dev, int cmd_id,
@@ -230,8 +230,8 @@ mtk_wed_get_rx_capa(struct mtk_wed_devic
(_dev)->ops->irq_get(_dev, _mask)
#define mtk_wed_device_irq_set_mask(_dev, _mask) \
(_dev)->ops->irq_set_mask(_dev, _mask)
-#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs) \
- (_dev)->ops->rx_ring_setup(_dev, _ring, _regs)
+#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs, _reset) \
+ (_dev)->ops->rx_ring_setup(_dev, _ring, _regs, _reset)
#define mtk_wed_device_ppe_check(_dev, _skb, _reason, _hash) \
(_dev)->ops->ppe_check(_dev, _skb, _reason, _hash)
#define mtk_wed_device_update_msg(_dev, _id, _msg, _len) \
@@ -251,7 +251,7 @@ static inline bool mtk_wed_device_active
#define mtk_wed_device_reg_write(_dev, _reg, _val) do {} while (0)
#define mtk_wed_device_irq_get(_dev, _mask) 0
#define mtk_wed_device_irq_set_mask(_dev, _mask) do {} while (0)
-#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs) -ENODEV
+#define mtk_wed_device_rx_ring_setup(_dev, _ring, _regs, _reset) -ENODEV
#define mtk_wed_device_ppe_check(_dev, _skb, _reason, _hash) do {} while (0)
#define mtk_wed_device_update_msg(_dev, _id, _msg, _len) -ENODEV
#define mtk_wed_device_stop(_dev) do {} while (0)

View File

@ -12,7 +12,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -939,7 +939,7 @@ static int mtk_init_fq_dma(struct mtk_et
@@ -945,7 +945,7 @@ static int mtk_init_fq_dma(struct mtk_et
{
const struct mtk_soc_data *soc = eth->soc;
dma_addr_t phy_ring_tail;
@ -21,7 +21,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
dma_addr_t dma_addr;
int i;
@@ -2203,19 +2203,25 @@ static int mtk_tx_alloc(struct mtk_eth *
@@ -2209,19 +2209,25 @@ static int mtk_tx_alloc(struct mtk_eth *
struct mtk_tx_ring *ring = &eth->tx_ring;
int i, sz = soc->txrx.txd_size;
struct mtk_tx_dma_v2 *txd;
@ -51,7 +51,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
u32 next_ptr = ring->phys + next * sz;
txd = ring->dma + i * sz;
@@ -2235,22 +2241,22 @@ static int mtk_tx_alloc(struct mtk_eth *
@@ -2241,22 +2247,22 @@ static int mtk_tx_alloc(struct mtk_eth *
* descriptors in ring->dma_pdma.
*/
if (!MTK_HAS_CAPS(soc->caps, MTK_QDMA)) {
@ -79,7 +79,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
ring->thresh = MAX_SKB_FRAGS;
/* make sure that all changes to the dma ring are flushed before we
@@ -2262,14 +2268,14 @@ static int mtk_tx_alloc(struct mtk_eth *
@@ -2268,14 +2274,14 @@ static int mtk_tx_alloc(struct mtk_eth *
mtk_w32(eth, ring->phys, soc->reg_map->qdma.ctx_ptr);
mtk_w32(eth, ring->phys, soc->reg_map->qdma.dtx_ptr);
mtk_w32(eth,
@ -96,7 +96,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
mtk_w32(eth, 0, MT7628_TX_CTX_IDX0);
mtk_w32(eth, MT7628_PST_DTX_IDX0, soc->reg_map->pdma.rst_idx);
}
@@ -2287,7 +2293,7 @@ static void mtk_tx_clean(struct mtk_eth
@@ -2293,7 +2299,7 @@ static void mtk_tx_clean(struct mtk_eth
int i;
if (ring->buf) {
@ -105,7 +105,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
mtk_tx_unmap(eth, &ring->buf[i], NULL, false);
kfree(ring->buf);
ring->buf = NULL;
@@ -2295,14 +2301,14 @@ static void mtk_tx_clean(struct mtk_eth
@@ -2301,14 +2307,14 @@ static void mtk_tx_clean(struct mtk_eth
if (ring->dma) {
dma_free_coherent(eth->dma_dev,
@ -122,7 +122,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
ring->dma_pdma, ring->phys_pdma);
ring->dma_pdma = NULL;
}
@@ -2824,7 +2830,7 @@ static void mtk_dma_free(struct mtk_eth
@@ -2830,7 +2836,7 @@ static void mtk_dma_free(struct mtk_eth
netdev_reset_queue(eth->netdev[i]);
if (eth->scratch_ring) {
dma_free_coherent(eth->dma_dev,

View File

@ -12,7 +12,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -4270,7 +4270,7 @@ static const struct mtk_soc_data mt7621_
@@ -4484,7 +4484,7 @@ static const struct mtk_soc_data mt7621_
.hw_features = MTK_HW_FEATURES,
.required_clks = MT7621_CLKS_BITMAP,
.required_pctl = false,
@ -21,7 +21,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
.hash_offset = 2,
.foe_entry_size = sizeof(struct mtk_foe_entry) - 16,
.txrx = {
@@ -4309,7 +4309,7 @@ static const struct mtk_soc_data mt7623_
@@ -4523,7 +4523,7 @@ static const struct mtk_soc_data mt7623_
.hw_features = MTK_HW_FEATURES,
.required_clks = MT7623_CLKS_BITMAP,
.required_pctl = true,

View File

@ -22,7 +22,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -54,6 +54,7 @@ static const struct mtk_reg_map mtk_reg_
@@ -55,6 +55,7 @@ static const struct mtk_reg_map mtk_reg_
},
.qdma = {
.qtx_cfg = 0x1800,
@ -30,7 +30,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
.rx_ptr = 0x1900,
.rx_cnt_cfg = 0x1904,
.qcrx_ptr = 0x1908,
@@ -61,6 +62,7 @@ static const struct mtk_reg_map mtk_reg_
@@ -62,6 +63,7 @@ static const struct mtk_reg_map mtk_reg_
.rst_idx = 0x1a08,
.delay_irq = 0x1a0c,
.fc_th = 0x1a10,
@ -38,7 +38,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
.int_grp = 0x1a20,
.hred = 0x1a44,
.ctx_ptr = 0x1b00,
@@ -113,6 +115,7 @@ static const struct mtk_reg_map mt7986_r
@@ -117,6 +119,7 @@ static const struct mtk_reg_map mt7986_r
},
.qdma = {
.qtx_cfg = 0x4400,
@ -46,7 +46,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
.rx_ptr = 0x4500,
.rx_cnt_cfg = 0x4504,
.qcrx_ptr = 0x4508,
@@ -130,6 +133,7 @@ static const struct mtk_reg_map mt7986_r
@@ -134,6 +137,7 @@ static const struct mtk_reg_map mt7986_r
.fq_tail = 0x4724,
.fq_count = 0x4728,
.fq_blen = 0x472c,
@ -54,7 +54,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
},
.gdm1_cnt = 0x1c00,
.gdma_to_ppe = 0x3333,
@@ -614,6 +618,75 @@ static void mtk_mac_link_down(struct phy
@@ -620,6 +624,75 @@ static void mtk_mac_link_down(struct phy
mtk_w32(mac->hw, mcr, MTK_MAC_MCR(mac->id));
}
@ -130,7 +130,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
static void mtk_mac_link_up(struct phylink_config *config,
struct phy_device *phy,
unsigned int mode, phy_interface_t interface,
@@ -639,6 +712,8 @@ static void mtk_mac_link_up(struct phyli
@@ -645,6 +718,8 @@ static void mtk_mac_link_up(struct phyli
break;
}
@ -139,7 +139,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
/* Configure duplex */
if (duplex == DUPLEX_FULL)
mcr |= MAC_MCR_FORCE_DPX;
@@ -1100,7 +1175,8 @@ static void mtk_tx_set_dma_desc_v1(struc
@@ -1106,7 +1181,8 @@ static void mtk_tx_set_dma_desc_v1(struc
WRITE_ONCE(desc->txd1, info->addr);
@ -149,7 +149,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
if (info->last)
data |= TX_DMA_LS0;
WRITE_ONCE(desc->txd3, data);
@@ -1134,9 +1210,6 @@ static void mtk_tx_set_dma_desc_v2(struc
@@ -1140,9 +1216,6 @@ static void mtk_tx_set_dma_desc_v2(struc
data |= TX_DMA_LS0;
WRITE_ONCE(desc->txd3, data);
@ -159,7 +159,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
data = (mac->id + 1) << TX_DMA_FPORT_SHIFT_V2; /* forward port */
data |= TX_DMA_SWC_V2 | QID_BITS_V2(info->qid);
WRITE_ONCE(desc->txd4, data);
@@ -1180,11 +1253,12 @@ static int mtk_tx_map(struct sk_buff *sk
@@ -1186,11 +1259,12 @@ static int mtk_tx_map(struct sk_buff *sk
.gso = gso,
.csum = skb->ip_summed == CHECKSUM_PARTIAL,
.vlan = skb_vlan_tag_present(skb),
@ -173,7 +173,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
const struct mtk_soc_data *soc = eth->soc;
@@ -1192,8 +1266,10 @@ static int mtk_tx_map(struct sk_buff *sk
@@ -1198,8 +1272,10 @@ static int mtk_tx_map(struct sk_buff *sk
struct mtk_tx_dma *itxd_pdma, *txd_pdma;
struct mtk_tx_buf *itx_buf, *tx_buf;
int i, n_desc = 1;
@ -184,7 +184,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
itxd = ring->next_free;
itxd_pdma = qdma_to_pdma(ring, itxd);
if (itxd == ring->last_free)
@@ -1242,7 +1318,7 @@ static int mtk_tx_map(struct sk_buff *sk
@@ -1248,7 +1324,7 @@ static int mtk_tx_map(struct sk_buff *sk
memset(&txd_info, 0, sizeof(struct mtk_tx_dma_desc_info));
txd_info.size = min_t(unsigned int, frag_size,
soc->txrx.dma_max_len);
@ -193,7 +193,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
txd_info.last = i == skb_shinfo(skb)->nr_frags - 1 &&
!(frag_size - txd_info.size);
txd_info.addr = skb_frag_dma_map(eth->dma_dev, frag,
@@ -1281,7 +1357,7 @@ static int mtk_tx_map(struct sk_buff *sk
@@ -1287,7 +1363,7 @@ static int mtk_tx_map(struct sk_buff *sk
txd_pdma->txd2 |= TX_DMA_LS1;
}
@ -202,7 +202,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
skb_tx_timestamp(skb);
ring->next_free = mtk_qdma_phys_to_virt(ring, txd->txd2);
@@ -1293,8 +1369,7 @@ static int mtk_tx_map(struct sk_buff *sk
@@ -1299,8 +1375,7 @@ static int mtk_tx_map(struct sk_buff *sk
wmb();
if (MTK_HAS_CAPS(soc->caps, MTK_QDMA)) {
@ -212,7 +212,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
mtk_w32(eth, txd->txd2, soc->reg_map->qdma.ctx_ptr);
} else {
int next_idx;
@@ -1363,7 +1438,7 @@ static void mtk_wake_queue(struct mtk_et
@@ -1369,7 +1444,7 @@ static void mtk_wake_queue(struct mtk_et
for (i = 0; i < MTK_MAC_COUNT; i++) {
if (!eth->netdev[i])
continue;
@ -221,7 +221,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
}
}
@@ -1387,7 +1462,7 @@ static netdev_tx_t mtk_start_xmit(struct
@@ -1393,7 +1468,7 @@ static netdev_tx_t mtk_start_xmit(struct
tx_num = mtk_cal_txd_req(eth, skb);
if (unlikely(atomic_read(&ring->free_count) <= tx_num)) {
@ -230,7 +230,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
netif_err(eth, tx_queued, dev,
"Tx Ring full when queue awake!\n");
spin_unlock(&eth->page_lock);
@@ -1413,7 +1488,7 @@ static netdev_tx_t mtk_start_xmit(struct
@@ -1419,7 +1494,7 @@ static netdev_tx_t mtk_start_xmit(struct
goto drop;
if (unlikely(atomic_read(&ring->free_count) <= ring->thresh))
@ -239,7 +239,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
spin_unlock(&eth->page_lock);
@@ -1580,10 +1655,12 @@ static int mtk_xdp_submit_frame(struct m
@@ -1586,10 +1661,12 @@ static int mtk_xdp_submit_frame(struct m
struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
const struct mtk_soc_data *soc = eth->soc;
struct mtk_tx_ring *ring = &eth->tx_ring;
@ -252,7 +252,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
};
int err, index = 0, n_desc = 1, nr_frags;
struct mtk_tx_buf *htx_buf, *tx_buf;
@@ -1633,6 +1710,7 @@ static int mtk_xdp_submit_frame(struct m
@@ -1639,6 +1716,7 @@ static int mtk_xdp_submit_frame(struct m
memset(&txd_info, 0, sizeof(struct mtk_tx_dma_desc_info));
txd_info.size = skb_frag_size(&sinfo->frags[index]);
txd_info.last = index + 1 == nr_frags;
@ -260,7 +260,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
data = skb_frag_address(&sinfo->frags[index]);
index++;
@@ -1987,8 +2065,46 @@ rx_done:
@@ -1993,8 +2071,46 @@ rx_done:
return done;
}
@ -308,7 +308,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
{
const struct mtk_reg_map *reg_map = eth->soc->reg_map;
struct mtk_tx_ring *ring = &eth->tx_ring;
@@ -2020,12 +2136,9 @@ static int mtk_poll_tx_qdma(struct mtk_e
@@ -2026,12 +2142,9 @@ static int mtk_poll_tx_qdma(struct mtk_e
break;
if (tx_buf->data != (void *)MTK_DMA_DUMMY_DESC) {
@ -323,7 +323,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
budget--;
}
mtk_tx_unmap(eth, tx_buf, &bq, true);
@@ -2044,7 +2157,7 @@ static int mtk_poll_tx_qdma(struct mtk_e
@@ -2050,7 +2163,7 @@ static int mtk_poll_tx_qdma(struct mtk_e
}
static int mtk_poll_tx_pdma(struct mtk_eth *eth, int budget,
@ -332,7 +332,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
{
struct mtk_tx_ring *ring = &eth->tx_ring;
struct mtk_tx_buf *tx_buf;
@@ -2062,12 +2175,8 @@ static int mtk_poll_tx_pdma(struct mtk_e
@@ -2068,12 +2181,8 @@ static int mtk_poll_tx_pdma(struct mtk_e
break;
if (tx_buf->data != (void *)MTK_DMA_DUMMY_DESC) {
@ -347,7 +347,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
budget--;
}
mtk_tx_unmap(eth, tx_buf, &bq, true);
@@ -2089,26 +2198,15 @@ static int mtk_poll_tx(struct mtk_eth *e
@@ -2095,26 +2204,15 @@ static int mtk_poll_tx(struct mtk_eth *e
{
struct mtk_tx_ring *ring = &eth->tx_ring;
struct dim_sample dim_sample = {};
@ -379,7 +379,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
dim_update_sample(eth->tx_events, eth->tx_packets, eth->tx_bytes,
&dim_sample);
@@ -2118,7 +2216,7 @@ static int mtk_poll_tx(struct mtk_eth *e
@@ -2124,7 +2222,7 @@ static int mtk_poll_tx(struct mtk_eth *e
(atomic_read(&ring->free_count) > ring->thresh))
mtk_wake_queue(eth);
@ -388,7 +388,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
}
static void mtk_handle_status_irq(struct mtk_eth *eth)
@@ -2204,6 +2302,7 @@ static int mtk_tx_alloc(struct mtk_eth *
@@ -2210,6 +2308,7 @@ static int mtk_tx_alloc(struct mtk_eth *
int i, sz = soc->txrx.txd_size;
struct mtk_tx_dma_v2 *txd;
int ring_size;
@ -396,7 +396,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
if (MTK_HAS_CAPS(soc->caps, MTK_QDMA))
ring_size = MTK_QDMA_RING_SIZE;
@@ -2271,8 +2370,25 @@ static int mtk_tx_alloc(struct mtk_eth *
@@ -2277,8 +2376,25 @@ static int mtk_tx_alloc(struct mtk_eth *
ring->phys + ((ring_size - 1) * sz),
soc->reg_map->qdma.crx_ptr);
mtk_w32(eth, ring->last_free_ptr, soc->reg_map->qdma.drx_ptr);
@ -424,7 +424,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
} else {
mtk_w32(eth, ring->phys_pdma, MT7628_TX_BASE_PTR0);
mtk_w32(eth, ring_size, MT7628_TX_MAX_CNT0);
@@ -2939,7 +3055,7 @@ static int mtk_start_dma(struct mtk_eth
@@ -2960,7 +3076,7 @@ static int mtk_start_dma(struct mtk_eth
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2))
val |= MTK_MUTLI_CNT | MTK_RESV_BUF |
MTK_WCOMP_EN | MTK_DMAD_WR_WDONE |
@ -433,7 +433,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
else
val |= MTK_RX_BT_32DWORDS;
mtk_w32(eth, val, reg_map->qdma.glo_cfg);
@@ -2985,6 +3101,45 @@ static void mtk_gdm_config(struct mtk_et
@@ -3006,6 +3122,45 @@ static void mtk_gdm_config(struct mtk_et
mtk_w32(eth, 0, MTK_RST_GL);
}
@ -479,7 +479,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
static int mtk_open(struct net_device *dev)
{
struct mtk_mac *mac = netdev_priv(dev);
@@ -3027,7 +3182,8 @@ static int mtk_open(struct net_device *d
@@ -3048,7 +3203,8 @@ static int mtk_open(struct net_device *d
refcount_inc(&eth->dma_refcnt);
phylink_start(mac->phylink);
@ -489,7 +489,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
return 0;
}
@@ -3562,8 +3718,12 @@ static int mtk_unreg_dev(struct mtk_eth
@@ -3774,8 +3930,12 @@ static int mtk_unreg_dev(struct mtk_eth
int i;
for (i = 0; i < MTK_MAC_COUNT; i++) {
@ -502,7 +502,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
unregister_netdev(eth->netdev[i]);
}
@@ -3779,6 +3939,23 @@ static int mtk_set_rxnfc(struct net_devi
@@ -3992,6 +4152,23 @@ static int mtk_set_rxnfc(struct net_devi
return ret;
}
@ -526,7 +526,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
static const struct ethtool_ops mtk_ethtool_ops = {
.get_link_ksettings = mtk_get_link_ksettings,
.set_link_ksettings = mtk_set_link_ksettings,
@@ -3814,6 +3991,7 @@ static const struct net_device_ops mtk_n
@@ -4027,6 +4204,7 @@ static const struct net_device_ops mtk_n
.ndo_setup_tc = mtk_eth_setup_tc,
.ndo_bpf = mtk_xdp,
.ndo_xdp_xmit = mtk_xdp_xmit,
@ -534,7 +534,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
};
static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
@@ -3823,6 +4001,7 @@ static int mtk_add_mac(struct mtk_eth *e
@@ -4036,6 +4214,7 @@ static int mtk_add_mac(struct mtk_eth *e
struct phylink *phylink;
struct mtk_mac *mac;
int id, err;
@ -542,7 +542,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
if (!_id) {
dev_err(eth->dev, "missing mac id\n");
@@ -3840,7 +4019,10 @@ static int mtk_add_mac(struct mtk_eth *e
@@ -4053,7 +4232,10 @@ static int mtk_add_mac(struct mtk_eth *e
return -EINVAL;
}
@ -554,7 +554,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
if (!eth->netdev[id]) {
dev_err(eth->dev, "alloc_etherdev failed\n");
return -ENOMEM;
@@ -3937,6 +4119,11 @@ static int mtk_add_mac(struct mtk_eth *e
@@ -4150,6 +4332,11 @@ static int mtk_add_mac(struct mtk_eth *e
else
eth->netdev[id]->max_mtu = MTK_MAX_RX_LENGTH_2K - MTK_RX_ETH_HLEN;
@ -576,7 +576,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
#define MTK_QDMA_PAGE_SIZE 2048
#define MTK_MAX_RX_LENGTH 1536
#define MTK_MAX_RX_LENGTH_2K 2048
@@ -204,8 +205,26 @@
@@ -216,8 +217,26 @@
#define MTK_RING_MAX_AGG_CNT_H ((MTK_HW_LRO_MAX_AGG_CNT >> 6) & 0x3)
/* QDMA TX Queue Configuration Registers */
@ -603,7 +603,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
/* QDMA Global Configuration Register */
#define MTK_RX_2B_OFFSET BIT(31)
#define MTK_RX_BT_32DWORDS (3 << 11)
@@ -224,6 +243,7 @@
@@ -236,6 +255,7 @@
#define MTK_WCOMP_EN BIT(24)
#define MTK_RESV_BUF (0x40 << 16)
#define MTK_MUTLI_CNT (0x4 << 12)
@ -611,7 +611,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
/* QDMA Flow Control Register */
#define FC_THRES_DROP_MODE BIT(20)
@@ -252,8 +272,6 @@
@@ -266,8 +286,6 @@
#define MTK_STAT_OFFSET 0x40
/* QDMA TX NUM */
@ -620,7 +620,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
#define QID_BITS_V2(x) (((x) & 0x3f) << 16)
#define MTK_QDMA_GMAC2_QID 8
@@ -283,6 +301,7 @@
@@ -297,6 +315,7 @@
#define TX_DMA_PLEN0(x) (((x) & eth->soc->txrx.dma_max_len) << eth->soc->txrx.dma_len_offset)
#define TX_DMA_PLEN1(x) ((x) & eth->soc->txrx.dma_max_len)
#define TX_DMA_SWC BIT(14)
@ -628,7 +628,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
/* PDMA on MT7628 */
#define TX_DMA_DONE BIT(31)
@@ -931,6 +950,7 @@ struct mtk_reg_map {
@@ -957,6 +976,7 @@ struct mtk_reg_map {
} pdma;
struct {
u32 qtx_cfg; /* tx queue configuration */
@ -636,7 +636,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
u32 rx_ptr; /* rx base pointer */
u32 rx_cnt_cfg; /* rx max count configuration */
u32 qcrx_ptr; /* rx cpu pointer */
@@ -948,6 +968,7 @@ struct mtk_reg_map {
@@ -974,6 +994,7 @@ struct mtk_reg_map {
u32 fq_tail; /* fq tail pointer */
u32 fq_count; /* fq free page count */
u32 fq_blen; /* fq free page buffer length */
@ -644,7 +644,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
} qdma;
u32 gdm1_cnt;
u32 gdma_to_ppe;
@@ -1139,6 +1160,7 @@ struct mtk_mac {
@@ -1177,6 +1198,7 @@ struct mtk_mac {
__be32 hwlro_ip[MTK_MAX_LRO_IP_CNT];
int hwlro_ip_cnt;
unsigned int syscfg0;

View File

@ -9,9 +9,9 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -33,6 +33,8 @@ static struct sk_buff *mtk_tag_xmit(stru
if (__skb_put_padto(skb, ETH_ZLEN + MTK_HDR_LEN, false))
return NULL;
@@ -25,6 +25,8 @@ static struct sk_buff *mtk_tag_xmit(stru
u8 xmit_tpid;
u8 *mtk_tag;
+ skb_set_queue_mapping(skb, dp->index);
+

View File

@ -47,7 +47,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
#define MTK_FOE_IB2_DEST_PORT_V2 GENMASK(12, 9)
#define MTK_FOE_IB2_MULTICAST_V2 BIT(13)
#define MTK_FOE_IB2_WDMA_WINFO_V2 BIT(19)
@@ -350,6 +352,8 @@ int mtk_foe_entry_set_pppoe(struct mtk_e
@@ -351,6 +353,8 @@ int mtk_foe_entry_set_pppoe(struct mtk_e
int sid);
int mtk_foe_entry_set_wdma(struct mtk_eth *eth, struct mtk_foe_entry *entry,
int wdma_idx, int txq, int bss, int wcid);

View File

@ -11,7 +11,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -896,7 +896,13 @@ enum mkt_eth_capabilities {
@@ -921,7 +921,13 @@ enum mkt_eth_capabilities {
#define MTK_MUX_GMAC12_TO_GEPHY_SGMII \
(MTK_ETH_MUX_GMAC12_TO_GEPHY_SGMII | MTK_MUX)

View File

@ -22,7 +22,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
#include "mtk_eth_soc.h"
#include "mtk_wed.h"
@@ -2016,16 +2017,22 @@ static int mtk_poll_rx(struct napi_struc
@@ -2022,16 +2023,22 @@ static int mtk_poll_rx(struct napi_struc
htons(RX_DMA_VPID(trxd.rxd4)),
RX_DMA_VID(trxd.rxd4));
} else if (trxd.rxd2 & RX_DMA_VTAG) {
@ -52,7 +52,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
}
skb_record_rx_queue(skb, 0);
@@ -2850,15 +2857,30 @@ static netdev_features_t mtk_fix_feature
@@ -2856,15 +2863,30 @@ static netdev_features_t mtk_fix_feature
static int mtk_set_features(struct net_device *dev, netdev_features_t features)
{
@ -88,7 +88,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
}
/* wait for DMA to finish whatever it is doing before we start using it again */
@@ -3140,11 +3162,45 @@ found:
@@ -3161,11 +3183,45 @@ found:
return NOTIFY_DONE;
}
@ -135,7 +135,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
err = phylink_of_phy_connect(mac->phylink, mac->of_node, 0);
if (err) {
@@ -3507,6 +3563,10 @@ static int mtk_hw_init(struct mtk_eth *e
@@ -3686,6 +3742,10 @@ static int mtk_hw_init(struct mtk_eth *e
*/
val = mtk_r32(eth, MTK_CDMQ_IG_CTRL);
mtk_w32(eth, val | MTK_CDMQ_STAG_EN, MTK_CDMQ_IG_CTRL);
@ -146,7 +146,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
/* Enable RX VLan Offloading */
mtk_w32(eth, 1, MTK_CDMP_EG_CTRL);
@@ -3710,6 +3770,12 @@ static int mtk_free_dev(struct mtk_eth *
@@ -3922,6 +3982,12 @@ static int mtk_free_dev(struct mtk_eth *
free_netdev(eth->netdev[i]);
}
@ -171,7 +171,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
#define MTK_QDMA_NUM_QUEUES 16
#define MTK_QDMA_PAGE_SIZE 2048
#define MTK_MAX_RX_LENGTH 1536
@@ -93,6 +96,9 @@
@@ -105,6 +108,9 @@
#define MTK_CDMQ_IG_CTRL 0x1400
#define MTK_CDMQ_STAG_EN BIT(0)
@ -181,7 +181,7 @@ Signed-off-by: Felix Fietkau <nbd@nbd.name>
/* CDMP Ingress Control Register */
#define MTK_CDMP_IG_CTRL 0x400
#define MTK_CDMP_STAG_EN BIT(0)
@@ -1140,6 +1146,8 @@ struct mtk_eth {
@@ -1170,6 +1176,8 @@ struct mtk_eth {
int ip_align;

View File

@ -0,0 +1,42 @@
From: =?UTF-8?q?Ar=C4=B1n=C3=A7=20=C3=9CNAL?= <arinc.unal@arinc9.com>
Date: Sat, 28 Jan 2023 12:42:32 +0300
Subject: [PATCH] net: ethernet: mtk_eth_soc: disable hardware DSA untagging
for second MAC
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to my tests on MT7621AT and MT7623NI SoCs, hardware DSA untagging
won't work on the second MAC. Therefore, disable this feature when the
second MAC of the MT7621 and MT7623 SoCs is being used.
Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Link: https://lore.kernel.org/netdev/6249fc14-b38a-c770-36b4-5af6d41c21d3@arinc9.com/
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20230128094232.2451947-1-arinc.unal@arinc9.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3199,7 +3199,8 @@ static int mtk_open(struct net_device *d
struct mtk_eth *eth = mac->hw;
int i, err;
- if (mtk_uses_dsa(dev) && !eth->prog) {
+ if ((mtk_uses_dsa(dev) && !eth->prog) &&
+ !(mac->id == 1 && MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC1_TRGMII))) {
for (i = 0; i < ARRAY_SIZE(eth->dsa_meta); i++) {
struct metadata_dst *md_dst = eth->dsa_meta[i];
@@ -3216,7 +3217,8 @@ static int mtk_open(struct net_device *d
}
} else {
/* Hardware special tag parsing needs to be disabled if at least
- * one MAC does not use DSA.
+ * one MAC does not use DSA, or the second MAC of the MT7621 and
+ * MT7623 SoCs is being used.
*/
u32 val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
val &= ~MTK_CDMP_STAG_EN;

View File

@ -0,0 +1,54 @@
From: =?UTF-8?q?Ar=C4=B1n=C3=A7=20=C3=9CNAL?= <arinc.unal@arinc9.com>
Date: Sun, 5 Feb 2023 20:53:31 +0300
Subject: [PATCH] net: ethernet: mtk_eth_soc: enable special tag when any MAC
uses DSA
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The special tag is only enabled when the first MAC uses DSA. However, it
must be enabled when any MAC uses DSA. Change the check accordingly.
This fixes hardware DSA untagging not working on the second MAC of the
MT7621 and MT7623 SoCs, and likely other SoCs too. Therefore, remove the
check that disables hardware DSA untagging for the second MAC of the MT7621
and MT7623 SoCs.
Fixes: a1f47752fd62 ("net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC")
Co-developed-by: Richard van Schagen <richard@routerhints.com>
Signed-off-by: Richard van Schagen <richard@routerhints.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -3134,7 +3134,7 @@ static void mtk_gdm_config(struct mtk_et
val |= config;
- if (!i && eth->netdev[0] && netdev_uses_dsa(eth->netdev[0]))
+ if (eth->netdev[i] && netdev_uses_dsa(eth->netdev[i]))
val |= MTK_GDMA_SPECIAL_TAG;
mtk_w32(eth, val, MTK_GDMA_FWD_CFG(i));
@@ -3199,8 +3199,7 @@ static int mtk_open(struct net_device *d
struct mtk_eth *eth = mac->hw;
int i, err;
- if ((mtk_uses_dsa(dev) && !eth->prog) &&
- !(mac->id == 1 && MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC1_TRGMII))) {
+ if (mtk_uses_dsa(dev) && !eth->prog) {
for (i = 0; i < ARRAY_SIZE(eth->dsa_meta); i++) {
struct metadata_dst *md_dst = eth->dsa_meta[i];
@@ -3217,8 +3216,7 @@ static int mtk_open(struct net_device *d
}
} else {
/* Hardware special tag parsing needs to be disabled if at least
- * one MAC does not use DSA, or the second MAC of the MT7621 and
- * MT7623 SoCs is being used.
+ * one MAC does not use DSA.
*/
u32 val = mtk_r32(eth, MTK_CDMP_IG_CTRL);
val &= ~MTK_CDMP_STAG_EN;

View File

@ -0,0 +1,129 @@
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Tue, 7 Feb 2023 12:30:27 +0200
Subject: [PATCH] net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch
port 0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Arınç reports that on his MT7621AT Unielec U7621-06 board and MT7623NI
Bananapi BPI-R2, packets received by the CPU over mt7530 switch port 0
(of which this driver acts as the DSA master) are not processed
correctly by software. More precisely, they arrive without a DSA tag
(in packet or in the hwaccel area - skb_metadata_dst()), so DSA cannot
demux them towards the switch's interface for port 0. Traffic from other
ports receives a skb_metadata_dst() with the correct port and is demuxed
properly.
Looking at mtk_poll_rx(), it becomes apparent that this driver uses the
skb vlan hwaccel area:
union {
u32 vlan_all;
struct {
__be16 vlan_proto;
__u16 vlan_tci;
};
};
as a temporary storage for the VLAN hwaccel tag, or the DSA hwaccel tag.
If this is a DSA master it's a DSA hwaccel tag, and finally clears up
the skb VLAN hwaccel header.
I'm guessing that the problem is the (mis)use of API.
skb_vlan_tag_present() looks like this:
#define skb_vlan_tag_present(__skb) (!!(__skb)->vlan_all)
So if both vlan_proto and vlan_tci are zeroes, skb_vlan_tag_present()
returns precisely false. I don't know for sure what is the format of the
DSA hwaccel tag, but I surely know that lowermost 3 bits of vlan_proto
are 0 when receiving from port 0:
unsigned int port = vlan_proto & GENMASK(2, 0);
If the RX descriptor has no other bits set to non-zero values in
RX_DMA_VTAG, then the call to __vlan_hwaccel_put_tag() will not, in
fact, make the subsequent skb_vlan_tag_present() return true, because
it's implemented like this:
static inline void __vlan_hwaccel_put_tag(struct sk_buff *skb,
__be16 vlan_proto, u16 vlan_tci)
{
skb->vlan_proto = vlan_proto;
skb->vlan_tci = vlan_tci;
}
What we need to do to fix this problem (assuming this is the problem) is
to stop using skb->vlan_all as temporary storage for driver affairs, and
just create some local variables that serve the same purpose, but
hopefully better. Instead of calling skb_vlan_tag_present(), let's look
at a boolean has_hwaccel_tag which we set to true when the RX DMA
descriptors have something. Disambiguate based on netdev_uses_dsa()
whether this is a VLAN or DSA hwaccel tag, and only call
__vlan_hwaccel_put_tag() if we're certain it's a VLAN tag.
Arınç confirms that the treatment works, so this validates the
assumption.
Link: https://lore.kernel.org/netdev/704f3a72-fc9e-714a-db54-272e17612637@arinc9.com/
Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1878,7 +1878,9 @@ static int mtk_poll_rx(struct napi_struc
while (done < budget) {
unsigned int pktlen, *rxdcsum;
+ bool has_hwaccel_tag = false;
struct net_device *netdev;
+ u16 vlan_proto, vlan_tci;
dma_addr_t dma_addr;
u32 hash, reason;
int mac = 0;
@@ -2018,27 +2020,29 @@ static int mtk_poll_rx(struct napi_struc
if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) {
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
- if (trxd.rxd3 & RX_DMA_VTAG_V2)
- __vlan_hwaccel_put_tag(skb,
- htons(RX_DMA_VPID(trxd.rxd4)),
- RX_DMA_VID(trxd.rxd4));
+ if (trxd.rxd3 & RX_DMA_VTAG_V2) {
+ vlan_proto = RX_DMA_VPID(trxd.rxd4);
+ vlan_tci = RX_DMA_VID(trxd.rxd4);
+ has_hwaccel_tag = true;
+ }
} else if (trxd.rxd2 & RX_DMA_VTAG) {
- __vlan_hwaccel_put_tag(skb, htons(RX_DMA_VPID(trxd.rxd3)),
- RX_DMA_VID(trxd.rxd3));
+ vlan_proto = RX_DMA_VPID(trxd.rxd3);
+ vlan_tci = RX_DMA_VID(trxd.rxd3);
+ has_hwaccel_tag = true;
}
}
/* When using VLAN untagging in combination with DSA, the
* hardware treats the MTK special tag as a VLAN and untags it.
*/
- if (skb_vlan_tag_present(skb) && netdev_uses_dsa(netdev)) {
- unsigned int port = ntohs(skb->vlan_proto) & GENMASK(2, 0);
+ if (has_hwaccel_tag && netdev_uses_dsa(netdev)) {
+ unsigned int port = vlan_proto & GENMASK(2, 0);
if (port < ARRAY_SIZE(eth->dsa_meta) &&
eth->dsa_meta[port])
skb_dst_set_noref(skb, &eth->dsa_meta[port]->dst);
-
- __vlan_hwaccel_clear_tag(skb);
+ } else if (has_hwaccel_tag) {
+ __vlan_hwaccel_put_tag(skb, htons(vlan_proto), vlan_tci);
}
skb_record_rx_queue(skb, 0);

View File

@ -0,0 +1,26 @@
From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Sun, 12 Feb 2023 07:51:51 +0100
Subject: [PATCH] net: ethernet: mtk_wed: No need to clear memory after a
dma_alloc_coherent() call
dma_alloc_coherent() already clears the allocated memory, there is no need
to explicitly call memset().
Moreover, it is likely that the size in the memset() is incorrect and
should be "size * sizeof(*ring->desc)".
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/d5acce7dd108887832c9719f62c7201b4c83b3fb.1676184599.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -779,7 +779,6 @@ mtk_wed_rro_ring_alloc(struct mtk_wed_de
ring->desc_size = sizeof(*ring->desc);
ring->size = size;
- memset(ring->desc, 0, size);
return 0;
}

View File

@ -0,0 +1,61 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Wed, 7 Dec 2022 15:04:54 +0100
Subject: [PATCH] net: ethernet: mtk_wed: fix some possible NULL pointer
dereferences
Fix possible NULL pointer dereference in mtk_wed_detach routine checking
wo pointer is properly allocated before running mtk_wed_wo_reset() and
mtk_wed_wo_deinit().
Even if it is just a theoretical issue at the moment check wo pointer is
not NULL in mtk_wed_mcu_msg_update.
Moreover, honor mtk_wed_mcu_send_msg return value in mtk_wed_wo_reset()
Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Fixes: 4c5de09eb0d0 ("net: ethernet: mtk_wed: add configure wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -174,9 +174,10 @@ mtk_wed_wo_reset(struct mtk_wed_device *
mtk_wdma_tx_reset(dev);
mtk_wed_reset(dev, MTK_WED_RESET_WED);
- mtk_wed_mcu_send_msg(wo, MTK_WED_MODULE_ID_WO,
- MTK_WED_WO_CMD_CHANGE_STATE, &state,
- sizeof(state), false);
+ if (mtk_wed_mcu_send_msg(wo, MTK_WED_MODULE_ID_WO,
+ MTK_WED_WO_CMD_CHANGE_STATE, &state,
+ sizeof(state), false))
+ return;
if (readx_poll_timeout(mtk_wed_wo_read_status, dev, val,
val == MTK_WED_WOIF_DISABLE_DONE,
@@ -632,9 +633,11 @@ mtk_wed_detach(struct mtk_wed_device *de
mtk_wed_free_tx_rings(dev);
if (mtk_wed_get_rx_capa(dev)) {
- mtk_wed_wo_reset(dev);
+ if (hw->wed_wo)
+ mtk_wed_wo_reset(dev);
mtk_wed_free_rx_rings(dev);
- mtk_wed_wo_deinit(hw);
+ if (hw->wed_wo)
+ mtk_wed_wo_deinit(hw);
}
if (dev->wlan.bus_type == MTK_WED_BUS_PCIE) {
--- a/drivers/net/ethernet/mediatek/mtk_wed_mcu.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_mcu.c
@@ -207,6 +207,9 @@ int mtk_wed_mcu_msg_update(struct mtk_we
if (dev->hw->version == 1)
return 0;
+ if (WARN_ON(!wo))
+ return -ENODEV;
+
return mtk_wed_mcu_send_msg(wo, MTK_WED_MODULE_ID_WO, id, data, len,
true);
}

View File

@ -0,0 +1,58 @@
From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Wed, 7 Dec 2022 15:04:55 +0100
Subject: [PATCH] net: ethernet: mtk_wed: fix possible deadlock if
mtk_wed_wo_init fails
Introduce __mtk_wed_detach() in order to avoid a deadlock in
mtk_wed_attach routine if mtk_wed_wo_init fails since both
mtk_wed_attach and mtk_wed_detach run holding hw_lock mutex.
Fixes: 4c5de09eb0d0 ("net: ethernet: mtk_wed: add configure wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -619,12 +619,10 @@ mtk_wed_deinit(struct mtk_wed_device *de
}
static void
-mtk_wed_detach(struct mtk_wed_device *dev)
+__mtk_wed_detach(struct mtk_wed_device *dev)
{
struct mtk_wed_hw *hw = dev->hw;
- mutex_lock(&hw_lock);
-
mtk_wed_deinit(dev);
mtk_wdma_rx_reset(dev);
@@ -657,6 +655,13 @@ mtk_wed_detach(struct mtk_wed_device *de
module_put(THIS_MODULE);
hw->wed_dev = NULL;
+}
+
+static void
+mtk_wed_detach(struct mtk_wed_device *dev)
+{
+ mutex_lock(&hw_lock);
+ __mtk_wed_detach(dev);
mutex_unlock(&hw_lock);
}
@@ -1538,8 +1543,10 @@ mtk_wed_attach(struct mtk_wed_device *de
ret = mtk_wed_wo_init(hw);
}
out:
- if (ret)
- mtk_wed_detach(dev);
+ if (ret) {
+ dev_err(dev->hw->dev, "failed to attach wed device\n");
+ __mtk_wed_detach(dev);
+ }
unlock:
mutex_unlock(&hw_lock);

View File

@ -0,0 +1,31 @@
From: Felix Fietkau <nbd@nbd.name>
Date: Fri, 24 Mar 2023 14:56:58 +0100
Subject: [PATCH] net: ethernet: mtk_eth_soc: fix tx throughput regression with
direct 1G links
Using the QDMA tx scheduler to throttle tx to line speed works fine for
switch ports, but apparently caused a regression on non-switch ports.
Based on a number of tests, it seems that this throttling can be safely
dropped without re-introducing the issues on switch ports that the
tx scheduling changes resolved.
Link: https://lore.kernel.org/netdev/trinity-92c3826f-c2c8-40af-8339-bc6d0d3ffea4-1678213958520@3c-app-gmx-bs16/
Fixes: f63959c7eec3 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues")
Reported-by: Frank Wunderlich <frank-w@public-files.de>
Reported-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -719,8 +719,6 @@ static void mtk_mac_link_up(struct phyli
break;
}
- mtk_set_queue_speed(mac->hw, mac->id, speed);
-
/* Configure duplex */
if (duplex == DUPLEX_FULL)
mcr |= MAC_MCR_FORCE_DPX;

View File

@ -0,0 +1,55 @@
From b6a709cb51f7bdc55c01cec886098a9753ce8c28 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:10:42 +0100
Subject: [PATCH 01/10] net: mtk_eth_soc: add definitions for PCS
As a result of help from Frank Wunderlich to investigate and test, we
know a bit more about the PCS on the Mediatek platforms. Update the
definitions from this investigation.
This PCS appears similar, but not identical to the Lynx PCS.
Although not included in this patch, but for future reference, the PHY
ID registers at offset 4 read as 0x4d544950 'MTIP'.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -504,8 +504,10 @@
#define ETHSYS_DMA_AG_MAP_PPE BIT(2)
/* SGMII subsystem config registers */
-/* Register to auto-negotiation restart */
+/* BMCR (low 16) BMSR (high 16) */
#define SGMSYS_PCS_CONTROL_1 0x0
+#define SGMII_BMCR GENMASK(15, 0)
+#define SGMII_BMSR GENMASK(31, 16)
#define SGMII_AN_RESTART BIT(9)
#define SGMII_ISOLATE BIT(10)
#define SGMII_AN_ENABLE BIT(12)
@@ -515,13 +517,18 @@
#define SGMII_PCS_FAULT BIT(23)
#define SGMII_AN_EXPANSION_CLR BIT(30)
+#define SGMSYS_PCS_ADVERTISE 0x8
+#define SGMII_ADVERTISE GENMASK(15, 0)
+#define SGMII_LPA GENMASK(31, 16)
+
/* Register to programmable link timer, the unit in 2 * 8ns */
#define SGMSYS_PCS_LINK_TIMER 0x18
-#define SGMII_LINK_TIMER_DEFAULT (0x186a0 & GENMASK(19, 0))
+#define SGMII_LINK_TIMER_MASK GENMASK(19, 0)
+#define SGMII_LINK_TIMER_DEFAULT (0x186a0 & SGMII_LINK_TIMER_MASK)
/* Register to control remote fault */
#define SGMSYS_SGMII_MODE 0x20
-#define SGMII_IF_MODE_BIT0 BIT(0)
+#define SGMII_IF_MODE_SGMII BIT(0)
#define SGMII_SPEED_DUPLEX_AN BIT(1)
#define SGMII_SPEED_MASK GENMASK(3, 2)
#define SGMII_SPEED_10 FIELD_PREP(SGMII_SPEED_MASK, 0)

View File

@ -0,0 +1,74 @@
From 5cf7797526ee81bea0f627bccaa3d887f48f53e0 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:10:47 +0100
Subject: [PATCH 02/10] net: mtk_eth_soc: eliminate unnecessary error handling
The functions called by the pcs_config() method always return zero, so
there is no point trying to handle an error from these functions. Make
these functions void, eliminate the "err" variable and simply return
zero from the pcs_config() function itself.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -20,7 +20,7 @@ static struct mtk_pcs *pcs_to_mtk_pcs(st
}
/* For SGMII interface mode */
-static int mtk_pcs_setup_mode_an(struct mtk_pcs *mpcs)
+static void mtk_pcs_setup_mode_an(struct mtk_pcs *mpcs)
{
unsigned int val;
@@ -39,16 +39,13 @@ static int mtk_pcs_setup_mode_an(struct
regmap_read(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, &val);
val &= ~SGMII_PHYA_PWD;
regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, val);
-
- return 0;
-
}
/* For 1000BASE-X and 2500BASE-X interface modes, which operate at a
* fixed speed.
*/
-static int mtk_pcs_setup_mode_force(struct mtk_pcs *mpcs,
- phy_interface_t interface)
+static void mtk_pcs_setup_mode_force(struct mtk_pcs *mpcs,
+ phy_interface_t interface)
{
unsigned int val;
@@ -73,8 +70,6 @@ static int mtk_pcs_setup_mode_force(stru
regmap_read(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, &val);
val &= ~SGMII_PHYA_PWD;
regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, val);
-
- return 0;
}
static int mtk_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
@@ -83,15 +78,14 @@ static int mtk_pcs_config(struct phylink
bool permit_pause_to_mac)
{
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
- int err = 0;
/* Setup SGMIISYS with the determined property */
if (interface != PHY_INTERFACE_MODE_SGMII)
- err = mtk_pcs_setup_mode_force(mpcs, interface);
+ mtk_pcs_setup_mode_force(mpcs, interface);
else if (phylink_autoneg_inband(mode))
- err = mtk_pcs_setup_mode_an(mpcs);
+ mtk_pcs_setup_mode_an(mpcs);
- return err;
+ return 0;
}
static void mtk_pcs_restart_an(struct phylink_pcs *pcs)

View File

@ -0,0 +1,46 @@
From c000dca098002da193b98099df051c9ead0cacb4 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:10:52 +0100
Subject: [PATCH 03/10] net: mtk_eth_soc: add pcs_get_state() implementation
Add a pcs_get_state() implementation which uses the advertisements
to compute the resulting link modes, and BMSR contents to determine
negotiation and link status.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -19,6 +19,20 @@ static struct mtk_pcs *pcs_to_mtk_pcs(st
return container_of(pcs, struct mtk_pcs, pcs);
}
+static void mtk_pcs_get_state(struct phylink_pcs *pcs,
+ struct phylink_link_state *state)
+{
+ struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
+ unsigned int bm, adv;
+
+ /* Read the BMSR and LPA */
+ regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &bm);
+ regmap_read(mpcs->regmap, SGMSYS_PCS_ADVERTISE, &adv);
+
+ phylink_mii_c22_pcs_decode_state(state, FIELD_GET(SGMII_BMSR, bm),
+ FIELD_GET(SGMII_LPA, adv));
+}
+
/* For SGMII interface mode */
static void mtk_pcs_setup_mode_an(struct mtk_pcs *mpcs)
{
@@ -117,6 +131,7 @@ static void mtk_pcs_link_up(struct phyli
}
static const struct phylink_pcs_ops mtk_pcs_ops = {
+ .pcs_get_state = mtk_pcs_get_state,
.pcs_config = mtk_pcs_config,
.pcs_an_restart = mtk_pcs_restart_an,
.pcs_link_up = mtk_pcs_link_up,

View File

@ -0,0 +1,130 @@
From 0d2351dc2768061689abd4de1529fa206bbd574e Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:10:58 +0100
Subject: [PATCH 04/10] net: mtk_eth_soc: convert mtk_sgmii to use
regmap_update_bits()
mtk_sgmii does a lot of read-modify-write operations, for which there
is a specific regmap function. Use this function instead of open-coding
the operations.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 61 ++++++++++-------------
1 file changed, 26 insertions(+), 35 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -36,23 +36,18 @@ static void mtk_pcs_get_state(struct phy
/* For SGMII interface mode */
static void mtk_pcs_setup_mode_an(struct mtk_pcs *mpcs)
{
- unsigned int val;
-
/* Setup the link timer and QPHY power up inside SGMIISYS */
regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER,
SGMII_LINK_TIMER_DEFAULT);
- regmap_read(mpcs->regmap, SGMSYS_SGMII_MODE, &val);
- val |= SGMII_REMOTE_FAULT_DIS;
- regmap_write(mpcs->regmap, SGMSYS_SGMII_MODE, val);
-
- regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &val);
- val |= SGMII_AN_RESTART;
- regmap_write(mpcs->regmap, SGMSYS_PCS_CONTROL_1, val);
-
- regmap_read(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, &val);
- val &= ~SGMII_PHYA_PWD;
- regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, val);
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_REMOTE_FAULT_DIS, SGMII_REMOTE_FAULT_DIS);
+
+ regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
+ SGMII_AN_RESTART, SGMII_AN_RESTART);
+
+ regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
+ SGMII_PHYA_PWD, 0);
}
/* For 1000BASE-X and 2500BASE-X interface modes, which operate at a
@@ -61,29 +56,26 @@ static void mtk_pcs_setup_mode_an(struct
static void mtk_pcs_setup_mode_force(struct mtk_pcs *mpcs,
phy_interface_t interface)
{
- unsigned int val;
+ unsigned int rgc3;
- regmap_read(mpcs->regmap, mpcs->ana_rgc3, &val);
- val &= ~RG_PHY_SPEED_MASK;
if (interface == PHY_INTERFACE_MODE_2500BASEX)
- val |= RG_PHY_SPEED_3_125G;
- regmap_write(mpcs->regmap, mpcs->ana_rgc3, val);
+ rgc3 = RG_PHY_SPEED_3_125G;
+
+ regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
+ RG_PHY_SPEED_3_125G, rgc3);
/* Disable SGMII AN */
- regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &val);
- val &= ~SGMII_AN_ENABLE;
- regmap_write(mpcs->regmap, SGMSYS_PCS_CONTROL_1, val);
+ regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
+ SGMII_AN_ENABLE, 0);
/* Set the speed etc but leave the duplex unchanged */
- regmap_read(mpcs->regmap, SGMSYS_SGMII_MODE, &val);
- val &= SGMII_DUPLEX_FULL | ~SGMII_IF_MODE_MASK;
- val |= SGMII_SPEED_1000;
- regmap_write(mpcs->regmap, SGMSYS_SGMII_MODE, val);
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_IF_MODE_MASK & ~SGMII_DUPLEX_FULL,
+ SGMII_SPEED_1000);
/* Release PHYA power down state */
- regmap_read(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, &val);
- val &= ~SGMII_PHYA_PWD;
- regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, val);
+ regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
+ SGMII_PHYA_PWD, 0);
}
static int mtk_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
@@ -105,29 +97,28 @@ static int mtk_pcs_config(struct phylink
static void mtk_pcs_restart_an(struct phylink_pcs *pcs)
{
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
- unsigned int val;
- regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &val);
- val |= SGMII_AN_RESTART;
- regmap_write(mpcs->regmap, SGMSYS_PCS_CONTROL_1, val);
+ regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
+ SGMII_AN_RESTART, SGMII_AN_RESTART);
}
static void mtk_pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
phy_interface_t interface, int speed, int duplex)
{
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
- unsigned int val;
+ unsigned int sgm_mode;
if (!phy_interface_mode_is_8023z(interface))
return;
/* SGMII force duplex setting */
- regmap_read(mpcs->regmap, SGMSYS_SGMII_MODE, &val);
- val &= ~SGMII_DUPLEX_FULL;
if (duplex == DUPLEX_FULL)
- val |= SGMII_DUPLEX_FULL;
+ sgm_mode = SGMII_DUPLEX_FULL;
+ else
+ sgm_mode = 0;
- regmap_write(mpcs->regmap, SGMSYS_SGMII_MODE, val);
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_DUPLEX_FULL, sgm_mode);
}
static const struct phylink_pcs_ops mtk_pcs_ops = {

View File

@ -0,0 +1,52 @@
From 12198c3a410fe69843e335c1bbf6d4c2a4d48e4e Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:11:03 +0100
Subject: [PATCH 05/10] net: mtk_eth_soc: add out of band forcing of speed and
duplex in pcs_link_up
Add support for forcing the link speed and duplex setting in the
pcs_link_up() method for out of band modes, which will be useful when
we finish converting the pcs_config() method. Until then, we still have
to force duplex for 802.3z modes to work correctly.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 28 ++++++++++++++---------
1 file changed, 17 insertions(+), 11 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -108,17 +108,23 @@ static void mtk_pcs_link_up(struct phyli
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
unsigned int sgm_mode;
- if (!phy_interface_mode_is_8023z(interface))
- return;
+ if (!phylink_autoneg_inband(mode) ||
+ phy_interface_mode_is_8023z(interface)) {
+ /* Force the speed and duplex setting */
+ if (speed == SPEED_10)
+ sgm_mode = SGMII_SPEED_10;
+ else if (speed == SPEED_100)
+ sgm_mode = SGMII_SPEED_100;
+ else
+ sgm_mode = SGMII_SPEED_1000;
- /* SGMII force duplex setting */
- if (duplex == DUPLEX_FULL)
- sgm_mode = SGMII_DUPLEX_FULL;
- else
- sgm_mode = 0;
+ if (duplex == DUPLEX_FULL)
+ sgm_mode |= SGMII_DUPLEX_FULL;
- regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
- SGMII_DUPLEX_FULL, sgm_mode);
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_DUPLEX_FULL | SGMII_SPEED_MASK,
+ sgm_mode);
+ }
}
static const struct phylink_pcs_ops mtk_pcs_ops = {

View File

@ -0,0 +1,48 @@
From 6f38fffe2179dd29612aea2c67c46ed6682b4e46 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:11:08 +0100
Subject: [PATCH 06/10] net: mtk_eth_soc: move PHY power up
The PHY power up is common to both configuration paths, so move it into
the parent function. We need to do this for all serdes modes.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -45,9 +45,6 @@ static void mtk_pcs_setup_mode_an(struct
regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
SGMII_AN_RESTART, SGMII_AN_RESTART);
-
- regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
- SGMII_PHYA_PWD, 0);
}
/* For 1000BASE-X and 2500BASE-X interface modes, which operate at a
@@ -72,10 +69,6 @@ static void mtk_pcs_setup_mode_force(str
regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
SGMII_IF_MODE_MASK & ~SGMII_DUPLEX_FULL,
SGMII_SPEED_1000);
-
- /* Release PHYA power down state */
- regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
- SGMII_PHYA_PWD, 0);
}
static int mtk_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
@@ -91,6 +84,10 @@ static int mtk_pcs_config(struct phylink
else if (phylink_autoneg_inband(mode))
mtk_pcs_setup_mode_an(mpcs);
+ /* Release PHYA power down state */
+ regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
+ SGMII_PHYA_PWD, 0);
+
return 0;
}

View File

@ -0,0 +1,48 @@
From f752c0df13dfeb721c11d3debb79f08cf437344f Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:11:13 +0100
Subject: [PATCH 07/10] net: mtk_eth_soc: move interface speed selection
Move the selection of the underlying interface speed to the pcs_config
function, so we always program the interface speed.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -53,14 +53,6 @@ static void mtk_pcs_setup_mode_an(struct
static void mtk_pcs_setup_mode_force(struct mtk_pcs *mpcs,
phy_interface_t interface)
{
- unsigned int rgc3;
-
- if (interface == PHY_INTERFACE_MODE_2500BASEX)
- rgc3 = RG_PHY_SPEED_3_125G;
-
- regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
- RG_PHY_SPEED_3_125G, rgc3);
-
/* Disable SGMII AN */
regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
SGMII_AN_ENABLE, 0);
@@ -77,6 +69,16 @@ static int mtk_pcs_config(struct phylink
bool permit_pause_to_mac)
{
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
+ unsigned int rgc3;
+
+ if (interface == PHY_INTERFACE_MODE_2500BASEX)
+ rgc3 = RG_PHY_SPEED_3_125G;
+ else
+ rgc3 = 0;
+
+ /* Configure the underlying interface speed */
+ regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
+ RG_PHY_SPEED_3_125G, rgc3);
/* Setup SGMIISYS with the determined property */
if (interface != PHY_INTERFACE_MODE_SGMII)

View File

@ -0,0 +1,52 @@
From c125c66ea71b9377ae2478c4f1b87b180cc5c6ef Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:11:18 +0100
Subject: [PATCH 08/10] net: mtk_eth_soc: add advertisement programming
Program the advertisement into the mtk PCS block.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -70,16 +70,27 @@ static int mtk_pcs_config(struct phylink
{
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
unsigned int rgc3;
+ int advertise;
+ bool changed;
if (interface == PHY_INTERFACE_MODE_2500BASEX)
rgc3 = RG_PHY_SPEED_3_125G;
else
rgc3 = 0;
+ advertise = phylink_mii_c22_pcs_encode_advertisement(interface,
+ advertising);
+ if (advertise < 0)
+ return advertise;
+
/* Configure the underlying interface speed */
regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
RG_PHY_SPEED_3_125G, rgc3);
+ /* Update the advertisement, noting whether it has changed */
+ regmap_update_bits_check(mpcs->regmap, SGMSYS_PCS_ADVERTISE,
+ SGMII_ADVERTISE, advertise, &changed);
+
/* Setup SGMIISYS with the determined property */
if (interface != PHY_INTERFACE_MODE_SGMII)
mtk_pcs_setup_mode_force(mpcs, interface);
@@ -90,7 +101,7 @@ static int mtk_pcs_config(struct phylink
regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
SGMII_PHYA_PWD, 0);
- return 0;
+ return changed;
}
static void mtk_pcs_restart_an(struct phylink_pcs *pcs)

View File

@ -0,0 +1,63 @@
From 3027d89f87707e7f3e5b683e0d37a32afb5bde96 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:11:23 +0100
Subject: [PATCH 09/10] net: mtk_eth_soc: move and correct link timer
programming
Program the link timer appropriately for the interface mode being
used, using the newly introduced phylink helper that provides the
nanosecond link timer interval.
The intervals are 1.6ms for SGMII based protocols and 10ms for
802.3z based protocols.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -36,10 +36,6 @@ static void mtk_pcs_get_state(struct phy
/* For SGMII interface mode */
static void mtk_pcs_setup_mode_an(struct mtk_pcs *mpcs)
{
- /* Setup the link timer and QPHY power up inside SGMIISYS */
- regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER,
- SGMII_LINK_TIMER_DEFAULT);
-
regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
SGMII_REMOTE_FAULT_DIS, SGMII_REMOTE_FAULT_DIS);
@@ -69,8 +65,8 @@ static int mtk_pcs_config(struct phylink
bool permit_pause_to_mac)
{
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
+ int advertise, link_timer;
unsigned int rgc3;
- int advertise;
bool changed;
if (interface == PHY_INTERFACE_MODE_2500BASEX)
@@ -83,6 +79,10 @@ static int mtk_pcs_config(struct phylink
if (advertise < 0)
return advertise;
+ link_timer = phylink_get_link_timer_ns(interface);
+ if (link_timer < 0)
+ return link_timer;
+
/* Configure the underlying interface speed */
regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
RG_PHY_SPEED_3_125G, rgc3);
@@ -91,6 +91,9 @@ static int mtk_pcs_config(struct phylink
regmap_update_bits_check(mpcs->regmap, SGMSYS_PCS_ADVERTISE,
SGMII_ADVERTISE, advertise, &changed);
+ /* Setup the link timer and QPHY power up inside SGMIISYS */
+ regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER, link_timer / 2 / 8);
+
/* Setup SGMIISYS with the determined property */
if (interface != PHY_INTERFACE_MODE_SGMII)
mtk_pcs_setup_mode_force(mpcs, interface);

View File

@ -0,0 +1,132 @@
From 81b0f12a2a8a1699a7d49c3995e5f71e4ec018e6 Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Thu, 27 Oct 2022 14:11:28 +0100
Subject: [PATCH 10/10] net: mtk_eth_soc: add support for in-band 802.3z
negotiation
As a result of help from Frank Wunderlich to investigate and test, we
now know how to program this PCS for in-band 802.3z negotiation. Add
support for this by moving the contents of the two functions into the
common mtk_pcs_config() function and adding the register settings for
802.3z negotiation.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 77 ++++++++++++-----------
1 file changed, 42 insertions(+), 35 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -33,41 +33,15 @@ static void mtk_pcs_get_state(struct phy
FIELD_GET(SGMII_LPA, adv));
}
-/* For SGMII interface mode */
-static void mtk_pcs_setup_mode_an(struct mtk_pcs *mpcs)
-{
- regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
- SGMII_REMOTE_FAULT_DIS, SGMII_REMOTE_FAULT_DIS);
-
- regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
- SGMII_AN_RESTART, SGMII_AN_RESTART);
-}
-
-/* For 1000BASE-X and 2500BASE-X interface modes, which operate at a
- * fixed speed.
- */
-static void mtk_pcs_setup_mode_force(struct mtk_pcs *mpcs,
- phy_interface_t interface)
-{
- /* Disable SGMII AN */
- regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
- SGMII_AN_ENABLE, 0);
-
- /* Set the speed etc but leave the duplex unchanged */
- regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
- SGMII_IF_MODE_MASK & ~SGMII_DUPLEX_FULL,
- SGMII_SPEED_1000);
-}
-
static int mtk_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
phy_interface_t interface,
const unsigned long *advertising,
bool permit_pause_to_mac)
{
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
+ unsigned int rgc3, sgm_mode, bmcr;
int advertise, link_timer;
- unsigned int rgc3;
- bool changed;
+ bool changed, use_an;
if (interface == PHY_INTERFACE_MODE_2500BASEX)
rgc3 = RG_PHY_SPEED_3_125G;
@@ -83,6 +57,37 @@ static int mtk_pcs_config(struct phylink
if (link_timer < 0)
return link_timer;
+ /* Clearing IF_MODE_BIT0 switches the PCS to BASE-X mode, and
+ * we assume that fixes it's speed at bitrate = line rate (in
+ * other words, 1000Mbps or 2500Mbps).
+ */
+ if (interface == PHY_INTERFACE_MODE_SGMII) {
+ sgm_mode = SGMII_IF_MODE_SGMII;
+ if (phylink_autoneg_inband(mode)) {
+ sgm_mode |= SGMII_REMOTE_FAULT_DIS |
+ SGMII_SPEED_DUPLEX_AN;
+ use_an = true;
+ } else {
+ use_an = false;
+ }
+ } else if (phylink_autoneg_inband(mode)) {
+ /* 1000base-X or 2500base-X autoneg */
+ sgm_mode = SGMII_REMOTE_FAULT_DIS;
+ use_an = linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+ advertising);
+ } else {
+ /* 1000base-X or 2500base-X without autoneg */
+ sgm_mode = 0;
+ use_an = false;
+ }
+
+ if (use_an) {
+ /* FIXME: Do we need to set AN_RESTART here? */
+ bmcr = SGMII_AN_RESTART | SGMII_AN_ENABLE;
+ } else {
+ bmcr = 0;
+ }
+
/* Configure the underlying interface speed */
regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
RG_PHY_SPEED_3_125G, rgc3);
@@ -94,11 +99,14 @@ static int mtk_pcs_config(struct phylink
/* Setup the link timer and QPHY power up inside SGMIISYS */
regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER, link_timer / 2 / 8);
- /* Setup SGMIISYS with the determined property */
- if (interface != PHY_INTERFACE_MODE_SGMII)
- mtk_pcs_setup_mode_force(mpcs, interface);
- else if (phylink_autoneg_inband(mode))
- mtk_pcs_setup_mode_an(mpcs);
+ /* Update the sgmsys mode register */
+ regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
+ SGMII_REMOTE_FAULT_DIS | SGMII_SPEED_DUPLEX_AN |
+ SGMII_IF_MODE_SGMII, sgm_mode);
+
+ /* Update the BMCR */
+ regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
+ SGMII_AN_RESTART | SGMII_AN_ENABLE, bmcr);
/* Release PHYA power down state */
regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
@@ -121,8 +129,7 @@ static void mtk_pcs_link_up(struct phyli
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
unsigned int sgm_mode;
- if (!phylink_autoneg_inband(mode) ||
- phy_interface_mode_is_8023z(interface)) {
+ if (!phylink_autoneg_inband(mode)) {
/* Force the speed and duplex setting */
if (speed == SPEED_10)
sgm_mode = SGMII_SPEED_10;

View File

@ -0,0 +1,119 @@
From 7ff82416de8295c61423ef6fd75f052d3837d2f7 Mon Sep 17 00:00:00 2001
From: Alexander Couzens <lynxis@fe80.eu>
Date: Wed, 1 Feb 2023 19:23:29 +0100
Subject: [PATCH 11/13] net: mediatek: sgmii: ensure the SGMII PHY is powered
down on configuration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The code expect the PHY to be in power down which is only true after reset.
Allow changes of the SGMII parameters more than once.
Only power down when reconfiguring to avoid bouncing the link when there's
no reason to - based on code from Russell King.
There are cases when the SGMII_PHYA_PWD register contains 0x9 which
prevents SGMII from working. The SGMII still shows link but no traffic
can flow. Writing 0x0 to the PHYA_PWD register fix the issue. 0x0 was
taken from a good working state of the SGMII interface.
Fixes: 42c03844e93d ("net-next: mediatek: add support for MediaTek MT7622 SoC")
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
[ bmork: rebased and squashed into one patch ]
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 ++
drivers/net/ethernet/mediatek/mtk_sgmii.c | 39 +++++++++++++++------
2 files changed, 30 insertions(+), 11 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -1073,11 +1073,13 @@ struct mtk_soc_data {
* @regmap: The register map pointing at the range used to setup
* SGMII modes
* @ana_rgc3: The offset refers to register ANA_RGC3 related to regmap
+ * @interface: Currently configured interface mode
* @pcs: Phylink PCS structure
*/
struct mtk_pcs {
struct regmap *regmap;
u32 ana_rgc3;
+ phy_interface_t interface;
struct phylink_pcs pcs;
};
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -43,11 +43,6 @@ static int mtk_pcs_config(struct phylink
int advertise, link_timer;
bool changed, use_an;
- if (interface == PHY_INTERFACE_MODE_2500BASEX)
- rgc3 = RG_PHY_SPEED_3_125G;
- else
- rgc3 = 0;
-
advertise = phylink_mii_c22_pcs_encode_advertisement(interface,
advertising);
if (advertise < 0)
@@ -88,9 +83,22 @@ static int mtk_pcs_config(struct phylink
bmcr = 0;
}
- /* Configure the underlying interface speed */
- regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
- RG_PHY_SPEED_3_125G, rgc3);
+ if (mpcs->interface != interface) {
+ /* PHYA power down */
+ regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
+ SGMII_PHYA_PWD, SGMII_PHYA_PWD);
+
+ if (interface == PHY_INTERFACE_MODE_2500BASEX)
+ rgc3 = RG_PHY_SPEED_3_125G;
+ else
+ rgc3 = 0;
+
+ /* Configure the underlying interface speed */
+ regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
+ RG_PHY_SPEED_3_125G, rgc3);
+
+ mpcs->interface = interface;
+ }
/* Update the advertisement, noting whether it has changed */
regmap_update_bits_check(mpcs->regmap, SGMSYS_PCS_ADVERTISE,
@@ -108,9 +116,17 @@ static int mtk_pcs_config(struct phylink
regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
SGMII_AN_RESTART | SGMII_AN_ENABLE, bmcr);
- /* Release PHYA power down state */
- regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
- SGMII_PHYA_PWD, 0);
+ /* Release PHYA power down state
+ * Only removing bit SGMII_PHYA_PWD isn't enough.
+ * There are cases when the SGMII_PHYA_PWD register contains 0x9 which
+ * prevents SGMII from working. The SGMII still shows link but no traffic
+ * can flow. Writing 0x0 to the PHYA_PWD register fix the issue. 0x0 was
+ * taken from a good working state of the SGMII interface.
+ * Unknown how much the QPHY needs but it is racy without a sleep.
+ * Tested on mt7622 & mt7986.
+ */
+ usleep_range(50, 100);
+ regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, 0);
return changed;
}
@@ -171,6 +187,7 @@ int mtk_sgmii_init(struct mtk_sgmii *ss,
return PTR_ERR(ss->pcs[i].regmap);
ss->pcs[i].pcs.ops = &mtk_pcs_ops;
+ ss->pcs[i].interface = PHY_INTERFACE_MODE_NA;
}
return 0;

View File

@ -0,0 +1,52 @@
From 9d32637122de88f1ef614c29703f0e050cad342e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= <bjorn@mork.no>
Date: Wed, 1 Feb 2023 19:23:30 +0100
Subject: [PATCH 12/13] net: mediatek: sgmii: fix duplex configuration
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The logic of the duplex bit is inverted. Setting it means half
duplex, not full duplex.
Fix and rename macro to avoid confusion.
Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
drivers/net/ethernet/mediatek/mtk_sgmii.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -534,7 +534,7 @@
#define SGMII_SPEED_10 FIELD_PREP(SGMII_SPEED_MASK, 0)
#define SGMII_SPEED_100 FIELD_PREP(SGMII_SPEED_MASK, 1)
#define SGMII_SPEED_1000 FIELD_PREP(SGMII_SPEED_MASK, 2)
-#define SGMII_DUPLEX_FULL BIT(4)
+#define SGMII_DUPLEX_HALF BIT(4)
#define SGMII_IF_MODE_BIT5 BIT(5)
#define SGMII_REMOTE_FAULT_DIS BIT(8)
#define SGMII_CODE_SYNC_SET_VAL BIT(9)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -154,11 +154,11 @@ static void mtk_pcs_link_up(struct phyli
else
sgm_mode = SGMII_SPEED_1000;
- if (duplex == DUPLEX_FULL)
- sgm_mode |= SGMII_DUPLEX_FULL;
+ if (duplex != DUPLEX_FULL)
+ sgm_mode |= SGMII_DUPLEX_HALF;
regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
- SGMII_DUPLEX_FULL | SGMII_SPEED_MASK,
+ SGMII_DUPLEX_HALF | SGMII_SPEED_MASK,
sgm_mode);
}
}

View File

@ -0,0 +1,33 @@
From 3337a6e04ddf2923a1bdcf3d31b3b52412bf82dd Mon Sep 17 00:00:00 2001
From: Alexander Couzens <lynxis@fe80.eu>
Date: Wed, 1 Feb 2023 19:23:31 +0100
Subject: [PATCH 13/13] mtk_sgmii: enable PCS polling to allow SFP work
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently there is no IRQ handling (even the SGMII supports it).
Enable polling to support SFP ports.
Fixes: 14a44ab0330d ("net: mtk_eth_soc: partially convert to phylink_pcs")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
[ bmork: changed "1" => "true" ]
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -187,6 +187,7 @@ int mtk_sgmii_init(struct mtk_sgmii *ss,
return PTR_ERR(ss->pcs[i].regmap);
ss->pcs[i].pcs.ops = &mtk_pcs_ops;
+ ss->pcs[i].pcs.poll = true;
ss->pcs[i].interface = PHY_INTERFACE_MODE_NA;
}

View File

@ -0,0 +1,48 @@
From 611e2dabb4b3243d176739fd6a5a34d007fa3f86 Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Tue, 14 Mar 2023 00:34:26 +0000
Subject: [PATCH 1/2] net: ethernet: mtk_eth_soc: reset PCS state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Reset the internal PCS state machine when changing interface mode.
This prevents confusing the state machine when changing interface
modes, e.g. from SGMII to 2500Base-X or vice-versa.
Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 4 ++++
drivers/net/ethernet/mediatek/mtk_sgmii.c | 4 ++++
2 files changed, 8 insertions(+)
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -542,6 +542,10 @@
#define SGMII_SEND_AN_ERROR_EN BIT(11)
#define SGMII_IF_MODE_MASK GENMASK(5, 1)
+/* Register to reset SGMII design */
+#define SGMII_RESERVED_0 0x34
+#define SGMII_SW_RESET BIT(0)
+
/* Register to set SGMII speed, ANA RG_ Control Signals III*/
#define SGMSYS_ANA_RG_CS3 0x2028
#define RG_PHY_SPEED_MASK (BIT(2) | BIT(3))
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -88,6 +88,10 @@ static int mtk_pcs_config(struct phylink
regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
SGMII_PHYA_PWD, SGMII_PHYA_PWD);
+ /* Reset SGMII PCS state */
+ regmap_update_bits(mpcs->regmap, SGMII_RESERVED_0,
+ SGMII_SW_RESET, SGMII_SW_RESET);
+
if (interface == PHY_INTERFACE_MODE_2500BASEX)
rgc3 = RG_PHY_SPEED_3_125G;
else

View File

@ -0,0 +1,103 @@
From 6e933a804c7db8be64f367f33e63cd7dcc302ebb Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Tue, 14 Mar 2023 00:34:45 +0000
Subject: [PATCH 2/2] net: ethernet: mtk_eth_soc: only write values if needed
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Only restart auto-negotiation and write link timer if actually
necessary. This prevents losing the link in case of minor
changes.
Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII")
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mediatek/mtk_sgmii.c | 24 +++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -38,20 +38,16 @@ static int mtk_pcs_config(struct phylink
const unsigned long *advertising,
bool permit_pause_to_mac)
{
+ bool mode_changed = false, changed, use_an;
struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
unsigned int rgc3, sgm_mode, bmcr;
int advertise, link_timer;
- bool changed, use_an;
advertise = phylink_mii_c22_pcs_encode_advertisement(interface,
advertising);
if (advertise < 0)
return advertise;
- link_timer = phylink_get_link_timer_ns(interface);
- if (link_timer < 0)
- return link_timer;
-
/* Clearing IF_MODE_BIT0 switches the PCS to BASE-X mode, and
* we assume that fixes it's speed at bitrate = line rate (in
* other words, 1000Mbps or 2500Mbps).
@@ -77,13 +73,16 @@ static int mtk_pcs_config(struct phylink
}
if (use_an) {
- /* FIXME: Do we need to set AN_RESTART here? */
- bmcr = SGMII_AN_RESTART | SGMII_AN_ENABLE;
+ bmcr = SGMII_AN_ENABLE;
} else {
bmcr = 0;
}
if (mpcs->interface != interface) {
+ link_timer = phylink_get_link_timer_ns(interface);
+ if (link_timer < 0)
+ return link_timer;
+
/* PHYA power down */
regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
SGMII_PHYA_PWD, SGMII_PHYA_PWD);
@@ -101,16 +100,17 @@ static int mtk_pcs_config(struct phylink
regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
RG_PHY_SPEED_3_125G, rgc3);
+ /* Setup the link timer */
+ regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER, link_timer / 2 / 8);
+
mpcs->interface = interface;
+ mode_changed = true;
}
/* Update the advertisement, noting whether it has changed */
regmap_update_bits_check(mpcs->regmap, SGMSYS_PCS_ADVERTISE,
SGMII_ADVERTISE, advertise, &changed);
- /* Setup the link timer and QPHY power up inside SGMIISYS */
- regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER, link_timer / 2 / 8);
-
/* Update the sgmsys mode register */
regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
SGMII_REMOTE_FAULT_DIS | SGMII_SPEED_DUPLEX_AN |
@@ -118,7 +118,7 @@ static int mtk_pcs_config(struct phylink
/* Update the BMCR */
regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
- SGMII_AN_RESTART | SGMII_AN_ENABLE, bmcr);
+ SGMII_AN_ENABLE, bmcr);
/* Release PHYA power down state
* Only removing bit SGMII_PHYA_PWD isn't enough.
@@ -132,7 +132,7 @@ static int mtk_pcs_config(struct phylink
usleep_range(50, 100);
regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, 0);
- return changed;
+ return changed || mode_changed;
}
static void mtk_pcs_restart_an(struct phylink_pcs *pcs)

View File

@ -0,0 +1,200 @@
From f5d43ddd334b7c32fcaed9ba46afbd85cb467f1f Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Sun, 19 Mar 2023 12:56:28 +0000
Subject: [PATCH] net: ethernet: mtk_eth_soc: add support for MT7981 SoC
The MediaTek MT7981 SoC comes with two 1G/2.5G SGMII ports, just like
MT7986.
In addition MT7981 is equipped with a built-in 1000Base-T PHY which can
be used with GMAC1.
As many MT7981 boards make use of inverting SGMII signal polarity, add
new device-tree attribute 'mediatek,pn_swap' to support them.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
--- a/drivers/net/ethernet/mediatek/mtk_eth_path.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_path.c
@@ -96,12 +96,20 @@ static int set_mux_gmac2_gmac0_to_gephy(
static int set_mux_u3_gmac2_to_qphy(struct mtk_eth *eth, int path)
{
- unsigned int val = 0;
+ unsigned int val = 0, mask = 0, reg = 0;
bool updated = true;
switch (path) {
case MTK_ETH_PATH_GMAC2_SGMII:
- val = CO_QPHY_SEL;
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_U3_COPHY_V2)) {
+ reg = USB_PHY_SWITCH_REG;
+ val = SGMII_QPHY_SEL;
+ mask = QPHY_SEL_MASK;
+ } else {
+ reg = INFRA_MISC2;
+ val = CO_QPHY_SEL;
+ mask = val;
+ }
break;
default:
updated = false;
@@ -109,7 +117,7 @@ static int set_mux_u3_gmac2_to_qphy(stru
}
if (updated)
- regmap_update_bits(eth->infra, INFRA_MISC2, CO_QPHY_SEL, val);
+ regmap_update_bits(eth->infra, reg, mask, val);
dev_dbg(eth->dev, "path %s in %s updated = %d\n",
mtk_eth_path_name(path), __func__, updated);
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -4808,6 +4808,26 @@ static const struct mtk_soc_data mt7629_
},
};
+static const struct mtk_soc_data mt7981_data = {
+ .reg_map = &mt7986_reg_map,
+ .ana_rgc3 = 0x128,
+ .caps = MT7981_CAPS,
+ .hw_features = MTK_HW_FEATURES,
+ .required_clks = MT7981_CLKS_BITMAP,
+ .required_pctl = false,
+ .offload_version = 2,
+ .hash_offset = 4,
+ .foe_entry_size = sizeof(struct mtk_foe_entry),
+ .txrx = {
+ .txd_size = sizeof(struct mtk_tx_dma_v2),
+ .rxd_size = sizeof(struct mtk_rx_dma_v2),
+ .rx_irq_done_mask = MTK_RX_DONE_INT_V2,
+ .rx_dma_l4_valid = RX_DMA_L4_VALID_V2,
+ .dma_max_len = MTK_TX_DMA_BUF_LEN_V2,
+ .dma_len_offset = 8,
+ },
+};
+
static const struct mtk_soc_data mt7986_data = {
.reg_map = &mt7986_reg_map,
.ana_rgc3 = 0x128,
@@ -4849,6 +4869,7 @@ const struct of_device_id of_mtk_match[]
{ .compatible = "mediatek,mt7622-eth", .data = &mt7622_data},
{ .compatible = "mediatek,mt7623-eth", .data = &mt7623_data},
{ .compatible = "mediatek,mt7629-eth", .data = &mt7629_data},
+ { .compatible = "mediatek,mt7981-eth", .data = &mt7981_data},
{ .compatible = "mediatek,mt7986-eth", .data = &mt7986_data},
{ .compatible = "ralink,rt5350-eth", .data = &rt5350_data},
{},
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -556,11 +556,22 @@
#define SGMSYS_QPHY_PWR_STATE_CTRL 0xe8
#define SGMII_PHYA_PWD BIT(4)
+/* Register to QPHY wrapper control */
+#define SGMSYS_QPHY_WRAP_CTRL 0xec
+#define SGMII_PN_SWAP_MASK GENMASK(1, 0)
+#define SGMII_PN_SWAP_TX_RX (BIT(0) | BIT(1))
+#define MTK_SGMII_FLAG_PN_SWAP BIT(0)
+
/* Infrasys subsystem config registers */
#define INFRA_MISC2 0x70c
#define CO_QPHY_SEL BIT(0)
#define GEPHY_MAC_SEL BIT(1)
+/* Top misc registers */
+#define USB_PHY_SWITCH_REG 0x218
+#define QPHY_SEL_MASK GENMASK(1, 0)
+#define SGMII_QPHY_SEL 0x2
+
/* MT7628/88 specific stuff */
#define MT7628_PDMA_OFFSET 0x0800
#define MT7628_SDM_OFFSET 0x0c00
@@ -741,6 +752,17 @@ enum mtk_clks_map {
BIT(MTK_CLK_SGMII2_CDR_FB) | \
BIT(MTK_CLK_SGMII_CK) | \
BIT(MTK_CLK_ETH2PLL) | BIT(MTK_CLK_SGMIITOP))
+#define MT7981_CLKS_BITMAP (BIT(MTK_CLK_FE) | BIT(MTK_CLK_GP2) | BIT(MTK_CLK_GP1) | \
+ BIT(MTK_CLK_WOCPU0) | \
+ BIT(MTK_CLK_SGMII_TX_250M) | \
+ BIT(MTK_CLK_SGMII_RX_250M) | \
+ BIT(MTK_CLK_SGMII_CDR_REF) | \
+ BIT(MTK_CLK_SGMII_CDR_FB) | \
+ BIT(MTK_CLK_SGMII2_TX_250M) | \
+ BIT(MTK_CLK_SGMII2_RX_250M) | \
+ BIT(MTK_CLK_SGMII2_CDR_REF) | \
+ BIT(MTK_CLK_SGMII2_CDR_FB) | \
+ BIT(MTK_CLK_SGMII_CK))
#define MT7986_CLKS_BITMAP (BIT(MTK_CLK_FE) | BIT(MTK_CLK_GP2) | BIT(MTK_CLK_GP1) | \
BIT(MTK_CLK_WOCPU1) | BIT(MTK_CLK_WOCPU0) | \
BIT(MTK_CLK_SGMII_TX_250M) | \
@@ -854,6 +876,7 @@ enum mkt_eth_capabilities {
MTK_NETSYS_V2_BIT,
MTK_SOC_MT7628_BIT,
MTK_RSTCTRL_PPE1_BIT,
+ MTK_U3_COPHY_V2_BIT,
/* MUX BITS*/
MTK_ETH_MUX_GDM1_TO_GMAC1_ESW_BIT,
@@ -888,6 +911,7 @@ enum mkt_eth_capabilities {
#define MTK_NETSYS_V2 BIT(MTK_NETSYS_V2_BIT)
#define MTK_SOC_MT7628 BIT(MTK_SOC_MT7628_BIT)
#define MTK_RSTCTRL_PPE1 BIT(MTK_RSTCTRL_PPE1_BIT)
+#define MTK_U3_COPHY_V2 BIT(MTK_U3_COPHY_V2_BIT)
#define MTK_ETH_MUX_GDM1_TO_GMAC1_ESW \
BIT(MTK_ETH_MUX_GDM1_TO_GMAC1_ESW_BIT)
@@ -966,6 +990,11 @@ enum mkt_eth_capabilities {
MTK_MUX_U3_GMAC2_TO_QPHY | \
MTK_MUX_GMAC12_TO_GEPHY_SGMII | MTK_QDMA)
+#define MT7981_CAPS (MTK_GMAC1_SGMII | MTK_GMAC2_SGMII | MTK_GMAC2_GEPHY | \
+ MTK_MUX_GMAC12_TO_GEPHY_SGMII | MTK_QDMA | \
+ MTK_MUX_U3_GMAC2_TO_QPHY | MTK_U3_COPHY_V2 | \
+ MTK_NETSYS_V2 | MTK_RSTCTRL_PPE1)
+
#define MT7986_CAPS (MTK_GMAC1_SGMII | MTK_GMAC2_SGMII | \
MTK_MUX_GMAC12_TO_GEPHY_SGMII | MTK_QDMA | \
MTK_NETSYS_V2 | MTK_RSTCTRL_PPE1)
@@ -1079,12 +1108,14 @@ struct mtk_soc_data {
* @ana_rgc3: The offset refers to register ANA_RGC3 related to regmap
* @interface: Currently configured interface mode
* @pcs: Phylink PCS structure
+ * @flags: Flags indicating hardware properties
*/
struct mtk_pcs {
struct regmap *regmap;
u32 ana_rgc3;
phy_interface_t interface;
struct phylink_pcs pcs;
+ u32 flags;
};
/* struct mtk_sgmii - This is the structure holding sgmii regmap and its
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ b/drivers/net/ethernet/mediatek/mtk_sgmii.c
@@ -87,6 +87,11 @@ static int mtk_pcs_config(struct phylink
regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
SGMII_PHYA_PWD, SGMII_PHYA_PWD);
+ if (mpcs->flags & MTK_SGMII_FLAG_PN_SWAP)
+ regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_WRAP_CTRL,
+ SGMII_PN_SWAP_MASK,
+ SGMII_PN_SWAP_TX_RX);
+
/* Reset SGMII PCS state */
regmap_update_bits(mpcs->regmap, SGMII_RESERVED_0,
SGMII_SW_RESET, SGMII_SW_RESET);
@@ -186,6 +191,11 @@ int mtk_sgmii_init(struct mtk_sgmii *ss,
ss->pcs[i].ana_rgc3 = ana_rgc3;
ss->pcs[i].regmap = syscon_node_to_regmap(np);
+
+ ss->pcs[i].flags = 0;
+ if (of_property_read_bool(np, "mediatek,pnswap"))
+ ss->pcs[i].flags |= MTK_SGMII_FLAG_PN_SWAP;
+
of_node_put(np);
if (IS_ERR(ss->pcs[i].regmap))
return PTR_ERR(ss->pcs[i].regmap);

View File

@ -0,0 +1,76 @@
From c0a440031d4314d1023c1b87f43a4233634eebdb Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Sun, 19 Mar 2023 12:57:15 +0000
Subject: [PATCH] net: ethernet: mtk_eth_soc: set MDIO bus clock frequency
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Set MDIO bus clock frequency and allow setting a custom maximum
frequency from device tree.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 21 +++++++++++++++++++++
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 7 +++++++
2 files changed, 28 insertions(+)
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -745,8 +745,10 @@ static const struct phylink_mac_ops mtk_
static int mtk_mdio_init(struct mtk_eth *eth)
{
+ unsigned int max_clk = 2500000, divider;
struct device_node *mii_np;
int ret;
+ u32 val;
mii_np = of_get_child_by_name(eth->dev->of_node, "mdio-bus");
if (!mii_np) {
@@ -773,6 +775,25 @@ static int mtk_mdio_init(struct mtk_eth
eth->mii_bus->parent = eth->dev;
snprintf(eth->mii_bus->id, MII_BUS_ID_SIZE, "%pOFn", mii_np);
+
+ if (!of_property_read_u32(mii_np, "clock-frequency", &val)) {
+ if (val > MDC_MAX_FREQ || val < MDC_MAX_FREQ / MDC_MAX_DIVIDER) {
+ dev_err(eth->dev, "MDIO clock frequency out of range");
+ ret = -EINVAL;
+ goto err_put_node;
+ }
+ max_clk = val;
+ }
+ divider = min_t(unsigned int, DIV_ROUND_UP(MDC_MAX_FREQ, max_clk), 63);
+
+ /* Configure MDC Divider */
+ val = mtk_r32(eth, MTK_PPSC);
+ val &= ~PPSC_MDC_CFG;
+ val |= FIELD_PREP(PPSC_MDC_CFG, divider) | PPSC_MDC_TURBO;
+ mtk_w32(eth, val, MTK_PPSC);
+
+ dev_dbg(eth->dev, "MDC is running on %d Hz\n", MDC_MAX_FREQ / divider);
+
ret = of_mdiobus_register(eth->mii_bus, mii_np);
err_put_node:
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -363,6 +363,13 @@
#define RX_DMA_VTAG_V2 BIT(0)
#define RX_DMA_L4_VALID_V2 BIT(2)
+/* PHY Polling and SMI Master Control registers */
+#define MTK_PPSC 0x10000
+#define PPSC_MDC_CFG GENMASK(29, 24)
+#define PPSC_MDC_TURBO BIT(20)
+#define MDC_MAX_FREQ 25000000
+#define MDC_MAX_DIVIDER 63
+
/* PHY Indirect Access Control registers */
#define MTK_PHY_IAC 0x10004
#define PHY_IAC_ACCESS BIT(31)

View File

@ -0,0 +1,512 @@
From 2a3ec7ae313310c1092e4256208cc04d1958e469 Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Sun, 19 Mar 2023 12:58:02 +0000
Subject: [PATCH] net: ethernet: mtk_eth_soc: switch to external PCS driver
Now that we got a PCS driver, use it and remove the now redundant
PCS code and it's header macros from the Ethernet driver.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
drivers/net/ethernet/mediatek/Kconfig | 2 +
drivers/net/ethernet/mediatek/Makefile | 2 +-
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 61 +++++-
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 93 +--------
drivers/net/ethernet/mediatek/mtk_sgmii.c | 217 --------------------
5 files changed, 56 insertions(+), 319 deletions(-)
delete mode 100644 drivers/net/ethernet/mediatek/mtk_sgmii.c
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -19,6 +19,8 @@ config NET_MEDIATEK_SOC
select DIMLIB
select PAGE_POOL
select PAGE_POOL_STATS
+ select PCS_MTK_LYNXI
+ select REGMAP_MMIO
help
This driver supports the gigabit ethernet MACs in the
MediaTek SoC family.
--- a/drivers/net/ethernet/mediatek/Makefile
+++ b/drivers/net/ethernet/mediatek/Makefile
@@ -4,7 +4,7 @@
#
obj-$(CONFIG_NET_MEDIATEK_SOC) += mtk_eth.o
-mtk_eth-y := mtk_eth_soc.o mtk_sgmii.o mtk_eth_path.o mtk_ppe.o mtk_ppe_debugfs.o mtk_ppe_offload.o
+mtk_eth-y := mtk_eth_soc.o mtk_eth_path.o mtk_ppe.o mtk_ppe_debugfs.o mtk_ppe_offload.o
mtk_eth-$(CONFIG_NET_MEDIATEK_SOC_WED) += mtk_wed.o mtk_wed_mcu.o mtk_wed_wo.o
ifdef CONFIG_DEBUG_FS
mtk_eth-$(CONFIG_NET_MEDIATEK_SOC_WED) += mtk_wed_debugfs.o
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -20,6 +20,7 @@
#include <linux/interrupt.h>
#include <linux/pinctrl/devinfo.h>
#include <linux/phylink.h>
+#include <linux/pcs/pcs-mtk-lynxi.h>
#include <linux/jhash.h>
#include <linux/bitfield.h>
#include <net/dsa.h>
@@ -400,7 +401,7 @@ static struct phylink_pcs *mtk_mac_selec
sid = (MTK_HAS_CAPS(eth->soc->caps, MTK_SHARED_SGMII)) ?
0 : mac->id;
- return mtk_sgmii_select_pcs(eth->sgmii, sid);
+ return eth->sgmii_pcs[sid];
}
return NULL;
@@ -4031,8 +4032,17 @@ static int mtk_unreg_dev(struct mtk_eth
return 0;
}
+static void mtk_sgmii_destroy(struct mtk_eth *eth)
+{
+ int i;
+
+ for (i = 0; i < MTK_MAX_DEVS; i++)
+ mtk_pcs_lynxi_destroy(eth->sgmii_pcs[i]);
+}
+
static int mtk_cleanup(struct mtk_eth *eth)
{
+ mtk_sgmii_destroy(eth);
mtk_unreg_dev(eth);
mtk_free_dev(eth);
cancel_work_sync(&eth->pending_work);
@@ -4462,6 +4472,36 @@ void mtk_eth_set_dma_device(struct mtk_e
rtnl_unlock();
}
+static int mtk_sgmii_init(struct mtk_eth *eth)
+{
+ struct device_node *np;
+ struct regmap *regmap;
+ u32 flags;
+ int i;
+
+ for (i = 0; i < MTK_MAX_DEVS; i++) {
+ np = of_parse_phandle(eth->dev->of_node, "mediatek,sgmiisys", i);
+ if (!np)
+ break;
+
+ regmap = syscon_node_to_regmap(np);
+ flags = 0;
+ if (of_property_read_bool(np, "mediatek,pnswap"))
+ flags |= MTK_SGMII_FLAG_PN_SWAP;
+
+ of_node_put(np);
+
+ if (IS_ERR(regmap))
+ return PTR_ERR(regmap);
+
+ eth->sgmii_pcs[i] = mtk_pcs_lynxi_create(eth->dev, regmap,
+ eth->soc->ana_rgc3,
+ flags);
+ }
+
+ return 0;
+}
+
static int mtk_probe(struct platform_device *pdev)
{
struct resource *res = NULL;
@@ -4525,13 +4565,7 @@ static int mtk_probe(struct platform_dev
}
if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII)) {
- eth->sgmii = devm_kzalloc(eth->dev, sizeof(*eth->sgmii),
- GFP_KERNEL);
- if (!eth->sgmii)
- return -ENOMEM;
-
- err = mtk_sgmii_init(eth->sgmii, pdev->dev.of_node,
- eth->soc->ana_rgc3);
+ err = mtk_sgmii_init(eth);
if (err)
return err;
@@ -4542,14 +4576,17 @@ static int mtk_probe(struct platform_dev
"mediatek,pctl");
if (IS_ERR(eth->pctl)) {
dev_err(&pdev->dev, "no pctl regmap found\n");
- return PTR_ERR(eth->pctl);
+ err = PTR_ERR(eth->pctl);
+ goto err_destroy_sgmii;
}
}
if (MTK_HAS_CAPS(eth->soc->caps, MTK_NETSYS_V2)) {
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- if (!res)
- return -EINVAL;
+ if (!res) {
+ err = -EINVAL;
+ goto err_destroy_sgmii;
+ }
}
if (eth->soc->offload_version) {
@@ -4708,6 +4745,8 @@ err_deinit_hw:
mtk_hw_deinit(eth);
err_wed_exit:
mtk_wed_exit();
+err_destroy_sgmii:
+ mtk_sgmii_destroy(eth);
return err;
}
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -510,65 +510,6 @@
#define ETHSYS_DMA_AG_MAP_QDMA BIT(1)
#define ETHSYS_DMA_AG_MAP_PPE BIT(2)
-/* SGMII subsystem config registers */
-/* BMCR (low 16) BMSR (high 16) */
-#define SGMSYS_PCS_CONTROL_1 0x0
-#define SGMII_BMCR GENMASK(15, 0)
-#define SGMII_BMSR GENMASK(31, 16)
-#define SGMII_AN_RESTART BIT(9)
-#define SGMII_ISOLATE BIT(10)
-#define SGMII_AN_ENABLE BIT(12)
-#define SGMII_LINK_STATYS BIT(18)
-#define SGMII_AN_ABILITY BIT(19)
-#define SGMII_AN_COMPLETE BIT(21)
-#define SGMII_PCS_FAULT BIT(23)
-#define SGMII_AN_EXPANSION_CLR BIT(30)
-
-#define SGMSYS_PCS_ADVERTISE 0x8
-#define SGMII_ADVERTISE GENMASK(15, 0)
-#define SGMII_LPA GENMASK(31, 16)
-
-/* Register to programmable link timer, the unit in 2 * 8ns */
-#define SGMSYS_PCS_LINK_TIMER 0x18
-#define SGMII_LINK_TIMER_MASK GENMASK(19, 0)
-#define SGMII_LINK_TIMER_DEFAULT (0x186a0 & SGMII_LINK_TIMER_MASK)
-
-/* Register to control remote fault */
-#define SGMSYS_SGMII_MODE 0x20
-#define SGMII_IF_MODE_SGMII BIT(0)
-#define SGMII_SPEED_DUPLEX_AN BIT(1)
-#define SGMII_SPEED_MASK GENMASK(3, 2)
-#define SGMII_SPEED_10 FIELD_PREP(SGMII_SPEED_MASK, 0)
-#define SGMII_SPEED_100 FIELD_PREP(SGMII_SPEED_MASK, 1)
-#define SGMII_SPEED_1000 FIELD_PREP(SGMII_SPEED_MASK, 2)
-#define SGMII_DUPLEX_HALF BIT(4)
-#define SGMII_IF_MODE_BIT5 BIT(5)
-#define SGMII_REMOTE_FAULT_DIS BIT(8)
-#define SGMII_CODE_SYNC_SET_VAL BIT(9)
-#define SGMII_CODE_SYNC_SET_EN BIT(10)
-#define SGMII_SEND_AN_ERROR_EN BIT(11)
-#define SGMII_IF_MODE_MASK GENMASK(5, 1)
-
-/* Register to reset SGMII design */
-#define SGMII_RESERVED_0 0x34
-#define SGMII_SW_RESET BIT(0)
-
-/* Register to set SGMII speed, ANA RG_ Control Signals III*/
-#define SGMSYS_ANA_RG_CS3 0x2028
-#define RG_PHY_SPEED_MASK (BIT(2) | BIT(3))
-#define RG_PHY_SPEED_1_25G 0x0
-#define RG_PHY_SPEED_3_125G BIT(2)
-
-/* Register to power up QPHY */
-#define SGMSYS_QPHY_PWR_STATE_CTRL 0xe8
-#define SGMII_PHYA_PWD BIT(4)
-
-/* Register to QPHY wrapper control */
-#define SGMSYS_QPHY_WRAP_CTRL 0xec
-#define SGMII_PN_SWAP_MASK GENMASK(1, 0)
-#define SGMII_PN_SWAP_TX_RX (BIT(0) | BIT(1))
-#define MTK_SGMII_FLAG_PN_SWAP BIT(0)
-
/* Infrasys subsystem config registers */
#define INFRA_MISC2 0x70c
#define CO_QPHY_SEL BIT(0)
@@ -1108,31 +1049,6 @@ struct mtk_soc_data {
/* currently no SoC has more than 2 macs */
#define MTK_MAX_DEVS 2
-/* struct mtk_pcs - This structure holds each sgmii regmap and associated
- * data
- * @regmap: The register map pointing at the range used to setup
- * SGMII modes
- * @ana_rgc3: The offset refers to register ANA_RGC3 related to regmap
- * @interface: Currently configured interface mode
- * @pcs: Phylink PCS structure
- * @flags: Flags indicating hardware properties
- */
-struct mtk_pcs {
- struct regmap *regmap;
- u32 ana_rgc3;
- phy_interface_t interface;
- struct phylink_pcs pcs;
- u32 flags;
-};
-
-/* struct mtk_sgmii - This is the structure holding sgmii regmap and its
- * characteristics
- * @pcs Array of individual PCS structures
- */
-struct mtk_sgmii {
- struct mtk_pcs pcs[MTK_MAX_DEVS];
-};
-
/* struct mtk_eth - This is the main datasructure for holding the state
* of the driver
* @dev: The device pointer
@@ -1152,6 +1068,7 @@ struct mtk_sgmii {
* MII modes
* @infra: The register map pointing at the range used to setup
* SGMII and GePHY path
+ * @sgmii_pcs: Pointers to mtk-pcs-lynxi phylink_pcs instances
* @pctl: The register map pointing at the range used to setup
* GMAC port drive/slew values
* @dma_refcnt: track how many netdevs are using the DMA engine
@@ -1192,8 +1109,8 @@ struct mtk_eth {
u32 msg_enable;
unsigned long sysclk;
struct regmap *ethsys;
- struct regmap *infra;
- struct mtk_sgmii *sgmii;
+ struct regmap *infra;
+ struct phylink_pcs *sgmii_pcs[MTK_MAX_DEVS];
struct regmap *pctl;
bool hwlro;
refcount_t dma_refcnt;
@@ -1355,10 +1272,6 @@ void mtk_stats_update_mac(struct mtk_mac
void mtk_w32(struct mtk_eth *eth, u32 val, unsigned reg);
u32 mtk_r32(struct mtk_eth *eth, unsigned reg);
-struct phylink_pcs *mtk_sgmii_select_pcs(struct mtk_sgmii *ss, int id);
-int mtk_sgmii_init(struct mtk_sgmii *ss, struct device_node *np,
- u32 ana_rgc3);
-
int mtk_gmac_sgmii_path_setup(struct mtk_eth *eth, int mac_id);
int mtk_gmac_gephy_path_setup(struct mtk_eth *eth, int mac_id);
int mtk_gmac_rgmii_path_setup(struct mtk_eth *eth, int mac_id);
--- a/drivers/net/ethernet/mediatek/mtk_sgmii.c
+++ /dev/null
@@ -1,217 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Copyright (c) 2018-2019 MediaTek Inc.
-
-/* A library for MediaTek SGMII circuit
- *
- * Author: Sean Wang <sean.wang@mediatek.com>
- *
- */
-
-#include <linux/mfd/syscon.h>
-#include <linux/of.h>
-#include <linux/phylink.h>
-#include <linux/regmap.h>
-
-#include "mtk_eth_soc.h"
-
-static struct mtk_pcs *pcs_to_mtk_pcs(struct phylink_pcs *pcs)
-{
- return container_of(pcs, struct mtk_pcs, pcs);
-}
-
-static void mtk_pcs_get_state(struct phylink_pcs *pcs,
- struct phylink_link_state *state)
-{
- struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
- unsigned int bm, adv;
-
- /* Read the BMSR and LPA */
- regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &bm);
- regmap_read(mpcs->regmap, SGMSYS_PCS_ADVERTISE, &adv);
-
- phylink_mii_c22_pcs_decode_state(state, FIELD_GET(SGMII_BMSR, bm),
- FIELD_GET(SGMII_LPA, adv));
-}
-
-static int mtk_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
- phy_interface_t interface,
- const unsigned long *advertising,
- bool permit_pause_to_mac)
-{
- bool mode_changed = false, changed, use_an;
- struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
- unsigned int rgc3, sgm_mode, bmcr;
- int advertise, link_timer;
-
- advertise = phylink_mii_c22_pcs_encode_advertisement(interface,
- advertising);
- if (advertise < 0)
- return advertise;
-
- /* Clearing IF_MODE_BIT0 switches the PCS to BASE-X mode, and
- * we assume that fixes it's speed at bitrate = line rate (in
- * other words, 1000Mbps or 2500Mbps).
- */
- if (interface == PHY_INTERFACE_MODE_SGMII) {
- sgm_mode = SGMII_IF_MODE_SGMII;
- if (phylink_autoneg_inband(mode)) {
- sgm_mode |= SGMII_REMOTE_FAULT_DIS |
- SGMII_SPEED_DUPLEX_AN;
- use_an = true;
- } else {
- use_an = false;
- }
- } else if (phylink_autoneg_inband(mode)) {
- /* 1000base-X or 2500base-X autoneg */
- sgm_mode = SGMII_REMOTE_FAULT_DIS;
- use_an = linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
- advertising);
- } else {
- /* 1000base-X or 2500base-X without autoneg */
- sgm_mode = 0;
- use_an = false;
- }
-
- if (use_an) {
- bmcr = SGMII_AN_ENABLE;
- } else {
- bmcr = 0;
- }
-
- if (mpcs->interface != interface) {
- link_timer = phylink_get_link_timer_ns(interface);
- if (link_timer < 0)
- return link_timer;
-
- /* PHYA power down */
- regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL,
- SGMII_PHYA_PWD, SGMII_PHYA_PWD);
-
- if (mpcs->flags & MTK_SGMII_FLAG_PN_SWAP)
- regmap_update_bits(mpcs->regmap, SGMSYS_QPHY_WRAP_CTRL,
- SGMII_PN_SWAP_MASK,
- SGMII_PN_SWAP_TX_RX);
-
- /* Reset SGMII PCS state */
- regmap_update_bits(mpcs->regmap, SGMII_RESERVED_0,
- SGMII_SW_RESET, SGMII_SW_RESET);
-
- if (interface == PHY_INTERFACE_MODE_2500BASEX)
- rgc3 = RG_PHY_SPEED_3_125G;
- else
- rgc3 = 0;
-
- /* Configure the underlying interface speed */
- regmap_update_bits(mpcs->regmap, mpcs->ana_rgc3,
- RG_PHY_SPEED_3_125G, rgc3);
-
- /* Setup the link timer */
- regmap_write(mpcs->regmap, SGMSYS_PCS_LINK_TIMER, link_timer / 2 / 8);
-
- mpcs->interface = interface;
- mode_changed = true;
- }
-
- /* Update the advertisement, noting whether it has changed */
- regmap_update_bits_check(mpcs->regmap, SGMSYS_PCS_ADVERTISE,
- SGMII_ADVERTISE, advertise, &changed);
-
- /* Update the sgmsys mode register */
- regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
- SGMII_REMOTE_FAULT_DIS | SGMII_SPEED_DUPLEX_AN |
- SGMII_IF_MODE_SGMII, sgm_mode);
-
- /* Update the BMCR */
- regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
- SGMII_AN_ENABLE, bmcr);
-
- /* Release PHYA power down state
- * Only removing bit SGMII_PHYA_PWD isn't enough.
- * There are cases when the SGMII_PHYA_PWD register contains 0x9 which
- * prevents SGMII from working. The SGMII still shows link but no traffic
- * can flow. Writing 0x0 to the PHYA_PWD register fix the issue. 0x0 was
- * taken from a good working state of the SGMII interface.
- * Unknown how much the QPHY needs but it is racy without a sleep.
- * Tested on mt7622 & mt7986.
- */
- usleep_range(50, 100);
- regmap_write(mpcs->regmap, SGMSYS_QPHY_PWR_STATE_CTRL, 0);
-
- return changed || mode_changed;
-}
-
-static void mtk_pcs_restart_an(struct phylink_pcs *pcs)
-{
- struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
-
- regmap_update_bits(mpcs->regmap, SGMSYS_PCS_CONTROL_1,
- SGMII_AN_RESTART, SGMII_AN_RESTART);
-}
-
-static void mtk_pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
- phy_interface_t interface, int speed, int duplex)
-{
- struct mtk_pcs *mpcs = pcs_to_mtk_pcs(pcs);
- unsigned int sgm_mode;
-
- if (!phylink_autoneg_inband(mode)) {
- /* Force the speed and duplex setting */
- if (speed == SPEED_10)
- sgm_mode = SGMII_SPEED_10;
- else if (speed == SPEED_100)
- sgm_mode = SGMII_SPEED_100;
- else
- sgm_mode = SGMII_SPEED_1000;
-
- if (duplex != DUPLEX_FULL)
- sgm_mode |= SGMII_DUPLEX_HALF;
-
- regmap_update_bits(mpcs->regmap, SGMSYS_SGMII_MODE,
- SGMII_DUPLEX_HALF | SGMII_SPEED_MASK,
- sgm_mode);
- }
-}
-
-static const struct phylink_pcs_ops mtk_pcs_ops = {
- .pcs_get_state = mtk_pcs_get_state,
- .pcs_config = mtk_pcs_config,
- .pcs_an_restart = mtk_pcs_restart_an,
- .pcs_link_up = mtk_pcs_link_up,
-};
-
-int mtk_sgmii_init(struct mtk_sgmii *ss, struct device_node *r, u32 ana_rgc3)
-{
- struct device_node *np;
- int i;
-
- for (i = 0; i < MTK_MAX_DEVS; i++) {
- np = of_parse_phandle(r, "mediatek,sgmiisys", i);
- if (!np)
- break;
-
- ss->pcs[i].ana_rgc3 = ana_rgc3;
- ss->pcs[i].regmap = syscon_node_to_regmap(np);
-
- ss->pcs[i].flags = 0;
- if (of_property_read_bool(np, "mediatek,pnswap"))
- ss->pcs[i].flags |= MTK_SGMII_FLAG_PN_SWAP;
-
- of_node_put(np);
- if (IS_ERR(ss->pcs[i].regmap))
- return PTR_ERR(ss->pcs[i].regmap);
-
- ss->pcs[i].pcs.ops = &mtk_pcs_ops;
- ss->pcs[i].pcs.poll = true;
- ss->pcs[i].interface = PHY_INTERFACE_MODE_NA;
- }
-
- return 0;
-}
-
-struct phylink_pcs *mtk_sgmii_select_pcs(struct mtk_sgmii *ss, int id)
-{
- if (!ss->pcs[id].regmap)
- return NULL;
-
- return &ss->pcs[id].pcs;
-}

View File

@ -0,0 +1,46 @@
From f5af7931d2a2cae66d0f9dad4ba517b1b00620b3 Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Wed, 19 Apr 2023 19:07:23 +0100
Subject: [PATCH] net: mtk_eth_soc: use WO firmware for MT7981
In order to support wireless offloading on MT7981 we need to load the
appropriate firmware. Recognize MT7981 and load mt7981_wo.bin.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
drivers/net/ethernet/mediatek/mtk_wed_mcu.c | 7 ++++++-
drivers/net/ethernet/mediatek/mtk_wed_wo.h | 1 +
2 files changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/mediatek/mtk_wed_mcu.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_mcu.c
@@ -326,7 +326,11 @@ mtk_wed_mcu_load_firmware(struct mtk_wed
wo->hw->index + 1);
/* load firmware */
- fw_name = wo->hw->index ? MT7986_FIRMWARE_WO1 : MT7986_FIRMWARE_WO0;
+ if (of_device_is_compatible(wo->hw->node, "mediatek,mt7981-wed"))
+ fw_name = MT7981_FIRMWARE_WO;
+ else
+ fw_name = wo->hw->index ? MT7986_FIRMWARE_WO1 : MT7986_FIRMWARE_WO0;
+
ret = request_firmware(&fw, fw_name, wo->hw->dev);
if (ret)
return ret;
@@ -386,5 +390,6 @@ int mtk_wed_mcu_init(struct mtk_wed_wo *
100, MTK_FW_DL_TIMEOUT);
}
+MODULE_FIRMWARE(MT7981_FIRMWARE_WO);
MODULE_FIRMWARE(MT7986_FIRMWARE_WO0);
MODULE_FIRMWARE(MT7986_FIRMWARE_WO1);
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.h
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h
@@ -88,6 +88,7 @@ enum mtk_wed_dummy_cr_idx {
MTK_WED_DUMMY_CR_WO_STATUS,
};
+#define MT7981_FIRMWARE_WO "mediatek/mt7981_wo.bin"
#define MT7986_FIRMWARE_WO0 "mediatek/mt7986_wo_0.bin"
#define MT7986_FIRMWARE_WO1 "mediatek/mt7986_wo_1.bin"

View File

@ -0,0 +1,28 @@
From 7c83e28f10830aa5105c25eaabe890e3adac36aa Mon Sep 17 00:00:00 2001
From: Daniel Golle <daniel@makrotopia.org>
Date: Tue, 9 May 2023 03:20:06 +0200
Subject: [PATCH] net: ethernet: mtk_eth_soc: fix NULL pointer dereference
Check for NULL pointer to avoid kernel crashing in case of missing WO
firmware in case only a single WEDv2 device has been initialized, e.g. on
MT7981 which can connect just one wireless frontend.
Fixes: 86ce0d09e424 ("net: ethernet: mtk_eth_soc: use WO firmware for MT7981")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/ethernet/mediatek/mtk_wed.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/ethernet/mediatek/mtk_wed.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed.c
@@ -647,7 +647,7 @@ __mtk_wed_detach(struct mtk_wed_device *
BIT(hw->index), BIT(hw->index));
}
- if (!hw_list[!hw->index]->wed_dev &&
+ if ((!hw_list[!hw->index] || !hw_list[!hw->index]->wed_dev) &&
hw->eth->dma_dev != hw->eth->dev)
mtk_eth_set_dma_device(hw->eth, hw->eth->dev);

View File

@ -0,0 +1,150 @@
From cfbd6de588ef659c198083205dc954a6d3ed2aec Mon Sep 17 00:00:00 2001
From: Christian Marangi <ansuelsmth@gmail.com>
Date: Thu, 29 Dec 2022 17:33:35 +0100
Subject: [PATCH 4/5] net: dsa: qca8k: introduce single mii read/write lo/hi
It may be useful to read/write just the lo or hi half of a reg.
This is especially useful for phy poll with the use of mdio master.
The mdio master reg is composed by the first 16 bit related to setup and
the other half with the returned data or data to write.
Refactor the mii function to permit single mii read/write of lo or hi
half of the reg.
Tested-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/dsa/qca/qca8k-8xxx.c | 106 ++++++++++++++++++++++++-------
1 file changed, 84 insertions(+), 22 deletions(-)
--- a/drivers/net/dsa/qca/qca8k-8xxx.c
+++ b/drivers/net/dsa/qca/qca8k-8xxx.c
@@ -37,42 +37,104 @@ qca8k_split_addr(u32 regaddr, u16 *r1, u
}
static int
-qca8k_mii_read32(struct mii_bus *bus, int phy_id, u32 regnum, u32 *val)
+qca8k_mii_write_lo(struct mii_bus *bus, int phy_id, u32 regnum, u32 val)
{
int ret;
+ u16 lo;
- ret = bus->read(bus, phy_id, regnum);
- if (ret >= 0) {
- *val = ret;
- ret = bus->read(bus, phy_id, regnum + 1);
- *val |= ret << 16;
- }
+ lo = val & 0xffff;
+ ret = bus->write(bus, phy_id, regnum, lo);
+ if (ret < 0)
+ dev_err_ratelimited(&bus->dev,
+ "failed to write qca8k 32bit lo register\n");
+
+ return ret;
+}
- if (ret < 0) {
+static int
+qca8k_mii_write_hi(struct mii_bus *bus, int phy_id, u32 regnum, u32 val)
+{
+ int ret;
+ u16 hi;
+
+ hi = (u16)(val >> 16);
+ ret = bus->write(bus, phy_id, regnum, hi);
+ if (ret < 0)
dev_err_ratelimited(&bus->dev,
- "failed to read qca8k 32bit register\n");
- *val = 0;
- return ret;
- }
+ "failed to write qca8k 32bit hi register\n");
+ return ret;
+}
+
+static int
+qca8k_mii_read_lo(struct mii_bus *bus, int phy_id, u32 regnum, u32 *val)
+{
+ int ret;
+
+ ret = bus->read(bus, phy_id, regnum);
+ if (ret < 0)
+ goto err;
+
+ *val = ret & 0xffff;
return 0;
+
+err:
+ dev_err_ratelimited(&bus->dev,
+ "failed to read qca8k 32bit lo register\n");
+ *val = 0;
+
+ return ret;
}
-static void
-qca8k_mii_write32(struct mii_bus *bus, int phy_id, u32 regnum, u32 val)
+static int
+qca8k_mii_read_hi(struct mii_bus *bus, int phy_id, u32 regnum, u32 *val)
{
- u16 lo, hi;
int ret;
- lo = val & 0xffff;
- hi = (u16)(val >> 16);
+ ret = bus->read(bus, phy_id, regnum);
+ if (ret < 0)
+ goto err;
- ret = bus->write(bus, phy_id, regnum, lo);
- if (ret >= 0)
- ret = bus->write(bus, phy_id, regnum + 1, hi);
+ *val = ret << 16;
+ return 0;
+
+err:
+ dev_err_ratelimited(&bus->dev,
+ "failed to read qca8k 32bit hi register\n");
+ *val = 0;
+
+ return ret;
+}
+
+static int
+qca8k_mii_read32(struct mii_bus *bus, int phy_id, u32 regnum, u32 *val)
+{
+ u32 hi, lo;
+ int ret;
+
+ *val = 0;
+
+ ret = qca8k_mii_read_lo(bus, phy_id, regnum, &lo);
if (ret < 0)
- dev_err_ratelimited(&bus->dev,
- "failed to write qca8k 32bit register\n");
+ goto err;
+
+ ret = qca8k_mii_read_hi(bus, phy_id, regnum + 1, &hi);
+ if (ret < 0)
+ goto err;
+
+ *val = lo | hi;
+
+err:
+ return ret;
+}
+
+static void
+qca8k_mii_write32(struct mii_bus *bus, int phy_id, u32 regnum, u32 val)
+{
+ if (qca8k_mii_write_lo(bus, phy_id, regnum, val) < 0)
+ return;
+
+ qca8k_mii_write_hi(bus, phy_id, regnum + 1, val);
}
static int

View File

@ -0,0 +1,73 @@
From a4165830ca237f2b3318faf62562bce8ce12a389 Mon Sep 17 00:00:00 2001
From: Christian Marangi <ansuelsmth@gmail.com>
Date: Thu, 29 Dec 2022 17:33:36 +0100
Subject: [PATCH 5/5] net: dsa: qca8k: improve mdio master read/write by using
single lo/hi
Improve mdio master read/write by using singe mii read/write lo/hi.
In a read and write we need to poll the mdio master regs in a busy loop
to check for a specific bit present in the upper half of the reg. We can
ignore the other half since it won't contain useful data. This will save
an additional useless read for each read and write operation.
In a read operation the returned data is present in the mdio master reg
lower half. We can ignore the other half since it won't contain useful
data. This will save an additional useless read for each read operation.
In a read operation it's needed to just set the hi half of the mdio
master reg as the lo half will be replaced by the result. This will save
an additional useless write for each read operation.
Tested-by: Ronald Wahl <ronald.wahl@raritan.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
drivers/net/dsa/qca/qca8k-8xxx.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/net/dsa/qca/qca8k-8xxx.c
+++ b/drivers/net/dsa/qca/qca8k-8xxx.c
@@ -740,9 +740,9 @@ qca8k_mdio_busy_wait(struct mii_bus *bus
qca8k_split_addr(reg, &r1, &r2, &page);
- ret = read_poll_timeout(qca8k_mii_read32, ret1, !(val & mask), 0,
+ ret = read_poll_timeout(qca8k_mii_read_hi, ret1, !(val & mask), 0,
QCA8K_BUSY_WAIT_TIMEOUT * USEC_PER_MSEC, false,
- bus, 0x10 | r2, r1, &val);
+ bus, 0x10 | r2, r1 + 1, &val);
/* Check if qca8k_read has failed for a different reason
* before returnting -ETIMEDOUT
@@ -784,7 +784,7 @@ qca8k_mdio_write(struct qca8k_priv *priv
exit:
/* even if the busy_wait timeouts try to clear the MASTER_EN */
- qca8k_mii_write32(bus, 0x10 | r2, r1, 0);
+ qca8k_mii_write_hi(bus, 0x10 | r2, r1 + 1, 0);
mutex_unlock(&bus->mdio_lock);
@@ -814,18 +814,18 @@ qca8k_mdio_read(struct qca8k_priv *priv,
if (ret)
goto exit;
- qca8k_mii_write32(bus, 0x10 | r2, r1, val);
+ qca8k_mii_write_hi(bus, 0x10 | r2, r1 + 1, val);
ret = qca8k_mdio_busy_wait(bus, QCA8K_MDIO_MASTER_CTRL,
QCA8K_MDIO_MASTER_BUSY);
if (ret)
goto exit;
- ret = qca8k_mii_read32(bus, 0x10 | r2, r1, &val);
+ ret = qca8k_mii_read_lo(bus, 0x10 | r2, r1, &val);
exit:
/* even if the busy_wait timeouts try to clear the MASTER_EN */
- qca8k_mii_write32(bus, 0x10 | r2, r1, 0);
+ qca8k_mii_write_hi(bus, 0x10 | r2, r1 + 1, 0);
mutex_unlock(&bus->mdio_lock);

View File

@ -0,0 +1,514 @@
From patchwork Thu Mar 9 10:57:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Daniel Golle <daniel@makrotopia.org>
X-Patchwork-Id: 13167235
X-Patchwork-Delegate: kuba@kernel.org
Return-Path: <netdev-owner@vger.kernel.org>
Date: Thu, 9 Mar 2023 10:57:44 +0000
From: Daniel Golle <daniel@makrotopia.org>
To: netdev@vger.kernel.org, linux-mediatek@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
Russell King <linux@armlinux.org.uk>,
Heiner Kallweit <hkallweit1@gmail.com>,
Lorenzo Bianconi <lorenzo@kernel.org>,
Mark Lee <Mark-MC.Lee@mediatek.com>,
John Crispin <john@phrozen.org>, Felix Fietkau <nbd@nbd.name>,
AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com>,
Matthias Brugger <matthias.bgg@gmail.com>,
DENG Qingfang <dqfext@gmail.com>,
Landen Chao <Landen.Chao@mediatek.com>,
Sean Wang <sean.wang@mediatek.com>,
Paolo Abeni <pabeni@redhat.com>,
Jakub Kicinski <kuba@kernel.org>,
Eric Dumazet <edumazet@google.com>,
"David S. Miller" <davem@davemloft.net>,
Vladimir Oltean <olteanv@gmail.com>,
Florian Fainelli <f.fainelli@gmail.com>,
Andrew Lunn <andrew@lunn.ch>,
Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: =?iso-8859-1?q?Bj=F8rn?= Mork <bjorn@mork.no>,
Frank Wunderlich <frank-w@public-files.de>,
Alexander Couzens <lynxis@fe80.eu>
Subject: [PATCH net-next v13 11/16] net: dsa: mt7530: use external PCS driver
Message-ID:
<2ac2ee40d3b0e705461b50613fda6a7edfdbc4b3.1678357225.git.daniel@makrotopia.org>
References: <cover.1678357225.git.daniel@makrotopia.org>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <cover.1678357225.git.daniel@makrotopia.org>
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org
X-Patchwork-Delegate: kuba@kernel.org
Implement regmap access wrappers, for now only to be used by the
pcs-mtk driver.
Make use of external PCS driver and drop the reduntant implementation
in mt7530.c.
As a nice side effect the SGMII registers can now also more easily be
inspected for debugging via /sys/kernel/debug/regmap.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
---
drivers/net/dsa/Kconfig | 1 +
drivers/net/dsa/mt7530.c | 277 ++++++++++-----------------------------
drivers/net/dsa/mt7530.h | 47 +------
3 files changed, 71 insertions(+), 254 deletions(-)
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -37,6 +37,7 @@ config NET_DSA_MT7530
tristate "MediaTek MT753x and MT7621 Ethernet switch support"
select NET_DSA_TAG_MTK
select MEDIATEK_GE_PHY
+ select PCS_MTK_LYNXI
help
This enables support for the MediaTek MT7530, MT7531, and MT7621
Ethernet switch chips.
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -14,6 +14,7 @@
#include <linux/of_mdio.h>
#include <linux/of_net.h>
#include <linux/of_platform.h>
+#include <linux/pcs/pcs-mtk-lynxi.h>
#include <linux/phylink.h>
#include <linux/regmap.h>
#include <linux/regulator/consumer.h>
@@ -2597,128 +2598,11 @@ static int mt7531_rgmii_setup(struct mt7
return 0;
}
-static void mt7531_pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
- phy_interface_t interface, int speed, int duplex)
-{
- struct mt7530_priv *priv = pcs_to_mt753x_pcs(pcs)->priv;
- int port = pcs_to_mt753x_pcs(pcs)->port;
- unsigned int val;
-
- /* For adjusting speed and duplex of SGMII force mode. */
- if (interface != PHY_INTERFACE_MODE_SGMII ||
- phylink_autoneg_inband(mode))
- return;
-
- /* SGMII force mode setting */
- val = mt7530_read(priv, MT7531_SGMII_MODE(port));
- val &= ~MT7531_SGMII_IF_MODE_MASK;
-
- switch (speed) {
- case SPEED_10:
- val |= MT7531_SGMII_FORCE_SPEED_10;
- break;
- case SPEED_100:
- val |= MT7531_SGMII_FORCE_SPEED_100;
- break;
- case SPEED_1000:
- val |= MT7531_SGMII_FORCE_SPEED_1000;
- break;
- }
-
- /* MT7531 SGMII 1G force mode can only work in full duplex mode,
- * no matter MT7531_SGMII_FORCE_HALF_DUPLEX is set or not.
- *
- * The speed check is unnecessary as the MAC capabilities apply
- * this restriction. --rmk
- */
- if ((speed == SPEED_10 || speed == SPEED_100) &&
- duplex != DUPLEX_FULL)
- val |= MT7531_SGMII_FORCE_HALF_DUPLEX;
-
- mt7530_write(priv, MT7531_SGMII_MODE(port), val);
-}
-
static bool mt753x_is_mac_port(u32 port)
{
return (port == 5 || port == 6);
}
-static int mt7531_sgmii_setup_mode_force(struct mt7530_priv *priv, u32 port,
- phy_interface_t interface)
-{
- u32 val;
-
- if (!mt753x_is_mac_port(port))
- return -EINVAL;
-
- mt7530_set(priv, MT7531_QPHY_PWR_STATE_CTRL(port),
- MT7531_SGMII_PHYA_PWD);
-
- val = mt7530_read(priv, MT7531_PHYA_CTRL_SIGNAL3(port));
- val &= ~MT7531_RG_TPHY_SPEED_MASK;
- /* Setup 2.5 times faster clock for 2.5Gbps data speeds with 10B/8B
- * encoding.
- */
- val |= (interface == PHY_INTERFACE_MODE_2500BASEX) ?
- MT7531_RG_TPHY_SPEED_3_125G : MT7531_RG_TPHY_SPEED_1_25G;
- mt7530_write(priv, MT7531_PHYA_CTRL_SIGNAL3(port), val);
-
- mt7530_clear(priv, MT7531_PCS_CONTROL_1(port), MT7531_SGMII_AN_ENABLE);
-
- /* MT7531 SGMII 1G and 2.5G force mode can only work in full duplex
- * mode, no matter MT7531_SGMII_FORCE_HALF_DUPLEX is set or not.
- */
- mt7530_rmw(priv, MT7531_SGMII_MODE(port),
- MT7531_SGMII_IF_MODE_MASK | MT7531_SGMII_REMOTE_FAULT_DIS,
- MT7531_SGMII_FORCE_SPEED_1000);
-
- mt7530_write(priv, MT7531_QPHY_PWR_STATE_CTRL(port), 0);
-
- return 0;
-}
-
-static int mt7531_sgmii_setup_mode_an(struct mt7530_priv *priv, int port,
- phy_interface_t interface)
-{
- if (!mt753x_is_mac_port(port))
- return -EINVAL;
-
- mt7530_set(priv, MT7531_QPHY_PWR_STATE_CTRL(port),
- MT7531_SGMII_PHYA_PWD);
-
- mt7530_rmw(priv, MT7531_PHYA_CTRL_SIGNAL3(port),
- MT7531_RG_TPHY_SPEED_MASK, MT7531_RG_TPHY_SPEED_1_25G);
-
- mt7530_set(priv, MT7531_SGMII_MODE(port),
- MT7531_SGMII_REMOTE_FAULT_DIS |
- MT7531_SGMII_SPEED_DUPLEX_AN);
-
- mt7530_rmw(priv, MT7531_PCS_SPEED_ABILITY(port),
- MT7531_SGMII_TX_CONFIG_MASK, 1);
-
- mt7530_set(priv, MT7531_PCS_CONTROL_1(port), MT7531_SGMII_AN_ENABLE);
-
- mt7530_set(priv, MT7531_PCS_CONTROL_1(port), MT7531_SGMII_AN_RESTART);
-
- mt7530_write(priv, MT7531_QPHY_PWR_STATE_CTRL(port), 0);
-
- return 0;
-}
-
-static void mt7531_pcs_an_restart(struct phylink_pcs *pcs)
-{
- struct mt7530_priv *priv = pcs_to_mt753x_pcs(pcs)->priv;
- int port = pcs_to_mt753x_pcs(pcs)->port;
- u32 val;
-
- /* Only restart AN when AN is enabled */
- val = mt7530_read(priv, MT7531_PCS_CONTROL_1(port));
- if (val & MT7531_SGMII_AN_ENABLE) {
- val |= MT7531_SGMII_AN_RESTART;
- mt7530_write(priv, MT7531_PCS_CONTROL_1(port), val);
- }
-}
-
static int
mt7531_mac_config(struct dsa_switch *ds, int port, unsigned int mode,
phy_interface_t interface)
@@ -2741,11 +2625,11 @@ mt7531_mac_config(struct dsa_switch *ds,
phydev = dp->slave->phydev;
return mt7531_rgmii_setup(priv, port, interface, phydev);
case PHY_INTERFACE_MODE_SGMII:
- return mt7531_sgmii_setup_mode_an(priv, port, interface);
case PHY_INTERFACE_MODE_NA:
case PHY_INTERFACE_MODE_1000BASEX:
case PHY_INTERFACE_MODE_2500BASEX:
- return mt7531_sgmii_setup_mode_force(priv, port, interface);
+ /* handled in SGMII PCS driver */
+ return 0;
default:
return -EINVAL;
}
@@ -2770,11 +2654,11 @@ mt753x_phylink_mac_select_pcs(struct dsa
switch (interface) {
case PHY_INTERFACE_MODE_TRGMII:
+ return &priv->pcs[port].pcs;
case PHY_INTERFACE_MODE_SGMII:
case PHY_INTERFACE_MODE_1000BASEX:
case PHY_INTERFACE_MODE_2500BASEX:
- return &priv->pcs[port].pcs;
-
+ return priv->ports[port].sgmii_pcs;
default:
return NULL;
}
@@ -3015,86 +2899,6 @@ static void mt7530_pcs_get_state(struct
state->pause |= MLO_PAUSE_TX;
}
-static int
-mt7531_sgmii_pcs_get_state_an(struct mt7530_priv *priv, int port,
- struct phylink_link_state *state)
-{
- u32 status, val;
- u16 config_reg;
-
- status = mt7530_read(priv, MT7531_PCS_CONTROL_1(port));
- state->link = !!(status & MT7531_SGMII_LINK_STATUS);
- state->an_complete = !!(status & MT7531_SGMII_AN_COMPLETE);
- if (state->interface == PHY_INTERFACE_MODE_SGMII &&
- (status & MT7531_SGMII_AN_ENABLE)) {
- val = mt7530_read(priv, MT7531_PCS_SPEED_ABILITY(port));
- config_reg = val >> 16;
-
- switch (config_reg & LPA_SGMII_SPD_MASK) {
- case LPA_SGMII_1000:
- state->speed = SPEED_1000;
- break;
- case LPA_SGMII_100:
- state->speed = SPEED_100;
- break;
- case LPA_SGMII_10:
- state->speed = SPEED_10;
- break;
- default:
- dev_err(priv->dev, "invalid sgmii PHY speed\n");
- state->link = false;
- return -EINVAL;
- }
-
- if (config_reg & LPA_SGMII_FULL_DUPLEX)
- state->duplex = DUPLEX_FULL;
- else
- state->duplex = DUPLEX_HALF;
- }
-
- return 0;
-}
-
-static void
-mt7531_sgmii_pcs_get_state_inband(struct mt7530_priv *priv, int port,
- struct phylink_link_state *state)
-{
- unsigned int val;
-
- val = mt7530_read(priv, MT7531_PCS_CONTROL_1(port));
- state->link = !!(val & MT7531_SGMII_LINK_STATUS);
- if (!state->link)
- return;
-
- state->an_complete = state->link;
-
- if (state->interface == PHY_INTERFACE_MODE_2500BASEX)
- state->speed = SPEED_2500;
- else
- state->speed = SPEED_1000;
-
- state->duplex = DUPLEX_FULL;
- state->pause = MLO_PAUSE_NONE;
-}
-
-static void mt7531_pcs_get_state(struct phylink_pcs *pcs,
- struct phylink_link_state *state)
-{
- struct mt7530_priv *priv = pcs_to_mt753x_pcs(pcs)->priv;
- int port = pcs_to_mt753x_pcs(pcs)->port;
-
- if (state->interface == PHY_INTERFACE_MODE_SGMII) {
- mt7531_sgmii_pcs_get_state_an(priv, port, state);
- return;
- } else if ((state->interface == PHY_INTERFACE_MODE_1000BASEX) ||
- (state->interface == PHY_INTERFACE_MODE_2500BASEX)) {
- mt7531_sgmii_pcs_get_state_inband(priv, port, state);
- return;
- }
-
- state->link = false;
-}
-
static int mt753x_pcs_config(struct phylink_pcs *pcs, unsigned int mode,
phy_interface_t interface,
const unsigned long *advertising,
@@ -3114,18 +2918,57 @@ static const struct phylink_pcs_ops mt75
.pcs_an_restart = mt7530_pcs_an_restart,
};
-static const struct phylink_pcs_ops mt7531_pcs_ops = {
- .pcs_validate = mt753x_pcs_validate,
- .pcs_get_state = mt7531_pcs_get_state,
- .pcs_config = mt753x_pcs_config,
- .pcs_an_restart = mt7531_pcs_an_restart,
- .pcs_link_up = mt7531_pcs_link_up,
+static int mt7530_regmap_read(void *context, unsigned int reg, unsigned int *val)
+{
+ struct mt7530_priv *priv = context;
+
+ *val = mt7530_read(priv, reg);
+ return 0;
+};
+
+static int mt7530_regmap_write(void *context, unsigned int reg, unsigned int val)
+{
+ struct mt7530_priv *priv = context;
+
+ mt7530_write(priv, reg, val);
+ return 0;
+};
+
+static int mt7530_regmap_update_bits(void *context, unsigned int reg,
+ unsigned int mask, unsigned int val)
+{
+ struct mt7530_priv *priv = context;
+
+ mt7530_rmw(priv, reg, mask, val);
+ return 0;
+};
+
+static const struct regmap_bus mt7531_regmap_bus = {
+ .reg_write = mt7530_regmap_write,
+ .reg_read = mt7530_regmap_read,
+ .reg_update_bits = mt7530_regmap_update_bits,
+};
+
+#define MT7531_PCS_REGMAP_CONFIG(_name, _reg_base) \
+ { \
+ .name = _name, \
+ .reg_bits = 16, \
+ .val_bits = 32, \
+ .reg_stride = 4, \
+ .reg_base = _reg_base, \
+ .max_register = 0x17c, \
+ }
+
+static const struct regmap_config mt7531_pcs_config[] = {
+ MT7531_PCS_REGMAP_CONFIG("port5", MT7531_SGMII_REG_BASE(5)),
+ MT7531_PCS_REGMAP_CONFIG("port6", MT7531_SGMII_REG_BASE(6)),
};
static int
mt753x_setup(struct dsa_switch *ds)
{
struct mt7530_priv *priv = ds->priv;
+ struct regmap *regmap;
int i, ret;
/* Initialise the PCS devices */
@@ -3133,8 +2976,6 @@ mt753x_setup(struct dsa_switch *ds)
priv->pcs[i].pcs.ops = priv->info->pcs_ops;
priv->pcs[i].priv = priv;
priv->pcs[i].port = i;
- if (mt753x_is_mac_port(i))
- priv->pcs[i].pcs.poll = 1;
}
ret = priv->info->sw_setup(ds);
@@ -3149,6 +2990,16 @@ mt753x_setup(struct dsa_switch *ds)
if (ret && priv->irq)
mt7530_free_irq_common(priv);
+ if (priv->id == ID_MT7531)
+ for (i = 0; i < 2; i++) {
+ regmap = devm_regmap_init(ds->dev,
+ &mt7531_regmap_bus, priv,
+ &mt7531_pcs_config[i]);
+ priv->ports[5 + i].sgmii_pcs =
+ mtk_pcs_lynxi_create(ds->dev, regmap,
+ MT7531_PHYA_CTRL_SIGNAL3, 0);
+ }
+
return ret;
}
@@ -3240,7 +3091,7 @@ static const struct mt753x_info mt753x_t
},
[ID_MT7531] = {
.id = ID_MT7531,
- .pcs_ops = &mt7531_pcs_ops,
+ .pcs_ops = &mt7530_pcs_ops,
.sw_setup = mt7531_setup,
.phy_read = mt7531_ind_phy_read,
.phy_write = mt7531_ind_phy_write,
@@ -3348,7 +3199,7 @@ static void
mt7530_remove(struct mdio_device *mdiodev)
{
struct mt7530_priv *priv = dev_get_drvdata(&mdiodev->dev);
- int ret = 0;
+ int ret = 0, i;
if (!priv)
return;
@@ -3367,6 +3218,10 @@ mt7530_remove(struct mdio_device *mdiode
mt7530_free_irq(priv);
dsa_unregister_switch(priv->ds);
+
+ for (i = 0; i < 2; ++i)
+ mtk_pcs_lynxi_destroy(priv->ports[5 + i].sgmii_pcs);
+
mutex_destroy(&priv->reg_mutex);
}
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -364,47 +364,8 @@ enum mt7530_vlan_port_acc_frm {
CCR_TX_OCT_CNT_BAD)
/* MT7531 SGMII register group */
-#define MT7531_SGMII_REG_BASE 0x5000
-#define MT7531_SGMII_REG(p, r) (MT7531_SGMII_REG_BASE + \
- ((p) - 5) * 0x1000 + (r))
-
-/* Register forSGMII PCS_CONTROL_1 */
-#define MT7531_PCS_CONTROL_1(p) MT7531_SGMII_REG(p, 0x00)
-#define MT7531_SGMII_LINK_STATUS BIT(18)
-#define MT7531_SGMII_AN_ENABLE BIT(12)
-#define MT7531_SGMII_AN_RESTART BIT(9)
-#define MT7531_SGMII_AN_COMPLETE BIT(21)
-
-/* Register for SGMII PCS_SPPED_ABILITY */
-#define MT7531_PCS_SPEED_ABILITY(p) MT7531_SGMII_REG(p, 0x08)
-#define MT7531_SGMII_TX_CONFIG_MASK GENMASK(15, 0)
-#define MT7531_SGMII_TX_CONFIG BIT(0)
-
-/* Register for SGMII_MODE */
-#define MT7531_SGMII_MODE(p) MT7531_SGMII_REG(p, 0x20)
-#define MT7531_SGMII_REMOTE_FAULT_DIS BIT(8)
-#define MT7531_SGMII_IF_MODE_MASK GENMASK(5, 1)
-#define MT7531_SGMII_FORCE_DUPLEX BIT(4)
-#define MT7531_SGMII_FORCE_SPEED_MASK GENMASK(3, 2)
-#define MT7531_SGMII_FORCE_SPEED_1000 BIT(3)
-#define MT7531_SGMII_FORCE_SPEED_100 BIT(2)
-#define MT7531_SGMII_FORCE_SPEED_10 0
-#define MT7531_SGMII_SPEED_DUPLEX_AN BIT(1)
-
-enum mt7531_sgmii_force_duplex {
- MT7531_SGMII_FORCE_FULL_DUPLEX = 0,
- MT7531_SGMII_FORCE_HALF_DUPLEX = 0x10,
-};
-
-/* Fields of QPHY_PWR_STATE_CTRL */
-#define MT7531_QPHY_PWR_STATE_CTRL(p) MT7531_SGMII_REG(p, 0xe8)
-#define MT7531_SGMII_PHYA_PWD BIT(4)
-
-/* Values of SGMII SPEED */
-#define MT7531_PHYA_CTRL_SIGNAL3(p) MT7531_SGMII_REG(p, 0x128)
-#define MT7531_RG_TPHY_SPEED_MASK (BIT(2) | BIT(3))
-#define MT7531_RG_TPHY_SPEED_1_25G 0x0
-#define MT7531_RG_TPHY_SPEED_3_125G BIT(2)
+#define MT7531_SGMII_REG_BASE(p) (0x5000 + ((p) - 5) * 0x1000)
+#define MT7531_PHYA_CTRL_SIGNAL3 0x128
/* Register for system reset */
#define MT7530_SYS_CTRL 0x7000
@@ -703,13 +664,13 @@ struct mt7530_fdb {
* @pm: The matrix used to show all connections with the port.
* @pvid: The VLAN specified is to be considered a PVID at ingress. Any
* untagged frames will be assigned to the related VLAN.
- * @vlan_filtering: The flags indicating whether the port that can recognize
- * VLAN-tagged frames.
+ * @sgmii_pcs: Pointer to PCS instance for SerDes ports
*/
struct mt7530_port {
bool enable;
u32 pm;
u16 pvid;
+ struct phylink_pcs *sgmii_pcs;
};
/* Port 5 interface select definitions */

View File

@ -0,0 +1,82 @@
From fbfc4ca465a1f8d81bf2d67d95bf7fc67c3cf0c2 Mon Sep 17 00:00:00 2001
From: Patrick Delaunay <patrick.delaunay@foss.st.com>
Date: Fri, 18 Nov 2022 06:39:20 +0000
Subject: [PATCH] nvmem: stm32: move STM32MP15_BSEC_NUM_LOWER in config
Support STM32MP15_BSEC_NUM_LOWER in stm32 romem config to prepare
the next SoC in STM32MP family.
Signed-off-by: Patrick Delaunay <patrick.delaunay@foss.st.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221118063932.6418-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/stm32-romem.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
--- a/drivers/nvmem/stm32-romem.c
+++ b/drivers/nvmem/stm32-romem.c
@@ -22,16 +22,15 @@
/* shadow registers offest */
#define STM32MP15_BSEC_DATA0 0x200
-/* 32 (x 32-bits) lower shadow registers */
-#define STM32MP15_BSEC_NUM_LOWER 32
-
struct stm32_romem_cfg {
int size;
+ u8 lower;
};
struct stm32_romem_priv {
void __iomem *base;
struct nvmem_config cfg;
+ u8 lower;
};
static int stm32_romem_read(void *context, unsigned int offset, void *buf,
@@ -85,7 +84,7 @@ static int stm32_bsec_read(void *context
for (i = roffset; (i < roffset + rbytes); i += 4) {
u32 otp = i >> 2;
- if (otp < STM32MP15_BSEC_NUM_LOWER) {
+ if (otp < priv->lower) {
/* read lower data from shadow registers */
val = readl_relaxed(
priv->base + STM32MP15_BSEC_DATA0 + i);
@@ -159,6 +158,8 @@ static int stm32_romem_probe(struct plat
priv->cfg.priv = priv;
priv->cfg.owner = THIS_MODULE;
+ priv->lower = 0;
+
cfg = (const struct stm32_romem_cfg *)
of_match_device(dev->driver->of_match_table, dev)->data;
if (!cfg) {
@@ -167,6 +168,7 @@ static int stm32_romem_probe(struct plat
priv->cfg.reg_read = stm32_romem_read;
} else {
priv->cfg.size = cfg->size;
+ priv->lower = cfg->lower;
priv->cfg.reg_read = stm32_bsec_read;
priv->cfg.reg_write = stm32_bsec_write;
}
@@ -174,8 +176,17 @@ static int stm32_romem_probe(struct plat
return PTR_ERR_OR_ZERO(devm_nvmem_register(dev, &priv->cfg));
}
+/*
+ * STM32MP15 BSEC OTP regions: 4096 OTP bits (with 3072 effective bits)
+ * => 96 x 32-bits data words
+ * - Lower: 1K bits, 2:1 redundancy, incremental bit programming
+ * => 32 (x 32-bits) lower shadow registers = words 0 to 31
+ * - Upper: 2K bits, ECC protection, word programming only
+ * => 64 (x 32-bits) = words 32 to 95
+ */
static const struct stm32_romem_cfg stm32mp15_bsec_cfg = {
- .size = 384, /* 96 x 32-bits data words */
+ .size = 384,
+ .lower = 32,
};
static const struct of_device_id stm32_romem_of_match[] = {

View File

@ -0,0 +1,34 @@
From d61784e6410f3df2028e6eb91b06ffed37a660e0 Mon Sep 17 00:00:00 2001
From: Patrick Delaunay <patrick.delaunay@foss.st.com>
Date: Fri, 18 Nov 2022 06:39:21 +0000
Subject: [PATCH] nvmem: stm32: add warning when upper OTPs are updated
As the upper OTPs are ECC protected, they support only one 32 bits word
programming.
For a second modification of this word, these ECC become invalid and
this OTP will be no more accessible, the shadowed value is invalid.
This patch adds a warning to indicate an upper OTP update, because this
operation is dangerous as OTP is not locked by the driver after the first
update to avoid a second update.
Signed-off-by: Patrick Delaunay <patrick.delaunay@foss.st.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221118063932.6418-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/stm32-romem.c | 3 +++
1 file changed, 3 insertions(+)
--- a/drivers/nvmem/stm32-romem.c
+++ b/drivers/nvmem/stm32-romem.c
@@ -132,6 +132,9 @@ static int stm32_bsec_write(void *contex
}
}
+ if (offset + bytes >= priv->lower * 4)
+ dev_warn(dev, "Update of upper OTPs with ECC protection (word programming, only once)\n");
+
return 0;
}

View File

@ -0,0 +1,26 @@
From a3816a7d7c097c1da46aad5f5d1e229b607dce04 Mon Sep 17 00:00:00 2001
From: Patrick Delaunay <patrick.delaunay@foss.st.com>
Date: Fri, 18 Nov 2022 06:39:22 +0000
Subject: [PATCH] nvmem: stm32: add nvmem type attribute
Inform NVMEM framework of type attribute for stm32-romem as NVMEM_TYPE_OTP
so userspace is able to know how the data is stored in BSEC.
Signed-off-by: Patrick Delaunay <patrick.delaunay@foss.st.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221118063932.6418-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/stm32-romem.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/nvmem/stm32-romem.c
+++ b/drivers/nvmem/stm32-romem.c
@@ -160,6 +160,7 @@ static int stm32_romem_probe(struct plat
priv->cfg.dev = dev;
priv->cfg.priv = priv;
priv->cfg.owner = THIS_MODULE;
+ priv->cfg.type = NVMEM_TYPE_OTP;
priv->lower = 0;

View File

@ -0,0 +1,27 @@
From 06aac0e11960a7ddccc1888326b5906d017e0f24 Mon Sep 17 00:00:00 2001
From: Jiangshan Yi <yijiangshan@kylinos.cn>
Date: Fri, 18 Nov 2022 06:39:24 +0000
Subject: [PATCH] nvmem: stm32: fix spelling typo in comment
Fix spelling typo in comment.
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221118063932.6418-6-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/stm32-romem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/nvmem/stm32-romem.c
+++ b/drivers/nvmem/stm32-romem.c
@@ -19,7 +19,7 @@
#define STM32_SMC_WRITE_SHADOW 0x03
#define STM32_SMC_READ_OTP 0x04
-/* shadow registers offest */
+/* shadow registers offset */
#define STM32MP15_BSEC_DATA0 0x200
struct stm32_romem_cfg {

View File

@ -0,0 +1,27 @@
From fb817c4ef63e8cfb6e77ae4a2875ae854c80708f Mon Sep 17 00:00:00 2001
From: Colin Ian King <colin.i.king@gmail.com>
Date: Fri, 18 Nov 2022 06:39:26 +0000
Subject: [PATCH] nvmem: Kconfig: Fix spelling mistake "controlls" ->
"controls"
There is a spelling mistake in a Kconfig description. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221118063932.6418-8-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/nvmem/Kconfig
+++ b/drivers/nvmem/Kconfig
@@ -164,7 +164,7 @@ config NVMEM_MICROCHIP_OTPC
depends on ARCH_AT91 || COMPILE_TEST
help
This driver enable the OTP controller available on Microchip SAMA7G5
- SoCs. It controlls the access to the OTP memory connected to it.
+ SoCs. It controls the access to the OTP memory connected to it.
config NVMEM_MTK_EFUSE
tristate "Mediatek SoCs EFUSE support"

View File

@ -0,0 +1,67 @@
From ada84d07af6097b2addd18262668ce6cb9e15206 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= <rafal@milecki.pl>
Date: Fri, 18 Nov 2022 06:39:27 +0000
Subject: [PATCH] nvmem: u-boot-env: add Broadcom format support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Broadcom uses U-Boot for a lot of their bcmbca familiy chipsets. They
decided to store U-Boot environment data inside U-Boot partition and to
use a custom header (with "uEnv" magic and env data length).
Add support for Broadcom's specific binding and their custom format.
Ref: 6b0584c19d87 ("dt-bindings: nvmem: u-boot,env: add Broadcom's variant binding")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20221118063932.6418-9-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/u-boot-env.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/drivers/nvmem/u-boot-env.c
+++ b/drivers/nvmem/u-boot-env.c
@@ -16,6 +16,7 @@
enum u_boot_env_format {
U_BOOT_FORMAT_SINGLE,
U_BOOT_FORMAT_REDUNDANT,
+ U_BOOT_FORMAT_BROADCOM,
};
struct u_boot_env {
@@ -40,6 +41,13 @@ struct u_boot_env_image_redundant {
uint8_t data[];
} __packed;
+struct u_boot_env_image_broadcom {
+ __le32 magic;
+ __le32 len;
+ __le32 crc32;
+ uint8_t data[0];
+} __packed;
+
static int u_boot_env_read(void *context, unsigned int offset, void *val,
size_t bytes)
{
@@ -138,6 +146,11 @@ static int u_boot_env_parse(struct u_boo
crc32_data_offset = offsetof(struct u_boot_env_image_redundant, data);
data_offset = offsetof(struct u_boot_env_image_redundant, data);
break;
+ case U_BOOT_FORMAT_BROADCOM:
+ crc32_offset = offsetof(struct u_boot_env_image_broadcom, crc32);
+ crc32_data_offset = offsetof(struct u_boot_env_image_broadcom, data);
+ data_offset = offsetof(struct u_boot_env_image_broadcom, data);
+ break;
}
crc32 = le32_to_cpu(*(__le32 *)(buf + crc32_offset));
crc32_data_len = priv->mtd->size - crc32_data_offset;
@@ -202,6 +215,7 @@ static const struct of_device_id u_boot_
{ .compatible = "u-boot,env", .data = (void *)U_BOOT_FORMAT_SINGLE, },
{ .compatible = "u-boot,env-redundant-bool", .data = (void *)U_BOOT_FORMAT_REDUNDANT, },
{ .compatible = "u-boot,env-redundant-count", .data = (void *)U_BOOT_FORMAT_REDUNDANT, },
+ { .compatible = "brcm,env", .data = (void *)U_BOOT_FORMAT_BROADCOM, },
{},
};

View File

@ -0,0 +1,26 @@
From 2e8dc541ae207349b51c65391be625ffe1f86e0c Mon Sep 17 00:00:00 2001
From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Date: Mon, 6 Feb 2023 13:43:41 +0000
Subject: [PATCH] nvmem: core: remove spurious white space
Remove a spurious white space in for the ida_alloc() call.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230206134356.839737-8-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -764,7 +764,7 @@ struct nvmem_device *nvmem_register(cons
if (!nvmem)
return ERR_PTR(-ENOMEM);
- rval = ida_alloc(&nvmem_ida, GFP_KERNEL);
+ rval = ida_alloc(&nvmem_ida, GFP_KERNEL);
if (rval < 0) {
kfree(nvmem);
return ERR_PTR(rval);

View File

@ -0,0 +1,180 @@
From 5d8e6e6c10a3d37486d263b16ddc15991a7e4a88 Mon Sep 17 00:00:00 2001
From: Michael Walle <michael@walle.cc>
Date: Mon, 6 Feb 2023 13:43:46 +0000
Subject: [PATCH] nvmem: core: add an index parameter to the cell
Sometimes a cell can represend multiple values. For example, a base
ethernet address stored in the NVMEM can be expanded into multiple
discreet ones by adding an offset.
For this use case, introduce an index parameter which is then used to
distiguish between values. This parameter will then be passed to the
post process hook which can then use it to create different values
during reading.
At the moment, there is only support for the device tree path. You can
add the index to the phandle, e.g.
&net {
nvmem-cells = <&base_mac_address 2>;
nvmem-cell-names = "mac-address";
};
&nvmem_provider {
base_mac_address: base-mac-address@0 {
#nvmem-cell-cells = <1>;
reg = <0 6>;
};
};
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230206134356.839737-13-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/nvmem/core.c | 37 ++++++++++++++++++++++++----------
drivers/nvmem/imx-ocotp.c | 4 ++--
include/linux/nvmem-provider.h | 4 ++--
3 files changed, 30 insertions(+), 15 deletions(-)
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -60,6 +60,7 @@ struct nvmem_cell_entry {
struct nvmem_cell {
struct nvmem_cell_entry *entry;
const char *id;
+ int index;
};
static DEFINE_MUTEX(nvmem_mutex);
@@ -1122,7 +1123,8 @@ struct nvmem_device *devm_nvmem_device_g
}
EXPORT_SYMBOL_GPL(devm_nvmem_device_get);
-static struct nvmem_cell *nvmem_create_cell(struct nvmem_cell_entry *entry, const char *id)
+static struct nvmem_cell *nvmem_create_cell(struct nvmem_cell_entry *entry,
+ const char *id, int index)
{
struct nvmem_cell *cell;
const char *name = NULL;
@@ -1141,6 +1143,7 @@ static struct nvmem_cell *nvmem_create_c
cell->id = name;
cell->entry = entry;
+ cell->index = index;
return cell;
}
@@ -1179,7 +1182,7 @@ nvmem_cell_get_from_lookup(struct device
__nvmem_device_put(nvmem);
cell = ERR_PTR(-ENOENT);
} else {
- cell = nvmem_create_cell(cell_entry, con_id);
+ cell = nvmem_create_cell(cell_entry, con_id, 0);
if (IS_ERR(cell))
__nvmem_device_put(nvmem);
}
@@ -1227,15 +1230,27 @@ struct nvmem_cell *of_nvmem_cell_get(str
struct nvmem_device *nvmem;
struct nvmem_cell_entry *cell_entry;
struct nvmem_cell *cell;
+ struct of_phandle_args cell_spec;
int index = 0;
+ int cell_index = 0;
+ int ret;
/* if cell name exists, find index to the name */
if (id)
index = of_property_match_string(np, "nvmem-cell-names", id);
- cell_np = of_parse_phandle(np, "nvmem-cells", index);
- if (!cell_np)
- return ERR_PTR(-ENOENT);
+ ret = of_parse_phandle_with_optional_args(np, "nvmem-cells",
+ "#nvmem-cell-cells",
+ index, &cell_spec);
+ if (ret)
+ return ERR_PTR(ret);
+
+ if (cell_spec.args_count > 1)
+ return ERR_PTR(-EINVAL);
+
+ cell_np = cell_spec.np;
+ if (cell_spec.args_count)
+ cell_index = cell_spec.args[0];
nvmem_np = of_get_parent(cell_np);
if (!nvmem_np) {
@@ -1257,7 +1272,7 @@ struct nvmem_cell *of_nvmem_cell_get(str
return ERR_PTR(-ENOENT);
}
- cell = nvmem_create_cell(cell_entry, id);
+ cell = nvmem_create_cell(cell_entry, id, cell_index);
if (IS_ERR(cell))
__nvmem_device_put(nvmem);
@@ -1410,8 +1425,8 @@ static void nvmem_shift_read_buffer_in_p
}
static int __nvmem_cell_read(struct nvmem_device *nvmem,
- struct nvmem_cell_entry *cell,
- void *buf, size_t *len, const char *id)
+ struct nvmem_cell_entry *cell,
+ void *buf, size_t *len, const char *id, int index)
{
int rc;
@@ -1425,7 +1440,7 @@ static int __nvmem_cell_read(struct nvme
nvmem_shift_read_buffer_in_place(cell, buf);
if (nvmem->cell_post_process) {
- rc = nvmem->cell_post_process(nvmem->priv, id,
+ rc = nvmem->cell_post_process(nvmem->priv, id, index,
cell->offset, buf, cell->bytes);
if (rc)
return rc;
@@ -1460,7 +1475,7 @@ void *nvmem_cell_read(struct nvmem_cell
if (!buf)
return ERR_PTR(-ENOMEM);
- rc = __nvmem_cell_read(nvmem, cell->entry, buf, len, cell->id);
+ rc = __nvmem_cell_read(nvmem, cell->entry, buf, len, cell->id, cell->index);
if (rc) {
kfree(buf);
return ERR_PTR(rc);
@@ -1773,7 +1788,7 @@ ssize_t nvmem_device_cell_read(struct nv
if (rc)
return rc;
- rc = __nvmem_cell_read(nvmem, &cell, buf, &len, NULL);
+ rc = __nvmem_cell_read(nvmem, &cell, buf, &len, NULL, 0);
if (rc)
return rc;
--- a/drivers/nvmem/imx-ocotp.c
+++ b/drivers/nvmem/imx-ocotp.c
@@ -222,8 +222,8 @@ read_end:
return ret;
}
-static int imx_ocotp_cell_pp(void *context, const char *id, unsigned int offset,
- void *data, size_t bytes)
+static int imx_ocotp_cell_pp(void *context, const char *id, int index,
+ unsigned int offset, void *data, size_t bytes)
{
struct ocotp_priv *priv = context;
--- a/include/linux/nvmem-provider.h
+++ b/include/linux/nvmem-provider.h
@@ -20,8 +20,8 @@ typedef int (*nvmem_reg_read_t)(void *pr
typedef int (*nvmem_reg_write_t)(void *priv, unsigned int offset,
void *val, size_t bytes);
/* used for vendor specific post processing of cell data */
-typedef int (*nvmem_cell_post_process_t)(void *priv, const char *id, unsigned int offset,
- void *buf, size_t bytes);
+typedef int (*nvmem_cell_post_process_t)(void *priv, const char *id, int index,
+ unsigned int offset, void *buf, size_t bytes);
enum nvmem_type {
NVMEM_TYPE_UNKNOWN = 0,

View File

@ -0,0 +1,78 @@
From fbd03d27776c6121a483921601418e3c8f0ff37e Mon Sep 17 00:00:00 2001
From: Michael Walle <michael@walle.cc>
Date: Mon, 6 Feb 2023 13:43:47 +0000
Subject: [PATCH] nvmem: core: move struct nvmem_cell_info to nvmem-provider.h
struct nvmem_cell_info is used to describe a cell. Thus this should
really be in the nvmem-provider's header. There are two (unused) nvmem
access methods which use the nvmem_cell_info to describe the cell to be
accesses. One can argue, that they will create a cell before accessing,
thus they are both a provider and a consumer.
struct nvmem_cell_info will get used more and more by nvmem-providers,
don't force them to also include the consumer header, although they are
not.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20230206134356.839737-14-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/nvmem-consumer.h | 10 +---------
include/linux/nvmem-provider.h | 19 ++++++++++++++++++-
2 files changed, 19 insertions(+), 10 deletions(-)
--- a/include/linux/nvmem-consumer.h
+++ b/include/linux/nvmem-consumer.h
@@ -18,15 +18,7 @@ struct device_node;
/* consumer cookie */
struct nvmem_cell;
struct nvmem_device;
-
-struct nvmem_cell_info {
- const char *name;
- unsigned int offset;
- unsigned int bytes;
- unsigned int bit_offset;
- unsigned int nbits;
- struct device_node *np;
-};
+struct nvmem_cell_info;
/**
* struct nvmem_cell_lookup - cell lookup entry
--- a/include/linux/nvmem-provider.h
+++ b/include/linux/nvmem-provider.h
@@ -14,7 +14,6 @@
#include <linux/gpio/consumer.h>
struct nvmem_device;
-struct nvmem_cell_info;
typedef int (*nvmem_reg_read_t)(void *priv, unsigned int offset,
void *val, size_t bytes);
typedef int (*nvmem_reg_write_t)(void *priv, unsigned int offset,
@@ -48,6 +47,24 @@ struct nvmem_keepout {
};
/**
+ * struct nvmem_cell_info - NVMEM cell description
+ * @name: Name.
+ * @offset: Offset within the NVMEM device.
+ * @bytes: Length of the cell.
+ * @bit_offset: Bit offset if cell is smaller than a byte.
+ * @nbits: Number of bits.
+ * @np: Optional device_node pointer.
+ */
+struct nvmem_cell_info {
+ const char *name;
+ unsigned int offset;
+ unsigned int bytes;
+ unsigned int bit_offset;
+ unsigned int nbits;
+ struct device_node *np;
+};
+
+/**
* struct nvmem_config - NVMEM device configuration
*
* @dev: Parent device.

Some files were not shown because too many files have changed in this diff Show More