mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
synced 2026-07-02 12:26:35 +02:00
c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab
26304 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
c162d5b433 |
printk: Hide console waiter logic into helpers
The commit ("printk: Add console owner and waiter logic to load balance
console writes") made vprintk_emit() and console_unlock() even more
complicated.
This patch extracts the new code into 3 helper functions. They should
help to keep it rather self-contained. It will be easier to use and
maintain.
This patch just shuffles the existing code. It does not change
the functionality.
Link: http://lkml.kernel.org/r/20180112160837.GD24497@linux.suse
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: rostedt@home.goodmis.org
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
||
|
|
dbdda842fe |
printk: Add console owner and waiter logic to load balance console writes
This patch implements what I discussed in Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.
Here's the design again:
I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.
There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.
In printk() when it tries to write to the consoles, we have:
if (console_trylock())
console_unlock();
Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.
When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.
If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.
Then the waiter calls console_unlock() and continues to write to the
consoles.
If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!
By Petr Mladek about possible new deadlocks:
The thing is that we move console_sem only to printk() call
that normally calls console_unlock() as well. It means that
the transferred owner should not bring new type of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."
We could look at it from this side. The possible deadlock would
look like:
CPU0 CPU1
console_unlock()
console_owner = current;
spin_lockA()
printk()
spin = true;
while (...)
call_console_drivers()
spin_lockA()
This would be a deadlock. CPU0 would wait for the lock A.
While CPU1 would own the lockA and would wait for CPU0
to finish calling the console drivers and pass the console_sem
owner.
But if the above is true than the following scenario was
already possible before:
CPU0
spin_lockA()
printk()
console_unlock()
call_console_drivers()
spin_lockA()
By other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.
By Steven Rostedt:
To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up a 8 CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.
#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>
static bool stop_testing;
static unsigned int loops = 1;
static void preempt_printk_workfn(struct work_struct *work)
{
int i;
while (!READ_ONCE(stop_testing)) {
for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
preempt_disable();
pr_emerg("%5d%-75s\n", smp_processor_id(),
" XXX NOPREEMPT");
preempt_enable();
}
msleep(1);
}
}
static struct work_struct __percpu *works;
static void finish(void)
{
int cpu;
WRITE_ONCE(stop_testing, true);
for_each_online_cpu(cpu)
flush_work(per_cpu_ptr(works, cpu));
free_percpu(works);
}
static int __init test_init(void)
{
int cpu;
works = alloc_percpu(struct work_struct);
if (!works)
return -ENOMEM;
/*
* This is just a test module. This will break if you
* do any CPU hot plugging between loading and
* unloading the module.
*/
for_each_online_cpu(cpu) {
struct work_struct *work = per_cpu_ptr(works, cpu);
INIT_WORK(work, &preempt_printk_workfn);
schedule_work_on(cpu, work);
}
return 0;
}
static void __exit test_exit(void)
{
finish();
}
module_param(loops, uint, 0);
module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
Link: http://lkml.kernel.org/r/20180110132418.7080-2-pmladek@suse.com
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
||
|
|
11ca75d2d6 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek: - print the warning about dropped messages on consoles on a separate line. It makes it more legible. - one typo fix and small code clean up. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: added new line symbol after warning about dropped messages printk: fix typo in printk_safe.c printk: simplify no_printk() |
||
|
|
fa7f578076 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: - a bit more MM - procfs updates - dynamic-debug fixes - lib/ updates - checkpatch - epoll - nilfs2 - signals - rapidio - PID management cleanup and optimization - kcov updates - sysvipc updates - quite a few misc things all over the place * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits) EXPERT Kconfig menu: fix broken EXPERT menu include/asm-generic/topology.h: remove unused parent_node() macro arch/tile/include/asm/topology.h: remove unused parent_node() macro arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro arch/sh/include/asm/topology.h: remove unused parent_node() macro arch/ia64/include/asm/topology.h: remove unused parent_node() macro drivers/pcmcia/sa1111_badge4.c: avoid unused function warning mm: add infrastructure for get_user_pages_fast() benchmarking sysvipc: make get_maxid O(1) again sysvipc: properly name ipc_addid() limit parameter sysvipc: duplicate lock comments wrt ipc_addid() sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE initramfs: use time64_t timestamps drivers/watchdog: make use of devm_register_reboot_notifier() kernel/reboot.c: add devm_register_reboot_notifier() kcov: update documentation Makefile: support flag -fsanitizer-coverage=trace-cmp kcov: support comparison operands collection kcov: remove pointless current != NULL check kernel/panic.c: add TAINT_AUX ... |
||
|
|
2d8364bae4 |
kernel/reboot.c: add devm_register_reboot_notifier()
Add devm_* wrapper around register_reboot_notifier to simplify device specific reboot notifier registration/unregistration. [akpm@linux-foundation.org: move `struct device' forward decl to top-of-file] Link: http://lkml.kernel.org/r/20170320171753.1705-1-andrew.smirnov@gmail.com Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ded97d2c2b |
kcov: support comparison operands collection
Enables kcov to collect comparison operands from instrumented code. This is done by using Clang's -fsanitize=trace-cmp instrumentation (currently not available for GCC). The comparison operands help a lot in fuzz testing. E.g. they are used in Syzkaller to cover the interiors of conditional statements with way less attempts and thus make previously unreachable code reachable. To allow separate collection of coverage and comparison operands two different work modes are implemented. Mode selection is now done via a KCOV_ENABLE ioctl call with corresponding argument value. Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.com Signed-off-by: Victor Chibotaru <tchibo@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
fcf4edac04 |
kcov: remove pointless current != NULL check
__sanitizer_cov_trace_pc() is a hot code, so it's worth to remove pointless '!current' check. Current is never NULL. Link: http://lkml.kernel.org/r/20170929162221.32500-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
4efb442cc1 |
kernel/panic.c: add TAINT_AUX
This is the gist of a patch which we've been forward-porting in our kernels for a long time now and it probably would make a good sense to have such TAINT_AUX flag upstream which can be used by each distro etc, how they see fit. This way, we won't need to forward-port a distro-only version indefinitely. Add an auxiliary taint flag to be used by distros and others. This obviates the need to forward-port whatever internal solutions people have in favor of a single flag which they can map arbitrarily to a definition of their pleasing. The "X" mnemonic could also mean eXternal, which would be taint from a distro or something else but not the upstream kernel. We will use it to mark modules for which we don't provide support. I.e., a really eXternal module. Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Jessica Yu <jeyu@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e8cfbc245e |
pid: remove pidhash
pidhash is no longer required as all the information can be looked up from idr tree. nr_hashed represented the number of pids that had been hashed. Since, nr_hashed and PIDNS_HASH_ADDING are no longer relevant, it has been renamed to pid_allocated and PIDNS_ADDING respectively. [gs051095@gmail.com: v6] Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com Signed-off-by: Gargi Sharma <gs051095@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> [ia64] Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
95846ecf9d |
pid: replace pid bitmap implementation with IDR API
Patch series "Replacing PID bitmap implementation with IDR API", v4.
This series replaces kernel bitmap implementation of PID allocation with
IDR API. These patches are written to simplify the kernel by replacing
custom code with calls to generic code.
The following are the stats for pid and pid_namespace object files
before and after the replacement. There is a noteworthy change between
the IDR and bitmap implementation.
Before
text data bss dec hex filename
8447 3894 64 12405 3075 kernel/pid.o
After
text data bss dec hex filename
3397 304 0 3701 e75 kernel/pid.o
Before
text data bss dec hex filename
5692 1842 192 7726 1e2e kernel/pid_namespace.o
After
text data bss dec hex filename
2854 216 16 3086 c0e kernel/pid_namespace.o
The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.
ps:
With IDR API With bitmap
real 0m1.479s 0m2.319s
user 0m0.070s 0m0.060s
sys 0m0.289s 0m0.516s
pstree:
With IDR API With bitmap
real 0m1.024s 0m1.794s
user 0m0.348s 0m0.612s
sys 0m0.184s 0m0.264s
proc:
With IDR API With bitmap
real 0m0.059s 0m0.074s
user 0m0.000s 0m0.004s
sys 0m0.016s 0m0.016s
This patch (of 2):
Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, for example, free_pidmap(),
alloc_pidmap(), etc. are removed. The rest of the functions are
modified to use the IDR API. The change was made to make the PID
allocation less complex by replacing custom code with calls to generic
API.
[gs051095@gmail.com: v6]
Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
f9eb2fdd04 |
kernel/sysctl.c: code cleanups
Remove unnecessary else block, remove redundant return and call to kfree in if block. Link: http://lkml.kernel.org/r/1510238435-1655-1-git-send-email-mail@okal.no Signed-off-by: Ola N. Kaldestad <mail@okal.no> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
de40ccefd1 |
kdump: print a message in case parse_crashkernel_mem resulted in zero bytes
parse_crashkernel_mem() silently returns if we get zero bytes in the parsing function. It is useful for debugging to add a message, especially if the kernel cannot boot correctly. Add a pr_info instead of pr_warn because it is expected behavior for size = 0, eg. crashkernel=2G-4G:128M, size will be 0 in case system memory is less than 2G. Link: http://lkml.kernel.org/r/20171114080129.GA6115@dhcp-128-65.nay.redhat.com Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Bhupesh Sharma <bhsharma@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
426915796c |
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.
If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.
After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.
This should hopefully fix the problem reported by Dmitry. This
test-case
static int init(void *arg)
{
for (;;)
pause();
}
int main(void)
{
char stack[16 * 1024];
for (;;) {
int pid = clone(init, stack + sizeof(stack)/2,
CLONE_NEWPID | SIGCHLD, NULL);
assert(pid > 0);
assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
assert(waitpid(-1, NULL, WSTOPPED) == pid);
assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
assert(pid == wait(NULL));
}
}
triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.
And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kyle Huey <me@kylehuey.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
ac25385089 |
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only() signals even if force == T. This simplifies the next change and this matches the same check in get_signal() which will drop these signals anyway. Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
628c1bcba2 |
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
The comment in sig_ignored() says "Tracers may want to know about even ignored signals" but SIGKILL can not be reported to debugger and it is just wrong to return 0 in this case: SIGKILL should only kill the SIGNAL_UNKILLABLE task if it comes from the parent ns. Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on sig_task_ignored(). SISGTOP coming from within the namespace is not really right too but at least debugger can intercept it, and we can't drop it here because this will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add another ->ptrace check later, we will see. Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
fb910c42cc |
sysctl: check for UINT_MAX before unsigned int min/max
Mikulas noticed in the existing do_proc_douintvec_minmax_conv() and do_proc_dopipe_max_size_conv() introduced in this patchset, that they inconsistently handle overflow and min/max range inputs: For example: 0 ... param->min - 1 ---> ERANGE param->min ... param->max ---> the value is accepted param->max + 1 ... 0x100000000L + param->min - 1 ---> ERANGE 0x100000000L + param->min ... 0x100000000L + param->max ---> EINVAL 0x100000000L + param->max + 1, 0x200000000L + param->min - 1 ---> ERANGE 0x200000000L + param->min ... 0x200000000L + param->max ---> EINVAL 0x200000000L + param->max + 1, 0x300000000L + param->min - 1 ---> ERANGE In do_proc_do*() routines which store values into unsigned int variables (4 bytes wide for 64-bit builds), first validate that the input unsigned long value (8 bytes wide for 64-bit builds) will fit inside the smaller unsigned int variable. Then check that the unsigned int value falls inside the specified parameter min, max range. Otherwise the unsigned long -> unsigned int conversion drops leading bits from the input value, leading to the inconsistent pattern Mikulas documented above. Link: http://lkml.kernel.org/r/1507658689-11669-5-git-send-email-joe.lawrence@redhat.com Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7a8d181949 |
pipe: add proc_dopipe_max_size() to safely assign pipe_max_size
pipe_max_size is assigned directly via procfs sysctl:
static struct ctl_table fs_table[] = {
...
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &pipe_proc_fn,
.extra1 = &pipe_min_size,
},
...
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
...
ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
...
and then later rounded in-place a few statements later:
...
pipe_max_size = round_pipe_size(pipe_max_size);
...
This leaves a window of time between initial assignment and rounding
that may be visible to other threads. (For example, one thread sets a
non-rounded value to pipe_max_size while another reads its value.)
Similar reads of pipe_max_size are potentially racy:
pipe.c :: alloc_pipe_info()
pipe.c :: pipe_set_size()
Add a new proc_dopipe_max_size() that consolidates reading the new value
from the user buffer, verifying bounds, and calling round_pipe_size()
with a single assignment to pipe_max_size.
Link: http://lkml.kernel.org/r/1507658689-11669-4-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
98159d977f |
pipe: match pipe_max_size data type with procfs
Patch series "A few round_pipe_size() and pipe-max-size fixups", v3.
While backporting Michael's "pipe: fix limit handling" patchset to a
distro-kernel, Mikulas noticed that current upstream pipe limit handling
contains a few problems:
1 - procfs signed wrap: echo'ing a large number into
/proc/sys/fs/pipe-max-size and then cat'ing it back out shows a
negative value.
2 - round_pipe_size() nr_pages overflow on 32bit: this would
subsequently try roundup_pow_of_two(0), which is undefined.
3 - visible non-rounded pipe-max-size value: there is no mutual
exclusion or protection between the time pipe_max_size is assigned
a raw value from proc_dointvec_minmax() and when it is rounded.
4 - unsigned long -> unsigned int conversion makes for potential odd
return errors from do_proc_douintvec_minmax_conv() and
do_proc_dopipe_max_size_conv().
This version underwent the same testing as v1:
https://marc.info/?l=linux-kernel&m=150643571406022&w=2
This patch (of 4):
pipe_max_size is defined as an unsigned int:
unsigned int pipe_max_size = 1048576;
but its procfs/sysctl representation is an integer:
static struct ctl_table fs_table[] = {
...
{
.procname = "pipe-max-size",
.data = &pipe_max_size,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = &pipe_proc_fn,
.extra1 = &pipe_min_size,
},
...
that is signed:
int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
size_t *lenp, loff_t *ppos)
{
...
ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
This leads to signed results via procfs for large values of pipe_max_size:
% echo 2147483647 >/proc/sys/fs/pipe-max-size
% cat /proc/sys/fs/pipe-max-size
-2147483648
Use unsigned operations on this variable to avoid such negative values.
Link: http://lkml.kernel.org/r/1507658689-11669-2-git-send-email-joe.lawrence@redhat.com
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
8c703d6604 |
kernel/umh.c: optimize 'proc_cap_handler()'
If 'write' is 0, we can avoid a call to spin_lock/spin_unlock. Link: http://lkml.kernel.org/r/20171020193331.7233-1-christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a7bed27af1 |
bug: fix "cut here" location for __WARN_TAINT architectures
Prior to v4.11, x86 used warn_slowpath_fmt() for handling WARN()s.
After WARN() was moved to using UD0 on x86, the warning text started
appearing _before_ the "cut here" line. This appears to have been a
long-standing bug on architectures that used __WARN_TAINT, but it didn't
get fixed.
v4.11 and earlier on x86:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2956 at drivers/misc/lkdtm_bugs.c:65 lkdtm_WARNING+0x21/0x30
This is a warning message
Modules linked in:
v4.12 and later on x86:
This is a warning message
------------[ cut here ]------------
WARNING: CPU: 1 PID: 2982 at drivers/misc/lkdtm_bugs.c:68 lkdtm_WARNING+0x15/0x20
Modules linked in:
With this fix:
------------[ cut here ]------------
This is a warning message
WARNING: CPU: 3 PID: 3009 at drivers/misc/lkdtm_bugs.c:67 lkdtm_WARNING+0x15/0x20
Since the __FILE__ reporting happens as part of the UD0 handler, it
isn't trivial to move the message to after the WARNING line, but at
least we can fix the position of the "cut here" line so all the various
logging tools will start including the actual runtime warning message
again, when they follow the instruction and "cut here".
Link: http://lkml.kernel.org/r/1510100869-73751-4-git-send-email-keescook@chromium.org
Fixes:
|
||
|
|
2a8358d8a3 |
bug: define the "cut here" string in a single place
The "cut here" string is used in a few paths. Define it in a single place. Link: http://lkml.kernel.org/r/1510100869-73751-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
aaf5dcfb22 |
kernel debug: support resetting WARN_ONCE for all architectures
Some architectures store the WARN_ONCE state in the flags field of the bug_entry. Clear that one too when resetting once state through /sys/kernel/debug/clear_warn_once Pointed out by Michael Ellerman Improves the earlier patch that add clear_warn_once. [ak@linux.intel.com: add a missing ifdef CONFIG_MODULES] Link: http://lkml.kernel.org/r/20171020170633.9593-1-andi@firstfloor.org [akpm@linux-foundation.org: fix unused var warning] [akpm@linux-foundation.org: Use 0200 for clear_warn_once file, per mpe] [akpm@linux-foundation.org: clear BUGFLAG_DONE in clear_once_table(), per mpe] Link: http://lkml.kernel.org/r/20171019204642.7404-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b1fca27d38 |
kernel debug: support resetting WARN*_ONCE
I like _ONCE warnings because it's guaranteed that they don't flood the log. During testing I find it useful to reset the state of the once warnings, so that I can rerun tests and see if they trigger again, or can guarantee that a test run always hits the same warnings. This patch adds a debugfs interface to reset all the _ONCE warnings so that they appear again: echo 1 > /sys/kernel/debug/clear_warn_once This is implemented by putting all the warning booleans into a special section, and clearing it. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20171017221455.6740-1-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
2dcd9c71c1 |
Merge tag 'trace-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from
- allow module init functions to be traced
- clean up some unused or not used by config events (saves space)
- clean up of trace histogram code
- add support for preempt and interrupt enabled/disable events
- other various clean ups
* tag 'trace-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
tracing, thermal: Hide cpu cooling trace events when not in use
tracing, thermal: Hide devfreq trace events when not in use
ftrace: Kill FTRACE_OPS_FL_PER_CPU
perf/ftrace: Small cleanup
perf/ftrace: Fix function trace events
perf/ftrace: Revert ("perf/ftrace: Fix double traces of perf on ftrace:function")
tracing, dma-buf: Remove unused trace event dma_fence_annotate_wait_on
tracing, memcg, vmscan: Hide trace events when not in use
tracing/xen: Hide events that are not used when X86_PAE is not defined
tracing: mark trace_test_buffer as __maybe_unused
printk: Remove superfluous memory barriers from printk_safe
ftrace: Clear hashes of stale ips of init memory
tracing: Add support for preempt and irq enable/disable events
tracing: Prepare to add preempt and irq trace events
ftrace/kallsyms: Have /proc/kallsyms show saved mod init functions
ftrace: Add freeing algorithm to free ftrace_mod_maps
ftrace: Save module init functions kallsyms symbols for tracing
ftrace: Allow module init functions to be traced
ftrace: Add a ftrace_free_mem() function for modules to use
tracing: Reimplement log2
...
|
||
|
|
93f30c73ec |
Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat and uaccess updates from Al Viro:
- {get,put}_compat_sigset() series
- assorted compat ioctl stuff
- more set_fs() elimination
- a few more timespec64 conversions
- several removals of pointless access_ok() in places where it was
followed only by non-__ variants of primitives
* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
coredump: call do_unlinkat directly instead of sys_unlink
fs: expose do_unlinkat for built-in callers
ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
ipmi: get rid of pointless access_ok()
pi433: sanitize ioctl
cxlflash: get rid of pointless access_ok()
mtdchar: get rid of pointless access_ok()
r128: switch compat ioctls to drm_ioctl_kernel()
selection: get rid of field-by-field copyin
VT_RESIZEX: get rid of field-by-field copyin
i2c compat ioctls: move to ->compat_ioctl()
sched_rr_get_interval(): move compat to native, get rid of set_fs()
mips: switch to {get,put}_compat_sigset()
sparc: switch to {get,put}_compat_sigset()
s390: switch to {get,put}_compat_sigset()
ppc: switch to {get,put}_compat_sigset()
parisc: switch to {get,put}_compat_sigset()
get_compat_sigset()
get rid of {get,put}_compat_itimerspec()
io_getevents: Use timespec64 to represent timeouts
...
|
||
|
|
758f875848 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace update from Eric Biederman:
"The only change that is production ready this round is the work to
increase the number of uid and gid mappings a user namespace can
support from 5 to 340.
This code was carefully benchmarked and it was confirmed that in the
existing cases the performance remains the same. In the worst case
with 340 mappings an cache cold stat times go from 158ns to 248ns.
That is noticable but still quite small, and only the people who are
doing crazy things pay the cost.
This work uncovered some documentation and cleanup opportunities in
the mapping code, and patches to make those cleanups and improve the
documentation will be coming in the next merge window"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Simplify insert_extent
userns: Make map_id_down a wrapper for map_id_range_down
userns: Don't read extents twice in m_start
userns: Simplify the user and group mapping functions
userns: Don't special case a count of 0
userns: bump idmap limits to 340
userns: use union in {g,u}idmap struct
|
||
|
|
487e2c9f44 |
Merge tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS updates from David Howells:
"kAFS filesystem driver overhaul.
The major points of the overhaul are:
(1) Preliminary groundwork is laid for supporting network-namespacing
of kAFS. The remainder of the namespacing work requires some way
to pass namespace information to submounts triggered by an
automount. This requires something like the mount overhaul that's
in progress.
(2) sockaddr_rxrpc is used in preference to in_addr for holding
addresses internally and add support for talking to the YFS VL
server. With this, kAFS can do everything over IPv6 as well as
IPv4 if it's talking to servers that support it.
(3) Callback handling is overhauled to be generally passive rather
than active. 'Callbacks' are promises by the server to tell us
about data and metadata changes. Callbacks are now checked when
we next touch an inode rather than actively going and looking for
it where possible.
(4) File access permit caching is overhauled to store the caching
information per-inode rather than per-directory, shared over
subordinate files. Whilst older AFS servers only allow ACLs on
directories (shared to the files in that directory), newer AFS
servers break that restriction.
To improve memory usage and to make it easier to do mass-key
removal, permit combinations are cached and shared.
(5) Cell database management is overhauled to allow lighter locks to
be used and to make cell records autonomous state machines that
look after getting their own DNS records and cleaning themselves
up, in particular preventing races in acquiring and relinquishing
the fscache token for the cell.
(6) Volume caching is overhauled. The afs_vlocation record is got rid
of to simplify things and the superblock is now keyed on the cell
and the numeric volume ID only. The volume record is tied to a
superblock and normal superblock management is used to mediate
the lifetime of the volume fscache token.
(7) File server record caching is overhauled to make server records
independent of cells and volumes. A server can be in multiple
cells (in such a case, the administrator must make sure that the
VL services for all cells correctly reflect the volumes shared
between those cells).
Server records are now indexed using the UUID of the server
rather than the address since a server can have multiple
addresses.
(8) File server rotation is overhauled to handle VMOVED, VBUSY (and
similar), VOFFLINE and VNOVOL indications and to handle rotation
both of servers and addresses of those servers. The rotation will
also wait and retry if the server says it is busy.
(9) Data writeback is overhauled. Each inode no longer stores a list
of modified sections tagged with the key that authorised it in
favour of noting the modified region of a page in page->private
and storing a list of keys that made modifications in the inode.
This simplifies things and allows other keys to be used to
actually write to the server if a key that made a modification
becomes useless.
(10) Writable mmap() is implemented. This allows a kernel to be build
entirely on AFS.
Note that Pre AFS-3.4 servers are no longer supported, though this can
be added back if necessary (AFS-3.4 was released in 1998)"
* tag 'afs-next-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (35 commits)
afs: Protect call->state changes against signals
afs: Trace page dirty/clean
afs: Implement shared-writeable mmap
afs: Get rid of the afs_writeback record
afs: Introduce a file-private data record
afs: Use a dynamic port if 7001 is in use
afs: Fix directory read/modify race
afs: Trace the sending of pages
afs: Trace the initiation and completion of client calls
afs: Fix documentation on # vs % prefix in mount source specification
afs: Fix total-length calculation for multiple-page send
afs: Only progress call state at end of Tx phase from rxrpc callback
afs: Make use of the YFS service upgrade to fully support IPv6
afs: Overhaul volume and server record caching and fileserver rotation
afs: Move server rotation code into its own file
afs: Add an address list concept
afs: Overhaul cell database management
afs: Overhaul permit caching
afs: Overhaul the callback handling
afs: Rename struct afs_call server member to cm_server
...
|
||
|
|
7c225c69f8 |
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ... |
||
|
|
4518085e12 |
mm, sysctl: make NUMA stats configurable
This is the second step which introduces a tunable interface that allow numa stats configurable for optimizing zone_statistics(), as suggested by Dave Hansen and Ying Huang. ========================================================================= When page allocation performance becomes a bottleneck and you can tolerate some possible tool breakage and decreased numa counter precision, you can do: echo 0 > /proc/sys/vm/numa_stat In this case, numa counter update is ignored. We can see about *4.8%*(185->176) drop of cpu cycles per single page allocation and reclaim on Jesper's page_bench01 (single thread) and *8.1%*(343->315) drop of cpu cycles per single page allocation and reclaim on Jesper's page_bench03 (88 threads) running on a 2-Socket Broadwell-based server (88 threads, 126G memory). Benchmark link provided by Jesper D Brouer (increase loop times to 10000000): https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench ========================================================================= When page allocation performance is not a bottleneck and you want all tooling to work, you can do: echo 1 > /proc/sys/vm/numa_stat This is system default setting. Many thanks to Michal Hocko, Dave Hansen, Ying Huang and Vlastimil Babka for comments to help improve the original patch. [keescook@chromium.org: make sure mutex is a global static] Link: http://lkml.kernel.org/r/20171107213809.GA4314@beast Link: http://lkml.kernel.org/r/1508290927-8518-1-git-send-email-kemi.wang@intel.com Signed-off-by: Kemi Wang <kemi.wang@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Suggested-by: Dave Hansen <dave.hansen@intel.com> Suggested-by: Ying Huang <ying.huang@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: "Luis R . Rodriguez" <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christopher Lameter <cl@linux.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
453f85d43f |
mm: remove __GFP_COLD
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
4675ff05de |
kmemcheck: rip it out
Fix up makefiles, remove references, and git rm kmemcheck. Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Pekka Enberg <penberg@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexander Potapenko <glider@google.com> Cc: Tim Hansen <devtimhansen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
75f296d93b |
kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK
Convert all allocations that used a NOTRACK flag to stop using it. Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
4950276672 |
kmemcheck: remove annotations
Patch series "kmemcheck: kill kmemcheck", v2. As discussed at LSF/MM, kill kmemcheck. KASan is a replacement that is able to work without the limitation of kmemcheck (single CPU, slow). KASan is already upstream. We are also not aware of any users of kmemcheck (or users who don't consider KASan as a suitable replacement). The only objection was that since KASAN wasn't supported by all GCC versions provided by distros at that time we should hold off for 2 years, and try again. Now that 2 years have passed, and all distros provide gcc that supports KASAN, kill kmemcheck again for the very same reasons. This patch (of 4): Remove kmemcheck annotations, and calls to kmemcheck from the kernel. [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs] Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
af5b0f6a09 |
mm: consolidate page table accounting
Currently, we account page tables separately for each page table level, but that's redundant -- we only make use of total memory allocated to page tables for oom_badness calculation. We also provide the information to userspace, but it has dubious value there too. This patch switches page table accounting to single counter. mm->pgtables_bytes is now used to account all page table levels. We use bytes, because page table size for different levels of page table tree may be different. The change has user-visible effect: we don't have VmPMD and VmPUD reported in /proc/[pid]/status. Not sure if anybody uses them. (As alternative, we can always report 0 kB for them.) OOM-killer report is also slightly changed: we now report pgtables_bytes instead of nr_ptes, nr_pmd, nr_puds. Apart from reducing number of counters per-mm, the benefit is that we now calculate oom_badness() more correctly for machines which have different size of page tables depending on level or where page tables are less than a page in size. The only downside can be debuggability because we do not know which page table level could leak. But I do not remember many bugs that would be caught by separate counters so I wouldn't lose sleep over this. [akpm@linux-foundation.org: fix mm/huge_memory.c] Link: http://lkml.kernel.org/r/20171006100651.44742-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> [kirill.shutemov@linux.intel.com: fix build] Link: http://lkml.kernel.org/r/20171016150113.ikfxy3e7zzfvsr4w@black.fi.intel.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c4812909f5 |
mm: introduce wrappers to access mm->nr_ptes
Let's add wrappers for ->nr_ptes with the same interface as for nr_pmd and nr_pud. The patch also makes nr_ptes accounting dependent onto CONFIG_MMU. Page table accounting doesn't make sense if you don't have page tables. It's preparation for consolidation of page-table counters in mm_struct. Link: http://lkml.kernel.org/r/20171006100651.44742-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
b4e98d9ac7 |
mm: account pud page tables
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup. The trick is to allocate a lot of PUD page tables. We don't
account PUD page tables, only PMD and PTE.
We already addressed the same issue for PMD page tables, see commit
|
||
|
|
22714a2ba4 |
Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Cgroup2 cpu controller support is finally merged.
- Basic cpu statistics support to allow monitoring by default without
the CPU controller enabled.
- cgroup2 cpu controller support.
- /sys/kernel/cgroup files to help dealing with new / optional
features"
* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: export list of cgroups v2 features using sysfs
cgroup: export list of delegatable control files using sysfs
cgroup: mark @cgrp __maybe_unused in cpu_stat_show()
MAINTAINERS: relocate cpuset.c
cgroup, sched: Move basic cpu stats from cgroup.stat to cpu.stat
sched: Implement interface for cgroup unified hierarchy
sched: Misc preps for cgroup unified hierarchy interface
sched/cputime: Add dummy cputime_adjust() implementation for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
cgroup: statically initialize init_css_set->dfl_cgrp
cgroup: Implement cgroup2 basic CPU usage accounting
cpuacct: Introduce cgroup_account_cputime[_field]()
sched/cputime: Expose cputime_adjust()
|
||
|
|
0be500363c |
Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo: "There was a commit to make unbound kworkers respect cpu isolation but it conflicted with the restructuring of cpu isolation and got reverted, so the only thing left is the trivial comment fix. Will retry the cpu isolation change after this merge window" * 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix comment for unbound workqueue's attrbutes Revert "workqueue: respect isolated cpus when queueing an unbound work" workqueue: respect isolated cpus when queueing an unbound work |
||
|
|
1be2172e96 |
Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Summary of modules changes for the 4.15 merge window:
- treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- minor code cleanups"
* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Do not paper over type mismatches in module_param_call()
treewide: Fix function prototypes for module_param_call()
module: Prepare to convert all module_param_call() prototypes
kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
|
||
|
|
f9bab2677a |
Merge tag 'audit-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Another relatively small pull request for audit, nine patches total.
The only real new bit of functionality is the patch from Richard which
adds the ability to filter records based on the filesystem type.
The remainder are bug fixes and cleanups; the bug fix highlights
include:
- ensuring that we properly audit init/PID-1 (me)
- allowing the audit daemon to shutdown the kernel/auditd connection
cleanly by setting the audit PID to zero (Steve)"
* tag 'audit-pr-20171113' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: filter PATH records keyed on filesystem magic
Audit: remove unused audit_log_secctx function
audit: Allow auditd to set pid to 0 to end auditing
audit: Add new syscalls to the perm=w filter
audit: use audit_set_enabled() in audit_enable()
audit: convert audit_ever_enabled to a boolean
audit: don't use simple_strtol() anymore
audit: initialize the audit subsystem as early as possible
audit: ensure that 'audit=1' actually enables audit for PID 1
|
||
|
|
5bbcc0f595 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
"Highlights:
1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.
2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.
3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.
4) Add meter action support to openvswitch, from Andy Zhou.
5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.
6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.
7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.
8) More work to move the RTNL mutex down, from Florian Westphal.
9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.
10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.
11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.
12) TC flower offload support in cxgb4, from Kumar Sanghvi.
13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.
14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.
15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.
16) eBPF based device controller for cgroup v2, from Roman Gushchin.
17) Add some fundamental tracepoints for TCP, from Song Liu.
18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.
19) Add multicast route offload support to mlxsw, from Yotam Gigi"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...
|
||
|
|
c9b012e5f4 |
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The big highlight is support for the Scalable Vector Extension (SVE)
which required extensive ABI work to ensure we don't break existing
applications by blowing away their signal stack with the rather large
new vector context (<= 2 kbit per vector register). There's further
work to be done optimising things like exception return, but the ABI
is solid now.
Much of the line count comes from some new PMU drivers we have, but
they're pretty self-contained and I suspect we'll have more of them in
future.
Plenty of acronym soup here:
- initial support for the Scalable Vector Extension (SVE)
- improved handling for SError interrupts (required to handle RAS
events)
- enable GCC support for 128-bit integer types
- remove kernel text addresses from backtraces and register dumps
- use of WFE to implement long delay()s
- ACPI IORT updates from Lorenzo Pieralisi
- perf PMU driver for the Statistical Profiling Extension (SPE)
- perf PMU driver for Hisilicon's system PMUs
- misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
arm64: Make ARMV8_DEPRECATED depend on SYSCTL
arm64: Implement __lshrti3 library function
arm64: support __int128 on gcc 5+
arm64/sve: Add documentation
arm64/sve: Detect SVE and activate runtime support
arm64/sve: KVM: Hide SVE from CPU features exposed to guests
arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
arm64/sve: KVM: Prevent guests from using SVE
arm64/sve: Add sysctl to set the default vector length for new processes
arm64/sve: Add prctl controls for userspace vector length management
arm64/sve: ptrace and ELF coredump support
arm64/sve: Preserve SVE registers around EFI runtime service calls
arm64/sve: Preserve SVE registers around kernel-mode NEON use
arm64/sve: Probe SVE capabilities and usable vector lengths
arm64: cpufeature: Move sys_caps_initialised declarations
arm64/sve: Backend logic for setting the vector length
arm64/sve: Signal handling support
arm64/sve: Support vector length resetting for new processes
arm64/sve: Core task context handling
arm64/sve: Low-level CPU setup
...
|
||
|
|
0ef76878cf |
Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina: - shadow variables support, allowing livepatches to associate new "shadow" fields to existing data structures, from Joe Lawrence - pre/post patch callbacks API, allowing livepatch writers to register callbacks to be called before and after patch application, from Joe Lawrence * 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: __klp_disable_patch() should never be called for disabled patches livepatch: Correctly call klp_post_unpatch_callback() in error paths livepatch: add transition notices livepatch: move transition "complete" notice into klp_complete_transition() livepatch: add (un)patch callbacks livepatch: Small shadow variable documentation fixes livepatch: __klp_shadow_get_or_alloc() is local to shadow.c livepatch: introduce shadow variable API |
||
|
|
9682b3dea2 |
Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina: "The usual rocket-science from trivial tree for 4.15" * 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: MAINTAINERS: relinquish kconfig MAINTAINERS: Update my email address treewide: Fix typos in Kconfig kfifo: Fix comments init/Kconfig: Fix module signing document location misc: ibmasm: Return error on error path HID: logitech-hidpp: fix mistake in printk, "feeback" -> "feedback" MAINTAINERS: Correct path to uDraw PS3 driver tracing: Fix doc mistakes in trace sample tracing: Kconfig text fixes for CONFIG_HWLAT_TRACER MIPS: Alchemy: Remove reverted CONFIG_NETLINK_MMAP from db1xxx_defconfig mm/huge_memory.c: fixup grammar in comment lib/xz: Add fall-through comments to a switch statement |
||
|
|
89ad2fa3f0 |
bpf: fix lockdep splat
pcpu_freelist_pop() needs the same lockdep awareness than
pcpu_freelist_populate() to avoid a false positive.
[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
(&htab->buckets[i].lock){......}, at: [<ffffffff9dc099cb>] __htab_percpu_map_update_elem+0x1cb/0x300
and this task is already holding:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [<ffffffff9e135848>] __dev_queue_xmit+0
x868/0x1240
which would create a new lock dependency:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}
but this new dependency connects a SOFTIRQ-irq-safe lock:
(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
... which became SOFTIRQ-irq-safe at:
[<ffffffff9db5931b>] __lock_acquire+0x42b/0x1f10
[<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
[<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
[<ffffffff9e135848>] __dev_queue_xmit+0x868/0x1240
[<ffffffff9e136240>] dev_queue_xmit+0x10/0x20
[<ffffffff9e1965d9>] ip_finish_output2+0x439/0x590
[<ffffffff9e197410>] ip_finish_output+0x150/0x2f0
[<ffffffff9e19886d>] ip_output+0x7d/0x260
[<ffffffff9e19789e>] ip_local_out+0x5e/0xe0
[<ffffffff9e197b25>] ip_queue_xmit+0x205/0x620
[<ffffffff9e1b8398>] tcp_transmit_skb+0x5a8/0xcb0
[<ffffffff9e1ba152>] tcp_write_xmit+0x242/0x1070
[<ffffffff9e1baffc>] __tcp_push_pending_frames+0x3c/0xf0
[<ffffffff9e1b3472>] tcp_rcv_established+0x312/0x700
[<ffffffff9e1c1acc>] tcp_v4_do_rcv+0x11c/0x200
[<ffffffff9e1c3dc2>] tcp_v4_rcv+0xaa2/0xc30
[<ffffffff9e191107>] ip_local_deliver_finish+0xa7/0x240
[<ffffffff9e191a36>] ip_local_deliver+0x66/0x200
[<ffffffff9e19137d>] ip_rcv_finish+0xdd/0x560
[<ffffffff9e191e65>] ip_rcv+0x295/0x510
[<ffffffff9e12ff88>] __netif_receive_skb_core+0x988/0x1020
[<ffffffff9e130641>] __netif_receive_skb+0x21/0x70
[<ffffffff9e1306ff>] process_backlog+0x6f/0x230
[<ffffffff9e132129>] net_rx_action+0x229/0x420
[<ffffffff9da07ee8>] __do_softirq+0xd8/0x43d
[<ffffffff9e282bcc>] do_softirq_own_stack+0x1c/0x30
[<ffffffff9dafc2f5>] do_softirq+0x55/0x60
[<ffffffff9dafc3a8>] __local_bh_enable_ip+0xa8/0xb0
[<ffffffff9db4c727>] cpu_startup_entry+0x1c7/0x500
[<ffffffff9daab333>] start_secondary+0x113/0x140
to a SOFTIRQ-irq-unsafe lock:
(&head->lock){+.+...}
... which became SOFTIRQ-irq-unsafe at:
... [<ffffffff9db5971f>] __lock_acquire+0x82f/0x1f10
[<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
[<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
[<ffffffff9dc0b7fa>] pcpu_freelist_pop+0x7a/0xb0
[<ffffffff9dc08b2c>] htab_map_alloc+0x50c/0x5f0
[<ffffffff9dc00dc5>] SyS_bpf+0x265/0x1200
[<ffffffff9e28195f>] entry_SYSCALL_64_fastpath+0x12/0x17
other info that might help us debug this:
Chain exists of:
dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&head->lock);
local_irq_disable();
lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
lock(&htab->buckets[i].lock);
<Interrupt>
lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
*** DEADLOCK ***
Fixes:
|
||
|
|
fc41efc184 |
Merge branch 'for-4.15/callbacks' into for-linus
This pulls in an infrastructure/API that allows livepatch writers to register pre-patch and post-patch callbacks that allow for running a glue code necessary for finalizing the patching if necessary. Conflicts: kernel/livepatch/core.c - trivial conflict by adding a callback call into module going notifier vs. moving that code block to klp_cleanup_module_patches_limited() Signed-off-by: Jiri Kosina <jkosina@suse.cz> |
||
|
|
cb65dc7b89 |
Merge branch 'for-4.15/shadow-variables' into for-linus
Shadow variables allow callers to associate new shadow fields to existing data structures. This is intended to be used by livepatch modules seeking to emulate additions to data structure definitions. |
||
|
|
e2c5923c34 |
Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
|
||
|
|
f14fc0ccee |
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota, ext2, isofs and udf fixes from Jan Kara: - two small quota error handling fixes - two isofs fixes for architectures with signed char - several udf block number overflow and signedness fixes - ext2 rework of mount option handling to avoid GFP_KERNEL allocation with spinlock held - ... it also contains a patch to implement auditing of responses to fanotify permission events. That should have been in the fanotify pull request but I mistakenly merged that patch into a wrong branch and noticed only now at which point I don't think it's worth rebasing and redoing. * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: be aware of error from dquot_initialize quota: fix potential infinite loop isofs: use unsigned char types consistently isofs: fix timestamps beyond 2027 udf: Fix some sign-conversion warnings udf: Fix signed/unsigned format specifiers udf: Fix 64-bit sign extension issues affecting blocks > 0x7FFFFFFF udf: Remove some outdate references from documentation udf: Avoid overflow when session starts at large offset ext2: Fix possible sleep in atomic during mount option parsing ext2: Parse mount options into a dedicated structure audit: Record fanotify access control decisions |
||
|
|
23281c8034 |
Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara: - fixes of use-after-tree issues when handling fanotify permission events from Miklos - refcount_t conversions from Elena - fixes of ENOMEM handling in dnotify and fsnotify from me * 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: convert fsnotify_mark.refcnt from atomic_t to refcount_t fanotify: clean up CONFIG_FANOTIFY_ACCESS_PERMISSIONS ifdefs fsnotify: clean up fsnotify() fanotify: fix fsnotify_prepare_user_wait() failure fsnotify: fix pinning group in fsnotify_prepare_user_wait() fsnotify: pin both inode and vfsmount mark fsnotify: clean up fsnotify_prepare/finish_user_wait() fsnotify: convert fsnotify_group.refcnt from atomic_t to refcount_t fsnotify: Protect bail out path of fsnotify_add_mark_locked() properly dnotify: Handle errors from fsnotify_add_mark_locked() in fcntl_dirnotify() |