* Re-use start trace objects from previous micro passes.
Allocating them all over again is wasteful and harms cache
locality, so avoiding that gives a minor performance boost.
* Specialize start traces to not have or use a "previous"
trace. This makes some methods faster, since they no longer
have to check for it, and start traces use less memory as
well.
* Avoid parent class "__init__" calls in value traces, instead
duplicate implementations. That's ugly, but faster.
* Some methods can now be static methods for start traces,
making their calls slightly faster, too.
* Use proper abstract methods instead of assertions against
being called, that was old code.
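
The trace specializations above can be sketched roughly as follows. This is a
minimal illustration only; the class names, the "getPrevious" method and the
attributes are hypothetical and not taken from the actual code base.

```python
from abc import ABC, abstractmethod


class ValueTraceBase(ABC):
    """Hypothetical base class for traces, names are illustrative only."""

    __slots__ = ("owner", "usage_count")

    def __init__(self, owner):
        self.owner = owner
        self.usage_count = 0

    @abstractmethod
    def getPrevious(self):
        """Return the trace this one replaced, or None."""


class StartTrace(ValueTraceBase):
    """Start of a scope, there is never a previous trace to store."""

    __slots__ = ()

    def __init__(self, owner):
        # Duplicates the base class body instead of calling it, trading
        # repetition for one call less per created trace.
        self.owner = owner
        self.usage_count = 0

    @staticmethod
    def getPrevious():
        # No instance state needed, so this can be a static method.
        return None


class AssignTrace(ValueTraceBase):
    """A trace that does replace an earlier one and remembers it."""

    __slots__ = ("previous",)

    def __init__(self, owner, previous):
        self.owner = owner
        self.usage_count = 0
        self.previous = previous

    def getPrevious(self):
        return self.previous
```

With the abstract method, instantiating the base class directly fails early
with a "TypeError" rather than tripping an assertion at call time.
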
* Temporary and locals dict variables cannot escape yet were
using a lot of time in virtual method calls.
* This avoids that time entirely making this step even faster.
* We do not take the effort to dynamically clean it up, so
variables could be added there that then don't exist, but
it's still faster this way as those are mostly rare cases.
* We now trace variables in trace collection as a dictionary
per variable with a dictionary of the versions, this is
closer to our frequent usage per variable.
* That makes it a lot easier to update variables after the
tracing is finished to know their users and writers.
* Requires a lot less work, but also makes work less memory
local such that the performance gain is relatively small
despite less work being done.
* Also avoids having to maintain a set of the users.
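
A rough sketch of the per variable layout described above, assuming a nested
dictionary of the form "variable -> version -> trace". The class and method
names are made up for illustration.

```python
from collections import defaultdict


class TraceCollectionSketch:
    """Hypothetical container, stores traces per variable, then per version."""

    def __init__(self):
        # variable -> {version -> trace}
        self.variable_traces = defaultdict(dict)

    def addTrace(self, variable, version, trace):
        self.variable_traces[variable][version] = trace

    def getTrace(self, variable, version):
        return self.variable_traces[variable][version]

    def getTracesOf(self, variable):
        # All traces of one variable, without scanning the traces of
        # every other variable as a flat (variable, version) keyed
        # dictionary would require.
        return list(self.variable_traces[variable].values())
```

Updating a variable after tracing has finished then only needs the inner
dictionary for that one variable.
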
* Escape and unknown traces now have their own
number spaces. This allows doing some tests
on a trace without using the actual object.
* Narrow the scope of variables to the outline scope
that uses them, so that they don't need to be dealt
with in merging later code where they don't ever
change anymore and are not used at all.
* When checking for unused variables, do not ask the
trace collection to filter its traces, instead work
off the ones attached in the variable already. This
avoids a lot of searching work. Also use a method
to decide if a trace constitutes usage rather than
a long elif chain.
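
The idea of asking each trace whether it constitutes usage, instead of an
elif chain over trace kinds, could look roughly like this. All names here are
hypothetical, and the rule that an escape always counts as usage is an
assumption made for the sake of the example.

```python
class TraceSketch:
    """Hypothetical base trace, counts how often its value was read."""

    def __init__(self):
        self.usage_count = 0

    def addUsage(self):
        self.usage_count += 1

    def isUsingTrace(self):
        return self.usage_count > 0


class AssignTraceSketch(TraceSketch):
    """An assignment alone is not a usage, only reads of it are."""


class EscapeTraceSketch(TraceSketch):
    def isUsingTrace(self):
        # Assumption: an escaped value may be seen by unknown code.
        return True


class VariableSketch:
    def __init__(self, name):
        self.name = name
        # Traces are attached to the variable itself, no filtering
        # of a global trace collection is needed.
        self.traces = set()

    def addTrace(self, trace):
        self.traces.add(trace)

    def isUnused(self):
        return not any(trace.isUsingTrace() for trace in self.traces)
```
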
* For "PASS 1" of "telethon.tl.types", which has been one
of the known trouble makers with many classes and type
annotations, all changes combined improve the compilation
time by 800%.
* The assignment and del nodes were using a function to find what they
already knew.
* The "self.variable_trace" already kept track of the previous one,
and in case of overwriting just needs to be preserved.
* For matching unescaped traces we will do similar, but it's not
really used right now, so make it only a TODO.
* This speeds up the first micro pass even more, because it doesn't
have to search and do other things if no previous trace exists.
* Also the common check whether no by-name usages or merges of a value
occurred was used inverted and should now be slightly faster to use.
* While this accelerated the first micro pass by a lot for per
assignment work, it is mainly meant to clean up the design such
that traces are easier to re-recognize. This is a first
step with immediate impacts.
* Clean up the different code used for variable names and other names.
* Cannot use "$" on some platforms, one of which is gcc on AIX, since it's not supported by their assembler.
* This was marking loop variables as unknown in functions where
that was not needed at all, since those variables cannot be
assigned by anything else at all unless they are shared.
* This was also blocking loop variable analysis, since after
becoming "unknown", there was no recovery, now loop types
are properly detected again.
* Also the "exec" re-formulation didn't indicate boolean type properly.
* We should become able to change the type of a variable on the fly
and not just at creation time, to allow more optimization.
* New black removes a few leading new lines in blocks, so many
files changed in that way.
* The "pygments" has a vulnerability, but updating that and the
restructured text checking stuff did not actually matter at
all.
* Bumped needed version for development to 3.8, since black does
not do 3.7 anymore, and it's old enough.
* We need this for optimization of dictionary "in" operation, such that
the use in the Python2 reformulation of it does not cause problems, as
Python3 classes do not have this (immediately).
* This also opens the door for more variations, e.g. where annotations
are used inside the body or not.
* Also make sure to transfer the state of an existing variable trace
when switching from a generic to a specialized trace, that should
potentially save per module passes.
* Also do not escape variable assign traces from immutable values,
that is not necessary, instead have a special assign trace for
it, to know it needs not be escaped. That should save memory.
* Also another specialized variable assign trace for forward
propagation is used, sparing other traces to have the
replacement factory attribute and to be slow to look it up.
* Also container building with "list", etc. should escape variable
content, as the container is not yet traced, even if that's potentially
not really an issue. And for iterators, we need that.
* The instance counting provided a "__del__" that was a noop, but
still needed to be executed and would therefore cost time
even if not activated.
* Also some sources claim that garbage collection is not done when
"__del__" is overloaded, which is avoided this way too.
* Also cleaned up how it's imported and used for greater harmony.
* Derived classes do not have to repeat it, unless they are not using
base class init.
* It has been observed that a KeyboardInterrupt was ignored in
the deactivated form of instance release counting,
which points out that these things should not exist
unless used.
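
A minimal sketch of the idea that "__del__" should only exist when instance
counting is actually enabled. The flag, the counter and the factory function
are assumptions for illustration and not the project's real API.

```python
from collections import Counter

# Hypothetical global statistics, only touched when counting is enabled.
instance_counts = Counter()


def makeNodeClass(counting_enabled):
    """Build a class that only has "__del__" if counting is enabled."""

    class Node:
        def __init__(self):
            if counting_enabled:
                instance_counts[type(self).__name__] += 1

        if counting_enabled:
            # Defined conditionally at class creation time, so the
            # deactivated form has no "__del__" at all, not even a noop.
            def __del__(self):
                instance_counts[type(self).__name__] -= 1

    return Node
```

In the deactivated form, object release never enters Python code, which
avoids both the cost and the side effects of a "__del__" that does nothing.
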
* The variable completeness of the module is now traced in the module locals
scope and taken from there, allowing to use it sooner.
* The function's completeness is likewise traced in its local scope.
* Unused temporary variables and unnecessary closure variables are now
removed more immediately
* Auto releases of parameter variables are introduced sooner
* This should make it use fewer iterations to optimize a module and
ideally only one global pass for these optimizations.
* This has been done in some places only, now do it everywhere.
* This is supposed to make things cleaner and faster.
* Use class level attributes for cases where e.g. the step of a range
is only present for some variants.
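
One way to read the class level attribute approach is sketched below, under
the assumption that variants lacking a part simply inherit a class level
default instead of storing it per instance. The class names are hypothetical.

```python
class RangeSketchBase:
    """Hypothetical range node base with class level defaults."""

    # Variants that lack these parts never store them per instance.
    low = None
    step = None

    def __init__(self, high):
        self.high = high

    def describe(self):
        return (self.low, self.high, self.step)


class Range1Sketch(RangeSketchBase):
    """Like "range(high)", uses only the class level defaults."""


class Range3Sketch(RangeSketchBase):
    """Like "range(low, high, step)", overrides them per instance."""

    def __init__(self, low, high, step):
        RangeSketchBase.__init__(self, high)
        self.low = low
        self.step = step
```

Code that handles all variants can then access "step" uniformly without
checking which variant it has at hand.
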
* Dedicated return node for case of returning in tried handler.
could hang
* The locals could refer to the wrong scope if it is not attached at
creation time, which it now is.
* To get to the optimization, variable references for builtin names
have the scope attached just in case they later use it, this has
a dedicated node to save the attribute for the general case.
* Also for Optimization, prepare inlining of helper function bodies,
with local scopes, this should now be easy to accomplish later.
* For loops, we were not considering the initial trace state reliably,
esp. not if it was an uninit trace, which has been fixed.
* Compare traces instead of type shapes to decide if we stabilized in
a loop analysis, that is more telling and faster to do.
* More correct annotation of trace escaping, doing away with loop usage
and potential usage.
* Make sure to construct the type shapes with alternatives in place, so
we don't have to unmerge them later in code that uses the set of
shapes for a trace or variable.
* The "onExpression" returns the value for a while now,
no need to get at it via accessor functions.
* Removed child getters and setters for lookup sources and unified
subnode name.
* Finalize assignment statements releasing traces too.
* We use "bool" temp variables for indicator variables.
* These are known to become a C type that needs no releases, so save nodes for
their try/finally release
* Also remove attribute of variables that is only used during tree
building and not afterwards.
* This should save compile time memory as a first step towards optimizing
unused assignments.
* This will need releases to also have their own traces to be done
reliably.
* This reduces generated C code in some cases, as no extra assignment
needs to be done anymore.
* All helpers now have a target type added, mostly OBJECT now, but e.g.
also NBOOL and CBOOL can be used.
* The helpers for operations are now built with factory functions for their
consistency
* Added shapes for all operations for the common types.
* Added dedicated nodes for all in-place operations solving a TODO.
* Inplace operations to be created are now derived from the binary
operations.
* Conversions for missing target types instead of falling back to
most generic helper.
* Converted manual comparison helpers to new Jinja template for
generating code automatically.
* Added optimization for more operations and their types, e.g. tuples,
floats, and more operations.
* More generalized templates to allow more C types that are not objects.
* Added tests to cover in-place operations
* Test infrastructure for generated tests from Jinja2 templates.
* Type shapes are now all instances, avoid mixing classes and instances
for clarity of code and correctness.
* Enable warnings for all operations in case of specific shape
combinations not defined.
* Some corrections to boolean C target type code, now applied in many
more test cases.
* Proper void C type added, to be used in code specialization later.
* Autoformat currently only sorts pylint disable comments, and does
not call black yet.
* This also adds a few doc strings that I failed to keep separate as a
commit, but who cares.
* The code is tasked to mark as unknown all shared local variables in
case of Python3, where they can be written by unknown code, but it went
too far and did it for all of them.
* This should enable a huge deal of optimization currently prevented for
that version.
* Also solves the TODO to not wholesale disable optimization for all
closure variables, but instead the written ones only.
* This applies proper loop SSA with an initial loop trace that behaves
more like an unknown type shape and only when type knowledge has
converged, it will be considered complete.
* Also some more binary add shape work, with initial support for the
right hand side of an add operation to make the decision.
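
The convergence idea behind the loop handling can be illustrated with a small
fixed point iteration over sets of type shapes. This is a simplified sketch,
the function name and the string shape names are hypothetical, and it
compares shape sets only for brevity.

```python
def analyzeLoopShapes(initial_shapes, body_transfer):
    """Iterate until the shapes entering the loop body stop changing.

    "body_transfer" is a hypothetical function mapping the set of shapes
    a variable may have at loop start to those it may have at loop end.
    """
    incoming = frozenset(initial_shapes)

    while True:
        after_body = frozenset(body_transfer(incoming))

        # The loop start sees both the initial value and whatever the
        # previous iteration left behind.
        merged = incoming | after_body

        if merged == incoming:
            # Type knowledge has converged, the result is complete.
            return incoming

        incoming = merged


def addFloatBody(shapes):
    # Models "x = x + 1.5", both int and float inputs give float.
    return {"float" for _shape in shapes}


shapes = analyzeLoopShapes({"int"}, addFloatBody)
```

Here "shapes" ends up containing both "int" and "float", since the variable
is an int on first entry and a float on every later iteration.
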
* Adding "finalize" method that deletes attribute values of nodes that
will cause issues with cyclic dependencies.
* This achieves releasing local dicts nodes for scopes that got
propagated.
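
A minimal sketch of such a "finalize" method, assuming nodes that reference
both their parent and their children and thereby form reference cycles. The
attribute names are hypothetical.

```python
class NodeSketch:
    """Hypothetical tree node with parent and child links."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

        if parent is not None:
            parent.children.append(self)

    def finalize(self):
        # Finalize the subtree first, while the links still exist.
        for child in self.children:
            child.finalize()

        # Deleting the attributes breaks the parent/child cycles, so
        # the nodes can be released without waiting for the cyclic
        # garbage collector.
        del self.parent
        del self.children
```

After "finalize", a node must not be used as part of the tree anymore, since
the attributes are gone rather than merely set to "None".
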
* This removes variable and version from the variable traces, which then
can be used as value traces too.
* In the change to value tracing, variable and version is nothing we can
rely on, so these APIs got removed, and users updated.
* This means the version must be stored outside of the trace for
some node types.
* This disallows some assertions about trace values, as they become
ignorant of what they are for.
* Also picking the C type was in fact independent of the trace and moved
to the outside.
* An outline is a function body with a return value used as
an expression. It has its own variable scope and can have
its own frames.
* Using them for contractions of all kinds. The Python2 set
and dict contractions were falsely having an own frame,
this is corrected with this change too.
* Using them for class dictionary creations. To allow for
nested classes, this required cleaning up how the locals
dictionary is handled.
* There are now entry points and a proper base class for the
functions that are entry points. Outlines are not entry points.
* This should allow for more optimization to happen across
list contractions and across class bodies.
* A very important part is that nested frames are now supported
and handled.
* Better PyLint 1.7 support, fixed some new warnings
it gives.
* Make all call node creations go through either factory or
new helper function.
* Added TODOs.