From the Python documentation:
* NotImplemented: "Special value which can be returned by the 'rich comparison'
special methods (__eq__(), __lt__(), and friends), to indicate that the
comparison is not implemented with respect to the other type."
* NotImplementedError: "This exception is derived from RuntimeError. In user
defined base classes, abstract methods should raise this exception when they
require derived classes to override the method."
* E101: indentation contains mixed spaces and tabs
* E111: indentation is not a multiple of four
* E128: continuation line under-indented for visual indent
* E302: expected 2 blank lines, found 1
* W191: indentation contains tabs
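The two quoted definitions describe different mechanisms. Below is a minimal sketch using hypothetical classes (`Version`, `Shape`, `Square`). A rich comparison returns `NotImplemented` for a foreign type so that Python can try the reflected operation. An abstract method raises `NotImplementedError`. Writing `raise NotImplemented` is a bug: `NotImplemented` is not an exception, so Python raises a `TypeError` instead.

```python
class Version(object):
    """Hypothetical value type used only to illustrate NotImplemented."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        if not isinstance(other, Version):
            # Not an error: Python will try other.__eq__ and then fall back
            # to an identity comparison.
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __ne__(self, other):
        # Needed on Python 2, where != is not derived from __eq__.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result


class Shape(object):
    """Hypothetical abstract base class."""

    def area(self):
        # Derived classes must override this method.
        raise NotImplementedError("subclasses must implement area()")


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side
```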
The repo contains roughly 80 Python scripts. "snake_case" naming is used for
local variables in all of those scripts. This is the style recommended by the
PEP 8 naming conventions (Python Software Foundation), and it is typically
associated with idiomatic Python code.
However, prior to this commit, nine of the 80 scripts also contained at least
one instance of "camelCase" naming.
This commit improves consistency in the Python code base by making sure that
these nine remaining files follow the variable naming convention used for
Python code in the project.
References:
* PEP 8: https://www.python.org/dev/peps/pep-0008/
* pep8-naming: https://pypi.python.org/pypi/pep8-naming
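One way to find leftover camelCase locals is to walk each function's syntax tree. The sketch below is a rough, hypothetical approximation of the kind of check pep8-naming performs. `find_camel_case_locals` is not part of that tool. It only flags names that are assigned inside functions, such as assignment and loop targets. Function parameters and module-level names are ignored.

```python
import ast
import re

# Lowercase start followed later by an uppercase letter, e.g. "fileName".
_CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*[A-Z]")


def find_camel_case_locals(source):
    """Return the sorted camelCase names assigned inside functions."""
    names = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Store context means the name is being assigned to.
            if (isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
                    and _CAMEL_CASE.match(node.id)):
                names.add(node.id)
    return sorted(names)


example = '''
def load(path):
    fileName = path
    line_count = 0
    for lineText in open(fileName):
        line_count += 1
    return line_count
'''
camel_names = find_camel_case_locals(example)  # ['fileName', 'lineText']
```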
Fixes:
* multiple statements on one line (colon) (E701)
* missing whitespace around arithmetic operator (E226)
* missing whitespace around operator (E225)
* closing bracket does not match visual indentation (E124)
* blank line contains whitespace (W293)
* continuation line missing indentation or outdented (E122)
* continuation line over-indented for hanging indent (E126)
* missing expected blank line (E301)
* trailing whitespace (W291)
* unexpected spaces around keyword / parameter equals (E251)
* whitespace after '(', '[' or '{' (E201)
* whitespace before ')', ']' or '}' (E202)
* whitespace before ',' or ':' (E203)
Fixes:
* blank line at end of file
* closing bracket does not match indentation of opening bracket's line
* continuation line over-indented for hanging indent
* continuation line over-indented for visual indent
* continuation line unaligned for hanging indent
* inline comment should start with '# '
* missing whitespace around arithmetic operator
* missing whitespace around bitwise or shift operator
* multiple imports on one line
* multiple spaces after ':'
* multiple spaces after operator
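For illustration, here is a small hypothetical module written in the style these fixes converge on. Each comment names the check that the pre-fix form would have triggered. The functions themselves are only placeholders.

```python
import os  # E401: one import per line (was "import os, sys")
import sys


def script_name():
    return os.path.basename(sys.argv[0])


def scale(values, factor=2):  # E251: no spaces around "=" for defaults
    # E201/E202/E226: was "[ v*factor for v in values ]"
    return [v * factor for v in values]


def set_flag(mask, bit):
    # E227: was "mask|(1<<bit)"
    return mask | (1 << bit)


def first_or_default(items, default=None):
    if not items:  # E701: the body used to follow the colon on this line
        return default
    return items[0]
```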
PEP 8 regressions can be found by running:
```
flake8 --ignore=E101,E111,E121,E122,E123,E124,E125,E126,E127,E128,E129,E131,E201,E202,E203,E222,E225,E226,E227,E231,E241,E251,E261,E262,E265,E301,E302,E303,E401,E501,E701,W191,W291,W293,W391 .
```
The ignored codes are minor PEP 8 violations that still exist in the repo.
Excluding them means the command reports only new regressions.
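One option is to keep the ignore list in a small Python helper so that individual codes are easy to drop once their violations are fixed. The sketch below is hypothetical: `IGNORED_CODES`, `flake8_command` and `run_flake8` are not part of the repo. It rebuilds exactly the command above from a tuple.

```python
import subprocess

# Minor PEP 8 violations still present in the repo (same list as above).
IGNORED_CODES = (
    "E101", "E111", "E121", "E122", "E123", "E124", "E125", "E126", "E127",
    "E128", "E129", "E131", "E201", "E202", "E203", "E222", "E225", "E226",
    "E227", "E231", "E241", "E251", "E261", "E262", "E265", "E301", "E302",
    "E303", "E401", "E501", "E701", "W191", "W291", "W293", "W391",
)


def flake8_command(paths=(".",), ignored=IGNORED_CODES):
    """Build the argument list for the PEP 8 regression check."""
    return ["flake8", "--ignore=" + ",".join(ignored)] + list(paths)


def run_flake8(paths=(".",)):
    """Run flake8 (which must be installed) and return its exit status."""
    return subprocess.call(flake8_command(paths))
```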
Python 2's `map` function [1] returns a list, whereas Python 3's `map`
function [2] returns an iterator (a `map` object). The former is
subscriptable; the latter is not.
This patch explicitly converts the result of some `map` operations to a
list, so that they behave as intended on both Python 2 and 3.
[1] https://docs.python.org/2/library/functions.html#map
[2] https://docs.python.org/3/library/functions.html#map
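Here is a minimal illustration using a hypothetical `first_square` helper. Wrapping `map` in `list()` restores indexing on Python 3. A `map` object is also single-pass, so the conversion also matters whenever the result is iterated more than once.

```python
def first_square(numbers):
    # list() forces evaluation, so indexing works on Python 2 and 3 alike.
    squares = list(map(lambda n: n * n, numbers))
    return squares[0]


lazy = map(abs, [-1, -2])
# On Python 3, lazy[0] would raise TypeError. Iterating still works,
# but only once: a second pass over `lazy` yields nothing.
values = list(lazy)
```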
PEP 3106 [1] changed the behavior of the dictionary `items` method.
In Python 2, `items` builds a real list of tuples, whereas `iteritems`
returns an iterator. PEP 3106 changes Python 3's `items` to return a
lazy view object (much like Python 2.7's `viewitems`) instead of a list,
and removes `iteritems` from Python 3 entirely.
This patch switches the code to use `items` on both Python 2 and 3. This
could hurt performance on Python 2, because there `items` builds the full
list of key/value tuples in memory.
[1] https://www.python.org/dev/peps/pep-3106/
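Here is a short illustration of Python 3 semantics (variable names are arbitrary) showing the difference between an `items` view and a list snapshot. Code that only iterates over `items()` behaves the same on both versions.

```python
counts = {"a": 1, "b": 2}

# Python 3: items() returns a view; no list of tuples is built.
items_view = counts.items()
# list() makes an explicit snapshot, which is what Python 2's items() returns.
snapshot = list(counts.items())

counts["c"] = 3  # the view sees this change; the snapshot does not

# Iteration-only code behaves the same on Python 2 and 3.
total = sum(value for _, value in counts.items())
```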
All strings are sequences of Unicode characters in Python 3. This is a
fundamental change from Python 2, whose `str` type holds bytes, although
Python 2 does have a separate `unicode` string type. This patch changes the
file reader to use the `codecs` module, so that on Python 2 the file contents
are decoded into `unicode` strings. From there, the strings are meant to
behave the same on Python 2 and 3. The rest of the patch updates the code to
work natively with Unicode strings.
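Below is a minimal sketch of that reading pattern, using a hypothetical `read_unicode_file` helper. `codecs.open` decodes while reading, so the result is `unicode` on Python 2 and `str` on Python 3. `io.open(path, encoding=...)` is an equivalent alternative available on both versions. Recent Python 3 releases deprecate `codecs.open` in favor of the built-in `open`.

```python
import codecs
import os
import tempfile


def read_unicode_file(path, encoding="utf-8"):
    """Hypothetical helper: return the file's contents as a text string."""
    with codecs.open(path, "r", encoding=encoding) as f:
        return f.read()


# Round-trip a non-ASCII string through a temporary UTF-8 file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(u"caf\u00e9".encode("utf-8"))
text = read_unicode_file(path)
os.remove(path)
```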
To test the class `GraphemeClusterBreakPropertyTable`:
```
$ python2 utils/gyb --test \
    -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
    -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
    -DCMAKE_SIZEOF_VOID_P=8 \
    -o /tmp/UnicodeExtendedGraphemeClusters.cpp.2.7.tmp \
    ./stdlib/public/stubs/UnicodeExtendedGraphemeClusters.cpp.gyb
$ python3 utils/gyb --test \
    -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
    -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
    -DCMAKE_SIZEOF_VOID_P=8 \
    -o /tmp/UnicodeExtendedGraphemeClusters.cpp.3.5.tmp \
    ./stdlib/public/stubs/UnicodeExtendedGraphemeClusters.cpp.gyb
$ diff -u /tmp/UnicodeExtendedGraphemeClusters.cpp.2.7.tmp \
    /tmp/UnicodeExtendedGraphemeClusters.cpp.3.5.tmp
```
To test the method `get_grapheme_cluster_break_tests_as_UTF8`:
```
$ python2 utils/gyb --test \
    -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
    -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
    -DCMAKE_SIZEOF_VOID_P=8 \
    -o /tmp/UnicodeGraphemeBreakTest.cpp.2.7.tmp \
    ./unittests/Basic/UnicodeGraphemeBreakTest.cpp.gyb
$ python3 utils/gyb --test \
    -DunicodeGraphemeBreakPropertyFile=./utils/UnicodeData/GraphemeBreakProperty.txt \
    -DunicodeGraphemeBreakTestFile=./utils/UnicodeData/GraphemeBreakTest.txt \
    -DCMAKE_SIZEOF_VOID_P=8 \
    -o /tmp/UnicodeGraphemeBreakTest.cpp.3.5.tmp \
    ./unittests/Basic/UnicodeGraphemeBreakTest.cpp.gyb
$ diff -u /tmp/UnicodeGraphemeBreakTest.cpp.2.7.tmp \
    /tmp/UnicodeGraphemeBreakTest.cpp.3.5.tmp
```
trie parameters and fix a few bugs
The bugs did not affect the correctness of the particular trie instance created
for the grapheme cluster break property, because the trie parameters that were
confused with each other happened to be equal.
Also, fix a trie size bug: we were creating a trie large enough to store
information for 0x200000 code points, but there are only 0x110000 (U+0000
through U+10FFFF). Fixing this saved only 15 bytes in the grapheme cluster
trie, because the extra range was compressed together with supplementary
planes that also had default values. It also made trie generation almost 2x
faster.
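The sizing error comes down to simple arithmetic, shown here as a quick Python check. The names are illustrative. The commit does not show how the generator originally derived its size, so rounding up to a full bit width is only an assumption about where 0x200000 came from.

```python
MAX_CODE_POINT = 0x10FFFF

# Code points run from 0 to 0x10FFFF inclusive.
code_point_count = MAX_CODE_POINT + 1        # 0x110000 == 1114112
bits_needed = MAX_CODE_POINT.bit_length()    # 21

# Covering every 21-bit value gives 2**21 == 0x200000 entries,
# almost twice the number of code points that actually exist.
oversized = 1 << bits_needed                 # 0x200000 == 2097152
wasted = oversized - code_point_count        # 0xF0000: 15 planes of 0x10000
```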
Swift SVN r19457
algorithm
The implementation uses a specialized trie that has not been tuned to the table
data. I tried guessing parameter values that should work well, but did not do
any performance measurements.
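The commit does not describe the trie's layout. As a hedged illustration of the general technique, here is a minimal two-stage lookup table. `build_two_stage_table`, `lookup`, and `block_bits` are hypothetical names, and `block_bits` stands in for the kind of parameter that would need tuning. The real generator may use more levels and different parameters.

```python
def build_two_stage_table(values, block_bits):
    """Split `values` into blocks of 2**block_bits entries and deduplicate.

    Returns (index, blocks): index[i] is the position in `blocks` of the
    block holding entries i * block_size .. (i + 1) * block_size - 1.
    """
    block_size = 1 << block_bits
    if len(values) % block_size:
        raise ValueError("len(values) must be a multiple of the block size")
    index, blocks, seen = [], [], {}
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        if block not in seen:
            # First occurrence of this block's contents: store it once.
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, block_bits, code_point):
    """Two array reads: select the block, then the entry inside it."""
    block = blocks[index[code_point >> block_bits]]
    return block[code_point & ((1 << block_bits) - 1)]


# Toy property data: 0 (the default) everywhere except two code points.
DEFAULT = 0
props = [DEFAULT] * 0x110000
props[0x0301] = 1    # hypothetical property value
props[0x1F1E6] = 2   # hypothetical property value
index, blocks = build_two_stage_table(props, block_bits=8)
```

Identical blocks, such as the all-default ones, are stored only once. Padding the input out to 0x200000 entries therefore adds index entries but no new blocks, which is consistent with the extra range in the trie size bug above costing so little space.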
There is no efficient way to initialize arrays with static data in Swift. The
required tables are being generated as C++ code in the runtime library.
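As a hedged illustration of generating such tables, here is a hypothetical Python helper that renders a list of integers as a C++ static array definition. The repo does this generation through gyb templates such as the `.cpp.gyb` files used in the test commands above, and this sketch does not reproduce them. The element type and array name are placeholders.

```python
def emit_cpp_array(name, values, element_type="uint8_t", per_line=12):
    """Hypothetical helper: render `values` as a C++ static array."""
    lines = ["static const %s %s[%d] = {" % (element_type, name, len(values))]
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        # C++ accepts a trailing comma in a braced initializer list.
        lines.append("  " + ", ".join(str(v) for v in chunk) + ",")
    lines.append("};")
    return "\n".join(lines)


cpp_source = emit_cpp_array("TrieIndex", [0, 0, 1, 2], per_line=2)
```

Emitting plain C++ initializers sidesteps the lack of efficient static array initialization in Swift. The compiler places the data directly in the binary, so no runtime construction is needed.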
rdar://16013860
Swift SVN r19340