There are several performance optimizations:
* ByteBuffer/CharBuffer/StringBuilder objects pre-allocated and are
reused on each call to readLine.
* The state for readLine is lazily allocated via JVM classloading
(using a singleton object).
* There is an auto-detection heuristic for "directEOL" encodings which
represent LF ('\n') directly as the corresponding byte
(UTF-8 and many single-byte encodings are like that).
When "directEOL" encoding is used, then bytes are batched into
ByteBuffer for a single call to CharsetDecoder.decode which
results in higher throughput. Otherwise (UTF-16, etc), slower
byte-by-byte approach is used.
* Bytes and chars are directly moved in/out of byte/char arrays and
ByteBuffer/CharBuffer wrappers are used only to interface with
JVM CharsetDecoder class (which is the slowest piece).
* StringBuilder is not used at all for short lines (<=32 chars).
There are also some function improvements to readLine functionality:
* Restriction on "max chars per byte" is lifted, so readLine works with
all encodings that JVM supports.
* It support on-the-fly changes to system default charset, because
it rechecks current charset on each call and updates it decoder
when needed.
All the other features of readLine function are retained:
* It does not read more bytes from System.in than needed, so it
is compatible with other ways to read System.in. On-the-fly
changes to System.in are supported.
* It is thread-safe. Its internal mutable state is protected by
synchronization.
* There is an internal method for tests that supports explicit
charset specification, but the name of this method has changed.
There are additional tests:
* Check all supported encodings on JVM to make sure that readLine
works correctly with them all.
* Check unicode code points of different bits length with all standard
unicode encodings (UTF-8, UTF-16, and UTF-32 in LE/HE byte orders).
Benchmarks that compare different implementations of readLine,
including this one (readLine6NoLV in the set) can be found here:
https://github.com/elizarov/ReadLineBenchmark
Taking BufferedReader as 100% baseline we see that:
* Current readLine is 7.5 times slower than BufferedReader baseline.
* New implementation in this commit is 2.5 timer slower than baseline.
It is ~3 times faster than existing implementation of readLine.
Altogether these optimizations are enough to enable reading of
~500K lines in sports programming setting under 2s time-limit with
plenty of headroom in time. Example that is using this version of
readLine can be found here:
https://codeforces.com/contest/1322/submission/73005366
#KT-37416 Fixed
Added container builders for lists, sets, and maps.
The new experimental type inference only works for the simple builders
with a single generic type. The versions with nullability and the maps
still require explicit specification of the types. Obviously explicit
specification is required for all users who are not using the new
experimental inference. Improving the type inference is something that
should be done separately and many things – including these builders –
will benefit from it, however, this is not a blocker for these builders
in my opinion.
After reading 'length' bytes from a file, check if there's extra content
available and read it into a resizable buffer, then append it after
the first 'length' bytes.
#KT-33864
The code generation uses a mixture of literal `\n` characters and `appendln`. The latter insert `\r\n` on Windows by default, causing generated files to contain a mixture of line endings.
This commit sets the `line.separator` system property for the generator to `\n` so that `appendln` will never insert `\r` characters. As an additional measure, `.gitattributes` files were added to checkout generated stdlib files always with LF line endings.
- switch to scientific notation when value is too big (greater than 1e14)
- allow at most 12 decimals, larger values are coerced to 12
- use scientific format with exactly two decimals in mantissa in JVM