kotlin-fork

Roman Elizarov e26a3ad033 Speed up stdlib readLine function (#3185)

There are several performance optimizations:

* ByteBuffer/CharBuffer/StringBuilder objects are pre-allocated and
  reused on each call to readLine.
* The state for readLine is lazily allocated via JVM classloading
  (using a singleton object).
* There is an auto-detection heuristic for "directEOL" encodings, which
  represent LF ('\n') directly as the corresponding byte
  (UTF-8 and many single-byte encodings are like that).
  When a "directEOL" encoding is used, bytes are batched into a
  ByteBuffer for a single call to CharsetDecoder.decode, which
  results in higher throughput. Otherwise (UTF-16, etc.), a slower
  byte-by-byte approach is used.
* Bytes and chars are moved directly in and out of byte/char arrays;
  ByteBuffer/CharBuffer wrappers are used only to interface with the
  JVM CharsetDecoder class (which is the slowest piece).
* StringBuilder is not used at all for short lines (<=32 chars).
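
The "directEOL" check described above can be sketched as follows. This
is only an illustration, not the code from this commit: the helper name
looksDirectEol is hypothetical, and the sketch assumes that it is enough
to decode the single byte 0x0A and see whether exactly one '\n' char
comes out.

```kotlin
import java.nio.ByteBuffer
import java.nio.CharBuffer
import java.nio.charset.Charset

// Hypothetical helper (not part of the stdlib): returns true when the
// charset decodes the single byte 0x0A straight to the '\n' char.
fun looksDirectEol(charset: Charset): Boolean {
    val decoder = charset.newDecoder()
    val input = ByteBuffer.wrap(byteArrayOf(0x0A))
    val output = CharBuffer.allocate(4)
    // endOfInput = false: a multi-byte encoding such as UTF-16 simply
    // reports underflow here and produces no chars.
    decoder.decode(input, output, false)
    return output.position() == 1 && output.get(0) == '\n'
}

fun main() {
    for (cs in listOf(Charsets.UTF_8, Charsets.ISO_8859_1, Charsets.UTF_16)) {
        println("${cs.name()}: directEOL=${looksDirectEol(cs)}")
    }
}
```

A reader built on such a check could scan raw bytes for 0x0A and decode
a whole line in one call, falling back to byte-by-byte decoding when the
check fails.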

There are also some functional improvements to readLine:

* The restriction on "max chars per byte" is lifted, so readLine works
  with all encodings that the JVM supports.
* It supports on-the-fly changes to the system default charset, because
  it rechecks the current charset on each call and updates its decoder
  when needed.
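
The "recheck the charset on each call" idea can be illustrated with a
small cache. This is a minimal sketch under assumptions: the class name
DecoderCache and its method are hypothetical and do not mirror the
actual stdlib internals.

```kotlin
import java.nio.charset.Charset
import java.nio.charset.CharsetDecoder

// Hypothetical illustration: keep one decoder and replace it only when
// the requested charset differs from the one it was created for.
class DecoderCache {
    private var decoder: CharsetDecoder? = null

    @Synchronized
    fun decoderFor(charset: Charset): CharsetDecoder {
        val current = decoder
        if (current != null && current.charset() == charset) {
            // Same charset as last time: reuse the decoder after a reset.
            return current.reset()
        }
        // Charset changed (or first call): allocate a fresh decoder.
        return charset.newDecoder().also { decoder = it }
    }
}

fun main() {
    val cache = DecoderCache()
    val first = cache.decoderFor(Charset.defaultCharset())
    val second = cache.decoderFor(Charset.defaultCharset())
    println(first === second) // the same decoder object is reused
}
```

A caller would pass Charset.defaultCharset() on every read, so a change
of the default charset costs one decoder allocation and nothing more.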

All the other features of readLine function are retained:

* It does not read more bytes from System.in than needed, so it
  is compatible with other ways to read System.in. On-the-fly
  changes to System.in are supported.
* It is thread-safe. Its internal mutable state is protected by
  synchronization.
* There is an internal method for tests that supports explicit
  charset specification, but the name of this method has changed.
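
The "does not read more bytes than needed" property can be observed by
mixing readLine with a raw read from System.in. The sketch below is an
illustration only: the function name readLineThenRawByte is
hypothetical, and it assumes the default charset is ASCII-compatible
(UTF-8 or a typical single-byte encoding).

```kotlin
import java.io.ByteArrayInputStream

// Hypothetical demo: read one line with readLine(), then read the next
// raw byte from System.in to see where the stream position ended up.
fun readLineThenRawByte(): Pair<String?, Int> {
    val original = System.`in`
    try {
        // Replacing System.in on the fly is one of the supported cases.
        System.setIn(ByteArrayInputStream("abc\ndef\n".toByteArray(Charsets.US_ASCII)))
        val line = readLine()
        val nextByte = System.`in`.read()
        return line to nextByte
    } finally {
        System.setIn(original)
    }
}

fun main() {
    val (line, nextByte) = readLineThenRawByte()
    println("line=$line nextByte=$nextByte")
}
```

If readLine consumed only the first line and its terminator, the raw
read returns the byte for 'd' rather than a later byte or end of stream.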

There are additional tests:

* Check all supported encodings on JVM to make sure that readLine
  works correctly with them all.
* Check Unicode code points of different bit lengths with all standard
  Unicode encodings (UTF-8, UTF-16, and UTF-32 in LE/BE byte orders).

Benchmarks that compare different implementations of readLine,
including this one (readLine6NoLV in the set), can be found here:
https://github.com/elizarov/ReadLineBenchmark

Taking BufferedReader as 100% baseline we see that:

* The current readLine is 7.5 times slower than the BufferedReader
  baseline.
* The new implementation in this commit is 2.5 times slower than the
  baseline. It is ~3 times faster than the existing implementation of
  readLine.

Altogether, these optimizations are enough to read ~500K lines in a
sports programming setting under a 2s time limit with plenty of
headroom. An example that uses this version of readLine can be found
here:
https://codeforces.com/contest/1322/submission/73005366
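
For context, a typical line-by-line read loop in such a setting might
look like the sketch below. It is not taken from the linked submission;
the function sumLines and its nextLine parameter are hypothetical, and
the input is assumed to hold one integer per line.

```kotlin
// Hypothetical example: sum one integer per line until end of input.
// nextLine defaults to the stdlib readLine, which returns null at EOF.
fun sumLines(nextLine: () -> String? = ::readLine): Long {
    var sum = 0L
    while (true) {
        val line = nextLine() ?: break
        if (line.isBlank()) continue // tolerate empty lines
        sum += line.trim().toLong()
    }
    return sum
}

fun main() {
    println(sumLines())
}
```

Passing the line source as a parameter is only a convenience for
trying the loop without real standard input.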

#KT-37416 Fixed

2020-03-23 14:36:55 +03:00

Console.kt

Speed up stdlib readLine function (#3185)

2020-03-23 14:36:55 +03:00

Files.kt

Test reading files with incorrectly reported length

2019-09-28 19:23:26 +03:00

FileTreeWalks.kt

Update copyright.

2019-04-23 20:09:22 +03:00

IOStreams.kt

Update copyright.