Implementing FFT

The other day I had a discussion with someone about how to implement FFT-based convolution/polynomial multiplication - they were having a hard time squeezing their library implementation into the time limit on this problem, and it soon turned into a discussion on how to optimize it as much as possible. It turned out that the bit-reversing part of their iterative implementation was taking a pretty large amount of time, so I suggested not using bit-reversal at all, as is done in a few libraries. Since not a lot of people turned out to be familiar with it, I decided to write a post on some ways of implementing FFT and deriving these ways from one another. ...

June 1, 2024 · 19 min · 4001 words · nor

Convenient and near-optimal binary search on floating point numbers

This post was originally written on Codeforces; relevant discussion can be found here. TL;DR Use the following template (C++20) for efficient and near-optimal binary search (in terms of number of queries) on floating point numbers. Template template <std::size_t N_BITS> using int_least_t = std::conditional_t< N_BITS <= 8, std::uint8_t, std::conditional_t< N_BITS <= 16, std::uint16_t, std::conditional_t< N_BITS <= 32, std::uint32_t, std::conditional_t< N_BITS <= 64, std::uint64_t, std::conditional_t<N_BITS <= 128, __uint128_t, void>>>>>; // this should work for float and doubles, but for long doubles, std::bit_cast will fail on most systems due to being 80 bits wide. // to handle this, consider using doubles instead or std::bit_cast the long double to an 80-bit bitset and convert it to a 128 bit integer using to_ullong. /* * returns first x in [a, b] such that predicate(x) is false, conditioned on * logical_predicate(a) && !logical_predicate(b) && logical_predicate(-inf) && * !logical_predicate(inf) * here logical_predicate is the mathematical value of the predicate, not the * machine value of the predicate * it is guaranteed that non-nan, non-inf inputs are passed into the predicate * if NaNs or infinities are passed to this function as argument, then the * inputs to the predicate will start from smallest/largest representable * floating point numbers of the input type - this can be a source of errors * if you multiply the input by something > 1 for example * strictly speaking, the predicate should also be perfectly monotonic, but if * it gives out-of-order booleans in some small range [a, a + eps] (and the * correct order elsewhere), then the answer will be somewhere in between * the same holds for how denormals are handled by this code */ // template <bool check_infinities = false, bool distinguish_plus_minus_zero = false, bool deal_with_nans_and_infs = false, std::floating_point T> T partition_point_fp(T a, T b, auto&& predicate) { static constexpr std::size_t T_WIDTH = sizeof(T) * CHAR_BIT; using Int = int_least_t<T_WIDTH>; static constexpr auto is_negative = [](T x) { return static_cast<bool>((std::bit_cast<Int>(x) >> (T_WIDTH - 1)) & 1); }; if constexpr (distinguish_plus_minus_zero) { if (a == T(0.0) && b == T(0.0) && is_negative(a) && !is_negative(b)) { if (!predicate(-T(0.0))) { return -T(0.0); } else { // predicate(0.0) is guaranteed to be true because b = 0.0 return T(0.0); } } } if (a >= b) return NAN; if constexpr (deal_with_nans_and_infs) { // get rid of NaNs as soon as possible if (std::isnan(a)) a = -std::numeric_limits<T>::infinity(); if (std::isnan(b)) b = std::numeric_limits<T>::infinity(); // deal with infinities if (a == -std::numeric_limits<T>::infinity()) { if constexpr (check_infinities) { if (predicate(-std::numeric_limits<T>::max())) { a = -std::numeric_limits<T>::max(); } else { return -std::numeric_limits<T>::max(); } } else { a = -std::numeric_limits<T>::max(); } } if (b == std::numeric_limits<T>::infinity()) { if constexpr (check_infinities) { if (!predicate(std::numeric_limits<T>::max())) { b = std::numeric_limits<T>::max(); } else { return std::numeric_limits<T>::infinity(); } } else { b = std::numeric_limits<T>::max(); } } } // now a and b are both finite - deal with differently signed a and b if (is_negative(a) && !is_negative(b)) { // check 0 once if constexpr (distinguish_plus_minus_zero) { if (!predicate(-T(0.0))) { b = -T(0.0); } else if (predicate(T(0.0))) { a = T(0.0); } else { return T(0.0); } } else { if (!predicate(T(0.0))) { b = -T(0.0); } else { a = T(0.0); } } } // in the case a and b are both 0 after the above check, return 0 if (a == b) return T(0.0); // start actual binary search auto get_int = [](T x) { return std::bit_cast<Int, T>(x); }; auto get_float = [](Int x) { return std::bit_cast<T, Int>(x); }; if (b > 0) { while (get_int(a) + 1 < get_int(b)) { auto m = std::midpoint(get_int(a), get_int(b)); if (predicate(get_float(m))) { a = get_float(m); } else { b = get_float(m); } } } else { while (get_int(-b) + 1 < get_int(-a)) { auto m = std::midpoint(get_int(-b), get_int(-a)); if (predicate(-get_float(m))) { a = -get_float(m); } else { b = -get_float(m); } } } return b; } It is also possible to extend this to breaking early when a custom closeness predicate is true (for example, min(absolute error, relative error) < 1e-9 and so on), but for the sake of simplicity, this template does not do so. ...

March 5, 2024 · 8 min · 1611 words · nor

The Floyd-Warshall algorithm and its generalizations

This post was originally written on Codeforces; relevant discussion can be found here. TL;DR The Floyd-Warshall algorithm is a special case of the more general case of aggregation over paths in a graph — in this post, we look at a few examples and point to some references for further study. The algebraic path problem constitutes a family of problems that can be solved using similar techniques. This and this paper develop the theory of semirings in this context, and this treats some computational aspects of it. Kleene algebras are what end up being used in most of the examples given here, but semirings are more general structures. While not discussed in the post, the idea of a valuation algebra is a further generalization that shows connections to more problems of various types. Introduction Most people doing competitive programming who learn about the Floyd-Warshall algorithm learn it in the context of computing the lengths of the shortest paths between all pairs of vertices in a graph on \(n\) vertices in \(O(n^3)\), which is better than naively running Dijkstra or other kinds of shortest-path algorithms (other than Johnson’s algorithm for the sparse case), and can also identify negative cycles. ...

July 1, 2023 · 16 min · 3310 words · nor

A comprehensive guide to permutations for beginners

This post was originally written on Codeforces, relevant discussion can be found here. This post is intended to be an introduction to permutations for beginners, as there seem to be no resources other than cryptic (for beginners) editorials that talk about some cycles/composition, and people unfamiliar with permutations are left wondering what those things are. We’ll break the post into three major parts based on how we most commonly look at permutations. Of course, even though we will treat these approaches separately (though not so much for the last two approaches), in many problems, you might need to use multiple approaches at the same time and maybe even something completely new. Usually, you’d use these ideas in conjunction with something like greedy/DP after you realize the underlying setup. ...

January 9, 2023 · 33 min · 6854 words · nor

Greedoids: a formal way to look at families of greedily-solvable problems

This post was originally written on Codeforces; relevant discussion can be found here. Disclaimer: This is not an introduction to greedy algorithms. Rather, it is only a way to formalize and internalize certain patterns that crop up while solving problems that can be solved using a greedy algorithm. Note for beginners: If you’re uncomfortable with proving the correctness of greedy algorithms, I would refer you to this tutorial that describes two common ways of proving correctness — “greedy stays ahead” and “exchange arguments”. For examples of such arguments, I would recommend trying to prove the correctness of standard greedy algorithms (such as choosing events to maximize number of non-overlapping events, Kruskal’s algorithm, binary representation of an integer) using these methods, and using your favourite search engine to look up more examples. ...

January 4, 2023 · 35 min · 7373 words · nor

Catalan Numbers and Generating Uniform Balanced Bracket Sequences

This was written jointly by errorgorn and me and published on errorgorn’s blog, and the original post is here. You can also check out his personal blog here. Hi everyone! Today nor sir and I would like to talk about generating uniform bracket sequences. Over the past few years when preparing test cases, I have had to generate a uniform bracket sequence a few times. Unfortunately I could not figure out how, so I would like to write a post about this now. Hopefully this post would be useful to future problem setters :) Scroll down to the end of the post to copy our generators. ...

May 27, 2022 · 24 min · 5007 words · nor

Binary search and other "halving" methods

This post was originally written on Codeforces; relevant discussion can be found here. As a certain legendary grandmaster once said (paraphrased), if you know obscure techniques and are not red yet, go and learn binary search. The main aim of this tutorial is to collect and explain some ideas that are related to binary search, and collect some great tutorials/blogs/comments that explain related things. I haven’t found a comprehensive tutorial for binary search that also explains some intuition about how to get your own implementation right with invariants, as well as some language features for binary search, so I felt like I should write something that covers most of what I know about binary search. ...

November 7, 2021 · 27 min · 5555 words · nor

Traversing the complement graph in linear/near-linear time in multiple ways

This post was originally written on Codeforces; relevant discussion can be found here. In this post, I will focus on the following problem: 920E. More generally, we will look at how to do efficiently traverse the complement graph of a given graph in both dfs and bfs orders. In particular, the bfs order also allows us to find a shortest-path tree of the complement graph. This is my first post, so please let me know of any errors/typos that might have crept in. ...

August 12, 2021 · 8 min · 1539 words · nor
>