Spark API

We recommend using the Spark API. Spark processes data in partitions of roughly 128 MB each, so it does not need to hold the whole dataset in memory at once. Pandas, by contrast, loads the entire dataset into memory: its API is easier to use, but it scales poorly to large datasets.
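
To make the partition size concrete, here is a rough back-of-the-envelope sketch in plain Python (no Spark required). It only estimates how many partitions of about 128 MB a dataset would split into. The helper name `estimate_partitions` and the 10 GiB dataset size are hypothetical, chosen for illustration; the real partition count in Spark depends on the file format, the cluster configuration, and how the data is laid out.

```python
import math

# Assumed partition size, taken from the ~128 MB figure in the text above.
PARTITION_BYTES = 128 * 1024 * 1024


def estimate_partitions(dataset_bytes: int, partition_bytes: int = PARTITION_BYTES) -> int:
    """Roughly estimate how many partitions a dataset of this size splits into."""
    if dataset_bytes <= 0:
        return 0
    # Round up: a trailing partial chunk still occupies its own partition.
    return math.ceil(dataset_bytes / partition_bytes)


# Hypothetical 10 GiB dataset.
dataset_bytes = 10 * 1024 ** 3
print(estimate_partitions(dataset_bytes))  # 80
```

In this estimate, each worker only has to handle one chunk of about 128 MB at a time, whereas Pandas would need the full 10 GiB (plus overhead) to fit in the memory of a single machine.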