Renaming Spark Part-NNNN Files on S3
A recurring annoyance with Spark jobs is that they write their output as files named part-nnnn. This is a consequence of Spark's distributed execution: each task writes its own file. It's not possible to set the file name directly before writing, and modifying the underlying functions is not easy. The simplest way to get the names you want is to rename the files on S3 directly once they have been written. If you have only a few files, you can rename them manually in the S3 web console; if you have many files spread across different folders, you can use the AWS SDK for S3.
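As a rough illustration of the SDK route, here is a minimal Python sketch using boto3 client calls (`list_objects_v2` via a paginator, `copy_object`, `delete_object`). S3 has no rename operation, so each rename is a copy followed by a delete. The function names `plan_renames` and `rename_part_files`, and the `<base_name>-<index><extension>` naming scheme, are assumptions made up for this example, not something Spark or S3 prescribes.

```python
import posixpath
import re


def plan_renames(keys, base_name):
    """Map each Spark part-file key to a new key in the same folder.

    The target scheme "<base_name>-<index><extension>" is an assumption
    for this sketch; the index restarts at 0 in every folder.
    """
    counters = {}
    plan = []
    for key in sorted(keys):
        folder, name = posixpath.split(key)
        if not re.match(r"part-\d+", name):
            continue  # skip _SUCCESS markers and anything else
        # Keep everything after the first dot, e.g. ".snappy.parquet".
        _, dot, ext = name.partition(".")
        index = counters.get(folder, 0)
        counters[folder] = index + 1
        new_name = f"{base_name}-{index:04d}{dot}{ext}"
        plan.append((key, posixpath.join(folder, new_name)))
    return plan


def rename_part_files(s3, bucket, prefix, base_name):
    """Rename every part file under `prefix` by copying, then deleting.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3.get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix)
        for obj in page.get("Contents", [])
    ]
    plan = plan_renames(keys, base_name)
    for old_key, new_key in plan:
        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
        )
        s3.delete_object(Bucket=bucket, Key=old_key)
    return plan
```

With a hypothetical bucket and prefix, a call such as `rename_part_files(boto3.client("s3"), "my-bucket", "output/", "report")` would turn `output/2020/part-00000-<uuid>.csv` into `output/2020/report-0000.csv`. Note that `copy_object` only handles objects up to 5 GB; for larger files the client's managed `copy` method is the usual alternative.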