DataFrame: like a table in a relational database. It has a schema that describes the data (column names and types). You can manipulate and combine DataFrames much as you would tables in SQL, using operations such as groupBy, count, sort, join, and where.
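RDD (Resilient Distributed Dataset): a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. It does not have an optimization engine, so RDD operations run as written, whereas DataFrame queries are optimized before execution.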