Android Repository Pattern in the Real World
What is a Repository and is it a must in your Android app
Repository is a common pattern in Android. Actually, it is even more common in backend development: Spring Framework, .NET and even Phoenix Framework all use the repository pattern.
On Android, however, I find the way it is used strange; it doesn't seem right somehow.
In this post we'll talk about what a repository really is and how to adopt the pattern in your project.
"Android Repository Pattern Explained"
You see this everywhere: in blog posts, template projects and talks. I believe it first showed up during the RxJava era as a way to provide the best user experience. The idea is that when the user opens the app, you show locally cached data first to make your app look fast and responsive, then you fetch data from your API and show the updated data later. The caching mechanism is built in to keep local and remote consistent. To illustrate what I'm talking about, the code snippet below will serve as the basis for the examples throughout this post.
TaskRepository Example
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

data class Task(val id: String, val content: String, val isDone: Boolean)

interface TaskDataSource {
    suspend fun listTasks(): List<Task>
    suspend fun getTask(id: String): Task?
    suspend fun saveTask(task: Task)
}

class RemoteTaskDataSource : TaskDataSource {
    override suspend fun listTasks(): List<Task> = TODO("list from API")
    override suspend fun getTask(id: String): Task? = TODO("get from API")
    override suspend fun saveTask(task: Task) = TODO("save to API")
}

class LocalTaskDataSource : TaskDataSource {
    override suspend fun listTasks(): List<Task> = TODO("list from SharedPreferences/DB")
    override suspend fun getTask(id: String): Task? = TODO("get from SharedPreferences/DB")
    override suspend fun saveTask(task: Task) = TODO("save to SharedPreferences/DB")
}

interface TaskRepository {
    fun listTasks(forceUpdate: Boolean): Flow<List<Task>>
    suspend fun getTask(id: String, forceUpdate: Boolean): Task?
    suspend fun saveTask(task: Task)
}

class DefaultTaskRepository(
    val localTaskDataSource: TaskDataSource,
    val remoteTaskDataSource: TaskDataSource,
) : TaskRepository {

    override fun listTasks(forceUpdate: Boolean): Flow<List<Task>> {
        return flow {
            // Cached data first, so the screen has something to show right away
            emit(localTaskDataSource.listTasks())
            if (forceUpdate) {
                // Then the fresh data, which also refreshes the cache
                val tasks = remoteTaskDataSource.listTasks()
                emit(tasks)
                tasks.forEach { localTaskDataSource.saveTask(it) }
            }
        }
    }

    override suspend fun getTask(id: String, forceUpdate: Boolean): Task? {
        if (forceUpdate) {
            val task = remoteTaskDataSource.getTask(id)
            if (task != null) {
                localTaskDataSource.saveTask(task)
            }
        }
        return localTaskDataSource.getTask(id)
    }

    override suspend fun saveTask(task: Task) {
        localTaskDataSource.saveTask(task)
        remoteTaskDataSource.saveTask(task)
    }
}
This is a dumbed-down version of the standard task/todo list example. I'm sure you've seen it before, but for clarity, let's go through it again together.
- Task is just a data class
- TaskDataSource is an interface that defines what a data source for tasks can do
- RemoteTaskDataSource implements TaskDataSource and actually makes API calls to list/get/save tasks remotely
- LocalTaskDataSource implements TaskDataSource and lists/gets/saves tasks locally using SharedPreferences or a database
- TaskRepository is an interface that defines what a repository for tasks can do
- DefaultTaskRepository implements TaskRepository, makes API calls through an instance of RemoteTaskDataSource and manages the cache locally through LocalTaskDataSource
- DefaultTaskRepository.listTasks() yields the task list from local; if forceUpdate is true, it then yields the task list from remote and saves the updated tasks to local
- DefaultTaskRepository.getTask() returns the task with a specific ID from local, refreshing it from remote first when the flag forceUpdate: Boolean is true
- DefaultTaskRepository.saveTask() saves a task locally, then remotely
Sample scenario
Now let's see where each of the functions defined on TaskRepository might have been used.
- DefaultTaskRepository.listTasks(true) would be called at the task list screen, so the local cache can be used later
- DefaultTaskRepository.getTask("id", false) would be called when you enter the task detail screen from the task list screen, so you can use the local cache you fetched from remote seconds ago
- DefaultTaskRepository.getTask("id", true) would be called when you enter the task detail screen from a push notification outside the app, so you have the latest data from the remote source
- DefaultTaskRepository.saveTask() would be called at the task detail screen after an edit
- DefaultTaskRepository.listTasks(false) would be called at the task list screen after you navigate back from the task detail screen, so you always get the latest data whether the task was actually modified or not
So what's wrong with it?
Nothing, as long as this exact pattern serves your requirements well. But things go south when your requirements differ and you strictly enforce this pattern throughout your app. What if you don't need to show the cached data first? What if the data can be updated outside the app, and you always want the latest data from remote? Can you do this without a local data source? What is the difference between a Repository and a data source? Do you even need a remote data source then?
Let's explore some scenarios where this pattern can cause you problems and how to avoid them.
Not all DataSources are created equal
Both implementations of TaskDataSource support every operation the interface declares. But not all data sources do. Take this UserDataSource for example:
data class User(val id: String, val name: String)

interface UserDataSource {
    suspend fun getCurrentUser(): User
    suspend fun saveCurrentUser(user: User)
}

class LocalUserDataSource : UserDataSource {
    override suspend fun getCurrentUser(): User = TODO("get from local")
    override suspend fun saveCurrentUser(user: User) = TODO("save to local")
}

class RemoteUserDataSource : UserDataSource {
    override suspend fun getCurrentUser(): User = TODO("get from API")
    override suspend fun saveCurrentUser(user: User) {
        throw UnsupportedOperationException("RemoteUserDataSource does not support saveCurrentUser")
    }
}

interface UserRepository {
    suspend fun getCurrentUser(forceUpdate: Boolean): User
}

class DefaultUserRepository(
    val localUserDataSource: UserDataSource,
    val remoteUserDataSource: UserDataSource,
) : UserRepository {
    override suspend fun getCurrentUser(forceUpdate: Boolean): User {
        if (forceUpdate) {
            val user = remoteUserDataSource.getCurrentUser()
            localUserDataSource.saveCurrentUser(user)
        }
        return localUserDataSource.getCurrentUser()
    }
}
After a user logs in, you typically fetch a User from the API, right? Then you save it locally for later use, so you don't have to fetch it a second time. In this scenario, an API for saveCurrentUser() doesn't exist. You have 2 problems here.
1. An implementation refuses to perform an operation it claims to support. It is even worse if you don't throw an exception and just do nothing: you'll never know your data is not getting saved!
   - A possible solution would be not to declare saveCurrentUser() on the interface
2. Imagine calling the constructor and mixing up the argument order because both parameters are the exact same interface, like this: DefaultUserRepository(RemoteUserDataSource(), LocalUserDataSource()). Big trouble here: you get from local and save to remote. Wait, what? I'm confused.
   - A possible solution would be to accept a concrete class instead of an interface when you want specific behavior
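Both suggestions can be sketched together. This is a minimal sketch under a few assumptions, not a definitive design: the names UserReader and SafeUserRepository are hypothetical, suspend is dropped so the snippet runs with only the standard library, and the data sources are in-memory stand-ins.

```kotlin
data class User(val id: String, val name: String)

// Hypothetical read-only contract: both sources can honestly implement all of it.
interface UserReader {
    fun getCurrentUser(): User?
}

// Remote source is read-only, so there is no save method left to throw from.
class RemoteUserDataSource : UserReader {
    override fun getCurrentUser(): User? = User("1", "Alice") // stand-in for an API call
}

// Local source is the only type that can save.
class LocalUserDataSource : UserReader {
    private var cached: User? = null
    override fun getCurrentUser(): User? = cached
    fun saveCurrentUser(user: User) { cached = user }
}

// Concrete parameter types: swapping the two arguments no longer compiles.
class SafeUserRepository(
    private val local: LocalUserDataSource,
    private val remote: RemoteUserDataSource,
) {
    fun getCurrentUser(forceUpdate: Boolean): User? {
        if (forceUpdate) {
            remote.getCurrentUser()?.let(local::saveCurrentUser)
        }
        return local.getCurrentUser()
    }
}
```

The trade-off is that the repository now depends on concrete classes, so a test has to subclass or replace them some other way; whether that is acceptable depends on how the data sources are tested in your project.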
Not everything is the Repository's responsibility
The Repository pattern in Android is somewhat more relaxed than in other frameworks. For example, in Spring Framework, the Repository definition is declared (and implemented) by the framework itself. Phoenix Framework uses Ecto.Repo, which is a module provided by the library. But in Android, you are free to implement whatever you want in a Repository, so it is easy to give it responsibilities it shouldn't have.
Repository should not prepare data for a specific scenario
You can see this coming from miles away. The requirement "local data first, remote data later" itself is fine, but it shouldn't be handled by a repository. A repository is supposed to just get the data; it should not know how to hand you data from 2 different sources for a specific screen. That should be the ViewModel's responsibility.
interface BetterTaskRepository {
    suspend fun listTasks(forceUpdate: Boolean): List<Task>
    suspend fun getTask(id: String, forceUpdate: Boolean): Task?
    suspend fun saveTask(task: Task)
}

class DefaultBetterTaskRepository(
    val localTaskDataSource: LocalTaskDataSource,
    val remoteTaskDataSource: RemoteTaskDataSource,
) : BetterTaskRepository {

    override suspend fun listTasks(forceUpdate: Boolean): List<Task> {
        if (forceUpdate) {
            val tasks = remoteTaskDataSource.listTasks()
            tasks.forEach { localTaskDataSource.saveTask(it) }
        }
        return localTaskDataSource.listTasks()
    }

    override suspend fun getTask(id: String, forceUpdate: Boolean): Task? {
        if (forceUpdate) {
            val task = remoteTaskDataSource.getTask(id)
            if (task != null) {
                localTaskDataSource.saveTask(task)
            }
        }
        return localTaskDataSource.getTask(id)
    }

    override suspend fun saveTask(task: Task) {
        localTaskDataSource.saveTask(task)
        remoteTaskDataSource.saveTask(task)
    }
}

class TaskListViewModel(val betterTaskRepository: BetterTaskRepository) : ViewModel() {
    val tasks: MutableLiveData<List<Task>> = MutableLiveData(listOf())

    fun getTaskList() {
        viewModelScope.launch {
            // "Local first, remote later" is now decided by the screen, not the repository
            tasks.value = betterTaskRepository.listTasks(false)
            tasks.value = betterTaskRepository.listTasks(true)
        }
    }
}
Repository is not responsible for State Management
If you have some web frontend background, you know exactly what I'm talking about. For native mobile folks, let me explain.
Remember the sample scenarios where you only get local data because you know you have the latest data available? The thing is, you are not using the "cache" as a cache: the local data must be there for the app to work correctly.
The problem is that a specific sequence of function calls is needed to make the next function call in another screen/flow work correctly. Worse, this connection is undocumented. You can't infer it from reading an individual piece of code; you need to know all the components involved in the whole flow to understand it. Imagine a new screen/flow calling functions from your repository out of order: the "cache" will be messed up and the existing screens will fail to work without anybody touching them.
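The hidden ordering dependency can be reproduced in a few lines. This is a minimal sketch under stated assumptions: InMemoryTaskSource and CachingTaskRepository are hypothetical stand-ins for the data sources and repository in this post, and plain functions replace suspend so it runs without a coroutine library.

```kotlin
data class Task(val id: String, val content: String, val isDone: Boolean)

// Hypothetical in-memory stand-in, used for both the local and the remote source.
class InMemoryTaskSource(initial: List<Task> = emptyList()) {
    private val tasks = initial.associateBy { it.id }.toMutableMap()
    fun listTasks(): List<Task> = tasks.values.toList()
    fun getTask(id: String): Task? = tasks[id]
    fun saveTask(task: Task) { tasks[task.id] = task }
}

// Same caching logic as the repository in this post, minus coroutines.
class CachingTaskRepository(
    private val local: InMemoryTaskSource,
    private val remote: InMemoryTaskSource,
) {
    fun listTasks(forceUpdate: Boolean): List<Task> {
        if (forceUpdate) remote.listTasks().forEach(local::saveTask)
        return local.listTasks()
    }

    fun getTask(id: String, forceUpdate: Boolean): Task? {
        if (forceUpdate) remote.getTask(id)?.let(local::saveTask)
        return local.getTask(id)
    }
}

// Returns what a detail screen sees before and after the list screen has filled the cache.
fun outOfOrderDemo(): Pair<Task?, Task?> {
    val remote = InMemoryTaskSource(listOf(Task("1", "Buy milk", false)))
    val repository = CachingTaskRepository(InMemoryTaskSource(), remote)

    // A new flow opens the detail screen first: the "cache" was never filled.
    val before = repository.getTask("1", forceUpdate = false)

    // Only after the list has been refreshed does the very same call succeed.
    repository.listTasks(forceUpdate = true)
    val after = repository.getTask("1", forceUpdate = false)
    return before to after
}
```

Nothing in the signature of getTask() hints that listTasks(true) has to run first, which is exactly the undocumented connection described above.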
The actual problem we are solving here is "How to share the same piece of data across different screens without going insane", also known as "State Management". This is a common issue in every kind of frontend development: web, native app or cross-platform. Each of them has a different approach to solving it. For example, Redux (popular with React) keeps one big immutable state object for the whole web app; there are also MobX and many other patterns I have no idea how they work. In Flutter, you can simply "lift the state up" or even choose to adopt Redux's idea.
In Android, using any sort of "shared" data was discouraged in the past. I vividly remember getting countless crashes from using a simple Singleton, due to Android's lifecycle and the fact that I had not implemented onSaveInstanceState() properly. On top of that, passing data between Activities or Fragments properly is a tedious task. I guess that is why the majority of the Android community solves this problem using persistent storage instead.
But this is 2022, people! We have several options now. We have ViewModel, which can outlive a Fragment. We can share a ViewModel across Fragments by scoping it to an Activity. We can even scope a shared ViewModel to a nav graph! Or you can take a different approach and use DI to control the scope.
Now, let's see a possible solution for this "State Management" problem.
class TaskListViewModel(val betterTaskRepository: BetterTaskRepository) : ViewModel() {
    val tasks: MutableLiveData<List<Task>> = MutableLiveData(listOf())

    fun getTaskList() {
        viewModelScope.launch {
            tasks.value = betterTaskRepository.listTasks(false)
            tasks.value = betterTaskRepository.listTasks(true)
        }
    }

    fun saveTask(task: Task) {
        viewModelScope.launch {
            betterTaskRepository.saveTask(task)
            // Replace the edited task in the shared list; orEmpty() avoids a !! on the LiveData value
            val newTaskList = tasks.value.orEmpty().map {
                if (task.id == it.id) {
                    task
                } else {
                    it
                }
            }
            tasks.value = newTaskList
        }
    }
}

class TaskListFragment : Fragment() {
    val viewModel: TaskListViewModel by activityViewModels()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        viewModel.getTaskList()
    }

    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        super.onViewCreated(view, savedInstanceState)
        viewModel.tasks.observe(viewLifecycleOwner) { taskList: List<Task>? ->
            // set view task list
        }
    }
}

class TaskDetailFragment : Fragment() {
    val viewModel: TaskListViewModel by activityViewModels()
    val args: Args by navArgs()

    // Observe in onViewCreated(): viewLifecycleOwner is not available yet in onCreate()
    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        super.onViewCreated(view, savedInstanceState)
        viewModel.tasks.observe(viewLifecycleOwner) { taskList: List<Task>? ->
            // firstOrNull avoids a crash when the task is not in the list
            val task = taskList?.firstOrNull { it.id == args.taskId }
            // set view task detail
        }
    }

    fun saveTask(task: Task) {
        viewModel.saveTask(task)
    }

    class Args(val taskId: String) : NavArgs
}
- TaskListViewModel supports both task listing for TaskListFragment and saving a task for TaskDetailFragment
- TaskListFragment and TaskDetailFragment live under the same Activity
- They both share the same instance of val viewModel: TaskListViewModel by activityViewModels()
- They both obviously observe the same piece of data, unlike the use of the "cache" LocalDataSource, where you don't know at first glance that they read the same piece of data
- Navigate from TaskListFragment to TaskDetailFragment passing only an ID as an argument
- TaskDetailFragment observes the same list as TaskListFragment but uses only the task with the ID that was passed
- When TaskDetailFragment saves the task, TaskListViewModel ensures the task on the list gets updated correctly
- Navigate back from TaskDetailFragment to TaskListFragment; there is no need to do anything because they observe the same list, which is always up-to-date
The major benefit is that everything is obvious, not hidden under layers of abstraction, and the risk of anyone messing up the "cache" is zero, because the LocalDataSource now truly serves as a cache for TaskListViewModel.getTaskList().
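The list update inside saveTask() is the piece that keeps both screens consistent, so it may be worth pulling out as a pure function that can be checked without any Android dependency. A minimal sketch; the extension name withUpdated is hypothetical and not part of the code above.

```kotlin
data class Task(val id: String, val content: String, val isDone: Boolean)

// Returns a new list where the task with the same ID is replaced; other tasks are kept as-is.
// If no task has that ID, the list comes back unchanged (the task is not appended).
fun List<Task>.withUpdated(task: Task): List<Task> =
    map { existing -> if (existing.id == task.id) task else existing }
```

In the ViewModel this could then be used as tasks.value = tasks.value.orEmpty().withUpdated(task).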
What if you only need data from RemoteDataSource?
What if your app has announcements, like a popup or banners, that only need the latest data from remote? What would your repository look like?
data class Popup(val title: String, val imageUrl: String)
data class Banner(val title: String, val imageUrl: String, val content: String)

interface AnnouncementDataSource {
    suspend fun getPopup(): Popup
    suspend fun getHomeScreenBanners(): List<Banner>
    suspend fun getMarketPlaceScreenBanners(): List<Banner>
}

interface RetrofitAnnouncementApi {
    @GET("popup")
    suspend fun getPopup(): Popup

    @GET("banners/home")
    suspend fun getHomeScreenBanners(): List<Banner>

    @GET("banners/marketplace")
    suspend fun getMarketPlaceScreenBanners(): List<Banner>
}

class RemoteAnnouncementDataSource(val retrofitAnnouncementApi: RetrofitAnnouncementApi) : AnnouncementDataSource {
    override suspend fun getPopup() = retrofitAnnouncementApi.getPopup()
    override suspend fun getHomeScreenBanners() = retrofitAnnouncementApi.getHomeScreenBanners()
    override suspend fun getMarketPlaceScreenBanners() = retrofitAnnouncementApi.getMarketPlaceScreenBanners()
}

interface AnnouncementRepository {
    suspend fun getPopup(): Popup
    suspend fun getHomeScreenBanners(): List<Banner>
    suspend fun getMarketPlaceScreenBanners(): List<Banner>
}

class DefaultAnnouncementRepository(
    val remoteAnnouncementDataSource: RemoteAnnouncementDataSource
) : AnnouncementRepository {
    override suspend fun getPopup() = remoteAnnouncementDataSource.getPopup()
    override suspend fun getHomeScreenBanners() = remoteAnnouncementDataSource.getHomeScreenBanners()
    override suspend fun getMarketPlaceScreenBanners() = remoteAnnouncementDataSource.getMarketPlaceScreenBanners()
}

class HomeViewModel(val announcementRepository: AnnouncementRepository) : ViewModel() {
    val bannerList: MutableLiveData<List<Banner>> = MutableLiveData(listOf())

    fun getBanners() {
        viewModelScope.launch {
            val banners = announcementRepository.getHomeScreenBanners()
            bannerList.value = banners
        }
    }
}

class HomeFragment : Fragment() {
    val viewModel: HomeViewModel by viewModel()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        viewModel.getBanners()
    }

    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        super.onViewCreated(view, savedInstanceState)
        viewModel.bannerList.observe(viewLifecycleOwner) {
            // set view
        }
    }
}
- Popup and Banner are just data classes
- AnnouncementDataSource defines what kind of announcement you can get
- RetrofitAnnouncementApi is an interface for Retrofit to create an implementation
- RemoteAnnouncementDataSource implements AnnouncementDataSource and accepts RetrofitAnnouncementApi as a constructor parameter
- RemoteAnnouncementDataSource gets the actual data from the Retrofit interface
- No LocalAnnouncementDataSource because it is not needed
- AnnouncementRepository defines what kind of announcement you can get
- DefaultAnnouncementRepository implements AnnouncementRepository and accepts only RemoteAnnouncementDataSource as a constructor parameter
- DefaultAnnouncementRepository delegates all the work to RemoteAnnouncementDataSource
- HomeViewModel accepts an interface AnnouncementRepository as a constructor parameter
- HomeViewModel gets and holds the data for HomeFragment to observe
- HomeFragment calls HomeViewModel.getBanners() at an appropriate lifecycle callback and observes HomeViewModel.bannerList to render the UI
What's wrong with it?
The problem is obvious, right? AnnouncementDataSource, RetrofitAnnouncementApi and AnnouncementRepository have the exact same functions. The only thing DefaultAnnouncementRepository and RemoteAnnouncementDataSource do is delegate the work down the chain.
Why on earth does calling a simple API have to be this hard? You know in your heart that DefaultAnnouncementRepository and RemoteAnnouncementDataSource are useless; they are just middlemen to satisfy the pattern. They are just more code for you to maintain, and they don't really abstract anything for you because you need the latest data from the API.
Possible solution
First, accept the fact that you only need data from the API, and just remove all the middlemen, leaving only the useful components.
interface AnnouncementRepository {
    @GET("popup")
    suspend fun getPopup(): Popup

    @GET("banners/home")
    suspend fun getHomeScreenBanners(): List<Banner>

    @GET("banners/marketplace")
    suspend fun getMarketPlaceScreenBanners(): List<Banner>
}

class HomeViewModel(val announcementRepository: AnnouncementRepository) : ViewModel() {
    val bannerList: MutableLiveData<List<Banner>> = MutableLiveData(listOf())

    fun getBanners() {
        viewModelScope.launch {
            val banners = announcementRepository.getHomeScreenBanners()
            bannerList.value = banners
        }
    }
}

class HomeFragment : Fragment() {
    val viewModel: HomeViewModel by viewModel()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        viewModel.getBanners()
    }

    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        super.onViewCreated(view, savedInstanceState)
        viewModel.bannerList.observe(viewLifecycleOwner) {
            // set view
        }
    }
}
Yes, you can just annotate your AnnouncementRepository interface and have Retrofit create an implementation for you.
But what about abstraction, unit testing and all those best practices?
Calm down, they are still intact.
The abstraction is still there. From the ViewModel's perspective, it is getting data from an AnnouncementRepository; it doesn't know whether that is a Retrofit-generated implementation or a hand-coded DefaultAnnouncementRepository with all those middlemen. To be honest, the ViewModel doesn't have to know. You just give it an instance of AnnouncementRepository in your DI setup, and it'll work fine.
Unit testing is still possible. Since the ViewModel accepts an interface, AnnouncementRepository, you can inject a mock object just fine. Having some annotations doesn't mean it is not an interface anymore.
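As a rough illustration of the testing point, a hand-written fake is enough; no mocking library or Retrofit is involved. The sketch below rests on assumptions: FakeAnnouncementRepository and BannerTitles are hypothetical names, the Retrofit annotations and suspend are left out so it runs on the standard library alone, and BannerTitles stands in for whatever consumes the repository (the ViewModel in this post).

```kotlin
data class Popup(val title: String, val imageUrl: String)
data class Banner(val title: String, val imageUrl: String, val content: String)

// Same shape as the interface in this post, minus @GET and suspend.
interface AnnouncementRepository {
    fun getPopup(): Popup
    fun getHomeScreenBanners(): List<Banner>
    fun getMarketPlaceScreenBanners(): List<Banner>
}

// Hypothetical fake for tests: returns canned data and never touches the network.
class FakeAnnouncementRepository(
    private val homeBanners: List<Banner> = emptyList(),
) : AnnouncementRepository {
    override fun getPopup() = Popup("Hello", "https://example.com/popup.png")
    override fun getHomeScreenBanners() = homeBanners
    override fun getMarketPlaceScreenBanners() = emptyList<Banner>()
}

// Hypothetical consumer: depends only on the interface, like the ViewModel does.
class BannerTitles(private val repository: AnnouncementRepository) {
    fun load(): List<String> = repository.getHomeScreenBanners().map { it.title }
}
```

The consumer cannot tell the fake from a Retrofit-generated implementation, which is the whole point of depending on the interface.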
Wait, what if you want to change the implementation later?
Oh come on, do you really want to move away from Retrofit? Okay, fine, yes, you can. It is an interface, so you can just write a new implementation and plug it in whenever you want. After that you can safely delete the annotations from AnnouncementRepository and remove Retrofit from your project.
Read more on how retrofit "fits" in MVVM here
Not every Repository in a project needs to be the same
Well, this is kind of your choice, really. You can adopt the same pattern throughout your project for consistency, or pick and choose depending on the requirements.
Not every API call is a Repository
In Android, on paper, Repository is an abstraction over data source but in reality, we all know Repository is mostly API calls with some local cache. So you have a Repository like this.
interface LoginRepository {
    suspend fun login(username: String, password: String): String
    suspend fun verifyPin(pin: String)
    suspend fun requestOtpForgotPassword(username: String): String
    suspend fun verifyOtpForgotPassword(otp: String)
}
Let's take a step back and consult Google Search on "repository". Thanks to Oxford Languages, we have the definition.
COMPUTING, a central location in which data is stored and managed.
Looks like our LoginRepository does a lot more than store and manage data, doesn't it? Does LoginRepository seem to be an appropriate name?
I don't think so.
If you are not convinced, let's take a look at what other frameworks' repositories look like.
Spring Framework's Repository
Head over to our dear JVM friend, Spring Data JPA, and you'll see this:
public interface CrudRepository<T, ID> extends Repository<T, ID> {
    <S extends T> S save(S entity);
    Optional<T> findById(ID primaryKey);
    Iterable<T> findAll();
    long count();
    void delete(T entity);
    boolean existsById(ID primaryKey);
    // … more functionality omitted.
}
Spring's Repository provides CRUD operations and query abstraction. There is no way you can implement LoginRepository in Spring using a Repository. You would probably need at least one UserRepository, one OtpRepository and a LoginService for the logic. But in reality you wouldn't implement auth yourself; Spring has a security module.
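Translated to Kotlin, that split might look roughly like this. It is a sketch only: CrudRepository here is a hypothetical, cut-down interface modeled on the Spring one, InMemoryRepository, UserRecord and LoginService are made-up names, and comparing plain-text passwords is purely for illustration.

```kotlin
// Hypothetical, trimmed-down CRUD contract: storage and lookup only, no business logic.
interface CrudRepository<T, ID> {
    fun save(entity: T): T
    fun findById(id: ID): T?
    fun findAll(): List<T>
}

// In-memory implementation so the sketch runs without a database.
class InMemoryRepository<T, ID>(private val idOf: (T) -> ID) : CrudRepository<T, ID> {
    private val rows = LinkedHashMap<ID, T>()
    override fun save(entity: T): T { rows[idOf(entity)] = entity; return entity }
    override fun findById(id: ID): T? = rows[id]
    override fun findAll(): List<T> = rows.values.toList()
}

data class UserRecord(val username: String, val password: String)

// The login logic lives in a service that uses a repository; it is not a repository itself.
class LoginService(private val users: CrudRepository<UserRecord, String>) {
    fun login(username: String, password: String): Boolean =
        users.findById(username)?.password == password
}
```

The repository only stores and finds records, while the decision about what counts as a successful login sits in the service.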
Phoenix Framework's Repo
Phoenix Framework uses Ecto as its persistence framework. If you read the documentation, it will tell you straight up that Repo is how you work with the database. It provides CRUD operations and query abstraction as well.
Ecto repositories are the interface into a storage system, be it a database like PostgreSQL or an external service like a RESTful API. The Repo module's purpose is to take care of the finer details of persistence and data querying for us. As the caller, we only care about fetching and persisting data. The Repo module takes care of the underlying database adapter communication, connection pooling, and error translation for database constraint violations.
Possible solution
Name it LoginService, AuthService or AuthApi, just the way it is. Don't abstract things that aren't there for the sake of a pattern; the use of an interface is enough for flexibility and testability.
Of course, I'm not telling you to drop Repository altogether; just use it where it makes sense. TaskRepository is a good example: it basically provides CRUD operations, a repository by the book. AnnouncementRepository is kind of a mixed bag: it has only read operations, but for different types, Popup and Banner. I'd feel at ease naming it AnnouncementApi or splitting it into PopupRepository and BannerRepository. I don't know, this is hard. But LoginRepository is a no-no.
Conclusion
The Repository pattern is used to abstract the underlying data source(s), typically a table in a database, to separate database logic from business logic. In some cases, often in mobile apps, a Repository abstracts a group of API calls that access a database behind a backend.
Patterns are invented to solve problems. Use the right pattern to solve your problem; don't adopt a pattern for a problem you've never had.