Android Repository Pattern in the Real World

What is a Repository, and is it a must in your Android app?

The repository is a common pattern in Android. In fact, it is even more common in backend development: Spring Framework, .NET, and even Phoenix Framework all use a repository pattern.

On Android, however, I find the usual take on it strange; something about it doesn't seem right.

In this post we'll be talking about what a repository really is and how to adopt the pattern in your project.

"Android Repository Pattern Explained"

You see this everywhere: in blog posts, template projects, and talks. I believe it first showed up during the RxJava era as a way to provide the best user experience. The idea is that when the user opens the app, you show locally cached data first to make your app look fast and responsive, then you fetch data remotely from your API and show the updated data later. The caching mechanism is built in to ensure consistency between local and remote. To illustrate what I'm talking about, the code snippet below will serve as the running example throughout this post.

TaskRepository Example

data class Task(val id: String, val content: String, val isDone: Boolean)

interface TaskDataSource {
    suspend fun listTasks(): List<Task>
    suspend fun getTask(id: String): Task?
    suspend fun saveTask(task: Task)
}

class RemoteTaskDataSource : TaskDataSource {
    override suspend fun listTasks(): List<Task> = TODO("list from API")

    override suspend fun getTask(id: String): Task? = TODO("get from API")

    override suspend fun saveTask(task: Task) = TODO("save to API")
}

class LocalTaskDataSource : TaskDataSource {
    override suspend fun listTasks(): List<Task> = TODO("list from SharedPreferences/DB")

    override suspend fun getTask(id: String): Task? = TODO("get from SharedPreferences/DB")

    override suspend fun saveTask(task: Task) = TODO("save to SharedPreferences/DB")
}

interface TaskRepository {
    fun listTasks(forceUpdate: Boolean): Flow<List<Task>>
    suspend fun getTask(id: String, forceUpdate: Boolean): Task?
    suspend fun saveTask(task: Task)
}

class DefaultTaskRepository(
        val localTaskDataSource: TaskDataSource,
        val remoteTaskDataSource: TaskDataSource,
) : TaskRepository {
    override fun listTasks(forceUpdate: Boolean): Flow<List<Task>> {
        return flow {
            emit(localTaskDataSource.listTasks())
            if (forceUpdate) {
                val tasks = remoteTaskDataSource.listTasks()
                emit(tasks)
                tasks.forEach { localTaskDataSource.saveTask(it) }
            }
        }
    }

    override suspend fun getTask(id: String, forceUpdate: Boolean): Task? {
        if (forceUpdate) {
            val task = remoteTaskDataSource.getTask(id)
            if (task != null) {
                localTaskDataSource.saveTask(task)
            }
        }
        return localTaskDataSource.getTask(id)
    }

    override suspend fun saveTask(task: Task) {
        localTaskDataSource.saveTask(task)
        remoteTaskDataSource.saveTask(task)
    }
}

This is a dumbed-down version of the standard task/todo list example, where you are building a todo app. I'm sure you've seen it before, but for clarity, let's go through it again together.

  • Task is just a data class
  • TaskDataSource is an interface that defines what a data source for tasks can do
  • RemoteTaskDataSource implements TaskDataSource and makes the actual API calls to list/get/save tasks remotely
  • LocalTaskDataSource implements TaskDataSource and lists/gets/saves tasks locally using SharedPreferences or a database
  • TaskRepository is an interface that defines what a repository for tasks can do
  • DefaultTaskRepository implements TaskRepository, makes API calls through an instance of RemoteTaskDataSource and manages the local cache through LocalTaskDataSource
    • DefaultTaskRepository.listTasks() emits the task list from local first; if forceUpdate is true, it then emits the task list from remote and saves the updated tasks to local
    • DefaultTaskRepository.getTask() returns the task with the given ID from local, refreshing it from remote first if forceUpdate is true
    • DefaultTaskRepository.saveTask() saves a task locally, then remotely

Sample scenario

Now let's see where each of the functions defined on TaskRepository might be used.

  • DefaultTaskRepository.listTasks(true) would be called on the task list screen, so the local cache can be used later
  • DefaultTaskRepository.getTask("id", false) would be called when you enter the task detail screen from the task list screen, so you can use the local cache you fetched from remote seconds ago
  • DefaultTaskRepository.getTask("id", true) would be called when you enter the task detail screen from a push notification outside the app, so you have the latest data from the remote source
  • DefaultTaskRepository.saveTask() would be called on the task detail screen after an edit
  • DefaultTaskRepository.listTasks(false) would be called on the task list screen after you navigate back from the task detail screen, so you always get the latest data whether the task was actually modified or not

So what's wrong with it?

Nothing, as long as this exact pattern serves your requirements well. But things go south when your requirements differ and you still strictly enforce this pattern throughout your app. What if you don't need to show the cached data first? What if the data can be updated outside the app, and you always want the latest data from remote? Can you do this without a local data source? What is the difference between a Repository and a data source? Do you even need a remote data source then?

Let's explore some scenarios where this pattern can cause you problems and how to avoid them.

Not all DataSources are created equal

Both implementations of TaskDataSource support exactly the same operations. But not all data sources do. Take this UserDataSource, for example:

data class User(val id: String, val name: String)

interface UserDataSource {
    suspend fun getCurrentUser(): User
    suspend fun saveCurrentUser(user: User)
}

class LocalUserDataSource : UserDataSource {
    override suspend fun getCurrentUser(): User = TODO("get from local")

    override suspend fun saveCurrentUser(user: User) = TODO("save to local")
}

class RemoteUserDataSource : UserDataSource {
    override suspend fun getCurrentUser(): User = TODO("get from API")

    override suspend fun saveCurrentUser(user: User) {
        throw UnsupportedOperationException("RemoteUserDataSource does not support saveCurrentUser")
    }
}

interface UserRepository {
    suspend fun getCurrentUser(forceUpdate: Boolean): User
}

class DefaultUserRepository(
        val localUserDataSource: UserDataSource,
        val remoteUserDataSource: UserDataSource,
) : UserRepository {
    override suspend fun getCurrentUser(forceUpdate: Boolean): User {
        if (forceUpdate) {
            val user = remoteUserDataSource.getCurrentUser()
            localUserDataSource.saveCurrentUser(user)
        }
        return localUserDataSource.getCurrentUser()
    }
}

After a user logs in, you typically fetch a User from the API, right? Then you save it locally for later use so you don't have to fetch it a second time. In this scenario, an API for saveCurrentUser() doesn't exist. You have two problems here.

  1. An implementation refuses to perform an operation it claims to support. It is even worse if you do nothing instead of throwing an exception: you'll never know your data is not getting saved!

    • A possible solution would be not to declare saveCurrentUser() on the interface
  2. Imagine you call the constructor and mess up the argument order, which is easy to do because both parameters have the exact same interface type, like this: DefaultUserRepository(RemoteUserDataSource(), LocalUserDataSource()). Big trouble here: you get from local and save to remote. Wait, what? I'm confused.

    • A possible solution would be to accept a concrete class instead of an interface when you want specific behavior
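
The two possible solutions can be combined. What follows is only a minimal sketch of one way to do it, not the definitive fix: the names UserReader, LocalUserStore, RemoteUserApi and SaferUserRepository are hypothetical, the bodies are in-memory stand-ins, and suspend is dropped so the snippet runs with the Kotlin standard library alone.

```kotlin
data class User(val id: String, val name: String)

// Read-only contract: the only operation both sources really share.
interface UserReader {
    fun getCurrentUser(): User?
}

// Local storage is the only place that can save, so only it declares a save function.
class LocalUserStore : UserReader {
    private var cached: User? = null
    override fun getCurrentUser(): User? = cached
    fun saveCurrentUser(user: User) {
        cached = user
    }
}

// Stand-in for an API client; a real one would make a network call.
class RemoteUserApi(private val userOnServer: User) : UserReader {
    override fun getCurrentUser(): User? = userOnServer
}

// Concrete parameter types: swapping the two arguments no longer compiles.
class SaferUserRepository(
    private val local: LocalUserStore,
    private val remote: RemoteUserApi,
) {
    fun getCurrentUser(forceUpdate: Boolean): User? {
        if (forceUpdate) {
            remote.getCurrentUser()?.let { local.saveCurrentUser(it) }
        }
        return local.getCurrentUser()
    }
}

fun main() {
    val repository = SaferUserRepository(LocalUserStore(), RemoteUserApi(User("1", "Alice")))
    println(repository.getCurrentUser(forceUpdate = false)) // nothing cached yet
    println(repository.getCurrentUser(forceUpdate = true))  // fetched from remote, then cached
}
```

Because RemoteUserApi has no save function at all, there is nothing left to throw UnsupportedOperationException from, and SaferUserRepository(RemoteUserApi(...), LocalUserStore()) is rejected by the compiler instead of failing at runtime.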

Not everything is the Repository's responsibility

The Repository pattern in Android is somewhat more relaxed than in other frameworks. For example, in Spring Framework, the Repository definition is declared (and implemented) by the framework itself. Phoenix Framework uses Ecto.Repo, which is a module provided by the library. But in Android, you are free to implement whatever you want in a Repository, so it is easy to give it responsibilities it shouldn't have.

Repository should not prepare data for a specific scenario

You can see this coming from miles away. The requirement "local data first, remote data later" is fine in itself, but it shouldn't be implemented by a repository. A repository is supposed to just get the data; it should not know how to hand you data from two different sources for a specific screen. That should be the ViewModel's responsibility.

interface BetterTaskRepository {
    suspend fun listTasks(forceUpdate: Boolean): List<Task>
    suspend fun getTask(id: String, forceUpdate: Boolean): Task?
    suspend fun saveTask(task: Task)
}

class DefaultBetterTaskRepository(
        val localTaskDataSource: LocalTaskDataSource,
        val remoteTaskDataSource: RemoteTaskDataSource,
) : BetterTaskRepository {
    override suspend fun listTasks(forceUpdate: Boolean): List<Task> {
        if (forceUpdate) {
            val tasks = remoteTaskDataSource.listTasks()
            tasks.forEach { localTaskDataSource.saveTask(it) }
        }
        return localTaskDataSource.listTasks()
    }

    override suspend fun getTask(id: String, forceUpdate: Boolean): Task? {
        if (forceUpdate) {
            val task = remoteTaskDataSource.getTask(id)
            if (task != null) {
                localTaskDataSource.saveTask(task)
            }
        }
        return localTaskDataSource.getTask(id)
    }

    override suspend fun saveTask(task: Task) {
        localTaskDataSource.saveTask(task)
        remoteTaskDataSource.saveTask(task)
    }
}

class TaskListViewModel(val betterTaskRepository: BetterTaskRepository) : ViewModel() {
    val tasks: MutableLiveData<List<Task>> = MutableLiveData(listOf())
    fun getTaskList() {
        viewModelScope.launch {
            tasks.value = betterTaskRepository.listTasks(false)
            tasks.value = betterTaskRepository.listTasks(true)
        }
    }
}

Repository is not responsible for State Management

If you have some web frontend background, you know exactly what I'm talking about. For native mobile folks, let me explain.

Remember the sample scenarios where you only get local data because you know you already have the latest data available? The thing is, you are not using the "cache" as a cache: the local data must be there for the screen to work correctly.

The problem is that a specific flow, a specific sequence of function calls, is needed for the next function call on another screen to work correctly. Worse, this connection is undocumented. You can't infer it from reading an individual piece of code; you need to know all the components involved in the whole flow to understand it. Imagine a new screen or flow calling functions on your repository out of order: the "cache" will be messed up, and the existing screens will fail to work without anybody touching them.
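
To make the hidden ordering dependency concrete, here is a small sketch. It is a simplified stand-in, not the article's actual classes: InMemoryTaskDataSource and CachingTaskRepository are hypothetical names, both data sources are plain in-memory maps, and Flow and suspend are left out so it runs with the standard library alone. The caching rules mirror those of DefaultTaskRepository.

```kotlin
data class Task(val id: String, val content: String, val isDone: Boolean)

// In-memory stand-in used for both the local cache and the remote API.
class InMemoryTaskDataSource(initial: List<Task> = emptyList()) {
    private val tasks = initial.associateBy { it.id }.toMutableMap()
    fun listTasks(): List<Task> = tasks.values.toList()
    fun getTask(id: String): Task? = tasks[id]
    fun saveTask(task: Task) {
        tasks[task.id] = task
    }
}

// Same caching rules as DefaultTaskRepository, minus Flow and suspend.
class CachingTaskRepository(
    private val local: InMemoryTaskDataSource,
    private val remote: InMemoryTaskDataSource,
) {
    fun listTasks(forceUpdate: Boolean): List<Task> {
        if (forceUpdate) remote.listTasks().forEach { local.saveTask(it) }
        return local.listTasks()
    }

    fun getTask(id: String, forceUpdate: Boolean): Task? {
        if (forceUpdate) remote.getTask(id)?.let { local.saveTask(it) }
        return local.getTask(id)
    }
}

fun main() {
    val remote = InMemoryTaskDataSource(listOf(Task("1", "Buy milk", false)))
    val repository = CachingTaskRepository(InMemoryTaskDataSource(), remote)

    // A new screen that skips the list screen gets nothing from the "cache".
    println(repository.getTask("1", forceUpdate = false)) // null

    // Only after the list screen has populated the cache does the same call succeed.
    repository.listTasks(forceUpdate = true)
    println(repository.getTask("1", forceUpdate = false))
}
```

Nothing in the signature of getTask() tells the caller that listTasks(true) has to run first; the dependency only shows up at runtime.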

The actual problem we are solving here is "how to share the same piece of data across different screens without going insane", also known as "State Management". This is a common issue in all frontend development: web, native app, or cross-platform. Each of them takes a different approach to solving it. For example, Redux, commonly used with React, keeps one big immutable state object for the whole web app, and there are also MobX and many other patterns that I have no idea how they work. In Flutter, you can simply "lift the state up" or even choose to adopt Redux's idea.

In Android, using any sort of "shared" data was discouraged in the past. I vividly remember getting countless crashes from using a simple Singleton, due to Android's lifecycle and the fact that I had not implemented onSaveInstanceState() properly. On top of that, passing data between Activities or Fragments properly is a tedious task. I guess that is why the majority of the Android community solves this problem using persistent storage instead.

But this is 2022, people! We have several options now. We have ViewModel, which can outlive a Fragment. We can share a ViewModel across Fragments by scoping it to an Activity. We can even have a shared ViewModel scoped to a nav graph! Or you can take a different approach and use DI to control the scope.

Now, let's see a possible solution for this "State Management" problem.

class TaskListViewModel(val betterTaskRepository: BetterTaskRepository) : ViewModel() {
    val tasks: MutableLiveData<List<Task>> = MutableLiveData(listOf())
    fun getTaskList() {
        viewModelScope.launch {
            tasks.value = betterTaskRepository.listTasks(false)
            tasks.value = betterTaskRepository.listTasks(true)
        }
    }

    fun saveTask(task: Task) {
        viewModelScope.launch {
            betterTaskRepository.saveTask(task)
            // orEmpty() avoids a crash if the LiveData has no value yet
            val newTaskList = tasks.value.orEmpty().map {
                if (task.id == it.id) {
                    task
                } else {
                    it
                }
            }
            tasks.value = newTaskList
        }
    }
}

class TaskListFragment : Fragment() {
    val viewModel: TaskListViewModel by activityViewModels()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        viewModel.getTaskList()
    }

    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        super.onViewCreated(view, savedInstanceState)
        viewModel.tasks.observe(viewLifecycleOwner) { taskList: List<Task>? ->
            // set view task list
        }
    }
}

class TaskDetailFragment : Fragment() {
    val viewModel: TaskListViewModel by activityViewModels()
    val args: Args by navArgs()
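
    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        super.onViewCreated(view, savedInstanceState)
        // viewLifecycleOwner only exists once the view has been created, so observe here, not in onCreate()
        viewModel.tasks.observe(viewLifecycleOwner) { taskList: List<Task>? ->
            // firstOrNull() returns null instead of throwing when the ID is not in the list
            val task = taskList?.firstOrNull { it.id == args.taskId }
            // set view task detail
        }
    }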

    fun saveTask(task: Task) {
        viewModel.saveTask(task)
    }

    class Args(val taskId: String) : NavArgs
}

  • TaskListViewModel supports both listing tasks for TaskListFragment and saving a task for TaskDetailFragment
  • TaskListFragment and TaskDetailFragment live under the same Activity
  • They both share the same instance of val viewModel: TaskListViewModel by activityViewModels()
  • They both obviously observe the same piece of data, unlike the "cache" in LocalDataSource, where you can't tell at first glance that they read the same piece of data
  • Navigating from TaskListFragment to TaskDetailFragment passes only an ID as an argument
  • TaskDetailFragment observes the same list as TaskListFragment but uses only the task whose ID was passed in
  • When TaskDetailFragment saves the task, TaskListViewModel ensures the task in the list gets updated correctly
  • Navigating back from TaskDetailFragment to TaskListFragment requires no extra work, because they observe the same list, which is always up to date

The major benefit is that everything is obvious, with nothing hiding under layers of abstraction, and the risk of anyone messing up the "cache" is zero, because the LocalDataSource now truly serves as a cache for TaskListViewModel.getTaskList().

What if you only need data from RemoteDataSource?

What if your app has announcements, such as a popup or banners, that only ever need the latest data from remote? What would your repository look like?

data class Popup(val title: String, val imageUrl: String)
data class Banner(val title: String, val imageUrl: String, val content: String)

interface AnnouncementDataSource {
    suspend fun getPopup(): Popup
    suspend fun getHomeScreenBanners(): List<Banner>
    suspend fun getMarketPlaceScreenBanners(): List<Banner>
}

interface RetrofitAnnouncementApi {
    @GET("popup")
    suspend fun getPopup(): Popup

    @GET("banners/home")
    suspend fun getHomeScreenBanners(): List<Banner>

    @GET("banners/marketplace")
    suspend fun getMarketPlaceScreenBanners(): List<Banner>
}

class RemoteAnnouncementDataSource(val retrofitAnnouncementApi: RetrofitAnnouncementApi) : AnnouncementDataSource {
    override suspend fun getPopup() = retrofitAnnouncementApi.getPopup()
    override suspend fun getHomeScreenBanners() = retrofitAnnouncementApi.getHomeScreenBanners()
    override suspend fun getMarketPlaceScreenBanners() = retrofitAnnouncementApi.getMarketPlaceScreenBanners()
}

interface AnnouncementRepository {
    suspend fun getPopup(): Popup
    suspend fun getHomeScreenBanners(): List<Banner>
    suspend fun getMarketPlaceScreenBanners(): List<Banner>
}

class DefaultAnnouncementRepository(
        val remoteAnnouncementDataSource: RemoteAnnouncementDataSource
) : AnnouncementRepository {
    override suspend fun getPopup() = remoteAnnouncementDataSource.getPopup()
    override suspend fun getHomeScreenBanners() = remoteAnnouncementDataSource.getHomeScreenBanners()
    override suspend fun getMarketPlaceScreenBanners() = remoteAnnouncementDataSource.getMarketPlaceScreenBanners()
}

class HomeViewModel(val announcementRepository: AnnouncementRepository) : ViewModel() {
    val bannerList: MutableLiveData<List<Banner>> = MutableLiveData(listOf())
    fun getBanners() {
        viewModelScope.launch {
            val banners = announcementRepository.getHomeScreenBanners()
            bannerList.value = banners
        }
    }
}

class HomeFragment : Fragment() {
    val viewModel: HomeViewModel by viewModel()
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        viewModel.getBanners()
    }

    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        super.onViewCreated(view, savedInstanceState)
        viewModel.bannerList.observe(viewLifecycleOwner) {
            //set view
        }
    }
}

  • Popup and Banner are just data classes
  • AnnouncementDataSource defines what kinds of announcement you can get
  • RetrofitAnnouncementApi is an interface for which Retrofit creates an implementation
  • RemoteAnnouncementDataSource implements AnnouncementDataSource and accepts RetrofitAnnouncementApi as a constructor parameter
  • RemoteAnnouncementDataSource gets the actual data from the Retrofit interface
  • There is no LocalAnnouncementDataSource because it is not needed
  • AnnouncementRepository defines what kinds of announcement you can get
  • DefaultAnnouncementRepository implements AnnouncementRepository and accepts only RemoteAnnouncementDataSource as a constructor parameter
  • DefaultAnnouncementRepository delegates all the work to RemoteAnnouncementDataSource
  • HomeViewModel accepts the AnnouncementRepository interface as a constructor parameter
  • HomeViewModel gets and holds the data for HomeFragment to observe
  • HomeFragment calls HomeViewModel.getBanners() in an appropriate lifecycle callback and observes HomeViewModel.bannerList to render the UI

What's wrong with it?

The problem is obvious, right? AnnouncementDataSource, RetrofitAnnouncementApi and AnnouncementRepository declare the exact same functions. The only thing DefaultAnnouncementRepository and RemoteAnnouncementDataSource do is delegate the work down the chain.

Why on earth does calling a simple API have to be this hard? You know in your heart that DefaultAnnouncementRepository and RemoteAnnouncementDataSource are useless; they are just middlemen that satisfy the pattern. They are more code for you to maintain, and they don't really abstract anything for you, because all you need is the latest data from the API.

Possible solution

First, accept the fact that you only need data from the API, then remove all the middlemen, leaving only the useful components.

interface AnnouncementRepository {
    @GET("popup")
    suspend fun getPopup(): Popup

    @GET("banners/home")
    suspend fun getHomeScreenBanners(): List<Banner>

    @GET("banners/marketplace")
    suspend fun getMarketPlaceScreenBanners(): List<Banner>
}

class HomeViewModel(val announcementRepository: AnnouncementRepository) : ViewModel() {
    val bannerList: MutableLiveData<List<Banner>> = MutableLiveData(listOf())
    fun getBanners() {
        viewModelScope.launch {
            val banners = announcementRepository.getHomeScreenBanners()
            bannerList.value = banners
        }
    }
}

class HomeFragment : Fragment() {
    val viewModel: HomeViewModel by viewModel()
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        viewModel.getBanners()
    }

    override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
        super.onViewCreated(view, savedInstanceState)
        viewModel.bannerList.observe(viewLifecycleOwner) {
            //set view
        }
    }
}

Yes, you can just annotate your AnnouncementRepository interface and have Retrofit create an implementation for you.

But what about abstraction, unit testing, and all those best practices?

Calm down, they are still intact.

The abstraction is still there. From the ViewModel's perspective, it is getting data from an AnnouncementRepository; it doesn't know whether that is a Retrofit-generated implementation or a hand-coded DefaultAnnouncementRepository with all those middlemen. To be honest, the ViewModel doesn't have to know: you just give it an instance of AnnouncementRepository in your DI setup, and it'll work fine.

Unit testing is still possible. Since the ViewModel accepts the AnnouncementRepository interface, you can inject a mock or fake object just fine. Having some annotations on it doesn't mean it is not an interface anymore.
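
As a rough illustration of that testing claim, here is a sketch with a hand-written fake. Several things in it are assumptions made to keep it runnable without Android or Retrofit: FakeAnnouncementRepository and HomeBannerHolder are hypothetical names, HomeBannerHolder is a plain class standing in for HomeViewModel, and the Retrofit annotations and suspend modifiers are omitted from the interface.

```kotlin
data class Popup(val title: String, val imageUrl: String)
data class Banner(val title: String, val imageUrl: String, val content: String)

// Same shape as the article's interface; annotations and suspend are omitted here.
interface AnnouncementRepository {
    fun getPopup(): Popup
    fun getHomeScreenBanners(): List<Banner>
    fun getMarketPlaceScreenBanners(): List<Banner>
}

// Hand-written fake: returns canned data and never touches the network.
class FakeAnnouncementRepository(private val homeBanners: List<Banner>) : AnnouncementRepository {
    override fun getPopup() = Popup("Fake popup", "https://example.com/popup.png")
    override fun getHomeScreenBanners() = homeBanners
    override fun getMarketPlaceScreenBanners() = emptyList<Banner>()
}

// Hypothetical stand-in for HomeViewModel without the Android dependencies.
class HomeBannerHolder(private val announcementRepository: AnnouncementRepository) {
    var bannerList: List<Banner> = emptyList()
        private set

    fun getBanners() {
        bannerList = announcementRepository.getHomeScreenBanners()
    }
}

fun main() {
    val banners = listOf(Banner("Sale", "https://example.com/sale.png", "50% off"))
    val holder = HomeBannerHolder(FakeAnnouncementRepository(banners))
    holder.getBanners()
    check(holder.bannerList == banners) { "holder should expose the repository's banners" }
    println("ok")
}
```

The class under test only ever sees the interface, so it cannot tell the fake from a Retrofit-generated implementation.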

Wait, what if you want to change the implementation later?

Oh come on, do you really want to move away from Retrofit? Okay, fine, yes, you can. It is an interface, so you can just write a new implementation and plug it in whenever you want. After that you can safely delete the annotations from AnnouncementRepository and remove Retrofit from your project.

Read more on how retrofit "fits" in MVVM here

Not every Repository in a project needs to be the same

Well, this is kind of your choice, really. You can adopt the same pattern throughout your project for consistency, or pick and choose depending on the requirements.

Not every API call is a Repository

In Android, on paper, a Repository is an abstraction over data sources, but in reality we all know a Repository is mostly API calls with some local cache. So you end up with a Repository like this.

interface LoginRepository {
    suspend fun login(username: String, password: String): String
    suspend fun verifyPin(pin: String)
    suspend fun requestOtpForgotPassword(username: String): String
    suspend fun verifyOtpForgotPassword(otp: String)
}

Let's take a step back and consult Google Search on "repository". Thanks to Oxford Languages, we have the definition.

COMPUTING, a central location in which data is stored and managed.

Looks like our LoginRepository does a lot more than store and manage data, doesn't it? Does LoginRepository seem to be an appropriate name?

I don't think so.

If you are not convinced, let's take a look at what other frameworks' Repositories look like.

Spring Framework's Repository

Head over to our dear JVM friend, Spring Data JPA, and you'll see this:

public interface CrudRepository<T, ID> extends Repository<T, ID> {
    <S extends T> S save(S entity);

    Optional<T> findById(ID primaryKey);

    Iterable<T> findAll();

    long count();

    void delete(T entity);

    boolean existsById(ID primaryKey);
    // … more functionality omitted.
}

Spring's Repository provides CRUD operations and query abstraction. There is no way you can implement LoginRepository in Spring using a Repository alone. You would probably need at least a UserRepository, an OtpRepository and a LoginService for the logic. But in reality you wouldn't implement auth yourself; Spring has a security module.
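
To show the split between "store and fetch" and "login logic", here is a deliberately tiny sketch. It is written in Kotlin to match the rest of this post rather than in Java, and it is not Spring code: the CrudRepository below is a cut-down imitation of the Spring interface, InMemoryUserRepository and this LoginService are hypothetical, and the OTP part is left out entirely.

```kotlin
data class User(val username: String, val password: String)

// CRUD-only contract, loosely modelled on Spring Data's CrudRepository (not the real interface).
interface CrudRepository<T, ID> {
    fun save(entity: T): T
    fun findById(id: ID): T?
    fun findAll(): List<T>
}

// Hypothetical in-memory implementation keyed by username.
class InMemoryUserRepository : CrudRepository<User, String> {
    private val users = mutableMapOf<String, User>()

    override fun save(entity: User): User {
        users[entity.username] = entity
        return entity
    }

    override fun findById(id: String): User? = users[id]

    override fun findAll(): List<User> = users.values.toList()
}

// The login rule lives in a service; the repository only stores and fetches.
class LoginService(private val userRepository: CrudRepository<User, String>) {
    fun login(username: String, password: String): Boolean {
        val user = userRepository.findById(username) ?: return false
        // Plain-text comparison keeps the sketch short; real code would compare password hashes.
        return user.password == password
    }
}

fun main() {
    val userRepository = InMemoryUserRepository()
    userRepository.save(User("alice", "s3cret"))
    val loginService = LoginService(userRepository)
    println(loginService.login("alice", "s3cret")) // true
    println(loginService.login("alice", "wrong"))  // false
}
```

The repository here knows nothing about logging in, and the service knows nothing about how users are stored, which is the separation a single LoginRepository blurs.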

Phoenix Framework's Repo

Phoenix Framework uses Ecto as its persistence framework. If you read the documentation, it will tell you straight up that Repo is how you work with the database. It provides CRUD operations and query abstraction as well.

Ecto repositories are the interface into a storage system, be it a database like PostgreSQL or an external service like a RESTful API. The Repo module's purpose is to take care of the finer details of persistence and data querying for us. As the caller, we only care about fetching and persisting data. The Repo module takes care of the underlying database adapter communication, connection pooling, and error translation for database constraint violations.

Possible solution

Name it LoginService, AuthService or AuthApi, just the way it is. Don't abstract things that aren't there for the sake of a pattern; using an interface is enough for flexibility and testability.

Of course, I'm not telling you to drop Repository altogether; just use it where it makes sense. TaskRepository is a good example: it basically provides CRUD operations, a repository by the book. AnnouncementRepository is kind of a mixed bag: it has only read operations, but for two different types, Popup and Banner. I'd feel at ease naming it AnnouncementApi or splitting it into PopupRepository and BannerRepository. I don't know, this is hard. But LoginRepository is a no-no.

Conclusion

The Repository pattern is used to abstract the underlying data source(s), typically a table in a database, to separate database logic from business logic. In some cases, often in mobile apps, a Repository abstracts a group of API calls that access a database behind a backend.

Patterns are invented to solve problems. Use the right pattern to solve your problem; don't adopt a pattern for a problem you've never had.