Using big data to identify tax risk

By William Brink, CPA, Ph.D., and Victoria Hansen, CPA, Ph.D.

Editor: Annette Nellen, Esq., CPA, CGMA

Technology enables the capture of immense amounts of data and provides various tools for analyzing them. To harness the power of this phenomenon, accountants need to know what data exist, where to find them, and how to use them to improve their audit, tax, and business consulting work. Accounting students need to gain familiarity and practice with big data in their accounting, audit, and tax studies. This column provides background on usable data readily available from the IRS and a case study in which students can use the IRS data to assess audit risk.

Big data and tax

The increased use of technology has generated tremendous amounts of electronic information. This information, often referred to as big data, provides significant opportunities for those able to identify and use it. Because the amount of information collected is extensive, companies need help analyzing and interpreting the captured data, as well as assistance in finding ways to use the information in those data to improve business processes. As business advisers, accountants are strategically placed to capitalize on this opportunity. Public accounting firms also have the opportunity to use big data to improve their own business processes and provide more efficient and effective client services. All of the Big Four accounting firms and many other large public accounting firms have specialized teams dedicated to using data analytics and big data to solve complex business problems.

To take advantage of these opportunities, accountants need to develop the skills to work with big data. Professional accounting organizations, such as the AICPA and the American Accounting Association, have taken steps to inform their members and provide them with numerous learning opportunities, such as annual big data conferences; webinars and seminars on working with and analyzing big data; and continuing education courses on big data and data analytics.

At the university level, data analytics courses and degrees have been created. In addition, big data and data analytics have been incorporated into accounting curricula. While audit courses have found ways to incorporate analytics and big data into their courses, finding ways to incorporate big data into the tax classroom curriculum has been more challenging.

A review of accounting education journals shows zero data analytics case studies dedicated to tax-specific topics. One challenge tax professors face when attempting to incorporate data analytics into their courses is time constraints. Therefore, it is important for class projects to complement the curriculum rather than detract from it.

From a practical standpoint, one easy way to illustrate to students how big data is used in a tax setting is to discuss how the IRS and state tax agencies use data from millions of tax returns to statistically create discriminant analysis functions that assess the likelihood of noncompliance. This allows tax authorities to select returns for audit more efficiently. Unfortunately, professors do not have access to the microdata used for this complex statistical analysis, but it is possible to introduce students to these data in aggregate form.

The IRS maintains and publishes statistical information gathered from various tax filings. This information is freely accessible on the IRS Statistics of Income (SOI) webpage. Tax professionals can use these free statistical data to add value to client engagements by using them to assess a client's tax audit risk (the risk of being audited). The SOI database provides a wealth of data related to many types of tax returns (e.g., individual, business, and estate) in Microsoft Excel format that can easily be used by the public. Introducing students to the SOI database will give them a greater appreciation for big data and allow them to see how client risk can be analyzed. These data can also be used to understand the number and size of the various types of taxpayer entities and the components of taxable income.

The accompanying case study is designed to give accounting students and entry-level accounting staff experience using statistical information to assess tax audit risk, reviewing individual tax returns, and working with Excel. The comprehensive nature of the case study allows the user to look beyond the effect of income and expense items on overall tax liability and to understand how positions taken on a tax return may affect tax audit risk. Users must verify the accuracy of tax return amounts, prepare review notes, calculate corrected amounts, use the SOI data to analyze audit risk, and answer questions by working with the data.

The audit risk assessment portion of the case requires the user to prepare an analysis using Excel, thus reinforcing spreadsheet skills. Using Excel in this case provides two benefits. First, incorporating Excel into the class is consistent with the AICPA's Model Tax Curriculum, which notes the need for students to develop the technological skills necessary to be successful in the tax profession. Second, students gain experience working with a program (Excel) that practitioners identify as an essential tax planning and compliance tool.

The accompanying case study may be used in its entirety as an individual income tax assignment, and all case study materials are provided for this purpose. The case study may also be used solely as a guideline for how the use of the SOI data and tax risk assessment can be taught in the classroom. Instructors can apply Parts 2 and 3 of the case study to any tax return project they currently use.

Readers can download sample instructions for students, a staff-prepared return, to-do notes on the return, and an answer key.

IRS Statistics of Income

Sec. 6108 requires the IRS to maintain statistics regarding the operation of the income tax laws. Under this section, the IRS must prepare and publish, at least annually, tax statistics related to classifications of taxpayers and of income; the amounts claimed or allowed as deductions, exemptions, and credits; and other facts deemed pertinent and valuable. To fulfill this mandate, the IRS collects data from various tax returns and other filings, such as individual income tax returns, corporate income tax returns, and annual returns of not-for-profit organizations. Because returns must be filed and processed before they can be used, there is a lag between the current year and the year covered by the available data. Statistics are compiled from these returns based on stratified probability samples, using classes such as size of income, industrial activity, and filing status. The statistical information is analyzed and reported annually in various IRS reports and is available online at the SOI website.
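To illustrate how a stratified probability sample can still yield population-level statistics, the following is a minimal Python sketch under simplifying assumptions: each sampled return in a stratum is weighted by the number of filed returns it represents (returns filed in the stratum divided by returns sampled from it), and the weighted sample amounts are summed. The strata, counts, and amounts are hypothetical, and the IRS's actual sample design and weighting procedures are more involved than this.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One sampling class, e.g., a size-of-income band (hypothetical)."""
    name: str
    returns_filed: int             # number of returns filed in this stratum
    sampled_amounts: list[float]   # line-item amounts on the sampled returns

    @property
    def weight(self) -> float:
        # Each sampled return stands in for this many filed returns.
        return self.returns_filed / len(self.sampled_amounts)


def estimated_total(strata: list[Stratum]) -> float:
    """Weighted estimate of the population total for one line item."""
    return sum(s.weight * sum(s.sampled_amounts) for s in strata)


# Hypothetical strata: a large stratum sampled lightly and a small
# stratum sampled heavily.
strata = [
    Stratum("lower income", returns_filed=1_000, sampled_amounts=[10.0, 20.0]),
    Stratum("higher income", returns_filed=10,
            sampled_amounts=[1_000.0, 2_000.0, 3_000.0, 4_000.0, 5_000.0]),
]
print(estimated_total(strata))  # 45000.0
```

The weights are what keep the estimate unbiased: a return drawn from a heavily sampled stratum represents fewer filed returns than one drawn from a lightly sampled stratum.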

The SOI's main reporting areas are individuals, businesses, charitable and exempt organizations, and IRS operations. Statistical data in these areas are reported in both Excel tables and in publications that may include additional discussion of the data (such as in the annual IRS Data Book and the quarterly Statistics of Income Bulletin).

For individuals, data are available regarding income, estate and gift, and international tax filings. The statistics related to income tax report items such as sources of income and average income, exemptions, deductions, taxable income, income tax, tax credits, and tax payments reported. Data can be viewed on a copy of the tax form (Form 1040, U.S. Individual Income Tax Return) or in Excel tables. The annual Individual Income Tax Returns Line Item Estimates (Publication 4801) reports, on a Form 1040, the number of tax returns filed with an entry on each specific line item, as well as the aggregate amount reported on each line item. The Excel tables report the same data by taxpayer adjusted gross income (AGI), taxpayer filing status, geographic location, and other parameters. Additional stratification is done to report information for only those taxpayers who reported itemized deductions, for example, or for only those taxpayers who had a Form W-2, Wage and Tax Statement. Statistics related to individual income tax compliance, such as the use of paid preparers, the frequency of use of specific forms, and the number of individuals who e-file, are also available.

Business-related data are broken down in many useful ways as well. Information for income and international tax filings is provided. This information is reported in aggregate for all businesses, and separately by business type (C corporation, S corporation, partnership, or sole proprietorship) or by type of form filed (e.g., Forms 1120, 1120-A, and 1120S). Line item estimate publications are available for the various business returns, showing aggregate income and expense amounts and balance sheet amounts reported on the applicable tax form. Excel tables report similar data by sector or industry and by size.

The IRS SOI also provides a source for charitable and exempt organizations' tax-filing information. Income and expense amounts, as well as balance sheet information, are reported by type of charity (public, private foundation, etc.) or by tax form filed (e.g., Form 990, 990-EZ, or 990-PF). Average unrelated business taxable income reported and excise taxes paid by type of charity are also available.

Finally, information regarding the operations of the IRS and general tax compliance is available on the SOI website. Operational information such as the number of taxpayers assisted and the number of returns examined is reported by tax type and type of examination. Also available are data on the number of returns filed, the number of returns filed electronically, amounts collected, refunds paid, etc. The IRS also publishes data on the U.S. tax gap (available at statistics/irs-the-tax-gap). The tax gap is a measure of noncompliance in monetary terms. The tax gap data are broken down by tax type: individual income tax, business income tax, employment tax, estate tax, and excise tax.

The SOI data contain a wealth of information of value to academics and the tax profession. The case study described here focuses on the individual income tax statistics. Instructors using this case will need to provide a brief lecture on tax audit risk assessment and on how to use the SOI data to assess risk.

As discussed above, the SOI data report individual tax return information by filing status and by AGI. The statistical information provided by filing status includes average income, deduction, tax, tax credit, and payment information for each filing status. Tax preparers can compare these averages to amounts reported on the tax return they are reviewing to identify areas at risk for audit. Part 2 of the case study requires students to assess audit risk by comparing the taxpayers' Form 1040 to the SOI data for married-filing-jointly filers for 2015. Since the taxpayers in the case study itemize deductions, students should use Table 2.2.
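The comparison described above is a simple variance analysis: subtract the SOI average from the amount on the return and flag large differences. The following minimal Python sketch mirrors what students would build in Excel for Part 2. The line-item names and dollar amounts are hypothetical placeholders, not actual Table 2.2 figures or the case study's numbers, and the $5,000 threshold comes from the Part 2 instructions. The code only flags variances; deciding whether a flagged variance increases, decreases, or has no effect on audit risk remains the reviewer's judgment.

```python
from typing import NamedTuple


class Variance(NamedTuple):
    item: str
    return_amount: float
    soi_average: float
    variance: float   # return amount minus SOI average per return
    flagged: bool     # True when |variance| exceeds the threshold


def variance_report(return_amounts: dict[str, float],
                    soi_averages: dict[str, float],
                    threshold: float = 5_000) -> list[Variance]:
    """Compare each line item on the return to the SOI average per return."""
    rows = []
    for item, amount in return_amounts.items():
        if item not in soi_averages:
            continue  # no comparable SOI statistic for this line
        avg = soi_averages[item]
        diff = amount - avg
        rows.append(Variance(item, amount, avg, diff, abs(diff) > threshold))
    return rows


# Hypothetical values for illustration only.
client_return = {"Wages": 120_000, "Mortgage interest": 22_000, "Dividends": 3_000}
soi_mfj_itemizers = {"Wages": 110_000, "Mortgage interest": 12_000,
                     "Dividends": 6_000}

for row in variance_report(client_return, soi_mfj_itemizers):
    note = "REVIEW" if row.flagged else ""
    print(f"{row.item:<18} {row.variance:>10,.0f}  {note}")
```

In Excel, the same flag could be written as =IF(ABS(B2-C2)>5000,"Review",""), assuming the return amount is in column B and the SOI average in column C.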

Tax preparers can also use the statistical information sorted by AGI to compare the tax return they are reviewing to the average income, deduction, tax, tax credit, and payment amounts reported by taxpayers with similar AGI in prior years. The AGI statistics can also be used to look for statistical patterns. Part 3 of the case study requires students to search the AGI-ordered tables for statistical patterns and to use those patterns and their knowledge of individual income tax to answer selected questions. To answer these questions, students will use Tables 1.4 and 3.3.
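Several Part 3 questions ask at what AGI level a per-return amount starts to decrease or increases substantially. One way to make that search systematic is sketched below in Python, as an illustration rather than a prescribed method: walk the AGI-ordered rows and report the first bracket whose per-return amount falls below the preceding bracket's, and the bracket with the largest proportional jump over its predecessor. The bracket labels and amounts are hypothetical. A single dip may reflect sampling variability, so students should still confirm that a trend persists across later brackets and explain it with their knowledge of the tax law.

```python
from typing import Optional

# (AGI bracket label, average amount per return), ordered from lowest to highest AGI
Rows = list[tuple[str, float]]


def first_decrease(rows: Rows) -> Optional[str]:
    """Label of the first bracket whose amount is lower than the previous bracket's."""
    for (_, prev_amount), (label, amount) in zip(rows, rows[1:]):
        if amount < prev_amount:
            return label
    return None


def largest_increase(rows: Rows) -> Optional[str]:
    """Label of the bracket with the largest ratio to the previous bracket's amount."""
    best_label, best_ratio = None, 1.0
    for (_, prev_amount), (label, amount) in zip(rows, rows[1:]):
        if prev_amount > 0 and amount / prev_amount > best_ratio:
            best_label, best_ratio = label, amount / prev_amount
    return best_label


# Hypothetical per-return amounts by AGI bracket.
example = [
    ("$50,000 under $75,000", 4_000.0),
    ("$75,000 under $100,000", 4_200.0),
    ("$100,000 under $200,000", 4_100.0),
    ("$200,000 under $500,000", 3_000.0),
]
print(first_decrease(example))    # $100,000 under $200,000
print(largest_increase(example))  # $75,000 under $100,000
```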

Case overview

The case requires students to assume the role of an entry-level tax professional in a public accounting firm. As a new hire, students are given the task of reviewing the accuracy of the previously prepared individual 2016 federal income tax return of John and Jane Harrison, two hypothetical clients. Students are informed that the tax return was prepared by an overseas staff member of their firm. Client information and original tax documents, as well as a copy of the Form 1040 with supporting forms and schedules, are provided.

The exercise is designed for use in an undergraduate federal tax course near the end of the coverage of individual income taxation, when students should have the knowledge base to complete the case successfully. Additionally, the material covered in the case mirrors that of an individual income tax course, allowing instructors to use the case to reinforce classroom concepts through experiential learning. It is also well suited for use by public accounting firms in staff training, since many new staff members will be responsible for reviewing previously prepared tax returns and analyzing tax audit risk.

To successfully complete the case, users must review client information and source documents to determine the accuracy of a previously prepared tax return. As part of their review, users prepare review notes for the overseas staff member who prepared the tax return, including explanations of noted errors and corrected Form 1040 amounts. They must also use the SOI data to analyze the tax audit risk of the corrected Form 1040 line items using a variance-analysis approach. Finally, to reinforce critical thinking and comprehension of the tax material, the users apply their understanding of individual income tax law to identify patterns within the SOI data.

Users are asked to prepare their tax audit risk analysis using Excel, allowing students to further develop their skills and confidence with this software tool and to see its value in practice. Instructors have the flexibility to tailor the requirements for student-created worksheets to include those Excel skills they want students to demonstrate, such as requiring the use of Excel's "IF" functions.

The case study

The case study packet includes the assignment and relevant materials in Microsoft Word and Adobe Acrobat formats. The solution files include (1) review notes for the preparer on incorrect return items; (2) an assessment of the clients' tax risk, including a comparison of the corrected Form 1040 amounts to the 2015 SOI data for married-filing-jointly taxpayers; and (3) answers to the questions in Part 3.

The case focuses on federal individual tax law and compliance. Students are provided background client information, including names, addresses, dates of birth, and Social Security numbers for the fictional husband-and-wife taxpayers and their children. The taxpayers' employment status and health insurance coverage are also provided. The fictional husband is self-employed. Students are provided background information about his sole proprietorship and the business's 2016 income statement. Students are also informed that the husband has a simplified employee pension individual retirement arrangement (SEP-IRA) and would like to make a $5,000 contribution. The fictional wife is employed as a human resources executive. Students are provided with her Form W-2 and informed that she is covered by her employer's pension plan.

The taxpayers in the case study also earn income from investments and rental property. Students are provided with documents and information regarding this income. The case includes the taxpayers' 2016 year-end statement from their brokerage firm. The statement provides summary and detail information for Forms 1099-INT, Interest Income; 1099-DIV, Dividends and Distributions; and 1099-B, Proceeds From Broker and Barter Exchange Transactions. Also included in the case is background information on the rental property and the 2016 rental income and related rental expenses.

Information and documents regarding the taxpayers' expenses are also included. Students are provided a listing of personal expenses and estimated tax payments made by the taxpayers. Moving expenses and day care costs incurred by the taxpayers are included. Finally, home mortgage interest paid (Form 1098, Mortgage Interest Statement) and tuition payments made (Form 1098-T, Tuition Statement) are also provided.

The requisite tasks of the case study are as follows:

Part 1

Use the provided client information to review the accuracy of the taxpayers' 2016 tax return as prepared by the overseas staff member. Provide a "to-do" list for the overseas staff member that identifies the incorrect items on the return and clearly explains each error. Also provide the staff member with corrected amounts for Form 1040.

Part 2

In Excel, use your corrected amounts to compare the taxpayers' Form 1040 to the SOI data. Compute variances using the SOI data for married-filing-jointly filers who itemized in 2015 (use the "by filing status" table). For any variances greater than $5,000, note whether the variance increases, decreases, or has no impact on the taxpayers' audit risk. Explain your answer.

Part 3

Your manager is excited you are using the SOI data to assess audit risk and is interested to know what other information can be retrieved from the SOI database. He has asked you to use the Statistics of Income for individual income tax to answer the following questions (use the by-AGI tables). Provide answers using the All Returns information:

  1. At what dollar amount of AGI do exemptions start to decrease? Why might that be?
  2. At what dollar amount of AGI do itemized deductions substantially increase? Why might that be?
  3. At what dollar amount of AGI does the percentage of Social Security benefits that is taxable flatten out? Why might that be?
  4. At what dollar amount of AGI do Keogh deductions substantially increase? Why might that be?
  5. At what dollar amount of AGI do student loan interest deductions start to decrease? Why might that be?
  6. Describe the pattern of alternative minimum tax in relation to AGI. Why might this pattern exist?
  7. At what dollar amount of AGI does the nonrefundable education credit start to decrease? Why might that be?
  8. Based on the statistics, is there a tax credit that does not appear to phase out with AGI?

Explanation of SOI tables

Data in the SOI tables may need to be manipulated when completing Parts 2 and 3. Money amounts in the tables are reported in thousands of dollars (that is, rounded to the nearest $1,000). In addition, the income, deduction, and credit items in the tables are the total amounts reported for each item on all tax returns, and the number of returns that reported each item is also listed. To compute the average amount per return, divide the total amount by the number of returns reporting the item and then multiply by 1,000.
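The per-return calculation can be written as a small helper. The following is a minimal Python sketch, assuming the table values have already been read in; the figures in the example are hypothetical.

```python
def amount_per_return(total_in_thousands: float, number_of_returns: int) -> float:
    """Average dollar amount per return that reported the item.

    SOI money amounts are stated in thousands of dollars, so the quotient
    is multiplied by 1,000 to convert back to dollars.
    """
    if number_of_returns <= 0:
        raise ValueError("number_of_returns must be positive")
    return total_in_thousands / number_of_returns * 1_000


# Hypothetical row: 2,500,000 (thousands of dollars) reported on 100,000 returns.
print(amount_per_return(2_500_000, 100_000))  # 25000.0
```

Dividing by the number of returns reporting the item, rather than by all returns in the column, gives the average among filers who actually claimed the item, which is the relevant comparison for a client return that reports it.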

Case variations

Several modifications can be made to the case study, allowing the case to be used in different settings and at varying levels of difficulty. The case study includes income and expenses that are commonly seen for individual taxpayers; however, as previously mentioned, instructors may use the case study as merely a guideline for bringing big data, the SOI database, and tax risk into the classroom. Instructors can substitute their own individual tax return assignment for the tax return in Part 1 of the case study. This provides instructors the flexibility to cover specific income/expense items they want to stress in their course, as well as the ability to embed errors they prefer.

Instructors covering corporate or flowthrough entity taxation may wish to substitute a corporate or flowthrough tax return assignment for Part 1 of the case study. (Note that a substitution such as this would also require the instructor to change the SOI tables referred to in Part 2 of the case study to analyze tax risk and, if desired, change the questions in Part 3 to better reflect the income tax area they are teaching.)

To enhance the variance-analysis component of this assignment in Part 2, instructors can ask students to conduct this investigation in different ways. Variances of interest could be based on a percentage threshold rather than a fixed $5,000. Variances could also be analyzed conditionally on other line items. For example, a tax preparer would not expect a self-employment tax deduction without the presence of income from a Schedule C, Profit or Loss From Business, or Schedule E, Supplemental Income and Loss. The variance analysis could be made more complex to account for these relationships.
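Both refinements can be sketched in a few lines of Python. The 25% threshold, the line-item names, and the single conditional rule shown are illustrative assumptions; instructors would choose their own thresholds and add rules for other line-item relationships.

```python
from typing import Callable, Optional

LineItems = dict[str, float]


def pct_variance(amount: float, soi_average: float) -> Optional[float]:
    """Variance as a fraction of the SOI average (None if the average is zero)."""
    if soi_average == 0:
        return None
    return (amount - soi_average) / soi_average


def exceeds_pct(amount: float, soi_average: float, threshold: float = 0.25) -> bool:
    """True when the percentage variance exceeds the threshold in either direction."""
    pct = pct_variance(amount, soi_average)
    return pct is not None and abs(pct) > threshold


# Conditional rules: (description, test returning True when the return looks inconsistent).
RULES: list[tuple[str, Callable[[LineItems], bool]]] = [
    ("Self-employment tax deduction without Schedule C or Schedule E income",
     lambda li: li.get("SE tax deduction", 0) > 0
     and li.get("Schedule C", 0) == 0
     and li.get("Schedule E", 0) == 0),
]


def conditional_flags(line_items: LineItems) -> list[str]:
    """Descriptions of every rule the return's line items violate."""
    return [desc for desc, is_inconsistent in RULES if is_inconsistent(line_items)]


print(pct_variance(12_500, 10_000))  # 0.25
print(exceeds_pct(12_500, 10_000))   # False (not strictly above 25%)
print(conditional_flags({"SE tax deduction": 1_500, "Wages": 80_000}))
```

Keeping the rules in a list of description and test pairs lets instructors add further line-item relationships without changing the flagging function.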

Instructors who want to add more writing content to their course could add a writing element to Part 2 of the case study. Students can be required to write a letter to the taxpayers explaining the tax audit risks identified in their analysis. The letter could also include explanations for how the taxpayers can reduce their audit risk and the types of documentation the taxpayers should keep for IRS audit purposes. This assignment could be used to start a classroom discussion of IRS audit policies and procedures.

In the university setting, the case offers educators an opportunity to invite tax practitioners into the classroom to discuss their own experiences reviewing tax returns and assessing tax audit risk. Tax practitioners can be invited to introduce the topic of tax audit risk and the information provided within the SOI database. Another option would be for practitioners to participate in a debriefing exercise, discussing their review and risk assessment process and the similarities of the tasks in the current case to their own practice.

Applying real data to real-life situations

As the amount of available data grows, every area of accounting faces increasing pressure to find ways to use big data. For tax practices, one way to provide value-added services using big data is through tax risk assessment. Identifying and resolving areas of risk early in a tax compliance engagement can save both the tax professional and the client a great deal of time, money, effort, and frustration. This case study provides students the opportunity to learn how to use the SOI data to assess tax audit risk and to apply their individual income tax law knowledge to search for patterns in the SOI data.

This case also provides students with valuable experience reviewing individual tax returns and writing review notes. The primary role of a new tax professional on a tax compliance engagement can range from inputting initial tax data into the tax return to reviewing work completed by another staff member, whether that is an intern, an offshore office, or even artificial intelligence. In addition, the case study allows students to see how Excel can be effectively used in a tax practice. Requiring users to create their own Excel worksheet provides students with practical experience using a tool they will most likely need to use when they enter the workforce. Finally, the case provides an opportunity to involve local practitioners in the classroom to further discuss tax risk assessment.   



William Brink is an assistant professor of accountancy at the Farmer School of Business at Miami University in Oxford, Ohio. Victoria Hansen is an associate professor of accountancy with the Cameron School of Business at the University of North Carolina Wilmington in Wilmington, N.C. Annette Nellen is a professor in the Department of Accounting and Finance at San José State University in San José, Calif., and is the chair of the AICPA Tax Executive Committee. For more information about this column, please contact

