Commit b1126c4

committed
passing all the tests
1 parent 984dada commit b1126c4

5 files changed

Lines changed: 482 additions & 17 deletions

File tree

docs/file_reader_refactoring.md

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
1+
# DefaultFileReader Class Refactoring
2+
3+
## Overview
4+
5+
The `DefaultFileReader` class has been refactored to improve testability, readability, and maintainability. This document outlines the changes made and how to use the new implementation.
6+
7+
## Key Improvements
8+
9+
### 1. **Separation of Concerns**
10+
- **File path resolution** is now handled by dedicated methods
11+
- **File opening** is separated from path resolution
12+
- **Configuration** is centralized in class-level defaults that constructor arguments can override
13+
14+
### 2. **Enhanced Testability**
15+
- **Dependency injection** through constructor parameters
16+
- **Mockable methods** for unit testing
17+
- **Clear interfaces** between different responsibilities
18+
- **Comprehensive test coverage** with isolated test cases
19+
20+
### 3. **Better Error Handling**
21+
- **Custom exception hierarchy** for different error types
22+
- **Descriptive error messages** with context
23+
- **Proper exception chaining** for debugging
24+
25+
### 4. **Improved Configuration**
26+
- **Configurable defaults** that can be overridden
27+
- **Environment-specific settings** support
28+
- **Clear configuration contract**
29+
30+
### 5. **Enhanced Readability**
31+
- **Comprehensive docstrings** for all methods
32+
- **Clear method names** that describe their purpose
33+
- **Logical method organization** from public to private
34+
- **Type hints** throughout the codebase
35+
36+
## Class Structure
37+
38+
### DefaultFileReader
39+
The main class that provides the file reading framework:
40+
41+
```python
import io
from pathlib import Path

from datacustomcode.file.base import BaseDataAccessLayer


class DefaultFileReader(BaseDataAccessLayer):
    # Configuration constants
    DEFAULT_CODE_PACKAGE = 'payload'
    DEFAULT_FILE_FOLDER = 'files'
    DEFAULT_CONFIG_FILE = 'config.json'

    def __init__(self, code_package=None, file_folder=None, config_file=None):
        # Initialize with custom or default configuration
        ...

    def file_open(self, file_name: str) -> io.TextIOWrapper:
        # Main public method for opening files
        ...

    def get_search_locations(self) -> list[Path]:
        # Get all possible search locations
        ...
```
57+
58+
## Exception Hierarchy
59+
60+
```text
FileReaderError (base)
├── FileNotFoundError (file not found in any location)
└── FileAccessError (permission, I/O errors, etc.)
```
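As a rough illustration of how such a hierarchy and exception chaining can fit together, here is a minimal sketch. The class names follow the tree above; the `open_text` helper is hypothetical and is not part of the SDK. Note that a custom class named `FileNotFoundError` shadows Python's builtin of the same name in any module that defines or imports it.

```python
from pathlib import Path


class FileReaderError(Exception):
    """Base exception for file reader operations."""


class FileNotFoundError(FileReaderError):  # shadows the builtin in this module
    """Raised when the file is not found in any search location."""


class FileAccessError(FileReaderError):
    """Raised on permission or other I/O errors."""


def open_text(path: Path):
    """Hypothetical helper: open `path` and translate low-level failures."""
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    try:
        return path.open("r", encoding="utf-8")
    except OSError as exc:
        # `from exc` keeps the original error reachable via __cause__.
        raise FileAccessError(f"Cannot open {path}: {exc}") from exc
```

Callers that do not care about the distinction can catch `FileReaderError` alone, while the chained `__cause__` still exposes the underlying `OSError` for debugging.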
65+
66+
## Usage Examples
67+
68+
### Basic Usage
69+
```python
from datacustomcode.file.reader.default import DefaultFileReader

# Use default configuration
reader = DefaultFileReader()
with reader.file_open('data.csv') as f:
    content = f.read()
```
77+
78+
### Custom Configuration
79+
```python
from datacustomcode.file.reader.default import DefaultFileReader

# Custom configuration
reader = DefaultFileReader(
    code_package='my_package',
    file_folder='data',
    config_file='settings.json'
)
```
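To show how constructor arguments can fall back to class-level defaults, here is a small stand-in. `ReaderConfigSketch` and `StagingConfigSketch` are hypothetical names used only for illustration; the `value or self.DEFAULT_...` pattern mirrors the assignments visible in the `default.py` diff for `file_folder` and `config_file`, and it is assumed here that `code_package` is handled the same way.

```python
from typing import Optional


class ReaderConfigSketch:
    """Hypothetical stand-in showing the `value or DEFAULT` fallback."""

    DEFAULT_CODE_PACKAGE = "payload"
    DEFAULT_FILE_FOLDER = "files"
    DEFAULT_CONFIG_FILE = "config.json"

    def __init__(
        self,
        code_package: Optional[str] = None,
        file_folder: Optional[str] = None,
        config_file: Optional[str] = None,
    ) -> None:
        # Any falsy argument (None or "") falls back to the class default.
        self.code_package = code_package or self.DEFAULT_CODE_PACKAGE
        self.file_folder = file_folder or self.DEFAULT_FILE_FOLDER
        self.config_file = config_file or self.DEFAULT_CONFIG_FILE


class StagingConfigSketch(ReaderConfigSketch):
    """Defaults are looked up through `self`, so a subclass can override them."""

    DEFAULT_FILE_FOLDER = "staging_files"
```

One consequence of using `or` is that an empty string is treated the same as `None`, so passing `file_folder=""` still yields the default folder.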
89+
90+
### Error Handling
91+
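```python
# Assumption: the custom exceptions live in the same module as the reader.
# This FileNotFoundError is the SDK's class and shadows the builtin here.
from datacustomcode.file.reader.default import DefaultFileReader, FileAccessError, FileNotFoundError

reader = DefaultFileReader()
try:
    with reader.file_open('data.csv') as f:
        content = f.read()
except FileNotFoundError as e:
    print(f"File not found: {e}")
except FileAccessError as e:
    print(f"Access error: {e}")
```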
100+
101+
## File Resolution Strategy
102+
103+
The file reader uses a two-tier search strategy:
104+
105+
1. **Primary Location**: `{code_package}/{file_folder}/{filename}`
106+
2. **Fallback Location**: `{config_file_parent}/{file_folder}/{filename}`
107+
108+
The primary location is tried first and the fallback is used only if the file is not found there, so the same code works whether the files are deployed next to the code package or next to the config file.
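The two-tier lookup can be sketched with `pathlib` as follows. This is a simplified illustration, not the SDK's implementation: `resolve_two_tier` is a hypothetical function, it takes the package and config locations as explicit arguments, and it checks only the two exact candidate paths.

```python
from pathlib import Path
from typing import Optional


def resolve_two_tier(
    file_name: str,
    code_package: Path,
    config_file: Path,
    file_folder: str = "files",
) -> Optional[Path]:
    """Return the first existing candidate path, or None if neither exists."""
    candidates = [
        code_package / file_folder / file_name,        # primary location
        config_file.parent / file_folder / file_name,  # fallback location
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` keeps the sketch small; a caller following the exception hierarchy in this document would raise its not-found error at that point instead.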
109+
110+
## Testing
111+
112+
### Unit Tests
113+
The refactored class includes comprehensive unit tests covering:
114+
- Configuration initialization
115+
- File path resolution
116+
- Error handling scenarios
117+
- File opening operations
118+
- Search location determination
119+
120+
### Mocking
121+
The class is designed for easy mocking in tests:
122+
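```python
from pathlib import Path
from unittest.mock import patch

from datacustomcode.file.reader.default import DefaultFileReader

# patch.object avoids a dotted-string target, which must be a full import path
with patch.object(DefaultFileReader, '_resolve_file_path') as mock_resolve:
    mock_resolve.return_value = Path('/test/file.txt')
    # Test file opening logic
```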
129+
130+
### Integration Tests
131+
Integration tests verify the complete file resolution and opening flow using temporary directories and real file operations.
132+
133+
## Migration Guide
134+
135+
### From Old Implementation
136+
The old implementation had these issues:
137+
- Hardcoded configuration values
138+
- Mixed responsibilities in single methods
139+
- Limited error handling
140+
- Difficult to test
141+
142+
### To New Implementation
143+
1. **Update imports**: Use `DefaultFileReader` from `datacustomcode.file.reader.default`
144+
2. **Error handling**: Catch specific exceptions instead of generic ones
145+
3. **Configuration**: Use constructor parameters for custom settings
146+
4. **Testing**: Leverage the new mockable methods
147+
148+
## Benefits
149+
150+
### For Developers
151+
- **Easier debugging** with clear error messages
152+
- **Better IDE support** with type hints and docstrings
153+
- **Simplified testing** with dependency injection
154+
- **Clearer code structure** with separated responsibilities
155+
156+
### For Maintainers
157+
- **Easier to extend** with new file resolution strategies
158+
- **Better error tracking** with custom exception types
159+
- **Improved test coverage** with isolated test cases
160+
- **Clearer documentation** with comprehensive docstrings
161+
162+
### For Users
163+
- **More reliable** with proper error handling
164+
- **More flexible** with configurable behavior
165+
- **Better debugging** with descriptive error messages
166+
- **Consistent interface** across different implementations
167+
168+
## Future Enhancements
169+
170+
The refactored structure makes it easy to add:
171+
- **Additional file resolution strategies** (URLs, cloud storage, etc.)
172+
- **File format detection** and automatic handling
173+
- **Caching mechanisms** for frequently accessed files
174+
- **Async file operations** for better performance
175+
- **File validation** and integrity checking
176+
177+
## Conclusion
178+
179+
The refactored `DefaultFileReader` class provides a solid foundation for file reading operations while maintaining backward compatibility. The improvements in testability, readability, and maintainability make it easier to develop, test, and maintain file reading functionality in the Data Cloud Custom Code SDK.

src/datacustomcode/client.py

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,6 @@
 from __future__ import annotations

 from enum import Enum
-import io
 from typing import (
     TYPE_CHECKING,
     ClassVar,
@@ -29,6 +28,8 @@
 from datacustomcode.io.reader.base import BaseDataCloudReader

 if TYPE_CHECKING:
+    import io
+
     from pyspark.sql import DataFrame as PySparkDataFrame

     from datacustomcode.io.reader.base import BaseDataCloudReader
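For readers unfamiliar with the pattern, the following minimal sketch shows why an import used only in annotations can sit under `TYPE_CHECKING`. The `open_text` function is hypothetical; the sketch uses a quoted annotation, which has the same effect on that annotation as `from __future__ import annotations`.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only; this branch never runs at runtime.
    import io


def open_text(path: str) -> "io.TextIOWrapper":
    """The quoted annotation is stored as a string and is not evaluated on call."""
    return open(path, encoding="utf-8")
```

The function runs without `io` being imported at runtime, while a type checker still resolves the return type.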

src/datacustomcode/file/base.py

Lines changed: 2 additions & 4 deletions
@@ -14,8 +14,6 @@
 # limitations under the License.
 from __future__ import annotations

-from abc import ABC

-
-class BaseDataAccessLayer(ABC):
-    pass
+class BaseDataAccessLayer:
+    """Base class for data access layer implementations."""

src/datacustomcode/file/reader/default.py

Lines changed: 6 additions & 12 deletions
@@ -14,13 +14,15 @@
 # limitations under the License.
 from __future__ import annotations

-import io
 import os
 from pathlib import Path
-from typing import Optional
+from typing import TYPE_CHECKING, Optional

 from datacustomcode.file.base import BaseDataAccessLayer

+if TYPE_CHECKING:
+    import io
+

 class FileReaderError(Exception):
     """Base exception for file reader operations."""
@@ -63,7 +65,7 @@ def __init__(
         self.file_folder = file_folder or self.DEFAULT_FILE_FOLDER
         self.config_file = config_file or self.DEFAULT_CONFIG_FILE

-    def file_open(self, file_name: str) -> io.TextIOWrapper:
+    def file_open(self, file_name: str) -> "io.TextIOWrapper":
         """Open a file for reading.

         Args:
@@ -170,7 +172,7 @@ def _find_file_in_tree(self, filename: str, search_path: Path) -> Optional[Path]
                 return file_path
         return None

-    def _open_file(self, file_path: Path) -> io.TextIOWrapper:
+    def _open_file(self, file_path: Path) -> "io.TextIOWrapper":
         """Open a file at the given path.

         Args:
@@ -199,11 +201,3 @@ def get_search_locations(self) -> list[Path]:
             locations.append(config_path.parent.joinpath(self.file_folder))

         return locations
-
-
-class BaseDataAccessLayer:
-    """Default implementation of the file reader.
-
-    This class provides the standard file reading behavior and can be
-    easily mocked or subclassed for testing.
-    """
